* [dpdk-dev] [PATCH v1 00/38] Introduce mlx5 vDPA driver
@ 2020-01-20 17:02 Matan Azrad
From: Matan Azrad @ 2020-01-20 17:02 UTC
  To: dev; +Cc: Maxime Coquelin, Thomas Monjalon
Steps:
- Prepare net/mlx5 for code sharing.
- Introduce new common lib for mlx5 devices.
- Share code from net/mlx5 to common/mlx5.
- Introduce vDPA driver for Mellanox devices.
Matan Azrad (38):
  net/mlx5: separate DevX commands interface
  mlx5: prepare common library
  mlx5: share the mlx5 glue reference
  mlx5: share mlx5 PCI device detection
  mlx5: share mlx5 devices information
  drivers: introduce mlx5 vDPA driver
  common/mlx5: expose vDPA DevX capabilities
  vdpa/mlx5: support queues number operation
  vdpa/mlx5: support features get operations
  common/mlx5: glue null memory region allocation
  common/mlx5: support DevX indirect mkey creation
  common/mlx5: glue event queue query
  common/mlx5: glue event interrupt commands
  common/mlx5: glue UAR allocation
  common/mlx5: add DevX command to create CQ
  common/mlx5: glue VAR allocation
  common/mlx5: add DevX virtio emulation commands
  vdpa/mlx5: prepare memory regions
  mlx5: share CQ entry check
  vdpa/mlx5: prepare completion queues
  vdpa/mlx5: handle completions
  vdpa/mlx5: prepare virtio queues
  vdpa/mlx5: support stateless offloads
  common/mlx5: allow type configuration for DevX RQT
  common/mlx5: add TIR fields constants
  common/mlx5: add DevX command to modify RQT
  common/mlx5: get DevX capability for max RQT size
  vdpa/mlx5: add basic steering configurations
  vdpa/mlx5: support queue state operation
  vdpa/mlx5: map doorbell
  vdpa/mlx5: support live migration
  vdpa/mlx5: support close and config operations
  mlx5: skip probing according to the vDPA mode
  net/mlx5: separate Netlink commands interface
  net/mlx5: reduce Netlink commands dependencies
  mlx5: share Netlink commands
  common/mlx5: support ROCE disable through Netlink
  vdpa/mlx5: disable ROCE
 MAINTAINERS                                     |    9 +
 config/common_base                              |    5 +
 doc/guides/rel_notes/release_20_02.rst          |    5 +
 doc/guides/vdpadevs/features/mlx5.ini           |   27 +
 doc/guides/vdpadevs/index.rst                   |    1 +
 doc/guides/vdpadevs/mlx5.rst                    |   89 +
 drivers/common/Makefile                         |    4 +
 drivers/common/meson.build                      |    2 +-
 drivers/common/mlx5/Makefile                    |  350 ++++
 drivers/common/mlx5/meson.build                 |  210 +++
 drivers/common/mlx5/mlx5_common.c               |  332 ++++
 drivers/common/mlx5/mlx5_common.h               |  214 +++
 drivers/common/mlx5/mlx5_common_utils.h         |   20 +
 drivers/common/mlx5/mlx5_devx_cmds.c            | 1363 ++++++++++++++
 drivers/common/mlx5/mlx5_devx_cmds.h            |  322 ++++
 drivers/common/mlx5/mlx5_glue.c                 | 1296 ++++++++++++++
 drivers/common/mlx5/mlx5_glue.h                 |  305 ++++
 drivers/common/mlx5/mlx5_nl.c                   | 1699 ++++++++++++++++++
 drivers/common/mlx5/mlx5_nl.h                   |   63 +
 drivers/common/mlx5/mlx5_prm.h                  | 2165 +++++++++++++++++++++++
 drivers/common/mlx5/rte_common_mlx5_version.map |   48 +
 drivers/meson.build                             |    8 +-
 drivers/net/mlx5/Makefile                       |  307 +---
 drivers/net/mlx5/meson.build                    |  257 +--
 drivers/net/mlx5/mlx5.c                         |  196 +-
 drivers/net/mlx5/mlx5.h                         |  324 +---
 drivers/net/mlx5/mlx5_defs.h                    |    8 -
 drivers/net/mlx5/mlx5_devx_cmds.c               |  967 ----------
 drivers/net/mlx5/mlx5_ethdev.c                  |  161 +-
 drivers/net/mlx5/mlx5_flow.c                    |   12 +-
 drivers/net/mlx5/mlx5_flow.h                    |    3 +-
 drivers/net/mlx5/mlx5_flow_dv.c                 |   12 +-
 drivers/net/mlx5/mlx5_flow_meter.c              |    2 +
 drivers/net/mlx5/mlx5_flow_verbs.c              |    7 +-
 drivers/net/mlx5/mlx5_glue.c                    | 1150 ------------
 drivers/net/mlx5/mlx5_glue.h                    |  264 ---
 drivers/net/mlx5/mlx5_mac.c                     |   16 +-
 drivers/net/mlx5/mlx5_mr.c                      |    3 +-
 drivers/net/mlx5/mlx5_nl.c                      | 1402 ---------------
 drivers/net/mlx5/mlx5_prm.h                     | 1883 --------------------
 drivers/net/mlx5/mlx5_rss.c                     |    2 +-
 drivers/net/mlx5/mlx5_rxmode.c                  |   12 +-
 drivers/net/mlx5/mlx5_rxq.c                     |    7 +-
 drivers/net/mlx5/mlx5_rxtx.c                    |    7 +-
 drivers/net/mlx5/mlx5_rxtx.h                    |   46 +-
 drivers/net/mlx5/mlx5_rxtx_vec.c                |    5 +-
 drivers/net/mlx5/mlx5_rxtx_vec.h                |    3 +-
 drivers/net/mlx5/mlx5_rxtx_vec_altivec.h        |    5 +-
 drivers/net/mlx5/mlx5_rxtx_vec_neon.h           |    5 +-
 drivers/net/mlx5/mlx5_rxtx_vec_sse.h            |    5 +-
 drivers/net/mlx5/mlx5_stats.c                   |    5 +-
 drivers/net/mlx5/mlx5_txq.c                     |    7 +-
 drivers/net/mlx5/mlx5_utils.h                   |   79 +-
 drivers/net/mlx5/mlx5_vlan.c                    |  137 +-
 drivers/vdpa/Makefile                           |    2 +
 drivers/vdpa/meson.build                        |    3 +-
 drivers/vdpa/mlx5/Makefile                      |   43 +
 drivers/vdpa/mlx5/meson.build                   |   34 +
 drivers/vdpa/mlx5/mlx5_vdpa.c                   |  539 ++++++
 drivers/vdpa/mlx5/mlx5_vdpa.h                   |  294 +++
 drivers/vdpa/mlx5/mlx5_vdpa_cq.c                |  283 +++
 drivers/vdpa/mlx5/mlx5_vdpa_lm.c                |  132 ++
 drivers/vdpa/mlx5/mlx5_vdpa_mem.c               |  351 ++++
 drivers/vdpa/mlx5/mlx5_vdpa_steer.c             |  288 +++
 drivers/vdpa/mlx5/mlx5_vdpa_utils.h             |   20 +
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c             |  397 +++++
 drivers/vdpa/mlx5/rte_pmd_mlx5_vdpa_version.map |    3 +
 mk/rte.app.mk                                   |   14 +-
 68 files changed, 11239 insertions(+), 7000 deletions(-)
 create mode 100644 doc/guides/vdpadevs/features/mlx5.ini
 create mode 100644 doc/guides/vdpadevs/mlx5.rst
 create mode 100644 drivers/common/mlx5/Makefile
 create mode 100644 drivers/common/mlx5/meson.build
 create mode 100644 drivers/common/mlx5/mlx5_common.c
 create mode 100644 drivers/common/mlx5/mlx5_common.h
 create mode 100644 drivers/common/mlx5/mlx5_common_utils.h
 create mode 100644 drivers/common/mlx5/mlx5_devx_cmds.c
 create mode 100644 drivers/common/mlx5/mlx5_devx_cmds.h
 create mode 100644 drivers/common/mlx5/mlx5_glue.c
 create mode 100644 drivers/common/mlx5/mlx5_glue.h
 create mode 100644 drivers/common/mlx5/mlx5_nl.c
 create mode 100644 drivers/common/mlx5/mlx5_nl.h
 create mode 100644 drivers/common/mlx5/mlx5_prm.h
 create mode 100644 drivers/common/mlx5/rte_common_mlx5_version.map
 delete mode 100644 drivers/net/mlx5/mlx5_devx_cmds.c
 delete mode 100644 drivers/net/mlx5/mlx5_glue.c
 delete mode 100644 drivers/net/mlx5/mlx5_glue.h
 delete mode 100644 drivers/net/mlx5/mlx5_nl.c
 delete mode 100644 drivers/net/mlx5/mlx5_prm.h
 create mode 100644 drivers/vdpa/mlx5/Makefile
 create mode 100644 drivers/vdpa/mlx5/meson.build
 create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa.c
 create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa.h
 create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_cq.c
 create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_lm.c
 create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_mem.c
 create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_steer.c
 create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_utils.h
 create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
 create mode 100644 drivers/vdpa/mlx5/rte_pmd_mlx5_vdpa_version.map
-- 
1.8.3.1
* [dpdk-dev] [PATCH v1 01/38] net/mlx5: separate DevX commands interface
From: Matan Azrad @ 2020-01-20 17:02 UTC
  To: dev; +Cc: Maxime Coquelin, Thomas Monjalon
The DevX commands interface is included in the mlx5.h file along with
many other PMD interfaces.
In preparation for sharing the DevX commands among different PMDs,
this patch moves the DevX interface to a new file called mlx5_devx_cmds.h.
Also remove the shared device structure dependency from the DevX commands.
Switch the DevX commands logging from the mlx5 driver log mechanism to
the EAL log mechanism.
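
For readers skimming the series, the dependency removal can be pictured
with a minimal sketch. It is not the driver code: struct demo_shared_ctx,
demo_dump_domain(), demo_flow_dump() and demo_dev_flow_dump() are
hypothetical stand-ins, assumed here only to show a shared function that
receives plain pointers while the driver-side wrapper unpacks its private
structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a driver-private shared context. */
struct demo_shared_ctx {
	void *fdb_domain; /* Optional, may be NULL. */
	void *rx_domain;
	void *tx_domain;
};

/* Hypothetical stand-in for a per-domain dump callback. */
static int
demo_dump_domain(FILE *file, void *domain)
{
	return fprintf(file, "domain %p\n", domain) < 0 ? -1 : 0;
}

/* Shared side: takes only the pointers it needs, no private struct. */
static int
demo_flow_dump(void *fdb_domain, void *rx_domain, void *tx_domain,
	       FILE *file)
{
	int ret;

	if (fdb_domain) {
		ret = demo_dump_domain(file, fdb_domain);
		if (ret)
			return ret;
	}
	assert(rx_domain);
	ret = demo_dump_domain(file, rx_domain);
	if (ret)
		return ret;
	assert(tx_domain);
	return demo_dump_domain(file, tx_domain);
}

/* Driver side: the only place that knows the private structure layout. */
static int
demo_dev_flow_dump(const struct demo_shared_ctx *sh, FILE *file)
{
	return demo_flow_dump(sh->fdb_domain, sh->rx_domain,
			      sh->tx_domain, file);
}
```

With this shape, a second driver could call demo_flow_dump() with its own
domain pointers without including the first driver's headers.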
Signed-off-by: Matan Azrad <matan@mellanox.com>
---
 drivers/net/mlx5/mlx5.c           |   1 +
 drivers/net/mlx5/mlx5.h           | 219 +-----------------------------------
 drivers/net/mlx5/mlx5_devx_cmds.c |  33 +++---
 drivers/net/mlx5/mlx5_devx_cmds.h | 227 ++++++++++++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_ethdev.c    |   1 +
 drivers/net/mlx5/mlx5_flow.c      |   5 +-
 drivers/net/mlx5/mlx5_flow_dv.c   |   1 +
 drivers/net/mlx5/mlx5_rxq.c       |   1 +
 drivers/net/mlx5/mlx5_rxtx.c      |   1 +
 drivers/net/mlx5/mlx5_txq.c       |   1 +
 drivers/net/mlx5/mlx5_vlan.c      |   1 +
 11 files changed, 259 insertions(+), 232 deletions(-)
 create mode 100644 drivers/net/mlx5/mlx5_devx_cmds.h
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index ffee39c..2f91e50 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -46,6 +46,7 @@
 #include "mlx5_glue.h"
 #include "mlx5_mr.h"
 #include "mlx5_flow.h"
+#include "mlx5_devx_cmds.h"
 
 /* Device parameter to enable RX completion queue compression. */
 #define MLX5_RXQ_CQE_COMP_EN "rxq_cqe_comp_en"
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 01542e7..0b8b1b6 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -38,6 +38,7 @@
 #include "mlx5_defs.h"
 #include "mlx5_glue.h"
 #include "mlx5_prm.h"
+#include "mlx5_devx_cmds.h"
 
 enum {
 	PCI_VENDOR_ID_MELLANOX = 0x15b3,
@@ -156,60 +157,6 @@ struct mlx5_stats_ctrl {
 	uint64_t imissed_base;
 };
 
-/* devX creation object */
-struct mlx5_devx_obj {
-	struct mlx5dv_devx_obj *obj; /* The DV object. */
-	int id; /* The object ID. */
-};
-
-struct mlx5_devx_mkey_attr {
-	uint64_t addr;
-	uint64_t size;
-	uint32_t umem_id;
-	uint32_t pd;
-};
-
-/* HCA qos attributes. */
-struct mlx5_hca_qos_attr {
-	uint32_t sup:1;	/* Whether QOS is supported. */
-	uint32_t srtcm_sup:1; /* Whether srTCM mode is supported. */
-	uint8_t log_max_flow_meter;
-	/* Power of the maximum supported meters. */
-	uint8_t flow_meter_reg_c_ids;
-	/* Bitmap of the reg_Cs available for flow meter to use. */
-
-};
-
-/* HCA supports this number of time periods for LRO. */
-#define MLX5_LRO_NUM_SUPP_PERIODS 4
-
-/* HCA attributes. */
-struct mlx5_hca_attr {
-	uint32_t eswitch_manager:1;
-	uint32_t flow_counters_dump:1;
-	uint8_t flow_counter_bulk_alloc_bitmap;
-	uint32_t eth_net_offloads:1;
-	uint32_t eth_virt:1;
-	uint32_t wqe_vlan_insert:1;
-	uint32_t wqe_inline_mode:2;
-	uint32_t vport_inline_mode:3;
-	uint32_t tunnel_stateless_geneve_rx:1;
-	uint32_t geneve_max_opt_len:1; /* 0x0: 14DW, 0x1: 63DW */
-	uint32_t tunnel_stateless_gtp:1;
-	uint32_t lro_cap:1;
-	uint32_t tunnel_lro_gre:1;
-	uint32_t tunnel_lro_vxlan:1;
-	uint32_t lro_max_msg_sz_mode:2;
-	uint32_t lro_timer_supported_periods[MLX5_LRO_NUM_SUPP_PERIODS];
-	uint32_t flex_parser_protocols;
-	uint32_t hairpin:1;
-	uint32_t log_max_hairpin_queues:5;
-	uint32_t log_max_hairpin_wq_data_sz:5;
-	uint32_t log_max_hairpin_num_packets:5;
-	uint32_t vhca_id:16;
-	struct mlx5_hca_qos_attr qos;
-};
-
 /* Flow list . */
 TAILQ_HEAD(mlx5_flows, rte_flow);
 
@@ -289,133 +236,6 @@ struct mlx5_dev_config {
 	struct mlx5_lro_config lro; /* LRO configuration. */
 };
 
-struct mlx5_devx_wq_attr {
-	uint32_t wq_type:4;
-	uint32_t wq_signature:1;
-	uint32_t end_padding_mode:2;
-	uint32_t cd_slave:1;
-	uint32_t hds_skip_first_sge:1;
-	uint32_t log2_hds_buf_size:3;
-	uint32_t page_offset:5;
-	uint32_t lwm:16;
-	uint32_t pd:24;
-	uint32_t uar_page:24;
-	uint64_t dbr_addr;
-	uint32_t hw_counter;
-	uint32_t sw_counter;
-	uint32_t log_wq_stride:4;
-	uint32_t log_wq_pg_sz:5;
-	uint32_t log_wq_sz:5;
-	uint32_t dbr_umem_valid:1;
-	uint32_t wq_umem_valid:1;
-	uint32_t log_hairpin_num_packets:5;
-	uint32_t log_hairpin_data_sz:5;
-	uint32_t single_wqe_log_num_of_strides:4;
-	uint32_t two_byte_shift_en:1;
-	uint32_t single_stride_log_num_of_bytes:3;
-	uint32_t dbr_umem_id;
-	uint32_t wq_umem_id;
-	uint64_t wq_umem_offset;
-};
-
-/* Create RQ attributes structure, used by create RQ operation. */
-struct mlx5_devx_create_rq_attr {
-	uint32_t rlky:1;
-	uint32_t delay_drop_en:1;
-	uint32_t scatter_fcs:1;
-	uint32_t vsd:1;
-	uint32_t mem_rq_type:4;
-	uint32_t state:4;
-	uint32_t flush_in_error_en:1;
-	uint32_t hairpin:1;
-	uint32_t user_index:24;
-	uint32_t cqn:24;
-	uint32_t counter_set_id:8;
-	uint32_t rmpn:24;
-	struct mlx5_devx_wq_attr wq_attr;
-};
-
-/* Modify RQ attributes structure, used by modify RQ operation. */
-struct mlx5_devx_modify_rq_attr {
-	uint32_t rqn:24;
-	uint32_t rq_state:4; /* Current RQ state. */
-	uint32_t state:4; /* Required RQ state. */
-	uint32_t scatter_fcs:1;
-	uint32_t vsd:1;
-	uint32_t counter_set_id:8;
-	uint32_t hairpin_peer_sq:24;
-	uint32_t hairpin_peer_vhca:16;
-	uint64_t modify_bitmask;
-	uint32_t lwm:16; /* Contained WQ lwm. */
-};
-
-struct mlx5_rx_hash_field_select {
-	uint32_t l3_prot_type:1;
-	uint32_t l4_prot_type:1;
-	uint32_t selected_fields:30;
-};
-
-/* TIR attributes structure, used by TIR operations. */
-struct mlx5_devx_tir_attr {
-	uint32_t disp_type:4;
-	uint32_t lro_timeout_period_usecs:16;
-	uint32_t lro_enable_mask:4;
-	uint32_t lro_max_msg_sz:8;
-	uint32_t inline_rqn:24;
-	uint32_t rx_hash_symmetric:1;
-	uint32_t tunneled_offload_en:1;
-	uint32_t indirect_table:24;
-	uint32_t rx_hash_fn:4;
-	uint32_t self_lb_block:2;
-	uint32_t transport_domain:24;
-	uint32_t rx_hash_toeplitz_key[10];
-	struct mlx5_rx_hash_field_select rx_hash_field_selector_outer;
-	struct mlx5_rx_hash_field_select rx_hash_field_selector_inner;
-};
-
-/* RQT attributes structure, used by RQT operations. */
-struct mlx5_devx_rqt_attr {
-	uint32_t rqt_max_size:16;
-	uint32_t rqt_actual_size:16;
-	uint32_t rq_list[];
-};
-
-/* TIS attributes structure. */
-struct mlx5_devx_tis_attr {
-	uint32_t strict_lag_tx_port_affinity:1;
-	uint32_t tls_en:1;
-	uint32_t lag_tx_port_affinity:4;
-	uint32_t prio:4;
-	uint32_t transport_domain:24;
-};
-
-/* SQ attributes structure, used by SQ create operation. */
-struct mlx5_devx_create_sq_attr {
-	uint32_t rlky:1;
-	uint32_t cd_master:1;
-	uint32_t fre:1;
-	uint32_t flush_in_error_en:1;
-	uint32_t allow_multi_pkt_send_wqe:1;
-	uint32_t min_wqe_inline_mode:3;
-	uint32_t state:4;
-	uint32_t reg_umr:1;
-	uint32_t allow_swp:1;
-	uint32_t hairpin:1;
-	uint32_t user_index:24;
-	uint32_t cqn:24;
-	uint32_t packet_pacing_rate_limit_index:16;
-	uint32_t tis_lst_sz:16;
-	uint32_t tis_num:24;
-	struct mlx5_devx_wq_attr wq_attr;
-};
-
-/* SQ attributes structure, used by SQ modify operation. */
-struct mlx5_devx_modify_sq_attr {
-	uint32_t sq_state:4;
-	uint32_t state:4;
-	uint32_t hairpin_peer_rq:24;
-	uint32_t hairpin_peer_vhca:16;
-};
 
 /**
  * Type of object being allocated.
@@ -1022,43 +842,6 @@ void mlx5_vlan_vmwa_release(struct rte_eth_dev *dev,
 void mlx5_vlan_vmwa_acquire(struct rte_eth_dev *dev,
 			    struct mlx5_vf_vlan *vf_vlan);
 
-/* mlx5_devx_cmds.c */
-
-struct mlx5_devx_obj *mlx5_devx_cmd_flow_counter_alloc(struct ibv_context *ctx,
-						       uint32_t bulk_sz);
-int mlx5_devx_cmd_destroy(struct mlx5_devx_obj *obj);
-int mlx5_devx_cmd_flow_counter_query(struct mlx5_devx_obj *dcs,
-				     int clear, uint32_t n_counters,
-				     uint64_t *pkts, uint64_t *bytes,
-				     uint32_t mkey, void *addr,
-				     struct mlx5dv_devx_cmd_comp *cmd_comp,
-				     uint64_t async_id);
-int mlx5_devx_cmd_query_hca_attr(struct ibv_context *ctx,
-				 struct mlx5_hca_attr *attr);
-struct mlx5_devx_obj *mlx5_devx_cmd_mkey_create(struct ibv_context *ctx,
-					     struct mlx5_devx_mkey_attr *attr);
-int mlx5_devx_get_out_command_status(void *out);
-int mlx5_devx_cmd_qp_query_tis_td(struct ibv_qp *qp, uint32_t tis_num,
-				  uint32_t *tis_td);
-struct mlx5_devx_obj *mlx5_devx_cmd_create_rq(struct ibv_context *ctx,
-				struct mlx5_devx_create_rq_attr *rq_attr,
-				int socket);
-int mlx5_devx_cmd_modify_rq(struct mlx5_devx_obj *rq,
-			    struct mlx5_devx_modify_rq_attr *rq_attr);
-struct mlx5_devx_obj *mlx5_devx_cmd_create_tir(struct ibv_context *ctx,
-					struct mlx5_devx_tir_attr *tir_attr);
-struct mlx5_devx_obj *mlx5_devx_cmd_create_rqt(struct ibv_context *ctx,
-					struct mlx5_devx_rqt_attr *rqt_attr);
-struct mlx5_devx_obj *mlx5_devx_cmd_create_sq
-	(struct ibv_context *ctx, struct mlx5_devx_create_sq_attr *sq_attr);
-int mlx5_devx_cmd_modify_sq
-	(struct mlx5_devx_obj *sq, struct mlx5_devx_modify_sq_attr *sq_attr);
-struct mlx5_devx_obj *mlx5_devx_cmd_create_tis
-	(struct ibv_context *ctx, struct mlx5_devx_tis_attr *tis_attr);
-struct mlx5_devx_obj *mlx5_devx_cmd_create_td(struct ibv_context *ctx);
-
-int mlx5_devx_cmd_flow_dump(struct mlx5_ibv_shared *sh, FILE *file);
-
 /* mlx5_flow_meter.c */
 
 int mlx5_flow_meter_ops_get(struct rte_eth_dev *dev, void *arg);
diff --git a/drivers/net/mlx5/mlx5_devx_cmds.c b/drivers/net/mlx5/mlx5_devx_cmds.c
index 9985d30..1302919 100644
--- a/drivers/net/mlx5/mlx5_devx_cmds.c
+++ b/drivers/net/mlx5/mlx5_devx_cmds.c
@@ -1,13 +1,15 @@
 // SPDX-License-Identifier: BSD-3-Clause
 /* Copyright 2018 Mellanox Technologies, Ltd */
 
+#include <unistd.h>
+
 #include <rte_flow_driver.h>
 #include <rte_malloc.h>
-#include <unistd.h>
 
-#include "mlx5.h"
-#include "mlx5_glue.h"
 #include "mlx5_prm.h"
+#include "mlx5_devx_cmds.h"
+#include "mlx5_utils.h"
+
 
 /**
  * Allocate flow counters via devx interface.
@@ -934,8 +936,12 @@ struct mlx5_devx_obj *
 /**
  * Dump all flows to file.
  *
- * @param[in] sh
- *   Pointer to context.
+ * @param[in] fdb_domain
+ *   FDB domain.
+ * @param[in] rx_domain
+ *   RX domain.
+ * @param[in] tx_domain
+ *   TX domain.
  * @param[out] file
  *   Pointer to file stream.
  *
@@ -943,23 +949,24 @@ struct mlx5_devx_obj *
  *   0 on success, a nagative value otherwise.
  */
 int
-mlx5_devx_cmd_flow_dump(struct mlx5_ibv_shared *sh __rte_unused,
-			FILE *file __rte_unused)
+mlx5_devx_cmd_flow_dump(void *fdb_domain __rte_unused,
+			void *rx_domain __rte_unused,
+			void *tx_domain __rte_unused, FILE *file __rte_unused)
 {
 	int ret = 0;
 
 #ifdef HAVE_MLX5_DR_FLOW_DUMP
-	if (sh->fdb_domain) {
-		ret = mlx5_glue->dr_dump_domain(file, sh->fdb_domain);
+	if (fdb_domain) {
+		ret = mlx5_glue->dr_dump_domain(file, fdb_domain);
 		if (ret)
 			return ret;
 	}
-	assert(sh->rx_domain);
-	ret = mlx5_glue->dr_dump_domain(file, sh->rx_domain);
+	assert(rx_domain);
+	ret = mlx5_glue->dr_dump_domain(file, rx_domain);
 	if (ret)
 		return ret;
-	assert(sh->tx_domain);
-	ret = mlx5_glue->dr_dump_domain(file, sh->tx_domain);
+	assert(tx_domain);
+	ret = mlx5_glue->dr_dump_domain(file, tx_domain);
 #else
 	ret = ENOTSUP;
 #endif
diff --git a/drivers/net/mlx5/mlx5_devx_cmds.h b/drivers/net/mlx5/mlx5_devx_cmds.h
new file mode 100644
index 0000000..0c5afde
--- /dev/null
+++ b/drivers/net/mlx5/mlx5_devx_cmds.h
@@ -0,0 +1,227 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2019 Mellanox Technologies, Ltd
+ */
+
+#ifndef RTE_PMD_MLX5_DEVX_CMDS_H_
+#define RTE_PMD_MLX5_DEVX_CMDS_H_
+
+#include "mlx5_glue.h"
+
+/* devX creation object */
+struct mlx5_devx_obj {
+	struct mlx5dv_devx_obj *obj; /* The DV object. */
+	int id; /* The object ID. */
+};
+
+struct mlx5_devx_mkey_attr {
+	uint64_t addr;
+	uint64_t size;
+	uint32_t umem_id;
+	uint32_t pd;
+};
+
+/* HCA qos attributes. */
+struct mlx5_hca_qos_attr {
+	uint32_t sup:1;	/* Whether QOS is supported. */
+	uint32_t srtcm_sup:1; /* Whether srTCM mode is supported. */
+	uint8_t log_max_flow_meter;
+	/* Power of the maximum supported meters. */
+	uint8_t flow_meter_reg_c_ids;
+	/* Bitmap of the reg_Cs available for flow meter to use. */
+
+};
+
+/* HCA supports this number of time periods for LRO. */
+#define MLX5_LRO_NUM_SUPP_PERIODS 4
+
+struct mlx5_hca_attr {
+	uint32_t eswitch_manager:1;
+	uint32_t flow_counters_dump:1;
+	uint8_t flow_counter_bulk_alloc_bitmap;
+	uint32_t eth_net_offloads:1;
+	uint32_t eth_virt:1;
+	uint32_t wqe_vlan_insert:1;
+	uint32_t wqe_inline_mode:2;
+	uint32_t vport_inline_mode:3;
+	uint32_t tunnel_stateless_geneve_rx:1;
+	uint32_t geneve_max_opt_len:1; /* 0x0: 14DW, 0x1: 63DW */
+	uint32_t tunnel_stateless_gtp:1;
+	uint32_t lro_cap:1;
+	uint32_t tunnel_lro_gre:1;
+	uint32_t tunnel_lro_vxlan:1;
+	uint32_t lro_max_msg_sz_mode:2;
+	uint32_t lro_timer_supported_periods[MLX5_LRO_NUM_SUPP_PERIODS];
+	uint32_t flex_parser_protocols;
+	uint32_t hairpin:1;
+	uint32_t log_max_hairpin_queues:5;
+	uint32_t log_max_hairpin_wq_data_sz:5;
+	uint32_t log_max_hairpin_num_packets:5;
+	uint32_t vhca_id:16;
+	struct mlx5_hca_qos_attr qos;
+};
+
+struct mlx5_devx_wq_attr {
+	uint32_t wq_type:4;
+	uint32_t wq_signature:1;
+	uint32_t end_padding_mode:2;
+	uint32_t cd_slave:1;
+	uint32_t hds_skip_first_sge:1;
+	uint32_t log2_hds_buf_size:3;
+	uint32_t page_offset:5;
+	uint32_t lwm:16;
+	uint32_t pd:24;
+	uint32_t uar_page:24;
+	uint64_t dbr_addr;
+	uint32_t hw_counter;
+	uint32_t sw_counter;
+	uint32_t log_wq_stride:4;
+	uint32_t log_wq_pg_sz:5;
+	uint32_t log_wq_sz:5;
+	uint32_t dbr_umem_valid:1;
+	uint32_t wq_umem_valid:1;
+	uint32_t log_hairpin_num_packets:5;
+	uint32_t log_hairpin_data_sz:5;
+	uint32_t single_wqe_log_num_of_strides:4;
+	uint32_t two_byte_shift_en:1;
+	uint32_t single_stride_log_num_of_bytes:3;
+	uint32_t dbr_umem_id;
+	uint32_t wq_umem_id;
+	uint64_t wq_umem_offset;
+};
+
+/* Create RQ attributes structure, used by create RQ operation. */
+struct mlx5_devx_create_rq_attr {
+	uint32_t rlky:1;
+	uint32_t delay_drop_en:1;
+	uint32_t scatter_fcs:1;
+	uint32_t vsd:1;
+	uint32_t mem_rq_type:4;
+	uint32_t state:4;
+	uint32_t flush_in_error_en:1;
+	uint32_t hairpin:1;
+	uint32_t user_index:24;
+	uint32_t cqn:24;
+	uint32_t counter_set_id:8;
+	uint32_t rmpn:24;
+	struct mlx5_devx_wq_attr wq_attr;
+};
+
+/* Modify RQ attributes structure, used by modify RQ operation. */
+struct mlx5_devx_modify_rq_attr {
+	uint32_t rqn:24;
+	uint32_t rq_state:4; /* Current RQ state. */
+	uint32_t state:4; /* Required RQ state. */
+	uint32_t scatter_fcs:1;
+	uint32_t vsd:1;
+	uint32_t counter_set_id:8;
+	uint32_t hairpin_peer_sq:24;
+	uint32_t hairpin_peer_vhca:16;
+	uint64_t modify_bitmask;
+	uint32_t lwm:16; /* Contained WQ lwm. */
+};
+
+struct mlx5_rx_hash_field_select {
+	uint32_t l3_prot_type:1;
+	uint32_t l4_prot_type:1;
+	uint32_t selected_fields:30;
+};
+
+/* TIR attributes structure, used by TIR operations. */
+struct mlx5_devx_tir_attr {
+	uint32_t disp_type:4;
+	uint32_t lro_timeout_period_usecs:16;
+	uint32_t lro_enable_mask:4;
+	uint32_t lro_max_msg_sz:8;
+	uint32_t inline_rqn:24;
+	uint32_t rx_hash_symmetric:1;
+	uint32_t tunneled_offload_en:1;
+	uint32_t indirect_table:24;
+	uint32_t rx_hash_fn:4;
+	uint32_t self_lb_block:2;
+	uint32_t transport_domain:24;
+	uint32_t rx_hash_toeplitz_key[10];
+	struct mlx5_rx_hash_field_select rx_hash_field_selector_outer;
+	struct mlx5_rx_hash_field_select rx_hash_field_selector_inner;
+};
+
+/* RQT attributes structure, used by RQT operations. */
+struct mlx5_devx_rqt_attr {
+	uint32_t rqt_max_size:16;
+	uint32_t rqt_actual_size:16;
+	uint32_t rq_list[];
+};
+
+/* TIS attributes structure. */
+struct mlx5_devx_tis_attr {
+	uint32_t strict_lag_tx_port_affinity:1;
+	uint32_t tls_en:1;
+	uint32_t lag_tx_port_affinity:4;
+	uint32_t prio:4;
+	uint32_t transport_domain:24;
+};
+
+/* SQ attributes structure, used by SQ create operation. */
+struct mlx5_devx_create_sq_attr {
+	uint32_t rlky:1;
+	uint32_t cd_master:1;
+	uint32_t fre:1;
+	uint32_t flush_in_error_en:1;
+	uint32_t allow_multi_pkt_send_wqe:1;
+	uint32_t min_wqe_inline_mode:3;
+	uint32_t state:4;
+	uint32_t reg_umr:1;
+	uint32_t allow_swp:1;
+	uint32_t hairpin:1;
+	uint32_t user_index:24;
+	uint32_t cqn:24;
+	uint32_t packet_pacing_rate_limit_index:16;
+	uint32_t tis_lst_sz:16;
+	uint32_t tis_num:24;
+	struct mlx5_devx_wq_attr wq_attr;
+};
+
+/* SQ attributes structure, used by SQ modify operation. */
+struct mlx5_devx_modify_sq_attr {
+	uint32_t sq_state:4;
+	uint32_t state:4;
+	uint32_t hairpin_peer_rq:24;
+	uint32_t hairpin_peer_vhca:16;
+};
+
+/* mlx5_devx_cmds.c */
+
+struct mlx5_devx_obj *mlx5_devx_cmd_flow_counter_alloc(struct ibv_context *ctx,
+						       uint32_t bulk_sz);
+int mlx5_devx_cmd_destroy(struct mlx5_devx_obj *obj);
+int mlx5_devx_cmd_flow_counter_query(struct mlx5_devx_obj *dcs,
+				     int clear, uint32_t n_counters,
+				     uint64_t *pkts, uint64_t *bytes,
+				     uint32_t mkey, void *addr,
+				     struct mlx5dv_devx_cmd_comp *cmd_comp,
+				     uint64_t async_id);
+int mlx5_devx_cmd_query_hca_attr(struct ibv_context *ctx,
+				 struct mlx5_hca_attr *attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_mkey_create(struct ibv_context *ctx,
+					      struct mlx5_devx_mkey_attr *attr);
+int mlx5_devx_get_out_command_status(void *out);
+int mlx5_devx_cmd_qp_query_tis_td(struct ibv_qp *qp, uint32_t tis_num,
+				  uint32_t *tis_td);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_rq(struct ibv_context *ctx,
+				       struct mlx5_devx_create_rq_attr *rq_attr,
+				       int socket);
+int mlx5_devx_cmd_modify_rq(struct mlx5_devx_obj *rq,
+			    struct mlx5_devx_modify_rq_attr *rq_attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_tir(struct ibv_context *ctx,
+					   struct mlx5_devx_tir_attr *tir_attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_rqt(struct ibv_context *ctx,
+					   struct mlx5_devx_rqt_attr *rqt_attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_sq(struct ibv_context *ctx,
+				      struct mlx5_devx_create_sq_attr *sq_attr);
+int mlx5_devx_cmd_modify_sq(struct mlx5_devx_obj *sq,
+			    struct mlx5_devx_modify_sq_attr *sq_attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_tis(struct ibv_context *ctx,
+					   struct mlx5_devx_tis_attr *tis_attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_td(struct ibv_context *ctx);
+int mlx5_devx_cmd_flow_dump(void *fdb_domain, void *rx_domain, void *tx_domain,
+			    FILE *file);
+#endif /* RTE_PMD_MLX5_DEVX_CMDS_H_ */
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 3b4c5db..ce0109c 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -38,6 +38,7 @@
 
 #include "mlx5.h"
 #include "mlx5_glue.h"
+#include "mlx5_devx_cmds.h"
 #include "mlx5_rxtx.h"
 #include "mlx5_utils.h"
 
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 970123b..34f3a53 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -31,6 +31,7 @@
 #include "mlx5_defs.h"
 #include "mlx5_flow.h"
 #include "mlx5_glue.h"
+#include "mlx5_devx_cmds.h"
 #include "mlx5_prm.h"
 #include "mlx5_rxtx.h"
 
@@ -5704,6 +5705,8 @@ struct mlx5_flow_counter *
 		   struct rte_flow_error *error __rte_unused)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_ibv_shared *sh = priv->sh;
 
-	return mlx5_devx_cmd_flow_dump(priv->sh, file);
+	return mlx5_devx_cmd_flow_dump(sh->fdb_domain, sh->rx_domain,
+				       sh->tx_domain, file);
 }
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 5a1b426..d70dd4f 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -32,6 +32,7 @@
 #include "mlx5.h"
 #include "mlx5_defs.h"
 #include "mlx5_glue.h"
+#include "mlx5_devx_cmds.h"
 #include "mlx5_flow.h"
 #include "mlx5_prm.h"
 #include "mlx5_rxtx.h"
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index c936a7f..89168cd 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -37,6 +37,7 @@
 #include "mlx5_defs.h"
 #include "mlx5_glue.h"
 #include "mlx5_flow.h"
+#include "mlx5_devx_cmds.h"
 
 /* Default RSS hash key also used for ConnectX-3. */
 uint8_t rss_hash_default_key[] = {
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 67cafd1..2eede1b 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -29,6 +29,7 @@
 #include <rte_flow.h>
 
 #include "mlx5.h"
+#include "mlx5_devx_cmds.h"
 #include "mlx5_utils.h"
 #include "mlx5_rxtx.h"
 #include "mlx5_autoconf.h"
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 1a76f6e..5adb4dc 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -34,6 +34,7 @@
 #include "mlx5_rxtx.h"
 #include "mlx5_autoconf.h"
 #include "mlx5_glue.h"
+#include "mlx5_devx_cmds.h"
 
 /**
  * Allocate TX queue elements.
diff --git a/drivers/net/mlx5/mlx5_vlan.c b/drivers/net/mlx5/mlx5_vlan.c
index 5f6554a..feac0f1 100644
--- a/drivers/net/mlx5/mlx5_vlan.c
+++ b/drivers/net/mlx5/mlx5_vlan.c
@@ -30,6 +30,7 @@
 #include "mlx5.h"
 #include "mlx5_autoconf.h"
 #include "mlx5_glue.h"
+#include "mlx5_devx_cmds.h"
 #include "mlx5_rxtx.h"
 #include "mlx5_utils.h"
 
-- 
1.8.3.1
* [dpdk-dev] [PATCH v1 02/38] mlx5: prepare common library
From: Matan Azrad @ 2020-01-20 17:02 UTC
  To: dev; +Cc: Maxime Coquelin, Thomas Monjalon
A new Mellanox vDPA PMD will be added to support vDPA operations on
Mellanox adapters.
The design of this vDPA PMD includes mlx5_glue and mlx5_devx operations,
a large part of which is shared with the net/mlx5 PMD.
Create a new common library in drivers/common for mlx5 PMDs.
Move mlx5_glue, mlx5_devx_cmds and their dependencies to the new mlx5
common library in drivers/common.
The files mlx5_devx_cmds.c, mlx5_devx_cmds.h, mlx5_glue.c,
mlx5_glue.h and mlx5_prm.h are moved as is from drivers/net/mlx5 to
drivers/common/mlx5.
Share the log mechanism macros.
Also separate the log mechanism to allow independent log level control
for the common library.
Build files and version files are adjusted accordingly.
Include lines are adjusted accordingly.
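
The idea of sharing the log macros while keeping per-component log levels
can be sketched in plain C as follows. This is only an illustration under
stated assumptions: demo_logtype, demo_log(), DEMO_LOG, COMMON_LOG, NET_LOG
and the "common_mlx5"/"net_mlx5" labels are hypothetical names, and the
sketch does not use the EAL log API that the real library relies on.

```c
#include <stdarg.h>
#include <stdio.h>

enum demo_level { DEMO_ERR = 1, DEMO_INFO = 2, DEMO_DEBUG = 3 };

/* One log type per component, each with its own threshold. */
struct demo_logtype {
	const char *name;
	enum demo_level level;
};

static struct demo_logtype demo_common_log = { "common_mlx5", DEMO_INFO };
static struct demo_logtype demo_net_log = { "net_mlx5", DEMO_ERR };

/* Print the message if its level passes the threshold of the log type.
 * Returns the number of characters written, or 0 when filtered out.
 */
static int
demo_log(const struct demo_logtype *type, enum demo_level level,
	 const char *fmt, ...)
{
	va_list ap;
	int n;

	if (level > type->level)
		return 0;
	n = printf("%s: ", type->name);
	va_start(ap, fmt);
	n += vprintf(fmt, ap);
	va_end(ap);
	n += printf("\n");
	return n;
}

/* Shared macro body; each component binds it to its own log type. */
#define DEMO_LOG(type, level, ...) \
	demo_log(&(type), DEMO_##level, __VA_ARGS__)
#define COMMON_LOG(level, ...) DEMO_LOG(demo_common_log, level, __VA_ARGS__)
#define NET_LOG(level, ...) DEMO_LOG(demo_net_log, level, __VA_ARGS__)
```

Here COMMON_LOG(INFO, "probe %d", 1) prints because the common threshold
is DEMO_INFO, while the same call through NET_LOG is filtered until
demo_net_log.level is raised, so each component is tuned independently.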
Signed-off-by: Matan Azrad <matan@mellanox.com>
---
 MAINTAINERS                                     |    1 +
 drivers/common/Makefile                         |    4 +
 drivers/common/meson.build                      |    2 +-
 drivers/common/mlx5/Makefile                    |  331 ++++
 drivers/common/mlx5/meson.build                 |  205 +++
 drivers/common/mlx5/mlx5_common.c               |   17 +
 drivers/common/mlx5/mlx5_common.h               |   87 ++
 drivers/common/mlx5/mlx5_common_utils.h         |   20 +
 drivers/common/mlx5/mlx5_devx_cmds.c            |  974 ++++++++++++
 drivers/common/mlx5/mlx5_devx_cmds.h            |  227 +++
 drivers/common/mlx5/mlx5_glue.c                 | 1138 ++++++++++++++
 drivers/common/mlx5/mlx5_glue.h                 |  265 ++++
 drivers/common/mlx5/mlx5_prm.h                  | 1884 +++++++++++++++++++++++
 drivers/common/mlx5/rte_common_mlx5_version.map |   20 +
 drivers/net/mlx5/Makefile                       |  303 +---
 drivers/net/mlx5/meson.build                    |  256 +--
 drivers/net/mlx5/mlx5.c                         |    7 +-
 drivers/net/mlx5/mlx5.h                         |    9 +-
 drivers/net/mlx5/mlx5_devx_cmds.c               |  974 ------------
 drivers/net/mlx5/mlx5_devx_cmds.h               |  227 ---
 drivers/net/mlx5/mlx5_ethdev.c                  |    5 +-
 drivers/net/mlx5/mlx5_flow.c                    |    9 +-
 drivers/net/mlx5/mlx5_flow.h                    |    3 +-
 drivers/net/mlx5/mlx5_flow_dv.c                 |    9 +-
 drivers/net/mlx5/mlx5_flow_meter.c              |    2 +
 drivers/net/mlx5/mlx5_flow_verbs.c              |    7 +-
 drivers/net/mlx5/mlx5_glue.c                    | 1150 --------------
 drivers/net/mlx5/mlx5_glue.h                    |  264 ----
 drivers/net/mlx5/mlx5_mac.c                     |    2 +-
 drivers/net/mlx5/mlx5_mr.c                      |    3 +-
 drivers/net/mlx5/mlx5_prm.h                     | 1883 ----------------------
                      |    2 +-
 drivers/net/mlx5/mlx5_rxq.c                     |    8 +-
 drivers/net/mlx5/mlx5_rxtx.c                    |    7 +-
 drivers/net/mlx5/mlx5_rxtx.h                    |    7 +-
 drivers/net/mlx5/mlx5_rxtx_vec.c                |    5 +-
 drivers/net/mlx5/mlx5_rxtx_vec.h                |    3 +-
 drivers/net/mlx5/mlx5_rxtx_vec_altivec.h        |    5 +-
 drivers/net/mlx5/mlx5_rxtx_vec_neon.h           |    5 +-
 drivers/net/mlx5/mlx5_rxtx_vec_sse.h            |    5 +-
 drivers/net/mlx5/mlx5_stats.c                   |    2 +-
 drivers/net/mlx5/mlx5_txq.c                     |    7 +-
 drivers/net/mlx5/mlx5_utils.h                   |   79 +-
 drivers/net/mlx5/mlx5_vlan.c                    |    5 +-
 mk/rte.app.mk                                   |    1 +
 45 files changed, 5296 insertions(+), 5133 deletions(-)
 create mode 100644 drivers/common/mlx5/Makefile
 create mode 100644 drivers/common/mlx5/meson.build
 create mode 100644 drivers/common/mlx5/mlx5_common.c
 create mode 100644 drivers/common/mlx5/mlx5_common.h
 create mode 100644 drivers/common/mlx5/mlx5_common_utils.h
 create mode 100644 drivers/common/mlx5/mlx5_devx_cmds.c
 create mode 100644 drivers/common/mlx5/mlx5_devx_cmds.h
 create mode 100644 drivers/common/mlx5/mlx5_glue.c
 create mode 100644 drivers/common/mlx5/mlx5_glue.h
 create mode 100644 drivers/common/mlx5/mlx5_prm.h
 create mode 100644 drivers/common/mlx5/rte_common_mlx5_version.map
 delete mode 100644 drivers/net/mlx5/mlx5_devx_cmds.c
 delete mode 100644 drivers/net/mlx5/mlx5_devx_cmds.h
 delete mode 100644 drivers/net/mlx5/mlx5_glue.c
 delete mode 100644 drivers/net/mlx5/mlx5_glue.h
 delete mode 100644 drivers/net/mlx5/mlx5_prm.h
diff --git a/MAINTAINERS b/MAINTAINERS
index 8cd037c..4b0d524 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -736,6 +736,7 @@ M: Matan Azrad <matan@mellanox.com>
 M: Shahaf Shuler <shahafs@mellanox.com>
 M: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
 T: git://dpdk.org/next/dpdk-next-net-mlx
+F: drivers/common/mlx5/
 F: drivers/net/mlx5/
 F: buildtools/options-ibverbs-static.sh
 F: doc/guides/nics/mlx5.rst
diff --git a/drivers/common/Makefile b/drivers/common/Makefile
index 3254c52..4775d4b 100644
--- a/drivers/common/Makefile
+++ b/drivers/common/Makefile
@@ -35,4 +35,8 @@ ifneq (,$(findstring y,$(IAVF-y)))
 DIRS-y += iavf
 endif
 
+ifeq ($(CONFIG_RTE_LIBRTE_MLX5_PMD),y)
+DIRS-y += mlx5
+endif
+
 include $(RTE_SDK)/mk/rte.subdir.mk
diff --git a/drivers/common/meson.build b/drivers/common/meson.build
index fc620f7..ffd06e2 100644
--- a/drivers/common/meson.build
+++ b/drivers/common/meson.build
@@ -2,6 +2,6 @@
 # Copyright(c) 2018 Cavium, Inc
 
 std_deps = ['eal']
-drivers = ['cpt', 'dpaax', 'iavf', 'mvep', 'octeontx', 'octeontx2', 'qat']
+drivers = ['cpt', 'dpaax', 'iavf', 'mlx5', 'mvep', 'octeontx', 'octeontx2', 'qat']
 config_flag_fmt = 'RTE_LIBRTE_@0@_COMMON'
 driver_name_fmt = 'rte_common_@0@'
diff --git a/drivers/common/mlx5/Makefile b/drivers/common/mlx5/Makefile
new file mode 100644
index 0000000..b94d3c0
--- /dev/null
+++ b/drivers/common/mlx5/Makefile
@@ -0,0 +1,331 @@
+#   SPDX-License-Identifier: BSD-3-Clause
+#   Copyright 2019 Mellanox Technologies, Ltd
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# Library name.
+LIB = librte_common_mlx5.a
+LIB_GLUE = $(LIB_GLUE_BASE).$(LIB_GLUE_VERSION)
+LIB_GLUE_BASE = librte_pmd_mlx5_glue.so
+LIB_GLUE_VERSION = 20.02.0
+
+# Sources.
+ifneq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
+SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_glue.c
+endif
+SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_devx_cmds.c
+SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_common.c
+
+ifeq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
+INSTALL-$(CONFIG_RTE_LIBRTE_MLX5_PMD)-lib += $(LIB_GLUE)
+endif
+
+# Basic CFLAGS.
+CFLAGS += -O3
+CFLAGS += -std=c11 -Wall -Wextra
+CFLAGS += -g
+CFLAGS += -I.
+CFLAGS += -D_BSD_SOURCE
+CFLAGS += -D_DEFAULT_SOURCE
+CFLAGS += -D_XOPEN_SOURCE=600
+CFLAGS += $(WERROR_FLAGS)
+CFLAGS += -Wno-strict-prototypes
+ifeq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
+CFLAGS += -DMLX5_GLUE='"$(LIB_GLUE)"'
+CFLAGS += -DMLX5_GLUE_VERSION='"$(LIB_GLUE_VERSION)"'
+CFLAGS_mlx5_glue.o += -fPIC
+LDLIBS += -ldl
+else ifeq ($(CONFIG_RTE_IBVERBS_LINK_STATIC),y)
+LDLIBS += $(shell $(RTE_SDK)/buildtools/options-ibverbs-static.sh)
+else
+LDLIBS += -libverbs -lmlx5
+endif
+
+LDLIBS += -lrte_eal
+
+# A few warnings cannot be avoided in external headers.
+CFLAGS += -Wno-error=cast-qual -DNDEBUG -UPEDANTIC
+
+EXPORT_MAP := rte_common_mlx5_version.map
+
+include $(RTE_SDK)/mk/rte.lib.mk
+
+# Generate and clean-up mlx5_autoconf.h.
+
+export CC CFLAGS CPPFLAGS EXTRA_CFLAGS EXTRA_CPPFLAGS
+export AUTO_CONFIG_CFLAGS = -Wno-error
+
+ifndef V
+AUTOCONF_OUTPUT := >/dev/null
+endif
+
+mlx5_autoconf.h.new: FORCE
+
+mlx5_autoconf.h.new: $(RTE_SDK)/buildtools/auto-config-h.sh
+	$Q $(RM) -f -- '$@'
+	$Q sh -- '$<' '$@' \
+		HAVE_IBV_DEVICE_STRIDING_RQ_SUPPORT \
+		infiniband/mlx5dv.h \
+		enum MLX5DV_CQE_RES_FORMAT_CSUM_STRIDX \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IBV_DEVICE_TUNNEL_SUPPORT \
+		infiniband/mlx5dv.h \
+		enum MLX5DV_CONTEXT_MASK_TUNNEL_OFFLOADS \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IBV_DEVICE_MPLS_SUPPORT \
+		infiniband/verbs.h \
+		enum IBV_FLOW_SPEC_MPLS \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IBV_WQ_FLAGS_PCI_WRITE_END_PADDING \
+		infiniband/verbs.h \
+		enum IBV_WQ_FLAGS_PCI_WRITE_END_PADDING \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IBV_WQ_FLAG_RX_END_PADDING \
+		infiniband/verbs.h \
+		enum IBV_WQ_FLAG_RX_END_PADDING \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IBV_MLX5_MOD_SWP \
+		infiniband/mlx5dv.h \
+		type 'struct mlx5dv_sw_parsing_caps' \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IBV_MLX5_MOD_MPW \
+		infiniband/mlx5dv.h \
+		enum MLX5DV_CONTEXT_FLAGS_MPW_ALLOWED \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IBV_MLX5_MOD_CQE_128B_COMP \
+		infiniband/mlx5dv.h \
+		enum MLX5DV_CONTEXT_FLAGS_CQE_128B_COMP \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IBV_MLX5_MOD_CQE_128B_PAD \
+		infiniband/mlx5dv.h \
+		enum MLX5DV_CQ_INIT_ATTR_FLAGS_CQE_PAD \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IBV_FLOW_DV_SUPPORT \
+		infiniband/mlx5dv.h \
+		func mlx5dv_create_flow_action_packet_reformat \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_MLX5DV_DR \
+		infiniband/mlx5dv.h \
+		enum MLX5DV_DR_DOMAIN_TYPE_NIC_RX \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_MLX5DV_DR_ESWITCH \
+		infiniband/mlx5dv.h \
+		enum MLX5DV_DR_DOMAIN_TYPE_FDB \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_MLX5DV_DR_VLAN \
+		infiniband/mlx5dv.h \
+		func mlx5dv_dr_action_create_push_vlan \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_MLX5DV_DR_DEVX_PORT \
+		infiniband/mlx5dv.h \
+		func mlx5dv_query_devx_port \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IBV_DEVX_OBJ \
+		infiniband/mlx5dv.h \
+		func mlx5dv_devx_obj_create \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IBV_FLOW_DEVX_COUNTERS \
+		infiniband/mlx5dv.h \
+		enum MLX5DV_FLOW_ACTION_COUNTERS_DEVX \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IBV_DEVX_ASYNC \
+		infiniband/mlx5dv.h \
+		func mlx5dv_devx_obj_query_async \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_MLX5DV_DR_ACTION_DEST_DEVX_TIR \
+		infiniband/mlx5dv.h \
+		func mlx5dv_dr_action_create_dest_devx_tir \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_MLX5_DR_CREATE_ACTION_FLOW_METER \
+		infiniband/mlx5dv.h \
+		func mlx5dv_dr_action_create_flow_meter \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_MLX5_DR_FLOW_DUMP \
+		infiniband/mlx5dv.h \
+		func mlx5dv_dump_dr_domain \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_MLX5DV_MMAP_GET_NC_PAGES_CMD \
+		infiniband/mlx5dv.h \
+		enum MLX5_MMAP_GET_NC_PAGES_CMD \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_ETHTOOL_LINK_MODE_25G \
+		/usr/include/linux/ethtool.h \
+		enum ETHTOOL_LINK_MODE_25000baseCR_Full_BIT \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_ETHTOOL_LINK_MODE_50G \
+		/usr/include/linux/ethtool.h \
+		enum ETHTOOL_LINK_MODE_50000baseCR2_Full_BIT \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_ETHTOOL_LINK_MODE_100G \
+		/usr/include/linux/ethtool.h \
+		enum ETHTOOL_LINK_MODE_100000baseKR4_Full_BIT \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IBV_DEVICE_COUNTERS_SET_V42 \
+		infiniband/verbs.h \
+		type 'struct ibv_counter_set_init_attr' \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IBV_DEVICE_COUNTERS_SET_V45 \
+		infiniband/verbs.h \
+		type 'struct ibv_counters_init_attr' \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_RDMA_NL_NLDEV \
+		rdma/rdma_netlink.h \
+		enum RDMA_NL_NLDEV \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_RDMA_NLDEV_CMD_GET \
+		rdma/rdma_netlink.h \
+		enum RDMA_NLDEV_CMD_GET \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_RDMA_NLDEV_CMD_PORT_GET \
+		rdma/rdma_netlink.h \
+		enum RDMA_NLDEV_CMD_PORT_GET \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_RDMA_NLDEV_ATTR_DEV_INDEX \
+		rdma/rdma_netlink.h \
+		enum RDMA_NLDEV_ATTR_DEV_INDEX \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_RDMA_NLDEV_ATTR_DEV_NAME \
+		rdma/rdma_netlink.h \
+		enum RDMA_NLDEV_ATTR_DEV_NAME \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_RDMA_NLDEV_ATTR_PORT_INDEX \
+		rdma/rdma_netlink.h \
+		enum RDMA_NLDEV_ATTR_PORT_INDEX \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_RDMA_NLDEV_ATTR_NDEV_INDEX \
+		rdma/rdma_netlink.h \
+		enum RDMA_NLDEV_ATTR_NDEV_INDEX \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IFLA_NUM_VF \
+		linux/if_link.h \
+		enum IFLA_NUM_VF \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IFLA_EXT_MASK \
+		linux/if_link.h \
+		enum IFLA_EXT_MASK \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IFLA_PHYS_SWITCH_ID \
+		linux/if_link.h \
+		enum IFLA_PHYS_SWITCH_ID \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IFLA_PHYS_PORT_NAME \
+		linux/if_link.h \
+		enum IFLA_PHYS_PORT_NAME \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_SUPPORTED_40000baseKR4_Full \
+		/usr/include/linux/ethtool.h \
+		define SUPPORTED_40000baseKR4_Full \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_SUPPORTED_40000baseCR4_Full \
+		/usr/include/linux/ethtool.h \
+		define SUPPORTED_40000baseCR4_Full \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_SUPPORTED_40000baseSR4_Full \
+		/usr/include/linux/ethtool.h \
+		define SUPPORTED_40000baseSR4_Full \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_SUPPORTED_40000baseLR4_Full \
+		/usr/include/linux/ethtool.h \
+		define SUPPORTED_40000baseLR4_Full \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_SUPPORTED_56000baseKR4_Full \
+		/usr/include/linux/ethtool.h \
+		define SUPPORTED_56000baseKR4_Full \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_SUPPORTED_56000baseCR4_Full \
+		/usr/include/linux/ethtool.h \
+		define SUPPORTED_56000baseCR4_Full \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_SUPPORTED_56000baseSR4_Full \
+		/usr/include/linux/ethtool.h \
+		define SUPPORTED_56000baseSR4_Full \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_SUPPORTED_56000baseLR4_Full \
+		/usr/include/linux/ethtool.h \
+		define SUPPORTED_56000baseLR4_Full \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_STATIC_ASSERT \
+		/usr/include/assert.h \
+		define static_assert \
+		$(AUTOCONF_OUTPUT)
+
+# Create mlx5_autoconf.h or update it in case it differs from the new one.
+
+mlx5_autoconf.h: mlx5_autoconf.h.new
+	$Q [ -f '$@' ] && \
+		cmp '$<' '$@' $(AUTOCONF_OUTPUT) || \
+		mv '$<' '$@'
+
+$(SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD):.c=.o): mlx5_autoconf.h
+
+# Generate dependency plug-in for rdma-core when the PMD must not be linked
+# directly, so that applications do not inherit this dependency.
+
+ifeq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
+
+$(LIB): $(LIB_GLUE)
+
+ifeq ($(LINK_USING_CC),1)
+GLUE_LDFLAGS := $(call linkerprefix,$(LDFLAGS))
+else
+GLUE_LDFLAGS := $(LDFLAGS)
+endif
+$(LIB_GLUE): mlx5_glue.o
+	$Q $(LD) $(GLUE_LDFLAGS) $(EXTRA_LDFLAGS) \
+		-Wl,-h,$(LIB_GLUE) \
+		-shared -o $@ $< -libverbs -lmlx5
+
+mlx5_glue.o: mlx5_autoconf.h
+
+endif
+
+clean_mlx5: FORCE
+	$Q rm -f -- mlx5_autoconf.h mlx5_autoconf.h.new
+	$Q rm -f -- mlx5_glue.o $(LIB_GLUE_BASE)*
+
+clean: clean_mlx5
diff --git a/drivers/common/mlx5/meson.build b/drivers/common/mlx5/meson.build
new file mode 100644
index 0000000..d2eeb45
--- /dev/null
+++ b/drivers/common/mlx5/meson.build
@@ -0,0 +1,205 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2019 Mellanox Technologies, Ltd
+
+if not is_linux
+	build = false
+	reason = 'only supported on Linux'
+	subdir_done()
+endif
+build = true
+
+pmd_dlopen = (get_option('ibverbs_link') == 'dlopen')
+LIB_GLUE_BASE = 'librte_pmd_mlx5_glue.so'
+LIB_GLUE_VERSION = '20.02.0'
+LIB_GLUE = LIB_GLUE_BASE + '.' + LIB_GLUE_VERSION
+if pmd_dlopen
+	dpdk_conf.set('RTE_IBVERBS_LINK_DLOPEN', 1)
+	cflags += [
+		'-DMLX5_GLUE="@0@"'.format(LIB_GLUE),
+		'-DMLX5_GLUE_VERSION="@0@"'.format(LIB_GLUE_VERSION),
+	]
+endif
+
+libnames = [ 'mlx5', 'ibverbs' ]
+libs = []
+foreach libname:libnames
+	lib = dependency('lib' + libname, required:false)
+	if not lib.found()
+		lib = cc.find_library(libname, required:false)
+	endif
+	if lib.found()
+		libs += [ lib ]
+	else
+		build = false
+		reason = 'missing dependency, "' + libname + '"'
+	endif
+endforeach
+
+if build
+	allow_experimental_apis = true
+	deps += ['hash', 'pci', 'net', 'eal']
+	ext_deps += libs
+	sources = files(
+		'mlx5_devx_cmds.c',
+		'mlx5_common.c',
+	)
+	if not pmd_dlopen
+		sources += files('mlx5_glue.c')
+	endif
+	cflags_options = [
+		'-std=c11',
+		'-Wno-strict-prototypes',
+		'-D_BSD_SOURCE',
+		'-D_DEFAULT_SOURCE',
+		'-D_XOPEN_SOURCE=600'
+	]
+	foreach option:cflags_options
+		if cc.has_argument(option)
+			cflags += option
+		endif
+	endforeach
+	if get_option('buildtype').contains('debug')
+		cflags += [ '-pedantic', '-UNDEBUG', '-DPEDANTIC' ]
+	else
+		cflags += [ '-DNDEBUG', '-UPEDANTIC' ]
+	endif
+	# To maintain the compatibility with the make build system
+	# mlx5_autoconf.h file is still generated.
+	# input array for meson member search:
+	# [ "MACRO to define if found", "header for the search",
+	#   "symbol to search", "struct member to search" ]
+	has_member_args = [
+		[ 'HAVE_IBV_MLX5_MOD_SWP', 'infiniband/mlx5dv.h',
+		'struct mlx5dv_sw_parsing_caps', 'sw_parsing_offloads' ],
+		[ 'HAVE_IBV_DEVICE_COUNTERS_SET_V42', 'infiniband/verbs.h',
+		'struct ibv_counter_set_init_attr', 'counter_set_id' ],
+		[ 'HAVE_IBV_DEVICE_COUNTERS_SET_V45', 'infiniband/verbs.h',
+		'struct ibv_counters_init_attr', 'comp_mask' ],
+	]
+	# input array for meson symbol search:
+	# [ "MACRO to define if found", "header for the search",
+	#   "symbol to search" ]
+	has_sym_args = [
+		[ 'HAVE_IBV_DEVICE_STRIDING_RQ_SUPPORT', 'infiniband/mlx5dv.h',
+		'MLX5DV_CQE_RES_FORMAT_CSUM_STRIDX' ],
+		[ 'HAVE_IBV_DEVICE_TUNNEL_SUPPORT', 'infiniband/mlx5dv.h',
+		'MLX5DV_CONTEXT_MASK_TUNNEL_OFFLOADS' ],
+		[ 'HAVE_IBV_MLX5_MOD_MPW', 'infiniband/mlx5dv.h',
+		'MLX5DV_CONTEXT_FLAGS_MPW_ALLOWED' ],
+		[ 'HAVE_IBV_MLX5_MOD_CQE_128B_COMP', 'infiniband/mlx5dv.h',
+		'MLX5DV_CONTEXT_FLAGS_CQE_128B_COMP' ],
+		[ 'HAVE_IBV_MLX5_MOD_CQE_128B_PAD', 'infiniband/mlx5dv.h',
+		'MLX5DV_CQ_INIT_ATTR_FLAGS_CQE_PAD' ],
+		[ 'HAVE_IBV_FLOW_DV_SUPPORT', 'infiniband/mlx5dv.h',
+		'mlx5dv_create_flow_action_packet_reformat' ],
+		[ 'HAVE_IBV_DEVICE_MPLS_SUPPORT', 'infiniband/verbs.h',
+		'IBV_FLOW_SPEC_MPLS' ],
+		[ 'HAVE_IBV_WQ_FLAGS_PCI_WRITE_END_PADDING', 'infiniband/verbs.h',
+		'IBV_WQ_FLAGS_PCI_WRITE_END_PADDING' ],
+		[ 'HAVE_IBV_WQ_FLAG_RX_END_PADDING', 'infiniband/verbs.h',
+		'IBV_WQ_FLAG_RX_END_PADDING' ],
+		[ 'HAVE_MLX5DV_DR_DEVX_PORT', 'infiniband/mlx5dv.h',
+		'mlx5dv_query_devx_port' ],
+		[ 'HAVE_IBV_DEVX_OBJ', 'infiniband/mlx5dv.h',
+		'mlx5dv_devx_obj_create' ],
+		[ 'HAVE_IBV_FLOW_DEVX_COUNTERS', 'infiniband/mlx5dv.h',
+		'MLX5DV_FLOW_ACTION_COUNTERS_DEVX' ],
+		[ 'HAVE_IBV_DEVX_ASYNC', 'infiniband/mlx5dv.h',
+		'mlx5dv_devx_obj_query_async' ],
+		[ 'HAVE_MLX5DV_DR_ACTION_DEST_DEVX_TIR', 'infiniband/mlx5dv.h',
+		'mlx5dv_dr_action_create_dest_devx_tir' ],
+		[ 'HAVE_MLX5_DR_CREATE_ACTION_FLOW_METER', 'infiniband/mlx5dv.h',
+		'mlx5dv_dr_action_create_flow_meter' ],
+		[ 'HAVE_MLX5DV_MMAP_GET_NC_PAGES_CMD', 'infiniband/mlx5dv.h',
+		'MLX5_MMAP_GET_NC_PAGES_CMD' ],
+		[ 'HAVE_MLX5DV_DR', 'infiniband/mlx5dv.h',
+		'MLX5DV_DR_DOMAIN_TYPE_NIC_RX' ],
+		[ 'HAVE_MLX5DV_DR_ESWITCH', 'infiniband/mlx5dv.h',
+		'MLX5DV_DR_DOMAIN_TYPE_FDB' ],
+		[ 'HAVE_MLX5DV_DR_VLAN', 'infiniband/mlx5dv.h',
+		'mlx5dv_dr_action_create_push_vlan' ],
+		[ 'HAVE_SUPPORTED_40000baseKR4_Full', 'linux/ethtool.h',
+		'SUPPORTED_40000baseKR4_Full' ],
+		[ 'HAVE_SUPPORTED_40000baseCR4_Full', 'linux/ethtool.h',
+		'SUPPORTED_40000baseCR4_Full' ],
+		[ 'HAVE_SUPPORTED_40000baseSR4_Full', 'linux/ethtool.h',
+		'SUPPORTED_40000baseSR4_Full' ],
+		[ 'HAVE_SUPPORTED_40000baseLR4_Full', 'linux/ethtool.h',
+		'SUPPORTED_40000baseLR4_Full' ],
+		[ 'HAVE_SUPPORTED_56000baseKR4_Full', 'linux/ethtool.h',
+		'SUPPORTED_56000baseKR4_Full' ],
+		[ 'HAVE_SUPPORTED_56000baseCR4_Full', 'linux/ethtool.h',
+		'SUPPORTED_56000baseCR4_Full' ],
+		[ 'HAVE_SUPPORTED_56000baseSR4_Full', 'linux/ethtool.h',
+		'SUPPORTED_56000baseSR4_Full' ],
+		[ 'HAVE_SUPPORTED_56000baseLR4_Full', 'linux/ethtool.h',
+		'SUPPORTED_56000baseLR4_Full' ],
+		[ 'HAVE_ETHTOOL_LINK_MODE_25G', 'linux/ethtool.h',
+		'ETHTOOL_LINK_MODE_25000baseCR_Full_BIT' ],
+		[ 'HAVE_ETHTOOL_LINK_MODE_50G', 'linux/ethtool.h',
+		'ETHTOOL_LINK_MODE_50000baseCR2_Full_BIT' ],
+		[ 'HAVE_ETHTOOL_LINK_MODE_100G', 'linux/ethtool.h',
+		'ETHTOOL_LINK_MODE_100000baseKR4_Full_BIT' ],
+		[ 'HAVE_IFLA_NUM_VF', 'linux/if_link.h',
+		'IFLA_NUM_VF' ],
+		[ 'HAVE_IFLA_EXT_MASK', 'linux/if_link.h',
+		'IFLA_EXT_MASK' ],
+		[ 'HAVE_IFLA_PHYS_SWITCH_ID', 'linux/if_link.h',
+		'IFLA_PHYS_SWITCH_ID' ],
+		[ 'HAVE_IFLA_PHYS_PORT_NAME', 'linux/if_link.h',
+		'IFLA_PHYS_PORT_NAME' ],
+		[ 'HAVE_RDMA_NL_NLDEV', 'rdma/rdma_netlink.h',
+		'RDMA_NL_NLDEV' ],
+		[ 'HAVE_RDMA_NLDEV_CMD_GET', 'rdma/rdma_netlink.h',
+		'RDMA_NLDEV_CMD_GET' ],
+		[ 'HAVE_RDMA_NLDEV_CMD_PORT_GET', 'rdma/rdma_netlink.h',
+		'RDMA_NLDEV_CMD_PORT_GET' ],
+		[ 'HAVE_RDMA_NLDEV_ATTR_DEV_INDEX', 'rdma/rdma_netlink.h',
+		'RDMA_NLDEV_ATTR_DEV_INDEX' ],
+		[ 'HAVE_RDMA_NLDEV_ATTR_DEV_NAME', 'rdma/rdma_netlink.h',
+		'RDMA_NLDEV_ATTR_DEV_NAME' ],
+		[ 'HAVE_RDMA_NLDEV_ATTR_PORT_INDEX', 'rdma/rdma_netlink.h',
+		'RDMA_NLDEV_ATTR_PORT_INDEX' ],
+		[ 'HAVE_RDMA_NLDEV_ATTR_NDEV_INDEX', 'rdma/rdma_netlink.h',
+		'RDMA_NLDEV_ATTR_NDEV_INDEX' ],
+		[ 'HAVE_MLX5_DR_FLOW_DUMP', 'infiniband/mlx5dv.h',
+		'mlx5dv_dump_dr_domain'],
+	]
+	config = configuration_data()
+	foreach arg:has_sym_args
+		config.set(arg[0], cc.has_header_symbol(arg[1], arg[2],
+			dependencies: libs))
+	endforeach
+	foreach arg:has_member_args
+		file_prefix = '#include <' + arg[1] + '>'
+		config.set(arg[0], cc.has_member(arg[2], arg[3],
+			prefix : file_prefix, dependencies: libs))
+	endforeach
+	configure_file(output : 'mlx5_autoconf.h', configuration : config)
+endif
+# Build Glue Library
+if pmd_dlopen and build
+	dlopen_name = 'mlx5_glue'
+	dlopen_lib_name = driver_name_fmt.format(dlopen_name)
+	dlopen_so_version = LIB_GLUE_VERSION
+	dlopen_sources = files('mlx5_glue.c')
+	dlopen_install_dir = [ eal_pmd_path + '-glue' ]
+	dlopen_includes = [global_inc]
+	dlopen_includes += include_directories(
+		'../../../lib/librte_eal/common/include/generic',
+	)
+	shared_lib = shared_library(
+		dlopen_lib_name,
+		dlopen_sources,
+		include_directories: dlopen_includes,
+		c_args: cflags,
+		dependencies: libs,
+		link_args: [
+		'-Wl,-export-dynamic',
+		'-Wl,-h,@0@'.format(LIB_GLUE),
+		],
+		soversion: dlopen_so_version,
+		install: true,
+		install_dir: dlopen_install_dir,
+	)
+endif
diff --git a/drivers/common/mlx5/mlx5_common.c b/drivers/common/mlx5/mlx5_common.c
new file mode 100644
index 0000000..14ebd30
--- /dev/null
+++ b/drivers/common/mlx5/mlx5_common.c
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2019 Mellanox Technologies, Ltd
+ */
+
+#include "mlx5_common.h"
+
+
+int mlx5_common_logtype;
+
+
+RTE_INIT(rte_mlx5_common_pmd_init)
+{
+	/* Initialize driver log type. */
+	mlx5_common_logtype = rte_log_register("pmd.common.mlx5");
+	if (mlx5_common_logtype >= 0)
+		rte_log_set_level(mlx5_common_logtype, RTE_LOG_NOTICE);
+}
diff --git a/drivers/common/mlx5/mlx5_common.h b/drivers/common/mlx5/mlx5_common.h
new file mode 100644
index 0000000..9f10def
--- /dev/null
+++ b/drivers/common/mlx5/mlx5_common.h
@@ -0,0 +1,87 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2019 Mellanox Technologies, Ltd
+ */
+
+#ifndef RTE_PMD_MLX5_COMMON_H_
+#define RTE_PMD_MLX5_COMMON_H_
+
+#include <assert.h>
+
+#include <rte_log.h>
+
+
+/*
+ * Helper macros to work around __VA_ARGS__ limitations in a C99 compliant
+ * manner.
+ */
+#define PMD_DRV_LOG_STRIP(a, b) a
+#define PMD_DRV_LOG_OPAREN (
+#define PMD_DRV_LOG_CPAREN )
+#define PMD_DRV_LOG_COMMA ,
+
+/* Return the file name part of a path. */
+static inline const char *
+pmd_drv_log_basename(const char *s)
+{
+	const char *n = s;
+
+	while (*n)
+		if (*(n++) == '/')
+			s = n;
+	return s;
+}
+
+#define PMD_DRV_LOG___(level, type, name, ...) \
+	rte_log(RTE_LOG_ ## level, \
+		type, \
+		RTE_FMT(name ": " \
+			RTE_FMT_HEAD(__VA_ARGS__,), \
+		RTE_FMT_TAIL(__VA_ARGS__,)))
+
+/*
+ * When debugging is enabled (NDEBUG not defined), file, line and function
+ * information replace the driver name (MLX5_DRIVER_NAME) in log messages.
+ */
+#ifndef NDEBUG
+
+#define PMD_DRV_LOG__(level, type, name, ...) \
+	PMD_DRV_LOG___(level, type, name, "%s:%u: %s(): " __VA_ARGS__)
+#define PMD_DRV_LOG_(level, type, name, s, ...) \
+	PMD_DRV_LOG__(level, type, name,\
+		s "\n" PMD_DRV_LOG_COMMA \
+		pmd_drv_log_basename(__FILE__) PMD_DRV_LOG_COMMA \
+		__LINE__ PMD_DRV_LOG_COMMA \
+		__func__, \
+		__VA_ARGS__)
+
+#else /* NDEBUG */
+#define PMD_DRV_LOG__(level, type, name, ...) \
+	PMD_DRV_LOG___(level, type, name, __VA_ARGS__)
+#define PMD_DRV_LOG_(level, type, name, s, ...) \
+	PMD_DRV_LOG__(level, type, name, s "\n", __VA_ARGS__)
+
+#endif /* NDEBUG */
+
+/* claim_zero() does not perform any check when debugging is disabled. */
+#ifndef NDEBUG
+
+#define DEBUG(...) DRV_LOG(DEBUG, __VA_ARGS__)
+#define claim_zero(...) assert((__VA_ARGS__) == 0)
+#define claim_nonzero(...) assert((__VA_ARGS__) != 0)
+
+#else /* NDEBUG */
+
+#define DEBUG(...) (void)0
+#define claim_zero(...) (__VA_ARGS__)
+#define claim_nonzero(...) (__VA_ARGS__)
+
+#endif /* NDEBUG */
+
+/* Allocate a buffer on the stack and fill it with a printf format string. */
+#define MKSTR(name, ...) \
+	int mkstr_size_##name = snprintf(NULL, 0, "" __VA_ARGS__); \
+	char name[mkstr_size_##name + 1]; \
+	\
+	snprintf(name, sizeof(name), "" __VA_ARGS__)
+
+#endif /* RTE_PMD_MLX5_COMMON_H_ */
diff --git a/drivers/common/mlx5/mlx5_common_utils.h b/drivers/common/mlx5/mlx5_common_utils.h
new file mode 100644
index 0000000..32c3adf
--- /dev/null
+++ b/drivers/common/mlx5/mlx5_common_utils.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2019 Mellanox Technologies, Ltd
+ */
+
+#ifndef RTE_PMD_MLX5_COMMON_UTILS_H_
+#define RTE_PMD_MLX5_COMMON_UTILS_H_
+
+#include "mlx5_common.h"
+
+
+extern int mlx5_common_logtype;
+
+#define MLX5_COMMON_LOG_PREFIX "common_mlx5"
+/* Generic printf()-like logging macro with automatic line feed. */
+#define DRV_LOG(level, ...) \
+	PMD_DRV_LOG_(level, mlx5_common_logtype, MLX5_COMMON_LOG_PREFIX, \
+		__VA_ARGS__ PMD_DRV_LOG_STRIP PMD_DRV_LOG_OPAREN, \
+		PMD_DRV_LOG_CPAREN)
+
+#endif /* RTE_PMD_MLX5_COMMON_UTILS_H_ */
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
new file mode 100644
index 0000000..67e5929
--- /dev/null
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -0,0 +1,974 @@
+// SPDX-License-Identifier: BSD-3-Clause
+/* Copyright 2018 Mellanox Technologies, Ltd */
+
+#include <unistd.h>
+
+#include <rte_errno.h>
+#include <rte_malloc.h>
+
+#include "mlx5_prm.h"
+#include "mlx5_devx_cmds.h"
+#include "mlx5_common_utils.h"
+
+
+/**
+ * Allocate flow counters via devx interface.
+ *
+ * @param[in] ctx
+ *   ibv contexts returned from mlx5dv_open_device.
+ * @param dcs
+ *   Pointer to counters properties structure to be filled by the routine.
+ * @param bulk_n_128
+ *   Bulk counter numbers in 128 counters units.
+ *
+ * @return
+ *   Pointer to counter object on success, NULL otherwise and
+ *   rte_errno is set.
+ */
+struct mlx5_devx_obj *
+mlx5_devx_cmd_flow_counter_alloc(struct ibv_context *ctx, uint32_t bulk_n_128)
+{
+	struct mlx5_devx_obj *dcs = rte_zmalloc("dcs", sizeof(*dcs), 0);
+	uint32_t in[MLX5_ST_SZ_DW(alloc_flow_counter_in)]   = {0};
+	uint32_t out[MLX5_ST_SZ_DW(alloc_flow_counter_out)] = {0};
+
+	if (!dcs) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	MLX5_SET(alloc_flow_counter_in, in, opcode,
+		 MLX5_CMD_OP_ALLOC_FLOW_COUNTER);
+	MLX5_SET(alloc_flow_counter_in, in, flow_counter_bulk, bulk_n_128);
+	dcs->obj = mlx5_glue->devx_obj_create(ctx, in,
+					      sizeof(in), out, sizeof(out));
+	if (!dcs->obj) {
+		DRV_LOG(ERR, "Can't allocate counters - error %d", errno);
+		rte_errno = errno;
+		rte_free(dcs);
+		return NULL;
+	}
+	dcs->id = MLX5_GET(alloc_flow_counter_out, out, flow_counter_id);
+	return dcs;
+}
+
+/**
+ * Query flow counters values.
+ *
+ * @param[in] dcs
+ *   devx object that was obtained from mlx5_devx_cmd_flow_counter_alloc.
+ * @param[in] clear
+ *   Whether hardware should clear the counters after the query or not.
+ * @param[in] n_counters
+ *   0 to read a single counter, otherwise the number of counters to read.
+ *  @param pkts
+ *   The number of packets that matched the flow.
+ *  @param bytes
+ *    The number of bytes that matched the flow.
+ *  @param mkey
+ *   The mkey key for batch query.
+ *  @param addr
+ *    The address in the mkey range for batch query.
+ *  @param cmd_comp
+ *   The completion object for asynchronous batch query.
+ *  @param async_id
+ *    The ID to be returned in the asynchronous batch query response.
+ *
+ * @return
+ *   0 on success, a negative value otherwise.
+ */
+int
+mlx5_devx_cmd_flow_counter_query(struct mlx5_devx_obj *dcs,
+				 int clear, uint32_t n_counters,
+				 uint64_t *pkts, uint64_t *bytes,
+				 uint32_t mkey, void *addr,
+				 struct mlx5dv_devx_cmd_comp *cmd_comp,
+				 uint64_t async_id)
+{
+	int out_len = MLX5_ST_SZ_BYTES(query_flow_counter_out) +
+			MLX5_ST_SZ_BYTES(traffic_counter);
+	uint32_t out[out_len];
+	uint32_t in[MLX5_ST_SZ_DW(query_flow_counter_in)] = {0};
+	void *stats;
+	int rc;
+
+	MLX5_SET(query_flow_counter_in, in, opcode,
+		 MLX5_CMD_OP_QUERY_FLOW_COUNTER);
+	MLX5_SET(query_flow_counter_in, in, op_mod, 0);
+	MLX5_SET(query_flow_counter_in, in, flow_counter_id, dcs->id);
+	MLX5_SET(query_flow_counter_in, in, clear, !!clear);
+
+	if (n_counters) {
+		MLX5_SET(query_flow_counter_in, in, num_of_counters,
+			 n_counters);
+		MLX5_SET(query_flow_counter_in, in, dump_to_memory, 1);
+		MLX5_SET(query_flow_counter_in, in, mkey, mkey);
+		MLX5_SET64(query_flow_counter_in, in, address,
+			   (uint64_t)(uintptr_t)addr);
+	}
+	if (!cmd_comp)
+		rc = mlx5_glue->devx_obj_query(dcs->obj, in, sizeof(in), out,
+					       out_len);
+	else
+		rc = mlx5_glue->devx_obj_query_async(dcs->obj, in, sizeof(in),
+						     out_len, async_id,
+						     cmd_comp);
+	if (rc) {
+		DRV_LOG(ERR, "Failed to query devx counters with rc %d", rc);
+		rte_errno = rc;
+		return -rc;
+	}
+	if (!n_counters) {
+		stats = MLX5_ADDR_OF(query_flow_counter_out,
+				     out, flow_statistics);
+		*pkts = MLX5_GET64(traffic_counter, stats, packets);
+		*bytes = MLX5_GET64(traffic_counter, stats, octets);
+	}
+	return 0;
+}
+
+/**
+ * Create a new mkey.
+ *
+ * @param[in] ctx
+ *   ibv contexts returned from mlx5dv_open_device.
+ * @param[in] attr
+ *   Attributes of the requested mkey.
+ *
+ * @return
+ *   Pointer to Devx mkey on success, NULL otherwise and rte_errno
+ *   is set.
+ */
+struct mlx5_devx_obj *
+mlx5_devx_cmd_mkey_create(struct ibv_context *ctx,
+			  struct mlx5_devx_mkey_attr *attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(create_mkey_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(create_mkey_out)] = {0};
+	void *mkc;
+	struct mlx5_devx_obj *mkey = rte_zmalloc("mkey", sizeof(*mkey), 0);
+	size_t pgsize;
+	uint32_t translation_size;
+
+	if (!mkey) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	pgsize = sysconf(_SC_PAGESIZE);
+	translation_size = (RTE_ALIGN(attr->size, pgsize) * 8) / 16;
+	MLX5_SET(create_mkey_in, in, opcode, MLX5_CMD_OP_CREATE_MKEY);
+	MLX5_SET(create_mkey_in, in, translations_octword_actual_size,
+		 translation_size);
+	MLX5_SET(create_mkey_in, in, mkey_umem_id, attr->umem_id);
+	mkc = MLX5_ADDR_OF(create_mkey_in, in, memory_key_mkey_entry);
+	MLX5_SET(mkc, mkc, lw, 0x1);
+	MLX5_SET(mkc, mkc, lr, 0x1);
+	MLX5_SET(mkc, mkc, access_mode_1_0, MLX5_MKC_ACCESS_MODE_MTT);
+	MLX5_SET(mkc, mkc, qpn, 0xffffff);
+	MLX5_SET(mkc, mkc, pd, attr->pd);
+	MLX5_SET(mkc, mkc, mkey_7_0, attr->umem_id & 0xFF);
+	MLX5_SET(mkc, mkc, translations_octword_size, translation_size);
+	MLX5_SET64(mkc, mkc, start_addr, attr->addr);
+	MLX5_SET64(mkc, mkc, len, attr->size);
+	MLX5_SET(mkc, mkc, log_page_size, rte_log2_u32(pgsize));
+	mkey->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in), out,
+					       sizeof(out));
+	if (!mkey->obj) {
+		DRV_LOG(ERR, "Can't create mkey - error %d", errno);
+		rte_errno = errno;
+		rte_free(mkey);
+		return NULL;
+	}
+	mkey->id = MLX5_GET(create_mkey_out, out, mkey_index);
+	mkey->id = (mkey->id << 8) | (attr->umem_id & 0xFF);
+	return mkey;
+}
+
+/**
+ * Get status of devx command response.
+ * Mainly used for asynchronous commands.
+ *
+ * @param[in] out
+ *   The out response buffer.
+ *
+ * @return
+ *   0 on success, non-zero value otherwise.
+ */
+int
+mlx5_devx_get_out_command_status(void *out)
+{
+	int status;
+
+	if (!out)
+		return -EINVAL;
+	status = MLX5_GET(query_flow_counter_out, out, status);
+	if (status) {
+		int syndrome = MLX5_GET(query_flow_counter_out, out, syndrome);
+
+		DRV_LOG(ERR, "Bad devX status %x, syndrome = %x", status,
+			syndrome);
+	}
+	return status;
+}
+
+/**
+ * Destroy any object allocated by a Devx API.
+ *
+ * @param[in] obj
+ *   Pointer to a general object.
+ *
+ * @return
+ *   0 on success, a negative value otherwise.
+ */
+int
+mlx5_devx_cmd_destroy(struct mlx5_devx_obj *obj)
+{
+	int ret;
+
+	if (!obj)
+		return 0;
+	ret =  mlx5_glue->devx_obj_destroy(obj->obj);
+	rte_free(obj);
+	return ret;
+}
+
+/**
+ * Query NIC vport context.
+ * Fills minimal inline attribute.
+ *
+ * @param[in] ctx
+ *   ibv contexts returned from mlx5dv_open_device.
+ * @param[in] vport
+ *   vport index
+ * @param[out] attr
+ *   Attributes device values.
+ *
+ * @return
+ *   0 on success, a negative value otherwise.
+ */
+static int
+mlx5_devx_cmd_query_nic_vport_context(struct ibv_context *ctx,
+				      unsigned int vport,
+				      struct mlx5_hca_attr *attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(query_nic_vport_context_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(query_nic_vport_context_out)] = {0};
+	void *vctx;
+	int status, syndrome, rc;
+
+	/* Query NIC vport context to determine inline mode. */
+	MLX5_SET(query_nic_vport_context_in, in, opcode,
+		 MLX5_CMD_OP_QUERY_NIC_VPORT_CONTEXT);
+	MLX5_SET(query_nic_vport_context_in, in, vport_number, vport);
+	if (vport)
+		MLX5_SET(query_nic_vport_context_in, in, other_vport, 1);
+	rc = mlx5_glue->devx_general_cmd(ctx,
+					 in, sizeof(in),
+					 out, sizeof(out));
+	if (rc)
+		goto error;
+	status = MLX5_GET(query_nic_vport_context_out, out, status);
+	syndrome = MLX5_GET(query_nic_vport_context_out, out, syndrome);
+	if (status) {
+		DRV_LOG(DEBUG, "Failed to query NIC vport context, "
+			"status %x, syndrome = %x",
+			status, syndrome);
+		return -1;
+	}
+	vctx = MLX5_ADDR_OF(query_nic_vport_context_out, out,
+			    nic_vport_context);
+	attr->vport_inline_mode = MLX5_GET(nic_vport_context, vctx,
+					   min_wqe_inline_mode);
+	return 0;
+error:
+	rc = (rc > 0) ? -rc : rc;
+	return rc;
+}
+
+/**
+ * Query HCA attributes.
+ * These attributes allow checking at run time whether the device
+ * has the required capabilities.
+ *
+ * @param[in] ctx
+ *   ibv context returned from mlx5dv_open_device.
+ * @param[out] attr
+ *   Attributes device values.
+ *
+ * @return
+ *   0 on success, a negative value otherwise.
+ */
+int
+mlx5_devx_cmd_query_hca_attr(struct ibv_context *ctx,
+			     struct mlx5_hca_attr *attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(query_hca_cap_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(query_hca_cap_out)] = {0};
+	void *hcattr;
+	int status, syndrome, rc;
+
+	MLX5_SET(query_hca_cap_in, in, opcode, MLX5_CMD_OP_QUERY_HCA_CAP);
+	MLX5_SET(query_hca_cap_in, in, op_mod,
+		 MLX5_GET_HCA_CAP_OP_MOD_GENERAL_DEVICE |
+		 MLX5_HCA_CAP_OPMOD_GET_CUR);
+
+	rc = mlx5_glue->devx_general_cmd(ctx,
+					 in, sizeof(in), out, sizeof(out));
+	if (rc)
+		goto error;
+	status = MLX5_GET(query_hca_cap_out, out, status);
+	syndrome = MLX5_GET(query_hca_cap_out, out, syndrome);
+	if (status) {
+		DRV_LOG(DEBUG, "Failed to query devx HCA capabilities, "
+			"status %x, syndrome = %x",
+			status, syndrome);
+		return -1;
+	}
+	hcattr = MLX5_ADDR_OF(query_hca_cap_out, out, capability);
+	attr->flow_counter_bulk_alloc_bitmap =
+			MLX5_GET(cmd_hca_cap, hcattr, flow_counter_bulk_alloc);
+	attr->flow_counters_dump = MLX5_GET(cmd_hca_cap, hcattr,
+					    flow_counters_dump);
+	attr->eswitch_manager = MLX5_GET(cmd_hca_cap, hcattr, eswitch_manager);
+	attr->hairpin = MLX5_GET(cmd_hca_cap, hcattr, hairpin);
+	attr->log_max_hairpin_queues = MLX5_GET(cmd_hca_cap, hcattr,
+						log_max_hairpin_queues);
+	attr->log_max_hairpin_wq_data_sz = MLX5_GET(cmd_hca_cap, hcattr,
+						    log_max_hairpin_wq_data_sz);
+	attr->log_max_hairpin_num_packets = MLX5_GET
+		(cmd_hca_cap, hcattr, log_max_hairpin_num_packets);
+	attr->vhca_id = MLX5_GET(cmd_hca_cap, hcattr, vhca_id);
+	attr->eth_net_offloads = MLX5_GET(cmd_hca_cap, hcattr,
+					  eth_net_offloads);
+	attr->eth_virt = MLX5_GET(cmd_hca_cap, hcattr, eth_virt);
+	attr->flex_parser_protocols = MLX5_GET(cmd_hca_cap, hcattr,
+					       flex_parser_protocols);
+	attr->qos.sup = MLX5_GET(cmd_hca_cap, hcattr, qos);
+	if (attr->qos.sup) {
+		MLX5_SET(query_hca_cap_in, in, op_mod,
+			 MLX5_GET_HCA_CAP_OP_MOD_QOS_CAP |
+			 MLX5_HCA_CAP_OPMOD_GET_CUR);
+		rc = mlx5_glue->devx_general_cmd(ctx, in, sizeof(in),
+						 out, sizeof(out));
+		if (rc)
+			goto error;
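+		status = MLX5_GET(query_hca_cap_out, out, status);
+		syndrome = MLX5_GET(query_hca_cap_out, out, syndrome);
+		if (status) {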
+			DRV_LOG(DEBUG, "Failed to query devx QOS capabilities,"
+				" status %x, syndrome = %x",
+				status, syndrome);
+			return -1;
+		}
+		hcattr = MLX5_ADDR_OF(query_hca_cap_out, out, capability);
+		attr->qos.srtcm_sup =
+				MLX5_GET(qos_cap, hcattr, flow_meter_srtcm);
+		attr->qos.log_max_flow_meter =
+				MLX5_GET(qos_cap, hcattr, log_max_flow_meter);
+		attr->qos.flow_meter_reg_c_ids =
+			MLX5_GET(qos_cap, hcattr, flow_meter_reg_id);
+	}
+	if (!attr->eth_net_offloads)
+		return 0;
+
+	/* Query HCA offloads for Ethernet protocol. */
+	memset(in, 0, sizeof(in));
+	memset(out, 0, sizeof(out));
+	MLX5_SET(query_hca_cap_in, in, opcode, MLX5_CMD_OP_QUERY_HCA_CAP);
+	MLX5_SET(query_hca_cap_in, in, op_mod,
+		 MLX5_GET_HCA_CAP_OP_MOD_ETHERNET_OFFLOAD_CAPS |
+		 MLX5_HCA_CAP_OPMOD_GET_CUR);
+
+	rc = mlx5_glue->devx_general_cmd(ctx,
+					 in, sizeof(in),
+					 out, sizeof(out));
+	if (rc) {
+		attr->eth_net_offloads = 0;
+		goto error;
+	}
+	status = MLX5_GET(query_hca_cap_out, out, status);
+	syndrome = MLX5_GET(query_hca_cap_out, out, syndrome);
+	if (status) {
+		DRV_LOG(DEBUG, "Failed to query devx HCA capabilities, "
+			"status %x, syndrome = %x",
+			status, syndrome);
+		attr->eth_net_offloads = 0;
+		return -1;
+	}
+	hcattr = MLX5_ADDR_OF(query_hca_cap_out, out, capability);
+	attr->wqe_vlan_insert = MLX5_GET(per_protocol_networking_offload_caps,
+					 hcattr, wqe_vlan_insert);
+	attr->lro_cap = MLX5_GET(per_protocol_networking_offload_caps, hcattr,
+				 lro_cap);
+	attr->tunnel_lro_gre = MLX5_GET(per_protocol_networking_offload_caps,
+					hcattr, tunnel_lro_gre);
+	attr->tunnel_lro_vxlan = MLX5_GET(per_protocol_networking_offload_caps,
+					  hcattr, tunnel_lro_vxlan);
+	attr->lro_max_msg_sz_mode = MLX5_GET
+					(per_protocol_networking_offload_caps,
+					 hcattr, lro_max_msg_sz_mode);
+	for (int i = 0; i < MLX5_LRO_NUM_SUPP_PERIODS; i++) {
+		attr->lro_timer_supported_periods[i] =
+			MLX5_GET(per_protocol_networking_offload_caps, hcattr,
+				 lro_timer_supported_periods[i]);
+	}
+	attr->tunnel_stateless_geneve_rx =
+			    MLX5_GET(per_protocol_networking_offload_caps,
+				     hcattr, tunnel_stateless_geneve_rx);
+	attr->geneve_max_opt_len =
+		    MLX5_GET(per_protocol_networking_offload_caps,
+			     hcattr, max_geneve_opt_len);
+	attr->wqe_inline_mode = MLX5_GET(per_protocol_networking_offload_caps,
+					 hcattr, wqe_inline_mode);
+	attr->tunnel_stateless_gtp = MLX5_GET
+					(per_protocol_networking_offload_caps,
+					 hcattr, tunnel_stateless_gtp);
+	if (attr->wqe_inline_mode != MLX5_CAP_INLINE_MODE_VPORT_CONTEXT)
+		return 0;
+	if (attr->eth_virt) {
+		rc = mlx5_devx_cmd_query_nic_vport_context(ctx, 0, attr);
+		if (rc) {
+			attr->eth_virt = 0;
+			goto error;
+		}
+	}
+	return 0;
+error:
+	rc = (rc > 0) ? -rc : rc;
+	return rc;
+}
+
+/**
+ * Query TIS transport domain from QP verbs object using DevX API.
+ *
+ * @param[in] qp
+ *   Pointer to verbs QP returned by ibv_create_qp.
+ * @param[in] tis_num
+ *   TIS number of TIS to query.
+ * @param[out] tis_td
+ *   Pointer to TIS transport domain variable, to be set by the routine.
+ *
+ * @return
+ *   0 on success, a negative value otherwise.
+ */
+int
+mlx5_devx_cmd_qp_query_tis_td(struct ibv_qp *qp, uint32_t tis_num,
+			      uint32_t *tis_td)
+{
+	uint32_t in[MLX5_ST_SZ_DW(query_tis_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(query_tis_out)] = {0};
+	int rc;
+	void *tis_ctx;
+
+	MLX5_SET(query_tis_in, in, opcode, MLX5_CMD_OP_QUERY_TIS);
+	MLX5_SET(query_tis_in, in, tisn, tis_num);
+	rc = mlx5_glue->devx_qp_query(qp, in, sizeof(in), out, sizeof(out));
+	if (rc) {
+		DRV_LOG(ERR, "Failed to query QP using DevX");
+		return -rc;
+	}
+	tis_ctx = MLX5_ADDR_OF(query_tis_out, out, tis_context);
+	*tis_td = MLX5_GET(tisc, tis_ctx, transport_domain);
+	return 0;
+}
+
+/**
+ * Fill WQ data for DevX API command.
+ * Utility function for use when creating DevX objects containing a WQ.
+ *
+ * @param[in] wq_ctx
+ *   Pointer to WQ context to fill with data.
+ * @param[in] wq_attr
+ *   Pointer to WQ attributes structure to fill in WQ context.
+ */
+static void
+devx_cmd_fill_wq_data(void *wq_ctx, struct mlx5_devx_wq_attr *wq_attr)
+{
+	MLX5_SET(wq, wq_ctx, wq_type, wq_attr->wq_type);
+	MLX5_SET(wq, wq_ctx, wq_signature, wq_attr->wq_signature);
+	MLX5_SET(wq, wq_ctx, end_padding_mode, wq_attr->end_padding_mode);
+	MLX5_SET(wq, wq_ctx, cd_slave, wq_attr->cd_slave);
+	MLX5_SET(wq, wq_ctx, hds_skip_first_sge, wq_attr->hds_skip_first_sge);
+	MLX5_SET(wq, wq_ctx, log2_hds_buf_size, wq_attr->log2_hds_buf_size);
+	MLX5_SET(wq, wq_ctx, page_offset, wq_attr->page_offset);
+	MLX5_SET(wq, wq_ctx, lwm, wq_attr->lwm);
+	MLX5_SET(wq, wq_ctx, pd, wq_attr->pd);
+	MLX5_SET(wq, wq_ctx, uar_page, wq_attr->uar_page);
+	MLX5_SET64(wq, wq_ctx, dbr_addr, wq_attr->dbr_addr);
+	MLX5_SET(wq, wq_ctx, hw_counter, wq_attr->hw_counter);
+	MLX5_SET(wq, wq_ctx, sw_counter, wq_attr->sw_counter);
+	MLX5_SET(wq, wq_ctx, log_wq_stride, wq_attr->log_wq_stride);
+	MLX5_SET(wq, wq_ctx, log_wq_pg_sz, wq_attr->log_wq_pg_sz);
+	MLX5_SET(wq, wq_ctx, log_wq_sz, wq_attr->log_wq_sz);
+	MLX5_SET(wq, wq_ctx, dbr_umem_valid, wq_attr->dbr_umem_valid);
+	MLX5_SET(wq, wq_ctx, wq_umem_valid, wq_attr->wq_umem_valid);
+	MLX5_SET(wq, wq_ctx, log_hairpin_num_packets,
+		 wq_attr->log_hairpin_num_packets);
+	MLX5_SET(wq, wq_ctx, log_hairpin_data_sz, wq_attr->log_hairpin_data_sz);
+	MLX5_SET(wq, wq_ctx, single_wqe_log_num_of_strides,
+		 wq_attr->single_wqe_log_num_of_strides);
+	MLX5_SET(wq, wq_ctx, two_byte_shift_en, wq_attr->two_byte_shift_en);
+	MLX5_SET(wq, wq_ctx, single_stride_log_num_of_bytes,
+		 wq_attr->single_stride_log_num_of_bytes);
+	MLX5_SET(wq, wq_ctx, dbr_umem_id, wq_attr->dbr_umem_id);
+	MLX5_SET(wq, wq_ctx, wq_umem_id, wq_attr->wq_umem_id);
+	MLX5_SET64(wq, wq_ctx, wq_umem_offset, wq_attr->wq_umem_offset);
+}
+
+/**
+ * Create RQ using DevX API.
+ *
+ * @param[in] ctx
+ *   ibv_context returned from mlx5dv_open_device.
+ * @param[in] rq_attr
+ *   Pointer to create RQ attributes structure.
+ * @param[in] socket
+ *   CPU socket ID for allocations.
+ *
+ * @return
+ *   The DevX object created, NULL otherwise and rte_errno is set.
+ */
+struct mlx5_devx_obj *
+mlx5_devx_cmd_create_rq(struct ibv_context *ctx,
+			struct mlx5_devx_create_rq_attr *rq_attr,
+			int socket)
+{
+	uint32_t in[MLX5_ST_SZ_DW(create_rq_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(create_rq_out)] = {0};
+	void *rq_ctx, *wq_ctx;
+	struct mlx5_devx_wq_attr *wq_attr;
+	struct mlx5_devx_obj *rq = NULL;
+
+	rq = rte_calloc_socket(__func__, 1, sizeof(*rq), 0, socket);
+	if (!rq) {
+		DRV_LOG(ERR, "Failed to allocate RQ data");
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	MLX5_SET(create_rq_in, in, opcode, MLX5_CMD_OP_CREATE_RQ);
+	rq_ctx = MLX5_ADDR_OF(create_rq_in, in, ctx);
+	MLX5_SET(rqc, rq_ctx, rlky, rq_attr->rlky);
+	MLX5_SET(rqc, rq_ctx, delay_drop_en, rq_attr->delay_drop_en);
+	MLX5_SET(rqc, rq_ctx, scatter_fcs, rq_attr->scatter_fcs);
+	MLX5_SET(rqc, rq_ctx, vsd, rq_attr->vsd);
+	MLX5_SET(rqc, rq_ctx, mem_rq_type, rq_attr->mem_rq_type);
+	MLX5_SET(rqc, rq_ctx, state, rq_attr->state);
+	MLX5_SET(rqc, rq_ctx, flush_in_error_en, rq_attr->flush_in_error_en);
+	MLX5_SET(rqc, rq_ctx, hairpin, rq_attr->hairpin);
+	MLX5_SET(rqc, rq_ctx, user_index, rq_attr->user_index);
+	MLX5_SET(rqc, rq_ctx, cqn, rq_attr->cqn);
+	MLX5_SET(rqc, rq_ctx, counter_set_id, rq_attr->counter_set_id);
+	MLX5_SET(rqc, rq_ctx, rmpn, rq_attr->rmpn);
+	wq_ctx = MLX5_ADDR_OF(rqc, rq_ctx, wq);
+	wq_attr = &rq_attr->wq_attr;
+	devx_cmd_fill_wq_data(wq_ctx, wq_attr);
+	rq->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in),
+						  out, sizeof(out));
+	if (!rq->obj) {
+		DRV_LOG(ERR, "Failed to create RQ using DevX");
+		rte_errno = errno;
+		rte_free(rq);
+		return NULL;
+	}
+	rq->id = MLX5_GET(create_rq_out, out, rqn);
+	return rq;
+}
+
+/**
+ * Modify RQ using DevX API.
+ *
+ * @param[in] rq
+ *   Pointer to RQ object structure.
+ * @param[in] rq_attr
+ *   Pointer to modify RQ attributes structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_devx_cmd_modify_rq(struct mlx5_devx_obj *rq,
+			struct mlx5_devx_modify_rq_attr *rq_attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(modify_rq_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(modify_rq_out)] = {0};
+	void *rq_ctx, *wq_ctx;
+	int ret;
+
+	MLX5_SET(modify_rq_in, in, opcode, MLX5_CMD_OP_MODIFY_RQ);
+	MLX5_SET(modify_rq_in, in, rq_state, rq_attr->rq_state);
+	MLX5_SET(modify_rq_in, in, rqn, rq->id);
+	MLX5_SET64(modify_rq_in, in, modify_bitmask, rq_attr->modify_bitmask);
+	rq_ctx = MLX5_ADDR_OF(modify_rq_in, in, ctx);
+	MLX5_SET(rqc, rq_ctx, state, rq_attr->state);
+	if (rq_attr->modify_bitmask &
+			MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_SCATTER_FCS)
+		MLX5_SET(rqc, rq_ctx, scatter_fcs, rq_attr->scatter_fcs);
+	if (rq_attr->modify_bitmask & MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_VSD)
+		MLX5_SET(rqc, rq_ctx, vsd, rq_attr->vsd);
+	if (rq_attr->modify_bitmask &
+			MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_RQ_COUNTER_SET_ID)
+		MLX5_SET(rqc, rq_ctx, counter_set_id, rq_attr->counter_set_id);
+	MLX5_SET(rqc, rq_ctx, hairpin_peer_sq, rq_attr->hairpin_peer_sq);
+	MLX5_SET(rqc, rq_ctx, hairpin_peer_vhca, rq_attr->hairpin_peer_vhca);
+	if (rq_attr->modify_bitmask & MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_WQ_LWM) {
+		wq_ctx = MLX5_ADDR_OF(rqc, rq_ctx, wq);
+		MLX5_SET(wq, wq_ctx, lwm, rq_attr->lwm);
+	}
+	ret = mlx5_glue->devx_obj_modify(rq->obj, in, sizeof(in),
+					 out, sizeof(out));
+	if (ret) {
+		DRV_LOG(ERR, "Failed to modify RQ using DevX");
+		rte_errno = errno;
+		return -errno;
+	}
+	return ret;
+}
+
+/**
+ * Create TIR using DevX API.
+ *
+ * @param[in] ctx
+ *   ibv_context returned from mlx5dv_open_device.
+ * @param[in] tir_attr
+ *   Pointer to TIR attributes structure.
+ *
+ * @return
+ *   The DevX object created, NULL otherwise and rte_errno is set.
+ */
+struct mlx5_devx_obj *
+mlx5_devx_cmd_create_tir(struct ibv_context *ctx,
+			 struct mlx5_devx_tir_attr *tir_attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(create_tir_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(create_tir_out)] = {0};
+	void *tir_ctx, *outer, *inner;
+	struct mlx5_devx_obj *tir = NULL;
+	int i;
+
+	tir = rte_calloc(__func__, 1, sizeof(*tir), 0);
+	if (!tir) {
+		DRV_LOG(ERR, "Failed to allocate TIR data");
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	MLX5_SET(create_tir_in, in, opcode, MLX5_CMD_OP_CREATE_TIR);
+	tir_ctx = MLX5_ADDR_OF(create_tir_in, in, ctx);
+	MLX5_SET(tirc, tir_ctx, disp_type, tir_attr->disp_type);
+	MLX5_SET(tirc, tir_ctx, lro_timeout_period_usecs,
+		 tir_attr->lro_timeout_period_usecs);
+	MLX5_SET(tirc, tir_ctx, lro_enable_mask, tir_attr->lro_enable_mask);
+	MLX5_SET(tirc, tir_ctx, lro_max_msg_sz, tir_attr->lro_max_msg_sz);
+	MLX5_SET(tirc, tir_ctx, inline_rqn, tir_attr->inline_rqn);
+	MLX5_SET(tirc, tir_ctx, rx_hash_symmetric, tir_attr->rx_hash_symmetric);
+	MLX5_SET(tirc, tir_ctx, tunneled_offload_en,
+		 tir_attr->tunneled_offload_en);
+	MLX5_SET(tirc, tir_ctx, indirect_table, tir_attr->indirect_table);
+	MLX5_SET(tirc, tir_ctx, rx_hash_fn, tir_attr->rx_hash_fn);
+	MLX5_SET(tirc, tir_ctx, self_lb_block, tir_attr->self_lb_block);
+	MLX5_SET(tirc, tir_ctx, transport_domain, tir_attr->transport_domain);
+	for (i = 0; i < 10; i++) {
+		MLX5_SET(tirc, tir_ctx, rx_hash_toeplitz_key[i],
+			 tir_attr->rx_hash_toeplitz_key[i]);
+	}
+	outer = MLX5_ADDR_OF(tirc, tir_ctx, rx_hash_field_selector_outer);
+	MLX5_SET(rx_hash_field_select, outer, l3_prot_type,
+		 tir_attr->rx_hash_field_selector_outer.l3_prot_type);
+	MLX5_SET(rx_hash_field_select, outer, l4_prot_type,
+		 tir_attr->rx_hash_field_selector_outer.l4_prot_type);
+	MLX5_SET(rx_hash_field_select, outer, selected_fields,
+		 tir_attr->rx_hash_field_selector_outer.selected_fields);
+	inner = MLX5_ADDR_OF(tirc, tir_ctx, rx_hash_field_selector_inner);
+	MLX5_SET(rx_hash_field_select, inner, l3_prot_type,
+		 tir_attr->rx_hash_field_selector_inner.l3_prot_type);
+	MLX5_SET(rx_hash_field_select, inner, l4_prot_type,
+		 tir_attr->rx_hash_field_selector_inner.l4_prot_type);
+	MLX5_SET(rx_hash_field_select, inner, selected_fields,
+		 tir_attr->rx_hash_field_selector_inner.selected_fields);
+	tir->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in),
+						   out, sizeof(out));
+	if (!tir->obj) {
+		DRV_LOG(ERR, "Failed to create TIR using DevX");
+		rte_errno = errno;
+		rte_free(tir);
+		return NULL;
+	}
+	tir->id = MLX5_GET(create_tir_out, out, tirn);
+	return tir;
+}
+
+/**
+ * Create RQT using DevX API.
+ *
+ * @param[in] ctx
+ *   ibv_context returned from mlx5dv_open_device.
+ * @param[in] rqt_attr
+ *   Pointer to RQT attributes structure.
+ *
+ * @return
+ *   The DevX object created, NULL otherwise and rte_errno is set.
+ */
+struct mlx5_devx_obj *
+mlx5_devx_cmd_create_rqt(struct ibv_context *ctx,
+			 struct mlx5_devx_rqt_attr *rqt_attr)
+{
+	uint32_t *in = NULL;
+	uint32_t inlen = MLX5_ST_SZ_BYTES(create_rqt_in) +
+			 rqt_attr->rqt_actual_size * sizeof(uint32_t);
+	uint32_t out[MLX5_ST_SZ_DW(create_rqt_out)] = {0};
+	void *rqt_ctx;
+	struct mlx5_devx_obj *rqt = NULL;
+	int i;
+
+	in = rte_calloc(__func__, 1, inlen, 0);
+	if (!in) {
+		DRV_LOG(ERR, "Failed to allocate RQT IN data");
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	rqt = rte_calloc(__func__, 1, sizeof(*rqt), 0);
+	if (!rqt) {
+		DRV_LOG(ERR, "Failed to allocate RQT data");
+		rte_errno = ENOMEM;
+		rte_free(in);
+		return NULL;
+	}
+	MLX5_SET(create_rqt_in, in, opcode, MLX5_CMD_OP_CREATE_RQT);
+	rqt_ctx = MLX5_ADDR_OF(create_rqt_in, in, rqt_context);
+	MLX5_SET(rqtc, rqt_ctx, rqt_max_size, rqt_attr->rqt_max_size);
+	MLX5_SET(rqtc, rqt_ctx, rqt_actual_size, rqt_attr->rqt_actual_size);
+	for (i = 0; i < rqt_attr->rqt_actual_size; i++)
+		MLX5_SET(rqtc, rqt_ctx, rq_num[i], rqt_attr->rq_list[i]);
+	rqt->obj = mlx5_glue->devx_obj_create(ctx, in, inlen, out, sizeof(out));
+	rte_free(in);
+	if (!rqt->obj) {
+		DRV_LOG(ERR, "Failed to create RQT using DevX");
+		rte_errno = errno;
+		rte_free(rqt);
+		return NULL;
+	}
+	rqt->id = MLX5_GET(create_rqt_out, out, rqtn);
+	return rqt;
+}
+
+/**
+ * Create SQ using DevX API.
+ *
+ * @param[in] ctx
+ *   ibv_context returned from mlx5dv_open_device.
+ * @param[in] sq_attr
+ *   Pointer to SQ attributes structure.
+ *
+ * @return
+ *   The DevX object created, NULL otherwise and rte_errno is set.
+ */
+struct mlx5_devx_obj *
+mlx5_devx_cmd_create_sq(struct ibv_context *ctx,
+			struct mlx5_devx_create_sq_attr *sq_attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(create_sq_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(create_sq_out)] = {0};
+	void *sq_ctx;
+	void *wq_ctx;
+	struct mlx5_devx_wq_attr *wq_attr;
+	struct mlx5_devx_obj *sq = NULL;
+
+	sq = rte_calloc(__func__, 1, sizeof(*sq), 0);
+	if (!sq) {
+		DRV_LOG(ERR, "Failed to allocate SQ data");
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	MLX5_SET(create_sq_in, in, opcode, MLX5_CMD_OP_CREATE_SQ);
+	sq_ctx = MLX5_ADDR_OF(create_sq_in, in, ctx);
+	MLX5_SET(sqc, sq_ctx, rlky, sq_attr->rlky);
+	MLX5_SET(sqc, sq_ctx, cd_master, sq_attr->cd_master);
+	MLX5_SET(sqc, sq_ctx, fre, sq_attr->fre);
+	MLX5_SET(sqc, sq_ctx, flush_in_error_en, sq_attr->flush_in_error_en);
+	MLX5_SET(sqc, sq_ctx, allow_multi_pkt_send_wqe,
+		 sq_attr->allow_multi_pkt_send_wqe);
+	MLX5_SET(sqc, sq_ctx, min_wqe_inline_mode,
+		 sq_attr->min_wqe_inline_mode);
+	MLX5_SET(sqc, sq_ctx, state, sq_attr->state);
+	MLX5_SET(sqc, sq_ctx, reg_umr, sq_attr->reg_umr);
+	MLX5_SET(sqc, sq_ctx, allow_swp, sq_attr->allow_swp);
+	MLX5_SET(sqc, sq_ctx, hairpin, sq_attr->hairpin);
+	MLX5_SET(sqc, sq_ctx, user_index, sq_attr->user_index);
+	MLX5_SET(sqc, sq_ctx, cqn, sq_attr->cqn);
+	MLX5_SET(sqc, sq_ctx, packet_pacing_rate_limit_index,
+		 sq_attr->packet_pacing_rate_limit_index);
+	MLX5_SET(sqc, sq_ctx, tis_lst_sz, sq_attr->tis_lst_sz);
+	MLX5_SET(sqc, sq_ctx, tis_num_0, sq_attr->tis_num);
+	wq_ctx = MLX5_ADDR_OF(sqc, sq_ctx, wq);
+	wq_attr = &sq_attr->wq_attr;
+	devx_cmd_fill_wq_data(wq_ctx, wq_attr);
+	sq->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in),
+					     out, sizeof(out));
+	if (!sq->obj) {
+		DRV_LOG(ERR, "Failed to create SQ using DevX");
+		rte_errno = errno;
+		rte_free(sq);
+		return NULL;
+	}
+	sq->id = MLX5_GET(create_sq_out, out, sqn);
+	return sq;
+}
+
+/**
+ * Modify SQ using DevX API.
+ *
+ * @param[in] sq
+ *   Pointer to SQ object structure.
+ * @param[in] sq_attr
+ *   Pointer to SQ attributes structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_devx_cmd_modify_sq(struct mlx5_devx_obj *sq,
+			struct mlx5_devx_modify_sq_attr *sq_attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(modify_sq_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(modify_sq_out)] = {0};
+	void *sq_ctx;
+	int ret;
+
+	MLX5_SET(modify_sq_in, in, opcode, MLX5_CMD_OP_MODIFY_SQ);
+	MLX5_SET(modify_sq_in, in, sq_state, sq_attr->sq_state);
+	MLX5_SET(modify_sq_in, in, sqn, sq->id);
+	sq_ctx = MLX5_ADDR_OF(modify_sq_in, in, ctx);
+	MLX5_SET(sqc, sq_ctx, state, sq_attr->state);
+	MLX5_SET(sqc, sq_ctx, hairpin_peer_rq, sq_attr->hairpin_peer_rq);
+	MLX5_SET(sqc, sq_ctx, hairpin_peer_vhca, sq_attr->hairpin_peer_vhca);
+	ret = mlx5_glue->devx_obj_modify(sq->obj, in, sizeof(in),
+					 out, sizeof(out));
+	if (ret) {
+		DRV_LOG(ERR, "Failed to modify SQ using DevX");
+		rte_errno = errno;
+		return -errno;
+	}
+	return ret;
+}
+
+/**
+ * Create TIS using DevX API.
+ *
+ * @param[in] ctx
+ *   ibv_context returned from mlx5dv_open_device.
+ * @param[in] tis_attr
+ *   Pointer to TIS attributes structure.
+ *
+ * @return
+ *   The DevX object created, NULL otherwise and rte_errno is set.
+ */
+struct mlx5_devx_obj *
+mlx5_devx_cmd_create_tis(struct ibv_context *ctx,
+			 struct mlx5_devx_tis_attr *tis_attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(create_tis_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(create_tis_out)] = {0};
+	struct mlx5_devx_obj *tis = NULL;
+	void *tis_ctx;
+
+	tis = rte_calloc(__func__, 1, sizeof(*tis), 0);
+	if (!tis) {
+		DRV_LOG(ERR, "Failed to allocate TIS object");
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	MLX5_SET(create_tis_in, in, opcode, MLX5_CMD_OP_CREATE_TIS);
+	tis_ctx = MLX5_ADDR_OF(create_tis_in, in, ctx);
+	MLX5_SET(tisc, tis_ctx, strict_lag_tx_port_affinity,
+		 tis_attr->strict_lag_tx_port_affinity);
+	MLX5_SET(tisc, tis_ctx, lag_tx_port_affinity,
+		 tis_attr->lag_tx_port_affinity);
+	MLX5_SET(tisc, tis_ctx, prio, tis_attr->prio);
+	MLX5_SET(tisc, tis_ctx, transport_domain,
+		 tis_attr->transport_domain);
+	tis->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in),
+					      out, sizeof(out));
+	if (!tis->obj) {
+		DRV_LOG(ERR, "Failed to create TIS using DevX");
+		rte_errno = errno;
+		rte_free(tis);
+		return NULL;
+	}
+	tis->id = MLX5_GET(create_tis_out, out, tisn);
+	return tis;
+}
+
+/**
+ * Create transport domain using DevX API.
+ *
+ * @param[in] ctx
+ *   ibv_context returned from mlx5dv_open_device.
+ *
+ * @return
+ *   The DevX object created, NULL otherwise and rte_errno is set.
+ */
+struct mlx5_devx_obj *
+mlx5_devx_cmd_create_td(struct ibv_context *ctx)
+{
+	uint32_t in[MLX5_ST_SZ_DW(alloc_transport_domain_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(alloc_transport_domain_out)] = {0};
+	struct mlx5_devx_obj *td = NULL;
+
+	td = rte_calloc(__func__, 1, sizeof(*td), 0);
+	if (!td) {
+		DRV_LOG(ERR, "Failed to allocate TD object");
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	MLX5_SET(alloc_transport_domain_in, in, opcode,
+		 MLX5_CMD_OP_ALLOC_TRANSPORT_DOMAIN);
+	td->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in),
+					     out, sizeof(out));
+	if (!td->obj) {
+		DRV_LOG(ERR, "Failed to create TD using DevX");
+		rte_errno = errno;
+		rte_free(td);
+		return NULL;
+	}
+	td->id = MLX5_GET(alloc_transport_domain_out, out,
+			   transport_domain);
+	return td;
+}
+
+/**
+ * Dump all flows to file.
+ *
+ * @param[in] fdb_domain
+ *   FDB domain.
+ * @param[in] rx_domain
+ *   RX domain.
+ * @param[in] tx_domain
+ *   TX domain.
+ * @param[out] file
+ *   Pointer to file stream.
+ *
+ * @return
+ *   0 on success, a negative value otherwise.
+ */
+int
+mlx5_devx_cmd_flow_dump(void *fdb_domain __rte_unused,
+			void *rx_domain __rte_unused,
+			void *tx_domain __rte_unused, FILE *file __rte_unused)
+{
+	int ret = 0;
+
+#ifdef HAVE_MLX5_DR_FLOW_DUMP
+	if (fdb_domain) {
+		ret = mlx5_glue->dr_dump_domain(file, fdb_domain);
+		if (ret)
+			return -ret;
+	}
+	assert(rx_domain);
+	ret = mlx5_glue->dr_dump_domain(file, rx_domain);
+	if (ret)
+		return -ret;
+	assert(tx_domain);
+	ret = mlx5_glue->dr_dump_domain(file, tx_domain);
+#else
+	ret = ENOTSUP;
+#endif
+	return -ret;
+}
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
new file mode 100644
index 0000000..0c5afde
--- /dev/null
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -0,0 +1,227 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2019 Mellanox Technologies, Ltd
+ */
+
+#ifndef RTE_PMD_MLX5_DEVX_CMDS_H_
+#define RTE_PMD_MLX5_DEVX_CMDS_H_
+
+#include "mlx5_glue.h"
+
+/* DevX creation object. */
+struct mlx5_devx_obj {
+	struct mlx5dv_devx_obj *obj; /* The DV object. */
+	int id; /* The object ID. */
+};
+
+struct mlx5_devx_mkey_attr {
+	uint64_t addr;
+	uint64_t size;
+	uint32_t umem_id;
+	uint32_t pd;
+};
+
+/* HCA qos attributes. */
+struct mlx5_hca_qos_attr {
+	uint32_t sup:1;	/* Whether QOS is supported. */
+	uint32_t srtcm_sup:1; /* Whether srTCM mode is supported. */
+	uint8_t log_max_flow_meter;
+	/* Log2 of the maximum number of supported meters. */
+	uint8_t flow_meter_reg_c_ids;
+	/* Bitmap of the reg_Cs available for flow meter to use. */
+
+};
+
+/* HCA supports this number of time periods for LRO. */
+#define MLX5_LRO_NUM_SUPP_PERIODS 4
+
+struct mlx5_hca_attr {
+	uint32_t eswitch_manager:1;
+	uint32_t flow_counters_dump:1;
+	uint8_t flow_counter_bulk_alloc_bitmap;
+	uint32_t eth_net_offloads:1;
+	uint32_t eth_virt:1;
+	uint32_t wqe_vlan_insert:1;
+	uint32_t wqe_inline_mode:2;
+	uint32_t vport_inline_mode:3;
+	uint32_t tunnel_stateless_geneve_rx:1;
+	uint32_t geneve_max_opt_len:1; /* 0x0: 14DW, 0x1: 63DW */
+	uint32_t tunnel_stateless_gtp:1;
+	uint32_t lro_cap:1;
+	uint32_t tunnel_lro_gre:1;
+	uint32_t tunnel_lro_vxlan:1;
+	uint32_t lro_max_msg_sz_mode:2;
+	uint32_t lro_timer_supported_periods[MLX5_LRO_NUM_SUPP_PERIODS];
+	uint32_t flex_parser_protocols;
+	uint32_t hairpin:1;
+	uint32_t log_max_hairpin_queues:5;
+	uint32_t log_max_hairpin_wq_data_sz:5;
+	uint32_t log_max_hairpin_num_packets:5;
+	uint32_t vhca_id:16;
+	struct mlx5_hca_qos_attr qos;
+};
+
+struct mlx5_devx_wq_attr {
+	uint32_t wq_type:4;
+	uint32_t wq_signature:1;
+	uint32_t end_padding_mode:2;
+	uint32_t cd_slave:1;
+	uint32_t hds_skip_first_sge:1;
+	uint32_t log2_hds_buf_size:3;
+	uint32_t page_offset:5;
+	uint32_t lwm:16;
+	uint32_t pd:24;
+	uint32_t uar_page:24;
+	uint64_t dbr_addr;
+	uint32_t hw_counter;
+	uint32_t sw_counter;
+	uint32_t log_wq_stride:4;
+	uint32_t log_wq_pg_sz:5;
+	uint32_t log_wq_sz:5;
+	uint32_t dbr_umem_valid:1;
+	uint32_t wq_umem_valid:1;
+	uint32_t log_hairpin_num_packets:5;
+	uint32_t log_hairpin_data_sz:5;
+	uint32_t single_wqe_log_num_of_strides:4;
+	uint32_t two_byte_shift_en:1;
+	uint32_t single_stride_log_num_of_bytes:3;
+	uint32_t dbr_umem_id;
+	uint32_t wq_umem_id;
+	uint64_t wq_umem_offset;
+};
+
+/* Create RQ attributes structure, used by create RQ operation. */
+struct mlx5_devx_create_rq_attr {
+	uint32_t rlky:1;
+	uint32_t delay_drop_en:1;
+	uint32_t scatter_fcs:1;
+	uint32_t vsd:1;
+	uint32_t mem_rq_type:4;
+	uint32_t state:4;
+	uint32_t flush_in_error_en:1;
+	uint32_t hairpin:1;
+	uint32_t user_index:24;
+	uint32_t cqn:24;
+	uint32_t counter_set_id:8;
+	uint32_t rmpn:24;
+	struct mlx5_devx_wq_attr wq_attr;
+};
+
+/* Modify RQ attributes structure, used by modify RQ operation. */
+struct mlx5_devx_modify_rq_attr {
+	uint32_t rqn:24;
+	uint32_t rq_state:4; /* Current RQ state. */
+	uint32_t state:4; /* Required RQ state. */
+	uint32_t scatter_fcs:1;
+	uint32_t vsd:1;
+	uint32_t counter_set_id:8;
+	uint32_t hairpin_peer_sq:24;
+	uint32_t hairpin_peer_vhca:16;
+	uint64_t modify_bitmask;
+	uint32_t lwm:16; /* Contained WQ lwm. */
+};
+
+struct mlx5_rx_hash_field_select {
+	uint32_t l3_prot_type:1;
+	uint32_t l4_prot_type:1;
+	uint32_t selected_fields:30;
+};
+
+/* TIR attributes structure, used by TIR operations. */
+struct mlx5_devx_tir_attr {
+	uint32_t disp_type:4;
+	uint32_t lro_timeout_period_usecs:16;
+	uint32_t lro_enable_mask:4;
+	uint32_t lro_max_msg_sz:8;
+	uint32_t inline_rqn:24;
+	uint32_t rx_hash_symmetric:1;
+	uint32_t tunneled_offload_en:1;
+	uint32_t indirect_table:24;
+	uint32_t rx_hash_fn:4;
+	uint32_t self_lb_block:2;
+	uint32_t transport_domain:24;
+	uint32_t rx_hash_toeplitz_key[10];
+	struct mlx5_rx_hash_field_select rx_hash_field_selector_outer;
+	struct mlx5_rx_hash_field_select rx_hash_field_selector_inner;
+};
+
+/* RQT attributes structure, used by RQT operations. */
+struct mlx5_devx_rqt_attr {
+	uint32_t rqt_max_size:16;
+	uint32_t rqt_actual_size:16;
+	uint32_t rq_list[];
+};
+
+/* TIS attributes structure. */
+struct mlx5_devx_tis_attr {
+	uint32_t strict_lag_tx_port_affinity:1;
+	uint32_t tls_en:1;
+	uint32_t lag_tx_port_affinity:4;
+	uint32_t prio:4;
+	uint32_t transport_domain:24;
+};
+
+/* SQ attributes structure, used by SQ create operation. */
+struct mlx5_devx_create_sq_attr {
+	uint32_t rlky:1;
+	uint32_t cd_master:1;
+	uint32_t fre:1;
+	uint32_t flush_in_error_en:1;
+	uint32_t allow_multi_pkt_send_wqe:1;
+	uint32_t min_wqe_inline_mode:3;
+	uint32_t state:4;
+	uint32_t reg_umr:1;
+	uint32_t allow_swp:1;
+	uint32_t hairpin:1;
+	uint32_t user_index:24;
+	uint32_t cqn:24;
+	uint32_t packet_pacing_rate_limit_index:16;
+	uint32_t tis_lst_sz:16;
+	uint32_t tis_num:24;
+	struct mlx5_devx_wq_attr wq_attr;
+};
+
+/* SQ attributes structure, used by SQ modify operation. */
+struct mlx5_devx_modify_sq_attr {
+	uint32_t sq_state:4;
+	uint32_t state:4;
+	uint32_t hairpin_peer_rq:24;
+	uint32_t hairpin_peer_vhca:16;
+};
+
+/* mlx5_devx_cmds.c */
+
+struct mlx5_devx_obj *mlx5_devx_cmd_flow_counter_alloc(struct ibv_context *ctx,
+						       uint32_t bulk_sz);
+int mlx5_devx_cmd_destroy(struct mlx5_devx_obj *obj);
+int mlx5_devx_cmd_flow_counter_query(struct mlx5_devx_obj *dcs,
+				     int clear, uint32_t n_counters,
+				     uint64_t *pkts, uint64_t *bytes,
+				     uint32_t mkey, void *addr,
+				     struct mlx5dv_devx_cmd_comp *cmd_comp,
+				     uint64_t async_id);
+int mlx5_devx_cmd_query_hca_attr(struct ibv_context *ctx,
+				 struct mlx5_hca_attr *attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_mkey_create(struct ibv_context *ctx,
+					      struct mlx5_devx_mkey_attr *attr);
+int mlx5_devx_get_out_command_status(void *out);
+int mlx5_devx_cmd_qp_query_tis_td(struct ibv_qp *qp, uint32_t tis_num,
+				  uint32_t *tis_td);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_rq(struct ibv_context *ctx,
+				       struct mlx5_devx_create_rq_attr *rq_attr,
+				       int socket);
+int mlx5_devx_cmd_modify_rq(struct mlx5_devx_obj *rq,
+			    struct mlx5_devx_modify_rq_attr *rq_attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_tir(struct ibv_context *ctx,
+					   struct mlx5_devx_tir_attr *tir_attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_rqt(struct ibv_context *ctx,
+					   struct mlx5_devx_rqt_attr *rqt_attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_sq(struct ibv_context *ctx,
+				      struct mlx5_devx_create_sq_attr *sq_attr);
+int mlx5_devx_cmd_modify_sq(struct mlx5_devx_obj *sq,
+			    struct mlx5_devx_modify_sq_attr *sq_attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_tis(struct ibv_context *ctx,
+					   struct mlx5_devx_tis_attr *tis_attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_td(struct ibv_context *ctx);
+int mlx5_devx_cmd_flow_dump(void *fdb_domain, void *rx_domain, void *tx_domain,
+			    FILE *file);
+#endif /* RTE_PMD_MLX5_DEVX_CMDS_H_ */
diff --git a/drivers/common/mlx5/mlx5_glue.c b/drivers/common/mlx5/mlx5_glue.c
new file mode 100644
index 0000000..d5bc84e
--- /dev/null
+++ b/drivers/common/mlx5/mlx5_glue.c
@@ -0,0 +1,1138 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2018 6WIND S.A.
+ * Copyright 2018 Mellanox Technologies, Ltd
+ */
+
+#include <errno.h>
+#include <stdalign.h>
+#include <stddef.h>
+#include <stdint.h>
+#include <stdlib.h>
+/*
+ * Not needed by this file; included to work around the lack of off_t
+ * definition for mlx5dv.h with unpatched rdma-core versions.
+ */
+#include <sys/types.h>
+
+#include <rte_config.h>
+
+#include "mlx5_glue.h"
+
+static int
+mlx5_glue_fork_init(void)
+{
+	return ibv_fork_init();
+}
+
+static struct ibv_pd *
+mlx5_glue_alloc_pd(struct ibv_context *context)
+{
+	return ibv_alloc_pd(context);
+}
+
+static int
+mlx5_glue_dealloc_pd(struct ibv_pd *pd)
+{
+	return ibv_dealloc_pd(pd);
+}
+
+static struct ibv_device **
+mlx5_glue_get_device_list(int *num_devices)
+{
+	return ibv_get_device_list(num_devices);
+}
+
+static void
+mlx5_glue_free_device_list(struct ibv_device **list)
+{
+	ibv_free_device_list(list);
+}
+
+static struct ibv_context *
+mlx5_glue_open_device(struct ibv_device *device)
+{
+	return ibv_open_device(device);
+}
+
+static int
+mlx5_glue_close_device(struct ibv_context *context)
+{
+	return ibv_close_device(context);
+}
+
+static int
+mlx5_glue_query_device(struct ibv_context *context,
+		       struct ibv_device_attr *device_attr)
+{
+	return ibv_query_device(context, device_attr);
+}
+
+static int
+mlx5_glue_query_device_ex(struct ibv_context *context,
+			  const struct ibv_query_device_ex_input *input,
+			  struct ibv_device_attr_ex *attr)
+{
+	return ibv_query_device_ex(context, input, attr);
+}
+
+static int
+mlx5_glue_query_rt_values_ex(struct ibv_context *context,
+			  struct ibv_values_ex *values)
+{
+	return ibv_query_rt_values_ex(context, values);
+}
+
+static int
+mlx5_glue_query_port(struct ibv_context *context, uint8_t port_num,
+		     struct ibv_port_attr *port_attr)
+{
+	return ibv_query_port(context, port_num, port_attr);
+}
+
+static struct ibv_comp_channel *
+mlx5_glue_create_comp_channel(struct ibv_context *context)
+{
+	return ibv_create_comp_channel(context);
+}
+
+static int
+mlx5_glue_destroy_comp_channel(struct ibv_comp_channel *channel)
+{
+	return ibv_destroy_comp_channel(channel);
+}
+
+static struct ibv_cq *
+mlx5_glue_create_cq(struct ibv_context *context, int cqe, void *cq_context,
+		    struct ibv_comp_channel *channel, int comp_vector)
+{
+	return ibv_create_cq(context, cqe, cq_context, channel, comp_vector);
+}
+
+static int
+mlx5_glue_destroy_cq(struct ibv_cq *cq)
+{
+	return ibv_destroy_cq(cq);
+}
+
+static int
+mlx5_glue_get_cq_event(struct ibv_comp_channel *channel, struct ibv_cq **cq,
+		       void **cq_context)
+{
+	return ibv_get_cq_event(channel, cq, cq_context);
+}
+
+static void
+mlx5_glue_ack_cq_events(struct ibv_cq *cq, unsigned int nevents)
+{
+	ibv_ack_cq_events(cq, nevents);
+}
+
+static struct ibv_rwq_ind_table *
+mlx5_glue_create_rwq_ind_table(struct ibv_context *context,
+			       struct ibv_rwq_ind_table_init_attr *init_attr)
+{
+	return ibv_create_rwq_ind_table(context, init_attr);
+}
+
+static int
+mlx5_glue_destroy_rwq_ind_table(struct ibv_rwq_ind_table *rwq_ind_table)
+{
+	return ibv_destroy_rwq_ind_table(rwq_ind_table);
+}
+
+static struct ibv_wq *
+mlx5_glue_create_wq(struct ibv_context *context,
+		    struct ibv_wq_init_attr *wq_init_attr)
+{
+	return ibv_create_wq(context, wq_init_attr);
+}
+
+static int
+mlx5_glue_destroy_wq(struct ibv_wq *wq)
+{
+	return ibv_destroy_wq(wq);
+}
+static int
+mlx5_glue_modify_wq(struct ibv_wq *wq, struct ibv_wq_attr *wq_attr)
+{
+	return ibv_modify_wq(wq, wq_attr);
+}
+
+static struct ibv_flow *
+mlx5_glue_create_flow(struct ibv_qp *qp, struct ibv_flow_attr *flow)
+{
+	return ibv_create_flow(qp, flow);
+}
+
+static int
+mlx5_glue_destroy_flow(struct ibv_flow *flow_id)
+{
+	return ibv_destroy_flow(flow_id);
+}
+
+static int
+mlx5_glue_destroy_flow_action(void *action)
+{
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+#ifdef HAVE_MLX5DV_DR
+	return mlx5dv_dr_action_destroy(action);
+#else
+	struct mlx5dv_flow_action_attr *attr = action;
+	int res = 0;
+	switch (attr->type) {
+	case MLX5DV_FLOW_ACTION_TAG:
+		break;
+	default:
+		res = ibv_destroy_flow_action(attr->action);
+		break;
+	}
+	free(action);
+	return res;
+#endif
+#else
+	(void)action;
+	return ENOTSUP;
+#endif
+}
+
+static struct ibv_qp *
+mlx5_glue_create_qp(struct ibv_pd *pd, struct ibv_qp_init_attr *qp_init_attr)
+{
+	return ibv_create_qp(pd, qp_init_attr);
+}
+
+static struct ibv_qp *
+mlx5_glue_create_qp_ex(struct ibv_context *context,
+		       struct ibv_qp_init_attr_ex *qp_init_attr_ex)
+{
+	return ibv_create_qp_ex(context, qp_init_attr_ex);
+}
+
+static int
+mlx5_glue_destroy_qp(struct ibv_qp *qp)
+{
+	return ibv_destroy_qp(qp);
+}
+
+static int
+mlx5_glue_modify_qp(struct ibv_qp *qp, struct ibv_qp_attr *attr, int attr_mask)
+{
+	return ibv_modify_qp(qp, attr, attr_mask);
+}
+
+static struct ibv_mr *
+mlx5_glue_reg_mr(struct ibv_pd *pd, void *addr, size_t length, int access)
+{
+	return ibv_reg_mr(pd, addr, length, access);
+}
+
+static int
+mlx5_glue_dereg_mr(struct ibv_mr *mr)
+{
+	return ibv_dereg_mr(mr);
+}
+
+static struct ibv_counter_set *
+mlx5_glue_create_counter_set(struct ibv_context *context,
+			     struct ibv_counter_set_init_attr *init_attr)
+{
+#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V42
+	(void)context;
+	(void)init_attr;
+	return NULL;
+#else
+	return ibv_create_counter_set(context, init_attr);
+#endif
+}
+
+static int
+mlx5_glue_destroy_counter_set(struct ibv_counter_set *cs)
+{
+#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V42
+	(void)cs;
+	return ENOTSUP;
+#else
+	return ibv_destroy_counter_set(cs);
+#endif
+}
+
+static int
+mlx5_glue_describe_counter_set(struct ibv_context *context,
+			       uint16_t counter_set_id,
+			       struct ibv_counter_set_description *cs_desc)
+{
+#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V42
+	(void)context;
+	(void)counter_set_id;
+	(void)cs_desc;
+	return ENOTSUP;
+#else
+	return ibv_describe_counter_set(context, counter_set_id, cs_desc);
+#endif
+}
+
+static int
+mlx5_glue_query_counter_set(struct ibv_query_counter_set_attr *query_attr,
+			    struct ibv_counter_set_data *cs_data)
+{
+#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V42
+	(void)query_attr;
+	(void)cs_data;
+	return ENOTSUP;
+#else
+	return ibv_query_counter_set(query_attr, cs_data);
+#endif
+}
+
+static struct ibv_counters *
+mlx5_glue_create_counters(struct ibv_context *context,
+			  struct ibv_counters_init_attr *init_attr)
+{
+#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V45
+	(void)context;
+	(void)init_attr;
+	errno = ENOTSUP;
+	return NULL;
+#else
+	return ibv_create_counters(context, init_attr);
+#endif
+}
+
+static int
+mlx5_glue_destroy_counters(struct ibv_counters *counters)
+{
+#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V45
+	(void)counters;
+	return ENOTSUP;
+#else
+	return ibv_destroy_counters(counters);
+#endif
+}
+
+static int
+mlx5_glue_attach_counters(struct ibv_counters *counters,
+			  struct ibv_counter_attach_attr *attr,
+			  struct ibv_flow *flow)
+{
+#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V45
+	(void)counters;
+	(void)attr;
+	(void)flow;
+	return ENOTSUP;
+#else
+	return ibv_attach_counters_point_flow(counters, attr, flow);
+#endif
+}
+
+static int
+mlx5_glue_query_counters(struct ibv_counters *counters,
+			 uint64_t *counters_value,
+			 uint32_t ncounters,
+			 uint32_t flags)
+{
+#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V45
+	(void)counters;
+	(void)counters_value;
+	(void)ncounters;
+	(void)flags;
+	return ENOTSUP;
+#else
+	return ibv_read_counters(counters, counters_value, ncounters, flags);
+#endif
+}
+
+static void
+mlx5_glue_ack_async_event(struct ibv_async_event *event)
+{
+	ibv_ack_async_event(event);
+}
+
+static int
+mlx5_glue_get_async_event(struct ibv_context *context,
+			  struct ibv_async_event *event)
+{
+	return ibv_get_async_event(context, event);
+}
+
+static const char *
+mlx5_glue_port_state_str(enum ibv_port_state port_state)
+{
+	return ibv_port_state_str(port_state);
+}
+
+static struct ibv_cq *
+mlx5_glue_cq_ex_to_cq(struct ibv_cq_ex *cq)
+{
+	return ibv_cq_ex_to_cq(cq);
+}
+
+static void *
+mlx5_glue_dr_create_flow_action_dest_flow_tbl(void *tbl)
+{
+#ifdef HAVE_MLX5DV_DR
+	return mlx5dv_dr_action_create_dest_table(tbl);
+#else
+	(void)tbl;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static void *
+mlx5_glue_dr_create_flow_action_dest_port(void *domain, uint32_t port)
+{
+#ifdef HAVE_MLX5DV_DR_DEVX_PORT
+	return mlx5dv_dr_action_create_dest_ib_port(domain, port);
+#else
+#ifdef HAVE_MLX5DV_DR_ESWITCH
+	return mlx5dv_dr_action_create_dest_vport(domain, port);
+#else
+	(void)domain;
+	(void)port;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+#endif
+}
+
+static void *
+mlx5_glue_dr_create_flow_action_drop(void)
+{
+#ifdef HAVE_MLX5DV_DR_ESWITCH
+	return mlx5dv_dr_action_create_drop();
+#else
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static void *
+mlx5_glue_dr_create_flow_action_push_vlan(struct mlx5dv_dr_domain *domain,
+					  rte_be32_t vlan_tag)
+{
+#ifdef HAVE_MLX5DV_DR_VLAN
+	return mlx5dv_dr_action_create_push_vlan(domain, vlan_tag);
+#else
+	(void)domain;
+	(void)vlan_tag;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static void *
+mlx5_glue_dr_create_flow_action_pop_vlan(void)
+{
+#ifdef HAVE_MLX5DV_DR_VLAN
+	return mlx5dv_dr_action_create_pop_vlan();
+#else
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static void *
+mlx5_glue_dr_create_flow_tbl(void *domain, uint32_t level)
+{
+#ifdef HAVE_MLX5DV_DR
+	return mlx5dv_dr_table_create(domain, level);
+#else
+	(void)domain;
+	(void)level;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static int
+mlx5_glue_dr_destroy_flow_tbl(void *tbl)
+{
+#ifdef HAVE_MLX5DV_DR
+	return mlx5dv_dr_table_destroy(tbl);
+#else
+	(void)tbl;
+	errno = ENOTSUP;
+	return errno;
+#endif
+}
+
+static void *
+mlx5_glue_dr_create_domain(struct ibv_context *ctx,
+			   enum mlx5dv_dr_domain_type domain)
+{
+#ifdef HAVE_MLX5DV_DR
+	return mlx5dv_dr_domain_create(ctx, domain);
+#else
+	(void)ctx;
+	(void)domain;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static int
+mlx5_glue_dr_destroy_domain(void *domain)
+{
+#ifdef HAVE_MLX5DV_DR
+	return mlx5dv_dr_domain_destroy(domain);
+#else
+	(void)domain;
+	errno = ENOTSUP;
+	return errno;
+#endif
+}
+
+static struct ibv_cq_ex *
+mlx5_glue_dv_create_cq(struct ibv_context *context,
+		       struct ibv_cq_init_attr_ex *cq_attr,
+		       struct mlx5dv_cq_init_attr *mlx5_cq_attr)
+{
+	return mlx5dv_create_cq(context, cq_attr, mlx5_cq_attr);
+}
+
+static struct ibv_wq *
+mlx5_glue_dv_create_wq(struct ibv_context *context,
+		       struct ibv_wq_init_attr *wq_attr,
+		       struct mlx5dv_wq_init_attr *mlx5_wq_attr)
+{
+#ifndef HAVE_IBV_DEVICE_STRIDING_RQ_SUPPORT
+	(void)context;
+	(void)wq_attr;
+	(void)mlx5_wq_attr;
+	errno = ENOTSUP;
+	return NULL;
+#else
+	return mlx5dv_create_wq(context, wq_attr, mlx5_wq_attr);
+#endif
+}
+
+static int
+mlx5_glue_dv_query_device(struct ibv_context *ctx,
+			  struct mlx5dv_context *attrs_out)
+{
+	return mlx5dv_query_device(ctx, attrs_out);
+}
+
+static int
+mlx5_glue_dv_set_context_attr(struct ibv_context *ibv_ctx,
+			      enum mlx5dv_set_ctx_attr_type type, void *attr)
+{
+	return mlx5dv_set_context_attr(ibv_ctx, type, attr);
+}
+
+static int
+mlx5_glue_dv_init_obj(struct mlx5dv_obj *obj, uint64_t obj_type)
+{
+	return mlx5dv_init_obj(obj, obj_type);
+}
+
+static struct ibv_qp *
+mlx5_glue_dv_create_qp(struct ibv_context *context,
+		       struct ibv_qp_init_attr_ex *qp_init_attr_ex,
+		       struct mlx5dv_qp_init_attr *dv_qp_init_attr)
+{
+#ifdef HAVE_IBV_DEVICE_TUNNEL_SUPPORT
+	return mlx5dv_create_qp(context, qp_init_attr_ex, dv_qp_init_attr);
+#else
+	(void)context;
+	(void)qp_init_attr_ex;
+	(void)dv_qp_init_attr;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static void *
+mlx5_glue_dv_create_flow_matcher(struct ibv_context *context,
+				 struct mlx5dv_flow_matcher_attr *matcher_attr,
+				 void *tbl)
+{
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+#ifdef HAVE_MLX5DV_DR
+	(void)context;
+	return mlx5dv_dr_matcher_create(tbl, matcher_attr->priority,
+					matcher_attr->match_criteria_enable,
+					matcher_attr->match_mask);
+#else
+	(void)tbl;
+	return mlx5dv_create_flow_matcher(context, matcher_attr);
+#endif
+#else
+	(void)context;
+	(void)matcher_attr;
+	(void)tbl;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static void *
+mlx5_glue_dv_create_flow(void *matcher,
+			 void *match_value,
+			 size_t num_actions,
+			 void *actions[])
+{
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+#ifdef HAVE_MLX5DV_DR
+	return mlx5dv_dr_rule_create(matcher, match_value, num_actions,
+				     (struct mlx5dv_dr_action **)actions);
+#else
+	struct mlx5dv_flow_action_attr actions_attr[8];
+
+	if (num_actions > 8)
+		return NULL;
+	for (size_t i = 0; i < num_actions; i++)
+		actions_attr[i] =
+			*((struct mlx5dv_flow_action_attr *)(actions[i]));
+	return mlx5dv_create_flow(matcher, match_value,
+				  num_actions, actions_attr);
+#endif
+#else
+	(void)matcher;
+	(void)match_value;
+	(void)num_actions;
+	(void)actions;
+	return NULL;
+#endif
+}
+
+static void *
+mlx5_glue_dv_create_flow_action_counter(void *counter_obj, uint32_t offset)
+{
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+#ifdef HAVE_MLX5DV_DR
+	return mlx5dv_dr_action_create_flow_counter(counter_obj, offset);
+#else
+	struct mlx5dv_flow_action_attr *action;
+
+	(void)offset;
+	action = malloc(sizeof(*action));
+	if (!action)
+		return NULL;
+	action->type = MLX5DV_FLOW_ACTION_COUNTERS_DEVX;
+	action->obj = counter_obj;
+	return action;
+#endif
+#else
+	(void)counter_obj;
+	(void)offset;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static void *
+mlx5_glue_dv_create_flow_action_dest_ibv_qp(void *qp)
+{
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+#ifdef HAVE_MLX5DV_DR
+	return mlx5dv_dr_action_create_dest_ibv_qp(qp);
+#else
+	struct mlx5dv_flow_action_attr *action;
+
+	action = malloc(sizeof(*action));
+	if (!action)
+		return NULL;
+	action->type = MLX5DV_FLOW_ACTION_DEST_IBV_QP;
+	action->obj = qp;
+	return action;
+#endif
+#else
+	(void)qp;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static void *
+mlx5_glue_dv_create_flow_action_dest_devx_tir(void *tir)
+{
+#ifdef HAVE_MLX5DV_DR_ACTION_DEST_DEVX_TIR
+	return mlx5dv_dr_action_create_dest_devx_tir(tir);
+#else
+	(void)tir;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static void *
+mlx5_glue_dv_create_flow_action_modify_header
+					(struct ibv_context *ctx,
+					 enum mlx5dv_flow_table_type ft_type,
+					 void *domain, uint64_t flags,
+					 size_t actions_sz,
+					 uint64_t actions[])
+{
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+#ifdef HAVE_MLX5DV_DR
+	(void)ctx;
+	(void)ft_type;
+	return mlx5dv_dr_action_create_modify_header(domain, flags, actions_sz,
+						     (__be64 *)actions);
+#else
+	struct mlx5dv_flow_action_attr *action;
+
+	(void)domain;
+	(void)flags;
+	action = malloc(sizeof(*action));
+	if (!action)
+		return NULL;
+	action->type = MLX5DV_FLOW_ACTION_IBV_FLOW_ACTION;
+	action->action = mlx5dv_create_flow_action_modify_header
+		(ctx, actions_sz, actions, ft_type);
+	return action;
+#endif
+#else
+	(void)ctx;
+	(void)ft_type;
+	(void)domain;
+	(void)flags;
+	(void)actions_sz;
+	(void)actions;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static void *
+mlx5_glue_dv_create_flow_action_packet_reformat
+		(struct ibv_context *ctx,
+		 enum mlx5dv_flow_action_packet_reformat_type reformat_type,
+		 enum mlx5dv_flow_table_type ft_type,
+		 struct mlx5dv_dr_domain *domain,
+		 uint32_t flags, size_t data_sz, void *data)
+{
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+#ifdef HAVE_MLX5DV_DR
+	(void)ctx;
+	(void)ft_type;
+	return mlx5dv_dr_action_create_packet_reformat(domain, flags,
+						       reformat_type, data_sz,
+						       data);
+#else
+	(void)domain;
+	(void)flags;
+	struct mlx5dv_flow_action_attr *action;
+
+	action = malloc(sizeof(*action));
+	if (!action)
+		return NULL;
+	action->type = MLX5DV_FLOW_ACTION_IBV_FLOW_ACTION;
+	action->action = mlx5dv_create_flow_action_packet_reformat
+		(ctx, data_sz, data, reformat_type, ft_type);
+	return action;
+#endif
+#else
+	(void)ctx;
+	(void)reformat_type;
+	(void)ft_type;
+	(void)domain;
+	(void)flags;
+	(void)data_sz;
+	(void)data;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static void *
+mlx5_glue_dv_create_flow_action_tag(uint32_t tag)
+{
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+#ifdef HAVE_MLX5DV_DR
+	return mlx5dv_dr_action_create_tag(tag);
+#else
+	struct mlx5dv_flow_action_attr *action;
+	action = malloc(sizeof(*action));
+	if (!action)
+		return NULL;
+	action->type = MLX5DV_FLOW_ACTION_TAG;
+	action->tag_value = tag;
+	return action;
+#endif
+#endif
+	(void)tag;
+	errno = ENOTSUP;
+	return NULL;
+}
+
+static void *
+mlx5_glue_dv_create_flow_action_meter(struct mlx5dv_dr_flow_meter_attr *attr)
+{
+#if defined(HAVE_MLX5DV_DR) && defined(HAVE_MLX5_DR_CREATE_ACTION_FLOW_METER)
+	return mlx5dv_dr_action_create_flow_meter(attr);
+#else
+	(void)attr;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static int
+mlx5_glue_dv_modify_flow_action_meter(void *action,
+				      struct mlx5dv_dr_flow_meter_attr *attr,
+				      uint64_t modify_bits)
+{
+#if defined(HAVE_MLX5DV_DR) && defined(HAVE_MLX5_DR_CREATE_ACTION_FLOW_METER)
+	return mlx5dv_dr_action_modify_flow_meter(action, attr, modify_bits);
+#else
+	(void)action;
+	(void)attr;
+	(void)modify_bits;
+	errno = ENOTSUP;
+	return errno;
+#endif
+}
+
+static int
+mlx5_glue_dv_destroy_flow(void *flow_id)
+{
+#ifdef HAVE_MLX5DV_DR
+	return mlx5dv_dr_rule_destroy(flow_id);
+#else
+	return ibv_destroy_flow(flow_id);
+#endif
+}
+
+static int
+mlx5_glue_dv_destroy_flow_matcher(void *matcher)
+{
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+#ifdef HAVE_MLX5DV_DR
+	return mlx5dv_dr_matcher_destroy(matcher);
+#else
+	return mlx5dv_destroy_flow_matcher(matcher);
+#endif
+#else
+	(void)matcher;
+	errno = ENOTSUP;
+	return errno;
+#endif
+}
+
+static struct ibv_context *
+mlx5_glue_dv_open_device(struct ibv_device *device)
+{
+#ifdef HAVE_IBV_DEVX_OBJ
+	return mlx5dv_open_device(device,
+				  &(struct mlx5dv_context_attr){
+					.flags = MLX5DV_CONTEXT_FLAGS_DEVX,
+				  });
+#else
+	(void)device;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static struct mlx5dv_devx_obj *
+mlx5_glue_devx_obj_create(struct ibv_context *ctx,
+			  const void *in, size_t inlen,
+			  void *out, size_t outlen)
+{
+#ifdef HAVE_IBV_DEVX_OBJ
+	return mlx5dv_devx_obj_create(ctx, in, inlen, out, outlen);
+#else
+	(void)ctx;
+	(void)in;
+	(void)inlen;
+	(void)out;
+	(void)outlen;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static int
+mlx5_glue_devx_obj_destroy(struct mlx5dv_devx_obj *obj)
+{
+#ifdef HAVE_IBV_DEVX_OBJ
+	return mlx5dv_devx_obj_destroy(obj);
+#else
+	(void)obj;
+	return -ENOTSUP;
+#endif
+}
+
+static int
+mlx5_glue_devx_obj_query(struct mlx5dv_devx_obj *obj,
+			 const void *in, size_t inlen,
+			 void *out, size_t outlen)
+{
+#ifdef HAVE_IBV_DEVX_OBJ
+	return mlx5dv_devx_obj_query(obj, in, inlen, out, outlen);
+#else
+	(void)obj;
+	(void)in;
+	(void)inlen;
+	(void)out;
+	(void)outlen;
+	return -ENOTSUP;
+#endif
+}
+
+static int
+mlx5_glue_devx_obj_modify(struct mlx5dv_devx_obj *obj,
+			  const void *in, size_t inlen,
+			  void *out, size_t outlen)
+{
+#ifdef HAVE_IBV_DEVX_OBJ
+	return mlx5dv_devx_obj_modify(obj, in, inlen, out, outlen);
+#else
+	(void)obj;
+	(void)in;
+	(void)inlen;
+	(void)out;
+	(void)outlen;
+	return -ENOTSUP;
+#endif
+}
+
+static int
+mlx5_glue_devx_general_cmd(struct ibv_context *ctx,
+			   const void *in, size_t inlen,
+			   void *out, size_t outlen)
+{
+#ifdef HAVE_IBV_DEVX_OBJ
+	return mlx5dv_devx_general_cmd(ctx, in, inlen, out, outlen);
+#else
+	(void)ctx;
+	(void)in;
+	(void)inlen;
+	(void)out;
+	(void)outlen;
+	return -ENOTSUP;
+#endif
+}
+
+static struct mlx5dv_devx_cmd_comp *
+mlx5_glue_devx_create_cmd_comp(struct ibv_context *ctx)
+{
+#ifdef HAVE_IBV_DEVX_ASYNC
+	return mlx5dv_devx_create_cmd_comp(ctx);
+#else
+	(void)ctx;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static void
+mlx5_glue_devx_destroy_cmd_comp(struct mlx5dv_devx_cmd_comp *cmd_comp)
+{
+#ifdef HAVE_IBV_DEVX_ASYNC
+	mlx5dv_devx_destroy_cmd_comp(cmd_comp);
+#else
+	(void)cmd_comp;
+	errno = ENOTSUP;
+#endif
+}
+
+static int
+mlx5_glue_devx_obj_query_async(struct mlx5dv_devx_obj *obj, const void *in,
+			       size_t inlen, size_t outlen, uint64_t wr_id,
+			       struct mlx5dv_devx_cmd_comp *cmd_comp)
+{
+#ifdef HAVE_IBV_DEVX_ASYNC
+	return mlx5dv_devx_obj_query_async(obj, in, inlen, outlen, wr_id,
+					   cmd_comp);
+#else
+	(void)obj;
+	(void)in;
+	(void)inlen;
+	(void)outlen;
+	(void)wr_id;
+	(void)cmd_comp;
+	return -ENOTSUP;
+#endif
+}
+
+static int
+mlx5_glue_devx_get_async_cmd_comp(struct mlx5dv_devx_cmd_comp *cmd_comp,
+				  struct mlx5dv_devx_async_cmd_hdr *cmd_resp,
+				  size_t cmd_resp_len)
+{
+#ifdef HAVE_IBV_DEVX_ASYNC
+	return mlx5dv_devx_get_async_cmd_comp(cmd_comp, cmd_resp,
+					      cmd_resp_len);
+#else
+	(void)cmd_comp;
+	(void)cmd_resp;
+	(void)cmd_resp_len;
+	return -ENOTSUP;
+#endif
+}
+
+static struct mlx5dv_devx_umem *
+mlx5_glue_devx_umem_reg(struct ibv_context *context, void *addr, size_t size,
+			uint32_t access)
+{
+#ifdef HAVE_IBV_DEVX_OBJ
+	return mlx5dv_devx_umem_reg(context, addr, size, access);
+#else
+	(void)context;
+	(void)addr;
+	(void)size;
+	(void)access;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static int
+mlx5_glue_devx_umem_dereg(struct mlx5dv_devx_umem *dv_devx_umem)
+{
+#ifdef HAVE_IBV_DEVX_OBJ
+	return mlx5dv_devx_umem_dereg(dv_devx_umem);
+#else
+	(void)dv_devx_umem;
+	return -ENOTSUP;
+#endif
+}
+
+static int
+mlx5_glue_devx_qp_query(struct ibv_qp *qp,
+			const void *in, size_t inlen,
+			void *out, size_t outlen)
+{
+#ifdef HAVE_IBV_DEVX_OBJ
+	return mlx5dv_devx_qp_query(qp, in, inlen, out, outlen);
+#else
+	(void)qp;
+	(void)in;
+	(void)inlen;
+	(void)out;
+	(void)outlen;
+	errno = ENOTSUP;
+	return errno;
+#endif
+}
+
+static int
+mlx5_glue_devx_port_query(struct ibv_context *ctx,
+			  uint32_t port_num,
+			  struct mlx5dv_devx_port *mlx5_devx_port)
+{
+#ifdef HAVE_MLX5DV_DR_DEVX_PORT
+	return mlx5dv_query_devx_port(ctx, port_num, mlx5_devx_port);
+#else
+	(void)ctx;
+	(void)port_num;
+	(void)mlx5_devx_port;
+	errno = ENOTSUP;
+	return errno;
+#endif
+}
+
+static int
+mlx5_glue_dr_dump_domain(FILE *file, void *domain)
+{
+#ifdef HAVE_MLX5_DR_FLOW_DUMP
+	return mlx5dv_dump_dr_domain(file, domain);
+#else
+	RTE_SET_USED(file);
+	RTE_SET_USED(domain);
+	return -ENOTSUP;
+#endif
+}
+
+alignas(RTE_CACHE_LINE_SIZE)
+const struct mlx5_glue *mlx5_glue = &(const struct mlx5_glue){
+	.version = MLX5_GLUE_VERSION,
+	.fork_init = mlx5_glue_fork_init,
+	.alloc_pd = mlx5_glue_alloc_pd,
+	.dealloc_pd = mlx5_glue_dealloc_pd,
+	.get_device_list = mlx5_glue_get_device_list,
+	.free_device_list = mlx5_glue_free_device_list,
+	.open_device = mlx5_glue_open_device,
+	.close_device = mlx5_glue_close_device,
+	.query_device = mlx5_glue_query_device,
+	.query_device_ex = mlx5_glue_query_device_ex,
+	.query_rt_values_ex = mlx5_glue_query_rt_values_ex,
+	.query_port = mlx5_glue_query_port,
+	.create_comp_channel = mlx5_glue_create_comp_channel,
+	.destroy_comp_channel = mlx5_glue_destroy_comp_channel,
+	.create_cq = mlx5_glue_create_cq,
+	.destroy_cq = mlx5_glue_destroy_cq,
+	.get_cq_event = mlx5_glue_get_cq_event,
+	.ack_cq_events = mlx5_glue_ack_cq_events,
+	.create_rwq_ind_table = mlx5_glue_create_rwq_ind_table,
+	.destroy_rwq_ind_table = mlx5_glue_destroy_rwq_ind_table,
+	.create_wq = mlx5_glue_create_wq,
+	.destroy_wq = mlx5_glue_destroy_wq,
+	.modify_wq = mlx5_glue_modify_wq,
+	.create_flow = mlx5_glue_create_flow,
+	.destroy_flow = mlx5_glue_destroy_flow,
+	.destroy_flow_action = mlx5_glue_destroy_flow_action,
+	.create_qp = mlx5_glue_create_qp,
+	.create_qp_ex = mlx5_glue_create_qp_ex,
+	.destroy_qp = mlx5_glue_destroy_qp,
+	.modify_qp = mlx5_glue_modify_qp,
+	.reg_mr = mlx5_glue_reg_mr,
+	.dereg_mr = mlx5_glue_dereg_mr,
+	.create_counter_set = mlx5_glue_create_counter_set,
+	.destroy_counter_set = mlx5_glue_destroy_counter_set,
+	.describe_counter_set = mlx5_glue_describe_counter_set,
+	.query_counter_set = mlx5_glue_query_counter_set,
+	.create_counters = mlx5_glue_create_counters,
+	.destroy_counters = mlx5_glue_destroy_counters,
+	.attach_counters = mlx5_glue_attach_counters,
+	.query_counters = mlx5_glue_query_counters,
+	.ack_async_event = mlx5_glue_ack_async_event,
+	.get_async_event = mlx5_glue_get_async_event,
+	.port_state_str = mlx5_glue_port_state_str,
+	.cq_ex_to_cq = mlx5_glue_cq_ex_to_cq,
+	.dr_create_flow_action_dest_flow_tbl =
+		mlx5_glue_dr_create_flow_action_dest_flow_tbl,
+	.dr_create_flow_action_dest_port =
+		mlx5_glue_dr_create_flow_action_dest_port,
+	.dr_create_flow_action_drop =
+		mlx5_glue_dr_create_flow_action_drop,
+	.dr_create_flow_action_push_vlan =
+		mlx5_glue_dr_create_flow_action_push_vlan,
+	.dr_create_flow_action_pop_vlan =
+		mlx5_glue_dr_create_flow_action_pop_vlan,
+	.dr_create_flow_tbl = mlx5_glue_dr_create_flow_tbl,
+	.dr_destroy_flow_tbl = mlx5_glue_dr_destroy_flow_tbl,
+	.dr_create_domain = mlx5_glue_dr_create_domain,
+	.dr_destroy_domain = mlx5_glue_dr_destroy_domain,
+	.dv_create_cq = mlx5_glue_dv_create_cq,
+	.dv_create_wq = mlx5_glue_dv_create_wq,
+	.dv_query_device = mlx5_glue_dv_query_device,
+	.dv_set_context_attr = mlx5_glue_dv_set_context_attr,
+	.dv_init_obj = mlx5_glue_dv_init_obj,
+	.dv_create_qp = mlx5_glue_dv_create_qp,
+	.dv_create_flow_matcher = mlx5_glue_dv_create_flow_matcher,
+	.dv_create_flow = mlx5_glue_dv_create_flow,
+	.dv_create_flow_action_counter =
+		mlx5_glue_dv_create_flow_action_counter,
+	.dv_create_flow_action_dest_ibv_qp =
+		mlx5_glue_dv_create_flow_action_dest_ibv_qp,
+	.dv_create_flow_action_dest_devx_tir =
+		mlx5_glue_dv_create_flow_action_dest_devx_tir,
+	.dv_create_flow_action_modify_header =
+		mlx5_glue_dv_create_flow_action_modify_header,
+	.dv_create_flow_action_packet_reformat =
+		mlx5_glue_dv_create_flow_action_packet_reformat,
+	.dv_create_flow_action_tag = mlx5_glue_dv_create_flow_action_tag,
+	.dv_create_flow_action_meter = mlx5_glue_dv_create_flow_action_meter,
+	.dv_modify_flow_action_meter = mlx5_glue_dv_modify_flow_action_meter,
+	.dv_destroy_flow = mlx5_glue_dv_destroy_flow,
+	.dv_destroy_flow_matcher = mlx5_glue_dv_destroy_flow_matcher,
+	.dv_open_device = mlx5_glue_dv_open_device,
+	.devx_obj_create = mlx5_glue_devx_obj_create,
+	.devx_obj_destroy = mlx5_glue_devx_obj_destroy,
+	.devx_obj_query = mlx5_glue_devx_obj_query,
+	.devx_obj_modify = mlx5_glue_devx_obj_modify,
+	.devx_general_cmd = mlx5_glue_devx_general_cmd,
+	.devx_create_cmd_comp = mlx5_glue_devx_create_cmd_comp,
+	.devx_destroy_cmd_comp = mlx5_glue_devx_destroy_cmd_comp,
+	.devx_obj_query_async = mlx5_glue_devx_obj_query_async,
+	.devx_get_async_cmd_comp = mlx5_glue_devx_get_async_cmd_comp,
+	.devx_umem_reg = mlx5_glue_devx_umem_reg,
+	.devx_umem_dereg = mlx5_glue_devx_umem_dereg,
+	.devx_qp_query = mlx5_glue_devx_qp_query,
+	.devx_port_query = mlx5_glue_devx_port_query,
+	.dr_dump_domain = mlx5_glue_dr_dump_domain,
+};
diff --git a/drivers/common/mlx5/mlx5_glue.h b/drivers/common/mlx5/mlx5_glue.h
new file mode 100644
index 0000000..f4c3180
--- /dev/null
+++ b/drivers/common/mlx5/mlx5_glue.h
@@ -0,0 +1,265 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2018 6WIND S.A.
+ * Copyright 2018 Mellanox Technologies, Ltd
+ */
+
+#ifndef MLX5_GLUE_H_
+#define MLX5_GLUE_H_
+
+#include <stddef.h>
+#include <stdint.h>
+/* Verbs headers do not support -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-Wpedantic"
+#endif
+#include <infiniband/mlx5dv.h>
+#include <infiniband/verbs.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-Wpedantic"
+#endif
+
+#include <rte_byteorder.h>
+
+#include "mlx5_autoconf.h"
+
+#ifndef MLX5_GLUE_VERSION
+#define MLX5_GLUE_VERSION ""
+#endif
+
+#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V42
+struct ibv_counter_set;
+struct ibv_counter_set_data;
+struct ibv_counter_set_description;
+struct ibv_counter_set_init_attr;
+struct ibv_query_counter_set_attr;
+#endif
+
+#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V45
+struct ibv_counters;
+struct ibv_counters_init_attr;
+struct ibv_counter_attach_attr;
+#endif
+
+#ifndef HAVE_IBV_DEVICE_TUNNEL_SUPPORT
+struct mlx5dv_qp_init_attr;
+#endif
+
+#ifndef HAVE_IBV_DEVICE_STRIDING_RQ_SUPPORT
+struct mlx5dv_wq_init_attr;
+#endif
+
+#ifndef HAVE_IBV_FLOW_DV_SUPPORT
+struct mlx5dv_flow_matcher;
+struct mlx5dv_flow_matcher_attr;
+struct mlx5dv_flow_action_attr;
+struct mlx5dv_flow_match_parameters;
+struct mlx5dv_dr_flow_meter_attr;
+struct ibv_flow_action;
+enum mlx5dv_flow_action_packet_reformat_type { packet_reformat_type = 0, };
+enum mlx5dv_flow_table_type { flow_table_type = 0, };
+#endif
+
+#ifndef HAVE_IBV_FLOW_DEVX_COUNTERS
+#define MLX5DV_FLOW_ACTION_COUNTERS_DEVX 0
+#endif
+
+#ifndef HAVE_IBV_DEVX_OBJ
+struct mlx5dv_devx_obj;
+struct mlx5dv_devx_umem { uint32_t umem_id; };
+#endif
+
+#ifndef HAVE_IBV_DEVX_ASYNC
+struct mlx5dv_devx_cmd_comp;
+struct mlx5dv_devx_async_cmd_hdr;
+#endif
+
+#ifndef HAVE_MLX5DV_DR
+enum mlx5dv_dr_domain_type { unused, };
+struct mlx5dv_dr_domain;
+#endif
+
+#ifndef HAVE_MLX5DV_DR_DEVX_PORT
+struct mlx5dv_devx_port;
+#endif
+
+#ifndef HAVE_MLX5_DR_CREATE_ACTION_FLOW_METER
+struct mlx5dv_dr_flow_meter_attr;
+#endif
+
+/* LIB_GLUE_VERSION must be updated every time this structure is modified. */
+struct mlx5_glue {
+	const char *version;
+	int (*fork_init)(void);
+	struct ibv_pd *(*alloc_pd)(struct ibv_context *context);
+	int (*dealloc_pd)(struct ibv_pd *pd);
+	struct ibv_device **(*get_device_list)(int *num_devices);
+	void (*free_device_list)(struct ibv_device **list);
+	struct ibv_context *(*open_device)(struct ibv_device *device);
+	int (*close_device)(struct ibv_context *context);
+	int (*query_device)(struct ibv_context *context,
+			    struct ibv_device_attr *device_attr);
+	int (*query_device_ex)(struct ibv_context *context,
+			       const struct ibv_query_device_ex_input *input,
+			       struct ibv_device_attr_ex *attr);
+	int (*query_rt_values_ex)(struct ibv_context *context,
+			       struct ibv_values_ex *values);
+	int (*query_port)(struct ibv_context *context, uint8_t port_num,
+			  struct ibv_port_attr *port_attr);
+	struct ibv_comp_channel *(*create_comp_channel)
+		(struct ibv_context *context);
+	int (*destroy_comp_channel)(struct ibv_comp_channel *channel);
+	struct ibv_cq *(*create_cq)(struct ibv_context *context, int cqe,
+				    void *cq_context,
+				    struct ibv_comp_channel *channel,
+				    int comp_vector);
+	int (*destroy_cq)(struct ibv_cq *cq);
+	int (*get_cq_event)(struct ibv_comp_channel *channel,
+			    struct ibv_cq **cq, void **cq_context);
+	void (*ack_cq_events)(struct ibv_cq *cq, unsigned int nevents);
+	struct ibv_rwq_ind_table *(*create_rwq_ind_table)
+		(struct ibv_context *context,
+		 struct ibv_rwq_ind_table_init_attr *init_attr);
+	int (*destroy_rwq_ind_table)(struct ibv_rwq_ind_table *rwq_ind_table);
+	struct ibv_wq *(*create_wq)(struct ibv_context *context,
+				    struct ibv_wq_init_attr *wq_init_attr);
+	int (*destroy_wq)(struct ibv_wq *wq);
+	int (*modify_wq)(struct ibv_wq *wq, struct ibv_wq_attr *wq_attr);
+	struct ibv_flow *(*create_flow)(struct ibv_qp *qp,
+					struct ibv_flow_attr *flow);
+	int (*destroy_flow)(struct ibv_flow *flow_id);
+	int (*destroy_flow_action)(void *action);
+	struct ibv_qp *(*create_qp)(struct ibv_pd *pd,
+				    struct ibv_qp_init_attr *qp_init_attr);
+	struct ibv_qp *(*create_qp_ex)
+		(struct ibv_context *context,
+		 struct ibv_qp_init_attr_ex *qp_init_attr_ex);
+	int (*destroy_qp)(struct ibv_qp *qp);
+	int (*modify_qp)(struct ibv_qp *qp, struct ibv_qp_attr *attr,
+			 int attr_mask);
+	struct ibv_mr *(*reg_mr)(struct ibv_pd *pd, void *addr,
+				 size_t length, int access);
+	int (*dereg_mr)(struct ibv_mr *mr);
+	struct ibv_counter_set *(*create_counter_set)
+		(struct ibv_context *context,
+		 struct ibv_counter_set_init_attr *init_attr);
+	int (*destroy_counter_set)(struct ibv_counter_set *cs);
+	int (*describe_counter_set)
+		(struct ibv_context *context,
+		 uint16_t counter_set_id,
+		 struct ibv_counter_set_description *cs_desc);
+	int (*query_counter_set)(struct ibv_query_counter_set_attr *query_attr,
+				 struct ibv_counter_set_data *cs_data);
+	struct ibv_counters *(*create_counters)
+		(struct ibv_context *context,
+		 struct ibv_counters_init_attr *init_attr);
+	int (*destroy_counters)(struct ibv_counters *counters);
+	int (*attach_counters)(struct ibv_counters *counters,
+			       struct ibv_counter_attach_attr *attr,
+			       struct ibv_flow *flow);
+	int (*query_counters)(struct ibv_counters *counters,
+			      uint64_t *counters_value,
+			      uint32_t ncounters,
+			      uint32_t flags);
+	void (*ack_async_event)(struct ibv_async_event *event);
+	int (*get_async_event)(struct ibv_context *context,
+			       struct ibv_async_event *event);
+	const char *(*port_state_str)(enum ibv_port_state port_state);
+	struct ibv_cq *(*cq_ex_to_cq)(struct ibv_cq_ex *cq);
+	void *(*dr_create_flow_action_dest_flow_tbl)(void *tbl);
+	void *(*dr_create_flow_action_dest_port)(void *domain,
+						 uint32_t port);
+	void *(*dr_create_flow_action_drop)();
+	void *(*dr_create_flow_action_push_vlan)
+					(struct mlx5dv_dr_domain *domain,
+					 rte_be32_t vlan_tag);
+	void *(*dr_create_flow_action_pop_vlan)();
+	void *(*dr_create_flow_tbl)(void *domain, uint32_t level);
+	int (*dr_destroy_flow_tbl)(void *tbl);
+	void *(*dr_create_domain)(struct ibv_context *ctx,
+				  enum mlx5dv_dr_domain_type domain);
+	int (*dr_destroy_domain)(void *domain);
+	struct ibv_cq_ex *(*dv_create_cq)
+		(struct ibv_context *context,
+		 struct ibv_cq_init_attr_ex *cq_attr,
+		 struct mlx5dv_cq_init_attr *mlx5_cq_attr);
+	struct ibv_wq *(*dv_create_wq)
+		(struct ibv_context *context,
+		 struct ibv_wq_init_attr *wq_attr,
+		 struct mlx5dv_wq_init_attr *mlx5_wq_attr);
+	int (*dv_query_device)(struct ibv_context *ctx_in,
+			       struct mlx5dv_context *attrs_out);
+	int (*dv_set_context_attr)(struct ibv_context *ibv_ctx,
+				   enum mlx5dv_set_ctx_attr_type type,
+				   void *attr);
+	int (*dv_init_obj)(struct mlx5dv_obj *obj, uint64_t obj_type);
+	struct ibv_qp *(*dv_create_qp)
+		(struct ibv_context *context,
+		 struct ibv_qp_init_attr_ex *qp_init_attr_ex,
+		 struct mlx5dv_qp_init_attr *dv_qp_init_attr);
+	void *(*dv_create_flow_matcher)
+		(struct ibv_context *context,
+		 struct mlx5dv_flow_matcher_attr *matcher_attr,
+		 void *tbl);
+	void *(*dv_create_flow)(void *matcher, void *match_value,
+			  size_t num_actions, void *actions[]);
+	void *(*dv_create_flow_action_counter)(void *obj, uint32_t  offset);
+	void *(*dv_create_flow_action_dest_ibv_qp)(void *qp);
+	void *(*dv_create_flow_action_dest_devx_tir)(void *tir);
+	void *(*dv_create_flow_action_modify_header)
+		(struct ibv_context *ctx, enum mlx5dv_flow_table_type ft_type,
+		 void *domain, uint64_t flags, size_t actions_sz,
+		 uint64_t actions[]);
+	void *(*dv_create_flow_action_packet_reformat)
+		(struct ibv_context *ctx,
+		 enum mlx5dv_flow_action_packet_reformat_type reformat_type,
+		 enum mlx5dv_flow_table_type ft_type,
+		 struct mlx5dv_dr_domain *domain,
+		 uint32_t flags, size_t data_sz, void *data);
+	void *(*dv_create_flow_action_tag)(uint32_t tag);
+	void *(*dv_create_flow_action_meter)
+		(struct mlx5dv_dr_flow_meter_attr *attr);
+	int (*dv_modify_flow_action_meter)(void *action,
+		struct mlx5dv_dr_flow_meter_attr *attr, uint64_t modify_bits);
+	int (*dv_destroy_flow)(void *flow);
+	int (*dv_destroy_flow_matcher)(void *matcher);
+	struct ibv_context *(*dv_open_device)(struct ibv_device *device);
+	struct mlx5dv_devx_obj *(*devx_obj_create)
+					(struct ibv_context *ctx,
+					 const void *in, size_t inlen,
+					 void *out, size_t outlen);
+	int (*devx_obj_destroy)(struct mlx5dv_devx_obj *obj);
+	int (*devx_obj_query)(struct mlx5dv_devx_obj *obj,
+			      const void *in, size_t inlen,
+			      void *out, size_t outlen);
+	int (*devx_obj_modify)(struct mlx5dv_devx_obj *obj,
+			       const void *in, size_t inlen,
+			       void *out, size_t outlen);
+	int (*devx_general_cmd)(struct ibv_context *context,
+				const void *in, size_t inlen,
+				void *out, size_t outlen);
+	struct mlx5dv_devx_cmd_comp *(*devx_create_cmd_comp)
+					(struct ibv_context *context);
+	void (*devx_destroy_cmd_comp)(struct mlx5dv_devx_cmd_comp *cmd_comp);
+	int (*devx_obj_query_async)(struct mlx5dv_devx_obj *obj,
+				    const void *in, size_t inlen,
+				    size_t outlen, uint64_t wr_id,
+				    struct mlx5dv_devx_cmd_comp *cmd_comp);
+	int (*devx_get_async_cmd_comp)(struct mlx5dv_devx_cmd_comp *cmd_comp,
+				       struct mlx5dv_devx_async_cmd_hdr *resp,
+				       size_t cmd_resp_len);
+	struct mlx5dv_devx_umem *(*devx_umem_reg)(struct ibv_context *context,
+						  void *addr, size_t size,
+						  uint32_t access);
+	int (*devx_umem_dereg)(struct mlx5dv_devx_umem *dv_devx_umem);
+	int (*devx_qp_query)(struct ibv_qp *qp,
+			     const void *in, size_t inlen,
+			     void *out, size_t outlen);
+	int (*devx_port_query)(struct ibv_context *ctx,
+			       uint32_t port_num,
+			       struct mlx5dv_devx_port *mlx5_devx_port);
+	int (*dr_dump_domain)(FILE *file, void *domain);
+};
+
+const struct mlx5_glue *mlx5_glue;
+
+#endif /* MLX5_GLUE_H_ */
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
new file mode 100644
index 0000000..4b521b2
--- /dev/null
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -0,0 +1,1884 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2016 6WIND S.A.
+ * Copyright 2016 Mellanox Technologies, Ltd
+ */
+
+#ifndef RTE_PMD_MLX5_PRM_H_
+#define RTE_PMD_MLX5_PRM_H_
+
+#include <assert.h>
+/* Verbs header. */
+/* ISO C doesn't support unnamed structs/unions, disabling -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-Wpedantic"
+#endif
+#include <infiniband/mlx5dv.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-Wpedantic"
+#endif
+
+#include <rte_vect.h>
+#include <rte_byteorder.h>
+
+#include "mlx5_autoconf.h"
+
+/* RSS hash key size. */
+#define MLX5_RSS_HASH_KEY_LEN 40
+
+/* Get CQE owner bit. */
+#define MLX5_CQE_OWNER(op_own) ((op_own) & MLX5_CQE_OWNER_MASK)
+
+/* Get CQE format. */
+#define MLX5_CQE_FORMAT(op_own) (((op_own) & MLX5E_CQE_FORMAT_MASK) >> 2)
+
+/* Get CQE opcode. */
+#define MLX5_CQE_OPCODE(op_own) (((op_own) & 0xf0) >> 4)
+
+/* Get CQE solicited event. */
+#define MLX5_CQE_SE(op_own) (((op_own) >> 1) & 1)
+
+/* Invalidate a CQE. */
+#define MLX5_CQE_INVALIDATE (MLX5_CQE_INVALID << 4)
+
+/* WQE Segment sizes in bytes. */
+#define MLX5_WSEG_SIZE 16u
+#define MLX5_WQE_CSEG_SIZE sizeof(struct mlx5_wqe_cseg)
+#define MLX5_WQE_DSEG_SIZE sizeof(struct mlx5_wqe_dseg)
+#define MLX5_WQE_ESEG_SIZE sizeof(struct mlx5_wqe_eseg)
+
+/* WQE/WQEBB size in bytes. */
+#define MLX5_WQE_SIZE sizeof(struct mlx5_wqe)
+
+/*
+ * Max size of a WQE session.
+ * Absolute maximum size is 63 (MLX5_DSEG_MAX) segments,
+ * since the WQE size field in Control Segment is 6 bits wide.
+ */
+#define MLX5_WQE_SIZE_MAX (60 * MLX5_WSEG_SIZE)
+
+/*
+ * Default minimum number of Tx queues for inlining packets.
+ * If there are fewer queues than specified, we assume there
+ * are not enough CPU resources (cycles) to perform inlining,
+ * the PCIe throughput is not expected to be the bottleneck,
+ * and inlining is disabled.
+ */
+#define MLX5_INLINE_MAX_TXQS 8u
+#define MLX5_INLINE_MAX_TXQS_BLUEFIELD 16u
+
+/*
+ * Default packet length threshold to be inlined with
+ * enhanced MPW. If packet length exceeds the threshold
+ * the data are not inlined. Should be aligned to the WQEBB
+ * boundary, accounting for the title Control and Ethernet
+ * segments.
+ */
+#define MLX5_EMPW_DEF_INLINE_LEN (4u * MLX5_WQE_SIZE + \
+				  MLX5_DSEG_MIN_INLINE_SIZE)
+/*
+ * Maximal inline data length sent with enhanced MPW.
+ * Is based on maximal WQE size.
+ */
+#define MLX5_EMPW_MAX_INLINE_LEN (MLX5_WQE_SIZE_MAX - \
+				  MLX5_WQE_CSEG_SIZE - \
+				  MLX5_WQE_ESEG_SIZE - \
+				  MLX5_WQE_DSEG_SIZE + \
+				  MLX5_DSEG_MIN_INLINE_SIZE)
+/*
+ * Minimal amount of packets to be sent with EMPW.
+ * This limits the minimal required size of sent EMPW.
+ * If there are not enough resources to build a minimal
+ * EMPW, the sending loop exits.
+ */
+#define MLX5_EMPW_MIN_PACKETS (2u + 3u * 4u)
+/*
+ * Maximal amount of packets to be sent with EMPW.
+ * This value should not exceed MLX5_TX_COMP_THRESH;
+ * otherwise up to MLX5_EMPW_MAX_PACKETS mbufs may be sent
+ * without a CQE generation request. Multiplied by
+ * MLX5_TX_COMP_MAX_CQE, this may cause significant latency
+ * in the Tx burst routine when multiple mbufs are freed.
+ */
+#define MLX5_EMPW_MAX_PACKETS MLX5_TX_COMP_THRESH
+#define MLX5_MPW_MAX_PACKETS 6
+#define MLX5_MPW_INLINE_MAX_PACKETS 2
+
+/*
+ * Default packet length threshold to be inlined with
+ * ordinary SEND. Inlining saves the MR key search
+ * and extra PCIe data fetch transaction, but eats the
+ * CPU cycles.
+ */
+#define MLX5_SEND_DEF_INLINE_LEN (5U * MLX5_WQE_SIZE + \
+				  MLX5_ESEG_MIN_INLINE_SIZE - \
+				  MLX5_WQE_CSEG_SIZE - \
+				  MLX5_WQE_ESEG_SIZE - \
+				  MLX5_WQE_DSEG_SIZE)
+/*
+ * Maximal inline data length sent with ordinary SEND.
+ * Is based on maximal WQE size.
+ */
+#define MLX5_SEND_MAX_INLINE_LEN (MLX5_WQE_SIZE_MAX - \
+				  MLX5_WQE_CSEG_SIZE - \
+				  MLX5_WQE_ESEG_SIZE - \
+				  MLX5_WQE_DSEG_SIZE + \
+				  MLX5_ESEG_MIN_INLINE_SIZE)
+
+/* Missing from mlx5dv.h, so it is defined here. */
+#define MLX5_OPCODE_ENHANCED_MPSW 0x29u
+
+/* CQE value to inform that VLAN is stripped. */
+#define MLX5_CQE_VLAN_STRIPPED (1u << 0)
+
+/* IPv4 options. */
+#define MLX5_CQE_RX_IP_EXT_OPTS_PACKET (1u << 1)
+
+/* IPv6 packet. */
+#define MLX5_CQE_RX_IPV6_PACKET (1u << 2)
+
+/* IPv4 packet. */
+#define MLX5_CQE_RX_IPV4_PACKET (1u << 3)
+
+/* TCP packet. */
+#define MLX5_CQE_RX_TCP_PACKET (1u << 4)
+
+/* UDP packet. */
+#define MLX5_CQE_RX_UDP_PACKET (1u << 5)
+
+/* IP is fragmented. */
+#define MLX5_CQE_RX_IP_FRAG_PACKET (1u << 7)
+
+/* L2 header is valid. */
+#define MLX5_CQE_RX_L2_HDR_VALID (1u << 8)
+
+/* L3 header is valid. */
+#define MLX5_CQE_RX_L3_HDR_VALID (1u << 9)
+
+/* L4 header is valid. */
+#define MLX5_CQE_RX_L4_HDR_VALID (1u << 10)
+
+/* Outer packet, 0 IPv4, 1 IPv6. */
+#define MLX5_CQE_RX_OUTER_PACKET (1u << 1)
+
+/* Tunnel packet bit in the CQE. */
+#define MLX5_CQE_RX_TUNNEL_PACKET (1u << 0)
+
+/* Mask for LRO push flag in the CQE lro_tcppsh_abort_dupack field. */
+#define MLX5_CQE_LRO_PUSH_MASK 0x40
+
+/* Mask for L4 type in the CQE hdr_type_etc field. */
+#define MLX5_CQE_L4_TYPE_MASK 0x70
+
+/* The bit index of L4 type in CQE hdr_type_etc field. */
+#define MLX5_CQE_L4_TYPE_SHIFT 0x4
+
+/* L4 type to indicate TCP packet without acknowledgment. */
+#define MLX5_L4_HDR_TYPE_TCP_EMPTY_ACK 0x3
+
+/* L4 type to indicate TCP packet with acknowledgment. */
+#define MLX5_L4_HDR_TYPE_TCP_WITH_ACL 0x4
+
+/* Inner L3 checksum offload (Tunneled packets only). */
+#define MLX5_ETH_WQE_L3_INNER_CSUM (1u << 4)
+
+/* Inner L4 checksum offload (Tunneled packets only). */
+#define MLX5_ETH_WQE_L4_INNER_CSUM (1u << 5)
+
+/* Outer L4 type is TCP. */
+#define MLX5_ETH_WQE_L4_OUTER_TCP  (0u << 5)
+
+/* Outer L4 type is UDP. */
+#define MLX5_ETH_WQE_L4_OUTER_UDP  (1u << 5)
+
+/* Outer L3 type is IPV4. */
+#define MLX5_ETH_WQE_L3_OUTER_IPV4 (0u << 4)
+
+/* Outer L3 type is IPV6. */
+#define MLX5_ETH_WQE_L3_OUTER_IPV6 (1u << 4)
+
+/* Inner L4 type is TCP. */
+#define MLX5_ETH_WQE_L4_INNER_TCP (0u << 1)
+
+/* Inner L4 type is UDP. */
+#define MLX5_ETH_WQE_L4_INNER_UDP (1u << 1)
+
+/* Inner L3 type is IPV4. */
+#define MLX5_ETH_WQE_L3_INNER_IPV4 (0u << 0)
+
+/* Inner L3 type is IPV6. */
+#define MLX5_ETH_WQE_L3_INNER_IPV6 (1u << 0)
+
+/* VLAN insertion flag. */
+#define MLX5_ETH_WQE_VLAN_INSERT (1u << 31)
+
+/* Data inline segment flag. */
+#define MLX5_ETH_WQE_DATA_INLINE (1u << 31)
+
+/* Is flow mark valid. */
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+#define MLX5_FLOW_MARK_IS_VALID(val) ((val) & 0xffffff00)
+#else
+#define MLX5_FLOW_MARK_IS_VALID(val) ((val) & 0xffffff)
+#endif
+
+/* INVALID is used by packets matching no flow rules. */
+#define MLX5_FLOW_MARK_INVALID 0
+
+/* Maximum allowed value to mark a packet. */
+#define MLX5_FLOW_MARK_MAX 0xfffff0
+
+/* Default mark value used when none is provided. */
+#define MLX5_FLOW_MARK_DEFAULT 0xffffff
+
+/* Default mark mask for metadata legacy mode. */
+#define MLX5_FLOW_MARK_MASK 0xffffff
+
+/* Maximum number of DS in WQE. Limited by 6-bit field. */
+#define MLX5_DSEG_MAX 63
+
+/* The completion mode offset in the WQE control segment line 2. */
+#define MLX5_COMP_MODE_OFFSET 2
+
+/* Amount of data bytes in minimal inline data segment. */
+#define MLX5_DSEG_MIN_INLINE_SIZE 12u
+
+/* Amount of data bytes in minimal inline eth segment. */
+#define MLX5_ESEG_MIN_INLINE_SIZE 18u
+
+/* Amount of data bytes after eth data segment. */
+#define MLX5_ESEG_EXTRA_DATA_SIZE 32u
+
+/* The maximum log value of segments per RQ WQE. */
+#define MLX5_MAX_LOG_RQ_SEGS 5u
+
+/* The alignment needed for WQ buffer. */
+#define MLX5_WQE_BUF_ALIGNMENT 512
+
+/* Completion mode. */
+enum mlx5_completion_mode {
+	MLX5_COMP_ONLY_ERR = 0x0,
+	MLX5_COMP_ONLY_FIRST_ERR = 0x1,
+	MLX5_COMP_ALWAYS = 0x2,
+	MLX5_COMP_CQE_AND_EQE = 0x3,
+};
+
+/* MPW mode. */
+enum mlx5_mpw_mode {
+	MLX5_MPW_DISABLED,
+	MLX5_MPW,
+	MLX5_MPW_ENHANCED, /* Enhanced Multi-Packet Send WQE, a.k.a MPWv2. */
+};
+
+/* WQE Control segment. */
+struct mlx5_wqe_cseg {
+	uint32_t opcode;
+	uint32_t sq_ds;
+	uint32_t flags;
+	uint32_t misc;
+} __rte_packed __rte_aligned(MLX5_WSEG_SIZE);
+
+/* Header of data segment. Minimal size Data Segment. */
+struct mlx5_wqe_dseg {
+	uint32_t bcount;
+	union {
+		uint8_t inline_data[MLX5_DSEG_MIN_INLINE_SIZE];
+		struct {
+			uint32_t lkey;
+			uint64_t pbuf;
+		} __rte_packed;
+	};
+} __rte_packed;
+
+/* Subset of struct WQE Ethernet Segment. */
+struct mlx5_wqe_eseg {
+	union {
+		struct {
+			uint32_t swp_offs;
+			uint8_t	cs_flags;
+			uint8_t	swp_flags;
+			uint16_t mss;
+			uint32_t metadata;
+			uint16_t inline_hdr_sz;
+			union {
+				uint16_t inline_data;
+				uint16_t vlan_tag;
+			};
+		} __rte_packed;
+		struct {
+			uint32_t offsets;
+			uint32_t flags;
+			uint32_t flow_metadata;
+			uint32_t inline_hdr;
+		} __rte_packed;
+	};
+} __rte_packed;
+
+/* The title WQEBB, header of WQE. */
+struct mlx5_wqe {
+	union {
+		struct mlx5_wqe_cseg cseg;
+		uint32_t ctrl[4];
+	};
+	struct mlx5_wqe_eseg eseg;
+	union {
+		struct mlx5_wqe_dseg dseg[2];
+		uint8_t data[MLX5_ESEG_EXTRA_DATA_SIZE];
+	};
+} __rte_packed;
+
+/* WQE for Multi-Packet RQ. */
+struct mlx5_wqe_mprq {
+	struct mlx5_wqe_srq_next_seg next_seg;
+	struct mlx5_wqe_data_seg dseg;
+};
+
+#define MLX5_MPRQ_LEN_MASK 0x000ffff
+#define MLX5_MPRQ_LEN_SHIFT 0
+#define MLX5_MPRQ_STRIDE_NUM_MASK 0x3fff0000
+#define MLX5_MPRQ_STRIDE_NUM_SHIFT 16
+#define MLX5_MPRQ_FILLER_MASK 0x80000000
+#define MLX5_MPRQ_FILLER_SHIFT 31
+
+#define MLX5_MPRQ_STRIDE_SHIFT_BYTE 2
+
+/* CQ element structure - should be equal to the cache line size */
+struct mlx5_cqe {
+#if (RTE_CACHE_LINE_SIZE == 128)
+	uint8_t padding[64];
+#endif
+	uint8_t pkt_info;
+	uint8_t rsvd0;
+	uint16_t wqe_id;
+	uint8_t lro_tcppsh_abort_dupack;
+	uint8_t lro_min_ttl;
+	uint16_t lro_tcp_win;
+	uint32_t lro_ack_seq_num;
+	uint32_t rx_hash_res;
+	uint8_t rx_hash_type;
+	uint8_t rsvd1[3];
+	uint16_t csum;
+	uint8_t rsvd2[6];
+	uint16_t hdr_type_etc;
+	uint16_t vlan_info;
+	uint8_t lro_num_seg;
+	uint8_t rsvd3[3];
+	uint32_t flow_table_metadata;
+	uint8_t rsvd4[4];
+	uint32_t byte_cnt;
+	uint64_t timestamp;
+	uint32_t sop_drop_qpn;
+	uint16_t wqe_counter;
+	uint8_t rsvd5;
+	uint8_t op_own;
+};
+
+/* Adding direct verbs to data-path. */
+
+/* CQ sequence number mask. */
+#define MLX5_CQ_SQN_MASK 0x3
+
+/* CQ sequence number index. */
+#define MLX5_CQ_SQN_OFFSET 28
+
+/* CQ doorbell index mask. */
+#define MLX5_CI_MASK 0xffffff
+
+/* CQ doorbell offset. */
+#define MLX5_CQ_ARM_DB 1
+
+/* CQ doorbell offset. */
+#define MLX5_CQ_DOORBELL 0x20
+
+/* CQE format value. */
+#define MLX5_COMPRESSED 0x3
+
+/* Action type of header modification. */
+enum {
+	MLX5_MODIFICATION_TYPE_SET = 0x1,
+	MLX5_MODIFICATION_TYPE_ADD = 0x2,
+	MLX5_MODIFICATION_TYPE_COPY = 0x3,
+};
+
+/* The field of packet to be modified. */
+enum mlx5_modification_field {
+	MLX5_MODI_OUT_NONE = -1,
+	MLX5_MODI_OUT_SMAC_47_16 = 1,
+	MLX5_MODI_OUT_SMAC_15_0,
+	MLX5_MODI_OUT_ETHERTYPE,
+	MLX5_MODI_OUT_DMAC_47_16,
+	MLX5_MODI_OUT_DMAC_15_0,
+	MLX5_MODI_OUT_IP_DSCP,
+	MLX5_MODI_OUT_TCP_FLAGS,
+	MLX5_MODI_OUT_TCP_SPORT,
+	MLX5_MODI_OUT_TCP_DPORT,
+	MLX5_MODI_OUT_IPV4_TTL,
+	MLX5_MODI_OUT_UDP_SPORT,
+	MLX5_MODI_OUT_UDP_DPORT,
+	MLX5_MODI_OUT_SIPV6_127_96,
+	MLX5_MODI_OUT_SIPV6_95_64,
+	MLX5_MODI_OUT_SIPV6_63_32,
+	MLX5_MODI_OUT_SIPV6_31_0,
+	MLX5_MODI_OUT_DIPV6_127_96,
+	MLX5_MODI_OUT_DIPV6_95_64,
+	MLX5_MODI_OUT_DIPV6_63_32,
+	MLX5_MODI_OUT_DIPV6_31_0,
+	MLX5_MODI_OUT_SIPV4,
+	MLX5_MODI_OUT_DIPV4,
+	MLX5_MODI_OUT_FIRST_VID,
+	MLX5_MODI_IN_SMAC_47_16 = 0x31,
+	MLX5_MODI_IN_SMAC_15_0,
+	MLX5_MODI_IN_ETHERTYPE,
+	MLX5_MODI_IN_DMAC_47_16,
+	MLX5_MODI_IN_DMAC_15_0,
+	MLX5_MODI_IN_IP_DSCP,
+	MLX5_MODI_IN_TCP_FLAGS,
+	MLX5_MODI_IN_TCP_SPORT,
+	MLX5_MODI_IN_TCP_DPORT,
+	MLX5_MODI_IN_IPV4_TTL,
+	MLX5_MODI_IN_UDP_SPORT,
+	MLX5_MODI_IN_UDP_DPORT,
+	MLX5_MODI_IN_SIPV6_127_96,
+	MLX5_MODI_IN_SIPV6_95_64,
+	MLX5_MODI_IN_SIPV6_63_32,
+	MLX5_MODI_IN_SIPV6_31_0,
+	MLX5_MODI_IN_DIPV6_127_96,
+	MLX5_MODI_IN_DIPV6_95_64,
+	MLX5_MODI_IN_DIPV6_63_32,
+	MLX5_MODI_IN_DIPV6_31_0,
+	MLX5_MODI_IN_SIPV4,
+	MLX5_MODI_IN_DIPV4,
+	MLX5_MODI_OUT_IPV6_HOPLIMIT,
+	MLX5_MODI_IN_IPV6_HOPLIMIT,
+	MLX5_MODI_META_DATA_REG_A,
+	MLX5_MODI_META_DATA_REG_B = 0x50,
+	MLX5_MODI_META_REG_C_0,
+	MLX5_MODI_META_REG_C_1,
+	MLX5_MODI_META_REG_C_2,
+	MLX5_MODI_META_REG_C_3,
+	MLX5_MODI_META_REG_C_4,
+	MLX5_MODI_META_REG_C_5,
+	MLX5_MODI_META_REG_C_6,
+	MLX5_MODI_META_REG_C_7,
+	MLX5_MODI_OUT_TCP_SEQ_NUM,
+	MLX5_MODI_IN_TCP_SEQ_NUM,
+	MLX5_MODI_OUT_TCP_ACK_NUM,
+	MLX5_MODI_IN_TCP_ACK_NUM = 0x5C,
+};
+
+/* Total number of metadata reg_c's. */
+#define MLX5_MREG_C_NUM (MLX5_MODI_META_REG_C_7 - MLX5_MODI_META_REG_C_0 + 1)
+
+enum modify_reg {
+	REG_NONE = 0,
+	REG_A,
+	REG_B,
+	REG_C_0,
+	REG_C_1,
+	REG_C_2,
+	REG_C_3,
+	REG_C_4,
+	REG_C_5,
+	REG_C_6,
+	REG_C_7,
+};
+
+/* Modification sub command. */
+struct mlx5_modification_cmd {
+	union {
+		uint32_t data0;
+		struct {
+			unsigned int length:5;
+			unsigned int rsvd0:3;
+			unsigned int offset:5;
+			unsigned int rsvd1:3;
+			unsigned int field:12;
+			unsigned int action_type:4;
+		};
+	};
+	union {
+		uint32_t data1;
+		uint8_t data[4];
+		struct {
+			unsigned int rsvd2:8;
+			unsigned int dst_offset:5;
+			unsigned int rsvd3:3;
+			unsigned int dst_field:12;
+			unsigned int rsvd4:4;
+		};
+	};
+};
+
+typedef uint32_t u32;
+typedef uint16_t u16;
+typedef uint8_t u8;
+
+#define __mlx5_nullp(typ) ((struct mlx5_ifc_##typ##_bits *)0)
+#define __mlx5_bit_sz(typ, fld) sizeof(__mlx5_nullp(typ)->fld)
+#define __mlx5_bit_off(typ, fld) ((unsigned int)(unsigned long) \
+				  (&(__mlx5_nullp(typ)->fld)))
+#define __mlx5_dw_bit_off(typ, fld) (32 - __mlx5_bit_sz(typ, fld) - \
+				    (__mlx5_bit_off(typ, fld) & 0x1f))
+#define __mlx5_dw_off(typ, fld) (__mlx5_bit_off(typ, fld) / 32)
+#define __mlx5_64_off(typ, fld) (__mlx5_bit_off(typ, fld) / 64)
+#define __mlx5_dw_mask(typ, fld) (__mlx5_mask(typ, fld) << \
+				  __mlx5_dw_bit_off(typ, fld))
+#define __mlx5_mask(typ, fld) ((u32)((1ull << __mlx5_bit_sz(typ, fld)) - 1))
+#define __mlx5_16_off(typ, fld) (__mlx5_bit_off(typ, fld) / 16)
+#define __mlx5_16_bit_off(typ, fld) (16 - __mlx5_bit_sz(typ, fld) - \
+				    (__mlx5_bit_off(typ, fld) & 0xf))
+#define __mlx5_mask16(typ, fld) ((u16)((1ull << __mlx5_bit_sz(typ, fld)) - 1))
+#define MLX5_ST_SZ_BYTES(typ) (sizeof(struct mlx5_ifc_##typ##_bits) / 8)
+#define MLX5_ST_SZ_DW(typ) (sizeof(struct mlx5_ifc_##typ##_bits) / 32)
+#define MLX5_BYTE_OFF(typ, fld) (__mlx5_bit_off(typ, fld) / 8)
+#define MLX5_ADDR_OF(typ, p, fld) ((char *)(p) + MLX5_BYTE_OFF(typ, fld))
+
+/* insert a value to a struct */
+#define MLX5_SET(typ, p, fld, v) \
+	do { \
+		u32 _v = v; \
+		*((__be32 *)(p) + __mlx5_dw_off(typ, fld)) = \
+		rte_cpu_to_be_32((rte_be_to_cpu_32(*((u32 *)(p) + \
+				  __mlx5_dw_off(typ, fld))) & \
+				  (~__mlx5_dw_mask(typ, fld))) | \
+				 (((_v) & __mlx5_mask(typ, fld)) << \
+				   __mlx5_dw_bit_off(typ, fld))); \
+	} while (0)
+
+#define MLX5_SET64(typ, p, fld, v) \
+	do { \
+		assert(__mlx5_bit_sz(typ, fld) == 64); \
+		*((__be64 *)(p) + __mlx5_64_off(typ, fld)) = \
+			rte_cpu_to_be_64(v); \
+	} while (0)
+
+#define MLX5_GET(typ, p, fld) \
+	((rte_be_to_cpu_32(*((__be32 *)(p) +\
+	__mlx5_dw_off(typ, fld))) >> __mlx5_dw_bit_off(typ, fld)) & \
+	__mlx5_mask(typ, fld))
+#define MLX5_GET16(typ, p, fld) \
+	((rte_be_to_cpu_16(*((__be16 *)(p) + \
+	  __mlx5_16_off(typ, fld))) >> __mlx5_16_bit_off(typ, fld)) & \
+	 __mlx5_mask16(typ, fld))
+#define MLX5_GET64(typ, p, fld) rte_be_to_cpu_64(*((__be64 *)(p) + \
+						   __mlx5_64_off(typ, fld)))
+#define MLX5_FLD_SZ_BYTES(typ, fld) (__mlx5_bit_sz(typ, fld) / 8)
+
+struct mlx5_ifc_fte_match_set_misc_bits {
+	u8 gre_c_present[0x1];
+	u8 reserved_at_1[0x1];
+	u8 gre_k_present[0x1];
+	u8 gre_s_present[0x1];
+	u8 source_vhci_port[0x4];
+	u8 source_sqn[0x18];
+	u8 reserved_at_20[0x10];
+	u8 source_port[0x10];
+	u8 outer_second_prio[0x3];
+	u8 outer_second_cfi[0x1];
+	u8 outer_second_vid[0xc];
+	u8 inner_second_prio[0x3];
+	u8 inner_second_cfi[0x1];
+	u8 inner_second_vid[0xc];
+	u8 outer_second_cvlan_tag[0x1];
+	u8 inner_second_cvlan_tag[0x1];
+	u8 outer_second_svlan_tag[0x1];
+	u8 inner_second_svlan_tag[0x1];
+	u8 reserved_at_64[0xc];
+	u8 gre_protocol[0x10];
+	u8 gre_key_h[0x18];
+	u8 gre_key_l[0x8];
+	u8 vxlan_vni[0x18];
+	u8 reserved_at_b8[0x8];
+	u8 geneve_vni[0x18];
+	u8 reserved_at_e4[0x7];
+	u8 geneve_oam[0x1];
+	u8 reserved_at_e0[0xc];
+	u8 outer_ipv6_flow_label[0x14];
+	u8 reserved_at_100[0xc];
+	u8 inner_ipv6_flow_label[0x14];
+	u8 reserved_at_120[0xa];
+	u8 geneve_opt_len[0x6];
+	u8 geneve_protocol_type[0x10];
+	u8 reserved_at_140[0xc0];
+};
+
+struct mlx5_ifc_ipv4_layout_bits {
+	u8 reserved_at_0[0x60];
+	u8 ipv4[0x20];
+};
+
+struct mlx5_ifc_ipv6_layout_bits {
+	u8 ipv6[16][0x8];
+};
+
+union mlx5_ifc_ipv6_layout_ipv4_layout_auto_bits {
+	struct mlx5_ifc_ipv6_layout_bits ipv6_layout;
+	struct mlx5_ifc_ipv4_layout_bits ipv4_layout;
+	u8 reserved_at_0[0x80];
+};
+
+struct mlx5_ifc_fte_match_set_lyr_2_4_bits {
+	u8 smac_47_16[0x20];
+	u8 smac_15_0[0x10];
+	u8 ethertype[0x10];
+	u8 dmac_47_16[0x20];
+	u8 dmac_15_0[0x10];
+	u8 first_prio[0x3];
+	u8 first_cfi[0x1];
+	u8 first_vid[0xc];
+	u8 ip_protocol[0x8];
+	u8 ip_dscp[0x6];
+	u8 ip_ecn[0x2];
+	u8 cvlan_tag[0x1];
+	u8 svlan_tag[0x1];
+	u8 frag[0x1];
+	u8 ip_version[0x4];
+	u8 tcp_flags[0x9];
+	u8 tcp_sport[0x10];
+	u8 tcp_dport[0x10];
+	u8 reserved_at_c0[0x20];
+	u8 udp_sport[0x10];
+	u8 udp_dport[0x10];
+	union mlx5_ifc_ipv6_layout_ipv4_layout_auto_bits src_ipv4_src_ipv6;
+	union mlx5_ifc_ipv6_layout_ipv4_layout_auto_bits dst_ipv4_dst_ipv6;
+};
+
+struct mlx5_ifc_fte_match_mpls_bits {
+	u8 mpls_label[0x14];
+	u8 mpls_exp[0x3];
+	u8 mpls_s_bos[0x1];
+	u8 mpls_ttl[0x8];
+};
+
+struct mlx5_ifc_fte_match_set_misc2_bits {
+	struct mlx5_ifc_fte_match_mpls_bits outer_first_mpls;
+	struct mlx5_ifc_fte_match_mpls_bits inner_first_mpls;
+	struct mlx5_ifc_fte_match_mpls_bits outer_first_mpls_over_gre;
+	struct mlx5_ifc_fte_match_mpls_bits outer_first_mpls_over_udp;
+	u8 metadata_reg_c_7[0x20];
+	u8 metadata_reg_c_6[0x20];
+	u8 metadata_reg_c_5[0x20];
+	u8 metadata_reg_c_4[0x20];
+	u8 metadata_reg_c_3[0x20];
+	u8 metadata_reg_c_2[0x20];
+	u8 metadata_reg_c_1[0x20];
+	u8 metadata_reg_c_0[0x20];
+	u8 metadata_reg_a[0x20];
+	u8 metadata_reg_b[0x20];
+	u8 reserved_at_1c0[0x40];
+};
+
+struct mlx5_ifc_fte_match_set_misc3_bits {
+	u8 inner_tcp_seq_num[0x20];
+	u8 outer_tcp_seq_num[0x20];
+	u8 inner_tcp_ack_num[0x20];
+	u8 outer_tcp_ack_num[0x20];
+	u8 reserved_at_auto1[0x8];
+	u8 outer_vxlan_gpe_vni[0x18];
+	u8 outer_vxlan_gpe_next_protocol[0x8];
+	u8 outer_vxlan_gpe_flags[0x8];
+	u8 reserved_at_a8[0x10];
+	u8 icmp_header_data[0x20];
+	u8 icmpv6_header_data[0x20];
+	u8 icmp_type[0x8];
+	u8 icmp_code[0x8];
+	u8 icmpv6_type[0x8];
+	u8 icmpv6_code[0x8];
+	u8 reserved_at_120[0x20];
+	u8 gtpu_teid[0x20];
+	u8 gtpu_msg_type[0x08];
+	u8 gtpu_msg_flags[0x08];
+	u8 reserved_at_170[0x90];
+};
+
+/* Flow matcher. */
+struct mlx5_ifc_fte_match_param_bits {
+	struct mlx5_ifc_fte_match_set_lyr_2_4_bits outer_headers;
+	struct mlx5_ifc_fte_match_set_misc_bits misc_parameters;
+	struct mlx5_ifc_fte_match_set_lyr_2_4_bits inner_headers;
+	struct mlx5_ifc_fte_match_set_misc2_bits misc_parameters_2;
+	struct mlx5_ifc_fte_match_set_misc3_bits misc_parameters_3;
+};
+
+enum {
+	MLX5_MATCH_CRITERIA_ENABLE_OUTER_BIT,
+	MLX5_MATCH_CRITERIA_ENABLE_MISC_BIT,
+	MLX5_MATCH_CRITERIA_ENABLE_INNER_BIT,
+	MLX5_MATCH_CRITERIA_ENABLE_MISC2_BIT,
+	MLX5_MATCH_CRITERIA_ENABLE_MISC3_BIT
+};
+
+enum {
+	MLX5_CMD_OP_QUERY_HCA_CAP = 0x100,
+	MLX5_CMD_OP_CREATE_MKEY = 0x200,
+	MLX5_CMD_OP_QUERY_NIC_VPORT_CONTEXT = 0x754,
+	MLX5_CMD_OP_ALLOC_TRANSPORT_DOMAIN = 0x816,
+	MLX5_CMD_OP_CREATE_TIR = 0x900,
+	MLX5_CMD_OP_CREATE_SQ = 0x904,
+	MLX5_CMD_OP_MODIFY_SQ = 0x905,
+	MLX5_CMD_OP_CREATE_RQ = 0x908,
+	MLX5_CMD_OP_MODIFY_RQ = 0x909,
+	MLX5_CMD_OP_CREATE_TIS = 0x912,
+	MLX5_CMD_OP_QUERY_TIS = 0x915,
+	MLX5_CMD_OP_CREATE_RQT = 0x916,
+	MLX5_CMD_OP_ALLOC_FLOW_COUNTER = 0x939,
+	MLX5_CMD_OP_QUERY_FLOW_COUNTER = 0x93b,
+};
+
+enum {
+	MLX5_MKC_ACCESS_MODE_MTT   = 0x1,
+};
+
+/* Flow counters. */
+struct mlx5_ifc_alloc_flow_counter_out_bits {
+	u8         status[0x8];
+	u8         reserved_at_8[0x18];
+	u8         syndrome[0x20];
+	u8         flow_counter_id[0x20];
+	u8         reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_alloc_flow_counter_in_bits {
+	u8         opcode[0x10];
+	u8         reserved_at_10[0x10];
+	u8         reserved_at_20[0x10];
+	u8         op_mod[0x10];
+	u8         flow_counter_id[0x20];
+	u8         reserved_at_40[0x18];
+	u8         flow_counter_bulk[0x8];
+};
+
+struct mlx5_ifc_dealloc_flow_counter_out_bits {
+	u8         status[0x8];
+	u8         reserved_at_8[0x18];
+	u8         syndrome[0x20];
+	u8         reserved_at_40[0x40];
+};
+
+struct mlx5_ifc_dealloc_flow_counter_in_bits {
+	u8         opcode[0x10];
+	u8         reserved_at_10[0x10];
+	u8         reserved_at_20[0x10];
+	u8         op_mod[0x10];
+	u8         flow_counter_id[0x20];
+	u8         reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_traffic_counter_bits {
+	u8         packets[0x40];
+	u8         octets[0x40];
+};
+
+struct mlx5_ifc_query_flow_counter_out_bits {
+	u8         status[0x8];
+	u8         reserved_at_8[0x18];
+	u8         syndrome[0x20];
+	u8         reserved_at_40[0x40];
+	struct mlx5_ifc_traffic_counter_bits flow_statistics[];
+};
+
+struct mlx5_ifc_query_flow_counter_in_bits {
+	u8         opcode[0x10];
+	u8         reserved_at_10[0x10];
+	u8         reserved_at_20[0x10];
+	u8         op_mod[0x10];
+	u8         reserved_at_40[0x20];
+	u8         mkey[0x20];
+	u8         address[0x40];
+	u8         clear[0x1];
+	u8         dump_to_memory[0x1];
+	u8         num_of_counters[0x1e];
+	u8         flow_counter_id[0x20];
+};
+
+struct mlx5_ifc_mkc_bits {
+	u8         reserved_at_0[0x1];
+	u8         free[0x1];
+	u8         reserved_at_2[0x1];
+	u8         access_mode_4_2[0x3];
+	u8         reserved_at_6[0x7];
+	u8         relaxed_ordering_write[0x1];
+	u8         reserved_at_e[0x1];
+	u8         small_fence_on_rdma_read_response[0x1];
+	u8         umr_en[0x1];
+	u8         a[0x1];
+	u8         rw[0x1];
+	u8         rr[0x1];
+	u8         lw[0x1];
+	u8         lr[0x1];
+	u8         access_mode_1_0[0x2];
+	u8         reserved_at_18[0x8];
+
+	u8         qpn[0x18];
+	u8         mkey_7_0[0x8];
+
+	u8         reserved_at_40[0x20];
+
+	u8         length64[0x1];
+	u8         bsf_en[0x1];
+	u8         sync_umr[0x1];
+	u8         reserved_at_63[0x2];
+	u8         expected_sigerr_count[0x1];
+	u8         reserved_at_66[0x1];
+	u8         en_rinval[0x1];
+	u8         pd[0x18];
+
+	u8         start_addr[0x40];
+
+	u8         len[0x40];
+
+	u8         bsf_octword_size[0x20];
+
+	u8         reserved_at_120[0x80];
+
+	u8         translations_octword_size[0x20];
+
+	u8         reserved_at_1c0[0x1b];
+	u8         log_page_size[0x5];
+
+	u8         reserved_at_1e0[0x20];
+};
+
+struct mlx5_ifc_create_mkey_out_bits {
+	u8         status[0x8];
+	u8         reserved_at_8[0x18];
+
+	u8         syndrome[0x20];
+
+	u8         reserved_at_40[0x8];
+	u8         mkey_index[0x18];
+
+	u8         reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_create_mkey_in_bits {
+	u8         opcode[0x10];
+	u8         reserved_at_10[0x10];
+
+	u8         reserved_at_20[0x10];
+	u8         op_mod[0x10];
+
+	u8         reserved_at_40[0x20];
+
+	u8         pg_access[0x1];
+	u8         reserved_at_61[0x1f];
+
+	struct mlx5_ifc_mkc_bits memory_key_mkey_entry;
+
+	u8         reserved_at_280[0x80];
+
+	u8         translations_octword_actual_size[0x20];
+
+	u8         mkey_umem_id[0x20];
+
+	u8         mkey_umem_offset[0x40];
+
+	u8         reserved_at_380[0x500];
+
+	u8         klm_pas_mtt[][0x20];
+};
+
+enum {
+	MLX5_GET_HCA_CAP_OP_MOD_GENERAL_DEVICE = 0x0 << 1,
+	MLX5_GET_HCA_CAP_OP_MOD_ETHERNET_OFFLOAD_CAPS = 0x1 << 1,
+	MLX5_GET_HCA_CAP_OP_MOD_QOS_CAP = 0xc << 1,
+};
+
+enum {
+	MLX5_HCA_CAP_OPMOD_GET_MAX   = 0,
+	MLX5_HCA_CAP_OPMOD_GET_CUR   = 1,
+};
+
+enum {
+	MLX5_CAP_INLINE_MODE_L2,
+	MLX5_CAP_INLINE_MODE_VPORT_CONTEXT,
+	MLX5_CAP_INLINE_MODE_NOT_REQUIRED,
+};
+
+enum {
+	MLX5_INLINE_MODE_NONE,
+	MLX5_INLINE_MODE_L2,
+	MLX5_INLINE_MODE_IP,
+	MLX5_INLINE_MODE_TCP_UDP,
+	MLX5_INLINE_MODE_RESERVED4,
+	MLX5_INLINE_MODE_INNER_L2,
+	MLX5_INLINE_MODE_INNER_IP,
+	MLX5_INLINE_MODE_INNER_TCP_UDP,
+};
+
+/* HCA bit masks indicating which Flex parser protocols are already enabled. */
+#define MLX5_HCA_FLEX_IPV4_OVER_VXLAN_ENABLED (1UL << 0)
+#define MLX5_HCA_FLEX_IPV6_OVER_VXLAN_ENABLED (1UL << 1)
+#define MLX5_HCA_FLEX_IPV6_OVER_IP_ENABLED (1UL << 2)
+#define MLX5_HCA_FLEX_GENEVE_ENABLED (1UL << 3)
+#define MLX5_HCA_FLEX_CW_MPLS_OVER_GRE_ENABLED (1UL << 4)
+#define MLX5_HCA_FLEX_CW_MPLS_OVER_UDP_ENABLED (1UL << 5)
+#define MLX5_HCA_FLEX_P_BIT_VXLAN_GPE_ENABLED (1UL << 6)
+#define MLX5_HCA_FLEX_VXLAN_GPE_ENABLED (1UL << 7)
+#define MLX5_HCA_FLEX_ICMP_ENABLED (1UL << 8)
+#define MLX5_HCA_FLEX_ICMPV6_ENABLED (1UL << 9)
+
+struct mlx5_ifc_cmd_hca_cap_bits {
+	u8 reserved_at_0[0x30];
+	u8 vhca_id[0x10];
+	u8 reserved_at_40[0x40];
+	u8 log_max_srq_sz[0x8];
+	u8 log_max_qp_sz[0x8];
+	u8 reserved_at_90[0xb];
+	u8 log_max_qp[0x5];
+	u8 reserved_at_a0[0xb];
+	u8 log_max_srq[0x5];
+	u8 reserved_at_b0[0x10];
+	u8 reserved_at_c0[0x8];
+	u8 log_max_cq_sz[0x8];
+	u8 reserved_at_d0[0xb];
+	u8 log_max_cq[0x5];
+	u8 log_max_eq_sz[0x8];
+	u8 reserved_at_e8[0x2];
+	u8 log_max_mkey[0x6];
+	u8 reserved_at_f0[0x8];
+	u8 dump_fill_mkey[0x1];
+	u8 reserved_at_f9[0x3];
+	u8 log_max_eq[0x4];
+	u8 max_indirection[0x8];
+	u8 fixed_buffer_size[0x1];
+	u8 log_max_mrw_sz[0x7];
+	u8 force_teardown[0x1];
+	u8 reserved_at_111[0x1];
+	u8 log_max_bsf_list_size[0x6];
+	u8 umr_extended_translation_offset[0x1];
+	u8 null_mkey[0x1];
+	u8 log_max_klm_list_size[0x6];
+	u8 reserved_at_120[0xa];
+	u8 log_max_ra_req_dc[0x6];
+	u8 reserved_at_130[0xa];
+	u8 log_max_ra_res_dc[0x6];
+	u8 reserved_at_140[0xa];
+	u8 log_max_ra_req_qp[0x6];
+	u8 reserved_at_150[0xa];
+	u8 log_max_ra_res_qp[0x6];
+	u8 end_pad[0x1];
+	u8 cc_query_allowed[0x1];
+	u8 cc_modify_allowed[0x1];
+	u8 start_pad[0x1];
+	u8 cache_line_128byte[0x1];
+	u8 reserved_at_165[0xa];
+	u8 qcam_reg[0x1];
+	u8 gid_table_size[0x10];
+	u8 out_of_seq_cnt[0x1];
+	u8 vport_counters[0x1];
+	u8 retransmission_q_counters[0x1];
+	u8 debug[0x1];
+	u8 modify_rq_counter_set_id[0x1];
+	u8 rq_delay_drop[0x1];
+	u8 max_qp_cnt[0xa];
+	u8 pkey_table_size[0x10];
+	u8 vport_group_manager[0x1];
+	u8 vhca_group_manager[0x1];
+	u8 ib_virt[0x1];
+	u8 eth_virt[0x1];
+	u8 vnic_env_queue_counters[0x1];
+	u8 ets[0x1];
+	u8 nic_flow_table[0x1];
+	u8 eswitch_manager[0x1];
+	u8 device_memory[0x1];
+	u8 mcam_reg[0x1];
+	u8 pcam_reg[0x1];
+	u8 local_ca_ack_delay[0x5];
+	u8 port_module_event[0x1];
+	u8 enhanced_error_q_counters[0x1];
+	u8 ports_check[0x1];
+	u8 reserved_at_1b3[0x1];
+	u8 disable_link_up[0x1];
+	u8 beacon_led[0x1];
+	u8 port_type[0x2];
+	u8 num_ports[0x8];
+	u8 reserved_at_1c0[0x1];
+	u8 pps[0x1];
+	u8 pps_modify[0x1];
+	u8 log_max_msg[0x5];
+	u8 reserved_at_1c8[0x4];
+	u8 max_tc[0x4];
+	u8 temp_warn_event[0x1];
+	u8 dcbx[0x1];
+	u8 general_notification_event[0x1];
+	u8 reserved_at_1d3[0x2];
+	u8 fpga[0x1];
+	u8 rol_s[0x1];
+	u8 rol_g[0x1];
+	u8 reserved_at_1d8[0x1];
+	u8 wol_s[0x1];
+	u8 wol_g[0x1];
+	u8 wol_a[0x1];
+	u8 wol_b[0x1];
+	u8 wol_m[0x1];
+	u8 wol_u[0x1];
+	u8 wol_p[0x1];
+	u8 stat_rate_support[0x10];
+	u8 reserved_at_1f0[0xc];
+	u8 cqe_version[0x4];
+	u8 compact_address_vector[0x1];
+	u8 striding_rq[0x1];
+	u8 reserved_at_202[0x1];
+	u8 ipoib_enhanced_offloads[0x1];
+	u8 ipoib_basic_offloads[0x1];
+	u8 reserved_at_205[0x1];
+	u8 repeated_block_disabled[0x1];
+	u8 umr_modify_entity_size_disabled[0x1];
+	u8 umr_modify_atomic_disabled[0x1];
+	u8 umr_indirect_mkey_disabled[0x1];
+	u8 umr_fence[0x2];
+	u8 reserved_at_20c[0x3];
+	u8 drain_sigerr[0x1];
+	u8 cmdif_checksum[0x2];
+	u8 sigerr_cqe[0x1];
+	u8 reserved_at_213[0x1];
+	u8 wq_signature[0x1];
+	u8 sctr_data_cqe[0x1];
+	u8 reserved_at_216[0x1];
+	u8 sho[0x1];
+	u8 tph[0x1];
+	u8 rf[0x1];
+	u8 dct[0x1];
+	u8 qos[0x1];
+	u8 eth_net_offloads[0x1];
+	u8 roce[0x1];
+	u8 atomic[0x1];
+	u8 reserved_at_21f[0x1];
+	u8 cq_oi[0x1];
+	u8 cq_resize[0x1];
+	u8 cq_moderation[0x1];
+	u8 reserved_at_223[0x3];
+	u8 cq_eq_remap[0x1];
+	u8 pg[0x1];
+	u8 block_lb_mc[0x1];
+	u8 reserved_at_229[0x1];
+	u8 scqe_break_moderation[0x1];
+	u8 cq_period_start_from_cqe[0x1];
+	u8 cd[0x1];
+	u8 reserved_at_22d[0x1];
+	u8 apm[0x1];
+	u8 vector_calc[0x1];
+	u8 umr_ptr_rlky[0x1];
+	u8 imaicl[0x1];
+	u8 reserved_at_232[0x4];
+	u8 qkv[0x1];
+	u8 pkv[0x1];
+	u8 set_deth_sqpn[0x1];
+	u8 reserved_at_239[0x3];
+	u8 xrc[0x1];
+	u8 ud[0x1];
+	u8 uc[0x1];
+	u8 rc[0x1];
+	u8 uar_4k[0x1];
+	u8 reserved_at_241[0x9];
+	u8 uar_sz[0x6];
+	u8 reserved_at_250[0x8];
+	u8 log_pg_sz[0x8];
+	u8 bf[0x1];
+	u8 driver_version[0x1];
+	u8 pad_tx_eth_packet[0x1];
+	u8 reserved_at_263[0x8];
+	u8 log_bf_reg_size[0x5];
+	u8 reserved_at_270[0xb];
+	u8 lag_master[0x1];
+	u8 num_lag_ports[0x4];
+	u8 reserved_at_280[0x10];
+	u8 max_wqe_sz_sq[0x10];
+	u8 reserved_at_2a0[0x10];
+	u8 max_wqe_sz_rq[0x10];
+	u8 max_flow_counter_31_16[0x10];
+	u8 max_wqe_sz_sq_dc[0x10];
+	u8 reserved_at_2e0[0x7];
+	u8 max_qp_mcg[0x19];
+	u8 reserved_at_300[0x10];
+	u8 flow_counter_bulk_alloc[0x08];
+	u8 log_max_mcg[0x8];
+	u8 reserved_at_320[0x3];
+	u8 log_max_transport_domain[0x5];
+	u8 reserved_at_328[0x3];
+	u8 log_max_pd[0x5];
+	u8 reserved_at_330[0xb];
+	u8 log_max_xrcd[0x5];
+	u8 nic_receive_steering_discard[0x1];
+	u8 receive_discard_vport_down[0x1];
+	u8 transmit_discard_vport_down[0x1];
+	u8 reserved_at_343[0x5];
+	u8 log_max_flow_counter_bulk[0x8];
+	u8 max_flow_counter_15_0[0x10];
+	u8 modify_tis[0x1];
+	u8 flow_counters_dump[0x1];
+	u8 reserved_at_360[0x1];
+	u8 log_max_rq[0x5];
+	u8 reserved_at_368[0x3];
+	u8 log_max_sq[0x5];
+	u8 reserved_at_370[0x3];
+	u8 log_max_tir[0x5];
+	u8 reserved_at_378[0x3];
+	u8 log_max_tis[0x5];
+	u8 basic_cyclic_rcv_wqe[0x1];
+	u8 reserved_at_381[0x2];
+	u8 log_max_rmp[0x5];
+	u8 reserved_at_388[0x3];
+	u8 log_max_rqt[0x5];
+	u8 reserved_at_390[0x3];
+	u8 log_max_rqt_size[0x5];
+	u8 reserved_at_398[0x3];
+	u8 log_max_tis_per_sq[0x5];
+	u8 ext_stride_num_range[0x1];
+	u8 reserved_at_3a1[0x2];
+	u8 log_max_stride_sz_rq[0x5];
+	u8 reserved_at_3a8[0x3];
+	u8 log_min_stride_sz_rq[0x5];
+	u8 reserved_at_3b0[0x3];
+	u8 log_max_stride_sz_sq[0x5];
+	u8 reserved_at_3b8[0x3];
+	u8 log_min_stride_sz_sq[0x5];
+	u8 hairpin[0x1];
+	u8 reserved_at_3c1[0x2];
+	u8 log_max_hairpin_queues[0x5];
+	u8 reserved_at_3c8[0x3];
+	u8 log_max_hairpin_wq_data_sz[0x5];
+	u8 reserved_at_3d0[0x3];
+	u8 log_max_hairpin_num_packets[0x5];
+	u8 reserved_at_3d8[0x3];
+	u8 log_max_wq_sz[0x5];
+	u8 nic_vport_change_event[0x1];
+	u8 disable_local_lb_uc[0x1];
+	u8 disable_local_lb_mc[0x1];
+	u8 log_min_hairpin_wq_data_sz[0x5];
+	u8 reserved_at_3e8[0x3];
+	u8 log_max_vlan_list[0x5];
+	u8 reserved_at_3f0[0x3];
+	u8 log_max_current_mc_list[0x5];
+	u8 reserved_at_3f8[0x3];
+	u8 log_max_current_uc_list[0x5];
+	u8 general_obj_types[0x40];
+	u8 reserved_at_440[0x20];
+	u8 reserved_at_460[0x10];
+	u8 max_num_eqs[0x10];
+	u8 reserved_at_480[0x3];
+	u8 log_max_l2_table[0x5];
+	u8 reserved_at_488[0x8];
+	u8 log_uar_page_sz[0x10];
+	u8 reserved_at_4a0[0x20];
+	u8 device_frequency_mhz[0x20];
+	u8 device_frequency_khz[0x20];
+	u8 reserved_at_500[0x20];
+	u8 num_of_uars_per_page[0x20];
+	u8 flex_parser_protocols[0x20];
+	u8 reserved_at_560[0x20];
+	u8 reserved_at_580[0x3c];
+	u8 mini_cqe_resp_stride_index[0x1];
+	u8 cqe_128_always[0x1];
+	u8 cqe_compression_128[0x1];
+	u8 cqe_compression[0x1];
+	u8 cqe_compression_timeout[0x10];
+	u8 cqe_compression_max_num[0x10];
+	u8 reserved_at_5e0[0x10];
+	u8 tag_matching[0x1];
+	u8 rndv_offload_rc[0x1];
+	u8 rndv_offload_dc[0x1];
+	u8 log_tag_matching_list_sz[0x5];
+	u8 reserved_at_5f8[0x3];
+	u8 log_max_xrq[0x5];
+	u8 affiliate_nic_vport_criteria[0x8];
+	u8 native_port_num[0x8];
+	u8 num_vhca_ports[0x8];
+	u8 reserved_at_618[0x6];
+	u8 sw_owner_id[0x1];
+	u8 reserved_at_61f[0x1e1];
+};
+
+struct mlx5_ifc_qos_cap_bits {
+	u8 packet_pacing[0x1];
+	u8 esw_scheduling[0x1];
+	u8 esw_bw_share[0x1];
+	u8 esw_rate_limit[0x1];
+	u8 reserved_at_4[0x1];
+	u8 packet_pacing_burst_bound[0x1];
+	u8 packet_pacing_typical_size[0x1];
+	u8 flow_meter_srtcm[0x1];
+	u8 reserved_at_8[0x8];
+	u8 log_max_flow_meter[0x8];
+	u8 flow_meter_reg_id[0x8];
+	u8 reserved_at_20[0x20];
+	u8 packet_pacing_max_rate[0x20];
+	u8 packet_pacing_min_rate[0x20];
+	u8 reserved_at_80[0x10];
+	u8 packet_pacing_rate_table_size[0x10];
+	u8 esw_element_type[0x10];
+	u8 esw_tsar_type[0x10];
+	u8 reserved_at_c0[0x10];
+	u8 max_qos_para_vport[0x10];
+	u8 max_tsar_bw_share[0x20];
+	u8 reserved_at_100[0x6e8];
+};
+
+struct mlx5_ifc_per_protocol_networking_offload_caps_bits {
+	u8 csum_cap[0x1];
+	u8 vlan_cap[0x1];
+	u8 lro_cap[0x1];
+	u8 lro_psh_flag[0x1];
+	u8 lro_time_stamp[0x1];
+	u8 lro_max_msg_sz_mode[0x2];
+	u8 wqe_vlan_insert[0x1];
+	u8 self_lb_en_modifiable[0x1];
+	u8 self_lb_mc[0x1];
+	u8 self_lb_uc[0x1];
+	u8 max_lso_cap[0x5];
+	u8 multi_pkt_send_wqe[0x2];
+	u8 wqe_inline_mode[0x2];
+	u8 rss_ind_tbl_cap[0x4];
+	u8 reg_umr_sq[0x1];
+	u8 scatter_fcs[0x1];
+	u8 enhanced_multi_pkt_send_wqe[0x1];
+	u8 tunnel_lso_const_out_ip_id[0x1];
+	u8 tunnel_lro_gre[0x1];
+	u8 tunnel_lro_vxlan[0x1];
+	u8 tunnel_stateless_gre[0x1];
+	u8 tunnel_stateless_vxlan[0x1];
+	u8 swp[0x1];
+	u8 swp_csum[0x1];
+	u8 swp_lso[0x1];
+	u8 reserved_at_23[0x8];
+	u8 tunnel_stateless_gtp[0x1];
+	u8 reserved_at_2c[0x4];
+	u8 max_vxlan_udp_ports[0x8];
+	u8 reserved_at_38[0x6];
+	u8 max_geneve_opt_len[0x1];
+	u8 tunnel_stateless_geneve_rx[0x1];
+	u8 reserved_at_40[0x10];
+	u8 lro_min_mss_size[0x10];
+	u8 reserved_at_60[0x120];
+	u8 lro_timer_supported_periods[4][0x20];
+	u8 reserved_at_200[0x600];
+};
+
+union mlx5_ifc_hca_cap_union_bits {
+	struct mlx5_ifc_cmd_hca_cap_bits cmd_hca_cap;
+	struct mlx5_ifc_per_protocol_networking_offload_caps_bits
+	       per_protocol_networking_offload_caps;
+	struct mlx5_ifc_qos_cap_bits qos_cap;
+	u8 reserved_at_0[0x8000];
+};
+
+struct mlx5_ifc_query_hca_cap_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x40];
+	union mlx5_ifc_hca_cap_union_bits capability;
+};
+
+struct mlx5_ifc_query_hca_cap_in_bits {
+	u8 opcode[0x10];
+	u8 reserved_at_10[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0x40];
+};
+
+struct mlx5_ifc_mac_address_layout_bits {
+	u8 reserved_at_0[0x10];
+	u8 mac_addr_47_32[0x10];
+	u8 mac_addr_31_0[0x20];
+};
+
+struct mlx5_ifc_nic_vport_context_bits {
+	u8 reserved_at_0[0x5];
+	u8 min_wqe_inline_mode[0x3];
+	u8 reserved_at_8[0x15];
+	u8 disable_mc_local_lb[0x1];
+	u8 disable_uc_local_lb[0x1];
+	u8 roce_en[0x1];
+	u8 arm_change_event[0x1];
+	u8 reserved_at_21[0x1a];
+	u8 event_on_mtu[0x1];
+	u8 event_on_promisc_change[0x1];
+	u8 event_on_vlan_change[0x1];
+	u8 event_on_mc_address_change[0x1];
+	u8 event_on_uc_address_change[0x1];
+	u8 reserved_at_40[0xc];
+	u8 affiliation_criteria[0x4];
+	u8 affiliated_vhca_id[0x10];
+	u8 reserved_at_60[0xd0];
+	u8 mtu[0x10];
+	u8 system_image_guid[0x40];
+	u8 port_guid[0x40];
+	u8 node_guid[0x40];
+	u8 reserved_at_200[0x140];
+	u8 qkey_violation_counter[0x10];
+	u8 reserved_at_350[0x430];
+	u8 promisc_uc[0x1];
+	u8 promisc_mc[0x1];
+	u8 promisc_all[0x1];
+	u8 reserved_at_783[0x2];
+	u8 allowed_list_type[0x3];
+	u8 reserved_at_788[0xc];
+	u8 allowed_list_size[0xc];
+	struct mlx5_ifc_mac_address_layout_bits permanent_address;
+	u8 reserved_at_7e0[0x20];
+};
+
+struct mlx5_ifc_query_nic_vport_context_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x40];
+	struct mlx5_ifc_nic_vport_context_bits nic_vport_context;
+};
+
+struct mlx5_ifc_query_nic_vport_context_in_bits {
+	u8 opcode[0x10];
+	u8 reserved_at_10[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 other_vport[0x1];
+	u8 reserved_at_41[0xf];
+	u8 vport_number[0x10];
+	u8 reserved_at_60[0x5];
+	u8 allowed_list_type[0x3];
+	u8 reserved_at_68[0x18];
+};
+
+struct mlx5_ifc_tisc_bits {
+	u8 strict_lag_tx_port_affinity[0x1];
+	u8 reserved_at_1[0x3];
+	u8 lag_tx_port_affinity[0x04];
+	u8 reserved_at_8[0x4];
+	u8 prio[0x4];
+	u8 reserved_at_10[0x10];
+	u8 reserved_at_20[0x100];
+	u8 reserved_at_120[0x8];
+	u8 transport_domain[0x18];
+	u8 reserved_at_140[0x8];
+	u8 underlay_qpn[0x18];
+	u8 reserved_at_160[0x3a0];
+};
+
+struct mlx5_ifc_query_tis_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x40];
+	struct mlx5_ifc_tisc_bits tis_context;
+};
+
+struct mlx5_ifc_query_tis_in_bits {
+	u8 opcode[0x10];
+	u8 reserved_at_10[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0x8];
+	u8 tisn[0x18];
+	u8 reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_alloc_transport_domain_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x8];
+	u8 transport_domain[0x18];
+	u8 reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_alloc_transport_domain_in_bits {
+	u8 opcode[0x10];
+	u8 reserved_at_10[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0x40];
+};
+
+enum {
+	MLX5_WQ_TYPE_LINKED_LIST                = 0x0,
+	MLX5_WQ_TYPE_CYCLIC                     = 0x1,
+	MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ    = 0x2,
+	MLX5_WQ_TYPE_CYCLIC_STRIDING_RQ         = 0x3,
+};
+
+enum {
+	MLX5_WQ_END_PAD_MODE_NONE  = 0x0,
+	MLX5_WQ_END_PAD_MODE_ALIGN = 0x1,
+};
+
+struct mlx5_ifc_wq_bits {
+	u8 wq_type[0x4];
+	u8 wq_signature[0x1];
+	u8 end_padding_mode[0x2];
+	u8 cd_slave[0x1];
+	u8 reserved_at_8[0x18];
+	u8 hds_skip_first_sge[0x1];
+	u8 log2_hds_buf_size[0x3];
+	u8 reserved_at_24[0x7];
+	u8 page_offset[0x5];
+	u8 lwm[0x10];
+	u8 reserved_at_40[0x8];
+	u8 pd[0x18];
+	u8 reserved_at_60[0x8];
+	u8 uar_page[0x18];
+	u8 dbr_addr[0x40];
+	u8 hw_counter[0x20];
+	u8 sw_counter[0x20];
+	u8 reserved_at_100[0xc];
+	u8 log_wq_stride[0x4];
+	u8 reserved_at_110[0x3];
+	u8 log_wq_pg_sz[0x5];
+	u8 reserved_at_118[0x3];
+	u8 log_wq_sz[0x5];
+	u8 dbr_umem_valid[0x1];
+	u8 wq_umem_valid[0x1];
+	u8 reserved_at_122[0x1];
+	u8 log_hairpin_num_packets[0x5];
+	u8 reserved_at_128[0x3];
+	u8 log_hairpin_data_sz[0x5];
+	u8 reserved_at_130[0x4];
+	u8 single_wqe_log_num_of_strides[0x4];
+	u8 two_byte_shift_en[0x1];
+	u8 reserved_at_139[0x4];
+	u8 single_stride_log_num_of_bytes[0x3];
+	u8 dbr_umem_id[0x20];
+	u8 wq_umem_id[0x20];
+	u8 wq_umem_offset[0x40];
+	u8 reserved_at_1c0[0x440];
+};
+
+enum {
+	MLX5_RQC_MEM_RQ_TYPE_MEMORY_RQ_INLINE  = 0x0,
+	MLX5_RQC_MEM_RQ_TYPE_MEMORY_RQ_RMP     = 0x1,
+};
+
+enum {
+	MLX5_RQC_STATE_RST  = 0x0,
+	MLX5_RQC_STATE_RDY  = 0x1,
+	MLX5_RQC_STATE_ERR  = 0x3,
+};
+
+struct mlx5_ifc_rqc_bits {
+	u8 rlky[0x1];
+	u8 delay_drop_en[0x1];
+	u8 scatter_fcs[0x1];
+	u8 vsd[0x1];
+	u8 mem_rq_type[0x4];
+	u8 state[0x4];
+	u8 reserved_at_c[0x1];
+	u8 flush_in_error_en[0x1];
+	u8 hairpin[0x1];
+	u8 reserved_at_f[0x11];
+	u8 reserved_at_20[0x8];
+	u8 user_index[0x18];
+	u8 reserved_at_40[0x8];
+	u8 cqn[0x18];
+	u8 counter_set_id[0x8];
+	u8 reserved_at_68[0x18];
+	u8 reserved_at_80[0x8];
+	u8 rmpn[0x18];
+	u8 reserved_at_a0[0x8];
+	u8 hairpin_peer_sq[0x18];
+	u8 reserved_at_c0[0x10];
+	u8 hairpin_peer_vhca[0x10];
+	u8 reserved_at_e0[0xa0];
+	struct mlx5_ifc_wq_bits wq; /* Not used in LRO RQ. */
+};
+
+struct mlx5_ifc_create_rq_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x8];
+	u8 rqn[0x18];
+	u8 reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_create_rq_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0xc0];
+	struct mlx5_ifc_rqc_bits ctx;
+};
+
+struct mlx5_ifc_modify_rq_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x40];
+};
+
+struct mlx5_ifc_create_tis_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x8];
+	u8 tisn[0x18];
+	u8 reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_create_tis_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0xc0];
+	struct mlx5_ifc_tisc_bits ctx;
+};
+
+enum {
+	MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_WQ_LWM = 1ULL << 0,
+	MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_VSD = 1ULL << 1,
+	MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_SCATTER_FCS = 1ULL << 2,
+	MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_RQ_COUNTER_SET_ID = 1ULL << 3,
+};
+
+struct mlx5_ifc_modify_rq_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 rq_state[0x4];
+	u8 reserved_at_44[0x4];
+	u8 rqn[0x18];
+	u8 reserved_at_60[0x20];
+	u8 modify_bitmask[0x40];
+	u8 reserved_at_c0[0x40];
+	struct mlx5_ifc_rqc_bits ctx;
+};
+
+enum {
+	MLX5_RX_HASH_FIELD_SELECT_SELECTED_FIELDS_SRC_IP     = 0x0,
+	MLX5_RX_HASH_FIELD_SELECT_SELECTED_FIELDS_DST_IP     = 0x1,
+	MLX5_RX_HASH_FIELD_SELECT_SELECTED_FIELDS_L4_SPORT   = 0x2,
+	MLX5_RX_HASH_FIELD_SELECT_SELECTED_FIELDS_L4_DPORT   = 0x3,
+	MLX5_RX_HASH_FIELD_SELECT_SELECTED_FIELDS_IPSEC_SPI  = 0x4,
+};
+
+struct mlx5_ifc_rx_hash_field_select_bits {
+	u8 l3_prot_type[0x1];
+	u8 l4_prot_type[0x1];
+	u8 selected_fields[0x1e];
+};
+
+enum {
+	MLX5_TIRC_DISP_TYPE_DIRECT    = 0x0,
+	MLX5_TIRC_DISP_TYPE_INDIRECT  = 0x1,
+};
+
+enum {
+	MLX5_TIRC_LRO_ENABLE_MASK_IPV4_LRO  = 0x1,
+	MLX5_TIRC_LRO_ENABLE_MASK_IPV6_LRO  = 0x2,
+};
+
+enum {
+	MLX5_RX_HASH_FN_NONE           = 0x0,
+	MLX5_RX_HASH_FN_INVERTED_XOR8  = 0x1,
+	MLX5_RX_HASH_FN_TOEPLITZ       = 0x2,
+};
+
+enum {
+	MLX5_TIRC_SELF_LB_BLOCK_BLOCK_UNICAST    = 0x1,
+	MLX5_TIRC_SELF_LB_BLOCK_BLOCK_MULTICAST  = 0x2,
+};
+
+enum {
+	MLX5_LRO_MAX_MSG_SIZE_START_FROM_L4  = 0x0,
+	MLX5_LRO_MAX_MSG_SIZE_START_FROM_L2  = 0x1,
+};
+
+struct mlx5_ifc_tirc_bits {
+	u8 reserved_at_0[0x20];
+	u8 disp_type[0x4];
+	u8 reserved_at_24[0x1c];
+	u8 reserved_at_40[0x40];
+	u8 reserved_at_80[0x4];
+	u8 lro_timeout_period_usecs[0x10];
+	u8 lro_enable_mask[0x4];
+	u8 lro_max_msg_sz[0x8];
+	u8 reserved_at_a0[0x40];
+	u8 reserved_at_e0[0x8];
+	u8 inline_rqn[0x18];
+	u8 rx_hash_symmetric[0x1];
+	u8 reserved_at_101[0x1];
+	u8 tunneled_offload_en[0x1];
+	u8 reserved_at_103[0x5];
+	u8 indirect_table[0x18];
+	u8 rx_hash_fn[0x4];
+	u8 reserved_at_124[0x2];
+	u8 self_lb_block[0x2];
+	u8 transport_domain[0x18];
+	u8 rx_hash_toeplitz_key[10][0x20];
+	struct mlx5_ifc_rx_hash_field_select_bits rx_hash_field_selector_outer;
+	struct mlx5_ifc_rx_hash_field_select_bits rx_hash_field_selector_inner;
+	u8 reserved_at_2c0[0x4c0];
+};
+
+struct mlx5_ifc_create_tir_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x8];
+	u8 tirn[0x18];
+	u8 reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_create_tir_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0xc0];
+	struct mlx5_ifc_tirc_bits ctx;
+};
+
+struct mlx5_ifc_rq_num_bits {
+	u8 reserved_at_0[0x8];
+	u8 rq_num[0x18];
+};
+
+struct mlx5_ifc_rqtc_bits {
+	u8 reserved_at_0[0xa0];
+	u8 reserved_at_a0[0x10];
+	u8 rqt_max_size[0x10];
+	u8 reserved_at_c0[0x10];
+	u8 rqt_actual_size[0x10];
+	u8 reserved_at_e0[0x6a0];
+	struct mlx5_ifc_rq_num_bits rq_num[];
+};
+
+struct mlx5_ifc_create_rqt_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x8];
+	u8 rqtn[0x18];
+	u8 reserved_at_60[0x20];
+};
+
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-Wpedantic"
+#endif
+struct mlx5_ifc_create_rqt_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0xc0];
+	struct mlx5_ifc_rqtc_bits rqt_context;
+};
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-Wpedantic"
+#endif
+
+enum {
+	MLX5_SQC_STATE_RST  = 0x0,
+	MLX5_SQC_STATE_RDY  = 0x1,
+	MLX5_SQC_STATE_ERR  = 0x3,
+};
+
+struct mlx5_ifc_sqc_bits {
+	u8 rlky[0x1];
+	u8 cd_master[0x1];
+	u8 fre[0x1];
+	u8 flush_in_error_en[0x1];
+	u8 allow_multi_pkt_send_wqe[0x1];
+	u8 min_wqe_inline_mode[0x3];
+	u8 state[0x4];
+	u8 reg_umr[0x1];
+	u8 allow_swp[0x1];
+	u8 hairpin[0x1];
+	u8 reserved_at_f[0x11];
+	u8 reserved_at_20[0x8];
+	u8 user_index[0x18];
+	u8 reserved_at_40[0x8];
+	u8 cqn[0x18];
+	u8 reserved_at_60[0x8];
+	u8 hairpin_peer_rq[0x18];
+	u8 reserved_at_80[0x10];
+	u8 hairpin_peer_vhca[0x10];
+	u8 reserved_at_a0[0x50];
+	u8 packet_pacing_rate_limit_index[0x10];
+	u8 tis_lst_sz[0x10];
+	u8 reserved_at_110[0x10];
+	u8 reserved_at_120[0x40];
+	u8 reserved_at_160[0x8];
+	u8 tis_num_0[0x18];
+	struct mlx5_ifc_wq_bits wq;
+};
+
+struct mlx5_ifc_query_sq_in_bits {
+	u8 opcode[0x10];
+	u8 reserved_at_10[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0x8];
+	u8 sqn[0x18];
+	u8 reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_modify_sq_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x40];
+};
+
+struct mlx5_ifc_modify_sq_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 sq_state[0x4];
+	u8 reserved_at_44[0x4];
+	u8 sqn[0x18];
+	u8 reserved_at_60[0x20];
+	u8 modify_bitmask[0x40];
+	u8 reserved_at_c0[0x40];
+	struct mlx5_ifc_sqc_bits ctx;
+};
+
+struct mlx5_ifc_create_sq_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x8];
+	u8 sqn[0x18];
+	u8 reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_create_sq_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0xc0];
+	struct mlx5_ifc_sqc_bits ctx;
+};
+
+enum {
+	MLX5_FLOW_METER_OBJ_MODIFY_FIELD_ACTIVE = (1ULL << 0),
+	MLX5_FLOW_METER_OBJ_MODIFY_FIELD_CBS = (1ULL << 1),
+	MLX5_FLOW_METER_OBJ_MODIFY_FIELD_CIR = (1ULL << 2),
+	MLX5_FLOW_METER_OBJ_MODIFY_FIELD_EBS = (1ULL << 3),
+	MLX5_FLOW_METER_OBJ_MODIFY_FIELD_EIR = (1ULL << 4),
+};
+
+struct mlx5_ifc_flow_meter_parameters_bits {
+	u8         valid[0x1];			/* 00h */
+	u8         bucket_overflow[0x1];
+	u8         start_color[0x2];
+	u8         both_buckets_on_green[0x1];
+	u8         meter_mode[0x2];
+	u8         reserved_at_1[0x19];
+	u8         reserved_at_2[0x20];		/* 04h */
+	u8         reserved_at_3[0x3];
+	u8         cbs_exponent[0x5];		/* 08h */
+	u8         cbs_mantissa[0x8];
+	u8         reserved_at_4[0x3];
+	u8         cir_exponent[0x5];
+	u8         cir_mantissa[0x8];
+	u8         reserved_at_5[0x20];		/* 0Ch */
+	u8         reserved_at_6[0x3];
+	u8         ebs_exponent[0x5];		/* 10h */
+	u8         ebs_mantissa[0x8];
+	u8         reserved_at_7[0x3];
+	u8         eir_exponent[0x5];
+	u8         eir_mantissa[0x8];
+	u8         reserved_at_8[0x60];		/* 14h-1Ch */
+};
+
+/* CQE format mask. */
+#define MLX5E_CQE_FORMAT_MASK 0xc
+
+/* MPW opcode. */
+#define MLX5_OPC_MOD_MPW 0x01
+
+/* Compressed Rx CQE structure. */
+struct mlx5_mini_cqe8 {
+	union {
+		uint32_t rx_hash_result;
+		struct {
+			uint16_t checksum;
+			uint16_t stride_idx;
+		};
+		struct {
+			uint16_t wqe_counter;
+			uint8_t  s_wqe_opcode;
+			uint8_t  reserved;
+		} s_wqe_info;
+	};
+	uint32_t byte_cnt;
+};
+
+/* srTCM PRM flow meter parameters. */
+enum {
+	MLX5_FLOW_COLOR_RED = 0,
+	MLX5_FLOW_COLOR_YELLOW,
+	MLX5_FLOW_COLOR_GREEN,
+	MLX5_FLOW_COLOR_UNDEFINED,
+};
+
+/* Maximum value of srTCM metering parameters. */
+#define MLX5_SRTCM_CBS_MAX (0xFF * (1ULL << 0x1F))
+#define MLX5_SRTCM_CIR_MAX (8 * (1ULL << 30) * 0xFF)
+#define MLX5_SRTCM_EBS_MAX 0
+
+/**
+ * Convert a user mark to flow mark.
+ *
+ * @param val
+ *   Mark value to convert.
+ *
+ * @return
+ *   Converted mark value.
+ */
+static inline uint32_t
+mlx5_flow_mark_set(uint32_t val)
+{
+	uint32_t ret;
+
+	/*
+	 * Add one to the user value to differentiate un-marked flows from
+	 * marked flows. If the ID is equal to MLX5_FLOW_MARK_DEFAULT, it
+	 * remains untouched.
+	 */
+	if (val != MLX5_FLOW_MARK_DEFAULT)
+		++val;
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+	/*
+	 * Mark is 24 bits (minus reserved values) but is stored in a 32-bit
+	 * word, byte-swapped by the kernel on little-endian systems. In this
+	 * case, right-shifting the resulting big-endian value ensures the
+	 * least significant 24 bits are retained when converting it back.
+	 */
+	ret = rte_cpu_to_be_32(val) >> 8;
+#else
+	ret = val;
+#endif
+	return ret;
+}
+
+/**
+ * Convert a mark to user mark.
+ *
+ * @param val
+ *   Mark value to convert.
+ *
+ * @return
+ *   Converted mark value.
+ */
+static inline uint32_t
+mlx5_flow_mark_get(uint32_t val)
+{
+	/*
+	 * Subtract one from the retrieved value. It was added by
+	 * mlx5_flow_mark_set() to distinguish unmarked flows.
+	 */
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+	return (val >> 8) - 1;
+#else
+	return val - 1;
+#endif
+}
+
+#endif /* RTE_PMD_MLX5_PRM_H_ */
diff --git a/drivers/common/mlx5/rte_common_mlx5_version.map b/drivers/common/mlx5/rte_common_mlx5_version.map
new file mode 100644
index 0000000..e4f85e2
--- /dev/null
+++ b/drivers/common/mlx5/rte_common_mlx5_version.map
@@ -0,0 +1,20 @@
+DPDK_20.02 {
+	global:
+
+	mlx5_devx_cmd_create_rq;
+	mlx5_devx_cmd_create_rqt;
+	mlx5_devx_cmd_create_sq;
+	mlx5_devx_cmd_create_tir;
+	mlx5_devx_cmd_create_td;
+	mlx5_devx_cmd_create_tis;
+	mlx5_devx_cmd_destroy;
+	mlx5_devx_cmd_flow_counter_alloc;
+	mlx5_devx_cmd_flow_counter_query;
+	mlx5_devx_cmd_flow_dump;
+	mlx5_devx_cmd_mkey_create;
+	mlx5_devx_cmd_modify_rq;
+	mlx5_devx_cmd_modify_sq;
+	mlx5_devx_cmd_qp_query_tis_td;
+	mlx5_devx_cmd_query_hca_attr;
+	mlx5_devx_get_out_command_status;
+};
diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index 0466d9d..88ce197 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -12,9 +12,6 @@ LIB_GLUE_VERSION = 20.02.0
 
 # Sources.
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5.c
-ifneq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
-SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_glue.c
-endif
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_rxq.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_txq.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_rxtx.c
@@ -37,34 +34,22 @@ SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_flow_dv.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_flow_verbs.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_mp.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_nl.c
-SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_devx_cmds.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_utils.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_socket.c
 
-ifeq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
-INSTALL-$(CONFIG_RTE_LIBRTE_MLX5_PMD)-lib += $(LIB_GLUE)
-endif
-
 # Basic CFLAGS.
 CFLAGS += -O3
 CFLAGS += -std=c11 -Wall -Wextra
 CFLAGS += -g
-CFLAGS += -I.
+CFLAGS += -I$(RTE_SDK)/drivers/common/mlx5
+CFLAGS += -I$(RTE_SDK)/drivers/net/mlx5
+CFLAGS += -I$(BUILDDIR)/drivers/common/mlx5
 CFLAGS += -D_BSD_SOURCE
 CFLAGS += -D_DEFAULT_SOURCE
 CFLAGS += -D_XOPEN_SOURCE=600
 CFLAGS += $(WERROR_FLAGS)
 CFLAGS += -Wno-strict-prototypes
-ifeq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
-CFLAGS += -DMLX5_GLUE='"$(LIB_GLUE)"'
-CFLAGS += -DMLX5_GLUE_VERSION='"$(LIB_GLUE_VERSION)"'
-CFLAGS_mlx5_glue.o += -fPIC
-LDLIBS += -ldl
-else ifeq ($(CONFIG_RTE_IBVERBS_LINK_STATIC),y)
-LDLIBS += $(shell $(RTE_SDK)/buildtools/options-ibverbs-static.sh)
-else
-LDLIBS += -libverbs -lmlx5
-endif
+LDLIBS += -lrte_common_mlx5
 LDLIBS += -lm
 LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring
 LDLIBS += -lrte_ethdev -lrte_net -lrte_kvargs
@@ -74,6 +59,7 @@ LDLIBS += -lrte_bus_pci
 CFLAGS += -Wno-error=cast-qual
 
 EXPORT_MAP := rte_pmd_mlx5_version.map
+
 # memseg walk is not part of stable API
 CFLAGS += -DALLOW_EXPERIMENTAL_API
 
@@ -96,282 +82,3 @@ endif
 
 include $(RTE_SDK)/mk/rte.lib.mk
 
-# Generate and clean-up mlx5_autoconf.h.
-
-export CC CFLAGS CPPFLAGS EXTRA_CFLAGS EXTRA_CPPFLAGS
-export AUTO_CONFIG_CFLAGS += -Wno-error
-
-ifndef V
-AUTOCONF_OUTPUT := >/dev/null
-endif
-
-mlx5_autoconf.h.new: FORCE
-
-mlx5_autoconf.h.new: $(RTE_SDK)/buildtools/auto-config-h.sh
-	$Q $(RM) -f -- '$@'
-	$Q sh -- '$<' '$@' \
-		HAVE_IBV_DEVICE_STRIDING_RQ_SUPPORT \
-		infiniband/mlx5dv.h \
-		enum MLX5DV_CQE_RES_FORMAT_CSUM_STRIDX \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IBV_DEVICE_TUNNEL_SUPPORT \
-		infiniband/mlx5dv.h \
-		enum MLX5DV_CONTEXT_MASK_TUNNEL_OFFLOADS \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IBV_DEVICE_MPLS_SUPPORT \
-		infiniband/verbs.h \
-		enum IBV_FLOW_SPEC_MPLS \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IBV_WQ_FLAGS_PCI_WRITE_END_PADDING \
-		infiniband/verbs.h \
-		enum IBV_WQ_FLAGS_PCI_WRITE_END_PADDING \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IBV_WQ_FLAG_RX_END_PADDING \
-		infiniband/verbs.h \
-		enum IBV_WQ_FLAG_RX_END_PADDING \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IBV_MLX5_MOD_SWP \
-		infiniband/mlx5dv.h \
-		type 'struct mlx5dv_sw_parsing_caps' \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IBV_MLX5_MOD_MPW \
-		infiniband/mlx5dv.h \
-		enum MLX5DV_CONTEXT_FLAGS_MPW_ALLOWED \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IBV_MLX5_MOD_CQE_128B_COMP \
-		infiniband/mlx5dv.h \
-		enum MLX5DV_CONTEXT_FLAGS_CQE_128B_COMP \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IBV_MLX5_MOD_CQE_128B_PAD \
-		infiniband/mlx5dv.h \
-		enum MLX5DV_CQ_INIT_ATTR_FLAGS_CQE_PAD \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IBV_FLOW_DV_SUPPORT \
-		infiniband/mlx5dv.h \
-		func mlx5dv_create_flow_action_packet_reformat \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_MLX5DV_DR \
-		infiniband/mlx5dv.h \
-		enum MLX5DV_DR_DOMAIN_TYPE_NIC_RX \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_MLX5DV_DR_ESWITCH \
-		infiniband/mlx5dv.h \
-		enum MLX5DV_DR_DOMAIN_TYPE_FDB \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_MLX5DV_DR_VLAN \
-		infiniband/mlx5dv.h \
-		func mlx5dv_dr_action_create_push_vlan \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_MLX5DV_DR_DEVX_PORT \
-		infiniband/mlx5dv.h \
-		func mlx5dv_query_devx_port \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IBV_DEVX_OBJ \
-		infiniband/mlx5dv.h \
-		func mlx5dv_devx_obj_create \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IBV_FLOW_DEVX_COUNTERS \
-		infiniband/mlx5dv.h \
-		enum MLX5DV_FLOW_ACTION_COUNTERS_DEVX \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IBV_DEVX_ASYNC \
-		infiniband/mlx5dv.h \
-		func mlx5dv_devx_obj_query_async \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_MLX5DV_DR_ACTION_DEST_DEVX_TIR \
-		infiniband/mlx5dv.h \
-		func mlx5dv_dr_action_create_dest_devx_tir \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_MLX5_DR_CREATE_ACTION_FLOW_METER \
-		infiniband/mlx5dv.h \
-		func mlx5dv_dr_action_create_flow_meter \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_MLX5_DR_FLOW_DUMP \
-		infiniband/mlx5dv.h \
-		func mlx5dv_dump_dr_domain \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_MLX5DV_MMAP_GET_NC_PAGES_CMD \
-		infiniband/mlx5dv.h \
-		enum MLX5_MMAP_GET_NC_PAGES_CMD \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_ETHTOOL_LINK_MODE_25G \
-		/usr/include/linux/ethtool.h \
-		enum ETHTOOL_LINK_MODE_25000baseCR_Full_BIT \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_ETHTOOL_LINK_MODE_50G \
-		/usr/include/linux/ethtool.h \
-		enum ETHTOOL_LINK_MODE_50000baseCR2_Full_BIT \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_ETHTOOL_LINK_MODE_100G \
-		/usr/include/linux/ethtool.h \
-		enum ETHTOOL_LINK_MODE_100000baseKR4_Full_BIT \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IBV_DEVICE_COUNTERS_SET_V42 \
-		infiniband/verbs.h \
-		type 'struct ibv_counter_set_init_attr' \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IBV_DEVICE_COUNTERS_SET_V45 \
-		infiniband/verbs.h \
-		type 'struct ibv_counters_init_attr' \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_RDMA_NL_NLDEV \
-		rdma/rdma_netlink.h \
-		enum RDMA_NL_NLDEV \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_RDMA_NLDEV_CMD_GET \
-		rdma/rdma_netlink.h \
-		enum RDMA_NLDEV_CMD_GET \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_RDMA_NLDEV_CMD_PORT_GET \
-		rdma/rdma_netlink.h \
-		enum RDMA_NLDEV_CMD_PORT_GET \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_RDMA_NLDEV_ATTR_DEV_INDEX \
-		rdma/rdma_netlink.h \
-		enum RDMA_NLDEV_ATTR_DEV_INDEX \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_RDMA_NLDEV_ATTR_DEV_NAME \
-		rdma/rdma_netlink.h \
-		enum RDMA_NLDEV_ATTR_DEV_NAME \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_RDMA_NLDEV_ATTR_PORT_INDEX \
-		rdma/rdma_netlink.h \
-		enum RDMA_NLDEV_ATTR_PORT_INDEX \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_RDMA_NLDEV_ATTR_NDEV_INDEX \
-		rdma/rdma_netlink.h \
-		enum RDMA_NLDEV_ATTR_NDEV_INDEX \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IFLA_NUM_VF \
-		linux/if_link.h \
-		enum IFLA_NUM_VF \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IFLA_EXT_MASK \
-		linux/if_link.h \
-		enum IFLA_EXT_MASK \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IFLA_PHYS_SWITCH_ID \
-		linux/if_link.h \
-		enum IFLA_PHYS_SWITCH_ID \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IFLA_PHYS_PORT_NAME \
-		linux/if_link.h \
-		enum IFLA_PHYS_PORT_NAME \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_SUPPORTED_40000baseKR4_Full \
-		/usr/include/linux/ethtool.h \
-		define SUPPORTED_40000baseKR4_Full \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_SUPPORTED_40000baseCR4_Full \
-		/usr/include/linux/ethtool.h \
-		define SUPPORTED_40000baseCR4_Full \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_SUPPORTED_40000baseSR4_Full \
-		/usr/include/linux/ethtool.h \
-		define SUPPORTED_40000baseSR4_Full \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_SUPPORTED_40000baseLR4_Full \
-		/usr/include/linux/ethtool.h \
-		define SUPPORTED_40000baseLR4_Full \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_SUPPORTED_56000baseKR4_Full \
-		/usr/include/linux/ethtool.h \
-		define SUPPORTED_56000baseKR4_Full \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_SUPPORTED_56000baseCR4_Full \
-		/usr/include/linux/ethtool.h \
-		define SUPPORTED_56000baseCR4_Full \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_SUPPORTED_56000baseSR4_Full \
-		/usr/include/linux/ethtool.h \
-		define SUPPORTED_56000baseSR4_Full \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_SUPPORTED_56000baseLR4_Full \
-		/usr/include/linux/ethtool.h \
-		define SUPPORTED_56000baseLR4_Full \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_STATIC_ASSERT \
-		/usr/include/assert.h \
-		define static_assert \
-		$(AUTOCONF_OUTPUT)
-
-# Create mlx5_autoconf.h or update it in case it differs from the new one.
-
-mlx5_autoconf.h: mlx5_autoconf.h.new
-	$Q [ -f '$@' ] && \
-		cmp '$<' '$@' $(AUTOCONF_OUTPUT) || \
-		mv '$<' '$@'
-
-$(SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD):.c=.o): mlx5_autoconf.h
-
-# Generate dependency plug-in for rdma-core when the PMD must not be linked
-# directly, so that applications do not inherit this dependency.
-
-ifeq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
-
-$(LIB): $(LIB_GLUE)
-
-ifeq ($(LINK_USING_CC),1)
-GLUE_LDFLAGS := $(call linkerprefix,$(LDFLAGS))
-else
-GLUE_LDFLAGS := $(LDFLAGS)
-endif
-$(LIB_GLUE): mlx5_glue.o
-	$Q $(LD) $(GLUE_LDFLAGS) $(EXTRA_LDFLAGS) \
-		-Wl,-h,$(LIB_GLUE) \
-		-shared -o $@ $< -libverbs -lmlx5
-
-mlx5_glue.o: mlx5_autoconf.h
-
-endif
-
-clean_mlx5: FORCE
-	$Q rm -f -- mlx5_autoconf.h mlx5_autoconf.h.new
-	$Q rm -f -- mlx5_glue.o $(LIB_GLUE_BASE)*
-
-clean: clean_mlx5
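
Note on the glue plug-in rules removed above: mlx5_glue.o is linked into its own shared object, and the driver reaches rdma-core only through a table of function pointers (the mlx5_glue->... calls in the C sources). A minimal sketch of that indirection, assuming a hypothetical two-entry table; demo_glue, demo_obj_create, demo_obj_destroy and demo_roundtrip are illustrative names and are not part of the mlx5 sources.

```c
#include <assert.h>

/* Hypothetical glue table: every external call goes through a pointer. */
struct demo_glue {
	const char *version;
	int (*obj_create)(int id);
	int (*obj_destroy)(int handle);
};

/* Stand-ins for the wrapped library calls (assumption, not rdma-core). */
static int
demo_obj_create(int id)
{
	return id + 100; /* pretend handle derived from the id */
}

static int
demo_obj_destroy(int handle)
{
	return handle >= 100 ? 0 : -1; /* reject handles we never issued */
}

static const struct demo_glue demo_glue_impl = {
	.version = "demo-1",
	.obj_create = demo_obj_create,
	.obj_destroy = demo_obj_destroy,
};

/*
 * Callers only see this pointer. Here it is bound statically; a
 * dlopen-based build could assign it at run time instead.
 */
static const struct demo_glue *demo_glue = &demo_glue_impl;

/* Caller side: create and destroy an object through the table. */
static int
demo_roundtrip(int id)
{
	int handle = demo_glue->obj_create(id);

	return demo_glue->obj_destroy(handle);
}
```

Because callers depend on the table type and the pointer only, the object that implements the table can be linked in directly or loaded as a plug-in without touching the call sites.
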
diff --git a/drivers/net/mlx5/meson.build b/drivers/net/mlx5/meson.build
index 3ad4f02..f6d0db9 100644
--- a/drivers/net/mlx5/meson.build
+++ b/drivers/net/mlx5/meson.build
@@ -7,224 +7,54 @@ if not is_linux
 	reason = 'only supported on Linux'
 	subdir_done()
 endif
-build = true
 
-pmd_dlopen = (get_option('ibverbs_link') == 'dlopen')
 LIB_GLUE_BASE = 'librte_pmd_mlx5_glue.so'
 LIB_GLUE_VERSION = '20.02.0'
 LIB_GLUE = LIB_GLUE_BASE + '.' + LIB_GLUE_VERSION
-if pmd_dlopen
-	dpdk_conf.set('RTE_IBVERBS_LINK_DLOPEN', 1)
-	cflags += [
-		'-DMLX5_GLUE="@0@"'.format(LIB_GLUE),
-		'-DMLX5_GLUE_VERSION="@0@"'.format(LIB_GLUE_VERSION),
-	]
-endif
 
-libnames = [ 'mlx5', 'ibverbs' ]
-libs = []
-foreach libname:libnames
-	lib = dependency('lib' + libname, required:false)
-	if not lib.found()
-		lib = cc.find_library(libname, required:false)
-	endif
-	if lib.found()
-		libs += [ lib ]
-	else
-		build = false
-		reason = 'missing dependency, "' + libname + '"'
+allow_experimental_apis = true
+deps += ['hash', 'common_mlx5']
+sources = files(
+	'mlx5.c',
+	'mlx5_ethdev.c',
+	'mlx5_flow.c',
+	'mlx5_flow_meter.c',
+	'mlx5_flow_dv.c',
+	'mlx5_flow_verbs.c',
+	'mlx5_mac.c',
+	'mlx5_mr.c',
+	'mlx5_nl.c',
+	'mlx5_rss.c',
+	'mlx5_rxmode.c',
+	'mlx5_rxq.c',
+	'mlx5_rxtx.c',
+	'mlx5_mp.c',
+	'mlx5_stats.c',
+	'mlx5_trigger.c',
+	'mlx5_txq.c',
+	'mlx5_vlan.c',
+	'mlx5_utils.c',
+	'mlx5_socket.c',
+)
+if (dpdk_conf.has('RTE_ARCH_X86_64')
+	or dpdk_conf.has('RTE_ARCH_ARM64')
+	or dpdk_conf.has('RTE_ARCH_PPC_64'))
+	sources += files('mlx5_rxtx_vec.c')
+endif
+cflags_options = [
+	'-std=c11',
+	'-Wno-strict-prototypes',
+	'-D_BSD_SOURCE',
+	'-D_DEFAULT_SOURCE',
+	'-D_XOPEN_SOURCE=600'
+]
+foreach option:cflags_options
+	if cc.has_argument(option)
+		cflags += option
 	endif
 endforeach
-
-if build
-	allow_experimental_apis = true
-	deps += ['hash']
-	ext_deps += libs
-	sources = files(
-		'mlx5.c',
-		'mlx5_ethdev.c',
-		'mlx5_flow.c',
-		'mlx5_flow_meter.c',
-		'mlx5_flow_dv.c',
-		'mlx5_flow_verbs.c',
-		'mlx5_mac.c',
-		'mlx5_mr.c',
-		'mlx5_nl.c',
-		'mlx5_rss.c',
-		'mlx5_rxmode.c',
-		'mlx5_rxq.c',
-		'mlx5_rxtx.c',
-		'mlx5_mp.c',
-		'mlx5_stats.c',
-		'mlx5_trigger.c',
-		'mlx5_txq.c',
-		'mlx5_vlan.c',
-		'mlx5_devx_cmds.c',
-		'mlx5_utils.c',
-		'mlx5_socket.c',
-	)
-	if (dpdk_conf.has('RTE_ARCH_X86_64')
-		or dpdk_conf.has('RTE_ARCH_ARM64')
-		or dpdk_conf.has('RTE_ARCH_PPC_64'))
-		sources += files('mlx5_rxtx_vec.c')
-	endif
-	if not pmd_dlopen
-		sources += files('mlx5_glue.c')
-	endif
-	cflags_options = [
-		'-std=c11',
-		'-Wno-strict-prototypes',
-		'-D_BSD_SOURCE',
-		'-D_DEFAULT_SOURCE',
-		'-D_XOPEN_SOURCE=600'
-	]
-	foreach option:cflags_options
-		if cc.has_argument(option)
-			cflags += option
-		endif
-	endforeach
-	if get_option('buildtype').contains('debug')
-		cflags += [ '-pedantic', '-UNDEBUG', '-DPEDANTIC' ]
-	else
-		cflags += [ '-DNDEBUG', '-UPEDANTIC' ]
-	endif
-	# To maintain the compatibility with the make build system
-	# mlx5_autoconf.h file is still generated.
-	# input array for meson member search:
-	# [ "MACRO to define if found", "header for the search",
-	#   "symbol to search", "struct member to search" ]
-	has_member_args = [
-		[ 'HAVE_IBV_MLX5_MOD_SWP', 'infiniband/mlx5dv.h',
-		'struct mlx5dv_sw_parsing_caps', 'sw_parsing_offloads' ],
-		[ 'HAVE_IBV_DEVICE_COUNTERS_SET_V42', 'infiniband/verbs.h',
-		'struct ibv_counter_set_init_attr', 'counter_set_id' ],
-		[ 'HAVE_IBV_DEVICE_COUNTERS_SET_V45', 'infiniband/verbs.h',
-		'struct ibv_counters_init_attr', 'comp_mask' ],
-	]
-	# input array for meson symbol search:
-	# [ "MACRO to define if found", "header for the search",
-	#   "symbol to search" ]
-	has_sym_args = [
-		[ 'HAVE_IBV_DEVICE_STRIDING_RQ_SUPPORT', 'infiniband/mlx5dv.h',
-		'MLX5DV_CQE_RES_FORMAT_CSUM_STRIDX' ],
-		[ 'HAVE_IBV_DEVICE_TUNNEL_SUPPORT', 'infiniband/mlx5dv.h',
-		'MLX5DV_CONTEXT_MASK_TUNNEL_OFFLOADS' ],
-		[ 'HAVE_IBV_MLX5_MOD_MPW', 'infiniband/mlx5dv.h',
-		'MLX5DV_CONTEXT_FLAGS_MPW_ALLOWED' ],
-		[ 'HAVE_IBV_MLX5_MOD_CQE_128B_COMP', 'infiniband/mlx5dv.h',
-		'MLX5DV_CONTEXT_FLAGS_CQE_128B_COMP' ],
-		[ 'HAVE_IBV_MLX5_MOD_CQE_128B_PAD', 'infiniband/mlx5dv.h',
-		'MLX5DV_CQ_INIT_ATTR_FLAGS_CQE_PAD' ],
-		[ 'HAVE_IBV_FLOW_DV_SUPPORT', 'infiniband/mlx5dv.h',
-		'mlx5dv_create_flow_action_packet_reformat' ],
-		[ 'HAVE_IBV_DEVICE_MPLS_SUPPORT', 'infiniband/verbs.h',
-		'IBV_FLOW_SPEC_MPLS' ],
-		[ 'HAVE_IBV_WQ_FLAGS_PCI_WRITE_END_PADDING', 'infiniband/verbs.h',
-		'IBV_WQ_FLAGS_PCI_WRITE_END_PADDING' ],
-		[ 'HAVE_IBV_WQ_FLAG_RX_END_PADDING', 'infiniband/verbs.h',
-		'IBV_WQ_FLAG_RX_END_PADDING' ],
-		[ 'HAVE_MLX5DV_DR_DEVX_PORT', 'infiniband/mlx5dv.h',
-		'mlx5dv_query_devx_port' ],
-		[ 'HAVE_IBV_DEVX_OBJ', 'infiniband/mlx5dv.h',
-		'mlx5dv_devx_obj_create' ],
-		[ 'HAVE_IBV_FLOW_DEVX_COUNTERS', 'infiniband/mlx5dv.h',
-		'MLX5DV_FLOW_ACTION_COUNTERS_DEVX' ],
-		[ 'HAVE_IBV_DEVX_ASYNC', 'infiniband/mlx5dv.h',
-		'mlx5dv_devx_obj_query_async' ],
-		[ 'HAVE_MLX5DV_DR_ACTION_DEST_DEVX_TIR', 'infiniband/mlx5dv.h',
-		'mlx5dv_dr_action_create_dest_devx_tir' ],
-		[ 'HAVE_MLX5_DR_CREATE_ACTION_FLOW_METER', 'infiniband/mlx5dv.h',
-		'mlx5dv_dr_action_create_flow_meter' ],
-		[ 'HAVE_MLX5DV_MMAP_GET_NC_PAGES_CMD', 'infiniband/mlx5dv.h',
-		'MLX5_MMAP_GET_NC_PAGES_CMD' ],
-		[ 'HAVE_MLX5DV_DR', 'infiniband/mlx5dv.h',
-		'MLX5DV_DR_DOMAIN_TYPE_NIC_RX' ],
-		[ 'HAVE_MLX5DV_DR_ESWITCH', 'infiniband/mlx5dv.h',
-		'MLX5DV_DR_DOMAIN_TYPE_FDB' ],
-		[ 'HAVE_MLX5DV_DR_VLAN', 'infiniband/mlx5dv.h',
-		'mlx5dv_dr_action_create_push_vlan' ],
-		[ 'HAVE_SUPPORTED_40000baseKR4_Full', 'linux/ethtool.h',
-		'SUPPORTED_40000baseKR4_Full' ],
-		[ 'HAVE_SUPPORTED_40000baseCR4_Full', 'linux/ethtool.h',
-		'SUPPORTED_40000baseCR4_Full' ],
-		[ 'HAVE_SUPPORTED_40000baseSR4_Full', 'linux/ethtool.h',
-		'SUPPORTED_40000baseSR4_Full' ],
-		[ 'HAVE_SUPPORTED_40000baseLR4_Full', 'linux/ethtool.h',
-		'SUPPORTED_40000baseLR4_Full' ],
-		[ 'HAVE_SUPPORTED_56000baseKR4_Full', 'linux/ethtool.h',
-		'SUPPORTED_56000baseKR4_Full' ],
-		[ 'HAVE_SUPPORTED_56000baseCR4_Full', 'linux/ethtool.h',
-		'SUPPORTED_56000baseCR4_Full' ],
-		[ 'HAVE_SUPPORTED_56000baseSR4_Full', 'linux/ethtool.h',
-		'SUPPORTED_56000baseSR4_Full' ],
-		[ 'HAVE_SUPPORTED_56000baseLR4_Full', 'linux/ethtool.h',
-		'SUPPORTED_56000baseLR4_Full' ],
-		[ 'HAVE_ETHTOOL_LINK_MODE_25G', 'linux/ethtool.h',
-		'ETHTOOL_LINK_MODE_25000baseCR_Full_BIT' ],
-		[ 'HAVE_ETHTOOL_LINK_MODE_50G', 'linux/ethtool.h',
-		'ETHTOOL_LINK_MODE_50000baseCR2_Full_BIT' ],
-		[ 'HAVE_ETHTOOL_LINK_MODE_100G', 'linux/ethtool.h',
-		'ETHTOOL_LINK_MODE_100000baseKR4_Full_BIT' ],
-		[ 'HAVE_IFLA_NUM_VF', 'linux/if_link.h',
-		'IFLA_NUM_VF' ],
-		[ 'HAVE_IFLA_EXT_MASK', 'linux/if_link.h',
-		'IFLA_EXT_MASK' ],
-		[ 'HAVE_IFLA_PHYS_SWITCH_ID', 'linux/if_link.h',
-		'IFLA_PHYS_SWITCH_ID' ],
-		[ 'HAVE_IFLA_PHYS_PORT_NAME', 'linux/if_link.h',
-		'IFLA_PHYS_PORT_NAME' ],
-		[ 'HAVE_RDMA_NL_NLDEV', 'rdma/rdma_netlink.h',
-		'RDMA_NL_NLDEV' ],
-		[ 'HAVE_RDMA_NLDEV_CMD_GET', 'rdma/rdma_netlink.h',
-		'RDMA_NLDEV_CMD_GET' ],
-		[ 'HAVE_RDMA_NLDEV_CMD_PORT_GET', 'rdma/rdma_netlink.h',
-		'RDMA_NLDEV_CMD_PORT_GET' ],
-		[ 'HAVE_RDMA_NLDEV_ATTR_DEV_INDEX', 'rdma/rdma_netlink.h',
-		'RDMA_NLDEV_ATTR_DEV_INDEX' ],
-		[ 'HAVE_RDMA_NLDEV_ATTR_DEV_NAME', 'rdma/rdma_netlink.h',
-		'RDMA_NLDEV_ATTR_DEV_NAME' ],
-		[ 'HAVE_RDMA_NLDEV_ATTR_PORT_INDEX', 'rdma/rdma_netlink.h',
-		'RDMA_NLDEV_ATTR_PORT_INDEX' ],
-		[ 'HAVE_RDMA_NLDEV_ATTR_NDEV_INDEX', 'rdma/rdma_netlink.h',
-		'RDMA_NLDEV_ATTR_NDEV_INDEX' ],
-		[ 'HAVE_MLX5_DR_FLOW_DUMP', 'infiniband/mlx5dv.h',
-		'mlx5dv_dump_dr_domain'],
-	]
-	config = configuration_data()
-	foreach arg:has_sym_args
-		config.set(arg[0], cc.has_header_symbol(arg[1], arg[2],
-			dependencies: libs))
-	endforeach
-	foreach arg:has_member_args
-		file_prefix = '#include <' + arg[1] + '>'
-		config.set(arg[0], cc.has_member(arg[2], arg[3],
-			prefix : file_prefix, dependencies: libs))
-	endforeach
-	configure_file(output : 'mlx5_autoconf.h', configuration : config)
-endif
-# Build Glue Library
-if pmd_dlopen and build
-	dlopen_name = 'mlx5_glue'
-	dlopen_lib_name = driver_name_fmt.format(dlopen_name)
-	dlopen_so_version = LIB_GLUE_VERSION
-	dlopen_sources = files('mlx5_glue.c')
-	dlopen_install_dir = [ eal_pmd_path + '-glue' ]
-	dlopen_includes = [global_inc]
-	dlopen_includes += include_directories(
-		'../../../lib/librte_eal/common/include/generic',
-	)
-	shared_lib = shared_library(
-		dlopen_lib_name,
-		dlopen_sources,
-		include_directories: dlopen_includes,
-		c_args: cflags,
-		dependencies: libs,
-		link_args: [
-		'-Wl,-export-dynamic',
-		'-Wl,-h,@0@'.format(LIB_GLUE),
-		],
-		soversion: dlopen_so_version,
-		install: true,
-		install_dir: dlopen_install_dir,
-	)
+if get_option('buildtype').contains('debug')
+	cflags += [ '-pedantic', '-UNDEBUG', '-DPEDANTIC' ]
+else
+	cflags += [ '-DNDEBUG', '-UPEDANTIC' ]
 endif
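
Note on the probe tables removed from this meson.build: each entry sets a HAVE_* macro in the generated mlx5_autoconf.h according to whether the named header exposes the symbol, and the C sources then gate on that macro. A minimal sketch of the consuming side, assuming the macro is either defined or left undefined; the #define/#undef pair below stands in for the generated header, and the two function names are hypothetical.

```c
#include <assert.h>

/* Stand-in for a generated mlx5_autoconf.h (assumed probe results). */
#define HAVE_IFLA_PHYS_PORT_NAME 1
#undef HAVE_IFLA_NUM_VF

/* Returns 1 when the build found IFLA_PHYS_PORT_NAME, 0 otherwise. */
static int
demo_have_phys_port_name(void)
{
#ifdef HAVE_IFLA_PHYS_PORT_NAME
	return 1;
#else
	return 0;
#endif
}

/* Returns 1 when the build found IFLA_NUM_VF, 0 otherwise. */
static int
demo_have_num_vf(void)
{
#ifdef HAVE_IFLA_NUM_VF
	return 1;
#else
	return 0;
#endif
}
```

With the stand-in values above, the first helper reports the feature as present and the second reports it as absent; in a real build the outcome depends on the installed kernel and rdma-core headers.
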
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 2f91e50..1cb8374 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -38,15 +38,16 @@
 #include <rte_string_fns.h>
 #include <rte_alarm.h>
 
+#include <mlx5_glue.h>
+#include <mlx5_devx_cmds.h>
+
+#include "mlx5_defs.h"
 #include "mlx5.h"
 #include "mlx5_utils.h"
 #include "mlx5_rxtx.h"
 #include "mlx5_autoconf.h"
-#include "mlx5_defs.h"
-#include "mlx5_glue.h"
 #include "mlx5_mr.h"
 #include "mlx5_flow.h"
-#include "mlx5_devx_cmds.h"
 
 /* Device parameter to enable RX completion queue compression. */
 #define MLX5_RXQ_CQE_COMP_EN "rxq_cqe_comp_en"
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 0b8b1b6..29c0a06 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -32,13 +32,14 @@
 #include <rte_errno.h>
 #include <rte_flow.h>
 
+#include <mlx5_glue.h>
+#include <mlx5_devx_cmds.h>
+#include <mlx5_prm.h>
+
+#include "mlx5_defs.h"
 #include "mlx5_utils.h"
 #include "mlx5_mr.h"
 #include "mlx5_autoconf.h"
-#include "mlx5_defs.h"
-#include "mlx5_glue.h"
-#include "mlx5_prm.h"
-#include "mlx5_devx_cmds.h"
 
 enum {
 	PCI_VENDOR_ID_MELLANOX = 0x15b3,
diff --git a/drivers/net/mlx5/mlx5_devx_cmds.c b/drivers/net/mlx5/mlx5_devx_cmds.c
deleted file mode 100644
index 1302919..0000000
--- a/drivers/net/mlx5/mlx5_devx_cmds.c
+++ /dev/null
@@ -1,974 +0,0 @@
-// SPDX-License-Identifier: BSD-3-Clause
-/* Copyright 2018 Mellanox Technologies, Ltd */
-
-#include <unistd.h>
-
-#include <rte_flow_driver.h>
-#include <rte_malloc.h>
-
-#include "mlx5_prm.h"
-#include "mlx5_devx_cmds.h"
-#include "mlx5_utils.h"
-
-
-/**
- * Allocate flow counters via devx interface.
- *
- * @param[in] ctx
- *   ibv contexts returned from mlx5dv_open_device.
- * @param dcs
- *   Pointer to counters properties structure to be filled by the routine.
- * @param bulk_n_128
- *   Bulk counter numbers in 128 counters units.
- *
- * @return
- *   Pointer to counter object on success, a negative value otherwise and
- *   rte_errno is set.
- */
-struct mlx5_devx_obj *
-mlx5_devx_cmd_flow_counter_alloc(struct ibv_context *ctx, uint32_t bulk_n_128)
-{
-	struct mlx5_devx_obj *dcs = rte_zmalloc("dcs", sizeof(*dcs), 0);
-	uint32_t in[MLX5_ST_SZ_DW(alloc_flow_counter_in)]   = {0};
-	uint32_t out[MLX5_ST_SZ_DW(alloc_flow_counter_out)] = {0};
-
-	if (!dcs) {
-		rte_errno = ENOMEM;
-		return NULL;
-	}
-	MLX5_SET(alloc_flow_counter_in, in, opcode,
-		 MLX5_CMD_OP_ALLOC_FLOW_COUNTER);
-	MLX5_SET(alloc_flow_counter_in, in, flow_counter_bulk, bulk_n_128);
-	dcs->obj = mlx5_glue->devx_obj_create(ctx, in,
-					      sizeof(in), out, sizeof(out));
-	if (!dcs->obj) {
-		DRV_LOG(ERR, "Can't allocate counters - error %d", errno);
-		rte_errno = errno;
-		rte_free(dcs);
-		return NULL;
-	}
-	dcs->id = MLX5_GET(alloc_flow_counter_out, out, flow_counter_id);
-	return dcs;
-}
-
-/**
- * Query flow counters values.
- *
- * @param[in] dcs
- *   devx object that was obtained from mlx5_devx_cmd_fc_alloc.
- * @param[in] clear
- *   Whether hardware should clear the counters after the query or not.
- * @param[in] n_counters
- *   0 in case of 1 counter to read, otherwise the counter number to read.
- *  @param pkts
- *   The number of packets that matched the flow.
- *  @param bytes
- *    The number of bytes that matched the flow.
- *  @param mkey
- *   The mkey key for batch query.
- *  @param addr
- *    The address in the mkey range for batch query.
- *  @param cmd_comp
- *   The completion object for asynchronous batch query.
- *  @param async_id
- *    The ID to be returned in the asynchronous batch query response.
- *
- * @return
- *   0 on success, a negative value otherwise.
- */
-int
-mlx5_devx_cmd_flow_counter_query(struct mlx5_devx_obj *dcs,
-				 int clear, uint32_t n_counters,
-				 uint64_t *pkts, uint64_t *bytes,
-				 uint32_t mkey, void *addr,
-				 struct mlx5dv_devx_cmd_comp *cmd_comp,
-				 uint64_t async_id)
-{
-	int out_len = MLX5_ST_SZ_BYTES(query_flow_counter_out) +
-			MLX5_ST_SZ_BYTES(traffic_counter);
-	uint32_t out[out_len];
-	uint32_t in[MLX5_ST_SZ_DW(query_flow_counter_in)] = {0};
-	void *stats;
-	int rc;
-
-	MLX5_SET(query_flow_counter_in, in, opcode,
-		 MLX5_CMD_OP_QUERY_FLOW_COUNTER);
-	MLX5_SET(query_flow_counter_in, in, op_mod, 0);
-	MLX5_SET(query_flow_counter_in, in, flow_counter_id, dcs->id);
-	MLX5_SET(query_flow_counter_in, in, clear, !!clear);
-
-	if (n_counters) {
-		MLX5_SET(query_flow_counter_in, in, num_of_counters,
-			 n_counters);
-		MLX5_SET(query_flow_counter_in, in, dump_to_memory, 1);
-		MLX5_SET(query_flow_counter_in, in, mkey, mkey);
-		MLX5_SET64(query_flow_counter_in, in, address,
-			   (uint64_t)(uintptr_t)addr);
-	}
-	if (!cmd_comp)
-		rc = mlx5_glue->devx_obj_query(dcs->obj, in, sizeof(in), out,
-					       out_len);
-	else
-		rc = mlx5_glue->devx_obj_query_async(dcs->obj, in, sizeof(in),
-						     out_len, async_id,
-						     cmd_comp);
-	if (rc) {
-		DRV_LOG(ERR, "Failed to query devx counters with rc %d", rc);
-		rte_errno = rc;
-		return -rc;
-	}
-	if (!n_counters) {
-		stats = MLX5_ADDR_OF(query_flow_counter_out,
-				     out, flow_statistics);
-		*pkts = MLX5_GET64(traffic_counter, stats, packets);
-		*bytes = MLX5_GET64(traffic_counter, stats, octets);
-	}
-	return 0;
-}
-
-/**
- * Create a new mkey.
- *
- * @param[in] ctx
- *   ibv contexts returned from mlx5dv_open_device.
- * @param[in] attr
- *   Attributes of the requested mkey.
- *
- * @return
- *   Pointer to Devx mkey on success, a negative value otherwise and rte_errno
- *   is set.
- */
-struct mlx5_devx_obj *
-mlx5_devx_cmd_mkey_create(struct ibv_context *ctx,
-			  struct mlx5_devx_mkey_attr *attr)
-{
-	uint32_t in[MLX5_ST_SZ_DW(create_mkey_in)] = {0};
-	uint32_t out[MLX5_ST_SZ_DW(create_mkey_out)] = {0};
-	void *mkc;
-	struct mlx5_devx_obj *mkey = rte_zmalloc("mkey", sizeof(*mkey), 0);
-	size_t pgsize;
-	uint32_t translation_size;
-
-	if (!mkey) {
-		rte_errno = ENOMEM;
-		return NULL;
-	}
-	pgsize = sysconf(_SC_PAGESIZE);
-	translation_size = (RTE_ALIGN(attr->size, pgsize) * 8) / 16;
-	MLX5_SET(create_mkey_in, in, opcode, MLX5_CMD_OP_CREATE_MKEY);
-	MLX5_SET(create_mkey_in, in, translations_octword_actual_size,
-		 translation_size);
-	MLX5_SET(create_mkey_in, in, mkey_umem_id, attr->umem_id);
-	mkc = MLX5_ADDR_OF(create_mkey_in, in, memory_key_mkey_entry);
-	MLX5_SET(mkc, mkc, lw, 0x1);
-	MLX5_SET(mkc, mkc, lr, 0x1);
-	MLX5_SET(mkc, mkc, access_mode_1_0, MLX5_MKC_ACCESS_MODE_MTT);
-	MLX5_SET(mkc, mkc, qpn, 0xffffff);
-	MLX5_SET(mkc, mkc, pd, attr->pd);
-	MLX5_SET(mkc, mkc, mkey_7_0, attr->umem_id & 0xFF);
-	MLX5_SET(mkc, mkc, translations_octword_size, translation_size);
-	MLX5_SET64(mkc, mkc, start_addr, attr->addr);
-	MLX5_SET64(mkc, mkc, len, attr->size);
-	MLX5_SET(mkc, mkc, log_page_size, rte_log2_u32(pgsize));
-	mkey->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in), out,
-					       sizeof(out));
-	if (!mkey->obj) {
-		DRV_LOG(ERR, "Can't create mkey - error %d", errno);
-		rte_errno = errno;
-		rte_free(mkey);
-		return NULL;
-	}
-	mkey->id = MLX5_GET(create_mkey_out, out, mkey_index);
-	mkey->id = (mkey->id << 8) | (attr->umem_id & 0xFF);
-	return mkey;
-}
-
-/**
- * Get status of devx command response.
- * Mainly used for asynchronous commands.
- *
- * @param[in] out
- *   The out response buffer.
- *
- * @return
- *   0 on success, non-zero value otherwise.
- */
-int
-mlx5_devx_get_out_command_status(void *out)
-{
-	int status;
-
-	if (!out)
-		return -EINVAL;
-	status = MLX5_GET(query_flow_counter_out, out, status);
-	if (status) {
-		int syndrome = MLX5_GET(query_flow_counter_out, out, syndrome);
-
-		DRV_LOG(ERR, "Bad devX status %x, syndrome = %x", status,
-			syndrome);
-	}
-	return status;
-}
-
-/**
- * Destroy any object allocated by a Devx API.
- *
- * @param[in] obj
- *   Pointer to a general object.
- *
- * @return
- *   0 on success, a negative value otherwise.
- */
-int
-mlx5_devx_cmd_destroy(struct mlx5_devx_obj *obj)
-{
-	int ret;
-
-	if (!obj)
-		return 0;
-	ret =  mlx5_glue->devx_obj_destroy(obj->obj);
-	rte_free(obj);
-	return ret;
-}
-
-/**
- * Query NIC vport context.
- * Fills minimal inline attribute.
- *
- * @param[in] ctx
- *   ibv contexts returned from mlx5dv_open_device.
- * @param[in] vport
- *   vport index
- * @param[out] attr
- *   Attributes device values.
- *
- * @return
- *   0 on success, a negative value otherwise.
- */
-static int
-mlx5_devx_cmd_query_nic_vport_context(struct ibv_context *ctx,
-				      unsigned int vport,
-				      struct mlx5_hca_attr *attr)
-{
-	uint32_t in[MLX5_ST_SZ_DW(query_nic_vport_context_in)] = {0};
-	uint32_t out[MLX5_ST_SZ_DW(query_nic_vport_context_out)] = {0};
-	void *vctx;
-	int status, syndrome, rc;
-
-	/* Query NIC vport context to determine inline mode. */
-	MLX5_SET(query_nic_vport_context_in, in, opcode,
-		 MLX5_CMD_OP_QUERY_NIC_VPORT_CONTEXT);
-	MLX5_SET(query_nic_vport_context_in, in, vport_number, vport);
-	if (vport)
-		MLX5_SET(query_nic_vport_context_in, in, other_vport, 1);
-	rc = mlx5_glue->devx_general_cmd(ctx,
-					 in, sizeof(in),
-					 out, sizeof(out));
-	if (rc)
-		goto error;
-	status = MLX5_GET(query_nic_vport_context_out, out, status);
-	syndrome = MLX5_GET(query_nic_vport_context_out, out, syndrome);
-	if (status) {
-		DRV_LOG(DEBUG, "Failed to query NIC vport context, "
-			"status %x, syndrome = %x",
-			status, syndrome);
-		return -1;
-	}
-	vctx = MLX5_ADDR_OF(query_nic_vport_context_out, out,
-			    nic_vport_context);
-	attr->vport_inline_mode = MLX5_GET(nic_vport_context, vctx,
-					   min_wqe_inline_mode);
-	return 0;
-error:
-	rc = (rc > 0) ? -rc : rc;
-	return rc;
-}
-
-/**
- * Query HCA attributes.
- * Using those attributes we can check on run time if the device
- * is having the required capabilities.
- *
- * @param[in] ctx
- *   ibv contexts returned from mlx5dv_open_device.
- * @param[out] attr
- *   Attributes device values.
- *
- * @return
- *   0 on success, a negative value otherwise.
- */
-int
-mlx5_devx_cmd_query_hca_attr(struct ibv_context *ctx,
-			     struct mlx5_hca_attr *attr)
-{
-	uint32_t in[MLX5_ST_SZ_DW(query_hca_cap_in)] = {0};
-	uint32_t out[MLX5_ST_SZ_DW(query_hca_cap_out)] = {0};
-	void *hcattr;
-	int status, syndrome, rc;
-
-	MLX5_SET(query_hca_cap_in, in, opcode, MLX5_CMD_OP_QUERY_HCA_CAP);
-	MLX5_SET(query_hca_cap_in, in, op_mod,
-		 MLX5_GET_HCA_CAP_OP_MOD_GENERAL_DEVICE |
-		 MLX5_HCA_CAP_OPMOD_GET_CUR);
-
-	rc = mlx5_glue->devx_general_cmd(ctx,
-					 in, sizeof(in), out, sizeof(out));
-	if (rc)
-		goto error;
-	status = MLX5_GET(query_hca_cap_out, out, status);
-	syndrome = MLX5_GET(query_hca_cap_out, out, syndrome);
-	if (status) {
-		DRV_LOG(DEBUG, "Failed to query devx HCA capabilities, "
-			"status %x, syndrome = %x",
-			status, syndrome);
-		return -1;
-	}
-	hcattr = MLX5_ADDR_OF(query_hca_cap_out, out, capability);
-	attr->flow_counter_bulk_alloc_bitmap =
-			MLX5_GET(cmd_hca_cap, hcattr, flow_counter_bulk_alloc);
-	attr->flow_counters_dump = MLX5_GET(cmd_hca_cap, hcattr,
-					    flow_counters_dump);
-	attr->eswitch_manager = MLX5_GET(cmd_hca_cap, hcattr, eswitch_manager);
-	attr->hairpin = MLX5_GET(cmd_hca_cap, hcattr, hairpin);
-	attr->log_max_hairpin_queues = MLX5_GET(cmd_hca_cap, hcattr,
-						log_max_hairpin_queues);
-	attr->log_max_hairpin_wq_data_sz = MLX5_GET(cmd_hca_cap, hcattr,
-						    log_max_hairpin_wq_data_sz);
-	attr->log_max_hairpin_num_packets = MLX5_GET
-		(cmd_hca_cap, hcattr, log_min_hairpin_wq_data_sz);
-	attr->vhca_id = MLX5_GET(cmd_hca_cap, hcattr, vhca_id);
-	attr->eth_net_offloads = MLX5_GET(cmd_hca_cap, hcattr,
-					  eth_net_offloads);
-	attr->eth_virt = MLX5_GET(cmd_hca_cap, hcattr, eth_virt);
-	attr->flex_parser_protocols = MLX5_GET(cmd_hca_cap, hcattr,
-					       flex_parser_protocols);
-	attr->qos.sup = MLX5_GET(cmd_hca_cap, hcattr, qos);
-	if (attr->qos.sup) {
-		MLX5_SET(query_hca_cap_in, in, op_mod,
-			 MLX5_GET_HCA_CAP_OP_MOD_QOS_CAP |
-			 MLX5_HCA_CAP_OPMOD_GET_CUR);
-		rc = mlx5_glue->devx_general_cmd(ctx, in, sizeof(in),
-						 out, sizeof(out));
-		if (rc)
-			goto error;
-		if (status) {
-			DRV_LOG(DEBUG, "Failed to query devx QOS capabilities,"
-				" status %x, syndrome = %x",
-				status, syndrome);
-			return -1;
-		}
-		hcattr = MLX5_ADDR_OF(query_hca_cap_out, out, capability);
-		attr->qos.srtcm_sup =
-				MLX5_GET(qos_cap, hcattr, flow_meter_srtcm);
-		attr->qos.log_max_flow_meter =
-				MLX5_GET(qos_cap, hcattr, log_max_flow_meter);
-		attr->qos.flow_meter_reg_c_ids =
-			MLX5_GET(qos_cap, hcattr, flow_meter_reg_id);
-	}
-	if (!attr->eth_net_offloads)
-		return 0;
-
-	/* Query HCA offloads for Ethernet protocol. */
-	memset(in, 0, sizeof(in));
-	memset(out, 0, sizeof(out));
-	MLX5_SET(query_hca_cap_in, in, opcode, MLX5_CMD_OP_QUERY_HCA_CAP);
-	MLX5_SET(query_hca_cap_in, in, op_mod,
-		 MLX5_GET_HCA_CAP_OP_MOD_ETHERNET_OFFLOAD_CAPS |
-		 MLX5_HCA_CAP_OPMOD_GET_CUR);
-
-	rc = mlx5_glue->devx_general_cmd(ctx,
-					 in, sizeof(in),
-					 out, sizeof(out));
-	if (rc) {
-		attr->eth_net_offloads = 0;
-		goto error;
-	}
-	status = MLX5_GET(query_hca_cap_out, out, status);
-	syndrome = MLX5_GET(query_hca_cap_out, out, syndrome);
-	if (status) {
-		DRV_LOG(DEBUG, "Failed to query devx HCA capabilities, "
-			"status %x, syndrome = %x",
-			status, syndrome);
-		attr->eth_net_offloads = 0;
-		return -1;
-	}
-	hcattr = MLX5_ADDR_OF(query_hca_cap_out, out, capability);
-	attr->wqe_vlan_insert = MLX5_GET(per_protocol_networking_offload_caps,
-					 hcattr, wqe_vlan_insert);
-	attr->lro_cap = MLX5_GET(per_protocol_networking_offload_caps, hcattr,
-				 lro_cap);
-	attr->tunnel_lro_gre = MLX5_GET(per_protocol_networking_offload_caps,
-					hcattr, tunnel_lro_gre);
-	attr->tunnel_lro_vxlan = MLX5_GET(per_protocol_networking_offload_caps,
-					  hcattr, tunnel_lro_vxlan);
-	attr->lro_max_msg_sz_mode = MLX5_GET
-					(per_protocol_networking_offload_caps,
-					 hcattr, lro_max_msg_sz_mode);
-	for (int i = 0 ; i < MLX5_LRO_NUM_SUPP_PERIODS ; i++) {
-		attr->lro_timer_supported_periods[i] =
-			MLX5_GET(per_protocol_networking_offload_caps, hcattr,
-				 lro_timer_supported_periods[i]);
-	}
-	attr->tunnel_stateless_geneve_rx =
-			    MLX5_GET(per_protocol_networking_offload_caps,
-				     hcattr, tunnel_stateless_geneve_rx);
-	attr->geneve_max_opt_len =
-		    MLX5_GET(per_protocol_networking_offload_caps,
-			     hcattr, max_geneve_opt_len);
-	attr->wqe_inline_mode = MLX5_GET(per_protocol_networking_offload_caps,
-					 hcattr, wqe_inline_mode);
-	attr->tunnel_stateless_gtp = MLX5_GET
-					(per_protocol_networking_offload_caps,
-					 hcattr, tunnel_stateless_gtp);
-	if (attr->wqe_inline_mode != MLX5_CAP_INLINE_MODE_VPORT_CONTEXT)
-		return 0;
-	if (attr->eth_virt) {
-		rc = mlx5_devx_cmd_query_nic_vport_context(ctx, 0, attr);
-		if (rc) {
-			attr->eth_virt = 0;
-			goto error;
-		}
-	}
-	return 0;
-error:
-	rc = (rc > 0) ? -rc : rc;
-	return rc;
-}
-
-/**
- * Query TIS transport domain from QP verbs object using DevX API.
- *
- * @param[in] qp
- *   Pointer to verbs QP returned by ibv_create_qp .
- * @param[in] tis_num
- *   TIS number of TIS to query.
- * @param[out] tis_td
- *   Pointer to TIS transport domain variable, to be set by the routine.
- *
- * @return
- *   0 on success, a negative value otherwise.
- */
-int
-mlx5_devx_cmd_qp_query_tis_td(struct ibv_qp *qp, uint32_t tis_num,
-			      uint32_t *tis_td)
-{
-	uint32_t in[MLX5_ST_SZ_DW(query_tis_in)] = {0};
-	uint32_t out[MLX5_ST_SZ_DW(query_tis_out)] = {0};
-	int rc;
-	void *tis_ctx;
-
-	MLX5_SET(query_tis_in, in, opcode, MLX5_CMD_OP_QUERY_TIS);
-	MLX5_SET(query_tis_in, in, tisn, tis_num);
-	rc = mlx5_glue->devx_qp_query(qp, in, sizeof(in), out, sizeof(out));
-	if (rc) {
-		DRV_LOG(ERR, "Failed to query QP using DevX");
-		return -rc;
-	};
-	tis_ctx = MLX5_ADDR_OF(query_tis_out, out, tis_context);
-	*tis_td = MLX5_GET(tisc, tis_ctx, transport_domain);
-	return 0;
-}
-
-/**
- * Fill WQ data for DevX API command.
- * Utility function for use when creating DevX objects containing a WQ.
- *
- * @param[in] wq_ctx
- *   Pointer to WQ context to fill with data.
- * @param [in] wq_attr
- *   Pointer to WQ attributes structure to fill in WQ context.
- */
-static void
-devx_cmd_fill_wq_data(void *wq_ctx, struct mlx5_devx_wq_attr *wq_attr)
-{
-	MLX5_SET(wq, wq_ctx, wq_type, wq_attr->wq_type);
-	MLX5_SET(wq, wq_ctx, wq_signature, wq_attr->wq_signature);
-	MLX5_SET(wq, wq_ctx, end_padding_mode, wq_attr->end_padding_mode);
-	MLX5_SET(wq, wq_ctx, cd_slave, wq_attr->cd_slave);
-	MLX5_SET(wq, wq_ctx, hds_skip_first_sge, wq_attr->hds_skip_first_sge);
-	MLX5_SET(wq, wq_ctx, log2_hds_buf_size, wq_attr->log2_hds_buf_size);
-	MLX5_SET(wq, wq_ctx, page_offset, wq_attr->page_offset);
-	MLX5_SET(wq, wq_ctx, lwm, wq_attr->lwm);
-	MLX5_SET(wq, wq_ctx, pd, wq_attr->pd);
-	MLX5_SET(wq, wq_ctx, uar_page, wq_attr->uar_page);
-	MLX5_SET64(wq, wq_ctx, dbr_addr, wq_attr->dbr_addr);
-	MLX5_SET(wq, wq_ctx, hw_counter, wq_attr->hw_counter);
-	MLX5_SET(wq, wq_ctx, sw_counter, wq_attr->sw_counter);
-	MLX5_SET(wq, wq_ctx, log_wq_stride, wq_attr->log_wq_stride);
-	MLX5_SET(wq, wq_ctx, log_wq_pg_sz, wq_attr->log_wq_pg_sz);
-	MLX5_SET(wq, wq_ctx, log_wq_sz, wq_attr->log_wq_sz);
-	MLX5_SET(wq, wq_ctx, dbr_umem_valid, wq_attr->dbr_umem_valid);
-	MLX5_SET(wq, wq_ctx, wq_umem_valid, wq_attr->wq_umem_valid);
-	MLX5_SET(wq, wq_ctx, log_hairpin_num_packets,
-		 wq_attr->log_hairpin_num_packets);
-	MLX5_SET(wq, wq_ctx, log_hairpin_data_sz, wq_attr->log_hairpin_data_sz);
-	MLX5_SET(wq, wq_ctx, single_wqe_log_num_of_strides,
-		 wq_attr->single_wqe_log_num_of_strides);
-	MLX5_SET(wq, wq_ctx, two_byte_shift_en, wq_attr->two_byte_shift_en);
-	MLX5_SET(wq, wq_ctx, single_stride_log_num_of_bytes,
-		 wq_attr->single_stride_log_num_of_bytes);
-	MLX5_SET(wq, wq_ctx, dbr_umem_id, wq_attr->dbr_umem_id);
-	MLX5_SET(wq, wq_ctx, wq_umem_id, wq_attr->wq_umem_id);
-	MLX5_SET64(wq, wq_ctx, wq_umem_offset, wq_attr->wq_umem_offset);
-}
-
-/**
- * Create RQ using DevX API.
- *
- * @param[in] ctx
- *   ibv_context returned from mlx5dv_open_device.
- * @param [in] rq_attr
- *   Pointer to create RQ attributes structure.
- * @param [in] socket
- *   CPU socket ID for allocations.
- *
- * @return
- *   The DevX object created, NULL otherwise and rte_errno is set.
- */
-struct mlx5_devx_obj *
-mlx5_devx_cmd_create_rq(struct ibv_context *ctx,
-			struct mlx5_devx_create_rq_attr *rq_attr,
-			int socket)
-{
-	uint32_t in[MLX5_ST_SZ_DW(create_rq_in)] = {0};
-	uint32_t out[MLX5_ST_SZ_DW(create_rq_out)] = {0};
-	void *rq_ctx, *wq_ctx;
-	struct mlx5_devx_wq_attr *wq_attr;
-	struct mlx5_devx_obj *rq = NULL;
-
-	rq = rte_calloc_socket(__func__, 1, sizeof(*rq), 0, socket);
-	if (!rq) {
-		DRV_LOG(ERR, "Failed to allocate RQ data");
-		rte_errno = ENOMEM;
-		return NULL;
-	}
-	MLX5_SET(create_rq_in, in, opcode, MLX5_CMD_OP_CREATE_RQ);
-	rq_ctx = MLX5_ADDR_OF(create_rq_in, in, ctx);
-	MLX5_SET(rqc, rq_ctx, rlky, rq_attr->rlky);
-	MLX5_SET(rqc, rq_ctx, delay_drop_en, rq_attr->delay_drop_en);
-	MLX5_SET(rqc, rq_ctx, scatter_fcs, rq_attr->scatter_fcs);
-	MLX5_SET(rqc, rq_ctx, vsd, rq_attr->vsd);
-	MLX5_SET(rqc, rq_ctx, mem_rq_type, rq_attr->mem_rq_type);
-	MLX5_SET(rqc, rq_ctx, state, rq_attr->state);
-	MLX5_SET(rqc, rq_ctx, flush_in_error_en, rq_attr->flush_in_error_en);
-	MLX5_SET(rqc, rq_ctx, hairpin, rq_attr->hairpin);
-	MLX5_SET(rqc, rq_ctx, user_index, rq_attr->user_index);
-	MLX5_SET(rqc, rq_ctx, cqn, rq_attr->cqn);
-	MLX5_SET(rqc, rq_ctx, counter_set_id, rq_attr->counter_set_id);
-	MLX5_SET(rqc, rq_ctx, rmpn, rq_attr->rmpn);
-	wq_ctx = MLX5_ADDR_OF(rqc, rq_ctx, wq);
-	wq_attr = &rq_attr->wq_attr;
-	devx_cmd_fill_wq_data(wq_ctx, wq_attr);
-	rq->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in),
-						  out, sizeof(out));
-	if (!rq->obj) {
-		DRV_LOG(ERR, "Failed to create RQ using DevX");
-		rte_errno = errno;
-		rte_free(rq);
-		return NULL;
-	}
-	rq->id = MLX5_GET(create_rq_out, out, rqn);
-	return rq;
-}
-
-/**
- * Modify RQ using DevX API.
- *
- * @param[in] rq
- *   Pointer to RQ object structure.
- * @param [in] rq_attr
- *   Pointer to modify RQ attributes structure.
- *
- * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
- */
-int
-mlx5_devx_cmd_modify_rq(struct mlx5_devx_obj *rq,
-			struct mlx5_devx_modify_rq_attr *rq_attr)
-{
-	uint32_t in[MLX5_ST_SZ_DW(modify_rq_in)] = {0};
-	uint32_t out[MLX5_ST_SZ_DW(modify_rq_out)] = {0};
-	void *rq_ctx, *wq_ctx;
-	int ret;
-
-	MLX5_SET(modify_rq_in, in, opcode, MLX5_CMD_OP_MODIFY_RQ);
-	MLX5_SET(modify_rq_in, in, rq_state, rq_attr->rq_state);
-	MLX5_SET(modify_rq_in, in, rqn, rq->id);
-	MLX5_SET64(modify_rq_in, in, modify_bitmask, rq_attr->modify_bitmask);
-	rq_ctx = MLX5_ADDR_OF(modify_rq_in, in, ctx);
-	MLX5_SET(rqc, rq_ctx, state, rq_attr->state);
-	if (rq_attr->modify_bitmask &
-			MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_SCATTER_FCS)
-		MLX5_SET(rqc, rq_ctx, scatter_fcs, rq_attr->scatter_fcs);
-	if (rq_attr->modify_bitmask & MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_VSD)
-		MLX5_SET(rqc, rq_ctx, vsd, rq_attr->vsd);
-	if (rq_attr->modify_bitmask &
-			MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_RQ_COUNTER_SET_ID)
-		MLX5_SET(rqc, rq_ctx, counter_set_id, rq_attr->counter_set_id);
-	MLX5_SET(rqc, rq_ctx, hairpin_peer_sq, rq_attr->hairpin_peer_sq);
-	MLX5_SET(rqc, rq_ctx, hairpin_peer_vhca, rq_attr->hairpin_peer_vhca);
-	if (rq_attr->modify_bitmask & MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_WQ_LWM) {
-		wq_ctx = MLX5_ADDR_OF(rqc, rq_ctx, wq);
-		MLX5_SET(wq, wq_ctx, lwm, rq_attr->lwm);
-	}
-	ret = mlx5_glue->devx_obj_modify(rq->obj, in, sizeof(in),
-					 out, sizeof(out));
-	if (ret) {
-		DRV_LOG(ERR, "Failed to modify RQ using DevX");
-		rte_errno = errno;
-		return -errno;
-	}
-	return ret;
-}
-
-/**
- * Create TIR using DevX API.
- *
- * @param[in] ctx
- *   ibv_context returned from mlx5dv_open_device.
- * @param [in] tir_attr
- *   Pointer to TIR attributes structure.
- *
- * @return
- *   The DevX object created, NULL otherwise and rte_errno is set.
- */
-struct mlx5_devx_obj *
-mlx5_devx_cmd_create_tir(struct ibv_context *ctx,
-			 struct mlx5_devx_tir_attr *tir_attr)
-{
-	uint32_t in[MLX5_ST_SZ_DW(create_tir_in)] = {0};
-	uint32_t out[MLX5_ST_SZ_DW(create_tir_out)] = {0};
-	void *tir_ctx, *outer, *inner;
-	struct mlx5_devx_obj *tir = NULL;
-	int i;
-
-	tir = rte_calloc(__func__, 1, sizeof(*tir), 0);
-	if (!tir) {
-		DRV_LOG(ERR, "Failed to allocate TIR data");
-		rte_errno = ENOMEM;
-		return NULL;
-	}
-	MLX5_SET(create_tir_in, in, opcode, MLX5_CMD_OP_CREATE_TIR);
-	tir_ctx = MLX5_ADDR_OF(create_tir_in, in, ctx);
-	MLX5_SET(tirc, tir_ctx, disp_type, tir_attr->disp_type);
-	MLX5_SET(tirc, tir_ctx, lro_timeout_period_usecs,
-		 tir_attr->lro_timeout_period_usecs);
-	MLX5_SET(tirc, tir_ctx, lro_enable_mask, tir_attr->lro_enable_mask);
-	MLX5_SET(tirc, tir_ctx, lro_max_msg_sz, tir_attr->lro_max_msg_sz);
-	MLX5_SET(tirc, tir_ctx, inline_rqn, tir_attr->inline_rqn);
-	MLX5_SET(tirc, tir_ctx, rx_hash_symmetric, tir_attr->rx_hash_symmetric);
-	MLX5_SET(tirc, tir_ctx, tunneled_offload_en,
-		 tir_attr->tunneled_offload_en);
-	MLX5_SET(tirc, tir_ctx, indirect_table, tir_attr->indirect_table);
-	MLX5_SET(tirc, tir_ctx, rx_hash_fn, tir_attr->rx_hash_fn);
-	MLX5_SET(tirc, tir_ctx, self_lb_block, tir_attr->self_lb_block);
-	MLX5_SET(tirc, tir_ctx, transport_domain, tir_attr->transport_domain);
-	for (i = 0; i < 10; i++) {
-		MLX5_SET(tirc, tir_ctx, rx_hash_toeplitz_key[i],
-			 tir_attr->rx_hash_toeplitz_key[i]);
-	}
-	outer = MLX5_ADDR_OF(tirc, tir_ctx, rx_hash_field_selector_outer);
-	MLX5_SET(rx_hash_field_select, outer, l3_prot_type,
-		 tir_attr->rx_hash_field_selector_outer.l3_prot_type);
-	MLX5_SET(rx_hash_field_select, outer, l4_prot_type,
-		 tir_attr->rx_hash_field_selector_outer.l4_prot_type);
-	MLX5_SET(rx_hash_field_select, outer, selected_fields,
-		 tir_attr->rx_hash_field_selector_outer.selected_fields);
-	inner = MLX5_ADDR_OF(tirc, tir_ctx, rx_hash_field_selector_inner);
-	MLX5_SET(rx_hash_field_select, inner, l3_prot_type,
-		 tir_attr->rx_hash_field_selector_inner.l3_prot_type);
-	MLX5_SET(rx_hash_field_select, inner, l4_prot_type,
-		 tir_attr->rx_hash_field_selector_inner.l4_prot_type);
-	MLX5_SET(rx_hash_field_select, inner, selected_fields,
-		 tir_attr->rx_hash_field_selector_inner.selected_fields);
-	tir->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in),
-						   out, sizeof(out));
-	if (!tir->obj) {
-		DRV_LOG(ERR, "Failed to create TIR using DevX");
-		rte_errno = errno;
-		rte_free(tir);
-		return NULL;
-	}
-	tir->id = MLX5_GET(create_tir_out, out, tirn);
-	return tir;
-}
-
-/**
- * Create RQT using DevX API.
- *
- * @param[in] ctx
- *   ibv_context returned from mlx5dv_open_device.
- * @param [in] rqt_attr
- *   Pointer to RQT attributes structure.
- *
- * @return
- *   The DevX object created, NULL otherwise and rte_errno is set.
- */
-struct mlx5_devx_obj *
-mlx5_devx_cmd_create_rqt(struct ibv_context *ctx,
-			 struct mlx5_devx_rqt_attr *rqt_attr)
-{
-	uint32_t *in = NULL;
-	uint32_t inlen = MLX5_ST_SZ_BYTES(create_rqt_in) +
-			 rqt_attr->rqt_actual_size * sizeof(uint32_t);
-	uint32_t out[MLX5_ST_SZ_DW(create_rqt_out)] = {0};
-	void *rqt_ctx;
-	struct mlx5_devx_obj *rqt = NULL;
-	int i;
-
-	in = rte_calloc(__func__, 1, inlen, 0);
-	if (!in) {
-		DRV_LOG(ERR, "Failed to allocate RQT IN data");
-		rte_errno = ENOMEM;
-		return NULL;
-	}
-	rqt = rte_calloc(__func__, 1, sizeof(*rqt), 0);
-	if (!rqt) {
-		DRV_LOG(ERR, "Failed to allocate RQT data");
-		rte_errno = ENOMEM;
-		rte_free(in);
-		return NULL;
-	}
-	MLX5_SET(create_rqt_in, in, opcode, MLX5_CMD_OP_CREATE_RQT);
-	rqt_ctx = MLX5_ADDR_OF(create_rqt_in, in, rqt_context);
-	MLX5_SET(rqtc, rqt_ctx, rqt_max_size, rqt_attr->rqt_max_size);
-	MLX5_SET(rqtc, rqt_ctx, rqt_actual_size, rqt_attr->rqt_actual_size);
-	for (i = 0; i < rqt_attr->rqt_actual_size; i++)
-		MLX5_SET(rqtc, rqt_ctx, rq_num[i], rqt_attr->rq_list[i]);
-	rqt->obj = mlx5_glue->devx_obj_create(ctx, in, inlen, out, sizeof(out));
-	rte_free(in);
-	if (!rqt->obj) {
-		DRV_LOG(ERR, "Failed to create RQT using DevX");
-		rte_errno = errno;
-		rte_free(rqt);
-		return NULL;
-	}
-	rqt->id = MLX5_GET(create_rqt_out, out, rqtn);
-	return rqt;
-}
-
-/**
- * Create SQ using DevX API.
- *
- * @param[in] ctx
- *   ibv_context returned from mlx5dv_open_device.
- * @param [in] sq_attr
- *   Pointer to SQ attributes structure.
- * @param [in] socket
- *   CPU socket ID for allocations.
- *
- * @return
- *   The DevX object created, NULL otherwise and rte_errno is set.
- **/
-struct mlx5_devx_obj *
-mlx5_devx_cmd_create_sq(struct ibv_context *ctx,
-			struct mlx5_devx_create_sq_attr *sq_attr)
-{
-	uint32_t in[MLX5_ST_SZ_DW(create_sq_in)] = {0};
-	uint32_t out[MLX5_ST_SZ_DW(create_sq_out)] = {0};
-	void *sq_ctx;
-	void *wq_ctx;
-	struct mlx5_devx_wq_attr *wq_attr;
-	struct mlx5_devx_obj *sq = NULL;
-
-	sq = rte_calloc(__func__, 1, sizeof(*sq), 0);
-	if (!sq) {
-		DRV_LOG(ERR, "Failed to allocate SQ data");
-		rte_errno = ENOMEM;
-		return NULL;
-	}
-	MLX5_SET(create_sq_in, in, opcode, MLX5_CMD_OP_CREATE_SQ);
-	sq_ctx = MLX5_ADDR_OF(create_sq_in, in, ctx);
-	MLX5_SET(sqc, sq_ctx, rlky, sq_attr->rlky);
-	MLX5_SET(sqc, sq_ctx, cd_master, sq_attr->cd_master);
-	MLX5_SET(sqc, sq_ctx, fre, sq_attr->fre);
-	MLX5_SET(sqc, sq_ctx, flush_in_error_en, sq_attr->flush_in_error_en);
-	MLX5_SET(sqc, sq_ctx, allow_multi_pkt_send_wqe,
-		 sq_attr->flush_in_error_en);
-	MLX5_SET(sqc, sq_ctx, min_wqe_inline_mode,
-		 sq_attr->min_wqe_inline_mode);
-	MLX5_SET(sqc, sq_ctx, state, sq_attr->state);
-	MLX5_SET(sqc, sq_ctx, reg_umr, sq_attr->reg_umr);
-	MLX5_SET(sqc, sq_ctx, allow_swp, sq_attr->allow_swp);
-	MLX5_SET(sqc, sq_ctx, hairpin, sq_attr->hairpin);
-	MLX5_SET(sqc, sq_ctx, user_index, sq_attr->user_index);
-	MLX5_SET(sqc, sq_ctx, cqn, sq_attr->cqn);
-	MLX5_SET(sqc, sq_ctx, packet_pacing_rate_limit_index,
-		 sq_attr->packet_pacing_rate_limit_index);
-	MLX5_SET(sqc, sq_ctx, tis_lst_sz, sq_attr->tis_lst_sz);
-	MLX5_SET(sqc, sq_ctx, tis_num_0, sq_attr->tis_num);
-	wq_ctx = MLX5_ADDR_OF(sqc, sq_ctx, wq);
-	wq_attr = &sq_attr->wq_attr;
-	devx_cmd_fill_wq_data(wq_ctx, wq_attr);
-	sq->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in),
-					     out, sizeof(out));
-	if (!sq->obj) {
-		DRV_LOG(ERR, "Failed to create SQ using DevX");
-		rte_errno = errno;
-		rte_free(sq);
-		return NULL;
-	}
-	sq->id = MLX5_GET(create_sq_out, out, sqn);
-	return sq;
-}
-
-/**
- * Modify SQ using DevX API.
- *
- * @param[in] sq
- *   Pointer to SQ object structure.
- * @param [in] sq_attr
- *   Pointer to SQ attributes structure.
- *
- * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
- */
-int
-mlx5_devx_cmd_modify_sq(struct mlx5_devx_obj *sq,
-			struct mlx5_devx_modify_sq_attr *sq_attr)
-{
-	uint32_t in[MLX5_ST_SZ_DW(modify_sq_in)] = {0};
-	uint32_t out[MLX5_ST_SZ_DW(modify_sq_out)] = {0};
-	void *sq_ctx;
-	int ret;
-
-	MLX5_SET(modify_sq_in, in, opcode, MLX5_CMD_OP_MODIFY_SQ);
-	MLX5_SET(modify_sq_in, in, sq_state, sq_attr->sq_state);
-	MLX5_SET(modify_sq_in, in, sqn, sq->id);
-	sq_ctx = MLX5_ADDR_OF(modify_sq_in, in, ctx);
-	MLX5_SET(sqc, sq_ctx, state, sq_attr->state);
-	MLX5_SET(sqc, sq_ctx, hairpin_peer_rq, sq_attr->hairpin_peer_rq);
-	MLX5_SET(sqc, sq_ctx, hairpin_peer_vhca, sq_attr->hairpin_peer_vhca);
-	ret = mlx5_glue->devx_obj_modify(sq->obj, in, sizeof(in),
-					 out, sizeof(out));
-	if (ret) {
-		DRV_LOG(ERR, "Failed to modify SQ using DevX");
-		rte_errno = errno;
-		return -errno;
-	}
-	return ret;
-}
-
-/**
- * Create TIS using DevX API.
- *
- * @param[in] ctx
- *   ibv_context returned from mlx5dv_open_device.
- * @param [in] tis_attr
- *   Pointer to TIS attributes structure.
- *
- * @return
- *   The DevX object created, NULL otherwise and rte_errno is set.
- */
-struct mlx5_devx_obj *
-mlx5_devx_cmd_create_tis(struct ibv_context *ctx,
-			 struct mlx5_devx_tis_attr *tis_attr)
-{
-	uint32_t in[MLX5_ST_SZ_DW(create_tis_in)] = {0};
-	uint32_t out[MLX5_ST_SZ_DW(create_tis_out)] = {0};
-	struct mlx5_devx_obj *tis = NULL;
-	void *tis_ctx;
-
-	tis = rte_calloc(__func__, 1, sizeof(*tis), 0);
-	if (!tis) {
-		DRV_LOG(ERR, "Failed to allocate TIS object");
-		rte_errno = ENOMEM;
-		return NULL;
-	}
-	MLX5_SET(create_tis_in, in, opcode, MLX5_CMD_OP_CREATE_TIS);
-	tis_ctx = MLX5_ADDR_OF(create_tis_in, in, ctx);
-	MLX5_SET(tisc, tis_ctx, strict_lag_tx_port_affinity,
-		 tis_attr->strict_lag_tx_port_affinity);
-	MLX5_SET(tisc, tis_ctx, strict_lag_tx_port_affinity,
-		 tis_attr->strict_lag_tx_port_affinity);
-	MLX5_SET(tisc, tis_ctx, prio, tis_attr->prio);
-	MLX5_SET(tisc, tis_ctx, transport_domain,
-		 tis_attr->transport_domain);
-	tis->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in),
-					      out, sizeof(out));
-	if (!tis->obj) {
-		DRV_LOG(ERR, "Failed to create TIS using DevX");
-		rte_errno = errno;
-		rte_free(tis);
-		return NULL;
-	}
-	tis->id = MLX5_GET(create_tis_out, out, tisn);
-	return tis;
-}
-
-/**
- * Create transport domain using DevX API.
- *
- * @param[in] ctx
- *   ibv_context returned from mlx5dv_open_device.
- *
- * @return
- *   The DevX object created, NULL otherwise and rte_errno is set.
- */
-struct mlx5_devx_obj *
-mlx5_devx_cmd_create_td(struct ibv_context *ctx)
-{
-	uint32_t in[MLX5_ST_SZ_DW(alloc_transport_domain_in)] = {0};
-	uint32_t out[MLX5_ST_SZ_DW(alloc_transport_domain_out)] = {0};
-	struct mlx5_devx_obj *td = NULL;
-
-	td = rte_calloc(__func__, 1, sizeof(*td), 0);
-	if (!td) {
-		DRV_LOG(ERR, "Failed to allocate TD object");
-		rte_errno = ENOMEM;
-		return NULL;
-	}
-	MLX5_SET(alloc_transport_domain_in, in, opcode,
-		 MLX5_CMD_OP_ALLOC_TRANSPORT_DOMAIN);
-	td->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in),
-					     out, sizeof(out));
-	if (!td->obj) {
-		DRV_LOG(ERR, "Failed to create TIS using DevX");
-		rte_errno = errno;
-		rte_free(td);
-		return NULL;
-	}
-	td->id = MLX5_GET(alloc_transport_domain_out, out,
-			   transport_domain);
-	return td;
-}
-
-/**
- * Dump all flows to file.
- *
- * @param[in] fdb_domain
- *   FDB domain.
- * @param[in] rx_domain
- *   RX domain.
- * @param[in] tx_domain
- *   TX domain.
- * @param[out] file
- *   Pointer to file stream.
- *
- * @return
- *   0 on success, a nagative value otherwise.
- */
-int
-mlx5_devx_cmd_flow_dump(void *fdb_domain __rte_unused,
-			void *rx_domain __rte_unused,
-			void *tx_domain __rte_unused, FILE *file __rte_unused)
-{
-	int ret = 0;
-
-#ifdef HAVE_MLX5_DR_FLOW_DUMP
-	if (fdb_domain) {
-		ret = mlx5_glue->dr_dump_domain(file, fdb_domain);
-		if (ret)
-			return ret;
-	}
-	assert(rx_domain);
-	ret = mlx5_glue->dr_dump_domain(file, rx_domain);
-	if (ret)
-		return ret;
-	assert(tx_domain);
-	ret = mlx5_glue->dr_dump_domain(file, tx_domain);
-#else
-	ret = ENOTSUP;
-#endif
-	return -ret;
-}
diff --git a/drivers/net/mlx5/mlx5_devx_cmds.h b/drivers/net/mlx5/mlx5_devx_cmds.h
deleted file mode 100644
index 0c5afde..0000000
--- a/drivers/net/mlx5/mlx5_devx_cmds.h
+++ /dev/null
@@ -1,227 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause
- * Copyright 2019 Mellanox Technologies, Ltd
- */
-
-#ifndef RTE_PMD_MLX5_DEVX_CMDS_H_
-#define RTE_PMD_MLX5_DEVX_CMDS_H_
-
-#include "mlx5_glue.h"
-
-/* devX creation object */
-struct mlx5_devx_obj {
-	struct mlx5dv_devx_obj *obj; /* The DV object. */
-	int id; /* The object ID. */
-};
-
-struct mlx5_devx_mkey_attr {
-	uint64_t addr;
-	uint64_t size;
-	uint32_t umem_id;
-	uint32_t pd;
-};
-
-/* HCA qos attributes. */
-struct mlx5_hca_qos_attr {
-	uint32_t sup:1;	/* Whether QOS is supported. */
-	uint32_t srtcm_sup:1; /* Whether srTCM mode is supported. */
-	uint8_t log_max_flow_meter;
-	/* Power of the maximum supported meters. */
-	uint8_t flow_meter_reg_c_ids;
-	/* Bitmap of the reg_Cs available for flow meter to use. */
-
-};
-
-/* HCA supports this number of time periods for LRO. */
-#define MLX5_LRO_NUM_SUPP_PERIODS 4
-
-struct mlx5_hca_attr {
-	uint32_t eswitch_manager:1;
-	uint32_t flow_counters_dump:1;
-	uint8_t flow_counter_bulk_alloc_bitmap;
-	uint32_t eth_net_offloads:1;
-	uint32_t eth_virt:1;
-	uint32_t wqe_vlan_insert:1;
-	uint32_t wqe_inline_mode:2;
-	uint32_t vport_inline_mode:3;
-	uint32_t tunnel_stateless_geneve_rx:1;
-	uint32_t geneve_max_opt_len:1; /* 0x0: 14DW, 0x1: 63DW */
-	uint32_t tunnel_stateless_gtp:1;
-	uint32_t lro_cap:1;
-	uint32_t tunnel_lro_gre:1;
-	uint32_t tunnel_lro_vxlan:1;
-	uint32_t lro_max_msg_sz_mode:2;
-	uint32_t lro_timer_supported_periods[MLX5_LRO_NUM_SUPP_PERIODS];
-	uint32_t flex_parser_protocols;
-	uint32_t hairpin:1;
-	uint32_t log_max_hairpin_queues:5;
-	uint32_t log_max_hairpin_wq_data_sz:5;
-	uint32_t log_max_hairpin_num_packets:5;
-	uint32_t vhca_id:16;
-	struct mlx5_hca_qos_attr qos;
-};
-
-struct mlx5_devx_wq_attr {
-	uint32_t wq_type:4;
-	uint32_t wq_signature:1;
-	uint32_t end_padding_mode:2;
-	uint32_t cd_slave:1;
-	uint32_t hds_skip_first_sge:1;
-	uint32_t log2_hds_buf_size:3;
-	uint32_t page_offset:5;
-	uint32_t lwm:16;
-	uint32_t pd:24;
-	uint32_t uar_page:24;
-	uint64_t dbr_addr;
-	uint32_t hw_counter;
-	uint32_t sw_counter;
-	uint32_t log_wq_stride:4;
-	uint32_t log_wq_pg_sz:5;
-	uint32_t log_wq_sz:5;
-	uint32_t dbr_umem_valid:1;
-	uint32_t wq_umem_valid:1;
-	uint32_t log_hairpin_num_packets:5;
-	uint32_t log_hairpin_data_sz:5;
-	uint32_t single_wqe_log_num_of_strides:4;
-	uint32_t two_byte_shift_en:1;
-	uint32_t single_stride_log_num_of_bytes:3;
-	uint32_t dbr_umem_id;
-	uint32_t wq_umem_id;
-	uint64_t wq_umem_offset;
-};
-
-/* Create RQ attributes structure, used by create RQ operation. */
-struct mlx5_devx_create_rq_attr {
-	uint32_t rlky:1;
-	uint32_t delay_drop_en:1;
-	uint32_t scatter_fcs:1;
-	uint32_t vsd:1;
-	uint32_t mem_rq_type:4;
-	uint32_t state:4;
-	uint32_t flush_in_error_en:1;
-	uint32_t hairpin:1;
-	uint32_t user_index:24;
-	uint32_t cqn:24;
-	uint32_t counter_set_id:8;
-	uint32_t rmpn:24;
-	struct mlx5_devx_wq_attr wq_attr;
-};
-
-/* Modify RQ attributes structure, used by modify RQ operation. */
-struct mlx5_devx_modify_rq_attr {
-	uint32_t rqn:24;
-	uint32_t rq_state:4; /* Current RQ state. */
-	uint32_t state:4; /* Required RQ state. */
-	uint32_t scatter_fcs:1;
-	uint32_t vsd:1;
-	uint32_t counter_set_id:8;
-	uint32_t hairpin_peer_sq:24;
-	uint32_t hairpin_peer_vhca:16;
-	uint64_t modify_bitmask;
-	uint32_t lwm:16; /* Contained WQ lwm. */
-};
-
-struct mlx5_rx_hash_field_select {
-	uint32_t l3_prot_type:1;
-	uint32_t l4_prot_type:1;
-	uint32_t selected_fields:30;
-};
-
-/* TIR attributes structure, used by TIR operations. */
-struct mlx5_devx_tir_attr {
-	uint32_t disp_type:4;
-	uint32_t lro_timeout_period_usecs:16;
-	uint32_t lro_enable_mask:4;
-	uint32_t lro_max_msg_sz:8;
-	uint32_t inline_rqn:24;
-	uint32_t rx_hash_symmetric:1;
-	uint32_t tunneled_offload_en:1;
-	uint32_t indirect_table:24;
-	uint32_t rx_hash_fn:4;
-	uint32_t self_lb_block:2;
-	uint32_t transport_domain:24;
-	uint32_t rx_hash_toeplitz_key[10];
-	struct mlx5_rx_hash_field_select rx_hash_field_selector_outer;
-	struct mlx5_rx_hash_field_select rx_hash_field_selector_inner;
-};
-
-/* RQT attributes structure, used by RQT operations. */
-struct mlx5_devx_rqt_attr {
-	uint32_t rqt_max_size:16;
-	uint32_t rqt_actual_size:16;
-	uint32_t rq_list[];
-};
-
-/* TIS attributes structure. */
-struct mlx5_devx_tis_attr {
-	uint32_t strict_lag_tx_port_affinity:1;
-	uint32_t tls_en:1;
-	uint32_t lag_tx_port_affinity:4;
-	uint32_t prio:4;
-	uint32_t transport_domain:24;
-};
-
-/* SQ attributes structure, used by SQ create operation. */
-struct mlx5_devx_create_sq_attr {
-	uint32_t rlky:1;
-	uint32_t cd_master:1;
-	uint32_t fre:1;
-	uint32_t flush_in_error_en:1;
-	uint32_t allow_multi_pkt_send_wqe:1;
-	uint32_t min_wqe_inline_mode:3;
-	uint32_t state:4;
-	uint32_t reg_umr:1;
-	uint32_t allow_swp:1;
-	uint32_t hairpin:1;
-	uint32_t user_index:24;
-	uint32_t cqn:24;
-	uint32_t packet_pacing_rate_limit_index:16;
-	uint32_t tis_lst_sz:16;
-	uint32_t tis_num:24;
-	struct mlx5_devx_wq_attr wq_attr;
-};
-
-/* SQ attributes structure, used by SQ modify operation. */
-struct mlx5_devx_modify_sq_attr {
-	uint32_t sq_state:4;
-	uint32_t state:4;
-	uint32_t hairpin_peer_rq:24;
-	uint32_t hairpin_peer_vhca:16;
-};
-
-/* mlx5_devx_cmds.c */
-
-struct mlx5_devx_obj *mlx5_devx_cmd_flow_counter_alloc(struct ibv_context *ctx,
-						       uint32_t bulk_sz);
-int mlx5_devx_cmd_destroy(struct mlx5_devx_obj *obj);
-int mlx5_devx_cmd_flow_counter_query(struct mlx5_devx_obj *dcs,
-				     int clear, uint32_t n_counters,
-				     uint64_t *pkts, uint64_t *bytes,
-				     uint32_t mkey, void *addr,
-				     struct mlx5dv_devx_cmd_comp *cmd_comp,
-				     uint64_t async_id);
-int mlx5_devx_cmd_query_hca_attr(struct ibv_context *ctx,
-				 struct mlx5_hca_attr *attr);
-struct mlx5_devx_obj *mlx5_devx_cmd_mkey_create(struct ibv_context *ctx,
-					      struct mlx5_devx_mkey_attr *attr);
-int mlx5_devx_get_out_command_status(void *out);
-int mlx5_devx_cmd_qp_query_tis_td(struct ibv_qp *qp, uint32_t tis_num,
-				  uint32_t *tis_td);
-struct mlx5_devx_obj *mlx5_devx_cmd_create_rq(struct ibv_context *ctx,
-				       struct mlx5_devx_create_rq_attr *rq_attr,
-				       int socket);
-int mlx5_devx_cmd_modify_rq(struct mlx5_devx_obj *rq,
-			    struct mlx5_devx_modify_rq_attr *rq_attr);
-struct mlx5_devx_obj *mlx5_devx_cmd_create_tir(struct ibv_context *ctx,
-					   struct mlx5_devx_tir_attr *tir_attr);
-struct mlx5_devx_obj *mlx5_devx_cmd_create_rqt(struct ibv_context *ctx,
-					   struct mlx5_devx_rqt_attr *rqt_attr);
-struct mlx5_devx_obj *mlx5_devx_cmd_create_sq(struct ibv_context *ctx,
-				      struct mlx5_devx_create_sq_attr *sq_attr);
-int mlx5_devx_cmd_modify_sq(struct mlx5_devx_obj *sq,
-			    struct mlx5_devx_modify_sq_attr *sq_attr);
-struct mlx5_devx_obj *mlx5_devx_cmd_create_tis(struct ibv_context *ctx,
-					   struct mlx5_devx_tis_attr *tis_attr);
-struct mlx5_devx_obj *mlx5_devx_cmd_create_td(struct ibv_context *ctx);
-int mlx5_devx_cmd_flow_dump(void *fdb_domain, void *rx_domain, void *tx_domain,
-			    FILE *file);
-#endif /* RTE_PMD_MLX5_DEVX_CMDS_H_ */
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index ce0109c..eddf888 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -36,9 +36,10 @@
 #include <rte_rwlock.h>
 #include <rte_cycles.h>
 
+#include <mlx5_glue.h>
+#include <mlx5_devx_cmds.h>
+
 #include "mlx5.h"
-#include "mlx5_glue.h"
-#include "mlx5_devx_cmds.h"
 #include "mlx5_rxtx.h"
 #include "mlx5_utils.h"
 
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 34f3a53..a2c07f5 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -27,12 +27,13 @@
 #include <rte_malloc.h>
 #include <rte_ip.h>
 
-#include "mlx5.h"
+#include <mlx5_glue.h>
+#include <mlx5_devx_cmds.h>
+#include <mlx5_prm.h>
+
 #include "mlx5_defs.h"
+#include "mlx5.h"
 #include "mlx5_flow.h"
-#include "mlx5_glue.h"
-#include "mlx5_devx_cmds.h"
-#include "mlx5_prm.h"
 #include "mlx5_rxtx.h"
 
 /* Dev ops structure defined in mlx5.c */
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 9832542..55f9a5a 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -25,8 +25,9 @@
 #include <rte_alarm.h>
 #include <rte_mtr.h>
 
+#include <mlx5_prm.h>
+
 #include "mlx5.h"
-#include "mlx5_prm.h"
 
 /* Private rte flow items. */
 enum mlx5_rte_flow_item_type {
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index d70dd4f..50d1078 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -29,12 +29,13 @@
 #include <rte_vxlan.h>
 #include <rte_gtp.h>
 
-#include "mlx5.h"
+#include <mlx5_glue.h>
+#include <mlx5_devx_cmds.h>
+#include <mlx5_prm.h>
+
 #include "mlx5_defs.h"
-#include "mlx5_glue.h"
-#include "mlx5_devx_cmds.h"
+#include "mlx5.h"
 #include "mlx5_flow.h"
-#include "mlx5_prm.h"
 #include "mlx5_rxtx.h"
 
 #ifdef HAVE_IBV_FLOW_DV_SUPPORT
diff --git a/drivers/net/mlx5/mlx5_flow_meter.c b/drivers/net/mlx5/mlx5_flow_meter.c
index c4d28b2..32d51c0 100644
--- a/drivers/net/mlx5/mlx5_flow_meter.c
+++ b/drivers/net/mlx5/mlx5_flow_meter.c
@@ -9,6 +9,8 @@
 #include <rte_mtr.h>
 #include <rte_mtr_driver.h>
 
+#include <mlx5_devx_cmds.h>
+
 #include "mlx5.h"
 #include "mlx5_flow.h"
 
diff --git a/drivers/net/mlx5/mlx5_flow_verbs.c b/drivers/net/mlx5/mlx5_flow_verbs.c
index c787c98..8922bac 100644
--- a/drivers/net/mlx5/mlx5_flow_verbs.c
+++ b/drivers/net/mlx5/mlx5_flow_verbs.c
@@ -26,11 +26,12 @@
 #include <rte_malloc.h>
 #include <rte_ip.h>
 
-#include "mlx5.h"
+#include <mlx5_glue.h>
+#include <mlx5_prm.h>
+
 #include "mlx5_defs.h"
+#include "mlx5.h"
 #include "mlx5_flow.h"
-#include "mlx5_glue.h"
-#include "mlx5_prm.h"
 #include "mlx5_rxtx.h"
 
 #define VERBS_SPEC_INNER(item_flags) \
diff --git a/drivers/net/mlx5/mlx5_glue.c b/drivers/net/mlx5/mlx5_glue.c
deleted file mode 100644
index 4906eeb..0000000
--- a/drivers/net/mlx5/mlx5_glue.c
+++ /dev/null
@@ -1,1150 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause
- * Copyright 2018 6WIND S.A.
- * Copyright 2018 Mellanox Technologies, Ltd
- */
-
-#include <errno.h>
-#include <stdalign.h>
-#include <stddef.h>
-#include <stdint.h>
-#include <stdlib.h>
-
-/*
- * Not needed by this file; included to work around the lack of off_t
- * definition for mlx5dv.h with unpatched rdma-core versions.
- */
-#include <sys/types.h>
-
-/* Verbs headers do not support -pedantic. */
-#ifdef PEDANTIC
-#pragma GCC diagnostic ignored "-Wpedantic"
-#endif
-#include <infiniband/mlx5dv.h>
-#include <infiniband/verbs.h>
-#ifdef PEDANTIC
-#pragma GCC diagnostic error "-Wpedantic"
-#endif
-
-#include <rte_config.h>
-
-#include "mlx5_autoconf.h"
-#include "mlx5_glue.h"
-
-static int
-mlx5_glue_fork_init(void)
-{
-	return ibv_fork_init();
-}
-
-static struct ibv_pd *
-mlx5_glue_alloc_pd(struct ibv_context *context)
-{
-	return ibv_alloc_pd(context);
-}
-
-static int
-mlx5_glue_dealloc_pd(struct ibv_pd *pd)
-{
-	return ibv_dealloc_pd(pd);
-}
-
-static struct ibv_device **
-mlx5_glue_get_device_list(int *num_devices)
-{
-	return ibv_get_device_list(num_devices);
-}
-
-static void
-mlx5_glue_free_device_list(struct ibv_device **list)
-{
-	ibv_free_device_list(list);
-}
-
-static struct ibv_context *
-mlx5_glue_open_device(struct ibv_device *device)
-{
-	return ibv_open_device(device);
-}
-
-static int
-mlx5_glue_close_device(struct ibv_context *context)
-{
-	return ibv_close_device(context);
-}
-
-static int
-mlx5_glue_query_device(struct ibv_context *context,
-		       struct ibv_device_attr *device_attr)
-{
-	return ibv_query_device(context, device_attr);
-}
-
-static int
-mlx5_glue_query_device_ex(struct ibv_context *context,
-			  const struct ibv_query_device_ex_input *input,
-			  struct ibv_device_attr_ex *attr)
-{
-	return ibv_query_device_ex(context, input, attr);
-}
-
-static int
-mlx5_glue_query_rt_values_ex(struct ibv_context *context,
-			  struct ibv_values_ex *values)
-{
-	return ibv_query_rt_values_ex(context, values);
-}
-
-static int
-mlx5_glue_query_port(struct ibv_context *context, uint8_t port_num,
-		     struct ibv_port_attr *port_attr)
-{
-	return ibv_query_port(context, port_num, port_attr);
-}
-
-static struct ibv_comp_channel *
-mlx5_glue_create_comp_channel(struct ibv_context *context)
-{
-	return ibv_create_comp_channel(context);
-}
-
-static int
-mlx5_glue_destroy_comp_channel(struct ibv_comp_channel *channel)
-{
-	return ibv_destroy_comp_channel(channel);
-}
-
-static struct ibv_cq *
-mlx5_glue_create_cq(struct ibv_context *context, int cqe, void *cq_context,
-		    struct ibv_comp_channel *channel, int comp_vector)
-{
-	return ibv_create_cq(context, cqe, cq_context, channel, comp_vector);
-}
-
-static int
-mlx5_glue_destroy_cq(struct ibv_cq *cq)
-{
-	return ibv_destroy_cq(cq);
-}
-
-static int
-mlx5_glue_get_cq_event(struct ibv_comp_channel *channel, struct ibv_cq **cq,
-		       void **cq_context)
-{
-	return ibv_get_cq_event(channel, cq, cq_context);
-}
-
-static void
-mlx5_glue_ack_cq_events(struct ibv_cq *cq, unsigned int nevents)
-{
-	ibv_ack_cq_events(cq, nevents);
-}
-
-static struct ibv_rwq_ind_table *
-mlx5_glue_create_rwq_ind_table(struct ibv_context *context,
-			       struct ibv_rwq_ind_table_init_attr *init_attr)
-{
-	return ibv_create_rwq_ind_table(context, init_attr);
-}
-
-static int
-mlx5_glue_destroy_rwq_ind_table(struct ibv_rwq_ind_table *rwq_ind_table)
-{
-	return ibv_destroy_rwq_ind_table(rwq_ind_table);
-}
-
-static struct ibv_wq *
-mlx5_glue_create_wq(struct ibv_context *context,
-		    struct ibv_wq_init_attr *wq_init_attr)
-{
-	return ibv_create_wq(context, wq_init_attr);
-}
-
-static int
-mlx5_glue_destroy_wq(struct ibv_wq *wq)
-{
-	return ibv_destroy_wq(wq);
-}
-static int
-mlx5_glue_modify_wq(struct ibv_wq *wq, struct ibv_wq_attr *wq_attr)
-{
-	return ibv_modify_wq(wq, wq_attr);
-}
-
-static struct ibv_flow *
-mlx5_glue_create_flow(struct ibv_qp *qp, struct ibv_flow_attr *flow)
-{
-	return ibv_create_flow(qp, flow);
-}
-
-static int
-mlx5_glue_destroy_flow(struct ibv_flow *flow_id)
-{
-	return ibv_destroy_flow(flow_id);
-}
-
-static int
-mlx5_glue_destroy_flow_action(void *action)
-{
-#ifdef HAVE_IBV_FLOW_DV_SUPPORT
-#ifdef HAVE_MLX5DV_DR
-	return mlx5dv_dr_action_destroy(action);
-#else
-	struct mlx5dv_flow_action_attr *attr = action;
-	int res = 0;
-	switch (attr->type) {
-	case MLX5DV_FLOW_ACTION_TAG:
-		break;
-	default:
-		res = ibv_destroy_flow_action(attr->action);
-		break;
-	}
-	free(action);
-	return res;
-#endif
-#else
-	(void)action;
-	return ENOTSUP;
-#endif
-}
-
-static struct ibv_qp *
-mlx5_glue_create_qp(struct ibv_pd *pd, struct ibv_qp_init_attr *qp_init_attr)
-{
-	return ibv_create_qp(pd, qp_init_attr);
-}
-
-static struct ibv_qp *
-mlx5_glue_create_qp_ex(struct ibv_context *context,
-		       struct ibv_qp_init_attr_ex *qp_init_attr_ex)
-{
-	return ibv_create_qp_ex(context, qp_init_attr_ex);
-}
-
-static int
-mlx5_glue_destroy_qp(struct ibv_qp *qp)
-{
-	return ibv_destroy_qp(qp);
-}
-
-static int
-mlx5_glue_modify_qp(struct ibv_qp *qp, struct ibv_qp_attr *attr, int attr_mask)
-{
-	return ibv_modify_qp(qp, attr, attr_mask);
-}
-
-static struct ibv_mr *
-mlx5_glue_reg_mr(struct ibv_pd *pd, void *addr, size_t length, int access)
-{
-	return ibv_reg_mr(pd, addr, length, access);
-}
-
-static int
-mlx5_glue_dereg_mr(struct ibv_mr *mr)
-{
-	return ibv_dereg_mr(mr);
-}
-
-static struct ibv_counter_set *
-mlx5_glue_create_counter_set(struct ibv_context *context,
-			     struct ibv_counter_set_init_attr *init_attr)
-{
-#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V42
-	(void)context;
-	(void)init_attr;
-	return NULL;
-#else
-	return ibv_create_counter_set(context, init_attr);
-#endif
-}
-
-static int
-mlx5_glue_destroy_counter_set(struct ibv_counter_set *cs)
-{
-#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V42
-	(void)cs;
-	return ENOTSUP;
-#else
-	return ibv_destroy_counter_set(cs);
-#endif
-}
-
-static int
-mlx5_glue_describe_counter_set(struct ibv_context *context,
-			       uint16_t counter_set_id,
-			       struct ibv_counter_set_description *cs_desc)
-{
-#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V42
-	(void)context;
-	(void)counter_set_id;
-	(void)cs_desc;
-	return ENOTSUP;
-#else
-	return ibv_describe_counter_set(context, counter_set_id, cs_desc);
-#endif
-}
-
-static int
-mlx5_glue_query_counter_set(struct ibv_query_counter_set_attr *query_attr,
-			    struct ibv_counter_set_data *cs_data)
-{
-#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V42
-	(void)query_attr;
-	(void)cs_data;
-	return ENOTSUP;
-#else
-	return ibv_query_counter_set(query_attr, cs_data);
-#endif
-}
-
-static struct ibv_counters *
-mlx5_glue_create_counters(struct ibv_context *context,
-			  struct ibv_counters_init_attr *init_attr)
-{
-#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V45
-	(void)context;
-	(void)init_attr;
-	errno = ENOTSUP;
-	return NULL;
-#else
-	return ibv_create_counters(context, init_attr);
-#endif
-}
-
-static int
-mlx5_glue_destroy_counters(struct ibv_counters *counters)
-{
-#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V45
-	(void)counters;
-	return ENOTSUP;
-#else
-	return ibv_destroy_counters(counters);
-#endif
-}
-
-static int
-mlx5_glue_attach_counters(struct ibv_counters *counters,
-			  struct ibv_counter_attach_attr *attr,
-			  struct ibv_flow *flow)
-{
-#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V45
-	(void)counters;
-	(void)attr;
-	(void)flow;
-	return ENOTSUP;
-#else
-	return ibv_attach_counters_point_flow(counters, attr, flow);
-#endif
-}
-
-static int
-mlx5_glue_query_counters(struct ibv_counters *counters,
-			 uint64_t *counters_value,
-			 uint32_t ncounters,
-			 uint32_t flags)
-{
-#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V45
-	(void)counters;
-	(void)counters_value;
-	(void)ncounters;
-	(void)flags;
-	return ENOTSUP;
-#else
-	return ibv_read_counters(counters, counters_value, ncounters, flags);
-#endif
-}
-
-static void
-mlx5_glue_ack_async_event(struct ibv_async_event *event)
-{
-	ibv_ack_async_event(event);
-}
-
-static int
-mlx5_glue_get_async_event(struct ibv_context *context,
-			  struct ibv_async_event *event)
-{
-	return ibv_get_async_event(context, event);
-}
-
-static const char *
-mlx5_glue_port_state_str(enum ibv_port_state port_state)
-{
-	return ibv_port_state_str(port_state);
-}
-
-static struct ibv_cq *
-mlx5_glue_cq_ex_to_cq(struct ibv_cq_ex *cq)
-{
-	return ibv_cq_ex_to_cq(cq);
-}
-
-static void *
-mlx5_glue_dr_create_flow_action_dest_flow_tbl(void *tbl)
-{
-#ifdef HAVE_MLX5DV_DR
-	return mlx5dv_dr_action_create_dest_table(tbl);
-#else
-	(void)tbl;
-	errno = ENOTSUP;
-	return NULL;
-#endif
-}
-
-static void *
-mlx5_glue_dr_create_flow_action_dest_port(void *domain, uint32_t port)
-{
-#ifdef HAVE_MLX5DV_DR_DEVX_PORT
-	return mlx5dv_dr_action_create_dest_ib_port(domain, port);
-#else
-#ifdef HAVE_MLX5DV_DR_ESWITCH
-	return mlx5dv_dr_action_create_dest_vport(domain, port);
-#else
-	(void)domain;
-	(void)port;
-	errno = ENOTSUP;
-	return NULL;
-#endif
-#endif
-}
-
-static void *
-mlx5_glue_dr_create_flow_action_drop(void)
-{
-#ifdef HAVE_MLX5DV_DR_ESWITCH
-	return mlx5dv_dr_action_create_drop();
-#else
-	errno = ENOTSUP;
-	return NULL;
-#endif
-}
-
-static void *
-mlx5_glue_dr_create_flow_action_push_vlan(struct mlx5dv_dr_domain *domain,
-					  rte_be32_t vlan_tag)
-{
-#ifdef HAVE_MLX5DV_DR_VLAN
-	return mlx5dv_dr_action_create_push_vlan(domain, vlan_tag);
-#else
-	(void)domain;
-	(void)vlan_tag;
-	errno = ENOTSUP;
-	return NULL;
-#endif
-}
-
-static void *
-mlx5_glue_dr_create_flow_action_pop_vlan(void)
-{
-#ifdef HAVE_MLX5DV_DR_VLAN
-	return mlx5dv_dr_action_create_pop_vlan();
-#else
-	errno = ENOTSUP;
-	return NULL;
-#endif
-}
-
-static void *
-mlx5_glue_dr_create_flow_tbl(void *domain, uint32_t level)
-{
-#ifdef HAVE_MLX5DV_DR
-	return mlx5dv_dr_table_create(domain, level);
-#else
-	(void)domain;
-	(void)level;
-	errno = ENOTSUP;
-	return NULL;
-#endif
-}
-
-static int
-mlx5_glue_dr_destroy_flow_tbl(void *tbl)
-{
-#ifdef HAVE_MLX5DV_DR
-	return mlx5dv_dr_table_destroy(tbl);
-#else
-	(void)tbl;
-	errno = ENOTSUP;
-	return errno;
-#endif
-}
-
-static void *
-mlx5_glue_dr_create_domain(struct ibv_context *ctx,
-			   enum  mlx5dv_dr_domain_type domain)
-{
-#ifdef HAVE_MLX5DV_DR
-	return mlx5dv_dr_domain_create(ctx, domain);
-#else
-	(void)ctx;
-	(void)domain;
-	errno = ENOTSUP;
-	return NULL;
-#endif
-}
-
-static int
-mlx5_glue_dr_destroy_domain(void *domain)
-{
-#ifdef HAVE_MLX5DV_DR
-	return mlx5dv_dr_domain_destroy(domain);
-#else
-	(void)domain;
-	errno = ENOTSUP;
-	return errno;
-#endif
-}
-
-static struct ibv_cq_ex *
-mlx5_glue_dv_create_cq(struct ibv_context *context,
-		       struct ibv_cq_init_attr_ex *cq_attr,
-		       struct mlx5dv_cq_init_attr *mlx5_cq_attr)
-{
-	return mlx5dv_create_cq(context, cq_attr, mlx5_cq_attr);
-}
-
-static struct ibv_wq *
-mlx5_glue_dv_create_wq(struct ibv_context *context,
-		       struct ibv_wq_init_attr *wq_attr,
-		       struct mlx5dv_wq_init_attr *mlx5_wq_attr)
-{
-#ifndef HAVE_IBV_DEVICE_STRIDING_RQ_SUPPORT
-	(void)context;
-	(void)wq_attr;
-	(void)mlx5_wq_attr;
-	errno = ENOTSUP;
-	return NULL;
-#else
-	return mlx5dv_create_wq(context, wq_attr, mlx5_wq_attr);
-#endif
-}
-
-static int
-mlx5_glue_dv_query_device(struct ibv_context *ctx,
-			  struct mlx5dv_context *attrs_out)
-{
-	return mlx5dv_query_device(ctx, attrs_out);
-}
-
-static int
-mlx5_glue_dv_set_context_attr(struct ibv_context *ibv_ctx,
-			      enum mlx5dv_set_ctx_attr_type type, void *attr)
-{
-	return mlx5dv_set_context_attr(ibv_ctx, type, attr);
-}
-
-static int
-mlx5_glue_dv_init_obj(struct mlx5dv_obj *obj, uint64_t obj_type)
-{
-	return mlx5dv_init_obj(obj, obj_type);
-}
-
-static struct ibv_qp *
-mlx5_glue_dv_create_qp(struct ibv_context *context,
-		       struct ibv_qp_init_attr_ex *qp_init_attr_ex,
-		       struct mlx5dv_qp_init_attr *dv_qp_init_attr)
-{
-#ifdef HAVE_IBV_DEVICE_TUNNEL_SUPPORT
-	return mlx5dv_create_qp(context, qp_init_attr_ex, dv_qp_init_attr);
-#else
-	(void)context;
-	(void)qp_init_attr_ex;
-	(void)dv_qp_init_attr;
-	errno = ENOTSUP;
-	return NULL;
-#endif
-}
-
-static void *
-mlx5_glue_dv_create_flow_matcher(struct ibv_context *context,
-				 struct mlx5dv_flow_matcher_attr *matcher_attr,
-				 void *tbl)
-{
-#ifdef HAVE_IBV_FLOW_DV_SUPPORT
-#ifdef HAVE_MLX5DV_DR
-	(void)context;
-	return mlx5dv_dr_matcher_create(tbl, matcher_attr->priority,
-					matcher_attr->match_criteria_enable,
-					matcher_attr->match_mask);
-#else
-	(void)tbl;
-	return mlx5dv_create_flow_matcher(context, matcher_attr);
-#endif
-#else
-	(void)context;
-	(void)matcher_attr;
-	(void)tbl;
-	errno = ENOTSUP;
-	return NULL;
-#endif
-}
-
-static void *
-mlx5_glue_dv_create_flow(void *matcher,
-			 void *match_value,
-			 size_t num_actions,
-			 void *actions[])
-{
-#ifdef HAVE_IBV_FLOW_DV_SUPPORT
-#ifdef HAVE_MLX5DV_DR
-	return mlx5dv_dr_rule_create(matcher, match_value, num_actions,
-				     (struct mlx5dv_dr_action **)actions);
-#else
-	struct mlx5dv_flow_action_attr actions_attr[8];
-
-	if (num_actions > 8)
-		return NULL;
-	for (size_t i = 0; i < num_actions; i++)
-		actions_attr[i] =
-			*((struct mlx5dv_flow_action_attr *)(actions[i]));
-	return mlx5dv_create_flow(matcher, match_value,
-				  num_actions, actions_attr);
-#endif
-#else
-	(void)matcher;
-	(void)match_value;
-	(void)num_actions;
-	(void)actions;
-	return NULL;
-#endif
-}
-
-static void *
-mlx5_glue_dv_create_flow_action_counter(void *counter_obj, uint32_t offset)
-{
-#ifdef HAVE_IBV_FLOW_DV_SUPPORT
-#ifdef HAVE_MLX5DV_DR
-	return mlx5dv_dr_action_create_flow_counter(counter_obj, offset);
-#else
-	struct mlx5dv_flow_action_attr *action;
-
-	(void)offset;
-	action = malloc(sizeof(*action));
-	if (!action)
-		return NULL;
-	action->type = MLX5DV_FLOW_ACTION_COUNTERS_DEVX;
-	action->obj = counter_obj;
-	return action;
-#endif
-#else
-	(void)counter_obj;
-	(void)offset;
-	errno = ENOTSUP;
-	return NULL;
-#endif
-}
-
-static void *
-mlx5_glue_dv_create_flow_action_dest_ibv_qp(void *qp)
-{
-#ifdef HAVE_IBV_FLOW_DV_SUPPORT
-#ifdef HAVE_MLX5DV_DR
-	return mlx5dv_dr_action_create_dest_ibv_qp(qp);
-#else
-	struct mlx5dv_flow_action_attr *action;
-
-	action = malloc(sizeof(*action));
-	if (!action)
-		return NULL;
-	action->type = MLX5DV_FLOW_ACTION_DEST_IBV_QP;
-	action->obj = qp;
-	return action;
-#endif
-#else
-	(void)qp;
-	errno = ENOTSUP;
-	return NULL;
-#endif
-}
-
-static void *
-mlx5_glue_dv_create_flow_action_dest_devx_tir(void *tir)
-{
-#ifdef HAVE_MLX5DV_DR_ACTION_DEST_DEVX_TIR
-	return mlx5dv_dr_action_create_dest_devx_tir(tir);
-#else
-	(void)tir;
-	errno = ENOTSUP;
-	return NULL;
-#endif
-}
-
-static void *
-mlx5_glue_dv_create_flow_action_modify_header
-					(struct ibv_context *ctx,
-					 enum mlx5dv_flow_table_type ft_type,
-					 void *domain, uint64_t flags,
-					 size_t actions_sz,
-					 uint64_t actions[])
-{
-#ifdef HAVE_IBV_FLOW_DV_SUPPORT
-#ifdef HAVE_MLX5DV_DR
-	(void)ctx;
-	(void)ft_type;
-	return mlx5dv_dr_action_create_modify_header(domain, flags, actions_sz,
-						     (__be64 *)actions);
-#else
-	struct mlx5dv_flow_action_attr *action;
-
-	(void)domain;
-	(void)flags;
-	action = malloc(sizeof(*action));
-	if (!action)
-		return NULL;
-	action->type = MLX5DV_FLOW_ACTION_IBV_FLOW_ACTION;
-	action->action = mlx5dv_create_flow_action_modify_header
-		(ctx, actions_sz, actions, ft_type);
-	return action;
-#endif
-#else
-	(void)ctx;
-	(void)ft_type;
-	(void)domain;
-	(void)flags;
-	(void)actions_sz;
-	(void)actions;
-	errno = ENOTSUP;
-	return NULL;
-#endif
-}
-
-static void *
-mlx5_glue_dv_create_flow_action_packet_reformat
-		(struct ibv_context *ctx,
-		 enum mlx5dv_flow_action_packet_reformat_type reformat_type,
-		 enum mlx5dv_flow_table_type ft_type,
-		 struct mlx5dv_dr_domain *domain,
-		 uint32_t flags, size_t data_sz, void *data)
-{
-#ifdef HAVE_IBV_FLOW_DV_SUPPORT
-#ifdef HAVE_MLX5DV_DR
-	(void)ctx;
-	(void)ft_type;
-	return mlx5dv_dr_action_create_packet_reformat(domain, flags,
-						       reformat_type, data_sz,
-						       data);
-#else
-	(void)domain;
-	(void)flags;
-	struct mlx5dv_flow_action_attr *action;
-
-	action = malloc(sizeof(*action));
-	if (!action)
-		return NULL;
-	action->type = MLX5DV_FLOW_ACTION_IBV_FLOW_ACTION;
-	action->action = mlx5dv_create_flow_action_packet_reformat
-		(ctx, data_sz, data, reformat_type, ft_type);
-	return action;
-#endif
-#else
-	(void)ctx;
-	(void)reformat_type;
-	(void)ft_type;
-	(void)domain;
-	(void)flags;
-	(void)data_sz;
-	(void)data;
-	errno = ENOTSUP;
-	return NULL;
-#endif
-}
-
-static void *
-mlx5_glue_dv_create_flow_action_tag(uint32_t tag)
-{
-#ifdef HAVE_IBV_FLOW_DV_SUPPORT
-#ifdef HAVE_MLX5DV_DR
-	return mlx5dv_dr_action_create_tag(tag);
-#else
-	struct mlx5dv_flow_action_attr *action;
-	action = malloc(sizeof(*action));
-	if (!action)
-		return NULL;
-	action->type = MLX5DV_FLOW_ACTION_TAG;
-	action->tag_value = tag;
-	return action;
-#endif
-#endif
-	(void)tag;
-	errno = ENOTSUP;
-	return NULL;
-}
-
-static void *
-mlx5_glue_dv_create_flow_action_meter(struct mlx5dv_dr_flow_meter_attr *attr)
-{
-#if defined(HAVE_MLX5DV_DR) && defined(HAVE_MLX5_DR_CREATE_ACTION_FLOW_METER)
-	return mlx5dv_dr_action_create_flow_meter(attr);
-#else
-	(void)attr;
-	errno = ENOTSUP;
-	return NULL;
-#endif
-}
-
-static int
-mlx5_glue_dv_modify_flow_action_meter(void *action,
-				      struct mlx5dv_dr_flow_meter_attr *attr,
-				      uint64_t modify_bits)
-{
-#if defined(HAVE_MLX5DV_DR) && defined(HAVE_MLX5_DR_CREATE_ACTION_FLOW_METER)
-	return mlx5dv_dr_action_modify_flow_meter(action, attr, modify_bits);
-#else
-	(void)action;
-	(void)attr;
-	(void)modify_bits;
-	errno = ENOTSUP;
-	return errno;
-#endif
-}
-
-static int
-mlx5_glue_dv_destroy_flow(void *flow_id)
-{
-#ifdef HAVE_MLX5DV_DR
-	return mlx5dv_dr_rule_destroy(flow_id);
-#else
-	return ibv_destroy_flow(flow_id);
-#endif
-}
-
-static int
-mlx5_glue_dv_destroy_flow_matcher(void *matcher)
-{
-#ifdef HAVE_IBV_FLOW_DV_SUPPORT
-#ifdef HAVE_MLX5DV_DR
-	return mlx5dv_dr_matcher_destroy(matcher);
-#else
-	return mlx5dv_destroy_flow_matcher(matcher);
-#endif
-#else
-	(void)matcher;
-	errno = ENOTSUP;
-	return errno;
-#endif
-}
-
-static struct ibv_context *
-mlx5_glue_dv_open_device(struct ibv_device *device)
-{
-#ifdef HAVE_IBV_DEVX_OBJ
-	return mlx5dv_open_device(device,
-				  &(struct mlx5dv_context_attr){
-					.flags = MLX5DV_CONTEXT_FLAGS_DEVX,
-				  });
-#else
-	(void)device;
-	errno = ENOTSUP;
-	return NULL;
-#endif
-}
-
-static struct mlx5dv_devx_obj *
-mlx5_glue_devx_obj_create(struct ibv_context *ctx,
-			  const void *in, size_t inlen,
-			  void *out, size_t outlen)
-{
-#ifdef HAVE_IBV_DEVX_OBJ
-	return mlx5dv_devx_obj_create(ctx, in, inlen, out, outlen);
-#else
-	(void)ctx;
-	(void)in;
-	(void)inlen;
-	(void)out;
-	(void)outlen;
-	errno = ENOTSUP;
-	return NULL;
-#endif
-}
-
-static int
-mlx5_glue_devx_obj_destroy(struct mlx5dv_devx_obj *obj)
-{
-#ifdef HAVE_IBV_DEVX_OBJ
-	return mlx5dv_devx_obj_destroy(obj);
-#else
-	(void)obj;
-	return -ENOTSUP;
-#endif
-}
-
-static int
-mlx5_glue_devx_obj_query(struct mlx5dv_devx_obj *obj,
-			 const void *in, size_t inlen,
-			 void *out, size_t outlen)
-{
-#ifdef HAVE_IBV_DEVX_OBJ
-	return mlx5dv_devx_obj_query(obj, in, inlen, out, outlen);
-#else
-	(void)obj;
-	(void)in;
-	(void)inlen;
-	(void)out;
-	(void)outlen;
-	return -ENOTSUP;
-#endif
-}
-
-static int
-mlx5_glue_devx_obj_modify(struct mlx5dv_devx_obj *obj,
-			  const void *in, size_t inlen,
-			  void *out, size_t outlen)
-{
-#ifdef HAVE_IBV_DEVX_OBJ
-	return mlx5dv_devx_obj_modify(obj, in, inlen, out, outlen);
-#else
-	(void)obj;
-	(void)in;
-	(void)inlen;
-	(void)out;
-	(void)outlen;
-	return -ENOTSUP;
-#endif
-}
-
-static int
-mlx5_glue_devx_general_cmd(struct ibv_context *ctx,
-			   const void *in, size_t inlen,
-			   void *out, size_t outlen)
-{
-#ifdef HAVE_IBV_DEVX_OBJ
-	return mlx5dv_devx_general_cmd(ctx, in, inlen, out, outlen);
-#else
-	(void)ctx;
-	(void)in;
-	(void)inlen;
-	(void)out;
-	(void)outlen;
-	return -ENOTSUP;
-#endif
-}
-
-static struct mlx5dv_devx_cmd_comp *
-mlx5_glue_devx_create_cmd_comp(struct ibv_context *ctx)
-{
-#ifdef HAVE_IBV_DEVX_ASYNC
-	return mlx5dv_devx_create_cmd_comp(ctx);
-#else
-	(void)ctx;
-	errno = -ENOTSUP;
-	return NULL;
-#endif
-}
-
-static void
-mlx5_glue_devx_destroy_cmd_comp(struct mlx5dv_devx_cmd_comp *cmd_comp)
-{
-#ifdef HAVE_IBV_DEVX_ASYNC
-	mlx5dv_devx_destroy_cmd_comp(cmd_comp);
-#else
-	(void)cmd_comp;
-	errno = -ENOTSUP;
-#endif
-}
-
-static int
-mlx5_glue_devx_obj_query_async(struct mlx5dv_devx_obj *obj, const void *in,
-			       size_t inlen, size_t outlen, uint64_t wr_id,
-			       struct mlx5dv_devx_cmd_comp *cmd_comp)
-{
-#ifdef HAVE_IBV_DEVX_ASYNC
-	return mlx5dv_devx_obj_query_async(obj, in, inlen, outlen, wr_id,
-					   cmd_comp);
-#else
-	(void)obj;
-	(void)in;
-	(void)inlen;
-	(void)outlen;
-	(void)wr_id;
-	(void)cmd_comp;
-	return -ENOTSUP;
-#endif
-}
-
-static int
-mlx5_glue_devx_get_async_cmd_comp(struct mlx5dv_devx_cmd_comp *cmd_comp,
-				  struct mlx5dv_devx_async_cmd_hdr *cmd_resp,
-				  size_t cmd_resp_len)
-{
-#ifdef HAVE_IBV_DEVX_ASYNC
-	return mlx5dv_devx_get_async_cmd_comp(cmd_comp, cmd_resp,
-					      cmd_resp_len);
-#else
-	(void)cmd_comp;
-	(void)cmd_resp;
-	(void)cmd_resp_len;
-	return -ENOTSUP;
-#endif
-}
-
-static struct mlx5dv_devx_umem *
-mlx5_glue_devx_umem_reg(struct ibv_context *context, void *addr, size_t size,
-			uint32_t access)
-{
-#ifdef HAVE_IBV_DEVX_OBJ
-	return mlx5dv_devx_umem_reg(context, addr, size, access);
-#else
-	(void)context;
-	(void)addr;
-	(void)size;
-	(void)access;
-	errno = -ENOTSUP;
-	return NULL;
-#endif
-}
-
-static int
-mlx5_glue_devx_umem_dereg(struct mlx5dv_devx_umem *dv_devx_umem)
-{
-#ifdef HAVE_IBV_DEVX_OBJ
-	return mlx5dv_devx_umem_dereg(dv_devx_umem);
-#else
-	(void)dv_devx_umem;
-	return -ENOTSUP;
-#endif
-}
-
-static int
-mlx5_glue_devx_qp_query(struct ibv_qp *qp,
-			const void *in, size_t inlen,
-			void *out, size_t outlen)
-{
-#ifdef HAVE_IBV_DEVX_OBJ
-	return mlx5dv_devx_qp_query(qp, in, inlen, out, outlen);
-#else
-	(void)qp;
-	(void)in;
-	(void)inlen;
-	(void)out;
-	(void)outlen;
-	errno = ENOTSUP;
-	return errno;
-#endif
-}
-
-static int
-mlx5_glue_devx_port_query(struct ibv_context *ctx,
-			  uint32_t port_num,
-			  struct mlx5dv_devx_port *mlx5_devx_port)
-{
-#ifdef HAVE_MLX5DV_DR_DEVX_PORT
-	return mlx5dv_query_devx_port(ctx, port_num, mlx5_devx_port);
-#else
-	(void)ctx;
-	(void)port_num;
-	(void)mlx5_devx_port;
-	errno = ENOTSUP;
-	return errno;
-#endif
-}
-
-static int
-mlx5_glue_dr_dump_domain(FILE *file, void *domain)
-{
-#ifdef HAVE_MLX5_DR_FLOW_DUMP
-	return mlx5dv_dump_dr_domain(file, domain);
-#else
-	RTE_SET_USED(file);
-	RTE_SET_USED(domain);
-	return -ENOTSUP;
-#endif
-}
-
-alignas(RTE_CACHE_LINE_SIZE)
-const struct mlx5_glue *mlx5_glue = &(const struct mlx5_glue){
-	.version = MLX5_GLUE_VERSION,
-	.fork_init = mlx5_glue_fork_init,
-	.alloc_pd = mlx5_glue_alloc_pd,
-	.dealloc_pd = mlx5_glue_dealloc_pd,
-	.get_device_list = mlx5_glue_get_device_list,
-	.free_device_list = mlx5_glue_free_device_list,
-	.open_device = mlx5_glue_open_device,
-	.close_device = mlx5_glue_close_device,
-	.query_device = mlx5_glue_query_device,
-	.query_device_ex = mlx5_glue_query_device_ex,
-	.query_rt_values_ex = mlx5_glue_query_rt_values_ex,
-	.query_port = mlx5_glue_query_port,
-	.create_comp_channel = mlx5_glue_create_comp_channel,
-	.destroy_comp_channel = mlx5_glue_destroy_comp_channel,
-	.create_cq = mlx5_glue_create_cq,
-	.destroy_cq = mlx5_glue_destroy_cq,
-	.get_cq_event = mlx5_glue_get_cq_event,
-	.ack_cq_events = mlx5_glue_ack_cq_events,
-	.create_rwq_ind_table = mlx5_glue_create_rwq_ind_table,
-	.destroy_rwq_ind_table = mlx5_glue_destroy_rwq_ind_table,
-	.create_wq = mlx5_glue_create_wq,
-	.destroy_wq = mlx5_glue_destroy_wq,
-	.modify_wq = mlx5_glue_modify_wq,
-	.create_flow = mlx5_glue_create_flow,
-	.destroy_flow = mlx5_glue_destroy_flow,
-	.destroy_flow_action = mlx5_glue_destroy_flow_action,
-	.create_qp = mlx5_glue_create_qp,
-	.create_qp_ex = mlx5_glue_create_qp_ex,
-	.destroy_qp = mlx5_glue_destroy_qp,
-	.modify_qp = mlx5_glue_modify_qp,
-	.reg_mr = mlx5_glue_reg_mr,
-	.dereg_mr = mlx5_glue_dereg_mr,
-	.create_counter_set = mlx5_glue_create_counter_set,
-	.destroy_counter_set = mlx5_glue_destroy_counter_set,
-	.describe_counter_set = mlx5_glue_describe_counter_set,
-	.query_counter_set = mlx5_glue_query_counter_set,
-	.create_counters = mlx5_glue_create_counters,
-	.destroy_counters = mlx5_glue_destroy_counters,
-	.attach_counters = mlx5_glue_attach_counters,
-	.query_counters = mlx5_glue_query_counters,
-	.ack_async_event = mlx5_glue_ack_async_event,
-	.get_async_event = mlx5_glue_get_async_event,
-	.port_state_str = mlx5_glue_port_state_str,
-	.cq_ex_to_cq = mlx5_glue_cq_ex_to_cq,
-	.dr_create_flow_action_dest_flow_tbl =
-		mlx5_glue_dr_create_flow_action_dest_flow_tbl,
-	.dr_create_flow_action_dest_port =
-		mlx5_glue_dr_create_flow_action_dest_port,
-	.dr_create_flow_action_drop =
-		mlx5_glue_dr_create_flow_action_drop,
-	.dr_create_flow_action_push_vlan =
-		mlx5_glue_dr_create_flow_action_push_vlan,
-	.dr_create_flow_action_pop_vlan =
-		mlx5_glue_dr_create_flow_action_pop_vlan,
-	.dr_create_flow_tbl = mlx5_glue_dr_create_flow_tbl,
-	.dr_destroy_flow_tbl = mlx5_glue_dr_destroy_flow_tbl,
-	.dr_create_domain = mlx5_glue_dr_create_domain,
-	.dr_destroy_domain = mlx5_glue_dr_destroy_domain,
-	.dv_create_cq = mlx5_glue_dv_create_cq,
-	.dv_create_wq = mlx5_glue_dv_create_wq,
-	.dv_query_device = mlx5_glue_dv_query_device,
-	.dv_set_context_attr = mlx5_glue_dv_set_context_attr,
-	.dv_init_obj = mlx5_glue_dv_init_obj,
-	.dv_create_qp = mlx5_glue_dv_create_qp,
-	.dv_create_flow_matcher = mlx5_glue_dv_create_flow_matcher,
-	.dv_create_flow = mlx5_glue_dv_create_flow,
-	.dv_create_flow_action_counter =
-		mlx5_glue_dv_create_flow_action_counter,
-	.dv_create_flow_action_dest_ibv_qp =
-		mlx5_glue_dv_create_flow_action_dest_ibv_qp,
-	.dv_create_flow_action_dest_devx_tir =
-		mlx5_glue_dv_create_flow_action_dest_devx_tir,
-	.dv_create_flow_action_modify_header =
-		mlx5_glue_dv_create_flow_action_modify_header,
-	.dv_create_flow_action_packet_reformat =
-		mlx5_glue_dv_create_flow_action_packet_reformat,
-	.dv_create_flow_action_tag =  mlx5_glue_dv_create_flow_action_tag,
-	.dv_create_flow_action_meter = mlx5_glue_dv_create_flow_action_meter,
-	.dv_modify_flow_action_meter = mlx5_glue_dv_modify_flow_action_meter,
-	.dv_destroy_flow = mlx5_glue_dv_destroy_flow,
-	.dv_destroy_flow_matcher = mlx5_glue_dv_destroy_flow_matcher,
-	.dv_open_device = mlx5_glue_dv_open_device,
-	.devx_obj_create = mlx5_glue_devx_obj_create,
-	.devx_obj_destroy = mlx5_glue_devx_obj_destroy,
-	.devx_obj_query = mlx5_glue_devx_obj_query,
-	.devx_obj_modify = mlx5_glue_devx_obj_modify,
-	.devx_general_cmd = mlx5_glue_devx_general_cmd,
-	.devx_create_cmd_comp = mlx5_glue_devx_create_cmd_comp,
-	.devx_destroy_cmd_comp = mlx5_glue_devx_destroy_cmd_comp,
-	.devx_obj_query_async = mlx5_glue_devx_obj_query_async,
-	.devx_get_async_cmd_comp = mlx5_glue_devx_get_async_cmd_comp,
-	.devx_umem_reg = mlx5_glue_devx_umem_reg,
-	.devx_umem_dereg = mlx5_glue_devx_umem_dereg,
-	.devx_qp_query = mlx5_glue_devx_qp_query,
-	.devx_port_query = mlx5_glue_devx_port_query,
-	.dr_dump_domain = mlx5_glue_dr_dump_domain,
-};
diff --git a/drivers/net/mlx5/mlx5_glue.h b/drivers/net/mlx5/mlx5_glue.h
deleted file mode 100644
index 6771a18..0000000
--- a/drivers/net/mlx5/mlx5_glue.h
+++ /dev/null
@@ -1,264 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause
- * Copyright 2018 6WIND S.A.
- * Copyright 2018 Mellanox Technologies, Ltd
- */
-
-#ifndef MLX5_GLUE_H_
-#define MLX5_GLUE_H_
-
-#include <stddef.h>
-#include <stdint.h>
-
-#include "rte_byteorder.h"
-
-/* Verbs headers do not support -pedantic. */
-#ifdef PEDANTIC
-#pragma GCC diagnostic ignored "-Wpedantic"
-#endif
-#include <infiniband/mlx5dv.h>
-#include <infiniband/verbs.h>
-#ifdef PEDANTIC
-#pragma GCC diagnostic error "-Wpedantic"
-#endif
-
-#ifndef MLX5_GLUE_VERSION
-#define MLX5_GLUE_VERSION ""
-#endif
-
-#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V42
-struct ibv_counter_set;
-struct ibv_counter_set_data;
-struct ibv_counter_set_description;
-struct ibv_counter_set_init_attr;
-struct ibv_query_counter_set_attr;
-#endif
-
-#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V45
-struct ibv_counters;
-struct ibv_counters_init_attr;
-struct ibv_counter_attach_attr;
-#endif
-
-#ifndef HAVE_IBV_DEVICE_TUNNEL_SUPPORT
-struct mlx5dv_qp_init_attr;
-#endif
-
-#ifndef HAVE_IBV_DEVICE_STRIDING_RQ_SUPPORT
-struct mlx5dv_wq_init_attr;
-#endif
-
-#ifndef HAVE_IBV_FLOW_DV_SUPPORT
-struct mlx5dv_flow_matcher;
-struct mlx5dv_flow_matcher_attr;
-struct mlx5dv_flow_action_attr;
-struct mlx5dv_flow_match_parameters;
-struct mlx5dv_dr_flow_meter_attr;
-struct ibv_flow_action;
-enum mlx5dv_flow_action_packet_reformat_type { packet_reformat_type = 0, };
-enum mlx5dv_flow_table_type { flow_table_type = 0, };
-#endif
-
-#ifndef HAVE_IBV_FLOW_DEVX_COUNTERS
-#define MLX5DV_FLOW_ACTION_COUNTERS_DEVX 0
-#endif
-
-#ifndef HAVE_IBV_DEVX_OBJ
-struct mlx5dv_devx_obj;
-struct mlx5dv_devx_umem { uint32_t umem_id; };
-#endif
-
-#ifndef HAVE_IBV_DEVX_ASYNC
-struct mlx5dv_devx_cmd_comp;
-struct mlx5dv_devx_async_cmd_hdr;
-#endif
-
-#ifndef HAVE_MLX5DV_DR
-enum  mlx5dv_dr_domain_type { unused, };
-struct mlx5dv_dr_domain;
-#endif
-
-#ifndef HAVE_MLX5DV_DR_DEVX_PORT
-struct mlx5dv_devx_port;
-#endif
-
-#ifndef HAVE_MLX5_DR_CREATE_ACTION_FLOW_METER
-struct mlx5dv_dr_flow_meter_attr;
-#endif
-
-/* LIB_GLUE_VERSION must be updated every time this structure is modified. */
-struct mlx5_glue {
-	const char *version;
-	int (*fork_init)(void);
-	struct ibv_pd *(*alloc_pd)(struct ibv_context *context);
-	int (*dealloc_pd)(struct ibv_pd *pd);
-	struct ibv_device **(*get_device_list)(int *num_devices);
-	void (*free_device_list)(struct ibv_device **list);
-	struct ibv_context *(*open_device)(struct ibv_device *device);
-	int (*close_device)(struct ibv_context *context);
-	int (*query_device)(struct ibv_context *context,
-			    struct ibv_device_attr *device_attr);
-	int (*query_device_ex)(struct ibv_context *context,
-			       const struct ibv_query_device_ex_input *input,
-			       struct ibv_device_attr_ex *attr);
-	int (*query_rt_values_ex)(struct ibv_context *context,
-			       struct ibv_values_ex *values);
-	int (*query_port)(struct ibv_context *context, uint8_t port_num,
-			  struct ibv_port_attr *port_attr);
-	struct ibv_comp_channel *(*create_comp_channel)
-		(struct ibv_context *context);
-	int (*destroy_comp_channel)(struct ibv_comp_channel *channel);
-	struct ibv_cq *(*create_cq)(struct ibv_context *context, int cqe,
-				    void *cq_context,
-				    struct ibv_comp_channel *channel,
-				    int comp_vector);
-	int (*destroy_cq)(struct ibv_cq *cq);
-	int (*get_cq_event)(struct ibv_comp_channel *channel,
-			    struct ibv_cq **cq, void **cq_context);
-	void (*ack_cq_events)(struct ibv_cq *cq, unsigned int nevents);
-	struct ibv_rwq_ind_table *(*create_rwq_ind_table)
-		(struct ibv_context *context,
-		 struct ibv_rwq_ind_table_init_attr *init_attr);
-	int (*destroy_rwq_ind_table)(struct ibv_rwq_ind_table *rwq_ind_table);
-	struct ibv_wq *(*create_wq)(struct ibv_context *context,
-				    struct ibv_wq_init_attr *wq_init_attr);
-	int (*destroy_wq)(struct ibv_wq *wq);
-	int (*modify_wq)(struct ibv_wq *wq, struct ibv_wq_attr *wq_attr);
-	struct ibv_flow *(*create_flow)(struct ibv_qp *qp,
-					struct ibv_flow_attr *flow);
-	int (*destroy_flow)(struct ibv_flow *flow_id);
-	int (*destroy_flow_action)(void *action);
-	struct ibv_qp *(*create_qp)(struct ibv_pd *pd,
-				    struct ibv_qp_init_attr *qp_init_attr);
-	struct ibv_qp *(*create_qp_ex)
-		(struct ibv_context *context,
-		 struct ibv_qp_init_attr_ex *qp_init_attr_ex);
-	int (*destroy_qp)(struct ibv_qp *qp);
-	int (*modify_qp)(struct ibv_qp *qp, struct ibv_qp_attr *attr,
-			 int attr_mask);
-	struct ibv_mr *(*reg_mr)(struct ibv_pd *pd, void *addr,
-				 size_t length, int access);
-	int (*dereg_mr)(struct ibv_mr *mr);
-	struct ibv_counter_set *(*create_counter_set)
-		(struct ibv_context *context,
-		 struct ibv_counter_set_init_attr *init_attr);
-	int (*destroy_counter_set)(struct ibv_counter_set *cs);
-	int (*describe_counter_set)
-		(struct ibv_context *context,
-		 uint16_t counter_set_id,
-		 struct ibv_counter_set_description *cs_desc);
-	int (*query_counter_set)(struct ibv_query_counter_set_attr *query_attr,
-				 struct ibv_counter_set_data *cs_data);
-	struct ibv_counters *(*create_counters)
-		(struct ibv_context *context,
-		 struct ibv_counters_init_attr *init_attr);
-	int (*destroy_counters)(struct ibv_counters *counters);
-	int (*attach_counters)(struct ibv_counters *counters,
-			       struct ibv_counter_attach_attr *attr,
-			       struct ibv_flow *flow);
-	int (*query_counters)(struct ibv_counters *counters,
-			      uint64_t *counters_value,
-			      uint32_t ncounters,
-			      uint32_t flags);
-	void (*ack_async_event)(struct ibv_async_event *event);
-	int (*get_async_event)(struct ibv_context *context,
-			       struct ibv_async_event *event);
-	const char *(*port_state_str)(enum ibv_port_state port_state);
-	struct ibv_cq *(*cq_ex_to_cq)(struct ibv_cq_ex *cq);
-	void *(*dr_create_flow_action_dest_flow_tbl)(void *tbl);
-	void *(*dr_create_flow_action_dest_port)(void *domain,
-						 uint32_t port);
-	void *(*dr_create_flow_action_drop)();
-	void *(*dr_create_flow_action_push_vlan)
-					(struct mlx5dv_dr_domain *domain,
-					 rte_be32_t vlan_tag);
-	void *(*dr_create_flow_action_pop_vlan)();
-	void *(*dr_create_flow_tbl)(void *domain, uint32_t level);
-	int (*dr_destroy_flow_tbl)(void *tbl);
-	void *(*dr_create_domain)(struct ibv_context *ctx,
-				  enum mlx5dv_dr_domain_type domain);
-	int (*dr_destroy_domain)(void *domain);
-	struct ibv_cq_ex *(*dv_create_cq)
-		(struct ibv_context *context,
-		 struct ibv_cq_init_attr_ex *cq_attr,
-		 struct mlx5dv_cq_init_attr *mlx5_cq_attr);
-	struct ibv_wq *(*dv_create_wq)
-		(struct ibv_context *context,
-		 struct ibv_wq_init_attr *wq_attr,
-		 struct mlx5dv_wq_init_attr *mlx5_wq_attr);
-	int (*dv_query_device)(struct ibv_context *ctx_in,
-			       struct mlx5dv_context *attrs_out);
-	int (*dv_set_context_attr)(struct ibv_context *ibv_ctx,
-				   enum mlx5dv_set_ctx_attr_type type,
-				   void *attr);
-	int (*dv_init_obj)(struct mlx5dv_obj *obj, uint64_t obj_type);
-	struct ibv_qp *(*dv_create_qp)
-		(struct ibv_context *context,
-		 struct ibv_qp_init_attr_ex *qp_init_attr_ex,
-		 struct mlx5dv_qp_init_attr *dv_qp_init_attr);
-	void *(*dv_create_flow_matcher)
-		(struct ibv_context *context,
-		 struct mlx5dv_flow_matcher_attr *matcher_attr,
-		 void *tbl);
-	void *(*dv_create_flow)(void *matcher, void *match_value,
-			  size_t num_actions, void *actions[]);
-	void *(*dv_create_flow_action_counter)(void *obj, uint32_t  offset);
-	void *(*dv_create_flow_action_dest_ibv_qp)(void *qp);
-	void *(*dv_create_flow_action_dest_devx_tir)(void *tir);
-	void *(*dv_create_flow_action_modify_header)
-		(struct ibv_context *ctx, enum mlx5dv_flow_table_type ft_type,
-		 void *domain, uint64_t flags, size_t actions_sz,
-		 uint64_t actions[]);
-	void *(*dv_create_flow_action_packet_reformat)
-		(struct ibv_context *ctx,
-		 enum mlx5dv_flow_action_packet_reformat_type reformat_type,
-		 enum mlx5dv_flow_table_type ft_type,
-		 struct mlx5dv_dr_domain *domain,
-		 uint32_t flags, size_t data_sz, void *data);
-	void *(*dv_create_flow_action_tag)(uint32_t tag);
-	void *(*dv_create_flow_action_meter)
-		(struct mlx5dv_dr_flow_meter_attr *attr);
-	int (*dv_modify_flow_action_meter)(void *action,
-		struct mlx5dv_dr_flow_meter_attr *attr, uint64_t modify_bits);
-	int (*dv_destroy_flow)(void *flow);
-	int (*dv_destroy_flow_matcher)(void *matcher);
-	struct ibv_context *(*dv_open_device)(struct ibv_device *device);
-	struct mlx5dv_devx_obj *(*devx_obj_create)
-					(struct ibv_context *ctx,
-					 const void *in, size_t inlen,
-					 void *out, size_t outlen);
-	int (*devx_obj_destroy)(struct mlx5dv_devx_obj *obj);
-	int (*devx_obj_query)(struct mlx5dv_devx_obj *obj,
-			      const void *in, size_t inlen,
-			      void *out, size_t outlen);
-	int (*devx_obj_modify)(struct mlx5dv_devx_obj *obj,
-			       const void *in, size_t inlen,
-			       void *out, size_t outlen);
-	int (*devx_general_cmd)(struct ibv_context *context,
-				const void *in, size_t inlen,
-				void *out, size_t outlen);
-	struct mlx5dv_devx_cmd_comp *(*devx_create_cmd_comp)
-					(struct ibv_context *context);
-	void (*devx_destroy_cmd_comp)(struct mlx5dv_devx_cmd_comp *cmd_comp);
-	int (*devx_obj_query_async)(struct mlx5dv_devx_obj *obj,
-				    const void *in, size_t inlen,
-				    size_t outlen, uint64_t wr_id,
-				    struct mlx5dv_devx_cmd_comp *cmd_comp);
-	int (*devx_get_async_cmd_comp)(struct mlx5dv_devx_cmd_comp *cmd_comp,
-				       struct mlx5dv_devx_async_cmd_hdr *resp,
-				       size_t cmd_resp_len);
-	struct mlx5dv_devx_umem *(*devx_umem_reg)(struct ibv_context *context,
-						  void *addr, size_t size,
-						  uint32_t access);
-	int (*devx_umem_dereg)(struct mlx5dv_devx_umem *dv_devx_umem);
-	int (*devx_qp_query)(struct ibv_qp *qp,
-			     const void *in, size_t inlen,
-			     void *out, size_t outlen);
-	int (*devx_port_query)(struct ibv_context *ctx,
-			       uint32_t port_num,
-			       struct mlx5dv_devx_port *mlx5_devx_port);
-	int (*dr_dump_domain)(FILE *file, void *domain);
-};
-
-const struct mlx5_glue *mlx5_glue;
-
-#endif /* MLX5_GLUE_H_ */
diff --git a/drivers/net/mlx5/mlx5_mac.c b/drivers/net/mlx5/mlx5_mac.c
index 7bdaa2a..a646b90 100644
--- a/drivers/net/mlx5/mlx5_mac.c
+++ b/drivers/net/mlx5/mlx5_mac.c
@@ -27,10 +27,10 @@
 #include <rte_ethdev_driver.h>
 #include <rte_common.h>
 
+#include "mlx5_defs.h"
 #include "mlx5.h"
 #include "mlx5_utils.h"
 #include "mlx5_rxtx.h"
-#include "mlx5_defs.h"
 
 /**
  * Get MAC address by querying netdevice.
diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index 0d549b6..b1cd9f7 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -17,10 +17,11 @@
 #include <rte_rwlock.h>
 #include <rte_bus_pci.h>
 
+#include <mlx5_glue.h>
+
 #include "mlx5.h"
 #include "mlx5_mr.h"
 #include "mlx5_rxtx.h"
-#include "mlx5_glue.h"
 
 struct mr_find_contig_memsegs_data {
 	uintptr_t addr;
diff --git a/drivers/net/mlx5/mlx5_prm.h b/drivers/net/mlx5/mlx5_prm.h
deleted file mode 100644
index 6ad214b..0000000
--- a/drivers/net/mlx5/mlx5_prm.h
+++ /dev/null
@@ -1,1883 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause
- * Copyright 2016 6WIND S.A.
- * Copyright 2016 Mellanox Technologies, Ltd
- */
-
-#ifndef RTE_PMD_MLX5_PRM_H_
-#define RTE_PMD_MLX5_PRM_H_
-
-#include <assert.h>
-
-/* Verbs header. */
-/* ISO C doesn't support unnamed structs/unions, disabling -pedantic. */
-#ifdef PEDANTIC
-#pragma GCC diagnostic ignored "-Wpedantic"
-#endif
-#include <infiniband/mlx5dv.h>
-#ifdef PEDANTIC
-#pragma GCC diagnostic error "-Wpedantic"
-#endif
-
-#include <rte_vect.h>
-#include "mlx5_autoconf.h"
-
-/* RSS hash key size. */
-#define MLX5_RSS_HASH_KEY_LEN 40
-
-/* Get CQE owner bit. */
-#define MLX5_CQE_OWNER(op_own) ((op_own) & MLX5_CQE_OWNER_MASK)
-
-/* Get CQE format. */
-#define MLX5_CQE_FORMAT(op_own) (((op_own) & MLX5E_CQE_FORMAT_MASK) >> 2)
-
-/* Get CQE opcode. */
-#define MLX5_CQE_OPCODE(op_own) (((op_own) & 0xf0) >> 4)
-
-/* Get CQE solicited event. */
-#define MLX5_CQE_SE(op_own) (((op_own) >> 1) & 1)
-
-/* Invalidate a CQE. */
-#define MLX5_CQE_INVALIDATE (MLX5_CQE_INVALID << 4)
-
-/* WQE Segment sizes in bytes. */
-#define MLX5_WSEG_SIZE 16u
-#define MLX5_WQE_CSEG_SIZE sizeof(struct mlx5_wqe_cseg)
-#define MLX5_WQE_DSEG_SIZE sizeof(struct mlx5_wqe_dseg)
-#define MLX5_WQE_ESEG_SIZE sizeof(struct mlx5_wqe_eseg)
-
-/* WQE/WQEBB size in bytes. */
-#define MLX5_WQE_SIZE sizeof(struct mlx5_wqe)
-
-/*
- * Max size of a WQE session.
- * Absolute maximum size is 63 (MLX5_DSEG_MAX) segments,
- * the WQE size field in Control Segment is 6 bits wide.
- */
-#define MLX5_WQE_SIZE_MAX (60 * MLX5_WSEG_SIZE)
-
-/*
- * Default minimum number of Tx queues for inlining packets.
- * If there are less queues as specified we assume we have
- * no enough CPU resources (cycles) to perform inlining,
- * the PCIe throughput is not supposed as bottleneck and
- * inlining is disabled.
- */
-#define MLX5_INLINE_MAX_TXQS 8u
-#define MLX5_INLINE_MAX_TXQS_BLUEFIELD 16u
-
-/*
- * Default packet length threshold to be inlined with
- * enhanced MPW. If packet length exceeds the threshold
- * the data are not inlined. Should be aligned in WQEBB
- * boundary with accounting the title Control and Ethernet
- * segments.
- */
-#define MLX5_EMPW_DEF_INLINE_LEN (4u * MLX5_WQE_SIZE + \
-				  MLX5_DSEG_MIN_INLINE_SIZE)
-/*
- * Maximal inline data length sent with enhanced MPW.
- * Is based on maximal WQE size.
- */
-#define MLX5_EMPW_MAX_INLINE_LEN (MLX5_WQE_SIZE_MAX - \
-				  MLX5_WQE_CSEG_SIZE - \
-				  MLX5_WQE_ESEG_SIZE - \
-				  MLX5_WQE_DSEG_SIZE + \
-				  MLX5_DSEG_MIN_INLINE_SIZE)
-/*
- * Minimal amount of packets to be sent with EMPW.
- * This limits the minimal required size of sent EMPW.
- * If there are no enough resources to built minimal
- * EMPW the sending loop exits.
- */
-#define MLX5_EMPW_MIN_PACKETS (2u + 3u * 4u)
-/*
- * Maximal amount of packets to be sent with EMPW.
- * This value is not recommended to exceed MLX5_TX_COMP_THRESH,
- * otherwise there might be up to MLX5_EMPW_MAX_PACKETS mbufs
- * without CQE generation request, being multiplied by
- * MLX5_TX_COMP_MAX_CQE it may cause significant latency
- * in tx burst routine at the moment of freeing multiple mbufs.
- */
-#define MLX5_EMPW_MAX_PACKETS MLX5_TX_COMP_THRESH
-#define MLX5_MPW_MAX_PACKETS 6
-#define MLX5_MPW_INLINE_MAX_PACKETS 2
-
-/*
- * Default packet length threshold to be inlined with
- * ordinary SEND. Inlining saves the MR key search
- * and extra PCIe data fetch transaction, but eats the
- * CPU cycles.
- */
-#define MLX5_SEND_DEF_INLINE_LEN (5U * MLX5_WQE_SIZE + \
-				  MLX5_ESEG_MIN_INLINE_SIZE - \
-				  MLX5_WQE_CSEG_SIZE - \
-				  MLX5_WQE_ESEG_SIZE - \
-				  MLX5_WQE_DSEG_SIZE)
-/*
- * Maximal inline data length sent with ordinary SEND.
- * Is based on maximal WQE size.
- */
-#define MLX5_SEND_MAX_INLINE_LEN (MLX5_WQE_SIZE_MAX - \
-				  MLX5_WQE_CSEG_SIZE - \
-				  MLX5_WQE_ESEG_SIZE - \
-				  MLX5_WQE_DSEG_SIZE + \
-				  MLX5_ESEG_MIN_INLINE_SIZE)
-
-/* Missed in mlv5dv.h, should define here. */
-#define MLX5_OPCODE_ENHANCED_MPSW 0x29u
-
-/* CQE value to inform that VLAN is stripped. */
-#define MLX5_CQE_VLAN_STRIPPED (1u << 0)
-
-/* IPv4 options. */
-#define MLX5_CQE_RX_IP_EXT_OPTS_PACKET (1u << 1)
-
-/* IPv6 packet. */
-#define MLX5_CQE_RX_IPV6_PACKET (1u << 2)
-
-/* IPv4 packet. */
-#define MLX5_CQE_RX_IPV4_PACKET (1u << 3)
-
-/* TCP packet. */
-#define MLX5_CQE_RX_TCP_PACKET (1u << 4)
-
-/* UDP packet. */
-#define MLX5_CQE_RX_UDP_PACKET (1u << 5)
-
-/* IP is fragmented. */
-#define MLX5_CQE_RX_IP_FRAG_PACKET (1u << 7)
-
-/* L2 header is valid. */
-#define MLX5_CQE_RX_L2_HDR_VALID (1u << 8)
-
-/* L3 header is valid. */
-#define MLX5_CQE_RX_L3_HDR_VALID (1u << 9)
-
-/* L4 header is valid. */
-#define MLX5_CQE_RX_L4_HDR_VALID (1u << 10)
-
-/* Outer packet, 0 IPv4, 1 IPv6. */
-#define MLX5_CQE_RX_OUTER_PACKET (1u << 1)
-
-/* Tunnel packet bit in the CQE. */
-#define MLX5_CQE_RX_TUNNEL_PACKET (1u << 0)
-
-/* Mask for LRO push flag in the CQE lro_tcppsh_abort_dupack field. */
-#define MLX5_CQE_LRO_PUSH_MASK 0x40
-
-/* Mask for L4 type in the CQE hdr_type_etc field. */
-#define MLX5_CQE_L4_TYPE_MASK 0x70
-
-/* The bit index of L4 type in CQE hdr_type_etc field. */
-#define MLX5_CQE_L4_TYPE_SHIFT 0x4
-
-/* L4 type to indicate TCP packet without acknowledgment. */
-#define MLX5_L4_HDR_TYPE_TCP_EMPTY_ACK 0x3
-
-/* L4 type to indicate TCP packet with acknowledgment. */
-#define MLX5_L4_HDR_TYPE_TCP_WITH_ACL 0x4
-
-/* Inner L3 checksum offload (Tunneled packets only). */
-#define MLX5_ETH_WQE_L3_INNER_CSUM (1u << 4)
-
-/* Inner L4 checksum offload (Tunneled packets only). */
-#define MLX5_ETH_WQE_L4_INNER_CSUM (1u << 5)
-
-/* Outer L4 type is TCP. */
-#define MLX5_ETH_WQE_L4_OUTER_TCP  (0u << 5)
-
-/* Outer L4 type is UDP. */
-#define MLX5_ETH_WQE_L4_OUTER_UDP  (1u << 5)
-
-/* Outer L3 type is IPV4. */
-#define MLX5_ETH_WQE_L3_OUTER_IPV4 (0u << 4)
-
-/* Outer L3 type is IPV6. */
-#define MLX5_ETH_WQE_L3_OUTER_IPV6 (1u << 4)
-
-/* Inner L4 type is TCP. */
-#define MLX5_ETH_WQE_L4_INNER_TCP (0u << 1)
-
-/* Inner L4 type is UDP. */
-#define MLX5_ETH_WQE_L4_INNER_UDP (1u << 1)
-
-/* Inner L3 type is IPV4. */
-#define MLX5_ETH_WQE_L3_INNER_IPV4 (0u << 0)
-
-/* Inner L3 type is IPV6. */
-#define MLX5_ETH_WQE_L3_INNER_IPV6 (1u << 0)
-
-/* VLAN insertion flag. */
-#define MLX5_ETH_WQE_VLAN_INSERT (1u << 31)
-
-/* Data inline segment flag. */
-#define MLX5_ETH_WQE_DATA_INLINE (1u << 31)
-
-/* Is flow mark valid. */
-#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
-#define MLX5_FLOW_MARK_IS_VALID(val) ((val) & 0xffffff00)
-#else
-#define MLX5_FLOW_MARK_IS_VALID(val) ((val) & 0xffffff)
-#endif
-
-/* INVALID is used by packets matching no flow rules. */
-#define MLX5_FLOW_MARK_INVALID 0
-
-/* Maximum allowed value to mark a packet. */
-#define MLX5_FLOW_MARK_MAX 0xfffff0
-
-/* Default mark value used when none is provided. */
-#define MLX5_FLOW_MARK_DEFAULT 0xffffff
-
-/* Default mark mask for metadata legacy mode. */
-#define MLX5_FLOW_MARK_MASK 0xffffff
-
-/* Maximum number of DS in WQE. Limited by 6-bit field. */
-#define MLX5_DSEG_MAX 63
-
-/* The completion mode offset in the WQE control segment line 2. */
-#define MLX5_COMP_MODE_OFFSET 2
-
-/* Amount of data bytes in minimal inline data segment. */
-#define MLX5_DSEG_MIN_INLINE_SIZE 12u
-
-/* Amount of data bytes in minimal inline eth segment. */
-#define MLX5_ESEG_MIN_INLINE_SIZE 18u
-
-/* Amount of data bytes after eth data segment. */
-#define MLX5_ESEG_EXTRA_DATA_SIZE 32u
-
-/* The maximum log value of segments per RQ WQE. */
-#define MLX5_MAX_LOG_RQ_SEGS 5u
-
-/* The alignment needed for WQ buffer. */
-#define MLX5_WQE_BUF_ALIGNMENT 512
-
-/* Completion mode. */
-enum mlx5_completion_mode {
-	MLX5_COMP_ONLY_ERR = 0x0,
-	MLX5_COMP_ONLY_FIRST_ERR = 0x1,
-	MLX5_COMP_ALWAYS = 0x2,
-	MLX5_COMP_CQE_AND_EQE = 0x3,
-};
-
-/* MPW mode. */
-enum mlx5_mpw_mode {
-	MLX5_MPW_DISABLED,
-	MLX5_MPW,
-	MLX5_MPW_ENHANCED, /* Enhanced Multi-Packet Send WQE, a.k.a MPWv2. */
-};
-
-/* WQE Control segment. */
-struct mlx5_wqe_cseg {
-	uint32_t opcode;
-	uint32_t sq_ds;
-	uint32_t flags;
-	uint32_t misc;
-} __rte_packed __rte_aligned(MLX5_WSEG_SIZE);
-
-/* Header of data segment. Minimal size Data Segment. */
-struct mlx5_wqe_dseg {
-	uint32_t bcount;
-	union {
-		uint8_t inline_data[MLX5_DSEG_MIN_INLINE_SIZE];
-		struct {
-			uint32_t lkey;
-			uint64_t pbuf;
-		} __rte_packed;
-	};
-} __rte_packed;
-
-/* Subset of struct WQE Ethernet Segment. */
-struct mlx5_wqe_eseg {
-	union {
-		struct {
-			uint32_t swp_offs;
-			uint8_t	cs_flags;
-			uint8_t	swp_flags;
-			uint16_t mss;
-			uint32_t metadata;
-			uint16_t inline_hdr_sz;
-			union {
-				uint16_t inline_data;
-				uint16_t vlan_tag;
-			};
-		} __rte_packed;
-		struct {
-			uint32_t offsets;
-			uint32_t flags;
-			uint32_t flow_metadata;
-			uint32_t inline_hdr;
-		} __rte_packed;
-	};
-} __rte_packed;
-
-/* The title WQEBB, header of WQE. */
-struct mlx5_wqe {
-	union {
-		struct mlx5_wqe_cseg cseg;
-		uint32_t ctrl[4];
-	};
-	struct mlx5_wqe_eseg eseg;
-	union {
-		struct mlx5_wqe_dseg dseg[2];
-		uint8_t data[MLX5_ESEG_EXTRA_DATA_SIZE];
-	};
-} __rte_packed;
-
-/* WQE for Multi-Packet RQ. */
-struct mlx5_wqe_mprq {
-	struct mlx5_wqe_srq_next_seg next_seg;
-	struct mlx5_wqe_data_seg dseg;
-};
-
-#define MLX5_MPRQ_LEN_MASK 0x000ffff
-#define MLX5_MPRQ_LEN_SHIFT 0
-#define MLX5_MPRQ_STRIDE_NUM_MASK 0x3fff0000
-#define MLX5_MPRQ_STRIDE_NUM_SHIFT 16
-#define MLX5_MPRQ_FILLER_MASK 0x80000000
-#define MLX5_MPRQ_FILLER_SHIFT 31
-
-#define MLX5_MPRQ_STRIDE_SHIFT_BYTE 2
-
-/* CQ element structure - should be equal to the cache line size */
-struct mlx5_cqe {
-#if (RTE_CACHE_LINE_SIZE == 128)
-	uint8_t padding[64];
-#endif
-	uint8_t pkt_info;
-	uint8_t rsvd0;
-	uint16_t wqe_id;
-	uint8_t lro_tcppsh_abort_dupack;
-	uint8_t lro_min_ttl;
-	uint16_t lro_tcp_win;
-	uint32_t lro_ack_seq_num;
-	uint32_t rx_hash_res;
-	uint8_t rx_hash_type;
-	uint8_t rsvd1[3];
-	uint16_t csum;
-	uint8_t rsvd2[6];
-	uint16_t hdr_type_etc;
-	uint16_t vlan_info;
-	uint8_t lro_num_seg;
-	uint8_t rsvd3[3];
-	uint32_t flow_table_metadata;
-	uint8_t rsvd4[4];
-	uint32_t byte_cnt;
-	uint64_t timestamp;
-	uint32_t sop_drop_qpn;
-	uint16_t wqe_counter;
-	uint8_t rsvd5;
-	uint8_t op_own;
-};
-
-/* Adding direct verbs to data-path. */
-
-/* CQ sequence number mask. */
-#define MLX5_CQ_SQN_MASK 0x3
-
-/* CQ sequence number index. */
-#define MLX5_CQ_SQN_OFFSET 28
-
-/* CQ doorbell index mask. */
-#define MLX5_CI_MASK 0xffffff
-
-/* CQ doorbell offset. */
-#define MLX5_CQ_ARM_DB 1
-
-/* CQ doorbell offset. */
-#define MLX5_CQ_DOORBELL 0x20
-
-/* CQE format value. */
-#define MLX5_COMPRESSED 0x3
-
-/* Action type of header modification. */
-enum {
-	MLX5_MODIFICATION_TYPE_SET = 0x1,
-	MLX5_MODIFICATION_TYPE_ADD = 0x2,
-	MLX5_MODIFICATION_TYPE_COPY = 0x3,
-};
-
-/* The field of packet to be modified. */
-enum mlx5_modification_field {
-	MLX5_MODI_OUT_NONE = -1,
-	MLX5_MODI_OUT_SMAC_47_16 = 1,
-	MLX5_MODI_OUT_SMAC_15_0,
-	MLX5_MODI_OUT_ETHERTYPE,
-	MLX5_MODI_OUT_DMAC_47_16,
-	MLX5_MODI_OUT_DMAC_15_0,
-	MLX5_MODI_OUT_IP_DSCP,
-	MLX5_MODI_OUT_TCP_FLAGS,
-	MLX5_MODI_OUT_TCP_SPORT,
-	MLX5_MODI_OUT_TCP_DPORT,
-	MLX5_MODI_OUT_IPV4_TTL,
-	MLX5_MODI_OUT_UDP_SPORT,
-	MLX5_MODI_OUT_UDP_DPORT,
-	MLX5_MODI_OUT_SIPV6_127_96,
-	MLX5_MODI_OUT_SIPV6_95_64,
-	MLX5_MODI_OUT_SIPV6_63_32,
-	MLX5_MODI_OUT_SIPV6_31_0,
-	MLX5_MODI_OUT_DIPV6_127_96,
-	MLX5_MODI_OUT_DIPV6_95_64,
-	MLX5_MODI_OUT_DIPV6_63_32,
-	MLX5_MODI_OUT_DIPV6_31_0,
-	MLX5_MODI_OUT_SIPV4,
-	MLX5_MODI_OUT_DIPV4,
-	MLX5_MODI_OUT_FIRST_VID,
-	MLX5_MODI_IN_SMAC_47_16 = 0x31,
-	MLX5_MODI_IN_SMAC_15_0,
-	MLX5_MODI_IN_ETHERTYPE,
-	MLX5_MODI_IN_DMAC_47_16,
-	MLX5_MODI_IN_DMAC_15_0,
-	MLX5_MODI_IN_IP_DSCP,
-	MLX5_MODI_IN_TCP_FLAGS,
-	MLX5_MODI_IN_TCP_SPORT,
-	MLX5_MODI_IN_TCP_DPORT,
-	MLX5_MODI_IN_IPV4_TTL,
-	MLX5_MODI_IN_UDP_SPORT,
-	MLX5_MODI_IN_UDP_DPORT,
-	MLX5_MODI_IN_SIPV6_127_96,
-	MLX5_MODI_IN_SIPV6_95_64,
-	MLX5_MODI_IN_SIPV6_63_32,
-	MLX5_MODI_IN_SIPV6_31_0,
-	MLX5_MODI_IN_DIPV6_127_96,
-	MLX5_MODI_IN_DIPV6_95_64,
-	MLX5_MODI_IN_DIPV6_63_32,
-	MLX5_MODI_IN_DIPV6_31_0,
-	MLX5_MODI_IN_SIPV4,
-	MLX5_MODI_IN_DIPV4,
-	MLX5_MODI_OUT_IPV6_HOPLIMIT,
-	MLX5_MODI_IN_IPV6_HOPLIMIT,
-	MLX5_MODI_META_DATA_REG_A,
-	MLX5_MODI_META_DATA_REG_B = 0x50,
-	MLX5_MODI_META_REG_C_0,
-	MLX5_MODI_META_REG_C_1,
-	MLX5_MODI_META_REG_C_2,
-	MLX5_MODI_META_REG_C_3,
-	MLX5_MODI_META_REG_C_4,
-	MLX5_MODI_META_REG_C_5,
-	MLX5_MODI_META_REG_C_6,
-	MLX5_MODI_META_REG_C_7,
-	MLX5_MODI_OUT_TCP_SEQ_NUM,
-	MLX5_MODI_IN_TCP_SEQ_NUM,
-	MLX5_MODI_OUT_TCP_ACK_NUM,
-	MLX5_MODI_IN_TCP_ACK_NUM = 0x5C,
-};
-
-/* Total number of metadata reg_c's. */
-#define MLX5_MREG_C_NUM (MLX5_MODI_META_REG_C_7 - MLX5_MODI_META_REG_C_0 + 1)
-
-enum modify_reg {
-	REG_NONE = 0,
-	REG_A,
-	REG_B,
-	REG_C_0,
-	REG_C_1,
-	REG_C_2,
-	REG_C_3,
-	REG_C_4,
-	REG_C_5,
-	REG_C_6,
-	REG_C_7,
-};
-
-/* Modification sub command. */
-struct mlx5_modification_cmd {
-	union {
-		uint32_t data0;
-		struct {
-			unsigned int length:5;
-			unsigned int rsvd0:3;
-			unsigned int offset:5;
-			unsigned int rsvd1:3;
-			unsigned int field:12;
-			unsigned int action_type:4;
-		};
-	};
-	union {
-		uint32_t data1;
-		uint8_t data[4];
-		struct {
-			unsigned int rsvd2:8;
-			unsigned int dst_offset:5;
-			unsigned int rsvd3:3;
-			unsigned int dst_field:12;
-			unsigned int rsvd4:4;
-		};
-	};
-};
-
-typedef uint32_t u32;
-typedef uint16_t u16;
-typedef uint8_t u8;
-
-#define __mlx5_nullp(typ) ((struct mlx5_ifc_##typ##_bits *)0)
-#define __mlx5_bit_sz(typ, fld) sizeof(__mlx5_nullp(typ)->fld)
-#define __mlx5_bit_off(typ, fld) ((unsigned int)(unsigned long) \
-				  (&(__mlx5_nullp(typ)->fld)))
-#define __mlx5_dw_bit_off(typ, fld) (32 - __mlx5_bit_sz(typ, fld) - \
-				    (__mlx5_bit_off(typ, fld) & 0x1f))
-#define __mlx5_dw_off(typ, fld) (__mlx5_bit_off(typ, fld) / 32)
-#define __mlx5_64_off(typ, fld) (__mlx5_bit_off(typ, fld) / 64)
-#define __mlx5_dw_mask(typ, fld) (__mlx5_mask(typ, fld) << \
-				  __mlx5_dw_bit_off(typ, fld))
-#define __mlx5_mask(typ, fld) ((u32)((1ull << __mlx5_bit_sz(typ, fld)) - 1))
-#define __mlx5_16_off(typ, fld) (__mlx5_bit_off(typ, fld) / 16)
-#define __mlx5_16_bit_off(typ, fld) (16 - __mlx5_bit_sz(typ, fld) - \
-				    (__mlx5_bit_off(typ, fld) & 0xf))
-#define __mlx5_mask16(typ, fld) ((u16)((1ull << __mlx5_bit_sz(typ, fld)) - 1))
-#define MLX5_ST_SZ_BYTES(typ) (sizeof(struct mlx5_ifc_##typ##_bits) / 8)
-#define MLX5_ST_SZ_DW(typ) (sizeof(struct mlx5_ifc_##typ##_bits) / 32)
-#define MLX5_BYTE_OFF(typ, fld) (__mlx5_bit_off(typ, fld) / 8)
-#define MLX5_ADDR_OF(typ, p, fld) ((char *)(p) + MLX5_BYTE_OFF(typ, fld))
-
-/* insert a value to a struct */
-#define MLX5_SET(typ, p, fld, v) \
-	do { \
-		u32 _v = v; \
-		*((__be32 *)(p) + __mlx5_dw_off(typ, fld)) = \
-		rte_cpu_to_be_32((rte_be_to_cpu_32(*((u32 *)(p) + \
-				  __mlx5_dw_off(typ, fld))) & \
-				  (~__mlx5_dw_mask(typ, fld))) | \
-				 (((_v) & __mlx5_mask(typ, fld)) << \
-				   __mlx5_dw_bit_off(typ, fld))); \
-	} while (0)
-
-#define MLX5_SET64(typ, p, fld, v) \
-	do { \
-		assert(__mlx5_bit_sz(typ, fld) == 64); \
-		*((__be64 *)(p) + __mlx5_64_off(typ, fld)) = \
-			rte_cpu_to_be_64(v); \
-	} while (0)
-
-#define MLX5_GET(typ, p, fld) \
-	((rte_be_to_cpu_32(*((__be32 *)(p) +\
-	__mlx5_dw_off(typ, fld))) >> __mlx5_dw_bit_off(typ, fld)) & \
-	__mlx5_mask(typ, fld))
-#define MLX5_GET16(typ, p, fld) \
-	((rte_be_to_cpu_16(*((__be16 *)(p) + \
-	  __mlx5_16_off(typ, fld))) >> __mlx5_16_bit_off(typ, fld)) & \
-	 __mlx5_mask16(typ, fld))
-#define MLX5_GET64(typ, p, fld) rte_be_to_cpu_64(*((__be64 *)(p) + \
-						   __mlx5_64_off(typ, fld)))
-#define MLX5_FLD_SZ_BYTES(typ, fld) (__mlx5_bit_sz(typ, fld) / 8)
-
-struct mlx5_ifc_fte_match_set_misc_bits {
-	u8 gre_c_present[0x1];
-	u8 reserved_at_1[0x1];
-	u8 gre_k_present[0x1];
-	u8 gre_s_present[0x1];
-	u8 source_vhci_port[0x4];
-	u8 source_sqn[0x18];
-	u8 reserved_at_20[0x10];
-	u8 source_port[0x10];
-	u8 outer_second_prio[0x3];
-	u8 outer_second_cfi[0x1];
-	u8 outer_second_vid[0xc];
-	u8 inner_second_prio[0x3];
-	u8 inner_second_cfi[0x1];
-	u8 inner_second_vid[0xc];
-	u8 outer_second_cvlan_tag[0x1];
-	u8 inner_second_cvlan_tag[0x1];
-	u8 outer_second_svlan_tag[0x1];
-	u8 inner_second_svlan_tag[0x1];
-	u8 reserved_at_64[0xc];
-	u8 gre_protocol[0x10];
-	u8 gre_key_h[0x18];
-	u8 gre_key_l[0x8];
-	u8 vxlan_vni[0x18];
-	u8 reserved_at_b8[0x8];
-	u8 geneve_vni[0x18];
-	u8 reserved_at_e4[0x7];
-	u8 geneve_oam[0x1];
-	u8 reserved_at_e0[0xc];
-	u8 outer_ipv6_flow_label[0x14];
-	u8 reserved_at_100[0xc];
-	u8 inner_ipv6_flow_label[0x14];
-	u8 reserved_at_120[0xa];
-	u8 geneve_opt_len[0x6];
-	u8 geneve_protocol_type[0x10];
-	u8 reserved_at_140[0xc0];
-};
-
-struct mlx5_ifc_ipv4_layout_bits {
-	u8 reserved_at_0[0x60];
-	u8 ipv4[0x20];
-};
-
-struct mlx5_ifc_ipv6_layout_bits {
-	u8 ipv6[16][0x8];
-};
-
-union mlx5_ifc_ipv6_layout_ipv4_layout_auto_bits {
-	struct mlx5_ifc_ipv6_layout_bits ipv6_layout;
-	struct mlx5_ifc_ipv4_layout_bits ipv4_layout;
-	u8 reserved_at_0[0x80];
-};
-
-struct mlx5_ifc_fte_match_set_lyr_2_4_bits {
-	u8 smac_47_16[0x20];
-	u8 smac_15_0[0x10];
-	u8 ethertype[0x10];
-	u8 dmac_47_16[0x20];
-	u8 dmac_15_0[0x10];
-	u8 first_prio[0x3];
-	u8 first_cfi[0x1];
-	u8 first_vid[0xc];
-	u8 ip_protocol[0x8];
-	u8 ip_dscp[0x6];
-	u8 ip_ecn[0x2];
-	u8 cvlan_tag[0x1];
-	u8 svlan_tag[0x1];
-	u8 frag[0x1];
-	u8 ip_version[0x4];
-	u8 tcp_flags[0x9];
-	u8 tcp_sport[0x10];
-	u8 tcp_dport[0x10];
-	u8 reserved_at_c0[0x20];
-	u8 udp_sport[0x10];
-	u8 udp_dport[0x10];
-	union mlx5_ifc_ipv6_layout_ipv4_layout_auto_bits src_ipv4_src_ipv6;
-	union mlx5_ifc_ipv6_layout_ipv4_layout_auto_bits dst_ipv4_dst_ipv6;
-};
-
-struct mlx5_ifc_fte_match_mpls_bits {
-	u8 mpls_label[0x14];
-	u8 mpls_exp[0x3];
-	u8 mpls_s_bos[0x1];
-	u8 mpls_ttl[0x8];
-};
-
-struct mlx5_ifc_fte_match_set_misc2_bits {
-	struct mlx5_ifc_fte_match_mpls_bits outer_first_mpls;
-	struct mlx5_ifc_fte_match_mpls_bits inner_first_mpls;
-	struct mlx5_ifc_fte_match_mpls_bits outer_first_mpls_over_gre;
-	struct mlx5_ifc_fte_match_mpls_bits outer_first_mpls_over_udp;
-	u8 metadata_reg_c_7[0x20];
-	u8 metadata_reg_c_6[0x20];
-	u8 metadata_reg_c_5[0x20];
-	u8 metadata_reg_c_4[0x20];
-	u8 metadata_reg_c_3[0x20];
-	u8 metadata_reg_c_2[0x20];
-	u8 metadata_reg_c_1[0x20];
-	u8 metadata_reg_c_0[0x20];
-	u8 metadata_reg_a[0x20];
-	u8 metadata_reg_b[0x20];
-	u8 reserved_at_1c0[0x40];
-};
-
-struct mlx5_ifc_fte_match_set_misc3_bits {
-	u8 inner_tcp_seq_num[0x20];
-	u8 outer_tcp_seq_num[0x20];
-	u8 inner_tcp_ack_num[0x20];
-	u8 outer_tcp_ack_num[0x20];
-	u8 reserved_at_auto1[0x8];
-	u8 outer_vxlan_gpe_vni[0x18];
-	u8 outer_vxlan_gpe_next_protocol[0x8];
-	u8 outer_vxlan_gpe_flags[0x8];
-	u8 reserved_at_a8[0x10];
-	u8 icmp_header_data[0x20];
-	u8 icmpv6_header_data[0x20];
-	u8 icmp_type[0x8];
-	u8 icmp_code[0x8];
-	u8 icmpv6_type[0x8];
-	u8 icmpv6_code[0x8];
-	u8 reserved_at_120[0x20];
-	u8 gtpu_teid[0x20];
-	u8 gtpu_msg_type[0x08];
-	u8 gtpu_msg_flags[0x08];
-	u8 reserved_at_170[0x90];
-};
-
-/* Flow matcher. */
-struct mlx5_ifc_fte_match_param_bits {
-	struct mlx5_ifc_fte_match_set_lyr_2_4_bits outer_headers;
-	struct mlx5_ifc_fte_match_set_misc_bits misc_parameters;
-	struct mlx5_ifc_fte_match_set_lyr_2_4_bits inner_headers;
-	struct mlx5_ifc_fte_match_set_misc2_bits misc_parameters_2;
-	struct mlx5_ifc_fte_match_set_misc3_bits misc_parameters_3;
-};
-
-enum {
-	MLX5_MATCH_CRITERIA_ENABLE_OUTER_BIT,
-	MLX5_MATCH_CRITERIA_ENABLE_MISC_BIT,
-	MLX5_MATCH_CRITERIA_ENABLE_INNER_BIT,
-	MLX5_MATCH_CRITERIA_ENABLE_MISC2_BIT,
-	MLX5_MATCH_CRITERIA_ENABLE_MISC3_BIT
-};
-
-enum {
-	MLX5_CMD_OP_QUERY_HCA_CAP = 0x100,
-	MLX5_CMD_OP_CREATE_MKEY = 0x200,
-	MLX5_CMD_OP_QUERY_NIC_VPORT_CONTEXT = 0x754,
-	MLX5_CMD_OP_ALLOC_TRANSPORT_DOMAIN = 0x816,
-	MLX5_CMD_OP_CREATE_TIR = 0x900,
-	MLX5_CMD_OP_CREATE_SQ = 0X904,
-	MLX5_CMD_OP_MODIFY_SQ = 0X905,
-	MLX5_CMD_OP_CREATE_RQ = 0x908,
-	MLX5_CMD_OP_MODIFY_RQ = 0x909,
-	MLX5_CMD_OP_CREATE_TIS = 0x912,
-	MLX5_CMD_OP_QUERY_TIS = 0x915,
-	MLX5_CMD_OP_CREATE_RQT = 0x916,
-	MLX5_CMD_OP_ALLOC_FLOW_COUNTER = 0x939,
-	MLX5_CMD_OP_QUERY_FLOW_COUNTER = 0x93b,
-};
-
-enum {
-	MLX5_MKC_ACCESS_MODE_MTT   = 0x1,
-};
-
-/* Flow counters. */
-struct mlx5_ifc_alloc_flow_counter_out_bits {
-	u8         status[0x8];
-	u8         reserved_at_8[0x18];
-	u8         syndrome[0x20];
-	u8         flow_counter_id[0x20];
-	u8         reserved_at_60[0x20];
-};
-
-struct mlx5_ifc_alloc_flow_counter_in_bits {
-	u8         opcode[0x10];
-	u8         reserved_at_10[0x10];
-	u8         reserved_at_20[0x10];
-	u8         op_mod[0x10];
-	u8         flow_counter_id[0x20];
-	u8         reserved_at_40[0x18];
-	u8         flow_counter_bulk[0x8];
-};
-
-struct mlx5_ifc_dealloc_flow_counter_out_bits {
-	u8         status[0x8];
-	u8         reserved_at_8[0x18];
-	u8         syndrome[0x20];
-	u8         reserved_at_40[0x40];
-};
-
-struct mlx5_ifc_dealloc_flow_counter_in_bits {
-	u8         opcode[0x10];
-	u8         reserved_at_10[0x10];
-	u8         reserved_at_20[0x10];
-	u8         op_mod[0x10];
-	u8         flow_counter_id[0x20];
-	u8         reserved_at_60[0x20];
-};
-
-struct mlx5_ifc_traffic_counter_bits {
-	u8         packets[0x40];
-	u8         octets[0x40];
-};
-
-struct mlx5_ifc_query_flow_counter_out_bits {
-	u8         status[0x8];
-	u8         reserved_at_8[0x18];
-	u8         syndrome[0x20];
-	u8         reserved_at_40[0x40];
-	struct mlx5_ifc_traffic_counter_bits flow_statistics[];
-};
-
-struct mlx5_ifc_query_flow_counter_in_bits {
-	u8         opcode[0x10];
-	u8         reserved_at_10[0x10];
-	u8         reserved_at_20[0x10];
-	u8         op_mod[0x10];
-	u8         reserved_at_40[0x20];
-	u8         mkey[0x20];
-	u8         address[0x40];
-	u8         clear[0x1];
-	u8         dump_to_memory[0x1];
-	u8         num_of_counters[0x1e];
-	u8         flow_counter_id[0x20];
-};
-
-struct mlx5_ifc_mkc_bits {
-	u8         reserved_at_0[0x1];
-	u8         free[0x1];
-	u8         reserved_at_2[0x1];
-	u8         access_mode_4_2[0x3];
-	u8         reserved_at_6[0x7];
-	u8         relaxed_ordering_write[0x1];
-	u8         reserved_at_e[0x1];
-	u8         small_fence_on_rdma_read_response[0x1];
-	u8         umr_en[0x1];
-	u8         a[0x1];
-	u8         rw[0x1];
-	u8         rr[0x1];
-	u8         lw[0x1];
-	u8         lr[0x1];
-	u8         access_mode_1_0[0x2];
-	u8         reserved_at_18[0x8];
-
-	u8         qpn[0x18];
-	u8         mkey_7_0[0x8];
-
-	u8         reserved_at_40[0x20];
-
-	u8         length64[0x1];
-	u8         bsf_en[0x1];
-	u8         sync_umr[0x1];
-	u8         reserved_at_63[0x2];
-	u8         expected_sigerr_count[0x1];
-	u8         reserved_at_66[0x1];
-	u8         en_rinval[0x1];
-	u8         pd[0x18];
-
-	u8         start_addr[0x40];
-
-	u8         len[0x40];
-
-	u8         bsf_octword_size[0x20];
-
-	u8         reserved_at_120[0x80];
-
-	u8         translations_octword_size[0x20];
-
-	u8         reserved_at_1c0[0x1b];
-	u8         log_page_size[0x5];
-
-	u8         reserved_at_1e0[0x20];
-};
-
-struct mlx5_ifc_create_mkey_out_bits {
-	u8         status[0x8];
-	u8         reserved_at_8[0x18];
-
-	u8         syndrome[0x20];
-
-	u8         reserved_at_40[0x8];
-	u8         mkey_index[0x18];
-
-	u8         reserved_at_60[0x20];
-};
-
-struct mlx5_ifc_create_mkey_in_bits {
-	u8         opcode[0x10];
-	u8         reserved_at_10[0x10];
-
-	u8         reserved_at_20[0x10];
-	u8         op_mod[0x10];
-
-	u8         reserved_at_40[0x20];
-
-	u8         pg_access[0x1];
-	u8         reserved_at_61[0x1f];
-
-	struct mlx5_ifc_mkc_bits memory_key_mkey_entry;
-
-	u8         reserved_at_280[0x80];
-
-	u8         translations_octword_actual_size[0x20];
-
-	u8         mkey_umem_id[0x20];
-
-	u8         mkey_umem_offset[0x40];
-
-	u8         reserved_at_380[0x500];
-
-	u8         klm_pas_mtt[][0x20];
-};
-
-enum {
-	MLX5_GET_HCA_CAP_OP_MOD_GENERAL_DEVICE = 0x0 << 1,
-	MLX5_GET_HCA_CAP_OP_MOD_ETHERNET_OFFLOAD_CAPS = 0x1 << 1,
-	MLX5_GET_HCA_CAP_OP_MOD_QOS_CAP = 0xc << 1,
-};
-
-enum {
-	MLX5_HCA_CAP_OPMOD_GET_MAX   = 0,
-	MLX5_HCA_CAP_OPMOD_GET_CUR   = 1,
-};
-
-enum {
-	MLX5_CAP_INLINE_MODE_L2,
-	MLX5_CAP_INLINE_MODE_VPORT_CONTEXT,
-	MLX5_CAP_INLINE_MODE_NOT_REQUIRED,
-};
-
-enum {
-	MLX5_INLINE_MODE_NONE,
-	MLX5_INLINE_MODE_L2,
-	MLX5_INLINE_MODE_IP,
-	MLX5_INLINE_MODE_TCP_UDP,
-	MLX5_INLINE_MODE_RESERVED4,
-	MLX5_INLINE_MODE_INNER_L2,
-	MLX5_INLINE_MODE_INNER_IP,
-	MLX5_INLINE_MODE_INNER_TCP_UDP,
-};
-
-/* HCA bit masks indicating which Flex parser protocols are already enabled. */
-#define MLX5_HCA_FLEX_IPV4_OVER_VXLAN_ENABLED (1UL << 0)
-#define MLX5_HCA_FLEX_IPV6_OVER_VXLAN_ENABLED (1UL << 1)
-#define MLX5_HCA_FLEX_IPV6_OVER_IP_ENABLED (1UL << 2)
-#define MLX5_HCA_FLEX_GENEVE_ENABLED (1UL << 3)
-#define MLX5_HCA_FLEX_CW_MPLS_OVER_GRE_ENABLED (1UL << 4)
-#define MLX5_HCA_FLEX_CW_MPLS_OVER_UDP_ENABLED (1UL << 5)
-#define MLX5_HCA_FLEX_P_BIT_VXLAN_GPE_ENABLED (1UL << 6)
-#define MLX5_HCA_FLEX_VXLAN_GPE_ENABLED (1UL << 7)
-#define MLX5_HCA_FLEX_ICMP_ENABLED (1UL << 8)
-#define MLX5_HCA_FLEX_ICMPV6_ENABLED (1UL << 9)
-
-struct mlx5_ifc_cmd_hca_cap_bits {
-	u8 reserved_at_0[0x30];
-	u8 vhca_id[0x10];
-	u8 reserved_at_40[0x40];
-	u8 log_max_srq_sz[0x8];
-	u8 log_max_qp_sz[0x8];
-	u8 reserved_at_90[0xb];
-	u8 log_max_qp[0x5];
-	u8 reserved_at_a0[0xb];
-	u8 log_max_srq[0x5];
-	u8 reserved_at_b0[0x10];
-	u8 reserved_at_c0[0x8];
-	u8 log_max_cq_sz[0x8];
-	u8 reserved_at_d0[0xb];
-	u8 log_max_cq[0x5];
-	u8 log_max_eq_sz[0x8];
-	u8 reserved_at_e8[0x2];
-	u8 log_max_mkey[0x6];
-	u8 reserved_at_f0[0x8];
-	u8 dump_fill_mkey[0x1];
-	u8 reserved_at_f9[0x3];
-	u8 log_max_eq[0x4];
-	u8 max_indirection[0x8];
-	u8 fixed_buffer_size[0x1];
-	u8 log_max_mrw_sz[0x7];
-	u8 force_teardown[0x1];
-	u8 reserved_at_111[0x1];
-	u8 log_max_bsf_list_size[0x6];
-	u8 umr_extended_translation_offset[0x1];
-	u8 null_mkey[0x1];
-	u8 log_max_klm_list_size[0x6];
-	u8 reserved_at_120[0xa];
-	u8 log_max_ra_req_dc[0x6];
-	u8 reserved_at_130[0xa];
-	u8 log_max_ra_res_dc[0x6];
-	u8 reserved_at_140[0xa];
-	u8 log_max_ra_req_qp[0x6];
-	u8 reserved_at_150[0xa];
-	u8 log_max_ra_res_qp[0x6];
-	u8 end_pad[0x1];
-	u8 cc_query_allowed[0x1];
-	u8 cc_modify_allowed[0x1];
-	u8 start_pad[0x1];
-	u8 cache_line_128byte[0x1];
-	u8 reserved_at_165[0xa];
-	u8 qcam_reg[0x1];
-	u8 gid_table_size[0x10];
-	u8 out_of_seq_cnt[0x1];
-	u8 vport_counters[0x1];
-	u8 retransmission_q_counters[0x1];
-	u8 debug[0x1];
-	u8 modify_rq_counter_set_id[0x1];
-	u8 rq_delay_drop[0x1];
-	u8 max_qp_cnt[0xa];
-	u8 pkey_table_size[0x10];
-	u8 vport_group_manager[0x1];
-	u8 vhca_group_manager[0x1];
-	u8 ib_virt[0x1];
-	u8 eth_virt[0x1];
-	u8 vnic_env_queue_counters[0x1];
-	u8 ets[0x1];
-	u8 nic_flow_table[0x1];
-	u8 eswitch_manager[0x1];
-	u8 device_memory[0x1];
-	u8 mcam_reg[0x1];
-	u8 pcam_reg[0x1];
-	u8 local_ca_ack_delay[0x5];
-	u8 port_module_event[0x1];
-	u8 enhanced_error_q_counters[0x1];
-	u8 ports_check[0x1];
-	u8 reserved_at_1b3[0x1];
-	u8 disable_link_up[0x1];
-	u8 beacon_led[0x1];
-	u8 port_type[0x2];
-	u8 num_ports[0x8];
-	u8 reserved_at_1c0[0x1];
-	u8 pps[0x1];
-	u8 pps_modify[0x1];
-	u8 log_max_msg[0x5];
-	u8 reserved_at_1c8[0x4];
-	u8 max_tc[0x4];
-	u8 temp_warn_event[0x1];
-	u8 dcbx[0x1];
-	u8 general_notification_event[0x1];
-	u8 reserved_at_1d3[0x2];
-	u8 fpga[0x1];
-	u8 rol_s[0x1];
-	u8 rol_g[0x1];
-	u8 reserved_at_1d8[0x1];
-	u8 wol_s[0x1];
-	u8 wol_g[0x1];
-	u8 wol_a[0x1];
-	u8 wol_b[0x1];
-	u8 wol_m[0x1];
-	u8 wol_u[0x1];
-	u8 wol_p[0x1];
-	u8 stat_rate_support[0x10];
-	u8 reserved_at_1f0[0xc];
-	u8 cqe_version[0x4];
-	u8 compact_address_vector[0x1];
-	u8 striding_rq[0x1];
-	u8 reserved_at_202[0x1];
-	u8 ipoib_enhanced_offloads[0x1];
-	u8 ipoib_basic_offloads[0x1];
-	u8 reserved_at_205[0x1];
-	u8 repeated_block_disabled[0x1];
-	u8 umr_modify_entity_size_disabled[0x1];
-	u8 umr_modify_atomic_disabled[0x1];
-	u8 umr_indirect_mkey_disabled[0x1];
-	u8 umr_fence[0x2];
-	u8 reserved_at_20c[0x3];
-	u8 drain_sigerr[0x1];
-	u8 cmdif_checksum[0x2];
-	u8 sigerr_cqe[0x1];
-	u8 reserved_at_213[0x1];
-	u8 wq_signature[0x1];
-	u8 sctr_data_cqe[0x1];
-	u8 reserved_at_216[0x1];
-	u8 sho[0x1];
-	u8 tph[0x1];
-	u8 rf[0x1];
-	u8 dct[0x1];
-	u8 qos[0x1];
-	u8 eth_net_offloads[0x1];
-	u8 roce[0x1];
-	u8 atomic[0x1];
-	u8 reserved_at_21f[0x1];
-	u8 cq_oi[0x1];
-	u8 cq_resize[0x1];
-	u8 cq_moderation[0x1];
-	u8 reserved_at_223[0x3];
-	u8 cq_eq_remap[0x1];
-	u8 pg[0x1];
-	u8 block_lb_mc[0x1];
-	u8 reserved_at_229[0x1];
-	u8 scqe_break_moderation[0x1];
-	u8 cq_period_start_from_cqe[0x1];
-	u8 cd[0x1];
-	u8 reserved_at_22d[0x1];
-	u8 apm[0x1];
-	u8 vector_calc[0x1];
-	u8 umr_ptr_rlky[0x1];
-	u8 imaicl[0x1];
-	u8 reserved_at_232[0x4];
-	u8 qkv[0x1];
-	u8 pkv[0x1];
-	u8 set_deth_sqpn[0x1];
-	u8 reserved_at_239[0x3];
-	u8 xrc[0x1];
-	u8 ud[0x1];
-	u8 uc[0x1];
-	u8 rc[0x1];
-	u8 uar_4k[0x1];
-	u8 reserved_at_241[0x9];
-	u8 uar_sz[0x6];
-	u8 reserved_at_250[0x8];
-	u8 log_pg_sz[0x8];
-	u8 bf[0x1];
-	u8 driver_version[0x1];
-	u8 pad_tx_eth_packet[0x1];
-	u8 reserved_at_263[0x8];
-	u8 log_bf_reg_size[0x5];
-	u8 reserved_at_270[0xb];
-	u8 lag_master[0x1];
-	u8 num_lag_ports[0x4];
-	u8 reserved_at_280[0x10];
-	u8 max_wqe_sz_sq[0x10];
-	u8 reserved_at_2a0[0x10];
-	u8 max_wqe_sz_rq[0x10];
-	u8 max_flow_counter_31_16[0x10];
-	u8 max_wqe_sz_sq_dc[0x10];
-	u8 reserved_at_2e0[0x7];
-	u8 max_qp_mcg[0x19];
-	u8 reserved_at_300[0x10];
-	u8 flow_counter_bulk_alloc[0x08];
-	u8 log_max_mcg[0x8];
-	u8 reserved_at_320[0x3];
-	u8 log_max_transport_domain[0x5];
-	u8 reserved_at_328[0x3];
-	u8 log_max_pd[0x5];
-	u8 reserved_at_330[0xb];
-	u8 log_max_xrcd[0x5];
-	u8 nic_receive_steering_discard[0x1];
-	u8 receive_discard_vport_down[0x1];
-	u8 transmit_discard_vport_down[0x1];
-	u8 reserved_at_343[0x5];
-	u8 log_max_flow_counter_bulk[0x8];
-	u8 max_flow_counter_15_0[0x10];
-	u8 modify_tis[0x1];
-	u8 flow_counters_dump[0x1];
-	u8 reserved_at_360[0x1];
-	u8 log_max_rq[0x5];
-	u8 reserved_at_368[0x3];
-	u8 log_max_sq[0x5];
-	u8 reserved_at_370[0x3];
-	u8 log_max_tir[0x5];
-	u8 reserved_at_378[0x3];
-	u8 log_max_tis[0x5];
-	u8 basic_cyclic_rcv_wqe[0x1];
-	u8 reserved_at_381[0x2];
-	u8 log_max_rmp[0x5];
-	u8 reserved_at_388[0x3];
-	u8 log_max_rqt[0x5];
-	u8 reserved_at_390[0x3];
-	u8 log_max_rqt_size[0x5];
-	u8 reserved_at_398[0x3];
-	u8 log_max_tis_per_sq[0x5];
-	u8 ext_stride_num_range[0x1];
-	u8 reserved_at_3a1[0x2];
-	u8 log_max_stride_sz_rq[0x5];
-	u8 reserved_at_3a8[0x3];
-	u8 log_min_stride_sz_rq[0x5];
-	u8 reserved_at_3b0[0x3];
-	u8 log_max_stride_sz_sq[0x5];
-	u8 reserved_at_3b8[0x3];
-	u8 log_min_stride_sz_sq[0x5];
-	u8 hairpin[0x1];
-	u8 reserved_at_3c1[0x2];
-	u8 log_max_hairpin_queues[0x5];
-	u8 reserved_at_3c8[0x3];
-	u8 log_max_hairpin_wq_data_sz[0x5];
-	u8 reserved_at_3d0[0x3];
-	u8 log_max_hairpin_num_packets[0x5];
-	u8 reserved_at_3d8[0x3];
-	u8 log_max_wq_sz[0x5];
-	u8 nic_vport_change_event[0x1];
-	u8 disable_local_lb_uc[0x1];
-	u8 disable_local_lb_mc[0x1];
-	u8 log_min_hairpin_wq_data_sz[0x5];
-	u8 reserved_at_3e8[0x3];
-	u8 log_max_vlan_list[0x5];
-	u8 reserved_at_3f0[0x3];
-	u8 log_max_current_mc_list[0x5];
-	u8 reserved_at_3f8[0x3];
-	u8 log_max_current_uc_list[0x5];
-	u8 general_obj_types[0x40];
-	u8 reserved_at_440[0x20];
-	u8 reserved_at_460[0x10];
-	u8 max_num_eqs[0x10];
-	u8 reserved_at_480[0x3];
-	u8 log_max_l2_table[0x5];
-	u8 reserved_at_488[0x8];
-	u8 log_uar_page_sz[0x10];
-	u8 reserved_at_4a0[0x20];
-	u8 device_frequency_mhz[0x20];
-	u8 device_frequency_khz[0x20];
-	u8 reserved_at_500[0x20];
-	u8 num_of_uars_per_page[0x20];
-	u8 flex_parser_protocols[0x20];
-	u8 reserved_at_560[0x20];
-	u8 reserved_at_580[0x3c];
-	u8 mini_cqe_resp_stride_index[0x1];
-	u8 cqe_128_always[0x1];
-	u8 cqe_compression_128[0x1];
-	u8 cqe_compression[0x1];
-	u8 cqe_compression_timeout[0x10];
-	u8 cqe_compression_max_num[0x10];
-	u8 reserved_at_5e0[0x10];
-	u8 tag_matching[0x1];
-	u8 rndv_offload_rc[0x1];
-	u8 rndv_offload_dc[0x1];
-	u8 log_tag_matching_list_sz[0x5];
-	u8 reserved_at_5f8[0x3];
-	u8 log_max_xrq[0x5];
-	u8 affiliate_nic_vport_criteria[0x8];
-	u8 native_port_num[0x8];
-	u8 num_vhca_ports[0x8];
-	u8 reserved_at_618[0x6];
-	u8 sw_owner_id[0x1];
-	u8 reserved_at_61f[0x1e1];
-};
-
-struct mlx5_ifc_qos_cap_bits {
-	u8 packet_pacing[0x1];
-	u8 esw_scheduling[0x1];
-	u8 esw_bw_share[0x1];
-	u8 esw_rate_limit[0x1];
-	u8 reserved_at_4[0x1];
-	u8 packet_pacing_burst_bound[0x1];
-	u8 packet_pacing_typical_size[0x1];
-	u8 flow_meter_srtcm[0x1];
-	u8 reserved_at_8[0x8];
-	u8 log_max_flow_meter[0x8];
-	u8 flow_meter_reg_id[0x8];
-	u8 reserved_at_25[0x20];
-	u8 packet_pacing_max_rate[0x20];
-	u8 packet_pacing_min_rate[0x20];
-	u8 reserved_at_80[0x10];
-	u8 packet_pacing_rate_table_size[0x10];
-	u8 esw_element_type[0x10];
-	u8 esw_tsar_type[0x10];
-	u8 reserved_at_c0[0x10];
-	u8 max_qos_para_vport[0x10];
-	u8 max_tsar_bw_share[0x20];
-	u8 reserved_at_100[0x6e8];
-};
-
-struct mlx5_ifc_per_protocol_networking_offload_caps_bits {
-	u8 csum_cap[0x1];
-	u8 vlan_cap[0x1];
-	u8 lro_cap[0x1];
-	u8 lro_psh_flag[0x1];
-	u8 lro_time_stamp[0x1];
-	u8 lro_max_msg_sz_mode[0x2];
-	u8 wqe_vlan_insert[0x1];
-	u8 self_lb_en_modifiable[0x1];
-	u8 self_lb_mc[0x1];
-	u8 self_lb_uc[0x1];
-	u8 max_lso_cap[0x5];
-	u8 multi_pkt_send_wqe[0x2];
-	u8 wqe_inline_mode[0x2];
-	u8 rss_ind_tbl_cap[0x4];
-	u8 reg_umr_sq[0x1];
-	u8 scatter_fcs[0x1];
-	u8 enhanced_multi_pkt_send_wqe[0x1];
-	u8 tunnel_lso_const_out_ip_id[0x1];
-	u8 tunnel_lro_gre[0x1];
-	u8 tunnel_lro_vxlan[0x1];
-	u8 tunnel_stateless_gre[0x1];
-	u8 tunnel_stateless_vxlan[0x1];
-	u8 swp[0x1];
-	u8 swp_csum[0x1];
-	u8 swp_lso[0x1];
-	u8 reserved_at_23[0x8];
-	u8 tunnel_stateless_gtp[0x1];
-	u8 reserved_at_25[0x4];
-	u8 max_vxlan_udp_ports[0x8];
-	u8 reserved_at_38[0x6];
-	u8 max_geneve_opt_len[0x1];
-	u8 tunnel_stateless_geneve_rx[0x1];
-	u8 reserved_at_40[0x10];
-	u8 lro_min_mss_size[0x10];
-	u8 reserved_at_60[0x120];
-	u8 lro_timer_supported_periods[4][0x20];
-	u8 reserved_at_200[0x600];
-};
-
-union mlx5_ifc_hca_cap_union_bits {
-	struct mlx5_ifc_cmd_hca_cap_bits cmd_hca_cap;
-	struct mlx5_ifc_per_protocol_networking_offload_caps_bits
-	       per_protocol_networking_offload_caps;
-	struct mlx5_ifc_qos_cap_bits qos_cap;
-	u8 reserved_at_0[0x8000];
-};
-
-struct mlx5_ifc_query_hca_cap_out_bits {
-	u8 status[0x8];
-	u8 reserved_at_8[0x18];
-	u8 syndrome[0x20];
-	u8 reserved_at_40[0x40];
-	union mlx5_ifc_hca_cap_union_bits capability;
-};
-
-struct mlx5_ifc_query_hca_cap_in_bits {
-	u8 opcode[0x10];
-	u8 reserved_at_10[0x10];
-	u8 reserved_at_20[0x10];
-	u8 op_mod[0x10];
-	u8 reserved_at_40[0x40];
-};
-
-struct mlx5_ifc_mac_address_layout_bits {
-	u8 reserved_at_0[0x10];
-	u8 mac_addr_47_32[0x10];
-	u8 mac_addr_31_0[0x20];
-};
-
-struct mlx5_ifc_nic_vport_context_bits {
-	u8 reserved_at_0[0x5];
-	u8 min_wqe_inline_mode[0x3];
-	u8 reserved_at_8[0x15];
-	u8 disable_mc_local_lb[0x1];
-	u8 disable_uc_local_lb[0x1];
-	u8 roce_en[0x1];
-	u8 arm_change_event[0x1];
-	u8 reserved_at_21[0x1a];
-	u8 event_on_mtu[0x1];
-	u8 event_on_promisc_change[0x1];
-	u8 event_on_vlan_change[0x1];
-	u8 event_on_mc_address_change[0x1];
-	u8 event_on_uc_address_change[0x1];
-	u8 reserved_at_40[0xc];
-	u8 affiliation_criteria[0x4];
-	u8 affiliated_vhca_id[0x10];
-	u8 reserved_at_60[0xd0];
-	u8 mtu[0x10];
-	u8 system_image_guid[0x40];
-	u8 port_guid[0x40];
-	u8 node_guid[0x40];
-	u8 reserved_at_200[0x140];
-	u8 qkey_violation_counter[0x10];
-	u8 reserved_at_350[0x430];
-	u8 promisc_uc[0x1];
-	u8 promisc_mc[0x1];
-	u8 promisc_all[0x1];
-	u8 reserved_at_783[0x2];
-	u8 allowed_list_type[0x3];
-	u8 reserved_at_788[0xc];
-	u8 allowed_list_size[0xc];
-	struct mlx5_ifc_mac_address_layout_bits permanent_address;
-	u8 reserved_at_7e0[0x20];
-};
-
-struct mlx5_ifc_query_nic_vport_context_out_bits {
-	u8 status[0x8];
-	u8 reserved_at_8[0x18];
-	u8 syndrome[0x20];
-	u8 reserved_at_40[0x40];
-	struct mlx5_ifc_nic_vport_context_bits nic_vport_context;
-};
-
-struct mlx5_ifc_query_nic_vport_context_in_bits {
-	u8 opcode[0x10];
-	u8 reserved_at_10[0x10];
-	u8 reserved_at_20[0x10];
-	u8 op_mod[0x10];
-	u8 other_vport[0x1];
-	u8 reserved_at_41[0xf];
-	u8 vport_number[0x10];
-	u8 reserved_at_60[0x5];
-	u8 allowed_list_type[0x3];
-	u8 reserved_at_68[0x18];
-};
-
-struct mlx5_ifc_tisc_bits {
-	u8 strict_lag_tx_port_affinity[0x1];
-	u8 reserved_at_1[0x3];
-	u8 lag_tx_port_affinity[0x04];
-	u8 reserved_at_8[0x4];
-	u8 prio[0x4];
-	u8 reserved_at_10[0x10];
-	u8 reserved_at_20[0x100];
-	u8 reserved_at_120[0x8];
-	u8 transport_domain[0x18];
-	u8 reserved_at_140[0x8];
-	u8 underlay_qpn[0x18];
-	u8 reserved_at_160[0x3a0];
-};
-
-struct mlx5_ifc_query_tis_out_bits {
-	u8 status[0x8];
-	u8 reserved_at_8[0x18];
-	u8 syndrome[0x20];
-	u8 reserved_at_40[0x40];
-	struct mlx5_ifc_tisc_bits tis_context;
-};
-
-struct mlx5_ifc_query_tis_in_bits {
-	u8 opcode[0x10];
-	u8 reserved_at_10[0x10];
-	u8 reserved_at_20[0x10];
-	u8 op_mod[0x10];
-	u8 reserved_at_40[0x8];
-	u8 tisn[0x18];
-	u8 reserved_at_60[0x20];
-};
-
-struct mlx5_ifc_alloc_transport_domain_out_bits {
-	u8 status[0x8];
-	u8 reserved_at_8[0x18];
-	u8 syndrome[0x20];
-	u8 reserved_at_40[0x8];
-	u8 transport_domain[0x18];
-	u8 reserved_at_60[0x20];
-};
-
-struct mlx5_ifc_alloc_transport_domain_in_bits {
-	u8 opcode[0x10];
-	u8 reserved_at_10[0x10];
-	u8 reserved_at_20[0x10];
-	u8 op_mod[0x10];
-	u8 reserved_at_40[0x40];
-};
-
-enum {
-	MLX5_WQ_TYPE_LINKED_LIST                = 0x0,
-	MLX5_WQ_TYPE_CYCLIC                     = 0x1,
-	MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ    = 0x2,
-	MLX5_WQ_TYPE_CYCLIC_STRIDING_RQ         = 0x3,
-};
-
-enum {
-	MLX5_WQ_END_PAD_MODE_NONE  = 0x0,
-	MLX5_WQ_END_PAD_MODE_ALIGN = 0x1,
-};
-
-struct mlx5_ifc_wq_bits {
-	u8 wq_type[0x4];
-	u8 wq_signature[0x1];
-	u8 end_padding_mode[0x2];
-	u8 cd_slave[0x1];
-	u8 reserved_at_8[0x18];
-	u8 hds_skip_first_sge[0x1];
-	u8 log2_hds_buf_size[0x3];
-	u8 reserved_at_24[0x7];
-	u8 page_offset[0x5];
-	u8 lwm[0x10];
-	u8 reserved_at_40[0x8];
-	u8 pd[0x18];
-	u8 reserved_at_60[0x8];
-	u8 uar_page[0x18];
-	u8 dbr_addr[0x40];
-	u8 hw_counter[0x20];
-	u8 sw_counter[0x20];
-	u8 reserved_at_100[0xc];
-	u8 log_wq_stride[0x4];
-	u8 reserved_at_110[0x3];
-	u8 log_wq_pg_sz[0x5];
-	u8 reserved_at_118[0x3];
-	u8 log_wq_sz[0x5];
-	u8 dbr_umem_valid[0x1];
-	u8 wq_umem_valid[0x1];
-	u8 reserved_at_122[0x1];
-	u8 log_hairpin_num_packets[0x5];
-	u8 reserved_at_128[0x3];
-	u8 log_hairpin_data_sz[0x5];
-	u8 reserved_at_130[0x4];
-	u8 single_wqe_log_num_of_strides[0x4];
-	u8 two_byte_shift_en[0x1];
-	u8 reserved_at_139[0x4];
-	u8 single_stride_log_num_of_bytes[0x3];
-	u8 dbr_umem_id[0x20];
-	u8 wq_umem_id[0x20];
-	u8 wq_umem_offset[0x40];
-	u8 reserved_at_1c0[0x440];
-};
-
-enum {
-	MLX5_RQC_MEM_RQ_TYPE_MEMORY_RQ_INLINE  = 0x0,
-	MLX5_RQC_MEM_RQ_TYPE_MEMORY_RQ_RMP     = 0x1,
-};
-
-enum {
-	MLX5_RQC_STATE_RST  = 0x0,
-	MLX5_RQC_STATE_RDY  = 0x1,
-	MLX5_RQC_STATE_ERR  = 0x3,
-};
-
-struct mlx5_ifc_rqc_bits {
-	u8 rlky[0x1];
-	u8 delay_drop_en[0x1];
-	u8 scatter_fcs[0x1];
-	u8 vsd[0x1];
-	u8 mem_rq_type[0x4];
-	u8 state[0x4];
-	u8 reserved_at_c[0x1];
-	u8 flush_in_error_en[0x1];
-	u8 hairpin[0x1];
-	u8 reserved_at_f[0x11];
-	u8 reserved_at_20[0x8];
-	u8 user_index[0x18];
-	u8 reserved_at_40[0x8];
-	u8 cqn[0x18];
-	u8 counter_set_id[0x8];
-	u8 reserved_at_68[0x18];
-	u8 reserved_at_80[0x8];
-	u8 rmpn[0x18];
-	u8 reserved_at_a0[0x8];
-	u8 hairpin_peer_sq[0x18];
-	u8 reserved_at_c0[0x10];
-	u8 hairpin_peer_vhca[0x10];
-	u8 reserved_at_e0[0xa0];
-	struct mlx5_ifc_wq_bits wq; /* Not used in LRO RQ. */
-};
-
-struct mlx5_ifc_create_rq_out_bits {
-	u8 status[0x8];
-	u8 reserved_at_8[0x18];
-	u8 syndrome[0x20];
-	u8 reserved_at_40[0x8];
-	u8 rqn[0x18];
-	u8 reserved_at_60[0x20];
-};
-
-struct mlx5_ifc_create_rq_in_bits {
-	u8 opcode[0x10];
-	u8 uid[0x10];
-	u8 reserved_at_20[0x10];
-	u8 op_mod[0x10];
-	u8 reserved_at_40[0xc0];
-	struct mlx5_ifc_rqc_bits ctx;
-};
-
-struct mlx5_ifc_modify_rq_out_bits {
-	u8 status[0x8];
-	u8 reserved_at_8[0x18];
-	u8 syndrome[0x20];
-	u8 reserved_at_40[0x40];
-};
-
-struct mlx5_ifc_create_tis_out_bits {
-	u8 status[0x8];
-	u8 reserved_at_8[0x18];
-	u8 syndrome[0x20];
-	u8 reserved_at_40[0x8];
-	u8 tisn[0x18];
-	u8 reserved_at_60[0x20];
-};
-
-struct mlx5_ifc_create_tis_in_bits {
-	u8 opcode[0x10];
-	u8 uid[0x10];
-	u8 reserved_at_20[0x10];
-	u8 op_mod[0x10];
-	u8 reserved_at_40[0xc0];
-	struct mlx5_ifc_tisc_bits ctx;
-};
-
-enum {
-	MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_WQ_LWM = 1ULL << 0,
-	MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_VSD = 1ULL << 1,
-	MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_SCATTER_FCS = 1ULL << 2,
-	MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_RQ_COUNTER_SET_ID = 1ULL << 3,
-};
-
-struct mlx5_ifc_modify_rq_in_bits {
-	u8 opcode[0x10];
-	u8 uid[0x10];
-	u8 reserved_at_20[0x10];
-	u8 op_mod[0x10];
-	u8 rq_state[0x4];
-	u8 reserved_at_44[0x4];
-	u8 rqn[0x18];
-	u8 reserved_at_60[0x20];
-	u8 modify_bitmask[0x40];
-	u8 reserved_at_c0[0x40];
-	struct mlx5_ifc_rqc_bits ctx;
-};
-
-enum {
-	MLX5_RX_HASH_FIELD_SELECT_SELECTED_FIELDS_SRC_IP     = 0x0,
-	MLX5_RX_HASH_FIELD_SELECT_SELECTED_FIELDS_DST_IP     = 0x1,
-	MLX5_RX_HASH_FIELD_SELECT_SELECTED_FIELDS_L4_SPORT   = 0x2,
-	MLX5_RX_HASH_FIELD_SELECT_SELECTED_FIELDS_L4_DPORT   = 0x3,
-	MLX5_RX_HASH_FIELD_SELECT_SELECTED_FIELDS_IPSEC_SPI  = 0x4,
-};
-
-struct mlx5_ifc_rx_hash_field_select_bits {
-	u8 l3_prot_type[0x1];
-	u8 l4_prot_type[0x1];
-	u8 selected_fields[0x1e];
-};
-
-enum {
-	MLX5_TIRC_DISP_TYPE_DIRECT    = 0x0,
-	MLX5_TIRC_DISP_TYPE_INDIRECT  = 0x1,
-};
-
-enum {
-	MLX5_TIRC_LRO_ENABLE_MASK_IPV4_LRO  = 0x1,
-	MLX5_TIRC_LRO_ENABLE_MASK_IPV6_LRO  = 0x2,
-};
-
-enum {
-	MLX5_RX_HASH_FN_NONE           = 0x0,
-	MLX5_RX_HASH_FN_INVERTED_XOR8  = 0x1,
-	MLX5_RX_HASH_FN_TOEPLITZ       = 0x2,
-};
-
-enum {
-	MLX5_TIRC_SELF_LB_BLOCK_BLOCK_UNICAST    = 0x1,
-	MLX5_TIRC_SELF_LB_BLOCK_BLOCK_MULTICAST  = 0x2,
-};
-
-enum {
-	MLX5_LRO_MAX_MSG_SIZE_START_FROM_L4    = 0x0,
-	MLX5_LRO_MAX_MSG_SIZE_START_FROM_L2  = 0x1,
-};
-
-struct mlx5_ifc_tirc_bits {
-	u8 reserved_at_0[0x20];
-	u8 disp_type[0x4];
-	u8 reserved_at_24[0x1c];
-	u8 reserved_at_40[0x40];
-	u8 reserved_at_80[0x4];
-	u8 lro_timeout_period_usecs[0x10];
-	u8 lro_enable_mask[0x4];
-	u8 lro_max_msg_sz[0x8];
-	u8 reserved_at_a0[0x40];
-	u8 reserved_at_e0[0x8];
-	u8 inline_rqn[0x18];
-	u8 rx_hash_symmetric[0x1];
-	u8 reserved_at_101[0x1];
-	u8 tunneled_offload_en[0x1];
-	u8 reserved_at_103[0x5];
-	u8 indirect_table[0x18];
-	u8 rx_hash_fn[0x4];
-	u8 reserved_at_124[0x2];
-	u8 self_lb_block[0x2];
-	u8 transport_domain[0x18];
-	u8 rx_hash_toeplitz_key[10][0x20];
-	struct mlx5_ifc_rx_hash_field_select_bits rx_hash_field_selector_outer;
-	struct mlx5_ifc_rx_hash_field_select_bits rx_hash_field_selector_inner;
-	u8 reserved_at_2c0[0x4c0];
-};
-
-struct mlx5_ifc_create_tir_out_bits {
-	u8 status[0x8];
-	u8 reserved_at_8[0x18];
-	u8 syndrome[0x20];
-	u8 reserved_at_40[0x8];
-	u8 tirn[0x18];
-	u8 reserved_at_60[0x20];
-};
-
-struct mlx5_ifc_create_tir_in_bits {
-	u8 opcode[0x10];
-	u8 uid[0x10];
-	u8 reserved_at_20[0x10];
-	u8 op_mod[0x10];
-	u8 reserved_at_40[0xc0];
-	struct mlx5_ifc_tirc_bits ctx;
-};
-
-struct mlx5_ifc_rq_num_bits {
-	u8 reserved_at_0[0x8];
-	u8 rq_num[0x18];
-};
-
-struct mlx5_ifc_rqtc_bits {
-	u8 reserved_at_0[0xa0];
-	u8 reserved_at_a0[0x10];
-	u8 rqt_max_size[0x10];
-	u8 reserved_at_c0[0x10];
-	u8 rqt_actual_size[0x10];
-	u8 reserved_at_e0[0x6a0];
-	struct mlx5_ifc_rq_num_bits rq_num[];
-};
-
-struct mlx5_ifc_create_rqt_out_bits {
-	u8 status[0x8];
-	u8 reserved_at_8[0x18];
-	u8 syndrome[0x20];
-	u8 reserved_at_40[0x8];
-	u8 rqtn[0x18];
-	u8 reserved_at_60[0x20];
-};
-
-#ifdef PEDANTIC
-#pragma GCC diagnostic ignored "-Wpedantic"
-#endif
-struct mlx5_ifc_create_rqt_in_bits {
-	u8 opcode[0x10];
-	u8 uid[0x10];
-	u8 reserved_at_20[0x10];
-	u8 op_mod[0x10];
-	u8 reserved_at_40[0xc0];
-	struct mlx5_ifc_rqtc_bits rqt_context;
-};
-#ifdef PEDANTIC
-#pragma GCC diagnostic error "-Wpedantic"
-#endif
-
-enum {
-	MLX5_SQC_STATE_RST  = 0x0,
-	MLX5_SQC_STATE_RDY  = 0x1,
-	MLX5_SQC_STATE_ERR  = 0x3,
-};
-
-struct mlx5_ifc_sqc_bits {
-	u8 rlky[0x1];
-	u8 cd_master[0x1];
-	u8 fre[0x1];
-	u8 flush_in_error_en[0x1];
-	u8 allow_multi_pkt_send_wqe[0x1];
-	u8 min_wqe_inline_mode[0x3];
-	u8 state[0x4];
-	u8 reg_umr[0x1];
-	u8 allow_swp[0x1];
-	u8 hairpin[0x1];
-	u8 reserved_at_f[0x11];
-	u8 reserved_at_20[0x8];
-	u8 user_index[0x18];
-	u8 reserved_at_40[0x8];
-	u8 cqn[0x18];
-	u8 reserved_at_60[0x8];
-	u8 hairpin_peer_rq[0x18];
-	u8 reserved_at_80[0x10];
-	u8 hairpin_peer_vhca[0x10];
-	u8 reserved_at_a0[0x50];
-	u8 packet_pacing_rate_limit_index[0x10];
-	u8 tis_lst_sz[0x10];
-	u8 reserved_at_110[0x10];
-	u8 reserved_at_120[0x40];
-	u8 reserved_at_160[0x8];
-	u8 tis_num_0[0x18];
-	struct mlx5_ifc_wq_bits wq;
-};
-
-struct mlx5_ifc_query_sq_in_bits {
-	u8 opcode[0x10];
-	u8 reserved_at_10[0x10];
-	u8 reserved_at_20[0x10];
-	u8 op_mod[0x10];
-	u8 reserved_at_40[0x8];
-	u8 sqn[0x18];
-	u8 reserved_at_60[0x20];
-};
-
-struct mlx5_ifc_modify_sq_out_bits {
-	u8 status[0x8];
-	u8 reserved_at_8[0x18];
-	u8 syndrome[0x20];
-	u8 reserved_at_40[0x40];
-};
-
-struct mlx5_ifc_modify_sq_in_bits {
-	u8 opcode[0x10];
-	u8 uid[0x10];
-	u8 reserved_at_20[0x10];
-	u8 op_mod[0x10];
-	u8 sq_state[0x4];
-	u8 reserved_at_44[0x4];
-	u8 sqn[0x18];
-	u8 reserved_at_60[0x20];
-	u8 modify_bitmask[0x40];
-	u8 reserved_at_c0[0x40];
-	struct mlx5_ifc_sqc_bits ctx;
-};
-
-struct mlx5_ifc_create_sq_out_bits {
-	u8 status[0x8];
-	u8 reserved_at_8[0x18];
-	u8 syndrome[0x20];
-	u8 reserved_at_40[0x8];
-	u8 sqn[0x18];
-	u8 reserved_at_60[0x20];
-};
-
-struct mlx5_ifc_create_sq_in_bits {
-	u8 opcode[0x10];
-	u8 uid[0x10];
-	u8 reserved_at_20[0x10];
-	u8 op_mod[0x10];
-	u8 reserved_at_40[0xc0];
-	struct mlx5_ifc_sqc_bits ctx;
-};
-
-enum {
-	MLX5_FLOW_METER_OBJ_MODIFY_FIELD_ACTIVE = (1ULL << 0),
-	MLX5_FLOW_METER_OBJ_MODIFY_FIELD_CBS = (1ULL << 1),
-	MLX5_FLOW_METER_OBJ_MODIFY_FIELD_CIR = (1ULL << 2),
-	MLX5_FLOW_METER_OBJ_MODIFY_FIELD_EBS = (1ULL << 3),
-	MLX5_FLOW_METER_OBJ_MODIFY_FIELD_EIR = (1ULL << 4),
-};
-
-struct mlx5_ifc_flow_meter_parameters_bits {
-	u8         valid[0x1];			// 00h
-	u8         bucket_overflow[0x1];
-	u8         start_color[0x2];
-	u8         both_buckets_on_green[0x1];
-	u8         meter_mode[0x2];
-	u8         reserved_at_1[0x19];
-	u8         reserved_at_2[0x20]; //04h
-	u8         reserved_at_3[0x3];
-	u8         cbs_exponent[0x5];		// 08h
-	u8         cbs_mantissa[0x8];
-	u8         reserved_at_4[0x3];
-	u8         cir_exponent[0x5];
-	u8         cir_mantissa[0x8];
-	u8         reserved_at_5[0x20];		// 0Ch
-	u8         reserved_at_6[0x3];
-	u8         ebs_exponent[0x5];		// 10h
-	u8         ebs_mantissa[0x8];
-	u8         reserved_at_7[0x3];
-	u8         eir_exponent[0x5];
-	u8         eir_mantissa[0x8];
-	u8         reserved_at_8[0x60];		// 14h-1Ch
-};
-
-/* CQE format mask. */
-#define MLX5E_CQE_FORMAT_MASK 0xc
-
-/* MPW opcode. */
-#define MLX5_OPC_MOD_MPW 0x01
-
-/* Compressed Rx CQE structure. */
-struct mlx5_mini_cqe8 {
-	union {
-		uint32_t rx_hash_result;
-		struct {
-			uint16_t checksum;
-			uint16_t stride_idx;
-		};
-		struct {
-			uint16_t wqe_counter;
-			uint8_t  s_wqe_opcode;
-			uint8_t  reserved;
-		} s_wqe_info;
-	};
-	uint32_t byte_cnt;
-};
-
-/* srTCM PRM flow meter parameters. */
-enum {
-	MLX5_FLOW_COLOR_RED = 0,
-	MLX5_FLOW_COLOR_YELLOW,
-	MLX5_FLOW_COLOR_GREEN,
-	MLX5_FLOW_COLOR_UNDEFINED,
-};
-
-/* Maximum value of srTCM metering parameters. */
-#define MLX5_SRTCM_CBS_MAX (0xFF * (1ULL << 0x1F))
-#define MLX5_SRTCM_CIR_MAX (8 * (1ULL << 30) * 0xFF)
-#define MLX5_SRTCM_EBS_MAX 0
-
-/**
- * Convert a user mark to flow mark.
- *
- * @param val
- *   Mark value to convert.
- *
- * @return
- *   Converted mark value.
- */
-static inline uint32_t
-mlx5_flow_mark_set(uint32_t val)
-{
-	uint32_t ret;
-
-	/*
-	 * Add one to the user value to differentiate un-marked flows from
-	 * marked flows, if the ID is equal to MLX5_FLOW_MARK_DEFAULT it
-	 * remains untouched.
-	 */
-	if (val != MLX5_FLOW_MARK_DEFAULT)
-		++val;
-#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
-	/*
-	 * Mark is 24 bits (minus reserved values) but is stored on a 32 bit
-	 * word, byte-swapped by the kernel on little-endian systems. In this
-	 * case, left-shifting the resulting big-endian value ensures the
-	 * least significant 24 bits are retained when converting it back.
-	 */
-	ret = rte_cpu_to_be_32(val) >> 8;
-#else
-	ret = val;
-#endif
-	return ret;
-}
-
-/**
- * Convert a mark to user mark.
- *
- * @param val
- *   Mark value to convert.
- *
- * @return
- *   Converted mark value.
- */
-static inline uint32_t
-mlx5_flow_mark_get(uint32_t val)
-{
-	/*
-	 * Subtract one from the retrieved value. It was added by
-	 * mlx5_flow_mark_set() to distinguish unmarked flows.
-	 */
-#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
-	return (val >> 8) - 1;
-#else
-	return val - 1;
-#endif
-}
-
-#endif /* RTE_PMD_MLX5_PRM_H_ */
diff --git a/drivers/net/mlx5/mlx5_rss.c b/drivers/net/mlx5/mlx5_rss.c
index 1028264..345ce3a 100644
--- a/drivers/net/mlx5/mlx5_rss.c
+++ b/drivers/net/mlx5/mlx5_rss.c
@@ -22,8 +22,8 @@
 #include <rte_malloc.h>
 #include <rte_ethdev_driver.h>
 
-#include "mlx5.h"
 #include "mlx5_defs.h"
+#include "mlx5.h"
 #include "mlx5_rxtx.h"
 
 /**
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 89168cd..62fdbe6 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -30,14 +30,16 @@
 #include <rte_debug.h>
 #include <rte_io.h>
 
+#include <mlx5_glue.h>
+#include <mlx5_devx_cmds.h>
+
+#include "mlx5_defs.h"
 #include "mlx5.h"
 #include "mlx5_rxtx.h"
 #include "mlx5_utils.h"
 #include "mlx5_autoconf.h"
-#include "mlx5_defs.h"
-#include "mlx5_glue.h"
 #include "mlx5_flow.h"
-#include "mlx5_devx_cmds.h"
+
 
 /* Default RSS hash key also used for ConnectX-3. */
 uint8_t rss_hash_default_key[] = {
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 2eede1b..a845f67 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -28,13 +28,14 @@
 #include <rte_cycles.h>
 #include <rte_flow.h>
 
+#include <mlx5_devx_cmds.h>
+#include <mlx5_prm.h>
+
+#include "mlx5_defs.h"
 #include "mlx5.h"
-#include "mlx5_devx_cmds.h"
 #include "mlx5_utils.h"
 #include "mlx5_rxtx.h"
 #include "mlx5_autoconf.h"
-#include "mlx5_defs.h"
-#include "mlx5_prm.h"
 
 /* TX burst subroutines return codes. */
 enum mlx5_txcmp_code {
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index b6a33c5..84b1fce 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -31,13 +31,14 @@
 #include <rte_bus_pci.h>
 #include <rte_malloc.h>
 
+#include <mlx5_glue.h>
+#include <mlx5_prm.h>
+
+#include "mlx5_defs.h"
 #include "mlx5_utils.h"
 #include "mlx5.h"
 #include "mlx5_mr.h"
 #include "mlx5_autoconf.h"
-#include "mlx5_defs.h"
-#include "mlx5_prm.h"
-#include "mlx5_glue.h"
 
 /* Support tunnel matching. */
 #define MLX5_FLOW_TUNNEL 10
diff --git a/drivers/net/mlx5/mlx5_rxtx_vec.c b/drivers/net/mlx5/mlx5_rxtx_vec.c
index d85f908..5505762 100644
--- a/drivers/net/mlx5/mlx5_rxtx_vec.c
+++ b/drivers/net/mlx5/mlx5_rxtx_vec.c
@@ -23,13 +23,14 @@
 #include <rte_mempool.h>
 #include <rte_prefetch.h>
 
+#include <mlx5_prm.h>
+
+#include "mlx5_defs.h"
 #include "mlx5.h"
 #include "mlx5_utils.h"
 #include "mlx5_rxtx.h"
 #include "mlx5_rxtx_vec.h"
 #include "mlx5_autoconf.h"
-#include "mlx5_defs.h"
-#include "mlx5_prm.h"
 
 #if defined RTE_ARCH_X86_64
 #include "mlx5_rxtx_vec_sse.h"
diff --git a/drivers/net/mlx5/mlx5_rxtx_vec.h b/drivers/net/mlx5/mlx5_rxtx_vec.h
index 85e0bd5..39aefc3 100644
--- a/drivers/net/mlx5/mlx5_rxtx_vec.h
+++ b/drivers/net/mlx5/mlx5_rxtx_vec.h
@@ -9,8 +9,9 @@
 #include <rte_common.h>
 #include <rte_mbuf.h>
 
+#include <mlx5_prm.h>
+
 #include "mlx5_autoconf.h"
-#include "mlx5_prm.h"
 
 /* HW checksum offload capabilities of vectorized Tx. */
 #define MLX5_VEC_TX_CKSUM_OFFLOAD_CAP \
diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h b/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h
index 8e79883..cd1b65f 100644
--- a/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h
+++ b/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h
@@ -17,13 +17,14 @@
 #include <rte_mempool.h>
 #include <rte_prefetch.h>
 
+#include <mlx5_prm.h>
+
+#include "mlx5_defs.h"
 #include "mlx5.h"
 #include "mlx5_utils.h"
 #include "mlx5_rxtx.h"
 #include "mlx5_rxtx_vec.h"
 #include "mlx5_autoconf.h"
-#include "mlx5_defs.h"
-#include "mlx5_prm.h"
 
 #ifndef __INTEL_COMPILER
 #pragma GCC diagnostic ignored "-Wcast-qual"
diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
index 86785c7..9fd8429 100644
--- a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
+++ b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
@@ -16,13 +16,14 @@
 #include <rte_mempool.h>
 #include <rte_prefetch.h>
 
+#include <mlx5_prm.h>
+
+#include "mlx5_defs.h"
 #include "mlx5.h"
 #include "mlx5_utils.h"
 #include "mlx5_rxtx.h"
 #include "mlx5_rxtx_vec.h"
 #include "mlx5_autoconf.h"
-#include "mlx5_defs.h"
-#include "mlx5_prm.h"
 
 #pragma GCC diagnostic ignored "-Wcast-qual"
 
diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h
index 35b7761..f281b9e 100644
--- a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h
+++ b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h
@@ -16,13 +16,14 @@
 #include <rte_mempool.h>
 #include <rte_prefetch.h>
 
+#include <mlx5_prm.h>
+
+#include "mlx5_defs.h"
 #include "mlx5.h"
 #include "mlx5_utils.h"
 #include "mlx5_rxtx.h"
 #include "mlx5_rxtx_vec.h"
 #include "mlx5_autoconf.h"
-#include "mlx5_defs.h"
-#include "mlx5_prm.h"
 
 #ifndef __INTEL_COMPILER
 #pragma GCC diagnostic ignored "-Wcast-qual"
diff --git a/drivers/net/mlx5/mlx5_stats.c b/drivers/net/mlx5/mlx5_stats.c
index 205e4fe..0ed7170 100644
--- a/drivers/net/mlx5/mlx5_stats.c
+++ b/drivers/net/mlx5/mlx5_stats.c
@@ -13,9 +13,9 @@
 #include <rte_common.h>
 #include <rte_malloc.h>
 
+#include "mlx5_defs.h"
 #include "mlx5.h"
 #include "mlx5_rxtx.h"
-#include "mlx5_defs.h"
 
 static const struct mlx5_counter_ctrl mlx5_counters_init[] = {
 	{
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 5adb4dc..1d2ba8a 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -28,13 +28,14 @@
 #include <rte_ethdev_driver.h>
 #include <rte_common.h>
 
-#include "mlx5_utils.h"
+#include <mlx5_glue.h>
+#include <mlx5_devx_cmds.h>
+
 #include "mlx5_defs.h"
+#include "mlx5_utils.h"
 #include "mlx5.h"
 #include "mlx5_rxtx.h"
 #include "mlx5_autoconf.h"
-#include "mlx5_glue.h"
-#include "mlx5_devx_cmds.h"
 
 /**
  * Allocate TX queue elements.
diff --git a/drivers/net/mlx5/mlx5_utils.h b/drivers/net/mlx5/mlx5_utils.h
index ebf79b8..c868aee 100644
--- a/drivers/net/mlx5/mlx5_utils.h
+++ b/drivers/net/mlx5/mlx5_utils.h
@@ -13,8 +13,11 @@
 #include <assert.h>
 #include <errno.h>
 
+#include <mlx5_common.h>
+
 #include "mlx5_defs.h"
 
+
 /*
  * Compilation workaround for PPC64 when AltiVec is fully enabled, e.g. std=c11.
  * Otherwise there would be a type conflict between stdbool and altivec.
@@ -50,81 +53,14 @@
 /* Save and restore errno around argument evaluation. */
 #define ERRNO_SAFE(x) ((errno = (int []){ errno, ((x), 0) }[0]))
 
-/*
- * Helper macros to work around __VA_ARGS__ limitations in a C99 compliant
- * manner.
- */
-#define PMD_DRV_LOG_STRIP(a, b) a
-#define PMD_DRV_LOG_OPAREN (
-#define PMD_DRV_LOG_CPAREN )
-#define PMD_DRV_LOG_COMMA ,
-
-/* Return the file name part of a path. */
-static inline const char *
-pmd_drv_log_basename(const char *s)
-{
-	const char *n = s;
-
-	while (*n)
-		if (*(n++) == '/')
-			s = n;
-	return s;
-}
-
 extern int mlx5_logtype;
 
-#define PMD_DRV_LOG___(level, ...) \
-	rte_log(RTE_LOG_ ## level, \
-		mlx5_logtype, \
-		RTE_FMT(MLX5_DRIVER_NAME ": " \
-			RTE_FMT_HEAD(__VA_ARGS__,), \
-		RTE_FMT_TAIL(__VA_ARGS__,)))
-
-/*
- * When debugging is enabled (NDEBUG not defined), file, line and function
- * information replace the driver name (MLX5_DRIVER_NAME) in log messages.
- */
-#ifndef NDEBUG
-
-#define PMD_DRV_LOG__(level, ...) \
-	PMD_DRV_LOG___(level, "%s:%u: %s(): " __VA_ARGS__)
-#define PMD_DRV_LOG_(level, s, ...) \
-	PMD_DRV_LOG__(level, \
-		s "\n" PMD_DRV_LOG_COMMA \
-		pmd_drv_log_basename(__FILE__) PMD_DRV_LOG_COMMA \
-		__LINE__ PMD_DRV_LOG_COMMA \
-		__func__, \
-		__VA_ARGS__)
-
-#else /* NDEBUG */
-#define PMD_DRV_LOG__(level, ...) \
-	PMD_DRV_LOG___(level, __VA_ARGS__)
-#define PMD_DRV_LOG_(level, s, ...) \
-	PMD_DRV_LOG__(level, s "\n", __VA_ARGS__)
-
-#endif /* NDEBUG */
-
 /* Generic printf()-like logging macro with automatic line feed. */
 #define DRV_LOG(level, ...) \
-	PMD_DRV_LOG_(level, \
+	PMD_DRV_LOG_(level, mlx5_logtype, MLX5_DRIVER_NAME, \
 		__VA_ARGS__ PMD_DRV_LOG_STRIP PMD_DRV_LOG_OPAREN, \
 		PMD_DRV_LOG_CPAREN)
 
-/* claim_zero() does not perform any check when debugging is disabled. */
-#ifndef NDEBUG
-
-#define DEBUG(...) DRV_LOG(DEBUG, __VA_ARGS__)
-#define claim_zero(...) assert((__VA_ARGS__) == 0)
-#define claim_nonzero(...) assert((__VA_ARGS__) != 0)
-
-#else /* NDEBUG */
-
-#define DEBUG(...) (void)0
-#define claim_zero(...) (__VA_ARGS__)
-#define claim_nonzero(...) (__VA_ARGS__)
-
-#endif /* NDEBUG */
-
 #define INFO(...) DRV_LOG(INFO, __VA_ARGS__)
 #define WARN(...) DRV_LOG(WARNING, __VA_ARGS__)
 #define ERROR(...) DRV_LOG(ERR, __VA_ARGS__)
@@ -144,13 +80,6 @@
 	 (((val) & (from)) / ((from) / (to))) : \
 	 (((val) & (from)) * ((to) / (from))))
 
-/* Allocate a buffer on the stack and fill it with a printf format string. */
-#define MKSTR(name, ...) \
-	int mkstr_size_##name = snprintf(NULL, 0, "" __VA_ARGS__); \
-	char name[mkstr_size_##name + 1]; \
-	\
-	snprintf(name, sizeof(name), "" __VA_ARGS__)
-
 /**
  * Return logarithm of the nearest power of two above input value.
  *
diff --git a/drivers/net/mlx5/mlx5_vlan.c b/drivers/net/mlx5/mlx5_vlan.c
index feac0f1..b0fa31a 100644
--- a/drivers/net/mlx5/mlx5_vlan.c
+++ b/drivers/net/mlx5/mlx5_vlan.c
@@ -27,10 +27,11 @@
 #include <rte_ethdev_driver.h>
 #include <rte_common.h>
 
+#include <mlx5_glue.h>
+#include <mlx5_devx_cmds.h>
+
 #include "mlx5.h"
 #include "mlx5_autoconf.h"
-#include "mlx5_glue.h"
-#include "mlx5_devx_cmds.h"
 #include "mlx5_rxtx.h"
 #include "mlx5_utils.h"
 
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 1169dd8..d90f14d 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -196,6 +196,7 @@ endif
 _LDLIBS-$(CONFIG_RTE_LIBRTE_LIO_PMD)        += -lrte_pmd_lio
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_MEMIF)      += -lrte_pmd_memif
 _LDLIBS-$(CONFIG_RTE_LIBRTE_MLX4_PMD)       += -lrte_pmd_mlx4
+_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX5_PMD)       += -lrte_common_mlx5
 _LDLIBS-$(CONFIG_RTE_LIBRTE_MLX5_PMD)       += -lrte_pmd_mlx5
 ifeq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
 _LDLIBS-$(CONFIG_RTE_LIBRTE_MLX4_PMD)       += -ldl
-- 
1.8.3.1
* [dpdk-dev] [PATCH v1 03/38] mlx5: share the mlx5 glue reference
  2020-01-20 17:02 [dpdk-dev] [PATCH v1 00/38] Introduce mlx5 vDPA driver Matan Azrad
  2020-01-20 17:02 ` [dpdk-dev] [PATCH v1 01/38] net/mlx5: separate DevX commands interface Matan Azrad
  2020-01-20 17:02 ` [dpdk-dev] [PATCH v1 02/38] mlx5: prepare common library Matan Azrad
@ 2020-01-20 17:02 ` Matan Azrad
  2020-01-20 17:02 ` [dpdk-dev] [PATCH v1 04/38] mlx5: share mlx5 PCI device detection Matan Azrad
                   ` (36 subsequent siblings)
  39 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-20 17:02 UTC (permalink / raw)
  To: dev; +Cc: Maxime Coquelin, Thomas Monjalon
A new Mellanox vDPA PMD will be added to support vDPA operations on
Mellanox adapters.
Both the mlx5 PMD and the mlx5 vDPA PMD need to initialize the glue.
The glue must be initialized only once per process, so all the mlx5
PMDs using the glue should share the same glue object.
Move the glue initialization into the common/mlx5 library, where its
constructor performs it only once.
Signed-off-by: Matan Azrad <matan@mellanox.com>
---
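Note (illustration only, not part of the patch): the commit message relies
on a library constructor running once per process so that several PMDs can
share one glue reference. A minimal sketch of that pattern follows. All
demo_* names are hypothetical, and it uses the GCC/Clang constructor
attribute directly, which is the mechanism DPDK's RTE_INIT macros build on.

```c
#include <stddef.h>

/* Hypothetical stand-in for the glue function table. */
struct demo_glue {
	const char *version;
};

static const struct demo_glue demo_glue_obj = { .version = "demo-1" };

/* Shared reference; stays NULL until the constructor has run. */
const struct demo_glue *demo_glue;

/* Counts how many times the initialization routine was executed. */
unsigned int demo_glue_init_calls;

/* Executed once when the object is loaded, before main(). */
__attribute__((constructor))
static void
demo_glue_init(void)
{
	++demo_glue_init_calls;
	demo_glue = &demo_glue_obj;
}

/* Hypothetical "net" driver entry: only reads the shared reference. */
const char *
demo_net_probe(void)
{
	return demo_glue ? demo_glue->version : NULL;
}

/* Hypothetical "vdpa" driver entry: reads the very same reference. */
const char *
demo_vdpa_probe(void)
{
	return demo_glue ? demo_glue->version : NULL;
}
```

Both probe helpers return the same pointer, and demo_glue_init_calls stays
at 1 no matter how many drivers use the reference. A driver still has to
handle a NULL reference, since initialization can fail.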
 drivers/common/mlx5/mlx5_common.c | 173 +++++++++++++++++++++++++++++++++++++-
 drivers/net/mlx5/Makefile         |   3 -
 drivers/net/mlx5/meson.build      |   4 -
 drivers/net/mlx5/mlx5.c           | 172 +------------------------------------
 4 files changed, 173 insertions(+), 179 deletions(-)
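Note (illustration only, not part of the patch): the "-glue" suffixing
logic moved by this patch can be tried in isolation. The sketch below
mirrors mlx5_glue_path() but takes the path as a run-time argument instead
of the compile-time RTE_EAL_PMD_PATH. The name demo_glue_path() and its
extra parameter are assumptions made for the example.

```c
#include <stdio.h>
#include <string.h>

/*
 * Append "-glue" to the last component of @p path, ignoring trailing
 * slashes. Returns @p buf, or NULL when the last component is empty,
 * ".", ".." or when @p buf is too small.
 */
char *
demo_glue_path(const char *path, char *buf, size_t size)
{
	static const char *const bad[] = { "/", ".", "..", NULL };
	size_t len = strlen(path);
	size_t off;
	int i;

	/* Drop trailing slashes. */
	while (len && path[len - 1] == '/')
		--len;
	/* Locate the start of the last path component. */
	for (off = len; off && path[off - 1] != '/'; --off)
		;
	/* An empty component compares equal with length 0 and is rejected. */
	for (i = 0; bad[i]; ++i)
		if (!strncmp(path + off, bad[i], len - off))
			return NULL;
	i = snprintf(buf, size, "%.*s-glue", (int)len, path);
	if (i < 0 || (size_t)i >= size)
		return NULL;
	return buf;
}
```

With a 64-byte buffer, "/usr/lib/dpdk/pmds/" becomes
"/usr/lib/dpdk/pmds-glue", while "/", "." and "/usr/lib/.." are rejected.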
diff --git a/drivers/common/mlx5/mlx5_common.c b/drivers/common/mlx5/mlx5_common.c
index 14ebd30..9c88a63 100644
--- a/drivers/common/mlx5/mlx5_common.c
+++ b/drivers/common/mlx5/mlx5_common.c
@@ -2,16 +2,185 @@
  * Copyright 2019 Mellanox Technologies, Ltd
  */
 
+#include <dlfcn.h>
+#include <unistd.h>
+#include <string.h>
+
+#include <rte_errno.h>
+
 #include "mlx5_common.h"
+#include "mlx5_common_utils.h"
+#include "mlx5_glue.h"
 
 
 int mlx5_common_logtype;
 
 
-RTE_INIT(rte_mlx5_common_pmd_init)
+#ifdef RTE_IBVERBS_LINK_DLOPEN
+
+/**
+ * Suffix RTE_EAL_PMD_PATH with "-glue".
+ *
+ * This function performs a sanity check on RTE_EAL_PMD_PATH before
+ * suffixing its last component.
+ *
+ * @param[out] buf
+ *   Output buffer, should be large enough otherwise NULL is returned.
+ * @param size
+ *   Size of @p buf.
+ *
+ * @return
+ *   Pointer to @p buf or @p NULL in case suffix cannot be appended.
+ */
+static char *
+mlx5_glue_path(char *buf, size_t size)
+{
+	static const char *const bad[] = { "/", ".", "..", NULL };
+	const char *path = RTE_EAL_PMD_PATH;
+	size_t len = strlen(path);
+	size_t off;
+	int i;
+
+	while (len && path[len - 1] == '/')
+		--len;
+	for (off = len; off && path[off - 1] != '/'; --off)
+		;
+	for (i = 0; bad[i]; ++i)
+		if (!strncmp(path + off, bad[i], (int)(len - off)))
+			goto error;
+	i = snprintf(buf, size, "%.*s-glue", (int)len, path);
+	if (i == -1 || (size_t)i >= size)
+		goto error;
+	return buf;
+error:
+	RTE_LOG(ERR, PMD, "unable to append \"-glue\" to last component of"
+		" RTE_EAL_PMD_PATH (\"" RTE_EAL_PMD_PATH "\"), please"
+		" re-configure DPDK");
+	return NULL;
+}
+#endif
+
+/**
+ * Initialization routine for run-time dependency on rdma-core.
+ */
+RTE_INIT_PRIO(mlx5_glue_init, CLASS)
 {
-	/* Initialize driver log type. */
+	void *handle = NULL;
+
+	/* Initialize common log type. */
 	mlx5_common_logtype = rte_log_register("pmd.common.mlx5");
 	if (mlx5_common_logtype >= 0)
 		rte_log_set_level(mlx5_common_logtype, RTE_LOG_NOTICE);
+	/*
+	 * RDMAV_HUGEPAGES_SAFE tells ibv_fork_init() we intend to use
+	 * huge pages. Calling ibv_fork_init() during init allows
+	 * applications to use fork() safely for purposes other than
+	 * using this PMD, which is not supported in forked processes.
+	 */
+	setenv("RDMAV_HUGEPAGES_SAFE", "1", 1);
+	/* Match the size of Rx completion entry to the size of a cacheline. */
+	if (RTE_CACHE_LINE_SIZE == 128)
+		setenv("MLX5_CQE_SIZE", "128", 0);
+	/*
+	 * MLX5_DEVICE_FATAL_CLEANUP tells ibv_destroy functions to
+	 * cleanup all the Verbs resources even when the device was removed.
+	 */
+	setenv("MLX5_DEVICE_FATAL_CLEANUP", "1", 1);
+	/* Load the rdma-core glue library when it is linked via dlopen(). */
+#ifdef RTE_IBVERBS_LINK_DLOPEN
+	char glue_path[sizeof(RTE_EAL_PMD_PATH) - 1 + sizeof("-glue")];
+	const char *path[] = {
+		/*
+		 * A basic security check is necessary before trusting
+		 * MLX5_GLUE_PATH, which may override RTE_EAL_PMD_PATH.
+		 */
+		(geteuid() == getuid() && getegid() == getgid() ?
+		 getenv("MLX5_GLUE_PATH") : NULL),
+		/*
+		 * When RTE_EAL_PMD_PATH is set, use its glue-suffixed
+		 * variant, otherwise let dlopen() look up libraries on its
+		 * own.
+		 */
+		(*RTE_EAL_PMD_PATH ?
+		 mlx5_glue_path(glue_path, sizeof(glue_path)) : ""),
+	};
+	unsigned int i = 0;
+	void **sym;
+	const char *dlmsg;
+
+	while (!handle && i != RTE_DIM(path)) {
+		const char *end;
+		size_t len;
+		int ret;
+
+		if (!path[i]) {
+			++i;
+			continue;
+		}
+		end = strpbrk(path[i], ":;");
+		if (!end)
+			end = path[i] + strlen(path[i]);
+		len = end - path[i];
+		ret = 0;
+		do {
+			char name[ret + 1];
+
+			ret = snprintf(name, sizeof(name), "%.*s%s" MLX5_GLUE,
+				       (int)len, path[i],
+				       (!len || *(end - 1) == '/') ? "" : "/");
+			if (ret == -1)
+				break;
+			if (sizeof(name) != (size_t)ret + 1)
+				continue;
+			DRV_LOG(DEBUG, "Looking for rdma-core glue as "
+				"\"%s\"", name);
+			handle = dlopen(name, RTLD_LAZY);
+			break;
+		} while (1);
+		path[i] = end + 1;
+		if (!*end)
+			++i;
+	}
+	if (!handle) {
+		rte_errno = EINVAL;
+		dlmsg = dlerror();
+		if (dlmsg)
+			DRV_LOG(WARNING, "Cannot load glue library: %s", dlmsg);
+		goto glue_error;
+	}
+	sym = dlsym(handle, "mlx5_glue");
+	if (!sym || !*sym) {
+		rte_errno = EINVAL;
+		dlmsg = dlerror();
+		if (dlmsg)
+			DRV_LOG(ERR, "Cannot resolve glue symbol: %s", dlmsg);
+		goto glue_error;
+	}
+	mlx5_glue = *sym;
+#endif /* RTE_IBVERBS_LINK_DLOPEN */
+#ifndef NDEBUG
+	/* Glue structure must not contain any NULL pointers. */
+	{
+		unsigned int i;
+
+		for (i = 0; i != sizeof(*mlx5_glue) / sizeof(void *); ++i)
+			assert(((const void *const *)mlx5_glue)[i]);
+	}
+#endif
+	if (strcmp(mlx5_glue->version, MLX5_GLUE_VERSION)) {
+		rte_errno = EINVAL;
+		DRV_LOG(ERR, "rdma-core glue \"%s\" mismatch: \"%s\" is "
+			"required", mlx5_glue->version, MLX5_GLUE_VERSION);
+		goto glue_error;
+	}
+	mlx5_glue->fork_init();
+	return;
+glue_error:
+	if (handle)
+		dlclose(handle);
+	DRV_LOG(WARNING, "Cannot initialize MLX5 common due to missing"
+		" run-time dependency on rdma-core libraries (libibverbs,"
+		" libmlx5)");
+	mlx5_glue = NULL;
+	return;
 }
diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index 88ce197..dc6b3c8 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -6,9 +6,6 @@ include $(RTE_SDK)/mk/rte.vars.mk
 
 # Library name.
 LIB = librte_pmd_mlx5.a
-LIB_GLUE = $(LIB_GLUE_BASE).$(LIB_GLUE_VERSION)
-LIB_GLUE_BASE = librte_pmd_mlx5_glue.so
-LIB_GLUE_VERSION = 20.02.0
 
 # Sources.
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5.c
diff --git a/drivers/net/mlx5/meson.build b/drivers/net/mlx5/meson.build
index f6d0db9..e10ef3a 100644
--- a/drivers/net/mlx5/meson.build
+++ b/drivers/net/mlx5/meson.build
@@ -8,10 +8,6 @@ if not is_linux
 	subdir_done()
 endif
 
-LIB_GLUE_BASE = 'librte_pmd_mlx5_glue.so'
-LIB_GLUE_VERSION = '20.02.0'
-LIB_GLUE = LIB_GLUE_BASE + '.' + LIB_GLUE_VERSION
-
 allow_experimental_apis = true
 deps += ['hash', 'common_mlx5']
 sources = files(
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 1cb8374..f3cb19d 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -7,7 +7,6 @@
 #include <unistd.h>
 #include <string.h>
 #include <assert.h>
-#include <dlfcn.h>
 #include <stdint.h>
 #include <stdlib.h>
 #include <errno.h>
@@ -3494,138 +3493,6 @@ struct mlx5_flow_id_pool *
 		     RTE_PCI_DRV_PROBE_AGAIN,
 };
 
-#ifdef RTE_IBVERBS_LINK_DLOPEN
-
-/**
- * Suffix RTE_EAL_PMD_PATH with "-glue".
- *
- * This function performs a sanity check on RTE_EAL_PMD_PATH before
- * suffixing its last component.
- *
- * @param buf[out]
- *   Output buffer, should be large enough otherwise NULL is returned.
- * @param size
- *   Size of @p out.
- *
- * @return
- *   Pointer to @p buf or @p NULL in case suffix cannot be appended.
- */
-static char *
-mlx5_glue_path(char *buf, size_t size)
-{
-	static const char *const bad[] = { "/", ".", "..", NULL };
-	const char *path = RTE_EAL_PMD_PATH;
-	size_t len = strlen(path);
-	size_t off;
-	int i;
-
-	while (len && path[len - 1] == '/')
-		--len;
-	for (off = len; off && path[off - 1] != '/'; --off)
-		;
-	for (i = 0; bad[i]; ++i)
-		if (!strncmp(path + off, bad[i], (int)(len - off)))
-			goto error;
-	i = snprintf(buf, size, "%.*s-glue", (int)len, path);
-	if (i == -1 || (size_t)i >= size)
-		goto error;
-	return buf;
-error:
-	DRV_LOG(ERR,
-		"unable to append \"-glue\" to last component of"
-		" RTE_EAL_PMD_PATH (\"" RTE_EAL_PMD_PATH "\"),"
-		" please re-configure DPDK");
-	return NULL;
-}
-
-/**
- * Initialization routine for run-time dependency on rdma-core.
- */
-static int
-mlx5_glue_init(void)
-{
-	char glue_path[sizeof(RTE_EAL_PMD_PATH) - 1 + sizeof("-glue")];
-	const char *path[] = {
-		/*
-		 * A basic security check is necessary before trusting
-		 * MLX5_GLUE_PATH, which may override RTE_EAL_PMD_PATH.
-		 */
-		(geteuid() == getuid() && getegid() == getgid() ?
-		 getenv("MLX5_GLUE_PATH") : NULL),
-		/*
-		 * When RTE_EAL_PMD_PATH is set, use its glue-suffixed
-		 * variant, otherwise let dlopen() look up libraries on its
-		 * own.
-		 */
-		(*RTE_EAL_PMD_PATH ?
-		 mlx5_glue_path(glue_path, sizeof(glue_path)) : ""),
-	};
-	unsigned int i = 0;
-	void *handle = NULL;
-	void **sym;
-	const char *dlmsg;
-
-	while (!handle && i != RTE_DIM(path)) {
-		const char *end;
-		size_t len;
-		int ret;
-
-		if (!path[i]) {
-			++i;
-			continue;
-		}
-		end = strpbrk(path[i], ":;");
-		if (!end)
-			end = path[i] + strlen(path[i]);
-		len = end - path[i];
-		ret = 0;
-		do {
-			char name[ret + 1];
-
-			ret = snprintf(name, sizeof(name), "%.*s%s" MLX5_GLUE,
-				       (int)len, path[i],
-				       (!len || *(end - 1) == '/') ? "" : "/");
-			if (ret == -1)
-				break;
-			if (sizeof(name) != (size_t)ret + 1)
-				continue;
-			DRV_LOG(DEBUG, "looking for rdma-core glue as \"%s\"",
-				name);
-			handle = dlopen(name, RTLD_LAZY);
-			break;
-		} while (1);
-		path[i] = end + 1;
-		if (!*end)
-			++i;
-	}
-	if (!handle) {
-		rte_errno = EINVAL;
-		dlmsg = dlerror();
-		if (dlmsg)
-			DRV_LOG(WARNING, "cannot load glue library: %s", dlmsg);
-		goto glue_error;
-	}
-	sym = dlsym(handle, "mlx5_glue");
-	if (!sym || !*sym) {
-		rte_errno = EINVAL;
-		dlmsg = dlerror();
-		if (dlmsg)
-			DRV_LOG(ERR, "cannot resolve glue symbol: %s", dlmsg);
-		goto glue_error;
-	}
-	mlx5_glue = *sym;
-	return 0;
-glue_error:
-	if (handle)
-		dlclose(handle);
-	DRV_LOG(WARNING,
-		"cannot initialize PMD due to missing run-time dependency on"
-		" rdma-core libraries (libibverbs, libmlx5)");
-	return -rte_errno;
-}
-
-#endif
-
 /**
  * Driver initialization routine.
  */
@@ -3640,43 +3507,8 @@ struct mlx5_flow_id_pool *
 	mlx5_set_ptype_table();
 	mlx5_set_cksum_table();
 	mlx5_set_swp_types_table();
-	/*
-	 * RDMAV_HUGEPAGES_SAFE tells ibv_fork_init() we intend to use
-	 * huge pages. Calling ibv_fork_init() during init allows
-	 * applications to use fork() safely for purposes other than
-	 * using this PMD, which is not supported in forked processes.
-	 */
-	setenv("RDMAV_HUGEPAGES_SAFE", "1", 1);
-	/* Match the size of Rx completion entry to the size of a cacheline. */
-	if (RTE_CACHE_LINE_SIZE == 128)
-		setenv("MLX5_CQE_SIZE", "128", 0);
-	/*
-	 * MLX5_DEVICE_FATAL_CLEANUP tells ibv_destroy functions to
-	 * cleanup all the Verbs resources even when the device was removed.
-	 */
-	setenv("MLX5_DEVICE_FATAL_CLEANUP", "1", 1);
-#ifdef RTE_IBVERBS_LINK_DLOPEN
-	if (mlx5_glue_init())
-		return;
-	assert(mlx5_glue);
-#endif
-#ifndef NDEBUG
-	/* Glue structure must not contain any NULL pointers. */
-	{
-		unsigned int i;
-
-		for (i = 0; i != sizeof(*mlx5_glue) / sizeof(void *); ++i)
-			assert(((const void *const *)mlx5_glue)[i]);
-	}
-#endif
-	if (strcmp(mlx5_glue->version, MLX5_GLUE_VERSION)) {
-		DRV_LOG(ERR,
-			"rdma-core glue \"%s\" mismatch: \"%s\" is required",
-			mlx5_glue->version, MLX5_GLUE_VERSION);
-		return;
-	}
-	mlx5_glue->fork_init();
-	rte_pci_register(&mlx5_driver);
+	if (mlx5_glue)
+		rte_pci_register(&mlx5_driver);
 }
 
 RTE_PMD_EXPORT_NAME(net_mlx5, __COUNTER__);
-- 
1.8.3.1
* [dpdk-dev] [PATCH v1 04/38] mlx5: share mlx5 PCI device detection
  2020-01-20 17:02 [dpdk-dev] [PATCH v1 00/38] Introduce mlx5 vDPA driver Matan Azrad
                   ` (2 preceding siblings ...)
  2020-01-20 17:02 ` [dpdk-dev] [PATCH v1 03/38] mlx5: share the mlx5 glue reference Matan Azrad
@ 2020-01-20 17:02 ` Matan Azrad
  2020-01-20 17:02 ` [dpdk-dev] [PATCH v1 05/38] mlx5: share mlx5 devices information Matan Azrad
                   ` (35 subsequent siblings)
  39 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-20 17:02 UTC (permalink / raw)
  To: dev; +Cc: Maxime Coquelin, Thomas Monjalon
Move mlx5_dev_to_pci_addr(), which finds the PCI address of an IB device
from its sysfs path, from the mlx5 PMD to the common code.
Signed-off-by: Matan Azrad <matan@mellanox.com>
---
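A minimal sketch of the parsing step inside mlx5_dev_to_pci_addr(): one
line of the sysfs "uevent" file is matched against the PCI_SLOT_NAME
entry. The struct and function names below are hypothetical stand-ins
(struct rte_pci_addr is not used, so that the sketch builds without
DPDK), and the sample address in the usage note is made up.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for struct rte_pci_addr (same four fields). */
struct pci_addr_sketch {
	uint32_t domain;
	uint8_t bus;
	uint8_t devid;
	uint8_t function;
};

/*
 * Parse one uevent line. Return 0 when the line is a complete
 * PCI_SLOT_NAME entry, -1 otherwise (addr may then be partially written).
 */
static int
parse_pci_slot_name(const char *line, struct pci_addr_sketch *addr)
{
	if (sscanf(line,
		   "PCI_SLOT_NAME="
		   "%" SCNx32 ":%" SCNx8 ":%" SCNx8 ".%" SCNx8,
		   &addr->domain, &addr->bus,
		   &addr->devid, &addr->function) == 4)
		return 0;
	return -1;
}
```

For a line such as "PCI_SLOT_NAME=0000:3b:00.1" the sketch fills
domain 0, bus 0x3b, devid 0 and function 1; any other uevent line, for
example "DRIVER=mlx5_core", is rejected.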
 drivers/common/mlx5/Makefile                    |  2 +-
 drivers/common/mlx5/mlx5_common.c               | 55 +++++++++++++++++++++++++
 drivers/common/mlx5/mlx5_common.h               |  4 ++
 drivers/common/mlx5/rte_common_mlx5_version.map |  2 +
 drivers/net/mlx5/mlx5.c                         |  1 +
 drivers/net/mlx5/mlx5.h                         |  2 -
 drivers/net/mlx5/mlx5_ethdev.c                  | 53 +-----------------------
 drivers/net/mlx5/mlx5_rxtx.c                    |  1 +
 drivers/net/mlx5/mlx5_stats.c                   |  3 ++
 9 files changed, 68 insertions(+), 55 deletions(-)
diff --git a/drivers/common/mlx5/Makefile b/drivers/common/mlx5/Makefile
index b94d3c0..66585b2 100644
--- a/drivers/common/mlx5/Makefile
+++ b/drivers/common/mlx5/Makefile
@@ -41,7 +41,7 @@ else
 LDLIBS += -libverbs -lmlx5
 endif
 
-LDLIBS += -lrte_eal
+LDLIBS += -lrte_eal -lrte_pci
 
 # A few warnings cannot be avoided in external headers.
 CFLAGS += -Wno-error=cast-qual -DNDEBUG -UPEDANTIC
diff --git a/drivers/common/mlx5/mlx5_common.c b/drivers/common/mlx5/mlx5_common.c
index 9c88a63..2381208 100644
--- a/drivers/common/mlx5/mlx5_common.c
+++ b/drivers/common/mlx5/mlx5_common.c
@@ -5,6 +5,9 @@
 #include <dlfcn.h>
 #include <unistd.h>
 #include <string.h>
+#include <stdio.h>
+#include <errno.h>
+#include <inttypes.h>
 
 #include <rte_errno.h>
 
@@ -16,6 +19,58 @@
 int mlx5_common_logtype;
 
 
+/**
+ * Get PCI information by sysfs device path.
+ *
+ * @param dev_path
+ *   Pointer to device sysfs folder name.
+ * @param[out] pci_addr
+ *   PCI bus address output buffer.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_dev_to_pci_addr(const char *dev_path,
+		     struct rte_pci_addr *pci_addr)
+{
+	FILE *file;
+	char line[32];
+	int ret = -ENOENT;
+	MKSTR(path, "%s/device/uevent", dev_path);
+
+	file = fopen(path, "rb");
+	if (file == NULL) {
+		rte_errno = errno;
+		return -rte_errno;
+	}
+	while (fgets(line, sizeof(line), file) == line) {
+		size_t len = strlen(line);
+		int c;
+
+		/* Truncate long lines. */
+		if (len == (sizeof(line) - 1))
+			while (line[(len - 1)] != '\n') {
+				c = fgetc(file);
+				if (c == EOF)
+					break;
+				line[(len - 1)] = c;
+			}
+		/* Extract information. */
+		if (sscanf(line, "PCI_SLOT_NAME="
+			   "%" SCNx32 ":%" SCNx8 ":%" SCNx8 ".%" SCNx8 "\n",
+			   &pci_addr->domain, &pci_addr->bus,
+			   &pci_addr->devid, &pci_addr->function) == 4) {
+			ret = 0;
+			break;
+		}
+	}
+	fclose(file);
+	if (ret)
+		rte_errno = -ret;
+	return ret;
+}
+
 #ifdef RTE_IBVERBS_LINK_DLOPEN
 
 /**
diff --git a/drivers/common/mlx5/mlx5_common.h b/drivers/common/mlx5/mlx5_common.h
index 9f10def..107ab8d 100644
--- a/drivers/common/mlx5/mlx5_common.h
+++ b/drivers/common/mlx5/mlx5_common.h
@@ -6,7 +6,9 @@
 #define RTE_PMD_MLX5_COMMON_H_
 
 #include <assert.h>
+#include <stdio.h>
 
+#include <rte_pci.h>
 #include <rte_log.h>
 
 
@@ -84,4 +86,6 @@
 	\
 	snprintf(name, sizeof(name), "" __VA_ARGS__)
 
+int mlx5_dev_to_pci_addr(const char *dev_path, struct rte_pci_addr *pci_addr);
+
 #endif /* RTE_PMD_MLX5_COMMON_H_ */
diff --git a/drivers/common/mlx5/rte_common_mlx5_version.map b/drivers/common/mlx5/rte_common_mlx5_version.map
index e4f85e2..0c01172 100644
--- a/drivers/common/mlx5/rte_common_mlx5_version.map
+++ b/drivers/common/mlx5/rte_common_mlx5_version.map
@@ -17,4 +17,6 @@ DPDK_20.02 {
 	mlx5_devx_cmd_qp_query_tis_td;
 	mlx5_devx_cmd_query_hca_attr;
 	mlx5_devx_get_out_command_status;
+
+	mlx5_dev_to_pci_addr;
 };
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index f3cb19d..75175c9 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -39,6 +39,7 @@
 
 #include <mlx5_glue.h>
 #include <mlx5_devx_cmds.h>
+#include <mlx5_common.h>
 
 #include "mlx5_defs.h"
 #include "mlx5.h"
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 29c0a06..4126284 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -653,8 +653,6 @@ int mlx5_dev_get_flow_ctrl(struct rte_eth_dev *dev,
 			   struct rte_eth_fc_conf *fc_conf);
 int mlx5_dev_set_flow_ctrl(struct rte_eth_dev *dev,
 			   struct rte_eth_fc_conf *fc_conf);
-int mlx5_dev_to_pci_addr(const char *dev_path,
-			 struct rte_pci_addr *pci_addr);
 void mlx5_dev_link_status_handler(void *arg);
 void mlx5_dev_interrupt_handler(void *arg);
 void mlx5_dev_interrupt_handler_devx(void *arg);
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index eddf888..2628e64 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -38,6 +38,7 @@
 
 #include <mlx5_glue.h>
 #include <mlx5_devx_cmds.h>
+#include <mlx5_common.h>
 
 #include "mlx5.h"
 #include "mlx5_rxtx.h"
@@ -1212,58 +1213,6 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 }
 
 /**
- * Get PCI information by sysfs device path.
- *
- * @param dev_path
- *   Pointer to device sysfs folder name.
- * @param[out] pci_addr
- *   PCI bus address output buffer.
- *
- * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
- */
-int
-mlx5_dev_to_pci_addr(const char *dev_path,
-		     struct rte_pci_addr *pci_addr)
-{
-	FILE *file;
-	char line[32];
-	MKSTR(path, "%s/device/uevent", dev_path);
-
-	file = fopen(path, "rb");
-	if (file == NULL) {
-		rte_errno = errno;
-		return -rte_errno;
-	}
-	while (fgets(line, sizeof(line), file) == line) {
-		size_t len = strlen(line);
-		int ret;
-
-		/* Truncate long lines. */
-		if (len == (sizeof(line) - 1))
-			while (line[(len - 1)] != '\n') {
-				ret = fgetc(file);
-				if (ret == EOF)
-					break;
-				line[(len - 1)] = ret;
-			}
-		/* Extract information. */
-		if (sscanf(line,
-			   "PCI_SLOT_NAME="
-			   "%" SCNx32 ":%" SCNx8 ":%" SCNx8 ".%" SCNx8 "\n",
-			   &pci_addr->domain,
-			   &pci_addr->bus,
-			   &pci_addr->devid,
-			   &pci_addr->function) == 4) {
-			ret = 0;
-			break;
-		}
-	}
-	fclose(file);
-	return 0;
-}
-
-/**
  * Handle asynchronous removal event for entire multiport device.
  *
  * @param sh
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index a845f67..95e0a99 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -30,6 +30,7 @@
 
 #include <mlx5_devx_cmds.h>
 #include <mlx5_prm.h>
+#include <mlx5_common.h>
 
 #include "mlx5_defs.h"
 #include "mlx5.h"
diff --git a/drivers/net/mlx5/mlx5_stats.c b/drivers/net/mlx5/mlx5_stats.c
index 0ed7170..4c69e77 100644
--- a/drivers/net/mlx5/mlx5_stats.c
+++ b/drivers/net/mlx5/mlx5_stats.c
@@ -13,10 +13,13 @@
 #include <rte_common.h>
 #include <rte_malloc.h>
 
+#include <mlx5_common.h>
+
 #include "mlx5_defs.h"
 #include "mlx5.h"
 #include "mlx5_rxtx.h"
 
+
 static const struct mlx5_counter_ctrl mlx5_counters_init[] = {
 	{
 		.dpdk_name = "rx_port_unicast_bytes",
-- 
1.8.3.1
* [dpdk-dev] [PATCH v1 05/38] mlx5: share mlx5 devices information
  2020-01-20 17:02 [dpdk-dev] [PATCH v1 00/38] Introduce mlx5 vDPA driver Matan Azrad
                   ` (3 preceding siblings ...)
  2020-01-20 17:02 ` [dpdk-dev] [PATCH v1 04/38] mlx5: share mlx5 PCI device detection Matan Azrad
@ 2020-01-20 17:02 ` Matan Azrad
  2020-01-20 17:02 ` [dpdk-dev] [PATCH v1 06/38] drivers: introduce mlx5 vDPA driver Matan Azrad
                   ` (34 subsequent siblings)
  39 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-20 17:02 UTC (permalink / raw)
  To: dev; +Cc: Maxime Coquelin, Thomas Monjalon
Move the Mellanox PCI vendor ID and the mlx5 PCI device IDs from the
mlx5 PMD header (mlx5.h) to the common mlx5 header (mlx5_common.h).
Signed-off-by: Matan Azrad <matan@mellanox.com>
---
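A small sketch of why the IDs are shared: once they live in the common
header, a second driver can build its own ID table from the same
constants. The enum values are copied from this patch; the reduced
struct, the table and the lookup helper are hypothetical and only stand
in for struct rte_pci_id and the PCI bus matching code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the enums this patch moves to mlx5_common.h. */
enum { PCI_VENDOR_ID_MELLANOX = 0x15b3 };
enum {
	PCI_DEVICE_ID_MELLANOX_CONNECTX5BF = 0xa2d2,
	PCI_DEVICE_ID_MELLANOX_CONNECTX6DX = 0x101d,
	PCI_DEVICE_ID_MELLANOX_CONNECTX6DXVF = 0x101e,
};

/* Hypothetical, reduced stand-in for struct rte_pci_id. */
struct pci_id_sketch {
	uint16_t vendor_id;
	uint16_t device_id;
};

/* Hypothetical table of a second driver; vendor_id 0 terminates it. */
static const struct pci_id_sketch id_map_sketch[] = {
	{ PCI_VENDOR_ID_MELLANOX, PCI_DEVICE_ID_MELLANOX_CONNECTX5BF },
	{ PCI_VENDOR_ID_MELLANOX, PCI_DEVICE_ID_MELLANOX_CONNECTX6DX },
	{ PCI_VENDOR_ID_MELLANOX, PCI_DEVICE_ID_MELLANOX_CONNECTX6DXVF },
	{ 0, 0 },
};

/* Return true when the (vendor, device) pair is listed in the table. */
static bool
id_map_has(const struct pci_id_sketch *map, uint16_t vendor, uint16_t device)
{
	for (; map->vendor_id != 0; ++map)
		if (map->vendor_id == vendor && map->device_id == device)
			return true;
	return false;
}
```

With this table, a ConnectX-6DX PF (0x15b3:0x101d) is found, while a
ConnectX-4 (0x15b3:0x1013) is not, because the hypothetical table does
not list it.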
 drivers/common/mlx5/mlx5_common.h | 21 +++++++++++++++++++++
 drivers/net/mlx5/mlx5.h           | 21 ---------------------
 drivers/net/mlx5/mlx5_txq.c       |  1 +
 3 files changed, 22 insertions(+), 21 deletions(-)
diff --git a/drivers/common/mlx5/mlx5_common.h b/drivers/common/mlx5/mlx5_common.h
index 107ab8d..0f57a27 100644
--- a/drivers/common/mlx5/mlx5_common.h
+++ b/drivers/common/mlx5/mlx5_common.h
@@ -86,6 +86,27 @@
 	\
 	snprintf(name, sizeof(name), "" __VA_ARGS__)
 
+enum {
+	PCI_VENDOR_ID_MELLANOX = 0x15b3,
+};
+
+enum {
+	PCI_DEVICE_ID_MELLANOX_CONNECTX4 = 0x1013,
+	PCI_DEVICE_ID_MELLANOX_CONNECTX4VF = 0x1014,
+	PCI_DEVICE_ID_MELLANOX_CONNECTX4LX = 0x1015,
+	PCI_DEVICE_ID_MELLANOX_CONNECTX4LXVF = 0x1016,
+	PCI_DEVICE_ID_MELLANOX_CONNECTX5 = 0x1017,
+	PCI_DEVICE_ID_MELLANOX_CONNECTX5VF = 0x1018,
+	PCI_DEVICE_ID_MELLANOX_CONNECTX5EX = 0x1019,
+	PCI_DEVICE_ID_MELLANOX_CONNECTX5EXVF = 0x101a,
+	PCI_DEVICE_ID_MELLANOX_CONNECTX5BF = 0xa2d2,
+	PCI_DEVICE_ID_MELLANOX_CONNECTX5BFVF = 0xa2d3,
+	PCI_DEVICE_ID_MELLANOX_CONNECTX6 = 0x101b,
+	PCI_DEVICE_ID_MELLANOX_CONNECTX6VF = 0x101c,
+	PCI_DEVICE_ID_MELLANOX_CONNECTX6DX = 0x101d,
+	PCI_DEVICE_ID_MELLANOX_CONNECTX6DXVF = 0x101e,
+};
+
 int mlx5_dev_to_pci_addr(const char *dev_path, struct rte_pci_addr *pci_addr);
 
 #endif /* RTE_PMD_MLX5_COMMON_H_ */
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 4126284..f6488e1 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -41,27 +41,6 @@
 #include "mlx5_mr.h"
 #include "mlx5_autoconf.h"
 
-enum {
-	PCI_VENDOR_ID_MELLANOX = 0x15b3,
-};
-
-enum {
-	PCI_DEVICE_ID_MELLANOX_CONNECTX4 = 0x1013,
-	PCI_DEVICE_ID_MELLANOX_CONNECTX4VF = 0x1014,
-	PCI_DEVICE_ID_MELLANOX_CONNECTX4LX = 0x1015,
-	PCI_DEVICE_ID_MELLANOX_CONNECTX4LXVF = 0x1016,
-	PCI_DEVICE_ID_MELLANOX_CONNECTX5 = 0x1017,
-	PCI_DEVICE_ID_MELLANOX_CONNECTX5VF = 0x1018,
-	PCI_DEVICE_ID_MELLANOX_CONNECTX5EX = 0x1019,
-	PCI_DEVICE_ID_MELLANOX_CONNECTX5EXVF = 0x101a,
-	PCI_DEVICE_ID_MELLANOX_CONNECTX5BF = 0xa2d2,
-	PCI_DEVICE_ID_MELLANOX_CONNECTX5BFVF = 0xa2d3,
-	PCI_DEVICE_ID_MELLANOX_CONNECTX6 = 0x101b,
-	PCI_DEVICE_ID_MELLANOX_CONNECTX6VF = 0x101c,
-	PCI_DEVICE_ID_MELLANOX_CONNECTX6DX = 0x101d,
-	PCI_DEVICE_ID_MELLANOX_CONNECTX6DXVF = 0x101e,
-};
-
 /* Request types for IPC. */
 enum mlx5_mp_req_type {
 	MLX5_MP_REQ_VERBS_CMD_FD = 1,
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 1d2ba8a..7bff769 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -30,6 +30,7 @@
 
 #include <mlx5_glue.h>
 #include <mlx5_devx_cmds.h>
+#include <mlx5_common.h>
 
 #include "mlx5_defs.h"
 #include "mlx5_utils.h"
-- 
1.8.3.1
* [dpdk-dev] [PATCH v1 06/38] drivers: introduce mlx5 vDPA driver
  2020-01-20 17:02 [dpdk-dev] [PATCH v1 00/38] Introduce mlx5 vDPA driver Matan Azrad
                   ` (4 preceding siblings ...)
  2020-01-20 17:02 ` [dpdk-dev] [PATCH v1 05/38] mlx5: share mlx5 devices information Matan Azrad
@ 2020-01-20 17:02 ` Matan Azrad
  2020-01-20 17:02 ` [dpdk-dev] [PATCH v1 07/38] common/mlx5: expose vDPA DevX capabilities Matan Azrad
                   ` (33 subsequent siblings)
  39 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-20 17:02 UTC (permalink / raw)
  To: dev; +Cc: Maxime Coquelin, Thomas Monjalon
Add a new driver to support vDPA operations on Mellanox devices.
The first Mellanox devices to support vDPA operations are the
ConnectX-6DX and BlueField-1 HCAs, on both their PF and VF ports.
Like the mlx5 PMD, this driver depends on rdma-core; it will also use
mlx5 DevX to create HW objects directly through the FW.
Hence, the mlx5_vdpa driver is linked against the common/mlx5 library.
Because of these dependencies, the driver is not compiled by default.
Register a new log type for this driver.
Signed-off-by: Matan Azrad <matan@mellanox.com>
---
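The new guide describes the dlopen mode, where the glue plug-in is
looked up in a directory named after CONFIG_RTE_EAL_PMD_PATH with a
"-glue" suffix. A simplified sketch of that name derivation follows; it
is modelled on the idea behind mlx5_glue_path() but is not the same
code, and the function name and the sample path in the usage note are
hypothetical.

```c
#include <stdio.h>
#include <string.h>

/*
 * Append "-glue" to the last component of pmd_path, ignoring trailing
 * slashes. Return buf, or NULL when the last component is empty, "." or
 * "..", or when buf is too small.
 */
static char *
glue_dir_sketch(const char *pmd_path, char *buf, size_t size)
{
	size_t len = strlen(pmd_path);
	size_t off;
	int n;

	/* Drop trailing slashes. */
	while (len && pmd_path[len - 1] == '/')
		--len;
	/* Find the start of the last path component. */
	for (off = len; off && pmd_path[off - 1] != '/'; --off)
		;
	if (len == off ||
	    (len - off == 1 && pmd_path[off] == '.') ||
	    (len - off == 2 && !strncmp(pmd_path + off, "..", 2)))
		return NULL;
	n = snprintf(buf, size, "%.*s-glue", (int)len, pmd_path);
	if (n < 0 || (size_t)n >= size)
		return NULL;
	return buf;
}
```

For a PMD path of "/usr/lib/dpdk-pmds/" the sketch yields
"/usr/lib/dpdk-pmds-glue"; paths such as "/" or "/usr/.." are rejected
because no meaningful last component can take the suffix.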
 MAINTAINERS                                     |   8 +
 config/common_base                              |   5 +
 doc/guides/rel_notes/release_20_02.rst          |   5 +
 doc/guides/vdpadevs/features/mlx5.ini           |  14 ++
 doc/guides/vdpadevs/index.rst                   |   1 +
 doc/guides/vdpadevs/mlx5.rst                    |  89 +++++++++++
 drivers/common/Makefile                         |   2 +-
 drivers/common/mlx5/Makefile                    |  15 +-
 drivers/meson.build                             |   8 +-
 drivers/vdpa/Makefile                           |   2 +
 drivers/vdpa/meson.build                        |   3 +-
 drivers/vdpa/mlx5/Makefile                      |  36 +++++
 drivers/vdpa/mlx5/meson.build                   |  29 ++++
 drivers/vdpa/mlx5/mlx5_vdpa.c                   | 201 ++++++++++++++++++++++++
 drivers/vdpa/mlx5/mlx5_vdpa_utils.h             |  20 +++
 drivers/vdpa/mlx5/rte_pmd_mlx5_vdpa_version.map |   3 +
 mk/rte.app.mk                                   |  15 +-
 17 files changed, 440 insertions(+), 16 deletions(-)
 create mode 100644 doc/guides/vdpadevs/features/mlx5.ini
 create mode 100644 doc/guides/vdpadevs/mlx5.rst
 create mode 100644 drivers/vdpa/mlx5/Makefile
 create mode 100644 drivers/vdpa/mlx5/meson.build
 create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa.c
 create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_utils.h
 create mode 100644 drivers/vdpa/mlx5/rte_pmd_mlx5_vdpa_version.map
diff --git a/MAINTAINERS b/MAINTAINERS
index 4b0d524..8137a69 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -742,6 +742,7 @@ F: buildtools/options-ibverbs-static.sh
 F: doc/guides/nics/mlx5.rst
 F: doc/guides/nics/features/mlx5.ini
 
+
 Microsoft vdev_netvsc - EXPERIMENTAL
 M: Matan Azrad <matan@mellanox.com>
 F: drivers/net/vdev_netvsc/
@@ -1102,6 +1103,13 @@ F: drivers/vdpa/ifc/
 F: doc/guides/vdpadevs/ifc.rst
 F: doc/guides/vdpadevs/features/ifcvf.ini
 
+Mellanox mlx5 vDPA
+M: Matan Azrad <matan@mellanox.com>
+M: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
+F: drivers/vdpa/mlx5/
+F: doc/guides/vdpadevs/mlx5.rst
+F: doc/guides/vdpadevs/features/mlx5.ini
+
 
 Eventdev Drivers
 ----------------
diff --git a/config/common_base b/config/common_base
index 3632fd8..80fb38f 100644
--- a/config/common_base
+++ b/config/common_base
@@ -366,6 +366,11 @@ CONFIG_RTE_LIBRTE_MLX4_DEBUG=n
 CONFIG_RTE_LIBRTE_MLX5_PMD=n
 CONFIG_RTE_LIBRTE_MLX5_DEBUG=n
 
+#
+# Compile vdpa-oriented Mellanox ConnectX-6 & Bluefield (MLX5) PMD
+#
+CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD=n
+
 # Linking method for mlx4/5 dependency on ibverbs and related libraries
 # Default linking is dynamic by linker.
 # Other options are: dynamic by dlopen at run-time, or statically embedded.
diff --git a/doc/guides/rel_notes/release_20_02.rst b/doc/guides/rel_notes/release_20_02.rst
index 933b9e5..1bc94ba 100644
--- a/doc/guides/rel_notes/release_20_02.rst
+++ b/doc/guides/rel_notes/release_20_02.rst
@@ -143,6 +143,11 @@ New Features
   Added a new OCTEON TX2 rawdev PMD for End Point mode of operation.
   See the :doc:`../rawdevs/octeontx2_ep` for more details on this new PMD.
 
+* **Added a new vDPA PMD for Mellanox devices.**
+
+  Added a new Mellanox vDPA (``mlx5_vdpa``) PMD.
+  See the :doc:`../vdpadevs/mlx5` guide for more details on this driver.
+
 
 Removed Items
 -------------
diff --git a/doc/guides/vdpadevs/features/mlx5.ini b/doc/guides/vdpadevs/features/mlx5.ini
new file mode 100644
index 0000000..d635bdf
--- /dev/null
+++ b/doc/guides/vdpadevs/features/mlx5.ini
@@ -0,0 +1,14 @@
+;
+; Supported features of the 'mlx5' VDPA driver.
+;
+; Refer to default.ini for the full list of available driver features.
+;
+[Features]
+Other kdrv           = Y
+ARMv8                = Y
+Power8               = Y
+x86-32               = Y
+x86-64               = Y
+Usage doc            = Y
+Design doc           = Y
+
diff --git a/doc/guides/vdpadevs/index.rst b/doc/guides/vdpadevs/index.rst
index 9657108..1a13efe 100644
--- a/doc/guides/vdpadevs/index.rst
+++ b/doc/guides/vdpadevs/index.rst
@@ -13,3 +13,4 @@ which can be used from an application through vhost API.
 
     features_overview
     ifc
+    mlx5
diff --git a/doc/guides/vdpadevs/mlx5.rst b/doc/guides/vdpadevs/mlx5.rst
new file mode 100644
index 0000000..6e0560c
--- /dev/null
+++ b/doc/guides/vdpadevs/mlx5.rst
@@ -0,0 +1,89 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright 2019 Mellanox Technologies, Ltd
+
+MLX5 vDPA driver
+================
+
+The MLX5 vDPA (vhost data path acceleration) driver library
+(**librte_pmd_mlx5_vdpa**) provides support for **Mellanox ConnectX-6**,
+**Mellanox ConnectX-6DX** and **Mellanox BlueField** families of
+10/25/40/50/100/200 Gb/s adapters as well as their virtual functions (VF) in
+SR-IOV context.
+
+.. note::
+
+   Due to external dependencies, this driver is disabled in the default
+   configuration of the "make" build. It can be enabled with
+   ``CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD=y`` or by using the "meson" build system,
+   which will detect dependencies.
+
+Compilation options
+~~~~~~~~~~~~~~~~~~~
+
+These options can be modified in the ``.config`` file.
+
+- ``CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD`` (default **n**)
+
+  Toggle compilation of librte_pmd_mlx5_vdpa itself.
+
+- ``CONFIG_RTE_IBVERBS_LINK_DLOPEN`` (default **n**)
+
+  Build PMD with additional code to make it loadable without hard
+  dependencies on **libibverbs** nor **libmlx5**, which may not be installed
+  on the target system.
+
+  In this mode, their presence is still required for it to run properly,
+  however their absence won't prevent a DPDK application from starting (with
+  ``CONFIG_RTE_BUILD_SHARED_LIB`` disabled) and they won't show up as
+  missing with ``ldd(1)``.
+
+  It works by moving these dependencies to a purpose-built rdma-core "glue"
+  plug-in which must either be installed in a directory whose name is based
+  on ``CONFIG_RTE_EAL_PMD_PATH`` suffixed with ``-glue`` if set, or in a
+  standard location for the dynamic linker (e.g. ``/lib``) if left to the
+  default empty string (``""``).
+
+  This option has no performance impact.
+
+- ``CONFIG_RTE_IBVERBS_LINK_STATIC`` (default **n**)
+
+  Embed static flavor of the dependencies **libibverbs** and **libmlx5**
+  in the PMD shared library or the executable static binary.
+
+.. note::
+
+   For BlueField, target should be set to ``arm64-bluefield-linux-gcc``. This
+   will enable ``CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD`` and set
+   ``RTE_CACHE_LINE_SIZE`` to 64. The default armv8a configuration of the make
+   and meson builds sets it to 128, which degrades performance.
+
+Design
+------
+
+For security reasons and robustness, this driver only deals with virtual
+memory addresses. The way resource allocations are handled by the kernel,
+combined with hardware specifications that allow virtual memory addresses
+to be handled directly, ensures that DPDK applications cannot access random
+physical memory (or memory that does not belong to the current process).
+
+The PMD can use libibverbs and libmlx5 to access the device firmware
+or to access the hardware components directly.
+There are different levels of objects and bypassing abilities
+to get the best performance:
+
+- Verbs is a complete high-level generic API
+- Direct Verbs is a device-specific API
+- DevX allows access to firmware objects
+- Direct Rules manages flow steering at low-level hardware layer
+
+Enabling librte_pmd_mlx5_vdpa causes DPDK applications to be linked against
+libibverbs.
+
+Supported NICs
+--------------
+
+* Mellanox(R) ConnectX(R)-6 200G MCX654106A-HCAT (4x200G)
+* Mellanox(R) ConnectX(R)-6DX EN 100G MCX623106AN-CDAT (2x100G)
+* Mellanox(R) ConnectX(R)-6DX EN 200G MCX623105AN-VDAT (1x200G)
+* Mellanox(R) BlueField SmartNIC 25G MBF1M332A-ASCAT (2x25G)
+
diff --git a/drivers/common/Makefile b/drivers/common/Makefile
index 4775d4b..96bd7ac 100644
--- a/drivers/common/Makefile
+++ b/drivers/common/Makefile
@@ -35,7 +35,7 @@ ifneq (,$(findstring y,$(IAVF-y)))
 DIRS-y += iavf
 endif
 
-ifeq ($(CONFIG_RTE_LIBRTE_MLX5_PMD),y)
+ifeq ($(findstring y,$(CONFIG_RTE_LIBRTE_MLX5_PMD)$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD)),y)
 DIRS-y += mlx5
 endif
 
diff --git a/drivers/common/mlx5/Makefile b/drivers/common/mlx5/Makefile
index 66585b2..2ff549e 100644
--- a/drivers/common/mlx5/Makefile
+++ b/drivers/common/mlx5/Makefile
@@ -10,14 +10,15 @@ LIB_GLUE_BASE = librte_pmd_mlx5_glue.so
 LIB_GLUE_VERSION = 20.02.0
 
 # Sources.
+ifeq ($(findstring y,$(CONFIG_RTE_LIBRTE_MLX5_PMD)$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD)),y)
 ifneq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
-SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_glue.c
+SRCS-y += mlx5_glue.c
 endif
-SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_devx_cmds.c
-SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_common.c
-
+SRCS-y += mlx5_devx_cmds.c
+SRCS-y += mlx5_common.c
 ifeq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
-INSTALL-$(CONFIG_RTE_LIBRTE_MLX5_PMD)-lib += $(LIB_GLUE)
+INSTALL-y-lib += $(LIB_GLUE)
+endif
 endif
 
 # Basic CFLAGS.
@@ -301,7 +302,9 @@ mlx5_autoconf.h: mlx5_autoconf.h.new
 		cmp '$<' '$@' $(AUTOCONF_OUTPUT) || \
 		mv '$<' '$@'
 
-$(SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD):.c=.o): mlx5_autoconf.h
+ifeq ($(findstring y,$(CONFIG_RTE_LIBRTE_MLX5_PMD)$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD)),y)
+$(SRCS-y:.c=.o): mlx5_autoconf.h
+endif
 
 # Generate dependency plug-in for rdma-core when the PMD must not be linked
 # directly, so that applications do not inherit this dependency.
diff --git a/drivers/meson.build b/drivers/meson.build
index 29708cc..bd154fa 100644
--- a/drivers/meson.build
+++ b/drivers/meson.build
@@ -42,6 +42,7 @@ foreach class:dpdk_driver_classes
 		build = true # set to false to disable, e.g. missing deps
 		reason = '<unknown reason>' # set if build == false to explain
 		name = drv
+		fmt_name = ''
 		allow_experimental_apis = false
 		sources = []
 		objs = []
@@ -98,8 +99,11 @@ foreach class:dpdk_driver_classes
 		else
 			class_drivers += name
 
-			dpdk_conf.set(config_flag_fmt.format(name.to_upper()),1)
-			lib_name = driver_name_fmt.format(name)
+			if fmt_name == ''
+				fmt_name = name
+			endif
+			dpdk_conf.set(config_flag_fmt.format(fmt_name.to_upper()),1)
+			lib_name = driver_name_fmt.format(fmt_name)
 
 			if allow_experimental_apis
 				cflags += '-DALLOW_EXPERIMENTAL_API'
diff --git a/drivers/vdpa/Makefile b/drivers/vdpa/Makefile
index b5a7a11..6e88359 100644
--- a/drivers/vdpa/Makefile
+++ b/drivers/vdpa/Makefile
@@ -7,4 +7,6 @@ ifeq ($(CONFIG_RTE_EAL_VFIO),y)
 DIRS-$(CONFIG_RTE_LIBRTE_IFC_PMD) += ifc
 endif
 
+DIRS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5
+
 include $(RTE_SDK)/mk/rte.subdir.mk
diff --git a/drivers/vdpa/meson.build b/drivers/vdpa/meson.build
index 2f047b5..e3ed54a 100644
--- a/drivers/vdpa/meson.build
+++ b/drivers/vdpa/meson.build
@@ -1,7 +1,8 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright 2019 Mellanox Technologies, Ltd
 
-drivers = ['ifc']
+drivers = ['ifc',
+	   'mlx5',]
 std_deps = ['bus_pci', 'kvargs']
 std_deps += ['vhost']
 config_flag_fmt = 'RTE_LIBRTE_@0@_PMD'
diff --git a/drivers/vdpa/mlx5/Makefile b/drivers/vdpa/mlx5/Makefile
new file mode 100644
index 0000000..c1c8cc0
--- /dev/null
+++ b/drivers/vdpa/mlx5/Makefile
@@ -0,0 +1,36 @@
+#   SPDX-License-Identifier: BSD-3-Clause
+#   Copyright 2019 Mellanox Technologies, Ltd
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# Library name.
+LIB = librte_pmd_mlx5_vdpa.a
+
+# Sources.
+SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa.c
+
+# Basic CFLAGS.
+CFLAGS += -O3
+CFLAGS += -std=c11 -Wall -Wextra
+CFLAGS += -g
+CFLAGS += -I$(RTE_SDK)/drivers/common/mlx5
+CFLAGS += -I$(RTE_SDK)/drivers/vdpa/mlx5
+CFLAGS += -I$(BUILDDIR)/drivers/common/mlx5
+CFLAGS += -D_BSD_SOURCE
+CFLAGS += -D_DEFAULT_SOURCE
+CFLAGS += -D_XOPEN_SOURCE=600
+CFLAGS += $(WERROR_FLAGS)
+CFLAGS += -Wno-strict-prototypes
+LDLIBS += -lrte_common_mlx5
+LDLIBS += -lrte_eal -lrte_vhost -lrte_kvargs -lrte_bus_pci
+
+# A few warnings cannot be avoided in external headers.
+CFLAGS += -Wno-error=cast-qual
+
+EXPORT_MAP := rte_pmd_mlx5_vdpa_version.map
+# memseg walk is not part of stable API
+CFLAGS += -DALLOW_EXPERIMENTAL_API
+
+CFLAGS += -DNDEBUG -UPEDANTIC
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/vdpa/mlx5/meson.build b/drivers/vdpa/mlx5/meson.build
new file mode 100644
index 0000000..4bca6ea
--- /dev/null
+++ b/drivers/vdpa/mlx5/meson.build
@@ -0,0 +1,29 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2019 Mellanox Technologies, Ltd
+
+if not is_linux
+	build = false
+	reason = 'only supported on Linux'
+	subdir_done()
+endif
+
+fmt_name = 'mlx5_vdpa'
+allow_experimental_apis = true
+deps += ['hash', 'common_mlx5', 'vhost', 'bus_pci', 'eal']
+sources = files(
+	'mlx5_vdpa.c',
+)
+cflags_options = [
+	'-std=c11',
+	'-Wno-strict-prototypes',
+	'-D_BSD_SOURCE',
+	'-D_DEFAULT_SOURCE',
+	'-D_XOPEN_SOURCE=600'
+]
+foreach option:cflags_options
+	if cc.has_argument(option)
+		cflags += option
+	endif
+endforeach
+
+cflags += [ '-DNDEBUG', '-UPEDANTIC' ]
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
new file mode 100644
index 0000000..cb49a32
--- /dev/null
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -0,0 +1,201 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2019 Mellanox Technologies, Ltd
+ */
+#include <rte_malloc.h>
+#include <rte_log.h>
+#include <rte_errno.h>
+#include <rte_bus_pci.h>
+#include <rte_vdpa.h>
+
+#include <mlx5_glue.h>
+#include <mlx5_common.h>
+
+#include "mlx5_vdpa_utils.h"
+
+
+struct mlx5_vdpa_priv {
+	TAILQ_ENTRY(mlx5_vdpa_priv) next;
+	int id; /* vDPA device id. */
+	struct ibv_context *ctx; /* Device context. */
+	struct rte_vdpa_dev_addr dev_addr;
+};
+
+static TAILQ_HEAD(mlx5_vdpa_privs, mlx5_vdpa_priv) priv_list =
+					      TAILQ_HEAD_INITIALIZER(priv_list);
+static pthread_mutex_t priv_list_lock = PTHREAD_MUTEX_INITIALIZER;
+int mlx5_vdpa_logtype;
+
+static struct rte_vdpa_dev_ops mlx5_vdpa_ops = {
+	.get_queue_num = NULL,
+	.get_features = NULL,
+	.get_protocol_features = NULL,
+	.dev_conf = NULL,
+	.dev_close = NULL,
+	.set_vring_state = NULL,
+	.set_features = NULL,
+	.migration_done = NULL,
+	.get_vfio_group_fd = NULL,
+	.get_vfio_device_fd = NULL,
+	.get_notify_area = NULL,
+};
+
+/**
+ * DPDK callback to register a PCI device.
+ *
+ * This function spawns a vDPA device out of a given PCI device.
+ *
+ * @param[in] pci_drv
+ *   PCI driver structure (mlx5_vdpa_driver).
+ * @param[in] pci_dev
+ *   PCI device information.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_vdpa_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
+		    struct rte_pci_device *pci_dev)
+{
+	struct ibv_device **ibv_list;
+	struct ibv_device *ibv_match = NULL;
+	struct mlx5_vdpa_priv *priv = NULL;
+	struct ibv_context *ctx;
+	int ret;
+
+	errno = 0;
+	ibv_list = mlx5_glue->get_device_list(&ret);
+	if (!ibv_list) {
+		rte_errno = errno ? errno : ENOSYS;
+		DRV_LOG(ERR, "Failed to get device list, is ib_uverbs loaded?");
+		return -rte_errno;
+	}
+	while (ret-- > 0) {
+		struct rte_pci_addr pci_addr;
+
+		DRV_LOG(DEBUG, "Checking device \"%s\"..", ibv_list[ret]->name);
+		if (mlx5_dev_to_pci_addr(ibv_list[ret]->ibdev_path, &pci_addr))
+			continue;
+		if (pci_dev->addr.domain != pci_addr.domain ||
+		    pci_dev->addr.bus != pci_addr.bus ||
+		    pci_dev->addr.devid != pci_addr.devid ||
+		    pci_dev->addr.function != pci_addr.function)
+			continue;
+		DRV_LOG(INFO, "PCI information matches for device \"%s\".",
+			ibv_list[ret]->name);
+		ibv_match = ibv_list[ret];
+		break;
+	}
+	if (!ibv_match) {
+		DRV_LOG(ERR, "No matching IB device for PCI slot "
+			"%" PRIx32 ":%" PRIx8 ":%" PRIx8 ".%" PRIx8 ".",
+			pci_dev->addr.domain, pci_dev->addr.bus,
+			pci_dev->addr.devid, pci_dev->addr.function);
+		rte_errno = ENOENT;
+		return -rte_errno;
+	}
+	ctx = mlx5_glue->dv_open_device(ibv_match);
+	if (!ctx) {
+		DRV_LOG(ERR, "Failed to open IB device \"%s\".",
+			ibv_match->name);
+		rte_errno = ENODEV;
+		return -rte_errno;
+	}
+	priv = rte_zmalloc("mlx5 vDPA device private", sizeof(*priv),
+			   RTE_CACHE_LINE_SIZE);
+	if (!priv) {
+		DRV_LOG(ERR, "Failed to allocate private memory.");
+		rte_errno = ENOMEM;
+		goto error;
+	}
+	priv->ctx = ctx;
+	priv->dev_addr.pci_addr = pci_dev->addr;
+	priv->dev_addr.type = PCI_ADDR;
+	priv->id = rte_vdpa_register_device(&priv->dev_addr, &mlx5_vdpa_ops);
+	if (priv->id < 0) {
+		DRV_LOG(ERR, "Failed to register vDPA device.");
+		rte_errno = rte_errno ? rte_errno : EINVAL;
+		goto error;
+	}
+	pthread_mutex_lock(&priv_list_lock);
+	TAILQ_INSERT_TAIL(&priv_list, priv, next);
+	pthread_mutex_unlock(&priv_list_lock);
+	return 0;
+error:
+	rte_free(priv);
+	mlx5_glue->close_device(ctx);
+	return -rte_errno;
+}
+
+/**
+ * DPDK callback to remove a PCI device.
+ *
+ * This function removes the vDPA device belonging to a given PCI device.
+ *
+ * @param[in] pci_dev
+ *   Pointer to the PCI device.
+ *
+ * @return
+ *   0 on success, the function cannot fail.
+ */
+static int
+mlx5_vdpa_pci_remove(struct rte_pci_device *pci_dev __rte_unused)
+{
+	return 0;
+}
+
+static const struct rte_pci_id mlx5_vdpa_pci_id_map[] = {
+	{
+		RTE_PCI_DEVICE(PCI_VENDOR_ID_MELLANOX,
+			       PCI_DEVICE_ID_MELLANOX_CONNECTX5BF)
+	},
+	{
+		RTE_PCI_DEVICE(PCI_VENDOR_ID_MELLANOX,
+			       PCI_DEVICE_ID_MELLANOX_CONNECTX5BFVF)
+	},
+	{
+		RTE_PCI_DEVICE(PCI_VENDOR_ID_MELLANOX,
+				PCI_DEVICE_ID_MELLANOX_CONNECTX6)
+	},
+	{
+		RTE_PCI_DEVICE(PCI_VENDOR_ID_MELLANOX,
+				PCI_DEVICE_ID_MELLANOX_CONNECTX6VF)
+	},
+	{
+		RTE_PCI_DEVICE(PCI_VENDOR_ID_MELLANOX,
+				PCI_DEVICE_ID_MELLANOX_CONNECTX6DX)
+	},
+	{
+		RTE_PCI_DEVICE(PCI_VENDOR_ID_MELLANOX,
+				PCI_DEVICE_ID_MELLANOX_CONNECTX6DXVF)
+	},
+	{
+		.vendor_id = 0
+	}
+};
+
+static struct rte_pci_driver mlx5_vdpa_driver = {
+	.driver = {
+		.name = "mlx5_vdpa",
+	},
+	.id_table = mlx5_vdpa_pci_id_map,
+	.probe = mlx5_vdpa_pci_probe,
+	.remove = mlx5_vdpa_pci_remove,
+	.drv_flags = 0,
+};
+
+/**
+ * Driver initialization routine.
+ */
+RTE_INIT(rte_mlx5_vdpa_init)
+{
+	/* Initialize common log type. */
+	mlx5_vdpa_logtype = rte_log_register("pmd.vdpa.mlx5");
+	if (mlx5_vdpa_logtype >= 0)
+		rte_log_set_level(mlx5_vdpa_logtype, RTE_LOG_NOTICE);
+	if (mlx5_glue)
+		rte_pci_register(&mlx5_vdpa_driver);
+}
+
+RTE_PMD_EXPORT_NAME(net_mlx5_vdpa, __COUNTER__);
+RTE_PMD_REGISTER_PCI_TABLE(net_mlx5_vdpa, mlx5_vdpa_pci_id_map);
+RTE_PMD_REGISTER_KMOD_DEP(net_mlx5_vdpa, "* ib_uverbs & mlx5_core & mlx5_ib");
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_utils.h b/drivers/vdpa/mlx5/mlx5_vdpa_utils.h
new file mode 100644
index 0000000..a239df9
--- /dev/null
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_utils.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2019 Mellanox Technologies, Ltd
+ */
+
+#ifndef RTE_PMD_MLX5_VDPA_UTILS_H_
+#define RTE_PMD_MLX5_VDPA_UTILS_H_
+
+#include <mlx5_common.h>
+
+
+extern int mlx5_vdpa_logtype;
+
+#define MLX5_VDPA_LOG_PREFIX "mlx5_vdpa"
+/* Generic printf()-like logging macro with automatic line feed. */
+#define DRV_LOG(level, ...) \
+	PMD_DRV_LOG_(level, mlx5_vdpa_logtype, MLX5_VDPA_LOG_PREFIX, \
+		__VA_ARGS__ PMD_DRV_LOG_STRIP PMD_DRV_LOG_OPAREN, \
+		PMD_DRV_LOG_CPAREN)
+
+#endif /* RTE_PMD_MLX5_VDPA_UTILS_H_ */
diff --git a/drivers/vdpa/mlx5/rte_pmd_mlx5_vdpa_version.map b/drivers/vdpa/mlx5/rte_pmd_mlx5_vdpa_version.map
new file mode 100644
index 0000000..143836e
--- /dev/null
+++ b/drivers/vdpa/mlx5/rte_pmd_mlx5_vdpa_version.map
@@ -0,0 +1,3 @@
+DPDK_20.02 {
+	local: *;
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index d90f14d..94c9b16 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -196,18 +196,21 @@ endif
 _LDLIBS-$(CONFIG_RTE_LIBRTE_LIO_PMD)        += -lrte_pmd_lio
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_MEMIF)      += -lrte_pmd_memif
 _LDLIBS-$(CONFIG_RTE_LIBRTE_MLX4_PMD)       += -lrte_pmd_mlx4
-_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX5_PMD)       += -lrte_common_mlx5
+ifeq ($(findstring y,$(CONFIG_RTE_LIBRTE_MLX5_PMD)$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD)),y)
+_LDLIBS-y                                   += -lrte_common_mlx5
+endif
 _LDLIBS-$(CONFIG_RTE_LIBRTE_MLX5_PMD)       += -lrte_pmd_mlx5
+_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD)  += -lrte_pmd_mlx5_vdpa
 ifeq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
-_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX4_PMD)       += -ldl
-_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX5_PMD)       += -ldl
+_LDLIBS-y                                   += -ldl
 else ifeq ($(CONFIG_RTE_IBVERBS_LINK_STATIC),y)
 LIBS_IBVERBS_STATIC = $(shell $(RTE_SDK)/buildtools/options-ibverbs-static.sh)
-_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX4_PMD)       += $(LIBS_IBVERBS_STATIC)
-_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX5_PMD)       += $(LIBS_IBVERBS_STATIC)
+_LDLIBS-y                                   += $(LIBS_IBVERBS_STATIC)
 else
+ifeq ($(findstring y,$(CONFIG_RTE_LIBRTE_MLX5_PMD)$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD)),y)
+_LDLIBS-y                                   += -libverbs -lmlx5
+endif
 _LDLIBS-$(CONFIG_RTE_LIBRTE_MLX4_PMD)       += -libverbs -lmlx4
-_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX5_PMD)       += -libverbs -lmlx5
 endif
 _LDLIBS-$(CONFIG_RTE_LIBRTE_MVPP2_PMD)      += -lrte_pmd_mvpp2
 _LDLIBS-$(CONFIG_RTE_LIBRTE_MVNETA_PMD)     += -lrte_pmd_mvneta
-- 
1.8.3.1
* [dpdk-dev] [PATCH v1 07/38] common/mlx5: expose vDPA DevX capabilities
From: Matan Azrad @ 2020-01-20 17:02 UTC (permalink / raw)
  To: dev; +Cc: Maxime Coquelin, Thomas Monjalon
Add the DevX capabilities for vDPA configuration and information of
Mellanox devices.
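The capability fields are read from the command output by bit offset and
width. A minimal sketch of that kind of accessor follows, assuming the
usual PRM convention that offsets count from the most significant bit of
each big-endian dword. The names demo_load_be32 and demo_get_bits are
hypothetical; the driver itself uses the MLX5_GET macros, whose definition
is not shown in this patch.

```c
#include <stdint.h>

/* Load one big-endian 32-bit word from a raw byte buffer. */
static uint32_t
demo_load_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Hypothetical helper: read a field of `width` bits (1..32) starting
 * `bit_off` bits into the buffer. Bit 0 is the most significant bit of
 * the first dword; the field must not cross a dword boundary.
 */
static uint32_t
demo_get_bits(const uint8_t *buf, unsigned int bit_off, unsigned int width)
{
	uint32_t dw = demo_load_be32(buf + (bit_off / 32) * 4);
	unsigned int shift = 32 - (bit_off % 32) - width;
	uint32_t mask = width == 32 ? UINT32_MAX : (1u << width) - 1;

	return (dw >> shift) & mask;
}
```

With the layout added in mlx5_prm.h by this patch, tso_ipv4 would be read
as demo_get_bits(cap, 3, 1) and event_mode as demo_get_bits(cap, 16, 8).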
Signed-off-by: Matan Azrad <matan@mellanox.com>
---
 drivers/common/mlx5/mlx5_devx_cmds.c | 90 ++++++++++++++++++++++++++++++++++++
 drivers/common/mlx5/mlx5_devx_cmds.h | 24 ++++++++++
 drivers/common/mlx5/mlx5_prm.h       | 45 ++++++++++++++++++
 3 files changed, 159 insertions(+)
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index 67e5929..c6965da 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -285,6 +285,91 @@ struct mlx5_devx_obj *
 }
 
 /**
+ * Query NIC vDPA attributes.
+ *
+ * @param[in] ctx
+ *   ibv context returned from mlx5dv_open_device.
+ * @param[out] vdpa_attr
+ *   vDPA Attributes structure to fill.
+ */
+static void
+mlx5_devx_cmd_query_hca_vdpa_attr(struct ibv_context *ctx,
+				  struct mlx5_hca_vdpa_attr *vdpa_attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(query_hca_cap_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(query_hca_cap_out)] = {0};
+	void *hcattr = MLX5_ADDR_OF(query_hca_cap_out, out, capability);
+	int status, syndrome, rc;
+
+	MLX5_SET(query_hca_cap_in, in, opcode, MLX5_CMD_OP_QUERY_HCA_CAP);
+	MLX5_SET(query_hca_cap_in, in, op_mod,
+		 MLX5_GET_HCA_CAP_OP_MOD_VDPA_EMULATION |
+		 MLX5_HCA_CAP_OPMOD_GET_CUR);
+	rc = mlx5_glue->devx_general_cmd(ctx, in, sizeof(in), out, sizeof(out));
+	status = MLX5_GET(query_hca_cap_out, out, status);
+	syndrome = MLX5_GET(query_hca_cap_out, out, syndrome);
+	if (rc || status) {
+		DRV_LOG(DEBUG, "Failed to query devx VDPA capabilities,"
+			" status %x, syndrome = %x", status, syndrome);
+		vdpa_attr->valid = 0;
+	} else {
+		vdpa_attr->valid = 1;
+		vdpa_attr->desc_tunnel_offload_type =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 desc_tunnel_offload_type);
+		vdpa_attr->eth_frame_offload_type =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 eth_frame_offload_type);
+		vdpa_attr->virtio_version_1_0 =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 virtio_version_1_0);
+		vdpa_attr->tso_ipv4 = MLX5_GET(virtio_emulation_cap, hcattr,
+					       tso_ipv4);
+		vdpa_attr->tso_ipv6 = MLX5_GET(virtio_emulation_cap, hcattr,
+					       tso_ipv6);
+		vdpa_attr->tx_csum = MLX5_GET(virtio_emulation_cap, hcattr,
+					      tx_csum);
+		vdpa_attr->rx_csum = MLX5_GET(virtio_emulation_cap, hcattr,
+					      rx_csum);
+		vdpa_attr->event_mode = MLX5_GET(virtio_emulation_cap, hcattr,
+						 event_mode);
+		vdpa_attr->virtio_queue_type =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 virtio_queue_type);
+		vdpa_attr->log_doorbell_stride =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 log_doorbell_stride);
+		vdpa_attr->log_doorbell_bar_size =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 log_doorbell_bar_size);
+		vdpa_attr->doorbell_bar_offset =
+			MLX5_GET64(virtio_emulation_cap, hcattr,
+				   doorbell_bar_offset);
+		vdpa_attr->max_num_virtio_queues =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 max_num_virtio_queues);
+		vdpa_attr->umem_1_buffer_param_a =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 umem_1_buffer_param_a);
+		vdpa_attr->umem_1_buffer_param_b =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 umem_1_buffer_param_b);
+		vdpa_attr->umem_2_buffer_param_a =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 umem_2_buffer_param_a);
+		vdpa_attr->umem_2_buffer_param_b =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 umem_2_buffer_param_b);
+		vdpa_attr->umem_3_buffer_param_a =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 umem_3_buffer_param_a);
+		vdpa_attr->umem_3_buffer_param_b =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 umem_3_buffer_param_b);
+	}
+}
+
+/**
  * Query HCA attributes.
  * Using those attributes we can check on run time if the device
  * is having the required capabilities.
@@ -343,6 +428,9 @@ struct mlx5_devx_obj *
 	attr->flex_parser_protocols = MLX5_GET(cmd_hca_cap, hcattr,
 					       flex_parser_protocols);
 	attr->qos.sup = MLX5_GET(cmd_hca_cap, hcattr, qos);
+	attr->vdpa.valid = !!(MLX5_GET64(cmd_hca_cap, hcattr,
+					 general_obj_types) &
+			      MLX5_GENERAL_OBJ_TYPES_CAP_VIRTQ_NET_Q);
 	if (attr->qos.sup) {
 		MLX5_SET(query_hca_cap_in, in, op_mod,
 			 MLX5_GET_HCA_CAP_OP_MOD_QOS_CAP |
@@ -365,6 +453,8 @@ struct mlx5_devx_obj *
 		attr->qos.flow_meter_reg_c_ids =
 			MLX5_GET(qos_cap, hcattr, flow_meter_reg_id);
 	}
+	if (attr->vdpa.valid)
+		mlx5_devx_cmd_query_hca_vdpa_attr(ctx, &attr->vdpa);
 	if (!attr->eth_net_offloads)
 		return 0;
 
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index 0c5afde..dea1597 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -31,6 +31,29 @@ struct mlx5_hca_qos_attr {
 
 };
 
+struct mlx5_hca_vdpa_attr {
+	uint8_t virtio_queue_type;
+	uint32_t valid:1;
+	uint32_t desc_tunnel_offload_type:1;
+	uint32_t eth_frame_offload_type:1;
+	uint32_t virtio_version_1_0:1;
+	uint32_t tso_ipv4:1;
+	uint32_t tso_ipv6:1;
+	uint32_t tx_csum:1;
+	uint32_t rx_csum:1;
+	uint32_t event_mode:3;
+	uint32_t log_doorbell_stride:5;
+	uint32_t log_doorbell_bar_size:5;
+	uint32_t max_num_virtio_queues;
+	uint32_t umem_1_buffer_param_a;
+	uint32_t umem_1_buffer_param_b;
+	uint32_t umem_2_buffer_param_a;
+	uint32_t umem_2_buffer_param_b;
+	uint32_t umem_3_buffer_param_a;
+	uint32_t umem_3_buffer_param_b;
+	uint64_t doorbell_bar_offset;
+};
+
 /* HCA supports this number of time periods for LRO. */
 #define MLX5_LRO_NUM_SUPP_PERIODS 4
 
@@ -58,6 +81,7 @@ struct mlx5_hca_attr {
 	uint32_t log_max_hairpin_num_packets:5;
 	uint32_t vhca_id:16;
 	struct mlx5_hca_qos_attr qos;
+	struct mlx5_hca_vdpa_attr vdpa;
 };
 
 struct mlx5_devx_wq_attr {
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index 4b521b2..b0b8ab9 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -881,6 +881,11 @@ enum {
 	MLX5_GET_HCA_CAP_OP_MOD_GENERAL_DEVICE = 0x0 << 1,
 	MLX5_GET_HCA_CAP_OP_MOD_ETHERNET_OFFLOAD_CAPS = 0x1 << 1,
 	MLX5_GET_HCA_CAP_OP_MOD_QOS_CAP = 0xc << 1,
+	MLX5_GET_HCA_CAP_OP_MOD_VDPA_EMULATION = 0x13 << 1,
+};
+
+enum {
+	MLX5_GENERAL_OBJ_TYPES_CAP_VIRTQ_NET_Q = (1ULL << 0xd),
 };
 
 enum {
@@ -1254,11 +1259,51 @@ struct mlx5_ifc_per_protocol_networking_offload_caps_bits {
 	u8 reserved_at_200[0x600];
 };
 
+enum {
+	MLX5_VIRTQ_TYPE_SPLIT = 0,
+	MLX5_VIRTQ_TYPE_PACKED = 1,
+};
+
+enum {
+	MLX5_VIRTQ_EVENT_MODE_NO_MSIX = 0,
+	MLX5_VIRTQ_EVENT_MODE_CQ = 1,
+	MLX5_VIRTQ_EVENT_MODE_MSIX = 2,
+};
+
+struct mlx5_ifc_virtio_emulation_cap_bits {
+	u8 desc_tunnel_offload_type[0x1];
+	u8 eth_frame_offload_type[0x1];
+	u8 virtio_version_1_0[0x1];
+	u8 tso_ipv4[0x1];
+	u8 tso_ipv6[0x1];
+	u8 tx_csum[0x1];
+	u8 rx_csum[0x1];
+	u8 reserved_at_7[0x9];
+	u8 event_mode[0x8];
+	u8 virtio_queue_type[0x8];
+	u8 reserved_at_20[0x13];
+	u8 log_doorbell_stride[0x5];
+	u8 reserved_at_38[0x3];
+	u8 log_doorbell_bar_size[0x5];
+	u8 doorbell_bar_offset[0x40];
+	u8 reserved_at_80[0x8];
+	u8 max_num_virtio_queues[0x18];
+	u8 reserved_at_a0[0x60];
+	u8 umem_1_buffer_param_a[0x20];
+	u8 umem_1_buffer_param_b[0x20];
+	u8 umem_2_buffer_param_a[0x20];
+	u8 umem_2_buffer_param_b[0x20];
+	u8 umem_3_buffer_param_a[0x20];
+	u8 umem_3_buffer_param_b[0x20];
+	u8 reserved_at_1c0[0x620];
+};
+
 union mlx5_ifc_hca_cap_union_bits {
 	struct mlx5_ifc_cmd_hca_cap_bits cmd_hca_cap;
 	struct mlx5_ifc_per_protocol_networking_offload_caps_bits
 	       per_protocol_networking_offload_caps;
 	struct mlx5_ifc_qos_cap_bits qos_cap;
+	struct mlx5_ifc_virtio_emulation_cap_bits vdpa_caps;
 	u8 reserved_at_0[0x8000];
 };
 
-- 
1.8.3.1
* [dpdk-dev] [PATCH v1 08/38] vdpa/mlx5: support queues number operation
From: Matan Azrad @ 2020-01-20 17:02 UTC (permalink / raw)
  To: dev; +Cc: Maxime Coquelin, Thomas Monjalon
Support get_queue_num operation to get the maximum number of queues
supported by the device.
This number comes from the DevX capabilities.
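The lookup behind this operation can be sketched as below: a minimal,
self-contained version of "find the private structure by device id under
a lock, then return the cached capability". All demo_* names are
hypothetical stand-ins for the driver's own structures, and the cached
value is assumed to have been filled at probe time.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/queue.h>

/* Hypothetical stand-in for the driver private structure. */
struct demo_priv {
	TAILQ_ENTRY(demo_priv) next;
	int id;              /* vDPA device id. */
	uint32_t max_queues; /* Capability cached at probe time. */
};

static TAILQ_HEAD(demo_privs, demo_priv) demo_list =
					TAILQ_HEAD_INITIALIZER(demo_list);
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;

/* Append a device to the global list. */
static void
demo_register(struct demo_priv *priv)
{
	pthread_mutex_lock(&demo_lock);
	TAILQ_INSERT_TAIL(&demo_list, priv, next);
	pthread_mutex_unlock(&demo_lock);
}

/* Return the entry whose id matches, or NULL if there is none. */
static struct demo_priv *
demo_find(int did)
{
	struct demo_priv *priv;
	struct demo_priv *match = NULL;

	pthread_mutex_lock(&demo_lock);
	TAILQ_FOREACH(priv, &demo_list, next) {
		if (priv->id == did) {
			match = priv;
			break;
		}
	}
	pthread_mutex_unlock(&demo_lock);
	return match;
}

/* Report the cached queue count; -1 when the id is unknown. */
static int
demo_get_queue_num(int did, uint32_t *queue_num)
{
	struct demo_priv *priv = demo_find(did);

	if (priv == NULL)
		return -1;
	*queue_num = priv->max_queues;
	return 0;
}
```

The lock protects only the list walk; the sketch assumes, as the driver
appears to, that an entry is not freed while a caller still holds it.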
Signed-off-by: Matan Azrad <matan@mellanox.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c | 54 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 53 insertions(+), 1 deletion(-)
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index cb49a32..32ca908 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -9,6 +9,7 @@
 
 #include <mlx5_glue.h>
 #include <mlx5_common.h>
+#include <mlx5_devx_cmds.h>
 
 #include "mlx5_vdpa_utils.h"
 
@@ -18,6 +19,7 @@ struct mlx5_vdpa_priv {
 	int id; /* vDPA device id. */
 	struct ibv_context *ctx; /* Device context. */
 	struct rte_vdpa_dev_addr dev_addr;
+	struct mlx5_hca_vdpa_attr caps;
 };
 
 TAILQ_HEAD(mlx5_vdpa_privs, mlx5_vdpa_priv) priv_list =
@@ -25,8 +27,43 @@ struct mlx5_vdpa_priv {
 static pthread_mutex_t priv_list_lock = PTHREAD_MUTEX_INITIALIZER;
 int mlx5_vdpa_logtype;
 
+static struct mlx5_vdpa_priv *
+mlx5_vdpa_find_priv_resource_by_did(int did)
+{
+	struct mlx5_vdpa_priv *priv;
+	int found = 0;
+
+	pthread_mutex_lock(&priv_list_lock);
+	TAILQ_FOREACH(priv, &priv_list, next) {
+		if (did == priv->id) {
+			found = 1;
+			break;
+		}
+	}
+	pthread_mutex_unlock(&priv_list_lock);
+	if (!found) {
+		DRV_LOG(ERR, "Invalid device id: %d.", did);
+		rte_errno = EINVAL;
+		return NULL;
+	}
+	return priv;
+}
+
+static int
+mlx5_vdpa_get_queue_num(int did, uint32_t *queue_num)
+{
+	struct mlx5_vdpa_priv *priv = mlx5_vdpa_find_priv_resource_by_did(did);
+
+	if (priv == NULL) {
+		DRV_LOG(ERR, "Invalid device id: %d.", did);
+		return -1;
+	}
+	*queue_num = priv->caps.max_num_virtio_queues;
+	return 0;
+}
+
 static struct rte_vdpa_dev_ops mlx5_vdpa_ops = {
-	.get_queue_num = NULL,
+	.get_queue_num = mlx5_vdpa_get_queue_num,
 	.get_features = NULL,
 	.get_protocol_features = NULL,
 	.dev_conf = NULL,
@@ -60,6 +97,7 @@ struct mlx5_vdpa_priv {
 	struct ibv_device *ibv_match = NULL;
 	struct mlx5_vdpa_priv *priv = NULL;
 	struct ibv_context *ctx;
+	struct mlx5_hca_attr attr;
 	int ret;
 
 	errno = 0;
@@ -107,6 +145,20 @@ struct mlx5_vdpa_priv {
 		rte_errno = ENOMEM;
 		goto error;
 	}
+	ret = mlx5_devx_cmd_query_hca_attr(ctx, &attr);
+	if (ret) {
+		DRV_LOG(ERR, "Unable to read HCA capabilities.");
+		rte_errno = ENOTSUP;
+		goto error;
+	} else {
+		if (!attr.vdpa.valid || !attr.vdpa.max_num_virtio_queues) {
+			DRV_LOG(ERR, "Not enough capabilities to support vdpa,"
+				" maybe old FW/OFED version?");
+			rte_errno = ENOTSUP;
+			goto error;
+		}
+		priv->caps = attr.vdpa;
+	}
 	priv->ctx = ctx;
 	priv->dev_addr.pci_addr = pci_dev->addr;
 	priv->dev_addr.type = PCI_ADDR;
-- 
1.8.3.1
* [dpdk-dev] [PATCH v1 09/38] vdpa/mlx5: support features get operations
From: Matan Azrad @ 2020-01-20 17:02 UTC (permalink / raw)
  To: dev; +Cc: Maxime Coquelin, Thomas Monjalon
Add support for get_features and get_protocol_features operations.
Some of the features are derived from the DevX capabilities.
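The mapping from capability bits to virtio feature bits can be sketched
as follows. This is an illustration only: the demo_* names are
hypothetical, and the bit positions are written out locally (they are the
virtio specification values) so that the block does not depend on any
kernel header.

```c
#include <stdint.h>

/* Virtio feature bit positions, as defined by the virtio specification. */
#define DEMO_VIRTIO_NET_F_CSUM       0
#define DEMO_VIRTIO_NET_F_GUEST_CSUM 1
#define DEMO_VIRTIO_NET_F_HOST_TSO4  11
#define DEMO_VIRTIO_NET_F_HOST_TSO6  12
#define DEMO_VIRTIO_F_VERSION_1      32
#define DEMO_VIRTIO_F_RING_PACKED    34

/* Queue type index used in the capability mask (1 means packed). */
#define DEMO_VIRTQ_TYPE_PACKED 1

/* Hypothetical subset of the capabilities reported by the device. */
struct demo_caps {
	uint8_t virtio_queue_type; /* Bit mask of supported queue types. */
	uint32_t virtio_version_1_0:1;
	uint32_t tso_ipv4:1;
	uint32_t tso_ipv6:1;
	uint32_t tx_csum:1;
	uint32_t rx_csum:1;
};

/* Start from the always-on features and add what the device reports. */
static uint64_t
demo_features(const struct demo_caps *caps, uint64_t base)
{
	uint64_t features = base;

	if (caps->virtio_queue_type & (1 << DEMO_VIRTQ_TYPE_PACKED))
		features |= 1ULL << DEMO_VIRTIO_F_RING_PACKED;
	if (caps->tso_ipv4)
		features |= 1ULL << DEMO_VIRTIO_NET_F_HOST_TSO4;
	if (caps->tso_ipv6)
		features |= 1ULL << DEMO_VIRTIO_NET_F_HOST_TSO6;
	if (caps->tx_csum)
		features |= 1ULL << DEMO_VIRTIO_NET_F_CSUM;
	if (caps->rx_csum)
		features |= 1ULL << DEMO_VIRTIO_NET_F_GUEST_CSUM;
	if (caps->virtio_version_1_0)
		features |= 1ULL << DEMO_VIRTIO_F_VERSION_1;
	return features;
}
```

Feature bits above 31 need the 1ULL shifts; a plain int shift would be
undefined for VERSION_1 and RING_PACKED.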
Signed-off-by: Matan Azrad <matan@mellanox.com>
---
 doc/guides/vdpadevs/features/mlx5.ini |  7 ++++
 drivers/vdpa/mlx5/mlx5_vdpa.c         | 66 +++++++++++++++++++++++++++++++++--
 2 files changed, 71 insertions(+), 2 deletions(-)
diff --git a/doc/guides/vdpadevs/features/mlx5.ini b/doc/guides/vdpadevs/features/mlx5.ini
index d635bdf..fea491d 100644
--- a/doc/guides/vdpadevs/features/mlx5.ini
+++ b/doc/guides/vdpadevs/features/mlx5.ini
@@ -4,6 +4,13 @@
 ; Refer to default.ini for the full list of available driver features.
 ;
 [Features]
+
+any layout           = Y
+guest announce       = Y
+mq                   = Y
+proto mq             = Y
+proto log shmfd      = Y
+proto host notifier  = Y
 Other kdrv           = Y
 ARMv8                = Y
 Power8               = Y
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index 32ca908..f8dff3e 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -1,6 +1,8 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright 2019 Mellanox Technologies, Ltd
  */
+#include <linux/virtio_net.h>
+
 #include <rte_malloc.h>
 #include <rte_log.h>
 #include <rte_errno.h>
@@ -10,6 +12,7 @@
 #include <mlx5_glue.h>
 #include <mlx5_common.h>
 #include <mlx5_devx_cmds.h>
+#include <mlx5_prm.h>
 
 #include "mlx5_vdpa_utils.h"
 
@@ -22,6 +25,27 @@ struct mlx5_vdpa_priv {
 	struct mlx5_hca_vdpa_attr caps;
 };
 
+#ifndef VIRTIO_F_ORDER_PLATFORM
+#define VIRTIO_F_ORDER_PLATFORM 36
+#endif
+
+#ifndef VIRTIO_F_RING_PACKED
+#define VIRTIO_F_RING_PACKED 34
+#endif
+
+#define MLX5_VDPA_DEFAULT_FEATURES ((1ULL << VHOST_USER_F_PROTOCOL_FEATURES) | \
+			    (1ULL << VIRTIO_F_ANY_LAYOUT) | \
+			    (1ULL << VIRTIO_NET_F_MQ) | \
+			    (1ULL << VIRTIO_NET_F_GUEST_ANNOUNCE) | \
+			    (1ULL << VIRTIO_F_ORDER_PLATFORM))
+
+#define MLX5_VDPA_PROTOCOL_FEATURES \
+			    ((1ULL << VHOST_USER_PROTOCOL_F_SLAVE_REQ) | \
+			     (1ULL << VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD) | \
+			     (1ULL << VHOST_USER_PROTOCOL_F_HOST_NOTIFIER) | \
+			     (1ULL << VHOST_USER_PROTOCOL_F_LOG_SHMFD) | \
+			     (1ULL << VHOST_USER_PROTOCOL_F_MQ))
+
 TAILQ_HEAD(mlx5_vdpa_privs, mlx5_vdpa_priv) priv_list =
 					      TAILQ_HEAD_INITIALIZER(priv_list);
 static pthread_mutex_t priv_list_lock = PTHREAD_MUTEX_INITIALIZER;
@@ -62,10 +86,48 @@ struct mlx5_vdpa_priv {
 	return 0;
 }
 
+static int
+mlx5_vdpa_get_vdpa_features(int did, uint64_t *features)
+{
+	struct mlx5_vdpa_priv *priv = mlx5_vdpa_find_priv_resource_by_did(did);
+
+	if (priv == NULL) {
+		DRV_LOG(ERR, "Invalid device id: %d.", did);
+		return -1;
+	}
+	*features = MLX5_VDPA_DEFAULT_FEATURES;
+	if (priv->caps.virtio_queue_type & (1 << MLX5_VIRTQ_TYPE_PACKED))
+		*features |= (1ULL << VIRTIO_F_RING_PACKED);
+	if (priv->caps.tso_ipv4)
+		*features |= (1ULL << VIRTIO_NET_F_HOST_TSO4);
+	if (priv->caps.tso_ipv6)
+		*features |= (1ULL << VIRTIO_NET_F_HOST_TSO6);
+	if (priv->caps.tx_csum)
+		*features |= (1ULL << VIRTIO_NET_F_CSUM);
+	if (priv->caps.rx_csum)
+		*features |= (1ULL << VIRTIO_NET_F_GUEST_CSUM);
+	if (priv->caps.virtio_version_1_0)
+		*features |= (1ULL << VIRTIO_F_VERSION_1);
+	return 0;
+}
+
+static int
+mlx5_vdpa_get_protocol_features(int did, uint64_t *features)
+{
+	struct mlx5_vdpa_priv *priv = mlx5_vdpa_find_priv_resource_by_did(did);
+
+	if (priv == NULL) {
+		DRV_LOG(ERR, "Invalid device id: %d.", did);
+		return -1;
+	}
+	*features = MLX5_VDPA_PROTOCOL_FEATURES;
+	return 0;
+}
+
 static struct rte_vdpa_dev_ops mlx5_vdpa_ops = {
 	.get_queue_num = mlx5_vdpa_get_queue_num,
-	.get_features = NULL,
-	.get_protocol_features = NULL,
+	.get_features = mlx5_vdpa_get_vdpa_features,
+	.get_protocol_features = mlx5_vdpa_get_protocol_features,
 	.dev_conf = NULL,
 	.dev_close = NULL,
 	.set_vring_state = NULL,
-- 
1.8.3.1
* [dpdk-dev] [PATCH v1 10/38] common/mlx5: glue null memory region allocation
From: Matan Azrad @ 2020-01-20 17:02 UTC (permalink / raw)
  To: dev; +Cc: Maxime Coquelin, Thomas Monjalon
Add glue support for the rdma-core API that allocates a NULL MR.
Signed-off-by: Matan Azrad <matan@mellanox.com>
---
 drivers/common/mlx5/mlx5_glue.c | 13 +++++++++++++
 drivers/common/mlx5/mlx5_glue.h |  1 +
 2 files changed, 14 insertions(+)
diff --git a/drivers/common/mlx5/mlx5_glue.c b/drivers/common/mlx5/mlx5_glue.c
index d5bc84e..e75e6bc 100644
--- a/drivers/common/mlx5/mlx5_glue.c
+++ b/drivers/common/mlx5/mlx5_glue.c
@@ -226,6 +226,18 @@
 	return ibv_reg_mr(pd, addr, length, access);
 }
 
+static struct ibv_mr *
+mlx5_glue_alloc_null_mr(struct ibv_pd *pd)
+{
+#ifdef HAVE_IBV_DEVX_OBJ
+	return ibv_alloc_null_mr(pd);
+#else
+	(void)pd;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
 static int
 mlx5_glue_dereg_mr(struct ibv_mr *mr)
 {
@@ -1070,6 +1082,7 @@
 	.destroy_qp = mlx5_glue_destroy_qp,
 	.modify_qp = mlx5_glue_modify_qp,
 	.reg_mr = mlx5_glue_reg_mr,
+	.alloc_null_mr = mlx5_glue_alloc_null_mr,
 	.dereg_mr = mlx5_glue_dereg_mr,
 	.create_counter_set = mlx5_glue_create_counter_set,
 	.destroy_counter_set = mlx5_glue_destroy_counter_set,
diff --git a/drivers/common/mlx5/mlx5_glue.h b/drivers/common/mlx5/mlx5_glue.h
index f4c3180..33afaf4 100644
--- a/drivers/common/mlx5/mlx5_glue.h
+++ b/drivers/common/mlx5/mlx5_glue.h
@@ -138,6 +138,7 @@ struct mlx5_glue {
 			 int attr_mask);
 	struct ibv_mr *(*reg_mr)(struct ibv_pd *pd, void *addr,
 				 size_t length, int access);
+	struct ibv_mr *(*alloc_null_mr)(struct ibv_pd *pd);
 	int (*dereg_mr)(struct ibv_mr *mr);
 	struct ibv_counter_set *(*create_counter_set)
 		(struct ibv_context *context,
-- 
1.8.3.1
* [dpdk-dev] [PATCH v1 11/38] common/mlx5: support DevX indirect mkey creation
From: Matan Azrad @ 2020-01-20 17:02 UTC (permalink / raw)
  To: dev; +Cc: Maxime Coquelin, Thomas Monjalon
Add an option to create an indirect mkey through the existing
mlx5_devx_cmd_mkey_create command.
Align the net/mlx5 driver's use of this command accordingly.
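For an indirect mkey the command input grows by one KLM entry per buffer,
with the entry count rounded up to a multiple of 4. A minimal sketch of
that size computation follows. The demo_* names are hypothetical, and the
base command size is passed in as a parameter because the real
create_mkey_in layout is not shown in this patch.

```c
#include <stddef.h>

/* One KLM entry: byte_count (4) + mkey (4) + address (8) = 16 bytes. */
#define DEMO_KLM_SIZE_DW 4

/* Round v up to the next multiple of a (a must be non-zero). */
static unsigned int
demo_align_up(unsigned int v, unsigned int a)
{
	return (v + a - 1) / a * a;
}

/*
 * Size in dwords of the command input: the fixed part plus the KLM
 * array padded to a multiple of 4 entries. klm_num == 0 means a direct
 * mkey, which carries no KLM array at all.
 */
static size_t
demo_mkey_cmd_size_dw(size_t base_dw, unsigned int klm_num)
{
	return base_dw + (size_t)demo_align_up(klm_num, 4) * DEMO_KLM_SIZE_DW;
}
```

The padding entries beyond klm_num are expected to stay zeroed, which is
why the input buffer is cleared before it is filled.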
Signed-off-by: Matan Azrad <matan@mellanox.com>
---
 drivers/common/mlx5/mlx5_devx_cmds.c | 43 ++++++++++++++++++++++++++++++------
 drivers/common/mlx5/mlx5_devx_cmds.h | 12 ++++++++++
 drivers/common/mlx5/mlx5_prm.h       | 12 ++++++++++
 drivers/net/mlx5/mlx5_flow_dv.c      |  4 ++++
 4 files changed, 64 insertions(+), 7 deletions(-)
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index c6965da..aa2feeb 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -142,7 +142,11 @@ struct mlx5_devx_obj *
 mlx5_devx_cmd_mkey_create(struct ibv_context *ctx,
 			  struct mlx5_devx_mkey_attr *attr)
 {
-	uint32_t in[MLX5_ST_SZ_DW(create_mkey_in)] = {0};
+	struct mlx5_klm *klm_array = attr->klm_array;
+	int klm_num = attr->klm_num;
+	int in_size_dw = MLX5_ST_SZ_DW(create_mkey_in) +
+		     (klm_num ? RTE_ALIGN(klm_num, 4) : 0) * MLX5_ST_SZ_DW(klm);
+	uint32_t in[in_size_dw];
 	uint32_t out[MLX5_ST_SZ_DW(create_mkey_out)] = {0};
 	void *mkc;
 	struct mlx5_devx_obj *mkey = rte_zmalloc("mkey", sizeof(*mkey), 0);
@@ -153,27 +157,52 @@ struct mlx5_devx_obj *
 		rte_errno = ENOMEM;
 		return NULL;
 	}
+	memset(in, 0, in_size_dw * 4);
 	pgsize = sysconf(_SC_PAGESIZE);
-	translation_size = (RTE_ALIGN(attr->size, pgsize) * 8) / 16;
 	MLX5_SET(create_mkey_in, in, opcode, MLX5_CMD_OP_CREATE_MKEY);
+	mkc = MLX5_ADDR_OF(create_mkey_in, in, memory_key_mkey_entry);
+	if (klm_num > 0) {
+		int i;
+		uint8_t *klm = (uint8_t *)MLX5_ADDR_OF(create_mkey_in, in,
+						       klm_pas_mtt);
+		translation_size = RTE_ALIGN(klm_num, 4);
+		for (i = 0; i < klm_num; i++) {
+			MLX5_SET(klm, klm, byte_count, klm_array[i].byte_count);
+			MLX5_SET(klm, klm, mkey, klm_array[i].mkey);
+			MLX5_SET64(klm, klm, address, klm_array[i].address);
+			klm += MLX5_ST_SZ_BYTES(klm);
+		}
+		for (; i < (int)translation_size; i++) {
+			MLX5_SET(klm, klm, mkey, 0x0);
+			MLX5_SET64(klm, klm, address, 0x0);
+			klm += MLX5_ST_SZ_BYTES(klm);
+		}
+		MLX5_SET(mkc, mkc, access_mode_1_0, attr->log_entity_size ?
+			 MLX5_MKC_ACCESS_MODE_KLM_FBS :
+			 MLX5_MKC_ACCESS_MODE_KLM);
+		MLX5_SET(mkc, mkc, log_page_size, attr->log_entity_size);
+	} else {
+		translation_size = (RTE_ALIGN(attr->size, pgsize) * 8) / 16;
+		MLX5_SET(mkc, mkc, access_mode_1_0, MLX5_MKC_ACCESS_MODE_MTT);
+		MLX5_SET(mkc, mkc, log_page_size, rte_log2_u32(pgsize));
+	}
 	MLX5_SET(create_mkey_in, in, translations_octword_actual_size,
 		 translation_size);
 	MLX5_SET(create_mkey_in, in, mkey_umem_id, attr->umem_id);
-	mkc = MLX5_ADDR_OF(create_mkey_in, in, memory_key_mkey_entry);
+	MLX5_SET(create_mkey_in, in, pg_access, attr->pg_access);
 	MLX5_SET(mkc, mkc, lw, 0x1);
 	MLX5_SET(mkc, mkc, lr, 0x1);
-	MLX5_SET(mkc, mkc, access_mode_1_0, MLX5_MKC_ACCESS_MODE_MTT);
 	MLX5_SET(mkc, mkc, qpn, 0xffffff);
 	MLX5_SET(mkc, mkc, pd, attr->pd);
 	MLX5_SET(mkc, mkc, mkey_7_0, attr->umem_id & 0xFF);
 	MLX5_SET(mkc, mkc, translations_octword_size, translation_size);
 	MLX5_SET64(mkc, mkc, start_addr, attr->addr);
 	MLX5_SET64(mkc, mkc, len, attr->size);
-	MLX5_SET(mkc, mkc, log_page_size, rte_log2_u32(pgsize));
-	mkey->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in), out,
+	mkey->obj = mlx5_glue->devx_obj_create(ctx, in, in_size_dw * 4, out,
 					       sizeof(out));
 	if (!mkey->obj) {
-		DRV_LOG(ERR, "Can't create mkey - error %d", errno);
+		DRV_LOG(ERR, "Can't create %sdirect mkey - error %d",
+			klm_num ? "an in" : "a ", errno);
 		rte_errno = errno;
 		rte_free(mkey);
 		return NULL;
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index dea1597..ceeca64 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -13,11 +13,22 @@ struct mlx5_devx_obj {
 	int id; /* The object ID. */
 };
 
+/* UMR memory buffer used to define 1 entry in indirect mkey. */
+struct mlx5_klm {
+	uint32_t byte_count;
+	uint32_t mkey;
+	uint64_t address;
+};
+
 struct mlx5_devx_mkey_attr {
 	uint64_t addr;
 	uint64_t size;
 	uint32_t umem_id;
 	uint32_t pd;
+	uint32_t log_entity_size;
+	uint32_t pg_access:1;
+	struct mlx5_klm *klm_array;
+	int klm_num;
 };
 
 /* HCA qos attributes. */
@@ -212,6 +223,7 @@ struct mlx5_devx_modify_sq_attr {
 	uint32_t hairpin_peer_vhca:16;
 };
 
+
 /* mlx5_devx_cmds.c */
 
 struct mlx5_devx_obj *mlx5_devx_cmd_flow_counter_alloc(struct ibv_context *ctx,
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index b0b8ab9..058bd8c 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -726,6 +726,8 @@ enum {
 
 enum {
 	MLX5_MKC_ACCESS_MODE_MTT   = 0x1,
+	MLX5_MKC_ACCESS_MODE_KLM   = 0x2,
+	MLX5_MKC_ACCESS_MODE_KLM_FBS = 0x3,
 };
 
 /* Flow counters. */
@@ -790,6 +792,16 @@ struct mlx5_ifc_query_flow_counter_in_bits {
 	u8         flow_counter_id[0x20];
 };
 
+#define MLX5_MAX_KLM_BYTE_COUNT 0x80000000u
+#define MLX5_MIN_KLM_FIXED_BUFFER_SIZE 0x1000u
+
+
+struct mlx5_ifc_klm_bits {
+	u8         byte_count[0x20];
+	u8         mkey[0x20];
+	u8         address[0x40];
+};
+
 struct mlx5_ifc_mkc_bits {
 	u8         reserved_at_0[0x1];
 	u8         free[0x1];
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 50d1078..cfdfdf4 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -3933,6 +3933,10 @@ struct field_modify_info modify_tcp[] = {
 	mkey_attr.size = size;
 	mkey_attr.umem_id = mem_mng->umem->umem_id;
 	mkey_attr.pd = sh->pdn;
+	mkey_attr.log_entity_size = 0;
+	mkey_attr.pg_access = 0;
+	mkey_attr.klm_array = NULL;
+	mkey_attr.klm_num = 0;
 	mem_mng->dm = mlx5_devx_cmd_mkey_create(sh->ctx, &mkey_attr);
 	if (!mem_mng->dm) {
 		mlx5_glue->devx_umem_dereg(mem_mng->umem);
-- 
1.8.3.1
* [dpdk-dev] [PATCH v1 12/38] common/mlx5: glue event queue query
From: Matan Azrad @ 2020-01-20 17:02 UTC (permalink / raw)
  To: dev; +Cc: Maxime Coquelin, Thomas Monjalon
The event queue is managed only by the kernel.
Add the rdma-core command in glue to query the kernel event queue
details.
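The glue pattern used for such a command can be sketched as a function
table whose entry falls back to -ENOTSUP when the library feature is not
detected at build time. Everything named demo_* or DEMO_* below is
hypothetical, including the backend call in the enabled branch, which is
only a placeholder for the real library function.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical one-entry glue table. */
struct demo_glue {
	int (*query_eqn)(uint32_t cpu, uint32_t *eqn);
};

static int
demo_glue_query_eqn(uint32_t cpu, uint32_t *eqn)
{
#ifdef DEMO_HAVE_BACKEND
	/* Placeholder: the real library call would go here. */
	return demo_backend_query_eqn(cpu, eqn);
#else
	/* Feature not available at build time: report it to the caller. */
	(void)cpu;
	(void)eqn;
	return -ENOTSUP;
#endif
}

static const struct demo_glue demo_glue = {
	.query_eqn = demo_glue_query_eqn,
};
```

Callers always go through the table, so the same driver code builds
against library versions with and without the feature.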
Signed-off-by: Matan Azrad <matan@mellanox.com>
---
 drivers/common/mlx5/mlx5_glue.c | 15 +++++++++++++++
 drivers/common/mlx5/mlx5_glue.h |  2 ++
 2 files changed, 17 insertions(+)
diff --git a/drivers/common/mlx5/mlx5_glue.c b/drivers/common/mlx5/mlx5_glue.c
index e75e6bc..ac51e65 100644
--- a/drivers/common/mlx5/mlx5_glue.c
+++ b/drivers/common/mlx5/mlx5_glue.c
@@ -1045,6 +1045,20 @@
 #else
 	RTE_SET_USED(file);
 	RTE_SET_USED(domain);
+	return -ENOTSUP;
+#endif
+}
+
+static int
+mlx5_glue_devx_query_eqn(struct ibv_context *ctx, uint32_t cpus,
+			 uint32_t *eqn)
+{
+#ifdef HAVE_IBV_DEVX_OBJ
+	return mlx5dv_devx_query_eqn(ctx, cpus, eqn);
+#else
+	(void)ctx;
+	(void)cpus;
+	(void)eqn;
 	return -ENOTSUP;
 #endif
 }
@@ -1148,4 +1162,5 @@
 	.devx_qp_query = mlx5_glue_devx_qp_query,
 	.devx_port_query = mlx5_glue_devx_port_query,
 	.dr_dump_domain = mlx5_glue_dr_dump_domain,
+	.devx_query_eqn = mlx5_glue_devx_query_eqn,
 };
diff --git a/drivers/common/mlx5/mlx5_glue.h b/drivers/common/mlx5/mlx5_glue.h
index 33afaf4..fe51f97 100644
--- a/drivers/common/mlx5/mlx5_glue.h
+++ b/drivers/common/mlx5/mlx5_glue.h
@@ -259,6 +259,8 @@ struct mlx5_glue {
 			       uint32_t port_num,
 			       struct mlx5dv_devx_port *mlx5_devx_port);
 	int (*dr_dump_domain)(FILE *file, void *domain);
+	int (*devx_query_eqn)(struct ibv_context *context, uint32_t cpus,
+			      uint32_t *eqn);
 };
 
 const struct mlx5_glue *mlx5_glue;
-- 
1.8.3.1
* [dpdk-dev] [PATCH v1 13/38] common/mlx5: glue event interrupt commands
From: Matan Azrad @ 2020-01-20 17:02 UTC (permalink / raw)
  To: dev; +Cc: Maxime Coquelin, Thomas Monjalon
Add the following commands to glue in order to support interrupt event
channel operations associated with events in the EQ:
	devx_create_event_channel,
	devx_destroy_event_channel,
	devx_subscribe_devx_event,
	devx_subscribe_devx_event_fd,
	devx_get_event.
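When HAVE_IBV_DEVX_EVENT is not defined, the fallbacks in this patch report
failure in three different ways: pointer-returning wrappers return NULL with
errno set, int-returning wrappers return -ENOTSUP, and the ssize_t wrapper
returns -1 with errno set. The sketch below shows one way a caller could
normalize these to a negative errno value; all demo_* names are hypothetical
and the stubs only imitate the fallback branches, not rdma-core itself.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for struct mlx5dv_devx_event_channel. */
struct demo_channel { int fd; };

/* Pointer convention: NULL on failure, reason in errno. */
static struct demo_channel *
demo_create_channel(int flags)
{
	(void)flags;
	errno = ENOTSUP;
	return NULL;
}

/* ssize_t convention: -1 on failure, reason in errno. */
static ssize_t
demo_get_event(struct demo_channel *ch, void *buf, size_t len)
{
	(void)ch;
	(void)buf;
	(void)len;
	errno = ENOTSUP;
	return -1;
}

/* Open a channel; returns 0 or a negative errno value. */
int
demo_open(struct demo_channel **out)
{
	*out = demo_create_channel(0);
	if (*out == NULL)
		return -errno;
	return 0;
}

/* Read one event; returns the byte count or a negative errno value. */
int
demo_read_event(struct demo_channel *ch, void *buf, size_t len)
{
	ssize_t n = demo_get_event(ch, buf, len);

	if (n < 0)
		return -errno;
	return (int)n;
}
```

Converting at the call site keeps the wrappers faithful to the signatures of
the underlying library calls while giving the rest of the driver a single
error convention to check.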
Signed-off-by: Matan Azrad <matan@mellanox.com>
---
 drivers/common/mlx5/Makefile    |  5 +++
 drivers/common/mlx5/meson.build |  2 ++
 drivers/common/mlx5/mlx5_glue.c | 79 +++++++++++++++++++++++++++++++++++++++++
 drivers/common/mlx5/mlx5_glue.h | 25 +++++++++++++
 4 files changed, 111 insertions(+)
diff --git a/drivers/common/mlx5/Makefile b/drivers/common/mlx5/Makefile
index 2ff549e..6b618ad 100644
--- a/drivers/common/mlx5/Makefile
+++ b/drivers/common/mlx5/Makefile
@@ -155,6 +155,11 @@ mlx5_autoconf.h.new: $(RTE_SDK)/buildtools/auto-config-h.sh
 		func mlx5dv_dr_action_create_dest_devx_tir \
 		$(AUTOCONF_OUTPUT)
 	$Q sh -- '$<' '$@' \
+		HAVE_IBV_DEVX_EVENT \
+		infiniband/mlx5dv.h \
+		func mlx5dv_devx_get_event \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
 		HAVE_MLX5_DR_CREATE_ACTION_FLOW_METER \
 		infiniband/mlx5dv.h \
 		func mlx5dv_dr_action_create_flow_meter \
diff --git a/drivers/common/mlx5/meson.build b/drivers/common/mlx5/meson.build
index d2eeb45..fa11fd9 100644
--- a/drivers/common/mlx5/meson.build
+++ b/drivers/common/mlx5/meson.build
@@ -108,6 +108,8 @@ if build
 		'mlx5dv_devx_obj_query_async' ],
 		[ 'HAVE_MLX5DV_DR_ACTION_DEST_DEVX_TIR', 'infiniband/mlx5dv.h',
 		'mlx5dv_dr_action_create_dest_devx_tir' ],
+		[ 'HAVE_IBV_DEVX_EVENT', 'infiniband/mlx5dv.h',
+		'mlx5dv_devx_get_event' ],
 		[ 'HAVE_MLX5_DR_CREATE_ACTION_FLOW_METER', 'infiniband/mlx5dv.h',
 		'mlx5dv_dr_action_create_flow_meter' ],
 		[ 'HAVE_MLX5DV_MMAP_GET_NC_PAGES_CMD', 'infiniband/mlx5dv.h',
diff --git a/drivers/common/mlx5/mlx5_glue.c b/drivers/common/mlx5/mlx5_glue.c
index ac51e65..20364cb 100644
--- a/drivers/common/mlx5/mlx5_glue.c
+++ b/drivers/common/mlx5/mlx5_glue.c
@@ -1063,6 +1063,80 @@
 #endif
 }
 
+static struct mlx5dv_devx_event_channel *
+mlx5_glue_devx_create_event_channel(struct ibv_context *ctx, int flags)
+{
+#ifdef HAVE_IBV_DEVX_EVENT
+	return mlx5dv_devx_create_event_channel(ctx, flags);
+#else
+	(void)ctx;
+	(void)flags;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static void
+mlx5_glue_devx_destroy_event_channel(struct mlx5dv_devx_event_channel *eventc)
+{
+#ifdef HAVE_IBV_DEVX_EVENT
+	mlx5dv_devx_destroy_event_channel(eventc);
+#else
+	(void)eventc;
+#endif
+}
+
+static int
+mlx5_glue_devx_subscribe_devx_event(struct mlx5dv_devx_event_channel *eventc,
+				    struct mlx5dv_devx_obj *obj,
+				    uint16_t events_sz, uint16_t events_num[],
+				    uint64_t cookie)
+{
+#ifdef HAVE_IBV_DEVX_EVENT
+	return mlx5dv_devx_subscribe_devx_event(eventc, obj, events_sz,
+						events_num, cookie);
+#else
+	(void)eventc;
+	(void)obj;
+	(void)events_sz;
+	(void)events_num;
+	(void)cookie;
+	return -ENOTSUP;
+#endif
+}
+
+static int
+mlx5_glue_devx_subscribe_devx_event_fd(struct mlx5dv_devx_event_channel *eventc,
+				       int fd, struct mlx5dv_devx_obj *obj,
+				       uint16_t event_num)
+{
+#ifdef HAVE_IBV_DEVX_EVENT
+	return mlx5dv_devx_subscribe_devx_event_fd(eventc, fd, obj, event_num);
+#else
+	(void)eventc;
+	(void)fd;
+	(void)obj;
+	(void)event_num;
+	return -ENOTSUP;
+#endif
+}
+
+static ssize_t
+mlx5_glue_devx_get_event(struct mlx5dv_devx_event_channel *eventc,
+			 struct mlx5dv_devx_async_event_hdr *event_data,
+			 size_t event_resp_len)
+{
+#ifdef HAVE_IBV_DEVX_EVENT
+	return mlx5dv_devx_get_event(eventc, event_data, event_resp_len);
+#else
+	(void)eventc;
+	(void)event_data;
+	(void)event_resp_len;
+	errno = ENOTSUP;
+	return -1;
+#endif
+}
+
 alignas(RTE_CACHE_LINE_SIZE)
 const struct mlx5_glue *mlx5_glue = &(const struct mlx5_glue){
 	.version = MLX5_GLUE_VERSION,
@@ -1163,4 +1237,9 @@
 	.devx_port_query = mlx5_glue_devx_port_query,
 	.dr_dump_domain = mlx5_glue_dr_dump_domain,
 	.devx_query_eqn = mlx5_glue_devx_query_eqn,
+	.devx_create_event_channel = mlx5_glue_devx_create_event_channel,
+	.devx_destroy_event_channel = mlx5_glue_devx_destroy_event_channel,
+	.devx_subscribe_devx_event = mlx5_glue_devx_subscribe_devx_event,
+	.devx_subscribe_devx_event_fd = mlx5_glue_devx_subscribe_devx_event_fd,
+	.devx_get_event = mlx5_glue_devx_get_event,
 };
diff --git a/drivers/common/mlx5/mlx5_glue.h b/drivers/common/mlx5/mlx5_glue.h
index fe51f97..6fc00dd 100644
--- a/drivers/common/mlx5/mlx5_glue.h
+++ b/drivers/common/mlx5/mlx5_glue.h
@@ -86,6 +86,12 @@
 struct mlx5dv_dr_flow_meter_attr;
 #endif
 
+#ifndef HAVE_IBV_DEVX_EVENT
+struct mlx5dv_devx_event_channel { int fd; };
+struct mlx5dv_devx_async_event_hdr;
+#define MLX5DV_DEVX_CREATE_EVENT_CHANNEL_FLAGS_OMIT_EV_DATA 1
+#endif
+
 /* LIB_GLUE_VERSION must be updated every time this structure is modified. */
 struct mlx5_glue {
 	const char *version;
@@ -261,6 +267,25 @@ struct mlx5_glue {
 	int (*dr_dump_domain)(FILE *file, void *domain);
 	int (*devx_query_eqn)(struct ibv_context *context, uint32_t cpus,
 			      uint32_t *eqn);
+	struct mlx5dv_devx_event_channel *(*devx_create_event_channel)
+				(struct ibv_context *context, int flags);
+	void (*devx_destroy_event_channel)
+			(struct mlx5dv_devx_event_channel *event_channel);
+	int (*devx_subscribe_devx_event)
+			(struct mlx5dv_devx_event_channel *event_channel,
+			 struct mlx5dv_devx_obj *obj,
+			 uint16_t events_sz,
+			 uint16_t events_num[],
+			 uint64_t cookie);
+	int (*devx_subscribe_devx_event_fd)
+			(struct mlx5dv_devx_event_channel *event_channel,
+			 int fd,
+			 struct mlx5dv_devx_obj *obj,
+			 uint16_t event_num);
+	ssize_t (*devx_get_event)
+			(struct mlx5dv_devx_event_channel *event_channel,
+			 struct mlx5dv_devx_async_event_hdr *event_data,
+			 size_t event_resp_len);
 };
 
 const struct mlx5_glue *mlx5_glue;
-- 
1.8.3.1
* [dpdk-dev] [PATCH v1 14/38] common/mlx5: glue UAR allocation
  2020-01-20 17:02 [dpdk-dev] [PATCH v1 00/38] Introduce mlx5 vDPA driver Matan Azrad
                   ` (12 preceding siblings ...)
  2020-01-20 17:02 ` [dpdk-dev] [PATCH v1 13/38] common/mlx5: glue event interrupt commands Matan Azrad
@ 2020-01-20 17:02 ` Matan Azrad
  2020-01-20 17:02 ` [dpdk-dev] [PATCH v1 15/38] common/mlx5: add DevX command to create CQ Matan Azrad
                   ` (25 subsequent siblings)
  39 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-20 17:02 UTC (permalink / raw)
  To: dev; +Cc: Maxime Coquelin, Thomas Monjalon
Isolated, protected and independent direct access to the HW by multiple
processes is implemented via the User Access Region (UAR) mechanism.
The UAR is part of the PCI address space that is mapped for direct
access to the HW from the CPU.
A UAR comprises multiple pages, each page containing registers that
control the HW operation.
The UAR mechanism is used to post execution or control requests to the
HW.
It is used by the HW to enforce protection and isolation between
different processes.
Add glue commands to allocate and free a UAR.
Signed-off-by: Matan Azrad <matan@mellanox.com>
---
 drivers/common/mlx5/mlx5_glue.c | 25 +++++++++++++++++++++++++
 drivers/common/mlx5/mlx5_glue.h |  4 ++++
 2 files changed, 29 insertions(+)
diff --git a/drivers/common/mlx5/mlx5_glue.c b/drivers/common/mlx5/mlx5_glue.c
index 20364cb..8fbe6fd 100644
--- a/drivers/common/mlx5/mlx5_glue.c
+++ b/drivers/common/mlx5/mlx5_glue.c
@@ -1137,6 +1137,29 @@
 #endif
 }
 
+static struct mlx5dv_devx_uar *
+mlx5_glue_devx_alloc_uar(struct ibv_context *context, uint32_t flags)
+{
+#ifdef HAVE_IBV_DEVX_OBJ
+	return mlx5dv_devx_alloc_uar(context, flags);
+#else
+	(void)context;
+	(void)flags;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static void
+mlx5_glue_devx_free_uar(struct mlx5dv_devx_uar *devx_uar)
+{
+#ifdef HAVE_IBV_DEVX_OBJ
+	mlx5dv_devx_free_uar(devx_uar);
+#else
+	(void)devx_uar;
+#endif
+}
+
 alignas(RTE_CACHE_LINE_SIZE)
 const struct mlx5_glue *mlx5_glue = &(const struct mlx5_glue){
 	.version = MLX5_GLUE_VERSION,
@@ -1242,4 +1265,6 @@
 	.devx_subscribe_devx_event = mlx5_glue_devx_subscribe_devx_event,
 	.devx_subscribe_devx_event_fd = mlx5_glue_devx_subscribe_devx_event_fd,
 	.devx_get_event = mlx5_glue_devx_get_event,
+	.devx_alloc_uar = mlx5_glue_devx_alloc_uar,
+	.devx_free_uar = mlx5_glue_devx_free_uar,
 };
diff --git a/drivers/common/mlx5/mlx5_glue.h b/drivers/common/mlx5/mlx5_glue.h
index 6fc00dd..7d9256e 100644
--- a/drivers/common/mlx5/mlx5_glue.h
+++ b/drivers/common/mlx5/mlx5_glue.h
@@ -66,6 +66,7 @@
 #ifndef HAVE_IBV_DEVX_OBJ
 struct mlx5dv_devx_obj;
 struct mlx5dv_devx_umem { uint32_t umem_id; };
+struct mlx5dv_devx_uar { void *reg_addr; void *base_addr; uint32_t page_id; };
 #endif
 
 #ifndef HAVE_IBV_DEVX_ASYNC
@@ -230,6 +231,9 @@ struct mlx5_glue {
 	int (*dv_destroy_flow)(void *flow);
 	int (*dv_destroy_flow_matcher)(void *matcher);
 	struct ibv_context *(*dv_open_device)(struct ibv_device *device);
+	struct mlx5dv_devx_uar *(*devx_alloc_uar)(struct ibv_context *context,
+						  uint32_t flags);
+	void (*devx_free_uar)(struct mlx5dv_devx_uar *devx_uar);
 	struct mlx5dv_devx_obj *(*devx_obj_create)
 					(struct ibv_context *ctx,
 					 const void *in, size_t inlen,
-- 
1.8.3.1
* [dpdk-dev] [PATCH v1 15/38] common/mlx5: add DevX command to create CQ
  2020-01-20 17:02 [dpdk-dev] [PATCH v1 00/38] Introduce mlx5 vDPA driver Matan Azrad
                   ` (13 preceding siblings ...)
  2020-01-20 17:02 ` [dpdk-dev] [PATCH v1 14/38] common/mlx5: glue UAR allocation Matan Azrad
@ 2020-01-20 17:02 ` Matan Azrad
  2020-01-20 17:02 ` [dpdk-dev] [PATCH v1 16/38] common/mlx5: glue VAR allocation Matan Azrad
                   ` (24 subsequent siblings)
  39 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-20 17:02 UTC (permalink / raw)
  To: dev; +Cc: Maxime Coquelin, Thomas Monjalon
The HW implements completion queues (CQs), to which it posts completion
reports when work requests complete.
CQs are used by both the Rx and Tx datapaths.
Add DevX command to create a CQ.
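The command layouts in this patch are written as structs of u8 arrays in
which each array element stands for one bit, so sizeof() yields a size in
bits and a member offset yields a bit offset. The sketch below copies the
create_cq_out layout from this patch and reads the cqn field from a reply
buffer. The DEMO_* macros and demo_* functions are hypothetical
simplifications, not the real MLX5_GET, and the buffer is assumed to be
big-endian.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One array element per bit, copied from the create_cq_out layout. */
struct demo_create_cq_out_bits {
	uint8_t status[0x8];
	uint8_t reserved_at_8[0x18];
	uint8_t syndrome[0x20];
	uint8_t reserved_at_40[0x8];
	uint8_t cqn[0x18];
	uint8_t reserved_at_60[0x20];
};

/* Bit offset and bit size of a field, and the struct size in 32-bit words. */
#define DEMO_BIT_OFF(fld) offsetof(struct demo_create_cq_out_bits, fld)
#define DEMO_BIT_SZ(fld) sizeof(((struct demo_create_cq_out_bits *)0)->fld)
#define DEMO_SZ_DW (sizeof(struct demo_create_cq_out_bits) / 32)

/*
 * Read a field of at most 32 bits that does not cross a 32-bit word
 * boundary from a big-endian buffer.
 */
static uint32_t
demo_get(const uint8_t *buf, size_t bit_off, size_t bit_sz)
{
	const uint8_t *p = buf + (bit_off / 32) * 4;
	uint32_t dw = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
		      ((uint32_t)p[2] << 8) | (uint32_t)p[3];
	unsigned int shift = (unsigned int)(32 - bit_sz - (bit_off % 32));
	uint32_t mask = bit_sz == 32 ? 0xffffffffu : ((1u << bit_sz) - 1);

	return (dw >> shift) & mask;
}

/* Extract the CQ number from a create-CQ reply buffer. */
uint32_t
demo_read_cqn(const uint8_t *out)
{
	return demo_get(out, DEMO_BIT_OFF(cqn), DEMO_BIT_SZ(cqn));
}
```

With this layout the reply is four 32-bit words long and cqn occupies the low
24 bits of the third word, which is why out[] in the patch is sized with
MLX5_ST_SZ_DW(create_cq_out).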
Signed-off-by: Matan Azrad <matan@mellanox.com>
---
 drivers/common/mlx5/mlx5_devx_cmds.c            | 57 ++++++++++++++++++++
 drivers/common/mlx5/mlx5_devx_cmds.h            | 19 +++++++
 drivers/common/mlx5/mlx5_prm.h                  | 71 +++++++++++++++++++++++++
 drivers/common/mlx5/rte_common_mlx5_version.map |  1 +
 4 files changed, 148 insertions(+)
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index aa2feeb..ef7d70c 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -1091,3 +1091,60 @@ struct mlx5_devx_obj *
 #endif
 	return -ret;
 }
+
+/**
+ * Create CQ using DevX API.
+ *
+ * @param[in] ctx
+ *   ibv_context returned from mlx5dv_open_device.
+ * @param [in] attr
+ *   Pointer to CQ attributes structure.
+ *
+ * @return
+ *   The DevX object created, NULL otherwise and rte_errno is set.
+ */
+struct mlx5_devx_obj *
+mlx5_devx_cmd_create_cq(struct ibv_context *ctx, struct mlx5_devx_cq_attr *attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(create_cq_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(create_cq_out)] = {0};
+	struct mlx5_devx_obj *cq_obj = rte_zmalloc(__func__, sizeof(*cq_obj),
+						   0);
+	void *cqctx = MLX5_ADDR_OF(create_cq_in, in, cq_context);
+
+	if (!cq_obj) {
+		DRV_LOG(ERR, "Failed to allocate CQ object memory.");
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	MLX5_SET(create_cq_in, in, opcode, MLX5_CMD_OP_CREATE_CQ);
+	if (attr->db_umem_valid) {
+		MLX5_SET(cqc, cqctx, dbr_umem_valid, attr->db_umem_valid);
+		MLX5_SET(cqc, cqctx, dbr_umem_id, attr->db_umem_id);
+		MLX5_SET64(cqc, cqctx, dbr_addr, attr->db_umem_offset);
+	} else {
+		MLX5_SET64(cqc, cqctx, dbr_addr, attr->db_addr);
+	}
+	MLX5_SET(cqc, cqctx, cc, attr->use_first_only);
+	MLX5_SET(cqc, cqctx, oi, attr->overrun_ignore);
+	MLX5_SET(cqc, cqctx, log_cq_size, attr->log_cq_size);
+	MLX5_SET(cqc, cqctx, log_page_size, attr->log_page_size);
+	MLX5_SET(cqc, cqctx, c_eqn, attr->eqn);
+	MLX5_SET(cqc, cqctx, uar_page, attr->uar_page_id);
+	if (attr->q_umem_valid) {
+		MLX5_SET(create_cq_in, in, cq_umem_valid, attr->q_umem_valid);
+		MLX5_SET(create_cq_in, in, cq_umem_id, attr->q_umem_id);
+		MLX5_SET64(create_cq_in, in, cq_umem_offset,
+			   attr->q_umem_offset);
+	}
+	cq_obj->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in), out,
+						 sizeof(out));
+	if (!cq_obj->obj) {
+		rte_errno = errno;
+		DRV_LOG(ERR, "Failed to create CQ using DevX errno=%d.", errno);
+		rte_free(cq_obj);
+		return NULL;
+	}
+	cq_obj->id = MLX5_GET(create_cq_out, out, cqn);
+	return cq_obj;
+}
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index ceeca64..7b50861 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -224,6 +224,23 @@ struct mlx5_devx_modify_sq_attr {
 };
 
 
+/* CQ attributes structure, used by CQ operations. */
+struct mlx5_devx_cq_attr {
+	uint32_t q_umem_valid:1;
+	uint32_t db_umem_valid:1;
+	uint32_t use_first_only:1;
+	uint32_t overrun_ignore:1;
+	uint32_t log_cq_size:5;
+	uint32_t log_page_size:5;
+	uint32_t uar_page_id;
+	uint32_t q_umem_id;
+	uint64_t q_umem_offset;
+	uint32_t db_umem_id;
+	uint64_t db_umem_offset;
+	uint32_t eqn;
+	uint64_t db_addr;
+};
+
 /* mlx5_devx_cmds.c */
 
 struct mlx5_devx_obj *mlx5_devx_cmd_flow_counter_alloc(struct ibv_context *ctx,
@@ -260,4 +277,6 @@ struct mlx5_devx_obj *mlx5_devx_cmd_create_tis(struct ibv_context *ctx,
 struct mlx5_devx_obj *mlx5_devx_cmd_create_td(struct ibv_context *ctx);
 int mlx5_devx_cmd_flow_dump(void *fdb_domain, void *rx_domain, void *tx_domain,
 			    FILE *file);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_cq(struct ibv_context *ctx,
+					      struct mlx5_devx_cq_attr *attr);
 #endif /* RTE_PMD_MLX5_DEVX_CMDS_H_ */
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index 058bd8c..0206a8e 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -710,6 +710,7 @@ enum {
 enum {
 	MLX5_CMD_OP_QUERY_HCA_CAP = 0x100,
 	MLX5_CMD_OP_CREATE_MKEY = 0x200,
+	MLX5_CMD_OP_CREATE_CQ = 0x400,
 	MLX5_CMD_OP_QUERY_NIC_VPORT_CONTEXT = 0x754,
 	MLX5_CMD_OP_ALLOC_TRANSPORT_DOMAIN = 0x816,
 	MLX5_CMD_OP_CREATE_TIR = 0x900,
@@ -1844,6 +1845,76 @@ struct mlx5_ifc_flow_meter_parameters_bits {
 	u8         reserved_at_8[0x60];		// 14h-1Ch
 };
 
+struct mlx5_ifc_cqc_bits {
+	u8 status[0x4];
+	u8 as_notify[0x1];
+	u8 initiator_src_dct[0x1];
+	u8 dbr_umem_valid[0x1];
+	u8 reserved_at_7[0x1];
+	u8 cqe_sz[0x3];
+	u8 cc[0x1];
+	u8 reserved_at_c[0x1];
+	u8 scqe_break_moderation_en[0x1];
+	u8 oi[0x1];
+	u8 cq_period_mode[0x2];
+	u8 cqe_comp_en[0x1];
+	u8 mini_cqe_res_format[0x2];
+	u8 st[0x4];
+	u8 reserved_at_18[0x8];
+	u8 dbr_umem_id[0x20];
+	u8 reserved_at_40[0x14];
+	u8 page_offset[0x6];
+	u8 reserved_at_5a[0x6];
+	u8 reserved_at_60[0x3];
+	u8 log_cq_size[0x5];
+	u8 uar_page[0x18];
+	u8 reserved_at_80[0x4];
+	u8 cq_period[0xc];
+	u8 cq_max_count[0x10];
+	u8 reserved_at_a0[0x18];
+	u8 c_eqn[0x8];
+	u8 reserved_at_c0[0x3];
+	u8 log_page_size[0x5];
+	u8 reserved_at_c8[0x18];
+	u8 reserved_at_e0[0x20];
+	u8 reserved_at_100[0x8];
+	u8 last_notified_index[0x18];
+	u8 reserved_at_120[0x8];
+	u8 last_solicit_index[0x18];
+	u8 reserved_at_140[0x8];
+	u8 consumer_counter[0x18];
+	u8 reserved_at_160[0x8];
+	u8 producer_counter[0x18];
+	u8 local_partition_id[0xc];
+	u8 process_id[0x14];
+	u8 reserved_at_1A0[0x20];
+	u8 dbr_addr[0x40];
+};
+
+struct mlx5_ifc_create_cq_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x8];
+	u8 cqn[0x18];
+	u8 reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_create_cq_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0x40];
+	struct mlx5_ifc_cqc_bits cq_context;
+	u8 cq_umem_offset[0x40];
+	u8 cq_umem_id[0x20];
+	u8 cq_umem_valid[0x1];
+	u8 reserved_at_2e1[0x1f];
+	u8 reserved_at_300[0x580];
+	u8 pas[];
+};
+
 /* CQE format mask. */
 #define MLX5E_CQE_FORMAT_MASK 0xc
 
diff --git a/drivers/common/mlx5/rte_common_mlx5_version.map b/drivers/common/mlx5/rte_common_mlx5_version.map
index 0c01172..c6a203d 100644
--- a/drivers/common/mlx5/rte_common_mlx5_version.map
+++ b/drivers/common/mlx5/rte_common_mlx5_version.map
@@ -1,6 +1,7 @@
 DPDK_20.02 {
 	global:
 
+	mlx5_devx_cmd_create_cq;
 	mlx5_devx_cmd_create_rq;
 	mlx5_devx_cmd_create_rqt;
 	mlx5_devx_cmd_create_sq;
-- 
1.8.3.1
* [dpdk-dev] [PATCH v1 16/38] common/mlx5: glue VAR allocation
  2020-01-20 17:02 [dpdk-dev] [PATCH v1 00/38] Introduce mlx5 vDPA driver Matan Azrad
                   ` (14 preceding siblings ...)
  2020-01-20 17:02 ` [dpdk-dev] [PATCH v1 15/38] common/mlx5: add DevX command to create CQ Matan Azrad
@ 2020-01-20 17:02 ` Matan Azrad
  2020-01-20 17:02 ` [dpdk-dev] [PATCH v1 17/38] common/mlx5: add DevX virtio emulation commands Matan Azrad
                   ` (23 subsequent siblings)
  39 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-20 17:02 UTC (permalink / raw)
  To: dev; +Cc: Maxime Coquelin, Thomas Monjalon
The virtio access region (VAR) is the UAR that is allocated for virtio
emulation access.
Add rdma-core operations to allocate and free a VAR.
Signed-off-by: Matan Azrad <matan@mellanox.com>
---
 drivers/common/mlx5/Makefile    |  5 +++++
 drivers/common/mlx5/meson.build |  1 +
 drivers/common/mlx5/mlx5_glue.c | 26 ++++++++++++++++++++++++++
 drivers/common/mlx5/mlx5_glue.h |  8 ++++++++
 4 files changed, 40 insertions(+)
diff --git a/drivers/common/mlx5/Makefile b/drivers/common/mlx5/Makefile
index 6b618ad..82403a2 100644
--- a/drivers/common/mlx5/Makefile
+++ b/drivers/common/mlx5/Makefile
@@ -175,6 +175,11 @@ mlx5_autoconf.h.new: $(RTE_SDK)/buildtools/auto-config-h.sh
 		enum MLX5_MMAP_GET_NC_PAGES_CMD \
 		$(AUTOCONF_OUTPUT)
 	$Q sh -- '$<' '$@' \
+		HAVE_IBV_VAR \
+		infiniband/mlx5dv.h \
+		func mlx5dv_alloc_var \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
 		HAVE_ETHTOOL_LINK_MODE_25G \
 		/usr/include/linux/ethtool.h \
 		enum ETHTOOL_LINK_MODE_25000baseCR_Full_BIT \
diff --git a/drivers/common/mlx5/meson.build b/drivers/common/mlx5/meson.build
index fa11fd9..74419c6 100644
--- a/drivers/common/mlx5/meson.build
+++ b/drivers/common/mlx5/meson.build
@@ -120,6 +120,7 @@ if build
 		'MLX5DV_DR_DOMAIN_TYPE_FDB' ],
 		[ 'HAVE_MLX5DV_DR_VLAN', 'infiniband/mlx5dv.h',
 		'mlx5dv_dr_action_create_push_vlan' ],
+		[ 'HAVE_IBV_VAR', 'infiniband/mlx5dv.h', 'mlx5dv_alloc_var' ],
 		[ 'HAVE_SUPPORTED_40000baseKR4_Full', 'linux/ethtool.h',
 		'SUPPORTED_40000baseKR4_Full' ],
 		[ 'HAVE_SUPPORTED_40000baseCR4_Full', 'linux/ethtool.h',
diff --git a/drivers/common/mlx5/mlx5_glue.c b/drivers/common/mlx5/mlx5_glue.c
index 8fbe6fd..a3e392c 100644
--- a/drivers/common/mlx5/mlx5_glue.c
+++ b/drivers/common/mlx5/mlx5_glue.c
@@ -1160,6 +1160,30 @@
 #endif
 }
 
+static struct mlx5dv_var *
+mlx5_glue_dv_alloc_var(struct ibv_context *context, uint32_t flags)
+{
+#ifdef HAVE_IBV_VAR
+	return mlx5dv_alloc_var(context, flags);
+#else
+	(void)context;
+	(void)flags;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static void
+mlx5_glue_dv_free_var(struct mlx5dv_var *var)
+{
+#ifdef HAVE_IBV_VAR
+	mlx5dv_free_var(var);
+#else
+	(void)var;
+	errno = ENOTSUP;
+#endif
+}
+
 alignas(RTE_CACHE_LINE_SIZE)
 const struct mlx5_glue *mlx5_glue = &(const struct mlx5_glue){
 	.version = MLX5_GLUE_VERSION,
@@ -1267,4 +1291,6 @@
 	.devx_get_event = mlx5_glue_devx_get_event,
 	.devx_alloc_uar = mlx5_glue_devx_alloc_uar,
 	.devx_free_uar = mlx5_glue_devx_free_uar,
+	.dv_alloc_var = mlx5_glue_dv_alloc_var,
+	.dv_free_var = mlx5_glue_dv_free_var,
 };
diff --git a/drivers/common/mlx5/mlx5_glue.h b/drivers/common/mlx5/mlx5_glue.h
index 7d9256e..6238b43 100644
--- a/drivers/common/mlx5/mlx5_glue.h
+++ b/drivers/common/mlx5/mlx5_glue.h
@@ -93,6 +93,11 @@
 #define MLX5DV_DEVX_CREATE_EVENT_CHANNEL_FLAGS_OMIT_EV_DATA 1
 #endif
 
+#ifndef HAVE_IBV_VAR
+struct mlx5dv_var { uint32_t page_id; uint32_t length; off_t mmap_off;
+			uint64_t comp_mask; };
+#endif
+
 /* LIB_GLUE_VERSION must be updated every time this structure is modified. */
 struct mlx5_glue {
 	const char *version;
@@ -231,6 +236,9 @@ struct mlx5_glue {
 	int (*dv_destroy_flow)(void *flow);
 	int (*dv_destroy_flow_matcher)(void *matcher);
 	struct ibv_context *(*dv_open_device)(struct ibv_device *device);
+	struct mlx5dv_var *(*dv_alloc_var)(struct ibv_context *context,
+					   uint32_t flags);
+	void (*dv_free_var)(struct mlx5dv_var *var);
 	struct mlx5dv_devx_uar *(*devx_alloc_uar)(struct ibv_context *context,
 						  uint32_t flags);
 	void (*devx_free_uar)(struct mlx5dv_devx_uar *devx_uar);
-- 
1.8.3.1
* [dpdk-dev] [PATCH v1 17/38] common/mlx5: add DevX virtio emulation commands
  2020-01-20 17:02 [dpdk-dev] [PATCH v1 00/38] Introduce mlx5 vDPA driver Matan Azrad
                   ` (15 preceding siblings ...)
  2020-01-20 17:02 ` [dpdk-dev] [PATCH v1 16/38] common/mlx5: glue VAR allocation Matan Azrad
@ 2020-01-20 17:02 ` Matan Azrad
  2020-01-20 17:02 ` [dpdk-dev] [PATCH v1 18/38] vdpa/mlx5: prepare memory regions Matan Azrad
                   ` (22 subsequent siblings)
  39 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-20 17:02 UTC (permalink / raw)
  To: dev; +Cc: Maxime Coquelin, Thomas Monjalon
Virtio emulation offload allows SW to offload the I/O operations of a
virtio virtqueue to the device, improving performance for its users.
SW supplies all the relevant virtqueue information (type, size, memory
location, doorbell information, etc.), and the device can then offload
the I/O operations of this queue according to its device type
characteristics.
Some of the virtio features, for example TSO and checksum, can be
supported according to the device capability.
Add virtio queue create, modify and query DevX commands.
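The virtq commands set several fields narrower than 32 bits, and the patch
adds a MLX5_SET16 helper for them: a read-modify-write of one big-endian
16-bit word that replaces only the bits of the target field. The sketch
below shows the same idea on a plain byte buffer. The demo_* functions and
the bit offsets used with them are hypothetical and do not reflect the real
virtio_net_q layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Set a field of at most 16 bits, fully contained in one 16-bit word,
 * inside a big-endian buffer. Bits outside the field are preserved.
 */
void
demo_set16(uint8_t *buf, size_t bit_off, size_t bit_sz, uint16_t value)
{
	uint8_t *p = buf + (bit_off / 16) * 2;
	unsigned int shift = (unsigned int)(16 - bit_sz - (bit_off % 16));
	uint16_t mask = (uint16_t)((1u << bit_sz) - 1);
	uint16_t word = (uint16_t)((p[0] << 8) | p[1]);

	/* Clear the field, then merge in the new value. */
	word = (uint16_t)((word & ~(mask << shift)) |
			  ((value & mask) << shift));
	p[0] = (uint8_t)(word >> 8);
	p[1] = (uint8_t)(word & 0xff);
}

/* Read back a field written by demo_set16(). */
uint16_t
demo_get16(const uint8_t *buf, size_t bit_off, size_t bit_sz)
{
	const uint8_t *p = buf + (bit_off / 16) * 2;
	unsigned int shift = (unsigned int)(16 - bit_sz - (bit_off % 16));
	uint16_t mask = (uint16_t)((1u << bit_sz) - 1);
	uint16_t word = (uint16_t)((p[0] << 8) | p[1]);

	/* A full 16-bit field has shift 0 and mask 0xffff. */
	return (uint16_t)((word >> shift) & mask);
}
```

Working on 16-bit words rather than 32-bit ones means that writing one 16-bit
index never touches its neighbour in the same dword.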
Signed-off-by: Matan Azrad <matan@mellanox.com>
---
 drivers/common/mlx5/mlx5_devx_cmds.c            | 199 +++++++++++++++++++++---
 drivers/common/mlx5/mlx5_devx_cmds.h            |  48 +++++-
 drivers/common/mlx5/mlx5_prm.h                  | 117 ++++++++++++++
 drivers/common/mlx5/rte_common_mlx5_version.map |   3 +
 4 files changed, 343 insertions(+), 24 deletions(-)
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index ef7d70c..f843606 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -377,24 +377,18 @@ struct mlx5_devx_obj *
 		vdpa_attr->max_num_virtio_queues =
 			MLX5_GET(virtio_emulation_cap, hcattr,
 				 max_num_virtio_queues);
-		vdpa_attr->umem_1_buffer_param_a =
-			MLX5_GET(virtio_emulation_cap, hcattr,
-				 umem_1_buffer_param_a);
-		vdpa_attr->umem_1_buffer_param_b =
-			MLX5_GET(virtio_emulation_cap, hcattr,
-				 umem_1_buffer_param_b);
-		vdpa_attr->umem_2_buffer_param_a =
-			MLX5_GET(virtio_emulation_cap, hcattr,
-				 umem_2_buffer_param_a);
-		vdpa_attr->umem_2_buffer_param_b =
-			MLX5_GET(virtio_emulation_cap, hcattr,
-				 umem_2_buffer_param_a);
-		vdpa_attr->umem_3_buffer_param_a =
-			MLX5_GET(virtio_emulation_cap, hcattr,
-				 umem_3_buffer_param_a);
-		vdpa_attr->umem_3_buffer_param_b =
-			MLX5_GET(virtio_emulation_cap, hcattr,
-				 umem_3_buffer_param_b);
+		vdpa_attr->umems[0].a = MLX5_GET(virtio_emulation_cap, hcattr,
+						 umem_1_buffer_param_a);
+		vdpa_attr->umems[0].b = MLX5_GET(virtio_emulation_cap, hcattr,
+						 umem_1_buffer_param_b);
+		vdpa_attr->umems[1].a = MLX5_GET(virtio_emulation_cap, hcattr,
+						 umem_2_buffer_param_a);
+		vdpa_attr->umems[1].b = MLX5_GET(virtio_emulation_cap, hcattr,
+						 umem_2_buffer_param_b);
+		vdpa_attr->umems[2].a = MLX5_GET(virtio_emulation_cap, hcattr,
+						 umem_3_buffer_param_a);
+		vdpa_attr->umems[2].b = MLX5_GET(virtio_emulation_cap, hcattr,
+						 umem_3_buffer_param_b);
 	}
 }
 
@@ -1148,3 +1142,172 @@ struct mlx5_devx_obj *
 	cq_obj->id = MLX5_GET(create_cq_out, out, cqn);
 	return cq_obj;
 }
+
+/**
+ * Create VIRTQ using DevX API.
+ *
+ * @param[in] ctx
+ *   ibv_context returned from mlx5dv_open_device.
+ * @param [in] attr
+ *   Pointer to VIRTQ attributes structure.
+ *
+ * @return
+ *   The DevX object created, NULL otherwise and rte_errno is set.
+ */
+struct mlx5_devx_obj *
+mlx5_devx_cmd_create_virtq(struct ibv_context *ctx,
+			   struct mlx5_devx_virtq_attr *attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(create_virtq_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(general_obj_out_cmd_hdr)] = {0};
+	struct mlx5_devx_obj *virtq_obj = rte_zmalloc(__func__,
+						     sizeof(*virtq_obj), 0);
+	void *virtq = MLX5_ADDR_OF(create_virtq_in, in, virtq);
+	void *hdr = MLX5_ADDR_OF(create_virtq_in, in, hdr);
+	void *virtctx = MLX5_ADDR_OF(virtio_net_q, virtq, virtio_q_context);
+
+	if (!virtq_obj) {
+		DRV_LOG(ERR, "Failed to allocate virtq data.");
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	MLX5_SET(general_obj_in_cmd_hdr, hdr, opcode,
+		 MLX5_CMD_OP_CREATE_GENERAL_OBJECT);
+	MLX5_SET(general_obj_in_cmd_hdr, hdr, obj_type,
+		 MLX5_GENERAL_OBJ_TYPE_VIRTQ);
+	MLX5_SET16(virtio_net_q, virtq, hw_available_index,
+		   attr->hw_available_index);
+	MLX5_SET16(virtio_net_q, virtq, hw_used_index, attr->hw_used_index);
+	MLX5_SET16(virtio_net_q, virtq, tso_ipv4, attr->tso_ipv4);
+	MLX5_SET16(virtio_net_q, virtq, tso_ipv6, attr->tso_ipv6);
+	MLX5_SET16(virtio_net_q, virtq, tx_csum, attr->tx_csum);
+	MLX5_SET16(virtio_net_q, virtq, rx_csum, attr->rx_csum);
+	MLX5_SET16(virtio_q, virtctx, virtio_version_1_0,
+		   attr->virtio_version_1_0);
+	MLX5_SET16(virtio_q, virtctx, event_mode, attr->event_mode);
+	MLX5_SET(virtio_q, virtctx, event_cqn_or_msix, attr->cq_id);
+	MLX5_SET64(virtio_q, virtctx, desc_addr, attr->desc_addr);
+	MLX5_SET64(virtio_q, virtctx, used_addr, attr->used_addr);
+	MLX5_SET64(virtio_q, virtctx, available_addr, attr->available_addr);
+	MLX5_SET16(virtio_q, virtctx, queue_index, attr->queue_index);
+	MLX5_SET16(virtio_q, virtctx, queue_size, attr->q_size);
+	MLX5_SET(virtio_q, virtctx, virtio_q_mkey, attr->mkey);
+	MLX5_SET(virtio_q, virtctx, umem_1_id, attr->umems[0].id);
+	MLX5_SET(virtio_q, virtctx, umem_1_size, attr->umems[0].size);
+	MLX5_SET64(virtio_q, virtctx, umem_1_offset, attr->umems[0].offset);
+	MLX5_SET(virtio_q, virtctx, umem_2_id, attr->umems[1].id);
+	MLX5_SET(virtio_q, virtctx, umem_2_size, attr->umems[1].size);
+	MLX5_SET64(virtio_q, virtctx, umem_2_offset, attr->umems[1].offset);
+	MLX5_SET(virtio_q, virtctx, umem_3_id, attr->umems[2].id);
+	MLX5_SET(virtio_q, virtctx, umem_3_size, attr->umems[2].size);
+	MLX5_SET64(virtio_q, virtctx, umem_3_offset, attr->umems[2].offset);
+	MLX5_SET(virtio_net_q, virtq, tisn_or_qpn, attr->tis_id);
+	virtq_obj->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in), out,
+						    sizeof(out));
+	if (!virtq_obj->obj) {
+		rte_errno = errno;
+		DRV_LOG(ERR, "Failed to create VIRTQ Obj using DevX.");
+		rte_free(virtq_obj);
+		return NULL;
+	}
+	virtq_obj->id = MLX5_GET(general_obj_out_cmd_hdr, out, obj_id);
+	return virtq_obj;
+}
+
+/**
+ * Modify VIRTQ using DevX API.
+ *
+ * @param[in] virtq_obj
+ *   Pointer to virtq object structure.
+ * @param [in] attr
+ *   Pointer to modify virtq attributes structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_devx_cmd_modify_virtq(struct mlx5_devx_obj *virtq_obj,
+			   struct mlx5_devx_virtq_attr *attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(create_virtq_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(general_obj_out_cmd_hdr)] = {0};
+	void *virtq = MLX5_ADDR_OF(create_virtq_in, in, virtq);
+	void *hdr = MLX5_ADDR_OF(create_virtq_in, in, hdr);
+	void *virtctx = MLX5_ADDR_OF(virtio_net_q, virtq, virtio_q_context);
+	int ret;
+
+	MLX5_SET(general_obj_in_cmd_hdr, hdr, opcode,
+		 MLX5_CMD_OP_MODIFY_GENERAL_OBJECT);
+	MLX5_SET(general_obj_in_cmd_hdr, hdr, obj_type,
+		 MLX5_GENERAL_OBJ_TYPE_VIRTQ);
+	MLX5_SET(general_obj_in_cmd_hdr, hdr, obj_id, virtq_obj->id);
+	MLX5_SET64(virtio_net_q, virtq, modify_field_select, attr->type);
+	MLX5_SET16(virtio_q, virtctx, queue_index, attr->queue_index);
+	switch (attr->type) {
+	case MLX5_VIRTQ_MODIFY_TYPE_STATE:
+		MLX5_SET16(virtio_net_q, virtq, state, attr->state);
+		break;
+	case MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS:
+		MLX5_SET(virtio_net_q, virtq, dirty_bitmap_mkey,
+			 attr->dirty_bitmap_mkey);
+		MLX5_SET64(virtio_net_q, virtq, dirty_bitmap_addr,
+			 attr->dirty_bitmap_addr);
+		MLX5_SET(virtio_net_q, virtq, dirty_bitmap_size,
+			 attr->dirty_bitmap_size);
+		break;
+	case MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE:
+		MLX5_SET(virtio_net_q, virtq, dirty_bitmap_dump_enable,
+			 attr->dirty_bitmap_dump_enable);
+		break;
+	default:
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	ret = mlx5_glue->devx_obj_modify(virtq_obj->obj, in, sizeof(in),
+					 out, sizeof(out));
+	if (ret) {
+		DRV_LOG(ERR, "Failed to modify VIRTQ using DevX.");
+		rte_errno = errno;
+		return -errno;
+	}
+	return ret;
+}
+
+/**
+ * Query VIRTQ using DevX API.
+ *
+ * @param[in] virtq_obj
+ *   Pointer to virtq object structure.
+ * @param [in/out] attr
+ *   Pointer to virtq attributes structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_devx_cmd_query_virtq(struct mlx5_devx_obj *virtq_obj,
+			   struct mlx5_devx_virtq_attr *attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(general_obj_in_cmd_hdr)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(query_virtq_out)] = {0};
+	void *hdr = MLX5_ADDR_OF(query_virtq_out, in, hdr);
+	void *virtq = MLX5_ADDR_OF(query_virtq_out, out, virtq);
+	int ret;
+
+	MLX5_SET(general_obj_in_cmd_hdr, hdr, opcode,
+		 MLX5_CMD_OP_QUERY_GENERAL_OBJECT);
+	MLX5_SET(general_obj_in_cmd_hdr, hdr, obj_type,
+		 MLX5_GENERAL_OBJ_TYPE_VIRTQ);
+	MLX5_SET(general_obj_in_cmd_hdr, hdr, obj_id, virtq_obj->id);
+	ret = mlx5_glue->devx_obj_query(virtq_obj->obj, in, sizeof(in),
+					 out, sizeof(out));
+	if (ret) {
+		DRV_LOG(ERR, "Failed to query VIRTQ using DevX.");
+		rte_errno = errno;
+		return -errno;
+	}
+	attr->hw_available_index = MLX5_GET16(virtio_net_q, virtq,
+					      hw_available_index);
+	attr->hw_used_index = MLX5_GET16(virtio_net_q, virtq, hw_used_index);
+	return ret;
+}
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index 7b50861..63c84f8 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -56,12 +56,10 @@ struct mlx5_hca_vdpa_attr {
 	uint32_t log_doorbell_stride:5;
 	uint32_t log_doorbell_bar_size:5;
 	uint32_t max_num_virtio_queues;
-	uint32_t umem_1_buffer_param_a;
-	uint32_t umem_1_buffer_param_b;
-	uint32_t umem_2_buffer_param_a;
-	uint32_t umem_2_buffer_param_b;
-	uint32_t umem_3_buffer_param_a;
-	uint32_t umem_3_buffer_param_b;
+	struct {
+		uint32_t a;
+		uint32_t b;
+	} umems[3];
 	uint64_t doorbell_bar_offset;
 };
 
@@ -241,6 +239,37 @@ struct mlx5_devx_cq_attr {
 	uint64_t db_addr;
 };
 
+/* Virtq attributes structure, used by VIRTQ operations. */
+struct mlx5_devx_virtq_attr {
+	uint16_t hw_available_index;
+	uint16_t hw_used_index;
+	uint16_t q_size;
+	uint32_t virtio_version_1_0:1;
+	uint32_t tso_ipv4:1;
+	uint32_t tso_ipv6:1;
+	uint32_t tx_csum:1;
+	uint32_t rx_csum:1;
+	uint32_t event_mode:3;
+	uint32_t state:4;
+	uint32_t dirty_bitmap_dump_enable:1;
+	uint32_t dirty_bitmap_mkey;
+	uint32_t dirty_bitmap_size;
+	uint32_t mkey;
+	uint32_t cq_id;
+	uint32_t queue_index;
+	uint32_t tis_id;
+	uint64_t dirty_bitmap_addr;
+	uint64_t type;
+	uint64_t desc_addr;
+	uint64_t used_addr;
+	uint64_t available_addr;
+	struct {
+		uint32_t id;
+		uint32_t size;
+		uint64_t offset;
+	} umems[3];
+};
+
 /* mlx5_devx_cmds.c */
 
 struct mlx5_devx_obj *mlx5_devx_cmd_flow_counter_alloc(struct ibv_context *ctx,
@@ -279,4 +308,11 @@ int mlx5_devx_cmd_flow_dump(void *fdb_domain, void *rx_domain, void *tx_domain,
 			    FILE *file);
 struct mlx5_devx_obj *mlx5_devx_cmd_create_cq(struct ibv_context *ctx,
 					      struct mlx5_devx_cq_attr *attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_virtq(struct ibv_context *ctx,
+					     struct mlx5_devx_virtq_attr *attr);
+int mlx5_devx_cmd_modify_virtq(struct mlx5_devx_obj *virtq_obj,
+			       struct mlx5_devx_virtq_attr *attr);
+int mlx5_devx_cmd_query_virtq(struct mlx5_devx_obj *virtq_obj,
+			      struct mlx5_devx_virtq_attr *attr);
+
 #endif /* RTE_PMD_MLX5_DEVX_CMDS_H_ */
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index 0206a8e..6db89bb 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -527,6 +527,8 @@ struct mlx5_modification_cmd {
 #define __mlx5_16_bit_off(typ, fld) (16 - __mlx5_bit_sz(typ, fld) - \
 				    (__mlx5_bit_off(typ, fld) & 0xf))
 #define __mlx5_mask16(typ, fld) ((u16)((1ull << __mlx5_bit_sz(typ, fld)) - 1))
+#define __mlx5_16_mask(typ, fld) (__mlx5_mask16(typ, fld) << \
+				  __mlx5_16_bit_off(typ, fld))
 #define MLX5_ST_SZ_BYTES(typ) (sizeof(struct mlx5_ifc_##typ##_bits) / 8)
 #define MLX5_ST_SZ_DW(typ) (sizeof(struct mlx5_ifc_##typ##_bits) / 32)
 #define MLX5_BYTE_OFF(typ, fld) (__mlx5_bit_off(typ, fld) / 8)
@@ -551,6 +553,17 @@ struct mlx5_modification_cmd {
 			rte_cpu_to_be_64(v); \
 	} while (0)
 
+#define MLX5_SET16(typ, p, fld, v) \
+	do { \
+		u16 _v = v; \
+		*((__be16 *)(p) + __mlx5_16_off(typ, fld)) = \
+		rte_cpu_to_be_16((rte_be_to_cpu_16(*((__be16 *)(p) + \
+				  __mlx5_16_off(typ, fld))) & \
+				  (~__mlx5_16_mask(typ, fld))) | \
+				 (((_v) & __mlx5_mask16(typ, fld)) << \
+				  __mlx5_16_bit_off(typ, fld))); \
+	} while (0)
+
 #define MLX5_GET(typ, p, fld) \
 	((rte_be_to_cpu_32(*((__be32 *)(p) +\
 	__mlx5_dw_off(typ, fld))) >> __mlx5_dw_bit_off(typ, fld)) & \
@@ -723,6 +736,9 @@ enum {
 	MLX5_CMD_OP_CREATE_RQT = 0x916,
 	MLX5_CMD_OP_ALLOC_FLOW_COUNTER = 0x939,
 	MLX5_CMD_OP_QUERY_FLOW_COUNTER = 0x93b,
+	MLX5_CMD_OP_CREATE_GENERAL_OBJECT = 0xa00,
+	MLX5_CMD_OP_MODIFY_GENERAL_OBJECT = 0xa01,
+	MLX5_CMD_OP_QUERY_GENERAL_OBJECT = 0xa02,
 };
 
 enum {
@@ -1689,6 +1705,11 @@ struct mlx5_ifc_create_tir_in_bits {
 	struct mlx5_ifc_tirc_bits ctx;
 };
 
+enum {
+	MLX5_INLINE_Q_TYPE_RQ = 0x0,
+	MLX5_INLINE_Q_TYPE_VIRTQ = 0x1,
+};
+
 struct mlx5_ifc_rq_num_bits {
 	u8 reserved_at_0[0x8];
 	u8 rq_num[0x18];
@@ -1915,6 +1936,102 @@ struct mlx5_ifc_create_cq_in_bits {
 	u8 pas[];
 };
 
+enum {
+	MLX5_GENERAL_OBJ_TYPE_VIRTQ = 0x000d,
+};
+
+struct mlx5_ifc_general_obj_in_cmd_hdr_bits {
+	u8 opcode[0x10];
+	u8 reserved_at_10[0x20];
+	u8 obj_type[0x10];
+	u8 obj_id[0x20];
+	u8 reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_general_obj_out_cmd_hdr_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 obj_id[0x20];
+	u8 reserved_at_60[0x20];
+};
+
+enum {
+	MLX5_VIRTQ_STATE_INIT = 0,
+	MLX5_VIRTQ_STATE_RDY = 1,
+	MLX5_VIRTQ_STATE_SUSPEND = 2,
+	MLX5_VIRTQ_STATE_ERROR = 3,
+};
+
+enum {
+	MLX5_VIRTQ_MODIFY_TYPE_STATE = (1UL << 0),
+	MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS = (1UL << 3),
+	MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE = (1UL << 4),
+};
+
+struct mlx5_ifc_virtio_q_bits {
+	u8 virtio_q_type[0x8];
+	u8 reserved_at_8[0x5];
+	u8 event_mode[0x3];
+	u8 queue_index[0x10];
+	u8 full_emulation[0x1];
+	u8 virtio_version_1_0[0x1];
+	u8 reserved_at_22[0x2];
+	u8 offload_type[0x4];
+	u8 event_cqn_or_msix[0x18];
+	u8 doorbell_stride_idx[0x10];
+	u8 queue_size[0x10];
+	u8 device_emulation_id[0x20];
+	u8 desc_addr[0x40];
+	u8 used_addr[0x40];
+	u8 available_addr[0x40];
+	u8 virtio_q_mkey[0x20];
+	u8 reserved_at_160[0x20];
+	u8 umem_1_id[0x20];
+	u8 umem_1_size[0x20];
+	u8 umem_1_offset[0x40];
+	u8 umem_2_id[0x20];
+	u8 umem_2_size[0x20];
+	u8 umem_2_offset[0x40];
+	u8 umem_3_id[0x20];
+	u8 umem_3_size[0x20];
+	u8 umem_3_offset[0x40];
+	u8 reserved_at_300[0x100];
+};
+
+struct mlx5_ifc_virtio_net_q_bits {
+	u8 modify_field_select[0x40];
+	u8 reserved_at_40[0x40];
+	u8 tso_ipv4[0x1];
+	u8 tso_ipv6[0x1];
+	u8 tx_csum[0x1];
+	u8 rx_csum[0x1];
+	u8 reserved_at_84[0x6];
+	u8 dirty_bitmap_dump_enable[0x1];
+	u8 vhost_log_page[0x5];
+	u8 reserved_at_90[0xc];
+	u8 state[0x4];
+	u8 error_type[0x8];
+	u8 tisn_or_qpn[0x18];
+	u8 dirty_bitmap_mkey[0x20];
+	u8 dirty_bitmap_size[0x20];
+	u8 dirty_bitmap_addr[0x40];
+	u8 hw_available_index[0x10];
+	u8 hw_used_index[0x10];
+	u8 reserved_at_160[0xa0];
+	struct mlx5_ifc_virtio_q_bits virtio_q_context;
+};
+
+struct mlx5_ifc_create_virtq_in_bits {
+	struct mlx5_ifc_general_obj_in_cmd_hdr_bits hdr;
+	struct mlx5_ifc_virtio_net_q_bits virtq;
+};
+
+struct mlx5_ifc_query_virtq_out_bits {
+	struct mlx5_ifc_general_obj_in_cmd_hdr_bits hdr;
+	struct mlx5_ifc_virtio_net_q_bits virtq;
+};
+
 /* CQE format mask. */
 #define MLX5E_CQE_FORMAT_MASK 0xc
 
diff --git a/drivers/common/mlx5/rte_common_mlx5_version.map b/drivers/common/mlx5/rte_common_mlx5_version.map
index c6a203d..f3082ce 100644
--- a/drivers/common/mlx5/rte_common_mlx5_version.map
+++ b/drivers/common/mlx5/rte_common_mlx5_version.map
@@ -8,6 +8,7 @@ DPDK_20.02 {
 	mlx5_devx_cmd_create_tir;
 	mlx5_devx_cmd_create_td;
 	mlx5_devx_cmd_create_tis;
+	mlx5_devx_cmd_create_virtq;
 	mlx5_devx_cmd_destroy;
 	mlx5_devx_cmd_flow_counter_alloc;
 	mlx5_devx_cmd_flow_counter_query;
@@ -15,8 +16,10 @@ DPDK_20.02 {
 	mlx5_devx_cmd_mkey_create;
 	mlx5_devx_cmd_modify_rq;
 	mlx5_devx_cmd_modify_sq;
+	mlx5_devx_cmd_modify_virtq;
 	mlx5_devx_cmd_qp_query_tis_td;
 	mlx5_devx_cmd_query_hca_attr;
+	mlx5_devx_cmd_query_virtq;
 	mlx5_devx_get_out_command_status;
 
 	mlx5_dev_to_pci_addr;
-- 
1.8.3.1
* [dpdk-dev] [PATCH v1 18/38] vdpa/mlx5: prepare memory regions
From: Matan Azrad @ 2020-01-20 17:02 UTC (permalink / raw)
  To: dev; +Cc: Maxime Coquelin, Thomas Monjalon
Memory regions (MRs) are created in order to map the guest physical
addresses used by the guest side of the virtio device to the host
physical addresses used by the HW on the host side.
This way, for example, the HW can translate the addresses of the
packets posted by the guest and take the packets from the correct
place.
The design is to work with a single MR that is configured to the
virtio queues in the HW; hence, many direct MRs are grouped into a
single indirect MR.
Add functions to prepare and release the MRs together with all the
resources they require.
Add a new file, mlx5_vdpa_mem.c, to manage all the MR-related code
in the driver.
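As an illustration of the grouping (a sketch only, not the driver
code), the block below computes, for a sorted list of guest regions,
the fixed buffer size and the number of indirect mkey entries each mode
would need. All sketch_ names and the SKETCH_MAX_KLM_BYTES constant are
hypothetical stand-ins; the 2G limit is taken from the comments in this
patch, and the driver itself uses MLX5_MAX_KLM_BYTE_COUNT.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for MLX5_MAX_KLM_BYTE_COUNT (2G per the patch). */
#define SKETCH_MAX_KLM_BYTES (1ULL << 31)

struct sketch_region {
	uint64_t gpa;  /* Guest physical address. */
	uint64_t size; /* Region size in bytes. */
};

static uint64_t
sketch_gcd(uint64_t a, uint64_t b)
{
	while (b != 0) {
		uint64_t t = a % b;

		a = b;
		b = t;
	}
	return a;
}

/* Largest power of two that divides v (v must be non-zero). */
static uint64_t
sketch_pow2_divisor(uint64_t v)
{
	return v & (~v + 1);
}

/*
 * Regions must be sorted by gpa and must not overlap.
 * Returns the fixed buffer size (a power of two, at most 2G) and fills
 * the entry count needed by each mode.
 */
static uint64_t
sketch_entry_counts(const struct sketch_region *r, size_t n,
		    uint64_t *klm_entries, uint64_t *fbs_entries)
{
	const uint64_t max = SKETCH_MAX_KLM_BYTES;
	uint64_t gcd = 0;
	uint64_t span;
	size_t i;

	assert(n > 0);
	*klm_entries = 0;
	for (i = 0; i < n; i++) {
		if (i > 0) {
			/* The hole before this region gets entries too. */
			uint64_t hole = r[i].gpa -
					(r[i - 1].gpa + r[i - 1].size);

			gcd = sketch_gcd(gcd, hole);
			*klm_entries += (hole + max - 1) / max;
		}
		gcd = sketch_gcd(gcd, r[i].size);
		*klm_entries += (r[i].size + max - 1) / max;
	}
	assert(gcd != 0);
	gcd = sketch_pow2_divisor(gcd);
	if (gcd > max)
		gcd = max;
	span = r[n - 1].gpa + r[n - 1].size - r[0].gpa;
	*fbs_entries = span / gcd;
	return gcd;
}
```

For two regions {0x0, 0x1000} and {0x3000, 0x2000} this gives a fixed
buffer size of 0x1000, 3 variable-size entries and 5 fixed-size
entries. The driver then chooses the mode by comparing such counts
against its own limits.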
Signed-off-by: Matan Azrad <matan@mellanox.com>
---
 drivers/vdpa/mlx5/Makefile        |   4 +-
 drivers/vdpa/mlx5/meson.build     |   3 +-
 drivers/vdpa/mlx5/mlx5_vdpa.c     |  11 +-
 drivers/vdpa/mlx5/mlx5_vdpa.h     |  60 +++++++
 drivers/vdpa/mlx5/mlx5_vdpa_mem.c | 351 ++++++++++++++++++++++++++++++++++++++
 5 files changed, 418 insertions(+), 11 deletions(-)
 create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa.h
 create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_mem.c
diff --git a/drivers/vdpa/mlx5/Makefile b/drivers/vdpa/mlx5/Makefile
index c1c8cc0..5472797 100644
--- a/drivers/vdpa/mlx5/Makefile
+++ b/drivers/vdpa/mlx5/Makefile
@@ -8,6 +8,7 @@ LIB = librte_pmd_mlx5_vdpa.a
 
 # Sources.
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa.c
+SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa_mem.c
 
 # Basic CFLAGS.
 CFLAGS += -O3
@@ -15,6 +16,7 @@ CFLAGS += -std=c11 -Wall -Wextra
 CFLAGS += -g
 CFLAGS += -I$(RTE_SDK)/drivers/common/mlx5
 CFLAGS += -I$(RTE_SDK)/drivers/net/mlx5_vdpa
+CFLAGS += -I$(RTE_SDK)/lib/librte_sched
 CFLAGS += -I$(BUILDDIR)/drivers/common/mlx5
 CFLAGS += -D_BSD_SOURCE
 CFLAGS += -D_DEFAULT_SOURCE
@@ -22,7 +24,7 @@ CFLAGS += -D_XOPEN_SOURCE=600
 CFLAGS += $(WERROR_FLAGS)
 CFLAGS += -Wno-strict-prototypes
 LDLIBS += -lrte_common_mlx5
-LDLIBS += -lrte_eal -lrte_vhost -lrte_kvargs -lrte_bus_pci
+LDLIBS += -lrte_eal -lrte_vhost -lrte_kvargs -lrte_bus_pci -lrte_sched
 
 # A few warnings cannot be avoided in external headers.
 CFLAGS += -Wno-error=cast-qual
diff --git a/drivers/vdpa/mlx5/meson.build b/drivers/vdpa/mlx5/meson.build
index 4bca6ea..7e5dd95 100644
--- a/drivers/vdpa/mlx5/meson.build
+++ b/drivers/vdpa/mlx5/meson.build
@@ -9,9 +9,10 @@ endif
 
 fmt_name = 'mlx5_vdpa'
 allow_experimental_apis = true
-deps += ['hash', 'common_mlx5', 'vhost', 'bus_pci', 'eal']
+deps += ['hash', 'common_mlx5', 'vhost', 'bus_pci', 'eal', 'sched']
 sources = files(
 	'mlx5_vdpa.c',
+	'mlx5_vdpa_mem.c',
 )
 cflags_options = [
 	'-std=c11',
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index f8dff3e..f4af74e 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -7,7 +7,6 @@
 #include <rte_log.h>
 #include <rte_errno.h>
 #include <rte_bus_pci.h>
-#include <rte_vdpa.h>
 
 #include <mlx5_glue.h>
 #include <mlx5_common.h>
@@ -15,16 +14,9 @@
 #include <mlx5_prm.h>
 
 #include "mlx5_vdpa_utils.h"
+#include "mlx5_vdpa.h"
 
 
-struct mlx5_vdpa_priv {
-	TAILQ_ENTRY(mlx5_vdpa_priv) next;
-	int id; /* vDPA device id. */
-	struct ibv_context *ctx; /* Device context. */
-	struct rte_vdpa_dev_addr dev_addr;
-	struct mlx5_hca_vdpa_attr caps;
-};
-
 #ifndef VIRTIO_F_ORDER_PLATFORM
 #define VIRTIO_F_ORDER_PLATFORM 36
 #endif
@@ -230,6 +222,7 @@ struct mlx5_vdpa_priv {
 		rte_errno = rte_errno ? rte_errno : EINVAL;
 		goto error;
 	}
+	SLIST_INIT(&priv->mr_list);
 	pthread_mutex_lock(&priv_list_lock);
 	TAILQ_INSERT_TAIL(&priv_list, priv, next);
 	pthread_mutex_unlock(&priv_list_lock);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
new file mode 100644
index 0000000..e27baea
--- /dev/null
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -0,0 +1,60 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2019 Mellanox Technologies, Ltd
+ */
+
+#ifndef RTE_PMD_MLX5_VDPA_H_
+#define RTE_PMD_MLX5_VDPA_H_
+
+#include <sys/queue.h>
+
+#include <rte_vdpa.h>
+#include <rte_vhost.h>
+
+#include <mlx5_glue.h>
+#include <mlx5_devx_cmds.h>
+
+struct mlx5_vdpa_query_mr {
+	SLIST_ENTRY(mlx5_vdpa_query_mr) next;
+	void *addr;
+	uint64_t length;
+	struct mlx5dv_devx_umem *umem;
+	struct mlx5_devx_obj *mkey;
+	int is_indirect;
+};
+
+struct mlx5_vdpa_priv {
+	TAILQ_ENTRY(mlx5_vdpa_priv) next;
+	int id; /* vDPA device id. */
+	int vid; /* vhost device id. */
+	struct ibv_context *ctx; /* Device context. */
+	struct rte_vdpa_dev_addr dev_addr;
+	struct mlx5_hca_vdpa_attr caps;
+	uint32_t pdn; /* Protection Domain number. */
+	struct ibv_pd *pd;
+	uint32_t gpa_mkey_index;
+	struct ibv_mr *null_mr;
+	struct rte_vhost_memory *vmem;
+	SLIST_HEAD(mr_list, mlx5_vdpa_query_mr) mr_list;
+};
+
+/**
+ * Release all the prepared memory regions and all their related resources.
+ *
+ * @param[in] priv
+ *   The vdpa driver private structure.
+ */
+void mlx5_vdpa_mem_dereg(struct mlx5_vdpa_priv *priv);
+
+/**
+ * Register all the memory regions of the virtio device to the HW and allocate
+ * all their related resources.
+ *
+ * @param[in] priv
+ *   The vdpa driver private structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv);
+
+#endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_mem.c b/drivers/vdpa/mlx5/mlx5_vdpa_mem.c
new file mode 100644
index 0000000..e060dfa
--- /dev/null
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_mem.c
@@ -0,0 +1,351 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2019 Mellanox Technologies, Ltd
+ */
+#include <assert.h>
+#include <stdlib.h>
+
+#include <rte_malloc.h>
+#include <rte_errno.h>
+#include <rte_common.h>
+#include <rte_sched_common.h>
+
+#include <mlx5_prm.h>
+
+#include "mlx5_vdpa_utils.h"
+#include "mlx5_vdpa.h"
+
+static int
+mlx5_vdpa_pd_prepare(struct mlx5_vdpa_priv *priv)
+{
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+	if (priv->pd)
+		return 0;
+	priv->pd = mlx5_glue->alloc_pd(priv->ctx);
+	if (priv->pd == NULL) {
+		DRV_LOG(ERR, "Failed to allocate PD.");
+		return errno ? -errno : -ENOMEM;
+	}
+	struct mlx5dv_obj obj;
+	struct mlx5dv_pd pd_info;
+	int ret = 0;
+
+	obj.pd.in = priv->pd;
+	obj.pd.out = &pd_info;
+	ret = mlx5_glue->dv_init_obj(&obj, MLX5DV_OBJ_PD);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to get PD object info.");
+		mlx5_glue->dealloc_pd(priv->pd);
+		priv->pd = NULL;
+		return -errno;
+	}
+	priv->pdn = pd_info.pdn;
+	return 0;
+#else
+	(void)priv;
+	DRV_LOG(ERR, "Cannot get pdn - no DV support.");
+	return -ENOTSUP;
+#endif /* HAVE_IBV_FLOW_DV_SUPPORT */
+}
+
+void
+mlx5_vdpa_mem_dereg(struct mlx5_vdpa_priv *priv)
+{
+	struct mlx5_vdpa_query_mr *entry;
+	struct mlx5_vdpa_query_mr *next;
+	int ret __rte_unused;
+
+	entry = SLIST_FIRST(&priv->mr_list);
+	while (entry) {
+		next = SLIST_NEXT(entry, next);
+		ret = mlx5_devx_cmd_destroy(entry->mkey);
+		assert(!ret);
+		if (!entry->is_indirect) {
+			ret = mlx5_glue->devx_umem_dereg(entry->umem);
+			assert(!ret);
+		}
+		SLIST_REMOVE(&priv->mr_list, entry, mlx5_vdpa_query_mr, next);
+		rte_free(entry);
+		entry = next;
+	}
+	SLIST_INIT(&priv->mr_list);
+	if (priv->null_mr) {
+		ret = mlx5_glue->dereg_mr(priv->null_mr);
+		assert(!ret);
+		priv->null_mr = NULL;
+	}
+	if (priv->pd) {
+		ret = mlx5_glue->dealloc_pd(priv->pd);
+		assert(!ret);
+		priv->pd = NULL;
+	}
+	if (priv->vmem) {
+		free(priv->vmem);
+		priv->vmem = NULL;
+	}
+}
+
+static int
+mlx5_vdpa_regions_addr_cmp(const void *a, const void *b)
+{
+	const struct rte_vhost_mem_region *region_a = a;
+	const struct rte_vhost_mem_region *region_b = b;
+
+	if (region_a->guest_phys_addr < region_b->guest_phys_addr)
+		return -1;
+	if (region_a->guest_phys_addr > region_b->guest_phys_addr)
+		return 1;
+	return 0;
+}
+
+#define KLM_NUM_MAX_ALIGN(sz) (RTE_ALIGN_CEIL(sz, MLX5_MAX_KLM_BYTE_COUNT) / \
+			       MLX5_MAX_KLM_BYTE_COUNT)
+
+/*
+ * Allocate and sort the region list and choose indirect mkey mode:
+ *   1. Calculate GCD, guest memory size and indirect mkey entries num per mode.
+ *   2. Align GCD to the maximum allowed size (2G) and to be power of 2.
+ *   3. Decide the indirect mkey mode according to the following rules:
+ *         a. If both KLM_FBS entries number and KLM entries number are bigger
+ *            than the maximum allowed(max of uint16_t) - error.
+ *         b. KLM mode if KLM_FBS entries number is bigger than the maximum
+ *            allowed(max of uint16_t).
+ *         c. KLM mode if GCD is smaller than the minimum allowed(4K).
+ *         d. KLM mode if the total size of KLM entries is in one cache line
+ *            and the total size of KLM_FBS entries is not in one cache line.
+ *         e. Otherwise, KLM_FBS mode.
+ */
+static struct rte_vhost_memory *
+mlx5_vdpa_vhost_mem_regions_prepare(int vid, uint8_t *mode, uint64_t *mem_size,
+				    uint64_t *gcd, uint32_t *entries_num)
+{
+	struct rte_vhost_memory *mem;
+	uint64_t size;
+	uint64_t klm_entries_num = 0;
+	uint64_t klm_fbs_entries_num;
+	uint32_t i;
+	int ret = rte_vhost_get_mem_table(vid, &mem);
+
+	if (ret < 0) {
+		DRV_LOG(ERR, "Failed to get VM memory layout vid = %d.", vid);
+		rte_errno = EINVAL;
+		return NULL;
+	}
+	qsort(mem->regions, mem->nregions, sizeof(mem->regions[0]),
+	      mlx5_vdpa_regions_addr_cmp);
+	*mem_size = (mem->regions[(mem->nregions - 1)].guest_phys_addr) +
+				      (mem->regions[(mem->nregions - 1)].size) -
+					      (mem->regions[0].guest_phys_addr);
+	*gcd = 0;
+	for (i = 0; i < mem->nregions; ++i) {
+		DRV_LOG(INFO,  "Region %u: HVA 0x%" PRIx64 ", GPA 0x%" PRIx64
+			", size 0x%" PRIx64 ".", i,
+			mem->regions[i].host_user_addr,
+			mem->regions[i].guest_phys_addr, mem->regions[i].size);
+		if (i > 0) {
+			/* Hole handle. */
+			size = mem->regions[i].guest_phys_addr -
+				(mem->regions[i - 1].guest_phys_addr +
+				 mem->regions[i - 1].size);
+			*gcd = rte_get_gcd(*gcd, size);
+			klm_entries_num += KLM_NUM_MAX_ALIGN(size);
+		}
+		size = mem->regions[i].size;
+		*gcd = rte_get_gcd(*gcd, size);
+		klm_entries_num += KLM_NUM_MAX_ALIGN(size);
+	}
+	if (*gcd > MLX5_MAX_KLM_BYTE_COUNT)
+		*gcd = rte_get_gcd(*gcd, MLX5_MAX_KLM_BYTE_COUNT);
+	if (!RTE_IS_POWER_OF_2(*gcd)) {
+		uint64_t candidate_gcd = rte_align64prevpow2(*gcd);
+
+		while (candidate_gcd > 1 && (*gcd % candidate_gcd))
+			candidate_gcd /= 2;
+		DRV_LOG(DEBUG, "GCD 0x%" PRIx64 " is not power of 2. Adjusted "
+			"GCD is 0x%" PRIx64 ".", *gcd, candidate_gcd);
+		*gcd = candidate_gcd;
+	}
+	klm_fbs_entries_num = *mem_size / *gcd;
+	if (*gcd < MLX5_MIN_KLM_FIXED_BUFFER_SIZE || klm_fbs_entries_num >
+	    UINT16_MAX || ((klm_entries_num * sizeof(struct mlx5_klm)) <=
+	    RTE_CACHE_LINE_SIZE && (klm_fbs_entries_num *
+				    sizeof(struct mlx5_klm)) >
+							RTE_CACHE_LINE_SIZE)) {
+		*mode = MLX5_MKC_ACCESS_MODE_KLM;
+		*entries_num = klm_entries_num;
+		DRV_LOG(INFO, "Indirect mkey mode is KLM.");
+	} else {
+		*mode = MLX5_MKC_ACCESS_MODE_KLM_FBS;
+		*entries_num = klm_fbs_entries_num;
+		DRV_LOG(INFO, "Indirect mkey mode is KLM Fixed Buffer Size.");
+	}
+	DRV_LOG(DEBUG, "Memory registration information: nregions = %u, "
+		"mem_size = 0x%" PRIx64 ", GCD = 0x%" PRIx64
+		", klm_fbs_entries_num = 0x%" PRIx64 ", klm_entries_num = 0x%"
+		PRIx64 ".", mem->nregions, *mem_size, *gcd, klm_fbs_entries_num,
+		klm_entries_num);
+	if (*entries_num > UINT16_MAX) {
+		DRV_LOG(ERR, "Failed to prepare memory of vid %d - memory is "
+			"too fragmented.", vid);
+		free(mem);
+		return NULL;
+	}
+	return mem;
+}
+
+#define KLM_SIZE_MAX_ALIGN(sz) ((sz) > MLX5_MAX_KLM_BYTE_COUNT ? \
+				MLX5_MAX_KLM_BYTE_COUNT : (sz))
+
+/*
+ * The target here is to group all the physical memory regions of the
+ * virtio device in one indirect mkey.
+ * For KLM Fixed Buffer Size mode (HW finds the translation entry in one
+ * read according to the guest physical address):
+ * All the sub-direct mkeys of it must be of the same size, hence each
+ * one of them should be of the GCD size of all the virtio memory
+ * regions and the holes between them.
+ * For KLM mode (each entry may be of a different size, so HW must
+ * iterate the entries):
+ * Each virtio memory region and each hole between them has one entry;
+ * entries whose memory regions are bigger than the maximum allowed
+ * size (2G) are split so that no entry exceeds it.
+ * It means that each virtio memory region may be mapped to more than
+ * one direct mkey in the 2 modes.
+ * All the holes of invalid memory between the virtio memory regions
+ * will be mapped to the null memory region for security.
+ */
+int
+mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv)
+{
+	struct mlx5_devx_mkey_attr mkey_attr;
+	struct mlx5_vdpa_query_mr *entry = NULL;
+	struct rte_vhost_mem_region *reg = NULL;
+	uint8_t mode;
+	uint32_t entries_num = 0;
+	uint32_t i;
+	uint64_t gcd;
+	uint64_t klm_size;
+	uint64_t mem_size;
+	uint64_t k;
+	int klm_index = 0;
+	int ret;
+	struct rte_vhost_memory *mem = mlx5_vdpa_vhost_mem_regions_prepare
+			      (priv->vid, &mode, &mem_size, &gcd, &entries_num);
+	struct mlx5_klm klm_array[entries_num];
+
+	if (!mem)
+		return -rte_errno;
+	priv->vmem = mem;
+	ret = mlx5_vdpa_pd_prepare(priv);
+	if (ret)
+		goto error;
+	priv->null_mr = mlx5_glue->alloc_null_mr(priv->pd);
+	if (!priv->null_mr) {
+		DRV_LOG(ERR, "Failed to allocate null MR.");
+		ret = -errno;
+		goto error;
+	}
+	DRV_LOG(DEBUG, "Dump fill Mkey = %u.", priv->null_mr->lkey);
+	for (i = 0; i < mem->nregions; i++) {
+		reg = &mem->regions[i];
+		entry = rte_zmalloc(__func__, sizeof(*entry), 0);
+		if (!entry) {
+			ret = -ENOMEM;
+			DRV_LOG(ERR, "Failed to allocate mem entry memory.");
+			goto error;
+		}
+		entry->umem = mlx5_glue->devx_umem_reg(priv->ctx,
+					 (void *)(uintptr_t)reg->host_user_addr,
+					     reg->size, IBV_ACCESS_LOCAL_WRITE);
+		if (!entry->umem) {
+			DRV_LOG(ERR, "Failed to register Umem by Devx.");
+			ret = -errno;
+			goto error;
+		}
+		mkey_attr.addr = (uintptr_t)(reg->guest_phys_addr);
+		mkey_attr.size = reg->size;
+		mkey_attr.umem_id = entry->umem->umem_id;
+		mkey_attr.pd = priv->pdn;
+		mkey_attr.pg_access = 1;
+		mkey_attr.klm_array = NULL;
+		mkey_attr.klm_num = 0;
+		entry->mkey = mlx5_devx_cmd_mkey_create(priv->ctx, &mkey_attr);
+		if (!entry->mkey) {
+			DRV_LOG(ERR, "Failed to create direct Mkey.");
+			ret = -rte_errno;
+			goto error;
+		}
+		entry->addr = (void *)(uintptr_t)(reg->host_user_addr);
+		entry->length = reg->size;
+		entry->is_indirect = 0;
+		if (i > 0) {
+			uint64_t sadd;
+			uint64_t empty_region_sz = reg->guest_phys_addr -
+					  (mem->regions[i - 1].guest_phys_addr +
+					   mem->regions[i - 1].size);
+
+			if (empty_region_sz > 0) {
+				sadd = mem->regions[i - 1].guest_phys_addr +
+				       mem->regions[i - 1].size;
+				klm_size = mode == MLX5_MKC_ACCESS_MODE_KLM ?
+				      KLM_SIZE_MAX_ALIGN(empty_region_sz) : gcd;
+				for (k = 0; k < empty_region_sz;
+				     k += klm_size) {
+					klm_array[klm_index].byte_count =
+						k + klm_size > empty_region_sz ?
+						 empty_region_sz - k : klm_size;
+					klm_array[klm_index].mkey =
+							    priv->null_mr->lkey;
+					klm_array[klm_index].address = sadd + k;
+					klm_index++;
+				}
+			}
+		}
+		klm_size = mode == MLX5_MKC_ACCESS_MODE_KLM ?
+					    KLM_SIZE_MAX_ALIGN(reg->size) : gcd;
+		for (k = 0; k < reg->size; k += klm_size) {
+			klm_array[klm_index].byte_count = k + klm_size >
+					   reg->size ? reg->size - k : klm_size;
+			klm_array[klm_index].mkey = entry->mkey->id;
+			klm_array[klm_index].address = reg->guest_phys_addr + k;
+			klm_index++;
+		}
+		SLIST_INSERT_HEAD(&priv->mr_list, entry, next);
+	}
+	mkey_attr.addr = (uintptr_t)(mem->regions[0].guest_phys_addr);
+	mkey_attr.size = mem_size;
+	mkey_attr.pd = priv->pdn;
+	mkey_attr.umem_id = 0;
+	/* Must be zero for KLM mode. */
+	mkey_attr.log_entity_size = mode == MLX5_MKC_ACCESS_MODE_KLM_FBS ?
+							  rte_log2_u64(gcd) : 0;
+	mkey_attr.pg_access = 0;
+	mkey_attr.klm_array = klm_array;
+	mkey_attr.klm_num = klm_index;
+	entry = rte_zmalloc(__func__, sizeof(*entry), 0);
+	if (!entry) {
+		DRV_LOG(ERR, "Failed to allocate memory for indirect entry.");
+		ret = -ENOMEM;
+		goto error;
+	}
+	entry->mkey = mlx5_devx_cmd_mkey_create(priv->ctx, &mkey_attr);
+	if (!entry->mkey) {
+		DRV_LOG(ERR, "Failed to create indirect Mkey.");
+		ret = -rte_errno;
+		goto error;
+	}
+	entry->is_indirect = 1;
+	SLIST_INSERT_HEAD(&priv->mr_list, entry, next);
+	priv->gpa_mkey_index = entry->mkey->id;
+	return 0;
+error:
+	if (entry) {
+		if (entry->mkey)
+			mlx5_devx_cmd_destroy(entry->mkey);
+		if (entry->umem)
+			mlx5_glue->devx_umem_dereg(entry->umem);
+		rte_free(entry);
+	}
+	mlx5_vdpa_mem_dereg(priv);
+	rte_errno = -ret;
+	return ret;
+}
-- 
1.8.3.1
* [dpdk-dev] [PATCH v1 19/38] mlx5: share CQ entry check
From: Matan Azrad @ 2020-01-20 17:02 UTC (permalink / raw)
  To: dev; +Cc: Maxime Coquelin, Thomas Monjalon
The CQE has an owner bit that indicates whether it is under SW or HW
control.
Share the CQE check among all the mlx5 drivers.
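The ownership test can be sketched in isolation as follows. This is a
simplified illustration, not the shared check_cqe() itself: the
sketch_ names are hypothetical, the bit layout (owner in bit 0, opcode
in the upper nibble, 0xf as the invalid opcode) is an assumption about
the mlx5_prm.h definitions, and the read barrier and error opcode
handling of the real function are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout of the op_own byte; hypothetical names. */
#define SKETCH_OWNER(op_own)  ((op_own) & 0x1)
#define SKETCH_OPCODE(op_own) (((op_own) & 0xf0) >> 4)
#define SKETCH_OPCODE_INVALID 0xf

/* Return 1 when SW may consume the CQE, 0 when HW still owns it. */
static int
sketch_cqe_is_sw_owned(uint8_t op_own, uint16_t cqes_n, uint16_t ci)
{
	/* cqes_n is a power of two, so this bit flips on every wrap of ci. */
	uint8_t expected_owner = !!(ci & cqes_n);

	assert(cqes_n != 0 && (cqes_n & (cqes_n - 1)) == 0);
	if (SKETCH_OPCODE(op_own) == SKETCH_OPCODE_INVALID)
		return 0;
	return SKETCH_OWNER(op_own) == expected_owner;
}
```

With an 8-entry CQ, a CQE whose owner bit is 0 is consumable while ci
is in 0..7, and a CQE whose owner bit is 1 is consumable while ci is in
8..15, after which the expected value flips back.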
Signed-off-by: Matan Azrad <matan@mellanox.com>
---
 drivers/common/mlx5/mlx5_common.h | 41 +++++++++++++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_rxtx.h      | 39 +------------------------------------
 2 files changed, 42 insertions(+), 38 deletions(-)
diff --git a/drivers/common/mlx5/mlx5_common.h b/drivers/common/mlx5/mlx5_common.h
index 0f57a27..9d464d4 100644
--- a/drivers/common/mlx5/mlx5_common.h
+++ b/drivers/common/mlx5/mlx5_common.h
@@ -9,8 +9,11 @@
 #include <stdio.h>
 
 #include <rte_pci.h>
+#include <rte_atomic.h>
 #include <rte_log.h>
 
+#include "mlx5_prm.h"
+
 
 /*
  * Helper macros to work around __VA_ARGS__ limitations in a C99 compliant
@@ -107,6 +110,44 @@ enum {
 	PCI_DEVICE_ID_MELLANOX_CONNECTX6DXVF = 0x101e,
 };
 
+/* CQE status. */
+enum mlx5_cqe_status {
+	MLX5_CQE_STATUS_SW_OWN = -1,
+	MLX5_CQE_STATUS_HW_OWN = -2,
+	MLX5_CQE_STATUS_ERR = -3,
+};
+
+/**
+ * Check whether CQE is valid.
+ *
+ * @param cqe
+ *   Pointer to CQE.
+ * @param cqes_n
+ *   Size of completion queue.
+ * @param ci
+ *   Consumer index.
+ *
+ * @return
+ *   The CQE status.
+ */
+static __rte_always_inline enum mlx5_cqe_status
+check_cqe(volatile struct mlx5_cqe *cqe, const uint16_t cqes_n,
+	  const uint16_t ci)
+{
+	const uint16_t idx = ci & cqes_n;
+	const uint8_t op_own = cqe->op_own;
+	const uint8_t op_owner = MLX5_CQE_OWNER(op_own);
+	const uint8_t op_code = MLX5_CQE_OPCODE(op_own);
+
+	if (unlikely((op_owner != (!!(idx))) || (op_code == MLX5_CQE_INVALID)))
+		return MLX5_CQE_STATUS_HW_OWN;
+	rte_cio_rmb();
+	if (unlikely(op_code == MLX5_CQE_RESP_ERR ||
+		     op_code == MLX5_CQE_REQ_ERR))
+		return MLX5_CQE_STATUS_ERR;
+	return MLX5_CQE_STATUS_SW_OWN;
+}
+
 int mlx5_dev_to_pci_addr(const char *dev_path, struct rte_pci_addr *pci_addr);
 
 #endif /* RTE_PMD_MLX5_COMMON_H_ */
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 84b1fce..9242497 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -33,6 +33,7 @@
 
 #include <mlx5_glue.h>
 #include <mlx5_prm.h>
+#include <mlx5_common.h>
 
 #include "mlx5_defs.h"
 #include "mlx5_utils.h"
@@ -549,44 +550,6 @@ int mlx5_dma_unmap(struct rte_pci_device *pdev, void *addr, uint64_t iova,
 #define mlx5_uar_write64(val, dst, lock) __mlx5_uar_write64(val, dst, lock)
 #endif
 
-/* CQE status. */
-enum mlx5_cqe_status {
-	MLX5_CQE_STATUS_SW_OWN = -1,
-	MLX5_CQE_STATUS_HW_OWN = -2,
-	MLX5_CQE_STATUS_ERR = -3,
-};
-
-/**
- * Check whether CQE is valid.
- *
- * @param cqe
- *   Pointer to CQE.
- * @param cqes_n
- *   Size of completion queue.
- * @param ci
- *   Consumer index.
- *
- * @return
- *   The CQE status.
- */
-static __rte_always_inline enum mlx5_cqe_status
-check_cqe(volatile struct mlx5_cqe *cqe, const uint16_t cqes_n,
-	  const uint16_t ci)
-{
-	const uint16_t idx = ci & cqes_n;
-	const uint8_t op_own = cqe->op_own;
-	const uint8_t op_owner = MLX5_CQE_OWNER(op_own);
-	const uint8_t op_code = MLX5_CQE_OPCODE(op_own);
-
-	if (unlikely((op_owner != (!!(idx))) || (op_code == MLX5_CQE_INVALID)))
-		return MLX5_CQE_STATUS_HW_OWN;
-	rte_cio_rmb();
-	if (unlikely(op_code == MLX5_CQE_RESP_ERR ||
-		     op_code == MLX5_CQE_REQ_ERR))
-		return MLX5_CQE_STATUS_ERR;
-	return MLX5_CQE_STATUS_SW_OWN;
-}
-
 /**
  * Get Memory Pool (MP) from mbuf. If mbuf is indirect, the pool from which the
  * cloned mbuf is allocated is returned instead.
-- 
1.8.3.1
* [dpdk-dev] [PATCH v1 20/38] vdpa/mlx5: prepare completion queues
From: Matan Azrad @ 2020-01-20 17:02 UTC (permalink / raw)
  To: dev; +Cc: Maxime Coquelin, Thomas Monjalon
In preparation for the virtio queues creation, a CQ should be created
for each virtio queue.
The design is to trigger an event for the guest and for the vDPA driver
when a new CQE is posted by the HW after the packet transition.
This patch adds the basic operations to create and destroy CQs and to
trigger the CQE events when a new CQE is posted.
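For readers unfamiliar with the guest notification descriptor, the
sketch below shows what signalling such a descriptor looks like from
software, assuming callfd is a Linux eventfd as is usual for vhost call
descriptors. How the driver wires callfd to the CQ event is not part of
this excerpt, so sketch_notify_guest() is a hypothetical helper and not
the driver's mechanism.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Signal the guest-side notifier once; 0 on success, -1 on error. */
static int
sketch_notify_guest(int callfd)
{
	assert(callfd >= 0);
	return eventfd_write(callfd, 1);
}
```

Each call adds 1 to the eventfd counter, so a reader on the other side
sees how many notifications were posted since its last read.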
Signed-off-by: Matan Azrad <matan@mellanox.com>
---
 drivers/vdpa/mlx5/Makefile       |   1 +
 drivers/vdpa/mlx5/meson.build    |   1 +
 drivers/vdpa/mlx5/mlx5_vdpa.h    |  56 ++++++++++++++
 drivers/vdpa/mlx5/mlx5_vdpa_cq.c | 154 +++++++++++++++++++++++++++++++++++++++
 4 files changed, 212 insertions(+)
 create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_cq.c
diff --git a/drivers/vdpa/mlx5/Makefile b/drivers/vdpa/mlx5/Makefile
index 5472797..f813824 100644
--- a/drivers/vdpa/mlx5/Makefile
+++ b/drivers/vdpa/mlx5/Makefile
@@ -9,6 +9,7 @@ LIB = librte_pmd_mlx5_vdpa.a
 # Sources.
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa_mem.c
+SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa_cq.c
 
 # Basic CFLAGS.
 CFLAGS += -O3
diff --git a/drivers/vdpa/mlx5/meson.build b/drivers/vdpa/mlx5/meson.build
index 7e5dd95..aec5d34 100644
--- a/drivers/vdpa/mlx5/meson.build
+++ b/drivers/vdpa/mlx5/meson.build
@@ -13,6 +13,7 @@ deps += ['hash', 'common_mlx5', 'vhost', 'bus_pci', 'eal', 'sched']
 sources = files(
 	'mlx5_vdpa.c',
 	'mlx5_vdpa_mem.c',
+	'mlx5_vdpa_cq.c',
 )
 cflags_options = [
 	'-std=c11',
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index e27baea..6008e3f 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -9,9 +9,27 @@
 
 #include <rte_vdpa.h>
 #include <rte_vhost.h>
+#include <rte_spinlock.h>
 
 #include <mlx5_glue.h>
 #include <mlx5_devx_cmds.h>
+#include <mlx5_prm.h>
+
+
+struct mlx5_vdpa_cq {
+	uint16_t log_desc_n;
+	uint32_t cq_ci:24;
+	uint32_t arm_sn:2;
+	rte_spinlock_t sl;
+	struct mlx5_devx_obj *cq;
+	struct mlx5dv_devx_umem *umem_obj;
+	union {
+		volatile void *umem_buf;
+		volatile struct mlx5_cqe *cqes;
+	};
+	volatile uint32_t *db_rec;
+	uint64_t errors;
+};
 
 struct mlx5_vdpa_query_mr {
 	SLIST_ENTRY(mlx5_vdpa_query_mr) next;
@@ -34,6 +52,9 @@ struct mlx5_vdpa_priv {
 	uint32_t gpa_mkey_index;
 	struct ibv_mr *null_mr;
 	struct rte_vhost_memory *vmem;
+	uint32_t eqn;
+	struct mlx5dv_devx_event_channel *eventc;
+	struct mlx5dv_devx_uar *uar;
 	SLIST_HEAD(mr_list, mlx5_vdpa_query_mr) mr_list;
 };
 
@@ -57,4 +78,39 @@ struct mlx5_vdpa_priv {
  */
 int mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv);
 
+
+/**
+ * Create a CQ and all its related resources.
+ *
+ * @param[in] priv
+ *   The vdpa driver private structure.
+ * @param[in] desc_n
+ *   Number of CQEs.
+ * @param[in] callfd
+ *   The guest notification file descriptor.
+ * @param[in, out] cq
+ *   Pointer to the CQ structure.
+ *
+ * @return
+ *   0 on success, -1 otherwise and rte_errno is set.
+ */
+int mlx5_vdpa_cq_create(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
+			int callfd, struct mlx5_vdpa_cq *cq);
+
+/**
+ * Destroy a CQ and all its related resources.
+ *
+ * @param[in, out] cq
+ *   Pointer to the CQ structure.
+ */
+void mlx5_vdpa_cq_destroy(struct mlx5_vdpa_cq *cq);
+
+/**
+ * Release all the CQ global resources.
+ *
+ * @param[in] priv
+ *   The vdpa driver private structure.
+ */
+void mlx5_vdpa_cq_global_release(struct mlx5_vdpa_priv *priv);
+
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cq.c b/drivers/vdpa/mlx5/mlx5_vdpa_cq.c
new file mode 100644
index 0000000..563277f
--- /dev/null
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_cq.c
@@ -0,0 +1,154 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2019 Mellanox Technologies, Ltd
+ */
+#include <unistd.h>
+#include <stdint.h>
+#include <assert.h>
+
+#include <rte_malloc.h>
+#include <rte_errno.h>
+#include <rte_lcore.h>
+
+#include "mlx5_vdpa_utils.h"
+#include "mlx5_vdpa.h"
+
+void
+mlx5_vdpa_cq_global_release(struct mlx5_vdpa_priv *priv)
+{
+	if (priv->uar) {
+		mlx5_glue->devx_free_uar(priv->uar);
+		priv->uar = NULL;
+	}
+	if (priv->eventc) {
+		mlx5_glue->devx_destroy_event_channel(priv->eventc);
+		priv->eventc = NULL;
+	}
+	priv->eqn = 0;
+}
+
+/* Prepare all the global resources for all the CQs. */
+static int
+mlx5_vdpa_cq_global_prepare(struct mlx5_vdpa_priv *priv)
+{
+	uint32_t lcore;
+
+	if (priv->eventc)
+		return 0;
+	lcore = (uint32_t)rte_lcore_to_cpu_id(-1);
+	if (mlx5_glue->devx_query_eqn(priv->ctx, lcore, &priv->eqn)) {
+		rte_errno = errno;
+		DRV_LOG(ERR, "Failed to query EQ number %d.", rte_errno);
+		return -1;
+	}
+	priv->eventc = mlx5_glue->devx_create_event_channel(priv->ctx,
+			   MLX5DV_DEVX_CREATE_EVENT_CHANNEL_FLAGS_OMIT_EV_DATA);
+	if (!priv->eventc) {
+		rte_errno = errno;
+		DRV_LOG(ERR, "Failed to create event channel %d.",
+			rte_errno);
+		goto error;
+	}
+	priv->uar = mlx5_glue->devx_alloc_uar(priv->ctx, 0);
+	if (!priv->uar) {
+		rte_errno = errno;
+		DRV_LOG(ERR, "Failed to allocate UAR.");
+		goto error;
+	}
+	return 0;
+error:
+	mlx5_vdpa_cq_global_release(priv);
+	return -1;
+}
+
+void
+mlx5_vdpa_cq_destroy(struct mlx5_vdpa_cq *cq)
+{
+	int ret __rte_unused;
+
+	if (cq->cq) {
+		ret = mlx5_devx_cmd_destroy(cq->cq);
+		assert(!ret);
+	}
+	if (cq->umem_obj) {
+		ret = mlx5_glue->devx_umem_dereg(cq->umem_obj);
+		assert(!ret);
+	}
+	if (cq->umem_buf)
+		rte_free((void *)(uintptr_t)cq->umem_buf);
+	memset(cq, 0, sizeof(*cq));
+}
+
+int
+mlx5_vdpa_cq_create(struct mlx5_vdpa_priv *priv, uint16_t desc_n, int callfd,
+		    struct mlx5_vdpa_cq *cq)
+{
+	struct mlx5_devx_cq_attr attr;
+	size_t pgsize = sysconf(_SC_PAGESIZE);
+	uint32_t log_desc_n = rte_log2_u32(desc_n);
+	uint32_t umem_size;
+	int ret;
+	uint16_t event_nums[1] = {0};
+
+	if (mlx5_vdpa_cq_global_prepare(priv))
+		return -1;
+	cq->log_desc_n = log_desc_n;
+	umem_size = sizeof(struct mlx5_cqe) * (1 << log_desc_n) +
+							sizeof(*cq->db_rec) * 2;
+	cq->umem_buf = rte_zmalloc(__func__, umem_size, 4096);
+	if (!cq->umem_buf) {
+		DRV_LOG(ERR, "Failed to allocate memory for CQ.");
+		rte_errno = ENOMEM;
+		return -1;
+	}
+	cq->umem_obj = mlx5_glue->devx_umem_reg(priv->ctx,
+						(void *)(uintptr_t)cq->umem_buf,
+						umem_size,
+						IBV_ACCESS_LOCAL_WRITE);
+	if (!cq->umem_obj) {
+		DRV_LOG(ERR, "Failed to register umem for CQ.");
+		goto error;
+	}
+	attr.q_umem_valid = 1;
+	attr.db_umem_valid = 1;
+	attr.use_first_only = 0;
+	attr.overrun_ignore = 0;
+	attr.uar_page_id = priv->uar->page_id;
+	attr.q_umem_id = cq->umem_obj->umem_id;
+	attr.q_umem_offset = 0;
+	attr.db_umem_id = cq->umem_obj->umem_id;
+	attr.db_umem_offset = sizeof(struct mlx5_cqe) * (1 << log_desc_n);
+	attr.eqn = priv->eqn;
+	attr.log_cq_size = log_desc_n;
+	attr.log_page_size = rte_log2_u32(pgsize);
+	cq->cq = mlx5_devx_cmd_create_cq(priv->ctx, &attr);
+	if (!cq->cq)
+		goto error;
+	cq->db_rec = RTE_PTR_ADD(cq->umem_buf, (uintptr_t)attr.db_umem_offset);
+	cq->cq_ci = 0;
+	rte_spinlock_init(&cq->sl);
+	/* Subscribe CQ event to the event channel controlled by the driver. */
+	ret = mlx5_glue->devx_subscribe_devx_event(priv->eventc, cq->cq->obj,
+						   sizeof(event_nums),
+						   event_nums,
+						   (uint64_t)(uintptr_t)cq);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to subscribe CQE event.");
+		rte_errno = errno;
+		goto error;
+	}
+	/* Subscribe CQ event to the guest FD only if it is not in poll mode. */
+	if (callfd != -1) {
+		ret = mlx5_glue->devx_subscribe_devx_event_fd(priv->eventc,
+							      callfd,
+							      cq->cq->obj, 0);
+		if (ret) {
+			DRV_LOG(ERR, "Failed to subscribe CQE event fd.");
+			rte_errno = errno;
+			goto error;
+		}
+	}
+	return 0;
+error:
+	mlx5_vdpa_cq_destroy(cq);
+	return -1;
+}
-- 
1.8.3.1
* [dpdk-dev] [PATCH v1 21/38] vdpa/mlx5: handle completions
  2020-01-20 17:02 [dpdk-dev] [PATCH v1 00/38] Introduce mlx5 vDPA driver Matan Azrad
                   ` (19 preceding siblings ...)
  2020-01-20 17:02 ` [dpdk-dev] [PATCH v1 20/38] vdpa/mlx5: prepare completion queues Matan Azrad
@ 2020-01-20 17:02 ` Matan Azrad
  2020-01-20 17:02 ` [dpdk-dev] [PATCH v1 22/38] vdpa/mlx5: prepare virtio queues Matan Azrad
                   ` (18 subsequent siblings)
  39 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-20 17:02 UTC (permalink / raw)
  To: dev; +Cc: Maxime Coquelin, Thomas Monjalon
The CQ must be polled in order to free its resources and to allow the HW
to send packets from the guest.
To poll the CQ, an interrupt must be triggered for each new CQE posted by
the HW.
Register an interrupt handler that polls and re-arms a CQ when the HW
raises a completion event.
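
The arming step can be illustrated with a minimal sketch of how the
doorbell value is composed from the arm sequence number, the consumer
index and the CQ number. It follows the arithmetic of mlx5_vdpa_cq_arm()
in this patch; the SKETCH_* constants are assumed stand-ins for the real
definitions in mlx5_prm.h and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration; the real ones live in mlx5_prm.h. */
#define SKETCH_CQ_SQN_OFFSET 28
#define SKETCH_CI_MASK 0xffffffu
#define SKETCH_CQ_DBR_CMD_ALL (0u << 24)

/*
 * Compose the 64-bit arm doorbell (host byte order): the high word holds
 * the arm sequence number, the command and the masked consumer index;
 * the low word holds the CQ number.
 */
static uint64_t
sketch_cq_arm_doorbell(uint32_t arm_sn, uint32_t cq_ci, uint32_t log_desc_n,
		       uint32_t cq_id)
{
	uint32_t cqe_mask;
	uint32_t hi;

	assert(log_desc_n < 24);
	cqe_mask = (1u << log_desc_n) - 1;
	/* arm_sn is a 2-bit counter in the driver structure. */
	hi = ((arm_sn & 0x3u) << SKETCH_CQ_SQN_OFFSET) |
	     SKETCH_CQ_DBR_CMD_ALL | (cq_ci & SKETCH_CI_MASK & cqe_mask);
	return ((uint64_t)hi << 32) | cq_id;
}
```

The driver additionally converts the value to big endian and orders the
doorbell record write before the UAR write with memory barriers; the
sketch leaves those steps out.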
Signed-off-by: Matan Azrad <matan@mellanox.com>
---
 drivers/common/mlx5/mlx5_prm.h   |   4 ++
 drivers/vdpa/mlx5/mlx5_vdpa.h    |  24 ++++++++
 drivers/vdpa/mlx5/mlx5_vdpa_cq.c | 129 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 157 insertions(+)
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index 6db89bb..2c0e023 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -392,6 +392,10 @@ struct mlx5_cqe {
 /* CQE format value. */
 #define MLX5_COMPRESSED 0x3
 
+/* CQ doorbell cmd types. */
+#define MLX5_CQ_DBR_CMD_SOL_ONLY (1 << 24)
+#define MLX5_CQ_DBR_CMD_ALL (0 << 24)
+
 /* Action type of header modification. */
 enum {
 	MLX5_MODIFICATION_TYPE_SET = 0x1,
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 6008e3f..617f57a 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -10,12 +10,16 @@
 #include <rte_vdpa.h>
 #include <rte_vhost.h>
 #include <rte_spinlock.h>
+#include <rte_interrupts.h>
 
 #include <mlx5_glue.h>
 #include <mlx5_devx_cmds.h>
 #include <mlx5_prm.h>
 
 
+#define MLX5_VDPA_INTR_RETRIES 256
+#define MLX5_VDPA_INTR_RETRIES_USEC 1000
+
 struct mlx5_vdpa_cq {
 	uint16_t log_desc_n;
 	uint32_t cq_ci:24;
@@ -55,6 +59,7 @@ struct mlx5_vdpa_priv {
 	uint32_t eqn;
 	struct mlx5dv_devx_event_channel *eventc;
 	struct mlx5dv_devx_uar *uar;
+	struct rte_intr_handle intr_handle;
 	SLIST_HEAD(mr_list, mlx5_vdpa_query_mr) mr_list;
 };
 
@@ -113,4 +118,23 @@ int mlx5_vdpa_cq_create(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
  */
 void mlx5_vdpa_cq_global_release(struct mlx5_vdpa_priv *priv);
 
+/**
+ * Setup CQE event.
+ *
+ * @param[in] priv
+ *   The vdpa driver private structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int mlx5_vdpa_cqe_event_setup(struct mlx5_vdpa_priv *priv);
+
+/**
+ * Unset CQE event.
+ *
+ * @param[in] priv
+ *   The vdpa driver private structure.
+ */
+void mlx5_vdpa_cqe_event_unset(struct mlx5_vdpa_priv *priv);
+
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cq.c b/drivers/vdpa/mlx5/mlx5_vdpa_cq.c
index 563277f..d64c097 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_cq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_cq.c
@@ -4,10 +4,15 @@
 #include <unistd.h>
 #include <stdint.h>
 #include <assert.h>
+#include <fcntl.h>
 
 #include <rte_malloc.h>
 #include <rte_errno.h>
 #include <rte_lcore.h>
+#include <rte_atomic.h>
+#include <rte_common.h>
+
+#include <mlx5_common.h>
 
 #include "mlx5_vdpa_utils.h"
 #include "mlx5_vdpa.h"
@@ -78,6 +83,30 @@
 	memset(cq, 0, sizeof(*cq));
 }
 
+static inline void
+mlx5_vdpa_cq_arm(struct mlx5_vdpa_priv *priv, struct mlx5_vdpa_cq *cq)
+{
+	const unsigned int cqe_mask = (1 << cq->log_desc_n) - 1;
+	uint32_t arm_sn = cq->arm_sn << MLX5_CQ_SQN_OFFSET;
+	uint32_t cq_ci = cq->cq_ci & MLX5_CI_MASK & cqe_mask;
+	uint32_t doorbell_hi = arm_sn | MLX5_CQ_DBR_CMD_ALL | cq_ci;
+	uint64_t doorbell = ((uint64_t)doorbell_hi << 32) | cq->cq->id;
+	uint64_t db_be = rte_cpu_to_be_64(doorbell);
+	uint32_t *addr = RTE_PTR_ADD(priv->uar->base_addr, MLX5_CQ_DOORBELL);
+
+	rte_io_wmb();
+	cq->db_rec[MLX5_CQ_ARM_DB] = rte_cpu_to_be_32(doorbell_hi);
+	rte_wmb();
+#ifdef RTE_ARCH_64
+	*(uint64_t *)addr = db_be;
+#else
+	*(uint32_t *)addr = db_be;
+	rte_io_wmb();
+	*((uint32_t *)addr + 1) = db_be >> 32;
+#endif
+	cq->arm_sn++;
+}
+
 int
 mlx5_vdpa_cq_create(struct mlx5_vdpa_priv *priv, uint16_t desc_n, int callfd,
 		    struct mlx5_vdpa_cq *cq)
@@ -147,8 +176,108 @@
 			goto error;
 		}
 	}
+	/* First arming. */
+	mlx5_vdpa_cq_arm(priv, cq);
 	return 0;
 error:
 	mlx5_vdpa_cq_destroy(cq);
 	return -1;
 }
+
+static inline void __rte_unused
+mlx5_vdpa_cq_poll(struct mlx5_vdpa_priv *priv __rte_unused,
+		  struct mlx5_vdpa_cq *cq)
+{
+	const unsigned int cqe_mask = (1 << cq->log_desc_n) - 1;
+	int ret;
+
+	do {
+		volatile struct mlx5_cqe *cqe = cq->cqes + (cq->cq_ci &
+							    cqe_mask);
+
+		ret = check_cqe(cqe, cqe_mask + 1, cq->cq_ci);
+		switch (ret) {
+		case MLX5_CQE_STATUS_ERR:
+			cq->errors++;
+			/*fall-through*/
+		case MLX5_CQE_STATUS_SW_OWN:
+			cq->cq_ci++;
+			break;
+		case MLX5_CQE_STATUS_HW_OWN:
+		default:
+			break;
+		}
+	} while (ret != MLX5_CQE_STATUS_HW_OWN);
+	rte_io_wmb();
+	cq->db_rec[0] = rte_cpu_to_be_32(cq->cq_ci);
+}
+
+static void
+mlx5_vdpa_interrupt_handler(void *cb_arg)
+{
+#ifndef HAVE_IBV_DEVX_EVENT
+	(void)cb_arg;
+	return;
+#else
+	struct mlx5_vdpa_priv *priv = cb_arg;
+	union {
+		struct mlx5dv_devx_async_event_hdr event_resp;
+		uint8_t buf[sizeof(struct mlx5dv_devx_async_event_hdr) + 128];
+	} out;
+
+	while (mlx5_glue->devx_get_event(priv->eventc, &out.event_resp,
+					 sizeof(out.buf)) >=
+				       (ssize_t)sizeof(out.event_resp.cookie)) {
+		struct mlx5_vdpa_cq *cq = (struct mlx5_vdpa_cq *)
+					       (uintptr_t)out.event_resp.cookie;
+		rte_spinlock_lock(&cq->sl);
+		mlx5_vdpa_cq_poll(priv, cq);
+		mlx5_vdpa_cq_arm(priv, cq);
+		rte_spinlock_unlock(&cq->sl);
+		DRV_LOG(DEBUG, "CQ %p event: new cq_ci = %u.", cq, cq->cq_ci);
+	}
+#endif /* HAVE_IBV_DEVX_EVENT */
+}
+
+int
+mlx5_vdpa_cqe_event_setup(struct mlx5_vdpa_priv *priv)
+{
+	int flags = fcntl(priv->eventc->fd, F_GETFL);
+	int ret = fcntl(priv->eventc->fd, F_SETFL, flags | O_NONBLOCK);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to change event channel FD.");
+		rte_errno = errno;
+		return -rte_errno;
+	}
+	priv->intr_handle.fd = priv->eventc->fd;
+	priv->intr_handle.type = RTE_INTR_HANDLE_EXT;
+	if (rte_intr_callback_register(&priv->intr_handle,
+				       mlx5_vdpa_interrupt_handler, priv)) {
+		priv->intr_handle.fd = 0;
+		DRV_LOG(ERR, "Failed to register CQE interrupt %d.", rte_errno);
+		return -rte_errno;
+	}
+	return 0;
+}
+
+void
+mlx5_vdpa_cqe_event_unset(struct mlx5_vdpa_priv *priv)
+{
+	int retries = MLX5_VDPA_INTR_RETRIES;
+	int ret = -EAGAIN;
+
+	if (priv->intr_handle.fd) {
+		while (retries-- && ret == -EAGAIN) {
+			ret = rte_intr_callback_unregister(&priv->intr_handle,
+						    mlx5_vdpa_interrupt_handler,
+						    priv);
+			if (ret == -EAGAIN) {
+				DRV_LOG(DEBUG, "Try again to unregister fd %d "
+					"of CQ interrupt, retries = %d.",
+					priv->intr_handle.fd, retries);
+				usleep(MLX5_VDPA_INTR_RETRIES_USEC);
+			}
+		}
+		memset(&priv->intr_handle, 0, sizeof(priv->intr_handle));
+	}
+}
-- 
1.8.3.1
* [dpdk-dev] [PATCH v1 22/38] vdpa/mlx5: prepare virtio queues
  2020-01-20 17:02 [dpdk-dev] [PATCH v1 00/38] Introduce mlx5 vDPA driver Matan Azrad
                   ` (20 preceding siblings ...)
  2020-01-20 17:02 ` [dpdk-dev] [PATCH v1 21/38] vdpa/mlx5: handle completions Matan Azrad
@ 2020-01-20 17:02 ` Matan Azrad
  2020-01-20 17:02 ` [dpdk-dev] [PATCH v1 23/38] vdpa/mlx5: support stateless offloads Matan Azrad
                   ` (17 subsequent siblings)
  39 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-20 17:02 UTC (permalink / raw)
  To: dev; +Cc: Maxime Coquelin, Thomas Monjalon
The HW virtq object represents an emulated context for a VIRTIO_NET
virtqueue which was created and is managed by a VIRTIO_NET driver, as
defined in the VIRTIO Specification.
Add support to prepare and release all the basic HW resources needed
for the user virtqs emulation according to the rte_vhost configurations.
This patch prepares the basic configurations needed by DevX commands to
create a virtq.
Add a new file, mlx5_vdpa_virtq.c, to manage virtq operations.
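
One of the configurations prepared here is the translation of each ring
address from a host virtual address (HVA) to a guest physical address
(GPA). The following is a minimal, self-contained sketch of that lookup,
modelled on mlx5_vdpa_hva_to_gpa() in this patch; struct sketch_region is
a hypothetical stand-in for struct rte_vhost_mem_region.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one guest memory region. */
struct sketch_region {
	uint64_t guest_phys_addr; /* Region start in guest physical space. */
	uint64_t host_user_addr;  /* Region start in host virtual space. */
	uint64_t size;            /* Region length in bytes. */
};

/* Return the GPA matching hva, or 0 when no region contains it. */
static uint64_t
sketch_hva_to_gpa(const struct sketch_region *regs, uint32_t n, uint64_t hva)
{
	uint32_t i;

	assert(regs != NULL || n == 0);
	for (i = 0; i < n; i++) {
		const struct sketch_region *reg = &regs[i];

		if (hva >= reg->host_user_addr &&
		    hva < reg->host_user_addr + reg->size)
			return hva - reg->host_user_addr +
			       reg->guest_phys_addr;
	}
	return 0;
}
```

As in the patch, 0 doubles as the "not found" value, so a ring that
really sits at guest physical address 0 cannot be told apart from a
failed lookup.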
Signed-off-by: Matan Azrad <matan@mellanox.com>
---
 drivers/vdpa/mlx5/Makefile          |   1 +
 drivers/vdpa/mlx5/meson.build       |   1 +
 drivers/vdpa/mlx5/mlx5_vdpa.c       |   1 +
 drivers/vdpa/mlx5/mlx5_vdpa.h       |  36 ++++++
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 218 ++++++++++++++++++++++++++++++++++++
 5 files changed, 257 insertions(+)
 create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
diff --git a/drivers/vdpa/mlx5/Makefile b/drivers/vdpa/mlx5/Makefile
index f813824..6418ee8 100644
--- a/drivers/vdpa/mlx5/Makefile
+++ b/drivers/vdpa/mlx5/Makefile
@@ -10,6 +10,7 @@ LIB = librte_pmd_mlx5_vdpa.a
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa_mem.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa_cq.c
+SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa_virtq.c
 
 # Basic CFLAGS.
 CFLAGS += -O3
diff --git a/drivers/vdpa/mlx5/meson.build b/drivers/vdpa/mlx5/meson.build
index aec5d34..157fee1 100644
--- a/drivers/vdpa/mlx5/meson.build
+++ b/drivers/vdpa/mlx5/meson.build
@@ -14,6 +14,7 @@ sources = files(
 	'mlx5_vdpa.c',
 	'mlx5_vdpa_mem.c',
 	'mlx5_vdpa_cq.c',
+	'mlx5_vdpa_virtq.c',
 )
 cflags_options = [
 	'-std=c11',
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index f4af74e..967da2d 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -223,6 +223,7 @@
 		goto error;
 	}
 	SLIST_INIT(&priv->mr_list);
+	SLIST_INIT(&priv->virtq_list);
 	pthread_mutex_lock(&priv_list_lock);
 	TAILQ_INSERT_TAIL(&priv_list, priv, next);
 	pthread_mutex_unlock(&priv_list_lock);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 617f57a..ea6d3b4 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -44,6 +44,19 @@ struct mlx5_vdpa_query_mr {
 	int is_indirect;
 };
 
+struct mlx5_vdpa_virtq {
+	SLIST_ENTRY(mlx5_vdpa_virtq) next;
+	uint16_t index;
+	uint16_t vq_size;
+	struct mlx5_devx_obj *virtq;
+	struct mlx5_vdpa_cq cq;
+	struct {
+		struct mlx5dv_devx_umem *obj;
+		void *buf;
+		uint32_t size;
+	} umems[3];
+};
+
 struct mlx5_vdpa_priv {
 	TAILQ_ENTRY(mlx5_vdpa_priv) next;
 	int id; /* vDPA device id. */
@@ -60,6 +73,10 @@ struct mlx5_vdpa_priv {
 	struct mlx5dv_devx_event_channel *eventc;
 	struct mlx5dv_devx_uar *uar;
 	struct rte_intr_handle intr_handle;
+	struct mlx5_devx_obj *td;
+	struct mlx5_devx_obj *tis;
+	uint16_t nr_virtqs;
+	SLIST_HEAD(virtq_list, mlx5_vdpa_virtq) virtq_list;
 	SLIST_HEAD(mr_list, mlx5_vdpa_query_mr) mr_list;
 };
 
@@ -137,4 +154,23 @@ int mlx5_vdpa_cq_create(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
  */
 void mlx5_vdpa_cqe_event_unset(struct mlx5_vdpa_priv *priv);
 
+/**
+ * Release all the virtqs and all their related resources.
+ *
+ * @param[in] priv
+ *   The vdpa driver private structure.
+ */
+void mlx5_vdpa_virtqs_release(struct mlx5_vdpa_priv *priv);
+
+/**
+ * Create all the HW virtqs resources and all their related resources.
+ *
+ * @param[in] priv
+ *   The vdpa driver private structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv);
+
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
new file mode 100644
index 0000000..04f05dd
--- /dev/null
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -0,0 +1,218 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2019 Mellanox Technologies, Ltd
+ */
+#include <string.h>
+#include <assert.h>
+
+#include <rte_malloc.h>
+#include <rte_errno.h>
+
+#include "mlx5_vdpa_utils.h"
+#include "mlx5_vdpa.h"
+
+
+static int
+mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
+{
+	int ret __rte_unused;
+	int i;
+
+	if (virtq->virtq) {
+		ret = mlx5_devx_cmd_destroy(virtq->virtq);
+		assert(!ret);
+		virtq->virtq = NULL;
+	}
+	for (i = 0; i < 3; ++i) {
+		if (virtq->umems[i].obj) {
+			ret = mlx5_glue->devx_umem_dereg(virtq->umems[i].obj);
+			assert(!ret);
+		}
+		if (virtq->umems[i].buf)
+			rte_free(virtq->umems[i].buf);
+	}
+	memset(&virtq->umems, 0, sizeof(virtq->umems));
+	if (virtq->cq.cq)
+		mlx5_vdpa_cq_destroy(&virtq->cq);
+	return 0;
+}
+
+void
+mlx5_vdpa_virtqs_release(struct mlx5_vdpa_priv *priv)
+{
+	struct mlx5_vdpa_virtq *entry;
+	struct mlx5_vdpa_virtq *next;
+	int ret __rte_unused;
+
+	entry = SLIST_FIRST(&priv->virtq_list);
+	while (entry) {
+		next = SLIST_NEXT(entry, next);
+		mlx5_vdpa_virtq_unset(entry);
+		SLIST_REMOVE(&priv->virtq_list, entry, mlx5_vdpa_virtq, next);
+		rte_free(entry);
+		entry = next;
+	}
+	SLIST_INIT(&priv->virtq_list);
+	if (priv->tis) {
+		ret = mlx5_devx_cmd_destroy(priv->tis);
+		assert(!ret);
+		priv->tis = NULL;
+	}
+	if (priv->td) {
+		ret = mlx5_devx_cmd_destroy(priv->td);
+		assert(!ret);
+		priv->td = NULL;
+	}
+}
+
+static uint64_t
+mlx5_vdpa_hva_to_gpa(struct rte_vhost_memory *mem, uint64_t hva)
+{
+	struct rte_vhost_mem_region *reg;
+	uint32_t i;
+	uint64_t gpa = 0;
+
+	for (i = 0; i < mem->nregions; i++) {
+		reg = &mem->regions[i];
+		if (hva >= reg->host_user_addr &&
+		    hva < reg->host_user_addr + reg->size) {
+			gpa = hva - reg->host_user_addr + reg->guest_phys_addr;
+			break;
+		}
+	}
+	return gpa;
+}
+
+static int
+mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv,
+		      struct mlx5_vdpa_virtq *virtq, int index)
+{
+	struct rte_vhost_vring vq;
+	struct mlx5_devx_virtq_attr attr = {0};
+	uint64_t gpa;
+	int ret;
+	int i;
+	uint16_t last_avail_idx;
+	uint16_t last_used_idx;
+
+	ret = rte_vhost_get_vhost_vring(priv->vid, index, &vq);
+	if (ret)
+		return -1;
+	virtq->index = index;
+	virtq->vq_size = vq.size;
+	/*
+	 * No CQ is needed when the guest is in poll mode and the
+	 * capability allows it.
+	 */
+	attr.event_mode = vq.callfd != -1 || !(priv->caps.event_mode & (1 <<
+					       MLX5_VIRTQ_EVENT_MODE_NO_MSIX)) ?
+						      MLX5_VIRTQ_EVENT_MODE_CQ :
+						  MLX5_VIRTQ_EVENT_MODE_NO_MSIX;
+	if (attr.event_mode == MLX5_VIRTQ_EVENT_MODE_CQ) {
+		ret = mlx5_vdpa_cq_create(priv, vq.size, vq.callfd,
+					  &virtq->cq);
+		if (ret) {
+			DRV_LOG(ERR, "Failed to create CQ for virtq %d.",
+				index);
+			return -1;
+		}
+		attr.cq_id = virtq->cq.cq->id;
+	} else {
+		DRV_LOG(INFO, "Virtq %d works in poll mode, no need for CQ"
+			" and event mechanism.", index);
+	}
+	/* Setup 3 UMEMs for each virtq. */
+	for (i = 0; i < 3; ++i) {
+		virtq->umems[i].size = priv->caps.umems[i].a * vq.size +
+							  priv->caps.umems[i].b;
+		assert(virtq->umems[i].size);
+		virtq->umems[i].buf = rte_zmalloc(__func__,
+						  virtq->umems[i].size, 4096);
+		if (!virtq->umems[i].buf) {
+			DRV_LOG(ERR, "Cannot allocate umem %d memory for virtq"
+				" %u.", i, index);
+			goto error;
+		}
+		virtq->umems[i].obj = mlx5_glue->devx_umem_reg(priv->ctx,
+							virtq->umems[i].buf,
+							virtq->umems[i].size,
+							IBV_ACCESS_LOCAL_WRITE);
+		if (!virtq->umems[i].obj) {
+			DRV_LOG(ERR, "Failed to register umem %d for virtq %u.",
+				i, index);
+			goto error;
+		}
+		attr.umems[i].id = virtq->umems[i].obj->umem_id;
+		attr.umems[i].offset = 0;
+		attr.umems[i].size = virtq->umems[i].size;
+	}
+	gpa = mlx5_vdpa_hva_to_gpa(priv->vmem, (uint64_t)(uintptr_t)vq.desc);
+	if (!gpa) {
+		DRV_LOG(ERR, "Fail to get GPA for descriptor ring.");
+		goto error;
+	}
+	attr.desc_addr = gpa;
+	gpa = mlx5_vdpa_hva_to_gpa(priv->vmem, (uint64_t)(uintptr_t)vq.used);
+	if (!gpa) {
+		DRV_LOG(ERR, "Fail to get GPA for used ring.");
+		goto error;
+	}
+	attr.used_addr = gpa;
+	gpa = mlx5_vdpa_hva_to_gpa(priv->vmem, (uint64_t)(uintptr_t)vq.avail);
+	if (!gpa) {
+		DRV_LOG(ERR, "Fail to get GPA for available ring.");
+		goto error;
+	}
+	attr.available_addr = gpa;
+	rte_vhost_get_vring_base(priv->vid, index, &last_avail_idx,
+				 &last_used_idx);
+	DRV_LOG(INFO, "vid %d: Init last_avail_idx=%d, last_used_idx=%d for "
+		"virtq %d.", priv->vid, last_avail_idx, last_used_idx, index);
+	attr.hw_available_index = last_avail_idx;
+	attr.hw_used_index = last_used_idx;
+	attr.q_size = vq.size;
+	attr.mkey = priv->gpa_mkey_index;
+	attr.tis_id = priv->tis->id;
+	attr.queue_index = index;
+	virtq->virtq = mlx5_devx_cmd_create_virtq(priv->ctx, &attr);
+	if (!virtq->virtq)
+		goto error;
+	return 0;
+error:
+	mlx5_vdpa_virtq_unset(virtq);
+	return -1;
+}
+
+int
+mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
+{
+	struct mlx5_devx_tis_attr tis_attr = {0};
+	struct mlx5_vdpa_virtq *virtq;
+	uint32_t i;
+	uint16_t nr_vring = rte_vhost_get_vring_num(priv->vid);
+
+	priv->td = mlx5_devx_cmd_create_td(priv->ctx);
+	if (!priv->td) {
+		DRV_LOG(ERR, "Failed to create transport domain.");
+		return -rte_errno;
+	}
+	tis_attr.transport_domain = priv->td->id;
+	priv->tis = mlx5_devx_cmd_create_tis(priv->ctx, &tis_attr);
+	if (!priv->tis) {
+		DRV_LOG(ERR, "Failed to create TIS.");
+		goto error;
+	}
+	for (i = 0; i < nr_vring; i++) {
+		virtq = rte_zmalloc(__func__, sizeof(*virtq), 0);
+		if (!virtq || mlx5_vdpa_virtq_setup(priv, virtq, i)) {
+			if (virtq)
+				rte_free(virtq);
+			goto error;
+		}
+		SLIST_INSERT_HEAD(&priv->virtq_list, virtq, next);
+	}
+	priv->nr_virtqs = nr_vring;
+	return 0;
+error:
+	mlx5_vdpa_virtqs_release(priv);
+	return -1;
+}
-- 
1.8.3.1
* [dpdk-dev] [PATCH v1 23/38] vdpa/mlx5: support stateless offloads
  2020-01-20 17:02 [dpdk-dev] [PATCH v1 00/38] Introduce mlx5 vDPA driver Matan Azrad
                   ` (21 preceding siblings ...)
  2020-01-20 17:02 ` [dpdk-dev] [PATCH v1 22/38] vdpa/mlx5: prepare virtio queues Matan Azrad
@ 2020-01-20 17:02 ` Matan Azrad
  2020-01-20 17:02 ` [dpdk-dev] [PATCH v1 24/38] common/mlx5: allow type configuration for DevX RQT Matan Azrad
                   ` (16 subsequent siblings)
  39 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-20 17:02 UTC (permalink / raw)
  To: dev; +Cc: Maxime Coquelin, Thomas Monjalon
Add support for the following features in the virtq configuration:
	VIRTIO_F_RING_PACKED,
	VIRTIO_NET_F_HOST_TSO4,
	VIRTIO_NET_F_HOST_TSO6,
	VIRTIO_NET_F_CSUM,
	VIRTIO_NET_F_GUEST_CSUM,
	VIRTIO_F_VERSION_1.
Support for these features depends on the DevX capabilities reported by
the device.
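
The dependency on device capabilities can be sketched as a check of the
negotiated feature bits against what the device reports. This is a
simplified illustration, not the driver code: struct sketch_caps and its
single bitmap are assumptions (the patch tests individual capability
fields instead), while the bit numbers are the ones defined by the VIRTIO
specification.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Feature bit numbers as defined by the VIRTIO specification. */
#define SKETCH_F_CSUM 0
#define SKETCH_F_GUEST_CSUM 1
#define SKETCH_F_HOST_TSO4 11
#define SKETCH_F_HOST_TSO6 12
#define SKETCH_F_VERSION_1 32
#define SKETCH_F_RING_PACKED 34

/* Hypothetical capability summary; the driver queries these via DevX. */
struct sketch_caps {
	uint64_t supported; /* Bitmap of features the device can handle. */
};

/* Return 0 when every checked negotiated feature is supported. */
static int
sketch_features_validate(uint64_t negotiated, const struct sketch_caps *caps)
{
	const uint64_t checked = (1ULL << SKETCH_F_CSUM) |
				 (1ULL << SKETCH_F_GUEST_CSUM) |
				 (1ULL << SKETCH_F_HOST_TSO4) |
				 (1ULL << SKETCH_F_HOST_TSO6) |
				 (1ULL << SKETCH_F_VERSION_1) |
				 (1ULL << SKETCH_F_RING_PACKED);

	assert(caps != NULL);
	/* Any checked bit that is negotiated but unsupported is an error. */
	if (negotiated & checked & ~caps->supported)
		return -ENOTSUP;
	return 0;
}
```

Features outside the checked set pass through untouched, which matches
the patch: only the six listed offloads are validated against the device
capabilities.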
Signed-off-by: Matan Azrad <matan@mellanox.com>
---
 doc/guides/vdpadevs/features/mlx5.ini |   7 ++-
 drivers/vdpa/mlx5/mlx5_vdpa.c         |  10 ----
 drivers/vdpa/mlx5/mlx5_vdpa.h         |  10 ++++
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   | 108 ++++++++++++++++++++++++++++------
 4 files changed, 107 insertions(+), 28 deletions(-)
diff --git a/doc/guides/vdpadevs/features/mlx5.ini b/doc/guides/vdpadevs/features/mlx5.ini
index fea491d..e4ee34b 100644
--- a/doc/guides/vdpadevs/features/mlx5.ini
+++ b/doc/guides/vdpadevs/features/mlx5.ini
@@ -4,10 +4,15 @@
 ; Refer to default.ini for the full list of available driver features.
 ;
 [Features]
-
+csum                 = Y
+guest csum           = Y
+host tso4            = Y
+host tso6            = Y
+version 1            = Y
 any layout           = Y
 guest announce       = Y
 mq                   = Y
+packed               = Y
 proto mq             = Y
 proto log shmfd      = Y
 proto host notifier  = Y
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index 967da2d..be89050 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -1,8 +1,6 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright 2019 Mellanox Technologies, Ltd
  */
-#include <linux/virtio_net.h>
-
 #include <rte_malloc.h>
 #include <rte_log.h>
 #include <rte_errno.h>
@@ -17,14 +15,6 @@
 #include "mlx5_vdpa.h"
 
 
-#ifndef VIRTIO_F_ORDER_PLATFORM
-#define VIRTIO_F_ORDER_PLATFORM 36
-#endif
-
-#ifndef VIRTIO_F_RING_PACKED
-#define VIRTIO_F_RING_PACKED 34
-#endif
-
 #define MLX5_VDPA_DEFAULT_FEATURES ((1ULL << VHOST_USER_F_PROTOCOL_FEATURES) | \
 			    (1ULL << VIRTIO_F_ANY_LAYOUT) | \
 			    (1ULL << VIRTIO_NET_F_MQ) | \
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index ea6d3b4..a7fd7e2 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -5,6 +5,7 @@
 #ifndef RTE_PMD_MLX5_VDPA_H_
 #define RTE_PMD_MLX5_VDPA_H_
 
+#include <linux/virtio_net.h>
 #include <sys/queue.h>
 
 #include <rte_vdpa.h>
@@ -20,6 +21,14 @@
 #define MLX5_VDPA_INTR_RETRIES 256
 #define MLX5_VDPA_INTR_RETRIES_USEC 1000
 
+#ifndef VIRTIO_F_ORDER_PLATFORM
+#define VIRTIO_F_ORDER_PLATFORM 36
+#endif
+
+#ifndef VIRTIO_F_RING_PACKED
+#define VIRTIO_F_RING_PACKED 34
+#endif
+
 struct mlx5_vdpa_cq {
 	uint16_t log_desc_n;
 	uint32_t cq_ci:24;
@@ -76,6 +85,7 @@ struct mlx5_vdpa_priv {
 	struct mlx5_devx_obj *td;
 	struct mlx5_devx_obj *tis;
 	uint16_t nr_virtqs;
+	uint64_t features; /* Negotiated features. */
 	SLIST_HEAD(virtq_list, mlx5_vdpa_virtq) virtq_list;
 	SLIST_HEAD(mr_list, mlx5_vdpa_query_mr) mr_list;
 };
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 04f05dd..332913c 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -62,6 +62,7 @@
 		assert(!ret);
 		priv->td = NULL;
 	}
+	priv->features = 0;
 }
 
 static uint64_t
@@ -99,6 +100,14 @@
 		return -1;
 	virtq->index = index;
 	virtq->vq_size = vq.size;
+	attr.tso_ipv4 = !!(priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO4));
+	attr.tso_ipv6 = !!(priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO6));
+	attr.tx_csum = !!(priv->features & (1ULL << VIRTIO_NET_F_CSUM));
+	attr.rx_csum = !!(priv->features & (1ULL << VIRTIO_NET_F_GUEST_CSUM));
+	attr.virtio_version_1_0 = !!(priv->features & (1ULL <<
+							VIRTIO_F_VERSION_1));
+	attr.type = (priv->features & (1ULL << VIRTIO_F_RING_PACKED)) ?
+			MLX5_VIRTQ_TYPE_PACKED : MLX5_VIRTQ_TYPE_SPLIT;
 	/*
 	 * No CQ is needed when the guest is in poll mode and the
 	 * capability allows it.
@@ -145,24 +154,29 @@
 		attr.umems[i].offset = 0;
 		attr.umems[i].size = virtq->umems[i].size;
 	}
-	gpa = mlx5_vdpa_hva_to_gpa(priv->vmem, (uint64_t)(uintptr_t)vq.desc);
-	if (!gpa) {
-		DRV_LOG(ERR, "Fail to get GPA for descriptor ring.");
-		goto error;
-	}
-	attr.desc_addr = gpa;
-	gpa = mlx5_vdpa_hva_to_gpa(priv->vmem, (uint64_t)(uintptr_t)vq.used);
-	if (!gpa) {
-		DRV_LOG(ERR, "Fail to get GPA for used ring.");
-		goto error;
-	}
-	attr.used_addr = gpa;
-	gpa = mlx5_vdpa_hva_to_gpa(priv->vmem, (uint64_t)(uintptr_t)vq.avail);
-	if (!gpa) {
-		DRV_LOG(ERR, "Fail to get GPA for available ring.");
-		goto error;
+	if (attr.type == MLX5_VIRTQ_TYPE_SPLIT) {
+		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem,
+					   (uint64_t)(uintptr_t)vq.desc);
+		if (!gpa) {
+			DRV_LOG(ERR, "Failed to get descriptor ring GPA.");
+			goto error;
+		}
+		attr.desc_addr = gpa;
+		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem,
+					   (uint64_t)(uintptr_t)vq.used);
+		if (!gpa) {
+			DRV_LOG(ERR, "Failed to get GPA for used ring.");
+			goto error;
+		}
+		attr.used_addr = gpa;
+		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem,
+					   (uint64_t)(uintptr_t)vq.avail);
+		if (!gpa) {
+			DRV_LOG(ERR, "Failed to get GPA for available ring.");
+			goto error;
+		}
+		attr.available_addr = gpa;
 	}
-	attr.available_addr = gpa;
 	rte_vhost_get_vring_base(priv->vid, index, &last_avail_idx,
 				 &last_used_idx);
 	DRV_LOG(INFO, "vid %d: Init last_avail_idx=%d, last_used_idx=%d for "
@@ -182,6 +196,61 @@
 	return -1;
 }
 
+static int
+mlx5_vdpa_features_validate(struct mlx5_vdpa_priv *priv)
+{
+	if (priv->features & (1ULL << VIRTIO_F_RING_PACKED)) {
+		if (!(priv->caps.virtio_queue_type & (1 <<
+						     MLX5_VIRTQ_TYPE_PACKED))) {
+			DRV_LOG(ERR, "Failed to configure PACKED mode for "
+				"vdev %d - it was not reported by HW/driver"
+				" capability.", priv->vid);
+			return -ENOTSUP;
+		}
+	}
+	if (priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO4)) {
+		if (!priv->caps.tso_ipv4) {
+			DRV_LOG(ERR, "Failed to enable TSO4 for vdev %d - TSO4"
+				" was not reported by HW/driver capability.",
+				priv->vid);
+			return -ENOTSUP;
+		}
+	}
+	if (priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO6)) {
+		if (!priv->caps.tso_ipv6) {
+			DRV_LOG(ERR, "Failed to enable TSO6 for vdev %d - TSO6"
+				" was not reported by HW/driver capability.",
+				priv->vid);
+			return -ENOTSUP;
+		}
+	}
+	if (priv->features & (1ULL << VIRTIO_NET_F_CSUM)) {
+		if (!priv->caps.tx_csum) {
+			DRV_LOG(ERR, "Failed to enable CSUM for vdev %d - CSUM"
+				" was not reported by HW/driver capability.",
+				priv->vid);
+			return -ENOTSUP;
+		}
+	}
+	if (priv->features & (1ULL << VIRTIO_NET_F_GUEST_CSUM)) {
+		if (!priv->caps.rx_csum) {
+			DRV_LOG(ERR, "Failed to enable GUEST CSUM for vdev %d"
+				" - GUEST CSUM was not reported by HW/driver "
+				"capability.", priv->vid);
+			return -ENOTSUP;
+		}
+	}
+	if (priv->features & (1ULL << VIRTIO_F_VERSION_1)) {
+		if (!priv->caps.virtio_version_1_0) {
+			DRV_LOG(ERR, "Failed to enable version 1 for vdev %d "
+				"- version 1 was not reported by HW/driver"
+				" capability.", priv->vid);
+			return -ENOTSUP;
+		}
+	}
+	return 0;
+}
+
 int
 mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
 {
@@ -189,7 +258,12 @@
 	struct mlx5_vdpa_virtq *virtq;
 	uint32_t i;
 	uint16_t nr_vring = rte_vhost_get_vring_num(priv->vid);
+	int ret = rte_vhost_get_negotiated_features(priv->vid, &priv->features);
 
+	if (ret || mlx5_vdpa_features_validate(priv)) {
+		DRV_LOG(ERR, "Failed to configure negotiated features.");
+		return -1;
+	}
 	priv->td = mlx5_devx_cmd_create_td(priv->ctx);
 	if (!priv->td) {
 		DRV_LOG(ERR, "Failed to create transport domain.");
-- 
1.8.3.1
* [dpdk-dev] [PATCH v1 24/38] common/mlx5: allow type configuration for DevX RQT
  2020-01-20 17:02 [dpdk-dev] [PATCH v1 00/38] Introduce mlx5 vDPA driver Matan Azrad
                   ` (22 preceding siblings ...)
  2020-01-20 17:02 ` [dpdk-dev] [PATCH v1 23/38] vdpa/mlx5: support stateless offloads Matan Azrad
@ 2020-01-20 17:02 ` Matan Azrad
  2020-01-20 17:02 ` [dpdk-dev] [PATCH v1 25/38] common/mlx5: add TIR fields constants Matan Azrad
                   ` (15 subsequent siblings)
  39 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-20 17:02 UTC (permalink / raw)
  To: dev; +Cc: Maxime Coquelin, Thomas Monjalon
Allow virtio queue type configuration in the RQ table.
Add the fields and the configuration needed for it.
Signed-off-by: Matan Azrad <matan@mellanox.com>
---
 drivers/common/mlx5/mlx5_devx_cmds.c | 1 +
 drivers/common/mlx5/mlx5_devx_cmds.h | 1 +
 drivers/common/mlx5/mlx5_prm.h       | 5 +++--
 3 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index f843606..3b0d7bd 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -844,6 +844,7 @@ struct mlx5_devx_obj *
 	}
 	MLX5_SET(create_rqt_in, in, opcode, MLX5_CMD_OP_CREATE_RQT);
 	rqt_ctx = MLX5_ADDR_OF(create_rqt_in, in, rqt_context);
+	MLX5_SET(rqtc, rqt_ctx, list_q_type, rqt_attr->rq_type);
 	MLX5_SET(rqtc, rqt_ctx, rqt_max_size, rqt_attr->rqt_max_size);
 	MLX5_SET(rqtc, rqt_ctx, rqt_actual_size, rqt_attr->rqt_actual_size);
 	for (i = 0; i < rqt_attr->rqt_actual_size; i++)
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index 63c84f8..065b02a 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -179,6 +179,7 @@ struct mlx5_devx_tir_attr {
 
 /* RQT attributes structure, used by RQT operations. */
 struct mlx5_devx_rqt_attr {
+	uint8_t rq_type;
 	uint32_t rqt_max_size:16;
 	uint32_t rqt_actual_size:16;
 	uint32_t rq_list[];
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index 2c0e023..52b1aa1 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -1720,8 +1720,9 @@ struct mlx5_ifc_rq_num_bits {
 };
 
 struct mlx5_ifc_rqtc_bits {
-	u8 reserved_at_0[0xa0];
-	u8 reserved_at_a0[0x10];
+	u8 reserved_at_0[0xa5];
+	u8 list_q_type[0x3];
+	u8 reserved_at_a8[0x8];
 	u8 rqt_max_size[0x10];
 	u8 reserved_at_c0[0x10];
 	u8 rqt_actual_size[0x10];
-- 
1.8.3.1
* [dpdk-dev] [PATCH v1 25/38] common/mlx5: add TIR fields constants
  2020-01-20 17:02 [dpdk-dev] [PATCH v1 00/38] Introduce mlx5 vDPA driver Matan Azrad
                   ` (23 preceding siblings ...)
  2020-01-20 17:02 ` [dpdk-dev] [PATCH v1 24/38] common/mlx5: allow type configuration for DevX RQT Matan Azrad
@ 2020-01-20 17:02 ` Matan Azrad
  2020-01-20 17:02 ` [dpdk-dev] [PATCH v1 26/38] common/mlx5: add DevX command to modify RQT Matan Azrad
                   ` (14 subsequent siblings)
  39 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-20 17:02 UTC (permalink / raw)
  To: dev; +Cc: Maxime Coquelin, Thomas Monjalon
The DevX TIR object configuration must specify the L3 and L4 protocols
that the TIR is expected to forward.
Add the PRM constant values needed to configure the L3 and L4 protocols.
Signed-off-by: Matan Azrad <matan@mellanox.com>
---
 drivers/common/mlx5/mlx5_prm.h | 10 ++++++++++
 1 file changed, 10 insertions(+)
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index 52b1aa1..f45383d 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -1625,6 +1625,16 @@ struct mlx5_ifc_modify_rq_in_bits {
 };
 
 enum {
+	MLX5_L3_PROT_TYPE_IPV4 = 0,
+	MLX5_L3_PROT_TYPE_IPV6 = 1,
+};
+
+enum {
+	MLX5_L4_PROT_TYPE_TCP = 0,
+	MLX5_L4_PROT_TYPE_UDP = 1,
+};
+
+enum {
 	MLX5_RX_HASH_FIELD_SELECT_SELECTED_FIELDS_SRC_IP     = 0x0,
 	MLX5_RX_HASH_FIELD_SELECT_SELECTED_FIELDS_DST_IP     = 0x1,
 	MLX5_RX_HASH_FIELD_SELECT_SELECTED_FIELDS_L4_SPORT   = 0x2,
-- 
1.8.3.1
* [dpdk-dev] [PATCH v1 26/38] common/mlx5: add DevX command to modify RQT
  2020-01-20 17:02 [dpdk-dev] [PATCH v1 00/38] Introduce mlx5 vDPA driver Matan Azrad
                   ` (24 preceding siblings ...)
  2020-01-20 17:02 ` [dpdk-dev] [PATCH v1 25/38] common/mlx5: add TIR fields constants Matan Azrad
@ 2020-01-20 17:02 ` Matan Azrad
  2020-01-20 17:02 ` [dpdk-dev] [PATCH v1 27/38] common/mlx5: get DevX capability for max RQT size Matan Azrad
                   ` (13 subsequent siblings)
  39 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-20 17:02 UTC (permalink / raw)
  To: dev; +Cc: Maxime Coquelin, Thomas Monjalon
The RQ table can be changed to support a different list of queues.
Add a DevX command to modify the DevX RQT object to point to a new RQ list.
Signed-off-by: Matan Azrad <matan@mellanox.com>
---
 drivers/common/mlx5/mlx5_devx_cmds.c            | 47 +++++++++++++++++++++++++
 drivers/common/mlx5/mlx5_devx_cmds.h            |  2 ++
 drivers/common/mlx5/mlx5_prm.h                  | 21 +++++++++++
 drivers/common/mlx5/rte_common_mlx5_version.map |  1 +
 4 files changed, 71 insertions(+)
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index 3b0d7bd..badd51e 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -862,6 +862,53 @@ struct mlx5_devx_obj *
 }
 
 /**
+ * Modify RQT using DevX API.
+ *
+ * @param[in] rqt
+ *   Pointer to RQT DevX object structure.
+ * @param[in] rqt_attr
+ *   Pointer to RQT attributes structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_devx_cmd_modify_rqt(struct mlx5_devx_obj *rqt,
+			 struct mlx5_devx_rqt_attr *rqt_attr)
+{
+	uint32_t inlen = MLX5_ST_SZ_BYTES(modify_rqt_in) +
+			 rqt_attr->rqt_actual_size * sizeof(uint32_t);
+	uint32_t out[MLX5_ST_SZ_DW(modify_rqt_out)] = {0};
+	uint32_t *in = rte_calloc(__func__, 1, inlen, 0);
+	void *rqt_ctx;
+	int i;
+	int ret;
+
+	if (!in) {
+		DRV_LOG(ERR, "Failed to allocate RQT modify IN data.");
+		rte_errno = ENOMEM;
+		return -ENOMEM;
+	}
+	MLX5_SET(modify_rqt_in, in, opcode, MLX5_CMD_OP_MODIFY_RQT);
+	MLX5_SET(modify_rqt_in, in, rqtn, rqt->id);
+	MLX5_SET64(modify_rqt_in, in, modify_bitmask, 0x1);
+	rqt_ctx = MLX5_ADDR_OF(modify_rqt_in, in, rqt_context);
+	MLX5_SET(rqtc, rqt_ctx, list_q_type, rqt_attr->rq_type);
+	MLX5_SET(rqtc, rqt_ctx, rqt_max_size, rqt_attr->rqt_max_size);
+	MLX5_SET(rqtc, rqt_ctx, rqt_actual_size, rqt_attr->rqt_actual_size);
+	for (i = 0; i < rqt_attr->rqt_actual_size; i++)
+		MLX5_SET(rqtc, rqt_ctx, rq_num[i], rqt_attr->rq_list[i]);
+	ret = mlx5_glue->devx_obj_modify(rqt->obj, in, inlen, out, sizeof(out));
+	rte_free(in);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to modify RQT using DevX.");
+		rte_errno = errno;
+		return -rte_errno;
+	}
+	return ret;
+}
+
+/**
  * Create SQ using DevX API.
  *
  * @param[in] ctx
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index 065b02a..1f635bb 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -315,5 +315,7 @@ int mlx5_devx_cmd_modify_virtq(struct mlx5_devx_obj *virtq_obj,
 			       struct mlx5_devx_virtq_attr *attr);
 int mlx5_devx_cmd_query_virtq(struct mlx5_devx_obj *virtq_obj,
 			      struct mlx5_devx_virtq_attr *attr);
+int mlx5_devx_cmd_modify_rqt(struct mlx5_devx_obj *rqt,
+			     struct mlx5_devx_rqt_attr *rqt_attr);
 
 #endif /* RTE_PMD_MLX5_DEVX_CMDS_H_ */
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index f45383d..8984b8c 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -738,6 +738,7 @@ enum {
 	MLX5_CMD_OP_CREATE_TIS = 0x912,
 	MLX5_CMD_OP_QUERY_TIS = 0x915,
 	MLX5_CMD_OP_CREATE_RQT = 0x916,
+	MLX5_CMD_OP_MODIFY_RQT = 0x917,
 	MLX5_CMD_OP_ALLOC_FLOW_COUNTER = 0x939,
 	MLX5_CMD_OP_QUERY_FLOW_COUNTER = 0x93b,
 	MLX5_CMD_OP_CREATE_GENERAL_OBJECT = 0xa00,
@@ -1760,10 +1761,30 @@ struct mlx5_ifc_create_rqt_in_bits {
 	u8 reserved_at_40[0xc0];
 	struct mlx5_ifc_rqtc_bits rqt_context;
 };
+
+struct mlx5_ifc_modify_rqt_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0x8];
+	u8 rqtn[0x18];
+	u8 reserved_at_60[0x20];
+	u8 modify_bitmask[0x40];
+	u8 reserved_at_c0[0x40];
+	struct mlx5_ifc_rqtc_bits rqt_context;
+};
 #ifdef PEDANTIC
 #pragma GCC diagnostic error "-Wpedantic"
 #endif
 
+struct mlx5_ifc_modify_rqt_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x40];
+};
+
 enum {
 	MLX5_SQC_STATE_RST  = 0x0,
 	MLX5_SQC_STATE_RDY  = 0x1,
diff --git a/drivers/common/mlx5/rte_common_mlx5_version.map b/drivers/common/mlx5/rte_common_mlx5_version.map
index f3082ce..37a6902 100644
--- a/drivers/common/mlx5/rte_common_mlx5_version.map
+++ b/drivers/common/mlx5/rte_common_mlx5_version.map
@@ -15,6 +15,7 @@ DPDK_20.02 {
 	mlx5_devx_cmd_flow_dump;
 	mlx5_devx_cmd_mkey_create;
 	mlx5_devx_cmd_modify_rq;
+	mlx5_devx_cmd_modify_rqt;
 	mlx5_devx_cmd_modify_sq;
 	mlx5_devx_cmd_modify_virtq;
 	mlx5_devx_cmd_qp_query_tis_td;
-- 
1.8.3.1
* [dpdk-dev] [PATCH v1 27/38] common/mlx5: get DevX capability for max RQT size
  2020-01-20 17:02 [dpdk-dev] [PATCH v1 00/38] Introduce mlx5 vDPA driver Matan Azrad
                   ` (25 preceding siblings ...)
  2020-01-20 17:02 ` [dpdk-dev] [PATCH v1 26/38] common/mlx5: add DevX command to modify RQT Matan Azrad
@ 2020-01-20 17:02 ` Matan Azrad
  2020-01-20 17:03 ` [dpdk-dev] [PATCH v1 28/38] vdpa/mlx5: add basic steering configurations Matan Azrad
                   ` (12 subsequent siblings)
  39 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-20 17:02 UTC (permalink / raw)
  To: dev; +Cc: Maxime Coquelin, Thomas Monjalon
To allow the RQT size to be configured within the maximum value
supported by the device, add log_max_rqt_size to the DevX capability
structure.
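The capability is a log2 value stored in a 5-bit field, so the usable table
size is 1 << log_max_rqt_size. A minimal sketch of how a caller might clamp a
requested size to that limit is shown here; the function name is hypothetical,
and the value 512 used in the note below is the default that patch 28 of this
series defines as MLX5_VDPA_DEFAULT_RQT_SIZE.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: clamp a wanted RQT size to the device maximum,
 * where the maximum is reported as a log2 value (5-bit capability field).
 */
static inline uint32_t
rqt_size_from_cap(uint8_t log_max_rqt_size, uint32_t wanted)
{
	uint32_t hw_max;

	/* A 5-bit field cannot exceed 31, so the shift below is defined. */
	assert(log_max_rqt_size < 32);
	hw_max = UINT32_C(1) << log_max_rqt_size;
	return wanted < hw_max ? wanted : hw_max;
}
```

For example, with a requested size of 512, a device reporting
log_max_rqt_size = 5 yields a table of 32 entries, while a device reporting
11 yields the full 512.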
Signed-off-by: Matan Azrad <matan@mellanox.com>
---
 drivers/common/mlx5/mlx5_devx_cmds.c | 2 ++
 drivers/common/mlx5/mlx5_devx_cmds.h | 1 +
 2 files changed, 3 insertions(+)
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index badd51e..6c1ed10 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -436,6 +436,8 @@ struct mlx5_devx_obj *
 			MLX5_GET(cmd_hca_cap, hcattr, flow_counter_bulk_alloc);
 	attr->flow_counters_dump = MLX5_GET(cmd_hca_cap, hcattr,
 					    flow_counters_dump);
+	attr->log_max_rqt_size = MLX5_GET(cmd_hca_cap, hcattr,
+					  log_max_rqt_size);
 	attr->eswitch_manager = MLX5_GET(cmd_hca_cap, hcattr, eswitch_manager);
 	attr->hairpin = MLX5_GET(cmd_hca_cap, hcattr, hairpin);
 	attr->log_max_hairpin_queues = MLX5_GET(cmd_hca_cap, hcattr,
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index 1f635bb..b3ef245 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -69,6 +69,7 @@ struct mlx5_hca_vdpa_attr {
 struct mlx5_hca_attr {
 	uint32_t eswitch_manager:1;
 	uint32_t flow_counters_dump:1;
+	uint32_t log_max_rqt_size:5;
 	uint8_t flow_counter_bulk_alloc_bitmap;
 	uint32_t eth_net_offloads:1;
 	uint32_t eth_virt:1;
-- 
1.8.3.1
* [dpdk-dev] [PATCH v1 28/38] vdpa/mlx5: add basic steering configurations
  2020-01-20 17:02 [dpdk-dev] [PATCH v1 00/38] Introduce mlx5 vDPA driver Matan Azrad
                   ` (26 preceding siblings ...)
  2020-01-20 17:02 ` [dpdk-dev] [PATCH v1 27/38] common/mlx5: get DevX capability for max RQT size Matan Azrad
@ 2020-01-20 17:03 ` Matan Azrad
  2020-01-20 17:03 ` [dpdk-dev] [PATCH v1 29/38] vdpa/mlx5: support queue state operation Matan Azrad
                   ` (11 subsequent siblings)
  39 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-20 17:03 UTC (permalink / raw)
  To: dev; +Cc: Maxime Coquelin, Thomas Monjalon
Add a steering object to be managed by a new file, mlx5_vdpa_steer.c.
Add a promiscuous flow that scatters the device Rx packets across the
virtio queues using the RSS action.
To allow correct RSS on L3 and L4, split the flow into 7 flows,
as required by the device.
Signed-off-by: Matan Azrad <matan@mellanox.com>
---
 drivers/vdpa/mlx5/Makefile          |   2 +
 drivers/vdpa/mlx5/meson.build       |   1 +
 drivers/vdpa/mlx5/mlx5_vdpa.c       |   1 +
 drivers/vdpa/mlx5/mlx5_vdpa.h       |  34 +++++
 drivers/vdpa/mlx5/mlx5_vdpa_steer.c | 270 ++++++++++++++++++++++++++++++++++++
 5 files changed, 308 insertions(+)
 create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_steer.c
diff --git a/drivers/vdpa/mlx5/Makefile b/drivers/vdpa/mlx5/Makefile
index 6418ee8..f523bbb 100644
--- a/drivers/vdpa/mlx5/Makefile
+++ b/drivers/vdpa/mlx5/Makefile
@@ -11,6 +11,8 @@ SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa_mem.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa_cq.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa_virtq.c
+SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa_steer.c
+
 
 # Basic CFLAGS.
 CFLAGS += -O3
diff --git a/drivers/vdpa/mlx5/meson.build b/drivers/vdpa/mlx5/meson.build
index 157fee1..9bbf819 100644
--- a/drivers/vdpa/mlx5/meson.build
+++ b/drivers/vdpa/mlx5/meson.build
@@ -15,6 +15,7 @@ sources = files(
 	'mlx5_vdpa_mem.c',
 	'mlx5_vdpa_cq.c',
 	'mlx5_vdpa_virtq.c',
+	'mlx5_vdpa_steer.c',
 )
 cflags_options = [
 	'-std=c11',
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index be89050..d6014fc 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -202,6 +202,7 @@
 			goto error;
 		}
 		priv->caps = attr.vdpa;
+		priv->log_max_rqt_size = attr.log_max_rqt_size;
 	}
 	priv->ctx = ctx;
 	priv->dev_addr.pci_addr = pci_dev->addr;
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index a7fd7e2..0f91682 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -66,6 +66,18 @@ struct mlx5_vdpa_virtq {
 	} umems[3];
 };
 
+struct mlx5_vdpa_steer {
+	struct mlx5_devx_obj *rqt;
+	void *domain;
+	void *tbl;
+	struct {
+		struct mlx5dv_flow_matcher *matcher;
+		struct mlx5_devx_obj *tir;
+		void *tir_action;
+		void *flow;
+	} rss[7];
+};
+
 struct mlx5_vdpa_priv {
 	TAILQ_ENTRY(mlx5_vdpa_priv) next;
 	int id; /* vDPA device id. */
@@ -86,7 +98,9 @@ struct mlx5_vdpa_priv {
 	struct mlx5_devx_obj *tis;
 	uint16_t nr_virtqs;
 	uint64_t features; /* Negotiated features. */
+	uint16_t log_max_rqt_size;
 	SLIST_HEAD(virtq_list, mlx5_vdpa_virtq) virtq_list;
+	struct mlx5_vdpa_steer steer;
 	SLIST_HEAD(mr_list, mlx5_vdpa_query_mr) mr_list;
 };
 
@@ -183,4 +197,24 @@ int mlx5_vdpa_cq_create(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
  */
 int mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv);
 
+/**
+ * Unset steering and release all its related resources- stop traffic.
+ *
+ * @param[in] priv
+ *   The vdpa driver private structure.
+ */
+int mlx5_vdpa_steer_unset(struct mlx5_vdpa_priv *priv);
+
+/**
+ * Setup steering and all its related resources to enable RSS traffic from the
+ * device to all the Rx host queues.
+ *
+ * @param[in] priv
+ *   The vdpa driver private structure.
+ *
+ * @return
+ *   0 on success, a negative value otherwise.
+ */
+int mlx5_vdpa_steer_setup(struct mlx5_vdpa_priv *priv);
+
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_steer.c b/drivers/vdpa/mlx5/mlx5_vdpa_steer.c
new file mode 100644
index 0000000..b3cfebd
--- /dev/null
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_steer.c
@@ -0,0 +1,270 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2019 Mellanox Technologies, Ltd
+ */
+#include <assert.h>
+#include <netinet/in.h>
+
+#include <rte_malloc.h>
+#include <rte_errno.h>
+#include <rte_common.h>
+
+#include "mlx5_vdpa_utils.h"
+#include "mlx5_vdpa.h"
+
+int
+mlx5_vdpa_steer_unset(struct mlx5_vdpa_priv *priv)
+{
+	int ret __rte_unused;
+	unsigned i;
+
+	for (i = 0; i < RTE_DIM(priv->steer.rss); ++i) {
+		if (priv->steer.rss[i].flow) {
+			ret = mlx5_glue->dv_destroy_flow
+						      (priv->steer.rss[i].flow);
+			assert(!ret);
+			priv->steer.rss[i].flow = NULL;
+		}
+		if (priv->steer.rss[i].tir_action) {
+			ret = mlx5_glue->destroy_flow_action
+						(priv->steer.rss[i].tir_action);
+			assert(!ret);
+			priv->steer.rss[i].tir_action = NULL;
+		}
+		if (priv->steer.rss[i].tir) {
+			ret = mlx5_devx_cmd_destroy(priv->steer.rss[i].tir);
+			assert(!ret);
+			priv->steer.rss[i].tir = NULL;
+		}
+		if (priv->steer.rss[i].matcher) {
+			ret = mlx5_glue->dv_destroy_flow_matcher
+						   (priv->steer.rss[i].matcher);
+			assert(!ret);
+			priv->steer.rss[i].matcher = NULL;
+		}
+	}
+	if (priv->steer.tbl) {
+		ret = mlx5_glue->dr_destroy_flow_tbl(priv->steer.tbl);
+		assert(!ret);
+		priv->steer.tbl = NULL;
+	}
+	if (priv->steer.domain) {
+		ret = mlx5_glue->dr_destroy_domain(priv->steer.domain);
+		assert(!ret);
+		priv->steer.domain = NULL;
+	}
+	if (priv->steer.rqt) {
+		ret = mlx5_devx_cmd_destroy(priv->steer.rqt);
+		assert(!ret);
+		priv->steer.rqt = NULL;
+	}
+	return 0;
+}
+
+/*
+ * According to the VIRTIO_NET spec, the virtqueue index identifies its type:
+ * 0 receiveq1
+ * 1 transmitq1
+ * ...
+ * 2(N-1) receiveqN
+ * 2(N-1)+1 transmitqN
+ * 2N controlq
+ */
+static uint8_t
+is_virtq_recvq(int virtq_index, int nr_vring)
+{
+	if (virtq_index % 2 == 0 && virtq_index != nr_vring - 1)
+		return 1;
+	return 0;
+}
+
+#define MLX5_VDPA_DEFAULT_RQT_SIZE 512
+static int __rte_unused
+mlx5_vdpa_rqt_prepare(struct mlx5_vdpa_priv *priv)
+{
+	struct mlx5_vdpa_virtq *virtq;
+	uint32_t rqt_n = RTE_MIN(MLX5_VDPA_DEFAULT_RQT_SIZE,
+				 1 << priv->log_max_rqt_size);
+	struct mlx5_devx_rqt_attr *attr = rte_zmalloc(__func__, sizeof(*attr)
+						      + rqt_n *
+						      sizeof(uint32_t), 0);
+	uint32_t i = 0, j;
+	int ret = 0;
+
+	if (!attr) {
+		DRV_LOG(ERR, "Failed to allocate RQT attributes memory.");
+		rte_errno = ENOMEM;
+		return -ENOMEM;
+	}
+	SLIST_FOREACH(virtq, &priv->virtq_list, next) {
+		if (is_virtq_recvq(virtq->index, priv->nr_virtqs)) {
+			attr->rq_list[i] = virtq->virtq->id;
+			i++;
+		}
+	}
+	for (j = 0; i != rqt_n; ++i, ++j)
+		attr->rq_list[i] = attr->rq_list[j];
+	attr->rq_type = MLX5_INLINE_Q_TYPE_VIRTQ;
+	attr->rqt_max_size = rqt_n;
+	attr->rqt_actual_size = rqt_n;
+	if (!priv->steer.rqt) {
+		priv->steer.rqt = mlx5_devx_cmd_create_rqt(priv->ctx, attr);
+		if (!priv->steer.rqt) {
+			DRV_LOG(ERR, "Failed to create RQT.");
+			ret = -rte_errno;
+		}
+	} else {
+		ret = mlx5_devx_cmd_modify_rqt(priv->steer.rqt, attr);
+		if (ret)
+			DRV_LOG(ERR, "Failed to modify RQT.");
+	}
+	rte_free(attr);
+	return ret;
+}
+
+static int __rte_unused
+mlx5_vdpa_rss_flows_create(struct mlx5_vdpa_priv *priv)
+{
+#ifdef HAVE_MLX5DV_DR
+	struct mlx5_devx_tir_attr tir_att = {
+		.disp_type = MLX5_TIRC_DISP_TYPE_INDIRECT,
+		.rx_hash_fn = MLX5_RX_HASH_FN_TOEPLITZ,
+		.transport_domain = priv->td->id,
+		.indirect_table = priv->steer.rqt->id,
+		.rx_hash_symmetric = 1,
+		.rx_hash_toeplitz_key = { 0x2cc681d1, 0x5bdbf4f7, 0xfca28319,
+					  0xdb1a3e94, 0x6b9e38d9, 0x2c9c03d1,
+					  0xad9944a7, 0xd9563d59, 0x063c25f3,
+					  0xfc1fdc2a },
+	};
+	struct {
+		size_t size;
+		/**< Size of match value. Do NOT split size and key! */
+		uint32_t buf[MLX5_ST_SZ_DW(fte_match_param)];
+		/**< Matcher value. This value is used as the mask or a key. */
+	} matcher_mask = {
+				.size = sizeof(matcher_mask.buf),
+			},
+	  matcher_value = {
+				.size = sizeof(matcher_value.buf),
+			};
+	struct mlx5dv_flow_matcher_attr dv_attr = {
+		.type = IBV_FLOW_ATTR_NORMAL,
+		.match_mask = (void *)&matcher_mask,
+	};
+	void *match_m = matcher_mask.buf;
+	void *match_v = matcher_value.buf;
+	void *headers_m = MLX5_ADDR_OF(fte_match_param, match_m, outer_headers);
+	void *headers_v = MLX5_ADDR_OF(fte_match_param, match_v, outer_headers);
+	void *actions[1];
+	const uint8_t l3_hash =
+		(1 << MLX5_RX_HASH_FIELD_SELECT_SELECTED_FIELDS_SRC_IP) |
+		(1 << MLX5_RX_HASH_FIELD_SELECT_SELECTED_FIELDS_DST_IP);
+	const uint8_t l4_hash =
+		(1 << MLX5_RX_HASH_FIELD_SELECT_SELECTED_FIELDS_L4_SPORT) |
+		(1 << MLX5_RX_HASH_FIELD_SELECT_SELECTED_FIELDS_L4_DPORT);
+	enum { PRIO, CRITERIA, IP_VER_M, IP_VER_V, IP_PROT_M, IP_PROT_V, L3_BIT,
+	       L4_BIT, HASH, END};
+	const uint8_t vars[RTE_DIM(priv->steer.rss)][END] = {
+		{ 7, 0, 0, 0, 0, 0, 0, 0, 0 },
+		{ 6, 1 << MLX5_MATCH_CRITERIA_ENABLE_OUTER_BIT, 0xf, 4, 0, 0,
+		 MLX5_L3_PROT_TYPE_IPV4, 0, l3_hash },
+		{ 6, 1 << MLX5_MATCH_CRITERIA_ENABLE_OUTER_BIT, 0xf, 6, 0, 0,
+		 MLX5_L3_PROT_TYPE_IPV6, 0, l3_hash },
+		{ 5, 1 << MLX5_MATCH_CRITERIA_ENABLE_OUTER_BIT, 0xf, 4, 0xff,
+		 IPPROTO_UDP, MLX5_L3_PROT_TYPE_IPV4, MLX5_L4_PROT_TYPE_UDP,
+		 l3_hash | l4_hash },
+		{ 5, 1 << MLX5_MATCH_CRITERIA_ENABLE_OUTER_BIT, 0xf, 4, 0xff,
+		 IPPROTO_TCP, MLX5_L3_PROT_TYPE_IPV4, MLX5_L4_PROT_TYPE_TCP,
+		 l3_hash | l4_hash },
+		{ 5, 1 << MLX5_MATCH_CRITERIA_ENABLE_OUTER_BIT, 0xf, 6, 0xff,
+		 IPPROTO_UDP, MLX5_L3_PROT_TYPE_IPV6, MLX5_L4_PROT_TYPE_UDP,
+		 l3_hash | l4_hash },
+		{ 5, 1 << MLX5_MATCH_CRITERIA_ENABLE_OUTER_BIT, 0xf, 6, 0xff,
+		 IPPROTO_TCP, MLX5_L3_PROT_TYPE_IPV6, MLX5_L4_PROT_TYPE_TCP,
+		 l3_hash | l4_hash },
+	};
+	unsigned i;
+
+	for (i = 0; i < RTE_DIM(priv->steer.rss); ++i) {
+		dv_attr.priority = vars[i][PRIO];
+		dv_attr.match_criteria_enable = vars[i][CRITERIA];
+		MLX5_SET(fte_match_set_lyr_2_4, headers_m, ip_version,
+			 vars[i][IP_VER_M]);
+		MLX5_SET(fte_match_set_lyr_2_4, headers_v, ip_version,
+			 vars[i][IP_VER_V]);
+		MLX5_SET(fte_match_set_lyr_2_4, headers_m, ip_protocol,
+			 vars[i][IP_PROT_M]);
+		MLX5_SET(fte_match_set_lyr_2_4, headers_v, ip_protocol,
+			 vars[i][IP_PROT_V]);
+		tir_att.rx_hash_field_selector_outer.l3_prot_type =
+								vars[i][L3_BIT];
+		tir_att.rx_hash_field_selector_outer.l4_prot_type =
+								vars[i][L4_BIT];
+		tir_att.rx_hash_field_selector_outer.selected_fields =
+								  vars[i][HASH];
+		priv->steer.rss[i].matcher = mlx5_glue->dv_create_flow_matcher
+					 (priv->ctx, &dv_attr, priv->steer.tbl);
+		if (!priv->steer.rss[i].matcher) {
+			DRV_LOG(ERR, "Failed to create matcher %d.", i);
+			goto error;
+		}
+		priv->steer.rss[i].tir = mlx5_devx_cmd_create_tir(priv->ctx,
+								  &tir_att);
+		if (!priv->steer.rss[i].tir) {
+			DRV_LOG(ERR, "Failed to create TIR %d.", i);
+			goto error;
+		}
+		priv->steer.rss[i].tir_action =
+				mlx5_glue->dv_create_flow_action_dest_devx_tir
+						  (priv->steer.rss[i].tir->obj);
+		if (!priv->steer.rss[i].tir_action) {
+			DRV_LOG(ERR, "Failed to create TIR action %d.", i);
+			goto error;
+		}
+		actions[0] = priv->steer.rss[i].tir_action;
+		priv->steer.rss[i].flow = mlx5_glue->dv_create_flow
+					(priv->steer.rss[i].matcher,
+					 (void *)&matcher_value, 1, actions);
+		if (!priv->steer.rss[i].flow) {
+			DRV_LOG(ERR, "Failed to create flow %d.", i);
+			goto error;
+		}
+	}
+	return 0;
+error:
+	/* Resources will be freed by the caller. */
+	return -1;
+#else
+	(void)priv;
+	return -ENOTSUP;
+#endif /* HAVE_MLX5DV_DR */
+}
+
+int
+mlx5_vdpa_steer_setup(struct mlx5_vdpa_priv *priv)
+{
+#ifdef HAVE_MLX5DV_DR
+	if (mlx5_vdpa_rqt_prepare(priv))
+		return -1;
+	priv->steer.domain = mlx5_glue->dr_create_domain(priv->ctx,
+						  MLX5DV_DR_DOMAIN_TYPE_NIC_RX);
+	if (!priv->steer.domain) {
+		DRV_LOG(ERR, "Failed to create Rx domain.");
+		goto error;
+	}
+	priv->steer.tbl = mlx5_glue->dr_create_flow_tbl(priv->steer.domain, 0);
+	if (!priv->steer.tbl) {
+		DRV_LOG(ERR, "Failed to create table 0 with Rx domain.");
+		goto error;
+	}
+	if (mlx5_vdpa_rss_flows_create(priv))
+		goto error;
+	return 0;
+error:
+	mlx5_vdpa_steer_unset(priv);
+	return -1;
+#else
+	(void)priv;
+	return -ENOTSUP;
+#endif /* HAVE_MLX5DV_DR */
+}
-- 
1.8.3.1
* [dpdk-dev] [PATCH v1 29/38] vdpa/mlx5: support queue state operation
  2020-01-20 17:02 [dpdk-dev] [PATCH v1 00/38] Introduce mlx5 vDPA driver Matan Azrad
                   ` (27 preceding siblings ...)
  2020-01-20 17:03 ` [dpdk-dev] [PATCH v1 28/38] vdpa/mlx5: add basic steering configurations Matan Azrad
@ 2020-01-20 17:03 ` Matan Azrad
  2020-01-20 17:03 ` [dpdk-dev] [PATCH v1 30/38] vdpa/mlx5: map doorbell Matan Azrad
                   ` (10 subsequent siblings)
  39 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-20 17:03 UTC (permalink / raw)
  To: dev; +Cc: Maxime Coquelin, Thomas Monjalon
Add support for the set_vring_state operation.
Using the DevX API, the virtq state can be changed as described in the PRM:
	enable - move to ready state.
	disable - move to suspend state.
Signed-off-by: Matan Azrad <matan@mellanox.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c       | 23 ++++++++++++++++++++++-
 drivers/vdpa/mlx5/mlx5_vdpa.h       | 15 +++++++++++++++
 drivers/vdpa/mlx5/mlx5_vdpa_steer.c | 22 ++++++++++++++++++++--
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 25 +++++++++++++++++++++----
 4 files changed, 78 insertions(+), 7 deletions(-)
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index d6014fc..8f078e5 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -106,13 +106,34 @@
 	return 0;
 }
 
+static int
+mlx5_vdpa_set_vring_state(int vid, int vring, int state)
+{
+	int did = rte_vhost_get_vdpa_device_id(vid);
+	struct mlx5_vdpa_priv *priv = mlx5_vdpa_find_priv_resource_by_did(did);
+	struct mlx5_vdpa_virtq *virtq = NULL;
+
+	if (priv == NULL) {
+		DRV_LOG(ERR, "Invalid device id: %d.", did);
+		return -EINVAL;
+	}
+	SLIST_FOREACH(virtq, &priv->virtq_list, next)
+		if (virtq->index == vring)
+			break;
+	if (!virtq) {
+		DRV_LOG(ERR, "Invalid or unconfigured vring id: %d.", vring);
+		return -EINVAL;
+	}
+	return mlx5_vdpa_virtq_enable(virtq, state);
+}
+
 static struct rte_vdpa_dev_ops mlx5_vdpa_ops = {
 	.get_queue_num = mlx5_vdpa_get_queue_num,
 	.get_features = mlx5_vdpa_get_vdpa_features,
 	.get_protocol_features = mlx5_vdpa_get_protocol_features,
 	.dev_conf = NULL,
 	.dev_close = NULL,
-	.set_vring_state = NULL,
+	.set_vring_state = mlx5_vdpa_set_vring_state,
 	.set_features = NULL,
 	.migration_done = NULL,
 	.get_vfio_group_fd = NULL,
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 0f91682..318f1e8 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -55,8 +55,10 @@ struct mlx5_vdpa_query_mr {
 
 struct mlx5_vdpa_virtq {
 	SLIST_ENTRY(mlx5_vdpa_virtq) next;
+	uint8_t enable;
 	uint16_t index;
 	uint16_t vq_size;
+	struct mlx5_vdpa_priv *priv;
 	struct mlx5_devx_obj *virtq;
 	struct mlx5_vdpa_cq cq;
 	struct {
@@ -198,6 +200,19 @@ int mlx5_vdpa_cq_create(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
 int mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv);
 
 /**
+ * Enable/disable virtq.
+ *
+ * @param[in] virtq
+ *   The vdpa driver private virtq structure.
+ * @param[in] enable
+ *   Set to enable, otherwise disable.
+ *
+ * @return
+ *   0 on success, a negative value otherwise.
+ */
+int mlx5_vdpa_virtq_enable(struct mlx5_vdpa_virtq *virtq, int enable);
+
+/**
  * Unset steering and release all its related resources- stop traffic.
  *
  * @param[in] priv
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_steer.c b/drivers/vdpa/mlx5/mlx5_vdpa_steer.c
index b3cfebd..37b7668 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_steer.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_steer.c
@@ -78,7 +78,7 @@
 }
 
 #define MLX5_VDPA_DEFAULT_RQT_SIZE 512
-static int __rte_unused
+static int
 mlx5_vdpa_rqt_prepare(struct mlx5_vdpa_priv *priv)
 {
 	struct mlx5_vdpa_virtq *virtq;
@@ -96,7 +96,8 @@
 		return -ENOMEM;
 	}
 	SLIST_FOREACH(virtq, &priv->virtq_list, next) {
-		if (is_virtq_recvq(virtq->index, priv->nr_virtqs)) {
+		if (is_virtq_recvq(virtq->index, priv->nr_virtqs) &&
+		    virtq->enable) {
 			attr->rq_list[i] = virtq->virtq->id;
 			i++;
 		}
@@ -121,6 +122,23 @@
 	return ret;
 }
 
+int
+mlx5_vdpa_virtq_enable(struct mlx5_vdpa_virtq *virtq, int enable)
+{
+	struct mlx5_vdpa_priv *priv = virtq->priv;
+	int ret = 0;
+
+	if (virtq->enable == !!enable)
+		return 0;
+	virtq->enable = !!enable;
+	if (is_virtq_recvq(virtq->index, priv->nr_virtqs)) {
+		ret = mlx5_vdpa_rqt_prepare(priv);
+		if (ret)
+			virtq->enable = !enable;
+	}
+	return ret;
+}
+
 static int __rte_unused
 mlx5_vdpa_rss_flows_create(struct mlx5_vdpa_priv *priv)
 {
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 332913c..a294117 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -15,14 +15,14 @@
 mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 {
 	int ret __rte_unused;
-	int i;
+	unsigned i;
 
 	if (virtq->virtq) {
 		ret = mlx5_devx_cmd_destroy(virtq->virtq);
 		assert(!ret);
 		virtq->virtq = NULL;
 	}
-	for (i = 0; i < 3; ++i) {
+	for (i = 0; i < RTE_DIM(virtq->umems); ++i) {
 		if (virtq->umems[i].obj) {
 			ret = mlx5_glue->devx_umem_dereg(virtq->umems[i].obj);
 			assert(!ret);
@@ -65,6 +65,19 @@
 	priv->features = 0;
 }
 
+static int
+mlx5_vdpa_virtq_modify(struct mlx5_vdpa_virtq *virtq, int state)
+{
+	struct mlx5_devx_virtq_attr attr = {
+			.type = MLX5_VIRTQ_MODIFY_TYPE_STATE,
+			.state = state ? MLX5_VIRTQ_STATE_RDY :
+					 MLX5_VIRTQ_STATE_SUSPEND,
+			.queue_index = virtq->index,
+	};
+
+	return mlx5_devx_cmd_modify_virtq(virtq->virtq, &attr);
+}
+
 static uint64_t
 mlx5_vdpa_hva_to_gpa(struct rte_vhost_memory *mem, uint64_t hva)
 {
@@ -91,7 +104,7 @@
 	struct mlx5_devx_virtq_attr attr = {0};
 	uint64_t gpa;
 	int ret;
-	int i;
+	unsigned i;
 	uint16_t last_avail_idx;
 	uint16_t last_used_idx;
 
@@ -130,7 +143,7 @@
 			" need CQ and event mechanism.", index);
 	}
 	/* Setup 3 UMEMs for each virtq. */
-	for (i = 0; i < 3; ++i) {
+	for (i = 0; i < RTE_DIM(virtq->umems); ++i) {
 		virtq->umems[i].size = priv->caps.umems[i].a * vq.size +
 							  priv->caps.umems[i].b;
 		assert(virtq->umems[i].size);
@@ -188,8 +201,12 @@
 	attr.tis_id = priv->tis->id;
 	attr.queue_index = index;
 	virtq->virtq = mlx5_devx_cmd_create_virtq(priv->ctx, &attr);
+	virtq->priv = priv;
 	if (!virtq->virtq)
 		goto error;
+	if (mlx5_vdpa_virtq_modify(virtq, 1))
+		goto error;
+	virtq->enable = 1;
 	return 0;
 error:
 	mlx5_vdpa_virtq_unset(virtq);
-- 
1.8.3.1
* [dpdk-dev] [PATCH v1 30/38] vdpa/mlx5: map doorbell
  2020-01-20 17:02 [dpdk-dev] [PATCH v1 00/38] Introduce mlx5 vDPA driver Matan Azrad
                   ` (28 preceding siblings ...)
  2020-01-20 17:03 ` [dpdk-dev] [PATCH v1 29/38] vdpa/mlx5: support queue state operation Matan Azrad
@ 2020-01-20 17:03 ` Matan Azrad
  2020-01-20 17:03 ` [dpdk-dev] [PATCH v1 31/38] vdpa/mlx5: support live migration Matan Azrad
                   ` (9 subsequent siblings)
  39 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-20 17:03 UTC (permalink / raw)
  To: dev; +Cc: Maxime Coquelin, Thomas Monjalon
The HW supports only 4-byte doorbell write detection.
The virtio device writes only 2 bytes when it rings the doorbell.
Map the virtio doorbell detected by the virtio queue kickfd to the HW
VAR space, where the HW expects to get the virtio emulation doorbell.
Use the EAL interrupt mechanism to get a notification when the guest
signals a new event on kickfd, and write 4 bytes to the HW doorbell space
in the notification callback.
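The relay step can be sketched outside of DPDK as follows, assuming a
Linux eventfd stands in for kickfd and an ordinary 32-bit variable stands in
for the mapped doorbell. relay_kick() is a hypothetical name; the driver
itself performs the store with rte_write32() on the mmap'ed VAR page.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/eventfd.h>

/*
 * Hypothetical helper: consume one kick from an eventfd-style descriptor
 * (an 8-byte counter read), then ring the doorbell with a 32-bit store.
 * Returns 0 on success, -1 if the kick could not be read.
 */
static int
relay_kick(int kickfd, volatile uint32_t *db, uint16_t virtq_index)
{
	uint64_t counter;
	ssize_t n;

	assert(db != NULL);
	do {
		n = read(kickfd, &counter, sizeof(counter));
	} while (n < 0 && errno == EINTR); /* Retry only when interrupted. */
	if (n != (ssize_t)sizeof(counter))
		return -1;
	/* The queue index is widened to the 4 bytes the HW can detect. */
	*db = virtq_index;
	return 0;
}
```

The sketch retries only on EINTR and reports a failed read, whereas the
handler in the patch also retries on EAGAIN/EWOULDBLOCK and rings the doorbell
even after logging a read error; that difference is a simplification made for
the example.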
Signed-off-by: Matan Azrad <matan@mellanox.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.h       |  3 ++
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 84 ++++++++++++++++++++++++++++++++++++-
 2 files changed, 86 insertions(+), 1 deletion(-)
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 318f1e8..8503b7b 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -66,6 +66,7 @@ struct mlx5_vdpa_virtq {
 		void *buf;
 		uint32_t size;
 	} umems[3];
+	struct rte_intr_handle intr_handle;
 };
 
 struct mlx5_vdpa_steer {
@@ -103,6 +104,8 @@ struct mlx5_vdpa_priv {
 	uint16_t log_max_rqt_size;
 	SLIST_HEAD(virtq_list, mlx5_vdpa_virtq) virtq_list;
 	struct mlx5_vdpa_steer steer;
+	struct mlx5dv_var *var;
+	void *virtq_db_addr;
 	SLIST_HEAD(mr_list, mlx5_vdpa_query_mr) mr_list;
 };
 
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index a294117..fbcf971 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -3,20 +3,65 @@
  */
 #include <string.h>
 #include <assert.h>
+#include <unistd.h>
+#include <sys/mman.h>
 
 #include <rte_malloc.h>
 #include <rte_errno.h>
+#include <rte_io.h>
 
 #include "mlx5_vdpa_utils.h"
 #include "mlx5_vdpa.h"
 
 
+static void
+mlx5_vdpa_virtq_handler(void *cb_arg)
+{
+	struct mlx5_vdpa_virtq *virtq = cb_arg;
+	struct mlx5_vdpa_priv *priv = virtq->priv;
+	uint64_t buf;
+	int nbytes;
+
+	do {
+		nbytes = read(virtq->intr_handle.fd, &buf, 8);
+		if (nbytes < 0) {
+			if (errno == EINTR ||
+			    errno == EWOULDBLOCK ||
+			    errno == EAGAIN)
+				continue;
+			DRV_LOG(ERR,  "Failed to read kickfd of virtq %d: %s",
+				virtq->index, strerror(errno));
+		}
+		break;
+	} while (1);
+	assert(priv->virtq_db_addr);
+	rte_write32(virtq->index, priv->virtq_db_addr);
+	DRV_LOG(DEBUG, "Ring virtq %u doorbell.", virtq->index);
+}
+
 static int
 mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 {
-	int ret __rte_unused;
 	unsigned i;
+	int retries = MLX5_VDPA_INTR_RETRIES;
+	int ret = -EAGAIN;
 
+	if (virtq->intr_handle.fd) {
+		while (retries-- && ret == -EAGAIN) {
+			ret = rte_intr_callback_unregister(&virtq->intr_handle,
+							mlx5_vdpa_virtq_handler,
+							virtq);
+			if (ret == -EAGAIN) {
+				DRV_LOG(DEBUG, "Try again to unregister fd %d "
+					"of virtq %d interrupt, retries = %d.",
+					virtq->intr_handle.fd,
+					(int)virtq->index, retries);
+				usleep(MLX5_VDPA_INTR_RETRIES_USEC);
+			}
+		}
+		assert(!ret);
+		memset(&virtq->intr_handle, 0, sizeof(virtq->intr_handle));
+	}
 	if (virtq->virtq) {
 		ret = mlx5_devx_cmd_destroy(virtq->virtq);
 		assert(!ret);
@@ -62,6 +107,15 @@
 		assert(!ret);
 		priv->td = NULL;
 	}
+	if (priv->virtq_db_addr) {
+		ret = munmap(priv->virtq_db_addr, priv->var->length);
+		assert(!ret);
+		priv->virtq_db_addr = NULL;
+	}
+	if (priv->var) {
+		mlx5_glue->dv_free_var(priv->var);
+		priv->var = NULL;
+	}
 	priv->features = 0;
 }
 
@@ -207,6 +261,17 @@
 	if (mlx5_vdpa_virtq_modify(virtq, 1))
 		goto error;
 	virtq->enable = 1;
+	virtq->intr_handle.fd = vq.kickfd;
+	virtq->intr_handle.type = RTE_INTR_HANDLE_EXT;
+	if (rte_intr_callback_register(&virtq->intr_handle,
+				       mlx5_vdpa_virtq_handler, virtq)) {
+		virtq->intr_handle.fd = 0;
+		DRV_LOG(ERR, "Failed to register virtq %d interrupt.", index);
+		goto error;
+	} else {
+		DRV_LOG(DEBUG, "Register fd %d interrupt for virtq %d.",
+			virtq->intr_handle.fd, index);
+	}
 	return 0;
 error:
 	mlx5_vdpa_virtq_unset(virtq);
@@ -281,6 +346,23 @@
 		DRV_LOG(ERR, "Failed to configure negotiated features.");
 		return -1;
 	}
+	priv->var = mlx5_glue->dv_alloc_var(priv->ctx, 0);
+	if (!priv->var) {
+		DRV_LOG(ERR, "Failed to allocate VAR %u.", errno);
+		return -1;
+	}
+	/* Always map the entire page. */
+	priv->virtq_db_addr = mmap(NULL, priv->var->length, PROT_READ |
+				   PROT_WRITE, MAP_SHARED, priv->ctx->cmd_fd,
+				   priv->var->mmap_off);
+	if (priv->virtq_db_addr == MAP_FAILED) {
+		DRV_LOG(ERR, "Failed to map doorbell page %u.", errno);
+		priv->virtq_db_addr = NULL;
+		goto error;
+	} else {
+		DRV_LOG(DEBUG, "VAR address of doorbell mapping is %p.",
+			priv->virtq_db_addr);
+	}
 	priv->td = mlx5_devx_cmd_create_td(priv->ctx);
 	if (!priv->td) {
 		DRV_LOG(ERR, "Failed to create transpprt domain.");
-- 
1.8.3.1
^ permalink raw reply	[flat|nested] 174+ messages in thread
* [dpdk-dev] [PATCH v1 31/38] vdpa/mlx5: support live migration
  2020-01-20 17:02 [dpdk-dev] [PATCH v1 00/38] Introduce mlx5 vDPA driver Matan Azrad
                   ` (29 preceding siblings ...)
  2020-01-20 17:03 ` [dpdk-dev] [PATCH v1 30/38] vdpa/mlx5: map doorbell Matan Azrad
@ 2020-01-20 17:03 ` Matan Azrad
  2020-01-20 17:03 ` [dpdk-dev] [PATCH v1 32/38] vdpa/mlx5: support close and config operations Matan Azrad
                   ` (8 subsequent siblings)
  39 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-20 17:03 UTC (permalink / raw)
  To: dev; +Cc: Maxime Coquelin, Thomas Monjalon
Add support for the live migration feature in HW:
- Create a single Mkey that maps the memory address space of the
  VHOST live migration log file.
- Modify the VIRTIO_NET_Q object and provide vhost_log_page,
  dirty_bitmap_mkey, dirty_bitmap_size, dirty_bitmap_addr
  and dirty_bitmap_dump_enable.
- Modify the VIRTIO_NET_Q object and move its state to SUSPEND.
- Query VIRTIO_NET_Q and get hw_available_idx and hw_used_idx.
Signed-off-by: Matan Azrad <matan@mellanox.com>
---
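Note: a small sketch of the used-ring length that the patch logs as dirty
when a virtq is suspended. It assumes the split-virtqueue used ring layout
(16-bit flags, 16-bit idx, the element array, 16-bit avail_event) and defines
a local struct used_elem instead of including the virtio headers. The name
used_ring_len() is illustrative; the patch uses the
MLX5_VDPA_USED_RING_LEN() macro for the same arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local copy of the split-ring used element layout (id + len, 8 bytes). */
struct used_elem {
	uint32_t id;
	uint32_t len;
};

/*
 * Bytes covered by a split used ring with `size` entries:
 * flags (u16) + idx (u16) + ring[size] + avail_event (u16).
 */
static size_t
used_ring_len(uint16_t size)
{
	return (size_t)size * sizeof(struct used_elem) + sizeof(uint16_t) * 3;
}
```

For a 256-entry queue this gives 256 * 8 + 6 = 2054 bytes.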
 doc/guides/vdpadevs/features/mlx5.ini |   1 +
 drivers/vdpa/mlx5/Makefile            |   1 +
 drivers/vdpa/mlx5/meson.build         |   1 +
 drivers/vdpa/mlx5/mlx5_vdpa.c         |  44 +++++++++++-
 drivers/vdpa/mlx5/mlx5_vdpa.h         |  55 ++++++++++++++
 drivers/vdpa/mlx5/mlx5_vdpa_lm.c      | 132 ++++++++++++++++++++++++++++++++++
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   |   8 ++-
 7 files changed, 239 insertions(+), 3 deletions(-)
 create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_lm.c
diff --git a/doc/guides/vdpadevs/features/mlx5.ini b/doc/guides/vdpadevs/features/mlx5.ini
index e4ee34b..1da9c1b 100644
--- a/doc/guides/vdpadevs/features/mlx5.ini
+++ b/doc/guides/vdpadevs/features/mlx5.ini
@@ -9,6 +9,7 @@ guest csum           = Y
 host tso4            = Y
 host tso6            = Y
 version 1            = Y
+log all              = Y
 any layout           = Y
 guest announce       = Y
 mq                   = Y
diff --git a/drivers/vdpa/mlx5/Makefile b/drivers/vdpa/mlx5/Makefile
index f523bbb..62938b8 100644
--- a/drivers/vdpa/mlx5/Makefile
+++ b/drivers/vdpa/mlx5/Makefile
@@ -12,6 +12,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa_mem.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa_cq.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa_virtq.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa_steer.c
+SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa_lm.c
 
 
 # Basic CFLAGS.
diff --git a/drivers/vdpa/mlx5/meson.build b/drivers/vdpa/mlx5/meson.build
index 9bbf819..60cefd7 100644
--- a/drivers/vdpa/mlx5/meson.build
+++ b/drivers/vdpa/mlx5/meson.build
@@ -16,6 +16,7 @@ sources = files(
 	'mlx5_vdpa_cq.c',
 	'mlx5_vdpa_virtq.c',
 	'mlx5_vdpa_steer.c',
+	'mlx5_vdpa_lm.c',
 )
 cflags_options = [
 	'-std=c11',
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index 8f078e5..e536d19 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -19,7 +19,8 @@
 			    (1ULL << VIRTIO_F_ANY_LAYOUT) | \
 			    (1ULL << VIRTIO_NET_F_MQ) | \
 			    (1ULL << VIRTIO_NET_F_GUEST_ANNOUNCE) | \
-			    (1ULL << VIRTIO_F_ORDER_PLATFORM))
+			    (1ULL << VIRTIO_F_ORDER_PLATFORM) | \
+			    (1ULL << VHOST_F_LOG_ALL))
 
 #define MLX5_VDPA_PROTOCOL_FEATURES \
 			    ((1ULL << VHOST_USER_PROTOCOL_F_SLAVE_REQ) | \
@@ -127,6 +128,45 @@
 	return mlx5_vdpa_virtq_enable(virtq, state);
 }
 
+static int
+mlx5_vdpa_features_set(int vid)
+{
+	int did = rte_vhost_get_vdpa_device_id(vid);
+	struct mlx5_vdpa_priv *priv = mlx5_vdpa_find_priv_resource_by_did(did);
+	uint64_t log_base, log_size;
+	uint64_t features;
+	int ret;
+
+	if (priv == NULL) {
+		DRV_LOG(ERR, "Invalid device id: %d.", did);
+		return -EINVAL;
+	}
+	ret = rte_vhost_get_negotiated_features(vid, &features);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to get negotiated features.");
+		return ret;
+	}
+	if (RTE_VHOST_NEED_LOG(features)) {
+		ret = rte_vhost_get_log_base(vid, &log_base, &log_size);
+		if (ret) {
+			DRV_LOG(ERR, "Failed to get log base.");
+			return ret;
+		}
+		ret = mlx5_vdpa_dirty_bitmap_set(priv, log_base, log_size);
+		if (ret) {
+			DRV_LOG(ERR, "Failed to set dirty bitmap.");
+			return ret;
+		}
+		DRV_LOG(INFO, "mlx5 vdpa: enabling dirty logging...");
+		ret = mlx5_vdpa_logging_enable(priv, 1);
+		if (ret) {
+			DRV_LOG(ERR, "Failed to enable dirty logging.");
+			return ret;
+		}
+	}
+	return 0;
+}
+
 static struct rte_vdpa_dev_ops mlx5_vdpa_ops = {
 	.get_queue_num = mlx5_vdpa_get_queue_num,
 	.get_features = mlx5_vdpa_get_vdpa_features,
@@ -134,7 +174,7 @@
 	.dev_conf = NULL,
 	.dev_close = NULL,
 	.set_vring_state = mlx5_vdpa_set_vring_state,
-	.set_features = NULL,
+	.set_features = mlx5_vdpa_features_set,
 	.migration_done = NULL,
 	.get_vfio_group_fd = NULL,
 	.get_vfio_device_fd = NULL,
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 8503b7b..7c0a045 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -235,4 +235,59 @@ int mlx5_vdpa_cq_create(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
  */
 int mlx5_vdpa_steer_setup(struct mlx5_vdpa_priv *priv);
 
+/**
+ * Enable/Disable live migration logging.
+ *
+ * @param[in] priv
+ *   The vdpa driver private structure.
+ * @param[in] enable
+ *   Set for enable, unset for disable.
+ *
+ * @return
+ *   0 on success, a negative value otherwise.
+ */
+int mlx5_vdpa_logging_enable(struct mlx5_vdpa_priv *priv, int enable);
+
+/**
+ * Set dirty bitmap logging to allow live migration.
+ *
+ * @param[in] priv
+ *   The vdpa driver private structure.
+ * @param[in] log_base
+ *   Vhost log base.
+ * @param[in] log_size
+ *   Vhost log size.
+ *
+ * @return
+ *   0 on success, a negative value otherwise.
+ */
+int mlx5_vdpa_dirty_bitmap_set(struct mlx5_vdpa_priv *priv, uint64_t log_base,
+			       uint64_t log_size);
+
+/**
+ * Log all virtqs information for live migration.
+ *
+ * Suspend each virtq and report its HW available and used indexes to vhost.
+ *
+ * @param[in] priv
+ *   The vdpa driver private structure.
+ *
+ * @return
+ *   0 on success, a negative value otherwise.
+ */
+int mlx5_vdpa_lm_log(struct mlx5_vdpa_priv *priv);
+
+/**
+ * Modify virtq state to be ready or suspend.
+ *
+ * @param[in] virtq
+ *   The vdpa driver private virtq structure.
+ * @param[in] state
+ *   Set for ready, otherwise suspend.
+ *
+ * @return
+ *   0 on success, a negative value otherwise.
+ */
+int mlx5_vdpa_virtq_modify(struct mlx5_vdpa_virtq *virtq, int state);
+
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
new file mode 100644
index 0000000..58ca6d9
--- /dev/null
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
@@ -0,0 +1,132 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2019 Mellanox Technologies, Ltd
+ */
+#include <assert.h>
+
+#include <rte_malloc.h>
+#include <rte_errno.h>
+
+#include "mlx5_vdpa_utils.h"
+#include "mlx5_vdpa.h"
+
+
+int
+mlx5_vdpa_logging_enable(struct mlx5_vdpa_priv *priv, int enable)
+{
+	struct mlx5_devx_virtq_attr attr = {
+		.type = MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE,
+		.dirty_bitmap_dump_enable = enable,
+	};
+	struct mlx5_vdpa_virtq *virtq;
+
+	SLIST_FOREACH(virtq, &priv->virtq_list, next) {
+		attr.queue_index = virtq->index;
+		if (mlx5_devx_cmd_modify_virtq(virtq->virtq, &attr)) {
+			DRV_LOG(ERR, "Failed to modify virtq %d logging.",
+				virtq->index);
+			return -1;
+		}
+	}
+	return 0;
+}
+
+int
+mlx5_vdpa_dirty_bitmap_set(struct mlx5_vdpa_priv *priv, uint64_t log_base,
+			   uint64_t log_size)
+{
+	struct mlx5_devx_mkey_attr mkey_attr = {
+			.addr = (uintptr_t)log_base,
+			.size = log_size,
+			.pd = priv->pdn,
+			.pg_access = 1,
+			.klm_array = NULL,
+			.klm_num = 0,
+	};
+	struct mlx5_devx_virtq_attr attr = {
+		.type = MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS,
+		.dirty_bitmap_addr = log_base,
+		.dirty_bitmap_size = log_size,
+	};
+	struct mlx5_vdpa_query_mr *mr = rte_malloc(__func__, sizeof(*mr), 0);
+	struct mlx5_vdpa_virtq *virtq;
+
+	if (!mr) {
+		DRV_LOG(ERR, "Failed to allocate mem for lm mr.");
+		return -1;
+	}
+	mr->umem = mlx5_glue->devx_umem_reg(priv->ctx,
+					    (void *)(uintptr_t)log_base,
+					    log_size, IBV_ACCESS_LOCAL_WRITE);
+	if (!mr->umem) {
+		DRV_LOG(ERR, "Failed to register umem for lm mr.");
+		goto err;
+	}
+	mkey_attr.umem_id = mr->umem->umem_id;
+	mr->mkey = mlx5_devx_cmd_mkey_create(priv->ctx, &mkey_attr);
+	if (!mr->mkey) {
+		DRV_LOG(ERR, "Failed to create Mkey for lm.");
+		goto err;
+	}
+	attr.dirty_bitmap_mkey = mr->mkey->id;
+	SLIST_FOREACH(virtq, &priv->virtq_list, next) {
+		attr.queue_index = virtq->index;
+		if (mlx5_devx_cmd_modify_virtq(virtq->virtq, &attr)) {
+			DRV_LOG(ERR, "Failed to modify virtq %d for lm.",
+				virtq->index);
+			goto err;
+		}
+	}
+	mr->is_indirect = 0;
+	SLIST_INSERT_HEAD(&priv->mr_list, mr, next);
+	return 0;
+err:
+	if (mr->mkey)
+		mlx5_devx_cmd_destroy(mr->mkey);
+	if (mr->umem)
+		mlx5_glue->devx_umem_dereg(mr->umem);
+	rte_free(mr);
+	return -1;
+}
+
+#define MLX5_VDPA_USED_RING_LEN(size) \
+	((size) * sizeof(struct vring_used_elem) + sizeof(uint16_t) * 3)
+
+int
+mlx5_vdpa_lm_log(struct mlx5_vdpa_priv *priv)
+{
+	struct mlx5_devx_virtq_attr attr = {0};
+	struct mlx5_vdpa_virtq *virtq;
+	uint64_t features;
+	int ret = rte_vhost_get_negotiated_features(priv->vid, &features);
+
+	if (ret) {
+		DRV_LOG(ERR, "Failed to get negotiated features.");
+		return -1;
+	}
+	if (RTE_VHOST_NEED_LOG(features)) {
+		SLIST_FOREACH(virtq, &priv->virtq_list, next) {
+			ret = mlx5_vdpa_virtq_modify(virtq, 0);
+			if (ret)
+				return -1;
+			if (mlx5_devx_cmd_query_virtq(virtq->virtq, &attr)) {
+				DRV_LOG(ERR, "Failed to query virtq %d.",
+					virtq->index);
+				return -1;
+			}
+			DRV_LOG(INFO, "Query vid %d vring %d: hw_available_idx="
+				"%d, hw_used_index=%d", priv->vid, virtq->index,
+				attr.hw_available_index, attr.hw_used_index);
+			ret = rte_vhost_set_vring_base(priv->vid, virtq->index,
+						       attr.hw_available_index,
+						       attr.hw_used_index);
+			if (ret) {
+				DRV_LOG(ERR, "Failed to set virtq %d base.",
+					virtq->index);
+				return -1;
+			}
+			rte_vhost_log_used_vring(priv->vid, virtq->index, 0,
+				       MLX5_VDPA_USED_RING_LEN(virtq->vq_size));
+		}
+	}
+	return 0;
+}
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index fbcf971..fa05d01 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -119,7 +119,7 @@
 	priv->features = 0;
 }
 
-static int
+int
 mlx5_vdpa_virtq_modify(struct mlx5_vdpa_virtq *virtq, int state)
 {
 	struct mlx5_devx_virtq_attr attr = {
@@ -261,6 +261,12 @@
 	if (mlx5_vdpa_virtq_modify(virtq, 1))
 		goto error;
 	virtq->enable = 1;
+	virtq->priv = priv;
+	/* Be sure notifications are not missed during configuration. */
+	ret = rte_vhost_enable_guest_notification(priv->vid, index, 1);
+	assert(!ret);
+	rte_write32(virtq->index, priv->virtq_db_addr);
+	/* Set up the kickfd interrupt used to ring the doorbell. */
 	virtq->intr_handle.fd = vq.kickfd;
 	virtq->intr_handle.type = RTE_INTR_HANDLE_EXT;
 	if (rte_intr_callback_register(&virtq->intr_handle,
-- 
1.8.3.1
^ permalink raw reply	[flat|nested] 174+ messages in thread
* [dpdk-dev] [PATCH v1 32/38] vdpa/mlx5: support close and config operations
  2020-01-20 17:02 [dpdk-dev] [PATCH v1 00/38] Introduce mlx5 vDPA driver Matan Azrad
                   ` (30 preceding siblings ...)
  2020-01-20 17:03 ` [dpdk-dev] [PATCH v1 31/38] vdpa/mlx5: support live migration Matan Azrad
@ 2020-01-20 17:03 ` Matan Azrad
  2020-01-20 17:03 ` [dpdk-dev] [PATCH v1 33/38] mlx5: skip probing according to the vDPA mode Matan Azrad
                   ` (7 subsequent siblings)
  39 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-20 17:03 UTC (permalink / raw)
  To: dev; +Cc: Maxime Coquelin, Thomas Monjalon
Support dev_conf and dev_close operations.
These operations allow vdpa traffic.
Signed-off-by: Matan Azrad <matan@mellanox.com>
---
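Note: a minimal sketch of the configure/close lifecycle this patch adds,
assuming four opaque setup steps in place of memory registration, virtq
preparation, steering setup and CQE event setup. struct dev_state,
dev_config(), dev_close() and the fail_at knob are illustrative only. The
point is the control flow: a second configure first closes the device, and a
failed step rolls everything back through the close path.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical device state; field names mirror the patch, not the driver. */
struct dev_state {
	uint8_t configured;
	int vid;
	int steps_done; /* Number of setup steps currently applied. */
	int fail_at;    /* Setup step index to fail at, or -1 for none. */
};

#define N_STEPS 4 /* Memory, virtqs, steering, CQE events. */

/* Release everything; safe to call on a partially configured device. */
static int
dev_close(struct dev_state *d)
{
	d->steps_done = 0;
	d->configured = 0;
	d->vid = 0;
	return 0;
}

/* Configure the device, closing it first if it is already configured. */
static int
dev_config(struct dev_state *d, int vid)
{
	int i;

	if (d->configured && dev_close(d))
		return -1;
	d->vid = vid;
	for (i = 0; i < N_STEPS; i++) {
		if (i == d->fail_at) {
			dev_close(d); /* Roll back the steps already done. */
			return -1;
		}
		d->steps_done++;
	}
	d->configured = 1;
	return 0;
}
```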
 drivers/vdpa/mlx5/mlx5_vdpa.c | 51 +++++++++++++++++++++++++++++++++++++++++--
 drivers/vdpa/mlx5/mlx5_vdpa.h |  1 +
 2 files changed, 50 insertions(+), 2 deletions(-)
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index e536d19..f27a1a4 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -167,12 +167,59 @@
 	return 0;
 }
 
+static int
+mlx5_vdpa_dev_close(int vid)
+{
+	int did = rte_vhost_get_vdpa_device_id(vid);
+	struct mlx5_vdpa_priv *priv = mlx5_vdpa_find_priv_resource_by_did(did);
+	int ret = 0;
+
+	if (priv == NULL) {
+		DRV_LOG(ERR, "Invalid device id: %d.", did);
+		return -1;
+	}
+	if (priv->configured)
+		ret |= mlx5_vdpa_lm_log(priv);
+	mlx5_vdpa_cqe_event_unset(priv);
+	ret |= mlx5_vdpa_steer_unset(priv);
+	mlx5_vdpa_virtqs_release(priv);
+	mlx5_vdpa_cq_global_release(priv);
+	mlx5_vdpa_mem_dereg(priv);
+	priv->configured = 0;
+	priv->vid = 0;
+	return ret;
+}
+
+static int
+mlx5_vdpa_dev_config(int vid)
+{
+	int did = rte_vhost_get_vdpa_device_id(vid);
+	struct mlx5_vdpa_priv *priv = mlx5_vdpa_find_priv_resource_by_did(did);
+
+	if (priv == NULL) {
+		DRV_LOG(ERR, "Invalid device id: %d.", did);
+		return -EINVAL;
+	}
+	if (priv->configured && mlx5_vdpa_dev_close(vid)) {
+		DRV_LOG(ERR, "Failed to reconfigure vid %d.", vid);
+		return -1;
+	}
+	priv->vid = vid;
+	if (mlx5_vdpa_mem_register(priv) || mlx5_vdpa_virtqs_prepare(priv) ||
+	    mlx5_vdpa_steer_setup(priv) || mlx5_vdpa_cqe_event_setup(priv)) {
+		mlx5_vdpa_dev_close(vid);
+		return -1;
+	}
+	priv->configured = 1;
+	return 0;
+}
+
 static struct rte_vdpa_dev_ops mlx5_vdpa_ops = {
 	.get_queue_num = mlx5_vdpa_get_queue_num,
 	.get_features = mlx5_vdpa_get_vdpa_features,
 	.get_protocol_features = mlx5_vdpa_get_protocol_features,
-	.dev_conf = NULL,
-	.dev_close = NULL,
+	.dev_conf = mlx5_vdpa_dev_config,
+	.dev_close = mlx5_vdpa_dev_close,
 	.set_vring_state = mlx5_vdpa_set_vring_state,
 	.set_features = mlx5_vdpa_features_set,
 	.migration_done = NULL,
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 7c0a045..b5fec50 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -83,6 +83,7 @@ struct mlx5_vdpa_steer {
 
 struct mlx5_vdpa_priv {
 	TAILQ_ENTRY(mlx5_vdpa_priv) next;
+	uint8_t configured;
 	int id; /* vDPA device id. */
 	int vid; /* vhost device id. */
 	struct ibv_context *ctx; /* Device context. */
-- 
1.8.3.1
^ permalink raw reply	[flat|nested] 174+ messages in thread
* [dpdk-dev] [PATCH v1 33/38] mlx5: skip probing according to the vDPA mode
  2020-01-20 17:02 [dpdk-dev] [PATCH v1 00/38] Introduce mlx5 vDPA driver Matan Azrad
                   ` (31 preceding siblings ...)
  2020-01-20 17:03 ` [dpdk-dev] [PATCH v1 32/38] vdpa/mlx5: support close and config operations Matan Azrad
@ 2020-01-20 17:03 ` Matan Azrad
  2020-01-20 17:03 ` [dpdk-dev] [PATCH v1 34/38] net/mlx5: separate Netlink commands interface Matan Azrad
                   ` (6 subsequent siblings)
  39 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-20 17:03 UTC (permalink / raw)
  To: dev; +Cc: Maxime Coquelin, Thomas Monjalon
Both the net/mlx5 PMD and the vdpa/mlx5 driver can probe the same
Mellanox devices.
Skip net/mlx5 probing when the user selects vDPA mode for the device
with the devarg vdpa=1.
Skip vdpa/mlx5 probing when the device devargs do not select vDPA mode
(no vdpa=1).
Signed-off-by: Matan Azrad <matan@mellanox.com>
---
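Note: a minimal sketch of the selection rule, assuming the devargs arrive as
a comma-separated key=value string and using only libc instead of
rte_kvargs. The name vdpa_mode_selected() is illustrative. As in the patch,
the mode counts as selected only when a "vdpa" key is present and its value
is exactly "1".

```c
#include <assert.h>
#include <string.h>

/*
 * Return 1 when `args` (comma-separated key=value pairs) contains "vdpa"
 * and every "vdpa" entry has the value "1"; return 0 otherwise.
 */
static int
vdpa_mode_selected(const char *args)
{
	const char *p = args;
	int found = 0;

	if (args == NULL)
		return 0;
	while (*p != '\0') {
		size_t len = strcspn(p, ","); /* Length of this pair. */

		if (len >= 5 && strncmp(p, "vdpa=", 5) == 0) {
			if (len != 6 || p[5] != '1')
				return 0;
			found = 1;
		}
		p += len;
		if (*p == ',')
			p++;
	}
	return found;
}
```

With this rule, "mprq_en=1,vdpa=1" selects vDPA mode, while "vdpa=0" or a
string without the key leaves the device to net/mlx5.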
 drivers/common/mlx5/Makefile                    |  2 +-
 drivers/common/mlx5/meson.build                 |  2 +-
 drivers/common/mlx5/mlx5_common.c               | 36 +++++++++++++++++++++++++
 drivers/common/mlx5/mlx5_common.h               |  3 +++
 drivers/common/mlx5/rte_common_mlx5_version.map |  1 +
 drivers/net/mlx5/mlx5.c                         |  7 ++++-
 drivers/vdpa/mlx5/mlx5_vdpa.c                   |  7 ++++-
 7 files changed, 54 insertions(+), 4 deletions(-)
diff --git a/drivers/common/mlx5/Makefile b/drivers/common/mlx5/Makefile
index 82403a2..aeacce3 100644
--- a/drivers/common/mlx5/Makefile
+++ b/drivers/common/mlx5/Makefile
@@ -42,7 +42,7 @@ else
 LDLIBS += -libverbs -lmlx5
 endif
 
-LDLIBS += -lrte_eal -lrte_pci
+LDLIBS += -lrte_eal -lrte_pci -lrte_kvargs
 
 # A few warnings cannot be avoided in external headers.
 CFLAGS += -Wno-error=cast-qual -DNDEBUG -UPEDANTIC
diff --git a/drivers/common/mlx5/meson.build b/drivers/common/mlx5/meson.build
index 74419c6..8291a92 100644
--- a/drivers/common/mlx5/meson.build
+++ b/drivers/common/mlx5/meson.build
@@ -37,7 +37,7 @@ endforeach
 
 if build
 	allow_experimental_apis = true
-	deps += ['hash', 'pci', 'net', 'eal']
+	deps += ['hash', 'pci', 'net', 'eal', 'kvargs']
 	ext_deps += libs
 	sources = files(
 		'mlx5_devx_cmds.c',
diff --git a/drivers/common/mlx5/mlx5_common.c b/drivers/common/mlx5/mlx5_common.c
index 2381208..f756b6b 100644
--- a/drivers/common/mlx5/mlx5_common.c
+++ b/drivers/common/mlx5/mlx5_common.c
@@ -71,6 +71,42 @@
 	return 0;
 }
 
+static int
+mlx5_vdpa_check_handler(__rte_unused const char *key, const char *value,
+			__rte_unused void *opaque)
+{
+	if (strcmp(value, "1"))
+		return -1;
+	return 0;
+}
+
+int
+mlx5_vdpa_mode_selected(struct rte_devargs *devargs)
+{
+	struct rte_kvargs *kvlist;
+	const char *key = "vdpa";
+	int ret = 0;
+
+	if (devargs == NULL)
+		return 0;
+
+	kvlist = rte_kvargs_parse(devargs->args, NULL);
+	if (kvlist == NULL)
+		return 0;
+
+	if (!rte_kvargs_count(kvlist, key))
+		goto exit;
+
+	/* Vdpa mode selected when there's a key-value pair: vdpa=1. */
+	if (rte_kvargs_process(kvlist, key, mlx5_vdpa_check_handler, NULL) < 0)
+		goto exit;
+	ret = 1;
+
+exit:
+	rte_kvargs_free(kvlist);
+	return ret;
+}
+
 #ifdef RTE_IBVERBS_LINK_DLOPEN
 
 /**
diff --git a/drivers/common/mlx5/mlx5_common.h b/drivers/common/mlx5/mlx5_common.h
index 9d464d4..aeaa7b9 100644
--- a/drivers/common/mlx5/mlx5_common.h
+++ b/drivers/common/mlx5/mlx5_common.h
@@ -11,6 +11,8 @@
 #include <rte_pci.h>
 #include <rte_atomic.h>
 #include <rte_log.h>
+#include <rte_kvargs.h>
+#include <rte_devargs.h>
 
 #include "mlx5_prm.h"
 
@@ -149,5 +151,6 @@ enum mlx5_cqe_status {
 }
 
 int mlx5_dev_to_pci_addr(const char *dev_path, struct rte_pci_addr *pci_addr);
+int mlx5_vdpa_mode_selected(struct rte_devargs *devargs);
 
 #endif /* RTE_PMD_MLX5_COMMON_H_ */
diff --git a/drivers/common/mlx5/rte_common_mlx5_version.map b/drivers/common/mlx5/rte_common_mlx5_version.map
index 37a6902..16b9b34 100644
--- a/drivers/common/mlx5/rte_common_mlx5_version.map
+++ b/drivers/common/mlx5/rte_common_mlx5_version.map
@@ -24,4 +24,5 @@ DPDK_20.02 {
 	mlx5_devx_get_out_command_status;
 
 	mlx5_dev_to_pci_addr;
+	mlx5_vdpa_mode_selected;
 };
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 75175c9..27cea8b 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -2956,7 +2956,12 @@ struct mlx5_flow_id_pool *
 	struct mlx5_dev_config dev_config;
 	int ret;
 
-	if (rte_eal_process_type() == RTE_PROC_PRIMARY)
+	if (mlx5_vdpa_mode_selected(pci_dev->device.devargs)) {
+		DRV_LOG(DEBUG, "Skip probing - should be probed by the vdpa"
+			" driver.");
+		return 1;
+	}
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY)
 		mlx5_pmd_socket_init();
 	ret = mlx5_init_once();
 	if (ret) {
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index f27a1a4..2ceef18 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -243,7 +243,7 @@
  */
 static int
 mlx5_vdpa_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
-		    struct rte_pci_device *pci_dev __rte_unused)
+		    struct rte_pci_device *pci_dev)
 {
 	struct ibv_device **ibv_list;
 	struct ibv_device *ibv_match = NULL;
@@ -252,6 +252,11 @@
 	struct mlx5_hca_attr attr;
 	int ret;
 
+	if (!mlx5_vdpa_mode_selected(pci_dev->device.devargs)) {
+		DRV_LOG(DEBUG, "Skip mlx5 vdpa probing - no \"vdpa=1\" in"
+			" devargs.");
+		return 1;
+	}
 	errno = 0;
 	ibv_list = mlx5_glue->get_device_list(&ret);
 	if (!ibv_list) {
-- 
1.8.3.1
^ permalink raw reply	[flat|nested] 174+ messages in thread
* [dpdk-dev] [PATCH v1 34/38] net/mlx5: separate Netlink commands interface
  2020-01-20 17:02 [dpdk-dev] [PATCH v1 00/38] Introduce mlx5 vDPA driver Matan Azrad
                   ` (32 preceding siblings ...)
  2020-01-20 17:03 ` [dpdk-dev] [PATCH v1 33/38] mlx5: skip probing according to the vDPA mode Matan Azrad
@ 2020-01-20 17:03 ` Matan Azrad
  2020-01-20 17:03 ` [dpdk-dev] [PATCH v1 35/38] net/mlx5: reduce Netlink commands dependencies Matan Azrad
                   ` (5 subsequent siblings)
  39 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-20 17:03 UTC (permalink / raw)
  To: dev; +Cc: Maxime Coquelin, Thomas Monjalon
The Netlink command interface is declared in the mlx5.h file together
with many other PMD interfaces.
In preparation for sharing the Netlink commands between different
PMDs, this patch moves the Netlink interface to a new file called
mlx5_nl.h.
Move the VLAN commands that are not pure Netlink from mlx5_nl.c to
mlx5_vlan.c.
Rename all the Netlink commands and structures to use the mlx5_nl prefix.
Signed-off-by: Matan Azrad <matan@mellanox.com>
---
 drivers/net/mlx5/mlx5.h      |  72 +++------------------
 drivers/net/mlx5/mlx5_nl.c   | 149 +++----------------------------------------
 drivers/net/mlx5/mlx5_nl.h   |  69 ++++++++++++++++++++
 drivers/net/mlx5/mlx5_vlan.c | 134 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 220 insertions(+), 204 deletions(-)
 create mode 100644 drivers/net/mlx5/mlx5_nl.h
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index f6488e1..5cfcf99 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -39,6 +39,7 @@
 #include "mlx5_defs.h"
 #include "mlx5_utils.h"
 #include "mlx5_mr.h"
+#include "mlx5_nl.h"
 #include "mlx5_autoconf.h"
 
 /* Request types for IPC. */
@@ -75,24 +76,6 @@ struct mlx5_mp_param {
 /** Key string for IPC. */
 #define MLX5_MP_NAME "net_mlx5_mp"
 
-/* Recognized Infiniband device physical port name types. */
-enum mlx5_phys_port_name_type {
-	MLX5_PHYS_PORT_NAME_TYPE_NOTSET = 0, /* Not set. */
-	MLX5_PHYS_PORT_NAME_TYPE_LEGACY, /* before kernel ver < 5.0 */
-	MLX5_PHYS_PORT_NAME_TYPE_UPLINK, /* p0, kernel ver >= 5.0 */
-	MLX5_PHYS_PORT_NAME_TYPE_PFVF, /* pf0vf0, kernel ver >= 5.0 */
-	MLX5_PHYS_PORT_NAME_TYPE_UNKNOWN, /* Unrecognized. */
-};
-
-/** Switch information returned by mlx5_nl_switch_info(). */
-struct mlx5_switch_info {
-	uint32_t master:1; /**< Master device. */
-	uint32_t representor:1; /**< Representor device. */
-	enum mlx5_phys_port_name_type name_type; /** < Port name type. */
-	int32_t pf_num; /**< PF number (valid for pfxvfx format only). */
-	int32_t port_name; /**< Representor port name. */
-	uint64_t switch_id; /**< Switch identifier. */
-};
 
 LIST_HEAD(mlx5_dev_list, mlx5_ibv_shared);
 
@@ -226,30 +209,12 @@ enum mlx5_verbs_alloc_type {
 	MLX5_VERBS_ALLOC_TYPE_RX_QUEUE,
 };
 
-/* VLAN netdev for VLAN workaround. */
-struct mlx5_vlan_dev {
-	uint32_t refcnt;
-	uint32_t ifindex; /**< Own interface index. */
-};
-
 /* Structure for VF VLAN workaround. */
 struct mlx5_vf_vlan {
 	uint32_t tag:12;
 	uint32_t created:1;
 };
 
-/*
- * Array of VLAN devices created on the base of VF
- * used for workaround in virtual environments.
- */
-struct mlx5_vlan_vmwa_context {
-	int nl_socket;
-	uint32_t nl_sn;
-	uint32_t vf_ifindex;
-	struct rte_eth_dev *dev;
-	struct mlx5_vlan_dev vlan_dev[4096];
-};
-
 /**
  * Verbs allocator needs a context to know in the callback which kind of
  * resources it is allocating.
@@ -574,7 +539,7 @@ struct mlx5_priv {
 	int nl_socket_route; /* Netlink socket (NETLINK_ROUTE). */
 	uint32_t nl_sn; /* Netlink message sequence number. */
 	LIST_HEAD(dbrpage, mlx5_devx_dbr_page) dbrpgs; /* Door-bell pages. */
-	struct mlx5_vlan_vmwa_context *vmwa_context; /* VLAN WA context. */
+	struct mlx5_nl_vlan_vmwa_context *vmwa_context; /* VLAN WA context. */
 	struct mlx5_flow_id_pool *qrss_id_pool;
 	struct mlx5_hlist *mreg_cp_tbl;
 	/* Hash table of Rx metadata register copy table. */
@@ -670,6 +635,8 @@ int mlx5_hairpin_cap_get(struct rte_eth_dev *dev,
 void mlx5_mac_addr_remove(struct rte_eth_dev *dev, uint32_t index);
 int mlx5_mac_addr_add(struct rte_eth_dev *dev, struct rte_ether_addr *mac,
 		      uint32_t index, uint32_t vmdq);
+struct mlx5_nl_vlan_vmwa_context *mlx5_vlan_vmwa_init
+				    (struct rte_eth_dev *dev, uint32_t ifindex);
 int mlx5_mac_addr_set(struct rte_eth_dev *dev, struct rte_ether_addr *mac_addr);
 int mlx5_set_mc_addr_list(struct rte_eth_dev *dev,
 			struct rte_ether_addr *mc_addr_set,
@@ -713,6 +680,11 @@ int mlx5_xstats_get_names(struct rte_eth_dev *dev __rte_unused,
 int mlx5_vlan_filter_set(struct rte_eth_dev *dev, uint16_t vlan_id, int on);
 void mlx5_vlan_strip_queue_set(struct rte_eth_dev *dev, uint16_t queue, int on);
 int mlx5_vlan_offload_set(struct rte_eth_dev *dev, int mask);
+void mlx5_vlan_vmwa_exit(struct mlx5_nl_vlan_vmwa_context *ctx);
+void mlx5_vlan_vmwa_release(struct rte_eth_dev *dev,
+			    struct mlx5_vf_vlan *vf_vlan);
+void mlx5_vlan_vmwa_acquire(struct rte_eth_dev *dev,
+			    struct mlx5_vf_vlan *vf_vlan);
 
 /* mlx5_trigger.c */
 
@@ -794,32 +766,6 @@ int mlx5_mp_req_queue_state_modify(struct rte_eth_dev *dev,
 int mlx5_pmd_socket_init(void);
 void mlx5_pmd_socket_uninit(void);
 
-/* mlx5_nl.c */
-
-int mlx5_nl_init(int protocol);
-int mlx5_nl_mac_addr_add(struct rte_eth_dev *dev, struct rte_ether_addr *mac,
-			 uint32_t index);
-int mlx5_nl_mac_addr_remove(struct rte_eth_dev *dev, struct rte_ether_addr *mac,
-			    uint32_t index);
-void mlx5_nl_mac_addr_sync(struct rte_eth_dev *dev);
-void mlx5_nl_mac_addr_flush(struct rte_eth_dev *dev);
-int mlx5_nl_promisc(struct rte_eth_dev *dev, int enable);
-int mlx5_nl_allmulti(struct rte_eth_dev *dev, int enable);
-unsigned int mlx5_nl_portnum(int nl, const char *name);
-unsigned int mlx5_nl_ifindex(int nl, const char *name, uint32_t pindex);
-int mlx5_nl_vf_mac_addr_modify(struct rte_eth_dev *dev,
-			       struct rte_ether_addr *mac, int vf_index);
-int mlx5_nl_switch_info(int nl, unsigned int ifindex,
-			struct mlx5_switch_info *info);
-
-struct mlx5_vlan_vmwa_context *mlx5_vlan_vmwa_init(struct rte_eth_dev *dev,
-						   uint32_t ifindex);
-void mlx5_vlan_vmwa_exit(struct mlx5_vlan_vmwa_context *ctx);
-void mlx5_vlan_vmwa_release(struct rte_eth_dev *dev,
-			    struct mlx5_vf_vlan *vf_vlan);
-void mlx5_vlan_vmwa_acquire(struct rte_eth_dev *dev,
-			    struct mlx5_vf_vlan *vf_vlan);
-
 /* mlx5_flow_meter.c */
 
 int mlx5_flow_meter_ops_get(struct rte_eth_dev *dev, void *arg);
diff --git a/drivers/net/mlx5/mlx5_nl.c b/drivers/net/mlx5/mlx5_nl.c
index e7ba034..3fe4b6f 100644
--- a/drivers/net/mlx5/mlx5_nl.c
+++ b/drivers/net/mlx5/mlx5_nl.c
@@ -5,7 +5,6 @@
 
 #include <errno.h>
 #include <linux/if_link.h>
-#include <linux/netlink.h>
 #include <linux/rtnetlink.h>
 #include <net/if.h>
 #include <rdma/rdma_netlink.h>
@@ -18,8 +17,6 @@
 #include <unistd.h>
 
 #include <rte_errno.h>
-#include <rte_malloc.h>
-#include <rte_hypervisor.h>
 
 #include "mlx5.h"
 #include "mlx5_utils.h"
@@ -1072,7 +1069,8 @@ struct mlx5_nl_ifindex_data {
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
-mlx5_nl_switch_info(int nl, unsigned int ifindex, struct mlx5_switch_info *info)
+mlx5_nl_switch_info(int nl, unsigned int ifindex,
+		    struct mlx5_switch_info *info)
 {
 	uint32_t seq = random();
 	struct {
@@ -1116,12 +1114,12 @@ struct mlx5_nl_ifindex_data {
  * Delete VLAN network device by ifindex.
  *
  * @param[in] tcf
- *   Context object initialized by mlx5_vlan_vmwa_init().
+ *   Context object initialized by mlx5_nl_vlan_vmwa_init().
  * @param[in] ifindex
  *   Interface index of network device to delete.
  */
-static void
-mlx5_vlan_vmwa_delete(struct mlx5_vlan_vmwa_context *vmwa,
+void
+mlx5_nl_vlan_vmwa_delete(struct mlx5_nl_vlan_vmwa_context *vmwa,
 		      uint32_t ifindex)
 {
 	int ret;
@@ -1196,14 +1194,14 @@ struct mlx5_nl_ifindex_data {
  * Create network VLAN device with specified VLAN tag.
  *
  * @param[in] tcf
- *   Context object initialized by mlx5_vlan_vmwa_init().
+ *   Context object initialized by mlx5_nl_vlan_vmwa_init().
  * @param[in] ifindex
  *   Base network interface index.
  * @param[in] tag
  *   VLAN tag for VLAN network device to create.
  */
-static uint32_t
-mlx5_vlan_vmwa_create(struct mlx5_vlan_vmwa_context *vmwa,
+uint32_t
+mlx5_nl_vlan_vmwa_create(struct mlx5_nl_vlan_vmwa_context *vmwa,
 		      uint32_t ifindex,
 		      uint16_t tag)
 {
@@ -1269,134 +1267,3 @@ struct mlx5_nl_ifindex_data {
 	}
 	return ret;
 }
-
-/*
- * Release VLAN network device, created for VM workaround.
- *
- * @param[in] dev
- *   Ethernet device object, Netlink context provider.
- * @param[in] vlan
- *   Object representing the network device to release.
- */
-void mlx5_vlan_vmwa_release(struct rte_eth_dev *dev,
-			    struct mlx5_vf_vlan *vlan)
-{
-	struct mlx5_priv *priv = dev->data->dev_private;
-	struct mlx5_vlan_vmwa_context *vmwa = priv->vmwa_context;
-	struct mlx5_vlan_dev *vlan_dev = &vmwa->vlan_dev[0];
-
-	assert(vlan->created);
-	assert(priv->vmwa_context);
-	if (!vlan->created || !vmwa)
-		return;
-	vlan->created = 0;
-	assert(vlan_dev[vlan->tag].refcnt);
-	if (--vlan_dev[vlan->tag].refcnt == 0 &&
-	    vlan_dev[vlan->tag].ifindex) {
-		mlx5_vlan_vmwa_delete(vmwa, vlan_dev[vlan->tag].ifindex);
-		vlan_dev[vlan->tag].ifindex = 0;
-	}
-}
-
-/**
- * Acquire VLAN interface with specified tag for VM workaround.
- *
- * @param[in] dev
- *   Ethernet device object, Netlink context provider.
- * @param[in] vlan
- *   Object representing the network device to acquire.
- */
-void mlx5_vlan_vmwa_acquire(struct rte_eth_dev *dev,
-			    struct mlx5_vf_vlan *vlan)
-{
-	struct mlx5_priv *priv = dev->data->dev_private;
-	struct mlx5_vlan_vmwa_context *vmwa = priv->vmwa_context;
-	struct mlx5_vlan_dev *vlan_dev = &vmwa->vlan_dev[0];
-
-	assert(!vlan->created);
-	assert(priv->vmwa_context);
-	if (vlan->created || !vmwa)
-		return;
-	if (vlan_dev[vlan->tag].refcnt == 0) {
-		assert(!vlan_dev[vlan->tag].ifindex);
-		vlan_dev[vlan->tag].ifindex =
-			mlx5_vlan_vmwa_create(vmwa,
-					      vmwa->vf_ifindex,
-					      vlan->tag);
-	}
-	if (vlan_dev[vlan->tag].ifindex) {
-		vlan_dev[vlan->tag].refcnt++;
-		vlan->created = 1;
-	}
-}
-
-/*
- * Create per ethernet device VLAN VM workaround context
- */
-struct mlx5_vlan_vmwa_context *
-mlx5_vlan_vmwa_init(struct rte_eth_dev *dev,
-		    uint32_t ifindex)
-{
-	struct mlx5_priv *priv = dev->data->dev_private;
-	struct mlx5_dev_config *config = &priv->config;
-	struct mlx5_vlan_vmwa_context *vmwa;
-	enum rte_hypervisor hv_type;
-
-	/* Do not engage workaround over PF. */
-	if (!config->vf)
-		return NULL;
-	/* Check whether there is desired virtual environment */
-	hv_type = rte_hypervisor_get();
-	switch (hv_type) {
-	case RTE_HYPERVISOR_UNKNOWN:
-	case RTE_HYPERVISOR_VMWARE:
-		/*
-		 * The "white list" of configurations
-		 * to engage the workaround.
-		 */
-		break;
-	default:
-		/*
-		 * The configuration is not found in the "white list".
-		 * We should not engage the VLAN workaround.
-		 */
-		return NULL;
-	}
-	vmwa = rte_zmalloc(__func__, sizeof(*vmwa), sizeof(uint32_t));
-	if (!vmwa) {
-		DRV_LOG(WARNING,
-			"Can not allocate memory"
-			" for VLAN workaround context");
-		return NULL;
-	}
-	vmwa->nl_socket = mlx5_nl_init(NETLINK_ROUTE);
-	if (vmwa->nl_socket < 0) {
-		DRV_LOG(WARNING,
-			"Can not create Netlink socket"
-			" for VLAN workaround context");
-		rte_free(vmwa);
-		return NULL;
-	}
-	vmwa->nl_sn = random();
-	vmwa->vf_ifindex = ifindex;
-	vmwa->dev = dev;
-	/* Cleanup for existing VLAN devices. */
-	return vmwa;
-}
-
-/*
- * Destroy per ethernet device VLAN VM workaround context
- */
-void mlx5_vlan_vmwa_exit(struct mlx5_vlan_vmwa_context *vmwa)
-{
-	unsigned int i;
-
-	/* Delete all remaining VLAN devices. */
-	for (i = 0; i < RTE_DIM(vmwa->vlan_dev); i++) {
-		if (vmwa->vlan_dev[i].ifindex)
-			mlx5_vlan_vmwa_delete(vmwa, vmwa->vlan_dev[i].ifindex);
-	}
-	if (vmwa->nl_socket >= 0)
-		close(vmwa->nl_socket);
-	rte_free(vmwa);
-}
diff --git a/drivers/net/mlx5/mlx5_nl.h b/drivers/net/mlx5/mlx5_nl.h
new file mode 100644
index 0000000..7903673
--- /dev/null
+++ b/drivers/net/mlx5/mlx5_nl.h
@@ -0,0 +1,69 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2019 Mellanox Technologies, Ltd
+ */
+
+#ifndef RTE_PMD_MLX5_NL_H_
+#define RTE_PMD_MLX5_NL_H_
+
+#include <linux/netlink.h>
+
+
+/* Recognized Infiniband device physical port name types. */
+enum mlx5_nl_phys_port_name_type {
+	MLX5_PHYS_PORT_NAME_TYPE_NOTSET = 0, /* Not set. */
+	MLX5_PHYS_PORT_NAME_TYPE_LEGACY, /* kernel ver < 5.0 */
+	MLX5_PHYS_PORT_NAME_TYPE_UPLINK, /* p0, kernel ver >= 5.0 */
+	MLX5_PHYS_PORT_NAME_TYPE_PFVF, /* pf0vf0, kernel ver >= 5.0 */
+	MLX5_PHYS_PORT_NAME_TYPE_UNKNOWN, /* Unrecognized. */
+};
+
+/** Switch information returned by mlx5_nl_switch_info(). */
+struct mlx5_switch_info {
+	uint32_t master:1; /**< Master device. */
+	uint32_t representor:1; /**< Representor device. */
+	enum mlx5_nl_phys_port_name_type name_type; /**< Port name type. */
+	int32_t pf_num; /**< PF number (valid for pfxvfx format only). */
+	int32_t port_name; /**< Representor port name. */
+	uint64_t switch_id; /**< Switch identifier. */
+};
+
+/* VLAN netdev for VLAN workaround. */
+struct mlx5_nl_vlan_dev {
+	uint32_t refcnt;
+	uint32_t ifindex; /**< Own interface index. */
+};
+
+/*
+ * Array of VLAN devices created on the base of VF
+ * used for workaround in virtual environments.
+ */
+struct mlx5_nl_vlan_vmwa_context {
+	int nl_socket;
+	uint32_t nl_sn;
+	uint32_t vf_ifindex;
+	struct mlx5_nl_vlan_dev vlan_dev[4096];
+};
+
+
+int mlx5_nl_init(int protocol);
+int mlx5_nl_mac_addr_add(struct rte_eth_dev *dev, struct rte_ether_addr *mac,
+			 uint32_t index);
+int mlx5_nl_mac_addr_remove(struct rte_eth_dev *dev, struct rte_ether_addr *mac,
+			    uint32_t index);
+void mlx5_nl_mac_addr_sync(struct rte_eth_dev *dev);
+void mlx5_nl_mac_addr_flush(struct rte_eth_dev *dev);
+int mlx5_nl_promisc(struct rte_eth_dev *dev, int enable);
+int mlx5_nl_allmulti(struct rte_eth_dev *dev, int enable);
+unsigned int mlx5_nl_portnum(int nl, const char *name);
+unsigned int mlx5_nl_ifindex(int nl, const char *name, uint32_t pindex);
+int mlx5_nl_vf_mac_addr_modify(struct rte_eth_dev *dev,
+			       struct rte_ether_addr *mac, int vf_index);
+int mlx5_nl_switch_info(int nl, unsigned int ifindex,
+			struct mlx5_switch_info *info);
+
+void mlx5_nl_vlan_vmwa_delete(struct mlx5_nl_vlan_vmwa_context *vmwa,
+			   uint32_t ifindex);
+uint32_t mlx5_nl_vlan_vmwa_create(struct mlx5_nl_vlan_vmwa_context *vmwa,
+				  uint32_t ifindex, uint16_t tag);
+
+#endif /* RTE_PMD_MLX5_NL_H_ */
diff --git a/drivers/net/mlx5/mlx5_vlan.c b/drivers/net/mlx5/mlx5_vlan.c
index b0fa31a..fb52d8f 100644
--- a/drivers/net/mlx5/mlx5_vlan.c
+++ b/drivers/net/mlx5/mlx5_vlan.c
@@ -7,6 +7,8 @@
 #include <errno.h>
 #include <assert.h>
 #include <stdint.h>
+#include <unistd.h>
+
 
 /*
  * Not needed by this file; included to work around the lack of off_t
@@ -26,6 +28,8 @@
 
 #include <rte_ethdev_driver.h>
 #include <rte_common.h>
+#include <rte_malloc.h>
+#include <rte_hypervisor.h>
 
 #include <mlx5_glue.h>
 #include <mlx5_devx_cmds.h>
@@ -33,6 +37,7 @@
 #include "mlx5.h"
 #include "mlx5_autoconf.h"
 #include "mlx5_rxtx.h"
+#include "mlx5_nl.h"
 #include "mlx5_utils.h"
 
 /**
@@ -193,3 +198,132 @@
 	}
 	return 0;
 }
+
+/*
+ * Release VLAN network device, created for VM workaround.
+ *
+ * @param[in] dev
+ *   Ethernet device object, Netlink context provider.
+ * @param[in] vlan
+ *   Object representing the network device to release.
+ */
+void mlx5_vlan_vmwa_release(struct rte_eth_dev *dev,
+			    struct mlx5_vf_vlan *vlan)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_nl_vlan_vmwa_context *vmwa = priv->vmwa_context;
+	struct mlx5_nl_vlan_dev *vlan_dev = &vmwa->vlan_dev[0];
+
+	assert(vlan->created);
+	assert(priv->vmwa_context);
+	if (!vlan->created || !vmwa)
+		return;
+	vlan->created = 0;
+	assert(vlan_dev[vlan->tag].refcnt);
+	if (--vlan_dev[vlan->tag].refcnt == 0 &&
+	    vlan_dev[vlan->tag].ifindex) {
+		mlx5_nl_vlan_vmwa_delete(vmwa, vlan_dev[vlan->tag].ifindex);
+		vlan_dev[vlan->tag].ifindex = 0;
+	}
+}
+
+/**
+ * Acquire VLAN interface with specified tag for VM workaround.
+ *
+ * @param[in] dev
+ *   Ethernet device object, Netlink context provider.
+ * @param[in] vlan
+ *   Object representing the network device to acquire.
+ */
+void mlx5_vlan_vmwa_acquire(struct rte_eth_dev *dev,
+			    struct mlx5_vf_vlan *vlan)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_nl_vlan_vmwa_context *vmwa = priv->vmwa_context;
+	struct mlx5_nl_vlan_dev *vlan_dev = &vmwa->vlan_dev[0];
+
+	assert(!vlan->created);
+	assert(priv->vmwa_context);
+	if (vlan->created || !vmwa)
+		return;
+	if (vlan_dev[vlan->tag].refcnt == 0) {
+		assert(!vlan_dev[vlan->tag].ifindex);
+		vlan_dev[vlan->tag].ifindex =
+			mlx5_nl_vlan_vmwa_create(vmwa, vmwa->vf_ifindex,
+						 vlan->tag);
+	}
+	if (vlan_dev[vlan->tag].ifindex) {
+		vlan_dev[vlan->tag].refcnt++;
+		vlan->created = 1;
+	}
+}
+
+/*
+ * Create per ethernet device VLAN VM workaround context
+ */
+struct mlx5_nl_vlan_vmwa_context *
+mlx5_vlan_vmwa_init(struct rte_eth_dev *dev, uint32_t ifindex)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_dev_config *config = &priv->config;
+	struct mlx5_nl_vlan_vmwa_context *vmwa;
+	enum rte_hypervisor hv_type;
+
+	/* Do not engage workaround over PF. */
+	if (!config->vf)
+		return NULL;
+	/* Check whether there is desired virtual environment */
+	hv_type = rte_hypervisor_get();
+	switch (hv_type) {
+	case RTE_HYPERVISOR_UNKNOWN:
+	case RTE_HYPERVISOR_VMWARE:
+		/*
+		 * The "white list" of configurations
+		 * to engage the workaround.
+		 */
+		break;
+	default:
+		/*
+		 * The configuration is not found in the "white list".
+		 * We should not engage the VLAN workaround.
+		 */
+		return NULL;
+	}
+	vmwa = rte_zmalloc(__func__, sizeof(*vmwa), sizeof(uint32_t));
+	if (!vmwa) {
+		DRV_LOG(WARNING,
+			"Can not allocate memory"
+			" for VLAN workaround context");
+		return NULL;
+	}
+	vmwa->nl_socket = mlx5_nl_init(NETLINK_ROUTE);
+	if (vmwa->nl_socket < 0) {
+		DRV_LOG(WARNING,
+			"Can not create Netlink socket"
+			" for VLAN workaround context");
+		rte_free(vmwa);
+		return NULL;
+	}
+	vmwa->nl_sn = random();
+	vmwa->vf_ifindex = ifindex;
+	/* Cleanup for existing VLAN devices. */
+	return vmwa;
+}
+
+/*
+ * Destroy per ethernet device VLAN VM workaround context
+ */
+void mlx5_vlan_vmwa_exit(struct mlx5_nl_vlan_vmwa_context *vmwa)
+{
+	unsigned int i;
+
+	/* Delete all remaining VLAN devices. */
+	for (i = 0; i < RTE_DIM(vmwa->vlan_dev); i++) {
+		if (vmwa->vlan_dev[i].ifindex)
+			mlx5_nl_vlan_vmwa_delete(vmwa,
+						 vmwa->vlan_dev[i].ifindex);
+	}
+	if (vmwa->nl_socket >= 0)
+		close(vmwa->nl_socket);
+	rte_free(vmwa);
+}
-- 
1.8.3.1
* [dpdk-dev] [PATCH v1 35/38] net/mlx5: reduce Netlink commands dependencies
  2020-01-20 17:02 [dpdk-dev] [PATCH v1 00/38] Introduce mlx5 vDPA driver Matan Azrad
                   ` (33 preceding siblings ...)
  2020-01-20 17:03 ` [dpdk-dev] [PATCH v1 34/38] net/mlx5: separate Netlink commands interface Matan Azrad
@ 2020-01-20 17:03 ` Matan Azrad
  2020-01-20 17:03 ` [dpdk-dev] [PATCH v1 36/38] mlx5: share Netlink commands Matan Azrad
                   ` (4 subsequent siblings)
  39 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-20 17:03 UTC (permalink / raw)
  To: dev; +Cc: Maxime Coquelin, Thomas Monjalon
In preparation for moving the Netlink commands to the common library,
reduce their net/mlx5 dependencies.
Replace the ethdev class command parameters with the Netlink socket
descriptor and the interface index.
Generate Netlink sequence numbers inside the mlx5 Netlink code instead
of keeping a counter in the device private data.
Move mlx5_nl_check_switch_info to mlx5_nl.c since that file is its only
user.
Signed-off-by: Matan Azrad <matan@mellanox.com>
---
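The sequence number change can be sketched outside DPDK as follows. This is
a minimal illustration rather than the patch's code: it assumes C11
<stdatomic.h> in place of rte_atomic32_t, and the names nl_sn_counter and
nl_sn_generate are hypothetical stand-ins for atomic_sn and
MLX5_NL_SN_GENERATE.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Process-wide counter; hypothetical stand-in for the patch's atomic_sn. */
static _Atomic uint32_t nl_sn_counter;

/*
 * Return the next sequence number, i.e. the counter value after the
 * increment, which mirrors what rte_atomic32_add_return(&atomic_sn, 1)
 * yields in the patch.
 */
static uint32_t
nl_sn_generate(void)
{
	return (uint32_t)(atomic_fetch_add(&nl_sn_counter, 1) + 1u);
}
```

Since the counter is shared by the whole process, a caller no longer needs
a per-device nl_sn field: it takes one fresh number per request and hands
the same value to both the send and the receive helper.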
 drivers/net/mlx5/mlx5.c        |  10 +-
 drivers/net/mlx5/mlx5.h        |   3 -
 drivers/net/mlx5/mlx5_ethdev.c |  49 ------
 drivers/net/mlx5/mlx5_mac.c    |  14 +-
 drivers/net/mlx5/mlx5_nl.c     | 329 +++++++++++++++++++++++++----------------
 drivers/net/mlx5/mlx5_nl.h     |  23 +--
 drivers/net/mlx5/mlx5_rxmode.c |  12 +-
 drivers/net/mlx5/mlx5_vlan.c   |   1 -
 8 files changed, 236 insertions(+), 205 deletions(-)
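The parameter replacement follows one pattern throughout: a command takes
the Netlink socket descriptor, the interface index and, where needed, a
caller-owned ownership bitmap, instead of reaching into the ethdev private
data. The sketch below illustrates that shape only. All names in it
(own_set, own_reset, own_isset, fake_mac_modify, sketch_mac_addr_add,
sketch_mac_addr_remove) are hypothetical, and fake_mac_modify merely
pretends to issue the Netlink request.

```c
#include <limits.h>
#include <stdint.h>

/* Bits per word of the caller-owned ownership bitmap. */
#define OWN_WORD_BITS (sizeof(uint64_t) * CHAR_BIT)

static void
own_set(uint64_t *own, uint32_t index)
{
	own[index / OWN_WORD_BITS] |= UINT64_C(1) << (index % OWN_WORD_BITS);
}

static void
own_reset(uint64_t *own, uint32_t index)
{
	own[index / OWN_WORD_BITS] &= ~(UINT64_C(1) << (index % OWN_WORD_BITS));
}

static int
own_isset(const uint64_t *own, uint32_t index)
{
	return (int)((own[index / OWN_WORD_BITS] >> (index % OWN_WORD_BITS)) & 1);
}

/* Hypothetical stand-in for the Netlink request: fails only on a bad fd. */
static int
fake_mac_modify(int nlsk_fd, unsigned int iface_idx, int add)
{
	(void)iface_idx;
	(void)add;
	return nlsk_fd < 0 ? -1 : 0;
}

/* Add: record ownership only when the request succeeded. */
static int
sketch_mac_addr_add(int nlsk_fd, unsigned int iface_idx,
		    uint64_t *mac_own, uint32_t index)
{
	int ret = fake_mac_modify(nlsk_fd, iface_idx, 1);

	if (!ret)
		own_set(mac_own, index);
	return ret;
}

/* Remove: clear ownership first, then issue the request. */
static int
sketch_mac_addr_remove(int nlsk_fd, unsigned int iface_idx,
		       uint64_t *mac_own, uint32_t index)
{
	own_reset(mac_own, index);
	return fake_mac_modify(nlsk_fd, iface_idx, 0);
}
```

With this shape the helpers need no knowledge of struct rte_eth_dev or
struct mlx5_priv, which is what would let a non-ethdev driver call them
once they live in a common library.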
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 27cea8b..67daa43 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1266,7 +1266,9 @@ struct mlx5_flow_id_pool *
 	if (priv->reta_idx != NULL)
 		rte_free(priv->reta_idx);
 	if (priv->config.vf)
-		mlx5_nl_mac_addr_flush(dev);
+		mlx5_nl_mac_addr_flush(priv->nl_socket_route, mlx5_ifindex(dev),
+				       dev->data->mac_addrs,
+				       MLX5_MAX_MAC_ADDRESSES, priv->mac_own);
 	if (priv->nl_socket_route >= 0)
 		close(priv->nl_socket_route);
 	if (priv->nl_socket_rdma >= 0)
@@ -2323,7 +2325,6 @@ struct mlx5_flow_id_pool *
 	/* Some internal functions rely on Netlink sockets, open them now. */
 	priv->nl_socket_rdma = mlx5_nl_init(NETLINK_RDMA);
 	priv->nl_socket_route =	mlx5_nl_init(NETLINK_ROUTE);
-	priv->nl_sn = 0;
 	priv->representor = !!switch_info->representor;
 	priv->master = !!switch_info->master;
 	priv->domain_id = RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID;
@@ -2641,7 +2642,10 @@ struct mlx5_flow_id_pool *
 	/* Register MAC address. */
 	claim_zero(mlx5_mac_addr_add(eth_dev, &mac, 0, 0));
 	if (config.vf && config.vf_nl_en)
-		mlx5_nl_mac_addr_sync(eth_dev);
+		mlx5_nl_mac_addr_sync(priv->nl_socket_route,
+				      mlx5_ifindex(eth_dev),
+				      eth_dev->data->mac_addrs,
+				      MLX5_MAX_MAC_ADDRESSES);
 	TAILQ_INIT(&priv->flows);
 	TAILQ_INIT(&priv->ctrl_flows);
 	TAILQ_INIT(&priv->flow_meters);
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 5cfcf99..5d25d8b 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -537,7 +537,6 @@ struct mlx5_priv {
 	/* Context for Verbs allocator. */
 	int nl_socket_rdma; /* Netlink socket (NETLINK_RDMA). */
 	int nl_socket_route; /* Netlink socket (NETLINK_ROUTE). */
-	uint32_t nl_sn; /* Netlink message sequence number. */
 	LIST_HEAD(dbrpage, mlx5_devx_dbr_page) dbrpgs; /* Door-bell pages. */
 	struct mlx5_nl_vlan_vmwa_context *vmwa_context; /* VLAN WA context. */
 	struct mlx5_flow_id_pool *qrss_id_pool;
@@ -615,8 +614,6 @@ int mlx5_sysfs_switch_info(unsigned int ifindex,
 			   struct mlx5_switch_info *info);
 void mlx5_sysfs_check_switch_info(bool device_dir,
 				  struct mlx5_switch_info *switch_info);
-void mlx5_nl_check_switch_info(bool nun_vf_set,
-			       struct mlx5_switch_info *switch_info);
 void mlx5_translate_port_name(const char *port_name_in,
 			      struct mlx5_switch_info *port_info_out);
 void mlx5_intr_callback_unregister(const struct rte_intr_handle *handle,
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 2628e64..5484104 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -1891,55 +1891,6 @@ struct mlx5_priv *
 }
 
 /**
- * Analyze gathered port parameters via Netlink to recognize master
- * and representor devices for E-Switch configuration.
- *
- * @param[in] num_vf_set
- *   flag of presence of number of VFs port attribute.
- * @param[inout] switch_info
- *   Port information, including port name as a number and port name
- *   type if recognized
- *
- * @return
- *   master and representor flags are set in switch_info according to
- *   recognized parameters (if any).
- */
-void
-mlx5_nl_check_switch_info(bool num_vf_set,
-			  struct mlx5_switch_info *switch_info)
-{
-	switch (switch_info->name_type) {
-	case MLX5_PHYS_PORT_NAME_TYPE_UNKNOWN:
-		/*
-		 * Name is not recognized, assume the master,
-		 * check the number of VFs key presence.
-		 */
-		switch_info->master = num_vf_set;
-		break;
-	case MLX5_PHYS_PORT_NAME_TYPE_NOTSET:
-		/*
-		 * Name is not set, this assumes the legacy naming
-		 * schema for master, just check if there is a
-		 * number of VFs key.
-		 */
-		switch_info->master = num_vf_set;
-		break;
-	case MLX5_PHYS_PORT_NAME_TYPE_UPLINK:
-		/* New uplink naming schema recognized. */
-		switch_info->master = 1;
-		break;
-	case MLX5_PHYS_PORT_NAME_TYPE_LEGACY:
-		/* Legacy representors naming schema. */
-		switch_info->representor = !num_vf_set;
-		break;
-	case MLX5_PHYS_PORT_NAME_TYPE_PFVF:
-		/* New representors naming schema. */
-		switch_info->representor = 1;
-		break;
-	}
-}
-
-/**
  * Analyze gathered port parameters via sysfs to recognize master
  * and representor devices for E-Switch configuration.
  *
diff --git a/drivers/net/mlx5/mlx5_mac.c b/drivers/net/mlx5/mlx5_mac.c
index a646b90..0ab2a0e 100644
--- a/drivers/net/mlx5/mlx5_mac.c
+++ b/drivers/net/mlx5/mlx5_mac.c
@@ -74,8 +74,9 @@
 	if (rte_is_zero_ether_addr(&dev->data->mac_addrs[index]))
 		return;
 	if (vf)
-		mlx5_nl_mac_addr_remove(dev, &dev->data->mac_addrs[index],
-					index);
+		mlx5_nl_mac_addr_remove(priv->nl_socket_route,
+					mlx5_ifindex(dev), priv->mac_own,
+					&dev->data->mac_addrs[index], index);
 	memset(&dev->data->mac_addrs[index], 0, sizeof(struct rte_ether_addr));
 }
 
@@ -117,7 +118,9 @@
 		return -rte_errno;
 	}
 	if (vf) {
-		int ret = mlx5_nl_mac_addr_add(dev, mac, index);
+		int ret = mlx5_nl_mac_addr_add(priv->nl_socket_route,
+					       mlx5_ifindex(dev), priv->mac_own,
+					       mac, index);
 
 		if (ret)
 			return ret;
@@ -209,8 +212,9 @@
 			if (priv->master == 1) {
 				priv = dev->data->dev_private;
 				return mlx5_nl_vf_mac_addr_modify
-					(&rte_eth_devices[port_id],
-					 mac_addr, priv->representor_id);
+				       (priv->nl_socket_route,
+					mlx5_ifindex(&rte_eth_devices[port_id]),
+					mac_addr, priv->representor_id);
 			}
 		}
 		rte_errno = -ENOTSUP;
diff --git a/drivers/net/mlx5/mlx5_nl.c b/drivers/net/mlx5/mlx5_nl.c
index 3fe4b6f..6b8ca00 100644
--- a/drivers/net/mlx5/mlx5_nl.c
+++ b/drivers/net/mlx5/mlx5_nl.c
@@ -17,8 +17,11 @@
 #include <unistd.h>
 
 #include <rte_errno.h>
+#include <rte_atomic.h>
+#include <rte_ether.h>
 
 #include "mlx5.h"
+#include "mlx5_nl.h"
 #include "mlx5_utils.h"
 
 /* Size of the buffer to receive kernel messages */
@@ -109,6 +112,11 @@ struct mlx5_nl_ifindex_data {
 	uint32_t portnum; /**< IB device max port number (out). */
 };
 
+rte_atomic32_t atomic_sn = RTE_ATOMIC32_INIT(0);
+
+/* Generate Netlink sequence number. */
+#define MLX5_NL_SN_GENERATE ((uint32_t)rte_atomic32_add_return(&atomic_sn, 1))
+
 /**
  * Opens a Netlink socket.
  *
@@ -369,8 +377,10 @@ struct mlx5_nl_ifindex_data {
 /**
  * Get bridge MAC addresses.
  *
- * @param dev
- *   Pointer to Ethernet device.
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
  * @param mac[out]
  *   Pointer to the array table of MAC addresses to fill.
  *   Its size should be of MLX5_MAX_MAC_ADDRESSES.
@@ -381,11 +391,9 @@ struct mlx5_nl_ifindex_data {
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-mlx5_nl_mac_addr_list(struct rte_eth_dev *dev, struct rte_ether_addr (*mac)[],
-		      int *mac_n)
+mlx5_nl_mac_addr_list(int nlsk_fd, unsigned int iface_idx,
+		      struct rte_ether_addr (*mac)[], int *mac_n)
 {
-	struct mlx5_priv *priv = dev->data->dev_private;
-	unsigned int iface_idx = mlx5_ifindex(dev);
 	struct {
 		struct nlmsghdr	hdr;
 		struct ifinfomsg ifm;
@@ -404,33 +412,33 @@ struct mlx5_nl_ifindex_data {
 		.mac = mac,
 		.mac_n = 0,
 	};
-	int fd;
+	uint32_t sn = MLX5_NL_SN_GENERATE;
 	int ret;
-	uint32_t sn = priv->nl_sn++;
 
-	if (priv->nl_socket_route == -1)
+	if (nlsk_fd == -1)
 		return 0;
-	fd = priv->nl_socket_route;
-	ret = mlx5_nl_request(fd, &req.hdr, sn, &req.ifm,
+	ret = mlx5_nl_request(nlsk_fd, &req.hdr, sn, &req.ifm,
 			      sizeof(struct ifinfomsg));
 	if (ret < 0)
 		goto error;
-	ret = mlx5_nl_recv(fd, sn, mlx5_nl_mac_addr_cb, &data);
+	ret = mlx5_nl_recv(nlsk_fd, sn, mlx5_nl_mac_addr_cb, &data);
 	if (ret < 0)
 		goto error;
 	*mac_n = data.mac_n;
 	return 0;
 error:
-	DRV_LOG(DEBUG, "port %u cannot retrieve MAC address list %s",
-		dev->data->port_id, strerror(rte_errno));
+	DRV_LOG(DEBUG, "Interface %u cannot retrieve MAC address list %s",
+		iface_idx, strerror(rte_errno));
 	return -rte_errno;
 }
 
 /**
  * Modify the MAC address neighbour table with Netlink.
  *
- * @param dev
- *   Pointer to Ethernet device.
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
  * @param mac
  *   MAC address to consider.
  * @param add
@@ -440,11 +448,9 @@ struct mlx5_nl_ifindex_data {
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-mlx5_nl_mac_addr_modify(struct rte_eth_dev *dev, struct rte_ether_addr *mac,
-			int add)
+mlx5_nl_mac_addr_modify(int nlsk_fd, unsigned int iface_idx,
+			struct rte_ether_addr *mac, int add)
 {
-	struct mlx5_priv *priv = dev->data->dev_private;
-	unsigned int iface_idx = mlx5_ifindex(dev);
 	struct {
 		struct nlmsghdr hdr;
 		struct ndmsg ndm;
@@ -468,28 +474,26 @@ struct mlx5_nl_ifindex_data {
 			.rta_len = RTA_LENGTH(RTE_ETHER_ADDR_LEN),
 		},
 	};
-	int fd;
+	uint32_t sn = MLX5_NL_SN_GENERATE;
 	int ret;
-	uint32_t sn = priv->nl_sn++;
 
-	if (priv->nl_socket_route == -1)
+	if (nlsk_fd == -1)
 		return 0;
-	fd = priv->nl_socket_route;
 	memcpy(RTA_DATA(&req.rta), mac, RTE_ETHER_ADDR_LEN);
 	req.hdr.nlmsg_len = NLMSG_ALIGN(req.hdr.nlmsg_len) +
 		RTA_ALIGN(req.rta.rta_len);
-	ret = mlx5_nl_send(fd, &req.hdr, sn);
+	ret = mlx5_nl_send(nlsk_fd, &req.hdr, sn);
 	if (ret < 0)
 		goto error;
-	ret = mlx5_nl_recv(fd, sn, NULL, NULL);
+	ret = mlx5_nl_recv(nlsk_fd, sn, NULL, NULL);
 	if (ret < 0)
 		goto error;
 	return 0;
 error:
 	DRV_LOG(DEBUG,
-		"port %u cannot %s MAC address %02X:%02X:%02X:%02X:%02X:%02X"
-		" %s",
-		dev->data->port_id,
+		"Interface %u cannot %s MAC address"
+		" %02X:%02X:%02X:%02X:%02X:%02X %s",
+		iface_idx,
 		add ? "add" : "remove",
 		mac->addr_bytes[0], mac->addr_bytes[1],
 		mac->addr_bytes[2], mac->addr_bytes[3],
@@ -501,8 +505,10 @@ struct mlx5_nl_ifindex_data {
 /**
  * Modify the VF MAC address neighbour table with Netlink.
  *
- * @param dev
- *    Pointer to Ethernet device.
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
  * @param mac
  *    MAC address to consider.
  * @param vf_index
@@ -512,12 +518,10 @@ struct mlx5_nl_ifindex_data {
  *    0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
-mlx5_nl_vf_mac_addr_modify(struct rte_eth_dev *dev,
+mlx5_nl_vf_mac_addr_modify(int nlsk_fd, unsigned int iface_idx,
 			   struct rte_ether_addr *mac, int vf_index)
 {
-	int fd, ret;
-	struct mlx5_priv *priv = dev->data->dev_private;
-	unsigned int iface_idx = mlx5_ifindex(dev);
+	int ret;
 	struct {
 		struct nlmsghdr hdr;
 		struct ifinfomsg ifm;
@@ -546,10 +550,10 @@ struct mlx5_nl_ifindex_data {
 			.rta_type = IFLA_VF_MAC,
 		},
 	};
-	uint32_t sn = priv->nl_sn++;
 	struct ifla_vf_mac ivm = {
 		.vf = vf_index,
 	};
+	uint32_t sn = MLX5_NL_SN_GENERATE;
 
 	memcpy(&ivm.mac, mac, RTE_ETHER_ADDR_LEN);
 	memcpy(RTA_DATA(&req.vf_mac_rta), &ivm, sizeof(ivm));
@@ -564,13 +568,12 @@ struct mlx5_nl_ifindex_data {
 	req.vf_info_rta.rta_len = RTE_PTR_DIFF(NLMSG_TAIL(&req.hdr),
 					       &req.vf_info_rta);
 
-	fd = priv->nl_socket_route;
-	if (fd < 0)
+	if (nlsk_fd < 0)
 		return -1;
-	ret = mlx5_nl_send(fd, &req.hdr, sn);
+	ret = mlx5_nl_send(nlsk_fd, &req.hdr, sn);
 	if (ret < 0)
 		goto error;
-	ret = mlx5_nl_recv(fd, sn, NULL, NULL);
+	ret = mlx5_nl_recv(nlsk_fd, sn, NULL, NULL);
 	if (ret < 0)
 		goto error;
 	return 0;
@@ -589,8 +592,12 @@ struct mlx5_nl_ifindex_data {
 /**
  * Add a MAC address.
  *
- * @param dev
- *   Pointer to Ethernet device.
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
+ * @param mac_own
+ *   BITFIELD_DECLARE array to store the mac.
  * @param mac
  *   MAC address to register.
  * @param index
@@ -600,15 +607,15 @@ struct mlx5_nl_ifindex_data {
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
-mlx5_nl_mac_addr_add(struct rte_eth_dev *dev, struct rte_ether_addr *mac,
+mlx5_nl_mac_addr_add(int nlsk_fd, unsigned int iface_idx,
+		     uint64_t *mac_own, struct rte_ether_addr *mac,
 		     uint32_t index)
 {
-	struct mlx5_priv *priv = dev->data->dev_private;
 	int ret;
 
-	ret = mlx5_nl_mac_addr_modify(dev, mac, 1);
+	ret = mlx5_nl_mac_addr_modify(nlsk_fd, iface_idx, mac, 1);
 	if (!ret)
-		BITFIELD_SET(priv->mac_own, index);
+		BITFIELD_SET(mac_own, index);
 	if (ret == -EEXIST)
 		return 0;
 	return ret;
@@ -617,8 +624,12 @@ struct mlx5_nl_ifindex_data {
 /**
  * Remove a MAC address.
  *
- * @param dev
- *   Pointer to Ethernet device.
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
+ * @param mac_own
+ *   BITFIELD_DECLARE array to store the mac.
  * @param mac
  *   MAC address to remove.
  * @param index
@@ -628,46 +639,50 @@ struct mlx5_nl_ifindex_data {
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
-mlx5_nl_mac_addr_remove(struct rte_eth_dev *dev, struct rte_ether_addr *mac,
-			uint32_t index)
+mlx5_nl_mac_addr_remove(int nlsk_fd, unsigned int iface_idx, uint64_t *mac_own,
+			struct rte_ether_addr *mac, uint32_t index)
 {
-	struct mlx5_priv *priv = dev->data->dev_private;
-
-	BITFIELD_RESET(priv->mac_own, index);
-	return mlx5_nl_mac_addr_modify(dev, mac, 0);
+	BITFIELD_RESET(mac_own, index);
+	return mlx5_nl_mac_addr_modify(nlsk_fd, iface_idx, mac, 0);
 }
 
 /**
  * Synchronize Netlink bridge table to the internal table.
  *
- * @param dev
- *   Pointer to Ethernet device.
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
+ * @param mac_addrs
+ *   MAC addresses array to sync.
+ * @param n
+ *   @p mac_addrs array size.
  */
 void
-mlx5_nl_mac_addr_sync(struct rte_eth_dev *dev)
+mlx5_nl_mac_addr_sync(int nlsk_fd, unsigned int iface_idx,
+		      struct rte_ether_addr *mac_addrs, int n)
 {
-	struct rte_ether_addr macs[MLX5_MAX_MAC_ADDRESSES];
+	struct rte_ether_addr macs[n];
 	int macs_n = 0;
 	int i;
 	int ret;
 
-	ret = mlx5_nl_mac_addr_list(dev, &macs, &macs_n);
+	ret = mlx5_nl_mac_addr_list(nlsk_fd, iface_idx, &macs, &macs_n);
 	if (ret)
 		return;
 	for (i = 0; i != macs_n; ++i) {
 		int j;
 
 		/* Verify the address is not in the array yet. */
-		for (j = 0; j != MLX5_MAX_MAC_ADDRESSES; ++j)
-			if (rte_is_same_ether_addr(&macs[i],
-					       &dev->data->mac_addrs[j]))
+		for (j = 0; j != n; ++j)
+			if (rte_is_same_ether_addr(&macs[i], &mac_addrs[j]))
 				break;
-		if (j != MLX5_MAX_MAC_ADDRESSES)
+		if (j != n)
 			continue;
 		/* Find the first entry available. */
-		for (j = 0; j != MLX5_MAX_MAC_ADDRESSES; ++j) {
-			if (rte_is_zero_ether_addr(&dev->data->mac_addrs[j])) {
-				dev->data->mac_addrs[j] = macs[i];
+		for (j = 0; j != n; ++j) {
+			if (rte_is_zero_ether_addr(&mac_addrs[j])) {
+				mac_addrs[j] = macs[i];
 				break;
 			}
 		}
@@ -677,28 +692,40 @@ struct mlx5_nl_ifindex_data {
 /**
  * Flush all added MAC addresses.
  *
- * @param dev
- *   Pointer to Ethernet device.
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
+ * @param[in] mac_addrs
+ *   MAC addresses array to flush.
+ * @param n
+ *   @p mac_addrs array size.
+ * @param mac_own
+ *   BITFIELD_DECLARE array to store the mac.
  */
 void
-mlx5_nl_mac_addr_flush(struct rte_eth_dev *dev)
+mlx5_nl_mac_addr_flush(int nlsk_fd, unsigned int iface_idx,
+		       struct rte_ether_addr *mac_addrs, int n,
+		       uint64_t *mac_own)
 {
-	struct mlx5_priv *priv = dev->data->dev_private;
 	int i;
 
-	for (i = MLX5_MAX_MAC_ADDRESSES - 1; i >= 0; --i) {
-		struct rte_ether_addr *m = &dev->data->mac_addrs[i];
+	for (i = n - 1; i >= 0; --i) {
+		struct rte_ether_addr *m = &mac_addrs[i];
 
-		if (BITFIELD_ISSET(priv->mac_own, i))
-			mlx5_nl_mac_addr_remove(dev, m, i);
+		if (BITFIELD_ISSET(mac_own, i))
+			mlx5_nl_mac_addr_remove(nlsk_fd, iface_idx, mac_own, m,
+						i);
 	}
 }
 
 /**
  * Enable promiscuous / all multicast mode through Netlink.
  *
- * @param dev
- *   Pointer to Ethernet device structure.
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
  * @param flags
  *   IFF_PROMISC for promiscuous, IFF_ALLMULTI for allmulti.
  * @param enable
@@ -708,10 +735,9 @@ struct mlx5_nl_ifindex_data {
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-mlx5_nl_device_flags(struct rte_eth_dev *dev, uint32_t flags, int enable)
+mlx5_nl_device_flags(int nlsk_fd, unsigned int iface_idx, uint32_t flags,
+		     int enable)
 {
-	struct mlx5_priv *priv = dev->data->dev_private;
-	unsigned int iface_idx = mlx5_ifindex(dev);
 	struct {
 		struct nlmsghdr hdr;
 		struct ifinfomsg ifi;
@@ -727,14 +753,13 @@ struct mlx5_nl_ifindex_data {
 			.ifi_index = iface_idx,
 		},
 	};
-	int fd;
+	uint32_t sn = MLX5_NL_SN_GENERATE;
 	int ret;
 
 	assert(!(flags & ~(IFF_PROMISC | IFF_ALLMULTI)));
-	if (priv->nl_socket_route < 0)
+	if (nlsk_fd < 0)
 		return 0;
-	fd = priv->nl_socket_route;
-	ret = mlx5_nl_send(fd, &req.hdr, priv->nl_sn++);
+	ret = mlx5_nl_send(nlsk_fd, &req.hdr, sn);
 	if (ret < 0)
 		return ret;
 	return 0;
@@ -743,8 +768,10 @@ struct mlx5_nl_ifindex_data {
 /**
  * Enable promiscuous mode through Netlink.
  *
- * @param dev
- *   Pointer to Ethernet device structure.
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
  * @param enable
  *   Nonzero to enable, disable otherwise.
  *
@@ -752,14 +779,14 @@ struct mlx5_nl_ifindex_data {
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
-mlx5_nl_promisc(struct rte_eth_dev *dev, int enable)
+mlx5_nl_promisc(int nlsk_fd, unsigned int iface_idx, int enable)
 {
-	int ret = mlx5_nl_device_flags(dev, IFF_PROMISC, enable);
+	int ret = mlx5_nl_device_flags(nlsk_fd, iface_idx, IFF_PROMISC, enable);
 
 	if (ret)
 		DRV_LOG(DEBUG,
-			"port %u cannot %s promisc mode: Netlink error %s",
-			dev->data->port_id, enable ? "enable" : "disable",
+			"Interface %u cannot %s promisc mode: Netlink error %s",
+			iface_idx, enable ? "enable" : "disable",
 			strerror(rte_errno));
 	return ret;
 }
@@ -767,8 +794,10 @@ struct mlx5_nl_ifindex_data {
 /**
  * Enable all multicast mode through Netlink.
  *
- * @param dev
- *   Pointer to Ethernet device structure.
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
  * @param enable
  *   Nonzero to enable, disable otherwise.
  *
@@ -776,14 +805,15 @@ struct mlx5_nl_ifindex_data {
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
-mlx5_nl_allmulti(struct rte_eth_dev *dev, int enable)
+mlx5_nl_allmulti(int nlsk_fd, unsigned int iface_idx, int enable)
 {
-	int ret = mlx5_nl_device_flags(dev, IFF_ALLMULTI, enable);
+	int ret = mlx5_nl_device_flags(nlsk_fd, iface_idx, IFF_ALLMULTI,
+				       enable);
 
 	if (ret)
 		DRV_LOG(DEBUG,
-			"port %u cannot %s allmulti mode: Netlink error %s",
-			dev->data->port_id, enable ? "enable" : "disable",
+			"Interface %u cannot %s allmulti: Netlink error %s",
+			iface_idx, enable ? "enable" : "disable",
 			strerror(rte_errno));
 	return ret;
 }
@@ -879,7 +909,6 @@ struct mlx5_nl_ifindex_data {
 unsigned int
 mlx5_nl_ifindex(int nl, const char *name, uint32_t pindex)
 {
-	uint32_t seq = random();
 	struct mlx5_nl_ifindex_data data = {
 		.name = name,
 		.flags = 0,
@@ -900,19 +929,20 @@ struct mlx5_nl_ifindex_data {
 		},
 	};
 	struct nlattr *na;
+	uint32_t sn = MLX5_NL_SN_GENERATE;
 	int ret;
 
-	ret = mlx5_nl_send(nl, &req.nh, seq);
+	ret = mlx5_nl_send(nl, &req.nh, sn);
 	if (ret < 0)
 		return 0;
-	ret = mlx5_nl_recv(nl, seq, mlx5_nl_cmdget_cb, &data);
+	ret = mlx5_nl_recv(nl, sn, mlx5_nl_cmdget_cb, &data);
 	if (ret < 0)
 		return 0;
 	if (!(data.flags & MLX5_NL_CMD_GET_IB_NAME) ||
 	    !(data.flags & MLX5_NL_CMD_GET_IB_INDEX))
 		goto error;
 	data.flags = 0;
-	++seq;
+	sn = MLX5_NL_SN_GENERATE;
 	req.nh.nlmsg_type = RDMA_NL_GET_TYPE(RDMA_NL_NLDEV,
 					     RDMA_NLDEV_CMD_PORT_GET);
 	req.nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
@@ -927,10 +957,10 @@ struct mlx5_nl_ifindex_data {
 	na->nla_type = RDMA_NLDEV_ATTR_PORT_INDEX;
 	memcpy((void *)((uintptr_t)na + NLA_HDRLEN),
 	       &pindex, sizeof(pindex));
-	ret = mlx5_nl_send(nl, &req.nh, seq);
+	ret = mlx5_nl_send(nl, &req.nh, sn);
 	if (ret < 0)
 		return 0;
-	ret = mlx5_nl_recv(nl, seq, mlx5_nl_cmdget_cb, &data);
+	ret = mlx5_nl_recv(nl, sn, mlx5_nl_cmdget_cb, &data);
 	if (ret < 0)
 		return 0;
 	if (!(data.flags & MLX5_NL_CMD_GET_IB_NAME) ||
@@ -959,7 +989,6 @@ struct mlx5_nl_ifindex_data {
 unsigned int
 mlx5_nl_portnum(int nl, const char *name)
 {
-	uint32_t seq = random();
 	struct mlx5_nl_ifindex_data data = {
 		.flags = 0,
 		.name = name,
@@ -972,12 +1001,13 @@ struct mlx5_nl_ifindex_data {
 					       RDMA_NLDEV_CMD_GET),
 		.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK | NLM_F_DUMP,
 	};
+	uint32_t sn = MLX5_NL_SN_GENERATE;
 	int ret;
 
-	ret = mlx5_nl_send(nl, &req, seq);
+	ret = mlx5_nl_send(nl, &req, sn);
 	if (ret < 0)
 		return 0;
-	ret = mlx5_nl_recv(nl, seq, mlx5_nl_cmdget_cb, &data);
+	ret = mlx5_nl_recv(nl, sn, mlx5_nl_cmdget_cb, &data);
 	if (ret < 0)
 		return 0;
 	if (!(data.flags & MLX5_NL_CMD_GET_IB_NAME) ||
@@ -992,6 +1022,55 @@ struct mlx5_nl_ifindex_data {
 }
 
 /**
+ * Analyze gathered port parameters via Netlink to recognize master
+ * and representor devices for E-Switch configuration.
+ *
+ * @param[in] num_vf_set
+ *   flag of presence of number of VFs port attribute.
+ * @param[inout] switch_info
+ *   Port information, including port name as a number and port name
+ *   type if recognized
+ *
+ * @return
+ *   master and representor flags are set in switch_info according to
+ *   recognized parameters (if any).
+ */
+static void
+mlx5_nl_check_switch_info(bool num_vf_set,
+			  struct mlx5_switch_info *switch_info)
+{
+	switch (switch_info->name_type) {
+	case MLX5_PHYS_PORT_NAME_TYPE_UNKNOWN:
+		/*
+		 * Name is not recognized, assume the master,
+		 * check the number of VFs key presence.
+		 */
+		switch_info->master = num_vf_set;
+		break;
+	case MLX5_PHYS_PORT_NAME_TYPE_NOTSET:
+		/*
+		 * Name is not set, this assumes the legacy naming
+		 * schema for master, just check if there is a
+		 * number of VFs key.
+		 */
+		switch_info->master = num_vf_set;
+		break;
+	case MLX5_PHYS_PORT_NAME_TYPE_UPLINK:
+		/* New uplink naming schema recognized. */
+		switch_info->master = 1;
+		break;
+	case MLX5_PHYS_PORT_NAME_TYPE_LEGACY:
+		/* Legacy representors naming schema. */
+		switch_info->representor = !num_vf_set;
+		break;
+	case MLX5_PHYS_PORT_NAME_TYPE_PFVF:
+		/* New representors naming schema. */
+		switch_info->representor = 1;
+		break;
+	}
+}
+
+/**
  * Process switch information from Netlink message.
  *
  * @param nh
@@ -1072,7 +1151,6 @@ struct mlx5_nl_ifindex_data {
 mlx5_nl_switch_info(int nl, unsigned int ifindex,
 		    struct mlx5_switch_info *info)
 {
-	uint32_t seq = random();
 	struct {
 		struct nlmsghdr nh;
 		struct ifinfomsg info;
@@ -1096,11 +1174,12 @@ struct mlx5_nl_ifindex_data {
 		},
 		.extmask = RTE_LE32(1),
 	};
+	uint32_t sn = MLX5_NL_SN_GENERATE;
 	int ret;
 
-	ret = mlx5_nl_send(nl, &req.nh, seq);
+	ret = mlx5_nl_send(nl, &req.nh, sn);
 	if (ret >= 0)
-		ret = mlx5_nl_recv(nl, seq, mlx5_nl_switch_info_cb, info);
+		ret = mlx5_nl_recv(nl, sn, mlx5_nl_switch_info_cb, info);
 	if (info->master && info->representor) {
 		DRV_LOG(ERR, "ifindex %u device is recognized as master"
 			     " and as representor", ifindex);
@@ -1122,6 +1201,7 @@ struct mlx5_nl_ifindex_data {
 mlx5_nl_vlan_vmwa_delete(struct mlx5_nl_vlan_vmwa_context *vmwa,
 		      uint32_t ifindex)
 {
+	uint32_t sn = MLX5_NL_SN_GENERATE;
 	int ret;
 	struct {
 		struct nlmsghdr nh;
@@ -1139,18 +1219,12 @@ struct mlx5_nl_ifindex_data {
 	};
 
 	if (ifindex) {
-		++vmwa->nl_sn;
-		if (!vmwa->nl_sn)
-			++vmwa->nl_sn;
-		ret = mlx5_nl_send(vmwa->nl_socket, &req.nh, vmwa->nl_sn);
+		ret = mlx5_nl_send(vmwa->nl_socket, &req.nh, sn);
 		if (ret >= 0)
-			ret = mlx5_nl_recv(vmwa->nl_socket,
-					   vmwa->nl_sn,
-					   NULL, NULL);
+			ret = mlx5_nl_recv(vmwa->nl_socket, sn, NULL, NULL);
 		if (ret < 0)
-			DRV_LOG(WARNING, "netlink: error deleting"
-					 " VLAN WA ifindex %u, %d",
-					 ifindex, ret);
+			DRV_LOG(WARNING, "netlink: error deleting VLAN WA"
+				" ifindex %u, %d", ifindex, ret);
 	}
 }
 
@@ -1202,8 +1276,7 @@ struct mlx5_nl_ifindex_data {
  */
 uint32_t
 mlx5_nl_vlan_vmwa_create(struct mlx5_nl_vlan_vmwa_context *vmwa,
-		      uint32_t ifindex,
-		      uint16_t tag)
+			 uint32_t ifindex, uint16_t tag)
 {
 	struct nlmsghdr *nlh;
 	struct ifinfomsg *ifm;
@@ -1220,12 +1293,10 @@ struct mlx5_nl_ifindex_data {
 		    NLMSG_ALIGN(sizeof(uint16_t)) + 16];
 	struct nlattr *na_info;
 	struct nlattr *na_vlan;
+	uint32_t sn = MLX5_NL_SN_GENERATE;
 	int ret;
 
 	memset(buf, 0, sizeof(buf));
-	++vmwa->nl_sn;
-	if (!vmwa->nl_sn)
-		++vmwa->nl_sn;
 	nlh = (struct nlmsghdr *)buf;
 	nlh->nlmsg_len = sizeof(struct nlmsghdr);
 	nlh->nlmsg_type = RTM_NEWLINK;
@@ -1249,20 +1320,18 @@ struct mlx5_nl_ifindex_data {
 	nl_attr_nest_end(nlh, na_vlan);
 	nl_attr_nest_end(nlh, na_info);
 	assert(sizeof(buf) >= nlh->nlmsg_len);
-	ret = mlx5_nl_send(vmwa->nl_socket, nlh, vmwa->nl_sn);
+	ret = mlx5_nl_send(vmwa->nl_socket, nlh, sn);
 	if (ret >= 0)
-		ret = mlx5_nl_recv(vmwa->nl_socket, vmwa->nl_sn, NULL, NULL);
+		ret = mlx5_nl_recv(vmwa->nl_socket, sn, NULL, NULL);
 	if (ret < 0) {
-		DRV_LOG(WARNING,
-			"netlink: VLAN %s create failure (%d)",
-			name, ret);
+		DRV_LOG(WARNING, "netlink: VLAN %s create failure (%d)", name,
+			ret);
 	}
 	/* Try to get ifindex of created or pre-existing device. */
 	ret = if_nametoindex(name);
 	if (!ret) {
-		DRV_LOG(WARNING,
-			"VLAN %s failed to get index (%d)",
-			name, errno);
+		DRV_LOG(WARNING, "VLAN %s failed to get index (%d)", name,
+			errno);
 		return 0;
 	}
 	return ret;
diff --git a/drivers/net/mlx5/mlx5_nl.h b/drivers/net/mlx5/mlx5_nl.h
index 7903673..9be87c0 100644
--- a/drivers/net/mlx5/mlx5_nl.h
+++ b/drivers/net/mlx5/mlx5_nl.h
@@ -39,30 +39,33 @@ struct mlx5_nl_vlan_dev {
  */
 struct mlx5_nl_vlan_vmwa_context {
 	int nl_socket;
-	uint32_t nl_sn;
 	uint32_t vf_ifindex;
 	struct mlx5_nl_vlan_dev vlan_dev[4096];
 };
 
 
 int mlx5_nl_init(int protocol);
-int mlx5_nl_mac_addr_add(struct rte_eth_dev *dev, struct rte_ether_addr *mac,
-			 uint32_t index);
-int mlx5_nl_mac_addr_remove(struct rte_eth_dev *dev, struct rte_ether_addr *mac,
+int mlx5_nl_mac_addr_add(int nlsk_fd, unsigned int iface_idx, uint64_t *mac_own,
+			 struct rte_ether_addr *mac, uint32_t index);
+int mlx5_nl_mac_addr_remove(int nlsk_fd, unsigned int iface_idx,
+			    uint64_t *mac_own, struct rte_ether_addr *mac,
 			    uint32_t index);
-void mlx5_nl_mac_addr_sync(struct rte_eth_dev *dev);
-void mlx5_nl_mac_addr_flush(struct rte_eth_dev *dev);
-int mlx5_nl_promisc(struct rte_eth_dev *dev, int enable);
-int mlx5_nl_allmulti(struct rte_eth_dev *dev, int enable);
+void mlx5_nl_mac_addr_sync(int nlsk_fd, unsigned int iface_idx,
+			   struct rte_ether_addr *mac_addrs, int n);
+void mlx5_nl_mac_addr_flush(int nlsk_fd, unsigned int iface_idx,
+			    struct rte_ether_addr *mac_addrs, int n,
+			    uint64_t *mac_own);
+int mlx5_nl_promisc(int nlsk_fd, unsigned int iface_idx, int enable);
+int mlx5_nl_allmulti(int nlsk_fd, unsigned int iface_idx, int enable);
 unsigned int mlx5_nl_portnum(int nl, const char *name);
 unsigned int mlx5_nl_ifindex(int nl, const char *name, uint32_t pindex);
-int mlx5_nl_vf_mac_addr_modify(struct rte_eth_dev *dev,
+int mlx5_nl_vf_mac_addr_modify(int nlsk_fd, unsigned int iface_idx,
 			       struct rte_ether_addr *mac, int vf_index);
 int mlx5_nl_switch_info(int nl, unsigned int ifindex,
 			struct mlx5_switch_info *info);
 
 void mlx5_nl_vlan_vmwa_delete(struct mlx5_nl_vlan_vmwa_context *vmwa,
-			   uint32_t ifindex);
+			      uint32_t ifindex);
 uint32_t mlx5_nl_vlan_vmwa_create(struct mlx5_nl_vlan_vmwa_context *vmwa,
 				  uint32_t ifindex, uint16_t tag);
 
diff --git a/drivers/net/mlx5/mlx5_rxmode.c b/drivers/net/mlx5/mlx5_rxmode.c
index 760cc2f..84c8b05 100644
--- a/drivers/net/mlx5/mlx5_rxmode.c
+++ b/drivers/net/mlx5/mlx5_rxmode.c
@@ -47,7 +47,8 @@
 		return 0;
 	}
 	if (priv->config.vf) {
-		ret = mlx5_nl_promisc(dev, 1);
+		ret = mlx5_nl_promisc(priv->nl_socket_route, mlx5_ifindex(dev),
+				      1);
 		if (ret)
 			return ret;
 	}
@@ -80,7 +81,8 @@
 
 	dev->data->promiscuous = 0;
 	if (priv->config.vf) {
-		ret = mlx5_nl_promisc(dev, 0);
+		ret = mlx5_nl_promisc(priv->nl_socket_route, mlx5_ifindex(dev),
+				      0);
 		if (ret)
 			return ret;
 	}
@@ -120,7 +122,8 @@
 		return 0;
 	}
 	if (priv->config.vf) {
-		ret = mlx5_nl_allmulti(dev, 1);
+		ret = mlx5_nl_allmulti(priv->nl_socket_route, mlx5_ifindex(dev),
+				       1);
 		if (ret)
 			goto error;
 	}
@@ -153,7 +156,8 @@
 
 	dev->data->all_multicast = 0;
 	if (priv->config.vf) {
-		ret = mlx5_nl_allmulti(dev, 0);
+		ret = mlx5_nl_allmulti(priv->nl_socket_route, mlx5_ifindex(dev),
+				       0);
 		if (ret)
 			goto error;
 	}
diff --git a/drivers/net/mlx5/mlx5_vlan.c b/drivers/net/mlx5/mlx5_vlan.c
index fb52d8f..fc1a91c 100644
--- a/drivers/net/mlx5/mlx5_vlan.c
+++ b/drivers/net/mlx5/mlx5_vlan.c
@@ -304,7 +304,6 @@ struct mlx5_nl_vlan_vmwa_context *
 		rte_free(vmwa);
 		return NULL;
 	}
-	vmwa->nl_sn = random();
 	vmwa->vf_ifindex = ifindex;
 	/* Cleanup for existing VLAN devices. */
 	return vmwa;
-- 
1.8.3.1
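
Note on the sequence numbers in the diff above: each call site now takes
its Netlink sequence number from MLX5_NL_SN_GENERATE instead of seeding
one with random() per call or keeping vmwa->nl_sn. A minimal sketch of
the same idea in portable C11 follows. The names demo_sn and
demo_sn_generate() are hypothetical, and <stdatomic.h> stands in here for
rte_atomic32_add_return(), which is what the macro is built on in this
series.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical process-wide counter, zero-initialized like atomic_sn. */
static _Atomic uint32_t demo_sn;

/*
 * Return a fresh sequence number. atomic_fetch_add() yields the value
 * before the increment, so 1 is added to return the new value, which is
 * the behaviour of an "add and return" primitive.
 */
uint32_t
demo_sn_generate(void)
{
	return (uint32_t)(atomic_fetch_add(&demo_sn, 1) + 1);
}
```

Two consecutive calls return consecutive values, so a request and its
follow-up (as in mlx5_nl_ifindex) never share a number. Like the macro,
the sketch does not skip 0 when the counter wraps, whereas the removed
vmwa code did.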
* [dpdk-dev] [PATCH v1 36/38] mlx5: share Netlink commands
  2020-01-20 17:02 [dpdk-dev] [PATCH v1 00/38] Introduce mlx5 vDPA driver Matan Azrad
                   ` (34 preceding siblings ...)
  2020-01-20 17:03 ` [dpdk-dev] [PATCH v1 35/38] net/mlx5: reduce Netlink commands dependencies Matan Azrad
@ 2020-01-20 17:03 ` Matan Azrad
  2020-01-20 17:03 ` [dpdk-dev] [PATCH v1 37/38] common/mlx5: support ROCE disable through Netlink Matan Azrad
                   ` (3 subsequent siblings)
  39 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-20 17:03 UTC (permalink / raw)
  To: dev; +Cc: Maxime Coquelin, Thomas Monjalon
Move the Netlink mechanism and its dependencies from net/mlx5 to common/mlx5
so that other mlx5 drivers can use it.
The dependencies are BITFIELD defines, the ppc64 compilation workaround
for bool type and the function mlx5_translate_port_name.
Update build mechanism accordingly.
Signed-off-by: Matan Azrad <matan@mellanox.com>
---
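
For reviewers who want to try the port name parsing outside the driver,
here is a small standalone sketch of the logic that
mlx5_translate_port_name applies in this patch. The demo_* types and
function are hypothetical stand-ins for mlx5_switch_info and the
MLX5_PHYS_PORT_NAME_TYPE_* values, not the driver API. The parsing order
(pf<N>vf<M>, then p<N>, then a plain number) follows the moved function.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the mlx5 name types, for this sketch only. */
enum demo_name_type {
	DEMO_NAME_LEGACY,  /* plain number, e.g. "3" */
	DEMO_NAME_UPLINK,  /* "p<N>", e.g. "p0" */
	DEMO_NAME_PFVF,    /* "pf<N>vf<M>", e.g. "pf0vf1" */
	DEMO_NAME_UNKNOWN, /* anything else */
};

struct demo_port_info {
	enum demo_name_type name_type;
	int pf_num;    /* valid for the pf<N>vf<M> form only */
	int port_name; /* port name as a number */
};

void
demo_translate_port_name(const char *in, struct demo_port_info *out)
{
	char pf_c1, pf_c2, vf_c1, vf_c2;
	char *end;
	int items;

	/* Try the pf<N>vf<M> form first: all six items must convert. */
	items = sscanf(in, "%c%c%d%c%c%d", &pf_c1, &pf_c2, &out->pf_num,
		       &vf_c1, &vf_c2, &out->port_name);
	if (items == 6 && pf_c1 == 'p' && pf_c2 == 'f' &&
	    vf_c1 == 'v' && vf_c2 == 'f') {
		out->name_type = DEMO_NAME_PFVF;
		return;
	}
	/* Then the uplink form p<N>. */
	items = sscanf(in, "%c%d", &pf_c1, &out->port_name);
	if (items == 2 && pf_c1 == 'p') {
		out->name_type = DEMO_NAME_UPLINK;
		return;
	}
	/* Finally a plain number that consumes the whole string. */
	errno = 0;
	out->port_name = (int)strtol(in, &end, 0);
	if (!errno && (size_t)(end - in) == strlen(in)) {
		out->name_type = DEMO_NAME_LEGACY;
		return;
	}
	out->name_type = DEMO_NAME_UNKNOWN;
}
```

With this sketch, "pf0vf1" is classified as the PF/VF form with pf_num 0
and port_name 1, "p0" as an uplink, "3" as a legacy name, and "eth0" as
unknown.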
 drivers/common/mlx5/Makefile                    |    3 +-
 drivers/common/mlx5/meson.build                 |    1 +
 drivers/common/mlx5/mlx5_common.c               |   55 +
 drivers/common/mlx5/mlx5_common.h               |   58 +
 drivers/common/mlx5/mlx5_nl.c                   | 1337 ++++++++++++++++++++++
 drivers/common/mlx5/mlx5_nl.h                   |   57 +
 drivers/common/mlx5/rte_common_mlx5_version.map |   18 +-
 drivers/net/mlx5/Makefile                       |    1 -
 drivers/net/mlx5/meson.build                    |    1 -
 drivers/net/mlx5/mlx5.h                         |    2 +-
 drivers/net/mlx5/mlx5_defs.h                    |    8 -
 drivers/net/mlx5/mlx5_ethdev.c                  |   55 -
 drivers/net/mlx5/mlx5_nl.c                      | 1338 -----------------------
 drivers/net/mlx5/mlx5_nl.h                      |   72 --
 drivers/net/mlx5/mlx5_vlan.c                    |    2 +-
 15 files changed, 1529 insertions(+), 1479 deletions(-)
 create mode 100644 drivers/common/mlx5/mlx5_nl.c
 create mode 100644 drivers/common/mlx5/mlx5_nl.h
 delete mode 100644 drivers/net/mlx5/mlx5_nl.c
 delete mode 100644 drivers/net/mlx5/mlx5_nl.h
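
A short usage sketch for the BITFIELD macros that this patch moves to
mlx5_common.h. The macro bodies are copied from the diff below. The
demo_owner structure, DEMO_MAX_ADDRS and the two helpers are hypothetical
and only show how an ownership bitmap such as mac_own could be declared
and updated. The element type is size_t here, an assumption made so that
it matches the (size_t)1 shift constant inside the macros.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Macros as they appear in the patch. */
#define BITFIELD_DECLARE(bf, type, size) \
	type bf[(((size_t)(size) / (sizeof(type) * CHAR_BIT)) + \
		 !!((size_t)(size) % (sizeof(type) * CHAR_BIT)))]
#define BITFIELD_SET(bf, b) \
	(assert((size_t)(b) < (sizeof(bf) * CHAR_BIT)), \
	 (void)((bf)[((b) / (sizeof((bf)[0]) * CHAR_BIT))] |= \
		((size_t)1 << ((b) % (sizeof((bf)[0]) * CHAR_BIT)))))
#define BITFIELD_RESET(bf, b) \
	(assert((size_t)(b) < (sizeof(bf) * CHAR_BIT)), \
	 (void)((bf)[((b) / (sizeof((bf)[0]) * CHAR_BIT))] &= \
		~((size_t)1 << ((b) % (sizeof((bf)[0]) * CHAR_BIT)))))
#define BITFIELD_ISSET(bf, b) \
	(assert((size_t)(b) < (sizeof(bf) * CHAR_BIT)), \
	 !!(((bf)[((b) / (sizeof((bf)[0]) * CHAR_BIT))] & \
	     ((size_t)1 << ((b) % (sizeof((bf)[0]) * CHAR_BIT))))))

/* Hypothetical table size, for this sketch only. */
#define DEMO_MAX_ADDRS 256

/* Hypothetical owner record holding one bit per MAC table index. */
struct demo_owner {
	BITFIELD_DECLARE(mac_own, size_t, DEMO_MAX_ADDRS);
};

/* Mark or clear ownership of the entry at idx. */
void
demo_mark(struct demo_owner *o, unsigned int idx, int owned)
{
	if (owned)
		BITFIELD_SET(o->mac_own, idx);
	else
		BITFIELD_RESET(o->mac_own, idx);
}

/* Return 1 if the entry at idx is owned, 0 otherwise. */
int
demo_is_owned(const struct demo_owner *o, unsigned int idx)
{
	return BITFIELD_ISSET(o->mac_own, idx);
}
```

The bounds assert in the macros relies on sizeof(bf), so in the sketch
the macros are applied to the array member itself. If they are applied to
a pointer, sizeof(bf) is the pointer size and the assert no longer
reflects the real array length.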
diff --git a/drivers/common/mlx5/Makefile b/drivers/common/mlx5/Makefile
index aeacce3..60bec3f 100644
--- a/drivers/common/mlx5/Makefile
+++ b/drivers/common/mlx5/Makefile
@@ -16,6 +16,7 @@ SRCS-y += mlx5_glue.c
 endif
 SRCS-y += mlx5_devx_cmds.c
 SRCS-y += mlx5_common.c
+SRCS-y += mlx5_nl.c
 ifeq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
 INSTALL-y-lib += $(LIB_GLUE)
 endif
@@ -42,7 +43,7 @@ else
 LDLIBS += -libverbs -lmlx5
 endif
 
-LDLIBS += -lrte_eal -lrte_pci -lrte_kvargs
+LDLIBS += -lrte_eal -lrte_pci -lrte_kvargs -lrte_net
 
 # A few warnings cannot be avoided in external headers.
 CFLAGS += -Wno-error=cast-qual -DNDEBUG -UPEDANTIC
diff --git a/drivers/common/mlx5/meson.build b/drivers/common/mlx5/meson.build
index 8291a92..46c7c3bb 100644
--- a/drivers/common/mlx5/meson.build
+++ b/drivers/common/mlx5/meson.build
@@ -42,6 +42,7 @@ if build
 	sources = files(
 		'mlx5_devx_cmds.c',
 		'mlx5_common.c',
+		'mlx5_nl.c',
 	)
 	if not pmd_dlopen
 		sources += files('mlx5_glue.c')
diff --git a/drivers/common/mlx5/mlx5_common.c b/drivers/common/mlx5/mlx5_common.c
index f756b6b..7ca6136 100644
--- a/drivers/common/mlx5/mlx5_common.c
+++ b/drivers/common/mlx5/mlx5_common.c
@@ -107,6 +107,61 @@
 	return ret;
 }
 
+/**
+ * Extract port name, as a number, from sysfs or netlink information.
+ *
+ * @param[in] port_name_in
+ *   String representing the port name.
+ * @param[out] port_info_out
+ *   Port information, including port name as a number and port name
+ *   type if recognized
+ *
+ * @return
+ *   port_name field set according to recognized name format.
+ */
+void
+mlx5_translate_port_name(const char *port_name_in,
+			 struct mlx5_switch_info *port_info_out)
+{
+	char pf_c1, pf_c2, vf_c1, vf_c2;
+	char *end;
+	int sc_items;
+
+	/*
+	 * Check for port-name as a string of the form pf0vf0
+	 * (support kernel ver >= 5.0 or OFED ver >= 4.6).
+	 */
+	sc_items = sscanf(port_name_in, "%c%c%d%c%c%d",
+			  &pf_c1, &pf_c2, &port_info_out->pf_num,
+			  &vf_c1, &vf_c2, &port_info_out->port_name);
+	if (sc_items == 6 &&
+	    pf_c1 == 'p' && pf_c2 == 'f' &&
+	    vf_c1 == 'v' && vf_c2 == 'f') {
+		port_info_out->name_type = MLX5_PHYS_PORT_NAME_TYPE_PFVF;
+		return;
+	}
+	/*
+	 * Check for port-name as a string of the form p0
+	 * (support kernel ver >= 5.0, or OFED ver >= 4.6).
+	 */
+	sc_items = sscanf(port_name_in, "%c%d",
+			  &pf_c1, &port_info_out->port_name);
+	if (sc_items == 2 && pf_c1 == 'p') {
+		port_info_out->name_type = MLX5_PHYS_PORT_NAME_TYPE_UPLINK;
+		return;
+	}
+	/* Check for port-name as a number (support kernel ver < 5.0). */
+	errno = 0;
+	port_info_out->port_name = strtol(port_name_in, &end, 0);
+	if (!errno &&
+	    (size_t)(end - port_name_in) == strlen(port_name_in)) {
+		port_info_out->name_type = MLX5_PHYS_PORT_NAME_TYPE_LEGACY;
+		return;
+	}
+	port_info_out->name_type = MLX5_PHYS_PORT_NAME_TYPE_UNKNOWN;
+	return;
+}
+
 #ifdef RTE_IBVERBS_LINK_DLOPEN
 
 /**
diff --git a/drivers/common/mlx5/mlx5_common.h b/drivers/common/mlx5/mlx5_common.h
index aeaa7b9..ac65105 100644
--- a/drivers/common/mlx5/mlx5_common.h
+++ b/drivers/common/mlx5/mlx5_common.h
@@ -18,6 +18,35 @@
 
 
 /*
+ * Compilation workaround for PPC64 when AltiVec is fully enabled, e.g. std=c11.
+ * Otherwise there would be a type conflict between stdbool and altivec.
+ */
+#if defined(__PPC64__) && !defined(__APPLE_ALTIVEC__)
+#undef bool
+/* redefine as in stdbool.h */
+#define bool _Bool
+#endif
+
+/* Bit-field manipulation. */
+#define BITFIELD_DECLARE(bf, type, size) \
+	type bf[(((size_t)(size) / (sizeof(type) * CHAR_BIT)) + \
+		 !!((size_t)(size) % (sizeof(type) * CHAR_BIT)))]
+#define BITFIELD_DEFINE(bf, type, size) \
+	BITFIELD_DECLARE((bf), type, (size)) = { 0 }
+#define BITFIELD_SET(bf, b) \
+	(assert((size_t)(b) < (sizeof(bf) * CHAR_BIT)), \
+	 (void)((bf)[((b) / (sizeof((bf)[0]) * CHAR_BIT))] |= \
+		((size_t)1 << ((b) % (sizeof((bf)[0]) * CHAR_BIT)))))
+#define BITFIELD_RESET(bf, b) \
+	(assert((size_t)(b) < (sizeof(bf) * CHAR_BIT)), \
+	 (void)((bf)[((b) / (sizeof((bf)[0]) * CHAR_BIT))] &= \
+		~((size_t)1 << ((b) % (sizeof((bf)[0]) * CHAR_BIT)))))
+#define BITFIELD_ISSET(bf, b) \
+	(assert((size_t)(b) < (sizeof(bf) * CHAR_BIT)), \
+	 !!(((bf)[((b) / (sizeof((bf)[0]) * CHAR_BIT))] & \
+	     ((size_t)1 << ((b) % (sizeof((bf)[0]) * CHAR_BIT))))))
+
+/*
  * Helper macros to work around __VA_ARGS__ limitations in a C99 compliant
  * manner.
  */
@@ -112,6 +141,33 @@ enum {
 	PCI_DEVICE_ID_MELLANOX_CONNECTX6DXVF = 0x101e,
 };
 
+/* Maximum number of simultaneous unicast MAC addresses. */
+#define MLX5_MAX_UC_MAC_ADDRESSES 128
+/* Maximum number of simultaneous Multicast MAC addresses. */
+#define MLX5_MAX_MC_MAC_ADDRESSES 128
+/* Maximum number of simultaneous MAC addresses. */
+#define MLX5_MAX_MAC_ADDRESSES \
+	(MLX5_MAX_UC_MAC_ADDRESSES + MLX5_MAX_MC_MAC_ADDRESSES)
+
+/* Recognized Infiniband device physical port name types. */
+enum mlx5_nl_phys_port_name_type {
+	MLX5_PHYS_PORT_NAME_TYPE_NOTSET = 0, /* Not set. */
+	MLX5_PHYS_PORT_NAME_TYPE_LEGACY, /* kernel ver < 5.0 */
+	MLX5_PHYS_PORT_NAME_TYPE_UPLINK, /* p0, kernel ver >= 5.0 */
+	MLX5_PHYS_PORT_NAME_TYPE_PFVF, /* pf0vf0, kernel ver >= 5.0 */
+	MLX5_PHYS_PORT_NAME_TYPE_UNKNOWN, /* Unrecognized. */
+};
+
+/** Switch information returned by mlx5_nl_switch_info(). */
+struct mlx5_switch_info {
+	uint32_t master:1; /**< Master device. */
+	uint32_t representor:1; /**< Representor device. */
+	enum mlx5_nl_phys_port_name_type name_type; /**< Port name type. */
+	int32_t pf_num; /**< PF number (valid for pfxvfx format only). */
+	int32_t port_name; /**< Representor port name. */
+	uint64_t switch_id; /**< Switch identifier. */
+};
+
 /* CQE status. */
 enum mlx5_cqe_status {
 	MLX5_CQE_STATUS_SW_OWN = -1,
@@ -152,5 +208,7 @@ enum mlx5_cqe_status {
 
 int mlx5_dev_to_pci_addr(const char *dev_path, struct rte_pci_addr *pci_addr);
 int mlx5_vdpa_mode_selected(struct rte_devargs *devargs);
+void mlx5_translate_port_name(const char *port_name_in,
+			      struct mlx5_switch_info *port_info_out);
 
 #endif /* RTE_PMD_MLX5_COMMON_H_ */
diff --git a/drivers/common/mlx5/mlx5_nl.c b/drivers/common/mlx5/mlx5_nl.c
new file mode 100644
index 0000000..b4fc053
--- /dev/null
+++ b/drivers/common/mlx5/mlx5_nl.c
@@ -0,0 +1,1337 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2018 6WIND S.A.
+ * Copyright 2018 Mellanox Technologies, Ltd
+ */
+
+#include <errno.h>
+#include <linux/if_link.h>
+#include <linux/rtnetlink.h>
+#include <net/if.h>
+#include <rdma/rdma_netlink.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <stdalign.h>
+#include <string.h>
+#include <sys/socket.h>
+#include <unistd.h>
+#include <stdbool.h>
+
+#include <rte_errno.h>
+#include <rte_atomic.h>
+
+#include "mlx5_nl.h"
+#include "mlx5_common_utils.h"
+
+/* Size of the buffer to receive kernel messages */
+#define MLX5_NL_BUF_SIZE (32 * 1024)
+/* Send buffer size for the Netlink socket */
+#define MLX5_SEND_BUF_SIZE 32768
+/* Receive buffer size for the Netlink socket */
+#define MLX5_RECV_BUF_SIZE 32768
+
+/** Parameters of VLAN devices created by driver. */
+#define MLX5_VMWA_VLAN_DEVICE_PFX "evmlx"
+/*
+ * Define NDA_RTA as defined in iproute2 sources.
+ *
+ * see in iproute2 sources file include/libnetlink.h
+ */
+#ifndef MLX5_NDA_RTA
+#define MLX5_NDA_RTA(r) \
+	((struct rtattr *)(((char *)(r)) + NLMSG_ALIGN(sizeof(struct ndmsg))))
+#endif
+/*
+ * Define NLMSG_TAIL as defined in iproute2 sources.
+ *
+ * see in iproute2 sources file include/libnetlink.h
+ */
+#ifndef NLMSG_TAIL
+#define NLMSG_TAIL(nmsg) \
+	((struct rtattr *)(((char *)(nmsg)) + NLMSG_ALIGN((nmsg)->nlmsg_len)))
+#endif
+/*
+ * The following definitions are normally found in rdma/rdma_netlink.h,
+ * however they are so recent that most systems do not expose them yet.
+ */
+#ifndef HAVE_RDMA_NL_NLDEV
+#define RDMA_NL_NLDEV 5
+#endif
+#ifndef HAVE_RDMA_NLDEV_CMD_GET
+#define RDMA_NLDEV_CMD_GET 1
+#endif
+#ifndef HAVE_RDMA_NLDEV_CMD_PORT_GET
+#define RDMA_NLDEV_CMD_PORT_GET 5
+#endif
+#ifndef HAVE_RDMA_NLDEV_ATTR_DEV_INDEX
+#define RDMA_NLDEV_ATTR_DEV_INDEX 1
+#endif
+#ifndef HAVE_RDMA_NLDEV_ATTR_DEV_NAME
+#define RDMA_NLDEV_ATTR_DEV_NAME 2
+#endif
+#ifndef HAVE_RDMA_NLDEV_ATTR_PORT_INDEX
+#define RDMA_NLDEV_ATTR_PORT_INDEX 3
+#endif
+#ifndef HAVE_RDMA_NLDEV_ATTR_NDEV_INDEX
+#define RDMA_NLDEV_ATTR_NDEV_INDEX 50
+#endif
+
+/* These are normally found in linux/if_link.h. */
+#ifndef HAVE_IFLA_NUM_VF
+#define IFLA_NUM_VF 21
+#endif
+#ifndef HAVE_IFLA_EXT_MASK
+#define IFLA_EXT_MASK 29
+#endif
+#ifndef HAVE_IFLA_PHYS_SWITCH_ID
+#define IFLA_PHYS_SWITCH_ID 36
+#endif
+#ifndef HAVE_IFLA_PHYS_PORT_NAME
+#define IFLA_PHYS_PORT_NAME 38
+#endif
+
+/* Add/remove MAC address through Netlink */
+struct mlx5_nl_mac_addr {
+	struct rte_ether_addr (*mac)[];
+	/**< MAC address handled by the device. */
+	int mac_n; /**< Number of addresses in the array. */
+};
+
+#define MLX5_NL_CMD_GET_IB_NAME (1 << 0)
+#define MLX5_NL_CMD_GET_IB_INDEX (1 << 1)
+#define MLX5_NL_CMD_GET_NET_INDEX (1 << 2)
+#define MLX5_NL_CMD_GET_PORT_INDEX (1 << 3)
+
+/** Data structure used by mlx5_nl_cmdget_cb(). */
+struct mlx5_nl_ifindex_data {
+	const char *name; /**< IB device name (in). */
+	uint32_t flags; /**< found attribute flags (out). */
+	uint32_t ibindex; /**< IB device index (out). */
+	uint32_t ifindex; /**< Network interface index (out). */
+	uint32_t portnum; /**< IB device max port number (out). */
+};
+
+rte_atomic32_t atomic_sn = RTE_ATOMIC32_INIT(0);
+
+/* Generate Netlink sequence number. */
+#define MLX5_NL_SN_GENERATE ((uint32_t)rte_atomic32_add_return(&atomic_sn, 1))
+
+/**
+ * Opens a Netlink socket.
+ *
+ * @param protocol
+ *   Netlink protocol (e.g. NETLINK_ROUTE, NETLINK_RDMA).
+ *
+ * @return
+ *   A file descriptor on success, a negative errno value otherwise and
+ *   rte_errno is set.
+ */
+int
+mlx5_nl_init(int protocol)
+{
+	int fd;
+	int sndbuf_size = MLX5_SEND_BUF_SIZE;
+	int rcvbuf_size = MLX5_RECV_BUF_SIZE;
+	struct sockaddr_nl local = {
+		.nl_family = AF_NETLINK,
+	};
+	int ret;
+
+	fd = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC, protocol);
+	if (fd == -1) {
+		rte_errno = errno;
+		return -rte_errno;
+	}
+	ret = setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &sndbuf_size, sizeof(int));
+	if (ret == -1) {
+		rte_errno = errno;
+		goto error;
+	}
+	ret = setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf_size, sizeof(int));
+	if (ret == -1) {
+		rte_errno = errno;
+		goto error;
+	}
+	ret = bind(fd, (struct sockaddr *)&local, sizeof(local));
+	if (ret == -1) {
+		rte_errno = errno;
+		goto error;
+	}
+	return fd;
+error:
+	close(fd);
+	return -rte_errno;
+}
+
+/**
+ * Send a request message to the kernel on the Netlink socket.
+ *
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] nh
+ *   The Netlink message sent to the kernel.
+ * @param[in] sn
+ *   Sequence number.
+ * @param[in] req
+ *   Pointer to the request structure.
+ * @param[in] len
+ *   Length of the request in bytes.
+ *
+ * @return
+ *   The number of sent bytes on success, a negative errno value otherwise and
+ *   rte_errno is set.
+ */
+static int
+mlx5_nl_request(int nlsk_fd, struct nlmsghdr *nh, uint32_t sn, void *req,
+		int len)
+{
+	struct sockaddr_nl sa = {
+		.nl_family = AF_NETLINK,
+	};
+	struct iovec iov[2] = {
+		{ .iov_base = nh, .iov_len = sizeof(*nh), },
+		{ .iov_base = req, .iov_len = len, },
+	};
+	struct msghdr msg = {
+		.msg_name = &sa,
+		.msg_namelen = sizeof(sa),
+		.msg_iov = iov,
+		.msg_iovlen = 2,
+	};
+	int send_bytes;
+
+	nh->nlmsg_pid = 0; /* communication with the kernel uses pid 0 */
+	nh->nlmsg_seq = sn;
+	send_bytes = sendmsg(nlsk_fd, &msg, 0);
+	if (send_bytes < 0) {
+		rte_errno = errno;
+		return -rte_errno;
+	}
+	return send_bytes;
+}
+
+/**
+ * Send a message to the kernel on the Netlink socket.
+ *
+ * @param[in] nlsk_fd
+ *   The Netlink socket file descriptor used for communication.
+ * @param[in] nh
+ *   The Netlink message sent to the kernel.
+ * @param[in] sn
+ *   Sequence number.
+ *
+ * @return
+ *   The number of sent bytes on success, a negative errno value otherwise and
+ *   rte_errno is set.
+ */
+static int
+mlx5_nl_send(int nlsk_fd, struct nlmsghdr *nh, uint32_t sn)
+{
+	struct sockaddr_nl sa = {
+		.nl_family = AF_NETLINK,
+	};
+	struct iovec iov = {
+		.iov_base = nh,
+		.iov_len = nh->nlmsg_len,
+	};
+	struct msghdr msg = {
+		.msg_name = &sa,
+		.msg_namelen = sizeof(sa),
+		.msg_iov = &iov,
+		.msg_iovlen = 1,
+	};
+	int send_bytes;
+
+	nh->nlmsg_pid = 0; /* communication with the kernel uses pid 0 */
+	nh->nlmsg_seq = sn;
+	send_bytes = sendmsg(nlsk_fd, &msg, 0);
+	if (send_bytes < 0) {
+		rte_errno = errno;
+		return -rte_errno;
+	}
+	return send_bytes;
+}
+
+/**
+ * Receive a message from the kernel on the Netlink socket, following
+ * mlx5_nl_send().
+ *
+ * @param[in] nlsk_fd
+ *   The Netlink socket file descriptor used for communication.
+ * @param[in] sn
+ *   Sequence number.
+ * @param[in] cb
+ *   The callback function to call for each Netlink message received.
+ * @param[in, out] arg
+ *   Custom arguments for the callback.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_nl_recv(int nlsk_fd, uint32_t sn, int (*cb)(struct nlmsghdr *, void *arg),
+	     void *arg)
+{
+	struct sockaddr_nl sa;
+	char buf[MLX5_RECV_BUF_SIZE];
+	struct iovec iov = {
+		.iov_base = buf,
+		.iov_len = sizeof(buf),
+	};
+	struct msghdr msg = {
+		.msg_name = &sa,
+		.msg_namelen = sizeof(sa),
+		.msg_iov = &iov,
+		/* One message at a time */
+		.msg_iovlen = 1,
+	};
+	int multipart = 0;
+	int ret = 0;
+
+	do {
+		struct nlmsghdr *nh;
+		int recv_bytes = 0;
+
+		do {
+			recv_bytes = recvmsg(nlsk_fd, &msg, 0);
+			if (recv_bytes == -1) {
+				rte_errno = errno;
+				return -rte_errno;
+			}
+			nh = (struct nlmsghdr *)buf;
+		} while (nh->nlmsg_seq != sn);
+		for (;
+		     NLMSG_OK(nh, (unsigned int)recv_bytes);
+		     nh = NLMSG_NEXT(nh, recv_bytes)) {
+			if (nh->nlmsg_type == NLMSG_ERROR) {
+				struct nlmsgerr *err_data = NLMSG_DATA(nh);
+
+				if (err_data->error < 0) {
+					rte_errno = -err_data->error;
+					return -rte_errno;
+				}
+				/* Ack message. */
+				return 0;
+			}
+			/* Multi-part msgs and their trailing DONE message. */
+			if (nh->nlmsg_flags & NLM_F_MULTI) {
+				if (nh->nlmsg_type == NLMSG_DONE)
+					return 0;
+				multipart = 1;
+			}
+			if (cb) {
+				ret = cb(nh, arg);
+				if (ret < 0)
+					return ret;
+			}
+		}
+	} while (multipart);
+	return ret;
+}
+
+/**
+ * Parse Netlink message to retrieve the bridge MAC address.
+ *
+ * @param nh
+ *   Pointer to Netlink Message Header.
+ * @param arg
+ *   PMD data registered with this callback.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_nl_mac_addr_cb(struct nlmsghdr *nh, void *arg)
+{
+	struct mlx5_nl_mac_addr *data = arg;
+	struct ndmsg *r = NLMSG_DATA(nh);
+	struct rtattr *attribute;
+	int len;
+
+	len = nh->nlmsg_len - NLMSG_LENGTH(sizeof(*r));
+	for (attribute = MLX5_NDA_RTA(r);
+	     RTA_OK(attribute, len);
+	     attribute = RTA_NEXT(attribute, len)) {
+		if (attribute->rta_type == NDA_LLADDR) {
+			if (data->mac_n == MLX5_MAX_MAC_ADDRESSES) {
+				DRV_LOG(WARNING,
+					"not enough room to finalize the"
+					" request");
+				rte_errno = ENOMEM;
+				return -rte_errno;
+			}
+#ifndef NDEBUG
+			char m[18];
+
+			rte_ether_format_addr(m, 18, RTA_DATA(attribute));
+			DRV_LOG(DEBUG, "bridge MAC address %s", m);
+#endif
+			memcpy(&(*data->mac)[data->mac_n++],
+			       RTA_DATA(attribute), RTE_ETHER_ADDR_LEN);
+		}
+	}
+	return 0;
+}
+
+/**
+ * Get bridge MAC addresses.
+ *
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
+ * @param[out] mac
+ *   Pointer to the array table of MAC addresses to fill.
+ *   Its size should be MLX5_MAX_MAC_ADDRESSES.
+ * @param[out] mac_n
+ *   Number of entries filled in MAC array.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_nl_mac_addr_list(int nlsk_fd, unsigned int iface_idx,
+		      struct rte_ether_addr (*mac)[], int *mac_n)
+{
+	struct {
+		struct nlmsghdr	hdr;
+		struct ifinfomsg ifm;
+	} req = {
+		.hdr = {
+			.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
+			.nlmsg_type = RTM_GETNEIGH,
+			.nlmsg_flags = NLM_F_DUMP | NLM_F_REQUEST,
+		},
+		.ifm = {
+			.ifi_family = PF_BRIDGE,
+			.ifi_index = iface_idx,
+		},
+	};
+	struct mlx5_nl_mac_addr data = {
+		.mac = mac,
+		.mac_n = 0,
+	};
+	uint32_t sn = MLX5_NL_SN_GENERATE;
+	int ret;
+
+	if (nlsk_fd == -1)
+		return 0;
+	ret = mlx5_nl_request(nlsk_fd, &req.hdr, sn, &req.ifm,
+			      sizeof(struct ifinfomsg));
+	if (ret < 0)
+		goto error;
+	ret = mlx5_nl_recv(nlsk_fd, sn, mlx5_nl_mac_addr_cb, &data);
+	if (ret < 0)
+		goto error;
+	*mac_n = data.mac_n;
+	return 0;
+error:
+	DRV_LOG(DEBUG, "Interface %u cannot retrieve MAC address list %s",
+		iface_idx, strerror(rte_errno));
+	return -rte_errno;
+}
+
+/**
+ * Modify the MAC address neighbour table with Netlink.
+ *
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
+ * @param mac
+ *   MAC address to consider.
+ * @param add
+ *   1 to add the MAC address, 0 to remove the MAC address.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_nl_mac_addr_modify(int nlsk_fd, unsigned int iface_idx,
+			struct rte_ether_addr *mac, int add)
+{
+	struct {
+		struct nlmsghdr hdr;
+		struct ndmsg ndm;
+		struct rtattr rta;
+		uint8_t buffer[RTE_ETHER_ADDR_LEN];
+	} req = {
+		.hdr = {
+			.nlmsg_len = NLMSG_LENGTH(sizeof(struct ndmsg)),
+			.nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE |
+				NLM_F_EXCL | NLM_F_ACK,
+			.nlmsg_type = add ? RTM_NEWNEIGH : RTM_DELNEIGH,
+		},
+		.ndm = {
+			.ndm_family = PF_BRIDGE,
+			.ndm_state = NUD_NOARP | NUD_PERMANENT,
+			.ndm_ifindex = iface_idx,
+			.ndm_flags = NTF_SELF,
+		},
+		.rta = {
+			.rta_type = NDA_LLADDR,
+			.rta_len = RTA_LENGTH(RTE_ETHER_ADDR_LEN),
+		},
+	};
+	uint32_t sn = MLX5_NL_SN_GENERATE;
+	int ret;
+
+	if (nlsk_fd == -1)
+		return 0;
+	memcpy(RTA_DATA(&req.rta), mac, RTE_ETHER_ADDR_LEN);
+	req.hdr.nlmsg_len = NLMSG_ALIGN(req.hdr.nlmsg_len) +
+		RTA_ALIGN(req.rta.rta_len);
+	ret = mlx5_nl_send(nlsk_fd, &req.hdr, sn);
+	if (ret < 0)
+		goto error;
+	ret = mlx5_nl_recv(nlsk_fd, sn, NULL, NULL);
+	if (ret < 0)
+		goto error;
+	return 0;
+error:
+	DRV_LOG(DEBUG,
+		"Interface %u cannot %s MAC address"
+		" %02X:%02X:%02X:%02X:%02X:%02X %s",
+		iface_idx,
+		add ? "add" : "remove",
+		mac->addr_bytes[0], mac->addr_bytes[1],
+		mac->addr_bytes[2], mac->addr_bytes[3],
+		mac->addr_bytes[4], mac->addr_bytes[5],
+		strerror(rte_errno));
+	return -rte_errno;
+}
+
+/**
+ * Modify the VF MAC address neighbour table with Netlink.
+ *
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
+ * @param mac
+ *    MAC address to consider.
+ * @param vf_index
+ *    VF index.
+ *
+ * @return
+ *    0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_nl_vf_mac_addr_modify(int nlsk_fd, unsigned int iface_idx,
+			   struct rte_ether_addr *mac, int vf_index)
+{
+	int ret;
+	struct {
+		struct nlmsghdr hdr;
+		struct ifinfomsg ifm;
+		struct rtattr vf_list_rta;
+		struct rtattr vf_info_rta;
+		struct rtattr vf_mac_rta;
+		struct ifla_vf_mac ivm;
+	} req = {
+		.hdr = {
+			.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
+			.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK,
+			.nlmsg_type = RTM_BASE,
+		},
+		.ifm = {
+			.ifi_index = iface_idx,
+		},
+		.vf_list_rta = {
+			.rta_type = IFLA_VFINFO_LIST,
+			.rta_len = RTA_ALIGN(RTA_LENGTH(0)),
+		},
+		.vf_info_rta = {
+			.rta_type = IFLA_VF_INFO,
+			.rta_len = RTA_ALIGN(RTA_LENGTH(0)),
+		},
+		.vf_mac_rta = {
+			.rta_type = IFLA_VF_MAC,
+		},
+	};
+	struct ifla_vf_mac ivm = {
+		.vf = vf_index,
+	};
+	uint32_t sn = MLX5_NL_SN_GENERATE;
+
+	memcpy(&ivm.mac, mac, RTE_ETHER_ADDR_LEN);
+	memcpy(RTA_DATA(&req.vf_mac_rta), &ivm, sizeof(ivm));
+
+	req.vf_mac_rta.rta_len = RTA_LENGTH(sizeof(ivm));
+	req.hdr.nlmsg_len = NLMSG_ALIGN(req.hdr.nlmsg_len) +
+		RTA_ALIGN(req.vf_list_rta.rta_len) +
+		RTA_ALIGN(req.vf_info_rta.rta_len) +
+		RTA_ALIGN(req.vf_mac_rta.rta_len);
+	req.vf_list_rta.rta_len = RTE_PTR_DIFF(NLMSG_TAIL(&req.hdr),
+					       &req.vf_list_rta);
+	req.vf_info_rta.rta_len = RTE_PTR_DIFF(NLMSG_TAIL(&req.hdr),
+					       &req.vf_info_rta);
+
+	if (nlsk_fd < 0)
+		return -1;
+	ret = mlx5_nl_send(nlsk_fd, &req.hdr, sn);
+	if (ret < 0)
+		goto error;
+	ret = mlx5_nl_recv(nlsk_fd, sn, NULL, NULL);
+	if (ret < 0)
+		goto error;
+	return 0;
+error:
+	DRV_LOG(ERR,
+		"representor %u cannot set VF MAC address "
+		"%02X:%02X:%02X:%02X:%02X:%02X : %s",
+		vf_index,
+		mac->addr_bytes[0], mac->addr_bytes[1],
+		mac->addr_bytes[2], mac->addr_bytes[3],
+		mac->addr_bytes[4], mac->addr_bytes[5],
+		strerror(rte_errno));
+	return -rte_errno;
+}
+
+/**
+ * Add a MAC address.
+ *
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
+ * @param mac_own
+ *   BITFIELD_DECLARE array to store the mac.
+ * @param mac
+ *   MAC address to register.
+ * @param index
+ *   MAC address index.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_nl_mac_addr_add(int nlsk_fd, unsigned int iface_idx,
+		     uint64_t *mac_own, struct rte_ether_addr *mac,
+		     uint32_t index)
+{
+	int ret;
+
+	ret = mlx5_nl_mac_addr_modify(nlsk_fd, iface_idx, mac, 1);
+	if (!ret)
+		BITFIELD_SET(mac_own, index);
+	if (ret == -EEXIST)
+		return 0;
+	return ret;
+}
+
+/**
+ * Remove a MAC address.
+ *
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
+ * @param mac_own
+ *   BITFIELD_DECLARE array to store the mac.
+ * @param mac
+ *   MAC address to remove.
+ * @param index
+ *   MAC address index.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_nl_mac_addr_remove(int nlsk_fd, unsigned int iface_idx, uint64_t *mac_own,
+			struct rte_ether_addr *mac, uint32_t index)
+{
+	BITFIELD_RESET(mac_own, index);
+	return mlx5_nl_mac_addr_modify(nlsk_fd, iface_idx, mac, 0);
+}
+
+/**
+ * Synchronize Netlink bridge table to the internal table.
+ *
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
+ * @param mac_addrs
+ *   Mac addresses array to sync.
+ * @param n
+ *   @p mac_addrs array size.
+ */
+void
+mlx5_nl_mac_addr_sync(int nlsk_fd, unsigned int iface_idx,
+		      struct rte_ether_addr *mac_addrs, int n)
+{
+	struct rte_ether_addr macs[n];
+	int macs_n = 0;
+	int i;
+	int ret;
+
+	ret = mlx5_nl_mac_addr_list(nlsk_fd, iface_idx, &macs, &macs_n);
+	if (ret)
+		return;
+	for (i = 0; i != macs_n; ++i) {
+		int j;
+
+		/* Verify the address is not in the array yet. */
+		for (j = 0; j != n; ++j)
+			if (rte_is_same_ether_addr(&macs[i], &mac_addrs[j]))
+				break;
+		if (j != n)
+			continue;
+		/* Find the first entry available. */
+		for (j = 0; j != n; ++j) {
+			if (rte_is_zero_ether_addr(&mac_addrs[j])) {
+				mac_addrs[j] = macs[i];
+				break;
+			}
+		}
+	}
+}
+
+/**
+ * Flush all added MAC addresses.
+ *
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
+ * @param[in] mac_addrs
+ *   Mac addresses array to flush.
+ * @param n
+ *   @p mac_addrs array size.
+ * @param mac_own
+ *   BITFIELD_DECLARE array to store the mac.
+ */
+void
+mlx5_nl_mac_addr_flush(int nlsk_fd, unsigned int iface_idx,
+		       struct rte_ether_addr *mac_addrs, int n,
+		       uint64_t *mac_own)
+{
+	int i;
+
+	for (i = n - 1; i >= 0; --i) {
+		struct rte_ether_addr *m = &mac_addrs[i];
+
+		if (BITFIELD_ISSET(mac_own, i))
+			mlx5_nl_mac_addr_remove(nlsk_fd, iface_idx, mac_own, m,
+						i);
+	}
+}
+
+/**
+ * Enable promiscuous / all multicast mode through Netlink.
+ *
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
+ * @param flags
+ *   IFF_PROMISC for promiscuous, IFF_ALLMULTI for allmulti.
+ * @param enable
+ *   Nonzero to enable, disable otherwise.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_nl_device_flags(int nlsk_fd, unsigned int iface_idx, uint32_t flags,
+		     int enable)
+{
+	struct {
+		struct nlmsghdr hdr;
+		struct ifinfomsg ifi;
+	} req = {
+		.hdr = {
+			.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
+			.nlmsg_type = RTM_NEWLINK,
+			.nlmsg_flags = NLM_F_REQUEST,
+		},
+		.ifi = {
+			.ifi_flags = enable ? flags : 0,
+			.ifi_change = flags,
+			.ifi_index = iface_idx,
+		},
+	};
+	uint32_t sn = MLX5_NL_SN_GENERATE;
+	int ret;
+
+	assert(!(flags & ~(IFF_PROMISC | IFF_ALLMULTI)));
+	if (nlsk_fd < 0)
+		return 0;
+	ret = mlx5_nl_send(nlsk_fd, &req.hdr, sn);
+	if (ret < 0)
+		return ret;
+	return 0;
+}
+
+/**
+ * Enable promiscuous mode through Netlink.
+ *
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
+ * @param enable
+ *   Nonzero to enable, disable otherwise.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_nl_promisc(int nlsk_fd, unsigned int iface_idx, int enable)
+{
+	int ret = mlx5_nl_device_flags(nlsk_fd, iface_idx, IFF_PROMISC, enable);
+
+	if (ret)
+		DRV_LOG(DEBUG,
+			"Interface %u cannot %s promisc mode: Netlink error %s",
+			iface_idx, enable ? "enable" : "disable",
+			strerror(rte_errno));
+	return ret;
+}
+
+/**
+ * Enable all multicast mode through Netlink.
+ *
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
+ * @param enable
+ *   Nonzero to enable, disable otherwise.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_nl_allmulti(int nlsk_fd, unsigned int iface_idx, int enable)
+{
+	int ret = mlx5_nl_device_flags(nlsk_fd, iface_idx, IFF_ALLMULTI,
+				       enable);
+
+	if (ret)
+		DRV_LOG(DEBUG,
+			"Interface %u cannot %s allmulti : Netlink error %s",
+			iface_idx, enable ? "enable" : "disable",
+			strerror(rte_errno));
+	return ret;
+}
+
+/**
+ * Process network interface information from Netlink message.
+ *
+ * @param nh
+ *   Pointer to Netlink message header.
+ * @param arg
+ *   Opaque data pointer for this callback.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_nl_cmdget_cb(struct nlmsghdr *nh, void *arg)
+{
+	struct mlx5_nl_ifindex_data *data = arg;
+	struct mlx5_nl_ifindex_data local = {
+		.flags = 0,
+	};
+	size_t off = NLMSG_HDRLEN;
+
+	if (nh->nlmsg_type !=
+	    RDMA_NL_GET_TYPE(RDMA_NL_NLDEV, RDMA_NLDEV_CMD_GET) &&
+	    nh->nlmsg_type !=
+	    RDMA_NL_GET_TYPE(RDMA_NL_NLDEV, RDMA_NLDEV_CMD_PORT_GET))
+		goto error;
+	while (off < nh->nlmsg_len) {
+		struct nlattr *na = (void *)((uintptr_t)nh + off);
+		void *payload = (void *)((uintptr_t)na + NLA_HDRLEN);
+
+		if (na->nla_len > nh->nlmsg_len - off)
+			goto error;
+		switch (na->nla_type) {
+		case RDMA_NLDEV_ATTR_DEV_INDEX:
+			local.ibindex = *(uint32_t *)payload;
+			local.flags |= MLX5_NL_CMD_GET_IB_INDEX;
+			break;
+		case RDMA_NLDEV_ATTR_DEV_NAME:
+			if (!strcmp(payload, data->name))
+				local.flags |= MLX5_NL_CMD_GET_IB_NAME;
+			break;
+		case RDMA_NLDEV_ATTR_NDEV_INDEX:
+			local.ifindex = *(uint32_t *)payload;
+			local.flags |= MLX5_NL_CMD_GET_NET_INDEX;
+			break;
+		case RDMA_NLDEV_ATTR_PORT_INDEX:
+			local.portnum = *(uint32_t *)payload;
+			local.flags |= MLX5_NL_CMD_GET_PORT_INDEX;
+			break;
+		default:
+			break;
+		}
+		off += NLA_ALIGN(na->nla_len);
+	}
+	/*
+	 * It is possible to have multiple messages for all
+	 * Infiniband devices in the system with appropriate name.
+	 * So we should gather parameters locally and copy to
+	 * query context only in case of coinciding device name.
+	 */
+	if (local.flags & MLX5_NL_CMD_GET_IB_NAME) {
+		data->flags = local.flags;
+		data->ibindex = local.ibindex;
+		data->ifindex = local.ifindex;
+		data->portnum = local.portnum;
+	}
+	return 0;
+error:
+	rte_errno = EINVAL;
+	return -rte_errno;
+}
+
+/**
+ * Get index of network interface associated with some IB device.
+ *
+ * This is the only somewhat safe method to avoid resorting to heuristics
+ * when faced with port representors. Unfortunately it requires at least
+ * Linux 4.17.
+ *
+ * @param nl
+ *   Netlink socket of the RDMA kind (NETLINK_RDMA).
+ * @param[in] name
+ *   IB device name.
+ * @param[in] pindex
+ *   IB device port index, starting from 1
+ * @return
+ *   A valid (nonzero) interface index on success, 0 otherwise and rte_errno
+ *   is set.
+ */
+unsigned int
+mlx5_nl_ifindex(int nl, const char *name, uint32_t pindex)
+{
+	struct mlx5_nl_ifindex_data data = {
+		.name = name,
+		.flags = 0,
+		.ibindex = 0, /* Determined during first pass. */
+		.ifindex = 0, /* Determined during second pass. */
+	};
+	union {
+		struct nlmsghdr nh;
+		uint8_t buf[NLMSG_HDRLEN +
+			    NLA_HDRLEN + NLA_ALIGN(sizeof(data.ibindex)) +
+			    NLA_HDRLEN + NLA_ALIGN(sizeof(pindex))];
+	} req = {
+		.nh = {
+			.nlmsg_len = NLMSG_LENGTH(0),
+			.nlmsg_type = RDMA_NL_GET_TYPE(RDMA_NL_NLDEV,
+						       RDMA_NLDEV_CMD_GET),
+			.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK | NLM_F_DUMP,
+		},
+	};
+	struct nlattr *na;
+	uint32_t sn = MLX5_NL_SN_GENERATE;
+	int ret;
+
+	ret = mlx5_nl_send(nl, &req.nh, sn);
+	if (ret < 0)
+		return 0;
+	ret = mlx5_nl_recv(nl, sn, mlx5_nl_cmdget_cb, &data);
+	if (ret < 0)
+		return 0;
+	if (!(data.flags & MLX5_NL_CMD_GET_IB_NAME) ||
+	    !(data.flags & MLX5_NL_CMD_GET_IB_INDEX))
+		goto error;
+	data.flags = 0;
+	sn = MLX5_NL_SN_GENERATE;
+	req.nh.nlmsg_type = RDMA_NL_GET_TYPE(RDMA_NL_NLDEV,
+					     RDMA_NLDEV_CMD_PORT_GET);
+	req.nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
+	req.nh.nlmsg_len = NLMSG_LENGTH(sizeof(req.buf) - NLMSG_HDRLEN);
+	na = (void *)((uintptr_t)req.buf + NLMSG_HDRLEN);
+	na->nla_len = NLA_HDRLEN + sizeof(data.ibindex);
+	na->nla_type = RDMA_NLDEV_ATTR_DEV_INDEX;
+	memcpy((void *)((uintptr_t)na + NLA_HDRLEN),
+	       &data.ibindex, sizeof(data.ibindex));
+	na = (void *)((uintptr_t)na + NLA_ALIGN(na->nla_len));
+	na->nla_len = NLA_HDRLEN + sizeof(pindex);
+	na->nla_type = RDMA_NLDEV_ATTR_PORT_INDEX;
+	memcpy((void *)((uintptr_t)na + NLA_HDRLEN),
+	       &pindex, sizeof(pindex));
+	ret = mlx5_nl_send(nl, &req.nh, sn);
+	if (ret < 0)
+		return 0;
+	ret = mlx5_nl_recv(nl, sn, mlx5_nl_cmdget_cb, &data);
+	if (ret < 0)
+		return 0;
+	if (!(data.flags & MLX5_NL_CMD_GET_IB_NAME) ||
+	    !(data.flags & MLX5_NL_CMD_GET_IB_INDEX) ||
+	    !(data.flags & MLX5_NL_CMD_GET_NET_INDEX) ||
+	    !data.ifindex)
+		goto error;
+	return data.ifindex;
+error:
+	rte_errno = ENODEV;
+	return 0;
+}
+
+/**
+ * Get the number of physical ports of given IB device.
+ *
+ * @param nl
+ *   Netlink socket of the RDMA kind (NETLINK_RDMA).
+ * @param[in] name
+ *   IB device name.
+ *
+ * @return
+ *   A valid (nonzero) number of ports on success, 0 otherwise
+ *   and rte_errno is set.
+ */
+unsigned int
+mlx5_nl_portnum(int nl, const char *name)
+{
+	struct mlx5_nl_ifindex_data data = {
+		.flags = 0,
+		.name = name,
+		.ifindex = 0,
+		.portnum = 0,
+	};
+	struct nlmsghdr req = {
+		.nlmsg_len = NLMSG_LENGTH(0),
+		.nlmsg_type = RDMA_NL_GET_TYPE(RDMA_NL_NLDEV,
+					       RDMA_NLDEV_CMD_GET),
+		.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK | NLM_F_DUMP,
+	};
+	uint32_t sn = MLX5_NL_SN_GENERATE;
+	int ret;
+
+	ret = mlx5_nl_send(nl, &req, sn);
+	if (ret < 0)
+		return 0;
+	ret = mlx5_nl_recv(nl, sn, mlx5_nl_cmdget_cb, &data);
+	if (ret < 0)
+		return 0;
+	if (!(data.flags & MLX5_NL_CMD_GET_IB_NAME) ||
+	    !(data.flags & MLX5_NL_CMD_GET_IB_INDEX) ||
+	    !(data.flags & MLX5_NL_CMD_GET_PORT_INDEX)) {
+		rte_errno = ENODEV;
+		return 0;
+	}
+	if (!data.portnum)
+		rte_errno = EINVAL;
+	return data.portnum;
+}
+
+/**
+ * Analyze gathered port parameters via Netlink to recognize master
+ * and representor devices for E-Switch configuration.
+ *
+ * @param[in] num_vf_set
+ *   flag of presence of number of VFs port attribute.
+ * @param[inout] switch_info
+ *   Port information, including port name as a number and port name
+ *   type if recognized
+ *
+ * @return
+ *   master and representor flags are set in switch_info according to
+ *   recognized parameters (if any).
+ */
+static void
+mlx5_nl_check_switch_info(bool num_vf_set,
+			  struct mlx5_switch_info *switch_info)
+{
+	switch (switch_info->name_type) {
+	case MLX5_PHYS_PORT_NAME_TYPE_UNKNOWN:
+		/*
+		 * Name is not recognized, assume the master,
+		 * check the number of VFs key presence.
+		 */
+		switch_info->master = num_vf_set;
+		break;
+	case MLX5_PHYS_PORT_NAME_TYPE_NOTSET:
+		/*
+		 * Name is not set, this assumes the legacy naming
+		 * schema for master, just check if there is a
+		 * number of VFs key.
+		 */
+		switch_info->master = num_vf_set;
+		break;
+	case MLX5_PHYS_PORT_NAME_TYPE_UPLINK:
+		/* New uplink naming schema recognized. */
+		switch_info->master = 1;
+		break;
+	case MLX5_PHYS_PORT_NAME_TYPE_LEGACY:
+		/* Legacy representors naming schema. */
+		switch_info->representor = !num_vf_set;
+		break;
+	case MLX5_PHYS_PORT_NAME_TYPE_PFVF:
+		/* New representors naming schema. */
+		switch_info->representor = 1;
+		break;
+	}
+}
+
+/**
+ * Process switch information from Netlink message.
+ *
+ * @param nh
+ *   Pointer to Netlink message header.
+ * @param arg
+ *   Opaque data pointer for this callback.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_nl_switch_info_cb(struct nlmsghdr *nh, void *arg)
+{
+	struct mlx5_switch_info info = {
+		.master = 0,
+		.representor = 0,
+		.name_type = MLX5_PHYS_PORT_NAME_TYPE_NOTSET,
+		.port_name = 0,
+		.switch_id = 0,
+	};
+	size_t off = NLMSG_LENGTH(sizeof(struct ifinfomsg));
+	bool switch_id_set = false;
+	bool num_vf_set = false;
+
+	if (nh->nlmsg_type != RTM_NEWLINK)
+		goto error;
+	while (off < nh->nlmsg_len) {
+		struct rtattr *ra = (void *)((uintptr_t)nh + off);
+		void *payload = RTA_DATA(ra);
+		unsigned int i;
+
+		if (ra->rta_len > nh->nlmsg_len - off)
+			goto error;
+		switch (ra->rta_type) {
+		case IFLA_NUM_VF:
+			num_vf_set = true;
+			break;
+		case IFLA_PHYS_PORT_NAME:
+			mlx5_translate_port_name((char *)payload, &info);
+			break;
+		case IFLA_PHYS_SWITCH_ID:
+			info.switch_id = 0;
+			for (i = 0; i < RTA_PAYLOAD(ra); ++i) {
+				info.switch_id <<= 8;
+				info.switch_id |= ((uint8_t *)payload)[i];
+			}
+			switch_id_set = true;
+			break;
+		}
+		off += RTA_ALIGN(ra->rta_len);
+	}
+	if (switch_id_set) {
+		/* We have some E-Switch configuration. */
+		mlx5_nl_check_switch_info(num_vf_set, &info);
+	}
+	assert(!(info.master && info.representor));
+	memcpy(arg, &info, sizeof(info));
+	return 0;
+error:
+	rte_errno = EINVAL;
+	return -rte_errno;
+}
+
+/**
+ * Get switch information associated with network interface.
+ *
+ * @param nl
+ *   Netlink socket of the ROUTE kind (NETLINK_ROUTE).
+ * @param ifindex
+ *   Network interface index.
+ * @param[out] info
+ *   Switch information object, populated in case of success.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_nl_switch_info(int nl, unsigned int ifindex,
+		    struct mlx5_switch_info *info)
+{
+	struct {
+		struct nlmsghdr nh;
+		struct ifinfomsg info;
+		struct rtattr rta;
+		uint32_t extmask;
+	} req = {
+		.nh = {
+			.nlmsg_len = NLMSG_LENGTH
+					(sizeof(req.info) +
+					 RTA_LENGTH(sizeof(uint32_t))),
+			.nlmsg_type = RTM_GETLINK,
+			.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK,
+		},
+		.info = {
+			.ifi_family = AF_UNSPEC,
+			.ifi_index = ifindex,
+		},
+		.rta = {
+			.rta_type = IFLA_EXT_MASK,
+			.rta_len = RTA_LENGTH(sizeof(uint32_t)),
+		},
+		.extmask = RTE_LE32(1),
+	};
+	uint32_t sn = MLX5_NL_SN_GENERATE;
+	int ret;
+
+	ret = mlx5_nl_send(nl, &req.nh, sn);
+	if (ret >= 0)
+		ret = mlx5_nl_recv(nl, sn, mlx5_nl_switch_info_cb, info);
+	if (info->master && info->representor) {
+		DRV_LOG(ERR, "ifindex %u device is recognized as master"
+			     " and as representor", ifindex);
+		rte_errno = ENODEV;
+		ret = -rte_errno;
+	}
+	return ret;
+}
+
+/*
+ * Delete VLAN network device by ifindex.
+ *
+ * @param[in] vmwa
+ *   Context object initialized by mlx5_nl_vlan_vmwa_init().
+ * @param[in] ifindex
+ *   Interface index of network device to delete.
+ */
+void
+mlx5_nl_vlan_vmwa_delete(struct mlx5_nl_vlan_vmwa_context *vmwa,
+		      uint32_t ifindex)
+{
+	uint32_t sn = MLX5_NL_SN_GENERATE;
+	int ret;
+	struct {
+		struct nlmsghdr nh;
+		struct ifinfomsg info;
+	} req = {
+		.nh = {
+			.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
+			.nlmsg_type = RTM_DELLINK,
+			.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK,
+		},
+		.info = {
+			.ifi_family = AF_UNSPEC,
+			.ifi_index = ifindex,
+		},
+	};
+
+	if (ifindex) {
+		ret = mlx5_nl_send(vmwa->nl_socket, &req.nh, sn);
+		if (ret >= 0)
+			ret = mlx5_nl_recv(vmwa->nl_socket, sn, NULL, NULL);
+		if (ret < 0)
+			DRV_LOG(WARNING, "netlink: error deleting VLAN WA"
+				" ifindex %u, %d", ifindex, ret);
+	}
+}
+
+/* Set of subroutines to build Netlink message. */
+static struct nlattr *
+nl_msg_tail(struct nlmsghdr *nlh)
+{
+	return (struct nlattr *)
+		(((uint8_t *)nlh) + NLMSG_ALIGN(nlh->nlmsg_len));
+}
+
+static void
+nl_attr_put(struct nlmsghdr *nlh, int type, const void *data, int alen)
+{
+	struct nlattr *nla = nl_msg_tail(nlh);
+
+	nla->nla_type = type;
+	nla->nla_len = NLMSG_ALIGN(sizeof(struct nlattr) + alen);
+	nlh->nlmsg_len = NLMSG_ALIGN(nlh->nlmsg_len) + nla->nla_len;
+
+	if (alen)
+		memcpy((uint8_t *)nla + sizeof(struct nlattr), data, alen);
+}
+
+static struct nlattr *
+nl_attr_nest_start(struct nlmsghdr *nlh, int type)
+{
+	struct nlattr *nest = (struct nlattr *)nl_msg_tail(nlh);
+
+	nl_attr_put(nlh, type, NULL, 0);
+	return nest;
+}
+
+static void
+nl_attr_nest_end(struct nlmsghdr *nlh, struct nlattr *nest)
+{
+	nest->nla_len = (uint8_t *)nl_msg_tail(nlh) - (uint8_t *)nest;
+}
+
+/*
+ * Create network VLAN device with specified VLAN tag.
+ *
+ * @param[in] vmwa
+ *   Context object initialized by mlx5_nl_vlan_vmwa_init().
+ * @param[in] ifindex
+ *   Base network interface index.
+ * @param[in] tag
+ *   VLAN tag for VLAN network device to create.
+ */
+uint32_t
+mlx5_nl_vlan_vmwa_create(struct mlx5_nl_vlan_vmwa_context *vmwa,
+			 uint32_t ifindex, uint16_t tag)
+{
+	struct nlmsghdr *nlh;
+	struct ifinfomsg *ifm;
+	char name[sizeof(MLX5_VMWA_VLAN_DEVICE_PFX) + 32];
+
+	alignas(RTE_CACHE_LINE_SIZE)
+	uint8_t buf[NLMSG_ALIGN(sizeof(struct nlmsghdr)) +
+		    NLMSG_ALIGN(sizeof(struct ifinfomsg)) +
+		    NLMSG_ALIGN(sizeof(struct nlattr)) * 8 +
+		    NLMSG_ALIGN(sizeof(uint32_t)) +
+		    NLMSG_ALIGN(sizeof(name)) +
+		    NLMSG_ALIGN(sizeof("vlan")) +
+		    NLMSG_ALIGN(sizeof(uint32_t)) +
+		    NLMSG_ALIGN(sizeof(uint16_t)) + 16];
+	struct nlattr *na_info;
+	struct nlattr *na_vlan;
+	uint32_t sn = MLX5_NL_SN_GENERATE;
+	int ret;
+
+	memset(buf, 0, sizeof(buf));
+	nlh = (struct nlmsghdr *)buf;
+	nlh->nlmsg_len = sizeof(struct nlmsghdr);
+	nlh->nlmsg_type = RTM_NEWLINK;
+	nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE |
+			   NLM_F_EXCL | NLM_F_ACK;
+	ifm = (struct ifinfomsg *)nl_msg_tail(nlh);
+	nlh->nlmsg_len += sizeof(struct ifinfomsg);
+	ifm->ifi_family = AF_UNSPEC;
+	ifm->ifi_type = 0;
+	ifm->ifi_index = 0;
+	ifm->ifi_flags = IFF_UP;
+	ifm->ifi_change = 0xffffffff;
+	nl_attr_put(nlh, IFLA_LINK, &ifindex, sizeof(ifindex));
+	ret = snprintf(name, sizeof(name), "%s.%u.%u",
+		       MLX5_VMWA_VLAN_DEVICE_PFX, ifindex, tag);
+	nl_attr_put(nlh, IFLA_IFNAME, name, ret + 1);
+	na_info = nl_attr_nest_start(nlh, IFLA_LINKINFO);
+	nl_attr_put(nlh, IFLA_INFO_KIND, "vlan", sizeof("vlan"));
+	na_vlan = nl_attr_nest_start(nlh, IFLA_INFO_DATA);
+	nl_attr_put(nlh, IFLA_VLAN_ID, &tag, sizeof(tag));
+	nl_attr_nest_end(nlh, na_vlan);
+	nl_attr_nest_end(nlh, na_info);
+	assert(sizeof(buf) >= nlh->nlmsg_len);
+	ret = mlx5_nl_send(vmwa->nl_socket, nlh, sn);
+	if (ret >= 0)
+		ret = mlx5_nl_recv(vmwa->nl_socket, sn, NULL, NULL);
+	if (ret < 0) {
+		DRV_LOG(WARNING, "netlink: VLAN %s create failure (%d)", name,
+			ret);
+	}
+	/* Try to get ifindex of created or pre-existing device. */
+	ret = if_nametoindex(name);
+	if (!ret) {
+		DRV_LOG(WARNING, "VLAN %s failed to get index (%d)", name,
+			errno);
+		return 0;
+	}
+	return ret;
+}
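Reader's note, not part of the patch: the nl_msg_tail(), nl_attr_put() and nl_attr_nest_*() helpers above build the RTM_NEWLINK request by appending attributes at the message tail and patching each nest length once its children are in place. A minimal standalone sketch of that layout follows, assuming only the Linux UAPI headers. Every sketch_* name is hypothetical and does not exist in DPDK, the buffer-size check is an addition, and IFF_UP is left out so that <net/if.h> is not needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/if_link.h>

/* Address of the first free (aligned) byte after the current message. */
static struct nlattr *
sketch_msg_tail(struct nlmsghdr *nlh)
{
	return (struct nlattr *)((uint8_t *)nlh + NLMSG_ALIGN(nlh->nlmsg_len));
}

/* Append one attribute; as in the patch, nla_len stores the padded length. */
static void
sketch_attr_put(struct nlmsghdr *nlh, int type, const void *data, int alen)
{
	struct nlattr *nla = sketch_msg_tail(nlh);

	nla->nla_type = type;
	nla->nla_len = NLMSG_ALIGN(sizeof(struct nlattr) + alen);
	nlh->nlmsg_len = NLMSG_ALIGN(nlh->nlmsg_len) + nla->nla_len;
	if (alen)
		memcpy((uint8_t *)nla + sizeof(struct nlattr), data, alen);
}

/* Open a nest: an empty attribute whose length is fixed up later. */
static struct nlattr *
sketch_nest_start(struct nlmsghdr *nlh, int type)
{
	struct nlattr *nest = sketch_msg_tail(nlh);

	sketch_attr_put(nlh, type, NULL, 0);
	return nest;
}

/* Close a nest: its length spans everything appended since the start. */
static void
sketch_nest_end(struct nlmsghdr *nlh, struct nlattr *nest)
{
	nest->nla_len = (uint8_t *)sketch_msg_tail(nlh) - (uint8_t *)nest;
}

/* Build a "create VLAN device" request; returns its length, 0 if no room. */
static uint32_t
sketch_build_vlan_link(void *buf, size_t size, uint32_t ifindex,
		       uint16_t tag, const char *name)
{
	struct nlmsghdr *nlh = buf;
	struct ifinfomsg *ifm;
	struct nlattr *na_info;
	struct nlattr *na_vlan;
	size_t name_len = strlen(name) + 1;
	size_t need = NLMSG_LENGTH(sizeof(*ifm)) +
		      NLA_ALIGN(NLA_HDRLEN + sizeof(ifindex)) +
		      NLA_ALIGN(NLA_HDRLEN + name_len) +
		      NLA_HDRLEN + NLA_ALIGN(NLA_HDRLEN + sizeof("vlan")) +
		      NLA_HDRLEN + NLA_ALIGN(NLA_HDRLEN + sizeof(tag));

	if (size < need)
		return 0;
	memset(buf, 0, need);
	nlh->nlmsg_len = NLMSG_LENGTH(sizeof(*ifm));
	nlh->nlmsg_type = RTM_NEWLINK;
	nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE |
			   NLM_F_EXCL | NLM_F_ACK;
	ifm = NLMSG_DATA(nlh);
	ifm->ifi_family = AF_UNSPEC;
	ifm->ifi_change = 0xffffffff;
	sketch_attr_put(nlh, IFLA_LINK, &ifindex, sizeof(ifindex));
	sketch_attr_put(nlh, IFLA_IFNAME, name, (int)name_len);
	na_info = sketch_nest_start(nlh, IFLA_LINKINFO);
	sketch_attr_put(nlh, IFLA_INFO_KIND, "vlan", sizeof("vlan"));
	na_vlan = sketch_nest_start(nlh, IFLA_INFO_DATA);
	sketch_attr_put(nlh, IFLA_VLAN_ID, &tag, sizeof(tag));
	sketch_nest_end(nlh, na_vlan);
	sketch_nest_end(nlh, na_info);
	assert(nlh->nlmsg_len == need);
	return nlh->nlmsg_len;
}
```

Computing the required size up front keeps the bound explicit instead of relying on a generously sized stack buffer; whether that trade-off suits the driver is a separate question.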
diff --git a/drivers/common/mlx5/mlx5_nl.h b/drivers/common/mlx5/mlx5_nl.h
new file mode 100644
index 0000000..8e66a98
--- /dev/null
+++ b/drivers/common/mlx5/mlx5_nl.h
@@ -0,0 +1,57 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2019 Mellanox Technologies, Ltd
+ */
+
+#ifndef RTE_PMD_MLX5_NL_H_
+#define RTE_PMD_MLX5_NL_H_
+
+#include <linux/netlink.h>
+
+#include <rte_ether.h>
+
+#include "mlx5_common.h"
+
+
+/* VLAN netdev for VLAN workaround. */
+struct mlx5_nl_vlan_dev {
+	uint32_t refcnt;
+	uint32_t ifindex; /**< Own interface index. */
+};
+
+/*
+ * Array of VLAN devices created on the base of VF
+ * used for workaround in virtual environments.
+ */
+struct mlx5_nl_vlan_vmwa_context {
+	int nl_socket;
+	uint32_t vf_ifindex;
+	struct mlx5_nl_vlan_dev vlan_dev[4096];
+};
+
+
+int mlx5_nl_init(int protocol);
+int mlx5_nl_mac_addr_add(int nlsk_fd, unsigned int iface_idx, uint64_t *mac_own,
+			 struct rte_ether_addr *mac, uint32_t index);
+int mlx5_nl_mac_addr_remove(int nlsk_fd, unsigned int iface_idx,
+			    uint64_t *mac_own, struct rte_ether_addr *mac,
+			    uint32_t index);
+void mlx5_nl_mac_addr_sync(int nlsk_fd, unsigned int iface_idx,
+			   struct rte_ether_addr *mac_addrs, int n);
+void mlx5_nl_mac_addr_flush(int nlsk_fd, unsigned int iface_idx,
+			    struct rte_ether_addr *mac_addrs, int n,
+			    uint64_t *mac_own);
+int mlx5_nl_promisc(int nlsk_fd, unsigned int iface_idx, int enable);
+int mlx5_nl_allmulti(int nlsk_fd, unsigned int iface_idx, int enable);
+unsigned int mlx5_nl_portnum(int nl, const char *name);
+unsigned int mlx5_nl_ifindex(int nl, const char *name, uint32_t pindex);
+int mlx5_nl_vf_mac_addr_modify(int nlsk_fd, unsigned int iface_idx,
+			       struct rte_ether_addr *mac, int vf_index);
+int mlx5_nl_switch_info(int nl, unsigned int ifindex,
+			struct mlx5_switch_info *info);
+
+void mlx5_nl_vlan_vmwa_delete(struct mlx5_nl_vlan_vmwa_context *vmwa,
+			      uint32_t ifindex);
+uint32_t mlx5_nl_vlan_vmwa_create(struct mlx5_nl_vlan_vmwa_context *vmwa,
+				  uint32_t ifindex, uint16_t tag);
+
+#endif /* RTE_PMD_MLX5_NL_H_ */
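Reader's note, not part of the patch: mlx5_nl_switch_info(), declared in the header above, relies on a callback that walks the rtattr list of an RTM_NEWLINK reply and folds the IFLA_PHYS_SWITCH_ID payload into a 64-bit value, most significant byte first. The sketch below isolates that step using the standard RTA_* macros. The sketch_* names are hypothetical, and the use of RTA_OK()/RTA_NEXT() is an assumption of this sketch; the patch tracks the offset by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <linux/rtnetlink.h>
#include <linux/if_link.h>

/* Fold payload bytes big-endian; bytes beyond the last 8 shift out. */
static uint64_t
sketch_fold_switch_id(const uint8_t *payload, size_t len)
{
	uint64_t id = 0;
	size_t i;

	for (i = 0; i < len; ++i) {
		id <<= 8;
		id |= payload[i];
	}
	return id;
}

/* Walk an attribute list; return 1 and set *id if a switch ID is found. */
static int
sketch_find_switch_id(const void *attrs, size_t len, uint64_t *id)
{
	const struct rtattr *ra = attrs;
	int rem = (int)len;

	assert(id != NULL);
	for (; RTA_OK(ra, rem); ra = RTA_NEXT(ra, rem)) {
		if (ra->rta_type != IFLA_PHYS_SWITCH_ID)
			continue;
		*id = sketch_fold_switch_id(RTA_DATA(ra), RTA_PAYLOAD(ra));
		return 1;
	}
	return 0;
}
```

RTA_OK() rejects an attribute whose declared length runs past the remaining bytes, which plays the same role as the explicit rta_len check in the callback.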
diff --git a/drivers/common/mlx5/rte_common_mlx5_version.map b/drivers/common/mlx5/rte_common_mlx5_version.map
index 16b9b34..318a024 100644
--- a/drivers/common/mlx5/rte_common_mlx5_version.map
+++ b/drivers/common/mlx5/rte_common_mlx5_version.map
@@ -24,5 +24,21 @@ DPDK_20.02 {
 	mlx5_devx_get_out_command_status;
 
 	mlx5_dev_to_pci_addr;
-	mlx5_vdpa_mode_selected;
+
+	mlx5_nl_allmulti;
+	mlx5_nl_ifindex;
+	mlx5_nl_init;
+	mlx5_nl_mac_addr_add;
+	mlx5_nl_mac_addr_flush;
+	mlx5_nl_mac_addr_remove;
+	mlx5_nl_mac_addr_sync;
+	mlx5_nl_portnum;
+	mlx5_nl_promisc;
+	mlx5_nl_switch_info;
+	mlx5_nl_vf_mac_addr_modify;
+	mlx5_nl_vlan_vmwa_create;
+	mlx5_nl_vlan_vmwa_delete;
+
+	mlx5_translate_port_name;
+	mlx5_vdpa_mode_selected;
 };
diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index dc6b3c8..d26afbb 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -30,7 +30,6 @@ SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_flow_meter.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_flow_dv.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_flow_verbs.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_mp.c
-SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_nl.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_utils.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_socket.c
 
diff --git a/drivers/net/mlx5/meson.build b/drivers/net/mlx5/meson.build
index e10ef3a..d45be00 100644
--- a/drivers/net/mlx5/meson.build
+++ b/drivers/net/mlx5/meson.build
@@ -19,7 +19,6 @@ sources = files(
 	'mlx5_flow_verbs.c',
 	'mlx5_mac.c',
 	'mlx5_mr.c',
-	'mlx5_nl.c',
 	'mlx5_rss.c',
 	'mlx5_rxmode.c',
 	'mlx5_rxq.c',
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 5d25d8b..48f31c2 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -35,11 +35,11 @@
 #include <mlx5_glue.h>
 #include <mlx5_devx_cmds.h>
 #include <mlx5_prm.h>
+#include <mlx5_nl.h>
 
 #include "mlx5_defs.h"
 #include "mlx5_utils.h"
 #include "mlx5_mr.h"
-#include "mlx5_nl.h"
 #include "mlx5_autoconf.h"
 
 /* Request types for IPC. */
diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h
index dc9b965..9b392ed 100644
--- a/drivers/net/mlx5/mlx5_defs.h
+++ b/drivers/net/mlx5/mlx5_defs.h
@@ -14,14 +14,6 @@
 /* Reported driver name. */
 #define MLX5_DRIVER_NAME "net_mlx5"
 
-/* Maximum number of simultaneous unicast MAC addresses. */
-#define MLX5_MAX_UC_MAC_ADDRESSES 128
-/* Maximum number of simultaneous Multicast MAC addresses. */
-#define MLX5_MAX_MC_MAC_ADDRESSES 128
-/* Maximum number of simultaneous MAC addresses. */
-#define MLX5_MAX_MAC_ADDRESSES \
-	(MLX5_MAX_UC_MAC_ADDRESSES + MLX5_MAX_MC_MAC_ADDRESSES)
-
 /* Maximum number of simultaneous VLAN filters. */
 #define MLX5_MAX_VLAN_IDS 128
 
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 5484104..b765636 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -1940,61 +1940,6 @@ struct mlx5_priv *
 }
 
 /**
- * Extract port name, as a number, from sysfs or netlink information.
- *
- * @param[in] port_name_in
- *   String representing the port name.
- * @param[out] port_info_out
- *   Port information, including port name as a number and port name
- *   type if recognized
- *
- * @return
- *   port_name field set according to recognized name format.
- */
-void
-mlx5_translate_port_name(const char *port_name_in,
-			 struct mlx5_switch_info *port_info_out)
-{
-	char pf_c1, pf_c2, vf_c1, vf_c2;
-	char *end;
-	int sc_items;
-
-	/*
-	 * Check for port-name as a string of the form pf0vf0
-	 * (support kernel ver >= 5.0 or OFED ver >= 4.6).
-	 */
-	sc_items = sscanf(port_name_in, "%c%c%d%c%c%d",
-			  &pf_c1, &pf_c2, &port_info_out->pf_num,
-			  &vf_c1, &vf_c2, &port_info_out->port_name);
-	if (sc_items == 6 &&
-	    pf_c1 == 'p' && pf_c2 == 'f' &&
-	    vf_c1 == 'v' && vf_c2 == 'f') {
-		port_info_out->name_type = MLX5_PHYS_PORT_NAME_TYPE_PFVF;
-		return;
-	}
-	/*
-	 * Check for port-name as a string of the form p0
-	 * (support kernel ver >= 5.0, or OFED ver >= 4.6).
-	 */
-	sc_items = sscanf(port_name_in, "%c%d",
-			  &pf_c1, &port_info_out->port_name);
-	if (sc_items == 2 && pf_c1 == 'p') {
-		port_info_out->name_type = MLX5_PHYS_PORT_NAME_TYPE_UPLINK;
-		return;
-	}
-	/* Check for port-name as a number (support kernel ver < 5.0 */
-	errno = 0;
-	port_info_out->port_name = strtol(port_name_in, &end, 0);
-	if (!errno &&
-	    (size_t)(end - port_name_in) == strlen(port_name_in)) {
-		port_info_out->name_type = MLX5_PHYS_PORT_NAME_TYPE_LEGACY;
-		return;
-	}
-	port_info_out->name_type = MLX5_PHYS_PORT_NAME_TYPE_UNKNOWN;
-	return;
-}
-
-/**
  * DPDK callback to retrieve plug-in module EEPROM information (type and size).
  *
  * @param dev
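Reader's note, not part of the patch: the mlx5_translate_port_name() body removed from mlx5_ethdev.c above classifies a phys_port_name string by trying three formats in order: "pf<N>vf<M>", then "p<N>", then a bare number. The standalone sketch below reproduces that order with hypothetical sketch_* types standing in for struct mlx5_switch_info and the MLX5_PHYS_PORT_NAME_TYPE_* values, so that it compiles without DPDK.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the driver's name-type enum and info struct. */
enum sketch_name_type {
	SKETCH_NAME_UNKNOWN,
	SKETCH_NAME_LEGACY,
	SKETCH_NAME_UPLINK,
	SKETCH_NAME_PFVF,
};

struct sketch_port_info {
	enum sketch_name_type type;
	int pf_num;
	int port_name;
};

static void
sketch_translate_port_name(const char *in, struct sketch_port_info *out)
{
	char pf_c1, pf_c2, vf_c1, vf_c2;
	char *end;
	int n;

	assert(in != NULL && out != NULL);
	/* Representor form: "pf0vf3" gives pf_num 0, port_name 3. */
	n = sscanf(in, "%c%c%d%c%c%d", &pf_c1, &pf_c2, &out->pf_num,
		   &vf_c1, &vf_c2, &out->port_name);
	if (n == 6 && pf_c1 == 'p' && pf_c2 == 'f' &&
	    vf_c1 == 'v' && vf_c2 == 'f') {
		out->type = SKETCH_NAME_PFVF;
		return;
	}
	/* Uplink form: "p1" gives port_name 1. */
	n = sscanf(in, "%c%d", &pf_c1, &out->port_name);
	if (n == 2 && pf_c1 == 'p') {
		out->type = SKETCH_NAME_UPLINK;
		return;
	}
	/* Legacy form: the whole string must parse as a number. */
	errno = 0;
	out->port_name = (int)strtol(in, &end, 0);
	if (!errno && (size_t)(end - in) == strlen(in)) {
		out->type = SKETCH_NAME_LEGACY;
		return;
	}
	out->type = SKETCH_NAME_UNKNOWN;
}
```

The order matters: a string such as "pf0vf3" also starts with 'p', so the uplink pattern is only tried after the longer representor pattern has failed.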
diff --git a/drivers/net/mlx5/mlx5_nl.c b/drivers/net/mlx5/mlx5_nl.c
deleted file mode 100644
index 6b8ca00..0000000
--- a/drivers/net/mlx5/mlx5_nl.c
+++ /dev/null
@@ -1,1338 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause
- * Copyright 2018 6WIND S.A.
- * Copyright 2018 Mellanox Technologies, Ltd
- */
-
-#include <errno.h>
-#include <linux/if_link.h>
-#include <linux/rtnetlink.h>
-#include <net/if.h>
-#include <rdma/rdma_netlink.h>
-#include <stdbool.h>
-#include <stdint.h>
-#include <stdlib.h>
-#include <stdalign.h>
-#include <string.h>
-#include <sys/socket.h>
-#include <unistd.h>
-
-#include <rte_errno.h>
-#include <rte_atomic.h>
-#include <rte_ether.h>
-
-#include "mlx5.h"
-#include "mlx5_nl.h"
-#include "mlx5_utils.h"
-
-/* Size of the buffer to receive kernel messages */
-#define MLX5_NL_BUF_SIZE (32 * 1024)
-/* Send buffer size for the Netlink socket */
-#define MLX5_SEND_BUF_SIZE 32768
-/* Receive buffer size for the Netlink socket */
-#define MLX5_RECV_BUF_SIZE 32768
-
-/** Parameters of VLAN devices created by driver. */
-#define MLX5_VMWA_VLAN_DEVICE_PFX "evmlx"
-/*
- * Define NDA_RTA as defined in iproute2 sources.
- *
- * see in iproute2 sources file include/libnetlink.h
- */
-#ifndef MLX5_NDA_RTA
-#define MLX5_NDA_RTA(r) \
-	((struct rtattr *)(((char *)(r)) + NLMSG_ALIGN(sizeof(struct ndmsg))))
-#endif
-/*
- * Define NLMSG_TAIL as defined in iproute2 sources.
- *
- * see in iproute2 sources file include/libnetlink.h
- */
-#ifndef NLMSG_TAIL
-#define NLMSG_TAIL(nmsg) \
-	((struct rtattr *)(((char *)(nmsg)) + NLMSG_ALIGN((nmsg)->nlmsg_len)))
-#endif
-/*
- * The following definitions are normally found in rdma/rdma_netlink.h,
- * however they are so recent that most systems do not expose them yet.
- */
-#ifndef HAVE_RDMA_NL_NLDEV
-#define RDMA_NL_NLDEV 5
-#endif
-#ifndef HAVE_RDMA_NLDEV_CMD_GET
-#define RDMA_NLDEV_CMD_GET 1
-#endif
-#ifndef HAVE_RDMA_NLDEV_CMD_PORT_GET
-#define RDMA_NLDEV_CMD_PORT_GET 5
-#endif
-#ifndef HAVE_RDMA_NLDEV_ATTR_DEV_INDEX
-#define RDMA_NLDEV_ATTR_DEV_INDEX 1
-#endif
-#ifndef HAVE_RDMA_NLDEV_ATTR_DEV_NAME
-#define RDMA_NLDEV_ATTR_DEV_NAME 2
-#endif
-#ifndef HAVE_RDMA_NLDEV_ATTR_PORT_INDEX
-#define RDMA_NLDEV_ATTR_PORT_INDEX 3
-#endif
-#ifndef HAVE_RDMA_NLDEV_ATTR_NDEV_INDEX
-#define RDMA_NLDEV_ATTR_NDEV_INDEX 50
-#endif
-
-/* These are normally found in linux/if_link.h. */
-#ifndef HAVE_IFLA_NUM_VF
-#define IFLA_NUM_VF 21
-#endif
-#ifndef HAVE_IFLA_EXT_MASK
-#define IFLA_EXT_MASK 29
-#endif
-#ifndef HAVE_IFLA_PHYS_SWITCH_ID
-#define IFLA_PHYS_SWITCH_ID 36
-#endif
-#ifndef HAVE_IFLA_PHYS_PORT_NAME
-#define IFLA_PHYS_PORT_NAME 38
-#endif
-
-/* Add/remove MAC address through Netlink */
-struct mlx5_nl_mac_addr {
-	struct rte_ether_addr (*mac)[];
-	/**< MAC address handled by the device. */
-	int mac_n; /**< Number of addresses in the array. */
-};
-
-#define MLX5_NL_CMD_GET_IB_NAME (1 << 0)
-#define MLX5_NL_CMD_GET_IB_INDEX (1 << 1)
-#define MLX5_NL_CMD_GET_NET_INDEX (1 << 2)
-#define MLX5_NL_CMD_GET_PORT_INDEX (1 << 3)
-
-/** Data structure used by mlx5_nl_cmdget_cb(). */
-struct mlx5_nl_ifindex_data {
-	const char *name; /**< IB device name (in). */
-	uint32_t flags; /**< found attribute flags (out). */
-	uint32_t ibindex; /**< IB device index (out). */
-	uint32_t ifindex; /**< Network interface index (out). */
-	uint32_t portnum; /**< IB device max port number (out). */
-};
-
-rte_atomic32_t atomic_sn = RTE_ATOMIC32_INIT(0);
-
-/* Generate Netlink sequence number. */
-#define MLX5_NL_SN_GENERATE ((uint32_t)rte_atomic32_add_return(&atomic_sn, 1))
-
-/**
- * Opens a Netlink socket.
- *
- * @param protocol
- *   Netlink protocol (e.g. NETLINK_ROUTE, NETLINK_RDMA).
- *
- * @return
- *   A file descriptor on success, a negative errno value otherwise and
- *   rte_errno is set.
- */
-int
-mlx5_nl_init(int protocol)
-{
-	int fd;
-	int sndbuf_size = MLX5_SEND_BUF_SIZE;
-	int rcvbuf_size = MLX5_RECV_BUF_SIZE;
-	struct sockaddr_nl local = {
-		.nl_family = AF_NETLINK,
-	};
-	int ret;
-
-	fd = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC, protocol);
-	if (fd == -1) {
-		rte_errno = errno;
-		return -rte_errno;
-	}
-	ret = setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &sndbuf_size, sizeof(int));
-	if (ret == -1) {
-		rte_errno = errno;
-		goto error;
-	}
-	ret = setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf_size, sizeof(int));
-	if (ret == -1) {
-		rte_errno = errno;
-		goto error;
-	}
-	ret = bind(fd, (struct sockaddr *)&local, sizeof(local));
-	if (ret == -1) {
-		rte_errno = errno;
-		goto error;
-	}
-	return fd;
-error:
-	close(fd);
-	return -rte_errno;
-}
-
-/**
- * Send a request message to the kernel on the Netlink socket.
- *
- * @param[in] nlsk_fd
- *   Netlink socket file descriptor.
- * @param[in] nh
- *   The Netlink message send to the kernel.
- * @param[in] ssn
- *   Sequence number.
- * @param[in] req
- *   Pointer to the request structure.
- * @param[in] len
- *   Length of the request in bytes.
- *
- * @return
- *   The number of sent bytes on success, a negative errno value otherwise and
- *   rte_errno is set.
- */
-static int
-mlx5_nl_request(int nlsk_fd, struct nlmsghdr *nh, uint32_t sn, void *req,
-		int len)
-{
-	struct sockaddr_nl sa = {
-		.nl_family = AF_NETLINK,
-	};
-	struct iovec iov[2] = {
-		{ .iov_base = nh, .iov_len = sizeof(*nh), },
-		{ .iov_base = req, .iov_len = len, },
-	};
-	struct msghdr msg = {
-		.msg_name = &sa,
-		.msg_namelen = sizeof(sa),
-		.msg_iov = iov,
-		.msg_iovlen = 2,
-	};
-	int send_bytes;
-
-	nh->nlmsg_pid = 0; /* communication with the kernel uses pid 0 */
-	nh->nlmsg_seq = sn;
-	send_bytes = sendmsg(nlsk_fd, &msg, 0);
-	if (send_bytes < 0) {
-		rte_errno = errno;
-		return -rte_errno;
-	}
-	return send_bytes;
-}
-
-/**
- * Send a message to the kernel on the Netlink socket.
- *
- * @param[in] nlsk_fd
- *   The Netlink socket file descriptor used for communication.
- * @param[in] nh
- *   The Netlink message send to the kernel.
- * @param[in] sn
- *   Sequence number.
- *
- * @return
- *   The number of sent bytes on success, a negative errno value otherwise and
- *   rte_errno is set.
- */
-static int
-mlx5_nl_send(int nlsk_fd, struct nlmsghdr *nh, uint32_t sn)
-{
-	struct sockaddr_nl sa = {
-		.nl_family = AF_NETLINK,
-	};
-	struct iovec iov = {
-		.iov_base = nh,
-		.iov_len = nh->nlmsg_len,
-	};
-	struct msghdr msg = {
-		.msg_name = &sa,
-		.msg_namelen = sizeof(sa),
-		.msg_iov = &iov,
-		.msg_iovlen = 1,
-	};
-	int send_bytes;
-
-	nh->nlmsg_pid = 0; /* communication with the kernel uses pid 0 */
-	nh->nlmsg_seq = sn;
-	send_bytes = sendmsg(nlsk_fd, &msg, 0);
-	if (send_bytes < 0) {
-		rte_errno = errno;
-		return -rte_errno;
-	}
-	return send_bytes;
-}
-
-/**
- * Receive a message from the kernel on the Netlink socket, following
- * mlx5_nl_send().
- *
- * @param[in] nlsk_fd
- *   The Netlink socket file descriptor used for communication.
- * @param[in] sn
- *   Sequence number.
- * @param[in] cb
- *   The callback function to call for each Netlink message received.
- * @param[in, out] arg
- *   Custom arguments for the callback.
- *
- * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
- */
-static int
-mlx5_nl_recv(int nlsk_fd, uint32_t sn, int (*cb)(struct nlmsghdr *, void *arg),
-	     void *arg)
-{
-	struct sockaddr_nl sa;
-	char buf[MLX5_RECV_BUF_SIZE];
-	struct iovec iov = {
-		.iov_base = buf,
-		.iov_len = sizeof(buf),
-	};
-	struct msghdr msg = {
-		.msg_name = &sa,
-		.msg_namelen = sizeof(sa),
-		.msg_iov = &iov,
-		/* One message at a time */
-		.msg_iovlen = 1,
-	};
-	int multipart = 0;
-	int ret = 0;
-
-	do {
-		struct nlmsghdr *nh;
-		int recv_bytes = 0;
-
-		do {
-			recv_bytes = recvmsg(nlsk_fd, &msg, 0);
-			if (recv_bytes == -1) {
-				rte_errno = errno;
-				return -rte_errno;
-			}
-			nh = (struct nlmsghdr *)buf;
-		} while (nh->nlmsg_seq != sn);
-		for (;
-		     NLMSG_OK(nh, (unsigned int)recv_bytes);
-		     nh = NLMSG_NEXT(nh, recv_bytes)) {
-			if (nh->nlmsg_type == NLMSG_ERROR) {
-				struct nlmsgerr *err_data = NLMSG_DATA(nh);
-
-				if (err_data->error < 0) {
-					rte_errno = -err_data->error;
-					return -rte_errno;
-				}
-				/* Ack message. */
-				return 0;
-			}
-			/* Multi-part msgs and their trailing DONE message. */
-			if (nh->nlmsg_flags & NLM_F_MULTI) {
-				if (nh->nlmsg_type == NLMSG_DONE)
-					return 0;
-				multipart = 1;
-			}
-			if (cb) {
-				ret = cb(nh, arg);
-				if (ret < 0)
-					return ret;
-			}
-		}
-	} while (multipart);
-	return ret;
-}
-
-/**
- * Parse Netlink message to retrieve the bridge MAC address.
- *
- * @param nh
- *   Pointer to Netlink Message Header.
- * @param arg
- *   PMD data register with this callback.
- *
- * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
- */
-static int
-mlx5_nl_mac_addr_cb(struct nlmsghdr *nh, void *arg)
-{
-	struct mlx5_nl_mac_addr *data = arg;
-	struct ndmsg *r = NLMSG_DATA(nh);
-	struct rtattr *attribute;
-	int len;
-
-	len = nh->nlmsg_len - NLMSG_LENGTH(sizeof(*r));
-	for (attribute = MLX5_NDA_RTA(r);
-	     RTA_OK(attribute, len);
-	     attribute = RTA_NEXT(attribute, len)) {
-		if (attribute->rta_type == NDA_LLADDR) {
-			if (data->mac_n == MLX5_MAX_MAC_ADDRESSES) {
-				DRV_LOG(WARNING,
-					"not enough room to finalize the"
-					" request");
-				rte_errno = ENOMEM;
-				return -rte_errno;
-			}
-#ifndef NDEBUG
-			char m[18];
-
-			rte_ether_format_addr(m, 18, RTA_DATA(attribute));
-			DRV_LOG(DEBUG, "bridge MAC address %s", m);
-#endif
-			memcpy(&(*data->mac)[data->mac_n++],
-			       RTA_DATA(attribute), RTE_ETHER_ADDR_LEN);
-		}
-	}
-	return 0;
-}
-
-/**
- * Get bridge MAC addresses.
- *
- * @param[in] nlsk_fd
- *   Netlink socket file descriptor.
- * @param[in] iface_idx
- *   Net device interface index.
- * @param mac[out]
- *   Pointer to the array table of MAC addresses to fill.
- *   Its size should be of MLX5_MAX_MAC_ADDRESSES.
- * @param mac_n[out]
- *   Number of entries filled in MAC array.
- *
- * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
- */
-static int
-mlx5_nl_mac_addr_list(int nlsk_fd, unsigned int iface_idx,
-		      struct rte_ether_addr (*mac)[], int *mac_n)
-{
-	struct {
-		struct nlmsghdr	hdr;
-		struct ifinfomsg ifm;
-	} req = {
-		.hdr = {
-			.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
-			.nlmsg_type = RTM_GETNEIGH,
-			.nlmsg_flags = NLM_F_DUMP | NLM_F_REQUEST,
-		},
-		.ifm = {
-			.ifi_family = PF_BRIDGE,
-			.ifi_index = iface_idx,
-		},
-	};
-	struct mlx5_nl_mac_addr data = {
-		.mac = mac,
-		.mac_n = 0,
-	};
-	uint32_t sn = MLX5_NL_SN_GENERATE;
-	int ret;
-
-	if (nlsk_fd == -1)
-		return 0;
-	ret = mlx5_nl_request(nlsk_fd, &req.hdr, sn, &req.ifm,
-			      sizeof(struct ifinfomsg));
-	if (ret < 0)
-		goto error;
-	ret = mlx5_nl_recv(nlsk_fd, sn, mlx5_nl_mac_addr_cb, &data);
-	if (ret < 0)
-		goto error;
-	*mac_n = data.mac_n;
-	return 0;
-error:
-	DRV_LOG(DEBUG, "Interface %u cannot retrieve MAC address list %s",
-		iface_idx, strerror(rte_errno));
-	return -rte_errno;
-}
-
-/**
- * Modify the MAC address neighbour table with Netlink.
- *
- * @param[in] nlsk_fd
- *   Netlink socket file descriptor.
- * @param[in] iface_idx
- *   Net device interface index.
- * @param mac
- *   MAC address to consider.
- * @param add
- *   1 to add the MAC address, 0 to remove the MAC address.
- *
- * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
- */
-static int
-mlx5_nl_mac_addr_modify(int nlsk_fd, unsigned int iface_idx,
-			struct rte_ether_addr *mac, int add)
-{
-	struct {
-		struct nlmsghdr hdr;
-		struct ndmsg ndm;
-		struct rtattr rta;
-		uint8_t buffer[RTE_ETHER_ADDR_LEN];
-	} req = {
-		.hdr = {
-			.nlmsg_len = NLMSG_LENGTH(sizeof(struct ndmsg)),
-			.nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE |
-				NLM_F_EXCL | NLM_F_ACK,
-			.nlmsg_type = add ? RTM_NEWNEIGH : RTM_DELNEIGH,
-		},
-		.ndm = {
-			.ndm_family = PF_BRIDGE,
-			.ndm_state = NUD_NOARP | NUD_PERMANENT,
-			.ndm_ifindex = iface_idx,
-			.ndm_flags = NTF_SELF,
-		},
-		.rta = {
-			.rta_type = NDA_LLADDR,
-			.rta_len = RTA_LENGTH(RTE_ETHER_ADDR_LEN),
-		},
-	};
-	uint32_t sn = MLX5_NL_SN_GENERATE;
-	int ret;
-
-	if (nlsk_fd == -1)
-		return 0;
-	memcpy(RTA_DATA(&req.rta), mac, RTE_ETHER_ADDR_LEN);
-	req.hdr.nlmsg_len = NLMSG_ALIGN(req.hdr.nlmsg_len) +
-		RTA_ALIGN(req.rta.rta_len);
-	ret = mlx5_nl_send(nlsk_fd, &req.hdr, sn);
-	if (ret < 0)
-		goto error;
-	ret = mlx5_nl_recv(nlsk_fd, sn, NULL, NULL);
-	if (ret < 0)
-		goto error;
-	return 0;
-error:
-	DRV_LOG(DEBUG,
-		"Interface %u cannot %s MAC address"
-		" %02X:%02X:%02X:%02X:%02X:%02X %s",
-		iface_idx,
-		add ? "add" : "remove",
-		mac->addr_bytes[0], mac->addr_bytes[1],
-		mac->addr_bytes[2], mac->addr_bytes[3],
-		mac->addr_bytes[4], mac->addr_bytes[5],
-		strerror(rte_errno));
-	return -rte_errno;
-}
-
-/**
- * Modify the VF MAC address neighbour table with Netlink.
- *
- * @param[in] nlsk_fd
- *   Netlink socket file descriptor.
- * @param[in] iface_idx
- *   Net device interface index.
- * @param mac
- *    MAC address to consider.
- * @param vf_index
- *    VF index.
- *
- * @return
- *    0 on success, a negative errno value otherwise and rte_errno is set.
- */
-int
-mlx5_nl_vf_mac_addr_modify(int nlsk_fd, unsigned int iface_idx,
-			   struct rte_ether_addr *mac, int vf_index)
-{
-	int ret;
-	struct {
-		struct nlmsghdr hdr;
-		struct ifinfomsg ifm;
-		struct rtattr vf_list_rta;
-		struct rtattr vf_info_rta;
-		struct rtattr vf_mac_rta;
-		struct ifla_vf_mac ivm;
-	} req = {
-		.hdr = {
-			.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
-			.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK,
-			.nlmsg_type = RTM_BASE,
-		},
-		.ifm = {
-			.ifi_index = iface_idx,
-		},
-		.vf_list_rta = {
-			.rta_type = IFLA_VFINFO_LIST,
-			.rta_len = RTA_ALIGN(RTA_LENGTH(0)),
-		},
-		.vf_info_rta = {
-			.rta_type = IFLA_VF_INFO,
-			.rta_len = RTA_ALIGN(RTA_LENGTH(0)),
-		},
-		.vf_mac_rta = {
-			.rta_type = IFLA_VF_MAC,
-		},
-	};
-	struct ifla_vf_mac ivm = {
-		.vf = vf_index,
-	};
-	uint32_t sn = MLX5_NL_SN_GENERATE;
-
-	memcpy(&ivm.mac, mac, RTE_ETHER_ADDR_LEN);
-	memcpy(RTA_DATA(&req.vf_mac_rta), &ivm, sizeof(ivm));
-
-	req.vf_mac_rta.rta_len = RTA_LENGTH(sizeof(ivm));
-	req.hdr.nlmsg_len = NLMSG_ALIGN(req.hdr.nlmsg_len) +
-		RTA_ALIGN(req.vf_list_rta.rta_len) +
-		RTA_ALIGN(req.vf_info_rta.rta_len) +
-		RTA_ALIGN(req.vf_mac_rta.rta_len);
-	req.vf_list_rta.rta_len = RTE_PTR_DIFF(NLMSG_TAIL(&req.hdr),
-					       &req.vf_list_rta);
-	req.vf_info_rta.rta_len = RTE_PTR_DIFF(NLMSG_TAIL(&req.hdr),
-					       &req.vf_info_rta);
-
-	if (nlsk_fd < 0)
-		return -1;
-	ret = mlx5_nl_send(nlsk_fd, &req.hdr, sn);
-	if (ret < 0)
-		goto error;
-	ret = mlx5_nl_recv(nlsk_fd, sn, NULL, NULL);
-	if (ret < 0)
-		goto error;
-	return 0;
-error:
-	DRV_LOG(ERR,
-		"representor %u cannot set VF MAC address "
-		"%02X:%02X:%02X:%02X:%02X:%02X : %s",
-		vf_index,
-		mac->addr_bytes[0], mac->addr_bytes[1],
-		mac->addr_bytes[2], mac->addr_bytes[3],
-		mac->addr_bytes[4], mac->addr_bytes[5],
-		strerror(rte_errno));
-	return -rte_errno;
-}
-
-/**
- * Add a MAC address.
- *
- * @param[in] nlsk_fd
- *   Netlink socket file descriptor.
- * @param[in] iface_idx
- *   Net device interface index.
- * @param mac_own
- *   BITFIELD_DECLARE array to store the mac.
- * @param mac
- *   MAC address to register.
- * @param index
- *   MAC address index.
- *
- * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
- */
-int
-mlx5_nl_mac_addr_add(int nlsk_fd, unsigned int iface_idx,
-		     uint64_t *mac_own, struct rte_ether_addr *mac,
-		     uint32_t index)
-{
-	int ret;
-
-	ret = mlx5_nl_mac_addr_modify(nlsk_fd, iface_idx, mac, 1);
-	if (!ret)
-		BITFIELD_SET(mac_own, index);
-	if (ret == -EEXIST)
-		return 0;
-	return ret;
-}
-
-/**
- * Remove a MAC address.
- *
- * @param[in] nlsk_fd
- *   Netlink socket file descriptor.
- * @param[in] iface_idx
- *   Net device interface index.
- * @param mac_own
- *   BITFIELD_DECLARE array to store the mac.
- * @param mac
- *   MAC address to remove.
- * @param index
- *   MAC address index.
- *
- * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
- */
-int
-mlx5_nl_mac_addr_remove(int nlsk_fd, unsigned int iface_idx, uint64_t *mac_own,
-			struct rte_ether_addr *mac, uint32_t index)
-{
-	BITFIELD_RESET(mac_own, index);
-	return mlx5_nl_mac_addr_modify(nlsk_fd, iface_idx, mac, 0);
-}
-
-/**
- * Synchronize Netlink bridge table to the internal table.
- *
- * @param[in] nlsk_fd
- *   Netlink socket file descriptor.
- * @param[in] iface_idx
- *   Net device interface index.
- * @param mac_addrs
- *   Mac addresses array to sync.
- * @param n
- *   @p mac_addrs array size.
- */
-void
-mlx5_nl_mac_addr_sync(int nlsk_fd, unsigned int iface_idx,
-		      struct rte_ether_addr *mac_addrs, int n)
-{
-	struct rte_ether_addr macs[n];
-	int macs_n = 0;
-	int i;
-	int ret;
-
-	ret = mlx5_nl_mac_addr_list(nlsk_fd, iface_idx, &macs, &macs_n);
-	if (ret)
-		return;
-	for (i = 0; i != macs_n; ++i) {
-		int j;
-
-		/* Verify the address is not in the array yet. */
-		for (j = 0; j != n; ++j)
-			if (rte_is_same_ether_addr(&macs[i], &mac_addrs[j]))
-				break;
-		if (j != n)
-			continue;
-		/* Find the first entry available. */
-		for (j = 0; j != n; ++j) {
-			if (rte_is_zero_ether_addr(&mac_addrs[j])) {
-				mac_addrs[j] = macs[i];
-				break;
-			}
-		}
-	}
-}
-
-/**
- * Flush all added MAC addresses.
- *
- * @param[in] nlsk_fd
- *   Netlink socket file descriptor.
- * @param[in] iface_idx
- *   Net device interface index.
- * @param[in] mac_addrs
- *   Mac addresses array to flush.
- * @param n
- *   @p mac_addrs array size.
- * @param mac_own
- *   BITFIELD_DECLARE array to store the mac.
- */
-void
-mlx5_nl_mac_addr_flush(int nlsk_fd, unsigned int iface_idx,
-		       struct rte_ether_addr *mac_addrs, int n,
-		       uint64_t *mac_own)
-{
-	int i;
-
-	for (i = n - 1; i >= 0; --i) {
-		struct rte_ether_addr *m = &mac_addrs[i];
-
-		if (BITFIELD_ISSET(mac_own, i))
-			mlx5_nl_mac_addr_remove(nlsk_fd, iface_idx, mac_own, m,
-						i);
-	}
-}
-
-/**
- * Enable promiscuous / all multicast mode through Netlink.
- *
- * @param[in] nlsk_fd
- *   Netlink socket file descriptor.
- * @param[in] iface_idx
- *   Net device interface index.
- * @param flags
- *   IFF_PROMISC for promiscuous, IFF_ALLMULTI for allmulti.
- * @param enable
- *   Nonzero to enable, disable otherwise.
- *
- * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
- */
-static int
-mlx5_nl_device_flags(int nlsk_fd, unsigned int iface_idx, uint32_t flags,
-		     int enable)
-{
-	struct {
-		struct nlmsghdr hdr;
-		struct ifinfomsg ifi;
-	} req = {
-		.hdr = {
-			.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
-			.nlmsg_type = RTM_NEWLINK,
-			.nlmsg_flags = NLM_F_REQUEST,
-		},
-		.ifi = {
-			.ifi_flags = enable ? flags : 0,
-			.ifi_change = flags,
-			.ifi_index = iface_idx,
-		},
-	};
-	uint32_t sn = MLX5_NL_SN_GENERATE;
-	int ret;
-
-	assert(!(flags & ~(IFF_PROMISC | IFF_ALLMULTI)));
-	if (nlsk_fd < 0)
-		return 0;
-	ret = mlx5_nl_send(nlsk_fd, &req.hdr, sn);
-	if (ret < 0)
-		return ret;
-	return 0;
-}
-
-/**
- * Enable promiscuous mode through Netlink.
- *
- * @param[in] nlsk_fd
- *   Netlink socket file descriptor.
- * @param[in] iface_idx
- *   Net device interface index.
- * @param enable
- *   Nonzero to enable, disable otherwise.
- *
- * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
- */
-int
-mlx5_nl_promisc(int nlsk_fd, unsigned int iface_idx, int enable)
-{
-	int ret = mlx5_nl_device_flags(nlsk_fd, iface_idx, IFF_PROMISC, enable);
-
-	if (ret)
-		DRV_LOG(DEBUG,
-			"Interface %u cannot %s promisc mode: Netlink error %s",
-			iface_idx, enable ? "enable" : "disable",
-			strerror(rte_errno));
-	return ret;
-}
-
-/**
- * Enable all multicast mode through Netlink.
- *
- * @param[in] nlsk_fd
- *   Netlink socket file descriptor.
- * @param[in] iface_idx
- *   Net device interface index.
- * @param enable
- *   Nonzero to enable, disable otherwise.
- *
- * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
- */
-int
-mlx5_nl_allmulti(int nlsk_fd, unsigned int iface_idx, int enable)
-{
-	int ret = mlx5_nl_device_flags(nlsk_fd, iface_idx, IFF_ALLMULTI,
-				       enable);
-
-	if (ret)
-		DRV_LOG(DEBUG,
-			"Interface %u cannot %s allmulti : Netlink error %s",
-			iface_idx, enable ? "enable" : "disable",
-			strerror(rte_errno));
-	return ret;
-}
-
-/**
- * Process network interface information from Netlink message.
- *
- * @param nh
- *   Pointer to Netlink message header.
- * @param arg
- *   Opaque data pointer for this callback.
- *
- * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
- */
-static int
-mlx5_nl_cmdget_cb(struct nlmsghdr *nh, void *arg)
-{
-	struct mlx5_nl_ifindex_data *data = arg;
-	struct mlx5_nl_ifindex_data local = {
-		.flags = 0,
-	};
-	size_t off = NLMSG_HDRLEN;
-
-	if (nh->nlmsg_type !=
-	    RDMA_NL_GET_TYPE(RDMA_NL_NLDEV, RDMA_NLDEV_CMD_GET) &&
-	    nh->nlmsg_type !=
-	    RDMA_NL_GET_TYPE(RDMA_NL_NLDEV, RDMA_NLDEV_CMD_PORT_GET))
-		goto error;
-	while (off < nh->nlmsg_len) {
-		struct nlattr *na = (void *)((uintptr_t)nh + off);
-		void *payload = (void *)((uintptr_t)na + NLA_HDRLEN);
-
-		if (na->nla_len > nh->nlmsg_len - off)
-			goto error;
-		switch (na->nla_type) {
-		case RDMA_NLDEV_ATTR_DEV_INDEX:
-			local.ibindex = *(uint32_t *)payload;
-			local.flags |= MLX5_NL_CMD_GET_IB_INDEX;
-			break;
-		case RDMA_NLDEV_ATTR_DEV_NAME:
-			if (!strcmp(payload, data->name))
-				local.flags |= MLX5_NL_CMD_GET_IB_NAME;
-			break;
-		case RDMA_NLDEV_ATTR_NDEV_INDEX:
-			local.ifindex = *(uint32_t *)payload;
-			local.flags |= MLX5_NL_CMD_GET_NET_INDEX;
-			break;
-		case RDMA_NLDEV_ATTR_PORT_INDEX:
-			local.portnum = *(uint32_t *)payload;
-			local.flags |= MLX5_NL_CMD_GET_PORT_INDEX;
-			break;
-		default:
-			break;
-		}
-		off += NLA_ALIGN(na->nla_len);
-	}
-	/*
-	 * It is possible to have multiple messages for all
-	 * Infiniband devices in the system with appropriate name.
-	 * So we should gather parameters locally and copy to
-	 * query context only in case of coinciding device name.
-	 */
-	if (local.flags & MLX5_NL_CMD_GET_IB_NAME) {
-		data->flags = local.flags;
-		data->ibindex = local.ibindex;
-		data->ifindex = local.ifindex;
-		data->portnum = local.portnum;
-	}
-	return 0;
-error:
-	rte_errno = EINVAL;
-	return -rte_errno;
-}
-
-/**
- * Get index of network interface associated with some IB device.
- *
- * This is the only somewhat safe method to avoid resorting to heuristics
- * when faced with port representors. Unfortunately it requires at least
- * Linux 4.17.
- *
- * @param nl
- *   Netlink socket of the RDMA kind (NETLINK_RDMA).
- * @param[in] name
- *   IB device name.
- * @param[in] pindex
- *   IB device port index, starting from 1
- * @return
- *   A valid (nonzero) interface index on success, 0 otherwise and rte_errno
- *   is set.
- */
-unsigned int
-mlx5_nl_ifindex(int nl, const char *name, uint32_t pindex)
-{
-	struct mlx5_nl_ifindex_data data = {
-		.name = name,
-		.flags = 0,
-		.ibindex = 0, /* Determined during first pass. */
-		.ifindex = 0, /* Determined during second pass. */
-	};
-	union {
-		struct nlmsghdr nh;
-		uint8_t buf[NLMSG_HDRLEN +
-			    NLA_HDRLEN + NLA_ALIGN(sizeof(data.ibindex)) +
-			    NLA_HDRLEN + NLA_ALIGN(sizeof(pindex))];
-	} req = {
-		.nh = {
-			.nlmsg_len = NLMSG_LENGTH(0),
-			.nlmsg_type = RDMA_NL_GET_TYPE(RDMA_NL_NLDEV,
-						       RDMA_NLDEV_CMD_GET),
-			.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK | NLM_F_DUMP,
-		},
-	};
-	struct nlattr *na;
-	uint32_t sn = MLX5_NL_SN_GENERATE;
-	int ret;
-
-	ret = mlx5_nl_send(nl, &req.nh, sn);
-	if (ret < 0)
-		return 0;
-	ret = mlx5_nl_recv(nl, sn, mlx5_nl_cmdget_cb, &data);
-	if (ret < 0)
-		return 0;
-	if (!(data.flags & MLX5_NL_CMD_GET_IB_NAME) ||
-	    !(data.flags & MLX5_NL_CMD_GET_IB_INDEX))
-		goto error;
-	data.flags = 0;
-	sn = MLX5_NL_SN_GENERATE;
-	req.nh.nlmsg_type = RDMA_NL_GET_TYPE(RDMA_NL_NLDEV,
-					     RDMA_NLDEV_CMD_PORT_GET);
-	req.nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
-	req.nh.nlmsg_len = NLMSG_LENGTH(sizeof(req.buf) - NLMSG_HDRLEN);
-	na = (void *)((uintptr_t)req.buf + NLMSG_HDRLEN);
-	na->nla_len = NLA_HDRLEN + sizeof(data.ibindex);
-	na->nla_type = RDMA_NLDEV_ATTR_DEV_INDEX;
-	memcpy((void *)((uintptr_t)na + NLA_HDRLEN),
-	       &data.ibindex, sizeof(data.ibindex));
-	na = (void *)((uintptr_t)na + NLA_ALIGN(na->nla_len));
-	na->nla_len = NLA_HDRLEN + sizeof(pindex);
-	na->nla_type = RDMA_NLDEV_ATTR_PORT_INDEX;
-	memcpy((void *)((uintptr_t)na + NLA_HDRLEN),
-	       &pindex, sizeof(pindex));
-	ret = mlx5_nl_send(nl, &req.nh, sn);
-	if (ret < 0)
-		return 0;
-	ret = mlx5_nl_recv(nl, sn, mlx5_nl_cmdget_cb, &data);
-	if (ret < 0)
-		return 0;
-	if (!(data.flags & MLX5_NL_CMD_GET_IB_NAME) ||
-	    !(data.flags & MLX5_NL_CMD_GET_IB_INDEX) ||
-	    !(data.flags & MLX5_NL_CMD_GET_NET_INDEX) ||
-	    !data.ifindex)
-		goto error;
-	return data.ifindex;
-error:
-	rte_errno = ENODEV;
-	return 0;
-}
-
-/**
- * Get the number of physical ports of given IB device.
- *
- * @param nl
- *   Netlink socket of the RDMA kind (NETLINK_RDMA).
- * @param[in] name
- *   IB device name.
- *
- * @return
- *   A valid (nonzero) number of ports on success, 0 otherwise
- *   and rte_errno is set.
- */
-unsigned int
-mlx5_nl_portnum(int nl, const char *name)
-{
-	struct mlx5_nl_ifindex_data data = {
-		.flags = 0,
-		.name = name,
-		.ifindex = 0,
-		.portnum = 0,
-	};
-	struct nlmsghdr req = {
-		.nlmsg_len = NLMSG_LENGTH(0),
-		.nlmsg_type = RDMA_NL_GET_TYPE(RDMA_NL_NLDEV,
-					       RDMA_NLDEV_CMD_GET),
-		.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK | NLM_F_DUMP,
-	};
-	uint32_t sn = MLX5_NL_SN_GENERATE;
-	int ret;
-
-	ret = mlx5_nl_send(nl, &req, sn);
-	if (ret < 0)
-		return 0;
-	ret = mlx5_nl_recv(nl, sn, mlx5_nl_cmdget_cb, &data);
-	if (ret < 0)
-		return 0;
-	if (!(data.flags & MLX5_NL_CMD_GET_IB_NAME) ||
-	    !(data.flags & MLX5_NL_CMD_GET_IB_INDEX) ||
-	    !(data.flags & MLX5_NL_CMD_GET_PORT_INDEX)) {
-		rte_errno = ENODEV;
-		return 0;
-	}
-	if (!data.portnum)
-		rte_errno = EINVAL;
-	return data.portnum;
-}
-
-/**
- * Analyze gathered port parameters via Netlink to recognize master
- * and representor devices for E-Switch configuration.
- *
- * @param[in] num_vf_set
- *   flag of presence of number of VFs port attribute.
- * @param[inout] switch_info
- *   Port information, including port name as a number and port name
- *   type if recognized
- *
- * @return
- *   master and representor flags are set in switch_info according to
- *   recognized parameters (if any).
- */
-static void
-mlx5_nl_check_switch_info(bool num_vf_set,
-			  struct mlx5_switch_info *switch_info)
-{
-	switch (switch_info->name_type) {
-	case MLX5_PHYS_PORT_NAME_TYPE_UNKNOWN:
-		/*
-		 * Name is not recognized, assume the master,
-		 * check the number of VFs key presence.
-		 */
-		switch_info->master = num_vf_set;
-		break;
-	case MLX5_PHYS_PORT_NAME_TYPE_NOTSET:
-		/*
-		 * Name is not set, this assumes the legacy naming
-		 * schema for master, just check if there is a
-		 * number of VFs key.
-		 */
-		switch_info->master = num_vf_set;
-		break;
-	case MLX5_PHYS_PORT_NAME_TYPE_UPLINK:
-		/* New uplink naming schema recognized. */
-		switch_info->master = 1;
-		break;
-	case MLX5_PHYS_PORT_NAME_TYPE_LEGACY:
-		/* Legacy representors naming schema. */
-		switch_info->representor = !num_vf_set;
-		break;
-	case MLX5_PHYS_PORT_NAME_TYPE_PFVF:
-		/* New representors naming schema. */
-		switch_info->representor = 1;
-		break;
-	}
-}
-
-/**
- * Process switch information from Netlink message.
- *
- * @param nh
- *   Pointer to Netlink message header.
- * @param arg
- *   Opaque data pointer for this callback.
- *
- * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
- */
-static int
-mlx5_nl_switch_info_cb(struct nlmsghdr *nh, void *arg)
-{
-	struct mlx5_switch_info info = {
-		.master = 0,
-		.representor = 0,
-		.name_type = MLX5_PHYS_PORT_NAME_TYPE_NOTSET,
-		.port_name = 0,
-		.switch_id = 0,
-	};
-	size_t off = NLMSG_LENGTH(sizeof(struct ifinfomsg));
-	bool switch_id_set = false;
-	bool num_vf_set = false;
-
-	if (nh->nlmsg_type != RTM_NEWLINK)
-		goto error;
-	while (off < nh->nlmsg_len) {
-		struct rtattr *ra = (void *)((uintptr_t)nh + off);
-		void *payload = RTA_DATA(ra);
-		unsigned int i;
-
-		if (ra->rta_len > nh->nlmsg_len - off)
-			goto error;
-		switch (ra->rta_type) {
-		case IFLA_NUM_VF:
-			num_vf_set = true;
-			break;
-		case IFLA_PHYS_PORT_NAME:
-			mlx5_translate_port_name((char *)payload, &info);
-			break;
-		case IFLA_PHYS_SWITCH_ID:
-			info.switch_id = 0;
-			for (i = 0; i < RTA_PAYLOAD(ra); ++i) {
-				info.switch_id <<= 8;
-				info.switch_id |= ((uint8_t *)payload)[i];
-			}
-			switch_id_set = true;
-			break;
-		}
-		off += RTA_ALIGN(ra->rta_len);
-	}
-	if (switch_id_set) {
-		/* We have some E-Switch configuration. */
-		mlx5_nl_check_switch_info(num_vf_set, &info);
-	}
-	assert(!(info.master && info.representor));
-	memcpy(arg, &info, sizeof(info));
-	return 0;
-error:
-	rte_errno = EINVAL;
-	return -rte_errno;
-}
-
-/**
- * Get switch information associated with network interface.
- *
- * @param nl
- *   Netlink socket of the ROUTE kind (NETLINK_ROUTE).
- * @param ifindex
- *   Network interface index.
- * @param[out] info
- *   Switch information object, populated in case of success.
- *
- * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
- */
-int
-mlx5_nl_switch_info(int nl, unsigned int ifindex,
-		    struct mlx5_switch_info *info)
-{
-	struct {
-		struct nlmsghdr nh;
-		struct ifinfomsg info;
-		struct rtattr rta;
-		uint32_t extmask;
-	} req = {
-		.nh = {
-			.nlmsg_len = NLMSG_LENGTH
-					(sizeof(req.info) +
-					 RTA_LENGTH(sizeof(uint32_t))),
-			.nlmsg_type = RTM_GETLINK,
-			.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK,
-		},
-		.info = {
-			.ifi_family = AF_UNSPEC,
-			.ifi_index = ifindex,
-		},
-		.rta = {
-			.rta_type = IFLA_EXT_MASK,
-			.rta_len = RTA_LENGTH(sizeof(int32_t)),
-		},
-		.extmask = RTE_LE32(1),
-	};
-	uint32_t sn = MLX5_NL_SN_GENERATE;
-	int ret;
-
-	ret = mlx5_nl_send(nl, &req.nh, sn);
-	if (ret >= 0)
-		ret = mlx5_nl_recv(nl, sn, mlx5_nl_switch_info_cb, info);
-	if (info->master && info->representor) {
-		DRV_LOG(ERR, "ifindex %u device is recognized as master"
-			     " and as representor", ifindex);
-		rte_errno = ENODEV;
-		ret = -rte_errno;
-	}
-	return ret;
-}
-
-/*
- * Delete VLAN network device by ifindex.
- *
- * @param[in] tcf
- *   Context object initialized by mlx5_nl_vlan_vmwa_init().
- * @param[in] ifindex
- *   Interface index of network device to delete.
- */
-void
-mlx5_nl_vlan_vmwa_delete(struct mlx5_nl_vlan_vmwa_context *vmwa,
-		      uint32_t ifindex)
-{
-	uint32_t sn = MLX5_NL_SN_GENERATE;
-	int ret;
-	struct {
-		struct nlmsghdr nh;
-		struct ifinfomsg info;
-	} req = {
-		.nh = {
-			.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
-			.nlmsg_type = RTM_DELLINK,
-			.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK,
-		},
-		.info = {
-			.ifi_family = AF_UNSPEC,
-			.ifi_index = ifindex,
-		},
-	};
-
-	if (ifindex) {
-		ret = mlx5_nl_send(vmwa->nl_socket, &req.nh, sn);
-		if (ret >= 0)
-			ret = mlx5_nl_recv(vmwa->nl_socket, sn, NULL, NULL);
-		if (ret < 0)
-			DRV_LOG(WARNING, "netlink: error deleting VLAN WA"
-				" ifindex %u, %d", ifindex, ret);
-	}
-}
-
-/* Set of subroutines to build Netlink message. */
-static struct nlattr *
-nl_msg_tail(struct nlmsghdr *nlh)
-{
-	return (struct nlattr *)
-		(((uint8_t *)nlh) + NLMSG_ALIGN(nlh->nlmsg_len));
-}
-
-static void
-nl_attr_put(struct nlmsghdr *nlh, int type, const void *data, int alen)
-{
-	struct nlattr *nla = nl_msg_tail(nlh);
-
-	nla->nla_type = type;
-	nla->nla_len = NLMSG_ALIGN(sizeof(struct nlattr) + alen);
-	nlh->nlmsg_len = NLMSG_ALIGN(nlh->nlmsg_len) + nla->nla_len;
-
-	if (alen)
-		memcpy((uint8_t *)nla + sizeof(struct nlattr), data, alen);
-}
-
-static struct nlattr *
-nl_attr_nest_start(struct nlmsghdr *nlh, int type)
-{
-	struct nlattr *nest = (struct nlattr *)nl_msg_tail(nlh);
-
-	nl_attr_put(nlh, type, NULL, 0);
-	return nest;
-}
-
-static void
-nl_attr_nest_end(struct nlmsghdr *nlh, struct nlattr *nest)
-{
-	nest->nla_len = (uint8_t *)nl_msg_tail(nlh) - (uint8_t *)nest;
-}
-
-/*
- * Create network VLAN device with specified VLAN tag.
- *
- * @param[in] tcf
- *   Context object initialized by mlx5_nl_vlan_vmwa_init().
- * @param[in] ifindex
- *   Base network interface index.
- * @param[in] tag
- *   VLAN tag for VLAN network device to create.
- */
-uint32_t
-mlx5_nl_vlan_vmwa_create(struct mlx5_nl_vlan_vmwa_context *vmwa,
-			 uint32_t ifindex, uint16_t tag)
-{
-	struct nlmsghdr *nlh;
-	struct ifinfomsg *ifm;
-	char name[sizeof(MLX5_VMWA_VLAN_DEVICE_PFX) + 32];
-
-	alignas(RTE_CACHE_LINE_SIZE)
-	uint8_t buf[NLMSG_ALIGN(sizeof(struct nlmsghdr)) +
-		    NLMSG_ALIGN(sizeof(struct ifinfomsg)) +
-		    NLMSG_ALIGN(sizeof(struct nlattr)) * 8 +
-		    NLMSG_ALIGN(sizeof(uint32_t)) +
-		    NLMSG_ALIGN(sizeof(name)) +
-		    NLMSG_ALIGN(sizeof("vlan")) +
-		    NLMSG_ALIGN(sizeof(uint32_t)) +
-		    NLMSG_ALIGN(sizeof(uint16_t)) + 16];
-	struct nlattr *na_info;
-	struct nlattr *na_vlan;
-	uint32_t sn = MLX5_NL_SN_GENERATE;
-	int ret;
-
-	memset(buf, 0, sizeof(buf));
-	nlh = (struct nlmsghdr *)buf;
-	nlh->nlmsg_len = sizeof(struct nlmsghdr);
-	nlh->nlmsg_type = RTM_NEWLINK;
-	nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE |
-			   NLM_F_EXCL | NLM_F_ACK;
-	ifm = (struct ifinfomsg *)nl_msg_tail(nlh);
-	nlh->nlmsg_len += sizeof(struct ifinfomsg);
-	ifm->ifi_family = AF_UNSPEC;
-	ifm->ifi_type = 0;
-	ifm->ifi_index = 0;
-	ifm->ifi_flags = IFF_UP;
-	ifm->ifi_change = 0xffffffff;
-	nl_attr_put(nlh, IFLA_LINK, &ifindex, sizeof(ifindex));
-	ret = snprintf(name, sizeof(name), "%s.%u.%u",
-		       MLX5_VMWA_VLAN_DEVICE_PFX, ifindex, tag);
-	nl_attr_put(nlh, IFLA_IFNAME, name, ret + 1);
-	na_info = nl_attr_nest_start(nlh, IFLA_LINKINFO);
-	nl_attr_put(nlh, IFLA_INFO_KIND, "vlan", sizeof("vlan"));
-	na_vlan = nl_attr_nest_start(nlh, IFLA_INFO_DATA);
-	nl_attr_put(nlh, IFLA_VLAN_ID, &tag, sizeof(tag));
-	nl_attr_nest_end(nlh, na_vlan);
-	nl_attr_nest_end(nlh, na_info);
-	assert(sizeof(buf) >= nlh->nlmsg_len);
-	ret = mlx5_nl_send(vmwa->nl_socket, nlh, sn);
-	if (ret >= 0)
-		ret = mlx5_nl_recv(vmwa->nl_socket, sn, NULL, NULL);
-	if (ret < 0) {
-		DRV_LOG(WARNING, "netlink: VLAN %s create failure (%d)", name,
-			ret);
-	}
-	// Try to get ifindex of created or pre-existing device.
-	ret = if_nametoindex(name);
-	if (!ret) {
-		DRV_LOG(WARNING, "VLAN %s failed to get index (%d)", name,
-			errno);
-		return 0;
-	}
-	return ret;
-}
diff --git a/drivers/net/mlx5/mlx5_nl.h b/drivers/net/mlx5/mlx5_nl.h
deleted file mode 100644
index 9be87c0..0000000
--- a/drivers/net/mlx5/mlx5_nl.h
+++ /dev/null
@@ -1,72 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause
- * Copyright 2019 Mellanox Technologies, Ltd
- */
-
-#ifndef RTE_PMD_MLX5_NL_H_
-#define RTE_PMD_MLX5_NL_H_
-
-#include <linux/netlink.h>
-
-
-/* Recognized Infiniband device physical port name types. */
-enum mlx5_nl_phys_port_name_type {
-	MLX5_PHYS_PORT_NAME_TYPE_NOTSET = 0, /* Not set. */
-	MLX5_PHYS_PORT_NAME_TYPE_LEGACY, /* before kernel ver < 5.0 */
-	MLX5_PHYS_PORT_NAME_TYPE_UPLINK, /* p0, kernel ver >= 5.0 */
-	MLX5_PHYS_PORT_NAME_TYPE_PFVF, /* pf0vf0, kernel ver >= 5.0 */
-	MLX5_PHYS_PORT_NAME_TYPE_UNKNOWN, /* Unrecognized. */
-};
-
-/** Switch information returned by mlx5_nl_switch_info(). */
-struct mlx5_switch_info {
-	uint32_t master:1; /**< Master device. */
-	uint32_t representor:1; /**< Representor device. */
-	enum mlx5_nl_phys_port_name_type name_type; /** < Port name type. */
-	int32_t pf_num; /**< PF number (valid for pfxvfx format only). */
-	int32_t port_name; /**< Representor port name. */
-	uint64_t switch_id; /**< Switch identifier. */
-};
-
-/* VLAN netdev for VLAN workaround. */
-struct mlx5_nl_vlan_dev {
-	uint32_t refcnt;
-	uint32_t ifindex; /**< Own interface index. */
-};
-
-/*
- * Array of VLAN devices created on the base of VF
- * used for workaround in virtual environments.
- */
-struct mlx5_nl_vlan_vmwa_context {
-	int nl_socket;
-	uint32_t vf_ifindex;
-	struct mlx5_nl_vlan_dev vlan_dev[4096];
-};
-
-
-int mlx5_nl_init(int protocol);
-int mlx5_nl_mac_addr_add(int nlsk_fd, unsigned int iface_idx, uint64_t *mac_own,
-			 struct rte_ether_addr *mac, uint32_t index);
-int mlx5_nl_mac_addr_remove(int nlsk_fd, unsigned int iface_idx,
-			    uint64_t *mac_own, struct rte_ether_addr *mac,
-			    uint32_t index);
-void mlx5_nl_mac_addr_sync(int nlsk_fd, unsigned int iface_idx,
-			   struct rte_ether_addr *mac_addrs, int n);
-void mlx5_nl_mac_addr_flush(int nlsk_fd, unsigned int iface_idx,
-			    struct rte_ether_addr *mac_addrs, int n,
-			    uint64_t *mac_own);
-int mlx5_nl_promisc(int nlsk_fd, unsigned int iface_idx, int enable);
-int mlx5_nl_allmulti(int nlsk_fd, unsigned int iface_idx, int enable);
-unsigned int mlx5_nl_portnum(int nl, const char *name);
-unsigned int mlx5_nl_ifindex(int nl, const char *name, uint32_t pindex);
-int mlx5_nl_vf_mac_addr_modify(int nlsk_fd, unsigned int iface_idx,
-			       struct rte_ether_addr *mac, int vf_index);
-int mlx5_nl_switch_info(int nl, unsigned int ifindex,
-			struct mlx5_switch_info *info);
-
-void mlx5_nl_vlan_vmwa_delete(struct mlx5_nl_vlan_vmwa_context *vmwa,
-			      uint32_t ifindex);
-uint32_t mlx5_nl_vlan_vmwa_create(struct mlx5_nl_vlan_vmwa_context *vmwa,
-				  uint32_t ifindex, uint16_t tag);
-
-#endif /* RTE_PMD_MLX5_NL_H_ */
diff --git a/drivers/net/mlx5/mlx5_vlan.c b/drivers/net/mlx5/mlx5_vlan.c
index fc1a91c..8e63b67 100644
--- a/drivers/net/mlx5/mlx5_vlan.c
+++ b/drivers/net/mlx5/mlx5_vlan.c
@@ -33,11 +33,11 @@
 
 #include <mlx5_glue.h>
 #include <mlx5_devx_cmds.h>
+#include <mlx5_nl.h>
 
 #include "mlx5.h"
 #include "mlx5_autoconf.h"
 #include "mlx5_rxtx.h"
-#include "mlx5_nl.h"
 #include "mlx5_utils.h"
 
 /**
-- 
1.8.3.1
* [dpdk-dev] [PATCH v1 37/38] common/mlx5: support ROCE disable through Netlink
  2020-01-20 17:02 [dpdk-dev] [PATCH v1 00/38] Introduce mlx5 vDPA driver Matan Azrad
                   ` (35 preceding siblings ...)
  2020-01-20 17:03 ` [dpdk-dev] [PATCH v1 36/38] mlx5: share Netlink commands Matan Azrad
@ 2020-01-20 17:03 ` Matan Azrad
  2020-01-20 17:03 ` [dpdk-dev] [PATCH v1 38/38] vdpa/mlx5: disable ROCE Matan Azrad
                   ` (2 subsequent siblings)
  39 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-20 17:03 UTC (permalink / raw)
  To: dev; +Cc: Maxime Coquelin, Thomas Monjalon
Add 4 new Netlink commands to support enabling/disabling ROCE:
        1. mlx5_nl_devlink_family_id_get - to get the Devlink family ID
           through generic Netlink.
        2. mlx5_nl_enable_roce_get - to get the current ROCE status.
        3. mlx5_nl_driver_reload - to reload the device kernel driver.
        4. mlx5_nl_enable_roce_set - to set the ROCE status.
When the user changes the ROCE status, the IB device may disappear and
appear again, so the DPDK driver should wait for it and restart itself.
Signed-off-by: Matan Azrad <matan@mellanox.com>
---
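Note (not part of the commit): the nl_attr_put() hunk in this patch stores
the unpadded attribute length in nla_len and advances nlmsg_len by the
padded length. Below is a minimal, self-contained sketch of that length
arithmetic. The names attr_hdr, ALIGN4, attr_put and attr_find are
hypothetical stand-ins for struct nlattr, NLMSG_ALIGN and the driver
helpers; this is an illustration under those assumptions, not the driver
code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for NLMSG_ALIGN: round up to a 4-byte boundary. */
#define ALIGN4(len) (((size_t)(len) + 3) & ~(size_t)3)

/* Hypothetical mirror of struct nlattr: total length and attribute type. */
struct attr_hdr {
	uint16_t len; /* Header plus payload, without trailing padding. */
	uint16_t type;
};

/*
 * Append one attribute at offset *used in a zeroed buffer.
 * The header records the unpadded length; the running offset advances
 * by the padded length so the next attribute starts 4-byte aligned.
 */
static void
attr_put(uint8_t *buf, size_t cap, size_t *used, uint16_t type,
	 const void *data, uint16_t alen)
{
	struct attr_hdr hdr;

	hdr.len = (uint16_t)(ALIGN4(sizeof(hdr)) + alen);
	hdr.type = type;
	assert(*used + ALIGN4(hdr.len) <= cap);
	memcpy(buf + *used, &hdr, sizeof(hdr));
	if (alen)
		memcpy(buf + *used + ALIGN4(sizeof(hdr)), data, alen);
	*used += ALIGN4(hdr.len);
}

/* Walk the attributes; return the payload of the first match or NULL. */
static const void *
attr_find(const uint8_t *buf, size_t used, uint16_t type, uint16_t *alen)
{
	size_t off = 0;

	while (off + sizeof(struct attr_hdr) <= used) {
		struct attr_hdr hdr;

		memcpy(&hdr, buf + off, sizeof(hdr));
		if (hdr.len < sizeof(hdr))
			break; /* Malformed attribute, stop walking. */
		if (hdr.type == type) {
			*alen = (uint16_t)(hdr.len - ALIGN4(sizeof(hdr)));
			return buf + off + ALIGN4(sizeof(hdr));
		}
		off += ALIGN4(hdr.len);
	}
	return NULL;
}
```

With this layout a 3-byte payload yields a recorded length of 7 while the
offset still advances by 8, which is what lets a reader step from one
attribute to the next using the aligned length.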
 drivers/common/mlx5/Makefile                    |   5 +
 drivers/common/mlx5/meson.build                 |   1 +
 drivers/common/mlx5/mlx5_nl.c                   | 366 +++++++++++++++++++++++-
 drivers/common/mlx5/mlx5_nl.h                   |   6 +
 drivers/common/mlx5/rte_common_mlx5_version.map |   4 +
 5 files changed, 380 insertions(+), 2 deletions(-)
diff --git a/drivers/common/mlx5/Makefile b/drivers/common/mlx5/Makefile
index 60bec3f..c4b7999 100644
--- a/drivers/common/mlx5/Makefile
+++ b/drivers/common/mlx5/Makefile
@@ -261,6 +261,11 @@ mlx5_autoconf.h.new: $(RTE_SDK)/buildtools/auto-config-h.sh
 		enum IFLA_PHYS_PORT_NAME \
 		$(AUTOCONF_OUTPUT)
 	$Q sh -- '$<' '$@' \
+		HAVE_DEVLINK \
+		linux/devlink.h \
+		define DEVLINK_GENL_NAME \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
 		HAVE_SUPPORTED_40000baseKR4_Full \
 		/usr/include/linux/ethtool.h \
 		define SUPPORTED_40000baseKR4_Full \
diff --git a/drivers/common/mlx5/meson.build b/drivers/common/mlx5/meson.build
index 46c7c3bb..6dad6e2 100644
--- a/drivers/common/mlx5/meson.build
+++ b/drivers/common/mlx5/meson.build
@@ -168,6 +168,7 @@ if build
 		'RDMA_NLDEV_ATTR_NDEV_INDEX' ],
 		[ 'HAVE_MLX5_DR_FLOW_DUMP', 'infiniband/mlx5dv.h',
 		'mlx5dv_dump_dr_domain'],
+		[ 'HAVE_DEVLINK', 'linux/devlink.h', 'DEVLINK_GENL_NAME' ],
 	]
 	config = configuration_data()
 	foreach arg:has_sym_args
diff --git a/drivers/common/mlx5/mlx5_nl.c b/drivers/common/mlx5/mlx5_nl.c
index b4fc053..0d1efd2 100644
--- a/drivers/common/mlx5/mlx5_nl.c
+++ b/drivers/common/mlx5/mlx5_nl.c
@@ -6,6 +6,7 @@
 #include <errno.h>
 #include <linux/if_link.h>
 #include <linux/rtnetlink.h>
+#include <linux/genetlink.h>
 #include <net/if.h>
 #include <rdma/rdma_netlink.h>
 #include <stdbool.h>
@@ -22,6 +23,10 @@
 
 #include "mlx5_nl.h"
 #include "mlx5_common_utils.h"
+#ifdef HAVE_DEVLINK
+#include <linux/devlink.h>
+#endif
+
 
 /* Size of the buffer to receive kernel messages */
 #define MLX5_NL_BUF_SIZE (32 * 1024)
@@ -90,6 +95,59 @@
 #define IFLA_PHYS_PORT_NAME 38
 #endif
 
+/*
+ * Some Devlink defines may be missed in old kernel versions,
+ * adjust used defines.
+ */
+#ifndef DEVLINK_GENL_NAME
+#define DEVLINK_GENL_NAME "devlink"
+#endif
+#ifndef DEVLINK_GENL_VERSION
+#define DEVLINK_GENL_VERSION 1
+#endif
+#ifndef DEVLINK_ATTR_BUS_NAME
+#define DEVLINK_ATTR_BUS_NAME 1
+#endif
+#ifndef DEVLINK_ATTR_DEV_NAME
+#define DEVLINK_ATTR_DEV_NAME 2
+#endif
+#ifndef DEVLINK_ATTR_PARAM
+#define DEVLINK_ATTR_PARAM 80
+#endif
+#ifndef DEVLINK_ATTR_PARAM_NAME
+#define DEVLINK_ATTR_PARAM_NAME 81
+#endif
+#ifndef DEVLINK_ATTR_PARAM_TYPE
+#define DEVLINK_ATTR_PARAM_TYPE 83
+#endif
+#ifndef DEVLINK_ATTR_PARAM_VALUES_LIST
+#define DEVLINK_ATTR_PARAM_VALUES_LIST 84
+#endif
+#ifndef DEVLINK_ATTR_PARAM_VALUE
+#define DEVLINK_ATTR_PARAM_VALUE 85
+#endif
+#ifndef DEVLINK_ATTR_PARAM_VALUE_DATA
+#define DEVLINK_ATTR_PARAM_VALUE_DATA 86
+#endif
+#ifndef DEVLINK_ATTR_PARAM_VALUE_CMODE
+#define DEVLINK_ATTR_PARAM_VALUE_CMODE 87
+#endif
+#ifndef DEVLINK_PARAM_CMODE_DRIVERINIT
+#define DEVLINK_PARAM_CMODE_DRIVERINIT 1
+#endif
+#ifndef DEVLINK_CMD_RELOAD
+#define DEVLINK_CMD_RELOAD 37
+#endif
+#ifndef DEVLINK_CMD_PARAM_GET
+#define DEVLINK_CMD_PARAM_GET 38
+#endif
+#ifndef DEVLINK_CMD_PARAM_SET
+#define DEVLINK_CMD_PARAM_SET 39
+#endif
+#ifndef NLA_FLAG
+#define NLA_FLAG 6
+#endif
+
 /* Add/remove MAC address through Netlink */
 struct mlx5_nl_mac_addr {
 	struct rte_ether_addr (*mac)[];
@@ -1241,8 +1299,8 @@ struct mlx5_nl_ifindex_data {
 	struct nlattr *nla = nl_msg_tail(nlh);
 
 	nla->nla_type = type;
-	nla->nla_len = NLMSG_ALIGN(sizeof(struct nlattr) + alen);
-	nlh->nlmsg_len = NLMSG_ALIGN(nlh->nlmsg_len) + nla->nla_len;
+	nla->nla_len = NLMSG_ALIGN(sizeof(struct nlattr)) + alen;
+	nlh->nlmsg_len += NLMSG_ALIGN(nla->nla_len);
 
 	if (alen)
 		memcpy((uint8_t *)nla + sizeof(struct nlattr), data, alen);
@@ -1335,3 +1393,307 @@ struct mlx5_nl_ifindex_data {
 	}
 	return ret;
 }
+
+/**
+ * Parse Netlink message to retrieve the general family ID.
+ *
+ * @param nh
+ *   Pointer to Netlink Message Header.
+ * @param arg
+ *   PMD data register with this callback.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_nl_family_id_cb(struct nlmsghdr *nh, void *arg)
+{
+
+	struct nlattr *tail = RTE_PTR_ADD(nh, nh->nlmsg_len);
+	struct nlattr *nla = RTE_PTR_ADD(nh, NLMSG_ALIGN(sizeof(*nh)) +
+					NLMSG_ALIGN(sizeof(struct genlmsghdr)));
+
+	for (; nla->nla_len && nla < tail;
+	     nla = RTE_PTR_ADD(nla, NLMSG_ALIGN(nla->nla_len))) {
+		if (nla->nla_type == CTRL_ATTR_FAMILY_ID) {
+			*(uint16_t *)arg = *(uint16_t *)(nla + 1);
+			return 0;
+		}
+	}
+	return -EINVAL;
+}
+
+#define MLX5_NL_MAX_ATTR_SIZE 100
+/**
+ * Get generic netlink family ID.
+ *
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] name
+ *   The family name.
+ *
+ * @return
+ *   ID >= 0 on success, a negative errno value otherwise and rte_errno
+ *   is set.
+ */
+static int
+mlx5_nl_generic_family_id_get(int nlsk_fd, const char *name)
+{
+	struct nlmsghdr *nlh;
+	struct genlmsghdr *genl;
+	uint32_t sn = MLX5_NL_SN_GENERATE;
+	int name_size = strlen(name) + 1;
+	int ret;
+	uint16_t id = -1;
+	uint8_t buf[NLMSG_ALIGN(sizeof(struct nlmsghdr)) +
+		    NLMSG_ALIGN(sizeof(struct genlmsghdr)) +
+		    NLMSG_ALIGN(sizeof(struct nlattr)) +
+		    NLMSG_ALIGN(MLX5_NL_MAX_ATTR_SIZE)];
+
+	memset(buf, 0, sizeof(buf));
+	nlh = (struct nlmsghdr *)buf;
+	nlh->nlmsg_len = sizeof(struct nlmsghdr);
+	nlh->nlmsg_type = GENL_ID_CTRL;
+	nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
+	genl = (struct genlmsghdr *)nl_msg_tail(nlh);
+	nlh->nlmsg_len += sizeof(struct genlmsghdr);
+	genl->cmd = CTRL_CMD_GETFAMILY;
+	genl->version = 1;
+	nl_attr_put(nlh, CTRL_ATTR_FAMILY_NAME, name, name_size);
+	ret = mlx5_nl_send(nlsk_fd, nlh, sn);
+	if (ret >= 0)
+		ret = mlx5_nl_recv(nlsk_fd, sn, mlx5_nl_family_id_cb, &id);
+	if (ret < 0) {
+		DRV_LOG(DEBUG, "Failed to get Netlink %s family ID: %d.", name,
+			ret);
+		return ret;
+	}
+	DRV_LOG(DEBUG, "Netlink \"%s\" family ID is %u.", name, id);
+	return (int)id;
+}
+
+/**
+ * Get Devlink family ID.
+ *
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ *
+ * @return
+ *   ID >= 0 on success, a negative errno value otherwise and rte_errno
+ *   is set.
+ */
+
+int
+mlx5_nl_devlink_family_id_get(int nlsk_fd)
+{
+	return mlx5_nl_generic_family_id_get(nlsk_fd, DEVLINK_GENL_NAME);
+}
+
+/**
+ * Parse Netlink message to retrieve the ROCE enable status.
+ *
+ * @param nh
+ *   Pointer to Netlink Message Header.
+ * @param arg
+ *   PMD data register with this callback.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_nl_roce_cb(struct nlmsghdr *nh, void *arg)
+{
+
+	int ret = -EINVAL;
+	int *enable = arg;
+	struct nlattr *tail = RTE_PTR_ADD(nh, nh->nlmsg_len);
+	struct nlattr *nla = RTE_PTR_ADD(nh, NLMSG_ALIGN(sizeof(*nh)) +
+					NLMSG_ALIGN(sizeof(struct genlmsghdr)));
+
+	while (nla->nla_len && nla < tail) {
+		switch (nla->nla_type) {
+		/* Expected nested attributes case. */
+		case DEVLINK_ATTR_PARAM:
+		case DEVLINK_ATTR_PARAM_VALUES_LIST:
+		case DEVLINK_ATTR_PARAM_VALUE:
+			ret = 0;
+			nla += 1;
+			break;
+		case DEVLINK_ATTR_PARAM_VALUE_DATA:
+			*enable = 1;
+			return 0;
+		default:
+			nla = RTE_PTR_ADD(nla, NLMSG_ALIGN(nla->nla_len));
+		}
+	}
+	*enable = 0;
+	return ret;
+}
+
+/**
+ * Get ROCE enable status through Netlink.
+ *
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] family_id
+ *   the Devlink family ID.
+ * @param pci_addr
+ *   The device PCI address.
+ * @param[out] enable
+ *   Where to store the enable status.
+ *
+ * @return
+ *   0 on success and @p enable is updated, a negative errno value otherwise
+ *   and rte_errno is set.
+ */
+int
+mlx5_nl_enable_roce_get(int nlsk_fd, int family_id, const char *pci_addr,
+			int *enable)
+{
+	struct nlmsghdr *nlh;
+	struct genlmsghdr *genl;
+	uint32_t sn = MLX5_NL_SN_GENERATE;
+	int ret;
+	int cur_en;
+	uint8_t buf[NLMSG_ALIGN(sizeof(struct nlmsghdr)) +
+		    NLMSG_ALIGN(sizeof(struct genlmsghdr)) +
+		    NLMSG_ALIGN(sizeof(struct nlattr)) * 4 +
+		    NLMSG_ALIGN(MLX5_NL_MAX_ATTR_SIZE) * 4];
+
+	memset(buf, 0, sizeof(buf));
+	nlh = (struct nlmsghdr *)buf;
+	nlh->nlmsg_len = sizeof(struct nlmsghdr);
+	nlh->nlmsg_type = family_id;
+	nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
+	genl = (struct genlmsghdr *)nl_msg_tail(nlh);
+	nlh->nlmsg_len += sizeof(struct genlmsghdr);
+	genl->cmd = DEVLINK_CMD_PARAM_GET;
+	genl->version = DEVLINK_GENL_VERSION;
+	nl_attr_put(nlh, DEVLINK_ATTR_BUS_NAME, "pci", 4);
+	nl_attr_put(nlh, DEVLINK_ATTR_DEV_NAME, pci_addr, strlen(pci_addr) + 1);
+	nl_attr_put(nlh, DEVLINK_ATTR_PARAM_NAME, "enable_roce", 12);
+	ret = mlx5_nl_send(nlsk_fd, nlh, sn);
+	if (ret >= 0)
+		ret = mlx5_nl_recv(nlsk_fd, sn, mlx5_nl_roce_cb, &cur_en);
+	if (ret < 0) {
+		DRV_LOG(DEBUG, "Failed to get ROCE enable on device %s: %d.",
+			pci_addr, ret);
+		return ret;
+	}
+	*enable = cur_en;
+	DRV_LOG(DEBUG, "ROCE is %sabled for device \"%s\".",
+		cur_en ? "en" : "dis", pci_addr);
+	return ret;
+}
+
+/**
+ * Reload mlx5 device kernel driver through Netlink.
+ *
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] family_id
+ *   the Devlink family ID.
+ * @param[in] pci_addr
+ *   The device PCI address.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_nl_driver_reload(int nlsk_fd, int family_id, const char *pci_addr)
+{
+	struct nlmsghdr *nlh;
+	struct genlmsghdr *genl;
+	uint32_t sn = MLX5_NL_SN_GENERATE;
+	int ret;
+	uint8_t buf[NLMSG_ALIGN(sizeof(struct nlmsghdr)) +
+		    NLMSG_ALIGN(sizeof(struct genlmsghdr)) +
+		    NLMSG_ALIGN(sizeof(struct nlattr)) * 2 +
+		    NLMSG_ALIGN(MLX5_NL_MAX_ATTR_SIZE) * 2];
+
+	memset(buf, 0, sizeof(buf));
+	nlh = (struct nlmsghdr *)buf;
+	nlh->nlmsg_len = sizeof(struct nlmsghdr);
+	nlh->nlmsg_type = family_id;
+	nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
+	genl = (struct genlmsghdr *)nl_msg_tail(nlh);
+	nlh->nlmsg_len += sizeof(struct genlmsghdr);
+	genl->cmd = DEVLINK_CMD_RELOAD;
+	genl->version = DEVLINK_GENL_VERSION;
+	nl_attr_put(nlh, DEVLINK_ATTR_BUS_NAME, "pci", 4);
+	nl_attr_put(nlh, DEVLINK_ATTR_DEV_NAME, pci_addr, strlen(pci_addr) + 1);
+	ret = mlx5_nl_send(nlsk_fd, nlh, sn);
+	if (ret >= 0)
+		ret = mlx5_nl_recv(nlsk_fd, sn, NULL, NULL);
+	if (ret < 0) {
+		DRV_LOG(DEBUG, "Failed to reload %s device by Netlink - %d",
+			pci_addr, ret);
+		return ret;
+	}
+	DRV_LOG(DEBUG, "Device \"%s\" was reloaded by Netlink successfully.",
+		pci_addr);
+	return 0;
+}
+
+/**
+ * Set ROCE enable status through Netlink.
+ *
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] family_id
+ *   the Devlink family ID.
+ * @param pci_addr
+ *   The device PCI address.
+ * @param[in] enable
+ *   The enable status to set.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_nl_enable_roce_set(int nlsk_fd, int family_id, const char *pci_addr,
+			int enable)
+{
+	struct nlmsghdr *nlh;
+	struct genlmsghdr *genl;
+	uint32_t sn = MLX5_NL_SN_GENERATE;
+	int ret;
+	uint8_t buf[NLMSG_ALIGN(sizeof(struct nlmsghdr)) +
+		    NLMSG_ALIGN(sizeof(struct genlmsghdr)) +
+		    NLMSG_ALIGN(sizeof(struct nlattr)) * 6 +
+		    NLMSG_ALIGN(MLX5_NL_MAX_ATTR_SIZE) * 6];
+	uint8_t cmode = DEVLINK_PARAM_CMODE_DRIVERINIT;
+	uint8_t ptype = NLA_FLAG;
+
+	memset(buf, 0, sizeof(buf));
+	nlh = (struct nlmsghdr *)buf;
+	nlh->nlmsg_len = sizeof(struct nlmsghdr);
+	nlh->nlmsg_type = family_id;
+	nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
+	genl = (struct genlmsghdr *)nl_msg_tail(nlh);
+	nlh->nlmsg_len += sizeof(struct genlmsghdr);
+	genl->cmd = DEVLINK_CMD_PARAM_SET;
+	genl->version = DEVLINK_GENL_VERSION;
+	nl_attr_put(nlh, DEVLINK_ATTR_BUS_NAME, "pci", 4);
+	nl_attr_put(nlh, DEVLINK_ATTR_DEV_NAME, pci_addr, strlen(pci_addr) + 1);
+	nl_attr_put(nlh, DEVLINK_ATTR_PARAM_NAME, "enable_roce", 12);
+	nl_attr_put(nlh, DEVLINK_ATTR_PARAM_VALUE_CMODE, &cmode, sizeof(cmode));
+	nl_attr_put(nlh, DEVLINK_ATTR_PARAM_TYPE, &ptype, sizeof(ptype));
+	if (enable)
+		nl_attr_put(nlh, DEVLINK_ATTR_PARAM_VALUE_DATA, NULL, 0);
+	ret = mlx5_nl_send(nlsk_fd, nlh, sn);
+	if (ret >= 0)
+		ret = mlx5_nl_recv(nlsk_fd, sn, NULL, NULL);
+	if (ret < 0) {
+		DRV_LOG(DEBUG, "Failed to %sable ROCE for device %s by Netlink:"
+			" %d.", enable ? "en" : "dis", pci_addr, ret);
+		return ret;
+	}
+	DRV_LOG(DEBUG, "Device %s ROCE was %sabled by Netlink successfully.",
+		pci_addr, enable ? "en" : "dis");
+	/* Now, need to reload the driver. */
+	return mlx5_nl_driver_reload(nlsk_fd, family_id, pci_addr);
+}
diff --git a/drivers/common/mlx5/mlx5_nl.h b/drivers/common/mlx5/mlx5_nl.h
index 8e66a98..2c3f837 100644
--- a/drivers/common/mlx5/mlx5_nl.h
+++ b/drivers/common/mlx5/mlx5_nl.h
@@ -53,5 +53,11 @@ void mlx5_nl_vlan_vmwa_delete(struct mlx5_nl_vlan_vmwa_context *vmwa,
 			      uint32_t ifindex);
 uint32_t mlx5_nl_vlan_vmwa_create(struct mlx5_nl_vlan_vmwa_context *vmwa,
 				  uint32_t ifindex, uint16_t tag);
+int mlx5_nl_devlink_family_id_get(int nlsk_fd);
+int mlx5_nl_enable_roce_get(int nlsk_fd, int family_id, const char *pci_addr,
+			    int *enable);
+int mlx5_nl_driver_reload(int nlsk_fd, int family_id, const char *pci_addr);
+int mlx5_nl_enable_roce_set(int nlsk_fd, int family_id, const char *pci_addr,
+			    int enable);
 
 #endif /* RTE_PMD_MLX5_NL_H_ */
diff --git a/drivers/common/mlx5/rte_common_mlx5_version.map b/drivers/common/mlx5/rte_common_mlx5_version.map
index 318a024..959f12c 100644
--- a/drivers/common/mlx5/rte_common_mlx5_version.map
+++ b/drivers/common/mlx5/rte_common_mlx5_version.map
@@ -26,6 +26,10 @@ DPDK_20.02 {
 	mlx5_dev_to_pci_addr;
 
 	mlx5_nl_allmulti;
+	mlx5_nl_devlink_family_id_get;
+	mlx5_nl_driver_reload;
+	mlx5_nl_enable_roce_get;
+	mlx5_nl_enable_roce_set;
 	mlx5_nl_ifindex;
 	mlx5_nl_init;
 	mlx5_nl_mac_addr_add;
-- 
1.8.3.1
* [dpdk-dev] [PATCH v1 38/38] vdpa/mlx5: disable ROCE
  2020-01-20 17:02 [dpdk-dev] [PATCH v1 00/38] Introduce mlx5 vDPA driver Matan Azrad
                   ` (36 preceding siblings ...)
  2020-01-20 17:03 ` [dpdk-dev] [PATCH v1 37/38] common/mlx5: support ROCE disable through Netlink Matan Azrad
@ 2020-01-20 17:03 ` Matan Azrad
  2020-01-28 10:05 ` [dpdk-dev] [PATCH v2 00/25] Introduce mlx5 common library Matan Azrad
  2020-01-29 10:08 ` [dpdk-dev] [PATCH v2 00/13] Introduce mlx5 vDPA driver Matan Azrad
  39 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-20 17:03 UTC (permalink / raw)
  To: dev; +Cc: Maxime Coquelin, Thomas Monjalon
In order to support virtio queue creation by the FW, ROCE mode
should be disabled in the device.
Do it by Netlink, which is equivalent to the devlink tool commands:
	1. devlink dev param set pci/[pci] name enable_roce value false
	   cmode driverinit
	2. devlink dev reload pci/[pci]
Or by sysfs, which is equivalent to:
	echo 0 > /sys/bus/pci/devices/[pci]/roce_enable
The IB device is matched again after ROCE is disabled.
Signed-off-by: Matan Azrad <matan@mellanox.com>
---
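Note (not part of the commit): the flow described above is "try Netlink,
fall back to sysfs, then poll until the IB device reappears". Below is a
minimal, self-contained sketch of that control flow. All names here
(disable_fn, lookup_fn, sleep_fn, disable_then_rematch and the stub_*
helpers) are hypothetical, and the error codes are an assumption of the
sketch; the patch itself calls concrete driver functions and returns
-rte_errno.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback types standing in for the driver functions. */
typedef int (*disable_fn)(const char *addr);  /* Returns 0 on success. */
typedef void *(*lookup_fn)(const char *addr); /* Returns NULL if absent. */
typedef void (*sleep_fn)(unsigned int usec);

/*
 * Try each disable method in order until one succeeds, then poll for the
 * device up to max_retries times, waiting between attempts.
 * Returns 0 and stores the device in *dev, or a negative errno value.
 */
static int
disable_then_rematch(const char *addr, const disable_fn *methods,
		     size_t n_methods, lookup_fn lookup, sleep_fn wait,
		     unsigned int max_retries, unsigned int usec, void **dev)
{
	size_t i;
	unsigned int r;

	for (i = 0; i < n_methods; i++)
		if (methods[i](addr) == 0)
			break;
	if (i == n_methods)
		return -ENOTSUP; /* No method worked. */
	for (r = max_retries; r; r--) {
		void *found = lookup(addr);

		if (found) {
			*dev = found;
			return 0;
		}
		wait(usec);
	}
	return -EAGAIN; /* Device did not reappear in time. */
}

/* Illustrative stubs only: a failing method, a working one, a fake device. */
static int stub_lookup_calls;
static int stub_wait_calls;
static int stub_appear_after = 3;
static int stub_device;

static int
stub_disable_fail(const char *addr)
{
	(void)addr;
	return -ENOTSUP;
}

static int
stub_disable_ok(const char *addr)
{
	(void)addr;
	return 0;
}

static void *
stub_lookup(const char *addr)
{
	(void)addr;
	/* The fake device shows up on the stub_appear_after-th lookup. */
	return ++stub_lookup_calls >= stub_appear_after ? &stub_device : NULL;
}

static void
stub_wait(unsigned int usec)
{
	(void)usec;
	stub_wait_calls++;
}
```

Passing the sleep function as a callback keeps the retry loop testable
without real delays; a caller that wants actual waiting could pass a thin
wrapper around usleep().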
 drivers/vdpa/mlx5/Makefile    |   2 +-
 drivers/vdpa/mlx5/meson.build |   2 +-
 drivers/vdpa/mlx5/mlx5_vdpa.c | 190 +++++++++++++++++++++++++++++++++++-------
 3 files changed, 160 insertions(+), 34 deletions(-)
diff --git a/drivers/vdpa/mlx5/Makefile b/drivers/vdpa/mlx5/Makefile
index 62938b8..3dee270 100644
--- a/drivers/vdpa/mlx5/Makefile
+++ b/drivers/vdpa/mlx5/Makefile
@@ -29,7 +29,7 @@ CFLAGS += -D_XOPEN_SOURCE=600
 CFLAGS += $(WERROR_FLAGS)
 CFLAGS += -Wno-strict-prototypes
 LDLIBS += -lrte_common_mlx5
-LDLIBS += -lrte_eal -lrte_vhost -lrte_kvargs -lrte_bus_pci -lrte_sched
+LDLIBS += -lrte_eal -lrte_vhost -lrte_kvargs -lrte_pci -lrte_bus_pci -lrte_sched
 
 # A few warnings cannot be avoided in external headers.
 CFLAGS += -Wno-error=cast-qual
diff --git a/drivers/vdpa/mlx5/meson.build b/drivers/vdpa/mlx5/meson.build
index 60cefd7..7e46e34 100644
--- a/drivers/vdpa/mlx5/meson.build
+++ b/drivers/vdpa/mlx5/meson.build
@@ -9,7 +9,7 @@ endif
 
 fmt_name = 'mlx5_vdpa'
 allow_experimental_apis = true
-deps += ['hash', 'common_mlx5', 'vhost', 'bus_pci', 'eal', 'sched']
+deps += ['hash', 'common_mlx5', 'vhost', 'pci', 'bus_pci', 'eal', 'sched']
 sources = files(
 	'mlx5_vdpa.c',
 	'mlx5_vdpa_mem.c',
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index 2ceef18..5ea5739 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -1,15 +1,19 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright 2019 Mellanox Technologies, Ltd
  */
+#include <unistd.h>
+
 #include <rte_malloc.h>
 #include <rte_log.h>
 #include <rte_errno.h>
 #include <rte_bus_pci.h>
+#include <rte_pci.h>
 
 #include <mlx5_glue.h>
 #include <mlx5_common.h>
 #include <mlx5_devx_cmds.h>
 #include <mlx5_prm.h>
+#include <mlx5_nl.h>
 
 #include "mlx5_vdpa_utils.h"
 #include "mlx5_vdpa.h"
@@ -228,6 +232,144 @@
 	.get_notify_area = NULL,
 };
 
+static struct ibv_device *
+mlx5_vdpa_get_ib_device_match(struct rte_pci_addr *addr)
+{
+	int n;
+	struct ibv_device **ibv_list = mlx5_glue->get_device_list(&n);
+	struct ibv_device *ibv_match = NULL;
+
+	if (!ibv_list) {
+		rte_errno = ENOSYS;
+		return NULL;
+	}
+	while (n-- > 0) {
+		struct rte_pci_addr pci_addr;
+
+		DRV_LOG(DEBUG, "Checking device \"%s\"..", ibv_list[n]->name);
+		if (mlx5_dev_to_pci_addr(ibv_list[n]->ibdev_path, &pci_addr))
+			continue;
+		if (memcmp(addr, &pci_addr, sizeof(pci_addr)))
+			continue;
+		ibv_match = ibv_list[n];
+		break;
+	}
+	if (!ibv_match)
+		rte_errno = ENOENT;
+	return ibv_match;
+}
+
+/* Try to disable ROCE by Netlink/Devlink. */
+static int
+mlx5_vdpa_nl_roce_disable(const char *addr)
+{
+	int nlsk_fd = mlx5_nl_init(NETLINK_GENERIC);
+	int devlink_id;
+	int enable;
+	int ret;
+
+	if (nlsk_fd < 0)
+		return nlsk_fd;
+	devlink_id = mlx5_nl_devlink_family_id_get(nlsk_fd);
+	if (devlink_id < 0) {
+		ret = devlink_id;
+		DRV_LOG(DEBUG, "Failed to get devlink id for ROCE operations by"
+			" Netlink.");
+		goto close;
+	}
+	ret = mlx5_nl_enable_roce_get(nlsk_fd, devlink_id, addr, &enable);
+	if (ret) {
+		DRV_LOG(DEBUG, "Failed to get ROCE enable by Netlink: %d.",
+			ret);
+		goto close;
+	} else if (!enable) {
+		DRV_LOG(INFO, "ROCE is already disabled (Netlink).");
+		goto close;
+	}
+	ret = mlx5_nl_enable_roce_set(nlsk_fd, devlink_id, addr, 0);
+	if (ret)
+		DRV_LOG(DEBUG, "Failed to disable ROCE by Netlink: %d.", ret);
+	else
+		DRV_LOG(INFO, "ROCE is disabled by Netlink successfully.");
+close:
+	close(nlsk_fd);
+	return ret;
+}
+
+/* Try to disable ROCE by sysfs. */
+static int
+mlx5_vdpa_sys_roce_disable(const char *addr)
+{
+	FILE *file_o;
+	int enable;
+	int ret;
+
+	MKSTR(file_p, "/sys/bus/pci/devices/%s/roce_enable", addr);
+	file_o = fopen(file_p, "rb");
+	if (!file_o) {
+		rte_errno = ENOTSUP;
+		return -ENOTSUP;
+	}
+	ret = fscanf(file_o, "%d", &enable);
+	if (ret != 1) {
+		rte_errno = EINVAL;
+		ret = -EINVAL;
+		goto close;
+	} else if (!enable) {
+		ret = 0;
+		DRV_LOG(INFO, "ROCE is already disabled (sysfs).");
+		goto close;
+	}
+	fclose(file_o);
+	file_o = fopen(file_p, "wb");
+	if (!file_o) {
+		rte_errno = ENOTSUP;
+		return -ENOTSUP;
+	}
+	fprintf(file_o, "0\n");
+	ret = 0;
+close:
+	if (ret)
+		DRV_LOG(DEBUG, "Failed to disable ROCE by sysfs: %d.", ret);
+	else
+		DRV_LOG(INFO, "ROCE is disabled by sysfs successfully.");
+	fclose(file_o);
+	return ret;
+}
+
+#define MLX5_VDPA_MAX_RETRIES 20
+#define MLX5_VDPA_USEC 1000
+static int
+mlx5_vdpa_roce_disable(struct rte_pci_addr *addr, struct ibv_device **ibv)
+{
+	char addr_name[64] = {0};
+
+	rte_pci_device_name(addr, addr_name, sizeof(addr_name));
+	/* First try to disable ROCE by Netlink, then fall back to sysfs. */
+	if (mlx5_vdpa_nl_roce_disable(addr_name) == 0 ||
+	    mlx5_vdpa_sys_roce_disable(addr_name) == 0) {
+		/*
+		 * ROCE was disabled successfully, wait for the IB device to
+		 * appear again after the reload.
+		 */
+		int r;
+		struct ibv_device *ibv_new;
+
+		for (r = MLX5_VDPA_MAX_RETRIES; r; r--) {
+			ibv_new = mlx5_vdpa_get_ib_device_match(addr);
+			if (ibv_new) {
+				*ibv = ibv_new;
+				return 0;
+			}
+			usleep(MLX5_VDPA_USEC);
+		}
+		DRV_LOG(ERR, "Cannot match device %s after ROCE disable, "
+			"retries exceed %d", addr_name, MLX5_VDPA_MAX_RETRIES);
+		rte_errno = EAGAIN;
+	}
+	return -rte_errno;
+}
+
 /**
  * DPDK callback to register a PCI device.
  *
@@ -245,8 +387,7 @@
 mlx5_vdpa_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 		    struct rte_pci_device *pci_dev)
 {
-	struct ibv_device **ibv_list;
-	struct ibv_device *ibv_match = NULL;
+	struct ibv_device *ibv;
 	struct mlx5_vdpa_priv *priv = NULL;
 	struct ibv_context *ctx;
 	struct mlx5_hca_attr attr;
@@ -257,41 +398,26 @@
 			" devargs.");
 		return 1;
 	}
-	errno = 0;
-	ibv_list = mlx5_glue->get_device_list(&ret);
-	if (!ibv_list) {
-		rte_errno = errno;
-		DRV_LOG(ERR, "Failed to get device list, is ib_uverbs loaded?");
-		return -ENOSYS;
-	}
-	while (ret-- > 0) {
-		struct rte_pci_addr pci_addr;
-
-		DRV_LOG(DEBUG, "Checking device \"%s\"..", ibv_list[ret]->name);
-		if (mlx5_dev_to_pci_addr(ibv_list[ret]->ibdev_path, &pci_addr))
-			continue;
-		if (pci_dev->addr.domain != pci_addr.domain ||
-		    pci_dev->addr.bus != pci_addr.bus ||
-		    pci_dev->addr.devid != pci_addr.devid ||
-		    pci_dev->addr.function != pci_addr.function)
-			continue;
-		DRV_LOG(INFO, "PCI information matches for device \"%s\".",
-			ibv_list[ret]->name);
-		ibv_match = ibv_list[ret];
-		break;
-	}
-	if (!ibv_match) {
+	ibv = mlx5_vdpa_get_ib_device_match(&pci_dev->addr);
+	if (!ibv) {
 		DRV_LOG(ERR, "No matching IB device for PCI slot "
-			"%" SCNx32 ":%" SCNx8 ":%" SCNx8 ".%" SCNx8 ".",
-			pci_dev->addr.domain, pci_dev->addr.bus,
-			pci_dev->addr.devid, pci_dev->addr.function);
+			PCI_PRI_FMT ".", pci_dev->addr.domain,
+			pci_dev->addr.bus, pci_dev->addr.devid,
+			pci_dev->addr.function);
 		rte_errno = ENOENT;
 		return -rte_errno;
+	} else {
+		DRV_LOG(INFO, "PCI information matches for device \"%s\".",
+			ibv->name);
+	}
+	if (mlx5_vdpa_roce_disable(&pci_dev->addr, &ibv) != 0) {
+		DRV_LOG(WARNING, "Failed to disable ROCE for \"%s\".",
+			ibv->name);
+		//return -rte_errno;
 	}
-	ctx = mlx5_glue->dv_open_device(ibv_match);
+	ctx = mlx5_glue->dv_open_device(ibv);
 	if (!ctx) {
-		DRV_LOG(ERR, "Failed to open IB device \"%s\".",
-			ibv_match->name);
+		DRV_LOG(ERR, "Failed to open IB device \"%s\".", ibv->name);
 		rte_errno = ENODEV;
 		return -rte_errno;
 	}
-- 
1.8.3.1
* [dpdk-dev] [PATCH v2 00/25] Introduce mlx5 common library
  2020-01-20 17:02 [dpdk-dev] [PATCH v1 00/38] Introduce mlx5 vDPA driver Matan Azrad
                   ` (37 preceding siblings ...)
  2020-01-20 17:03 ` [dpdk-dev] [PATCH v1 38/38] vdpa/mlx5: disable ROCE Matan Azrad
@ 2020-01-28 10:05 ` Matan Azrad
  2020-01-28 10:05   ` [dpdk-dev] [PATCH v2 01/25] net/mlx5: separate DevX commands interface Matan Azrad
                     ` (25 more replies)
  2020-01-29 10:08 ` [dpdk-dev] [PATCH v2 00/13] Introduce mlx5 vDPA driver Matan Azrad
  39 siblings, 26 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-28 10:05 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
Steps:
- Prepare net/mlx5 for code sharing.
- Introduce new common lib for mlx5 devices.
- Share code from net/mlx5 to common/mlx5.
v2:
- Reorder patches into 2 series - this is the first one for the common directory and vDPA preparation,
  the second will be sent later for the new vDPA driver part.
- Fix spelling and per-patch compilation issues.
- Move to use claim_zero instead of pure asserts.
- Improve title names.
Matan Azrad (25):
  net/mlx5: separate DevX commands interface
  drivers: introduce mlx5 common library
  common/mlx5: share the mlx5 glue reference
  common/mlx5: share mlx5 PCI device detection
  common/mlx5: share mlx5 devices information
  common/mlx5: share CQ entry check
  common/mlx5: add query vDPA DevX capabilities
  common/mlx5: glue null memory region allocation
  common/mlx5: support DevX indirect mkey creation
  common/mlx5: glue event queue query
  common/mlx5: glue event interrupt commands
  common/mlx5: glue UAR allocation
  common/mlx5: add DevX command to create CQ
  common/mlx5: glue VAR allocation
  common/mlx5: add DevX virtq commands
  common/mlx5: add support for DevX QP operations
  common/mlx5: allow type configuration for DevX RQT
  common/mlx5: add TIR field constants
  common/mlx5: add DevX command to modify RQT
  common/mlx5: get DevX capability for max RQT size
  net/mlx5: select driver by vDPA device argument
  net/mlx5: separate Netlink command interface
  net/mlx5: reduce Netlink commands dependencies
  common/mlx5: share Netlink commands
  common/mlx5: support ROCE disable through Netlink
 MAINTAINERS                                     |    1 +
 drivers/common/Makefile                         |    4 +
 drivers/common/meson.build                      |    2 +-
 drivers/common/mlx5/Makefile                    |  347 ++++
 drivers/common/mlx5/meson.build                 |  210 ++
 drivers/common/mlx5/mlx5_common.c               |  332 +++
 drivers/common/mlx5/mlx5_common.h               |  214 ++
 drivers/common/mlx5/mlx5_common_utils.h         |   20 +
 drivers/common/mlx5/mlx5_devx_cmds.c            | 1530 ++++++++++++++
 drivers/common/mlx5/mlx5_devx_cmds.h            |  351 ++++
 drivers/common/mlx5/mlx5_glue.c                 | 1296 ++++++++++++
 drivers/common/mlx5/mlx5_glue.h                 |  305 +++
 drivers/common/mlx5/mlx5_nl.c                   | 1699 +++++++++++++++
 drivers/common/mlx5/mlx5_nl.h                   |   63 +
 drivers/common/mlx5/mlx5_prm.h                  | 2542 +++++++++++++++++++++++
 drivers/common/mlx5/rte_common_mlx5_version.map |   50 +
 drivers/net/mlx5/Makefile                       |  307 +--
 drivers/net/mlx5/meson.build                    |  257 +--
 drivers/net/mlx5/mlx5.c                         |  194 +-
 drivers/net/mlx5/mlx5.h                         |  326 +--
 drivers/net/mlx5/mlx5_defs.h                    |    8 -
 drivers/net/mlx5/mlx5_devx_cmds.c               |  969 ---------
 drivers/net/mlx5/mlx5_ethdev.c                  |  161 +-
 drivers/net/mlx5/mlx5_flow.c                    |   12 +-
 drivers/net/mlx5/mlx5_flow.h                    |    3 +-
 drivers/net/mlx5/mlx5_flow_dv.c                 |   12 +-
 drivers/net/mlx5/mlx5_flow_meter.c              |    2 +
 drivers/net/mlx5/mlx5_flow_verbs.c              |    7 +-
 drivers/net/mlx5/mlx5_glue.c                    | 1150 ----------
 drivers/net/mlx5/mlx5_glue.h                    |  264 ---
 drivers/net/mlx5/mlx5_mac.c                     |   16 +-
 drivers/net/mlx5/mlx5_mr.c                      |    3 +-
 drivers/net/mlx5/mlx5_nl.c                      | 1402 -------------
 drivers/net/mlx5/mlx5_prm.h                     | 1888 -----------------
 drivers/net/mlx5/mlx5_rss.c                     |    2 +-
 drivers/net/mlx5/mlx5_rxmode.c                  |   12 +-
 drivers/net/mlx5/mlx5_rxq.c                     |    7 +-
 drivers/net/mlx5/mlx5_rxtx.c                    |    7 +-
 drivers/net/mlx5/mlx5_rxtx.h                    |   46 +-
 drivers/net/mlx5/mlx5_rxtx_vec.c                |    5 +-
 drivers/net/mlx5/mlx5_rxtx_vec.h                |    3 +-
 drivers/net/mlx5/mlx5_rxtx_vec_altivec.h        |    5 +-
 drivers/net/mlx5/mlx5_rxtx_vec_neon.h           |    5 +-
 drivers/net/mlx5/mlx5_rxtx_vec_sse.h            |    5 +-
 drivers/net/mlx5/mlx5_stats.c                   |    5 +-
 drivers/net/mlx5/mlx5_txq.c                     |    7 +-
 drivers/net/mlx5/mlx5_utils.h                   |   79 +-
 drivers/net/mlx5/mlx5_vlan.c                    |  137 +-
 mk/rte.app.mk                                   |    1 +
 49 files changed, 9273 insertions(+), 7000 deletions(-)
 create mode 100644 drivers/common/mlx5/Makefile
 create mode 100644 drivers/common/mlx5/meson.build
 create mode 100644 drivers/common/mlx5/mlx5_common.c
 create mode 100644 drivers/common/mlx5/mlx5_common.h
 create mode 100644 drivers/common/mlx5/mlx5_common_utils.h
 create mode 100644 drivers/common/mlx5/mlx5_devx_cmds.c
 create mode 100644 drivers/common/mlx5/mlx5_devx_cmds.h
 create mode 100644 drivers/common/mlx5/mlx5_glue.c
 create mode 100644 drivers/common/mlx5/mlx5_glue.h
 create mode 100644 drivers/common/mlx5/mlx5_nl.c
 create mode 100644 drivers/common/mlx5/mlx5_nl.h
 create mode 100644 drivers/common/mlx5/mlx5_prm.h
 create mode 100644 drivers/common/mlx5/rte_common_mlx5_version.map
 delete mode 100644 drivers/net/mlx5/mlx5_devx_cmds.c
 delete mode 100644 drivers/net/mlx5/mlx5_glue.c
 delete mode 100644 drivers/net/mlx5/mlx5_glue.h
 delete mode 100644 drivers/net/mlx5/mlx5_nl.c
 delete mode 100644 drivers/net/mlx5/mlx5_prm.h
-- 
1.8.3.1
* [dpdk-dev] [PATCH v2 01/25] net/mlx5: separate DevX commands interface
  2020-01-28 10:05 ` [dpdk-dev] [PATCH v2 00/25] Introduce mlx5 common library Matan Azrad
@ 2020-01-28 10:05   ` Matan Azrad
  2020-01-28 10:05   ` [dpdk-dev] [PATCH v2 02/25] drivers: introduce mlx5 common library Matan Azrad
                     ` (24 subsequent siblings)
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-28 10:05 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
The DevX commands interface is declared in mlx5.h together with many
other PMD interfaces.
To prepare for sharing the DevX commands among different PMDs, this
patch moves the DevX interface to a new file, mlx5_devx_cmds.h.
It also removes the dependency of the DevX commands on the shared
device structure, and switches DevX command logging from the mlx5
driver log mechanism to the EAL log mechanism.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
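Note on the dependency removal: the flow dump command stops taking the
PMD-private shared structure and receives the three domain pointers
directly, so the command layer no longer needs the PMD headers. A
minimal standalone sketch of that pattern follows. All demo_* names are
hypothetical stand-ins (not mlx5 or rdma-core symbols), and the sketch
returns -EINVAL for a missing RX/TX domain where the patch uses
assert().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the PMD-private shared context. */
struct demo_shared {
	void *fdb_domain; /* May be NULL when no FDB domain exists. */
	void *rx_domain;
	void *tx_domain;
};

/* Hypothetical stand-in for the glue call that dumps one domain. */
static int
demo_dump_domain(FILE *file, void *domain)
{
	return fprintf(file, "domain %p\n", domain) < 0 ? -EIO : 0;
}

/* Shared-layer helper: knows nothing about struct demo_shared. */
static int
demo_flow_dump(void *fdb_domain, void *rx_domain, void *tx_domain, FILE *file)
{
	int ret;

	/* The FDB domain is optional, so it is dumped only when present. */
	if (fdb_domain != NULL) {
		ret = demo_dump_domain(file, fdb_domain);
		if (ret)
			return ret;
	}
	/* Sketch-only choice: report an error instead of asserting. */
	if (rx_domain == NULL || tx_domain == NULL)
		return -EINVAL;
	ret = demo_dump_domain(file, rx_domain);
	if (ret)
		return ret;
	return demo_dump_domain(file, tx_domain);
}

/* PMD-side wrapper: unpacks its private structure for the helper. */
static int
demo_pmd_flow_dump(const struct demo_shared *sh, FILE *file)
{
	return demo_flow_dump(sh->fdb_domain, sh->rx_domain,
			      sh->tx_domain, file);
}
```

Only the wrapper needs the definition of the private structure; the
helper can live in a library that any PMD links against.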
 drivers/net/mlx5/mlx5.c           |   1 +
 drivers/net/mlx5/mlx5.h           | 221 +-----------------------------------
 drivers/net/mlx5/mlx5_devx_cmds.c |  33 +++---
 drivers/net/mlx5/mlx5_devx_cmds.h | 231 ++++++++++++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_ethdev.c    |   1 +
 drivers/net/mlx5/mlx5_flow.c      |   5 +-
 drivers/net/mlx5/mlx5_flow_dv.c   |   1 +
 drivers/net/mlx5/mlx5_rxq.c       |   1 +
 drivers/net/mlx5/mlx5_rxtx.c      |   1 +
 drivers/net/mlx5/mlx5_txq.c       |   1 +
 drivers/net/mlx5/mlx5_vlan.c      |   1 +
 11 files changed, 263 insertions(+), 234 deletions(-)
 create mode 100644 drivers/net/mlx5/mlx5_devx_cmds.h
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 2049370..7126edf 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -46,6 +46,7 @@
 #include "mlx5_glue.h"
 #include "mlx5_mr.h"
 #include "mlx5_flow.h"
+#include "mlx5_devx_cmds.h"
 
 /* Device parameter to enable RX completion queue compression. */
 #define MLX5_RXQ_CQE_COMP_EN "rxq_cqe_comp_en"
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 5818349..4d0485d 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -38,6 +38,7 @@
 #include "mlx5_defs.h"
 #include "mlx5_glue.h"
 #include "mlx5_prm.h"
+#include "mlx5_devx_cmds.h"
 
 enum {
 	PCI_VENDOR_ID_MELLANOX = 0x15b3,
@@ -156,62 +157,6 @@ struct mlx5_stats_ctrl {
 	uint64_t imissed_base;
 };
 
-/* devX creation object */
-struct mlx5_devx_obj {
-	struct mlx5dv_devx_obj *obj; /* The DV object. */
-	int id; /* The object ID. */
-};
-
-struct mlx5_devx_mkey_attr {
-	uint64_t addr;
-	uint64_t size;
-	uint32_t umem_id;
-	uint32_t pd;
-};
-
-/* HCA qos attributes. */
-struct mlx5_hca_qos_attr {
-	uint32_t sup:1;	/* Whether QOS is supported. */
-	uint32_t srtcm_sup:1; /* Whether srTCM mode is supported. */
-	uint32_t flow_meter_reg_share:1;
-	/* Whether reg_c share is supported. */
-	uint8_t log_max_flow_meter;
-	/* Power of the maximum supported meters. */
-	uint8_t flow_meter_reg_c_ids;
-	/* Bitmap of the reg_Cs available for flow meter to use. */
-
-};
-
-/* HCA supports this number of time periods for LRO. */
-#define MLX5_LRO_NUM_SUPP_PERIODS 4
-
-/* HCA attributes. */
-struct mlx5_hca_attr {
-	uint32_t eswitch_manager:1;
-	uint32_t flow_counters_dump:1;
-	uint8_t flow_counter_bulk_alloc_bitmap;
-	uint32_t eth_net_offloads:1;
-	uint32_t eth_virt:1;
-	uint32_t wqe_vlan_insert:1;
-	uint32_t wqe_inline_mode:2;
-	uint32_t vport_inline_mode:3;
-	uint32_t tunnel_stateless_geneve_rx:1;
-	uint32_t geneve_max_opt_len:1; /* 0x0: 14DW, 0x1: 63DW */
-	uint32_t tunnel_stateless_gtp:1;
-	uint32_t lro_cap:1;
-	uint32_t tunnel_lro_gre:1;
-	uint32_t tunnel_lro_vxlan:1;
-	uint32_t lro_max_msg_sz_mode:2;
-	uint32_t lro_timer_supported_periods[MLX5_LRO_NUM_SUPP_PERIODS];
-	uint32_t flex_parser_protocols;
-	uint32_t hairpin:1;
-	uint32_t log_max_hairpin_queues:5;
-	uint32_t log_max_hairpin_wq_data_sz:5;
-	uint32_t log_max_hairpin_num_packets:5;
-	uint32_t vhca_id:16;
-	struct mlx5_hca_qos_attr qos;
-};
-
 /* Flow list . */
 TAILQ_HEAD(mlx5_flows, rte_flow);
 
@@ -291,133 +236,6 @@ struct mlx5_dev_config {
 	struct mlx5_lro_config lro; /* LRO configuration. */
 };
 
-struct mlx5_devx_wq_attr {
-	uint32_t wq_type:4;
-	uint32_t wq_signature:1;
-	uint32_t end_padding_mode:2;
-	uint32_t cd_slave:1;
-	uint32_t hds_skip_first_sge:1;
-	uint32_t log2_hds_buf_size:3;
-	uint32_t page_offset:5;
-	uint32_t lwm:16;
-	uint32_t pd:24;
-	uint32_t uar_page:24;
-	uint64_t dbr_addr;
-	uint32_t hw_counter;
-	uint32_t sw_counter;
-	uint32_t log_wq_stride:4;
-	uint32_t log_wq_pg_sz:5;
-	uint32_t log_wq_sz:5;
-	uint32_t dbr_umem_valid:1;
-	uint32_t wq_umem_valid:1;
-	uint32_t log_hairpin_num_packets:5;
-	uint32_t log_hairpin_data_sz:5;
-	uint32_t single_wqe_log_num_of_strides:4;
-	uint32_t two_byte_shift_en:1;
-	uint32_t single_stride_log_num_of_bytes:3;
-	uint32_t dbr_umem_id;
-	uint32_t wq_umem_id;
-	uint64_t wq_umem_offset;
-};
-
-/* Create RQ attributes structure, used by create RQ operation. */
-struct mlx5_devx_create_rq_attr {
-	uint32_t rlky:1;
-	uint32_t delay_drop_en:1;
-	uint32_t scatter_fcs:1;
-	uint32_t vsd:1;
-	uint32_t mem_rq_type:4;
-	uint32_t state:4;
-	uint32_t flush_in_error_en:1;
-	uint32_t hairpin:1;
-	uint32_t user_index:24;
-	uint32_t cqn:24;
-	uint32_t counter_set_id:8;
-	uint32_t rmpn:24;
-	struct mlx5_devx_wq_attr wq_attr;
-};
-
-/* Modify RQ attributes structure, used by modify RQ operation. */
-struct mlx5_devx_modify_rq_attr {
-	uint32_t rqn:24;
-	uint32_t rq_state:4; /* Current RQ state. */
-	uint32_t state:4; /* Required RQ state. */
-	uint32_t scatter_fcs:1;
-	uint32_t vsd:1;
-	uint32_t counter_set_id:8;
-	uint32_t hairpin_peer_sq:24;
-	uint32_t hairpin_peer_vhca:16;
-	uint64_t modify_bitmask;
-	uint32_t lwm:16; /* Contained WQ lwm. */
-};
-
-struct mlx5_rx_hash_field_select {
-	uint32_t l3_prot_type:1;
-	uint32_t l4_prot_type:1;
-	uint32_t selected_fields:30;
-};
-
-/* TIR attributes structure, used by TIR operations. */
-struct mlx5_devx_tir_attr {
-	uint32_t disp_type:4;
-	uint32_t lro_timeout_period_usecs:16;
-	uint32_t lro_enable_mask:4;
-	uint32_t lro_max_msg_sz:8;
-	uint32_t inline_rqn:24;
-	uint32_t rx_hash_symmetric:1;
-	uint32_t tunneled_offload_en:1;
-	uint32_t indirect_table:24;
-	uint32_t rx_hash_fn:4;
-	uint32_t self_lb_block:2;
-	uint32_t transport_domain:24;
-	uint32_t rx_hash_toeplitz_key[10];
-	struct mlx5_rx_hash_field_select rx_hash_field_selector_outer;
-	struct mlx5_rx_hash_field_select rx_hash_field_selector_inner;
-};
-
-/* RQT attributes structure, used by RQT operations. */
-struct mlx5_devx_rqt_attr {
-	uint32_t rqt_max_size:16;
-	uint32_t rqt_actual_size:16;
-	uint32_t rq_list[];
-};
-
-/* TIS attributes structure. */
-struct mlx5_devx_tis_attr {
-	uint32_t strict_lag_tx_port_affinity:1;
-	uint32_t tls_en:1;
-	uint32_t lag_tx_port_affinity:4;
-	uint32_t prio:4;
-	uint32_t transport_domain:24;
-};
-
-/* SQ attributes structure, used by SQ create operation. */
-struct mlx5_devx_create_sq_attr {
-	uint32_t rlky:1;
-	uint32_t cd_master:1;
-	uint32_t fre:1;
-	uint32_t flush_in_error_en:1;
-	uint32_t allow_multi_pkt_send_wqe:1;
-	uint32_t min_wqe_inline_mode:3;
-	uint32_t state:4;
-	uint32_t reg_umr:1;
-	uint32_t allow_swp:1;
-	uint32_t hairpin:1;
-	uint32_t user_index:24;
-	uint32_t cqn:24;
-	uint32_t packet_pacing_rate_limit_index:16;
-	uint32_t tis_lst_sz:16;
-	uint32_t tis_num:24;
-	struct mlx5_devx_wq_attr wq_attr;
-};
-
-/* SQ attributes structure, used by SQ modify operation. */
-struct mlx5_devx_modify_sq_attr {
-	uint32_t sq_state:4;
-	uint32_t state:4;
-	uint32_t hairpin_peer_rq:24;
-	uint32_t hairpin_peer_vhca:16;
-};
 
 /**
  * Type of object being allocated.
@@ -1026,43 +844,6 @@ void mlx5_vlan_vmwa_release(struct rte_eth_dev *dev,
 void mlx5_vlan_vmwa_acquire(struct rte_eth_dev *dev,
 			    struct mlx5_vf_vlan *vf_vlan);
 
-/* mlx5_devx_cmds.c */
-
-struct mlx5_devx_obj *mlx5_devx_cmd_flow_counter_alloc(struct ibv_context *ctx,
-						       uint32_t bulk_sz);
-int mlx5_devx_cmd_destroy(struct mlx5_devx_obj *obj);
-int mlx5_devx_cmd_flow_counter_query(struct mlx5_devx_obj *dcs,
-				     int clear, uint32_t n_counters,
-				     uint64_t *pkts, uint64_t *bytes,
-				     uint32_t mkey, void *addr,
-				     struct mlx5dv_devx_cmd_comp *cmd_comp,
-				     uint64_t async_id);
-int mlx5_devx_cmd_query_hca_attr(struct ibv_context *ctx,
-				 struct mlx5_hca_attr *attr);
-struct mlx5_devx_obj *mlx5_devx_cmd_mkey_create(struct ibv_context *ctx,
-					     struct mlx5_devx_mkey_attr *attr);
-int mlx5_devx_get_out_command_status(void *out);
-int mlx5_devx_cmd_qp_query_tis_td(struct ibv_qp *qp, uint32_t tis_num,
-				  uint32_t *tis_td);
-struct mlx5_devx_obj *mlx5_devx_cmd_create_rq(struct ibv_context *ctx,
-				struct mlx5_devx_create_rq_attr *rq_attr,
-				int socket);
-int mlx5_devx_cmd_modify_rq(struct mlx5_devx_obj *rq,
-			    struct mlx5_devx_modify_rq_attr *rq_attr);
-struct mlx5_devx_obj *mlx5_devx_cmd_create_tir(struct ibv_context *ctx,
-					struct mlx5_devx_tir_attr *tir_attr);
-struct mlx5_devx_obj *mlx5_devx_cmd_create_rqt(struct ibv_context *ctx,
-					struct mlx5_devx_rqt_attr *rqt_attr);
-struct mlx5_devx_obj *mlx5_devx_cmd_create_sq
-	(struct ibv_context *ctx, struct mlx5_devx_create_sq_attr *sq_attr);
-int mlx5_devx_cmd_modify_sq
-	(struct mlx5_devx_obj *sq, struct mlx5_devx_modify_sq_attr *sq_attr);
-struct mlx5_devx_obj *mlx5_devx_cmd_create_tis
-	(struct ibv_context *ctx, struct mlx5_devx_tis_attr *tis_attr);
-struct mlx5_devx_obj *mlx5_devx_cmd_create_td(struct ibv_context *ctx);
-
-int mlx5_devx_cmd_flow_dump(struct mlx5_ibv_shared *sh, FILE *file);
-
 /* mlx5_flow_meter.c */
 
 int mlx5_flow_meter_ops_get(struct rte_eth_dev *dev, void *arg);
diff --git a/drivers/net/mlx5/mlx5_devx_cmds.c b/drivers/net/mlx5/mlx5_devx_cmds.c
index 282d501..62ca590 100644
--- a/drivers/net/mlx5/mlx5_devx_cmds.c
+++ b/drivers/net/mlx5/mlx5_devx_cmds.c
@@ -1,13 +1,15 @@
 // SPDX-License-Identifier: BSD-3-Clause
 /* Copyright 2018 Mellanox Technologies, Ltd */
 
+#include <unistd.h>
+
 #include <rte_flow_driver.h>
 #include <rte_malloc.h>
-#include <unistd.h>
 
-#include "mlx5.h"
-#include "mlx5_glue.h"
 #include "mlx5_prm.h"
+#include "mlx5_devx_cmds.h"
+#include "mlx5_utils.h"
+
 
 /**
  * Allocate flow counters via devx interface.
@@ -936,8 +938,12 @@ struct mlx5_devx_obj *
 /**
  * Dump all flows to file.
  *
- * @param[in] sh
- *   Pointer to context.
+ * @param[in] fdb_domain
+ *   FDB domain.
+ * @param[in] rx_domain
+ *   RX domain.
+ * @param[in] tx_domain
+ *   TX domain.
  * @param[out] file
  *   Pointer to file stream.
  *
@@ -945,23 +951,24 @@ struct mlx5_devx_obj *
  *   0 on success, a nagative value otherwise.
  */
 int
-mlx5_devx_cmd_flow_dump(struct mlx5_ibv_shared *sh __rte_unused,
-			FILE *file __rte_unused)
+mlx5_devx_cmd_flow_dump(void *fdb_domain __rte_unused,
+			void *rx_domain __rte_unused,
+			void *tx_domain __rte_unused, FILE *file __rte_unused)
 {
 	int ret = 0;
 
 #ifdef HAVE_MLX5_DR_FLOW_DUMP
-	if (sh->fdb_domain) {
-		ret = mlx5_glue->dr_dump_domain(file, sh->fdb_domain);
+	if (fdb_domain) {
+		ret = mlx5_glue->dr_dump_domain(file, fdb_domain);
 		if (ret)
 			return ret;
 	}
-	assert(sh->rx_domain);
-	ret = mlx5_glue->dr_dump_domain(file, sh->rx_domain);
+	assert(rx_domain);
+	ret = mlx5_glue->dr_dump_domain(file, rx_domain);
 	if (ret)
 		return ret;
-	assert(sh->tx_domain);
-	ret = mlx5_glue->dr_dump_domain(file, sh->tx_domain);
+	assert(tx_domain);
+	ret = mlx5_glue->dr_dump_domain(file, tx_domain);
 #else
 	ret = ENOTSUP;
 #endif
diff --git a/drivers/net/mlx5/mlx5_devx_cmds.h b/drivers/net/mlx5/mlx5_devx_cmds.h
new file mode 100644
index 0000000..2d58d96
--- /dev/null
+++ b/drivers/net/mlx5/mlx5_devx_cmds.h
@@ -0,0 +1,231 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2019 Mellanox Technologies, Ltd
+ */
+
+#ifndef RTE_PMD_MLX5_DEVX_CMDS_H_
+#define RTE_PMD_MLX5_DEVX_CMDS_H_
+
+#include "mlx5_glue.h"
+
+
+/* devX creation object */
+struct mlx5_devx_obj {
+	struct mlx5dv_devx_obj *obj; /* The DV object. */
+	int id; /* The object ID. */
+};
+
+struct mlx5_devx_mkey_attr {
+	uint64_t addr;
+	uint64_t size;
+	uint32_t umem_id;
+	uint32_t pd;
+};
+
+/* HCA qos attributes. */
+struct mlx5_hca_qos_attr {
+	uint32_t sup:1;	/* Whether QOS is supported. */
+	uint32_t srtcm_sup:1; /* Whether srTCM mode is supported. */
+	uint32_t flow_meter_reg_share:1;
+	/* Whether reg_c share is supported. */
+	uint8_t log_max_flow_meter;
+	/* Power of the maximum supported meters. */
+	uint8_t flow_meter_reg_c_ids;
+	/* Bitmap of the reg_Cs available for flow meter to use. */
+
+};
+
+/* HCA supports this number of time periods for LRO. */
+#define MLX5_LRO_NUM_SUPP_PERIODS 4
+
+/* HCA attributes. */
+struct mlx5_hca_attr {
+	uint32_t eswitch_manager:1;
+	uint32_t flow_counters_dump:1;
+	uint8_t flow_counter_bulk_alloc_bitmap;
+	uint32_t eth_net_offloads:1;
+	uint32_t eth_virt:1;
+	uint32_t wqe_vlan_insert:1;
+	uint32_t wqe_inline_mode:2;
+	uint32_t vport_inline_mode:3;
+	uint32_t tunnel_stateless_geneve_rx:1;
+	uint32_t geneve_max_opt_len:1; /* 0x0: 14DW, 0x1: 63DW */
+	uint32_t tunnel_stateless_gtp:1;
+	uint32_t lro_cap:1;
+	uint32_t tunnel_lro_gre:1;
+	uint32_t tunnel_lro_vxlan:1;
+	uint32_t lro_max_msg_sz_mode:2;
+	uint32_t lro_timer_supported_periods[MLX5_LRO_NUM_SUPP_PERIODS];
+	uint32_t flex_parser_protocols;
+	uint32_t hairpin:1;
+	uint32_t log_max_hairpin_queues:5;
+	uint32_t log_max_hairpin_wq_data_sz:5;
+	uint32_t log_max_hairpin_num_packets:5;
+	uint32_t vhca_id:16;
+	struct mlx5_hca_qos_attr qos;
+};
+
+struct mlx5_devx_wq_attr {
+	uint32_t wq_type:4;
+	uint32_t wq_signature:1;
+	uint32_t end_padding_mode:2;
+	uint32_t cd_slave:1;
+	uint32_t hds_skip_first_sge:1;
+	uint32_t log2_hds_buf_size:3;
+	uint32_t page_offset:5;
+	uint32_t lwm:16;
+	uint32_t pd:24;
+	uint32_t uar_page:24;
+	uint64_t dbr_addr;
+	uint32_t hw_counter;
+	uint32_t sw_counter;
+	uint32_t log_wq_stride:4;
+	uint32_t log_wq_pg_sz:5;
+	uint32_t log_wq_sz:5;
+	uint32_t dbr_umem_valid:1;
+	uint32_t wq_umem_valid:1;
+	uint32_t log_hairpin_num_packets:5;
+	uint32_t log_hairpin_data_sz:5;
+	uint32_t single_wqe_log_num_of_strides:4;
+	uint32_t two_byte_shift_en:1;
+	uint32_t single_stride_log_num_of_bytes:3;
+	uint32_t dbr_umem_id;
+	uint32_t wq_umem_id;
+	uint64_t wq_umem_offset;
+};
+
+/* Create RQ attributes structure, used by create RQ operation. */
+struct mlx5_devx_create_rq_attr {
+	uint32_t rlky:1;
+	uint32_t delay_drop_en:1;
+	uint32_t scatter_fcs:1;
+	uint32_t vsd:1;
+	uint32_t mem_rq_type:4;
+	uint32_t state:4;
+	uint32_t flush_in_error_en:1;
+	uint32_t hairpin:1;
+	uint32_t user_index:24;
+	uint32_t cqn:24;
+	uint32_t counter_set_id:8;
+	uint32_t rmpn:24;
+	struct mlx5_devx_wq_attr wq_attr;
+};
+
+/* Modify RQ attributes structure, used by modify RQ operation. */
+struct mlx5_devx_modify_rq_attr {
+	uint32_t rqn:24;
+	uint32_t rq_state:4; /* Current RQ state. */
+	uint32_t state:4; /* Required RQ state. */
+	uint32_t scatter_fcs:1;
+	uint32_t vsd:1;
+	uint32_t counter_set_id:8;
+	uint32_t hairpin_peer_sq:24;
+	uint32_t hairpin_peer_vhca:16;
+	uint64_t modify_bitmask;
+	uint32_t lwm:16; /* Contained WQ lwm. */
+};
+
+struct mlx5_rx_hash_field_select {
+	uint32_t l3_prot_type:1;
+	uint32_t l4_prot_type:1;
+	uint32_t selected_fields:30;
+};
+
+/* TIR attributes structure, used by TIR operations. */
+struct mlx5_devx_tir_attr {
+	uint32_t disp_type:4;
+	uint32_t lro_timeout_period_usecs:16;
+	uint32_t lro_enable_mask:4;
+	uint32_t lro_max_msg_sz:8;
+	uint32_t inline_rqn:24;
+	uint32_t rx_hash_symmetric:1;
+	uint32_t tunneled_offload_en:1;
+	uint32_t indirect_table:24;
+	uint32_t rx_hash_fn:4;
+	uint32_t self_lb_block:2;
+	uint32_t transport_domain:24;
+	uint32_t rx_hash_toeplitz_key[10];
+	struct mlx5_rx_hash_field_select rx_hash_field_selector_outer;
+	struct mlx5_rx_hash_field_select rx_hash_field_selector_inner;
+};
+
+/* RQT attributes structure, used by RQT operations. */
+struct mlx5_devx_rqt_attr {
+	uint32_t rqt_max_size:16;
+	uint32_t rqt_actual_size:16;
+	uint32_t rq_list[];
+};
+
+/* TIS attributes structure. */
+struct mlx5_devx_tis_attr {
+	uint32_t strict_lag_tx_port_affinity:1;
+	uint32_t tls_en:1;
+	uint32_t lag_tx_port_affinity:4;
+	uint32_t prio:4;
+	uint32_t transport_domain:24;
+};
+
+/* SQ attributes structure, used by SQ create operation. */
+struct mlx5_devx_create_sq_attr {
+	uint32_t rlky:1;
+	uint32_t cd_master:1;
+	uint32_t fre:1;
+	uint32_t flush_in_error_en:1;
+	uint32_t allow_multi_pkt_send_wqe:1;
+	uint32_t min_wqe_inline_mode:3;
+	uint32_t state:4;
+	uint32_t reg_umr:1;
+	uint32_t allow_swp:1;
+	uint32_t hairpin:1;
+	uint32_t user_index:24;
+	uint32_t cqn:24;
+	uint32_t packet_pacing_rate_limit_index:16;
+	uint32_t tis_lst_sz:16;
+	uint32_t tis_num:24;
+	struct mlx5_devx_wq_attr wq_attr;
+};
+
+/* SQ attributes structure, used by SQ modify operation. */
+struct mlx5_devx_modify_sq_attr {
+	uint32_t sq_state:4;
+	uint32_t state:4;
+	uint32_t hairpin_peer_rq:24;
+	uint32_t hairpin_peer_vhca:16;
+};
+
+/* mlx5_devx_cmds.c */
+
+struct mlx5_devx_obj *mlx5_devx_cmd_flow_counter_alloc(struct ibv_context *ctx,
+						       uint32_t bulk_sz);
+int mlx5_devx_cmd_destroy(struct mlx5_devx_obj *obj);
+int mlx5_devx_cmd_flow_counter_query(struct mlx5_devx_obj *dcs,
+				     int clear, uint32_t n_counters,
+				     uint64_t *pkts, uint64_t *bytes,
+				     uint32_t mkey, void *addr,
+				     struct mlx5dv_devx_cmd_comp *cmd_comp,
+				     uint64_t async_id);
+int mlx5_devx_cmd_query_hca_attr(struct ibv_context *ctx,
+				 struct mlx5_hca_attr *attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_mkey_create(struct ibv_context *ctx,
+					      struct mlx5_devx_mkey_attr *attr);
+int mlx5_devx_get_out_command_status(void *out);
+int mlx5_devx_cmd_qp_query_tis_td(struct ibv_qp *qp, uint32_t tis_num,
+				  uint32_t *tis_td);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_rq(struct ibv_context *ctx,
+				       struct mlx5_devx_create_rq_attr *rq_attr,
+				       int socket);
+int mlx5_devx_cmd_modify_rq(struct mlx5_devx_obj *rq,
+			    struct mlx5_devx_modify_rq_attr *rq_attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_tir(struct ibv_context *ctx,
+					   struct mlx5_devx_tir_attr *tir_attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_rqt(struct ibv_context *ctx,
+					   struct mlx5_devx_rqt_attr *rqt_attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_sq(struct ibv_context *ctx,
+				      struct mlx5_devx_create_sq_attr *sq_attr);
+int mlx5_devx_cmd_modify_sq(struct mlx5_devx_obj *sq,
+			    struct mlx5_devx_modify_sq_attr *sq_attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_tis(struct ibv_context *ctx,
+					   struct mlx5_devx_tis_attr *tis_attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_td(struct ibv_context *ctx);
+int mlx5_devx_cmd_flow_dump(void *fdb_domain, void *rx_domain, void *tx_domain,
+			    FILE *file);
+#endif /* RTE_PMD_MLX5_DEVX_CMDS_H_ */
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 3b4c5db..ce0109c 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -38,6 +38,7 @@
 
 #include "mlx5.h"
 #include "mlx5_glue.h"
+#include "mlx5_devx_cmds.h"
 #include "mlx5_rxtx.h"
 #include "mlx5_utils.h"
 
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 5c9fea6..983b1c3 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -31,6 +31,7 @@
 #include "mlx5_defs.h"
 #include "mlx5_flow.h"
 #include "mlx5_glue.h"
+#include "mlx5_devx_cmds.h"
 #include "mlx5_prm.h"
 #include "mlx5_rxtx.h"
 
@@ -5692,6 +5693,8 @@ struct mlx5_flow_counter *
 		   struct rte_flow_error *error __rte_unused)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_ibv_shared *sh = priv->sh;
 
-	return mlx5_devx_cmd_flow_dump(priv->sh, file);
+	return mlx5_devx_cmd_flow_dump(sh->fdb_domain, sh->rx_domain,
+				       sh->tx_domain, file);
 }
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index b90734e..653d649 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -32,6 +32,7 @@
 #include "mlx5.h"
 #include "mlx5_defs.h"
 #include "mlx5_glue.h"
+#include "mlx5_devx_cmds.h"
 #include "mlx5_flow.h"
 #include "mlx5_prm.h"
 #include "mlx5_rxtx.h"
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 4092cb7..371b996 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -37,6 +37,7 @@
 #include "mlx5_defs.h"
 #include "mlx5_glue.h"
 #include "mlx5_flow.h"
+#include "mlx5_devx_cmds.h"
 
 /* Default RSS hash key also used for ConnectX-3. */
 uint8_t rss_hash_default_key[] = {
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 5e31f01..5a03556 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -29,6 +29,7 @@
 #include <rte_flow.h>
 
 #include "mlx5.h"
+#include "mlx5_devx_cmds.h"
 #include "mlx5_utils.h"
 #include "mlx5_rxtx.h"
 #include "mlx5_autoconf.h"
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 1a76f6e..5adb4dc 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -34,6 +34,7 @@
 #include "mlx5_rxtx.h"
 #include "mlx5_autoconf.h"
 #include "mlx5_glue.h"
+#include "mlx5_devx_cmds.h"
 
 /**
  * Allocate TX queue elements.
diff --git a/drivers/net/mlx5/mlx5_vlan.c b/drivers/net/mlx5/mlx5_vlan.c
index 5f6554a..feac0f1 100644
--- a/drivers/net/mlx5/mlx5_vlan.c
+++ b/drivers/net/mlx5/mlx5_vlan.c
@@ -30,6 +30,7 @@
 #include "mlx5.h"
 #include "mlx5_autoconf.h"
 #include "mlx5_glue.h"
+#include "mlx5_devx_cmds.h"
 #include "mlx5_rxtx.h"
 #include "mlx5_utils.h"
 
-- 
1.8.3.1
* [dpdk-dev] [PATCH v2 02/25] drivers: introduce mlx5 common library
  2020-01-28 10:05 ` [dpdk-dev] [PATCH v2 00/25] Introduce mlx5 common library Matan Azrad
  2020-01-28 10:05   ` [dpdk-dev] [PATCH v2 01/25] net/mlx5: separate DevX commands interface Matan Azrad
@ 2020-01-28 10:05   ` Matan Azrad
  2020-01-28 10:05   ` [dpdk-dev] [PATCH v2 03/25] common/mlx5: share the mlx5 glue reference Matan Azrad
                     ` (23 subsequent siblings)
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-28 10:05 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
A new Mellanox vDPA PMD will be added to support vDPA operations on
Mellanox adapters.
The vDPA PMD design uses mlx5_glue and mlx5_devx operations, large
parts of which are shared with the net/mlx5 PMD.
Create a new common library in drivers/common for mlx5 PMDs.
Move mlx5_glue, mlx5_devx_cmds and their dependencies to the new mlx5
common library in drivers/common.
The files mlx5_devx_cmds.c, mlx5_devx_cmds.h, mlx5_glue.c,
mlx5_glue.h and mlx5_prm.h are moved as is from drivers/net/mlx5 to
drivers/common/mlx5.
Share the log mechanism macros.
Also separate the log mechanism so that the log level of the common
library can be controlled independently.
Build files, version files and include lines are adjusted accordingly.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
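Note on the separated log mechanism: the idea is that each component
owns its own log type, so raising the verbosity of the common library
does not change what the net PMD prints. A minimal standalone sketch of
that idea follows. It deliberately does not use the DPDK log API; every
demo_* name is hypothetical and exists only for illustration.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Higher value means more verbose. */
enum demo_level { DEMO_ERR = 1, DEMO_INFO = 2, DEMO_DEBUG = 3 };

/* One log type per component, each with its own threshold. */
struct demo_logtype {
	const char *name;
	enum demo_level level;
};

static struct demo_logtype demo_common_log = { "common", DEMO_ERR };
static struct demo_logtype demo_net_log = { "net", DEMO_ERR };

/*
 * Print the message only if the component threshold allows it.
 * Returns 0 when the message is filtered out, otherwise the value
 * accumulated from the two stdio calls.
 */
static int
demo_log(const struct demo_logtype *t, enum demo_level lvl,
	 const char *fmt, ...)
{
	va_list ap;
	int n;

	if (lvl > t->level)
		return 0;
	n = fprintf(stderr, "%s: ", t->name);
	va_start(ap, fmt);
	n += vfprintf(stderr, fmt, ap);
	va_end(ap);
	return n;
}
```

With this layout, setting demo_common_log.level to DEMO_DEBUG enables
debug output for the common component only, while demo_net_log keeps
filtering everything above DEMO_ERR.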
 MAINTAINERS                                     |    1 +
 drivers/common/Makefile                         |    4 +
 drivers/common/meson.build                      |    2 +-
 drivers/common/mlx5/Makefile                    |  331 ++++
 drivers/common/mlx5/meson.build                 |  205 +++
 drivers/common/mlx5/mlx5_common.c               |   17 +
 drivers/common/mlx5/mlx5_common.h               |   87 ++
 drivers/common/mlx5/mlx5_common_utils.h         |   20 +
 drivers/common/mlx5/mlx5_devx_cmds.c            |  976 ++++++++++++
 drivers/common/mlx5/mlx5_devx_cmds.h            |  231 +++
 drivers/common/mlx5/mlx5_glue.c                 | 1138 ++++++++++++++
 drivers/common/mlx5/mlx5_glue.h                 |  265 ++++
 drivers/common/mlx5/mlx5_prm.h                  | 1889 +++++++++++++++++++++++
 drivers/common/mlx5/rte_common_mlx5_version.map |   20 +
 drivers/net/mlx5/Makefile                       |  309 +---
 drivers/net/mlx5/meson.build                    |  256 +--
 drivers/net/mlx5/mlx5.c                         |    7 +-
 drivers/net/mlx5/mlx5.h                         |    9 +-
 drivers/net/mlx5/mlx5_devx_cmds.c               |  976 ------------
 drivers/net/mlx5/mlx5_devx_cmds.h               |  231 ---
 drivers/net/mlx5/mlx5_ethdev.c                  |    5 +-
 drivers/net/mlx5/mlx5_flow.c                    |    9 +-
 drivers/net/mlx5/mlx5_flow.h                    |    3 +-
 drivers/net/mlx5/mlx5_flow_dv.c                 |    9 +-
 drivers/net/mlx5/mlx5_flow_meter.c              |    2 +
 drivers/net/mlx5/mlx5_flow_verbs.c              |    7 +-
 drivers/net/mlx5/mlx5_glue.c                    | 1150 --------------
 drivers/net/mlx5/mlx5_glue.h                    |  264 ----
 drivers/net/mlx5/mlx5_mac.c                     |    2 +-
 drivers/net/mlx5/mlx5_mr.c                      |    3 +-
 drivers/net/mlx5/mlx5_prm.h                     | 1888 ----------------------
                      |    2 +-
 drivers/net/mlx5/mlx5_rxq.c                     |    8 +-
 drivers/net/mlx5/mlx5_rxtx.c                    |    7 +-
 drivers/net/mlx5/mlx5_rxtx.h                    |    7 +-
 drivers/net/mlx5/mlx5_rxtx_vec.c                |    5 +-
 drivers/net/mlx5/mlx5_rxtx_vec.h                |    3 +-
 drivers/net/mlx5/mlx5_rxtx_vec_altivec.h        |    5 +-
 drivers/net/mlx5/mlx5_rxtx_vec_neon.h           |    5 +-
 drivers/net/mlx5/mlx5_rxtx_vec_sse.h            |    5 +-
 drivers/net/mlx5/mlx5_stats.c                   |    2 +-
 drivers/net/mlx5/mlx5_txq.c                     |    7 +-
 drivers/net/mlx5/mlx5_utils.h                   |   79 +-
 drivers/net/mlx5/mlx5_vlan.c                    |    5 +-
 mk/rte.app.mk                                   |    1 +
 45 files changed, 5313 insertions(+), 5144 deletions(-)
 create mode 100644 drivers/common/mlx5/Makefile
 create mode 100644 drivers/common/mlx5/meson.build
 create mode 100644 drivers/common/mlx5/mlx5_common.c
 create mode 100644 drivers/common/mlx5/mlx5_common.h
 create mode 100644 drivers/common/mlx5/mlx5_common_utils.h
 create mode 100644 drivers/common/mlx5/mlx5_devx_cmds.c
 create mode 100644 drivers/common/mlx5/mlx5_devx_cmds.h
 create mode 100644 drivers/common/mlx5/mlx5_glue.c
 create mode 100644 drivers/common/mlx5/mlx5_glue.h
 create mode 100644 drivers/common/mlx5/mlx5_prm.h
 create mode 100644 drivers/common/mlx5/rte_common_mlx5_version.map
 delete mode 100644 drivers/net/mlx5/mlx5_devx_cmds.c
 delete mode 100644 drivers/net/mlx5/mlx5_devx_cmds.h
 delete mode 100644 drivers/net/mlx5/mlx5_glue.c
 delete mode 100644 drivers/net/mlx5/mlx5_glue.h
 delete mode 100644 drivers/net/mlx5/mlx5_prm.h
diff --git a/MAINTAINERS b/MAINTAINERS
index 94bccae..150d507 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -737,6 +737,7 @@ M: Matan Azrad <matan@mellanox.com>
 M: Shahaf Shuler <shahafs@mellanox.com>
 M: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
 T: git://dpdk.org/next/dpdk-next-net-mlx
+F: drivers/common/mlx5/
 F: drivers/net/mlx5/
 F: buildtools/options-ibverbs-static.sh
 F: doc/guides/nics/mlx5.rst
diff --git a/drivers/common/Makefile b/drivers/common/Makefile
index 3254c52..4775d4b 100644
--- a/drivers/common/Makefile
+++ b/drivers/common/Makefile
@@ -35,4 +35,8 @@ ifneq (,$(findstring y,$(IAVF-y)))
 DIRS-y += iavf
 endif
 
+ifeq ($(CONFIG_RTE_LIBRTE_MLX5_PMD),y)
+DIRS-y += mlx5
+endif
+
 include $(RTE_SDK)/mk/rte.subdir.mk
diff --git a/drivers/common/meson.build b/drivers/common/meson.build
index fc620f7..ffd06e2 100644
--- a/drivers/common/meson.build
+++ b/drivers/common/meson.build
@@ -2,6 +2,6 @@
 # Copyright(c) 2018 Cavium, Inc
 
 std_deps = ['eal']
-drivers = ['cpt', 'dpaax', 'iavf', 'mvep', 'octeontx', 'octeontx2', 'qat']
+drivers = ['cpt', 'dpaax', 'iavf', 'mlx5', 'mvep', 'octeontx', 'octeontx2', 'qat']
 config_flag_fmt = 'RTE_LIBRTE_@0@_COMMON'
 driver_name_fmt = 'rte_common_@0@'
diff --git a/drivers/common/mlx5/Makefile b/drivers/common/mlx5/Makefile
new file mode 100644
index 0000000..b94d3c0
--- /dev/null
+++ b/drivers/common/mlx5/Makefile
@@ -0,0 +1,331 @@
+#   SPDX-License-Identifier: BSD-3-Clause
+#   Copyright 2019 Mellanox Technologies, Ltd
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# Library name.
+LIB = librte_common_mlx5.a
+LIB_GLUE = $(LIB_GLUE_BASE).$(LIB_GLUE_VERSION)
+LIB_GLUE_BASE = librte_pmd_mlx5_glue.so
+LIB_GLUE_VERSION = 20.02.0
+
+# Sources.
+ifneq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
+SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_glue.c
+endif
+SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_devx_cmds.c
+SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_common.c
+
+ifeq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
+INSTALL-$(CONFIG_RTE_LIBRTE_MLX5_PMD)-lib += $(LIB_GLUE)
+endif
+
+# Basic CFLAGS.
+CFLAGS += -O3
+CFLAGS += -std=c11 -Wall -Wextra
+CFLAGS += -g
+CFLAGS += -I.
+CFLAGS += -D_BSD_SOURCE
+CFLAGS += -D_DEFAULT_SOURCE
+CFLAGS += -D_XOPEN_SOURCE=600
+CFLAGS += $(WERROR_FLAGS)
+CFLAGS += -Wno-strict-prototypes
+ifeq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
+CFLAGS += -DMLX5_GLUE='"$(LIB_GLUE)"'
+CFLAGS += -DMLX5_GLUE_VERSION='"$(LIB_GLUE_VERSION)"'
+CFLAGS_mlx5_glue.o += -fPIC
+LDLIBS += -ldl
+else ifeq ($(CONFIG_RTE_IBVERBS_LINK_STATIC),y)
+LDLIBS += $(shell $(RTE_SDK)/buildtools/options-ibverbs-static.sh)
+else
+LDLIBS += -libverbs -lmlx5
+endif
+
+LDLIBS += -lrte_eal
+
+# A few warnings cannot be avoided in external headers.
+CFLAGS += -Wno-error=cast-qual -DNDEBUG -UPEDANTIC
+
+EXPORT_MAP := rte_common_mlx5_version.map
+
+include $(RTE_SDK)/mk/rte.lib.mk
+
+# Generate and clean-up mlx5_autoconf.h.
+
+export CC CFLAGS CPPFLAGS EXTRA_CFLAGS EXTRA_CPPFLAGS
+export AUTO_CONFIG_CFLAGS = -Wno-error
+
+ifndef V
+AUTOCONF_OUTPUT := >/dev/null
+endif
+
+mlx5_autoconf.h.new: FORCE
+
+mlx5_autoconf.h.new: $(RTE_SDK)/buildtools/auto-config-h.sh
+	$Q $(RM) -f -- '$@'
+	$Q sh -- '$<' '$@' \
+		HAVE_IBV_DEVICE_STRIDING_RQ_SUPPORT \
+		infiniband/mlx5dv.h \
+		enum MLX5DV_CQE_RES_FORMAT_CSUM_STRIDX \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IBV_DEVICE_TUNNEL_SUPPORT \
+		infiniband/mlx5dv.h \
+		enum MLX5DV_CONTEXT_MASK_TUNNEL_OFFLOADS \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IBV_DEVICE_MPLS_SUPPORT \
+		infiniband/verbs.h \
+		enum IBV_FLOW_SPEC_MPLS \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IBV_WQ_FLAGS_PCI_WRITE_END_PADDING \
+		infiniband/verbs.h \
+		enum IBV_WQ_FLAGS_PCI_WRITE_END_PADDING \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IBV_WQ_FLAG_RX_END_PADDING \
+		infiniband/verbs.h \
+		enum IBV_WQ_FLAG_RX_END_PADDING \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IBV_MLX5_MOD_SWP \
+		infiniband/mlx5dv.h \
+		type 'struct mlx5dv_sw_parsing_caps' \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IBV_MLX5_MOD_MPW \
+		infiniband/mlx5dv.h \
+		enum MLX5DV_CONTEXT_FLAGS_MPW_ALLOWED \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IBV_MLX5_MOD_CQE_128B_COMP \
+		infiniband/mlx5dv.h \
+		enum MLX5DV_CONTEXT_FLAGS_CQE_128B_COMP \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IBV_MLX5_MOD_CQE_128B_PAD \
+		infiniband/mlx5dv.h \
+		enum MLX5DV_CQ_INIT_ATTR_FLAGS_CQE_PAD \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IBV_FLOW_DV_SUPPORT \
+		infiniband/mlx5dv.h \
+		func mlx5dv_create_flow_action_packet_reformat \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_MLX5DV_DR \
+		infiniband/mlx5dv.h \
+		enum MLX5DV_DR_DOMAIN_TYPE_NIC_RX \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_MLX5DV_DR_ESWITCH \
+		infiniband/mlx5dv.h \
+		enum MLX5DV_DR_DOMAIN_TYPE_FDB \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_MLX5DV_DR_VLAN \
+		infiniband/mlx5dv.h \
+		func mlx5dv_dr_action_create_push_vlan \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_MLX5DV_DR_DEVX_PORT \
+		infiniband/mlx5dv.h \
+		func mlx5dv_query_devx_port \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IBV_DEVX_OBJ \
+		infiniband/mlx5dv.h \
+		func mlx5dv_devx_obj_create \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IBV_FLOW_DEVX_COUNTERS \
+		infiniband/mlx5dv.h \
+		enum MLX5DV_FLOW_ACTION_COUNTERS_DEVX \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IBV_DEVX_ASYNC \
+		infiniband/mlx5dv.h \
+		func mlx5dv_devx_obj_query_async \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_MLX5DV_DR_ACTION_DEST_DEVX_TIR \
+		infiniband/mlx5dv.h \
+		func mlx5dv_dr_action_create_dest_devx_tir \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_MLX5_DR_CREATE_ACTION_FLOW_METER \
+		infiniband/mlx5dv.h \
+		func mlx5dv_dr_action_create_flow_meter \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_MLX5_DR_FLOW_DUMP \
+		infiniband/mlx5dv.h \
+		func mlx5dv_dump_dr_domain \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_MLX5DV_MMAP_GET_NC_PAGES_CMD \
+		infiniband/mlx5dv.h \
+		enum MLX5_MMAP_GET_NC_PAGES_CMD \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_ETHTOOL_LINK_MODE_25G \
+		/usr/include/linux/ethtool.h \
+		enum ETHTOOL_LINK_MODE_25000baseCR_Full_BIT \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_ETHTOOL_LINK_MODE_50G \
+		/usr/include/linux/ethtool.h \
+		enum ETHTOOL_LINK_MODE_50000baseCR2_Full_BIT \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_ETHTOOL_LINK_MODE_100G \
+		/usr/include/linux/ethtool.h \
+		enum ETHTOOL_LINK_MODE_100000baseKR4_Full_BIT \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IBV_DEVICE_COUNTERS_SET_V42 \
+		infiniband/verbs.h \
+		type 'struct ibv_counter_set_init_attr' \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IBV_DEVICE_COUNTERS_SET_V45 \
+		infiniband/verbs.h \
+		type 'struct ibv_counters_init_attr' \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_RDMA_NL_NLDEV \
+		rdma/rdma_netlink.h \
+		enum RDMA_NL_NLDEV \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_RDMA_NLDEV_CMD_GET \
+		rdma/rdma_netlink.h \
+		enum RDMA_NLDEV_CMD_GET \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_RDMA_NLDEV_CMD_PORT_GET \
+		rdma/rdma_netlink.h \
+		enum RDMA_NLDEV_CMD_PORT_GET \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_RDMA_NLDEV_ATTR_DEV_INDEX \
+		rdma/rdma_netlink.h \
+		enum RDMA_NLDEV_ATTR_DEV_INDEX \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_RDMA_NLDEV_ATTR_DEV_NAME \
+		rdma/rdma_netlink.h \
+		enum RDMA_NLDEV_ATTR_DEV_NAME \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_RDMA_NLDEV_ATTR_PORT_INDEX \
+		rdma/rdma_netlink.h \
+		enum RDMA_NLDEV_ATTR_PORT_INDEX \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_RDMA_NLDEV_ATTR_NDEV_INDEX \
+		rdma/rdma_netlink.h \
+		enum RDMA_NLDEV_ATTR_NDEV_INDEX \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IFLA_NUM_VF \
+		linux/if_link.h \
+		enum IFLA_NUM_VF \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IFLA_EXT_MASK \
+		linux/if_link.h \
+		enum IFLA_EXT_MASK \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IFLA_PHYS_SWITCH_ID \
+		linux/if_link.h \
+		enum IFLA_PHYS_SWITCH_ID \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IFLA_PHYS_PORT_NAME \
+		linux/if_link.h \
+		enum IFLA_PHYS_PORT_NAME \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_SUPPORTED_40000baseKR4_Full \
+		/usr/include/linux/ethtool.h \
+		define SUPPORTED_40000baseKR4_Full \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_SUPPORTED_40000baseCR4_Full \
+		/usr/include/linux/ethtool.h \
+		define SUPPORTED_40000baseCR4_Full \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_SUPPORTED_40000baseSR4_Full \
+		/usr/include/linux/ethtool.h \
+		define SUPPORTED_40000baseSR4_Full \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_SUPPORTED_40000baseLR4_Full \
+		/usr/include/linux/ethtool.h \
+		define SUPPORTED_40000baseLR4_Full \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_SUPPORTED_56000baseKR4_Full \
+		/usr/include/linux/ethtool.h \
+		define SUPPORTED_56000baseKR4_Full \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_SUPPORTED_56000baseCR4_Full \
+		/usr/include/linux/ethtool.h \
+		define SUPPORTED_56000baseCR4_Full \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_SUPPORTED_56000baseSR4_Full \
+		/usr/include/linux/ethtool.h \
+		define SUPPORTED_56000baseSR4_Full \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_SUPPORTED_56000baseLR4_Full \
+		/usr/include/linux/ethtool.h \
+		define SUPPORTED_56000baseLR4_Full \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_STATIC_ASSERT \
+		/usr/include/assert.h \
+		define static_assert \
+		$(AUTOCONF_OUTPUT)
+
+# Create mlx5_autoconf.h or update it in case it differs from the new one.
+
+mlx5_autoconf.h: mlx5_autoconf.h.new
+	$Q [ -f '$@' ] && \
+		cmp '$<' '$@' $(AUTOCONF_OUTPUT) || \
+		mv '$<' '$@'
+
+$(SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD):.c=.o): mlx5_autoconf.h
+
+# Generate dependency plug-in for rdma-core when the PMD must not be linked
+# directly, so that applications do not inherit this dependency.
+
+ifeq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
+
+$(LIB): $(LIB_GLUE)
+
+ifeq ($(LINK_USING_CC),1)
+GLUE_LDFLAGS := $(call linkerprefix,$(LDFLAGS))
+else
+GLUE_LDFLAGS := $(LDFLAGS)
+endif
+$(LIB_GLUE): mlx5_glue.o
+	$Q $(LD) $(GLUE_LDFLAGS) $(EXTRA_LDFLAGS) \
+		-Wl,-h,$(LIB_GLUE) \
+		-shared -o $@ $< -libverbs -lmlx5
+
+mlx5_glue.o: mlx5_autoconf.h
+
+endif
+
+clean_mlx5: FORCE
+	$Q rm -f -- mlx5_autoconf.h mlx5_autoconf.h.new
+	$Q rm -f -- mlx5_glue.o $(LIB_GLUE_BASE)*
+
+clean: clean_mlx5
diff --git a/drivers/common/mlx5/meson.build b/drivers/common/mlx5/meson.build
new file mode 100644
index 0000000..bfd07f9
--- /dev/null
+++ b/drivers/common/mlx5/meson.build
@@ -0,0 +1,205 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2019 Mellanox Technologies, Ltd
+
+if not is_linux
+	build = false
+	reason = 'only supported on Linux'
+	subdir_done()
+endif
+build = true
+
+pmd_dlopen = (get_option('ibverbs_link') == 'dlopen')
+LIB_GLUE_BASE = 'librte_pmd_mlx5_glue.so'
+LIB_GLUE_VERSION = '20.02.0'
+LIB_GLUE = LIB_GLUE_BASE + '.' + LIB_GLUE_VERSION
+if pmd_dlopen
+	dpdk_conf.set('RTE_IBVERBS_LINK_DLOPEN', 1)
+	cflags += [
+		'-DMLX5_GLUE="@0@"'.format(LIB_GLUE),
+		'-DMLX5_GLUE_VERSION="@0@"'.format(LIB_GLUE_VERSION),
+	]
+endif
+
+libnames = [ 'mlx5', 'ibverbs' ]
+libs = []
+foreach libname:libnames
+	lib = dependency('lib' + libname, required:false)
+	if not lib.found()
+		lib = cc.find_library(libname, required:false)
+	endif
+	if lib.found()
+		libs += lib
+		ext_deps += lib
+	else
+		build = false
+		reason = 'missing dependency, "' + libname + '"'
+	endif
+endforeach
+
+if build
+	allow_experimental_apis = true
+	deps += ['hash', 'pci', 'net', 'eal']
+	sources = files(
+		'mlx5_devx_cmds.c',
+		'mlx5_common.c',
+	)
+	if not pmd_dlopen
+		sources += files('mlx5_glue.c')
+	endif
+	cflags_options = [
+		'-std=c11',
+		'-Wno-strict-prototypes',
+		'-D_BSD_SOURCE',
+		'-D_DEFAULT_SOURCE',
+		'-D_XOPEN_SOURCE=600'
+	]
+	foreach option:cflags_options
+		if cc.has_argument(option)
+			cflags += option
+		endif
+	endforeach
+	if get_option('buildtype').contains('debug')
+		cflags += [ '-pedantic', '-UNDEBUG', '-DPEDANTIC' ]
+	else
+		cflags += [ '-DNDEBUG', '-UPEDANTIC' ]
+	endif
+	# To maintain the compatibility with the make build system
+	# mlx5_autoconf.h file is still generated.
+	# input array for meson member search:
+	# [ "MACRO to define if found", "header for the search",
+	#   "symbol to search", "struct member to search" ]
+	has_member_args = [
+		[ 'HAVE_IBV_MLX5_MOD_SWP', 'infiniband/mlx5dv.h',
+		'struct mlx5dv_sw_parsing_caps', 'sw_parsing_offloads' ],
+		[ 'HAVE_IBV_DEVICE_COUNTERS_SET_V42', 'infiniband/verbs.h',
+		'struct ibv_counter_set_init_attr', 'counter_set_id' ],
+		[ 'HAVE_IBV_DEVICE_COUNTERS_SET_V45', 'infiniband/verbs.h',
+		'struct ibv_counters_init_attr', 'comp_mask' ],
+	]
+	# input array for meson symbol search:
+	# [ "MACRO to define if found", "header for the search",
+	#   "symbol to search" ]
+	has_sym_args = [
+		[ 'HAVE_IBV_DEVICE_STRIDING_RQ_SUPPORT', 'infiniband/mlx5dv.h',
+		'MLX5DV_CQE_RES_FORMAT_CSUM_STRIDX' ],
+		[ 'HAVE_IBV_DEVICE_TUNNEL_SUPPORT', 'infiniband/mlx5dv.h',
+		'MLX5DV_CONTEXT_MASK_TUNNEL_OFFLOADS' ],
+		[ 'HAVE_IBV_MLX5_MOD_MPW', 'infiniband/mlx5dv.h',
+		'MLX5DV_CONTEXT_FLAGS_MPW_ALLOWED' ],
+		[ 'HAVE_IBV_MLX5_MOD_CQE_128B_COMP', 'infiniband/mlx5dv.h',
+		'MLX5DV_CONTEXT_FLAGS_CQE_128B_COMP' ],
+		[ 'HAVE_IBV_MLX5_MOD_CQE_128B_PAD', 'infiniband/mlx5dv.h',
+		'MLX5DV_CQ_INIT_ATTR_FLAGS_CQE_PAD' ],
+		[ 'HAVE_IBV_FLOW_DV_SUPPORT', 'infiniband/mlx5dv.h',
+		'mlx5dv_create_flow_action_packet_reformat' ],
+		[ 'HAVE_IBV_DEVICE_MPLS_SUPPORT', 'infiniband/verbs.h',
+		'IBV_FLOW_SPEC_MPLS' ],
+		[ 'HAVE_IBV_WQ_FLAGS_PCI_WRITE_END_PADDING', 'infiniband/verbs.h',
+		'IBV_WQ_FLAGS_PCI_WRITE_END_PADDING' ],
+		[ 'HAVE_IBV_WQ_FLAG_RX_END_PADDING', 'infiniband/verbs.h',
+		'IBV_WQ_FLAG_RX_END_PADDING' ],
+		[ 'HAVE_MLX5DV_DR_DEVX_PORT', 'infiniband/mlx5dv.h',
+		'mlx5dv_query_devx_port' ],
+		[ 'HAVE_IBV_DEVX_OBJ', 'infiniband/mlx5dv.h',
+		'mlx5dv_devx_obj_create' ],
+		[ 'HAVE_IBV_FLOW_DEVX_COUNTERS', 'infiniband/mlx5dv.h',
+		'MLX5DV_FLOW_ACTION_COUNTERS_DEVX' ],
+		[ 'HAVE_IBV_DEVX_ASYNC', 'infiniband/mlx5dv.h',
+		'mlx5dv_devx_obj_query_async' ],
+		[ 'HAVE_MLX5DV_DR_ACTION_DEST_DEVX_TIR', 'infiniband/mlx5dv.h',
+		'mlx5dv_dr_action_create_dest_devx_tir' ],
+		[ 'HAVE_MLX5_DR_CREATE_ACTION_FLOW_METER', 'infiniband/mlx5dv.h',
+		'mlx5dv_dr_action_create_flow_meter' ],
+		[ 'HAVE_MLX5DV_MMAP_GET_NC_PAGES_CMD', 'infiniband/mlx5dv.h',
+		'MLX5_MMAP_GET_NC_PAGES_CMD' ],
+		[ 'HAVE_MLX5DV_DR', 'infiniband/mlx5dv.h',
+		'MLX5DV_DR_DOMAIN_TYPE_NIC_RX' ],
+		[ 'HAVE_MLX5DV_DR_ESWITCH', 'infiniband/mlx5dv.h',
+		'MLX5DV_DR_DOMAIN_TYPE_FDB' ],
+		[ 'HAVE_MLX5DV_DR_VLAN', 'infiniband/mlx5dv.h',
+		'mlx5dv_dr_action_create_push_vlan' ],
+		[ 'HAVE_SUPPORTED_40000baseKR4_Full', 'linux/ethtool.h',
+		'SUPPORTED_40000baseKR4_Full' ],
+		[ 'HAVE_SUPPORTED_40000baseCR4_Full', 'linux/ethtool.h',
+		'SUPPORTED_40000baseCR4_Full' ],
+		[ 'HAVE_SUPPORTED_40000baseSR4_Full', 'linux/ethtool.h',
+		'SUPPORTED_40000baseSR4_Full' ],
+		[ 'HAVE_SUPPORTED_40000baseLR4_Full', 'linux/ethtool.h',
+		'SUPPORTED_40000baseLR4_Full' ],
+		[ 'HAVE_SUPPORTED_56000baseKR4_Full', 'linux/ethtool.h',
+		'SUPPORTED_56000baseKR4_Full' ],
+		[ 'HAVE_SUPPORTED_56000baseCR4_Full', 'linux/ethtool.h',
+		'SUPPORTED_56000baseCR4_Full' ],
+		[ 'HAVE_SUPPORTED_56000baseSR4_Full', 'linux/ethtool.h',
+		'SUPPORTED_56000baseSR4_Full' ],
+		[ 'HAVE_SUPPORTED_56000baseLR4_Full', 'linux/ethtool.h',
+		'SUPPORTED_56000baseLR4_Full' ],
+		[ 'HAVE_ETHTOOL_LINK_MODE_25G', 'linux/ethtool.h',
+		'ETHTOOL_LINK_MODE_25000baseCR_Full_BIT' ],
+		[ 'HAVE_ETHTOOL_LINK_MODE_50G', 'linux/ethtool.h',
+		'ETHTOOL_LINK_MODE_50000baseCR2_Full_BIT' ],
+		[ 'HAVE_ETHTOOL_LINK_MODE_100G', 'linux/ethtool.h',
+		'ETHTOOL_LINK_MODE_100000baseKR4_Full_BIT' ],
+		[ 'HAVE_IFLA_NUM_VF', 'linux/if_link.h',
+		'IFLA_NUM_VF' ],
+		[ 'HAVE_IFLA_EXT_MASK', 'linux/if_link.h',
+		'IFLA_EXT_MASK' ],
+		[ 'HAVE_IFLA_PHYS_SWITCH_ID', 'linux/if_link.h',
+		'IFLA_PHYS_SWITCH_ID' ],
+		[ 'HAVE_IFLA_PHYS_PORT_NAME', 'linux/if_link.h',
+		'IFLA_PHYS_PORT_NAME' ],
+		[ 'HAVE_RDMA_NL_NLDEV', 'rdma/rdma_netlink.h',
+		'RDMA_NL_NLDEV' ],
+		[ 'HAVE_RDMA_NLDEV_CMD_GET', 'rdma/rdma_netlink.h',
+		'RDMA_NLDEV_CMD_GET' ],
+		[ 'HAVE_RDMA_NLDEV_CMD_PORT_GET', 'rdma/rdma_netlink.h',
+		'RDMA_NLDEV_CMD_PORT_GET' ],
+		[ 'HAVE_RDMA_NLDEV_ATTR_DEV_INDEX', 'rdma/rdma_netlink.h',
+		'RDMA_NLDEV_ATTR_DEV_INDEX' ],
+		[ 'HAVE_RDMA_NLDEV_ATTR_DEV_NAME', 'rdma/rdma_netlink.h',
+		'RDMA_NLDEV_ATTR_DEV_NAME' ],
+		[ 'HAVE_RDMA_NLDEV_ATTR_PORT_INDEX', 'rdma/rdma_netlink.h',
+		'RDMA_NLDEV_ATTR_PORT_INDEX' ],
+		[ 'HAVE_RDMA_NLDEV_ATTR_NDEV_INDEX', 'rdma/rdma_netlink.h',
+		'RDMA_NLDEV_ATTR_NDEV_INDEX' ],
+		[ 'HAVE_MLX5_DR_FLOW_DUMP', 'infiniband/mlx5dv.h',
+		'mlx5dv_dump_dr_domain'],
+	]
+	config = configuration_data()
+	foreach arg:has_sym_args
+		config.set(arg[0], cc.has_header_symbol(arg[1], arg[2],
+			dependencies: libs))
+	endforeach
+	foreach arg:has_member_args
+		file_prefix = '#include <' + arg[1] + '>'
+		config.set(arg[0], cc.has_member(arg[2], arg[3],
+			prefix : file_prefix, dependencies: libs))
+	endforeach
+	configure_file(output : 'mlx5_autoconf.h', configuration : config)
+endif
+# Build Glue Library
+if pmd_dlopen and build
+	dlopen_name = 'mlx5_glue'
+	dlopen_lib_name = driver_name_fmt.format(dlopen_name)
+	dlopen_so_version = LIB_GLUE_VERSION
+	dlopen_sources = files('mlx5_glue.c')
+	dlopen_install_dir = [ eal_pmd_path + '-glue' ]
+	dlopen_includes = [global_inc]
+	dlopen_includes += include_directories(
+		'../../../lib/librte_eal/common/include/generic',
+	)
+	shared_lib = shared_library(
+		dlopen_lib_name,
+		dlopen_sources,
+		include_directories: dlopen_includes,
+		c_args: cflags,
+		dependencies: libs,
+		link_args: [
+		'-Wl,-export-dynamic',
+		'-Wl,-h,@0@'.format(LIB_GLUE),
+		],
+		soversion: dlopen_so_version,
+		install: true,
+		install_dir: dlopen_install_dir,
+	)
+endif
diff --git a/drivers/common/mlx5/mlx5_common.c b/drivers/common/mlx5/mlx5_common.c
new file mode 100644
index 0000000..14ebd30
--- /dev/null
+++ b/drivers/common/mlx5/mlx5_common.c
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2019 Mellanox Technologies, Ltd
+ */
+
+#include "mlx5_common.h"
+
+
+int mlx5_common_logtype;
+
+
+RTE_INIT(rte_mlx5_common_pmd_init)
+{
+	/* Initialize driver log type. */
+	mlx5_common_logtype = rte_log_register("pmd.common.mlx5");
+	if (mlx5_common_logtype >= 0)
+		rte_log_set_level(mlx5_common_logtype, RTE_LOG_NOTICE);
+}
diff --git a/drivers/common/mlx5/mlx5_common.h b/drivers/common/mlx5/mlx5_common.h
new file mode 100644
index 0000000..9f10def
--- /dev/null
+++ b/drivers/common/mlx5/mlx5_common.h
@@ -0,0 +1,87 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2019 Mellanox Technologies, Ltd
+ */
+
+#ifndef RTE_PMD_MLX5_COMMON_H_
+#define RTE_PMD_MLX5_COMMON_H_
+
+#include <assert.h>
+
+#include <rte_log.h>
+
+
+/*
+ * Helper macros to work around __VA_ARGS__ limitations in a C99 compliant
+ * manner.
+ */
+#define PMD_DRV_LOG_STRIP(a, b) a
+#define PMD_DRV_LOG_OPAREN (
+#define PMD_DRV_LOG_CPAREN )
+#define PMD_DRV_LOG_COMMA ,
+
+/* Return the file name part of a path. */
+static inline const char *
+pmd_drv_log_basename(const char *s)
+{
+	const char *n = s;
+
+	while (*n)
+		if (*(n++) == '/')
+			s = n;
+	return s;
+}
+
+#define PMD_DRV_LOG___(level, type, name, ...) \
+	rte_log(RTE_LOG_ ## level, \
+		type, \
+		RTE_FMT(name ": " \
+			RTE_FMT_HEAD(__VA_ARGS__,), \
+		RTE_FMT_TAIL(__VA_ARGS__,)))
+
+/*
+ * When debugging is enabled (NDEBUG not defined), file, line and function
+ * information follow the log prefix (the "name" argument) in log messages.
+ */
+#ifndef NDEBUG
+
+#define PMD_DRV_LOG__(level, type, name, ...) \
+	PMD_DRV_LOG___(level, type, name, "%s:%u: %s(): " __VA_ARGS__)
+#define PMD_DRV_LOG_(level, type, name, s, ...) \
+	PMD_DRV_LOG__(level, type, name,\
+		s "\n" PMD_DRV_LOG_COMMA \
+		pmd_drv_log_basename(__FILE__) PMD_DRV_LOG_COMMA \
+		__LINE__ PMD_DRV_LOG_COMMA \
+		__func__, \
+		__VA_ARGS__)
+
+#else /* NDEBUG */
+#define PMD_DRV_LOG__(level, type, name, ...) \
+	PMD_DRV_LOG___(level, type, name, __VA_ARGS__)
+#define PMD_DRV_LOG_(level, type, name, s, ...) \
+	PMD_DRV_LOG__(level, type, name, s "\n", __VA_ARGS__)
+
+#endif /* NDEBUG */
+
+/* claim_zero() does not perform any check when debugging is disabled. */
+#ifndef NDEBUG
+
+#define DEBUG(...) DRV_LOG(DEBUG, __VA_ARGS__)
+#define claim_zero(...) assert((__VA_ARGS__) == 0)
+#define claim_nonzero(...) assert((__VA_ARGS__) != 0)
+
+#else /* NDEBUG */
+
+#define DEBUG(...) (void)0
+#define claim_zero(...) (__VA_ARGS__)
+#define claim_nonzero(...) (__VA_ARGS__)
+
+#endif /* NDEBUG */
+
+/* Allocate a buffer on the stack and fill it with a printf format string. */
+#define MKSTR(name, ...) \
+	int mkstr_size_##name = snprintf(NULL, 0, "" __VA_ARGS__); \
+	char name[mkstr_size_##name + 1]; \
+	\
+	snprintf(name, sizeof(name), "" __VA_ARGS__)
+
+#endif /* RTE_PMD_MLX5_COMMON_H_ */
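The two self-contained helpers in the header above can be tried outside DPDK. The following is only a sketch under stated assumptions: `basename_sketch`, `MKSTR_SKETCH` and `format_location` are illustrative names that do not exist in the patch. The first two copy the logic of pmd_drv_log_basename() and MKSTR(); the third is a made-up caller.

```c
/* Standalone sketch, not part of the patch. All names are hypothetical. */
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Same logic as pmd_drv_log_basename(): remember the position after each '/'. */
static const char *
basename_sketch(const char *s)
{
	const char *n = s;

	while (*n)
		if (*(n++) == '/')
			s = n;
	return s;
}

/*
 * Same shape as MKSTR(): the first snprintf() call only measures the
 * formatted length, the second one fills a stack buffer of that size.
 */
#define MKSTR_SKETCH(name, ...) \
	int mkstr_size_##name = snprintf(NULL, 0, "" __VA_ARGS__); \
	char name[mkstr_size_##name + 1]; \
	\
	snprintf(name, sizeof(name), "" __VA_ARGS__)

/* Hypothetical caller: format "<file>:<line>" into out, return its length. */
static int
format_location(char *out, size_t out_sz, const char *path, unsigned int line)
{
	MKSTR_SKETCH(loc, "%s:%u", basename_sketch(path), line);

	assert(out_sz > 0);
	snprintf(out, out_sz, "%s", loc);
	return (int)strlen(loc);
}
```

Because the macro expands to two declarations followed by a statement, it can only be used at block scope and only once per buffer name within a scope.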
diff --git a/drivers/common/mlx5/mlx5_common_utils.h b/drivers/common/mlx5/mlx5_common_utils.h
new file mode 100644
index 0000000..32c3adf
--- /dev/null
+++ b/drivers/common/mlx5/mlx5_common_utils.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2019 Mellanox Technologies, Ltd
+ */
+
+#ifndef RTE_PMD_MLX5_COMMON_UTILS_H_
+#define RTE_PMD_MLX5_COMMON_UTILS_H_
+
+#include "mlx5_common.h"
+
+
+extern int mlx5_common_logtype;
+
+#define MLX5_COMMON_LOG_PREFIX "common_mlx5"
+/* Generic printf()-like logging macro with automatic line feed. */
+#define DRV_LOG(level, ...) \
+	PMD_DRV_LOG_(level, mlx5_common_logtype, MLX5_COMMON_LOG_PREFIX, \
+		__VA_ARGS__ PMD_DRV_LOG_STRIP PMD_DRV_LOG_OPAREN, \
+		PMD_DRV_LOG_CPAREN)
+
+#endif /* RTE_PMD_MLX5_COMMON_UTILS_H_ */
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
new file mode 100644
index 0000000..4d94f92
--- /dev/null
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -0,0 +1,976 @@
+// SPDX-License-Identifier: BSD-3-Clause
+/* Copyright 2018 Mellanox Technologies, Ltd */
+
+#include <unistd.h>
+
+#include <rte_errno.h>
+#include <rte_malloc.h>
+
+#include "mlx5_prm.h"
+#include "mlx5_devx_cmds.h"
+#include "mlx5_common_utils.h"
+
+
+/**
+ * Allocate flow counters via devx interface.
+ *
+ * @param[in] ctx
+ *   ibv context returned from mlx5dv_open_device.
+ * @param[in] bulk_n_128
+ *   Number of counters to allocate in bulk, in units of 128 counters.
+ *
+ * @return
+ *   Pointer to counter object on success, NULL otherwise and
+ *   rte_errno is set.
+ */
+struct mlx5_devx_obj *
+mlx5_devx_cmd_flow_counter_alloc(struct ibv_context *ctx, uint32_t bulk_n_128)
+{
+	struct mlx5_devx_obj *dcs = rte_zmalloc("dcs", sizeof(*dcs), 0);
+	uint32_t in[MLX5_ST_SZ_DW(alloc_flow_counter_in)]   = {0};
+	uint32_t out[MLX5_ST_SZ_DW(alloc_flow_counter_out)] = {0};
+
+	if (!dcs) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	MLX5_SET(alloc_flow_counter_in, in, opcode,
+		 MLX5_CMD_OP_ALLOC_FLOW_COUNTER);
+	MLX5_SET(alloc_flow_counter_in, in, flow_counter_bulk, bulk_n_128);
+	dcs->obj = mlx5_glue->devx_obj_create(ctx, in,
+					      sizeof(in), out, sizeof(out));
+	if (!dcs->obj) {
+		DRV_LOG(ERR, "Can't allocate counters - error %d", errno);
+		rte_errno = errno;
+		rte_free(dcs);
+		return NULL;
+	}
+	dcs->id = MLX5_GET(alloc_flow_counter_out, out, flow_counter_id);
+	return dcs;
+}
+
+/**
+ * Query flow counters values.
+ *
+ * @param[in] dcs
+ *   devx object that was obtained from mlx5_devx_cmd_flow_counter_alloc.
+ * @param[in] clear
+ *   Whether hardware should clear the counters after the query or not.
+ * @param[in] n_counters
+ *   0 to read a single counter, otherwise the number of counters to read.
+ * @param[out] pkts
+ *   The number of packets that matched the flow.
+ * @param[out] bytes
+ *   The number of bytes that matched the flow.
+ * @param[in] mkey
+ *   The mkey for batch query.
+ * @param[in] addr
+ *   The address in the mkey range for batch query.
+ * @param[in] cmd_comp
+ *   The completion object for asynchronous batch query.
+ * @param[in] async_id
+ *   The ID to be returned in the asynchronous batch query response.
+ *
+ * @return
+ *   0 on success, a negative value otherwise.
+ */
+int
+mlx5_devx_cmd_flow_counter_query(struct mlx5_devx_obj *dcs,
+				 int clear, uint32_t n_counters,
+				 uint64_t *pkts, uint64_t *bytes,
+				 uint32_t mkey, void *addr,
+				 struct mlx5dv_devx_cmd_comp *cmd_comp,
+				 uint64_t async_id)
+{
+	int out_len = MLX5_ST_SZ_BYTES(query_flow_counter_out) +
+			MLX5_ST_SZ_BYTES(traffic_counter);
+	uint32_t out[out_len];
+	uint32_t in[MLX5_ST_SZ_DW(query_flow_counter_in)] = {0};
+	void *stats;
+	int rc;
+
+	MLX5_SET(query_flow_counter_in, in, opcode,
+		 MLX5_CMD_OP_QUERY_FLOW_COUNTER);
+	MLX5_SET(query_flow_counter_in, in, op_mod, 0);
+	MLX5_SET(query_flow_counter_in, in, flow_counter_id, dcs->id);
+	MLX5_SET(query_flow_counter_in, in, clear, !!clear);
+
+	if (n_counters) {
+		MLX5_SET(query_flow_counter_in, in, num_of_counters,
+			 n_counters);
+		MLX5_SET(query_flow_counter_in, in, dump_to_memory, 1);
+		MLX5_SET(query_flow_counter_in, in, mkey, mkey);
+		MLX5_SET64(query_flow_counter_in, in, address,
+			   (uint64_t)(uintptr_t)addr);
+	}
+	if (!cmd_comp)
+		rc = mlx5_glue->devx_obj_query(dcs->obj, in, sizeof(in), out,
+					       out_len);
+	else
+		rc = mlx5_glue->devx_obj_query_async(dcs->obj, in, sizeof(in),
+						     out_len, async_id,
+						     cmd_comp);
+	if (rc) {
+		DRV_LOG(ERR, "Failed to query devx counters with rc %d", rc);
+		rte_errno = rc;
+		return -rc;
+	}
+	if (!n_counters) {
+		stats = MLX5_ADDR_OF(query_flow_counter_out,
+				     out, flow_statistics);
+		*pkts = MLX5_GET64(traffic_counter, stats, packets);
+		*bytes = MLX5_GET64(traffic_counter, stats, octets);
+	}
+	return 0;
+}
+
+/**
+ * Create a new mkey.
+ *
+ * @param[in] ctx
+ *   ibv context returned from mlx5dv_open_device.
+ * @param[in] attr
+ *   Attributes of the requested mkey.
+ *
+ * @return
+ *   Pointer to Devx mkey on success, NULL otherwise and rte_errno
+ *   is set.
+ */
+struct mlx5_devx_obj *
+mlx5_devx_cmd_mkey_create(struct ibv_context *ctx,
+			  struct mlx5_devx_mkey_attr *attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(create_mkey_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(create_mkey_out)] = {0};
+	void *mkc;
+	struct mlx5_devx_obj *mkey = rte_zmalloc("mkey", sizeof(*mkey), 0);
+	size_t pgsize;
+	uint32_t translation_size;
+
+	if (!mkey) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	pgsize = sysconf(_SC_PAGESIZE);
+	translation_size = (RTE_ALIGN(attr->size, pgsize) * 8) / 16;
+	MLX5_SET(create_mkey_in, in, opcode, MLX5_CMD_OP_CREATE_MKEY);
+	MLX5_SET(create_mkey_in, in, translations_octword_actual_size,
+		 translation_size);
+	MLX5_SET(create_mkey_in, in, mkey_umem_id, attr->umem_id);
+	mkc = MLX5_ADDR_OF(create_mkey_in, in, memory_key_mkey_entry);
+	MLX5_SET(mkc, mkc, lw, 0x1);
+	MLX5_SET(mkc, mkc, lr, 0x1);
+	MLX5_SET(mkc, mkc, access_mode_1_0, MLX5_MKC_ACCESS_MODE_MTT);
+	MLX5_SET(mkc, mkc, qpn, 0xffffff);
+	MLX5_SET(mkc, mkc, pd, attr->pd);
+	MLX5_SET(mkc, mkc, mkey_7_0, attr->umem_id & 0xFF);
+	MLX5_SET(mkc, mkc, translations_octword_size, translation_size);
+	MLX5_SET64(mkc, mkc, start_addr, attr->addr);
+	MLX5_SET64(mkc, mkc, len, attr->size);
+	MLX5_SET(mkc, mkc, log_page_size, rte_log2_u32(pgsize));
+	mkey->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in), out,
+					       sizeof(out));
+	if (!mkey->obj) {
+		DRV_LOG(ERR, "Can't create mkey - error %d", errno);
+		rte_errno = errno;
+		rte_free(mkey);
+		return NULL;
+	}
+	mkey->id = MLX5_GET(create_mkey_out, out, mkey_index);
+	mkey->id = (mkey->id << 8) | (attr->umem_id & 0xFF);
+	return mkey;
+}
+
+/**
+ * Get status of devx command response.
+ * Mainly used for asynchronous commands.
+ *
+ * @param[in] out
+ *   The out response buffer.
+ *
+ * @return
+ *   0 on success, non-zero value otherwise.
+ */
+int
+mlx5_devx_get_out_command_status(void *out)
+{
+	int status;
+
+	if (!out)
+		return -EINVAL;
+	status = MLX5_GET(query_flow_counter_out, out, status);
+	if (status) {
+		int syndrome = MLX5_GET(query_flow_counter_out, out, syndrome);
+
+		DRV_LOG(ERR, "Bad devX status %x, syndrome = %x", status,
+			syndrome);
+	}
+	return status;
+}
+
+/**
+ * Destroy any object allocated by a Devx API.
+ *
+ * @param[in] obj
+ *   Pointer to a general object.
+ *
+ * @return
+ *   0 on success, a negative value otherwise.
+ */
+int
+mlx5_devx_cmd_destroy(struct mlx5_devx_obj *obj)
+{
+	int ret;
+
+	if (!obj)
+		return 0;
+	ret = mlx5_glue->devx_obj_destroy(obj->obj);
+	rte_free(obj);
+	return ret;
+}
+
+/**
+ * Query NIC vport context.
+ * Fills minimal inline attribute.
+ *
+ * @param[in] ctx
+ *   ibv context returned from mlx5dv_open_device.
+ * @param[in] vport
+ *   vport index.
+ * @param[out] attr
+ *   Device attribute values.
+ *
+ * @return
+ *   0 on success, a negative value otherwise.
+ */
+static int
+mlx5_devx_cmd_query_nic_vport_context(struct ibv_context *ctx,
+				      unsigned int vport,
+				      struct mlx5_hca_attr *attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(query_nic_vport_context_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(query_nic_vport_context_out)] = {0};
+	void *vctx;
+	int status, syndrome, rc;
+
+	/* Query NIC vport context to determine inline mode. */
+	MLX5_SET(query_nic_vport_context_in, in, opcode,
+		 MLX5_CMD_OP_QUERY_NIC_VPORT_CONTEXT);
+	MLX5_SET(query_nic_vport_context_in, in, vport_number, vport);
+	if (vport)
+		MLX5_SET(query_nic_vport_context_in, in, other_vport, 1);
+	rc = mlx5_glue->devx_general_cmd(ctx,
+					 in, sizeof(in),
+					 out, sizeof(out));
+	if (rc)
+		goto error;
+	status = MLX5_GET(query_nic_vport_context_out, out, status);
+	syndrome = MLX5_GET(query_nic_vport_context_out, out, syndrome);
+	if (status) {
+		DRV_LOG(DEBUG, "Failed to query NIC vport context, "
+			"status %x, syndrome = %x",
+			status, syndrome);
+		return -1;
+	}
+	vctx = MLX5_ADDR_OF(query_nic_vport_context_out, out,
+			    nic_vport_context);
+	attr->vport_inline_mode = MLX5_GET(nic_vport_context, vctx,
+					   min_wqe_inline_mode);
+	return 0;
+error:
+	rc = (rc > 0) ? -rc : rc;
+	return rc;
+}
+
+/**
+ * Query HCA attributes.
+ * These attributes allow checking at run time whether the device
+ * has the required capabilities.
+ *
+ * @param[in] ctx
+ *   ibv context returned from mlx5dv_open_device.
+ * @param[out] attr
+ *   Device attribute values.
+ *
+ * @return
+ *   0 on success, a negative value otherwise.
+ */
+int
+mlx5_devx_cmd_query_hca_attr(struct ibv_context *ctx,
+			     struct mlx5_hca_attr *attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(query_hca_cap_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(query_hca_cap_out)] = {0};
+	void *hcattr;
+	int status, syndrome, rc;
+
+	MLX5_SET(query_hca_cap_in, in, opcode, MLX5_CMD_OP_QUERY_HCA_CAP);
+	MLX5_SET(query_hca_cap_in, in, op_mod,
+		 MLX5_GET_HCA_CAP_OP_MOD_GENERAL_DEVICE |
+		 MLX5_HCA_CAP_OPMOD_GET_CUR);
+
+	rc = mlx5_glue->devx_general_cmd(ctx,
+					 in, sizeof(in), out, sizeof(out));
+	if (rc)
+		goto error;
+	status = MLX5_GET(query_hca_cap_out, out, status);
+	syndrome = MLX5_GET(query_hca_cap_out, out, syndrome);
+	if (status) {
+		DRV_LOG(DEBUG, "Failed to query devx HCA capabilities, "
+			"status %x, syndrome = %x",
+			status, syndrome);
+		return -1;
+	}
+	hcattr = MLX5_ADDR_OF(query_hca_cap_out, out, capability);
+	attr->flow_counter_bulk_alloc_bitmap =
+			MLX5_GET(cmd_hca_cap, hcattr, flow_counter_bulk_alloc);
+	attr->flow_counters_dump = MLX5_GET(cmd_hca_cap, hcattr,
+					    flow_counters_dump);
+	attr->eswitch_manager = MLX5_GET(cmd_hca_cap, hcattr, eswitch_manager);
+	attr->hairpin = MLX5_GET(cmd_hca_cap, hcattr, hairpin);
+	attr->log_max_hairpin_queues = MLX5_GET(cmd_hca_cap, hcattr,
+						log_max_hairpin_queues);
+	attr->log_max_hairpin_wq_data_sz = MLX5_GET(cmd_hca_cap, hcattr,
+						    log_max_hairpin_wq_data_sz);
+	attr->log_max_hairpin_num_packets = MLX5_GET
+		(cmd_hca_cap, hcattr, log_min_hairpin_wq_data_sz);
+	attr->vhca_id = MLX5_GET(cmd_hca_cap, hcattr, vhca_id);
+	attr->eth_net_offloads = MLX5_GET(cmd_hca_cap, hcattr,
+					  eth_net_offloads);
+	attr->eth_virt = MLX5_GET(cmd_hca_cap, hcattr, eth_virt);
+	attr->flex_parser_protocols = MLX5_GET(cmd_hca_cap, hcattr,
+					       flex_parser_protocols);
+	attr->qos.sup = MLX5_GET(cmd_hca_cap, hcattr, qos);
+	if (attr->qos.sup) {
+		MLX5_SET(query_hca_cap_in, in, op_mod,
+			 MLX5_GET_HCA_CAP_OP_MOD_QOS_CAP |
+			 MLX5_HCA_CAP_OPMOD_GET_CUR);
+		rc = mlx5_glue->devx_general_cmd(ctx, in, sizeof(in),
+						 out, sizeof(out));
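+		if (rc)
+			goto error;
+		status = MLX5_GET(query_hca_cap_out, out, status);
+		syndrome = MLX5_GET(query_hca_cap_out, out, syndrome);
+		if (status) {
+			DRV_LOG(DEBUG, "Failed to query devx QOS capabilities,"
+				" status %x, syndrome = %x", status, syndrome);
+			return -1;
+		}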
+		hcattr = MLX5_ADDR_OF(query_hca_cap_out, out, capability);
+		attr->qos.srtcm_sup =
+				MLX5_GET(qos_cap, hcattr, flow_meter_srtcm);
+		attr->qos.log_max_flow_meter =
+				MLX5_GET(qos_cap, hcattr, log_max_flow_meter);
+		attr->qos.flow_meter_reg_c_ids =
+			MLX5_GET(qos_cap, hcattr, flow_meter_reg_id);
+		attr->qos.flow_meter_reg_share =
+			MLX5_GET(qos_cap, hcattr, flow_meter_reg_share);
+	}
+	if (!attr->eth_net_offloads)
+		return 0;
+
+	/* Query HCA offloads for Ethernet protocol. */
+	memset(in, 0, sizeof(in));
+	memset(out, 0, sizeof(out));
+	MLX5_SET(query_hca_cap_in, in, opcode, MLX5_CMD_OP_QUERY_HCA_CAP);
+	MLX5_SET(query_hca_cap_in, in, op_mod,
+		 MLX5_GET_HCA_CAP_OP_MOD_ETHERNET_OFFLOAD_CAPS |
+		 MLX5_HCA_CAP_OPMOD_GET_CUR);
+
+	rc = mlx5_glue->devx_general_cmd(ctx,
+					 in, sizeof(in),
+					 out, sizeof(out));
+	if (rc) {
+		attr->eth_net_offloads = 0;
+		goto error;
+	}
+	status = MLX5_GET(query_hca_cap_out, out, status);
+	syndrome = MLX5_GET(query_hca_cap_out, out, syndrome);
+	if (status) {
+		DRV_LOG(DEBUG, "Failed to query devx HCA capabilities, "
+			"status %x, syndrome = %x",
+			status, syndrome);
+		attr->eth_net_offloads = 0;
+		return -1;
+	}
+	hcattr = MLX5_ADDR_OF(query_hca_cap_out, out, capability);
+	attr->wqe_vlan_insert = MLX5_GET(per_protocol_networking_offload_caps,
+					 hcattr, wqe_vlan_insert);
+	attr->lro_cap = MLX5_GET(per_protocol_networking_offload_caps, hcattr,
+				 lro_cap);
+	attr->tunnel_lro_gre = MLX5_GET(per_protocol_networking_offload_caps,
+					hcattr, tunnel_lro_gre);
+	attr->tunnel_lro_vxlan = MLX5_GET(per_protocol_networking_offload_caps,
+					  hcattr, tunnel_lro_vxlan);
+	attr->lro_max_msg_sz_mode = MLX5_GET
+					(per_protocol_networking_offload_caps,
+					 hcattr, lro_max_msg_sz_mode);
+	for (int i = 0 ; i < MLX5_LRO_NUM_SUPP_PERIODS ; i++) {
+		attr->lro_timer_supported_periods[i] =
+			MLX5_GET(per_protocol_networking_offload_caps, hcattr,
+				 lro_timer_supported_periods[i]);
+	}
+	attr->tunnel_stateless_geneve_rx =
+			    MLX5_GET(per_protocol_networking_offload_caps,
+				     hcattr, tunnel_stateless_geneve_rx);
+	attr->geneve_max_opt_len =
+		    MLX5_GET(per_protocol_networking_offload_caps,
+			     hcattr, max_geneve_opt_len);
+	attr->wqe_inline_mode = MLX5_GET(per_protocol_networking_offload_caps,
+					 hcattr, wqe_inline_mode);
+	attr->tunnel_stateless_gtp = MLX5_GET
+					(per_protocol_networking_offload_caps,
+					 hcattr, tunnel_stateless_gtp);
+	if (attr->wqe_inline_mode != MLX5_CAP_INLINE_MODE_VPORT_CONTEXT)
+		return 0;
+	if (attr->eth_virt) {
+		rc = mlx5_devx_cmd_query_nic_vport_context(ctx, 0, attr);
+		if (rc) {
+			attr->eth_virt = 0;
+			goto error;
+		}
+	}
+	return 0;
+error:
+	rc = (rc > 0) ? -rc : rc;
+	return rc;
+}
+
+/**
+ * Query TIS transport domain from QP verbs object using DevX API.
+ *
+ * @param[in] qp
+ *   Pointer to verbs QP returned by ibv_create_qp.
+ * @param[in] tis_num
+ *   TIS number of TIS to query.
+ * @param[out] tis_td
+ *   Pointer to TIS transport domain variable, to be set by the routine.
+ *
+ * @return
+ *   0 on success, a negative value otherwise.
+ */
+int
+mlx5_devx_cmd_qp_query_tis_td(struct ibv_qp *qp, uint32_t tis_num,
+			      uint32_t *tis_td)
+{
+	uint32_t in[MLX5_ST_SZ_DW(query_tis_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(query_tis_out)] = {0};
+	int rc;
+	void *tis_ctx;
+
+	MLX5_SET(query_tis_in, in, opcode, MLX5_CMD_OP_QUERY_TIS);
+	MLX5_SET(query_tis_in, in, tisn, tis_num);
+	rc = mlx5_glue->devx_qp_query(qp, in, sizeof(in), out, sizeof(out));
+	if (rc) {
+		DRV_LOG(ERR, "Failed to query QP using DevX");
+		return -rc;
+	}
+	tis_ctx = MLX5_ADDR_OF(query_tis_out, out, tis_context);
+	*tis_td = MLX5_GET(tisc, tis_ctx, transport_domain);
+	return 0;
+}
+
+/**
+ * Fill WQ data for DevX API command.
+ * Utility function for use when creating DevX objects containing a WQ.
+ *
+ * @param[in] wq_ctx
+ *   Pointer to WQ context to fill with data.
+ * @param[in] wq_attr
+ *   Pointer to WQ attributes structure to fill in WQ context.
+ */
+static void
+devx_cmd_fill_wq_data(void *wq_ctx, struct mlx5_devx_wq_attr *wq_attr)
+{
+	MLX5_SET(wq, wq_ctx, wq_type, wq_attr->wq_type);
+	MLX5_SET(wq, wq_ctx, wq_signature, wq_attr->wq_signature);
+	MLX5_SET(wq, wq_ctx, end_padding_mode, wq_attr->end_padding_mode);
+	MLX5_SET(wq, wq_ctx, cd_slave, wq_attr->cd_slave);
+	MLX5_SET(wq, wq_ctx, hds_skip_first_sge, wq_attr->hds_skip_first_sge);
+	MLX5_SET(wq, wq_ctx, log2_hds_buf_size, wq_attr->log2_hds_buf_size);
+	MLX5_SET(wq, wq_ctx, page_offset, wq_attr->page_offset);
+	MLX5_SET(wq, wq_ctx, lwm, wq_attr->lwm);
+	MLX5_SET(wq, wq_ctx, pd, wq_attr->pd);
+	MLX5_SET(wq, wq_ctx, uar_page, wq_attr->uar_page);
+	MLX5_SET64(wq, wq_ctx, dbr_addr, wq_attr->dbr_addr);
+	MLX5_SET(wq, wq_ctx, hw_counter, wq_attr->hw_counter);
+	MLX5_SET(wq, wq_ctx, sw_counter, wq_attr->sw_counter);
+	MLX5_SET(wq, wq_ctx, log_wq_stride, wq_attr->log_wq_stride);
+	MLX5_SET(wq, wq_ctx, log_wq_pg_sz, wq_attr->log_wq_pg_sz);
+	MLX5_SET(wq, wq_ctx, log_wq_sz, wq_attr->log_wq_sz);
+	MLX5_SET(wq, wq_ctx, dbr_umem_valid, wq_attr->dbr_umem_valid);
+	MLX5_SET(wq, wq_ctx, wq_umem_valid, wq_attr->wq_umem_valid);
+	MLX5_SET(wq, wq_ctx, log_hairpin_num_packets,
+		 wq_attr->log_hairpin_num_packets);
+	MLX5_SET(wq, wq_ctx, log_hairpin_data_sz, wq_attr->log_hairpin_data_sz);
+	MLX5_SET(wq, wq_ctx, single_wqe_log_num_of_strides,
+		 wq_attr->single_wqe_log_num_of_strides);
+	MLX5_SET(wq, wq_ctx, two_byte_shift_en, wq_attr->two_byte_shift_en);
+	MLX5_SET(wq, wq_ctx, single_stride_log_num_of_bytes,
+		 wq_attr->single_stride_log_num_of_bytes);
+	MLX5_SET(wq, wq_ctx, dbr_umem_id, wq_attr->dbr_umem_id);
+	MLX5_SET(wq, wq_ctx, wq_umem_id, wq_attr->wq_umem_id);
+	MLX5_SET64(wq, wq_ctx, wq_umem_offset, wq_attr->wq_umem_offset);
+}
+
+/**
+ * Create RQ using DevX API.
+ *
+ * @param[in] ctx
+ *   ibv_context returned from mlx5dv_open_device.
+ * @param[in] rq_attr
+ *   Pointer to create RQ attributes structure.
+ * @param[in] socket
+ *   CPU socket ID for allocations.
+ *
+ * @return
+ *   The DevX object created, NULL otherwise and rte_errno is set.
+ */
+struct mlx5_devx_obj *
+mlx5_devx_cmd_create_rq(struct ibv_context *ctx,
+			struct mlx5_devx_create_rq_attr *rq_attr,
+			int socket)
+{
+	uint32_t in[MLX5_ST_SZ_DW(create_rq_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(create_rq_out)] = {0};
+	void *rq_ctx, *wq_ctx;
+	struct mlx5_devx_wq_attr *wq_attr;
+	struct mlx5_devx_obj *rq = NULL;
+
+	rq = rte_calloc_socket(__func__, 1, sizeof(*rq), 0, socket);
+	if (!rq) {
+		DRV_LOG(ERR, "Failed to allocate RQ data");
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	MLX5_SET(create_rq_in, in, opcode, MLX5_CMD_OP_CREATE_RQ);
+	rq_ctx = MLX5_ADDR_OF(create_rq_in, in, ctx);
+	MLX5_SET(rqc, rq_ctx, rlky, rq_attr->rlky);
+	MLX5_SET(rqc, rq_ctx, delay_drop_en, rq_attr->delay_drop_en);
+	MLX5_SET(rqc, rq_ctx, scatter_fcs, rq_attr->scatter_fcs);
+	MLX5_SET(rqc, rq_ctx, vsd, rq_attr->vsd);
+	MLX5_SET(rqc, rq_ctx, mem_rq_type, rq_attr->mem_rq_type);
+	MLX5_SET(rqc, rq_ctx, state, rq_attr->state);
+	MLX5_SET(rqc, rq_ctx, flush_in_error_en, rq_attr->flush_in_error_en);
+	MLX5_SET(rqc, rq_ctx, hairpin, rq_attr->hairpin);
+	MLX5_SET(rqc, rq_ctx, user_index, rq_attr->user_index);
+	MLX5_SET(rqc, rq_ctx, cqn, rq_attr->cqn);
+	MLX5_SET(rqc, rq_ctx, counter_set_id, rq_attr->counter_set_id);
+	MLX5_SET(rqc, rq_ctx, rmpn, rq_attr->rmpn);
+	wq_ctx = MLX5_ADDR_OF(rqc, rq_ctx, wq);
+	wq_attr = &rq_attr->wq_attr;
+	devx_cmd_fill_wq_data(wq_ctx, wq_attr);
+	rq->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in),
+					     out, sizeof(out));
+	if (!rq->obj) {
+		DRV_LOG(ERR, "Failed to create RQ using DevX");
+		rte_errno = errno;
+		rte_free(rq);
+		return NULL;
+	}
+	rq->id = MLX5_GET(create_rq_out, out, rqn);
+	return rq;
+}
+
+/**
+ * Modify RQ using DevX API.
+ *
+ * @param[in] rq
+ *   Pointer to RQ object structure.
+ * @param[in] rq_attr
+ *   Pointer to modify RQ attributes structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_devx_cmd_modify_rq(struct mlx5_devx_obj *rq,
+			struct mlx5_devx_modify_rq_attr *rq_attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(modify_rq_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(modify_rq_out)] = {0};
+	void *rq_ctx, *wq_ctx;
+	int ret;
+
+	MLX5_SET(modify_rq_in, in, opcode, MLX5_CMD_OP_MODIFY_RQ);
+	MLX5_SET(modify_rq_in, in, rq_state, rq_attr->rq_state);
+	MLX5_SET(modify_rq_in, in, rqn, rq->id);
+	MLX5_SET64(modify_rq_in, in, modify_bitmask, rq_attr->modify_bitmask);
+	rq_ctx = MLX5_ADDR_OF(modify_rq_in, in, ctx);
+	MLX5_SET(rqc, rq_ctx, state, rq_attr->state);
+	if (rq_attr->modify_bitmask &
+			MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_SCATTER_FCS)
+		MLX5_SET(rqc, rq_ctx, scatter_fcs, rq_attr->scatter_fcs);
+	if (rq_attr->modify_bitmask & MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_VSD)
+		MLX5_SET(rqc, rq_ctx, vsd, rq_attr->vsd);
+	if (rq_attr->modify_bitmask &
+			MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_RQ_COUNTER_SET_ID)
+		MLX5_SET(rqc, rq_ctx, counter_set_id, rq_attr->counter_set_id);
+	MLX5_SET(rqc, rq_ctx, hairpin_peer_sq, rq_attr->hairpin_peer_sq);
+	MLX5_SET(rqc, rq_ctx, hairpin_peer_vhca, rq_attr->hairpin_peer_vhca);
+	if (rq_attr->modify_bitmask & MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_WQ_LWM) {
+		wq_ctx = MLX5_ADDR_OF(rqc, rq_ctx, wq);
+		MLX5_SET(wq, wq_ctx, lwm, rq_attr->lwm);
+	}
+	ret = mlx5_glue->devx_obj_modify(rq->obj, in, sizeof(in),
+					 out, sizeof(out));
+	if (ret) {
+		DRV_LOG(ERR, "Failed to modify RQ using DevX");
+		rte_errno = errno;
+		return -errno;
+	}
+	return ret;
+}
+
+/**
+ * Create TIR using DevX API.
+ *
+ * @param[in] ctx
+ *   ibv_context returned from mlx5dv_open_device.
+ * @param[in] tir_attr
+ *   Pointer to TIR attributes structure.
+ *
+ * @return
+ *   The DevX object created, NULL otherwise and rte_errno is set.
+ */
+struct mlx5_devx_obj *
+mlx5_devx_cmd_create_tir(struct ibv_context *ctx,
+			 struct mlx5_devx_tir_attr *tir_attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(create_tir_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(create_tir_out)] = {0};
+	void *tir_ctx, *outer, *inner;
+	struct mlx5_devx_obj *tir = NULL;
+	int i;
+
+	tir = rte_calloc(__func__, 1, sizeof(*tir), 0);
+	if (!tir) {
+		DRV_LOG(ERR, "Failed to allocate TIR data");
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	MLX5_SET(create_tir_in, in, opcode, MLX5_CMD_OP_CREATE_TIR);
+	tir_ctx = MLX5_ADDR_OF(create_tir_in, in, ctx);
+	MLX5_SET(tirc, tir_ctx, disp_type, tir_attr->disp_type);
+	MLX5_SET(tirc, tir_ctx, lro_timeout_period_usecs,
+		 tir_attr->lro_timeout_period_usecs);
+	MLX5_SET(tirc, tir_ctx, lro_enable_mask, tir_attr->lro_enable_mask);
+	MLX5_SET(tirc, tir_ctx, lro_max_msg_sz, tir_attr->lro_max_msg_sz);
+	MLX5_SET(tirc, tir_ctx, inline_rqn, tir_attr->inline_rqn);
+	MLX5_SET(tirc, tir_ctx, rx_hash_symmetric, tir_attr->rx_hash_symmetric);
+	MLX5_SET(tirc, tir_ctx, tunneled_offload_en,
+		 tir_attr->tunneled_offload_en);
+	MLX5_SET(tirc, tir_ctx, indirect_table, tir_attr->indirect_table);
+	MLX5_SET(tirc, tir_ctx, rx_hash_fn, tir_attr->rx_hash_fn);
+	MLX5_SET(tirc, tir_ctx, self_lb_block, tir_attr->self_lb_block);
+	MLX5_SET(tirc, tir_ctx, transport_domain, tir_attr->transport_domain);
+	for (i = 0; i < 10; i++) {
+		MLX5_SET(tirc, tir_ctx, rx_hash_toeplitz_key[i],
+			 tir_attr->rx_hash_toeplitz_key[i]);
+	}
+	outer = MLX5_ADDR_OF(tirc, tir_ctx, rx_hash_field_selector_outer);
+	MLX5_SET(rx_hash_field_select, outer, l3_prot_type,
+		 tir_attr->rx_hash_field_selector_outer.l3_prot_type);
+	MLX5_SET(rx_hash_field_select, outer, l4_prot_type,
+		 tir_attr->rx_hash_field_selector_outer.l4_prot_type);
+	MLX5_SET(rx_hash_field_select, outer, selected_fields,
+		 tir_attr->rx_hash_field_selector_outer.selected_fields);
+	inner = MLX5_ADDR_OF(tirc, tir_ctx, rx_hash_field_selector_inner);
+	MLX5_SET(rx_hash_field_select, inner, l3_prot_type,
+		 tir_attr->rx_hash_field_selector_inner.l3_prot_type);
+	MLX5_SET(rx_hash_field_select, inner, l4_prot_type,
+		 tir_attr->rx_hash_field_selector_inner.l4_prot_type);
+	MLX5_SET(rx_hash_field_select, inner, selected_fields,
+		 tir_attr->rx_hash_field_selector_inner.selected_fields);
+	tir->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in),
+					      out, sizeof(out));
+	if (!tir->obj) {
+		DRV_LOG(ERR, "Failed to create TIR using DevX");
+		rte_errno = errno;
+		rte_free(tir);
+		return NULL;
+	}
+	tir->id = MLX5_GET(create_tir_out, out, tirn);
+	return tir;
+}
+
+/**
+ * Create RQT using DevX API.
+ *
+ * @param[in] ctx
+ *   ibv_context returned from mlx5dv_open_device.
+ * @param[in] rqt_attr
+ *   Pointer to RQT attributes structure.
+ *
+ * @return
+ *   The DevX object created, NULL otherwise and rte_errno is set.
+ */
+struct mlx5_devx_obj *
+mlx5_devx_cmd_create_rqt(struct ibv_context *ctx,
+			 struct mlx5_devx_rqt_attr *rqt_attr)
+{
+	uint32_t *in = NULL;
+	uint32_t inlen = MLX5_ST_SZ_BYTES(create_rqt_in) +
+			 rqt_attr->rqt_actual_size * sizeof(uint32_t);
+	uint32_t out[MLX5_ST_SZ_DW(create_rqt_out)] = {0};
+	void *rqt_ctx;
+	struct mlx5_devx_obj *rqt = NULL;
+	int i;
+
+	in = rte_calloc(__func__, 1, inlen, 0);
+	if (!in) {
+		DRV_LOG(ERR, "Failed to allocate RQT IN data");
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	rqt = rte_calloc(__func__, 1, sizeof(*rqt), 0);
+	if (!rqt) {
+		DRV_LOG(ERR, "Failed to allocate RQT data");
+		rte_errno = ENOMEM;
+		rte_free(in);
+		return NULL;
+	}
+	MLX5_SET(create_rqt_in, in, opcode, MLX5_CMD_OP_CREATE_RQT);
+	rqt_ctx = MLX5_ADDR_OF(create_rqt_in, in, rqt_context);
+	MLX5_SET(rqtc, rqt_ctx, rqt_max_size, rqt_attr->rqt_max_size);
+	MLX5_SET(rqtc, rqt_ctx, rqt_actual_size, rqt_attr->rqt_actual_size);
+	for (i = 0; i < rqt_attr->rqt_actual_size; i++)
+		MLX5_SET(rqtc, rqt_ctx, rq_num[i], rqt_attr->rq_list[i]);
+	rqt->obj = mlx5_glue->devx_obj_create(ctx, in, inlen, out, sizeof(out));
+	rte_free(in);
+	if (!rqt->obj) {
+		DRV_LOG(ERR, "Failed to create RQT using DevX");
+		rte_errno = errno;
+		rte_free(rqt);
+		return NULL;
+	}
+	rqt->id = MLX5_GET(create_rqt_out, out, rqtn);
+	return rqt;
+}
+
+/**
+ * Create SQ using DevX API.
+ *
+ * @param[in] ctx
+ *   ibv_context returned from mlx5dv_open_device.
+ * @param[in] sq_attr
+ *   Pointer to SQ attributes structure.
+ *   The embedded wq_attr member supplies the attributes used to fill the
+ *   WQ context of the new SQ.
+ *
+ * @return
+ *   The DevX object created, NULL otherwise and rte_errno is set.
+ */
+struct mlx5_devx_obj *
+mlx5_devx_cmd_create_sq(struct ibv_context *ctx,
+			struct mlx5_devx_create_sq_attr *sq_attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(create_sq_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(create_sq_out)] = {0};
+	void *sq_ctx;
+	void *wq_ctx;
+	struct mlx5_devx_wq_attr *wq_attr;
+	struct mlx5_devx_obj *sq = NULL;
+
+	sq = rte_calloc(__func__, 1, sizeof(*sq), 0);
+	if (!sq) {
+		DRV_LOG(ERR, "Failed to allocate SQ data");
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	MLX5_SET(create_sq_in, in, opcode, MLX5_CMD_OP_CREATE_SQ);
+	sq_ctx = MLX5_ADDR_OF(create_sq_in, in, ctx);
+	MLX5_SET(sqc, sq_ctx, rlky, sq_attr->rlky);
+	MLX5_SET(sqc, sq_ctx, cd_master, sq_attr->cd_master);
+	MLX5_SET(sqc, sq_ctx, fre, sq_attr->fre);
+	MLX5_SET(sqc, sq_ctx, flush_in_error_en, sq_attr->flush_in_error_en);
+	MLX5_SET(sqc, sq_ctx, allow_multi_pkt_send_wqe,
+		 sq_attr->allow_multi_pkt_send_wqe);
+	MLX5_SET(sqc, sq_ctx, min_wqe_inline_mode,
+		 sq_attr->min_wqe_inline_mode);
+	MLX5_SET(sqc, sq_ctx, state, sq_attr->state);
+	MLX5_SET(sqc, sq_ctx, reg_umr, sq_attr->reg_umr);
+	MLX5_SET(sqc, sq_ctx, allow_swp, sq_attr->allow_swp);
+	MLX5_SET(sqc, sq_ctx, hairpin, sq_attr->hairpin);
+	MLX5_SET(sqc, sq_ctx, user_index, sq_attr->user_index);
+	MLX5_SET(sqc, sq_ctx, cqn, sq_attr->cqn);
+	MLX5_SET(sqc, sq_ctx, packet_pacing_rate_limit_index,
+		 sq_attr->packet_pacing_rate_limit_index);
+	MLX5_SET(sqc, sq_ctx, tis_lst_sz, sq_attr->tis_lst_sz);
+	MLX5_SET(sqc, sq_ctx, tis_num_0, sq_attr->tis_num);
+	wq_ctx = MLX5_ADDR_OF(sqc, sq_ctx, wq);
+	wq_attr = &sq_attr->wq_attr;
+	devx_cmd_fill_wq_data(wq_ctx, wq_attr);
+	sq->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in),
+					     out, sizeof(out));
+	if (!sq->obj) {
+		DRV_LOG(ERR, "Failed to create SQ using DevX");
+		rte_errno = errno;
+		rte_free(sq);
+		return NULL;
+	}
+	sq->id = MLX5_GET(create_sq_out, out, sqn);
+	return sq;
+}
+
+/**
+ * Modify SQ using DevX API.
+ *
+ * @param[in] sq
+ *   Pointer to SQ object structure.
+ * @param[in] sq_attr
+ *   Pointer to SQ attributes structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_devx_cmd_modify_sq(struct mlx5_devx_obj *sq,
+			struct mlx5_devx_modify_sq_attr *sq_attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(modify_sq_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(modify_sq_out)] = {0};
+	void *sq_ctx;
+	int ret;
+
+	MLX5_SET(modify_sq_in, in, opcode, MLX5_CMD_OP_MODIFY_SQ);
+	MLX5_SET(modify_sq_in, in, sq_state, sq_attr->sq_state);
+	MLX5_SET(modify_sq_in, in, sqn, sq->id);
+	sq_ctx = MLX5_ADDR_OF(modify_sq_in, in, ctx);
+	MLX5_SET(sqc, sq_ctx, state, sq_attr->state);
+	MLX5_SET(sqc, sq_ctx, hairpin_peer_rq, sq_attr->hairpin_peer_rq);
+	MLX5_SET(sqc, sq_ctx, hairpin_peer_vhca, sq_attr->hairpin_peer_vhca);
+	ret = mlx5_glue->devx_obj_modify(sq->obj, in, sizeof(in),
+					 out, sizeof(out));
+	if (ret) {
+		DRV_LOG(ERR, "Failed to modify SQ using DevX");
+		rte_errno = errno;
+		return -errno;
+	}
+	return ret;
+}
+
+/**
+ * Create TIS using DevX API.
+ *
+ * @param[in] ctx
+ *   ibv_context returned from mlx5dv_open_device.
+ * @param[in] tis_attr
+ *   Pointer to TIS attributes structure.
+ *
+ * @return
+ *   The DevX object created, NULL otherwise and rte_errno is set.
+ */
+struct mlx5_devx_obj *
+mlx5_devx_cmd_create_tis(struct ibv_context *ctx,
+			 struct mlx5_devx_tis_attr *tis_attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(create_tis_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(create_tis_out)] = {0};
+	struct mlx5_devx_obj *tis = NULL;
+	void *tis_ctx;
+
+	tis = rte_calloc(__func__, 1, sizeof(*tis), 0);
+	if (!tis) {
+		DRV_LOG(ERR, "Failed to allocate TIS object");
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	MLX5_SET(create_tis_in, in, opcode, MLX5_CMD_OP_CREATE_TIS);
+	tis_ctx = MLX5_ADDR_OF(create_tis_in, in, ctx);
+	MLX5_SET(tisc, tis_ctx, strict_lag_tx_port_affinity,
+		 tis_attr->strict_lag_tx_port_affinity);
+	MLX5_SET(tisc, tis_ctx, lag_tx_port_affinity,
+		 tis_attr->lag_tx_port_affinity);
+	MLX5_SET(tisc, tis_ctx, prio, tis_attr->prio);
+	MLX5_SET(tisc, tis_ctx, transport_domain,
+		 tis_attr->transport_domain);
+	tis->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in),
+					      out, sizeof(out));
+	if (!tis->obj) {
+		DRV_LOG(ERR, "Failed to create TIS using DevX");
+		rte_errno = errno;
+		rte_free(tis);
+		return NULL;
+	}
+	tis->id = MLX5_GET(create_tis_out, out, tisn);
+	return tis;
+}
+
+/**
+ * Create transport domain using DevX API.
+ *
+ * @param[in] ctx
+ *   ibv_context returned from mlx5dv_open_device.
+ *
+ * @return
+ *   The DevX object created, NULL otherwise and rte_errno is set.
+ */
+struct mlx5_devx_obj *
+mlx5_devx_cmd_create_td(struct ibv_context *ctx)
+{
+	uint32_t in[MLX5_ST_SZ_DW(alloc_transport_domain_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(alloc_transport_domain_out)] = {0};
+	struct mlx5_devx_obj *td = NULL;
+
+	td = rte_calloc(__func__, 1, sizeof(*td), 0);
+	if (!td) {
+		DRV_LOG(ERR, "Failed to allocate TD object");
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	MLX5_SET(alloc_transport_domain_in, in, opcode,
+		 MLX5_CMD_OP_ALLOC_TRANSPORT_DOMAIN);
+	td->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in),
+					     out, sizeof(out));
+	if (!td->obj) {
+		DRV_LOG(ERR, "Failed to create TD using DevX");
+		rte_errno = errno;
+		rte_free(td);
+		return NULL;
+	}
+	td->id = MLX5_GET(alloc_transport_domain_out, out,
+			  transport_domain);
+	return td;
+}
+
+/**
+ * Dump all flows to file.
+ *
+ * @param[in] fdb_domain
+ *   FDB domain.
+ * @param[in] rx_domain
+ *   RX domain.
+ * @param[in] tx_domain
+ *   TX domain.
+ * @param[out] file
+ *   Pointer to file stream.
+ *
+ * @return
+ *   0 on success, a negative value otherwise.
+ */
+int
+mlx5_devx_cmd_flow_dump(void *fdb_domain __rte_unused,
+			void *rx_domain __rte_unused,
+			void *tx_domain __rte_unused, FILE *file __rte_unused)
+{
+	int ret = 0;
+
+#ifdef HAVE_MLX5_DR_FLOW_DUMP
+	if (fdb_domain) {
+		ret = mlx5_glue->dr_dump_domain(file, fdb_domain);
+		if (ret)
+			return ret;
+	}
+	assert(rx_domain);
+	ret = mlx5_glue->dr_dump_domain(file, rx_domain);
+	if (ret)
+		return ret;
+	assert(tx_domain);
+	ret = mlx5_glue->dr_dump_domain(file, tx_domain);
+#else
+	ret = ENOTSUP;
+#endif
+	return -ret;
+}
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
new file mode 100644
index 0000000..2d58d96
--- /dev/null
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -0,0 +1,231 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2019 Mellanox Technologies, Ltd
+ */
+
+#ifndef RTE_PMD_MLX5_DEVX_CMDS_H_
+#define RTE_PMD_MLX5_DEVX_CMDS_H_
+
+#include "mlx5_glue.h"
+
+
+/* DevX creation object. */
+struct mlx5_devx_obj {
+	struct mlx5dv_devx_obj *obj; /* The DV object. */
+	int id; /* The object ID. */
+};
+
+struct mlx5_devx_mkey_attr {
+	uint64_t addr;
+	uint64_t size;
+	uint32_t umem_id;
+	uint32_t pd;
+};
+
+/* HCA qos attributes. */
+struct mlx5_hca_qos_attr {
+	uint32_t sup:1;	/* Whether QOS is supported. */
+	uint32_t srtcm_sup:1; /* Whether srTCM mode is supported. */
+	uint32_t flow_meter_reg_share:1;
+	/* Whether reg_c share is supported. */
+	uint8_t log_max_flow_meter;
+	/* Power of the maximum supported meters. */
+	uint8_t flow_meter_reg_c_ids;
+	/* Bitmap of the reg_Cs available for flow meter to use. */
+
+};
+
+/* HCA supports this number of time periods for LRO. */
+#define MLX5_LRO_NUM_SUPP_PERIODS 4
+
+/* HCA attributes. */
+struct mlx5_hca_attr {
+	uint32_t eswitch_manager:1;
+	uint32_t flow_counters_dump:1;
+	uint8_t flow_counter_bulk_alloc_bitmap;
+	uint32_t eth_net_offloads:1;
+	uint32_t eth_virt:1;
+	uint32_t wqe_vlan_insert:1;
+	uint32_t wqe_inline_mode:2;
+	uint32_t vport_inline_mode:3;
+	uint32_t tunnel_stateless_geneve_rx:1;
+	uint32_t geneve_max_opt_len:1; /* 0x0: 14DW, 0x1: 63DW */
+	uint32_t tunnel_stateless_gtp:1;
+	uint32_t lro_cap:1;
+	uint32_t tunnel_lro_gre:1;
+	uint32_t tunnel_lro_vxlan:1;
+	uint32_t lro_max_msg_sz_mode:2;
+	uint32_t lro_timer_supported_periods[MLX5_LRO_NUM_SUPP_PERIODS];
+	uint32_t flex_parser_protocols;
+	uint32_t hairpin:1;
+	uint32_t log_max_hairpin_queues:5;
+	uint32_t log_max_hairpin_wq_data_sz:5;
+	uint32_t log_max_hairpin_num_packets:5;
+	uint32_t vhca_id:16;
+	struct mlx5_hca_qos_attr qos;
+};
+
+struct mlx5_devx_wq_attr {
+	uint32_t wq_type:4;
+	uint32_t wq_signature:1;
+	uint32_t end_padding_mode:2;
+	uint32_t cd_slave:1;
+	uint32_t hds_skip_first_sge:1;
+	uint32_t log2_hds_buf_size:3;
+	uint32_t page_offset:5;
+	uint32_t lwm:16;
+	uint32_t pd:24;
+	uint32_t uar_page:24;
+	uint64_t dbr_addr;
+	uint32_t hw_counter;
+	uint32_t sw_counter;
+	uint32_t log_wq_stride:4;
+	uint32_t log_wq_pg_sz:5;
+	uint32_t log_wq_sz:5;
+	uint32_t dbr_umem_valid:1;
+	uint32_t wq_umem_valid:1;
+	uint32_t log_hairpin_num_packets:5;
+	uint32_t log_hairpin_data_sz:5;
+	uint32_t single_wqe_log_num_of_strides:4;
+	uint32_t two_byte_shift_en:1;
+	uint32_t single_stride_log_num_of_bytes:3;
+	uint32_t dbr_umem_id;
+	uint32_t wq_umem_id;
+	uint64_t wq_umem_offset;
+};
+
+/* Create RQ attributes structure, used by create RQ operation. */
+struct mlx5_devx_create_rq_attr {
+	uint32_t rlky:1;
+	uint32_t delay_drop_en:1;
+	uint32_t scatter_fcs:1;
+	uint32_t vsd:1;
+	uint32_t mem_rq_type:4;
+	uint32_t state:4;
+	uint32_t flush_in_error_en:1;
+	uint32_t hairpin:1;
+	uint32_t user_index:24;
+	uint32_t cqn:24;
+	uint32_t counter_set_id:8;
+	uint32_t rmpn:24;
+	struct mlx5_devx_wq_attr wq_attr;
+};
+
+/* Modify RQ attributes structure, used by modify RQ operation. */
+struct mlx5_devx_modify_rq_attr {
+	uint32_t rqn:24;
+	uint32_t rq_state:4; /* Current RQ state. */
+	uint32_t state:4; /* Required RQ state. */
+	uint32_t scatter_fcs:1;
+	uint32_t vsd:1;
+	uint32_t counter_set_id:8;
+	uint32_t hairpin_peer_sq:24;
+	uint32_t hairpin_peer_vhca:16;
+	uint64_t modify_bitmask;
+	uint32_t lwm:16; /* Contained WQ lwm. */
+};
+
+struct mlx5_rx_hash_field_select {
+	uint32_t l3_prot_type:1;
+	uint32_t l4_prot_type:1;
+	uint32_t selected_fields:30;
+};
+
+/* TIR attributes structure, used by TIR operations. */
+struct mlx5_devx_tir_attr {
+	uint32_t disp_type:4;
+	uint32_t lro_timeout_period_usecs:16;
+	uint32_t lro_enable_mask:4;
+	uint32_t lro_max_msg_sz:8;
+	uint32_t inline_rqn:24;
+	uint32_t rx_hash_symmetric:1;
+	uint32_t tunneled_offload_en:1;
+	uint32_t indirect_table:24;
+	uint32_t rx_hash_fn:4;
+	uint32_t self_lb_block:2;
+	uint32_t transport_domain:24;
+	uint32_t rx_hash_toeplitz_key[10];
+	struct mlx5_rx_hash_field_select rx_hash_field_selector_outer;
+	struct mlx5_rx_hash_field_select rx_hash_field_selector_inner;
+};
+
+/* RQT attributes structure, used by RQT operations. */
+struct mlx5_devx_rqt_attr {
+	uint32_t rqt_max_size:16;
+	uint32_t rqt_actual_size:16;
+	uint32_t rq_list[];
+};
+
+/* TIS attributes structure. */
+struct mlx5_devx_tis_attr {
+	uint32_t strict_lag_tx_port_affinity:1;
+	uint32_t tls_en:1;
+	uint32_t lag_tx_port_affinity:4;
+	uint32_t prio:4;
+	uint32_t transport_domain:24;
+};
+
+/* SQ attributes structure, used by SQ create operation. */
+struct mlx5_devx_create_sq_attr {
+	uint32_t rlky:1;
+	uint32_t cd_master:1;
+	uint32_t fre:1;
+	uint32_t flush_in_error_en:1;
+	uint32_t allow_multi_pkt_send_wqe:1;
+	uint32_t min_wqe_inline_mode:3;
+	uint32_t state:4;
+	uint32_t reg_umr:1;
+	uint32_t allow_swp:1;
+	uint32_t hairpin:1;
+	uint32_t user_index:24;
+	uint32_t cqn:24;
+	uint32_t packet_pacing_rate_limit_index:16;
+	uint32_t tis_lst_sz:16;
+	uint32_t tis_num:24;
+	struct mlx5_devx_wq_attr wq_attr;
+};
+
+/* SQ attributes structure, used by SQ modify operation. */
+struct mlx5_devx_modify_sq_attr {
+	uint32_t sq_state:4;
+	uint32_t state:4;
+	uint32_t hairpin_peer_rq:24;
+	uint32_t hairpin_peer_vhca:16;
+};
+
+/* mlx5_devx_cmds.c */
+
+struct mlx5_devx_obj *mlx5_devx_cmd_flow_counter_alloc(struct ibv_context *ctx,
+						       uint32_t bulk_sz);
+int mlx5_devx_cmd_destroy(struct mlx5_devx_obj *obj);
+int mlx5_devx_cmd_flow_counter_query(struct mlx5_devx_obj *dcs,
+				     int clear, uint32_t n_counters,
+				     uint64_t *pkts, uint64_t *bytes,
+				     uint32_t mkey, void *addr,
+				     struct mlx5dv_devx_cmd_comp *cmd_comp,
+				     uint64_t async_id);
+int mlx5_devx_cmd_query_hca_attr(struct ibv_context *ctx,
+				 struct mlx5_hca_attr *attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_mkey_create(struct ibv_context *ctx,
+					      struct mlx5_devx_mkey_attr *attr);
+int mlx5_devx_get_out_command_status(void *out);
+int mlx5_devx_cmd_qp_query_tis_td(struct ibv_qp *qp, uint32_t tis_num,
+				  uint32_t *tis_td);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_rq(struct ibv_context *ctx,
+				       struct mlx5_devx_create_rq_attr *rq_attr,
+				       int socket);
+int mlx5_devx_cmd_modify_rq(struct mlx5_devx_obj *rq,
+			    struct mlx5_devx_modify_rq_attr *rq_attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_tir(struct ibv_context *ctx,
+					   struct mlx5_devx_tir_attr *tir_attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_rqt(struct ibv_context *ctx,
+					   struct mlx5_devx_rqt_attr *rqt_attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_sq(struct ibv_context *ctx,
+				      struct mlx5_devx_create_sq_attr *sq_attr);
+int mlx5_devx_cmd_modify_sq(struct mlx5_devx_obj *sq,
+			    struct mlx5_devx_modify_sq_attr *sq_attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_tis(struct ibv_context *ctx,
+					   struct mlx5_devx_tis_attr *tis_attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_td(struct ibv_context *ctx);
+int mlx5_devx_cmd_flow_dump(void *fdb_domain, void *rx_domain, void *tx_domain,
+			    FILE *file);
+#endif /* RTE_PMD_MLX5_DEVX_CMDS_H_ */
diff --git a/drivers/common/mlx5/mlx5_glue.c b/drivers/common/mlx5/mlx5_glue.c
new file mode 100644
index 0000000..d5bc84e
--- /dev/null
+++ b/drivers/common/mlx5/mlx5_glue.c
@@ -0,0 +1,1138 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2018 6WIND S.A.
+ * Copyright 2018 Mellanox Technologies, Ltd
+ */
+
+#include <errno.h>
+#include <stdalign.h>
+#include <stddef.h>
+#include <stdint.h>
+#include <stdlib.h>
+/*
+ * Not needed by this file; included to work around the lack of off_t
+ * definition for mlx5dv.h with unpatched rdma-core versions.
+ */
+#include <sys/types.h>
+
+#include <rte_config.h>
+
+#include "mlx5_glue.h"
+
+static int
+mlx5_glue_fork_init(void)
+{
+	return ibv_fork_init();
+}
+
+static struct ibv_pd *
+mlx5_glue_alloc_pd(struct ibv_context *context)
+{
+	return ibv_alloc_pd(context);
+}
+
+static int
+mlx5_glue_dealloc_pd(struct ibv_pd *pd)
+{
+	return ibv_dealloc_pd(pd);
+}
+
+static struct ibv_device **
+mlx5_glue_get_device_list(int *num_devices)
+{
+	return ibv_get_device_list(num_devices);
+}
+
+static void
+mlx5_glue_free_device_list(struct ibv_device **list)
+{
+	ibv_free_device_list(list);
+}
+
+static struct ibv_context *
+mlx5_glue_open_device(struct ibv_device *device)
+{
+	return ibv_open_device(device);
+}
+
+static int
+mlx5_glue_close_device(struct ibv_context *context)
+{
+	return ibv_close_device(context);
+}
+
+static int
+mlx5_glue_query_device(struct ibv_context *context,
+		       struct ibv_device_attr *device_attr)
+{
+	return ibv_query_device(context, device_attr);
+}
+
+static int
+mlx5_glue_query_device_ex(struct ibv_context *context,
+			  const struct ibv_query_device_ex_input *input,
+			  struct ibv_device_attr_ex *attr)
+{
+	return ibv_query_device_ex(context, input, attr);
+}
+
+static int
+mlx5_glue_query_rt_values_ex(struct ibv_context *context,
+			     struct ibv_values_ex *values)
+{
+	return ibv_query_rt_values_ex(context, values);
+}
+
+static int
+mlx5_glue_query_port(struct ibv_context *context, uint8_t port_num,
+		     struct ibv_port_attr *port_attr)
+{
+	return ibv_query_port(context, port_num, port_attr);
+}
+
+static struct ibv_comp_channel *
+mlx5_glue_create_comp_channel(struct ibv_context *context)
+{
+	return ibv_create_comp_channel(context);
+}
+
+static int
+mlx5_glue_destroy_comp_channel(struct ibv_comp_channel *channel)
+{
+	return ibv_destroy_comp_channel(channel);
+}
+
+static struct ibv_cq *
+mlx5_glue_create_cq(struct ibv_context *context, int cqe, void *cq_context,
+		    struct ibv_comp_channel *channel, int comp_vector)
+{
+	return ibv_create_cq(context, cqe, cq_context, channel, comp_vector);
+}
+
+static int
+mlx5_glue_destroy_cq(struct ibv_cq *cq)
+{
+	return ibv_destroy_cq(cq);
+}
+
+static int
+mlx5_glue_get_cq_event(struct ibv_comp_channel *channel, struct ibv_cq **cq,
+		       void **cq_context)
+{
+	return ibv_get_cq_event(channel, cq, cq_context);
+}
+
+static void
+mlx5_glue_ack_cq_events(struct ibv_cq *cq, unsigned int nevents)
+{
+	ibv_ack_cq_events(cq, nevents);
+}
+
+static struct ibv_rwq_ind_table *
+mlx5_glue_create_rwq_ind_table(struct ibv_context *context,
+			       struct ibv_rwq_ind_table_init_attr *init_attr)
+{
+	return ibv_create_rwq_ind_table(context, init_attr);
+}
+
+static int
+mlx5_glue_destroy_rwq_ind_table(struct ibv_rwq_ind_table *rwq_ind_table)
+{
+	return ibv_destroy_rwq_ind_table(rwq_ind_table);
+}
+
+static struct ibv_wq *
+mlx5_glue_create_wq(struct ibv_context *context,
+		    struct ibv_wq_init_attr *wq_init_attr)
+{
+	return ibv_create_wq(context, wq_init_attr);
+}
+
+static int
+mlx5_glue_destroy_wq(struct ibv_wq *wq)
+{
+	return ibv_destroy_wq(wq);
+}
+static int
+mlx5_glue_modify_wq(struct ibv_wq *wq, struct ibv_wq_attr *wq_attr)
+{
+	return ibv_modify_wq(wq, wq_attr);
+}
+
+static struct ibv_flow *
+mlx5_glue_create_flow(struct ibv_qp *qp, struct ibv_flow_attr *flow)
+{
+	return ibv_create_flow(qp, flow);
+}
+
+static int
+mlx5_glue_destroy_flow(struct ibv_flow *flow_id)
+{
+	return ibv_destroy_flow(flow_id);
+}
+
+static int
+mlx5_glue_destroy_flow_action(void *action)
+{
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+#ifdef HAVE_MLX5DV_DR
+	return mlx5dv_dr_action_destroy(action);
+#else
+	struct mlx5dv_flow_action_attr *attr = action;
+	int res = 0;
+	switch (attr->type) {
+	case MLX5DV_FLOW_ACTION_TAG:
+		break;
+	default:
+		res = ibv_destroy_flow_action(attr->action);
+		break;
+	}
+	free(action);
+	return res;
+#endif
+#else
+	(void)action;
+	return ENOTSUP;
+#endif
+}
+
+static struct ibv_qp *
+mlx5_glue_create_qp(struct ibv_pd *pd, struct ibv_qp_init_attr *qp_init_attr)
+{
+	return ibv_create_qp(pd, qp_init_attr);
+}
+
+static struct ibv_qp *
+mlx5_glue_create_qp_ex(struct ibv_context *context,
+		       struct ibv_qp_init_attr_ex *qp_init_attr_ex)
+{
+	return ibv_create_qp_ex(context, qp_init_attr_ex);
+}
+
+static int
+mlx5_glue_destroy_qp(struct ibv_qp *qp)
+{
+	return ibv_destroy_qp(qp);
+}
+
+static int
+mlx5_glue_modify_qp(struct ibv_qp *qp, struct ibv_qp_attr *attr, int attr_mask)
+{
+	return ibv_modify_qp(qp, attr, attr_mask);
+}
+
+static struct ibv_mr *
+mlx5_glue_reg_mr(struct ibv_pd *pd, void *addr, size_t length, int access)
+{
+	return ibv_reg_mr(pd, addr, length, access);
+}
+
+static int
+mlx5_glue_dereg_mr(struct ibv_mr *mr)
+{
+	return ibv_dereg_mr(mr);
+}
+
+static struct ibv_counter_set *
+mlx5_glue_create_counter_set(struct ibv_context *context,
+			     struct ibv_counter_set_init_attr *init_attr)
+{
+#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V42
+	(void)context;
+	(void)init_attr;
+	return NULL;
+#else
+	return ibv_create_counter_set(context, init_attr);
+#endif
+}
+
+static int
+mlx5_glue_destroy_counter_set(struct ibv_counter_set *cs)
+{
+#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V42
+	(void)cs;
+	return ENOTSUP;
+#else
+	return ibv_destroy_counter_set(cs);
+#endif
+}
+
+static int
+mlx5_glue_describe_counter_set(struct ibv_context *context,
+			       uint16_t counter_set_id,
+			       struct ibv_counter_set_description *cs_desc)
+{
+#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V42
+	(void)context;
+	(void)counter_set_id;
+	(void)cs_desc;
+	return ENOTSUP;
+#else
+	return ibv_describe_counter_set(context, counter_set_id, cs_desc);
+#endif
+}
+
+static int
+mlx5_glue_query_counter_set(struct ibv_query_counter_set_attr *query_attr,
+			    struct ibv_counter_set_data *cs_data)
+{
+#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V42
+	(void)query_attr;
+	(void)cs_data;
+	return ENOTSUP;
+#else
+	return ibv_query_counter_set(query_attr, cs_data);
+#endif
+}
+
+static struct ibv_counters *
+mlx5_glue_create_counters(struct ibv_context *context,
+			  struct ibv_counters_init_attr *init_attr)
+{
+#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V45
+	(void)context;
+	(void)init_attr;
+	errno = ENOTSUP;
+	return NULL;
+#else
+	return ibv_create_counters(context, init_attr);
+#endif
+}
+
+static int
+mlx5_glue_destroy_counters(struct ibv_counters *counters)
+{
+#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V45
+	(void)counters;
+	return ENOTSUP;
+#else
+	return ibv_destroy_counters(counters);
+#endif
+}
+
+static int
+mlx5_glue_attach_counters(struct ibv_counters *counters,
+			  struct ibv_counter_attach_attr *attr,
+			  struct ibv_flow *flow)
+{
+#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V45
+	(void)counters;
+	(void)attr;
+	(void)flow;
+	return ENOTSUP;
+#else
+	return ibv_attach_counters_point_flow(counters, attr, flow);
+#endif
+}
+
+static int
+mlx5_glue_query_counters(struct ibv_counters *counters,
+			 uint64_t *counters_value,
+			 uint32_t ncounters,
+			 uint32_t flags)
+{
+#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V45
+	(void)counters;
+	(void)counters_value;
+	(void)ncounters;
+	(void)flags;
+	return ENOTSUP;
+#else
+	return ibv_read_counters(counters, counters_value, ncounters, flags);
+#endif
+}
+
+static void
+mlx5_glue_ack_async_event(struct ibv_async_event *event)
+{
+	ibv_ack_async_event(event);
+}
+
+static int
+mlx5_glue_get_async_event(struct ibv_context *context,
+			  struct ibv_async_event *event)
+{
+	return ibv_get_async_event(context, event);
+}
+
+static const char *
+mlx5_glue_port_state_str(enum ibv_port_state port_state)
+{
+	return ibv_port_state_str(port_state);
+}
+
+static struct ibv_cq *
+mlx5_glue_cq_ex_to_cq(struct ibv_cq_ex *cq)
+{
+	return ibv_cq_ex_to_cq(cq);
+}
+
+static void *
+mlx5_glue_dr_create_flow_action_dest_flow_tbl(void *tbl)
+{
+#ifdef HAVE_MLX5DV_DR
+	return mlx5dv_dr_action_create_dest_table(tbl);
+#else
+	(void)tbl;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static void *
+mlx5_glue_dr_create_flow_action_dest_port(void *domain, uint32_t port)
+{
+#ifdef HAVE_MLX5DV_DR_DEVX_PORT
+	return mlx5dv_dr_action_create_dest_ib_port(domain, port);
+#else
+#ifdef HAVE_MLX5DV_DR_ESWITCH
+	return mlx5dv_dr_action_create_dest_vport(domain, port);
+#else
+	(void)domain;
+	(void)port;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+#endif
+}
+
+static void *
+mlx5_glue_dr_create_flow_action_drop(void)
+{
+#ifdef HAVE_MLX5DV_DR_ESWITCH
+	return mlx5dv_dr_action_create_drop();
+#else
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static void *
+mlx5_glue_dr_create_flow_action_push_vlan(struct mlx5dv_dr_domain *domain,
+					  rte_be32_t vlan_tag)
+{
+#ifdef HAVE_MLX5DV_DR_VLAN
+	return mlx5dv_dr_action_create_push_vlan(domain, vlan_tag);
+#else
+	(void)domain;
+	(void)vlan_tag;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static void *
+mlx5_glue_dr_create_flow_action_pop_vlan(void)
+{
+#ifdef HAVE_MLX5DV_DR_VLAN
+	return mlx5dv_dr_action_create_pop_vlan();
+#else
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static void *
+mlx5_glue_dr_create_flow_tbl(void *domain, uint32_t level)
+{
+#ifdef HAVE_MLX5DV_DR
+	return mlx5dv_dr_table_create(domain, level);
+#else
+	(void)domain;
+	(void)level;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static int
+mlx5_glue_dr_destroy_flow_tbl(void *tbl)
+{
+#ifdef HAVE_MLX5DV_DR
+	return mlx5dv_dr_table_destroy(tbl);
+#else
+	(void)tbl;
+	errno = ENOTSUP;
+	return errno;
+#endif
+}
+
+static void *
+mlx5_glue_dr_create_domain(struct ibv_context *ctx,
+			   enum  mlx5dv_dr_domain_type domain)
+{
+#ifdef HAVE_MLX5DV_DR
+	return mlx5dv_dr_domain_create(ctx, domain);
+#else
+	(void)ctx;
+	(void)domain;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static int
+mlx5_glue_dr_destroy_domain(void *domain)
+{
+#ifdef HAVE_MLX5DV_DR
+	return mlx5dv_dr_domain_destroy(domain);
+#else
+	(void)domain;
+	errno = ENOTSUP;
+	return errno;
+#endif
+}
+
+static struct ibv_cq_ex *
+mlx5_glue_dv_create_cq(struct ibv_context *context,
+		       struct ibv_cq_init_attr_ex *cq_attr,
+		       struct mlx5dv_cq_init_attr *mlx5_cq_attr)
+{
+	return mlx5dv_create_cq(context, cq_attr, mlx5_cq_attr);
+}
+
+static struct ibv_wq *
+mlx5_glue_dv_create_wq(struct ibv_context *context,
+		       struct ibv_wq_init_attr *wq_attr,
+		       struct mlx5dv_wq_init_attr *mlx5_wq_attr)
+{
+#ifndef HAVE_IBV_DEVICE_STRIDING_RQ_SUPPORT
+	(void)context;
+	(void)wq_attr;
+	(void)mlx5_wq_attr;
+	errno = ENOTSUP;
+	return NULL;
+#else
+	return mlx5dv_create_wq(context, wq_attr, mlx5_wq_attr);
+#endif
+}
+
+static int
+mlx5_glue_dv_query_device(struct ibv_context *ctx,
+			  struct mlx5dv_context *attrs_out)
+{
+	return mlx5dv_query_device(ctx, attrs_out);
+}
+
+static int
+mlx5_glue_dv_set_context_attr(struct ibv_context *ibv_ctx,
+			      enum mlx5dv_set_ctx_attr_type type, void *attr)
+{
+	return mlx5dv_set_context_attr(ibv_ctx, type, attr);
+}
+
+static int
+mlx5_glue_dv_init_obj(struct mlx5dv_obj *obj, uint64_t obj_type)
+{
+	return mlx5dv_init_obj(obj, obj_type);
+}
+
+static struct ibv_qp *
+mlx5_glue_dv_create_qp(struct ibv_context *context,
+		       struct ibv_qp_init_attr_ex *qp_init_attr_ex,
+		       struct mlx5dv_qp_init_attr *dv_qp_init_attr)
+{
+#ifdef HAVE_IBV_DEVICE_TUNNEL_SUPPORT
+	return mlx5dv_create_qp(context, qp_init_attr_ex, dv_qp_init_attr);
+#else
+	(void)context;
+	(void)qp_init_attr_ex;
+	(void)dv_qp_init_attr;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static void *
+mlx5_glue_dv_create_flow_matcher(struct ibv_context *context,
+				 struct mlx5dv_flow_matcher_attr *matcher_attr,
+				 void *tbl)
+{
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+#ifdef HAVE_MLX5DV_DR
+	(void)context;
+	return mlx5dv_dr_matcher_create(tbl, matcher_attr->priority,
+					matcher_attr->match_criteria_enable,
+					matcher_attr->match_mask);
+#else
+	(void)tbl;
+	return mlx5dv_create_flow_matcher(context, matcher_attr);
+#endif
+#else
+	(void)context;
+	(void)matcher_attr;
+	(void)tbl;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static void *
+mlx5_glue_dv_create_flow(void *matcher,
+			 void *match_value,
+			 size_t num_actions,
+			 void *actions[])
+{
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+#ifdef HAVE_MLX5DV_DR
+	return mlx5dv_dr_rule_create(matcher, match_value, num_actions,
+				     (struct mlx5dv_dr_action **)actions);
+#else
+	struct mlx5dv_flow_action_attr actions_attr[8];
+
+	if (num_actions > 8)
+		return NULL;
+	for (size_t i = 0; i < num_actions; i++)
+		actions_attr[i] =
+			*((struct mlx5dv_flow_action_attr *)(actions[i]));
+	return mlx5dv_create_flow(matcher, match_value,
+				  num_actions, actions_attr);
+#endif
+#else
+	(void)matcher;
+	(void)match_value;
+	(void)num_actions;
+	(void)actions;
+	return NULL;
+#endif
+}
+
+static void *
+mlx5_glue_dv_create_flow_action_counter(void *counter_obj, uint32_t offset)
+{
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+#ifdef HAVE_MLX5DV_DR
+	return mlx5dv_dr_action_create_flow_counter(counter_obj, offset);
+#else
+	struct mlx5dv_flow_action_attr *action;
+
+	(void)offset;
+	action = malloc(sizeof(*action));
+	if (!action)
+		return NULL;
+	action->type = MLX5DV_FLOW_ACTION_COUNTERS_DEVX;
+	action->obj = counter_obj;
+	return action;
+#endif
+#else
+	(void)counter_obj;
+	(void)offset;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static void *
+mlx5_glue_dv_create_flow_action_dest_ibv_qp(void *qp)
+{
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+#ifdef HAVE_MLX5DV_DR
+	return mlx5dv_dr_action_create_dest_ibv_qp(qp);
+#else
+	struct mlx5dv_flow_action_attr *action;
+
+	action = malloc(sizeof(*action));
+	if (!action)
+		return NULL;
+	action->type = MLX5DV_FLOW_ACTION_DEST_IBV_QP;
+	action->obj = qp;
+	return action;
+#endif
+#else
+	(void)qp;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static void *
+mlx5_glue_dv_create_flow_action_dest_devx_tir(void *tir)
+{
+#ifdef HAVE_MLX5DV_DR_ACTION_DEST_DEVX_TIR
+	return mlx5dv_dr_action_create_dest_devx_tir(tir);
+#else
+	(void)tir;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static void *
+mlx5_glue_dv_create_flow_action_modify_header
+					(struct ibv_context *ctx,
+					 enum mlx5dv_flow_table_type ft_type,
+					 void *domain, uint64_t flags,
+					 size_t actions_sz,
+					 uint64_t actions[])
+{
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+#ifdef HAVE_MLX5DV_DR
+	(void)ctx;
+	(void)ft_type;
+	return mlx5dv_dr_action_create_modify_header(domain, flags, actions_sz,
+						     (__be64 *)actions);
+#else
+	struct mlx5dv_flow_action_attr *action;
+
+	(void)domain;
+	(void)flags;
+	action = malloc(sizeof(*action));
+	if (!action)
+		return NULL;
+	action->type = MLX5DV_FLOW_ACTION_IBV_FLOW_ACTION;
+	action->action = mlx5dv_create_flow_action_modify_header
+		(ctx, actions_sz, actions, ft_type);
+	return action;
+#endif
+#else
+	(void)ctx;
+	(void)ft_type;
+	(void)domain;
+	(void)flags;
+	(void)actions_sz;
+	(void)actions;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static void *
+mlx5_glue_dv_create_flow_action_packet_reformat
+		(struct ibv_context *ctx,
+		 enum mlx5dv_flow_action_packet_reformat_type reformat_type,
+		 enum mlx5dv_flow_table_type ft_type,
+		 struct mlx5dv_dr_domain *domain,
+		 uint32_t flags, size_t data_sz, void *data)
+{
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+#ifdef HAVE_MLX5DV_DR
+	(void)ctx;
+	(void)ft_type;
+	return mlx5dv_dr_action_create_packet_reformat(domain, flags,
+						       reformat_type, data_sz,
+						       data);
+#else
+	(void)domain;
+	(void)flags;
+	struct mlx5dv_flow_action_attr *action;
+
+	action = malloc(sizeof(*action));
+	if (!action)
+		return NULL;
+	action->type = MLX5DV_FLOW_ACTION_IBV_FLOW_ACTION;
+	action->action = mlx5dv_create_flow_action_packet_reformat
+		(ctx, data_sz, data, reformat_type, ft_type);
+	return action;
+#endif
+#else
+	(void)ctx;
+	(void)reformat_type;
+	(void)ft_type;
+	(void)domain;
+	(void)flags;
+	(void)data_sz;
+	(void)data;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static void *
+mlx5_glue_dv_create_flow_action_tag(uint32_t tag)
+{
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+#ifdef HAVE_MLX5DV_DR
+	return mlx5dv_dr_action_create_tag(tag);
+#else
+	struct mlx5dv_flow_action_attr *action;
+	action = malloc(sizeof(*action));
+	if (!action)
+		return NULL;
+	action->type = MLX5DV_FLOW_ACTION_TAG;
+	action->tag_value = tag;
+	return action;
+#endif
+#endif
+	(void)tag;
+	errno = ENOTSUP;
+	return NULL;
+}
+
+static void *
+mlx5_glue_dv_create_flow_action_meter(struct mlx5dv_dr_flow_meter_attr *attr)
+{
+#if defined(HAVE_MLX5DV_DR) && defined(HAVE_MLX5_DR_CREATE_ACTION_FLOW_METER)
+	return mlx5dv_dr_action_create_flow_meter(attr);
+#else
+	(void)attr;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static int
+mlx5_glue_dv_modify_flow_action_meter(void *action,
+				      struct mlx5dv_dr_flow_meter_attr *attr,
+				      uint64_t modify_bits)
+{
+#if defined(HAVE_MLX5DV_DR) && defined(HAVE_MLX5_DR_CREATE_ACTION_FLOW_METER)
+	return mlx5dv_dr_action_modify_flow_meter(action, attr, modify_bits);
+#else
+	(void)action;
+	(void)attr;
+	(void)modify_bits;
+	errno = ENOTSUP;
+	return errno;
+#endif
+}
+
+static int
+mlx5_glue_dv_destroy_flow(void *flow_id)
+{
+#ifdef HAVE_MLX5DV_DR
+	return mlx5dv_dr_rule_destroy(flow_id);
+#else
+	return ibv_destroy_flow(flow_id);
+#endif
+}
+
+static int
+mlx5_glue_dv_destroy_flow_matcher(void *matcher)
+{
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+#ifdef HAVE_MLX5DV_DR
+	return mlx5dv_dr_matcher_destroy(matcher);
+#else
+	return mlx5dv_destroy_flow_matcher(matcher);
+#endif
+#else
+	(void)matcher;
+	errno = ENOTSUP;
+	return errno;
+#endif
+}
+
+static struct ibv_context *
+mlx5_glue_dv_open_device(struct ibv_device *device)
+{
+#ifdef HAVE_IBV_DEVX_OBJ
+	return mlx5dv_open_device(device,
+				  &(struct mlx5dv_context_attr){
+					.flags = MLX5DV_CONTEXT_FLAGS_DEVX,
+				  });
+#else
+	(void)device;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static struct mlx5dv_devx_obj *
+mlx5_glue_devx_obj_create(struct ibv_context *ctx,
+			  const void *in, size_t inlen,
+			  void *out, size_t outlen)
+{
+#ifdef HAVE_IBV_DEVX_OBJ
+	return mlx5dv_devx_obj_create(ctx, in, inlen, out, outlen);
+#else
+	(void)ctx;
+	(void)in;
+	(void)inlen;
+	(void)out;
+	(void)outlen;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static int
+mlx5_glue_devx_obj_destroy(struct mlx5dv_devx_obj *obj)
+{
+#ifdef HAVE_IBV_DEVX_OBJ
+	return mlx5dv_devx_obj_destroy(obj);
+#else
+	(void)obj;
+	return -ENOTSUP;
+#endif
+}
+
+static int
+mlx5_glue_devx_obj_query(struct mlx5dv_devx_obj *obj,
+			 const void *in, size_t inlen,
+			 void *out, size_t outlen)
+{
+#ifdef HAVE_IBV_DEVX_OBJ
+	return mlx5dv_devx_obj_query(obj, in, inlen, out, outlen);
+#else
+	(void)obj;
+	(void)in;
+	(void)inlen;
+	(void)out;
+	(void)outlen;
+	return -ENOTSUP;
+#endif
+}
+
+static int
+mlx5_glue_devx_obj_modify(struct mlx5dv_devx_obj *obj,
+			  const void *in, size_t inlen,
+			  void *out, size_t outlen)
+{
+#ifdef HAVE_IBV_DEVX_OBJ
+	return mlx5dv_devx_obj_modify(obj, in, inlen, out, outlen);
+#else
+	(void)obj;
+	(void)in;
+	(void)inlen;
+	(void)out;
+	(void)outlen;
+	return -ENOTSUP;
+#endif
+}
+
+static int
+mlx5_glue_devx_general_cmd(struct ibv_context *ctx,
+			   const void *in, size_t inlen,
+			   void *out, size_t outlen)
+{
+#ifdef HAVE_IBV_DEVX_OBJ
+	return mlx5dv_devx_general_cmd(ctx, in, inlen, out, outlen);
+#else
+	(void)ctx;
+	(void)in;
+	(void)inlen;
+	(void)out;
+	(void)outlen;
+	return -ENOTSUP;
+#endif
+}
+
+static struct mlx5dv_devx_cmd_comp *
+mlx5_glue_devx_create_cmd_comp(struct ibv_context *ctx)
+{
+#ifdef HAVE_IBV_DEVX_ASYNC
+	return mlx5dv_devx_create_cmd_comp(ctx);
+#else
+	(void)ctx;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static void
+mlx5_glue_devx_destroy_cmd_comp(struct mlx5dv_devx_cmd_comp *cmd_comp)
+{
+#ifdef HAVE_IBV_DEVX_ASYNC
+	mlx5dv_devx_destroy_cmd_comp(cmd_comp);
+#else
+	(void)cmd_comp;
+	errno = ENOTSUP;
+#endif
+}
+
+static int
+mlx5_glue_devx_obj_query_async(struct mlx5dv_devx_obj *obj, const void *in,
+			       size_t inlen, size_t outlen, uint64_t wr_id,
+			       struct mlx5dv_devx_cmd_comp *cmd_comp)
+{
+#ifdef HAVE_IBV_DEVX_ASYNC
+	return mlx5dv_devx_obj_query_async(obj, in, inlen, outlen, wr_id,
+					   cmd_comp);
+#else
+	(void)obj;
+	(void)in;
+	(void)inlen;
+	(void)outlen;
+	(void)wr_id;
+	(void)cmd_comp;
+	return -ENOTSUP;
+#endif
+}
+
+static int
+mlx5_glue_devx_get_async_cmd_comp(struct mlx5dv_devx_cmd_comp *cmd_comp,
+				  struct mlx5dv_devx_async_cmd_hdr *cmd_resp,
+				  size_t cmd_resp_len)
+{
+#ifdef HAVE_IBV_DEVX_ASYNC
+	return mlx5dv_devx_get_async_cmd_comp(cmd_comp, cmd_resp,
+					      cmd_resp_len);
+#else
+	(void)cmd_comp;
+	(void)cmd_resp;
+	(void)cmd_resp_len;
+	return -ENOTSUP;
+#endif
+}
+
+static struct mlx5dv_devx_umem *
+mlx5_glue_devx_umem_reg(struct ibv_context *context, void *addr, size_t size,
+			uint32_t access)
+{
+#ifdef HAVE_IBV_DEVX_OBJ
+	return mlx5dv_devx_umem_reg(context, addr, size, access);
+#else
+	(void)context;
+	(void)addr;
+	(void)size;
+	(void)access;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static int
+mlx5_glue_devx_umem_dereg(struct mlx5dv_devx_umem *dv_devx_umem)
+{
+#ifdef HAVE_IBV_DEVX_OBJ
+	return mlx5dv_devx_umem_dereg(dv_devx_umem);
+#else
+	(void)dv_devx_umem;
+	return -ENOTSUP;
+#endif
+}
+
+static int
+mlx5_glue_devx_qp_query(struct ibv_qp *qp,
+			const void *in, size_t inlen,
+			void *out, size_t outlen)
+{
+#ifdef HAVE_IBV_DEVX_OBJ
+	return mlx5dv_devx_qp_query(qp, in, inlen, out, outlen);
+#else
+	(void)qp;
+	(void)in;
+	(void)inlen;
+	(void)out;
+	(void)outlen;
+	errno = ENOTSUP;
+	return errno;
+#endif
+}
+
+static int
+mlx5_glue_devx_port_query(struct ibv_context *ctx,
+			  uint32_t port_num,
+			  struct mlx5dv_devx_port *mlx5_devx_port)
+{
+#ifdef HAVE_MLX5DV_DR_DEVX_PORT
+	return mlx5dv_query_devx_port(ctx, port_num, mlx5_devx_port);
+#else
+	(void)ctx;
+	(void)port_num;
+	(void)mlx5_devx_port;
+	errno = ENOTSUP;
+	return errno;
+#endif
+}
+
+static int
+mlx5_glue_dr_dump_domain(FILE *file, void *domain)
+{
+#ifdef HAVE_MLX5_DR_FLOW_DUMP
+	return mlx5dv_dump_dr_domain(file, domain);
+#else
+	RTE_SET_USED(file);
+	RTE_SET_USED(domain);
+	return -ENOTSUP;
+#endif
+}
+
+alignas(RTE_CACHE_LINE_SIZE)
+const struct mlx5_glue *mlx5_glue = &(const struct mlx5_glue){
+	.version = MLX5_GLUE_VERSION,
+	.fork_init = mlx5_glue_fork_init,
+	.alloc_pd = mlx5_glue_alloc_pd,
+	.dealloc_pd = mlx5_glue_dealloc_pd,
+	.get_device_list = mlx5_glue_get_device_list,
+	.free_device_list = mlx5_glue_free_device_list,
+	.open_device = mlx5_glue_open_device,
+	.close_device = mlx5_glue_close_device,
+	.query_device = mlx5_glue_query_device,
+	.query_device_ex = mlx5_glue_query_device_ex,
+	.query_rt_values_ex = mlx5_glue_query_rt_values_ex,
+	.query_port = mlx5_glue_query_port,
+	.create_comp_channel = mlx5_glue_create_comp_channel,
+	.destroy_comp_channel = mlx5_glue_destroy_comp_channel,
+	.create_cq = mlx5_glue_create_cq,
+	.destroy_cq = mlx5_glue_destroy_cq,
+	.get_cq_event = mlx5_glue_get_cq_event,
+	.ack_cq_events = mlx5_glue_ack_cq_events,
+	.create_rwq_ind_table = mlx5_glue_create_rwq_ind_table,
+	.destroy_rwq_ind_table = mlx5_glue_destroy_rwq_ind_table,
+	.create_wq = mlx5_glue_create_wq,
+	.destroy_wq = mlx5_glue_destroy_wq,
+	.modify_wq = mlx5_glue_modify_wq,
+	.create_flow = mlx5_glue_create_flow,
+	.destroy_flow = mlx5_glue_destroy_flow,
+	.destroy_flow_action = mlx5_glue_destroy_flow_action,
+	.create_qp = mlx5_glue_create_qp,
+	.create_qp_ex = mlx5_glue_create_qp_ex,
+	.destroy_qp = mlx5_glue_destroy_qp,
+	.modify_qp = mlx5_glue_modify_qp,
+	.reg_mr = mlx5_glue_reg_mr,
+	.dereg_mr = mlx5_glue_dereg_mr,
+	.create_counter_set = mlx5_glue_create_counter_set,
+	.destroy_counter_set = mlx5_glue_destroy_counter_set,
+	.describe_counter_set = mlx5_glue_describe_counter_set,
+	.query_counter_set = mlx5_glue_query_counter_set,
+	.create_counters = mlx5_glue_create_counters,
+	.destroy_counters = mlx5_glue_destroy_counters,
+	.attach_counters = mlx5_glue_attach_counters,
+	.query_counters = mlx5_glue_query_counters,
+	.ack_async_event = mlx5_glue_ack_async_event,
+	.get_async_event = mlx5_glue_get_async_event,
+	.port_state_str = mlx5_glue_port_state_str,
+	.cq_ex_to_cq = mlx5_glue_cq_ex_to_cq,
+	.dr_create_flow_action_dest_flow_tbl =
+		mlx5_glue_dr_create_flow_action_dest_flow_tbl,
+	.dr_create_flow_action_dest_port =
+		mlx5_glue_dr_create_flow_action_dest_port,
+	.dr_create_flow_action_drop =
+		mlx5_glue_dr_create_flow_action_drop,
+	.dr_create_flow_action_push_vlan =
+		mlx5_glue_dr_create_flow_action_push_vlan,
+	.dr_create_flow_action_pop_vlan =
+		mlx5_glue_dr_create_flow_action_pop_vlan,
+	.dr_create_flow_tbl = mlx5_glue_dr_create_flow_tbl,
+	.dr_destroy_flow_tbl = mlx5_glue_dr_destroy_flow_tbl,
+	.dr_create_domain = mlx5_glue_dr_create_domain,
+	.dr_destroy_domain = mlx5_glue_dr_destroy_domain,
+	.dv_create_cq = mlx5_glue_dv_create_cq,
+	.dv_create_wq = mlx5_glue_dv_create_wq,
+	.dv_query_device = mlx5_glue_dv_query_device,
+	.dv_set_context_attr = mlx5_glue_dv_set_context_attr,
+	.dv_init_obj = mlx5_glue_dv_init_obj,
+	.dv_create_qp = mlx5_glue_dv_create_qp,
+	.dv_create_flow_matcher = mlx5_glue_dv_create_flow_matcher,
+	.dv_create_flow = mlx5_glue_dv_create_flow,
+	.dv_create_flow_action_counter =
+		mlx5_glue_dv_create_flow_action_counter,
+	.dv_create_flow_action_dest_ibv_qp =
+		mlx5_glue_dv_create_flow_action_dest_ibv_qp,
+	.dv_create_flow_action_dest_devx_tir =
+		mlx5_glue_dv_create_flow_action_dest_devx_tir,
+	.dv_create_flow_action_modify_header =
+		mlx5_glue_dv_create_flow_action_modify_header,
+	.dv_create_flow_action_packet_reformat =
+		mlx5_glue_dv_create_flow_action_packet_reformat,
+	.dv_create_flow_action_tag =  mlx5_glue_dv_create_flow_action_tag,
+	.dv_create_flow_action_meter = mlx5_glue_dv_create_flow_action_meter,
+	.dv_modify_flow_action_meter = mlx5_glue_dv_modify_flow_action_meter,
+	.dv_destroy_flow = mlx5_glue_dv_destroy_flow,
+	.dv_destroy_flow_matcher = mlx5_glue_dv_destroy_flow_matcher,
+	.dv_open_device = mlx5_glue_dv_open_device,
+	.devx_obj_create = mlx5_glue_devx_obj_create,
+	.devx_obj_destroy = mlx5_glue_devx_obj_destroy,
+	.devx_obj_query = mlx5_glue_devx_obj_query,
+	.devx_obj_modify = mlx5_glue_devx_obj_modify,
+	.devx_general_cmd = mlx5_glue_devx_general_cmd,
+	.devx_create_cmd_comp = mlx5_glue_devx_create_cmd_comp,
+	.devx_destroy_cmd_comp = mlx5_glue_devx_destroy_cmd_comp,
+	.devx_obj_query_async = mlx5_glue_devx_obj_query_async,
+	.devx_get_async_cmd_comp = mlx5_glue_devx_get_async_cmd_comp,
+	.devx_umem_reg = mlx5_glue_devx_umem_reg,
+	.devx_umem_dereg = mlx5_glue_devx_umem_dereg,
+	.devx_qp_query = mlx5_glue_devx_qp_query,
+	.devx_port_query = mlx5_glue_devx_port_query,
+	.dr_dump_domain = mlx5_glue_dr_dump_domain,
+};
diff --git a/drivers/common/mlx5/mlx5_glue.h b/drivers/common/mlx5/mlx5_glue.h
new file mode 100644
index 0000000..f4c3180
--- /dev/null
+++ b/drivers/common/mlx5/mlx5_glue.h
@@ -0,0 +1,265 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2018 6WIND S.A.
+ * Copyright 2018 Mellanox Technologies, Ltd
+ */
+
+#ifndef MLX5_GLUE_H_
+#define MLX5_GLUE_H_
+
+#include <stddef.h>
+#include <stdint.h>
+/* Verbs headers do not support -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-Wpedantic"
+#endif
+#include <infiniband/mlx5dv.h>
+#include <infiniband/verbs.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-Wpedantic"
+#endif
+
+#include <rte_byteorder.h>
+
+#include "mlx5_autoconf.h"
+
+#ifndef MLX5_GLUE_VERSION
+#define MLX5_GLUE_VERSION ""
+#endif
+
+#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V42
+struct ibv_counter_set;
+struct ibv_counter_set_data;
+struct ibv_counter_set_description;
+struct ibv_counter_set_init_attr;
+struct ibv_query_counter_set_attr;
+#endif
+
+#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V45
+struct ibv_counters;
+struct ibv_counters_init_attr;
+struct ibv_counter_attach_attr;
+#endif
+
+#ifndef HAVE_IBV_DEVICE_TUNNEL_SUPPORT
+struct mlx5dv_qp_init_attr;
+#endif
+
+#ifndef HAVE_IBV_DEVICE_STRIDING_RQ_SUPPORT
+struct mlx5dv_wq_init_attr;
+#endif
+
+#ifndef HAVE_IBV_FLOW_DV_SUPPORT
+struct mlx5dv_flow_matcher;
+struct mlx5dv_flow_matcher_attr;
+struct mlx5dv_flow_action_attr;
+struct mlx5dv_flow_match_parameters;
+struct mlx5dv_dr_flow_meter_attr;
+struct ibv_flow_action;
+enum mlx5dv_flow_action_packet_reformat_type { packet_reformat_type = 0, };
+enum mlx5dv_flow_table_type { flow_table_type = 0, };
+#endif
+
+#ifndef HAVE_IBV_FLOW_DEVX_COUNTERS
+#define MLX5DV_FLOW_ACTION_COUNTERS_DEVX 0
+#endif
+
+#ifndef HAVE_IBV_DEVX_OBJ
+struct mlx5dv_devx_obj;
+struct mlx5dv_devx_umem { uint32_t umem_id; };
+#endif
+
+#ifndef HAVE_IBV_DEVX_ASYNC
+struct mlx5dv_devx_cmd_comp;
+struct mlx5dv_devx_async_cmd_hdr;
+#endif
+
+#ifndef HAVE_MLX5DV_DR
+enum  mlx5dv_dr_domain_type { unused, };
+struct mlx5dv_dr_domain;
+#endif
+
+#ifndef HAVE_MLX5DV_DR_DEVX_PORT
+struct mlx5dv_devx_port;
+#endif
+
+#ifndef HAVE_MLX5_DR_CREATE_ACTION_FLOW_METER
+struct mlx5dv_dr_flow_meter_attr;
+#endif
+
+/* LIB_GLUE_VERSION must be updated every time this structure is modified. */
+struct mlx5_glue {
+	const char *version;
+	int (*fork_init)(void);
+	struct ibv_pd *(*alloc_pd)(struct ibv_context *context);
+	int (*dealloc_pd)(struct ibv_pd *pd);
+	struct ibv_device **(*get_device_list)(int *num_devices);
+	void (*free_device_list)(struct ibv_device **list);
+	struct ibv_context *(*open_device)(struct ibv_device *device);
+	int (*close_device)(struct ibv_context *context);
+	int (*query_device)(struct ibv_context *context,
+			    struct ibv_device_attr *device_attr);
+	int (*query_device_ex)(struct ibv_context *context,
+			       const struct ibv_query_device_ex_input *input,
+			       struct ibv_device_attr_ex *attr);
+	int (*query_rt_values_ex)(struct ibv_context *context,
+			       struct ibv_values_ex *values);
+	int (*query_port)(struct ibv_context *context, uint8_t port_num,
+			  struct ibv_port_attr *port_attr);
+	struct ibv_comp_channel *(*create_comp_channel)
+		(struct ibv_context *context);
+	int (*destroy_comp_channel)(struct ibv_comp_channel *channel);
+	struct ibv_cq *(*create_cq)(struct ibv_context *context, int cqe,
+				    void *cq_context,
+				    struct ibv_comp_channel *channel,
+				    int comp_vector);
+	int (*destroy_cq)(struct ibv_cq *cq);
+	int (*get_cq_event)(struct ibv_comp_channel *channel,
+			    struct ibv_cq **cq, void **cq_context);
+	void (*ack_cq_events)(struct ibv_cq *cq, unsigned int nevents);
+	struct ibv_rwq_ind_table *(*create_rwq_ind_table)
+		(struct ibv_context *context,
+		 struct ibv_rwq_ind_table_init_attr *init_attr);
+	int (*destroy_rwq_ind_table)(struct ibv_rwq_ind_table *rwq_ind_table);
+	struct ibv_wq *(*create_wq)(struct ibv_context *context,
+				    struct ibv_wq_init_attr *wq_init_attr);
+	int (*destroy_wq)(struct ibv_wq *wq);
+	int (*modify_wq)(struct ibv_wq *wq, struct ibv_wq_attr *wq_attr);
+	struct ibv_flow *(*create_flow)(struct ibv_qp *qp,
+					struct ibv_flow_attr *flow);
+	int (*destroy_flow)(struct ibv_flow *flow_id);
+	int (*destroy_flow_action)(void *action);
+	struct ibv_qp *(*create_qp)(struct ibv_pd *pd,
+				    struct ibv_qp_init_attr *qp_init_attr);
+	struct ibv_qp *(*create_qp_ex)
+		(struct ibv_context *context,
+		 struct ibv_qp_init_attr_ex *qp_init_attr_ex);
+	int (*destroy_qp)(struct ibv_qp *qp);
+	int (*modify_qp)(struct ibv_qp *qp, struct ibv_qp_attr *attr,
+			 int attr_mask);
+	struct ibv_mr *(*reg_mr)(struct ibv_pd *pd, void *addr,
+				 size_t length, int access);
+	int (*dereg_mr)(struct ibv_mr *mr);
+	struct ibv_counter_set *(*create_counter_set)
+		(struct ibv_context *context,
+		 struct ibv_counter_set_init_attr *init_attr);
+	int (*destroy_counter_set)(struct ibv_counter_set *cs);
+	int (*describe_counter_set)
+		(struct ibv_context *context,
+		 uint16_t counter_set_id,
+		 struct ibv_counter_set_description *cs_desc);
+	int (*query_counter_set)(struct ibv_query_counter_set_attr *query_attr,
+				 struct ibv_counter_set_data *cs_data);
+	struct ibv_counters *(*create_counters)
+		(struct ibv_context *context,
+		 struct ibv_counters_init_attr *init_attr);
+	int (*destroy_counters)(struct ibv_counters *counters);
+	int (*attach_counters)(struct ibv_counters *counters,
+			       struct ibv_counter_attach_attr *attr,
+			       struct ibv_flow *flow);
+	int (*query_counters)(struct ibv_counters *counters,
+			      uint64_t *counters_value,
+			      uint32_t ncounters,
+			      uint32_t flags);
+	void (*ack_async_event)(struct ibv_async_event *event);
+	int (*get_async_event)(struct ibv_context *context,
+			       struct ibv_async_event *event);
+	const char *(*port_state_str)(enum ibv_port_state port_state);
+	struct ibv_cq *(*cq_ex_to_cq)(struct ibv_cq_ex *cq);
+	void *(*dr_create_flow_action_dest_flow_tbl)(void *tbl);
+	void *(*dr_create_flow_action_dest_port)(void *domain,
+						 uint32_t port);
+	void *(*dr_create_flow_action_drop)(void);
+	void *(*dr_create_flow_action_push_vlan)
+					(struct mlx5dv_dr_domain *domain,
+					 rte_be32_t vlan_tag);
+	void *(*dr_create_flow_action_pop_vlan)(void);
+	void *(*dr_create_flow_tbl)(void *domain, uint32_t level);
+	int (*dr_destroy_flow_tbl)(void *tbl);
+	void *(*dr_create_domain)(struct ibv_context *ctx,
+				  enum mlx5dv_dr_domain_type domain);
+	int (*dr_destroy_domain)(void *domain);
+	struct ibv_cq_ex *(*dv_create_cq)
+		(struct ibv_context *context,
+		 struct ibv_cq_init_attr_ex *cq_attr,
+		 struct mlx5dv_cq_init_attr *mlx5_cq_attr);
+	struct ibv_wq *(*dv_create_wq)
+		(struct ibv_context *context,
+		 struct ibv_wq_init_attr *wq_attr,
+		 struct mlx5dv_wq_init_attr *mlx5_wq_attr);
+	int (*dv_query_device)(struct ibv_context *ctx_in,
+			       struct mlx5dv_context *attrs_out);
+	int (*dv_set_context_attr)(struct ibv_context *ibv_ctx,
+				   enum mlx5dv_set_ctx_attr_type type,
+				   void *attr);
+	int (*dv_init_obj)(struct mlx5dv_obj *obj, uint64_t obj_type);
+	struct ibv_qp *(*dv_create_qp)
+		(struct ibv_context *context,
+		 struct ibv_qp_init_attr_ex *qp_init_attr_ex,
+		 struct mlx5dv_qp_init_attr *dv_qp_init_attr);
+	void *(*dv_create_flow_matcher)
+		(struct ibv_context *context,
+		 struct mlx5dv_flow_matcher_attr *matcher_attr,
+		 void *tbl);
+	void *(*dv_create_flow)(void *matcher, void *match_value,
+			  size_t num_actions, void *actions[]);
+	void *(*dv_create_flow_action_counter)(void *obj, uint32_t  offset);
+	void *(*dv_create_flow_action_dest_ibv_qp)(void *qp);
+	void *(*dv_create_flow_action_dest_devx_tir)(void *tir);
+	void *(*dv_create_flow_action_modify_header)
+		(struct ibv_context *ctx, enum mlx5dv_flow_table_type ft_type,
+		 void *domain, uint64_t flags, size_t actions_sz,
+		 uint64_t actions[]);
+	void *(*dv_create_flow_action_packet_reformat)
+		(struct ibv_context *ctx,
+		 enum mlx5dv_flow_action_packet_reformat_type reformat_type,
+		 enum mlx5dv_flow_table_type ft_type,
+		 struct mlx5dv_dr_domain *domain,
+		 uint32_t flags, size_t data_sz, void *data);
+	void *(*dv_create_flow_action_tag)(uint32_t tag);
+	void *(*dv_create_flow_action_meter)
+		(struct mlx5dv_dr_flow_meter_attr *attr);
+	int (*dv_modify_flow_action_meter)(void *action,
+		struct mlx5dv_dr_flow_meter_attr *attr, uint64_t modify_bits);
+	int (*dv_destroy_flow)(void *flow);
+	int (*dv_destroy_flow_matcher)(void *matcher);
+	struct ibv_context *(*dv_open_device)(struct ibv_device *device);
+	struct mlx5dv_devx_obj *(*devx_obj_create)
+					(struct ibv_context *ctx,
+					 const void *in, size_t inlen,
+					 void *out, size_t outlen);
+	int (*devx_obj_destroy)(struct mlx5dv_devx_obj *obj);
+	int (*devx_obj_query)(struct mlx5dv_devx_obj *obj,
+			      const void *in, size_t inlen,
+			      void *out, size_t outlen);
+	int (*devx_obj_modify)(struct mlx5dv_devx_obj *obj,
+			       const void *in, size_t inlen,
+			       void *out, size_t outlen);
+	int (*devx_general_cmd)(struct ibv_context *context,
+				const void *in, size_t inlen,
+				void *out, size_t outlen);
+	struct mlx5dv_devx_cmd_comp *(*devx_create_cmd_comp)
+					(struct ibv_context *context);
+	void (*devx_destroy_cmd_comp)(struct mlx5dv_devx_cmd_comp *cmd_comp);
+	int (*devx_obj_query_async)(struct mlx5dv_devx_obj *obj,
+				    const void *in, size_t inlen,
+				    size_t outlen, uint64_t wr_id,
+				    struct mlx5dv_devx_cmd_comp *cmd_comp);
+	int (*devx_get_async_cmd_comp)(struct mlx5dv_devx_cmd_comp *cmd_comp,
+				       struct mlx5dv_devx_async_cmd_hdr *resp,
+				       size_t cmd_resp_len);
+	struct mlx5dv_devx_umem *(*devx_umem_reg)(struct ibv_context *context,
+						  void *addr, size_t size,
+						  uint32_t access);
+	int (*devx_umem_dereg)(struct mlx5dv_devx_umem *dv_devx_umem);
+	int (*devx_qp_query)(struct ibv_qp *qp,
+			     const void *in, size_t inlen,
+			     void *out, size_t outlen);
+	int (*devx_port_query)(struct ibv_context *ctx,
+			       uint32_t port_num,
+			       struct mlx5dv_devx_port *mlx5_devx_port);
+	int (*dr_dump_domain)(FILE *file, void *domain);
+};
+
+extern const struct mlx5_glue *mlx5_glue;
+
+#endif /* MLX5_GLUE_H_ */
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
new file mode 100644
index 0000000..5730ad1
--- /dev/null
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -0,0 +1,1889 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2016 6WIND S.A.
+ * Copyright 2016 Mellanox Technologies, Ltd
+ */
+
+#ifndef RTE_PMD_MLX5_PRM_H_
+#define RTE_PMD_MLX5_PRM_H_
+
+#include <assert.h>
+/* Verbs header. */
+/* ISO C doesn't support unnamed structs/unions, disabling -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-Wpedantic"
+#endif
+#include <infiniband/mlx5dv.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-Wpedantic"
+#endif
+
+#include <rte_vect.h>
+#include <rte_byteorder.h>
+
+#include "mlx5_autoconf.h"
+
+/* RSS hash key size. */
+#define MLX5_RSS_HASH_KEY_LEN 40
+
+/* Get CQE owner bit. */
+#define MLX5_CQE_OWNER(op_own) ((op_own) & MLX5_CQE_OWNER_MASK)
+
+/* Get CQE format. */
+#define MLX5_CQE_FORMAT(op_own) (((op_own) & MLX5E_CQE_FORMAT_MASK) >> 2)
+
+/* Get CQE opcode. */
+#define MLX5_CQE_OPCODE(op_own) (((op_own) & 0xf0) >> 4)
+
+/* Get CQE solicited event. */
+#define MLX5_CQE_SE(op_own) (((op_own) >> 1) & 1)
+
+/* Invalidate a CQE. */
+#define MLX5_CQE_INVALIDATE (MLX5_CQE_INVALID << 4)
+
+/* WQE Segment sizes in bytes. */
+#define MLX5_WSEG_SIZE 16u
+#define MLX5_WQE_CSEG_SIZE sizeof(struct mlx5_wqe_cseg)
+#define MLX5_WQE_DSEG_SIZE sizeof(struct mlx5_wqe_dseg)
+#define MLX5_WQE_ESEG_SIZE sizeof(struct mlx5_wqe_eseg)
+
+/* WQE/WQEBB size in bytes. */
+#define MLX5_WQE_SIZE sizeof(struct mlx5_wqe)
+
+/*
+ * Max size of a WQE session.
+ * Absolute maximum size is 63 (MLX5_DSEG_MAX) segments,
+ * the WQE size field in Control Segment is 6 bits wide.
+ */
+#define MLX5_WQE_SIZE_MAX (60 * MLX5_WSEG_SIZE)
+
+/*
+ * Default minimum number of Tx queues for inlining packets.
+ * If there are fewer queues than specified, we assume there
+ * are not enough CPU resources (cycles) to perform inlining
+ * and that PCIe throughput is not the bottleneck, so
+ * inlining is disabled.
+ */
+#define MLX5_INLINE_MAX_TXQS 8u
+#define MLX5_INLINE_MAX_TXQS_BLUEFIELD 16u
+
+/*
+ * Default packet length threshold to be inlined with
+ * enhanced MPW. If the packet length exceeds the threshold,
+ * the data are not inlined. Should be aligned to the WQEBB
+ * boundary, accounting for the title Control and Ethernet
+ * segments.
+ */
+#define MLX5_EMPW_DEF_INLINE_LEN (4u * MLX5_WQE_SIZE + \
+				  MLX5_DSEG_MIN_INLINE_SIZE)
+/*
+ * Maximal inline data length sent with enhanced MPW.
+ * Is based on maximal WQE size.
+ */
+#define MLX5_EMPW_MAX_INLINE_LEN (MLX5_WQE_SIZE_MAX - \
+				  MLX5_WQE_CSEG_SIZE - \
+				  MLX5_WQE_ESEG_SIZE - \
+				  MLX5_WQE_DSEG_SIZE + \
+				  MLX5_DSEG_MIN_INLINE_SIZE)
+/*
+ * Minimal amount of packets to be sent with EMPW.
+ * This limits the minimal required size of sent EMPW.
+ * If there are not enough resources to build a minimal
+ * EMPW, the sending loop exits.
+ */
+#define MLX5_EMPW_MIN_PACKETS (2u + 3u * 4u)
+/*
+ * Maximal number of packets to be sent with EMPW.
+ * This value should not exceed MLX5_TX_COMP_THRESH,
+ * otherwise there might be up to MLX5_EMPW_MAX_PACKETS mbufs
+ * without a CQE generation request. Multiplied by
+ * MLX5_TX_COMP_MAX_CQE, this may cause significant latency
+ * in the Tx burst routine when freeing multiple mbufs.
+ */
+#define MLX5_EMPW_MAX_PACKETS MLX5_TX_COMP_THRESH
+#define MLX5_MPW_MAX_PACKETS 6
+#define MLX5_MPW_INLINE_MAX_PACKETS 2
+
+/*
+ * Default packet length threshold to be inlined with
+ * ordinary SEND. Inlining saves the MR key search
+ * and an extra PCIe data fetch transaction, but costs
+ * CPU cycles.
+ */
+#define MLX5_SEND_DEF_INLINE_LEN (5U * MLX5_WQE_SIZE + \
+				  MLX5_ESEG_MIN_INLINE_SIZE - \
+				  MLX5_WQE_CSEG_SIZE - \
+				  MLX5_WQE_ESEG_SIZE - \
+				  MLX5_WQE_DSEG_SIZE)
+/*
+ * Maximal inline data length sent with ordinary SEND.
+ * Is based on maximal WQE size.
+ */
+#define MLX5_SEND_MAX_INLINE_LEN (MLX5_WQE_SIZE_MAX - \
+				  MLX5_WQE_CSEG_SIZE - \
+				  MLX5_WQE_ESEG_SIZE - \
+				  MLX5_WQE_DSEG_SIZE + \
+				  MLX5_ESEG_MIN_INLINE_SIZE)
+
+/* Missing in mlx5dv.h, so it is defined here. */
+#define MLX5_OPCODE_ENHANCED_MPSW 0x29u
+
+/* CQE value to inform that VLAN is stripped. */
+#define MLX5_CQE_VLAN_STRIPPED (1u << 0)
+
+/* IPv4 options. */
+#define MLX5_CQE_RX_IP_EXT_OPTS_PACKET (1u << 1)
+
+/* IPv6 packet. */
+#define MLX5_CQE_RX_IPV6_PACKET (1u << 2)
+
+/* IPv4 packet. */
+#define MLX5_CQE_RX_IPV4_PACKET (1u << 3)
+
+/* TCP packet. */
+#define MLX5_CQE_RX_TCP_PACKET (1u << 4)
+
+/* UDP packet. */
+#define MLX5_CQE_RX_UDP_PACKET (1u << 5)
+
+/* IP is fragmented. */
+#define MLX5_CQE_RX_IP_FRAG_PACKET (1u << 7)
+
+/* L2 header is valid. */
+#define MLX5_CQE_RX_L2_HDR_VALID (1u << 8)
+
+/* L3 header is valid. */
+#define MLX5_CQE_RX_L3_HDR_VALID (1u << 9)
+
+/* L4 header is valid. */
+#define MLX5_CQE_RX_L4_HDR_VALID (1u << 10)
+
+/* Outer packet, 0 IPv4, 1 IPv6. */
+#define MLX5_CQE_RX_OUTER_PACKET (1u << 1)
+
+/* Tunnel packet bit in the CQE. */
+#define MLX5_CQE_RX_TUNNEL_PACKET (1u << 0)
+
+/* Mask for LRO push flag in the CQE lro_tcppsh_abort_dupack field. */
+#define MLX5_CQE_LRO_PUSH_MASK 0x40
+
+/* Mask for L4 type in the CQE hdr_type_etc field. */
+#define MLX5_CQE_L4_TYPE_MASK 0x70
+
+/* The bit index of L4 type in CQE hdr_type_etc field. */
+#define MLX5_CQE_L4_TYPE_SHIFT 0x4
+
+/* L4 type to indicate TCP packet without acknowledgment. */
+#define MLX5_L4_HDR_TYPE_TCP_EMPTY_ACK 0x3
+
+/* L4 type to indicate TCP packet with acknowledgment. */
+#define MLX5_L4_HDR_TYPE_TCP_WITH_ACL 0x4
+
+/* Inner L3 checksum offload (Tunneled packets only). */
+#define MLX5_ETH_WQE_L3_INNER_CSUM (1u << 4)
+
+/* Inner L4 checksum offload (Tunneled packets only). */
+#define MLX5_ETH_WQE_L4_INNER_CSUM (1u << 5)
+
+/* Outer L4 type is TCP. */
+#define MLX5_ETH_WQE_L4_OUTER_TCP  (0u << 5)
+
+/* Outer L4 type is UDP. */
+#define MLX5_ETH_WQE_L4_OUTER_UDP  (1u << 5)
+
+/* Outer L3 type is IPV4. */
+#define MLX5_ETH_WQE_L3_OUTER_IPV4 (0u << 4)
+
+/* Outer L3 type is IPV6. */
+#define MLX5_ETH_WQE_L3_OUTER_IPV6 (1u << 4)
+
+/* Inner L4 type is TCP. */
+#define MLX5_ETH_WQE_L4_INNER_TCP (0u << 1)
+
+/* Inner L4 type is UDP. */
+#define MLX5_ETH_WQE_L4_INNER_UDP (1u << 1)
+
+/* Inner L3 type is IPV4. */
+#define MLX5_ETH_WQE_L3_INNER_IPV4 (0u << 0)
+
+/* Inner L3 type is IPV6. */
+#define MLX5_ETH_WQE_L3_INNER_IPV6 (1u << 0)
+
+/* VLAN insertion flag. */
+#define MLX5_ETH_WQE_VLAN_INSERT (1u << 31)
+
+/* Data inline segment flag. */
+#define MLX5_ETH_WQE_DATA_INLINE (1u << 31)
+
+/* Is flow mark valid. */
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+#define MLX5_FLOW_MARK_IS_VALID(val) ((val) & 0xffffff00)
+#else
+#define MLX5_FLOW_MARK_IS_VALID(val) ((val) & 0xffffff)
+#endif
+
+/* INVALID is used by packets matching no flow rules. */
+#define MLX5_FLOW_MARK_INVALID 0
+
+/* Maximum allowed value to mark a packet. */
+#define MLX5_FLOW_MARK_MAX 0xfffff0
+
+/* Default mark value used when none is provided. */
+#define MLX5_FLOW_MARK_DEFAULT 0xffffff
+
+/* Default mark mask for metadata legacy mode. */
+#define MLX5_FLOW_MARK_MASK 0xffffff
+
+/* Maximum number of DS in WQE. Limited by 6-bit field. */
+#define MLX5_DSEG_MAX 63
+
+/* The completion mode offset in the WQE control segment line 2. */
+#define MLX5_COMP_MODE_OFFSET 2
+
+/* Amount of data bytes in minimal inline data segment. */
+#define MLX5_DSEG_MIN_INLINE_SIZE 12u
+
+/* Amount of data bytes in minimal inline eth segment. */
+#define MLX5_ESEG_MIN_INLINE_SIZE 18u
+
+/* Amount of data bytes after eth data segment. */
+#define MLX5_ESEG_EXTRA_DATA_SIZE 32u
+
+/* The maximum log value of segments per RQ WQE. */
+#define MLX5_MAX_LOG_RQ_SEGS 5u
+
+/* The alignment needed for WQ buffer. */
+#define MLX5_WQE_BUF_ALIGNMENT 512
+
+/* Completion mode. */
+enum mlx5_completion_mode {
+	MLX5_COMP_ONLY_ERR = 0x0,
+	MLX5_COMP_ONLY_FIRST_ERR = 0x1,
+	MLX5_COMP_ALWAYS = 0x2,
+	MLX5_COMP_CQE_AND_EQE = 0x3,
+};
+
+/* MPW mode. */
+enum mlx5_mpw_mode {
+	MLX5_MPW_DISABLED,
+	MLX5_MPW,
+	MLX5_MPW_ENHANCED, /* Enhanced Multi-Packet Send WQE, a.k.a MPWv2. */
+};
+
+/* WQE Control segment. */
+struct mlx5_wqe_cseg {
+	uint32_t opcode;
+	uint32_t sq_ds;
+	uint32_t flags;
+	uint32_t misc;
+} __rte_packed __rte_aligned(MLX5_WSEG_SIZE);
+
+/* Header of data segment. Minimal size Data Segment. */
+struct mlx5_wqe_dseg {
+	uint32_t bcount;
+	union {
+		uint8_t inline_data[MLX5_DSEG_MIN_INLINE_SIZE];
+		struct {
+			uint32_t lkey;
+			uint64_t pbuf;
+		} __rte_packed;
+	};
+} __rte_packed;
+
+/* Subset of struct WQE Ethernet Segment. */
+struct mlx5_wqe_eseg {
+	union {
+		struct {
+			uint32_t swp_offs;
+			uint8_t	cs_flags;
+			uint8_t	swp_flags;
+			uint16_t mss;
+			uint32_t metadata;
+			uint16_t inline_hdr_sz;
+			union {
+				uint16_t inline_data;
+				uint16_t vlan_tag;
+			};
+		} __rte_packed;
+		struct {
+			uint32_t offsets;
+			uint32_t flags;
+			uint32_t flow_metadata;
+			uint32_t inline_hdr;
+		} __rte_packed;
+	};
+} __rte_packed;
+
+/* The title WQEBB, header of WQE. */
+struct mlx5_wqe {
+	union {
+		struct mlx5_wqe_cseg cseg;
+		uint32_t ctrl[4];
+	};
+	struct mlx5_wqe_eseg eseg;
+	union {
+		struct mlx5_wqe_dseg dseg[2];
+		uint8_t data[MLX5_ESEG_EXTRA_DATA_SIZE];
+	};
+} __rte_packed;
+
+/* WQE for Multi-Packet RQ. */
+struct mlx5_wqe_mprq {
+	struct mlx5_wqe_srq_next_seg next_seg;
+	struct mlx5_wqe_data_seg dseg;
+};
+
+#define MLX5_MPRQ_LEN_MASK 0x000ffff
+#define MLX5_MPRQ_LEN_SHIFT 0
+#define MLX5_MPRQ_STRIDE_NUM_MASK 0x3fff0000
+#define MLX5_MPRQ_STRIDE_NUM_SHIFT 16
+#define MLX5_MPRQ_FILLER_MASK 0x80000000
+#define MLX5_MPRQ_FILLER_SHIFT 31
+
+#define MLX5_MPRQ_STRIDE_SHIFT_BYTE 2
+
+/* CQ element structure - should be equal to the cache line size */
+struct mlx5_cqe {
+#if (RTE_CACHE_LINE_SIZE == 128)
+	uint8_t padding[64];
+#endif
+	uint8_t pkt_info;
+	uint8_t rsvd0;
+	uint16_t wqe_id;
+	uint8_t lro_tcppsh_abort_dupack;
+	uint8_t lro_min_ttl;
+	uint16_t lro_tcp_win;
+	uint32_t lro_ack_seq_num;
+	uint32_t rx_hash_res;
+	uint8_t rx_hash_type;
+	uint8_t rsvd1[3];
+	uint16_t csum;
+	uint8_t rsvd2[6];
+	uint16_t hdr_type_etc;
+	uint16_t vlan_info;
+	uint8_t lro_num_seg;
+	uint8_t rsvd3[3];
+	uint32_t flow_table_metadata;
+	uint8_t rsvd4[4];
+	uint32_t byte_cnt;
+	uint64_t timestamp;
+	uint32_t sop_drop_qpn;
+	uint16_t wqe_counter;
+	uint8_t rsvd5;
+	uint8_t op_own;
+};
+
+/* Adding direct verbs to data-path. */
+
+/* CQ sequence number mask. */
+#define MLX5_CQ_SQN_MASK 0x3
+
+/* CQ sequence number index. */
+#define MLX5_CQ_SQN_OFFSET 28
+
+/* CQ doorbell index mask. */
+#define MLX5_CI_MASK 0xffffff
+
+/* CQ doorbell offset. */
+#define MLX5_CQ_ARM_DB 1
+
+/* CQ doorbell offset. */
+#define MLX5_CQ_DOORBELL 0x20
+
+/* CQE format value. */
+#define MLX5_COMPRESSED 0x3
+
+/* Action type of header modification. */
+enum {
+	MLX5_MODIFICATION_TYPE_SET = 0x1,
+	MLX5_MODIFICATION_TYPE_ADD = 0x2,
+	MLX5_MODIFICATION_TYPE_COPY = 0x3,
+};
+
+/* The field of packet to be modified. */
+enum mlx5_modification_field {
+	MLX5_MODI_OUT_NONE = -1,
+	MLX5_MODI_OUT_SMAC_47_16 = 1,
+	MLX5_MODI_OUT_SMAC_15_0,
+	MLX5_MODI_OUT_ETHERTYPE,
+	MLX5_MODI_OUT_DMAC_47_16,
+	MLX5_MODI_OUT_DMAC_15_0,
+	MLX5_MODI_OUT_IP_DSCP,
+	MLX5_MODI_OUT_TCP_FLAGS,
+	MLX5_MODI_OUT_TCP_SPORT,
+	MLX5_MODI_OUT_TCP_DPORT,
+	MLX5_MODI_OUT_IPV4_TTL,
+	MLX5_MODI_OUT_UDP_SPORT,
+	MLX5_MODI_OUT_UDP_DPORT,
+	MLX5_MODI_OUT_SIPV6_127_96,
+	MLX5_MODI_OUT_SIPV6_95_64,
+	MLX5_MODI_OUT_SIPV6_63_32,
+	MLX5_MODI_OUT_SIPV6_31_0,
+	MLX5_MODI_OUT_DIPV6_127_96,
+	MLX5_MODI_OUT_DIPV6_95_64,
+	MLX5_MODI_OUT_DIPV6_63_32,
+	MLX5_MODI_OUT_DIPV6_31_0,
+	MLX5_MODI_OUT_SIPV4,
+	MLX5_MODI_OUT_DIPV4,
+	MLX5_MODI_OUT_FIRST_VID,
+	MLX5_MODI_IN_SMAC_47_16 = 0x31,
+	MLX5_MODI_IN_SMAC_15_0,
+	MLX5_MODI_IN_ETHERTYPE,
+	MLX5_MODI_IN_DMAC_47_16,
+	MLX5_MODI_IN_DMAC_15_0,
+	MLX5_MODI_IN_IP_DSCP,
+	MLX5_MODI_IN_TCP_FLAGS,
+	MLX5_MODI_IN_TCP_SPORT,
+	MLX5_MODI_IN_TCP_DPORT,
+	MLX5_MODI_IN_IPV4_TTL,
+	MLX5_MODI_IN_UDP_SPORT,
+	MLX5_MODI_IN_UDP_DPORT,
+	MLX5_MODI_IN_SIPV6_127_96,
+	MLX5_MODI_IN_SIPV6_95_64,
+	MLX5_MODI_IN_SIPV6_63_32,
+	MLX5_MODI_IN_SIPV6_31_0,
+	MLX5_MODI_IN_DIPV6_127_96,
+	MLX5_MODI_IN_DIPV6_95_64,
+	MLX5_MODI_IN_DIPV6_63_32,
+	MLX5_MODI_IN_DIPV6_31_0,
+	MLX5_MODI_IN_SIPV4,
+	MLX5_MODI_IN_DIPV4,
+	MLX5_MODI_OUT_IPV6_HOPLIMIT,
+	MLX5_MODI_IN_IPV6_HOPLIMIT,
+	MLX5_MODI_META_DATA_REG_A,
+	MLX5_MODI_META_DATA_REG_B = 0x50,
+	MLX5_MODI_META_REG_C_0,
+	MLX5_MODI_META_REG_C_1,
+	MLX5_MODI_META_REG_C_2,
+	MLX5_MODI_META_REG_C_3,
+	MLX5_MODI_META_REG_C_4,
+	MLX5_MODI_META_REG_C_5,
+	MLX5_MODI_META_REG_C_6,
+	MLX5_MODI_META_REG_C_7,
+	MLX5_MODI_OUT_TCP_SEQ_NUM,
+	MLX5_MODI_IN_TCP_SEQ_NUM,
+	MLX5_MODI_OUT_TCP_ACK_NUM,
+	MLX5_MODI_IN_TCP_ACK_NUM = 0x5C,
+};
+
+/* Total number of metadata reg_c's. */
+#define MLX5_MREG_C_NUM (MLX5_MODI_META_REG_C_7 - MLX5_MODI_META_REG_C_0 + 1)
+
+enum modify_reg {
+	REG_NONE = 0,
+	REG_A,
+	REG_B,
+	REG_C_0,
+	REG_C_1,
+	REG_C_2,
+	REG_C_3,
+	REG_C_4,
+	REG_C_5,
+	REG_C_6,
+	REG_C_7,
+};
+
+/* Modification sub command. */
+struct mlx5_modification_cmd {
+	union {
+		uint32_t data0;
+		struct {
+			unsigned int length:5;
+			unsigned int rsvd0:3;
+			unsigned int offset:5;
+			unsigned int rsvd1:3;
+			unsigned int field:12;
+			unsigned int action_type:4;
+		};
+	};
+	union {
+		uint32_t data1;
+		uint8_t data[4];
+		struct {
+			unsigned int rsvd2:8;
+			unsigned int dst_offset:5;
+			unsigned int rsvd3:3;
+			unsigned int dst_field:12;
+			unsigned int rsvd4:4;
+		};
+	};
+};
+
+typedef uint32_t u32;
+typedef uint16_t u16;
+typedef uint8_t u8;
+
+#define __mlx5_nullp(typ) ((struct mlx5_ifc_##typ##_bits *)0)
+#define __mlx5_bit_sz(typ, fld) sizeof(__mlx5_nullp(typ)->fld)
+#define __mlx5_bit_off(typ, fld) ((unsigned int)(unsigned long) \
+				  (&(__mlx5_nullp(typ)->fld)))
+#define __mlx5_dw_bit_off(typ, fld) (32 - __mlx5_bit_sz(typ, fld) - \
+				    (__mlx5_bit_off(typ, fld) & 0x1f))
+#define __mlx5_dw_off(typ, fld) (__mlx5_bit_off(typ, fld) / 32)
+#define __mlx5_64_off(typ, fld) (__mlx5_bit_off(typ, fld) / 64)
+#define __mlx5_dw_mask(typ, fld) (__mlx5_mask(typ, fld) << \
+				  __mlx5_dw_bit_off(typ, fld))
+#define __mlx5_mask(typ, fld) ((u32)((1ull << __mlx5_bit_sz(typ, fld)) - 1))
+#define __mlx5_16_off(typ, fld) (__mlx5_bit_off(typ, fld) / 16)
+#define __mlx5_16_bit_off(typ, fld) (16 - __mlx5_bit_sz(typ, fld) - \
+				    (__mlx5_bit_off(typ, fld) & 0xf))
+#define __mlx5_mask16(typ, fld) ((u16)((1ull << __mlx5_bit_sz(typ, fld)) - 1))
+#define MLX5_ST_SZ_BYTES(typ) (sizeof(struct mlx5_ifc_##typ##_bits) / 8)
+#define MLX5_ST_SZ_DW(typ) (sizeof(struct mlx5_ifc_##typ##_bits) / 32)
+#define MLX5_BYTE_OFF(typ, fld) (__mlx5_bit_off(typ, fld) / 8)
+#define MLX5_ADDR_OF(typ, p, fld) ((char *)(p) + MLX5_BYTE_OFF(typ, fld))
+
+/* Insert a value into a struct. */
+#define MLX5_SET(typ, p, fld, v) \
+	do { \
+		u32 _v = v; \
+		*((__be32 *)(p) + __mlx5_dw_off(typ, fld)) = \
+		rte_cpu_to_be_32((rte_be_to_cpu_32(*((u32 *)(p) + \
+				  __mlx5_dw_off(typ, fld))) & \
+				  (~__mlx5_dw_mask(typ, fld))) | \
+				 (((_v) & __mlx5_mask(typ, fld)) << \
+				   __mlx5_dw_bit_off(typ, fld))); \
+	} while (0)
+
+#define MLX5_SET64(typ, p, fld, v) \
+	do { \
+		assert(__mlx5_bit_sz(typ, fld) == 64); \
+		*((__be64 *)(p) + __mlx5_64_off(typ, fld)) = \
+			rte_cpu_to_be_64(v); \
+	} while (0)
+
+#define MLX5_GET(typ, p, fld) \
+	((rte_be_to_cpu_32(*((__be32 *)(p) +\
+	__mlx5_dw_off(typ, fld))) >> __mlx5_dw_bit_off(typ, fld)) & \
+	__mlx5_mask(typ, fld))
+#define MLX5_GET16(typ, p, fld) \
+	((rte_be_to_cpu_16(*((__be16 *)(p) + \
+	  __mlx5_16_off(typ, fld))) >> __mlx5_16_bit_off(typ, fld)) & \
+	 __mlx5_mask16(typ, fld))
+#define MLX5_GET64(typ, p, fld) rte_be_to_cpu_64(*((__be64 *)(p) + \
+						   __mlx5_64_off(typ, fld)))
+#define MLX5_FLD_SZ_BYTES(typ, fld) (__mlx5_bit_sz(typ, fld) / 8)
+
+struct mlx5_ifc_fte_match_set_misc_bits {
+	u8 gre_c_present[0x1];
+	u8 reserved_at_1[0x1];
+	u8 gre_k_present[0x1];
+	u8 gre_s_present[0x1];
+	u8 source_vhci_port[0x4];
+	u8 source_sqn[0x18];
+	u8 reserved_at_20[0x10];
+	u8 source_port[0x10];
+	u8 outer_second_prio[0x3];
+	u8 outer_second_cfi[0x1];
+	u8 outer_second_vid[0xc];
+	u8 inner_second_prio[0x3];
+	u8 inner_second_cfi[0x1];
+	u8 inner_second_vid[0xc];
+	u8 outer_second_cvlan_tag[0x1];
+	u8 inner_second_cvlan_tag[0x1];
+	u8 outer_second_svlan_tag[0x1];
+	u8 inner_second_svlan_tag[0x1];
+	u8 reserved_at_64[0xc];
+	u8 gre_protocol[0x10];
+	u8 gre_key_h[0x18];
+	u8 gre_key_l[0x8];
+	u8 vxlan_vni[0x18];
+	u8 reserved_at_b8[0x8];
+	u8 geneve_vni[0x18];
+	u8 reserved_at_e4[0x7];
+	u8 geneve_oam[0x1];
+	u8 reserved_at_e0[0xc];
+	u8 outer_ipv6_flow_label[0x14];
+	u8 reserved_at_100[0xc];
+	u8 inner_ipv6_flow_label[0x14];
+	u8 reserved_at_120[0xa];
+	u8 geneve_opt_len[0x6];
+	u8 geneve_protocol_type[0x10];
+	u8 reserved_at_140[0xc0];
+};
+
+struct mlx5_ifc_ipv4_layout_bits {
+	u8 reserved_at_0[0x60];
+	u8 ipv4[0x20];
+};
+
+struct mlx5_ifc_ipv6_layout_bits {
+	u8 ipv6[16][0x8];
+};
+
+union mlx5_ifc_ipv6_layout_ipv4_layout_auto_bits {
+	struct mlx5_ifc_ipv6_layout_bits ipv6_layout;
+	struct mlx5_ifc_ipv4_layout_bits ipv4_layout;
+	u8 reserved_at_0[0x80];
+};
+
+struct mlx5_ifc_fte_match_set_lyr_2_4_bits {
+	u8 smac_47_16[0x20];
+	u8 smac_15_0[0x10];
+	u8 ethertype[0x10];
+	u8 dmac_47_16[0x20];
+	u8 dmac_15_0[0x10];
+	u8 first_prio[0x3];
+	u8 first_cfi[0x1];
+	u8 first_vid[0xc];
+	u8 ip_protocol[0x8];
+	u8 ip_dscp[0x6];
+	u8 ip_ecn[0x2];
+	u8 cvlan_tag[0x1];
+	u8 svlan_tag[0x1];
+	u8 frag[0x1];
+	u8 ip_version[0x4];
+	u8 tcp_flags[0x9];
+	u8 tcp_sport[0x10];
+	u8 tcp_dport[0x10];
+	u8 reserved_at_c0[0x20];
+	u8 udp_sport[0x10];
+	u8 udp_dport[0x10];
+	union mlx5_ifc_ipv6_layout_ipv4_layout_auto_bits src_ipv4_src_ipv6;
+	union mlx5_ifc_ipv6_layout_ipv4_layout_auto_bits dst_ipv4_dst_ipv6;
+};
+
+struct mlx5_ifc_fte_match_mpls_bits {
+	u8 mpls_label[0x14];
+	u8 mpls_exp[0x3];
+	u8 mpls_s_bos[0x1];
+	u8 mpls_ttl[0x8];
+};
+
+struct mlx5_ifc_fte_match_set_misc2_bits {
+	struct mlx5_ifc_fte_match_mpls_bits outer_first_mpls;
+	struct mlx5_ifc_fte_match_mpls_bits inner_first_mpls;
+	struct mlx5_ifc_fte_match_mpls_bits outer_first_mpls_over_gre;
+	struct mlx5_ifc_fte_match_mpls_bits outer_first_mpls_over_udp;
+	u8 metadata_reg_c_7[0x20];
+	u8 metadata_reg_c_6[0x20];
+	u8 metadata_reg_c_5[0x20];
+	u8 metadata_reg_c_4[0x20];
+	u8 metadata_reg_c_3[0x20];
+	u8 metadata_reg_c_2[0x20];
+	u8 metadata_reg_c_1[0x20];
+	u8 metadata_reg_c_0[0x20];
+	u8 metadata_reg_a[0x20];
+	u8 metadata_reg_b[0x20];
+	u8 reserved_at_1c0[0x40];
+};
+
+struct mlx5_ifc_fte_match_set_misc3_bits {
+	u8 inner_tcp_seq_num[0x20];
+	u8 outer_tcp_seq_num[0x20];
+	u8 inner_tcp_ack_num[0x20];
+	u8 outer_tcp_ack_num[0x20];
+	u8 reserved_at_auto1[0x8];
+	u8 outer_vxlan_gpe_vni[0x18];
+	u8 outer_vxlan_gpe_next_protocol[0x8];
+	u8 outer_vxlan_gpe_flags[0x8];
+	u8 reserved_at_a8[0x10];
+	u8 icmp_header_data[0x20];
+	u8 icmpv6_header_data[0x20];
+	u8 icmp_type[0x8];
+	u8 icmp_code[0x8];
+	u8 icmpv6_type[0x8];
+	u8 icmpv6_code[0x8];
+	u8 reserved_at_120[0x20];
+	u8 gtpu_teid[0x20];
+	u8 gtpu_msg_type[0x08];
+	u8 gtpu_msg_flags[0x08];
+	u8 reserved_at_170[0x90];
+};
+
+/* Flow matcher. */
+struct mlx5_ifc_fte_match_param_bits {
+	struct mlx5_ifc_fte_match_set_lyr_2_4_bits outer_headers;
+	struct mlx5_ifc_fte_match_set_misc_bits misc_parameters;
+	struct mlx5_ifc_fte_match_set_lyr_2_4_bits inner_headers;
+	struct mlx5_ifc_fte_match_set_misc2_bits misc_parameters_2;
+	struct mlx5_ifc_fte_match_set_misc3_bits misc_parameters_3;
+};
+
+enum {
+	MLX5_MATCH_CRITERIA_ENABLE_OUTER_BIT,
+	MLX5_MATCH_CRITERIA_ENABLE_MISC_BIT,
+	MLX5_MATCH_CRITERIA_ENABLE_INNER_BIT,
+	MLX5_MATCH_CRITERIA_ENABLE_MISC2_BIT,
+	MLX5_MATCH_CRITERIA_ENABLE_MISC3_BIT
+};
+
+enum {
+	MLX5_CMD_OP_QUERY_HCA_CAP = 0x100,
+	MLX5_CMD_OP_CREATE_MKEY = 0x200,
+	MLX5_CMD_OP_QUERY_NIC_VPORT_CONTEXT = 0x754,
+	MLX5_CMD_OP_ALLOC_TRANSPORT_DOMAIN = 0x816,
+	MLX5_CMD_OP_CREATE_TIR = 0x900,
+	MLX5_CMD_OP_CREATE_SQ = 0x904,
+	MLX5_CMD_OP_MODIFY_SQ = 0x905,
+	MLX5_CMD_OP_CREATE_RQ = 0x908,
+	MLX5_CMD_OP_MODIFY_RQ = 0x909,
+	MLX5_CMD_OP_CREATE_TIS = 0x912,
+	MLX5_CMD_OP_QUERY_TIS = 0x915,
+	MLX5_CMD_OP_CREATE_RQT = 0x916,
+	MLX5_CMD_OP_ALLOC_FLOW_COUNTER = 0x939,
+	MLX5_CMD_OP_QUERY_FLOW_COUNTER = 0x93b,
+};
+
+enum {
+	MLX5_MKC_ACCESS_MODE_MTT   = 0x1,
+};
+
+/* Flow counters. */
+struct mlx5_ifc_alloc_flow_counter_out_bits {
+	u8         status[0x8];
+	u8         reserved_at_8[0x18];
+	u8         syndrome[0x20];
+	u8         flow_counter_id[0x20];
+	u8         reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_alloc_flow_counter_in_bits {
+	u8         opcode[0x10];
+	u8         reserved_at_10[0x10];
+	u8         reserved_at_20[0x10];
+	u8         op_mod[0x10];
+	u8         flow_counter_id[0x20];
+	u8         reserved_at_40[0x18];
+	u8         flow_counter_bulk[0x8];
+};
+
+struct mlx5_ifc_dealloc_flow_counter_out_bits {
+	u8         status[0x8];
+	u8         reserved_at_8[0x18];
+	u8         syndrome[0x20];
+	u8         reserved_at_40[0x40];
+};
+
+struct mlx5_ifc_dealloc_flow_counter_in_bits {
+	u8         opcode[0x10];
+	u8         reserved_at_10[0x10];
+	u8         reserved_at_20[0x10];
+	u8         op_mod[0x10];
+	u8         flow_counter_id[0x20];
+	u8         reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_traffic_counter_bits {
+	u8         packets[0x40];
+	u8         octets[0x40];
+};
+
+struct mlx5_ifc_query_flow_counter_out_bits {
+	u8         status[0x8];
+	u8         reserved_at_8[0x18];
+	u8         syndrome[0x20];
+	u8         reserved_at_40[0x40];
+	struct mlx5_ifc_traffic_counter_bits flow_statistics[];
+};
+
+struct mlx5_ifc_query_flow_counter_in_bits {
+	u8         opcode[0x10];
+	u8         reserved_at_10[0x10];
+	u8         reserved_at_20[0x10];
+	u8         op_mod[0x10];
+	u8         reserved_at_40[0x20];
+	u8         mkey[0x20];
+	u8         address[0x40];
+	u8         clear[0x1];
+	u8         dump_to_memory[0x1];
+	u8         num_of_counters[0x1e];
+	u8         flow_counter_id[0x20];
+};
+
+struct mlx5_ifc_mkc_bits {
+	u8         reserved_at_0[0x1];
+	u8         free[0x1];
+	u8         reserved_at_2[0x1];
+	u8         access_mode_4_2[0x3];
+	u8         reserved_at_6[0x7];
+	u8         relaxed_ordering_write[0x1];
+	u8         reserved_at_e[0x1];
+	u8         small_fence_on_rdma_read_response[0x1];
+	u8         umr_en[0x1];
+	u8         a[0x1];
+	u8         rw[0x1];
+	u8         rr[0x1];
+	u8         lw[0x1];
+	u8         lr[0x1];
+	u8         access_mode_1_0[0x2];
+	u8         reserved_at_18[0x8];
+
+	u8         qpn[0x18];
+	u8         mkey_7_0[0x8];
+
+	u8         reserved_at_40[0x20];
+
+	u8         length64[0x1];
+	u8         bsf_en[0x1];
+	u8         sync_umr[0x1];
+	u8         reserved_at_63[0x2];
+	u8         expected_sigerr_count[0x1];
+	u8         reserved_at_66[0x1];
+	u8         en_rinval[0x1];
+	u8         pd[0x18];
+
+	u8         start_addr[0x40];
+
+	u8         len[0x40];
+
+	u8         bsf_octword_size[0x20];
+
+	u8         reserved_at_120[0x80];
+
+	u8         translations_octword_size[0x20];
+
+	u8         reserved_at_1c0[0x1b];
+	u8         log_page_size[0x5];
+
+	u8         reserved_at_1e0[0x20];
+};
+
+struct mlx5_ifc_create_mkey_out_bits {
+	u8         status[0x8];
+	u8         reserved_at_8[0x18];
+
+	u8         syndrome[0x20];
+
+	u8         reserved_at_40[0x8];
+	u8         mkey_index[0x18];
+
+	u8         reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_create_mkey_in_bits {
+	u8         opcode[0x10];
+	u8         reserved_at_10[0x10];
+
+	u8         reserved_at_20[0x10];
+	u8         op_mod[0x10];
+
+	u8         reserved_at_40[0x20];
+
+	u8         pg_access[0x1];
+	u8         reserved_at_61[0x1f];
+
+	struct mlx5_ifc_mkc_bits memory_key_mkey_entry;
+
+	u8         reserved_at_280[0x80];
+
+	u8         translations_octword_actual_size[0x20];
+
+	u8         mkey_umem_id[0x20];
+
+	u8         mkey_umem_offset[0x40];
+
+	u8         reserved_at_380[0x500];
+
+	u8         klm_pas_mtt[][0x20];
+};
+
+enum {
+	MLX5_GET_HCA_CAP_OP_MOD_GENERAL_DEVICE = 0x0 << 1,
+	MLX5_GET_HCA_CAP_OP_MOD_ETHERNET_OFFLOAD_CAPS = 0x1 << 1,
+	MLX5_GET_HCA_CAP_OP_MOD_QOS_CAP = 0xc << 1,
+};
+
+enum {
+	MLX5_HCA_CAP_OPMOD_GET_MAX   = 0,
+	MLX5_HCA_CAP_OPMOD_GET_CUR   = 1,
+};
+
+enum {
+	MLX5_CAP_INLINE_MODE_L2,
+	MLX5_CAP_INLINE_MODE_VPORT_CONTEXT,
+	MLX5_CAP_INLINE_MODE_NOT_REQUIRED,
+};
+
+enum {
+	MLX5_INLINE_MODE_NONE,
+	MLX5_INLINE_MODE_L2,
+	MLX5_INLINE_MODE_IP,
+	MLX5_INLINE_MODE_TCP_UDP,
+	MLX5_INLINE_MODE_RESERVED4,
+	MLX5_INLINE_MODE_INNER_L2,
+	MLX5_INLINE_MODE_INNER_IP,
+	MLX5_INLINE_MODE_INNER_TCP_UDP,
+};
+
+/* HCA bit masks indicating which Flex parser protocols are already enabled. */
+#define MLX5_HCA_FLEX_IPV4_OVER_VXLAN_ENABLED (1UL << 0)
+#define MLX5_HCA_FLEX_IPV6_OVER_VXLAN_ENABLED (1UL << 1)
+#define MLX5_HCA_FLEX_IPV6_OVER_IP_ENABLED (1UL << 2)
+#define MLX5_HCA_FLEX_GENEVE_ENABLED (1UL << 3)
+#define MLX5_HCA_FLEX_CW_MPLS_OVER_GRE_ENABLED (1UL << 4)
+#define MLX5_HCA_FLEX_CW_MPLS_OVER_UDP_ENABLED (1UL << 5)
+#define MLX5_HCA_FLEX_P_BIT_VXLAN_GPE_ENABLED (1UL << 6)
+#define MLX5_HCA_FLEX_VXLAN_GPE_ENABLED (1UL << 7)
+#define MLX5_HCA_FLEX_ICMP_ENABLED (1UL << 8)
+#define MLX5_HCA_FLEX_ICMPV6_ENABLED (1UL << 9)
+
+struct mlx5_ifc_cmd_hca_cap_bits {
+	u8 reserved_at_0[0x30];
+	u8 vhca_id[0x10];
+	u8 reserved_at_40[0x40];
+	u8 log_max_srq_sz[0x8];
+	u8 log_max_qp_sz[0x8];
+	u8 reserved_at_90[0xb];
+	u8 log_max_qp[0x5];
+	u8 reserved_at_a0[0xb];
+	u8 log_max_srq[0x5];
+	u8 reserved_at_b0[0x10];
+	u8 reserved_at_c0[0x8];
+	u8 log_max_cq_sz[0x8];
+	u8 reserved_at_d0[0xb];
+	u8 log_max_cq[0x5];
+	u8 log_max_eq_sz[0x8];
+	u8 reserved_at_e8[0x2];
+	u8 log_max_mkey[0x6];
+	u8 reserved_at_f0[0x8];
+	u8 dump_fill_mkey[0x1];
+	u8 reserved_at_f9[0x3];
+	u8 log_max_eq[0x4];
+	u8 max_indirection[0x8];
+	u8 fixed_buffer_size[0x1];
+	u8 log_max_mrw_sz[0x7];
+	u8 force_teardown[0x1];
+	u8 reserved_at_111[0x1];
+	u8 log_max_bsf_list_size[0x6];
+	u8 umr_extended_translation_offset[0x1];
+	u8 null_mkey[0x1];
+	u8 log_max_klm_list_size[0x6];
+	u8 reserved_at_120[0xa];
+	u8 log_max_ra_req_dc[0x6];
+	u8 reserved_at_130[0xa];
+	u8 log_max_ra_res_dc[0x6];
+	u8 reserved_at_140[0xa];
+	u8 log_max_ra_req_qp[0x6];
+	u8 reserved_at_150[0xa];
+	u8 log_max_ra_res_qp[0x6];
+	u8 end_pad[0x1];
+	u8 cc_query_allowed[0x1];
+	u8 cc_modify_allowed[0x1];
+	u8 start_pad[0x1];
+	u8 cache_line_128byte[0x1];
+	u8 reserved_at_165[0xa];
+	u8 qcam_reg[0x1];
+	u8 gid_table_size[0x10];
+	u8 out_of_seq_cnt[0x1];
+	u8 vport_counters[0x1];
+	u8 retransmission_q_counters[0x1];
+	u8 debug[0x1];
+	u8 modify_rq_counter_set_id[0x1];
+	u8 rq_delay_drop[0x1];
+	u8 max_qp_cnt[0xa];
+	u8 pkey_table_size[0x10];
+	u8 vport_group_manager[0x1];
+	u8 vhca_group_manager[0x1];
+	u8 ib_virt[0x1];
+	u8 eth_virt[0x1];
+	u8 vnic_env_queue_counters[0x1];
+	u8 ets[0x1];
+	u8 nic_flow_table[0x1];
+	u8 eswitch_manager[0x1];
+	u8 device_memory[0x1];
+	u8 mcam_reg[0x1];
+	u8 pcam_reg[0x1];
+	u8 local_ca_ack_delay[0x5];
+	u8 port_module_event[0x1];
+	u8 enhanced_error_q_counters[0x1];
+	u8 ports_check[0x1];
+	u8 reserved_at_1b3[0x1];
+	u8 disable_link_up[0x1];
+	u8 beacon_led[0x1];
+	u8 port_type[0x2];
+	u8 num_ports[0x8];
+	u8 reserved_at_1c0[0x1];
+	u8 pps[0x1];
+	u8 pps_modify[0x1];
+	u8 log_max_msg[0x5];
+	u8 reserved_at_1c8[0x4];
+	u8 max_tc[0x4];
+	u8 temp_warn_event[0x1];
+	u8 dcbx[0x1];
+	u8 general_notification_event[0x1];
+	u8 reserved_at_1d3[0x2];
+	u8 fpga[0x1];
+	u8 rol_s[0x1];
+	u8 rol_g[0x1];
+	u8 reserved_at_1d8[0x1];
+	u8 wol_s[0x1];
+	u8 wol_g[0x1];
+	u8 wol_a[0x1];
+	u8 wol_b[0x1];
+	u8 wol_m[0x1];
+	u8 wol_u[0x1];
+	u8 wol_p[0x1];
+	u8 stat_rate_support[0x10];
+	u8 reserved_at_1f0[0xc];
+	u8 cqe_version[0x4];
+	u8 compact_address_vector[0x1];
+	u8 striding_rq[0x1];
+	u8 reserved_at_202[0x1];
+	u8 ipoib_enhanced_offloads[0x1];
+	u8 ipoib_basic_offloads[0x1];
+	u8 reserved_at_205[0x1];
+	u8 repeated_block_disabled[0x1];
+	u8 umr_modify_entity_size_disabled[0x1];
+	u8 umr_modify_atomic_disabled[0x1];
+	u8 umr_indirect_mkey_disabled[0x1];
+	u8 umr_fence[0x2];
+	u8 reserved_at_20c[0x3];
+	u8 drain_sigerr[0x1];
+	u8 cmdif_checksum[0x2];
+	u8 sigerr_cqe[0x1];
+	u8 reserved_at_213[0x1];
+	u8 wq_signature[0x1];
+	u8 sctr_data_cqe[0x1];
+	u8 reserved_at_216[0x1];
+	u8 sho[0x1];
+	u8 tph[0x1];
+	u8 rf[0x1];
+	u8 dct[0x1];
+	u8 qos[0x1];
+	u8 eth_net_offloads[0x1];
+	u8 roce[0x1];
+	u8 atomic[0x1];
+	u8 reserved_at_21f[0x1];
+	u8 cq_oi[0x1];
+	u8 cq_resize[0x1];
+	u8 cq_moderation[0x1];
+	u8 reserved_at_223[0x3];
+	u8 cq_eq_remap[0x1];
+	u8 pg[0x1];
+	u8 block_lb_mc[0x1];
+	u8 reserved_at_229[0x1];
+	u8 scqe_break_moderation[0x1];
+	u8 cq_period_start_from_cqe[0x1];
+	u8 cd[0x1];
+	u8 reserved_at_22d[0x1];
+	u8 apm[0x1];
+	u8 vector_calc[0x1];
+	u8 umr_ptr_rlky[0x1];
+	u8 imaicl[0x1];
+	u8 reserved_at_232[0x4];
+	u8 qkv[0x1];
+	u8 pkv[0x1];
+	u8 set_deth_sqpn[0x1];
+	u8 reserved_at_239[0x3];
+	u8 xrc[0x1];
+	u8 ud[0x1];
+	u8 uc[0x1];
+	u8 rc[0x1];
+	u8 uar_4k[0x1];
+	u8 reserved_at_241[0x9];
+	u8 uar_sz[0x6];
+	u8 reserved_at_250[0x8];
+	u8 log_pg_sz[0x8];
+	u8 bf[0x1];
+	u8 driver_version[0x1];
+	u8 pad_tx_eth_packet[0x1];
+	u8 reserved_at_263[0x8];
+	u8 log_bf_reg_size[0x5];
+	u8 reserved_at_270[0xb];
+	u8 lag_master[0x1];
+	u8 num_lag_ports[0x4];
+	u8 reserved_at_280[0x10];
+	u8 max_wqe_sz_sq[0x10];
+	u8 reserved_at_2a0[0x10];
+	u8 max_wqe_sz_rq[0x10];
+	u8 max_flow_counter_31_16[0x10];
+	u8 max_wqe_sz_sq_dc[0x10];
+	u8 reserved_at_2e0[0x7];
+	u8 max_qp_mcg[0x19];
+	u8 reserved_at_300[0x10];
+	u8 flow_counter_bulk_alloc[0x08];
+	u8 log_max_mcg[0x8];
+	u8 reserved_at_320[0x3];
+	u8 log_max_transport_domain[0x5];
+	u8 reserved_at_328[0x3];
+	u8 log_max_pd[0x5];
+	u8 reserved_at_330[0xb];
+	u8 log_max_xrcd[0x5];
+	u8 nic_receive_steering_discard[0x1];
+	u8 receive_discard_vport_down[0x1];
+	u8 transmit_discard_vport_down[0x1];
+	u8 reserved_at_343[0x5];
+	u8 log_max_flow_counter_bulk[0x8];
+	u8 max_flow_counter_15_0[0x10];
+	u8 modify_tis[0x1];
+	u8 flow_counters_dump[0x1];
+	u8 reserved_at_360[0x1];
+	u8 log_max_rq[0x5];
+	u8 reserved_at_368[0x3];
+	u8 log_max_sq[0x5];
+	u8 reserved_at_370[0x3];
+	u8 log_max_tir[0x5];
+	u8 reserved_at_378[0x3];
+	u8 log_max_tis[0x5];
+	u8 basic_cyclic_rcv_wqe[0x1];
+	u8 reserved_at_381[0x2];
+	u8 log_max_rmp[0x5];
+	u8 reserved_at_388[0x3];
+	u8 log_max_rqt[0x5];
+	u8 reserved_at_390[0x3];
+	u8 log_max_rqt_size[0x5];
+	u8 reserved_at_398[0x3];
+	u8 log_max_tis_per_sq[0x5];
+	u8 ext_stride_num_range[0x1];
+	u8 reserved_at_3a1[0x2];
+	u8 log_max_stride_sz_rq[0x5];
+	u8 reserved_at_3a8[0x3];
+	u8 log_min_stride_sz_rq[0x5];
+	u8 reserved_at_3b0[0x3];
+	u8 log_max_stride_sz_sq[0x5];
+	u8 reserved_at_3b8[0x3];
+	u8 log_min_stride_sz_sq[0x5];
+	u8 hairpin[0x1];
+	u8 reserved_at_3c1[0x2];
+	u8 log_max_hairpin_queues[0x5];
+	u8 reserved_at_3c8[0x3];
+	u8 log_max_hairpin_wq_data_sz[0x5];
+	u8 reserved_at_3d0[0x3];
+	u8 log_max_hairpin_num_packets[0x5];
+	u8 reserved_at_3d8[0x3];
+	u8 log_max_wq_sz[0x5];
+	u8 nic_vport_change_event[0x1];
+	u8 disable_local_lb_uc[0x1];
+	u8 disable_local_lb_mc[0x1];
+	u8 log_min_hairpin_wq_data_sz[0x5];
+	u8 reserved_at_3e8[0x3];
+	u8 log_max_vlan_list[0x5];
+	u8 reserved_at_3f0[0x3];
+	u8 log_max_current_mc_list[0x5];
+	u8 reserved_at_3f8[0x3];
+	u8 log_max_current_uc_list[0x5];
+	u8 general_obj_types[0x40];
+	u8 reserved_at_440[0x20];
+	u8 reserved_at_460[0x10];
+	u8 max_num_eqs[0x10];
+	u8 reserved_at_480[0x3];
+	u8 log_max_l2_table[0x5];
+	u8 reserved_at_488[0x8];
+	u8 log_uar_page_sz[0x10];
+	u8 reserved_at_4a0[0x20];
+	u8 device_frequency_mhz[0x20];
+	u8 device_frequency_khz[0x20];
+	u8 reserved_at_500[0x20];
+	u8 num_of_uars_per_page[0x20];
+	u8 flex_parser_protocols[0x20];
+	u8 reserved_at_560[0x20];
+	u8 reserved_at_580[0x3c];
+	u8 mini_cqe_resp_stride_index[0x1];
+	u8 cqe_128_always[0x1];
+	u8 cqe_compression_128[0x1];
+	u8 cqe_compression[0x1];
+	u8 cqe_compression_timeout[0x10];
+	u8 cqe_compression_max_num[0x10];
+	u8 reserved_at_5e0[0x10];
+	u8 tag_matching[0x1];
+	u8 rndv_offload_rc[0x1];
+	u8 rndv_offload_dc[0x1];
+	u8 log_tag_matching_list_sz[0x5];
+	u8 reserved_at_5f8[0x3];
+	u8 log_max_xrq[0x5];
+	u8 affiliate_nic_vport_criteria[0x8];
+	u8 native_port_num[0x8];
+	u8 num_vhca_ports[0x8];
+	u8 reserved_at_618[0x6];
+	u8 sw_owner_id[0x1];
+	u8 reserved_at_61f[0x1e1];
+};
+
+struct mlx5_ifc_qos_cap_bits {
+	u8 packet_pacing[0x1];
+	u8 esw_scheduling[0x1];
+	u8 esw_bw_share[0x1];
+	u8 esw_rate_limit[0x1];
+	u8 reserved_at_4[0x1];
+	u8 packet_pacing_burst_bound[0x1];
+	u8 packet_pacing_typical_size[0x1];
+	u8 flow_meter_srtcm[0x1];
+	u8 reserved_at_8[0x8];
+	u8 log_max_flow_meter[0x8];
+	u8 flow_meter_reg_id[0x8];
+	u8 reserved_at_25[0x8];
+	u8 flow_meter_reg_share[0x1];
+	u8 reserved_at_2e[0x17];
+	u8 packet_pacing_max_rate[0x20];
+	u8 packet_pacing_min_rate[0x20];
+	u8 reserved_at_80[0x10];
+	u8 packet_pacing_rate_table_size[0x10];
+	u8 esw_element_type[0x10];
+	u8 esw_tsar_type[0x10];
+	u8 reserved_at_c0[0x10];
+	u8 max_qos_para_vport[0x10];
+	u8 max_tsar_bw_share[0x20];
+	u8 reserved_at_100[0x6e8];
+};
+
+struct mlx5_ifc_per_protocol_networking_offload_caps_bits {
+	u8 csum_cap[0x1];
+	u8 vlan_cap[0x1];
+	u8 lro_cap[0x1];
+	u8 lro_psh_flag[0x1];
+	u8 lro_time_stamp[0x1];
+	u8 lro_max_msg_sz_mode[0x2];
+	u8 wqe_vlan_insert[0x1];
+	u8 self_lb_en_modifiable[0x1];
+	u8 self_lb_mc[0x1];
+	u8 self_lb_uc[0x1];
+	u8 max_lso_cap[0x5];
+	u8 multi_pkt_send_wqe[0x2];
+	u8 wqe_inline_mode[0x2];
+	u8 rss_ind_tbl_cap[0x4];
+	u8 reg_umr_sq[0x1];
+	u8 scatter_fcs[0x1];
+	u8 enhanced_multi_pkt_send_wqe[0x1];
+	u8 tunnel_lso_const_out_ip_id[0x1];
+	u8 tunnel_lro_gre[0x1];
+	u8 tunnel_lro_vxlan[0x1];
+	u8 tunnel_stateless_gre[0x1];
+	u8 tunnel_stateless_vxlan[0x1];
+	u8 swp[0x1];
+	u8 swp_csum[0x1];
+	u8 swp_lso[0x1];
+	u8 reserved_at_23[0x8];
+	u8 tunnel_stateless_gtp[0x1];
+	u8 reserved_at_25[0x4];
+	u8 max_vxlan_udp_ports[0x8];
+	u8 reserved_at_38[0x6];
+	u8 max_geneve_opt_len[0x1];
+	u8 tunnel_stateless_geneve_rx[0x1];
+	u8 reserved_at_40[0x10];
+	u8 lro_min_mss_size[0x10];
+	u8 reserved_at_60[0x120];
+	u8 lro_timer_supported_periods[4][0x20];
+	u8 reserved_at_200[0x600];
+};
+
+union mlx5_ifc_hca_cap_union_bits {
+	struct mlx5_ifc_cmd_hca_cap_bits cmd_hca_cap;
+	struct mlx5_ifc_per_protocol_networking_offload_caps_bits
+	       per_protocol_networking_offload_caps;
+	struct mlx5_ifc_qos_cap_bits qos_cap;
+	u8 reserved_at_0[0x8000];
+};
+
+struct mlx5_ifc_query_hca_cap_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x40];
+	union mlx5_ifc_hca_cap_union_bits capability;
+};
+
+struct mlx5_ifc_query_hca_cap_in_bits {
+	u8 opcode[0x10];
+	u8 reserved_at_10[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0x40];
+};
+
+struct mlx5_ifc_mac_address_layout_bits {
+	u8 reserved_at_0[0x10];
+	u8 mac_addr_47_32[0x10];
+	u8 mac_addr_31_0[0x20];
+};
+
+struct mlx5_ifc_nic_vport_context_bits {
+	u8 reserved_at_0[0x5];
+	u8 min_wqe_inline_mode[0x3];
+	u8 reserved_at_8[0x15];
+	u8 disable_mc_local_lb[0x1];
+	u8 disable_uc_local_lb[0x1];
+	u8 roce_en[0x1];
+	u8 arm_change_event[0x1];
+	u8 reserved_at_21[0x1a];
+	u8 event_on_mtu[0x1];
+	u8 event_on_promisc_change[0x1];
+	u8 event_on_vlan_change[0x1];
+	u8 event_on_mc_address_change[0x1];
+	u8 event_on_uc_address_change[0x1];
+	u8 reserved_at_40[0xc];
+	u8 affiliation_criteria[0x4];
+	u8 affiliated_vhca_id[0x10];
+	u8 reserved_at_60[0xd0];
+	u8 mtu[0x10];
+	u8 system_image_guid[0x40];
+	u8 port_guid[0x40];
+	u8 node_guid[0x40];
+	u8 reserved_at_200[0x140];
+	u8 qkey_violation_counter[0x10];
+	u8 reserved_at_350[0x430];
+	u8 promisc_uc[0x1];
+	u8 promisc_mc[0x1];
+	u8 promisc_all[0x1];
+	u8 reserved_at_783[0x2];
+	u8 allowed_list_type[0x3];
+	u8 reserved_at_788[0xc];
+	u8 allowed_list_size[0xc];
+	struct mlx5_ifc_mac_address_layout_bits permanent_address;
+	u8 reserved_at_7e0[0x20];
+};
+
+struct mlx5_ifc_query_nic_vport_context_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x40];
+	struct mlx5_ifc_nic_vport_context_bits nic_vport_context;
+};
+
+struct mlx5_ifc_query_nic_vport_context_in_bits {
+	u8 opcode[0x10];
+	u8 reserved_at_10[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 other_vport[0x1];
+	u8 reserved_at_41[0xf];
+	u8 vport_number[0x10];
+	u8 reserved_at_60[0x5];
+	u8 allowed_list_type[0x3];
+	u8 reserved_at_68[0x18];
+};
+
+struct mlx5_ifc_tisc_bits {
+	u8 strict_lag_tx_port_affinity[0x1];
+	u8 reserved_at_1[0x3];
+	u8 lag_tx_port_affinity[0x04];
+	u8 reserved_at_8[0x4];
+	u8 prio[0x4];
+	u8 reserved_at_10[0x10];
+	u8 reserved_at_20[0x100];
+	u8 reserved_at_120[0x8];
+	u8 transport_domain[0x18];
+	u8 reserved_at_140[0x8];
+	u8 underlay_qpn[0x18];
+	u8 reserved_at_160[0x3a0];
+};
+
+struct mlx5_ifc_query_tis_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x40];
+	struct mlx5_ifc_tisc_bits tis_context;
+};
+
+struct mlx5_ifc_query_tis_in_bits {
+	u8 opcode[0x10];
+	u8 reserved_at_10[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0x8];
+	u8 tisn[0x18];
+	u8 reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_alloc_transport_domain_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x8];
+	u8 transport_domain[0x18];
+	u8 reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_alloc_transport_domain_in_bits {
+	u8 opcode[0x10];
+	u8 reserved_at_10[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0x40];
+};
+
+enum {
+	MLX5_WQ_TYPE_LINKED_LIST                = 0x0,
+	MLX5_WQ_TYPE_CYCLIC                     = 0x1,
+	MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ    = 0x2,
+	MLX5_WQ_TYPE_CYCLIC_STRIDING_RQ         = 0x3,
+};
+
+enum {
+	MLX5_WQ_END_PAD_MODE_NONE  = 0x0,
+	MLX5_WQ_END_PAD_MODE_ALIGN = 0x1,
+};
+
+struct mlx5_ifc_wq_bits {
+	u8 wq_type[0x4];
+	u8 wq_signature[0x1];
+	u8 end_padding_mode[0x2];
+	u8 cd_slave[0x1];
+	u8 reserved_at_8[0x18];
+	u8 hds_skip_first_sge[0x1];
+	u8 log2_hds_buf_size[0x3];
+	u8 reserved_at_24[0x7];
+	u8 page_offset[0x5];
+	u8 lwm[0x10];
+	u8 reserved_at_40[0x8];
+	u8 pd[0x18];
+	u8 reserved_at_60[0x8];
+	u8 uar_page[0x18];
+	u8 dbr_addr[0x40];
+	u8 hw_counter[0x20];
+	u8 sw_counter[0x20];
+	u8 reserved_at_100[0xc];
+	u8 log_wq_stride[0x4];
+	u8 reserved_at_110[0x3];
+	u8 log_wq_pg_sz[0x5];
+	u8 reserved_at_118[0x3];
+	u8 log_wq_sz[0x5];
+	u8 dbr_umem_valid[0x1];
+	u8 wq_umem_valid[0x1];
+	u8 reserved_at_122[0x1];
+	u8 log_hairpin_num_packets[0x5];
+	u8 reserved_at_128[0x3];
+	u8 log_hairpin_data_sz[0x5];
+	u8 reserved_at_130[0x4];
+	u8 single_wqe_log_num_of_strides[0x4];
+	u8 two_byte_shift_en[0x1];
+	u8 reserved_at_139[0x4];
+	u8 single_stride_log_num_of_bytes[0x3];
+	u8 dbr_umem_id[0x20];
+	u8 wq_umem_id[0x20];
+	u8 wq_umem_offset[0x40];
+	u8 reserved_at_1c0[0x440];
+};
+
+enum {
+	MLX5_RQC_MEM_RQ_TYPE_MEMORY_RQ_INLINE  = 0x0,
+	MLX5_RQC_MEM_RQ_TYPE_MEMORY_RQ_RMP     = 0x1,
+};
+
+enum {
+	MLX5_RQC_STATE_RST  = 0x0,
+	MLX5_RQC_STATE_RDY  = 0x1,
+	MLX5_RQC_STATE_ERR  = 0x3,
+};
+
+struct mlx5_ifc_rqc_bits {
+	u8 rlky[0x1];
+	u8 delay_drop_en[0x1];
+	u8 scatter_fcs[0x1];
+	u8 vsd[0x1];
+	u8 mem_rq_type[0x4];
+	u8 state[0x4];
+	u8 reserved_at_c[0x1];
+	u8 flush_in_error_en[0x1];
+	u8 hairpin[0x1];
+	u8 reserved_at_f[0x11];
+	u8 reserved_at_20[0x8];
+	u8 user_index[0x18];
+	u8 reserved_at_40[0x8];
+	u8 cqn[0x18];
+	u8 counter_set_id[0x8];
+	u8 reserved_at_68[0x18];
+	u8 reserved_at_80[0x8];
+	u8 rmpn[0x18];
+	u8 reserved_at_a0[0x8];
+	u8 hairpin_peer_sq[0x18];
+	u8 reserved_at_c0[0x10];
+	u8 hairpin_peer_vhca[0x10];
+	u8 reserved_at_e0[0xa0];
+	struct mlx5_ifc_wq_bits wq; /* Not used in LRO RQ. */
+};
+
+struct mlx5_ifc_create_rq_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x8];
+	u8 rqn[0x18];
+	u8 reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_create_rq_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0xc0];
+	struct mlx5_ifc_rqc_bits ctx;
+};
+
+struct mlx5_ifc_modify_rq_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x40];
+};
+
+struct mlx5_ifc_create_tis_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x8];
+	u8 tisn[0x18];
+	u8 reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_create_tis_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0xc0];
+	struct mlx5_ifc_tisc_bits ctx;
+};
+
+enum {
+	MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_WQ_LWM = 1ULL << 0,
+	MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_VSD = 1ULL << 1,
+	MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_SCATTER_FCS = 1ULL << 2,
+	MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_RQ_COUNTER_SET_ID = 1ULL << 3,
+};
+
+struct mlx5_ifc_modify_rq_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 rq_state[0x4];
+	u8 reserved_at_44[0x4];
+	u8 rqn[0x18];
+	u8 reserved_at_60[0x20];
+	u8 modify_bitmask[0x40];
+	u8 reserved_at_c0[0x40];
+	struct mlx5_ifc_rqc_bits ctx;
+};
+
+enum {
+	MLX5_RX_HASH_FIELD_SELECT_SELECTED_FIELDS_SRC_IP     = 0x0,
+	MLX5_RX_HASH_FIELD_SELECT_SELECTED_FIELDS_DST_IP     = 0x1,
+	MLX5_RX_HASH_FIELD_SELECT_SELECTED_FIELDS_L4_SPORT   = 0x2,
+	MLX5_RX_HASH_FIELD_SELECT_SELECTED_FIELDS_L4_DPORT   = 0x3,
+	MLX5_RX_HASH_FIELD_SELECT_SELECTED_FIELDS_IPSEC_SPI  = 0x4,
+};
+
+struct mlx5_ifc_rx_hash_field_select_bits {
+	u8 l3_prot_type[0x1];
+	u8 l4_prot_type[0x1];
+	u8 selected_fields[0x1e];
+};
+
+enum {
+	MLX5_TIRC_DISP_TYPE_DIRECT    = 0x0,
+	MLX5_TIRC_DISP_TYPE_INDIRECT  = 0x1,
+};
+
+enum {
+	MLX5_TIRC_LRO_ENABLE_MASK_IPV4_LRO  = 0x1,
+	MLX5_TIRC_LRO_ENABLE_MASK_IPV6_LRO  = 0x2,
+};
+
+enum {
+	MLX5_RX_HASH_FN_NONE           = 0x0,
+	MLX5_RX_HASH_FN_INVERTED_XOR8  = 0x1,
+	MLX5_RX_HASH_FN_TOEPLITZ       = 0x2,
+};
+
+enum {
+	MLX5_TIRC_SELF_LB_BLOCK_BLOCK_UNICAST    = 0x1,
+	MLX5_TIRC_SELF_LB_BLOCK_BLOCK_MULTICAST  = 0x2,
+};
+
+enum {
+	MLX5_LRO_MAX_MSG_SIZE_START_FROM_L4  = 0x0,
+	MLX5_LRO_MAX_MSG_SIZE_START_FROM_L2  = 0x1,
+};
+
+struct mlx5_ifc_tirc_bits {
+	u8 reserved_at_0[0x20];
+	u8 disp_type[0x4];
+	u8 reserved_at_24[0x1c];
+	u8 reserved_at_40[0x40];
+	u8 reserved_at_80[0x4];
+	u8 lro_timeout_period_usecs[0x10];
+	u8 lro_enable_mask[0x4];
+	u8 lro_max_msg_sz[0x8];
+	u8 reserved_at_a0[0x40];
+	u8 reserved_at_e0[0x8];
+	u8 inline_rqn[0x18];
+	u8 rx_hash_symmetric[0x1];
+	u8 reserved_at_101[0x1];
+	u8 tunneled_offload_en[0x1];
+	u8 reserved_at_103[0x5];
+	u8 indirect_table[0x18];
+	u8 rx_hash_fn[0x4];
+	u8 reserved_at_124[0x2];
+	u8 self_lb_block[0x2];
+	u8 transport_domain[0x18];
+	u8 rx_hash_toeplitz_key[10][0x20];
+	struct mlx5_ifc_rx_hash_field_select_bits rx_hash_field_selector_outer;
+	struct mlx5_ifc_rx_hash_field_select_bits rx_hash_field_selector_inner;
+	u8 reserved_at_2c0[0x4c0];
+};
+
+struct mlx5_ifc_create_tir_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x8];
+	u8 tirn[0x18];
+	u8 reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_create_tir_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0xc0];
+	struct mlx5_ifc_tirc_bits ctx;
+};
+
+struct mlx5_ifc_rq_num_bits {
+	u8 reserved_at_0[0x8];
+	u8 rq_num[0x18];
+};
+
+struct mlx5_ifc_rqtc_bits {
+	u8 reserved_at_0[0xa0];
+	u8 reserved_at_a0[0x10];
+	u8 rqt_max_size[0x10];
+	u8 reserved_at_c0[0x10];
+	u8 rqt_actual_size[0x10];
+	u8 reserved_at_e0[0x6a0];
+	struct mlx5_ifc_rq_num_bits rq_num[];
+};
+
+struct mlx5_ifc_create_rqt_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x8];
+	u8 rqtn[0x18];
+	u8 reserved_at_60[0x20];
+};
+
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-Wpedantic"
+#endif
+struct mlx5_ifc_create_rqt_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0xc0];
+	struct mlx5_ifc_rqtc_bits rqt_context;
+};
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-Wpedantic"
+#endif
+
+enum {
+	MLX5_SQC_STATE_RST  = 0x0,
+	MLX5_SQC_STATE_RDY  = 0x1,
+	MLX5_SQC_STATE_ERR  = 0x3,
+};
+
+struct mlx5_ifc_sqc_bits {
+	u8 rlky[0x1];
+	u8 cd_master[0x1];
+	u8 fre[0x1];
+	u8 flush_in_error_en[0x1];
+	u8 allow_multi_pkt_send_wqe[0x1];
+	u8 min_wqe_inline_mode[0x3];
+	u8 state[0x4];
+	u8 reg_umr[0x1];
+	u8 allow_swp[0x1];
+	u8 hairpin[0x1];
+	u8 reserved_at_f[0x11];
+	u8 reserved_at_20[0x8];
+	u8 user_index[0x18];
+	u8 reserved_at_40[0x8];
+	u8 cqn[0x18];
+	u8 reserved_at_60[0x8];
+	u8 hairpin_peer_rq[0x18];
+	u8 reserved_at_80[0x10];
+	u8 hairpin_peer_vhca[0x10];
+	u8 reserved_at_a0[0x50];
+	u8 packet_pacing_rate_limit_index[0x10];
+	u8 tis_lst_sz[0x10];
+	u8 reserved_at_110[0x10];
+	u8 reserved_at_120[0x40];
+	u8 reserved_at_160[0x8];
+	u8 tis_num_0[0x18];
+	struct mlx5_ifc_wq_bits wq;
+};
+
+struct mlx5_ifc_query_sq_in_bits {
+	u8 opcode[0x10];
+	u8 reserved_at_10[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0x8];
+	u8 sqn[0x18];
+	u8 reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_modify_sq_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x40];
+};
+
+struct mlx5_ifc_modify_sq_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 sq_state[0x4];
+	u8 reserved_at_44[0x4];
+	u8 sqn[0x18];
+	u8 reserved_at_60[0x20];
+	u8 modify_bitmask[0x40];
+	u8 reserved_at_c0[0x40];
+	struct mlx5_ifc_sqc_bits ctx;
+};
+
+struct mlx5_ifc_create_sq_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x8];
+	u8 sqn[0x18];
+	u8 reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_create_sq_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0xc0];
+	struct mlx5_ifc_sqc_bits ctx;
+};
+
+enum {
+	MLX5_FLOW_METER_OBJ_MODIFY_FIELD_ACTIVE = (1ULL << 0),
+	MLX5_FLOW_METER_OBJ_MODIFY_FIELD_CBS = (1ULL << 1),
+	MLX5_FLOW_METER_OBJ_MODIFY_FIELD_CIR = (1ULL << 2),
+	MLX5_FLOW_METER_OBJ_MODIFY_FIELD_EBS = (1ULL << 3),
+	MLX5_FLOW_METER_OBJ_MODIFY_FIELD_EIR = (1ULL << 4),
+};
+
+struct mlx5_ifc_flow_meter_parameters_bits {
+	u8         valid[0x1];			/* 00h */
+	u8         bucket_overflow[0x1];
+	u8         start_color[0x2];
+	u8         both_buckets_on_green[0x1];
+	u8         meter_mode[0x2];
+	u8         reserved_at_1[0x19];
+	u8         reserved_at_2[0x20];		/* 04h */
+	u8         reserved_at_3[0x3];
+	u8         cbs_exponent[0x5];		/* 08h */
+	u8         cbs_mantissa[0x8];
+	u8         reserved_at_4[0x3];
+	u8         cir_exponent[0x5];
+	u8         cir_mantissa[0x8];
+	u8         reserved_at_5[0x20];		/* 0Ch */
+	u8         reserved_at_6[0x3];
+	u8         ebs_exponent[0x5];		/* 10h */
+	u8         ebs_mantissa[0x8];
+	u8         reserved_at_7[0x3];
+	u8         eir_exponent[0x5];
+	u8         eir_mantissa[0x8];
+	u8         reserved_at_8[0x60];		/* 14h-1Ch */
+};
+
+/* CQE format mask. */
+#define MLX5E_CQE_FORMAT_MASK 0xc
+
+/* MPW opcode. */
+#define MLX5_OPC_MOD_MPW 0x01
+
+/* Compressed Rx CQE structure. */
+struct mlx5_mini_cqe8 {
+	union {
+		uint32_t rx_hash_result;
+		struct {
+			uint16_t checksum;
+			uint16_t stride_idx;
+		};
+		struct {
+			uint16_t wqe_counter;
+			uint8_t  s_wqe_opcode;
+			uint8_t  reserved;
+		} s_wqe_info;
+	};
+	uint32_t byte_cnt;
+};
+
+/* srTCM PRM flow meter parameters. */
+enum {
+	MLX5_FLOW_COLOR_RED = 0,
+	MLX5_FLOW_COLOR_YELLOW,
+	MLX5_FLOW_COLOR_GREEN,
+	MLX5_FLOW_COLOR_UNDEFINED,
+};
+
+/* Maximum value of srTCM metering parameters. */
+#define MLX5_SRTCM_CBS_MAX (0xFF * (1ULL << 0x1F))
+#define MLX5_SRTCM_CIR_MAX (8 * (1ULL << 30) * 0xFF)
+#define MLX5_SRTCM_EBS_MAX 0
+
+/* Number of bits used by the meter color. */
+#define MLX5_MTR_COLOR_BITS 8
+
+/**
+ * Convert a user mark to flow mark.
+ *
+ * @param val
+ *   Mark value to convert.
+ *
+ * @return
+ *   Converted mark value.
+ */
+static inline uint32_t
+mlx5_flow_mark_set(uint32_t val)
+{
+	uint32_t ret;
+
+	/*
+	 * Add one to the user value to differentiate un-marked flows from
+	 * marked flows, if the ID is equal to MLX5_FLOW_MARK_DEFAULT it
+	 * remains untouched.
+	 */
+	if (val != MLX5_FLOW_MARK_DEFAULT)
+		++val;
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+	/*
+	 * Mark is 24 bits (minus reserved values) but is stored on a 32 bit
+	 * word, byte-swapped by the kernel on little-endian systems. In this
+	 * case, right-shifting the resulting big-endian value by 8 ensures
+	 * the least significant 24 bits are retained when converting it back.
+	 */
+	ret = rte_cpu_to_be_32(val) >> 8;
+#else
+	ret = val;
+#endif
+	return ret;
+}
+
+/**
+ * Convert a mark to user mark.
+ *
+ * @param val
+ *   Mark value to convert.
+ *
+ * @return
+ *   Converted mark value.
+ */
+static inline uint32_t
+mlx5_flow_mark_get(uint32_t val)
+{
+	/*
+	 * Subtract one from the retrieved value. It was added by
+	 * mlx5_flow_mark_set() to distinguish unmarked flows.
+	 */
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+	return (val >> 8) - 1;
+#else
+	return val - 1;
+#endif
+}
+
+#endif /* RTE_PMD_MLX5_PRM_H_ */
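
Note on the mlx5_ifc_*_bits layouts above: each u8 array element stands for
one bit, so a member's byte offset within the struct is its bit offset in the
command layout, and a reserved_at_<hex> name records that offset. The sketch
below is only an illustration of how this convention can be checked at compile
time. The struct is copied from the hunk above; BIT_OFF, BIT_SZ and
check_create_rq_out_layout are hypothetical helpers written for this note and
are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char u8;

/* Copied from the patch: every array element represents one bit. */
struct mlx5_ifc_create_rq_out_bits {
	u8 status[0x8];
	u8 reserved_at_8[0x18];
	u8 syndrome[0x20];
	u8 reserved_at_40[0x8];
	u8 rqn[0x18];
	u8 reserved_at_60[0x20];
};

/* Hypothetical helpers for this sketch (not taken from the patch). */
#define BIT_OFF(typ, fld) offsetof(struct mlx5_ifc_##typ##_bits, fld)
#define BIT_SZ(typ, fld) sizeof(((struct mlx5_ifc_##typ##_bits *)0)->fld)

/* A reserved_at_<hex> member is expected to start at bit offset <hex>. */
_Static_assert(BIT_OFF(create_rq_out, reserved_at_8) == 0x8,
	       "reserved_at_8 misplaced");
_Static_assert(BIT_OFF(create_rq_out, reserved_at_40) == 0x40,
	       "reserved_at_40 misplaced");
_Static_assert(BIT_OFF(create_rq_out, reserved_at_60) == 0x60,
	       "reserved_at_60 misplaced");

/* Runtime variant of the same checks; returns 1 when the layout is sane. */
static inline int
check_create_rq_out_layout(void)
{
	/* rqn is 24 bits wide and follows the 8 reserved bits at 0x40. */
	assert(BIT_OFF(create_rq_out, rqn) == 0x48);
	assert(BIT_SZ(create_rq_out, rqn) == 0x18);
	/* The whole output layout is 0x80 bits, i.e. 16 bytes. */
	assert(sizeof(struct mlx5_ifc_create_rq_out_bits) == 0x80);
	return 1;
}
```

Compiling this with -std=c11 is enough to run the static checks. The same
pattern could be applied to any other layout in the header to catch a reserved
field whose name no longer matches its position.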
diff --git a/drivers/common/mlx5/rte_common_mlx5_version.map b/drivers/common/mlx5/rte_common_mlx5_version.map
new file mode 100644
index 0000000..e4f85e2
--- /dev/null
+++ b/drivers/common/mlx5/rte_common_mlx5_version.map
@@ -0,0 +1,20 @@
+DPDK_20.02 {
+	global:
+
+	mlx5_devx_cmd_create_rq;
+	mlx5_devx_cmd_create_rqt;
+	mlx5_devx_cmd_create_sq;
+	mlx5_devx_cmd_create_tir;
+	mlx5_devx_cmd_create_td;
+	mlx5_devx_cmd_create_tis;
+	mlx5_devx_cmd_destroy;
+	mlx5_devx_cmd_flow_counter_alloc;
+	mlx5_devx_cmd_flow_counter_query;
+	mlx5_devx_cmd_flow_dump;
+	mlx5_devx_cmd_mkey_create;
+	mlx5_devx_cmd_modify_rq;
+	mlx5_devx_cmd_modify_sq;
+	mlx5_devx_cmd_qp_query_tis_td;
+	mlx5_devx_cmd_query_hca_attr;
+	mlx5_devx_get_out_command_status;
+};
diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index 0466d9d..a9558ca 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -10,11 +10,14 @@ LIB_GLUE = $(LIB_GLUE_BASE).$(LIB_GLUE_VERSION)
 LIB_GLUE_BASE = librte_pmd_mlx5_glue.so
 LIB_GLUE_VERSION = 20.02.0
 
+ifeq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
+CFLAGS += -DMLX5_GLUE='"$(LIB_GLUE)"'
+CFLAGS += -DMLX5_GLUE_VERSION='"$(LIB_GLUE_VERSION)"'
+LDLIBS += -ldl
+endif
+
 # Sources.
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5.c
-ifneq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
-SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_glue.c
-endif
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_rxq.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_txq.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_rxtx.c
@@ -37,34 +40,22 @@ SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_flow_dv.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_flow_verbs.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_mp.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_nl.c
-SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_devx_cmds.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_utils.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_socket.c
 
-ifeq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
-INSTALL-$(CONFIG_RTE_LIBRTE_MLX5_PMD)-lib += $(LIB_GLUE)
-endif
-
 # Basic CFLAGS.
 CFLAGS += -O3
 CFLAGS += -std=c11 -Wall -Wextra
 CFLAGS += -g
-CFLAGS += -I.
+CFLAGS += -I$(RTE_SDK)/drivers/common/mlx5
+CFLAGS += -I$(RTE_SDK)/drivers/net/mlx5
+CFLAGS += -I$(BUILDDIR)/drivers/common/mlx5
 CFLAGS += -D_BSD_SOURCE
 CFLAGS += -D_DEFAULT_SOURCE
 CFLAGS += -D_XOPEN_SOURCE=600
 CFLAGS += $(WERROR_FLAGS)
 CFLAGS += -Wno-strict-prototypes
-ifeq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
-CFLAGS += -DMLX5_GLUE='"$(LIB_GLUE)"'
-CFLAGS += -DMLX5_GLUE_VERSION='"$(LIB_GLUE_VERSION)"'
-CFLAGS_mlx5_glue.o += -fPIC
-LDLIBS += -ldl
-else ifeq ($(CONFIG_RTE_IBVERBS_LINK_STATIC),y)
-LDLIBS += $(shell $(RTE_SDK)/buildtools/options-ibverbs-static.sh)
-else
-LDLIBS += -libverbs -lmlx5
-endif
+LDLIBS += -lrte_common_mlx5
 LDLIBS += -lm
 LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring
 LDLIBS += -lrte_ethdev -lrte_net -lrte_kvargs
@@ -74,6 +65,7 @@ LDLIBS += -lrte_bus_pci
 CFLAGS += -Wno-error=cast-qual
 
 EXPORT_MAP := rte_pmd_mlx5_version.map
+
 # memseg walk is not part of stable API
 CFLAGS += -DALLOW_EXPERIMENTAL_API
 
@@ -96,282 +88,3 @@ endif
 
 include $(RTE_SDK)/mk/rte.lib.mk
 
-# Generate and clean-up mlx5_autoconf.h.
-
-export CC CFLAGS CPPFLAGS EXTRA_CFLAGS EXTRA_CPPFLAGS
-export AUTO_CONFIG_CFLAGS += -Wno-error
-
-ifndef V
-AUTOCONF_OUTPUT := >/dev/null
-endif
-
-mlx5_autoconf.h.new: FORCE
-
-mlx5_autoconf.h.new: $(RTE_SDK)/buildtools/auto-config-h.sh
-	$Q $(RM) -f -- '$@'
-	$Q sh -- '$<' '$@' \
-		HAVE_IBV_DEVICE_STRIDING_RQ_SUPPORT \
-		infiniband/mlx5dv.h \
-		enum MLX5DV_CQE_RES_FORMAT_CSUM_STRIDX \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IBV_DEVICE_TUNNEL_SUPPORT \
-		infiniband/mlx5dv.h \
-		enum MLX5DV_CONTEXT_MASK_TUNNEL_OFFLOADS \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IBV_DEVICE_MPLS_SUPPORT \
-		infiniband/verbs.h \
-		enum IBV_FLOW_SPEC_MPLS \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IBV_WQ_FLAGS_PCI_WRITE_END_PADDING \
-		infiniband/verbs.h \
-		enum IBV_WQ_FLAGS_PCI_WRITE_END_PADDING \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IBV_WQ_FLAG_RX_END_PADDING \
-		infiniband/verbs.h \
-		enum IBV_WQ_FLAG_RX_END_PADDING \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IBV_MLX5_MOD_SWP \
-		infiniband/mlx5dv.h \
-		type 'struct mlx5dv_sw_parsing_caps' \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IBV_MLX5_MOD_MPW \
-		infiniband/mlx5dv.h \
-		enum MLX5DV_CONTEXT_FLAGS_MPW_ALLOWED \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IBV_MLX5_MOD_CQE_128B_COMP \
-		infiniband/mlx5dv.h \
-		enum MLX5DV_CONTEXT_FLAGS_CQE_128B_COMP \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IBV_MLX5_MOD_CQE_128B_PAD \
-		infiniband/mlx5dv.h \
-		enum MLX5DV_CQ_INIT_ATTR_FLAGS_CQE_PAD \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IBV_FLOW_DV_SUPPORT \
-		infiniband/mlx5dv.h \
-		func mlx5dv_create_flow_action_packet_reformat \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_MLX5DV_DR \
-		infiniband/mlx5dv.h \
-		enum MLX5DV_DR_DOMAIN_TYPE_NIC_RX \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_MLX5DV_DR_ESWITCH \
-		infiniband/mlx5dv.h \
-		enum MLX5DV_DR_DOMAIN_TYPE_FDB \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_MLX5DV_DR_VLAN \
-		infiniband/mlx5dv.h \
-		func mlx5dv_dr_action_create_push_vlan \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_MLX5DV_DR_DEVX_PORT \
-		infiniband/mlx5dv.h \
-		func mlx5dv_query_devx_port \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IBV_DEVX_OBJ \
-		infiniband/mlx5dv.h \
-		func mlx5dv_devx_obj_create \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IBV_FLOW_DEVX_COUNTERS \
-		infiniband/mlx5dv.h \
-		enum MLX5DV_FLOW_ACTION_COUNTERS_DEVX \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IBV_DEVX_ASYNC \
-		infiniband/mlx5dv.h \
-		func mlx5dv_devx_obj_query_async \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_MLX5DV_DR_ACTION_DEST_DEVX_TIR \
-		infiniband/mlx5dv.h \
-		func mlx5dv_dr_action_create_dest_devx_tir \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_MLX5_DR_CREATE_ACTION_FLOW_METER \
-		infiniband/mlx5dv.h \
-		func mlx5dv_dr_action_create_flow_meter \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_MLX5_DR_FLOW_DUMP \
-		infiniband/mlx5dv.h \
-		func mlx5dv_dump_dr_domain \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_MLX5DV_MMAP_GET_NC_PAGES_CMD \
-		infiniband/mlx5dv.h \
-		enum MLX5_MMAP_GET_NC_PAGES_CMD \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_ETHTOOL_LINK_MODE_25G \
-		/usr/include/linux/ethtool.h \
-		enum ETHTOOL_LINK_MODE_25000baseCR_Full_BIT \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_ETHTOOL_LINK_MODE_50G \
-		/usr/include/linux/ethtool.h \
-		enum ETHTOOL_LINK_MODE_50000baseCR2_Full_BIT \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_ETHTOOL_LINK_MODE_100G \
-		/usr/include/linux/ethtool.h \
-		enum ETHTOOL_LINK_MODE_100000baseKR4_Full_BIT \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IBV_DEVICE_COUNTERS_SET_V42 \
-		infiniband/verbs.h \
-		type 'struct ibv_counter_set_init_attr' \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IBV_DEVICE_COUNTERS_SET_V45 \
-		infiniband/verbs.h \
-		type 'struct ibv_counters_init_attr' \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_RDMA_NL_NLDEV \
-		rdma/rdma_netlink.h \
-		enum RDMA_NL_NLDEV \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_RDMA_NLDEV_CMD_GET \
-		rdma/rdma_netlink.h \
-		enum RDMA_NLDEV_CMD_GET \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_RDMA_NLDEV_CMD_PORT_GET \
-		rdma/rdma_netlink.h \
-		enum RDMA_NLDEV_CMD_PORT_GET \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_RDMA_NLDEV_ATTR_DEV_INDEX \
-		rdma/rdma_netlink.h \
-		enum RDMA_NLDEV_ATTR_DEV_INDEX \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_RDMA_NLDEV_ATTR_DEV_NAME \
-		rdma/rdma_netlink.h \
-		enum RDMA_NLDEV_ATTR_DEV_NAME \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_RDMA_NLDEV_ATTR_PORT_INDEX \
-		rdma/rdma_netlink.h \
-		enum RDMA_NLDEV_ATTR_PORT_INDEX \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_RDMA_NLDEV_ATTR_NDEV_INDEX \
-		rdma/rdma_netlink.h \
-		enum RDMA_NLDEV_ATTR_NDEV_INDEX \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IFLA_NUM_VF \
-		linux/if_link.h \
-		enum IFLA_NUM_VF \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IFLA_EXT_MASK \
-		linux/if_link.h \
-		enum IFLA_EXT_MASK \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IFLA_PHYS_SWITCH_ID \
-		linux/if_link.h \
-		enum IFLA_PHYS_SWITCH_ID \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IFLA_PHYS_PORT_NAME \
-		linux/if_link.h \
-		enum IFLA_PHYS_PORT_NAME \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_SUPPORTED_40000baseKR4_Full \
-		/usr/include/linux/ethtool.h \
-		define SUPPORTED_40000baseKR4_Full \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_SUPPORTED_40000baseCR4_Full \
-		/usr/include/linux/ethtool.h \
-		define SUPPORTED_40000baseCR4_Full \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_SUPPORTED_40000baseSR4_Full \
-		/usr/include/linux/ethtool.h \
-		define SUPPORTED_40000baseSR4_Full \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_SUPPORTED_40000baseLR4_Full \
-		/usr/include/linux/ethtool.h \
-		define SUPPORTED_40000baseLR4_Full \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_SUPPORTED_56000baseKR4_Full \
-		/usr/include/linux/ethtool.h \
-		define SUPPORTED_56000baseKR4_Full \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_SUPPORTED_56000baseCR4_Full \
-		/usr/include/linux/ethtool.h \
-		define SUPPORTED_56000baseCR4_Full \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_SUPPORTED_56000baseSR4_Full \
-		/usr/include/linux/ethtool.h \
-		define SUPPORTED_56000baseSR4_Full \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_SUPPORTED_56000baseLR4_Full \
-		/usr/include/linux/ethtool.h \
-		define SUPPORTED_56000baseLR4_Full \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_STATIC_ASSERT \
-		/usr/include/assert.h \
-		define static_assert \
-		$(AUTOCONF_OUTPUT)
-
-# Create mlx5_autoconf.h or update it in case it differs from the new one.
-
-mlx5_autoconf.h: mlx5_autoconf.h.new
-	$Q [ -f '$@' ] && \
-		cmp '$<' '$@' $(AUTOCONF_OUTPUT) || \
-		mv '$<' '$@'
-
-$(SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD):.c=.o): mlx5_autoconf.h
-
-# Generate dependency plug-in for rdma-core when the PMD must not be linked
-# directly, so that applications do not inherit this dependency.
-
-ifeq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
-
-$(LIB): $(LIB_GLUE)
-
-ifeq ($(LINK_USING_CC),1)
-GLUE_LDFLAGS := $(call linkerprefix,$(LDFLAGS))
-else
-GLUE_LDFLAGS := $(LDFLAGS)
-endif
-$(LIB_GLUE): mlx5_glue.o
-	$Q $(LD) $(GLUE_LDFLAGS) $(EXTRA_LDFLAGS) \
-		-Wl,-h,$(LIB_GLUE) \
-		-shared -o $@ $< -libverbs -lmlx5
-
-mlx5_glue.o: mlx5_autoconf.h
-
-endif
-
-clean_mlx5: FORCE
-	$Q rm -f -- mlx5_autoconf.h mlx5_autoconf.h.new
-	$Q rm -f -- mlx5_glue.o $(LIB_GLUE_BASE)*
-
-clean: clean_mlx5
diff --git a/drivers/net/mlx5/meson.build b/drivers/net/mlx5/meson.build
index 3ad4f02..f6d0db9 100644
--- a/drivers/net/mlx5/meson.build
+++ b/drivers/net/mlx5/meson.build
@@ -7,224 +7,54 @@ if not is_linux
 	reason = 'only supported on Linux'
 	subdir_done()
 endif
-build = true
 
-pmd_dlopen = (get_option('ibverbs_link') == 'dlopen')
 LIB_GLUE_BASE = 'librte_pmd_mlx5_glue.so'
 LIB_GLUE_VERSION = '20.02.0'
 LIB_GLUE = LIB_GLUE_BASE + '.' + LIB_GLUE_VERSION
-if pmd_dlopen
-	dpdk_conf.set('RTE_IBVERBS_LINK_DLOPEN', 1)
-	cflags += [
-		'-DMLX5_GLUE="@0@"'.format(LIB_GLUE),
-		'-DMLX5_GLUE_VERSION="@0@"'.format(LIB_GLUE_VERSION),
-	]
-endif
 
-libnames = [ 'mlx5', 'ibverbs' ]
-libs = []
-foreach libname:libnames
-	lib = dependency('lib' + libname, required:false)
-	if not lib.found()
-		lib = cc.find_library(libname, required:false)
-	endif
-	if lib.found()
-		libs += [ lib ]
-	else
-		build = false
-		reason = 'missing dependency, "' + libname + '"'
+allow_experimental_apis = true
+deps += ['hash', 'common_mlx5']
+sources = files(
+	'mlx5.c',
+	'mlx5_ethdev.c',
+	'mlx5_flow.c',
+	'mlx5_flow_meter.c',
+	'mlx5_flow_dv.c',
+	'mlx5_flow_verbs.c',
+	'mlx5_mac.c',
+	'mlx5_mr.c',
+	'mlx5_nl.c',
+	'mlx5_rss.c',
+	'mlx5_rxmode.c',
+	'mlx5_rxq.c',
+	'mlx5_rxtx.c',
+	'mlx5_mp.c',
+	'mlx5_stats.c',
+	'mlx5_trigger.c',
+	'mlx5_txq.c',
+	'mlx5_vlan.c',
+	'mlx5_utils.c',
+	'mlx5_socket.c',
+)
+if (dpdk_conf.has('RTE_ARCH_X86_64')
+	or dpdk_conf.has('RTE_ARCH_ARM64')
+	or dpdk_conf.has('RTE_ARCH_PPC_64'))
+	sources += files('mlx5_rxtx_vec.c')
+endif
+cflags_options = [
+	'-std=c11',
+	'-Wno-strict-prototypes',
+	'-D_BSD_SOURCE',
+	'-D_DEFAULT_SOURCE',
+	'-D_XOPEN_SOURCE=600'
+]
+foreach option:cflags_options
+	if cc.has_argument(option)
+		cflags += option
 	endif
 endforeach
-
-if build
-	allow_experimental_apis = true
-	deps += ['hash']
-	ext_deps += libs
-	sources = files(
-		'mlx5.c',
-		'mlx5_ethdev.c',
-		'mlx5_flow.c',
-		'mlx5_flow_meter.c',
-		'mlx5_flow_dv.c',
-		'mlx5_flow_verbs.c',
-		'mlx5_mac.c',
-		'mlx5_mr.c',
-		'mlx5_nl.c',
-		'mlx5_rss.c',
-		'mlx5_rxmode.c',
-		'mlx5_rxq.c',
-		'mlx5_rxtx.c',
-		'mlx5_mp.c',
-		'mlx5_stats.c',
-		'mlx5_trigger.c',
-		'mlx5_txq.c',
-		'mlx5_vlan.c',
-		'mlx5_devx_cmds.c',
-		'mlx5_utils.c',
-		'mlx5_socket.c',
-	)
-	if (dpdk_conf.has('RTE_ARCH_X86_64')
-		or dpdk_conf.has('RTE_ARCH_ARM64')
-		or dpdk_conf.has('RTE_ARCH_PPC_64'))
-		sources += files('mlx5_rxtx_vec.c')
-	endif
-	if not pmd_dlopen
-		sources += files('mlx5_glue.c')
-	endif
-	cflags_options = [
-		'-std=c11',
-		'-Wno-strict-prototypes',
-		'-D_BSD_SOURCE',
-		'-D_DEFAULT_SOURCE',
-		'-D_XOPEN_SOURCE=600'
-	]
-	foreach option:cflags_options
-		if cc.has_argument(option)
-			cflags += option
-		endif
-	endforeach
-	if get_option('buildtype').contains('debug')
-		cflags += [ '-pedantic', '-UNDEBUG', '-DPEDANTIC' ]
-	else
-		cflags += [ '-DNDEBUG', '-UPEDANTIC' ]
-	endif
-	# To maintain the compatibility with the make build system
-	# mlx5_autoconf.h file is still generated.
-	# input array for meson member search:
-	# [ "MACRO to define if found", "header for the search",
-	#   "symbol to search", "struct member to search" ]
-	has_member_args = [
-		[ 'HAVE_IBV_MLX5_MOD_SWP', 'infiniband/mlx5dv.h',
-		'struct mlx5dv_sw_parsing_caps', 'sw_parsing_offloads' ],
-		[ 'HAVE_IBV_DEVICE_COUNTERS_SET_V42', 'infiniband/verbs.h',
-		'struct ibv_counter_set_init_attr', 'counter_set_id' ],
-		[ 'HAVE_IBV_DEVICE_COUNTERS_SET_V45', 'infiniband/verbs.h',
-		'struct ibv_counters_init_attr', 'comp_mask' ],
-	]
-	# input array for meson symbol search:
-	# [ "MACRO to define if found", "header for the search",
-	#   "symbol to search" ]
-	has_sym_args = [
-		[ 'HAVE_IBV_DEVICE_STRIDING_RQ_SUPPORT', 'infiniband/mlx5dv.h',
-		'MLX5DV_CQE_RES_FORMAT_CSUM_STRIDX' ],
-		[ 'HAVE_IBV_DEVICE_TUNNEL_SUPPORT', 'infiniband/mlx5dv.h',
-		'MLX5DV_CONTEXT_MASK_TUNNEL_OFFLOADS' ],
-		[ 'HAVE_IBV_MLX5_MOD_MPW', 'infiniband/mlx5dv.h',
-		'MLX5DV_CONTEXT_FLAGS_MPW_ALLOWED' ],
-		[ 'HAVE_IBV_MLX5_MOD_CQE_128B_COMP', 'infiniband/mlx5dv.h',
-		'MLX5DV_CONTEXT_FLAGS_CQE_128B_COMP' ],
-		[ 'HAVE_IBV_MLX5_MOD_CQE_128B_PAD', 'infiniband/mlx5dv.h',
-		'MLX5DV_CQ_INIT_ATTR_FLAGS_CQE_PAD' ],
-		[ 'HAVE_IBV_FLOW_DV_SUPPORT', 'infiniband/mlx5dv.h',
-		'mlx5dv_create_flow_action_packet_reformat' ],
-		[ 'HAVE_IBV_DEVICE_MPLS_SUPPORT', 'infiniband/verbs.h',
-		'IBV_FLOW_SPEC_MPLS' ],
-		[ 'HAVE_IBV_WQ_FLAGS_PCI_WRITE_END_PADDING', 'infiniband/verbs.h',
-		'IBV_WQ_FLAGS_PCI_WRITE_END_PADDING' ],
-		[ 'HAVE_IBV_WQ_FLAG_RX_END_PADDING', 'infiniband/verbs.h',
-		'IBV_WQ_FLAG_RX_END_PADDING' ],
-		[ 'HAVE_MLX5DV_DR_DEVX_PORT', 'infiniband/mlx5dv.h',
-		'mlx5dv_query_devx_port' ],
-		[ 'HAVE_IBV_DEVX_OBJ', 'infiniband/mlx5dv.h',
-		'mlx5dv_devx_obj_create' ],
-		[ 'HAVE_IBV_FLOW_DEVX_COUNTERS', 'infiniband/mlx5dv.h',
-		'MLX5DV_FLOW_ACTION_COUNTERS_DEVX' ],
-		[ 'HAVE_IBV_DEVX_ASYNC', 'infiniband/mlx5dv.h',
-		'mlx5dv_devx_obj_query_async' ],
-		[ 'HAVE_MLX5DV_DR_ACTION_DEST_DEVX_TIR', 'infiniband/mlx5dv.h',
-		'mlx5dv_dr_action_create_dest_devx_tir' ],
-		[ 'HAVE_MLX5_DR_CREATE_ACTION_FLOW_METER', 'infiniband/mlx5dv.h',
-		'mlx5dv_dr_action_create_flow_meter' ],
-		[ 'HAVE_MLX5DV_MMAP_GET_NC_PAGES_CMD', 'infiniband/mlx5dv.h',
-		'MLX5_MMAP_GET_NC_PAGES_CMD' ],
-		[ 'HAVE_MLX5DV_DR', 'infiniband/mlx5dv.h',
-		'MLX5DV_DR_DOMAIN_TYPE_NIC_RX' ],
-		[ 'HAVE_MLX5DV_DR_ESWITCH', 'infiniband/mlx5dv.h',
-		'MLX5DV_DR_DOMAIN_TYPE_FDB' ],
-		[ 'HAVE_MLX5DV_DR_VLAN', 'infiniband/mlx5dv.h',
-		'mlx5dv_dr_action_create_push_vlan' ],
-		[ 'HAVE_SUPPORTED_40000baseKR4_Full', 'linux/ethtool.h',
-		'SUPPORTED_40000baseKR4_Full' ],
-		[ 'HAVE_SUPPORTED_40000baseCR4_Full', 'linux/ethtool.h',
-		'SUPPORTED_40000baseCR4_Full' ],
-		[ 'HAVE_SUPPORTED_40000baseSR4_Full', 'linux/ethtool.h',
-		'SUPPORTED_40000baseSR4_Full' ],
-		[ 'HAVE_SUPPORTED_40000baseLR4_Full', 'linux/ethtool.h',
-		'SUPPORTED_40000baseLR4_Full' ],
-		[ 'HAVE_SUPPORTED_56000baseKR4_Full', 'linux/ethtool.h',
-		'SUPPORTED_56000baseKR4_Full' ],
-		[ 'HAVE_SUPPORTED_56000baseCR4_Full', 'linux/ethtool.h',
-		'SUPPORTED_56000baseCR4_Full' ],
-		[ 'HAVE_SUPPORTED_56000baseSR4_Full', 'linux/ethtool.h',
-		'SUPPORTED_56000baseSR4_Full' ],
-		[ 'HAVE_SUPPORTED_56000baseLR4_Full', 'linux/ethtool.h',
-		'SUPPORTED_56000baseLR4_Full' ],
-		[ 'HAVE_ETHTOOL_LINK_MODE_25G', 'linux/ethtool.h',
-		'ETHTOOL_LINK_MODE_25000baseCR_Full_BIT' ],
-		[ 'HAVE_ETHTOOL_LINK_MODE_50G', 'linux/ethtool.h',
-		'ETHTOOL_LINK_MODE_50000baseCR2_Full_BIT' ],
-		[ 'HAVE_ETHTOOL_LINK_MODE_100G', 'linux/ethtool.h',
-		'ETHTOOL_LINK_MODE_100000baseKR4_Full_BIT' ],
-		[ 'HAVE_IFLA_NUM_VF', 'linux/if_link.h',
-		'IFLA_NUM_VF' ],
-		[ 'HAVE_IFLA_EXT_MASK', 'linux/if_link.h',
-		'IFLA_EXT_MASK' ],
-		[ 'HAVE_IFLA_PHYS_SWITCH_ID', 'linux/if_link.h',
-		'IFLA_PHYS_SWITCH_ID' ],
-		[ 'HAVE_IFLA_PHYS_PORT_NAME', 'linux/if_link.h',
-		'IFLA_PHYS_PORT_NAME' ],
-		[ 'HAVE_RDMA_NL_NLDEV', 'rdma/rdma_netlink.h',
-		'RDMA_NL_NLDEV' ],
-		[ 'HAVE_RDMA_NLDEV_CMD_GET', 'rdma/rdma_netlink.h',
-		'RDMA_NLDEV_CMD_GET' ],
-		[ 'HAVE_RDMA_NLDEV_CMD_PORT_GET', 'rdma/rdma_netlink.h',
-		'RDMA_NLDEV_CMD_PORT_GET' ],
-		[ 'HAVE_RDMA_NLDEV_ATTR_DEV_INDEX', 'rdma/rdma_netlink.h',
-		'RDMA_NLDEV_ATTR_DEV_INDEX' ],
-		[ 'HAVE_RDMA_NLDEV_ATTR_DEV_NAME', 'rdma/rdma_netlink.h',
-		'RDMA_NLDEV_ATTR_DEV_NAME' ],
-		[ 'HAVE_RDMA_NLDEV_ATTR_PORT_INDEX', 'rdma/rdma_netlink.h',
-		'RDMA_NLDEV_ATTR_PORT_INDEX' ],
-		[ 'HAVE_RDMA_NLDEV_ATTR_NDEV_INDEX', 'rdma/rdma_netlink.h',
-		'RDMA_NLDEV_ATTR_NDEV_INDEX' ],
-		[ 'HAVE_MLX5_DR_FLOW_DUMP', 'infiniband/mlx5dv.h',
-		'mlx5dv_dump_dr_domain'],
-	]
-	config = configuration_data()
-	foreach arg:has_sym_args
-		config.set(arg[0], cc.has_header_symbol(arg[1], arg[2],
-			dependencies: libs))
-	endforeach
-	foreach arg:has_member_args
-		file_prefix = '#include <' + arg[1] + '>'
-		config.set(arg[0], cc.has_member(arg[2], arg[3],
-			prefix : file_prefix, dependencies: libs))
-	endforeach
-	configure_file(output : 'mlx5_autoconf.h', configuration : config)
-endif
-# Build Glue Library
-if pmd_dlopen and build
-	dlopen_name = 'mlx5_glue'
-	dlopen_lib_name = driver_name_fmt.format(dlopen_name)
-	dlopen_so_version = LIB_GLUE_VERSION
-	dlopen_sources = files('mlx5_glue.c')
-	dlopen_install_dir = [ eal_pmd_path + '-glue' ]
-	dlopen_includes = [global_inc]
-	dlopen_includes += include_directories(
-		'../../../lib/librte_eal/common/include/generic',
-	)
-	shared_lib = shared_library(
-		dlopen_lib_name,
-		dlopen_sources,
-		include_directories: dlopen_includes,
-		c_args: cflags,
-		dependencies: libs,
-		link_args: [
-		'-Wl,-export-dynamic',
-		'-Wl,-h,@0@'.format(LIB_GLUE),
-		],
-		soversion: dlopen_so_version,
-		install: true,
-		install_dir: dlopen_install_dir,
-	)
+if get_option('buildtype').contains('debug')
+	cflags += [ '-pedantic', '-UNDEBUG', '-DPEDANTIC' ]
+else
+	cflags += [ '-DNDEBUG', '-UPEDANTIC' ]
 endif
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 7126edf..7cf357d 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -38,15 +38,16 @@
 #include <rte_string_fns.h>
 #include <rte_alarm.h>
 
+#include <mlx5_glue.h>
+#include <mlx5_devx_cmds.h>
+
+#include "mlx5_defs.h"
 #include "mlx5.h"
 #include "mlx5_utils.h"
 #include "mlx5_rxtx.h"
 #include "mlx5_autoconf.h"
-#include "mlx5_defs.h"
-#include "mlx5_glue.h"
 #include "mlx5_mr.h"
 #include "mlx5_flow.h"
-#include "mlx5_devx_cmds.h"
 
 /* Device parameter to enable RX completion queue compression. */
 #define MLX5_RXQ_CQE_COMP_EN "rxq_cqe_comp_en"
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 4d0485d..872fccb 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -32,13 +32,14 @@
 #include <rte_errno.h>
 #include <rte_flow.h>
 
+#include <mlx5_glue.h>
+#include <mlx5_devx_cmds.h>
+#include <mlx5_prm.h>
+
+#include "mlx5_defs.h"
 #include "mlx5_utils.h"
 #include "mlx5_mr.h"
 #include "mlx5_autoconf.h"
-#include "mlx5_defs.h"
-#include "mlx5_glue.h"
-#include "mlx5_prm.h"
-#include "mlx5_devx_cmds.h"
 
 enum {
 	PCI_VENDOR_ID_MELLANOX = 0x15b3,
diff --git a/drivers/net/mlx5/mlx5_devx_cmds.c b/drivers/net/mlx5/mlx5_devx_cmds.c
deleted file mode 100644
index 62ca590..0000000
--- a/drivers/net/mlx5/mlx5_devx_cmds.c
+++ /dev/null
@@ -1,976 +0,0 @@
-// SPDX-License-Identifier: BSD-3-Clause
-/* Copyright 2018 Mellanox Technologies, Ltd */
-
-#include <unistd.h>
-
-#include <rte_flow_driver.h>
-#include <rte_malloc.h>
-
-#include "mlx5_prm.h"
-#include "mlx5_devx_cmds.h"
-#include "mlx5_utils.h"
-
-
-/**
- * Allocate flow counters via devx interface.
- *
- * @param[in] ctx
- *   ibv contexts returned from mlx5dv_open_device.
- * @param dcs
- *   Pointer to counters properties structure to be filled by the routine.
- * @param bulk_n_128
- *   Bulk counter numbers in 128 counters units.
- *
- * @return
- *   Pointer to counter object on success, a negative value otherwise and
- *   rte_errno is set.
- */
-struct mlx5_devx_obj *
-mlx5_devx_cmd_flow_counter_alloc(struct ibv_context *ctx, uint32_t bulk_n_128)
-{
-	struct mlx5_devx_obj *dcs = rte_zmalloc("dcs", sizeof(*dcs), 0);
-	uint32_t in[MLX5_ST_SZ_DW(alloc_flow_counter_in)]   = {0};
-	uint32_t out[MLX5_ST_SZ_DW(alloc_flow_counter_out)] = {0};
-
-	if (!dcs) {
-		rte_errno = ENOMEM;
-		return NULL;
-	}
-	MLX5_SET(alloc_flow_counter_in, in, opcode,
-		 MLX5_CMD_OP_ALLOC_FLOW_COUNTER);
-	MLX5_SET(alloc_flow_counter_in, in, flow_counter_bulk, bulk_n_128);
-	dcs->obj = mlx5_glue->devx_obj_create(ctx, in,
-					      sizeof(in), out, sizeof(out));
-	if (!dcs->obj) {
-		DRV_LOG(ERR, "Can't allocate counters - error %d", errno);
-		rte_errno = errno;
-		rte_free(dcs);
-		return NULL;
-	}
-	dcs->id = MLX5_GET(alloc_flow_counter_out, out, flow_counter_id);
-	return dcs;
-}
-
-/**
- * Query flow counters values.
- *
- * @param[in] dcs
- *   devx object that was obtained from mlx5_devx_cmd_fc_alloc.
- * @param[in] clear
- *   Whether hardware should clear the counters after the query or not.
- * @param[in] n_counters
- *   0 in case of 1 counter to read, otherwise the counter number to read.
- *  @param pkts
- *   The number of packets that matched the flow.
- *  @param bytes
- *    The number of bytes that matched the flow.
- *  @param mkey
- *   The mkey key for batch query.
- *  @param addr
- *    The address in the mkey range for batch query.
- *  @param cmd_comp
- *   The completion object for asynchronous batch query.
- *  @param async_id
- *    The ID to be returned in the asynchronous batch query response.
- *
- * @return
- *   0 on success, a negative value otherwise.
- */
-int
-mlx5_devx_cmd_flow_counter_query(struct mlx5_devx_obj *dcs,
-				 int clear, uint32_t n_counters,
-				 uint64_t *pkts, uint64_t *bytes,
-				 uint32_t mkey, void *addr,
-				 struct mlx5dv_devx_cmd_comp *cmd_comp,
-				 uint64_t async_id)
-{
-	int out_len = MLX5_ST_SZ_BYTES(query_flow_counter_out) +
-			MLX5_ST_SZ_BYTES(traffic_counter);
-	uint32_t out[out_len];
-	uint32_t in[MLX5_ST_SZ_DW(query_flow_counter_in)] = {0};
-	void *stats;
-	int rc;
-
-	MLX5_SET(query_flow_counter_in, in, opcode,
-		 MLX5_CMD_OP_QUERY_FLOW_COUNTER);
-	MLX5_SET(query_flow_counter_in, in, op_mod, 0);
-	MLX5_SET(query_flow_counter_in, in, flow_counter_id, dcs->id);
-	MLX5_SET(query_flow_counter_in, in, clear, !!clear);
-
-	if (n_counters) {
-		MLX5_SET(query_flow_counter_in, in, num_of_counters,
-			 n_counters);
-		MLX5_SET(query_flow_counter_in, in, dump_to_memory, 1);
-		MLX5_SET(query_flow_counter_in, in, mkey, mkey);
-		MLX5_SET64(query_flow_counter_in, in, address,
-			   (uint64_t)(uintptr_t)addr);
-	}
-	if (!cmd_comp)
-		rc = mlx5_glue->devx_obj_query(dcs->obj, in, sizeof(in), out,
-					       out_len);
-	else
-		rc = mlx5_glue->devx_obj_query_async(dcs->obj, in, sizeof(in),
-						     out_len, async_id,
-						     cmd_comp);
-	if (rc) {
-		DRV_LOG(ERR, "Failed to query devx counters with rc %d", rc);
-		rte_errno = rc;
-		return -rc;
-	}
-	if (!n_counters) {
-		stats = MLX5_ADDR_OF(query_flow_counter_out,
-				     out, flow_statistics);
-		*pkts = MLX5_GET64(traffic_counter, stats, packets);
-		*bytes = MLX5_GET64(traffic_counter, stats, octets);
-	}
-	return 0;
-}
-
-/**
- * Create a new mkey.
- *
- * @param[in] ctx
- *   ibv contexts returned from mlx5dv_open_device.
- * @param[in] attr
- *   Attributes of the requested mkey.
- *
- * @return
- *   Pointer to Devx mkey on success, a negative value otherwise and rte_errno
- *   is set.
- */
-struct mlx5_devx_obj *
-mlx5_devx_cmd_mkey_create(struct ibv_context *ctx,
-			  struct mlx5_devx_mkey_attr *attr)
-{
-	uint32_t in[MLX5_ST_SZ_DW(create_mkey_in)] = {0};
-	uint32_t out[MLX5_ST_SZ_DW(create_mkey_out)] = {0};
-	void *mkc;
-	struct mlx5_devx_obj *mkey = rte_zmalloc("mkey", sizeof(*mkey), 0);
-	size_t pgsize;
-	uint32_t translation_size;
-
-	if (!mkey) {
-		rte_errno = ENOMEM;
-		return NULL;
-	}
-	pgsize = sysconf(_SC_PAGESIZE);
-	translation_size = (RTE_ALIGN(attr->size, pgsize) * 8) / 16;
-	MLX5_SET(create_mkey_in, in, opcode, MLX5_CMD_OP_CREATE_MKEY);
-	MLX5_SET(create_mkey_in, in, translations_octword_actual_size,
-		 translation_size);
-	MLX5_SET(create_mkey_in, in, mkey_umem_id, attr->umem_id);
-	mkc = MLX5_ADDR_OF(create_mkey_in, in, memory_key_mkey_entry);
-	MLX5_SET(mkc, mkc, lw, 0x1);
-	MLX5_SET(mkc, mkc, lr, 0x1);
-	MLX5_SET(mkc, mkc, access_mode_1_0, MLX5_MKC_ACCESS_MODE_MTT);
-	MLX5_SET(mkc, mkc, qpn, 0xffffff);
-	MLX5_SET(mkc, mkc, pd, attr->pd);
-	MLX5_SET(mkc, mkc, mkey_7_0, attr->umem_id & 0xFF);
-	MLX5_SET(mkc, mkc, translations_octword_size, translation_size);
-	MLX5_SET64(mkc, mkc, start_addr, attr->addr);
-	MLX5_SET64(mkc, mkc, len, attr->size);
-	MLX5_SET(mkc, mkc, log_page_size, rte_log2_u32(pgsize));
-	mkey->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in), out,
-					       sizeof(out));
-	if (!mkey->obj) {
-		DRV_LOG(ERR, "Can't create mkey - error %d", errno);
-		rte_errno = errno;
-		rte_free(mkey);
-		return NULL;
-	}
-	mkey->id = MLX5_GET(create_mkey_out, out, mkey_index);
-	mkey->id = (mkey->id << 8) | (attr->umem_id & 0xFF);
-	return mkey;
-}
-
-/**
- * Get status of devx command response.
- * Mainly used for asynchronous commands.
- *
- * @param[in] out
- *   The out response buffer.
- *
- * @return
- *   0 on success, non-zero value otherwise.
- */
-int
-mlx5_devx_get_out_command_status(void *out)
-{
-	int status;
-
-	if (!out)
-		return -EINVAL;
-	status = MLX5_GET(query_flow_counter_out, out, status);
-	if (status) {
-		int syndrome = MLX5_GET(query_flow_counter_out, out, syndrome);
-
-		DRV_LOG(ERR, "Bad devX status %x, syndrome = %x", status,
-			syndrome);
-	}
-	return status;
-}
-
-/**
- * Destroy any object allocated by a Devx API.
- *
- * @param[in] obj
- *   Pointer to a general object.
- *
- * @return
- *   0 on success, a negative value otherwise.
- */
-int
-mlx5_devx_cmd_destroy(struct mlx5_devx_obj *obj)
-{
-	int ret;
-
-	if (!obj)
-		return 0;
-	ret =  mlx5_glue->devx_obj_destroy(obj->obj);
-	rte_free(obj);
-	return ret;
-}
-
-/**
- * Query NIC vport context.
- * Fills minimal inline attribute.
- *
- * @param[in] ctx
- *   ibv contexts returned from mlx5dv_open_device.
- * @param[in] vport
- *   vport index
- * @param[out] attr
- *   Attributes device values.
- *
- * @return
- *   0 on success, a negative value otherwise.
- */
-static int
-mlx5_devx_cmd_query_nic_vport_context(struct ibv_context *ctx,
-				      unsigned int vport,
-				      struct mlx5_hca_attr *attr)
-{
-	uint32_t in[MLX5_ST_SZ_DW(query_nic_vport_context_in)] = {0};
-	uint32_t out[MLX5_ST_SZ_DW(query_nic_vport_context_out)] = {0};
-	void *vctx;
-	int status, syndrome, rc;
-
-	/* Query NIC vport context to determine inline mode. */
-	MLX5_SET(query_nic_vport_context_in, in, opcode,
-		 MLX5_CMD_OP_QUERY_NIC_VPORT_CONTEXT);
-	MLX5_SET(query_nic_vport_context_in, in, vport_number, vport);
-	if (vport)
-		MLX5_SET(query_nic_vport_context_in, in, other_vport, 1);
-	rc = mlx5_glue->devx_general_cmd(ctx,
-					 in, sizeof(in),
-					 out, sizeof(out));
-	if (rc)
-		goto error;
-	status = MLX5_GET(query_nic_vport_context_out, out, status);
-	syndrome = MLX5_GET(query_nic_vport_context_out, out, syndrome);
-	if (status) {
-		DRV_LOG(DEBUG, "Failed to query NIC vport context, "
-			"status %x, syndrome = %x",
-			status, syndrome);
-		return -1;
-	}
-	vctx = MLX5_ADDR_OF(query_nic_vport_context_out, out,
-			    nic_vport_context);
-	attr->vport_inline_mode = MLX5_GET(nic_vport_context, vctx,
-					   min_wqe_inline_mode);
-	return 0;
-error:
-	rc = (rc > 0) ? -rc : rc;
-	return rc;
-}
-
-/**
- * Query HCA attributes.
- * Using those attributes we can check on run time if the device
- * is having the required capabilities.
- *
- * @param[in] ctx
- *   ibv contexts returned from mlx5dv_open_device.
- * @param[out] attr
- *   Attributes device values.
- *
- * @return
- *   0 on success, a negative value otherwise.
- */
-int
-mlx5_devx_cmd_query_hca_attr(struct ibv_context *ctx,
-			     struct mlx5_hca_attr *attr)
-{
-	uint32_t in[MLX5_ST_SZ_DW(query_hca_cap_in)] = {0};
-	uint32_t out[MLX5_ST_SZ_DW(query_hca_cap_out)] = {0};
-	void *hcattr;
-	int status, syndrome, rc;
-
-	MLX5_SET(query_hca_cap_in, in, opcode, MLX5_CMD_OP_QUERY_HCA_CAP);
-	MLX5_SET(query_hca_cap_in, in, op_mod,
-		 MLX5_GET_HCA_CAP_OP_MOD_GENERAL_DEVICE |
-		 MLX5_HCA_CAP_OPMOD_GET_CUR);
-
-	rc = mlx5_glue->devx_general_cmd(ctx,
-					 in, sizeof(in), out, sizeof(out));
-	if (rc)
-		goto error;
-	status = MLX5_GET(query_hca_cap_out, out, status);
-	syndrome = MLX5_GET(query_hca_cap_out, out, syndrome);
-	if (status) {
-		DRV_LOG(DEBUG, "Failed to query devx HCA capabilities, "
-			"status %x, syndrome = %x",
-			status, syndrome);
-		return -1;
-	}
-	hcattr = MLX5_ADDR_OF(query_hca_cap_out, out, capability);
-	attr->flow_counter_bulk_alloc_bitmap =
-			MLX5_GET(cmd_hca_cap, hcattr, flow_counter_bulk_alloc);
-	attr->flow_counters_dump = MLX5_GET(cmd_hca_cap, hcattr,
-					    flow_counters_dump);
-	attr->eswitch_manager = MLX5_GET(cmd_hca_cap, hcattr, eswitch_manager);
-	attr->hairpin = MLX5_GET(cmd_hca_cap, hcattr, hairpin);
-	attr->log_max_hairpin_queues = MLX5_GET(cmd_hca_cap, hcattr,
-						log_max_hairpin_queues);
-	attr->log_max_hairpin_wq_data_sz = MLX5_GET(cmd_hca_cap, hcattr,
-						    log_max_hairpin_wq_data_sz);
-	attr->log_max_hairpin_num_packets = MLX5_GET
-		(cmd_hca_cap, hcattr, log_min_hairpin_wq_data_sz);
-	attr->vhca_id = MLX5_GET(cmd_hca_cap, hcattr, vhca_id);
-	attr->eth_net_offloads = MLX5_GET(cmd_hca_cap, hcattr,
-					  eth_net_offloads);
-	attr->eth_virt = MLX5_GET(cmd_hca_cap, hcattr, eth_virt);
-	attr->flex_parser_protocols = MLX5_GET(cmd_hca_cap, hcattr,
-					       flex_parser_protocols);
-	attr->qos.sup = MLX5_GET(cmd_hca_cap, hcattr, qos);
-	if (attr->qos.sup) {
-		MLX5_SET(query_hca_cap_in, in, op_mod,
-			 MLX5_GET_HCA_CAP_OP_MOD_QOS_CAP |
-			 MLX5_HCA_CAP_OPMOD_GET_CUR);
-		rc = mlx5_glue->devx_general_cmd(ctx, in, sizeof(in),
-						 out, sizeof(out));
-		if (rc)
-			goto error;
-		if (status) {
-			DRV_LOG(DEBUG, "Failed to query devx QOS capabilities,"
-				" status %x, syndrome = %x",
-				status, syndrome);
-			return -1;
-		}
-		hcattr = MLX5_ADDR_OF(query_hca_cap_out, out, capability);
-		attr->qos.srtcm_sup =
-				MLX5_GET(qos_cap, hcattr, flow_meter_srtcm);
-		attr->qos.log_max_flow_meter =
-				MLX5_GET(qos_cap, hcattr, log_max_flow_meter);
-		attr->qos.flow_meter_reg_c_ids =
-			MLX5_GET(qos_cap, hcattr, flow_meter_reg_id);
-		attr->qos.flow_meter_reg_share =
-			MLX5_GET(qos_cap, hcattr, flow_meter_reg_share);
-	}
-	if (!attr->eth_net_offloads)
-		return 0;
-
-	/* Query HCA offloads for Ethernet protocol. */
-	memset(in, 0, sizeof(in));
-	memset(out, 0, sizeof(out));
-	MLX5_SET(query_hca_cap_in, in, opcode, MLX5_CMD_OP_QUERY_HCA_CAP);
-	MLX5_SET(query_hca_cap_in, in, op_mod,
-		 MLX5_GET_HCA_CAP_OP_MOD_ETHERNET_OFFLOAD_CAPS |
-		 MLX5_HCA_CAP_OPMOD_GET_CUR);
-
-	rc = mlx5_glue->devx_general_cmd(ctx,
-					 in, sizeof(in),
-					 out, sizeof(out));
-	if (rc) {
-		attr->eth_net_offloads = 0;
-		goto error;
-	}
-	status = MLX5_GET(query_hca_cap_out, out, status);
-	syndrome = MLX5_GET(query_hca_cap_out, out, syndrome);
-	if (status) {
-		DRV_LOG(DEBUG, "Failed to query devx HCA capabilities, "
-			"status %x, syndrome = %x",
-			status, syndrome);
-		attr->eth_net_offloads = 0;
-		return -1;
-	}
-	hcattr = MLX5_ADDR_OF(query_hca_cap_out, out, capability);
-	attr->wqe_vlan_insert = MLX5_GET(per_protocol_networking_offload_caps,
-					 hcattr, wqe_vlan_insert);
-	attr->lro_cap = MLX5_GET(per_protocol_networking_offload_caps, hcattr,
-				 lro_cap);
-	attr->tunnel_lro_gre = MLX5_GET(per_protocol_networking_offload_caps,
-					hcattr, tunnel_lro_gre);
-	attr->tunnel_lro_vxlan = MLX5_GET(per_protocol_networking_offload_caps,
-					  hcattr, tunnel_lro_vxlan);
-	attr->lro_max_msg_sz_mode = MLX5_GET
-					(per_protocol_networking_offload_caps,
-					 hcattr, lro_max_msg_sz_mode);
-	for (int i = 0 ; i < MLX5_LRO_NUM_SUPP_PERIODS ; i++) {
-		attr->lro_timer_supported_periods[i] =
-			MLX5_GET(per_protocol_networking_offload_caps, hcattr,
-				 lro_timer_supported_periods[i]);
-	}
-	attr->tunnel_stateless_geneve_rx =
-			    MLX5_GET(per_protocol_networking_offload_caps,
-				     hcattr, tunnel_stateless_geneve_rx);
-	attr->geneve_max_opt_len =
-		    MLX5_GET(per_protocol_networking_offload_caps,
-			     hcattr, max_geneve_opt_len);
-	attr->wqe_inline_mode = MLX5_GET(per_protocol_networking_offload_caps,
-					 hcattr, wqe_inline_mode);
-	attr->tunnel_stateless_gtp = MLX5_GET
-					(per_protocol_networking_offload_caps,
-					 hcattr, tunnel_stateless_gtp);
-	if (attr->wqe_inline_mode != MLX5_CAP_INLINE_MODE_VPORT_CONTEXT)
-		return 0;
-	if (attr->eth_virt) {
-		rc = mlx5_devx_cmd_query_nic_vport_context(ctx, 0, attr);
-		if (rc) {
-			attr->eth_virt = 0;
-			goto error;
-		}
-	}
-	return 0;
-error:
-	rc = (rc > 0) ? -rc : rc;
-	return rc;
-}
-
-/**
- * Query TIS transport domain from QP verbs object using DevX API.
- *
- * @param[in] qp
- *   Pointer to verbs QP returned by ibv_create_qp .
- * @param[in] tis_num
- *   TIS number of TIS to query.
- * @param[out] tis_td
- *   Pointer to TIS transport domain variable, to be set by the routine.
- *
- * @return
- *   0 on success, a negative value otherwise.
- */
-int
-mlx5_devx_cmd_qp_query_tis_td(struct ibv_qp *qp, uint32_t tis_num,
-			      uint32_t *tis_td)
-{
-	uint32_t in[MLX5_ST_SZ_DW(query_tis_in)] = {0};
-	uint32_t out[MLX5_ST_SZ_DW(query_tis_out)] = {0};
-	int rc;
-	void *tis_ctx;
-
-	MLX5_SET(query_tis_in, in, opcode, MLX5_CMD_OP_QUERY_TIS);
-	MLX5_SET(query_tis_in, in, tisn, tis_num);
-	rc = mlx5_glue->devx_qp_query(qp, in, sizeof(in), out, sizeof(out));
-	if (rc) {
-		DRV_LOG(ERR, "Failed to query QP using DevX");
-		return -rc;
-	};
-	tis_ctx = MLX5_ADDR_OF(query_tis_out, out, tis_context);
-	*tis_td = MLX5_GET(tisc, tis_ctx, transport_domain);
-	return 0;
-}
-
-/**
- * Fill WQ data for DevX API command.
- * Utility function for use when creating DevX objects containing a WQ.
- *
- * @param[in] wq_ctx
- *   Pointer to WQ context to fill with data.
- * @param [in] wq_attr
- *   Pointer to WQ attributes structure to fill in WQ context.
- */
-static void
-devx_cmd_fill_wq_data(void *wq_ctx, struct mlx5_devx_wq_attr *wq_attr)
-{
-	MLX5_SET(wq, wq_ctx, wq_type, wq_attr->wq_type);
-	MLX5_SET(wq, wq_ctx, wq_signature, wq_attr->wq_signature);
-	MLX5_SET(wq, wq_ctx, end_padding_mode, wq_attr->end_padding_mode);
-	MLX5_SET(wq, wq_ctx, cd_slave, wq_attr->cd_slave);
-	MLX5_SET(wq, wq_ctx, hds_skip_first_sge, wq_attr->hds_skip_first_sge);
-	MLX5_SET(wq, wq_ctx, log2_hds_buf_size, wq_attr->log2_hds_buf_size);
-	MLX5_SET(wq, wq_ctx, page_offset, wq_attr->page_offset);
-	MLX5_SET(wq, wq_ctx, lwm, wq_attr->lwm);
-	MLX5_SET(wq, wq_ctx, pd, wq_attr->pd);
-	MLX5_SET(wq, wq_ctx, uar_page, wq_attr->uar_page);
-	MLX5_SET64(wq, wq_ctx, dbr_addr, wq_attr->dbr_addr);
-	MLX5_SET(wq, wq_ctx, hw_counter, wq_attr->hw_counter);
-	MLX5_SET(wq, wq_ctx, sw_counter, wq_attr->sw_counter);
-	MLX5_SET(wq, wq_ctx, log_wq_stride, wq_attr->log_wq_stride);
-	MLX5_SET(wq, wq_ctx, log_wq_pg_sz, wq_attr->log_wq_pg_sz);
-	MLX5_SET(wq, wq_ctx, log_wq_sz, wq_attr->log_wq_sz);
-	MLX5_SET(wq, wq_ctx, dbr_umem_valid, wq_attr->dbr_umem_valid);
-	MLX5_SET(wq, wq_ctx, wq_umem_valid, wq_attr->wq_umem_valid);
-	MLX5_SET(wq, wq_ctx, log_hairpin_num_packets,
-		 wq_attr->log_hairpin_num_packets);
-	MLX5_SET(wq, wq_ctx, log_hairpin_data_sz, wq_attr->log_hairpin_data_sz);
-	MLX5_SET(wq, wq_ctx, single_wqe_log_num_of_strides,
-		 wq_attr->single_wqe_log_num_of_strides);
-	MLX5_SET(wq, wq_ctx, two_byte_shift_en, wq_attr->two_byte_shift_en);
-	MLX5_SET(wq, wq_ctx, single_stride_log_num_of_bytes,
-		 wq_attr->single_stride_log_num_of_bytes);
-	MLX5_SET(wq, wq_ctx, dbr_umem_id, wq_attr->dbr_umem_id);
-	MLX5_SET(wq, wq_ctx, wq_umem_id, wq_attr->wq_umem_id);
-	MLX5_SET64(wq, wq_ctx, wq_umem_offset, wq_attr->wq_umem_offset);
-}
-
-/**
- * Create RQ using DevX API.
- *
- * @param[in] ctx
- *   ibv_context returned from mlx5dv_open_device.
- * @param [in] rq_attr
- *   Pointer to create RQ attributes structure.
- * @param [in] socket
- *   CPU socket ID for allocations.
- *
- * @return
- *   The DevX object created, NULL otherwise and rte_errno is set.
- */
-struct mlx5_devx_obj *
-mlx5_devx_cmd_create_rq(struct ibv_context *ctx,
-			struct mlx5_devx_create_rq_attr *rq_attr,
-			int socket)
-{
-	uint32_t in[MLX5_ST_SZ_DW(create_rq_in)] = {0};
-	uint32_t out[MLX5_ST_SZ_DW(create_rq_out)] = {0};
-	void *rq_ctx, *wq_ctx;
-	struct mlx5_devx_wq_attr *wq_attr;
-	struct mlx5_devx_obj *rq = NULL;
-
-	rq = rte_calloc_socket(__func__, 1, sizeof(*rq), 0, socket);
-	if (!rq) {
-		DRV_LOG(ERR, "Failed to allocate RQ data");
-		rte_errno = ENOMEM;
-		return NULL;
-	}
-	MLX5_SET(create_rq_in, in, opcode, MLX5_CMD_OP_CREATE_RQ);
-	rq_ctx = MLX5_ADDR_OF(create_rq_in, in, ctx);
-	MLX5_SET(rqc, rq_ctx, rlky, rq_attr->rlky);
-	MLX5_SET(rqc, rq_ctx, delay_drop_en, rq_attr->delay_drop_en);
-	MLX5_SET(rqc, rq_ctx, scatter_fcs, rq_attr->scatter_fcs);
-	MLX5_SET(rqc, rq_ctx, vsd, rq_attr->vsd);
-	MLX5_SET(rqc, rq_ctx, mem_rq_type, rq_attr->mem_rq_type);
-	MLX5_SET(rqc, rq_ctx, state, rq_attr->state);
-	MLX5_SET(rqc, rq_ctx, flush_in_error_en, rq_attr->flush_in_error_en);
-	MLX5_SET(rqc, rq_ctx, hairpin, rq_attr->hairpin);
-	MLX5_SET(rqc, rq_ctx, user_index, rq_attr->user_index);
-	MLX5_SET(rqc, rq_ctx, cqn, rq_attr->cqn);
-	MLX5_SET(rqc, rq_ctx, counter_set_id, rq_attr->counter_set_id);
-	MLX5_SET(rqc, rq_ctx, rmpn, rq_attr->rmpn);
-	wq_ctx = MLX5_ADDR_OF(rqc, rq_ctx, wq);
-	wq_attr = &rq_attr->wq_attr;
-	devx_cmd_fill_wq_data(wq_ctx, wq_attr);
-	rq->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in),
-						  out, sizeof(out));
-	if (!rq->obj) {
-		DRV_LOG(ERR, "Failed to create RQ using DevX");
-		rte_errno = errno;
-		rte_free(rq);
-		return NULL;
-	}
-	rq->id = MLX5_GET(create_rq_out, out, rqn);
-	return rq;
-}
-
-/**
- * Modify RQ using DevX API.
- *
- * @param[in] rq
- *   Pointer to RQ object structure.
- * @param [in] rq_attr
- *   Pointer to modify RQ attributes structure.
- *
- * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
- */
-int
-mlx5_devx_cmd_modify_rq(struct mlx5_devx_obj *rq,
-			struct mlx5_devx_modify_rq_attr *rq_attr)
-{
-	uint32_t in[MLX5_ST_SZ_DW(modify_rq_in)] = {0};
-	uint32_t out[MLX5_ST_SZ_DW(modify_rq_out)] = {0};
-	void *rq_ctx, *wq_ctx;
-	int ret;
-
-	MLX5_SET(modify_rq_in, in, opcode, MLX5_CMD_OP_MODIFY_RQ);
-	MLX5_SET(modify_rq_in, in, rq_state, rq_attr->rq_state);
-	MLX5_SET(modify_rq_in, in, rqn, rq->id);
-	MLX5_SET64(modify_rq_in, in, modify_bitmask, rq_attr->modify_bitmask);
-	rq_ctx = MLX5_ADDR_OF(modify_rq_in, in, ctx);
-	MLX5_SET(rqc, rq_ctx, state, rq_attr->state);
-	if (rq_attr->modify_bitmask &
-			MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_SCATTER_FCS)
-		MLX5_SET(rqc, rq_ctx, scatter_fcs, rq_attr->scatter_fcs);
-	if (rq_attr->modify_bitmask & MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_VSD)
-		MLX5_SET(rqc, rq_ctx, vsd, rq_attr->vsd);
-	if (rq_attr->modify_bitmask &
-			MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_RQ_COUNTER_SET_ID)
-		MLX5_SET(rqc, rq_ctx, counter_set_id, rq_attr->counter_set_id);
-	MLX5_SET(rqc, rq_ctx, hairpin_peer_sq, rq_attr->hairpin_peer_sq);
-	MLX5_SET(rqc, rq_ctx, hairpin_peer_vhca, rq_attr->hairpin_peer_vhca);
-	if (rq_attr->modify_bitmask & MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_WQ_LWM) {
-		wq_ctx = MLX5_ADDR_OF(rqc, rq_ctx, wq);
-		MLX5_SET(wq, wq_ctx, lwm, rq_attr->lwm);
-	}
-	ret = mlx5_glue->devx_obj_modify(rq->obj, in, sizeof(in),
-					 out, sizeof(out));
-	if (ret) {
-		DRV_LOG(ERR, "Failed to modify RQ using DevX");
-		rte_errno = errno;
-		return -errno;
-	}
-	return ret;
-}
-
-/**
- * Create TIR using DevX API.
- *
- * @param[in] ctx
- *   ibv_context returned from mlx5dv_open_device.
- * @param [in] tir_attr
- *   Pointer to TIR attributes structure.
- *
- * @return
- *   The DevX object created, NULL otherwise and rte_errno is set.
- */
-struct mlx5_devx_obj *
-mlx5_devx_cmd_create_tir(struct ibv_context *ctx,
-			 struct mlx5_devx_tir_attr *tir_attr)
-{
-	uint32_t in[MLX5_ST_SZ_DW(create_tir_in)] = {0};
-	uint32_t out[MLX5_ST_SZ_DW(create_tir_out)] = {0};
-	void *tir_ctx, *outer, *inner;
-	struct mlx5_devx_obj *tir = NULL;
-	int i;
-
-	tir = rte_calloc(__func__, 1, sizeof(*tir), 0);
-	if (!tir) {
-		DRV_LOG(ERR, "Failed to allocate TIR data");
-		rte_errno = ENOMEM;
-		return NULL;
-	}
-	MLX5_SET(create_tir_in, in, opcode, MLX5_CMD_OP_CREATE_TIR);
-	tir_ctx = MLX5_ADDR_OF(create_tir_in, in, ctx);
-	MLX5_SET(tirc, tir_ctx, disp_type, tir_attr->disp_type);
-	MLX5_SET(tirc, tir_ctx, lro_timeout_period_usecs,
-		 tir_attr->lro_timeout_period_usecs);
-	MLX5_SET(tirc, tir_ctx, lro_enable_mask, tir_attr->lro_enable_mask);
-	MLX5_SET(tirc, tir_ctx, lro_max_msg_sz, tir_attr->lro_max_msg_sz);
-	MLX5_SET(tirc, tir_ctx, inline_rqn, tir_attr->inline_rqn);
-	MLX5_SET(tirc, tir_ctx, rx_hash_symmetric, tir_attr->rx_hash_symmetric);
-	MLX5_SET(tirc, tir_ctx, tunneled_offload_en,
-		 tir_attr->tunneled_offload_en);
-	MLX5_SET(tirc, tir_ctx, indirect_table, tir_attr->indirect_table);
-	MLX5_SET(tirc, tir_ctx, rx_hash_fn, tir_attr->rx_hash_fn);
-	MLX5_SET(tirc, tir_ctx, self_lb_block, tir_attr->self_lb_block);
-	MLX5_SET(tirc, tir_ctx, transport_domain, tir_attr->transport_domain);
-	for (i = 0; i < 10; i++) {
-		MLX5_SET(tirc, tir_ctx, rx_hash_toeplitz_key[i],
-			 tir_attr->rx_hash_toeplitz_key[i]);
-	}
-	outer = MLX5_ADDR_OF(tirc, tir_ctx, rx_hash_field_selector_outer);
-	MLX5_SET(rx_hash_field_select, outer, l3_prot_type,
-		 tir_attr->rx_hash_field_selector_outer.l3_prot_type);
-	MLX5_SET(rx_hash_field_select, outer, l4_prot_type,
-		 tir_attr->rx_hash_field_selector_outer.l4_prot_type);
-	MLX5_SET(rx_hash_field_select, outer, selected_fields,
-		 tir_attr->rx_hash_field_selector_outer.selected_fields);
-	inner = MLX5_ADDR_OF(tirc, tir_ctx, rx_hash_field_selector_inner);
-	MLX5_SET(rx_hash_field_select, inner, l3_prot_type,
-		 tir_attr->rx_hash_field_selector_inner.l3_prot_type);
-	MLX5_SET(rx_hash_field_select, inner, l4_prot_type,
-		 tir_attr->rx_hash_field_selector_inner.l4_prot_type);
-	MLX5_SET(rx_hash_field_select, inner, selected_fields,
-		 tir_attr->rx_hash_field_selector_inner.selected_fields);
-	tir->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in),
-						   out, sizeof(out));
-	if (!tir->obj) {
-		DRV_LOG(ERR, "Failed to create TIR using DevX");
-		rte_errno = errno;
-		rte_free(tir);
-		return NULL;
-	}
-	tir->id = MLX5_GET(create_tir_out, out, tirn);
-	return tir;
-}
-
-/**
- * Create RQT using DevX API.
- *
- * @param[in] ctx
- *   ibv_context returned from mlx5dv_open_device.
- * @param [in] rqt_attr
- *   Pointer to RQT attributes structure.
- *
- * @return
- *   The DevX object created, NULL otherwise and rte_errno is set.
- */
-struct mlx5_devx_obj *
-mlx5_devx_cmd_create_rqt(struct ibv_context *ctx,
-			 struct mlx5_devx_rqt_attr *rqt_attr)
-{
-	uint32_t *in = NULL;
-	uint32_t inlen = MLX5_ST_SZ_BYTES(create_rqt_in) +
-			 rqt_attr->rqt_actual_size * sizeof(uint32_t);
-	uint32_t out[MLX5_ST_SZ_DW(create_rqt_out)] = {0};
-	void *rqt_ctx;
-	struct mlx5_devx_obj *rqt = NULL;
-	int i;
-
-	in = rte_calloc(__func__, 1, inlen, 0);
-	if (!in) {
-		DRV_LOG(ERR, "Failed to allocate RQT IN data");
-		rte_errno = ENOMEM;
-		return NULL;
-	}
-	rqt = rte_calloc(__func__, 1, sizeof(*rqt), 0);
-	if (!rqt) {
-		DRV_LOG(ERR, "Failed to allocate RQT data");
-		rte_errno = ENOMEM;
-		rte_free(in);
-		return NULL;
-	}
-	MLX5_SET(create_rqt_in, in, opcode, MLX5_CMD_OP_CREATE_RQT);
-	rqt_ctx = MLX5_ADDR_OF(create_rqt_in, in, rqt_context);
-	MLX5_SET(rqtc, rqt_ctx, rqt_max_size, rqt_attr->rqt_max_size);
-	MLX5_SET(rqtc, rqt_ctx, rqt_actual_size, rqt_attr->rqt_actual_size);
-	for (i = 0; i < rqt_attr->rqt_actual_size; i++)
-		MLX5_SET(rqtc, rqt_ctx, rq_num[i], rqt_attr->rq_list[i]);
-	rqt->obj = mlx5_glue->devx_obj_create(ctx, in, inlen, out, sizeof(out));
-	rte_free(in);
-	if (!rqt->obj) {
-		DRV_LOG(ERR, "Failed to create RQT using DevX");
-		rte_errno = errno;
-		rte_free(rqt);
-		return NULL;
-	}
-	rqt->id = MLX5_GET(create_rqt_out, out, rqtn);
-	return rqt;
-}
-
-/**
- * Create SQ using DevX API.
- *
- * @param[in] ctx
- *   ibv_context returned from mlx5dv_open_device.
- * @param [in] sq_attr
- *   Pointer to SQ attributes structure.
- * @param [in] socket
- *   CPU socket ID for allocations.
- *
- * @return
- *   The DevX object created, NULL otherwise and rte_errno is set.
- **/
-struct mlx5_devx_obj *
-mlx5_devx_cmd_create_sq(struct ibv_context *ctx,
-			struct mlx5_devx_create_sq_attr *sq_attr)
-{
-	uint32_t in[MLX5_ST_SZ_DW(create_sq_in)] = {0};
-	uint32_t out[MLX5_ST_SZ_DW(create_sq_out)] = {0};
-	void *sq_ctx;
-	void *wq_ctx;
-	struct mlx5_devx_wq_attr *wq_attr;
-	struct mlx5_devx_obj *sq = NULL;
-
-	sq = rte_calloc(__func__, 1, sizeof(*sq), 0);
-	if (!sq) {
-		DRV_LOG(ERR, "Failed to allocate SQ data");
-		rte_errno = ENOMEM;
-		return NULL;
-	}
-	MLX5_SET(create_sq_in, in, opcode, MLX5_CMD_OP_CREATE_SQ);
-	sq_ctx = MLX5_ADDR_OF(create_sq_in, in, ctx);
-	MLX5_SET(sqc, sq_ctx, rlky, sq_attr->rlky);
-	MLX5_SET(sqc, sq_ctx, cd_master, sq_attr->cd_master);
-	MLX5_SET(sqc, sq_ctx, fre, sq_attr->fre);
-	MLX5_SET(sqc, sq_ctx, flush_in_error_en, sq_attr->flush_in_error_en);
-	MLX5_SET(sqc, sq_ctx, allow_multi_pkt_send_wqe,
-		 sq_attr->flush_in_error_en);
-	MLX5_SET(sqc, sq_ctx, min_wqe_inline_mode,
-		 sq_attr->min_wqe_inline_mode);
-	MLX5_SET(sqc, sq_ctx, state, sq_attr->state);
-	MLX5_SET(sqc, sq_ctx, reg_umr, sq_attr->reg_umr);
-	MLX5_SET(sqc, sq_ctx, allow_swp, sq_attr->allow_swp);
-	MLX5_SET(sqc, sq_ctx, hairpin, sq_attr->hairpin);
-	MLX5_SET(sqc, sq_ctx, user_index, sq_attr->user_index);
-	MLX5_SET(sqc, sq_ctx, cqn, sq_attr->cqn);
-	MLX5_SET(sqc, sq_ctx, packet_pacing_rate_limit_index,
-		 sq_attr->packet_pacing_rate_limit_index);
-	MLX5_SET(sqc, sq_ctx, tis_lst_sz, sq_attr->tis_lst_sz);
-	MLX5_SET(sqc, sq_ctx, tis_num_0, sq_attr->tis_num);
-	wq_ctx = MLX5_ADDR_OF(sqc, sq_ctx, wq);
-	wq_attr = &sq_attr->wq_attr;
-	devx_cmd_fill_wq_data(wq_ctx, wq_attr);
-	sq->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in),
-					     out, sizeof(out));
-	if (!sq->obj) {
-		DRV_LOG(ERR, "Failed to create SQ using DevX");
-		rte_errno = errno;
-		rte_free(sq);
-		return NULL;
-	}
-	sq->id = MLX5_GET(create_sq_out, out, sqn);
-	return sq;
-}
-
-/**
- * Modify SQ using DevX API.
- *
- * @param[in] sq
- *   Pointer to SQ object structure.
- * @param [in] sq_attr
- *   Pointer to SQ attributes structure.
- *
- * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
- */
-int
-mlx5_devx_cmd_modify_sq(struct mlx5_devx_obj *sq,
-			struct mlx5_devx_modify_sq_attr *sq_attr)
-{
-	uint32_t in[MLX5_ST_SZ_DW(modify_sq_in)] = {0};
-	uint32_t out[MLX5_ST_SZ_DW(modify_sq_out)] = {0};
-	void *sq_ctx;
-	int ret;
-
-	MLX5_SET(modify_sq_in, in, opcode, MLX5_CMD_OP_MODIFY_SQ);
-	MLX5_SET(modify_sq_in, in, sq_state, sq_attr->sq_state);
-	MLX5_SET(modify_sq_in, in, sqn, sq->id);
-	sq_ctx = MLX5_ADDR_OF(modify_sq_in, in, ctx);
-	MLX5_SET(sqc, sq_ctx, state, sq_attr->state);
-	MLX5_SET(sqc, sq_ctx, hairpin_peer_rq, sq_attr->hairpin_peer_rq);
-	MLX5_SET(sqc, sq_ctx, hairpin_peer_vhca, sq_attr->hairpin_peer_vhca);
-	ret = mlx5_glue->devx_obj_modify(sq->obj, in, sizeof(in),
-					 out, sizeof(out));
-	if (ret) {
-		DRV_LOG(ERR, "Failed to modify SQ using DevX");
-		rte_errno = errno;
-		return -errno;
-	}
-	return ret;
-}
-
-/**
- * Create TIS using DevX API.
- *
- * @param[in] ctx
- *   ibv_context returned from mlx5dv_open_device.
- * @param [in] tis_attr
- *   Pointer to TIS attributes structure.
- *
- * @return
- *   The DevX object created, NULL otherwise and rte_errno is set.
- */
-struct mlx5_devx_obj *
-mlx5_devx_cmd_create_tis(struct ibv_context *ctx,
-			 struct mlx5_devx_tis_attr *tis_attr)
-{
-	uint32_t in[MLX5_ST_SZ_DW(create_tis_in)] = {0};
-	uint32_t out[MLX5_ST_SZ_DW(create_tis_out)] = {0};
-	struct mlx5_devx_obj *tis = NULL;
-	void *tis_ctx;
-
-	tis = rte_calloc(__func__, 1, sizeof(*tis), 0);
-	if (!tis) {
-		DRV_LOG(ERR, "Failed to allocate TIS object");
-		rte_errno = ENOMEM;
-		return NULL;
-	}
-	MLX5_SET(create_tis_in, in, opcode, MLX5_CMD_OP_CREATE_TIS);
-	tis_ctx = MLX5_ADDR_OF(create_tis_in, in, ctx);
-	MLX5_SET(tisc, tis_ctx, strict_lag_tx_port_affinity,
-		 tis_attr->strict_lag_tx_port_affinity);
-	MLX5_SET(tisc, tis_ctx, strict_lag_tx_port_affinity,
-		 tis_attr->strict_lag_tx_port_affinity);
-	MLX5_SET(tisc, tis_ctx, prio, tis_attr->prio);
-	MLX5_SET(tisc, tis_ctx, transport_domain,
-		 tis_attr->transport_domain);
-	tis->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in),
-					      out, sizeof(out));
-	if (!tis->obj) {
-		DRV_LOG(ERR, "Failed to create TIS using DevX");
-		rte_errno = errno;
-		rte_free(tis);
-		return NULL;
-	}
-	tis->id = MLX5_GET(create_tis_out, out, tisn);
-	return tis;
-}
-
-/**
- * Create transport domain using DevX API.
- *
- * @param[in] ctx
- *   ibv_context returned from mlx5dv_open_device.
- *
- * @return
- *   The DevX object created, NULL otherwise and rte_errno is set.
- */
-struct mlx5_devx_obj *
-mlx5_devx_cmd_create_td(struct ibv_context *ctx)
-{
-	uint32_t in[MLX5_ST_SZ_DW(alloc_transport_domain_in)] = {0};
-	uint32_t out[MLX5_ST_SZ_DW(alloc_transport_domain_out)] = {0};
-	struct mlx5_devx_obj *td = NULL;
-
-	td = rte_calloc(__func__, 1, sizeof(*td), 0);
-	if (!td) {
-		DRV_LOG(ERR, "Failed to allocate TD object");
-		rte_errno = ENOMEM;
-		return NULL;
-	}
-	MLX5_SET(alloc_transport_domain_in, in, opcode,
-		 MLX5_CMD_OP_ALLOC_TRANSPORT_DOMAIN);
-	td->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in),
-					     out, sizeof(out));
-	if (!td->obj) {
-		DRV_LOG(ERR, "Failed to create TIS using DevX");
-		rte_errno = errno;
-		rte_free(td);
-		return NULL;
-	}
-	td->id = MLX5_GET(alloc_transport_domain_out, out,
-			   transport_domain);
-	return td;
-}
-
-/**
- * Dump all flows to file.
- *
- * @param[in] fdb_domain
- *   FDB domain.
- * @param[in] rx_domain
- *   RX domain.
- * @param[in] tx_domain
- *   TX domain.
- * @param[out] file
- *   Pointer to file stream.
- *
- * @return
- *   0 on success, a nagative value otherwise.
- */
-int
-mlx5_devx_cmd_flow_dump(void *fdb_domain __rte_unused,
-			void *rx_domain __rte_unused,
-			void *tx_domain __rte_unused, FILE *file __rte_unused)
-{
-	int ret = 0;
-
-#ifdef HAVE_MLX5_DR_FLOW_DUMP
-	if (fdb_domain) {
-		ret = mlx5_glue->dr_dump_domain(file, fdb_domain);
-		if (ret)
-			return ret;
-	}
-	assert(rx_domain);
-	ret = mlx5_glue->dr_dump_domain(file, rx_domain);
-	if (ret)
-		return ret;
-	assert(tx_domain);
-	ret = mlx5_glue->dr_dump_domain(file, tx_domain);
-#else
-	ret = ENOTSUP;
-#endif
-	return -ret;
-}
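Note on the return convention of the removed mlx5_devx_cmd_flow_dump(): its early exits return `ret` unchanged, while the final exit returns `-ret`, so a build without HAVE_MLX5_DR_FLOW_DUMP yields -ENOTSUP. Below is a minimal standalone sketch of the same control flow (optional FDB domain, then RX, then TX). It assumes the dump callback returns 0 or a positive errno-style code, and it negates on every exit, which is a simplification and not what the removed code does. The names `dump_fn`, `dump_ok`, `dump_fail` and `flow_dump_sketch` are hypothetical stand-ins and are not part of the driver.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for mlx5_glue->dr_dump_domain(). Assumed to return
 * 0 on success or a positive errno-style code on failure.
 */
typedef int (*dump_fn)(FILE *file, void *domain);

/* Stub callback that always succeeds. */
static int
dump_ok(FILE *file, void *domain)
{
	(void)file;
	(void)domain;
	return 0;
}

/* Stub callback that always fails with a positive EIO. */
static int
dump_fail(FILE *file, void *domain)
{
	(void)file;
	(void)domain;
	return EIO;
}

/*
 * Dump the optional FDB domain, then the RX and TX domains, stopping at the
 * first failure. A NULL callback models a build without dump support.
 * Every exit returns 0 or a negative errno value (an assumption of this
 * sketch; the removed function negates only on its final return).
 */
static int
flow_dump_sketch(dump_fn dump, void *fdb, void *rx, void *tx, FILE *file)
{
	int ret;

	if (dump == NULL)
		return -ENOTSUP;
	if (fdb != NULL) {
		ret = dump(file, fdb);
		if (ret)
			return -ret;
	}
	assert(rx != NULL);
	ret = dump(file, rx);
	if (ret)
		return -ret;
	assert(tx != NULL);
	ret = dump(file, tx);
	return -ret;
}
```

With these stubs, a call with `dump_ok` returns 0, a call with `dump_fail` returns -EIO, and a NULL callback returns -ENOTSUP.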
diff --git a/drivers/net/mlx5/mlx5_devx_cmds.h b/drivers/net/mlx5/mlx5_devx_cmds.h
deleted file mode 100644
index 2d58d96..0000000
--- a/drivers/net/mlx5/mlx5_devx_cmds.h
+++ /dev/null
@@ -1,231 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause
- * Copyright 2019 Mellanox Technologies, Ltd
- */
-
-#ifndef RTE_PMD_MLX5_DEVX_CMDS_H_
-#define RTE_PMD_MLX5_DEVX_CMDS_H_
-
-#include "mlx5_glue.h"
-
-
-/* devX creation object */
-struct mlx5_devx_obj {
-	struct mlx5dv_devx_obj *obj; /* The DV object. */
-	int id; /* The object ID. */
-};
-
-struct mlx5_devx_mkey_attr {
-	uint64_t addr;
-	uint64_t size;
-	uint32_t umem_id;
-	uint32_t pd;
-};
-
-/* HCA qos attributes. */
-struct mlx5_hca_qos_attr {
-	uint32_t sup:1;	/* Whether QOS is supported. */
-	uint32_t srtcm_sup:1; /* Whether srTCM mode is supported. */
-	uint32_t flow_meter_reg_share:1;
-	/* Whether reg_c share is supported. */
-	uint8_t log_max_flow_meter;
-	/* Power of the maximum supported meters. */
-	uint8_t flow_meter_reg_c_ids;
-	/* Bitmap of the reg_Cs available for flow meter to use. */
-
-};
-
-/* HCA supports this number of time periods for LRO. */
-#define MLX5_LRO_NUM_SUPP_PERIODS 4
-
-/* HCA attributes. */
-struct mlx5_hca_attr {
-	uint32_t eswitch_manager:1;
-	uint32_t flow_counters_dump:1;
-	uint8_t flow_counter_bulk_alloc_bitmap;
-	uint32_t eth_net_offloads:1;
-	uint32_t eth_virt:1;
-	uint32_t wqe_vlan_insert:1;
-	uint32_t wqe_inline_mode:2;
-	uint32_t vport_inline_mode:3;
-	uint32_t tunnel_stateless_geneve_rx:1;
-	uint32_t geneve_max_opt_len:1; /* 0x0: 14DW, 0x1: 63DW */
-	uint32_t tunnel_stateless_gtp:1;
-	uint32_t lro_cap:1;
-	uint32_t tunnel_lro_gre:1;
-	uint32_t tunnel_lro_vxlan:1;
-	uint32_t lro_max_msg_sz_mode:2;
-	uint32_t lro_timer_supported_periods[MLX5_LRO_NUM_SUPP_PERIODS];
-	uint32_t flex_parser_protocols;
-	uint32_t hairpin:1;
-	uint32_t log_max_hairpin_queues:5;
-	uint32_t log_max_hairpin_wq_data_sz:5;
-	uint32_t log_max_hairpin_num_packets:5;
-	uint32_t vhca_id:16;
-	struct mlx5_hca_qos_attr qos;
-};
-
-struct mlx5_devx_wq_attr {
-	uint32_t wq_type:4;
-	uint32_t wq_signature:1;
-	uint32_t end_padding_mode:2;
-	uint32_t cd_slave:1;
-	uint32_t hds_skip_first_sge:1;
-	uint32_t log2_hds_buf_size:3;
-	uint32_t page_offset:5;
-	uint32_t lwm:16;
-	uint32_t pd:24;
-	uint32_t uar_page:24;
-	uint64_t dbr_addr;
-	uint32_t hw_counter;
-	uint32_t sw_counter;
-	uint32_t log_wq_stride:4;
-	uint32_t log_wq_pg_sz:5;
-	uint32_t log_wq_sz:5;
-	uint32_t dbr_umem_valid:1;
-	uint32_t wq_umem_valid:1;
-	uint32_t log_hairpin_num_packets:5;
-	uint32_t log_hairpin_data_sz:5;
-	uint32_t single_wqe_log_num_of_strides:4;
-	uint32_t two_byte_shift_en:1;
-	uint32_t single_stride_log_num_of_bytes:3;
-	uint32_t dbr_umem_id;
-	uint32_t wq_umem_id;
-	uint64_t wq_umem_offset;
-};
-
-/* Create RQ attributes structure, used by create RQ operation. */
-struct mlx5_devx_create_rq_attr {
-	uint32_t rlky:1;
-	uint32_t delay_drop_en:1;
-	uint32_t scatter_fcs:1;
-	uint32_t vsd:1;
-	uint32_t mem_rq_type:4;
-	uint32_t state:4;
-	uint32_t flush_in_error_en:1;
-	uint32_t hairpin:1;
-	uint32_t user_index:24;
-	uint32_t cqn:24;
-	uint32_t counter_set_id:8;
-	uint32_t rmpn:24;
-	struct mlx5_devx_wq_attr wq_attr;
-};
-
-/* Modify RQ attributes structure, used by modify RQ operation. */
-struct mlx5_devx_modify_rq_attr {
-	uint32_t rqn:24;
-	uint32_t rq_state:4; /* Current RQ state. */
-	uint32_t state:4; /* Required RQ state. */
-	uint32_t scatter_fcs:1;
-	uint32_t vsd:1;
-	uint32_t counter_set_id:8;
-	uint32_t hairpin_peer_sq:24;
-	uint32_t hairpin_peer_vhca:16;
-	uint64_t modify_bitmask;
-	uint32_t lwm:16; /* Contained WQ lwm. */
-};
-
-struct mlx5_rx_hash_field_select {
-	uint32_t l3_prot_type:1;
-	uint32_t l4_prot_type:1;
-	uint32_t selected_fields:30;
-};
-
-/* TIR attributes structure, used by TIR operations. */
-struct mlx5_devx_tir_attr {
-	uint32_t disp_type:4;
-	uint32_t lro_timeout_period_usecs:16;
-	uint32_t lro_enable_mask:4;
-	uint32_t lro_max_msg_sz:8;
-	uint32_t inline_rqn:24;
-	uint32_t rx_hash_symmetric:1;
-	uint32_t tunneled_offload_en:1;
-	uint32_t indirect_table:24;
-	uint32_t rx_hash_fn:4;
-	uint32_t self_lb_block:2;
-	uint32_t transport_domain:24;
-	uint32_t rx_hash_toeplitz_key[10];
-	struct mlx5_rx_hash_field_select rx_hash_field_selector_outer;
-	struct mlx5_rx_hash_field_select rx_hash_field_selector_inner;
-};
-
-/* RQT attributes structure, used by RQT operations. */
-struct mlx5_devx_rqt_attr {
-	uint32_t rqt_max_size:16;
-	uint32_t rqt_actual_size:16;
-	uint32_t rq_list[];
-};
-
-/* TIS attributes structure. */
-struct mlx5_devx_tis_attr {
-	uint32_t strict_lag_tx_port_affinity:1;
-	uint32_t tls_en:1;
-	uint32_t lag_tx_port_affinity:4;
-	uint32_t prio:4;
-	uint32_t transport_domain:24;
-};
-
-/* SQ attributes structure, used by SQ create operation. */
-struct mlx5_devx_create_sq_attr {
-	uint32_t rlky:1;
-	uint32_t cd_master:1;
-	uint32_t fre:1;
-	uint32_t flush_in_error_en:1;
-	uint32_t allow_multi_pkt_send_wqe:1;
-	uint32_t min_wqe_inline_mode:3;
-	uint32_t state:4;
-	uint32_t reg_umr:1;
-	uint32_t allow_swp:1;
-	uint32_t hairpin:1;
-	uint32_t user_index:24;
-	uint32_t cqn:24;
-	uint32_t packet_pacing_rate_limit_index:16;
-	uint32_t tis_lst_sz:16;
-	uint32_t tis_num:24;
-	struct mlx5_devx_wq_attr wq_attr;
-};
-
-/* SQ attributes structure, used by SQ modify operation. */
-struct mlx5_devx_modify_sq_attr {
-	uint32_t sq_state:4;
-	uint32_t state:4;
-	uint32_t hairpin_peer_rq:24;
-	uint32_t hairpin_peer_vhca:16;
-};
-
-/* mlx5_devx_cmds.c */
-
-struct mlx5_devx_obj *mlx5_devx_cmd_flow_counter_alloc(struct ibv_context *ctx,
-						       uint32_t bulk_sz);
-int mlx5_devx_cmd_destroy(struct mlx5_devx_obj *obj);
-int mlx5_devx_cmd_flow_counter_query(struct mlx5_devx_obj *dcs,
-				     int clear, uint32_t n_counters,
-				     uint64_t *pkts, uint64_t *bytes,
-				     uint32_t mkey, void *addr,
-				     struct mlx5dv_devx_cmd_comp *cmd_comp,
-				     uint64_t async_id);
-int mlx5_devx_cmd_query_hca_attr(struct ibv_context *ctx,
-				 struct mlx5_hca_attr *attr);
-struct mlx5_devx_obj *mlx5_devx_cmd_mkey_create(struct ibv_context *ctx,
-					      struct mlx5_devx_mkey_attr *attr);
-int mlx5_devx_get_out_command_status(void *out);
-int mlx5_devx_cmd_qp_query_tis_td(struct ibv_qp *qp, uint32_t tis_num,
-				  uint32_t *tis_td);
-struct mlx5_devx_obj *mlx5_devx_cmd_create_rq(struct ibv_context *ctx,
-				       struct mlx5_devx_create_rq_attr *rq_attr,
-				       int socket);
-int mlx5_devx_cmd_modify_rq(struct mlx5_devx_obj *rq,
-			    struct mlx5_devx_modify_rq_attr *rq_attr);
-struct mlx5_devx_obj *mlx5_devx_cmd_create_tir(struct ibv_context *ctx,
-					   struct mlx5_devx_tir_attr *tir_attr);
-struct mlx5_devx_obj *mlx5_devx_cmd_create_rqt(struct ibv_context *ctx,
-					   struct mlx5_devx_rqt_attr *rqt_attr);
-struct mlx5_devx_obj *mlx5_devx_cmd_create_sq(struct ibv_context *ctx,
-				      struct mlx5_devx_create_sq_attr *sq_attr);
-int mlx5_devx_cmd_modify_sq(struct mlx5_devx_obj *sq,
-			    struct mlx5_devx_modify_sq_attr *sq_attr);
-struct mlx5_devx_obj *mlx5_devx_cmd_create_tis(struct ibv_context *ctx,
-					   struct mlx5_devx_tis_attr *tis_attr);
-struct mlx5_devx_obj *mlx5_devx_cmd_create_td(struct ibv_context *ctx);
-int mlx5_devx_cmd_flow_dump(void *fdb_domain, void *rx_domain, void *tx_domain,
-			    FILE *file);
-#endif /* RTE_PMD_MLX5_DEVX_CMDS_H_ */
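Note on struct mlx5_devx_rqt_attr in the header above: it ends in a flexible array member (`rq_list[]`), so a caller has to allocate the structure together with room for the RQ numbers. The following is a minimal sketch of such an allocation. It uses a local copy of the layout and plain calloc() so that it builds without DPDK or rdma-core. `rqt_attr_sketch` and `rqt_attr_alloc` are hypothetical names, and setting both size fields to `n` is a simplification made for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Local copy of the layout of struct mlx5_devx_rqt_attr, for illustration. */
struct rqt_attr_sketch {
	uint32_t rqt_max_size:16;
	uint32_t rqt_actual_size:16;
	uint32_t rq_list[];
};

/*
 * Hypothetical helper: allocate an attribute structure holding n RQ numbers.
 * Returns NULL on allocation failure; the caller releases it with free().
 */
static struct rqt_attr_sketch *
rqt_attr_alloc(const uint32_t *rqns, uint16_t n)
{
	struct rqt_attr_sketch *attr;
	uint16_t i;

	assert(rqns != NULL || n == 0);
	/* Header plus n trailing list entries in one zeroed allocation. */
	attr = calloc(1, sizeof(*attr) + (size_t)n * sizeof(attr->rq_list[0]));
	if (attr == NULL)
		return NULL;
	attr->rqt_max_size = n;
	attr->rqt_actual_size = n;
	for (i = 0; i < n; i++)
		attr->rq_list[i] = rqns[i];
	return attr;
}
```

Inside the driver the allocation would more likely go through rte_calloc(), as the other DevX helpers in this patch do for their objects.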
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index ce0109c..eddf888 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -36,9 +36,10 @@
 #include <rte_rwlock.h>
 #include <rte_cycles.h>
 
+#include <mlx5_glue.h>
+#include <mlx5_devx_cmds.h>
+
 #include "mlx5.h"
-#include "mlx5_glue.h"
-#include "mlx5_devx_cmds.h"
 #include "mlx5_rxtx.h"
 #include "mlx5_utils.h"
 
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 983b1c3..47ba521 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -27,12 +27,13 @@
 #include <rte_malloc.h>
 #include <rte_ip.h>
 
-#include "mlx5.h"
+#include <mlx5_glue.h>
+#include <mlx5_devx_cmds.h>
+#include <mlx5_prm.h>
+
 #include "mlx5_defs.h"
+#include "mlx5.h"
 #include "mlx5_flow.h"
-#include "mlx5_glue.h"
-#include "mlx5_devx_cmds.h"
-#include "mlx5_prm.h"
 #include "mlx5_rxtx.h"
 
 /* Dev ops structure defined in mlx5.c */
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 39be5ba..4255472 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -25,8 +25,9 @@
 #include <rte_alarm.h>
 #include <rte_mtr.h>
 
+#include <mlx5_prm.h>
+
 #include "mlx5.h"
-#include "mlx5_prm.h"
 
 /* Private rte flow items. */
 enum mlx5_rte_flow_item_type {
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 653d649..1b31602 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -29,12 +29,13 @@
 #include <rte_vxlan.h>
 #include <rte_gtp.h>
 
-#include "mlx5.h"
+#include <mlx5_glue.h>
+#include <mlx5_devx_cmds.h>
+#include <mlx5_prm.h>
+
 #include "mlx5_defs.h"
-#include "mlx5_glue.h"
-#include "mlx5_devx_cmds.h"
+#include "mlx5.h"
 #include "mlx5_flow.h"
-#include "mlx5_prm.h"
 #include "mlx5_rxtx.h"
 
 #ifdef HAVE_IBV_FLOW_DV_SUPPORT
diff --git a/drivers/net/mlx5/mlx5_flow_meter.c b/drivers/net/mlx5/mlx5_flow_meter.c
index c4d28b2..32d51c0 100644
--- a/drivers/net/mlx5/mlx5_flow_meter.c
+++ b/drivers/net/mlx5/mlx5_flow_meter.c
@@ -9,6 +9,8 @@
 #include <rte_mtr.h>
 #include <rte_mtr_driver.h>
 
+#include <mlx5_devx_cmds.h>
+
 #include "mlx5.h"
 #include "mlx5_flow.h"
 
diff --git a/drivers/net/mlx5/mlx5_flow_verbs.c b/drivers/net/mlx5/mlx5_flow_verbs.c
index 72fb1e4..9231451 100644
--- a/drivers/net/mlx5/mlx5_flow_verbs.c
+++ b/drivers/net/mlx5/mlx5_flow_verbs.c
@@ -26,11 +26,12 @@
 #include <rte_malloc.h>
 #include <rte_ip.h>
 
-#include "mlx5.h"
+#include <mlx5_glue.h>
+#include <mlx5_prm.h>
+
 #include "mlx5_defs.h"
+#include "mlx5.h"
 #include "mlx5_flow.h"
-#include "mlx5_glue.h"
-#include "mlx5_prm.h"
 #include "mlx5_rxtx.h"
 
 #define VERBS_SPEC_INNER(item_flags) \
diff --git a/drivers/net/mlx5/mlx5_glue.c b/drivers/net/mlx5/mlx5_glue.c
deleted file mode 100644
index 4906eeb..0000000
--- a/drivers/net/mlx5/mlx5_glue.c
+++ /dev/null
@@ -1,1150 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause
- * Copyright 2018 6WIND S.A.
- * Copyright 2018 Mellanox Technologies, Ltd
- */
-
-#include <errno.h>
-#include <stdalign.h>
-#include <stddef.h>
-#include <stdint.h>
-#include <stdlib.h>
-
-/*
- * Not needed by this file; included to work around the lack of off_t
- * definition for mlx5dv.h with unpatched rdma-core versions.
- */
-#include <sys/types.h>
-
-/* Verbs headers do not support -pedantic. */
-#ifdef PEDANTIC
-#pragma GCC diagnostic ignored "-Wpedantic"
-#endif
-#include <infiniband/mlx5dv.h>
-#include <infiniband/verbs.h>
-#ifdef PEDANTIC
-#pragma GCC diagnostic error "-Wpedantic"
-#endif
-
-#include <rte_config.h>
-
-#include "mlx5_autoconf.h"
-#include "mlx5_glue.h"
-
-static int
-mlx5_glue_fork_init(void)
-{
-	return ibv_fork_init();
-}
-
-static struct ibv_pd *
-mlx5_glue_alloc_pd(struct ibv_context *context)
-{
-	return ibv_alloc_pd(context);
-}
-
-static int
-mlx5_glue_dealloc_pd(struct ibv_pd *pd)
-{
-	return ibv_dealloc_pd(pd);
-}
-
-static struct ibv_device **
-mlx5_glue_get_device_list(int *num_devices)
-{
-	return ibv_get_device_list(num_devices);
-}
-
-static void
-mlx5_glue_free_device_list(struct ibv_device **list)
-{
-	ibv_free_device_list(list);
-}
-
-static struct ibv_context *
-mlx5_glue_open_device(struct ibv_device *device)
-{
-	return ibv_open_device(device);
-}
-
-static int
-mlx5_glue_close_device(struct ibv_context *context)
-{
-	return ibv_close_device(context);
-}
-
-static int
-mlx5_glue_query_device(struct ibv_context *context,
-		       struct ibv_device_attr *device_attr)
-{
-	return ibv_query_device(context, device_attr);
-}
-
-static int
-mlx5_glue_query_device_ex(struct ibv_context *context,
-			  const struct ibv_query_device_ex_input *input,
-			  struct ibv_device_attr_ex *attr)
-{
-	return ibv_query_device_ex(context, input, attr);
-}
-
-static int
-mlx5_glue_query_rt_values_ex(struct ibv_context *context,
-			  struct ibv_values_ex *values)
-{
-	return ibv_query_rt_values_ex(context, values);
-}
-
-static int
-mlx5_glue_query_port(struct ibv_context *context, uint8_t port_num,
-		     struct ibv_port_attr *port_attr)
-{
-	return ibv_query_port(context, port_num, port_attr);
-}
-
-static struct ibv_comp_channel *
-mlx5_glue_create_comp_channel(struct ibv_context *context)
-{
-	return ibv_create_comp_channel(context);
-}
-
-static int
-mlx5_glue_destroy_comp_channel(struct ibv_comp_channel *channel)
-{
-	return ibv_destroy_comp_channel(channel);
-}
-
-static struct ibv_cq *
-mlx5_glue_create_cq(struct ibv_context *context, int cqe, void *cq_context,
-		    struct ibv_comp_channel *channel, int comp_vector)
-{
-	return ibv_create_cq(context, cqe, cq_context, channel, comp_vector);
-}
-
-static int
-mlx5_glue_destroy_cq(struct ibv_cq *cq)
-{
-	return ibv_destroy_cq(cq);
-}
-
-static int
-mlx5_glue_get_cq_event(struct ibv_comp_channel *channel, struct ibv_cq **cq,
-		       void **cq_context)
-{
-	return ibv_get_cq_event(channel, cq, cq_context);
-}
-
-static void
-mlx5_glue_ack_cq_events(struct ibv_cq *cq, unsigned int nevents)
-{
-	ibv_ack_cq_events(cq, nevents);
-}
-
-static struct ibv_rwq_ind_table *
-mlx5_glue_create_rwq_ind_table(struct ibv_context *context,
-			       struct ibv_rwq_ind_table_init_attr *init_attr)
-{
-	return ibv_create_rwq_ind_table(context, init_attr);
-}
-
-static int
-mlx5_glue_destroy_rwq_ind_table(struct ibv_rwq_ind_table *rwq_ind_table)
-{
-	return ibv_destroy_rwq_ind_table(rwq_ind_table);
-}
-
-static struct ibv_wq *
-mlx5_glue_create_wq(struct ibv_context *context,
-		    struct ibv_wq_init_attr *wq_init_attr)
-{
-	return ibv_create_wq(context, wq_init_attr);
-}
-
-static int
-mlx5_glue_destroy_wq(struct ibv_wq *wq)
-{
-	return ibv_destroy_wq(wq);
-}
-static int
-mlx5_glue_modify_wq(struct ibv_wq *wq, struct ibv_wq_attr *wq_attr)
-{
-	return ibv_modify_wq(wq, wq_attr);
-}
-
-static struct ibv_flow *
-mlx5_glue_create_flow(struct ibv_qp *qp, struct ibv_flow_attr *flow)
-{
-	return ibv_create_flow(qp, flow);
-}
-
-static int
-mlx5_glue_destroy_flow(struct ibv_flow *flow_id)
-{
-	return ibv_destroy_flow(flow_id);
-}
-
-static int
-mlx5_glue_destroy_flow_action(void *action)
-{
-#ifdef HAVE_IBV_FLOW_DV_SUPPORT
-#ifdef HAVE_MLX5DV_DR
-	return mlx5dv_dr_action_destroy(action);
-#else
-	struct mlx5dv_flow_action_attr *attr = action;
-	int res = 0;
-	switch (attr->type) {
-	case MLX5DV_FLOW_ACTION_TAG:
-		break;
-	default:
-		res = ibv_destroy_flow_action(attr->action);
-		break;
-	}
-	free(action);
-	return res;
-#endif
-#else
-	(void)action;
-	return ENOTSUP;
-#endif
-}
-
-static struct ibv_qp *
-mlx5_glue_create_qp(struct ibv_pd *pd, struct ibv_qp_init_attr *qp_init_attr)
-{
-	return ibv_create_qp(pd, qp_init_attr);
-}
-
-static struct ibv_qp *
-mlx5_glue_create_qp_ex(struct ibv_context *context,
-		       struct ibv_qp_init_attr_ex *qp_init_attr_ex)
-{
-	return ibv_create_qp_ex(context, qp_init_attr_ex);
-}
-
-static int
-mlx5_glue_destroy_qp(struct ibv_qp *qp)
-{
-	return ibv_destroy_qp(qp);
-}
-
-static int
-mlx5_glue_modify_qp(struct ibv_qp *qp, struct ibv_qp_attr *attr, int attr_mask)
-{
-	return ibv_modify_qp(qp, attr, attr_mask);
-}
-
-static struct ibv_mr *
-mlx5_glue_reg_mr(struct ibv_pd *pd, void *addr, size_t length, int access)
-{
-	return ibv_reg_mr(pd, addr, length, access);
-}
-
-static int
-mlx5_glue_dereg_mr(struct ibv_mr *mr)
-{
-	return ibv_dereg_mr(mr);
-}
-
-static struct ibv_counter_set *
-mlx5_glue_create_counter_set(struct ibv_context *context,
-			     struct ibv_counter_set_init_attr *init_attr)
-{
-#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V42
-	(void)context;
-	(void)init_attr;
-	return NULL;
-#else
-	return ibv_create_counter_set(context, init_attr);
-#endif
-}
-
-static int
-mlx5_glue_destroy_counter_set(struct ibv_counter_set *cs)
-{
-#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V42
-	(void)cs;
-	return ENOTSUP;
-#else
-	return ibv_destroy_counter_set(cs);
-#endif
-}
-
-static int
-mlx5_glue_describe_counter_set(struct ibv_context *context,
-			       uint16_t counter_set_id,
-			       struct ibv_counter_set_description *cs_desc)
-{
-#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V42
-	(void)context;
-	(void)counter_set_id;
-	(void)cs_desc;
-	return ENOTSUP;
-#else
-	return ibv_describe_counter_set(context, counter_set_id, cs_desc);
-#endif
-}
-
-static int
-mlx5_glue_query_counter_set(struct ibv_query_counter_set_attr *query_attr,
-			    struct ibv_counter_set_data *cs_data)
-{
-#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V42
-	(void)query_attr;
-	(void)cs_data;
-	return ENOTSUP;
-#else
-	return ibv_query_counter_set(query_attr, cs_data);
-#endif
-}
-
-static struct ibv_counters *
-mlx5_glue_create_counters(struct ibv_context *context,
-			  struct ibv_counters_init_attr *init_attr)
-{
-#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V45
-	(void)context;
-	(void)init_attr;
-	errno = ENOTSUP;
-	return NULL;
-#else
-	return ibv_create_counters(context, init_attr);
-#endif
-}
-
-static int
-mlx5_glue_destroy_counters(struct ibv_counters *counters)
-{
-#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V45
-	(void)counters;
-	return ENOTSUP;
-#else
-	return ibv_destroy_counters(counters);
-#endif
-}
-
-static int
-mlx5_glue_attach_counters(struct ibv_counters *counters,
-			  struct ibv_counter_attach_attr *attr,
-			  struct ibv_flow *flow)
-{
-#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V45
-	(void)counters;
-	(void)attr;
-	(void)flow;
-	return ENOTSUP;
-#else
-	return ibv_attach_counters_point_flow(counters, attr, flow);
-#endif
-}
-
-static int
-mlx5_glue_query_counters(struct ibv_counters *counters,
-			 uint64_t *counters_value,
-			 uint32_t ncounters,
-			 uint32_t flags)
-{
-#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V45
-	(void)counters;
-	(void)counters_value;
-	(void)ncounters;
-	(void)flags;
-	return ENOTSUP;
-#else
-	return ibv_read_counters(counters, counters_value, ncounters, flags);
-#endif
-}
-
-static void
-mlx5_glue_ack_async_event(struct ibv_async_event *event)
-{
-	ibv_ack_async_event(event);
-}
-
-static int
-mlx5_glue_get_async_event(struct ibv_context *context,
-			  struct ibv_async_event *event)
-{
-	return ibv_get_async_event(context, event);
-}
-
-static const char *
-mlx5_glue_port_state_str(enum ibv_port_state port_state)
-{
-	return ibv_port_state_str(port_state);
-}
-
-static struct ibv_cq *
-mlx5_glue_cq_ex_to_cq(struct ibv_cq_ex *cq)
-{
-	return ibv_cq_ex_to_cq(cq);
-}
-
-static void *
-mlx5_glue_dr_create_flow_action_dest_flow_tbl(void *tbl)
-{
-#ifdef HAVE_MLX5DV_DR
-	return mlx5dv_dr_action_create_dest_table(tbl);
-#else
-	(void)tbl;
-	errno = ENOTSUP;
-	return NULL;
-#endif
-}
-
-static void *
-mlx5_glue_dr_create_flow_action_dest_port(void *domain, uint32_t port)
-{
-#ifdef HAVE_MLX5DV_DR_DEVX_PORT
-	return mlx5dv_dr_action_create_dest_ib_port(domain, port);
-#else
-#ifdef HAVE_MLX5DV_DR_ESWITCH
-	return mlx5dv_dr_action_create_dest_vport(domain, port);
-#else
-	(void)domain;
-	(void)port;
-	errno = ENOTSUP;
-	return NULL;
-#endif
-#endif
-}
-
-static void *
-mlx5_glue_dr_create_flow_action_drop(void)
-{
-#ifdef HAVE_MLX5DV_DR_ESWITCH
-	return mlx5dv_dr_action_create_drop();
-#else
-	errno = ENOTSUP;
-	return NULL;
-#endif
-}
-
-static void *
-mlx5_glue_dr_create_flow_action_push_vlan(struct mlx5dv_dr_domain *domain,
-					  rte_be32_t vlan_tag)
-{
-#ifdef HAVE_MLX5DV_DR_VLAN
-	return mlx5dv_dr_action_create_push_vlan(domain, vlan_tag);
-#else
-	(void)domain;
-	(void)vlan_tag;
-	errno = ENOTSUP;
-	return NULL;
-#endif
-}
-
-static void *
-mlx5_glue_dr_create_flow_action_pop_vlan(void)
-{
-#ifdef HAVE_MLX5DV_DR_VLAN
-	return mlx5dv_dr_action_create_pop_vlan();
-#else
-	errno = ENOTSUP;
-	return NULL;
-#endif
-}
-
-static void *
-mlx5_glue_dr_create_flow_tbl(void *domain, uint32_t level)
-{
-#ifdef HAVE_MLX5DV_DR
-	return mlx5dv_dr_table_create(domain, level);
-#else
-	(void)domain;
-	(void)level;
-	errno = ENOTSUP;
-	return NULL;
-#endif
-}
-
-static int
-mlx5_glue_dr_destroy_flow_tbl(void *tbl)
-{
-#ifdef HAVE_MLX5DV_DR
-	return mlx5dv_dr_table_destroy(tbl);
-#else
-	(void)tbl;
-	errno = ENOTSUP;
-	return errno;
-#endif
-}
-
-static void *
-mlx5_glue_dr_create_domain(struct ibv_context *ctx,
-			   enum  mlx5dv_dr_domain_type domain)
-{
-#ifdef HAVE_MLX5DV_DR
-	return mlx5dv_dr_domain_create(ctx, domain);
-#else
-	(void)ctx;
-	(void)domain;
-	errno = ENOTSUP;
-	return NULL;
-#endif
-}
-
-static int
-mlx5_glue_dr_destroy_domain(void *domain)
-{
-#ifdef HAVE_MLX5DV_DR
-	return mlx5dv_dr_domain_destroy(domain);
-#else
-	(void)domain;
-	errno = ENOTSUP;
-	return errno;
-#endif
-}
-
-static struct ibv_cq_ex *
-mlx5_glue_dv_create_cq(struct ibv_context *context,
-		       struct ibv_cq_init_attr_ex *cq_attr,
-		       struct mlx5dv_cq_init_attr *mlx5_cq_attr)
-{
-	return mlx5dv_create_cq(context, cq_attr, mlx5_cq_attr);
-}
-
-static struct ibv_wq *
-mlx5_glue_dv_create_wq(struct ibv_context *context,
-		       struct ibv_wq_init_attr *wq_attr,
-		       struct mlx5dv_wq_init_attr *mlx5_wq_attr)
-{
-#ifndef HAVE_IBV_DEVICE_STRIDING_RQ_SUPPORT
-	(void)context;
-	(void)wq_attr;
-	(void)mlx5_wq_attr;
-	errno = ENOTSUP;
-	return NULL;
-#else
-	return mlx5dv_create_wq(context, wq_attr, mlx5_wq_attr);
-#endif
-}
-
-static int
-mlx5_glue_dv_query_device(struct ibv_context *ctx,
-			  struct mlx5dv_context *attrs_out)
-{
-	return mlx5dv_query_device(ctx, attrs_out);
-}
-
-static int
-mlx5_glue_dv_set_context_attr(struct ibv_context *ibv_ctx,
-			      enum mlx5dv_set_ctx_attr_type type, void *attr)
-{
-	return mlx5dv_set_context_attr(ibv_ctx, type, attr);
-}
-
-static int
-mlx5_glue_dv_init_obj(struct mlx5dv_obj *obj, uint64_t obj_type)
-{
-	return mlx5dv_init_obj(obj, obj_type);
-}
-
-static struct ibv_qp *
-mlx5_glue_dv_create_qp(struct ibv_context *context,
-		       struct ibv_qp_init_attr_ex *qp_init_attr_ex,
-		       struct mlx5dv_qp_init_attr *dv_qp_init_attr)
-{
-#ifdef HAVE_IBV_DEVICE_TUNNEL_SUPPORT
-	return mlx5dv_create_qp(context, qp_init_attr_ex, dv_qp_init_attr);
-#else
-	(void)context;
-	(void)qp_init_attr_ex;
-	(void)dv_qp_init_attr;
-	errno = ENOTSUP;
-	return NULL;
-#endif
-}
-
-static void *
-mlx5_glue_dv_create_flow_matcher(struct ibv_context *context,
-				 struct mlx5dv_flow_matcher_attr *matcher_attr,
-				 void *tbl)
-{
-#ifdef HAVE_IBV_FLOW_DV_SUPPORT
-#ifdef HAVE_MLX5DV_DR
-	(void)context;
-	return mlx5dv_dr_matcher_create(tbl, matcher_attr->priority,
-					matcher_attr->match_criteria_enable,
-					matcher_attr->match_mask);
-#else
-	(void)tbl;
-	return mlx5dv_create_flow_matcher(context, matcher_attr);
-#endif
-#else
-	(void)context;
-	(void)matcher_attr;
-	(void)tbl;
-	errno = ENOTSUP;
-	return NULL;
-#endif
-}
-
-static void *
-mlx5_glue_dv_create_flow(void *matcher,
-			 void *match_value,
-			 size_t num_actions,
-			 void *actions[])
-{
-#ifdef HAVE_IBV_FLOW_DV_SUPPORT
-#ifdef HAVE_MLX5DV_DR
-	return mlx5dv_dr_rule_create(matcher, match_value, num_actions,
-				     (struct mlx5dv_dr_action **)actions);
-#else
-	struct mlx5dv_flow_action_attr actions_attr[8];
-
-	if (num_actions > 8)
-		return NULL;
-	for (size_t i = 0; i < num_actions; i++)
-		actions_attr[i] =
-			*((struct mlx5dv_flow_action_attr *)(actions[i]));
-	return mlx5dv_create_flow(matcher, match_value,
-				  num_actions, actions_attr);
-#endif
-#else
-	(void)matcher;
-	(void)match_value;
-	(void)num_actions;
-	(void)actions;
-	return NULL;
-#endif
-}
-
-static void *
-mlx5_glue_dv_create_flow_action_counter(void *counter_obj, uint32_t offset)
-{
-#ifdef HAVE_IBV_FLOW_DV_SUPPORT
-#ifdef HAVE_MLX5DV_DR
-	return mlx5dv_dr_action_create_flow_counter(counter_obj, offset);
-#else
-	struct mlx5dv_flow_action_attr *action;
-
-	(void)offset;
-	action = malloc(sizeof(*action));
-	if (!action)
-		return NULL;
-	action->type = MLX5DV_FLOW_ACTION_COUNTERS_DEVX;
-	action->obj = counter_obj;
-	return action;
-#endif
-#else
-	(void)counter_obj;
-	(void)offset;
-	errno = ENOTSUP;
-	return NULL;
-#endif
-}
-
-static void *
-mlx5_glue_dv_create_flow_action_dest_ibv_qp(void *qp)
-{
-#ifdef HAVE_IBV_FLOW_DV_SUPPORT
-#ifdef HAVE_MLX5DV_DR
-	return mlx5dv_dr_action_create_dest_ibv_qp(qp);
-#else
-	struct mlx5dv_flow_action_attr *action;
-
-	action = malloc(sizeof(*action));
-	if (!action)
-		return NULL;
-	action->type = MLX5DV_FLOW_ACTION_DEST_IBV_QP;
-	action->obj = qp;
-	return action;
-#endif
-#else
-	(void)qp;
-	errno = ENOTSUP;
-	return NULL;
-#endif
-}
-
-static void *
-mlx5_glue_dv_create_flow_action_dest_devx_tir(void *tir)
-{
-#ifdef HAVE_MLX5DV_DR_ACTION_DEST_DEVX_TIR
-	return mlx5dv_dr_action_create_dest_devx_tir(tir);
-#else
-	(void)tir;
-	errno = ENOTSUP;
-	return NULL;
-#endif
-}
-
-static void *
-mlx5_glue_dv_create_flow_action_modify_header
-					(struct ibv_context *ctx,
-					 enum mlx5dv_flow_table_type ft_type,
-					 void *domain, uint64_t flags,
-					 size_t actions_sz,
-					 uint64_t actions[])
-{
-#ifdef HAVE_IBV_FLOW_DV_SUPPORT
-#ifdef HAVE_MLX5DV_DR
-	(void)ctx;
-	(void)ft_type;
-	return mlx5dv_dr_action_create_modify_header(domain, flags, actions_sz,
-						     (__be64 *)actions);
-#else
-	struct mlx5dv_flow_action_attr *action;
-
-	(void)domain;
-	(void)flags;
-	action = malloc(sizeof(*action));
-	if (!action)
-		return NULL;
-	action->type = MLX5DV_FLOW_ACTION_IBV_FLOW_ACTION;
-	action->action = mlx5dv_create_flow_action_modify_header
-		(ctx, actions_sz, actions, ft_type);
-	return action;
-#endif
-#else
-	(void)ctx;
-	(void)ft_type;
-	(void)domain;
-	(void)flags;
-	(void)actions_sz;
-	(void)actions;
-	errno = ENOTSUP;
-	return NULL;
-#endif
-}
-
-static void *
-mlx5_glue_dv_create_flow_action_packet_reformat
-		(struct ibv_context *ctx,
-		 enum mlx5dv_flow_action_packet_reformat_type reformat_type,
-		 enum mlx5dv_flow_table_type ft_type,
-		 struct mlx5dv_dr_domain *domain,
-		 uint32_t flags, size_t data_sz, void *data)
-{
-#ifdef HAVE_IBV_FLOW_DV_SUPPORT
-#ifdef HAVE_MLX5DV_DR
-	(void)ctx;
-	(void)ft_type;
-	return mlx5dv_dr_action_create_packet_reformat(domain, flags,
-						       reformat_type, data_sz,
-						       data);
-#else
-	(void)domain;
-	(void)flags;
-	struct mlx5dv_flow_action_attr *action;
-
-	action = malloc(sizeof(*action));
-	if (!action)
-		return NULL;
-	action->type = MLX5DV_FLOW_ACTION_IBV_FLOW_ACTION;
-	action->action = mlx5dv_create_flow_action_packet_reformat
-		(ctx, data_sz, data, reformat_type, ft_type);
-	return action;
-#endif
-#else
-	(void)ctx;
-	(void)reformat_type;
-	(void)ft_type;
-	(void)domain;
-	(void)flags;
-	(void)data_sz;
-	(void)data;
-	errno = ENOTSUP;
-	return NULL;
-#endif
-}
-
-static void *
-mlx5_glue_dv_create_flow_action_tag(uint32_t tag)
-{
-#ifdef HAVE_IBV_FLOW_DV_SUPPORT
-#ifdef HAVE_MLX5DV_DR
-	return mlx5dv_dr_action_create_tag(tag);
-#else
-	struct mlx5dv_flow_action_attr *action;
-	action = malloc(sizeof(*action));
-	if (!action)
-		return NULL;
-	action->type = MLX5DV_FLOW_ACTION_TAG;
-	action->tag_value = tag;
-	return action;
-#endif
-#endif
-	(void)tag;
-	errno = ENOTSUP;
-	return NULL;
-}
-
-static void *
-mlx5_glue_dv_create_flow_action_meter(struct mlx5dv_dr_flow_meter_attr *attr)
-{
-#if defined(HAVE_MLX5DV_DR) && defined(HAVE_MLX5_DR_CREATE_ACTION_FLOW_METER)
-	return mlx5dv_dr_action_create_flow_meter(attr);
-#else
-	(void)attr;
-	errno = ENOTSUP;
-	return NULL;
-#endif
-}
-
-static int
-mlx5_glue_dv_modify_flow_action_meter(void *action,
-				      struct mlx5dv_dr_flow_meter_attr *attr,
-				      uint64_t modify_bits)
-{
-#if defined(HAVE_MLX5DV_DR) && defined(HAVE_MLX5_DR_CREATE_ACTION_FLOW_METER)
-	return mlx5dv_dr_action_modify_flow_meter(action, attr, modify_bits);
-#else
-	(void)action;
-	(void)attr;
-	(void)modify_bits;
-	errno = ENOTSUP;
-	return errno;
-#endif
-}
-
-static int
-mlx5_glue_dv_destroy_flow(void *flow_id)
-{
-#ifdef HAVE_MLX5DV_DR
-	return mlx5dv_dr_rule_destroy(flow_id);
-#else
-	return ibv_destroy_flow(flow_id);
-#endif
-}
-
-static int
-mlx5_glue_dv_destroy_flow_matcher(void *matcher)
-{
-#ifdef HAVE_IBV_FLOW_DV_SUPPORT
-#ifdef HAVE_MLX5DV_DR
-	return mlx5dv_dr_matcher_destroy(matcher);
-#else
-	return mlx5dv_destroy_flow_matcher(matcher);
-#endif
-#else
-	(void)matcher;
-	errno = ENOTSUP;
-	return errno;
-#endif
-}
-
-static struct ibv_context *
-mlx5_glue_dv_open_device(struct ibv_device *device)
-{
-#ifdef HAVE_IBV_DEVX_OBJ
-	return mlx5dv_open_device(device,
-				  &(struct mlx5dv_context_attr){
-					.flags = MLX5DV_CONTEXT_FLAGS_DEVX,
-				  });
-#else
-	(void)device;
-	errno = ENOTSUP;
-	return NULL;
-#endif
-}
-
-static struct mlx5dv_devx_obj *
-mlx5_glue_devx_obj_create(struct ibv_context *ctx,
-			  const void *in, size_t inlen,
-			  void *out, size_t outlen)
-{
-#ifdef HAVE_IBV_DEVX_OBJ
-	return mlx5dv_devx_obj_create(ctx, in, inlen, out, outlen);
-#else
-	(void)ctx;
-	(void)in;
-	(void)inlen;
-	(void)out;
-	(void)outlen;
-	errno = ENOTSUP;
-	return NULL;
-#endif
-}
-
-static int
-mlx5_glue_devx_obj_destroy(struct mlx5dv_devx_obj *obj)
-{
-#ifdef HAVE_IBV_DEVX_OBJ
-	return mlx5dv_devx_obj_destroy(obj);
-#else
-	(void)obj;
-	return -ENOTSUP;
-#endif
-}
-
-static int
-mlx5_glue_devx_obj_query(struct mlx5dv_devx_obj *obj,
-			 const void *in, size_t inlen,
-			 void *out, size_t outlen)
-{
-#ifdef HAVE_IBV_DEVX_OBJ
-	return mlx5dv_devx_obj_query(obj, in, inlen, out, outlen);
-#else
-	(void)obj;
-	(void)in;
-	(void)inlen;
-	(void)out;
-	(void)outlen;
-	return -ENOTSUP;
-#endif
-}
-
-static int
-mlx5_glue_devx_obj_modify(struct mlx5dv_devx_obj *obj,
-			  const void *in, size_t inlen,
-			  void *out, size_t outlen)
-{
-#ifdef HAVE_IBV_DEVX_OBJ
-	return mlx5dv_devx_obj_modify(obj, in, inlen, out, outlen);
-#else
-	(void)obj;
-	(void)in;
-	(void)inlen;
-	(void)out;
-	(void)outlen;
-	return -ENOTSUP;
-#endif
-}
-
-static int
-mlx5_glue_devx_general_cmd(struct ibv_context *ctx,
-			   const void *in, size_t inlen,
-			   void *out, size_t outlen)
-{
-#ifdef HAVE_IBV_DEVX_OBJ
-	return mlx5dv_devx_general_cmd(ctx, in, inlen, out, outlen);
-#else
-	(void)ctx;
-	(void)in;
-	(void)inlen;
-	(void)out;
-	(void)outlen;
-	return -ENOTSUP;
-#endif
-}
-
-static struct mlx5dv_devx_cmd_comp *
-mlx5_glue_devx_create_cmd_comp(struct ibv_context *ctx)
-{
-#ifdef HAVE_IBV_DEVX_ASYNC
-	return mlx5dv_devx_create_cmd_comp(ctx);
-#else
-	(void)ctx;
-	errno = -ENOTSUP;
-	return NULL;
-#endif
-}
-
-static void
-mlx5_glue_devx_destroy_cmd_comp(struct mlx5dv_devx_cmd_comp *cmd_comp)
-{
-#ifdef HAVE_IBV_DEVX_ASYNC
-	mlx5dv_devx_destroy_cmd_comp(cmd_comp);
-#else
-	(void)cmd_comp;
-	errno = -ENOTSUP;
-#endif
-}
-
-static int
-mlx5_glue_devx_obj_query_async(struct mlx5dv_devx_obj *obj, const void *in,
-			       size_t inlen, size_t outlen, uint64_t wr_id,
-			       struct mlx5dv_devx_cmd_comp *cmd_comp)
-{
-#ifdef HAVE_IBV_DEVX_ASYNC
-	return mlx5dv_devx_obj_query_async(obj, in, inlen, outlen, wr_id,
-					   cmd_comp);
-#else
-	(void)obj;
-	(void)in;
-	(void)inlen;
-	(void)outlen;
-	(void)wr_id;
-	(void)cmd_comp;
-	return -ENOTSUP;
-#endif
-}
-
-static int
-mlx5_glue_devx_get_async_cmd_comp(struct mlx5dv_devx_cmd_comp *cmd_comp,
-				  struct mlx5dv_devx_async_cmd_hdr *cmd_resp,
-				  size_t cmd_resp_len)
-{
-#ifdef HAVE_IBV_DEVX_ASYNC
-	return mlx5dv_devx_get_async_cmd_comp(cmd_comp, cmd_resp,
-					      cmd_resp_len);
-#else
-	(void)cmd_comp;
-	(void)cmd_resp;
-	(void)cmd_resp_len;
-	return -ENOTSUP;
-#endif
-}
-
-static struct mlx5dv_devx_umem *
-mlx5_glue_devx_umem_reg(struct ibv_context *context, void *addr, size_t size,
-			uint32_t access)
-{
-#ifdef HAVE_IBV_DEVX_OBJ
-	return mlx5dv_devx_umem_reg(context, addr, size, access);
-#else
-	(void)context;
-	(void)addr;
-	(void)size;
-	(void)access;
-	errno = -ENOTSUP;
-	return NULL;
-#endif
-}
-
-static int
-mlx5_glue_devx_umem_dereg(struct mlx5dv_devx_umem *dv_devx_umem)
-{
-#ifdef HAVE_IBV_DEVX_OBJ
-	return mlx5dv_devx_umem_dereg(dv_devx_umem);
-#else
-	(void)dv_devx_umem;
-	return -ENOTSUP;
-#endif
-}
-
-static int
-mlx5_glue_devx_qp_query(struct ibv_qp *qp,
-			const void *in, size_t inlen,
-			void *out, size_t outlen)
-{
-#ifdef HAVE_IBV_DEVX_OBJ
-	return mlx5dv_devx_qp_query(qp, in, inlen, out, outlen);
-#else
-	(void)qp;
-	(void)in;
-	(void)inlen;
-	(void)out;
-	(void)outlen;
-	errno = ENOTSUP;
-	return errno;
-#endif
-}
-
-static int
-mlx5_glue_devx_port_query(struct ibv_context *ctx,
-			  uint32_t port_num,
-			  struct mlx5dv_devx_port *mlx5_devx_port)
-{
-#ifdef HAVE_MLX5DV_DR_DEVX_PORT
-	return mlx5dv_query_devx_port(ctx, port_num, mlx5_devx_port);
-#else
-	(void)ctx;
-	(void)port_num;
-	(void)mlx5_devx_port;
-	errno = ENOTSUP;
-	return errno;
-#endif
-}
-
-static int
-mlx5_glue_dr_dump_domain(FILE *file, void *domain)
-{
-#ifdef HAVE_MLX5_DR_FLOW_DUMP
-	return mlx5dv_dump_dr_domain(file, domain);
-#else
-	RTE_SET_USED(file);
-	RTE_SET_USED(domain);
-	return -ENOTSUP;
-#endif
-}
-
-alignas(RTE_CACHE_LINE_SIZE)
-const struct mlx5_glue *mlx5_glue = &(const struct mlx5_glue){
-	.version = MLX5_GLUE_VERSION,
-	.fork_init = mlx5_glue_fork_init,
-	.alloc_pd = mlx5_glue_alloc_pd,
-	.dealloc_pd = mlx5_glue_dealloc_pd,
-	.get_device_list = mlx5_glue_get_device_list,
-	.free_device_list = mlx5_glue_free_device_list,
-	.open_device = mlx5_glue_open_device,
-	.close_device = mlx5_glue_close_device,
-	.query_device = mlx5_glue_query_device,
-	.query_device_ex = mlx5_glue_query_device_ex,
-	.query_rt_values_ex = mlx5_glue_query_rt_values_ex,
-	.query_port = mlx5_glue_query_port,
-	.create_comp_channel = mlx5_glue_create_comp_channel,
-	.destroy_comp_channel = mlx5_glue_destroy_comp_channel,
-	.create_cq = mlx5_glue_create_cq,
-	.destroy_cq = mlx5_glue_destroy_cq,
-	.get_cq_event = mlx5_glue_get_cq_event,
-	.ack_cq_events = mlx5_glue_ack_cq_events,
-	.create_rwq_ind_table = mlx5_glue_create_rwq_ind_table,
-	.destroy_rwq_ind_table = mlx5_glue_destroy_rwq_ind_table,
-	.create_wq = mlx5_glue_create_wq,
-	.destroy_wq = mlx5_glue_destroy_wq,
-	.modify_wq = mlx5_glue_modify_wq,
-	.create_flow = mlx5_glue_create_flow,
-	.destroy_flow = mlx5_glue_destroy_flow,
-	.destroy_flow_action = mlx5_glue_destroy_flow_action,
-	.create_qp = mlx5_glue_create_qp,
-	.create_qp_ex = mlx5_glue_create_qp_ex,
-	.destroy_qp = mlx5_glue_destroy_qp,
-	.modify_qp = mlx5_glue_modify_qp,
-	.reg_mr = mlx5_glue_reg_mr,
-	.dereg_mr = mlx5_glue_dereg_mr,
-	.create_counter_set = mlx5_glue_create_counter_set,
-	.destroy_counter_set = mlx5_glue_destroy_counter_set,
-	.describe_counter_set = mlx5_glue_describe_counter_set,
-	.query_counter_set = mlx5_glue_query_counter_set,
-	.create_counters = mlx5_glue_create_counters,
-	.destroy_counters = mlx5_glue_destroy_counters,
-	.attach_counters = mlx5_glue_attach_counters,
-	.query_counters = mlx5_glue_query_counters,
-	.ack_async_event = mlx5_glue_ack_async_event,
-	.get_async_event = mlx5_glue_get_async_event,
-	.port_state_str = mlx5_glue_port_state_str,
-	.cq_ex_to_cq = mlx5_glue_cq_ex_to_cq,
-	.dr_create_flow_action_dest_flow_tbl =
-		mlx5_glue_dr_create_flow_action_dest_flow_tbl,
-	.dr_create_flow_action_dest_port =
-		mlx5_glue_dr_create_flow_action_dest_port,
-	.dr_create_flow_action_drop =
-		mlx5_glue_dr_create_flow_action_drop,
-	.dr_create_flow_action_push_vlan =
-		mlx5_glue_dr_create_flow_action_push_vlan,
-	.dr_create_flow_action_pop_vlan =
-		mlx5_glue_dr_create_flow_action_pop_vlan,
-	.dr_create_flow_tbl = mlx5_glue_dr_create_flow_tbl,
-	.dr_destroy_flow_tbl = mlx5_glue_dr_destroy_flow_tbl,
-	.dr_create_domain = mlx5_glue_dr_create_domain,
-	.dr_destroy_domain = mlx5_glue_dr_destroy_domain,
-	.dv_create_cq = mlx5_glue_dv_create_cq,
-	.dv_create_wq = mlx5_glue_dv_create_wq,
-	.dv_query_device = mlx5_glue_dv_query_device,
-	.dv_set_context_attr = mlx5_glue_dv_set_context_attr,
-	.dv_init_obj = mlx5_glue_dv_init_obj,
-	.dv_create_qp = mlx5_glue_dv_create_qp,
-	.dv_create_flow_matcher = mlx5_glue_dv_create_flow_matcher,
-	.dv_create_flow = mlx5_glue_dv_create_flow,
-	.dv_create_flow_action_counter =
-		mlx5_glue_dv_create_flow_action_counter,
-	.dv_create_flow_action_dest_ibv_qp =
-		mlx5_glue_dv_create_flow_action_dest_ibv_qp,
-	.dv_create_flow_action_dest_devx_tir =
-		mlx5_glue_dv_create_flow_action_dest_devx_tir,
-	.dv_create_flow_action_modify_header =
-		mlx5_glue_dv_create_flow_action_modify_header,
-	.dv_create_flow_action_packet_reformat =
-		mlx5_glue_dv_create_flow_action_packet_reformat,
-	.dv_create_flow_action_tag =  mlx5_glue_dv_create_flow_action_tag,
-	.dv_create_flow_action_meter = mlx5_glue_dv_create_flow_action_meter,
-	.dv_modify_flow_action_meter = mlx5_glue_dv_modify_flow_action_meter,
-	.dv_destroy_flow = mlx5_glue_dv_destroy_flow,
-	.dv_destroy_flow_matcher = mlx5_glue_dv_destroy_flow_matcher,
-	.dv_open_device = mlx5_glue_dv_open_device,
-	.devx_obj_create = mlx5_glue_devx_obj_create,
-	.devx_obj_destroy = mlx5_glue_devx_obj_destroy,
-	.devx_obj_query = mlx5_glue_devx_obj_query,
-	.devx_obj_modify = mlx5_glue_devx_obj_modify,
-	.devx_general_cmd = mlx5_glue_devx_general_cmd,
-	.devx_create_cmd_comp = mlx5_glue_devx_create_cmd_comp,
-	.devx_destroy_cmd_comp = mlx5_glue_devx_destroy_cmd_comp,
-	.devx_obj_query_async = mlx5_glue_devx_obj_query_async,
-	.devx_get_async_cmd_comp = mlx5_glue_devx_get_async_cmd_comp,
-	.devx_umem_reg = mlx5_glue_devx_umem_reg,
-	.devx_umem_dereg = mlx5_glue_devx_umem_dereg,
-	.devx_qp_query = mlx5_glue_devx_qp_query,
-	.devx_port_query = mlx5_glue_devx_port_query,
-	.dr_dump_domain = mlx5_glue_dr_dump_domain,
-};
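Note on the glue table removed above: each wrapper forwards to rdma-core when the matching HAVE_* macro is defined, and otherwise reports ENOTSUP. Pointer-returning wrappers set errno and return NULL, while int-returning wrappers return the error code. Three of the fallbacks above (devx_create_cmd_comp, devx_destroy_cmd_comp, devx_umem_reg) assign errno = -ENOTSUP, although errno values are conventionally positive. The following is a minimal sketch of the pattern, not the driver's code: every demo_* name and the DEMO_HAVE_BACKEND macro are hypothetical, and the sketch uses the positive errno form.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a provider handle; not an rdma-core type. */
struct demo_ctx { int id; };

/* Function-pointer table, analogous in shape to struct mlx5_glue. */
struct demo_glue {
	struct demo_ctx *(*open)(int id);
	int (*query)(struct demo_ctx *ctx, int *out);
};

static struct demo_ctx *
demo_glue_open(int id)
{
#ifdef DEMO_HAVE_BACKEND
	static struct demo_ctx ctx;

	ctx.id = id;
	return &ctx;
#else
	(void)id;
	errno = ENOTSUP; /* positive value, as errno expects */
	return NULL;
#endif
}

static int
demo_glue_query(struct demo_ctx *ctx, int *out)
{
#ifdef DEMO_HAVE_BACKEND
	*out = ctx->id;
	return 0;
#else
	(void)ctx;
	(void)out;
	return -ENOTSUP; /* int-returning wrappers hand back the code */
#endif
}

/* Single exported table; callers never reference the backend directly. */
const struct demo_glue *demo_glue = &(const struct demo_glue){
	.open = demo_glue_open,
	.query = demo_glue_query,
};

/* Example caller: returns 0 on success or a negative errno value. */
static int
demo_probe(void)
{
	int value = 0;
	int ret;
	struct demo_ctx *ctx = demo_glue->open(7);

	if (ctx == NULL)
		return -errno;
	ret = demo_glue->query(ctx, &value);
	if (ret != 0)
		return ret;
	assert(value == 7);
	return 0;
}
```

Built without DEMO_HAVE_BACKEND, demo_probe() fails at open() and the caller sees a negative ENOTSUP, so callers can check for a missing capability at run time without compile-time conditionals of their own.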
diff --git a/drivers/net/mlx5/mlx5_glue.h b/drivers/net/mlx5/mlx5_glue.h
deleted file mode 100644
index 6771a18..0000000
--- a/drivers/net/mlx5/mlx5_glue.h
+++ /dev/null
@@ -1,264 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause
- * Copyright 2018 6WIND S.A.
- * Copyright 2018 Mellanox Technologies, Ltd
- */
-
-#ifndef MLX5_GLUE_H_
-#define MLX5_GLUE_H_
-
-#include <stddef.h>
-#include <stdint.h>
-
-#include "rte_byteorder.h"
-
-/* Verbs headers do not support -pedantic. */
-#ifdef PEDANTIC
-#pragma GCC diagnostic ignored "-Wpedantic"
-#endif
-#include <infiniband/mlx5dv.h>
-#include <infiniband/verbs.h>
-#ifdef PEDANTIC
-#pragma GCC diagnostic error "-Wpedantic"
-#endif
-
-#ifndef MLX5_GLUE_VERSION
-#define MLX5_GLUE_VERSION ""
-#endif
-
-#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V42
-struct ibv_counter_set;
-struct ibv_counter_set_data;
-struct ibv_counter_set_description;
-struct ibv_counter_set_init_attr;
-struct ibv_query_counter_set_attr;
-#endif
-
-#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V45
-struct ibv_counters;
-struct ibv_counters_init_attr;
-struct ibv_counter_attach_attr;
-#endif
-
-#ifndef HAVE_IBV_DEVICE_TUNNEL_SUPPORT
-struct mlx5dv_qp_init_attr;
-#endif
-
-#ifndef HAVE_IBV_DEVICE_STRIDING_RQ_SUPPORT
-struct mlx5dv_wq_init_attr;
-#endif
-
-#ifndef HAVE_IBV_FLOW_DV_SUPPORT
-struct mlx5dv_flow_matcher;
-struct mlx5dv_flow_matcher_attr;
-struct mlx5dv_flow_action_attr;
-struct mlx5dv_flow_match_parameters;
-struct mlx5dv_dr_flow_meter_attr;
-struct ibv_flow_action;
-enum mlx5dv_flow_action_packet_reformat_type { packet_reformat_type = 0, };
-enum mlx5dv_flow_table_type { flow_table_type = 0, };
-#endif
-
-#ifndef HAVE_IBV_FLOW_DEVX_COUNTERS
-#define MLX5DV_FLOW_ACTION_COUNTERS_DEVX 0
-#endif
-
-#ifndef HAVE_IBV_DEVX_OBJ
-struct mlx5dv_devx_obj;
-struct mlx5dv_devx_umem { uint32_t umem_id; };
-#endif
-
-#ifndef HAVE_IBV_DEVX_ASYNC
-struct mlx5dv_devx_cmd_comp;
-struct mlx5dv_devx_async_cmd_hdr;
-#endif
-
-#ifndef HAVE_MLX5DV_DR
-enum  mlx5dv_dr_domain_type { unused, };
-struct mlx5dv_dr_domain;
-#endif
-
-#ifndef HAVE_MLX5DV_DR_DEVX_PORT
-struct mlx5dv_devx_port;
-#endif
-
-#ifndef HAVE_MLX5_DR_CREATE_ACTION_FLOW_METER
-struct mlx5dv_dr_flow_meter_attr;
-#endif
-
-/* LIB_GLUE_VERSION must be updated every time this structure is modified. */
-struct mlx5_glue {
-	const char *version;
-	int (*fork_init)(void);
-	struct ibv_pd *(*alloc_pd)(struct ibv_context *context);
-	int (*dealloc_pd)(struct ibv_pd *pd);
-	struct ibv_device **(*get_device_list)(int *num_devices);
-	void (*free_device_list)(struct ibv_device **list);
-	struct ibv_context *(*open_device)(struct ibv_device *device);
-	int (*close_device)(struct ibv_context *context);
-	int (*query_device)(struct ibv_context *context,
-			    struct ibv_device_attr *device_attr);
-	int (*query_device_ex)(struct ibv_context *context,
-			       const struct ibv_query_device_ex_input *input,
-			       struct ibv_device_attr_ex *attr);
-	int (*query_rt_values_ex)(struct ibv_context *context,
-			       struct ibv_values_ex *values);
-	int (*query_port)(struct ibv_context *context, uint8_t port_num,
-			  struct ibv_port_attr *port_attr);
-	struct ibv_comp_channel *(*create_comp_channel)
-		(struct ibv_context *context);
-	int (*destroy_comp_channel)(struct ibv_comp_channel *channel);
-	struct ibv_cq *(*create_cq)(struct ibv_context *context, int cqe,
-				    void *cq_context,
-				    struct ibv_comp_channel *channel,
-				    int comp_vector);
-	int (*destroy_cq)(struct ibv_cq *cq);
-	int (*get_cq_event)(struct ibv_comp_channel *channel,
-			    struct ibv_cq **cq, void **cq_context);
-	void (*ack_cq_events)(struct ibv_cq *cq, unsigned int nevents);
-	struct ibv_rwq_ind_table *(*create_rwq_ind_table)
-		(struct ibv_context *context,
-		 struct ibv_rwq_ind_table_init_attr *init_attr);
-	int (*destroy_rwq_ind_table)(struct ibv_rwq_ind_table *rwq_ind_table);
-	struct ibv_wq *(*create_wq)(struct ibv_context *context,
-				    struct ibv_wq_init_attr *wq_init_attr);
-	int (*destroy_wq)(struct ibv_wq *wq);
-	int (*modify_wq)(struct ibv_wq *wq, struct ibv_wq_attr *wq_attr);
-	struct ibv_flow *(*create_flow)(struct ibv_qp *qp,
-					struct ibv_flow_attr *flow);
-	int (*destroy_flow)(struct ibv_flow *flow_id);
-	int (*destroy_flow_action)(void *action);
-	struct ibv_qp *(*create_qp)(struct ibv_pd *pd,
-				    struct ibv_qp_init_attr *qp_init_attr);
-	struct ibv_qp *(*create_qp_ex)
-		(struct ibv_context *context,
-		 struct ibv_qp_init_attr_ex *qp_init_attr_ex);
-	int (*destroy_qp)(struct ibv_qp *qp);
-	int (*modify_qp)(struct ibv_qp *qp, struct ibv_qp_attr *attr,
-			 int attr_mask);
-	struct ibv_mr *(*reg_mr)(struct ibv_pd *pd, void *addr,
-				 size_t length, int access);
-	int (*dereg_mr)(struct ibv_mr *mr);
-	struct ibv_counter_set *(*create_counter_set)
-		(struct ibv_context *context,
-		 struct ibv_counter_set_init_attr *init_attr);
-	int (*destroy_counter_set)(struct ibv_counter_set *cs);
-	int (*describe_counter_set)
-		(struct ibv_context *context,
-		 uint16_t counter_set_id,
-		 struct ibv_counter_set_description *cs_desc);
-	int (*query_counter_set)(struct ibv_query_counter_set_attr *query_attr,
-				 struct ibv_counter_set_data *cs_data);
-	struct ibv_counters *(*create_counters)
-		(struct ibv_context *context,
-		 struct ibv_counters_init_attr *init_attr);
-	int (*destroy_counters)(struct ibv_counters *counters);
-	int (*attach_counters)(struct ibv_counters *counters,
-			       struct ibv_counter_attach_attr *attr,
-			       struct ibv_flow *flow);
-	int (*query_counters)(struct ibv_counters *counters,
-			      uint64_t *counters_value,
-			      uint32_t ncounters,
-			      uint32_t flags);
-	void (*ack_async_event)(struct ibv_async_event *event);
-	int (*get_async_event)(struct ibv_context *context,
-			       struct ibv_async_event *event);
-	const char *(*port_state_str)(enum ibv_port_state port_state);
-	struct ibv_cq *(*cq_ex_to_cq)(struct ibv_cq_ex *cq);
-	void *(*dr_create_flow_action_dest_flow_tbl)(void *tbl);
-	void *(*dr_create_flow_action_dest_port)(void *domain,
-						 uint32_t port);
-	void *(*dr_create_flow_action_drop)();
-	void *(*dr_create_flow_action_push_vlan)
-					(struct mlx5dv_dr_domain *domain,
-					 rte_be32_t vlan_tag);
-	void *(*dr_create_flow_action_pop_vlan)();
-	void *(*dr_create_flow_tbl)(void *domain, uint32_t level);
-	int (*dr_destroy_flow_tbl)(void *tbl);
-	void *(*dr_create_domain)(struct ibv_context *ctx,
-				  enum mlx5dv_dr_domain_type domain);
-	int (*dr_destroy_domain)(void *domain);
-	struct ibv_cq_ex *(*dv_create_cq)
-		(struct ibv_context *context,
-		 struct ibv_cq_init_attr_ex *cq_attr,
-		 struct mlx5dv_cq_init_attr *mlx5_cq_attr);
-	struct ibv_wq *(*dv_create_wq)
-		(struct ibv_context *context,
-		 struct ibv_wq_init_attr *wq_attr,
-		 struct mlx5dv_wq_init_attr *mlx5_wq_attr);
-	int (*dv_query_device)(struct ibv_context *ctx_in,
-			       struct mlx5dv_context *attrs_out);
-	int (*dv_set_context_attr)(struct ibv_context *ibv_ctx,
-				   enum mlx5dv_set_ctx_attr_type type,
-				   void *attr);
-	int (*dv_init_obj)(struct mlx5dv_obj *obj, uint64_t obj_type);
-	struct ibv_qp *(*dv_create_qp)
-		(struct ibv_context *context,
-		 struct ibv_qp_init_attr_ex *qp_init_attr_ex,
-		 struct mlx5dv_qp_init_attr *dv_qp_init_attr);
-	void *(*dv_create_flow_matcher)
-		(struct ibv_context *context,
-		 struct mlx5dv_flow_matcher_attr *matcher_attr,
-		 void *tbl);
-	void *(*dv_create_flow)(void *matcher, void *match_value,
-			  size_t num_actions, void *actions[]);
-	void *(*dv_create_flow_action_counter)(void *obj, uint32_t  offset);
-	void *(*dv_create_flow_action_dest_ibv_qp)(void *qp);
-	void *(*dv_create_flow_action_dest_devx_tir)(void *tir);
-	void *(*dv_create_flow_action_modify_header)
-		(struct ibv_context *ctx, enum mlx5dv_flow_table_type ft_type,
-		 void *domain, uint64_t flags, size_t actions_sz,
-		 uint64_t actions[]);
-	void *(*dv_create_flow_action_packet_reformat)
-		(struct ibv_context *ctx,
-		 enum mlx5dv_flow_action_packet_reformat_type reformat_type,
-		 enum mlx5dv_flow_table_type ft_type,
-		 struct mlx5dv_dr_domain *domain,
-		 uint32_t flags, size_t data_sz, void *data);
-	void *(*dv_create_flow_action_tag)(uint32_t tag);
-	void *(*dv_create_flow_action_meter)
-		(struct mlx5dv_dr_flow_meter_attr *attr);
-	int (*dv_modify_flow_action_meter)(void *action,
-		struct mlx5dv_dr_flow_meter_attr *attr, uint64_t modify_bits);
-	int (*dv_destroy_flow)(void *flow);
-	int (*dv_destroy_flow_matcher)(void *matcher);
-	struct ibv_context *(*dv_open_device)(struct ibv_device *device);
-	struct mlx5dv_devx_obj *(*devx_obj_create)
-					(struct ibv_context *ctx,
-					 const void *in, size_t inlen,
-					 void *out, size_t outlen);
-	int (*devx_obj_destroy)(struct mlx5dv_devx_obj *obj);
-	int (*devx_obj_query)(struct mlx5dv_devx_obj *obj,
-			      const void *in, size_t inlen,
-			      void *out, size_t outlen);
-	int (*devx_obj_modify)(struct mlx5dv_devx_obj *obj,
-			       const void *in, size_t inlen,
-			       void *out, size_t outlen);
-	int (*devx_general_cmd)(struct ibv_context *context,
-				const void *in, size_t inlen,
-				void *out, size_t outlen);
-	struct mlx5dv_devx_cmd_comp *(*devx_create_cmd_comp)
-					(struct ibv_context *context);
-	void (*devx_destroy_cmd_comp)(struct mlx5dv_devx_cmd_comp *cmd_comp);
-	int (*devx_obj_query_async)(struct mlx5dv_devx_obj *obj,
-				    const void *in, size_t inlen,
-				    size_t outlen, uint64_t wr_id,
-				    struct mlx5dv_devx_cmd_comp *cmd_comp);
-	int (*devx_get_async_cmd_comp)(struct mlx5dv_devx_cmd_comp *cmd_comp,
-				       struct mlx5dv_devx_async_cmd_hdr *resp,
-				       size_t cmd_resp_len);
-	struct mlx5dv_devx_umem *(*devx_umem_reg)(struct ibv_context *context,
-						  void *addr, size_t size,
-						  uint32_t access);
-	int (*devx_umem_dereg)(struct mlx5dv_devx_umem *dv_devx_umem);
-	int (*devx_qp_query)(struct ibv_qp *qp,
-			     const void *in, size_t inlen,
-			     void *out, size_t outlen);
-	int (*devx_port_query)(struct ibv_context *ctx,
-			       uint32_t port_num,
-			       struct mlx5dv_devx_port *mlx5_devx_port);
-	int (*dr_dump_domain)(FILE *file, void *domain);
-};
-
-const struct mlx5_glue *mlx5_glue;
-
-#endif /* MLX5_GLUE_H_ */
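Note on the header removed above: when an installed rdma-core lacks a type, the header forward-declares it so that the function-pointer table still compiles, since a pointer to an incomplete type is legal C. The following is a minimal sketch of that technique under stated assumptions: struct demo_widget, struct demo_ops and DEMO_HAVE_WIDGET are hypothetical names, not part of DPDK or rdma-core.

```c
#include <errno.h>
#include <stddef.h>

/* DEMO_HAVE_WIDGET plays the role of an autoconf-style HAVE_* macro. */
#ifdef DEMO_HAVE_WIDGET
struct demo_widget { int refcnt; };
#else
struct demo_widget; /* incomplete type: only pointers to it are used */
#endif

/* The table layout is identical whether or not the type is complete. */
struct demo_ops {
	int (*widget_destroy)(struct demo_widget *w);
};

static int
demo_widget_destroy(struct demo_widget *w)
{
#ifdef DEMO_HAVE_WIDGET
	w->refcnt = 0;
	return 0;
#else
	(void)w; /* the type is opaque here, so it cannot be dereferenced */
	return -ENOTSUP;
#endif
}

static const struct demo_ops demo_ops = {
	.widget_destroy = demo_widget_destroy,
};
```

Keeping the table layout independent of the build configuration is what lets a single version string describe the structure.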
diff --git a/drivers/net/mlx5/mlx5_mac.c b/drivers/net/mlx5/mlx5_mac.c
index 7bdaa2a..a646b90 100644
--- a/drivers/net/mlx5/mlx5_mac.c
+++ b/drivers/net/mlx5/mlx5_mac.c
@@ -27,10 +27,10 @@
 #include <rte_ethdev_driver.h>
 #include <rte_common.h>
 
+#include "mlx5_defs.h"
 #include "mlx5.h"
 #include "mlx5_utils.h"
 #include "mlx5_rxtx.h"
-#include "mlx5_defs.h"
 
 /**
  * Get MAC address by querying netdevice.
diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index 0d549b6..b1cd9f7 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -17,10 +17,11 @@
 #include <rte_rwlock.h>
 #include <rte_bus_pci.h>
 
+#include <mlx5_glue.h>
+
 #include "mlx5.h"
 #include "mlx5_mr.h"
 #include "mlx5_rxtx.h"
-#include "mlx5_glue.h"
 
 struct mr_find_contig_memsegs_data {
 	uintptr_t addr;
diff --git a/drivers/net/mlx5/mlx5_prm.h b/drivers/net/mlx5/mlx5_prm.h
deleted file mode 100644
index 8a67025..0000000
--- a/drivers/net/mlx5/mlx5_prm.h
+++ /dev/null
@@ -1,1888 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause
- * Copyright 2016 6WIND S.A.
- * Copyright 2016 Mellanox Technologies, Ltd
- */
-
-#ifndef RTE_PMD_MLX5_PRM_H_
-#define RTE_PMD_MLX5_PRM_H_
-
-#include <assert.h>
-
-/* Verbs header. */
-/* ISO C doesn't support unnamed structs/unions, disabling -pedantic. */
-#ifdef PEDANTIC
-#pragma GCC diagnostic ignored "-Wpedantic"
-#endif
-#include <infiniband/mlx5dv.h>
-#ifdef PEDANTIC
-#pragma GCC diagnostic error "-Wpedantic"
-#endif
-
-#include <rte_vect.h>
-#include "mlx5_autoconf.h"
-
-/* RSS hash key size. */
-#define MLX5_RSS_HASH_KEY_LEN 40
-
-/* Get CQE owner bit. */
-#define MLX5_CQE_OWNER(op_own) ((op_own) & MLX5_CQE_OWNER_MASK)
-
-/* Get CQE format. */
-#define MLX5_CQE_FORMAT(op_own) (((op_own) & MLX5E_CQE_FORMAT_MASK) >> 2)
-
-/* Get CQE opcode. */
-#define MLX5_CQE_OPCODE(op_own) (((op_own) & 0xf0) >> 4)
-
-/* Get CQE solicited event. */
-#define MLX5_CQE_SE(op_own) (((op_own) >> 1) & 1)
-
-/* Invalidate a CQE. */
-#define MLX5_CQE_INVALIDATE (MLX5_CQE_INVALID << 4)
-
-/* WQE Segment sizes in bytes. */
-#define MLX5_WSEG_SIZE 16u
-#define MLX5_WQE_CSEG_SIZE sizeof(struct mlx5_wqe_cseg)
-#define MLX5_WQE_DSEG_SIZE sizeof(struct mlx5_wqe_dseg)
-#define MLX5_WQE_ESEG_SIZE sizeof(struct mlx5_wqe_eseg)
-
-/* WQE/WQEBB size in bytes. */
-#define MLX5_WQE_SIZE sizeof(struct mlx5_wqe)
-
-/*
- * Max size of a WQE session.
- * Absolute maximum size is 63 (MLX5_DSEG_MAX) segments,
- * the WQE size field in Control Segment is 6 bits wide.
- */
-#define MLX5_WQE_SIZE_MAX (60 * MLX5_WSEG_SIZE)
-
-/*
- * Default minimum number of Tx queues for inlining packets.
- * If there are less queues as specified we assume we have
- * no enough CPU resources (cycles) to perform inlining,
- * the PCIe throughput is not supposed as bottleneck and
- * inlining is disabled.
- */
-#define MLX5_INLINE_MAX_TXQS 8u
-#define MLX5_INLINE_MAX_TXQS_BLUEFIELD 16u
-
-/*
- * Default packet length threshold to be inlined with
- * enhanced MPW. If packet length exceeds the threshold
- * the data are not inlined. Should be aligned in WQEBB
- * boundary with accounting the title Control and Ethernet
- * segments.
- */
-#define MLX5_EMPW_DEF_INLINE_LEN (4u * MLX5_WQE_SIZE + \
-				  MLX5_DSEG_MIN_INLINE_SIZE)
-/*
- * Maximal inline data length sent with enhanced MPW.
- * Is based on maximal WQE size.
- */
-#define MLX5_EMPW_MAX_INLINE_LEN (MLX5_WQE_SIZE_MAX - \
-				  MLX5_WQE_CSEG_SIZE - \
-				  MLX5_WQE_ESEG_SIZE - \
-				  MLX5_WQE_DSEG_SIZE + \
-				  MLX5_DSEG_MIN_INLINE_SIZE)
-/*
- * Minimal amount of packets to be sent with EMPW.
- * This limits the minimal required size of sent EMPW.
- * If there are no enough resources to built minimal
- * EMPW the sending loop exits.
- */
-#define MLX5_EMPW_MIN_PACKETS (2u + 3u * 4u)
-/*
- * Maximal amount of packets to be sent with EMPW.
- * This value is not recommended to exceed MLX5_TX_COMP_THRESH,
- * otherwise there might be up to MLX5_EMPW_MAX_PACKETS mbufs
- * without CQE generation request, being multiplied by
- * MLX5_TX_COMP_MAX_CQE it may cause significant latency
- * in tx burst routine at the moment of freeing multiple mbufs.
- */
-#define MLX5_EMPW_MAX_PACKETS MLX5_TX_COMP_THRESH
-#define MLX5_MPW_MAX_PACKETS 6
-#define MLX5_MPW_INLINE_MAX_PACKETS 2
-
-/*
- * Default packet length threshold to be inlined with
- * ordinary SEND. Inlining saves the MR key search
- * and extra PCIe data fetch transaction, but eats the
- * CPU cycles.
- */
-#define MLX5_SEND_DEF_INLINE_LEN (5U * MLX5_WQE_SIZE + \
-				  MLX5_ESEG_MIN_INLINE_SIZE - \
-				  MLX5_WQE_CSEG_SIZE - \
-				  MLX5_WQE_ESEG_SIZE - \
-				  MLX5_WQE_DSEG_SIZE)
-/*
- * Maximal inline data length sent with ordinary SEND.
- * Is based on maximal WQE size.
- */
-#define MLX5_SEND_MAX_INLINE_LEN (MLX5_WQE_SIZE_MAX - \
-				  MLX5_WQE_CSEG_SIZE - \
-				  MLX5_WQE_ESEG_SIZE - \
-				  MLX5_WQE_DSEG_SIZE + \
-				  MLX5_ESEG_MIN_INLINE_SIZE)
-
-/* Missed in mlv5dv.h, should define here. */
-#define MLX5_OPCODE_ENHANCED_MPSW 0x29u
-
-/* CQE value to inform that VLAN is stripped. */
-#define MLX5_CQE_VLAN_STRIPPED (1u << 0)
-
-/* IPv4 options. */
-#define MLX5_CQE_RX_IP_EXT_OPTS_PACKET (1u << 1)
-
-/* IPv6 packet. */
-#define MLX5_CQE_RX_IPV6_PACKET (1u << 2)
-
-/* IPv4 packet. */
-#define MLX5_CQE_RX_IPV4_PACKET (1u << 3)
-
-/* TCP packet. */
-#define MLX5_CQE_RX_TCP_PACKET (1u << 4)
-
-/* UDP packet. */
-#define MLX5_CQE_RX_UDP_PACKET (1u << 5)
-
-/* IP is fragmented. */
-#define MLX5_CQE_RX_IP_FRAG_PACKET (1u << 7)
-
-/* L2 header is valid. */
-#define MLX5_CQE_RX_L2_HDR_VALID (1u << 8)
-
-/* L3 header is valid. */
-#define MLX5_CQE_RX_L3_HDR_VALID (1u << 9)
-
-/* L4 header is valid. */
-#define MLX5_CQE_RX_L4_HDR_VALID (1u << 10)
-
-/* Outer packet, 0 IPv4, 1 IPv6. */
-#define MLX5_CQE_RX_OUTER_PACKET (1u << 1)
-
-/* Tunnel packet bit in the CQE. */
-#define MLX5_CQE_RX_TUNNEL_PACKET (1u << 0)
-
-/* Mask for LRO push flag in the CQE lro_tcppsh_abort_dupack field. */
-#define MLX5_CQE_LRO_PUSH_MASK 0x40
-
-/* Mask for L4 type in the CQE hdr_type_etc field. */
-#define MLX5_CQE_L4_TYPE_MASK 0x70
-
-/* The bit index of L4 type in CQE hdr_type_etc field. */
-#define MLX5_CQE_L4_TYPE_SHIFT 0x4
-
-/* L4 type to indicate TCP packet without acknowledgment. */
-#define MLX5_L4_HDR_TYPE_TCP_EMPTY_ACK 0x3
-
-/* L4 type to indicate TCP packet with acknowledgment. */
-#define MLX5_L4_HDR_TYPE_TCP_WITH_ACL 0x4
-
-/* Inner L3 checksum offload (Tunneled packets only). */
-#define MLX5_ETH_WQE_L3_INNER_CSUM (1u << 4)
-
-/* Inner L4 checksum offload (Tunneled packets only). */
-#define MLX5_ETH_WQE_L4_INNER_CSUM (1u << 5)
-
-/* Outer L4 type is TCP. */
-#define MLX5_ETH_WQE_L4_OUTER_TCP  (0u << 5)
-
-/* Outer L4 type is UDP. */
-#define MLX5_ETH_WQE_L4_OUTER_UDP  (1u << 5)
-
-/* Outer L3 type is IPV4. */
-#define MLX5_ETH_WQE_L3_OUTER_IPV4 (0u << 4)
-
-/* Outer L3 type is IPV6. */
-#define MLX5_ETH_WQE_L3_OUTER_IPV6 (1u << 4)
-
-/* Inner L4 type is TCP. */
-#define MLX5_ETH_WQE_L4_INNER_TCP (0u << 1)
-
-/* Inner L4 type is UDP. */
-#define MLX5_ETH_WQE_L4_INNER_UDP (1u << 1)
-
-/* Inner L3 type is IPV4. */
-#define MLX5_ETH_WQE_L3_INNER_IPV4 (0u << 0)
-
-/* Inner L3 type is IPV6. */
-#define MLX5_ETH_WQE_L3_INNER_IPV6 (1u << 0)
-
-/* VLAN insertion flag. */
-#define MLX5_ETH_WQE_VLAN_INSERT (1u << 31)
-
-/* Data inline segment flag. */
-#define MLX5_ETH_WQE_DATA_INLINE (1u << 31)
-
-/* Is flow mark valid. */
-#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
-#define MLX5_FLOW_MARK_IS_VALID(val) ((val) & 0xffffff00)
-#else
-#define MLX5_FLOW_MARK_IS_VALID(val) ((val) & 0xffffff)
-#endif
-
-/* INVALID is used by packets matching no flow rules. */
-#define MLX5_FLOW_MARK_INVALID 0
-
-/* Maximum allowed value to mark a packet. */
-#define MLX5_FLOW_MARK_MAX 0xfffff0
-
-/* Default mark value used when none is provided. */
-#define MLX5_FLOW_MARK_DEFAULT 0xffffff
-
-/* Default mark mask for metadata legacy mode. */
-#define MLX5_FLOW_MARK_MASK 0xffffff
-
-/* Maximum number of DS in WQE. Limited by 6-bit field. */
-#define MLX5_DSEG_MAX 63
-
-/* The completion mode offset in the WQE control segment line 2. */
-#define MLX5_COMP_MODE_OFFSET 2
-
-/* Amount of data bytes in minimal inline data segment. */
-#define MLX5_DSEG_MIN_INLINE_SIZE 12u
-
-/* Amount of data bytes in minimal inline eth segment. */
-#define MLX5_ESEG_MIN_INLINE_SIZE 18u
-
-/* Amount of data bytes after eth data segment. */
-#define MLX5_ESEG_EXTRA_DATA_SIZE 32u
-
-/* The maximum log value of segments per RQ WQE. */
-#define MLX5_MAX_LOG_RQ_SEGS 5u
-
-/* The alignment needed for WQ buffer. */
-#define MLX5_WQE_BUF_ALIGNMENT 512
-
-/* Completion mode. */
-enum mlx5_completion_mode {
-	MLX5_COMP_ONLY_ERR = 0x0,
-	MLX5_COMP_ONLY_FIRST_ERR = 0x1,
-	MLX5_COMP_ALWAYS = 0x2,
-	MLX5_COMP_CQE_AND_EQE = 0x3,
-};
-
-/* MPW mode. */
-enum mlx5_mpw_mode {
-	MLX5_MPW_DISABLED,
-	MLX5_MPW,
-	MLX5_MPW_ENHANCED, /* Enhanced Multi-Packet Send WQE, a.k.a MPWv2. */
-};
-
-/* WQE Control segment. */
-struct mlx5_wqe_cseg {
-	uint32_t opcode;
-	uint32_t sq_ds;
-	uint32_t flags;
-	uint32_t misc;
-} __rte_packed __rte_aligned(MLX5_WSEG_SIZE);
-
-/* Header of data segment. Minimal size Data Segment */
-struct mlx5_wqe_dseg {
-	uint32_t bcount;
-	union {
-		uint8_t inline_data[MLX5_DSEG_MIN_INLINE_SIZE];
-		struct {
-			uint32_t lkey;
-			uint64_t pbuf;
-		} __rte_packed;
-	};
-} __rte_packed;
-
-/* Subset of struct WQE Ethernet Segment. */
-struct mlx5_wqe_eseg {
-	union {
-		struct {
-			uint32_t swp_offs;
-			uint8_t	cs_flags;
-			uint8_t	swp_flags;
-			uint16_t mss;
-			uint32_t metadata;
-			uint16_t inline_hdr_sz;
-			union {
-				uint16_t inline_data;
-				uint16_t vlan_tag;
-			};
-		} __rte_packed;
-		struct {
-			uint32_t offsets;
-			uint32_t flags;
-			uint32_t flow_metadata;
-			uint32_t inline_hdr;
-		} __rte_packed;
-	};
-} __rte_packed;
-
-/* The title WQEBB, header of WQE. */
-struct mlx5_wqe {
-	union {
-		struct mlx5_wqe_cseg cseg;
-		uint32_t ctrl[4];
-	};
-	struct mlx5_wqe_eseg eseg;
-	union {
-		struct mlx5_wqe_dseg dseg[2];
-		uint8_t data[MLX5_ESEG_EXTRA_DATA_SIZE];
-	};
-} __rte_packed;
-
-/* WQE for Multi-Packet RQ. */
-struct mlx5_wqe_mprq {
-	struct mlx5_wqe_srq_next_seg next_seg;
-	struct mlx5_wqe_data_seg dseg;
-};
-
-#define MLX5_MPRQ_LEN_MASK 0x000ffff
-#define MLX5_MPRQ_LEN_SHIFT 0
-#define MLX5_MPRQ_STRIDE_NUM_MASK 0x3fff0000
-#define MLX5_MPRQ_STRIDE_NUM_SHIFT 16
-#define MLX5_MPRQ_FILLER_MASK 0x80000000
-#define MLX5_MPRQ_FILLER_SHIFT 31
-
-#define MLX5_MPRQ_STRIDE_SHIFT_BYTE 2
-
-/* CQ element structure - should be equal to the cache line size */
-struct mlx5_cqe {
-#if (RTE_CACHE_LINE_SIZE == 128)
-	uint8_t padding[64];
-#endif
-	uint8_t pkt_info;
-	uint8_t rsvd0;
-	uint16_t wqe_id;
-	uint8_t lro_tcppsh_abort_dupack;
-	uint8_t lro_min_ttl;
-	uint16_t lro_tcp_win;
-	uint32_t lro_ack_seq_num;
-	uint32_t rx_hash_res;
-	uint8_t rx_hash_type;
-	uint8_t rsvd1[3];
-	uint16_t csum;
-	uint8_t rsvd2[6];
-	uint16_t hdr_type_etc;
-	uint16_t vlan_info;
-	uint8_t lro_num_seg;
-	uint8_t rsvd3[3];
-	uint32_t flow_table_metadata;
-	uint8_t rsvd4[4];
-	uint32_t byte_cnt;
-	uint64_t timestamp;
-	uint32_t sop_drop_qpn;
-	uint16_t wqe_counter;
-	uint8_t rsvd5;
-	uint8_t op_own;
-};
-
-/* Adding direct verbs to data-path. */
-
-/* CQ sequence number mask. */
-#define MLX5_CQ_SQN_MASK 0x3
-
-/* CQ sequence number index. */
-#define MLX5_CQ_SQN_OFFSET 28
-
-/* CQ doorbell index mask. */
-#define MLX5_CI_MASK 0xffffff
-
-/* CQ doorbell offset. */
-#define MLX5_CQ_ARM_DB 1
-
-/* CQ doorbell offset*/
-#define MLX5_CQ_DOORBELL 0x20
-
-/* CQE format value. */
-#define MLX5_COMPRESSED 0x3
-
-/* Action type of header modification. */
-enum {
-	MLX5_MODIFICATION_TYPE_SET = 0x1,
-	MLX5_MODIFICATION_TYPE_ADD = 0x2,
-	MLX5_MODIFICATION_TYPE_COPY = 0x3,
-};
-
-/* The field of packet to be modified. */
-enum mlx5_modification_field {
-	MLX5_MODI_OUT_NONE = -1,
-	MLX5_MODI_OUT_SMAC_47_16 = 1,
-	MLX5_MODI_OUT_SMAC_15_0,
-	MLX5_MODI_OUT_ETHERTYPE,
-	MLX5_MODI_OUT_DMAC_47_16,
-	MLX5_MODI_OUT_DMAC_15_0,
-	MLX5_MODI_OUT_IP_DSCP,
-	MLX5_MODI_OUT_TCP_FLAGS,
-	MLX5_MODI_OUT_TCP_SPORT,
-	MLX5_MODI_OUT_TCP_DPORT,
-	MLX5_MODI_OUT_IPV4_TTL,
-	MLX5_MODI_OUT_UDP_SPORT,
-	MLX5_MODI_OUT_UDP_DPORT,
-	MLX5_MODI_OUT_SIPV6_127_96,
-	MLX5_MODI_OUT_SIPV6_95_64,
-	MLX5_MODI_OUT_SIPV6_63_32,
-	MLX5_MODI_OUT_SIPV6_31_0,
-	MLX5_MODI_OUT_DIPV6_127_96,
-	MLX5_MODI_OUT_DIPV6_95_64,
-	MLX5_MODI_OUT_DIPV6_63_32,
-	MLX5_MODI_OUT_DIPV6_31_0,
-	MLX5_MODI_OUT_SIPV4,
-	MLX5_MODI_OUT_DIPV4,
-	MLX5_MODI_OUT_FIRST_VID,
-	MLX5_MODI_IN_SMAC_47_16 = 0x31,
-	MLX5_MODI_IN_SMAC_15_0,
-	MLX5_MODI_IN_ETHERTYPE,
-	MLX5_MODI_IN_DMAC_47_16,
-	MLX5_MODI_IN_DMAC_15_0,
-	MLX5_MODI_IN_IP_DSCP,
-	MLX5_MODI_IN_TCP_FLAGS,
-	MLX5_MODI_IN_TCP_SPORT,
-	MLX5_MODI_IN_TCP_DPORT,
-	MLX5_MODI_IN_IPV4_TTL,
-	MLX5_MODI_IN_UDP_SPORT,
-	MLX5_MODI_IN_UDP_DPORT,
-	MLX5_MODI_IN_SIPV6_127_96,
-	MLX5_MODI_IN_SIPV6_95_64,
-	MLX5_MODI_IN_SIPV6_63_32,
-	MLX5_MODI_IN_SIPV6_31_0,
-	MLX5_MODI_IN_DIPV6_127_96,
-	MLX5_MODI_IN_DIPV6_95_64,
-	MLX5_MODI_IN_DIPV6_63_32,
-	MLX5_MODI_IN_DIPV6_31_0,
-	MLX5_MODI_IN_SIPV4,
-	MLX5_MODI_IN_DIPV4,
-	MLX5_MODI_OUT_IPV6_HOPLIMIT,
-	MLX5_MODI_IN_IPV6_HOPLIMIT,
-	MLX5_MODI_META_DATA_REG_A,
-	MLX5_MODI_META_DATA_REG_B = 0x50,
-	MLX5_MODI_META_REG_C_0,
-	MLX5_MODI_META_REG_C_1,
-	MLX5_MODI_META_REG_C_2,
-	MLX5_MODI_META_REG_C_3,
-	MLX5_MODI_META_REG_C_4,
-	MLX5_MODI_META_REG_C_5,
-	MLX5_MODI_META_REG_C_6,
-	MLX5_MODI_META_REG_C_7,
-	MLX5_MODI_OUT_TCP_SEQ_NUM,
-	MLX5_MODI_IN_TCP_SEQ_NUM,
-	MLX5_MODI_OUT_TCP_ACK_NUM,
-	MLX5_MODI_IN_TCP_ACK_NUM = 0x5C,
-};
-
-/* Total number of metadata reg_c's. */
-#define MLX5_MREG_C_NUM (MLX5_MODI_META_REG_C_7 - MLX5_MODI_META_REG_C_0 + 1)
-
-enum modify_reg {
-	REG_NONE = 0,
-	REG_A,
-	REG_B,
-	REG_C_0,
-	REG_C_1,
-	REG_C_2,
-	REG_C_3,
-	REG_C_4,
-	REG_C_5,
-	REG_C_6,
-	REG_C_7,
-};
-
-/* Modification sub command. */
-struct mlx5_modification_cmd {
-	union {
-		uint32_t data0;
-		struct {
-			unsigned int length:5;
-			unsigned int rsvd0:3;
-			unsigned int offset:5;
-			unsigned int rsvd1:3;
-			unsigned int field:12;
-			unsigned int action_type:4;
-		};
-	};
-	union {
-		uint32_t data1;
-		uint8_t data[4];
-		struct {
-			unsigned int rsvd2:8;
-			unsigned int dst_offset:5;
-			unsigned int rsvd3:3;
-			unsigned int dst_field:12;
-			unsigned int rsvd4:4;
-		};
-	};
-};
-
-typedef uint32_t u32;
-typedef uint16_t u16;
-typedef uint8_t u8;
-
-#define __mlx5_nullp(typ) ((struct mlx5_ifc_##typ##_bits *)0)
-#define __mlx5_bit_sz(typ, fld) sizeof(__mlx5_nullp(typ)->fld)
-#define __mlx5_bit_off(typ, fld) ((unsigned int)(unsigned long) \
-				  (&(__mlx5_nullp(typ)->fld)))
-#define __mlx5_dw_bit_off(typ, fld) (32 - __mlx5_bit_sz(typ, fld) - \
-				    (__mlx5_bit_off(typ, fld) & 0x1f))
-#define __mlx5_dw_off(typ, fld) (__mlx5_bit_off(typ, fld) / 32)
-#define __mlx5_64_off(typ, fld) (__mlx5_bit_off(typ, fld) / 64)
-#define __mlx5_dw_mask(typ, fld) (__mlx5_mask(typ, fld) << \
-				  __mlx5_dw_bit_off(typ, fld))
-#define __mlx5_mask(typ, fld) ((u32)((1ull << __mlx5_bit_sz(typ, fld)) - 1))
-#define __mlx5_16_off(typ, fld) (__mlx5_bit_off(typ, fld) / 16)
-#define __mlx5_16_bit_off(typ, fld) (16 - __mlx5_bit_sz(typ, fld) - \
-				    (__mlx5_bit_off(typ, fld) & 0xf))
-#define __mlx5_mask16(typ, fld) ((u16)((1ull << __mlx5_bit_sz(typ, fld)) - 1))
-#define MLX5_ST_SZ_BYTES(typ) (sizeof(struct mlx5_ifc_##typ##_bits) / 8)
-#define MLX5_ST_SZ_DW(typ) (sizeof(struct mlx5_ifc_##typ##_bits) / 32)
-#define MLX5_BYTE_OFF(typ, fld) (__mlx5_bit_off(typ, fld) / 8)
-#define MLX5_ADDR_OF(typ, p, fld) ((char *)(p) + MLX5_BYTE_OFF(typ, fld))
-
-/* insert a value to a struct */
-#define MLX5_SET(typ, p, fld, v) \
-	do { \
-		u32 _v = v; \
-		*((__be32 *)(p) + __mlx5_dw_off(typ, fld)) = \
-		rte_cpu_to_be_32((rte_be_to_cpu_32(*((u32 *)(p) + \
-				  __mlx5_dw_off(typ, fld))) & \
-				  (~__mlx5_dw_mask(typ, fld))) | \
-				 (((_v) & __mlx5_mask(typ, fld)) << \
-				   __mlx5_dw_bit_off(typ, fld))); \
-	} while (0)
-
-#define MLX5_SET64(typ, p, fld, v) \
-	do { \
-		assert(__mlx5_bit_sz(typ, fld) == 64); \
-		*((__be64 *)(p) + __mlx5_64_off(typ, fld)) = \
-			rte_cpu_to_be_64(v); \
-	} while (0)
-
-#define MLX5_GET(typ, p, fld) \
-	((rte_be_to_cpu_32(*((__be32 *)(p) +\
-	__mlx5_dw_off(typ, fld))) >> __mlx5_dw_bit_off(typ, fld)) & \
-	__mlx5_mask(typ, fld))
-#define MLX5_GET16(typ, p, fld) \
-	((rte_be_to_cpu_16(*((__be16 *)(p) + \
-	  __mlx5_16_off(typ, fld))) >> __mlx5_16_bit_off(typ, fld)) & \
-	 __mlx5_mask16(typ, fld))
-#define MLX5_GET64(typ, p, fld) rte_be_to_cpu_64(*((__be64 *)(p) + \
-						   __mlx5_64_off(typ, fld)))
-#define MLX5_FLD_SZ_BYTES(typ, fld) (__mlx5_bit_sz(typ, fld) / 8)
-
-struct mlx5_ifc_fte_match_set_misc_bits {
-	u8 gre_c_present[0x1];
-	u8 reserved_at_1[0x1];
-	u8 gre_k_present[0x1];
-	u8 gre_s_present[0x1];
-	u8 source_vhci_port[0x4];
-	u8 source_sqn[0x18];
-	u8 reserved_at_20[0x10];
-	u8 source_port[0x10];
-	u8 outer_second_prio[0x3];
-	u8 outer_second_cfi[0x1];
-	u8 outer_second_vid[0xc];
-	u8 inner_second_prio[0x3];
-	u8 inner_second_cfi[0x1];
-	u8 inner_second_vid[0xc];
-	u8 outer_second_cvlan_tag[0x1];
-	u8 inner_second_cvlan_tag[0x1];
-	u8 outer_second_svlan_tag[0x1];
-	u8 inner_second_svlan_tag[0x1];
-	u8 reserved_at_64[0xc];
-	u8 gre_protocol[0x10];
-	u8 gre_key_h[0x18];
-	u8 gre_key_l[0x8];
-	u8 vxlan_vni[0x18];
-	u8 reserved_at_b8[0x8];
-	u8 geneve_vni[0x18];
-	u8 reserved_at_e4[0x7];
-	u8 geneve_oam[0x1];
-	u8 reserved_at_e0[0xc];
-	u8 outer_ipv6_flow_label[0x14];
-	u8 reserved_at_100[0xc];
-	u8 inner_ipv6_flow_label[0x14];
-	u8 reserved_at_120[0xa];
-	u8 geneve_opt_len[0x6];
-	u8 geneve_protocol_type[0x10];
-	u8 reserved_at_140[0xc0];
-};
-
-struct mlx5_ifc_ipv4_layout_bits {
-	u8 reserved_at_0[0x60];
-	u8 ipv4[0x20];
-};
-
-struct mlx5_ifc_ipv6_layout_bits {
-	u8 ipv6[16][0x8];
-};
-
-union mlx5_ifc_ipv6_layout_ipv4_layout_auto_bits {
-	struct mlx5_ifc_ipv6_layout_bits ipv6_layout;
-	struct mlx5_ifc_ipv4_layout_bits ipv4_layout;
-	u8 reserved_at_0[0x80];
-};
-
-struct mlx5_ifc_fte_match_set_lyr_2_4_bits {
-	u8 smac_47_16[0x20];
-	u8 smac_15_0[0x10];
-	u8 ethertype[0x10];
-	u8 dmac_47_16[0x20];
-	u8 dmac_15_0[0x10];
-	u8 first_prio[0x3];
-	u8 first_cfi[0x1];
-	u8 first_vid[0xc];
-	u8 ip_protocol[0x8];
-	u8 ip_dscp[0x6];
-	u8 ip_ecn[0x2];
-	u8 cvlan_tag[0x1];
-	u8 svlan_tag[0x1];
-	u8 frag[0x1];
-	u8 ip_version[0x4];
-	u8 tcp_flags[0x9];
-	u8 tcp_sport[0x10];
-	u8 tcp_dport[0x10];
-	u8 reserved_at_c0[0x20];
-	u8 udp_sport[0x10];
-	u8 udp_dport[0x10];
-	union mlx5_ifc_ipv6_layout_ipv4_layout_auto_bits src_ipv4_src_ipv6;
-	union mlx5_ifc_ipv6_layout_ipv4_layout_auto_bits dst_ipv4_dst_ipv6;
-};
-
-struct mlx5_ifc_fte_match_mpls_bits {
-	u8 mpls_label[0x14];
-	u8 mpls_exp[0x3];
-	u8 mpls_s_bos[0x1];
-	u8 mpls_ttl[0x8];
-};
-
-struct mlx5_ifc_fte_match_set_misc2_bits {
-	struct mlx5_ifc_fte_match_mpls_bits outer_first_mpls;
-	struct mlx5_ifc_fte_match_mpls_bits inner_first_mpls;
-	struct mlx5_ifc_fte_match_mpls_bits outer_first_mpls_over_gre;
-	struct mlx5_ifc_fte_match_mpls_bits outer_first_mpls_over_udp;
-	u8 metadata_reg_c_7[0x20];
-	u8 metadata_reg_c_6[0x20];
-	u8 metadata_reg_c_5[0x20];
-	u8 metadata_reg_c_4[0x20];
-	u8 metadata_reg_c_3[0x20];
-	u8 metadata_reg_c_2[0x20];
-	u8 metadata_reg_c_1[0x20];
-	u8 metadata_reg_c_0[0x20];
-	u8 metadata_reg_a[0x20];
-	u8 metadata_reg_b[0x20];
-	u8 reserved_at_1c0[0x40];
-};
-
-struct mlx5_ifc_fte_match_set_misc3_bits {
-	u8 inner_tcp_seq_num[0x20];
-	u8 outer_tcp_seq_num[0x20];
-	u8 inner_tcp_ack_num[0x20];
-	u8 outer_tcp_ack_num[0x20];
-	u8 reserved_at_auto1[0x8];
-	u8 outer_vxlan_gpe_vni[0x18];
-	u8 outer_vxlan_gpe_next_protocol[0x8];
-	u8 outer_vxlan_gpe_flags[0x8];
-	u8 reserved_at_a8[0x10];
-	u8 icmp_header_data[0x20];
-	u8 icmpv6_header_data[0x20];
-	u8 icmp_type[0x8];
-	u8 icmp_code[0x8];
-	u8 icmpv6_type[0x8];
-	u8 icmpv6_code[0x8];
-	u8 reserved_at_120[0x20];
-	u8 gtpu_teid[0x20];
-	u8 gtpu_msg_type[0x08];
-	u8 gtpu_msg_flags[0x08];
-	u8 reserved_at_170[0x90];
-};
-
-/* Flow matcher. */
-struct mlx5_ifc_fte_match_param_bits {
-	struct mlx5_ifc_fte_match_set_lyr_2_4_bits outer_headers;
-	struct mlx5_ifc_fte_match_set_misc_bits misc_parameters;
-	struct mlx5_ifc_fte_match_set_lyr_2_4_bits inner_headers;
-	struct mlx5_ifc_fte_match_set_misc2_bits misc_parameters_2;
-	struct mlx5_ifc_fte_match_set_misc3_bits misc_parameters_3;
-};
-
-enum {
-	MLX5_MATCH_CRITERIA_ENABLE_OUTER_BIT,
-	MLX5_MATCH_CRITERIA_ENABLE_MISC_BIT,
-	MLX5_MATCH_CRITERIA_ENABLE_INNER_BIT,
-	MLX5_MATCH_CRITERIA_ENABLE_MISC2_BIT,
-	MLX5_MATCH_CRITERIA_ENABLE_MISC3_BIT
-};
-
-enum {
-	MLX5_CMD_OP_QUERY_HCA_CAP = 0x100,
-	MLX5_CMD_OP_CREATE_MKEY = 0x200,
-	MLX5_CMD_OP_QUERY_NIC_VPORT_CONTEXT = 0x754,
-	MLX5_CMD_OP_ALLOC_TRANSPORT_DOMAIN = 0x816,
-	MLX5_CMD_OP_CREATE_TIR = 0x900,
-	MLX5_CMD_OP_CREATE_SQ = 0X904,
-	MLX5_CMD_OP_MODIFY_SQ = 0X905,
-	MLX5_CMD_OP_CREATE_RQ = 0x908,
-	MLX5_CMD_OP_MODIFY_RQ = 0x909,
-	MLX5_CMD_OP_CREATE_TIS = 0x912,
-	MLX5_CMD_OP_QUERY_TIS = 0x915,
-	MLX5_CMD_OP_CREATE_RQT = 0x916,
-	MLX5_CMD_OP_ALLOC_FLOW_COUNTER = 0x939,
-	MLX5_CMD_OP_QUERY_FLOW_COUNTER = 0x93b,
-};
-
-enum {
-	MLX5_MKC_ACCESS_MODE_MTT   = 0x1,
-};
-
-/* Flow counters. */
-struct mlx5_ifc_alloc_flow_counter_out_bits {
-	u8         status[0x8];
-	u8         reserved_at_8[0x18];
-	u8         syndrome[0x20];
-	u8         flow_counter_id[0x20];
-	u8         reserved_at_60[0x20];
-};
-
-struct mlx5_ifc_alloc_flow_counter_in_bits {
-	u8         opcode[0x10];
-	u8         reserved_at_10[0x10];
-	u8         reserved_at_20[0x10];
-	u8         op_mod[0x10];
-	u8         flow_counter_id[0x20];
-	u8         reserved_at_40[0x18];
-	u8         flow_counter_bulk[0x8];
-};
-
-struct mlx5_ifc_dealloc_flow_counter_out_bits {
-	u8         status[0x8];
-	u8         reserved_at_8[0x18];
-	u8         syndrome[0x20];
-	u8         reserved_at_40[0x40];
-};
-
-struct mlx5_ifc_dealloc_flow_counter_in_bits {
-	u8         opcode[0x10];
-	u8         reserved_at_10[0x10];
-	u8         reserved_at_20[0x10];
-	u8         op_mod[0x10];
-	u8         flow_counter_id[0x20];
-	u8         reserved_at_60[0x20];
-};
-
-struct mlx5_ifc_traffic_counter_bits {
-	u8         packets[0x40];
-	u8         octets[0x40];
-};
-
-struct mlx5_ifc_query_flow_counter_out_bits {
-	u8         status[0x8];
-	u8         reserved_at_8[0x18];
-	u8         syndrome[0x20];
-	u8         reserved_at_40[0x40];
-	struct mlx5_ifc_traffic_counter_bits flow_statistics[];
-};
-
-struct mlx5_ifc_query_flow_counter_in_bits {
-	u8         opcode[0x10];
-	u8         reserved_at_10[0x10];
-	u8         reserved_at_20[0x10];
-	u8         op_mod[0x10];
-	u8         reserved_at_40[0x20];
-	u8         mkey[0x20];
-	u8         address[0x40];
-	u8         clear[0x1];
-	u8         dump_to_memory[0x1];
-	u8         num_of_counters[0x1e];
-	u8         flow_counter_id[0x20];
-};
-
-struct mlx5_ifc_mkc_bits {
-	u8         reserved_at_0[0x1];
-	u8         free[0x1];
-	u8         reserved_at_2[0x1];
-	u8         access_mode_4_2[0x3];
-	u8         reserved_at_6[0x7];
-	u8         relaxed_ordering_write[0x1];
-	u8         reserved_at_e[0x1];
-	u8         small_fence_on_rdma_read_response[0x1];
-	u8         umr_en[0x1];
-	u8         a[0x1];
-	u8         rw[0x1];
-	u8         rr[0x1];
-	u8         lw[0x1];
-	u8         lr[0x1];
-	u8         access_mode_1_0[0x2];
-	u8         reserved_at_18[0x8];
-
-	u8         qpn[0x18];
-	u8         mkey_7_0[0x8];
-
-	u8         reserved_at_40[0x20];
-
-	u8         length64[0x1];
-	u8         bsf_en[0x1];
-	u8         sync_umr[0x1];
-	u8         reserved_at_63[0x2];
-	u8         expected_sigerr_count[0x1];
-	u8         reserved_at_66[0x1];
-	u8         en_rinval[0x1];
-	u8         pd[0x18];
-
-	u8         start_addr[0x40];
-
-	u8         len[0x40];
-
-	u8         bsf_octword_size[0x20];
-
-	u8         reserved_at_120[0x80];
-
-	u8         translations_octword_size[0x20];
-
-	u8         reserved_at_1c0[0x1b];
-	u8         log_page_size[0x5];
-
-	u8         reserved_at_1e0[0x20];
-};
-
-struct mlx5_ifc_create_mkey_out_bits {
-	u8         status[0x8];
-	u8         reserved_at_8[0x18];
-
-	u8         syndrome[0x20];
-
-	u8         reserved_at_40[0x8];
-	u8         mkey_index[0x18];
-
-	u8         reserved_at_60[0x20];
-};
-
-struct mlx5_ifc_create_mkey_in_bits {
-	u8         opcode[0x10];
-	u8         reserved_at_10[0x10];
-
-	u8         reserved_at_20[0x10];
-	u8         op_mod[0x10];
-
-	u8         reserved_at_40[0x20];
-
-	u8         pg_access[0x1];
-	u8         reserved_at_61[0x1f];
-
-	struct mlx5_ifc_mkc_bits memory_key_mkey_entry;
-
-	u8         reserved_at_280[0x80];
-
-	u8         translations_octword_actual_size[0x20];
-
-	u8         mkey_umem_id[0x20];
-
-	u8         mkey_umem_offset[0x40];
-
-	u8         reserved_at_380[0x500];
-
-	u8         klm_pas_mtt[][0x20];
-};
-
-enum {
-	MLX5_GET_HCA_CAP_OP_MOD_GENERAL_DEVICE = 0x0 << 1,
-	MLX5_GET_HCA_CAP_OP_MOD_ETHERNET_OFFLOAD_CAPS = 0x1 << 1,
-	MLX5_GET_HCA_CAP_OP_MOD_QOS_CAP = 0xc << 1,
-};
-
-enum {
-	MLX5_HCA_CAP_OPMOD_GET_MAX   = 0,
-	MLX5_HCA_CAP_OPMOD_GET_CUR   = 1,
-};
-
-enum {
-	MLX5_CAP_INLINE_MODE_L2,
-	MLX5_CAP_INLINE_MODE_VPORT_CONTEXT,
-	MLX5_CAP_INLINE_MODE_NOT_REQUIRED,
-};
-
-enum {
-	MLX5_INLINE_MODE_NONE,
-	MLX5_INLINE_MODE_L2,
-	MLX5_INLINE_MODE_IP,
-	MLX5_INLINE_MODE_TCP_UDP,
-	MLX5_INLINE_MODE_RESERVED4,
-	MLX5_INLINE_MODE_INNER_L2,
-	MLX5_INLINE_MODE_INNER_IP,
-	MLX5_INLINE_MODE_INNER_TCP_UDP,
-};
-
-/* HCA bit masks indicating which Flex parser protocols are already enabled. */
-#define MLX5_HCA_FLEX_IPV4_OVER_VXLAN_ENABLED (1UL << 0)
-#define MLX5_HCA_FLEX_IPV6_OVER_VXLAN_ENABLED (1UL << 1)
-#define MLX5_HCA_FLEX_IPV6_OVER_IP_ENABLED (1UL << 2)
-#define MLX5_HCA_FLEX_GENEVE_ENABLED (1UL << 3)
-#define MLX5_HCA_FLEX_CW_MPLS_OVER_GRE_ENABLED (1UL << 4)
-#define MLX5_HCA_FLEX_CW_MPLS_OVER_UDP_ENABLED (1UL << 5)
-#define MLX5_HCA_FLEX_P_BIT_VXLAN_GPE_ENABLED (1UL << 6)
-#define MLX5_HCA_FLEX_VXLAN_GPE_ENABLED (1UL << 7)
-#define MLX5_HCA_FLEX_ICMP_ENABLED (1UL << 8)
-#define MLX5_HCA_FLEX_ICMPV6_ENABLED (1UL << 9)
-
-struct mlx5_ifc_cmd_hca_cap_bits {
-	u8 reserved_at_0[0x30];
-	u8 vhca_id[0x10];
-	u8 reserved_at_40[0x40];
-	u8 log_max_srq_sz[0x8];
-	u8 log_max_qp_sz[0x8];
-	u8 reserved_at_90[0xb];
-	u8 log_max_qp[0x5];
-	u8 reserved_at_a0[0xb];
-	u8 log_max_srq[0x5];
-	u8 reserved_at_b0[0x10];
-	u8 reserved_at_c0[0x8];
-	u8 log_max_cq_sz[0x8];
-	u8 reserved_at_d0[0xb];
-	u8 log_max_cq[0x5];
-	u8 log_max_eq_sz[0x8];
-	u8 reserved_at_e8[0x2];
-	u8 log_max_mkey[0x6];
-	u8 reserved_at_f0[0x8];
-	u8 dump_fill_mkey[0x1];
-	u8 reserved_at_f9[0x3];
-	u8 log_max_eq[0x4];
-	u8 max_indirection[0x8];
-	u8 fixed_buffer_size[0x1];
-	u8 log_max_mrw_sz[0x7];
-	u8 force_teardown[0x1];
-	u8 reserved_at_111[0x1];
-	u8 log_max_bsf_list_size[0x6];
-	u8 umr_extended_translation_offset[0x1];
-	u8 null_mkey[0x1];
-	u8 log_max_klm_list_size[0x6];
-	u8 reserved_at_120[0xa];
-	u8 log_max_ra_req_dc[0x6];
-	u8 reserved_at_130[0xa];
-	u8 log_max_ra_res_dc[0x6];
-	u8 reserved_at_140[0xa];
-	u8 log_max_ra_req_qp[0x6];
-	u8 reserved_at_150[0xa];
-	u8 log_max_ra_res_qp[0x6];
-	u8 end_pad[0x1];
-	u8 cc_query_allowed[0x1];
-	u8 cc_modify_allowed[0x1];
-	u8 start_pad[0x1];
-	u8 cache_line_128byte[0x1];
-	u8 reserved_at_165[0xa];
-	u8 qcam_reg[0x1];
-	u8 gid_table_size[0x10];
-	u8 out_of_seq_cnt[0x1];
-	u8 vport_counters[0x1];
-	u8 retransmission_q_counters[0x1];
-	u8 debug[0x1];
-	u8 modify_rq_counter_set_id[0x1];
-	u8 rq_delay_drop[0x1];
-	u8 max_qp_cnt[0xa];
-	u8 pkey_table_size[0x10];
-	u8 vport_group_manager[0x1];
-	u8 vhca_group_manager[0x1];
-	u8 ib_virt[0x1];
-	u8 eth_virt[0x1];
-	u8 vnic_env_queue_counters[0x1];
-	u8 ets[0x1];
-	u8 nic_flow_table[0x1];
-	u8 eswitch_manager[0x1];
-	u8 device_memory[0x1];
-	u8 mcam_reg[0x1];
-	u8 pcam_reg[0x1];
-	u8 local_ca_ack_delay[0x5];
-	u8 port_module_event[0x1];
-	u8 enhanced_error_q_counters[0x1];
-	u8 ports_check[0x1];
-	u8 reserved_at_1b3[0x1];
-	u8 disable_link_up[0x1];
-	u8 beacon_led[0x1];
-	u8 port_type[0x2];
-	u8 num_ports[0x8];
-	u8 reserved_at_1c0[0x1];
-	u8 pps[0x1];
-	u8 pps_modify[0x1];
-	u8 log_max_msg[0x5];
-	u8 reserved_at_1c8[0x4];
-	u8 max_tc[0x4];
-	u8 temp_warn_event[0x1];
-	u8 dcbx[0x1];
-	u8 general_notification_event[0x1];
-	u8 reserved_at_1d3[0x2];
-	u8 fpga[0x1];
-	u8 rol_s[0x1];
-	u8 rol_g[0x1];
-	u8 reserved_at_1d8[0x1];
-	u8 wol_s[0x1];
-	u8 wol_g[0x1];
-	u8 wol_a[0x1];
-	u8 wol_b[0x1];
-	u8 wol_m[0x1];
-	u8 wol_u[0x1];
-	u8 wol_p[0x1];
-	u8 stat_rate_support[0x10];
-	u8 reserved_at_1f0[0xc];
-	u8 cqe_version[0x4];
-	u8 compact_address_vector[0x1];
-	u8 striding_rq[0x1];
-	u8 reserved_at_202[0x1];
-	u8 ipoib_enhanced_offloads[0x1];
-	u8 ipoib_basic_offloads[0x1];
-	u8 reserved_at_205[0x1];
-	u8 repeated_block_disabled[0x1];
-	u8 umr_modify_entity_size_disabled[0x1];
-	u8 umr_modify_atomic_disabled[0x1];
-	u8 umr_indirect_mkey_disabled[0x1];
-	u8 umr_fence[0x2];
-	u8 reserved_at_20c[0x3];
-	u8 drain_sigerr[0x1];
-	u8 cmdif_checksum[0x2];
-	u8 sigerr_cqe[0x1];
-	u8 reserved_at_213[0x1];
-	u8 wq_signature[0x1];
-	u8 sctr_data_cqe[0x1];
-	u8 reserved_at_216[0x1];
-	u8 sho[0x1];
-	u8 tph[0x1];
-	u8 rf[0x1];
-	u8 dct[0x1];
-	u8 qos[0x1];
-	u8 eth_net_offloads[0x1];
-	u8 roce[0x1];
-	u8 atomic[0x1];
-	u8 reserved_at_21f[0x1];
-	u8 cq_oi[0x1];
-	u8 cq_resize[0x1];
-	u8 cq_moderation[0x1];
-	u8 reserved_at_223[0x3];
-	u8 cq_eq_remap[0x1];
-	u8 pg[0x1];
-	u8 block_lb_mc[0x1];
-	u8 reserved_at_229[0x1];
-	u8 scqe_break_moderation[0x1];
-	u8 cq_period_start_from_cqe[0x1];
-	u8 cd[0x1];
-	u8 reserved_at_22d[0x1];
-	u8 apm[0x1];
-	u8 vector_calc[0x1];
-	u8 umr_ptr_rlky[0x1];
-	u8 imaicl[0x1];
-	u8 reserved_at_232[0x4];
-	u8 qkv[0x1];
-	u8 pkv[0x1];
-	u8 set_deth_sqpn[0x1];
-	u8 reserved_at_239[0x3];
-	u8 xrc[0x1];
-	u8 ud[0x1];
-	u8 uc[0x1];
-	u8 rc[0x1];
-	u8 uar_4k[0x1];
-	u8 reserved_at_241[0x9];
-	u8 uar_sz[0x6];
-	u8 reserved_at_250[0x8];
-	u8 log_pg_sz[0x8];
-	u8 bf[0x1];
-	u8 driver_version[0x1];
-	u8 pad_tx_eth_packet[0x1];
-	u8 reserved_at_263[0x8];
-	u8 log_bf_reg_size[0x5];
-	u8 reserved_at_270[0xb];
-	u8 lag_master[0x1];
-	u8 num_lag_ports[0x4];
-	u8 reserved_at_280[0x10];
-	u8 max_wqe_sz_sq[0x10];
-	u8 reserved_at_2a0[0x10];
-	u8 max_wqe_sz_rq[0x10];
-	u8 max_flow_counter_31_16[0x10];
-	u8 max_wqe_sz_sq_dc[0x10];
-	u8 reserved_at_2e0[0x7];
-	u8 max_qp_mcg[0x19];
-	u8 reserved_at_300[0x10];
-	u8 flow_counter_bulk_alloc[0x08];
-	u8 log_max_mcg[0x8];
-	u8 reserved_at_320[0x3];
-	u8 log_max_transport_domain[0x5];
-	u8 reserved_at_328[0x3];
-	u8 log_max_pd[0x5];
-	u8 reserved_at_330[0xb];
-	u8 log_max_xrcd[0x5];
-	u8 nic_receive_steering_discard[0x1];
-	u8 receive_discard_vport_down[0x1];
-	u8 transmit_discard_vport_down[0x1];
-	u8 reserved_at_343[0x5];
-	u8 log_max_flow_counter_bulk[0x8];
-	u8 max_flow_counter_15_0[0x10];
-	u8 modify_tis[0x1];
-	u8 flow_counters_dump[0x1];
-	u8 reserved_at_360[0x1];
-	u8 log_max_rq[0x5];
-	u8 reserved_at_368[0x3];
-	u8 log_max_sq[0x5];
-	u8 reserved_at_370[0x3];
-	u8 log_max_tir[0x5];
-	u8 reserved_at_378[0x3];
-	u8 log_max_tis[0x5];
-	u8 basic_cyclic_rcv_wqe[0x1];
-	u8 reserved_at_381[0x2];
-	u8 log_max_rmp[0x5];
-	u8 reserved_at_388[0x3];
-	u8 log_max_rqt[0x5];
-	u8 reserved_at_390[0x3];
-	u8 log_max_rqt_size[0x5];
-	u8 reserved_at_398[0x3];
-	u8 log_max_tis_per_sq[0x5];
-	u8 ext_stride_num_range[0x1];
-	u8 reserved_at_3a1[0x2];
-	u8 log_max_stride_sz_rq[0x5];
-	u8 reserved_at_3a8[0x3];
-	u8 log_min_stride_sz_rq[0x5];
-	u8 reserved_at_3b0[0x3];
-	u8 log_max_stride_sz_sq[0x5];
-	u8 reserved_at_3b8[0x3];
-	u8 log_min_stride_sz_sq[0x5];
-	u8 hairpin[0x1];
-	u8 reserved_at_3c1[0x2];
-	u8 log_max_hairpin_queues[0x5];
-	u8 reserved_at_3c8[0x3];
-	u8 log_max_hairpin_wq_data_sz[0x5];
-	u8 reserved_at_3d0[0x3];
-	u8 log_max_hairpin_num_packets[0x5];
-	u8 reserved_at_3d8[0x3];
-	u8 log_max_wq_sz[0x5];
-	u8 nic_vport_change_event[0x1];
-	u8 disable_local_lb_uc[0x1];
-	u8 disable_local_lb_mc[0x1];
-	u8 log_min_hairpin_wq_data_sz[0x5];
-	u8 reserved_at_3e8[0x3];
-	u8 log_max_vlan_list[0x5];
-	u8 reserved_at_3f0[0x3];
-	u8 log_max_current_mc_list[0x5];
-	u8 reserved_at_3f8[0x3];
-	u8 log_max_current_uc_list[0x5];
-	u8 general_obj_types[0x40];
-	u8 reserved_at_440[0x20];
-	u8 reserved_at_460[0x10];
-	u8 max_num_eqs[0x10];
-	u8 reserved_at_480[0x3];
-	u8 log_max_l2_table[0x5];
-	u8 reserved_at_488[0x8];
-	u8 log_uar_page_sz[0x10];
-	u8 reserved_at_4a0[0x20];
-	u8 device_frequency_mhz[0x20];
-	u8 device_frequency_khz[0x20];
-	u8 reserved_at_500[0x20];
-	u8 num_of_uars_per_page[0x20];
-	u8 flex_parser_protocols[0x20];
-	u8 reserved_at_560[0x20];
-	u8 reserved_at_580[0x3c];
-	u8 mini_cqe_resp_stride_index[0x1];
-	u8 cqe_128_always[0x1];
-	u8 cqe_compression_128[0x1];
-	u8 cqe_compression[0x1];
-	u8 cqe_compression_timeout[0x10];
-	u8 cqe_compression_max_num[0x10];
-	u8 reserved_at_5e0[0x10];
-	u8 tag_matching[0x1];
-	u8 rndv_offload_rc[0x1];
-	u8 rndv_offload_dc[0x1];
-	u8 log_tag_matching_list_sz[0x5];
-	u8 reserved_at_5f8[0x3];
-	u8 log_max_xrq[0x5];
-	u8 affiliate_nic_vport_criteria[0x8];
-	u8 native_port_num[0x8];
-	u8 num_vhca_ports[0x8];
-	u8 reserved_at_618[0x6];
-	u8 sw_owner_id[0x1];
-	u8 reserved_at_61f[0x1e1];
-};
-
-struct mlx5_ifc_qos_cap_bits {
-	u8 packet_pacing[0x1];
-	u8 esw_scheduling[0x1];
-	u8 esw_bw_share[0x1];
-	u8 esw_rate_limit[0x1];
-	u8 reserved_at_4[0x1];
-	u8 packet_pacing_burst_bound[0x1];
-	u8 packet_pacing_typical_size[0x1];
-	u8 flow_meter_srtcm[0x1];
-	u8 reserved_at_8[0x8];
-	u8 log_max_flow_meter[0x8];
-	u8 flow_meter_reg_id[0x8];
-	u8 reserved_at_25[0x8];
-	u8 flow_meter_reg_share[0x1];
-	u8 reserved_at_2e[0x17];
-	u8 packet_pacing_max_rate[0x20];
-	u8 packet_pacing_min_rate[0x20];
-	u8 reserved_at_80[0x10];
-	u8 packet_pacing_rate_table_size[0x10];
-	u8 esw_element_type[0x10];
-	u8 esw_tsar_type[0x10];
-	u8 reserved_at_c0[0x10];
-	u8 max_qos_para_vport[0x10];
-	u8 max_tsar_bw_share[0x20];
-	u8 reserved_at_100[0x6e8];
-};
-
-struct mlx5_ifc_per_protocol_networking_offload_caps_bits {
-	u8 csum_cap[0x1];
-	u8 vlan_cap[0x1];
-	u8 lro_cap[0x1];
-	u8 lro_psh_flag[0x1];
-	u8 lro_time_stamp[0x1];
-	u8 lro_max_msg_sz_mode[0x2];
-	u8 wqe_vlan_insert[0x1];
-	u8 self_lb_en_modifiable[0x1];
-	u8 self_lb_mc[0x1];
-	u8 self_lb_uc[0x1];
-	u8 max_lso_cap[0x5];
-	u8 multi_pkt_send_wqe[0x2];
-	u8 wqe_inline_mode[0x2];
-	u8 rss_ind_tbl_cap[0x4];
-	u8 reg_umr_sq[0x1];
-	u8 scatter_fcs[0x1];
-	u8 enhanced_multi_pkt_send_wqe[0x1];
-	u8 tunnel_lso_const_out_ip_id[0x1];
-	u8 tunnel_lro_gre[0x1];
-	u8 tunnel_lro_vxlan[0x1];
-	u8 tunnel_stateless_gre[0x1];
-	u8 tunnel_stateless_vxlan[0x1];
-	u8 swp[0x1];
-	u8 swp_csum[0x1];
-	u8 swp_lso[0x1];
-	u8 reserved_at_23[0x8];
-	u8 tunnel_stateless_gtp[0x1];
-	u8 reserved_at_25[0x4];
-	u8 max_vxlan_udp_ports[0x8];
-	u8 reserved_at_38[0x6];
-	u8 max_geneve_opt_len[0x1];
-	u8 tunnel_stateless_geneve_rx[0x1];
-	u8 reserved_at_40[0x10];
-	u8 lro_min_mss_size[0x10];
-	u8 reserved_at_60[0x120];
-	u8 lro_timer_supported_periods[4][0x20];
-	u8 reserved_at_200[0x600];
-};
-
-union mlx5_ifc_hca_cap_union_bits {
-	struct mlx5_ifc_cmd_hca_cap_bits cmd_hca_cap;
-	struct mlx5_ifc_per_protocol_networking_offload_caps_bits
-	       per_protocol_networking_offload_caps;
-	struct mlx5_ifc_qos_cap_bits qos_cap;
-	u8 reserved_at_0[0x8000];
-};
-
-struct mlx5_ifc_query_hca_cap_out_bits {
-	u8 status[0x8];
-	u8 reserved_at_8[0x18];
-	u8 syndrome[0x20];
-	u8 reserved_at_40[0x40];
-	union mlx5_ifc_hca_cap_union_bits capability;
-};
-
-struct mlx5_ifc_query_hca_cap_in_bits {
-	u8 opcode[0x10];
-	u8 reserved_at_10[0x10];
-	u8 reserved_at_20[0x10];
-	u8 op_mod[0x10];
-	u8 reserved_at_40[0x40];
-};
-
-struct mlx5_ifc_mac_address_layout_bits {
-	u8 reserved_at_0[0x10];
-	u8 mac_addr_47_32[0x10];
-	u8 mac_addr_31_0[0x20];
-};
-
-struct mlx5_ifc_nic_vport_context_bits {
-	u8 reserved_at_0[0x5];
-	u8 min_wqe_inline_mode[0x3];
-	u8 reserved_at_8[0x15];
-	u8 disable_mc_local_lb[0x1];
-	u8 disable_uc_local_lb[0x1];
-	u8 roce_en[0x1];
-	u8 arm_change_event[0x1];
-	u8 reserved_at_21[0x1a];
-	u8 event_on_mtu[0x1];
-	u8 event_on_promisc_change[0x1];
-	u8 event_on_vlan_change[0x1];
-	u8 event_on_mc_address_change[0x1];
-	u8 event_on_uc_address_change[0x1];
-	u8 reserved_at_40[0xc];
-	u8 affiliation_criteria[0x4];
-	u8 affiliated_vhca_id[0x10];
-	u8 reserved_at_60[0xd0];
-	u8 mtu[0x10];
-	u8 system_image_guid[0x40];
-	u8 port_guid[0x40];
-	u8 node_guid[0x40];
-	u8 reserved_at_200[0x140];
-	u8 qkey_violation_counter[0x10];
-	u8 reserved_at_350[0x430];
-	u8 promisc_uc[0x1];
-	u8 promisc_mc[0x1];
-	u8 promisc_all[0x1];
-	u8 reserved_at_783[0x2];
-	u8 allowed_list_type[0x3];
-	u8 reserved_at_788[0xc];
-	u8 allowed_list_size[0xc];
-	struct mlx5_ifc_mac_address_layout_bits permanent_address;
-	u8 reserved_at_7e0[0x20];
-};
-
-struct mlx5_ifc_query_nic_vport_context_out_bits {
-	u8 status[0x8];
-	u8 reserved_at_8[0x18];
-	u8 syndrome[0x20];
-	u8 reserved_at_40[0x40];
-	struct mlx5_ifc_nic_vport_context_bits nic_vport_context;
-};
-
-struct mlx5_ifc_query_nic_vport_context_in_bits {
-	u8 opcode[0x10];
-	u8 reserved_at_10[0x10];
-	u8 reserved_at_20[0x10];
-	u8 op_mod[0x10];
-	u8 other_vport[0x1];
-	u8 reserved_at_41[0xf];
-	u8 vport_number[0x10];
-	u8 reserved_at_60[0x5];
-	u8 allowed_list_type[0x3];
-	u8 reserved_at_68[0x18];
-};
-
-struct mlx5_ifc_tisc_bits {
-	u8 strict_lag_tx_port_affinity[0x1];
-	u8 reserved_at_1[0x3];
-	u8 lag_tx_port_affinity[0x04];
-	u8 reserved_at_8[0x4];
-	u8 prio[0x4];
-	u8 reserved_at_10[0x10];
-	u8 reserved_at_20[0x100];
-	u8 reserved_at_120[0x8];
-	u8 transport_domain[0x18];
-	u8 reserved_at_140[0x8];
-	u8 underlay_qpn[0x18];
-	u8 reserved_at_160[0x3a0];
-};
-
-struct mlx5_ifc_query_tis_out_bits {
-	u8 status[0x8];
-	u8 reserved_at_8[0x18];
-	u8 syndrome[0x20];
-	u8 reserved_at_40[0x40];
-	struct mlx5_ifc_tisc_bits tis_context;
-};
-
-struct mlx5_ifc_query_tis_in_bits {
-	u8 opcode[0x10];
-	u8 reserved_at_10[0x10];
-	u8 reserved_at_20[0x10];
-	u8 op_mod[0x10];
-	u8 reserved_at_40[0x8];
-	u8 tisn[0x18];
-	u8 reserved_at_60[0x20];
-};
-
-struct mlx5_ifc_alloc_transport_domain_out_bits {
-	u8 status[0x8];
-	u8 reserved_at_8[0x18];
-	u8 syndrome[0x20];
-	u8 reserved_at_40[0x8];
-	u8 transport_domain[0x18];
-	u8 reserved_at_60[0x20];
-};
-
-struct mlx5_ifc_alloc_transport_domain_in_bits {
-	u8 opcode[0x10];
-	u8 reserved_at_10[0x10];
-	u8 reserved_at_20[0x10];
-	u8 op_mod[0x10];
-	u8 reserved_at_40[0x40];
-};
-
-enum {
-	MLX5_WQ_TYPE_LINKED_LIST                = 0x0,
-	MLX5_WQ_TYPE_CYCLIC                     = 0x1,
-	MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ    = 0x2,
-	MLX5_WQ_TYPE_CYCLIC_STRIDING_RQ         = 0x3,
-};
-
-enum {
-	MLX5_WQ_END_PAD_MODE_NONE  = 0x0,
-	MLX5_WQ_END_PAD_MODE_ALIGN = 0x1,
-};
-
-struct mlx5_ifc_wq_bits {
-	u8 wq_type[0x4];
-	u8 wq_signature[0x1];
-	u8 end_padding_mode[0x2];
-	u8 cd_slave[0x1];
-	u8 reserved_at_8[0x18];
-	u8 hds_skip_first_sge[0x1];
-	u8 log2_hds_buf_size[0x3];
-	u8 reserved_at_24[0x7];
-	u8 page_offset[0x5];
-	u8 lwm[0x10];
-	u8 reserved_at_40[0x8];
-	u8 pd[0x18];
-	u8 reserved_at_60[0x8];
-	u8 uar_page[0x18];
-	u8 dbr_addr[0x40];
-	u8 hw_counter[0x20];
-	u8 sw_counter[0x20];
-	u8 reserved_at_100[0xc];
-	u8 log_wq_stride[0x4];
-	u8 reserved_at_110[0x3];
-	u8 log_wq_pg_sz[0x5];
-	u8 reserved_at_118[0x3];
-	u8 log_wq_sz[0x5];
-	u8 dbr_umem_valid[0x1];
-	u8 wq_umem_valid[0x1];
-	u8 reserved_at_122[0x1];
-	u8 log_hairpin_num_packets[0x5];
-	u8 reserved_at_128[0x3];
-	u8 log_hairpin_data_sz[0x5];
-	u8 reserved_at_130[0x4];
-	u8 single_wqe_log_num_of_strides[0x4];
-	u8 two_byte_shift_en[0x1];
-	u8 reserved_at_139[0x4];
-	u8 single_stride_log_num_of_bytes[0x3];
-	u8 dbr_umem_id[0x20];
-	u8 wq_umem_id[0x20];
-	u8 wq_umem_offset[0x40];
-	u8 reserved_at_1c0[0x440];
-};
-
-enum {
-	MLX5_RQC_MEM_RQ_TYPE_MEMORY_RQ_INLINE  = 0x0,
-	MLX5_RQC_MEM_RQ_TYPE_MEMORY_RQ_RMP     = 0x1,
-};
-
-enum {
-	MLX5_RQC_STATE_RST  = 0x0,
-	MLX5_RQC_STATE_RDY  = 0x1,
-	MLX5_RQC_STATE_ERR  = 0x3,
-};
-
-struct mlx5_ifc_rqc_bits {
-	u8 rlky[0x1];
-	u8 delay_drop_en[0x1];
-	u8 scatter_fcs[0x1];
-	u8 vsd[0x1];
-	u8 mem_rq_type[0x4];
-	u8 state[0x4];
-	u8 reserved_at_c[0x1];
-	u8 flush_in_error_en[0x1];
-	u8 hairpin[0x1];
-	u8 reserved_at_f[0x11];
-	u8 reserved_at_20[0x8];
-	u8 user_index[0x18];
-	u8 reserved_at_40[0x8];
-	u8 cqn[0x18];
-	u8 counter_set_id[0x8];
-	u8 reserved_at_68[0x18];
-	u8 reserved_at_80[0x8];
-	u8 rmpn[0x18];
-	u8 reserved_at_a0[0x8];
-	u8 hairpin_peer_sq[0x18];
-	u8 reserved_at_c0[0x10];
-	u8 hairpin_peer_vhca[0x10];
-	u8 reserved_at_e0[0xa0];
-	struct mlx5_ifc_wq_bits wq; /* Not used in LRO RQ. */
-};
-
-struct mlx5_ifc_create_rq_out_bits {
-	u8 status[0x8];
-	u8 reserved_at_8[0x18];
-	u8 syndrome[0x20];
-	u8 reserved_at_40[0x8];
-	u8 rqn[0x18];
-	u8 reserved_at_60[0x20];
-};
-
-struct mlx5_ifc_create_rq_in_bits {
-	u8 opcode[0x10];
-	u8 uid[0x10];
-	u8 reserved_at_20[0x10];
-	u8 op_mod[0x10];
-	u8 reserved_at_40[0xc0];
-	struct mlx5_ifc_rqc_bits ctx;
-};
-
-struct mlx5_ifc_modify_rq_out_bits {
-	u8 status[0x8];
-	u8 reserved_at_8[0x18];
-	u8 syndrome[0x20];
-	u8 reserved_at_40[0x40];
-};
-
-struct mlx5_ifc_create_tis_out_bits {
-	u8 status[0x8];
-	u8 reserved_at_8[0x18];
-	u8 syndrome[0x20];
-	u8 reserved_at_40[0x8];
-	u8 tisn[0x18];
-	u8 reserved_at_60[0x20];
-};
-
-struct mlx5_ifc_create_tis_in_bits {
-	u8 opcode[0x10];
-	u8 uid[0x10];
-	u8 reserved_at_20[0x10];
-	u8 op_mod[0x10];
-	u8 reserved_at_40[0xc0];
-	struct mlx5_ifc_tisc_bits ctx;
-};
-
-enum {
-	MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_WQ_LWM = 1ULL << 0,
-	MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_VSD = 1ULL << 1,
-	MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_SCATTER_FCS = 1ULL << 2,
-	MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_RQ_COUNTER_SET_ID = 1ULL << 3,
-};
-
-struct mlx5_ifc_modify_rq_in_bits {
-	u8 opcode[0x10];
-	u8 uid[0x10];
-	u8 reserved_at_20[0x10];
-	u8 op_mod[0x10];
-	u8 rq_state[0x4];
-	u8 reserved_at_44[0x4];
-	u8 rqn[0x18];
-	u8 reserved_at_60[0x20];
-	u8 modify_bitmask[0x40];
-	u8 reserved_at_c0[0x40];
-	struct mlx5_ifc_rqc_bits ctx;
-};
-
-enum {
-	MLX5_RX_HASH_FIELD_SELECT_SELECTED_FIELDS_SRC_IP     = 0x0,
-	MLX5_RX_HASH_FIELD_SELECT_SELECTED_FIELDS_DST_IP     = 0x1,
-	MLX5_RX_HASH_FIELD_SELECT_SELECTED_FIELDS_L4_SPORT   = 0x2,
-	MLX5_RX_HASH_FIELD_SELECT_SELECTED_FIELDS_L4_DPORT   = 0x3,
-	MLX5_RX_HASH_FIELD_SELECT_SELECTED_FIELDS_IPSEC_SPI  = 0x4,
-};
-
-struct mlx5_ifc_rx_hash_field_select_bits {
-	u8 l3_prot_type[0x1];
-	u8 l4_prot_type[0x1];
-	u8 selected_fields[0x1e];
-};
-
-enum {
-	MLX5_TIRC_DISP_TYPE_DIRECT    = 0x0,
-	MLX5_TIRC_DISP_TYPE_INDIRECT  = 0x1,
-};
-
-enum {
-	MLX5_TIRC_LRO_ENABLE_MASK_IPV4_LRO  = 0x1,
-	MLX5_TIRC_LRO_ENABLE_MASK_IPV6_LRO  = 0x2,
-};
-
-enum {
-	MLX5_RX_HASH_FN_NONE           = 0x0,
-	MLX5_RX_HASH_FN_INVERTED_XOR8  = 0x1,
-	MLX5_RX_HASH_FN_TOEPLITZ       = 0x2,
-};
-
-enum {
-	MLX5_TIRC_SELF_LB_BLOCK_BLOCK_UNICAST    = 0x1,
-	MLX5_TIRC_SELF_LB_BLOCK_BLOCK_MULTICAST  = 0x2,
-};
-
-enum {
-	MLX5_LRO_MAX_MSG_SIZE_START_FROM_L4    = 0x0,
-	MLX5_LRO_MAX_MSG_SIZE_START_FROM_L2  = 0x1,
-};
-
-struct mlx5_ifc_tirc_bits {
-	u8 reserved_at_0[0x20];
-	u8 disp_type[0x4];
-	u8 reserved_at_24[0x1c];
-	u8 reserved_at_40[0x40];
-	u8 reserved_at_80[0x4];
-	u8 lro_timeout_period_usecs[0x10];
-	u8 lro_enable_mask[0x4];
-	u8 lro_max_msg_sz[0x8];
-	u8 reserved_at_a0[0x40];
-	u8 reserved_at_e0[0x8];
-	u8 inline_rqn[0x18];
-	u8 rx_hash_symmetric[0x1];
-	u8 reserved_at_101[0x1];
-	u8 tunneled_offload_en[0x1];
-	u8 reserved_at_103[0x5];
-	u8 indirect_table[0x18];
-	u8 rx_hash_fn[0x4];
-	u8 reserved_at_124[0x2];
-	u8 self_lb_block[0x2];
-	u8 transport_domain[0x18];
-	u8 rx_hash_toeplitz_key[10][0x20];
-	struct mlx5_ifc_rx_hash_field_select_bits rx_hash_field_selector_outer;
-	struct mlx5_ifc_rx_hash_field_select_bits rx_hash_field_selector_inner;
-	u8 reserved_at_2c0[0x4c0];
-};
-
-struct mlx5_ifc_create_tir_out_bits {
-	u8 status[0x8];
-	u8 reserved_at_8[0x18];
-	u8 syndrome[0x20];
-	u8 reserved_at_40[0x8];
-	u8 tirn[0x18];
-	u8 reserved_at_60[0x20];
-};
-
-struct mlx5_ifc_create_tir_in_bits {
-	u8 opcode[0x10];
-	u8 uid[0x10];
-	u8 reserved_at_20[0x10];
-	u8 op_mod[0x10];
-	u8 reserved_at_40[0xc0];
-	struct mlx5_ifc_tirc_bits ctx;
-};
-
-struct mlx5_ifc_rq_num_bits {
-	u8 reserved_at_0[0x8];
-	u8 rq_num[0x18];
-};
-
-struct mlx5_ifc_rqtc_bits {
-	u8 reserved_at_0[0xa0];
-	u8 reserved_at_a0[0x10];
-	u8 rqt_max_size[0x10];
-	u8 reserved_at_c0[0x10];
-	u8 rqt_actual_size[0x10];
-	u8 reserved_at_e0[0x6a0];
-	struct mlx5_ifc_rq_num_bits rq_num[];
-};
-
-struct mlx5_ifc_create_rqt_out_bits {
-	u8 status[0x8];
-	u8 reserved_at_8[0x18];
-	u8 syndrome[0x20];
-	u8 reserved_at_40[0x8];
-	u8 rqtn[0x18];
-	u8 reserved_at_60[0x20];
-};
-
-#ifdef PEDANTIC
-#pragma GCC diagnostic ignored "-Wpedantic"
-#endif
-struct mlx5_ifc_create_rqt_in_bits {
-	u8 opcode[0x10];
-	u8 uid[0x10];
-	u8 reserved_at_20[0x10];
-	u8 op_mod[0x10];
-	u8 reserved_at_40[0xc0];
-	struct mlx5_ifc_rqtc_bits rqt_context;
-};
-#ifdef PEDANTIC
-#pragma GCC diagnostic error "-Wpedantic"
-#endif
-
-enum {
-	MLX5_SQC_STATE_RST  = 0x0,
-	MLX5_SQC_STATE_RDY  = 0x1,
-	MLX5_SQC_STATE_ERR  = 0x3,
-};
-
-struct mlx5_ifc_sqc_bits {
-	u8 rlky[0x1];
-	u8 cd_master[0x1];
-	u8 fre[0x1];
-	u8 flush_in_error_en[0x1];
-	u8 allow_multi_pkt_send_wqe[0x1];
-	u8 min_wqe_inline_mode[0x3];
-	u8 state[0x4];
-	u8 reg_umr[0x1];
-	u8 allow_swp[0x1];
-	u8 hairpin[0x1];
-	u8 reserved_at_f[0x11];
-	u8 reserved_at_20[0x8];
-	u8 user_index[0x18];
-	u8 reserved_at_40[0x8];
-	u8 cqn[0x18];
-	u8 reserved_at_60[0x8];
-	u8 hairpin_peer_rq[0x18];
-	u8 reserved_at_80[0x10];
-	u8 hairpin_peer_vhca[0x10];
-	u8 reserved_at_a0[0x50];
-	u8 packet_pacing_rate_limit_index[0x10];
-	u8 tis_lst_sz[0x10];
-	u8 reserved_at_110[0x10];
-	u8 reserved_at_120[0x40];
-	u8 reserved_at_160[0x8];
-	u8 tis_num_0[0x18];
-	struct mlx5_ifc_wq_bits wq;
-};
-
-struct mlx5_ifc_query_sq_in_bits {
-	u8 opcode[0x10];
-	u8 reserved_at_10[0x10];
-	u8 reserved_at_20[0x10];
-	u8 op_mod[0x10];
-	u8 reserved_at_40[0x8];
-	u8 sqn[0x18];
-	u8 reserved_at_60[0x20];
-};
-
-struct mlx5_ifc_modify_sq_out_bits {
-	u8 status[0x8];
-	u8 reserved_at_8[0x18];
-	u8 syndrome[0x20];
-	u8 reserved_at_40[0x40];
-};
-
-struct mlx5_ifc_modify_sq_in_bits {
-	u8 opcode[0x10];
-	u8 uid[0x10];
-	u8 reserved_at_20[0x10];
-	u8 op_mod[0x10];
-	u8 sq_state[0x4];
-	u8 reserved_at_44[0x4];
-	u8 sqn[0x18];
-	u8 reserved_at_60[0x20];
-	u8 modify_bitmask[0x40];
-	u8 reserved_at_c0[0x40];
-	struct mlx5_ifc_sqc_bits ctx;
-};
-
-struct mlx5_ifc_create_sq_out_bits {
-	u8 status[0x8];
-	u8 reserved_at_8[0x18];
-	u8 syndrome[0x20];
-	u8 reserved_at_40[0x8];
-	u8 sqn[0x18];
-	u8 reserved_at_60[0x20];
-};
-
-struct mlx5_ifc_create_sq_in_bits {
-	u8 opcode[0x10];
-	u8 uid[0x10];
-	u8 reserved_at_20[0x10];
-	u8 op_mod[0x10];
-	u8 reserved_at_40[0xc0];
-	struct mlx5_ifc_sqc_bits ctx;
-};
-
-enum {
-	MLX5_FLOW_METER_OBJ_MODIFY_FIELD_ACTIVE = (1ULL << 0),
-	MLX5_FLOW_METER_OBJ_MODIFY_FIELD_CBS = (1ULL << 1),
-	MLX5_FLOW_METER_OBJ_MODIFY_FIELD_CIR = (1ULL << 2),
-	MLX5_FLOW_METER_OBJ_MODIFY_FIELD_EBS = (1ULL << 3),
-	MLX5_FLOW_METER_OBJ_MODIFY_FIELD_EIR = (1ULL << 4),
-};
-
-struct mlx5_ifc_flow_meter_parameters_bits {
-	u8         valid[0x1];			// 00h
-	u8         bucket_overflow[0x1];
-	u8         start_color[0x2];
-	u8         both_buckets_on_green[0x1];
-	u8         meter_mode[0x2];
-	u8         reserved_at_1[0x19];
-	u8         reserved_at_2[0x20]; //04h
-	u8         reserved_at_3[0x3];
-	u8         cbs_exponent[0x5];		// 08h
-	u8         cbs_mantissa[0x8];
-	u8         reserved_at_4[0x3];
-	u8         cir_exponent[0x5];
-	u8         cir_mantissa[0x8];
-	u8         reserved_at_5[0x20];		// 0Ch
-	u8         reserved_at_6[0x3];
-	u8         ebs_exponent[0x5];		// 10h
-	u8         ebs_mantissa[0x8];
-	u8         reserved_at_7[0x3];
-	u8         eir_exponent[0x5];
-	u8         eir_mantissa[0x8];
-	u8         reserved_at_8[0x60];		// 14h-1Ch
-};
-
-/* CQE format mask. */
-#define MLX5E_CQE_FORMAT_MASK 0xc
-
-/* MPW opcode. */
-#define MLX5_OPC_MOD_MPW 0x01
-
-/* Compressed Rx CQE structure. */
-struct mlx5_mini_cqe8 {
-	union {
-		uint32_t rx_hash_result;
-		struct {
-			uint16_t checksum;
-			uint16_t stride_idx;
-		};
-		struct {
-			uint16_t wqe_counter;
-			uint8_t  s_wqe_opcode;
-			uint8_t  reserved;
-		} s_wqe_info;
-	};
-	uint32_t byte_cnt;
-};
-
-/* srTCM PRM flow meter parameters. */
-enum {
-	MLX5_FLOW_COLOR_RED = 0,
-	MLX5_FLOW_COLOR_YELLOW,
-	MLX5_FLOW_COLOR_GREEN,
-	MLX5_FLOW_COLOR_UNDEFINED,
-};
-
-/* Maximum value of srTCM metering parameters. */
-#define MLX5_SRTCM_CBS_MAX (0xFF * (1ULL << 0x1F))
-#define MLX5_SRTCM_CIR_MAX (8 * (1ULL << 30) * 0xFF)
-#define MLX5_SRTCM_EBS_MAX 0
-
-/* The bits meter color use. */
-#define MLX5_MTR_COLOR_BITS 8
-
-/**
- * Convert a user mark to flow mark.
- *
- * @param val
- *   Mark value to convert.
- *
- * @return
- *   Converted mark value.
- */
-static inline uint32_t
-mlx5_flow_mark_set(uint32_t val)
-{
-	uint32_t ret;
-
-	/*
-	 * Add one to the user value to differentiate un-marked flows from
-	 * marked flows, if the ID is equal to MLX5_FLOW_MARK_DEFAULT it
-	 * remains untouched.
-	 */
-	if (val != MLX5_FLOW_MARK_DEFAULT)
-		++val;
-#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
-	/*
-	 * Mark is 24 bits (minus reserved values) but is stored on a 32 bit
-	 * word, byte-swapped by the kernel on little-endian systems. In this
-	 * case, left-shifting the resulting big-endian value ensures the
-	 * least significant 24 bits are retained when converting it back.
-	 */
-	ret = rte_cpu_to_be_32(val) >> 8;
-#else
-	ret = val;
-#endif
-	return ret;
-}
-
-/**
- * Convert a mark to user mark.
- *
- * @param val
- *   Mark value to convert.
- *
- * @return
- *   Converted mark value.
- */
-static inline uint32_t
-mlx5_flow_mark_get(uint32_t val)
-{
-	/*
-	 * Subtract one from the retrieved value. It was added by
-	 * mlx5_flow_mark_set() to distinguish unmarked flows.
-	 */
-#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
-	return (val >> 8) - 1;
-#else
-	return val - 1;
-#endif
-}
-
-#endif /* RTE_PMD_MLX5_PRM_H_ */
 --git a/drivers/net/mlx5/mlx5_rss.c b/drivers/net/mlx5/mlx5_rss.c
index 1028264..345ce3a 100644
--- a/drivers/net/mlx5/mlx5_rss.c
+++ b/drivers/net/mlx5/mlx5_rss.c
@@ -22,8 +22,8 @@
 #include <rte_malloc.h>
 #include <rte_ethdev_driver.h>
 
-#include "mlx5.h"
 #include "mlx5_defs.h"
+#include "mlx5.h"
 #include "mlx5_rxtx.h"
 
 /**
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 371b996..e01cbfd 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -30,14 +30,16 @@
 #include <rte_debug.h>
 #include <rte_io.h>
 
+#include <mlx5_glue.h>
+#include <mlx5_devx_cmds.h>
+
+#include "mlx5_defs.h"
 #include "mlx5.h"
 #include "mlx5_rxtx.h"
 #include "mlx5_utils.h"
 #include "mlx5_autoconf.h"
-#include "mlx5_defs.h"
-#include "mlx5_glue.h"
 #include "mlx5_flow.h"
-#include "mlx5_devx_cmds.h"
+
 
 /* Default RSS hash key also used for ConnectX-3. */
 uint8_t rss_hash_default_key[] = {
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 5a03556..d8f6671 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -28,13 +28,14 @@
 #include <rte_cycles.h>
 #include <rte_flow.h>
 
+#include <mlx5_devx_cmds.h>
+#include <mlx5_prm.h>
+
+#include "mlx5_defs.h"
 #include "mlx5.h"
-#include "mlx5_devx_cmds.h"
 #include "mlx5_utils.h"
 #include "mlx5_rxtx.h"
 #include "mlx5_autoconf.h"
-#include "mlx5_defs.h"
-#include "mlx5_prm.h"
 
 /* TX burst subroutines return codes. */
 enum mlx5_txcmp_code {
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 3f659d2..fb13919 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -31,13 +31,14 @@
 #include <rte_bus_pci.h>
 #include <rte_malloc.h>
 
+#include <mlx5_glue.h>
+#include <mlx5_prm.h>
+
+#include "mlx5_defs.h"
 #include "mlx5_utils.h"
 #include "mlx5.h"
 #include "mlx5_mr.h"
 #include "mlx5_autoconf.h"
-#include "mlx5_defs.h"
-#include "mlx5_prm.h"
-#include "mlx5_glue.h"
 
 /* Support tunnel matching. */
 #define MLX5_FLOW_TUNNEL 10
diff --git a/drivers/net/mlx5/mlx5_rxtx_vec.c b/drivers/net/mlx5/mlx5_rxtx_vec.c
index d85f908..5505762 100644
--- a/drivers/net/mlx5/mlx5_rxtx_vec.c
+++ b/drivers/net/mlx5/mlx5_rxtx_vec.c
@@ -23,13 +23,14 @@
 #include <rte_mempool.h>
 #include <rte_prefetch.h>
 
+#include <mlx5_prm.h>
+
+#include "mlx5_defs.h"
 #include "mlx5.h"
 #include "mlx5_utils.h"
 #include "mlx5_rxtx.h"
 #include "mlx5_rxtx_vec.h"
 #include "mlx5_autoconf.h"
-#include "mlx5_defs.h"
-#include "mlx5_prm.h"
 
 #if defined RTE_ARCH_X86_64
 #include "mlx5_rxtx_vec_sse.h"
diff --git a/drivers/net/mlx5/mlx5_rxtx_vec.h b/drivers/net/mlx5/mlx5_rxtx_vec.h
index d8c07f2..82f77e5 100644
--- a/drivers/net/mlx5/mlx5_rxtx_vec.h
+++ b/drivers/net/mlx5/mlx5_rxtx_vec.h
@@ -9,8 +9,9 @@
 #include <rte_common.h>
 #include <rte_mbuf.h>
 
+#include <mlx5_prm.h>
+
 #include "mlx5_autoconf.h"
-#include "mlx5_prm.h"
 
 /* HW checksum offload capabilities of vectorized Tx. */
 #define MLX5_VEC_TX_CKSUM_OFFLOAD_CAP \
diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h b/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h
index 9e5c6ee..1467a42 100644
--- a/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h
+++ b/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h
@@ -17,13 +17,14 @@
 #include <rte_mempool.h>
 #include <rte_prefetch.h>
 
+#include <mlx5_prm.h>
+
+#include "mlx5_defs.h"
 #include "mlx5.h"
 #include "mlx5_utils.h"
 #include "mlx5_rxtx.h"
 #include "mlx5_rxtx_vec.h"
 #include "mlx5_autoconf.h"
-#include "mlx5_defs.h"
-#include "mlx5_prm.h"
 
 #ifndef __INTEL_COMPILER
 #pragma GCC diagnostic ignored "-Wcast-qual"
diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
index 332e9ac..5b846c1 100644
--- a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
+++ b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
@@ -16,13 +16,14 @@
 #include <rte_mempool.h>
 #include <rte_prefetch.h>
 
+#include <mlx5_prm.h>
+
+#include "mlx5_defs.h"
 #include "mlx5.h"
 #include "mlx5_utils.h"
 #include "mlx5_rxtx.h"
 #include "mlx5_rxtx_vec.h"
 #include "mlx5_autoconf.h"
-#include "mlx5_defs.h"
-#include "mlx5_prm.h"
 
 #pragma GCC diagnostic ignored "-Wcast-qual"
 
diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h
index 07d40d5..6e1b967 100644
--- a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h
+++ b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h
@@ -16,13 +16,14 @@
 #include <rte_mempool.h>
 #include <rte_prefetch.h>
 
+#include <mlx5_prm.h>
+
+#include "mlx5_defs.h"
 #include "mlx5.h"
 #include "mlx5_utils.h"
 #include "mlx5_rxtx.h"
 #include "mlx5_rxtx_vec.h"
 #include "mlx5_autoconf.h"
-#include "mlx5_defs.h"
-#include "mlx5_prm.h"
 
 #ifndef __INTEL_COMPILER
 #pragma GCC diagnostic ignored "-Wcast-qual"
diff --git a/drivers/net/mlx5/mlx5_stats.c b/drivers/net/mlx5/mlx5_stats.c
index 205e4fe..0ed7170 100644
--- a/drivers/net/mlx5/mlx5_stats.c
+++ b/drivers/net/mlx5/mlx5_stats.c
@@ -13,9 +13,9 @@
 #include <rte_common.h>
 #include <rte_malloc.h>
 
+#include "mlx5_defs.h"
 #include "mlx5.h"
 #include "mlx5_rxtx.h"
-#include "mlx5_defs.h"
 
 static const struct mlx5_counter_ctrl mlx5_counters_init[] = {
 	{
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 5adb4dc..1d2ba8a 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -28,13 +28,14 @@
 #include <rte_ethdev_driver.h>
 #include <rte_common.h>
 
-#include "mlx5_utils.h"
+#include <mlx5_glue.h>
+#include <mlx5_devx_cmds.h>
+
 #include "mlx5_defs.h"
+#include "mlx5_utils.h"
 #include "mlx5.h"
 #include "mlx5_rxtx.h"
 #include "mlx5_autoconf.h"
-#include "mlx5_glue.h"
-#include "mlx5_devx_cmds.h"
 
 /**
  * Allocate TX queue elements.
diff --git a/drivers/net/mlx5/mlx5_utils.h b/drivers/net/mlx5/mlx5_utils.h
index ebf79b8..c868aee 100644
--- a/drivers/net/mlx5/mlx5_utils.h
+++ b/drivers/net/mlx5/mlx5_utils.h
@@ -13,8 +13,11 @@
 #include <assert.h>
 #include <errno.h>
 
+#include <mlx5_common.h>
+
 #include "mlx5_defs.h"
 
+
 /*
  * Compilation workaround for PPC64 when AltiVec is fully enabled, e.g. std=c11.
  * Otherwise there would be a type conflict between stdbool and altivec.
@@ -50,81 +53,14 @@
 /* Save and restore errno around argument evaluation. */
 #define ERRNO_SAFE(x) ((errno = (int []){ errno, ((x), 0) }[0]))
 
-/*
- * Helper macros to work around __VA_ARGS__ limitations in a C99 compliant
- * manner.
- */
-#define PMD_DRV_LOG_STRIP(a, b) a
-#define PMD_DRV_LOG_OPAREN (
-#define PMD_DRV_LOG_CPAREN )
-#define PMD_DRV_LOG_COMMA ,
-
-/* Return the file name part of a path. */
-static inline const char *
-pmd_drv_log_basename(const char *s)
-{
-	const char *n = s;
-
-	while (*n)
-		if (*(n++) == '/')
-			s = n;
-	return s;
-}
-
 extern int mlx5_logtype;
 
-#define PMD_DRV_LOG___(level, ...) \
-	rte_log(RTE_LOG_ ## level, \
-		mlx5_logtype, \
-		RTE_FMT(MLX5_DRIVER_NAME ": " \
-			RTE_FMT_HEAD(__VA_ARGS__,), \
-		RTE_FMT_TAIL(__VA_ARGS__,)))
-
-/*
- * When debugging is enabled (NDEBUG not defined), file, line and function
- * information replace the driver name (MLX5_DRIVER_NAME) in log messages.
- */
-#ifndef NDEBUG
-
-#define PMD_DRV_LOG__(level, ...) \
-	PMD_DRV_LOG___(level, "%s:%u: %s(): " __VA_ARGS__)
-#define PMD_DRV_LOG_(level, s, ...) \
-	PMD_DRV_LOG__(level, \
-		s "\n" PMD_DRV_LOG_COMMA \
-		pmd_drv_log_basename(__FILE__) PMD_DRV_LOG_COMMA \
-		__LINE__ PMD_DRV_LOG_COMMA \
-		__func__, \
-		__VA_ARGS__)
-
-#else /* NDEBUG */
-#define PMD_DRV_LOG__(level, ...) \
-	PMD_DRV_LOG___(level, __VA_ARGS__)
-#define PMD_DRV_LOG_(level, s, ...) \
-	PMD_DRV_LOG__(level, s "\n", __VA_ARGS__)
-
-#endif /* NDEBUG */
-
 /* Generic printf()-like logging macro with automatic line feed. */
 #define DRV_LOG(level, ...) \
-	PMD_DRV_LOG_(level, \
+	PMD_DRV_LOG_(level, mlx5_logtype, MLX5_DRIVER_NAME, \
 		__VA_ARGS__ PMD_DRV_LOG_STRIP PMD_DRV_LOG_OPAREN, \
 		PMD_DRV_LOG_CPAREN)
 
-/* claim_zero() does not perform any check when debugging is disabled. */
-#ifndef NDEBUG
-
-#define DEBUG(...) DRV_LOG(DEBUG, __VA_ARGS__)
-#define claim_zero(...) assert((__VA_ARGS__) == 0)
-#define claim_nonzero(...) assert((__VA_ARGS__) != 0)
-
-#else /* NDEBUG */
-
-#define DEBUG(...) (void)0
-#define claim_zero(...) (__VA_ARGS__)
-#define claim_nonzero(...) (__VA_ARGS__)
-
-#endif /* NDEBUG */
-
 #define INFO(...) DRV_LOG(INFO, __VA_ARGS__)
 #define WARN(...) DRV_LOG(WARNING, __VA_ARGS__)
 #define ERROR(...) DRV_LOG(ERR, __VA_ARGS__)
@@ -144,13 +80,6 @@
 	 (((val) & (from)) / ((from) / (to))) : \
 	 (((val) & (from)) * ((to) / (from))))
 
-/* Allocate a buffer on the stack and fill it with a printf format string. */
-#define MKSTR(name, ...) \
-	int mkstr_size_##name = snprintf(NULL, 0, "" __VA_ARGS__); \
-	char name[mkstr_size_##name + 1]; \
-	\
-	snprintf(name, sizeof(name), "" __VA_ARGS__)
-
 /**
  * Return logarithm of the nearest power of two above input value.
  *
diff --git a/drivers/net/mlx5/mlx5_vlan.c b/drivers/net/mlx5/mlx5_vlan.c
index feac0f1..b0fa31a 100644
--- a/drivers/net/mlx5/mlx5_vlan.c
+++ b/drivers/net/mlx5/mlx5_vlan.c
@@ -27,10 +27,11 @@
 #include <rte_ethdev_driver.h>
 #include <rte_common.h>
 
+#include <mlx5_glue.h>
+#include <mlx5_devx_cmds.h>
+
 #include "mlx5.h"
 #include "mlx5_autoconf.h"
-#include "mlx5_glue.h"
-#include "mlx5_devx_cmds.h"
 #include "mlx5_rxtx.h"
 #include "mlx5_utils.h"
 
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 15acf95..45f4cad 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -196,6 +196,7 @@ endif
 _LDLIBS-$(CONFIG_RTE_LIBRTE_LIO_PMD)        += -lrte_pmd_lio
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_MEMIF)      += -lrte_pmd_memif
 _LDLIBS-$(CONFIG_RTE_LIBRTE_MLX4_PMD)       += -lrte_pmd_mlx4
+_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX5_PMD)       += -lrte_common_mlx5
 _LDLIBS-$(CONFIG_RTE_LIBRTE_MLX5_PMD)       += -lrte_pmd_mlx5
 ifeq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
 _LDLIBS-$(CONFIG_RTE_LIBRTE_MLX4_PMD)       += -ldl
-- 
1.8.3.1
* [dpdk-dev] [PATCH v2 03/25] common/mlx5: share the mlx5 glue reference
  2020-01-28 10:05 ` [dpdk-dev] [PATCH v2 00/25] Introduce mlx5 common library Matan Azrad
  2020-01-28 10:05   ` [dpdk-dev] [PATCH v2 01/25] net/mlx5: separate DevX commands interface Matan Azrad
  2020-01-28 10:05   ` [dpdk-dev] [PATCH v2 02/25] drivers: introduce mlx5 common library Matan Azrad
@ 2020-01-28 10:05   ` Matan Azrad
  2020-01-28 10:05   ` [dpdk-dev] [PATCH v2 04/25] common/mlx5: share mlx5 PCI device detection Matan Azrad
                     ` (22 subsequent siblings)
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-28 10:05 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
A new Mellanox vDPA PMD will be added to support vDPA operations on
Mellanox adapters.
Both the mlx5 PMD and the mlx5 vDPA PMD need the glue to be initialized.
The glue must be initialized only once per process, so all mlx5 PMDs
that use the glue should share the same glue object.
Move the glue initialization into the common/mlx5 library, whose
constructor performs it exactly once.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
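Reader note: the once-per-process initialization described in the commit
message relies on constructor ordering. The following is a minimal sketch
of that pattern, assuming GCC/Clang constructor priorities. All demo_*
names and the priority values 120/130/131 are hypothetical illustration
choices, not identifiers or values taken from the mlx5 sources.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the glue function table. */
struct demo_glue {
	const char *version;
};

static const struct demo_glue demo_glue_impl = { .version = "20.02.0" };

/* Shared pointer: NULL means the glue could not be initialized. */
const struct demo_glue *demo_glue;

/* Bookkeeping used only to observe the behaviour of this sketch. */
int demo_glue_init_calls;
int demo_registered_drivers;

/* Common library constructor: runs once, before the driver constructors. */
__attribute__((constructor(120))) static void
demo_glue_init(void)
{
	++demo_glue_init_calls;
	demo_glue = &demo_glue_impl;
}

/* Driver constructors register only when the shared glue is usable. */
__attribute__((constructor(130))) static void
demo_net_init(void)
{
	if (demo_glue)
		++demo_registered_drivers;
}

__attribute__((constructor(131))) static void
demo_vdpa_init(void)
{
	if (demo_glue)
		++demo_registered_drivers;
}

/* Check that one initialization served both hypothetical drivers. */
void
demo_check(void)
{
	assert(demo_glue != NULL);
	assert(demo_glue_init_calls == 1);
	assert(demo_registered_drivers == 2);
}
```

Calling demo_check() from main() confirms that both hypothetical drivers
saw the same glue object after a single initialization. If the
initialization fails and leaves the pointer NULL, the drivers simply skip
registration, which mirrors the "if (mlx5_glue)" check this patch adds to
drivers/net/mlx5/mlx5.c.
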
 drivers/common/mlx5/mlx5_common.c | 173 +++++++++++++++++++++++++++++++++++++-
 drivers/net/mlx5/Makefile         |   9 --
 drivers/net/mlx5/meson.build      |   4 -
 drivers/net/mlx5/mlx5.c           | 172 +------------------------------------
 4 files changed, 173 insertions(+), 185 deletions(-)
diff --git a/drivers/common/mlx5/mlx5_common.c b/drivers/common/mlx5/mlx5_common.c
index 14ebd30..26d7a87 100644
--- a/drivers/common/mlx5/mlx5_common.c
+++ b/drivers/common/mlx5/mlx5_common.c
@@ -2,16 +2,185 @@
  * Copyright 2019 Mellanox Technologies, Ltd
  */
 
+#include <dlfcn.h>
+#include <unistd.h>
+#include <string.h>
+
+#include <rte_errno.h>
+
 #include "mlx5_common.h"
+#include "mlx5_common_utils.h"
+#include "mlx5_glue.h"
 
 
 int mlx5_common_logtype;
 
 
-RTE_INIT(rte_mlx5_common_pmd_init)
+#ifdef RTE_IBVERBS_LINK_DLOPEN
+
+/**
+ * Suffix RTE_EAL_PMD_PATH with "-glue".
+ *
+ * This function performs a sanity check on RTE_EAL_PMD_PATH before
+ * suffixing its last component.
+ *
+ * @param[out] buf
+ *   Output buffer, should be large enough otherwise NULL is returned.
+ * @param size
+ *   Size of @p buf.
+ *
+ * @return
+ *   Pointer to @p buf or @p NULL in case suffix cannot be appended.
+ */
+static char *
+mlx5_glue_path(char *buf, size_t size)
+{
+	static const char *const bad[] = { "/", ".", "..", NULL };
+	const char *path = RTE_EAL_PMD_PATH;
+	size_t len = strlen(path);
+	size_t off;
+	int i;
+
+	while (len && path[len - 1] == '/')
+		--len;
+	for (off = len; off && path[off - 1] != '/'; --off)
+		;
+	for (i = 0; bad[i]; ++i)
+		if (!strncmp(path + off, bad[i], (int)(len - off)))
+			goto error;
+	i = snprintf(buf, size, "%.*s-glue", (int)len, path);
+	if (i == -1 || (size_t)i >= size)
+		goto error;
+	return buf;
+error:
+	RTE_LOG(ERR, PMD, "unable to append \"-glue\" to last component of"
+		" RTE_EAL_PMD_PATH (\"" RTE_EAL_PMD_PATH "\"), please"
+		" re-configure DPDK\n");
+	return NULL;
+}
+#endif
+
+/**
+ * Initialization routine for run-time dependency on rdma-core.
+ */
+RTE_INIT_PRIO(mlx5_glue_init, CLASS)
 {
-	/* Initialize driver log type. */
+	void *handle = NULL;
+
+	/* Initialize common log type. */
 	mlx5_common_logtype = rte_log_register("pmd.common.mlx5");
 	if (mlx5_common_logtype >= 0)
 		rte_log_set_level(mlx5_common_logtype, RTE_LOG_NOTICE);
+	/*
+	 * RDMAV_HUGEPAGES_SAFE tells ibv_fork_init() we intend to use
+	 * huge pages. Calling ibv_fork_init() during init allows
+	 * applications to use fork() safely for purposes other than
+	 * using this PMD, which is not supported in forked processes.
+	 */
+	setenv("RDMAV_HUGEPAGES_SAFE", "1", 1);
+	/* Match the size of Rx completion entry to the size of a cacheline. */
+	if (RTE_CACHE_LINE_SIZE == 128)
+		setenv("MLX5_CQE_SIZE", "128", 0);
+	/*
+	 * MLX5_DEVICE_FATAL_CLEANUP tells ibv_destroy functions to
+	 * cleanup all the Verbs resources even when the device was removed.
+	 */
+	setenv("MLX5_DEVICE_FATAL_CLEANUP", "1", 1);
+	/* Load the glue library when rdma-core is linked through dlopen(). */
+#ifdef RTE_IBVERBS_LINK_DLOPEN
+	char glue_path[sizeof(RTE_EAL_PMD_PATH) - 1 + sizeof("-glue")];
+	const char *path[] = {
+		/*
+		 * A basic security check is necessary before trusting
+		 * MLX5_GLUE_PATH, which may override RTE_EAL_PMD_PATH.
+		 */
+		(geteuid() == getuid() && getegid() == getgid() ?
+		 getenv("MLX5_GLUE_PATH") : NULL),
+		/*
+		 * When RTE_EAL_PMD_PATH is set, use its glue-suffixed
+		 * variant, otherwise let dlopen() look up libraries on its
+		 * own.
+		 */
+		(*RTE_EAL_PMD_PATH ?
+		 mlx5_glue_path(glue_path, sizeof(glue_path)) : ""),
+	};
+	unsigned int i = 0;
+	void **sym;
+	const char *dlmsg;
+
+	while (!handle && i != RTE_DIM(path)) {
+		const char *end;
+		size_t len;
+		int ret;
+
+		if (!path[i]) {
+			++i;
+			continue;
+		}
+		end = strpbrk(path[i], ":;");
+		if (!end)
+			end = path[i] + strlen(path[i]);
+		len = end - path[i];
+		ret = 0;
+		do {
+			char name[ret + 1];
+
+			ret = snprintf(name, sizeof(name), "%.*s%s" MLX5_GLUE,
+				       (int)len, path[i],
+				       (!len || *(end - 1) == '/') ? "" : "/");
+			if (ret == -1)
+				break;
+			if (sizeof(name) != (size_t)ret + 1)
+				continue;
+			DRV_LOG(DEBUG, "Looking for rdma-core glue as "
+				"\"%s\"", name);
+			handle = dlopen(name, RTLD_LAZY);
+			break;
+		} while (1);
+		path[i] = end + 1;
+		if (!*end)
+			++i;
+	}
+	if (!handle) {
+		rte_errno = EINVAL;
+		dlmsg = dlerror();
+		if (dlmsg)
+			DRV_LOG(WARNING, "Cannot load glue library: %s", dlmsg);
+		goto glue_error;
+	}
+	sym = dlsym(handle, "mlx5_glue");
+	if (!sym || !*sym) {
+		rte_errno = EINVAL;
+		dlmsg = dlerror();
+		if (dlmsg)
+			DRV_LOG(ERR, "Cannot resolve glue symbol: %s", dlmsg);
+		goto glue_error;
+	}
+	mlx5_glue = *sym;
+#endif /* RTE_IBVERBS_LINK_DLOPEN */
+#ifndef NDEBUG
+	/* Glue structure must not contain any NULL pointers. */
+	{
+		unsigned int i;
+
+		for (i = 0; i != sizeof(*mlx5_glue) / sizeof(void *); ++i)
+			assert(((const void *const *)mlx5_glue)[i]);
+	}
+#endif
+	if (strcmp(mlx5_glue->version, MLX5_GLUE_VERSION)) {
+		rte_errno = EINVAL;
+		DRV_LOG(ERR, "rdma-core glue \"%s\" mismatch: \"%s\" is "
+			"required", mlx5_glue->version, MLX5_GLUE_VERSION);
+		goto glue_error;
+	}
+	mlx5_glue->fork_init();
+	return;
+glue_error:
+	if (handle)
+		dlclose(handle);
+	DRV_LOG(WARNING, "Cannot initialize MLX5 common due to missing"
+		" run-time dependency on rdma-core libraries (libibverbs,"
+		" libmlx5)");
+	mlx5_glue = NULL;
+	return;
 }
diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index a9558ca..dc6b3c8 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -6,15 +6,6 @@ include $(RTE_SDK)/mk/rte.vars.mk
 
 # Library name.
 LIB = librte_pmd_mlx5.a
-LIB_GLUE = $(LIB_GLUE_BASE).$(LIB_GLUE_VERSION)
-LIB_GLUE_BASE = librte_pmd_mlx5_glue.so
-LIB_GLUE_VERSION = 20.02.0
-
-ifeq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
-CFLAGS += -DMLX5_GLUE='"$(LIB_GLUE)"'
-CFLAGS += -DMLX5_GLUE_VERSION='"$(LIB_GLUE_VERSION)"'
-LDLIBS += -ldl
-endif
 
 # Sources.
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5.c
diff --git a/drivers/net/mlx5/meson.build b/drivers/net/mlx5/meson.build
index f6d0db9..e10ef3a 100644
--- a/drivers/net/mlx5/meson.build
+++ b/drivers/net/mlx5/meson.build
@@ -8,10 +8,6 @@ if not is_linux
 	subdir_done()
 endif
 
-LIB_GLUE_BASE = 'librte_pmd_mlx5_glue.so'
-LIB_GLUE_VERSION = '20.02.0'
-LIB_GLUE = LIB_GLUE_BASE + '.' + LIB_GLUE_VERSION
-
 allow_experimental_apis = true
 deps += ['hash', 'common_mlx5']
 sources = files(
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 7cf357d..8fbe826 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -7,7 +7,6 @@
 #include <unistd.h>
 #include <string.h>
 #include <assert.h>
-#include <dlfcn.h>
 #include <stdint.h>
 #include <stdlib.h>
 #include <errno.h>
@@ -3505,138 +3504,6 @@ struct mlx5_flow_id_pool *
 		     RTE_PCI_DRV_PROBE_AGAIN,
 };
 
-#ifdef RTE_IBVERBS_LINK_DLOPEN
-
-/**
- * Suffix RTE_EAL_PMD_PATH with "-glue".
- *
- * This function performs a sanity check on RTE_EAL_PMD_PATH before
- * suffixing its last component.
- *
- * @param buf[out]
- *   Output buffer, should be large enough otherwise NULL is returned.
- * @param size
- *   Size of @p out.
- *
- * @return
- *   Pointer to @p buf or @p NULL in case suffix cannot be appended.
- */
-static char *
-mlx5_glue_path(char *buf, size_t size)
-{
-	static const char *const bad[] = { "/", ".", "..", NULL };
-	const char *path = RTE_EAL_PMD_PATH;
-	size_t len = strlen(path);
-	size_t off;
-	int i;
-
-	while (len && path[len - 1] == '/')
-		--len;
-	for (off = len; off && path[off - 1] != '/'; --off)
-		;
-	for (i = 0; bad[i]; ++i)
-		if (!strncmp(path + off, bad[i], (int)(len - off)))
-			goto error;
-	i = snprintf(buf, size, "%.*s-glue", (int)len, path);
-	if (i == -1 || (size_t)i >= size)
-		goto error;
-	return buf;
-error:
-	DRV_LOG(ERR,
-		"unable to append \"-glue\" to last component of"
-		" RTE_EAL_PMD_PATH (\"" RTE_EAL_PMD_PATH "\"),"
-		" please re-configure DPDK");
-	return NULL;
-}
-
-/**
- * Initialization routine for run-time dependency on rdma-core.
- */
-static int
-mlx5_glue_init(void)
-{
-	char glue_path[sizeof(RTE_EAL_PMD_PATH) - 1 + sizeof("-glue")];
-	const char *path[] = {
-		/*
-		 * A basic security check is necessary before trusting
-		 * MLX5_GLUE_PATH, which may override RTE_EAL_PMD_PATH.
-		 */
-		(geteuid() == getuid() && getegid() == getgid() ?
-		 getenv("MLX5_GLUE_PATH") : NULL),
-		/*
-		 * When RTE_EAL_PMD_PATH is set, use its glue-suffixed
-		 * variant, otherwise let dlopen() look up libraries on its
-		 * own.
-		 */
-		(*RTE_EAL_PMD_PATH ?
-		 mlx5_glue_path(glue_path, sizeof(glue_path)) : ""),
-	};
-	unsigned int i = 0;
-	void *handle = NULL;
-	void **sym;
-	const char *dlmsg;
-
-	while (!handle && i != RTE_DIM(path)) {
-		const char *end;
-		size_t len;
-		int ret;
-
-		if (!path[i]) {
-			++i;
-			continue;
-		}
-		end = strpbrk(path[i], ":;");
-		if (!end)
-			end = path[i] + strlen(path[i]);
-		len = end - path[i];
-		ret = 0;
-		do {
-			char name[ret + 1];
-
-			ret = snprintf(name, sizeof(name), "%.*s%s" MLX5_GLUE,
-				       (int)len, path[i],
-				       (!len || *(end - 1) == '/') ? "" : "/");
-			if (ret == -1)
-				break;
-			if (sizeof(name) != (size_t)ret + 1)
-				continue;
-			DRV_LOG(DEBUG, "looking for rdma-core glue as \"%s\"",
-				name);
-			handle = dlopen(name, RTLD_LAZY);
-			break;
-		} while (1);
-		path[i] = end + 1;
-		if (!*end)
-			++i;
-	}
-	if (!handle) {
-		rte_errno = EINVAL;
-		dlmsg = dlerror();
-		if (dlmsg)
-			DRV_LOG(WARNING, "cannot load glue library: %s", dlmsg);
-		goto glue_error;
-	}
-	sym = dlsym(handle, "mlx5_glue");
-	if (!sym || !*sym) {
-		rte_errno = EINVAL;
-		dlmsg = dlerror();
-		if (dlmsg)
-			DRV_LOG(ERR, "cannot resolve glue symbol: %s", dlmsg);
-		goto glue_error;
-	}
-	mlx5_glue = *sym;
-	return 0;
-glue_error:
-	if (handle)
-		dlclose(handle);
-	DRV_LOG(WARNING,
-		"cannot initialize PMD due to missing run-time dependency on"
-		" rdma-core libraries (libibverbs, libmlx5)");
-	return -rte_errno;
-}
-
-#endif
-
 /**
  * Driver initialization routine.
  */
@@ -3651,43 +3518,8 @@ struct mlx5_flow_id_pool *
 	mlx5_set_ptype_table();
 	mlx5_set_cksum_table();
 	mlx5_set_swp_types_table();
-	/*
-	 * RDMAV_HUGEPAGES_SAFE tells ibv_fork_init() we intend to use
-	 * huge pages. Calling ibv_fork_init() during init allows
-	 * applications to use fork() safely for purposes other than
-	 * using this PMD, which is not supported in forked processes.
-	 */
-	setenv("RDMAV_HUGEPAGES_SAFE", "1", 1);
-	/* Match the size of Rx completion entry to the size of a cacheline. */
-	if (RTE_CACHE_LINE_SIZE == 128)
-		setenv("MLX5_CQE_SIZE", "128", 0);
-	/*
-	 * MLX5_DEVICE_FATAL_CLEANUP tells ibv_destroy functions to
-	 * cleanup all the Verbs resources even when the device was removed.
-	 */
-	setenv("MLX5_DEVICE_FATAL_CLEANUP", "1", 1);
-#ifdef RTE_IBVERBS_LINK_DLOPEN
-	if (mlx5_glue_init())
-		return;
-	assert(mlx5_glue);
-#endif
-#ifndef NDEBUG
-	/* Glue structure must not contain any NULL pointers. */
-	{
-		unsigned int i;
-
-		for (i = 0; i != sizeof(*mlx5_glue) / sizeof(void *); ++i)
-			assert(((const void *const *)mlx5_glue)[i]);
-	}
-#endif
-	if (strcmp(mlx5_glue->version, MLX5_GLUE_VERSION)) {
-		DRV_LOG(ERR,
-			"rdma-core glue \"%s\" mismatch: \"%s\" is required",
-			mlx5_glue->version, MLX5_GLUE_VERSION);
-		return;
-	}
-	mlx5_glue->fork_init();
-	rte_pci_register(&mlx5_driver);
+	if (mlx5_glue)
+		rte_pci_register(&mlx5_driver);
 }
 
 RTE_PMD_EXPORT_NAME(net_mlx5, __COUNTER__);
-- 
1.8.3.1
* [dpdk-dev] [PATCH v2 04/25] common/mlx5: share mlx5 PCI device detection
  2020-01-28 10:05 ` [dpdk-dev] [PATCH v2 00/25] Introduce mlx5 common library Matan Azrad
                     ` (2 preceding siblings ...)
  2020-01-28 10:05   ` [dpdk-dev] [PATCH v2 03/25] common/mlx5: share the mlx5 glue reference Matan Azrad
@ 2020-01-28 10:05   ` Matan Azrad
  2020-01-28 10:05   ` [dpdk-dev] [PATCH v2 05/25] common/mlx5: share mlx5 devices information Matan Azrad
                     ` (21 subsequent siblings)
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-28 10:05 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
Move PCI detection by IB device from mlx5 PMD to the common code.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
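Reader note: the PCI detection moved by this patch reads the device's
sysfs uevent file and extracts the PCI_SLOT_NAME entry. The following is
a minimal sketch of that parsing step only. struct demo_pci_addr and the
demo_* functions are hypothetical stand-ins, not DPDK APIs, and unlike
mlx5_dev_to_pci_addr() the sketch reports a non-matching line to its
caller.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in mirroring the fields of struct rte_pci_addr. */
struct demo_pci_addr {
	uint32_t domain;
	uint8_t bus;
	uint8_t devid;
	uint8_t function;
};

/*
 * Parse one uevent line.
 * Return 0 and fill *addr when the line is a PCI_SLOT_NAME entry,
 * return -1 and leave *addr untouched otherwise.
 */
int
demo_parse_pci_slot(const char *line, struct demo_pci_addr *addr)
{
	struct demo_pci_addr tmp;

	if (sscanf(line,
		   "PCI_SLOT_NAME=%" SCNx32 ":%" SCNx8 ":%" SCNx8 ".%" SCNx8,
		   &tmp.domain, &tmp.bus, &tmp.devid, &tmp.function) != 4)
		return -1;
	*addr = tmp;
	return 0;
}

/* Exercise the parser on a matching and a non-matching line. */
void
demo_pci_selftest(void)
{
	struct demo_pci_addr addr = { 0, 0, 0, 0 };

	assert(demo_parse_pci_slot("PCI_SLOT_NAME=0000:3b:00.1\n", &addr) == 0);
	assert(addr.domain == 0 && addr.bus == 0x3b);
	assert(addr.devid == 0 && addr.function == 1);
	assert(demo_parse_pci_slot("DRIVER=mlx5_core\n", &addr) == -1);
}
```

Parsing into a temporary and copying only on a full match keeps the
caller's structure unchanged when a line does not describe a PCI slot.
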
 drivers/common/mlx5/Makefile                    |  2 +-
 drivers/common/mlx5/mlx5_common.c               | 55 +++++++++++++++++++++++++
 drivers/common/mlx5/mlx5_common.h               |  4 ++
 drivers/common/mlx5/rte_common_mlx5_version.map |  2 +
 drivers/net/mlx5/mlx5.c                         |  1 +
 drivers/net/mlx5/mlx5.h                         |  2 -
 drivers/net/mlx5/mlx5_ethdev.c                  | 53 +-----------------------
 drivers/net/mlx5/mlx5_rxtx.c                    |  1 +
 drivers/net/mlx5/mlx5_stats.c                   |  3 ++
 9 files changed, 68 insertions(+), 55 deletions(-)
diff --git a/drivers/common/mlx5/Makefile b/drivers/common/mlx5/Makefile
index b94d3c0..66585b2 100644
--- a/drivers/common/mlx5/Makefile
+++ b/drivers/common/mlx5/Makefile
@@ -41,7 +41,7 @@ else
 LDLIBS += -libverbs -lmlx5
 endif
 
-LDLIBS += -lrte_eal
+LDLIBS += -lrte_eal -lrte_pci
 
 # A few warnings cannot be avoided in external headers.
 CFLAGS += -Wno-error=cast-qual -DNDEBUG -UPEDANTIC
diff --git a/drivers/common/mlx5/mlx5_common.c b/drivers/common/mlx5/mlx5_common.c
index 26d7a87..9471ff3 100644
--- a/drivers/common/mlx5/mlx5_common.c
+++ b/drivers/common/mlx5/mlx5_common.c
@@ -5,6 +5,9 @@
 #include <dlfcn.h>
 #include <unistd.h>
 #include <string.h>
+#include <stdio.h>
+
+#include <rte_errno.h>
 
 #include <rte_errno.h>
 
@@ -16,6 +19,58 @@
 int mlx5_common_logtype;
 
 
+/**
+ * Get PCI information by sysfs device path.
+ *
+ * @param dev_path
+ *   Pointer to device sysfs folder name.
+ * @param[out] pci_addr
+ *   PCI bus address output buffer.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_dev_to_pci_addr(const char *dev_path,
+		     struct rte_pci_addr *pci_addr)
+{
+	FILE *file;
+	char line[32];
+	MKSTR(path, "%s/device/uevent", dev_path);
+
+	file = fopen(path, "rb");
+	if (file == NULL) {
+		rte_errno = errno;
+		return -rte_errno;
+	}
+	while (fgets(line, sizeof(line), file) == line) {
+		size_t len = strlen(line);
+		int ret;
+
+		/* Truncate long lines. */
+		if (len == (sizeof(line) - 1))
+			while (line[(len - 1)] != '\n') {
+				ret = fgetc(file);
+				if (ret == EOF)
+					break;
+				line[(len - 1)] = ret;
+			}
+		/* Extract information. */
+		if (sscanf(line,
+			   "PCI_SLOT_NAME="
+			   "%" SCNx32 ":%" SCNx8 ":%" SCNx8 ".%" SCNx8 "\n",
+			   &pci_addr->domain,
+			   &pci_addr->bus,
+			   &pci_addr->devid,
+			   &pci_addr->function) == 4) {
+			ret = 0;
+			break;
+		}
+	}
+	fclose(file);
+	return 0;
+}
+
 #ifdef RTE_IBVERBS_LINK_DLOPEN
 
 /**
diff --git a/drivers/common/mlx5/mlx5_common.h b/drivers/common/mlx5/mlx5_common.h
index 9f10def..107ab8d 100644
--- a/drivers/common/mlx5/mlx5_common.h
+++ b/drivers/common/mlx5/mlx5_common.h
@@ -6,7 +6,9 @@
 #define RTE_PMD_MLX5_COMMON_H_
 
 #include <assert.h>
+#include <stdio.h>
 
+#include <rte_pci.h>
 #include <rte_log.h>
 
 
@@ -84,4 +86,6 @@
 	\
 	snprintf(name, sizeof(name), "" __VA_ARGS__)
 
+int mlx5_dev_to_pci_addr(const char *dev_path, struct rte_pci_addr *pci_addr);
+
 #endif /* RTE_PMD_MLX5_COMMON_H_ */
diff --git a/drivers/common/mlx5/rte_common_mlx5_version.map b/drivers/common/mlx5/rte_common_mlx5_version.map
index e4f85e2..0c01172 100644
--- a/drivers/common/mlx5/rte_common_mlx5_version.map
+++ b/drivers/common/mlx5/rte_common_mlx5_version.map
@@ -17,4 +17,6 @@ DPDK_20.02 {
 	mlx5_devx_cmd_qp_query_tis_td;
 	mlx5_devx_cmd_query_hca_attr;
 	mlx5_devx_get_out_command_status;
+
+	mlx5_dev_to_pci_addr;
 };
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 8fbe826..d0fa2da 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -39,6 +39,7 @@
 
 #include <mlx5_glue.h>
 #include <mlx5_devx_cmds.h>
+#include <mlx5_common.h>
 
 #include "mlx5_defs.h"
 #include "mlx5.h"
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 872fccb..261a8fc 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -655,8 +655,6 @@ int mlx5_dev_get_flow_ctrl(struct rte_eth_dev *dev,
 			   struct rte_eth_fc_conf *fc_conf);
 int mlx5_dev_set_flow_ctrl(struct rte_eth_dev *dev,
 			   struct rte_eth_fc_conf *fc_conf);
-int mlx5_dev_to_pci_addr(const char *dev_path,
-			 struct rte_pci_addr *pci_addr);
 void mlx5_dev_link_status_handler(void *arg);
 void mlx5_dev_interrupt_handler(void *arg);
 void mlx5_dev_interrupt_handler_devx(void *arg);
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index eddf888..2628e64 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -38,6 +38,7 @@
 
 #include <mlx5_glue.h>
 #include <mlx5_devx_cmds.h>
+#include <mlx5_common.h>
 
 #include "mlx5.h"
 #include "mlx5_rxtx.h"
@@ -1212,58 +1213,6 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 }
 
 /**
- * Get PCI information by sysfs device path.
- *
- * @param dev_path
- *   Pointer to device sysfs folder name.
- * @param[out] pci_addr
- *   PCI bus address output buffer.
- *
- * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
- */
-int
-mlx5_dev_to_pci_addr(const char *dev_path,
-		     struct rte_pci_addr *pci_addr)
-{
-	FILE *file;
-	char line[32];
-	MKSTR(path, "%s/device/uevent", dev_path);
-
-	file = fopen(path, "rb");
-	if (file == NULL) {
-		rte_errno = errno;
-		return -rte_errno;
-	}
-	while (fgets(line, sizeof(line), file) == line) {
-		size_t len = strlen(line);
-		int ret;
-
-		/* Truncate long lines. */
-		if (len == (sizeof(line) - 1))
-			while (line[(len - 1)] != '\n') {
-				ret = fgetc(file);
-				if (ret == EOF)
-					break;
-				line[(len - 1)] = ret;
-			}
-		/* Extract information. */
-		if (sscanf(line,
-			   "PCI_SLOT_NAME="
-			   "%" SCNx32 ":%" SCNx8 ":%" SCNx8 ".%" SCNx8 "\n",
-			   &pci_addr->domain,
-			   &pci_addr->bus,
-			   &pci_addr->devid,
-			   &pci_addr->function) == 4) {
-			ret = 0;
-			break;
-		}
-	}
-	fclose(file);
-	return 0;
-}
-
-/**
  * Handle asynchronous removal event for entire multiport device.
  *
  * @param sh
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index d8f6671..b14ae31 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -30,6 +30,7 @@
 
 #include <mlx5_devx_cmds.h>
 #include <mlx5_prm.h>
+#include <mlx5_common.h>
 
 #include "mlx5_defs.h"
 #include "mlx5.h"
diff --git a/drivers/net/mlx5/mlx5_stats.c b/drivers/net/mlx5/mlx5_stats.c
index 0ed7170..4c69e77 100644
--- a/drivers/net/mlx5/mlx5_stats.c
+++ b/drivers/net/mlx5/mlx5_stats.c
@@ -13,10 +13,13 @@
 #include <rte_common.h>
 #include <rte_malloc.h>
 
+#include <mlx5_common.h>
+
 #include "mlx5_defs.h"
 #include "mlx5.h"
 #include "mlx5_rxtx.h"
 
+
 static const struct mlx5_counter_ctrl mlx5_counters_init[] = {
 	{
 		.dpdk_name = "rx_port_unicast_bytes",
-- 
1.8.3.1
* [dpdk-dev] [PATCH v2 05/25] common/mlx5: share mlx5 devices information
  2020-01-28 10:05 ` [dpdk-dev] [PATCH v2 00/25] Introduce mlx5 common library Matan Azrad
                     ` (3 preceding siblings ...)
  2020-01-28 10:05   ` [dpdk-dev] [PATCH v2 04/25] common/mlx5: share mlx5 PCI device detection Matan Azrad
@ 2020-01-28 10:05   ` Matan Azrad
  2020-01-28 10:05   ` [dpdk-dev] [PATCH v2 06/25] common/mlx5: share CQ entry check Matan Azrad
                     ` (20 subsequent siblings)
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-28 10:05 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
Move the vendor ID and the device IDs from the net/mlx5 PMD to the
common mlx5 header file.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/common/mlx5/mlx5_common.h | 21 +++++++++++++++++++++
 drivers/net/mlx5/mlx5.h           | 21 ---------------------
 drivers/net/mlx5/mlx5_txq.c       |  1 +
 3 files changed, 22 insertions(+), 21 deletions(-)
diff --git a/drivers/common/mlx5/mlx5_common.h b/drivers/common/mlx5/mlx5_common.h
index 107ab8d..0f57a27 100644
--- a/drivers/common/mlx5/mlx5_common.h
+++ b/drivers/common/mlx5/mlx5_common.h
@@ -86,6 +86,27 @@
 	\
 	snprintf(name, sizeof(name), "" __VA_ARGS__)
 
+enum {
+	PCI_VENDOR_ID_MELLANOX = 0x15b3,
+};
+
+enum {
+	PCI_DEVICE_ID_MELLANOX_CONNECTX4 = 0x1013,
+	PCI_DEVICE_ID_MELLANOX_CONNECTX4VF = 0x1014,
+	PCI_DEVICE_ID_MELLANOX_CONNECTX4LX = 0x1015,
+	PCI_DEVICE_ID_MELLANOX_CONNECTX4LXVF = 0x1016,
+	PCI_DEVICE_ID_MELLANOX_CONNECTX5 = 0x1017,
+	PCI_DEVICE_ID_MELLANOX_CONNECTX5VF = 0x1018,
+	PCI_DEVICE_ID_MELLANOX_CONNECTX5EX = 0x1019,
+	PCI_DEVICE_ID_MELLANOX_CONNECTX5EXVF = 0x101a,
+	PCI_DEVICE_ID_MELLANOX_CONNECTX5BF = 0xa2d2,
+	PCI_DEVICE_ID_MELLANOX_CONNECTX5BFVF = 0xa2d3,
+	PCI_DEVICE_ID_MELLANOX_CONNECTX6 = 0x101b,
+	PCI_DEVICE_ID_MELLANOX_CONNECTX6VF = 0x101c,
+	PCI_DEVICE_ID_MELLANOX_CONNECTX6DX = 0x101d,
+	PCI_DEVICE_ID_MELLANOX_CONNECTX6DXVF = 0x101e,
+};
+
 int mlx5_dev_to_pci_addr(const char *dev_path, struct rte_pci_addr *pci_addr);
 
 #endif /* RTE_PMD_MLX5_COMMON_H_ */
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 261a8fc..3daf0db 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -41,27 +41,6 @@
 #include "mlx5_mr.h"
 #include "mlx5_autoconf.h"
 
-enum {
-	PCI_VENDOR_ID_MELLANOX = 0x15b3,
-};
-
-enum {
-	PCI_DEVICE_ID_MELLANOX_CONNECTX4 = 0x1013,
-	PCI_DEVICE_ID_MELLANOX_CONNECTX4VF = 0x1014,
-	PCI_DEVICE_ID_MELLANOX_CONNECTX4LX = 0x1015,
-	PCI_DEVICE_ID_MELLANOX_CONNECTX4LXVF = 0x1016,
-	PCI_DEVICE_ID_MELLANOX_CONNECTX5 = 0x1017,
-	PCI_DEVICE_ID_MELLANOX_CONNECTX5VF = 0x1018,
-	PCI_DEVICE_ID_MELLANOX_CONNECTX5EX = 0x1019,
-	PCI_DEVICE_ID_MELLANOX_CONNECTX5EXVF = 0x101a,
-	PCI_DEVICE_ID_MELLANOX_CONNECTX5BF = 0xa2d2,
-	PCI_DEVICE_ID_MELLANOX_CONNECTX5BFVF = 0xa2d3,
-	PCI_DEVICE_ID_MELLANOX_CONNECTX6 = 0x101b,
-	PCI_DEVICE_ID_MELLANOX_CONNECTX6VF = 0x101c,
-	PCI_DEVICE_ID_MELLANOX_CONNECTX6DX = 0x101d,
-	PCI_DEVICE_ID_MELLANOX_CONNECTX6DXVF = 0x101e,
-};
-
 /* Request types for IPC. */
 enum mlx5_mp_req_type {
 	MLX5_MP_REQ_VERBS_CMD_FD = 1,
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 1d2ba8a..7bff769 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -30,6 +30,7 @@
 
 #include <mlx5_glue.h>
 #include <mlx5_devx_cmds.h>
+#include <mlx5_common.h>
 
 #include "mlx5_defs.h"
 #include "mlx5_utils.h"
-- 
1.8.3.1
* [dpdk-dev] [PATCH v2 06/25] common/mlx5: share CQ entry check
  2020-01-28 10:05 ` [dpdk-dev] [PATCH v2 00/25] Introduce mlx5 common library Matan Azrad
                     ` (4 preceding siblings ...)
  2020-01-28 10:05   ` [dpdk-dev] [PATCH v2 05/25] common/mlx5: share mlx5 devices information Matan Azrad
@ 2020-01-28 10:05   ` Matan Azrad
  2020-01-28 10:05   ` [dpdk-dev] [PATCH v2 07/25] common/mlx5: add query vDPA DevX capabilities Matan Azrad
                     ` (19 subsequent siblings)
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-28 10:05 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
The CQE has an owner bit that indicates whether it is under SW or HW
control.
Share the CQE check among all the mlx5 drivers.
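The owner-bit rule can be sketched in isolation as below. This is a simplified model and not the driver code: the EX_* macros, the opcode values and ex_check_cqe() are illustrative stand-ins for the definitions in mlx5_prm.h and for check_cqe(), and the read barrier of the real function is left out. It assumes the CQ size is a power of two, so the expected owner value flips each time the consumer index wraps.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-ins; the real macros and opcodes live in mlx5_prm.h. */
#define EX_CQE_OWNER(op_own)  ((op_own) & 0x1)
#define EX_CQE_OPCODE(op_own) (((op_own) & 0xf0) >> 4)
#define EX_CQE_REQ_ERR  0xd
#define EX_CQE_RESP_ERR 0xe
#define EX_CQE_INVALID  0xf

enum ex_cqe_status {
	EX_CQE_STATUS_SW_OWN = -1,
	EX_CQE_STATUS_HW_OWN = -2,
	EX_CQE_STATUS_ERR = -3,
};

/* Classify one CQE from its op_own byte, the CQ size and the consumer index. */
static enum ex_cqe_status
ex_check_cqe(uint8_t op_own, uint16_t cqes_n, uint16_t ci)
{
	/* cqes_n is a power of two, so this bit toggles on every wrap of ci. */
	const uint8_t expected_owner = !!(ci & cqes_n);
	const uint8_t opcode = EX_CQE_OPCODE(op_own);

	assert(cqes_n != 0 && (cqes_n & (cqes_n - 1)) == 0);
	/* A stale owner bit or an invalid opcode means HW has not written it yet. */
	if (EX_CQE_OWNER(op_own) != expected_owner || opcode == EX_CQE_INVALID)
		return EX_CQE_STATUS_HW_OWN;
	if (opcode == EX_CQE_RESP_ERR || opcode == EX_CQE_REQ_ERR)
		return EX_CQE_STATUS_ERR;
	return EX_CQE_STATUS_SW_OWN;
}
```

With a 4-entry CQ, an entry whose owner bit is 0 is consumable on the first lap (ci 0..3) and is treated as still HW-owned on the second lap (ci 4..7), where owner bit 1 is expected.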
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/common/mlx5/mlx5_common.h | 41 +++++++++++++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_rxtx.h      | 39 +------------------------------------
 2 files changed, 42 insertions(+), 38 deletions(-)
diff --git a/drivers/common/mlx5/mlx5_common.h b/drivers/common/mlx5/mlx5_common.h
index 0f57a27..9d464d4 100644
--- a/drivers/common/mlx5/mlx5_common.h
+++ b/drivers/common/mlx5/mlx5_common.h
@@ -9,8 +9,11 @@
 #include <stdio.h>
 
 #include <rte_pci.h>
+#include <rte_atomic.h>
 #include <rte_log.h>
 
+#include "mlx5_prm.h"
+
 
 /*
  * Helper macros to work around __VA_ARGS__ limitations in a C99 compliant
@@ -107,6 +110,44 @@ enum {
 	PCI_DEVICE_ID_MELLANOX_CONNECTX6DXVF = 0x101e,
 };
 
+/* CQE status. */
+enum mlx5_cqe_status {
+	MLX5_CQE_STATUS_SW_OWN = -1,
+	MLX5_CQE_STATUS_HW_OWN = -2,
+	MLX5_CQE_STATUS_ERR = -3,
+};
+
+/**
+ * Check whether CQE is valid.
+ *
+ * @param cqe
+ *   Pointer to CQE.
+ * @param cqes_n
+ *   Size of completion queue.
+ * @param ci
+ *   Consumer index.
+ *
+ * @return
+ *   The CQE status.
+ */
+static __rte_always_inline enum mlx5_cqe_status
+check_cqe(volatile struct mlx5_cqe *cqe, const uint16_t cqes_n,
+	  const uint16_t ci)
+{
+	const uint16_t idx = ci & cqes_n;
+	const uint8_t op_own = cqe->op_own;
+	const uint8_t op_owner = MLX5_CQE_OWNER(op_own);
+	const uint8_t op_code = MLX5_CQE_OPCODE(op_own);
+
+	if (unlikely((op_owner != (!!(idx))) || (op_code == MLX5_CQE_INVALID)))
+		return MLX5_CQE_STATUS_HW_OWN;
+	rte_cio_rmb();
+	if (unlikely(op_code == MLX5_CQE_RESP_ERR ||
+		     op_code == MLX5_CQE_REQ_ERR))
+		return MLX5_CQE_STATUS_ERR;
+	return MLX5_CQE_STATUS_SW_OWN;
+}
+
 int mlx5_dev_to_pci_addr(const char *dev_path, struct rte_pci_addr *pci_addr);
 
 #endif /* RTE_PMD_MLX5_COMMON_H_ */
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index fb13919..c2cd23b 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -33,6 +33,7 @@
 
 #include <mlx5_glue.h>
 #include <mlx5_prm.h>
+#include <mlx5_common.h>
 
 #include "mlx5_defs.h"
 #include "mlx5_utils.h"
@@ -549,44 +550,6 @@ int mlx5_dma_unmap(struct rte_pci_device *pdev, void *addr, uint64_t iova,
 #define mlx5_uar_write64(val, dst, lock) __mlx5_uar_write64(val, dst, lock)
 #endif
 
-/* CQE status. */
-enum mlx5_cqe_status {
-	MLX5_CQE_STATUS_SW_OWN = -1,
-	MLX5_CQE_STATUS_HW_OWN = -2,
-	MLX5_CQE_STATUS_ERR = -3,
-};
-
-/**
- * Check whether CQE is valid.
- *
- * @param cqe
- *   Pointer to CQE.
- * @param cqes_n
- *   Size of completion queue.
- * @param ci
- *   Consumer index.
- *
- * @return
- *   The CQE status.
- */
-static __rte_always_inline enum mlx5_cqe_status
-check_cqe(volatile struct mlx5_cqe *cqe, const uint16_t cqes_n,
-	  const uint16_t ci)
-{
-	const uint16_t idx = ci & cqes_n;
-	const uint8_t op_own = cqe->op_own;
-	const uint8_t op_owner = MLX5_CQE_OWNER(op_own);
-	const uint8_t op_code = MLX5_CQE_OPCODE(op_own);
-
-	if (unlikely((op_owner != (!!(idx))) || (op_code == MLX5_CQE_INVALID)))
-		return MLX5_CQE_STATUS_HW_OWN;
-	rte_cio_rmb();
-	if (unlikely(op_code == MLX5_CQE_RESP_ERR ||
-		     op_code == MLX5_CQE_REQ_ERR))
-		return MLX5_CQE_STATUS_ERR;
-	return MLX5_CQE_STATUS_SW_OWN;
-}
-
 /**
  * Get Memory Pool (MP) from mbuf. If mbuf is indirect, the pool from which the
  * cloned mbuf is allocated is returned instead.
-- 
1.8.3.1
* [dpdk-dev] [PATCH v2 07/25] common/mlx5: add query vDPA DevX capabilities
  2020-01-28 10:05 ` [dpdk-dev] [PATCH v2 00/25] Introduce mlx5 common library Matan Azrad
                     ` (5 preceding siblings ...)
  2020-01-28 10:05   ` [dpdk-dev] [PATCH v2 06/25] common/mlx5: share CQ entry check Matan Azrad
@ 2020-01-28 10:05   ` Matan Azrad
  2020-01-28 10:05   ` [dpdk-dev] [PATCH v2 08/25] common/mlx5: glue null memory region allocation Matan Azrad
                     ` (18 subsequent siblings)
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-28 10:05 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
Add a query of the DevX capabilities that describe vDPA configuration
and information of Mellanox devices.
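A caller could consume the queried attributes roughly as sketched below. Only the capability bit value is taken from this patch; struct ex_vdpa_attr is a trimmed, hypothetical stand-in for struct mlx5_hca_vdpa_attr, and both helpers are invented for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit value taken from the mlx5_prm.h hunk of this patch. */
#define EX_GENERAL_OBJ_TYPES_CAP_VIRTQ_NET_Q (1ULL << 0xd)

/* Hypothetical, trimmed stand-in for struct mlx5_hca_vdpa_attr. */
struct ex_vdpa_attr {
	uint32_t valid:1;
	uint32_t max_num_virtio_queues;
};

/* First stage: does the general capability word advertise virtq objects? */
static bool
ex_vdpa_supported(uint64_t general_obj_types)
{
	return !!(general_obj_types & EX_GENERAL_OBJ_TYPES_CAP_VIRTQ_NET_Q);
}

/* Second stage: clamp a requested queue count to what the device reports. */
static uint32_t
ex_usable_queues(const struct ex_vdpa_attr *attr, uint32_t wanted)
{
	if (!attr->valid)
		return 0; /* The vDPA capability query failed or is unsupported. */
	return wanted < attr->max_num_virtio_queues ?
	       wanted : attr->max_num_virtio_queues;
}
```

The two stages mirror the patch: the valid flag is first derived from the general object types, and the detailed attributes are trusted only if the dedicated query succeeded.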
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/common/mlx5/mlx5_devx_cmds.c | 90 ++++++++++++++++++++++++++++++++++++
 drivers/common/mlx5/mlx5_devx_cmds.h | 24 ++++++++++
 drivers/common/mlx5/mlx5_prm.h       | 45 ++++++++++++++++++
 3 files changed, 159 insertions(+)
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index 4d94f92..3a10ff0 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -285,6 +285,91 @@ struct mlx5_devx_obj *
 }
 
 /**
+ * Query NIC vDPA attributes.
+ *
+ * @param[in] ctx
+ *   ibv contexts returned from mlx5dv_open_device.
+ * @param[out] vdpa_attr
+ *   vDPA Attributes structure to fill.
+ */
+static void
+mlx5_devx_cmd_query_hca_vdpa_attr(struct ibv_context *ctx,
+				  struct mlx5_hca_vdpa_attr *vdpa_attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(query_hca_cap_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(query_hca_cap_out)] = {0};
+	void *hcattr = MLX5_ADDR_OF(query_hca_cap_out, out, capability);
+	int status, syndrome, rc;
+
+	MLX5_SET(query_hca_cap_in, in, opcode, MLX5_CMD_OP_QUERY_HCA_CAP);
+	MLX5_SET(query_hca_cap_in, in, op_mod,
+		 MLX5_GET_HCA_CAP_OP_MOD_VDPA_EMULATION |
+		 MLX5_HCA_CAP_OPMOD_GET_CUR);
+	rc = mlx5_glue->devx_general_cmd(ctx, in, sizeof(in), out, sizeof(out));
+	status = MLX5_GET(query_hca_cap_out, out, status);
+	syndrome = MLX5_GET(query_hca_cap_out, out, syndrome);
+	if (rc || status) {
+		RTE_LOG(DEBUG, PMD, "Failed to query devx VDPA capabilities,"
+			" status %x, syndrome = %x", status, syndrome);
+		vdpa_attr->valid = 0;
+	} else {
+		vdpa_attr->valid = 1;
+		vdpa_attr->desc_tunnel_offload_type =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 desc_tunnel_offload_type);
+		vdpa_attr->eth_frame_offload_type =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 eth_frame_offload_type);
+		vdpa_attr->virtio_version_1_0 =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 virtio_version_1_0);
+		vdpa_attr->tso_ipv4 = MLX5_GET(virtio_emulation_cap, hcattr,
+					       tso_ipv4);
+		vdpa_attr->tso_ipv6 = MLX5_GET(virtio_emulation_cap, hcattr,
+					       tso_ipv6);
+		vdpa_attr->tx_csum = MLX5_GET(virtio_emulation_cap, hcattr,
+					      tx_csum);
+		vdpa_attr->rx_csum = MLX5_GET(virtio_emulation_cap, hcattr,
+					      rx_csum);
+		vdpa_attr->event_mode = MLX5_GET(virtio_emulation_cap, hcattr,
+						 event_mode);
+		vdpa_attr->virtio_queue_type =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 virtio_queue_type);
+		vdpa_attr->log_doorbell_stride =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 log_doorbell_stride);
+		vdpa_attr->log_doorbell_bar_size =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 log_doorbell_bar_size);
+		vdpa_attr->doorbell_bar_offset =
+			MLX5_GET64(virtio_emulation_cap, hcattr,
+				   doorbell_bar_offset);
+		vdpa_attr->max_num_virtio_queues =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 max_num_virtio_queues);
+		vdpa_attr->umem_1_buffer_param_a =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 umem_1_buffer_param_a);
+		vdpa_attr->umem_1_buffer_param_b =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 umem_1_buffer_param_b);
+		vdpa_attr->umem_2_buffer_param_a =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 umem_2_buffer_param_a);
+		vdpa_attr->umem_2_buffer_param_b =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 umem_2_buffer_param_b);
+		vdpa_attr->umem_3_buffer_param_a =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 umem_3_buffer_param_a);
+		vdpa_attr->umem_3_buffer_param_b =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 umem_3_buffer_param_b);
+	}
+}
+
+/**
  * Query HCA attributes.
  * Using those attributes we can check on run time if the device
  * is having the required capabilities.
@@ -343,6 +428,9 @@ struct mlx5_devx_obj *
 	attr->flex_parser_protocols = MLX5_GET(cmd_hca_cap, hcattr,
 					       flex_parser_protocols);
 	attr->qos.sup = MLX5_GET(cmd_hca_cap, hcattr, qos);
+	attr->vdpa.valid = !!(MLX5_GET64(cmd_hca_cap, hcattr,
+					 general_obj_types) &
+			      MLX5_GENERAL_OBJ_TYPES_CAP_VIRTQ_NET_Q);
 	if (attr->qos.sup) {
 		MLX5_SET(query_hca_cap_in, in, op_mod,
 			 MLX5_GET_HCA_CAP_OP_MOD_QOS_CAP |
@@ -367,6 +455,8 @@ struct mlx5_devx_obj *
 		attr->qos.flow_meter_reg_share =
 			MLX5_GET(qos_cap, hcattr, flow_meter_reg_share);
 	}
+	if (attr->vdpa.valid)
+		mlx5_devx_cmd_query_hca_vdpa_attr(ctx, &attr->vdpa);
 	if (!attr->eth_net_offloads)
 		return 0;
 
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index 2d58d96..c1c9e99 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -34,6 +34,29 @@ struct mlx5_hca_qos_attr {
 
 };
 
+struct mlx5_hca_vdpa_attr {
+	uint8_t virtio_queue_type;
+	uint32_t valid:1;
+	uint32_t desc_tunnel_offload_type:1;
+	uint32_t eth_frame_offload_type:1;
+	uint32_t virtio_version_1_0:1;
+	uint32_t tso_ipv4:1;
+	uint32_t tso_ipv6:1;
+	uint32_t tx_csum:1;
+	uint32_t rx_csum:1;
+	uint32_t event_mode:3;
+	uint32_t log_doorbell_stride:5;
+	uint32_t log_doorbell_bar_size:5;
+	uint32_t max_num_virtio_queues;
+	uint32_t umem_1_buffer_param_a;
+	uint32_t umem_1_buffer_param_b;
+	uint32_t umem_2_buffer_param_a;
+	uint32_t umem_2_buffer_param_b;
+	uint32_t umem_3_buffer_param_a;
+	uint32_t umem_3_buffer_param_b;
+	uint64_t doorbell_bar_offset;
+};
+
 /* HCA supports this number of time periods for LRO. */
 #define MLX5_LRO_NUM_SUPP_PERIODS 4
 
@@ -62,6 +85,7 @@ struct mlx5_hca_attr {
 	uint32_t log_max_hairpin_num_packets:5;
 	uint32_t vhca_id:16;
 	struct mlx5_hca_qos_attr qos;
+	struct mlx5_hca_vdpa_attr vdpa;
 };
 
 struct mlx5_devx_wq_attr {
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index 5730ad1..efd6ad4 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -881,6 +881,11 @@ enum {
 	MLX5_GET_HCA_CAP_OP_MOD_GENERAL_DEVICE = 0x0 << 1,
 	MLX5_GET_HCA_CAP_OP_MOD_ETHERNET_OFFLOAD_CAPS = 0x1 << 1,
 	MLX5_GET_HCA_CAP_OP_MOD_QOS_CAP = 0xc << 1,
+	MLX5_GET_HCA_CAP_OP_MOD_VDPA_EMULATION = 0x13 << 1,
+};
+
+enum {
+	MLX5_GENERAL_OBJ_TYPES_CAP_VIRTQ_NET_Q = (1ULL << 0xd),
 };
 
 enum {
@@ -1256,11 +1261,51 @@ struct mlx5_ifc_per_protocol_networking_offload_caps_bits {
 	u8 reserved_at_200[0x600];
 };
 
+enum {
+	MLX5_VIRTQ_TYPE_SPLIT = 0,
+	MLX5_VIRTQ_TYPE_PACKED = 1,
+};
+
+enum {
+	MLX5_VIRTQ_EVENT_MODE_NO_MSIX = 0,
+	MLX5_VIRTQ_EVENT_MODE_QP = 1,
+	MLX5_VIRTQ_EVENT_MODE_MSIX = 2,
+};
+
+struct mlx5_ifc_virtio_emulation_cap_bits {
+	u8 desc_tunnel_offload_type[0x1];
+	u8 eth_frame_offload_type[0x1];
+	u8 virtio_version_1_0[0x1];
+	u8 tso_ipv4[0x1];
+	u8 tso_ipv6[0x1];
+	u8 tx_csum[0x1];
+	u8 rx_csum[0x1];
+	u8 reserved_at_7[0x1][0x9];
+	u8 event_mode[0x8];
+	u8 virtio_queue_type[0x8];
+	u8 reserved_at_20[0x13];
+	u8 log_doorbell_stride[0x5];
+	u8 reserved_at_3b[0x3];
+	u8 log_doorbell_bar_size[0x5];
+	u8 doorbell_bar_offset[0x40];
+	u8 reserved_at_80[0x8];
+	u8 max_num_virtio_queues[0x18];
+	u8 reserved_at_a0[0x60];
+	u8 umem_1_buffer_param_a[0x20];
+	u8 umem_1_buffer_param_b[0x20];
+	u8 umem_2_buffer_param_a[0x20];
+	u8 umem_2_buffer_param_b[0x20];
+	u8 umem_3_buffer_param_a[0x20];
+	u8 umem_3_buffer_param_b[0x20];
+	u8 reserved_at_1c0[0x620];
+};
+
 union mlx5_ifc_hca_cap_union_bits {
 	struct mlx5_ifc_cmd_hca_cap_bits cmd_hca_cap;
 	struct mlx5_ifc_per_protocol_networking_offload_caps_bits
 	       per_protocol_networking_offload_caps;
 	struct mlx5_ifc_qos_cap_bits qos_cap;
+	struct mlx5_ifc_virtio_emulation_cap_bits vdpa_caps;
 	u8 reserved_at_0[0x8000];
 };
 
-- 
1.8.3.1
* [dpdk-dev] [PATCH v2 08/25] common/mlx5: glue null memory region allocation
  2020-01-28 10:05 ` [dpdk-dev] [PATCH v2 00/25] Introduce mlx5 common library Matan Azrad
                     ` (6 preceding siblings ...)
  2020-01-28 10:05   ` [dpdk-dev] [PATCH v2 07/25] common/mlx5: add query vDPA DevX capabilities Matan Azrad
@ 2020-01-28 10:05   ` Matan Azrad
  2020-01-28 10:05   ` [dpdk-dev] [PATCH v2 09/25] common/mlx5: support DevX indirect mkey creation Matan Azrad
                     ` (17 subsequent siblings)
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-28 10:05 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
Add support for the rdma-core API that allocates a NULL MR.
When the device HW gets a NULL MR address, it does nothing with the
address: no read and no write.
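The glue pattern used here (a function table whose entry falls back to ENOTSUP when the build lacks the underlying API) can be sketched without rdma-core as follows. All ex_* names and the EX_HAVE_NULL_MR macro are hypothetical; the real code uses struct ibv_pd, struct ibv_mr and HAVE_IBV_DEVX_OBJ.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct ibv_pd and struct ibv_mr. */
struct ex_pd { int handle; };
struct ex_mr { int lkey; };

/* Function table, in the spirit of struct mlx5_glue. */
struct ex_glue_ops {
	struct ex_mr *(*alloc_null_mr)(struct ex_pd *pd);
};

static struct ex_mr *
ex_glue_alloc_null_mr(struct ex_pd *pd)
{
#ifdef EX_HAVE_NULL_MR
	/* A real build would call into the verbs library here. */
	static struct ex_mr null_mr = { .lkey = 0 };

	(void)pd;
	return &null_mr;
#else
	/* API missing at build time: report it through errno. */
	(void)pd;
	errno = ENOTSUP;
	return NULL;
#endif
}

static const struct ex_glue_ops ex_glue = {
	.alloc_null_mr = ex_glue_alloc_null_mr,
};

/* Caller side: returns 0 on success, a negative errno value otherwise. */
static int
ex_setup_null_mr(struct ex_pd *pd, struct ex_mr **mr)
{
	*mr = ex_glue.alloc_null_mr(pd);
	if (*mr == NULL)
		return -errno;
	return 0;
}
```

Callers only see the table, so the same driver code builds against rdma-core versions with and without the API and fails cleanly at run time in the latter case.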
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/common/mlx5/mlx5_glue.c | 13 +++++++++++++
 drivers/common/mlx5/mlx5_glue.h |  1 +
 2 files changed, 14 insertions(+)
diff --git a/drivers/common/mlx5/mlx5_glue.c b/drivers/common/mlx5/mlx5_glue.c
index d5bc84e..e75e6bc 100644
--- a/drivers/common/mlx5/mlx5_glue.c
+++ b/drivers/common/mlx5/mlx5_glue.c
@@ -226,6 +226,18 @@
 	return ibv_reg_mr(pd, addr, length, access);
 }
 
+static struct ibv_mr *
+mlx5_glue_alloc_null_mr(struct ibv_pd *pd)
+{
+#ifdef HAVE_IBV_DEVX_OBJ
+	return ibv_alloc_null_mr(pd);
+#else
+	(void)pd;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
 static int
 mlx5_glue_dereg_mr(struct ibv_mr *mr)
 {
@@ -1070,6 +1082,7 @@
 	.destroy_qp = mlx5_glue_destroy_qp,
 	.modify_qp = mlx5_glue_modify_qp,
 	.reg_mr = mlx5_glue_reg_mr,
+	.alloc_null_mr = mlx5_glue_alloc_null_mr,
 	.dereg_mr = mlx5_glue_dereg_mr,
 	.create_counter_set = mlx5_glue_create_counter_set,
 	.destroy_counter_set = mlx5_glue_destroy_counter_set,
diff --git a/drivers/common/mlx5/mlx5_glue.h b/drivers/common/mlx5/mlx5_glue.h
index f4c3180..33afaf4 100644
--- a/drivers/common/mlx5/mlx5_glue.h
+++ b/drivers/common/mlx5/mlx5_glue.h
@@ -138,6 +138,7 @@ struct mlx5_glue {
 			 int attr_mask);
 	struct ibv_mr *(*reg_mr)(struct ibv_pd *pd, void *addr,
 				 size_t length, int access);
+	struct ibv_mr *(*alloc_null_mr)(struct ibv_pd *pd);
 	int (*dereg_mr)(struct ibv_mr *mr);
 	struct ibv_counter_set *(*create_counter_set)
 		(struct ibv_context *context,
-- 
1.8.3.1
* [dpdk-dev] [PATCH v2 09/25] common/mlx5: support DevX indirect mkey creation
  2020-01-28 10:05 ` [dpdk-dev] [PATCH v2 00/25] Introduce mlx5 common library Matan Azrad
                     ` (7 preceding siblings ...)
  2020-01-28 10:05   ` [dpdk-dev] [PATCH v2 08/25] common/mlx5: glue null memory region allocation Matan Azrad
@ 2020-01-28 10:05   ` Matan Azrad
  2020-01-28 10:05   ` [dpdk-dev] [PATCH v2 10/25] common/mlx5: glue event queue query Matan Azrad
                     ` (16 subsequent siblings)
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-28 10:05 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
Add an option to create an indirect mkey with the existing
mlx5_devx_cmd_mkey_create command.
An indirect mkey points to a set of direct mkeys.
This way, the HW/SW can reference fragmented memory through one object.
Align the net/mlx5 driver usage of the above command.
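As an illustration of how fragmented memory maps onto a KLM array, here is a minimal sketch. struct ex_klm copies the layout of struct mlx5_klm from this patch; struct ex_fragment and both helpers are hypothetical. The padding helper reproduces the RTE_ALIGN(klm_num, 4) rounding that the command applies to the entry count.

```c
#include <stddef.h>
#include <stdint.h>

/* Same layout as struct mlx5_klm added by this patch (16 bytes per entry). */
struct ex_klm {
	uint32_t byte_count;
	uint32_t mkey;
	uint64_t address;
};

/* Hypothetical description of one fragment covered by a direct mkey. */
struct ex_fragment {
	uint64_t addr;
	uint32_t len;
	uint32_t mkey;
};

/* Round the entry count up to a multiple of 4, like RTE_ALIGN(klm_num, 4). */
static size_t
ex_klm_padded_count(size_t klm_num)
{
	return (klm_num + 3) & ~(size_t)3;
}

/* Fill one KLM entry per fragment and return the total length covered. */
static uint64_t
ex_fill_klm(struct ex_klm *klm, const struct ex_fragment *frag, size_t n)
{
	uint64_t total = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		klm[i].byte_count = frag[i].len;
		klm[i].mkey = frag[i].mkey;
		klm[i].address = frag[i].addr;
		total += frag[i].len;
	}
	return total;
}
```

The resulting array and count would then be passed through the klm_array and klm_num fields of struct mlx5_devx_mkey_attr; callers that want a direct mkey leave them as NULL and 0, as the net/mlx5 hunk does.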
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/common/mlx5/mlx5_devx_cmds.c | 43 ++++++++++++++++++++++++++++++------
 drivers/common/mlx5/mlx5_devx_cmds.h | 17 ++++++++++++++
 drivers/common/mlx5/mlx5_prm.h       | 12 ++++++++++
 drivers/net/mlx5/mlx5_flow_dv.c      |  4 ++++
 4 files changed, 69 insertions(+), 7 deletions(-)
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index 3a10ff0..2197705 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -142,7 +142,11 @@ struct mlx5_devx_obj *
 mlx5_devx_cmd_mkey_create(struct ibv_context *ctx,
 			  struct mlx5_devx_mkey_attr *attr)
 {
-	uint32_t in[MLX5_ST_SZ_DW(create_mkey_in)] = {0};
+	struct mlx5_klm *klm_array = attr->klm_array;
+	int klm_num = attr->klm_num;
+	int in_size_dw = MLX5_ST_SZ_DW(create_mkey_in) +
+		     (klm_num ? RTE_ALIGN(klm_num, 4) : 0) * MLX5_ST_SZ_DW(klm);
+	uint32_t in[in_size_dw];
 	uint32_t out[MLX5_ST_SZ_DW(create_mkey_out)] = {0};
 	void *mkc;
 	struct mlx5_devx_obj *mkey = rte_zmalloc("mkey", sizeof(*mkey), 0);
@@ -153,27 +157,52 @@ struct mlx5_devx_obj *
 		rte_errno = ENOMEM;
 		return NULL;
 	}
+	memset(in, 0, in_size_dw * 4);
 	pgsize = sysconf(_SC_PAGESIZE);
-	translation_size = (RTE_ALIGN(attr->size, pgsize) * 8) / 16;
 	MLX5_SET(create_mkey_in, in, opcode, MLX5_CMD_OP_CREATE_MKEY);
+	mkc = MLX5_ADDR_OF(create_mkey_in, in, memory_key_mkey_entry);
+	if (klm_num > 0) {
+		int i;
+		uint8_t *klm = (uint8_t *)MLX5_ADDR_OF(create_mkey_in, in,
+						       klm_pas_mtt);
+		translation_size = RTE_ALIGN(klm_num, 4);
+		for (i = 0; i < klm_num; i++) {
+			MLX5_SET(klm, klm, byte_count, klm_array[i].byte_count);
+			MLX5_SET(klm, klm, mkey, klm_array[i].mkey);
+			MLX5_SET64(klm, klm, address, klm_array[i].address);
+			klm += MLX5_ST_SZ_BYTES(klm);
+		}
+		for (; i < (int)translation_size; i++) {
+			MLX5_SET(klm, klm, mkey, 0x0);
+			MLX5_SET64(klm, klm, address, 0x0);
+			klm += MLX5_ST_SZ_BYTES(klm);
+		}
+		MLX5_SET(mkc, mkc, access_mode_1_0, attr->log_entity_size ?
+			 MLX5_MKC_ACCESS_MODE_KLM_FBS :
+			 MLX5_MKC_ACCESS_MODE_KLM);
+		MLX5_SET(mkc, mkc, log_page_size, attr->log_entity_size);
+	} else {
+		translation_size = (RTE_ALIGN(attr->size, pgsize) * 8) / 16;
+		MLX5_SET(mkc, mkc, access_mode_1_0, MLX5_MKC_ACCESS_MODE_MTT);
+		MLX5_SET(mkc, mkc, log_page_size, rte_log2_u32(pgsize));
+	}
 	MLX5_SET(create_mkey_in, in, translations_octword_actual_size,
 		 translation_size);
 	MLX5_SET(create_mkey_in, in, mkey_umem_id, attr->umem_id);
-	mkc = MLX5_ADDR_OF(create_mkey_in, in, memory_key_mkey_entry);
+	MLX5_SET(create_mkey_in, in, pg_access, attr->pg_access);
 	MLX5_SET(mkc, mkc, lw, 0x1);
 	MLX5_SET(mkc, mkc, lr, 0x1);
-	MLX5_SET(mkc, mkc, access_mode_1_0, MLX5_MKC_ACCESS_MODE_MTT);
 	MLX5_SET(mkc, mkc, qpn, 0xffffff);
 	MLX5_SET(mkc, mkc, pd, attr->pd);
 	MLX5_SET(mkc, mkc, mkey_7_0, attr->umem_id & 0xFF);
 	MLX5_SET(mkc, mkc, translations_octword_size, translation_size);
 	MLX5_SET64(mkc, mkc, start_addr, attr->addr);
 	MLX5_SET64(mkc, mkc, len, attr->size);
-	MLX5_SET(mkc, mkc, log_page_size, rte_log2_u32(pgsize));
-	mkey->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in), out,
+	mkey->obj = mlx5_glue->devx_obj_create(ctx, in, in_size_dw * 4, out,
 					       sizeof(out));
 	if (!mkey->obj) {
-		DRV_LOG(ERR, "Can't create mkey - error %d", errno);
+		DRV_LOG(ERR, "Can't create %sdirect mkey - error %d\n",
+			klm_num ? "an in" : "a ", errno);
 		rte_errno = errno;
 		rte_free(mkey);
 		return NULL;
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index c1c9e99..c76c172 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -6,6 +6,7 @@
 #define RTE_PMD_MLX5_DEVX_CMDS_H_
 
 #include "mlx5_glue.h"
+#include "mlx5_prm.h"
 
 
 /* devX creation object */
@@ -14,11 +15,26 @@ struct mlx5_devx_obj {
 	int id; /* The object ID. */
 };
 
+/* UMR memory buffer used to define 1 entry in indirect mkey. */
+struct mlx5_klm {
+	uint32_t byte_count;
+	uint32_t mkey;
+	uint64_t address;
+};
+
+/* libibverbs limitation: the type of the input length variable is u16. */
+#define MLX5_DEVX_MAX_KLM_ENTRIES ((UINT16_MAX - \
+		MLX5_ST_SZ_DW(create_mkey_in) * 4) / (MLX5_ST_SZ_DW(klm) * 4))
+
 struct mlx5_devx_mkey_attr {
 	uint64_t addr;
 	uint64_t size;
 	uint32_t umem_id;
 	uint32_t pd;
+	uint32_t log_entity_size;
+	uint32_t pg_access:1;
+	struct mlx5_klm *klm_array;
+	int klm_num;
 };
 
 /* HCA qos attributes. */
@@ -216,6 +232,7 @@ struct mlx5_devx_modify_sq_attr {
 	uint32_t hairpin_peer_vhca:16;
 };
 
+
 /* mlx5_devx_cmds.c */
 
 struct mlx5_devx_obj *mlx5_devx_cmd_flow_counter_alloc(struct ibv_context *ctx,
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index efd6ad4..db15bb6 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -726,6 +726,8 @@ enum {
 
 enum {
 	MLX5_MKC_ACCESS_MODE_MTT   = 0x1,
+	MLX5_MKC_ACCESS_MODE_KLM   = 0x2,
+	MLX5_MKC_ACCESS_MODE_KLM_FBS = 0x3,
 };
 
 /* Flow counters. */
@@ -790,6 +792,16 @@ struct mlx5_ifc_query_flow_counter_in_bits {
 	u8         flow_counter_id[0x20];
 };
 
+#define MLX5_MAX_KLM_BYTE_COUNT 0x80000000u
+#define MLX5_MIN_KLM_FIXED_BUFFER_SIZE 0x1000u
+
+
+struct mlx5_ifc_klm_bits {
+	u8         byte_count[0x20];
+	u8         mkey[0x20];
+	u8         address[0x40];
+};
+
 struct mlx5_ifc_mkc_bits {
 	u8         reserved_at_0[0x1];
 	u8         free[0x1];
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 1b31602..5610d94 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -3885,6 +3885,10 @@ struct field_modify_info modify_tcp[] = {
 	mkey_attr.size = size;
 	mkey_attr.umem_id = mem_mng->umem->umem_id;
 	mkey_attr.pd = sh->pdn;
+	mkey_attr.log_entity_size = 0;
+	mkey_attr.pg_access = 0;
+	mkey_attr.klm_array = NULL;
+	mkey_attr.klm_num = 0;
 	mem_mng->dm = mlx5_devx_cmd_mkey_create(sh->ctx, &mkey_attr);
 	if (!mem_mng->dm) {
 		mlx5_glue->devx_umem_dereg(mem_mng->umem);
-- 
1.8.3.1
* [dpdk-dev] [PATCH v2 10/25] common/mlx5: glue event queue query
  2020-01-28 10:05 ` [dpdk-dev] [PATCH v2 00/25] Introduce mlx5 common library Matan Azrad
                     ` (8 preceding siblings ...)
  2020-01-28 10:05   ` [dpdk-dev] [PATCH v2 09/25] common/mlx5: support DevX indirect mkey creation Matan Azrad
@ 2020-01-28 10:05   ` Matan Azrad
  2020-01-28 10:05   ` [dpdk-dev] [PATCH v2 11/25] common/mlx5: glue event interrupt commands Matan Azrad
                     ` (15 subsequent siblings)
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-28 10:05 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
The event queue is managed only by the kernel.
Add the rdma-core command in glue to query the kernel event queue
details.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/common/mlx5/mlx5_glue.c | 15 +++++++++++++++
 drivers/common/mlx5/mlx5_glue.h |  2 ++
 2 files changed, 17 insertions(+)
diff --git a/drivers/common/mlx5/mlx5_glue.c b/drivers/common/mlx5/mlx5_glue.c
index e75e6bc..fedce77 100644
--- a/drivers/common/mlx5/mlx5_glue.c
+++ b/drivers/common/mlx5/mlx5_glue.c
@@ -1049,6 +1049,20 @@
 #endif
 }
 
+static int
+mlx5_glue_devx_query_eqn(struct ibv_context *ctx, uint32_t cpus,
+			 uint32_t *eqn)
+{
+#ifdef HAVE_IBV_DEVX_OBJ
+	return mlx5dv_devx_query_eqn(ctx, cpus, eqn);
+#else
+	(void)ctx;
+	(void)cpus;
+	(void)eqn;
+	return -ENOTSUP;
+#endif
+}
+
 alignas(RTE_CACHE_LINE_SIZE)
 const struct mlx5_glue *mlx5_glue = &(const struct mlx5_glue){
 	.version = MLX5_GLUE_VERSION,
@@ -1148,4 +1162,5 @@
 	.devx_qp_query = mlx5_glue_devx_qp_query,
 	.devx_port_query = mlx5_glue_devx_port_query,
 	.dr_dump_domain = mlx5_glue_dr_dump_domain,
+	.devx_query_eqn = mlx5_glue_devx_query_eqn,
 };
diff --git a/drivers/common/mlx5/mlx5_glue.h b/drivers/common/mlx5/mlx5_glue.h
index 33afaf4..fe51f97 100644
--- a/drivers/common/mlx5/mlx5_glue.h
+++ b/drivers/common/mlx5/mlx5_glue.h
@@ -259,6 +259,8 @@ struct mlx5_glue {
 			       uint32_t port_num,
 			       struct mlx5dv_devx_port *mlx5_devx_port);
 	int (*dr_dump_domain)(FILE *file, void *domain);
+	int (*devx_query_eqn)(struct ibv_context *context, uint32_t cpus,
+			      uint32_t *eqn);
 };
 
 const struct mlx5_glue *mlx5_glue;
-- 
1.8.3.1
* [dpdk-dev] [PATCH v2 11/25] common/mlx5: glue event interrupt commands
  2020-01-28 10:05 ` [dpdk-dev] [PATCH v2 00/25] Introduce mlx5 common library Matan Azrad
                     ` (9 preceding siblings ...)
  2020-01-28 10:05   ` [dpdk-dev] [PATCH v2 10/25] common/mlx5: glue event queue query Matan Azrad
@ 2020-01-28 10:05   ` Matan Azrad
  2020-01-28 10:05   ` [dpdk-dev] [PATCH v2 12/25] common/mlx5: glue UAR allocation Matan Azrad
                     ` (14 subsequent siblings)
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-28 10:05 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
Add the following commands to glue in order to support interrupt event
channel operations associated with events in the EQ:
	devx_create_event_channel,
	devx_destroy_event_channel,
	devx_subscribe_devx_event,
	devx_subscribe_devx_event_fd,
	devx_get_event.
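The wrappers in this patch share one pattern: a build-time probe defines a HAVE_* macro, and each glue entry either forwards to rdma-core or falls back to a stub that reports ENOTSUP. A minimal sketch of that pattern follows. All demo_* names and HAVE_DEMO_EVENT are hypothetical stand-ins rather than real rdma-core or DPDK symbols, and the forwarded call is only compiled when the macro is defined.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for an rdma-core type; not the real definition. */
struct demo_event_channel { int fd; };

/* Glue wrapper: forward when the probe succeeded, otherwise report ENOTSUP. */
static ssize_t
demo_glue_get_event(struct demo_event_channel *ch, void *buf, size_t len)
{
#ifdef HAVE_DEMO_EVENT
	/* demo_lib_get_event() is a hypothetical library call. */
	return demo_lib_get_event(ch, buf, len);
#else
	(void)ch;
	(void)buf;
	(void)len;
	errno = ENOTSUP;
	return -1;
#endif
}

/* Table of function pointers that callers use instead of the library. */
struct demo_glue {
	ssize_t (*get_event)(struct demo_event_channel *ch, void *buf,
			     size_t len);
};

static const struct demo_glue demo_glue_ops = {
	.get_event = demo_glue_get_event,
};

const struct demo_glue *demo_glue = &demo_glue_ops;
```

Callers always go through demo_glue->get_event(), so code built against an older library still links and sees a well-defined error at run time.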
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/common/mlx5/Makefile    |  5 +++
 drivers/common/mlx5/meson.build |  2 ++
 drivers/common/mlx5/mlx5_glue.c | 79 +++++++++++++++++++++++++++++++++++++++++
 drivers/common/mlx5/mlx5_glue.h | 25 +++++++++++++
 4 files changed, 111 insertions(+)
diff --git a/drivers/common/mlx5/Makefile b/drivers/common/mlx5/Makefile
index 66585b2..7110231 100644
--- a/drivers/common/mlx5/Makefile
+++ b/drivers/common/mlx5/Makefile
@@ -154,6 +154,11 @@ mlx5_autoconf.h.new: $(RTE_SDK)/buildtools/auto-config-h.sh
 		func mlx5dv_dr_action_create_dest_devx_tir \
 		$(AUTOCONF_OUTPUT)
 	$Q sh -- '$<' '$@' \
+		HAVE_IBV_DEVX_EVENT \
+		infiniband/mlx5dv.h \
+		func mlx5dv_devx_get_event \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
 		HAVE_MLX5_DR_CREATE_ACTION_FLOW_METER \
 		infiniband/mlx5dv.h \
 		func mlx5dv_dr_action_create_flow_meter \
diff --git a/drivers/common/mlx5/meson.build b/drivers/common/mlx5/meson.build
index bfd07f9..be6a06f 100644
--- a/drivers/common/mlx5/meson.build
+++ b/drivers/common/mlx5/meson.build
@@ -108,6 +108,8 @@ if build
 		'mlx5dv_devx_obj_query_async' ],
 		[ 'HAVE_MLX5DV_DR_ACTION_DEST_DEVX_TIR', 'infiniband/mlx5dv.h',
 		'mlx5dv_dr_action_create_dest_devx_tir' ],
+		[ 'HAVE_IBV_DEVX_EVENT', 'infiniband/mlx5dv.h',
+		'mlx5dv_devx_get_event' ],
 		[ 'HAVE_MLX5_DR_CREATE_ACTION_FLOW_METER', 'infiniband/mlx5dv.h',
 		'mlx5dv_dr_action_create_flow_meter' ],
 		[ 'HAVE_MLX5DV_MMAP_GET_NC_PAGES_CMD', 'infiniband/mlx5dv.h',
diff --git a/drivers/common/mlx5/mlx5_glue.c b/drivers/common/mlx5/mlx5_glue.c
index fedce77..e4eabdb 100644
--- a/drivers/common/mlx5/mlx5_glue.c
+++ b/drivers/common/mlx5/mlx5_glue.c
@@ -1063,6 +1063,80 @@
 #endif
 }
 
+static struct mlx5dv_devx_event_channel *
+mlx5_glue_devx_create_event_channel(struct ibv_context *ctx, int flags)
+{
+#ifdef HAVE_IBV_DEVX_EVENT
+	return mlx5dv_devx_create_event_channel(ctx, flags);
+#else
+	(void)ctx;
+	(void)flags;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static void
+mlx5_glue_devx_destroy_event_channel(struct mlx5dv_devx_event_channel *eventc)
+{
+#ifdef HAVE_IBV_DEVX_EVENT
+	mlx5dv_devx_destroy_event_channel(eventc);
+#else
+	(void)eventc;
+#endif
+}
+
+static int
+mlx5_glue_devx_subscribe_devx_event(struct mlx5dv_devx_event_channel *eventc,
+				    struct mlx5dv_devx_obj *obj,
+				    uint16_t events_sz, uint16_t events_num[],
+				    uint64_t cookie)
+{
+#ifdef HAVE_IBV_DEVX_EVENT
+	return mlx5dv_devx_subscribe_devx_event(eventc, obj, events_sz,
+						events_num, cookie);
+#else
+	(void)eventc;
+	(void)obj;
+	(void)events_sz;
+	(void)events_num;
+	(void)cookie;
+	return -ENOTSUP;
+#endif
+}
+
+static int
+mlx5_glue_devx_subscribe_devx_event_fd(struct mlx5dv_devx_event_channel *eventc,
+				       int fd, struct mlx5dv_devx_obj *obj,
+				       uint16_t event_num)
+{
+#ifdef HAVE_IBV_DEVX_EVENT
+	return mlx5dv_devx_subscribe_devx_event_fd(eventc, fd, obj, event_num);
+#else
+	(void)eventc;
+	(void)fd;
+	(void)obj;
+	(void)event_num;
+	return -ENOTSUP;
+#endif
+}
+
+static ssize_t
+mlx5_glue_devx_get_event(struct mlx5dv_devx_event_channel *eventc,
+			 struct mlx5dv_devx_async_event_hdr *event_data,
+			 size_t event_resp_len)
+{
+#ifdef HAVE_IBV_DEVX_EVENT
+	return mlx5dv_devx_get_event(eventc, event_data, event_resp_len);
+#else
+	(void)eventc;
+	(void)event_data;
+	(void)event_resp_len;
+	errno = ENOTSUP;
+	return -1;
+#endif
+}
+
 alignas(RTE_CACHE_LINE_SIZE)
 const struct mlx5_glue *mlx5_glue = &(const struct mlx5_glue){
 	.version = MLX5_GLUE_VERSION,
@@ -1163,4 +1237,9 @@
 	.devx_port_query = mlx5_glue_devx_port_query,
 	.dr_dump_domain = mlx5_glue_dr_dump_domain,
 	.devx_query_eqn = mlx5_glue_devx_query_eqn,
+	.devx_create_event_channel = mlx5_glue_devx_create_event_channel,
+	.devx_destroy_event_channel = mlx5_glue_devx_destroy_event_channel,
+	.devx_subscribe_devx_event = mlx5_glue_devx_subscribe_devx_event,
+	.devx_subscribe_devx_event_fd = mlx5_glue_devx_subscribe_devx_event_fd,
+	.devx_get_event = mlx5_glue_devx_get_event,
 };
diff --git a/drivers/common/mlx5/mlx5_glue.h b/drivers/common/mlx5/mlx5_glue.h
index fe51f97..6fc00dd 100644
--- a/drivers/common/mlx5/mlx5_glue.h
+++ b/drivers/common/mlx5/mlx5_glue.h
@@ -86,6 +86,12 @@
 struct mlx5dv_dr_flow_meter_attr;
 #endif
 
+#ifndef HAVE_IBV_DEVX_EVENT
+struct mlx5dv_devx_event_channel { int fd; };
+struct mlx5dv_devx_async_event_hdr;
+#define MLX5DV_DEVX_CREATE_EVENT_CHANNEL_FLAGS_OMIT_EV_DATA 1
+#endif
+
 /* LIB_GLUE_VERSION must be updated every time this structure is modified. */
 struct mlx5_glue {
 	const char *version;
@@ -261,6 +267,25 @@ struct mlx5_glue {
 	int (*dr_dump_domain)(FILE *file, void *domain);
 	int (*devx_query_eqn)(struct ibv_context *context, uint32_t cpus,
 			      uint32_t *eqn);
+	struct mlx5dv_devx_event_channel *(*devx_create_event_channel)
+				(struct ibv_context *context, int flags);
+	void (*devx_destroy_event_channel)
+			(struct mlx5dv_devx_event_channel *event_channel);
+	int (*devx_subscribe_devx_event)
+			(struct mlx5dv_devx_event_channel *event_channel,
+			 struct mlx5dv_devx_obj *obj,
+			 uint16_t events_sz,
+			 uint16_t events_num[],
+			 uint64_t cookie);
+	int (*devx_subscribe_devx_event_fd)
+			(struct mlx5dv_devx_event_channel *event_channel,
+			 int fd,
+			 struct mlx5dv_devx_obj *obj,
+			 uint16_t event_num);
+	ssize_t (*devx_get_event)
+			(struct mlx5dv_devx_event_channel *event_channel,
+			 struct mlx5dv_devx_async_event_hdr *event_data,
+			 size_t event_resp_len);
 };
 
 const struct mlx5_glue *mlx5_glue;
-- 
1.8.3.1
* [dpdk-dev] [PATCH v2 12/25] common/mlx5: glue UAR allocation
  2020-01-28 10:05 ` [dpdk-dev] [PATCH v2 00/25] Introduce mlx5 common library Matan Azrad
                     ` (10 preceding siblings ...)
  2020-01-28 10:05   ` [dpdk-dev] [PATCH v2 11/25] common/mlx5: glue event interrupt commands Matan Azrad
@ 2020-01-28 10:05   ` Matan Azrad
  2020-01-28 10:05   ` [dpdk-dev] [PATCH v2 13/25] common/mlx5: add DevX command to create CQ Matan Azrad
                     ` (13 subsequent siblings)
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-28 10:05 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
Isolated, protected and independent direct access to the HW by
multiple processes is implemented via the User Access Region (UAR)
mechanism.
The UAR is the part of the PCI address space that is mapped for direct
access to the HW from the CPU.
A UAR comprises multiple pages, each page containing registers that
control the HW operation.
The UAR mechanism is used to post execution or control requests to the
HW. It is used by the HW to enforce protection and isolation between
different processes.
Add glue commands to allocate and free a UAR.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/common/mlx5/mlx5_glue.c | 25 +++++++++++++++++++++++++
 drivers/common/mlx5/mlx5_glue.h |  4 ++++
 2 files changed, 29 insertions(+)
diff --git a/drivers/common/mlx5/mlx5_glue.c b/drivers/common/mlx5/mlx5_glue.c
index e4eabdb..5691636 100644
--- a/drivers/common/mlx5/mlx5_glue.c
+++ b/drivers/common/mlx5/mlx5_glue.c
@@ -1137,6 +1137,29 @@
 #endif
 }
 
+static struct mlx5dv_devx_uar *
+mlx5_glue_devx_alloc_uar(struct ibv_context *context, uint32_t flags)
+{
+#ifdef HAVE_IBV_DEVX_OBJ
+	return mlx5dv_devx_alloc_uar(context, flags);
+#else
+	(void)context;
+	(void)flags;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static void
+mlx5_glue_devx_free_uar(struct mlx5dv_devx_uar *devx_uar)
+{
+#ifdef HAVE_IBV_DEVX_OBJ
+	mlx5dv_devx_free_uar(devx_uar);
+#else
+	(void)devx_uar;
+#endif
+}
+
 alignas(RTE_CACHE_LINE_SIZE)
 const struct mlx5_glue *mlx5_glue = &(const struct mlx5_glue){
 	.version = MLX5_GLUE_VERSION,
@@ -1242,4 +1265,6 @@
 	.devx_subscribe_devx_event = mlx5_glue_devx_subscribe_devx_event,
 	.devx_subscribe_devx_event_fd = mlx5_glue_devx_subscribe_devx_event_fd,
 	.devx_get_event = mlx5_glue_devx_get_event,
+	.devx_alloc_uar = mlx5_glue_devx_alloc_uar,
+	.devx_free_uar = mlx5_glue_devx_free_uar,
 };
diff --git a/drivers/common/mlx5/mlx5_glue.h b/drivers/common/mlx5/mlx5_glue.h
index 6fc00dd..7d9256e 100644
--- a/drivers/common/mlx5/mlx5_glue.h
+++ b/drivers/common/mlx5/mlx5_glue.h
@@ -66,6 +66,7 @@
 #ifndef HAVE_IBV_DEVX_OBJ
 struct mlx5dv_devx_obj;
 struct mlx5dv_devx_umem { uint32_t umem_id; };
+struct mlx5dv_devx_uar { void *reg_addr; void *base_addr; uint32_t page_id; };
 #endif
 
 #ifndef HAVE_IBV_DEVX_ASYNC
@@ -230,6 +231,9 @@ struct mlx5_glue {
 	int (*dv_destroy_flow)(void *flow);
 	int (*dv_destroy_flow_matcher)(void *matcher);
 	struct ibv_context *(*dv_open_device)(struct ibv_device *device);
+	struct mlx5dv_devx_uar *(*devx_alloc_uar)(struct ibv_context *context,
+						  uint32_t flags);
+	void (*devx_free_uar)(struct mlx5dv_devx_uar *devx_uar);
 	struct mlx5dv_devx_obj *(*devx_obj_create)
 					(struct ibv_context *ctx,
 					 const void *in, size_t inlen,
-- 
1.8.3.1
* [dpdk-dev] [PATCH v2 13/25] common/mlx5: add DevX command to create CQ
  2020-01-28 10:05 ` [dpdk-dev] [PATCH v2 00/25] Introduce mlx5 common library Matan Azrad
                     ` (11 preceding siblings ...)
  2020-01-28 10:05   ` [dpdk-dev] [PATCH v2 12/25] common/mlx5: glue UAR allocation Matan Azrad
@ 2020-01-28 10:05   ` Matan Azrad
  2020-01-28 10:05   ` [dpdk-dev] [PATCH v2 14/25] common/mlx5: glue VAR allocation Matan Azrad
                     ` (12 subsequent siblings)
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-28 10:05 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
The HW implements completion queues (CQs), which are used to post
completion reports upon completion of work requests.
CQs are used for both the Rx and Tx datapaths.
Add a DevX command to create a CQ.
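The CQ attributes in this patch carry the queue size in a 5-bit log_cq_size field. Assuming that field holds the base-2 logarithm of the number of CQ entries, as its name suggests, a caller would round the requested entry count up to a power of two before filling it in. The helpers below are a hypothetical sketch of that step and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Width of the log_cq_size bit-field in the CQ attributes of this patch. */
#define DEMO_LOG_CQ_SIZE_BITS 5

/* Hypothetical helper: smallest log such that (1 << log) >= n. */
static uint32_t
demo_log2_ceil(uint32_t n)
{
	uint32_t log = 0;

	/* Keep the shift below within the 32-bit range. */
	assert(n >= 1 && n <= (UINT32_C(1) << 31));
	while ((UINT32_C(1) << log) < n)
		log++;
	return log;
}

/* Return 1 if the value fits in the 5-bit field, 0 otherwise. */
static int
demo_log_cq_size_fits(uint32_t log)
{
	return log < (UINT32_C(1) << DEMO_LOG_CQ_SIZE_BITS);
}
```

Checking the range before the assignment matters because a value that does not fit in a bit-field is silently truncated.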
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/common/mlx5/mlx5_devx_cmds.c            | 57 ++++++++++++++++++++
 drivers/common/mlx5/mlx5_devx_cmds.h            | 19 +++++++
 drivers/common/mlx5/mlx5_prm.h                  | 71 +++++++++++++++++++++++++
 drivers/common/mlx5/rte_common_mlx5_version.map |  1 +
 4 files changed, 148 insertions(+)
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index 2197705..cdc041b 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -1093,3 +1093,60 @@ struct mlx5_devx_obj *
 #endif
 	return -ret;
 }
+
+/**
+ * Create CQ using DevX API.
+ *
+ * @param[in] ctx
+ *   ibv_context returned from mlx5dv_open_device.
+ * @param [in] attr
+ *   Pointer to CQ attributes structure.
+ *
+ * @return
+ *   The DevX object created, NULL otherwise and rte_errno is set.
+ */
+struct mlx5_devx_obj *
+mlx5_devx_cmd_create_cq(struct ibv_context *ctx, struct mlx5_devx_cq_attr *attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(create_cq_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(create_cq_out)] = {0};
+	struct mlx5_devx_obj *cq_obj = rte_zmalloc(__func__, sizeof(*cq_obj),
+						   0);
+	void *cqctx = MLX5_ADDR_OF(create_cq_in, in, cq_context);
+
+	if (!cq_obj) {
+		DRV_LOG(ERR, "Failed to allocate CQ object memory.");
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	MLX5_SET(create_cq_in, in, opcode, MLX5_CMD_OP_CREATE_CQ);
+	if (attr->db_umem_valid) {
+		MLX5_SET(cqc, cqctx, dbr_umem_valid, attr->db_umem_valid);
+		MLX5_SET(cqc, cqctx, dbr_umem_id, attr->db_umem_id);
+		MLX5_SET64(cqc, cqctx, dbr_addr, attr->db_umem_offset);
+	} else {
+		MLX5_SET64(cqc, cqctx, dbr_addr, attr->db_addr);
+	}
+	MLX5_SET(cqc, cqctx, cc, attr->use_first_only);
+	MLX5_SET(cqc, cqctx, oi, attr->overrun_ignore);
+	MLX5_SET(cqc, cqctx, log_cq_size, attr->log_cq_size);
+	MLX5_SET(cqc, cqctx, log_page_size, attr->log_page_size);
+	MLX5_SET(cqc, cqctx, c_eqn, attr->eqn);
+	MLX5_SET(cqc, cqctx, uar_page, attr->uar_page_id);
+	if (attr->q_umem_valid) {
+		MLX5_SET(create_cq_in, in, cq_umem_valid, attr->q_umem_valid);
+		MLX5_SET(create_cq_in, in, cq_umem_id, attr->q_umem_id);
+		MLX5_SET64(create_cq_in, in, cq_umem_offset,
+			   attr->q_umem_offset);
+	}
+	cq_obj->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in), out,
+						 sizeof(out));
+	if (!cq_obj->obj) {
+		rte_errno = errno;
+		DRV_LOG(ERR, "Failed to create CQ using DevX errno=%d.", errno);
+		rte_free(cq_obj);
+		return NULL;
+	}
+	cq_obj->id = MLX5_GET(create_cq_out, out, cqn);
+	return cq_obj;
+}
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index c76c172..581658b 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -233,6 +233,23 @@ struct mlx5_devx_modify_sq_attr {
 };
 
 
+/* CQ attributes structure, used by CQ operations. */
+struct mlx5_devx_cq_attr {
+	uint32_t q_umem_valid:1;
+	uint32_t db_umem_valid:1;
+	uint32_t use_first_only:1;
+	uint32_t overrun_ignore:1;
+	uint32_t log_cq_size:5;
+	uint32_t log_page_size:5;
+	uint32_t uar_page_id;
+	uint32_t q_umem_id;
+	uint64_t q_umem_offset;
+	uint32_t db_umem_id;
+	uint64_t db_umem_offset;
+	uint32_t eqn;
+	uint64_t db_addr;
+};
+
 /* mlx5_devx_cmds.c */
 
 struct mlx5_devx_obj *mlx5_devx_cmd_flow_counter_alloc(struct ibv_context *ctx,
@@ -269,4 +286,6 @@ struct mlx5_devx_obj *mlx5_devx_cmd_create_tis(struct ibv_context *ctx,
 struct mlx5_devx_obj *mlx5_devx_cmd_create_td(struct ibv_context *ctx);
 int mlx5_devx_cmd_flow_dump(void *fdb_domain, void *rx_domain, void *tx_domain,
 			    FILE *file);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_cq(struct ibv_context *ctx,
+					      struct mlx5_devx_cq_attr *attr);
 #endif /* RTE_PMD_MLX5_DEVX_CMDS_H_ */
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index db15bb6..a4082b9 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -710,6 +710,7 @@ enum {
 enum {
 	MLX5_CMD_OP_QUERY_HCA_CAP = 0x100,
 	MLX5_CMD_OP_CREATE_MKEY = 0x200,
+	MLX5_CMD_OP_CREATE_CQ = 0x400,
 	MLX5_CMD_OP_QUERY_NIC_VPORT_CONTEXT = 0x754,
 	MLX5_CMD_OP_ALLOC_TRANSPORT_DOMAIN = 0x816,
 	MLX5_CMD_OP_CREATE_TIR = 0x900,
@@ -1846,6 +1847,76 @@ struct mlx5_ifc_flow_meter_parameters_bits {
 	u8         reserved_at_8[0x60];		// 14h-1Ch
 };
 
+struct mlx5_ifc_cqc_bits {
+	u8 status[0x4];
+	u8 as_notify[0x1];
+	u8 initiator_src_dct[0x1];
+	u8 dbr_umem_valid[0x1];
+	u8 reserved_at_7[0x1];
+	u8 cqe_sz[0x3];
+	u8 cc[0x1];
+	u8 reserved_at_c[0x1];
+	u8 scqe_break_moderation_en[0x1];
+	u8 oi[0x1];
+	u8 cq_period_mode[0x2];
+	u8 cqe_comp_en[0x1];
+	u8 mini_cqe_res_format[0x2];
+	u8 st[0x4];
+	u8 reserved_at_18[0x8];
+	u8 dbr_umem_id[0x20];
+	u8 reserved_at_40[0x14];
+	u8 page_offset[0x6];
+	u8 reserved_at_5a[0x6];
+	u8 reserved_at_60[0x3];
+	u8 log_cq_size[0x5];
+	u8 uar_page[0x18];
+	u8 reserved_at_80[0x4];
+	u8 cq_period[0xc];
+	u8 cq_max_count[0x10];
+	u8 reserved_at_a0[0x18];
+	u8 c_eqn[0x8];
+	u8 reserved_at_c0[0x3];
+	u8 log_page_size[0x5];
+	u8 reserved_at_c8[0x18];
+	u8 reserved_at_e0[0x20];
+	u8 reserved_at_100[0x8];
+	u8 last_notified_index[0x18];
+	u8 reserved_at_120[0x8];
+	u8 last_solicit_index[0x18];
+	u8 reserved_at_140[0x8];
+	u8 consumer_counter[0x18];
+	u8 reserved_at_160[0x8];
+	u8 producer_counter[0x18];
+	u8 local_partition_id[0xc];
+	u8 process_id[0x14];
+	u8 reserved_at_1A0[0x20];
+	u8 dbr_addr[0x40];
+};
+
+struct mlx5_ifc_create_cq_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x8];
+	u8 cqn[0x18];
+	u8 reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_create_cq_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0x40];
+	struct mlx5_ifc_cqc_bits cq_context;
+	u8 cq_umem_offset[0x40];
+	u8 cq_umem_id[0x20];
+	u8 cq_umem_valid[0x1];
+	u8 reserved_at_2e1[0x1f];
+	u8 reserved_at_300[0x580];
+	u8 pas[];
+};
+
 /* CQE format mask. */
 #define MLX5E_CQE_FORMAT_MASK 0xc
 
diff --git a/drivers/common/mlx5/rte_common_mlx5_version.map b/drivers/common/mlx5/rte_common_mlx5_version.map
index 0c01172..c6a203d 100644
--- a/drivers/common/mlx5/rte_common_mlx5_version.map
+++ b/drivers/common/mlx5/rte_common_mlx5_version.map
@@ -1,6 +1,7 @@
 DPDK_20.02 {
 	global:
 
+	mlx5_devx_cmd_create_cq;
 	mlx5_devx_cmd_create_rq;
 	mlx5_devx_cmd_create_rqt;
 	mlx5_devx_cmd_create_sq;
-- 
1.8.3.1
* [dpdk-dev] [PATCH v2 14/25] common/mlx5: glue VAR allocation
  2020-01-28 10:05 ` [dpdk-dev] [PATCH v2 00/25] Introduce mlx5 common library Matan Azrad
                     ` (12 preceding siblings ...)
  2020-01-28 10:05   ` [dpdk-dev] [PATCH v2 13/25] common/mlx5: add DevX command to create CQ Matan Azrad
@ 2020-01-28 10:05   ` Matan Azrad
  2020-01-28 10:05   ` [dpdk-dev] [PATCH v2 15/25] common/mlx5: add DevX virtq commands Matan Azrad
                     ` (11 subsequent siblings)
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-28 10:05 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
The virtio access region (VAR) is the UAR that is allocated for virtio
emulation access.
Add rdma-core operations to allocate and free a VAR.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/common/mlx5/Makefile    |  5 +++++
 drivers/common/mlx5/meson.build |  1 +
 drivers/common/mlx5/mlx5_glue.c | 26 ++++++++++++++++++++++++++
 drivers/common/mlx5/mlx5_glue.h |  8 ++++++++
 4 files changed, 40 insertions(+)
diff --git a/drivers/common/mlx5/Makefile b/drivers/common/mlx5/Makefile
index 7110231..d1de3ec 100644
--- a/drivers/common/mlx5/Makefile
+++ b/drivers/common/mlx5/Makefile
@@ -174,6 +174,11 @@ mlx5_autoconf.h.new: $(RTE_SDK)/buildtools/auto-config-h.sh
 		enum MLX5_MMAP_GET_NC_PAGES_CMD \
 		$(AUTOCONF_OUTPUT)
 	$Q sh -- '$<' '$@' \
+		HAVE_IBV_VAR \
+		infiniband/mlx5dv.h \
+		func mlx5dv_alloc_var \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
 		HAVE_ETHTOOL_LINK_MODE_25G \
 		/usr/include/linux/ethtool.h \
 		enum ETHTOOL_LINK_MODE_25000baseCR_Full_BIT \
diff --git a/drivers/common/mlx5/meson.build b/drivers/common/mlx5/meson.build
index be6a06f..34de7d1 100644
--- a/drivers/common/mlx5/meson.build
+++ b/drivers/common/mlx5/meson.build
@@ -120,6 +120,7 @@ if build
 		'MLX5DV_DR_DOMAIN_TYPE_FDB' ],
 		[ 'HAVE_MLX5DV_DR_VLAN', 'infiniband/mlx5dv.h',
 		'mlx5dv_dr_action_create_push_vlan' ],
+		[ 'HAVE_IBV_VAR', 'infiniband/mlx5dv.h', 'mlx5dv_alloc_var' ],
 		[ 'HAVE_SUPPORTED_40000baseKR4_Full', 'linux/ethtool.h',
 		'SUPPORTED_40000baseKR4_Full' ],
 		[ 'HAVE_SUPPORTED_40000baseCR4_Full', 'linux/ethtool.h',
diff --git a/drivers/common/mlx5/mlx5_glue.c b/drivers/common/mlx5/mlx5_glue.c
index 5691636..27cf33c 100644
--- a/drivers/common/mlx5/mlx5_glue.c
+++ b/drivers/common/mlx5/mlx5_glue.c
@@ -1160,6 +1160,30 @@
 #endif
 }
 
+static struct mlx5dv_var *
+mlx5_glue_dv_alloc_var(struct ibv_context *context, uint32_t flags)
+{
+#ifdef HAVE_IBV_VAR
+	return mlx5dv_alloc_var(context, flags);
+#else
+	(void)context;
+	(void)flags;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static void
+mlx5_glue_dv_free_var(struct mlx5dv_var *var)
+{
+#ifdef HAVE_IBV_VAR
+	mlx5dv_free_var(var);
+#else
+	(void)var;
+	errno = ENOTSUP;
+#endif
+}
+
 alignas(RTE_CACHE_LINE_SIZE)
 const struct mlx5_glue *mlx5_glue = &(const struct mlx5_glue){
 	.version = MLX5_GLUE_VERSION,
@@ -1267,4 +1291,6 @@
 	.devx_get_event = mlx5_glue_devx_get_event,
 	.devx_alloc_uar = mlx5_glue_devx_alloc_uar,
 	.devx_free_uar = mlx5_glue_devx_free_uar,
+	.dv_alloc_var = mlx5_glue_dv_alloc_var,
+	.dv_free_var = mlx5_glue_dv_free_var,
 };
diff --git a/drivers/common/mlx5/mlx5_glue.h b/drivers/common/mlx5/mlx5_glue.h
index 7d9256e..6238b43 100644
--- a/drivers/common/mlx5/mlx5_glue.h
+++ b/drivers/common/mlx5/mlx5_glue.h
@@ -93,6 +93,11 @@
 #define MLX5DV_DEVX_CREATE_EVENT_CHANNEL_FLAGS_OMIT_EV_DATA 1
 #endif
 
+#ifndef HAVE_IBV_VAR
+struct mlx5dv_var { uint32_t page_id; uint32_t length; off_t mmap_off;
+			uint64_t comp_mask; };
+#endif
+
 /* LIB_GLUE_VERSION must be updated every time this structure is modified. */
 struct mlx5_glue {
 	const char *version;
@@ -231,6 +236,9 @@ struct mlx5_glue {
 	int (*dv_destroy_flow)(void *flow);
 	int (*dv_destroy_flow_matcher)(void *matcher);
 	struct ibv_context *(*dv_open_device)(struct ibv_device *device);
+	struct mlx5dv_var *(*dv_alloc_var)(struct ibv_context *context,
+					   uint32_t flags);
+	void (*dv_free_var)(struct mlx5dv_var *var);
 	struct mlx5dv_devx_uar *(*devx_alloc_uar)(struct ibv_context *context,
 						  uint32_t flags);
 	void (*devx_free_uar)(struct mlx5dv_devx_uar *devx_uar);
-- 
1.8.3.1
* [dpdk-dev] [PATCH v2 15/25] common/mlx5: add DevX virtq commands
  2020-01-28 10:05 ` [dpdk-dev] [PATCH v2 00/25] Introduce mlx5 common library Matan Azrad
                     ` (13 preceding siblings ...)
  2020-01-28 10:05   ` [dpdk-dev] [PATCH v2 14/25] common/mlx5: glue VAR allocation Matan Azrad
@ 2020-01-28 10:05   ` Matan Azrad
  2020-01-28 10:05   ` [dpdk-dev] [PATCH v2 16/25] common/mlx5: add support for DevX QP operations Matan Azrad
                     ` (10 subsequent siblings)
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-28 10:05 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
Virtio emulation offload allows SW to offload the I/O operations of a
virtio virtqueue to the device, which improves performance for its
users.
SW supplies all the relevant virtqueue information (type, size, memory
location, doorbell information, etc.). The device can then offload the
I/O operations of this queue according to its device type
characteristics.
Some of the virtio features can be supported according to the device
capability, for example, TSO and checksum.
Add virtio queue create, modify and query DevX commands.
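The virtq commands fill several sub-word fields through 16-bit setters, and the PRM header derives the shift for such a field as 16 - width - (bit_off & 0xf). A minimal sketch of that mask-and-shift packing on a big-endian 16-bit word follows. demo_set16() is a hypothetical helper written for illustration; it is not the MLX5_SET16 macro itself.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>

/*
 * Write val into a field of width bits that starts bit_off bits from the
 * most significant bit of a big-endian 16-bit word. Hypothetical helper.
 */
static void
demo_set16(uint16_t *word_be, unsigned int bit_off, unsigned int width,
	   uint16_t val)
{
	unsigned int shift;
	uint16_t mask;
	uint16_t host;

	/* The field must lie entirely inside one 16-bit word. */
	assert(width >= 1 && width <= 16 && (bit_off & 0xf) + width <= 16);
	shift = 16 - width - (bit_off & 0xf);
	mask = (uint16_t)((1u << width) - 1);
	host = ntohs(*word_be);
	/* Clear the old field bits, then merge the new value. */
	host = (uint16_t)((host & ~(mask << shift)) | ((val & mask) << shift));
	*word_be = htons(host);
}
```

Converting to host order before masking keeps the arithmetic independent of the CPU endianness, while the stored word stays big-endian.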
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/common/mlx5/mlx5_devx_cmds.c            | 199 +++++++++++++++++++++---
 drivers/common/mlx5/mlx5_devx_cmds.h            |  48 +++++-
 drivers/common/mlx5/mlx5_prm.h                  | 117 ++++++++++++++
 drivers/common/mlx5/rte_common_mlx5_version.map |   3 +
 4 files changed, 343 insertions(+), 24 deletions(-)
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index cdc041b..2425513 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -377,24 +377,18 @@ struct mlx5_devx_obj *
 		vdpa_attr->max_num_virtio_queues =
 			MLX5_GET(virtio_emulation_cap, hcattr,
 				 max_num_virtio_queues);
-		vdpa_attr->umem_1_buffer_param_a =
-			MLX5_GET(virtio_emulation_cap, hcattr,
-				 umem_1_buffer_param_a);
-		vdpa_attr->umem_1_buffer_param_b =
-			MLX5_GET(virtio_emulation_cap, hcattr,
-				 umem_1_buffer_param_b);
-		vdpa_attr->umem_2_buffer_param_a =
-			MLX5_GET(virtio_emulation_cap, hcattr,
-				 umem_2_buffer_param_a);
-		vdpa_attr->umem_2_buffer_param_b =
-			MLX5_GET(virtio_emulation_cap, hcattr,
-				 umem_2_buffer_param_a);
-		vdpa_attr->umem_3_buffer_param_a =
-			MLX5_GET(virtio_emulation_cap, hcattr,
-				 umem_3_buffer_param_a);
-		vdpa_attr->umem_3_buffer_param_b =
-			MLX5_GET(virtio_emulation_cap, hcattr,
-				 umem_3_buffer_param_b);
+		vdpa_attr->umems[0].a = MLX5_GET(virtio_emulation_cap, hcattr,
+						 umem_1_buffer_param_a);
+		vdpa_attr->umems[0].b = MLX5_GET(virtio_emulation_cap, hcattr,
+						 umem_1_buffer_param_b);
+		vdpa_attr->umems[1].a = MLX5_GET(virtio_emulation_cap, hcattr,
+						 umem_2_buffer_param_a);
+		vdpa_attr->umems[1].b = MLX5_GET(virtio_emulation_cap, hcattr,
+						 umem_2_buffer_param_b);
+		vdpa_attr->umems[2].a = MLX5_GET(virtio_emulation_cap, hcattr,
+						 umem_3_buffer_param_a);
+		vdpa_attr->umems[2].b = MLX5_GET(virtio_emulation_cap, hcattr,
+						 umem_3_buffer_param_b);
 	}
 }
 
@@ -1150,3 +1144,172 @@ struct mlx5_devx_obj *
 	cq_obj->id = MLX5_GET(create_cq_out, out, cqn);
 	return cq_obj;
 }
+
+/**
+ * Create VIRTQ using DevX API.
+ *
+ * @param[in] ctx
+ *   ibv_context returned from mlx5dv_open_device.
+ * @param [in] attr
+ *   Pointer to VIRTQ attributes structure.
+ *
+ * @return
+ *   The DevX object created, NULL otherwise and rte_errno is set.
+ */
+struct mlx5_devx_obj *
+mlx5_devx_cmd_create_virtq(struct ibv_context *ctx,
+			   struct mlx5_devx_virtq_attr *attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(create_virtq_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(general_obj_out_cmd_hdr)] = {0};
+	struct mlx5_devx_obj *virtq_obj = rte_zmalloc(__func__,
+						     sizeof(*virtq_obj), 0);
+	void *virtq = MLX5_ADDR_OF(create_virtq_in, in, virtq);
+	void *hdr = MLX5_ADDR_OF(create_virtq_in, in, hdr);
+	void *virtctx = MLX5_ADDR_OF(virtio_net_q, virtq, virtio_q_context);
+
+	if (!virtq_obj) {
+		DRV_LOG(ERR, "Failed to allocate virtq data.");
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	MLX5_SET(general_obj_in_cmd_hdr, hdr, opcode,
+		 MLX5_CMD_OP_CREATE_GENERAL_OBJECT);
+	MLX5_SET(general_obj_in_cmd_hdr, hdr, obj_type,
+		 MLX5_GENERAL_OBJ_TYPE_VIRTQ);
+	MLX5_SET16(virtio_net_q, virtq, hw_available_index,
+		   attr->hw_available_index);
+	MLX5_SET16(virtio_net_q, virtq, hw_used_index, attr->hw_used_index);
+	MLX5_SET16(virtio_net_q, virtq, tso_ipv4, attr->tso_ipv4);
+	MLX5_SET16(virtio_net_q, virtq, tso_ipv6, attr->tso_ipv6);
+	MLX5_SET16(virtio_net_q, virtq, tx_csum, attr->tx_csum);
+	MLX5_SET16(virtio_net_q, virtq, rx_csum, attr->rx_csum);
+	MLX5_SET16(virtio_q, virtctx, virtio_version_1_0,
+		   attr->virtio_version_1_0);
+	MLX5_SET16(virtio_q, virtctx, event_mode, attr->event_mode);
+	MLX5_SET(virtio_q, virtctx, event_qpn_or_msix, attr->qp_id);
+	MLX5_SET64(virtio_q, virtctx, desc_addr, attr->desc_addr);
+	MLX5_SET64(virtio_q, virtctx, used_addr, attr->used_addr);
+	MLX5_SET64(virtio_q, virtctx, available_addr, attr->available_addr);
+	MLX5_SET16(virtio_q, virtctx, queue_index, attr->queue_index);
+	MLX5_SET16(virtio_q, virtctx, queue_size, attr->q_size);
+	MLX5_SET(virtio_q, virtctx, virtio_q_mkey, attr->mkey);
+	MLX5_SET(virtio_q, virtctx, umem_1_id, attr->umems[0].id);
+	MLX5_SET(virtio_q, virtctx, umem_1_size, attr->umems[0].size);
+	MLX5_SET64(virtio_q, virtctx, umem_1_offset, attr->umems[0].offset);
+	MLX5_SET(virtio_q, virtctx, umem_2_id, attr->umems[1].id);
+	MLX5_SET(virtio_q, virtctx, umem_2_size, attr->umems[1].size);
+	MLX5_SET64(virtio_q, virtctx, umem_2_offset, attr->umems[1].offset);
+	MLX5_SET(virtio_q, virtctx, umem_3_id, attr->umems[2].id);
+	MLX5_SET(virtio_q, virtctx, umem_3_size, attr->umems[2].size);
+	MLX5_SET64(virtio_q, virtctx, umem_3_offset, attr->umems[2].offset);
+	MLX5_SET(virtio_net_q, virtq, tisn_or_qpn, attr->tis_id);
+	virtq_obj->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in), out,
+						    sizeof(out));
+	if (!virtq_obj->obj) {
+		rte_errno = errno;
+		DRV_LOG(ERR, "Failed to create VIRTQ Obj using DevX.");
+		rte_free(virtq_obj);
+		return NULL;
+	}
+	virtq_obj->id = MLX5_GET(general_obj_out_cmd_hdr, out, obj_id);
+	return virtq_obj;
+}
+
+/**
+ * Modify VIRTQ using DevX API.
+ *
+ * @param[in] virtq_obj
+ *   Pointer to virtq object structure.
+ * @param [in] attr
+ *   Pointer to modify virtq attributes structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_devx_cmd_modify_virtq(struct mlx5_devx_obj *virtq_obj,
+			   struct mlx5_devx_virtq_attr *attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(create_virtq_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(general_obj_out_cmd_hdr)] = {0};
+	void *virtq = MLX5_ADDR_OF(create_virtq_in, in, virtq);
+	void *hdr = MLX5_ADDR_OF(create_virtq_in, in, hdr);
+	void *virtctx = MLX5_ADDR_OF(virtio_net_q, virtq, virtio_q_context);
+	int ret;
+
+	MLX5_SET(general_obj_in_cmd_hdr, hdr, opcode,
+		 MLX5_CMD_OP_MODIFY_GENERAL_OBJECT);
+	MLX5_SET(general_obj_in_cmd_hdr, hdr, obj_type,
+		 MLX5_GENERAL_OBJ_TYPE_VIRTQ);
+	MLX5_SET(general_obj_in_cmd_hdr, hdr, obj_id, virtq_obj->id);
+	MLX5_SET64(virtio_net_q, virtq, modify_field_select, attr->type);
+	MLX5_SET16(virtio_q, virtctx, queue_index, attr->queue_index);
+	switch (attr->type) {
+	case MLX5_VIRTQ_MODIFY_TYPE_STATE:
+		MLX5_SET16(virtio_net_q, virtq, state, attr->state);
+		break;
+	case MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS:
+		MLX5_SET(virtio_net_q, virtq, dirty_bitmap_mkey,
+			 attr->dirty_bitmap_mkey);
+		MLX5_SET64(virtio_net_q, virtq, dirty_bitmap_addr,
+			 attr->dirty_bitmap_addr);
+		MLX5_SET(virtio_net_q, virtq, dirty_bitmap_size,
+			 attr->dirty_bitmap_size);
+		break;
+	case MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE:
+		MLX5_SET(virtio_net_q, virtq, dirty_bitmap_dump_enable,
+			 attr->dirty_bitmap_dump_enable);
+		break;
+	default:
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	ret = mlx5_glue->devx_obj_modify(virtq_obj->obj, in, sizeof(in),
+					 out, sizeof(out));
+	if (ret) {
+		DRV_LOG(ERR, "Failed to modify VIRTQ using DevX.");
+		rte_errno = errno;
+		return -errno;
+	}
+	return ret;
+}
+
+/**
+ * Query VIRTQ using DevX API.
+ *
+ * @param[in] virtq_obj
+ *   Pointer to virtq object structure.
+ * @param [in/out] attr
+ *   Pointer to virtq attributes structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_devx_cmd_query_virtq(struct mlx5_devx_obj *virtq_obj,
+			   struct mlx5_devx_virtq_attr *attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(general_obj_in_cmd_hdr)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(query_virtq_out)] = {0};
+	void *hdr = MLX5_ADDR_OF(query_virtq_out, in, hdr);
+	void *virtq = MLX5_ADDR_OF(query_virtq_out, out, virtq);
+	int ret;
+
+	MLX5_SET(general_obj_in_cmd_hdr, hdr, opcode,
+		 MLX5_CMD_OP_QUERY_GENERAL_OBJECT);
+	MLX5_SET(general_obj_in_cmd_hdr, hdr, obj_type,
+		 MLX5_GENERAL_OBJ_TYPE_VIRTQ);
+	MLX5_SET(general_obj_in_cmd_hdr, hdr, obj_id, virtq_obj->id);
+	ret = mlx5_glue->devx_obj_query(virtq_obj->obj, in, sizeof(in),
+					 out, sizeof(out));
+	if (ret) {
+		DRV_LOG(ERR, "Failed to query VIRTQ using DevX.");
+		rte_errno = errno;
+		return -errno;
+	}
+	attr->hw_available_index = MLX5_GET16(virtio_net_q, virtq,
+					      hw_available_index);
+	attr->hw_used_index = MLX5_GET16(virtio_net_q, virtq, hw_used_index);
+	return ret;
+}
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index 581658b..1631c08 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -64,12 +64,10 @@ struct mlx5_hca_vdpa_attr {
 	uint32_t log_doorbell_stride:5;
 	uint32_t log_doorbell_bar_size:5;
 	uint32_t max_num_virtio_queues;
-	uint32_t umem_1_buffer_param_a;
-	uint32_t umem_1_buffer_param_b;
-	uint32_t umem_2_buffer_param_a;
-	uint32_t umem_2_buffer_param_b;
-	uint32_t umem_3_buffer_param_a;
-	uint32_t umem_3_buffer_param_b;
+	struct {
+		uint32_t a;
+		uint32_t b;
+	} umems[3];
 	uint64_t doorbell_bar_offset;
 };
 
@@ -250,6 +248,37 @@ struct mlx5_devx_cq_attr {
 	uint64_t db_addr;
 };
 
+/* Virtq attributes structure, used by VIRTQ operations. */
+struct mlx5_devx_virtq_attr {
+	uint16_t hw_available_index;
+	uint16_t hw_used_index;
+	uint16_t q_size;
+	uint32_t virtio_version_1_0:1;
+	uint32_t tso_ipv4:1;
+	uint32_t tso_ipv6:1;
+	uint32_t tx_csum:1;
+	uint32_t rx_csum:1;
+	uint32_t event_mode:3;
+	uint32_t state:4;
+	uint32_t dirty_bitmap_dump_enable:1;
+	uint32_t dirty_bitmap_mkey;
+	uint32_t dirty_bitmap_size;
+	uint32_t mkey;
+	uint32_t qp_id;
+	uint32_t queue_index;
+	uint32_t tis_id;
+	uint64_t dirty_bitmap_addr;
+	uint64_t type;
+	uint64_t desc_addr;
+	uint64_t used_addr;
+	uint64_t available_addr;
+	struct {
+		uint32_t id;
+		uint32_t size;
+		uint64_t offset;
+	} umems[3];
+};
+
 /* mlx5_devx_cmds.c */
 
 struct mlx5_devx_obj *mlx5_devx_cmd_flow_counter_alloc(struct ibv_context *ctx,
@@ -288,4 +317,11 @@ int mlx5_devx_cmd_flow_dump(void *fdb_domain, void *rx_domain, void *tx_domain,
 			    FILE *file);
 struct mlx5_devx_obj *mlx5_devx_cmd_create_cq(struct ibv_context *ctx,
 					      struct mlx5_devx_cq_attr *attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_virtq(struct ibv_context *ctx,
+					     struct mlx5_devx_virtq_attr *attr);
+int mlx5_devx_cmd_modify_virtq(struct mlx5_devx_obj *virtq_obj,
+			       struct mlx5_devx_virtq_attr *attr);
+int mlx5_devx_cmd_query_virtq(struct mlx5_devx_obj *virtq_obj,
+			      struct mlx5_devx_virtq_attr *attr);
+
 #endif /* RTE_PMD_MLX5_DEVX_CMDS_H_ */
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index a4082b9..4b8a34c 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -527,6 +527,8 @@ struct mlx5_modification_cmd {
 #define __mlx5_16_bit_off(typ, fld) (16 - __mlx5_bit_sz(typ, fld) - \
 				    (__mlx5_bit_off(typ, fld) & 0xf))
 #define __mlx5_mask16(typ, fld) ((u16)((1ull << __mlx5_bit_sz(typ, fld)) - 1))
+#define __mlx5_16_mask(typ, fld) (__mlx5_mask16(typ, fld) << \
+				  __mlx5_16_bit_off(typ, fld))
 #define MLX5_ST_SZ_BYTES(typ) (sizeof(struct mlx5_ifc_##typ##_bits) / 8)
 #define MLX5_ST_SZ_DW(typ) (sizeof(struct mlx5_ifc_##typ##_bits) / 32)
 #define MLX5_BYTE_OFF(typ, fld) (__mlx5_bit_off(typ, fld) / 8)
@@ -551,6 +553,17 @@ struct mlx5_modification_cmd {
 			rte_cpu_to_be_64(v); \
 	} while (0)
 
+#define MLX5_SET16(typ, p, fld, v) \
+	do { \
+		u16 _v = v; \
+		*((__be16 *)(p) + __mlx5_16_off(typ, fld)) = \
+		rte_cpu_to_be_16((rte_be_to_cpu_16(*((__be16 *)(p) + \
+				  __mlx5_16_off(typ, fld))) & \
+				  (~__mlx5_16_mask(typ, fld))) | \
+				 (((_v) & __mlx5_mask16(typ, fld)) << \
+				  __mlx5_16_bit_off(typ, fld))); \
+	} while (0)
+
 #define MLX5_GET(typ, p, fld) \
 	((rte_be_to_cpu_32(*((__be32 *)(p) +\
 	__mlx5_dw_off(typ, fld))) >> __mlx5_dw_bit_off(typ, fld)) & \
@@ -723,6 +736,9 @@ enum {
 	MLX5_CMD_OP_CREATE_RQT = 0x916,
 	MLX5_CMD_OP_ALLOC_FLOW_COUNTER = 0x939,
 	MLX5_CMD_OP_QUERY_FLOW_COUNTER = 0x93b,
+	MLX5_CMD_OP_CREATE_GENERAL_OBJECT = 0xa00,
+	MLX5_CMD_OP_MODIFY_GENERAL_OBJECT = 0xa01,
+	MLX5_CMD_OP_QUERY_GENERAL_OBJECT = 0xa02,
 };
 
 enum {
@@ -1691,6 +1707,11 @@ struct mlx5_ifc_create_tir_in_bits {
 	struct mlx5_ifc_tirc_bits ctx;
 };
 
+enum {
+	MLX5_INLINE_Q_TYPE_RQ = 0x0,
+	MLX5_INLINE_Q_TYPE_VIRTQ = 0x1,
+};
+
 struct mlx5_ifc_rq_num_bits {
 	u8 reserved_at_0[0x8];
 	u8 rq_num[0x18];
@@ -1917,6 +1938,102 @@ struct mlx5_ifc_create_cq_in_bits {
 	u8 pas[];
 };
 
+enum {
+	MLX5_GENERAL_OBJ_TYPE_VIRTQ = 0x000d,
+};
+
+struct mlx5_ifc_general_obj_in_cmd_hdr_bits {
+	u8 opcode[0x10];
+	u8 reserved_at_10[0x20];
+	u8 obj_type[0x10];
+	u8 obj_id[0x20];
+	u8 reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_general_obj_out_cmd_hdr_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 obj_id[0x20];
+	u8 reserved_at_60[0x20];
+};
+
+enum {
+	MLX5_VIRTQ_STATE_INIT = 0,
+	MLX5_VIRTQ_STATE_RDY = 1,
+	MLX5_VIRTQ_STATE_SUSPEND = 2,
+	MLX5_VIRTQ_STATE_ERROR = 3,
+};
+
+enum {
+	MLX5_VIRTQ_MODIFY_TYPE_STATE = (1UL << 0),
+	MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS = (1UL << 3),
+	MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE = (1UL << 4),
+};
+
+struct mlx5_ifc_virtio_q_bits {
+	u8 virtio_q_type[0x8];
+	u8 reserved_at_8[0x5];
+	u8 event_mode[0x3];
+	u8 queue_index[0x10];
+	u8 full_emulation[0x1];
+	u8 virtio_version_1_0[0x1];
+	u8 reserved_at_22[0x2];
+	u8 offload_type[0x4];
+	u8 event_qpn_or_msix[0x18];
+	u8 doorbell_stride_idx[0x10];
+	u8 queue_size[0x10];
+	u8 device_emulation_id[0x20];
+	u8 desc_addr[0x40];
+	u8 used_addr[0x40];
+	u8 available_addr[0x40];
+	u8 virtio_q_mkey[0x20];
+	u8 reserved_at_160[0x20];
+	u8 umem_1_id[0x20];
+	u8 umem_1_size[0x20];
+	u8 umem_1_offset[0x40];
+	u8 umem_2_id[0x20];
+	u8 umem_2_size[0x20];
+	u8 umem_2_offset[0x40];
+	u8 umem_3_id[0x20];
+	u8 umem_3_size[0x20];
+	u8 umem_3_offset[0x40];
+	u8 reserved_at_300[0x100];
+};
+
+struct mlx5_ifc_virtio_net_q_bits {
+	u8 modify_field_select[0x40];
+	u8 reserved_at_40[0x40];
+	u8 tso_ipv4[0x1];
+	u8 tso_ipv6[0x1];
+	u8 tx_csum[0x1];
+	u8 rx_csum[0x1];
+	u8 reserved_at_84[0x6];
+	u8 dirty_bitmap_dump_enable[0x1];
+	u8 vhost_log_page[0x5];
+	u8 reserved_at_90[0xc];
+	u8 state[0x4];
+	u8 error_type[0x8];
+	u8 tisn_or_qpn[0x18];
+	u8 dirty_bitmap_mkey[0x20];
+	u8 dirty_bitmap_size[0x20];
+	u8 dirty_bitmap_addr[0x40];
+	u8 hw_available_index[0x10];
+	u8 hw_used_index[0x10];
+	u8 reserved_at_160[0xa0];
+	struct mlx5_ifc_virtio_q_bits virtio_q_context;
+};
+
+struct mlx5_ifc_create_virtq_in_bits {
+	struct mlx5_ifc_general_obj_in_cmd_hdr_bits hdr;
+	struct mlx5_ifc_virtio_net_q_bits virtq;
+};
+
+struct mlx5_ifc_query_virtq_out_bits {
+	struct mlx5_ifc_general_obj_in_cmd_hdr_bits hdr;
+	struct mlx5_ifc_virtio_net_q_bits virtq;
+};
+
 /* CQE format mask. */
 #define MLX5E_CQE_FORMAT_MASK 0xc
 
diff --git a/drivers/common/mlx5/rte_common_mlx5_version.map b/drivers/common/mlx5/rte_common_mlx5_version.map
index c6a203d..f3082ce 100644
--- a/drivers/common/mlx5/rte_common_mlx5_version.map
+++ b/drivers/common/mlx5/rte_common_mlx5_version.map
@@ -8,6 +8,7 @@ DPDK_20.02 {
 	mlx5_devx_cmd_create_tir;
 	mlx5_devx_cmd_create_td;
 	mlx5_devx_cmd_create_tis;
+	mlx5_devx_cmd_create_virtq;
 	mlx5_devx_cmd_destroy;
 	mlx5_devx_cmd_flow_counter_alloc;
 	mlx5_devx_cmd_flow_counter_query;
@@ -15,8 +16,10 @@ DPDK_20.02 {
 	mlx5_devx_cmd_mkey_create;
 	mlx5_devx_cmd_modify_rq;
 	mlx5_devx_cmd_modify_sq;
+	mlx5_devx_cmd_modify_virtq;
 	mlx5_devx_cmd_qp_query_tis_td;
 	mlx5_devx_cmd_query_hca_attr;
+	mlx5_devx_cmd_query_virtq;
 	mlx5_devx_get_out_command_status;
 
 	mlx5_dev_to_pci_addr;
-- 
1.8.3.1
^ permalink raw reply	[flat|nested] 174+ messages in thread
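Note on the MLX5_SET16 macro added in the patch above: it performs a
read-modify-write of one field inside a big-endian 16-bit word. The
following is a minimal standalone sketch of that arithmetic, not the
driver code. set_field16() and its explicit offset/width parameters are
hypothetical stand-ins for the values the __mlx5_* helper macros derive
from a mlx5_ifc_*_bits layout, and byte-wise loads and stores replace
rte_be_to_cpu_16()/rte_cpu_to_be_16() so that it builds without DPDK.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: write the low `width` bits
 * of `v` into a field that starts `bit_off` bits from the start of `buf`
 * (bit 0 is the most significant bit of byte 0). The field must not cross
 * a 16-bit boundary and width must be in the range 1..16.
 */
static void
set_field16(uint8_t *buf, unsigned int bit_off, unsigned int width, uint16_t v)
{
	size_t word = bit_off / 16;	/* 16-bit word index. */
	/* Same formula as __mlx5_16_bit_off(). */
	unsigned int shift = 16 - width - (bit_off & 0xf);
	/* Same formula as __mlx5_mask16(). */
	uint16_t mask = (uint16_t)((1u << width) - 1);
	/* Load the big-endian word byte by byte. */
	uint16_t cur = (uint16_t)((buf[2 * word] << 8) | buf[2 * word + 1]);

	/* Clear the old field bits, then merge in the new value. */
	cur = (uint16_t)((cur & ~(mask << shift)) | ((v & mask) << shift));
	buf[2 * word] = (uint8_t)(cur >> 8);
	buf[2 * word + 1] = (uint8_t)(cur & 0xff);
}
```

In the virtio_net_q layout, hw_available_index and hw_used_index are
adjacent 16-bit fields, which the query command reads with MLX5_GET16.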
* [dpdk-dev] [PATCH v2 16/25] common/mlx5: add support for DevX QP operations
  2020-01-28 10:05 ` [dpdk-dev] [PATCH v2 00/25] Introduce mlx5 common library Matan Azrad
                     ` (14 preceding siblings ...)
  2020-01-28 10:05   ` [dpdk-dev] [PATCH v2 15/25] common/mlx5: add DevX virtq commands Matan Azrad
@ 2020-01-28 10:05   ` Matan Azrad
  2020-01-28 10:05   ` [dpdk-dev] [PATCH v2 17/25] common/mlx5: allow type configuration for DevX RQT Matan Azrad
                     ` (9 subsequent siblings)
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-28 10:05 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
QP creation is needed for vDPA virtq support.
Add two DevX commands: one to create a QP and one to modify the QP state.
Only RC QPs in force-loopback address mode are supported.
This way, packets can be sent to other internal destinations in the
NIC, for example other QPs or virtqs.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
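Note: a sketch of how a caller might drive the new modify command to
connect two QPs in loopback, assuming each QP names the other as its
remote QP. The opcode values are the ones this patch adds to mlx5_prm.h.
qp_modify_fn, qp_pair_to_rts() and record_modify() are hypothetical; the
real mlx5_devx_cmd_modify_qp_state() takes a struct mlx5_devx_obj
pointer rather than a bare QP ID, and only the INIT2RTR step uses the
remote QP ID.

```c
#include <stddef.h>
#include <stdint.h>

/* Opcode values as defined in mlx5_prm.h by this patch. */
enum {
	MLX5_CMD_OP_RST2INIT_QP = 0x502,
	MLX5_CMD_OP_INIT2RTR_QP = 0x503,
	MLX5_CMD_OP_RTR2RTS_QP = 0x504,
};

/* Hypothetical callback standing in for mlx5_devx_cmd_modify_qp_state(). */
typedef int (*qp_modify_fn)(uint32_t qp_id, uint32_t op,
			    uint32_t remote_qp_id);

/*
 * Move two QPs through RST->INIT->RTR->RTS, step by step on both sides.
 * Returns 0 on success or the first non-zero value returned by modify().
 */
static int
qp_pair_to_rts(qp_modify_fn modify, uint32_t qp_a, uint32_t qp_b)
{
	static const uint32_t ops[] = {
		MLX5_CMD_OP_RST2INIT_QP,
		MLX5_CMD_OP_INIT2RTR_QP,
		MLX5_CMD_OP_RTR2RTS_QP,
	};
	size_t i;
	int ret;

	for (i = 0; i < sizeof(ops) / sizeof(ops[0]); i++) {
		ret = modify(qp_a, ops[i], qp_b);
		if (ret)
			return ret;
		ret = modify(qp_b, ops[i], qp_a);
		if (ret)
			return ret;
	}
	return 0;
}

/* Recording stub for illustration only: logs {qp, op, remote} triples. */
static uint32_t qp_log[6][3];
static size_t qp_log_len;

static int
record_modify(uint32_t qp_id, uint32_t op, uint32_t remote_qp_id)
{
	if (qp_log_len >= 6)
		return -1;
	qp_log[qp_log_len][0] = qp_id;
	qp_log[qp_log_len][1] = op;
	qp_log[qp_log_len][2] = remote_qp_id;
	qp_log_len++;
	return 0;
}
```

Calling qp_pair_to_rts(record_modify, a, b) issues six modify calls, two
per transition, and stops at the first failure.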
 drivers/common/mlx5/mlx5_devx_cmds.c            | 167 ++++++++++-
 drivers/common/mlx5/mlx5_devx_cmds.h            |  20 ++
 drivers/common/mlx5/mlx5_prm.h                  | 376 ++++++++++++++++++++++++
 drivers/common/mlx5/rte_common_mlx5_version.map |   2 +
 4 files changed, 564 insertions(+), 1 deletion(-)
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index 2425513..e7288c8 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -1124,7 +1124,8 @@ struct mlx5_devx_obj *
 	MLX5_SET(cqc, cqctx, cc, attr->use_first_only);
 	MLX5_SET(cqc, cqctx, oi, attr->overrun_ignore);
 	MLX5_SET(cqc, cqctx, log_cq_size, attr->log_cq_size);
-	MLX5_SET(cqc, cqctx, log_page_size, attr->log_page_size);
+	MLX5_SET(cqc, cqctx, log_page_size, attr->log_page_size -
+		 MLX5_ADAPTER_PAGE_SHIFT);
 	MLX5_SET(cqc, cqctx, c_eqn, attr->eqn);
 	MLX5_SET(cqc, cqctx, uar_page, attr->uar_page_id);
 	if (attr->q_umem_valid) {
@@ -1313,3 +1314,167 @@ struct mlx5_devx_obj *
 	attr->hw_used_index = MLX5_GET16(virtio_net_q, virtq, hw_used_index);
 	return ret;
 }
+
+/**
+ * Create QP using DevX API.
+ *
+ * @param[in] ctx
+ *   ibv_context returned from mlx5dv_open_device.
+ * @param [in] attr
+ *   Pointer to QP attributes structure.
+ *
+ * @return
+ *   The DevX object created, NULL otherwise and rte_errno is set.
+ */
+struct mlx5_devx_obj *
+mlx5_devx_cmd_create_qp(struct ibv_context *ctx,
+			struct mlx5_devx_qp_attr *attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(create_qp_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(create_qp_out)] = {0};
+	struct mlx5_devx_obj *qp_obj = rte_zmalloc(__func__, sizeof(*qp_obj),
+						   0);
+	void *qpc = MLX5_ADDR_OF(create_qp_in, in, qpc);
+
+	if (!qp_obj) {
+		DRV_LOG(ERR, "Failed to allocate QP data.");
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	MLX5_SET(create_qp_in, in, opcode, MLX5_CMD_OP_CREATE_QP);
+	MLX5_SET(qpc, qpc, st, MLX5_QP_ST_RC);
+	MLX5_SET(qpc, qpc, pd, attr->pd);
+	if (attr->uar_index) {
+		MLX5_SET(qpc, qpc, pm_state, MLX5_QP_PM_MIGRATED);
+		MLX5_SET(qpc, qpc, uar_page, attr->uar_index);
+		MLX5_SET(qpc, qpc, log_page_size, attr->log_page_size -
+			 MLX5_ADAPTER_PAGE_SHIFT);
+		if (attr->sq_size) {
+			RTE_ASSERT(RTE_IS_POWER_OF_2(attr->sq_size));
+			MLX5_SET(qpc, qpc, cqn_snd, attr->cqn);
+			MLX5_SET(qpc, qpc, log_sq_size,
+				 rte_log2_u32(attr->sq_size));
+		} else {
+			MLX5_SET(qpc, qpc, no_sq, 1);
+		}
+		if (attr->rq_size) {
+			RTE_ASSERT(RTE_IS_POWER_OF_2(attr->rq_size));
+			MLX5_SET(qpc, qpc, cqn_rcv, attr->cqn);
+			MLX5_SET(qpc, qpc, log_rq_stride, attr->log_rq_stride -
+				 MLX5_LOG_RQ_STRIDE_SHIFT);
+			MLX5_SET(qpc, qpc, log_rq_size,
+				 rte_log2_u32(attr->rq_size));
+			MLX5_SET(qpc, qpc, rq_type, MLX5_NON_ZERO_RQ);
+		} else {
+			MLX5_SET(qpc, qpc, rq_type, MLX5_ZERO_LEN_RQ);
+		}
+		if (attr->dbr_umem_valid) {
+			MLX5_SET(qpc, qpc, dbr_umem_valid,
+				 attr->dbr_umem_valid);
+			MLX5_SET(qpc, qpc, dbr_umem_id, attr->dbr_umem_id);
+		}
+		MLX5_SET64(qpc, qpc, dbr_addr, attr->dbr_address);
+		MLX5_SET64(create_qp_in, in, wq_umem_offset,
+			   attr->wq_umem_offset);
+		MLX5_SET(create_qp_in, in, wq_umem_id, attr->wq_umem_id);
+		MLX5_SET(create_qp_in, in, wq_umem_valid, 1);
+	} else {
+		/* Special QP to be managed by FW - no SQ/RQ/CQ/UAR/DB rec. */
+		MLX5_SET(qpc, qpc, rq_type, MLX5_ZERO_LEN_RQ);
+		MLX5_SET(qpc, qpc, no_sq, 1);
+	}
+	qp_obj->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in), out,
+						 sizeof(out));
+	if (!qp_obj->obj) {
+		rte_errno = errno;
+		DRV_LOG(ERR, "Failed to create QP Obj using DevX.");
+		rte_free(qp_obj);
+		return NULL;
+	}
+	qp_obj->id = MLX5_GET(create_qp_out, out, qpn);
+	return qp_obj;
+}
+
+/**
+ * Modify QP using DevX API.
+ * Currently supports only force loop-back QP.
+ *
+ * @param[in] qp
+ *   Pointer to QP object structure.
+ * @param [in] qp_st_mod_op
+ *   The QP state modification operation.
+ * @param [in] remote_qp_id
+ *   The remote QP ID for MLX5_CMD_OP_INIT2RTR_QP operation.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_devx_cmd_modify_qp_state(struct mlx5_devx_obj *qp, uint32_t qp_st_mod_op,
+			      uint32_t remote_qp_id)
+{
+	union {
+		uint32_t rst2init[MLX5_ST_SZ_DW(rst2init_qp_in)];
+		uint32_t init2rtr[MLX5_ST_SZ_DW(init2rtr_qp_in)];
+		uint32_t rtr2rts[MLX5_ST_SZ_DW(rtr2rts_qp_in)];
+	} in;
+	union {
+		uint32_t rst2init[MLX5_ST_SZ_DW(rst2init_qp_out)];
+		uint32_t init2rtr[MLX5_ST_SZ_DW(init2rtr_qp_out)];
+		uint32_t rtr2rts[MLX5_ST_SZ_DW(rtr2rts_qp_out)];
+	} out;
+	void *qpc;
+	int ret;
+	unsigned int inlen;
+	unsigned int outlen;
+
+	memset(&in, 0, sizeof(in));
+	memset(&out, 0, sizeof(out));
+	MLX5_SET(rst2init_qp_in, &in, opcode, qp_st_mod_op);
+	switch (qp_st_mod_op) {
+	case MLX5_CMD_OP_RST2INIT_QP:
+		MLX5_SET(rst2init_qp_in, &in, qpn, qp->id);
+		qpc = MLX5_ADDR_OF(rst2init_qp_in, &in, qpc);
+		MLX5_SET(qpc, qpc, primary_address_path.vhca_port_num, 1);
+		MLX5_SET(qpc, qpc, rre, 1);
+		MLX5_SET(qpc, qpc, rwe, 1);
+		MLX5_SET(qpc, qpc, pm_state, MLX5_QP_PM_MIGRATED);
+		inlen = sizeof(in.rst2init);
+		outlen = sizeof(out.rst2init);
+		break;
+	case MLX5_CMD_OP_INIT2RTR_QP:
+		MLX5_SET(init2rtr_qp_in, &in, qpn, qp->id);
+		qpc = MLX5_ADDR_OF(init2rtr_qp_in, &in, qpc);
+		MLX5_SET(qpc, qpc, primary_address_path.fl, 1);
+		MLX5_SET(qpc, qpc, primary_address_path.vhca_port_num, 1);
+		MLX5_SET(qpc, qpc, mtu, 1);
+		MLX5_SET(qpc, qpc, log_msg_max, 30);
+		MLX5_SET(qpc, qpc, remote_qpn, remote_qp_id);
+		MLX5_SET(qpc, qpc, min_rnr_nak, 0);
+		inlen = sizeof(in.init2rtr);
+		outlen = sizeof(out.init2rtr);
+		break;
+	case MLX5_CMD_OP_RTR2RTS_QP:
+		qpc = MLX5_ADDR_OF(rtr2rts_qp_in, &in, qpc);
+		MLX5_SET(rtr2rts_qp_in, &in, qpn, qp->id);
+		MLX5_SET(qpc, qpc, primary_address_path.ack_timeout, 14);
+		MLX5_SET(qpc, qpc, log_ack_req_freq, 0);
+		MLX5_SET(qpc, qpc, retry_count, 7);
+		MLX5_SET(qpc, qpc, rnr_retry, 7);
+		inlen = sizeof(in.rtr2rts);
+		outlen = sizeof(out.rtr2rts);
+		break;
+	default:
+		DRV_LOG(ERR, "Invalid or unsupported QP modify op %u.",
+			qp_st_mod_op);
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	ret = mlx5_glue->devx_obj_modify(qp->obj, &in, inlen, &out, outlen);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to modify QP using DevX.");
+		rte_errno = errno;
+		return -errno;
+	}
+	return ret;
+}
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index 1631c08..d1a21b8 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -279,6 +279,22 @@ struct mlx5_devx_virtq_attr {
 	} umems[3];
 };
 
+
+struct mlx5_devx_qp_attr {
+	uint32_t pd:24;
+	uint32_t uar_index:24;
+	uint32_t cqn:24;
+	uint32_t log_page_size:5;
+	uint32_t rq_size:17; /* Must be power of 2. */
+	uint32_t log_rq_stride:3;
+	uint32_t sq_size:17; /* Must be power of 2. */
+	uint32_t dbr_umem_valid:1;
+	uint32_t dbr_umem_id;
+	uint64_t dbr_address;
+	uint32_t wq_umem_id;
+	uint64_t wq_umem_offset;
+};
+
 /* mlx5_devx_cmds.c */
 
 struct mlx5_devx_obj *mlx5_devx_cmd_flow_counter_alloc(struct ibv_context *ctx,
@@ -323,5 +339,9 @@ int mlx5_devx_cmd_modify_virtq(struct mlx5_devx_obj *virtq_obj,
 			       struct mlx5_devx_virtq_attr *attr);
 int mlx5_devx_cmd_query_virtq(struct mlx5_devx_obj *virtq_obj,
 			      struct mlx5_devx_virtq_attr *attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_qp(struct ibv_context *ctx,
+					      struct mlx5_devx_qp_attr *attr);
+int mlx5_devx_cmd_modify_qp_state(struct mlx5_devx_obj *qp,
+				  uint32_t qp_st_mod_op, uint32_t remote_qp_id);
 
 #endif /* RTE_PMD_MLX5_DEVX_CMDS_H_ */
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index 4b8a34c..e53dd61 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -724,6 +724,19 @@ enum {
 	MLX5_CMD_OP_QUERY_HCA_CAP = 0x100,
 	MLX5_CMD_OP_CREATE_MKEY = 0x200,
 	MLX5_CMD_OP_CREATE_CQ = 0x400,
+	MLX5_CMD_OP_CREATE_QP = 0x500,
+	MLX5_CMD_OP_RST2INIT_QP = 0x502,
+	MLX5_CMD_OP_INIT2RTR_QP = 0x503,
+	MLX5_CMD_OP_RTR2RTS_QP = 0x504,
+	MLX5_CMD_OP_RTS2RTS_QP = 0x505,
+	MLX5_CMD_OP_SQERR2RTS_QP = 0x506,
+	MLX5_CMD_OP_QP_2ERR = 0x507,
+	MLX5_CMD_OP_QP_2RST = 0x50A,
+	MLX5_CMD_OP_QUERY_QP = 0x50B,
+	MLX5_CMD_OP_SQD2RTS_QP = 0x50C,
+	MLX5_CMD_OP_INIT2INIT_QP = 0x50E,
+	MLX5_CMD_OP_SUSPEND_QP = 0x50F,
+	MLX5_CMD_OP_RESUME_QP = 0x510,
 	MLX5_CMD_OP_QUERY_NIC_VPORT_CONTEXT = 0x754,
 	MLX5_CMD_OP_ALLOC_TRANSPORT_DOMAIN = 0x816,
 	MLX5_CMD_OP_CREATE_TIR = 0x900,
@@ -747,6 +760,9 @@ enum {
 	MLX5_MKC_ACCESS_MODE_KLM_FBS = 0x3,
 };
 
+#define MLX5_ADAPTER_PAGE_SHIFT 12
+#define MLX5_LOG_RQ_STRIDE_SHIFT 4
+
 /* Flow counters. */
 struct mlx5_ifc_alloc_flow_counter_out_bits {
 	u8         status[0x8];
@@ -2034,6 +2050,366 @@ struct mlx5_ifc_query_virtq_out_bits {
 	struct mlx5_ifc_virtio_net_q_bits virtq;
 };
 
+enum {
+	MLX5_QP_ST_RC = 0x0,
+};
+
+enum {
+	MLX5_QP_PM_MIGRATED = 0x3,
+};
+
+enum {
+	MLX5_NON_ZERO_RQ = 0x0,
+	MLX5_SRQ_RQ = 0x1,
+	MLX5_CRQ_RQ = 0x2,
+	MLX5_ZERO_LEN_RQ = 0x3,
+};
+
+struct mlx5_ifc_ads_bits {
+	u8 fl[0x1];
+	u8 free_ar[0x1];
+	u8 reserved_at_2[0xe];
+	u8 pkey_index[0x10];
+	u8 reserved_at_20[0x8];
+	u8 grh[0x1];
+	u8 mlid[0x7];
+	u8 rlid[0x10];
+	u8 ack_timeout[0x5];
+	u8 reserved_at_45[0x3];
+	u8 src_addr_index[0x8];
+	u8 reserved_at_50[0x4];
+	u8 stat_rate[0x4];
+	u8 hop_limit[0x8];
+	u8 reserved_at_60[0x4];
+	u8 tclass[0x8];
+	u8 flow_label[0x14];
+	u8 rgid_rip[16][0x8];
+	u8 reserved_at_100[0x4];
+	u8 f_dscp[0x1];
+	u8 f_ecn[0x1];
+	u8 reserved_at_106[0x1];
+	u8 f_eth_prio[0x1];
+	u8 ecn[0x2];
+	u8 dscp[0x6];
+	u8 udp_sport[0x10];
+	u8 dei_cfi[0x1];
+	u8 eth_prio[0x3];
+	u8 sl[0x4];
+	u8 vhca_port_num[0x8];
+	u8 rmac_47_32[0x10];
+	u8 rmac_31_0[0x20];
+};
+
+struct mlx5_ifc_qpc_bits {
+	u8 state[0x4];
+	u8 lag_tx_port_affinity[0x4];
+	u8 st[0x8];
+	u8 reserved_at_10[0x3];
+	u8 pm_state[0x2];
+	u8 reserved_at_15[0x1];
+	u8 req_e2e_credit_mode[0x2];
+	u8 offload_type[0x4];
+	u8 end_padding_mode[0x2];
+	u8 reserved_at_1e[0x2];
+	u8 wq_signature[0x1];
+	u8 block_lb_mc[0x1];
+	u8 atomic_like_write_en[0x1];
+	u8 latency_sensitive[0x1];
+	u8 reserved_at_24[0x1];
+	u8 drain_sigerr[0x1];
+	u8 reserved_at_26[0x2];
+	u8 pd[0x18];
+	u8 mtu[0x3];
+	u8 log_msg_max[0x5];
+	u8 reserved_at_48[0x1];
+	u8 log_rq_size[0x4];
+	u8 log_rq_stride[0x3];
+	u8 no_sq[0x1];
+	u8 log_sq_size[0x4];
+	u8 reserved_at_55[0x6];
+	u8 rlky[0x1];
+	u8 ulp_stateless_offload_mode[0x4];
+	u8 counter_set_id[0x8];
+	u8 uar_page[0x18];
+	u8 reserved_at_80[0x8];
+	u8 user_index[0x18];
+	u8 reserved_at_a0[0x3];
+	u8 log_page_size[0x5];
+	u8 remote_qpn[0x18];
+	struct mlx5_ifc_ads_bits primary_address_path;
+	struct mlx5_ifc_ads_bits secondary_address_path;
+	u8 log_ack_req_freq[0x4];
+	u8 reserved_at_384[0x4];
+	u8 log_sra_max[0x3];
+	u8 reserved_at_38b[0x2];
+	u8 retry_count[0x3];
+	u8 rnr_retry[0x3];
+	u8 reserved_at_393[0x1];
+	u8 fre[0x1];
+	u8 cur_rnr_retry[0x3];
+	u8 cur_retry_count[0x3];
+	u8 reserved_at_39b[0x5];
+	u8 reserved_at_3a0[0x20];
+	u8 reserved_at_3c0[0x8];
+	u8 next_send_psn[0x18];
+	u8 reserved_at_3e0[0x8];
+	u8 cqn_snd[0x18];
+	u8 reserved_at_400[0x8];
+	u8 deth_sqpn[0x18];
+	u8 reserved_at_420[0x20];
+	u8 reserved_at_440[0x8];
+	u8 last_acked_psn[0x18];
+	u8 reserved_at_460[0x8];
+	u8 ssn[0x18];
+	u8 reserved_at_480[0x8];
+	u8 log_rra_max[0x3];
+	u8 reserved_at_48b[0x1];
+	u8 atomic_mode[0x4];
+	u8 rre[0x1];
+	u8 rwe[0x1];
+	u8 rae[0x1];
+	u8 reserved_at_493[0x1];
+	u8 page_offset[0x6];
+	u8 reserved_at_49a[0x3];
+	u8 cd_slave_receive[0x1];
+	u8 cd_slave_send[0x1];
+	u8 cd_master[0x1];
+	u8 reserved_at_4a0[0x3];
+	u8 min_rnr_nak[0x5];
+	u8 next_rcv_psn[0x18];
+	u8 reserved_at_4c0[0x8];
+	u8 xrcd[0x18];
+	u8 reserved_at_4e0[0x8];
+	u8 cqn_rcv[0x18];
+	u8 dbr_addr[0x40];
+	u8 q_key[0x20];
+	u8 reserved_at_560[0x5];
+	u8 rq_type[0x3];
+	u8 srqn_rmpn_xrqn[0x18];
+	u8 reserved_at_580[0x8];
+	u8 rmsn[0x18];
+	u8 hw_sq_wqebb_counter[0x10];
+	u8 sw_sq_wqebb_counter[0x10];
+	u8 hw_rq_counter[0x20];
+	u8 sw_rq_counter[0x20];
+	u8 reserved_at_600[0x20];
+	u8 reserved_at_620[0xf];
+	u8 cgs[0x1];
+	u8 cs_req[0x8];
+	u8 cs_res[0x8];
+	u8 dc_access_key[0x40];
+	u8 reserved_at_680[0x3];
+	u8 dbr_umem_valid[0x1];
+	u8 reserved_at_684[0x9c];
+	u8 dbr_umem_id[0x20];
+};
+
+struct mlx5_ifc_create_qp_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x8];
+	u8 qpn[0x18];
+	u8 reserved_at_60[0x20];
+};
+
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-Wpedantic"
+#endif
+struct mlx5_ifc_create_qp_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0x40];
+	u8 opt_param_mask[0x20];
+	u8 reserved_at_a0[0x20];
+	struct mlx5_ifc_qpc_bits qpc;
+	u8 wq_umem_offset[0x40];
+	u8 wq_umem_id[0x20];
+	u8 wq_umem_valid[0x1];
+	u8 reserved_at_861[0x1f];
+	u8 pas[0][0x40];
+};
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-Wpedantic"
+#endif
+
+struct mlx5_ifc_sqerr2rts_qp_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x40];
+};
+
+struct mlx5_ifc_sqerr2rts_qp_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0x8];
+	u8 qpn[0x18];
+	u8 reserved_at_60[0x20];
+	u8 opt_param_mask[0x20];
+	u8 reserved_at_a0[0x20];
+	struct mlx5_ifc_qpc_bits qpc;
+	u8 reserved_at_800[0x80];
+};
+
+struct mlx5_ifc_sqd2rts_qp_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x40];
+};
+
+struct mlx5_ifc_sqd2rts_qp_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0x8];
+	u8 qpn[0x18];
+	u8 reserved_at_60[0x20];
+	u8 opt_param_mask[0x20];
+	u8 reserved_at_a0[0x20];
+	struct mlx5_ifc_qpc_bits qpc;
+	u8 reserved_at_800[0x80];
+};
+
+struct mlx5_ifc_rts2rts_qp_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x40];
+};
+
+struct mlx5_ifc_rts2rts_qp_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0x8];
+	u8 qpn[0x18];
+	u8 reserved_at_60[0x20];
+	u8 opt_param_mask[0x20];
+	u8 reserved_at_a0[0x20];
+	struct mlx5_ifc_qpc_bits qpc;
+	u8 reserved_at_800[0x80];
+};
+
+struct mlx5_ifc_rtr2rts_qp_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x40];
+};
+
+struct mlx5_ifc_rtr2rts_qp_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0x8];
+	u8 qpn[0x18];
+	u8 reserved_at_60[0x20];
+	u8 opt_param_mask[0x20];
+	u8 reserved_at_a0[0x20];
+	struct mlx5_ifc_qpc_bits qpc;
+	u8 reserved_at_800[0x80];
+};
+
+struct mlx5_ifc_rst2init_qp_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x40];
+};
+
+struct mlx5_ifc_rst2init_qp_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0x8];
+	u8 qpn[0x18];
+	u8 reserved_at_60[0x20];
+	u8 opt_param_mask[0x20];
+	u8 reserved_at_a0[0x20];
+	struct mlx5_ifc_qpc_bits qpc;
+	u8 reserved_at_800[0x80];
+};
+
+struct mlx5_ifc_init2rtr_qp_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x40];
+};
+
+struct mlx5_ifc_init2rtr_qp_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0x8];
+	u8 qpn[0x18];
+	u8 reserved_at_60[0x20];
+	u8 opt_param_mask[0x20];
+	u8 reserved_at_a0[0x20];
+	struct mlx5_ifc_qpc_bits qpc;
+	u8 reserved_at_800[0x80];
+};
+
+struct mlx5_ifc_init2init_qp_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x40];
+};
+
+struct mlx5_ifc_init2init_qp_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0x8];
+	u8 qpn[0x18];
+	u8 reserved_at_60[0x20];
+	u8 opt_param_mask[0x20];
+	u8 reserved_at_a0[0x20];
+	struct mlx5_ifc_qpc_bits qpc;
+	u8 reserved_at_800[0x80];
+};
+
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-Wpedantic"
+#endif
+struct mlx5_ifc_query_qp_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x40];
+	u8 opt_param_mask[0x20];
+	u8 reserved_at_a0[0x20];
+	struct mlx5_ifc_qpc_bits qpc;
+	u8 reserved_at_800[0x80];
+	u8 pas[0][0x40];
+};
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-Wpedantic"
+#endif
+
+struct mlx5_ifc_query_qp_in_bits {
+	u8 opcode[0x10];
+	u8 reserved_at_10[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0x8];
+	u8 qpn[0x18];
+	u8 reserved_at_60[0x20];
+};
+
 /* CQE format mask. */
 #define MLX5E_CQE_FORMAT_MASK 0xc
 
diff --git a/drivers/common/mlx5/rte_common_mlx5_version.map b/drivers/common/mlx5/rte_common_mlx5_version.map
index f3082ce..df8e064 100644
--- a/drivers/common/mlx5/rte_common_mlx5_version.map
+++ b/drivers/common/mlx5/rte_common_mlx5_version.map
@@ -2,6 +2,7 @@ DPDK_20.02 {
 	global:
 
 	mlx5_devx_cmd_create_cq;
+	mlx5_devx_cmd_create_qp;
 	mlx5_devx_cmd_create_rq;
 	mlx5_devx_cmd_create_rqt;
 	mlx5_devx_cmd_create_sq;
@@ -14,6 +15,7 @@ DPDK_20.02 {
 	mlx5_devx_cmd_flow_counter_query;
 	mlx5_devx_cmd_flow_dump;
 	mlx5_devx_cmd_mkey_create;
+	mlx5_devx_cmd_modify_qp_state;
 	mlx5_devx_cmd_modify_rq;
 	mlx5_devx_cmd_modify_sq;
 	mlx5_devx_cmd_modify_virtq;
-- 
1.8.3.1
* [dpdk-dev] [PATCH v2 17/25] common/mlx5: allow type configuration for DevX RQT
  2020-01-28 10:05 ` [dpdk-dev] [PATCH v2 00/25] Introduce mlx5 common library Matan Azrad
                     ` (15 preceding siblings ...)
  2020-01-28 10:05   ` [dpdk-dev] [PATCH v2 16/25] common/mlx5: add support for DevX QP operations Matan Azrad
@ 2020-01-28 10:05   ` Matan Azrad
  2020-01-28 10:05   ` [dpdk-dev] [PATCH v2 18/25] common/mlx5: add TIR field constants Matan Azrad
                     ` (8 subsequent siblings)
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-28 10:05 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
Allow virtio queue type configuration in the RQ table.
The needed fields and configuration were added.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
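Note: the rqtc change below carves list_q_type out of a reserved area
without moving any later field. The following standalone sketch checks
this at compile time, using the ifc convention that each u8 array
element stands for one bit. The rqtc_head_*_bits structures are
hypothetical truncated copies of the layout (only the fields up to
rqt_max_size), and BIT_OFF() is an illustrative helper, not a driver
macro.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint8_t u8;

/* Head of the RQT context before this patch (one element per bit). */
struct rqtc_head_old_bits {
	u8 reserved_at_0[0xa0];
	u8 reserved_at_a0[0x10];
	u8 rqt_max_size[0x10];
};

/* Head of the RQT context after this patch. */
struct rqtc_head_new_bits {
	u8 reserved_at_0[0xa5];
	u8 list_q_type[0x3];
	u8 reserved_at_a8[0x8];
	u8 rqt_max_size[0x10];
};

/* Bit offset of a field: the byte offset in the *_bits structure. */
#define BIT_OFF(typ, fld) offsetof(struct typ, fld)

_Static_assert(BIT_OFF(rqtc_head_old_bits, rqt_max_size) ==
	       BIT_OFF(rqtc_head_new_bits, rqt_max_size),
	       "rqt_max_size must not move");
_Static_assert(sizeof(struct rqtc_head_old_bits) ==
	       sizeof(struct rqtc_head_new_bits),
	       "layout size must not change");

/* Index of the 32-bit word that holds list_q_type. */
static unsigned int
list_q_type_dword(void)
{
	return (unsigned int)(BIT_OFF(rqtc_head_new_bits, list_q_type) / 32);
}

/* Left shift of list_q_type inside that word, counted from the LSB. */
static unsigned int
list_q_type_shift(void)
{
	unsigned int off =
		(unsigned int)BIT_OFF(rqtc_head_new_bits, list_q_type);
	unsigned int width = (unsigned int)
		sizeof(((struct rqtc_head_new_bits *)0)->list_q_type);

	return 32 - width - (off & 0x1f);
}
```

The arithmetic is 0xa5 + 0x3 + 0x8 = 0xb0 = 0xa0 + 0x10, so rqt_max_size
stays at bit offset 0xb0.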
 drivers/common/mlx5/mlx5_devx_cmds.c | 1 +
 drivers/common/mlx5/mlx5_devx_cmds.h | 1 +
 drivers/common/mlx5/mlx5_prm.h       | 5 +++--
 3 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index e7288c8..e372df6 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -846,6 +846,7 @@ struct mlx5_devx_obj *
 	}
 	MLX5_SET(create_rqt_in, in, opcode, MLX5_CMD_OP_CREATE_RQT);
 	rqt_ctx = MLX5_ADDR_OF(create_rqt_in, in, rqt_context);
+	MLX5_SET(rqtc, rqt_ctx, list_q_type, rqt_attr->rq_type);
 	MLX5_SET(rqtc, rqt_ctx, rqt_max_size, rqt_attr->rqt_max_size);
 	MLX5_SET(rqtc, rqt_ctx, rqt_actual_size, rqt_attr->rqt_actual_size);
 	for (i = 0; i < rqt_attr->rqt_actual_size; i++)
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index d1a21b8..9ef3ce2 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -188,6 +188,7 @@ struct mlx5_devx_tir_attr {
 
 /* RQT attributes structure, used by RQT operations. */
 struct mlx5_devx_rqt_attr {
+	uint8_t rq_type;
 	uint32_t rqt_max_size:16;
 	uint32_t rqt_actual_size:16;
 	uint32_t rq_list[];
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index e53dd61..000ba1f 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -1734,8 +1734,9 @@ struct mlx5_ifc_rq_num_bits {
 };
 
 struct mlx5_ifc_rqtc_bits {
-	u8 reserved_at_0[0xa0];
-	u8 reserved_at_a0[0x10];
+	u8 reserved_at_0[0xa5];
+	u8 list_q_type[0x3];
+	u8 reserved_at_a8[0x8];
 	u8 rqt_max_size[0x10];
 	u8 reserved_at_c0[0x10];
 	u8 rqt_actual_size[0x10];
-- 
1.8.3.1
* [dpdk-dev] [PATCH v2 18/25] common/mlx5: add TIR field constants
  2020-01-28 10:05 ` [dpdk-dev] [PATCH v2 00/25] Introduce mlx5 common library Matan Azrad
                     ` (16 preceding siblings ...)
  2020-01-28 10:05   ` [dpdk-dev] [PATCH v2 17/25] common/mlx5: allow type configuration for DevX RQT Matan Azrad
@ 2020-01-28 10:05   ` Matan Azrad
  2020-01-28 10:05   ` [dpdk-dev] [PATCH v2 19/25] common/mlx5: add DevX command to modify RQT Matan Azrad
                     ` (7 subsequent siblings)
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-28 10:05 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
The DevX TIR object configuration takes the L3 and L4 protocol types
that the TIR is expected to forward.
Add the PRM constant values needed to configure the L3 and L4 protocols.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
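Note: a sketch of how the new constants might be combined into one
RX hash field selector word. It rests on two assumptions that are not
part of this patch: that the selector is a 32-bit word with l3_prot_type
in the top bit, l4_prot_type in the next bit and selected_fields in the
low 30 bits, and that the SELECTED_FIELDS constants are bit positions.
rx_hash_field_select() is a hypothetical helper; the driver sets these
fields individually with MLX5_SET().

```c
#include <stdint.h>

/* Constants added by this patch. */
enum {
	MLX5_L3_PROT_TYPE_IPV4 = 0,
	MLX5_L3_PROT_TYPE_IPV6 = 1,
};

enum {
	MLX5_L4_PROT_TYPE_TCP = 0,
	MLX5_L4_PROT_TYPE_UDP = 1,
};

/* Existing constants visible in the diff context. */
enum {
	MLX5_RX_HASH_FIELD_SELECT_SELECTED_FIELDS_SRC_IP = 0x0,
	MLX5_RX_HASH_FIELD_SELECT_SELECTED_FIELDS_DST_IP = 0x1,
	MLX5_RX_HASH_FIELD_SELECT_SELECTED_FIELDS_L4_SPORT = 0x2,
};

/*
 * Hypothetical helper: pack the protocol types and the field bitmap
 * into one host-order word under the layout assumed above.
 */
static uint32_t
rx_hash_field_select(unsigned int l3, unsigned int l4, uint32_t fields)
{
	return ((uint32_t)(l3 & 0x1) << 31) |
	       ((uint32_t)(l4 & 0x1) << 30) |
	       (fields & 0x3fffffffu);
}
```

For example, hashing IPv6/UDP traffic on source IP, destination IP and
L4 source port would pass MLX5_L3_PROT_TYPE_IPV6, MLX5_L4_PROT_TYPE_UDP
and a bitmap built by shifting 1 left by each SELECTED_FIELDS value.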
 drivers/common/mlx5/mlx5_prm.h | 10 ++++++++++
 1 file changed, 10 insertions(+)
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index 000ba1f..e326868 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -1639,6 +1639,16 @@ struct mlx5_ifc_modify_rq_in_bits {
 };
 
 enum {
+	MLX5_L3_PROT_TYPE_IPV4 = 0,
+	MLX5_L3_PROT_TYPE_IPV6 = 1,
+};
+
+enum {
+	MLX5_L4_PROT_TYPE_TCP = 0,
+	MLX5_L4_PROT_TYPE_UDP = 1,
+};
+
+enum {
 	MLX5_RX_HASH_FIELD_SELECT_SELECTED_FIELDS_SRC_IP     = 0x0,
 	MLX5_RX_HASH_FIELD_SELECT_SELECTED_FIELDS_DST_IP     = 0x1,
 	MLX5_RX_HASH_FIELD_SELECT_SELECTED_FIELDS_L4_SPORT   = 0x2,
-- 
1.8.3.1
* [dpdk-dev] [PATCH v2 19/25] common/mlx5: add DevX command to modify RQT
  2020-01-28 10:05 ` [dpdk-dev] [PATCH v2 00/25] Introduce mlx5 common library Matan Azrad
                     ` (17 preceding siblings ...)
  2020-01-28 10:05   ` [dpdk-dev] [PATCH v2 18/25] common/mlx5: add TIR field constants Matan Azrad
@ 2020-01-28 10:05   ` Matan Azrad
  2020-01-28 10:06   ` [dpdk-dev] [PATCH v2 20/25] common/mlx5: get DevX capability for max RQT size Matan Azrad
                     ` (6 subsequent siblings)
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-28 10:05 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
The RQ table can be changed to reference a different list of queues.
Add a DevX command to modify a DevX RQT object so that it points to a
new RQ list.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
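Note: a sketch of how a caller might prepare the attributes for the new
modify command. struct mlx5_devx_rqt_attr ends in a flexible array
member, so the allocation has to include room for the queue list. The
structure and MLX5_INLINE_Q_TYPE_VIRTQ are copied from this series;
rqt_attr_build() is a hypothetical helper, and plain calloc() is used
in place of the DPDK allocator so that the example builds standalone.

```c
#include <stdint.h>
#include <stdlib.h>

/* Copy of the attribute structure from mlx5_devx_cmds.h. */
struct mlx5_devx_rqt_attr {
	uint8_t rq_type;
	uint32_t rqt_max_size:16;
	uint32_t rqt_actual_size:16;
	uint32_t rq_list[];
};

enum {
	MLX5_INLINE_Q_TYPE_RQ = 0x0,
	MLX5_INLINE_Q_TYPE_VIRTQ = 0x1,
};

/*
 * Hypothetical helper: allocate an attribute structure that lists `n`
 * virtq IDs. Returns NULL if n exceeds max_size or on allocation
 * failure. The caller releases the result with free().
 */
static struct mlx5_devx_rqt_attr *
rqt_attr_build(const uint32_t *ids, uint16_t n, uint16_t max_size)
{
	struct mlx5_devx_rqt_attr *attr;
	uint16_t i;

	if (n > max_size)
		return NULL;
	/* Header plus one 32-bit entry per listed queue. */
	attr = calloc(1, sizeof(*attr) + n * sizeof(attr->rq_list[0]));
	if (attr == NULL)
		return NULL;
	attr->rq_type = MLX5_INLINE_Q_TYPE_VIRTQ;
	attr->rqt_max_size = max_size;
	attr->rqt_actual_size = n;
	for (i = 0; i < n; i++)
		attr->rq_list[i] = ids[i];
	return attr;
}
```

The result would be passed as rqt_attr to mlx5_devx_cmd_modify_rqt(),
which sizes its command buffer the same way: MLX5_ST_SZ_BYTES(modify_rqt_in)
plus rqt_actual_size * sizeof(uint32_t).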
 drivers/common/mlx5/mlx5_devx_cmds.c            | 47 +++++++++++++++++++++++++
 drivers/common/mlx5/mlx5_devx_cmds.h            |  2 ++
 drivers/common/mlx5/mlx5_prm.h                  | 21 +++++++++++
 drivers/common/mlx5/rte_common_mlx5_version.map |  1 +
 4 files changed, 71 insertions(+)
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index e372df6..1d3a729 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -864,6 +864,53 @@ struct mlx5_devx_obj *
 }
 
 /**
+ * Modify RQT using DevX API.
+ *
+ * @param[in] rqt
+ *   Pointer to RQT DevX object structure.
+ * @param [in] rqt_attr
+ *   Pointer to RQT attributes structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_devx_cmd_modify_rqt(struct mlx5_devx_obj *rqt,
+			 struct mlx5_devx_rqt_attr *rqt_attr)
+{
+	uint32_t inlen = MLX5_ST_SZ_BYTES(modify_rqt_in) +
+			 rqt_attr->rqt_actual_size * sizeof(uint32_t);
+	uint32_t out[MLX5_ST_SZ_DW(modify_rqt_out)] = {0};
+	uint32_t *in = rte_calloc(__func__, 1, inlen, 0);
+	void *rqt_ctx;
+	int i;
+	int ret;
+
+	if (!in) {
+		DRV_LOG(ERR, "Failed to allocate RQT modify IN data.");
+		rte_errno = ENOMEM;
+		return -ENOMEM;
+	}
+	MLX5_SET(modify_rqt_in, in, opcode, MLX5_CMD_OP_MODIFY_RQT);
+	MLX5_SET(modify_rqt_in, in, rqtn, rqt->id);
+	MLX5_SET64(modify_rqt_in, in, modify_bitmask, 0x1);
+	rqt_ctx = MLX5_ADDR_OF(modify_rqt_in, in, rqt_context);
+	MLX5_SET(rqtc, rqt_ctx, list_q_type, rqt_attr->rq_type);
+	MLX5_SET(rqtc, rqt_ctx, rqt_max_size, rqt_attr->rqt_max_size);
+	MLX5_SET(rqtc, rqt_ctx, rqt_actual_size, rqt_attr->rqt_actual_size);
+	for (i = 0; i < rqt_attr->rqt_actual_size; i++)
+		MLX5_SET(rqtc, rqt_ctx, rq_num[i], rqt_attr->rq_list[i]);
+	ret = mlx5_glue->devx_obj_modify(rqt->obj, in, inlen, out, sizeof(out));
+	rte_free(in);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to modify RQT using DevX.");
+		rte_errno = errno;
+		return -rte_errno;
+	}
+	return ret;
+}
+
+/**
  * Create SQ using DevX API.
  *
  * @param[in] ctx
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index 9ef3ce2..b99c54b 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -344,5 +344,7 @@ struct mlx5_devx_obj *mlx5_devx_cmd_create_qp(struct ibv_context *ctx,
 					      struct mlx5_devx_qp_attr *attr);
 int mlx5_devx_cmd_modify_qp_state(struct mlx5_devx_obj *qp,
 				  uint32_t qp_st_mod_op, uint32_t remote_qp_id);
+int mlx5_devx_cmd_modify_rqt(struct mlx5_devx_obj *rqt,
+			     struct mlx5_devx_rqt_attr *rqt_attr);
 
 #endif /* RTE_PMD_MLX5_DEVX_CMDS_H_ */
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index e326868..b48cd0a 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -747,6 +747,7 @@ enum {
 	MLX5_CMD_OP_CREATE_TIS = 0x912,
 	MLX5_CMD_OP_QUERY_TIS = 0x915,
 	MLX5_CMD_OP_CREATE_RQT = 0x916,
+	MLX5_CMD_OP_MODIFY_RQT = 0x917,
 	MLX5_CMD_OP_ALLOC_FLOW_COUNTER = 0x939,
 	MLX5_CMD_OP_QUERY_FLOW_COUNTER = 0x93b,
 	MLX5_CMD_OP_CREATE_GENERAL_OBJECT = 0xa00,
@@ -1774,10 +1775,30 @@ struct mlx5_ifc_create_rqt_in_bits {
 	u8 reserved_at_40[0xc0];
 	struct mlx5_ifc_rqtc_bits rqt_context;
 };
+
+struct mlx5_ifc_modify_rqt_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0x8];
+	u8 rqtn[0x18];
+	u8 reserved_at_60[0x20];
+	u8 modify_bitmask[0x40];
+	u8 reserved_at_c0[0x40];
+	struct mlx5_ifc_rqtc_bits rqt_context;
+};
 #ifdef PEDANTIC
 #pragma GCC diagnostic error "-Wpedantic"
 #endif
 
+struct mlx5_ifc_modify_rqt_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x40];
+};
+
 enum {
 	MLX5_SQC_STATE_RST  = 0x0,
 	MLX5_SQC_STATE_RDY  = 0x1,
diff --git a/drivers/common/mlx5/rte_common_mlx5_version.map b/drivers/common/mlx5/rte_common_mlx5_version.map
index df8e064..95ca54a 100644
--- a/drivers/common/mlx5/rte_common_mlx5_version.map
+++ b/drivers/common/mlx5/rte_common_mlx5_version.map
@@ -17,6 +17,7 @@ DPDK_20.02 {
 	mlx5_devx_cmd_mkey_create;
 	mlx5_devx_cmd_modify_qp_state;
 	mlx5_devx_cmd_modify_rq;
+	mlx5_devx_cmd_modify_rqt;
 	mlx5_devx_cmd_modify_sq;
 	mlx5_devx_cmd_modify_virtq;
 	mlx5_devx_cmd_qp_query_tis_td;
-- 
1.8.3.1
* [dpdk-dev] [PATCH v2 20/25] common/mlx5: get DevX capability for max RQT size
  2020-01-28 10:05 ` [dpdk-dev] [PATCH v2 00/25] Introduce mlx5 common library Matan Azrad
                     ` (18 preceding siblings ...)
  2020-01-28 10:05   ` [dpdk-dev] [PATCH v2 19/25] common/mlx5: add DevX command to modify RQT Matan Azrad
@ 2020-01-28 10:06   ` Matan Azrad
  2020-01-28 10:06   ` [dpdk-dev] [PATCH v2 21/25] net/mlx5: select driver by vDPA device argument Matan Azrad
                     ` (5 subsequent siblings)
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-28 10:06 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
To allow the RQT size to be configured within the device's maximum
value, add log_max_rqt_size to the DevX capability structure.
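A minimal sketch of how a caller might use this capability, assuming
that log_max_rqt_size is the base-2 logarithm of the largest RQT size
the device accepts (the usual reading of a log_max_* field). Both
helper names are hypothetical and are not part of this patch:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: largest RQT size implied by
 * the 5-bit log_max_rqt_size capability, assuming size = 2^log.
 */
static uint32_t
rqt_max_size_from_log(uint8_t log_max_rqt_size)
{
	/* The capability field is 5 bits wide, so the exponent is 0..31. */
	assert(log_max_rqt_size < 32);
	return UINT32_C(1) << log_max_rqt_size;
}

/* Hypothetical helper: clamp a requested RQT size to the device limit. */
static uint32_t
rqt_clamp_size(uint32_t requested, uint8_t log_max_rqt_size)
{
	uint32_t max = rqt_max_size_from_log(log_max_rqt_size);

	return requested > max ? max : requested;
}
```

A driver could pass the clamped value as rqt_max_size when it fills the
RQT attributes, so that a request larger than the device limit is
reduced before the command is issued.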
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/common/mlx5/mlx5_devx_cmds.c | 2 ++
 drivers/common/mlx5/mlx5_devx_cmds.h | 1 +
 2 files changed, 3 insertions(+)
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index 1d3a729..b0803ac 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -436,6 +436,8 @@ struct mlx5_devx_obj *
 			MLX5_GET(cmd_hca_cap, hcattr, flow_counter_bulk_alloc);
 	attr->flow_counters_dump = MLX5_GET(cmd_hca_cap, hcattr,
 					    flow_counters_dump);
+	attr->log_max_rqt_size = MLX5_GET(cmd_hca_cap, hcattr,
+					  log_max_rqt_size);
 	attr->eswitch_manager = MLX5_GET(cmd_hca_cap, hcattr, eswitch_manager);
 	attr->hairpin = MLX5_GET(cmd_hca_cap, hcattr, hairpin);
 	attr->log_max_hairpin_queues = MLX5_GET(cmd_hca_cap, hcattr,
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index b99c54b..6912dc6 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -78,6 +78,7 @@ struct mlx5_hca_vdpa_attr {
 struct mlx5_hca_attr {
 	uint32_t eswitch_manager:1;
 	uint32_t flow_counters_dump:1;
+	uint32_t log_max_rqt_size:5;
 	uint8_t flow_counter_bulk_alloc_bitmap;
 	uint32_t eth_net_offloads:1;
 	uint32_t eth_virt:1;
-- 
1.8.3.1
* [dpdk-dev] [PATCH v2 21/25] net/mlx5: select driver by vDPA device argument
  2020-01-28 10:05 ` [dpdk-dev] [PATCH v2 00/25] Introduce mlx5 common library Matan Azrad
                     ` (19 preceding siblings ...)
  2020-01-28 10:06   ` [dpdk-dev] [PATCH v2 20/25] common/mlx5: get DevX capability for max RQT size Matan Azrad
@ 2020-01-28 10:06   ` Matan Azrad
  2020-01-28 10:06   ` [dpdk-dev] [PATCH v2 22/25] net/mlx5: separate Netlink command interface Matan Azrad
                     ` (4 subsequent siblings)
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-28 10:06 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
A single Mellanox device may be eligible for probing by multiple mlx5
drivers.
For example, any mlx5 vDPA device can be probed by both net/mlx5 and
vdpa/mlx5.
Add a new mlx5 common API to get the requested driver from the devargs:
vdpa=1.
Skip net/mlx5 PMD probing when the device is selected to be probed by
the vDPA driver.
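The selection rule can be illustrated without DPDK. The sketch below is
a hypothetical stand-in for the rte_kvargs-based check in this patch,
assuming the devargs are a comma-separated list of key=value pairs; the
function name is made up for illustration:

```c
#include <string.h>

/*
 * Hypothetical stand-in for the rte_kvargs-based check: returns 1 only if
 * the comma-separated "key=value" list contains "vdpa" and every "vdpa"
 * entry has the value "1"; returns 0 otherwise (including NULL input).
 */
static int
vdpa_mode_selected_sketch(const char *args)
{
	int found = 0;

	if (args == NULL)
		return 0;
	while (*args != '\0') {
		/* The current pair spans [args, args + len). */
		const char *end = strchr(args, ',');
		size_t len = end ? (size_t)(end - args) : strlen(args);

		if (len >= 5 && strncmp(args, "vdpa=", 5) == 0) {
			/* The value must be exactly "1". */
			if (len != 6 || args[5] != '1')
				return 0;
			found = 1;
		}
		args += len;
		if (*args == ',')
			args++;
	}
	return found;
}
```

With this rule, "dv_flow_en=1,vdpa=1" selects the vDPA driver, while
"vdpa=0" or a list without the key leaves the device to net/mlx5.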
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/common/mlx5/Makefile                    |  2 +-
 drivers/common/mlx5/meson.build                 |  2 +-
 drivers/common/mlx5/mlx5_common.c               | 36 +++++++++++++++++++++++++
 drivers/common/mlx5/mlx5_common.h               |  3 +++
 drivers/common/mlx5/rte_common_mlx5_version.map |  1 +
 drivers/net/mlx5/mlx5.c                         |  5 ++++
 6 files changed, 47 insertions(+), 2 deletions(-)
diff --git a/drivers/common/mlx5/Makefile b/drivers/common/mlx5/Makefile
index d1de3ec..b9e9803 100644
--- a/drivers/common/mlx5/Makefile
+++ b/drivers/common/mlx5/Makefile
@@ -41,7 +41,7 @@ else
 LDLIBS += -libverbs -lmlx5
 endif
 
-LDLIBS += -lrte_eal -lrte_pci
+LDLIBS += -lrte_eal -lrte_pci -lrte_kvargs
 
 # A few warnings cannot be avoided in external headers.
 CFLAGS += -Wno-error=cast-qual -DNDEBUG -UPEDANTIC
diff --git a/drivers/common/mlx5/meson.build b/drivers/common/mlx5/meson.build
index 34de7d1..174c64a 100644
--- a/drivers/common/mlx5/meson.build
+++ b/drivers/common/mlx5/meson.build
@@ -38,7 +38,7 @@ endforeach
 
 if build
 	allow_experimental_apis = true
-	deps += ['hash', 'pci', 'net', 'eal']
+	deps += ['hash', 'pci', 'net', 'eal', 'kvargs']
 	sources = files(
 		'mlx5_devx_cmds.c',
 		'mlx5_common.c',
diff --git a/drivers/common/mlx5/mlx5_common.c b/drivers/common/mlx5/mlx5_common.c
index 9471ff3..c5af3ca 100644
--- a/drivers/common/mlx5/mlx5_common.c
+++ b/drivers/common/mlx5/mlx5_common.c
@@ -71,6 +71,42 @@
 	return 0;
 }
 
+static int
+mlx5_vdpa_check_handler(__rte_unused const char *key, const char *value,
+			__rte_unused void *opaque)
+{
+	if (strcmp(value, "1"))
+		return -1;
+	return 0;
+}
+
+int
+mlx5_vdpa_mode_selected(struct rte_devargs *devargs)
+{
+	struct rte_kvargs *kvlist;
+	const char *key = "vdpa";
+	int ret = 0;
+
+	if (devargs == NULL)
+		return 0;
+
+	kvlist = rte_kvargs_parse(devargs->args, NULL);
+	if (kvlist == NULL)
+		return 0;
+
+	if (!rte_kvargs_count(kvlist, key))
+		goto exit;
+
+	/* Vdpa mode selected when there's a key-value pair: vdpa=1. */
+	if (rte_kvargs_process(kvlist, key, mlx5_vdpa_check_handler, NULL) < 0)
+		goto exit;
+	ret = 1;
+
+exit:
+	rte_kvargs_free(kvlist);
+	return ret;
+}
+
 #ifdef RTE_IBVERBS_LINK_DLOPEN
 
 /**
diff --git a/drivers/common/mlx5/mlx5_common.h b/drivers/common/mlx5/mlx5_common.h
index 9d464d4..aeaa7b9 100644
--- a/drivers/common/mlx5/mlx5_common.h
+++ b/drivers/common/mlx5/mlx5_common.h
@@ -11,6 +11,8 @@
 #include <rte_pci.h>
 #include <rte_atomic.h>
 #include <rte_log.h>
+#include <rte_kvargs.h>
+#include <rte_devargs.h>
 
 #include "mlx5_prm.h"
 
@@ -149,5 +151,6 @@ enum mlx5_cqe_status {
 }
 
 int mlx5_dev_to_pci_addr(const char *dev_path, struct rte_pci_addr *pci_addr);
+int mlx5_vdpa_mode_selected(struct rte_devargs *devargs);
 
 #endif /* RTE_PMD_MLX5_COMMON_H_ */
diff --git a/drivers/common/mlx5/rte_common_mlx5_version.map b/drivers/common/mlx5/rte_common_mlx5_version.map
index 95ca54a..d32d631 100644
--- a/drivers/common/mlx5/rte_common_mlx5_version.map
+++ b/drivers/common/mlx5/rte_common_mlx5_version.map
@@ -26,4 +26,5 @@ DPDK_20.02 {
 	mlx5_devx_get_out_command_status;
 
 	mlx5_dev_to_pci_addr;
+	mlx5_vdpa_mode_selected;
 };
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index d0fa2da..353196b 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -2967,6 +2967,11 @@ struct mlx5_flow_id_pool *
 	struct mlx5_dev_config dev_config;
 	int ret;
 
+	if (mlx5_vdpa_mode_selected(pci_dev->device.devargs)) {
+		DRV_LOG(DEBUG, "Skip probing - should be probed by the vdpa"
+			" driver.");
+		return 1;
+	}
 	if (rte_eal_process_type() == RTE_PROC_PRIMARY)
 		mlx5_pmd_socket_init();
 	ret = mlx5_init_once();
-- 
1.8.3.1
* [dpdk-dev] [PATCH v2 22/25] net/mlx5: separate Netlink command interface
  2020-01-28 10:05 ` [dpdk-dev] [PATCH v2 00/25] Introduce mlx5 common library Matan Azrad
                     ` (20 preceding siblings ...)
  2020-01-28 10:06   ` [dpdk-dev] [PATCH v2 21/25] net/mlx5: select driver by vDPA device argument Matan Azrad
@ 2020-01-28 10:06   ` Matan Azrad
  2020-01-28 10:06   ` [dpdk-dev] [PATCH v2 23/25] net/mlx5: reduce Netlink commands dependencies Matan Azrad
                     ` (3 subsequent siblings)
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-28 10:06 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
The Netlink command interface is declared in the mlx5.h file along with
many other PMD interfaces.
In preparation for sharing the Netlink commands between different
PMDs, this patch moves the Netlink interface to a new file called
mlx5_nl.h.
Move the pure VLAN commands that do not use Netlink from mlx5_nl.c to
mlx5_vlan.c.
Rename the Netlink commands and structures to use the mlx5_nl prefix.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.h      |  72 +++------------------
 drivers/net/mlx5/mlx5_nl.c   | 149 +++----------------------------------------
 drivers/net/mlx5/mlx5_nl.h   |  69 ++++++++++++++++++++
 drivers/net/mlx5/mlx5_vlan.c | 134 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 220 insertions(+), 204 deletions(-)
 create mode 100644 drivers/net/mlx5/mlx5_nl.h
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 3daf0db..01d0051 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -39,6 +39,7 @@
 #include "mlx5_defs.h"
 #include "mlx5_utils.h"
 #include "mlx5_mr.h"
+#include "mlx5_nl.h"
 #include "mlx5_autoconf.h"
 
 /* Request types for IPC. */
@@ -75,24 +76,6 @@ struct mlx5_mp_param {
 /** Key string for IPC. */
 #define MLX5_MP_NAME "net_mlx5_mp"
 
-/* Recognized Infiniband device physical port name types. */
-enum mlx5_phys_port_name_type {
-	MLX5_PHYS_PORT_NAME_TYPE_NOTSET = 0, /* Not set. */
-	MLX5_PHYS_PORT_NAME_TYPE_LEGACY, /* before kernel ver < 5.0 */
-	MLX5_PHYS_PORT_NAME_TYPE_UPLINK, /* p0, kernel ver >= 5.0 */
-	MLX5_PHYS_PORT_NAME_TYPE_PFVF, /* pf0vf0, kernel ver >= 5.0 */
-	MLX5_PHYS_PORT_NAME_TYPE_UNKNOWN, /* Unrecognized. */
-};
-
-/** Switch information returned by mlx5_nl_switch_info(). */
-struct mlx5_switch_info {
-	uint32_t master:1; /**< Master device. */
-	uint32_t representor:1; /**< Representor device. */
-	enum mlx5_phys_port_name_type name_type; /** < Port name type. */
-	int32_t pf_num; /**< PF number (valid for pfxvfx format only). */
-	int32_t port_name; /**< Representor port name. */
-	uint64_t switch_id; /**< Switch identifier. */
-};
 
 LIST_HEAD(mlx5_dev_list, mlx5_ibv_shared);
 
@@ -226,30 +209,12 @@ enum mlx5_verbs_alloc_type {
 	MLX5_VERBS_ALLOC_TYPE_RX_QUEUE,
 };
 
-/* VLAN netdev for VLAN workaround. */
-struct mlx5_vlan_dev {
-	uint32_t refcnt;
-	uint32_t ifindex; /**< Own interface index. */
-};
-
 /* Structure for VF VLAN workaround. */
 struct mlx5_vf_vlan {
 	uint32_t tag:12;
 	uint32_t created:1;
 };
 
-/*
- * Array of VLAN devices created on the base of VF
- * used for workaround in virtual environments.
- */
-struct mlx5_vlan_vmwa_context {
-	int nl_socket;
-	uint32_t nl_sn;
-	uint32_t vf_ifindex;
-	struct rte_eth_dev *dev;
-	struct mlx5_vlan_dev vlan_dev[4096];
-};
-
 /**
  * Verbs allocator needs a context to know in the callback which kind of
  * resources it is allocating.
@@ -576,7 +541,7 @@ struct mlx5_priv {
 	int nl_socket_route; /* Netlink socket (NETLINK_ROUTE). */
 	uint32_t nl_sn; /* Netlink message sequence number. */
 	LIST_HEAD(dbrpage, mlx5_devx_dbr_page) dbrpgs; /* Door-bell pages. */
-	struct mlx5_vlan_vmwa_context *vmwa_context; /* VLAN WA context. */
+	struct mlx5_nl_vlan_vmwa_context *vmwa_context; /* VLAN WA context. */
 	struct mlx5_flow_id_pool *qrss_id_pool;
 	struct mlx5_hlist *mreg_cp_tbl;
 	/* Hash table of Rx metadata register copy table. */
@@ -672,6 +637,8 @@ int mlx5_hairpin_cap_get(struct rte_eth_dev *dev,
 void mlx5_mac_addr_remove(struct rte_eth_dev *dev, uint32_t index);
 int mlx5_mac_addr_add(struct rte_eth_dev *dev, struct rte_ether_addr *mac,
 		      uint32_t index, uint32_t vmdq);
+struct mlx5_nl_vlan_vmwa_context *mlx5_vlan_vmwa_init
+				    (struct rte_eth_dev *dev, uint32_t ifindex);
 int mlx5_mac_addr_set(struct rte_eth_dev *dev, struct rte_ether_addr *mac_addr);
 int mlx5_set_mc_addr_list(struct rte_eth_dev *dev,
 			struct rte_ether_addr *mc_addr_set,
@@ -715,6 +682,11 @@ int mlx5_xstats_get_names(struct rte_eth_dev *dev __rte_unused,
 int mlx5_vlan_filter_set(struct rte_eth_dev *dev, uint16_t vlan_id, int on);
 void mlx5_vlan_strip_queue_set(struct rte_eth_dev *dev, uint16_t queue, int on);
 int mlx5_vlan_offload_set(struct rte_eth_dev *dev, int mask);
+void mlx5_vlan_vmwa_exit(struct mlx5_nl_vlan_vmwa_context *ctx);
+void mlx5_vlan_vmwa_release(struct rte_eth_dev *dev,
+			    struct mlx5_vf_vlan *vf_vlan);
+void mlx5_vlan_vmwa_acquire(struct rte_eth_dev *dev,
+			    struct mlx5_vf_vlan *vf_vlan);
 
 /* mlx5_trigger.c */
 
@@ -796,32 +768,6 @@ int mlx5_mp_req_queue_state_modify(struct rte_eth_dev *dev,
 int mlx5_pmd_socket_init(void);
 void mlx5_pmd_socket_uninit(void);
 
-/* mlx5_nl.c */
-
-int mlx5_nl_init(int protocol);
-int mlx5_nl_mac_addr_add(struct rte_eth_dev *dev, struct rte_ether_addr *mac,
-			 uint32_t index);
-int mlx5_nl_mac_addr_remove(struct rte_eth_dev *dev, struct rte_ether_addr *mac,
-			    uint32_t index);
-void mlx5_nl_mac_addr_sync(struct rte_eth_dev *dev);
-void mlx5_nl_mac_addr_flush(struct rte_eth_dev *dev);
-int mlx5_nl_promisc(struct rte_eth_dev *dev, int enable);
-int mlx5_nl_allmulti(struct rte_eth_dev *dev, int enable);
-unsigned int mlx5_nl_portnum(int nl, const char *name);
-unsigned int mlx5_nl_ifindex(int nl, const char *name, uint32_t pindex);
-int mlx5_nl_vf_mac_addr_modify(struct rte_eth_dev *dev,
-			       struct rte_ether_addr *mac, int vf_index);
-int mlx5_nl_switch_info(int nl, unsigned int ifindex,
-			struct mlx5_switch_info *info);
-
-struct mlx5_vlan_vmwa_context *mlx5_vlan_vmwa_init(struct rte_eth_dev *dev,
-						   uint32_t ifindex);
-void mlx5_vlan_vmwa_exit(struct mlx5_vlan_vmwa_context *ctx);
-void mlx5_vlan_vmwa_release(struct rte_eth_dev *dev,
-			    struct mlx5_vf_vlan *vf_vlan);
-void mlx5_vlan_vmwa_acquire(struct rte_eth_dev *dev,
-			    struct mlx5_vf_vlan *vf_vlan);
-
 /* mlx5_flow_meter.c */
 
 int mlx5_flow_meter_ops_get(struct rte_eth_dev *dev, void *arg);
diff --git a/drivers/net/mlx5/mlx5_nl.c b/drivers/net/mlx5/mlx5_nl.c
index e7ba034..3fe4b6f 100644
--- a/drivers/net/mlx5/mlx5_nl.c
+++ b/drivers/net/mlx5/mlx5_nl.c
@@ -5,7 +5,6 @@
 
 #include <errno.h>
 #include <linux/if_link.h>
-#include <linux/netlink.h>
 #include <linux/rtnetlink.h>
 #include <net/if.h>
 #include <rdma/rdma_netlink.h>
@@ -18,8 +17,6 @@
 #include <unistd.h>
 
 #include <rte_errno.h>
-#include <rte_malloc.h>
-#include <rte_hypervisor.h>
 
 #include "mlx5.h"
 #include "mlx5_utils.h"
@@ -1072,7 +1069,8 @@ struct mlx5_nl_ifindex_data {
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
-mlx5_nl_switch_info(int nl, unsigned int ifindex, struct mlx5_switch_info *info)
+mlx5_nl_switch_info(int nl, unsigned int ifindex,
+		    struct mlx5_switch_info *info)
 {
 	uint32_t seq = random();
 	struct {
@@ -1116,12 +1114,12 @@ struct mlx5_nl_ifindex_data {
  * Delete VLAN network device by ifindex.
  *
  * @param[in] tcf
- *   Context object initialized by mlx5_vlan_vmwa_init().
+ *   Context object initialized by mlx5_nl_vlan_vmwa_init().
  * @param[in] ifindex
  *   Interface index of network device to delete.
  */
-static void
-mlx5_vlan_vmwa_delete(struct mlx5_vlan_vmwa_context *vmwa,
+void
+mlx5_nl_vlan_vmwa_delete(struct mlx5_nl_vlan_vmwa_context *vmwa,
 		      uint32_t ifindex)
 {
 	int ret;
@@ -1196,14 +1194,14 @@ struct mlx5_nl_ifindex_data {
  * Create network VLAN device with specified VLAN tag.
  *
  * @param[in] tcf
- *   Context object initialized by mlx5_vlan_vmwa_init().
+ *   Context object initialized by mlx5_nl_vlan_vmwa_init().
  * @param[in] ifindex
  *   Base network interface index.
  * @param[in] tag
  *   VLAN tag for VLAN network device to create.
  */
-static uint32_t
-mlx5_vlan_vmwa_create(struct mlx5_vlan_vmwa_context *vmwa,
+uint32_t
+mlx5_nl_vlan_vmwa_create(struct mlx5_nl_vlan_vmwa_context *vmwa,
 		      uint32_t ifindex,
 		      uint16_t tag)
 {
@@ -1269,134 +1267,3 @@ struct mlx5_nl_ifindex_data {
 	}
 	return ret;
 }
-
-/*
- * Release VLAN network device, created for VM workaround.
- *
- * @param[in] dev
- *   Ethernet device object, Netlink context provider.
- * @param[in] vlan
- *   Object representing the network device to release.
- */
-void mlx5_vlan_vmwa_release(struct rte_eth_dev *dev,
-			    struct mlx5_vf_vlan *vlan)
-{
-	struct mlx5_priv *priv = dev->data->dev_private;
-	struct mlx5_vlan_vmwa_context *vmwa = priv->vmwa_context;
-	struct mlx5_vlan_dev *vlan_dev = &vmwa->vlan_dev[0];
-
-	assert(vlan->created);
-	assert(priv->vmwa_context);
-	if (!vlan->created || !vmwa)
-		return;
-	vlan->created = 0;
-	assert(vlan_dev[vlan->tag].refcnt);
-	if (--vlan_dev[vlan->tag].refcnt == 0 &&
-	    vlan_dev[vlan->tag].ifindex) {
-		mlx5_vlan_vmwa_delete(vmwa, vlan_dev[vlan->tag].ifindex);
-		vlan_dev[vlan->tag].ifindex = 0;
-	}
-}
-
-/**
- * Acquire VLAN interface with specified tag for VM workaround.
- *
- * @param[in] dev
- *   Ethernet device object, Netlink context provider.
- * @param[in] vlan
- *   Object representing the network device to acquire.
- */
-void mlx5_vlan_vmwa_acquire(struct rte_eth_dev *dev,
-			    struct mlx5_vf_vlan *vlan)
-{
-	struct mlx5_priv *priv = dev->data->dev_private;
-	struct mlx5_vlan_vmwa_context *vmwa = priv->vmwa_context;
-	struct mlx5_vlan_dev *vlan_dev = &vmwa->vlan_dev[0];
-
-	assert(!vlan->created);
-	assert(priv->vmwa_context);
-	if (vlan->created || !vmwa)
-		return;
-	if (vlan_dev[vlan->tag].refcnt == 0) {
-		assert(!vlan_dev[vlan->tag].ifindex);
-		vlan_dev[vlan->tag].ifindex =
-			mlx5_vlan_vmwa_create(vmwa,
-					      vmwa->vf_ifindex,
-					      vlan->tag);
-	}
-	if (vlan_dev[vlan->tag].ifindex) {
-		vlan_dev[vlan->tag].refcnt++;
-		vlan->created = 1;
-	}
-}
-
-/*
- * Create per ethernet device VLAN VM workaround context
- */
-struct mlx5_vlan_vmwa_context *
-mlx5_vlan_vmwa_init(struct rte_eth_dev *dev,
-		    uint32_t ifindex)
-{
-	struct mlx5_priv *priv = dev->data->dev_private;
-	struct mlx5_dev_config *config = &priv->config;
-	struct mlx5_vlan_vmwa_context *vmwa;
-	enum rte_hypervisor hv_type;
-
-	/* Do not engage workaround over PF. */
-	if (!config->vf)
-		return NULL;
-	/* Check whether there is desired virtual environment */
-	hv_type = rte_hypervisor_get();
-	switch (hv_type) {
-	case RTE_HYPERVISOR_UNKNOWN:
-	case RTE_HYPERVISOR_VMWARE:
-		/*
-		 * The "white list" of configurations
-		 * to engage the workaround.
-		 */
-		break;
-	default:
-		/*
-		 * The configuration is not found in the "white list".
-		 * We should not engage the VLAN workaround.
-		 */
-		return NULL;
-	}
-	vmwa = rte_zmalloc(__func__, sizeof(*vmwa), sizeof(uint32_t));
-	if (!vmwa) {
-		DRV_LOG(WARNING,
-			"Can not allocate memory"
-			" for VLAN workaround context");
-		return NULL;
-	}
-	vmwa->nl_socket = mlx5_nl_init(NETLINK_ROUTE);
-	if (vmwa->nl_socket < 0) {
-		DRV_LOG(WARNING,
-			"Can not create Netlink socket"
-			" for VLAN workaround context");
-		rte_free(vmwa);
-		return NULL;
-	}
-	vmwa->nl_sn = random();
-	vmwa->vf_ifindex = ifindex;
-	vmwa->dev = dev;
-	/* Cleanup for existing VLAN devices. */
-	return vmwa;
-}
-
-/*
- * Destroy per ethernet device VLAN VM workaround context
- */
-void mlx5_vlan_vmwa_exit(struct mlx5_vlan_vmwa_context *vmwa)
-{
-	unsigned int i;
-
-	/* Delete all remaining VLAN devices. */
-	for (i = 0; i < RTE_DIM(vmwa->vlan_dev); i++) {
-		if (vmwa->vlan_dev[i].ifindex)
-			mlx5_vlan_vmwa_delete(vmwa, vmwa->vlan_dev[i].ifindex);
-	}
-	if (vmwa->nl_socket >= 0)
-		close(vmwa->nl_socket);
-	rte_free(vmwa);
-}
diff --git a/drivers/net/mlx5/mlx5_nl.h b/drivers/net/mlx5/mlx5_nl.h
new file mode 100644
index 0000000..7903673
--- /dev/null
+++ b/drivers/net/mlx5/mlx5_nl.h
@@ -0,0 +1,69 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2019 Mellanox Technologies, Ltd
+ */
+
+#ifndef RTE_PMD_MLX5_NL_H_
+#define RTE_PMD_MLX5_NL_H_
+
+#include <linux/netlink.h>
+
+
+/* Recognized Infiniband device physical port name types. */
+enum mlx5_nl_phys_port_name_type {
+	MLX5_PHYS_PORT_NAME_TYPE_NOTSET = 0, /* Not set. */
+	MLX5_PHYS_PORT_NAME_TYPE_LEGACY, /* before kernel ver < 5.0 */
+	MLX5_PHYS_PORT_NAME_TYPE_UPLINK, /* p0, kernel ver >= 5.0 */
+	MLX5_PHYS_PORT_NAME_TYPE_PFVF, /* pf0vf0, kernel ver >= 5.0 */
+	MLX5_PHYS_PORT_NAME_TYPE_UNKNOWN, /* Unrecognized. */
+};
+
+/** Switch information returned by mlx5_nl_switch_info(). */
+struct mlx5_switch_info {
+	uint32_t master:1; /**< Master device. */
+	uint32_t representor:1; /**< Representor device. */
+	enum mlx5_nl_phys_port_name_type name_type; /** < Port name type. */
+	int32_t pf_num; /**< PF number (valid for pfxvfx format only). */
+	int32_t port_name; /**< Representor port name. */
+	uint64_t switch_id; /**< Switch identifier. */
+};
+
+/* VLAN netdev for VLAN workaround. */
+struct mlx5_nl_vlan_dev {
+	uint32_t refcnt;
+	uint32_t ifindex; /**< Own interface index. */
+};
+
+/*
+ * Array of VLAN devices created on the base of VF
+ * used for workaround in virtual environments.
+ */
+struct mlx5_nl_vlan_vmwa_context {
+	int nl_socket;
+	uint32_t nl_sn;
+	uint32_t vf_ifindex;
+	struct mlx5_nl_vlan_dev vlan_dev[4096];
+};
+
+
+int mlx5_nl_init(int protocol);
+int mlx5_nl_mac_addr_add(struct rte_eth_dev *dev, struct rte_ether_addr *mac,
+			 uint32_t index);
+int mlx5_nl_mac_addr_remove(struct rte_eth_dev *dev, struct rte_ether_addr *mac,
+			    uint32_t index);
+void mlx5_nl_mac_addr_sync(struct rte_eth_dev *dev);
+void mlx5_nl_mac_addr_flush(struct rte_eth_dev *dev);
+int mlx5_nl_promisc(struct rte_eth_dev *dev, int enable);
+int mlx5_nl_allmulti(struct rte_eth_dev *dev, int enable);
+unsigned int mlx5_nl_portnum(int nl, const char *name);
+unsigned int mlx5_nl_ifindex(int nl, const char *name, uint32_t pindex);
+int mlx5_nl_vf_mac_addr_modify(struct rte_eth_dev *dev,
+			       struct rte_ether_addr *mac, int vf_index);
+int mlx5_nl_switch_info(int nl, unsigned int ifindex,
+			struct mlx5_switch_info *info);
+
+void mlx5_nl_vlan_vmwa_delete(struct mlx5_nl_vlan_vmwa_context *vmwa,
+			   uint32_t ifindex);
+uint32_t mlx5_nl_vlan_vmwa_create(struct mlx5_nl_vlan_vmwa_context *vmwa,
+				  uint32_t ifindex, uint16_t tag);
+
+#endif /* RTE_PMD_MLX5_NL_H_ */
diff --git a/drivers/net/mlx5/mlx5_vlan.c b/drivers/net/mlx5/mlx5_vlan.c
index b0fa31a..fb52d8f 100644
--- a/drivers/net/mlx5/mlx5_vlan.c
+++ b/drivers/net/mlx5/mlx5_vlan.c
@@ -7,6 +7,8 @@
 #include <errno.h>
 #include <assert.h>
 #include <stdint.h>
+#include <unistd.h>
+
 
 /*
  * Not needed by this file; included to work around the lack of off_t
@@ -26,6 +28,8 @@
 
 #include <rte_ethdev_driver.h>
 #include <rte_common.h>
+#include <rte_malloc.h>
+#include <rte_hypervisor.h>
 
 #include <mlx5_glue.h>
 #include <mlx5_devx_cmds.h>
@@ -33,6 +37,7 @@
 #include "mlx5.h"
 #include "mlx5_autoconf.h"
 #include "mlx5_rxtx.h"
+#include "mlx5_nl.h"
 #include "mlx5_utils.h"
 
 /**
@@ -193,3 +198,132 @@
 	}
 	return 0;
 }
+
+/*
+ * Release VLAN network device, created for VM workaround.
+ *
+ * @param[in] dev
+ *   Ethernet device object, Netlink context provider.
+ * @param[in] vlan
+ *   Object representing the network device to release.
+ */
+void mlx5_vlan_vmwa_release(struct rte_eth_dev *dev,
+			    struct mlx5_vf_vlan *vlan)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_nl_vlan_vmwa_context *vmwa = priv->vmwa_context;
+	struct mlx5_nl_vlan_dev *vlan_dev = &vmwa->vlan_dev[0];
+
+	assert(vlan->created);
+	assert(priv->vmwa_context);
+	if (!vlan->created || !vmwa)
+		return;
+	vlan->created = 0;
+	assert(vlan_dev[vlan->tag].refcnt);
+	if (--vlan_dev[vlan->tag].refcnt == 0 &&
+	    vlan_dev[vlan->tag].ifindex) {
+		mlx5_nl_vlan_vmwa_delete(vmwa, vlan_dev[vlan->tag].ifindex);
+		vlan_dev[vlan->tag].ifindex = 0;
+	}
+}
+
+/**
+ * Acquire VLAN interface with specified tag for VM workaround.
+ *
+ * @param[in] dev
+ *   Ethernet device object, Netlink context provider.
+ * @param[in] vlan
+ *   Object representing the network device to acquire.
+ */
+void mlx5_vlan_vmwa_acquire(struct rte_eth_dev *dev,
+			    struct mlx5_vf_vlan *vlan)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_nl_vlan_vmwa_context *vmwa = priv->vmwa_context;
+	struct mlx5_nl_vlan_dev *vlan_dev = &vmwa->vlan_dev[0];
+
+	assert(!vlan->created);
+	assert(priv->vmwa_context);
+	if (vlan->created || !vmwa)
+		return;
+	if (vlan_dev[vlan->tag].refcnt == 0) {
+		assert(!vlan_dev[vlan->tag].ifindex);
+		vlan_dev[vlan->tag].ifindex =
+			mlx5_nl_vlan_vmwa_create(vmwa, vmwa->vf_ifindex,
+						 vlan->tag);
+	}
+	if (vlan_dev[vlan->tag].ifindex) {
+		vlan_dev[vlan->tag].refcnt++;
+		vlan->created = 1;
+	}
+}
+
+/*
+ * Create per ethernet device VLAN VM workaround context
+ */
+struct mlx5_nl_vlan_vmwa_context *
+mlx5_vlan_vmwa_init(struct rte_eth_dev *dev, uint32_t ifindex)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_dev_config *config = &priv->config;
+	struct mlx5_nl_vlan_vmwa_context *vmwa;
+	enum rte_hypervisor hv_type;
+
+	/* Do not engage workaround over PF. */
+	if (!config->vf)
+		return NULL;
+	/* Check whether there is desired virtual environment */
+	hv_type = rte_hypervisor_get();
+	switch (hv_type) {
+	case RTE_HYPERVISOR_UNKNOWN:
+	case RTE_HYPERVISOR_VMWARE:
+		/*
+		 * The "white list" of configurations
+		 * to engage the workaround.
+		 */
+		break;
+	default:
+		/*
+		 * The configuration is not found in the "white list".
+		 * We should not engage the VLAN workaround.
+		 */
+		return NULL;
+	}
+	vmwa = rte_zmalloc(__func__, sizeof(*vmwa), sizeof(uint32_t));
+	if (!vmwa) {
+		DRV_LOG(WARNING,
+			"Can not allocate memory"
+			" for VLAN workaround context");
+		return NULL;
+	}
+	vmwa->nl_socket = mlx5_nl_init(NETLINK_ROUTE);
+	if (vmwa->nl_socket < 0) {
+		DRV_LOG(WARNING,
+			"Can not create Netlink socket"
+			" for VLAN workaround context");
+		rte_free(vmwa);
+		return NULL;
+	}
+	vmwa->nl_sn = random();
+	vmwa->vf_ifindex = ifindex;
+	/* Cleanup for existing VLAN devices. */
+	return vmwa;
+}
+
+/*
+ * Destroy per ethernet device VLAN VM workaround context
+ */
+void mlx5_vlan_vmwa_exit(struct mlx5_nl_vlan_vmwa_context *vmwa)
+{
+	unsigned int i;
+
+	/* Delete all remaining VLAN devices. */
+	for (i = 0; i < RTE_DIM(vmwa->vlan_dev); i++) {
+		if (vmwa->vlan_dev[i].ifindex)
+			mlx5_nl_vlan_vmwa_delete(vmwa,
+						 vmwa->vlan_dev[i].ifindex);
+	}
+	if (vmwa->nl_socket >= 0)
+		close(vmwa->nl_socket);
+	rte_free(vmwa);
+}
-- 
1.8.3.1
* [dpdk-dev] [PATCH v2 23/25] net/mlx5: reduce Netlink commands dependencies
  2020-01-28 10:05 ` [dpdk-dev] [PATCH v2 00/25] Introduce mlx5 common library Matan Azrad
                     ` (21 preceding siblings ...)
  2020-01-28 10:06   ` [dpdk-dev] [PATCH v2 22/25] net/mlx5: separate Netlink command interface Matan Azrad
@ 2020-01-28 10:06   ` Matan Azrad
  2020-01-28 10:06   ` [dpdk-dev] [PATCH v2 24/25] common/mlx5: share Netlink commands Matan Azrad
                     ` (2 subsequent siblings)
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-28 10:06 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
In preparation for moving the Netlink commands to the common library,
reduce their dependencies on net/mlx5.
Replace the ethdev class command parameters with the Netlink socket,
interface index and MAC address data that the commands need.
Improve the Netlink sequence number mechanism so that it is controlled
by the mlx5 Netlink code.
Move mlx5_nl_check_switch_info to mlx5_nl.c since that is the only file
which uses it.
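One way to keep the sequence number inside the Netlink code, rather
than in a per-device field such as priv->nl_sn, is a file-local atomic
counter. The sketch below uses C11 atomics and hypothetical names; it
illustrates the idea only and is not the code from this patch:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical module-private counter; not the code from this patch. */
static atomic_uint_fast32_t nl_sn_counter;

/* Return a new sequence number; each call yields the previous value + 1. */
static uint32_t
nl_sn_generate(void)
{
	return (uint32_t)(atomic_fetch_add(&nl_sn_counter, 1) + 1);
}
```

Callers then request a fresh number for each Netlink message instead of
passing a counter in, which removes one dependency on the ethdev
private data.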
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.c        |  10 +-
 drivers/net/mlx5/mlx5.h        |   3 -
 drivers/net/mlx5/mlx5_ethdev.c |  49 ------
 drivers/net/mlx5/mlx5_mac.c    |  14 +-
 drivers/net/mlx5/mlx5_nl.c     | 329 +++++++++++++++++++++++++----------------
 drivers/net/mlx5/mlx5_nl.h     |  23 +--
 drivers/net/mlx5/mlx5_rxmode.c |  12 +-
 drivers/net/mlx5/mlx5_vlan.c   |   1 -
 8 files changed, 236 insertions(+), 205 deletions(-)
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 353196b..439b7b8 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1270,7 +1270,9 @@ struct mlx5_flow_id_pool *
 	if (priv->reta_idx != NULL)
 		rte_free(priv->reta_idx);
 	if (priv->config.vf)
-		mlx5_nl_mac_addr_flush(dev);
+		mlx5_nl_mac_addr_flush(priv->nl_socket_route, mlx5_ifindex(dev),
+				       dev->data->mac_addrs,
+				       MLX5_MAX_MAC_ADDRESSES, priv->mac_own);
 	if (priv->nl_socket_route >= 0)
 		close(priv->nl_socket_route);
 	if (priv->nl_socket_rdma >= 0)
@@ -2327,7 +2329,6 @@ struct mlx5_flow_id_pool *
 	/* Some internal functions rely on Netlink sockets, open them now. */
 	priv->nl_socket_rdma = mlx5_nl_init(NETLINK_RDMA);
 	priv->nl_socket_route =	mlx5_nl_init(NETLINK_ROUTE);
-	priv->nl_sn = 0;
 	priv->representor = !!switch_info->representor;
 	priv->master = !!switch_info->master;
 	priv->domain_id = RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID;
@@ -2647,7 +2648,10 @@ struct mlx5_flow_id_pool *
 	/* Register MAC address. */
 	claim_zero(mlx5_mac_addr_add(eth_dev, &mac, 0, 0));
 	if (config.vf && config.vf_nl_en)
-		mlx5_nl_mac_addr_sync(eth_dev);
+		mlx5_nl_mac_addr_sync(priv->nl_socket_route,
+				      mlx5_ifindex(eth_dev),
+				      eth_dev->data->mac_addrs,
+				      MLX5_MAX_MAC_ADDRESSES);
 	TAILQ_INIT(&priv->flows);
 	TAILQ_INIT(&priv->ctrl_flows);
 	TAILQ_INIT(&priv->flow_meters);
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 01d0051..9864aa7 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -539,7 +539,6 @@ struct mlx5_priv {
 	/* Context for Verbs allocator. */
 	int nl_socket_rdma; /* Netlink socket (NETLINK_RDMA). */
 	int nl_socket_route; /* Netlink socket (NETLINK_ROUTE). */
-	uint32_t nl_sn; /* Netlink message sequence number. */
 	LIST_HEAD(dbrpage, mlx5_devx_dbr_page) dbrpgs; /* Door-bell pages. */
 	struct mlx5_nl_vlan_vmwa_context *vmwa_context; /* VLAN WA context. */
 	struct mlx5_flow_id_pool *qrss_id_pool;
@@ -617,8 +616,6 @@ int mlx5_sysfs_switch_info(unsigned int ifindex,
 			   struct mlx5_switch_info *info);
 void mlx5_sysfs_check_switch_info(bool device_dir,
 				  struct mlx5_switch_info *switch_info);
-void mlx5_nl_check_switch_info(bool nun_vf_set,
-			       struct mlx5_switch_info *switch_info);
 void mlx5_translate_port_name(const char *port_name_in,
 			      struct mlx5_switch_info *port_info_out);
 void mlx5_intr_callback_unregister(const struct rte_intr_handle *handle,
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 2628e64..5484104 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -1891,55 +1891,6 @@ struct mlx5_priv *
 }
 
 /**
- * Analyze gathered port parameters via Netlink to recognize master
- * and representor devices for E-Switch configuration.
- *
- * @param[in] num_vf_set
- *   flag of presence of number of VFs port attribute.
- * @param[inout] switch_info
- *   Port information, including port name as a number and port name
- *   type if recognized
- *
- * @return
- *   master and representor flags are set in switch_info according to
- *   recognized parameters (if any).
- */
-void
-mlx5_nl_check_switch_info(bool num_vf_set,
-			  struct mlx5_switch_info *switch_info)
-{
-	switch (switch_info->name_type) {
-	case MLX5_PHYS_PORT_NAME_TYPE_UNKNOWN:
-		/*
-		 * Name is not recognized, assume the master,
-		 * check the number of VFs key presence.
-		 */
-		switch_info->master = num_vf_set;
-		break;
-	case MLX5_PHYS_PORT_NAME_TYPE_NOTSET:
-		/*
-		 * Name is not set, this assumes the legacy naming
-		 * schema for master, just check if there is a
-		 * number of VFs key.
-		 */
-		switch_info->master = num_vf_set;
-		break;
-	case MLX5_PHYS_PORT_NAME_TYPE_UPLINK:
-		/* New uplink naming schema recognized. */
-		switch_info->master = 1;
-		break;
-	case MLX5_PHYS_PORT_NAME_TYPE_LEGACY:
-		/* Legacy representors naming schema. */
-		switch_info->representor = !num_vf_set;
-		break;
-	case MLX5_PHYS_PORT_NAME_TYPE_PFVF:
-		/* New representors naming schema. */
-		switch_info->representor = 1;
-		break;
-	}
-}
-
-/**
  * Analyze gathered port parameters via sysfs to recognize master
  * and representor devices for E-Switch configuration.
  *
diff --git a/drivers/net/mlx5/mlx5_mac.c b/drivers/net/mlx5/mlx5_mac.c
index a646b90..0ab2a0e 100644
--- a/drivers/net/mlx5/mlx5_mac.c
+++ b/drivers/net/mlx5/mlx5_mac.c
@@ -74,8 +74,9 @@
 	if (rte_is_zero_ether_addr(&dev->data->mac_addrs[index]))
 		return;
 	if (vf)
-		mlx5_nl_mac_addr_remove(dev, &dev->data->mac_addrs[index],
-					index);
+		mlx5_nl_mac_addr_remove(priv->nl_socket_route,
+					mlx5_ifindex(dev), priv->mac_own,
+					&dev->data->mac_addrs[index], index);
 	memset(&dev->data->mac_addrs[index], 0, sizeof(struct rte_ether_addr));
 }
 
@@ -117,7 +118,9 @@
 		return -rte_errno;
 	}
 	if (vf) {
-		int ret = mlx5_nl_mac_addr_add(dev, mac, index);
+		int ret = mlx5_nl_mac_addr_add(priv->nl_socket_route,
+					       mlx5_ifindex(dev), priv->mac_own,
+					       mac, index);
 
 		if (ret)
 			return ret;
@@ -209,8 +212,9 @@
 			if (priv->master == 1) {
 				priv = dev->data->dev_private;
 				return mlx5_nl_vf_mac_addr_modify
-					(&rte_eth_devices[port_id],
-					 mac_addr, priv->representor_id);
+				       (priv->nl_socket_route,
+					mlx5_ifindex(&rte_eth_devices[port_id]),
+					mac_addr, priv->representor_id);
 			}
 		}
 		rte_errno = -ENOTSUP;
diff --git a/drivers/net/mlx5/mlx5_nl.c b/drivers/net/mlx5/mlx5_nl.c
index 3fe4b6f..6b8ca00 100644
--- a/drivers/net/mlx5/mlx5_nl.c
+++ b/drivers/net/mlx5/mlx5_nl.c
@@ -17,8 +17,11 @@
 #include <unistd.h>
 
 #include <rte_errno.h>
+#include <rte_atomic.h>
+#include <rte_ether.h>
 
 #include "mlx5.h"
+#include "mlx5_nl.h"
 #include "mlx5_utils.h"
 
 /* Size of the buffer to receive kernel messages */
@@ -109,6 +112,11 @@ struct mlx5_nl_ifindex_data {
 	uint32_t portnum; /**< IB device max port number (out). */
 };
 
+rte_atomic32_t atomic_sn = RTE_ATOMIC32_INIT(0);
+
+/* Generate Netlink sequence number. */
+#define MLX5_NL_SN_GENERATE ((uint32_t)rte_atomic32_add_return(&atomic_sn, 1))
+
 /**
  * Opens a Netlink socket.
  *
@@ -369,8 +377,10 @@ struct mlx5_nl_ifindex_data {
 /**
  * Get bridge MAC addresses.
  *
- * @param dev
- *   Pointer to Ethernet device.
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
  * @param mac[out]
  *   Pointer to the array table of MAC addresses to fill.
  *   Its size should be of MLX5_MAX_MAC_ADDRESSES.
@@ -381,11 +391,9 @@ struct mlx5_nl_ifindex_data {
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-mlx5_nl_mac_addr_list(struct rte_eth_dev *dev, struct rte_ether_addr (*mac)[],
-		      int *mac_n)
+mlx5_nl_mac_addr_list(int nlsk_fd, unsigned int iface_idx,
+		      struct rte_ether_addr (*mac)[], int *mac_n)
 {
-	struct mlx5_priv *priv = dev->data->dev_private;
-	unsigned int iface_idx = mlx5_ifindex(dev);
 	struct {
 		struct nlmsghdr	hdr;
 		struct ifinfomsg ifm;
@@ -404,33 +412,33 @@ struct mlx5_nl_ifindex_data {
 		.mac = mac,
 		.mac_n = 0,
 	};
-	int fd;
+	uint32_t sn = MLX5_NL_SN_GENERATE;
 	int ret;
-	uint32_t sn = priv->nl_sn++;
 
-	if (priv->nl_socket_route == -1)
+	if (nlsk_fd == -1)
 		return 0;
-	fd = priv->nl_socket_route;
-	ret = mlx5_nl_request(fd, &req.hdr, sn, &req.ifm,
+	ret = mlx5_nl_request(nlsk_fd, &req.hdr, sn, &req.ifm,
 			      sizeof(struct ifinfomsg));
 	if (ret < 0)
 		goto error;
-	ret = mlx5_nl_recv(fd, sn, mlx5_nl_mac_addr_cb, &data);
+	ret = mlx5_nl_recv(nlsk_fd, sn, mlx5_nl_mac_addr_cb, &data);
 	if (ret < 0)
 		goto error;
 	*mac_n = data.mac_n;
 	return 0;
 error:
-	DRV_LOG(DEBUG, "port %u cannot retrieve MAC address list %s",
-		dev->data->port_id, strerror(rte_errno));
+	DRV_LOG(DEBUG, "Interface %u cannot retrieve MAC address list %s",
+		iface_idx, strerror(rte_errno));
 	return -rte_errno;
 }
 
 /**
  * Modify the MAC address neighbour table with Netlink.
  *
- * @param dev
- *   Pointer to Ethernet device.
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
  * @param mac
  *   MAC address to consider.
  * @param add
@@ -440,11 +448,9 @@ struct mlx5_nl_ifindex_data {
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-mlx5_nl_mac_addr_modify(struct rte_eth_dev *dev, struct rte_ether_addr *mac,
-			int add)
+mlx5_nl_mac_addr_modify(int nlsk_fd, unsigned int iface_idx,
+			struct rte_ether_addr *mac, int add)
 {
-	struct mlx5_priv *priv = dev->data->dev_private;
-	unsigned int iface_idx = mlx5_ifindex(dev);
 	struct {
 		struct nlmsghdr hdr;
 		struct ndmsg ndm;
@@ -468,28 +474,26 @@ struct mlx5_nl_ifindex_data {
 			.rta_len = RTA_LENGTH(RTE_ETHER_ADDR_LEN),
 		},
 	};
-	int fd;
+	uint32_t sn = MLX5_NL_SN_GENERATE;
 	int ret;
-	uint32_t sn = priv->nl_sn++;
 
-	if (priv->nl_socket_route == -1)
+	if (nlsk_fd == -1)
 		return 0;
-	fd = priv->nl_socket_route;
 	memcpy(RTA_DATA(&req.rta), mac, RTE_ETHER_ADDR_LEN);
 	req.hdr.nlmsg_len = NLMSG_ALIGN(req.hdr.nlmsg_len) +
 		RTA_ALIGN(req.rta.rta_len);
-	ret = mlx5_nl_send(fd, &req.hdr, sn);
+	ret = mlx5_nl_send(nlsk_fd, &req.hdr, sn);
 	if (ret < 0)
 		goto error;
-	ret = mlx5_nl_recv(fd, sn, NULL, NULL);
+	ret = mlx5_nl_recv(nlsk_fd, sn, NULL, NULL);
 	if (ret < 0)
 		goto error;
 	return 0;
 error:
 	DRV_LOG(DEBUG,
-		"port %u cannot %s MAC address %02X:%02X:%02X:%02X:%02X:%02X"
-		" %s",
-		dev->data->port_id,
+		"Interface %u cannot %s MAC address"
+		" %02X:%02X:%02X:%02X:%02X:%02X %s",
+		iface_idx,
 		add ? "add" : "remove",
 		mac->addr_bytes[0], mac->addr_bytes[1],
 		mac->addr_bytes[2], mac->addr_bytes[3],
@@ -501,8 +505,10 @@ struct mlx5_nl_ifindex_data {
 /**
  * Modify the VF MAC address neighbour table with Netlink.
  *
- * @param dev
- *    Pointer to Ethernet device.
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
  * @param mac
  *    MAC address to consider.
  * @param vf_index
@@ -512,12 +518,10 @@ struct mlx5_nl_ifindex_data {
  *    0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
-mlx5_nl_vf_mac_addr_modify(struct rte_eth_dev *dev,
+mlx5_nl_vf_mac_addr_modify(int nlsk_fd, unsigned int iface_idx,
 			   struct rte_ether_addr *mac, int vf_index)
 {
-	int fd, ret;
-	struct mlx5_priv *priv = dev->data->dev_private;
-	unsigned int iface_idx = mlx5_ifindex(dev);
+	int ret;
 	struct {
 		struct nlmsghdr hdr;
 		struct ifinfomsg ifm;
@@ -546,10 +550,10 @@ struct mlx5_nl_ifindex_data {
 			.rta_type = IFLA_VF_MAC,
 		},
 	};
-	uint32_t sn = priv->nl_sn++;
 	struct ifla_vf_mac ivm = {
 		.vf = vf_index,
 	};
+	uint32_t sn = MLX5_NL_SN_GENERATE;
 
 	memcpy(&ivm.mac, mac, RTE_ETHER_ADDR_LEN);
 	memcpy(RTA_DATA(&req.vf_mac_rta), &ivm, sizeof(ivm));
@@ -564,13 +568,12 @@ struct mlx5_nl_ifindex_data {
 	req.vf_info_rta.rta_len = RTE_PTR_DIFF(NLMSG_TAIL(&req.hdr),
 					       &req.vf_info_rta);
 
-	fd = priv->nl_socket_route;
-	if (fd < 0)
+	if (nlsk_fd < 0)
 		return -1;
-	ret = mlx5_nl_send(fd, &req.hdr, sn);
+	ret = mlx5_nl_send(nlsk_fd, &req.hdr, sn);
 	if (ret < 0)
 		goto error;
-	ret = mlx5_nl_recv(fd, sn, NULL, NULL);
+	ret = mlx5_nl_recv(nlsk_fd, sn, NULL, NULL);
 	if (ret < 0)
 		goto error;
 	return 0;
@@ -589,8 +592,12 @@ struct mlx5_nl_ifindex_data {
 /**
  * Add a MAC address.
  *
- * @param dev
- *   Pointer to Ethernet device.
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
+ * @param mac_own
+ *   BITFIELD_DECLARE array to store the mac.
  * @param mac
  *   MAC address to register.
  * @param index
@@ -600,15 +607,15 @@ struct mlx5_nl_ifindex_data {
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
-mlx5_nl_mac_addr_add(struct rte_eth_dev *dev, struct rte_ether_addr *mac,
+mlx5_nl_mac_addr_add(int nlsk_fd, unsigned int iface_idx,
+		     uint64_t *mac_own, struct rte_ether_addr *mac,
 		     uint32_t index)
 {
-	struct mlx5_priv *priv = dev->data->dev_private;
 	int ret;
 
-	ret = mlx5_nl_mac_addr_modify(dev, mac, 1);
+	ret = mlx5_nl_mac_addr_modify(nlsk_fd, iface_idx, mac, 1);
 	if (!ret)
-		BITFIELD_SET(priv->mac_own, index);
+		BITFIELD_SET(mac_own, index);
 	if (ret == -EEXIST)
 		return 0;
 	return ret;
@@ -617,8 +624,12 @@ struct mlx5_nl_ifindex_data {
 /**
  * Remove a MAC address.
  *
- * @param dev
- *   Pointer to Ethernet device.
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
+ * @param mac_own
+ *   BITFIELD_DECLARE array to store the mac.
  * @param mac
  *   MAC address to remove.
  * @param index
@@ -628,46 +639,50 @@ struct mlx5_nl_ifindex_data {
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
-mlx5_nl_mac_addr_remove(struct rte_eth_dev *dev, struct rte_ether_addr *mac,
-			uint32_t index)
+mlx5_nl_mac_addr_remove(int nlsk_fd, unsigned int iface_idx, uint64_t *mac_own,
+			struct rte_ether_addr *mac, uint32_t index)
 {
-	struct mlx5_priv *priv = dev->data->dev_private;
-
-	BITFIELD_RESET(priv->mac_own, index);
-	return mlx5_nl_mac_addr_modify(dev, mac, 0);
+	BITFIELD_RESET(mac_own, index);
+	return mlx5_nl_mac_addr_modify(nlsk_fd, iface_idx, mac, 0);
 }
 
 /**
  * Synchronize Netlink bridge table to the internal table.
  *
- * @param dev
- *   Pointer to Ethernet device.
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
+ * @param mac_addrs
+ *   Mac addresses array to sync.
+ * @param n
+ *   @p mac_addrs array size.
  */
 void
-mlx5_nl_mac_addr_sync(struct rte_eth_dev *dev)
+mlx5_nl_mac_addr_sync(int nlsk_fd, unsigned int iface_idx,
+		      struct rte_ether_addr *mac_addrs, int n)
 {
-	struct rte_ether_addr macs[MLX5_MAX_MAC_ADDRESSES];
+	struct rte_ether_addr macs[n];
 	int macs_n = 0;
 	int i;
 	int ret;
 
-	ret = mlx5_nl_mac_addr_list(dev, &macs, &macs_n);
+	ret = mlx5_nl_mac_addr_list(nlsk_fd, iface_idx, &macs, &macs_n);
 	if (ret)
 		return;
 	for (i = 0; i != macs_n; ++i) {
 		int j;
 
 		/* Verify the address is not in the array yet. */
-		for (j = 0; j != MLX5_MAX_MAC_ADDRESSES; ++j)
-			if (rte_is_same_ether_addr(&macs[i],
-					       &dev->data->mac_addrs[j]))
+		for (j = 0; j != n; ++j)
+			if (rte_is_same_ether_addr(&macs[i], &mac_addrs[j]))
 				break;
-		if (j != MLX5_MAX_MAC_ADDRESSES)
+		if (j != n)
 			continue;
 		/* Find the first entry available. */
-		for (j = 0; j != MLX5_MAX_MAC_ADDRESSES; ++j) {
-			if (rte_is_zero_ether_addr(&dev->data->mac_addrs[j])) {
-				dev->data->mac_addrs[j] = macs[i];
+		for (j = 0; j != n; ++j) {
+			if (rte_is_zero_ether_addr(&mac_addrs[j])) {
+				mac_addrs[j] = macs[i];
 				break;
 			}
 		}
@@ -677,28 +692,40 @@ struct mlx5_nl_ifindex_data {
 /**
  * Flush all added MAC addresses.
  *
- * @param dev
- *   Pointer to Ethernet device.
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
+ * @param[in] mac_addrs
+ *   Mac addresses array to flush.
+ * @param n
+ *   @p mac_addrs array size.
+ * @param mac_own
+ *   BITFIELD_DECLARE array to store the mac.
  */
 void
-mlx5_nl_mac_addr_flush(struct rte_eth_dev *dev)
+mlx5_nl_mac_addr_flush(int nlsk_fd, unsigned int iface_idx,
+		       struct rte_ether_addr *mac_addrs, int n,
+		       uint64_t *mac_own)
 {
-	struct mlx5_priv *priv = dev->data->dev_private;
 	int i;
 
-	for (i = MLX5_MAX_MAC_ADDRESSES - 1; i >= 0; --i) {
-		struct rte_ether_addr *m = &dev->data->mac_addrs[i];
+	for (i = n - 1; i >= 0; --i) {
+		struct rte_ether_addr *m = &mac_addrs[i];
 
-		if (BITFIELD_ISSET(priv->mac_own, i))
-			mlx5_nl_mac_addr_remove(dev, m, i);
+		if (BITFIELD_ISSET(mac_own, i))
+			mlx5_nl_mac_addr_remove(nlsk_fd, iface_idx, mac_own, m,
+						i);
 	}
 }
 
 /**
  * Enable promiscuous / all multicast mode through Netlink.
  *
- * @param dev
- *   Pointer to Ethernet device structure.
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
  * @param flags
  *   IFF_PROMISC for promiscuous, IFF_ALLMULTI for allmulti.
  * @param enable
@@ -708,10 +735,9 @@ struct mlx5_nl_ifindex_data {
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-mlx5_nl_device_flags(struct rte_eth_dev *dev, uint32_t flags, int enable)
+mlx5_nl_device_flags(int nlsk_fd, unsigned int iface_idx, uint32_t flags,
+		     int enable)
 {
-	struct mlx5_priv *priv = dev->data->dev_private;
-	unsigned int iface_idx = mlx5_ifindex(dev);
 	struct {
 		struct nlmsghdr hdr;
 		struct ifinfomsg ifi;
@@ -727,14 +753,13 @@ struct mlx5_nl_ifindex_data {
 			.ifi_index = iface_idx,
 		},
 	};
-	int fd;
+	uint32_t sn = MLX5_NL_SN_GENERATE;
 	int ret;
 
 	assert(!(flags & ~(IFF_PROMISC | IFF_ALLMULTI)));
-	if (priv->nl_socket_route < 0)
+	if (nlsk_fd < 0)
 		return 0;
-	fd = priv->nl_socket_route;
-	ret = mlx5_nl_send(fd, &req.hdr, priv->nl_sn++);
+	ret = mlx5_nl_send(nlsk_fd, &req.hdr, sn);
 	if (ret < 0)
 		return ret;
 	return 0;
@@ -743,8 +768,10 @@ struct mlx5_nl_ifindex_data {
 /**
  * Enable promiscuous mode through Netlink.
  *
- * @param dev
- *   Pointer to Ethernet device structure.
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
  * @param enable
  *   Nonzero to enable, disable otherwise.
  *
@@ -752,14 +779,14 @@ struct mlx5_nl_ifindex_data {
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
-mlx5_nl_promisc(struct rte_eth_dev *dev, int enable)
+mlx5_nl_promisc(int nlsk_fd, unsigned int iface_idx, int enable)
 {
-	int ret = mlx5_nl_device_flags(dev, IFF_PROMISC, enable);
+	int ret = mlx5_nl_device_flags(nlsk_fd, iface_idx, IFF_PROMISC, enable);
 
 	if (ret)
 		DRV_LOG(DEBUG,
-			"port %u cannot %s promisc mode: Netlink error %s",
-			dev->data->port_id, enable ? "enable" : "disable",
+			"Interface %u cannot %s promisc mode: Netlink error %s",
+			iface_idx, enable ? "enable" : "disable",
 			strerror(rte_errno));
 	return ret;
 }
@@ -767,8 +794,10 @@ struct mlx5_nl_ifindex_data {
 /**
  * Enable all multicast mode through Netlink.
  *
- * @param dev
- *   Pointer to Ethernet device structure.
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
  * @param enable
  *   Nonzero to enable, disable otherwise.
  *
@@ -776,14 +805,15 @@ struct mlx5_nl_ifindex_data {
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
-mlx5_nl_allmulti(struct rte_eth_dev *dev, int enable)
+mlx5_nl_allmulti(int nlsk_fd, unsigned int iface_idx, int enable)
 {
-	int ret = mlx5_nl_device_flags(dev, IFF_ALLMULTI, enable);
+	int ret = mlx5_nl_device_flags(nlsk_fd, iface_idx, IFF_ALLMULTI,
+				       enable);
 
 	if (ret)
 		DRV_LOG(DEBUG,
-			"port %u cannot %s allmulti mode: Netlink error %s",
-			dev->data->port_id, enable ? "enable" : "disable",
+			"Interface %u cannot %s allmulti: Netlink error %s",
+			iface_idx, enable ? "enable" : "disable",
 			strerror(rte_errno));
 	return ret;
 }
@@ -879,7 +909,6 @@ struct mlx5_nl_ifindex_data {
 unsigned int
 mlx5_nl_ifindex(int nl, const char *name, uint32_t pindex)
 {
-	uint32_t seq = random();
 	struct mlx5_nl_ifindex_data data = {
 		.name = name,
 		.flags = 0,
@@ -900,19 +929,20 @@ struct mlx5_nl_ifindex_data {
 		},
 	};
 	struct nlattr *na;
+	uint32_t sn = MLX5_NL_SN_GENERATE;
 	int ret;
 
-	ret = mlx5_nl_send(nl, &req.nh, seq);
+	ret = mlx5_nl_send(nl, &req.nh, sn);
 	if (ret < 0)
 		return 0;
-	ret = mlx5_nl_recv(nl, seq, mlx5_nl_cmdget_cb, &data);
+	ret = mlx5_nl_recv(nl, sn, mlx5_nl_cmdget_cb, &data);
 	if (ret < 0)
 		return 0;
 	if (!(data.flags & MLX5_NL_CMD_GET_IB_NAME) ||
 	    !(data.flags & MLX5_NL_CMD_GET_IB_INDEX))
 		goto error;
 	data.flags = 0;
-	++seq;
+	sn = MLX5_NL_SN_GENERATE;
 	req.nh.nlmsg_type = RDMA_NL_GET_TYPE(RDMA_NL_NLDEV,
 					     RDMA_NLDEV_CMD_PORT_GET);
 	req.nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
@@ -927,10 +957,10 @@ struct mlx5_nl_ifindex_data {
 	na->nla_type = RDMA_NLDEV_ATTR_PORT_INDEX;
 	memcpy((void *)((uintptr_t)na + NLA_HDRLEN),
 	       &pindex, sizeof(pindex));
-	ret = mlx5_nl_send(nl, &req.nh, seq);
+	ret = mlx5_nl_send(nl, &req.nh, sn);
 	if (ret < 0)
 		return 0;
-	ret = mlx5_nl_recv(nl, seq, mlx5_nl_cmdget_cb, &data);
+	ret = mlx5_nl_recv(nl, sn, mlx5_nl_cmdget_cb, &data);
 	if (ret < 0)
 		return 0;
 	if (!(data.flags & MLX5_NL_CMD_GET_IB_NAME) ||
@@ -959,7 +989,6 @@ struct mlx5_nl_ifindex_data {
 unsigned int
 mlx5_nl_portnum(int nl, const char *name)
 {
-	uint32_t seq = random();
 	struct mlx5_nl_ifindex_data data = {
 		.flags = 0,
 		.name = name,
@@ -972,12 +1001,13 @@ struct mlx5_nl_ifindex_data {
 					       RDMA_NLDEV_CMD_GET),
 		.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK | NLM_F_DUMP,
 	};
+	uint32_t sn = MLX5_NL_SN_GENERATE;
 	int ret;
 
-	ret = mlx5_nl_send(nl, &req, seq);
+	ret = mlx5_nl_send(nl, &req, sn);
 	if (ret < 0)
 		return 0;
-	ret = mlx5_nl_recv(nl, seq, mlx5_nl_cmdget_cb, &data);
+	ret = mlx5_nl_recv(nl, sn, mlx5_nl_cmdget_cb, &data);
 	if (ret < 0)
 		return 0;
 	if (!(data.flags & MLX5_NL_CMD_GET_IB_NAME) ||
@@ -992,6 +1022,55 @@ struct mlx5_nl_ifindex_data {
 }
 
 /**
+ * Analyze gathered port parameters via Netlink to recognize master
+ * and representor devices for E-Switch configuration.
+ *
+ * @param[in] num_vf_set
+ *   flag of presence of number of VFs port attribute.
+ * @param[inout] switch_info
+ *   Port information, including port name as a number and port name
+ *   type if recognized
+ *
+ * @return
+ *   master and representor flags are set in switch_info according to
+ *   recognized parameters (if any).
+ */
+static void
+mlx5_nl_check_switch_info(bool num_vf_set,
+			  struct mlx5_switch_info *switch_info)
+{
+	switch (switch_info->name_type) {
+	case MLX5_PHYS_PORT_NAME_TYPE_UNKNOWN:
+		/*
+		 * Name is not recognized, assume the master,
+		 * check the number of VFs key presence.
+		 */
+		switch_info->master = num_vf_set;
+		break;
+	case MLX5_PHYS_PORT_NAME_TYPE_NOTSET:
+		/*
+		 * Name is not set, this assumes the legacy naming
+		 * schema for master, just check if there is a
+		 * number of VFs key.
+		 */
+		switch_info->master = num_vf_set;
+		break;
+	case MLX5_PHYS_PORT_NAME_TYPE_UPLINK:
+		/* New uplink naming schema recognized. */
+		switch_info->master = 1;
+		break;
+	case MLX5_PHYS_PORT_NAME_TYPE_LEGACY:
+		/* Legacy representors naming schema. */
+		switch_info->representor = !num_vf_set;
+		break;
+	case MLX5_PHYS_PORT_NAME_TYPE_PFVF:
+		/* New representors naming schema. */
+		switch_info->representor = 1;
+		break;
+	}
+}
+
+/**
  * Process switch information from Netlink message.
  *
  * @param nh
@@ -1072,7 +1151,6 @@ struct mlx5_nl_ifindex_data {
 mlx5_nl_switch_info(int nl, unsigned int ifindex,
 		    struct mlx5_switch_info *info)
 {
-	uint32_t seq = random();
 	struct {
 		struct nlmsghdr nh;
 		struct ifinfomsg info;
@@ -1096,11 +1174,12 @@ struct mlx5_nl_ifindex_data {
 		},
 		.extmask = RTE_LE32(1),
 	};
+	uint32_t sn = MLX5_NL_SN_GENERATE;
 	int ret;
 
-	ret = mlx5_nl_send(nl, &req.nh, seq);
+	ret = mlx5_nl_send(nl, &req.nh, sn);
 	if (ret >= 0)
-		ret = mlx5_nl_recv(nl, seq, mlx5_nl_switch_info_cb, info);
+		ret = mlx5_nl_recv(nl, sn, mlx5_nl_switch_info_cb, info);
 	if (info->master && info->representor) {
 		DRV_LOG(ERR, "ifindex %u device is recognized as master"
 			     " and as representor", ifindex);
@@ -1122,6 +1201,7 @@ struct mlx5_nl_ifindex_data {
 mlx5_nl_vlan_vmwa_delete(struct mlx5_nl_vlan_vmwa_context *vmwa,
 		      uint32_t ifindex)
 {
+	uint32_t sn = MLX5_NL_SN_GENERATE;
 	int ret;
 	struct {
 		struct nlmsghdr nh;
@@ -1139,18 +1219,12 @@ struct mlx5_nl_ifindex_data {
 	};
 
 	if (ifindex) {
-		++vmwa->nl_sn;
-		if (!vmwa->nl_sn)
-			++vmwa->nl_sn;
-		ret = mlx5_nl_send(vmwa->nl_socket, &req.nh, vmwa->nl_sn);
+		ret = mlx5_nl_send(vmwa->nl_socket, &req.nh, sn);
 		if (ret >= 0)
-			ret = mlx5_nl_recv(vmwa->nl_socket,
-					   vmwa->nl_sn,
-					   NULL, NULL);
+			ret = mlx5_nl_recv(vmwa->nl_socket, sn, NULL, NULL);
 		if (ret < 0)
-			DRV_LOG(WARNING, "netlink: error deleting"
-					 " VLAN WA ifindex %u, %d",
-					 ifindex, ret);
+			DRV_LOG(WARNING, "netlink: error deleting VLAN WA"
+				" ifindex %u, %d", ifindex, ret);
 	}
 }
 
@@ -1202,8 +1276,7 @@ struct mlx5_nl_ifindex_data {
  */
 uint32_t
 mlx5_nl_vlan_vmwa_create(struct mlx5_nl_vlan_vmwa_context *vmwa,
-		      uint32_t ifindex,
-		      uint16_t tag)
+			 uint32_t ifindex, uint16_t tag)
 {
 	struct nlmsghdr *nlh;
 	struct ifinfomsg *ifm;
@@ -1220,12 +1293,10 @@ struct mlx5_nl_ifindex_data {
 		    NLMSG_ALIGN(sizeof(uint16_t)) + 16];
 	struct nlattr *na_info;
 	struct nlattr *na_vlan;
+	uint32_t sn = MLX5_NL_SN_GENERATE;
 	int ret;
 
 	memset(buf, 0, sizeof(buf));
-	++vmwa->nl_sn;
-	if (!vmwa->nl_sn)
-		++vmwa->nl_sn;
 	nlh = (struct nlmsghdr *)buf;
 	nlh->nlmsg_len = sizeof(struct nlmsghdr);
 	nlh->nlmsg_type = RTM_NEWLINK;
@@ -1249,20 +1320,18 @@ struct mlx5_nl_ifindex_data {
 	nl_attr_nest_end(nlh, na_vlan);
 	nl_attr_nest_end(nlh, na_info);
 	assert(sizeof(buf) >= nlh->nlmsg_len);
-	ret = mlx5_nl_send(vmwa->nl_socket, nlh, vmwa->nl_sn);
+	ret = mlx5_nl_send(vmwa->nl_socket, nlh, sn);
 	if (ret >= 0)
-		ret = mlx5_nl_recv(vmwa->nl_socket, vmwa->nl_sn, NULL, NULL);
+		ret = mlx5_nl_recv(vmwa->nl_socket, sn, NULL, NULL);
 	if (ret < 0) {
-		DRV_LOG(WARNING,
-			"netlink: VLAN %s create failure (%d)",
-			name, ret);
+		DRV_LOG(WARNING, "netlink: VLAN %s create failure (%d)", name,
+			ret);
 	}
 	/* Try to get ifindex of created or pre-existing device. */
 	ret = if_nametoindex(name);
 	if (!ret) {
-		DRV_LOG(WARNING,
-			"VLAN %s failed to get index (%d)",
-			name, errno);
+		DRV_LOG(WARNING, "VLAN %s failed to get index (%d)", name,
+			errno);
 		return 0;
 	}
 	return ret;
diff --git a/drivers/net/mlx5/mlx5_nl.h b/drivers/net/mlx5/mlx5_nl.h
index 7903673..9be87c0 100644
--- a/drivers/net/mlx5/mlx5_nl.h
+++ b/drivers/net/mlx5/mlx5_nl.h
@@ -39,30 +39,33 @@ struct mlx5_nl_vlan_dev {
  */
 struct mlx5_nl_vlan_vmwa_context {
 	int nl_socket;
-	uint32_t nl_sn;
 	uint32_t vf_ifindex;
 	struct mlx5_nl_vlan_dev vlan_dev[4096];
 };
 
 
 int mlx5_nl_init(int protocol);
-int mlx5_nl_mac_addr_add(struct rte_eth_dev *dev, struct rte_ether_addr *mac,
-			 uint32_t index);
-int mlx5_nl_mac_addr_remove(struct rte_eth_dev *dev, struct rte_ether_addr *mac,
+int mlx5_nl_mac_addr_add(int nlsk_fd, unsigned int iface_idx, uint64_t *mac_own,
+			 struct rte_ether_addr *mac, uint32_t index);
+int mlx5_nl_mac_addr_remove(int nlsk_fd, unsigned int iface_idx,
+			    uint64_t *mac_own, struct rte_ether_addr *mac,
 			    uint32_t index);
-void mlx5_nl_mac_addr_sync(struct rte_eth_dev *dev);
-void mlx5_nl_mac_addr_flush(struct rte_eth_dev *dev);
-int mlx5_nl_promisc(struct rte_eth_dev *dev, int enable);
-int mlx5_nl_allmulti(struct rte_eth_dev *dev, int enable);
+void mlx5_nl_mac_addr_sync(int nlsk_fd, unsigned int iface_idx,
+			   struct rte_ether_addr *mac_addrs, int n);
+void mlx5_nl_mac_addr_flush(int nlsk_fd, unsigned int iface_idx,
+			    struct rte_ether_addr *mac_addrs, int n,
+			    uint64_t *mac_own);
+int mlx5_nl_promisc(int nlsk_fd, unsigned int iface_idx, int enable);
+int mlx5_nl_allmulti(int nlsk_fd, unsigned int iface_idx, int enable);
 unsigned int mlx5_nl_portnum(int nl, const char *name);
 unsigned int mlx5_nl_ifindex(int nl, const char *name, uint32_t pindex);
-int mlx5_nl_vf_mac_addr_modify(struct rte_eth_dev *dev,
+int mlx5_nl_vf_mac_addr_modify(int nlsk_fd, unsigned int iface_idx,
 			       struct rte_ether_addr *mac, int vf_index);
 int mlx5_nl_switch_info(int nl, unsigned int ifindex,
 			struct mlx5_switch_info *info);
 
 void mlx5_nl_vlan_vmwa_delete(struct mlx5_nl_vlan_vmwa_context *vmwa,
-			   uint32_t ifindex);
+			      uint32_t ifindex);
 uint32_t mlx5_nl_vlan_vmwa_create(struct mlx5_nl_vlan_vmwa_context *vmwa,
 				  uint32_t ifindex, uint16_t tag);
 
diff --git a/drivers/net/mlx5/mlx5_rxmode.c b/drivers/net/mlx5/mlx5_rxmode.c
index 760cc2f..84c8b05 100644
--- a/drivers/net/mlx5/mlx5_rxmode.c
+++ b/drivers/net/mlx5/mlx5_rxmode.c
@@ -47,7 +47,8 @@
 		return 0;
 	}
 	if (priv->config.vf) {
-		ret = mlx5_nl_promisc(dev, 1);
+		ret = mlx5_nl_promisc(priv->nl_socket_route, mlx5_ifindex(dev),
+				      1);
 		if (ret)
 			return ret;
 	}
@@ -80,7 +81,8 @@
 
 	dev->data->promiscuous = 0;
 	if (priv->config.vf) {
-		ret = mlx5_nl_promisc(dev, 0);
+		ret = mlx5_nl_promisc(priv->nl_socket_route, mlx5_ifindex(dev),
+				      0);
 		if (ret)
 			return ret;
 	}
@@ -120,7 +122,8 @@
 		return 0;
 	}
 	if (priv->config.vf) {
-		ret = mlx5_nl_allmulti(dev, 1);
+		ret = mlx5_nl_allmulti(priv->nl_socket_route, mlx5_ifindex(dev),
+				       1);
 		if (ret)
 			goto error;
 	}
@@ -153,7 +156,8 @@
 
 	dev->data->all_multicast = 0;
 	if (priv->config.vf) {
-		ret = mlx5_nl_allmulti(dev, 0);
+		ret = mlx5_nl_allmulti(priv->nl_socket_route, mlx5_ifindex(dev),
+				       0);
 		if (ret)
 			goto error;
 	}
diff --git a/drivers/net/mlx5/mlx5_vlan.c b/drivers/net/mlx5/mlx5_vlan.c
index fb52d8f..fc1a91c 100644
--- a/drivers/net/mlx5/mlx5_vlan.c
+++ b/drivers/net/mlx5/mlx5_vlan.c
@@ -304,7 +304,6 @@ struct mlx5_nl_vlan_vmwa_context *
 		rte_free(vmwa);
 		return NULL;
 	}
-	vmwa->nl_sn = random();
 	vmwa->vf_ifindex = ifindex;
 	/* Cleanup for existing VLAN devices. */
 	return vmwa;
-- 
1.8.3.1
* [dpdk-dev] [PATCH v2 24/25] common/mlx5: share Netlink commands
  2020-01-28 10:05 ` [dpdk-dev] [PATCH v2 00/25] Introduce mlx5 common library Matan Azrad
                     ` (22 preceding siblings ...)
  2020-01-28 10:06   ` [dpdk-dev] [PATCH v2 23/25] net/mlx5: reduce Netlink commands dependencies Matan Azrad
@ 2020-01-28 10:06   ` Matan Azrad
  2020-01-28 10:06   ` [dpdk-dev] [PATCH v2 25/25] common/mlx5: support ROCE disable through Netlink Matan Azrad
  2020-01-28 16:27   ` [dpdk-dev] [PATCH v3 00/25] Introduce mlx5 common library Matan Azrad
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-28 10:06 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
Move the Netlink mechanism and its dependencies from net/mlx5 to
common/mlx5 so that other mlx5 drivers can use them.
The dependencies are the BITFIELD defines, the ppc64 compilation
workaround for the bool type, and the function mlx5_translate_port_name.
Update the build mechanism accordingly.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
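Note (not part of the commit): mlx5_translate_port_name, which this
patch moves into mlx5_common.c, classifies a port name string by trying
three formats in order: pf<N>vf<M>, then p<N>, then a bare number. A
standalone sketch of that ordering, using hypothetical type and enum
names in place of struct mlx5_switch_info and the
MLX5_PHYS_PORT_NAME_TYPE_* constants:

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the driver's name-type constants. */
enum port_name_type {
	PORT_NAME_UNKNOWN,
	PORT_NAME_LEGACY,	/* bare number, e.g. "7" */
	PORT_NAME_UPLINK,	/* "p<N>", e.g. "p1" */
	PORT_NAME_PFVF,		/* "pf<N>vf<M>", e.g. "pf0vf3" */
};

/* Hypothetical stand-in for the relevant mlx5_switch_info fields. */
struct port_info {
	enum port_name_type type;
	int pf_num;
	int port_name;
};

static void
translate_port_name(const char *in, struct port_info *out)
{
	char c1, c2, c3, c4;
	char *end;

	/* Most specific form first: all six items must match. */
	if (sscanf(in, "%c%c%d%c%c%d", &c1, &c2, &out->pf_num,
		   &c3, &c4, &out->port_name) == 6 &&
	    c1 == 'p' && c2 == 'f' && c3 == 'v' && c4 == 'f') {
		out->type = PORT_NAME_PFVF;
		return;
	}
	/* Uplink form: a 'p' followed by a number. */
	if (sscanf(in, "%c%d", &c1, &out->port_name) == 2 && c1 == 'p') {
		out->type = PORT_NAME_UPLINK;
		return;
	}
	/* Legacy form: the whole string must parse as a number. */
	errno = 0;
	out->port_name = (int)strtol(in, &end, 0);
	if (!errno && (size_t)(end - in) == strlen(in)) {
		out->type = PORT_NAME_LEGACY;
		return;
	}
	out->type = PORT_NAME_UNKNOWN;
}
```

The order matters: "pf0vf3" would also satisfy a looser check, so the
strictest pattern is tried first and the bare-number fallback last.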
 drivers/common/mlx5/Makefile                    |    3 +-
 drivers/common/mlx5/meson.build                 |    1 +
 drivers/common/mlx5/mlx5_common.c               |   55 +
 drivers/common/mlx5/mlx5_common.h               |   58 +
 drivers/common/mlx5/mlx5_nl.c                   | 1337 ++++++++++++++++++++++
 drivers/common/mlx5/mlx5_nl.h                   |   57 +
 drivers/common/mlx5/rte_common_mlx5_version.map |   18 +-
 drivers/net/mlx5/Makefile                       |    1 -
 drivers/net/mlx5/meson.build                    |    1 -
 drivers/net/mlx5/mlx5.h                         |    2 +-
 drivers/net/mlx5/mlx5_defs.h                    |    8 -
 drivers/net/mlx5/mlx5_ethdev.c                  |   55 -
 drivers/net/mlx5/mlx5_nl.c                      | 1338 -----------------------
 drivers/net/mlx5/mlx5_nl.h                      |   72 --
 drivers/net/mlx5/mlx5_vlan.c                    |    2 +-
 15 files changed, 1529 insertions(+), 1479 deletions(-)
 create mode 100644 drivers/common/mlx5/mlx5_nl.c
 create mode 100644 drivers/common/mlx5/mlx5_nl.h
 delete mode 100644 drivers/net/mlx5/mlx5_nl.c
 delete mode 100644 drivers/net/mlx5/mlx5_nl.h
diff --git a/drivers/common/mlx5/Makefile b/drivers/common/mlx5/Makefile
index b9e9803..6a14b7d 100644
--- a/drivers/common/mlx5/Makefile
+++ b/drivers/common/mlx5/Makefile
@@ -15,6 +15,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_glue.c
 endif
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_devx_cmds.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_common.c
+SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_nl.c
 
 ifeq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
 INSTALL-$(CONFIG_RTE_LIBRTE_MLX5_PMD)-lib += $(LIB_GLUE)
@@ -41,7 +42,7 @@ else
 LDLIBS += -libverbs -lmlx5
 endif
 
-LDLIBS += -lrte_eal -lrte_pci -lrte_kvargs
+LDLIBS += -lrte_eal -lrte_pci -lrte_kvargs -lrte_net
 
 # A few warnings cannot be avoided in external headers.
 CFLAGS += -Wno-error=cast-qual -DNDEBUG -UPEDANTIC
diff --git a/drivers/common/mlx5/meson.build b/drivers/common/mlx5/meson.build
index 174c64a..73aa8af 100644
--- a/drivers/common/mlx5/meson.build
+++ b/drivers/common/mlx5/meson.build
@@ -42,6 +42,7 @@ if build
 	sources = files(
 		'mlx5_devx_cmds.c',
 		'mlx5_common.c',
+		'mlx5_nl.c',
 	)
 	if not pmd_dlopen
 		sources += files('mlx5_glue.c')
diff --git a/drivers/common/mlx5/mlx5_common.c b/drivers/common/mlx5/mlx5_common.c
index c5af3ca..8973e5d 100644
--- a/drivers/common/mlx5/mlx5_common.c
+++ b/drivers/common/mlx5/mlx5_common.c
@@ -107,6 +107,61 @@
 	return ret;
 }
 
+/**
+ * Extract port name, as a number, from sysfs or netlink information.
+ *
+ * @param[in] port_name_in
+ *   String representing the port name.
+ * @param[out] port_info_out
+ *   Port information, including port name as a number and port name
+ *   type if recognized.
+ *
+ * @return
+ *   port_name field set according to recognized name format.
+ */
+void
+mlx5_translate_port_name(const char *port_name_in,
+			 struct mlx5_switch_info *port_info_out)
+{
+	char pf_c1, pf_c2, vf_c1, vf_c2;
+	char *end;
+	int sc_items;
+
+	/*
+	 * Check for port-name as a string of the form pf0vf0
+	 * (support kernel ver >= 5.0 or OFED ver >= 4.6).
+	 */
+	sc_items = sscanf(port_name_in, "%c%c%d%c%c%d",
+			  &pf_c1, &pf_c2, &port_info_out->pf_num,
+			  &vf_c1, &vf_c2, &port_info_out->port_name);
+	if (sc_items == 6 &&
+	    pf_c1 == 'p' && pf_c2 == 'f' &&
+	    vf_c1 == 'v' && vf_c2 == 'f') {
+		port_info_out->name_type = MLX5_PHYS_PORT_NAME_TYPE_PFVF;
+		return;
+	}
+	/*
+	 * Check for port-name as a string of the form p0
+	 * (support kernel ver >= 5.0, or OFED ver >= 4.6).
+	 */
+	sc_items = sscanf(port_name_in, "%c%d",
+			  &pf_c1, &port_info_out->port_name);
+	if (sc_items == 2 && pf_c1 == 'p') {
+		port_info_out->name_type = MLX5_PHYS_PORT_NAME_TYPE_UPLINK;
+		return;
+	}
+	/* Check for port-name as a number (support kernel ver < 5.0). */
+	errno = 0;
+	port_info_out->port_name = strtol(port_name_in, &end, 0);
+	if (!errno &&
+	    (size_t)(end - port_name_in) == strlen(port_name_in)) {
+		port_info_out->name_type = MLX5_PHYS_PORT_NAME_TYPE_LEGACY;
+		return;
+	}
+	port_info_out->name_type = MLX5_PHYS_PORT_NAME_TYPE_UNKNOWN;
+	return;
+}
+
 #ifdef RTE_IBVERBS_LINK_DLOPEN
 
 /**
diff --git a/drivers/common/mlx5/mlx5_common.h b/drivers/common/mlx5/mlx5_common.h
index aeaa7b9..ac65105 100644
--- a/drivers/common/mlx5/mlx5_common.h
+++ b/drivers/common/mlx5/mlx5_common.h
@@ -18,6 +18,35 @@
 
 
 /*
+ * Compilation workaround for PPC64 when AltiVec is fully enabled, e.g. std=c11.
+ * Otherwise there would be a type conflict between stdbool and altivec.
+ */
+#if defined(__PPC64__) && !defined(__APPLE_ALTIVEC__)
+#undef bool
+/* redefine as in stdbool.h */
+#define bool _Bool
+#endif
+
+/* Bit-field manipulation. */
+#define BITFIELD_DECLARE(bf, type, size) \
+	type bf[(((size_t)(size) / (sizeof(type) * CHAR_BIT)) + \
+		 !!((size_t)(size) % (sizeof(type) * CHAR_BIT)))]
+#define BITFIELD_DEFINE(bf, type, size) \
+	BITFIELD_DECLARE((bf), type, (size)) = { 0 }
+#define BITFIELD_SET(bf, b) \
+	(assert((size_t)(b) < (sizeof(bf) * CHAR_BIT)), \
+	 (void)((bf)[((b) / (sizeof((bf)[0]) * CHAR_BIT))] |= \
+		((size_t)1 << ((b) % (sizeof((bf)[0]) * CHAR_BIT)))))
+#define BITFIELD_RESET(bf, b) \
+	(assert((size_t)(b) < (sizeof(bf) * CHAR_BIT)), \
+	 (void)((bf)[((b) / (sizeof((bf)[0]) * CHAR_BIT))] &= \
+		~((size_t)1 << ((b) % (sizeof((bf)[0]) * CHAR_BIT)))))
+#define BITFIELD_ISSET(bf, b) \
+	(assert((size_t)(b) < (sizeof(bf) * CHAR_BIT)), \
+	 !!(((bf)[((b) / (sizeof((bf)[0]) * CHAR_BIT))] & \
+	     ((size_t)1 << ((b) % (sizeof((bf)[0]) * CHAR_BIT))))))
+
+/*
  * Helper macros to work around __VA_ARGS__ limitations in a C99 compliant
  * manner.
  */
@@ -112,6 +141,33 @@ enum {
 	PCI_DEVICE_ID_MELLANOX_CONNECTX6DXVF = 0x101e,
 };
 
+/* Maximum number of simultaneous unicast MAC addresses. */
+#define MLX5_MAX_UC_MAC_ADDRESSES 128
+/* Maximum number of simultaneous multicast MAC addresses. */
+#define MLX5_MAX_MC_MAC_ADDRESSES 128
+/* Maximum number of simultaneous MAC addresses. */
+#define MLX5_MAX_MAC_ADDRESSES \
+	(MLX5_MAX_UC_MAC_ADDRESSES + MLX5_MAX_MC_MAC_ADDRESSES)
+
+/* Recognized Infiniband device physical port name types. */
+enum mlx5_nl_phys_port_name_type {
+	MLX5_PHYS_PORT_NAME_TYPE_NOTSET = 0, /* Not set. */
+	MLX5_PHYS_PORT_NAME_TYPE_LEGACY, /* kernel ver < 5.0 */
+	MLX5_PHYS_PORT_NAME_TYPE_UPLINK, /* p0, kernel ver >= 5.0 */
+	MLX5_PHYS_PORT_NAME_TYPE_PFVF, /* pf0vf0, kernel ver >= 5.0 */
+	MLX5_PHYS_PORT_NAME_TYPE_UNKNOWN, /* Unrecognized. */
+};
+
+/** Switch information returned by mlx5_nl_switch_info(). */
+struct mlx5_switch_info {
+	uint32_t master:1; /**< Master device. */
+	uint32_t representor:1; /**< Representor device. */
+	enum mlx5_nl_phys_port_name_type name_type; /**< Port name type. */
+	int32_t pf_num; /**< PF number (valid for pfxvfx format only). */
+	int32_t port_name; /**< Representor port name. */
+	uint64_t switch_id; /**< Switch identifier. */
+};
+
 /* CQE status. */
 enum mlx5_cqe_status {
 	MLX5_CQE_STATUS_SW_OWN = -1,
@@ -152,5 +208,7 @@ enum mlx5_cqe_status {
 
 int mlx5_dev_to_pci_addr(const char *dev_path, struct rte_pci_addr *pci_addr);
 int mlx5_vdpa_mode_selected(struct rte_devargs *devargs);
+void mlx5_translate_port_name(const char *port_name_in,
+			      struct mlx5_switch_info *port_info_out);
 
 #endif /* RTE_PMD_MLX5_COMMON_H_ */
diff --git a/drivers/common/mlx5/mlx5_nl.c b/drivers/common/mlx5/mlx5_nl.c
new file mode 100644
index 0000000..b4fc053
--- /dev/null
+++ b/drivers/common/mlx5/mlx5_nl.c
@@ -0,0 +1,1337 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2018 6WIND S.A.
+ * Copyright 2018 Mellanox Technologies, Ltd
+ */
+
+#include <errno.h>
+#include <linux/if_link.h>
+#include <linux/rtnetlink.h>
+#include <net/if.h>
+#include <rdma/rdma_netlink.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <stdalign.h>
+#include <string.h>
+#include <sys/socket.h>
+#include <unistd.h>
+#include <stdbool.h>
+
+#include <rte_errno.h>
+#include <rte_atomic.h>
+
+#include "mlx5_nl.h"
+#include "mlx5_common_utils.h"
+
+/* Size of the buffer to receive kernel messages */
+#define MLX5_NL_BUF_SIZE (32 * 1024)
+/* Send buffer size for the Netlink socket */
+#define MLX5_SEND_BUF_SIZE 32768
+/* Receive buffer size for the Netlink socket */
+#define MLX5_RECV_BUF_SIZE 32768
+
+/** Parameters of VLAN devices created by driver. */
+#define MLX5_VMWA_VLAN_DEVICE_PFX "evmlx"
+/*
+ * Define NDA_RTA as defined in iproute2 sources.
+ *
+ * See include/libnetlink.h in the iproute2 sources.
+ */
+#ifndef MLX5_NDA_RTA
+#define MLX5_NDA_RTA(r) \
+	((struct rtattr *)(((char *)(r)) + NLMSG_ALIGN(sizeof(struct ndmsg))))
+#endif
+/*
+ * Define NLMSG_TAIL as defined in iproute2 sources.
+ *
+ * See include/libnetlink.h in the iproute2 sources.
+ */
+#ifndef NLMSG_TAIL
+#define NLMSG_TAIL(nmsg) \
+	((struct rtattr *)(((char *)(nmsg)) + NLMSG_ALIGN((nmsg)->nlmsg_len)))
+#endif
+/*
+ * The following definitions are normally found in rdma/rdma_netlink.h,
+ * however they are so recent that most systems do not expose them yet.
+ */
+#ifndef HAVE_RDMA_NL_NLDEV
+#define RDMA_NL_NLDEV 5
+#endif
+#ifndef HAVE_RDMA_NLDEV_CMD_GET
+#define RDMA_NLDEV_CMD_GET 1
+#endif
+#ifndef HAVE_RDMA_NLDEV_CMD_PORT_GET
+#define RDMA_NLDEV_CMD_PORT_GET 5
+#endif
+#ifndef HAVE_RDMA_NLDEV_ATTR_DEV_INDEX
+#define RDMA_NLDEV_ATTR_DEV_INDEX 1
+#endif
+#ifndef HAVE_RDMA_NLDEV_ATTR_DEV_NAME
+#define RDMA_NLDEV_ATTR_DEV_NAME 2
+#endif
+#ifndef HAVE_RDMA_NLDEV_ATTR_PORT_INDEX
+#define RDMA_NLDEV_ATTR_PORT_INDEX 3
+#endif
+#ifndef HAVE_RDMA_NLDEV_ATTR_NDEV_INDEX
+#define RDMA_NLDEV_ATTR_NDEV_INDEX 50
+#endif
+
+/* These are normally found in linux/if_link.h. */
+#ifndef HAVE_IFLA_NUM_VF
+#define IFLA_NUM_VF 21
+#endif
+#ifndef HAVE_IFLA_EXT_MASK
+#define IFLA_EXT_MASK 29
+#endif
+#ifndef HAVE_IFLA_PHYS_SWITCH_ID
+#define IFLA_PHYS_SWITCH_ID 36
+#endif
+#ifndef HAVE_IFLA_PHYS_PORT_NAME
+#define IFLA_PHYS_PORT_NAME 38
+#endif
+
+/* Add/remove MAC address through Netlink */
+struct mlx5_nl_mac_addr {
+	struct rte_ether_addr (*mac)[];
+	/**< MAC address handled by the device. */
+	int mac_n; /**< Number of addresses in the array. */
+};
+
+#define MLX5_NL_CMD_GET_IB_NAME (1 << 0)
+#define MLX5_NL_CMD_GET_IB_INDEX (1 << 1)
+#define MLX5_NL_CMD_GET_NET_INDEX (1 << 2)
+#define MLX5_NL_CMD_GET_PORT_INDEX (1 << 3)
+
+/** Data structure used by mlx5_nl_cmdget_cb(). */
+struct mlx5_nl_ifindex_data {
+	const char *name; /**< IB device name (in). */
+	uint32_t flags; /**< Found attribute flags (out). */
+	uint32_t ibindex; /**< IB device index (out). */
+	uint32_t ifindex; /**< Network interface index (out). */
+	uint32_t portnum; /**< IB device max port number (out). */
+};
+
+rte_atomic32_t atomic_sn = RTE_ATOMIC32_INIT(0);
+
+/* Generate Netlink sequence number. */
+#define MLX5_NL_SN_GENERATE ((uint32_t)rte_atomic32_add_return(&atomic_sn, 1))
+
+/**
+ * Opens a Netlink socket.
+ *
+ * @param protocol
+ *   Netlink protocol (e.g. NETLINK_ROUTE, NETLINK_RDMA).
+ *
+ * @return
+ *   A file descriptor on success, a negative errno value otherwise and
+ *   rte_errno is set.
+ */
+int
+mlx5_nl_init(int protocol)
+{
+	int fd;
+	int sndbuf_size = MLX5_SEND_BUF_SIZE;
+	int rcvbuf_size = MLX5_RECV_BUF_SIZE;
+	struct sockaddr_nl local = {
+		.nl_family = AF_NETLINK,
+	};
+	int ret;
+
+	fd = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC, protocol);
+	if (fd == -1) {
+		rte_errno = errno;
+		return -rte_errno;
+	}
+	ret = setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &sndbuf_size, sizeof(int));
+	if (ret == -1) {
+		rte_errno = errno;
+		goto error;
+	}
+	ret = setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf_size, sizeof(int));
+	if (ret == -1) {
+		rte_errno = errno;
+		goto error;
+	}
+	ret = bind(fd, (struct sockaddr *)&local, sizeof(local));
+	if (ret == -1) {
+		rte_errno = errno;
+		goto error;
+	}
+	return fd;
+error:
+	close(fd);
+	return -rte_errno;
+}
+
+/**
+ * Send a request message to the kernel on the Netlink socket.
+ *
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] nh
+ *   The Netlink message sent to the kernel.
+ * @param[in] sn
+ *   Sequence number.
+ * @param[in] req
+ *   Pointer to the request structure.
+ * @param[in] len
+ *   Length of the request in bytes.
+ *
+ * @return
+ *   The number of sent bytes on success, a negative errno value otherwise and
+ *   rte_errno is set.
+ */
+static int
+mlx5_nl_request(int nlsk_fd, struct nlmsghdr *nh, uint32_t sn, void *req,
+		int len)
+{
+	struct sockaddr_nl sa = {
+		.nl_family = AF_NETLINK,
+	};
+	struct iovec iov[2] = {
+		{ .iov_base = nh, .iov_len = sizeof(*nh), },
+		{ .iov_base = req, .iov_len = len, },
+	};
+	struct msghdr msg = {
+		.msg_name = &sa,
+		.msg_namelen = sizeof(sa),
+		.msg_iov = iov,
+		.msg_iovlen = 2,
+	};
+	int send_bytes;
+
+	nh->nlmsg_pid = 0; /* communication with the kernel uses pid 0 */
+	nh->nlmsg_seq = sn;
+	send_bytes = sendmsg(nlsk_fd, &msg, 0);
+	if (send_bytes < 0) {
+		rte_errno = errno;
+		return -rte_errno;
+	}
+	return send_bytes;
+}
+
+/**
+ * Send a message to the kernel on the Netlink socket.
+ *
+ * @param[in] nlsk_fd
+ *   The Netlink socket file descriptor used for communication.
+ * @param[in] nh
+ *   The Netlink message sent to the kernel.
+ * @param[in] sn
+ *   Sequence number.
+ *
+ * @return
+ *   The number of sent bytes on success, a negative errno value otherwise and
+ *   rte_errno is set.
+ */
+static int
+mlx5_nl_send(int nlsk_fd, struct nlmsghdr *nh, uint32_t sn)
+{
+	struct sockaddr_nl sa = {
+		.nl_family = AF_NETLINK,
+	};
+	struct iovec iov = {
+		.iov_base = nh,
+		.iov_len = nh->nlmsg_len,
+	};
+	struct msghdr msg = {
+		.msg_name = &sa,
+		.msg_namelen = sizeof(sa),
+		.msg_iov = &iov,
+		.msg_iovlen = 1,
+	};
+	int send_bytes;
+
+	nh->nlmsg_pid = 0; /* communication with the kernel uses pid 0 */
+	nh->nlmsg_seq = sn;
+	send_bytes = sendmsg(nlsk_fd, &msg, 0);
+	if (send_bytes < 0) {
+		rte_errno = errno;
+		return -rte_errno;
+	}
+	return send_bytes;
+}
+
+/**
+ * Receive a message from the kernel on the Netlink socket, following
+ * mlx5_nl_send().
+ *
+ * @param[in] nlsk_fd
+ *   The Netlink socket file descriptor used for communication.
+ * @param[in] sn
+ *   Sequence number.
+ * @param[in] cb
+ *   The callback function to call for each Netlink message received.
+ * @param[in, out] arg
+ *   Custom arguments for the callback.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_nl_recv(int nlsk_fd, uint32_t sn, int (*cb)(struct nlmsghdr *, void *arg),
+	     void *arg)
+{
+	struct sockaddr_nl sa;
+	char buf[MLX5_RECV_BUF_SIZE];
+	struct iovec iov = {
+		.iov_base = buf,
+		.iov_len = sizeof(buf),
+	};
+	struct msghdr msg = {
+		.msg_name = &sa,
+		.msg_namelen = sizeof(sa),
+		.msg_iov = &iov,
+		/* One message at a time */
+		.msg_iovlen = 1,
+	};
+	int multipart = 0;
+	int ret = 0;
+
+	do {
+		struct nlmsghdr *nh;
+		int recv_bytes = 0;
+
+		do {
+			recv_bytes = recvmsg(nlsk_fd, &msg, 0);
+			if (recv_bytes == -1) {
+				rte_errno = errno;
+				return -rte_errno;
+			}
+			nh = (struct nlmsghdr *)buf;
+		} while (nh->nlmsg_seq != sn);
+		for (;
+		     NLMSG_OK(nh, (unsigned int)recv_bytes);
+		     nh = NLMSG_NEXT(nh, recv_bytes)) {
+			if (nh->nlmsg_type == NLMSG_ERROR) {
+				struct nlmsgerr *err_data = NLMSG_DATA(nh);
+
+				if (err_data->error < 0) {
+					rte_errno = -err_data->error;
+					return -rte_errno;
+				}
+				/* Ack message. */
+				return 0;
+			}
+			/* Multi-part msgs and their trailing DONE message. */
+			if (nh->nlmsg_flags & NLM_F_MULTI) {
+				if (nh->nlmsg_type == NLMSG_DONE)
+					return 0;
+				multipart = 1;
+			}
+			if (cb) {
+				ret = cb(nh, arg);
+				if (ret < 0)
+					return ret;
+			}
+		}
+	} while (multipart);
+	return ret;
+}
+
+/**
+ * Parse Netlink message to retrieve the bridge MAC address.
+ *
+ * @param nh
+ *   Pointer to Netlink Message Header.
+ * @param arg
+ *   PMD data registered with this callback.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_nl_mac_addr_cb(struct nlmsghdr *nh, void *arg)
+{
+	struct mlx5_nl_mac_addr *data = arg;
+	struct ndmsg *r = NLMSG_DATA(nh);
+	struct rtattr *attribute;
+	int len;
+
+	len = nh->nlmsg_len - NLMSG_LENGTH(sizeof(*r));
+	for (attribute = MLX5_NDA_RTA(r);
+	     RTA_OK(attribute, len);
+	     attribute = RTA_NEXT(attribute, len)) {
+		if (attribute->rta_type == NDA_LLADDR) {
+			if (data->mac_n == MLX5_MAX_MAC_ADDRESSES) {
+				DRV_LOG(WARNING,
+					"not enough room to finalize the"
+					" request");
+				rte_errno = ENOMEM;
+				return -rte_errno;
+			}
+#ifndef NDEBUG
+			char m[18];
+
+			rte_ether_format_addr(m, 18, RTA_DATA(attribute));
+			DRV_LOG(DEBUG, "bridge MAC address %s", m);
+#endif
+			memcpy(&(*data->mac)[data->mac_n++],
+			       RTA_DATA(attribute), RTE_ETHER_ADDR_LEN);
+		}
+	}
+	return 0;
+}
+
+/**
+ * Get bridge MAC addresses.
+ *
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
+ * @param[out] mac
+ *   Pointer to the array table of MAC addresses to fill.
+ *   Its size should be MLX5_MAX_MAC_ADDRESSES.
+ * @param[out] mac_n
+ *   Number of entries filled in MAC array.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_nl_mac_addr_list(int nlsk_fd, unsigned int iface_idx,
+		      struct rte_ether_addr (*mac)[], int *mac_n)
+{
+	struct {
+		struct nlmsghdr	hdr;
+		struct ifinfomsg ifm;
+	} req = {
+		.hdr = {
+			.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
+			.nlmsg_type = RTM_GETNEIGH,
+			.nlmsg_flags = NLM_F_DUMP | NLM_F_REQUEST,
+		},
+		.ifm = {
+			.ifi_family = PF_BRIDGE,
+			.ifi_index = iface_idx,
+		},
+	};
+	struct mlx5_nl_mac_addr data = {
+		.mac = mac,
+		.mac_n = 0,
+	};
+	uint32_t sn = MLX5_NL_SN_GENERATE;
+	int ret;
+
+	if (nlsk_fd == -1)
+		return 0;
+	ret = mlx5_nl_request(nlsk_fd, &req.hdr, sn, &req.ifm,
+			      sizeof(struct ifinfomsg));
+	if (ret < 0)
+		goto error;
+	ret = mlx5_nl_recv(nlsk_fd, sn, mlx5_nl_mac_addr_cb, &data);
+	if (ret < 0)
+		goto error;
+	*mac_n = data.mac_n;
+	return 0;
+error:
+	DRV_LOG(DEBUG, "Interface %u cannot retrieve MAC address list %s",
+		iface_idx, strerror(rte_errno));
+	return -rte_errno;
+}
+
+/**
+ * Modify the MAC address neighbour table with Netlink.
+ *
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
+ * @param mac
+ *   MAC address to consider.
+ * @param add
+ *   1 to add the MAC address, 0 to remove the MAC address.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_nl_mac_addr_modify(int nlsk_fd, unsigned int iface_idx,
+			struct rte_ether_addr *mac, int add)
+{
+	struct {
+		struct nlmsghdr hdr;
+		struct ndmsg ndm;
+		struct rtattr rta;
+		uint8_t buffer[RTE_ETHER_ADDR_LEN];
+	} req = {
+		.hdr = {
+			.nlmsg_len = NLMSG_LENGTH(sizeof(struct ndmsg)),
+			.nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE |
+				NLM_F_EXCL | NLM_F_ACK,
+			.nlmsg_type = add ? RTM_NEWNEIGH : RTM_DELNEIGH,
+		},
+		.ndm = {
+			.ndm_family = PF_BRIDGE,
+			.ndm_state = NUD_NOARP | NUD_PERMANENT,
+			.ndm_ifindex = iface_idx,
+			.ndm_flags = NTF_SELF,
+		},
+		.rta = {
+			.rta_type = NDA_LLADDR,
+			.rta_len = RTA_LENGTH(RTE_ETHER_ADDR_LEN),
+		},
+	};
+	uint32_t sn = MLX5_NL_SN_GENERATE;
+	int ret;
+
+	if (nlsk_fd == -1)
+		return 0;
+	memcpy(RTA_DATA(&req.rta), mac, RTE_ETHER_ADDR_LEN);
+	req.hdr.nlmsg_len = NLMSG_ALIGN(req.hdr.nlmsg_len) +
+		RTA_ALIGN(req.rta.rta_len);
+	ret = mlx5_nl_send(nlsk_fd, &req.hdr, sn);
+	if (ret < 0)
+		goto error;
+	ret = mlx5_nl_recv(nlsk_fd, sn, NULL, NULL);
+	if (ret < 0)
+		goto error;
+	return 0;
+error:
+	DRV_LOG(DEBUG,
+		"Interface %u cannot %s MAC address"
+		" %02X:%02X:%02X:%02X:%02X:%02X %s",
+		iface_idx,
+		add ? "add" : "remove",
+		mac->addr_bytes[0], mac->addr_bytes[1],
+		mac->addr_bytes[2], mac->addr_bytes[3],
+		mac->addr_bytes[4], mac->addr_bytes[5],
+		strerror(rte_errno));
+	return -rte_errno;
+}
+
+/**
+ * Modify the VF MAC address neighbour table with Netlink.
+ *
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
+ * @param mac
+ *    MAC address to consider.
+ * @param vf_index
+ *    VF index.
+ *
+ * @return
+ *    0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_nl_vf_mac_addr_modify(int nlsk_fd, unsigned int iface_idx,
+			   struct rte_ether_addr *mac, int vf_index)
+{
+	int ret;
+	struct {
+		struct nlmsghdr hdr;
+		struct ifinfomsg ifm;
+		struct rtattr vf_list_rta;
+		struct rtattr vf_info_rta;
+		struct rtattr vf_mac_rta;
+		struct ifla_vf_mac ivm;
+	} req = {
+		.hdr = {
+			.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
+			.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK,
+			.nlmsg_type = RTM_BASE,
+		},
+		.ifm = {
+			.ifi_index = iface_idx,
+		},
+		.vf_list_rta = {
+			.rta_type = IFLA_VFINFO_LIST,
+			.rta_len = RTA_ALIGN(RTA_LENGTH(0)),
+		},
+		.vf_info_rta = {
+			.rta_type = IFLA_VF_INFO,
+			.rta_len = RTA_ALIGN(RTA_LENGTH(0)),
+		},
+		.vf_mac_rta = {
+			.rta_type = IFLA_VF_MAC,
+		},
+	};
+	struct ifla_vf_mac ivm = {
+		.vf = vf_index,
+	};
+	uint32_t sn = MLX5_NL_SN_GENERATE;
+
+	memcpy(&ivm.mac, mac, RTE_ETHER_ADDR_LEN);
+	memcpy(RTA_DATA(&req.vf_mac_rta), &ivm, sizeof(ivm));
+
+	req.vf_mac_rta.rta_len = RTA_LENGTH(sizeof(ivm));
+	req.hdr.nlmsg_len = NLMSG_ALIGN(req.hdr.nlmsg_len) +
+		RTA_ALIGN(req.vf_list_rta.rta_len) +
+		RTA_ALIGN(req.vf_info_rta.rta_len) +
+		RTA_ALIGN(req.vf_mac_rta.rta_len);
+	req.vf_list_rta.rta_len = RTE_PTR_DIFF(NLMSG_TAIL(&req.hdr),
+					       &req.vf_list_rta);
+	req.vf_info_rta.rta_len = RTE_PTR_DIFF(NLMSG_TAIL(&req.hdr),
+					       &req.vf_info_rta);
+
+	if (nlsk_fd < 0)
+		return -1;
+	ret = mlx5_nl_send(nlsk_fd, &req.hdr, sn);
+	if (ret < 0)
+		goto error;
+	ret = mlx5_nl_recv(nlsk_fd, sn, NULL, NULL);
+	if (ret < 0)
+		goto error;
+	return 0;
+error:
+	DRV_LOG(ERR,
+		"representor %u cannot set VF MAC address "
+		"%02X:%02X:%02X:%02X:%02X:%02X : %s",
+		vf_index,
+		mac->addr_bytes[0], mac->addr_bytes[1],
+		mac->addr_bytes[2], mac->addr_bytes[3],
+		mac->addr_bytes[4], mac->addr_bytes[5],
+		strerror(rte_errno));
+	return -rte_errno;
+}
+
+/**
+ * Add a MAC address.
+ *
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
+ * @param mac_own
+ *   BITFIELD_DECLARE array to store the mac.
+ * @param mac
+ *   MAC address to register.
+ * @param index
+ *   MAC address index.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_nl_mac_addr_add(int nlsk_fd, unsigned int iface_idx,
+		     uint64_t *mac_own, struct rte_ether_addr *mac,
+		     uint32_t index)
+{
+	int ret;
+
+	ret = mlx5_nl_mac_addr_modify(nlsk_fd, iface_idx, mac, 1);
+	if (!ret)
+		BITFIELD_SET(mac_own, index);
+	if (ret == -EEXIST)
+		return 0;
+	return ret;
+}
+
+/**
+ * Remove a MAC address.
+ *
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
+ * @param mac_own
+ *   BITFIELD_DECLARE array to store the mac.
+ * @param mac
+ *   MAC address to remove.
+ * @param index
+ *   MAC address index.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_nl_mac_addr_remove(int nlsk_fd, unsigned int iface_idx, uint64_t *mac_own,
+			struct rte_ether_addr *mac, uint32_t index)
+{
+	BITFIELD_RESET(mac_own, index);
+	return mlx5_nl_mac_addr_modify(nlsk_fd, iface_idx, mac, 0);
+}
+
+/**
+ * Synchronize Netlink bridge table to the internal table.
+ *
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
+ * @param mac_addrs
+ *   MAC addresses array to sync.
+ * @param n
+ *   @p mac_addrs array size.
+ */
+void
+mlx5_nl_mac_addr_sync(int nlsk_fd, unsigned int iface_idx,
+		      struct rte_ether_addr *mac_addrs, int n)
+{
+	struct rte_ether_addr macs[n];
+	int macs_n = 0;
+	int i;
+	int ret;
+
+	ret = mlx5_nl_mac_addr_list(nlsk_fd, iface_idx, &macs, &macs_n);
+	if (ret)
+		return;
+	for (i = 0; i != macs_n; ++i) {
+		int j;
+
+		/* Verify the address is not in the array yet. */
+		for (j = 0; j != n; ++j)
+			if (rte_is_same_ether_addr(&macs[i], &mac_addrs[j]))
+				break;
+		if (j != n)
+			continue;
+		/* Find the first entry available. */
+		for (j = 0; j != n; ++j) {
+			if (rte_is_zero_ether_addr(&mac_addrs[j])) {
+				mac_addrs[j] = macs[i];
+				break;
+			}
+		}
+	}
+}
+
+/**
+ * Flush all added MAC addresses.
+ *
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
+ * @param[in] mac_addrs
+ *   MAC addresses array to flush.
+ * @param n
+ *   @p mac_addrs array size.
+ * @param mac_own
+ *   BITFIELD_DECLARE array to store the mac.
+ */
+void
+mlx5_nl_mac_addr_flush(int nlsk_fd, unsigned int iface_idx,
+		       struct rte_ether_addr *mac_addrs, int n,
+		       uint64_t *mac_own)
+{
+	int i;
+
+	for (i = n - 1; i >= 0; --i) {
+		struct rte_ether_addr *m = &mac_addrs[i];
+
+		if (BITFIELD_ISSET(mac_own, i))
+			mlx5_nl_mac_addr_remove(nlsk_fd, iface_idx, mac_own, m,
+						i);
+	}
+}
+
+/**
+ * Enable promiscuous / all multicast mode through Netlink.
+ *
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
+ * @param flags
+ *   IFF_PROMISC for promiscuous, IFF_ALLMULTI for allmulti.
+ * @param enable
+ *   Nonzero to enable, disable otherwise.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_nl_device_flags(int nlsk_fd, unsigned int iface_idx, uint32_t flags,
+		     int enable)
+{
+	struct {
+		struct nlmsghdr hdr;
+		struct ifinfomsg ifi;
+	} req = {
+		.hdr = {
+			.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
+			.nlmsg_type = RTM_NEWLINK,
+			.nlmsg_flags = NLM_F_REQUEST,
+		},
+		.ifi = {
+			.ifi_flags = enable ? flags : 0,
+			.ifi_change = flags,
+			.ifi_index = iface_idx,
+		},
+	};
+	uint32_t sn = MLX5_NL_SN_GENERATE;
+	int ret;
+
+	assert(!(flags & ~(IFF_PROMISC | IFF_ALLMULTI)));
+	if (nlsk_fd < 0)
+		return 0;
+	ret = mlx5_nl_send(nlsk_fd, &req.hdr, sn);
+	if (ret < 0)
+		return ret;
+	return 0;
+}
+
+/**
+ * Enable promiscuous mode through Netlink.
+ *
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
+ * @param enable
+ *   Nonzero to enable, disable otherwise.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_nl_promisc(int nlsk_fd, unsigned int iface_idx, int enable)
+{
+	int ret = mlx5_nl_device_flags(nlsk_fd, iface_idx, IFF_PROMISC, enable);
+
+	if (ret)
+		DRV_LOG(DEBUG,
+			"Interface %u cannot %s promisc mode: Netlink error %s",
+			iface_idx, enable ? "enable" : "disable",
+			strerror(rte_errno));
+	return ret;
+}
+
+/**
+ * Enable all multicast mode through Netlink.
+ *
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
+ * @param enable
+ *   Nonzero to enable, disable otherwise.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_nl_allmulti(int nlsk_fd, unsigned int iface_idx, int enable)
+{
+	int ret = mlx5_nl_device_flags(nlsk_fd, iface_idx, IFF_ALLMULTI,
+				       enable);
+
+	if (ret)
+		DRV_LOG(DEBUG,
+			"Interface %u cannot %s allmulti : Netlink error %s",
+			iface_idx, enable ? "enable" : "disable",
+			strerror(rte_errno));
+	return ret;
+}
+
+/**
+ * Process network interface information from Netlink message.
+ *
+ * @param nh
+ *   Pointer to Netlink message header.
+ * @param arg
+ *   Opaque data pointer for this callback.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_nl_cmdget_cb(struct nlmsghdr *nh, void *arg)
+{
+	struct mlx5_nl_ifindex_data *data = arg;
+	struct mlx5_nl_ifindex_data local = {
+		.flags = 0,
+	};
+	size_t off = NLMSG_HDRLEN;
+
+	if (nh->nlmsg_type !=
+	    RDMA_NL_GET_TYPE(RDMA_NL_NLDEV, RDMA_NLDEV_CMD_GET) &&
+	    nh->nlmsg_type !=
+	    RDMA_NL_GET_TYPE(RDMA_NL_NLDEV, RDMA_NLDEV_CMD_PORT_GET))
+		goto error;
+	while (off < nh->nlmsg_len) {
+		struct nlattr *na = (void *)((uintptr_t)nh + off);
+		void *payload = (void *)((uintptr_t)na + NLA_HDRLEN);
+
+		if (na->nla_len > nh->nlmsg_len - off)
+			goto error;
+		switch (na->nla_type) {
+		case RDMA_NLDEV_ATTR_DEV_INDEX:
+			local.ibindex = *(uint32_t *)payload;
+			local.flags |= MLX5_NL_CMD_GET_IB_INDEX;
+			break;
+		case RDMA_NLDEV_ATTR_DEV_NAME:
+			if (!strcmp(payload, data->name))
+				local.flags |= MLX5_NL_CMD_GET_IB_NAME;
+			break;
+		case RDMA_NLDEV_ATTR_NDEV_INDEX:
+			local.ifindex = *(uint32_t *)payload;
+			local.flags |= MLX5_NL_CMD_GET_NET_INDEX;
+			break;
+		case RDMA_NLDEV_ATTR_PORT_INDEX:
+			local.portnum = *(uint32_t *)payload;
+			local.flags |= MLX5_NL_CMD_GET_PORT_INDEX;
+			break;
+		default:
+			break;
+		}
+		off += NLA_ALIGN(na->nla_len);
+	}
+	/*
+	 * A dump may return one message per InfiniBand device in
+	 * the system, not only the one with the requested name.
+	 * So gather the parameters locally and copy them to the
+	 * query context only when the device name matches.
+	 */
+	if (local.flags & MLX5_NL_CMD_GET_IB_NAME) {
+		data->flags = local.flags;
+		data->ibindex = local.ibindex;
+		data->ifindex = local.ifindex;
+		data->portnum = local.portnum;
+	}
+	return 0;
+error:
+	rte_errno = EINVAL;
+	return -rte_errno;
+}
+
+/**
+ * Get index of network interface associated with some IB device.
+ *
+ * This is the only somewhat safe method to avoid resorting to heuristics
+ * when faced with port representors. Unfortunately it requires at least
+ * Linux 4.17.
+ *
+ * @param nl
+ *   Netlink socket of the RDMA kind (NETLINK_RDMA).
+ * @param[in] name
+ *   IB device name.
+ * @param[in] pindex
+ *   IB device port index, starting from 1.
+ * @return
+ *   A valid (nonzero) interface index on success, 0 otherwise and rte_errno
+ *   is set.
+ */
+unsigned int
+mlx5_nl_ifindex(int nl, const char *name, uint32_t pindex)
+{
+	struct mlx5_nl_ifindex_data data = {
+		.name = name,
+		.flags = 0,
+		.ibindex = 0, /* Determined during first pass. */
+		.ifindex = 0, /* Determined during second pass. */
+	};
+	union {
+		struct nlmsghdr nh;
+		uint8_t buf[NLMSG_HDRLEN +
+			    NLA_HDRLEN + NLA_ALIGN(sizeof(data.ibindex)) +
+			    NLA_HDRLEN + NLA_ALIGN(sizeof(pindex))];
+	} req = {
+		.nh = {
+			.nlmsg_len = NLMSG_LENGTH(0),
+			.nlmsg_type = RDMA_NL_GET_TYPE(RDMA_NL_NLDEV,
+						       RDMA_NLDEV_CMD_GET),
+			.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK | NLM_F_DUMP,
+		},
+	};
+	struct nlattr *na;
+	uint32_t sn = MLX5_NL_SN_GENERATE;
+	int ret;
+
+	ret = mlx5_nl_send(nl, &req.nh, sn);
+	if (ret < 0)
+		return 0;
+	ret = mlx5_nl_recv(nl, sn, mlx5_nl_cmdget_cb, &data);
+	if (ret < 0)
+		return 0;
+	if (!(data.flags & MLX5_NL_CMD_GET_IB_NAME) ||
+	    !(data.flags & MLX5_NL_CMD_GET_IB_INDEX))
+		goto error;
+	data.flags = 0;
+	sn = MLX5_NL_SN_GENERATE;
+	req.nh.nlmsg_type = RDMA_NL_GET_TYPE(RDMA_NL_NLDEV,
+					     RDMA_NLDEV_CMD_PORT_GET);
+	req.nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
+	req.nh.nlmsg_len = NLMSG_LENGTH(sizeof(req.buf) - NLMSG_HDRLEN);
+	na = (void *)((uintptr_t)req.buf + NLMSG_HDRLEN);
+	na->nla_len = NLA_HDRLEN + sizeof(data.ibindex);
+	na->nla_type = RDMA_NLDEV_ATTR_DEV_INDEX;
+	memcpy((void *)((uintptr_t)na + NLA_HDRLEN),
+	       &data.ibindex, sizeof(data.ibindex));
+	na = (void *)((uintptr_t)na + NLA_ALIGN(na->nla_len));
+	na->nla_len = NLA_HDRLEN + sizeof(pindex);
+	na->nla_type = RDMA_NLDEV_ATTR_PORT_INDEX;
+	memcpy((void *)((uintptr_t)na + NLA_HDRLEN),
+	       &pindex, sizeof(pindex));
+	ret = mlx5_nl_send(nl, &req.nh, sn);
+	if (ret < 0)
+		return 0;
+	ret = mlx5_nl_recv(nl, sn, mlx5_nl_cmdget_cb, &data);
+	if (ret < 0)
+		return 0;
+	if (!(data.flags & MLX5_NL_CMD_GET_IB_NAME) ||
+	    !(data.flags & MLX5_NL_CMD_GET_IB_INDEX) ||
+	    !(data.flags & MLX5_NL_CMD_GET_NET_INDEX) ||
+	    !data.ifindex)
+		goto error;
+	return data.ifindex;
+error:
+	rte_errno = ENODEV;
+	return 0;
+}
+
+/**
+ * Get the number of physical ports of given IB device.
+ *
+ * @param nl
+ *   Netlink socket of the RDMA kind (NETLINK_RDMA).
+ * @param[in] name
+ *   IB device name.
+ *
+ * @return
+ *   A valid (nonzero) number of ports on success, 0 otherwise
+ *   and rte_errno is set.
+ */
+unsigned int
+mlx5_nl_portnum(int nl, const char *name)
+{
+	struct mlx5_nl_ifindex_data data = {
+		.flags = 0,
+		.name = name,
+		.ifindex = 0,
+		.portnum = 0,
+	};
+	struct nlmsghdr req = {
+		.nlmsg_len = NLMSG_LENGTH(0),
+		.nlmsg_type = RDMA_NL_GET_TYPE(RDMA_NL_NLDEV,
+					       RDMA_NLDEV_CMD_GET),
+		.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK | NLM_F_DUMP,
+	};
+	uint32_t sn = MLX5_NL_SN_GENERATE;
+	int ret;
+
+	ret = mlx5_nl_send(nl, &req, sn);
+	if (ret < 0)
+		return 0;
+	ret = mlx5_nl_recv(nl, sn, mlx5_nl_cmdget_cb, &data);
+	if (ret < 0)
+		return 0;
+	if (!(data.flags & MLX5_NL_CMD_GET_IB_NAME) ||
+	    !(data.flags & MLX5_NL_CMD_GET_IB_INDEX) ||
+	    !(data.flags & MLX5_NL_CMD_GET_PORT_INDEX)) {
+		rte_errno = ENODEV;
+		return 0;
+	}
+	if (!data.portnum)
+		rte_errno = EINVAL;
+	return data.portnum;
+}
+
+/**
+ * Analyze gathered port parameters via Netlink to recognize master
+ * and representor devices for E-Switch configuration.
+ *
+ * @param[in] num_vf_set
+ *   Flag indicating whether the number-of-VFs port attribute is present.
+ * @param[in, out] switch_info
+ *   Port information, including port name as a number and port name
+ *   type if recognized.
+ *
+ * @return
+ *   master and representor flags are set in switch_info according to
+ *   recognized parameters (if any).
+ */
+static void
+mlx5_nl_check_switch_info(bool num_vf_set,
+			  struct mlx5_switch_info *switch_info)
+{
+	switch (switch_info->name_type) {
+	case MLX5_PHYS_PORT_NAME_TYPE_UNKNOWN:
+		/*
+		 * Name is not recognized; assume the master if the
+		 * number-of-VFs key is present.
+		 */
+		switch_info->master = num_vf_set;
+		break;
+	case MLX5_PHYS_PORT_NAME_TYPE_NOTSET:
+		/*
+		 * Name is not set, this assumes the legacy naming
+		 * schema for master, just check if there is a
+		 * number of VFs key.
+		 */
+		switch_info->master = num_vf_set;
+		break;
+	case MLX5_PHYS_PORT_NAME_TYPE_UPLINK:
+		/* New uplink naming schema recognized. */
+		switch_info->master = 1;
+		break;
+	case MLX5_PHYS_PORT_NAME_TYPE_LEGACY:
+		/* Legacy representors naming schema. */
+		switch_info->representor = !num_vf_set;
+		break;
+	case MLX5_PHYS_PORT_NAME_TYPE_PFVF:
+		/* New representors naming schema. */
+		switch_info->representor = 1;
+		break;
+	}
+}
+
+/**
+ * Process switch information from Netlink message.
+ *
+ * @param nh
+ *   Pointer to Netlink message header.
+ * @param arg
+ *   Opaque data pointer for this callback.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_nl_switch_info_cb(struct nlmsghdr *nh, void *arg)
+{
+	struct mlx5_switch_info info = {
+		.master = 0,
+		.representor = 0,
+		.name_type = MLX5_PHYS_PORT_NAME_TYPE_NOTSET,
+		.port_name = 0,
+		.switch_id = 0,
+	};
+	size_t off = NLMSG_LENGTH(sizeof(struct ifinfomsg));
+	bool switch_id_set = false;
+	bool num_vf_set = false;
+
+	if (nh->nlmsg_type != RTM_NEWLINK)
+		goto error;
+	while (off < nh->nlmsg_len) {
+		struct rtattr *ra = (void *)((uintptr_t)nh + off);
+		void *payload = RTA_DATA(ra);
+		unsigned int i;
+
+		if (ra->rta_len > nh->nlmsg_len - off)
+			goto error;
+		switch (ra->rta_type) {
+		case IFLA_NUM_VF:
+			num_vf_set = true;
+			break;
+		case IFLA_PHYS_PORT_NAME:
+			mlx5_translate_port_name((char *)payload, &info);
+			break;
+		case IFLA_PHYS_SWITCH_ID:
+			info.switch_id = 0;
+			for (i = 0; i < RTA_PAYLOAD(ra); ++i) {
+				info.switch_id <<= 8;
+				info.switch_id |= ((uint8_t *)payload)[i];
+			}
+			switch_id_set = true;
+			break;
+		}
+		off += RTA_ALIGN(ra->rta_len);
+	}
+	if (switch_id_set) {
+		/* We have some E-Switch configuration. */
+		mlx5_nl_check_switch_info(num_vf_set, &info);
+	}
+	assert(!(info.master && info.representor));
+	memcpy(arg, &info, sizeof(info));
+	return 0;
+error:
+	rte_errno = EINVAL;
+	return -rte_errno;
+}
+
+/**
+ * Get switch information associated with network interface.
+ *
+ * @param nl
+ *   Netlink socket of the ROUTE kind (NETLINK_ROUTE).
+ * @param ifindex
+ *   Network interface index.
+ * @param[out] info
+ *   Switch information object, populated in case of success.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_nl_switch_info(int nl, unsigned int ifindex,
+		    struct mlx5_switch_info *info)
+{
+	struct {
+		struct nlmsghdr nh;
+		struct ifinfomsg info;
+		struct rtattr rta;
+		uint32_t extmask;
+	} req = {
+		.nh = {
+			.nlmsg_len = NLMSG_LENGTH
+					(sizeof(req.info) +
+					 RTA_LENGTH(sizeof(uint32_t))),
+			.nlmsg_type = RTM_GETLINK,
+			.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK,
+		},
+		.info = {
+			.ifi_family = AF_UNSPEC,
+			.ifi_index = ifindex,
+		},
+		.rta = {
+			.rta_type = IFLA_EXT_MASK,
+			.rta_len = RTA_LENGTH(sizeof(int32_t)),
+		},
+		.extmask = RTE_LE32(1),
+	};
+	uint32_t sn = MLX5_NL_SN_GENERATE;
+	int ret;
+
+	ret = mlx5_nl_send(nl, &req.nh, sn);
+	if (ret >= 0)
+		ret = mlx5_nl_recv(nl, sn, mlx5_nl_switch_info_cb, info);
+	if (info->master && info->representor) {
+		DRV_LOG(ERR, "ifindex %u device is recognized as master"
+			     " and as representor", ifindex);
+		rte_errno = ENODEV;
+		ret = -rte_errno;
+	}
+	return ret;
+}
+
+/*
+ * Delete VLAN network device by ifindex.
+ *
+ * @param[in] vmwa
+ *   Context object initialized by mlx5_nl_vlan_vmwa_init().
+ * @param[in] ifindex
+ *   Interface index of network device to delete.
+ */
+void
+mlx5_nl_vlan_vmwa_delete(struct mlx5_nl_vlan_vmwa_context *vmwa,
+			 uint32_t ifindex)
+{
+	uint32_t sn = MLX5_NL_SN_GENERATE;
+	int ret;
+	struct {
+		struct nlmsghdr nh;
+		struct ifinfomsg info;
+	} req = {
+		.nh = {
+			.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
+			.nlmsg_type = RTM_DELLINK,
+			.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK,
+		},
+		.info = {
+			.ifi_family = AF_UNSPEC,
+			.ifi_index = ifindex,
+		},
+	};
+
+	if (ifindex) {
+		ret = mlx5_nl_send(vmwa->nl_socket, &req.nh, sn);
+		if (ret >= 0)
+			ret = mlx5_nl_recv(vmwa->nl_socket, sn, NULL, NULL);
+		if (ret < 0)
+			DRV_LOG(WARNING, "netlink: error deleting VLAN WA"
+				" ifindex %u, %d", ifindex, ret);
+	}
+}
+
+/* Set of subroutines to build Netlink message. */
+static struct nlattr *
+nl_msg_tail(struct nlmsghdr *nlh)
+{
+	return (struct nlattr *)
+		(((uint8_t *)nlh) + NLMSG_ALIGN(nlh->nlmsg_len));
+}
+
+static void
+nl_attr_put(struct nlmsghdr *nlh, int type, const void *data, int alen)
+{
+	struct nlattr *nla = nl_msg_tail(nlh);
+
+	nla->nla_type = type;
+	nla->nla_len = NLMSG_ALIGN(sizeof(struct nlattr) + alen);
+	nlh->nlmsg_len = NLMSG_ALIGN(nlh->nlmsg_len) + nla->nla_len;
+
+	if (alen)
+		memcpy((uint8_t *)nla + sizeof(struct nlattr), data, alen);
+}
+
+static struct nlattr *
+nl_attr_nest_start(struct nlmsghdr *nlh, int type)
+{
+	struct nlattr *nest = (struct nlattr *)nl_msg_tail(nlh);
+
+	nl_attr_put(nlh, type, NULL, 0);
+	return nest;
+}
+
+static void
+nl_attr_nest_end(struct nlmsghdr *nlh, struct nlattr *nest)
+{
+	nest->nla_len = (uint8_t *)nl_msg_tail(nlh) - (uint8_t *)nest;
+}
+
+/*
+ * Create network VLAN device with specified VLAN tag.
+ *
+ * @param[in] vmwa
+ *   Context object initialized by mlx5_nl_vlan_vmwa_init().
+ * @param[in] ifindex
+ *   Base network interface index.
+ * @param[in] tag
+ *   VLAN tag for VLAN network device to create.
+ */
+uint32_t
+mlx5_nl_vlan_vmwa_create(struct mlx5_nl_vlan_vmwa_context *vmwa,
+			 uint32_t ifindex, uint16_t tag)
+{
+	struct nlmsghdr *nlh;
+	struct ifinfomsg *ifm;
+	char name[sizeof(MLX5_VMWA_VLAN_DEVICE_PFX) + 32];
+
+	alignas(RTE_CACHE_LINE_SIZE)
+	uint8_t buf[NLMSG_ALIGN(sizeof(struct nlmsghdr)) +
+		    NLMSG_ALIGN(sizeof(struct ifinfomsg)) +
+		    NLMSG_ALIGN(sizeof(struct nlattr)) * 8 +
+		    NLMSG_ALIGN(sizeof(uint32_t)) +
+		    NLMSG_ALIGN(sizeof(name)) +
+		    NLMSG_ALIGN(sizeof("vlan")) +
+		    NLMSG_ALIGN(sizeof(uint32_t)) +
+		    NLMSG_ALIGN(sizeof(uint16_t)) + 16];
+	struct nlattr *na_info;
+	struct nlattr *na_vlan;
+	uint32_t sn = MLX5_NL_SN_GENERATE;
+	int ret;
+
+	memset(buf, 0, sizeof(buf));
+	nlh = (struct nlmsghdr *)buf;
+	nlh->nlmsg_len = sizeof(struct nlmsghdr);
+	nlh->nlmsg_type = RTM_NEWLINK;
+	nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE |
+			   NLM_F_EXCL | NLM_F_ACK;
+	ifm = (struct ifinfomsg *)nl_msg_tail(nlh);
+	nlh->nlmsg_len += sizeof(struct ifinfomsg);
+	ifm->ifi_family = AF_UNSPEC;
+	ifm->ifi_type = 0;
+	ifm->ifi_index = 0;
+	ifm->ifi_flags = IFF_UP;
+	ifm->ifi_change = 0xffffffff;
+	nl_attr_put(nlh, IFLA_LINK, &ifindex, sizeof(ifindex));
+	ret = snprintf(name, sizeof(name), "%s.%u.%u",
+		       MLX5_VMWA_VLAN_DEVICE_PFX, ifindex, tag);
+	nl_attr_put(nlh, IFLA_IFNAME, name, ret + 1);
+	na_info = nl_attr_nest_start(nlh, IFLA_LINKINFO);
+	nl_attr_put(nlh, IFLA_INFO_KIND, "vlan", sizeof("vlan"));
+	na_vlan = nl_attr_nest_start(nlh, IFLA_INFO_DATA);
+	nl_attr_put(nlh, IFLA_VLAN_ID, &tag, sizeof(tag));
+	nl_attr_nest_end(nlh, na_vlan);
+	nl_attr_nest_end(nlh, na_info);
+	assert(sizeof(buf) >= nlh->nlmsg_len);
+	ret = mlx5_nl_send(vmwa->nl_socket, nlh, sn);
+	if (ret >= 0)
+		ret = mlx5_nl_recv(vmwa->nl_socket, sn, NULL, NULL);
+	if (ret < 0) {
+		DRV_LOG(WARNING, "netlink: VLAN %s create failure (%d)", name,
+			ret);
+	}
+	/* Try to get ifindex of created or pre-existing device. */
+	ret = if_nametoindex(name);
+	if (!ret) {
+		DRV_LOG(WARNING, "VLAN %s failed to get index (%d)", name,
+			errno);
+		return 0;
+	}
+	return ret;
+}
diff --git a/drivers/common/mlx5/mlx5_nl.h b/drivers/common/mlx5/mlx5_nl.h
new file mode 100644
index 0000000..8e66a98
--- /dev/null
+++ b/drivers/common/mlx5/mlx5_nl.h
@@ -0,0 +1,57 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2019 Mellanox Technologies, Ltd
+ */
+
+#ifndef RTE_PMD_MLX5_NL_H_
+#define RTE_PMD_MLX5_NL_H_
+
+#include <linux/netlink.h>
+
+#include <rte_ether.h>
+
+#include "mlx5_common.h"
+
+
+/* VLAN netdev for VLAN workaround. */
+struct mlx5_nl_vlan_dev {
+	uint32_t refcnt;
+	uint32_t ifindex; /**< Own interface index. */
+};
+
+/*
+ * Array of VLAN devices created on the base of VF
+ * used for workaround in virtual environments.
+ */
+struct mlx5_nl_vlan_vmwa_context {
+	int nl_socket;
+	uint32_t vf_ifindex;
+	struct mlx5_nl_vlan_dev vlan_dev[4096];
+};
+
+
+int mlx5_nl_init(int protocol);
+int mlx5_nl_mac_addr_add(int nlsk_fd, unsigned int iface_idx, uint64_t *mac_own,
+			 struct rte_ether_addr *mac, uint32_t index);
+int mlx5_nl_mac_addr_remove(int nlsk_fd, unsigned int iface_idx,
+			    uint64_t *mac_own, struct rte_ether_addr *mac,
+			    uint32_t index);
+void mlx5_nl_mac_addr_sync(int nlsk_fd, unsigned int iface_idx,
+			   struct rte_ether_addr *mac_addrs, int n);
+void mlx5_nl_mac_addr_flush(int nlsk_fd, unsigned int iface_idx,
+			    struct rte_ether_addr *mac_addrs, int n,
+			    uint64_t *mac_own);
+int mlx5_nl_promisc(int nlsk_fd, unsigned int iface_idx, int enable);
+int mlx5_nl_allmulti(int nlsk_fd, unsigned int iface_idx, int enable);
+unsigned int mlx5_nl_portnum(int nl, const char *name);
+unsigned int mlx5_nl_ifindex(int nl, const char *name, uint32_t pindex);
+int mlx5_nl_vf_mac_addr_modify(int nlsk_fd, unsigned int iface_idx,
+			       struct rte_ether_addr *mac, int vf_index);
+int mlx5_nl_switch_info(int nl, unsigned int ifindex,
+			struct mlx5_switch_info *info);
+
+void mlx5_nl_vlan_vmwa_delete(struct mlx5_nl_vlan_vmwa_context *vmwa,
+			      uint32_t ifindex);
+uint32_t mlx5_nl_vlan_vmwa_create(struct mlx5_nl_vlan_vmwa_context *vmwa,
+				  uint32_t ifindex, uint16_t tag);
+
+#endif /* RTE_PMD_MLX5_NL_H_ */
diff --git a/drivers/common/mlx5/rte_common_mlx5_version.map b/drivers/common/mlx5/rte_common_mlx5_version.map
index d32d631..34b66a5 100644
--- a/drivers/common/mlx5/rte_common_mlx5_version.map
+++ b/drivers/common/mlx5/rte_common_mlx5_version.map
@@ -26,5 +26,21 @@ DPDK_20.02 {
 	mlx5_devx_get_out_command_status;
 
 	mlx5_dev_to_pci_addr;
-	mlx5_vdpa_mode_selected;
+
+	mlx5_nl_allmulti;
+	mlx5_nl_ifindex;
+	mlx5_nl_init;
+	mlx5_nl_mac_addr_add;
+	mlx5_nl_mac_addr_flush;
+	mlx5_nl_mac_addr_remove;
+	mlx5_nl_mac_addr_sync;
+	mlx5_nl_portnum;
+	mlx5_nl_promisc;
+	mlx5_nl_switch_info;
+	mlx5_nl_vf_mac_addr_modify;
+	mlx5_nl_vlan_vmwa_create;
+	mlx5_nl_vlan_vmwa_delete;
+
+	mlx5_translate_port_name;
+	mlx5_vdpa_mode_selected;
 };
diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index dc6b3c8..d26afbb 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -30,7 +30,6 @@ SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_flow_meter.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_flow_dv.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_flow_verbs.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_mp.c
-SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_nl.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_utils.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_socket.c
 
diff --git a/drivers/net/mlx5/meson.build b/drivers/net/mlx5/meson.build
index e10ef3a..d45be00 100644
--- a/drivers/net/mlx5/meson.build
+++ b/drivers/net/mlx5/meson.build
@@ -19,7 +19,6 @@ sources = files(
 	'mlx5_flow_verbs.c',
 	'mlx5_mac.c',
 	'mlx5_mr.c',
-	'mlx5_nl.c',
 	'mlx5_rss.c',
 	'mlx5_rxmode.c',
 	'mlx5_rxq.c',
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 9864aa7..a7e7089 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -35,11 +35,11 @@
 #include <mlx5_glue.h>
 #include <mlx5_devx_cmds.h>
 #include <mlx5_prm.h>
+#include <mlx5_nl.h>
 
 #include "mlx5_defs.h"
 #include "mlx5_utils.h"
 #include "mlx5_mr.h"
-#include "mlx5_nl.h"
 #include "mlx5_autoconf.h"
 
 /* Request types for IPC. */
diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h
index dc9b965..9b392ed 100644
--- a/drivers/net/mlx5/mlx5_defs.h
+++ b/drivers/net/mlx5/mlx5_defs.h
@@ -14,14 +14,6 @@
 /* Reported driver name. */
 #define MLX5_DRIVER_NAME "net_mlx5"
 
-/* Maximum number of simultaneous unicast MAC addresses. */
-#define MLX5_MAX_UC_MAC_ADDRESSES 128
-/* Maximum number of simultaneous Multicast MAC addresses. */
-#define MLX5_MAX_MC_MAC_ADDRESSES 128
-/* Maximum number of simultaneous MAC addresses. */
-#define MLX5_MAX_MAC_ADDRESSES \
-	(MLX5_MAX_UC_MAC_ADDRESSES + MLX5_MAX_MC_MAC_ADDRESSES)
-
 /* Maximum number of simultaneous VLAN filters. */
 #define MLX5_MAX_VLAN_IDS 128
 
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 5484104..b765636 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -1940,61 +1940,6 @@ struct mlx5_priv *
 }
 
 /**
- * Extract port name, as a number, from sysfs or netlink information.
- *
- * @param[in] port_name_in
- *   String representing the port name.
- * @param[out] port_info_out
- *   Port information, including port name as a number and port name
- *   type if recognized
- *
- * @return
- *   port_name field set according to recognized name format.
- */
-void
-mlx5_translate_port_name(const char *port_name_in,
-			 struct mlx5_switch_info *port_info_out)
-{
-	char pf_c1, pf_c2, vf_c1, vf_c2;
-	char *end;
-	int sc_items;
-
-	/*
-	 * Check for port-name as a string of the form pf0vf0
-	 * (support kernel ver >= 5.0 or OFED ver >= 4.6).
-	 */
-	sc_items = sscanf(port_name_in, "%c%c%d%c%c%d",
-			  &pf_c1, &pf_c2, &port_info_out->pf_num,
-			  &vf_c1, &vf_c2, &port_info_out->port_name);
-	if (sc_items == 6 &&
-	    pf_c1 == 'p' && pf_c2 == 'f' &&
-	    vf_c1 == 'v' && vf_c2 == 'f') {
-		port_info_out->name_type = MLX5_PHYS_PORT_NAME_TYPE_PFVF;
-		return;
-	}
-	/*
-	 * Check for port-name as a string of the form p0
-	 * (support kernel ver >= 5.0, or OFED ver >= 4.6).
-	 */
-	sc_items = sscanf(port_name_in, "%c%d",
-			  &pf_c1, &port_info_out->port_name);
-	if (sc_items == 2 && pf_c1 == 'p') {
-		port_info_out->name_type = MLX5_PHYS_PORT_NAME_TYPE_UPLINK;
-		return;
-	}
-	/* Check for port-name as a number (support kernel ver < 5.0 */
-	errno = 0;
-	port_info_out->port_name = strtol(port_name_in, &end, 0);
-	if (!errno &&
-	    (size_t)(end - port_name_in) == strlen(port_name_in)) {
-		port_info_out->name_type = MLX5_PHYS_PORT_NAME_TYPE_LEGACY;
-		return;
-	}
-	port_info_out->name_type = MLX5_PHYS_PORT_NAME_TYPE_UNKNOWN;
-	return;
-}
-
-/**
  * DPDK callback to retrieve plug-in module EEPROM information (type and size).
  *
  * @param dev
diff --git a/drivers/net/mlx5/mlx5_nl.c b/drivers/net/mlx5/mlx5_nl.c
deleted file mode 100644
index 6b8ca00..0000000
--- a/drivers/net/mlx5/mlx5_nl.c
+++ /dev/null
@@ -1,1338 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause
- * Copyright 2018 6WIND S.A.
- * Copyright 2018 Mellanox Technologies, Ltd
- */
-
-#include <errno.h>
-#include <linux/if_link.h>
-#include <linux/rtnetlink.h>
-#include <net/if.h>
-#include <rdma/rdma_netlink.h>
-#include <stdbool.h>
-#include <stdint.h>
-#include <stdlib.h>
-#include <stdalign.h>
-#include <string.h>
-#include <sys/socket.h>
-#include <unistd.h>
-
-#include <rte_errno.h>
-#include <rte_atomic.h>
-#include <rte_ether.h>
-
-#include "mlx5.h"
-#include "mlx5_nl.h"
-#include "mlx5_utils.h"
-
-/* Size of the buffer to receive kernel messages */
-#define MLX5_NL_BUF_SIZE (32 * 1024)
-/* Send buffer size for the Netlink socket */
-#define MLX5_SEND_BUF_SIZE 32768
-/* Receive buffer size for the Netlink socket */
-#define MLX5_RECV_BUF_SIZE 32768
-
-/** Parameters of VLAN devices created by driver. */
-#define MLX5_VMWA_VLAN_DEVICE_PFX "evmlx"
-/*
- * Define NDA_RTA as defined in iproute2 sources.
- *
- * see in iproute2 sources file include/libnetlink.h
- */
-#ifndef MLX5_NDA_RTA
-#define MLX5_NDA_RTA(r) \
-	((struct rtattr *)(((char *)(r)) + NLMSG_ALIGN(sizeof(struct ndmsg))))
-#endif
-/*
- * Define NLMSG_TAIL as defined in iproute2 sources.
- *
- * see in iproute2 sources file include/libnetlink.h
- */
-#ifndef NLMSG_TAIL
-#define NLMSG_TAIL(nmsg) \
-	((struct rtattr *)(((char *)(nmsg)) + NLMSG_ALIGN((nmsg)->nlmsg_len)))
-#endif
-/*
- * The following definitions are normally found in rdma/rdma_netlink.h,
- * however they are so recent that most systems do not expose them yet.
- */
-#ifndef HAVE_RDMA_NL_NLDEV
-#define RDMA_NL_NLDEV 5
-#endif
-#ifndef HAVE_RDMA_NLDEV_CMD_GET
-#define RDMA_NLDEV_CMD_GET 1
-#endif
-#ifndef HAVE_RDMA_NLDEV_CMD_PORT_GET
-#define RDMA_NLDEV_CMD_PORT_GET 5
-#endif
-#ifndef HAVE_RDMA_NLDEV_ATTR_DEV_INDEX
-#define RDMA_NLDEV_ATTR_DEV_INDEX 1
-#endif
-#ifndef HAVE_RDMA_NLDEV_ATTR_DEV_NAME
-#define RDMA_NLDEV_ATTR_DEV_NAME 2
-#endif
-#ifndef HAVE_RDMA_NLDEV_ATTR_PORT_INDEX
-#define RDMA_NLDEV_ATTR_PORT_INDEX 3
-#endif
-#ifndef HAVE_RDMA_NLDEV_ATTR_NDEV_INDEX
-#define RDMA_NLDEV_ATTR_NDEV_INDEX 50
-#endif
-
-/* These are normally found in linux/if_link.h. */
-#ifndef HAVE_IFLA_NUM_VF
-#define IFLA_NUM_VF 21
-#endif
-#ifndef HAVE_IFLA_EXT_MASK
-#define IFLA_EXT_MASK 29
-#endif
-#ifndef HAVE_IFLA_PHYS_SWITCH_ID
-#define IFLA_PHYS_SWITCH_ID 36
-#endif
-#ifndef HAVE_IFLA_PHYS_PORT_NAME
-#define IFLA_PHYS_PORT_NAME 38
-#endif
-
-/* Add/remove MAC address through Netlink */
-struct mlx5_nl_mac_addr {
-	struct rte_ether_addr (*mac)[];
-	/**< MAC address handled by the device. */
-	int mac_n; /**< Number of addresses in the array. */
-};
-
-#define MLX5_NL_CMD_GET_IB_NAME (1 << 0)
-#define MLX5_NL_CMD_GET_IB_INDEX (1 << 1)
-#define MLX5_NL_CMD_GET_NET_INDEX (1 << 2)
-#define MLX5_NL_CMD_GET_PORT_INDEX (1 << 3)
-
-/** Data structure used by mlx5_nl_cmdget_cb(). */
-struct mlx5_nl_ifindex_data {
-	const char *name; /**< IB device name (in). */
-	uint32_t flags; /**< found attribute flags (out). */
-	uint32_t ibindex; /**< IB device index (out). */
-	uint32_t ifindex; /**< Network interface index (out). */
-	uint32_t portnum; /**< IB device max port number (out). */
-};
-
-rte_atomic32_t atomic_sn = RTE_ATOMIC32_INIT(0);
-
-/* Generate Netlink sequence number. */
-#define MLX5_NL_SN_GENERATE ((uint32_t)rte_atomic32_add_return(&atomic_sn, 1))
-
-/**
- * Opens a Netlink socket.
- *
- * @param protocol
- *   Netlink protocol (e.g. NETLINK_ROUTE, NETLINK_RDMA).
- *
- * @return
- *   A file descriptor on success, a negative errno value otherwise and
- *   rte_errno is set.
- */
-int
-mlx5_nl_init(int protocol)
-{
-	int fd;
-	int sndbuf_size = MLX5_SEND_BUF_SIZE;
-	int rcvbuf_size = MLX5_RECV_BUF_SIZE;
-	struct sockaddr_nl local = {
-		.nl_family = AF_NETLINK,
-	};
-	int ret;
-
-	fd = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC, protocol);
-	if (fd == -1) {
-		rte_errno = errno;
-		return -rte_errno;
-	}
-	ret = setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &sndbuf_size, sizeof(int));
-	if (ret == -1) {
-		rte_errno = errno;
-		goto error;
-	}
-	ret = setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf_size, sizeof(int));
-	if (ret == -1) {
-		rte_errno = errno;
-		goto error;
-	}
-	ret = bind(fd, (struct sockaddr *)&local, sizeof(local));
-	if (ret == -1) {
-		rte_errno = errno;
-		goto error;
-	}
-	return fd;
-error:
-	close(fd);
-	return -rte_errno;
-}
-
-/**
- * Send a request message to the kernel on the Netlink socket.
- *
- * @param[in] nlsk_fd
- *   Netlink socket file descriptor.
- * @param[in] nh
- *   The Netlink message send to the kernel.
- * @param[in] ssn
- *   Sequence number.
- * @param[in] req
- *   Pointer to the request structure.
- * @param[in] len
- *   Length of the request in bytes.
- *
- * @return
- *   The number of sent bytes on success, a negative errno value otherwise and
- *   rte_errno is set.
- */
-static int
-mlx5_nl_request(int nlsk_fd, struct nlmsghdr *nh, uint32_t sn, void *req,
-		int len)
-{
-	struct sockaddr_nl sa = {
-		.nl_family = AF_NETLINK,
-	};
-	struct iovec iov[2] = {
-		{ .iov_base = nh, .iov_len = sizeof(*nh), },
-		{ .iov_base = req, .iov_len = len, },
-	};
-	struct msghdr msg = {
-		.msg_name = &sa,
-		.msg_namelen = sizeof(sa),
-		.msg_iov = iov,
-		.msg_iovlen = 2,
-	};
-	int send_bytes;
-
-	nh->nlmsg_pid = 0; /* communication with the kernel uses pid 0 */
-	nh->nlmsg_seq = sn;
-	send_bytes = sendmsg(nlsk_fd, &msg, 0);
-	if (send_bytes < 0) {
-		rte_errno = errno;
-		return -rte_errno;
-	}
-	return send_bytes;
-}
-
-/**
- * Send a message to the kernel on the Netlink socket.
- *
- * @param[in] nlsk_fd
- *   The Netlink socket file descriptor used for communication.
- * @param[in] nh
- *   The Netlink message send to the kernel.
- * @param[in] sn
- *   Sequence number.
- *
- * @return
- *   The number of sent bytes on success, a negative errno value otherwise and
- *   rte_errno is set.
- */
-static int
-mlx5_nl_send(int nlsk_fd, struct nlmsghdr *nh, uint32_t sn)
-{
-	struct sockaddr_nl sa = {
-		.nl_family = AF_NETLINK,
-	};
-	struct iovec iov = {
-		.iov_base = nh,
-		.iov_len = nh->nlmsg_len,
-	};
-	struct msghdr msg = {
-		.msg_name = &sa,
-		.msg_namelen = sizeof(sa),
-		.msg_iov = &iov,
-		.msg_iovlen = 1,
-	};
-	int send_bytes;
-
-	nh->nlmsg_pid = 0; /* communication with the kernel uses pid 0 */
-	nh->nlmsg_seq = sn;
-	send_bytes = sendmsg(nlsk_fd, &msg, 0);
-	if (send_bytes < 0) {
-		rte_errno = errno;
-		return -rte_errno;
-	}
-	return send_bytes;
-}
-
-/**
- * Receive a message from the kernel on the Netlink socket, following
- * mlx5_nl_send().
- *
- * @param[in] nlsk_fd
- *   The Netlink socket file descriptor used for communication.
- * @param[in] sn
- *   Sequence number.
- * @param[in] cb
- *   The callback function to call for each Netlink message received.
- * @param[in, out] arg
- *   Custom arguments for the callback.
- *
- * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
- */
-static int
-mlx5_nl_recv(int nlsk_fd, uint32_t sn, int (*cb)(struct nlmsghdr *, void *arg),
-	     void *arg)
-{
-	struct sockaddr_nl sa;
-	char buf[MLX5_RECV_BUF_SIZE];
-	struct iovec iov = {
-		.iov_base = buf,
-		.iov_len = sizeof(buf),
-	};
-	struct msghdr msg = {
-		.msg_name = &sa,
-		.msg_namelen = sizeof(sa),
-		.msg_iov = &iov,
-		/* One message at a time */
-		.msg_iovlen = 1,
-	};
-	int multipart = 0;
-	int ret = 0;
-
-	do {
-		struct nlmsghdr *nh;
-		int recv_bytes = 0;
-
-		do {
-			recv_bytes = recvmsg(nlsk_fd, &msg, 0);
-			if (recv_bytes == -1) {
-				rte_errno = errno;
-				return -rte_errno;
-			}
-			nh = (struct nlmsghdr *)buf;
-		} while (nh->nlmsg_seq != sn);
-		for (;
-		     NLMSG_OK(nh, (unsigned int)recv_bytes);
-		     nh = NLMSG_NEXT(nh, recv_bytes)) {
-			if (nh->nlmsg_type == NLMSG_ERROR) {
-				struct nlmsgerr *err_data = NLMSG_DATA(nh);
-
-				if (err_data->error < 0) {
-					rte_errno = -err_data->error;
-					return -rte_errno;
-				}
-				/* Ack message. */
-				return 0;
-			}
-			/* Multi-part msgs and their trailing DONE message. */
-			if (nh->nlmsg_flags & NLM_F_MULTI) {
-				if (nh->nlmsg_type == NLMSG_DONE)
-					return 0;
-				multipart = 1;
-			}
-			if (cb) {
-				ret = cb(nh, arg);
-				if (ret < 0)
-					return ret;
-			}
-		}
-	} while (multipart);
-	return ret;
-}
-
-/**
- * Parse Netlink message to retrieve the bridge MAC address.
- *
- * @param nh
- *   Pointer to Netlink Message Header.
- * @param arg
- *   PMD data register with this callback.
- *
- * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
- */
-static int
-mlx5_nl_mac_addr_cb(struct nlmsghdr *nh, void *arg)
-{
-	struct mlx5_nl_mac_addr *data = arg;
-	struct ndmsg *r = NLMSG_DATA(nh);
-	struct rtattr *attribute;
-	int len;
-
-	len = nh->nlmsg_len - NLMSG_LENGTH(sizeof(*r));
-	for (attribute = MLX5_NDA_RTA(r);
-	     RTA_OK(attribute, len);
-	     attribute = RTA_NEXT(attribute, len)) {
-		if (attribute->rta_type == NDA_LLADDR) {
-			if (data->mac_n == MLX5_MAX_MAC_ADDRESSES) {
-				DRV_LOG(WARNING,
-					"not enough room to finalize the"
-					" request");
-				rte_errno = ENOMEM;
-				return -rte_errno;
-			}
-#ifndef NDEBUG
-			char m[18];
-
-			rte_ether_format_addr(m, 18, RTA_DATA(attribute));
-			DRV_LOG(DEBUG, "bridge MAC address %s", m);
-#endif
-			memcpy(&(*data->mac)[data->mac_n++],
-			       RTA_DATA(attribute), RTE_ETHER_ADDR_LEN);
-		}
-	}
-	return 0;
-}
-
-/**
- * Get bridge MAC addresses.
- *
- * @param[in] nlsk_fd
- *   Netlink socket file descriptor.
- * @param[in] iface_idx
- *   Net device interface index.
- * @param mac[out]
- *   Pointer to the array table of MAC addresses to fill.
- *   Its size should be of MLX5_MAX_MAC_ADDRESSES.
- * @param mac_n[out]
- *   Number of entries filled in MAC array.
- *
- * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
- */
-static int
-mlx5_nl_mac_addr_list(int nlsk_fd, unsigned int iface_idx,
-		      struct rte_ether_addr (*mac)[], int *mac_n)
-{
-	struct {
-		struct nlmsghdr	hdr;
-		struct ifinfomsg ifm;
-	} req = {
-		.hdr = {
-			.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
-			.nlmsg_type = RTM_GETNEIGH,
-			.nlmsg_flags = NLM_F_DUMP | NLM_F_REQUEST,
-		},
-		.ifm = {
-			.ifi_family = PF_BRIDGE,
-			.ifi_index = iface_idx,
-		},
-	};
-	struct mlx5_nl_mac_addr data = {
-		.mac = mac,
-		.mac_n = 0,
-	};
-	uint32_t sn = MLX5_NL_SN_GENERATE;
-	int ret;
-
-	if (nlsk_fd == -1)
-		return 0;
-	ret = mlx5_nl_request(nlsk_fd, &req.hdr, sn, &req.ifm,
-			      sizeof(struct ifinfomsg));
-	if (ret < 0)
-		goto error;
-	ret = mlx5_nl_recv(nlsk_fd, sn, mlx5_nl_mac_addr_cb, &data);
-	if (ret < 0)
-		goto error;
-	*mac_n = data.mac_n;
-	return 0;
-error:
-	DRV_LOG(DEBUG, "Interface %u cannot retrieve MAC address list %s",
-		iface_idx, strerror(rte_errno));
-	return -rte_errno;
-}
-
-/**
- * Modify the MAC address neighbour table with Netlink.
- *
- * @param[in] nlsk_fd
- *   Netlink socket file descriptor.
- * @param[in] iface_idx
- *   Net device interface index.
- * @param mac
- *   MAC address to consider.
- * @param add
- *   1 to add the MAC address, 0 to remove the MAC address.
- *
- * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
- */
-static int
-mlx5_nl_mac_addr_modify(int nlsk_fd, unsigned int iface_idx,
-			struct rte_ether_addr *mac, int add)
-{
-	struct {
-		struct nlmsghdr hdr;
-		struct ndmsg ndm;
-		struct rtattr rta;
-		uint8_t buffer[RTE_ETHER_ADDR_LEN];
-	} req = {
-		.hdr = {
-			.nlmsg_len = NLMSG_LENGTH(sizeof(struct ndmsg)),
-			.nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE |
-				NLM_F_EXCL | NLM_F_ACK,
-			.nlmsg_type = add ? RTM_NEWNEIGH : RTM_DELNEIGH,
-		},
-		.ndm = {
-			.ndm_family = PF_BRIDGE,
-			.ndm_state = NUD_NOARP | NUD_PERMANENT,
-			.ndm_ifindex = iface_idx,
-			.ndm_flags = NTF_SELF,
-		},
-		.rta = {
-			.rta_type = NDA_LLADDR,
-			.rta_len = RTA_LENGTH(RTE_ETHER_ADDR_LEN),
-		},
-	};
-	uint32_t sn = MLX5_NL_SN_GENERATE;
-	int ret;
-
-	if (nlsk_fd == -1)
-		return 0;
-	memcpy(RTA_DATA(&req.rta), mac, RTE_ETHER_ADDR_LEN);
-	req.hdr.nlmsg_len = NLMSG_ALIGN(req.hdr.nlmsg_len) +
-		RTA_ALIGN(req.rta.rta_len);
-	ret = mlx5_nl_send(nlsk_fd, &req.hdr, sn);
-	if (ret < 0)
-		goto error;
-	ret = mlx5_nl_recv(nlsk_fd, sn, NULL, NULL);
-	if (ret < 0)
-		goto error;
-	return 0;
-error:
-	DRV_LOG(DEBUG,
-		"Interface %u cannot %s MAC address"
-		" %02X:%02X:%02X:%02X:%02X:%02X %s",
-		iface_idx,
-		add ? "add" : "remove",
-		mac->addr_bytes[0], mac->addr_bytes[1],
-		mac->addr_bytes[2], mac->addr_bytes[3],
-		mac->addr_bytes[4], mac->addr_bytes[5],
-		strerror(rte_errno));
-	return -rte_errno;
-}
-
-/**
- * Modify the VF MAC address neighbour table with Netlink.
- *
- * @param[in] nlsk_fd
- *   Netlink socket file descriptor.
- * @param[in] iface_idx
- *   Net device interface index.
- * @param mac
- *    MAC address to consider.
- * @param vf_index
- *    VF index.
- *
- * @return
- *    0 on success, a negative errno value otherwise and rte_errno is set.
- */
-int
-mlx5_nl_vf_mac_addr_modify(int nlsk_fd, unsigned int iface_idx,
-			   struct rte_ether_addr *mac, int vf_index)
-{
-	int ret;
-	struct {
-		struct nlmsghdr hdr;
-		struct ifinfomsg ifm;
-		struct rtattr vf_list_rta;
-		struct rtattr vf_info_rta;
-		struct rtattr vf_mac_rta;
-		struct ifla_vf_mac ivm;
-	} req = {
-		.hdr = {
-			.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
-			.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK,
-			.nlmsg_type = RTM_BASE,
-		},
-		.ifm = {
-			.ifi_index = iface_idx,
-		},
-		.vf_list_rta = {
-			.rta_type = IFLA_VFINFO_LIST,
-			.rta_len = RTA_ALIGN(RTA_LENGTH(0)),
-		},
-		.vf_info_rta = {
-			.rta_type = IFLA_VF_INFO,
-			.rta_len = RTA_ALIGN(RTA_LENGTH(0)),
-		},
-		.vf_mac_rta = {
-			.rta_type = IFLA_VF_MAC,
-		},
-	};
-	struct ifla_vf_mac ivm = {
-		.vf = vf_index,
-	};
-	uint32_t sn = MLX5_NL_SN_GENERATE;
-
-	memcpy(&ivm.mac, mac, RTE_ETHER_ADDR_LEN);
-	memcpy(RTA_DATA(&req.vf_mac_rta), &ivm, sizeof(ivm));
-
-	req.vf_mac_rta.rta_len = RTA_LENGTH(sizeof(ivm));
-	req.hdr.nlmsg_len = NLMSG_ALIGN(req.hdr.nlmsg_len) +
-		RTA_ALIGN(req.vf_list_rta.rta_len) +
-		RTA_ALIGN(req.vf_info_rta.rta_len) +
-		RTA_ALIGN(req.vf_mac_rta.rta_len);
-	req.vf_list_rta.rta_len = RTE_PTR_DIFF(NLMSG_TAIL(&req.hdr),
-					       &req.vf_list_rta);
-	req.vf_info_rta.rta_len = RTE_PTR_DIFF(NLMSG_TAIL(&req.hdr),
-					       &req.vf_info_rta);
-
-	if (nlsk_fd < 0)
-		return -1;
-	ret = mlx5_nl_send(nlsk_fd, &req.hdr, sn);
-	if (ret < 0)
-		goto error;
-	ret = mlx5_nl_recv(nlsk_fd, sn, NULL, NULL);
-	if (ret < 0)
-		goto error;
-	return 0;
-error:
-	DRV_LOG(ERR,
-		"representor %u cannot set VF MAC address "
-		"%02X:%02X:%02X:%02X:%02X:%02X : %s",
-		vf_index,
-		mac->addr_bytes[0], mac->addr_bytes[1],
-		mac->addr_bytes[2], mac->addr_bytes[3],
-		mac->addr_bytes[4], mac->addr_bytes[5],
-		strerror(rte_errno));
-	return -rte_errno;
-}
-
-/**
- * Add a MAC address.
- *
- * @param[in] nlsk_fd
- *   Netlink socket file descriptor.
- * @param[in] iface_idx
- *   Net device interface index.
- * @param mac_own
- *   BITFIELD_DECLARE array to store the mac.
- * @param mac
- *   MAC address to register.
- * @param index
- *   MAC address index.
- *
- * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
- */
-int
-mlx5_nl_mac_addr_add(int nlsk_fd, unsigned int iface_idx,
-		     uint64_t *mac_own, struct rte_ether_addr *mac,
-		     uint32_t index)
-{
-	int ret;
-
-	ret = mlx5_nl_mac_addr_modify(nlsk_fd, iface_idx, mac, 1);
-	if (!ret)
-		BITFIELD_SET(mac_own, index);
-	if (ret == -EEXIST)
-		return 0;
-	return ret;
-}
-
-/**
- * Remove a MAC address.
- *
- * @param[in] nlsk_fd
- *   Netlink socket file descriptor.
- * @param[in] iface_idx
- *   Net device interface index.
- * @param mac_own
- *   BITFIELD_DECLARE array to store the mac.
- * @param mac
- *   MAC address to remove.
- * @param index
- *   MAC address index.
- *
- * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
- */
-int
-mlx5_nl_mac_addr_remove(int nlsk_fd, unsigned int iface_idx, uint64_t *mac_own,
-			struct rte_ether_addr *mac, uint32_t index)
-{
-	BITFIELD_RESET(mac_own, index);
-	return mlx5_nl_mac_addr_modify(nlsk_fd, iface_idx, mac, 0);
-}
-
-/**
- * Synchronize Netlink bridge table to the internal table.
- *
- * @param[in] nlsk_fd
- *   Netlink socket file descriptor.
- * @param[in] iface_idx
- *   Net device interface index.
- * @param mac_addrs
- *   Mac addresses array to sync.
- * @param n
- *   @p mac_addrs array size.
- */
-void
-mlx5_nl_mac_addr_sync(int nlsk_fd, unsigned int iface_idx,
-		      struct rte_ether_addr *mac_addrs, int n)
-{
-	struct rte_ether_addr macs[n];
-	int macs_n = 0;
-	int i;
-	int ret;
-
-	ret = mlx5_nl_mac_addr_list(nlsk_fd, iface_idx, &macs, &macs_n);
-	if (ret)
-		return;
-	for (i = 0; i != macs_n; ++i) {
-		int j;
-
-		/* Verify the address is not in the array yet. */
-		for (j = 0; j != n; ++j)
-			if (rte_is_same_ether_addr(&macs[i], &mac_addrs[j]))
-				break;
-		if (j != n)
-			continue;
-		/* Find the first entry available. */
-		for (j = 0; j != n; ++j) {
-			if (rte_is_zero_ether_addr(&mac_addrs[j])) {
-				mac_addrs[j] = macs[i];
-				break;
-			}
-		}
-	}
-}
-
-/**
- * Flush all added MAC addresses.
- *
- * @param[in] nlsk_fd
- *   Netlink socket file descriptor.
- * @param[in] iface_idx
- *   Net device interface index.
- * @param[in] mac_addrs
- *   Mac addresses array to flush.
- * @param n
- *   @p mac_addrs array size.
- * @param mac_own
- *   BITFIELD_DECLARE array to store the mac.
- */
-void
-mlx5_nl_mac_addr_flush(int nlsk_fd, unsigned int iface_idx,
-		       struct rte_ether_addr *mac_addrs, int n,
-		       uint64_t *mac_own)
-{
-	int i;
-
-	for (i = n - 1; i >= 0; --i) {
-		struct rte_ether_addr *m = &mac_addrs[i];
-
-		if (BITFIELD_ISSET(mac_own, i))
-			mlx5_nl_mac_addr_remove(nlsk_fd, iface_idx, mac_own, m,
-						i);
-	}
-}
-
-/**
- * Enable promiscuous / all multicast mode through Netlink.
- *
- * @param[in] nlsk_fd
- *   Netlink socket file descriptor.
- * @param[in] iface_idx
- *   Net device interface index.
- * @param flags
- *   IFF_PROMISC for promiscuous, IFF_ALLMULTI for allmulti.
- * @param enable
- *   Nonzero to enable, disable otherwise.
- *
- * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
- */
-static int
-mlx5_nl_device_flags(int nlsk_fd, unsigned int iface_idx, uint32_t flags,
-		     int enable)
-{
-	struct {
-		struct nlmsghdr hdr;
-		struct ifinfomsg ifi;
-	} req = {
-		.hdr = {
-			.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
-			.nlmsg_type = RTM_NEWLINK,
-			.nlmsg_flags = NLM_F_REQUEST,
-		},
-		.ifi = {
-			.ifi_flags = enable ? flags : 0,
-			.ifi_change = flags,
-			.ifi_index = iface_idx,
-		},
-	};
-	uint32_t sn = MLX5_NL_SN_GENERATE;
-	int ret;
-
-	assert(!(flags & ~(IFF_PROMISC | IFF_ALLMULTI)));
-	if (nlsk_fd < 0)
-		return 0;
-	ret = mlx5_nl_send(nlsk_fd, &req.hdr, sn);
-	if (ret < 0)
-		return ret;
-	return 0;
-}
-
-/**
- * Enable promiscuous mode through Netlink.
- *
- * @param[in] nlsk_fd
- *   Netlink socket file descriptor.
- * @param[in] iface_idx
- *   Net device interface index.
- * @param enable
- *   Nonzero to enable, disable otherwise.
- *
- * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
- */
-int
-mlx5_nl_promisc(int nlsk_fd, unsigned int iface_idx, int enable)
-{
-	int ret = mlx5_nl_device_flags(nlsk_fd, iface_idx, IFF_PROMISC, enable);
-
-	if (ret)
-		DRV_LOG(DEBUG,
-			"Interface %u cannot %s promisc mode: Netlink error %s",
-			iface_idx, enable ? "enable" : "disable",
-			strerror(rte_errno));
-	return ret;
-}
-
-/**
- * Enable all multicast mode through Netlink.
- *
- * @param[in] nlsk_fd
- *   Netlink socket file descriptor.
- * @param[in] iface_idx
- *   Net device interface index.
- * @param enable
- *   Nonzero to enable, disable otherwise.
- *
- * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
- */
-int
-mlx5_nl_allmulti(int nlsk_fd, unsigned int iface_idx, int enable)
-{
-	int ret = mlx5_nl_device_flags(nlsk_fd, iface_idx, IFF_ALLMULTI,
-				       enable);
-
-	if (ret)
-		DRV_LOG(DEBUG,
-			"Interface %u cannot %s allmulti : Netlink error %s",
-			iface_idx, enable ? "enable" : "disable",
-			strerror(rte_errno));
-	return ret;
-}
-
-/**
- * Process network interface information from Netlink message.
- *
- * @param nh
- *   Pointer to Netlink message header.
- * @param arg
- *   Opaque data pointer for this callback.
- *
- * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
- */
-static int
-mlx5_nl_cmdget_cb(struct nlmsghdr *nh, void *arg)
-{
-	struct mlx5_nl_ifindex_data *data = arg;
-	struct mlx5_nl_ifindex_data local = {
-		.flags = 0,
-	};
-	size_t off = NLMSG_HDRLEN;
-
-	if (nh->nlmsg_type !=
-	    RDMA_NL_GET_TYPE(RDMA_NL_NLDEV, RDMA_NLDEV_CMD_GET) &&
-	    nh->nlmsg_type !=
-	    RDMA_NL_GET_TYPE(RDMA_NL_NLDEV, RDMA_NLDEV_CMD_PORT_GET))
-		goto error;
-	while (off < nh->nlmsg_len) {
-		struct nlattr *na = (void *)((uintptr_t)nh + off);
-		void *payload = (void *)((uintptr_t)na + NLA_HDRLEN);
-
-		if (na->nla_len > nh->nlmsg_len - off)
-			goto error;
-		switch (na->nla_type) {
-		case RDMA_NLDEV_ATTR_DEV_INDEX:
-			local.ibindex = *(uint32_t *)payload;
-			local.flags |= MLX5_NL_CMD_GET_IB_INDEX;
-			break;
-		case RDMA_NLDEV_ATTR_DEV_NAME:
-			if (!strcmp(payload, data->name))
-				local.flags |= MLX5_NL_CMD_GET_IB_NAME;
-			break;
-		case RDMA_NLDEV_ATTR_NDEV_INDEX:
-			local.ifindex = *(uint32_t *)payload;
-			local.flags |= MLX5_NL_CMD_GET_NET_INDEX;
-			break;
-		case RDMA_NLDEV_ATTR_PORT_INDEX:
-			local.portnum = *(uint32_t *)payload;
-			local.flags |= MLX5_NL_CMD_GET_PORT_INDEX;
-			break;
-		default:
-			break;
-		}
-		off += NLA_ALIGN(na->nla_len);
-	}
-	/*
-	 * It is possible to have multiple messages for all
-	 * Infiniband devices in the system with appropriate name.
-	 * So we should gather parameters locally and copy to
-	 * query context only in case of coinciding device name.
-	 */
-	if (local.flags & MLX5_NL_CMD_GET_IB_NAME) {
-		data->flags = local.flags;
-		data->ibindex = local.ibindex;
-		data->ifindex = local.ifindex;
-		data->portnum = local.portnum;
-	}
-	return 0;
-error:
-	rte_errno = EINVAL;
-	return -rte_errno;
-}
-
-/**
- * Get index of network interface associated with some IB device.
- *
- * This is the only somewhat safe method to avoid resorting to heuristics
- * when faced with port representors. Unfortunately it requires at least
- * Linux 4.17.
- *
- * @param nl
- *   Netlink socket of the RDMA kind (NETLINK_RDMA).
- * @param[in] name
- *   IB device name.
- * @param[in] pindex
- *   IB device port index, starting from 1
- * @return
- *   A valid (nonzero) interface index on success, 0 otherwise and rte_errno
- *   is set.
- */
-unsigned int
-mlx5_nl_ifindex(int nl, const char *name, uint32_t pindex)
-{
-	struct mlx5_nl_ifindex_data data = {
-		.name = name,
-		.flags = 0,
-		.ibindex = 0, /* Determined during first pass. */
-		.ifindex = 0, /* Determined during second pass. */
-	};
-	union {
-		struct nlmsghdr nh;
-		uint8_t buf[NLMSG_HDRLEN +
-			    NLA_HDRLEN + NLA_ALIGN(sizeof(data.ibindex)) +
-			    NLA_HDRLEN + NLA_ALIGN(sizeof(pindex))];
-	} req = {
-		.nh = {
-			.nlmsg_len = NLMSG_LENGTH(0),
-			.nlmsg_type = RDMA_NL_GET_TYPE(RDMA_NL_NLDEV,
-						       RDMA_NLDEV_CMD_GET),
-			.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK | NLM_F_DUMP,
-		},
-	};
-	struct nlattr *na;
-	uint32_t sn = MLX5_NL_SN_GENERATE;
-	int ret;
-
-	ret = mlx5_nl_send(nl, &req.nh, sn);
-	if (ret < 0)
-		return 0;
-	ret = mlx5_nl_recv(nl, sn, mlx5_nl_cmdget_cb, &data);
-	if (ret < 0)
-		return 0;
-	if (!(data.flags & MLX5_NL_CMD_GET_IB_NAME) ||
-	    !(data.flags & MLX5_NL_CMD_GET_IB_INDEX))
-		goto error;
-	data.flags = 0;
-	sn = MLX5_NL_SN_GENERATE;
-	req.nh.nlmsg_type = RDMA_NL_GET_TYPE(RDMA_NL_NLDEV,
-					     RDMA_NLDEV_CMD_PORT_GET);
-	req.nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
-	req.nh.nlmsg_len = NLMSG_LENGTH(sizeof(req.buf) - NLMSG_HDRLEN);
-	na = (void *)((uintptr_t)req.buf + NLMSG_HDRLEN);
-	na->nla_len = NLA_HDRLEN + sizeof(data.ibindex);
-	na->nla_type = RDMA_NLDEV_ATTR_DEV_INDEX;
-	memcpy((void *)((uintptr_t)na + NLA_HDRLEN),
-	       &data.ibindex, sizeof(data.ibindex));
-	na = (void *)((uintptr_t)na + NLA_ALIGN(na->nla_len));
-	na->nla_len = NLA_HDRLEN + sizeof(pindex);
-	na->nla_type = RDMA_NLDEV_ATTR_PORT_INDEX;
-	memcpy((void *)((uintptr_t)na + NLA_HDRLEN),
-	       &pindex, sizeof(pindex));
-	ret = mlx5_nl_send(nl, &req.nh, sn);
-	if (ret < 0)
-		return 0;
-	ret = mlx5_nl_recv(nl, sn, mlx5_nl_cmdget_cb, &data);
-	if (ret < 0)
-		return 0;
-	if (!(data.flags & MLX5_NL_CMD_GET_IB_NAME) ||
-	    !(data.flags & MLX5_NL_CMD_GET_IB_INDEX) ||
-	    !(data.flags & MLX5_NL_CMD_GET_NET_INDEX) ||
-	    !data.ifindex)
-		goto error;
-	return data.ifindex;
-error:
-	rte_errno = ENODEV;
-	return 0;
-}
-
-/**
- * Get the number of physical ports of given IB device.
- *
- * @param nl
- *   Netlink socket of the RDMA kind (NETLINK_RDMA).
- * @param[in] name
- *   IB device name.
- *
- * @return
- *   A valid (nonzero) number of ports on success, 0 otherwise
- *   and rte_errno is set.
- */
-unsigned int
-mlx5_nl_portnum(int nl, const char *name)
-{
-	struct mlx5_nl_ifindex_data data = {
-		.flags = 0,
-		.name = name,
-		.ifindex = 0,
-		.portnum = 0,
-	};
-	struct nlmsghdr req = {
-		.nlmsg_len = NLMSG_LENGTH(0),
-		.nlmsg_type = RDMA_NL_GET_TYPE(RDMA_NL_NLDEV,
-					       RDMA_NLDEV_CMD_GET),
-		.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK | NLM_F_DUMP,
-	};
-	uint32_t sn = MLX5_NL_SN_GENERATE;
-	int ret;
-
-	ret = mlx5_nl_send(nl, &req, sn);
-	if (ret < 0)
-		return 0;
-	ret = mlx5_nl_recv(nl, sn, mlx5_nl_cmdget_cb, &data);
-	if (ret < 0)
-		return 0;
-	if (!(data.flags & MLX5_NL_CMD_GET_IB_NAME) ||
-	    !(data.flags & MLX5_NL_CMD_GET_IB_INDEX) ||
-	    !(data.flags & MLX5_NL_CMD_GET_PORT_INDEX)) {
-		rte_errno = ENODEV;
-		return 0;
-	}
-	if (!data.portnum)
-		rte_errno = EINVAL;
-	return data.portnum;
-}
-
-/**
- * Analyze gathered port parameters via Netlink to recognize master
- * and representor devices for E-Switch configuration.
- *
- * @param[in] num_vf_set
- *   flag of presence of number of VFs port attribute.
- * @param[inout] switch_info
- *   Port information, including port name as a number and port name
- *   type if recognized
- *
- * @return
- *   master and representor flags are set in switch_info according to
- *   recognized parameters (if any).
- */
-static void
-mlx5_nl_check_switch_info(bool num_vf_set,
-			  struct mlx5_switch_info *switch_info)
-{
-	switch (switch_info->name_type) {
-	case MLX5_PHYS_PORT_NAME_TYPE_UNKNOWN:
-		/*
-		 * Name is not recognized, assume the master,
-		 * check the number of VFs key presence.
-		 */
-		switch_info->master = num_vf_set;
-		break;
-	case MLX5_PHYS_PORT_NAME_TYPE_NOTSET:
-		/*
-		 * Name is not set, this assumes the legacy naming
-		 * schema for master, just check if there is a
-		 * number of VFs key.
-		 */
-		switch_info->master = num_vf_set;
-		break;
-	case MLX5_PHYS_PORT_NAME_TYPE_UPLINK:
-		/* New uplink naming schema recognized. */
-		switch_info->master = 1;
-		break;
-	case MLX5_PHYS_PORT_NAME_TYPE_LEGACY:
-		/* Legacy representors naming schema. */
-		switch_info->representor = !num_vf_set;
-		break;
-	case MLX5_PHYS_PORT_NAME_TYPE_PFVF:
-		/* New representors naming schema. */
-		switch_info->representor = 1;
-		break;
-	}
-}
-
-/**
- * Process switch information from Netlink message.
- *
- * @param nh
- *   Pointer to Netlink message header.
- * @param arg
- *   Opaque data pointer for this callback.
- *
- * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
- */
-static int
-mlx5_nl_switch_info_cb(struct nlmsghdr *nh, void *arg)
-{
-	struct mlx5_switch_info info = {
-		.master = 0,
-		.representor = 0,
-		.name_type = MLX5_PHYS_PORT_NAME_TYPE_NOTSET,
-		.port_name = 0,
-		.switch_id = 0,
-	};
-	size_t off = NLMSG_LENGTH(sizeof(struct ifinfomsg));
-	bool switch_id_set = false;
-	bool num_vf_set = false;
-
-	if (nh->nlmsg_type != RTM_NEWLINK)
-		goto error;
-	while (off < nh->nlmsg_len) {
-		struct rtattr *ra = (void *)((uintptr_t)nh + off);
-		void *payload = RTA_DATA(ra);
-		unsigned int i;
-
-		if (ra->rta_len > nh->nlmsg_len - off)
-			goto error;
-		switch (ra->rta_type) {
-		case IFLA_NUM_VF:
-			num_vf_set = true;
-			break;
-		case IFLA_PHYS_PORT_NAME:
-			mlx5_translate_port_name((char *)payload, &info);
-			break;
-		case IFLA_PHYS_SWITCH_ID:
-			info.switch_id = 0;
-			for (i = 0; i < RTA_PAYLOAD(ra); ++i) {
-				info.switch_id <<= 8;
-				info.switch_id |= ((uint8_t *)payload)[i];
-			}
-			switch_id_set = true;
-			break;
-		}
-		off += RTA_ALIGN(ra->rta_len);
-	}
-	if (switch_id_set) {
-		/* We have some E-Switch configuration. */
-		mlx5_nl_check_switch_info(num_vf_set, &info);
-	}
-	assert(!(info.master && info.representor));
-	memcpy(arg, &info, sizeof(info));
-	return 0;
-error:
-	rte_errno = EINVAL;
-	return -rte_errno;
-}
-
-/**
- * Get switch information associated with network interface.
- *
- * @param nl
- *   Netlink socket of the ROUTE kind (NETLINK_ROUTE).
- * @param ifindex
- *   Network interface index.
- * @param[out] info
- *   Switch information object, populated in case of success.
- *
- * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
- */
-int
-mlx5_nl_switch_info(int nl, unsigned int ifindex,
-		    struct mlx5_switch_info *info)
-{
-	struct {
-		struct nlmsghdr nh;
-		struct ifinfomsg info;
-		struct rtattr rta;
-		uint32_t extmask;
-	} req = {
-		.nh = {
-			.nlmsg_len = NLMSG_LENGTH
-					(sizeof(req.info) +
-					 RTA_LENGTH(sizeof(uint32_t))),
-			.nlmsg_type = RTM_GETLINK,
-			.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK,
-		},
-		.info = {
-			.ifi_family = AF_UNSPEC,
-			.ifi_index = ifindex,
-		},
-		.rta = {
-			.rta_type = IFLA_EXT_MASK,
-			.rta_len = RTA_LENGTH(sizeof(int32_t)),
-		},
-		.extmask = RTE_LE32(1),
-	};
-	uint32_t sn = MLX5_NL_SN_GENERATE;
-	int ret;
-
-	ret = mlx5_nl_send(nl, &req.nh, sn);
-	if (ret >= 0)
-		ret = mlx5_nl_recv(nl, sn, mlx5_nl_switch_info_cb, info);
-	if (info->master && info->representor) {
-		DRV_LOG(ERR, "ifindex %u device is recognized as master"
-			     " and as representor", ifindex);
-		rte_errno = ENODEV;
-		ret = -rte_errno;
-	}
-	return ret;
-}
-
-/*
- * Delete VLAN network device by ifindex.
- *
- * @param[in] tcf
- *   Context object initialized by mlx5_nl_vlan_vmwa_init().
- * @param[in] ifindex
- *   Interface index of network device to delete.
- */
-void
-mlx5_nl_vlan_vmwa_delete(struct mlx5_nl_vlan_vmwa_context *vmwa,
-		      uint32_t ifindex)
-{
-	uint32_t sn = MLX5_NL_SN_GENERATE;
-	int ret;
-	struct {
-		struct nlmsghdr nh;
-		struct ifinfomsg info;
-	} req = {
-		.nh = {
-			.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
-			.nlmsg_type = RTM_DELLINK,
-			.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK,
-		},
-		.info = {
-			.ifi_family = AF_UNSPEC,
-			.ifi_index = ifindex,
-		},
-	};
-
-	if (ifindex) {
-		ret = mlx5_nl_send(vmwa->nl_socket, &req.nh, sn);
-		if (ret >= 0)
-			ret = mlx5_nl_recv(vmwa->nl_socket, sn, NULL, NULL);
-		if (ret < 0)
-			DRV_LOG(WARNING, "netlink: error deleting VLAN WA"
-				" ifindex %u, %d", ifindex, ret);
-	}
-}
-
-/* Set of subroutines to build Netlink message. */
-static struct nlattr *
-nl_msg_tail(struct nlmsghdr *nlh)
-{
-	return (struct nlattr *)
-		(((uint8_t *)nlh) + NLMSG_ALIGN(nlh->nlmsg_len));
-}
-
-static void
-nl_attr_put(struct nlmsghdr *nlh, int type, const void *data, int alen)
-{
-	struct nlattr *nla = nl_msg_tail(nlh);
-
-	nla->nla_type = type;
-	nla->nla_len = NLMSG_ALIGN(sizeof(struct nlattr) + alen);
-	nlh->nlmsg_len = NLMSG_ALIGN(nlh->nlmsg_len) + nla->nla_len;
-
-	if (alen)
-		memcpy((uint8_t *)nla + sizeof(struct nlattr), data, alen);
-}
-
-static struct nlattr *
-nl_attr_nest_start(struct nlmsghdr *nlh, int type)
-{
-	struct nlattr *nest = (struct nlattr *)nl_msg_tail(nlh);
-
-	nl_attr_put(nlh, type, NULL, 0);
-	return nest;
-}
-
-static void
-nl_attr_nest_end(struct nlmsghdr *nlh, struct nlattr *nest)
-{
-	nest->nla_len = (uint8_t *)nl_msg_tail(nlh) - (uint8_t *)nest;
-}
-
-/*
- * Create network VLAN device with specified VLAN tag.
- *
- * @param[in] tcf
- *   Context object initialized by mlx5_nl_vlan_vmwa_init().
- * @param[in] ifindex
- *   Base network interface index.
- * @param[in] tag
- *   VLAN tag for VLAN network device to create.
- */
-uint32_t
-mlx5_nl_vlan_vmwa_create(struct mlx5_nl_vlan_vmwa_context *vmwa,
-			 uint32_t ifindex, uint16_t tag)
-{
-	struct nlmsghdr *nlh;
-	struct ifinfomsg *ifm;
-	char name[sizeof(MLX5_VMWA_VLAN_DEVICE_PFX) + 32];
-
-	alignas(RTE_CACHE_LINE_SIZE)
-	uint8_t buf[NLMSG_ALIGN(sizeof(struct nlmsghdr)) +
-		    NLMSG_ALIGN(sizeof(struct ifinfomsg)) +
-		    NLMSG_ALIGN(sizeof(struct nlattr)) * 8 +
-		    NLMSG_ALIGN(sizeof(uint32_t)) +
-		    NLMSG_ALIGN(sizeof(name)) +
-		    NLMSG_ALIGN(sizeof("vlan")) +
-		    NLMSG_ALIGN(sizeof(uint32_t)) +
-		    NLMSG_ALIGN(sizeof(uint16_t)) + 16];
-	struct nlattr *na_info;
-	struct nlattr *na_vlan;
-	uint32_t sn = MLX5_NL_SN_GENERATE;
-	int ret;
-
-	memset(buf, 0, sizeof(buf));
-	nlh = (struct nlmsghdr *)buf;
-	nlh->nlmsg_len = sizeof(struct nlmsghdr);
-	nlh->nlmsg_type = RTM_NEWLINK;
-	nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE |
-			   NLM_F_EXCL | NLM_F_ACK;
-	ifm = (struct ifinfomsg *)nl_msg_tail(nlh);
-	nlh->nlmsg_len += sizeof(struct ifinfomsg);
-	ifm->ifi_family = AF_UNSPEC;
-	ifm->ifi_type = 0;
-	ifm->ifi_index = 0;
-	ifm->ifi_flags = IFF_UP;
-	ifm->ifi_change = 0xffffffff;
-	nl_attr_put(nlh, IFLA_LINK, &ifindex, sizeof(ifindex));
-	ret = snprintf(name, sizeof(name), "%s.%u.%u",
-		       MLX5_VMWA_VLAN_DEVICE_PFX, ifindex, tag);
-	nl_attr_put(nlh, IFLA_IFNAME, name, ret + 1);
-	na_info = nl_attr_nest_start(nlh, IFLA_LINKINFO);
-	nl_attr_put(nlh, IFLA_INFO_KIND, "vlan", sizeof("vlan"));
-	na_vlan = nl_attr_nest_start(nlh, IFLA_INFO_DATA);
-	nl_attr_put(nlh, IFLA_VLAN_ID, &tag, sizeof(tag));
-	nl_attr_nest_end(nlh, na_vlan);
-	nl_attr_nest_end(nlh, na_info);
-	assert(sizeof(buf) >= nlh->nlmsg_len);
-	ret = mlx5_nl_send(vmwa->nl_socket, nlh, sn);
-	if (ret >= 0)
-		ret = mlx5_nl_recv(vmwa->nl_socket, sn, NULL, NULL);
-	if (ret < 0) {
-		DRV_LOG(WARNING, "netlink: VLAN %s create failure (%d)", name,
-			ret);
-	}
-	/* Try to get ifindex of created or pre-existing device. */
-	ret = if_nametoindex(name);
-	if (!ret) {
-		DRV_LOG(WARNING, "VLAN %s failed to get index (%d)", name,
-			errno);
-		return 0;
-	}
-	return ret;
-}
diff --git a/drivers/net/mlx5/mlx5_nl.h b/drivers/net/mlx5/mlx5_nl.h
deleted file mode 100644
index 9be87c0..0000000
--- a/drivers/net/mlx5/mlx5_nl.h
+++ /dev/null
@@ -1,72 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause
- * Copyright 2019 Mellanox Technologies, Ltd
- */
-
-#ifndef RTE_PMD_MLX5_NL_H_
-#define RTE_PMD_MLX5_NL_H_
-
-#include <linux/netlink.h>
-
-
-/* Recognized Infiniband device physical port name types. */
-enum mlx5_nl_phys_port_name_type {
-	MLX5_PHYS_PORT_NAME_TYPE_NOTSET = 0, /* Not set. */
-	MLX5_PHYS_PORT_NAME_TYPE_LEGACY, /* before kernel ver < 5.0 */
-	MLX5_PHYS_PORT_NAME_TYPE_UPLINK, /* p0, kernel ver >= 5.0 */
-	MLX5_PHYS_PORT_NAME_TYPE_PFVF, /* pf0vf0, kernel ver >= 5.0 */
-	MLX5_PHYS_PORT_NAME_TYPE_UNKNOWN, /* Unrecognized. */
-};
-
-/** Switch information returned by mlx5_nl_switch_info(). */
-struct mlx5_switch_info {
-	uint32_t master:1; /**< Master device. */
-	uint32_t representor:1; /**< Representor device. */
-	enum mlx5_nl_phys_port_name_type name_type; /** < Port name type. */
-	int32_t pf_num; /**< PF number (valid for pfxvfx format only). */
-	int32_t port_name; /**< Representor port name. */
-	uint64_t switch_id; /**< Switch identifier. */
-};
-
-/* VLAN netdev for VLAN workaround. */
-struct mlx5_nl_vlan_dev {
-	uint32_t refcnt;
-	uint32_t ifindex; /**< Own interface index. */
-};
-
-/*
- * Array of VLAN devices created on the base of VF
- * used for workaround in virtual environments.
- */
-struct mlx5_nl_vlan_vmwa_context {
-	int nl_socket;
-	uint32_t vf_ifindex;
-	struct mlx5_nl_vlan_dev vlan_dev[4096];
-};
-
-
-int mlx5_nl_init(int protocol);
-int mlx5_nl_mac_addr_add(int nlsk_fd, unsigned int iface_idx, uint64_t *mac_own,
-			 struct rte_ether_addr *mac, uint32_t index);
-int mlx5_nl_mac_addr_remove(int nlsk_fd, unsigned int iface_idx,
-			    uint64_t *mac_own, struct rte_ether_addr *mac,
-			    uint32_t index);
-void mlx5_nl_mac_addr_sync(int nlsk_fd, unsigned int iface_idx,
-			   struct rte_ether_addr *mac_addrs, int n);
-void mlx5_nl_mac_addr_flush(int nlsk_fd, unsigned int iface_idx,
-			    struct rte_ether_addr *mac_addrs, int n,
-			    uint64_t *mac_own);
-int mlx5_nl_promisc(int nlsk_fd, unsigned int iface_idx, int enable);
-int mlx5_nl_allmulti(int nlsk_fd, unsigned int iface_idx, int enable);
-unsigned int mlx5_nl_portnum(int nl, const char *name);
-unsigned int mlx5_nl_ifindex(int nl, const char *name, uint32_t pindex);
-int mlx5_nl_vf_mac_addr_modify(int nlsk_fd, unsigned int iface_idx,
-			       struct rte_ether_addr *mac, int vf_index);
-int mlx5_nl_switch_info(int nl, unsigned int ifindex,
-			struct mlx5_switch_info *info);
-
-void mlx5_nl_vlan_vmwa_delete(struct mlx5_nl_vlan_vmwa_context *vmwa,
-			      uint32_t ifindex);
-uint32_t mlx5_nl_vlan_vmwa_create(struct mlx5_nl_vlan_vmwa_context *vmwa,
-				  uint32_t ifindex, uint16_t tag);
-
-#endif /* RTE_PMD_MLX5_NL_H_ */
diff --git a/drivers/net/mlx5/mlx5_vlan.c b/drivers/net/mlx5/mlx5_vlan.c
index fc1a91c..8e63b67 100644
--- a/drivers/net/mlx5/mlx5_vlan.c
+++ b/drivers/net/mlx5/mlx5_vlan.c
@@ -33,11 +33,11 @@
 
 #include <mlx5_glue.h>
 #include <mlx5_devx_cmds.h>
+#include <mlx5_nl.h>
 
 #include "mlx5.h"
 #include "mlx5_autoconf.h"
 #include "mlx5_rxtx.h"
-#include "mlx5_nl.h"
 #include "mlx5_utils.h"
 
 /**
-- 
1.8.3.1
* [dpdk-dev] [PATCH v2 25/25] common/mlx5: support ROCE disable through Netlink
  2020-01-28 10:05 ` [dpdk-dev] [PATCH v2 00/25] Introduce mlx5 common library Matan Azrad
                     ` (23 preceding siblings ...)
  2020-01-28 10:06   ` [dpdk-dev] [PATCH v2 24/25] common/mlx5: share Netlink commands Matan Azrad
@ 2020-01-28 10:06   ` Matan Azrad
  2020-01-28 16:27   ` [dpdk-dev] [PATCH v3 00/25] Introduce mlx5 common library Matan Azrad
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-28 10:06 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
Add 4 new Netlink commands to support enabling/disabling ROCE:
        1. mlx5_nl_devlink_family_id_get - to get the Devlink family ID
           through a generic Netlink command.
        2. mlx5_nl_enable_roce_get - to get the current ROCE status.
        3. mlx5_nl_driver_reload - to reload the device kernel driver.
        4. mlx5_nl_enable_roce_set - to set the ROCE status.
When the user changes the ROCE status, the IB device may disappear and
appear again, so the DPDK driver should wait for it and restart itself.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/common/mlx5/Makefile                    |   5 +
 drivers/common/mlx5/meson.build                 |   1 +
 drivers/common/mlx5/mlx5_nl.c                   | 366 +++++++++++++++++++++++-
 drivers/common/mlx5/mlx5_nl.h                   |   6 +
 drivers/common/mlx5/rte_common_mlx5_version.map |   4 +
 5 files changed, 380 insertions(+), 2 deletions(-)
diff --git a/drivers/common/mlx5/Makefile b/drivers/common/mlx5/Makefile
index 6a14b7d..9d4d81f 100644
--- a/drivers/common/mlx5/Makefile
+++ b/drivers/common/mlx5/Makefile
@@ -260,6 +260,11 @@ mlx5_autoconf.h.new: $(RTE_SDK)/buildtools/auto-config-h.sh
 		enum IFLA_PHYS_PORT_NAME \
 		$(AUTOCONF_OUTPUT)
 	$Q sh -- '$<' '$@' \
+		HAVE_DEVLINK \
+		linux/devlink.h \
+		define DEVLINK_GENL_NAME \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
 		HAVE_SUPPORTED_40000baseKR4_Full \
 		/usr/include/linux/ethtool.h \
 		define SUPPORTED_40000baseKR4_Full \
diff --git a/drivers/common/mlx5/meson.build b/drivers/common/mlx5/meson.build
index 73aa8af..2bad1c1 100644
--- a/drivers/common/mlx5/meson.build
+++ b/drivers/common/mlx5/meson.build
@@ -168,6 +168,7 @@ if build
 		'RDMA_NLDEV_ATTR_NDEV_INDEX' ],
 		[ 'HAVE_MLX5_DR_FLOW_DUMP', 'infiniband/mlx5dv.h',
 		'mlx5dv_dump_dr_domain'],
+		[ 'HAVE_DEVLINK', 'linux/devlink.h', 'DEVLINK_GENL_NAME' ],
 	]
 	config = configuration_data()
 	foreach arg:has_sym_args
diff --git a/drivers/common/mlx5/mlx5_nl.c b/drivers/common/mlx5/mlx5_nl.c
index b4fc053..0d1efd2 100644
--- a/drivers/common/mlx5/mlx5_nl.c
+++ b/drivers/common/mlx5/mlx5_nl.c
@@ -6,6 +6,7 @@
 #include <errno.h>
 #include <linux/if_link.h>
 #include <linux/rtnetlink.h>
+#include <linux/genetlink.h>
 #include <net/if.h>
 #include <rdma/rdma_netlink.h>
 #include <stdbool.h>
@@ -22,6 +23,10 @@
 
 #include "mlx5_nl.h"
 #include "mlx5_common_utils.h"
+#ifdef HAVE_DEVLINK
+#include <linux/devlink.h>
+#endif
+
 
 /* Size of the buffer to receive kernel messages */
 #define MLX5_NL_BUF_SIZE (32 * 1024)
@@ -90,6 +95,59 @@
 #define IFLA_PHYS_PORT_NAME 38
 #endif
 
+/*
+ * Some Devlink defines may be missing in old kernel versions,
+ * adjust used defines.
+ */
+#ifndef DEVLINK_GENL_NAME
+#define DEVLINK_GENL_NAME "devlink"
+#endif
+#ifndef DEVLINK_GENL_VERSION
+#define DEVLINK_GENL_VERSION 1
+#endif
+#ifndef DEVLINK_ATTR_BUS_NAME
+#define DEVLINK_ATTR_BUS_NAME 1
+#endif
+#ifndef DEVLINK_ATTR_DEV_NAME
+#define DEVLINK_ATTR_DEV_NAME 2
+#endif
+#ifndef DEVLINK_ATTR_PARAM
+#define DEVLINK_ATTR_PARAM 80
+#endif
+#ifndef DEVLINK_ATTR_PARAM_NAME
+#define DEVLINK_ATTR_PARAM_NAME 81
+#endif
+#ifndef DEVLINK_ATTR_PARAM_TYPE
+#define DEVLINK_ATTR_PARAM_TYPE 83
+#endif
+#ifndef DEVLINK_ATTR_PARAM_VALUES_LIST
+#define DEVLINK_ATTR_PARAM_VALUES_LIST 84
+#endif
+#ifndef DEVLINK_ATTR_PARAM_VALUE
+#define DEVLINK_ATTR_PARAM_VALUE 85
+#endif
+#ifndef DEVLINK_ATTR_PARAM_VALUE_DATA
+#define DEVLINK_ATTR_PARAM_VALUE_DATA 86
+#endif
+#ifndef DEVLINK_ATTR_PARAM_VALUE_CMODE
+#define DEVLINK_ATTR_PARAM_VALUE_CMODE 87
+#endif
+#ifndef DEVLINK_PARAM_CMODE_DRIVERINIT
+#define DEVLINK_PARAM_CMODE_DRIVERINIT 1
+#endif
+#ifndef DEVLINK_CMD_RELOAD
+#define DEVLINK_CMD_RELOAD 37
+#endif
+#ifndef DEVLINK_CMD_PARAM_GET
+#define DEVLINK_CMD_PARAM_GET 38
+#endif
+#ifndef DEVLINK_CMD_PARAM_SET
+#define DEVLINK_CMD_PARAM_SET 39
+#endif
+#ifndef NLA_FLAG
+#define NLA_FLAG 6
+#endif
+
 /* Add/remove MAC address through Netlink */
 struct mlx5_nl_mac_addr {
 	struct rte_ether_addr (*mac)[];
@@ -1241,8 +1299,8 @@ struct mlx5_nl_ifindex_data {
 	struct nlattr *nla = nl_msg_tail(nlh);
 
 	nla->nla_type = type;
-	nla->nla_len = NLMSG_ALIGN(sizeof(struct nlattr) + alen);
-	nlh->nlmsg_len = NLMSG_ALIGN(nlh->nlmsg_len) + nla->nla_len;
+	nla->nla_len = NLMSG_ALIGN(sizeof(struct nlattr)) + alen;
+	nlh->nlmsg_len += NLMSG_ALIGN(nla->nla_len);
 
 	if (alen)
 		memcpy((uint8_t *)nla + sizeof(struct nlattr), data, alen);
@@ -1335,3 +1393,307 @@ struct mlx5_nl_ifindex_data {
 	}
 	return ret;
 }
+
+/**
+ * Parse Netlink message to retrieve the general family ID.
+ *
+ * @param nh
+ *   Pointer to Netlink Message Header.
+ * @param arg
+ *   PMD data registered with this callback.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_nl_family_id_cb(struct nlmsghdr *nh, void *arg)
+{
+
+	struct nlattr *tail = RTE_PTR_ADD(nh, nh->nlmsg_len);
+	struct nlattr *nla = RTE_PTR_ADD(nh, NLMSG_ALIGN(sizeof(*nh)) +
+					NLMSG_ALIGN(sizeof(struct genlmsghdr)));
+
+	for (; nla < tail && nla->nla_len;
+	     nla = RTE_PTR_ADD(nla, NLMSG_ALIGN(nla->nla_len))) {
+		if (nla->nla_type == CTRL_ATTR_FAMILY_ID) {
+			*(uint16_t *)arg = *(uint16_t *)(nla + 1);
+			return 0;
+		}
+	}
+	return -EINVAL;
+}
+
+#define MLX5_NL_MAX_ATTR_SIZE 100
+/**
+ * Get generic netlink family ID.
+ *
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] name
+ *   The family name.
+ *
+ * @return
+ *   ID >= 0 on success, a negative errno value otherwise and rte_errno
+ *   is set.
+ */
+static int
+mlx5_nl_generic_family_id_get(int nlsk_fd, const char *name)
+{
+	struct nlmsghdr *nlh;
+	struct genlmsghdr *genl;
+	uint32_t sn = MLX5_NL_SN_GENERATE;
+	int name_size = strlen(name) + 1;
+	int ret;
+	uint16_t id = -1;
+	uint8_t buf[NLMSG_ALIGN(sizeof(struct nlmsghdr)) +
+		    NLMSG_ALIGN(sizeof(struct genlmsghdr)) +
+		    NLMSG_ALIGN(sizeof(struct nlattr)) +
+		    NLMSG_ALIGN(MLX5_NL_MAX_ATTR_SIZE)];
+
+	memset(buf, 0, sizeof(buf));
+	nlh = (struct nlmsghdr *)buf;
+	nlh->nlmsg_len = sizeof(struct nlmsghdr);
+	nlh->nlmsg_type = GENL_ID_CTRL;
+	nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
+	genl = (struct genlmsghdr *)nl_msg_tail(nlh);
+	nlh->nlmsg_len += sizeof(struct genlmsghdr);
+	genl->cmd = CTRL_CMD_GETFAMILY;
+	genl->version = 1;
+	nl_attr_put(nlh, CTRL_ATTR_FAMILY_NAME, name, name_size);
+	ret = mlx5_nl_send(nlsk_fd, nlh, sn);
+	if (ret >= 0)
+		ret = mlx5_nl_recv(nlsk_fd, sn, mlx5_nl_family_id_cb, &id);
+	if (ret < 0) {
+		DRV_LOG(DEBUG, "Failed to get Netlink %s family ID: %d.", name,
+			ret);
+		return ret;
+	}
+	DRV_LOG(DEBUG, "Netlink \"%s\" family ID is %u.", name, id);
+	return (int)id;
+}
+
+/**
+ * Get Devlink family ID.
+ *
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ *
+ * @return
+ *   ID >= 0 on success, a negative errno value otherwise and rte_errno
+ *   is set.
+ */
+
+int
+mlx5_nl_devlink_family_id_get(int nlsk_fd)
+{
+	return mlx5_nl_generic_family_id_get(nlsk_fd, DEVLINK_GENL_NAME);
+}
+
+/**
+ * Parse Netlink message to retrieve the ROCE enable status.
+ *
+ * @param nh
+ *   Pointer to Netlink Message Header.
+ * @param arg
+ *   PMD data registered with this callback.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_nl_roce_cb(struct nlmsghdr *nh, void *arg)
+{
+
+	int ret = -EINVAL;
+	int *enable = arg;
+	struct nlattr *tail = RTE_PTR_ADD(nh, nh->nlmsg_len);
+	struct nlattr *nla = RTE_PTR_ADD(nh, NLMSG_ALIGN(sizeof(*nh)) +
+					NLMSG_ALIGN(sizeof(struct genlmsghdr)));
+
+	while (nla < tail && nla->nla_len) {
+		switch (nla->nla_type) {
+		/* Expected nested attributes case. */
+		case DEVLINK_ATTR_PARAM:
+		case DEVLINK_ATTR_PARAM_VALUES_LIST:
+		case DEVLINK_ATTR_PARAM_VALUE:
+			ret = 0;
+			nla += 1;
+			break;
+		case DEVLINK_ATTR_PARAM_VALUE_DATA:
+			*enable = 1;
+			return 0;
+		default:
+			nla = RTE_PTR_ADD(nla, NLMSG_ALIGN(nla->nla_len));
+		}
+	}
+	*enable = 0;
+	return ret;
+}
+
+/**
+ * Get ROCE enable status through Netlink.
+ *
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] family_id
+ *   the Devlink family ID.
+ * @param pci_addr
+ *   The device PCI address.
+ * @param[out] enable
+ *   Where to store the enable status.
+ *
+ * @return
+ *   0 on success and @p enable is updated, a negative errno value otherwise
+ *   and rte_errno is set.
+ */
+int
+mlx5_nl_enable_roce_get(int nlsk_fd, int family_id, const char *pci_addr,
+			int *enable)
+{
+	struct nlmsghdr *nlh;
+	struct genlmsghdr *genl;
+	uint32_t sn = MLX5_NL_SN_GENERATE;
+	int ret;
+	int cur_en;
+	uint8_t buf[NLMSG_ALIGN(sizeof(struct nlmsghdr)) +
+		    NLMSG_ALIGN(sizeof(struct genlmsghdr)) +
+		    NLMSG_ALIGN(sizeof(struct nlattr)) * 4 +
+		    NLMSG_ALIGN(MLX5_NL_MAX_ATTR_SIZE) * 4];
+
+	memset(buf, 0, sizeof(buf));
+	nlh = (struct nlmsghdr *)buf;
+	nlh->nlmsg_len = sizeof(struct nlmsghdr);
+	nlh->nlmsg_type = family_id;
+	nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
+	genl = (struct genlmsghdr *)nl_msg_tail(nlh);
+	nlh->nlmsg_len += sizeof(struct genlmsghdr);
+	genl->cmd = DEVLINK_CMD_PARAM_GET;
+	genl->version = DEVLINK_GENL_VERSION;
+	nl_attr_put(nlh, DEVLINK_ATTR_BUS_NAME, "pci", 4);
+	nl_attr_put(nlh, DEVLINK_ATTR_DEV_NAME, pci_addr, strlen(pci_addr) + 1);
+	nl_attr_put(nlh, DEVLINK_ATTR_PARAM_NAME, "enable_roce", 12);
+	ret = mlx5_nl_send(nlsk_fd, nlh, sn);
+	if (ret >= 0)
+		ret = mlx5_nl_recv(nlsk_fd, sn, mlx5_nl_roce_cb, &cur_en);
+	if (ret < 0) {
+		DRV_LOG(DEBUG, "Failed to get ROCE enable on device %s: %d.",
+			pci_addr, ret);
+		return ret;
+	}
+	*enable = cur_en;
+	DRV_LOG(DEBUG, "ROCE is %sabled for device \"%s\".",
+		cur_en ? "en" : "dis", pci_addr);
+	return ret;
+}
+
+/**
+ * Reload mlx5 device kernel driver through Netlink.
+ *
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] family_id
+ *   The Devlink family ID.
+ * @param[in] pci_addr
+ *   The device PCI address.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_nl_driver_reload(int nlsk_fd, int family_id, const char *pci_addr)
+{
+	struct nlmsghdr *nlh;
+	struct genlmsghdr *genl;
+	uint32_t sn = MLX5_NL_SN_GENERATE;
+	int ret;
+	uint8_t buf[NLMSG_ALIGN(sizeof(struct nlmsghdr)) +
+		    NLMSG_ALIGN(sizeof(struct genlmsghdr)) +
+		    NLMSG_ALIGN(sizeof(struct nlattr)) * 2 +
+		    NLMSG_ALIGN(MLX5_NL_MAX_ATTR_SIZE) * 2];
+
+	memset(buf, 0, sizeof(buf));
+	nlh = (struct nlmsghdr *)buf;
+	nlh->nlmsg_len = sizeof(struct nlmsghdr);
+	nlh->nlmsg_type = family_id;
+	nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
+	genl = (struct genlmsghdr *)nl_msg_tail(nlh);
+	nlh->nlmsg_len += sizeof(struct genlmsghdr);
+	genl->cmd = DEVLINK_CMD_RELOAD;
+	genl->version = DEVLINK_GENL_VERSION;
+	nl_attr_put(nlh, DEVLINK_ATTR_BUS_NAME, "pci", 4);
+	nl_attr_put(nlh, DEVLINK_ATTR_DEV_NAME, pci_addr, strlen(pci_addr) + 1);
+	ret = mlx5_nl_send(nlsk_fd, nlh, sn);
+	if (ret >= 0)
+		ret = mlx5_nl_recv(nlsk_fd, sn, NULL, NULL);
+	if (ret < 0) {
+		DRV_LOG(DEBUG, "Failed to reload %s device by Netlink - %d",
+			pci_addr, ret);
+		return ret;
+	}
+	DRV_LOG(DEBUG, "Device \"%s\" was reloaded by Netlink successfully.",
+		pci_addr);
+	return 0;
+}
+
+/**
+ * Set ROCE enable status through Netlink.
+ *
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] family_id
+ *   The Devlink family ID.
+ * @param[in] pci_addr
+ *   The device PCI address.
+ * @param[in] enable
+ *   The enable status to set.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_nl_enable_roce_set(int nlsk_fd, int family_id, const char *pci_addr,
+			int enable)
+{
+	struct nlmsghdr *nlh;
+	struct genlmsghdr *genl;
+	uint32_t sn = MLX5_NL_SN_GENERATE;
+	int ret;
+	uint8_t buf[NLMSG_ALIGN(sizeof(struct nlmsghdr)) +
+		    NLMSG_ALIGN(sizeof(struct genlmsghdr)) +
+		    NLMSG_ALIGN(sizeof(struct nlattr)) * 6 +
+		    NLMSG_ALIGN(MLX5_NL_MAX_ATTR_SIZE) * 6];
+	uint8_t cmode = DEVLINK_PARAM_CMODE_DRIVERINIT;
+	uint8_t ptype = NLA_FLAG;
+
+	/* Build the devlink PARAM_SET request in a zeroed buffer. */
+	memset(buf, 0, sizeof(buf));
+	nlh = (struct nlmsghdr *)buf;
+	nlh->nlmsg_len = sizeof(struct nlmsghdr);
+	nlh->nlmsg_type = family_id;
+	nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
+	genl = (struct genlmsghdr *)nl_msg_tail(nlh);
+	nlh->nlmsg_len += sizeof(struct genlmsghdr);
+	genl->cmd = DEVLINK_CMD_PARAM_SET;
+	genl->version = DEVLINK_GENL_VERSION;
+	nl_attr_put(nlh, DEVLINK_ATTR_BUS_NAME, "pci", 4);
+	nl_attr_put(nlh, DEVLINK_ATTR_DEV_NAME, pci_addr, strlen(pci_addr) + 1);
+	nl_attr_put(nlh, DEVLINK_ATTR_PARAM_NAME, "enable_roce", 12);
+	nl_attr_put(nlh, DEVLINK_ATTR_PARAM_VALUE_CMODE, &cmode, sizeof(cmode));
+	nl_attr_put(nlh, DEVLINK_ATTR_PARAM_TYPE, &ptype, sizeof(ptype));
+	if (enable)
+		nl_attr_put(nlh, DEVLINK_ATTR_PARAM_VALUE_DATA, NULL, 0);
+	ret = mlx5_nl_send(nlsk_fd, nlh, sn);
+	if (ret >= 0)
+		ret = mlx5_nl_recv(nlsk_fd, sn, NULL, NULL);
+	if (ret < 0) {
+		DRV_LOG(DEBUG, "Failed to %sable ROCE for device %s by Netlink:"
+			" %d.", enable ? "en" : "dis", pci_addr, ret);
+		return ret;
+	}
+	DRV_LOG(DEBUG, "Device %s ROCE was %sabled by Netlink successfully.",
+		pci_addr, enable ? "en" : "dis");
+	/* Now, need to reload the driver. */
+	return mlx5_nl_driver_reload(nlsk_fd, family_id, pci_addr);
+}
diff --git a/drivers/common/mlx5/mlx5_nl.h b/drivers/common/mlx5/mlx5_nl.h
index 8e66a98..2c3f837 100644
--- a/drivers/common/mlx5/mlx5_nl.h
+++ b/drivers/common/mlx5/mlx5_nl.h
@@ -53,5 +53,11 @@ void mlx5_nl_vlan_vmwa_delete(struct mlx5_nl_vlan_vmwa_context *vmwa,
 			      uint32_t ifindex);
 uint32_t mlx5_nl_vlan_vmwa_create(struct mlx5_nl_vlan_vmwa_context *vmwa,
 				  uint32_t ifindex, uint16_t tag);
+int mlx5_nl_devlink_family_id_get(int nlsk_fd);
+int mlx5_nl_enable_roce_get(int nlsk_fd, int family_id, const char *pci_addr,
+			    int *enable);
+int mlx5_nl_driver_reload(int nlsk_fd, int family_id, const char *pci_addr);
+int mlx5_nl_enable_roce_set(int nlsk_fd, int family_id, const char *pci_addr,
+			    int enable);
 
 #endif /* RTE_PMD_MLX5_NL_H_ */
diff --git a/drivers/common/mlx5/rte_common_mlx5_version.map b/drivers/common/mlx5/rte_common_mlx5_version.map
index 34b66a5..ee69f99 100644
--- a/drivers/common/mlx5/rte_common_mlx5_version.map
+++ b/drivers/common/mlx5/rte_common_mlx5_version.map
@@ -28,6 +28,10 @@ DPDK_20.02 {
 	mlx5_dev_to_pci_addr;
 
 	mlx5_nl_allmulti;
+	mlx5_nl_devlink_family_id_get;
+	mlx5_nl_driver_reload;
+	mlx5_nl_enable_roce_get;
+	mlx5_nl_enable_roce_set;
 	mlx5_nl_ifindex;
 	mlx5_nl_init;
 	mlx5_nl_mac_addr_add;
-- 
1.8.3.1
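Note on the devlink helpers in this patch: all three build their request
the same way. A Netlink header is followed by a generic Netlink header and
then by a run of attributes appended with nl_attr_put(), all inside a stack
buffer sized for a fixed number of attributes. The sketch below illustrates
that layout using only definitions from <linux/netlink.h> and
<linux/genetlink.h>. It is not the code from mlx5_nl.c: demo_msg_tail(),
demo_genl_init() and demo_attr_put() are hypothetical names, and the explicit
capacity check is an assumption added for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <linux/netlink.h>
#include <linux/genetlink.h>

/* Return a pointer to the first unused (aligned) byte of the message. */
static void *
demo_msg_tail(struct nlmsghdr *nlh)
{
	return (char *)nlh + NLMSG_ALIGN(nlh->nlmsg_len);
}

/*
 * Start a generic Netlink request in a caller-provided buffer:
 * a zeroed nlmsghdr followed by a genlmsghdr.
 * Returns NULL if the buffer cannot hold both headers.
 */
static struct nlmsghdr *
demo_genl_init(void *buf, size_t cap, uint16_t family_id,
	       uint8_t cmd, uint8_t version)
{
	struct nlmsghdr *nlh = buf;
	struct genlmsghdr *genl;

	if (cap < (size_t)(NLMSG_HDRLEN + GENL_HDRLEN))
		return NULL;
	memset(buf, 0, cap);
	nlh->nlmsg_len = NLMSG_HDRLEN;
	nlh->nlmsg_type = family_id;
	nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
	genl = demo_msg_tail(nlh);
	genl->cmd = cmd;
	genl->version = version;
	nlh->nlmsg_len += GENL_HDRLEN;
	return nlh;
}

/*
 * Append one attribute (header + payload, padded to 4 bytes).
 * Returns 0 on success, -1 if the attribute does not fit in cap bytes.
 * The capacity check is an assumption of this sketch.
 */
static int
demo_attr_put(struct nlmsghdr *nlh, size_t cap, uint16_t type,
	      const void *data, uint16_t alen)
{
	size_t off = NLMSG_ALIGN(nlh->nlmsg_len);
	size_t need = NLA_HDRLEN + NLA_ALIGN(alen);
	struct nlattr *nla;

	if (off + need > cap)
		return -1;
	nla = demo_msg_tail(nlh);
	nla->nla_type = type;
	nla->nla_len = NLA_HDRLEN + alen; /* Unpadded length, per Netlink. */
	if (alen)
		memcpy((char *)nla + NLA_HDRLEN, data, alen);
	nlh->nlmsg_len = off + need;
	return 0;
}
```

With this layout, a request holds NLMSG_HDRLEN + GENL_HDRLEN bytes after
demo_genl_init(), and appending the 4-byte "pci" string grows it by
NLA_HDRLEN + 4. A zero-length attribute, like the flag value sent when
enabling ROCE, adds only the attribute header.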
* [dpdk-dev] [PATCH v3 00/25] Introduce mlx5 common library
  2020-01-28 10:05 ` [dpdk-dev] [PATCH v2 00/25] Introduce mlx5 common library Matan Azrad
                     ` (24 preceding siblings ...)
  2020-01-28 10:06   ` [dpdk-dev] [PATCH v2 25/25] common/mlx5: support ROCE disable through Netlink Matan Azrad
@ 2020-01-28 16:27   ` Matan Azrad
  2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 01/25] net/mlx5: separate DevX commands interface Matan Azrad
                       ` (25 more replies)
  25 siblings, 26 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-28 16:27 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
Steps:
- Prepare net/mlx5 for code sharing.
- Introduce new common lib for mlx5 devices.
- Share code from net/mlx5 to common/mlx5.
v2:
- Split the patches into two series: this is the first one, for the common directory and vDPA preparation;
  the second, for the new vDPA driver, will be sent later.
- Fix spelling and per-patch compilation issues.
- Use claim_zero instead of pure asserts.
- Improve patch titles.
v3:
- Rebase.
Matan Azrad (25):
  net/mlx5: separate DevX commands interface
  drivers: introduce mlx5 common library
  common/mlx5: share the mlx5 glue reference
  common/mlx5: share mlx5 PCI device detection
  common/mlx5: share mlx5 devices information
  common/mlx5: share CQ entry check
  common/mlx5: add query vDPA DevX capabilities
  common/mlx5: glue null memory region allocation
  common/mlx5: support DevX indirect mkey creation
  common/mlx5: glue event queue query
  common/mlx5: glue event interrupt commands
  common/mlx5: glue UAR allocation
  common/mlx5: add DevX command to create CQ
  common/mlx5: glue VAR allocation
  common/mlx5: add DevX virtq commands
  common/mlx5: add support for DevX QP operations
  common/mlx5: allow type configuration for DevX RQT
  common/mlx5: add TIR field constants
  common/mlx5: add DevX command to modify RQT
  common/mlx5: get DevX capability for max RQT size
  net/mlx5: select driver by vDPA device argument
  net/mlx5: separate Netlink command interface
  net/mlx5: reduce Netlink commands dependencies
  common/mlx5: share Netlink commands
  common/mlx5: support ROCE disable through Netlink
 MAINTAINERS                                     |    1 +
 drivers/common/Makefile                         |    4 +
 drivers/common/meson.build                      |    2 +-
 drivers/common/mlx5/Makefile                    |  347 ++++
 drivers/common/mlx5/meson.build                 |  210 ++
 drivers/common/mlx5/mlx5_common.c               |  332 +++
 drivers/common/mlx5/mlx5_common.h               |  214 ++
 drivers/common/mlx5/mlx5_common_utils.h         |   20 +
 drivers/common/mlx5/mlx5_devx_cmds.c            | 1530 ++++++++++++++
 drivers/common/mlx5/mlx5_devx_cmds.h            |  351 ++++
 drivers/common/mlx5/mlx5_glue.c                 | 1296 ++++++++++++
 drivers/common/mlx5/mlx5_glue.h                 |  305 +++
 drivers/common/mlx5/mlx5_nl.c                   | 1699 +++++++++++++++
 drivers/common/mlx5/mlx5_nl.h                   |   63 +
 drivers/common/mlx5/mlx5_prm.h                  | 2542 +++++++++++++++++++++++
 drivers/common/mlx5/rte_common_mlx5_version.map |   50 +
 drivers/net/mlx5/Makefile                       |  307 +--
 drivers/net/mlx5/meson.build                    |  257 +--
 drivers/net/mlx5/mlx5.c                         |  194 +-
 drivers/net/mlx5/mlx5.h                         |  326 +--
 drivers/net/mlx5/mlx5_defs.h                    |    8 -
 drivers/net/mlx5/mlx5_devx_cmds.c               |  969 ---------
 drivers/net/mlx5/mlx5_ethdev.c                  |  161 +-
 drivers/net/mlx5/mlx5_flow.c                    |   12 +-
 drivers/net/mlx5/mlx5_flow.h                    |    3 +-
 drivers/net/mlx5/mlx5_flow_dv.c                 |   12 +-
 drivers/net/mlx5/mlx5_flow_meter.c              |    2 +
 drivers/net/mlx5/mlx5_flow_verbs.c              |    7 +-
 drivers/net/mlx5/mlx5_glue.c                    | 1150 ----------
 drivers/net/mlx5/mlx5_glue.h                    |  264 ---
 drivers/net/mlx5/mlx5_mac.c                     |   16 +-
 drivers/net/mlx5/mlx5_mr.c                      |    3 +-
 drivers/net/mlx5/mlx5_nl.c                      | 1402 -------------
 drivers/net/mlx5/mlx5_prm.h                     | 1888 -----------------
 drivers/net/mlx5/mlx5_rss.c                     |    2 +-
 drivers/net/mlx5/mlx5_rxmode.c                  |   12 +-
 drivers/net/mlx5/mlx5_rxq.c                     |    7 +-
 drivers/net/mlx5/mlx5_rxtx.c                    |    7 +-
 drivers/net/mlx5/mlx5_rxtx.h                    |   46 +-
 drivers/net/mlx5/mlx5_rxtx_vec.c                |    5 +-
 drivers/net/mlx5/mlx5_rxtx_vec.h                |    3 +-
 drivers/net/mlx5/mlx5_rxtx_vec_altivec.h        |    5 +-
 drivers/net/mlx5/mlx5_rxtx_vec_neon.h           |    5 +-
 drivers/net/mlx5/mlx5_rxtx_vec_sse.h            |    5 +-
 drivers/net/mlx5/mlx5_stats.c                   |    5 +-
 drivers/net/mlx5/mlx5_txq.c                     |    7 +-
 drivers/net/mlx5/mlx5_utils.h                   |   79 +-
 drivers/net/mlx5/mlx5_vlan.c                    |  137 +-
 mk/rte.app.mk                                   |    1 +
 49 files changed, 9273 insertions(+), 7000 deletions(-)
 create mode 100644 drivers/common/mlx5/Makefile
 create mode 100644 drivers/common/mlx5/meson.build
 create mode 100644 drivers/common/mlx5/mlx5_common.c
 create mode 100644 drivers/common/mlx5/mlx5_common.h
 create mode 100644 drivers/common/mlx5/mlx5_common_utils.h
 create mode 100644 drivers/common/mlx5/mlx5_devx_cmds.c
 create mode 100644 drivers/common/mlx5/mlx5_devx_cmds.h
 create mode 100644 drivers/common/mlx5/mlx5_glue.c
 create mode 100644 drivers/common/mlx5/mlx5_glue.h
 create mode 100644 drivers/common/mlx5/mlx5_nl.c
 create mode 100644 drivers/common/mlx5/mlx5_nl.h
 create mode 100644 drivers/common/mlx5/mlx5_prm.h
 create mode 100644 drivers/common/mlx5/rte_common_mlx5_version.map
 delete mode 100644 drivers/net/mlx5/mlx5_devx_cmds.c
 delete mode 100644 drivers/net/mlx5/mlx5_glue.c
 delete mode 100644 drivers/net/mlx5/mlx5_glue.h
 delete mode 100644 drivers/net/mlx5/mlx5_nl.c
 delete mode 100644 drivers/net/mlx5/mlx5_prm.h
-- 
1.8.3.1
* [dpdk-dev] [PATCH v3 01/25] net/mlx5: separate DevX commands interface
  2020-01-28 16:27   ` [dpdk-dev] [PATCH v3 00/25] Introduce mlx5 common library Matan Azrad
@ 2020-01-28 16:27     ` Matan Azrad
  2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 02/25] drivers: introduce mlx5 common library Matan Azrad
                       ` (24 subsequent siblings)
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-28 16:27 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
The DevX commands interface is included in the mlx5.h file along with
many other PMD interfaces.
In preparation for sharing the DevX commands between different PMDs,
this patch moves the DevX interface to a new file called mlx5_devx_cmds.h.
It also removes the dependency of the DevX commands on the shared
device structure.
The DevX commands log mechanism is switched from the mlx5 driver log
mechanism to the EAL log mechanism.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
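The decoupling described in the commit message can be sketched as follows.
The flow dump routine stops taking the PMD-private shared structure and
receives only opaque domain handles, so a common layer can host it without
knowing the PMD's types. This is a minimal sketch, not the driver code: every
demo_* name is hypothetical, the domains are plain strings so that the
example runs without rdma-core, and demo_dump_domain() merely stands in for
mlx5_glue->dr_dump_domain(). The sketch returns -EINVAL for a missing RX or
TX domain where the patch uses assert().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the glue call that dumps one domain. */
static int
demo_dump_domain(FILE *file, void *domain)
{
	return fprintf(file, "domain %s\n", (const char *)domain) < 0 ?
	       -EIO : 0;
}

/* Hypothetical PMD-private context; only the PMD knows this type. */
struct demo_shared {
	void *fdb_domain; /* Optional, may be NULL. */
	void *rx_domain;
	void *tx_domain;
};

/* Common-layer routine: depends on opaque handles only. */
static int
demo_flow_dump(void *fdb_domain, void *rx_domain, void *tx_domain,
	       FILE *file)
{
	int ret;

	if (fdb_domain) {
		ret = demo_dump_domain(file, fdb_domain);
		if (ret)
			return ret;
	}
	if (!rx_domain || !tx_domain)
		return -EINVAL;
	ret = demo_dump_domain(file, rx_domain);
	if (ret)
		return ret;
	return demo_dump_domain(file, tx_domain);
}

/* PMD-side caller: unpacks its private structure for the common call. */
static int
demo_pmd_flow_dump(const struct demo_shared *sh, FILE *file)
{
	return demo_flow_dump(sh->fdb_domain, sh->rx_domain,
			      sh->tx_domain, file);
}
```

The caller-side wrapper mirrors the mlx5_flow.c hunk in this patch, where
the fields of priv->sh are passed individually.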
 drivers/net/mlx5/mlx5.c           |   1 +
 drivers/net/mlx5/mlx5.h           | 221 +-----------------------------------
 drivers/net/mlx5/mlx5_devx_cmds.c |  33 +++---
 drivers/net/mlx5/mlx5_devx_cmds.h | 231 ++++++++++++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_ethdev.c    |   1 +
 drivers/net/mlx5/mlx5_flow.c      |   5 +-
 drivers/net/mlx5/mlx5_flow_dv.c   |   1 +
 drivers/net/mlx5/mlx5_rxq.c       |   1 +
 drivers/net/mlx5/mlx5_rxtx.c      |   1 +
 drivers/net/mlx5/mlx5_txq.c       |   1 +
 drivers/net/mlx5/mlx5_vlan.c      |   1 +
 11 files changed, 263 insertions(+), 234 deletions(-)
 create mode 100644 drivers/net/mlx5/mlx5_devx_cmds.h
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 2049370..7126edf 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -46,6 +46,7 @@
 #include "mlx5_glue.h"
 #include "mlx5_mr.h"
 #include "mlx5_flow.h"
+#include "mlx5_devx_cmds.h"
 
 /* Device parameter to enable RX completion queue compression. */
 #define MLX5_RXQ_CQE_COMP_EN "rxq_cqe_comp_en"
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 5818349..4d0485d 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -38,6 +38,7 @@
 #include "mlx5_defs.h"
 #include "mlx5_glue.h"
 #include "mlx5_prm.h"
+#include "mlx5_devx_cmds.h"
 
 enum {
 	PCI_VENDOR_ID_MELLANOX = 0x15b3,
@@ -156,62 +157,6 @@ struct mlx5_stats_ctrl {
 	uint64_t imissed_base;
 };
 
-/* devX creation object */
-struct mlx5_devx_obj {
-	struct mlx5dv_devx_obj *obj; /* The DV object. */
-	int id; /* The object ID. */
-};
-
-struct mlx5_devx_mkey_attr {
-	uint64_t addr;
-	uint64_t size;
-	uint32_t umem_id;
-	uint32_t pd;
-};
-
-/* HCA qos attributes. */
-struct mlx5_hca_qos_attr {
-	uint32_t sup:1;	/* Whether QOS is supported. */
-	uint32_t srtcm_sup:1; /* Whether srTCM mode is supported. */
-	uint32_t flow_meter_reg_share:1;
-	/* Whether reg_c share is supported. */
-	uint8_t log_max_flow_meter;
-	/* Power of the maximum supported meters. */
-	uint8_t flow_meter_reg_c_ids;
-	/* Bitmap of the reg_Cs available for flow meter to use. */
-
-};
-
-/* HCA supports this number of time periods for LRO. */
-#define MLX5_LRO_NUM_SUPP_PERIODS 4
-
-/* HCA attributes. */
-struct mlx5_hca_attr {
-	uint32_t eswitch_manager:1;
-	uint32_t flow_counters_dump:1;
-	uint8_t flow_counter_bulk_alloc_bitmap;
-	uint32_t eth_net_offloads:1;
-	uint32_t eth_virt:1;
-	uint32_t wqe_vlan_insert:1;
-	uint32_t wqe_inline_mode:2;
-	uint32_t vport_inline_mode:3;
-	uint32_t tunnel_stateless_geneve_rx:1;
-	uint32_t geneve_max_opt_len:1; /* 0x0: 14DW, 0x1: 63DW */
-	uint32_t tunnel_stateless_gtp:1;
-	uint32_t lro_cap:1;
-	uint32_t tunnel_lro_gre:1;
-	uint32_t tunnel_lro_vxlan:1;
-	uint32_t lro_max_msg_sz_mode:2;
-	uint32_t lro_timer_supported_periods[MLX5_LRO_NUM_SUPP_PERIODS];
-	uint32_t flex_parser_protocols;
-	uint32_t hairpin:1;
-	uint32_t log_max_hairpin_queues:5;
-	uint32_t log_max_hairpin_wq_data_sz:5;
-	uint32_t log_max_hairpin_num_packets:5;
-	uint32_t vhca_id:16;
-	struct mlx5_hca_qos_attr qos;
-};
-
 /* Flow list . */
 TAILQ_HEAD(mlx5_flows, rte_flow);
 
@@ -291,133 +236,6 @@ struct mlx5_dev_config {
 	struct mlx5_lro_config lro; /* LRO configuration. */
 };
 
-struct mlx5_devx_wq_attr {
-	uint32_t wq_type:4;
-	uint32_t wq_signature:1;
-	uint32_t end_padding_mode:2;
-	uint32_t cd_slave:1;
-	uint32_t hds_skip_first_sge:1;
-	uint32_t log2_hds_buf_size:3;
-	uint32_t page_offset:5;
-	uint32_t lwm:16;
-	uint32_t pd:24;
-	uint32_t uar_page:24;
-	uint64_t dbr_addr;
-	uint32_t hw_counter;
-	uint32_t sw_counter;
-	uint32_t log_wq_stride:4;
-	uint32_t log_wq_pg_sz:5;
-	uint32_t log_wq_sz:5;
-	uint32_t dbr_umem_valid:1;
-	uint32_t wq_umem_valid:1;
-	uint32_t log_hairpin_num_packets:5;
-	uint32_t log_hairpin_data_sz:5;
-	uint32_t single_wqe_log_num_of_strides:4;
-	uint32_t two_byte_shift_en:1;
-	uint32_t single_stride_log_num_of_bytes:3;
-	uint32_t dbr_umem_id;
-	uint32_t wq_umem_id;
-	uint64_t wq_umem_offset;
-};
-
-/* Create RQ attributes structure, used by create RQ operation. */
-struct mlx5_devx_create_rq_attr {
-	uint32_t rlky:1;
-	uint32_t delay_drop_en:1;
-	uint32_t scatter_fcs:1;
-	uint32_t vsd:1;
-	uint32_t mem_rq_type:4;
-	uint32_t state:4;
-	uint32_t flush_in_error_en:1;
-	uint32_t hairpin:1;
-	uint32_t user_index:24;
-	uint32_t cqn:24;
-	uint32_t counter_set_id:8;
-	uint32_t rmpn:24;
-	struct mlx5_devx_wq_attr wq_attr;
-};
-
-/* Modify RQ attributes structure, used by modify RQ operation. */
-struct mlx5_devx_modify_rq_attr {
-	uint32_t rqn:24;
-	uint32_t rq_state:4; /* Current RQ state. */
-	uint32_t state:4; /* Required RQ state. */
-	uint32_t scatter_fcs:1;
-	uint32_t vsd:1;
-	uint32_t counter_set_id:8;
-	uint32_t hairpin_peer_sq:24;
-	uint32_t hairpin_peer_vhca:16;
-	uint64_t modify_bitmask;
-	uint32_t lwm:16; /* Contained WQ lwm. */
-};
-
-struct mlx5_rx_hash_field_select {
-	uint32_t l3_prot_type:1;
-	uint32_t l4_prot_type:1;
-	uint32_t selected_fields:30;
-};
-
-/* TIR attributes structure, used by TIR operations. */
-struct mlx5_devx_tir_attr {
-	uint32_t disp_type:4;
-	uint32_t lro_timeout_period_usecs:16;
-	uint32_t lro_enable_mask:4;
-	uint32_t lro_max_msg_sz:8;
-	uint32_t inline_rqn:24;
-	uint32_t rx_hash_symmetric:1;
-	uint32_t tunneled_offload_en:1;
-	uint32_t indirect_table:24;
-	uint32_t rx_hash_fn:4;
-	uint32_t self_lb_block:2;
-	uint32_t transport_domain:24;
-	uint32_t rx_hash_toeplitz_key[10];
-	struct mlx5_rx_hash_field_select rx_hash_field_selector_outer;
-	struct mlx5_rx_hash_field_select rx_hash_field_selector_inner;
-};
-
-/* RQT attributes structure, used by RQT operations. */
-struct mlx5_devx_rqt_attr {
-	uint32_t rqt_max_size:16;
-	uint32_t rqt_actual_size:16;
-	uint32_t rq_list[];
-};
-
-/* TIS attributes structure. */
-struct mlx5_devx_tis_attr {
-	uint32_t strict_lag_tx_port_affinity:1;
-	uint32_t tls_en:1;
-	uint32_t lag_tx_port_affinity:4;
-	uint32_t prio:4;
-	uint32_t transport_domain:24;
-};
-
-/* SQ attributes structure, used by SQ create operation. */
-struct mlx5_devx_create_sq_attr {
-	uint32_t rlky:1;
-	uint32_t cd_master:1;
-	uint32_t fre:1;
-	uint32_t flush_in_error_en:1;
-	uint32_t allow_multi_pkt_send_wqe:1;
-	uint32_t min_wqe_inline_mode:3;
-	uint32_t state:4;
-	uint32_t reg_umr:1;
-	uint32_t allow_swp:1;
-	uint32_t hairpin:1;
-	uint32_t user_index:24;
-	uint32_t cqn:24;
-	uint32_t packet_pacing_rate_limit_index:16;
-	uint32_t tis_lst_sz:16;
-	uint32_t tis_num:24;
-	struct mlx5_devx_wq_attr wq_attr;
-};
-
-/* SQ attributes structure, used by SQ modify operation. */
-struct mlx5_devx_modify_sq_attr {
-	uint32_t sq_state:4;
-	uint32_t state:4;
-	uint32_t hairpin_peer_rq:24;
-	uint32_t hairpin_peer_vhca:16;
-};
 
 /**
  * Type of object being allocated.
@@ -1026,43 +844,6 @@ void mlx5_vlan_vmwa_release(struct rte_eth_dev *dev,
 void mlx5_vlan_vmwa_acquire(struct rte_eth_dev *dev,
 			    struct mlx5_vf_vlan *vf_vlan);
 
-/* mlx5_devx_cmds.c */
-
-struct mlx5_devx_obj *mlx5_devx_cmd_flow_counter_alloc(struct ibv_context *ctx,
-						       uint32_t bulk_sz);
-int mlx5_devx_cmd_destroy(struct mlx5_devx_obj *obj);
-int mlx5_devx_cmd_flow_counter_query(struct mlx5_devx_obj *dcs,
-				     int clear, uint32_t n_counters,
-				     uint64_t *pkts, uint64_t *bytes,
-				     uint32_t mkey, void *addr,
-				     struct mlx5dv_devx_cmd_comp *cmd_comp,
-				     uint64_t async_id);
-int mlx5_devx_cmd_query_hca_attr(struct ibv_context *ctx,
-				 struct mlx5_hca_attr *attr);
-struct mlx5_devx_obj *mlx5_devx_cmd_mkey_create(struct ibv_context *ctx,
-					     struct mlx5_devx_mkey_attr *attr);
-int mlx5_devx_get_out_command_status(void *out);
-int mlx5_devx_cmd_qp_query_tis_td(struct ibv_qp *qp, uint32_t tis_num,
-				  uint32_t *tis_td);
-struct mlx5_devx_obj *mlx5_devx_cmd_create_rq(struct ibv_context *ctx,
-				struct mlx5_devx_create_rq_attr *rq_attr,
-				int socket);
-int mlx5_devx_cmd_modify_rq(struct mlx5_devx_obj *rq,
-			    struct mlx5_devx_modify_rq_attr *rq_attr);
-struct mlx5_devx_obj *mlx5_devx_cmd_create_tir(struct ibv_context *ctx,
-					struct mlx5_devx_tir_attr *tir_attr);
-struct mlx5_devx_obj *mlx5_devx_cmd_create_rqt(struct ibv_context *ctx,
-					struct mlx5_devx_rqt_attr *rqt_attr);
-struct mlx5_devx_obj *mlx5_devx_cmd_create_sq
-	(struct ibv_context *ctx, struct mlx5_devx_create_sq_attr *sq_attr);
-int mlx5_devx_cmd_modify_sq
-	(struct mlx5_devx_obj *sq, struct mlx5_devx_modify_sq_attr *sq_attr);
-struct mlx5_devx_obj *mlx5_devx_cmd_create_tis
-	(struct ibv_context *ctx, struct mlx5_devx_tis_attr *tis_attr);
-struct mlx5_devx_obj *mlx5_devx_cmd_create_td(struct ibv_context *ctx);
-
-int mlx5_devx_cmd_flow_dump(struct mlx5_ibv_shared *sh, FILE *file);
-
 /* mlx5_flow_meter.c */
 
 int mlx5_flow_meter_ops_get(struct rte_eth_dev *dev, void *arg);
diff --git a/drivers/net/mlx5/mlx5_devx_cmds.c b/drivers/net/mlx5/mlx5_devx_cmds.c
index 282d501..62ca590 100644
--- a/drivers/net/mlx5/mlx5_devx_cmds.c
+++ b/drivers/net/mlx5/mlx5_devx_cmds.c
@@ -1,13 +1,15 @@
 // SPDX-License-Identifier: BSD-3-Clause
 /* Copyright 2018 Mellanox Technologies, Ltd */
 
+#include <unistd.h>
+
 #include <rte_flow_driver.h>
 #include <rte_malloc.h>
-#include <unistd.h>
 
-#include "mlx5.h"
-#include "mlx5_glue.h"
 #include "mlx5_prm.h"
+#include "mlx5_devx_cmds.h"
+#include "mlx5_utils.h"
+
 
 /**
  * Allocate flow counters via devx interface.
@@ -936,8 +938,12 @@ struct mlx5_devx_obj *
 /**
  * Dump all flows to file.
  *
- * @param[in] sh
- *   Pointer to context.
+ * @param[in] fdb_domain
+ *   FDB domain.
+ * @param[in] rx_domain
+ *   RX domain.
+ * @param[in] tx_domain
+ *   TX domain.
  * @param[out] file
  *   Pointer to file stream.
  *
@@ -945,23 +951,24 @@ struct mlx5_devx_obj *
  *   0 on success, a nagative value otherwise.
  */
 int
-mlx5_devx_cmd_flow_dump(struct mlx5_ibv_shared *sh __rte_unused,
-			FILE *file __rte_unused)
+mlx5_devx_cmd_flow_dump(void *fdb_domain __rte_unused,
+			void *rx_domain __rte_unused,
+			void *tx_domain __rte_unused, FILE *file __rte_unused)
 {
 	int ret = 0;
 
 #ifdef HAVE_MLX5_DR_FLOW_DUMP
-	if (sh->fdb_domain) {
-		ret = mlx5_glue->dr_dump_domain(file, sh->fdb_domain);
+	if (fdb_domain) {
+		ret = mlx5_glue->dr_dump_domain(file, fdb_domain);
 		if (ret)
 			return ret;
 	}
-	assert(sh->rx_domain);
-	ret = mlx5_glue->dr_dump_domain(file, sh->rx_domain);
+	assert(rx_domain);
+	ret = mlx5_glue->dr_dump_domain(file, rx_domain);
 	if (ret)
 		return ret;
-	assert(sh->tx_domain);
-	ret = mlx5_glue->dr_dump_domain(file, sh->tx_domain);
+	assert(tx_domain);
+	ret = mlx5_glue->dr_dump_domain(file, tx_domain);
 #else
 	ret = ENOTSUP;
 #endif
diff --git a/drivers/net/mlx5/mlx5_devx_cmds.h b/drivers/net/mlx5/mlx5_devx_cmds.h
new file mode 100644
index 0000000..2d58d96
--- /dev/null
+++ b/drivers/net/mlx5/mlx5_devx_cmds.h
@@ -0,0 +1,231 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2019 Mellanox Technologies, Ltd
+ */
+
+#ifndef RTE_PMD_MLX5_DEVX_CMDS_H_
+#define RTE_PMD_MLX5_DEVX_CMDS_H_
+
+#include "mlx5_glue.h"
+
+
+/* devX creation object */
+struct mlx5_devx_obj {
+	struct mlx5dv_devx_obj *obj; /* The DV object. */
+	int id; /* The object ID. */
+};
+
+struct mlx5_devx_mkey_attr {
+	uint64_t addr;
+	uint64_t size;
+	uint32_t umem_id;
+	uint32_t pd;
+};
+
+/* HCA qos attributes. */
+struct mlx5_hca_qos_attr {
+	uint32_t sup:1;	/* Whether QOS is supported. */
+	uint32_t srtcm_sup:1; /* Whether srTCM mode is supported. */
+	uint32_t flow_meter_reg_share:1;
+	/* Whether reg_c share is supported. */
+	uint8_t log_max_flow_meter;
+	/* Power of the maximum supported meters. */
+	uint8_t flow_meter_reg_c_ids;
+	/* Bitmap of the reg_Cs available for flow meter to use. */
+
+};
+
+/* HCA supports this number of time periods for LRO. */
+#define MLX5_LRO_NUM_SUPP_PERIODS 4
+
+/* HCA attributes. */
+struct mlx5_hca_attr {
+	uint32_t eswitch_manager:1;
+	uint32_t flow_counters_dump:1;
+	uint8_t flow_counter_bulk_alloc_bitmap;
+	uint32_t eth_net_offloads:1;
+	uint32_t eth_virt:1;
+	uint32_t wqe_vlan_insert:1;
+	uint32_t wqe_inline_mode:2;
+	uint32_t vport_inline_mode:3;
+	uint32_t tunnel_stateless_geneve_rx:1;
+	uint32_t geneve_max_opt_len:1; /* 0x0: 14DW, 0x1: 63DW */
+	uint32_t tunnel_stateless_gtp:1;
+	uint32_t lro_cap:1;
+	uint32_t tunnel_lro_gre:1;
+	uint32_t tunnel_lro_vxlan:1;
+	uint32_t lro_max_msg_sz_mode:2;
+	uint32_t lro_timer_supported_periods[MLX5_LRO_NUM_SUPP_PERIODS];
+	uint32_t flex_parser_protocols;
+	uint32_t hairpin:1;
+	uint32_t log_max_hairpin_queues:5;
+	uint32_t log_max_hairpin_wq_data_sz:5;
+	uint32_t log_max_hairpin_num_packets:5;
+	uint32_t vhca_id:16;
+	struct mlx5_hca_qos_attr qos;
+};
+
+struct mlx5_devx_wq_attr {
+	uint32_t wq_type:4;
+	uint32_t wq_signature:1;
+	uint32_t end_padding_mode:2;
+	uint32_t cd_slave:1;
+	uint32_t hds_skip_first_sge:1;
+	uint32_t log2_hds_buf_size:3;
+	uint32_t page_offset:5;
+	uint32_t lwm:16;
+	uint32_t pd:24;
+	uint32_t uar_page:24;
+	uint64_t dbr_addr;
+	uint32_t hw_counter;
+	uint32_t sw_counter;
+	uint32_t log_wq_stride:4;
+	uint32_t log_wq_pg_sz:5;
+	uint32_t log_wq_sz:5;
+	uint32_t dbr_umem_valid:1;
+	uint32_t wq_umem_valid:1;
+	uint32_t log_hairpin_num_packets:5;
+	uint32_t log_hairpin_data_sz:5;
+	uint32_t single_wqe_log_num_of_strides:4;
+	uint32_t two_byte_shift_en:1;
+	uint32_t single_stride_log_num_of_bytes:3;
+	uint32_t dbr_umem_id;
+	uint32_t wq_umem_id;
+	uint64_t wq_umem_offset;
+};
+
+/* Create RQ attributes structure, used by create RQ operation. */
+struct mlx5_devx_create_rq_attr {
+	uint32_t rlky:1;
+	uint32_t delay_drop_en:1;
+	uint32_t scatter_fcs:1;
+	uint32_t vsd:1;
+	uint32_t mem_rq_type:4;
+	uint32_t state:4;
+	uint32_t flush_in_error_en:1;
+	uint32_t hairpin:1;
+	uint32_t user_index:24;
+	uint32_t cqn:24;
+	uint32_t counter_set_id:8;
+	uint32_t rmpn:24;
+	struct mlx5_devx_wq_attr wq_attr;
+};
+
+/* Modify RQ attributes structure, used by modify RQ operation. */
+struct mlx5_devx_modify_rq_attr {
+	uint32_t rqn:24;
+	uint32_t rq_state:4; /* Current RQ state. */
+	uint32_t state:4; /* Required RQ state. */
+	uint32_t scatter_fcs:1;
+	uint32_t vsd:1;
+	uint32_t counter_set_id:8;
+	uint32_t hairpin_peer_sq:24;
+	uint32_t hairpin_peer_vhca:16;
+	uint64_t modify_bitmask;
+	uint32_t lwm:16; /* Contained WQ lwm. */
+};
+
+struct mlx5_rx_hash_field_select {
+	uint32_t l3_prot_type:1;
+	uint32_t l4_prot_type:1;
+	uint32_t selected_fields:30;
+};
+
+/* TIR attributes structure, used by TIR operations. */
+struct mlx5_devx_tir_attr {
+	uint32_t disp_type:4;
+	uint32_t lro_timeout_period_usecs:16;
+	uint32_t lro_enable_mask:4;
+	uint32_t lro_max_msg_sz:8;
+	uint32_t inline_rqn:24;
+	uint32_t rx_hash_symmetric:1;
+	uint32_t tunneled_offload_en:1;
+	uint32_t indirect_table:24;
+	uint32_t rx_hash_fn:4;
+	uint32_t self_lb_block:2;
+	uint32_t transport_domain:24;
+	uint32_t rx_hash_toeplitz_key[10];
+	struct mlx5_rx_hash_field_select rx_hash_field_selector_outer;
+	struct mlx5_rx_hash_field_select rx_hash_field_selector_inner;
+};
+
+/* RQT attributes structure, used by RQT operations. */
+struct mlx5_devx_rqt_attr {
+	uint32_t rqt_max_size:16;
+	uint32_t rqt_actual_size:16;
+	uint32_t rq_list[];
+};
+
+/* TIS attributes structure. */
+struct mlx5_devx_tis_attr {
+	uint32_t strict_lag_tx_port_affinity:1;
+	uint32_t tls_en:1;
+	uint32_t lag_tx_port_affinity:4;
+	uint32_t prio:4;
+	uint32_t transport_domain:24;
+};
+
+/* SQ attributes structure, used by SQ create operation. */
+struct mlx5_devx_create_sq_attr {
+	uint32_t rlky:1;
+	uint32_t cd_master:1;
+	uint32_t fre:1;
+	uint32_t flush_in_error_en:1;
+	uint32_t allow_multi_pkt_send_wqe:1;
+	uint32_t min_wqe_inline_mode:3;
+	uint32_t state:4;
+	uint32_t reg_umr:1;
+	uint32_t allow_swp:1;
+	uint32_t hairpin:1;
+	uint32_t user_index:24;
+	uint32_t cqn:24;
+	uint32_t packet_pacing_rate_limit_index:16;
+	uint32_t tis_lst_sz:16;
+	uint32_t tis_num:24;
+	struct mlx5_devx_wq_attr wq_attr;
+};
+
+/* SQ attributes structure, used by SQ modify operation. */
+struct mlx5_devx_modify_sq_attr {
+	uint32_t sq_state:4;
+	uint32_t state:4;
+	uint32_t hairpin_peer_rq:24;
+	uint32_t hairpin_peer_vhca:16;
+};
+
+/* mlx5_devx_cmds.c */
+
+struct mlx5_devx_obj *mlx5_devx_cmd_flow_counter_alloc(struct ibv_context *ctx,
+						       uint32_t bulk_sz);
+int mlx5_devx_cmd_destroy(struct mlx5_devx_obj *obj);
+int mlx5_devx_cmd_flow_counter_query(struct mlx5_devx_obj *dcs,
+				     int clear, uint32_t n_counters,
+				     uint64_t *pkts, uint64_t *bytes,
+				     uint32_t mkey, void *addr,
+				     struct mlx5dv_devx_cmd_comp *cmd_comp,
+				     uint64_t async_id);
+int mlx5_devx_cmd_query_hca_attr(struct ibv_context *ctx,
+				 struct mlx5_hca_attr *attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_mkey_create(struct ibv_context *ctx,
+					      struct mlx5_devx_mkey_attr *attr);
+int mlx5_devx_get_out_command_status(void *out);
+int mlx5_devx_cmd_qp_query_tis_td(struct ibv_qp *qp, uint32_t tis_num,
+				  uint32_t *tis_td);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_rq(struct ibv_context *ctx,
+				       struct mlx5_devx_create_rq_attr *rq_attr,
+				       int socket);
+int mlx5_devx_cmd_modify_rq(struct mlx5_devx_obj *rq,
+			    struct mlx5_devx_modify_rq_attr *rq_attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_tir(struct ibv_context *ctx,
+					   struct mlx5_devx_tir_attr *tir_attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_rqt(struct ibv_context *ctx,
+					   struct mlx5_devx_rqt_attr *rqt_attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_sq(struct ibv_context *ctx,
+				      struct mlx5_devx_create_sq_attr *sq_attr);
+int mlx5_devx_cmd_modify_sq(struct mlx5_devx_obj *sq,
+			    struct mlx5_devx_modify_sq_attr *sq_attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_tis(struct ibv_context *ctx,
+					   struct mlx5_devx_tis_attr *tis_attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_td(struct ibv_context *ctx);
+int mlx5_devx_cmd_flow_dump(void *fdb_domain, void *rx_domain, void *tx_domain,
+			    FILE *file);
+#endif /* RTE_PMD_MLX5_DEVX_CMDS_H_ */
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 3b4c5db..ce0109c 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -38,6 +38,7 @@
 
 #include "mlx5.h"
 #include "mlx5_glue.h"
+#include "mlx5_devx_cmds.h"
 #include "mlx5_rxtx.h"
 #include "mlx5_utils.h"
 
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 5c9fea6..983b1c3 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -31,6 +31,7 @@
 #include "mlx5_defs.h"
 #include "mlx5_flow.h"
 #include "mlx5_glue.h"
+#include "mlx5_devx_cmds.h"
 #include "mlx5_prm.h"
 #include "mlx5_rxtx.h"
 
@@ -5692,6 +5693,8 @@ struct mlx5_flow_counter *
 		   struct rte_flow_error *error __rte_unused)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_ibv_shared *sh = priv->sh;
 
-	return mlx5_devx_cmd_flow_dump(priv->sh, file);
+	return mlx5_devx_cmd_flow_dump(sh->fdb_domain, sh->rx_domain,
+				       sh->tx_domain, file);
 }
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index b90734e..653d649 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -32,6 +32,7 @@
 #include "mlx5.h"
 #include "mlx5_defs.h"
 #include "mlx5_glue.h"
+#include "mlx5_devx_cmds.h"
 #include "mlx5_flow.h"
 #include "mlx5_prm.h"
 #include "mlx5_rxtx.h"
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 4092cb7..371b996 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -37,6 +37,7 @@
 #include "mlx5_defs.h"
 #include "mlx5_glue.h"
 #include "mlx5_flow.h"
+#include "mlx5_devx_cmds.h"
 
 /* Default RSS hash key also used for ConnectX-3. */
 uint8_t rss_hash_default_key[] = {
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 5e31f01..5a03556 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -29,6 +29,7 @@
 #include <rte_flow.h>
 
 #include "mlx5.h"
+#include "mlx5_devx_cmds.h"
 #include "mlx5_utils.h"
 #include "mlx5_rxtx.h"
 #include "mlx5_autoconf.h"
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 1a76f6e..5adb4dc 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -34,6 +34,7 @@
 #include "mlx5_rxtx.h"
 #include "mlx5_autoconf.h"
 #include "mlx5_glue.h"
+#include "mlx5_devx_cmds.h"
 
 /**
  * Allocate TX queue elements.
diff --git a/drivers/net/mlx5/mlx5_vlan.c b/drivers/net/mlx5/mlx5_vlan.c
index 5f6554a..feac0f1 100644
--- a/drivers/net/mlx5/mlx5_vlan.c
+++ b/drivers/net/mlx5/mlx5_vlan.c
@@ -30,6 +30,7 @@
 #include "mlx5.h"
 #include "mlx5_autoconf.h"
 #include "mlx5_glue.h"
+#include "mlx5_devx_cmds.h"
 #include "mlx5_rxtx.h"
 #include "mlx5_utils.h"
 
-- 
1.8.3.1
* [dpdk-dev] [PATCH v3 02/25] drivers: introduce mlx5 common library
  2020-01-28 16:27   ` [dpdk-dev] [PATCH v3 00/25] Introduce mlx5 common library Matan Azrad
  2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 01/25] net/mlx5: separate DevX commands interface Matan Azrad
@ 2020-01-28 16:27     ` Matan Azrad
  2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 03/25] common/mlx5: share the mlx5 glue reference Matan Azrad
                       ` (23 subsequent siblings)
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-28 16:27 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
A new Mellanox vDPA PMD will be added to support vDPA operations on
Mellanox adapters.
The vDPA PMD design relies on mlx5_glue and mlx5_devx operations, large
parts of which are shared with the net/mlx5 PMD.
Create a new common library in drivers/common for mlx5 PMDs.
Move mlx5_glue, mlx5_devx_cmds and their dependencies to the new mlx5
common library in drivers/common.
The files mlx5_devx_cmds.c, mlx5_devx_cmds.h, mlx5_glue.c,
mlx5_glue.h and mlx5_prm.h are moved as is from drivers/net/mlx5 to
drivers/common/mlx5.
Share the log mechanism macros.
Also give the common library its own log type so that its log level can
be controlled separately.
Build files, version files and include lines are adjusted accordingly.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
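Note on the log separation described in the commit message: the sketch
below illustrates, under stated assumptions, why a separately registered
log type gives the common library its own level control. It is a
self-contained stand-in and does not use the DPDK rte_log API; every
demo_* name is hypothetical, and the second type name used in the usage
note is chosen only for illustration.

```c
/* Stand-in sketch only: NOT the DPDK rte_log API. All names are hypothetical. */
#include <assert.h>
#include <stdio.h>

#define DEMO_MAX_LOGTYPES 8

/* Lower value means more severe, as in syslog-style level schemes. */
enum demo_level { DEMO_ERR = 4, DEMO_NOTICE = 6, DEMO_DEBUG = 8 };

struct demo_logtype {
	const char *name; /* Hierarchical name, e.g. "pmd.common.mlx5". */
	int level;        /* Most verbose level still emitted. */
};

struct demo_logtype demo_types[DEMO_MAX_LOGTYPES];
int demo_ntypes;

/* Register a named log type; return its id, or -1 when the table is full. */
int
demo_log_register(const char *name)
{
	if (demo_ntypes >= DEMO_MAX_LOGTYPES)
		return -1;
	demo_types[demo_ntypes].name = name;
	demo_types[demo_ntypes].level = DEMO_ERR;
	return demo_ntypes++;
}

/* Set the level of one type only; other types are left untouched. */
int
demo_log_set_level(int type, int level)
{
	if (type < 0 || type >= demo_ntypes)
		return -1;
	demo_types[type].level = level;
	return 0;
}

/* Return 1 if a message of 'level' would be emitted for 'type'. */
int
demo_log_enabled(int type, int level)
{
	if (type < 0 || type >= demo_ntypes)
		return 0;
	return level <= demo_types[type].level;
}

/* Print "name: msg" when enabled; return 0 when the message is filtered. */
int
demo_log(int type, int level, const char *msg)
{
	if (!demo_log_enabled(type, level))
		return 0;
	return printf("%s: %s\n", demo_types[type].name, msg);
}
```

Registering "pmd.common.mlx5" and a second type, then setting the first
to DEMO_NOTICE and the second to DEMO_DEBUG, leaves debug messages
filtered for the common type while they are still emitted for the other
one. That independence is the point of registering a dedicated type.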
 MAINTAINERS                                     |    1 +
 drivers/common/Makefile                         |    4 +
 drivers/common/meson.build                      |    2 +-
 drivers/common/mlx5/Makefile                    |  331 ++++
 drivers/common/mlx5/meson.build                 |  205 +++
 drivers/common/mlx5/mlx5_common.c               |   17 +
 drivers/common/mlx5/mlx5_common.h               |   87 ++
 drivers/common/mlx5/mlx5_common_utils.h         |   20 +
 drivers/common/mlx5/mlx5_devx_cmds.c            |  976 ++++++++++++
 drivers/common/mlx5/mlx5_devx_cmds.h            |  231 +++
 drivers/common/mlx5/mlx5_glue.c                 | 1138 ++++++++++++++
 drivers/common/mlx5/mlx5_glue.h                 |  265 ++++
 drivers/common/mlx5/mlx5_prm.h                  | 1889 +++++++++++++++++++++++
 drivers/common/mlx5/rte_common_mlx5_version.map |   20 +
 drivers/net/mlx5/Makefile                       |  309 +---
 drivers/net/mlx5/meson.build                    |  256 +--
 drivers/net/mlx5/mlx5.c                         |    7 +-
 drivers/net/mlx5/mlx5.h                         |    9 +-
 drivers/net/mlx5/mlx5_devx_cmds.c               |  976 ------------
 drivers/net/mlx5/mlx5_devx_cmds.h               |  231 ---
 drivers/net/mlx5/mlx5_ethdev.c                  |    5 +-
 drivers/net/mlx5/mlx5_flow.c                    |    9 +-
 drivers/net/mlx5/mlx5_flow.h                    |    3 +-
 drivers/net/mlx5/mlx5_flow_dv.c                 |    9 +-
 drivers/net/mlx5/mlx5_flow_meter.c              |    2 +
 drivers/net/mlx5/mlx5_flow_verbs.c              |    7 +-
 drivers/net/mlx5/mlx5_glue.c                    | 1150 --------------
 drivers/net/mlx5/mlx5_glue.h                    |  264 ----
 drivers/net/mlx5/mlx5_mac.c                     |    2 +-
 drivers/net/mlx5/mlx5_mr.c                      |    3 +-
 drivers/net/mlx5/mlx5_prm.h                     | 1888 ----------------------
                      |    2 +-
 drivers/net/mlx5/mlx5_rxq.c                     |    8 +-
 drivers/net/mlx5/mlx5_rxtx.c                    |    7 +-
 drivers/net/mlx5/mlx5_rxtx.h                    |    7 +-
 drivers/net/mlx5/mlx5_rxtx_vec.c                |    5 +-
 drivers/net/mlx5/mlx5_rxtx_vec.h                |    3 +-
 drivers/net/mlx5/mlx5_rxtx_vec_altivec.h        |    5 +-
 drivers/net/mlx5/mlx5_rxtx_vec_neon.h           |    5 +-
 drivers/net/mlx5/mlx5_rxtx_vec_sse.h            |    5 +-
 drivers/net/mlx5/mlx5_stats.c                   |    2 +-
 drivers/net/mlx5/mlx5_txq.c                     |    7 +-
 drivers/net/mlx5/mlx5_utils.h                   |   79 +-
 drivers/net/mlx5/mlx5_vlan.c                    |    5 +-
 mk/rte.app.mk                                   |    1 +
 45 files changed, 5313 insertions(+), 5144 deletions(-)
 create mode 100644 drivers/common/mlx5/Makefile
 create mode 100644 drivers/common/mlx5/meson.build
 create mode 100644 drivers/common/mlx5/mlx5_common.c
 create mode 100644 drivers/common/mlx5/mlx5_common.h
 create mode 100644 drivers/common/mlx5/mlx5_common_utils.h
 create mode 100644 drivers/common/mlx5/mlx5_devx_cmds.c
 create mode 100644 drivers/common/mlx5/mlx5_devx_cmds.h
 create mode 100644 drivers/common/mlx5/mlx5_glue.c
 create mode 100644 drivers/common/mlx5/mlx5_glue.h
 create mode 100644 drivers/common/mlx5/mlx5_prm.h
 create mode 100644 drivers/common/mlx5/rte_common_mlx5_version.map
 delete mode 100644 drivers/net/mlx5/mlx5_devx_cmds.c
 delete mode 100644 drivers/net/mlx5/mlx5_devx_cmds.h
 delete mode 100644 drivers/net/mlx5/mlx5_glue.c
 delete mode 100644 drivers/net/mlx5/mlx5_glue.h
 delete mode 100644 drivers/net/mlx5/mlx5_prm.h
diff --git a/MAINTAINERS b/MAINTAINERS
index 94bccae..150d507 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -737,6 +737,7 @@ M: Matan Azrad <matan@mellanox.com>
 M: Shahaf Shuler <shahafs@mellanox.com>
 M: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
 T: git://dpdk.org/next/dpdk-next-net-mlx
+F: drivers/common/mlx5/
 F: drivers/net/mlx5/
 F: buildtools/options-ibverbs-static.sh
 F: doc/guides/nics/mlx5.rst
diff --git a/drivers/common/Makefile b/drivers/common/Makefile
index 3254c52..4775d4b 100644
--- a/drivers/common/Makefile
+++ b/drivers/common/Makefile
@@ -35,4 +35,8 @@ ifneq (,$(findstring y,$(IAVF-y)))
 DIRS-y += iavf
 endif
 
+ifeq ($(CONFIG_RTE_LIBRTE_MLX5_PMD),y)
+DIRS-y += mlx5
+endif
+
 include $(RTE_SDK)/mk/rte.subdir.mk
diff --git a/drivers/common/meson.build b/drivers/common/meson.build
index fc620f7..ffd06e2 100644
--- a/drivers/common/meson.build
+++ b/drivers/common/meson.build
@@ -2,6 +2,6 @@
 # Copyright(c) 2018 Cavium, Inc
 
 std_deps = ['eal']
-drivers = ['cpt', 'dpaax', 'iavf', 'mvep', 'octeontx', 'octeontx2', 'qat']
+drivers = ['cpt', 'dpaax', 'iavf', 'mlx5', 'mvep', 'octeontx', 'octeontx2', 'qat']
 config_flag_fmt = 'RTE_LIBRTE_@0@_COMMON'
 driver_name_fmt = 'rte_common_@0@'
diff --git a/drivers/common/mlx5/Makefile b/drivers/common/mlx5/Makefile
new file mode 100644
index 0000000..b94d3c0
--- /dev/null
+++ b/drivers/common/mlx5/Makefile
@@ -0,0 +1,331 @@
+#   SPDX-License-Identifier: BSD-3-Clause
+#   Copyright 2019 Mellanox Technologies, Ltd
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# Library name.
+LIB = librte_common_mlx5.a
+LIB_GLUE = $(LIB_GLUE_BASE).$(LIB_GLUE_VERSION)
+LIB_GLUE_BASE = librte_pmd_mlx5_glue.so
+LIB_GLUE_VERSION = 20.02.0
+
+# Sources.
+ifneq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
+SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_glue.c
+endif
+SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_devx_cmds.c
+SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_common.c
+
+ifeq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
+INSTALL-$(CONFIG_RTE_LIBRTE_MLX5_PMD)-lib += $(LIB_GLUE)
+endif
+
+# Basic CFLAGS.
+CFLAGS += -O3
+CFLAGS += -std=c11 -Wall -Wextra
+CFLAGS += -g
+CFLAGS += -I.
+CFLAGS += -D_BSD_SOURCE
+CFLAGS += -D_DEFAULT_SOURCE
+CFLAGS += -D_XOPEN_SOURCE=600
+CFLAGS += $(WERROR_FLAGS)
+CFLAGS += -Wno-strict-prototypes
+ifeq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
+CFLAGS += -DMLX5_GLUE='"$(LIB_GLUE)"'
+CFLAGS += -DMLX5_GLUE_VERSION='"$(LIB_GLUE_VERSION)"'
+CFLAGS_mlx5_glue.o += -fPIC
+LDLIBS += -ldl
+else ifeq ($(CONFIG_RTE_IBVERBS_LINK_STATIC),y)
+LDLIBS += $(shell $(RTE_SDK)/buildtools/options-ibverbs-static.sh)
+else
+LDLIBS += -libverbs -lmlx5
+endif
+
+LDLIBS += -lrte_eal
+
+# A few warnings cannot be avoided in external headers.
+CFLAGS += -Wno-error=cast-qual -DNDEBUG -UPEDANTIC
+
+EXPORT_MAP := rte_common_mlx5_version.map
+
+include $(RTE_SDK)/mk/rte.lib.mk
+
+# Generate and clean-up mlx5_autoconf.h.
+
+export CC CFLAGS CPPFLAGS EXTRA_CFLAGS EXTRA_CPPFLAGS
+export AUTO_CONFIG_CFLAGS = -Wno-error
+
+ifndef V
+AUTOCONF_OUTPUT := >/dev/null
+endif
+
+mlx5_autoconf.h.new: FORCE
+
+mlx5_autoconf.h.new: $(RTE_SDK)/buildtools/auto-config-h.sh
+	$Q $(RM) -f -- '$@'
+	$Q sh -- '$<' '$@' \
+		HAVE_IBV_DEVICE_STRIDING_RQ_SUPPORT \
+		infiniband/mlx5dv.h \
+		enum MLX5DV_CQE_RES_FORMAT_CSUM_STRIDX \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IBV_DEVICE_TUNNEL_SUPPORT \
+		infiniband/mlx5dv.h \
+		enum MLX5DV_CONTEXT_MASK_TUNNEL_OFFLOADS \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IBV_DEVICE_MPLS_SUPPORT \
+		infiniband/verbs.h \
+		enum IBV_FLOW_SPEC_MPLS \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IBV_WQ_FLAGS_PCI_WRITE_END_PADDING \
+		infiniband/verbs.h \
+		enum IBV_WQ_FLAGS_PCI_WRITE_END_PADDING \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IBV_WQ_FLAG_RX_END_PADDING \
+		infiniband/verbs.h \
+		enum IBV_WQ_FLAG_RX_END_PADDING \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IBV_MLX5_MOD_SWP \
+		infiniband/mlx5dv.h \
+		type 'struct mlx5dv_sw_parsing_caps' \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IBV_MLX5_MOD_MPW \
+		infiniband/mlx5dv.h \
+		enum MLX5DV_CONTEXT_FLAGS_MPW_ALLOWED \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IBV_MLX5_MOD_CQE_128B_COMP \
+		infiniband/mlx5dv.h \
+		enum MLX5DV_CONTEXT_FLAGS_CQE_128B_COMP \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IBV_MLX5_MOD_CQE_128B_PAD \
+		infiniband/mlx5dv.h \
+		enum MLX5DV_CQ_INIT_ATTR_FLAGS_CQE_PAD \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IBV_FLOW_DV_SUPPORT \
+		infiniband/mlx5dv.h \
+		func mlx5dv_create_flow_action_packet_reformat \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_MLX5DV_DR \
+		infiniband/mlx5dv.h \
+		enum MLX5DV_DR_DOMAIN_TYPE_NIC_RX \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_MLX5DV_DR_ESWITCH \
+		infiniband/mlx5dv.h \
+		enum MLX5DV_DR_DOMAIN_TYPE_FDB \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_MLX5DV_DR_VLAN \
+		infiniband/mlx5dv.h \
+		func mlx5dv_dr_action_create_push_vlan \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_MLX5DV_DR_DEVX_PORT \
+		infiniband/mlx5dv.h \
+		func mlx5dv_query_devx_port \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IBV_DEVX_OBJ \
+		infiniband/mlx5dv.h \
+		func mlx5dv_devx_obj_create \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IBV_FLOW_DEVX_COUNTERS \
+		infiniband/mlx5dv.h \
+		enum MLX5DV_FLOW_ACTION_COUNTERS_DEVX \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IBV_DEVX_ASYNC \
+		infiniband/mlx5dv.h \
+		func mlx5dv_devx_obj_query_async \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_MLX5DV_DR_ACTION_DEST_DEVX_TIR \
+		infiniband/mlx5dv.h \
+		func mlx5dv_dr_action_create_dest_devx_tir \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_MLX5_DR_CREATE_ACTION_FLOW_METER \
+		infiniband/mlx5dv.h \
+		func mlx5dv_dr_action_create_flow_meter \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_MLX5_DR_FLOW_DUMP \
+		infiniband/mlx5dv.h \
+		func mlx5dv_dump_dr_domain \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_MLX5DV_MMAP_GET_NC_PAGES_CMD \
+		infiniband/mlx5dv.h \
+		enum MLX5_MMAP_GET_NC_PAGES_CMD \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_ETHTOOL_LINK_MODE_25G \
+		/usr/include/linux/ethtool.h \
+		enum ETHTOOL_LINK_MODE_25000baseCR_Full_BIT \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_ETHTOOL_LINK_MODE_50G \
+		/usr/include/linux/ethtool.h \
+		enum ETHTOOL_LINK_MODE_50000baseCR2_Full_BIT \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_ETHTOOL_LINK_MODE_100G \
+		/usr/include/linux/ethtool.h \
+		enum ETHTOOL_LINK_MODE_100000baseKR4_Full_BIT \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IBV_DEVICE_COUNTERS_SET_V42 \
+		infiniband/verbs.h \
+		type 'struct ibv_counter_set_init_attr' \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IBV_DEVICE_COUNTERS_SET_V45 \
+		infiniband/verbs.h \
+		type 'struct ibv_counters_init_attr' \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_RDMA_NL_NLDEV \
+		rdma/rdma_netlink.h \
+		enum RDMA_NL_NLDEV \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_RDMA_NLDEV_CMD_GET \
+		rdma/rdma_netlink.h \
+		enum RDMA_NLDEV_CMD_GET \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_RDMA_NLDEV_CMD_PORT_GET \
+		rdma/rdma_netlink.h \
+		enum RDMA_NLDEV_CMD_PORT_GET \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_RDMA_NLDEV_ATTR_DEV_INDEX \
+		rdma/rdma_netlink.h \
+		enum RDMA_NLDEV_ATTR_DEV_INDEX \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_RDMA_NLDEV_ATTR_DEV_NAME \
+		rdma/rdma_netlink.h \
+		enum RDMA_NLDEV_ATTR_DEV_NAME \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_RDMA_NLDEV_ATTR_PORT_INDEX \
+		rdma/rdma_netlink.h \
+		enum RDMA_NLDEV_ATTR_PORT_INDEX \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_RDMA_NLDEV_ATTR_NDEV_INDEX \
+		rdma/rdma_netlink.h \
+		enum RDMA_NLDEV_ATTR_NDEV_INDEX \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IFLA_NUM_VF \
+		linux/if_link.h \
+		enum IFLA_NUM_VF \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IFLA_EXT_MASK \
+		linux/if_link.h \
+		enum IFLA_EXT_MASK \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IFLA_PHYS_SWITCH_ID \
+		linux/if_link.h \
+		enum IFLA_PHYS_SWITCH_ID \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IFLA_PHYS_PORT_NAME \
+		linux/if_link.h \
+		enum IFLA_PHYS_PORT_NAME \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_SUPPORTED_40000baseKR4_Full \
+		/usr/include/linux/ethtool.h \
+		define SUPPORTED_40000baseKR4_Full \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_SUPPORTED_40000baseCR4_Full \
+		/usr/include/linux/ethtool.h \
+		define SUPPORTED_40000baseCR4_Full \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_SUPPORTED_40000baseSR4_Full \
+		/usr/include/linux/ethtool.h \
+		define SUPPORTED_40000baseSR4_Full \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_SUPPORTED_40000baseLR4_Full \
+		/usr/include/linux/ethtool.h \
+		define SUPPORTED_40000baseLR4_Full \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_SUPPORTED_56000baseKR4_Full \
+		/usr/include/linux/ethtool.h \
+		define SUPPORTED_56000baseKR4_Full \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_SUPPORTED_56000baseCR4_Full \
+		/usr/include/linux/ethtool.h \
+		define SUPPORTED_56000baseCR4_Full \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_SUPPORTED_56000baseSR4_Full \
+		/usr/include/linux/ethtool.h \
+		define SUPPORTED_56000baseSR4_Full \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_SUPPORTED_56000baseLR4_Full \
+		/usr/include/linux/ethtool.h \
+		define SUPPORTED_56000baseLR4_Full \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_STATIC_ASSERT \
+		/usr/include/assert.h \
+		define static_assert \
+		$(AUTOCONF_OUTPUT)
+
+# Create mlx5_autoconf.h or update it in case it differs from the new one.
+
+mlx5_autoconf.h: mlx5_autoconf.h.new
+	$Q [ -f '$@' ] && \
+		cmp '$<' '$@' $(AUTOCONF_OUTPUT) || \
+		mv '$<' '$@'
+
+$(SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD):.c=.o): mlx5_autoconf.h
+
+# Generate dependency plug-in for rdma-core when the PMD must not be linked
+# directly, so that applications do not inherit this dependency.
+
+ifeq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
+
+$(LIB): $(LIB_GLUE)
+
+ifeq ($(LINK_USING_CC),1)
+GLUE_LDFLAGS := $(call linkerprefix,$(LDFLAGS))
+else
+GLUE_LDFLAGS := $(LDFLAGS)
+endif
+$(LIB_GLUE): mlx5_glue.o
+	$Q $(LD) $(GLUE_LDFLAGS) $(EXTRA_LDFLAGS) \
+		-Wl,-h,$(LIB_GLUE) \
+		-shared -o $@ $< -libverbs -lmlx5
+
+mlx5_glue.o: mlx5_autoconf.h
+
+endif
+
+clean_mlx5: FORCE
+	$Q rm -f -- mlx5_autoconf.h mlx5_autoconf.h.new
+	$Q rm -f -- mlx5_glue.o $(LIB_GLUE_BASE)*
+
+clean: clean_mlx5
diff --git a/drivers/common/mlx5/meson.build b/drivers/common/mlx5/meson.build
new file mode 100644
index 0000000..718cef2
--- /dev/null
+++ b/drivers/common/mlx5/meson.build
@@ -0,0 +1,205 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2019 Mellanox Technologies, Ltd
+
+if not is_linux
+	build = false
+	reason = 'only supported on Linux'
+	subdir_done()
+endif
+build = true
+
+pmd_dlopen = (get_option('ibverbs_link') == 'dlopen')
+LIB_GLUE_BASE = 'librte_pmd_mlx5_glue.so'
+LIB_GLUE_VERSION = '20.02.0'
+LIB_GLUE = LIB_GLUE_BASE + '.' + LIB_GLUE_VERSION
+if pmd_dlopen
+	dpdk_conf.set('RTE_IBVERBS_LINK_DLOPEN', 1)
+	cflags += [
+		'-DMLX5_GLUE="@0@"'.format(LIB_GLUE),
+		'-DMLX5_GLUE_VERSION="@0@"'.format(LIB_GLUE_VERSION),
+	]
+endif
+
+libnames = [ 'mlx5', 'ibverbs' ]
+libs = []
+foreach libname:libnames
+	lib = dependency('lib' + libname, required:false)
+	if not lib.found()
+		lib = cc.find_library(libname, required:false)
+	endif
+	if lib.found()
+		libs += lib
+	else
+		build = false
+		reason = 'missing dependency, "' + libname + '"'
+	endif
+endforeach
+
+if build
+	allow_experimental_apis = true
+	deps += ['hash', 'pci', 'net', 'eal']
+	ext_deps += libs
+	sources = files(
+		'mlx5_devx_cmds.c',
+		'mlx5_common.c',
+	)
+	if not pmd_dlopen
+		sources += files('mlx5_glue.c')
+	endif
+	cflags_options = [
+		'-std=c11',
+		'-Wno-strict-prototypes',
+		'-D_BSD_SOURCE',
+		'-D_DEFAULT_SOURCE',
+		'-D_XOPEN_SOURCE=600'
+	]
+	foreach option:cflags_options
+		if cc.has_argument(option)
+			cflags += option
+		endif
+	endforeach
+	if get_option('buildtype').contains('debug')
+		cflags += [ '-pedantic', '-UNDEBUG', '-DPEDANTIC' ]
+	else
+		cflags += [ '-DNDEBUG', '-UPEDANTIC' ]
+	endif
+	# To maintain the compatibility with the make build system
+	# mlx5_autoconf.h file is still generated.
+	# input array for meson member search:
+	# [ "MACRO to define if found", "header for the search",
+	#   "symbol to search", "struct member to search" ]
+	has_member_args = [
+		[ 'HAVE_IBV_MLX5_MOD_SWP', 'infiniband/mlx5dv.h',
+		'struct mlx5dv_sw_parsing_caps', 'sw_parsing_offloads' ],
+		[ 'HAVE_IBV_DEVICE_COUNTERS_SET_V42', 'infiniband/verbs.h',
+		'struct ibv_counter_set_init_attr', 'counter_set_id' ],
+		[ 'HAVE_IBV_DEVICE_COUNTERS_SET_V45', 'infiniband/verbs.h',
+		'struct ibv_counters_init_attr', 'comp_mask' ],
+	]
+	# input array for meson symbol search:
+	# [ "MACRO to define if found", "header for the search",
+	#   "symbol to search" ]
+	has_sym_args = [
+		[ 'HAVE_IBV_DEVICE_STRIDING_RQ_SUPPORT', 'infiniband/mlx5dv.h',
+		'MLX5DV_CQE_RES_FORMAT_CSUM_STRIDX' ],
+		[ 'HAVE_IBV_DEVICE_TUNNEL_SUPPORT', 'infiniband/mlx5dv.h',
+		'MLX5DV_CONTEXT_MASK_TUNNEL_OFFLOADS' ],
+		[ 'HAVE_IBV_MLX5_MOD_MPW', 'infiniband/mlx5dv.h',
+		'MLX5DV_CONTEXT_FLAGS_MPW_ALLOWED' ],
+		[ 'HAVE_IBV_MLX5_MOD_CQE_128B_COMP', 'infiniband/mlx5dv.h',
+		'MLX5DV_CONTEXT_FLAGS_CQE_128B_COMP' ],
+		[ 'HAVE_IBV_MLX5_MOD_CQE_128B_PAD', 'infiniband/mlx5dv.h',
+		'MLX5DV_CQ_INIT_ATTR_FLAGS_CQE_PAD' ],
+		[ 'HAVE_IBV_FLOW_DV_SUPPORT', 'infiniband/mlx5dv.h',
+		'mlx5dv_create_flow_action_packet_reformat' ],
+		[ 'HAVE_IBV_DEVICE_MPLS_SUPPORT', 'infiniband/verbs.h',
+		'IBV_FLOW_SPEC_MPLS' ],
+		[ 'HAVE_IBV_WQ_FLAGS_PCI_WRITE_END_PADDING', 'infiniband/verbs.h',
+		'IBV_WQ_FLAGS_PCI_WRITE_END_PADDING' ],
+		[ 'HAVE_IBV_WQ_FLAG_RX_END_PADDING', 'infiniband/verbs.h',
+		'IBV_WQ_FLAG_RX_END_PADDING' ],
+		[ 'HAVE_MLX5DV_DR_DEVX_PORT', 'infiniband/mlx5dv.h',
+		'mlx5dv_query_devx_port' ],
+		[ 'HAVE_IBV_DEVX_OBJ', 'infiniband/mlx5dv.h',
+		'mlx5dv_devx_obj_create' ],
+		[ 'HAVE_IBV_FLOW_DEVX_COUNTERS', 'infiniband/mlx5dv.h',
+		'MLX5DV_FLOW_ACTION_COUNTERS_DEVX' ],
+		[ 'HAVE_IBV_DEVX_ASYNC', 'infiniband/mlx5dv.h',
+		'mlx5dv_devx_obj_query_async' ],
+		[ 'HAVE_MLX5DV_DR_ACTION_DEST_DEVX_TIR', 'infiniband/mlx5dv.h',
+		'mlx5dv_dr_action_create_dest_devx_tir' ],
+		[ 'HAVE_MLX5_DR_CREATE_ACTION_FLOW_METER', 'infiniband/mlx5dv.h',
+		'mlx5dv_dr_action_create_flow_meter' ],
+		[ 'HAVE_MLX5DV_MMAP_GET_NC_PAGES_CMD', 'infiniband/mlx5dv.h',
+		'MLX5_MMAP_GET_NC_PAGES_CMD' ],
+		[ 'HAVE_MLX5DV_DR', 'infiniband/mlx5dv.h',
+		'MLX5DV_DR_DOMAIN_TYPE_NIC_RX' ],
+		[ 'HAVE_MLX5DV_DR_ESWITCH', 'infiniband/mlx5dv.h',
+		'MLX5DV_DR_DOMAIN_TYPE_FDB' ],
+		[ 'HAVE_MLX5DV_DR_VLAN', 'infiniband/mlx5dv.h',
+		'mlx5dv_dr_action_create_push_vlan' ],
+		[ 'HAVE_SUPPORTED_40000baseKR4_Full', 'linux/ethtool.h',
+		'SUPPORTED_40000baseKR4_Full' ],
+		[ 'HAVE_SUPPORTED_40000baseCR4_Full', 'linux/ethtool.h',
+		'SUPPORTED_40000baseCR4_Full' ],
+		[ 'HAVE_SUPPORTED_40000baseSR4_Full', 'linux/ethtool.h',
+		'SUPPORTED_40000baseSR4_Full' ],
+		[ 'HAVE_SUPPORTED_40000baseLR4_Full', 'linux/ethtool.h',
+		'SUPPORTED_40000baseLR4_Full' ],
+		[ 'HAVE_SUPPORTED_56000baseKR4_Full', 'linux/ethtool.h',
+		'SUPPORTED_56000baseKR4_Full' ],
+		[ 'HAVE_SUPPORTED_56000baseCR4_Full', 'linux/ethtool.h',
+		'SUPPORTED_56000baseCR4_Full' ],
+		[ 'HAVE_SUPPORTED_56000baseSR4_Full', 'linux/ethtool.h',
+		'SUPPORTED_56000baseSR4_Full' ],
+		[ 'HAVE_SUPPORTED_56000baseLR4_Full', 'linux/ethtool.h',
+		'SUPPORTED_56000baseLR4_Full' ],
+		[ 'HAVE_ETHTOOL_LINK_MODE_25G', 'linux/ethtool.h',
+		'ETHTOOL_LINK_MODE_25000baseCR_Full_BIT' ],
+		[ 'HAVE_ETHTOOL_LINK_MODE_50G', 'linux/ethtool.h',
+		'ETHTOOL_LINK_MODE_50000baseCR2_Full_BIT' ],
+		[ 'HAVE_ETHTOOL_LINK_MODE_100G', 'linux/ethtool.h',
+		'ETHTOOL_LINK_MODE_100000baseKR4_Full_BIT' ],
+		[ 'HAVE_IFLA_NUM_VF', 'linux/if_link.h',
+		'IFLA_NUM_VF' ],
+		[ 'HAVE_IFLA_EXT_MASK', 'linux/if_link.h',
+		'IFLA_EXT_MASK' ],
+		[ 'HAVE_IFLA_PHYS_SWITCH_ID', 'linux/if_link.h',
+		'IFLA_PHYS_SWITCH_ID' ],
+		[ 'HAVE_IFLA_PHYS_PORT_NAME', 'linux/if_link.h',
+		'IFLA_PHYS_PORT_NAME' ],
+		[ 'HAVE_RDMA_NL_NLDEV', 'rdma/rdma_netlink.h',
+		'RDMA_NL_NLDEV' ],
+		[ 'HAVE_RDMA_NLDEV_CMD_GET', 'rdma/rdma_netlink.h',
+		'RDMA_NLDEV_CMD_GET' ],
+		[ 'HAVE_RDMA_NLDEV_CMD_PORT_GET', 'rdma/rdma_netlink.h',
+		'RDMA_NLDEV_CMD_PORT_GET' ],
+		[ 'HAVE_RDMA_NLDEV_ATTR_DEV_INDEX', 'rdma/rdma_netlink.h',
+		'RDMA_NLDEV_ATTR_DEV_INDEX' ],
+		[ 'HAVE_RDMA_NLDEV_ATTR_DEV_NAME', 'rdma/rdma_netlink.h',
+		'RDMA_NLDEV_ATTR_DEV_NAME' ],
+		[ 'HAVE_RDMA_NLDEV_ATTR_PORT_INDEX', 'rdma/rdma_netlink.h',
+		'RDMA_NLDEV_ATTR_PORT_INDEX' ],
+		[ 'HAVE_RDMA_NLDEV_ATTR_NDEV_INDEX', 'rdma/rdma_netlink.h',
+		'RDMA_NLDEV_ATTR_NDEV_INDEX' ],
+		[ 'HAVE_MLX5_DR_FLOW_DUMP', 'infiniband/mlx5dv.h',
+		'mlx5dv_dump_dr_domain'],
+	]
+	config = configuration_data()
+	foreach arg:has_sym_args
+		config.set(arg[0], cc.has_header_symbol(arg[1], arg[2],
+			dependencies: libs))
+	endforeach
+	foreach arg:has_member_args
+		file_prefix = '#include <' + arg[1] + '>'
+		config.set(arg[0], cc.has_member(arg[2], arg[3],
+			prefix : file_prefix, dependencies: libs))
+	endforeach
+	configure_file(output : 'mlx5_autoconf.h', configuration : config)
+endif
+# Build Glue Library
+if pmd_dlopen and build
+	dlopen_name = 'mlx5_glue'
+	dlopen_lib_name = driver_name_fmt.format(dlopen_name)
+	dlopen_so_version = LIB_GLUE_VERSION
+	dlopen_sources = files('mlx5_glue.c')
+	dlopen_install_dir = [ eal_pmd_path + '-glue' ]
+	dlopen_includes = [global_inc]
+	dlopen_includes += include_directories(
+		'../../../lib/librte_eal/common/include/generic',
+	)
+	shared_lib = shared_library(
+		dlopen_lib_name,
+		dlopen_sources,
+		include_directories: dlopen_includes,
+		c_args: cflags,
+		dependencies: libs,
+		link_args: [
+		'-Wl,-export-dynamic',
+		'-Wl,-h,@0@'.format(LIB_GLUE),
+		],
+		soversion: dlopen_so_version,
+		install: true,
+		install_dir: dlopen_install_dir,
+	)
+endif
diff --git a/drivers/common/mlx5/mlx5_common.c b/drivers/common/mlx5/mlx5_common.c
new file mode 100644
index 0000000..14ebd30
--- /dev/null
+++ b/drivers/common/mlx5/mlx5_common.c
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2019 Mellanox Technologies, Ltd
+ */
+
+#include "mlx5_common.h"
+
+
+int mlx5_common_logtype;
+
+
+RTE_INIT(rte_mlx5_common_pmd_init)
+{
+	/* Initialize driver log type. */
+	mlx5_common_logtype = rte_log_register("pmd.common.mlx5");
+	if (mlx5_common_logtype >= 0)
+		rte_log_set_level(mlx5_common_logtype, RTE_LOG_NOTICE);
+}
diff --git a/drivers/common/mlx5/mlx5_common.h b/drivers/common/mlx5/mlx5_common.h
new file mode 100644
index 0000000..9f10def
--- /dev/null
+++ b/drivers/common/mlx5/mlx5_common.h
@@ -0,0 +1,87 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2019 Mellanox Technologies, Ltd
+ */
+
+#ifndef RTE_PMD_MLX5_COMMON_H_
+#define RTE_PMD_MLX5_COMMON_H_
+
+#include <assert.h>
+
+#include <rte_log.h>
+
+
+/*
+ * Helper macros to work around __VA_ARGS__ limitations in a C99 compliant
+ * manner.
+ */
+#define PMD_DRV_LOG_STRIP(a, b) a
+#define PMD_DRV_LOG_OPAREN (
+#define PMD_DRV_LOG_CPAREN )
+#define PMD_DRV_LOG_COMMA ,
+
+/* Return the file name part of a path. */
+static inline const char *
+pmd_drv_log_basename(const char *s)
+{
+	const char *n = s;
+
+	while (*n)
+		if (*(n++) == '/')
+			s = n;
+	return s;
+}
+
+#define PMD_DRV_LOG___(level, type, name, ...) \
+	rte_log(RTE_LOG_ ## level, \
+		type, \
+		RTE_FMT(name ": " \
+			RTE_FMT_HEAD(__VA_ARGS__,), \
+		RTE_FMT_TAIL(__VA_ARGS__,)))
+
+/*
+ * When debugging is enabled (NDEBUG not defined), file, line and function
+ * information replace the driver name (MLX5_DRIVER_NAME) in log messages.
+ */
+#ifndef NDEBUG
+
+#define PMD_DRV_LOG__(level, type, name, ...) \
+	PMD_DRV_LOG___(level, type, name, "%s:%u: %s(): " __VA_ARGS__)
+#define PMD_DRV_LOG_(level, type, name, s, ...) \
+	PMD_DRV_LOG__(level, type, name,\
+		s "\n" PMD_DRV_LOG_COMMA \
+		pmd_drv_log_basename(__FILE__) PMD_DRV_LOG_COMMA \
+		__LINE__ PMD_DRV_LOG_COMMA \
+		__func__, \
+		__VA_ARGS__)
+
+#else /* NDEBUG */
+#define PMD_DRV_LOG__(level, type, name, ...) \
+	PMD_DRV_LOG___(level, type, name, __VA_ARGS__)
+#define PMD_DRV_LOG_(level, type, name, s, ...) \
+	PMD_DRV_LOG__(level, type, name, s "\n", __VA_ARGS__)
+
+#endif /* NDEBUG */
+
+/* claim_zero() does not perform any check when debugging is disabled. */
+#ifndef NDEBUG
+
+#define DEBUG(...) DRV_LOG(DEBUG, __VA_ARGS__)
+#define claim_zero(...) assert((__VA_ARGS__) == 0)
+#define claim_nonzero(...) assert((__VA_ARGS__) != 0)
+
+#else /* NDEBUG */
+
+#define DEBUG(...) (void)0
+#define claim_zero(...) (__VA_ARGS__)
+#define claim_nonzero(...) (__VA_ARGS__)
+
+#endif /* NDEBUG */
+
+/* Allocate a buffer on the stack and fill it with a printf format string. */
+#define MKSTR(name, ...) \
+	int mkstr_size_##name = snprintf(NULL, 0, "" __VA_ARGS__); \
+	char name[mkstr_size_##name + 1]; \
+	\
+	snprintf(name, sizeof(name), "" __VA_ARGS__)
+
+#endif /* RTE_PMD_MLX5_COMMON_H_ */
diff --git a/drivers/common/mlx5/mlx5_common_utils.h b/drivers/common/mlx5/mlx5_common_utils.h
new file mode 100644
index 0000000..32c3adf
--- /dev/null
+++ b/drivers/common/mlx5/mlx5_common_utils.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2019 Mellanox Technologies, Ltd
+ */
+
+#ifndef RTE_PMD_MLX5_COMMON_UTILS_H_
+#define RTE_PMD_MLX5_COMMON_UTILS_H_
+
+#include "mlx5_common.h"
+
+
+extern int mlx5_common_logtype;
+
+#define MLX5_COMMON_LOG_PREFIX "common_mlx5"
+/* Generic printf()-like logging macro with automatic line feed. */
+#define DRV_LOG(level, ...) \
+	PMD_DRV_LOG_(level, mlx5_common_logtype, MLX5_COMMON_LOG_PREFIX, \
+		__VA_ARGS__ PMD_DRV_LOG_STRIP PMD_DRV_LOG_OPAREN, \
+		PMD_DRV_LOG_CPAREN)
+
+#endif /* RTE_PMD_MLX5_COMMON_UTILS_H_ */
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
new file mode 100644
index 0000000..4d94f92
--- /dev/null
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -0,0 +1,976 @@
+// SPDX-License-Identifier: BSD-3-Clause
+/* Copyright 2018 Mellanox Technologies, Ltd */
+
+#include <unistd.h>
+
+#include <rte_errno.h>
+#include <rte_malloc.h>
+
+#include "mlx5_prm.h"
+#include "mlx5_devx_cmds.h"
+#include "mlx5_common_utils.h"
+
+
+/**
+ * Allocate flow counters via devx interface.
+ *
+ * @param[in] ctx
+ *   ibv context returned from mlx5dv_open_device.
+ * @param dcs
+ *   Pointer to counters properties structure to be filled by the routine.
+ * @param bulk_n_128
+ *   Bulk counter numbers in 128 counters units.
+ *
+ * @return
+ *   Pointer to counter object on success, NULL otherwise and
+ *   rte_errno is set.
+ */
+struct mlx5_devx_obj *
+mlx5_devx_cmd_flow_counter_alloc(struct ibv_context *ctx, uint32_t bulk_n_128)
+{
+	struct mlx5_devx_obj *dcs = rte_zmalloc("dcs", sizeof(*dcs), 0);
+	uint32_t in[MLX5_ST_SZ_DW(alloc_flow_counter_in)]   = {0};
+	uint32_t out[MLX5_ST_SZ_DW(alloc_flow_counter_out)] = {0};
+
+	if (!dcs) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	MLX5_SET(alloc_flow_counter_in, in, opcode,
+		 MLX5_CMD_OP_ALLOC_FLOW_COUNTER);
+	MLX5_SET(alloc_flow_counter_in, in, flow_counter_bulk, bulk_n_128);
+	dcs->obj = mlx5_glue->devx_obj_create(ctx, in,
+					      sizeof(in), out, sizeof(out));
+	if (!dcs->obj) {
+		DRV_LOG(ERR, "Can't allocate counters - error %d", errno);
+		rte_errno = errno;
+		rte_free(dcs);
+		return NULL;
+	}
+	dcs->id = MLX5_GET(alloc_flow_counter_out, out, flow_counter_id);
+	return dcs;
+}
+
+/**
+ * Query flow counters values.
+ *
+ * @param[in] dcs
+ *   devx object that was obtained from mlx5_devx_cmd_flow_counter_alloc.
+ * @param[in] clear
+ *   Whether hardware should clear the counters after the query or not.
+ * @param[in] n_counters
+ *   0 to read a single counter, otherwise the number of counters to read.
+ * @param pkts
+ *   The number of packets that matched the flow.
+ * @param bytes
+ *   The number of bytes that matched the flow.
+ * @param mkey
+ *   The mkey for batch query.
+ * @param addr
+ *   The address in the mkey range for batch query.
+ * @param cmd_comp
+ *   The completion object for asynchronous batch query.
+ * @param async_id
+ *   The ID to be returned in the asynchronous batch query response.
+ *
+ * @return
+ *   0 on success, a negative value otherwise.
+ */
+int
+mlx5_devx_cmd_flow_counter_query(struct mlx5_devx_obj *dcs,
+				 int clear, uint32_t n_counters,
+				 uint64_t *pkts, uint64_t *bytes,
+				 uint32_t mkey, void *addr,
+				 struct mlx5dv_devx_cmd_comp *cmd_comp,
+				 uint64_t async_id)
+{
+	int out_len = MLX5_ST_SZ_BYTES(query_flow_counter_out) +
+			MLX5_ST_SZ_BYTES(traffic_counter);
+	uint32_t out[out_len];
+	uint32_t in[MLX5_ST_SZ_DW(query_flow_counter_in)] = {0};
+	void *stats;
+	int rc;
+
+	MLX5_SET(query_flow_counter_in, in, opcode,
+		 MLX5_CMD_OP_QUERY_FLOW_COUNTER);
+	MLX5_SET(query_flow_counter_in, in, op_mod, 0);
+	MLX5_SET(query_flow_counter_in, in, flow_counter_id, dcs->id);
+	MLX5_SET(query_flow_counter_in, in, clear, !!clear);
+
+	if (n_counters) {
+		MLX5_SET(query_flow_counter_in, in, num_of_counters,
+			 n_counters);
+		MLX5_SET(query_flow_counter_in, in, dump_to_memory, 1);
+		MLX5_SET(query_flow_counter_in, in, mkey, mkey);
+		MLX5_SET64(query_flow_counter_in, in, address,
+			   (uint64_t)(uintptr_t)addr);
+	}
+	if (!cmd_comp)
+		rc = mlx5_glue->devx_obj_query(dcs->obj, in, sizeof(in), out,
+					       out_len);
+	else
+		rc = mlx5_glue->devx_obj_query_async(dcs->obj, in, sizeof(in),
+						     out_len, async_id,
+						     cmd_comp);
+	if (rc) {
+		DRV_LOG(ERR, "Failed to query devx counters with rc %d", rc);
+		rte_errno = rc;
+		return -rc;
+	}
+	if (!n_counters) {
+		stats = MLX5_ADDR_OF(query_flow_counter_out,
+				     out, flow_statistics);
+		*pkts = MLX5_GET64(traffic_counter, stats, packets);
+		*bytes = MLX5_GET64(traffic_counter, stats, octets);
+	}
+	return 0;
+}
+
+/**
+ * Create a new mkey.
+ *
+ * @param[in] ctx
+ *   ibv context returned from mlx5dv_open_device.
+ * @param[in] attr
+ *   Attributes of the requested mkey.
+ *
+ * @return
+ *   Pointer to Devx mkey on success, NULL otherwise and rte_errno
+ *   is set.
+ */
+struct mlx5_devx_obj *
+mlx5_devx_cmd_mkey_create(struct ibv_context *ctx,
+			  struct mlx5_devx_mkey_attr *attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(create_mkey_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(create_mkey_out)] = {0};
+	void *mkc;
+	struct mlx5_devx_obj *mkey = rte_zmalloc("mkey", sizeof(*mkey), 0);
+	size_t pgsize;
+	uint32_t translation_size;
+
+	if (!mkey) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	pgsize = sysconf(_SC_PAGESIZE);
+	translation_size = (RTE_ALIGN(attr->size, pgsize) * 8) / 16;
+	MLX5_SET(create_mkey_in, in, opcode, MLX5_CMD_OP_CREATE_MKEY);
+	MLX5_SET(create_mkey_in, in, translations_octword_actual_size,
+		 translation_size);
+	MLX5_SET(create_mkey_in, in, mkey_umem_id, attr->umem_id);
+	mkc = MLX5_ADDR_OF(create_mkey_in, in, memory_key_mkey_entry);
+	MLX5_SET(mkc, mkc, lw, 0x1);
+	MLX5_SET(mkc, mkc, lr, 0x1);
+	MLX5_SET(mkc, mkc, access_mode_1_0, MLX5_MKC_ACCESS_MODE_MTT);
+	MLX5_SET(mkc, mkc, qpn, 0xffffff);
+	MLX5_SET(mkc, mkc, pd, attr->pd);
+	MLX5_SET(mkc, mkc, mkey_7_0, attr->umem_id & 0xFF);
+	MLX5_SET(mkc, mkc, translations_octword_size, translation_size);
+	MLX5_SET64(mkc, mkc, start_addr, attr->addr);
+	MLX5_SET64(mkc, mkc, len, attr->size);
+	MLX5_SET(mkc, mkc, log_page_size, rte_log2_u32(pgsize));
+	mkey->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in), out,
+					       sizeof(out));
+	if (!mkey->obj) {
+		DRV_LOG(ERR, "Can't create mkey - error %d", errno);
+		rte_errno = errno;
+		rte_free(mkey);
+		return NULL;
+	}
+	mkey->id = MLX5_GET(create_mkey_out, out, mkey_index);
+	mkey->id = (mkey->id << 8) | (attr->umem_id & 0xFF);
+	return mkey;
+}
+
+/**
+ * Get status of devx command response.
+ * Mainly used for asynchronous commands.
+ *
+ * @param[in] out
+ *   The out response buffer.
+ *
+ * @return
+ *   0 on success, non-zero value otherwise.
+ */
+int
+mlx5_devx_get_out_command_status(void *out)
+{
+	int status;
+
+	if (!out)
+		return -EINVAL;
+	status = MLX5_GET(query_flow_counter_out, out, status);
+	if (status) {
+		int syndrome = MLX5_GET(query_flow_counter_out, out, syndrome);
+
+		DRV_LOG(ERR, "Bad devX status %x, syndrome = %x", status,
+			syndrome);
+	}
+	return status;
+}
+
+/**
+ * Destroy any object allocated by a Devx API.
+ *
+ * @param[in] obj
+ *   Pointer to a general object.
+ *
+ * @return
+ *   0 on success, a negative value otherwise.
+ */
+int
+mlx5_devx_cmd_destroy(struct mlx5_devx_obj *obj)
+{
+	int ret;
+
+	if (!obj)
+		return 0;
+	ret = mlx5_glue->devx_obj_destroy(obj->obj);
+	rte_free(obj);
+	return ret;
+}
+
+/**
+ * Query NIC vport context.
+ * Fills minimal inline attribute.
+ *
+ * @param[in] ctx
+ *   ibv context returned from mlx5dv_open_device.
+ * @param[in] vport
+ *   vport index.
+ * @param[out] attr
+ *   Device attribute values.
+ *
+ * @return
+ *   0 on success, a negative value otherwise.
+ */
+static int
+mlx5_devx_cmd_query_nic_vport_context(struct ibv_context *ctx,
+				      unsigned int vport,
+				      struct mlx5_hca_attr *attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(query_nic_vport_context_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(query_nic_vport_context_out)] = {0};
+	void *vctx;
+	int status, syndrome, rc;
+
+	/* Query NIC vport context to determine inline mode. */
+	MLX5_SET(query_nic_vport_context_in, in, opcode,
+		 MLX5_CMD_OP_QUERY_NIC_VPORT_CONTEXT);
+	MLX5_SET(query_nic_vport_context_in, in, vport_number, vport);
+	if (vport)
+		MLX5_SET(query_nic_vport_context_in, in, other_vport, 1);
+	rc = mlx5_glue->devx_general_cmd(ctx,
+					 in, sizeof(in),
+					 out, sizeof(out));
+	if (rc)
+		goto error;
+	status = MLX5_GET(query_nic_vport_context_out, out, status);
+	syndrome = MLX5_GET(query_nic_vport_context_out, out, syndrome);
+	if (status) {
+		DRV_LOG(DEBUG, "Failed to query NIC vport context, "
+			"status %x, syndrome = %x",
+			status, syndrome);
+		return -1;
+	}
+	vctx = MLX5_ADDR_OF(query_nic_vport_context_out, out,
+			    nic_vport_context);
+	attr->vport_inline_mode = MLX5_GET(nic_vport_context, vctx,
+					   min_wqe_inline_mode);
+	return 0;
+error:
+	rc = (rc > 0) ? -rc : rc;
+	return rc;
+}
+
+/**
+ * Query HCA attributes.
+ * Using these attributes we can check at run time whether the device
+ * has the required capabilities.
+ *
+ * @param[in] ctx
+ *   ibv context returned from mlx5dv_open_device.
+ * @param[out] attr
+ *   Device attribute values.
+ *
+ * @return
+ *   0 on success, a negative value otherwise.
+ */
+int
+mlx5_devx_cmd_query_hca_attr(struct ibv_context *ctx,
+			     struct mlx5_hca_attr *attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(query_hca_cap_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(query_hca_cap_out)] = {0};
+	void *hcattr;
+	int status, syndrome, rc;
+
+	MLX5_SET(query_hca_cap_in, in, opcode, MLX5_CMD_OP_QUERY_HCA_CAP);
+	MLX5_SET(query_hca_cap_in, in, op_mod,
+		 MLX5_GET_HCA_CAP_OP_MOD_GENERAL_DEVICE |
+		 MLX5_HCA_CAP_OPMOD_GET_CUR);
+
+	rc = mlx5_glue->devx_general_cmd(ctx,
+					 in, sizeof(in), out, sizeof(out));
+	if (rc)
+		goto error;
+	status = MLX5_GET(query_hca_cap_out, out, status);
+	syndrome = MLX5_GET(query_hca_cap_out, out, syndrome);
+	if (status) {
+		DRV_LOG(DEBUG, "Failed to query devx HCA capabilities, "
+			"status %x, syndrome = %x",
+			status, syndrome);
+		return -1;
+	}
+	hcattr = MLX5_ADDR_OF(query_hca_cap_out, out, capability);
+	attr->flow_counter_bulk_alloc_bitmap =
+			MLX5_GET(cmd_hca_cap, hcattr, flow_counter_bulk_alloc);
+	attr->flow_counters_dump = MLX5_GET(cmd_hca_cap, hcattr,
+					    flow_counters_dump);
+	attr->eswitch_manager = MLX5_GET(cmd_hca_cap, hcattr, eswitch_manager);
+	attr->hairpin = MLX5_GET(cmd_hca_cap, hcattr, hairpin);
+	attr->log_max_hairpin_queues = MLX5_GET(cmd_hca_cap, hcattr,
+						log_max_hairpin_queues);
+	attr->log_max_hairpin_wq_data_sz = MLX5_GET(cmd_hca_cap, hcattr,
+						    log_max_hairpin_wq_data_sz);
+	attr->log_max_hairpin_num_packets = MLX5_GET
+		(cmd_hca_cap, hcattr, log_max_hairpin_num_packets);
+	attr->vhca_id = MLX5_GET(cmd_hca_cap, hcattr, vhca_id);
+	attr->eth_net_offloads = MLX5_GET(cmd_hca_cap, hcattr,
+					  eth_net_offloads);
+	attr->eth_virt = MLX5_GET(cmd_hca_cap, hcattr, eth_virt);
+	attr->flex_parser_protocols = MLX5_GET(cmd_hca_cap, hcattr,
+					       flex_parser_protocols);
+	attr->qos.sup = MLX5_GET(cmd_hca_cap, hcattr, qos);
+	if (attr->qos.sup) {
+		MLX5_SET(query_hca_cap_in, in, op_mod,
+			 MLX5_GET_HCA_CAP_OP_MOD_QOS_CAP |
+			 MLX5_HCA_CAP_OPMOD_GET_CUR);
+		rc = mlx5_glue->devx_general_cmd(ctx, in, sizeof(in),
+						 out, sizeof(out));
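+		if (rc)
+			goto error;
+		/* Re-read status and syndrome from the QOS query response. */
+		status = MLX5_GET(query_hca_cap_out, out, status);
+		syndrome = MLX5_GET(query_hca_cap_out, out, syndrome);
+		if (status) {
+			DRV_LOG(DEBUG, "Failed to query devx QOS capabilities,"
+				" status %x, syndrome = %x", status, syndrome);
+			return -1;
+		}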
+		hcattr = MLX5_ADDR_OF(query_hca_cap_out, out, capability);
+		attr->qos.srtcm_sup =
+				MLX5_GET(qos_cap, hcattr, flow_meter_srtcm);
+		attr->qos.log_max_flow_meter =
+				MLX5_GET(qos_cap, hcattr, log_max_flow_meter);
+		attr->qos.flow_meter_reg_c_ids =
+			MLX5_GET(qos_cap, hcattr, flow_meter_reg_id);
+		attr->qos.flow_meter_reg_share =
+			MLX5_GET(qos_cap, hcattr, flow_meter_reg_share);
+	}
+	if (!attr->eth_net_offloads)
+		return 0;
+
+	/* Query HCA offloads for Ethernet protocol. */
+	memset(in, 0, sizeof(in));
+	memset(out, 0, sizeof(out));
+	MLX5_SET(query_hca_cap_in, in, opcode, MLX5_CMD_OP_QUERY_HCA_CAP);
+	MLX5_SET(query_hca_cap_in, in, op_mod,
+		 MLX5_GET_HCA_CAP_OP_MOD_ETHERNET_OFFLOAD_CAPS |
+		 MLX5_HCA_CAP_OPMOD_GET_CUR);
+
+	rc = mlx5_glue->devx_general_cmd(ctx,
+					 in, sizeof(in),
+					 out, sizeof(out));
+	if (rc) {
+		attr->eth_net_offloads = 0;
+		goto error;
+	}
+	status = MLX5_GET(query_hca_cap_out, out, status);
+	syndrome = MLX5_GET(query_hca_cap_out, out, syndrome);
+	if (status) {
+		DRV_LOG(DEBUG, "Failed to query devx HCA capabilities, "
+			"status %x, syndrome = %x",
+			status, syndrome);
+		attr->eth_net_offloads = 0;
+		return -1;
+	}
+	hcattr = MLX5_ADDR_OF(query_hca_cap_out, out, capability);
+	attr->wqe_vlan_insert = MLX5_GET(per_protocol_networking_offload_caps,
+					 hcattr, wqe_vlan_insert);
+	attr->lro_cap = MLX5_GET(per_protocol_networking_offload_caps, hcattr,
+				 lro_cap);
+	attr->tunnel_lro_gre = MLX5_GET(per_protocol_networking_offload_caps,
+					hcattr, tunnel_lro_gre);
+	attr->tunnel_lro_vxlan = MLX5_GET(per_protocol_networking_offload_caps,
+					  hcattr, tunnel_lro_vxlan);
+	attr->lro_max_msg_sz_mode = MLX5_GET
+					(per_protocol_networking_offload_caps,
+					 hcattr, lro_max_msg_sz_mode);
+	for (int i = 0; i < MLX5_LRO_NUM_SUPP_PERIODS; i++) {
+		attr->lro_timer_supported_periods[i] =
+			MLX5_GET(per_protocol_networking_offload_caps, hcattr,
+				 lro_timer_supported_periods[i]);
+	}
+	attr->tunnel_stateless_geneve_rx =
+			    MLX5_GET(per_protocol_networking_offload_caps,
+				     hcattr, tunnel_stateless_geneve_rx);
+	attr->geneve_max_opt_len =
+		    MLX5_GET(per_protocol_networking_offload_caps,
+			     hcattr, max_geneve_opt_len);
+	attr->wqe_inline_mode = MLX5_GET(per_protocol_networking_offload_caps,
+					 hcattr, wqe_inline_mode);
+	attr->tunnel_stateless_gtp = MLX5_GET
+					(per_protocol_networking_offload_caps,
+					 hcattr, tunnel_stateless_gtp);
+	if (attr->wqe_inline_mode != MLX5_CAP_INLINE_MODE_VPORT_CONTEXT)
+		return 0;
+	if (attr->eth_virt) {
+		rc = mlx5_devx_cmd_query_nic_vport_context(ctx, 0, attr);
+		if (rc) {
+			attr->eth_virt = 0;
+			goto error;
+		}
+	}
+	return 0;
+error:
+	rc = (rc > 0) ? -rc : rc;
+	return rc;
+}
+
+/**
+ * Query TIS transport domain from QP verbs object using DevX API.
+ *
+ * @param[in] qp
+ *   Pointer to verbs QP returned by ibv_create_qp.
+ * @param[in] tis_num
+ *   TIS number of TIS to query.
+ * @param[out] tis_td
+ *   Pointer to TIS transport domain variable, to be set by the routine.
+ *
+ * @return
+ *   0 on success, a negative value otherwise.
+ */
+int
+mlx5_devx_cmd_qp_query_tis_td(struct ibv_qp *qp, uint32_t tis_num,
+			      uint32_t *tis_td)
+{
+	uint32_t in[MLX5_ST_SZ_DW(query_tis_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(query_tis_out)] = {0};
+	int rc;
+	void *tis_ctx;
+
+	MLX5_SET(query_tis_in, in, opcode, MLX5_CMD_OP_QUERY_TIS);
+	MLX5_SET(query_tis_in, in, tisn, tis_num);
+	rc = mlx5_glue->devx_qp_query(qp, in, sizeof(in), out, sizeof(out));
+	if (rc) {
+		DRV_LOG(ERR, "Failed to query QP using DevX");
+		return -rc;
+	}
+	tis_ctx = MLX5_ADDR_OF(query_tis_out, out, tis_context);
+	*tis_td = MLX5_GET(tisc, tis_ctx, transport_domain);
+	return 0;
+}
+
+/**
+ * Fill WQ data for DevX API command.
+ * Utility function for use when creating DevX objects containing a WQ.
+ *
+ * @param[in] wq_ctx
+ *   Pointer to WQ context to fill with data.
+ * @param [in] wq_attr
+ *   Pointer to WQ attributes structure to fill in WQ context.
+ */
+static void
+devx_cmd_fill_wq_data(void *wq_ctx, struct mlx5_devx_wq_attr *wq_attr)
+{
+	MLX5_SET(wq, wq_ctx, wq_type, wq_attr->wq_type);
+	MLX5_SET(wq, wq_ctx, wq_signature, wq_attr->wq_signature);
+	MLX5_SET(wq, wq_ctx, end_padding_mode, wq_attr->end_padding_mode);
+	MLX5_SET(wq, wq_ctx, cd_slave, wq_attr->cd_slave);
+	MLX5_SET(wq, wq_ctx, hds_skip_first_sge, wq_attr->hds_skip_first_sge);
+	MLX5_SET(wq, wq_ctx, log2_hds_buf_size, wq_attr->log2_hds_buf_size);
+	MLX5_SET(wq, wq_ctx, page_offset, wq_attr->page_offset);
+	MLX5_SET(wq, wq_ctx, lwm, wq_attr->lwm);
+	MLX5_SET(wq, wq_ctx, pd, wq_attr->pd);
+	MLX5_SET(wq, wq_ctx, uar_page, wq_attr->uar_page);
+	MLX5_SET64(wq, wq_ctx, dbr_addr, wq_attr->dbr_addr);
+	MLX5_SET(wq, wq_ctx, hw_counter, wq_attr->hw_counter);
+	MLX5_SET(wq, wq_ctx, sw_counter, wq_attr->sw_counter);
+	MLX5_SET(wq, wq_ctx, log_wq_stride, wq_attr->log_wq_stride);
+	MLX5_SET(wq, wq_ctx, log_wq_pg_sz, wq_attr->log_wq_pg_sz);
+	MLX5_SET(wq, wq_ctx, log_wq_sz, wq_attr->log_wq_sz);
+	MLX5_SET(wq, wq_ctx, dbr_umem_valid, wq_attr->dbr_umem_valid);
+	MLX5_SET(wq, wq_ctx, wq_umem_valid, wq_attr->wq_umem_valid);
+	MLX5_SET(wq, wq_ctx, log_hairpin_num_packets,
+		 wq_attr->log_hairpin_num_packets);
+	MLX5_SET(wq, wq_ctx, log_hairpin_data_sz, wq_attr->log_hairpin_data_sz);
+	MLX5_SET(wq, wq_ctx, single_wqe_log_num_of_strides,
+		 wq_attr->single_wqe_log_num_of_strides);
+	MLX5_SET(wq, wq_ctx, two_byte_shift_en, wq_attr->two_byte_shift_en);
+	MLX5_SET(wq, wq_ctx, single_stride_log_num_of_bytes,
+		 wq_attr->single_stride_log_num_of_bytes);
+	MLX5_SET(wq, wq_ctx, dbr_umem_id, wq_attr->dbr_umem_id);
+	MLX5_SET(wq, wq_ctx, wq_umem_id, wq_attr->wq_umem_id);
+	MLX5_SET64(wq, wq_ctx, wq_umem_offset, wq_attr->wq_umem_offset);
+}
+
+/**
+ * Create RQ using DevX API.
+ *
+ * @param[in] ctx
+ *   ibv_context returned from mlx5dv_open_device.
+ * @param [in] rq_attr
+ *   Pointer to create RQ attributes structure.
+ * @param [in] socket
+ *   CPU socket ID for allocations.
+ *
+ * @return
+ *   The DevX object created, NULL otherwise and rte_errno is set.
+ */
+struct mlx5_devx_obj *
+mlx5_devx_cmd_create_rq(struct ibv_context *ctx,
+			struct mlx5_devx_create_rq_attr *rq_attr,
+			int socket)
+{
+	uint32_t in[MLX5_ST_SZ_DW(create_rq_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(create_rq_out)] = {0};
+	void *rq_ctx, *wq_ctx;
+	struct mlx5_devx_wq_attr *wq_attr;
+	struct mlx5_devx_obj *rq = NULL;
+
+	rq = rte_calloc_socket(__func__, 1, sizeof(*rq), 0, socket);
+	if (!rq) {
+		DRV_LOG(ERR, "Failed to allocate RQ data");
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	MLX5_SET(create_rq_in, in, opcode, MLX5_CMD_OP_CREATE_RQ);
+	rq_ctx = MLX5_ADDR_OF(create_rq_in, in, ctx);
+	MLX5_SET(rqc, rq_ctx, rlky, rq_attr->rlky);
+	MLX5_SET(rqc, rq_ctx, delay_drop_en, rq_attr->delay_drop_en);
+	MLX5_SET(rqc, rq_ctx, scatter_fcs, rq_attr->scatter_fcs);
+	MLX5_SET(rqc, rq_ctx, vsd, rq_attr->vsd);
+	MLX5_SET(rqc, rq_ctx, mem_rq_type, rq_attr->mem_rq_type);
+	MLX5_SET(rqc, rq_ctx, state, rq_attr->state);
+	MLX5_SET(rqc, rq_ctx, flush_in_error_en, rq_attr->flush_in_error_en);
+	MLX5_SET(rqc, rq_ctx, hairpin, rq_attr->hairpin);
+	MLX5_SET(rqc, rq_ctx, user_index, rq_attr->user_index);
+	MLX5_SET(rqc, rq_ctx, cqn, rq_attr->cqn);
+	MLX5_SET(rqc, rq_ctx, counter_set_id, rq_attr->counter_set_id);
+	MLX5_SET(rqc, rq_ctx, rmpn, rq_attr->rmpn);
+	wq_ctx = MLX5_ADDR_OF(rqc, rq_ctx, wq);
+	wq_attr = &rq_attr->wq_attr;
+	devx_cmd_fill_wq_data(wq_ctx, wq_attr);
+	rq->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in),
+						  out, sizeof(out));
+	if (!rq->obj) {
+		DRV_LOG(ERR, "Failed to create RQ using DevX");
+		rte_errno = errno;
+		rte_free(rq);
+		return NULL;
+	}
+	rq->id = MLX5_GET(create_rq_out, out, rqn);
+	return rq;
+}
+
+/**
+ * Modify RQ using DevX API.
+ *
+ * @param[in] rq
+ *   Pointer to RQ object structure.
+ * @param [in] rq_attr
+ *   Pointer to modify RQ attributes structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_devx_cmd_modify_rq(struct mlx5_devx_obj *rq,
+			struct mlx5_devx_modify_rq_attr *rq_attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(modify_rq_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(modify_rq_out)] = {0};
+	void *rq_ctx, *wq_ctx;
+	int ret;
+
+	MLX5_SET(modify_rq_in, in, opcode, MLX5_CMD_OP_MODIFY_RQ);
+	MLX5_SET(modify_rq_in, in, rq_state, rq_attr->rq_state);
+	MLX5_SET(modify_rq_in, in, rqn, rq->id);
+	MLX5_SET64(modify_rq_in, in, modify_bitmask, rq_attr->modify_bitmask);
+	rq_ctx = MLX5_ADDR_OF(modify_rq_in, in, ctx);
+	MLX5_SET(rqc, rq_ctx, state, rq_attr->state);
+	if (rq_attr->modify_bitmask &
+			MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_SCATTER_FCS)
+		MLX5_SET(rqc, rq_ctx, scatter_fcs, rq_attr->scatter_fcs);
+	if (rq_attr->modify_bitmask & MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_VSD)
+		MLX5_SET(rqc, rq_ctx, vsd, rq_attr->vsd);
+	if (rq_attr->modify_bitmask &
+			MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_RQ_COUNTER_SET_ID)
+		MLX5_SET(rqc, rq_ctx, counter_set_id, rq_attr->counter_set_id);
+	MLX5_SET(rqc, rq_ctx, hairpin_peer_sq, rq_attr->hairpin_peer_sq);
+	MLX5_SET(rqc, rq_ctx, hairpin_peer_vhca, rq_attr->hairpin_peer_vhca);
+	if (rq_attr->modify_bitmask & MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_WQ_LWM) {
+		wq_ctx = MLX5_ADDR_OF(rqc, rq_ctx, wq);
+		MLX5_SET(wq, wq_ctx, lwm, rq_attr->lwm);
+	}
+	ret = mlx5_glue->devx_obj_modify(rq->obj, in, sizeof(in),
+					 out, sizeof(out));
+	if (ret) {
+		DRV_LOG(ERR, "Failed to modify RQ using DevX");
+		rte_errno = errno;
+		return -errno;
+	}
+	return ret;
+}
+
+/**
+ * Create TIR using DevX API.
+ *
+ * @param[in] ctx
+ *   ibv_context returned from mlx5dv_open_device.
+ * @param [in] tir_attr
+ *   Pointer to TIR attributes structure.
+ *
+ * @return
+ *   The DevX object created, NULL otherwise and rte_errno is set.
+ */
+struct mlx5_devx_obj *
+mlx5_devx_cmd_create_tir(struct ibv_context *ctx,
+			 struct mlx5_devx_tir_attr *tir_attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(create_tir_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(create_tir_out)] = {0};
+	void *tir_ctx, *outer, *inner;
+	struct mlx5_devx_obj *tir = NULL;
+	int i;
+
+	tir = rte_calloc(__func__, 1, sizeof(*tir), 0);
+	if (!tir) {
+		DRV_LOG(ERR, "Failed to allocate TIR data");
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	MLX5_SET(create_tir_in, in, opcode, MLX5_CMD_OP_CREATE_TIR);
+	tir_ctx = MLX5_ADDR_OF(create_tir_in, in, ctx);
+	MLX5_SET(tirc, tir_ctx, disp_type, tir_attr->disp_type);
+	MLX5_SET(tirc, tir_ctx, lro_timeout_period_usecs,
+		 tir_attr->lro_timeout_period_usecs);
+	MLX5_SET(tirc, tir_ctx, lro_enable_mask, tir_attr->lro_enable_mask);
+	MLX5_SET(tirc, tir_ctx, lro_max_msg_sz, tir_attr->lro_max_msg_sz);
+	MLX5_SET(tirc, tir_ctx, inline_rqn, tir_attr->inline_rqn);
+	MLX5_SET(tirc, tir_ctx, rx_hash_symmetric, tir_attr->rx_hash_symmetric);
+	MLX5_SET(tirc, tir_ctx, tunneled_offload_en,
+		 tir_attr->tunneled_offload_en);
+	MLX5_SET(tirc, tir_ctx, indirect_table, tir_attr->indirect_table);
+	MLX5_SET(tirc, tir_ctx, rx_hash_fn, tir_attr->rx_hash_fn);
+	MLX5_SET(tirc, tir_ctx, self_lb_block, tir_attr->self_lb_block);
+	MLX5_SET(tirc, tir_ctx, transport_domain, tir_attr->transport_domain);
+	for (i = 0; i < 10; i++) {
+		MLX5_SET(tirc, tir_ctx, rx_hash_toeplitz_key[i],
+			 tir_attr->rx_hash_toeplitz_key[i]);
+	}
+	outer = MLX5_ADDR_OF(tirc, tir_ctx, rx_hash_field_selector_outer);
+	MLX5_SET(rx_hash_field_select, outer, l3_prot_type,
+		 tir_attr->rx_hash_field_selector_outer.l3_prot_type);
+	MLX5_SET(rx_hash_field_select, outer, l4_prot_type,
+		 tir_attr->rx_hash_field_selector_outer.l4_prot_type);
+	MLX5_SET(rx_hash_field_select, outer, selected_fields,
+		 tir_attr->rx_hash_field_selector_outer.selected_fields);
+	inner = MLX5_ADDR_OF(tirc, tir_ctx, rx_hash_field_selector_inner);
+	MLX5_SET(rx_hash_field_select, inner, l3_prot_type,
+		 tir_attr->rx_hash_field_selector_inner.l3_prot_type);
+	MLX5_SET(rx_hash_field_select, inner, l4_prot_type,
+		 tir_attr->rx_hash_field_selector_inner.l4_prot_type);
+	MLX5_SET(rx_hash_field_select, inner, selected_fields,
+		 tir_attr->rx_hash_field_selector_inner.selected_fields);
+	tir->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in),
+						   out, sizeof(out));
+	if (!tir->obj) {
+		DRV_LOG(ERR, "Failed to create TIR using DevX");
+		rte_errno = errno;
+		rte_free(tir);
+		return NULL;
+	}
+	tir->id = MLX5_GET(create_tir_out, out, tirn);
+	return tir;
+}
+
+/**
+ * Create RQT using DevX API.
+ *
+ * @param[in] ctx
+ *   ibv_context returned from mlx5dv_open_device.
+ * @param [in] rqt_attr
+ *   Pointer to RQT attributes structure.
+ *
+ * @return
+ *   The DevX object created, NULL otherwise and rte_errno is set.
+ */
+struct mlx5_devx_obj *
+mlx5_devx_cmd_create_rqt(struct ibv_context *ctx,
+			 struct mlx5_devx_rqt_attr *rqt_attr)
+{
+	uint32_t *in = NULL;
+	uint32_t inlen = MLX5_ST_SZ_BYTES(create_rqt_in) +
+			 rqt_attr->rqt_actual_size * sizeof(uint32_t);
+	uint32_t out[MLX5_ST_SZ_DW(create_rqt_out)] = {0};
+	void *rqt_ctx;
+	struct mlx5_devx_obj *rqt = NULL;
+	int i;
+
+	in = rte_calloc(__func__, 1, inlen, 0);
+	if (!in) {
+		DRV_LOG(ERR, "Failed to allocate RQT IN data");
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	rqt = rte_calloc(__func__, 1, sizeof(*rqt), 0);
+	if (!rqt) {
+		DRV_LOG(ERR, "Failed to allocate RQT data");
+		rte_errno = ENOMEM;
+		rte_free(in);
+		return NULL;
+	}
+	MLX5_SET(create_rqt_in, in, opcode, MLX5_CMD_OP_CREATE_RQT);
+	rqt_ctx = MLX5_ADDR_OF(create_rqt_in, in, rqt_context);
+	MLX5_SET(rqtc, rqt_ctx, rqt_max_size, rqt_attr->rqt_max_size);
+	MLX5_SET(rqtc, rqt_ctx, rqt_actual_size, rqt_attr->rqt_actual_size);
+	for (i = 0; i < rqt_attr->rqt_actual_size; i++)
+		MLX5_SET(rqtc, rqt_ctx, rq_num[i], rqt_attr->rq_list[i]);
+	rqt->obj = mlx5_glue->devx_obj_create(ctx, in, inlen, out, sizeof(out));
+	rte_free(in);
+	if (!rqt->obj) {
+		DRV_LOG(ERR, "Failed to create RQT using DevX");
+		rte_errno = errno;
+		rte_free(rqt);
+		return NULL;
+	}
+	rqt->id = MLX5_GET(create_rqt_out, out, rqtn);
+	return rqt;
+}
+
+/**
+ * Create SQ using DevX API.
+ *
+ * @param[in] ctx
+ *   ibv_context returned from mlx5dv_open_device.
+ * @param[in] sq_attr
+ *   Pointer to SQ attributes structure.
+ *
+ * @return
+ *   The DevX object created, NULL otherwise and rte_errno is set.
+ */
+struct mlx5_devx_obj *
+mlx5_devx_cmd_create_sq(struct ibv_context *ctx,
+			struct mlx5_devx_create_sq_attr *sq_attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(create_sq_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(create_sq_out)] = {0};
+	void *sq_ctx;
+	void *wq_ctx;
+	struct mlx5_devx_wq_attr *wq_attr;
+	struct mlx5_devx_obj *sq = NULL;
+
+	sq = rte_calloc(__func__, 1, sizeof(*sq), 0);
+	if (!sq) {
+		DRV_LOG(ERR, "Failed to allocate SQ data");
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	MLX5_SET(create_sq_in, in, opcode, MLX5_CMD_OP_CREATE_SQ);
+	sq_ctx = MLX5_ADDR_OF(create_sq_in, in, ctx);
+	MLX5_SET(sqc, sq_ctx, rlky, sq_attr->rlky);
+	MLX5_SET(sqc, sq_ctx, cd_master, sq_attr->cd_master);
+	MLX5_SET(sqc, sq_ctx, fre, sq_attr->fre);
+	MLX5_SET(sqc, sq_ctx, flush_in_error_en, sq_attr->flush_in_error_en);
+	MLX5_SET(sqc, sq_ctx, allow_multi_pkt_send_wqe,
+		 sq_attr->allow_multi_pkt_send_wqe);
+	MLX5_SET(sqc, sq_ctx, min_wqe_inline_mode,
+		 sq_attr->min_wqe_inline_mode);
+	MLX5_SET(sqc, sq_ctx, state, sq_attr->state);
+	MLX5_SET(sqc, sq_ctx, reg_umr, sq_attr->reg_umr);
+	MLX5_SET(sqc, sq_ctx, allow_swp, sq_attr->allow_swp);
+	MLX5_SET(sqc, sq_ctx, hairpin, sq_attr->hairpin);
+	MLX5_SET(sqc, sq_ctx, user_index, sq_attr->user_index);
+	MLX5_SET(sqc, sq_ctx, cqn, sq_attr->cqn);
+	MLX5_SET(sqc, sq_ctx, packet_pacing_rate_limit_index,
+		 sq_attr->packet_pacing_rate_limit_index);
+	MLX5_SET(sqc, sq_ctx, tis_lst_sz, sq_attr->tis_lst_sz);
+	MLX5_SET(sqc, sq_ctx, tis_num_0, sq_attr->tis_num);
+	wq_ctx = MLX5_ADDR_OF(sqc, sq_ctx, wq);
+	wq_attr = &sq_attr->wq_attr;
+	devx_cmd_fill_wq_data(wq_ctx, wq_attr);
+	sq->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in),
+					     out, sizeof(out));
+	if (!sq->obj) {
+		DRV_LOG(ERR, "Failed to create SQ using DevX");
+		rte_errno = errno;
+		rte_free(sq);
+		return NULL;
+	}
+	sq->id = MLX5_GET(create_sq_out, out, sqn);
+	return sq;
+}
+
+/**
+ * Modify SQ using DevX API.
+ *
+ * @param[in] sq
+ *   Pointer to SQ object structure.
+ * @param [in] sq_attr
+ *   Pointer to SQ attributes structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_devx_cmd_modify_sq(struct mlx5_devx_obj *sq,
+			struct mlx5_devx_modify_sq_attr *sq_attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(modify_sq_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(modify_sq_out)] = {0};
+	void *sq_ctx;
+	int ret;
+
+	MLX5_SET(modify_sq_in, in, opcode, MLX5_CMD_OP_MODIFY_SQ);
+	MLX5_SET(modify_sq_in, in, sq_state, sq_attr->sq_state);
+	MLX5_SET(modify_sq_in, in, sqn, sq->id);
+	sq_ctx = MLX5_ADDR_OF(modify_sq_in, in, ctx);
+	MLX5_SET(sqc, sq_ctx, state, sq_attr->state);
+	MLX5_SET(sqc, sq_ctx, hairpin_peer_rq, sq_attr->hairpin_peer_rq);
+	MLX5_SET(sqc, sq_ctx, hairpin_peer_vhca, sq_attr->hairpin_peer_vhca);
+	ret = mlx5_glue->devx_obj_modify(sq->obj, in, sizeof(in),
+					 out, sizeof(out));
+	if (ret) {
+		DRV_LOG(ERR, "Failed to modify SQ using DevX");
+		rte_errno = errno;
+		return -errno;
+	}
+	return ret;
+}
+
+/**
+ * Create TIS using DevX API.
+ *
+ * @param[in] ctx
+ *   ibv_context returned from mlx5dv_open_device.
+ * @param [in] tis_attr
+ *   Pointer to TIS attributes structure.
+ *
+ * @return
+ *   The DevX object created, NULL otherwise and rte_errno is set.
+ */
+struct mlx5_devx_obj *
+mlx5_devx_cmd_create_tis(struct ibv_context *ctx,
+			 struct mlx5_devx_tis_attr *tis_attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(create_tis_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(create_tis_out)] = {0};
+	struct mlx5_devx_obj *tis = NULL;
+	void *tis_ctx;
+
+	tis = rte_calloc(__func__, 1, sizeof(*tis), 0);
+	if (!tis) {
+		DRV_LOG(ERR, "Failed to allocate TIS object");
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	MLX5_SET(create_tis_in, in, opcode, MLX5_CMD_OP_CREATE_TIS);
+	tis_ctx = MLX5_ADDR_OF(create_tis_in, in, ctx);
+	MLX5_SET(tisc, tis_ctx, strict_lag_tx_port_affinity,
+		 tis_attr->strict_lag_tx_port_affinity);
+	MLX5_SET(tisc, tis_ctx, lag_tx_port_affinity,
+		 tis_attr->lag_tx_port_affinity);
+	MLX5_SET(tisc, tis_ctx, prio, tis_attr->prio);
+	MLX5_SET(tisc, tis_ctx, transport_domain,
+		 tis_attr->transport_domain);
+	tis->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in),
+					      out, sizeof(out));
+	if (!tis->obj) {
+		DRV_LOG(ERR, "Failed to create TIS using DevX");
+		rte_errno = errno;
+		rte_free(tis);
+		return NULL;
+	}
+	tis->id = MLX5_GET(create_tis_out, out, tisn);
+	return tis;
+}
+
+/**
+ * Create transport domain using DevX API.
+ *
+ * @param[in] ctx
+ *   ibv_context returned from mlx5dv_open_device.
+ *
+ * @return
+ *   The DevX object created, NULL otherwise and rte_errno is set.
+ */
+struct mlx5_devx_obj *
+mlx5_devx_cmd_create_td(struct ibv_context *ctx)
+{
+	uint32_t in[MLX5_ST_SZ_DW(alloc_transport_domain_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(alloc_transport_domain_out)] = {0};
+	struct mlx5_devx_obj *td = NULL;
+
+	td = rte_calloc(__func__, 1, sizeof(*td), 0);
+	if (!td) {
+		DRV_LOG(ERR, "Failed to allocate TD object");
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	MLX5_SET(alloc_transport_domain_in, in, opcode,
+		 MLX5_CMD_OP_ALLOC_TRANSPORT_DOMAIN);
+	td->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in),
+					     out, sizeof(out));
+	if (!td->obj) {
+		DRV_LOG(ERR, "Failed to create TD using DevX");
+		rte_errno = errno;
+		rte_free(td);
+		return NULL;
+	}
+	td->id = MLX5_GET(alloc_transport_domain_out, out,
+			   transport_domain);
+	return td;
+}
+
+/**
+ * Dump all flows to file.
+ *
+ * @param[in] fdb_domain
+ *   FDB domain.
+ * @param[in] rx_domain
+ *   RX domain.
+ * @param[in] tx_domain
+ *   TX domain.
+ * @param[out] file
+ *   Pointer to file stream.
+ *
+ * @return
+ *   0 on success, a negative value otherwise.
+ */
+int
+mlx5_devx_cmd_flow_dump(void *fdb_domain __rte_unused,
+			void *rx_domain __rte_unused,
+			void *tx_domain __rte_unused, FILE *file __rte_unused)
+{
+	int ret = 0;
+
+#ifdef HAVE_MLX5_DR_FLOW_DUMP
+	if (fdb_domain) {
+		ret = mlx5_glue->dr_dump_domain(file, fdb_domain);
+		if (ret)
+			return ret;
+	}
+	assert(rx_domain);
+	ret = mlx5_glue->dr_dump_domain(file, rx_domain);
+	if (ret)
+		return ret;
+	assert(tx_domain);
+	ret = mlx5_glue->dr_dump_domain(file, tx_domain);
+#else
+	ret = ENOTSUP;
+#endif
+	return -ret;
+}
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
new file mode 100644
index 0000000..2d58d96
--- /dev/null
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -0,0 +1,231 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2019 Mellanox Technologies, Ltd
+ */
+
+#ifndef RTE_PMD_MLX5_DEVX_CMDS_H_
+#define RTE_PMD_MLX5_DEVX_CMDS_H_
+
+#include "mlx5_glue.h"
+
+
+/* devX creation object */
+struct mlx5_devx_obj {
+	struct mlx5dv_devx_obj *obj; /* The DV object. */
+	int id; /* The object ID. */
+};
+
+struct mlx5_devx_mkey_attr {
+	uint64_t addr;
+	uint64_t size;
+	uint32_t umem_id;
+	uint32_t pd;
+};
+
+/* HCA qos attributes. */
+struct mlx5_hca_qos_attr {
+	uint32_t sup:1;	/* Whether QOS is supported. */
+	uint32_t srtcm_sup:1; /* Whether srTCM mode is supported. */
+	uint32_t flow_meter_reg_share:1;
+	/* Whether reg_c share is supported. */
+	uint8_t log_max_flow_meter;
+	/* Log2 of the maximum number of supported meters. */
+	uint8_t flow_meter_reg_c_ids;
+	/* Bitmap of the reg_Cs available for flow meter to use. */
+
+};
+
+/* HCA supports this number of time periods for LRO. */
+#define MLX5_LRO_NUM_SUPP_PERIODS 4
+
+/* HCA attributes. */
+struct mlx5_hca_attr {
+	uint32_t eswitch_manager:1;
+	uint32_t flow_counters_dump:1;
+	uint8_t flow_counter_bulk_alloc_bitmap;
+	uint32_t eth_net_offloads:1;
+	uint32_t eth_virt:1;
+	uint32_t wqe_vlan_insert:1;
+	uint32_t wqe_inline_mode:2;
+	uint32_t vport_inline_mode:3;
+	uint32_t tunnel_stateless_geneve_rx:1;
+	uint32_t geneve_max_opt_len:1; /* 0x0: 14DW, 0x1: 63DW */
+	uint32_t tunnel_stateless_gtp:1;
+	uint32_t lro_cap:1;
+	uint32_t tunnel_lro_gre:1;
+	uint32_t tunnel_lro_vxlan:1;
+	uint32_t lro_max_msg_sz_mode:2;
+	uint32_t lro_timer_supported_periods[MLX5_LRO_NUM_SUPP_PERIODS];
+	uint32_t flex_parser_protocols;
+	uint32_t hairpin:1;
+	uint32_t log_max_hairpin_queues:5;
+	uint32_t log_max_hairpin_wq_data_sz:5;
+	uint32_t log_max_hairpin_num_packets:5;
+	uint32_t vhca_id:16;
+	struct mlx5_hca_qos_attr qos;
+};
+
+struct mlx5_devx_wq_attr {
+	uint32_t wq_type:4;
+	uint32_t wq_signature:1;
+	uint32_t end_padding_mode:2;
+	uint32_t cd_slave:1;
+	uint32_t hds_skip_first_sge:1;
+	uint32_t log2_hds_buf_size:3;
+	uint32_t page_offset:5;
+	uint32_t lwm:16;
+	uint32_t pd:24;
+	uint32_t uar_page:24;
+	uint64_t dbr_addr;
+	uint32_t hw_counter;
+	uint32_t sw_counter;
+	uint32_t log_wq_stride:4;
+	uint32_t log_wq_pg_sz:5;
+	uint32_t log_wq_sz:5;
+	uint32_t dbr_umem_valid:1;
+	uint32_t wq_umem_valid:1;
+	uint32_t log_hairpin_num_packets:5;
+	uint32_t log_hairpin_data_sz:5;
+	uint32_t single_wqe_log_num_of_strides:4;
+	uint32_t two_byte_shift_en:1;
+	uint32_t single_stride_log_num_of_bytes:3;
+	uint32_t dbr_umem_id;
+	uint32_t wq_umem_id;
+	uint64_t wq_umem_offset;
+};
+
+/* Create RQ attributes structure, used by create RQ operation. */
+struct mlx5_devx_create_rq_attr {
+	uint32_t rlky:1;
+	uint32_t delay_drop_en:1;
+	uint32_t scatter_fcs:1;
+	uint32_t vsd:1;
+	uint32_t mem_rq_type:4;
+	uint32_t state:4;
+	uint32_t flush_in_error_en:1;
+	uint32_t hairpin:1;
+	uint32_t user_index:24;
+	uint32_t cqn:24;
+	uint32_t counter_set_id:8;
+	uint32_t rmpn:24;
+	struct mlx5_devx_wq_attr wq_attr;
+};
+
+/* Modify RQ attributes structure, used by modify RQ operation. */
+struct mlx5_devx_modify_rq_attr {
+	uint32_t rqn:24;
+	uint32_t rq_state:4; /* Current RQ state. */
+	uint32_t state:4; /* Required RQ state. */
+	uint32_t scatter_fcs:1;
+	uint32_t vsd:1;
+	uint32_t counter_set_id:8;
+	uint32_t hairpin_peer_sq:24;
+	uint32_t hairpin_peer_vhca:16;
+	uint64_t modify_bitmask;
+	uint32_t lwm:16; /* Contained WQ lwm. */
+};
+
+struct mlx5_rx_hash_field_select {
+	uint32_t l3_prot_type:1;
+	uint32_t l4_prot_type:1;
+	uint32_t selected_fields:30;
+};
+
+/* TIR attributes structure, used by TIR operations. */
+struct mlx5_devx_tir_attr {
+	uint32_t disp_type:4;
+	uint32_t lro_timeout_period_usecs:16;
+	uint32_t lro_enable_mask:4;
+	uint32_t lro_max_msg_sz:8;
+	uint32_t inline_rqn:24;
+	uint32_t rx_hash_symmetric:1;
+	uint32_t tunneled_offload_en:1;
+	uint32_t indirect_table:24;
+	uint32_t rx_hash_fn:4;
+	uint32_t self_lb_block:2;
+	uint32_t transport_domain:24;
+	uint32_t rx_hash_toeplitz_key[10];
+	struct mlx5_rx_hash_field_select rx_hash_field_selector_outer;
+	struct mlx5_rx_hash_field_select rx_hash_field_selector_inner;
+};
+
+/* RQT attributes structure, used by RQT operations. */
+struct mlx5_devx_rqt_attr {
+	uint32_t rqt_max_size:16;
+	uint32_t rqt_actual_size:16;
+	uint32_t rq_list[];
+};
+
+/* TIS attributes structure. */
+struct mlx5_devx_tis_attr {
+	uint32_t strict_lag_tx_port_affinity:1;
+	uint32_t tls_en:1;
+	uint32_t lag_tx_port_affinity:4;
+	uint32_t prio:4;
+	uint32_t transport_domain:24;
+};
+
+/* SQ attributes structure, used by SQ create operation. */
+struct mlx5_devx_create_sq_attr {
+	uint32_t rlky:1;
+	uint32_t cd_master:1;
+	uint32_t fre:1;
+	uint32_t flush_in_error_en:1;
+	uint32_t allow_multi_pkt_send_wqe:1;
+	uint32_t min_wqe_inline_mode:3;
+	uint32_t state:4;
+	uint32_t reg_umr:1;
+	uint32_t allow_swp:1;
+	uint32_t hairpin:1;
+	uint32_t user_index:24;
+	uint32_t cqn:24;
+	uint32_t packet_pacing_rate_limit_index:16;
+	uint32_t tis_lst_sz:16;
+	uint32_t tis_num:24;
+	struct mlx5_devx_wq_attr wq_attr;
+};
+
+/* SQ attributes structure, used by SQ modify operation. */
+struct mlx5_devx_modify_sq_attr {
+	uint32_t sq_state:4;
+	uint32_t state:4;
+	uint32_t hairpin_peer_rq:24;
+	uint32_t hairpin_peer_vhca:16;
+};
+
+/* mlx5_devx_cmds.c */
+
+struct mlx5_devx_obj *mlx5_devx_cmd_flow_counter_alloc(struct ibv_context *ctx,
+						       uint32_t bulk_sz);
+int mlx5_devx_cmd_destroy(struct mlx5_devx_obj *obj);
+int mlx5_devx_cmd_flow_counter_query(struct mlx5_devx_obj *dcs,
+				     int clear, uint32_t n_counters,
+				     uint64_t *pkts, uint64_t *bytes,
+				     uint32_t mkey, void *addr,
+				     struct mlx5dv_devx_cmd_comp *cmd_comp,
+				     uint64_t async_id);
+int mlx5_devx_cmd_query_hca_attr(struct ibv_context *ctx,
+				 struct mlx5_hca_attr *attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_mkey_create(struct ibv_context *ctx,
+					      struct mlx5_devx_mkey_attr *attr);
+int mlx5_devx_get_out_command_status(void *out);
+int mlx5_devx_cmd_qp_query_tis_td(struct ibv_qp *qp, uint32_t tis_num,
+				  uint32_t *tis_td);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_rq(struct ibv_context *ctx,
+				       struct mlx5_devx_create_rq_attr *rq_attr,
+				       int socket);
+int mlx5_devx_cmd_modify_rq(struct mlx5_devx_obj *rq,
+			    struct mlx5_devx_modify_rq_attr *rq_attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_tir(struct ibv_context *ctx,
+					   struct mlx5_devx_tir_attr *tir_attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_rqt(struct ibv_context *ctx,
+					   struct mlx5_devx_rqt_attr *rqt_attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_sq(struct ibv_context *ctx,
+				      struct mlx5_devx_create_sq_attr *sq_attr);
+int mlx5_devx_cmd_modify_sq(struct mlx5_devx_obj *sq,
+			    struct mlx5_devx_modify_sq_attr *sq_attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_tis(struct ibv_context *ctx,
+					   struct mlx5_devx_tis_attr *tis_attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_td(struct ibv_context *ctx);
+int mlx5_devx_cmd_flow_dump(void *fdb_domain, void *rx_domain, void *tx_domain,
+			    FILE *file);
+#endif /* RTE_PMD_MLX5_DEVX_CMDS_H_ */
diff --git a/drivers/common/mlx5/mlx5_glue.c b/drivers/common/mlx5/mlx5_glue.c
new file mode 100644
index 0000000..d5bc84e
--- /dev/null
+++ b/drivers/common/mlx5/mlx5_glue.c
@@ -0,0 +1,1138 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2018 6WIND S.A.
+ * Copyright 2018 Mellanox Technologies, Ltd
+ */
+
+#include <errno.h>
+#include <stdalign.h>
+#include <stddef.h>
+#include <stdint.h>
+#include <stdlib.h>
+/*
+ * Not needed by this file; included to work around the lack of off_t
+ * definition for mlx5dv.h with unpatched rdma-core versions.
+ */
+#include <sys/types.h>
+
+#include <rte_config.h>
+
+#include "mlx5_glue.h"
+
+static int
+mlx5_glue_fork_init(void)
+{
+	return ibv_fork_init();
+}
+
+static struct ibv_pd *
+mlx5_glue_alloc_pd(struct ibv_context *context)
+{
+	return ibv_alloc_pd(context);
+}
+
+static int
+mlx5_glue_dealloc_pd(struct ibv_pd *pd)
+{
+	return ibv_dealloc_pd(pd);
+}
+
+static struct ibv_device **
+mlx5_glue_get_device_list(int *num_devices)
+{
+	return ibv_get_device_list(num_devices);
+}
+
+static void
+mlx5_glue_free_device_list(struct ibv_device **list)
+{
+	ibv_free_device_list(list);
+}
+
+static struct ibv_context *
+mlx5_glue_open_device(struct ibv_device *device)
+{
+	return ibv_open_device(device);
+}
+
+static int
+mlx5_glue_close_device(struct ibv_context *context)
+{
+	return ibv_close_device(context);
+}
+
+static int
+mlx5_glue_query_device(struct ibv_context *context,
+		       struct ibv_device_attr *device_attr)
+{
+	return ibv_query_device(context, device_attr);
+}
+
+static int
+mlx5_glue_query_device_ex(struct ibv_context *context,
+			  const struct ibv_query_device_ex_input *input,
+			  struct ibv_device_attr_ex *attr)
+{
+	return ibv_query_device_ex(context, input, attr);
+}
+
+static int
+mlx5_glue_query_rt_values_ex(struct ibv_context *context,
+			     struct ibv_values_ex *values)
+{
+	return ibv_query_rt_values_ex(context, values);
+}
+
+static int
+mlx5_glue_query_port(struct ibv_context *context, uint8_t port_num,
+		     struct ibv_port_attr *port_attr)
+{
+	return ibv_query_port(context, port_num, port_attr);
+}
+
+static struct ibv_comp_channel *
+mlx5_glue_create_comp_channel(struct ibv_context *context)
+{
+	return ibv_create_comp_channel(context);
+}
+
+static int
+mlx5_glue_destroy_comp_channel(struct ibv_comp_channel *channel)
+{
+	return ibv_destroy_comp_channel(channel);
+}
+
+static struct ibv_cq *
+mlx5_glue_create_cq(struct ibv_context *context, int cqe, void *cq_context,
+		    struct ibv_comp_channel *channel, int comp_vector)
+{
+	return ibv_create_cq(context, cqe, cq_context, channel, comp_vector);
+}
+
+static int
+mlx5_glue_destroy_cq(struct ibv_cq *cq)
+{
+	return ibv_destroy_cq(cq);
+}
+
+static int
+mlx5_glue_get_cq_event(struct ibv_comp_channel *channel, struct ibv_cq **cq,
+		       void **cq_context)
+{
+	return ibv_get_cq_event(channel, cq, cq_context);
+}
+
+static void
+mlx5_glue_ack_cq_events(struct ibv_cq *cq, unsigned int nevents)
+{
+	ibv_ack_cq_events(cq, nevents);
+}
+
+static struct ibv_rwq_ind_table *
+mlx5_glue_create_rwq_ind_table(struct ibv_context *context,
+			       struct ibv_rwq_ind_table_init_attr *init_attr)
+{
+	return ibv_create_rwq_ind_table(context, init_attr);
+}
+
+static int
+mlx5_glue_destroy_rwq_ind_table(struct ibv_rwq_ind_table *rwq_ind_table)
+{
+	return ibv_destroy_rwq_ind_table(rwq_ind_table);
+}
+
+static struct ibv_wq *
+mlx5_glue_create_wq(struct ibv_context *context,
+		    struct ibv_wq_init_attr *wq_init_attr)
+{
+	return ibv_create_wq(context, wq_init_attr);
+}
+
+static int
+mlx5_glue_destroy_wq(struct ibv_wq *wq)
+{
+	return ibv_destroy_wq(wq);
+}
+static int
+mlx5_glue_modify_wq(struct ibv_wq *wq, struct ibv_wq_attr *wq_attr)
+{
+	return ibv_modify_wq(wq, wq_attr);
+}
+
+static struct ibv_flow *
+mlx5_glue_create_flow(struct ibv_qp *qp, struct ibv_flow_attr *flow)
+{
+	return ibv_create_flow(qp, flow);
+}
+
+static int
+mlx5_glue_destroy_flow(struct ibv_flow *flow_id)
+{
+	return ibv_destroy_flow(flow_id);
+}
+
+static int
+mlx5_glue_destroy_flow_action(void *action)
+{
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+#ifdef HAVE_MLX5DV_DR
+	return mlx5dv_dr_action_destroy(action);
+#else
+	struct mlx5dv_flow_action_attr *attr = action;
+	int res = 0;
+	switch (attr->type) {
+	case MLX5DV_FLOW_ACTION_TAG:
+		break;
+	default:
+		res = ibv_destroy_flow_action(attr->action);
+		break;
+	}
+	free(action);
+	return res;
+#endif
+#else
+	(void)action;
+	return ENOTSUP;
+#endif
+}
+
+static struct ibv_qp *
+mlx5_glue_create_qp(struct ibv_pd *pd, struct ibv_qp_init_attr *qp_init_attr)
+{
+	return ibv_create_qp(pd, qp_init_attr);
+}
+
+static struct ibv_qp *
+mlx5_glue_create_qp_ex(struct ibv_context *context,
+		       struct ibv_qp_init_attr_ex *qp_init_attr_ex)
+{
+	return ibv_create_qp_ex(context, qp_init_attr_ex);
+}
+
+static int
+mlx5_glue_destroy_qp(struct ibv_qp *qp)
+{
+	return ibv_destroy_qp(qp);
+}
+
+static int
+mlx5_glue_modify_qp(struct ibv_qp *qp, struct ibv_qp_attr *attr, int attr_mask)
+{
+	return ibv_modify_qp(qp, attr, attr_mask);
+}
+
+static struct ibv_mr *
+mlx5_glue_reg_mr(struct ibv_pd *pd, void *addr, size_t length, int access)
+{
+	return ibv_reg_mr(pd, addr, length, access);
+}
+
+static int
+mlx5_glue_dereg_mr(struct ibv_mr *mr)
+{
+	return ibv_dereg_mr(mr);
+}
+
+static struct ibv_counter_set *
+mlx5_glue_create_counter_set(struct ibv_context *context,
+			     struct ibv_counter_set_init_attr *init_attr)
+{
+#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V42
+	(void)context;
+	(void)init_attr;
+	return NULL;
+#else
+	return ibv_create_counter_set(context, init_attr);
+#endif
+}
+
+static int
+mlx5_glue_destroy_counter_set(struct ibv_counter_set *cs)
+{
+#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V42
+	(void)cs;
+	return ENOTSUP;
+#else
+	return ibv_destroy_counter_set(cs);
+#endif
+}
+
+static int
+mlx5_glue_describe_counter_set(struct ibv_context *context,
+			       uint16_t counter_set_id,
+			       struct ibv_counter_set_description *cs_desc)
+{
+#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V42
+	(void)context;
+	(void)counter_set_id;
+	(void)cs_desc;
+	return ENOTSUP;
+#else
+	return ibv_describe_counter_set(context, counter_set_id, cs_desc);
+#endif
+}
+
+static int
+mlx5_glue_query_counter_set(struct ibv_query_counter_set_attr *query_attr,
+			    struct ibv_counter_set_data *cs_data)
+{
+#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V42
+	(void)query_attr;
+	(void)cs_data;
+	return ENOTSUP;
+#else
+	return ibv_query_counter_set(query_attr, cs_data);
+#endif
+}
+
+static struct ibv_counters *
+mlx5_glue_create_counters(struct ibv_context *context,
+			  struct ibv_counters_init_attr *init_attr)
+{
+#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V45
+	(void)context;
+	(void)init_attr;
+	errno = ENOTSUP;
+	return NULL;
+#else
+	return ibv_create_counters(context, init_attr);
+#endif
+}
+
+static int
+mlx5_glue_destroy_counters(struct ibv_counters *counters)
+{
+#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V45
+	(void)counters;
+	return ENOTSUP;
+#else
+	return ibv_destroy_counters(counters);
+#endif
+}
+
+static int
+mlx5_glue_attach_counters(struct ibv_counters *counters,
+			  struct ibv_counter_attach_attr *attr,
+			  struct ibv_flow *flow)
+{
+#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V45
+	(void)counters;
+	(void)attr;
+	(void)flow;
+	return ENOTSUP;
+#else
+	return ibv_attach_counters_point_flow(counters, attr, flow);
+#endif
+}
+
+static int
+mlx5_glue_query_counters(struct ibv_counters *counters,
+			 uint64_t *counters_value,
+			 uint32_t ncounters,
+			 uint32_t flags)
+{
+#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V45
+	(void)counters;
+	(void)counters_value;
+	(void)ncounters;
+	(void)flags;
+	return ENOTSUP;
+#else
+	return ibv_read_counters(counters, counters_value, ncounters, flags);
+#endif
+}
+
+static void
+mlx5_glue_ack_async_event(struct ibv_async_event *event)
+{
+	ibv_ack_async_event(event);
+}
+
+static int
+mlx5_glue_get_async_event(struct ibv_context *context,
+			  struct ibv_async_event *event)
+{
+	return ibv_get_async_event(context, event);
+}
+
+static const char *
+mlx5_glue_port_state_str(enum ibv_port_state port_state)
+{
+	return ibv_port_state_str(port_state);
+}
+
+static struct ibv_cq *
+mlx5_glue_cq_ex_to_cq(struct ibv_cq_ex *cq)
+{
+	return ibv_cq_ex_to_cq(cq);
+}
+
+static void *
+mlx5_glue_dr_create_flow_action_dest_flow_tbl(void *tbl)
+{
+#ifdef HAVE_MLX5DV_DR
+	return mlx5dv_dr_action_create_dest_table(tbl);
+#else
+	(void)tbl;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static void *
+mlx5_glue_dr_create_flow_action_dest_port(void *domain, uint32_t port)
+{
+#ifdef HAVE_MLX5DV_DR_DEVX_PORT
+	return mlx5dv_dr_action_create_dest_ib_port(domain, port);
+#else
+#ifdef HAVE_MLX5DV_DR_ESWITCH
+	return mlx5dv_dr_action_create_dest_vport(domain, port);
+#else
+	(void)domain;
+	(void)port;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+#endif
+}
+
+static void *
+mlx5_glue_dr_create_flow_action_drop(void)
+{
+#ifdef HAVE_MLX5DV_DR_ESWITCH
+	return mlx5dv_dr_action_create_drop();
+#else
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static void *
+mlx5_glue_dr_create_flow_action_push_vlan(struct mlx5dv_dr_domain *domain,
+					  rte_be32_t vlan_tag)
+{
+#ifdef HAVE_MLX5DV_DR_VLAN
+	return mlx5dv_dr_action_create_push_vlan(domain, vlan_tag);
+#else
+	(void)domain;
+	(void)vlan_tag;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static void *
+mlx5_glue_dr_create_flow_action_pop_vlan(void)
+{
+#ifdef HAVE_MLX5DV_DR_VLAN
+	return mlx5dv_dr_action_create_pop_vlan();
+#else
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static void *
+mlx5_glue_dr_create_flow_tbl(void *domain, uint32_t level)
+{
+#ifdef HAVE_MLX5DV_DR
+	return mlx5dv_dr_table_create(domain, level);
+#else
+	(void)domain;
+	(void)level;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static int
+mlx5_glue_dr_destroy_flow_tbl(void *tbl)
+{
+#ifdef HAVE_MLX5DV_DR
+	return mlx5dv_dr_table_destroy(tbl);
+#else
+	(void)tbl;
+	errno = ENOTSUP;
+	return errno;
+#endif
+}
+
+static void *
+mlx5_glue_dr_create_domain(struct ibv_context *ctx,
+			   enum mlx5dv_dr_domain_type domain)
+{
+#ifdef HAVE_MLX5DV_DR
+	return mlx5dv_dr_domain_create(ctx, domain);
+#else
+	(void)ctx;
+	(void)domain;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static int
+mlx5_glue_dr_destroy_domain(void *domain)
+{
+#ifdef HAVE_MLX5DV_DR
+	return mlx5dv_dr_domain_destroy(domain);
+#else
+	(void)domain;
+	errno = ENOTSUP;
+	return errno;
+#endif
+}
+
+static struct ibv_cq_ex *
+mlx5_glue_dv_create_cq(struct ibv_context *context,
+		       struct ibv_cq_init_attr_ex *cq_attr,
+		       struct mlx5dv_cq_init_attr *mlx5_cq_attr)
+{
+	return mlx5dv_create_cq(context, cq_attr, mlx5_cq_attr);
+}
+
+static struct ibv_wq *
+mlx5_glue_dv_create_wq(struct ibv_context *context,
+		       struct ibv_wq_init_attr *wq_attr,
+		       struct mlx5dv_wq_init_attr *mlx5_wq_attr)
+{
+#ifndef HAVE_IBV_DEVICE_STRIDING_RQ_SUPPORT
+	(void)context;
+	(void)wq_attr;
+	(void)mlx5_wq_attr;
+	errno = ENOTSUP;
+	return NULL;
+#else
+	return mlx5dv_create_wq(context, wq_attr, mlx5_wq_attr);
+#endif
+}
+
+static int
+mlx5_glue_dv_query_device(struct ibv_context *ctx,
+			  struct mlx5dv_context *attrs_out)
+{
+	return mlx5dv_query_device(ctx, attrs_out);
+}
+
+static int
+mlx5_glue_dv_set_context_attr(struct ibv_context *ibv_ctx,
+			      enum mlx5dv_set_ctx_attr_type type, void *attr)
+{
+	return mlx5dv_set_context_attr(ibv_ctx, type, attr);
+}
+
+static int
+mlx5_glue_dv_init_obj(struct mlx5dv_obj *obj, uint64_t obj_type)
+{
+	return mlx5dv_init_obj(obj, obj_type);
+}
+
+static struct ibv_qp *
+mlx5_glue_dv_create_qp(struct ibv_context *context,
+		       struct ibv_qp_init_attr_ex *qp_init_attr_ex,
+		       struct mlx5dv_qp_init_attr *dv_qp_init_attr)
+{
+#ifdef HAVE_IBV_DEVICE_TUNNEL_SUPPORT
+	return mlx5dv_create_qp(context, qp_init_attr_ex, dv_qp_init_attr);
+#else
+	(void)context;
+	(void)qp_init_attr_ex;
+	(void)dv_qp_init_attr;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static void *
+mlx5_glue_dv_create_flow_matcher(struct ibv_context *context,
+				 struct mlx5dv_flow_matcher_attr *matcher_attr,
+				 void *tbl)
+{
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+#ifdef HAVE_MLX5DV_DR
+	(void)context;
+	return mlx5dv_dr_matcher_create(tbl, matcher_attr->priority,
+					matcher_attr->match_criteria_enable,
+					matcher_attr->match_mask);
+#else
+	(void)tbl;
+	return mlx5dv_create_flow_matcher(context, matcher_attr);
+#endif
+#else
+	(void)context;
+	(void)matcher_attr;
+	(void)tbl;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static void *
+mlx5_glue_dv_create_flow(void *matcher,
+			 void *match_value,
+			 size_t num_actions,
+			 void *actions[])
+{
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+#ifdef HAVE_MLX5DV_DR
+	return mlx5dv_dr_rule_create(matcher, match_value, num_actions,
+				     (struct mlx5dv_dr_action **)actions);
+#else
+	struct mlx5dv_flow_action_attr actions_attr[8];
+
+	if (num_actions > 8)
+		return NULL;
+	for (size_t i = 0; i < num_actions; i++)
+		actions_attr[i] =
+			*((struct mlx5dv_flow_action_attr *)(actions[i]));
+	return mlx5dv_create_flow(matcher, match_value,
+				  num_actions, actions_attr);
+#endif
+#else
+	(void)matcher;
+	(void)match_value;
+	(void)num_actions;
+	(void)actions;
+	return NULL;
+#endif
+}
+
+static void *
+mlx5_glue_dv_create_flow_action_counter(void *counter_obj, uint32_t offset)
+{
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+#ifdef HAVE_MLX5DV_DR
+	return mlx5dv_dr_action_create_flow_counter(counter_obj, offset);
+#else
+	struct mlx5dv_flow_action_attr *action;
+
+	(void)offset;
+	action = malloc(sizeof(*action));
+	if (!action)
+		return NULL;
+	action->type = MLX5DV_FLOW_ACTION_COUNTERS_DEVX;
+	action->obj = counter_obj;
+	return action;
+#endif
+#else
+	(void)counter_obj;
+	(void)offset;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static void *
+mlx5_glue_dv_create_flow_action_dest_ibv_qp(void *qp)
+{
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+#ifdef HAVE_MLX5DV_DR
+	return mlx5dv_dr_action_create_dest_ibv_qp(qp);
+#else
+	struct mlx5dv_flow_action_attr *action;
+
+	action = malloc(sizeof(*action));
+	if (!action)
+		return NULL;
+	action->type = MLX5DV_FLOW_ACTION_DEST_IBV_QP;
+	action->obj = qp;
+	return action;
+#endif
+#else
+	(void)qp;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static void *
+mlx5_glue_dv_create_flow_action_dest_devx_tir(void *tir)
+{
+#ifdef HAVE_MLX5DV_DR_ACTION_DEST_DEVX_TIR
+	return mlx5dv_dr_action_create_dest_devx_tir(tir);
+#else
+	(void)tir;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static void *
+mlx5_glue_dv_create_flow_action_modify_header
+					(struct ibv_context *ctx,
+					 enum mlx5dv_flow_table_type ft_type,
+					 void *domain, uint64_t flags,
+					 size_t actions_sz,
+					 uint64_t actions[])
+{
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+#ifdef HAVE_MLX5DV_DR
+	(void)ctx;
+	(void)ft_type;
+	return mlx5dv_dr_action_create_modify_header(domain, flags, actions_sz,
+						     (__be64 *)actions);
+#else
+	struct mlx5dv_flow_action_attr *action;
+
+	(void)domain;
+	(void)flags;
+	action = malloc(sizeof(*action));
+	if (!action)
+		return NULL;
+	action->type = MLX5DV_FLOW_ACTION_IBV_FLOW_ACTION;
+	action->action = mlx5dv_create_flow_action_modify_header
+		(ctx, actions_sz, actions, ft_type);
+	return action;
+#endif
+#else
+	(void)ctx;
+	(void)ft_type;
+	(void)domain;
+	(void)flags;
+	(void)actions_sz;
+	(void)actions;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static void *
+mlx5_glue_dv_create_flow_action_packet_reformat
+		(struct ibv_context *ctx,
+		 enum mlx5dv_flow_action_packet_reformat_type reformat_type,
+		 enum mlx5dv_flow_table_type ft_type,
+		 struct mlx5dv_dr_domain *domain,
+		 uint32_t flags, size_t data_sz, void *data)
+{
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+#ifdef HAVE_MLX5DV_DR
+	(void)ctx;
+	(void)ft_type;
+	return mlx5dv_dr_action_create_packet_reformat(domain, flags,
+						       reformat_type, data_sz,
+						       data);
+#else
+	(void)domain;
+	(void)flags;
+	struct mlx5dv_flow_action_attr *action;
+
+	action = malloc(sizeof(*action));
+	if (!action)
+		return NULL;
+	action->type = MLX5DV_FLOW_ACTION_IBV_FLOW_ACTION;
+	action->action = mlx5dv_create_flow_action_packet_reformat
+		(ctx, data_sz, data, reformat_type, ft_type);
+	return action;
+#endif
+#else
+	(void)ctx;
+	(void)reformat_type;
+	(void)ft_type;
+	(void)domain;
+	(void)flags;
+	(void)data_sz;
+	(void)data;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static void *
+mlx5_glue_dv_create_flow_action_tag(uint32_t tag)
+{
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+#ifdef HAVE_MLX5DV_DR
+	return mlx5dv_dr_action_create_tag(tag);
+#else
+	struct mlx5dv_flow_action_attr *action;
+	action = malloc(sizeof(*action));
+	if (!action)
+		return NULL;
+	action->type = MLX5DV_FLOW_ACTION_TAG;
+	action->tag_value = tag;
+	return action;
+#endif
+#endif
+	(void)tag;
+	errno = ENOTSUP;
+	return NULL;
+}
+
+static void *
+mlx5_glue_dv_create_flow_action_meter(struct mlx5dv_dr_flow_meter_attr *attr)
+{
+#if defined(HAVE_MLX5DV_DR) && defined(HAVE_MLX5_DR_CREATE_ACTION_FLOW_METER)
+	return mlx5dv_dr_action_create_flow_meter(attr);
+#else
+	(void)attr;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static int
+mlx5_glue_dv_modify_flow_action_meter(void *action,
+				      struct mlx5dv_dr_flow_meter_attr *attr,
+				      uint64_t modify_bits)
+{
+#if defined(HAVE_MLX5DV_DR) && defined(HAVE_MLX5_DR_CREATE_ACTION_FLOW_METER)
+	return mlx5dv_dr_action_modify_flow_meter(action, attr, modify_bits);
+#else
+	(void)action;
+	(void)attr;
+	(void)modify_bits;
+	errno = ENOTSUP;
+	return errno;
+#endif
+}
+
+static int
+mlx5_glue_dv_destroy_flow(void *flow_id)
+{
+#ifdef HAVE_MLX5DV_DR
+	return mlx5dv_dr_rule_destroy(flow_id);
+#else
+	return ibv_destroy_flow(flow_id);
+#endif
+}
+
+static int
+mlx5_glue_dv_destroy_flow_matcher(void *matcher)
+{
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+#ifdef HAVE_MLX5DV_DR
+	return mlx5dv_dr_matcher_destroy(matcher);
+#else
+	return mlx5dv_destroy_flow_matcher(matcher);
+#endif
+#else
+	(void)matcher;
+	errno = ENOTSUP;
+	return errno;
+#endif
+}
+
+static struct ibv_context *
+mlx5_glue_dv_open_device(struct ibv_device *device)
+{
+#ifdef HAVE_IBV_DEVX_OBJ
+	return mlx5dv_open_device(device,
+				  &(struct mlx5dv_context_attr){
+					.flags = MLX5DV_CONTEXT_FLAGS_DEVX,
+				  });
+#else
+	(void)device;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static struct mlx5dv_devx_obj *
+mlx5_glue_devx_obj_create(struct ibv_context *ctx,
+			  const void *in, size_t inlen,
+			  void *out, size_t outlen)
+{
+#ifdef HAVE_IBV_DEVX_OBJ
+	return mlx5dv_devx_obj_create(ctx, in, inlen, out, outlen);
+#else
+	(void)ctx;
+	(void)in;
+	(void)inlen;
+	(void)out;
+	(void)outlen;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static int
+mlx5_glue_devx_obj_destroy(struct mlx5dv_devx_obj *obj)
+{
+#ifdef HAVE_IBV_DEVX_OBJ
+	return mlx5dv_devx_obj_destroy(obj);
+#else
+	(void)obj;
+	return -ENOTSUP;
+#endif
+}
+
+static int
+mlx5_glue_devx_obj_query(struct mlx5dv_devx_obj *obj,
+			 const void *in, size_t inlen,
+			 void *out, size_t outlen)
+{
+#ifdef HAVE_IBV_DEVX_OBJ
+	return mlx5dv_devx_obj_query(obj, in, inlen, out, outlen);
+#else
+	(void)obj;
+	(void)in;
+	(void)inlen;
+	(void)out;
+	(void)outlen;
+	return -ENOTSUP;
+#endif
+}
+
+static int
+mlx5_glue_devx_obj_modify(struct mlx5dv_devx_obj *obj,
+			  const void *in, size_t inlen,
+			  void *out, size_t outlen)
+{
+#ifdef HAVE_IBV_DEVX_OBJ
+	return mlx5dv_devx_obj_modify(obj, in, inlen, out, outlen);
+#else
+	(void)obj;
+	(void)in;
+	(void)inlen;
+	(void)out;
+	(void)outlen;
+	return -ENOTSUP;
+#endif
+}
+
+static int
+mlx5_glue_devx_general_cmd(struct ibv_context *ctx,
+			   const void *in, size_t inlen,
+			   void *out, size_t outlen)
+{
+#ifdef HAVE_IBV_DEVX_OBJ
+	return mlx5dv_devx_general_cmd(ctx, in, inlen, out, outlen);
+#else
+	(void)ctx;
+	(void)in;
+	(void)inlen;
+	(void)out;
+	(void)outlen;
+	return -ENOTSUP;
+#endif
+}
+
+static struct mlx5dv_devx_cmd_comp *
+mlx5_glue_devx_create_cmd_comp(struct ibv_context *ctx)
+{
+#ifdef HAVE_IBV_DEVX_ASYNC
+	return mlx5dv_devx_create_cmd_comp(ctx);
+#else
+	(void)ctx;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static void
+mlx5_glue_devx_destroy_cmd_comp(struct mlx5dv_devx_cmd_comp *cmd_comp)
+{
+#ifdef HAVE_IBV_DEVX_ASYNC
+	mlx5dv_devx_destroy_cmd_comp(cmd_comp);
+#else
+	(void)cmd_comp;
+	errno = ENOTSUP;
+#endif
+}
+
+static int
+mlx5_glue_devx_obj_query_async(struct mlx5dv_devx_obj *obj, const void *in,
+			       size_t inlen, size_t outlen, uint64_t wr_id,
+			       struct mlx5dv_devx_cmd_comp *cmd_comp)
+{
+#ifdef HAVE_IBV_DEVX_ASYNC
+	return mlx5dv_devx_obj_query_async(obj, in, inlen, outlen, wr_id,
+					   cmd_comp);
+#else
+	(void)obj;
+	(void)in;
+	(void)inlen;
+	(void)outlen;
+	(void)wr_id;
+	(void)cmd_comp;
+	return -ENOTSUP;
+#endif
+}
+
+static int
+mlx5_glue_devx_get_async_cmd_comp(struct mlx5dv_devx_cmd_comp *cmd_comp,
+				  struct mlx5dv_devx_async_cmd_hdr *cmd_resp,
+				  size_t cmd_resp_len)
+{
+#ifdef HAVE_IBV_DEVX_ASYNC
+	return mlx5dv_devx_get_async_cmd_comp(cmd_comp, cmd_resp,
+					      cmd_resp_len);
+#else
+	(void)cmd_comp;
+	(void)cmd_resp;
+	(void)cmd_resp_len;
+	return -ENOTSUP;
+#endif
+}
+
+static struct mlx5dv_devx_umem *
+mlx5_glue_devx_umem_reg(struct ibv_context *context, void *addr, size_t size,
+			uint32_t access)
+{
+#ifdef HAVE_IBV_DEVX_OBJ
+	return mlx5dv_devx_umem_reg(context, addr, size, access);
+#else
+	(void)context;
+	(void)addr;
+	(void)size;
+	(void)access;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static int
+mlx5_glue_devx_umem_dereg(struct mlx5dv_devx_umem *dv_devx_umem)
+{
+#ifdef HAVE_IBV_DEVX_OBJ
+	return mlx5dv_devx_umem_dereg(dv_devx_umem);
+#else
+	(void)dv_devx_umem;
+	return -ENOTSUP;
+#endif
+}
+
+static int
+mlx5_glue_devx_qp_query(struct ibv_qp *qp,
+			const void *in, size_t inlen,
+			void *out, size_t outlen)
+{
+#ifdef HAVE_IBV_DEVX_OBJ
+	return mlx5dv_devx_qp_query(qp, in, inlen, out, outlen);
+#else
+	(void)qp;
+	(void)in;
+	(void)inlen;
+	(void)out;
+	(void)outlen;
+	errno = ENOTSUP;
+	return errno;
+#endif
+}
+
+static int
+mlx5_glue_devx_port_query(struct ibv_context *ctx,
+			  uint32_t port_num,
+			  struct mlx5dv_devx_port *mlx5_devx_port)
+{
+#ifdef HAVE_MLX5DV_DR_DEVX_PORT
+	return mlx5dv_query_devx_port(ctx, port_num, mlx5_devx_port);
+#else
+	(void)ctx;
+	(void)port_num;
+	(void)mlx5_devx_port;
+	errno = ENOTSUP;
+	return errno;
+#endif
+}
+
+static int
+mlx5_glue_dr_dump_domain(FILE *file, void *domain)
+{
+#ifdef HAVE_MLX5_DR_FLOW_DUMP
+	return mlx5dv_dump_dr_domain(file, domain);
+#else
+	RTE_SET_USED(file);
+	RTE_SET_USED(domain);
+	return -ENOTSUP;
+#endif
+}
+
+alignas(RTE_CACHE_LINE_SIZE)
+const struct mlx5_glue *mlx5_glue = &(const struct mlx5_glue){
+	.version = MLX5_GLUE_VERSION,
+	.fork_init = mlx5_glue_fork_init,
+	.alloc_pd = mlx5_glue_alloc_pd,
+	.dealloc_pd = mlx5_glue_dealloc_pd,
+	.get_device_list = mlx5_glue_get_device_list,
+	.free_device_list = mlx5_glue_free_device_list,
+	.open_device = mlx5_glue_open_device,
+	.close_device = mlx5_glue_close_device,
+	.query_device = mlx5_glue_query_device,
+	.query_device_ex = mlx5_glue_query_device_ex,
+	.query_rt_values_ex = mlx5_glue_query_rt_values_ex,
+	.query_port = mlx5_glue_query_port,
+	.create_comp_channel = mlx5_glue_create_comp_channel,
+	.destroy_comp_channel = mlx5_glue_destroy_comp_channel,
+	.create_cq = mlx5_glue_create_cq,
+	.destroy_cq = mlx5_glue_destroy_cq,
+	.get_cq_event = mlx5_glue_get_cq_event,
+	.ack_cq_events = mlx5_glue_ack_cq_events,
+	.create_rwq_ind_table = mlx5_glue_create_rwq_ind_table,
+	.destroy_rwq_ind_table = mlx5_glue_destroy_rwq_ind_table,
+	.create_wq = mlx5_glue_create_wq,
+	.destroy_wq = mlx5_glue_destroy_wq,
+	.modify_wq = mlx5_glue_modify_wq,
+	.create_flow = mlx5_glue_create_flow,
+	.destroy_flow = mlx5_glue_destroy_flow,
+	.destroy_flow_action = mlx5_glue_destroy_flow_action,
+	.create_qp = mlx5_glue_create_qp,
+	.create_qp_ex = mlx5_glue_create_qp_ex,
+	.destroy_qp = mlx5_glue_destroy_qp,
+	.modify_qp = mlx5_glue_modify_qp,
+	.reg_mr = mlx5_glue_reg_mr,
+	.dereg_mr = mlx5_glue_dereg_mr,
+	.create_counter_set = mlx5_glue_create_counter_set,
+	.destroy_counter_set = mlx5_glue_destroy_counter_set,
+	.describe_counter_set = mlx5_glue_describe_counter_set,
+	.query_counter_set = mlx5_glue_query_counter_set,
+	.create_counters = mlx5_glue_create_counters,
+	.destroy_counters = mlx5_glue_destroy_counters,
+	.attach_counters = mlx5_glue_attach_counters,
+	.query_counters = mlx5_glue_query_counters,
+	.ack_async_event = mlx5_glue_ack_async_event,
+	.get_async_event = mlx5_glue_get_async_event,
+	.port_state_str = mlx5_glue_port_state_str,
+	.cq_ex_to_cq = mlx5_glue_cq_ex_to_cq,
+	.dr_create_flow_action_dest_flow_tbl =
+		mlx5_glue_dr_create_flow_action_dest_flow_tbl,
+	.dr_create_flow_action_dest_port =
+		mlx5_glue_dr_create_flow_action_dest_port,
+	.dr_create_flow_action_drop =
+		mlx5_glue_dr_create_flow_action_drop,
+	.dr_create_flow_action_push_vlan =
+		mlx5_glue_dr_create_flow_action_push_vlan,
+	.dr_create_flow_action_pop_vlan =
+		mlx5_glue_dr_create_flow_action_pop_vlan,
+	.dr_create_flow_tbl = mlx5_glue_dr_create_flow_tbl,
+	.dr_destroy_flow_tbl = mlx5_glue_dr_destroy_flow_tbl,
+	.dr_create_domain = mlx5_glue_dr_create_domain,
+	.dr_destroy_domain = mlx5_glue_dr_destroy_domain,
+	.dv_create_cq = mlx5_glue_dv_create_cq,
+	.dv_create_wq = mlx5_glue_dv_create_wq,
+	.dv_query_device = mlx5_glue_dv_query_device,
+	.dv_set_context_attr = mlx5_glue_dv_set_context_attr,
+	.dv_init_obj = mlx5_glue_dv_init_obj,
+	.dv_create_qp = mlx5_glue_dv_create_qp,
+	.dv_create_flow_matcher = mlx5_glue_dv_create_flow_matcher,
+	.dv_create_flow = mlx5_glue_dv_create_flow,
+	.dv_create_flow_action_counter =
+		mlx5_glue_dv_create_flow_action_counter,
+	.dv_create_flow_action_dest_ibv_qp =
+		mlx5_glue_dv_create_flow_action_dest_ibv_qp,
+	.dv_create_flow_action_dest_devx_tir =
+		mlx5_glue_dv_create_flow_action_dest_devx_tir,
+	.dv_create_flow_action_modify_header =
+		mlx5_glue_dv_create_flow_action_modify_header,
+	.dv_create_flow_action_packet_reformat =
+		mlx5_glue_dv_create_flow_action_packet_reformat,
+	.dv_create_flow_action_tag = mlx5_glue_dv_create_flow_action_tag,
+	.dv_create_flow_action_meter = mlx5_glue_dv_create_flow_action_meter,
+	.dv_modify_flow_action_meter = mlx5_glue_dv_modify_flow_action_meter,
+	.dv_destroy_flow = mlx5_glue_dv_destroy_flow,
+	.dv_destroy_flow_matcher = mlx5_glue_dv_destroy_flow_matcher,
+	.dv_open_device = mlx5_glue_dv_open_device,
+	.devx_obj_create = mlx5_glue_devx_obj_create,
+	.devx_obj_destroy = mlx5_glue_devx_obj_destroy,
+	.devx_obj_query = mlx5_glue_devx_obj_query,
+	.devx_obj_modify = mlx5_glue_devx_obj_modify,
+	.devx_general_cmd = mlx5_glue_devx_general_cmd,
+	.devx_create_cmd_comp = mlx5_glue_devx_create_cmd_comp,
+	.devx_destroy_cmd_comp = mlx5_glue_devx_destroy_cmd_comp,
+	.devx_obj_query_async = mlx5_glue_devx_obj_query_async,
+	.devx_get_async_cmd_comp = mlx5_glue_devx_get_async_cmd_comp,
+	.devx_umem_reg = mlx5_glue_devx_umem_reg,
+	.devx_umem_dereg = mlx5_glue_devx_umem_dereg,
+	.devx_qp_query = mlx5_glue_devx_qp_query,
+	.devx_port_query = mlx5_glue_devx_port_query,
+	.dr_dump_domain = mlx5_glue_dr_dump_domain,
+};
diff --git a/drivers/common/mlx5/mlx5_glue.h b/drivers/common/mlx5/mlx5_glue.h
new file mode 100644
index 0000000..f4c3180
--- /dev/null
+++ b/drivers/common/mlx5/mlx5_glue.h
@@ -0,0 +1,265 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2018 6WIND S.A.
+ * Copyright 2018 Mellanox Technologies, Ltd
+ */
+
+#ifndef MLX5_GLUE_H_
+#define MLX5_GLUE_H_
+
+#include <stddef.h>
+#include <stdint.h>
+/* Verbs headers do not support -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-Wpedantic"
+#endif
+#include <infiniband/mlx5dv.h>
+#include <infiniband/verbs.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-Wpedantic"
+#endif
+
+#include <rte_byteorder.h>
+
+#include "mlx5_autoconf.h"
+
+#ifndef MLX5_GLUE_VERSION
+#define MLX5_GLUE_VERSION ""
+#endif
+
+#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V42
+struct ibv_counter_set;
+struct ibv_counter_set_data;
+struct ibv_counter_set_description;
+struct ibv_counter_set_init_attr;
+struct ibv_query_counter_set_attr;
+#endif
+
+#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V45
+struct ibv_counters;
+struct ibv_counters_init_attr;
+struct ibv_counter_attach_attr;
+#endif
+
+#ifndef HAVE_IBV_DEVICE_TUNNEL_SUPPORT
+struct mlx5dv_qp_init_attr;
+#endif
+
+#ifndef HAVE_IBV_DEVICE_STRIDING_RQ_SUPPORT
+struct mlx5dv_wq_init_attr;
+#endif
+
+#ifndef HAVE_IBV_FLOW_DV_SUPPORT
+struct mlx5dv_flow_matcher;
+struct mlx5dv_flow_matcher_attr;
+struct mlx5dv_flow_action_attr;
+struct mlx5dv_flow_match_parameters;
+struct mlx5dv_dr_flow_meter_attr;
+struct ibv_flow_action;
+enum mlx5dv_flow_action_packet_reformat_type { packet_reformat_type = 0, };
+enum mlx5dv_flow_table_type { flow_table_type = 0, };
+#endif
+
+#ifndef HAVE_IBV_FLOW_DEVX_COUNTERS
+#define MLX5DV_FLOW_ACTION_COUNTERS_DEVX 0
+#endif
+
+#ifndef HAVE_IBV_DEVX_OBJ
+struct mlx5dv_devx_obj;
+struct mlx5dv_devx_umem { uint32_t umem_id; };
+#endif
+
+#ifndef HAVE_IBV_DEVX_ASYNC
+struct mlx5dv_devx_cmd_comp;
+struct mlx5dv_devx_async_cmd_hdr;
+#endif
+
+#ifndef HAVE_MLX5DV_DR
+enum mlx5dv_dr_domain_type { unused, };
+struct mlx5dv_dr_domain;
+#endif
+
+#ifndef HAVE_MLX5DV_DR_DEVX_PORT
+struct mlx5dv_devx_port;
+#endif
+
+#ifndef HAVE_MLX5_DR_CREATE_ACTION_FLOW_METER
+struct mlx5dv_dr_flow_meter_attr;
+#endif
+
+/* LIB_GLUE_VERSION must be updated every time this structure is modified. */
+struct mlx5_glue {
+	const char *version;
+	int (*fork_init)(void);
+	struct ibv_pd *(*alloc_pd)(struct ibv_context *context);
+	int (*dealloc_pd)(struct ibv_pd *pd);
+	struct ibv_device **(*get_device_list)(int *num_devices);
+	void (*free_device_list)(struct ibv_device **list);
+	struct ibv_context *(*open_device)(struct ibv_device *device);
+	int (*close_device)(struct ibv_context *context);
+	int (*query_device)(struct ibv_context *context,
+			    struct ibv_device_attr *device_attr);
+	int (*query_device_ex)(struct ibv_context *context,
+			       const struct ibv_query_device_ex_input *input,
+			       struct ibv_device_attr_ex *attr);
+	int (*query_rt_values_ex)(struct ibv_context *context,
+			       struct ibv_values_ex *values);
+	int (*query_port)(struct ibv_context *context, uint8_t port_num,
+			  struct ibv_port_attr *port_attr);
+	struct ibv_comp_channel *(*create_comp_channel)
+		(struct ibv_context *context);
+	int (*destroy_comp_channel)(struct ibv_comp_channel *channel);
+	struct ibv_cq *(*create_cq)(struct ibv_context *context, int cqe,
+				    void *cq_context,
+				    struct ibv_comp_channel *channel,
+				    int comp_vector);
+	int (*destroy_cq)(struct ibv_cq *cq);
+	int (*get_cq_event)(struct ibv_comp_channel *channel,
+			    struct ibv_cq **cq, void **cq_context);
+	void (*ack_cq_events)(struct ibv_cq *cq, unsigned int nevents);
+	struct ibv_rwq_ind_table *(*create_rwq_ind_table)
+		(struct ibv_context *context,
+		 struct ibv_rwq_ind_table_init_attr *init_attr);
+	int (*destroy_rwq_ind_table)(struct ibv_rwq_ind_table *rwq_ind_table);
+	struct ibv_wq *(*create_wq)(struct ibv_context *context,
+				    struct ibv_wq_init_attr *wq_init_attr);
+	int (*destroy_wq)(struct ibv_wq *wq);
+	int (*modify_wq)(struct ibv_wq *wq, struct ibv_wq_attr *wq_attr);
+	struct ibv_flow *(*create_flow)(struct ibv_qp *qp,
+					struct ibv_flow_attr *flow);
+	int (*destroy_flow)(struct ibv_flow *flow_id);
+	int (*destroy_flow_action)(void *action);
+	struct ibv_qp *(*create_qp)(struct ibv_pd *pd,
+				    struct ibv_qp_init_attr *qp_init_attr);
+	struct ibv_qp *(*create_qp_ex)
+		(struct ibv_context *context,
+		 struct ibv_qp_init_attr_ex *qp_init_attr_ex);
+	int (*destroy_qp)(struct ibv_qp *qp);
+	int (*modify_qp)(struct ibv_qp *qp, struct ibv_qp_attr *attr,
+			 int attr_mask);
+	struct ibv_mr *(*reg_mr)(struct ibv_pd *pd, void *addr,
+				 size_t length, int access);
+	int (*dereg_mr)(struct ibv_mr *mr);
+	struct ibv_counter_set *(*create_counter_set)
+		(struct ibv_context *context,
+		 struct ibv_counter_set_init_attr *init_attr);
+	int (*destroy_counter_set)(struct ibv_counter_set *cs);
+	int (*describe_counter_set)
+		(struct ibv_context *context,
+		 uint16_t counter_set_id,
+		 struct ibv_counter_set_description *cs_desc);
+	int (*query_counter_set)(struct ibv_query_counter_set_attr *query_attr,
+				 struct ibv_counter_set_data *cs_data);
+	struct ibv_counters *(*create_counters)
+		(struct ibv_context *context,
+		 struct ibv_counters_init_attr *init_attr);
+	int (*destroy_counters)(struct ibv_counters *counters);
+	int (*attach_counters)(struct ibv_counters *counters,
+			       struct ibv_counter_attach_attr *attr,
+			       struct ibv_flow *flow);
+	int (*query_counters)(struct ibv_counters *counters,
+			      uint64_t *counters_value,
+			      uint32_t ncounters,
+			      uint32_t flags);
+	void (*ack_async_event)(struct ibv_async_event *event);
+	int (*get_async_event)(struct ibv_context *context,
+			       struct ibv_async_event *event);
+	const char *(*port_state_str)(enum ibv_port_state port_state);
+	struct ibv_cq *(*cq_ex_to_cq)(struct ibv_cq_ex *cq);
+	void *(*dr_create_flow_action_dest_flow_tbl)(void *tbl);
+	void *(*dr_create_flow_action_dest_port)(void *domain,
+						 uint32_t port);
+	void *(*dr_create_flow_action_drop)(void);
+	void *(*dr_create_flow_action_push_vlan)
+					(struct mlx5dv_dr_domain *domain,
+					 rte_be32_t vlan_tag);
+	void *(*dr_create_flow_action_pop_vlan)(void);
+	void *(*dr_create_flow_tbl)(void *domain, uint32_t level);
+	int (*dr_destroy_flow_tbl)(void *tbl);
+	void *(*dr_create_domain)(struct ibv_context *ctx,
+				  enum mlx5dv_dr_domain_type domain);
+	int (*dr_destroy_domain)(void *domain);
+	struct ibv_cq_ex *(*dv_create_cq)
+		(struct ibv_context *context,
+		 struct ibv_cq_init_attr_ex *cq_attr,
+		 struct mlx5dv_cq_init_attr *mlx5_cq_attr);
+	struct ibv_wq *(*dv_create_wq)
+		(struct ibv_context *context,
+		 struct ibv_wq_init_attr *wq_attr,
+		 struct mlx5dv_wq_init_attr *mlx5_wq_attr);
+	int (*dv_query_device)(struct ibv_context *ctx_in,
+			       struct mlx5dv_context *attrs_out);
+	int (*dv_set_context_attr)(struct ibv_context *ibv_ctx,
+				   enum mlx5dv_set_ctx_attr_type type,
+				   void *attr);
+	int (*dv_init_obj)(struct mlx5dv_obj *obj, uint64_t obj_type);
+	struct ibv_qp *(*dv_create_qp)
+		(struct ibv_context *context,
+		 struct ibv_qp_init_attr_ex *qp_init_attr_ex,
+		 struct mlx5dv_qp_init_attr *dv_qp_init_attr);
+	void *(*dv_create_flow_matcher)
+		(struct ibv_context *context,
+		 struct mlx5dv_flow_matcher_attr *matcher_attr,
+		 void *tbl);
+	void *(*dv_create_flow)(void *matcher, void *match_value,
+			  size_t num_actions, void *actions[]);
+	void *(*dv_create_flow_action_counter)(void *obj, uint32_t  offset);
+	void *(*dv_create_flow_action_dest_ibv_qp)(void *qp);
+	void *(*dv_create_flow_action_dest_devx_tir)(void *tir);
+	void *(*dv_create_flow_action_modify_header)
+		(struct ibv_context *ctx, enum mlx5dv_flow_table_type ft_type,
+		 void *domain, uint64_t flags, size_t actions_sz,
+		 uint64_t actions[]);
+	void *(*dv_create_flow_action_packet_reformat)
+		(struct ibv_context *ctx,
+		 enum mlx5dv_flow_action_packet_reformat_type reformat_type,
+		 enum mlx5dv_flow_table_type ft_type,
+		 struct mlx5dv_dr_domain *domain,
+		 uint32_t flags, size_t data_sz, void *data);
+	void *(*dv_create_flow_action_tag)(uint32_t tag);
+	void *(*dv_create_flow_action_meter)
+		(struct mlx5dv_dr_flow_meter_attr *attr);
+	int (*dv_modify_flow_action_meter)(void *action,
+		struct mlx5dv_dr_flow_meter_attr *attr, uint64_t modify_bits);
+	int (*dv_destroy_flow)(void *flow);
+	int (*dv_destroy_flow_matcher)(void *matcher);
+	struct ibv_context *(*dv_open_device)(struct ibv_device *device);
+	struct mlx5dv_devx_obj *(*devx_obj_create)
+					(struct ibv_context *ctx,
+					 const void *in, size_t inlen,
+					 void *out, size_t outlen);
+	int (*devx_obj_destroy)(struct mlx5dv_devx_obj *obj);
+	int (*devx_obj_query)(struct mlx5dv_devx_obj *obj,
+			      const void *in, size_t inlen,
+			      void *out, size_t outlen);
+	int (*devx_obj_modify)(struct mlx5dv_devx_obj *obj,
+			       const void *in, size_t inlen,
+			       void *out, size_t outlen);
+	int (*devx_general_cmd)(struct ibv_context *context,
+				const void *in, size_t inlen,
+				void *out, size_t outlen);
+	struct mlx5dv_devx_cmd_comp *(*devx_create_cmd_comp)
+					(struct ibv_context *context);
+	void (*devx_destroy_cmd_comp)(struct mlx5dv_devx_cmd_comp *cmd_comp);
+	int (*devx_obj_query_async)(struct mlx5dv_devx_obj *obj,
+				    const void *in, size_t inlen,
+				    size_t outlen, uint64_t wr_id,
+				    struct mlx5dv_devx_cmd_comp *cmd_comp);
+	int (*devx_get_async_cmd_comp)(struct mlx5dv_devx_cmd_comp *cmd_comp,
+				       struct mlx5dv_devx_async_cmd_hdr *resp,
+				       size_t cmd_resp_len);
+	struct mlx5dv_devx_umem *(*devx_umem_reg)(struct ibv_context *context,
+						  void *addr, size_t size,
+						  uint32_t access);
+	int (*devx_umem_dereg)(struct mlx5dv_devx_umem *dv_devx_umem);
+	int (*devx_qp_query)(struct ibv_qp *qp,
+			     const void *in, size_t inlen,
+			     void *out, size_t outlen);
+	int (*devx_port_query)(struct ibv_context *ctx,
+			       uint32_t port_num,
+			       struct mlx5dv_devx_port *mlx5_devx_port);
+	int (*dr_dump_domain)(FILE *file, void *domain);
+};
+
+const struct mlx5_glue *mlx5_glue;
+
+#endif /* MLX5_GLUE_H_ */
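
Note on the header above: struct mlx5_glue is a table of function pointers, so callers reach the Verbs/DV entry points only through the mlx5_glue pointer and never link against them directly. The sketch below shows that indirection in isolation. It is not the driver's code: every demo_* name, the integer "handle" and the version string "demo.1" are hypothetical stand-ins chosen for illustration, and the backend here is plain malloc/free rather than rdma-core.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical resource type standing in for struct ibv_pd. */
struct demo_pd {
	int handle;
};

/* Hypothetical glue table, modelled loosely on struct mlx5_glue. */
struct demo_glue {
	const char *version;
	struct demo_pd *(*alloc_pd)(int handle);
	int (*dealloc_pd)(struct demo_pd *pd);
};

/* Backend implementation; in a real driver this would wrap a library call. */
static struct demo_pd *
demo_alloc_pd(int handle)
{
	struct demo_pd *pd = malloc(sizeof(*pd));

	if (pd != NULL)
		pd->handle = handle;
	return pd;
}

static int
demo_dealloc_pd(struct demo_pd *pd)
{
	free(pd);
	return 0;
}

static const struct demo_glue demo_glue_impl = {
	.version = "demo.1",
	.alloc_pd = demo_alloc_pd,
	.dealloc_pd = demo_dealloc_pd,
};

/* Callers only dereference this pointer, never the backend functions. */
static const struct demo_glue *demo_glue = &demo_glue_impl;

/*
 * Check the table version, then allocate and release one object
 * through the table. Returns 0 on success and -1 on any failure.
 */
int
demo_glue_roundtrip(const char *expected_version)
{
	struct demo_pd *pd;
	int ret;

	assert(demo_glue != NULL);
	if (strcmp(demo_glue->version, expected_version) != 0)
		return -1;
	pd = demo_glue->alloc_pd(42);
	if (pd == NULL)
		return -1;
	ret = (pd->handle == 42) ? 0 : -1;
	if (demo_glue->dealloc_pd(pd) != 0)
		ret = -1;
	return ret;
}
```

The version comparison mirrors the reason the real table carries a version string: a caller can refuse a table whose layout it was not built against before calling through any pointer.
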
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
new file mode 100644
index 0000000..5730ad1
--- /dev/null
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -0,0 +1,1889 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2016 6WIND S.A.
+ * Copyright 2016 Mellanox Technologies, Ltd
+ */
+
+#ifndef RTE_PMD_MLX5_PRM_H_
+#define RTE_PMD_MLX5_PRM_H_
+
+#include <assert.h>
+/* Verbs header. */
+/* ISO C doesn't support unnamed structs/unions, disabling -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-Wpedantic"
+#endif
+#include <infiniband/mlx5dv.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-Wpedantic"
+#endif
+
+#include <rte_vect.h>
+#include <rte_byteorder.h>
+
+#include "mlx5_autoconf.h"
+
+/* RSS hash key size. */
+#define MLX5_RSS_HASH_KEY_LEN 40
+
+/* Get CQE owner bit. */
+#define MLX5_CQE_OWNER(op_own) ((op_own) & MLX5_CQE_OWNER_MASK)
+
+/* Get CQE format. */
+#define MLX5_CQE_FORMAT(op_own) (((op_own) & MLX5E_CQE_FORMAT_MASK) >> 2)
+
+/* Get CQE opcode. */
+#define MLX5_CQE_OPCODE(op_own) (((op_own) & 0xf0) >> 4)
+
+/* Get CQE solicited event. */
+#define MLX5_CQE_SE(op_own) (((op_own) >> 1) & 1)
+
+/* Invalidate a CQE. */
+#define MLX5_CQE_INVALIDATE (MLX5_CQE_INVALID << 4)
+
+/* WQE Segment sizes in bytes. */
+#define MLX5_WSEG_SIZE 16u
+#define MLX5_WQE_CSEG_SIZE sizeof(struct mlx5_wqe_cseg)
+#define MLX5_WQE_DSEG_SIZE sizeof(struct mlx5_wqe_dseg)
+#define MLX5_WQE_ESEG_SIZE sizeof(struct mlx5_wqe_eseg)
+
+/* WQE/WQEBB size in bytes. */
+#define MLX5_WQE_SIZE sizeof(struct mlx5_wqe)
+
+/*
+ * Max size of a WQE session.
+ * Absolute maximum size is 63 (MLX5_DSEG_MAX) segments,
+ * the WQE size field in Control Segment is 6 bits wide.
+ */
+#define MLX5_WQE_SIZE_MAX (60 * MLX5_WSEG_SIZE)
+
+/*
+ * Default minimum number of Tx queues for inlining packets.
+ * If there are fewer queues than specified, we assume there are
+ * not enough CPU resources (cycles) to perform inlining and
+ * the PCIe throughput is not expected to be the bottleneck,
+ * so inlining is disabled.
+ */
+#define MLX5_INLINE_MAX_TXQS 8u
+#define MLX5_INLINE_MAX_TXQS_BLUEFIELD 16u
+
+/*
+ * Default packet length threshold to be inlined with
+ * enhanced MPW. If packet length exceeds the threshold
+ * the data are not inlined. Should be aligned to the WQEBB
+ * boundary, accounting for the title Control and Ethernet
+ * segments.
+ */
+#define MLX5_EMPW_DEF_INLINE_LEN (4u * MLX5_WQE_SIZE + \
+				  MLX5_DSEG_MIN_INLINE_SIZE)
+/*
+ * Maximal inline data length sent with enhanced MPW.
+ * Is based on maximal WQE size.
+ */
+#define MLX5_EMPW_MAX_INLINE_LEN (MLX5_WQE_SIZE_MAX - \
+				  MLX5_WQE_CSEG_SIZE - \
+				  MLX5_WQE_ESEG_SIZE - \
+				  MLX5_WQE_DSEG_SIZE + \
+				  MLX5_DSEG_MIN_INLINE_SIZE)
+/*
+ * Minimal amount of packets to be sent with EMPW.
+ * This limits the minimal required size of sent EMPW.
+ * If there are not enough resources to build a minimal
+ * EMPW, the sending loop exits.
+ */
+#define MLX5_EMPW_MIN_PACKETS (2u + 3u * 4u)
+/*
+ * Maximal amount of packets to be sent with EMPW.
+ * This value should not exceed MLX5_TX_COMP_THRESH,
+ * otherwise there might be up to MLX5_EMPW_MAX_PACKETS mbufs
+ * without a CQE generation request. Multiplied by
+ * MLX5_TX_COMP_MAX_CQE, this may cause significant latency
+ * in the Tx burst routine when multiple mbufs are freed.
+ */
+#define MLX5_EMPW_MAX_PACKETS MLX5_TX_COMP_THRESH
+#define MLX5_MPW_MAX_PACKETS 6
+#define MLX5_MPW_INLINE_MAX_PACKETS 2
+
+/*
+ * Default packet length threshold to be inlined with
+ * ordinary SEND. Inlining saves the MR key search
+ * and an extra PCIe data fetch transaction, but costs
+ * CPU cycles.
+ */
+#define MLX5_SEND_DEF_INLINE_LEN (5U * MLX5_WQE_SIZE + \
+				  MLX5_ESEG_MIN_INLINE_SIZE - \
+				  MLX5_WQE_CSEG_SIZE - \
+				  MLX5_WQE_ESEG_SIZE - \
+				  MLX5_WQE_DSEG_SIZE)
+/*
+ * Maximal inline data length sent with ordinary SEND.
+ * Is based on maximal WQE size.
+ */
+#define MLX5_SEND_MAX_INLINE_LEN (MLX5_WQE_SIZE_MAX - \
+				  MLX5_WQE_CSEG_SIZE - \
+				  MLX5_WQE_ESEG_SIZE - \
+				  MLX5_WQE_DSEG_SIZE + \
+				  MLX5_ESEG_MIN_INLINE_SIZE)
+
+/* Missing from mlx5dv.h, so it is defined here. */
+#define MLX5_OPCODE_ENHANCED_MPSW 0x29u
+
+/* CQE value to inform that VLAN is stripped. */
+#define MLX5_CQE_VLAN_STRIPPED (1u << 0)
+
+/* IPv4 options. */
+#define MLX5_CQE_RX_IP_EXT_OPTS_PACKET (1u << 1)
+
+/* IPv6 packet. */
+#define MLX5_CQE_RX_IPV6_PACKET (1u << 2)
+
+/* IPv4 packet. */
+#define MLX5_CQE_RX_IPV4_PACKET (1u << 3)
+
+/* TCP packet. */
+#define MLX5_CQE_RX_TCP_PACKET (1u << 4)
+
+/* UDP packet. */
+#define MLX5_CQE_RX_UDP_PACKET (1u << 5)
+
+/* IP is fragmented. */
+#define MLX5_CQE_RX_IP_FRAG_PACKET (1u << 7)
+
+/* L2 header is valid. */
+#define MLX5_CQE_RX_L2_HDR_VALID (1u << 8)
+
+/* L3 header is valid. */
+#define MLX5_CQE_RX_L3_HDR_VALID (1u << 9)
+
+/* L4 header is valid. */
+#define MLX5_CQE_RX_L4_HDR_VALID (1u << 10)
+
+/* Outer packet, 0 IPv4, 1 IPv6. */
+#define MLX5_CQE_RX_OUTER_PACKET (1u << 1)
+
+/* Tunnel packet bit in the CQE. */
+#define MLX5_CQE_RX_TUNNEL_PACKET (1u << 0)
+
+/* Mask for LRO push flag in the CQE lro_tcppsh_abort_dupack field. */
+#define MLX5_CQE_LRO_PUSH_MASK 0x40
+
+/* Mask for L4 type in the CQE hdr_type_etc field. */
+#define MLX5_CQE_L4_TYPE_MASK 0x70
+
+/* The bit index of L4 type in CQE hdr_type_etc field. */
+#define MLX5_CQE_L4_TYPE_SHIFT 0x4
+
+/* L4 type to indicate TCP packet without acknowledgment. */
+#define MLX5_L4_HDR_TYPE_TCP_EMPTY_ACK 0x3
+
+/* L4 type to indicate TCP packet with acknowledgment. */
+#define MLX5_L4_HDR_TYPE_TCP_WITH_ACL 0x4
+
+/* Inner L3 checksum offload (Tunneled packets only). */
+#define MLX5_ETH_WQE_L3_INNER_CSUM (1u << 4)
+
+/* Inner L4 checksum offload (Tunneled packets only). */
+#define MLX5_ETH_WQE_L4_INNER_CSUM (1u << 5)
+
+/* Outer L4 type is TCP. */
+#define MLX5_ETH_WQE_L4_OUTER_TCP  (0u << 5)
+
+/* Outer L4 type is UDP. */
+#define MLX5_ETH_WQE_L4_OUTER_UDP  (1u << 5)
+
+/* Outer L3 type is IPV4. */
+#define MLX5_ETH_WQE_L3_OUTER_IPV4 (0u << 4)
+
+/* Outer L3 type is IPV6. */
+#define MLX5_ETH_WQE_L3_OUTER_IPV6 (1u << 4)
+
+/* Inner L4 type is TCP. */
+#define MLX5_ETH_WQE_L4_INNER_TCP (0u << 1)
+
+/* Inner L4 type is UDP. */
+#define MLX5_ETH_WQE_L4_INNER_UDP (1u << 1)
+
+/* Inner L3 type is IPV4. */
+#define MLX5_ETH_WQE_L3_INNER_IPV4 (0u << 0)
+
+/* Inner L3 type is IPV6. */
+#define MLX5_ETH_WQE_L3_INNER_IPV6 (1u << 0)
+
+/* VLAN insertion flag. */
+#define MLX5_ETH_WQE_VLAN_INSERT (1u << 31)
+
+/* Data inline segment flag. */
+#define MLX5_ETH_WQE_DATA_INLINE (1u << 31)
+
+/* Is flow mark valid. */
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+#define MLX5_FLOW_MARK_IS_VALID(val) ((val) & 0xffffff00)
+#else
+#define MLX5_FLOW_MARK_IS_VALID(val) ((val) & 0xffffff)
+#endif
+
+/* INVALID is used by packets matching no flow rules. */
+#define MLX5_FLOW_MARK_INVALID 0
+
+/* Maximum allowed value to mark a packet. */
+#define MLX5_FLOW_MARK_MAX 0xfffff0
+
+/* Default mark value used when none is provided. */
+#define MLX5_FLOW_MARK_DEFAULT 0xffffff
+
+/* Default mark mask for metadata legacy mode. */
+#define MLX5_FLOW_MARK_MASK 0xffffff
+
+/* Maximum number of DS in WQE. Limited by 6-bit field. */
+#define MLX5_DSEG_MAX 63
+
+/* The completion mode offset in the WQE control segment line 2. */
+#define MLX5_COMP_MODE_OFFSET 2
+
+/* Amount of data bytes in minimal inline data segment. */
+#define MLX5_DSEG_MIN_INLINE_SIZE 12u
+
+/* Amount of data bytes in minimal inline eth segment. */
+#define MLX5_ESEG_MIN_INLINE_SIZE 18u
+
+/* Amount of data bytes after eth data segment. */
+#define MLX5_ESEG_EXTRA_DATA_SIZE 32u
+
+/* The maximum log value of segments per RQ WQE. */
+#define MLX5_MAX_LOG_RQ_SEGS 5u
+
+/* The alignment needed for WQ buffer. */
+#define MLX5_WQE_BUF_ALIGNMENT 512
+
+/* Completion mode. */
+enum mlx5_completion_mode {
+	MLX5_COMP_ONLY_ERR = 0x0,
+	MLX5_COMP_ONLY_FIRST_ERR = 0x1,
+	MLX5_COMP_ALWAYS = 0x2,
+	MLX5_COMP_CQE_AND_EQE = 0x3,
+};
+
+/* MPW mode. */
+enum mlx5_mpw_mode {
+	MLX5_MPW_DISABLED,
+	MLX5_MPW,
+	MLX5_MPW_ENHANCED, /* Enhanced Multi-Packet Send WQE, a.k.a MPWv2. */
+};
+
+/* WQE Control segment. */
+struct mlx5_wqe_cseg {
+	uint32_t opcode;
+	uint32_t sq_ds;
+	uint32_t flags;
+	uint32_t misc;
+} __rte_packed __rte_aligned(MLX5_WSEG_SIZE);
+
+/* Header of data segment. Minimal size Data Segment. */
+struct mlx5_wqe_dseg {
+	uint32_t bcount;
+	union {
+		uint8_t inline_data[MLX5_DSEG_MIN_INLINE_SIZE];
+		struct {
+			uint32_t lkey;
+			uint64_t pbuf;
+		} __rte_packed;
+	};
+} __rte_packed;
+
+/* Subset of struct WQE Ethernet Segment. */
+struct mlx5_wqe_eseg {
+	union {
+		struct {
+			uint32_t swp_offs;
+			uint8_t	cs_flags;
+			uint8_t	swp_flags;
+			uint16_t mss;
+			uint32_t metadata;
+			uint16_t inline_hdr_sz;
+			union {
+				uint16_t inline_data;
+				uint16_t vlan_tag;
+			};
+		} __rte_packed;
+		struct {
+			uint32_t offsets;
+			uint32_t flags;
+			uint32_t flow_metadata;
+			uint32_t inline_hdr;
+		} __rte_packed;
+	};
+} __rte_packed;
+
+/* The title WQEBB, header of WQE. */
+struct mlx5_wqe {
+	union {
+		struct mlx5_wqe_cseg cseg;
+		uint32_t ctrl[4];
+	};
+	struct mlx5_wqe_eseg eseg;
+	union {
+		struct mlx5_wqe_dseg dseg[2];
+		uint8_t data[MLX5_ESEG_EXTRA_DATA_SIZE];
+	};
+} __rte_packed;
+
+/* WQE for Multi-Packet RQ. */
+struct mlx5_wqe_mprq {
+	struct mlx5_wqe_srq_next_seg next_seg;
+	struct mlx5_wqe_data_seg dseg;
+};
+
+#define MLX5_MPRQ_LEN_MASK 0x000ffff
+#define MLX5_MPRQ_LEN_SHIFT 0
+#define MLX5_MPRQ_STRIDE_NUM_MASK 0x3fff0000
+#define MLX5_MPRQ_STRIDE_NUM_SHIFT 16
+#define MLX5_MPRQ_FILLER_MASK 0x80000000
+#define MLX5_MPRQ_FILLER_SHIFT 31
+
+#define MLX5_MPRQ_STRIDE_SHIFT_BYTE 2
+
+/* CQ element structure. Its size should be equal to the cache line size. */
+struct mlx5_cqe {
+#if (RTE_CACHE_LINE_SIZE == 128)
+	uint8_t padding[64];
+#endif
+	uint8_t pkt_info;
+	uint8_t rsvd0;
+	uint16_t wqe_id;
+	uint8_t lro_tcppsh_abort_dupack;
+	uint8_t lro_min_ttl;
+	uint16_t lro_tcp_win;
+	uint32_t lro_ack_seq_num;
+	uint32_t rx_hash_res;
+	uint8_t rx_hash_type;
+	uint8_t rsvd1[3];
+	uint16_t csum;
+	uint8_t rsvd2[6];
+	uint16_t hdr_type_etc;
+	uint16_t vlan_info;
+	uint8_t lro_num_seg;
+	uint8_t rsvd3[3];
+	uint32_t flow_table_metadata;
+	uint8_t rsvd4[4];
+	uint32_t byte_cnt;
+	uint64_t timestamp;
+	uint32_t sop_drop_qpn;
+	uint16_t wqe_counter;
+	uint8_t rsvd5;
+	uint8_t op_own;
+};
+
+/* Adding direct verbs to data-path. */
+
+/* CQ sequence number mask. */
+#define MLX5_CQ_SQN_MASK 0x3
+
+/* CQ sequence number index. */
+#define MLX5_CQ_SQN_OFFSET 28
+
+/* CQ doorbell index mask. */
+#define MLX5_CI_MASK 0xffffff
+
+/* CQ arm doorbell index in the doorbell record. */
+#define MLX5_CQ_ARM_DB 1
+
+/* CQ doorbell offset. */
+#define MLX5_CQ_DOORBELL 0x20
+
+/* CQE format value. */
+#define MLX5_COMPRESSED 0x3
+
+/* Action type of header modification. */
+enum {
+	MLX5_MODIFICATION_TYPE_SET = 0x1,
+	MLX5_MODIFICATION_TYPE_ADD = 0x2,
+	MLX5_MODIFICATION_TYPE_COPY = 0x3,
+};
+
+/* The field of packet to be modified. */
+enum mlx5_modification_field {
+	MLX5_MODI_OUT_NONE = -1,
+	MLX5_MODI_OUT_SMAC_47_16 = 1,
+	MLX5_MODI_OUT_SMAC_15_0,
+	MLX5_MODI_OUT_ETHERTYPE,
+	MLX5_MODI_OUT_DMAC_47_16,
+	MLX5_MODI_OUT_DMAC_15_0,
+	MLX5_MODI_OUT_IP_DSCP,
+	MLX5_MODI_OUT_TCP_FLAGS,
+	MLX5_MODI_OUT_TCP_SPORT,
+	MLX5_MODI_OUT_TCP_DPORT,
+	MLX5_MODI_OUT_IPV4_TTL,
+	MLX5_MODI_OUT_UDP_SPORT,
+	MLX5_MODI_OUT_UDP_DPORT,
+	MLX5_MODI_OUT_SIPV6_127_96,
+	MLX5_MODI_OUT_SIPV6_95_64,
+	MLX5_MODI_OUT_SIPV6_63_32,
+	MLX5_MODI_OUT_SIPV6_31_0,
+	MLX5_MODI_OUT_DIPV6_127_96,
+	MLX5_MODI_OUT_DIPV6_95_64,
+	MLX5_MODI_OUT_DIPV6_63_32,
+	MLX5_MODI_OUT_DIPV6_31_0,
+	MLX5_MODI_OUT_SIPV4,
+	MLX5_MODI_OUT_DIPV4,
+	MLX5_MODI_OUT_FIRST_VID,
+	MLX5_MODI_IN_SMAC_47_16 = 0x31,
+	MLX5_MODI_IN_SMAC_15_0,
+	MLX5_MODI_IN_ETHERTYPE,
+	MLX5_MODI_IN_DMAC_47_16,
+	MLX5_MODI_IN_DMAC_15_0,
+	MLX5_MODI_IN_IP_DSCP,
+	MLX5_MODI_IN_TCP_FLAGS,
+	MLX5_MODI_IN_TCP_SPORT,
+	MLX5_MODI_IN_TCP_DPORT,
+	MLX5_MODI_IN_IPV4_TTL,
+	MLX5_MODI_IN_UDP_SPORT,
+	MLX5_MODI_IN_UDP_DPORT,
+	MLX5_MODI_IN_SIPV6_127_96,
+	MLX5_MODI_IN_SIPV6_95_64,
+	MLX5_MODI_IN_SIPV6_63_32,
+	MLX5_MODI_IN_SIPV6_31_0,
+	MLX5_MODI_IN_DIPV6_127_96,
+	MLX5_MODI_IN_DIPV6_95_64,
+	MLX5_MODI_IN_DIPV6_63_32,
+	MLX5_MODI_IN_DIPV6_31_0,
+	MLX5_MODI_IN_SIPV4,
+	MLX5_MODI_IN_DIPV4,
+	MLX5_MODI_OUT_IPV6_HOPLIMIT,
+	MLX5_MODI_IN_IPV6_HOPLIMIT,
+	MLX5_MODI_META_DATA_REG_A,
+	MLX5_MODI_META_DATA_REG_B = 0x50,
+	MLX5_MODI_META_REG_C_0,
+	MLX5_MODI_META_REG_C_1,
+	MLX5_MODI_META_REG_C_2,
+	MLX5_MODI_META_REG_C_3,
+	MLX5_MODI_META_REG_C_4,
+	MLX5_MODI_META_REG_C_5,
+	MLX5_MODI_META_REG_C_6,
+	MLX5_MODI_META_REG_C_7,
+	MLX5_MODI_OUT_TCP_SEQ_NUM,
+	MLX5_MODI_IN_TCP_SEQ_NUM,
+	MLX5_MODI_OUT_TCP_ACK_NUM,
+	MLX5_MODI_IN_TCP_ACK_NUM = 0x5C,
+};
+
+/* Total number of metadata reg_c's. */
+#define MLX5_MREG_C_NUM (MLX5_MODI_META_REG_C_7 - MLX5_MODI_META_REG_C_0 + 1)
+
+enum modify_reg {
+	REG_NONE = 0,
+	REG_A,
+	REG_B,
+	REG_C_0,
+	REG_C_1,
+	REG_C_2,
+	REG_C_3,
+	REG_C_4,
+	REG_C_5,
+	REG_C_6,
+	REG_C_7,
+};
+
+/* Modification sub command. */
+struct mlx5_modification_cmd {
+	union {
+		uint32_t data0;
+		struct {
+			unsigned int length:5;
+			unsigned int rsvd0:3;
+			unsigned int offset:5;
+			unsigned int rsvd1:3;
+			unsigned int field:12;
+			unsigned int action_type:4;
+		};
+	};
+	union {
+		uint32_t data1;
+		uint8_t data[4];
+		struct {
+			unsigned int rsvd2:8;
+			unsigned int dst_offset:5;
+			unsigned int rsvd3:3;
+			unsigned int dst_field:12;
+			unsigned int rsvd4:4;
+		};
+	};
+};
+
+typedef uint32_t u32;
+typedef uint16_t u16;
+typedef uint8_t u8;
+
+#define __mlx5_nullp(typ) ((struct mlx5_ifc_##typ##_bits *)0)
+#define __mlx5_bit_sz(typ, fld) sizeof(__mlx5_nullp(typ)->fld)
+#define __mlx5_bit_off(typ, fld) ((unsigned int)(unsigned long) \
+				  (&(__mlx5_nullp(typ)->fld)))
+#define __mlx5_dw_bit_off(typ, fld) (32 - __mlx5_bit_sz(typ, fld) - \
+				    (__mlx5_bit_off(typ, fld) & 0x1f))
+#define __mlx5_dw_off(typ, fld) (__mlx5_bit_off(typ, fld) / 32)
+#define __mlx5_64_off(typ, fld) (__mlx5_bit_off(typ, fld) / 64)
+#define __mlx5_dw_mask(typ, fld) (__mlx5_mask(typ, fld) << \
+				  __mlx5_dw_bit_off(typ, fld))
+#define __mlx5_mask(typ, fld) ((u32)((1ull << __mlx5_bit_sz(typ, fld)) - 1))
+#define __mlx5_16_off(typ, fld) (__mlx5_bit_off(typ, fld) / 16)
+#define __mlx5_16_bit_off(typ, fld) (16 - __mlx5_bit_sz(typ, fld) - \
+				    (__mlx5_bit_off(typ, fld) & 0xf))
+#define __mlx5_mask16(typ, fld) ((u16)((1ull << __mlx5_bit_sz(typ, fld)) - 1))
+#define MLX5_ST_SZ_BYTES(typ) (sizeof(struct mlx5_ifc_##typ##_bits) / 8)
+#define MLX5_ST_SZ_DW(typ) (sizeof(struct mlx5_ifc_##typ##_bits) / 32)
+#define MLX5_BYTE_OFF(typ, fld) (__mlx5_bit_off(typ, fld) / 8)
+#define MLX5_ADDR_OF(typ, p, fld) ((char *)(p) + MLX5_BYTE_OFF(typ, fld))
+
+/* Insert a value into a struct. */
+#define MLX5_SET(typ, p, fld, v) \
+	do { \
+		u32 _v = v; \
+		*((__be32 *)(p) + __mlx5_dw_off(typ, fld)) = \
+		rte_cpu_to_be_32((rte_be_to_cpu_32(*((u32 *)(p) + \
+				  __mlx5_dw_off(typ, fld))) & \
+				  (~__mlx5_dw_mask(typ, fld))) | \
+				 (((_v) & __mlx5_mask(typ, fld)) << \
+				   __mlx5_dw_bit_off(typ, fld))); \
+	} while (0)
+
+#define MLX5_SET64(typ, p, fld, v) \
+	do { \
+		assert(__mlx5_bit_sz(typ, fld) == 64); \
+		*((__be64 *)(p) + __mlx5_64_off(typ, fld)) = \
+			rte_cpu_to_be_64(v); \
+	} while (0)
+
+#define MLX5_GET(typ, p, fld) \
+	((rte_be_to_cpu_32(*((__be32 *)(p) +\
+	__mlx5_dw_off(typ, fld))) >> __mlx5_dw_bit_off(typ, fld)) & \
+	__mlx5_mask(typ, fld))
+#define MLX5_GET16(typ, p, fld) \
+	((rte_be_to_cpu_16(*((__be16 *)(p) + \
+	  __mlx5_16_off(typ, fld))) >> __mlx5_16_bit_off(typ, fld)) & \
+	 __mlx5_mask16(typ, fld))
+#define MLX5_GET64(typ, p, fld) rte_be_to_cpu_64(*((__be64 *)(p) + \
+						   __mlx5_64_off(typ, fld)))
+#define MLX5_FLD_SZ_BYTES(typ, fld) (__mlx5_bit_sz(typ, fld) / 8)
+
+struct mlx5_ifc_fte_match_set_misc_bits {
+	u8 gre_c_present[0x1];
+	u8 reserved_at_1[0x1];
+	u8 gre_k_present[0x1];
+	u8 gre_s_present[0x1];
+	u8 source_vhci_port[0x4];
+	u8 source_sqn[0x18];
+	u8 reserved_at_20[0x10];
+	u8 source_port[0x10];
+	u8 outer_second_prio[0x3];
+	u8 outer_second_cfi[0x1];
+	u8 outer_second_vid[0xc];
+	u8 inner_second_prio[0x3];
+	u8 inner_second_cfi[0x1];
+	u8 inner_second_vid[0xc];
+	u8 outer_second_cvlan_tag[0x1];
+	u8 inner_second_cvlan_tag[0x1];
+	u8 outer_second_svlan_tag[0x1];
+	u8 inner_second_svlan_tag[0x1];
+	u8 reserved_at_64[0xc];
+	u8 gre_protocol[0x10];
+	u8 gre_key_h[0x18];
+	u8 gre_key_l[0x8];
+	u8 vxlan_vni[0x18];
+	u8 reserved_at_b8[0x8];
+	u8 geneve_vni[0x18];
+	u8 reserved_at_d8[0x7];
+	u8 geneve_oam[0x1];
+	u8 reserved_at_e0[0xc];
+	u8 outer_ipv6_flow_label[0x14];
+	u8 reserved_at_100[0xc];
+	u8 inner_ipv6_flow_label[0x14];
+	u8 reserved_at_120[0xa];
+	u8 geneve_opt_len[0x6];
+	u8 geneve_protocol_type[0x10];
+	u8 reserved_at_140[0xc0];
+};
+
+struct mlx5_ifc_ipv4_layout_bits {
+	u8 reserved_at_0[0x60];
+	u8 ipv4[0x20];
+};
+
+struct mlx5_ifc_ipv6_layout_bits {
+	u8 ipv6[16][0x8];
+};
+
+union mlx5_ifc_ipv6_layout_ipv4_layout_auto_bits {
+	struct mlx5_ifc_ipv6_layout_bits ipv6_layout;
+	struct mlx5_ifc_ipv4_layout_bits ipv4_layout;
+	u8 reserved_at_0[0x80];
+};
+
+struct mlx5_ifc_fte_match_set_lyr_2_4_bits {
+	u8 smac_47_16[0x20];
+	u8 smac_15_0[0x10];
+	u8 ethertype[0x10];
+	u8 dmac_47_16[0x20];
+	u8 dmac_15_0[0x10];
+	u8 first_prio[0x3];
+	u8 first_cfi[0x1];
+	u8 first_vid[0xc];
+	u8 ip_protocol[0x8];
+	u8 ip_dscp[0x6];
+	u8 ip_ecn[0x2];
+	u8 cvlan_tag[0x1];
+	u8 svlan_tag[0x1];
+	u8 frag[0x1];
+	u8 ip_version[0x4];
+	u8 tcp_flags[0x9];
+	u8 tcp_sport[0x10];
+	u8 tcp_dport[0x10];
+	u8 reserved_at_c0[0x20];
+	u8 udp_sport[0x10];
+	u8 udp_dport[0x10];
+	union mlx5_ifc_ipv6_layout_ipv4_layout_auto_bits src_ipv4_src_ipv6;
+	union mlx5_ifc_ipv6_layout_ipv4_layout_auto_bits dst_ipv4_dst_ipv6;
+};
+
+struct mlx5_ifc_fte_match_mpls_bits {
+	u8 mpls_label[0x14];
+	u8 mpls_exp[0x3];
+	u8 mpls_s_bos[0x1];
+	u8 mpls_ttl[0x8];
+};
+
+struct mlx5_ifc_fte_match_set_misc2_bits {
+	struct mlx5_ifc_fte_match_mpls_bits outer_first_mpls;
+	struct mlx5_ifc_fte_match_mpls_bits inner_first_mpls;
+	struct mlx5_ifc_fte_match_mpls_bits outer_first_mpls_over_gre;
+	struct mlx5_ifc_fte_match_mpls_bits outer_first_mpls_over_udp;
+	u8 metadata_reg_c_7[0x20];
+	u8 metadata_reg_c_6[0x20];
+	u8 metadata_reg_c_5[0x20];
+	u8 metadata_reg_c_4[0x20];
+	u8 metadata_reg_c_3[0x20];
+	u8 metadata_reg_c_2[0x20];
+	u8 metadata_reg_c_1[0x20];
+	u8 metadata_reg_c_0[0x20];
+	u8 metadata_reg_a[0x20];
+	u8 metadata_reg_b[0x20];
+	u8 reserved_at_1c0[0x40];
+};
+
+struct mlx5_ifc_fte_match_set_misc3_bits {
+	u8 inner_tcp_seq_num[0x20];
+	u8 outer_tcp_seq_num[0x20];
+	u8 inner_tcp_ack_num[0x20];
+	u8 outer_tcp_ack_num[0x20];
+	u8 reserved_at_auto1[0x8];
+	u8 outer_vxlan_gpe_vni[0x18];
+	u8 outer_vxlan_gpe_next_protocol[0x8];
+	u8 outer_vxlan_gpe_flags[0x8];
+	u8 reserved_at_a8[0x10];
+	u8 icmp_header_data[0x20];
+	u8 icmpv6_header_data[0x20];
+	u8 icmp_type[0x8];
+	u8 icmp_code[0x8];
+	u8 icmpv6_type[0x8];
+	u8 icmpv6_code[0x8];
+	u8 reserved_at_120[0x20];
+	u8 gtpu_teid[0x20];
+	u8 gtpu_msg_type[0x08];
+	u8 gtpu_msg_flags[0x08];
+	u8 reserved_at_170[0x90];
+};
+
+/* Flow matcher. */
+struct mlx5_ifc_fte_match_param_bits {
+	struct mlx5_ifc_fte_match_set_lyr_2_4_bits outer_headers;
+	struct mlx5_ifc_fte_match_set_misc_bits misc_parameters;
+	struct mlx5_ifc_fte_match_set_lyr_2_4_bits inner_headers;
+	struct mlx5_ifc_fte_match_set_misc2_bits misc_parameters_2;
+	struct mlx5_ifc_fte_match_set_misc3_bits misc_parameters_3;
+};
+
+enum {
+	MLX5_MATCH_CRITERIA_ENABLE_OUTER_BIT,
+	MLX5_MATCH_CRITERIA_ENABLE_MISC_BIT,
+	MLX5_MATCH_CRITERIA_ENABLE_INNER_BIT,
+	MLX5_MATCH_CRITERIA_ENABLE_MISC2_BIT,
+	MLX5_MATCH_CRITERIA_ENABLE_MISC3_BIT
+};
+
+enum {
+	MLX5_CMD_OP_QUERY_HCA_CAP = 0x100,
+	MLX5_CMD_OP_CREATE_MKEY = 0x200,
+	MLX5_CMD_OP_QUERY_NIC_VPORT_CONTEXT = 0x754,
+	MLX5_CMD_OP_ALLOC_TRANSPORT_DOMAIN = 0x816,
+	MLX5_CMD_OP_CREATE_TIR = 0x900,
+	MLX5_CMD_OP_CREATE_SQ = 0X904,
+	MLX5_CMD_OP_MODIFY_SQ = 0X905,
+	MLX5_CMD_OP_CREATE_RQ = 0x908,
+	MLX5_CMD_OP_MODIFY_RQ = 0x909,
+	MLX5_CMD_OP_CREATE_TIS = 0x912,
+	MLX5_CMD_OP_QUERY_TIS = 0x915,
+	MLX5_CMD_OP_CREATE_RQT = 0x916,
+	MLX5_CMD_OP_ALLOC_FLOW_COUNTER = 0x939,
+	MLX5_CMD_OP_QUERY_FLOW_COUNTER = 0x93b,
+};
+
+enum {
+	MLX5_MKC_ACCESS_MODE_MTT   = 0x1,
+};
+
+/* Flow counters. */
+struct mlx5_ifc_alloc_flow_counter_out_bits {
+	u8         status[0x8];
+	u8         reserved_at_8[0x18];
+	u8         syndrome[0x20];
+	u8         flow_counter_id[0x20];
+	u8         reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_alloc_flow_counter_in_bits {
+	u8         opcode[0x10];
+	u8         reserved_at_10[0x10];
+	u8         reserved_at_20[0x10];
+	u8         op_mod[0x10];
+	u8         flow_counter_id[0x20];
+	u8         reserved_at_40[0x18];
+	u8         flow_counter_bulk[0x8];
+};
+
+struct mlx5_ifc_dealloc_flow_counter_out_bits {
+	u8         status[0x8];
+	u8         reserved_at_8[0x18];
+	u8         syndrome[0x20];
+	u8         reserved_at_40[0x40];
+};
+
+struct mlx5_ifc_dealloc_flow_counter_in_bits {
+	u8         opcode[0x10];
+	u8         reserved_at_10[0x10];
+	u8         reserved_at_20[0x10];
+	u8         op_mod[0x10];
+	u8         flow_counter_id[0x20];
+	u8         reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_traffic_counter_bits {
+	u8         packets[0x40];
+	u8         octets[0x40];
+};
+
+struct mlx5_ifc_query_flow_counter_out_bits {
+	u8         status[0x8];
+	u8         reserved_at_8[0x18];
+	u8         syndrome[0x20];
+	u8         reserved_at_40[0x40];
+	struct mlx5_ifc_traffic_counter_bits flow_statistics[];
+};
+
+struct mlx5_ifc_query_flow_counter_in_bits {
+	u8         opcode[0x10];
+	u8         reserved_at_10[0x10];
+	u8         reserved_at_20[0x10];
+	u8         op_mod[0x10];
+	u8         reserved_at_40[0x20];
+	u8         mkey[0x20];
+	u8         address[0x40];
+	u8         clear[0x1];
+	u8         dump_to_memory[0x1];
+	u8         num_of_counters[0x1e];
+	u8         flow_counter_id[0x20];
+};
+
+struct mlx5_ifc_mkc_bits {
+	u8         reserved_at_0[0x1];
+	u8         free[0x1];
+	u8         reserved_at_2[0x1];
+	u8         access_mode_4_2[0x3];
+	u8         reserved_at_6[0x7];
+	u8         relaxed_ordering_write[0x1];
+	u8         reserved_at_e[0x1];
+	u8         small_fence_on_rdma_read_response[0x1];
+	u8         umr_en[0x1];
+	u8         a[0x1];
+	u8         rw[0x1];
+	u8         rr[0x1];
+	u8         lw[0x1];
+	u8         lr[0x1];
+	u8         access_mode_1_0[0x2];
+	u8         reserved_at_18[0x8];
+
+	u8         qpn[0x18];
+	u8         mkey_7_0[0x8];
+
+	u8         reserved_at_40[0x20];
+
+	u8         length64[0x1];
+	u8         bsf_en[0x1];
+	u8         sync_umr[0x1];
+	u8         reserved_at_63[0x2];
+	u8         expected_sigerr_count[0x1];
+	u8         reserved_at_66[0x1];
+	u8         en_rinval[0x1];
+	u8         pd[0x18];
+
+	u8         start_addr[0x40];
+
+	u8         len[0x40];
+
+	u8         bsf_octword_size[0x20];
+
+	u8         reserved_at_120[0x80];
+
+	u8         translations_octword_size[0x20];
+
+	u8         reserved_at_1c0[0x1b];
+	u8         log_page_size[0x5];
+
+	u8         reserved_at_1e0[0x20];
+};
+
+struct mlx5_ifc_create_mkey_out_bits {
+	u8         status[0x8];
+	u8         reserved_at_8[0x18];
+
+	u8         syndrome[0x20];
+
+	u8         reserved_at_40[0x8];
+	u8         mkey_index[0x18];
+
+	u8         reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_create_mkey_in_bits {
+	u8         opcode[0x10];
+	u8         reserved_at_10[0x10];
+
+	u8         reserved_at_20[0x10];
+	u8         op_mod[0x10];
+
+	u8         reserved_at_40[0x20];
+
+	u8         pg_access[0x1];
+	u8         reserved_at_61[0x1f];
+
+	struct mlx5_ifc_mkc_bits memory_key_mkey_entry;
+
+	u8         reserved_at_280[0x80];
+
+	u8         translations_octword_actual_size[0x20];
+
+	u8         mkey_umem_id[0x20];
+
+	u8         mkey_umem_offset[0x40];
+
+	u8         reserved_at_380[0x500];
+
+	u8         klm_pas_mtt[][0x20];
+};
+
+enum {
+	MLX5_GET_HCA_CAP_OP_MOD_GENERAL_DEVICE = 0x0 << 1,
+	MLX5_GET_HCA_CAP_OP_MOD_ETHERNET_OFFLOAD_CAPS = 0x1 << 1,
+	MLX5_GET_HCA_CAP_OP_MOD_QOS_CAP = 0xc << 1,
+};
+
+enum {
+	MLX5_HCA_CAP_OPMOD_GET_MAX   = 0,
+	MLX5_HCA_CAP_OPMOD_GET_CUR   = 1,
+};
+
+enum {
+	MLX5_CAP_INLINE_MODE_L2,
+	MLX5_CAP_INLINE_MODE_VPORT_CONTEXT,
+	MLX5_CAP_INLINE_MODE_NOT_REQUIRED,
+};
+
+enum {
+	MLX5_INLINE_MODE_NONE,
+	MLX5_INLINE_MODE_L2,
+	MLX5_INLINE_MODE_IP,
+	MLX5_INLINE_MODE_TCP_UDP,
+	MLX5_INLINE_MODE_RESERVED4,
+	MLX5_INLINE_MODE_INNER_L2,
+	MLX5_INLINE_MODE_INNER_IP,
+	MLX5_INLINE_MODE_INNER_TCP_UDP,
+};
+
+/* HCA bit masks indicating which Flex parser protocols are already enabled. */
+#define MLX5_HCA_FLEX_IPV4_OVER_VXLAN_ENABLED (1UL << 0)
+#define MLX5_HCA_FLEX_IPV6_OVER_VXLAN_ENABLED (1UL << 1)
+#define MLX5_HCA_FLEX_IPV6_OVER_IP_ENABLED (1UL << 2)
+#define MLX5_HCA_FLEX_GENEVE_ENABLED (1UL << 3)
+#define MLX5_HCA_FLEX_CW_MPLS_OVER_GRE_ENABLED (1UL << 4)
+#define MLX5_HCA_FLEX_CW_MPLS_OVER_UDP_ENABLED (1UL << 5)
+#define MLX5_HCA_FLEX_P_BIT_VXLAN_GPE_ENABLED (1UL << 6)
+#define MLX5_HCA_FLEX_VXLAN_GPE_ENABLED (1UL << 7)
+#define MLX5_HCA_FLEX_ICMP_ENABLED (1UL << 8)
+#define MLX5_HCA_FLEX_ICMPV6_ENABLED (1UL << 9)
+
+struct mlx5_ifc_cmd_hca_cap_bits {
+	u8 reserved_at_0[0x30];
+	u8 vhca_id[0x10];
+	u8 reserved_at_40[0x40];
+	u8 log_max_srq_sz[0x8];
+	u8 log_max_qp_sz[0x8];
+	u8 reserved_at_90[0xb];
+	u8 log_max_qp[0x5];
+	u8 reserved_at_a0[0xb];
+	u8 log_max_srq[0x5];
+	u8 reserved_at_b0[0x10];
+	u8 reserved_at_c0[0x8];
+	u8 log_max_cq_sz[0x8];
+	u8 reserved_at_d0[0xb];
+	u8 log_max_cq[0x5];
+	u8 log_max_eq_sz[0x8];
+	u8 reserved_at_e8[0x2];
+	u8 log_max_mkey[0x6];
+	u8 reserved_at_f0[0x8];
+	u8 dump_fill_mkey[0x1];
+	u8 reserved_at_f9[0x3];
+	u8 log_max_eq[0x4];
+	u8 max_indirection[0x8];
+	u8 fixed_buffer_size[0x1];
+	u8 log_max_mrw_sz[0x7];
+	u8 force_teardown[0x1];
+	u8 reserved_at_111[0x1];
+	u8 log_max_bsf_list_size[0x6];
+	u8 umr_extended_translation_offset[0x1];
+	u8 null_mkey[0x1];
+	u8 log_max_klm_list_size[0x6];
+	u8 reserved_at_120[0xa];
+	u8 log_max_ra_req_dc[0x6];
+	u8 reserved_at_130[0xa];
+	u8 log_max_ra_res_dc[0x6];
+	u8 reserved_at_140[0xa];
+	u8 log_max_ra_req_qp[0x6];
+	u8 reserved_at_150[0xa];
+	u8 log_max_ra_res_qp[0x6];
+	u8 end_pad[0x1];
+	u8 cc_query_allowed[0x1];
+	u8 cc_modify_allowed[0x1];
+	u8 start_pad[0x1];
+	u8 cache_line_128byte[0x1];
+	u8 reserved_at_165[0xa];
+	u8 qcam_reg[0x1];
+	u8 gid_table_size[0x10];
+	u8 out_of_seq_cnt[0x1];
+	u8 vport_counters[0x1];
+	u8 retransmission_q_counters[0x1];
+	u8 debug[0x1];
+	u8 modify_rq_counter_set_id[0x1];
+	u8 rq_delay_drop[0x1];
+	u8 max_qp_cnt[0xa];
+	u8 pkey_table_size[0x10];
+	u8 vport_group_manager[0x1];
+	u8 vhca_group_manager[0x1];
+	u8 ib_virt[0x1];
+	u8 eth_virt[0x1];
+	u8 vnic_env_queue_counters[0x1];
+	u8 ets[0x1];
+	u8 nic_flow_table[0x1];
+	u8 eswitch_manager[0x1];
+	u8 device_memory[0x1];
+	u8 mcam_reg[0x1];
+	u8 pcam_reg[0x1];
+	u8 local_ca_ack_delay[0x5];
+	u8 port_module_event[0x1];
+	u8 enhanced_error_q_counters[0x1];
+	u8 ports_check[0x1];
+	u8 reserved_at_1b3[0x1];
+	u8 disable_link_up[0x1];
+	u8 beacon_led[0x1];
+	u8 port_type[0x2];
+	u8 num_ports[0x8];
+	u8 reserved_at_1c0[0x1];
+	u8 pps[0x1];
+	u8 pps_modify[0x1];
+	u8 log_max_msg[0x5];
+	u8 reserved_at_1c8[0x4];
+	u8 max_tc[0x4];
+	u8 temp_warn_event[0x1];
+	u8 dcbx[0x1];
+	u8 general_notification_event[0x1];
+	u8 reserved_at_1d3[0x2];
+	u8 fpga[0x1];
+	u8 rol_s[0x1];
+	u8 rol_g[0x1];
+	u8 reserved_at_1d8[0x1];
+	u8 wol_s[0x1];
+	u8 wol_g[0x1];
+	u8 wol_a[0x1];
+	u8 wol_b[0x1];
+	u8 wol_m[0x1];
+	u8 wol_u[0x1];
+	u8 wol_p[0x1];
+	u8 stat_rate_support[0x10];
+	u8 reserved_at_1f0[0xc];
+	u8 cqe_version[0x4];
+	u8 compact_address_vector[0x1];
+	u8 striding_rq[0x1];
+	u8 reserved_at_202[0x1];
+	u8 ipoib_enhanced_offloads[0x1];
+	u8 ipoib_basic_offloads[0x1];
+	u8 reserved_at_205[0x1];
+	u8 repeated_block_disabled[0x1];
+	u8 umr_modify_entity_size_disabled[0x1];
+	u8 umr_modify_atomic_disabled[0x1];
+	u8 umr_indirect_mkey_disabled[0x1];
+	u8 umr_fence[0x2];
+	u8 reserved_at_20c[0x3];
+	u8 drain_sigerr[0x1];
+	u8 cmdif_checksum[0x2];
+	u8 sigerr_cqe[0x1];
+	u8 reserved_at_213[0x1];
+	u8 wq_signature[0x1];
+	u8 sctr_data_cqe[0x1];
+	u8 reserved_at_216[0x1];
+	u8 sho[0x1];
+	u8 tph[0x1];
+	u8 rf[0x1];
+	u8 dct[0x1];
+	u8 qos[0x1];
+	u8 eth_net_offloads[0x1];
+	u8 roce[0x1];
+	u8 atomic[0x1];
+	u8 reserved_at_21f[0x1];
+	u8 cq_oi[0x1];
+	u8 cq_resize[0x1];
+	u8 cq_moderation[0x1];
+	u8 reserved_at_223[0x3];
+	u8 cq_eq_remap[0x1];
+	u8 pg[0x1];
+	u8 block_lb_mc[0x1];
+	u8 reserved_at_229[0x1];
+	u8 scqe_break_moderation[0x1];
+	u8 cq_period_start_from_cqe[0x1];
+	u8 cd[0x1];
+	u8 reserved_at_22d[0x1];
+	u8 apm[0x1];
+	u8 vector_calc[0x1];
+	u8 umr_ptr_rlky[0x1];
+	u8 imaicl[0x1];
+	u8 reserved_at_232[0x4];
+	u8 qkv[0x1];
+	u8 pkv[0x1];
+	u8 set_deth_sqpn[0x1];
+	u8 reserved_at_239[0x3];
+	u8 xrc[0x1];
+	u8 ud[0x1];
+	u8 uc[0x1];
+	u8 rc[0x1];
+	u8 uar_4k[0x1];
+	u8 reserved_at_241[0x9];
+	u8 uar_sz[0x6];
+	u8 reserved_at_250[0x8];
+	u8 log_pg_sz[0x8];
+	u8 bf[0x1];
+	u8 driver_version[0x1];
+	u8 pad_tx_eth_packet[0x1];
+	u8 reserved_at_263[0x8];
+	u8 log_bf_reg_size[0x5];
+	u8 reserved_at_270[0xb];
+	u8 lag_master[0x1];
+	u8 num_lag_ports[0x4];
+	u8 reserved_at_280[0x10];
+	u8 max_wqe_sz_sq[0x10];
+	u8 reserved_at_2a0[0x10];
+	u8 max_wqe_sz_rq[0x10];
+	u8 max_flow_counter_31_16[0x10];
+	u8 max_wqe_sz_sq_dc[0x10];
+	u8 reserved_at_2e0[0x7];
+	u8 max_qp_mcg[0x19];
+	u8 reserved_at_300[0x10];
+	u8 flow_counter_bulk_alloc[0x08];
+	u8 log_max_mcg[0x8];
+	u8 reserved_at_320[0x3];
+	u8 log_max_transport_domain[0x5];
+	u8 reserved_at_328[0x3];
+	u8 log_max_pd[0x5];
+	u8 reserved_at_330[0xb];
+	u8 log_max_xrcd[0x5];
+	u8 nic_receive_steering_discard[0x1];
+	u8 receive_discard_vport_down[0x1];
+	u8 transmit_discard_vport_down[0x1];
+	u8 reserved_at_343[0x5];
+	u8 log_max_flow_counter_bulk[0x8];
+	u8 max_flow_counter_15_0[0x10];
+	u8 modify_tis[0x1];
+	u8 flow_counters_dump[0x1];
+	u8 reserved_at_360[0x1];
+	u8 log_max_rq[0x5];
+	u8 reserved_at_368[0x3];
+	u8 log_max_sq[0x5];
+	u8 reserved_at_370[0x3];
+	u8 log_max_tir[0x5];
+	u8 reserved_at_378[0x3];
+	u8 log_max_tis[0x5];
+	u8 basic_cyclic_rcv_wqe[0x1];
+	u8 reserved_at_381[0x2];
+	u8 log_max_rmp[0x5];
+	u8 reserved_at_388[0x3];
+	u8 log_max_rqt[0x5];
+	u8 reserved_at_390[0x3];
+	u8 log_max_rqt_size[0x5];
+	u8 reserved_at_398[0x3];
+	u8 log_max_tis_per_sq[0x5];
+	u8 ext_stride_num_range[0x1];
+	u8 reserved_at_3a1[0x2];
+	u8 log_max_stride_sz_rq[0x5];
+	u8 reserved_at_3a8[0x3];
+	u8 log_min_stride_sz_rq[0x5];
+	u8 reserved_at_3b0[0x3];
+	u8 log_max_stride_sz_sq[0x5];
+	u8 reserved_at_3b8[0x3];
+	u8 log_min_stride_sz_sq[0x5];
+	u8 hairpin[0x1];
+	u8 reserved_at_3c1[0x2];
+	u8 log_max_hairpin_queues[0x5];
+	u8 reserved_at_3c8[0x3];
+	u8 log_max_hairpin_wq_data_sz[0x5];
+	u8 reserved_at_3d0[0x3];
+	u8 log_max_hairpin_num_packets[0x5];
+	u8 reserved_at_3d8[0x3];
+	u8 log_max_wq_sz[0x5];
+	u8 nic_vport_change_event[0x1];
+	u8 disable_local_lb_uc[0x1];
+	u8 disable_local_lb_mc[0x1];
+	u8 log_min_hairpin_wq_data_sz[0x5];
+	u8 reserved_at_3e8[0x3];
+	u8 log_max_vlan_list[0x5];
+	u8 reserved_at_3f0[0x3];
+	u8 log_max_current_mc_list[0x5];
+	u8 reserved_at_3f8[0x3];
+	u8 log_max_current_uc_list[0x5];
+	u8 general_obj_types[0x40];
+	u8 reserved_at_440[0x20];
+	u8 reserved_at_460[0x10];
+	u8 max_num_eqs[0x10];
+	u8 reserved_at_480[0x3];
+	u8 log_max_l2_table[0x5];
+	u8 reserved_at_488[0x8];
+	u8 log_uar_page_sz[0x10];
+	u8 reserved_at_4a0[0x20];
+	u8 device_frequency_mhz[0x20];
+	u8 device_frequency_khz[0x20];
+	u8 reserved_at_500[0x20];
+	u8 num_of_uars_per_page[0x20];
+	u8 flex_parser_protocols[0x20];
+	u8 reserved_at_560[0x20];
+	u8 reserved_at_580[0x3c];
+	u8 mini_cqe_resp_stride_index[0x1];
+	u8 cqe_128_always[0x1];
+	u8 cqe_compression_128[0x1];
+	u8 cqe_compression[0x1];
+	u8 cqe_compression_timeout[0x10];
+	u8 cqe_compression_max_num[0x10];
+	u8 reserved_at_5e0[0x10];
+	u8 tag_matching[0x1];
+	u8 rndv_offload_rc[0x1];
+	u8 rndv_offload_dc[0x1];
+	u8 log_tag_matching_list_sz[0x5];
+	u8 reserved_at_5f8[0x3];
+	u8 log_max_xrq[0x5];
+	u8 affiliate_nic_vport_criteria[0x8];
+	u8 native_port_num[0x8];
+	u8 num_vhca_ports[0x8];
+	u8 reserved_at_618[0x6];
+	u8 sw_owner_id[0x1];
+	u8 reserved_at_61f[0x1e1];
+};
+
+struct mlx5_ifc_qos_cap_bits {
+	u8 packet_pacing[0x1];
+	u8 esw_scheduling[0x1];
+	u8 esw_bw_share[0x1];
+	u8 esw_rate_limit[0x1];
+	u8 reserved_at_4[0x1];
+	u8 packet_pacing_burst_bound[0x1];
+	u8 packet_pacing_typical_size[0x1];
+	u8 flow_meter_srtcm[0x1];
+	u8 reserved_at_8[0x8];
+	u8 log_max_flow_meter[0x8];
+	u8 flow_meter_reg_id[0x8];
+	u8 reserved_at_25[0x8];
+	u8 flow_meter_reg_share[0x1];
+	u8 reserved_at_2e[0x17];
+	u8 packet_pacing_max_rate[0x20];
+	u8 packet_pacing_min_rate[0x20];
+	u8 reserved_at_80[0x10];
+	u8 packet_pacing_rate_table_size[0x10];
+	u8 esw_element_type[0x10];
+	u8 esw_tsar_type[0x10];
+	u8 reserved_at_c0[0x10];
+	u8 max_qos_para_vport[0x10];
+	u8 max_tsar_bw_share[0x20];
+	u8 reserved_at_100[0x6e8];
+};
+
+struct mlx5_ifc_per_protocol_networking_offload_caps_bits {
+	u8 csum_cap[0x1];
+	u8 vlan_cap[0x1];
+	u8 lro_cap[0x1];
+	u8 lro_psh_flag[0x1];
+	u8 lro_time_stamp[0x1];
+	u8 lro_max_msg_sz_mode[0x2];
+	u8 wqe_vlan_insert[0x1];
+	u8 self_lb_en_modifiable[0x1];
+	u8 self_lb_mc[0x1];
+	u8 self_lb_uc[0x1];
+	u8 max_lso_cap[0x5];
+	u8 multi_pkt_send_wqe[0x2];
+	u8 wqe_inline_mode[0x2];
+	u8 rss_ind_tbl_cap[0x4];
+	u8 reg_umr_sq[0x1];
+	u8 scatter_fcs[0x1];
+	u8 enhanced_multi_pkt_send_wqe[0x1];
+	u8 tunnel_lso_const_out_ip_id[0x1];
+	u8 tunnel_lro_gre[0x1];
+	u8 tunnel_lro_vxlan[0x1];
+	u8 tunnel_stateless_gre[0x1];
+	u8 tunnel_stateless_vxlan[0x1];
+	u8 swp[0x1];
+	u8 swp_csum[0x1];
+	u8 swp_lso[0x1];
+	u8 reserved_at_23[0x8];
+	u8 tunnel_stateless_gtp[0x1];
+	u8 reserved_at_25[0x4];
+	u8 max_vxlan_udp_ports[0x8];
+	u8 reserved_at_38[0x6];
+	u8 max_geneve_opt_len[0x1];
+	u8 tunnel_stateless_geneve_rx[0x1];
+	u8 reserved_at_40[0x10];
+	u8 lro_min_mss_size[0x10];
+	u8 reserved_at_60[0x120];
+	u8 lro_timer_supported_periods[4][0x20];
+	u8 reserved_at_200[0x600];
+};
+
+union mlx5_ifc_hca_cap_union_bits {
+	struct mlx5_ifc_cmd_hca_cap_bits cmd_hca_cap;
+	struct mlx5_ifc_per_protocol_networking_offload_caps_bits
+	       per_protocol_networking_offload_caps;
+	struct mlx5_ifc_qos_cap_bits qos_cap;
+	u8 reserved_at_0[0x8000];
+};
+
+struct mlx5_ifc_query_hca_cap_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x40];
+	union mlx5_ifc_hca_cap_union_bits capability;
+};
+
+struct mlx5_ifc_query_hca_cap_in_bits {
+	u8 opcode[0x10];
+	u8 reserved_at_10[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0x40];
+};
+
+struct mlx5_ifc_mac_address_layout_bits {
+	u8 reserved_at_0[0x10];
+	u8 mac_addr_47_32[0x10];
+	u8 mac_addr_31_0[0x20];
+};
+
+struct mlx5_ifc_nic_vport_context_bits {
+	u8 reserved_at_0[0x5];
+	u8 min_wqe_inline_mode[0x3];
+	u8 reserved_at_8[0x15];
+	u8 disable_mc_local_lb[0x1];
+	u8 disable_uc_local_lb[0x1];
+	u8 roce_en[0x1];
+	u8 arm_change_event[0x1];
+	u8 reserved_at_21[0x1a];
+	u8 event_on_mtu[0x1];
+	u8 event_on_promisc_change[0x1];
+	u8 event_on_vlan_change[0x1];
+	u8 event_on_mc_address_change[0x1];
+	u8 event_on_uc_address_change[0x1];
+	u8 reserved_at_40[0xc];
+	u8 affiliation_criteria[0x4];
+	u8 affiliated_vhca_id[0x10];
+	u8 reserved_at_60[0xd0];
+	u8 mtu[0x10];
+	u8 system_image_guid[0x40];
+	u8 port_guid[0x40];
+	u8 node_guid[0x40];
+	u8 reserved_at_200[0x140];
+	u8 qkey_violation_counter[0x10];
+	u8 reserved_at_350[0x430];
+	u8 promisc_uc[0x1];
+	u8 promisc_mc[0x1];
+	u8 promisc_all[0x1];
+	u8 reserved_at_783[0x2];
+	u8 allowed_list_type[0x3];
+	u8 reserved_at_788[0xc];
+	u8 allowed_list_size[0xc];
+	struct mlx5_ifc_mac_address_layout_bits permanent_address;
+	u8 reserved_at_7e0[0x20];
+};
+
+struct mlx5_ifc_query_nic_vport_context_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x40];
+	struct mlx5_ifc_nic_vport_context_bits nic_vport_context;
+};
+
+struct mlx5_ifc_query_nic_vport_context_in_bits {
+	u8 opcode[0x10];
+	u8 reserved_at_10[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 other_vport[0x1];
+	u8 reserved_at_41[0xf];
+	u8 vport_number[0x10];
+	u8 reserved_at_60[0x5];
+	u8 allowed_list_type[0x3];
+	u8 reserved_at_68[0x18];
+};
+
+struct mlx5_ifc_tisc_bits {
+	u8 strict_lag_tx_port_affinity[0x1];
+	u8 reserved_at_1[0x3];
+	u8 lag_tx_port_affinity[0x04];
+	u8 reserved_at_8[0x4];
+	u8 prio[0x4];
+	u8 reserved_at_10[0x10];
+	u8 reserved_at_20[0x100];
+	u8 reserved_at_120[0x8];
+	u8 transport_domain[0x18];
+	u8 reserved_at_140[0x8];
+	u8 underlay_qpn[0x18];
+	u8 reserved_at_160[0x3a0];
+};
+
+struct mlx5_ifc_query_tis_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x40];
+	struct mlx5_ifc_tisc_bits tis_context;
+};
+
+struct mlx5_ifc_query_tis_in_bits {
+	u8 opcode[0x10];
+	u8 reserved_at_10[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0x8];
+	u8 tisn[0x18];
+	u8 reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_alloc_transport_domain_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x8];
+	u8 transport_domain[0x18];
+	u8 reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_alloc_transport_domain_in_bits {
+	u8 opcode[0x10];
+	u8 reserved_at_10[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0x40];
+};
+
+enum {
+	MLX5_WQ_TYPE_LINKED_LIST                = 0x0,
+	MLX5_WQ_TYPE_CYCLIC                     = 0x1,
+	MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ    = 0x2,
+	MLX5_WQ_TYPE_CYCLIC_STRIDING_RQ         = 0x3,
+};
+
+enum {
+	MLX5_WQ_END_PAD_MODE_NONE  = 0x0,
+	MLX5_WQ_END_PAD_MODE_ALIGN = 0x1,
+};
+
+struct mlx5_ifc_wq_bits {
+	u8 wq_type[0x4];
+	u8 wq_signature[0x1];
+	u8 end_padding_mode[0x2];
+	u8 cd_slave[0x1];
+	u8 reserved_at_8[0x18];
+	u8 hds_skip_first_sge[0x1];
+	u8 log2_hds_buf_size[0x3];
+	u8 reserved_at_24[0x7];
+	u8 page_offset[0x5];
+	u8 lwm[0x10];
+	u8 reserved_at_40[0x8];
+	u8 pd[0x18];
+	u8 reserved_at_60[0x8];
+	u8 uar_page[0x18];
+	u8 dbr_addr[0x40];
+	u8 hw_counter[0x20];
+	u8 sw_counter[0x20];
+	u8 reserved_at_100[0xc];
+	u8 log_wq_stride[0x4];
+	u8 reserved_at_110[0x3];
+	u8 log_wq_pg_sz[0x5];
+	u8 reserved_at_118[0x3];
+	u8 log_wq_sz[0x5];
+	u8 dbr_umem_valid[0x1];
+	u8 wq_umem_valid[0x1];
+	u8 reserved_at_122[0x1];
+	u8 log_hairpin_num_packets[0x5];
+	u8 reserved_at_128[0x3];
+	u8 log_hairpin_data_sz[0x5];
+	u8 reserved_at_130[0x4];
+	u8 single_wqe_log_num_of_strides[0x4];
+	u8 two_byte_shift_en[0x1];
+	u8 reserved_at_139[0x4];
+	u8 single_stride_log_num_of_bytes[0x3];
+	u8 dbr_umem_id[0x20];
+	u8 wq_umem_id[0x20];
+	u8 wq_umem_offset[0x40];
+	u8 reserved_at_1c0[0x440];
+};
+
+enum {
+	MLX5_RQC_MEM_RQ_TYPE_MEMORY_RQ_INLINE  = 0x0,
+	MLX5_RQC_MEM_RQ_TYPE_MEMORY_RQ_RMP     = 0x1,
+};
+
+enum {
+	MLX5_RQC_STATE_RST  = 0x0,
+	MLX5_RQC_STATE_RDY  = 0x1,
+	MLX5_RQC_STATE_ERR  = 0x3,
+};
+
+struct mlx5_ifc_rqc_bits {
+	u8 rlky[0x1];
+	u8 delay_drop_en[0x1];
+	u8 scatter_fcs[0x1];
+	u8 vsd[0x1];
+	u8 mem_rq_type[0x4];
+	u8 state[0x4];
+	u8 reserved_at_c[0x1];
+	u8 flush_in_error_en[0x1];
+	u8 hairpin[0x1];
+	u8 reserved_at_f[0x11];
+	u8 reserved_at_20[0x8];
+	u8 user_index[0x18];
+	u8 reserved_at_40[0x8];
+	u8 cqn[0x18];
+	u8 counter_set_id[0x8];
+	u8 reserved_at_68[0x18];
+	u8 reserved_at_80[0x8];
+	u8 rmpn[0x18];
+	u8 reserved_at_a0[0x8];
+	u8 hairpin_peer_sq[0x18];
+	u8 reserved_at_c0[0x10];
+	u8 hairpin_peer_vhca[0x10];
+	u8 reserved_at_e0[0xa0];
+	struct mlx5_ifc_wq_bits wq; /* Not used in LRO RQ. */
+};
+
+struct mlx5_ifc_create_rq_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x8];
+	u8 rqn[0x18];
+	u8 reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_create_rq_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0xc0];
+	struct mlx5_ifc_rqc_bits ctx;
+};
+
+struct mlx5_ifc_modify_rq_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x40];
+};
+
+struct mlx5_ifc_create_tis_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x8];
+	u8 tisn[0x18];
+	u8 reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_create_tis_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0xc0];
+	struct mlx5_ifc_tisc_bits ctx;
+};
+
+enum {
+	MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_WQ_LWM = 1ULL << 0,
+	MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_VSD = 1ULL << 1,
+	MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_SCATTER_FCS = 1ULL << 2,
+	MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_RQ_COUNTER_SET_ID = 1ULL << 3,
+};
+
+struct mlx5_ifc_modify_rq_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 rq_state[0x4];
+	u8 reserved_at_44[0x4];
+	u8 rqn[0x18];
+	u8 reserved_at_60[0x20];
+	u8 modify_bitmask[0x40];
+	u8 reserved_at_c0[0x40];
+	struct mlx5_ifc_rqc_bits ctx;
+};
+
+enum {
+	MLX5_RX_HASH_FIELD_SELECT_SELECTED_FIELDS_SRC_IP     = 0x0,
+	MLX5_RX_HASH_FIELD_SELECT_SELECTED_FIELDS_DST_IP     = 0x1,
+	MLX5_RX_HASH_FIELD_SELECT_SELECTED_FIELDS_L4_SPORT   = 0x2,
+	MLX5_RX_HASH_FIELD_SELECT_SELECTED_FIELDS_L4_DPORT   = 0x3,
+	MLX5_RX_HASH_FIELD_SELECT_SELECTED_FIELDS_IPSEC_SPI  = 0x4,
+};
+
+struct mlx5_ifc_rx_hash_field_select_bits {
+	u8 l3_prot_type[0x1];
+	u8 l4_prot_type[0x1];
+	u8 selected_fields[0x1e];
+};
+
+enum {
+	MLX5_TIRC_DISP_TYPE_DIRECT    = 0x0,
+	MLX5_TIRC_DISP_TYPE_INDIRECT  = 0x1,
+};
+
+enum {
+	MLX5_TIRC_LRO_ENABLE_MASK_IPV4_LRO  = 0x1,
+	MLX5_TIRC_LRO_ENABLE_MASK_IPV6_LRO  = 0x2,
+};
+
+enum {
+	MLX5_RX_HASH_FN_NONE           = 0x0,
+	MLX5_RX_HASH_FN_INVERTED_XOR8  = 0x1,
+	MLX5_RX_HASH_FN_TOEPLITZ       = 0x2,
+};
+
+enum {
+	MLX5_TIRC_SELF_LB_BLOCK_BLOCK_UNICAST    = 0x1,
+	MLX5_TIRC_SELF_LB_BLOCK_BLOCK_MULTICAST  = 0x2,
+};
+
+enum {
+	MLX5_LRO_MAX_MSG_SIZE_START_FROM_L4  = 0x0,
+	MLX5_LRO_MAX_MSG_SIZE_START_FROM_L2  = 0x1,
+};
+
+struct mlx5_ifc_tirc_bits {
+	u8 reserved_at_0[0x20];
+	u8 disp_type[0x4];
+	u8 reserved_at_24[0x1c];
+	u8 reserved_at_40[0x40];
+	u8 reserved_at_80[0x4];
+	u8 lro_timeout_period_usecs[0x10];
+	u8 lro_enable_mask[0x4];
+	u8 lro_max_msg_sz[0x8];
+	u8 reserved_at_a0[0x40];
+	u8 reserved_at_e0[0x8];
+	u8 inline_rqn[0x18];
+	u8 rx_hash_symmetric[0x1];
+	u8 reserved_at_101[0x1];
+	u8 tunneled_offload_en[0x1];
+	u8 reserved_at_103[0x5];
+	u8 indirect_table[0x18];
+	u8 rx_hash_fn[0x4];
+	u8 reserved_at_124[0x2];
+	u8 self_lb_block[0x2];
+	u8 transport_domain[0x18];
+	u8 rx_hash_toeplitz_key[10][0x20];
+	struct mlx5_ifc_rx_hash_field_select_bits rx_hash_field_selector_outer;
+	struct mlx5_ifc_rx_hash_field_select_bits rx_hash_field_selector_inner;
+	u8 reserved_at_2c0[0x4c0];
+};
+
+struct mlx5_ifc_create_tir_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x8];
+	u8 tirn[0x18];
+	u8 reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_create_tir_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0xc0];
+	struct mlx5_ifc_tirc_bits ctx;
+};
+
+struct mlx5_ifc_rq_num_bits {
+	u8 reserved_at_0[0x8];
+	u8 rq_num[0x18];
+};
+
+struct mlx5_ifc_rqtc_bits {
+	u8 reserved_at_0[0xa0];
+	u8 reserved_at_a0[0x10];
+	u8 rqt_max_size[0x10];
+	u8 reserved_at_c0[0x10];
+	u8 rqt_actual_size[0x10];
+	u8 reserved_at_e0[0x6a0];
+	struct mlx5_ifc_rq_num_bits rq_num[];
+};
+
+struct mlx5_ifc_create_rqt_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x8];
+	u8 rqtn[0x18];
+	u8 reserved_at_60[0x20];
+};
+
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-Wpedantic"
+#endif
+struct mlx5_ifc_create_rqt_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0xc0];
+	struct mlx5_ifc_rqtc_bits rqt_context;
+};
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-Wpedantic"
+#endif
+
+enum {
+	MLX5_SQC_STATE_RST  = 0x0,
+	MLX5_SQC_STATE_RDY  = 0x1,
+	MLX5_SQC_STATE_ERR  = 0x3,
+};
+
+struct mlx5_ifc_sqc_bits {
+	u8 rlky[0x1];
+	u8 cd_master[0x1];
+	u8 fre[0x1];
+	u8 flush_in_error_en[0x1];
+	u8 allow_multi_pkt_send_wqe[0x1];
+	u8 min_wqe_inline_mode[0x3];
+	u8 state[0x4];
+	u8 reg_umr[0x1];
+	u8 allow_swp[0x1];
+	u8 hairpin[0x1];
+	u8 reserved_at_f[0x11];
+	u8 reserved_at_20[0x8];
+	u8 user_index[0x18];
+	u8 reserved_at_40[0x8];
+	u8 cqn[0x18];
+	u8 reserved_at_60[0x8];
+	u8 hairpin_peer_rq[0x18];
+	u8 reserved_at_80[0x10];
+	u8 hairpin_peer_vhca[0x10];
+	u8 reserved_at_a0[0x50];
+	u8 packet_pacing_rate_limit_index[0x10];
+	u8 tis_lst_sz[0x10];
+	u8 reserved_at_110[0x10];
+	u8 reserved_at_120[0x40];
+	u8 reserved_at_160[0x8];
+	u8 tis_num_0[0x18];
+	struct mlx5_ifc_wq_bits wq;
+};
+
+struct mlx5_ifc_query_sq_in_bits {
+	u8 opcode[0x10];
+	u8 reserved_at_10[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0x8];
+	u8 sqn[0x18];
+	u8 reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_modify_sq_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x40];
+};
+
+struct mlx5_ifc_modify_sq_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 sq_state[0x4];
+	u8 reserved_at_44[0x4];
+	u8 sqn[0x18];
+	u8 reserved_at_60[0x20];
+	u8 modify_bitmask[0x40];
+	u8 reserved_at_c0[0x40];
+	struct mlx5_ifc_sqc_bits ctx;
+};
+
+struct mlx5_ifc_create_sq_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x8];
+	u8 sqn[0x18];
+	u8 reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_create_sq_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0xc0];
+	struct mlx5_ifc_sqc_bits ctx;
+};
+
+enum {
+	MLX5_FLOW_METER_OBJ_MODIFY_FIELD_ACTIVE = (1ULL << 0),
+	MLX5_FLOW_METER_OBJ_MODIFY_FIELD_CBS = (1ULL << 1),
+	MLX5_FLOW_METER_OBJ_MODIFY_FIELD_CIR = (1ULL << 2),
+	MLX5_FLOW_METER_OBJ_MODIFY_FIELD_EBS = (1ULL << 3),
+	MLX5_FLOW_METER_OBJ_MODIFY_FIELD_EIR = (1ULL << 4),
+};
+
+struct mlx5_ifc_flow_meter_parameters_bits {
+	u8         valid[0x1];			/* 00h */
+	u8         bucket_overflow[0x1];
+	u8         start_color[0x2];
+	u8         both_buckets_on_green[0x1];
+	u8         meter_mode[0x2];
+	u8         reserved_at_1[0x19];
+	u8         reserved_at_2[0x20];		/* 04h */
+	u8         reserved_at_3[0x3];
+	u8         cbs_exponent[0x5];		/* 08h */
+	u8         cbs_mantissa[0x8];
+	u8         reserved_at_4[0x3];
+	u8         cir_exponent[0x5];
+	u8         cir_mantissa[0x8];
+	u8         reserved_at_5[0x20];		/* 0Ch */
+	u8         reserved_at_6[0x3];
+	u8         ebs_exponent[0x5];		/* 10h */
+	u8         ebs_mantissa[0x8];
+	u8         reserved_at_7[0x3];
+	u8         eir_exponent[0x5];
+	u8         eir_mantissa[0x8];
+	u8         reserved_at_8[0x60];		/* 14h-1Ch */
+};
+
+/* CQE format mask. */
+#define MLX5E_CQE_FORMAT_MASK 0xc
+
+/* MPW opcode. */
+#define MLX5_OPC_MOD_MPW 0x01
+
+/* Compressed Rx CQE structure. */
+struct mlx5_mini_cqe8 {
+	union {
+		uint32_t rx_hash_result;
+		struct {
+			uint16_t checksum;
+			uint16_t stride_idx;
+		};
+		struct {
+			uint16_t wqe_counter;
+			uint8_t  s_wqe_opcode;
+			uint8_t  reserved;
+		} s_wqe_info;
+	};
+	uint32_t byte_cnt;
+};
+
+/* srTCM PRM flow meter parameters. */
+enum {
+	MLX5_FLOW_COLOR_RED = 0,
+	MLX5_FLOW_COLOR_YELLOW,
+	MLX5_FLOW_COLOR_GREEN,
+	MLX5_FLOW_COLOR_UNDEFINED,
+};
+
+/* Maximum value of srTCM metering parameters. */
+#define MLX5_SRTCM_CBS_MAX (0xFF * (1ULL << 0x1F))
+#define MLX5_SRTCM_CIR_MAX (8 * (1ULL << 30) * 0xFF)
+#define MLX5_SRTCM_EBS_MAX 0
+
+/* Number of bits used by the meter color. */
+#define MLX5_MTR_COLOR_BITS 8
+
+/**
+ * Convert a user mark to a flow mark.
+ *
+ * @param val
+ *   Mark value to convert.
+ *
+ * @return
+ *   Converted mark value.
+ */
+static inline uint32_t
+mlx5_flow_mark_set(uint32_t val)
+{
+	uint32_t ret;
+
+	/*
+	 * Add one to the user value to differentiate unmarked flows from
+	 * marked flows. If the ID is equal to MLX5_FLOW_MARK_DEFAULT, it
+	 * remains untouched.
+	 */
+	if (val != MLX5_FLOW_MARK_DEFAULT)
+		++val;
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+	/*
+	 * Mark is 24 bits (minus reserved values) but is stored on a 32-bit
+	 * word, byte-swapped by the kernel on little-endian systems. In this
+	 * case, right-shifting the resulting big-endian value ensures the
+	 * least significant 24 bits are retained when converting it back.
+	 */
+	ret = rte_cpu_to_be_32(val) >> 8;
+#else
+	ret = val;
+#endif
+	return ret;
+}
+
+/**
+ * Convert a flow mark to a user mark.
+ *
+ * @param val
+ *   Mark value to convert.
+ *
+ * @return
+ *   Converted mark value.
+ */
+static inline uint32_t
+mlx5_flow_mark_get(uint32_t val)
+{
+	/*
+	 * Subtract one from the retrieved value. It was added by
+	 * mlx5_flow_mark_set() to distinguish unmarked flows.
+	 */
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+	return (val >> 8) - 1;
+#else
+	return val - 1;
+#endif
+}
+
+#endif /* RTE_PMD_MLX5_PRM_H_ */
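Note on the mark helpers above: the little-endian branches of mlx5_flow_mark_set() and mlx5_flow_mark_get() are easier to follow with concrete numbers. The following is a minimal standalone sketch, not part of the patch. It assumes that the only transformation applied between the two helpers is the 32-bit byte swap mentioned in the comment; sketch_bswap32() is a hypothetical stand-in for rte_cpu_to_be_32() on a little-endian host, and the value used for MLX5_FLOW_MARK_DEFAULT is an assumption, since its definition is not shown in this part of the diff.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value; the real MLX5_FLOW_MARK_DEFAULT is defined elsewhere. */
#define SKETCH_FLOW_MARK_DEFAULT 0xffffffu

/* Hypothetical stand-in for rte_cpu_to_be_32() on a little-endian host. */
static uint32_t
sketch_bswap32(uint32_t v)
{
	return ((v & 0x000000ffu) << 24) | ((v & 0x0000ff00u) << 8) |
	       ((v & 0x00ff0000u) >> 8) | ((v & 0xff000000u) >> 24);
}

/* Mirrors the little-endian branch of mlx5_flow_mark_set(). */
static uint32_t
sketch_mark_set(uint32_t val)
{
	if (val != SKETCH_FLOW_MARK_DEFAULT)
		++val;
	/* Swap the bytes, then drop the now-empty low byte. */
	return sketch_bswap32(val) >> 8;
}

/* Mirrors the little-endian branch of mlx5_flow_mark_get(). */
static uint32_t
sketch_mark_get(uint32_t val)
{
	return (val >> 8) - 1;
}

/* Round trip under the single-byte-swap assumption stated above. */
static uint32_t
sketch_round_trip(uint32_t user_mark)
{
	uint32_t stored = sketch_mark_set(user_mark);

	/* Model the byte swap applied to the stored 32-bit word. */
	return sketch_mark_get(sketch_bswap32(stored));
}

/* Self-check with a few ordinary (non-default) mark values. */
static void
sketch_self_check(void)
{
	assert(sketch_mark_set(5) == 0x00060000u);
	assert(sketch_round_trip(0) == 0);
	assert(sketch_round_trip(5) == 5);
	assert(sketch_round_trip(0x123456u) == 0x123456u);
}
```

In this model, user mark 5 becomes 6, is byte-swapped to 0x06000000 and shifted right to 0x00060000. After the modeled swap the word reads 0x00000600, and the get helper shifts and decrements it back to 5.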
diff --git a/drivers/common/mlx5/rte_common_mlx5_version.map b/drivers/common/mlx5/rte_common_mlx5_version.map
new file mode 100644
index 0000000..e4f85e2
--- /dev/null
+++ b/drivers/common/mlx5/rte_common_mlx5_version.map
@@ -0,0 +1,20 @@
+DPDK_20.02 {
+	global:
+
+	mlx5_devx_cmd_create_rq;
+	mlx5_devx_cmd_create_rqt;
+	mlx5_devx_cmd_create_sq;
+	mlx5_devx_cmd_create_tir;
+	mlx5_devx_cmd_create_td;
+	mlx5_devx_cmd_create_tis;
+	mlx5_devx_cmd_destroy;
+	mlx5_devx_cmd_flow_counter_alloc;
+	mlx5_devx_cmd_flow_counter_query;
+	mlx5_devx_cmd_flow_dump;
+	mlx5_devx_cmd_mkey_create;
+	mlx5_devx_cmd_modify_rq;
+	mlx5_devx_cmd_modify_sq;
+	mlx5_devx_cmd_qp_query_tis_td;
+	mlx5_devx_cmd_query_hca_attr;
+	mlx5_devx_get_out_command_status;
+};
diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index 0466d9d..a9558ca 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -10,11 +10,14 @@ LIB_GLUE = $(LIB_GLUE_BASE).$(LIB_GLUE_VERSION)
 LIB_GLUE_BASE = librte_pmd_mlx5_glue.so
 LIB_GLUE_VERSION = 20.02.0
 
+ifeq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
+CFLAGS += -DMLX5_GLUE='"$(LIB_GLUE)"'
+CFLAGS += -DMLX5_GLUE_VERSION='"$(LIB_GLUE_VERSION)"'
+LDLIBS += -ldl
+endif
+
 # Sources.
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5.c
-ifneq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
-SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_glue.c
-endif
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_rxq.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_txq.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_rxtx.c
@@ -37,34 +40,22 @@ SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_flow_dv.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_flow_verbs.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_mp.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_nl.c
-SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_devx_cmds.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_utils.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_socket.c
 
-ifeq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
-INSTALL-$(CONFIG_RTE_LIBRTE_MLX5_PMD)-lib += $(LIB_GLUE)
-endif
-
 # Basic CFLAGS.
 CFLAGS += -O3
 CFLAGS += -std=c11 -Wall -Wextra
 CFLAGS += -g
-CFLAGS += -I.
+CFLAGS += -I$(RTE_SDK)/drivers/common/mlx5
+CFLAGS += -I$(RTE_SDK)/drivers/net/mlx5
+CFLAGS += -I$(BUILDDIR)/drivers/common/mlx5
 CFLAGS += -D_BSD_SOURCE
 CFLAGS += -D_DEFAULT_SOURCE
 CFLAGS += -D_XOPEN_SOURCE=600
 CFLAGS += $(WERROR_FLAGS)
 CFLAGS += -Wno-strict-prototypes
-ifeq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
-CFLAGS += -DMLX5_GLUE='"$(LIB_GLUE)"'
-CFLAGS += -DMLX5_GLUE_VERSION='"$(LIB_GLUE_VERSION)"'
-CFLAGS_mlx5_glue.o += -fPIC
-LDLIBS += -ldl
-else ifeq ($(CONFIG_RTE_IBVERBS_LINK_STATIC),y)
-LDLIBS += $(shell $(RTE_SDK)/buildtools/options-ibverbs-static.sh)
-else
-LDLIBS += -libverbs -lmlx5
-endif
+LDLIBS += -lrte_common_mlx5
 LDLIBS += -lm
 LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring
 LDLIBS += -lrte_ethdev -lrte_net -lrte_kvargs
@@ -74,6 +65,7 @@ LDLIBS += -lrte_bus_pci
 CFLAGS += -Wno-error=cast-qual
 
 EXPORT_MAP := rte_pmd_mlx5_version.map
+
 # memseg walk is not part of stable API
 CFLAGS += -DALLOW_EXPERIMENTAL_API
 
@@ -96,282 +88,3 @@ endif
 
 include $(RTE_SDK)/mk/rte.lib.mk
 
-# Generate and clean-up mlx5_autoconf.h.
-
-export CC CFLAGS CPPFLAGS EXTRA_CFLAGS EXTRA_CPPFLAGS
-export AUTO_CONFIG_CFLAGS += -Wno-error
-
-ifndef V
-AUTOCONF_OUTPUT := >/dev/null
-endif
-
-mlx5_autoconf.h.new: FORCE
-
-mlx5_autoconf.h.new: $(RTE_SDK)/buildtools/auto-config-h.sh
-	$Q $(RM) -f -- '$@'
-	$Q sh -- '$<' '$@' \
-		HAVE_IBV_DEVICE_STRIDING_RQ_SUPPORT \
-		infiniband/mlx5dv.h \
-		enum MLX5DV_CQE_RES_FORMAT_CSUM_STRIDX \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IBV_DEVICE_TUNNEL_SUPPORT \
-		infiniband/mlx5dv.h \
-		enum MLX5DV_CONTEXT_MASK_TUNNEL_OFFLOADS \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IBV_DEVICE_MPLS_SUPPORT \
-		infiniband/verbs.h \
-		enum IBV_FLOW_SPEC_MPLS \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IBV_WQ_FLAGS_PCI_WRITE_END_PADDING \
-		infiniband/verbs.h \
-		enum IBV_WQ_FLAGS_PCI_WRITE_END_PADDING \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IBV_WQ_FLAG_RX_END_PADDING \
-		infiniband/verbs.h \
-		enum IBV_WQ_FLAG_RX_END_PADDING \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IBV_MLX5_MOD_SWP \
-		infiniband/mlx5dv.h \
-		type 'struct mlx5dv_sw_parsing_caps' \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IBV_MLX5_MOD_MPW \
-		infiniband/mlx5dv.h \
-		enum MLX5DV_CONTEXT_FLAGS_MPW_ALLOWED \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IBV_MLX5_MOD_CQE_128B_COMP \
-		infiniband/mlx5dv.h \
-		enum MLX5DV_CONTEXT_FLAGS_CQE_128B_COMP \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IBV_MLX5_MOD_CQE_128B_PAD \
-		infiniband/mlx5dv.h \
-		enum MLX5DV_CQ_INIT_ATTR_FLAGS_CQE_PAD \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IBV_FLOW_DV_SUPPORT \
-		infiniband/mlx5dv.h \
-		func mlx5dv_create_flow_action_packet_reformat \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_MLX5DV_DR \
-		infiniband/mlx5dv.h \
-		enum MLX5DV_DR_DOMAIN_TYPE_NIC_RX \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_MLX5DV_DR_ESWITCH \
-		infiniband/mlx5dv.h \
-		enum MLX5DV_DR_DOMAIN_TYPE_FDB \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_MLX5DV_DR_VLAN \
-		infiniband/mlx5dv.h \
-		func mlx5dv_dr_action_create_push_vlan \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_MLX5DV_DR_DEVX_PORT \
-		infiniband/mlx5dv.h \
-		func mlx5dv_query_devx_port \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IBV_DEVX_OBJ \
-		infiniband/mlx5dv.h \
-		func mlx5dv_devx_obj_create \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IBV_FLOW_DEVX_COUNTERS \
-		infiniband/mlx5dv.h \
-		enum MLX5DV_FLOW_ACTION_COUNTERS_DEVX \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IBV_DEVX_ASYNC \
-		infiniband/mlx5dv.h \
-		func mlx5dv_devx_obj_query_async \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_MLX5DV_DR_ACTION_DEST_DEVX_TIR \
-		infiniband/mlx5dv.h \
-		func mlx5dv_dr_action_create_dest_devx_tir \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_MLX5_DR_CREATE_ACTION_FLOW_METER \
-		infiniband/mlx5dv.h \
-		func mlx5dv_dr_action_create_flow_meter \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_MLX5_DR_FLOW_DUMP \
-		infiniband/mlx5dv.h \
-		func mlx5dv_dump_dr_domain \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_MLX5DV_MMAP_GET_NC_PAGES_CMD \
-		infiniband/mlx5dv.h \
-		enum MLX5_MMAP_GET_NC_PAGES_CMD \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_ETHTOOL_LINK_MODE_25G \
-		/usr/include/linux/ethtool.h \
-		enum ETHTOOL_LINK_MODE_25000baseCR_Full_BIT \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_ETHTOOL_LINK_MODE_50G \
-		/usr/include/linux/ethtool.h \
-		enum ETHTOOL_LINK_MODE_50000baseCR2_Full_BIT \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_ETHTOOL_LINK_MODE_100G \
-		/usr/include/linux/ethtool.h \
-		enum ETHTOOL_LINK_MODE_100000baseKR4_Full_BIT \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IBV_DEVICE_COUNTERS_SET_V42 \
-		infiniband/verbs.h \
-		type 'struct ibv_counter_set_init_attr' \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IBV_DEVICE_COUNTERS_SET_V45 \
-		infiniband/verbs.h \
-		type 'struct ibv_counters_init_attr' \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_RDMA_NL_NLDEV \
-		rdma/rdma_netlink.h \
-		enum RDMA_NL_NLDEV \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_RDMA_NLDEV_CMD_GET \
-		rdma/rdma_netlink.h \
-		enum RDMA_NLDEV_CMD_GET \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_RDMA_NLDEV_CMD_PORT_GET \
-		rdma/rdma_netlink.h \
-		enum RDMA_NLDEV_CMD_PORT_GET \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_RDMA_NLDEV_ATTR_DEV_INDEX \
-		rdma/rdma_netlink.h \
-		enum RDMA_NLDEV_ATTR_DEV_INDEX \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_RDMA_NLDEV_ATTR_DEV_NAME \
-		rdma/rdma_netlink.h \
-		enum RDMA_NLDEV_ATTR_DEV_NAME \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_RDMA_NLDEV_ATTR_PORT_INDEX \
-		rdma/rdma_netlink.h \
-		enum RDMA_NLDEV_ATTR_PORT_INDEX \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_RDMA_NLDEV_ATTR_NDEV_INDEX \
-		rdma/rdma_netlink.h \
-		enum RDMA_NLDEV_ATTR_NDEV_INDEX \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IFLA_NUM_VF \
-		linux/if_link.h \
-		enum IFLA_NUM_VF \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IFLA_EXT_MASK \
-		linux/if_link.h \
-		enum IFLA_EXT_MASK \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IFLA_PHYS_SWITCH_ID \
-		linux/if_link.h \
-		enum IFLA_PHYS_SWITCH_ID \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IFLA_PHYS_PORT_NAME \
-		linux/if_link.h \
-		enum IFLA_PHYS_PORT_NAME \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_SUPPORTED_40000baseKR4_Full \
-		/usr/include/linux/ethtool.h \
-		define SUPPORTED_40000baseKR4_Full \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_SUPPORTED_40000baseCR4_Full \
-		/usr/include/linux/ethtool.h \
-		define SUPPORTED_40000baseCR4_Full \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_SUPPORTED_40000baseSR4_Full \
-		/usr/include/linux/ethtool.h \
-		define SUPPORTED_40000baseSR4_Full \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_SUPPORTED_40000baseLR4_Full \
-		/usr/include/linux/ethtool.h \
-		define SUPPORTED_40000baseLR4_Full \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_SUPPORTED_56000baseKR4_Full \
-		/usr/include/linux/ethtool.h \
-		define SUPPORTED_56000baseKR4_Full \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_SUPPORTED_56000baseCR4_Full \
-		/usr/include/linux/ethtool.h \
-		define SUPPORTED_56000baseCR4_Full \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_SUPPORTED_56000baseSR4_Full \
-		/usr/include/linux/ethtool.h \
-		define SUPPORTED_56000baseSR4_Full \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_SUPPORTED_56000baseLR4_Full \
-		/usr/include/linux/ethtool.h \
-		define SUPPORTED_56000baseLR4_Full \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_STATIC_ASSERT \
-		/usr/include/assert.h \
-		define static_assert \
-		$(AUTOCONF_OUTPUT)
-
-# Create mlx5_autoconf.h or update it in case it differs from the new one.
-
-mlx5_autoconf.h: mlx5_autoconf.h.new
-	$Q [ -f '$@' ] && \
-		cmp '$<' '$@' $(AUTOCONF_OUTPUT) || \
-		mv '$<' '$@'
-
-$(SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD):.c=.o): mlx5_autoconf.h
-
-# Generate dependency plug-in for rdma-core when the PMD must not be linked
-# directly, so that applications do not inherit this dependency.
-
-ifeq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
-
-$(LIB): $(LIB_GLUE)
-
-ifeq ($(LINK_USING_CC),1)
-GLUE_LDFLAGS := $(call linkerprefix,$(LDFLAGS))
-else
-GLUE_LDFLAGS := $(LDFLAGS)
-endif
-$(LIB_GLUE): mlx5_glue.o
-	$Q $(LD) $(GLUE_LDFLAGS) $(EXTRA_LDFLAGS) \
-		-Wl,-h,$(LIB_GLUE) \
-		-shared -o $@ $< -libverbs -lmlx5
-
-mlx5_glue.o: mlx5_autoconf.h
-
-endif
-
-clean_mlx5: FORCE
-	$Q rm -f -- mlx5_autoconf.h mlx5_autoconf.h.new
-	$Q rm -f -- mlx5_glue.o $(LIB_GLUE_BASE)*
-
-clean: clean_mlx5
diff --git a/drivers/net/mlx5/meson.build b/drivers/net/mlx5/meson.build
index 3ad4f02..f6d0db9 100644
--- a/drivers/net/mlx5/meson.build
+++ b/drivers/net/mlx5/meson.build
@@ -7,224 +7,54 @@ if not is_linux
 	reason = 'only supported on Linux'
 	subdir_done()
 endif
-build = true
 
-pmd_dlopen = (get_option('ibverbs_link') == 'dlopen')
 LIB_GLUE_BASE = 'librte_pmd_mlx5_glue.so'
 LIB_GLUE_VERSION = '20.02.0'
 LIB_GLUE = LIB_GLUE_BASE + '.' + LIB_GLUE_VERSION
-if pmd_dlopen
-	dpdk_conf.set('RTE_IBVERBS_LINK_DLOPEN', 1)
-	cflags += [
-		'-DMLX5_GLUE="@0@"'.format(LIB_GLUE),
-		'-DMLX5_GLUE_VERSION="@0@"'.format(LIB_GLUE_VERSION),
-	]
-endif
 
-libnames = [ 'mlx5', 'ibverbs' ]
-libs = []
-foreach libname:libnames
-	lib = dependency('lib' + libname, required:false)
-	if not lib.found()
-		lib = cc.find_library(libname, required:false)
-	endif
-	if lib.found()
-		libs += [ lib ]
-	else
-		build = false
-		reason = 'missing dependency, "' + libname + '"'
+allow_experimental_apis = true
+deps += ['hash', 'common_mlx5']
+sources = files(
+	'mlx5.c',
+	'mlx5_ethdev.c',
+	'mlx5_flow.c',
+	'mlx5_flow_meter.c',
+	'mlx5_flow_dv.c',
+	'mlx5_flow_verbs.c',
+	'mlx5_mac.c',
+	'mlx5_mr.c',
+	'mlx5_nl.c',
+	'mlx5_rss.c',
+	'mlx5_rxmode.c',
+	'mlx5_rxq.c',
+	'mlx5_rxtx.c',
+	'mlx5_mp.c',
+	'mlx5_stats.c',
+	'mlx5_trigger.c',
+	'mlx5_txq.c',
+	'mlx5_vlan.c',
+	'mlx5_utils.c',
+	'mlx5_socket.c',
+)
+if (dpdk_conf.has('RTE_ARCH_X86_64')
+	or dpdk_conf.has('RTE_ARCH_ARM64')
+	or dpdk_conf.has('RTE_ARCH_PPC_64'))
+	sources += files('mlx5_rxtx_vec.c')
+endif
+cflags_options = [
+	'-std=c11',
+	'-Wno-strict-prototypes',
+	'-D_BSD_SOURCE',
+	'-D_DEFAULT_SOURCE',
+	'-D_XOPEN_SOURCE=600'
+]
+foreach option:cflags_options
+	if cc.has_argument(option)
+		cflags += option
 	endif
 endforeach
-
-if build
-	allow_experimental_apis = true
-	deps += ['hash']
-	ext_deps += libs
-	sources = files(
-		'mlx5.c',
-		'mlx5_ethdev.c',
-		'mlx5_flow.c',
-		'mlx5_flow_meter.c',
-		'mlx5_flow_dv.c',
-		'mlx5_flow_verbs.c',
-		'mlx5_mac.c',
-		'mlx5_mr.c',
-		'mlx5_nl.c',
-		'mlx5_rss.c',
-		'mlx5_rxmode.c',
-		'mlx5_rxq.c',
-		'mlx5_rxtx.c',
-		'mlx5_mp.c',
-		'mlx5_stats.c',
-		'mlx5_trigger.c',
-		'mlx5_txq.c',
-		'mlx5_vlan.c',
-		'mlx5_devx_cmds.c',
-		'mlx5_utils.c',
-		'mlx5_socket.c',
-	)
-	if (dpdk_conf.has('RTE_ARCH_X86_64')
-		or dpdk_conf.has('RTE_ARCH_ARM64')
-		or dpdk_conf.has('RTE_ARCH_PPC_64'))
-		sources += files('mlx5_rxtx_vec.c')
-	endif
-	if not pmd_dlopen
-		sources += files('mlx5_glue.c')
-	endif
-	cflags_options = [
-		'-std=c11',
-		'-Wno-strict-prototypes',
-		'-D_BSD_SOURCE',
-		'-D_DEFAULT_SOURCE',
-		'-D_XOPEN_SOURCE=600'
-	]
-	foreach option:cflags_options
-		if cc.has_argument(option)
-			cflags += option
-		endif
-	endforeach
-	if get_option('buildtype').contains('debug')
-		cflags += [ '-pedantic', '-UNDEBUG', '-DPEDANTIC' ]
-	else
-		cflags += [ '-DNDEBUG', '-UPEDANTIC' ]
-	endif
-	# To maintain the compatibility with the make build system
-	# mlx5_autoconf.h file is still generated.
-	# input array for meson member search:
-	# [ "MACRO to define if found", "header for the search",
-	#   "symbol to search", "struct member to search" ]
-	has_member_args = [
-		[ 'HAVE_IBV_MLX5_MOD_SWP', 'infiniband/mlx5dv.h',
-		'struct mlx5dv_sw_parsing_caps', 'sw_parsing_offloads' ],
-		[ 'HAVE_IBV_DEVICE_COUNTERS_SET_V42', 'infiniband/verbs.h',
-		'struct ibv_counter_set_init_attr', 'counter_set_id' ],
-		[ 'HAVE_IBV_DEVICE_COUNTERS_SET_V45', 'infiniband/verbs.h',
-		'struct ibv_counters_init_attr', 'comp_mask' ],
-	]
-	# input array for meson symbol search:
-	# [ "MACRO to define if found", "header for the search",
-	#   "symbol to search" ]
-	has_sym_args = [
-		[ 'HAVE_IBV_DEVICE_STRIDING_RQ_SUPPORT', 'infiniband/mlx5dv.h',
-		'MLX5DV_CQE_RES_FORMAT_CSUM_STRIDX' ],
-		[ 'HAVE_IBV_DEVICE_TUNNEL_SUPPORT', 'infiniband/mlx5dv.h',
-		'MLX5DV_CONTEXT_MASK_TUNNEL_OFFLOADS' ],
-		[ 'HAVE_IBV_MLX5_MOD_MPW', 'infiniband/mlx5dv.h',
-		'MLX5DV_CONTEXT_FLAGS_MPW_ALLOWED' ],
-		[ 'HAVE_IBV_MLX5_MOD_CQE_128B_COMP', 'infiniband/mlx5dv.h',
-		'MLX5DV_CONTEXT_FLAGS_CQE_128B_COMP' ],
-		[ 'HAVE_IBV_MLX5_MOD_CQE_128B_PAD', 'infiniband/mlx5dv.h',
-		'MLX5DV_CQ_INIT_ATTR_FLAGS_CQE_PAD' ],
-		[ 'HAVE_IBV_FLOW_DV_SUPPORT', 'infiniband/mlx5dv.h',
-		'mlx5dv_create_flow_action_packet_reformat' ],
-		[ 'HAVE_IBV_DEVICE_MPLS_SUPPORT', 'infiniband/verbs.h',
-		'IBV_FLOW_SPEC_MPLS' ],
-		[ 'HAVE_IBV_WQ_FLAGS_PCI_WRITE_END_PADDING', 'infiniband/verbs.h',
-		'IBV_WQ_FLAGS_PCI_WRITE_END_PADDING' ],
-		[ 'HAVE_IBV_WQ_FLAG_RX_END_PADDING', 'infiniband/verbs.h',
-		'IBV_WQ_FLAG_RX_END_PADDING' ],
-		[ 'HAVE_MLX5DV_DR_DEVX_PORT', 'infiniband/mlx5dv.h',
-		'mlx5dv_query_devx_port' ],
-		[ 'HAVE_IBV_DEVX_OBJ', 'infiniband/mlx5dv.h',
-		'mlx5dv_devx_obj_create' ],
-		[ 'HAVE_IBV_FLOW_DEVX_COUNTERS', 'infiniband/mlx5dv.h',
-		'MLX5DV_FLOW_ACTION_COUNTERS_DEVX' ],
-		[ 'HAVE_IBV_DEVX_ASYNC', 'infiniband/mlx5dv.h',
-		'mlx5dv_devx_obj_query_async' ],
-		[ 'HAVE_MLX5DV_DR_ACTION_DEST_DEVX_TIR', 'infiniband/mlx5dv.h',
-		'mlx5dv_dr_action_create_dest_devx_tir' ],
-		[ 'HAVE_MLX5_DR_CREATE_ACTION_FLOW_METER', 'infiniband/mlx5dv.h',
-		'mlx5dv_dr_action_create_flow_meter' ],
-		[ 'HAVE_MLX5DV_MMAP_GET_NC_PAGES_CMD', 'infiniband/mlx5dv.h',
-		'MLX5_MMAP_GET_NC_PAGES_CMD' ],
-		[ 'HAVE_MLX5DV_DR', 'infiniband/mlx5dv.h',
-		'MLX5DV_DR_DOMAIN_TYPE_NIC_RX' ],
-		[ 'HAVE_MLX5DV_DR_ESWITCH', 'infiniband/mlx5dv.h',
-		'MLX5DV_DR_DOMAIN_TYPE_FDB' ],
-		[ 'HAVE_MLX5DV_DR_VLAN', 'infiniband/mlx5dv.h',
-		'mlx5dv_dr_action_create_push_vlan' ],
-		[ 'HAVE_SUPPORTED_40000baseKR4_Full', 'linux/ethtool.h',
-		'SUPPORTED_40000baseKR4_Full' ],
-		[ 'HAVE_SUPPORTED_40000baseCR4_Full', 'linux/ethtool.h',
-		'SUPPORTED_40000baseCR4_Full' ],
-		[ 'HAVE_SUPPORTED_40000baseSR4_Full', 'linux/ethtool.h',
-		'SUPPORTED_40000baseSR4_Full' ],
-		[ 'HAVE_SUPPORTED_40000baseLR4_Full', 'linux/ethtool.h',
-		'SUPPORTED_40000baseLR4_Full' ],
-		[ 'HAVE_SUPPORTED_56000baseKR4_Full', 'linux/ethtool.h',
-		'SUPPORTED_56000baseKR4_Full' ],
-		[ 'HAVE_SUPPORTED_56000baseCR4_Full', 'linux/ethtool.h',
-		'SUPPORTED_56000baseCR4_Full' ],
-		[ 'HAVE_SUPPORTED_56000baseSR4_Full', 'linux/ethtool.h',
-		'SUPPORTED_56000baseSR4_Full' ],
-		[ 'HAVE_SUPPORTED_56000baseLR4_Full', 'linux/ethtool.h',
-		'SUPPORTED_56000baseLR4_Full' ],
-		[ 'HAVE_ETHTOOL_LINK_MODE_25G', 'linux/ethtool.h',
-		'ETHTOOL_LINK_MODE_25000baseCR_Full_BIT' ],
-		[ 'HAVE_ETHTOOL_LINK_MODE_50G', 'linux/ethtool.h',
-		'ETHTOOL_LINK_MODE_50000baseCR2_Full_BIT' ],
-		[ 'HAVE_ETHTOOL_LINK_MODE_100G', 'linux/ethtool.h',
-		'ETHTOOL_LINK_MODE_100000baseKR4_Full_BIT' ],
-		[ 'HAVE_IFLA_NUM_VF', 'linux/if_link.h',
-		'IFLA_NUM_VF' ],
-		[ 'HAVE_IFLA_EXT_MASK', 'linux/if_link.h',
-		'IFLA_EXT_MASK' ],
-		[ 'HAVE_IFLA_PHYS_SWITCH_ID', 'linux/if_link.h',
-		'IFLA_PHYS_SWITCH_ID' ],
-		[ 'HAVE_IFLA_PHYS_PORT_NAME', 'linux/if_link.h',
-		'IFLA_PHYS_PORT_NAME' ],
-		[ 'HAVE_RDMA_NL_NLDEV', 'rdma/rdma_netlink.h',
-		'RDMA_NL_NLDEV' ],
-		[ 'HAVE_RDMA_NLDEV_CMD_GET', 'rdma/rdma_netlink.h',
-		'RDMA_NLDEV_CMD_GET' ],
-		[ 'HAVE_RDMA_NLDEV_CMD_PORT_GET', 'rdma/rdma_netlink.h',
-		'RDMA_NLDEV_CMD_PORT_GET' ],
-		[ 'HAVE_RDMA_NLDEV_ATTR_DEV_INDEX', 'rdma/rdma_netlink.h',
-		'RDMA_NLDEV_ATTR_DEV_INDEX' ],
-		[ 'HAVE_RDMA_NLDEV_ATTR_DEV_NAME', 'rdma/rdma_netlink.h',
-		'RDMA_NLDEV_ATTR_DEV_NAME' ],
-		[ 'HAVE_RDMA_NLDEV_ATTR_PORT_INDEX', 'rdma/rdma_netlink.h',
-		'RDMA_NLDEV_ATTR_PORT_INDEX' ],
-		[ 'HAVE_RDMA_NLDEV_ATTR_NDEV_INDEX', 'rdma/rdma_netlink.h',
-		'RDMA_NLDEV_ATTR_NDEV_INDEX' ],
-		[ 'HAVE_MLX5_DR_FLOW_DUMP', 'infiniband/mlx5dv.h',
-		'mlx5dv_dump_dr_domain'],
-	]
-	config = configuration_data()
-	foreach arg:has_sym_args
-		config.set(arg[0], cc.has_header_symbol(arg[1], arg[2],
-			dependencies: libs))
-	endforeach
-	foreach arg:has_member_args
-		file_prefix = '#include <' + arg[1] + '>'
-		config.set(arg[0], cc.has_member(arg[2], arg[3],
-			prefix : file_prefix, dependencies: libs))
-	endforeach
-	configure_file(output : 'mlx5_autoconf.h', configuration : config)
-endif
-# Build Glue Library
-if pmd_dlopen and build
-	dlopen_name = 'mlx5_glue'
-	dlopen_lib_name = driver_name_fmt.format(dlopen_name)
-	dlopen_so_version = LIB_GLUE_VERSION
-	dlopen_sources = files('mlx5_glue.c')
-	dlopen_install_dir = [ eal_pmd_path + '-glue' ]
-	dlopen_includes = [global_inc]
-	dlopen_includes += include_directories(
-		'../../../lib/librte_eal/common/include/generic',
-	)
-	shared_lib = shared_library(
-		dlopen_lib_name,
-		dlopen_sources,
-		include_directories: dlopen_includes,
-		c_args: cflags,
-		dependencies: libs,
-		link_args: [
-		'-Wl,-export-dynamic',
-		'-Wl,-h,@0@'.format(LIB_GLUE),
-		],
-		soversion: dlopen_so_version,
-		install: true,
-		install_dir: dlopen_install_dir,
-	)
+if get_option('buildtype').contains('debug')
+	cflags += [ '-pedantic', '-UNDEBUG', '-DPEDANTIC' ]
+else
+	cflags += [ '-DNDEBUG', '-UPEDANTIC' ]
 endif
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 7126edf..7cf357d 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -38,15 +38,16 @@
 #include <rte_string_fns.h>
 #include <rte_alarm.h>
 
+#include <mlx5_glue.h>
+#include <mlx5_devx_cmds.h>
+
+#include "mlx5_defs.h"
 #include "mlx5.h"
 #include "mlx5_utils.h"
 #include "mlx5_rxtx.h"
 #include "mlx5_autoconf.h"
-#include "mlx5_defs.h"
-#include "mlx5_glue.h"
 #include "mlx5_mr.h"
 #include "mlx5_flow.h"
-#include "mlx5_devx_cmds.h"
 
 /* Device parameter to enable RX completion queue compression. */
 #define MLX5_RXQ_CQE_COMP_EN "rxq_cqe_comp_en"
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 4d0485d..872fccb 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -32,13 +32,14 @@
 #include <rte_errno.h>
 #include <rte_flow.h>
 
+#include <mlx5_glue.h>
+#include <mlx5_devx_cmds.h>
+#include <mlx5_prm.h>
+
+#include "mlx5_defs.h"
 #include "mlx5_utils.h"
 #include "mlx5_mr.h"
 #include "mlx5_autoconf.h"
-#include "mlx5_defs.h"
-#include "mlx5_glue.h"
-#include "mlx5_prm.h"
-#include "mlx5_devx_cmds.h"
 
 enum {
 	PCI_VENDOR_ID_MELLANOX = 0x15b3,
diff --git a/drivers/net/mlx5/mlx5_devx_cmds.c b/drivers/net/mlx5/mlx5_devx_cmds.c
deleted file mode 100644
index 62ca590..0000000
--- a/drivers/net/mlx5/mlx5_devx_cmds.c
+++ /dev/null
@@ -1,976 +0,0 @@
-// SPDX-License-Identifier: BSD-3-Clause
-/* Copyright 2018 Mellanox Technologies, Ltd */
-
-#include <unistd.h>
-
-#include <rte_flow_driver.h>
-#include <rte_malloc.h>
-
-#include "mlx5_prm.h"
-#include "mlx5_devx_cmds.h"
-#include "mlx5_utils.h"
-
-
-/**
- * Allocate flow counters via devx interface.
- *
- * @param[in] ctx
- *   ibv contexts returned from mlx5dv_open_device.
- * @param dcs
- *   Pointer to counters properties structure to be filled by the routine.
- * @param bulk_n_128
- *   Bulk counter numbers in 128 counters units.
- *
- * @return
- *   Pointer to counter object on success, a negative value otherwise and
- *   rte_errno is set.
- */
-struct mlx5_devx_obj *
-mlx5_devx_cmd_flow_counter_alloc(struct ibv_context *ctx, uint32_t bulk_n_128)
-{
-	struct mlx5_devx_obj *dcs = rte_zmalloc("dcs", sizeof(*dcs), 0);
-	uint32_t in[MLX5_ST_SZ_DW(alloc_flow_counter_in)]   = {0};
-	uint32_t out[MLX5_ST_SZ_DW(alloc_flow_counter_out)] = {0};
-
-	if (!dcs) {
-		rte_errno = ENOMEM;
-		return NULL;
-	}
-	MLX5_SET(alloc_flow_counter_in, in, opcode,
-		 MLX5_CMD_OP_ALLOC_FLOW_COUNTER);
-	MLX5_SET(alloc_flow_counter_in, in, flow_counter_bulk, bulk_n_128);
-	dcs->obj = mlx5_glue->devx_obj_create(ctx, in,
-					      sizeof(in), out, sizeof(out));
-	if (!dcs->obj) {
-		DRV_LOG(ERR, "Can't allocate counters - error %d", errno);
-		rte_errno = errno;
-		rte_free(dcs);
-		return NULL;
-	}
-	dcs->id = MLX5_GET(alloc_flow_counter_out, out, flow_counter_id);
-	return dcs;
-}
-
-/**
- * Query flow counters values.
- *
- * @param[in] dcs
- *   devx object that was obtained from mlx5_devx_cmd_fc_alloc.
- * @param[in] clear
- *   Whether hardware should clear the counters after the query or not.
- * @param[in] n_counters
- *   0 in case of 1 counter to read, otherwise the counter number to read.
- *  @param pkts
- *   The number of packets that matched the flow.
- *  @param bytes
- *    The number of bytes that matched the flow.
- *  @param mkey
- *   The mkey key for batch query.
- *  @param addr
- *    The address in the mkey range for batch query.
- *  @param cmd_comp
- *   The completion object for asynchronous batch query.
- *  @param async_id
- *    The ID to be returned in the asynchronous batch query response.
- *
- * @return
- *   0 on success, a negative value otherwise.
- */
-int
-mlx5_devx_cmd_flow_counter_query(struct mlx5_devx_obj *dcs,
-				 int clear, uint32_t n_counters,
-				 uint64_t *pkts, uint64_t *bytes,
-				 uint32_t mkey, void *addr,
-				 struct mlx5dv_devx_cmd_comp *cmd_comp,
-				 uint64_t async_id)
-{
-	int out_len = MLX5_ST_SZ_BYTES(query_flow_counter_out) +
-			MLX5_ST_SZ_BYTES(traffic_counter);
-	uint32_t out[out_len];
-	uint32_t in[MLX5_ST_SZ_DW(query_flow_counter_in)] = {0};
-	void *stats;
-	int rc;
-
-	MLX5_SET(query_flow_counter_in, in, opcode,
-		 MLX5_CMD_OP_QUERY_FLOW_COUNTER);
-	MLX5_SET(query_flow_counter_in, in, op_mod, 0);
-	MLX5_SET(query_flow_counter_in, in, flow_counter_id, dcs->id);
-	MLX5_SET(query_flow_counter_in, in, clear, !!clear);
-
-	if (n_counters) {
-		MLX5_SET(query_flow_counter_in, in, num_of_counters,
-			 n_counters);
-		MLX5_SET(query_flow_counter_in, in, dump_to_memory, 1);
-		MLX5_SET(query_flow_counter_in, in, mkey, mkey);
-		MLX5_SET64(query_flow_counter_in, in, address,
-			   (uint64_t)(uintptr_t)addr);
-	}
-	if (!cmd_comp)
-		rc = mlx5_glue->devx_obj_query(dcs->obj, in, sizeof(in), out,
-					       out_len);
-	else
-		rc = mlx5_glue->devx_obj_query_async(dcs->obj, in, sizeof(in),
-						     out_len, async_id,
-						     cmd_comp);
-	if (rc) {
-		DRV_LOG(ERR, "Failed to query devx counters with rc %d", rc);
-		rte_errno = rc;
-		return -rc;
-	}
-	if (!n_counters) {
-		stats = MLX5_ADDR_OF(query_flow_counter_out,
-				     out, flow_statistics);
-		*pkts = MLX5_GET64(traffic_counter, stats, packets);
-		*bytes = MLX5_GET64(traffic_counter, stats, octets);
-	}
-	return 0;
-}
-
-/**
- * Create a new mkey.
- *
- * @param[in] ctx
- *   ibv contexts returned from mlx5dv_open_device.
- * @param[in] attr
- *   Attributes of the requested mkey.
- *
- * @return
- *   Pointer to Devx mkey on success, a negative value otherwise and rte_errno
- *   is set.
- */
-struct mlx5_devx_obj *
-mlx5_devx_cmd_mkey_create(struct ibv_context *ctx,
-			  struct mlx5_devx_mkey_attr *attr)
-{
-	uint32_t in[MLX5_ST_SZ_DW(create_mkey_in)] = {0};
-	uint32_t out[MLX5_ST_SZ_DW(create_mkey_out)] = {0};
-	void *mkc;
-	struct mlx5_devx_obj *mkey = rte_zmalloc("mkey", sizeof(*mkey), 0);
-	size_t pgsize;
-	uint32_t translation_size;
-
-	if (!mkey) {
-		rte_errno = ENOMEM;
-		return NULL;
-	}
-	pgsize = sysconf(_SC_PAGESIZE);
-	translation_size = (RTE_ALIGN(attr->size, pgsize) * 8) / 16;
-	MLX5_SET(create_mkey_in, in, opcode, MLX5_CMD_OP_CREATE_MKEY);
-	MLX5_SET(create_mkey_in, in, translations_octword_actual_size,
-		 translation_size);
-	MLX5_SET(create_mkey_in, in, mkey_umem_id, attr->umem_id);
-	mkc = MLX5_ADDR_OF(create_mkey_in, in, memory_key_mkey_entry);
-	MLX5_SET(mkc, mkc, lw, 0x1);
-	MLX5_SET(mkc, mkc, lr, 0x1);
-	MLX5_SET(mkc, mkc, access_mode_1_0, MLX5_MKC_ACCESS_MODE_MTT);
-	MLX5_SET(mkc, mkc, qpn, 0xffffff);
-	MLX5_SET(mkc, mkc, pd, attr->pd);
-	MLX5_SET(mkc, mkc, mkey_7_0, attr->umem_id & 0xFF);
-	MLX5_SET(mkc, mkc, translations_octword_size, translation_size);
-	MLX5_SET64(mkc, mkc, start_addr, attr->addr);
-	MLX5_SET64(mkc, mkc, len, attr->size);
-	MLX5_SET(mkc, mkc, log_page_size, rte_log2_u32(pgsize));
-	mkey->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in), out,
-					       sizeof(out));
-	if (!mkey->obj) {
-		DRV_LOG(ERR, "Can't create mkey - error %d", errno);
-		rte_errno = errno;
-		rte_free(mkey);
-		return NULL;
-	}
-	mkey->id = MLX5_GET(create_mkey_out, out, mkey_index);
-	mkey->id = (mkey->id << 8) | (attr->umem_id & 0xFF);
-	return mkey;
-}
-
-/**
- * Get status of devx command response.
- * Mainly used for asynchronous commands.
- *
- * @param[in] out
- *   The out response buffer.
- *
- * @return
- *   0 on success, non-zero value otherwise.
- */
-int
-mlx5_devx_get_out_command_status(void *out)
-{
-	int status;
-
-	if (!out)
-		return -EINVAL;
-	status = MLX5_GET(query_flow_counter_out, out, status);
-	if (status) {
-		int syndrome = MLX5_GET(query_flow_counter_out, out, syndrome);
-
-		DRV_LOG(ERR, "Bad devX status %x, syndrome = %x", status,
-			syndrome);
-	}
-	return status;
-}
-
-/**
- * Destroy any object allocated by a Devx API.
- *
- * @param[in] obj
- *   Pointer to a general object.
- *
- * @return
- *   0 on success, a negative value otherwise.
- */
-int
-mlx5_devx_cmd_destroy(struct mlx5_devx_obj *obj)
-{
-	int ret;
-
-	if (!obj)
-		return 0;
-	ret =  mlx5_glue->devx_obj_destroy(obj->obj);
-	rte_free(obj);
-	return ret;
-}
-
-/**
- * Query NIC vport context.
- * Fills minimal inline attribute.
- *
- * @param[in] ctx
- *   ibv contexts returned from mlx5dv_open_device.
- * @param[in] vport
- *   vport index
- * @param[out] attr
- *   Attributes device values.
- *
- * @return
- *   0 on success, a negative value otherwise.
- */
-static int
-mlx5_devx_cmd_query_nic_vport_context(struct ibv_context *ctx,
-				      unsigned int vport,
-				      struct mlx5_hca_attr *attr)
-{
-	uint32_t in[MLX5_ST_SZ_DW(query_nic_vport_context_in)] = {0};
-	uint32_t out[MLX5_ST_SZ_DW(query_nic_vport_context_out)] = {0};
-	void *vctx;
-	int status, syndrome, rc;
-
-	/* Query NIC vport context to determine inline mode. */
-	MLX5_SET(query_nic_vport_context_in, in, opcode,
-		 MLX5_CMD_OP_QUERY_NIC_VPORT_CONTEXT);
-	MLX5_SET(query_nic_vport_context_in, in, vport_number, vport);
-	if (vport)
-		MLX5_SET(query_nic_vport_context_in, in, other_vport, 1);
-	rc = mlx5_glue->devx_general_cmd(ctx,
-					 in, sizeof(in),
-					 out, sizeof(out));
-	if (rc)
-		goto error;
-	status = MLX5_GET(query_nic_vport_context_out, out, status);
-	syndrome = MLX5_GET(query_nic_vport_context_out, out, syndrome);
-	if (status) {
-		DRV_LOG(DEBUG, "Failed to query NIC vport context, "
-			"status %x, syndrome = %x",
-			status, syndrome);
-		return -1;
-	}
-	vctx = MLX5_ADDR_OF(query_nic_vport_context_out, out,
-			    nic_vport_context);
-	attr->vport_inline_mode = MLX5_GET(nic_vport_context, vctx,
-					   min_wqe_inline_mode);
-	return 0;
-error:
-	rc = (rc > 0) ? -rc : rc;
-	return rc;
-}
-
-/**
- * Query HCA attributes.
- * Using those attributes we can check on run time if the device
- * is having the required capabilities.
- *
- * @param[in] ctx
- *   ibv contexts returned from mlx5dv_open_device.
- * @param[out] attr
- *   Attributes device values.
- *
- * @return
- *   0 on success, a negative value otherwise.
- */
-int
-mlx5_devx_cmd_query_hca_attr(struct ibv_context *ctx,
-			     struct mlx5_hca_attr *attr)
-{
-	uint32_t in[MLX5_ST_SZ_DW(query_hca_cap_in)] = {0};
-	uint32_t out[MLX5_ST_SZ_DW(query_hca_cap_out)] = {0};
-	void *hcattr;
-	int status, syndrome, rc;
-
-	MLX5_SET(query_hca_cap_in, in, opcode, MLX5_CMD_OP_QUERY_HCA_CAP);
-	MLX5_SET(query_hca_cap_in, in, op_mod,
-		 MLX5_GET_HCA_CAP_OP_MOD_GENERAL_DEVICE |
-		 MLX5_HCA_CAP_OPMOD_GET_CUR);
-
-	rc = mlx5_glue->devx_general_cmd(ctx,
-					 in, sizeof(in), out, sizeof(out));
-	if (rc)
-		goto error;
-	status = MLX5_GET(query_hca_cap_out, out, status);
-	syndrome = MLX5_GET(query_hca_cap_out, out, syndrome);
-	if (status) {
-		DRV_LOG(DEBUG, "Failed to query devx HCA capabilities, "
-			"status %x, syndrome = %x",
-			status, syndrome);
-		return -1;
-	}
-	hcattr = MLX5_ADDR_OF(query_hca_cap_out, out, capability);
-	attr->flow_counter_bulk_alloc_bitmap =
-			MLX5_GET(cmd_hca_cap, hcattr, flow_counter_bulk_alloc);
-	attr->flow_counters_dump = MLX5_GET(cmd_hca_cap, hcattr,
-					    flow_counters_dump);
-	attr->eswitch_manager = MLX5_GET(cmd_hca_cap, hcattr, eswitch_manager);
-	attr->hairpin = MLX5_GET(cmd_hca_cap, hcattr, hairpin);
-	attr->log_max_hairpin_queues = MLX5_GET(cmd_hca_cap, hcattr,
-						log_max_hairpin_queues);
-	attr->log_max_hairpin_wq_data_sz = MLX5_GET(cmd_hca_cap, hcattr,
-						    log_max_hairpin_wq_data_sz);
-	attr->log_max_hairpin_num_packets = MLX5_GET
-		(cmd_hca_cap, hcattr, log_min_hairpin_wq_data_sz);
-	attr->vhca_id = MLX5_GET(cmd_hca_cap, hcattr, vhca_id);
-	attr->eth_net_offloads = MLX5_GET(cmd_hca_cap, hcattr,
-					  eth_net_offloads);
-	attr->eth_virt = MLX5_GET(cmd_hca_cap, hcattr, eth_virt);
-	attr->flex_parser_protocols = MLX5_GET(cmd_hca_cap, hcattr,
-					       flex_parser_protocols);
-	attr->qos.sup = MLX5_GET(cmd_hca_cap, hcattr, qos);
-	if (attr->qos.sup) {
-		MLX5_SET(query_hca_cap_in, in, op_mod,
-			 MLX5_GET_HCA_CAP_OP_MOD_QOS_CAP |
-			 MLX5_HCA_CAP_OPMOD_GET_CUR);
-		rc = mlx5_glue->devx_general_cmd(ctx, in, sizeof(in),
-						 out, sizeof(out));
-		if (rc)
-			goto error;
-		if (status) {
-			DRV_LOG(DEBUG, "Failed to query devx QOS capabilities,"
-				" status %x, syndrome = %x",
-				status, syndrome);
-			return -1;
-		}
-		hcattr = MLX5_ADDR_OF(query_hca_cap_out, out, capability);
-		attr->qos.srtcm_sup =
-				MLX5_GET(qos_cap, hcattr, flow_meter_srtcm);
-		attr->qos.log_max_flow_meter =
-				MLX5_GET(qos_cap, hcattr, log_max_flow_meter);
-		attr->qos.flow_meter_reg_c_ids =
-			MLX5_GET(qos_cap, hcattr, flow_meter_reg_id);
-		attr->qos.flow_meter_reg_share =
-			MLX5_GET(qos_cap, hcattr, flow_meter_reg_share);
-	}
-	if (!attr->eth_net_offloads)
-		return 0;
-
-	/* Query HCA offloads for Ethernet protocol. */
-	memset(in, 0, sizeof(in));
-	memset(out, 0, sizeof(out));
-	MLX5_SET(query_hca_cap_in, in, opcode, MLX5_CMD_OP_QUERY_HCA_CAP);
-	MLX5_SET(query_hca_cap_in, in, op_mod,
-		 MLX5_GET_HCA_CAP_OP_MOD_ETHERNET_OFFLOAD_CAPS |
-		 MLX5_HCA_CAP_OPMOD_GET_CUR);
-
-	rc = mlx5_glue->devx_general_cmd(ctx,
-					 in, sizeof(in),
-					 out, sizeof(out));
-	if (rc) {
-		attr->eth_net_offloads = 0;
-		goto error;
-	}
-	status = MLX5_GET(query_hca_cap_out, out, status);
-	syndrome = MLX5_GET(query_hca_cap_out, out, syndrome);
-	if (status) {
-		DRV_LOG(DEBUG, "Failed to query devx HCA capabilities, "
-			"status %x, syndrome = %x",
-			status, syndrome);
-		attr->eth_net_offloads = 0;
-		return -1;
-	}
-	hcattr = MLX5_ADDR_OF(query_hca_cap_out, out, capability);
-	attr->wqe_vlan_insert = MLX5_GET(per_protocol_networking_offload_caps,
-					 hcattr, wqe_vlan_insert);
-	attr->lro_cap = MLX5_GET(per_protocol_networking_offload_caps, hcattr,
-				 lro_cap);
-	attr->tunnel_lro_gre = MLX5_GET(per_protocol_networking_offload_caps,
-					hcattr, tunnel_lro_gre);
-	attr->tunnel_lro_vxlan = MLX5_GET(per_protocol_networking_offload_caps,
-					  hcattr, tunnel_lro_vxlan);
-	attr->lro_max_msg_sz_mode = MLX5_GET
-					(per_protocol_networking_offload_caps,
-					 hcattr, lro_max_msg_sz_mode);
-	for (int i = 0 ; i < MLX5_LRO_NUM_SUPP_PERIODS ; i++) {
-		attr->lro_timer_supported_periods[i] =
-			MLX5_GET(per_protocol_networking_offload_caps, hcattr,
-				 lro_timer_supported_periods[i]);
-	}
-	attr->tunnel_stateless_geneve_rx =
-			    MLX5_GET(per_protocol_networking_offload_caps,
-				     hcattr, tunnel_stateless_geneve_rx);
-	attr->geneve_max_opt_len =
-		    MLX5_GET(per_protocol_networking_offload_caps,
-			     hcattr, max_geneve_opt_len);
-	attr->wqe_inline_mode = MLX5_GET(per_protocol_networking_offload_caps,
-					 hcattr, wqe_inline_mode);
-	attr->tunnel_stateless_gtp = MLX5_GET
-					(per_protocol_networking_offload_caps,
-					 hcattr, tunnel_stateless_gtp);
-	if (attr->wqe_inline_mode != MLX5_CAP_INLINE_MODE_VPORT_CONTEXT)
-		return 0;
-	if (attr->eth_virt) {
-		rc = mlx5_devx_cmd_query_nic_vport_context(ctx, 0, attr);
-		if (rc) {
-			attr->eth_virt = 0;
-			goto error;
-		}
-	}
-	return 0;
-error:
-	rc = (rc > 0) ? -rc : rc;
-	return rc;
-}
-
-/**
- * Query TIS transport domain from QP verbs object using DevX API.
- *
- * @param[in] qp
- *   Pointer to verbs QP returned by ibv_create_qp .
- * @param[in] tis_num
- *   TIS number of TIS to query.
- * @param[out] tis_td
- *   Pointer to TIS transport domain variable, to be set by the routine.
- *
- * @return
- *   0 on success, a negative value otherwise.
- */
-int
-mlx5_devx_cmd_qp_query_tis_td(struct ibv_qp *qp, uint32_t tis_num,
-			      uint32_t *tis_td)
-{
-	uint32_t in[MLX5_ST_SZ_DW(query_tis_in)] = {0};
-	uint32_t out[MLX5_ST_SZ_DW(query_tis_out)] = {0};
-	int rc;
-	void *tis_ctx;
-
-	MLX5_SET(query_tis_in, in, opcode, MLX5_CMD_OP_QUERY_TIS);
-	MLX5_SET(query_tis_in, in, tisn, tis_num);
-	rc = mlx5_glue->devx_qp_query(qp, in, sizeof(in), out, sizeof(out));
-	if (rc) {
-		DRV_LOG(ERR, "Failed to query QP using DevX");
-		return -rc;
-	};
-	tis_ctx = MLX5_ADDR_OF(query_tis_out, out, tis_context);
-	*tis_td = MLX5_GET(tisc, tis_ctx, transport_domain);
-	return 0;
-}
-
-/**
- * Fill WQ data for DevX API command.
- * Utility function for use when creating DevX objects containing a WQ.
- *
- * @param[in] wq_ctx
- *   Pointer to WQ context to fill with data.
- * @param [in] wq_attr
- *   Pointer to WQ attributes structure to fill in WQ context.
- */
-static void
-devx_cmd_fill_wq_data(void *wq_ctx, struct mlx5_devx_wq_attr *wq_attr)
-{
-	MLX5_SET(wq, wq_ctx, wq_type, wq_attr->wq_type);
-	MLX5_SET(wq, wq_ctx, wq_signature, wq_attr->wq_signature);
-	MLX5_SET(wq, wq_ctx, end_padding_mode, wq_attr->end_padding_mode);
-	MLX5_SET(wq, wq_ctx, cd_slave, wq_attr->cd_slave);
-	MLX5_SET(wq, wq_ctx, hds_skip_first_sge, wq_attr->hds_skip_first_sge);
-	MLX5_SET(wq, wq_ctx, log2_hds_buf_size, wq_attr->log2_hds_buf_size);
-	MLX5_SET(wq, wq_ctx, page_offset, wq_attr->page_offset);
-	MLX5_SET(wq, wq_ctx, lwm, wq_attr->lwm);
-	MLX5_SET(wq, wq_ctx, pd, wq_attr->pd);
-	MLX5_SET(wq, wq_ctx, uar_page, wq_attr->uar_page);
-	MLX5_SET64(wq, wq_ctx, dbr_addr, wq_attr->dbr_addr);
-	MLX5_SET(wq, wq_ctx, hw_counter, wq_attr->hw_counter);
-	MLX5_SET(wq, wq_ctx, sw_counter, wq_attr->sw_counter);
-	MLX5_SET(wq, wq_ctx, log_wq_stride, wq_attr->log_wq_stride);
-	MLX5_SET(wq, wq_ctx, log_wq_pg_sz, wq_attr->log_wq_pg_sz);
-	MLX5_SET(wq, wq_ctx, log_wq_sz, wq_attr->log_wq_sz);
-	MLX5_SET(wq, wq_ctx, dbr_umem_valid, wq_attr->dbr_umem_valid);
-	MLX5_SET(wq, wq_ctx, wq_umem_valid, wq_attr->wq_umem_valid);
-	MLX5_SET(wq, wq_ctx, log_hairpin_num_packets,
-		 wq_attr->log_hairpin_num_packets);
-	MLX5_SET(wq, wq_ctx, log_hairpin_data_sz, wq_attr->log_hairpin_data_sz);
-	MLX5_SET(wq, wq_ctx, single_wqe_log_num_of_strides,
-		 wq_attr->single_wqe_log_num_of_strides);
-	MLX5_SET(wq, wq_ctx, two_byte_shift_en, wq_attr->two_byte_shift_en);
-	MLX5_SET(wq, wq_ctx, single_stride_log_num_of_bytes,
-		 wq_attr->single_stride_log_num_of_bytes);
-	MLX5_SET(wq, wq_ctx, dbr_umem_id, wq_attr->dbr_umem_id);
-	MLX5_SET(wq, wq_ctx, wq_umem_id, wq_attr->wq_umem_id);
-	MLX5_SET64(wq, wq_ctx, wq_umem_offset, wq_attr->wq_umem_offset);
-}
-
-/**
- * Create RQ using DevX API.
- *
- * @param[in] ctx
- *   ibv_context returned from mlx5dv_open_device.
- * @param [in] rq_attr
- *   Pointer to create RQ attributes structure.
- * @param [in] socket
- *   CPU socket ID for allocations.
- *
- * @return
- *   The DevX object created, NULL otherwise and rte_errno is set.
- */
-struct mlx5_devx_obj *
-mlx5_devx_cmd_create_rq(struct ibv_context *ctx,
-			struct mlx5_devx_create_rq_attr *rq_attr,
-			int socket)
-{
-	uint32_t in[MLX5_ST_SZ_DW(create_rq_in)] = {0};
-	uint32_t out[MLX5_ST_SZ_DW(create_rq_out)] = {0};
-	void *rq_ctx, *wq_ctx;
-	struct mlx5_devx_wq_attr *wq_attr;
-	struct mlx5_devx_obj *rq = NULL;
-
-	rq = rte_calloc_socket(__func__, 1, sizeof(*rq), 0, socket);
-	if (!rq) {
-		DRV_LOG(ERR, "Failed to allocate RQ data");
-		rte_errno = ENOMEM;
-		return NULL;
-	}
-	MLX5_SET(create_rq_in, in, opcode, MLX5_CMD_OP_CREATE_RQ);
-	rq_ctx = MLX5_ADDR_OF(create_rq_in, in, ctx);
-	MLX5_SET(rqc, rq_ctx, rlky, rq_attr->rlky);
-	MLX5_SET(rqc, rq_ctx, delay_drop_en, rq_attr->delay_drop_en);
-	MLX5_SET(rqc, rq_ctx, scatter_fcs, rq_attr->scatter_fcs);
-	MLX5_SET(rqc, rq_ctx, vsd, rq_attr->vsd);
-	MLX5_SET(rqc, rq_ctx, mem_rq_type, rq_attr->mem_rq_type);
-	MLX5_SET(rqc, rq_ctx, state, rq_attr->state);
-	MLX5_SET(rqc, rq_ctx, flush_in_error_en, rq_attr->flush_in_error_en);
-	MLX5_SET(rqc, rq_ctx, hairpin, rq_attr->hairpin);
-	MLX5_SET(rqc, rq_ctx, user_index, rq_attr->user_index);
-	MLX5_SET(rqc, rq_ctx, cqn, rq_attr->cqn);
-	MLX5_SET(rqc, rq_ctx, counter_set_id, rq_attr->counter_set_id);
-	MLX5_SET(rqc, rq_ctx, rmpn, rq_attr->rmpn);
-	wq_ctx = MLX5_ADDR_OF(rqc, rq_ctx, wq);
-	wq_attr = &rq_attr->wq_attr;
-	devx_cmd_fill_wq_data(wq_ctx, wq_attr);
-	rq->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in),
-						  out, sizeof(out));
-	if (!rq->obj) {
-		DRV_LOG(ERR, "Failed to create RQ using DevX");
-		rte_errno = errno;
-		rte_free(rq);
-		return NULL;
-	}
-	rq->id = MLX5_GET(create_rq_out, out, rqn);
-	return rq;
-}
-
-/**
- * Modify RQ using DevX API.
- *
- * @param[in] rq
- *   Pointer to RQ object structure.
- * @param [in] rq_attr
- *   Pointer to modify RQ attributes structure.
- *
- * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
- */
-int
-mlx5_devx_cmd_modify_rq(struct mlx5_devx_obj *rq,
-			struct mlx5_devx_modify_rq_attr *rq_attr)
-{
-	uint32_t in[MLX5_ST_SZ_DW(modify_rq_in)] = {0};
-	uint32_t out[MLX5_ST_SZ_DW(modify_rq_out)] = {0};
-	void *rq_ctx, *wq_ctx;
-	int ret;
-
-	MLX5_SET(modify_rq_in, in, opcode, MLX5_CMD_OP_MODIFY_RQ);
-	MLX5_SET(modify_rq_in, in, rq_state, rq_attr->rq_state);
-	MLX5_SET(modify_rq_in, in, rqn, rq->id);
-	MLX5_SET64(modify_rq_in, in, modify_bitmask, rq_attr->modify_bitmask);
-	rq_ctx = MLX5_ADDR_OF(modify_rq_in, in, ctx);
-	MLX5_SET(rqc, rq_ctx, state, rq_attr->state);
-	if (rq_attr->modify_bitmask &
-			MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_SCATTER_FCS)
-		MLX5_SET(rqc, rq_ctx, scatter_fcs, rq_attr->scatter_fcs);
-	if (rq_attr->modify_bitmask & MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_VSD)
-		MLX5_SET(rqc, rq_ctx, vsd, rq_attr->vsd);
-	if (rq_attr->modify_bitmask &
-			MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_RQ_COUNTER_SET_ID)
-		MLX5_SET(rqc, rq_ctx, counter_set_id, rq_attr->counter_set_id);
-	MLX5_SET(rqc, rq_ctx, hairpin_peer_sq, rq_attr->hairpin_peer_sq);
-	MLX5_SET(rqc, rq_ctx, hairpin_peer_vhca, rq_attr->hairpin_peer_vhca);
-	if (rq_attr->modify_bitmask & MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_WQ_LWM) {
-		wq_ctx = MLX5_ADDR_OF(rqc, rq_ctx, wq);
-		MLX5_SET(wq, wq_ctx, lwm, rq_attr->lwm);
-	}
-	ret = mlx5_glue->devx_obj_modify(rq->obj, in, sizeof(in),
-					 out, sizeof(out));
-	if (ret) {
-		DRV_LOG(ERR, "Failed to modify RQ using DevX");
-		rte_errno = errno;
-		return -errno;
-	}
-	return ret;
-}
-
-/**
- * Create TIR using DevX API.
- *
- * @param[in] ctx
- *   ibv_context returned from mlx5dv_open_device.
- * @param [in] tir_attr
- *   Pointer to TIR attributes structure.
- *
- * @return
- *   The DevX object created, NULL otherwise and rte_errno is set.
- */
-struct mlx5_devx_obj *
-mlx5_devx_cmd_create_tir(struct ibv_context *ctx,
-			 struct mlx5_devx_tir_attr *tir_attr)
-{
-	uint32_t in[MLX5_ST_SZ_DW(create_tir_in)] = {0};
-	uint32_t out[MLX5_ST_SZ_DW(create_tir_out)] = {0};
-	void *tir_ctx, *outer, *inner;
-	struct mlx5_devx_obj *tir = NULL;
-	int i;
-
-	tir = rte_calloc(__func__, 1, sizeof(*tir), 0);
-	if (!tir) {
-		DRV_LOG(ERR, "Failed to allocate TIR data");
-		rte_errno = ENOMEM;
-		return NULL;
-	}
-	MLX5_SET(create_tir_in, in, opcode, MLX5_CMD_OP_CREATE_TIR);
-	tir_ctx = MLX5_ADDR_OF(create_tir_in, in, ctx);
-	MLX5_SET(tirc, tir_ctx, disp_type, tir_attr->disp_type);
-	MLX5_SET(tirc, tir_ctx, lro_timeout_period_usecs,
-		 tir_attr->lro_timeout_period_usecs);
-	MLX5_SET(tirc, tir_ctx, lro_enable_mask, tir_attr->lro_enable_mask);
-	MLX5_SET(tirc, tir_ctx, lro_max_msg_sz, tir_attr->lro_max_msg_sz);
-	MLX5_SET(tirc, tir_ctx, inline_rqn, tir_attr->inline_rqn);
-	MLX5_SET(tirc, tir_ctx, rx_hash_symmetric, tir_attr->rx_hash_symmetric);
-	MLX5_SET(tirc, tir_ctx, tunneled_offload_en,
-		 tir_attr->tunneled_offload_en);
-	MLX5_SET(tirc, tir_ctx, indirect_table, tir_attr->indirect_table);
-	MLX5_SET(tirc, tir_ctx, rx_hash_fn, tir_attr->rx_hash_fn);
-	MLX5_SET(tirc, tir_ctx, self_lb_block, tir_attr->self_lb_block);
-	MLX5_SET(tirc, tir_ctx, transport_domain, tir_attr->transport_domain);
-	for (i = 0; i < 10; i++) {
-		MLX5_SET(tirc, tir_ctx, rx_hash_toeplitz_key[i],
-			 tir_attr->rx_hash_toeplitz_key[i]);
-	}
-	outer = MLX5_ADDR_OF(tirc, tir_ctx, rx_hash_field_selector_outer);
-	MLX5_SET(rx_hash_field_select, outer, l3_prot_type,
-		 tir_attr->rx_hash_field_selector_outer.l3_prot_type);
-	MLX5_SET(rx_hash_field_select, outer, l4_prot_type,
-		 tir_attr->rx_hash_field_selector_outer.l4_prot_type);
-	MLX5_SET(rx_hash_field_select, outer, selected_fields,
-		 tir_attr->rx_hash_field_selector_outer.selected_fields);
-	inner = MLX5_ADDR_OF(tirc, tir_ctx, rx_hash_field_selector_inner);
-	MLX5_SET(rx_hash_field_select, inner, l3_prot_type,
-		 tir_attr->rx_hash_field_selector_inner.l3_prot_type);
-	MLX5_SET(rx_hash_field_select, inner, l4_prot_type,
-		 tir_attr->rx_hash_field_selector_inner.l4_prot_type);
-	MLX5_SET(rx_hash_field_select, inner, selected_fields,
-		 tir_attr->rx_hash_field_selector_inner.selected_fields);
-	tir->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in),
-						   out, sizeof(out));
-	if (!tir->obj) {
-		DRV_LOG(ERR, "Failed to create TIR using DevX");
-		rte_errno = errno;
-		rte_free(tir);
-		return NULL;
-	}
-	tir->id = MLX5_GET(create_tir_out, out, tirn);
-	return tir;
-}
-
-/**
- * Create RQT using DevX API.
- *
- * @param[in] ctx
- *   ibv_context returned from mlx5dv_open_device.
- * @param [in] rqt_attr
- *   Pointer to RQT attributes structure.
- *
- * @return
- *   The DevX object created, NULL otherwise and rte_errno is set.
- */
-struct mlx5_devx_obj *
-mlx5_devx_cmd_create_rqt(struct ibv_context *ctx,
-			 struct mlx5_devx_rqt_attr *rqt_attr)
-{
-	uint32_t *in = NULL;
-	uint32_t inlen = MLX5_ST_SZ_BYTES(create_rqt_in) +
-			 rqt_attr->rqt_actual_size * sizeof(uint32_t);
-	uint32_t out[MLX5_ST_SZ_DW(create_rqt_out)] = {0};
-	void *rqt_ctx;
-	struct mlx5_devx_obj *rqt = NULL;
-	int i;
-
-	in = rte_calloc(__func__, 1, inlen, 0);
-	if (!in) {
-		DRV_LOG(ERR, "Failed to allocate RQT IN data");
-		rte_errno = ENOMEM;
-		return NULL;
-	}
-	rqt = rte_calloc(__func__, 1, sizeof(*rqt), 0);
-	if (!rqt) {
-		DRV_LOG(ERR, "Failed to allocate RQT data");
-		rte_errno = ENOMEM;
-		rte_free(in);
-		return NULL;
-	}
-	MLX5_SET(create_rqt_in, in, opcode, MLX5_CMD_OP_CREATE_RQT);
-	rqt_ctx = MLX5_ADDR_OF(create_rqt_in, in, rqt_context);
-	MLX5_SET(rqtc, rqt_ctx, rqt_max_size, rqt_attr->rqt_max_size);
-	MLX5_SET(rqtc, rqt_ctx, rqt_actual_size, rqt_attr->rqt_actual_size);
-	for (i = 0; i < rqt_attr->rqt_actual_size; i++)
-		MLX5_SET(rqtc, rqt_ctx, rq_num[i], rqt_attr->rq_list[i]);
-	rqt->obj = mlx5_glue->devx_obj_create(ctx, in, inlen, out, sizeof(out));
-	rte_free(in);
-	if (!rqt->obj) {
-		DRV_LOG(ERR, "Failed to create RQT using DevX");
-		rte_errno = errno;
-		rte_free(rqt);
-		return NULL;
-	}
-	rqt->id = MLX5_GET(create_rqt_out, out, rqtn);
-	return rqt;
-}
-
-/**
- * Create SQ using DevX API.
- *
- * @param[in] ctx
- *   ibv_context returned from mlx5dv_open_device.
- * @param [in] sq_attr
- *   Pointer to SQ attributes structure.
- *
- * @return
- *   The DevX object created, NULL otherwise and rte_errno is set.
- */
-struct mlx5_devx_obj *
-mlx5_devx_cmd_create_sq(struct ibv_context *ctx,
-			struct mlx5_devx_create_sq_attr *sq_attr)
-{
-	uint32_t in[MLX5_ST_SZ_DW(create_sq_in)] = {0};
-	uint32_t out[MLX5_ST_SZ_DW(create_sq_out)] = {0};
-	void *sq_ctx;
-	void *wq_ctx;
-	struct mlx5_devx_wq_attr *wq_attr;
-	struct mlx5_devx_obj *sq = NULL;
-
-	sq = rte_calloc(__func__, 1, sizeof(*sq), 0);
-	if (!sq) {
-		DRV_LOG(ERR, "Failed to allocate SQ data");
-		rte_errno = ENOMEM;
-		return NULL;
-	}
-	MLX5_SET(create_sq_in, in, opcode, MLX5_CMD_OP_CREATE_SQ);
-	sq_ctx = MLX5_ADDR_OF(create_sq_in, in, ctx);
-	MLX5_SET(sqc, sq_ctx, rlky, sq_attr->rlky);
-	MLX5_SET(sqc, sq_ctx, cd_master, sq_attr->cd_master);
-	MLX5_SET(sqc, sq_ctx, fre, sq_attr->fre);
-	MLX5_SET(sqc, sq_ctx, flush_in_error_en, sq_attr->flush_in_error_en);
-	MLX5_SET(sqc, sq_ctx, allow_multi_pkt_send_wqe,
-		 sq_attr->allow_multi_pkt_send_wqe);
-	MLX5_SET(sqc, sq_ctx, min_wqe_inline_mode,
-		 sq_attr->min_wqe_inline_mode);
-	MLX5_SET(sqc, sq_ctx, state, sq_attr->state);
-	MLX5_SET(sqc, sq_ctx, reg_umr, sq_attr->reg_umr);
-	MLX5_SET(sqc, sq_ctx, allow_swp, sq_attr->allow_swp);
-	MLX5_SET(sqc, sq_ctx, hairpin, sq_attr->hairpin);
-	MLX5_SET(sqc, sq_ctx, user_index, sq_attr->user_index);
-	MLX5_SET(sqc, sq_ctx, cqn, sq_attr->cqn);
-	MLX5_SET(sqc, sq_ctx, packet_pacing_rate_limit_index,
-		 sq_attr->packet_pacing_rate_limit_index);
-	MLX5_SET(sqc, sq_ctx, tis_lst_sz, sq_attr->tis_lst_sz);
-	MLX5_SET(sqc, sq_ctx, tis_num_0, sq_attr->tis_num);
-	wq_ctx = MLX5_ADDR_OF(sqc, sq_ctx, wq);
-	wq_attr = &sq_attr->wq_attr;
-	devx_cmd_fill_wq_data(wq_ctx, wq_attr);
-	sq->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in),
-					     out, sizeof(out));
-	if (!sq->obj) {
-		DRV_LOG(ERR, "Failed to create SQ using DevX");
-		rte_errno = errno;
-		rte_free(sq);
-		return NULL;
-	}
-	sq->id = MLX5_GET(create_sq_out, out, sqn);
-	return sq;
-}
-
-/**
- * Modify SQ using DevX API.
- *
- * @param[in] sq
- *   Pointer to SQ object structure.
- * @param [in] sq_attr
- *   Pointer to SQ attributes structure.
- *
- * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
- */
-int
-mlx5_devx_cmd_modify_sq(struct mlx5_devx_obj *sq,
-			struct mlx5_devx_modify_sq_attr *sq_attr)
-{
-	uint32_t in[MLX5_ST_SZ_DW(modify_sq_in)] = {0};
-	uint32_t out[MLX5_ST_SZ_DW(modify_sq_out)] = {0};
-	void *sq_ctx;
-	int ret;
-
-	MLX5_SET(modify_sq_in, in, opcode, MLX5_CMD_OP_MODIFY_SQ);
-	MLX5_SET(modify_sq_in, in, sq_state, sq_attr->sq_state);
-	MLX5_SET(modify_sq_in, in, sqn, sq->id);
-	sq_ctx = MLX5_ADDR_OF(modify_sq_in, in, ctx);
-	MLX5_SET(sqc, sq_ctx, state, sq_attr->state);
-	MLX5_SET(sqc, sq_ctx, hairpin_peer_rq, sq_attr->hairpin_peer_rq);
-	MLX5_SET(sqc, sq_ctx, hairpin_peer_vhca, sq_attr->hairpin_peer_vhca);
-	ret = mlx5_glue->devx_obj_modify(sq->obj, in, sizeof(in),
-					 out, sizeof(out));
-	if (ret) {
-		DRV_LOG(ERR, "Failed to modify SQ using DevX");
-		rte_errno = errno;
-		return -errno;
-	}
-	return ret;
-}
-
-/**
- * Create TIS using DevX API.
- *
- * @param[in] ctx
- *   ibv_context returned from mlx5dv_open_device.
- * @param [in] tis_attr
- *   Pointer to TIS attributes structure.
- *
- * @return
- *   The DevX object created, NULL otherwise and rte_errno is set.
- */
-struct mlx5_devx_obj *
-mlx5_devx_cmd_create_tis(struct ibv_context *ctx,
-			 struct mlx5_devx_tis_attr *tis_attr)
-{
-	uint32_t in[MLX5_ST_SZ_DW(create_tis_in)] = {0};
-	uint32_t out[MLX5_ST_SZ_DW(create_tis_out)] = {0};
-	struct mlx5_devx_obj *tis = NULL;
-	void *tis_ctx;
-
-	tis = rte_calloc(__func__, 1, sizeof(*tis), 0);
-	if (!tis) {
-		DRV_LOG(ERR, "Failed to allocate TIS object");
-		rte_errno = ENOMEM;
-		return NULL;
-	}
-	MLX5_SET(create_tis_in, in, opcode, MLX5_CMD_OP_CREATE_TIS);
-	tis_ctx = MLX5_ADDR_OF(create_tis_in, in, ctx);
-	MLX5_SET(tisc, tis_ctx, strict_lag_tx_port_affinity,
-		 tis_attr->strict_lag_tx_port_affinity);
-	MLX5_SET(tisc, tis_ctx, lag_tx_port_affinity,
-		 tis_attr->lag_tx_port_affinity);
-	MLX5_SET(tisc, tis_ctx, prio, tis_attr->prio);
-	MLX5_SET(tisc, tis_ctx, transport_domain,
-		 tis_attr->transport_domain);
-	tis->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in),
-					      out, sizeof(out));
-	if (!tis->obj) {
-		DRV_LOG(ERR, "Failed to create TIS using DevX");
-		rte_errno = errno;
-		rte_free(tis);
-		return NULL;
-	}
-	tis->id = MLX5_GET(create_tis_out, out, tisn);
-	return tis;
-}
-
-/**
- * Create transport domain using DevX API.
- *
- * @param[in] ctx
- *   ibv_context returned from mlx5dv_open_device.
- *
- * @return
- *   The DevX object created, NULL otherwise and rte_errno is set.
- */
-struct mlx5_devx_obj *
-mlx5_devx_cmd_create_td(struct ibv_context *ctx)
-{
-	uint32_t in[MLX5_ST_SZ_DW(alloc_transport_domain_in)] = {0};
-	uint32_t out[MLX5_ST_SZ_DW(alloc_transport_domain_out)] = {0};
-	struct mlx5_devx_obj *td = NULL;
-
-	td = rte_calloc(__func__, 1, sizeof(*td), 0);
-	if (!td) {
-		DRV_LOG(ERR, "Failed to allocate TD object");
-		rte_errno = ENOMEM;
-		return NULL;
-	}
-	MLX5_SET(alloc_transport_domain_in, in, opcode,
-		 MLX5_CMD_OP_ALLOC_TRANSPORT_DOMAIN);
-	td->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in),
-					     out, sizeof(out));
-	if (!td->obj) {
-		DRV_LOG(ERR, "Failed to create TD using DevX");
-		rte_errno = errno;
-		rte_free(td);
-		return NULL;
-	}
-	td->id = MLX5_GET(alloc_transport_domain_out, out,
-			   transport_domain);
-	return td;
-}
-
-/**
- * Dump all flows to file.
- *
- * @param[in] fdb_domain
- *   FDB domain.
- * @param[in] rx_domain
- *   RX domain.
- * @param[in] tx_domain
- *   TX domain.
- * @param[out] file
- *   Pointer to file stream.
- *
- * @return
- *   0 on success, a negative value otherwise.
- */
-int
-mlx5_devx_cmd_flow_dump(void *fdb_domain __rte_unused,
-			void *rx_domain __rte_unused,
-			void *tx_domain __rte_unused, FILE *file __rte_unused)
-{
-	int ret = 0;
-
-#ifdef HAVE_MLX5_DR_FLOW_DUMP
-	if (fdb_domain) {
-		ret = mlx5_glue->dr_dump_domain(file, fdb_domain);
-		if (ret)
-			return ret;
-	}
-	assert(rx_domain);
-	ret = mlx5_glue->dr_dump_domain(file, rx_domain);
-	if (ret)
-		return ret;
-	assert(tx_domain);
-	ret = mlx5_glue->dr_dump_domain(file, tx_domain);
-#else
-	ret = ENOTSUP;
-#endif
-	return -ret;
-}
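Note on mlx5_devx_cmd_create_rqt() above: its input buffer is sized as the
fixed create_rqt_in layout plus one 32-bit entry per RQ in rq_list. A minimal
sketch of that sizing pattern in plain C follows. The sketch_* names, the use
of calloc() instead of rte_calloc(), and the caller-supplied fixed_bytes value
are assumptions for illustration only and are not part of this patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical attribute layout modelled on struct mlx5_devx_rqt_attr. */
struct sketch_rqt_attr {
	uint32_t rqt_max_size:16;
	uint32_t rqt_actual_size:16;
	uint32_t rq_list[]; /* Flexible array member: one entry per RQ. */
};

/* Allocate zeroed attributes with room for n RQ numbers. */
static struct sketch_rqt_attr *
sketch_rqt_attr_alloc(uint16_t n)
{
	struct sketch_rqt_attr *attr;

	attr = calloc(1, sizeof(*attr) + (size_t)n * sizeof(attr->rq_list[0]));
	if (attr == NULL)
		return NULL;
	attr->rqt_max_size = n;
	attr->rqt_actual_size = n;
	return attr;
}

/* Command input length: fixed bytes plus one dword per listed RQ. */
static size_t
sketch_rqt_inlen(size_t fixed_bytes, const struct sketch_rqt_attr *attr)
{
	return fixed_bytes + attr->rqt_actual_size * sizeof(uint32_t);
}
```

A caller would allocate the attributes with sketch_rqt_attr_alloc(), fill
rq_list, and pass the resulting length to the command; the attributes are
released with free().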
diff --git a/drivers/net/mlx5/mlx5_devx_cmds.h b/drivers/net/mlx5/mlx5_devx_cmds.h
deleted file mode 100644
index 2d58d96..0000000
--- a/drivers/net/mlx5/mlx5_devx_cmds.h
+++ /dev/null
@@ -1,231 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause
- * Copyright 2019 Mellanox Technologies, Ltd
- */
-
-#ifndef RTE_PMD_MLX5_DEVX_CMDS_H_
-#define RTE_PMD_MLX5_DEVX_CMDS_H_
-
-#include "mlx5_glue.h"
-
-
-/* devX creation object */
-struct mlx5_devx_obj {
-	struct mlx5dv_devx_obj *obj; /* The DV object. */
-	int id; /* The object ID. */
-};
-
-struct mlx5_devx_mkey_attr {
-	uint64_t addr;
-	uint64_t size;
-	uint32_t umem_id;
-	uint32_t pd;
-};
-
-/* HCA qos attributes. */
-struct mlx5_hca_qos_attr {
-	uint32_t sup:1;	/* Whether QOS is supported. */
-	uint32_t srtcm_sup:1; /* Whether srTCM mode is supported. */
-	uint32_t flow_meter_reg_share:1;
-	/* Whether reg_c share is supported. */
-	uint8_t log_max_flow_meter;
-	/* Power of the maximum supported meters. */
-	uint8_t flow_meter_reg_c_ids;
-	/* Bitmap of the reg_Cs available for flow meter to use. */
-
-};
-
-/* HCA supports this number of time periods for LRO. */
-#define MLX5_LRO_NUM_SUPP_PERIODS 4
-
-/* HCA attributes. */
-struct mlx5_hca_attr {
-	uint32_t eswitch_manager:1;
-	uint32_t flow_counters_dump:1;
-	uint8_t flow_counter_bulk_alloc_bitmap;
-	uint32_t eth_net_offloads:1;
-	uint32_t eth_virt:1;
-	uint32_t wqe_vlan_insert:1;
-	uint32_t wqe_inline_mode:2;
-	uint32_t vport_inline_mode:3;
-	uint32_t tunnel_stateless_geneve_rx:1;
-	uint32_t geneve_max_opt_len:1; /* 0x0: 14DW, 0x1: 63DW */
-	uint32_t tunnel_stateless_gtp:1;
-	uint32_t lro_cap:1;
-	uint32_t tunnel_lro_gre:1;
-	uint32_t tunnel_lro_vxlan:1;
-	uint32_t lro_max_msg_sz_mode:2;
-	uint32_t lro_timer_supported_periods[MLX5_LRO_NUM_SUPP_PERIODS];
-	uint32_t flex_parser_protocols;
-	uint32_t hairpin:1;
-	uint32_t log_max_hairpin_queues:5;
-	uint32_t log_max_hairpin_wq_data_sz:5;
-	uint32_t log_max_hairpin_num_packets:5;
-	uint32_t vhca_id:16;
-	struct mlx5_hca_qos_attr qos;
-};
-
-struct mlx5_devx_wq_attr {
-	uint32_t wq_type:4;
-	uint32_t wq_signature:1;
-	uint32_t end_padding_mode:2;
-	uint32_t cd_slave:1;
-	uint32_t hds_skip_first_sge:1;
-	uint32_t log2_hds_buf_size:3;
-	uint32_t page_offset:5;
-	uint32_t lwm:16;
-	uint32_t pd:24;
-	uint32_t uar_page:24;
-	uint64_t dbr_addr;
-	uint32_t hw_counter;
-	uint32_t sw_counter;
-	uint32_t log_wq_stride:4;
-	uint32_t log_wq_pg_sz:5;
-	uint32_t log_wq_sz:5;
-	uint32_t dbr_umem_valid:1;
-	uint32_t wq_umem_valid:1;
-	uint32_t log_hairpin_num_packets:5;
-	uint32_t log_hairpin_data_sz:5;
-	uint32_t single_wqe_log_num_of_strides:4;
-	uint32_t two_byte_shift_en:1;
-	uint32_t single_stride_log_num_of_bytes:3;
-	uint32_t dbr_umem_id;
-	uint32_t wq_umem_id;
-	uint64_t wq_umem_offset;
-};
-
-/* Create RQ attributes structure, used by create RQ operation. */
-struct mlx5_devx_create_rq_attr {
-	uint32_t rlky:1;
-	uint32_t delay_drop_en:1;
-	uint32_t scatter_fcs:1;
-	uint32_t vsd:1;
-	uint32_t mem_rq_type:4;
-	uint32_t state:4;
-	uint32_t flush_in_error_en:1;
-	uint32_t hairpin:1;
-	uint32_t user_index:24;
-	uint32_t cqn:24;
-	uint32_t counter_set_id:8;
-	uint32_t rmpn:24;
-	struct mlx5_devx_wq_attr wq_attr;
-};
-
-/* Modify RQ attributes structure, used by modify RQ operation. */
-struct mlx5_devx_modify_rq_attr {
-	uint32_t rqn:24;
-	uint32_t rq_state:4; /* Current RQ state. */
-	uint32_t state:4; /* Required RQ state. */
-	uint32_t scatter_fcs:1;
-	uint32_t vsd:1;
-	uint32_t counter_set_id:8;
-	uint32_t hairpin_peer_sq:24;
-	uint32_t hairpin_peer_vhca:16;
-	uint64_t modify_bitmask;
-	uint32_t lwm:16; /* Contained WQ lwm. */
-};
-
-struct mlx5_rx_hash_field_select {
-	uint32_t l3_prot_type:1;
-	uint32_t l4_prot_type:1;
-	uint32_t selected_fields:30;
-};
-
-/* TIR attributes structure, used by TIR operations. */
-struct mlx5_devx_tir_attr {
-	uint32_t disp_type:4;
-	uint32_t lro_timeout_period_usecs:16;
-	uint32_t lro_enable_mask:4;
-	uint32_t lro_max_msg_sz:8;
-	uint32_t inline_rqn:24;
-	uint32_t rx_hash_symmetric:1;
-	uint32_t tunneled_offload_en:1;
-	uint32_t indirect_table:24;
-	uint32_t rx_hash_fn:4;
-	uint32_t self_lb_block:2;
-	uint32_t transport_domain:24;
-	uint32_t rx_hash_toeplitz_key[10];
-	struct mlx5_rx_hash_field_select rx_hash_field_selector_outer;
-	struct mlx5_rx_hash_field_select rx_hash_field_selector_inner;
-};
-
-/* RQT attributes structure, used by RQT operations. */
-struct mlx5_devx_rqt_attr {
-	uint32_t rqt_max_size:16;
-	uint32_t rqt_actual_size:16;
-	uint32_t rq_list[];
-};
-
-/* TIS attributes structure. */
-struct mlx5_devx_tis_attr {
-	uint32_t strict_lag_tx_port_affinity:1;
-	uint32_t tls_en:1;
-	uint32_t lag_tx_port_affinity:4;
-	uint32_t prio:4;
-	uint32_t transport_domain:24;
-};
-
-/* SQ attributes structure, used by SQ create operation. */
-struct mlx5_devx_create_sq_attr {
-	uint32_t rlky:1;
-	uint32_t cd_master:1;
-	uint32_t fre:1;
-	uint32_t flush_in_error_en:1;
-	uint32_t allow_multi_pkt_send_wqe:1;
-	uint32_t min_wqe_inline_mode:3;
-	uint32_t state:4;
-	uint32_t reg_umr:1;
-	uint32_t allow_swp:1;
-	uint32_t hairpin:1;
-	uint32_t user_index:24;
-	uint32_t cqn:24;
-	uint32_t packet_pacing_rate_limit_index:16;
-	uint32_t tis_lst_sz:16;
-	uint32_t tis_num:24;
-	struct mlx5_devx_wq_attr wq_attr;
-};
-
-/* SQ attributes structure, used by SQ modify operation. */
-struct mlx5_devx_modify_sq_attr {
-	uint32_t sq_state:4;
-	uint32_t state:4;
-	uint32_t hairpin_peer_rq:24;
-	uint32_t hairpin_peer_vhca:16;
-};
-
-/* mlx5_devx_cmds.c */
-
-struct mlx5_devx_obj *mlx5_devx_cmd_flow_counter_alloc(struct ibv_context *ctx,
-						       uint32_t bulk_sz);
-int mlx5_devx_cmd_destroy(struct mlx5_devx_obj *obj);
-int mlx5_devx_cmd_flow_counter_query(struct mlx5_devx_obj *dcs,
-				     int clear, uint32_t n_counters,
-				     uint64_t *pkts, uint64_t *bytes,
-				     uint32_t mkey, void *addr,
-				     struct mlx5dv_devx_cmd_comp *cmd_comp,
-				     uint64_t async_id);
-int mlx5_devx_cmd_query_hca_attr(struct ibv_context *ctx,
-				 struct mlx5_hca_attr *attr);
-struct mlx5_devx_obj *mlx5_devx_cmd_mkey_create(struct ibv_context *ctx,
-					      struct mlx5_devx_mkey_attr *attr);
-int mlx5_devx_get_out_command_status(void *out);
-int mlx5_devx_cmd_qp_query_tis_td(struct ibv_qp *qp, uint32_t tis_num,
-				  uint32_t *tis_td);
-struct mlx5_devx_obj *mlx5_devx_cmd_create_rq(struct ibv_context *ctx,
-				       struct mlx5_devx_create_rq_attr *rq_attr,
-				       int socket);
-int mlx5_devx_cmd_modify_rq(struct mlx5_devx_obj *rq,
-			    struct mlx5_devx_modify_rq_attr *rq_attr);
-struct mlx5_devx_obj *mlx5_devx_cmd_create_tir(struct ibv_context *ctx,
-					   struct mlx5_devx_tir_attr *tir_attr);
-struct mlx5_devx_obj *mlx5_devx_cmd_create_rqt(struct ibv_context *ctx,
-					   struct mlx5_devx_rqt_attr *rqt_attr);
-struct mlx5_devx_obj *mlx5_devx_cmd_create_sq(struct ibv_context *ctx,
-				      struct mlx5_devx_create_sq_attr *sq_attr);
-int mlx5_devx_cmd_modify_sq(struct mlx5_devx_obj *sq,
-			    struct mlx5_devx_modify_sq_attr *sq_attr);
-struct mlx5_devx_obj *mlx5_devx_cmd_create_tis(struct ibv_context *ctx,
-					   struct mlx5_devx_tis_attr *tis_attr);
-struct mlx5_devx_obj *mlx5_devx_cmd_create_td(struct ibv_context *ctx);
-int mlx5_devx_cmd_flow_dump(void *fdb_domain, void *rx_domain, void *tx_domain,
-			    FILE *file);
-#endif /* RTE_PMD_MLX5_DEVX_CMDS_H_ */
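Note on the attribute structures above: most members are bit-fields (for
example cqn:24), and C silently truncates an unsigned value assigned to a
narrower bit-field. The following is a hedged sketch of a range check a caller
might apply before filling such a field. The sketch_* names are hypothetical
and are not part of this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of an attribute structure with narrow fields. */
struct sketch_sq_attr {
	uint32_t state:4;
	uint32_t cqn:24;
};

/* Return true when value can be stored in a bit-field of the given width. */
static bool
sketch_fits_bits(uint32_t value, unsigned int width)
{
	return width >= 32 || (value >> width) == 0;
}

/* Store a CQ number only if it fits the 24-bit field: 0 on success, -1 otherwise. */
static int
sketch_set_cqn(struct sketch_sq_attr *attr, uint32_t cqn)
{
	if (!sketch_fits_bits(cqn, 24))
		return -1;
	attr->cqn = cqn;
	return 0;
}
```

Rejecting an out-of-range value up front keeps a wrong queue number from being
written silently; whether the driver wants such a check is a design choice
left to the maintainers.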
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index ce0109c..eddf888 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -36,9 +36,10 @@
 #include <rte_rwlock.h>
 #include <rte_cycles.h>
 
+#include <mlx5_glue.h>
+#include <mlx5_devx_cmds.h>
+
 #include "mlx5.h"
-#include "mlx5_glue.h"
-#include "mlx5_devx_cmds.h"
 #include "mlx5_rxtx.h"
 #include "mlx5_utils.h"
 
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 983b1c3..47ba521 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -27,12 +27,13 @@
 #include <rte_malloc.h>
 #include <rte_ip.h>
 
-#include "mlx5.h"
+#include <mlx5_glue.h>
+#include <mlx5_devx_cmds.h>
+#include <mlx5_prm.h>
+
 #include "mlx5_defs.h"
+#include "mlx5.h"
 #include "mlx5_flow.h"
-#include "mlx5_glue.h"
-#include "mlx5_devx_cmds.h"
-#include "mlx5_prm.h"
 #include "mlx5_rxtx.h"
 
 /* Dev ops structure defined in mlx5.c */
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 39be5ba..4255472 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -25,8 +25,9 @@
 #include <rte_alarm.h>
 #include <rte_mtr.h>
 
+#include <mlx5_prm.h>
+
 #include "mlx5.h"
-#include "mlx5_prm.h"
 
 /* Private rte flow items. */
 enum mlx5_rte_flow_item_type {
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 653d649..1b31602 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -29,12 +29,13 @@
 #include <rte_vxlan.h>
 #include <rte_gtp.h>
 
-#include "mlx5.h"
+#include <mlx5_glue.h>
+#include <mlx5_devx_cmds.h>
+#include <mlx5_prm.h>
+
 #include "mlx5_defs.h"
-#include "mlx5_glue.h"
-#include "mlx5_devx_cmds.h"
+#include "mlx5.h"
 #include "mlx5_flow.h"
-#include "mlx5_prm.h"
 #include "mlx5_rxtx.h"
 
 #ifdef HAVE_IBV_FLOW_DV_SUPPORT
diff --git a/drivers/net/mlx5/mlx5_flow_meter.c b/drivers/net/mlx5/mlx5_flow_meter.c
index c4d28b2..32d51c0 100644
--- a/drivers/net/mlx5/mlx5_flow_meter.c
+++ b/drivers/net/mlx5/mlx5_flow_meter.c
@@ -9,6 +9,8 @@
 #include <rte_mtr.h>
 #include <rte_mtr_driver.h>
 
+#include <mlx5_devx_cmds.h>
+
 #include "mlx5.h"
 #include "mlx5_flow.h"
 
diff --git a/drivers/net/mlx5/mlx5_flow_verbs.c b/drivers/net/mlx5/mlx5_flow_verbs.c
index 72fb1e4..9231451 100644
--- a/drivers/net/mlx5/mlx5_flow_verbs.c
+++ b/drivers/net/mlx5/mlx5_flow_verbs.c
@@ -26,11 +26,12 @@
 #include <rte_malloc.h>
 #include <rte_ip.h>
 
-#include "mlx5.h"
+#include <mlx5_glue.h>
+#include <mlx5_prm.h>
+
 #include "mlx5_defs.h"
+#include "mlx5.h"
 #include "mlx5_flow.h"
-#include "mlx5_glue.h"
-#include "mlx5_prm.h"
 #include "mlx5_rxtx.h"
 
 #define VERBS_SPEC_INNER(item_flags) \
diff --git a/drivers/net/mlx5/mlx5_glue.c b/drivers/net/mlx5/mlx5_glue.c
deleted file mode 100644
index 4906eeb..0000000
--- a/drivers/net/mlx5/mlx5_glue.c
+++ /dev/null
@@ -1,1150 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause
- * Copyright 2018 6WIND S.A.
- * Copyright 2018 Mellanox Technologies, Ltd
- */
-
-#include <errno.h>
-#include <stdalign.h>
-#include <stddef.h>
-#include <stdint.h>
-#include <stdlib.h>
-
-/*
- * Not needed by this file; included to work around the lack of off_t
- * definition for mlx5dv.h with unpatched rdma-core versions.
- */
-#include <sys/types.h>
-
-/* Verbs headers do not support -pedantic. */
-#ifdef PEDANTIC
-#pragma GCC diagnostic ignored "-Wpedantic"
-#endif
-#include <infiniband/mlx5dv.h>
-#include <infiniband/verbs.h>
-#ifdef PEDANTIC
-#pragma GCC diagnostic error "-Wpedantic"
-#endif
-
-#include <rte_config.h>
-
-#include "mlx5_autoconf.h"
-#include "mlx5_glue.h"
-
-static int
-mlx5_glue_fork_init(void)
-{
-	return ibv_fork_init();
-}
-
-static struct ibv_pd *
-mlx5_glue_alloc_pd(struct ibv_context *context)
-{
-	return ibv_alloc_pd(context);
-}
-
-static int
-mlx5_glue_dealloc_pd(struct ibv_pd *pd)
-{
-	return ibv_dealloc_pd(pd);
-}
-
-static struct ibv_device **
-mlx5_glue_get_device_list(int *num_devices)
-{
-	return ibv_get_device_list(num_devices);
-}
-
-static void
-mlx5_glue_free_device_list(struct ibv_device **list)
-{
-	ibv_free_device_list(list);
-}
-
-static struct ibv_context *
-mlx5_glue_open_device(struct ibv_device *device)
-{
-	return ibv_open_device(device);
-}
-
-static int
-mlx5_glue_close_device(struct ibv_context *context)
-{
-	return ibv_close_device(context);
-}
-
-static int
-mlx5_glue_query_device(struct ibv_context *context,
-		       struct ibv_device_attr *device_attr)
-{
-	return ibv_query_device(context, device_attr);
-}
-
-static int
-mlx5_glue_query_device_ex(struct ibv_context *context,
-			  const struct ibv_query_device_ex_input *input,
-			  struct ibv_device_attr_ex *attr)
-{
-	return ibv_query_device_ex(context, input, attr);
-}
-
-static int
-mlx5_glue_query_rt_values_ex(struct ibv_context *context,
-			  struct ibv_values_ex *values)
-{
-	return ibv_query_rt_values_ex(context, values);
-}
-
-static int
-mlx5_glue_query_port(struct ibv_context *context, uint8_t port_num,
-		     struct ibv_port_attr *port_attr)
-{
-	return ibv_query_port(context, port_num, port_attr);
-}
-
-static struct ibv_comp_channel *
-mlx5_glue_create_comp_channel(struct ibv_context *context)
-{
-	return ibv_create_comp_channel(context);
-}
-
-static int
-mlx5_glue_destroy_comp_channel(struct ibv_comp_channel *channel)
-{
-	return ibv_destroy_comp_channel(channel);
-}
-
-static struct ibv_cq *
-mlx5_glue_create_cq(struct ibv_context *context, int cqe, void *cq_context,
-		    struct ibv_comp_channel *channel, int comp_vector)
-{
-	return ibv_create_cq(context, cqe, cq_context, channel, comp_vector);
-}
-
-static int
-mlx5_glue_destroy_cq(struct ibv_cq *cq)
-{
-	return ibv_destroy_cq(cq);
-}
-
-static int
-mlx5_glue_get_cq_event(struct ibv_comp_channel *channel, struct ibv_cq **cq,
-		       void **cq_context)
-{
-	return ibv_get_cq_event(channel, cq, cq_context);
-}
-
-static void
-mlx5_glue_ack_cq_events(struct ibv_cq *cq, unsigned int nevents)
-{
-	ibv_ack_cq_events(cq, nevents);
-}
-
-static struct ibv_rwq_ind_table *
-mlx5_glue_create_rwq_ind_table(struct ibv_context *context,
-			       struct ibv_rwq_ind_table_init_attr *init_attr)
-{
-	return ibv_create_rwq_ind_table(context, init_attr);
-}
-
-static int
-mlx5_glue_destroy_rwq_ind_table(struct ibv_rwq_ind_table *rwq_ind_table)
-{
-	return ibv_destroy_rwq_ind_table(rwq_ind_table);
-}
-
-static struct ibv_wq *
-mlx5_glue_create_wq(struct ibv_context *context,
-		    struct ibv_wq_init_attr *wq_init_attr)
-{
-	return ibv_create_wq(context, wq_init_attr);
-}
-
-static int
-mlx5_glue_destroy_wq(struct ibv_wq *wq)
-{
-	return ibv_destroy_wq(wq);
-}
-
-static int
-mlx5_glue_modify_wq(struct ibv_wq *wq, struct ibv_wq_attr *wq_attr)
-{
-	return ibv_modify_wq(wq, wq_attr);
-}
-
-static struct ibv_flow *
-mlx5_glue_create_flow(struct ibv_qp *qp, struct ibv_flow_attr *flow)
-{
-	return ibv_create_flow(qp, flow);
-}
-
-static int
-mlx5_glue_destroy_flow(struct ibv_flow *flow_id)
-{
-	return ibv_destroy_flow(flow_id);
-}
-
-static int
-mlx5_glue_destroy_flow_action(void *action)
-{
-#ifdef HAVE_IBV_FLOW_DV_SUPPORT
-#ifdef HAVE_MLX5DV_DR
-	return mlx5dv_dr_action_destroy(action);
-#else
-	struct mlx5dv_flow_action_attr *attr = action;
-	int res = 0;
-	switch (attr->type) {
-	case MLX5DV_FLOW_ACTION_TAG:
-		break;
-	default:
-		res = ibv_destroy_flow_action(attr->action);
-		break;
-	}
-	free(action);
-	return res;
-#endif
-#else
-	(void)action;
-	return ENOTSUP;
-#endif
-}
-
-static struct ibv_qp *
-mlx5_glue_create_qp(struct ibv_pd *pd, struct ibv_qp_init_attr *qp_init_attr)
-{
-	return ibv_create_qp(pd, qp_init_attr);
-}
-
-static struct ibv_qp *
-mlx5_glue_create_qp_ex(struct ibv_context *context,
-		       struct ibv_qp_init_attr_ex *qp_init_attr_ex)
-{
-	return ibv_create_qp_ex(context, qp_init_attr_ex);
-}
-
-static int
-mlx5_glue_destroy_qp(struct ibv_qp *qp)
-{
-	return ibv_destroy_qp(qp);
-}
-
-static int
-mlx5_glue_modify_qp(struct ibv_qp *qp, struct ibv_qp_attr *attr, int attr_mask)
-{
-	return ibv_modify_qp(qp, attr, attr_mask);
-}
-
-static struct ibv_mr *
-mlx5_glue_reg_mr(struct ibv_pd *pd, void *addr, size_t length, int access)
-{
-	return ibv_reg_mr(pd, addr, length, access);
-}
-
-static int
-mlx5_glue_dereg_mr(struct ibv_mr *mr)
-{
-	return ibv_dereg_mr(mr);
-}
-
-static struct ibv_counter_set *
-mlx5_glue_create_counter_set(struct ibv_context *context,
-			     struct ibv_counter_set_init_attr *init_attr)
-{
-#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V42
-	(void)context;
-	(void)init_attr;
-	return NULL;
-#else
-	return ibv_create_counter_set(context, init_attr);
-#endif
-}
-
-static int
-mlx5_glue_destroy_counter_set(struct ibv_counter_set *cs)
-{
-#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V42
-	(void)cs;
-	return ENOTSUP;
-#else
-	return ibv_destroy_counter_set(cs);
-#endif
-}
-
-static int
-mlx5_glue_describe_counter_set(struct ibv_context *context,
-			       uint16_t counter_set_id,
-			       struct ibv_counter_set_description *cs_desc)
-{
-#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V42
-	(void)context;
-	(void)counter_set_id;
-	(void)cs_desc;
-	return ENOTSUP;
-#else
-	return ibv_describe_counter_set(context, counter_set_id, cs_desc);
-#endif
-}
-
-static int
-mlx5_glue_query_counter_set(struct ibv_query_counter_set_attr *query_attr,
-			    struct ibv_counter_set_data *cs_data)
-{
-#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V42
-	(void)query_attr;
-	(void)cs_data;
-	return ENOTSUP;
-#else
-	return ibv_query_counter_set(query_attr, cs_data);
-#endif
-}
-
-static struct ibv_counters *
-mlx5_glue_create_counters(struct ibv_context *context,
-			  struct ibv_counters_init_attr *init_attr)
-{
-#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V45
-	(void)context;
-	(void)init_attr;
-	errno = ENOTSUP;
-	return NULL;
-#else
-	return ibv_create_counters(context, init_attr);
-#endif
-}
-
-static int
-mlx5_glue_destroy_counters(struct ibv_counters *counters)
-{
-#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V45
-	(void)counters;
-	return ENOTSUP;
-#else
-	return ibv_destroy_counters(counters);
-#endif
-}
-
-static int
-mlx5_glue_attach_counters(struct ibv_counters *counters,
-			  struct ibv_counter_attach_attr *attr,
-			  struct ibv_flow *flow)
-{
-#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V45
-	(void)counters;
-	(void)attr;
-	(void)flow;
-	return ENOTSUP;
-#else
-	return ibv_attach_counters_point_flow(counters, attr, flow);
-#endif
-}
-
-static int
-mlx5_glue_query_counters(struct ibv_counters *counters,
-			 uint64_t *counters_value,
-			 uint32_t ncounters,
-			 uint32_t flags)
-{
-#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V45
-	(void)counters;
-	(void)counters_value;
-	(void)ncounters;
-	(void)flags;
-	return ENOTSUP;
-#else
-	return ibv_read_counters(counters, counters_value, ncounters, flags);
-#endif
-}
-
-static void
-mlx5_glue_ack_async_event(struct ibv_async_event *event)
-{
-	ibv_ack_async_event(event);
-}
-
-static int
-mlx5_glue_get_async_event(struct ibv_context *context,
-			  struct ibv_async_event *event)
-{
-	return ibv_get_async_event(context, event);
-}
-
-static const char *
-mlx5_glue_port_state_str(enum ibv_port_state port_state)
-{
-	return ibv_port_state_str(port_state);
-}
-
-static struct ibv_cq *
-mlx5_glue_cq_ex_to_cq(struct ibv_cq_ex *cq)
-{
-	return ibv_cq_ex_to_cq(cq);
-}
-
-static void *
-mlx5_glue_dr_create_flow_action_dest_flow_tbl(void *tbl)
-{
-#ifdef HAVE_MLX5DV_DR
-	return mlx5dv_dr_action_create_dest_table(tbl);
-#else
-	(void)tbl;
-	errno = ENOTSUP;
-	return NULL;
-#endif
-}
-
-static void *
-mlx5_glue_dr_create_flow_action_dest_port(void *domain, uint32_t port)
-{
-#ifdef HAVE_MLX5DV_DR_DEVX_PORT
-	return mlx5dv_dr_action_create_dest_ib_port(domain, port);
-#else
-#ifdef HAVE_MLX5DV_DR_ESWITCH
-	return mlx5dv_dr_action_create_dest_vport(domain, port);
-#else
-	(void)domain;
-	(void)port;
-	errno = ENOTSUP;
-	return NULL;
-#endif
-#endif
-}
-
-static void *
-mlx5_glue_dr_create_flow_action_drop(void)
-{
-#ifdef HAVE_MLX5DV_DR_ESWITCH
-	return mlx5dv_dr_action_create_drop();
-#else
-	errno = ENOTSUP;
-	return NULL;
-#endif
-}
-
-static void *
-mlx5_glue_dr_create_flow_action_push_vlan(struct mlx5dv_dr_domain *domain,
-					  rte_be32_t vlan_tag)
-{
-#ifdef HAVE_MLX5DV_DR_VLAN
-	return mlx5dv_dr_action_create_push_vlan(domain, vlan_tag);
-#else
-	(void)domain;
-	(void)vlan_tag;
-	errno = ENOTSUP;
-	return NULL;
-#endif
-}
-
-static void *
-mlx5_glue_dr_create_flow_action_pop_vlan(void)
-{
-#ifdef HAVE_MLX5DV_DR_VLAN
-	return mlx5dv_dr_action_create_pop_vlan();
-#else
-	errno = ENOTSUP;
-	return NULL;
-#endif
-}
-
-static void *
-mlx5_glue_dr_create_flow_tbl(void *domain, uint32_t level)
-{
-#ifdef HAVE_MLX5DV_DR
-	return mlx5dv_dr_table_create(domain, level);
-#else
-	(void)domain;
-	(void)level;
-	errno = ENOTSUP;
-	return NULL;
-#endif
-}
-
-static int
-mlx5_glue_dr_destroy_flow_tbl(void *tbl)
-{
-#ifdef HAVE_MLX5DV_DR
-	return mlx5dv_dr_table_destroy(tbl);
-#else
-	(void)tbl;
-	errno = ENOTSUP;
-	return errno;
-#endif
-}
-
-static void *
-mlx5_glue_dr_create_domain(struct ibv_context *ctx,
-			   enum  mlx5dv_dr_domain_type domain)
-{
-#ifdef HAVE_MLX5DV_DR
-	return mlx5dv_dr_domain_create(ctx, domain);
-#else
-	(void)ctx;
-	(void)domain;
-	errno = ENOTSUP;
-	return NULL;
-#endif
-}
-
-static int
-mlx5_glue_dr_destroy_domain(void *domain)
-{
-#ifdef HAVE_MLX5DV_DR
-	return mlx5dv_dr_domain_destroy(domain);
-#else
-	(void)domain;
-	errno = ENOTSUP;
-	return errno;
-#endif
-}
-
-static struct ibv_cq_ex *
-mlx5_glue_dv_create_cq(struct ibv_context *context,
-		       struct ibv_cq_init_attr_ex *cq_attr,
-		       struct mlx5dv_cq_init_attr *mlx5_cq_attr)
-{
-	return mlx5dv_create_cq(context, cq_attr, mlx5_cq_attr);
-}
-
-static struct ibv_wq *
-mlx5_glue_dv_create_wq(struct ibv_context *context,
-		       struct ibv_wq_init_attr *wq_attr,
-		       struct mlx5dv_wq_init_attr *mlx5_wq_attr)
-{
-#ifndef HAVE_IBV_DEVICE_STRIDING_RQ_SUPPORT
-	(void)context;
-	(void)wq_attr;
-	(void)mlx5_wq_attr;
-	errno = ENOTSUP;
-	return NULL;
-#else
-	return mlx5dv_create_wq(context, wq_attr, mlx5_wq_attr);
-#endif
-}
-
-static int
-mlx5_glue_dv_query_device(struct ibv_context *ctx,
-			  struct mlx5dv_context *attrs_out)
-{
-	return mlx5dv_query_device(ctx, attrs_out);
-}
-
-static int
-mlx5_glue_dv_set_context_attr(struct ibv_context *ibv_ctx,
-			      enum mlx5dv_set_ctx_attr_type type, void *attr)
-{
-	return mlx5dv_set_context_attr(ibv_ctx, type, attr);
-}
-
-static int
-mlx5_glue_dv_init_obj(struct mlx5dv_obj *obj, uint64_t obj_type)
-{
-	return mlx5dv_init_obj(obj, obj_type);
-}
-
-static struct ibv_qp *
-mlx5_glue_dv_create_qp(struct ibv_context *context,
-		       struct ibv_qp_init_attr_ex *qp_init_attr_ex,
-		       struct mlx5dv_qp_init_attr *dv_qp_init_attr)
-{
-#ifdef HAVE_IBV_DEVICE_TUNNEL_SUPPORT
-	return mlx5dv_create_qp(context, qp_init_attr_ex, dv_qp_init_attr);
-#else
-	(void)context;
-	(void)qp_init_attr_ex;
-	(void)dv_qp_init_attr;
-	errno = ENOTSUP;
-	return NULL;
-#endif
-}
-
-static void *
-mlx5_glue_dv_create_flow_matcher(struct ibv_context *context,
-				 struct mlx5dv_flow_matcher_attr *matcher_attr,
-				 void *tbl)
-{
-#ifdef HAVE_IBV_FLOW_DV_SUPPORT
-#ifdef HAVE_MLX5DV_DR
-	(void)context;
-	return mlx5dv_dr_matcher_create(tbl, matcher_attr->priority,
-					matcher_attr->match_criteria_enable,
-					matcher_attr->match_mask);
-#else
-	(void)tbl;
-	return mlx5dv_create_flow_matcher(context, matcher_attr);
-#endif
-#else
-	(void)context;
-	(void)matcher_attr;
-	(void)tbl;
-	errno = ENOTSUP;
-	return NULL;
-#endif
-}
-
-static void *
-mlx5_glue_dv_create_flow(void *matcher,
-			 void *match_value,
-			 size_t num_actions,
-			 void *actions[])
-{
-#ifdef HAVE_IBV_FLOW_DV_SUPPORT
-#ifdef HAVE_MLX5DV_DR
-	return mlx5dv_dr_rule_create(matcher, match_value, num_actions,
-				     (struct mlx5dv_dr_action **)actions);
-#else
-	struct mlx5dv_flow_action_attr actions_attr[8];
-
-	if (num_actions > 8)
-		return NULL;
-	for (size_t i = 0; i < num_actions; i++)
-		actions_attr[i] =
-			*((struct mlx5dv_flow_action_attr *)(actions[i]));
-	return mlx5dv_create_flow(matcher, match_value,
-				  num_actions, actions_attr);
-#endif
-#else
-	(void)matcher;
-	(void)match_value;
-	(void)num_actions;
-	(void)actions;
-	return NULL;
-#endif
-}
-
-static void *
-mlx5_glue_dv_create_flow_action_counter(void *counter_obj, uint32_t offset)
-{
-#ifdef HAVE_IBV_FLOW_DV_SUPPORT
-#ifdef HAVE_MLX5DV_DR
-	return mlx5dv_dr_action_create_flow_counter(counter_obj, offset);
-#else
-	struct mlx5dv_flow_action_attr *action;
-
-	(void)offset;
-	action = malloc(sizeof(*action));
-	if (!action)
-		return NULL;
-	action->type = MLX5DV_FLOW_ACTION_COUNTERS_DEVX;
-	action->obj = counter_obj;
-	return action;
-#endif
-#else
-	(void)counter_obj;
-	(void)offset;
-	errno = ENOTSUP;
-	return NULL;
-#endif
-}
-
-static void *
-mlx5_glue_dv_create_flow_action_dest_ibv_qp(void *qp)
-{
-#ifdef HAVE_IBV_FLOW_DV_SUPPORT
-#ifdef HAVE_MLX5DV_DR
-	return mlx5dv_dr_action_create_dest_ibv_qp(qp);
-#else
-	struct mlx5dv_flow_action_attr *action;
-
-	action = malloc(sizeof(*action));
-	if (!action)
-		return NULL;
-	action->type = MLX5DV_FLOW_ACTION_DEST_IBV_QP;
-	action->obj = qp;
-	return action;
-#endif
-#else
-	(void)qp;
-	errno = ENOTSUP;
-	return NULL;
-#endif
-}
-
-static void *
-mlx5_glue_dv_create_flow_action_dest_devx_tir(void *tir)
-{
-#ifdef HAVE_MLX5DV_DR_ACTION_DEST_DEVX_TIR
-	return mlx5dv_dr_action_create_dest_devx_tir(tir);
-#else
-	(void)tir;
-	errno = ENOTSUP;
-	return NULL;
-#endif
-}
-
-static void *
-mlx5_glue_dv_create_flow_action_modify_header
-					(struct ibv_context *ctx,
-					 enum mlx5dv_flow_table_type ft_type,
-					 void *domain, uint64_t flags,
-					 size_t actions_sz,
-					 uint64_t actions[])
-{
-#ifdef HAVE_IBV_FLOW_DV_SUPPORT
-#ifdef HAVE_MLX5DV_DR
-	(void)ctx;
-	(void)ft_type;
-	return mlx5dv_dr_action_create_modify_header(domain, flags, actions_sz,
-						     (__be64 *)actions);
-#else
-	struct mlx5dv_flow_action_attr *action;
-
-	(void)domain;
-	(void)flags;
-	action = malloc(sizeof(*action));
-	if (!action)
-		return NULL;
-	action->type = MLX5DV_FLOW_ACTION_IBV_FLOW_ACTION;
-	action->action = mlx5dv_create_flow_action_modify_header
-		(ctx, actions_sz, actions, ft_type);
-	return action;
-#endif
-#else
-	(void)ctx;
-	(void)ft_type;
-	(void)domain;
-	(void)flags;
-	(void)actions_sz;
-	(void)actions;
-	errno = ENOTSUP;
-	return NULL;
-#endif
-}
-
-static void *
-mlx5_glue_dv_create_flow_action_packet_reformat
-		(struct ibv_context *ctx,
-		 enum mlx5dv_flow_action_packet_reformat_type reformat_type,
-		 enum mlx5dv_flow_table_type ft_type,
-		 struct mlx5dv_dr_domain *domain,
-		 uint32_t flags, size_t data_sz, void *data)
-{
-#ifdef HAVE_IBV_FLOW_DV_SUPPORT
-#ifdef HAVE_MLX5DV_DR
-	(void)ctx;
-	(void)ft_type;
-	return mlx5dv_dr_action_create_packet_reformat(domain, flags,
-						       reformat_type, data_sz,
-						       data);
-#else
-	(void)domain;
-	(void)flags;
-	struct mlx5dv_flow_action_attr *action;
-
-	action = malloc(sizeof(*action));
-	if (!action)
-		return NULL;
-	action->type = MLX5DV_FLOW_ACTION_IBV_FLOW_ACTION;
-	action->action = mlx5dv_create_flow_action_packet_reformat
-		(ctx, data_sz, data, reformat_type, ft_type);
-	return action;
-#endif
-#else
-	(void)ctx;
-	(void)reformat_type;
-	(void)ft_type;
-	(void)domain;
-	(void)flags;
-	(void)data_sz;
-	(void)data;
-	errno = ENOTSUP;
-	return NULL;
-#endif
-}
-
-static void *
-mlx5_glue_dv_create_flow_action_tag(uint32_t tag)
-{
-#ifdef HAVE_IBV_FLOW_DV_SUPPORT
-#ifdef HAVE_MLX5DV_DR
-	return mlx5dv_dr_action_create_tag(tag);
-#else
-	struct mlx5dv_flow_action_attr *action;
-	action = malloc(sizeof(*action));
-	if (!action)
-		return NULL;
-	action->type = MLX5DV_FLOW_ACTION_TAG;
-	action->tag_value = tag;
-	return action;
-#endif
-#endif
-	(void)tag;
-	errno = ENOTSUP;
-	return NULL;
-}
-
-static void *
-mlx5_glue_dv_create_flow_action_meter(struct mlx5dv_dr_flow_meter_attr *attr)
-{
-#if defined(HAVE_MLX5DV_DR) && defined(HAVE_MLX5_DR_CREATE_ACTION_FLOW_METER)
-	return mlx5dv_dr_action_create_flow_meter(attr);
-#else
-	(void)attr;
-	errno = ENOTSUP;
-	return NULL;
-#endif
-}
-
-static int
-mlx5_glue_dv_modify_flow_action_meter(void *action,
-				      struct mlx5dv_dr_flow_meter_attr *attr,
-				      uint64_t modify_bits)
-{
-#if defined(HAVE_MLX5DV_DR) && defined(HAVE_MLX5_DR_CREATE_ACTION_FLOW_METER)
-	return mlx5dv_dr_action_modify_flow_meter(action, attr, modify_bits);
-#else
-	(void)action;
-	(void)attr;
-	(void)modify_bits;
-	errno = ENOTSUP;
-	return errno;
-#endif
-}
-
-static int
-mlx5_glue_dv_destroy_flow(void *flow_id)
-{
-#ifdef HAVE_MLX5DV_DR
-	return mlx5dv_dr_rule_destroy(flow_id);
-#else
-	return ibv_destroy_flow(flow_id);
-#endif
-}
-
-static int
-mlx5_glue_dv_destroy_flow_matcher(void *matcher)
-{
-#ifdef HAVE_IBV_FLOW_DV_SUPPORT
-#ifdef HAVE_MLX5DV_DR
-	return mlx5dv_dr_matcher_destroy(matcher);
-#else
-	return mlx5dv_destroy_flow_matcher(matcher);
-#endif
-#else
-	(void)matcher;
-	errno = ENOTSUP;
-	return errno;
-#endif
-}
-
-static struct ibv_context *
-mlx5_glue_dv_open_device(struct ibv_device *device)
-{
-#ifdef HAVE_IBV_DEVX_OBJ
-	return mlx5dv_open_device(device,
-				  &(struct mlx5dv_context_attr){
-					.flags = MLX5DV_CONTEXT_FLAGS_DEVX,
-				  });
-#else
-	(void)device;
-	errno = ENOTSUP;
-	return NULL;
-#endif
-}
-
-static struct mlx5dv_devx_obj *
-mlx5_glue_devx_obj_create(struct ibv_context *ctx,
-			  const void *in, size_t inlen,
-			  void *out, size_t outlen)
-{
-#ifdef HAVE_IBV_DEVX_OBJ
-	return mlx5dv_devx_obj_create(ctx, in, inlen, out, outlen);
-#else
-	(void)ctx;
-	(void)in;
-	(void)inlen;
-	(void)out;
-	(void)outlen;
-	errno = ENOTSUP;
-	return NULL;
-#endif
-}
-
-static int
-mlx5_glue_devx_obj_destroy(struct mlx5dv_devx_obj *obj)
-{
-#ifdef HAVE_IBV_DEVX_OBJ
-	return mlx5dv_devx_obj_destroy(obj);
-#else
-	(void)obj;
-	return -ENOTSUP;
-#endif
-}
-
-static int
-mlx5_glue_devx_obj_query(struct mlx5dv_devx_obj *obj,
-			 const void *in, size_t inlen,
-			 void *out, size_t outlen)
-{
-#ifdef HAVE_IBV_DEVX_OBJ
-	return mlx5dv_devx_obj_query(obj, in, inlen, out, outlen);
-#else
-	(void)obj;
-	(void)in;
-	(void)inlen;
-	(void)out;
-	(void)outlen;
-	return -ENOTSUP;
-#endif
-}
-
-static int
-mlx5_glue_devx_obj_modify(struct mlx5dv_devx_obj *obj,
-			  const void *in, size_t inlen,
-			  void *out, size_t outlen)
-{
-#ifdef HAVE_IBV_DEVX_OBJ
-	return mlx5dv_devx_obj_modify(obj, in, inlen, out, outlen);
-#else
-	(void)obj;
-	(void)in;
-	(void)inlen;
-	(void)out;
-	(void)outlen;
-	return -ENOTSUP;
-#endif
-}
-
-static int
-mlx5_glue_devx_general_cmd(struct ibv_context *ctx,
-			   const void *in, size_t inlen,
-			   void *out, size_t outlen)
-{
-#ifdef HAVE_IBV_DEVX_OBJ
-	return mlx5dv_devx_general_cmd(ctx, in, inlen, out, outlen);
-#else
-	(void)ctx;
-	(void)in;
-	(void)inlen;
-	(void)out;
-	(void)outlen;
-	return -ENOTSUP;
-#endif
-}
-
-static struct mlx5dv_devx_cmd_comp *
-mlx5_glue_devx_create_cmd_comp(struct ibv_context *ctx)
-{
-#ifdef HAVE_IBV_DEVX_ASYNC
-	return mlx5dv_devx_create_cmd_comp(ctx);
-#else
-	(void)ctx;
-	errno = -ENOTSUP;
-	return NULL;
-#endif
-}
-
-static void
-mlx5_glue_devx_destroy_cmd_comp(struct mlx5dv_devx_cmd_comp *cmd_comp)
-{
-#ifdef HAVE_IBV_DEVX_ASYNC
-	mlx5dv_devx_destroy_cmd_comp(cmd_comp);
-#else
-	(void)cmd_comp;
-	errno = -ENOTSUP;
-#endif
-}
-
-static int
-mlx5_glue_devx_obj_query_async(struct mlx5dv_devx_obj *obj, const void *in,
-			       size_t inlen, size_t outlen, uint64_t wr_id,
-			       struct mlx5dv_devx_cmd_comp *cmd_comp)
-{
-#ifdef HAVE_IBV_DEVX_ASYNC
-	return mlx5dv_devx_obj_query_async(obj, in, inlen, outlen, wr_id,
-					   cmd_comp);
-#else
-	(void)obj;
-	(void)in;
-	(void)inlen;
-	(void)outlen;
-	(void)wr_id;
-	(void)cmd_comp;
-	return -ENOTSUP;
-#endif
-}
-
-static int
-mlx5_glue_devx_get_async_cmd_comp(struct mlx5dv_devx_cmd_comp *cmd_comp,
-				  struct mlx5dv_devx_async_cmd_hdr *cmd_resp,
-				  size_t cmd_resp_len)
-{
-#ifdef HAVE_IBV_DEVX_ASYNC
-	return mlx5dv_devx_get_async_cmd_comp(cmd_comp, cmd_resp,
-					      cmd_resp_len);
-#else
-	(void)cmd_comp;
-	(void)cmd_resp;
-	(void)cmd_resp_len;
-	return -ENOTSUP;
-#endif
-}
-
-static struct mlx5dv_devx_umem *
-mlx5_glue_devx_umem_reg(struct ibv_context *context, void *addr, size_t size,
-			uint32_t access)
-{
-#ifdef HAVE_IBV_DEVX_OBJ
-	return mlx5dv_devx_umem_reg(context, addr, size, access);
-#else
-	(void)context;
-	(void)addr;
-	(void)size;
-	(void)access;
-	errno = -ENOTSUP;
-	return NULL;
-#endif
-}
-
-static int
-mlx5_glue_devx_umem_dereg(struct mlx5dv_devx_umem *dv_devx_umem)
-{
-#ifdef HAVE_IBV_DEVX_OBJ
-	return mlx5dv_devx_umem_dereg(dv_devx_umem);
-#else
-	(void)dv_devx_umem;
-	return -ENOTSUP;
-#endif
-}
-
-static int
-mlx5_glue_devx_qp_query(struct ibv_qp *qp,
-			const void *in, size_t inlen,
-			void *out, size_t outlen)
-{
-#ifdef HAVE_IBV_DEVX_OBJ
-	return mlx5dv_devx_qp_query(qp, in, inlen, out, outlen);
-#else
-	(void)qp;
-	(void)in;
-	(void)inlen;
-	(void)out;
-	(void)outlen;
-	errno = ENOTSUP;
-	return errno;
-#endif
-}
-
-static int
-mlx5_glue_devx_port_query(struct ibv_context *ctx,
-			  uint32_t port_num,
-			  struct mlx5dv_devx_port *mlx5_devx_port)
-{
-#ifdef HAVE_MLX5DV_DR_DEVX_PORT
-	return mlx5dv_query_devx_port(ctx, port_num, mlx5_devx_port);
-#else
-	(void)ctx;
-	(void)port_num;
-	(void)mlx5_devx_port;
-	errno = ENOTSUP;
-	return errno;
-#endif
-}
-
-static int
-mlx5_glue_dr_dump_domain(FILE *file, void *domain)
-{
-#ifdef HAVE_MLX5_DR_FLOW_DUMP
-	return mlx5dv_dump_dr_domain(file, domain);
-#else
-	RTE_SET_USED(file);
-	RTE_SET_USED(domain);
-	return -ENOTSUP;
-#endif
-}
-
-alignas(RTE_CACHE_LINE_SIZE)
-const struct mlx5_glue *mlx5_glue = &(const struct mlx5_glue){
-	.version = MLX5_GLUE_VERSION,
-	.fork_init = mlx5_glue_fork_init,
-	.alloc_pd = mlx5_glue_alloc_pd,
-	.dealloc_pd = mlx5_glue_dealloc_pd,
-	.get_device_list = mlx5_glue_get_device_list,
-	.free_device_list = mlx5_glue_free_device_list,
-	.open_device = mlx5_glue_open_device,
-	.close_device = mlx5_glue_close_device,
-	.query_device = mlx5_glue_query_device,
-	.query_device_ex = mlx5_glue_query_device_ex,
-	.query_rt_values_ex = mlx5_glue_query_rt_values_ex,
-	.query_port = mlx5_glue_query_port,
-	.create_comp_channel = mlx5_glue_create_comp_channel,
-	.destroy_comp_channel = mlx5_glue_destroy_comp_channel,
-	.create_cq = mlx5_glue_create_cq,
-	.destroy_cq = mlx5_glue_destroy_cq,
-	.get_cq_event = mlx5_glue_get_cq_event,
-	.ack_cq_events = mlx5_glue_ack_cq_events,
-	.create_rwq_ind_table = mlx5_glue_create_rwq_ind_table,
-	.destroy_rwq_ind_table = mlx5_glue_destroy_rwq_ind_table,
-	.create_wq = mlx5_glue_create_wq,
-	.destroy_wq = mlx5_glue_destroy_wq,
-	.modify_wq = mlx5_glue_modify_wq,
-	.create_flow = mlx5_glue_create_flow,
-	.destroy_flow = mlx5_glue_destroy_flow,
-	.destroy_flow_action = mlx5_glue_destroy_flow_action,
-	.create_qp = mlx5_glue_create_qp,
-	.create_qp_ex = mlx5_glue_create_qp_ex,
-	.destroy_qp = mlx5_glue_destroy_qp,
-	.modify_qp = mlx5_glue_modify_qp,
-	.reg_mr = mlx5_glue_reg_mr,
-	.dereg_mr = mlx5_glue_dereg_mr,
-	.create_counter_set = mlx5_glue_create_counter_set,
-	.destroy_counter_set = mlx5_glue_destroy_counter_set,
-	.describe_counter_set = mlx5_glue_describe_counter_set,
-	.query_counter_set = mlx5_glue_query_counter_set,
-	.create_counters = mlx5_glue_create_counters,
-	.destroy_counters = mlx5_glue_destroy_counters,
-	.attach_counters = mlx5_glue_attach_counters,
-	.query_counters = mlx5_glue_query_counters,
-	.ack_async_event = mlx5_glue_ack_async_event,
-	.get_async_event = mlx5_glue_get_async_event,
-	.port_state_str = mlx5_glue_port_state_str,
-	.cq_ex_to_cq = mlx5_glue_cq_ex_to_cq,
-	.dr_create_flow_action_dest_flow_tbl =
-		mlx5_glue_dr_create_flow_action_dest_flow_tbl,
-	.dr_create_flow_action_dest_port =
-		mlx5_glue_dr_create_flow_action_dest_port,
-	.dr_create_flow_action_drop =
-		mlx5_glue_dr_create_flow_action_drop,
-	.dr_create_flow_action_push_vlan =
-		mlx5_glue_dr_create_flow_action_push_vlan,
-	.dr_create_flow_action_pop_vlan =
-		mlx5_glue_dr_create_flow_action_pop_vlan,
-	.dr_create_flow_tbl = mlx5_glue_dr_create_flow_tbl,
-	.dr_destroy_flow_tbl = mlx5_glue_dr_destroy_flow_tbl,
-	.dr_create_domain = mlx5_glue_dr_create_domain,
-	.dr_destroy_domain = mlx5_glue_dr_destroy_domain,
-	.dv_create_cq = mlx5_glue_dv_create_cq,
-	.dv_create_wq = mlx5_glue_dv_create_wq,
-	.dv_query_device = mlx5_glue_dv_query_device,
-	.dv_set_context_attr = mlx5_glue_dv_set_context_attr,
-	.dv_init_obj = mlx5_glue_dv_init_obj,
-	.dv_create_qp = mlx5_glue_dv_create_qp,
-	.dv_create_flow_matcher = mlx5_glue_dv_create_flow_matcher,
-	.dv_create_flow = mlx5_glue_dv_create_flow,
-	.dv_create_flow_action_counter =
-		mlx5_glue_dv_create_flow_action_counter,
-	.dv_create_flow_action_dest_ibv_qp =
-		mlx5_glue_dv_create_flow_action_dest_ibv_qp,
-	.dv_create_flow_action_dest_devx_tir =
-		mlx5_glue_dv_create_flow_action_dest_devx_tir,
-	.dv_create_flow_action_modify_header =
-		mlx5_glue_dv_create_flow_action_modify_header,
-	.dv_create_flow_action_packet_reformat =
-		mlx5_glue_dv_create_flow_action_packet_reformat,
-	.dv_create_flow_action_tag =  mlx5_glue_dv_create_flow_action_tag,
-	.dv_create_flow_action_meter = mlx5_glue_dv_create_flow_action_meter,
-	.dv_modify_flow_action_meter = mlx5_glue_dv_modify_flow_action_meter,
-	.dv_destroy_flow = mlx5_glue_dv_destroy_flow,
-	.dv_destroy_flow_matcher = mlx5_glue_dv_destroy_flow_matcher,
-	.dv_open_device = mlx5_glue_dv_open_device,
-	.devx_obj_create = mlx5_glue_devx_obj_create,
-	.devx_obj_destroy = mlx5_glue_devx_obj_destroy,
-	.devx_obj_query = mlx5_glue_devx_obj_query,
-	.devx_obj_modify = mlx5_glue_devx_obj_modify,
-	.devx_general_cmd = mlx5_glue_devx_general_cmd,
-	.devx_create_cmd_comp = mlx5_glue_devx_create_cmd_comp,
-	.devx_destroy_cmd_comp = mlx5_glue_devx_destroy_cmd_comp,
-	.devx_obj_query_async = mlx5_glue_devx_obj_query_async,
-	.devx_get_async_cmd_comp = mlx5_glue_devx_get_async_cmd_comp,
-	.devx_umem_reg = mlx5_glue_devx_umem_reg,
-	.devx_umem_dereg = mlx5_glue_devx_umem_dereg,
-	.devx_qp_query = mlx5_glue_devx_qp_query,
-	.devx_port_query = mlx5_glue_devx_port_query,
-	.dr_dump_domain = mlx5_glue_dr_dump_domain,
-};
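
Note on the glue table removed above: each entry wraps one rdma-core
call and compiles to a stub when the matching HAVE_* macro is absent.
The stubs do not report that in one way: some return ENOTSUP, some
return -ENOTSUP, some set errno to ENOTSUP, and a few set errno to
-ENOTSUP. Below is a minimal standalone sketch of the same pattern with
a single convention (pointer-returning stubs set a positive errno,
int-returning stubs return a negative errno). All sketch_* names and the
HAVE_SKETCH_BACKEND macro are hypothetical and are not part of this
patch.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical opaque object; stands in for a verbs/DevX handle. */
struct sketch_obj {
	int id;
};

/* Hypothetical glue table: one function pointer per wrapped call. */
struct sketch_glue {
	const char *version;
	struct sketch_obj *(*obj_create)(int id);
	int (*obj_destroy)(struct sketch_obj *obj);
};

static struct sketch_obj *
sketch_glue_obj_create(int id)
{
#ifdef HAVE_SKETCH_BACKEND
	static struct sketch_obj obj;

	obj.id = id;
	return &obj;
#else
	(void)id;
	errno = ENOTSUP; /* errno values are positive. */
	return NULL;
#endif
}

static int
sketch_glue_obj_destroy(struct sketch_obj *obj)
{
	(void)obj;
#ifdef HAVE_SKETCH_BACKEND
	return 0;
#else
	return -ENOTSUP; /* Negative errno as the return value. */
#endif
}

static const struct sketch_glue sketch_glue_tbl = {
	.version = "sketch",
	.obj_create = sketch_glue_obj_create,
	.obj_destroy = sketch_glue_obj_destroy,
};

const struct sketch_glue *sketch_glue = &sketch_glue_tbl;

/* Caller side: check the pointer first, then read errno. */
int
sketch_probe(void)
{
	struct sketch_obj *obj = sketch_glue->obj_create(1);

	if (obj == NULL)
		return -errno;
	return sketch_glue->obj_destroy(obj);
}
```

Built without HAVE_SKETCH_BACKEND, sketch_probe() reports the missing
backend as -ENOTSUP through both paths, so callers need only one check.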
diff --git a/drivers/net/mlx5/mlx5_glue.h b/drivers/net/mlx5/mlx5_glue.h
deleted file mode 100644
index 6771a18..0000000
--- a/drivers/net/mlx5/mlx5_glue.h
+++ /dev/null
@@ -1,264 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause
- * Copyright 2018 6WIND S.A.
- * Copyright 2018 Mellanox Technologies, Ltd
- */
-
-#ifndef MLX5_GLUE_H_
-#define MLX5_GLUE_H_
-
-#include <stddef.h>
-#include <stdint.h>
-
-#include "rte_byteorder.h"
-
-/* Verbs headers do not support -pedantic. */
-#ifdef PEDANTIC
-#pragma GCC diagnostic ignored "-Wpedantic"
-#endif
-#include <infiniband/mlx5dv.h>
-#include <infiniband/verbs.h>
-#ifdef PEDANTIC
-#pragma GCC diagnostic error "-Wpedantic"
-#endif
-
-#ifndef MLX5_GLUE_VERSION
-#define MLX5_GLUE_VERSION ""
-#endif
-
-#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V42
-struct ibv_counter_set;
-struct ibv_counter_set_data;
-struct ibv_counter_set_description;
-struct ibv_counter_set_init_attr;
-struct ibv_query_counter_set_attr;
-#endif
-
-#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V45
-struct ibv_counters;
-struct ibv_counters_init_attr;
-struct ibv_counter_attach_attr;
-#endif
-
-#ifndef HAVE_IBV_DEVICE_TUNNEL_SUPPORT
-struct mlx5dv_qp_init_attr;
-#endif
-
-#ifndef HAVE_IBV_DEVICE_STRIDING_RQ_SUPPORT
-struct mlx5dv_wq_init_attr;
-#endif
-
-#ifndef HAVE_IBV_FLOW_DV_SUPPORT
-struct mlx5dv_flow_matcher;
-struct mlx5dv_flow_matcher_attr;
-struct mlx5dv_flow_action_attr;
-struct mlx5dv_flow_match_parameters;
-struct mlx5dv_dr_flow_meter_attr;
-struct ibv_flow_action;
-enum mlx5dv_flow_action_packet_reformat_type { packet_reformat_type = 0, };
-enum mlx5dv_flow_table_type { flow_table_type = 0, };
-#endif
-
-#ifndef HAVE_IBV_FLOW_DEVX_COUNTERS
-#define MLX5DV_FLOW_ACTION_COUNTERS_DEVX 0
-#endif
-
-#ifndef HAVE_IBV_DEVX_OBJ
-struct mlx5dv_devx_obj;
-struct mlx5dv_devx_umem { uint32_t umem_id; };
-#endif
-
-#ifndef HAVE_IBV_DEVX_ASYNC
-struct mlx5dv_devx_cmd_comp;
-struct mlx5dv_devx_async_cmd_hdr;
-#endif
-
-#ifndef HAVE_MLX5DV_DR
-enum  mlx5dv_dr_domain_type { unused, };
-struct mlx5dv_dr_domain;
-#endif
-
-#ifndef HAVE_MLX5DV_DR_DEVX_PORT
-struct mlx5dv_devx_port;
-#endif
-
-#ifndef HAVE_MLX5_DR_CREATE_ACTION_FLOW_METER
-struct mlx5dv_dr_flow_meter_attr;
-#endif
-
-/* LIB_GLUE_VERSION must be updated every time this structure is modified. */
-struct mlx5_glue {
-	const char *version;
-	int (*fork_init)(void);
-	struct ibv_pd *(*alloc_pd)(struct ibv_context *context);
-	int (*dealloc_pd)(struct ibv_pd *pd);
-	struct ibv_device **(*get_device_list)(int *num_devices);
-	void (*free_device_list)(struct ibv_device **list);
-	struct ibv_context *(*open_device)(struct ibv_device *device);
-	int (*close_device)(struct ibv_context *context);
-	int (*query_device)(struct ibv_context *context,
-			    struct ibv_device_attr *device_attr);
-	int (*query_device_ex)(struct ibv_context *context,
-			       const struct ibv_query_device_ex_input *input,
-			       struct ibv_device_attr_ex *attr);
-	int (*query_rt_values_ex)(struct ibv_context *context,
-			       struct ibv_values_ex *values);
-	int (*query_port)(struct ibv_context *context, uint8_t port_num,
-			  struct ibv_port_attr *port_attr);
-	struct ibv_comp_channel *(*create_comp_channel)
-		(struct ibv_context *context);
-	int (*destroy_comp_channel)(struct ibv_comp_channel *channel);
-	struct ibv_cq *(*create_cq)(struct ibv_context *context, int cqe,
-				    void *cq_context,
-				    struct ibv_comp_channel *channel,
-				    int comp_vector);
-	int (*destroy_cq)(struct ibv_cq *cq);
-	int (*get_cq_event)(struct ibv_comp_channel *channel,
-			    struct ibv_cq **cq, void **cq_context);
-	void (*ack_cq_events)(struct ibv_cq *cq, unsigned int nevents);
-	struct ibv_rwq_ind_table *(*create_rwq_ind_table)
-		(struct ibv_context *context,
-		 struct ibv_rwq_ind_table_init_attr *init_attr);
-	int (*destroy_rwq_ind_table)(struct ibv_rwq_ind_table *rwq_ind_table);
-	struct ibv_wq *(*create_wq)(struct ibv_context *context,
-				    struct ibv_wq_init_attr *wq_init_attr);
-	int (*destroy_wq)(struct ibv_wq *wq);
-	int (*modify_wq)(struct ibv_wq *wq, struct ibv_wq_attr *wq_attr);
-	struct ibv_flow *(*create_flow)(struct ibv_qp *qp,
-					struct ibv_flow_attr *flow);
-	int (*destroy_flow)(struct ibv_flow *flow_id);
-	int (*destroy_flow_action)(void *action);
-	struct ibv_qp *(*create_qp)(struct ibv_pd *pd,
-				    struct ibv_qp_init_attr *qp_init_attr);
-	struct ibv_qp *(*create_qp_ex)
-		(struct ibv_context *context,
-		 struct ibv_qp_init_attr_ex *qp_init_attr_ex);
-	int (*destroy_qp)(struct ibv_qp *qp);
-	int (*modify_qp)(struct ibv_qp *qp, struct ibv_qp_attr *attr,
-			 int attr_mask);
-	struct ibv_mr *(*reg_mr)(struct ibv_pd *pd, void *addr,
-				 size_t length, int access);
-	int (*dereg_mr)(struct ibv_mr *mr);
-	struct ibv_counter_set *(*create_counter_set)
-		(struct ibv_context *context,
-		 struct ibv_counter_set_init_attr *init_attr);
-	int (*destroy_counter_set)(struct ibv_counter_set *cs);
-	int (*describe_counter_set)
-		(struct ibv_context *context,
-		 uint16_t counter_set_id,
-		 struct ibv_counter_set_description *cs_desc);
-	int (*query_counter_set)(struct ibv_query_counter_set_attr *query_attr,
-				 struct ibv_counter_set_data *cs_data);
-	struct ibv_counters *(*create_counters)
-		(struct ibv_context *context,
-		 struct ibv_counters_init_attr *init_attr);
-	int (*destroy_counters)(struct ibv_counters *counters);
-	int (*attach_counters)(struct ibv_counters *counters,
-			       struct ibv_counter_attach_attr *attr,
-			       struct ibv_flow *flow);
-	int (*query_counters)(struct ibv_counters *counters,
-			      uint64_t *counters_value,
-			      uint32_t ncounters,
-			      uint32_t flags);
-	void (*ack_async_event)(struct ibv_async_event *event);
-	int (*get_async_event)(struct ibv_context *context,
-			       struct ibv_async_event *event);
-	const char *(*port_state_str)(enum ibv_port_state port_state);
-	struct ibv_cq *(*cq_ex_to_cq)(struct ibv_cq_ex *cq);
-	void *(*dr_create_flow_action_dest_flow_tbl)(void *tbl);
-	void *(*dr_create_flow_action_dest_port)(void *domain,
-						 uint32_t port);
-	void *(*dr_create_flow_action_drop)();
-	void *(*dr_create_flow_action_push_vlan)
-					(struct mlx5dv_dr_domain *domain,
-					 rte_be32_t vlan_tag);
-	void *(*dr_create_flow_action_pop_vlan)();
-	void *(*dr_create_flow_tbl)(void *domain, uint32_t level);
-	int (*dr_destroy_flow_tbl)(void *tbl);
-	void *(*dr_create_domain)(struct ibv_context *ctx,
-				  enum mlx5dv_dr_domain_type domain);
-	int (*dr_destroy_domain)(void *domain);
-	struct ibv_cq_ex *(*dv_create_cq)
-		(struct ibv_context *context,
-		 struct ibv_cq_init_attr_ex *cq_attr,
-		 struct mlx5dv_cq_init_attr *mlx5_cq_attr);
-	struct ibv_wq *(*dv_create_wq)
-		(struct ibv_context *context,
-		 struct ibv_wq_init_attr *wq_attr,
-		 struct mlx5dv_wq_init_attr *mlx5_wq_attr);
-	int (*dv_query_device)(struct ibv_context *ctx_in,
-			       struct mlx5dv_context *attrs_out);
-	int (*dv_set_context_attr)(struct ibv_context *ibv_ctx,
-				   enum mlx5dv_set_ctx_attr_type type,
-				   void *attr);
-	int (*dv_init_obj)(struct mlx5dv_obj *obj, uint64_t obj_type);
-	struct ibv_qp *(*dv_create_qp)
-		(struct ibv_context *context,
-		 struct ibv_qp_init_attr_ex *qp_init_attr_ex,
-		 struct mlx5dv_qp_init_attr *dv_qp_init_attr);
-	void *(*dv_create_flow_matcher)
-		(struct ibv_context *context,
-		 struct mlx5dv_flow_matcher_attr *matcher_attr,
-		 void *tbl);
-	void *(*dv_create_flow)(void *matcher, void *match_value,
-			  size_t num_actions, void *actions[]);
-	void *(*dv_create_flow_action_counter)(void *obj, uint32_t  offset);
-	void *(*dv_create_flow_action_dest_ibv_qp)(void *qp);
-	void *(*dv_create_flow_action_dest_devx_tir)(void *tir);
-	void *(*dv_create_flow_action_modify_header)
-		(struct ibv_context *ctx, enum mlx5dv_flow_table_type ft_type,
-		 void *domain, uint64_t flags, size_t actions_sz,
-		 uint64_t actions[]);
-	void *(*dv_create_flow_action_packet_reformat)
-		(struct ibv_context *ctx,
-		 enum mlx5dv_flow_action_packet_reformat_type reformat_type,
-		 enum mlx5dv_flow_table_type ft_type,
-		 struct mlx5dv_dr_domain *domain,
-		 uint32_t flags, size_t data_sz, void *data);
-	void *(*dv_create_flow_action_tag)(uint32_t tag);
-	void *(*dv_create_flow_action_meter)
-		(struct mlx5dv_dr_flow_meter_attr *attr);
-	int (*dv_modify_flow_action_meter)(void *action,
-		struct mlx5dv_dr_flow_meter_attr *attr, uint64_t modify_bits);
-	int (*dv_destroy_flow)(void *flow);
-	int (*dv_destroy_flow_matcher)(void *matcher);
-	struct ibv_context *(*dv_open_device)(struct ibv_device *device);
-	struct mlx5dv_devx_obj *(*devx_obj_create)
-					(struct ibv_context *ctx,
-					 const void *in, size_t inlen,
-					 void *out, size_t outlen);
-	int (*devx_obj_destroy)(struct mlx5dv_devx_obj *obj);
-	int (*devx_obj_query)(struct mlx5dv_devx_obj *obj,
-			      const void *in, size_t inlen,
-			      void *out, size_t outlen);
-	int (*devx_obj_modify)(struct mlx5dv_devx_obj *obj,
-			       const void *in, size_t inlen,
-			       void *out, size_t outlen);
-	int (*devx_general_cmd)(struct ibv_context *context,
-				const void *in, size_t inlen,
-				void *out, size_t outlen);
-	struct mlx5dv_devx_cmd_comp *(*devx_create_cmd_comp)
-					(struct ibv_context *context);
-	void (*devx_destroy_cmd_comp)(struct mlx5dv_devx_cmd_comp *cmd_comp);
-	int (*devx_obj_query_async)(struct mlx5dv_devx_obj *obj,
-				    const void *in, size_t inlen,
-				    size_t outlen, uint64_t wr_id,
-				    struct mlx5dv_devx_cmd_comp *cmd_comp);
-	int (*devx_get_async_cmd_comp)(struct mlx5dv_devx_cmd_comp *cmd_comp,
-				       struct mlx5dv_devx_async_cmd_hdr *resp,
-				       size_t cmd_resp_len);
-	struct mlx5dv_devx_umem *(*devx_umem_reg)(struct ibv_context *context,
-						  void *addr, size_t size,
-						  uint32_t access);
-	int (*devx_umem_dereg)(struct mlx5dv_devx_umem *dv_devx_umem);
-	int (*devx_qp_query)(struct ibv_qp *qp,
-			     const void *in, size_t inlen,
-			     void *out, size_t outlen);
-	int (*devx_port_query)(struct ibv_context *ctx,
-			       uint32_t port_num,
-			       struct mlx5dv_devx_port *mlx5_devx_port);
-	int (*dr_dump_domain)(FILE *file, void *domain);
-};
-
-const struct mlx5_glue *mlx5_glue;
-
-#endif /* MLX5_GLUE_H_ */
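
Note on the last declaration in the header removed above: without a
storage-class specifier, "const struct mlx5_glue *mlx5_glue;" is a
tentative definition in every file that includes the header, and
toolchains that default to -fno-common reject the resulting duplicate
symbols at link time. Whether the relocated header keeps this form is
not visible in this hunk. A minimal sketch of the usual split, with
hypothetical sketch_hdr_* names:

```c
#include <stddef.h>

/* --- what would live in the header --- */
struct sketch_hdr_glue {
	const char *version;
};

/* Declaration only: no storage is reserved in including files. */
extern const struct sketch_hdr_glue *sketch_hdr_glue;

/* --- what would live in exactly one .c file --- */
const struct sketch_hdr_glue *sketch_hdr_glue =
	&(const struct sketch_hdr_glue){
		.version = "sketch",
	};
```

The file-scope compound literal has static storage duration, which is
the same construct the removed mlx5_glue.c uses for its table.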
diff --git a/drivers/net/mlx5/mlx5_mac.c b/drivers/net/mlx5/mlx5_mac.c
index 7bdaa2a..a646b90 100644
--- a/drivers/net/mlx5/mlx5_mac.c
+++ b/drivers/net/mlx5/mlx5_mac.c
@@ -27,10 +27,10 @@
 #include <rte_ethdev_driver.h>
 #include <rte_common.h>
 
+#include "mlx5_defs.h"
 #include "mlx5.h"
 #include "mlx5_utils.h"
 #include "mlx5_rxtx.h"
-#include "mlx5_defs.h"
 
 /**
  * Get MAC address by querying netdevice.
diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index 0d549b6..b1cd9f7 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -17,10 +17,11 @@
 #include <rte_rwlock.h>
 #include <rte_bus_pci.h>
 
+#include <mlx5_glue.h>
+
 #include "mlx5.h"
 #include "mlx5_mr.h"
 #include "mlx5_rxtx.h"
-#include "mlx5_glue.h"
 
 struct mr_find_contig_memsegs_data {
 	uintptr_t addr;
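
For readers following the mlx5_prm.h removal in the next hunk: the
MLX5_CQE_* accessors there all decode the single op_own byte of a CQE.
Below is a standalone sketch of the bit layout those macros imply (owner
in bit 0, solicited event in bit 1, format in bits 2-3, opcode in bits
4-7). The SKETCH_* names are hypothetical. The owner mask 0x1, the
format mask 0xc and the invalid opcode 0xf are assumptions, since the
real values come from MLX5_CQE_OWNER_MASK, MLX5E_CQE_FORMAT_MASK and
MLX5_CQE_INVALID, which are defined outside this hunk.

```c
#include <stdint.h>

/* Assumed values; the patch takes these from other definitions. */
#define SKETCH_CQE_OWNER_MASK 0x1u
#define SKETCH_CQE_FORMAT_MASK 0xcu
#define SKETCH_CQE_INVALID 0xfu

/* Same shape as the MLX5_CQE_* accessors, applied to a plain byte. */
#define SKETCH_CQE_OWNER(op_own) ((op_own) & SKETCH_CQE_OWNER_MASK)
#define SKETCH_CQE_SE(op_own) (((op_own) >> 1) & 1)
#define SKETCH_CQE_FORMAT(op_own) \
	(((op_own) & SKETCH_CQE_FORMAT_MASK) >> 2)
#define SKETCH_CQE_OPCODE(op_own) (((op_own) & 0xf0) >> 4)
#define SKETCH_CQE_INVALIDATE (SKETCH_CQE_INVALID << 4)

struct sketch_cqe_fields {
	uint8_t owner;
	uint8_t se;
	uint8_t format;
	uint8_t opcode;
};

/* Split one op_own byte into its four fields. */
struct sketch_cqe_fields
sketch_cqe_decode(uint8_t op_own)
{
	struct sketch_cqe_fields f = {
		.owner = (uint8_t)SKETCH_CQE_OWNER(op_own),
		.se = (uint8_t)SKETCH_CQE_SE(op_own),
		.format = (uint8_t)SKETCH_CQE_FORMAT(op_own),
		.opcode = (uint8_t)SKETCH_CQE_OPCODE(op_own),
	};

	return f;
}
```

With op_own = 0x2d (binary 0010 1101) this yields owner 1, se 0,
format 3 and opcode 2. Under the assumed values, SKETCH_CQE_INVALIDATE
is 0xf0, which decodes back to opcode 0xf.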
diff --git a/drivers/net/mlx5/mlx5_prm.h b/drivers/net/mlx5/mlx5_prm.h
deleted file mode 100644
index 8a67025..0000000
--- a/drivers/net/mlx5/mlx5_prm.h
+++ /dev/null
@@ -1,1888 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause
- * Copyright 2016 6WIND S.A.
- * Copyright 2016 Mellanox Technologies, Ltd
- */
-
-#ifndef RTE_PMD_MLX5_PRM_H_
-#define RTE_PMD_MLX5_PRM_H_
-
-#include <assert.h>
-
-/* Verbs header. */
-/* ISO C doesn't support unnamed structs/unions, disabling -pedantic. */
-#ifdef PEDANTIC
-#pragma GCC diagnostic ignored "-Wpedantic"
-#endif
-#include <infiniband/mlx5dv.h>
-#ifdef PEDANTIC
-#pragma GCC diagnostic error "-Wpedantic"
-#endif
-
-#include <rte_vect.h>
-#include "mlx5_autoconf.h"
-
-/* RSS hash key size. */
-#define MLX5_RSS_HASH_KEY_LEN 40
-
-/* Get CQE owner bit. */
-#define MLX5_CQE_OWNER(op_own) ((op_own) & MLX5_CQE_OWNER_MASK)
-
-/* Get CQE format. */
-#define MLX5_CQE_FORMAT(op_own) (((op_own) & MLX5E_CQE_FORMAT_MASK) >> 2)
-
-/* Get CQE opcode. */
-#define MLX5_CQE_OPCODE(op_own) (((op_own) & 0xf0) >> 4)
-
-/* Get CQE solicited event. */
-#define MLX5_CQE_SE(op_own) (((op_own) >> 1) & 1)
-
-/* Invalidate a CQE. */
-#define MLX5_CQE_INVALIDATE (MLX5_CQE_INVALID << 4)
-
-/* WQE Segment sizes in bytes. */
-#define MLX5_WSEG_SIZE 16u
-#define MLX5_WQE_CSEG_SIZE sizeof(struct mlx5_wqe_cseg)
-#define MLX5_WQE_DSEG_SIZE sizeof(struct mlx5_wqe_dseg)
-#define MLX5_WQE_ESEG_SIZE sizeof(struct mlx5_wqe_eseg)
-
-/* WQE/WQEBB size in bytes. */
-#define MLX5_WQE_SIZE sizeof(struct mlx5_wqe)
-
-/*
- * Max size of a WQE session.
- * Absolute maximum size is 63 (MLX5_DSEG_MAX) segments,
- * the WQE size field in Control Segment is 6 bits wide.
- */
-#define MLX5_WQE_SIZE_MAX (60 * MLX5_WSEG_SIZE)
-
-/*
- * Default minimum number of Tx queues for inlining packets.
- * If there are less queues as specified we assume we have
- * no enough CPU resources (cycles) to perform inlining,
- * the PCIe throughput is not supposed as bottleneck and
- * inlining is disabled.
- */
-#define MLX5_INLINE_MAX_TXQS 8u
-#define MLX5_INLINE_MAX_TXQS_BLUEFIELD 16u
-
-/*
- * Default packet length threshold to be inlined with
- * enhanced MPW. If packet length exceeds the threshold
- * the data are not inlined. Should be aligned in WQEBB
- * boundary with accounting the title Control and Ethernet
- * segments.
- */
-#define MLX5_EMPW_DEF_INLINE_LEN (4u * MLX5_WQE_SIZE + \
-				  MLX5_DSEG_MIN_INLINE_SIZE)
-/*
- * Maximal inline data length sent with enhanced MPW.
- * Is based on maximal WQE size.
- */
-#define MLX5_EMPW_MAX_INLINE_LEN (MLX5_WQE_SIZE_MAX - \
-				  MLX5_WQE_CSEG_SIZE - \
-				  MLX5_WQE_ESEG_SIZE - \
-				  MLX5_WQE_DSEG_SIZE + \
-				  MLX5_DSEG_MIN_INLINE_SIZE)
-/*
- * Minimal amount of packets to be sent with EMPW.
- * This limits the minimal required size of sent EMPW.
- * If there are no enough resources to built minimal
- * EMPW the sending loop exits.
- */
-#define MLX5_EMPW_MIN_PACKETS (2u + 3u * 4u)
-/*
- * Maximal amount of packets to be sent with EMPW.
- * This value is not recommended to exceed MLX5_TX_COMP_THRESH,
- * otherwise there might be up to MLX5_EMPW_MAX_PACKETS mbufs
- * without CQE generation request, being multiplied by
- * MLX5_TX_COMP_MAX_CQE it may cause significant latency
- * in tx burst routine at the moment of freeing multiple mbufs.
- */
-#define MLX5_EMPW_MAX_PACKETS MLX5_TX_COMP_THRESH
-#define MLX5_MPW_MAX_PACKETS 6
-#define MLX5_MPW_INLINE_MAX_PACKETS 2
-
-/*
- * Default packet length threshold to be inlined with
- * ordinary SEND. Inlining saves the MR key search
- * and extra PCIe data fetch transaction, but eats the
- * CPU cycles.
- */
-#define MLX5_SEND_DEF_INLINE_LEN (5U * MLX5_WQE_SIZE + \
-				  MLX5_ESEG_MIN_INLINE_SIZE - \
-				  MLX5_WQE_CSEG_SIZE - \
-				  MLX5_WQE_ESEG_SIZE - \
-				  MLX5_WQE_DSEG_SIZE)
-/*
- * Maximal inline data length sent with ordinary SEND.
- * Is based on maximal WQE size.
- */
-#define MLX5_SEND_MAX_INLINE_LEN (MLX5_WQE_SIZE_MAX - \
-				  MLX5_WQE_CSEG_SIZE - \
-				  MLX5_WQE_ESEG_SIZE - \
-				  MLX5_WQE_DSEG_SIZE + \
-				  MLX5_ESEG_MIN_INLINE_SIZE)
-
-/* Missed in mlv5dv.h, should define here. */
-#define MLX5_OPCODE_ENHANCED_MPSW 0x29u
-
-/* CQE value to inform that VLAN is stripped. */
-#define MLX5_CQE_VLAN_STRIPPED (1u << 0)
-
-/* IPv4 options. */
-#define MLX5_CQE_RX_IP_EXT_OPTS_PACKET (1u << 1)
-
-/* IPv6 packet. */
-#define MLX5_CQE_RX_IPV6_PACKET (1u << 2)
-
-/* IPv4 packet. */
-#define MLX5_CQE_RX_IPV4_PACKET (1u << 3)
-
-/* TCP packet. */
-#define MLX5_CQE_RX_TCP_PACKET (1u << 4)
-
-/* UDP packet. */
-#define MLX5_CQE_RX_UDP_PACKET (1u << 5)
-
-/* IP is fragmented. */
-#define MLX5_CQE_RX_IP_FRAG_PACKET (1u << 7)
-
-/* L2 header is valid. */
-#define MLX5_CQE_RX_L2_HDR_VALID (1u << 8)
-
-/* L3 header is valid. */
-#define MLX5_CQE_RX_L3_HDR_VALID (1u << 9)
-
-/* L4 header is valid. */
-#define MLX5_CQE_RX_L4_HDR_VALID (1u << 10)
-
-/* Outer packet, 0 IPv4, 1 IPv6. */
-#define MLX5_CQE_RX_OUTER_PACKET (1u << 1)
-
-/* Tunnel packet bit in the CQE. */
-#define MLX5_CQE_RX_TUNNEL_PACKET (1u << 0)
-
-/* Mask for LRO push flag in the CQE lro_tcppsh_abort_dupack field. */
-#define MLX5_CQE_LRO_PUSH_MASK 0x40
-
-/* Mask for L4 type in the CQE hdr_type_etc field. */
-#define MLX5_CQE_L4_TYPE_MASK 0x70
-
-/* The bit index of L4 type in CQE hdr_type_etc field. */
-#define MLX5_CQE_L4_TYPE_SHIFT 0x4
-
-/* L4 type to indicate TCP packet without acknowledgment. */
-#define MLX5_L4_HDR_TYPE_TCP_EMPTY_ACK 0x3
-
-/* L4 type to indicate TCP packet with acknowledgment. */
-#define MLX5_L4_HDR_TYPE_TCP_WITH_ACL 0x4
-
-/* Inner L3 checksum offload (Tunneled packets only). */
-#define MLX5_ETH_WQE_L3_INNER_CSUM (1u << 4)
-
-/* Inner L4 checksum offload (Tunneled packets only). */
-#define MLX5_ETH_WQE_L4_INNER_CSUM (1u << 5)
-
-/* Outer L4 type is TCP. */
-#define MLX5_ETH_WQE_L4_OUTER_TCP  (0u << 5)
-
-/* Outer L4 type is UDP. */
-#define MLX5_ETH_WQE_L4_OUTER_UDP  (1u << 5)
-
-/* Outer L3 type is IPV4. */
-#define MLX5_ETH_WQE_L3_OUTER_IPV4 (0u << 4)
-
-/* Outer L3 type is IPV6. */
-#define MLX5_ETH_WQE_L3_OUTER_IPV6 (1u << 4)
-
-/* Inner L4 type is TCP. */
-#define MLX5_ETH_WQE_L4_INNER_TCP (0u << 1)
-
-/* Inner L4 type is UDP. */
-#define MLX5_ETH_WQE_L4_INNER_UDP (1u << 1)
-
-/* Inner L3 type is IPV4. */
-#define MLX5_ETH_WQE_L3_INNER_IPV4 (0u << 0)
-
-/* Inner L3 type is IPV6. */
-#define MLX5_ETH_WQE_L3_INNER_IPV6 (1u << 0)
-
-/* VLAN insertion flag. */
-#define MLX5_ETH_WQE_VLAN_INSERT (1u << 31)
-
-/* Data inline segment flag. */
-#define MLX5_ETH_WQE_DATA_INLINE (1u << 31)
-
-/* Is flow mark valid. */
-#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
-#define MLX5_FLOW_MARK_IS_VALID(val) ((val) & 0xffffff00)
-#else
-#define MLX5_FLOW_MARK_IS_VALID(val) ((val) & 0xffffff)
-#endif
-
-/* INVALID is used by packets matching no flow rules. */
-#define MLX5_FLOW_MARK_INVALID 0
-
-/* Maximum allowed value to mark a packet. */
-#define MLX5_FLOW_MARK_MAX 0xfffff0
-
-/* Default mark value used when none is provided. */
-#define MLX5_FLOW_MARK_DEFAULT 0xffffff
-
-/* Default mark mask for metadata legacy mode. */
-#define MLX5_FLOW_MARK_MASK 0xffffff
-
-/* Maximum number of DS in WQE. Limited by 6-bit field. */
-#define MLX5_DSEG_MAX 63
-
-/* The completion mode offset in the WQE control segment line 2. */
-#define MLX5_COMP_MODE_OFFSET 2
-
-/* Amount of data bytes in minimal inline data segment. */
-#define MLX5_DSEG_MIN_INLINE_SIZE 12u
-
-/* Amount of data bytes in minimal inline eth segment. */
-#define MLX5_ESEG_MIN_INLINE_SIZE 18u
-
-/* Amount of data bytes after eth data segment. */
-#define MLX5_ESEG_EXTRA_DATA_SIZE 32u
-
-/* The maximum log value of segments per RQ WQE. */
-#define MLX5_MAX_LOG_RQ_SEGS 5u
-
-/* The alignment needed for WQ buffer. */
-#define MLX5_WQE_BUF_ALIGNMENT 512
-
-/* Completion mode. */
-enum mlx5_completion_mode {
-	MLX5_COMP_ONLY_ERR = 0x0,
-	MLX5_COMP_ONLY_FIRST_ERR = 0x1,
-	MLX5_COMP_ALWAYS = 0x2,
-	MLX5_COMP_CQE_AND_EQE = 0x3,
-};
-
-/* MPW mode. */
-enum mlx5_mpw_mode {
-	MLX5_MPW_DISABLED,
-	MLX5_MPW,
-	MLX5_MPW_ENHANCED, /* Enhanced Multi-Packet Send WQE, a.k.a MPWv2. */
-};
-
-/* WQE Control segment. */
-struct mlx5_wqe_cseg {
-	uint32_t opcode;
-	uint32_t sq_ds;
-	uint32_t flags;
-	uint32_t misc;
-} __rte_packed __rte_aligned(MLX5_WSEG_SIZE);
-
-/* Header of data segment. Minimal size Data Segment */
-struct mlx5_wqe_dseg {
-	uint32_t bcount;
-	union {
-		uint8_t inline_data[MLX5_DSEG_MIN_INLINE_SIZE];
-		struct {
-			uint32_t lkey;
-			uint64_t pbuf;
-		} __rte_packed;
-	};
-} __rte_packed;
-
-/* Subset of struct WQE Ethernet Segment. */
-struct mlx5_wqe_eseg {
-	union {
-		struct {
-			uint32_t swp_offs;
-			uint8_t	cs_flags;
-			uint8_t	swp_flags;
-			uint16_t mss;
-			uint32_t metadata;
-			uint16_t inline_hdr_sz;
-			union {
-				uint16_t inline_data;
-				uint16_t vlan_tag;
-			};
-		} __rte_packed;
-		struct {
-			uint32_t offsets;
-			uint32_t flags;
-			uint32_t flow_metadata;
-			uint32_t inline_hdr;
-		} __rte_packed;
-	};
-} __rte_packed;
-
-/* The title WQEBB, header of WQE. */
-struct mlx5_wqe {
-	union {
-		struct mlx5_wqe_cseg cseg;
-		uint32_t ctrl[4];
-	};
-	struct mlx5_wqe_eseg eseg;
-	union {
-		struct mlx5_wqe_dseg dseg[2];
-		uint8_t data[MLX5_ESEG_EXTRA_DATA_SIZE];
-	};
-} __rte_packed;
-
-/* WQE for Multi-Packet RQ. */
-struct mlx5_wqe_mprq {
-	struct mlx5_wqe_srq_next_seg next_seg;
-	struct mlx5_wqe_data_seg dseg;
-};
-
-#define MLX5_MPRQ_LEN_MASK 0x000ffff
-#define MLX5_MPRQ_LEN_SHIFT 0
-#define MLX5_MPRQ_STRIDE_NUM_MASK 0x3fff0000
-#define MLX5_MPRQ_STRIDE_NUM_SHIFT 16
-#define MLX5_MPRQ_FILLER_MASK 0x80000000
-#define MLX5_MPRQ_FILLER_SHIFT 31
-
-#define MLX5_MPRQ_STRIDE_SHIFT_BYTE 2
-
-/* CQ element structure - should be equal to the cache line size */
-struct mlx5_cqe {
-#if (RTE_CACHE_LINE_SIZE == 128)
-	uint8_t padding[64];
-#endif
-	uint8_t pkt_info;
-	uint8_t rsvd0;
-	uint16_t wqe_id;
-	uint8_t lro_tcppsh_abort_dupack;
-	uint8_t lro_min_ttl;
-	uint16_t lro_tcp_win;
-	uint32_t lro_ack_seq_num;
-	uint32_t rx_hash_res;
-	uint8_t rx_hash_type;
-	uint8_t rsvd1[3];
-	uint16_t csum;
-	uint8_t rsvd2[6];
-	uint16_t hdr_type_etc;
-	uint16_t vlan_info;
-	uint8_t lro_num_seg;
-	uint8_t rsvd3[3];
-	uint32_t flow_table_metadata;
-	uint8_t rsvd4[4];
-	uint32_t byte_cnt;
-	uint64_t timestamp;
-	uint32_t sop_drop_qpn;
-	uint16_t wqe_counter;
-	uint8_t rsvd5;
-	uint8_t op_own;
-};
-
-/* Adding direct verbs to data-path. */
-
-/* CQ sequence number mask. */
-#define MLX5_CQ_SQN_MASK 0x3
-
-/* CQ sequence number index. */
-#define MLX5_CQ_SQN_OFFSET 28
-
-/* CQ doorbell index mask. */
-#define MLX5_CI_MASK 0xffffff
-
-/* CQ doorbell offset. */
-#define MLX5_CQ_ARM_DB 1
-
-/* CQ doorbell offset*/
-#define MLX5_CQ_DOORBELL 0x20
-
-/* CQE format value. */
-#define MLX5_COMPRESSED 0x3
-
-/* Action type of header modification. */
-enum {
-	MLX5_MODIFICATION_TYPE_SET = 0x1,
-	MLX5_MODIFICATION_TYPE_ADD = 0x2,
-	MLX5_MODIFICATION_TYPE_COPY = 0x3,
-};
-
-/* The field of packet to be modified. */
-enum mlx5_modification_field {
-	MLX5_MODI_OUT_NONE = -1,
-	MLX5_MODI_OUT_SMAC_47_16 = 1,
-	MLX5_MODI_OUT_SMAC_15_0,
-	MLX5_MODI_OUT_ETHERTYPE,
-	MLX5_MODI_OUT_DMAC_47_16,
-	MLX5_MODI_OUT_DMAC_15_0,
-	MLX5_MODI_OUT_IP_DSCP,
-	MLX5_MODI_OUT_TCP_FLAGS,
-	MLX5_MODI_OUT_TCP_SPORT,
-	MLX5_MODI_OUT_TCP_DPORT,
-	MLX5_MODI_OUT_IPV4_TTL,
-	MLX5_MODI_OUT_UDP_SPORT,
-	MLX5_MODI_OUT_UDP_DPORT,
-	MLX5_MODI_OUT_SIPV6_127_96,
-	MLX5_MODI_OUT_SIPV6_95_64,
-	MLX5_MODI_OUT_SIPV6_63_32,
-	MLX5_MODI_OUT_SIPV6_31_0,
-	MLX5_MODI_OUT_DIPV6_127_96,
-	MLX5_MODI_OUT_DIPV6_95_64,
-	MLX5_MODI_OUT_DIPV6_63_32,
-	MLX5_MODI_OUT_DIPV6_31_0,
-	MLX5_MODI_OUT_SIPV4,
-	MLX5_MODI_OUT_DIPV4,
-	MLX5_MODI_OUT_FIRST_VID,
-	MLX5_MODI_IN_SMAC_47_16 = 0x31,
-	MLX5_MODI_IN_SMAC_15_0,
-	MLX5_MODI_IN_ETHERTYPE,
-	MLX5_MODI_IN_DMAC_47_16,
-	MLX5_MODI_IN_DMAC_15_0,
-	MLX5_MODI_IN_IP_DSCP,
-	MLX5_MODI_IN_TCP_FLAGS,
-	MLX5_MODI_IN_TCP_SPORT,
-	MLX5_MODI_IN_TCP_DPORT,
-	MLX5_MODI_IN_IPV4_TTL,
-	MLX5_MODI_IN_UDP_SPORT,
-	MLX5_MODI_IN_UDP_DPORT,
-	MLX5_MODI_IN_SIPV6_127_96,
-	MLX5_MODI_IN_SIPV6_95_64,
-	MLX5_MODI_IN_SIPV6_63_32,
-	MLX5_MODI_IN_SIPV6_31_0,
-	MLX5_MODI_IN_DIPV6_127_96,
-	MLX5_MODI_IN_DIPV6_95_64,
-	MLX5_MODI_IN_DIPV6_63_32,
-	MLX5_MODI_IN_DIPV6_31_0,
-	MLX5_MODI_IN_SIPV4,
-	MLX5_MODI_IN_DIPV4,
-	MLX5_MODI_OUT_IPV6_HOPLIMIT,
-	MLX5_MODI_IN_IPV6_HOPLIMIT,
-	MLX5_MODI_META_DATA_REG_A,
-	MLX5_MODI_META_DATA_REG_B = 0x50,
-	MLX5_MODI_META_REG_C_0,
-	MLX5_MODI_META_REG_C_1,
-	MLX5_MODI_META_REG_C_2,
-	MLX5_MODI_META_REG_C_3,
-	MLX5_MODI_META_REG_C_4,
-	MLX5_MODI_META_REG_C_5,
-	MLX5_MODI_META_REG_C_6,
-	MLX5_MODI_META_REG_C_7,
-	MLX5_MODI_OUT_TCP_SEQ_NUM,
-	MLX5_MODI_IN_TCP_SEQ_NUM,
-	MLX5_MODI_OUT_TCP_ACK_NUM,
-	MLX5_MODI_IN_TCP_ACK_NUM = 0x5C,
-};
-
-/* Total number of metadata reg_c's. */
-#define MLX5_MREG_C_NUM (MLX5_MODI_META_REG_C_7 - MLX5_MODI_META_REG_C_0 + 1)
-
-enum modify_reg {
-	REG_NONE = 0,
-	REG_A,
-	REG_B,
-	REG_C_0,
-	REG_C_1,
-	REG_C_2,
-	REG_C_3,
-	REG_C_4,
-	REG_C_5,
-	REG_C_6,
-	REG_C_7,
-};
-
-/* Modification sub command. */
-struct mlx5_modification_cmd {
-	union {
-		uint32_t data0;
-		struct {
-			unsigned int length:5;
-			unsigned int rsvd0:3;
-			unsigned int offset:5;
-			unsigned int rsvd1:3;
-			unsigned int field:12;
-			unsigned int action_type:4;
-		};
-	};
-	union {
-		uint32_t data1;
-		uint8_t data[4];
-		struct {
-			unsigned int rsvd2:8;
-			unsigned int dst_offset:5;
-			unsigned int rsvd3:3;
-			unsigned int dst_field:12;
-			unsigned int rsvd4:4;
-		};
-	};
-};
-
-typedef uint32_t u32;
-typedef uint16_t u16;
-typedef uint8_t u8;
-
-#define __mlx5_nullp(typ) ((struct mlx5_ifc_##typ##_bits *)0)
-#define __mlx5_bit_sz(typ, fld) sizeof(__mlx5_nullp(typ)->fld)
-#define __mlx5_bit_off(typ, fld) ((unsigned int)(unsigned long) \
-				  (&(__mlx5_nullp(typ)->fld)))
-#define __mlx5_dw_bit_off(typ, fld) (32 - __mlx5_bit_sz(typ, fld) - \
-				    (__mlx5_bit_off(typ, fld) & 0x1f))
-#define __mlx5_dw_off(typ, fld) (__mlx5_bit_off(typ, fld) / 32)
-#define __mlx5_64_off(typ, fld) (__mlx5_bit_off(typ, fld) / 64)
-#define __mlx5_dw_mask(typ, fld) (__mlx5_mask(typ, fld) << \
-				  __mlx5_dw_bit_off(typ, fld))
-#define __mlx5_mask(typ, fld) ((u32)((1ull << __mlx5_bit_sz(typ, fld)) - 1))
-#define __mlx5_16_off(typ, fld) (__mlx5_bit_off(typ, fld) / 16)
-#define __mlx5_16_bit_off(typ, fld) (16 - __mlx5_bit_sz(typ, fld) - \
-				    (__mlx5_bit_off(typ, fld) & 0xf))
-#define __mlx5_mask16(typ, fld) ((u16)((1ull << __mlx5_bit_sz(typ, fld)) - 1))
-#define MLX5_ST_SZ_BYTES(typ) (sizeof(struct mlx5_ifc_##typ##_bits) / 8)
-#define MLX5_ST_SZ_DW(typ) (sizeof(struct mlx5_ifc_##typ##_bits) / 32)
-#define MLX5_BYTE_OFF(typ, fld) (__mlx5_bit_off(typ, fld) / 8)
-#define MLX5_ADDR_OF(typ, p, fld) ((char *)(p) + MLX5_BYTE_OFF(typ, fld))
-
-/* insert a value to a struct */
-#define MLX5_SET(typ, p, fld, v) \
-	do { \
-		u32 _v = v; \
-		*((__be32 *)(p) + __mlx5_dw_off(typ, fld)) = \
-		rte_cpu_to_be_32((rte_be_to_cpu_32(*((u32 *)(p) + \
-				  __mlx5_dw_off(typ, fld))) & \
-				  (~__mlx5_dw_mask(typ, fld))) | \
-				 (((_v) & __mlx5_mask(typ, fld)) << \
-				   __mlx5_dw_bit_off(typ, fld))); \
-	} while (0)
-
-#define MLX5_SET64(typ, p, fld, v) \
-	do { \
-		assert(__mlx5_bit_sz(typ, fld) == 64); \
-		*((__be64 *)(p) + __mlx5_64_off(typ, fld)) = \
-			rte_cpu_to_be_64(v); \
-	} while (0)
-
-#define MLX5_GET(typ, p, fld) \
-	((rte_be_to_cpu_32(*((__be32 *)(p) +\
-	__mlx5_dw_off(typ, fld))) >> __mlx5_dw_bit_off(typ, fld)) & \
-	__mlx5_mask(typ, fld))
-#define MLX5_GET16(typ, p, fld) \
-	((rte_be_to_cpu_16(*((__be16 *)(p) + \
-	  __mlx5_16_off(typ, fld))) >> __mlx5_16_bit_off(typ, fld)) & \
-	 __mlx5_mask16(typ, fld))
-#define MLX5_GET64(typ, p, fld) rte_be_to_cpu_64(*((__be64 *)(p) + \
-						   __mlx5_64_off(typ, fld)))
-#define MLX5_FLD_SZ_BYTES(typ, fld) (__mlx5_bit_sz(typ, fld) / 8)
-
-struct mlx5_ifc_fte_match_set_misc_bits {
-	u8 gre_c_present[0x1];
-	u8 reserved_at_1[0x1];
-	u8 gre_k_present[0x1];
-	u8 gre_s_present[0x1];
-	u8 source_vhci_port[0x4];
-	u8 source_sqn[0x18];
-	u8 reserved_at_20[0x10];
-	u8 source_port[0x10];
-	u8 outer_second_prio[0x3];
-	u8 outer_second_cfi[0x1];
-	u8 outer_second_vid[0xc];
-	u8 inner_second_prio[0x3];
-	u8 inner_second_cfi[0x1];
-	u8 inner_second_vid[0xc];
-	u8 outer_second_cvlan_tag[0x1];
-	u8 inner_second_cvlan_tag[0x1];
-	u8 outer_second_svlan_tag[0x1];
-	u8 inner_second_svlan_tag[0x1];
-	u8 reserved_at_64[0xc];
-	u8 gre_protocol[0x10];
-	u8 gre_key_h[0x18];
-	u8 gre_key_l[0x8];
-	u8 vxlan_vni[0x18];
-	u8 reserved_at_b8[0x8];
-	u8 geneve_vni[0x18];
-	u8 reserved_at_e4[0x7];
-	u8 geneve_oam[0x1];
-	u8 reserved_at_e0[0xc];
-	u8 outer_ipv6_flow_label[0x14];
-	u8 reserved_at_100[0xc];
-	u8 inner_ipv6_flow_label[0x14];
-	u8 reserved_at_120[0xa];
-	u8 geneve_opt_len[0x6];
-	u8 geneve_protocol_type[0x10];
-	u8 reserved_at_140[0xc0];
-};
-
-struct mlx5_ifc_ipv4_layout_bits {
-	u8 reserved_at_0[0x60];
-	u8 ipv4[0x20];
-};
-
-struct mlx5_ifc_ipv6_layout_bits {
-	u8 ipv6[16][0x8];
-};
-
-union mlx5_ifc_ipv6_layout_ipv4_layout_auto_bits {
-	struct mlx5_ifc_ipv6_layout_bits ipv6_layout;
-	struct mlx5_ifc_ipv4_layout_bits ipv4_layout;
-	u8 reserved_at_0[0x80];
-};
-
-struct mlx5_ifc_fte_match_set_lyr_2_4_bits {
-	u8 smac_47_16[0x20];
-	u8 smac_15_0[0x10];
-	u8 ethertype[0x10];
-	u8 dmac_47_16[0x20];
-	u8 dmac_15_0[0x10];
-	u8 first_prio[0x3];
-	u8 first_cfi[0x1];
-	u8 first_vid[0xc];
-	u8 ip_protocol[0x8];
-	u8 ip_dscp[0x6];
-	u8 ip_ecn[0x2];
-	u8 cvlan_tag[0x1];
-	u8 svlan_tag[0x1];
-	u8 frag[0x1];
-	u8 ip_version[0x4];
-	u8 tcp_flags[0x9];
-	u8 tcp_sport[0x10];
-	u8 tcp_dport[0x10];
-	u8 reserved_at_c0[0x20];
-	u8 udp_sport[0x10];
-	u8 udp_dport[0x10];
-	union mlx5_ifc_ipv6_layout_ipv4_layout_auto_bits src_ipv4_src_ipv6;
-	union mlx5_ifc_ipv6_layout_ipv4_layout_auto_bits dst_ipv4_dst_ipv6;
-};
-
-struct mlx5_ifc_fte_match_mpls_bits {
-	u8 mpls_label[0x14];
-	u8 mpls_exp[0x3];
-	u8 mpls_s_bos[0x1];
-	u8 mpls_ttl[0x8];
-};
-
-struct mlx5_ifc_fte_match_set_misc2_bits {
-	struct mlx5_ifc_fte_match_mpls_bits outer_first_mpls;
-	struct mlx5_ifc_fte_match_mpls_bits inner_first_mpls;
-	struct mlx5_ifc_fte_match_mpls_bits outer_first_mpls_over_gre;
-	struct mlx5_ifc_fte_match_mpls_bits outer_first_mpls_over_udp;
-	u8 metadata_reg_c_7[0x20];
-	u8 metadata_reg_c_6[0x20];
-	u8 metadata_reg_c_5[0x20];
-	u8 metadata_reg_c_4[0x20];
-	u8 metadata_reg_c_3[0x20];
-	u8 metadata_reg_c_2[0x20];
-	u8 metadata_reg_c_1[0x20];
-	u8 metadata_reg_c_0[0x20];
-	u8 metadata_reg_a[0x20];
-	u8 metadata_reg_b[0x20];
-	u8 reserved_at_1c0[0x40];
-};
-
-struct mlx5_ifc_fte_match_set_misc3_bits {
-	u8 inner_tcp_seq_num[0x20];
-	u8 outer_tcp_seq_num[0x20];
-	u8 inner_tcp_ack_num[0x20];
-	u8 outer_tcp_ack_num[0x20];
-	u8 reserved_at_auto1[0x8];
-	u8 outer_vxlan_gpe_vni[0x18];
-	u8 outer_vxlan_gpe_next_protocol[0x8];
-	u8 outer_vxlan_gpe_flags[0x8];
-	u8 reserved_at_a8[0x10];
-	u8 icmp_header_data[0x20];
-	u8 icmpv6_header_data[0x20];
-	u8 icmp_type[0x8];
-	u8 icmp_code[0x8];
-	u8 icmpv6_type[0x8];
-	u8 icmpv6_code[0x8];
-	u8 reserved_at_120[0x20];
-	u8 gtpu_teid[0x20];
-	u8 gtpu_msg_type[0x08];
-	u8 gtpu_msg_flags[0x08];
-	u8 reserved_at_170[0x90];
-};
-
-/* Flow matcher. */
-struct mlx5_ifc_fte_match_param_bits {
-	struct mlx5_ifc_fte_match_set_lyr_2_4_bits outer_headers;
-	struct mlx5_ifc_fte_match_set_misc_bits misc_parameters;
-	struct mlx5_ifc_fte_match_set_lyr_2_4_bits inner_headers;
-	struct mlx5_ifc_fte_match_set_misc2_bits misc_parameters_2;
-	struct mlx5_ifc_fte_match_set_misc3_bits misc_parameters_3;
-};
-
-enum {
-	MLX5_MATCH_CRITERIA_ENABLE_OUTER_BIT,
-	MLX5_MATCH_CRITERIA_ENABLE_MISC_BIT,
-	MLX5_MATCH_CRITERIA_ENABLE_INNER_BIT,
-	MLX5_MATCH_CRITERIA_ENABLE_MISC2_BIT,
-	MLX5_MATCH_CRITERIA_ENABLE_MISC3_BIT
-};
-
-enum {
-	MLX5_CMD_OP_QUERY_HCA_CAP = 0x100,
-	MLX5_CMD_OP_CREATE_MKEY = 0x200,
-	MLX5_CMD_OP_QUERY_NIC_VPORT_CONTEXT = 0x754,
-	MLX5_CMD_OP_ALLOC_TRANSPORT_DOMAIN = 0x816,
-	MLX5_CMD_OP_CREATE_TIR = 0x900,
-	MLX5_CMD_OP_CREATE_SQ = 0X904,
-	MLX5_CMD_OP_MODIFY_SQ = 0X905,
-	MLX5_CMD_OP_CREATE_RQ = 0x908,
-	MLX5_CMD_OP_MODIFY_RQ = 0x909,
-	MLX5_CMD_OP_CREATE_TIS = 0x912,
-	MLX5_CMD_OP_QUERY_TIS = 0x915,
-	MLX5_CMD_OP_CREATE_RQT = 0x916,
-	MLX5_CMD_OP_ALLOC_FLOW_COUNTER = 0x939,
-	MLX5_CMD_OP_QUERY_FLOW_COUNTER = 0x93b,
-};
-
-enum {
-	MLX5_MKC_ACCESS_MODE_MTT   = 0x1,
-};
-
-/* Flow counters. */
-struct mlx5_ifc_alloc_flow_counter_out_bits {
-	u8         status[0x8];
-	u8         reserved_at_8[0x18];
-	u8         syndrome[0x20];
-	u8         flow_counter_id[0x20];
-	u8         reserved_at_60[0x20];
-};
-
-struct mlx5_ifc_alloc_flow_counter_in_bits {
-	u8         opcode[0x10];
-	u8         reserved_at_10[0x10];
-	u8         reserved_at_20[0x10];
-	u8         op_mod[0x10];
-	u8         flow_counter_id[0x20];
-	u8         reserved_at_40[0x18];
-	u8         flow_counter_bulk[0x8];
-};
-
-struct mlx5_ifc_dealloc_flow_counter_out_bits {
-	u8         status[0x8];
-	u8         reserved_at_8[0x18];
-	u8         syndrome[0x20];
-	u8         reserved_at_40[0x40];
-};
-
-struct mlx5_ifc_dealloc_flow_counter_in_bits {
-	u8         opcode[0x10];
-	u8         reserved_at_10[0x10];
-	u8         reserved_at_20[0x10];
-	u8         op_mod[0x10];
-	u8         flow_counter_id[0x20];
-	u8         reserved_at_60[0x20];
-};
-
-struct mlx5_ifc_traffic_counter_bits {
-	u8         packets[0x40];
-	u8         octets[0x40];
-};
-
-struct mlx5_ifc_query_flow_counter_out_bits {
-	u8         status[0x8];
-	u8         reserved_at_8[0x18];
-	u8         syndrome[0x20];
-	u8         reserved_at_40[0x40];
-	struct mlx5_ifc_traffic_counter_bits flow_statistics[];
-};
-
-struct mlx5_ifc_query_flow_counter_in_bits {
-	u8         opcode[0x10];
-	u8         reserved_at_10[0x10];
-	u8         reserved_at_20[0x10];
-	u8         op_mod[0x10];
-	u8         reserved_at_40[0x20];
-	u8         mkey[0x20];
-	u8         address[0x40];
-	u8         clear[0x1];
-	u8         dump_to_memory[0x1];
-	u8         num_of_counters[0x1e];
-	u8         flow_counter_id[0x20];
-};
-
-struct mlx5_ifc_mkc_bits {
-	u8         reserved_at_0[0x1];
-	u8         free[0x1];
-	u8         reserved_at_2[0x1];
-	u8         access_mode_4_2[0x3];
-	u8         reserved_at_6[0x7];
-	u8         relaxed_ordering_write[0x1];
-	u8         reserved_at_e[0x1];
-	u8         small_fence_on_rdma_read_response[0x1];
-	u8         umr_en[0x1];
-	u8         a[0x1];
-	u8         rw[0x1];
-	u8         rr[0x1];
-	u8         lw[0x1];
-	u8         lr[0x1];
-	u8         access_mode_1_0[0x2];
-	u8         reserved_at_18[0x8];
-
-	u8         qpn[0x18];
-	u8         mkey_7_0[0x8];
-
-	u8         reserved_at_40[0x20];
-
-	u8         length64[0x1];
-	u8         bsf_en[0x1];
-	u8         sync_umr[0x1];
-	u8         reserved_at_63[0x2];
-	u8         expected_sigerr_count[0x1];
-	u8         reserved_at_66[0x1];
-	u8         en_rinval[0x1];
-	u8         pd[0x18];
-
-	u8         start_addr[0x40];
-
-	u8         len[0x40];
-
-	u8         bsf_octword_size[0x20];
-
-	u8         reserved_at_120[0x80];
-
-	u8         translations_octword_size[0x20];
-
-	u8         reserved_at_1c0[0x1b];
-	u8         log_page_size[0x5];
-
-	u8         reserved_at_1e0[0x20];
-};
-
-struct mlx5_ifc_create_mkey_out_bits {
-	u8         status[0x8];
-	u8         reserved_at_8[0x18];
-
-	u8         syndrome[0x20];
-
-	u8         reserved_at_40[0x8];
-	u8         mkey_index[0x18];
-
-	u8         reserved_at_60[0x20];
-};
-
-struct mlx5_ifc_create_mkey_in_bits {
-	u8         opcode[0x10];
-	u8         reserved_at_10[0x10];
-
-	u8         reserved_at_20[0x10];
-	u8         op_mod[0x10];
-
-	u8         reserved_at_40[0x20];
-
-	u8         pg_access[0x1];
-	u8         reserved_at_61[0x1f];
-
-	struct mlx5_ifc_mkc_bits memory_key_mkey_entry;
-
-	u8         reserved_at_280[0x80];
-
-	u8         translations_octword_actual_size[0x20];
-
-	u8         mkey_umem_id[0x20];
-
-	u8         mkey_umem_offset[0x40];
-
-	u8         reserved_at_380[0x500];
-
-	u8         klm_pas_mtt[][0x20];
-};
-
-enum {
-	MLX5_GET_HCA_CAP_OP_MOD_GENERAL_DEVICE = 0x0 << 1,
-	MLX5_GET_HCA_CAP_OP_MOD_ETHERNET_OFFLOAD_CAPS = 0x1 << 1,
-	MLX5_GET_HCA_CAP_OP_MOD_QOS_CAP = 0xc << 1,
-};
-
-enum {
-	MLX5_HCA_CAP_OPMOD_GET_MAX   = 0,
-	MLX5_HCA_CAP_OPMOD_GET_CUR   = 1,
-};
-
-enum {
-	MLX5_CAP_INLINE_MODE_L2,
-	MLX5_CAP_INLINE_MODE_VPORT_CONTEXT,
-	MLX5_CAP_INLINE_MODE_NOT_REQUIRED,
-};
-
-enum {
-	MLX5_INLINE_MODE_NONE,
-	MLX5_INLINE_MODE_L2,
-	MLX5_INLINE_MODE_IP,
-	MLX5_INLINE_MODE_TCP_UDP,
-	MLX5_INLINE_MODE_RESERVED4,
-	MLX5_INLINE_MODE_INNER_L2,
-	MLX5_INLINE_MODE_INNER_IP,
-	MLX5_INLINE_MODE_INNER_TCP_UDP,
-};
-
-/* HCA bit masks indicating which Flex parser protocols are already enabled. */
-#define MLX5_HCA_FLEX_IPV4_OVER_VXLAN_ENABLED (1UL << 0)
-#define MLX5_HCA_FLEX_IPV6_OVER_VXLAN_ENABLED (1UL << 1)
-#define MLX5_HCA_FLEX_IPV6_OVER_IP_ENABLED (1UL << 2)
-#define MLX5_HCA_FLEX_GENEVE_ENABLED (1UL << 3)
-#define MLX5_HCA_FLEX_CW_MPLS_OVER_GRE_ENABLED (1UL << 4)
-#define MLX5_HCA_FLEX_CW_MPLS_OVER_UDP_ENABLED (1UL << 5)
-#define MLX5_HCA_FLEX_P_BIT_VXLAN_GPE_ENABLED (1UL << 6)
-#define MLX5_HCA_FLEX_VXLAN_GPE_ENABLED (1UL << 7)
-#define MLX5_HCA_FLEX_ICMP_ENABLED (1UL << 8)
-#define MLX5_HCA_FLEX_ICMPV6_ENABLED (1UL << 9)
-
-struct mlx5_ifc_cmd_hca_cap_bits {
-	u8 reserved_at_0[0x30];
-	u8 vhca_id[0x10];
-	u8 reserved_at_40[0x40];
-	u8 log_max_srq_sz[0x8];
-	u8 log_max_qp_sz[0x8];
-	u8 reserved_at_90[0xb];
-	u8 log_max_qp[0x5];
-	u8 reserved_at_a0[0xb];
-	u8 log_max_srq[0x5];
-	u8 reserved_at_b0[0x10];
-	u8 reserved_at_c0[0x8];
-	u8 log_max_cq_sz[0x8];
-	u8 reserved_at_d0[0xb];
-	u8 log_max_cq[0x5];
-	u8 log_max_eq_sz[0x8];
-	u8 reserved_at_e8[0x2];
-	u8 log_max_mkey[0x6];
-	u8 reserved_at_f0[0x8];
-	u8 dump_fill_mkey[0x1];
-	u8 reserved_at_f9[0x3];
-	u8 log_max_eq[0x4];
-	u8 max_indirection[0x8];
-	u8 fixed_buffer_size[0x1];
-	u8 log_max_mrw_sz[0x7];
-	u8 force_teardown[0x1];
-	u8 reserved_at_111[0x1];
-	u8 log_max_bsf_list_size[0x6];
-	u8 umr_extended_translation_offset[0x1];
-	u8 null_mkey[0x1];
-	u8 log_max_klm_list_size[0x6];
-	u8 reserved_at_120[0xa];
-	u8 log_max_ra_req_dc[0x6];
-	u8 reserved_at_130[0xa];
-	u8 log_max_ra_res_dc[0x6];
-	u8 reserved_at_140[0xa];
-	u8 log_max_ra_req_qp[0x6];
-	u8 reserved_at_150[0xa];
-	u8 log_max_ra_res_qp[0x6];
-	u8 end_pad[0x1];
-	u8 cc_query_allowed[0x1];
-	u8 cc_modify_allowed[0x1];
-	u8 start_pad[0x1];
-	u8 cache_line_128byte[0x1];
-	u8 reserved_at_165[0xa];
-	u8 qcam_reg[0x1];
-	u8 gid_table_size[0x10];
-	u8 out_of_seq_cnt[0x1];
-	u8 vport_counters[0x1];
-	u8 retransmission_q_counters[0x1];
-	u8 debug[0x1];
-	u8 modify_rq_counter_set_id[0x1];
-	u8 rq_delay_drop[0x1];
-	u8 max_qp_cnt[0xa];
-	u8 pkey_table_size[0x10];
-	u8 vport_group_manager[0x1];
-	u8 vhca_group_manager[0x1];
-	u8 ib_virt[0x1];
-	u8 eth_virt[0x1];
-	u8 vnic_env_queue_counters[0x1];
-	u8 ets[0x1];
-	u8 nic_flow_table[0x1];
-	u8 eswitch_manager[0x1];
-	u8 device_memory[0x1];
-	u8 mcam_reg[0x1];
-	u8 pcam_reg[0x1];
-	u8 local_ca_ack_delay[0x5];
-	u8 port_module_event[0x1];
-	u8 enhanced_error_q_counters[0x1];
-	u8 ports_check[0x1];
-	u8 reserved_at_1b3[0x1];
-	u8 disable_link_up[0x1];
-	u8 beacon_led[0x1];
-	u8 port_type[0x2];
-	u8 num_ports[0x8];
-	u8 reserved_at_1c0[0x1];
-	u8 pps[0x1];
-	u8 pps_modify[0x1];
-	u8 log_max_msg[0x5];
-	u8 reserved_at_1c8[0x4];
-	u8 max_tc[0x4];
-	u8 temp_warn_event[0x1];
-	u8 dcbx[0x1];
-	u8 general_notification_event[0x1];
-	u8 reserved_at_1d3[0x2];
-	u8 fpga[0x1];
-	u8 rol_s[0x1];
-	u8 rol_g[0x1];
-	u8 reserved_at_1d8[0x1];
-	u8 wol_s[0x1];
-	u8 wol_g[0x1];
-	u8 wol_a[0x1];
-	u8 wol_b[0x1];
-	u8 wol_m[0x1];
-	u8 wol_u[0x1];
-	u8 wol_p[0x1];
-	u8 stat_rate_support[0x10];
-	u8 reserved_at_1f0[0xc];
-	u8 cqe_version[0x4];
-	u8 compact_address_vector[0x1];
-	u8 striding_rq[0x1];
-	u8 reserved_at_202[0x1];
-	u8 ipoib_enhanced_offloads[0x1];
-	u8 ipoib_basic_offloads[0x1];
-	u8 reserved_at_205[0x1];
-	u8 repeated_block_disabled[0x1];
-	u8 umr_modify_entity_size_disabled[0x1];
-	u8 umr_modify_atomic_disabled[0x1];
-	u8 umr_indirect_mkey_disabled[0x1];
-	u8 umr_fence[0x2];
-	u8 reserved_at_20c[0x3];
-	u8 drain_sigerr[0x1];
-	u8 cmdif_checksum[0x2];
-	u8 sigerr_cqe[0x1];
-	u8 reserved_at_213[0x1];
-	u8 wq_signature[0x1];
-	u8 sctr_data_cqe[0x1];
-	u8 reserved_at_216[0x1];
-	u8 sho[0x1];
-	u8 tph[0x1];
-	u8 rf[0x1];
-	u8 dct[0x1];
-	u8 qos[0x1];
-	u8 eth_net_offloads[0x1];
-	u8 roce[0x1];
-	u8 atomic[0x1];
-	u8 reserved_at_21f[0x1];
-	u8 cq_oi[0x1];
-	u8 cq_resize[0x1];
-	u8 cq_moderation[0x1];
-	u8 reserved_at_223[0x3];
-	u8 cq_eq_remap[0x1];
-	u8 pg[0x1];
-	u8 block_lb_mc[0x1];
-	u8 reserved_at_229[0x1];
-	u8 scqe_break_moderation[0x1];
-	u8 cq_period_start_from_cqe[0x1];
-	u8 cd[0x1];
-	u8 reserved_at_22d[0x1];
-	u8 apm[0x1];
-	u8 vector_calc[0x1];
-	u8 umr_ptr_rlky[0x1];
-	u8 imaicl[0x1];
-	u8 reserved_at_232[0x4];
-	u8 qkv[0x1];
-	u8 pkv[0x1];
-	u8 set_deth_sqpn[0x1];
-	u8 reserved_at_239[0x3];
-	u8 xrc[0x1];
-	u8 ud[0x1];
-	u8 uc[0x1];
-	u8 rc[0x1];
-	u8 uar_4k[0x1];
-	u8 reserved_at_241[0x9];
-	u8 uar_sz[0x6];
-	u8 reserved_at_250[0x8];
-	u8 log_pg_sz[0x8];
-	u8 bf[0x1];
-	u8 driver_version[0x1];
-	u8 pad_tx_eth_packet[0x1];
-	u8 reserved_at_263[0x8];
-	u8 log_bf_reg_size[0x5];
-	u8 reserved_at_270[0xb];
-	u8 lag_master[0x1];
-	u8 num_lag_ports[0x4];
-	u8 reserved_at_280[0x10];
-	u8 max_wqe_sz_sq[0x10];
-	u8 reserved_at_2a0[0x10];
-	u8 max_wqe_sz_rq[0x10];
-	u8 max_flow_counter_31_16[0x10];
-	u8 max_wqe_sz_sq_dc[0x10];
-	u8 reserved_at_2e0[0x7];
-	u8 max_qp_mcg[0x19];
-	u8 reserved_at_300[0x10];
-	u8 flow_counter_bulk_alloc[0x08];
-	u8 log_max_mcg[0x8];
-	u8 reserved_at_320[0x3];
-	u8 log_max_transport_domain[0x5];
-	u8 reserved_at_328[0x3];
-	u8 log_max_pd[0x5];
-	u8 reserved_at_330[0xb];
-	u8 log_max_xrcd[0x5];
-	u8 nic_receive_steering_discard[0x1];
-	u8 receive_discard_vport_down[0x1];
-	u8 transmit_discard_vport_down[0x1];
-	u8 reserved_at_343[0x5];
-	u8 log_max_flow_counter_bulk[0x8];
-	u8 max_flow_counter_15_0[0x10];
-	u8 modify_tis[0x1];
-	u8 flow_counters_dump[0x1];
-	u8 reserved_at_360[0x1];
-	u8 log_max_rq[0x5];
-	u8 reserved_at_368[0x3];
-	u8 log_max_sq[0x5];
-	u8 reserved_at_370[0x3];
-	u8 log_max_tir[0x5];
-	u8 reserved_at_378[0x3];
-	u8 log_max_tis[0x5];
-	u8 basic_cyclic_rcv_wqe[0x1];
-	u8 reserved_at_381[0x2];
-	u8 log_max_rmp[0x5];
-	u8 reserved_at_388[0x3];
-	u8 log_max_rqt[0x5];
-	u8 reserved_at_390[0x3];
-	u8 log_max_rqt_size[0x5];
-	u8 reserved_at_398[0x3];
-	u8 log_max_tis_per_sq[0x5];
-	u8 ext_stride_num_range[0x1];
-	u8 reserved_at_3a1[0x2];
-	u8 log_max_stride_sz_rq[0x5];
-	u8 reserved_at_3a8[0x3];
-	u8 log_min_stride_sz_rq[0x5];
-	u8 reserved_at_3b0[0x3];
-	u8 log_max_stride_sz_sq[0x5];
-	u8 reserved_at_3b8[0x3];
-	u8 log_min_stride_sz_sq[0x5];
-	u8 hairpin[0x1];
-	u8 reserved_at_3c1[0x2];
-	u8 log_max_hairpin_queues[0x5];
-	u8 reserved_at_3c8[0x3];
-	u8 log_max_hairpin_wq_data_sz[0x5];
-	u8 reserved_at_3d0[0x3];
-	u8 log_max_hairpin_num_packets[0x5];
-	u8 reserved_at_3d8[0x3];
-	u8 log_max_wq_sz[0x5];
-	u8 nic_vport_change_event[0x1];
-	u8 disable_local_lb_uc[0x1];
-	u8 disable_local_lb_mc[0x1];
-	u8 log_min_hairpin_wq_data_sz[0x5];
-	u8 reserved_at_3e8[0x3];
-	u8 log_max_vlan_list[0x5];
-	u8 reserved_at_3f0[0x3];
-	u8 log_max_current_mc_list[0x5];
-	u8 reserved_at_3f8[0x3];
-	u8 log_max_current_uc_list[0x5];
-	u8 general_obj_types[0x40];
-	u8 reserved_at_440[0x20];
-	u8 reserved_at_460[0x10];
-	u8 max_num_eqs[0x10];
-	u8 reserved_at_480[0x3];
-	u8 log_max_l2_table[0x5];
-	u8 reserved_at_488[0x8];
-	u8 log_uar_page_sz[0x10];
-	u8 reserved_at_4a0[0x20];
-	u8 device_frequency_mhz[0x20];
-	u8 device_frequency_khz[0x20];
-	u8 reserved_at_500[0x20];
-	u8 num_of_uars_per_page[0x20];
-	u8 flex_parser_protocols[0x20];
-	u8 reserved_at_560[0x20];
-	u8 reserved_at_580[0x3c];
-	u8 mini_cqe_resp_stride_index[0x1];
-	u8 cqe_128_always[0x1];
-	u8 cqe_compression_128[0x1];
-	u8 cqe_compression[0x1];
-	u8 cqe_compression_timeout[0x10];
-	u8 cqe_compression_max_num[0x10];
-	u8 reserved_at_5e0[0x10];
-	u8 tag_matching[0x1];
-	u8 rndv_offload_rc[0x1];
-	u8 rndv_offload_dc[0x1];
-	u8 log_tag_matching_list_sz[0x5];
-	u8 reserved_at_5f8[0x3];
-	u8 log_max_xrq[0x5];
-	u8 affiliate_nic_vport_criteria[0x8];
-	u8 native_port_num[0x8];
-	u8 num_vhca_ports[0x8];
-	u8 reserved_at_618[0x6];
-	u8 sw_owner_id[0x1];
-	u8 reserved_at_61f[0x1e1];
-};
-
-struct mlx5_ifc_qos_cap_bits {
-	u8 packet_pacing[0x1];
-	u8 esw_scheduling[0x1];
-	u8 esw_bw_share[0x1];
-	u8 esw_rate_limit[0x1];
-	u8 reserved_at_4[0x1];
-	u8 packet_pacing_burst_bound[0x1];
-	u8 packet_pacing_typical_size[0x1];
-	u8 flow_meter_srtcm[0x1];
-	u8 reserved_at_8[0x8];
-	u8 log_max_flow_meter[0x8];
-	u8 flow_meter_reg_id[0x8];
-	u8 reserved_at_25[0x8];
-	u8 flow_meter_reg_share[0x1];
-	u8 reserved_at_2e[0x17];
-	u8 packet_pacing_max_rate[0x20];
-	u8 packet_pacing_min_rate[0x20];
-	u8 reserved_at_80[0x10];
-	u8 packet_pacing_rate_table_size[0x10];
-	u8 esw_element_type[0x10];
-	u8 esw_tsar_type[0x10];
-	u8 reserved_at_c0[0x10];
-	u8 max_qos_para_vport[0x10];
-	u8 max_tsar_bw_share[0x20];
-	u8 reserved_at_100[0x6e8];
-};
-
-struct mlx5_ifc_per_protocol_networking_offload_caps_bits {
-	u8 csum_cap[0x1];
-	u8 vlan_cap[0x1];
-	u8 lro_cap[0x1];
-	u8 lro_psh_flag[0x1];
-	u8 lro_time_stamp[0x1];
-	u8 lro_max_msg_sz_mode[0x2];
-	u8 wqe_vlan_insert[0x1];
-	u8 self_lb_en_modifiable[0x1];
-	u8 self_lb_mc[0x1];
-	u8 self_lb_uc[0x1];
-	u8 max_lso_cap[0x5];
-	u8 multi_pkt_send_wqe[0x2];
-	u8 wqe_inline_mode[0x2];
-	u8 rss_ind_tbl_cap[0x4];
-	u8 reg_umr_sq[0x1];
-	u8 scatter_fcs[0x1];
-	u8 enhanced_multi_pkt_send_wqe[0x1];
-	u8 tunnel_lso_const_out_ip_id[0x1];
-	u8 tunnel_lro_gre[0x1];
-	u8 tunnel_lro_vxlan[0x1];
-	u8 tunnel_stateless_gre[0x1];
-	u8 tunnel_stateless_vxlan[0x1];
-	u8 swp[0x1];
-	u8 swp_csum[0x1];
-	u8 swp_lso[0x1];
-	u8 reserved_at_23[0x8];
-	u8 tunnel_stateless_gtp[0x1];
-	u8 reserved_at_25[0x4];
-	u8 max_vxlan_udp_ports[0x8];
-	u8 reserved_at_38[0x6];
-	u8 max_geneve_opt_len[0x1];
-	u8 tunnel_stateless_geneve_rx[0x1];
-	u8 reserved_at_40[0x10];
-	u8 lro_min_mss_size[0x10];
-	u8 reserved_at_60[0x120];
-	u8 lro_timer_supported_periods[4][0x20];
-	u8 reserved_at_200[0x600];
-};
-
-union mlx5_ifc_hca_cap_union_bits {
-	struct mlx5_ifc_cmd_hca_cap_bits cmd_hca_cap;
-	struct mlx5_ifc_per_protocol_networking_offload_caps_bits
-	       per_protocol_networking_offload_caps;
-	struct mlx5_ifc_qos_cap_bits qos_cap;
-	u8 reserved_at_0[0x8000];
-};
-
-struct mlx5_ifc_query_hca_cap_out_bits {
-	u8 status[0x8];
-	u8 reserved_at_8[0x18];
-	u8 syndrome[0x20];
-	u8 reserved_at_40[0x40];
-	union mlx5_ifc_hca_cap_union_bits capability;
-};
-
-struct mlx5_ifc_query_hca_cap_in_bits {
-	u8 opcode[0x10];
-	u8 reserved_at_10[0x10];
-	u8 reserved_at_20[0x10];
-	u8 op_mod[0x10];
-	u8 reserved_at_40[0x40];
-};
-
-struct mlx5_ifc_mac_address_layout_bits {
-	u8 reserved_at_0[0x10];
-	u8 mac_addr_47_32[0x10];
-	u8 mac_addr_31_0[0x20];
-};
-
-struct mlx5_ifc_nic_vport_context_bits {
-	u8 reserved_at_0[0x5];
-	u8 min_wqe_inline_mode[0x3];
-	u8 reserved_at_8[0x15];
-	u8 disable_mc_local_lb[0x1];
-	u8 disable_uc_local_lb[0x1];
-	u8 roce_en[0x1];
-	u8 arm_change_event[0x1];
-	u8 reserved_at_21[0x1a];
-	u8 event_on_mtu[0x1];
-	u8 event_on_promisc_change[0x1];
-	u8 event_on_vlan_change[0x1];
-	u8 event_on_mc_address_change[0x1];
-	u8 event_on_uc_address_change[0x1];
-	u8 reserved_at_40[0xc];
-	u8 affiliation_criteria[0x4];
-	u8 affiliated_vhca_id[0x10];
-	u8 reserved_at_60[0xd0];
-	u8 mtu[0x10];
-	u8 system_image_guid[0x40];
-	u8 port_guid[0x40];
-	u8 node_guid[0x40];
-	u8 reserved_at_200[0x140];
-	u8 qkey_violation_counter[0x10];
-	u8 reserved_at_350[0x430];
-	u8 promisc_uc[0x1];
-	u8 promisc_mc[0x1];
-	u8 promisc_all[0x1];
-	u8 reserved_at_783[0x2];
-	u8 allowed_list_type[0x3];
-	u8 reserved_at_788[0xc];
-	u8 allowed_list_size[0xc];
-	struct mlx5_ifc_mac_address_layout_bits permanent_address;
-	u8 reserved_at_7e0[0x20];
-};
-
-struct mlx5_ifc_query_nic_vport_context_out_bits {
-	u8 status[0x8];
-	u8 reserved_at_8[0x18];
-	u8 syndrome[0x20];
-	u8 reserved_at_40[0x40];
-	struct mlx5_ifc_nic_vport_context_bits nic_vport_context;
-};
-
-struct mlx5_ifc_query_nic_vport_context_in_bits {
-	u8 opcode[0x10];
-	u8 reserved_at_10[0x10];
-	u8 reserved_at_20[0x10];
-	u8 op_mod[0x10];
-	u8 other_vport[0x1];
-	u8 reserved_at_41[0xf];
-	u8 vport_number[0x10];
-	u8 reserved_at_60[0x5];
-	u8 allowed_list_type[0x3];
-	u8 reserved_at_68[0x18];
-};
-
-struct mlx5_ifc_tisc_bits {
-	u8 strict_lag_tx_port_affinity[0x1];
-	u8 reserved_at_1[0x3];
-	u8 lag_tx_port_affinity[0x04];
-	u8 reserved_at_8[0x4];
-	u8 prio[0x4];
-	u8 reserved_at_10[0x10];
-	u8 reserved_at_20[0x100];
-	u8 reserved_at_120[0x8];
-	u8 transport_domain[0x18];
-	u8 reserved_at_140[0x8];
-	u8 underlay_qpn[0x18];
-	u8 reserved_at_160[0x3a0];
-};
-
-struct mlx5_ifc_query_tis_out_bits {
-	u8 status[0x8];
-	u8 reserved_at_8[0x18];
-	u8 syndrome[0x20];
-	u8 reserved_at_40[0x40];
-	struct mlx5_ifc_tisc_bits tis_context;
-};
-
-struct mlx5_ifc_query_tis_in_bits {
-	u8 opcode[0x10];
-	u8 reserved_at_10[0x10];
-	u8 reserved_at_20[0x10];
-	u8 op_mod[0x10];
-	u8 reserved_at_40[0x8];
-	u8 tisn[0x18];
-	u8 reserved_at_60[0x20];
-};
-
-struct mlx5_ifc_alloc_transport_domain_out_bits {
-	u8 status[0x8];
-	u8 reserved_at_8[0x18];
-	u8 syndrome[0x20];
-	u8 reserved_at_40[0x8];
-	u8 transport_domain[0x18];
-	u8 reserved_at_60[0x20];
-};
-
-struct mlx5_ifc_alloc_transport_domain_in_bits {
-	u8 opcode[0x10];
-	u8 reserved_at_10[0x10];
-	u8 reserved_at_20[0x10];
-	u8 op_mod[0x10];
-	u8 reserved_at_40[0x40];
-};
-
-enum {
-	MLX5_WQ_TYPE_LINKED_LIST                = 0x0,
-	MLX5_WQ_TYPE_CYCLIC                     = 0x1,
-	MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ    = 0x2,
-	MLX5_WQ_TYPE_CYCLIC_STRIDING_RQ         = 0x3,
-};
-
-enum {
-	MLX5_WQ_END_PAD_MODE_NONE  = 0x0,
-	MLX5_WQ_END_PAD_MODE_ALIGN = 0x1,
-};
-
-struct mlx5_ifc_wq_bits {
-	u8 wq_type[0x4];
-	u8 wq_signature[0x1];
-	u8 end_padding_mode[0x2];
-	u8 cd_slave[0x1];
-	u8 reserved_at_8[0x18];
-	u8 hds_skip_first_sge[0x1];
-	u8 log2_hds_buf_size[0x3];
-	u8 reserved_at_24[0x7];
-	u8 page_offset[0x5];
-	u8 lwm[0x10];
-	u8 reserved_at_40[0x8];
-	u8 pd[0x18];
-	u8 reserved_at_60[0x8];
-	u8 uar_page[0x18];
-	u8 dbr_addr[0x40];
-	u8 hw_counter[0x20];
-	u8 sw_counter[0x20];
-	u8 reserved_at_100[0xc];
-	u8 log_wq_stride[0x4];
-	u8 reserved_at_110[0x3];
-	u8 log_wq_pg_sz[0x5];
-	u8 reserved_at_118[0x3];
-	u8 log_wq_sz[0x5];
-	u8 dbr_umem_valid[0x1];
-	u8 wq_umem_valid[0x1];
-	u8 reserved_at_122[0x1];
-	u8 log_hairpin_num_packets[0x5];
-	u8 reserved_at_128[0x3];
-	u8 log_hairpin_data_sz[0x5];
-	u8 reserved_at_130[0x4];
-	u8 single_wqe_log_num_of_strides[0x4];
-	u8 two_byte_shift_en[0x1];
-	u8 reserved_at_139[0x4];
-	u8 single_stride_log_num_of_bytes[0x3];
-	u8 dbr_umem_id[0x20];
-	u8 wq_umem_id[0x20];
-	u8 wq_umem_offset[0x40];
-	u8 reserved_at_1c0[0x440];
-};
-
-enum {
-	MLX5_RQC_MEM_RQ_TYPE_MEMORY_RQ_INLINE  = 0x0,
-	MLX5_RQC_MEM_RQ_TYPE_MEMORY_RQ_RMP     = 0x1,
-};
-
-enum {
-	MLX5_RQC_STATE_RST  = 0x0,
-	MLX5_RQC_STATE_RDY  = 0x1,
-	MLX5_RQC_STATE_ERR  = 0x3,
-};
-
-struct mlx5_ifc_rqc_bits {
-	u8 rlky[0x1];
-	u8 delay_drop_en[0x1];
-	u8 scatter_fcs[0x1];
-	u8 vsd[0x1];
-	u8 mem_rq_type[0x4];
-	u8 state[0x4];
-	u8 reserved_at_c[0x1];
-	u8 flush_in_error_en[0x1];
-	u8 hairpin[0x1];
-	u8 reserved_at_f[0x11];
-	u8 reserved_at_20[0x8];
-	u8 user_index[0x18];
-	u8 reserved_at_40[0x8];
-	u8 cqn[0x18];
-	u8 counter_set_id[0x8];
-	u8 reserved_at_68[0x18];
-	u8 reserved_at_80[0x8];
-	u8 rmpn[0x18];
-	u8 reserved_at_a0[0x8];
-	u8 hairpin_peer_sq[0x18];
-	u8 reserved_at_c0[0x10];
-	u8 hairpin_peer_vhca[0x10];
-	u8 reserved_at_e0[0xa0];
-	struct mlx5_ifc_wq_bits wq; /* Not used in LRO RQ. */
-};
-
-struct mlx5_ifc_create_rq_out_bits {
-	u8 status[0x8];
-	u8 reserved_at_8[0x18];
-	u8 syndrome[0x20];
-	u8 reserved_at_40[0x8];
-	u8 rqn[0x18];
-	u8 reserved_at_60[0x20];
-};
-
-struct mlx5_ifc_create_rq_in_bits {
-	u8 opcode[0x10];
-	u8 uid[0x10];
-	u8 reserved_at_20[0x10];
-	u8 op_mod[0x10];
-	u8 reserved_at_40[0xc0];
-	struct mlx5_ifc_rqc_bits ctx;
-};
-
-struct mlx5_ifc_modify_rq_out_bits {
-	u8 status[0x8];
-	u8 reserved_at_8[0x18];
-	u8 syndrome[0x20];
-	u8 reserved_at_40[0x40];
-};
-
-struct mlx5_ifc_create_tis_out_bits {
-	u8 status[0x8];
-	u8 reserved_at_8[0x18];
-	u8 syndrome[0x20];
-	u8 reserved_at_40[0x8];
-	u8 tisn[0x18];
-	u8 reserved_at_60[0x20];
-};
-
-struct mlx5_ifc_create_tis_in_bits {
-	u8 opcode[0x10];
-	u8 uid[0x10];
-	u8 reserved_at_20[0x10];
-	u8 op_mod[0x10];
-	u8 reserved_at_40[0xc0];
-	struct mlx5_ifc_tisc_bits ctx;
-};
-
-enum {
-	MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_WQ_LWM = 1ULL << 0,
-	MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_VSD = 1ULL << 1,
-	MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_SCATTER_FCS = 1ULL << 2,
-	MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_RQ_COUNTER_SET_ID = 1ULL << 3,
-};
-
-struct mlx5_ifc_modify_rq_in_bits {
-	u8 opcode[0x10];
-	u8 uid[0x10];
-	u8 reserved_at_20[0x10];
-	u8 op_mod[0x10];
-	u8 rq_state[0x4];
-	u8 reserved_at_44[0x4];
-	u8 rqn[0x18];
-	u8 reserved_at_60[0x20];
-	u8 modify_bitmask[0x40];
-	u8 reserved_at_c0[0x40];
-	struct mlx5_ifc_rqc_bits ctx;
-};
-
-enum {
-	MLX5_RX_HASH_FIELD_SELECT_SELECTED_FIELDS_SRC_IP     = 0x0,
-	MLX5_RX_HASH_FIELD_SELECT_SELECTED_FIELDS_DST_IP     = 0x1,
-	MLX5_RX_HASH_FIELD_SELECT_SELECTED_FIELDS_L4_SPORT   = 0x2,
-	MLX5_RX_HASH_FIELD_SELECT_SELECTED_FIELDS_L4_DPORT   = 0x3,
-	MLX5_RX_HASH_FIELD_SELECT_SELECTED_FIELDS_IPSEC_SPI  = 0x4,
-};
-
-struct mlx5_ifc_rx_hash_field_select_bits {
-	u8 l3_prot_type[0x1];
-	u8 l4_prot_type[0x1];
-	u8 selected_fields[0x1e];
-};
-
-enum {
-	MLX5_TIRC_DISP_TYPE_DIRECT    = 0x0,
-	MLX5_TIRC_DISP_TYPE_INDIRECT  = 0x1,
-};
-
-enum {
-	MLX5_TIRC_LRO_ENABLE_MASK_IPV4_LRO  = 0x1,
-	MLX5_TIRC_LRO_ENABLE_MASK_IPV6_LRO  = 0x2,
-};
-
-enum {
-	MLX5_RX_HASH_FN_NONE           = 0x0,
-	MLX5_RX_HASH_FN_INVERTED_XOR8  = 0x1,
-	MLX5_RX_HASH_FN_TOEPLITZ       = 0x2,
-};
-
-enum {
-	MLX5_TIRC_SELF_LB_BLOCK_BLOCK_UNICAST    = 0x1,
-	MLX5_TIRC_SELF_LB_BLOCK_BLOCK_MULTICAST  = 0x2,
-};
-
-enum {
-	MLX5_LRO_MAX_MSG_SIZE_START_FROM_L4    = 0x0,
-	MLX5_LRO_MAX_MSG_SIZE_START_FROM_L2  = 0x1,
-};
-
-struct mlx5_ifc_tirc_bits {
-	u8 reserved_at_0[0x20];
-	u8 disp_type[0x4];
-	u8 reserved_at_24[0x1c];
-	u8 reserved_at_40[0x40];
-	u8 reserved_at_80[0x4];
-	u8 lro_timeout_period_usecs[0x10];
-	u8 lro_enable_mask[0x4];
-	u8 lro_max_msg_sz[0x8];
-	u8 reserved_at_a0[0x40];
-	u8 reserved_at_e0[0x8];
-	u8 inline_rqn[0x18];
-	u8 rx_hash_symmetric[0x1];
-	u8 reserved_at_101[0x1];
-	u8 tunneled_offload_en[0x1];
-	u8 reserved_at_103[0x5];
-	u8 indirect_table[0x18];
-	u8 rx_hash_fn[0x4];
-	u8 reserved_at_124[0x2];
-	u8 self_lb_block[0x2];
-	u8 transport_domain[0x18];
-	u8 rx_hash_toeplitz_key[10][0x20];
-	struct mlx5_ifc_rx_hash_field_select_bits rx_hash_field_selector_outer;
-	struct mlx5_ifc_rx_hash_field_select_bits rx_hash_field_selector_inner;
-	u8 reserved_at_2c0[0x4c0];
-};
-
-struct mlx5_ifc_create_tir_out_bits {
-	u8 status[0x8];
-	u8 reserved_at_8[0x18];
-	u8 syndrome[0x20];
-	u8 reserved_at_40[0x8];
-	u8 tirn[0x18];
-	u8 reserved_at_60[0x20];
-};
-
-struct mlx5_ifc_create_tir_in_bits {
-	u8 opcode[0x10];
-	u8 uid[0x10];
-	u8 reserved_at_20[0x10];
-	u8 op_mod[0x10];
-	u8 reserved_at_40[0xc0];
-	struct mlx5_ifc_tirc_bits ctx;
-};
-
-struct mlx5_ifc_rq_num_bits {
-	u8 reserved_at_0[0x8];
-	u8 rq_num[0x18];
-};
-
-struct mlx5_ifc_rqtc_bits {
-	u8 reserved_at_0[0xa0];
-	u8 reserved_at_a0[0x10];
-	u8 rqt_max_size[0x10];
-	u8 reserved_at_c0[0x10];
-	u8 rqt_actual_size[0x10];
-	u8 reserved_at_e0[0x6a0];
-	struct mlx5_ifc_rq_num_bits rq_num[];
-};
-
-struct mlx5_ifc_create_rqt_out_bits {
-	u8 status[0x8];
-	u8 reserved_at_8[0x18];
-	u8 syndrome[0x20];
-	u8 reserved_at_40[0x8];
-	u8 rqtn[0x18];
-	u8 reserved_at_60[0x20];
-};
-
-#ifdef PEDANTIC
-#pragma GCC diagnostic ignored "-Wpedantic"
-#endif
-struct mlx5_ifc_create_rqt_in_bits {
-	u8 opcode[0x10];
-	u8 uid[0x10];
-	u8 reserved_at_20[0x10];
-	u8 op_mod[0x10];
-	u8 reserved_at_40[0xc0];
-	struct mlx5_ifc_rqtc_bits rqt_context;
-};
-#ifdef PEDANTIC
-#pragma GCC diagnostic error "-Wpedantic"
-#endif
-
-enum {
-	MLX5_SQC_STATE_RST  = 0x0,
-	MLX5_SQC_STATE_RDY  = 0x1,
-	MLX5_SQC_STATE_ERR  = 0x3,
-};
-
-struct mlx5_ifc_sqc_bits {
-	u8 rlky[0x1];
-	u8 cd_master[0x1];
-	u8 fre[0x1];
-	u8 flush_in_error_en[0x1];
-	u8 allow_multi_pkt_send_wqe[0x1];
-	u8 min_wqe_inline_mode[0x3];
-	u8 state[0x4];
-	u8 reg_umr[0x1];
-	u8 allow_swp[0x1];
-	u8 hairpin[0x1];
-	u8 reserved_at_f[0x11];
-	u8 reserved_at_20[0x8];
-	u8 user_index[0x18];
-	u8 reserved_at_40[0x8];
-	u8 cqn[0x18];
-	u8 reserved_at_60[0x8];
-	u8 hairpin_peer_rq[0x18];
-	u8 reserved_at_80[0x10];
-	u8 hairpin_peer_vhca[0x10];
-	u8 reserved_at_a0[0x50];
-	u8 packet_pacing_rate_limit_index[0x10];
-	u8 tis_lst_sz[0x10];
-	u8 reserved_at_110[0x10];
-	u8 reserved_at_120[0x40];
-	u8 reserved_at_160[0x8];
-	u8 tis_num_0[0x18];
-	struct mlx5_ifc_wq_bits wq;
-};
-
-struct mlx5_ifc_query_sq_in_bits {
-	u8 opcode[0x10];
-	u8 reserved_at_10[0x10];
-	u8 reserved_at_20[0x10];
-	u8 op_mod[0x10];
-	u8 reserved_at_40[0x8];
-	u8 sqn[0x18];
-	u8 reserved_at_60[0x20];
-};
-
-struct mlx5_ifc_modify_sq_out_bits {
-	u8 status[0x8];
-	u8 reserved_at_8[0x18];
-	u8 syndrome[0x20];
-	u8 reserved_at_40[0x40];
-};
-
-struct mlx5_ifc_modify_sq_in_bits {
-	u8 opcode[0x10];
-	u8 uid[0x10];
-	u8 reserved_at_20[0x10];
-	u8 op_mod[0x10];
-	u8 sq_state[0x4];
-	u8 reserved_at_44[0x4];
-	u8 sqn[0x18];
-	u8 reserved_at_60[0x20];
-	u8 modify_bitmask[0x40];
-	u8 reserved_at_c0[0x40];
-	struct mlx5_ifc_sqc_bits ctx;
-};
-
-struct mlx5_ifc_create_sq_out_bits {
-	u8 status[0x8];
-	u8 reserved_at_8[0x18];
-	u8 syndrome[0x20];
-	u8 reserved_at_40[0x8];
-	u8 sqn[0x18];
-	u8 reserved_at_60[0x20];
-};
-
-struct mlx5_ifc_create_sq_in_bits {
-	u8 opcode[0x10];
-	u8 uid[0x10];
-	u8 reserved_at_20[0x10];
-	u8 op_mod[0x10];
-	u8 reserved_at_40[0xc0];
-	struct mlx5_ifc_sqc_bits ctx;
-};
-
-enum {
-	MLX5_FLOW_METER_OBJ_MODIFY_FIELD_ACTIVE = (1ULL << 0),
-	MLX5_FLOW_METER_OBJ_MODIFY_FIELD_CBS = (1ULL << 1),
-	MLX5_FLOW_METER_OBJ_MODIFY_FIELD_CIR = (1ULL << 2),
-	MLX5_FLOW_METER_OBJ_MODIFY_FIELD_EBS = (1ULL << 3),
-	MLX5_FLOW_METER_OBJ_MODIFY_FIELD_EIR = (1ULL << 4),
-};
-
-struct mlx5_ifc_flow_meter_parameters_bits {
-	u8         valid[0x1];			// 00h
-	u8         bucket_overflow[0x1];
-	u8         start_color[0x2];
-	u8         both_buckets_on_green[0x1];
-	u8         meter_mode[0x2];
-	u8         reserved_at_1[0x19];
-	u8         reserved_at_2[0x20]; //04h
-	u8         reserved_at_3[0x3];
-	u8         cbs_exponent[0x5];		// 08h
-	u8         cbs_mantissa[0x8];
-	u8         reserved_at_4[0x3];
-	u8         cir_exponent[0x5];
-	u8         cir_mantissa[0x8];
-	u8         reserved_at_5[0x20];		// 0Ch
-	u8         reserved_at_6[0x3];
-	u8         ebs_exponent[0x5];		// 10h
-	u8         ebs_mantissa[0x8];
-	u8         reserved_at_7[0x3];
-	u8         eir_exponent[0x5];
-	u8         eir_mantissa[0x8];
-	u8         reserved_at_8[0x60];		// 14h-1Ch
-};
-
-/* CQE format mask. */
-#define MLX5E_CQE_FORMAT_MASK 0xc
-
-/* MPW opcode. */
-#define MLX5_OPC_MOD_MPW 0x01
-
-/* Compressed Rx CQE structure. */
-struct mlx5_mini_cqe8 {
-	union {
-		uint32_t rx_hash_result;
-		struct {
-			uint16_t checksum;
-			uint16_t stride_idx;
-		};
-		struct {
-			uint16_t wqe_counter;
-			uint8_t  s_wqe_opcode;
-			uint8_t  reserved;
-		} s_wqe_info;
-	};
-	uint32_t byte_cnt;
-};
-
-/* srTCM PRM flow meter parameters. */
-enum {
-	MLX5_FLOW_COLOR_RED = 0,
-	MLX5_FLOW_COLOR_YELLOW,
-	MLX5_FLOW_COLOR_GREEN,
-	MLX5_FLOW_COLOR_UNDEFINED,
-};
-
-/* Maximum value of srTCM metering parameters. */
-#define MLX5_SRTCM_CBS_MAX (0xFF * (1ULL << 0x1F))
-#define MLX5_SRTCM_CIR_MAX (8 * (1ULL << 30) * 0xFF)
-#define MLX5_SRTCM_EBS_MAX 0
-
-/* The bits meter color use. */
-#define MLX5_MTR_COLOR_BITS 8
-
-/**
- * Convert a user mark to flow mark.
- *
- * @param val
- *   Mark value to convert.
- *
- * @return
- *   Converted mark value.
- */
-static inline uint32_t
-mlx5_flow_mark_set(uint32_t val)
-{
-	uint32_t ret;
-
-	/*
-	 * Add one to the user value to differentiate un-marked flows from
-	 * marked flows, if the ID is equal to MLX5_FLOW_MARK_DEFAULT it
-	 * remains untouched.
-	 */
-	if (val != MLX5_FLOW_MARK_DEFAULT)
-		++val;
-#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
-	/*
-	 * Mark is 24 bits (minus reserved values) but is stored on a 32 bit
-	 * word, byte-swapped by the kernel on little-endian systems. In this
-	 * case, left-shifting the resulting big-endian value ensures the
-	 * least significant 24 bits are retained when converting it back.
-	 */
-	ret = rte_cpu_to_be_32(val) >> 8;
-#else
-	ret = val;
-#endif
-	return ret;
-}
-
-/**
- * Convert a mark to user mark.
- *
- * @param val
- *   Mark value to convert.
- *
- * @return
- *   Converted mark value.
- */
-static inline uint32_t
-mlx5_flow_mark_get(uint32_t val)
-{
-	/*
-	 * Subtract one from the retrieved value. It was added by
-	 * mlx5_flow_mark_set() to distinguish unmarked flows.
-	 */
-#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
-	return (val >> 8) - 1;
-#else
-	return val - 1;
-#endif
-}
-
-#endif /* RTE_PMD_MLX5_PRM_H_ */
 --git a/drivers/net/mlx5/mlx5_rss.c b/drivers/net/mlx5/mlx5_rss.c
index 1028264..345ce3a 100644
--- a/drivers/net/mlx5/mlx5_rss.c
+++ b/drivers/net/mlx5/mlx5_rss.c
@@ -22,8 +22,8 @@
 #include <rte_malloc.h>
 #include <rte_ethdev_driver.h>
 
-#include "mlx5.h"
 #include "mlx5_defs.h"
+#include "mlx5.h"
 #include "mlx5_rxtx.h"
 
 /**
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 371b996..e01cbfd 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -30,14 +30,16 @@
 #include <rte_debug.h>
 #include <rte_io.h>
 
+#include <mlx5_glue.h>
+#include <mlx5_devx_cmds.h>
+
+#include "mlx5_defs.h"
 #include "mlx5.h"
 #include "mlx5_rxtx.h"
 #include "mlx5_utils.h"
 #include "mlx5_autoconf.h"
-#include "mlx5_defs.h"
-#include "mlx5_glue.h"
 #include "mlx5_flow.h"
-#include "mlx5_devx_cmds.h"
+
 
 /* Default RSS hash key also used for ConnectX-3. */
 uint8_t rss_hash_default_key[] = {
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 5a03556..d8f6671 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -28,13 +28,14 @@
 #include <rte_cycles.h>
 #include <rte_flow.h>
 
+#include <mlx5_devx_cmds.h>
+#include <mlx5_prm.h>
+
+#include "mlx5_defs.h"
 #include "mlx5.h"
-#include "mlx5_devx_cmds.h"
 #include "mlx5_utils.h"
 #include "mlx5_rxtx.h"
 #include "mlx5_autoconf.h"
-#include "mlx5_defs.h"
-#include "mlx5_prm.h"
 
 /* TX burst subroutines return codes. */
 enum mlx5_txcmp_code {
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 3f659d2..fb13919 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -31,13 +31,14 @@
 #include <rte_bus_pci.h>
 #include <rte_malloc.h>
 
+#include <mlx5_glue.h>
+#include <mlx5_prm.h>
+
+#include "mlx5_defs.h"
 #include "mlx5_utils.h"
 #include "mlx5.h"
 #include "mlx5_mr.h"
 #include "mlx5_autoconf.h"
-#include "mlx5_defs.h"
-#include "mlx5_prm.h"
-#include "mlx5_glue.h"
 
 /* Support tunnel matching. */
 #define MLX5_FLOW_TUNNEL 10
diff --git a/drivers/net/mlx5/mlx5_rxtx_vec.c b/drivers/net/mlx5/mlx5_rxtx_vec.c
index d85f908..5505762 100644
--- a/drivers/net/mlx5/mlx5_rxtx_vec.c
+++ b/drivers/net/mlx5/mlx5_rxtx_vec.c
@@ -23,13 +23,14 @@
 #include <rte_mempool.h>
 #include <rte_prefetch.h>
 
+#include <mlx5_prm.h>
+
+#include "mlx5_defs.h"
 #include "mlx5.h"
 #include "mlx5_utils.h"
 #include "mlx5_rxtx.h"
 #include "mlx5_rxtx_vec.h"
 #include "mlx5_autoconf.h"
-#include "mlx5_defs.h"
-#include "mlx5_prm.h"
 
 #if defined RTE_ARCH_X86_64
 #include "mlx5_rxtx_vec_sse.h"
diff --git a/drivers/net/mlx5/mlx5_rxtx_vec.h b/drivers/net/mlx5/mlx5_rxtx_vec.h
index d8c07f2..82f77e5 100644
--- a/drivers/net/mlx5/mlx5_rxtx_vec.h
+++ b/drivers/net/mlx5/mlx5_rxtx_vec.h
@@ -9,8 +9,9 @@
 #include <rte_common.h>
 #include <rte_mbuf.h>
 
+#include <mlx5_prm.h>
+
 #include "mlx5_autoconf.h"
-#include "mlx5_prm.h"
 
 /* HW checksum offload capabilities of vectorized Tx. */
 #define MLX5_VEC_TX_CKSUM_OFFLOAD_CAP \
diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h b/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h
index 9e5c6ee..1467a42 100644
--- a/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h
+++ b/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h
@@ -17,13 +17,14 @@
 #include <rte_mempool.h>
 #include <rte_prefetch.h>
 
+#include <mlx5_prm.h>
+
+#include "mlx5_defs.h"
 #include "mlx5.h"
 #include "mlx5_utils.h"
 #include "mlx5_rxtx.h"
 #include "mlx5_rxtx_vec.h"
 #include "mlx5_autoconf.h"
-#include "mlx5_defs.h"
-#include "mlx5_prm.h"
 
 #ifndef __INTEL_COMPILER
 #pragma GCC diagnostic ignored "-Wcast-qual"
diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
index 332e9ac..5b846c1 100644
--- a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
+++ b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
@@ -16,13 +16,14 @@
 #include <rte_mempool.h>
 #include <rte_prefetch.h>
 
+#include <mlx5_prm.h>
+
+#include "mlx5_defs.h"
 #include "mlx5.h"
 #include "mlx5_utils.h"
 #include "mlx5_rxtx.h"
 #include "mlx5_rxtx_vec.h"
 #include "mlx5_autoconf.h"
-#include "mlx5_defs.h"
-#include "mlx5_prm.h"
 
 #pragma GCC diagnostic ignored "-Wcast-qual"
 
diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h
index 07d40d5..6e1b967 100644
--- a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h
+++ b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h
@@ -16,13 +16,14 @@
 #include <rte_mempool.h>
 #include <rte_prefetch.h>
 
+#include <mlx5_prm.h>
+
+#include "mlx5_defs.h"
 #include "mlx5.h"
 #include "mlx5_utils.h"
 #include "mlx5_rxtx.h"
 #include "mlx5_rxtx_vec.h"
 #include "mlx5_autoconf.h"
-#include "mlx5_defs.h"
-#include "mlx5_prm.h"
 
 #ifndef __INTEL_COMPILER
 #pragma GCC diagnostic ignored "-Wcast-qual"
diff --git a/drivers/net/mlx5/mlx5_stats.c b/drivers/net/mlx5/mlx5_stats.c
index 205e4fe..0ed7170 100644
--- a/drivers/net/mlx5/mlx5_stats.c
+++ b/drivers/net/mlx5/mlx5_stats.c
@@ -13,9 +13,9 @@
 #include <rte_common.h>
 #include <rte_malloc.h>
 
+#include "mlx5_defs.h"
 #include "mlx5.h"
 #include "mlx5_rxtx.h"
-#include "mlx5_defs.h"
 
 static const struct mlx5_counter_ctrl mlx5_counters_init[] = {
 	{
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 5adb4dc..1d2ba8a 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -28,13 +28,14 @@
 #include <rte_ethdev_driver.h>
 #include <rte_common.h>
 
-#include "mlx5_utils.h"
+#include <mlx5_glue.h>
+#include <mlx5_devx_cmds.h>
+
 #include "mlx5_defs.h"
+#include "mlx5_utils.h"
 #include "mlx5.h"
 #include "mlx5_rxtx.h"
 #include "mlx5_autoconf.h"
-#include "mlx5_glue.h"
-#include "mlx5_devx_cmds.h"
 
 /**
  * Allocate TX queue elements.
diff --git a/drivers/net/mlx5/mlx5_utils.h b/drivers/net/mlx5/mlx5_utils.h
index ebf79b8..c868aee 100644
--- a/drivers/net/mlx5/mlx5_utils.h
+++ b/drivers/net/mlx5/mlx5_utils.h
@@ -13,8 +13,11 @@
 #include <assert.h>
 #include <errno.h>
 
+#include <mlx5_common.h>
+
 #include "mlx5_defs.h"
 
+
 /*
  * Compilation workaround for PPC64 when AltiVec is fully enabled, e.g. std=c11.
  * Otherwise there would be a type conflict between stdbool and altivec.
@@ -50,81 +53,14 @@
 /* Save and restore errno around argument evaluation. */
 #define ERRNO_SAFE(x) ((errno = (int []){ errno, ((x), 0) }[0]))
 
-/*
- * Helper macros to work around __VA_ARGS__ limitations in a C99 compliant
- * manner.
- */
-#define PMD_DRV_LOG_STRIP(a, b) a
-#define PMD_DRV_LOG_OPAREN (
-#define PMD_DRV_LOG_CPAREN )
-#define PMD_DRV_LOG_COMMA ,
-
-/* Return the file name part of a path. */
-static inline const char *
-pmd_drv_log_basename(const char *s)
-{
-	const char *n = s;
-
-	while (*n)
-		if (*(n++) == '/')
-			s = n;
-	return s;
-}
-
 extern int mlx5_logtype;
 
-#define PMD_DRV_LOG___(level, ...) \
-	rte_log(RTE_LOG_ ## level, \
-		mlx5_logtype, \
-		RTE_FMT(MLX5_DRIVER_NAME ": " \
-			RTE_FMT_HEAD(__VA_ARGS__,), \
-		RTE_FMT_TAIL(__VA_ARGS__,)))
-
-/*
- * When debugging is enabled (NDEBUG not defined), file, line and function
- * information replace the driver name (MLX5_DRIVER_NAME) in log messages.
- */
-#ifndef NDEBUG
-
-#define PMD_DRV_LOG__(level, ...) \
-	PMD_DRV_LOG___(level, "%s:%u: %s(): " __VA_ARGS__)
-#define PMD_DRV_LOG_(level, s, ...) \
-	PMD_DRV_LOG__(level, \
-		s "\n" PMD_DRV_LOG_COMMA \
-		pmd_drv_log_basename(__FILE__) PMD_DRV_LOG_COMMA \
-		__LINE__ PMD_DRV_LOG_COMMA \
-		__func__, \
-		__VA_ARGS__)
-
-#else /* NDEBUG */
-#define PMD_DRV_LOG__(level, ...) \
-	PMD_DRV_LOG___(level, __VA_ARGS__)
-#define PMD_DRV_LOG_(level, s, ...) \
-	PMD_DRV_LOG__(level, s "\n", __VA_ARGS__)
-
-#endif /* NDEBUG */
-
 /* Generic printf()-like logging macro with automatic line feed. */
 #define DRV_LOG(level, ...) \
-	PMD_DRV_LOG_(level, \
+	PMD_DRV_LOG_(level, mlx5_logtype, MLX5_DRIVER_NAME, \
 		__VA_ARGS__ PMD_DRV_LOG_STRIP PMD_DRV_LOG_OPAREN, \
 		PMD_DRV_LOG_CPAREN)
 
-/* claim_zero() does not perform any check when debugging is disabled. */
-#ifndef NDEBUG
-
-#define DEBUG(...) DRV_LOG(DEBUG, __VA_ARGS__)
-#define claim_zero(...) assert((__VA_ARGS__) == 0)
-#define claim_nonzero(...) assert((__VA_ARGS__) != 0)
-
-#else /* NDEBUG */
-
-#define DEBUG(...) (void)0
-#define claim_zero(...) (__VA_ARGS__)
-#define claim_nonzero(...) (__VA_ARGS__)
-
-#endif /* NDEBUG */
-
 #define INFO(...) DRV_LOG(INFO, __VA_ARGS__)
 #define WARN(...) DRV_LOG(WARNING, __VA_ARGS__)
 #define ERROR(...) DRV_LOG(ERR, __VA_ARGS__)
@@ -144,13 +80,6 @@
 	 (((val) & (from)) / ((from) / (to))) : \
 	 (((val) & (from)) * ((to) / (from))))
 
-/* Allocate a buffer on the stack and fill it with a printf format string. */
-#define MKSTR(name, ...) \
-	int mkstr_size_##name = snprintf(NULL, 0, "" __VA_ARGS__); \
-	char name[mkstr_size_##name + 1]; \
-	\
-	snprintf(name, sizeof(name), "" __VA_ARGS__)
-
 /**
  * Return logarithm of the nearest power of two above input value.
  *
diff --git a/drivers/net/mlx5/mlx5_vlan.c b/drivers/net/mlx5/mlx5_vlan.c
index feac0f1..b0fa31a 100644
--- a/drivers/net/mlx5/mlx5_vlan.c
+++ b/drivers/net/mlx5/mlx5_vlan.c
@@ -27,10 +27,11 @@
 #include <rte_ethdev_driver.h>
 #include <rte_common.h>
 
+#include <mlx5_glue.h>
+#include <mlx5_devx_cmds.h>
+
 #include "mlx5.h"
 #include "mlx5_autoconf.h"
-#include "mlx5_glue.h"
-#include "mlx5_devx_cmds.h"
 #include "mlx5_rxtx.h"
 #include "mlx5_utils.h"
 
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 15acf95..45f4cad 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -196,6 +196,7 @@ endif
 _LDLIBS-$(CONFIG_RTE_LIBRTE_LIO_PMD)        += -lrte_pmd_lio
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_MEMIF)      += -lrte_pmd_memif
 _LDLIBS-$(CONFIG_RTE_LIBRTE_MLX4_PMD)       += -lrte_pmd_mlx4
+_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX5_PMD)       += -lrte_common_mlx5
 _LDLIBS-$(CONFIG_RTE_LIBRTE_MLX5_PMD)       += -lrte_pmd_mlx5
 ifeq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
 _LDLIBS-$(CONFIG_RTE_LIBRTE_MLX4_PMD)       += -ldl
-- 
1.8.3.1
* [dpdk-dev] [PATCH v3 03/25] common/mlx5: share the mlx5 glue reference
  2020-01-28 16:27   ` [dpdk-dev] [PATCH v3 00/25] Introduce mlx5 common library Matan Azrad
  2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 01/25] net/mlx5: separate DevX commands interface Matan Azrad
  2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 02/25] drivers: introduce mlx5 common library Matan Azrad
@ 2020-01-28 16:27     ` Matan Azrad
  2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 04/25] common/mlx5: share mlx5 PCI device detection Matan Azrad
                       ` (22 subsequent siblings)
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-28 16:27 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
A new Mellanox vDPA PMD will be added to support vDPA operations on
Mellanox adapters.
Both the mlx5 PMD and the mlx5 vDPA PMD need to initialize the glue.
The glue must be initialized only once per process, so all the mlx5
PMDs using the glue should share the same glue object.
Move the glue initialization into the common/mlx5 library so that its
constructor performs it exactly once.
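
The idea can be illustrated with a minimal sketch. It is not the code in
this patch: every demo_* name is hypothetical, and the GCC/Clang
constructor attribute stands in for RTE_INIT_PRIO(). A constructor in the
shared library fills one glue pointer once per process, and each driver
only checks that pointer instead of initializing its own copy.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the glue object, not the DPDK definition. */
struct demo_glue {
	const char *version;
};

static const struct demo_glue demo_glue_impl = { .version = "demo" };

/* Single glue reference shared by every driver using the common code. */
const struct demo_glue *demo_glue;

/* Number of times the constructor ran; kept only for illustration. */
int demo_glue_init_count;

/* Runs once per process, before main(), when this object is loaded. */
__attribute__((constructor))
static void
demo_common_init(void)
{
	/* Mirrors the debug check that the glue has no NULL members. */
	assert(demo_glue_impl.version != NULL);
	demo_glue = &demo_glue_impl;
	++demo_glue_init_count;
}

/* Hypothetical net driver probe: uses the shared glue, never re-inits. */
int
demo_net_probe(void)
{
	return demo_glue != NULL ? 0 : -1;
}

/* Hypothetical vDPA driver probe: same shared glue object. */
int
demo_vdpa_probe(void)
{
	return demo_glue != NULL ? 0 : -1;
}
```

In this sketch a failed initialization would leave demo_glue as NULL, so
both probe functions return an error rather than dereferencing it. The
patch follows the same pattern by resetting mlx5_glue to NULL on the
glue_error path.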
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/common/mlx5/mlx5_common.c | 173 +++++++++++++++++++++++++++++++++++++-
 drivers/net/mlx5/Makefile         |   9 --
 drivers/net/mlx5/meson.build      |   4 -
 drivers/net/mlx5/mlx5.c           | 172 +------------------------------------
 4 files changed, 173 insertions(+), 185 deletions(-)
diff --git a/drivers/common/mlx5/mlx5_common.c b/drivers/common/mlx5/mlx5_common.c
index 14ebd30..9c88a63 100644
--- a/drivers/common/mlx5/mlx5_common.c
+++ b/drivers/common/mlx5/mlx5_common.c
@@ -2,16 +2,185 @@
  * Copyright 2019 Mellanox Technologies, Ltd
  */
 
+#include <dlfcn.h>
+#include <unistd.h>
+#include <string.h>
+
+#include <rte_errno.h>
+
 #include "mlx5_common.h"
+#include "mlx5_common_utils.h"
+#include "mlx5_glue.h"
 
 
 int mlx5_common_logtype;
 
 
-RTE_INIT(rte_mlx5_common_pmd_init)
+#ifdef RTE_IBVERBS_LINK_DLOPEN
+
+/**
+ * Suffix RTE_EAL_PMD_PATH with "-glue".
+ *
+ * This function performs a sanity check on RTE_EAL_PMD_PATH before
+ * suffixing its last component.
+ *
+ * @param[out] buf
+ *   Output buffer, should be large enough otherwise NULL is returned.
+ * @param size
+ *   Size of @p buf.
+ *
+ * @return
+ *   Pointer to @p buf or @p NULL in case suffix cannot be appended.
+ */
+static char *
+mlx5_glue_path(char *buf, size_t size)
+{
+	static const char *const bad[] = { "/", ".", "..", NULL };
+	const char *path = RTE_EAL_PMD_PATH;
+	size_t len = strlen(path);
+	size_t off;
+	int i;
+
+	while (len && path[len - 1] == '/')
+		--len;
+	for (off = len; off && path[off - 1] != '/'; --off)
+		;
+	for (i = 0; bad[i]; ++i)
+		if (!strncmp(path + off, bad[i], (int)(len - off)))
+			goto error;
+	i = snprintf(buf, size, "%.*s-glue", (int)len, path);
+	if (i == -1 || (size_t)i >= size)
+		goto error;
+	return buf;
+error:
+	RTE_LOG(ERR, PMD, "unable to append \"-glue\" to last component of"
+		" RTE_EAL_PMD_PATH (\"" RTE_EAL_PMD_PATH "\"), please"
+		" re-configure DPDK");
+	return NULL;
+}
+#endif
+
+/**
+ * Initialization routine for run-time dependency on rdma-core.
+ */
+RTE_INIT_PRIO(mlx5_glue_init, CLASS)
 {
-	/* Initialize driver log type. */
+	void *handle = NULL;
+
+	/* Initialize common log type. */
 	mlx5_common_logtype = rte_log_register("pmd.common.mlx5");
 	if (mlx5_common_logtype >= 0)
 		rte_log_set_level(mlx5_common_logtype, RTE_LOG_NOTICE);
+	/*
+	 * RDMAV_HUGEPAGES_SAFE tells ibv_fork_init() we intend to use
+	 * huge pages. Calling ibv_fork_init() during init allows
+	 * applications to use fork() safely for purposes other than
+	 * using this PMD, which is not supported in forked processes.
+	 */
+	setenv("RDMAV_HUGEPAGES_SAFE", "1", 1);
+	/* Match the size of Rx completion entry to the size of a cacheline. */
+	if (RTE_CACHE_LINE_SIZE == 128)
+		setenv("MLX5_CQE_SIZE", "128", 0);
+	/*
+	 * MLX5_DEVICE_FATAL_CLEANUP tells ibv_destroy functions to
+	 * cleanup all the Verbs resources even when the device was removed.
+	 */
+	setenv("MLX5_DEVICE_FATAL_CLEANUP", "1", 1);
+	/* The glue initialization was done earlier by mlx5 common library. */
+#ifdef RTE_IBVERBS_LINK_DLOPEN
+	char glue_path[sizeof(RTE_EAL_PMD_PATH) - 1 + sizeof("-glue")];
+	const char *path[] = {
+		/*
+		 * A basic security check is necessary before trusting
+		 * MLX5_GLUE_PATH, which may override RTE_EAL_PMD_PATH.
+		 */
+		(geteuid() == getuid() && getegid() == getgid() ?
+		 getenv("MLX5_GLUE_PATH") : NULL),
+		/*
+		 * When RTE_EAL_PMD_PATH is set, use its glue-suffixed
+		 * variant, otherwise let dlopen() look up libraries on its
+		 * own.
+		 */
+		(*RTE_EAL_PMD_PATH ?
+		 mlx5_glue_path(glue_path, sizeof(glue_path)) : ""),
+	};
+	unsigned int i = 0;
+	void **sym;
+	const char *dlmsg;
+
+	while (!handle && i != RTE_DIM(path)) {
+		const char *end;
+		size_t len;
+		int ret;
+
+		if (!path[i]) {
+			++i;
+			continue;
+		}
+		end = strpbrk(path[i], ":;");
+		if (!end)
+			end = path[i] + strlen(path[i]);
+		len = end - path[i];
+		ret = 0;
+		do {
+			char name[ret + 1];
+
+			ret = snprintf(name, sizeof(name), "%.*s%s" MLX5_GLUE,
+				       (int)len, path[i],
+				       (!len || *(end - 1) == '/') ? "" : "/");
+			if (ret == -1)
+				break;
+			if (sizeof(name) != (size_t)ret + 1)
+				continue;
+			DRV_LOG(DEBUG, "Looking for rdma-core glue as "
+				"\"%s\"", name);
+			handle = dlopen(name, RTLD_LAZY);
+			break;
+		} while (1);
+		path[i] = end + 1;
+		if (!*end)
+			++i;
+	}
+	if (!handle) {
+		rte_errno = EINVAL;
+		dlmsg = dlerror();
+		if (dlmsg)
+			DRV_LOG(WARNING, "Cannot load glue library: %s", dlmsg);
+		goto glue_error;
+	}
+	sym = dlsym(handle, "mlx5_glue");
+	if (!sym || !*sym) {
+		rte_errno = EINVAL;
+		dlmsg = dlerror();
+		if (dlmsg)
+			DRV_LOG(ERR, "Cannot resolve glue symbol: %s", dlmsg);
+		goto glue_error;
+	}
+	mlx5_glue = *sym;
+#endif /* RTE_IBVERBS_LINK_DLOPEN */
+#ifndef NDEBUG
+	/* Glue structure must not contain any NULL pointers. */
+	{
+		unsigned int i;
+
+		for (i = 0; i != sizeof(*mlx5_glue) / sizeof(void *); ++i)
+			assert(((const void *const *)mlx5_glue)[i]);
+	}
+#endif
+	if (strcmp(mlx5_glue->version, MLX5_GLUE_VERSION)) {
+		rte_errno = EINVAL;
+		DRV_LOG(ERR, "rdma-core glue \"%s\" mismatch: \"%s\" is "
+			"required", mlx5_glue->version, MLX5_GLUE_VERSION);
+		goto glue_error;
+	}
+	mlx5_glue->fork_init();
+	return;
+glue_error:
+	if (handle)
+		dlclose(handle);
+	DRV_LOG(WARNING, "Cannot initialize MLX5 common due to missing"
+		" run-time dependency on rdma-core libraries (libibverbs,"
+		" libmlx5)");
+	mlx5_glue = NULL;
+	return;
 }
diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index a9558ca..dc6b3c8 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -6,15 +6,6 @@ include $(RTE_SDK)/mk/rte.vars.mk
 
 # Library name.
 LIB = librte_pmd_mlx5.a
-LIB_GLUE = $(LIB_GLUE_BASE).$(LIB_GLUE_VERSION)
-LIB_GLUE_BASE = librte_pmd_mlx5_glue.so
-LIB_GLUE_VERSION = 20.02.0
-
-ifeq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
-CFLAGS += -DMLX5_GLUE='"$(LIB_GLUE)"'
-CFLAGS += -DMLX5_GLUE_VERSION='"$(LIB_GLUE_VERSION)"'
-LDLIBS += -ldl
-endif
 
 # Sources.
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5.c
diff --git a/drivers/net/mlx5/meson.build b/drivers/net/mlx5/meson.build
index f6d0db9..e10ef3a 100644
--- a/drivers/net/mlx5/meson.build
+++ b/drivers/net/mlx5/meson.build
@@ -8,10 +8,6 @@ if not is_linux
 	subdir_done()
 endif
 
-LIB_GLUE_BASE = 'librte_pmd_mlx5_glue.so'
-LIB_GLUE_VERSION = '20.02.0'
-LIB_GLUE = LIB_GLUE_BASE + '.' + LIB_GLUE_VERSION
-
 allow_experimental_apis = true
 deps += ['hash', 'common_mlx5']
 sources = files(
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 7cf357d..8fbe826 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -7,7 +7,6 @@
 #include <unistd.h>
 #include <string.h>
 #include <assert.h>
-#include <dlfcn.h>
 #include <stdint.h>
 #include <stdlib.h>
 #include <errno.h>
@@ -3505,138 +3504,6 @@ struct mlx5_flow_id_pool *
 		     RTE_PCI_DRV_PROBE_AGAIN,
 };
 
-#ifdef RTE_IBVERBS_LINK_DLOPEN
-
-/**
- * Suffix RTE_EAL_PMD_PATH with "-glue".
- *
- * This function performs a sanity check on RTE_EAL_PMD_PATH before
- * suffixing its last component.
- *
- * @param buf[out]
- *   Output buffer, should be large enough otherwise NULL is returned.
- * @param size
- *   Size of @p out.
- *
- * @return
- *   Pointer to @p buf or @p NULL in case suffix cannot be appended.
- */
-static char *
-mlx5_glue_path(char *buf, size_t size)
-{
-	static const char *const bad[] = { "/", ".", "..", NULL };
-	const char *path = RTE_EAL_PMD_PATH;
-	size_t len = strlen(path);
-	size_t off;
-	int i;
-
-	while (len && path[len - 1] == '/')
-		--len;
-	for (off = len; off && path[off - 1] != '/'; --off)
-		;
-	for (i = 0; bad[i]; ++i)
-		if (!strncmp(path + off, bad[i], (int)(len - off)))
-			goto error;
-	i = snprintf(buf, size, "%.*s-glue", (int)len, path);
-	if (i == -1 || (size_t)i >= size)
-		goto error;
-	return buf;
-error:
-	DRV_LOG(ERR,
-		"unable to append \"-glue\" to last component of"
-		" RTE_EAL_PMD_PATH (\"" RTE_EAL_PMD_PATH "\"),"
-		" please re-configure DPDK");
-	return NULL;
-}
-
-/**
- * Initialization routine for run-time dependency on rdma-core.
- */
-static int
-mlx5_glue_init(void)
-{
-	char glue_path[sizeof(RTE_EAL_PMD_PATH) - 1 + sizeof("-glue")];
-	const char *path[] = {
-		/*
-		 * A basic security check is necessary before trusting
-		 * MLX5_GLUE_PATH, which may override RTE_EAL_PMD_PATH.
-		 */
-		(geteuid() == getuid() && getegid() == getgid() ?
-		 getenv("MLX5_GLUE_PATH") : NULL),
-		/*
-		 * When RTE_EAL_PMD_PATH is set, use its glue-suffixed
-		 * variant, otherwise let dlopen() look up libraries on its
-		 * own.
-		 */
-		(*RTE_EAL_PMD_PATH ?
-		 mlx5_glue_path(glue_path, sizeof(glue_path)) : ""),
-	};
-	unsigned int i = 0;
-	void *handle = NULL;
-	void **sym;
-	const char *dlmsg;
-
-	while (!handle && i != RTE_DIM(path)) {
-		const char *end;
-		size_t len;
-		int ret;
-
-		if (!path[i]) {
-			++i;
-			continue;
-		}
-		end = strpbrk(path[i], ":;");
-		if (!end)
-			end = path[i] + strlen(path[i]);
-		len = end - path[i];
-		ret = 0;
-		do {
-			char name[ret + 1];
-
-			ret = snprintf(name, sizeof(name), "%.*s%s" MLX5_GLUE,
-				       (int)len, path[i],
-				       (!len || *(end - 1) == '/') ? "" : "/");
-			if (ret == -1)
-				break;
-			if (sizeof(name) != (size_t)ret + 1)
-				continue;
-			DRV_LOG(DEBUG, "looking for rdma-core glue as \"%s\"",
-				name);
-			handle = dlopen(name, RTLD_LAZY);
-			break;
-		} while (1);
-		path[i] = end + 1;
-		if (!*end)
-			++i;
-	}
-	if (!handle) {
-		rte_errno = EINVAL;
-		dlmsg = dlerror();
-		if (dlmsg)
-			DRV_LOG(WARNING, "cannot load glue library: %s", dlmsg);
-		goto glue_error;
-	}
-	sym = dlsym(handle, "mlx5_glue");
-	if (!sym || !*sym) {
-		rte_errno = EINVAL;
-		dlmsg = dlerror();
-		if (dlmsg)
-			DRV_LOG(ERR, "cannot resolve glue symbol: %s", dlmsg);
-		goto glue_error;
-	}
-	mlx5_glue = *sym;
-	return 0;
-glue_error:
-	if (handle)
-		dlclose(handle);
-	DRV_LOG(WARNING,
-		"cannot initialize PMD due to missing run-time dependency on"
-		" rdma-core libraries (libibverbs, libmlx5)");
-	return -rte_errno;
-}
-
-#endif
-
 /**
  * Driver initialization routine.
  */
@@ -3651,43 +3518,8 @@ struct mlx5_flow_id_pool *
 	mlx5_set_ptype_table();
 	mlx5_set_cksum_table();
 	mlx5_set_swp_types_table();
-	/*
-	 * RDMAV_HUGEPAGES_SAFE tells ibv_fork_init() we intend to use
-	 * huge pages. Calling ibv_fork_init() during init allows
-	 * applications to use fork() safely for purposes other than
-	 * using this PMD, which is not supported in forked processes.
-	 */
-	setenv("RDMAV_HUGEPAGES_SAFE", "1", 1);
-	/* Match the size of Rx completion entry to the size of a cacheline. */
-	if (RTE_CACHE_LINE_SIZE == 128)
-		setenv("MLX5_CQE_SIZE", "128", 0);
-	/*
-	 * MLX5_DEVICE_FATAL_CLEANUP tells ibv_destroy functions to
-	 * cleanup all the Verbs resources even when the device was removed.
-	 */
-	setenv("MLX5_DEVICE_FATAL_CLEANUP", "1", 1);
-#ifdef RTE_IBVERBS_LINK_DLOPEN
-	if (mlx5_glue_init())
-		return;
-	assert(mlx5_glue);
-#endif
-#ifndef NDEBUG
-	/* Glue structure must not contain any NULL pointers. */
-	{
-		unsigned int i;
-
-		for (i = 0; i != sizeof(*mlx5_glue) / sizeof(void *); ++i)
-			assert(((const void *const *)mlx5_glue)[i]);
-	}
-#endif
-	if (strcmp(mlx5_glue->version, MLX5_GLUE_VERSION)) {
-		DRV_LOG(ERR,
-			"rdma-core glue \"%s\" mismatch: \"%s\" is required",
-			mlx5_glue->version, MLX5_GLUE_VERSION);
-		return;
-	}
-	mlx5_glue->fork_init();
-	rte_pci_register(&mlx5_driver);
+	if (mlx5_glue)
+		rte_pci_register(&mlx5_driver);
 }
 
 RTE_PMD_EXPORT_NAME(net_mlx5, __COUNTER__);
-- 
1.8.3.1
* [dpdk-dev] [PATCH v3 04/25] common/mlx5: share mlx5 PCI device detection
  2020-01-28 16:27   ` [dpdk-dev] [PATCH v3 00/25] Introduce mlx5 common library Matan Azrad
                       ` (2 preceding siblings ...)
  2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 03/25] common/mlx5: share the mlx5 glue reference Matan Azrad
@ 2020-01-28 16:27     ` Matan Azrad
  2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 05/25] common/mlx5: share mlx5 devices information Matan Azrad
                       ` (21 subsequent siblings)
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-28 16:27 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
Move the detection of the PCI address of an IB device
(mlx5_dev_to_pci_addr()) from the net/mlx5 PMD to the common mlx5 code.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
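Note: the moved helper scans the sysfs uevent file for a PCI_SLOT_NAME
line. The fragment below is only a minimal sketch of that parsing step,
not the code of this patch. The structure pci_addr_sketch, the function
parse_pci_slot_name() and the -ENOENT return convention are assumptions
made so the fragment builds without DPDK headers.

```c
#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for struct rte_pci_addr. */
struct pci_addr_sketch {
	uint32_t domain;
	uint8_t bus;
	uint8_t devid;
	uint8_t function;
};

/*
 * Parse one uevent line such as "PCI_SLOT_NAME=0000:03:00.1".
 * Returns 0 when all four fields are converted, -ENOENT otherwise
 * (an assumed convention for this sketch).
 */
static int
parse_pci_slot_name(const char *line, struct pci_addr_sketch *addr)
{
	assert(line != NULL && addr != NULL);
	if (sscanf(line,
		   "PCI_SLOT_NAME="
		   "%" SCNx32 ":%" SCNx8 ":%" SCNx8 ".%" SCNx8,
		   &addr->domain, &addr->bus,
		   &addr->devid, &addr->function) == 4)
		return 0;
	return -ENOENT;
}
```

A caller would feed each line read from the uevent file to this function
and stop at the first one that returns 0.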
 drivers/common/mlx5/Makefile                    |  2 +-
 drivers/common/mlx5/mlx5_common.c               | 55 +++++++++++++++++++++++++
 drivers/common/mlx5/mlx5_common.h               |  4 ++
 drivers/common/mlx5/rte_common_mlx5_version.map |  2 +
 drivers/net/mlx5/mlx5.c                         |  1 +
 drivers/net/mlx5/mlx5.h                         |  2 -
 drivers/net/mlx5/mlx5_ethdev.c                  | 53 +-----------------------
 drivers/net/mlx5/mlx5_rxtx.c                    |  1 +
 drivers/net/mlx5/mlx5_stats.c                   |  3 ++
 9 files changed, 68 insertions(+), 55 deletions(-)
diff --git a/drivers/common/mlx5/Makefile b/drivers/common/mlx5/Makefile
index b94d3c0..66585b2 100644
--- a/drivers/common/mlx5/Makefile
+++ b/drivers/common/mlx5/Makefile
@@ -41,7 +41,7 @@ else
 LDLIBS += -libverbs -lmlx5
 endif
 
-LDLIBS += -lrte_eal
+LDLIBS += -lrte_eal -lrte_pci
 
 # A few warnings cannot be avoided in external headers.
 CFLAGS += -Wno-error=cast-qual -DNDEBUG -UPEDANTIC
diff --git a/drivers/common/mlx5/mlx5_common.c b/drivers/common/mlx5/mlx5_common.c
index 9c88a63..2381208 100644
--- a/drivers/common/mlx5/mlx5_common.c
+++ b/drivers/common/mlx5/mlx5_common.c
@@ -5,6 +5,9 @@
 #include <dlfcn.h>
 #include <unistd.h>
 #include <string.h>
+#include <stdio.h>
+
+#include <rte_errno.h>
 
 #include <rte_errno.h>
 
@@ -16,6 +19,58 @@
 int mlx5_common_logtype;
 
 
+/**
+ * Get PCI information by sysfs device path.
+ *
+ * @param dev_path
+ *   Pointer to device sysfs folder name.
+ * @param[out] pci_addr
+ *   PCI bus address output buffer.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_dev_to_pci_addr(const char *dev_path,
+		     struct rte_pci_addr *pci_addr)
+{
+	FILE *file;
+	char line[32];
+	MKSTR(path, "%s/device/uevent", dev_path);
+
+	file = fopen(path, "rb");
+	if (file == NULL) {
+		rte_errno = errno;
+		return -rte_errno;
+	}
+	while (fgets(line, sizeof(line), file) == line) {
+		size_t len = strlen(line);
+		int ret;
+
+		/* Truncate long lines. */
+		if (len == (sizeof(line) - 1))
+			while (line[(len - 1)] != '\n') {
+				ret = fgetc(file);
+				if (ret == EOF)
+					break;
+				line[(len - 1)] = ret;
+			}
+		/* Extract information. */
+		if (sscanf(line,
+			   "PCI_SLOT_NAME="
+			   "%" SCNx32 ":%" SCNx8 ":%" SCNx8 ".%" SCNx8 "\n",
+			   &pci_addr->domain,
+			   &pci_addr->bus,
+			   &pci_addr->devid,
+			   &pci_addr->function) == 4) {
+			ret = 0;
+			break;
+		}
+	}
+	fclose(file);
+	return 0;
+}
+
 #ifdef RTE_IBVERBS_LINK_DLOPEN
 
 /**
diff --git a/drivers/common/mlx5/mlx5_common.h b/drivers/common/mlx5/mlx5_common.h
index 9f10def..107ab8d 100644
--- a/drivers/common/mlx5/mlx5_common.h
+++ b/drivers/common/mlx5/mlx5_common.h
@@ -6,7 +6,9 @@
 #define RTE_PMD_MLX5_COMMON_H_
 
 #include <assert.h>
+#include <stdio.h>
 
+#include <rte_pci.h>
 #include <rte_log.h>
 
 
@@ -84,4 +86,6 @@
 	\
 	snprintf(name, sizeof(name), "" __VA_ARGS__)
 
+int mlx5_dev_to_pci_addr(const char *dev_path, struct rte_pci_addr *pci_addr);
+
 #endif /* RTE_PMD_MLX5_COMMON_H_ */
diff --git a/drivers/common/mlx5/rte_common_mlx5_version.map b/drivers/common/mlx5/rte_common_mlx5_version.map
index e4f85e2..0c01172 100644
--- a/drivers/common/mlx5/rte_common_mlx5_version.map
+++ b/drivers/common/mlx5/rte_common_mlx5_version.map
@@ -17,4 +17,6 @@ DPDK_20.02 {
 	mlx5_devx_cmd_qp_query_tis_td;
 	mlx5_devx_cmd_query_hca_attr;
 	mlx5_devx_get_out_command_status;
+
+	mlx5_dev_to_pci_addr;
 };
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 8fbe826..d0fa2da 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -39,6 +39,7 @@
 
 #include <mlx5_glue.h>
 #include <mlx5_devx_cmds.h>
+#include <mlx5_common.h>
 
 #include "mlx5_defs.h"
 #include "mlx5.h"
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 872fccb..261a8fc 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -655,8 +655,6 @@ int mlx5_dev_get_flow_ctrl(struct rte_eth_dev *dev,
 			   struct rte_eth_fc_conf *fc_conf);
 int mlx5_dev_set_flow_ctrl(struct rte_eth_dev *dev,
 			   struct rte_eth_fc_conf *fc_conf);
-int mlx5_dev_to_pci_addr(const char *dev_path,
-			 struct rte_pci_addr *pci_addr);
 void mlx5_dev_link_status_handler(void *arg);
 void mlx5_dev_interrupt_handler(void *arg);
 void mlx5_dev_interrupt_handler_devx(void *arg);
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index eddf888..2628e64 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -38,6 +38,7 @@
 
 #include <mlx5_glue.h>
 #include <mlx5_devx_cmds.h>
+#include <mlx5_common.h>
 
 #include "mlx5.h"
 #include "mlx5_rxtx.h"
@@ -1212,58 +1213,6 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 }
 
 /**
- * Get PCI information by sysfs device path.
- *
- * @param dev_path
- *   Pointer to device sysfs folder name.
- * @param[out] pci_addr
- *   PCI bus address output buffer.
- *
- * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
- */
-int
-mlx5_dev_to_pci_addr(const char *dev_path,
-		     struct rte_pci_addr *pci_addr)
-{
-	FILE *file;
-	char line[32];
-	MKSTR(path, "%s/device/uevent", dev_path);
-
-	file = fopen(path, "rb");
-	if (file == NULL) {
-		rte_errno = errno;
-		return -rte_errno;
-	}
-	while (fgets(line, sizeof(line), file) == line) {
-		size_t len = strlen(line);
-		int ret;
-
-		/* Truncate long lines. */
-		if (len == (sizeof(line) - 1))
-			while (line[(len - 1)] != '\n') {
-				ret = fgetc(file);
-				if (ret == EOF)
-					break;
-				line[(len - 1)] = ret;
-			}
-		/* Extract information. */
-		if (sscanf(line,
-			   "PCI_SLOT_NAME="
-			   "%" SCNx32 ":%" SCNx8 ":%" SCNx8 ".%" SCNx8 "\n",
-			   &pci_addr->domain,
-			   &pci_addr->bus,
-			   &pci_addr->devid,
-			   &pci_addr->function) == 4) {
-			ret = 0;
-			break;
-		}
-	}
-	fclose(file);
-	return 0;
-}
-
-/**
  * Handle asynchronous removal event for entire multiport device.
  *
  * @param sh
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index d8f6671..b14ae31 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -30,6 +30,7 @@
 
 #include <mlx5_devx_cmds.h>
 #include <mlx5_prm.h>
+#include <mlx5_common.h>
 
 #include "mlx5_defs.h"
 #include "mlx5.h"
diff --git a/drivers/net/mlx5/mlx5_stats.c b/drivers/net/mlx5/mlx5_stats.c
index 0ed7170..4c69e77 100644
--- a/drivers/net/mlx5/mlx5_stats.c
+++ b/drivers/net/mlx5/mlx5_stats.c
@@ -13,10 +13,13 @@
 #include <rte_common.h>
 #include <rte_malloc.h>
 
+#include <mlx5_common.h>
+
 #include "mlx5_defs.h"
 #include "mlx5.h"
 #include "mlx5_rxtx.h"
 
+
 static const struct mlx5_counter_ctrl mlx5_counters_init[] = {
 	{
 		.dpdk_name = "rx_port_unicast_bytes",
-- 
1.8.3.1
* [dpdk-dev] [PATCH v3 05/25] common/mlx5: share mlx5 devices information
  2020-01-28 16:27   ` [dpdk-dev] [PATCH v3 00/25] Introduce mlx5 common library Matan Azrad
                       ` (3 preceding siblings ...)
  2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 04/25] common/mlx5: share mlx5 PCI device detection Matan Azrad
@ 2020-01-28 16:27     ` Matan Azrad
  2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 06/25] common/mlx5: share CQ entry check Matan Azrad
                       ` (20 subsequent siblings)
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-28 16:27 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
Move the vendor information, vendor ID and device IDs from net/mlx5 PMD
to the common mlx5 file.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
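Note: once the IDs live in the common header, any mlx5 driver can match
on them. The fragment below is a hedged sketch of one such use, telling
virtual functions apart by device ID. The ID values are copied from the
enum moved by this patch; the helper sketch_is_mlx5_vf() is hypothetical
and is not part of the series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the enums moved by this patch. */
enum {
	PCI_VENDOR_ID_MELLANOX = 0x15b3,
};

enum {
	PCI_DEVICE_ID_MELLANOX_CONNECTX4VF = 0x1014,
	PCI_DEVICE_ID_MELLANOX_CONNECTX4LXVF = 0x1016,
	PCI_DEVICE_ID_MELLANOX_CONNECTX5VF = 0x1018,
	PCI_DEVICE_ID_MELLANOX_CONNECTX5EXVF = 0x101a,
	PCI_DEVICE_ID_MELLANOX_CONNECTX5BFVF = 0xa2d3,
	PCI_DEVICE_ID_MELLANOX_CONNECTX6VF = 0x101c,
	PCI_DEVICE_ID_MELLANOX_CONNECTX6DXVF = 0x101e,
};

/* Hypothetical helper: true when the IDs name a Mellanox VF. */
static bool
sketch_is_mlx5_vf(uint16_t vendor_id, uint16_t device_id)
{
	if (vendor_id != PCI_VENDOR_ID_MELLANOX)
		return false;
	switch (device_id) {
	case PCI_DEVICE_ID_MELLANOX_CONNECTX4VF:
	case PCI_DEVICE_ID_MELLANOX_CONNECTX4LXVF:
	case PCI_DEVICE_ID_MELLANOX_CONNECTX5VF:
	case PCI_DEVICE_ID_MELLANOX_CONNECTX5EXVF:
	case PCI_DEVICE_ID_MELLANOX_CONNECTX5BFVF:
	case PCI_DEVICE_ID_MELLANOX_CONNECTX6VF:
	case PCI_DEVICE_ID_MELLANOX_CONNECTX6DXVF:
		return true;
	default:
		return false;
	}
}
```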
 drivers/common/mlx5/mlx5_common.h | 21 +++++++++++++++++++++
 drivers/net/mlx5/mlx5.h           | 21 ---------------------
 drivers/net/mlx5/mlx5_txq.c       |  1 +
 3 files changed, 22 insertions(+), 21 deletions(-)
diff --git a/drivers/common/mlx5/mlx5_common.h b/drivers/common/mlx5/mlx5_common.h
index 107ab8d..0f57a27 100644
--- a/drivers/common/mlx5/mlx5_common.h
+++ b/drivers/common/mlx5/mlx5_common.h
@@ -86,6 +86,27 @@
 	\
 	snprintf(name, sizeof(name), "" __VA_ARGS__)
 
+enum {
+	PCI_VENDOR_ID_MELLANOX = 0x15b3,
+};
+
+enum {
+	PCI_DEVICE_ID_MELLANOX_CONNECTX4 = 0x1013,
+	PCI_DEVICE_ID_MELLANOX_CONNECTX4VF = 0x1014,
+	PCI_DEVICE_ID_MELLANOX_CONNECTX4LX = 0x1015,
+	PCI_DEVICE_ID_MELLANOX_CONNECTX4LXVF = 0x1016,
+	PCI_DEVICE_ID_MELLANOX_CONNECTX5 = 0x1017,
+	PCI_DEVICE_ID_MELLANOX_CONNECTX5VF = 0x1018,
+	PCI_DEVICE_ID_MELLANOX_CONNECTX5EX = 0x1019,
+	PCI_DEVICE_ID_MELLANOX_CONNECTX5EXVF = 0x101a,
+	PCI_DEVICE_ID_MELLANOX_CONNECTX5BF = 0xa2d2,
+	PCI_DEVICE_ID_MELLANOX_CONNECTX5BFVF = 0xa2d3,
+	PCI_DEVICE_ID_MELLANOX_CONNECTX6 = 0x101b,
+	PCI_DEVICE_ID_MELLANOX_CONNECTX6VF = 0x101c,
+	PCI_DEVICE_ID_MELLANOX_CONNECTX6DX = 0x101d,
+	PCI_DEVICE_ID_MELLANOX_CONNECTX6DXVF = 0x101e,
+};
+
 int mlx5_dev_to_pci_addr(const char *dev_path, struct rte_pci_addr *pci_addr);
 
 #endif /* RTE_PMD_MLX5_COMMON_H_ */
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 261a8fc..3daf0db 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -41,27 +41,6 @@
 #include "mlx5_mr.h"
 #include "mlx5_autoconf.h"
 
-enum {
-	PCI_VENDOR_ID_MELLANOX = 0x15b3,
-};
-
-enum {
-	PCI_DEVICE_ID_MELLANOX_CONNECTX4 = 0x1013,
-	PCI_DEVICE_ID_MELLANOX_CONNECTX4VF = 0x1014,
-	PCI_DEVICE_ID_MELLANOX_CONNECTX4LX = 0x1015,
-	PCI_DEVICE_ID_MELLANOX_CONNECTX4LXVF = 0x1016,
-	PCI_DEVICE_ID_MELLANOX_CONNECTX5 = 0x1017,
-	PCI_DEVICE_ID_MELLANOX_CONNECTX5VF = 0x1018,
-	PCI_DEVICE_ID_MELLANOX_CONNECTX5EX = 0x1019,
-	PCI_DEVICE_ID_MELLANOX_CONNECTX5EXVF = 0x101a,
-	PCI_DEVICE_ID_MELLANOX_CONNECTX5BF = 0xa2d2,
-	PCI_DEVICE_ID_MELLANOX_CONNECTX5BFVF = 0xa2d3,
-	PCI_DEVICE_ID_MELLANOX_CONNECTX6 = 0x101b,
-	PCI_DEVICE_ID_MELLANOX_CONNECTX6VF = 0x101c,
-	PCI_DEVICE_ID_MELLANOX_CONNECTX6DX = 0x101d,
-	PCI_DEVICE_ID_MELLANOX_CONNECTX6DXVF = 0x101e,
-};
-
 /* Request types for IPC. */
 enum mlx5_mp_req_type {
 	MLX5_MP_REQ_VERBS_CMD_FD = 1,
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 1d2ba8a..7bff769 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -30,6 +30,7 @@
 
 #include <mlx5_glue.h>
 #include <mlx5_devx_cmds.h>
+#include <mlx5_common.h>
 
 #include "mlx5_defs.h"
 #include "mlx5_utils.h"
-- 
1.8.3.1
* [dpdk-dev] [PATCH v3 06/25] common/mlx5: share CQ entry check
  2020-01-28 16:27   ` [dpdk-dev] [PATCH v3 00/25] Introduce mlx5 common library Matan Azrad
                       ` (4 preceding siblings ...)
  2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 05/25] common/mlx5: share mlx5 devices information Matan Azrad
@ 2020-01-28 16:27     ` Matan Azrad
  2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 07/25] common/mlx5: add query vDPA DevX capabilities Matan Azrad
                       ` (19 subsequent siblings)
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-28 16:27 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
The CQE has an owner bit that indicates whether it is under SW or HW
control.
Share the CQE ownership check among all the mlx5 drivers.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
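Note: the shared check relies on the owner bit flipping each time the
consumer index wraps around the queue. The fragment below is a reduced
sketch of that idea only. It omits the read barrier and the error
opcodes handled by check_cqe(), and the SKETCH_* macros are local
stand-ins whose bit layout (owner in bit 0, opcode in the high nibble,
0xf as the invalid opcode) is an assumption, not taken from this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for the MLX5_CQE_* macros; the layout is assumed. */
#define SKETCH_CQE_OWNER(op_own)  ((op_own) & 0x1)
#define SKETCH_CQE_OPCODE(op_own) (((op_own) & 0xf0) >> 4)
#define SKETCH_CQE_INVALID 0xf

enum sketch_cqe_status {
	SKETCH_CQE_SW_OWN,
	SKETCH_CQE_HW_OWN,
};

static enum sketch_cqe_status
sketch_check_cqe(uint8_t op_own, uint16_t cqes_n, uint16_t ci)
{
	/*
	 * cqes_n is the queue size, a power of two, so "ci & cqes_n"
	 * isolates the bit that toggles on every wrap of the ring.
	 */
	const uint16_t idx = ci & cqes_n;

	assert(cqes_n != 0 && (cqes_n & (cqes_n - 1)) == 0);
	if (SKETCH_CQE_OWNER(op_own) != !!idx ||
	    SKETCH_CQE_OPCODE(op_own) == SKETCH_CQE_INVALID)
		return SKETCH_CQE_HW_OWN;
	return SKETCH_CQE_SW_OWN;
}
```

With an 8-entry queue, entries are expected to carry owner bit 0 during
the first pass (ci 0..7) and owner bit 1 during the second (ci 8..15).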
 drivers/common/mlx5/mlx5_common.h | 41 +++++++++++++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_rxtx.h      | 39 +------------------------------------
 2 files changed, 42 insertions(+), 38 deletions(-)
diff --git a/drivers/common/mlx5/mlx5_common.h b/drivers/common/mlx5/mlx5_common.h
index 0f57a27..9d464d4 100644
--- a/drivers/common/mlx5/mlx5_common.h
+++ b/drivers/common/mlx5/mlx5_common.h
@@ -9,8 +9,11 @@
 #include <stdio.h>
 
 #include <rte_pci.h>
+#include <rte_atomic.h>
 #include <rte_log.h>
 
+#include "mlx5_prm.h"
+
 
 /*
  * Helper macros to work around __VA_ARGS__ limitations in a C99 compliant
@@ -107,6 +110,44 @@ enum {
 	PCI_DEVICE_ID_MELLANOX_CONNECTX6DXVF = 0x101e,
 };
 
+/* CQE status. */
+enum mlx5_cqe_status {
+	MLX5_CQE_STATUS_SW_OWN = -1,
+	MLX5_CQE_STATUS_HW_OWN = -2,
+	MLX5_CQE_STATUS_ERR = -3,
+};
+
+/**
+ * Check whether CQE is valid.
+ *
+ * @param cqe
+ *   Pointer to CQE.
+ * @param cqes_n
+ *   Size of completion queue.
+ * @param ci
+ *   Consumer index.
+ *
+ * @return
+ *   The CQE status.
+ */
+static __rte_always_inline enum mlx5_cqe_status
+check_cqe(volatile struct mlx5_cqe *cqe, const uint16_t cqes_n,
+	  const uint16_t ci)
+{
+	const uint16_t idx = ci & cqes_n;
+	const uint8_t op_own = cqe->op_own;
+	const uint8_t op_owner = MLX5_CQE_OWNER(op_own);
+	const uint8_t op_code = MLX5_CQE_OPCODE(op_own);
+
+	if (unlikely((op_owner != (!!(idx))) || (op_code == MLX5_CQE_INVALID)))
+		return MLX5_CQE_STATUS_HW_OWN;
+	rte_cio_rmb();
+	if (unlikely(op_code == MLX5_CQE_RESP_ERR ||
+		     op_code == MLX5_CQE_REQ_ERR))
+		return MLX5_CQE_STATUS_ERR;
+	return MLX5_CQE_STATUS_SW_OWN;
+}
+
 int mlx5_dev_to_pci_addr(const char *dev_path, struct rte_pci_addr *pci_addr);
 
 #endif /* RTE_PMD_MLX5_COMMON_H_ */
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index fb13919..c2cd23b 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -33,6 +33,7 @@
 
 #include <mlx5_glue.h>
 #include <mlx5_prm.h>
+#include <mlx5_common.h>
 
 #include "mlx5_defs.h"
 #include "mlx5_utils.h"
@@ -549,44 +550,6 @@ int mlx5_dma_unmap(struct rte_pci_device *pdev, void *addr, uint64_t iova,
 #define mlx5_uar_write64(val, dst, lock) __mlx5_uar_write64(val, dst, lock)
 #endif
 
-/* CQE status. */
-enum mlx5_cqe_status {
-	MLX5_CQE_STATUS_SW_OWN = -1,
-	MLX5_CQE_STATUS_HW_OWN = -2,
-	MLX5_CQE_STATUS_ERR = -3,
-};
-
-/**
- * Check whether CQE is valid.
- *
- * @param cqe
- *   Pointer to CQE.
- * @param cqes_n
- *   Size of completion queue.
- * @param ci
- *   Consumer index.
- *
- * @return
- *   The CQE status.
- */
-static __rte_always_inline enum mlx5_cqe_status
-check_cqe(volatile struct mlx5_cqe *cqe, const uint16_t cqes_n,
-	  const uint16_t ci)
-{
-	const uint16_t idx = ci & cqes_n;
-	const uint8_t op_own = cqe->op_own;
-	const uint8_t op_owner = MLX5_CQE_OWNER(op_own);
-	const uint8_t op_code = MLX5_CQE_OPCODE(op_own);
-
-	if (unlikely((op_owner != (!!(idx))) || (op_code == MLX5_CQE_INVALID)))
-		return MLX5_CQE_STATUS_HW_OWN;
-	rte_cio_rmb();
-	if (unlikely(op_code == MLX5_CQE_RESP_ERR ||
-		     op_code == MLX5_CQE_REQ_ERR))
-		return MLX5_CQE_STATUS_ERR;
-	return MLX5_CQE_STATUS_SW_OWN;
-}
-
 /**
  * Get Memory Pool (MP) from mbuf. If mbuf is indirect, the pool from which the
  * cloned mbuf is allocated is returned instead.
-- 
1.8.3.1
* [dpdk-dev] [PATCH v3 07/25] common/mlx5: add query vDPA DevX capabilities
  2020-01-28 16:27   ` [dpdk-dev] [PATCH v3 00/25] Introduce mlx5 common library Matan Azrad
                       ` (5 preceding siblings ...)
  2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 06/25] common/mlx5: share CQ entry check Matan Azrad
@ 2020-01-28 16:27     ` Matan Azrad
  2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 08/25] common/mlx5: glue null memory region allocation Matan Azrad
                       ` (18 subsequent siblings)
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-28 16:27 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
Add a query of the DevX capabilities needed for vDPA configuration and
information on Mellanox devices.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
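Note: the query is done in two steps. The general capabilities report
whether virtio net queue objects are supported, and only then is the
vDPA emulation capability page requested. The fragment below sketches
the two bit manipulations involved. The values 0x13 << 1 and 1ULL << 0xd
come from this patch; the value 0x1 for "get current" and all SKETCH_*
and sketch_* names are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* From this patch: MLX5_GET_HCA_CAP_OP_MOD_VDPA_EMULATION. */
#define SKETCH_OP_MOD_VDPA_EMULATION (0x13 << 1)
/* Assumed value of MLX5_HCA_CAP_OPMOD_GET_CUR. */
#define SKETCH_OPMOD_GET_CUR 0x1
/* From this patch: MLX5_GENERAL_OBJ_TYPES_CAP_VIRTQ_NET_Q. */
#define SKETCH_OBJ_TYPES_CAP_VIRTQ_NET_Q (1ULL << 0xd)

/* Compose the op_mod field selecting the current vDPA capabilities. */
static uint16_t
sketch_vdpa_op_mod(void)
{
	return SKETCH_OP_MOD_VDPA_EMULATION | SKETCH_OPMOD_GET_CUR;
}

/* Test the general_obj_types bitmap for virtio net queue support. */
static bool
sketch_has_virtq_net_q(uint64_t general_obj_types)
{
	return !!(general_obj_types & SKETCH_OBJ_TYPES_CAP_VIRTQ_NET_Q);
}
```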
 drivers/common/mlx5/mlx5_devx_cmds.c | 90 ++++++++++++++++++++++++++++++++++++
 drivers/common/mlx5/mlx5_devx_cmds.h | 24 ++++++++++
 drivers/common/mlx5/mlx5_prm.h       | 45 ++++++++++++++++++
 3 files changed, 159 insertions(+)
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index 4d94f92..3a10ff0 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -285,6 +285,91 @@ struct mlx5_devx_obj *
 }
 
 /**
+ * Query NIC vDPA attributes.
+ *
+ * @param[in] ctx
+ *   ibv context returned from mlx5dv_open_device.
+ * @param[out] vdpa_attr
+ *   vDPA Attributes structure to fill.
+ */
+static void
+mlx5_devx_cmd_query_hca_vdpa_attr(struct ibv_context *ctx,
+				  struct mlx5_hca_vdpa_attr *vdpa_attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(query_hca_cap_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(query_hca_cap_out)] = {0};
+	void *hcattr = MLX5_ADDR_OF(query_hca_cap_out, out, capability);
+	int status, syndrome, rc;
+
+	MLX5_SET(query_hca_cap_in, in, opcode, MLX5_CMD_OP_QUERY_HCA_CAP);
+	MLX5_SET(query_hca_cap_in, in, op_mod,
+		 MLX5_GET_HCA_CAP_OP_MOD_VDPA_EMULATION |
+		 MLX5_HCA_CAP_OPMOD_GET_CUR);
+	rc = mlx5_glue->devx_general_cmd(ctx, in, sizeof(in), out, sizeof(out));
+	status = MLX5_GET(query_hca_cap_out, out, status);
+	syndrome = MLX5_GET(query_hca_cap_out, out, syndrome);
+	if (rc || status) {
+		RTE_LOG(DEBUG, PMD, "Failed to query devx VDPA capabilities,"
+			" status %x, syndrome = %x", status, syndrome);
+		vdpa_attr->valid = 0;
+	} else {
+		vdpa_attr->valid = 1;
+		vdpa_attr->desc_tunnel_offload_type =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 desc_tunnel_offload_type);
+		vdpa_attr->eth_frame_offload_type =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 eth_frame_offload_type);
+		vdpa_attr->virtio_version_1_0 =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 virtio_version_1_0);
+		vdpa_attr->tso_ipv4 = MLX5_GET(virtio_emulation_cap, hcattr,
+					       tso_ipv4);
+		vdpa_attr->tso_ipv6 = MLX5_GET(virtio_emulation_cap, hcattr,
+					       tso_ipv6);
+		vdpa_attr->tx_csum = MLX5_GET(virtio_emulation_cap, hcattr,
+					      tx_csum);
+		vdpa_attr->rx_csum = MLX5_GET(virtio_emulation_cap, hcattr,
+					      rx_csum);
+		vdpa_attr->event_mode = MLX5_GET(virtio_emulation_cap, hcattr,
+						 event_mode);
+		vdpa_attr->virtio_queue_type =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 virtio_queue_type);
+		vdpa_attr->log_doorbell_stride =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 log_doorbell_stride);
+		vdpa_attr->log_doorbell_bar_size =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 log_doorbell_bar_size);
+		vdpa_attr->doorbell_bar_offset =
+			MLX5_GET64(virtio_emulation_cap, hcattr,
+				   doorbell_bar_offset);
+		vdpa_attr->max_num_virtio_queues =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 max_num_virtio_queues);
+		vdpa_attr->umem_1_buffer_param_a =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 umem_1_buffer_param_a);
+		vdpa_attr->umem_1_buffer_param_b =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 umem_1_buffer_param_b);
+		vdpa_attr->umem_2_buffer_param_a =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 umem_2_buffer_param_a);
+		vdpa_attr->umem_2_buffer_param_b =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 umem_2_buffer_param_b);
+		vdpa_attr->umem_3_buffer_param_a =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 umem_3_buffer_param_a);
+		vdpa_attr->umem_3_buffer_param_b =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 umem_3_buffer_param_b);
+	}
+}
+
+/**
  * Query HCA attributes.
  * Using those attributes we can check on run time if the device
  * is having the required capabilities.
@@ -343,6 +428,9 @@ struct mlx5_devx_obj *
 	attr->flex_parser_protocols = MLX5_GET(cmd_hca_cap, hcattr,
 					       flex_parser_protocols);
 	attr->qos.sup = MLX5_GET(cmd_hca_cap, hcattr, qos);
+	attr->vdpa.valid = !!(MLX5_GET64(cmd_hca_cap, hcattr,
+					 general_obj_types) &
+			      MLX5_GENERAL_OBJ_TYPES_CAP_VIRTQ_NET_Q);
 	if (attr->qos.sup) {
 		MLX5_SET(query_hca_cap_in, in, op_mod,
 			 MLX5_GET_HCA_CAP_OP_MOD_QOS_CAP |
@@ -367,6 +455,8 @@ struct mlx5_devx_obj *
 		attr->qos.flow_meter_reg_share =
 			MLX5_GET(qos_cap, hcattr, flow_meter_reg_share);
 	}
+	if (attr->vdpa.valid)
+		mlx5_devx_cmd_query_hca_vdpa_attr(ctx, &attr->vdpa);
 	if (!attr->eth_net_offloads)
 		return 0;
 
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index 2d58d96..c1c9e99 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -34,6 +34,29 @@ struct mlx5_hca_qos_attr {
 
 };
 
+struct mlx5_hca_vdpa_attr {
+	uint8_t virtio_queue_type;
+	uint32_t valid:1;
+	uint32_t desc_tunnel_offload_type:1;
+	uint32_t eth_frame_offload_type:1;
+	uint32_t virtio_version_1_0:1;
+	uint32_t tso_ipv4:1;
+	uint32_t tso_ipv6:1;
+	uint32_t tx_csum:1;
+	uint32_t rx_csum:1;
+	uint32_t event_mode:3;
+	uint32_t log_doorbell_stride:5;
+	uint32_t log_doorbell_bar_size:5;
+	uint32_t max_num_virtio_queues;
+	uint32_t umem_1_buffer_param_a;
+	uint32_t umem_1_buffer_param_b;
+	uint32_t umem_2_buffer_param_a;
+	uint32_t umem_2_buffer_param_b;
+	uint32_t umem_3_buffer_param_a;
+	uint32_t umem_3_buffer_param_b;
+	uint64_t doorbell_bar_offset;
+};
+
 /* HCA supports this number of time periods for LRO. */
 #define MLX5_LRO_NUM_SUPP_PERIODS 4
 
@@ -62,6 +85,7 @@ struct mlx5_hca_attr {
 	uint32_t log_max_hairpin_num_packets:5;
 	uint32_t vhca_id:16;
 	struct mlx5_hca_qos_attr qos;
+	struct mlx5_hca_vdpa_attr vdpa;
 };
 
 struct mlx5_devx_wq_attr {
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index 5730ad1..efd6ad4 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -881,6 +881,11 @@ enum {
 	MLX5_GET_HCA_CAP_OP_MOD_GENERAL_DEVICE = 0x0 << 1,
 	MLX5_GET_HCA_CAP_OP_MOD_ETHERNET_OFFLOAD_CAPS = 0x1 << 1,
 	MLX5_GET_HCA_CAP_OP_MOD_QOS_CAP = 0xc << 1,
+	MLX5_GET_HCA_CAP_OP_MOD_VDPA_EMULATION = 0x13 << 1,
+};
+
+enum {
+	MLX5_GENERAL_OBJ_TYPES_CAP_VIRTQ_NET_Q = (1ULL << 0xd),
 };
 
 enum {
@@ -1256,11 +1261,51 @@ struct mlx5_ifc_per_protocol_networking_offload_caps_bits {
 	u8 reserved_at_200[0x600];
 };
 
+enum {
+	MLX5_VIRTQ_TYPE_SPLIT = 0,
+	MLX5_VIRTQ_TYPE_PACKED = 1,
+};
+
+enum {
+	MLX5_VIRTQ_EVENT_MODE_NO_MSIX = 0,
+	MLX5_VIRTQ_EVENT_MODE_QP = 1,
+	MLX5_VIRTQ_EVENT_MODE_MSIX = 2,
+};
+
+struct mlx5_ifc_virtio_emulation_cap_bits {
+	u8 desc_tunnel_offload_type[0x1];
+	u8 eth_frame_offload_type[0x1];
+	u8 virtio_version_1_0[0x1];
+	u8 tso_ipv4[0x1];
+	u8 tso_ipv6[0x1];
+	u8 tx_csum[0x1];
+	u8 rx_csum[0x1];
+	u8 reserved_at_7[0x1][0x9];
+	u8 event_mode[0x8];
+	u8 virtio_queue_type[0x8];
+	u8 reserved_at_20[0x13];
+	u8 log_doorbell_stride[0x5];
+	u8 reserved_at_3b[0x3];
+	u8 log_doorbell_bar_size[0x5];
+	u8 doorbell_bar_offset[0x40];
+	u8 reserved_at_80[0x8];
+	u8 max_num_virtio_queues[0x18];
+	u8 reserved_at_a0[0x60];
+	u8 umem_1_buffer_param_a[0x20];
+	u8 umem_1_buffer_param_b[0x20];
+	u8 umem_2_buffer_param_a[0x20];
+	u8 umem_2_buffer_param_b[0x20];
+	u8 umem_3_buffer_param_a[0x20];
+	u8 umem_3_buffer_param_b[0x20];
+	u8 reserved_at_1c0[0x620];
+};
+
 union mlx5_ifc_hca_cap_union_bits {
 	struct mlx5_ifc_cmd_hca_cap_bits cmd_hca_cap;
 	struct mlx5_ifc_per_protocol_networking_offload_caps_bits
 	       per_protocol_networking_offload_caps;
 	struct mlx5_ifc_qos_cap_bits qos_cap;
+	struct mlx5_ifc_virtio_emulation_cap_bits vdpa_caps;
 	u8 reserved_at_0[0x8000];
 };
 
-- 
1.8.3.1
* [dpdk-dev] [PATCH v3 08/25] common/mlx5: glue null memory region allocation
  2020-01-28 16:27   ` [dpdk-dev] [PATCH v3 00/25] Introduce mlx5 common library Matan Azrad
                       ` (6 preceding siblings ...)
  2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 07/25] common/mlx5: add query vDPA DevX capabilities Matan Azrad
@ 2020-01-28 16:27     ` Matan Azrad
  2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 09/25] common/mlx5: support DevX indirect mkey creation Matan Azrad
                       ` (17 subsequent siblings)
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-28 16:27 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
Add support for the rdma-core API to allocate a NULL MR.
When the device HW gets a NULL MR address, it does nothing with the
address: no read and no write.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
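Note: the glue entry follows the usual pattern of a build-time guarded
wrapper that falls back to an ENOTSUP stub. The fragment below is a
self-contained sketch of that pattern only. Every sketch_* name and the
SKETCH_HAVE_NULL_MR switch are hypothetical; the real wrapper calls
ibv_alloc_null_mr() as shown in the diff.

```c
#include <errno.h>
#include <stddef.h>

/* Opaque, hypothetical stand-ins for struct ibv_pd and struct ibv_mr. */
struct sketch_pd;
struct sketch_mr;

#ifdef SKETCH_HAVE_NULL_MR
/* Provided by the backend library when the feature is available. */
struct sketch_mr *sketch_backend_alloc_null_mr(struct sketch_pd *pd);
#endif

static struct sketch_mr *
sketch_glue_alloc_null_mr(struct sketch_pd *pd)
{
#ifdef SKETCH_HAVE_NULL_MR
	return sketch_backend_alloc_null_mr(pd);
#else
	/* Feature not compiled in: report it through errno. */
	(void)pd;
	errno = ENOTSUP;
	return NULL;
#endif
}

/* The table entry stays non-NULL whether or not the feature exists. */
struct sketch_glue {
	struct sketch_mr *(*alloc_null_mr)(struct sketch_pd *pd);
};

static const struct sketch_glue sketch_glue_ops = {
	.alloc_null_mr = sketch_glue_alloc_null_mr,
};
```

Keeping a stub in the table means callers never need to test the
function pointer itself; they only check the returned MR and errno.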
 drivers/common/mlx5/mlx5_glue.c | 13 +++++++++++++
 drivers/common/mlx5/mlx5_glue.h |  1 +
 2 files changed, 14 insertions(+)
diff --git a/drivers/common/mlx5/mlx5_glue.c b/drivers/common/mlx5/mlx5_glue.c
index d5bc84e..e75e6bc 100644
--- a/drivers/common/mlx5/mlx5_glue.c
+++ b/drivers/common/mlx5/mlx5_glue.c
@@ -226,6 +226,18 @@
 	return ibv_reg_mr(pd, addr, length, access);
 }
 
+static struct ibv_mr *
+mlx5_glue_alloc_null_mr(struct ibv_pd *pd)
+{
+#ifdef HAVE_IBV_DEVX_OBJ
+	return ibv_alloc_null_mr(pd);
+#else
+	(void)pd;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
 static int
 mlx5_glue_dereg_mr(struct ibv_mr *mr)
 {
@@ -1070,6 +1082,7 @@
 	.destroy_qp = mlx5_glue_destroy_qp,
 	.modify_qp = mlx5_glue_modify_qp,
 	.reg_mr = mlx5_glue_reg_mr,
+	.alloc_null_mr = mlx5_glue_alloc_null_mr,
 	.dereg_mr = mlx5_glue_dereg_mr,
 	.create_counter_set = mlx5_glue_create_counter_set,
 	.destroy_counter_set = mlx5_glue_destroy_counter_set,
diff --git a/drivers/common/mlx5/mlx5_glue.h b/drivers/common/mlx5/mlx5_glue.h
index f4c3180..33afaf4 100644
--- a/drivers/common/mlx5/mlx5_glue.h
+++ b/drivers/common/mlx5/mlx5_glue.h
@@ -138,6 +138,7 @@ struct mlx5_glue {
 			 int attr_mask);
 	struct ibv_mr *(*reg_mr)(struct ibv_pd *pd, void *addr,
 				 size_t length, int access);
+	struct ibv_mr *(*alloc_null_mr)(struct ibv_pd *pd);
 	int (*dereg_mr)(struct ibv_mr *mr);
 	struct ibv_counter_set *(*create_counter_set)
 		(struct ibv_context *context,
-- 
1.8.3.1
* [dpdk-dev] [PATCH v3 09/25] common/mlx5: support DevX indirect mkey creation
  2020-01-28 16:27   ` [dpdk-dev] [PATCH v3 00/25] Introduce mlx5 common library Matan Azrad
                       ` (7 preceding siblings ...)
  2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 08/25] common/mlx5: glue null memory region allocation Matan Azrad
@ 2020-01-28 16:27     ` Matan Azrad
  2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 10/25] common/mlx5: glue event queue query Matan Azrad
                       ` (16 subsequent siblings)
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-28 16:27 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
Add an option to create an indirect mkey with the existing
mlx5_devx_cmd_mkey_create command.
An indirect mkey points to a set of direct mkeys.
This way, the HW/SW can reference fragmented memory through one object.
Align the net/mlx5 driver usage with the updated command.
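The size arithmetic for the KLM list can be sketched on its own. The
following is a minimal sketch, not the driver code: struct klm_bits
copies the mlx5_ifc_klm_bits layout from this patch, while the helper
names and the base_bytes parameter (the fixed part of the command
input, whose real value is not shown here) are assumptions made for
illustration.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Layout convention of the *_bits structs: each u8 array element stands
 * for one bit, so sizeof() of the struct is its size in bits.
 */
typedef uint8_t u8;
struct klm_bits {
	u8 byte_count[0x20];
	u8 mkey[0x20];
	u8 address[0x40];
};
#define KLM_SZ_BYTES (sizeof(struct klm_bits) / 8)  /* bits to bytes */
#define KLM_SZ_DW    (sizeof(struct klm_bits) / 32) /* bits to dwords */

/* Round up to a multiple of 4, like RTE_ALIGN(klm_num, 4) in the patch. */
static inline uint32_t
klm_entries_padded(uint32_t klm_num)
{
	return (klm_num + 3u) & ~3u;
}

/* Total command input size in bytes for klm_num entries. */
static inline size_t
mkey_in_size_bytes(size_t base_bytes, uint32_t klm_num)
{
	return base_bytes +
	       (size_t)klm_entries_padded(klm_num) * KLM_SZ_BYTES;
}

/* The input length is carried in a u16, which bounds the entry count. */
static inline int
mkey_in_size_fits(size_t base_bytes, uint32_t klm_num)
{
	return mkey_in_size_bytes(base_bytes, klm_num) <= UINT16_MAX;
}
```

For example, 5 entries are padded to 8, which adds 8 * 16 = 128 bytes to
the fixed part of the input. Callers would be expected to check the u16
bound before building the command.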
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/common/mlx5/mlx5_devx_cmds.c | 43 ++++++++++++++++++++++++++++++------
 drivers/common/mlx5/mlx5_devx_cmds.h | 17 ++++++++++++++
 drivers/common/mlx5/mlx5_prm.h       | 12 ++++++++++
 drivers/net/mlx5/mlx5_flow_dv.c      |  4 ++++
 4 files changed, 69 insertions(+), 7 deletions(-)
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index 3a10ff0..2197705 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -142,7 +142,11 @@ struct mlx5_devx_obj *
 mlx5_devx_cmd_mkey_create(struct ibv_context *ctx,
 			  struct mlx5_devx_mkey_attr *attr)
 {
-	uint32_t in[MLX5_ST_SZ_DW(create_mkey_in)] = {0};
+	struct mlx5_klm *klm_array = attr->klm_array;
+	int klm_num = attr->klm_num;
+	int in_size_dw = MLX5_ST_SZ_DW(create_mkey_in) +
+		     (klm_num ? RTE_ALIGN(klm_num, 4) : 0) * MLX5_ST_SZ_DW(klm);
+	uint32_t in[in_size_dw];
 	uint32_t out[MLX5_ST_SZ_DW(create_mkey_out)] = {0};
 	void *mkc;
 	struct mlx5_devx_obj *mkey = rte_zmalloc("mkey", sizeof(*mkey), 0);
@@ -153,27 +157,52 @@ struct mlx5_devx_obj *
 		rte_errno = ENOMEM;
 		return NULL;
 	}
+	memset(in, 0, in_size_dw * 4);
 	pgsize = sysconf(_SC_PAGESIZE);
-	translation_size = (RTE_ALIGN(attr->size, pgsize) * 8) / 16;
 	MLX5_SET(create_mkey_in, in, opcode, MLX5_CMD_OP_CREATE_MKEY);
+	mkc = MLX5_ADDR_OF(create_mkey_in, in, memory_key_mkey_entry);
+	if (klm_num > 0) {
+		int i;
+		uint8_t *klm = (uint8_t *)MLX5_ADDR_OF(create_mkey_in, in,
+						       klm_pas_mtt);
+		translation_size = RTE_ALIGN(klm_num, 4);
+		for (i = 0; i < klm_num; i++) {
+			MLX5_SET(klm, klm, byte_count, klm_array[i].byte_count);
+			MLX5_SET(klm, klm, mkey, klm_array[i].mkey);
+			MLX5_SET64(klm, klm, address, klm_array[i].address);
+			klm += MLX5_ST_SZ_BYTES(klm);
+		}
+		for (; i < (int)translation_size; i++) {
+			MLX5_SET(klm, klm, mkey, 0x0);
+			MLX5_SET64(klm, klm, address, 0x0);
+			klm += MLX5_ST_SZ_BYTES(klm);
+		}
+		MLX5_SET(mkc, mkc, access_mode_1_0, attr->log_entity_size ?
+			 MLX5_MKC_ACCESS_MODE_KLM_FBS :
+			 MLX5_MKC_ACCESS_MODE_KLM);
+		MLX5_SET(mkc, mkc, log_page_size, attr->log_entity_size);
+	} else {
+		translation_size = (RTE_ALIGN(attr->size, pgsize) * 8) / 16;
+		MLX5_SET(mkc, mkc, access_mode_1_0, MLX5_MKC_ACCESS_MODE_MTT);
+		MLX5_SET(mkc, mkc, log_page_size, rte_log2_u32(pgsize));
+	}
 	MLX5_SET(create_mkey_in, in, translations_octword_actual_size,
 		 translation_size);
 	MLX5_SET(create_mkey_in, in, mkey_umem_id, attr->umem_id);
-	mkc = MLX5_ADDR_OF(create_mkey_in, in, memory_key_mkey_entry);
+	MLX5_SET(create_mkey_in, in, pg_access, attr->pg_access);
 	MLX5_SET(mkc, mkc, lw, 0x1);
 	MLX5_SET(mkc, mkc, lr, 0x1);
-	MLX5_SET(mkc, mkc, access_mode_1_0, MLX5_MKC_ACCESS_MODE_MTT);
 	MLX5_SET(mkc, mkc, qpn, 0xffffff);
 	MLX5_SET(mkc, mkc, pd, attr->pd);
 	MLX5_SET(mkc, mkc, mkey_7_0, attr->umem_id & 0xFF);
 	MLX5_SET(mkc, mkc, translations_octword_size, translation_size);
 	MLX5_SET64(mkc, mkc, start_addr, attr->addr);
 	MLX5_SET64(mkc, mkc, len, attr->size);
-	MLX5_SET(mkc, mkc, log_page_size, rte_log2_u32(pgsize));
-	mkey->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in), out,
+	mkey->obj = mlx5_glue->devx_obj_create(ctx, in, in_size_dw * 4, out,
 					       sizeof(out));
 	if (!mkey->obj) {
-		DRV_LOG(ERR, "Can't create mkey - error %d", errno);
+		DRV_LOG(ERR, "Can't create %sdirect mkey - error %d\n",
+			klm_num ? "an in" : "a ", errno);
 		rte_errno = errno;
 		rte_free(mkey);
 		return NULL;
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index c1c9e99..c76c172 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -6,6 +6,7 @@
 #define RTE_PMD_MLX5_DEVX_CMDS_H_
 
 #include "mlx5_glue.h"
+#include "mlx5_prm.h"
 
 
 /* devX creation object */
@@ -14,11 +15,26 @@ struct mlx5_devx_obj {
 	int id; /* The object ID. */
 };
 
+/* UMR memory buffer used to define 1 entry in indirect mkey. */
+struct mlx5_klm {
+	uint32_t byte_count;
+	uint32_t mkey;
+	uint64_t address;
+};
+
+/* Limitation of libibverbs: the input length variable type is u16. */
+#define MLX5_DEVX_MAX_KLM_ENTRIES ((UINT16_MAX - \
+		MLX5_ST_SZ_DW(create_mkey_in) * 4) / (MLX5_ST_SZ_DW(klm) * 4))
+
 struct mlx5_devx_mkey_attr {
 	uint64_t addr;
 	uint64_t size;
 	uint32_t umem_id;
 	uint32_t pd;
+	uint32_t log_entity_size;
+	uint32_t pg_access:1;
+	struct mlx5_klm *klm_array;
+	int klm_num;
 };
 
 /* HCA qos attributes. */
@@ -216,6 +232,7 @@ struct mlx5_devx_modify_sq_attr {
 	uint32_t hairpin_peer_vhca:16;
 };
 
+
 /* mlx5_devx_cmds.c */
 
 struct mlx5_devx_obj *mlx5_devx_cmd_flow_counter_alloc(struct ibv_context *ctx,
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index efd6ad4..db15bb6 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -726,6 +726,8 @@ enum {
 
 enum {
 	MLX5_MKC_ACCESS_MODE_MTT   = 0x1,
+	MLX5_MKC_ACCESS_MODE_KLM   = 0x2,
+	MLX5_MKC_ACCESS_MODE_KLM_FBS = 0x3,
 };
 
 /* Flow counters. */
@@ -790,6 +792,16 @@ struct mlx5_ifc_query_flow_counter_in_bits {
 	u8         flow_counter_id[0x20];
 };
 
+#define MLX5_MAX_KLM_BYTE_COUNT 0x80000000u
+#define MLX5_MIN_KLM_FIXED_BUFFER_SIZE 0x1000u
+
+
+struct mlx5_ifc_klm_bits {
+	u8         byte_count[0x20];
+	u8         mkey[0x20];
+	u8         address[0x40];
+};
+
 struct mlx5_ifc_mkc_bits {
 	u8         reserved_at_0[0x1];
 	u8         free[0x1];
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 1b31602..5610d94 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -3885,6 +3885,10 @@ struct field_modify_info modify_tcp[] = {
 	mkey_attr.size = size;
 	mkey_attr.umem_id = mem_mng->umem->umem_id;
 	mkey_attr.pd = sh->pdn;
+	mkey_attr.log_entity_size = 0;
+	mkey_attr.pg_access = 0;
+	mkey_attr.klm_array = NULL;
+	mkey_attr.klm_num = 0;
 	mem_mng->dm = mlx5_devx_cmd_mkey_create(sh->ctx, &mkey_attr);
 	if (!mem_mng->dm) {
 		mlx5_glue->devx_umem_dereg(mem_mng->umem);
-- 
1.8.3.1
* [dpdk-dev] [PATCH v3 10/25] common/mlx5: glue event queue query
  2020-01-28 16:27   ` [dpdk-dev] [PATCH v3 00/25] Introduce mlx5 common library Matan Azrad
                       ` (8 preceding siblings ...)
  2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 09/25] common/mlx5: support DevX indirect mkey creation Matan Azrad
@ 2020-01-28 16:27     ` Matan Azrad
  2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 11/25] common/mlx5: glue event interrupt commands Matan Azrad
                       ` (15 subsequent siblings)
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-28 16:27 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
The event queue is managed only by the kernel.
Add the rdma-core command in glue to query the kernel event queue
details.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/common/mlx5/mlx5_glue.c | 15 +++++++++++++++
 drivers/common/mlx5/mlx5_glue.h |  2 ++
 2 files changed, 17 insertions(+)
diff --git a/drivers/common/mlx5/mlx5_glue.c b/drivers/common/mlx5/mlx5_glue.c
index e75e6bc..fedce77 100644
--- a/drivers/common/mlx5/mlx5_glue.c
+++ b/drivers/common/mlx5/mlx5_glue.c
@@ -1049,6 +1049,20 @@
 #endif
 }
 
+static int
+mlx5_glue_devx_query_eqn(struct ibv_context *ctx, uint32_t cpus,
+			 uint32_t *eqn)
+{
+#ifdef HAVE_IBV_DEVX_OBJ
+	return mlx5dv_devx_query_eqn(ctx, cpus, eqn);
+#else
+	(void)ctx;
+	(void)cpus;
+	(void)eqn;
+	return -ENOTSUP;
+#endif
+}
+
 alignas(RTE_CACHE_LINE_SIZE)
 const struct mlx5_glue *mlx5_glue = &(const struct mlx5_glue){
 	.version = MLX5_GLUE_VERSION,
@@ -1148,4 +1162,5 @@
 	.devx_qp_query = mlx5_glue_devx_qp_query,
 	.devx_port_query = mlx5_glue_devx_port_query,
 	.dr_dump_domain = mlx5_glue_dr_dump_domain,
+	.devx_query_eqn = mlx5_glue_devx_query_eqn,
 };
diff --git a/drivers/common/mlx5/mlx5_glue.h b/drivers/common/mlx5/mlx5_glue.h
index 33afaf4..fe51f97 100644
--- a/drivers/common/mlx5/mlx5_glue.h
+++ b/drivers/common/mlx5/mlx5_glue.h
@@ -259,6 +259,8 @@ struct mlx5_glue {
 			       uint32_t port_num,
 			       struct mlx5dv_devx_port *mlx5_devx_port);
 	int (*dr_dump_domain)(FILE *file, void *domain);
+	int (*devx_query_eqn)(struct ibv_context *context, uint32_t cpus,
+			      uint32_t *eqn);
 };
 
 const struct mlx5_glue *mlx5_glue;
-- 
1.8.3.1
* [dpdk-dev] [PATCH v3 11/25] common/mlx5: glue event interrupt commands
  2020-01-28 16:27   ` [dpdk-dev] [PATCH v3 00/25] Introduce mlx5 common library Matan Azrad
                       ` (9 preceding siblings ...)
  2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 10/25] common/mlx5: glue event queue query Matan Azrad
@ 2020-01-28 16:27     ` Matan Azrad
  2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 12/25] common/mlx5: glue UAR allocation Matan Azrad
                       ` (14 subsequent siblings)
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-28 16:27 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
Add the following commands to glue in order to support interrupt event
channel operations associated with events in the EQ:
	devx_create_event_channel,
	devx_destroy_event_channel,
	devx_subscribe_devx_event,
	devx_subscribe_devx_event_fd,
	devx_get_event.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/common/mlx5/Makefile    |  5 +++
 drivers/common/mlx5/meson.build |  2 ++
 drivers/common/mlx5/mlx5_glue.c | 79 +++++++++++++++++++++++++++++++++++++++++
 drivers/common/mlx5/mlx5_glue.h | 25 +++++++++++++
 4 files changed, 111 insertions(+)
diff --git a/drivers/common/mlx5/Makefile b/drivers/common/mlx5/Makefile
index 66585b2..7110231 100644
--- a/drivers/common/mlx5/Makefile
+++ b/drivers/common/mlx5/Makefile
@@ -154,6 +154,11 @@ mlx5_autoconf.h.new: $(RTE_SDK)/buildtools/auto-config-h.sh
 		func mlx5dv_dr_action_create_dest_devx_tir \
 		$(AUTOCONF_OUTPUT)
 	$Q sh -- '$<' '$@' \
+		HAVE_IBV_DEVX_EVENT \
+		infiniband/mlx5dv.h \
+		func mlx5dv_devx_get_event \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
 		HAVE_MLX5_DR_CREATE_ACTION_FLOW_METER \
 		infiniband/mlx5dv.h \
 		func mlx5dv_dr_action_create_flow_meter \
diff --git a/drivers/common/mlx5/meson.build b/drivers/common/mlx5/meson.build
index 718cef2..76ca7d7 100644
--- a/drivers/common/mlx5/meson.build
+++ b/drivers/common/mlx5/meson.build
@@ -108,6 +108,8 @@ if build
 		'mlx5dv_devx_obj_query_async' ],
 		[ 'HAVE_MLX5DV_DR_ACTION_DEST_DEVX_TIR', 'infiniband/mlx5dv.h',
 		'mlx5dv_dr_action_create_dest_devx_tir' ],
+		[ 'HAVE_IBV_DEVX_EVENT', 'infiniband/mlx5dv.h',
+		'mlx5dv_devx_get_event' ],
 		[ 'HAVE_MLX5_DR_CREATE_ACTION_FLOW_METER', 'infiniband/mlx5dv.h',
 		'mlx5dv_dr_action_create_flow_meter' ],
 		[ 'HAVE_MLX5DV_MMAP_GET_NC_PAGES_CMD', 'infiniband/mlx5dv.h',
diff --git a/drivers/common/mlx5/mlx5_glue.c b/drivers/common/mlx5/mlx5_glue.c
index fedce77..e4eabdb 100644
--- a/drivers/common/mlx5/mlx5_glue.c
+++ b/drivers/common/mlx5/mlx5_glue.c
@@ -1063,6 +1063,80 @@
 #endif
 }
 
+static struct mlx5dv_devx_event_channel *
+mlx5_glue_devx_create_event_channel(struct ibv_context *ctx, int flags)
+{
+#ifdef HAVE_IBV_DEVX_EVENT
+	return mlx5dv_devx_create_event_channel(ctx, flags);
+#else
+	(void)ctx;
+	(void)flags;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static void
+mlx5_glue_devx_destroy_event_channel(struct mlx5dv_devx_event_channel *eventc)
+{
+#ifdef HAVE_IBV_DEVX_EVENT
+	mlx5dv_devx_destroy_event_channel(eventc);
+#else
+	(void)eventc;
+#endif
+}
+
+static int
+mlx5_glue_devx_subscribe_devx_event(struct mlx5dv_devx_event_channel *eventc,
+				    struct mlx5dv_devx_obj *obj,
+				    uint16_t events_sz, uint16_t events_num[],
+				    uint64_t cookie)
+{
+#ifdef HAVE_IBV_DEVX_EVENT
+	return mlx5dv_devx_subscribe_devx_event(eventc, obj, events_sz,
+						events_num, cookie);
+#else
+	(void)eventc;
+	(void)obj;
+	(void)events_sz;
+	(void)events_num;
+	(void)cookie;
+	return -ENOTSUP;
+#endif
+}
+
+static int
+mlx5_glue_devx_subscribe_devx_event_fd(struct mlx5dv_devx_event_channel *eventc,
+				       int fd, struct mlx5dv_devx_obj *obj,
+				       uint16_t event_num)
+{
+#ifdef HAVE_IBV_DEVX_EVENT
+	return mlx5dv_devx_subscribe_devx_event_fd(eventc, fd, obj, event_num);
+#else
+	(void)eventc;
+	(void)fd;
+	(void)obj;
+	(void)event_num;
+	return -ENOTSUP;
+#endif
+}
+
+static ssize_t
+mlx5_glue_devx_get_event(struct mlx5dv_devx_event_channel *eventc,
+			 struct mlx5dv_devx_async_event_hdr *event_data,
+			 size_t event_resp_len)
+{
+#ifdef HAVE_IBV_DEVX_EVENT
+	return mlx5dv_devx_get_event(eventc, event_data, event_resp_len);
+#else
+	(void)eventc;
+	(void)event_data;
+	(void)event_resp_len;
+	errno = ENOTSUP;
+	return -1;
+#endif
+}
+
 alignas(RTE_CACHE_LINE_SIZE)
 const struct mlx5_glue *mlx5_glue = &(const struct mlx5_glue){
 	.version = MLX5_GLUE_VERSION,
@@ -1163,4 +1237,9 @@
 	.devx_port_query = mlx5_glue_devx_port_query,
 	.dr_dump_domain = mlx5_glue_dr_dump_domain,
 	.devx_query_eqn = mlx5_glue_devx_query_eqn,
+	.devx_create_event_channel = mlx5_glue_devx_create_event_channel,
+	.devx_destroy_event_channel = mlx5_glue_devx_destroy_event_channel,
+	.devx_subscribe_devx_event = mlx5_glue_devx_subscribe_devx_event,
+	.devx_subscribe_devx_event_fd = mlx5_glue_devx_subscribe_devx_event_fd,
+	.devx_get_event = mlx5_glue_devx_get_event,
 };
diff --git a/drivers/common/mlx5/mlx5_glue.h b/drivers/common/mlx5/mlx5_glue.h
index fe51f97..6fc00dd 100644
--- a/drivers/common/mlx5/mlx5_glue.h
+++ b/drivers/common/mlx5/mlx5_glue.h
@@ -86,6 +86,12 @@
 struct mlx5dv_dr_flow_meter_attr;
 #endif
 
+#ifndef HAVE_IBV_DEVX_EVENT
+struct mlx5dv_devx_event_channel { int fd; };
+struct mlx5dv_devx_async_event_hdr;
+#define MLX5DV_DEVX_CREATE_EVENT_CHANNEL_FLAGS_OMIT_EV_DATA 1
+#endif
+
 /* LIB_GLUE_VERSION must be updated every time this structure is modified. */
 struct mlx5_glue {
 	const char *version;
@@ -261,6 +267,25 @@ struct mlx5_glue {
 	int (*dr_dump_domain)(FILE *file, void *domain);
 	int (*devx_query_eqn)(struct ibv_context *context, uint32_t cpus,
 			      uint32_t *eqn);
+	struct mlx5dv_devx_event_channel *(*devx_create_event_channel)
+				(struct ibv_context *context, int flags);
+	void (*devx_destroy_event_channel)
+			(struct mlx5dv_devx_event_channel *event_channel);
+	int (*devx_subscribe_devx_event)
+			(struct mlx5dv_devx_event_channel *event_channel,
+			 struct mlx5dv_devx_obj *obj,
+			 uint16_t events_sz,
+			 uint16_t events_num[],
+			 uint64_t cookie);
+	int (*devx_subscribe_devx_event_fd)
+			(struct mlx5dv_devx_event_channel *event_channel,
+			 int fd,
+			 struct mlx5dv_devx_obj *obj,
+			 uint16_t event_num);
+	ssize_t (*devx_get_event)
+			(struct mlx5dv_devx_event_channel *event_channel,
+			 struct mlx5dv_devx_async_event_hdr *event_data,
+			 size_t event_resp_len);
 };
 
 const struct mlx5_glue *mlx5_glue;
-- 
1.8.3.1
* [dpdk-dev] [PATCH v3 12/25] common/mlx5: glue UAR allocation
  2020-01-28 16:27   ` [dpdk-dev] [PATCH v3 00/25] Introduce mlx5 common library Matan Azrad
                       ` (10 preceding siblings ...)
  2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 11/25] common/mlx5: glue event interrupt commands Matan Azrad
@ 2020-01-28 16:27     ` Matan Azrad
  2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 13/25] common/mlx5: add DevX command to create CQ Matan Azrad
                       ` (13 subsequent siblings)
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-28 16:27 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
Isolated, protected and independent direct access to the HW by
multiple processes is implemented via the User Access Region (UAR)
mechanism.
The UAR is the part of the PCI address space that is mapped for direct
access to the HW from the CPU.
The UAR is comprised of multiple pages, each page containing registers
that control the HW operation.
The UAR mechanism is used to post execution or control requests to the
HW.
It is used by the HW to enforce protection and isolation between
different processes.
Add glue commands to allocate and free a UAR.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/common/mlx5/mlx5_glue.c | 25 +++++++++++++++++++++++++
 drivers/common/mlx5/mlx5_glue.h |  4 ++++
 2 files changed, 29 insertions(+)
diff --git a/drivers/common/mlx5/mlx5_glue.c b/drivers/common/mlx5/mlx5_glue.c
index e4eabdb..5691636 100644
--- a/drivers/common/mlx5/mlx5_glue.c
+++ b/drivers/common/mlx5/mlx5_glue.c
@@ -1137,6 +1137,29 @@
 #endif
 }
 
+static struct mlx5dv_devx_uar *
+mlx5_glue_devx_alloc_uar(struct ibv_context *context, uint32_t flags)
+{
+#ifdef HAVE_IBV_DEVX_OBJ
+	return mlx5dv_devx_alloc_uar(context, flags);
+#else
+	(void)context;
+	(void)flags;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static void
+mlx5_glue_devx_free_uar(struct mlx5dv_devx_uar *devx_uar)
+{
+#ifdef HAVE_IBV_DEVX_OBJ
+	mlx5dv_devx_free_uar(devx_uar);
+#else
+	(void)devx_uar;
+#endif
+}
+
 alignas(RTE_CACHE_LINE_SIZE)
 const struct mlx5_glue *mlx5_glue = &(const struct mlx5_glue){
 	.version = MLX5_GLUE_VERSION,
@@ -1242,4 +1265,6 @@
 	.devx_subscribe_devx_event = mlx5_glue_devx_subscribe_devx_event,
 	.devx_subscribe_devx_event_fd = mlx5_glue_devx_subscribe_devx_event_fd,
 	.devx_get_event = mlx5_glue_devx_get_event,
+	.devx_alloc_uar = mlx5_glue_devx_alloc_uar,
+	.devx_free_uar = mlx5_glue_devx_free_uar,
 };
diff --git a/drivers/common/mlx5/mlx5_glue.h b/drivers/common/mlx5/mlx5_glue.h
index 6fc00dd..7d9256e 100644
--- a/drivers/common/mlx5/mlx5_glue.h
+++ b/drivers/common/mlx5/mlx5_glue.h
@@ -66,6 +66,7 @@
 #ifndef HAVE_IBV_DEVX_OBJ
 struct mlx5dv_devx_obj;
 struct mlx5dv_devx_umem { uint32_t umem_id; };
+struct mlx5dv_devx_uar { void *reg_addr; void *base_addr; uint32_t page_id; };
 #endif
 
 #ifndef HAVE_IBV_DEVX_ASYNC
@@ -230,6 +231,9 @@ struct mlx5_glue {
 	int (*dv_destroy_flow)(void *flow);
 	int (*dv_destroy_flow_matcher)(void *matcher);
 	struct ibv_context *(*dv_open_device)(struct ibv_device *device);
+	struct mlx5dv_devx_uar *(*devx_alloc_uar)(struct ibv_context *context,
+						  uint32_t flags);
+	void (*devx_free_uar)(struct mlx5dv_devx_uar *devx_uar);
 	struct mlx5dv_devx_obj *(*devx_obj_create)
 					(struct ibv_context *ctx,
 					 const void *in, size_t inlen,
-- 
1.8.3.1
* [dpdk-dev] [PATCH v3 13/25] common/mlx5: add DevX command to create CQ
  2020-01-28 16:27   ` [dpdk-dev] [PATCH v3 00/25] Introduce mlx5 common library Matan Azrad
                       ` (11 preceding siblings ...)
  2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 12/25] common/mlx5: glue UAR allocation Matan Azrad
@ 2020-01-28 16:27     ` Matan Azrad
  2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 14/25] common/mlx5: glue VAR allocation Matan Azrad
                       ` (12 subsequent siblings)
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-28 16:27 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
The HW implements completion queues (CQs), to which it posts completion
reports when a work request completes.
CQs are used in both the Rx and Tx datapaths.
Add a DevX command to create a CQ.
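A caller has to express the CQ size as a log2 value, because
log_cq_size is a 5-bit field. A minimal sketch of preparing the
attributes, assuming hypothetical names (struct cq_attr_sketch mirrors
the fields of struct mlx5_devx_cq_attr from this patch; log2_ceil_u32()
and cq_attr_sketch_init() are illustrative helpers, not part of the
series):

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Same fields as struct mlx5_devx_cq_attr, under a hypothetical name. */
struct cq_attr_sketch {
	uint32_t q_umem_valid:1;
	uint32_t db_umem_valid:1;
	uint32_t use_first_only:1;
	uint32_t overrun_ignore:1;
	uint32_t log_cq_size:5;
	uint32_t log_page_size:5;
	uint32_t uar_page_id;
	uint32_t q_umem_id;
	uint64_t q_umem_offset;
	uint32_t db_umem_id;
	uint64_t db_umem_offset;
	uint32_t eqn;
	uint64_t db_addr;
};

/* Smallest log such that (1 << log) >= n, capped at 31. */
static uint32_t
log2_ceil_u32(uint32_t n)
{
	uint32_t log = 0;

	while (log < 31 && ((uint32_t)1 << log) < n)
		log++;
	return log;
}

/* Zero the attributes and set the size, EQ number and UAR page. */
static int
cq_attr_sketch_init(struct cq_attr_sketch *attr, uint32_t entries,
		    uint32_t eqn, uint32_t uar_page_id)
{
	if (entries == 0 || entries > ((uint32_t)1 << 31))
		return -EINVAL;
	memset(attr, 0, sizeof(*attr));
	attr->log_cq_size = log2_ceil_u32(entries); /* fits in 5 bits */
	attr->eqn = eqn;
	attr->uar_page_id = uar_page_id;
	return 0;
}
```

The umem and doorbell fields are left zeroed here; a real caller would
fill them before passing the structure to the create command.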
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/common/mlx5/mlx5_devx_cmds.c            | 57 ++++++++++++++++++++
 drivers/common/mlx5/mlx5_devx_cmds.h            | 19 +++++++
 drivers/common/mlx5/mlx5_prm.h                  | 71 +++++++++++++++++++++++++
 drivers/common/mlx5/rte_common_mlx5_version.map |  1 +
 4 files changed, 148 insertions(+)
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index 2197705..cdc041b 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -1093,3 +1093,60 @@ struct mlx5_devx_obj *
 #endif
 	return -ret;
 }
+
+/*
+ * Create CQ using DevX API.
+ *
+ * @param[in] ctx
+ *   ibv_context returned from mlx5dv_open_device.
+ * @param [in] attr
+ *   Pointer to CQ attributes structure.
+ *
+ * @return
+ *   The DevX object created, NULL otherwise and rte_errno is set.
+ */
+struct mlx5_devx_obj *
+mlx5_devx_cmd_create_cq(struct ibv_context *ctx, struct mlx5_devx_cq_attr *attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(create_cq_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(create_cq_out)] = {0};
+	struct mlx5_devx_obj *cq_obj = rte_zmalloc(__func__, sizeof(*cq_obj),
+						   0);
+	void *cqctx = MLX5_ADDR_OF(create_cq_in, in, cq_context);
+
+	if (!cq_obj) {
+		DRV_LOG(ERR, "Failed to allocate CQ object memory.");
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	MLX5_SET(create_cq_in, in, opcode, MLX5_CMD_OP_CREATE_CQ);
+	if (attr->db_umem_valid) {
+		MLX5_SET(cqc, cqctx, dbr_umem_valid, attr->db_umem_valid);
+		MLX5_SET(cqc, cqctx, dbr_umem_id, attr->db_umem_id);
+		MLX5_SET64(cqc, cqctx, dbr_addr, attr->db_umem_offset);
+	} else {
+		MLX5_SET64(cqc, cqctx, dbr_addr, attr->db_addr);
+	}
+	MLX5_SET(cqc, cqctx, cc, attr->use_first_only);
+	MLX5_SET(cqc, cqctx, oi, attr->overrun_ignore);
+	MLX5_SET(cqc, cqctx, log_cq_size, attr->log_cq_size);
+	MLX5_SET(cqc, cqctx, log_page_size, attr->log_page_size);
+	MLX5_SET(cqc, cqctx, c_eqn, attr->eqn);
+	MLX5_SET(cqc, cqctx, uar_page, attr->uar_page_id);
+	if (attr->q_umem_valid) {
+		MLX5_SET(create_cq_in, in, cq_umem_valid, attr->q_umem_valid);
+		MLX5_SET(create_cq_in, in, cq_umem_id, attr->q_umem_id);
+		MLX5_SET64(create_cq_in, in, cq_umem_offset,
+			   attr->q_umem_offset);
+	}
+	cq_obj->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in), out,
+						 sizeof(out));
+	if (!cq_obj->obj) {
+		rte_errno = errno;
+		DRV_LOG(ERR, "Failed to create CQ using DevX errno=%d.", errno);
+		rte_free(cq_obj);
+		return NULL;
+	}
+	cq_obj->id = MLX5_GET(create_cq_out, out, cqn);
+	return cq_obj;
+}
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index c76c172..581658b 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -233,6 +233,23 @@ struct mlx5_devx_modify_sq_attr {
 };
 
 
+/* CQ attributes structure, used by CQ operations. */
+struct mlx5_devx_cq_attr {
+	uint32_t q_umem_valid:1;
+	uint32_t db_umem_valid:1;
+	uint32_t use_first_only:1;
+	uint32_t overrun_ignore:1;
+	uint32_t log_cq_size:5;
+	uint32_t log_page_size:5;
+	uint32_t uar_page_id;
+	uint32_t q_umem_id;
+	uint64_t q_umem_offset;
+	uint32_t db_umem_id;
+	uint64_t db_umem_offset;
+	uint32_t eqn;
+	uint64_t db_addr;
+};
+
 /* mlx5_devx_cmds.c */
 
 struct mlx5_devx_obj *mlx5_devx_cmd_flow_counter_alloc(struct ibv_context *ctx,
@@ -269,4 +286,6 @@ struct mlx5_devx_obj *mlx5_devx_cmd_create_tis(struct ibv_context *ctx,
 struct mlx5_devx_obj *mlx5_devx_cmd_create_td(struct ibv_context *ctx);
 int mlx5_devx_cmd_flow_dump(void *fdb_domain, void *rx_domain, void *tx_domain,
 			    FILE *file);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_cq(struct ibv_context *ctx,
+					      struct mlx5_devx_cq_attr *attr);
 #endif /* RTE_PMD_MLX5_DEVX_CMDS_H_ */
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index db15bb6..a4082b9 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -710,6 +710,7 @@ enum {
 enum {
 	MLX5_CMD_OP_QUERY_HCA_CAP = 0x100,
 	MLX5_CMD_OP_CREATE_MKEY = 0x200,
+	MLX5_CMD_OP_CREATE_CQ = 0x400,
 	MLX5_CMD_OP_QUERY_NIC_VPORT_CONTEXT = 0x754,
 	MLX5_CMD_OP_ALLOC_TRANSPORT_DOMAIN = 0x816,
 	MLX5_CMD_OP_CREATE_TIR = 0x900,
@@ -1846,6 +1847,76 @@ struct mlx5_ifc_flow_meter_parameters_bits {
 	u8         reserved_at_8[0x60];		// 14h-1Ch
 };
 
+struct mlx5_ifc_cqc_bits {
+	u8 status[0x4];
+	u8 as_notify[0x1];
+	u8 initiator_src_dct[0x1];
+	u8 dbr_umem_valid[0x1];
+	u8 reserved_at_7[0x1];
+	u8 cqe_sz[0x3];
+	u8 cc[0x1];
+	u8 reserved_at_c[0x1];
+	u8 scqe_break_moderation_en[0x1];
+	u8 oi[0x1];
+	u8 cq_period_mode[0x2];
+	u8 cqe_comp_en[0x1];
+	u8 mini_cqe_res_format[0x2];
+	u8 st[0x4];
+	u8 reserved_at_18[0x8];
+	u8 dbr_umem_id[0x20];
+	u8 reserved_at_40[0x14];
+	u8 page_offset[0x6];
+	u8 reserved_at_5a[0x6];
+	u8 reserved_at_60[0x3];
+	u8 log_cq_size[0x5];
+	u8 uar_page[0x18];
+	u8 reserved_at_80[0x4];
+	u8 cq_period[0xc];
+	u8 cq_max_count[0x10];
+	u8 reserved_at_a0[0x18];
+	u8 c_eqn[0x8];
+	u8 reserved_at_c0[0x3];
+	u8 log_page_size[0x5];
+	u8 reserved_at_c8[0x18];
+	u8 reserved_at_e0[0x20];
+	u8 reserved_at_100[0x8];
+	u8 last_notified_index[0x18];
+	u8 reserved_at_120[0x8];
+	u8 last_solicit_index[0x18];
+	u8 reserved_at_140[0x8];
+	u8 consumer_counter[0x18];
+	u8 reserved_at_160[0x8];
+	u8 producer_counter[0x18];
+	u8 local_partition_id[0xc];
+	u8 process_id[0x14];
+	u8 reserved_at_1A0[0x20];
+	u8 dbr_addr[0x40];
+};
+
+struct mlx5_ifc_create_cq_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x8];
+	u8 cqn[0x18];
+	u8 reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_create_cq_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0x40];
+	struct mlx5_ifc_cqc_bits cq_context;
+	u8 cq_umem_offset[0x40];
+	u8 cq_umem_id[0x20];
+	u8 cq_umem_valid[0x1];
+	u8 reserved_at_2e1[0x1f];
+	u8 reserved_at_300[0x580];
+	u8 pas[];
+};
+
 /* CQE format mask. */
 #define MLX5E_CQE_FORMAT_MASK 0xc
 
diff --git a/drivers/common/mlx5/rte_common_mlx5_version.map b/drivers/common/mlx5/rte_common_mlx5_version.map
index 0c01172..c6a203d 100644
--- a/drivers/common/mlx5/rte_common_mlx5_version.map
+++ b/drivers/common/mlx5/rte_common_mlx5_version.map
@@ -1,6 +1,7 @@
 DPDK_20.02 {
 	global:
 
+	mlx5_devx_cmd_create_cq;
 	mlx5_devx_cmd_create_rq;
 	mlx5_devx_cmd_create_rqt;
 	mlx5_devx_cmd_create_sq;
-- 
1.8.3.1
* [dpdk-dev] [PATCH v3 14/25] common/mlx5: glue VAR allocation
  2020-01-28 16:27   ` [dpdk-dev] [PATCH v3 00/25] Introduce mlx5 common library Matan Azrad
                       ` (12 preceding siblings ...)
  2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 13/25] common/mlx5: add DevX command to create CQ Matan Azrad
@ 2020-01-28 16:27     ` Matan Azrad
  2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 15/25] common/mlx5: add DevX virtq commands Matan Azrad
                       ` (11 subsequent siblings)
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-28 16:27 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
The virtio access region (VAR) is the UAR that is allocated for virtio
emulation access.
Add rdma-core operations to allocate and free a VAR.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/common/mlx5/Makefile    |  5 +++++
 drivers/common/mlx5/meson.build |  1 +
 drivers/common/mlx5/mlx5_glue.c | 26 ++++++++++++++++++++++++++
 drivers/common/mlx5/mlx5_glue.h |  8 ++++++++
 4 files changed, 40 insertions(+)
diff --git a/drivers/common/mlx5/Makefile b/drivers/common/mlx5/Makefile
index 7110231..d1de3ec 100644
--- a/drivers/common/mlx5/Makefile
+++ b/drivers/common/mlx5/Makefile
@@ -174,6 +174,11 @@ mlx5_autoconf.h.new: $(RTE_SDK)/buildtools/auto-config-h.sh
 		enum MLX5_MMAP_GET_NC_PAGES_CMD \
 		$(AUTOCONF_OUTPUT)
 	$Q sh -- '$<' '$@' \
+		HAVE_IBV_VAR \
+		infiniband/mlx5dv.h \
+		func mlx5dv_alloc_var \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
 		HAVE_ETHTOOL_LINK_MODE_25G \
 		/usr/include/linux/ethtool.h \
 		enum ETHTOOL_LINK_MODE_25000baseCR_Full_BIT \
diff --git a/drivers/common/mlx5/meson.build b/drivers/common/mlx5/meson.build
index 76ca7d7..3e130cb 100644
--- a/drivers/common/mlx5/meson.build
+++ b/drivers/common/mlx5/meson.build
@@ -120,6 +120,7 @@ if build
 		'MLX5DV_DR_DOMAIN_TYPE_FDB' ],
 		[ 'HAVE_MLX5DV_DR_VLAN', 'infiniband/mlx5dv.h',
 		'mlx5dv_dr_action_create_push_vlan' ],
+		[ 'HAVE_IBV_VAR', 'infiniband/mlx5dv.h', 'mlx5dv_alloc_var' ],
 		[ 'HAVE_SUPPORTED_40000baseKR4_Full', 'linux/ethtool.h',
 		'SUPPORTED_40000baseKR4_Full' ],
 		[ 'HAVE_SUPPORTED_40000baseCR4_Full', 'linux/ethtool.h',
diff --git a/drivers/common/mlx5/mlx5_glue.c b/drivers/common/mlx5/mlx5_glue.c
index 5691636..27cf33c 100644
--- a/drivers/common/mlx5/mlx5_glue.c
+++ b/drivers/common/mlx5/mlx5_glue.c
@@ -1160,6 +1160,30 @@
 #endif
 }
 
+static struct mlx5dv_var *
+mlx5_glue_dv_alloc_var(struct ibv_context *context, uint32_t flags)
+{
+#ifdef HAVE_IBV_VAR
+	return mlx5dv_alloc_var(context, flags);
+#else
+	(void)context;
+	(void)flags;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static void
+mlx5_glue_dv_free_var(struct mlx5dv_var *var)
+{
+#ifdef HAVE_IBV_VAR
+	mlx5dv_free_var(var);
+#else
+	(void)var;
+	errno = ENOTSUP;
+#endif
+}
+
 alignas(RTE_CACHE_LINE_SIZE)
 const struct mlx5_glue *mlx5_glue = &(const struct mlx5_glue){
 	.version = MLX5_GLUE_VERSION,
@@ -1267,4 +1291,6 @@
 	.devx_get_event = mlx5_glue_devx_get_event,
 	.devx_alloc_uar = mlx5_glue_devx_alloc_uar,
 	.devx_free_uar = mlx5_glue_devx_free_uar,
+	.dv_alloc_var = mlx5_glue_dv_alloc_var,
+	.dv_free_var = mlx5_glue_dv_free_var,
 };
diff --git a/drivers/common/mlx5/mlx5_glue.h b/drivers/common/mlx5/mlx5_glue.h
index 7d9256e..6238b43 100644
--- a/drivers/common/mlx5/mlx5_glue.h
+++ b/drivers/common/mlx5/mlx5_glue.h
@@ -93,6 +93,11 @@
 #define MLX5DV_DEVX_CREATE_EVENT_CHANNEL_FLAGS_OMIT_EV_DATA 1
 #endif
 
+#ifndef HAVE_IBV_VAR
+struct mlx5dv_var { uint32_t page_id; uint32_t length; off_t mmap_off;
+			uint64_t comp_mask; };
+#endif
+
 /* LIB_GLUE_VERSION must be updated every time this structure is modified. */
 struct mlx5_glue {
 	const char *version;
@@ -231,6 +236,9 @@ struct mlx5_glue {
 	int (*dv_destroy_flow)(void *flow);
 	int (*dv_destroy_flow_matcher)(void *matcher);
 	struct ibv_context *(*dv_open_device)(struct ibv_device *device);
+	struct mlx5dv_var *(*dv_alloc_var)(struct ibv_context *context,
+					   uint32_t flags);
+	void (*dv_free_var)(struct mlx5dv_var *var);
 	struct mlx5dv_devx_uar *(*devx_alloc_uar)(struct ibv_context *context,
 						  uint32_t flags);
 	void (*devx_free_uar)(struct mlx5dv_devx_uar *devx_uar);
-- 
1.8.3.1
* [dpdk-dev] [PATCH v3 15/25] common/mlx5: add DevX virtq commands
  2020-01-28 16:27   ` [dpdk-dev] [PATCH v3 00/25] Introduce mlx5 common library Matan Azrad
                       ` (13 preceding siblings ...)
  2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 14/25] common/mlx5: glue VAR allocation Matan Azrad
@ 2020-01-28 16:27     ` Matan Azrad
  2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 16/25] common/mlx5: add support for DevX QP operations Matan Azrad
                       ` (10 subsequent siblings)
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-28 16:27 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
Virtio emulation offload allows SW to offload the I/O operations of a
virtio virtqueue to the device, improving performance for its users.
SW supplies all the relevant virtqueue information (type, size,
memory location, doorbell information, etc.), and the device can then
offload the I/O operations of this queue according to its device type
characteristics.
Some virtio features, for example TSO and checksum, can be supported
depending on the device capabilities.
Add virtio queue create, modify and query DevX commands.
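The MLX5_SET16 macro added to mlx5_prm.h in this patch does a
read-modify-write of one big-endian 16-bit word. The sketch below shows
the same idea without the DPDK headers. The helper name set_be16_field
and its bit-offset/bit-size parameters are illustrative assumptions, not
part of the patch, and it uses byte-wise access instead of
rte_cpu_to_be_16().

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the patch): set a bit field that lives
 * inside one big-endian 16-bit word of a PRM-style structure.
 *
 * bit_off: offset of the field from the start of the structure, in bits,
 *          counted from the most significant bit (as in mlx5_ifc_*_bits).
 * bit_sz:  field width in bits, 1..16; the field must not cross a 16-bit
 *          word boundary.
 */
static void
set_be16_field(uint8_t *base, unsigned int bit_off, unsigned int bit_sz,
	       uint16_t v)
{
	unsigned int word = bit_off / 16;
	unsigned int shift;
	uint16_t mask;
	uint16_t cur;

	assert(bit_sz >= 1 && bit_sz <= 16);
	assert((bit_off & 0xf) + bit_sz <= 16);
	/* Distance of the field's least significant bit from bit 0. */
	shift = 16 - bit_sz - (bit_off & 0xf);
	mask = (uint16_t)((1u << bit_sz) - 1);
	/* Load the current word as big-endian. */
	cur = (uint16_t)((base[2 * word] << 8) | base[2 * word + 1]);
	/* Clear the field, then merge the new value; other bits are kept. */
	cur = (uint16_t)((cur & ~(mask << shift)) | ((v & mask) << shift));
	/* Store the word back as big-endian. */
	base[2 * word] = (uint8_t)(cur >> 8);
	base[2 * word + 1] = (uint8_t)(cur & 0xff);
}
```

With the virtio_q layout from this patch, event_mode would be
(bit_off 13, bit_sz 3) and queue_index would be (bit_off 16, bit_sz 16).
Neighbouring fields in the same 16-bit word are left untouched.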
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/common/mlx5/mlx5_devx_cmds.c            | 199 +++++++++++++++++++++---
 drivers/common/mlx5/mlx5_devx_cmds.h            |  48 +++++-
 drivers/common/mlx5/mlx5_prm.h                  | 117 ++++++++++++++
 drivers/common/mlx5/rte_common_mlx5_version.map |   3 +
 4 files changed, 343 insertions(+), 24 deletions(-)
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index cdc041b..2425513 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -377,24 +377,18 @@ struct mlx5_devx_obj *
 		vdpa_attr->max_num_virtio_queues =
 			MLX5_GET(virtio_emulation_cap, hcattr,
 				 max_num_virtio_queues);
-		vdpa_attr->umem_1_buffer_param_a =
-			MLX5_GET(virtio_emulation_cap, hcattr,
-				 umem_1_buffer_param_a);
-		vdpa_attr->umem_1_buffer_param_b =
-			MLX5_GET(virtio_emulation_cap, hcattr,
-				 umem_1_buffer_param_b);
-		vdpa_attr->umem_2_buffer_param_a =
-			MLX5_GET(virtio_emulation_cap, hcattr,
-				 umem_2_buffer_param_a);
-		vdpa_attr->umem_2_buffer_param_b =
-			MLX5_GET(virtio_emulation_cap, hcattr,
-				 umem_2_buffer_param_a);
-		vdpa_attr->umem_3_buffer_param_a =
-			MLX5_GET(virtio_emulation_cap, hcattr,
-				 umem_3_buffer_param_a);
-		vdpa_attr->umem_3_buffer_param_b =
-			MLX5_GET(virtio_emulation_cap, hcattr,
-				 umem_3_buffer_param_b);
+		vdpa_attr->umems[0].a = MLX5_GET(virtio_emulation_cap, hcattr,
+						 umem_1_buffer_param_a);
+		vdpa_attr->umems[0].b = MLX5_GET(virtio_emulation_cap, hcattr,
+						 umem_1_buffer_param_b);
+		vdpa_attr->umems[1].a = MLX5_GET(virtio_emulation_cap, hcattr,
+						 umem_2_buffer_param_a);
+		vdpa_attr->umems[1].b = MLX5_GET(virtio_emulation_cap, hcattr,
+						 umem_2_buffer_param_b);
+		vdpa_attr->umems[2].a = MLX5_GET(virtio_emulation_cap, hcattr,
+						 umem_3_buffer_param_a);
+		vdpa_attr->umems[2].b = MLX5_GET(virtio_emulation_cap, hcattr,
+						 umem_3_buffer_param_b);
 	}
 }
 
@@ -1150,3 +1144,172 @@ struct mlx5_devx_obj *
 	cq_obj->id = MLX5_GET(create_cq_out, out, cqn);
 	return cq_obj;
 }
+
+/**
+ * Create VIRTQ using DevX API.
+ *
+ * @param[in] ctx
+ *   ibv_context returned from mlx5dv_open_device.
+ * @param [in] attr
+ *   Pointer to VIRTQ attributes structure.
+ *
+ * @return
+ *   The DevX object created, NULL otherwise and rte_errno is set.
+ */
+struct mlx5_devx_obj *
+mlx5_devx_cmd_create_virtq(struct ibv_context *ctx,
+			   struct mlx5_devx_virtq_attr *attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(create_virtq_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(general_obj_out_cmd_hdr)] = {0};
+	struct mlx5_devx_obj *virtq_obj = rte_zmalloc(__func__,
+						     sizeof(*virtq_obj), 0);
+	void *virtq = MLX5_ADDR_OF(create_virtq_in, in, virtq);
+	void *hdr = MLX5_ADDR_OF(create_virtq_in, in, hdr);
+	void *virtctx = MLX5_ADDR_OF(virtio_net_q, virtq, virtio_q_context);
+
+	if (!virtq_obj) {
+		DRV_LOG(ERR, "Failed to allocate virtq data.");
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	MLX5_SET(general_obj_in_cmd_hdr, hdr, opcode,
+		 MLX5_CMD_OP_CREATE_GENERAL_OBJECT);
+	MLX5_SET(general_obj_in_cmd_hdr, hdr, obj_type,
+		 MLX5_GENERAL_OBJ_TYPE_VIRTQ);
+	MLX5_SET16(virtio_net_q, virtq, hw_available_index,
+		   attr->hw_available_index);
+	MLX5_SET16(virtio_net_q, virtq, hw_used_index, attr->hw_used_index);
+	MLX5_SET16(virtio_net_q, virtq, tso_ipv4, attr->tso_ipv4);
+	MLX5_SET16(virtio_net_q, virtq, tso_ipv6, attr->tso_ipv6);
+	MLX5_SET16(virtio_net_q, virtq, tx_csum, attr->tx_csum);
+	MLX5_SET16(virtio_net_q, virtq, rx_csum, attr->rx_csum);
+	MLX5_SET16(virtio_q, virtctx, virtio_version_1_0,
+		   attr->virtio_version_1_0);
+	MLX5_SET16(virtio_q, virtctx, event_mode, attr->event_mode);
+	MLX5_SET(virtio_q, virtctx, event_qpn_or_msix, attr->qp_id);
+	MLX5_SET64(virtio_q, virtctx, desc_addr, attr->desc_addr);
+	MLX5_SET64(virtio_q, virtctx, used_addr, attr->used_addr);
+	MLX5_SET64(virtio_q, virtctx, available_addr, attr->available_addr);
+	MLX5_SET16(virtio_q, virtctx, queue_index, attr->queue_index);
+	MLX5_SET16(virtio_q, virtctx, queue_size, attr->q_size);
+	MLX5_SET(virtio_q, virtctx, virtio_q_mkey, attr->mkey);
+	MLX5_SET(virtio_q, virtctx, umem_1_id, attr->umems[0].id);
+	MLX5_SET(virtio_q, virtctx, umem_1_size, attr->umems[0].size);
+	MLX5_SET64(virtio_q, virtctx, umem_1_offset, attr->umems[0].offset);
+	MLX5_SET(virtio_q, virtctx, umem_2_id, attr->umems[1].id);
+	MLX5_SET(virtio_q, virtctx, umem_2_size, attr->umems[1].size);
+	MLX5_SET64(virtio_q, virtctx, umem_2_offset, attr->umems[1].offset);
+	MLX5_SET(virtio_q, virtctx, umem_3_id, attr->umems[2].id);
+	MLX5_SET(virtio_q, virtctx, umem_3_size, attr->umems[2].size);
+	MLX5_SET64(virtio_q, virtctx, umem_3_offset, attr->umems[2].offset);
+	MLX5_SET(virtio_net_q, virtq, tisn_or_qpn, attr->tis_id);
+	virtq_obj->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in), out,
+						    sizeof(out));
+	if (!virtq_obj->obj) {
+		rte_errno = errno;
+		DRV_LOG(ERR, "Failed to create VIRTQ Obj using DevX.");
+		rte_free(virtq_obj);
+		return NULL;
+	}
+	virtq_obj->id = MLX5_GET(general_obj_out_cmd_hdr, out, obj_id);
+	return virtq_obj;
+}
+
+/**
+ * Modify VIRTQ using DevX API.
+ *
+ * @param[in] virtq_obj
+ *   Pointer to virtq object structure.
+ * @param [in] attr
+ *   Pointer to modify virtq attributes structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_devx_cmd_modify_virtq(struct mlx5_devx_obj *virtq_obj,
+			   struct mlx5_devx_virtq_attr *attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(create_virtq_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(general_obj_out_cmd_hdr)] = {0};
+	void *virtq = MLX5_ADDR_OF(create_virtq_in, in, virtq);
+	void *hdr = MLX5_ADDR_OF(create_virtq_in, in, hdr);
+	void *virtctx = MLX5_ADDR_OF(virtio_net_q, virtq, virtio_q_context);
+	int ret;
+
+	MLX5_SET(general_obj_in_cmd_hdr, hdr, opcode,
+		 MLX5_CMD_OP_MODIFY_GENERAL_OBJECT);
+	MLX5_SET(general_obj_in_cmd_hdr, hdr, obj_type,
+		 MLX5_GENERAL_OBJ_TYPE_VIRTQ);
+	MLX5_SET(general_obj_in_cmd_hdr, hdr, obj_id, virtq_obj->id);
+	MLX5_SET64(virtio_net_q, virtq, modify_field_select, attr->type);
+	MLX5_SET16(virtio_q, virtctx, queue_index, attr->queue_index);
+	switch (attr->type) {
+	case MLX5_VIRTQ_MODIFY_TYPE_STATE:
+		MLX5_SET16(virtio_net_q, virtq, state, attr->state);
+		break;
+	case MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS:
+		MLX5_SET(virtio_net_q, virtq, dirty_bitmap_mkey,
+			 attr->dirty_bitmap_mkey);
+		MLX5_SET64(virtio_net_q, virtq, dirty_bitmap_addr,
+			 attr->dirty_bitmap_addr);
+		MLX5_SET(virtio_net_q, virtq, dirty_bitmap_size,
+			 attr->dirty_bitmap_size);
+		break;
+	case MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE:
+		MLX5_SET(virtio_net_q, virtq, dirty_bitmap_dump_enable,
+			 attr->dirty_bitmap_dump_enable);
+		break;
+	default:
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	ret = mlx5_glue->devx_obj_modify(virtq_obj->obj, in, sizeof(in),
+					 out, sizeof(out));
+	if (ret) {
+		DRV_LOG(ERR, "Failed to modify VIRTQ using DevX.");
+		rte_errno = errno;
+		return -errno;
+	}
+	return ret;
+}
+
+/**
+ * Query VIRTQ using DevX API.
+ *
+ * @param[in] virtq_obj
+ *   Pointer to virtq object structure.
+ * @param [in/out] attr
+ *   Pointer to virtq attributes structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_devx_cmd_query_virtq(struct mlx5_devx_obj *virtq_obj,
+			   struct mlx5_devx_virtq_attr *attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(general_obj_in_cmd_hdr)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(query_virtq_out)] = {0};
+	void *hdr = MLX5_ADDR_OF(query_virtq_out, in, hdr);
+	void *virtq = MLX5_ADDR_OF(query_virtq_out, out, virtq);
+	int ret;
+
+	MLX5_SET(general_obj_in_cmd_hdr, hdr, opcode,
+		 MLX5_CMD_OP_QUERY_GENERAL_OBJECT);
+	MLX5_SET(general_obj_in_cmd_hdr, hdr, obj_type,
+		 MLX5_GENERAL_OBJ_TYPE_VIRTQ);
+	MLX5_SET(general_obj_in_cmd_hdr, hdr, obj_id, virtq_obj->id);
+	ret = mlx5_glue->devx_obj_query(virtq_obj->obj, in, sizeof(in),
+					 out, sizeof(out));
+	if (ret) {
+		DRV_LOG(ERR, "Failed to query VIRTQ using DevX.");
+		rte_errno = errno;
+		return -errno;
+	}
+	attr->hw_available_index = MLX5_GET16(virtio_net_q, virtq,
+					      hw_available_index);
+	attr->hw_used_index = MLX5_GET16(virtio_net_q, virtq, hw_used_index);
+	return ret;
+}
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index 581658b..1631c08 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -64,12 +64,10 @@ struct mlx5_hca_vdpa_attr {
 	uint32_t log_doorbell_stride:5;
 	uint32_t log_doorbell_bar_size:5;
 	uint32_t max_num_virtio_queues;
-	uint32_t umem_1_buffer_param_a;
-	uint32_t umem_1_buffer_param_b;
-	uint32_t umem_2_buffer_param_a;
-	uint32_t umem_2_buffer_param_b;
-	uint32_t umem_3_buffer_param_a;
-	uint32_t umem_3_buffer_param_b;
+	struct {
+		uint32_t a;
+		uint32_t b;
+	} umems[3];
 	uint64_t doorbell_bar_offset;
 };
 
@@ -250,6 +248,37 @@ struct mlx5_devx_cq_attr {
 	uint64_t db_addr;
 };
 
+/* Virtq attributes structure, used by VIRTQ operations. */
+struct mlx5_devx_virtq_attr {
+	uint16_t hw_available_index;
+	uint16_t hw_used_index;
+	uint16_t q_size;
+	uint32_t virtio_version_1_0:1;
+	uint32_t tso_ipv4:1;
+	uint32_t tso_ipv6:1;
+	uint32_t tx_csum:1;
+	uint32_t rx_csum:1;
+	uint32_t event_mode:3;
+	uint32_t state:4;
+	uint32_t dirty_bitmap_dump_enable:1;
+	uint32_t dirty_bitmap_mkey;
+	uint32_t dirty_bitmap_size;
+	uint32_t mkey;
+	uint32_t qp_id;
+	uint32_t queue_index;
+	uint32_t tis_id;
+	uint64_t dirty_bitmap_addr;
+	uint64_t type;
+	uint64_t desc_addr;
+	uint64_t used_addr;
+	uint64_t available_addr;
+	struct {
+		uint32_t id;
+		uint32_t size;
+		uint64_t offset;
+	} umems[3];
+};
+
 /* mlx5_devx_cmds.c */
 
 struct mlx5_devx_obj *mlx5_devx_cmd_flow_counter_alloc(struct ibv_context *ctx,
@@ -288,4 +317,11 @@ int mlx5_devx_cmd_flow_dump(void *fdb_domain, void *rx_domain, void *tx_domain,
 			    FILE *file);
 struct mlx5_devx_obj *mlx5_devx_cmd_create_cq(struct ibv_context *ctx,
 					      struct mlx5_devx_cq_attr *attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_virtq(struct ibv_context *ctx,
+					     struct mlx5_devx_virtq_attr *attr);
+int mlx5_devx_cmd_modify_virtq(struct mlx5_devx_obj *virtq_obj,
+			       struct mlx5_devx_virtq_attr *attr);
+int mlx5_devx_cmd_query_virtq(struct mlx5_devx_obj *virtq_obj,
+			      struct mlx5_devx_virtq_attr *attr);
+
 #endif /* RTE_PMD_MLX5_DEVX_CMDS_H_ */
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index a4082b9..4b8a34c 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -527,6 +527,8 @@ struct mlx5_modification_cmd {
 #define __mlx5_16_bit_off(typ, fld) (16 - __mlx5_bit_sz(typ, fld) - \
 				    (__mlx5_bit_off(typ, fld) & 0xf))
 #define __mlx5_mask16(typ, fld) ((u16)((1ull << __mlx5_bit_sz(typ, fld)) - 1))
+#define __mlx5_16_mask(typ, fld) (__mlx5_mask16(typ, fld) << \
+				  __mlx5_16_bit_off(typ, fld))
 #define MLX5_ST_SZ_BYTES(typ) (sizeof(struct mlx5_ifc_##typ##_bits) / 8)
 #define MLX5_ST_SZ_DW(typ) (sizeof(struct mlx5_ifc_##typ##_bits) / 32)
 #define MLX5_BYTE_OFF(typ, fld) (__mlx5_bit_off(typ, fld) / 8)
@@ -551,6 +553,17 @@ struct mlx5_modification_cmd {
 			rte_cpu_to_be_64(v); \
 	} while (0)
 
+#define MLX5_SET16(typ, p, fld, v) \
+	do { \
+		u16 _v = v; \
+		*((__be16 *)(p) + __mlx5_16_off(typ, fld)) = \
+		rte_cpu_to_be_16((rte_be_to_cpu_16(*((__be16 *)(p) + \
+				  __mlx5_16_off(typ, fld))) & \
+				  (~__mlx5_16_mask(typ, fld))) | \
+				 (((_v) & __mlx5_mask16(typ, fld)) << \
+				  __mlx5_16_bit_off(typ, fld))); \
+	} while (0)
+
 #define MLX5_GET(typ, p, fld) \
 	((rte_be_to_cpu_32(*((__be32 *)(p) +\
 	__mlx5_dw_off(typ, fld))) >> __mlx5_dw_bit_off(typ, fld)) & \
@@ -723,6 +736,9 @@ enum {
 	MLX5_CMD_OP_CREATE_RQT = 0x916,
 	MLX5_CMD_OP_ALLOC_FLOW_COUNTER = 0x939,
 	MLX5_CMD_OP_QUERY_FLOW_COUNTER = 0x93b,
+	MLX5_CMD_OP_CREATE_GENERAL_OBJECT = 0xa00,
+	MLX5_CMD_OP_MODIFY_GENERAL_OBJECT = 0xa01,
+	MLX5_CMD_OP_QUERY_GENERAL_OBJECT = 0xa02,
 };
 
 enum {
@@ -1691,6 +1707,11 @@ struct mlx5_ifc_create_tir_in_bits {
 	struct mlx5_ifc_tirc_bits ctx;
 };
 
+enum {
+	MLX5_INLINE_Q_TYPE_RQ = 0x0,
+	MLX5_INLINE_Q_TYPE_VIRTQ = 0x1,
+};
+
 struct mlx5_ifc_rq_num_bits {
 	u8 reserved_at_0[0x8];
 	u8 rq_num[0x18];
@@ -1917,6 +1938,102 @@ struct mlx5_ifc_create_cq_in_bits {
 	u8 pas[];
 };
 
+enum {
+	MLX5_GENERAL_OBJ_TYPE_VIRTQ = 0x000d,
+};
+
+struct mlx5_ifc_general_obj_in_cmd_hdr_bits {
+	u8 opcode[0x10];
+	u8 reserved_at_10[0x20];
+	u8 obj_type[0x10];
+	u8 obj_id[0x20];
+	u8 reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_general_obj_out_cmd_hdr_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 obj_id[0x20];
+	u8 reserved_at_60[0x20];
+};
+
+enum {
+	MLX5_VIRTQ_STATE_INIT = 0,
+	MLX5_VIRTQ_STATE_RDY = 1,
+	MLX5_VIRTQ_STATE_SUSPEND = 2,
+	MLX5_VIRTQ_STATE_ERROR = 3,
+};
+
+enum {
+	MLX5_VIRTQ_MODIFY_TYPE_STATE = (1UL << 0),
+	MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS = (1UL << 3),
+	MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE = (1UL << 4),
+};
+
+struct mlx5_ifc_virtio_q_bits {
+	u8 virtio_q_type[0x8];
+	u8 reserved_at_8[0x5];
+	u8 event_mode[0x3];
+	u8 queue_index[0x10];
+	u8 full_emulation[0x1];
+	u8 virtio_version_1_0[0x1];
+	u8 reserved_at_22[0x2];
+	u8 offload_type[0x4];
+	u8 event_qpn_or_msix[0x18];
+	u8 doorbell_stride_idx[0x10];
+	u8 queue_size[0x10];
+	u8 device_emulation_id[0x20];
+	u8 desc_addr[0x40];
+	u8 used_addr[0x40];
+	u8 available_addr[0x40];
+	u8 virtio_q_mkey[0x20];
+	u8 reserved_at_160[0x20];
+	u8 umem_1_id[0x20];
+	u8 umem_1_size[0x20];
+	u8 umem_1_offset[0x40];
+	u8 umem_2_id[0x20];
+	u8 umem_2_size[0x20];
+	u8 umem_2_offset[0x40];
+	u8 umem_3_id[0x20];
+	u8 umem_3_size[0x20];
+	u8 umem_3_offset[0x40];
+	u8 reserved_at_300[0x100];
+};
+
+struct mlx5_ifc_virtio_net_q_bits {
+	u8 modify_field_select[0x40];
+	u8 reserved_at_40[0x40];
+	u8 tso_ipv4[0x1];
+	u8 tso_ipv6[0x1];
+	u8 tx_csum[0x1];
+	u8 rx_csum[0x1];
+	u8 reserved_at_84[0x6];
+	u8 dirty_bitmap_dump_enable[0x1];
+	u8 vhost_log_page[0x5];
+	u8 reserved_at_90[0xc];
+	u8 state[0x4];
+	u8 error_type[0x8];
+	u8 tisn_or_qpn[0x18];
+	u8 dirty_bitmap_mkey[0x20];
+	u8 dirty_bitmap_size[0x20];
+	u8 dirty_bitmap_addr[0x40];
+	u8 hw_available_index[0x10];
+	u8 hw_used_index[0x10];
+	u8 reserved_at_160[0xa0];
+	struct mlx5_ifc_virtio_q_bits virtio_q_context;
+};
+
+struct mlx5_ifc_create_virtq_in_bits {
+	struct mlx5_ifc_general_obj_in_cmd_hdr_bits hdr;
+	struct mlx5_ifc_virtio_net_q_bits virtq;
+};
+
+struct mlx5_ifc_query_virtq_out_bits {
+	struct mlx5_ifc_general_obj_in_cmd_hdr_bits hdr;
+	struct mlx5_ifc_virtio_net_q_bits virtq;
+};
+
 /* CQE format mask. */
 #define MLX5E_CQE_FORMAT_MASK 0xc
 
diff --git a/drivers/common/mlx5/rte_common_mlx5_version.map b/drivers/common/mlx5/rte_common_mlx5_version.map
index c6a203d..f3082ce 100644
--- a/drivers/common/mlx5/rte_common_mlx5_version.map
+++ b/drivers/common/mlx5/rte_common_mlx5_version.map
@@ -8,6 +8,7 @@ DPDK_20.02 {
 	mlx5_devx_cmd_create_tir;
 	mlx5_devx_cmd_create_td;
 	mlx5_devx_cmd_create_tis;
+	mlx5_devx_cmd_create_virtq;
 	mlx5_devx_cmd_destroy;
 	mlx5_devx_cmd_flow_counter_alloc;
 	mlx5_devx_cmd_flow_counter_query;
@@ -15,8 +16,10 @@ DPDK_20.02 {
 	mlx5_devx_cmd_mkey_create;
 	mlx5_devx_cmd_modify_rq;
 	mlx5_devx_cmd_modify_sq;
+	mlx5_devx_cmd_modify_virtq;
 	mlx5_devx_cmd_qp_query_tis_td;
 	mlx5_devx_cmd_query_hca_attr;
+	mlx5_devx_cmd_query_virtq;
 	mlx5_devx_get_out_command_status;
 
 	mlx5_dev_to_pci_addr;
-- 
1.8.3.1
* [dpdk-dev] [PATCH v3 16/25] common/mlx5: add support for DevX QP operations
  2020-01-28 16:27   ` [dpdk-dev] [PATCH v3 00/25] Introduce mlx5 common library Matan Azrad
                       ` (14 preceding siblings ...)
  2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 15/25] common/mlx5: add DevX virtq commands Matan Azrad
@ 2020-01-28 16:27     ` Matan Azrad
  2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 17/25] common/mlx5: allow type configuration for DevX RQT Matan Azrad
                       ` (9 subsequent siblings)
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-28 16:27 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
QP creation is needed for vDPA virtq support.
Add two DevX commands: one to create a QP and one to modify the QP state.
Only RC QPs in force-loopback address mode are supported.
This way, packets can be sent to other internal destinations in
the NIC, for example other QPs or virtqs.
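A caller would typically walk a new QP through the three transitions
handled by mlx5_devx_cmd_modify_qp_state(): RST2INIT, INIT2RTR and
RTR2RTS. The sketch below shows that ordering under stated assumptions.
The names qp_bring_up, qp_modify_fn, qp_trace and trace_modify are
hypothetical. The callback has the same shape as the new command, and a
recording stub replaces it so the example runs without hardware.

```c
#include <stdint.h>

/* Opcode values as defined in mlx5_prm.h by this patch. */
enum {
	QP_OP_RST2INIT = 0x502,
	QP_OP_INIT2RTR = 0x503,
	QP_OP_RTR2RTS = 0x504,
};

/* Hypothetical callback, shaped like mlx5_devx_cmd_modify_qp_state(). */
typedef int (*qp_modify_fn)(void *qp, uint32_t op, uint32_t remote_qp_id);

/* Recording stub used only for this illustration. */
struct qp_trace {
	uint32_t ops[3];	/* Opcodes seen, in call order. */
	unsigned int n;		/* Number of calls made. */
	uint32_t fail_op;	/* Opcode to fail on, 0 for none. */
};

static int
trace_modify(void *qp, uint32_t op, uint32_t remote_qp_id)
{
	struct qp_trace *t = qp;

	(void)remote_qp_id;
	if (t->n < 3)
		t->ops[t->n] = op;
	t->n++;
	return op == t->fail_op ? -1 : 0;
}

/*
 * Move a QP from reset to ready-to-send, stopping at the first failure.
 * remote_qp_id only matters for INIT2RTR in the patch; it is passed to
 * every step here for simplicity.
 */
static int
qp_bring_up(qp_modify_fn modify, void *qp, uint32_t remote_qp_id)
{
	static const uint32_t ops[] = {
		QP_OP_RST2INIT, QP_OP_INIT2RTR, QP_OP_RTR2RTS,
	};
	unsigned int i;
	int ret;

	for (i = 0; i < sizeof(ops) / sizeof(ops[0]); i++) {
		ret = modify(qp, ops[i], remote_qp_id);
		if (ret)
			return ret;
	}
	return 0;
}
```

For a loopback pair, the same sequence would presumably be run on each
QP with the other QP's ID as remote_qp_id.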
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/common/mlx5/mlx5_devx_cmds.c            | 167 ++++++++++-
 drivers/common/mlx5/mlx5_devx_cmds.h            |  20 ++
 drivers/common/mlx5/mlx5_prm.h                  | 376 ++++++++++++++++++++++++
 drivers/common/mlx5/rte_common_mlx5_version.map |   2 +
 4 files changed, 564 insertions(+), 1 deletion(-)
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index 2425513..e7288c8 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -1124,7 +1124,8 @@ struct mlx5_devx_obj *
 	MLX5_SET(cqc, cqctx, cc, attr->use_first_only);
 	MLX5_SET(cqc, cqctx, oi, attr->overrun_ignore);
 	MLX5_SET(cqc, cqctx, log_cq_size, attr->log_cq_size);
-	MLX5_SET(cqc, cqctx, log_page_size, attr->log_page_size);
+	MLX5_SET(cqc, cqctx, log_page_size, attr->log_page_size -
+		 MLX5_ADAPTER_PAGE_SHIFT);
 	MLX5_SET(cqc, cqctx, c_eqn, attr->eqn);
 	MLX5_SET(cqc, cqctx, uar_page, attr->uar_page_id);
 	if (attr->q_umem_valid) {
@@ -1313,3 +1314,167 @@ struct mlx5_devx_obj *
 	attr->hw_used_index = MLX5_GET16(virtio_net_q, virtq, hw_used_index);
 	return ret;
 }
+
+/**
+ * Create QP using DevX API.
+ *
+ * @param[in] ctx
+ *   ibv_context returned from mlx5dv_open_device.
+ * @param [in] attr
+ *   Pointer to QP attributes structure.
+ *
+ * @return
+ *   The DevX object created, NULL otherwise and rte_errno is set.
+ */
+struct mlx5_devx_obj *
+mlx5_devx_cmd_create_qp(struct ibv_context *ctx,
+			struct mlx5_devx_qp_attr *attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(create_qp_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(create_qp_out)] = {0};
+	struct mlx5_devx_obj *qp_obj = rte_zmalloc(__func__, sizeof(*qp_obj),
+						   0);
+	void *qpc = MLX5_ADDR_OF(create_qp_in, in, qpc);
+
+	if (!qp_obj) {
+		DRV_LOG(ERR, "Failed to allocate QP data.");
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	MLX5_SET(create_qp_in, in, opcode, MLX5_CMD_OP_CREATE_QP);
+	MLX5_SET(qpc, qpc, st, MLX5_QP_ST_RC);
+	MLX5_SET(qpc, qpc, pd, attr->pd);
+	if (attr->uar_index) {
+		MLX5_SET(qpc, qpc, pm_state, MLX5_QP_PM_MIGRATED);
+		MLX5_SET(qpc, qpc, uar_page, attr->uar_index);
+		MLX5_SET(qpc, qpc, log_page_size, attr->log_page_size -
+			 MLX5_ADAPTER_PAGE_SHIFT);
+		if (attr->sq_size) {
+			RTE_ASSERT(RTE_IS_POWER_OF_2(attr->sq_size));
+			MLX5_SET(qpc, qpc, cqn_snd, attr->cqn);
+			MLX5_SET(qpc, qpc, log_sq_size,
+				 rte_log2_u32(attr->sq_size));
+		} else {
+			MLX5_SET(qpc, qpc, no_sq, 1);
+		}
+		if (attr->rq_size) {
+			RTE_ASSERT(RTE_IS_POWER_OF_2(attr->rq_size));
+			MLX5_SET(qpc, qpc, cqn_rcv, attr->cqn);
+			MLX5_SET(qpc, qpc, log_rq_stride, attr->log_rq_stride -
+				 MLX5_LOG_RQ_STRIDE_SHIFT);
+			MLX5_SET(qpc, qpc, log_rq_size,
+				 rte_log2_u32(attr->rq_size));
+			MLX5_SET(qpc, qpc, rq_type, MLX5_NON_ZERO_RQ);
+		} else {
+			MLX5_SET(qpc, qpc, rq_type, MLX5_ZERO_LEN_RQ);
+		}
+		if (attr->dbr_umem_valid) {
+			MLX5_SET(qpc, qpc, dbr_umem_valid,
+				 attr->dbr_umem_valid);
+			MLX5_SET(qpc, qpc, dbr_umem_id, attr->dbr_umem_id);
+		}
+		MLX5_SET64(qpc, qpc, dbr_addr, attr->dbr_address);
+		MLX5_SET64(create_qp_in, in, wq_umem_offset,
+			   attr->wq_umem_offset);
+		MLX5_SET(create_qp_in, in, wq_umem_id, attr->wq_umem_id);
+		MLX5_SET(create_qp_in, in, wq_umem_valid, 1);
+	} else {
+		/* Special QP to be managed by FW - no SQ\RQ\CQ\UAR\DB rec. */
+		MLX5_SET(qpc, qpc, rq_type, MLX5_ZERO_LEN_RQ);
+		MLX5_SET(qpc, qpc, no_sq, 1);
+	}
+	qp_obj->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in), out,
+						 sizeof(out));
+	if (!qp_obj->obj) {
+		rte_errno = errno;
+		DRV_LOG(ERR, "Failed to create QP Obj using DevX.");
+		rte_free(qp_obj);
+		return NULL;
+	}
+	qp_obj->id = MLX5_GET(create_qp_out, out, qpn);
+	return qp_obj;
+}
+
+/**
+ * Modify QP using DevX API.
+ * Currently supports only force loop-back QP.
+ *
+ * @param[in] qp
+ *   Pointer to QP object structure.
+ * @param [in] qp_st_mod_op
+ *   The QP state modification operation.
+ * @param [in] remote_qp_id
+ *   The remote QP ID for MLX5_CMD_OP_INIT2RTR_QP operation.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_devx_cmd_modify_qp_state(struct mlx5_devx_obj *qp, uint32_t qp_st_mod_op,
+			      uint32_t remote_qp_id)
+{
+	union {
+		uint32_t rst2init[MLX5_ST_SZ_DW(rst2init_qp_in)];
+		uint32_t init2rtr[MLX5_ST_SZ_DW(init2rtr_qp_in)];
+		uint32_t rtr2rts[MLX5_ST_SZ_DW(rtr2rts_qp_in)];
+	} in;
+	union {
+		uint32_t rst2init[MLX5_ST_SZ_DW(rst2init_qp_out)];
+		uint32_t init2rtr[MLX5_ST_SZ_DW(init2rtr_qp_out)];
+		uint32_t rtr2rts[MLX5_ST_SZ_DW(rtr2rts_qp_out)];
+	} out;
+	void *qpc;
+	int ret;
+	unsigned int inlen;
+	unsigned int outlen;
+
+	memset(&in, 0, sizeof(in));
+	memset(&out, 0, sizeof(out));
+	MLX5_SET(rst2init_qp_in, &in, opcode, qp_st_mod_op);
+	switch (qp_st_mod_op) {
+	case MLX5_CMD_OP_RST2INIT_QP:
+		MLX5_SET(rst2init_qp_in, &in, qpn, qp->id);
+		qpc = MLX5_ADDR_OF(rst2init_qp_in, &in, qpc);
+		MLX5_SET(qpc, qpc, primary_address_path.vhca_port_num, 1);
+		MLX5_SET(qpc, qpc, rre, 1);
+		MLX5_SET(qpc, qpc, rwe, 1);
+		MLX5_SET(qpc, qpc, pm_state, MLX5_QP_PM_MIGRATED);
+		inlen = sizeof(in.rst2init);
+		outlen = sizeof(out.rst2init);
+		break;
+	case MLX5_CMD_OP_INIT2RTR_QP:
+		MLX5_SET(init2rtr_qp_in, &in, qpn, qp->id);
+		qpc = MLX5_ADDR_OF(init2rtr_qp_in, &in, qpc);
+		MLX5_SET(qpc, qpc, primary_address_path.fl, 1);
+		MLX5_SET(qpc, qpc, primary_address_path.vhca_port_num, 1);
+		MLX5_SET(qpc, qpc, mtu, 1);
+		MLX5_SET(qpc, qpc, log_msg_max, 30);
+		MLX5_SET(qpc, qpc, remote_qpn, remote_qp_id);
+		MLX5_SET(qpc, qpc, min_rnr_nak, 0);
+		inlen = sizeof(in.init2rtr);
+		outlen = sizeof(out.init2rtr);
+		break;
+	case MLX5_CMD_OP_RTR2RTS_QP:
+		qpc = MLX5_ADDR_OF(rtr2rts_qp_in, &in, qpc);
+		MLX5_SET(rtr2rts_qp_in, &in, qpn, qp->id);
+		MLX5_SET(qpc, qpc, primary_address_path.ack_timeout, 14);
+		MLX5_SET(qpc, qpc, log_ack_req_freq, 0);
+		MLX5_SET(qpc, qpc, retry_count, 7);
+		MLX5_SET(qpc, qpc, rnr_retry, 7);
+		inlen = sizeof(in.rtr2rts);
+		outlen = sizeof(out.rtr2rts);
+		break;
+	default:
+		DRV_LOG(ERR, "Invalid or unsupported QP modify op %u.",
+			qp_st_mod_op);
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	ret = mlx5_glue->devx_obj_modify(qp->obj, &in, inlen, &out, outlen);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to modify QP using DevX.");
+		rte_errno = errno;
+		return -errno;
+	}
+	return ret;
+}
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index 1631c08..d1a21b8 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -279,6 +279,22 @@ struct mlx5_devx_virtq_attr {
 	} umems[3];
 };
 
+
+struct mlx5_devx_qp_attr {
+	uint32_t pd:24;
+	uint32_t uar_index:24;
+	uint32_t cqn:24;
+	uint32_t log_page_size:5;
+	uint32_t rq_size:17; /* Must be power of 2. */
+	uint32_t log_rq_stride:3;
+	uint32_t sq_size:17; /* Must be power of 2. */
+	uint32_t dbr_umem_valid:1;
+	uint32_t dbr_umem_id;
+	uint64_t dbr_address;
+	uint32_t wq_umem_id;
+	uint64_t wq_umem_offset;
+};
+
 /* mlx5_devx_cmds.c */
 
 struct mlx5_devx_obj *mlx5_devx_cmd_flow_counter_alloc(struct ibv_context *ctx,
@@ -323,5 +339,9 @@ int mlx5_devx_cmd_modify_virtq(struct mlx5_devx_obj *virtq_obj,
 			       struct mlx5_devx_virtq_attr *attr);
 int mlx5_devx_cmd_query_virtq(struct mlx5_devx_obj *virtq_obj,
 			      struct mlx5_devx_virtq_attr *attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_qp(struct ibv_context *ctx,
+					      struct mlx5_devx_qp_attr *attr);
+int mlx5_devx_cmd_modify_qp_state(struct mlx5_devx_obj *qp,
+				  uint32_t qp_st_mod_op, uint32_t remote_qp_id);
 
 #endif /* RTE_PMD_MLX5_DEVX_CMDS_H_ */
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index 4b8a34c..e53dd61 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -724,6 +724,19 @@ enum {
 	MLX5_CMD_OP_QUERY_HCA_CAP = 0x100,
 	MLX5_CMD_OP_CREATE_MKEY = 0x200,
 	MLX5_CMD_OP_CREATE_CQ = 0x400,
+	MLX5_CMD_OP_CREATE_QP = 0x500,
+	MLX5_CMD_OP_RST2INIT_QP = 0x502,
+	MLX5_CMD_OP_INIT2RTR_QP = 0x503,
+	MLX5_CMD_OP_RTR2RTS_QP = 0x504,
+	MLX5_CMD_OP_RTS2RTS_QP = 0x505,
+	MLX5_CMD_OP_SQERR2RTS_QP = 0x506,
+	MLX5_CMD_OP_QP_2ERR = 0x507,
+	MLX5_CMD_OP_QP_2RST = 0x50A,
+	MLX5_CMD_OP_QUERY_QP = 0x50B,
+	MLX5_CMD_OP_SQD2RTS_QP = 0x50C,
+	MLX5_CMD_OP_INIT2INIT_QP = 0x50E,
+	MLX5_CMD_OP_SUSPEND_QP = 0x50F,
+	MLX5_CMD_OP_RESUME_QP = 0x510,
 	MLX5_CMD_OP_QUERY_NIC_VPORT_CONTEXT = 0x754,
 	MLX5_CMD_OP_ALLOC_TRANSPORT_DOMAIN = 0x816,
 	MLX5_CMD_OP_CREATE_TIR = 0x900,
@@ -747,6 +760,9 @@ enum {
 	MLX5_MKC_ACCESS_MODE_KLM_FBS = 0x3,
 };
 
+#define MLX5_ADAPTER_PAGE_SHIFT 12
+#define MLX5_LOG_RQ_STRIDE_SHIFT 4
+
 /* Flow counters. */
 struct mlx5_ifc_alloc_flow_counter_out_bits {
 	u8         status[0x8];
@@ -2034,6 +2050,366 @@ struct mlx5_ifc_query_virtq_out_bits {
 	struct mlx5_ifc_virtio_net_q_bits virtq;
 };
 
+enum {
+	MLX5_QP_ST_RC = 0x0,
+};
+
+enum {
+	MLX5_QP_PM_MIGRATED = 0x3,
+};
+
+enum {
+	MLX5_NON_ZERO_RQ = 0x0,
+	MLX5_SRQ_RQ = 0x1,
+	MLX5_CRQ_RQ = 0x2,
+	MLX5_ZERO_LEN_RQ = 0x3,
+};
+
+struct mlx5_ifc_ads_bits {
+	u8 fl[0x1];
+	u8 free_ar[0x1];
+	u8 reserved_at_2[0xe];
+	u8 pkey_index[0x10];
+	u8 reserved_at_20[0x8];
+	u8 grh[0x1];
+	u8 mlid[0x7];
+	u8 rlid[0x10];
+	u8 ack_timeout[0x5];
+	u8 reserved_at_45[0x3];
+	u8 src_addr_index[0x8];
+	u8 reserved_at_50[0x4];
+	u8 stat_rate[0x4];
+	u8 hop_limit[0x8];
+	u8 reserved_at_60[0x4];
+	u8 tclass[0x8];
+	u8 flow_label[0x14];
+	u8 rgid_rip[16][0x8];
+	u8 reserved_at_100[0x4];
+	u8 f_dscp[0x1];
+	u8 f_ecn[0x1];
+	u8 reserved_at_106[0x1];
+	u8 f_eth_prio[0x1];
+	u8 ecn[0x2];
+	u8 dscp[0x6];
+	u8 udp_sport[0x10];
+	u8 dei_cfi[0x1];
+	u8 eth_prio[0x3];
+	u8 sl[0x4];
+	u8 vhca_port_num[0x8];
+	u8 rmac_47_32[0x10];
+	u8 rmac_31_0[0x20];
+};
+
+struct mlx5_ifc_qpc_bits {
+	u8 state[0x4];
+	u8 lag_tx_port_affinity[0x4];
+	u8 st[0x8];
+	u8 reserved_at_10[0x3];
+	u8 pm_state[0x2];
+	u8 reserved_at_15[0x1];
+	u8 req_e2e_credit_mode[0x2];
+	u8 offload_type[0x4];
+	u8 end_padding_mode[0x2];
+	u8 reserved_at_1e[0x2];
+	u8 wq_signature[0x1];
+	u8 block_lb_mc[0x1];
+	u8 atomic_like_write_en[0x1];
+	u8 latency_sensitive[0x1];
+	u8 reserved_at_24[0x1];
+	u8 drain_sigerr[0x1];
+	u8 reserved_at_26[0x2];
+	u8 pd[0x18];
+	u8 mtu[0x3];
+	u8 log_msg_max[0x5];
+	u8 reserved_at_48[0x1];
+	u8 log_rq_size[0x4];
+	u8 log_rq_stride[0x3];
+	u8 no_sq[0x1];
+	u8 log_sq_size[0x4];
+	u8 reserved_at_55[0x6];
+	u8 rlky[0x1];
+	u8 ulp_stateless_offload_mode[0x4];
+	u8 counter_set_id[0x8];
+	u8 uar_page[0x18];
+	u8 reserved_at_80[0x8];
+	u8 user_index[0x18];
+	u8 reserved_at_a0[0x3];
+	u8 log_page_size[0x5];
+	u8 remote_qpn[0x18];
+	struct mlx5_ifc_ads_bits primary_address_path;
+	struct mlx5_ifc_ads_bits secondary_address_path;
+	u8 log_ack_req_freq[0x4];
+	u8 reserved_at_384[0x4];
+	u8 log_sra_max[0x3];
+	u8 reserved_at_38b[0x2];
+	u8 retry_count[0x3];
+	u8 rnr_retry[0x3];
+	u8 reserved_at_393[0x1];
+	u8 fre[0x1];
+	u8 cur_rnr_retry[0x3];
+	u8 cur_retry_count[0x3];
+	u8 reserved_at_39b[0x5];
+	u8 reserved_at_3a0[0x20];
+	u8 reserved_at_3c0[0x8];
+	u8 next_send_psn[0x18];
+	u8 reserved_at_3e0[0x8];
+	u8 cqn_snd[0x18];
+	u8 reserved_at_400[0x8];
+	u8 deth_sqpn[0x18];
+	u8 reserved_at_420[0x20];
+	u8 reserved_at_440[0x8];
+	u8 last_acked_psn[0x18];
+	u8 reserved_at_460[0x8];
+	u8 ssn[0x18];
+	u8 reserved_at_480[0x8];
+	u8 log_rra_max[0x3];
+	u8 reserved_at_48b[0x1];
+	u8 atomic_mode[0x4];
+	u8 rre[0x1];
+	u8 rwe[0x1];
+	u8 rae[0x1];
+	u8 reserved_at_493[0x1];
+	u8 page_offset[0x6];
+	u8 reserved_at_49a[0x3];
+	u8 cd_slave_receive[0x1];
+	u8 cd_slave_send[0x1];
+	u8 cd_master[0x1];
+	u8 reserved_at_4a0[0x3];
+	u8 min_rnr_nak[0x5];
+	u8 next_rcv_psn[0x18];
+	u8 reserved_at_4c0[0x8];
+	u8 xrcd[0x18];
+	u8 reserved_at_4e0[0x8];
+	u8 cqn_rcv[0x18];
+	u8 dbr_addr[0x40];
+	u8 q_key[0x20];
+	u8 reserved_at_560[0x5];
+	u8 rq_type[0x3];
+	u8 srqn_rmpn_xrqn[0x18];
+	u8 reserved_at_580[0x8];
+	u8 rmsn[0x18];
+	u8 hw_sq_wqebb_counter[0x10];
+	u8 sw_sq_wqebb_counter[0x10];
+	u8 hw_rq_counter[0x20];
+	u8 sw_rq_counter[0x20];
+	u8 reserved_at_600[0x20];
+	u8 reserved_at_620[0xf];
+	u8 cgs[0x1];
+	u8 cs_req[0x8];
+	u8 cs_res[0x8];
+	u8 dc_access_key[0x40];
+	u8 reserved_at_680[0x3];
+	u8 dbr_umem_valid[0x1];
+	u8 reserved_at_684[0x9c];
+	u8 dbr_umem_id[0x20];
+};
+
+struct mlx5_ifc_create_qp_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x8];
+	u8 qpn[0x18];
+	u8 reserved_at_60[0x20];
+};
+
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-Wpedantic"
+#endif
+struct mlx5_ifc_create_qp_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0x40];
+	u8 opt_param_mask[0x20];
+	u8 reserved_at_a0[0x20];
+	struct mlx5_ifc_qpc_bits qpc;
+	u8 wq_umem_offset[0x40];
+	u8 wq_umem_id[0x20];
+	u8 wq_umem_valid[0x1];
+	u8 reserved_at_861[0x1f];
+	u8 pas[0][0x40];
+};
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-Wpedantic"
+#endif
+
+struct mlx5_ifc_sqerr2rts_qp_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x40];
+};
+
+struct mlx5_ifc_sqerr2rts_qp_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0x8];
+	u8 qpn[0x18];
+	u8 reserved_at_60[0x20];
+	u8 opt_param_mask[0x20];
+	u8 reserved_at_a0[0x20];
+	struct mlx5_ifc_qpc_bits qpc;
+	u8 reserved_at_800[0x80];
+};
+
+struct mlx5_ifc_sqd2rts_qp_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x40];
+};
+
+struct mlx5_ifc_sqd2rts_qp_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0x8];
+	u8 qpn[0x18];
+	u8 reserved_at_60[0x20];
+	u8 opt_param_mask[0x20];
+	u8 reserved_at_a0[0x20];
+	struct mlx5_ifc_qpc_bits qpc;
+	u8 reserved_at_800[0x80];
+};
+
+struct mlx5_ifc_rts2rts_qp_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x40];
+};
+
+struct mlx5_ifc_rts2rts_qp_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0x8];
+	u8 qpn[0x18];
+	u8 reserved_at_60[0x20];
+	u8 opt_param_mask[0x20];
+	u8 reserved_at_a0[0x20];
+	struct mlx5_ifc_qpc_bits qpc;
+	u8 reserved_at_800[0x80];
+};
+
+struct mlx5_ifc_rtr2rts_qp_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x40];
+};
+
+struct mlx5_ifc_rtr2rts_qp_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0x8];
+	u8 qpn[0x18];
+	u8 reserved_at_60[0x20];
+	u8 opt_param_mask[0x20];
+	u8 reserved_at_a0[0x20];
+	struct mlx5_ifc_qpc_bits qpc;
+	u8 reserved_at_800[0x80];
+};
+
+struct mlx5_ifc_rst2init_qp_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x40];
+};
+
+struct mlx5_ifc_rst2init_qp_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0x8];
+	u8 qpn[0x18];
+	u8 reserved_at_60[0x20];
+	u8 opt_param_mask[0x20];
+	u8 reserved_at_a0[0x20];
+	struct mlx5_ifc_qpc_bits qpc;
+	u8 reserved_at_800[0x80];
+};
+
+struct mlx5_ifc_init2rtr_qp_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x40];
+};
+
+struct mlx5_ifc_init2rtr_qp_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0x8];
+	u8 qpn[0x18];
+	u8 reserved_at_60[0x20];
+	u8 opt_param_mask[0x20];
+	u8 reserved_at_a0[0x20];
+	struct mlx5_ifc_qpc_bits qpc;
+	u8 reserved_at_800[0x80];
+};
+
+struct mlx5_ifc_init2init_qp_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x40];
+};
+
+struct mlx5_ifc_init2init_qp_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0x8];
+	u8 qpn[0x18];
+	u8 reserved_at_60[0x20];
+	u8 opt_param_mask[0x20];
+	u8 reserved_at_a0[0x20];
+	struct mlx5_ifc_qpc_bits qpc;
+	u8 reserved_at_800[0x80];
+};
+
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-Wpedantic"
+#endif
+struct mlx5_ifc_query_qp_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x40];
+	u8 opt_param_mask[0x20];
+	u8 reserved_at_a0[0x20];
+	struct mlx5_ifc_qpc_bits qpc;
+	u8 reserved_at_800[0x80];
+	u8 pas[0][0x40];
+};
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-Wpedantic"
+#endif
+
+struct mlx5_ifc_query_qp_in_bits {
+	u8 opcode[0x10];
+	u8 reserved_at_10[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0x8];
+	u8 qpn[0x18];
+	u8 reserved_at_60[0x20];
+};
+
 /* CQE format mask. */
 #define MLX5E_CQE_FORMAT_MASK 0xc
 
diff --git a/drivers/common/mlx5/rte_common_mlx5_version.map b/drivers/common/mlx5/rte_common_mlx5_version.map
index f3082ce..df8e064 100644
--- a/drivers/common/mlx5/rte_common_mlx5_version.map
+++ b/drivers/common/mlx5/rte_common_mlx5_version.map
@@ -2,6 +2,7 @@ DPDK_20.02 {
 	global:
 
 	mlx5_devx_cmd_create_cq;
+	mlx5_devx_cmd_create_qp;
 	mlx5_devx_cmd_create_rq;
 	mlx5_devx_cmd_create_rqt;
 	mlx5_devx_cmd_create_sq;
@@ -14,6 +15,7 @@ DPDK_20.02 {
 	mlx5_devx_cmd_flow_counter_query;
 	mlx5_devx_cmd_flow_dump;
 	mlx5_devx_cmd_mkey_create;
+	mlx5_devx_cmd_modify_qp_state;
 	mlx5_devx_cmd_modify_rq;
 	mlx5_devx_cmd_modify_sq;
 	mlx5_devx_cmd_modify_virtq;
-- 
1.8.3.1
* [dpdk-dev] [PATCH v3 17/25] common/mlx5: allow type configuration for DevX RQT
  2020-01-28 16:27   ` [dpdk-dev] [PATCH v3 00/25] Introduce mlx5 common library Matan Azrad
                       ` (15 preceding siblings ...)
  2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 16/25] common/mlx5: add support for DevX QP operations Matan Azrad
@ 2020-01-28 16:27     ` Matan Azrad
  2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 18/25] common/mlx5: add TIR field constants Matan Azrad
                       ` (8 subsequent siblings)
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-28 16:27 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
Allow virtio queue type configuration in the RQ table.
Add the list_q_type PRM field and the rq_type attribute needed for it.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
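A minimal sketch of what writing list_q_type amounts to, assuming the
usual mlx5 PRM convention that bit offsets count from the most
significant bit of each big-endian dword. The sketch_* helpers are
hypothetical stand-ins for illustration and are not the MLX5_SET
implementation; only the offsets and widths are taken from the rqtc
layout in this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Bit offsets and widths copied from the rqtc layout in this patch. */
#define SKETCH_LIST_Q_TYPE_OFF 0xa5u
#define SKETCH_LIST_Q_TYPE_SZ 0x3u
#define SKETCH_RQT_MAX_SIZE_OFF 0xb0u
#define SKETCH_RQT_MAX_SIZE_SZ 0x10u

/* Mask of the `width` low bits; width must be in 1..32. */
static uint32_t
sketch_low_mask(unsigned int width)
{
	return width == 32 ? UINT32_MAX : (UINT32_C(1) << width) - 1;
}

/* Load one dword as big-endian, independent of host byte order. */
static uint32_t
sketch_load_be32(const uint8_t *dw)
{
	return ((uint32_t)dw[0] << 24) | ((uint32_t)dw[1] << 16) |
	       ((uint32_t)dw[2] << 8) | (uint32_t)dw[3];
}

/*
 * Hypothetical helper: write `val` into the field that starts `bit_off`
 * bits into `buf`. The field must not cross a dword boundary.
 */
static void
sketch_set_field(uint8_t *buf, unsigned int bit_off, unsigned int width,
		 uint32_t val)
{
	uint8_t *dw = buf + (bit_off / 32) * 4;
	unsigned int shift;
	uint32_t mask;
	uint32_t host;

	assert(width >= 1 && (bit_off % 32) + width <= 32);
	shift = 32 - width - (bit_off % 32);
	mask = sketch_low_mask(width) << shift;
	host = (sketch_load_be32(dw) & ~mask) | ((val << shift) & mask);
	dw[0] = (uint8_t)(host >> 24);
	dw[1] = (uint8_t)(host >> 16);
	dw[2] = (uint8_t)(host >> 8);
	dw[3] = (uint8_t)host;
}

/* Hypothetical helper: read the same field back. */
static uint32_t
sketch_get_field(const uint8_t *buf, unsigned int bit_off, unsigned int width)
{
	unsigned int shift;

	assert(width >= 1 && (bit_off % 32) + width <= 32);
	shift = 32 - width - (bit_off % 32);
	return (sketch_load_be32(buf + (bit_off / 32) * 4) >> shift) &
	       sketch_low_mask(width);
}
```

With this layout the 3-bit list_q_type lands in the low bits of byte 20
of the context, and rqt_max_size in bytes 22 and 23, which is why the
old 0x10-bit reserved field could be split without moving rqt_max_size.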
 drivers/common/mlx5/mlx5_devx_cmds.c | 1 +
 drivers/common/mlx5/mlx5_devx_cmds.h | 1 +
 drivers/common/mlx5/mlx5_prm.h       | 5 +++--
 3 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index e7288c8..e372df6 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -846,6 +846,7 @@ struct mlx5_devx_obj *
 	}
 	MLX5_SET(create_rqt_in, in, opcode, MLX5_CMD_OP_CREATE_RQT);
 	rqt_ctx = MLX5_ADDR_OF(create_rqt_in, in, rqt_context);
+	MLX5_SET(rqtc, rqt_ctx, list_q_type, rqt_attr->rq_type);
 	MLX5_SET(rqtc, rqt_ctx, rqt_max_size, rqt_attr->rqt_max_size);
 	MLX5_SET(rqtc, rqt_ctx, rqt_actual_size, rqt_attr->rqt_actual_size);
 	for (i = 0; i < rqt_attr->rqt_actual_size; i++)
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index d1a21b8..9ef3ce2 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -188,6 +188,7 @@ struct mlx5_devx_tir_attr {
 
 /* RQT attributes structure, used by RQT operations. */
 struct mlx5_devx_rqt_attr {
+	uint8_t rq_type;
 	uint32_t rqt_max_size:16;
 	uint32_t rqt_actual_size:16;
 	uint32_t rq_list[];
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index e53dd61..000ba1f 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -1734,8 +1734,9 @@ struct mlx5_ifc_rq_num_bits {
 };
 
 struct mlx5_ifc_rqtc_bits {
-	u8 reserved_at_0[0xa0];
-	u8 reserved_at_a0[0x10];
+	u8 reserved_at_0[0xa5];
+	u8 list_q_type[0x3];
+	u8 reserved_at_a8[0x8];
 	u8 rqt_max_size[0x10];
 	u8 reserved_at_c0[0x10];
 	u8 rqt_actual_size[0x10];
-- 
1.8.3.1
* [dpdk-dev] [PATCH v3 18/25] common/mlx5: add TIR field constants
  2020-01-28 16:27   ` [dpdk-dev] [PATCH v3 00/25] Introduce mlx5 common library Matan Azrad
                       ` (16 preceding siblings ...)
  2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 17/25] common/mlx5: allow type configuration for DevX RQT Matan Azrad
@ 2020-01-28 16:27     ` Matan Azrad
  2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 19/25] common/mlx5: add DevX command to modify RQT Matan Azrad
                       ` (7 subsequent siblings)
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-28 16:27 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
The DevX TIR object configuration must specify the L3 and L4 protocols
that the TIR is expected to forward.
Add the PRM constant values needed to configure the L3 and L4 protocols.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
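A small sketch of how a caller might pick these constants from packet
header values. The sketch_* names are hypothetical and exist only for
this illustration; the numeric PRM values are the ones added by this
patch, and the EtherType and IP protocol numbers are the standard ones.

```c
#include <stdint.h>

/* Values copied from the enums added by this patch. */
enum {
	SKETCH_L3_PROT_TYPE_IPV4 = 0,
	SKETCH_L3_PROT_TYPE_IPV6 = 1,
};

enum {
	SKETCH_L4_PROT_TYPE_TCP = 0,
	SKETCH_L4_PROT_TYPE_UDP = 1,
};

/* Map an EtherType (host byte order) to the PRM L3 type, -1 if none. */
static int
sketch_l3_prot_type(uint16_t ether_type)
{
	switch (ether_type) {
	case 0x0800: /* IPv4 */
		return SKETCH_L3_PROT_TYPE_IPV4;
	case 0x86dd: /* IPv6 */
		return SKETCH_L3_PROT_TYPE_IPV6;
	default:
		return -1;
	}
}

/* Map an IP protocol number to the PRM L4 type, -1 if none. */
static int
sketch_l4_prot_type(uint8_t ip_proto)
{
	switch (ip_proto) {
	case 6: /* TCP */
		return SKETCH_L4_PROT_TYPE_TCP;
	case 17: /* UDP */
		return SKETCH_L4_PROT_TYPE_UDP;
	default:
		return -1;
	}
}
```

Returning -1 for unsupported protocols is a choice of this sketch; the
patch itself only defines the constants.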
 drivers/common/mlx5/mlx5_prm.h | 10 ++++++++++
 1 file changed, 10 insertions(+)
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index 000ba1f..e326868 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -1639,6 +1639,16 @@ struct mlx5_ifc_modify_rq_in_bits {
 };
 
 enum {
+	MLX5_L3_PROT_TYPE_IPV4 = 0,
+	MLX5_L3_PROT_TYPE_IPV6 = 1,
+};
+
+enum {
+	MLX5_L4_PROT_TYPE_TCP = 0,
+	MLX5_L4_PROT_TYPE_UDP = 1,
+};
+
+enum {
 	MLX5_RX_HASH_FIELD_SELECT_SELECTED_FIELDS_SRC_IP     = 0x0,
 	MLX5_RX_HASH_FIELD_SELECT_SELECTED_FIELDS_DST_IP     = 0x1,
 	MLX5_RX_HASH_FIELD_SELECT_SELECTED_FIELDS_L4_SPORT   = 0x2,
-- 
1.8.3.1
* [dpdk-dev] [PATCH v3 19/25] common/mlx5: add DevX command to modify RQT
  2020-01-28 16:27   ` [dpdk-dev] [PATCH v3 00/25] Introduce mlx5 common library Matan Azrad
                       ` (17 preceding siblings ...)
  2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 18/25] common/mlx5: add TIR field constants Matan Azrad
@ 2020-01-28 16:27     ` Matan Azrad
  2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 20/25] common/mlx5: get DevX capability for max RQT size Matan Azrad
                       ` (6 subsequent siblings)
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-28 16:27 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
An RQ table can be changed to reference a different list of queues.
Add a DevX command that modifies a DevX RQT object to point to a new
RQ list.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
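A sketch of how a caller might build the attributes passed to the new
command, and how the command input grows by one dword per listed RQ.
struct sketch_rqt_attr is a local mirror of struct mlx5_devx_rqt_attr
as it looks after the rq_type patch; sketch_rqt_attr_new() and
sketch_modify_rqt_inlen() are hypothetical helpers, and fixed_bytes
stands in for MLX5_ST_SZ_BYTES(modify_rqt_in), whose value is not
assumed here. Plain calloc() is used instead of rte_calloc() to keep
the sketch standalone.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Local mirror of the RQT attributes structure. */
struct sketch_rqt_attr {
	uint8_t rq_type;
	uint32_t rqt_max_size:16;
	uint32_t rqt_actual_size:16;
	uint32_t rq_list[]; /* Flexible array of RQ numbers. */
};

/*
 * Allocate attributes for `n` RQs. Returns NULL when the sizes do not
 * fit the 16-bit fields, when n exceeds max_size, or on allocation
 * failure. The caller releases the result with free().
 */
static struct sketch_rqt_attr *
sketch_rqt_attr_new(uint8_t rq_type, uint32_t max_size,
		    const uint32_t *rqs, uint32_t n)
{
	struct sketch_rqt_attr *attr;

	if (max_size > UINT16_MAX || n > max_size)
		return NULL;
	attr = calloc(1, sizeof(*attr) + n * sizeof(attr->rq_list[0]));
	if (attr == NULL)
		return NULL;
	attr->rq_type = rq_type;
	attr->rqt_max_size = max_size;
	attr->rqt_actual_size = n;
	if (n > 0)
		memcpy(attr->rq_list, rqs, n * sizeof(attr->rq_list[0]));
	return attr;
}

/* Command input size: fixed part plus one dword per listed RQ. */
static size_t
sketch_modify_rqt_inlen(size_t fixed_bytes, const struct sketch_rqt_attr *attr)
{
	assert(attr != NULL);
	return fixed_bytes + attr->rqt_actual_size * sizeof(uint32_t);
}
```

The length formula mirrors the inlen computation in
mlx5_devx_cmd_modify_rqt(); everything else is illustrative.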
 drivers/common/mlx5/mlx5_devx_cmds.c            | 47 +++++++++++++++++++++++++
 drivers/common/mlx5/mlx5_devx_cmds.h            |  2 ++
 drivers/common/mlx5/mlx5_prm.h                  | 21 +++++++++++
 drivers/common/mlx5/rte_common_mlx5_version.map |  1 +
 4 files changed, 71 insertions(+)
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index e372df6..1d3a729 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -864,6 +864,53 @@ struct mlx5_devx_obj *
 }
 
 /**
+ * Modify RQT using DevX API.
+ *
+ * @param[in] rqt
+ *   Pointer to RQT DevX object structure.
+ * @param[in] rqt_attr
+ *   Pointer to RQT attributes structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_devx_cmd_modify_rqt(struct mlx5_devx_obj *rqt,
+			 struct mlx5_devx_rqt_attr *rqt_attr)
+{
+	uint32_t inlen = MLX5_ST_SZ_BYTES(modify_rqt_in) +
+			 rqt_attr->rqt_actual_size * sizeof(uint32_t);
+	uint32_t out[MLX5_ST_SZ_DW(modify_rqt_out)] = {0};
+	uint32_t *in = rte_calloc(__func__, 1, inlen, 0);
+	void *rqt_ctx;
+	int i;
+	int ret;
+
+	if (!in) {
+		DRV_LOG(ERR, "Failed to allocate RQT modify IN data.");
+		rte_errno = ENOMEM;
+		return -ENOMEM;
+	}
+	MLX5_SET(modify_rqt_in, in, opcode, MLX5_CMD_OP_MODIFY_RQT);
+	MLX5_SET(modify_rqt_in, in, rqtn, rqt->id);
+	MLX5_SET64(modify_rqt_in, in, modify_bitmask, 0x1);
+	rqt_ctx = MLX5_ADDR_OF(modify_rqt_in, in, rqt_context);
+	MLX5_SET(rqtc, rqt_ctx, list_q_type, rqt_attr->rq_type);
+	MLX5_SET(rqtc, rqt_ctx, rqt_max_size, rqt_attr->rqt_max_size);
+	MLX5_SET(rqtc, rqt_ctx, rqt_actual_size, rqt_attr->rqt_actual_size);
+	for (i = 0; i < rqt_attr->rqt_actual_size; i++)
+		MLX5_SET(rqtc, rqt_ctx, rq_num[i], rqt_attr->rq_list[i]);
+	ret = mlx5_glue->devx_obj_modify(rqt->obj, in, inlen, out, sizeof(out));
+	rte_free(in);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to modify RQT using DevX.");
+		rte_errno = errno;
+		return -rte_errno;
+	}
+	return ret;
+}
+
+/**
  * Create SQ using DevX API.
  *
  * @param[in] ctx
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index 9ef3ce2..b99c54b 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -344,5 +344,7 @@ struct mlx5_devx_obj *mlx5_devx_cmd_create_qp(struct ibv_context *ctx,
 					      struct mlx5_devx_qp_attr *attr);
 int mlx5_devx_cmd_modify_qp_state(struct mlx5_devx_obj *qp,
 				  uint32_t qp_st_mod_op, uint32_t remote_qp_id);
+int mlx5_devx_cmd_modify_rqt(struct mlx5_devx_obj *rqt,
+			     struct mlx5_devx_rqt_attr *rqt_attr);
 
 #endif /* RTE_PMD_MLX5_DEVX_CMDS_H_ */
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index e326868..b48cd0a 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -747,6 +747,7 @@ enum {
 	MLX5_CMD_OP_CREATE_TIS = 0x912,
 	MLX5_CMD_OP_QUERY_TIS = 0x915,
 	MLX5_CMD_OP_CREATE_RQT = 0x916,
+	MLX5_CMD_OP_MODIFY_RQT = 0x917,
 	MLX5_CMD_OP_ALLOC_FLOW_COUNTER = 0x939,
 	MLX5_CMD_OP_QUERY_FLOW_COUNTER = 0x93b,
 	MLX5_CMD_OP_CREATE_GENERAL_OBJECT = 0xa00,
@@ -1774,10 +1775,30 @@ struct mlx5_ifc_create_rqt_in_bits {
 	u8 reserved_at_40[0xc0];
 	struct mlx5_ifc_rqtc_bits rqt_context;
 };
+
+struct mlx5_ifc_modify_rqt_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0x8];
+	u8 rqtn[0x18];
+	u8 reserved_at_60[0x20];
+	u8 modify_bitmask[0x40];
+	u8 reserved_at_c0[0x40];
+	struct mlx5_ifc_rqtc_bits rqt_context;
+};
 #ifdef PEDANTIC
 #pragma GCC diagnostic error "-Wpedantic"
 #endif
 
+struct mlx5_ifc_modify_rqt_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x40];
+};
+
 enum {
 	MLX5_SQC_STATE_RST  = 0x0,
 	MLX5_SQC_STATE_RDY  = 0x1,
diff --git a/drivers/common/mlx5/rte_common_mlx5_version.map b/drivers/common/mlx5/rte_common_mlx5_version.map
index df8e064..95ca54a 100644
--- a/drivers/common/mlx5/rte_common_mlx5_version.map
+++ b/drivers/common/mlx5/rte_common_mlx5_version.map
@@ -17,6 +17,7 @@ DPDK_20.02 {
 	mlx5_devx_cmd_mkey_create;
 	mlx5_devx_cmd_modify_qp_state;
 	mlx5_devx_cmd_modify_rq;
+	mlx5_devx_cmd_modify_rqt;
 	mlx5_devx_cmd_modify_sq;
 	mlx5_devx_cmd_modify_virtq;
 	mlx5_devx_cmd_qp_query_tis_td;
-- 
1.8.3.1
* [dpdk-dev] [PATCH v3 20/25] common/mlx5: get DevX capability for max RQT size
  2020-01-28 16:27   ` [dpdk-dev] [PATCH v3 00/25] Introduce mlx5 common library Matan Azrad
                       ` (18 preceding siblings ...)
  2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 19/25] common/mlx5: add DevX command to modify RQT Matan Azrad
@ 2020-01-28 16:27     ` Matan Azrad
  2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 21/25] net/mlx5: select driver by vDPA device argument Matan Azrad
                       ` (5 subsequent siblings)
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-28 16:27 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
To allow the RQT size to be limited to the correct maximum value, add
log_max_rqt_size to the DevX capability structure.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
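A sketch of how the new capability could be used to bound a requested
RQT size, assuming (as the log_ prefix suggests) that the field holds
the base-2 logarithm of the maximum size. sketch_clamp_rqt_size() is a
hypothetical helper, not part of this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp a requested RQT size to 2^log_max_rqt_size entries. */
static uint32_t
sketch_clamp_rqt_size(uint32_t requested, uint8_t log_max_rqt_size)
{
	uint32_t max_size;

	/* The attribute is a 5-bit field, so the shift stays below 32. */
	assert(log_max_rqt_size < 32);
	max_size = UINT32_C(1) << log_max_rqt_size;
	return requested < max_size ? requested : max_size;
}
```

For example, with log_max_rqt_size equal to 8 a request for 1000
entries would be reduced to 256, while a request for 100 entries would
pass unchanged.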
 drivers/common/mlx5/mlx5_devx_cmds.c | 2 ++
 drivers/common/mlx5/mlx5_devx_cmds.h | 1 +
 2 files changed, 3 insertions(+)
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index 1d3a729..b0803ac 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -436,6 +436,8 @@ struct mlx5_devx_obj *
 			MLX5_GET(cmd_hca_cap, hcattr, flow_counter_bulk_alloc);
 	attr->flow_counters_dump = MLX5_GET(cmd_hca_cap, hcattr,
 					    flow_counters_dump);
+	attr->log_max_rqt_size = MLX5_GET(cmd_hca_cap, hcattr,
+					  log_max_rqt_size);
 	attr->eswitch_manager = MLX5_GET(cmd_hca_cap, hcattr, eswitch_manager);
 	attr->hairpin = MLX5_GET(cmd_hca_cap, hcattr, hairpin);
 	attr->log_max_hairpin_queues = MLX5_GET(cmd_hca_cap, hcattr,
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index b99c54b..6912dc6 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -78,6 +78,7 @@ struct mlx5_hca_vdpa_attr {
 struct mlx5_hca_attr {
 	uint32_t eswitch_manager:1;
 	uint32_t flow_counters_dump:1;
+	uint32_t log_max_rqt_size:5;
 	uint8_t flow_counter_bulk_alloc_bitmap;
 	uint32_t eth_net_offloads:1;
 	uint32_t eth_virt:1;
-- 
1.8.3.1
* [dpdk-dev] [PATCH v3 21/25] net/mlx5: select driver by vDPA device argument
  2020-01-28 16:27   ` [dpdk-dev] [PATCH v3 00/25] Introduce mlx5 common library Matan Azrad
                       ` (19 preceding siblings ...)
  2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 20/25] common/mlx5: get DevX capability for max RQT size Matan Azrad
@ 2020-01-28 16:27     ` Matan Azrad
  2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 22/25] net/mlx5: separate Netlink command interface Matan Azrad
                       ` (4 subsequent siblings)
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-28 16:27 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
One Mellanox device may be probed by multiple mlx5 drivers.
For example, any mlx5 vDPA device can be probed by both net/mlx5 and
vdpa/mlx5.
Add a new mlx5 common API to get the requested driver from the
devargs: vdpa=1.
Skip net/mlx5 PMD probing when the device is selected to be probed by
the vDPA driver.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
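A standalone sketch of the selection rule, for readers without the
rte_kvargs library at hand. It uses a simplified "key=value,key=value"
parser that is only an approximation of rte_kvargs syntax, and
sketch_vdpa_mode_selected() is a hypothetical name. As in the patch,
the mode counts as selected only when the vdpa key is present and every
occurrence has the value "1".

```c
#include <stddef.h>
#include <string.h>

/* Return 1 when `args` selects vDPA mode (vdpa=1), 0 otherwise. */
static int
sketch_vdpa_mode_selected(const char *args)
{
	const char *p = args;
	int found = 0;

	if (args == NULL)
		return 0;
	while (*p != '\0') {
		/* Length of the current comma-separated token. */
		size_t len = strcspn(p, ",");
		const char *eq = memchr(p, '=', len);

		/* Match the exact key "vdpa" followed by '='. */
		if (eq != NULL && (size_t)(eq - p) == 4 &&
		    memcmp(p, "vdpa", 4) == 0) {
			size_t vlen = len - 5; /* Bytes after "vdpa=". */

			if (vlen != 1 || eq[1] != '1')
				return 0;
			found = 1;
		}
		p += len;
		if (*p == ',')
			p++;
	}
	return found;
}
```

With this rule "vdpa=1" and "foo=bar,vdpa=1" select the vDPA driver,
while "vdpa=0", "vdpa=11", an empty string, or missing devargs leave
the device to net/mlx5. The key name foo is only a placeholder.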
 drivers/common/mlx5/Makefile                    |  2 +-
 drivers/common/mlx5/meson.build                 |  2 +-
 drivers/common/mlx5/mlx5_common.c               | 36 +++++++++++++++++++++++++
 drivers/common/mlx5/mlx5_common.h               |  3 +++
 drivers/common/mlx5/rte_common_mlx5_version.map |  1 +
 drivers/net/mlx5/mlx5.c                         |  5 ++++
 6 files changed, 47 insertions(+), 2 deletions(-)
diff --git a/drivers/common/mlx5/Makefile b/drivers/common/mlx5/Makefile
index d1de3ec..b9e9803 100644
--- a/drivers/common/mlx5/Makefile
+++ b/drivers/common/mlx5/Makefile
@@ -41,7 +41,7 @@ else
 LDLIBS += -libverbs -lmlx5
 endif
 
-LDLIBS += -lrte_eal -lrte_pci
+LDLIBS += -lrte_eal -lrte_pci -lrte_kvargs
 
 # A few warnings cannot be avoided in external headers.
 CFLAGS += -Wno-error=cast-qual -DNDEBUG -UPEDANTIC
diff --git a/drivers/common/mlx5/meson.build b/drivers/common/mlx5/meson.build
index 3e130cb..b88822e 100644
--- a/drivers/common/mlx5/meson.build
+++ b/drivers/common/mlx5/meson.build
@@ -37,7 +37,7 @@ endforeach
 
 if build
 	allow_experimental_apis = true
-	deps += ['hash', 'pci', 'net', 'eal']
+	deps += ['hash', 'pci', 'net', 'eal', 'kvargs']
 	ext_deps += libs
 	sources = files(
 		'mlx5_devx_cmds.c',
diff --git a/drivers/common/mlx5/mlx5_common.c b/drivers/common/mlx5/mlx5_common.c
index 2381208..f756b6b 100644
--- a/drivers/common/mlx5/mlx5_common.c
+++ b/drivers/common/mlx5/mlx5_common.c
@@ -71,6 +71,42 @@
 	return 0;
 }
 
+static int
+mlx5_vdpa_check_handler(__rte_unused const char *key, const char *value,
+			__rte_unused void *opaque)
+{
+	if (strcmp(value, "1"))
+		return -1;
+	return 0;
+}
+
+int
+mlx5_vdpa_mode_selected(struct rte_devargs *devargs)
+{
+	struct rte_kvargs *kvlist;
+	const char *key = "vdpa";
+	int ret = 0;
+
+	if (devargs == NULL)
+		return 0;
+
+	kvlist = rte_kvargs_parse(devargs->args, NULL);
+	if (kvlist == NULL)
+		return 0;
+
+	if (!rte_kvargs_count(kvlist, key))
+		goto exit;
+
+	/* Vdpa mode selected when there's a key-value pair: vdpa=1. */
+	if (rte_kvargs_process(kvlist, key, mlx5_vdpa_check_handler, NULL) < 0)
+		goto exit;
+	ret = 1;
+
+exit:
+	rte_kvargs_free(kvlist);
+	return ret;
+}
+
 #ifdef RTE_IBVERBS_LINK_DLOPEN
 
 /**
diff --git a/drivers/common/mlx5/mlx5_common.h b/drivers/common/mlx5/mlx5_common.h
index 9d464d4..aeaa7b9 100644
--- a/drivers/common/mlx5/mlx5_common.h
+++ b/drivers/common/mlx5/mlx5_common.h
@@ -11,6 +11,8 @@
 #include <rte_pci.h>
 #include <rte_atomic.h>
 #include <rte_log.h>
+#include <rte_kvargs.h>
+#include <rte_devargs.h>
 
 #include "mlx5_prm.h"
 
@@ -149,5 +151,6 @@ enum mlx5_cqe_status {
 }
 
 int mlx5_dev_to_pci_addr(const char *dev_path, struct rte_pci_addr *pci_addr);
+int mlx5_vdpa_mode_selected(struct rte_devargs *devargs);
 
 #endif /* RTE_PMD_MLX5_COMMON_H_ */
diff --git a/drivers/common/mlx5/rte_common_mlx5_version.map b/drivers/common/mlx5/rte_common_mlx5_version.map
index 95ca54a..d32d631 100644
--- a/drivers/common/mlx5/rte_common_mlx5_version.map
+++ b/drivers/common/mlx5/rte_common_mlx5_version.map
@@ -26,4 +26,5 @@ DPDK_20.02 {
 	mlx5_devx_get_out_command_status;
 
 	mlx5_dev_to_pci_addr;
+	mlx5_vdpa_mode_selected;
 };
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index d0fa2da..353196b 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -2967,6 +2967,11 @@ struct mlx5_flow_id_pool *
 	struct mlx5_dev_config dev_config;
 	int ret;
 
+	if (mlx5_vdpa_mode_selected(pci_dev->device.devargs)) {
+		DRV_LOG(DEBUG, "Skip probing - should be probed by the vdpa"
+			" driver.");
+		return 1;
+	}
 	if (rte_eal_process_type() == RTE_PROC_PRIMARY)
 		mlx5_pmd_socket_init();
 	ret = mlx5_init_once();
-- 
1.8.3.1
* [dpdk-dev] [PATCH v3 22/25] net/mlx5: separate Netlink command interface
  2020-01-28 16:27   ` [dpdk-dev] [PATCH v3 00/25] Introduce mlx5 common library Matan Azrad
                       ` (20 preceding siblings ...)
  2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 21/25] net/mlx5: select driver by vDPA device argument Matan Azrad
@ 2020-01-28 16:27     ` Matan Azrad
  2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 23/25] net/mlx5: reduce Netlink commands dependencies Matan Azrad
                       ` (3 subsequent siblings)
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-28 16:27 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
The Netlink command interface is declared in the mlx5.h file together
with many other PMD interfaces.
In preparation for sharing the Netlink commands between different
PMDs, this patch moves the Netlink interface to a new file called
mlx5_nl.h.
Move the VLAN commands that are not pure Netlink from mlx5_nl.c to
mlx5_vlan.c.
Rename the Netlink commands and structures to use the mlx5_nl prefix.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.h      |  72 +++------------------
 drivers/net/mlx5/mlx5_nl.c   | 149 +++----------------------------------------
 drivers/net/mlx5/mlx5_nl.h   |  69 ++++++++++++++++++++
 drivers/net/mlx5/mlx5_vlan.c | 134 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 220 insertions(+), 204 deletions(-)
 create mode 100644 drivers/net/mlx5/mlx5_nl.h
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 3daf0db..01d0051 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -39,6 +39,7 @@
 #include "mlx5_defs.h"
 #include "mlx5_utils.h"
 #include "mlx5_mr.h"
+#include "mlx5_nl.h"
 #include "mlx5_autoconf.h"
 
 /* Request types for IPC. */
@@ -75,24 +76,6 @@ struct mlx5_mp_param {
 /** Key string for IPC. */
 #define MLX5_MP_NAME "net_mlx5_mp"
 
-/* Recognized Infiniband device physical port name types. */
-enum mlx5_phys_port_name_type {
-	MLX5_PHYS_PORT_NAME_TYPE_NOTSET = 0, /* Not set. */
-	MLX5_PHYS_PORT_NAME_TYPE_LEGACY, /* before kernel ver < 5.0 */
-	MLX5_PHYS_PORT_NAME_TYPE_UPLINK, /* p0, kernel ver >= 5.0 */
-	MLX5_PHYS_PORT_NAME_TYPE_PFVF, /* pf0vf0, kernel ver >= 5.0 */
-	MLX5_PHYS_PORT_NAME_TYPE_UNKNOWN, /* Unrecognized. */
-};
-
-/** Switch information returned by mlx5_nl_switch_info(). */
-struct mlx5_switch_info {
-	uint32_t master:1; /**< Master device. */
-	uint32_t representor:1; /**< Representor device. */
-	enum mlx5_phys_port_name_type name_type; /** < Port name type. */
-	int32_t pf_num; /**< PF number (valid for pfxvfx format only). */
-	int32_t port_name; /**< Representor port name. */
-	uint64_t switch_id; /**< Switch identifier. */
-};
 
 LIST_HEAD(mlx5_dev_list, mlx5_ibv_shared);
 
@@ -226,30 +209,12 @@ enum mlx5_verbs_alloc_type {
 	MLX5_VERBS_ALLOC_TYPE_RX_QUEUE,
 };
 
-/* VLAN netdev for VLAN workaround. */
-struct mlx5_vlan_dev {
-	uint32_t refcnt;
-	uint32_t ifindex; /**< Own interface index. */
-};
-
 /* Structure for VF VLAN workaround. */
 struct mlx5_vf_vlan {
 	uint32_t tag:12;
 	uint32_t created:1;
 };
 
-/*
- * Array of VLAN devices created on the base of VF
- * used for workaround in virtual environments.
- */
-struct mlx5_vlan_vmwa_context {
-	int nl_socket;
-	uint32_t nl_sn;
-	uint32_t vf_ifindex;
-	struct rte_eth_dev *dev;
-	struct mlx5_vlan_dev vlan_dev[4096];
-};
-
 /**
  * Verbs allocator needs a context to know in the callback which kind of
  * resources it is allocating.
@@ -576,7 +541,7 @@ struct mlx5_priv {
 	int nl_socket_route; /* Netlink socket (NETLINK_ROUTE). */
 	uint32_t nl_sn; /* Netlink message sequence number. */
 	LIST_HEAD(dbrpage, mlx5_devx_dbr_page) dbrpgs; /* Door-bell pages. */
-	struct mlx5_vlan_vmwa_context *vmwa_context; /* VLAN WA context. */
+	struct mlx5_nl_vlan_vmwa_context *vmwa_context; /* VLAN WA context. */
 	struct mlx5_flow_id_pool *qrss_id_pool;
 	struct mlx5_hlist *mreg_cp_tbl;
 	/* Hash table of Rx metadata register copy table. */
@@ -672,6 +637,8 @@ int mlx5_hairpin_cap_get(struct rte_eth_dev *dev,
 void mlx5_mac_addr_remove(struct rte_eth_dev *dev, uint32_t index);
 int mlx5_mac_addr_add(struct rte_eth_dev *dev, struct rte_ether_addr *mac,
 		      uint32_t index, uint32_t vmdq);
+struct mlx5_nl_vlan_vmwa_context *mlx5_vlan_vmwa_init
+				    (struct rte_eth_dev *dev, uint32_t ifindex);
 int mlx5_mac_addr_set(struct rte_eth_dev *dev, struct rte_ether_addr *mac_addr);
 int mlx5_set_mc_addr_list(struct rte_eth_dev *dev,
 			struct rte_ether_addr *mc_addr_set,
@@ -715,6 +682,11 @@ int mlx5_xstats_get_names(struct rte_eth_dev *dev __rte_unused,
 int mlx5_vlan_filter_set(struct rte_eth_dev *dev, uint16_t vlan_id, int on);
 void mlx5_vlan_strip_queue_set(struct rte_eth_dev *dev, uint16_t queue, int on);
 int mlx5_vlan_offload_set(struct rte_eth_dev *dev, int mask);
+void mlx5_vlan_vmwa_exit(struct mlx5_nl_vlan_vmwa_context *ctx);
+void mlx5_vlan_vmwa_release(struct rte_eth_dev *dev,
+			    struct mlx5_vf_vlan *vf_vlan);
+void mlx5_vlan_vmwa_acquire(struct rte_eth_dev *dev,
+			    struct mlx5_vf_vlan *vf_vlan);
 
 /* mlx5_trigger.c */
 
@@ -796,32 +768,6 @@ int mlx5_mp_req_queue_state_modify(struct rte_eth_dev *dev,
 int mlx5_pmd_socket_init(void);
 void mlx5_pmd_socket_uninit(void);
 
-/* mlx5_nl.c */
-
-int mlx5_nl_init(int protocol);
-int mlx5_nl_mac_addr_add(struct rte_eth_dev *dev, struct rte_ether_addr *mac,
-			 uint32_t index);
-int mlx5_nl_mac_addr_remove(struct rte_eth_dev *dev, struct rte_ether_addr *mac,
-			    uint32_t index);
-void mlx5_nl_mac_addr_sync(struct rte_eth_dev *dev);
-void mlx5_nl_mac_addr_flush(struct rte_eth_dev *dev);
-int mlx5_nl_promisc(struct rte_eth_dev *dev, int enable);
-int mlx5_nl_allmulti(struct rte_eth_dev *dev, int enable);
-unsigned int mlx5_nl_portnum(int nl, const char *name);
-unsigned int mlx5_nl_ifindex(int nl, const char *name, uint32_t pindex);
-int mlx5_nl_vf_mac_addr_modify(struct rte_eth_dev *dev,
-			       struct rte_ether_addr *mac, int vf_index);
-int mlx5_nl_switch_info(int nl, unsigned int ifindex,
-			struct mlx5_switch_info *info);
-
-struct mlx5_vlan_vmwa_context *mlx5_vlan_vmwa_init(struct rte_eth_dev *dev,
-						   uint32_t ifindex);
-void mlx5_vlan_vmwa_exit(struct mlx5_vlan_vmwa_context *ctx);
-void mlx5_vlan_vmwa_release(struct rte_eth_dev *dev,
-			    struct mlx5_vf_vlan *vf_vlan);
-void mlx5_vlan_vmwa_acquire(struct rte_eth_dev *dev,
-			    struct mlx5_vf_vlan *vf_vlan);
-
 /* mlx5_flow_meter.c */
 
 int mlx5_flow_meter_ops_get(struct rte_eth_dev *dev, void *arg);
diff --git a/drivers/net/mlx5/mlx5_nl.c b/drivers/net/mlx5/mlx5_nl.c
index e7ba034..3fe4b6f 100644
--- a/drivers/net/mlx5/mlx5_nl.c
+++ b/drivers/net/mlx5/mlx5_nl.c
@@ -5,7 +5,6 @@
 
 #include <errno.h>
 #include <linux/if_link.h>
-#include <linux/netlink.h>
 #include <linux/rtnetlink.h>
 #include <net/if.h>
 #include <rdma/rdma_netlink.h>
@@ -18,8 +17,6 @@
 #include <unistd.h>
 
 #include <rte_errno.h>
-#include <rte_malloc.h>
-#include <rte_hypervisor.h>
 
 #include "mlx5.h"
 #include "mlx5_utils.h"
@@ -1072,7 +1069,8 @@ struct mlx5_nl_ifindex_data {
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
-mlx5_nl_switch_info(int nl, unsigned int ifindex, struct mlx5_switch_info *info)
+mlx5_nl_switch_info(int nl, unsigned int ifindex,
+		    struct mlx5_switch_info *info)
 {
 	uint32_t seq = random();
 	struct {
@@ -1116,12 +1114,12 @@ struct mlx5_nl_ifindex_data {
  * Delete VLAN network device by ifindex.
  *
  * @param[in] tcf
- *   Context object initialized by mlx5_vlan_vmwa_init().
+ *   Context object initialized by mlx5_nl_vlan_vmwa_init().
  * @param[in] ifindex
  *   Interface index of network device to delete.
  */
-static void
-mlx5_vlan_vmwa_delete(struct mlx5_vlan_vmwa_context *vmwa,
+void
+mlx5_nl_vlan_vmwa_delete(struct mlx5_nl_vlan_vmwa_context *vmwa,
 		      uint32_t ifindex)
 {
 	int ret;
@@ -1196,14 +1194,14 @@ struct mlx5_nl_ifindex_data {
  * Create network VLAN device with specified VLAN tag.
  *
  * @param[in] tcf
- *   Context object initialized by mlx5_vlan_vmwa_init().
+ *   Context object initialized by mlx5_nl_vlan_vmwa_init().
  * @param[in] ifindex
  *   Base network interface index.
  * @param[in] tag
  *   VLAN tag for VLAN network device to create.
  */
-static uint32_t
-mlx5_vlan_vmwa_create(struct mlx5_vlan_vmwa_context *vmwa,
+uint32_t
+mlx5_nl_vlan_vmwa_create(struct mlx5_nl_vlan_vmwa_context *vmwa,
 		      uint32_t ifindex,
 		      uint16_t tag)
 {
@@ -1269,134 +1267,3 @@ struct mlx5_nl_ifindex_data {
 	}
 	return ret;
 }
-
-/*
- * Release VLAN network device, created for VM workaround.
- *
- * @param[in] dev
- *   Ethernet device object, Netlink context provider.
- * @param[in] vlan
- *   Object representing the network device to release.
- */
-void mlx5_vlan_vmwa_release(struct rte_eth_dev *dev,
-			    struct mlx5_vf_vlan *vlan)
-{
-	struct mlx5_priv *priv = dev->data->dev_private;
-	struct mlx5_vlan_vmwa_context *vmwa = priv->vmwa_context;
-	struct mlx5_vlan_dev *vlan_dev = &vmwa->vlan_dev[0];
-
-	assert(vlan->created);
-	assert(priv->vmwa_context);
-	if (!vlan->created || !vmwa)
-		return;
-	vlan->created = 0;
-	assert(vlan_dev[vlan->tag].refcnt);
-	if (--vlan_dev[vlan->tag].refcnt == 0 &&
-	    vlan_dev[vlan->tag].ifindex) {
-		mlx5_vlan_vmwa_delete(vmwa, vlan_dev[vlan->tag].ifindex);
-		vlan_dev[vlan->tag].ifindex = 0;
-	}
-}
-
-/**
- * Acquire VLAN interface with specified tag for VM workaround.
- *
- * @param[in] dev
- *   Ethernet device object, Netlink context provider.
- * @param[in] vlan
- *   Object representing the network device to acquire.
- */
-void mlx5_vlan_vmwa_acquire(struct rte_eth_dev *dev,
-			    struct mlx5_vf_vlan *vlan)
-{
-	struct mlx5_priv *priv = dev->data->dev_private;
-	struct mlx5_vlan_vmwa_context *vmwa = priv->vmwa_context;
-	struct mlx5_vlan_dev *vlan_dev = &vmwa->vlan_dev[0];
-
-	assert(!vlan->created);
-	assert(priv->vmwa_context);
-	if (vlan->created || !vmwa)
-		return;
-	if (vlan_dev[vlan->tag].refcnt == 0) {
-		assert(!vlan_dev[vlan->tag].ifindex);
-		vlan_dev[vlan->tag].ifindex =
-			mlx5_vlan_vmwa_create(vmwa,
-					      vmwa->vf_ifindex,
-					      vlan->tag);
-	}
-	if (vlan_dev[vlan->tag].ifindex) {
-		vlan_dev[vlan->tag].refcnt++;
-		vlan->created = 1;
-	}
-}
-
-/*
- * Create per ethernet device VLAN VM workaround context
- */
-struct mlx5_vlan_vmwa_context *
-mlx5_vlan_vmwa_init(struct rte_eth_dev *dev,
-		    uint32_t ifindex)
-{
-	struct mlx5_priv *priv = dev->data->dev_private;
-	struct mlx5_dev_config *config = &priv->config;
-	struct mlx5_vlan_vmwa_context *vmwa;
-	enum rte_hypervisor hv_type;
-
-	/* Do not engage workaround over PF. */
-	if (!config->vf)
-		return NULL;
-	/* Check whether there is desired virtual environment */
-	hv_type = rte_hypervisor_get();
-	switch (hv_type) {
-	case RTE_HYPERVISOR_UNKNOWN:
-	case RTE_HYPERVISOR_VMWARE:
-		/*
-		 * The "white list" of configurations
-		 * to engage the workaround.
-		 */
-		break;
-	default:
-		/*
-		 * The configuration is not found in the "white list".
-		 * We should not engage the VLAN workaround.
-		 */
-		return NULL;
-	}
-	vmwa = rte_zmalloc(__func__, sizeof(*vmwa), sizeof(uint32_t));
-	if (!vmwa) {
-		DRV_LOG(WARNING,
-			"Can not allocate memory"
-			" for VLAN workaround context");
-		return NULL;
-	}
-	vmwa->nl_socket = mlx5_nl_init(NETLINK_ROUTE);
-	if (vmwa->nl_socket < 0) {
-		DRV_LOG(WARNING,
-			"Can not create Netlink socket"
-			" for VLAN workaround context");
-		rte_free(vmwa);
-		return NULL;
-	}
-	vmwa->nl_sn = random();
-	vmwa->vf_ifindex = ifindex;
-	vmwa->dev = dev;
-	/* Cleanup for existing VLAN devices. */
-	return vmwa;
-}
-
-/*
- * Destroy per ethernet device VLAN VM workaround context
- */
-void mlx5_vlan_vmwa_exit(struct mlx5_vlan_vmwa_context *vmwa)
-{
-	unsigned int i;
-
-	/* Delete all remaining VLAN devices. */
-	for (i = 0; i < RTE_DIM(vmwa->vlan_dev); i++) {
-		if (vmwa->vlan_dev[i].ifindex)
-			mlx5_vlan_vmwa_delete(vmwa, vmwa->vlan_dev[i].ifindex);
-	}
-	if (vmwa->nl_socket >= 0)
-		close(vmwa->nl_socket);
-	rte_free(vmwa);
-}
diff --git a/drivers/net/mlx5/mlx5_nl.h b/drivers/net/mlx5/mlx5_nl.h
new file mode 100644
index 0000000..7903673
--- /dev/null
+++ b/drivers/net/mlx5/mlx5_nl.h
@@ -0,0 +1,69 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2019 Mellanox Technologies, Ltd
+ */
+
+#ifndef RTE_PMD_MLX5_NL_H_
+#define RTE_PMD_MLX5_NL_H_
+
+#include <linux/netlink.h>
+
+
+/* Recognized Infiniband device physical port name types. */
+enum mlx5_nl_phys_port_name_type {
+	MLX5_PHYS_PORT_NAME_TYPE_NOTSET = 0, /* Not set. */
+	MLX5_PHYS_PORT_NAME_TYPE_LEGACY, /* kernel ver < 5.0 */
+	MLX5_PHYS_PORT_NAME_TYPE_UPLINK, /* p0, kernel ver >= 5.0 */
+	MLX5_PHYS_PORT_NAME_TYPE_PFVF, /* pf0vf0, kernel ver >= 5.0 */
+	MLX5_PHYS_PORT_NAME_TYPE_UNKNOWN, /* Unrecognized. */
+};
+
+/** Switch information returned by mlx5_nl_switch_info(). */
+struct mlx5_switch_info {
+	uint32_t master:1; /**< Master device. */
+	uint32_t representor:1; /**< Representor device. */
+	enum mlx5_nl_phys_port_name_type name_type; /**< Port name type. */
+	int32_t pf_num; /**< PF number (valid for pfxvfx format only). */
+	int32_t port_name; /**< Representor port name. */
+	uint64_t switch_id; /**< Switch identifier. */
+};
+
+/* VLAN netdev for VLAN workaround. */
+struct mlx5_nl_vlan_dev {
+	uint32_t refcnt;
+	uint32_t ifindex; /**< Own interface index. */
+};
+
+/*
+ * Array of VLAN devices created on the base of VF
+ * used for workaround in virtual environments.
+ */
+struct mlx5_nl_vlan_vmwa_context {
+	int nl_socket;
+	uint32_t nl_sn;
+	uint32_t vf_ifindex;
+	struct mlx5_nl_vlan_dev vlan_dev[4096];
+};
+
+
+int mlx5_nl_init(int protocol);
+int mlx5_nl_mac_addr_add(struct rte_eth_dev *dev, struct rte_ether_addr *mac,
+			 uint32_t index);
+int mlx5_nl_mac_addr_remove(struct rte_eth_dev *dev, struct rte_ether_addr *mac,
+			    uint32_t index);
+void mlx5_nl_mac_addr_sync(struct rte_eth_dev *dev);
+void mlx5_nl_mac_addr_flush(struct rte_eth_dev *dev);
+int mlx5_nl_promisc(struct rte_eth_dev *dev, int enable);
+int mlx5_nl_allmulti(struct rte_eth_dev *dev, int enable);
+unsigned int mlx5_nl_portnum(int nl, const char *name);
+unsigned int mlx5_nl_ifindex(int nl, const char *name, uint32_t pindex);
+int mlx5_nl_vf_mac_addr_modify(struct rte_eth_dev *dev,
+			       struct rte_ether_addr *mac, int vf_index);
+int mlx5_nl_switch_info(int nl, unsigned int ifindex,
+			struct mlx5_switch_info *info);
+
+void mlx5_nl_vlan_vmwa_delete(struct mlx5_nl_vlan_vmwa_context *vmwa,
+			   uint32_t ifindex);
+uint32_t mlx5_nl_vlan_vmwa_create(struct mlx5_nl_vlan_vmwa_context *vmwa,
+				  uint32_t ifindex, uint16_t tag);
+
+#endif /* RTE_PMD_MLX5_NL_H_ */
diff --git a/drivers/net/mlx5/mlx5_vlan.c b/drivers/net/mlx5/mlx5_vlan.c
index b0fa31a..fb52d8f 100644
--- a/drivers/net/mlx5/mlx5_vlan.c
+++ b/drivers/net/mlx5/mlx5_vlan.c
@@ -7,6 +7,8 @@
 #include <errno.h>
 #include <assert.h>
 #include <stdint.h>
+#include <unistd.h>
+
 
 /*
  * Not needed by this file; included to work around the lack of off_t
@@ -26,6 +28,8 @@
 
 #include <rte_ethdev_driver.h>
 #include <rte_common.h>
+#include <rte_malloc.h>
+#include <rte_hypervisor.h>
 
 #include <mlx5_glue.h>
 #include <mlx5_devx_cmds.h>
@@ -33,6 +37,7 @@
 #include "mlx5.h"
 #include "mlx5_autoconf.h"
 #include "mlx5_rxtx.h"
+#include "mlx5_nl.h"
 #include "mlx5_utils.h"
 
 /**
@@ -193,3 +198,132 @@
 	}
 	return 0;
 }
+
+/*
+ * Release VLAN network device, created for VM workaround.
+ *
+ * @param[in] dev
+ *   Ethernet device object, Netlink context provider.
+ * @param[in] vlan
+ *   Object representing the network device to release.
+ */
+void mlx5_vlan_vmwa_release(struct rte_eth_dev *dev,
+			    struct mlx5_vf_vlan *vlan)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_nl_vlan_vmwa_context *vmwa = priv->vmwa_context;
+	struct mlx5_nl_vlan_dev *vlan_dev = &vmwa->vlan_dev[0];
+
+	assert(vlan->created);
+	assert(priv->vmwa_context);
+	if (!vlan->created || !vmwa)
+		return;
+	vlan->created = 0;
+	assert(vlan_dev[vlan->tag].refcnt);
+	if (--vlan_dev[vlan->tag].refcnt == 0 &&
+	    vlan_dev[vlan->tag].ifindex) {
+		mlx5_nl_vlan_vmwa_delete(vmwa, vlan_dev[vlan->tag].ifindex);
+		vlan_dev[vlan->tag].ifindex = 0;
+	}
+}
+
+/**
+ * Acquire VLAN interface with specified tag for VM workaround.
+ *
+ * @param[in] dev
+ *   Ethernet device object, Netlink context provider.
+ * @param[in] vlan
+ *   Object representing the network device to acquire.
+ */
+void mlx5_vlan_vmwa_acquire(struct rte_eth_dev *dev,
+			    struct mlx5_vf_vlan *vlan)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_nl_vlan_vmwa_context *vmwa = priv->vmwa_context;
+	struct mlx5_nl_vlan_dev *vlan_dev = &vmwa->vlan_dev[0];
+
+	assert(!vlan->created);
+	assert(priv->vmwa_context);
+	if (vlan->created || !vmwa)
+		return;
+	if (vlan_dev[vlan->tag].refcnt == 0) {
+		assert(!vlan_dev[vlan->tag].ifindex);
+		vlan_dev[vlan->tag].ifindex =
+			mlx5_nl_vlan_vmwa_create(vmwa, vmwa->vf_ifindex,
+						 vlan->tag);
+	}
+	if (vlan_dev[vlan->tag].ifindex) {
+		vlan_dev[vlan->tag].refcnt++;
+		vlan->created = 1;
+	}
+}
+
+/*
+ * Create per ethernet device VLAN VM workaround context
+ */
+struct mlx5_nl_vlan_vmwa_context *
+mlx5_vlan_vmwa_init(struct rte_eth_dev *dev, uint32_t ifindex)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_dev_config *config = &priv->config;
+	struct mlx5_nl_vlan_vmwa_context *vmwa;
+	enum rte_hypervisor hv_type;
+
+	/* Do not engage workaround over PF. */
+	if (!config->vf)
+		return NULL;
+	/* Check whether there is desired virtual environment */
+	hv_type = rte_hypervisor_get();
+	switch (hv_type) {
+	case RTE_HYPERVISOR_UNKNOWN:
+	case RTE_HYPERVISOR_VMWARE:
+		/*
+		 * The "white list" of configurations
+		 * to engage the workaround.
+		 */
+		break;
+	default:
+		/*
+		 * The configuration is not found in the "white list".
+		 * We should not engage the VLAN workaround.
+		 */
+		return NULL;
+	}
+	vmwa = rte_zmalloc(__func__, sizeof(*vmwa), sizeof(uint32_t));
+	if (!vmwa) {
+		DRV_LOG(WARNING,
+			"Can not allocate memory"
+			" for VLAN workaround context");
+		return NULL;
+	}
+	vmwa->nl_socket = mlx5_nl_init(NETLINK_ROUTE);
+	if (vmwa->nl_socket < 0) {
+		DRV_LOG(WARNING,
+			"Can not create Netlink socket"
+			" for VLAN workaround context");
+		rte_free(vmwa);
+		return NULL;
+	}
+	vmwa->nl_sn = random();
+	vmwa->vf_ifindex = ifindex;
+	/* Cleanup for existing VLAN devices. */
+	return vmwa;
+}
+
+/*
+ * Destroy per ethernet device VLAN VM workaround context
+ */
+void mlx5_vlan_vmwa_exit(struct mlx5_nl_vlan_vmwa_context *vmwa)
+{
+	unsigned int i;
+
+	/* Delete all remaining VLAN devices. */
+	for (i = 0; i < RTE_DIM(vmwa->vlan_dev); i++) {
+		if (vmwa->vlan_dev[i].ifindex)
+			mlx5_nl_vlan_vmwa_delete(vmwa,
+						 vmwa->vlan_dev[i].ifindex);
+	}
+	if (vmwa->nl_socket >= 0)
+		close(vmwa->nl_socket);
+	rte_free(vmwa);
+}
-- 
1.8.3.1
* [dpdk-dev] [PATCH v3 23/25] net/mlx5: reduce Netlink commands dependencies
  2020-01-28 16:27   ` [dpdk-dev] [PATCH v3 00/25] Introduce mlx5 common library Matan Azrad
                       ` (21 preceding siblings ...)
  2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 22/25] net/mlx5: separate Netlink command interface Matan Azrad
@ 2020-01-28 16:27     ` Matan Azrad
  2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 24/25] common/mlx5: share Netlink commands Matan Azrad
                       ` (2 subsequent siblings)
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-28 16:27 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
In preparation for moving the Netlink commands to the common library,
reduce their dependencies on net/mlx5.
Replace the ethdev class parameters of the commands: pass the Netlink
socket, the interface index and the MAC data directly instead.
Let the mlx5 Netlink code generate the Netlink sequence numbers itself
instead of relying on a per-device counter.
Move mlx5_nl_check_switch_info to mlx5_nl.c since it is the only file
which uses it.
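The two ideas above can be sketched in isolation as follows. This is only
an illustration, not the driver code: it uses C11 <stdatomic.h> where the
patch uses rte_atomic32, and the names nl_sn_counter, nl_sn_generate,
struct nl_request and nl_request_prepare are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical process-wide counter; static storage starts at zero. */
static atomic_uint_fast32_t nl_sn_counter;

/*
 * Return the next sequence number.
 * atomic_fetch_add() returns the previous value, so add 1 to get
 * add-and-return semantics.
 */
static uint32_t
nl_sn_generate(void)
{
	return (uint32_t)(atomic_fetch_add(&nl_sn_counter, 1) + 1);
}

/* Hypothetical request descriptor: plain values only, no ethdev object. */
struct nl_request {
	int nlsk_fd;            /* Netlink socket file descriptor. */
	unsigned int iface_idx; /* Net device interface index. */
	uint32_t sn;            /* Sequence number of this request. */
};

/* Build a request from the socket and the interface index alone. */
static struct nl_request
nl_request_prepare(int nlsk_fd, unsigned int iface_idx)
{
	struct nl_request req = {
		.nlsk_fd = nlsk_fd,
		.iface_idx = iface_idx,
		.sn = nl_sn_generate(),
	};

	return req;
}
```

Since the caller passes only a file descriptor and an interface index, such
a helper has no need for the ethdev private data, which is what allows the
commands to be shared outside net/mlx5.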
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.c        |  10 +-
 drivers/net/mlx5/mlx5.h        |   3 -
 drivers/net/mlx5/mlx5_ethdev.c |  49 ------
 drivers/net/mlx5/mlx5_mac.c    |  14 +-
 drivers/net/mlx5/mlx5_nl.c     | 329 +++++++++++++++++++++++++----------------
 drivers/net/mlx5/mlx5_nl.h     |  23 +--
 drivers/net/mlx5/mlx5_rxmode.c |  12 +-
 drivers/net/mlx5/mlx5_vlan.c   |   1 -
 8 files changed, 236 insertions(+), 205 deletions(-)
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 353196b..439b7b8 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1270,7 +1270,9 @@ struct mlx5_flow_id_pool *
 	if (priv->reta_idx != NULL)
 		rte_free(priv->reta_idx);
 	if (priv->config.vf)
-		mlx5_nl_mac_addr_flush(dev);
+		mlx5_nl_mac_addr_flush(priv->nl_socket_route, mlx5_ifindex(dev),
+				       dev->data->mac_addrs,
+				       MLX5_MAX_MAC_ADDRESSES, priv->mac_own);
 	if (priv->nl_socket_route >= 0)
 		close(priv->nl_socket_route);
 	if (priv->nl_socket_rdma >= 0)
@@ -2327,7 +2329,6 @@ struct mlx5_flow_id_pool *
 	/* Some internal functions rely on Netlink sockets, open them now. */
 	priv->nl_socket_rdma = mlx5_nl_init(NETLINK_RDMA);
 	priv->nl_socket_route =	mlx5_nl_init(NETLINK_ROUTE);
-	priv->nl_sn = 0;
 	priv->representor = !!switch_info->representor;
 	priv->master = !!switch_info->master;
 	priv->domain_id = RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID;
@@ -2647,7 +2648,10 @@ struct mlx5_flow_id_pool *
 	/* Register MAC address. */
 	claim_zero(mlx5_mac_addr_add(eth_dev, &mac, 0, 0));
 	if (config.vf && config.vf_nl_en)
-		mlx5_nl_mac_addr_sync(eth_dev);
+		mlx5_nl_mac_addr_sync(priv->nl_socket_route,
+				      mlx5_ifindex(eth_dev),
+				      eth_dev->data->mac_addrs,
+				      MLX5_MAX_MAC_ADDRESSES);
 	TAILQ_INIT(&priv->flows);
 	TAILQ_INIT(&priv->ctrl_flows);
 	TAILQ_INIT(&priv->flow_meters);
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 01d0051..9864aa7 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -539,7 +539,6 @@ struct mlx5_priv {
 	/* Context for Verbs allocator. */
 	int nl_socket_rdma; /* Netlink socket (NETLINK_RDMA). */
 	int nl_socket_route; /* Netlink socket (NETLINK_ROUTE). */
-	uint32_t nl_sn; /* Netlink message sequence number. */
 	LIST_HEAD(dbrpage, mlx5_devx_dbr_page) dbrpgs; /* Door-bell pages. */
 	struct mlx5_nl_vlan_vmwa_context *vmwa_context; /* VLAN WA context. */
 	struct mlx5_flow_id_pool *qrss_id_pool;
@@ -617,8 +616,6 @@ int mlx5_sysfs_switch_info(unsigned int ifindex,
 			   struct mlx5_switch_info *info);
 void mlx5_sysfs_check_switch_info(bool device_dir,
 				  struct mlx5_switch_info *switch_info);
-void mlx5_nl_check_switch_info(bool nun_vf_set,
-			       struct mlx5_switch_info *switch_info);
 void mlx5_translate_port_name(const char *port_name_in,
 			      struct mlx5_switch_info *port_info_out);
 void mlx5_intr_callback_unregister(const struct rte_intr_handle *handle,
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 2628e64..5484104 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -1891,55 +1891,6 @@ struct mlx5_priv *
 }
 
 /**
- * Analyze gathered port parameters via Netlink to recognize master
- * and representor devices for E-Switch configuration.
- *
- * @param[in] num_vf_set
- *   flag of presence of number of VFs port attribute.
- * @param[inout] switch_info
- *   Port information, including port name as a number and port name
- *   type if recognized
- *
- * @return
- *   master and representor flags are set in switch_info according to
- *   recognized parameters (if any).
- */
-void
-mlx5_nl_check_switch_info(bool num_vf_set,
-			  struct mlx5_switch_info *switch_info)
-{
-	switch (switch_info->name_type) {
-	case MLX5_PHYS_PORT_NAME_TYPE_UNKNOWN:
-		/*
-		 * Name is not recognized, assume the master,
-		 * check the number of VFs key presence.
-		 */
-		switch_info->master = num_vf_set;
-		break;
-	case MLX5_PHYS_PORT_NAME_TYPE_NOTSET:
-		/*
-		 * Name is not set, this assumes the legacy naming
-		 * schema for master, just check if there is a
-		 * number of VFs key.
-		 */
-		switch_info->master = num_vf_set;
-		break;
-	case MLX5_PHYS_PORT_NAME_TYPE_UPLINK:
-		/* New uplink naming schema recognized. */
-		switch_info->master = 1;
-		break;
-	case MLX5_PHYS_PORT_NAME_TYPE_LEGACY:
-		/* Legacy representors naming schema. */
-		switch_info->representor = !num_vf_set;
-		break;
-	case MLX5_PHYS_PORT_NAME_TYPE_PFVF:
-		/* New representors naming schema. */
-		switch_info->representor = 1;
-		break;
-	}
-}
-
-/**
  * Analyze gathered port parameters via sysfs to recognize master
  * and representor devices for E-Switch configuration.
  *
diff --git a/drivers/net/mlx5/mlx5_mac.c b/drivers/net/mlx5/mlx5_mac.c
index a646b90..0ab2a0e 100644
--- a/drivers/net/mlx5/mlx5_mac.c
+++ b/drivers/net/mlx5/mlx5_mac.c
@@ -74,8 +74,9 @@
 	if (rte_is_zero_ether_addr(&dev->data->mac_addrs[index]))
 		return;
 	if (vf)
-		mlx5_nl_mac_addr_remove(dev, &dev->data->mac_addrs[index],
-					index);
+		mlx5_nl_mac_addr_remove(priv->nl_socket_route,
+					mlx5_ifindex(dev), priv->mac_own,
+					&dev->data->mac_addrs[index], index);
 	memset(&dev->data->mac_addrs[index], 0, sizeof(struct rte_ether_addr));
 }
 
@@ -117,7 +118,9 @@
 		return -rte_errno;
 	}
 	if (vf) {
-		int ret = mlx5_nl_mac_addr_add(dev, mac, index);
+		int ret = mlx5_nl_mac_addr_add(priv->nl_socket_route,
+					       mlx5_ifindex(dev), priv->mac_own,
+					       mac, index);
 
 		if (ret)
 			return ret;
@@ -209,8 +212,9 @@
 			if (priv->master == 1) {
 				priv = dev->data->dev_private;
 				return mlx5_nl_vf_mac_addr_modify
-					(&rte_eth_devices[port_id],
-					 mac_addr, priv->representor_id);
+				       (priv->nl_socket_route,
+					mlx5_ifindex(&rte_eth_devices[port_id]),
+					mac_addr, priv->representor_id);
 			}
 		}
 		rte_errno = -ENOTSUP;
diff --git a/drivers/net/mlx5/mlx5_nl.c b/drivers/net/mlx5/mlx5_nl.c
index 3fe4b6f..6b8ca00 100644
--- a/drivers/net/mlx5/mlx5_nl.c
+++ b/drivers/net/mlx5/mlx5_nl.c
@@ -17,8 +17,11 @@
 #include <unistd.h>
 
 #include <rte_errno.h>
+#include <rte_atomic.h>
+#include <rte_ether.h>
 
 #include "mlx5.h"
+#include "mlx5_nl.h"
 #include "mlx5_utils.h"
 
 /* Size of the buffer to receive kernel messages */
@@ -109,6 +112,11 @@ struct mlx5_nl_ifindex_data {
 	uint32_t portnum; /**< IB device max port number (out). */
 };
 
+rte_atomic32_t atomic_sn = RTE_ATOMIC32_INIT(0);
+
+/* Generate Netlink sequence number. */
+#define MLX5_NL_SN_GENERATE ((uint32_t)rte_atomic32_add_return(&atomic_sn, 1))
+
 /**
  * Opens a Netlink socket.
  *
@@ -369,8 +377,10 @@ struct mlx5_nl_ifindex_data {
 /**
  * Get bridge MAC addresses.
  *
- * @param dev
- *   Pointer to Ethernet device.
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
  * @param mac[out]
  *   Pointer to the array table of MAC addresses to fill.
  *   Its size should be of MLX5_MAX_MAC_ADDRESSES.
@@ -381,11 +391,9 @@ struct mlx5_nl_ifindex_data {
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-mlx5_nl_mac_addr_list(struct rte_eth_dev *dev, struct rte_ether_addr (*mac)[],
-		      int *mac_n)
+mlx5_nl_mac_addr_list(int nlsk_fd, unsigned int iface_idx,
+		      struct rte_ether_addr (*mac)[], int *mac_n)
 {
-	struct mlx5_priv *priv = dev->data->dev_private;
-	unsigned int iface_idx = mlx5_ifindex(dev);
 	struct {
 		struct nlmsghdr	hdr;
 		struct ifinfomsg ifm;
@@ -404,33 +412,33 @@ struct mlx5_nl_ifindex_data {
 		.mac = mac,
 		.mac_n = 0,
 	};
-	int fd;
+	uint32_t sn = MLX5_NL_SN_GENERATE;
 	int ret;
-	uint32_t sn = priv->nl_sn++;
 
-	if (priv->nl_socket_route == -1)
+	if (nlsk_fd == -1)
 		return 0;
-	fd = priv->nl_socket_route;
-	ret = mlx5_nl_request(fd, &req.hdr, sn, &req.ifm,
+	ret = mlx5_nl_request(nlsk_fd, &req.hdr, sn, &req.ifm,
 			      sizeof(struct ifinfomsg));
 	if (ret < 0)
 		goto error;
-	ret = mlx5_nl_recv(fd, sn, mlx5_nl_mac_addr_cb, &data);
+	ret = mlx5_nl_recv(nlsk_fd, sn, mlx5_nl_mac_addr_cb, &data);
 	if (ret < 0)
 		goto error;
 	*mac_n = data.mac_n;
 	return 0;
 error:
-	DRV_LOG(DEBUG, "port %u cannot retrieve MAC address list %s",
-		dev->data->port_id, strerror(rte_errno));
+	DRV_LOG(DEBUG, "Interface %u cannot retrieve MAC address list %s",
+		iface_idx, strerror(rte_errno));
 	return -rte_errno;
 }
 
 /**
  * Modify the MAC address neighbour table with Netlink.
  *
- * @param dev
- *   Pointer to Ethernet device.
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
  * @param mac
  *   MAC address to consider.
  * @param add
@@ -440,11 +448,9 @@ struct mlx5_nl_ifindex_data {
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-mlx5_nl_mac_addr_modify(struct rte_eth_dev *dev, struct rte_ether_addr *mac,
-			int add)
+mlx5_nl_mac_addr_modify(int nlsk_fd, unsigned int iface_idx,
+			struct rte_ether_addr *mac, int add)
 {
-	struct mlx5_priv *priv = dev->data->dev_private;
-	unsigned int iface_idx = mlx5_ifindex(dev);
 	struct {
 		struct nlmsghdr hdr;
 		struct ndmsg ndm;
@@ -468,28 +474,26 @@ struct mlx5_nl_ifindex_data {
 			.rta_len = RTA_LENGTH(RTE_ETHER_ADDR_LEN),
 		},
 	};
-	int fd;
+	uint32_t sn = MLX5_NL_SN_GENERATE;
 	int ret;
-	uint32_t sn = priv->nl_sn++;
 
-	if (priv->nl_socket_route == -1)
+	if (nlsk_fd == -1)
 		return 0;
-	fd = priv->nl_socket_route;
 	memcpy(RTA_DATA(&req.rta), mac, RTE_ETHER_ADDR_LEN);
 	req.hdr.nlmsg_len = NLMSG_ALIGN(req.hdr.nlmsg_len) +
 		RTA_ALIGN(req.rta.rta_len);
-	ret = mlx5_nl_send(fd, &req.hdr, sn);
+	ret = mlx5_nl_send(nlsk_fd, &req.hdr, sn);
 	if (ret < 0)
 		goto error;
-	ret = mlx5_nl_recv(fd, sn, NULL, NULL);
+	ret = mlx5_nl_recv(nlsk_fd, sn, NULL, NULL);
 	if (ret < 0)
 		goto error;
 	return 0;
 error:
 	DRV_LOG(DEBUG,
-		"port %u cannot %s MAC address %02X:%02X:%02X:%02X:%02X:%02X"
-		" %s",
-		dev->data->port_id,
+		"Interface %u cannot %s MAC address"
+		" %02X:%02X:%02X:%02X:%02X:%02X %s",
+		iface_idx,
 		add ? "add" : "remove",
 		mac->addr_bytes[0], mac->addr_bytes[1],
 		mac->addr_bytes[2], mac->addr_bytes[3],
@@ -501,8 +505,10 @@ struct mlx5_nl_ifindex_data {
 /**
  * Modify the VF MAC address neighbour table with Netlink.
  *
- * @param dev
- *    Pointer to Ethernet device.
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
  * @param mac
  *    MAC address to consider.
  * @param vf_index
@@ -512,12 +518,10 @@ struct mlx5_nl_ifindex_data {
  *    0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
-mlx5_nl_vf_mac_addr_modify(struct rte_eth_dev *dev,
+mlx5_nl_vf_mac_addr_modify(int nlsk_fd, unsigned int iface_idx,
 			   struct rte_ether_addr *mac, int vf_index)
 {
-	int fd, ret;
-	struct mlx5_priv *priv = dev->data->dev_private;
-	unsigned int iface_idx = mlx5_ifindex(dev);
+	int ret;
 	struct {
 		struct nlmsghdr hdr;
 		struct ifinfomsg ifm;
@@ -546,10 +550,10 @@ struct mlx5_nl_ifindex_data {
 			.rta_type = IFLA_VF_MAC,
 		},
 	};
-	uint32_t sn = priv->nl_sn++;
 	struct ifla_vf_mac ivm = {
 		.vf = vf_index,
 	};
+	uint32_t sn = MLX5_NL_SN_GENERATE;
 
 	memcpy(&ivm.mac, mac, RTE_ETHER_ADDR_LEN);
 	memcpy(RTA_DATA(&req.vf_mac_rta), &ivm, sizeof(ivm));
@@ -564,13 +568,12 @@ struct mlx5_nl_ifindex_data {
 	req.vf_info_rta.rta_len = RTE_PTR_DIFF(NLMSG_TAIL(&req.hdr),
 					       &req.vf_info_rta);
 
-	fd = priv->nl_socket_route;
-	if (fd < 0)
+	if (nlsk_fd < 0)
 		return -1;
-	ret = mlx5_nl_send(fd, &req.hdr, sn);
+	ret = mlx5_nl_send(nlsk_fd, &req.hdr, sn);
 	if (ret < 0)
 		goto error;
-	ret = mlx5_nl_recv(fd, sn, NULL, NULL);
+	ret = mlx5_nl_recv(nlsk_fd, sn, NULL, NULL);
 	if (ret < 0)
 		goto error;
 	return 0;
@@ -589,8 +592,12 @@ struct mlx5_nl_ifindex_data {
 /**
  * Add a MAC address.
  *
- * @param dev
- *   Pointer to Ethernet device.
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
+ * @param mac_own
+ *   BITFIELD_DECLARE array to store the mac.
  * @param mac
  *   MAC address to register.
  * @param index
@@ -600,15 +607,15 @@ struct mlx5_nl_ifindex_data {
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
-mlx5_nl_mac_addr_add(struct rte_eth_dev *dev, struct rte_ether_addr *mac,
+mlx5_nl_mac_addr_add(int nlsk_fd, unsigned int iface_idx,
+		     uint64_t *mac_own, struct rte_ether_addr *mac,
 		     uint32_t index)
 {
-	struct mlx5_priv *priv = dev->data->dev_private;
 	int ret;
 
-	ret = mlx5_nl_mac_addr_modify(dev, mac, 1);
+	ret = mlx5_nl_mac_addr_modify(nlsk_fd, iface_idx, mac, 1);
 	if (!ret)
-		BITFIELD_SET(priv->mac_own, index);
+		BITFIELD_SET(mac_own, index);
 	if (ret == -EEXIST)
 		return 0;
 	return ret;
@@ -617,8 +624,12 @@ struct mlx5_nl_ifindex_data {
 /**
  * Remove a MAC address.
  *
- * @param dev
- *   Pointer to Ethernet device.
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
+ * @param mac_own
+ *   BITFIELD_DECLARE array to store the mac.
  * @param mac
  *   MAC address to remove.
  * @param index
@@ -628,46 +639,50 @@ struct mlx5_nl_ifindex_data {
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
-mlx5_nl_mac_addr_remove(struct rte_eth_dev *dev, struct rte_ether_addr *mac,
-			uint32_t index)
+mlx5_nl_mac_addr_remove(int nlsk_fd, unsigned int iface_idx, uint64_t *mac_own,
+			struct rte_ether_addr *mac, uint32_t index)
 {
-	struct mlx5_priv *priv = dev->data->dev_private;
-
-	BITFIELD_RESET(priv->mac_own, index);
-	return mlx5_nl_mac_addr_modify(dev, mac, 0);
+	BITFIELD_RESET(mac_own, index);
+	return mlx5_nl_mac_addr_modify(nlsk_fd, iface_idx, mac, 0);
 }
 
 /**
  * Synchronize Netlink bridge table to the internal table.
  *
- * @param dev
- *   Pointer to Ethernet device.
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
+ * @param mac_addrs
+ *   Mac addresses array to sync.
+ * @param n
+ *   @p mac_addrs array size.
  */
 void
-mlx5_nl_mac_addr_sync(struct rte_eth_dev *dev)
+mlx5_nl_mac_addr_sync(int nlsk_fd, unsigned int iface_idx,
+		      struct rte_ether_addr *mac_addrs, int n)
 {
-	struct rte_ether_addr macs[MLX5_MAX_MAC_ADDRESSES];
+	struct rte_ether_addr macs[n];
 	int macs_n = 0;
 	int i;
 	int ret;
 
-	ret = mlx5_nl_mac_addr_list(dev, &macs, &macs_n);
+	ret = mlx5_nl_mac_addr_list(nlsk_fd, iface_idx, &macs, &macs_n);
 	if (ret)
 		return;
 	for (i = 0; i != macs_n; ++i) {
 		int j;
 
 		/* Verify the address is not in the array yet. */
-		for (j = 0; j != MLX5_MAX_MAC_ADDRESSES; ++j)
-			if (rte_is_same_ether_addr(&macs[i],
-					       &dev->data->mac_addrs[j]))
+		for (j = 0; j != n; ++j)
+			if (rte_is_same_ether_addr(&macs[i], &mac_addrs[j]))
 				break;
-		if (j != MLX5_MAX_MAC_ADDRESSES)
+		if (j != n)
 			continue;
 		/* Find the first entry available. */
-		for (j = 0; j != MLX5_MAX_MAC_ADDRESSES; ++j) {
-			if (rte_is_zero_ether_addr(&dev->data->mac_addrs[j])) {
-				dev->data->mac_addrs[j] = macs[i];
+		for (j = 0; j != n; ++j) {
+			if (rte_is_zero_ether_addr(&mac_addrs[j])) {
+				mac_addrs[j] = macs[i];
 				break;
 			}
 		}
@@ -677,28 +692,40 @@ struct mlx5_nl_ifindex_data {
 /**
  * Flush all added MAC addresses.
  *
- * @param dev
- *   Pointer to Ethernet device.
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
+ * @param[in] mac_addrs
+ *   Mac addresses array to flush.
+ * @param n
+ *   @p mac_addrs array size.
+ * @param mac_own
+ *   BITFIELD_DECLARE array to store the mac.
  */
 void
-mlx5_nl_mac_addr_flush(struct rte_eth_dev *dev)
+mlx5_nl_mac_addr_flush(int nlsk_fd, unsigned int iface_idx,
+		       struct rte_ether_addr *mac_addrs, int n,
+		       uint64_t *mac_own)
 {
-	struct mlx5_priv *priv = dev->data->dev_private;
 	int i;
 
-	for (i = MLX5_MAX_MAC_ADDRESSES - 1; i >= 0; --i) {
-		struct rte_ether_addr *m = &dev->data->mac_addrs[i];
+	for (i = n - 1; i >= 0; --i) {
+		struct rte_ether_addr *m = &mac_addrs[i];
 
-		if (BITFIELD_ISSET(priv->mac_own, i))
-			mlx5_nl_mac_addr_remove(dev, m, i);
+		if (BITFIELD_ISSET(mac_own, i))
+			mlx5_nl_mac_addr_remove(nlsk_fd, iface_idx, mac_own, m,
+						i);
 	}
 }
 
 /**
  * Enable promiscuous / all multicast mode through Netlink.
  *
- * @param dev
- *   Pointer to Ethernet device structure.
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
  * @param flags
  *   IFF_PROMISC for promiscuous, IFF_ALLMULTI for allmulti.
  * @param enable
@@ -708,10 +735,9 @@ struct mlx5_nl_ifindex_data {
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-mlx5_nl_device_flags(struct rte_eth_dev *dev, uint32_t flags, int enable)
+mlx5_nl_device_flags(int nlsk_fd, unsigned int iface_idx, uint32_t flags,
+		     int enable)
 {
-	struct mlx5_priv *priv = dev->data->dev_private;
-	unsigned int iface_idx = mlx5_ifindex(dev);
 	struct {
 		struct nlmsghdr hdr;
 		struct ifinfomsg ifi;
@@ -727,14 +753,13 @@ struct mlx5_nl_ifindex_data {
 			.ifi_index = iface_idx,
 		},
 	};
-	int fd;
+	uint32_t sn = MLX5_NL_SN_GENERATE;
 	int ret;
 
 	assert(!(flags & ~(IFF_PROMISC | IFF_ALLMULTI)));
-	if (priv->nl_socket_route < 0)
+	if (nlsk_fd < 0)
 		return 0;
-	fd = priv->nl_socket_route;
-	ret = mlx5_nl_send(fd, &req.hdr, priv->nl_sn++);
+	ret = mlx5_nl_send(nlsk_fd, &req.hdr, sn);
 	if (ret < 0)
 		return ret;
 	return 0;
@@ -743,8 +768,10 @@ struct mlx5_nl_ifindex_data {
 /**
  * Enable promiscuous mode through Netlink.
  *
- * @param dev
- *   Pointer to Ethernet device structure.
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
  * @param enable
  *   Nonzero to enable, disable otherwise.
  *
@@ -752,14 +779,14 @@ struct mlx5_nl_ifindex_data {
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
-mlx5_nl_promisc(struct rte_eth_dev *dev, int enable)
+mlx5_nl_promisc(int nlsk_fd, unsigned int iface_idx, int enable)
 {
-	int ret = mlx5_nl_device_flags(dev, IFF_PROMISC, enable);
+	int ret = mlx5_nl_device_flags(nlsk_fd, iface_idx, IFF_PROMISC, enable);
 
 	if (ret)
 		DRV_LOG(DEBUG,
-			"port %u cannot %s promisc mode: Netlink error %s",
-			dev->data->port_id, enable ? "enable" : "disable",
+			"Interface %u cannot %s promisc mode: Netlink error %s",
+			iface_idx, enable ? "enable" : "disable",
 			strerror(rte_errno));
 	return ret;
 }
@@ -767,8 +794,10 @@ struct mlx5_nl_ifindex_data {
 /**
  * Enable all multicast mode through Netlink.
  *
- * @param dev
- *   Pointer to Ethernet device structure.
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
  * @param enable
  *   Nonzero to enable, disable otherwise.
  *
@@ -776,14 +805,15 @@ struct mlx5_nl_ifindex_data {
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
-mlx5_nl_allmulti(struct rte_eth_dev *dev, int enable)
+mlx5_nl_allmulti(int nlsk_fd, unsigned int iface_idx, int enable)
 {
-	int ret = mlx5_nl_device_flags(dev, IFF_ALLMULTI, enable);
+	int ret = mlx5_nl_device_flags(nlsk_fd, iface_idx, IFF_ALLMULTI,
+				       enable);
 
 	if (ret)
 		DRV_LOG(DEBUG,
-			"port %u cannot %s allmulti mode: Netlink error %s",
-			dev->data->port_id, enable ? "enable" : "disable",
+			"Interface %u cannot %s allmulti: Netlink error %s",
+			iface_idx, enable ? "enable" : "disable",
 			strerror(rte_errno));
 	return ret;
 }
@@ -879,7 +909,6 @@ struct mlx5_nl_ifindex_data {
 unsigned int
 mlx5_nl_ifindex(int nl, const char *name, uint32_t pindex)
 {
-	uint32_t seq = random();
 	struct mlx5_nl_ifindex_data data = {
 		.name = name,
 		.flags = 0,
@@ -900,19 +929,20 @@ struct mlx5_nl_ifindex_data {
 		},
 	};
 	struct nlattr *na;
+	uint32_t sn = MLX5_NL_SN_GENERATE;
 	int ret;
 
-	ret = mlx5_nl_send(nl, &req.nh, seq);
+	ret = mlx5_nl_send(nl, &req.nh, sn);
 	if (ret < 0)
 		return 0;
-	ret = mlx5_nl_recv(nl, seq, mlx5_nl_cmdget_cb, &data);
+	ret = mlx5_nl_recv(nl, sn, mlx5_nl_cmdget_cb, &data);
 	if (ret < 0)
 		return 0;
 	if (!(data.flags & MLX5_NL_CMD_GET_IB_NAME) ||
 	    !(data.flags & MLX5_NL_CMD_GET_IB_INDEX))
 		goto error;
 	data.flags = 0;
-	++seq;
+	sn = MLX5_NL_SN_GENERATE;
 	req.nh.nlmsg_type = RDMA_NL_GET_TYPE(RDMA_NL_NLDEV,
 					     RDMA_NLDEV_CMD_PORT_GET);
 	req.nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
@@ -927,10 +957,10 @@ struct mlx5_nl_ifindex_data {
 	na->nla_type = RDMA_NLDEV_ATTR_PORT_INDEX;
 	memcpy((void *)((uintptr_t)na + NLA_HDRLEN),
 	       &pindex, sizeof(pindex));
-	ret = mlx5_nl_send(nl, &req.nh, seq);
+	ret = mlx5_nl_send(nl, &req.nh, sn);
 	if (ret < 0)
 		return 0;
-	ret = mlx5_nl_recv(nl, seq, mlx5_nl_cmdget_cb, &data);
+	ret = mlx5_nl_recv(nl, sn, mlx5_nl_cmdget_cb, &data);
 	if (ret < 0)
 		return 0;
 	if (!(data.flags & MLX5_NL_CMD_GET_IB_NAME) ||
@@ -959,7 +989,6 @@ struct mlx5_nl_ifindex_data {
 unsigned int
 mlx5_nl_portnum(int nl, const char *name)
 {
-	uint32_t seq = random();
 	struct mlx5_nl_ifindex_data data = {
 		.flags = 0,
 		.name = name,
@@ -972,12 +1001,13 @@ struct mlx5_nl_ifindex_data {
 					       RDMA_NLDEV_CMD_GET),
 		.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK | NLM_F_DUMP,
 	};
+	uint32_t sn = MLX5_NL_SN_GENERATE;
 	int ret;
 
-	ret = mlx5_nl_send(nl, &req, seq);
+	ret = mlx5_nl_send(nl, &req, sn);
 	if (ret < 0)
 		return 0;
-	ret = mlx5_nl_recv(nl, seq, mlx5_nl_cmdget_cb, &data);
+	ret = mlx5_nl_recv(nl, sn, mlx5_nl_cmdget_cb, &data);
 	if (ret < 0)
 		return 0;
 	if (!(data.flags & MLX5_NL_CMD_GET_IB_NAME) ||
@@ -992,6 +1022,55 @@ struct mlx5_nl_ifindex_data {
 }
 
 /**
+ * Analyze gathered port parameters via Netlink to recognize master
+ * and representor devices for E-Switch configuration.
+ *
+ * @param[in] num_vf_set
+ *   flag of presence of number of VFs port attribute.
+ * @param[inout] switch_info
+ *   Port information, including port name as a number and port name
+ *   type if recognized
+ *
+ * @return
+ *   master and representor flags are set in switch_info according to
+ *   recognized parameters (if any).
+ */
+static void
+mlx5_nl_check_switch_info(bool num_vf_set,
+			  struct mlx5_switch_info *switch_info)
+{
+	switch (switch_info->name_type) {
+	case MLX5_PHYS_PORT_NAME_TYPE_UNKNOWN:
+		/*
+		 * Name is not recognized, assume the master,
+		 * check the number of VFs key presence.
+		 */
+		switch_info->master = num_vf_set;
+		break;
+	case MLX5_PHYS_PORT_NAME_TYPE_NOTSET:
+		/*
+		 * Name is not set, this assumes the legacy naming
+		 * schema for master, just check if there is a
+		 * number of VFs key.
+		 */
+		switch_info->master = num_vf_set;
+		break;
+	case MLX5_PHYS_PORT_NAME_TYPE_UPLINK:
+		/* New uplink naming schema recognized. */
+		switch_info->master = 1;
+		break;
+	case MLX5_PHYS_PORT_NAME_TYPE_LEGACY:
+		/* Legacy representors naming schema. */
+		switch_info->representor = !num_vf_set;
+		break;
+	case MLX5_PHYS_PORT_NAME_TYPE_PFVF:
+		/* New representors naming schema. */
+		switch_info->representor = 1;
+		break;
+	}
+}
+
+/**
  * Process switch information from Netlink message.
  *
  * @param nh
@@ -1072,7 +1151,6 @@ struct mlx5_nl_ifindex_data {
 mlx5_nl_switch_info(int nl, unsigned int ifindex,
 		    struct mlx5_switch_info *info)
 {
-	uint32_t seq = random();
 	struct {
 		struct nlmsghdr nh;
 		struct ifinfomsg info;
@@ -1096,11 +1174,12 @@ struct mlx5_nl_ifindex_data {
 		},
 		.extmask = RTE_LE32(1),
 	};
+	uint32_t sn = MLX5_NL_SN_GENERATE;
 	int ret;
 
-	ret = mlx5_nl_send(nl, &req.nh, seq);
+	ret = mlx5_nl_send(nl, &req.nh, sn);
 	if (ret >= 0)
-		ret = mlx5_nl_recv(nl, seq, mlx5_nl_switch_info_cb, info);
+		ret = mlx5_nl_recv(nl, sn, mlx5_nl_switch_info_cb, info);
 	if (info->master && info->representor) {
 		DRV_LOG(ERR, "ifindex %u device is recognized as master"
 			     " and as representor", ifindex);
@@ -1122,6 +1201,7 @@ struct mlx5_nl_ifindex_data {
 mlx5_nl_vlan_vmwa_delete(struct mlx5_nl_vlan_vmwa_context *vmwa,
 		      uint32_t ifindex)
 {
+	uint32_t sn = MLX5_NL_SN_GENERATE;
 	int ret;
 	struct {
 		struct nlmsghdr nh;
@@ -1139,18 +1219,12 @@ struct mlx5_nl_ifindex_data {
 	};
 
 	if (ifindex) {
-		++vmwa->nl_sn;
-		if (!vmwa->nl_sn)
-			++vmwa->nl_sn;
-		ret = mlx5_nl_send(vmwa->nl_socket, &req.nh, vmwa->nl_sn);
+		ret = mlx5_nl_send(vmwa->nl_socket, &req.nh, sn);
 		if (ret >= 0)
-			ret = mlx5_nl_recv(vmwa->nl_socket,
-					   vmwa->nl_sn,
-					   NULL, NULL);
+			ret = mlx5_nl_recv(vmwa->nl_socket, sn, NULL, NULL);
 		if (ret < 0)
-			DRV_LOG(WARNING, "netlink: error deleting"
-					 " VLAN WA ifindex %u, %d",
-					 ifindex, ret);
+			DRV_LOG(WARNING, "netlink: error deleting VLAN WA"
+				" ifindex %u, %d", ifindex, ret);
 	}
 }
 
@@ -1202,8 +1276,7 @@ struct mlx5_nl_ifindex_data {
  */
 uint32_t
 mlx5_nl_vlan_vmwa_create(struct mlx5_nl_vlan_vmwa_context *vmwa,
-		      uint32_t ifindex,
-		      uint16_t tag)
+			 uint32_t ifindex, uint16_t tag)
 {
 	struct nlmsghdr *nlh;
 	struct ifinfomsg *ifm;
@@ -1220,12 +1293,10 @@ struct mlx5_nl_ifindex_data {
 		    NLMSG_ALIGN(sizeof(uint16_t)) + 16];
 	struct nlattr *na_info;
 	struct nlattr *na_vlan;
+	uint32_t sn = MLX5_NL_SN_GENERATE;
 	int ret;
 
 	memset(buf, 0, sizeof(buf));
-	++vmwa->nl_sn;
-	if (!vmwa->nl_sn)
-		++vmwa->nl_sn;
 	nlh = (struct nlmsghdr *)buf;
 	nlh->nlmsg_len = sizeof(struct nlmsghdr);
 	nlh->nlmsg_type = RTM_NEWLINK;
@@ -1249,20 +1320,18 @@ struct mlx5_nl_ifindex_data {
 	nl_attr_nest_end(nlh, na_vlan);
 	nl_attr_nest_end(nlh, na_info);
 	assert(sizeof(buf) >= nlh->nlmsg_len);
-	ret = mlx5_nl_send(vmwa->nl_socket, nlh, vmwa->nl_sn);
+	ret = mlx5_nl_send(vmwa->nl_socket, nlh, sn);
 	if (ret >= 0)
-		ret = mlx5_nl_recv(vmwa->nl_socket, vmwa->nl_sn, NULL, NULL);
+		ret = mlx5_nl_recv(vmwa->nl_socket, sn, NULL, NULL);
 	if (ret < 0) {
-		DRV_LOG(WARNING,
-			"netlink: VLAN %s create failure (%d)",
-			name, ret);
+		DRV_LOG(WARNING, "netlink: VLAN %s create failure (%d)", name,
+			ret);
 	}
 	/* Try to get ifindex of created or pre-existing device. */
 	ret = if_nametoindex(name);
 	if (!ret) {
-		DRV_LOG(WARNING,
-			"VLAN %s failed to get index (%d)",
-			name, errno);
+		DRV_LOG(WARNING, "VLAN %s failed to get index (%d)", name,
+			errno);
 		return 0;
 	}
 	return ret;
diff --git a/drivers/net/mlx5/mlx5_nl.h b/drivers/net/mlx5/mlx5_nl.h
index 7903673..9be87c0 100644
--- a/drivers/net/mlx5/mlx5_nl.h
+++ b/drivers/net/mlx5/mlx5_nl.h
@@ -39,30 +39,33 @@ struct mlx5_nl_vlan_dev {
  */
 struct mlx5_nl_vlan_vmwa_context {
 	int nl_socket;
-	uint32_t nl_sn;
 	uint32_t vf_ifindex;
 	struct mlx5_nl_vlan_dev vlan_dev[4096];
 };
 
 
 int mlx5_nl_init(int protocol);
-int mlx5_nl_mac_addr_add(struct rte_eth_dev *dev, struct rte_ether_addr *mac,
-			 uint32_t index);
-int mlx5_nl_mac_addr_remove(struct rte_eth_dev *dev, struct rte_ether_addr *mac,
+int mlx5_nl_mac_addr_add(int nlsk_fd, unsigned int iface_idx, uint64_t *mac_own,
+			 struct rte_ether_addr *mac, uint32_t index);
+int mlx5_nl_mac_addr_remove(int nlsk_fd, unsigned int iface_idx,
+			    uint64_t *mac_own, struct rte_ether_addr *mac,
 			    uint32_t index);
-void mlx5_nl_mac_addr_sync(struct rte_eth_dev *dev);
-void mlx5_nl_mac_addr_flush(struct rte_eth_dev *dev);
-int mlx5_nl_promisc(struct rte_eth_dev *dev, int enable);
-int mlx5_nl_allmulti(struct rte_eth_dev *dev, int enable);
+void mlx5_nl_mac_addr_sync(int nlsk_fd, unsigned int iface_idx,
+			   struct rte_ether_addr *mac_addrs, int n);
+void mlx5_nl_mac_addr_flush(int nlsk_fd, unsigned int iface_idx,
+			    struct rte_ether_addr *mac_addrs, int n,
+			    uint64_t *mac_own);
+int mlx5_nl_promisc(int nlsk_fd, unsigned int iface_idx, int enable);
+int mlx5_nl_allmulti(int nlsk_fd, unsigned int iface_idx, int enable);
 unsigned int mlx5_nl_portnum(int nl, const char *name);
 unsigned int mlx5_nl_ifindex(int nl, const char *name, uint32_t pindex);
-int mlx5_nl_vf_mac_addr_modify(struct rte_eth_dev *dev,
+int mlx5_nl_vf_mac_addr_modify(int nlsk_fd, unsigned int iface_idx,
 			       struct rte_ether_addr *mac, int vf_index);
 int mlx5_nl_switch_info(int nl, unsigned int ifindex,
 			struct mlx5_switch_info *info);
 
 void mlx5_nl_vlan_vmwa_delete(struct mlx5_nl_vlan_vmwa_context *vmwa,
-			   uint32_t ifindex);
+			      uint32_t ifindex);
 uint32_t mlx5_nl_vlan_vmwa_create(struct mlx5_nl_vlan_vmwa_context *vmwa,
 				  uint32_t ifindex, uint16_t tag);
 
diff --git a/drivers/net/mlx5/mlx5_rxmode.c b/drivers/net/mlx5/mlx5_rxmode.c
index 760cc2f..84c8b05 100644
--- a/drivers/net/mlx5/mlx5_rxmode.c
+++ b/drivers/net/mlx5/mlx5_rxmode.c
@@ -47,7 +47,8 @@
 		return 0;
 	}
 	if (priv->config.vf) {
-		ret = mlx5_nl_promisc(dev, 1);
+		ret = mlx5_nl_promisc(priv->nl_socket_route, mlx5_ifindex(dev),
+				      1);
 		if (ret)
 			return ret;
 	}
@@ -80,7 +81,8 @@
 
 	dev->data->promiscuous = 0;
 	if (priv->config.vf) {
-		ret = mlx5_nl_promisc(dev, 0);
+		ret = mlx5_nl_promisc(priv->nl_socket_route, mlx5_ifindex(dev),
+				      0);
 		if (ret)
 			return ret;
 	}
@@ -120,7 +122,8 @@
 		return 0;
 	}
 	if (priv->config.vf) {
-		ret = mlx5_nl_allmulti(dev, 1);
+		ret = mlx5_nl_allmulti(priv->nl_socket_route, mlx5_ifindex(dev),
+				       1);
 		if (ret)
 			goto error;
 	}
@@ -153,7 +156,8 @@
 
 	dev->data->all_multicast = 0;
 	if (priv->config.vf) {
-		ret = mlx5_nl_allmulti(dev, 0);
+		ret = mlx5_nl_allmulti(priv->nl_socket_route, mlx5_ifindex(dev),
+				       0);
 		if (ret)
 			goto error;
 	}
diff --git a/drivers/net/mlx5/mlx5_vlan.c b/drivers/net/mlx5/mlx5_vlan.c
index fb52d8f..fc1a91c 100644
--- a/drivers/net/mlx5/mlx5_vlan.c
+++ b/drivers/net/mlx5/mlx5_vlan.c
@@ -304,7 +304,6 @@ struct mlx5_nl_vlan_vmwa_context *
 		rte_free(vmwa);
 		return NULL;
 	}
-	vmwa->nl_sn = random();
 	vmwa->vf_ifindex = ifindex;
 	/* Cleanup for existing VLAN devices. */
 	return vmwa;
-- 
1.8.3.1
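
Note on the sequence number change above: the hunks drop random() and the
per-context nl_sn counter, and each request now takes its number from
MLX5_NL_SN_GENERATE, passing the same value to the send and the receive
step. A minimal sketch of that idea follows. It uses C11 <stdatomic.h> as a
stand-in for the DPDK atomic helper, and demo_nl_sn, demo_nl_sn_generate and
demo_pair_is_distinct are hypothetical names used for illustration only.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Stand-in for a process-wide counter (hypothetical name, starts at 0). */
static atomic_uint_fast32_t demo_nl_sn;

/*
 * Return a fresh sequence number. atomic_fetch_add() yields the previous
 * value, so 1 is added to mirror an "add and return the new value" helper.
 */
static uint32_t
demo_nl_sn_generate(void)
{
	return (uint32_t)(atomic_fetch_add(&demo_nl_sn, 1) + 1);
}

/*
 * Usage: each request draws its own number once and reuses it for both
 * the send and the receive step, so replies can be matched to requests.
 */
static int
demo_pair_is_distinct(void)
{
	uint32_t first = demo_nl_sn_generate();
	uint32_t second = demo_nl_sn_generate();

	assert(first != second);
	return second == (uint32_t)(first + 1);
}
```

With a shared counter, callers no longer need to carry sequence state in
their own context structures, which is presumably why nl_sn can be removed
from struct mlx5_nl_vlan_vmwa_context.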
* [dpdk-dev] [PATCH v3 24/25] common/mlx5: share Netlink commands
From: Matan Azrad @ 2020-01-28 16:27 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
Move the Netlink mechanism and its dependencies from net/mlx5 to
common/mlx5 so that other mlx5 drivers can use them.
The dependencies are the BITFIELD defines, the ppc64 compilation
workaround for the bool type and the function mlx5_translate_port_name.
Update the build mechanism accordingly.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
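
Note on mlx5_translate_port_name: the function moved by this patch tries
three name formats in order (pf<N>vf<M>, then p<N>, then a plain number) and
falls back to "unknown". The standalone sketch below reproduces that order
for experimentation outside the driver. The demo_* enum, struct and function
names are hypothetical stand-ins, not the driver's types.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the enum and struct in mlx5_common.h. */
enum demo_name_type {
	DEMO_NAME_NOTSET = 0,
	DEMO_NAME_LEGACY,	/* plain number */
	DEMO_NAME_UPLINK,	/* p<N> */
	DEMO_NAME_PFVF,		/* pf<N>vf<M> */
	DEMO_NAME_UNKNOWN,	/* none of the above */
};

struct demo_port_info {
	enum demo_name_type name_type;
	int pf_num;
	int port_name;
};

static void
demo_translate_port_name(const char *in, struct demo_port_info *out)
{
	char pf_c1, pf_c2, vf_c1, vf_c2;
	char *end;
	int n;

	/* Most specific format first: pf<N>vf<M>. */
	n = sscanf(in, "%c%c%d%c%c%d", &pf_c1, &pf_c2, &out->pf_num,
		   &vf_c1, &vf_c2, &out->port_name);
	if (n == 6 && pf_c1 == 'p' && pf_c2 == 'f' &&
	    vf_c1 == 'v' && vf_c2 == 'f') {
		out->name_type = DEMO_NAME_PFVF;
		return;
	}
	/* Uplink format: p<N>. */
	n = sscanf(in, "%c%d", &pf_c1, &out->port_name);
	if (n == 2 && pf_c1 == 'p') {
		out->name_type = DEMO_NAME_UPLINK;
		return;
	}
	/* Legacy format: the whole string must parse as a number. */
	errno = 0;
	out->port_name = (int)strtol(in, &end, 0);
	if (!errno && (size_t)(end - in) == strlen(in)) {
		out->name_type = DEMO_NAME_LEGACY;
		return;
	}
	out->name_type = DEMO_NAME_UNKNOWN;
}
```

The order matters: "pf0vf0" would also satisfy the looser p<N> check if that
check did not require the second item to be a number, so the most specific
pattern is tried first.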
 drivers/common/mlx5/Makefile                    |    3 +-
 drivers/common/mlx5/meson.build                 |    1 +
 drivers/common/mlx5/mlx5_common.c               |   55 +
 drivers/common/mlx5/mlx5_common.h               |   58 +
 drivers/common/mlx5/mlx5_nl.c                   | 1337 ++++++++++++++++++++++
 drivers/common/mlx5/mlx5_nl.h                   |   57 +
 drivers/common/mlx5/rte_common_mlx5_version.map |   18 +-
 drivers/net/mlx5/Makefile                       |    1 -
 drivers/net/mlx5/meson.build                    |    1 -
 drivers/net/mlx5/mlx5.h                         |    2 +-
 drivers/net/mlx5/mlx5_defs.h                    |    8 -
 drivers/net/mlx5/mlx5_ethdev.c                  |   55 -
 drivers/net/mlx5/mlx5_nl.c                      | 1338 -----------------------
 drivers/net/mlx5/mlx5_nl.h                      |   72 --
 drivers/net/mlx5/mlx5_vlan.c                    |    2 +-
 15 files changed, 1529 insertions(+), 1479 deletions(-)
 create mode 100644 drivers/common/mlx5/mlx5_nl.c
 create mode 100644 drivers/common/mlx5/mlx5_nl.h
 delete mode 100644 drivers/net/mlx5/mlx5_nl.c
 delete mode 100644 drivers/net/mlx5/mlx5_nl.h
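
Note on the BITFIELD defines: the macros moved to mlx5_common.h size an
array of words to hold a given number of bits and then set, clear and test
single bits. The sketch below copies them under a DEMO_ prefix (names chosen
for illustration only) and exercises them on a local array. The bounds check
uses sizeof(bf), so the sketch applies the macros to the array itself rather
than to a pointer to it.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative copies of the bit-field macros, renamed with DEMO_. */
#define DEMO_BITFIELD_DECLARE(bf, type, size) \
	type bf[(((size_t)(size) / (sizeof(type) * CHAR_BIT)) + \
		 !!((size_t)(size) % (sizeof(type) * CHAR_BIT)))]
#define DEMO_BITFIELD_SET(bf, b) \
	(assert((size_t)(b) < (sizeof(bf) * CHAR_BIT)), \
	 (void)((bf)[((b) / (sizeof((bf)[0]) * CHAR_BIT))] |= \
		((size_t)1 << ((b) % (sizeof((bf)[0]) * CHAR_BIT)))))
#define DEMO_BITFIELD_RESET(bf, b) \
	(assert((size_t)(b) < (sizeof(bf) * CHAR_BIT)), \
	 (void)((bf)[((b) / (sizeof((bf)[0]) * CHAR_BIT))] &= \
		~((size_t)1 << ((b) % (sizeof((bf)[0]) * CHAR_BIT)))))
#define DEMO_BITFIELD_ISSET(bf, b) \
	(assert((size_t)(b) < (sizeof(bf) * CHAR_BIT)), \
	 !!(((bf)[((b) / (sizeof((bf)[0]) * CHAR_BIT))] & \
	     ((size_t)1 << ((b) % (sizeof((bf)[0]) * CHAR_BIT))))))

/* Number of 64-bit words needed for 130 bits: two full words plus one. */
static size_t
demo_bitfield_words(void)
{
	DEMO_BITFIELD_DECLARE(own, uint64_t, 130);

	return sizeof(own) / sizeof(own[0]);
}

/* Set, test and clear one bit; returns 1 when every step behaves. */
static int
demo_bitfield_roundtrip(unsigned int bit)
{
	DEMO_BITFIELD_DECLARE(own, uint64_t, 130) = { 0 };
	int was_set, is_set, after_reset;

	was_set = DEMO_BITFIELD_ISSET(own, bit);
	DEMO_BITFIELD_SET(own, bit);
	is_set = DEMO_BITFIELD_ISSET(own, bit);
	DEMO_BITFIELD_RESET(own, bit);
	after_reset = DEMO_BITFIELD_ISSET(own, bit);
	return !was_set && is_set && !after_reset;
}
```

The rounding term in the declaration is what gives 130 bits a third word,
so a partially used final word is always allocated.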
diff --git a/drivers/common/mlx5/Makefile b/drivers/common/mlx5/Makefile
index b9e9803..6a14b7d 100644
--- a/drivers/common/mlx5/Makefile
+++ b/drivers/common/mlx5/Makefile
@@ -15,6 +15,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_glue.c
 endif
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_devx_cmds.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_common.c
+SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_nl.c
 
 ifeq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
 INSTALL-$(CONFIG_RTE_LIBRTE_MLX5_PMD)-lib += $(LIB_GLUE)
@@ -41,7 +42,7 @@ else
 LDLIBS += -libverbs -lmlx5
 endif
 
-LDLIBS += -lrte_eal -lrte_pci -lrte_kvargs
+LDLIBS += -lrte_eal -lrte_pci -lrte_kvargs -lrte_net
 
 # A few warnings cannot be avoided in external headers.
 CFLAGS += -Wno-error=cast-qual -DNDEBUG -UPEDANTIC
diff --git a/drivers/common/mlx5/meson.build b/drivers/common/mlx5/meson.build
index b88822e..34cb7b9 100644
--- a/drivers/common/mlx5/meson.build
+++ b/drivers/common/mlx5/meson.build
@@ -42,6 +42,7 @@ if build
 	sources = files(
 		'mlx5_devx_cmds.c',
 		'mlx5_common.c',
+		'mlx5_nl.c',
 	)
 	if not pmd_dlopen
 		sources += files('mlx5_glue.c')
diff --git a/drivers/common/mlx5/mlx5_common.c b/drivers/common/mlx5/mlx5_common.c
index f756b6b..7ca6136 100644
--- a/drivers/common/mlx5/mlx5_common.c
+++ b/drivers/common/mlx5/mlx5_common.c
@@ -107,6 +107,61 @@
 	return ret;
 }
 
+/**
+ * Extract port name, as a number, from sysfs or netlink information.
+ *
+ * @param[in] port_name_in
+ *   String representing the port name.
+ * @param[out] port_info_out
+ *   Port information, including port name as a number and port name
+ *   type if recognized
+ *
+ * @return
+ *   port_name field set according to recognized name format.
+ */
+void
+mlx5_translate_port_name(const char *port_name_in,
+			 struct mlx5_switch_info *port_info_out)
+{
+	char pf_c1, pf_c2, vf_c1, vf_c2;
+	char *end;
+	int sc_items;
+
+	/*
+	 * Check for port-name as a string of the form pf0vf0
+	 * (support kernel ver >= 5.0 or OFED ver >= 4.6).
+	 */
+	sc_items = sscanf(port_name_in, "%c%c%d%c%c%d",
+			  &pf_c1, &pf_c2, &port_info_out->pf_num,
+			  &vf_c1, &vf_c2, &port_info_out->port_name);
+	if (sc_items == 6 &&
+	    pf_c1 == 'p' && pf_c2 == 'f' &&
+	    vf_c1 == 'v' && vf_c2 == 'f') {
+		port_info_out->name_type = MLX5_PHYS_PORT_NAME_TYPE_PFVF;
+		return;
+	}
+	/*
+	 * Check for port-name as a string of the form p0
+	 * (support kernel ver >= 5.0, or OFED ver >= 4.6).
+	 */
+	sc_items = sscanf(port_name_in, "%c%d",
+			  &pf_c1, &port_info_out->port_name);
+	if (sc_items == 2 && pf_c1 == 'p') {
+		port_info_out->name_type = MLX5_PHYS_PORT_NAME_TYPE_UPLINK;
+		return;
+	}
+	/* Check for port-name as a number (support kernel ver < 5.0). */
+	errno = 0;
+	port_info_out->port_name = strtol(port_name_in, &end, 0);
+	if (!errno &&
+	    (size_t)(end - port_name_in) == strlen(port_name_in)) {
+		port_info_out->name_type = MLX5_PHYS_PORT_NAME_TYPE_LEGACY;
+		return;
+	}
+	port_info_out->name_type = MLX5_PHYS_PORT_NAME_TYPE_UNKNOWN;
+	return;
+}
+
 #ifdef RTE_IBVERBS_LINK_DLOPEN
 
 /**
diff --git a/drivers/common/mlx5/mlx5_common.h b/drivers/common/mlx5/mlx5_common.h
index aeaa7b9..ac65105 100644
--- a/drivers/common/mlx5/mlx5_common.h
+++ b/drivers/common/mlx5/mlx5_common.h
@@ -18,6 +18,35 @@
 
 
 /*
+ * Compilation workaround for PPC64 when AltiVec is fully enabled, e.g. std=c11.
+ * Otherwise there would be a type conflict between stdbool and altivec.
+ */
+#if defined(__PPC64__) && !defined(__APPLE_ALTIVEC__)
+#undef bool
+/* redefine as in stdbool.h */
+#define bool _Bool
+#endif
+
+/* Bit-field manipulation. */
+#define BITFIELD_DECLARE(bf, type, size) \
+	type bf[(((size_t)(size) / (sizeof(type) * CHAR_BIT)) + \
+		 !!((size_t)(size) % (sizeof(type) * CHAR_BIT)))]
+#define BITFIELD_DEFINE(bf, type, size) \
+	BITFIELD_DECLARE((bf), type, (size)) = { 0 }
+#define BITFIELD_SET(bf, b) \
+	(assert((size_t)(b) < (sizeof(bf) * CHAR_BIT)), \
+	 (void)((bf)[((b) / (sizeof((bf)[0]) * CHAR_BIT))] |= \
+		((size_t)1 << ((b) % (sizeof((bf)[0]) * CHAR_BIT)))))
+#define BITFIELD_RESET(bf, b) \
+	(assert((size_t)(b) < (sizeof(bf) * CHAR_BIT)), \
+	 (void)((bf)[((b) / (sizeof((bf)[0]) * CHAR_BIT))] &= \
+		~((size_t)1 << ((b) % (sizeof((bf)[0]) * CHAR_BIT)))))
+#define BITFIELD_ISSET(bf, b) \
+	(assert((size_t)(b) < (sizeof(bf) * CHAR_BIT)), \
+	 !!(((bf)[((b) / (sizeof((bf)[0]) * CHAR_BIT))] & \
+	     ((size_t)1 << ((b) % (sizeof((bf)[0]) * CHAR_BIT))))))
+
+/*
  * Helper macros to work around __VA_ARGS__ limitations in a C99 compliant
  * manner.
  */
@@ -112,6 +141,33 @@ enum {
 	PCI_DEVICE_ID_MELLANOX_CONNECTX6DXVF = 0x101e,
 };
 
+/* Maximum number of simultaneous unicast MAC addresses. */
+#define MLX5_MAX_UC_MAC_ADDRESSES 128
+/* Maximum number of simultaneous Multicast MAC addresses. */
+#define MLX5_MAX_MC_MAC_ADDRESSES 128
+/* Maximum number of simultaneous MAC addresses. */
+#define MLX5_MAX_MAC_ADDRESSES \
+	(MLX5_MAX_UC_MAC_ADDRESSES + MLX5_MAX_MC_MAC_ADDRESSES)
+
+/* Recognized Infiniband device physical port name types. */
+enum mlx5_nl_phys_port_name_type {
+	MLX5_PHYS_PORT_NAME_TYPE_NOTSET = 0, /* Not set. */
+	MLX5_PHYS_PORT_NAME_TYPE_LEGACY, /* kernel ver < 5.0 */
+	MLX5_PHYS_PORT_NAME_TYPE_UPLINK, /* p0, kernel ver >= 5.0 */
+	MLX5_PHYS_PORT_NAME_TYPE_PFVF, /* pf0vf0, kernel ver >= 5.0 */
+	MLX5_PHYS_PORT_NAME_TYPE_UNKNOWN, /* Unrecognized. */
+};
+
+/** Switch information returned by mlx5_nl_switch_info(). */
+struct mlx5_switch_info {
+	uint32_t master:1; /**< Master device. */
+	uint32_t representor:1; /**< Representor device. */
+	enum mlx5_nl_phys_port_name_type name_type; /**< Port name type. */
+	int32_t pf_num; /**< PF number (valid for pfxvfx format only). */
+	int32_t port_name; /**< Representor port name. */
+	uint64_t switch_id; /**< Switch identifier. */
+};
+
 /* CQE status. */
 enum mlx5_cqe_status {
 	MLX5_CQE_STATUS_SW_OWN = -1,
@@ -152,5 +208,7 @@ enum mlx5_cqe_status {
 
 int mlx5_dev_to_pci_addr(const char *dev_path, struct rte_pci_addr *pci_addr);
 int mlx5_vdpa_mode_selected(struct rte_devargs *devargs);
+void mlx5_translate_port_name(const char *port_name_in,
+			      struct mlx5_switch_info *port_info_out);
 
 #endif /* RTE_PMD_MLX5_COMMON_H_ */
diff --git a/drivers/common/mlx5/mlx5_nl.c b/drivers/common/mlx5/mlx5_nl.c
new file mode 100644
index 0000000..b4fc053
--- /dev/null
+++ b/drivers/common/mlx5/mlx5_nl.c
@@ -0,0 +1,1337 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2018 6WIND S.A.
+ * Copyright 2018 Mellanox Technologies, Ltd
+ */
+
+#include <errno.h>
+#include <linux/if_link.h>
+#include <linux/rtnetlink.h>
+#include <net/if.h>
+#include <rdma/rdma_netlink.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <stdalign.h>
+#include <string.h>
+#include <sys/socket.h>
+#include <unistd.h>
+#include <stdbool.h>
+
+#include <rte_errno.h>
+#include <rte_atomic.h>
+
+#include "mlx5_nl.h"
+#include "mlx5_common_utils.h"
+
+/* Size of the buffer to receive kernel messages */
+#define MLX5_NL_BUF_SIZE (32 * 1024)
+/* Send buffer size for the Netlink socket */
+#define MLX5_SEND_BUF_SIZE 32768
+/* Receive buffer size for the Netlink socket */
+#define MLX5_RECV_BUF_SIZE 32768
+
+/** Parameters of VLAN devices created by driver. */
+#define MLX5_VMWA_VLAN_DEVICE_PFX "evmlx"
+/*
+ * Define NDA_RTA as defined in iproute2 sources.
+ *
+ * see in iproute2 sources file include/libnetlink.h
+ */
+#ifndef MLX5_NDA_RTA
+#define MLX5_NDA_RTA(r) \
+	((struct rtattr *)(((char *)(r)) + NLMSG_ALIGN(sizeof(struct ndmsg))))
+#endif
+/*
+ * Define NLMSG_TAIL as defined in iproute2 sources.
+ *
+ * see in iproute2 sources file include/libnetlink.h
+ */
+#ifndef NLMSG_TAIL
+#define NLMSG_TAIL(nmsg) \
+	((struct rtattr *)(((char *)(nmsg)) + NLMSG_ALIGN((nmsg)->nlmsg_len)))
+#endif
+/*
+ * The following definitions are normally found in rdma/rdma_netlink.h,
+ * however they are so recent that most systems do not expose them yet.
+ */
+#ifndef HAVE_RDMA_NL_NLDEV
+#define RDMA_NL_NLDEV 5
+#endif
+#ifndef HAVE_RDMA_NLDEV_CMD_GET
+#define RDMA_NLDEV_CMD_GET 1
+#endif
+#ifndef HAVE_RDMA_NLDEV_CMD_PORT_GET
+#define RDMA_NLDEV_CMD_PORT_GET 5
+#endif
+#ifndef HAVE_RDMA_NLDEV_ATTR_DEV_INDEX
+#define RDMA_NLDEV_ATTR_DEV_INDEX 1
+#endif
+#ifndef HAVE_RDMA_NLDEV_ATTR_DEV_NAME
+#define RDMA_NLDEV_ATTR_DEV_NAME 2
+#endif
+#ifndef HAVE_RDMA_NLDEV_ATTR_PORT_INDEX
+#define RDMA_NLDEV_ATTR_PORT_INDEX 3
+#endif
+#ifndef HAVE_RDMA_NLDEV_ATTR_NDEV_INDEX
+#define RDMA_NLDEV_ATTR_NDEV_INDEX 50
+#endif
+
+/* These are normally found in linux/if_link.h. */
+#ifndef HAVE_IFLA_NUM_VF
+#define IFLA_NUM_VF 21
+#endif
+#ifndef HAVE_IFLA_EXT_MASK
+#define IFLA_EXT_MASK 29
+#endif
+#ifndef HAVE_IFLA_PHYS_SWITCH_ID
+#define IFLA_PHYS_SWITCH_ID 36
+#endif
+#ifndef HAVE_IFLA_PHYS_PORT_NAME
+#define IFLA_PHYS_PORT_NAME 38
+#endif
+
+/* Add/remove MAC address through Netlink */
+struct mlx5_nl_mac_addr {
+	struct rte_ether_addr (*mac)[];
+	/**< MAC address handled by the device. */
+	int mac_n; /**< Number of addresses in the array. */
+};
+
+#define MLX5_NL_CMD_GET_IB_NAME (1 << 0)
+#define MLX5_NL_CMD_GET_IB_INDEX (1 << 1)
+#define MLX5_NL_CMD_GET_NET_INDEX (1 << 2)
+#define MLX5_NL_CMD_GET_PORT_INDEX (1 << 3)
+
+/** Data structure used by mlx5_nl_cmdget_cb(). */
+struct mlx5_nl_ifindex_data {
+	const char *name; /**< IB device name (in). */
+	uint32_t flags; /**< found attribute flags (out). */
+	uint32_t ibindex; /**< IB device index (out). */
+	uint32_t ifindex; /**< Network interface index (out). */
+	uint32_t portnum; /**< IB device max port number (out). */
+};
+
+rte_atomic32_t atomic_sn = RTE_ATOMIC32_INIT(0);
+
+/* Generate Netlink sequence number. */
+#define MLX5_NL_SN_GENERATE ((uint32_t)rte_atomic32_add_return(&atomic_sn, 1))
+
+/**
+ * Opens a Netlink socket.
+ *
+ * @param protocol
+ *   Netlink protocol (e.g. NETLINK_ROUTE, NETLINK_RDMA).
+ *
+ * @return
+ *   A file descriptor on success, a negative errno value otherwise and
+ *   rte_errno is set.
+ */
+int
+mlx5_nl_init(int protocol)
+{
+	int fd;
+	int sndbuf_size = MLX5_SEND_BUF_SIZE;
+	int rcvbuf_size = MLX5_RECV_BUF_SIZE;
+	struct sockaddr_nl local = {
+		.nl_family = AF_NETLINK,
+	};
+	int ret;
+
+	fd = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC, protocol);
+	if (fd == -1) {
+		rte_errno = errno;
+		return -rte_errno;
+	}
+	ret = setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &sndbuf_size, sizeof(int));
+	if (ret == -1) {
+		rte_errno = errno;
+		goto error;
+	}
+	ret = setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf_size, sizeof(int));
+	if (ret == -1) {
+		rte_errno = errno;
+		goto error;
+	}
+	ret = bind(fd, (struct sockaddr *)&local, sizeof(local));
+	if (ret == -1) {
+		rte_errno = errno;
+		goto error;
+	}
+	return fd;
+error:
+	close(fd);
+	return -rte_errno;
+}
+
+/**
+ * Send a request message to the kernel on the Netlink socket.
+ *
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] nh
+ *   The Netlink message sent to the kernel.
+ * @param[in] sn
+ *   Sequence number.
+ * @param[in] req
+ *   Pointer to the request structure.
+ * @param[in] len
+ *   Length of the request in bytes.
+ *
+ * @return
+ *   The number of sent bytes on success, a negative errno value otherwise and
+ *   rte_errno is set.
+ */
+static int
+mlx5_nl_request(int nlsk_fd, struct nlmsghdr *nh, uint32_t sn, void *req,
+		int len)
+{
+	struct sockaddr_nl sa = {
+		.nl_family = AF_NETLINK,
+	};
+	struct iovec iov[2] = {
+		{ .iov_base = nh, .iov_len = sizeof(*nh), },
+		{ .iov_base = req, .iov_len = len, },
+	};
+	struct msghdr msg = {
+		.msg_name = &sa,
+		.msg_namelen = sizeof(sa),
+		.msg_iov = iov,
+		.msg_iovlen = 2,
+	};
+	int send_bytes;
+
+	nh->nlmsg_pid = 0; /* communication with the kernel uses pid 0 */
+	nh->nlmsg_seq = sn;
+	send_bytes = sendmsg(nlsk_fd, &msg, 0);
+	if (send_bytes < 0) {
+		rte_errno = errno;
+		return -rte_errno;
+	}
+	return send_bytes;
+}
+
+/**
+ * Send a message to the kernel on the Netlink socket.
+ *
+ * @param[in] nlsk_fd
+ *   The Netlink socket file descriptor used for communication.
+ * @param[in] nh
+ *   The Netlink message sent to the kernel.
+ * @param[in] sn
+ *   Sequence number.
+ *
+ * @return
+ *   The number of sent bytes on success, a negative errno value otherwise and
+ *   rte_errno is set.
+ */
+static int
+mlx5_nl_send(int nlsk_fd, struct nlmsghdr *nh, uint32_t sn)
+{
+	struct sockaddr_nl sa = {
+		.nl_family = AF_NETLINK,
+	};
+	struct iovec iov = {
+		.iov_base = nh,
+		.iov_len = nh->nlmsg_len,
+	};
+	struct msghdr msg = {
+		.msg_name = &sa,
+		.msg_namelen = sizeof(sa),
+		.msg_iov = &iov,
+		.msg_iovlen = 1,
+	};
+	int send_bytes;
+
+	nh->nlmsg_pid = 0; /* communication with the kernel uses pid 0 */
+	nh->nlmsg_seq = sn;
+	send_bytes = sendmsg(nlsk_fd, &msg, 0);
+	if (send_bytes < 0) {
+		rte_errno = errno;
+		return -rte_errno;
+	}
+	return send_bytes;
+}
+
+/**
+ * Receive a message from the kernel on the Netlink socket, following
+ * mlx5_nl_send().
+ *
+ * @param[in] nlsk_fd
+ *   The Netlink socket file descriptor used for communication.
+ * @param[in] sn
+ *   Sequence number.
+ * @param[in] cb
+ *   The callback function to call for each Netlink message received.
+ * @param[in, out] arg
+ *   Custom arguments for the callback.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_nl_recv(int nlsk_fd, uint32_t sn, int (*cb)(struct nlmsghdr *, void *arg),
+	     void *arg)
+{
+	struct sockaddr_nl sa;
+	char buf[MLX5_RECV_BUF_SIZE];
+	struct iovec iov = {
+		.iov_base = buf,
+		.iov_len = sizeof(buf),
+	};
+	struct msghdr msg = {
+		.msg_name = &sa,
+		.msg_namelen = sizeof(sa),
+		.msg_iov = &iov,
+		/* One message at a time */
+		.msg_iovlen = 1,
+	};
+	int multipart = 0;
+	int ret = 0;
+
+	do {
+		struct nlmsghdr *nh;
+		int recv_bytes = 0;
+
+		do {
+			recv_bytes = recvmsg(nlsk_fd, &msg, 0);
+			if (recv_bytes == -1) {
+				rte_errno = errno;
+				return -rte_errno;
+			}
+			nh = (struct nlmsghdr *)buf;
+		} while (nh->nlmsg_seq != sn);
+		for (;
+		     NLMSG_OK(nh, (unsigned int)recv_bytes);
+		     nh = NLMSG_NEXT(nh, recv_bytes)) {
+			if (nh->nlmsg_type == NLMSG_ERROR) {
+				struct nlmsgerr *err_data = NLMSG_DATA(nh);
+
+				if (err_data->error < 0) {
+					rte_errno = -err_data->error;
+					return -rte_errno;
+				}
+				/* Ack message. */
+				return 0;
+			}
+			/* Multi-part msgs and their trailing DONE message. */
+			if (nh->nlmsg_flags & NLM_F_MULTI) {
+				if (nh->nlmsg_type == NLMSG_DONE)
+					return 0;
+				multipart = 1;
+			}
+			if (cb) {
+				ret = cb(nh, arg);
+				if (ret < 0)
+					return ret;
+			}
+		}
+	} while (multipart);
+	return ret;
+}
+
+/**
+ * Parse Netlink message to retrieve the bridge MAC address.
+ *
+ * @param nh
+ *   Pointer to Netlink Message Header.
+ * @param arg
+ *   PMD data registered with this callback.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_nl_mac_addr_cb(struct nlmsghdr *nh, void *arg)
+{
+	struct mlx5_nl_mac_addr *data = arg;
+	struct ndmsg *r = NLMSG_DATA(nh);
+	struct rtattr *attribute;
+	int len;
+
+	len = nh->nlmsg_len - NLMSG_LENGTH(sizeof(*r));
+	for (attribute = MLX5_NDA_RTA(r);
+	     RTA_OK(attribute, len);
+	     attribute = RTA_NEXT(attribute, len)) {
+		if (attribute->rta_type == NDA_LLADDR) {
+			if (data->mac_n == MLX5_MAX_MAC_ADDRESSES) {
+				DRV_LOG(WARNING,
+					"not enough room to finalize the"
+					" request");
+				rte_errno = ENOMEM;
+				return -rte_errno;
+			}
+#ifndef NDEBUG
+			char m[18];
+
+			rte_ether_format_addr(m, 18, RTA_DATA(attribute));
+			DRV_LOG(DEBUG, "bridge MAC address %s", m);
+#endif
+			memcpy(&(*data->mac)[data->mac_n++],
+			       RTA_DATA(attribute), RTE_ETHER_ADDR_LEN);
+		}
+	}
+	return 0;
+}
+
+/**
+ * Get bridge MAC addresses.
+ *
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
+ * @param[out] mac
+ *   Pointer to the array table of MAC addresses to fill.
+ *   Its size should be MLX5_MAX_MAC_ADDRESSES.
+ * @param[out] mac_n
+ *   Number of entries filled in MAC array.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_nl_mac_addr_list(int nlsk_fd, unsigned int iface_idx,
+		      struct rte_ether_addr (*mac)[], int *mac_n)
+{
+	struct {
+		struct nlmsghdr	hdr;
+		struct ifinfomsg ifm;
+	} req = {
+		.hdr = {
+			.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
+			.nlmsg_type = RTM_GETNEIGH,
+			.nlmsg_flags = NLM_F_DUMP | NLM_F_REQUEST,
+		},
+		.ifm = {
+			.ifi_family = PF_BRIDGE,
+			.ifi_index = iface_idx,
+		},
+	};
+	struct mlx5_nl_mac_addr data = {
+		.mac = mac,
+		.mac_n = 0,
+	};
+	uint32_t sn = MLX5_NL_SN_GENERATE;
+	int ret;
+
+	if (nlsk_fd == -1)
+		return 0;
+	ret = mlx5_nl_request(nlsk_fd, &req.hdr, sn, &req.ifm,
+			      sizeof(struct ifinfomsg));
+	if (ret < 0)
+		goto error;
+	ret = mlx5_nl_recv(nlsk_fd, sn, mlx5_nl_mac_addr_cb, &data);
+	if (ret < 0)
+		goto error;
+	*mac_n = data.mac_n;
+	return 0;
+error:
+	DRV_LOG(DEBUG, "Interface %u cannot retrieve MAC address list %s",
+		iface_idx, strerror(rte_errno));
+	return -rte_errno;
+}
+
+/**
+ * Modify the MAC address neighbour table with Netlink.
+ *
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
+ * @param mac
+ *   MAC address to consider.
+ * @param add
+ *   1 to add the MAC address, 0 to remove the MAC address.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_nl_mac_addr_modify(int nlsk_fd, unsigned int iface_idx,
+			struct rte_ether_addr *mac, int add)
+{
+	struct {
+		struct nlmsghdr hdr;
+		struct ndmsg ndm;
+		struct rtattr rta;
+		uint8_t buffer[RTE_ETHER_ADDR_LEN];
+	} req = {
+		.hdr = {
+			.nlmsg_len = NLMSG_LENGTH(sizeof(struct ndmsg)),
+			.nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE |
+				NLM_F_EXCL | NLM_F_ACK,
+			.nlmsg_type = add ? RTM_NEWNEIGH : RTM_DELNEIGH,
+		},
+		.ndm = {
+			.ndm_family = PF_BRIDGE,
+			.ndm_state = NUD_NOARP | NUD_PERMANENT,
+			.ndm_ifindex = iface_idx,
+			.ndm_flags = NTF_SELF,
+		},
+		.rta = {
+			.rta_type = NDA_LLADDR,
+			.rta_len = RTA_LENGTH(RTE_ETHER_ADDR_LEN),
+		},
+	};
+	uint32_t sn = MLX5_NL_SN_GENERATE;
+	int ret;
+
+	if (nlsk_fd == -1)
+		return 0;
+	memcpy(RTA_DATA(&req.rta), mac, RTE_ETHER_ADDR_LEN);
+	req.hdr.nlmsg_len = NLMSG_ALIGN(req.hdr.nlmsg_len) +
+		RTA_ALIGN(req.rta.rta_len);
+	ret = mlx5_nl_send(nlsk_fd, &req.hdr, sn);
+	if (ret < 0)
+		goto error;
+	ret = mlx5_nl_recv(nlsk_fd, sn, NULL, NULL);
+	if (ret < 0)
+		goto error;
+	return 0;
+error:
+	DRV_LOG(DEBUG,
+		"Interface %u cannot %s MAC address"
+		" %02X:%02X:%02X:%02X:%02X:%02X %s",
+		iface_idx,
+		add ? "add" : "remove",
+		mac->addr_bytes[0], mac->addr_bytes[1],
+		mac->addr_bytes[2], mac->addr_bytes[3],
+		mac->addr_bytes[4], mac->addr_bytes[5],
+		strerror(rte_errno));
+	return -rte_errno;
+}
+
+/**
+ * Modify the VF MAC address neighbour table with Netlink.
+ *
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
+ * @param mac
+ *    MAC address to consider.
+ * @param vf_index
+ *    VF index.
+ *
+ * @return
+ *    0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_nl_vf_mac_addr_modify(int nlsk_fd, unsigned int iface_idx,
+			   struct rte_ether_addr *mac, int vf_index)
+{
+	int ret;
+	struct {
+		struct nlmsghdr hdr;
+		struct ifinfomsg ifm;
+		struct rtattr vf_list_rta;
+		struct rtattr vf_info_rta;
+		struct rtattr vf_mac_rta;
+		struct ifla_vf_mac ivm;
+	} req = {
+		.hdr = {
+			.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
+			.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK,
+			.nlmsg_type = RTM_BASE,
+		},
+		.ifm = {
+			.ifi_index = iface_idx,
+		},
+		.vf_list_rta = {
+			.rta_type = IFLA_VFINFO_LIST,
+			.rta_len = RTA_ALIGN(RTA_LENGTH(0)),
+		},
+		.vf_info_rta = {
+			.rta_type = IFLA_VF_INFO,
+			.rta_len = RTA_ALIGN(RTA_LENGTH(0)),
+		},
+		.vf_mac_rta = {
+			.rta_type = IFLA_VF_MAC,
+		},
+	};
+	struct ifla_vf_mac ivm = {
+		.vf = vf_index,
+	};
+	uint32_t sn = MLX5_NL_SN_GENERATE;
+
+	memcpy(&ivm.mac, mac, RTE_ETHER_ADDR_LEN);
+	memcpy(RTA_DATA(&req.vf_mac_rta), &ivm, sizeof(ivm));
+
+	req.vf_mac_rta.rta_len = RTA_LENGTH(sizeof(ivm));
+	req.hdr.nlmsg_len = NLMSG_ALIGN(req.hdr.nlmsg_len) +
+		RTA_ALIGN(req.vf_list_rta.rta_len) +
+		RTA_ALIGN(req.vf_info_rta.rta_len) +
+		RTA_ALIGN(req.vf_mac_rta.rta_len);
+	req.vf_list_rta.rta_len = RTE_PTR_DIFF(NLMSG_TAIL(&req.hdr),
+					       &req.vf_list_rta);
+	req.vf_info_rta.rta_len = RTE_PTR_DIFF(NLMSG_TAIL(&req.hdr),
+					       &req.vf_info_rta);
+
+	if (nlsk_fd < 0)
+		return -1;
+	ret = mlx5_nl_send(nlsk_fd, &req.hdr, sn);
+	if (ret < 0)
+		goto error;
+	ret = mlx5_nl_recv(nlsk_fd, sn, NULL, NULL);
+	if (ret < 0)
+		goto error;
+	return 0;
+error:
+	DRV_LOG(ERR,
+		"representor %u cannot set VF MAC address "
+		"%02X:%02X:%02X:%02X:%02X:%02X : %s",
+		vf_index,
+		mac->addr_bytes[0], mac->addr_bytes[1],
+		mac->addr_bytes[2], mac->addr_bytes[3],
+		mac->addr_bytes[4], mac->addr_bytes[5],
+		strerror(rte_errno));
+	return -rte_errno;
+}
+
+/**
+ * Add a MAC address.
+ *
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
+ * @param mac_own
+ *   BITFIELD_DECLARE array to store the mac.
+ * @param mac
+ *   MAC address to register.
+ * @param index
+ *   MAC address index.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_nl_mac_addr_add(int nlsk_fd, unsigned int iface_idx,
+		     uint64_t *mac_own, struct rte_ether_addr *mac,
+		     uint32_t index)
+{
+	int ret;
+
+	ret = mlx5_nl_mac_addr_modify(nlsk_fd, iface_idx, mac, 1);
+	if (!ret)
+		BITFIELD_SET(mac_own, index);
+	if (ret == -EEXIST)
+		return 0;
+	return ret;
+}
+
+/**
+ * Remove a MAC address.
+ *
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
+ * @param mac_own
+ *   BITFIELD_DECLARE array to store the mac.
+ * @param mac
+ *   MAC address to remove.
+ * @param index
+ *   MAC address index.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_nl_mac_addr_remove(int nlsk_fd, unsigned int iface_idx, uint64_t *mac_own,
+			struct rte_ether_addr *mac, uint32_t index)
+{
+	BITFIELD_RESET(mac_own, index);
+	return mlx5_nl_mac_addr_modify(nlsk_fd, iface_idx, mac, 0);
+}
+
+/**
+ * Synchronize Netlink bridge table to the internal table.
+ *
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
+ * @param mac_addrs
+ *   MAC addresses array to sync.
+ * @param n
+ *   @p mac_addrs array size.
+ */
+void
+mlx5_nl_mac_addr_sync(int nlsk_fd, unsigned int iface_idx,
+		      struct rte_ether_addr *mac_addrs, int n)
+{
+	struct rte_ether_addr macs[n];
+	int macs_n = 0;
+	int i;
+	int ret;
+
+	ret = mlx5_nl_mac_addr_list(nlsk_fd, iface_idx, &macs, &macs_n);
+	if (ret)
+		return;
+	for (i = 0; i != macs_n; ++i) {
+		int j;
+
+		/* Verify the address is not in the array yet. */
+		for (j = 0; j != n; ++j)
+			if (rte_is_same_ether_addr(&macs[i], &mac_addrs[j]))
+				break;
+		if (j != n)
+			continue;
+		/* Find the first entry available. */
+		for (j = 0; j != n; ++j) {
+			if (rte_is_zero_ether_addr(&mac_addrs[j])) {
+				mac_addrs[j] = macs[i];
+				break;
+			}
+		}
+	}
+}
+
+/**
+ * Flush all added MAC addresses.
+ *
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
+ * @param[in] mac_addrs
+ *   MAC addresses array to flush.
+ * @param n
+ *   @p mac_addrs array size.
+ * @param mac_own
+ *   BITFIELD_DECLARE array to store the mac.
+ */
+void
+mlx5_nl_mac_addr_flush(int nlsk_fd, unsigned int iface_idx,
+		       struct rte_ether_addr *mac_addrs, int n,
+		       uint64_t *mac_own)
+{
+	int i;
+
+	for (i = n - 1; i >= 0; --i) {
+		struct rte_ether_addr *m = &mac_addrs[i];
+
+		if (BITFIELD_ISSET(mac_own, i))
+			mlx5_nl_mac_addr_remove(nlsk_fd, iface_idx, mac_own, m,
+						i);
+	}
+}
+
+/**
+ * Enable promiscuous / all multicast mode through Netlink.
+ *
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
+ * @param flags
+ *   IFF_PROMISC for promiscuous, IFF_ALLMULTI for allmulti.
+ * @param enable
+ *   Nonzero to enable, disable otherwise.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_nl_device_flags(int nlsk_fd, unsigned int iface_idx, uint32_t flags,
+		     int enable)
+{
+	struct {
+		struct nlmsghdr hdr;
+		struct ifinfomsg ifi;
+	} req = {
+		.hdr = {
+			.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
+			.nlmsg_type = RTM_NEWLINK,
+			.nlmsg_flags = NLM_F_REQUEST,
+		},
+		.ifi = {
+			.ifi_flags = enable ? flags : 0,
+			.ifi_change = flags,
+			.ifi_index = iface_idx,
+		},
+	};
+	uint32_t sn = MLX5_NL_SN_GENERATE;
+	int ret;
+
+	assert(!(flags & ~(IFF_PROMISC | IFF_ALLMULTI)));
+	if (nlsk_fd < 0)
+		return 0;
+	ret = mlx5_nl_send(nlsk_fd, &req.hdr, sn);
+	if (ret < 0)
+		return ret;
+	return 0;
+}
+
+/**
+ * Enable promiscuous mode through Netlink.
+ *
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
+ * @param enable
+ *   Nonzero to enable, disable otherwise.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_nl_promisc(int nlsk_fd, unsigned int iface_idx, int enable)
+{
+	int ret = mlx5_nl_device_flags(nlsk_fd, iface_idx, IFF_PROMISC, enable);
+
+	if (ret)
+		DRV_LOG(DEBUG,
+			"Interface %u cannot %s promisc mode: Netlink error %s",
+			iface_idx, enable ? "enable" : "disable",
+			strerror(rte_errno));
+	return ret;
+}
+
+/**
+ * Enable all multicast mode through Netlink.
+ *
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
+ * @param enable
+ *   Nonzero to enable, disable otherwise.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_nl_allmulti(int nlsk_fd, unsigned int iface_idx, int enable)
+{
+	int ret = mlx5_nl_device_flags(nlsk_fd, iface_idx, IFF_ALLMULTI,
+				       enable);
+
+	if (ret)
+		DRV_LOG(DEBUG,
+			"Interface %u cannot %s allmulti: Netlink error %s",
+			iface_idx, enable ? "enable" : "disable",
+			strerror(rte_errno));
+	return ret;
+}
+
+/**
+ * Process network interface information from Netlink message.
+ *
+ * @param nh
+ *   Pointer to Netlink message header.
+ * @param arg
+ *   Opaque data pointer for this callback.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_nl_cmdget_cb(struct nlmsghdr *nh, void *arg)
+{
+	struct mlx5_nl_ifindex_data *data = arg;
+	struct mlx5_nl_ifindex_data local = {
+		.flags = 0,
+	};
+	size_t off = NLMSG_HDRLEN;
+
+	if (nh->nlmsg_type !=
+	    RDMA_NL_GET_TYPE(RDMA_NL_NLDEV, RDMA_NLDEV_CMD_GET) &&
+	    nh->nlmsg_type !=
+	    RDMA_NL_GET_TYPE(RDMA_NL_NLDEV, RDMA_NLDEV_CMD_PORT_GET))
+		goto error;
+	while (off < nh->nlmsg_len) {
+		struct nlattr *na = (void *)((uintptr_t)nh + off);
+		void *payload = (void *)((uintptr_t)na + NLA_HDRLEN);
+
+		if (na->nla_len > nh->nlmsg_len - off)
+			goto error;
+		switch (na->nla_type) {
+		case RDMA_NLDEV_ATTR_DEV_INDEX:
+			local.ibindex = *(uint32_t *)payload;
+			local.flags |= MLX5_NL_CMD_GET_IB_INDEX;
+			break;
+		case RDMA_NLDEV_ATTR_DEV_NAME:
+			if (!strcmp(payload, data->name))
+				local.flags |= MLX5_NL_CMD_GET_IB_NAME;
+			break;
+		case RDMA_NLDEV_ATTR_NDEV_INDEX:
+			local.ifindex = *(uint32_t *)payload;
+			local.flags |= MLX5_NL_CMD_GET_NET_INDEX;
+			break;
+		case RDMA_NLDEV_ATTR_PORT_INDEX:
+			local.portnum = *(uint32_t *)payload;
+			local.flags |= MLX5_NL_CMD_GET_PORT_INDEX;
+			break;
+		default:
+			break;
+		}
+		off += NLA_ALIGN(na->nla_len);
+	}
+	/*
+	 * It is possible to have multiple messages for all
+	 * Infiniband devices in the system with appropriate name.
+	 * So we should gather parameters locally and copy to
+	 * query context only in case of coinciding device name.
+	 */
+	if (local.flags & MLX5_NL_CMD_GET_IB_NAME) {
+		data->flags = local.flags;
+		data->ibindex = local.ibindex;
+		data->ifindex = local.ifindex;
+		data->portnum = local.portnum;
+	}
+	return 0;
+error:
+	rte_errno = EINVAL;
+	return -rte_errno;
+}
+
+/**
+ * Get index of network interface associated with some IB device.
+ *
+ * This is the only somewhat safe method to avoid resorting to heuristics
+ * when faced with port representors. Unfortunately it requires at least
+ * Linux 4.17.
+ *
+ * @param nl
+ *   Netlink socket of the RDMA kind (NETLINK_RDMA).
+ * @param[in] name
+ *   IB device name.
+ * @param[in] pindex
+ *   IB device port index, starting from 1.
+ * @return
+ *   A valid (nonzero) interface index on success, 0 otherwise and rte_errno
+ *   is set.
+ */
+unsigned int
+mlx5_nl_ifindex(int nl, const char *name, uint32_t pindex)
+{
+	struct mlx5_nl_ifindex_data data = {
+		.name = name,
+		.flags = 0,
+		.ibindex = 0, /* Determined during first pass. */
+		.ifindex = 0, /* Determined during second pass. */
+	};
+	union {
+		struct nlmsghdr nh;
+		uint8_t buf[NLMSG_HDRLEN +
+			    NLA_HDRLEN + NLA_ALIGN(sizeof(data.ibindex)) +
+			    NLA_HDRLEN + NLA_ALIGN(sizeof(pindex))];
+	} req = {
+		.nh = {
+			.nlmsg_len = NLMSG_LENGTH(0),
+			.nlmsg_type = RDMA_NL_GET_TYPE(RDMA_NL_NLDEV,
+						       RDMA_NLDEV_CMD_GET),
+			.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK | NLM_F_DUMP,
+		},
+	};
+	struct nlattr *na;
+	uint32_t sn = MLX5_NL_SN_GENERATE;
+	int ret;
+
+	ret = mlx5_nl_send(nl, &req.nh, sn);
+	if (ret < 0)
+		return 0;
+	ret = mlx5_nl_recv(nl, sn, mlx5_nl_cmdget_cb, &data);
+	if (ret < 0)
+		return 0;
+	if (!(data.flags & MLX5_NL_CMD_GET_IB_NAME) ||
+	    !(data.flags & MLX5_NL_CMD_GET_IB_INDEX))
+		goto error;
+	data.flags = 0;
+	sn = MLX5_NL_SN_GENERATE;
+	req.nh.nlmsg_type = RDMA_NL_GET_TYPE(RDMA_NL_NLDEV,
+					     RDMA_NLDEV_CMD_PORT_GET);
+	req.nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
+	req.nh.nlmsg_len = NLMSG_LENGTH(sizeof(req.buf) - NLMSG_HDRLEN);
+	na = (void *)((uintptr_t)req.buf + NLMSG_HDRLEN);
+	na->nla_len = NLA_HDRLEN + sizeof(data.ibindex);
+	na->nla_type = RDMA_NLDEV_ATTR_DEV_INDEX;
+	memcpy((void *)((uintptr_t)na + NLA_HDRLEN),
+	       &data.ibindex, sizeof(data.ibindex));
+	na = (void *)((uintptr_t)na + NLA_ALIGN(na->nla_len));
+	na->nla_len = NLA_HDRLEN + sizeof(pindex);
+	na->nla_type = RDMA_NLDEV_ATTR_PORT_INDEX;
+	memcpy((void *)((uintptr_t)na + NLA_HDRLEN),
+	       &pindex, sizeof(pindex));
+	ret = mlx5_nl_send(nl, &req.nh, sn);
+	if (ret < 0)
+		return 0;
+	ret = mlx5_nl_recv(nl, sn, mlx5_nl_cmdget_cb, &data);
+	if (ret < 0)
+		return 0;
+	if (!(data.flags & MLX5_NL_CMD_GET_IB_NAME) ||
+	    !(data.flags & MLX5_NL_CMD_GET_IB_INDEX) ||
+	    !(data.flags & MLX5_NL_CMD_GET_NET_INDEX) ||
+	    !data.ifindex)
+		goto error;
+	return data.ifindex;
+error:
+	rte_errno = ENODEV;
+	return 0;
+}
+
+/**
+ * Get the number of physical ports of given IB device.
+ *
+ * @param nl
+ *   Netlink socket of the RDMA kind (NETLINK_RDMA).
+ * @param[in] name
+ *   IB device name.
+ *
+ * @return
+ *   A valid (nonzero) number of ports on success, 0 otherwise
+ *   and rte_errno is set.
+ */
+unsigned int
+mlx5_nl_portnum(int nl, const char *name)
+{
+	struct mlx5_nl_ifindex_data data = {
+		.flags = 0,
+		.name = name,
+		.ifindex = 0,
+		.portnum = 0,
+	};
+	struct nlmsghdr req = {
+		.nlmsg_len = NLMSG_LENGTH(0),
+		.nlmsg_type = RDMA_NL_GET_TYPE(RDMA_NL_NLDEV,
+					       RDMA_NLDEV_CMD_GET),
+		.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK | NLM_F_DUMP,
+	};
+	uint32_t sn = MLX5_NL_SN_GENERATE;
+	int ret;
+
+	ret = mlx5_nl_send(nl, &req, sn);
+	if (ret < 0)
+		return 0;
+	ret = mlx5_nl_recv(nl, sn, mlx5_nl_cmdget_cb, &data);
+	if (ret < 0)
+		return 0;
+	if (!(data.flags & MLX5_NL_CMD_GET_IB_NAME) ||
+	    !(data.flags & MLX5_NL_CMD_GET_IB_INDEX) ||
+	    !(data.flags & MLX5_NL_CMD_GET_PORT_INDEX)) {
+		rte_errno = ENODEV;
+		return 0;
+	}
+	if (!data.portnum)
+		rte_errno = EINVAL;
+	return data.portnum;
+}
+
+/**
+ * Analyze gathered port parameters via Netlink to recognize master
+ * and representor devices for E-Switch configuration.
+ *
+ * @param[in] num_vf_set
+ *   Flag indicating whether the number-of-VFs port attribute is present.
+ * @param[inout] switch_info
+ *   Port information, including port name as a number and port name
+ *   type if recognized.
+ *
+ * @return
+ *   master and representor flags are set in switch_info according to
+ *   recognized parameters (if any).
+ */
+static void
+mlx5_nl_check_switch_info(bool num_vf_set,
+			  struct mlx5_switch_info *switch_info)
+{
+	switch (switch_info->name_type) {
+	case MLX5_PHYS_PORT_NAME_TYPE_UNKNOWN:
+		/*
+		 * Name is not recognized, assume the master,
+		 * check the number of VFs key presence.
+		 */
+		switch_info->master = num_vf_set;
+		break;
+	case MLX5_PHYS_PORT_NAME_TYPE_NOTSET:
+		/*
+		 * Name is not set, this assumes the legacy naming
+		 * schema for master, just check if there is a
+		 * number of VFs key.
+		 */
+		switch_info->master = num_vf_set;
+		break;
+	case MLX5_PHYS_PORT_NAME_TYPE_UPLINK:
+		/* New uplink naming schema recognized. */
+		switch_info->master = 1;
+		break;
+	case MLX5_PHYS_PORT_NAME_TYPE_LEGACY:
+		/* Legacy representors naming schema. */
+		switch_info->representor = !num_vf_set;
+		break;
+	case MLX5_PHYS_PORT_NAME_TYPE_PFVF:
+		/* New representors naming schema. */
+		switch_info->representor = 1;
+		break;
+	}
+}
+
+/**
+ * Process switch information from Netlink message.
+ *
+ * @param nh
+ *   Pointer to Netlink message header.
+ * @param arg
+ *   Opaque data pointer for this callback.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_nl_switch_info_cb(struct nlmsghdr *nh, void *arg)
+{
+	struct mlx5_switch_info info = {
+		.master = 0,
+		.representor = 0,
+		.name_type = MLX5_PHYS_PORT_NAME_TYPE_NOTSET,
+		.port_name = 0,
+		.switch_id = 0,
+	};
+	size_t off = NLMSG_LENGTH(sizeof(struct ifinfomsg));
+	bool switch_id_set = false;
+	bool num_vf_set = false;
+
+	if (nh->nlmsg_type != RTM_NEWLINK)
+		goto error;
+	while (off < nh->nlmsg_len) {
+		struct rtattr *ra = (void *)((uintptr_t)nh + off);
+		void *payload = RTA_DATA(ra);
+		unsigned int i;
+
+		if (ra->rta_len > nh->nlmsg_len - off)
+			goto error;
+		switch (ra->rta_type) {
+		case IFLA_NUM_VF:
+			num_vf_set = true;
+			break;
+		case IFLA_PHYS_PORT_NAME:
+			mlx5_translate_port_name((char *)payload, &info);
+			break;
+		case IFLA_PHYS_SWITCH_ID:
+			info.switch_id = 0;
+			for (i = 0; i < RTA_PAYLOAD(ra); ++i) {
+				info.switch_id <<= 8;
+				info.switch_id |= ((uint8_t *)payload)[i];
+			}
+			switch_id_set = true;
+			break;
+		}
+		off += RTA_ALIGN(ra->rta_len);
+	}
+	if (switch_id_set) {
+		/* We have some E-Switch configuration. */
+		mlx5_nl_check_switch_info(num_vf_set, &info);
+	}
+	assert(!(info.master && info.representor));
+	memcpy(arg, &info, sizeof(info));
+	return 0;
+error:
+	rte_errno = EINVAL;
+	return -rte_errno;
+}
+
+/**
+ * Get switch information associated with network interface.
+ *
+ * @param nl
+ *   Netlink socket of the ROUTE kind (NETLINK_ROUTE).
+ * @param ifindex
+ *   Network interface index.
+ * @param[out] info
+ *   Switch information object, populated in case of success.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_nl_switch_info(int nl, unsigned int ifindex,
+		    struct mlx5_switch_info *info)
+{
+	struct {
+		struct nlmsghdr nh;
+		struct ifinfomsg info;
+		struct rtattr rta;
+		uint32_t extmask;
+	} req = {
+		.nh = {
+			.nlmsg_len = NLMSG_LENGTH
+					(sizeof(req.info) +
+					 RTA_LENGTH(sizeof(uint32_t))),
+			.nlmsg_type = RTM_GETLINK,
+			.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK,
+		},
+		.info = {
+			.ifi_family = AF_UNSPEC,
+			.ifi_index = ifindex,
+		},
+		.rta = {
+			.rta_type = IFLA_EXT_MASK,
+			.rta_len = RTA_LENGTH(sizeof(uint32_t)),
+		},
+		.extmask = RTE_LE32(1),
+	};
+	uint32_t sn = MLX5_NL_SN_GENERATE;
+	int ret;
+
+	ret = mlx5_nl_send(nl, &req.nh, sn);
+	if (ret >= 0)
+		ret = mlx5_nl_recv(nl, sn, mlx5_nl_switch_info_cb, info);
+	if (info->master && info->representor) {
+		DRV_LOG(ERR, "ifindex %u device is recognized as master"
+			     " and as representor", ifindex);
+		rte_errno = ENODEV;
+		ret = -rte_errno;
+	}
+	return ret;
+}
+
+/*
+ * Delete VLAN network device by ifindex.
+ *
+ * @param[in] vmwa
+ *   Context object initialized by mlx5_nl_vlan_vmwa_init().
+ * @param[in] ifindex
+ *   Interface index of network device to delete.
+ */
+void
+mlx5_nl_vlan_vmwa_delete(struct mlx5_nl_vlan_vmwa_context *vmwa,
+		      uint32_t ifindex)
+{
+	uint32_t sn = MLX5_NL_SN_GENERATE;
+	int ret;
+	struct {
+		struct nlmsghdr nh;
+		struct ifinfomsg info;
+	} req = {
+		.nh = {
+			.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
+			.nlmsg_type = RTM_DELLINK,
+			.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK,
+		},
+		.info = {
+			.ifi_family = AF_UNSPEC,
+			.ifi_index = ifindex,
+		},
+	};
+
+	if (ifindex) {
+		ret = mlx5_nl_send(vmwa->nl_socket, &req.nh, sn);
+		if (ret >= 0)
+			ret = mlx5_nl_recv(vmwa->nl_socket, sn, NULL, NULL);
+		if (ret < 0)
+			DRV_LOG(WARNING, "netlink: error deleting VLAN WA"
+				" ifindex %u, %d", ifindex, ret);
+	}
+}
+
+/* Set of subroutines to build Netlink message. */
+static struct nlattr *
+nl_msg_tail(struct nlmsghdr *nlh)
+{
+	return (struct nlattr *)
+		(((uint8_t *)nlh) + NLMSG_ALIGN(nlh->nlmsg_len));
+}
+
+static void
+nl_attr_put(struct nlmsghdr *nlh, int type, const void *data, int alen)
+{
+	struct nlattr *nla = nl_msg_tail(nlh);
+
+	nla->nla_type = type;
+	nla->nla_len = NLMSG_ALIGN(sizeof(struct nlattr) + alen);
+	nlh->nlmsg_len = NLMSG_ALIGN(nlh->nlmsg_len) + nla->nla_len;
+
+	if (alen)
+		memcpy((uint8_t *)nla + sizeof(struct nlattr), data, alen);
+}
+
+static struct nlattr *
+nl_attr_nest_start(struct nlmsghdr *nlh, int type)
+{
+	struct nlattr *nest = (struct nlattr *)nl_msg_tail(nlh);
+
+	nl_attr_put(nlh, type, NULL, 0);
+	return nest;
+}
+
+static void
+nl_attr_nest_end(struct nlmsghdr *nlh, struct nlattr *nest)
+{
+	nest->nla_len = (uint8_t *)nl_msg_tail(nlh) - (uint8_t *)nest;
+}
+
+/*
+ * Create network VLAN device with specified VLAN tag.
+ *
+ * @param[in] vmwa
+ *   Context object initialized by mlx5_nl_vlan_vmwa_init().
+ * @param[in] ifindex
+ *   Base network interface index.
+ * @param[in] tag
+ *   VLAN tag for VLAN network device to create.
+ */
+uint32_t
+mlx5_nl_vlan_vmwa_create(struct mlx5_nl_vlan_vmwa_context *vmwa,
+			 uint32_t ifindex, uint16_t tag)
+{
+	struct nlmsghdr *nlh;
+	struct ifinfomsg *ifm;
+	char name[sizeof(MLX5_VMWA_VLAN_DEVICE_PFX) + 32];
+
+	alignas(RTE_CACHE_LINE_SIZE)
+	uint8_t buf[NLMSG_ALIGN(sizeof(struct nlmsghdr)) +
+		    NLMSG_ALIGN(sizeof(struct ifinfomsg)) +
+		    NLMSG_ALIGN(sizeof(struct nlattr)) * 8 +
+		    NLMSG_ALIGN(sizeof(uint32_t)) +
+		    NLMSG_ALIGN(sizeof(name)) +
+		    NLMSG_ALIGN(sizeof("vlan")) +
+		    NLMSG_ALIGN(sizeof(uint32_t)) +
+		    NLMSG_ALIGN(sizeof(uint16_t)) + 16];
+	struct nlattr *na_info;
+	struct nlattr *na_vlan;
+	uint32_t sn = MLX5_NL_SN_GENERATE;
+	int ret;
+
+	memset(buf, 0, sizeof(buf));
+	nlh = (struct nlmsghdr *)buf;
+	nlh->nlmsg_len = sizeof(struct nlmsghdr);
+	nlh->nlmsg_type = RTM_NEWLINK;
+	nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE |
+			   NLM_F_EXCL | NLM_F_ACK;
+	ifm = (struct ifinfomsg *)nl_msg_tail(nlh);
+	nlh->nlmsg_len += sizeof(struct ifinfomsg);
+	ifm->ifi_family = AF_UNSPEC;
+	ifm->ifi_type = 0;
+	ifm->ifi_index = 0;
+	ifm->ifi_flags = IFF_UP;
+	ifm->ifi_change = 0xffffffff;
+	nl_attr_put(nlh, IFLA_LINK, &ifindex, sizeof(ifindex));
+	ret = snprintf(name, sizeof(name), "%s.%u.%u",
+		       MLX5_VMWA_VLAN_DEVICE_PFX, ifindex, tag);
+	nl_attr_put(nlh, IFLA_IFNAME, name, ret + 1);
+	na_info = nl_attr_nest_start(nlh, IFLA_LINKINFO);
+	nl_attr_put(nlh, IFLA_INFO_KIND, "vlan", sizeof("vlan"));
+	na_vlan = nl_attr_nest_start(nlh, IFLA_INFO_DATA);
+	nl_attr_put(nlh, IFLA_VLAN_ID, &tag, sizeof(tag));
+	nl_attr_nest_end(nlh, na_vlan);
+	nl_attr_nest_end(nlh, na_info);
+	assert(sizeof(buf) >= nlh->nlmsg_len);
+	ret = mlx5_nl_send(vmwa->nl_socket, nlh, sn);
+	if (ret >= 0)
+		ret = mlx5_nl_recv(vmwa->nl_socket, sn, NULL, NULL);
+	if (ret < 0) {
+		DRV_LOG(WARNING, "netlink: VLAN %s create failure (%d)", name,
+			ret);
+	}
+	/* Try to get ifindex of created or pre-existing device. */
+	ret = if_nametoindex(name);
+	if (!ret) {
+		DRV_LOG(WARNING, "VLAN %s failed to get index (%d)", name,
+			errno);
+		return 0;
+	}
+	return ret;
+}
diff --git a/drivers/common/mlx5/mlx5_nl.h b/drivers/common/mlx5/mlx5_nl.h
new file mode 100644
index 0000000..8e66a98
--- /dev/null
+++ b/drivers/common/mlx5/mlx5_nl.h
@@ -0,0 +1,57 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2019 Mellanox Technologies, Ltd
+ */
+
+#ifndef RTE_PMD_MLX5_NL_H_
+#define RTE_PMD_MLX5_NL_H_
+
+#include <linux/netlink.h>
+
+#include <rte_ether.h>
+
+#include "mlx5_common.h"
+
+
+/* VLAN netdev for VLAN workaround. */
+struct mlx5_nl_vlan_dev {
+	uint32_t refcnt;
+	uint32_t ifindex; /**< Own interface index. */
+};
+
+/*
+ * Array of VLAN devices created on the base of VF
+ * used for workaround in virtual environments.
+ */
+struct mlx5_nl_vlan_vmwa_context {
+	int nl_socket;
+	uint32_t vf_ifindex;
+	struct mlx5_nl_vlan_dev vlan_dev[4096];
+};
+
+
+int mlx5_nl_init(int protocol);
+int mlx5_nl_mac_addr_add(int nlsk_fd, unsigned int iface_idx, uint64_t *mac_own,
+			 struct rte_ether_addr *mac, uint32_t index);
+int mlx5_nl_mac_addr_remove(int nlsk_fd, unsigned int iface_idx,
+			    uint64_t *mac_own, struct rte_ether_addr *mac,
+			    uint32_t index);
+void mlx5_nl_mac_addr_sync(int nlsk_fd, unsigned int iface_idx,
+			   struct rte_ether_addr *mac_addrs, int n);
+void mlx5_nl_mac_addr_flush(int nlsk_fd, unsigned int iface_idx,
+			    struct rte_ether_addr *mac_addrs, int n,
+			    uint64_t *mac_own);
+int mlx5_nl_promisc(int nlsk_fd, unsigned int iface_idx, int enable);
+int mlx5_nl_allmulti(int nlsk_fd, unsigned int iface_idx, int enable);
+unsigned int mlx5_nl_portnum(int nl, const char *name);
+unsigned int mlx5_nl_ifindex(int nl, const char *name, uint32_t pindex);
+int mlx5_nl_vf_mac_addr_modify(int nlsk_fd, unsigned int iface_idx,
+			       struct rte_ether_addr *mac, int vf_index);
+int mlx5_nl_switch_info(int nl, unsigned int ifindex,
+			struct mlx5_switch_info *info);
+
+void mlx5_nl_vlan_vmwa_delete(struct mlx5_nl_vlan_vmwa_context *vmwa,
+			      uint32_t ifindex);
+uint32_t mlx5_nl_vlan_vmwa_create(struct mlx5_nl_vlan_vmwa_context *vmwa,
+				  uint32_t ifindex, uint16_t tag);
+
+#endif /* RTE_PMD_MLX5_NL_H_ */
diff --git a/drivers/common/mlx5/rte_common_mlx5_version.map b/drivers/common/mlx5/rte_common_mlx5_version.map
index d32d631..34b66a5 100644
--- a/drivers/common/mlx5/rte_common_mlx5_version.map
+++ b/drivers/common/mlx5/rte_common_mlx5_version.map
@@ -26,5 +26,21 @@ DPDK_20.02 {
 	mlx5_devx_get_out_command_status;
 
 	mlx5_dev_to_pci_addr;
-	mlx5_vdpa_mode_selected;
+
+	mlx5_nl_allmulti;
+	mlx5_nl_ifindex;
+	mlx5_nl_init;
+	mlx5_nl_mac_addr_add;
+	mlx5_nl_mac_addr_flush;
+	mlx5_nl_mac_addr_remove;
+	mlx5_nl_mac_addr_sync;
+	mlx5_nl_portnum;
+	mlx5_nl_promisc;
+	mlx5_nl_switch_info;
+	mlx5_nl_vf_mac_addr_modify;
+	mlx5_nl_vlan_vmwa_create;
+	mlx5_nl_vlan_vmwa_delete;
+
+	mlx5_translate_port_name;
+	mlx5_vdpa_mode_selected;
 };
diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index dc6b3c8..d26afbb 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -30,7 +30,6 @@ SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_flow_meter.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_flow_dv.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_flow_verbs.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_mp.c
-SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_nl.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_utils.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_socket.c
 
diff --git a/drivers/net/mlx5/meson.build b/drivers/net/mlx5/meson.build
index e10ef3a..d45be00 100644
--- a/drivers/net/mlx5/meson.build
+++ b/drivers/net/mlx5/meson.build
@@ -19,7 +19,6 @@ sources = files(
 	'mlx5_flow_verbs.c',
 	'mlx5_mac.c',
 	'mlx5_mr.c',
-	'mlx5_nl.c',
 	'mlx5_rss.c',
 	'mlx5_rxmode.c',
 	'mlx5_rxq.c',
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 9864aa7..a7e7089 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -35,11 +35,11 @@
 #include <mlx5_glue.h>
 #include <mlx5_devx_cmds.h>
 #include <mlx5_prm.h>
+#include <mlx5_nl.h>
 
 #include "mlx5_defs.h"
 #include "mlx5_utils.h"
 #include "mlx5_mr.h"
-#include "mlx5_nl.h"
 #include "mlx5_autoconf.h"
 
 /* Request types for IPC. */
diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h
index dc9b965..9b392ed 100644
--- a/drivers/net/mlx5/mlx5_defs.h
+++ b/drivers/net/mlx5/mlx5_defs.h
@@ -14,14 +14,6 @@
 /* Reported driver name. */
 #define MLX5_DRIVER_NAME "net_mlx5"
 
-/* Maximum number of simultaneous unicast MAC addresses. */
-#define MLX5_MAX_UC_MAC_ADDRESSES 128
-/* Maximum number of simultaneous Multicast MAC addresses. */
-#define MLX5_MAX_MC_MAC_ADDRESSES 128
-/* Maximum number of simultaneous MAC addresses. */
-#define MLX5_MAX_MAC_ADDRESSES \
-	(MLX5_MAX_UC_MAC_ADDRESSES + MLX5_MAX_MC_MAC_ADDRESSES)
-
 /* Maximum number of simultaneous VLAN filters. */
 #define MLX5_MAX_VLAN_IDS 128
 
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 5484104..b765636 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -1940,61 +1940,6 @@ struct mlx5_priv *
 }
 
 /**
- * Extract port name, as a number, from sysfs or netlink information.
- *
- * @param[in] port_name_in
- *   String representing the port name.
- * @param[out] port_info_out
- *   Port information, including port name as a number and port name
- *   type if recognized
- *
- * @return
- *   port_name field set according to recognized name format.
- */
-void
-mlx5_translate_port_name(const char *port_name_in,
-			 struct mlx5_switch_info *port_info_out)
-{
-	char pf_c1, pf_c2, vf_c1, vf_c2;
-	char *end;
-	int sc_items;
-
-	/*
-	 * Check for port-name as a string of the form pf0vf0
-	 * (support kernel ver >= 5.0 or OFED ver >= 4.6).
-	 */
-	sc_items = sscanf(port_name_in, "%c%c%d%c%c%d",
-			  &pf_c1, &pf_c2, &port_info_out->pf_num,
-			  &vf_c1, &vf_c2, &port_info_out->port_name);
-	if (sc_items == 6 &&
-	    pf_c1 == 'p' && pf_c2 == 'f' &&
-	    vf_c1 == 'v' && vf_c2 == 'f') {
-		port_info_out->name_type = MLX5_PHYS_PORT_NAME_TYPE_PFVF;
-		return;
-	}
-	/*
-	 * Check for port-name as a string of the form p0
-	 * (support kernel ver >= 5.0, or OFED ver >= 4.6).
-	 */
-	sc_items = sscanf(port_name_in, "%c%d",
-			  &pf_c1, &port_info_out->port_name);
-	if (sc_items == 2 && pf_c1 == 'p') {
-		port_info_out->name_type = MLX5_PHYS_PORT_NAME_TYPE_UPLINK;
-		return;
-	}
-	/* Check for port-name as a number (support kernel ver < 5.0 */
-	errno = 0;
-	port_info_out->port_name = strtol(port_name_in, &end, 0);
-	if (!errno &&
-	    (size_t)(end - port_name_in) == strlen(port_name_in)) {
-		port_info_out->name_type = MLX5_PHYS_PORT_NAME_TYPE_LEGACY;
-		return;
-	}
-	port_info_out->name_type = MLX5_PHYS_PORT_NAME_TYPE_UNKNOWN;
-	return;
-}
-
-/**
  * DPDK callback to retrieve plug-in module EEPROM information (type and size).
  *
  * @param dev
diff --git a/drivers/net/mlx5/mlx5_nl.c b/drivers/net/mlx5/mlx5_nl.c
deleted file mode 100644
index 6b8ca00..0000000
--- a/drivers/net/mlx5/mlx5_nl.c
+++ /dev/null
@@ -1,1338 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause
- * Copyright 2018 6WIND S.A.
- * Copyright 2018 Mellanox Technologies, Ltd
- */
-
-#include <errno.h>
-#include <linux/if_link.h>
-#include <linux/rtnetlink.h>
-#include <net/if.h>
-#include <rdma/rdma_netlink.h>
-#include <stdbool.h>
-#include <stdint.h>
-#include <stdlib.h>
-#include <stdalign.h>
-#include <string.h>
-#include <sys/socket.h>
-#include <unistd.h>
-
-#include <rte_errno.h>
-#include <rte_atomic.h>
-#include <rte_ether.h>
-
-#include "mlx5.h"
-#include "mlx5_nl.h"
-#include "mlx5_utils.h"
-
-/* Size of the buffer to receive kernel messages */
-#define MLX5_NL_BUF_SIZE (32 * 1024)
-/* Send buffer size for the Netlink socket */
-#define MLX5_SEND_BUF_SIZE 32768
-/* Receive buffer size for the Netlink socket */
-#define MLX5_RECV_BUF_SIZE 32768
-
-/** Parameters of VLAN devices created by driver. */
-#define MLX5_VMWA_VLAN_DEVICE_PFX "evmlx"
-/*
- * Define NDA_RTA as defined in iproute2 sources.
- *
- * see in iproute2 sources file include/libnetlink.h
- */
-#ifndef MLX5_NDA_RTA
-#define MLX5_NDA_RTA(r) \
-	((struct rtattr *)(((char *)(r)) + NLMSG_ALIGN(sizeof(struct ndmsg))))
-#endif
-/*
- * Define NLMSG_TAIL as defined in iproute2 sources.
- *
- * see in iproute2 sources file include/libnetlink.h
- */
-#ifndef NLMSG_TAIL
-#define NLMSG_TAIL(nmsg) \
-	((struct rtattr *)(((char *)(nmsg)) + NLMSG_ALIGN((nmsg)->nlmsg_len)))
-#endif
-/*
- * The following definitions are normally found in rdma/rdma_netlink.h,
- * however they are so recent that most systems do not expose them yet.
- */
-#ifndef HAVE_RDMA_NL_NLDEV
-#define RDMA_NL_NLDEV 5
-#endif
-#ifndef HAVE_RDMA_NLDEV_CMD_GET
-#define RDMA_NLDEV_CMD_GET 1
-#endif
-#ifndef HAVE_RDMA_NLDEV_CMD_PORT_GET
-#define RDMA_NLDEV_CMD_PORT_GET 5
-#endif
-#ifndef HAVE_RDMA_NLDEV_ATTR_DEV_INDEX
-#define RDMA_NLDEV_ATTR_DEV_INDEX 1
-#endif
-#ifndef HAVE_RDMA_NLDEV_ATTR_DEV_NAME
-#define RDMA_NLDEV_ATTR_DEV_NAME 2
-#endif
-#ifndef HAVE_RDMA_NLDEV_ATTR_PORT_INDEX
-#define RDMA_NLDEV_ATTR_PORT_INDEX 3
-#endif
-#ifndef HAVE_RDMA_NLDEV_ATTR_NDEV_INDEX
-#define RDMA_NLDEV_ATTR_NDEV_INDEX 50
-#endif
-
-/* These are normally found in linux/if_link.h. */
-#ifndef HAVE_IFLA_NUM_VF
-#define IFLA_NUM_VF 21
-#endif
-#ifndef HAVE_IFLA_EXT_MASK
-#define IFLA_EXT_MASK 29
-#endif
-#ifndef HAVE_IFLA_PHYS_SWITCH_ID
-#define IFLA_PHYS_SWITCH_ID 36
-#endif
-#ifndef HAVE_IFLA_PHYS_PORT_NAME
-#define IFLA_PHYS_PORT_NAME 38
-#endif
-
-/* Add/remove MAC address through Netlink */
-struct mlx5_nl_mac_addr {
-	struct rte_ether_addr (*mac)[];
-	/**< MAC address handled by the device. */
-	int mac_n; /**< Number of addresses in the array. */
-};
-
-#define MLX5_NL_CMD_GET_IB_NAME (1 << 0)
-#define MLX5_NL_CMD_GET_IB_INDEX (1 << 1)
-#define MLX5_NL_CMD_GET_NET_INDEX (1 << 2)
-#define MLX5_NL_CMD_GET_PORT_INDEX (1 << 3)
-
-/** Data structure used by mlx5_nl_cmdget_cb(). */
-struct mlx5_nl_ifindex_data {
-	const char *name; /**< IB device name (in). */
-	uint32_t flags; /**< found attribute flags (out). */
-	uint32_t ibindex; /**< IB device index (out). */
-	uint32_t ifindex; /**< Network interface index (out). */
-	uint32_t portnum; /**< IB device max port number (out). */
-};
-
-rte_atomic32_t atomic_sn = RTE_ATOMIC32_INIT(0);
-
-/* Generate Netlink sequence number. */
-#define MLX5_NL_SN_GENERATE ((uint32_t)rte_atomic32_add_return(&atomic_sn, 1))
-
-/**
- * Opens a Netlink socket.
- *
- * @param protocol
- *   Netlink protocol (e.g. NETLINK_ROUTE, NETLINK_RDMA).
- *
- * @return
- *   A file descriptor on success, a negative errno value otherwise and
- *   rte_errno is set.
- */
-int
-mlx5_nl_init(int protocol)
-{
-	int fd;
-	int sndbuf_size = MLX5_SEND_BUF_SIZE;
-	int rcvbuf_size = MLX5_RECV_BUF_SIZE;
-	struct sockaddr_nl local = {
-		.nl_family = AF_NETLINK,
-	};
-	int ret;
-
-	fd = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC, protocol);
-	if (fd == -1) {
-		rte_errno = errno;
-		return -rte_errno;
-	}
-	ret = setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &sndbuf_size, sizeof(int));
-	if (ret == -1) {
-		rte_errno = errno;
-		goto error;
-	}
-	ret = setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf_size, sizeof(int));
-	if (ret == -1) {
-		rte_errno = errno;
-		goto error;
-	}
-	ret = bind(fd, (struct sockaddr *)&local, sizeof(local));
-	if (ret == -1) {
-		rte_errno = errno;
-		goto error;
-	}
-	return fd;
-error:
-	close(fd);
-	return -rte_errno;
-}
-
-/**
- * Send a request message to the kernel on the Netlink socket.
- *
- * @param[in] nlsk_fd
- *   Netlink socket file descriptor.
- * @param[in] nh
- *   The Netlink message send to the kernel.
- * @param[in] ssn
- *   Sequence number.
- * @param[in] req
- *   Pointer to the request structure.
- * @param[in] len
- *   Length of the request in bytes.
- *
- * @return
- *   The number of sent bytes on success, a negative errno value otherwise and
- *   rte_errno is set.
- */
-static int
-mlx5_nl_request(int nlsk_fd, struct nlmsghdr *nh, uint32_t sn, void *req,
-		int len)
-{
-	struct sockaddr_nl sa = {
-		.nl_family = AF_NETLINK,
-	};
-	struct iovec iov[2] = {
-		{ .iov_base = nh, .iov_len = sizeof(*nh), },
-		{ .iov_base = req, .iov_len = len, },
-	};
-	struct msghdr msg = {
-		.msg_name = &sa,
-		.msg_namelen = sizeof(sa),
-		.msg_iov = iov,
-		.msg_iovlen = 2,
-	};
-	int send_bytes;
-
-	nh->nlmsg_pid = 0; /* communication with the kernel uses pid 0 */
-	nh->nlmsg_seq = sn;
-	send_bytes = sendmsg(nlsk_fd, &msg, 0);
-	if (send_bytes < 0) {
-		rte_errno = errno;
-		return -rte_errno;
-	}
-	return send_bytes;
-}
-
-/**
- * Send a message to the kernel on the Netlink socket.
- *
- * @param[in] nlsk_fd
- *   The Netlink socket file descriptor used for communication.
- * @param[in] nh
- *   The Netlink message send to the kernel.
- * @param[in] sn
- *   Sequence number.
- *
- * @return
- *   The number of sent bytes on success, a negative errno value otherwise and
- *   rte_errno is set.
- */
-static int
-mlx5_nl_send(int nlsk_fd, struct nlmsghdr *nh, uint32_t sn)
-{
-	struct sockaddr_nl sa = {
-		.nl_family = AF_NETLINK,
-	};
-	struct iovec iov = {
-		.iov_base = nh,
-		.iov_len = nh->nlmsg_len,
-	};
-	struct msghdr msg = {
-		.msg_name = &sa,
-		.msg_namelen = sizeof(sa),
-		.msg_iov = &iov,
-		.msg_iovlen = 1,
-	};
-	int send_bytes;
-
-	nh->nlmsg_pid = 0; /* communication with the kernel uses pid 0 */
-	nh->nlmsg_seq = sn;
-	send_bytes = sendmsg(nlsk_fd, &msg, 0);
-	if (send_bytes < 0) {
-		rte_errno = errno;
-		return -rte_errno;
-	}
-	return send_bytes;
-}
-
-/**
- * Receive a message from the kernel on the Netlink socket, following
- * mlx5_nl_send().
- *
- * @param[in] nlsk_fd
- *   The Netlink socket file descriptor used for communication.
- * @param[in] sn
- *   Sequence number.
- * @param[in] cb
- *   The callback function to call for each Netlink message received.
- * @param[in, out] arg
- *   Custom arguments for the callback.
- *
- * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
- */
-static int
-mlx5_nl_recv(int nlsk_fd, uint32_t sn, int (*cb)(struct nlmsghdr *, void *arg),
-	     void *arg)
-{
-	struct sockaddr_nl sa;
-	char buf[MLX5_RECV_BUF_SIZE];
-	struct iovec iov = {
-		.iov_base = buf,
-		.iov_len = sizeof(buf),
-	};
-	struct msghdr msg = {
-		.msg_name = &sa,
-		.msg_namelen = sizeof(sa),
-		.msg_iov = &iov,
-		/* One message at a time */
-		.msg_iovlen = 1,
-	};
-	int multipart = 0;
-	int ret = 0;
-
-	do {
-		struct nlmsghdr *nh;
-		int recv_bytes = 0;
-
-		do {
-			recv_bytes = recvmsg(nlsk_fd, &msg, 0);
-			if (recv_bytes == -1) {
-				rte_errno = errno;
-				return -rte_errno;
-			}
-			nh = (struct nlmsghdr *)buf;
-		} while (nh->nlmsg_seq != sn);
-		for (;
-		     NLMSG_OK(nh, (unsigned int)recv_bytes);
-		     nh = NLMSG_NEXT(nh, recv_bytes)) {
-			if (nh->nlmsg_type == NLMSG_ERROR) {
-				struct nlmsgerr *err_data = NLMSG_DATA(nh);
-
-				if (err_data->error < 0) {
-					rte_errno = -err_data->error;
-					return -rte_errno;
-				}
-				/* Ack message. */
-				return 0;
-			}
-			/* Multi-part msgs and their trailing DONE message. */
-			if (nh->nlmsg_flags & NLM_F_MULTI) {
-				if (nh->nlmsg_type == NLMSG_DONE)
-					return 0;
-				multipart = 1;
-			}
-			if (cb) {
-				ret = cb(nh, arg);
-				if (ret < 0)
-					return ret;
-			}
-		}
-	} while (multipart);
-	return ret;
-}
-
-/**
- * Parse Netlink message to retrieve the bridge MAC address.
- *
- * @param nh
- *   Pointer to Netlink Message Header.
- * @param arg
- *   PMD data register with this callback.
- *
- * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
- */
-static int
-mlx5_nl_mac_addr_cb(struct nlmsghdr *nh, void *arg)
-{
-	struct mlx5_nl_mac_addr *data = arg;
-	struct ndmsg *r = NLMSG_DATA(nh);
-	struct rtattr *attribute;
-	int len;
-
-	len = nh->nlmsg_len - NLMSG_LENGTH(sizeof(*r));
-	for (attribute = MLX5_NDA_RTA(r);
-	     RTA_OK(attribute, len);
-	     attribute = RTA_NEXT(attribute, len)) {
-		if (attribute->rta_type == NDA_LLADDR) {
-			if (data->mac_n == MLX5_MAX_MAC_ADDRESSES) {
-				DRV_LOG(WARNING,
-					"not enough room to finalize the"
-					" request");
-				rte_errno = ENOMEM;
-				return -rte_errno;
-			}
-#ifndef NDEBUG
-			char m[18];
-
-			rte_ether_format_addr(m, 18, RTA_DATA(attribute));
-			DRV_LOG(DEBUG, "bridge MAC address %s", m);
-#endif
-			memcpy(&(*data->mac)[data->mac_n++],
-			       RTA_DATA(attribute), RTE_ETHER_ADDR_LEN);
-		}
-	}
-	return 0;
-}
-
-/**
- * Get bridge MAC addresses.
- *
- * @param[in] nlsk_fd
- *   Netlink socket file descriptor.
- * @param[in] iface_idx
- *   Net device interface index.
- * @param mac[out]
- *   Pointer to the array table of MAC addresses to fill.
- *   Its size should be of MLX5_MAX_MAC_ADDRESSES.
- * @param mac_n[out]
- *   Number of entries filled in MAC array.
- *
- * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
- */
-static int
-mlx5_nl_mac_addr_list(int nlsk_fd, unsigned int iface_idx,
-		      struct rte_ether_addr (*mac)[], int *mac_n)
-{
-	struct {
-		struct nlmsghdr	hdr;
-		struct ifinfomsg ifm;
-	} req = {
-		.hdr = {
-			.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
-			.nlmsg_type = RTM_GETNEIGH,
-			.nlmsg_flags = NLM_F_DUMP | NLM_F_REQUEST,
-		},
-		.ifm = {
-			.ifi_family = PF_BRIDGE,
-			.ifi_index = iface_idx,
-		},
-	};
-	struct mlx5_nl_mac_addr data = {
-		.mac = mac,
-		.mac_n = 0,
-	};
-	uint32_t sn = MLX5_NL_SN_GENERATE;
-	int ret;
-
-	if (nlsk_fd == -1)
-		return 0;
-	ret = mlx5_nl_request(nlsk_fd, &req.hdr, sn, &req.ifm,
-			      sizeof(struct ifinfomsg));
-	if (ret < 0)
-		goto error;
-	ret = mlx5_nl_recv(nlsk_fd, sn, mlx5_nl_mac_addr_cb, &data);
-	if (ret < 0)
-		goto error;
-	*mac_n = data.mac_n;
-	return 0;
-error:
-	DRV_LOG(DEBUG, "Interface %u cannot retrieve MAC address list %s",
-		iface_idx, strerror(rte_errno));
-	return -rte_errno;
-}
-
-/**
- * Modify the MAC address neighbour table with Netlink.
- *
- * @param[in] nlsk_fd
- *   Netlink socket file descriptor.
- * @param[in] iface_idx
- *   Net device interface index.
- * @param mac
- *   MAC address to consider.
- * @param add
- *   1 to add the MAC address, 0 to remove the MAC address.
- *
- * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
- */
-static int
-mlx5_nl_mac_addr_modify(int nlsk_fd, unsigned int iface_idx,
-			struct rte_ether_addr *mac, int add)
-{
-	struct {
-		struct nlmsghdr hdr;
-		struct ndmsg ndm;
-		struct rtattr rta;
-		uint8_t buffer[RTE_ETHER_ADDR_LEN];
-	} req = {
-		.hdr = {
-			.nlmsg_len = NLMSG_LENGTH(sizeof(struct ndmsg)),
-			.nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE |
-				NLM_F_EXCL | NLM_F_ACK,
-			.nlmsg_type = add ? RTM_NEWNEIGH : RTM_DELNEIGH,
-		},
-		.ndm = {
-			.ndm_family = PF_BRIDGE,
-			.ndm_state = NUD_NOARP | NUD_PERMANENT,
-			.ndm_ifindex = iface_idx,
-			.ndm_flags = NTF_SELF,
-		},
-		.rta = {
-			.rta_type = NDA_LLADDR,
-			.rta_len = RTA_LENGTH(RTE_ETHER_ADDR_LEN),
-		},
-	};
-	uint32_t sn = MLX5_NL_SN_GENERATE;
-	int ret;
-
-	if (nlsk_fd == -1)
-		return 0;
-	memcpy(RTA_DATA(&req.rta), mac, RTE_ETHER_ADDR_LEN);
-	req.hdr.nlmsg_len = NLMSG_ALIGN(req.hdr.nlmsg_len) +
-		RTA_ALIGN(req.rta.rta_len);
-	ret = mlx5_nl_send(nlsk_fd, &req.hdr, sn);
-	if (ret < 0)
-		goto error;
-	ret = mlx5_nl_recv(nlsk_fd, sn, NULL, NULL);
-	if (ret < 0)
-		goto error;
-	return 0;
-error:
-	DRV_LOG(DEBUG,
-		"Interface %u cannot %s MAC address"
-		" %02X:%02X:%02X:%02X:%02X:%02X %s",
-		iface_idx,
-		add ? "add" : "remove",
-		mac->addr_bytes[0], mac->addr_bytes[1],
-		mac->addr_bytes[2], mac->addr_bytes[3],
-		mac->addr_bytes[4], mac->addr_bytes[5],
-		strerror(rte_errno));
-	return -rte_errno;
-}
-
-/**
- * Modify the VF MAC address neighbour table with Netlink.
- *
- * @param[in] nlsk_fd
- *   Netlink socket file descriptor.
- * @param[in] iface_idx
- *   Net device interface index.
- * @param mac
- *    MAC address to consider.
- * @param vf_index
- *    VF index.
- *
- * @return
- *    0 on success, a negative errno value otherwise and rte_errno is set.
- */
-int
-mlx5_nl_vf_mac_addr_modify(int nlsk_fd, unsigned int iface_idx,
-			   struct rte_ether_addr *mac, int vf_index)
-{
-	int ret;
-	struct {
-		struct nlmsghdr hdr;
-		struct ifinfomsg ifm;
-		struct rtattr vf_list_rta;
-		struct rtattr vf_info_rta;
-		struct rtattr vf_mac_rta;
-		struct ifla_vf_mac ivm;
-	} req = {
-		.hdr = {
-			.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
-			.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK,
-			.nlmsg_type = RTM_BASE,
-		},
-		.ifm = {
-			.ifi_index = iface_idx,
-		},
-		.vf_list_rta = {
-			.rta_type = IFLA_VFINFO_LIST,
-			.rta_len = RTA_ALIGN(RTA_LENGTH(0)),
-		},
-		.vf_info_rta = {
-			.rta_type = IFLA_VF_INFO,
-			.rta_len = RTA_ALIGN(RTA_LENGTH(0)),
-		},
-		.vf_mac_rta = {
-			.rta_type = IFLA_VF_MAC,
-		},
-	};
-	struct ifla_vf_mac ivm = {
-		.vf = vf_index,
-	};
-	uint32_t sn = MLX5_NL_SN_GENERATE;
-
-	memcpy(&ivm.mac, mac, RTE_ETHER_ADDR_LEN);
-	memcpy(RTA_DATA(&req.vf_mac_rta), &ivm, sizeof(ivm));
-
-	req.vf_mac_rta.rta_len = RTA_LENGTH(sizeof(ivm));
-	req.hdr.nlmsg_len = NLMSG_ALIGN(req.hdr.nlmsg_len) +
-		RTA_ALIGN(req.vf_list_rta.rta_len) +
-		RTA_ALIGN(req.vf_info_rta.rta_len) +
-		RTA_ALIGN(req.vf_mac_rta.rta_len);
-	req.vf_list_rta.rta_len = RTE_PTR_DIFF(NLMSG_TAIL(&req.hdr),
-					       &req.vf_list_rta);
-	req.vf_info_rta.rta_len = RTE_PTR_DIFF(NLMSG_TAIL(&req.hdr),
-					       &req.vf_info_rta);
-
-	if (nlsk_fd < 0)
-		return -1;
-	ret = mlx5_nl_send(nlsk_fd, &req.hdr, sn);
-	if (ret < 0)
-		goto error;
-	ret = mlx5_nl_recv(nlsk_fd, sn, NULL, NULL);
-	if (ret < 0)
-		goto error;
-	return 0;
-error:
-	DRV_LOG(ERR,
-		"representor %u cannot set VF MAC address "
-		"%02X:%02X:%02X:%02X:%02X:%02X : %s",
-		vf_index,
-		mac->addr_bytes[0], mac->addr_bytes[1],
-		mac->addr_bytes[2], mac->addr_bytes[3],
-		mac->addr_bytes[4], mac->addr_bytes[5],
-		strerror(rte_errno));
-	return -rte_errno;
-}
-
-/**
- * Add a MAC address.
- *
- * @param[in] nlsk_fd
- *   Netlink socket file descriptor.
- * @param[in] iface_idx
- *   Net device interface index.
- * @param mac_own
- *   BITFIELD_DECLARE array to store the mac.
- * @param mac
- *   MAC address to register.
- * @param index
- *   MAC address index.
- *
- * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
- */
-int
-mlx5_nl_mac_addr_add(int nlsk_fd, unsigned int iface_idx,
-		     uint64_t *mac_own, struct rte_ether_addr *mac,
-		     uint32_t index)
-{
-	int ret;
-
-	ret = mlx5_nl_mac_addr_modify(nlsk_fd, iface_idx, mac, 1);
-	if (!ret)
-		BITFIELD_SET(mac_own, index);
-	if (ret == -EEXIST)
-		return 0;
-	return ret;
-}
-
-/**
- * Remove a MAC address.
- *
- * @param[in] nlsk_fd
- *   Netlink socket file descriptor.
- * @param[in] iface_idx
- *   Net device interface index.
- * @param mac_own
- *   BITFIELD_DECLARE array to store the mac.
- * @param mac
- *   MAC address to remove.
- * @param index
- *   MAC address index.
- *
- * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
- */
-int
-mlx5_nl_mac_addr_remove(int nlsk_fd, unsigned int iface_idx, uint64_t *mac_own,
-			struct rte_ether_addr *mac, uint32_t index)
-{
-	BITFIELD_RESET(mac_own, index);
-	return mlx5_nl_mac_addr_modify(nlsk_fd, iface_idx, mac, 0);
-}
-
-/**
- * Synchronize Netlink bridge table to the internal table.
- *
- * @param[in] nlsk_fd
- *   Netlink socket file descriptor.
- * @param[in] iface_idx
- *   Net device interface index.
- * @param mac_addrs
- *   Mac addresses array to sync.
- * @param n
- *   @p mac_addrs array size.
- */
-void
-mlx5_nl_mac_addr_sync(int nlsk_fd, unsigned int iface_idx,
-		      struct rte_ether_addr *mac_addrs, int n)
-{
-	struct rte_ether_addr macs[n];
-	int macs_n = 0;
-	int i;
-	int ret;
-
-	ret = mlx5_nl_mac_addr_list(nlsk_fd, iface_idx, &macs, &macs_n);
-	if (ret)
-		return;
-	for (i = 0; i != macs_n; ++i) {
-		int j;
-
-		/* Verify the address is not in the array yet. */
-		for (j = 0; j != n; ++j)
-			if (rte_is_same_ether_addr(&macs[i], &mac_addrs[j]))
-				break;
-		if (j != n)
-			continue;
-		/* Find the first entry available. */
-		for (j = 0; j != n; ++j) {
-			if (rte_is_zero_ether_addr(&mac_addrs[j])) {
-				mac_addrs[j] = macs[i];
-				break;
-			}
-		}
-	}
-}
-
-/**
- * Flush all added MAC addresses.
- *
- * @param[in] nlsk_fd
- *   Netlink socket file descriptor.
- * @param[in] iface_idx
- *   Net device interface index.
- * @param[in] mac_addrs
- *   Mac addresses array to flush.
- * @param n
- *   @p mac_addrs array size.
- * @param mac_own
- *   BITFIELD_DECLARE array to store the mac.
- */
-void
-mlx5_nl_mac_addr_flush(int nlsk_fd, unsigned int iface_idx,
-		       struct rte_ether_addr *mac_addrs, int n,
-		       uint64_t *mac_own)
-{
-	int i;
-
-	for (i = n - 1; i >= 0; --i) {
-		struct rte_ether_addr *m = &mac_addrs[i];
-
-		if (BITFIELD_ISSET(mac_own, i))
-			mlx5_nl_mac_addr_remove(nlsk_fd, iface_idx, mac_own, m,
-						i);
-	}
-}
-
-/**
- * Enable promiscuous / all multicast mode through Netlink.
- *
- * @param[in] nlsk_fd
- *   Netlink socket file descriptor.
- * @param[in] iface_idx
- *   Net device interface index.
- * @param flags
- *   IFF_PROMISC for promiscuous, IFF_ALLMULTI for allmulti.
- * @param enable
- *   Nonzero to enable, disable otherwise.
- *
- * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
- */
-static int
-mlx5_nl_device_flags(int nlsk_fd, unsigned int iface_idx, uint32_t flags,
-		     int enable)
-{
-	struct {
-		struct nlmsghdr hdr;
-		struct ifinfomsg ifi;
-	} req = {
-		.hdr = {
-			.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
-			.nlmsg_type = RTM_NEWLINK,
-			.nlmsg_flags = NLM_F_REQUEST,
-		},
-		.ifi = {
-			.ifi_flags = enable ? flags : 0,
-			.ifi_change = flags,
-			.ifi_index = iface_idx,
-		},
-	};
-	uint32_t sn = MLX5_NL_SN_GENERATE;
-	int ret;
-
-	assert(!(flags & ~(IFF_PROMISC | IFF_ALLMULTI)));
-	if (nlsk_fd < 0)
-		return 0;
-	ret = mlx5_nl_send(nlsk_fd, &req.hdr, sn);
-	if (ret < 0)
-		return ret;
-	return 0;
-}
-
-/**
- * Enable promiscuous mode through Netlink.
- *
- * @param[in] nlsk_fd
- *   Netlink socket file descriptor.
- * @param[in] iface_idx
- *   Net device interface index.
- * @param enable
- *   Nonzero to enable, disable otherwise.
- *
- * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
- */
-int
-mlx5_nl_promisc(int nlsk_fd, unsigned int iface_idx, int enable)
-{
-	int ret = mlx5_nl_device_flags(nlsk_fd, iface_idx, IFF_PROMISC, enable);
-
-	if (ret)
-		DRV_LOG(DEBUG,
-			"Interface %u cannot %s promisc mode: Netlink error %s",
-			iface_idx, enable ? "enable" : "disable",
-			strerror(rte_errno));
-	return ret;
-}
-
-/**
- * Enable all multicast mode through Netlink.
- *
- * @param[in] nlsk_fd
- *   Netlink socket file descriptor.
- * @param[in] iface_idx
- *   Net device interface index.
- * @param enable
- *   Nonzero to enable, disable otherwise.
- *
- * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
- */
-int
-mlx5_nl_allmulti(int nlsk_fd, unsigned int iface_idx, int enable)
-{
-	int ret = mlx5_nl_device_flags(nlsk_fd, iface_idx, IFF_ALLMULTI,
-				       enable);
-
-	if (ret)
-		DRV_LOG(DEBUG,
-			"Interface %u cannot %s allmulti : Netlink error %s",
-			iface_idx, enable ? "enable" : "disable",
-			strerror(rte_errno));
-	return ret;
-}
-
-/**
- * Process network interface information from Netlink message.
- *
- * @param nh
- *   Pointer to Netlink message header.
- * @param arg
- *   Opaque data pointer for this callback.
- *
- * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
- */
-static int
-mlx5_nl_cmdget_cb(struct nlmsghdr *nh, void *arg)
-{
-	struct mlx5_nl_ifindex_data *data = arg;
-	struct mlx5_nl_ifindex_data local = {
-		.flags = 0,
-	};
-	size_t off = NLMSG_HDRLEN;
-
-	if (nh->nlmsg_type !=
-	    RDMA_NL_GET_TYPE(RDMA_NL_NLDEV, RDMA_NLDEV_CMD_GET) &&
-	    nh->nlmsg_type !=
-	    RDMA_NL_GET_TYPE(RDMA_NL_NLDEV, RDMA_NLDEV_CMD_PORT_GET))
-		goto error;
-	while (off < nh->nlmsg_len) {
-		struct nlattr *na = (void *)((uintptr_t)nh + off);
-		void *payload = (void *)((uintptr_t)na + NLA_HDRLEN);
-
-		if (na->nla_len > nh->nlmsg_len - off)
-			goto error;
-		switch (na->nla_type) {
-		case RDMA_NLDEV_ATTR_DEV_INDEX:
-			local.ibindex = *(uint32_t *)payload;
-			local.flags |= MLX5_NL_CMD_GET_IB_INDEX;
-			break;
-		case RDMA_NLDEV_ATTR_DEV_NAME:
-			if (!strcmp(payload, data->name))
-				local.flags |= MLX5_NL_CMD_GET_IB_NAME;
-			break;
-		case RDMA_NLDEV_ATTR_NDEV_INDEX:
-			local.ifindex = *(uint32_t *)payload;
-			local.flags |= MLX5_NL_CMD_GET_NET_INDEX;
-			break;
-		case RDMA_NLDEV_ATTR_PORT_INDEX:
-			local.portnum = *(uint32_t *)payload;
-			local.flags |= MLX5_NL_CMD_GET_PORT_INDEX;
-			break;
-		default:
-			break;
-		}
-		off += NLA_ALIGN(na->nla_len);
-	}
-	/*
-	 * It is possible to have multiple messages for all
-	 * Infiniband devices in the system with appropriate name.
-	 * So we should gather parameters locally and copy to
-	 * query context only in case of coinciding device name.
-	 */
-	if (local.flags & MLX5_NL_CMD_GET_IB_NAME) {
-		data->flags = local.flags;
-		data->ibindex = local.ibindex;
-		data->ifindex = local.ifindex;
-		data->portnum = local.portnum;
-	}
-	return 0;
-error:
-	rte_errno = EINVAL;
-	return -rte_errno;
-}
-
-/**
- * Get index of network interface associated with some IB device.
- *
- * This is the only somewhat safe method to avoid resorting to heuristics
- * when faced with port representors. Unfortunately it requires at least
- * Linux 4.17.
- *
- * @param nl
- *   Netlink socket of the RDMA kind (NETLINK_RDMA).
- * @param[in] name
- *   IB device name.
- * @param[in] pindex
- *   IB device port index, starting from 1
- * @return
- *   A valid (nonzero) interface index on success, 0 otherwise and rte_errno
- *   is set.
- */
-unsigned int
-mlx5_nl_ifindex(int nl, const char *name, uint32_t pindex)
-{
-	struct mlx5_nl_ifindex_data data = {
-		.name = name,
-		.flags = 0,
-		.ibindex = 0, /* Determined during first pass. */
-		.ifindex = 0, /* Determined during second pass. */
-	};
-	union {
-		struct nlmsghdr nh;
-		uint8_t buf[NLMSG_HDRLEN +
-			    NLA_HDRLEN + NLA_ALIGN(sizeof(data.ibindex)) +
-			    NLA_HDRLEN + NLA_ALIGN(sizeof(pindex))];
-	} req = {
-		.nh = {
-			.nlmsg_len = NLMSG_LENGTH(0),
-			.nlmsg_type = RDMA_NL_GET_TYPE(RDMA_NL_NLDEV,
-						       RDMA_NLDEV_CMD_GET),
-			.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK | NLM_F_DUMP,
-		},
-	};
-	struct nlattr *na;
-	uint32_t sn = MLX5_NL_SN_GENERATE;
-	int ret;
-
-	ret = mlx5_nl_send(nl, &req.nh, sn);
-	if (ret < 0)
-		return 0;
-	ret = mlx5_nl_recv(nl, sn, mlx5_nl_cmdget_cb, &data);
-	if (ret < 0)
-		return 0;
-	if (!(data.flags & MLX5_NL_CMD_GET_IB_NAME) ||
-	    !(data.flags & MLX5_NL_CMD_GET_IB_INDEX))
-		goto error;
-	data.flags = 0;
-	sn = MLX5_NL_SN_GENERATE;
-	req.nh.nlmsg_type = RDMA_NL_GET_TYPE(RDMA_NL_NLDEV,
-					     RDMA_NLDEV_CMD_PORT_GET);
-	req.nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
-	req.nh.nlmsg_len = NLMSG_LENGTH(sizeof(req.buf) - NLMSG_HDRLEN);
-	na = (void *)((uintptr_t)req.buf + NLMSG_HDRLEN);
-	na->nla_len = NLA_HDRLEN + sizeof(data.ibindex);
-	na->nla_type = RDMA_NLDEV_ATTR_DEV_INDEX;
-	memcpy((void *)((uintptr_t)na + NLA_HDRLEN),
-	       &data.ibindex, sizeof(data.ibindex));
-	na = (void *)((uintptr_t)na + NLA_ALIGN(na->nla_len));
-	na->nla_len = NLA_HDRLEN + sizeof(pindex);
-	na->nla_type = RDMA_NLDEV_ATTR_PORT_INDEX;
-	memcpy((void *)((uintptr_t)na + NLA_HDRLEN),
-	       &pindex, sizeof(pindex));
-	ret = mlx5_nl_send(nl, &req.nh, sn);
-	if (ret < 0)
-		return 0;
-	ret = mlx5_nl_recv(nl, sn, mlx5_nl_cmdget_cb, &data);
-	if (ret < 0)
-		return 0;
-	if (!(data.flags & MLX5_NL_CMD_GET_IB_NAME) ||
-	    !(data.flags & MLX5_NL_CMD_GET_IB_INDEX) ||
-	    !(data.flags & MLX5_NL_CMD_GET_NET_INDEX) ||
-	    !data.ifindex)
-		goto error;
-	return data.ifindex;
-error:
-	rte_errno = ENODEV;
-	return 0;
-}
-
-/**
- * Get the number of physical ports of given IB device.
- *
- * @param nl
- *   Netlink socket of the RDMA kind (NETLINK_RDMA).
- * @param[in] name
- *   IB device name.
- *
- * @return
- *   A valid (nonzero) number of ports on success, 0 otherwise
- *   and rte_errno is set.
- */
-unsigned int
-mlx5_nl_portnum(int nl, const char *name)
-{
-	struct mlx5_nl_ifindex_data data = {
-		.flags = 0,
-		.name = name,
-		.ifindex = 0,
-		.portnum = 0,
-	};
-	struct nlmsghdr req = {
-		.nlmsg_len = NLMSG_LENGTH(0),
-		.nlmsg_type = RDMA_NL_GET_TYPE(RDMA_NL_NLDEV,
-					       RDMA_NLDEV_CMD_GET),
-		.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK | NLM_F_DUMP,
-	};
-	uint32_t sn = MLX5_NL_SN_GENERATE;
-	int ret;
-
-	ret = mlx5_nl_send(nl, &req, sn);
-	if (ret < 0)
-		return 0;
-	ret = mlx5_nl_recv(nl, sn, mlx5_nl_cmdget_cb, &data);
-	if (ret < 0)
-		return 0;
-	if (!(data.flags & MLX5_NL_CMD_GET_IB_NAME) ||
-	    !(data.flags & MLX5_NL_CMD_GET_IB_INDEX) ||
-	    !(data.flags & MLX5_NL_CMD_GET_PORT_INDEX)) {
-		rte_errno = ENODEV;
-		return 0;
-	}
-	if (!data.portnum)
-		rte_errno = EINVAL;
-	return data.portnum;
-}
-
-/**
- * Analyze gathered port parameters via Netlink to recognize master
- * and representor devices for E-Switch configuration.
- *
- * @param[in] num_vf_set
- *   flag of presence of number of VFs port attribute.
- * @param[inout] switch_info
- *   Port information, including port name as a number and port name
- *   type if recognized
- *
- * @return
- *   master and representor flags are set in switch_info according to
- *   recognized parameters (if any).
- */
-static void
-mlx5_nl_check_switch_info(bool num_vf_set,
-			  struct mlx5_switch_info *switch_info)
-{
-	switch (switch_info->name_type) {
-	case MLX5_PHYS_PORT_NAME_TYPE_UNKNOWN:
-		/*
-		 * Name is not recognized, assume the master,
-		 * check the number of VFs key presence.
-		 */
-		switch_info->master = num_vf_set;
-		break;
-	case MLX5_PHYS_PORT_NAME_TYPE_NOTSET:
-		/*
-		 * Name is not set, this assumes the legacy naming
-		 * schema for master, just check if there is a
-		 * number of VFs key.
-		 */
-		switch_info->master = num_vf_set;
-		break;
-	case MLX5_PHYS_PORT_NAME_TYPE_UPLINK:
-		/* New uplink naming schema recognized. */
-		switch_info->master = 1;
-		break;
-	case MLX5_PHYS_PORT_NAME_TYPE_LEGACY:
-		/* Legacy representors naming schema. */
-		switch_info->representor = !num_vf_set;
-		break;
-	case MLX5_PHYS_PORT_NAME_TYPE_PFVF:
-		/* New representors naming schema. */
-		switch_info->representor = 1;
-		break;
-	}
-}
-
-/**
- * Process switch information from Netlink message.
- *
- * @param nh
- *   Pointer to Netlink message header.
- * @param arg
- *   Opaque data pointer for this callback.
- *
- * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
- */
-static int
-mlx5_nl_switch_info_cb(struct nlmsghdr *nh, void *arg)
-{
-	struct mlx5_switch_info info = {
-		.master = 0,
-		.representor = 0,
-		.name_type = MLX5_PHYS_PORT_NAME_TYPE_NOTSET,
-		.port_name = 0,
-		.switch_id = 0,
-	};
-	size_t off = NLMSG_LENGTH(sizeof(struct ifinfomsg));
-	bool switch_id_set = false;
-	bool num_vf_set = false;
-
-	if (nh->nlmsg_type != RTM_NEWLINK)
-		goto error;
-	while (off < nh->nlmsg_len) {
-		struct rtattr *ra = (void *)((uintptr_t)nh + off);
-		void *payload = RTA_DATA(ra);
-		unsigned int i;
-
-		if (ra->rta_len > nh->nlmsg_len - off)
-			goto error;
-		switch (ra->rta_type) {
-		case IFLA_NUM_VF:
-			num_vf_set = true;
-			break;
-		case IFLA_PHYS_PORT_NAME:
-			mlx5_translate_port_name((char *)payload, &info);
-			break;
-		case IFLA_PHYS_SWITCH_ID:
-			info.switch_id = 0;
-			for (i = 0; i < RTA_PAYLOAD(ra); ++i) {
-				info.switch_id <<= 8;
-				info.switch_id |= ((uint8_t *)payload)[i];
-			}
-			switch_id_set = true;
-			break;
-		}
-		off += RTA_ALIGN(ra->rta_len);
-	}
-	if (switch_id_set) {
-		/* We have some E-Switch configuration. */
-		mlx5_nl_check_switch_info(num_vf_set, &info);
-	}
-	assert(!(info.master && info.representor));
-	memcpy(arg, &info, sizeof(info));
-	return 0;
-error:
-	rte_errno = EINVAL;
-	return -rte_errno;
-}
-
-/**
- * Get switch information associated with network interface.
- *
- * @param nl
- *   Netlink socket of the ROUTE kind (NETLINK_ROUTE).
- * @param ifindex
- *   Network interface index.
- * @param[out] info
- *   Switch information object, populated in case of success.
- *
- * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
- */
-int
-mlx5_nl_switch_info(int nl, unsigned int ifindex,
-		    struct mlx5_switch_info *info)
-{
-	struct {
-		struct nlmsghdr nh;
-		struct ifinfomsg info;
-		struct rtattr rta;
-		uint32_t extmask;
-	} req = {
-		.nh = {
-			.nlmsg_len = NLMSG_LENGTH
-					(sizeof(req.info) +
-					 RTA_LENGTH(sizeof(uint32_t))),
-			.nlmsg_type = RTM_GETLINK,
-			.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK,
-		},
-		.info = {
-			.ifi_family = AF_UNSPEC,
-			.ifi_index = ifindex,
-		},
-		.rta = {
-			.rta_type = IFLA_EXT_MASK,
-			.rta_len = RTA_LENGTH(sizeof(int32_t)),
-		},
-		.extmask = RTE_LE32(1),
-	};
-	uint32_t sn = MLX5_NL_SN_GENERATE;
-	int ret;
-
-	ret = mlx5_nl_send(nl, &req.nh, sn);
-	if (ret >= 0)
-		ret = mlx5_nl_recv(nl, sn, mlx5_nl_switch_info_cb, info);
-	if (info->master && info->representor) {
-		DRV_LOG(ERR, "ifindex %u device is recognized as master"
-			     " and as representor", ifindex);
-		rte_errno = ENODEV;
-		ret = -rte_errno;
-	}
-	return ret;
-}
-
-/*
- * Delete VLAN network device by ifindex.
- *
- * @param[in] tcf
- *   Context object initialized by mlx5_nl_vlan_vmwa_init().
- * @param[in] ifindex
- *   Interface index of network device to delete.
- */
-void
-mlx5_nl_vlan_vmwa_delete(struct mlx5_nl_vlan_vmwa_context *vmwa,
-		      uint32_t ifindex)
-{
-	uint32_t sn = MLX5_NL_SN_GENERATE;
-	int ret;
-	struct {
-		struct nlmsghdr nh;
-		struct ifinfomsg info;
-	} req = {
-		.nh = {
-			.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
-			.nlmsg_type = RTM_DELLINK,
-			.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK,
-		},
-		.info = {
-			.ifi_family = AF_UNSPEC,
-			.ifi_index = ifindex,
-		},
-	};
-
-	if (ifindex) {
-		ret = mlx5_nl_send(vmwa->nl_socket, &req.nh, sn);
-		if (ret >= 0)
-			ret = mlx5_nl_recv(vmwa->nl_socket, sn, NULL, NULL);
-		if (ret < 0)
-			DRV_LOG(WARNING, "netlink: error deleting VLAN WA"
-				" ifindex %u, %d", ifindex, ret);
-	}
-}
-
-/* Set of subroutines to build Netlink message. */
-static struct nlattr *
-nl_msg_tail(struct nlmsghdr *nlh)
-{
-	return (struct nlattr *)
-		(((uint8_t *)nlh) + NLMSG_ALIGN(nlh->nlmsg_len));
-}
-
-static void
-nl_attr_put(struct nlmsghdr *nlh, int type, const void *data, int alen)
-{
-	struct nlattr *nla = nl_msg_tail(nlh);
-
-	nla->nla_type = type;
-	nla->nla_len = NLMSG_ALIGN(sizeof(struct nlattr) + alen);
-	nlh->nlmsg_len = NLMSG_ALIGN(nlh->nlmsg_len) + nla->nla_len;
-
-	if (alen)
-		memcpy((uint8_t *)nla + sizeof(struct nlattr), data, alen);
-}
-
-static struct nlattr *
-nl_attr_nest_start(struct nlmsghdr *nlh, int type)
-{
-	struct nlattr *nest = (struct nlattr *)nl_msg_tail(nlh);
-
-	nl_attr_put(nlh, type, NULL, 0);
-	return nest;
-}
-
-static void
-nl_attr_nest_end(struct nlmsghdr *nlh, struct nlattr *nest)
-{
-	nest->nla_len = (uint8_t *)nl_msg_tail(nlh) - (uint8_t *)nest;
-}
-
-/*
- * Create network VLAN device with specified VLAN tag.
- *
- * @param[in] tcf
- *   Context object initialized by mlx5_nl_vlan_vmwa_init().
- * @param[in] ifindex
- *   Base network interface index.
- * @param[in] tag
- *   VLAN tag for VLAN network device to create.
- */
-uint32_t
-mlx5_nl_vlan_vmwa_create(struct mlx5_nl_vlan_vmwa_context *vmwa,
-			 uint32_t ifindex, uint16_t tag)
-{
-	struct nlmsghdr *nlh;
-	struct ifinfomsg *ifm;
-	char name[sizeof(MLX5_VMWA_VLAN_DEVICE_PFX) + 32];
-
-	alignas(RTE_CACHE_LINE_SIZE)
-	uint8_t buf[NLMSG_ALIGN(sizeof(struct nlmsghdr)) +
-		    NLMSG_ALIGN(sizeof(struct ifinfomsg)) +
-		    NLMSG_ALIGN(sizeof(struct nlattr)) * 8 +
-		    NLMSG_ALIGN(sizeof(uint32_t)) +
-		    NLMSG_ALIGN(sizeof(name)) +
-		    NLMSG_ALIGN(sizeof("vlan")) +
-		    NLMSG_ALIGN(sizeof(uint32_t)) +
-		    NLMSG_ALIGN(sizeof(uint16_t)) + 16];
-	struct nlattr *na_info;
-	struct nlattr *na_vlan;
-	uint32_t sn = MLX5_NL_SN_GENERATE;
-	int ret;
-
-	memset(buf, 0, sizeof(buf));
-	nlh = (struct nlmsghdr *)buf;
-	nlh->nlmsg_len = sizeof(struct nlmsghdr);
-	nlh->nlmsg_type = RTM_NEWLINK;
-	nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE |
-			   NLM_F_EXCL | NLM_F_ACK;
-	ifm = (struct ifinfomsg *)nl_msg_tail(nlh);
-	nlh->nlmsg_len += sizeof(struct ifinfomsg);
-	ifm->ifi_family = AF_UNSPEC;
-	ifm->ifi_type = 0;
-	ifm->ifi_index = 0;
-	ifm->ifi_flags = IFF_UP;
-	ifm->ifi_change = 0xffffffff;
-	nl_attr_put(nlh, IFLA_LINK, &ifindex, sizeof(ifindex));
-	ret = snprintf(name, sizeof(name), "%s.%u.%u",
-		       MLX5_VMWA_VLAN_DEVICE_PFX, ifindex, tag);
-	nl_attr_put(nlh, IFLA_IFNAME, name, ret + 1);
-	na_info = nl_attr_nest_start(nlh, IFLA_LINKINFO);
-	nl_attr_put(nlh, IFLA_INFO_KIND, "vlan", sizeof("vlan"));
-	na_vlan = nl_attr_nest_start(nlh, IFLA_INFO_DATA);
-	nl_attr_put(nlh, IFLA_VLAN_ID, &tag, sizeof(tag));
-	nl_attr_nest_end(nlh, na_vlan);
-	nl_attr_nest_end(nlh, na_info);
-	assert(sizeof(buf) >= nlh->nlmsg_len);
-	ret = mlx5_nl_send(vmwa->nl_socket, nlh, sn);
-	if (ret >= 0)
-		ret = mlx5_nl_recv(vmwa->nl_socket, sn, NULL, NULL);
-	if (ret < 0) {
-		DRV_LOG(WARNING, "netlink: VLAN %s create failure (%d)", name,
-			ret);
-	}
-	// Try to get ifindex of created or pre-existing device.
-	ret = if_nametoindex(name);
-	if (!ret) {
-		DRV_LOG(WARNING, "VLAN %s failed to get index (%d)", name,
-			errno);
-		return 0;
-	}
-	return ret;
-}
diff --git a/drivers/net/mlx5/mlx5_nl.h b/drivers/net/mlx5/mlx5_nl.h
deleted file mode 100644
index 9be87c0..0000000
--- a/drivers/net/mlx5/mlx5_nl.h
+++ /dev/null
@@ -1,72 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause
- * Copyright 2019 Mellanox Technologies, Ltd
- */
-
-#ifndef RTE_PMD_MLX5_NL_H_
-#define RTE_PMD_MLX5_NL_H_
-
-#include <linux/netlink.h>
-
-
-/* Recognized Infiniband device physical port name types. */
-enum mlx5_nl_phys_port_name_type {
-	MLX5_PHYS_PORT_NAME_TYPE_NOTSET = 0, /* Not set. */
-	MLX5_PHYS_PORT_NAME_TYPE_LEGACY, /* before kernel ver < 5.0 */
-	MLX5_PHYS_PORT_NAME_TYPE_UPLINK, /* p0, kernel ver >= 5.0 */
-	MLX5_PHYS_PORT_NAME_TYPE_PFVF, /* pf0vf0, kernel ver >= 5.0 */
-	MLX5_PHYS_PORT_NAME_TYPE_UNKNOWN, /* Unrecognized. */
-};
-
-/** Switch information returned by mlx5_nl_switch_info(). */
-struct mlx5_switch_info {
-	uint32_t master:1; /**< Master device. */
-	uint32_t representor:1; /**< Representor device. */
-	enum mlx5_nl_phys_port_name_type name_type; /** < Port name type. */
-	int32_t pf_num; /**< PF number (valid for pfxvfx format only). */
-	int32_t port_name; /**< Representor port name. */
-	uint64_t switch_id; /**< Switch identifier. */
-};
-
-/* VLAN netdev for VLAN workaround. */
-struct mlx5_nl_vlan_dev {
-	uint32_t refcnt;
-	uint32_t ifindex; /**< Own interface index. */
-};
-
-/*
- * Array of VLAN devices created on the base of VF
- * used for workaround in virtual environments.
- */
-struct mlx5_nl_vlan_vmwa_context {
-	int nl_socket;
-	uint32_t vf_ifindex;
-	struct mlx5_nl_vlan_dev vlan_dev[4096];
-};
-
-
-int mlx5_nl_init(int protocol);
-int mlx5_nl_mac_addr_add(int nlsk_fd, unsigned int iface_idx, uint64_t *mac_own,
-			 struct rte_ether_addr *mac, uint32_t index);
-int mlx5_nl_mac_addr_remove(int nlsk_fd, unsigned int iface_idx,
-			    uint64_t *mac_own, struct rte_ether_addr *mac,
-			    uint32_t index);
-void mlx5_nl_mac_addr_sync(int nlsk_fd, unsigned int iface_idx,
-			   struct rte_ether_addr *mac_addrs, int n);
-void mlx5_nl_mac_addr_flush(int nlsk_fd, unsigned int iface_idx,
-			    struct rte_ether_addr *mac_addrs, int n,
-			    uint64_t *mac_own);
-int mlx5_nl_promisc(int nlsk_fd, unsigned int iface_idx, int enable);
-int mlx5_nl_allmulti(int nlsk_fd, unsigned int iface_idx, int enable);
-unsigned int mlx5_nl_portnum(int nl, const char *name);
-unsigned int mlx5_nl_ifindex(int nl, const char *name, uint32_t pindex);
-int mlx5_nl_vf_mac_addr_modify(int nlsk_fd, unsigned int iface_idx,
-			       struct rte_ether_addr *mac, int vf_index);
-int mlx5_nl_switch_info(int nl, unsigned int ifindex,
-			struct mlx5_switch_info *info);
-
-void mlx5_nl_vlan_vmwa_delete(struct mlx5_nl_vlan_vmwa_context *vmwa,
-			      uint32_t ifindex);
-uint32_t mlx5_nl_vlan_vmwa_create(struct mlx5_nl_vlan_vmwa_context *vmwa,
-				  uint32_t ifindex, uint16_t tag);
-
-#endif /* RTE_PMD_MLX5_NL_H_ */
diff --git a/drivers/net/mlx5/mlx5_vlan.c b/drivers/net/mlx5/mlx5_vlan.c
index fc1a91c..8e63b67 100644
--- a/drivers/net/mlx5/mlx5_vlan.c
+++ b/drivers/net/mlx5/mlx5_vlan.c
@@ -33,11 +33,11 @@
 
 #include <mlx5_glue.h>
 #include <mlx5_devx_cmds.h>
+#include <mlx5_nl.h>
 
 #include "mlx5.h"
 #include "mlx5_autoconf.h"
 #include "mlx5_rxtx.h"
-#include "mlx5_nl.h"
 #include "mlx5_utils.h"
 
 /**
-- 
1.8.3.1
^ permalink raw reply	[flat|nested] 174+ messages in thread
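Note on the attribute helpers moved by the patch above: nl_attr_nest_start() and nl_attr_nest_end() build nested attributes by reserving an empty header and patching its length once the children are in place. The following is a minimal standalone sketch of that technique, not the driver code. All sketch_* names are hypothetical, the caller is assumed to pass zeroed storage of at least 60 bytes, and, unlike the helpers above, it stores the unpadded length in nla_len and pads only when advancing the message tail.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/if_link.h>

/* Hypothetical helpers (sketch_*); not part of the mlx5 sources. */
static struct nlattr *
sketch_tail(struct nlmsghdr *nlh)
{
	return (struct nlattr *)((uint8_t *)nlh + NLMSG_ALIGN(nlh->nlmsg_len));
}

/* Append one attribute; nla_len is unpadded, the message tail is padded. */
static struct nlattr *
sketch_put(struct nlmsghdr *nlh, uint16_t type, const void *data, uint16_t alen)
{
	struct nlattr *nla = sketch_tail(nlh);

	nla->nla_type = type;
	nla->nla_len = NLA_HDRLEN + alen;
	if (alen)
		memcpy((uint8_t *)nla + NLA_HDRLEN, data, alen);
	nlh->nlmsg_len = NLMSG_ALIGN(nlh->nlmsg_len) + NLA_ALIGN(nla->nla_len);
	return nla;
}

/* Close a nest: its length spans from its header to the current tail. */
static void
sketch_nest_end(struct nlmsghdr *nlh, struct nlattr *nest)
{
	nest->nla_len = (uint8_t *)sketch_tail(nlh) - (uint8_t *)nest;
}

/* Build IFLA_LINKINFO { IFLA_INFO_KIND "vlan", IFLA_INFO_DATA { VLAN_ID } }. */
static uint32_t
sketch_vlan_linkinfo(uint32_t *storage, uint16_t tag,
		     uint16_t *info_len, uint16_t *data_len)
{
	struct nlmsghdr *nlh = (struct nlmsghdr *)storage;
	struct nlattr *info, *data;

	nlh->nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
	info = sketch_put(nlh, IFLA_LINKINFO, NULL, 0);
	sketch_put(nlh, IFLA_INFO_KIND, "vlan", sizeof("vlan"));
	data = sketch_put(nlh, IFLA_INFO_DATA, NULL, 0);
	sketch_put(nlh, IFLA_VLAN_ID, &tag, sizeof(tag));
	sketch_nest_end(nlh, data);
	sketch_nest_end(nlh, info);
	*info_len = info->nla_len;
	*data_len = data->nla_len;
	return nlh->nlmsg_len;
}
```

The inner nest has to be closed before the outer one, because each nest length is measured up to the message tail at the time it is closed.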
* [dpdk-dev] [PATCH v3 25/25] common/mlx5: support ROCE disable through Netlink
  2020-01-28 16:27   ` [dpdk-dev] [PATCH v3 00/25] Introduce mlx5 common library Matan Azrad
                       ` (23 preceding siblings ...)
  2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 24/25] common/mlx5: share Netlink commands Matan Azrad
@ 2020-01-28 16:27     ` Matan Azrad
  2020-01-29 12:38     ` [dpdk-dev] [PATCH v4 00/25] Introduce mlx5 common library Matan Azrad
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-28 16:27 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
Add 4 new Netlink commands to support enabling/disabling ROCE:
        1. mlx5_nl_devlink_family_id_get to get the Devlink family ID of
           Netlink general command.
        2. mlx5_nl_enable_roce_get to get the ROCE current status.
        3. mlx5_nl_driver_reload - to reload the device kernel driver.
        4. mlx5_nl_enable_roce_set - to set the ROCE status.
When the user changes the ROCE status, the IB device may disappear and
appear again, so the DPDK driver should wait for it and restart itself.
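The commands above are all built from length-prefixed Netlink attributes.
As a rough, standalone sketch of the layout this patch relies on (the
attribute length covers the header plus the unpadded payload, while the
message length advances by the padded size), the following uses local
stand-ins only. SK_ALIGN, struct sk_attr, sk_attr_put and sk_attr_find are
hypothetical names for illustration, not the driver's or the kernel's API:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local stand-in for 4-byte Netlink alignment (assumption, not NLMSG_ALIGN). */
#define SK_ALIGN(len) ((((uint32_t)(len)) + 3u) & ~3u)

/* Local stand-in for an attribute header: length first, then type. */
struct sk_attr {
	uint16_t len;  /* Header plus payload, without trailing padding. */
	uint16_t type;
};

/* Append one attribute at offset *msg_len; buf must be zeroed and large enough. */
static void
sk_attr_put(uint8_t *buf, uint32_t *msg_len, uint16_t type,
	    const void *data, uint16_t alen)
{
	struct sk_attr hdr;

	assert(alen == 0 || data != NULL);
	hdr.type = type;
	hdr.len = (uint16_t)(SK_ALIGN(sizeof(hdr)) + alen);
	memcpy(buf + *msg_len, &hdr, sizeof(hdr));
	if (alen)
		memcpy(buf + *msg_len + SK_ALIGN(sizeof(hdr)), data, alen);
	/* Only the running message length is padded to the alignment. */
	*msg_len += SK_ALIGN(hdr.len);
}

/* Return the payload of the first attribute of the given type, or NULL. */
static const void *
sk_attr_find(const uint8_t *buf, uint32_t msg_len, uint16_t type)
{
	uint32_t off = 0;

	while (off + sizeof(struct sk_attr) <= msg_len) {
		struct sk_attr hdr;

		memcpy(&hdr, buf + off, sizeof(hdr));
		if (hdr.len < sizeof(hdr))
			break; /* Malformed attribute, stop walking. */
		if (hdr.type == type)
			return buf + off + SK_ALIGN(sizeof(hdr));
		off += SK_ALIGN(hdr.len);
	}
	return NULL;
}
```

With a zeroed buffer, putting "pci" (4 bytes) and then a 13-byte PCI address
string advances the message length by 8 and then 20 bytes, although the
second attribute records a length of 17. The walk steps by the padded
length, in the same way the family ID callback in this patch does.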
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/common/mlx5/Makefile                    |   5 +
 drivers/common/mlx5/meson.build                 |   1 +
 drivers/common/mlx5/mlx5_nl.c                   | 366 +++++++++++++++++++++++-
 drivers/common/mlx5/mlx5_nl.h                   |   6 +
 drivers/common/mlx5/rte_common_mlx5_version.map |   4 +
 5 files changed, 380 insertions(+), 2 deletions(-)
diff --git a/drivers/common/mlx5/Makefile b/drivers/common/mlx5/Makefile
index 6a14b7d..9d4d81f 100644
--- a/drivers/common/mlx5/Makefile
+++ b/drivers/common/mlx5/Makefile
@@ -260,6 +260,11 @@ mlx5_autoconf.h.new: $(RTE_SDK)/buildtools/auto-config-h.sh
 		enum IFLA_PHYS_PORT_NAME \
 		$(AUTOCONF_OUTPUT)
 	$Q sh -- '$<' '$@' \
+		HAVE_DEVLINK \
+		linux/devlink.h \
+		define DEVLINK_GENL_NAME \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
 		HAVE_SUPPORTED_40000baseKR4_Full \
 		/usr/include/linux/ethtool.h \
 		define SUPPORTED_40000baseKR4_Full \
diff --git a/drivers/common/mlx5/meson.build b/drivers/common/mlx5/meson.build
index 34cb7b9..fdd1e85 100644
--- a/drivers/common/mlx5/meson.build
+++ b/drivers/common/mlx5/meson.build
@@ -168,6 +168,7 @@ if build
 		'RDMA_NLDEV_ATTR_NDEV_INDEX' ],
 		[ 'HAVE_MLX5_DR_FLOW_DUMP', 'infiniband/mlx5dv.h',
 		'mlx5dv_dump_dr_domain'],
+		[ 'HAVE_DEVLINK', 'linux/devlink.h', 'DEVLINK_GENL_NAME' ],
 	]
 	config = configuration_data()
 	foreach arg:has_sym_args
diff --git a/drivers/common/mlx5/mlx5_nl.c b/drivers/common/mlx5/mlx5_nl.c
index b4fc053..0d1efd2 100644
--- a/drivers/common/mlx5/mlx5_nl.c
+++ b/drivers/common/mlx5/mlx5_nl.c
@@ -6,6 +6,7 @@
 #include <errno.h>
 #include <linux/if_link.h>
 #include <linux/rtnetlink.h>
+#include <linux/genetlink.h>
 #include <net/if.h>
 #include <rdma/rdma_netlink.h>
 #include <stdbool.h>
@@ -22,6 +23,10 @@
 
 #include "mlx5_nl.h"
 #include "mlx5_common_utils.h"
+#ifdef HAVE_DEVLINK
+#include <linux/devlink.h>
+#endif
+
 
 /* Size of the buffer to receive kernel messages */
 #define MLX5_NL_BUF_SIZE (32 * 1024)
@@ -90,6 +95,59 @@
 #define IFLA_PHYS_PORT_NAME 38
 #endif
 
+/*
+ * Some Devlink defines may be missed in old kernel versions,
+ * adjust used defines.
+ */
+#ifndef DEVLINK_GENL_NAME
+#define DEVLINK_GENL_NAME "devlink"
+#endif
+#ifndef DEVLINK_GENL_VERSION
+#define DEVLINK_GENL_VERSION 1
+#endif
+#ifndef DEVLINK_ATTR_BUS_NAME
+#define DEVLINK_ATTR_BUS_NAME 1
+#endif
+#ifndef DEVLINK_ATTR_DEV_NAME
+#define DEVLINK_ATTR_DEV_NAME 2
+#endif
+#ifndef DEVLINK_ATTR_PARAM
+#define DEVLINK_ATTR_PARAM 80
+#endif
+#ifndef DEVLINK_ATTR_PARAM_NAME
+#define DEVLINK_ATTR_PARAM_NAME 81
+#endif
+#ifndef DEVLINK_ATTR_PARAM_TYPE
+#define DEVLINK_ATTR_PARAM_TYPE 83
+#endif
+#ifndef DEVLINK_ATTR_PARAM_VALUES_LIST
+#define DEVLINK_ATTR_PARAM_VALUES_LIST 84
+#endif
+#ifndef DEVLINK_ATTR_PARAM_VALUE
+#define DEVLINK_ATTR_PARAM_VALUE 85
+#endif
+#ifndef DEVLINK_ATTR_PARAM_VALUE_DATA
+#define DEVLINK_ATTR_PARAM_VALUE_DATA 86
+#endif
+#ifndef DEVLINK_ATTR_PARAM_VALUE_CMODE
+#define DEVLINK_ATTR_PARAM_VALUE_CMODE 87
+#endif
+#ifndef DEVLINK_PARAM_CMODE_DRIVERINIT
+#define DEVLINK_PARAM_CMODE_DRIVERINIT 1
+#endif
+#ifndef DEVLINK_CMD_RELOAD
+#define DEVLINK_CMD_RELOAD 37
+#endif
+#ifndef DEVLINK_CMD_PARAM_GET
+#define DEVLINK_CMD_PARAM_GET 38
+#endif
+#ifndef DEVLINK_CMD_PARAM_SET
+#define DEVLINK_CMD_PARAM_SET 39
+#endif
+#ifndef NLA_FLAG
+#define NLA_FLAG 6
+#endif
+
 /* Add/remove MAC address through Netlink */
 struct mlx5_nl_mac_addr {
 	struct rte_ether_addr (*mac)[];
@@ -1241,8 +1299,8 @@ struct mlx5_nl_ifindex_data {
 	struct nlattr *nla = nl_msg_tail(nlh);
 
 	nla->nla_type = type;
-	nla->nla_len = NLMSG_ALIGN(sizeof(struct nlattr) + alen);
-	nlh->nlmsg_len = NLMSG_ALIGN(nlh->nlmsg_len) + nla->nla_len;
+	nla->nla_len = NLMSG_ALIGN(sizeof(struct nlattr)) + alen;
+	nlh->nlmsg_len += NLMSG_ALIGN(nla->nla_len);
 
 	if (alen)
 		memcpy((uint8_t *)nla + sizeof(struct nlattr), data, alen);
@@ -1335,3 +1393,307 @@ struct mlx5_nl_ifindex_data {
 	}
 	return ret;
 }
+
+/**
+ * Parse Netlink message to retrieve the general family ID.
+ *
+ * @param nh
+ *   Pointer to Netlink Message Header.
+ * @param arg
+ *   PMD data register with this callback.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_nl_family_id_cb(struct nlmsghdr *nh, void *arg)
+{
+
+	struct nlattr *tail = RTE_PTR_ADD(nh, nh->nlmsg_len);
+	struct nlattr *nla = RTE_PTR_ADD(nh, NLMSG_ALIGN(sizeof(*nh)) +
+					NLMSG_ALIGN(sizeof(struct genlmsghdr)));
+
+	for (; nla->nla_len && nla < tail;
+	     nla = RTE_PTR_ADD(nla, NLMSG_ALIGN(nla->nla_len))) {
+		if (nla->nla_type == CTRL_ATTR_FAMILY_ID) {
+			*(uint16_t *)arg = *(uint16_t *)(nla + 1);
+			return 0;
+		}
+	}
+	return -EINVAL;
+}
+
+#define MLX5_NL_MAX_ATTR_SIZE 100
+/**
+ * Get generic netlink family ID.
+ *
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] name
+ *   The family name.
+ *
+ * @return
+ *   ID >= 0 on success, a negative errno value otherwise and rte_errno
+ *   is set.
+ */
+static int
+mlx5_nl_generic_family_id_get(int nlsk_fd, const char *name)
+{
+	struct nlmsghdr *nlh;
+	struct genlmsghdr *genl;
+	uint32_t sn = MLX5_NL_SN_GENERATE;
+	int name_size = strlen(name) + 1;
+	int ret;
+	uint16_t id = -1;
+	uint8_t buf[NLMSG_ALIGN(sizeof(struct nlmsghdr)) +
+		    NLMSG_ALIGN(sizeof(struct genlmsghdr)) +
+		    NLMSG_ALIGN(sizeof(struct nlattr)) +
+		    NLMSG_ALIGN(MLX5_NL_MAX_ATTR_SIZE)];
+
+	memset(buf, 0, sizeof(buf));
+	nlh = (struct nlmsghdr *)buf;
+	nlh->nlmsg_len = sizeof(struct nlmsghdr);
+	nlh->nlmsg_type = GENL_ID_CTRL;
+	nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
+	genl = (struct genlmsghdr *)nl_msg_tail(nlh);
+	nlh->nlmsg_len += sizeof(struct genlmsghdr);
+	genl->cmd = CTRL_CMD_GETFAMILY;
+	genl->version = 1;
+	nl_attr_put(nlh, CTRL_ATTR_FAMILY_NAME, name, name_size);
+	ret = mlx5_nl_send(nlsk_fd, nlh, sn);
+	if (ret >= 0)
+		ret = mlx5_nl_recv(nlsk_fd, sn, mlx5_nl_family_id_cb, &id);
+	if (ret < 0) {
+		DRV_LOG(DEBUG, "Failed to get Netlink %s family ID: %d.", name,
+			ret);
+		return ret;
+	}
+	DRV_LOG(DEBUG, "Netlink \"%s\" family ID is %u.", name, id);
+	return (int)id;
+}
+
+/**
+ * Get Devlink family ID.
+ *
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ *
+ * @return
+ *   ID >= 0 on success, a negative errno value otherwise and rte_errno
+ *   is set.
+ */
+
+int
+mlx5_nl_devlink_family_id_get(int nlsk_fd)
+{
+	return mlx5_nl_generic_family_id_get(nlsk_fd, DEVLINK_GENL_NAME);
+}
+
+/**
+ * Parse Netlink message to retrieve the ROCE enable status.
+ *
+ * @param nh
+ *   Pointer to Netlink Message Header.
+ * @param arg
+ *   PMD data register with this callback.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_nl_roce_cb(struct nlmsghdr *nh, void *arg)
+{
+
+	int ret = -EINVAL;
+	int *enable = arg;
+	struct nlattr *tail = RTE_PTR_ADD(nh, nh->nlmsg_len);
+	struct nlattr *nla = RTE_PTR_ADD(nh, NLMSG_ALIGN(sizeof(*nh)) +
+					NLMSG_ALIGN(sizeof(struct genlmsghdr)));
+
+	while (nla->nla_len && nla < tail) {
+		switch (nla->nla_type) {
+		/* Expected nested attributes case. */
+		case DEVLINK_ATTR_PARAM:
+		case DEVLINK_ATTR_PARAM_VALUES_LIST:
+		case DEVLINK_ATTR_PARAM_VALUE:
+			ret = 0;
+			nla += 1;
+			break;
+		case DEVLINK_ATTR_PARAM_VALUE_DATA:
+			*enable = 1;
+			return 0;
+		default:
+			nla = RTE_PTR_ADD(nla, NLMSG_ALIGN(nla->nla_len));
+		}
+	}
+	*enable = 0;
+	return ret;
+}
+
+/**
+ * Get ROCE enable status through Netlink.
+ *
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] family_id
+ *   the Devlink family ID.
+ * @param pci_addr
+ *   The device PCI address.
+ * @param[out] enable
+ *   Where to store the enable status.
+ *
+ * @return
+ *   0 on success and @p enable is updated, a negative errno value otherwise
+ *   and rte_errno is set.
+ */
+int
+mlx5_nl_enable_roce_get(int nlsk_fd, int family_id, const char *pci_addr,
+			int *enable)
+{
+	struct nlmsghdr *nlh;
+	struct genlmsghdr *genl;
+	uint32_t sn = MLX5_NL_SN_GENERATE;
+	int ret;
+	int cur_en;
+	uint8_t buf[NLMSG_ALIGN(sizeof(struct nlmsghdr)) +
+		    NLMSG_ALIGN(sizeof(struct genlmsghdr)) +
+		    NLMSG_ALIGN(sizeof(struct nlattr)) * 4 +
+		    NLMSG_ALIGN(MLX5_NL_MAX_ATTR_SIZE) * 4];
+
+	memset(buf, 0, sizeof(buf));
+	nlh = (struct nlmsghdr *)buf;
+	nlh->nlmsg_len = sizeof(struct nlmsghdr);
+	nlh->nlmsg_type = family_id;
+	nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
+	genl = (struct genlmsghdr *)nl_msg_tail(nlh);
+	nlh->nlmsg_len += sizeof(struct genlmsghdr);
+	genl->cmd = DEVLINK_CMD_PARAM_GET;
+	genl->version = DEVLINK_GENL_VERSION;
+	nl_attr_put(nlh, DEVLINK_ATTR_BUS_NAME, "pci", 4);
+	nl_attr_put(nlh, DEVLINK_ATTR_DEV_NAME, pci_addr, strlen(pci_addr) + 1);
+	nl_attr_put(nlh, DEVLINK_ATTR_PARAM_NAME, "enable_roce", 12);
+	ret = mlx5_nl_send(nlsk_fd, nlh, sn);
+	if (ret >= 0)
+		ret = mlx5_nl_recv(nlsk_fd, sn, mlx5_nl_roce_cb, &cur_en);
+	if (ret < 0) {
+		DRV_LOG(DEBUG, "Failed to get ROCE enable on device %s: %d.",
+			pci_addr, ret);
+		return ret;
+	}
+	*enable = cur_en;
+	DRV_LOG(DEBUG, "ROCE is %sabled for device \"%s\".",
+		cur_en ? "en" : "dis", pci_addr);
+	return ret;
+}
+
+/**
+ * Reload mlx5 device kernel driver through Netlink.
+ *
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] family_id
+ *   the Devlink family ID.
+ * @param pci_addr
+ *   The device PCI address.
+ * @param[out] enable
+ *   The enable status to set.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_nl_driver_reload(int nlsk_fd, int family_id, const char *pci_addr)
+{
+	struct nlmsghdr *nlh;
+	struct genlmsghdr *genl;
+	uint32_t sn = MLX5_NL_SN_GENERATE;
+	int ret;
+	uint8_t buf[NLMSG_ALIGN(sizeof(struct nlmsghdr)) +
+		    NLMSG_ALIGN(sizeof(struct genlmsghdr)) +
+		    NLMSG_ALIGN(sizeof(struct nlattr)) * 2 +
+		    NLMSG_ALIGN(MLX5_NL_MAX_ATTR_SIZE) * 2];
+
+	memset(buf, 0, sizeof(buf));
+	nlh = (struct nlmsghdr *)buf;
+	nlh->nlmsg_len = sizeof(struct nlmsghdr);
+	nlh->nlmsg_type = family_id;
+	nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
+	genl = (struct genlmsghdr *)nl_msg_tail(nlh);
+	nlh->nlmsg_len += sizeof(struct genlmsghdr);
+	genl->cmd = DEVLINK_CMD_RELOAD;
+	genl->version = DEVLINK_GENL_VERSION;
+	nl_attr_put(nlh, DEVLINK_ATTR_BUS_NAME, "pci", 4);
+	nl_attr_put(nlh, DEVLINK_ATTR_DEV_NAME, pci_addr, strlen(pci_addr) + 1);
+	ret = mlx5_nl_send(nlsk_fd, nlh, sn);
+	if (ret >= 0)
+		ret = mlx5_nl_recv(nlsk_fd, sn, NULL, NULL);
+	if (ret < 0) {
+		DRV_LOG(DEBUG, "Failed to reload %s device by Netlink - %d",
+			pci_addr, ret);
+		return ret;
+	}
+	DRV_LOG(DEBUG, "Device \"%s\" was reloaded by Netlink successfully.",
+		pci_addr);
+	return 0;
+}
+
+/**
+ * Set ROCE enable status through Netlink.
+ *
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] family_id
+ *   the Devlink family ID.
+ * @param pci_addr
+ *   The device PCI address.
+ * @param[in] enable
+ *   The enable status to set.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_nl_enable_roce_set(int nlsk_fd, int family_id, const char *pci_addr,
+			int enable)
+{
+	struct nlmsghdr *nlh;
+	struct genlmsghdr *genl;
+	uint32_t sn = MLX5_NL_SN_GENERATE;
+	int ret;
+	uint8_t buf[NLMSG_ALIGN(sizeof(struct nlmsghdr)) +
+		    NLMSG_ALIGN(sizeof(struct genlmsghdr)) +
+		    NLMSG_ALIGN(sizeof(struct nlattr)) * 6 +
+		    NLMSG_ALIGN(MLX5_NL_MAX_ATTR_SIZE) * 6];
+	uint8_t cmode = DEVLINK_PARAM_CMODE_DRIVERINIT;
+	uint8_t ptype = NLA_FLAG;
+;
+
+	memset(buf, 0, sizeof(buf));
+	nlh = (struct nlmsghdr *)buf;
+	nlh->nlmsg_len = sizeof(struct nlmsghdr);
+	nlh->nlmsg_type = family_id;
+	nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
+	genl = (struct genlmsghdr *)nl_msg_tail(nlh);
+	nlh->nlmsg_len += sizeof(struct genlmsghdr);
+	genl->cmd = DEVLINK_CMD_PARAM_SET;
+	genl->version = DEVLINK_GENL_VERSION;
+	nl_attr_put(nlh, DEVLINK_ATTR_BUS_NAME, "pci", 4);
+	nl_attr_put(nlh, DEVLINK_ATTR_DEV_NAME, pci_addr, strlen(pci_addr) + 1);
+	nl_attr_put(nlh, DEVLINK_ATTR_PARAM_NAME, "enable_roce", 12);
+	nl_attr_put(nlh, DEVLINK_ATTR_PARAM_VALUE_CMODE, &cmode, sizeof(cmode));
+	nl_attr_put(nlh, DEVLINK_ATTR_PARAM_TYPE, &ptype, sizeof(ptype));
+	if (enable)
+		nl_attr_put(nlh, DEVLINK_ATTR_PARAM_VALUE_DATA, NULL, 0);
+	ret = mlx5_nl_send(nlsk_fd, nlh, sn);
+	if (ret >= 0)
+		ret = mlx5_nl_recv(nlsk_fd, sn, NULL, NULL);
+	if (ret < 0) {
+		DRV_LOG(DEBUG, "Failed to %sable ROCE for device %s by Netlink:"
+			" %d.", enable ? "en" : "dis", pci_addr, ret);
+		return ret;
+	}
+	DRV_LOG(DEBUG, "Device %s ROCE was %sabled by Netlink successfully.",
+		pci_addr, enable ? "en" : "dis");
+	/* Now, need to reload the driver. */
+	return mlx5_nl_driver_reload(nlsk_fd, family_id, pci_addr);
+}
diff --git a/drivers/common/mlx5/mlx5_nl.h b/drivers/common/mlx5/mlx5_nl.h
index 8e66a98..2c3f837 100644
--- a/drivers/common/mlx5/mlx5_nl.h
+++ b/drivers/common/mlx5/mlx5_nl.h
@@ -53,5 +53,11 @@ void mlx5_nl_vlan_vmwa_delete(struct mlx5_nl_vlan_vmwa_context *vmwa,
 			      uint32_t ifindex);
 uint32_t mlx5_nl_vlan_vmwa_create(struct mlx5_nl_vlan_vmwa_context *vmwa,
 				  uint32_t ifindex, uint16_t tag);
+int mlx5_nl_devlink_family_id_get(int nlsk_fd);
+int mlx5_nl_enable_roce_get(int nlsk_fd, int family_id, const char *pci_addr,
+			    int *enable);
+int mlx5_nl_driver_reload(int nlsk_fd, int family_id, const char *pci_addr);
+int mlx5_nl_enable_roce_set(int nlsk_fd, int family_id, const char *pci_addr,
+			    int enable);
 
 #endif /* RTE_PMD_MLX5_NL_H_ */
diff --git a/drivers/common/mlx5/rte_common_mlx5_version.map b/drivers/common/mlx5/rte_common_mlx5_version.map
index 34b66a5..ee69f99 100644
--- a/drivers/common/mlx5/rte_common_mlx5_version.map
+++ b/drivers/common/mlx5/rte_common_mlx5_version.map
@@ -28,6 +28,10 @@ DPDK_20.02 {
 	mlx5_dev_to_pci_addr;
 
 	mlx5_nl_allmulti;
+	mlx5_nl_devlink_family_id_get;
+	mlx5_nl_driver_reload;
+	mlx5_nl_enable_roce_get;
+	mlx5_nl_enable_roce_set;
 	mlx5_nl_ifindex;
 	mlx5_nl_init;
 	mlx5_nl_mac_addr_add;
-- 
1.8.3.1
^ permalink raw reply	[flat|nested] 174+ messages in thread
* [dpdk-dev] [PATCH v2 00/13] Introduce mlx5 vDPA driver
  2020-01-20 17:02 [dpdk-dev] [PATCH v1 00/38] Introduce mlx5 vDPA driver Matan Azrad
                   ` (38 preceding siblings ...)
  2020-01-28 10:05 ` [dpdk-dev] [PATCH v2 00/25] Introduce mlx5 common library Matan Azrad
@ 2020-01-29 10:08 ` Matan Azrad
  2020-01-29 10:08   ` [dpdk-dev] [PATCH v2 01/13] drivers: introduce " Matan Azrad
                     ` (14 more replies)
  39 siblings, 15 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-29 10:08 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Maxime Coquelin
v2:
- Reorder patches into 2 series - this is the second part of the split of the previous series.
- Fix spelling and per-patch compilation issues.
- Moved to use claim_zero instead of pure asserts.
Matan Azrad (13):
  drivers: introduce mlx5 vDPA driver
  vdpa/mlx5: support queues number operation
  vdpa/mlx5: support features get operations
  vdpa/mlx5: prepare memory regions
  vdpa/mlx5: prepare HW queues
  vdpa/mlx5: prepare virtio queues
  vdpa/mlx5: support stateless offloads
  vdpa/mlx5: add basic steering configurations
  vdpa/mlx5: support queue state operation
  vdpa/mlx5: map doorbell
  vdpa/mlx5: support live migration
  vdpa/mlx5: support close and config operations
  vdpa/mlx5: disable ROCE
 MAINTAINERS                                     |   7 +
 config/common_base                              |   5 +
 doc/guides/rel_notes/release_20_02.rst          |   5 +
 doc/guides/vdpadevs/features/mlx5.ini           |  27 ++
 doc/guides/vdpadevs/index.rst                   |   1 +
 doc/guides/vdpadevs/mlx5.rst                    | 111 +++++
 drivers/common/Makefile                         |   2 +-
 drivers/common/mlx5/Makefile                    |  17 +-
 drivers/common/mlx5/mlx5_prm.h                  |   4 +
 drivers/meson.build                             |   8 +-
 drivers/vdpa/Makefile                           |   2 +
 drivers/vdpa/meson.build                        |   3 +-
 drivers/vdpa/mlx5/Makefile                      |  43 ++
 drivers/vdpa/mlx5/meson.build                   |  34 ++
 drivers/vdpa/mlx5/mlx5_vdpa.c                   | 563 ++++++++++++++++++++++++
 drivers/vdpa/mlx5/mlx5_vdpa.h                   | 303 +++++++++++++
 drivers/vdpa/mlx5/mlx5_vdpa_event.c             | 399 +++++++++++++++++
 drivers/vdpa/mlx5/mlx5_vdpa_lm.c                | 130 ++++++
 drivers/vdpa/mlx5/mlx5_vdpa_mem.c               | 346 +++++++++++++++
 drivers/vdpa/mlx5/mlx5_vdpa_steer.c             | 283 ++++++++++++
 drivers/vdpa/mlx5/mlx5_vdpa_utils.h             |  20 +
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c             | 388 ++++++++++++++++
 drivers/vdpa/mlx5/rte_pmd_mlx5_vdpa_version.map |   3 +
 mk/rte.app.mk                                   |  15 +-
 24 files changed, 2702 insertions(+), 17 deletions(-)
 create mode 100644 doc/guides/vdpadevs/features/mlx5.ini
 create mode 100644 doc/guides/vdpadevs/mlx5.rst
 create mode 100644 drivers/vdpa/mlx5/Makefile
 create mode 100644 drivers/vdpa/mlx5/meson.build
 create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa.c
 create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa.h
 create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_event.c
 create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_lm.c
 create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_mem.c
 create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_steer.c
 create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_utils.h
 create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
 create mode 100644 drivers/vdpa/mlx5/rte_pmd_mlx5_vdpa_version.map
-- 
1.8.3.1
^ permalink raw reply	[flat|nested] 174+ messages in thread
* [dpdk-dev] [PATCH v2 01/13] drivers: introduce mlx5 vDPA driver
  2020-01-29 10:08 ` [dpdk-dev] [PATCH v2 00/13] Introduce mlx5 vDPA driver Matan Azrad
@ 2020-01-29 10:08   ` Matan Azrad
  2020-01-30 14:38     ` Maxime Coquelin
  2020-01-29 10:08   ` [dpdk-dev] [PATCH v2 02/13] vdpa/mlx5: support queues number operation Matan Azrad
                     ` (13 subsequent siblings)
  14 siblings, 1 reply; 174+ messages in thread
From: Matan Azrad @ 2020-01-29 10:08 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Maxime Coquelin
Add a new driver to support vDPA operations by Mellanox devices.
The first Mellanox devices which support vDPA operations are
ConnectX6DX and Bluefield1 HCA for their PF ports and VF ports.
This driver depends on rdma-core, like the mlx5 PMD, and it is
going to use mlx5 DevX to create HW objects directly by the FW.
Hence, the common/mlx5 library is linked to the mlx5_vdpa driver.
This driver will not be compiled by default due to the above
dependencies.
Register a new log type for this driver.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 MAINTAINERS                                     |   7 +
 config/common_base                              |   5 +
 doc/guides/rel_notes/release_20_02.rst          |   5 +
 doc/guides/vdpadevs/features/mlx5.ini           |  14 ++
 doc/guides/vdpadevs/index.rst                   |   1 +
 doc/guides/vdpadevs/mlx5.rst                    | 111 ++++++++++++
 drivers/common/Makefile                         |   2 +-
 drivers/common/mlx5/Makefile                    |  17 +-
 drivers/meson.build                             |   8 +-
 drivers/vdpa/Makefile                           |   2 +
 drivers/vdpa/meson.build                        |   3 +-
 drivers/vdpa/mlx5/Makefile                      |  36 ++++
 drivers/vdpa/mlx5/meson.build                   |  29 +++
 drivers/vdpa/mlx5/mlx5_vdpa.c                   | 227 ++++++++++++++++++++++++
 drivers/vdpa/mlx5/mlx5_vdpa_utils.h             |  20 +++
 drivers/vdpa/mlx5/rte_pmd_mlx5_vdpa_version.map |   3 +
 mk/rte.app.mk                                   |  15 +-
 17 files changed, 488 insertions(+), 17 deletions(-)
 create mode 100644 doc/guides/vdpadevs/features/mlx5.ini
 create mode 100644 doc/guides/vdpadevs/mlx5.rst
 create mode 100644 drivers/vdpa/mlx5/Makefile
 create mode 100644 drivers/vdpa/mlx5/meson.build
 create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa.c
 create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_utils.h
 create mode 100644 drivers/vdpa/mlx5/rte_pmd_mlx5_vdpa_version.map
diff --git a/MAINTAINERS b/MAINTAINERS
index 150d507..f697e9a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1103,6 +1103,13 @@ F: drivers/vdpa/ifc/
 F: doc/guides/vdpadevs/ifc.rst
 F: doc/guides/vdpadevs/features/ifcvf.ini
 
+Mellanox mlx5 vDPA
+M: Matan Azrad <matan@mellanox.com>
+M: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
+F: drivers/vdpa/mlx5/
+F: doc/guides/vdpadevs/mlx5.rst
+F: doc/guides/vdpadevs/features/mlx5.ini
+
 
 Eventdev Drivers
 ----------------
diff --git a/config/common_base b/config/common_base
index c897dd0..6ea9c63 100644
--- a/config/common_base
+++ b/config/common_base
@@ -366,6 +366,11 @@ CONFIG_RTE_LIBRTE_MLX4_DEBUG=n
 CONFIG_RTE_LIBRTE_MLX5_PMD=n
 CONFIG_RTE_LIBRTE_MLX5_DEBUG=n
 
+#
+# Compile vdpa-oriented Mellanox ConnectX-6 & Bluefield (MLX5) PMD
+#
+CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD=n
+
 # Linking method for mlx4/5 dependency on ibverbs and related libraries
 # Default linking is dynamic by linker.
 # Other options are: dynamic by dlopen at run-time, or statically embedded.
diff --git a/doc/guides/rel_notes/release_20_02.rst b/doc/guides/rel_notes/release_20_02.rst
index 50e2c14..690e7db 100644
--- a/doc/guides/rel_notes/release_20_02.rst
+++ b/doc/guides/rel_notes/release_20_02.rst
@@ -113,6 +113,11 @@ New Features
   * Added support for RSS using L3/L4 source/destination only.
   * Added support for matching on GTP tunnel header item.
 
+* **Added new vDPA PMD based on Mellanox devices.**
+
+  Added a new Mellanox vDPA (``mlx5_vdpa``) PMD.
+  See the :doc:`../vdpadevs/mlx5` guide for more details on this driver.
+
 * **Updated testpmd application.**
 
   Added support for ESP and L2TPv3 over IP rte_flow patterns to the testpmd
diff --git a/doc/guides/vdpadevs/features/mlx5.ini b/doc/guides/vdpadevs/features/mlx5.ini
new file mode 100644
index 0000000..d635bdf
--- /dev/null
+++ b/doc/guides/vdpadevs/features/mlx5.ini
@@ -0,0 +1,14 @@
+;
+; Supported features of the 'mlx5' VDPA driver.
+;
+; Refer to default.ini for the full list of available driver features.
+;
+[Features]
+Other kdrv           = Y
+ARMv8                = Y
+Power8               = Y
+x86-32               = Y
+x86-64               = Y
+Usage doc            = Y
+Design doc           = Y
+
diff --git a/doc/guides/vdpadevs/index.rst b/doc/guides/vdpadevs/index.rst
index 9657108..1a13efe 100644
--- a/doc/guides/vdpadevs/index.rst
+++ b/doc/guides/vdpadevs/index.rst
@@ -13,3 +13,4 @@ which can be used from an application through vhost API.
 
     features_overview
     ifc
+    mlx5
diff --git a/doc/guides/vdpadevs/mlx5.rst b/doc/guides/vdpadevs/mlx5.rst
new file mode 100644
index 0000000..1861e71
--- /dev/null
+++ b/doc/guides/vdpadevs/mlx5.rst
@@ -0,0 +1,111 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright 2019 Mellanox Technologies, Ltd
+
+MLX5 vDPA driver
+================
+
+The MLX5 vDPA (vhost data path acceleration) driver library
+(**librte_pmd_mlx5_vdpa**) provides support for **Mellanox ConnectX-6**,
+**Mellanox ConnectX-6DX** and **Mellanox BlueField** families of
+10/25/40/50/100/200 Gb/s adapters as well as their virtual functions (VF) in
+SR-IOV context.
+
+.. note::
+
+   Due to external dependencies, this driver is disabled in default
+   configuration of the "make" build. It can be enabled with
+   ``CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD=y`` or by using "meson" build system which
+   will detect dependencies.
+
+
+Design
+------
+
+For security reasons and robustness, this driver only deals with virtual
+memory addresses. The way resources allocations are handled by the kernel,
+combined with hardware specifications that allow to handle virtual memory
+addresses directly, ensure that DPDK applications cannot access random
+physical memory (or memory that does not belong to the current process).
+
+The PMD can use libibverbs and libmlx5 to access the device firmware
+or directly the hardware components.
+There are different levels of objects and bypassing abilities
+to get the best performances:
+
+- Verbs is a complete high-level generic API
+- Direct Verbs is a device-specific API
+- DevX allows access to firmware objects
+- Direct Rules manages flow steering at low-level hardware layer
+
+Enabling librte_pmd_mlx5_vdpa causes DPDK applications to be linked against
+libibverbs.
+
+A Mellanox mlx5 PCI device can be probed by either net/mlx5 driver or vdpa/mlx5
+driver but not in parallel. Hence, the user should decide the driver by the
+``class`` parameter in the device argument list.
+By default, the mlx5 device will be probed by the net/mlx5 driver.
+
+Supported NICs
+--------------
+
+* Mellanox(R) ConnectX(R)-6 200G MCX654106A-HCAT (4x200G)
+* Mellanox(R) ConnectX(R)-6DX EN 100G MCX623106AN-CDAT (2*100G)
+* Mellanox(R) ConnectX(R)-6DX EN 200G MCX623105AN-VDAT (1*200G)
+* Mellanox(R) BlueField SmartNIC 25G MBF1M332A-ASCAT (2*25G)
+
+Prerequisites
+-------------
+
+- Mellanox OFED version: **4.7**
+  see :doc:`../../nics/mlx5` guide for more Mellanox OFED details.
+
+Compilation options
+~~~~~~~~~~~~~~~~~~~
+
+These options can be modified in the ``.config`` file.
+
+- ``CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD`` (default **n**)
+
+  Toggle compilation of librte_pmd_mlx5_vdpa itself.
+
+- ``CONFIG_RTE_IBVERBS_LINK_DLOPEN`` (default **n**)
+
+  Build PMD with additional code to make it loadable without hard
+  dependencies on **libibverbs** nor **libmlx5**, which may not be installed
+  on the target system.
+
+  In this mode, their presence is still required for it to run properly,
+  however their absence won't prevent a DPDK application from starting (with
+  ``CONFIG_RTE_BUILD_SHARED_LIB`` disabled) and they won't show up as
+  missing with ``ldd(1)``.
+
+  It works by moving these dependencies to a purpose-built rdma-core "glue"
+  plug-in which must either be installed in a directory whose name is based
+  on ``CONFIG_RTE_EAL_PMD_PATH`` suffixed with ``-glue`` if set, or in a
+  standard location for the dynamic linker (e.g. ``/lib``) if left to the
+  default empty string (``""``).
+
+  This option has no performance impact.
+
+- ``CONFIG_RTE_IBVERBS_LINK_STATIC`` (default **n**)
+
+  Embed static flavor of the dependencies **libibverbs** and **libmlx5**
+  in the PMD shared library or the executable static binary.
+
+.. note::
+
+   For BlueField, target should be set to ``arm64-bluefield-linux-gcc``. This
+   will enable ``CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD`` and set
+   ``RTE_CACHE_LINE_SIZE`` to 64. The default armv8a configuration of the make
+   and meson builds sets it to 128, which degrades performance.
+
+Run-time configuration
+~~~~~~~~~~~~~~~~~~~~~~
+
+- **ethtool** operations on related kernel interfaces also affect the PMD.
+
+- ``class`` parameter [string]
+
+  Select the class of the driver that should probe the device.
+  `vdpa` for the mlx5 vDPA driver.
+
diff --git a/drivers/common/Makefile b/drivers/common/Makefile
index 4775d4b..96bd7ac 100644
--- a/drivers/common/Makefile
+++ b/drivers/common/Makefile
@@ -35,7 +35,7 @@ ifneq (,$(findstring y,$(IAVF-y)))
 DIRS-y += iavf
 endif
 
-ifeq ($(CONFIG_RTE_LIBRTE_MLX5_PMD),y)
+ifeq ($(findstring y,$(CONFIG_RTE_LIBRTE_MLX5_PMD)$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD)),y)
 DIRS-y += mlx5
 endif
 
diff --git a/drivers/common/mlx5/Makefile b/drivers/common/mlx5/Makefile
index 9d4d81f..c4b7999 100644
--- a/drivers/common/mlx5/Makefile
+++ b/drivers/common/mlx5/Makefile
@@ -10,15 +10,16 @@ LIB_GLUE_BASE = librte_pmd_mlx5_glue.so
 LIB_GLUE_VERSION = 20.02.0
 
 # Sources.
+ifeq ($(findstring y,$(CONFIG_RTE_LIBRTE_MLX5_PMD)$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD)),y)
 ifneq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
-SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_glue.c
+SRCS-y += mlx5_glue.c
 endif
-SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_devx_cmds.c
-SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_common.c
-SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_nl.c
-
+SRCS-y += mlx5_devx_cmds.c
+SRCS-y += mlx5_common.c
+SRCS-y += mlx5_nl.c
 ifeq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
-INSTALL-$(CONFIG_RTE_LIBRTE_MLX5_PMD)-lib += $(LIB_GLUE)
+INSTALL-y-lib += $(LIB_GLUE)
+endif
 endif
 
 # Basic CFLAGS.
@@ -317,7 +318,9 @@ mlx5_autoconf.h: mlx5_autoconf.h.new
 		cmp '$<' '$@' $(AUTOCONF_OUTPUT) || \
 		mv '$<' '$@'
 
-$(SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD):.c=.o): mlx5_autoconf.h
+ifeq ($(findstring y,$(CONFIG_RTE_LIBRTE_MLX5_PMD)$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD)),y)
+$(SRCS-y:.c=.o): mlx5_autoconf.h
+endif
 
 # Generate dependency plug-in for rdma-core when the PMD must not be linked
 # directly, so that applications do not inherit this dependency.
diff --git a/drivers/meson.build b/drivers/meson.build
index 29708cc..bd154fa 100644
--- a/drivers/meson.build
+++ b/drivers/meson.build
@@ -42,6 +42,7 @@ foreach class:dpdk_driver_classes
 		build = true # set to false to disable, e.g. missing deps
 		reason = '<unknown reason>' # set if build == false to explain
 		name = drv
+		fmt_name = ''
 		allow_experimental_apis = false
 		sources = []
 		objs = []
@@ -98,8 +99,11 @@ foreach class:dpdk_driver_classes
 		else
 			class_drivers += name
 
-			dpdk_conf.set(config_flag_fmt.format(name.to_upper()),1)
-			lib_name = driver_name_fmt.format(name)
+			if fmt_name == ''
+				fmt_name = name
+			endif
+			dpdk_conf.set(config_flag_fmt.format(fmt_name.to_upper()),1)
+			lib_name = driver_name_fmt.format(fmt_name)
 
 			if allow_experimental_apis
 				cflags += '-DALLOW_EXPERIMENTAL_API'
diff --git a/drivers/vdpa/Makefile b/drivers/vdpa/Makefile
index b5a7a11..6e88359 100644
--- a/drivers/vdpa/Makefile
+++ b/drivers/vdpa/Makefile
@@ -7,4 +7,6 @@ ifeq ($(CONFIG_RTE_EAL_VFIO),y)
 DIRS-$(CONFIG_RTE_LIBRTE_IFC_PMD) += ifc
 endif
 
+DIRS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5
+
 include $(RTE_SDK)/mk/rte.subdir.mk
diff --git a/drivers/vdpa/meson.build b/drivers/vdpa/meson.build
index 2f047b5..e3ed54a 100644
--- a/drivers/vdpa/meson.build
+++ b/drivers/vdpa/meson.build
@@ -1,7 +1,8 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright 2019 Mellanox Technologies, Ltd
 
-drivers = ['ifc']
+drivers = ['ifc',
+	   'mlx5',]
 std_deps = ['bus_pci', 'kvargs']
 std_deps += ['vhost']
 config_flag_fmt = 'RTE_LIBRTE_@0@_PMD'
diff --git a/drivers/vdpa/mlx5/Makefile b/drivers/vdpa/mlx5/Makefile
new file mode 100644
index 0000000..c1c8cc0
--- /dev/null
+++ b/drivers/vdpa/mlx5/Makefile
@@ -0,0 +1,36 @@
+#   SPDX-License-Identifier: BSD-3-Clause
+#   Copyright 2019 Mellanox Technologies, Ltd
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# Library name.
+LIB = librte_pmd_mlx5_vdpa.a
+
+# Sources.
+SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa.c
+
+# Basic CFLAGS.
+CFLAGS += -O3
+CFLAGS += -std=c11 -Wall -Wextra
+CFLAGS += -g
+CFLAGS += -I$(RTE_SDK)/drivers/common/mlx5
+CFLAGS += -I$(RTE_SDK)/drivers/net/mlx5_vdpa
+CFLAGS += -I$(BUILDDIR)/drivers/common/mlx5
+CFLAGS += -D_BSD_SOURCE
+CFLAGS += -D_DEFAULT_SOURCE
+CFLAGS += -D_XOPEN_SOURCE=600
+CFLAGS += $(WERROR_FLAGS)
+CFLAGS += -Wno-strict-prototypes
+LDLIBS += -lrte_common_mlx5
+LDLIBS += -lrte_eal -lrte_vhost -lrte_kvargs -lrte_bus_pci
+
+# A few warnings cannot be avoided in external headers.
+CFLAGS += -Wno-error=cast-qual
+
+EXPORT_MAP := rte_pmd_mlx5_vdpa_version.map
+# memseg walk is not part of stable API
+CFLAGS += -DALLOW_EXPERIMENTAL_API
+
+CFLAGS += -DNDEBUG -UPEDANTIC
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/vdpa/mlx5/meson.build b/drivers/vdpa/mlx5/meson.build
new file mode 100644
index 0000000..4bca6ea
--- /dev/null
+++ b/drivers/vdpa/mlx5/meson.build
@@ -0,0 +1,29 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2019 Mellanox Technologies, Ltd
+
+if not is_linux
+	build = false
+	reason = 'only supported on Linux'
+	subdir_done()
+endif
+
+fmt_name = 'mlx5_vdpa'
+allow_experimental_apis = true
+deps += ['hash', 'common_mlx5', 'vhost', 'bus_pci', 'eal']
+sources = files(
+	'mlx5_vdpa.c',
+)
+cflags_options = [
+	'-std=c11',
+	'-Wno-strict-prototypes',
+	'-D_BSD_SOURCE',
+	'-D_DEFAULT_SOURCE',
+	'-D_XOPEN_SOURCE=600'
+]
+foreach option:cflags_options
+	if cc.has_argument(option)
+		cflags += option
+	endif
+endforeach
+
+cflags += [ '-DNDEBUG', '-UPEDANTIC' ]
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
new file mode 100644
index 0000000..6286d7a
--- /dev/null
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -0,0 +1,227 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2019 Mellanox Technologies, Ltd
+ */
+#include <rte_malloc.h>
+#include <rte_log.h>
+#include <rte_errno.h>
+#include <rte_bus_pci.h>
+#include <rte_vdpa.h>
+
+#include <mlx5_glue.h>
+#include <mlx5_common.h>
+
+#include "mlx5_vdpa_utils.h"
+
+
+struct mlx5_vdpa_priv {
+	TAILQ_ENTRY(mlx5_vdpa_priv) next;
+	int id; /* vDPA device id. */
+	struct ibv_context *ctx; /* Device context. */
+	struct rte_vdpa_dev_addr dev_addr;
+};
+
+TAILQ_HEAD(mlx5_vdpa_privs, mlx5_vdpa_priv) priv_list =
+					      TAILQ_HEAD_INITIALIZER(priv_list);
+static pthread_mutex_t priv_list_lock = PTHREAD_MUTEX_INITIALIZER;
+int mlx5_vdpa_logtype;
+
+static struct rte_vdpa_dev_ops mlx5_vdpa_ops = {
+	.get_queue_num = NULL,
+	.get_features = NULL,
+	.get_protocol_features = NULL,
+	.dev_conf = NULL,
+	.dev_close = NULL,
+	.set_vring_state = NULL,
+	.set_features = NULL,
+	.migration_done = NULL,
+	.get_vfio_group_fd = NULL,
+	.get_vfio_device_fd = NULL,
+	.get_notify_area = NULL,
+};
+
+/**
+ * DPDK callback to register a PCI device.
+ *
+ * This function spawns a vDPA device out of a given PCI device.
+ *
+ * @param[in] pci_drv
+ *   PCI driver structure (mlx5_vdpa_driver).
+ * @param[in] pci_dev
+ *   PCI device information.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_vdpa_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
+		    struct rte_pci_device *pci_dev __rte_unused)
+{
+	struct ibv_device **ibv_list;
+	struct ibv_device *ibv_match = NULL;
+	struct mlx5_vdpa_priv *priv = NULL;
+	struct ibv_context *ctx = NULL;
+	int ret;
+
+	if (mlx5_class_get(pci_dev->device.devargs) != MLX5_CLASS_VDPA) {
+		DRV_LOG(DEBUG, "Skip probing - should be probed by other mlx5"
+			" driver.");
+		return 1;
+	}
+	errno = 0;
+	ibv_list = mlx5_glue->get_device_list(&ret);
+	if (!ibv_list) {
+		rte_errno = errno;
+		DRV_LOG(ERR, "Failed to get device list, is ib_uverbs loaded?");
+		return -ENOSYS;
+	}
+	while (ret-- > 0) {
+		struct rte_pci_addr pci_addr;
+
+		DRV_LOG(DEBUG, "Checking device \"%s\"..", ibv_list[ret]->name);
+		if (mlx5_dev_to_pci_addr(ibv_list[ret]->ibdev_path, &pci_addr))
+			continue;
+		if (pci_dev->addr.domain != pci_addr.domain ||
+		    pci_dev->addr.bus != pci_addr.bus ||
+		    pci_dev->addr.devid != pci_addr.devid ||
+		    pci_dev->addr.function != pci_addr.function)
+			continue;
+		DRV_LOG(INFO, "PCI information matches for device \"%s\".",
+			ibv_list[ret]->name);
+		ibv_match = ibv_list[ret];
+		break;
+	}
+	mlx5_glue->free_device_list(ibv_list);
+	if (!ibv_match) {
+		DRV_LOG(ERR, "No matching IB device for PCI slot "
+			"%" SCNx32 ":%" SCNx8 ":%" SCNx8 ".%" SCNx8 ".",
+			pci_dev->addr.domain, pci_dev->addr.bus,
+			pci_dev->addr.devid, pci_dev->addr.function);
+		rte_errno = ENOENT;
+		return -rte_errno;
+	}
+	ctx = mlx5_glue->dv_open_device(ibv_match);
+	if (!ctx) {
+		DRV_LOG(ERR, "Failed to open IB device \"%s\".",
+			ibv_match->name);
+		rte_errno = ENODEV;
+		return -rte_errno;
+	}
+	priv = rte_zmalloc("mlx5 vDPA device private", sizeof(*priv),
+			   RTE_CACHE_LINE_SIZE);
+	if (!priv) {
+		DRV_LOG(ERR, "Failed to allocate private memory.");
+		rte_errno = ENOMEM;
+		goto error;
+	}
+	priv->ctx = ctx;
+	priv->dev_addr.pci_addr = pci_dev->addr;
+	priv->dev_addr.type = PCI_ADDR;
+	priv->id = rte_vdpa_register_device(&priv->dev_addr, &mlx5_vdpa_ops);
+	if (priv->id < 0) {
+		DRV_LOG(ERR, "Failed to register vDPA device.");
+		rte_errno = rte_errno ? rte_errno : EINVAL;
+		goto error;
+	}
+	pthread_mutex_lock(&priv_list_lock);
+	TAILQ_INSERT_TAIL(&priv_list, priv, next);
+	pthread_mutex_unlock(&priv_list_lock);
+	return 0;
+
+error:
+	if (priv)
+		rte_free(priv);
+	if (ctx)
+		mlx5_glue->close_device(ctx);
+	return -rte_errno;
+}
+
+/**
+ * DPDK callback to remove a PCI device.
+ *
+ * This function removes all vDPA devices belonging to a given PCI device.
+ *
+ * @param[in] pci_dev
+ *   Pointer to the PCI device.
+ *
+ * @return
+ *   0 on success, the function cannot fail.
+ */
+static int
+mlx5_vdpa_pci_remove(struct rte_pci_device *pci_dev)
+{
+	struct mlx5_vdpa_priv *priv = NULL;
+	int found = 0;
+
+	pthread_mutex_lock(&priv_list_lock);
+	TAILQ_FOREACH(priv, &priv_list, next) {
+		if (memcmp(&priv->dev_addr.pci_addr, &pci_dev->addr,
+			   sizeof(pci_dev->addr)) == 0) {
+			found = 1;
+			break;
+		}
+	}
+	if (found) {
+		TAILQ_REMOVE(&priv_list, priv, next);
+		mlx5_glue->close_device(priv->ctx);
+		rte_free(priv);
+	}
+	pthread_mutex_unlock(&priv_list_lock);
+	return 0;
+}
+
+static const struct rte_pci_id mlx5_vdpa_pci_id_map[] = {
+	{
+		RTE_PCI_DEVICE(PCI_VENDOR_ID_MELLANOX,
+			       PCI_DEVICE_ID_MELLANOX_CONNECTX5BF)
+	},
+	{
+		RTE_PCI_DEVICE(PCI_VENDOR_ID_MELLANOX,
+			       PCI_DEVICE_ID_MELLANOX_CONNECTX5BFVF)
+	},
+	{
+		RTE_PCI_DEVICE(PCI_VENDOR_ID_MELLANOX,
+				PCI_DEVICE_ID_MELLANOX_CONNECTX6)
+	},
+	{
+		RTE_PCI_DEVICE(PCI_VENDOR_ID_MELLANOX,
+				PCI_DEVICE_ID_MELLANOX_CONNECTX6VF)
+	},
+	{
+		RTE_PCI_DEVICE(PCI_VENDOR_ID_MELLANOX,
+				PCI_DEVICE_ID_MELLANOX_CONNECTX6DX)
+	},
+	{
+		RTE_PCI_DEVICE(PCI_VENDOR_ID_MELLANOX,
+				PCI_DEVICE_ID_MELLANOX_CONNECTX6DXVF)
+	},
+	{
+		.vendor_id = 0
+	}
+};
+
+static struct rte_pci_driver mlx5_vdpa_driver = {
+	.driver = {
+		.name = "mlx5_vdpa",
+	},
+	.id_table = mlx5_vdpa_pci_id_map,
+	.probe = mlx5_vdpa_pci_probe,
+	.remove = mlx5_vdpa_pci_remove,
+	.drv_flags = 0,
+};
+
+/**
+ * Driver initialization routine.
+ */
+RTE_INIT(rte_mlx5_vdpa_init)
+{
+	/* Initialize common log type. */
+	mlx5_vdpa_logtype = rte_log_register("pmd.vdpa.mlx5");
+	if (mlx5_vdpa_logtype >= 0)
+		rte_log_set_level(mlx5_vdpa_logtype, RTE_LOG_NOTICE);
+	if (mlx5_glue)
+		rte_pci_register(&mlx5_vdpa_driver);
+}
+
+RTE_PMD_EXPORT_NAME(net_mlx5_vdpa, __COUNTER__);
+RTE_PMD_REGISTER_PCI_TABLE(net_mlx5_vdpa, mlx5_vdpa_pci_id_map);
+RTE_PMD_REGISTER_KMOD_DEP(net_mlx5_vdpa, "* ib_uverbs & mlx5_core & mlx5_ib");
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_utils.h b/drivers/vdpa/mlx5/mlx5_vdpa_utils.h
new file mode 100644
index 0000000..a239df9
--- /dev/null
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_utils.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2019 Mellanox Technologies, Ltd
+ */
+
+#ifndef RTE_PMD_MLX5_VDPA_UTILS_H_
+#define RTE_PMD_MLX5_VDPA_UTILS_H_
+
+#include <mlx5_common.h>
+
+
+extern int mlx5_vdpa_logtype;
+
+#define MLX5_VDPA_LOG_PREFIX "mlx5_vdpa"
+/* Generic printf()-like logging macro with automatic line feed. */
+#define DRV_LOG(level, ...) \
+	PMD_DRV_LOG_(level, mlx5_vdpa_logtype, MLX5_VDPA_LOG_PREFIX, \
+		__VA_ARGS__ PMD_DRV_LOG_STRIP PMD_DRV_LOG_OPAREN, \
+		PMD_DRV_LOG_CPAREN)
+
+#endif /* RTE_PMD_MLX5_VDPA_UTILS_H_ */
diff --git a/drivers/vdpa/mlx5/rte_pmd_mlx5_vdpa_version.map b/drivers/vdpa/mlx5/rte_pmd_mlx5_vdpa_version.map
new file mode 100644
index 0000000..143836e
--- /dev/null
+++ b/drivers/vdpa/mlx5/rte_pmd_mlx5_vdpa_version.map
@@ -0,0 +1,3 @@
+DPDK_20.02 {
+	local: *;
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 45f4cad..b33cd8a 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -196,18 +196,21 @@ endif
 _LDLIBS-$(CONFIG_RTE_LIBRTE_LIO_PMD)        += -lrte_pmd_lio
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_MEMIF)      += -lrte_pmd_memif
 _LDLIBS-$(CONFIG_RTE_LIBRTE_MLX4_PMD)       += -lrte_pmd_mlx4
-_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX5_PMD)       += -lrte_common_mlx5
+ifeq ($(findstring y,$(CONFIG_RTE_LIBRTE_MLX5_PMD)$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD)),y)
+_LDLIBS-y                                   += -lrte_common_mlx5
+endif
 _LDLIBS-$(CONFIG_RTE_LIBRTE_MLX5_PMD)       += -lrte_pmd_mlx5
+_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD)  += -lrte_pmd_mlx5_vdpa
 ifeq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
-_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX4_PMD)       += -ldl
-_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX5_PMD)       += -ldl
+_LDLIBS-y                                   += -ldl
 else ifeq ($(CONFIG_RTE_IBVERBS_LINK_STATIC),y)
 LIBS_IBVERBS_STATIC = $(shell $(RTE_SDK)/buildtools/options-ibverbs-static.sh)
-_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX4_PMD)       += $(LIBS_IBVERBS_STATIC)
-_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX5_PMD)       += $(LIBS_IBVERBS_STATIC)
+_LDLIBS-y                                   += $(LIBS_IBVERBS_STATIC)
 else
+ifeq ($(findstring y,$(CONFIG_RTE_LIBRTE_MLX5_PMD)$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD)),y)
+_LDLIBS-y                                   += -libverbs -lmlx5
+endif
 _LDLIBS-$(CONFIG_RTE_LIBRTE_MLX4_PMD)       += -libverbs -lmlx4
-_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX5_PMD)       += -libverbs -lmlx5
 endif
 _LDLIBS-$(CONFIG_RTE_LIBRTE_MVPP2_PMD)      += -lrte_pmd_mvpp2
 _LDLIBS-$(CONFIG_RTE_LIBRTE_MVNETA_PMD)     += -lrte_pmd_mvneta
-- 
1.8.3.1
* [dpdk-dev] [PATCH v2 02/13] vdpa/mlx5: support queues number operation
  2020-01-29 10:08 ` [dpdk-dev] [PATCH v2 00/13] Introduce mlx5 vDPA driver Matan Azrad
  2020-01-29 10:08   ` [dpdk-dev] [PATCH v2 01/13] drivers: introduce " Matan Azrad
@ 2020-01-29 10:08   ` Matan Azrad
  2020-01-30 14:46     ` Maxime Coquelin
  2020-01-29 10:08   ` [dpdk-dev] [PATCH v2 03/13] vdpa/mlx5: support features get operations Matan Azrad
                     ` (12 subsequent siblings)
  14 siblings, 1 reply; 174+ messages in thread
From: Matan Azrad @ 2020-01-29 10:08 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Maxime Coquelin
Support get_queue_num operation to get the maximum number of queues
supported by the device.
This number comes from the DevX capabilities.
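For readers following the series, the lookup behind get_queue_num can be sketched on its own. This is a minimal illustration, not the driver code: the ex_* names and the reduced structure are hypothetical, and max_queues stands in for the DevX capability value. It only shows the pattern of walking a mutex-protected TAILQ by device id and reporting a stored value.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/queue.h>

/* Hypothetical, reduced private structure: only what the lookup needs. */
struct ex_priv {
	TAILQ_ENTRY(ex_priv) next;
	int id;              /* vDPA device id. */
	uint32_t max_queues; /* Stands in for caps.max_num_virtio_queues. */
};

TAILQ_HEAD(ex_priv_list, ex_priv);

static struct ex_priv_list ex_list = TAILQ_HEAD_INITIALIZER(ex_list);
static pthread_mutex_t ex_list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Append a device to the list under the lock. */
static void
ex_priv_add(struct ex_priv *priv)
{
	pthread_mutex_lock(&ex_list_lock);
	TAILQ_INSERT_TAIL(&ex_list, priv, next);
	pthread_mutex_unlock(&ex_list_lock);
}

/* Return the entry whose id matches, or NULL when there is none. */
static struct ex_priv *
ex_priv_find(int did)
{
	struct ex_priv *priv;

	pthread_mutex_lock(&ex_list_lock);
	TAILQ_FOREACH(priv, &ex_list, next) {
		if (priv->id == did)
			break;
	}
	pthread_mutex_unlock(&ex_list_lock);
	return priv; /* NULL if the loop ran to the end of the list. */
}

/* Same shape as the callback: 0 on success, -1 on an unknown id. */
static int
ex_get_queue_num(int did, uint32_t *queue_num)
{
	struct ex_priv *priv = ex_priv_find(did);

	assert(queue_num != NULL);
	if (priv == NULL)
		return -1;
	*queue_num = priv->max_queues;
	return 0;
}
```

The sketch relies on TAILQ_FOREACH leaving the iterator NULL when nothing matches, so no separate "found" flag is needed. Note that the pointer is used after the lock is dropped, as in the patch, so the caller is assumed not to race with device removal.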
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c | 54 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 53 insertions(+), 1 deletion(-)
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index 6286d7a..15e53f2 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -9,6 +9,7 @@
 
 #include <mlx5_glue.h>
 #include <mlx5_common.h>
+#include <mlx5_devx_cmds.h>
 
 #include "mlx5_vdpa_utils.h"
 
@@ -18,6 +19,7 @@ struct mlx5_vdpa_priv {
 	int id; /* vDPA device id. */
 	struct ibv_context *ctx; /* Device context. */
 	struct rte_vdpa_dev_addr dev_addr;
+	struct mlx5_hca_vdpa_attr caps;
 };
 
 TAILQ_HEAD(mlx5_vdpa_privs, mlx5_vdpa_priv) priv_list =
@@ -25,8 +27,43 @@ struct mlx5_vdpa_priv {
 static pthread_mutex_t priv_list_lock = PTHREAD_MUTEX_INITIALIZER;
 int mlx5_vdpa_logtype;
 
+static struct mlx5_vdpa_priv *
+mlx5_vdpa_find_priv_resource_by_did(int did)
+{
+	struct mlx5_vdpa_priv *priv;
+	int found = 0;
+
+	pthread_mutex_lock(&priv_list_lock);
+	TAILQ_FOREACH(priv, &priv_list, next) {
+		if (did == priv->id) {
+			found = 1;
+			break;
+		}
+	}
+	pthread_mutex_unlock(&priv_list_lock);
+	if (!found) {
+		DRV_LOG(ERR, "Invalid device id: %d.", did);
+		rte_errno = EINVAL;
+		return NULL;
+	}
+	return priv;
+}
+
+static int
+mlx5_vdpa_get_queue_num(int did, uint32_t *queue_num)
+{
+	struct mlx5_vdpa_priv *priv = mlx5_vdpa_find_priv_resource_by_did(did);
+
+	if (priv == NULL) {
+		DRV_LOG(ERR, "Invalid device id: %d.", did);
+		return -1;
+	}
+	*queue_num = priv->caps.max_num_virtio_queues;
+	return 0;
+}
+
 static struct rte_vdpa_dev_ops mlx5_vdpa_ops = {
-	.get_queue_num = NULL,
+	.get_queue_num = mlx5_vdpa_get_queue_num,
 	.get_features = NULL,
 	.get_protocol_features = NULL,
 	.dev_conf = NULL,
@@ -60,6 +97,7 @@ struct mlx5_vdpa_priv {
 	struct ibv_device *ibv_match = NULL;
 	struct mlx5_vdpa_priv *priv = NULL;
 	struct ibv_context *ctx = NULL;
+	struct mlx5_hca_attr attr;
 	int ret;
 
 	if (mlx5_class_get(pci_dev->device.devargs) != MLX5_CLASS_VDPA) {
@@ -113,6 +151,20 @@ struct mlx5_vdpa_priv {
 		rte_errno = ENOMEM;
 		goto error;
 	}
+	ret = mlx5_devx_cmd_query_hca_attr(ctx, &attr);
+	if (ret) {
+		DRV_LOG(ERR, "Unable to read HCA capabilities.");
+		rte_errno = ENOTSUP;
+		goto error;
+	} else {
+		if (!attr.vdpa.valid || !attr.vdpa.max_num_virtio_queues) {
+			DRV_LOG(ERR, "Not enough capabilities to support vdpa,"
+				" maybe old FW/OFED version?");
+			rte_errno = ENOTSUP;
+			goto error;
+		}
+		priv->caps = attr.vdpa;
+	}
 	priv->ctx = ctx;
 	priv->dev_addr.pci_addr = pci_dev->addr;
 	priv->dev_addr.type = PCI_ADDR;
-- 
1.8.3.1
* [dpdk-dev] [PATCH v2 03/13] vdpa/mlx5: support features get operations
  2020-01-29 10:08 ` [dpdk-dev] [PATCH v2 00/13] Introduce mlx5 vDPA driver Matan Azrad
  2020-01-29 10:08   ` [dpdk-dev] [PATCH v2 01/13] drivers: introduce " Matan Azrad
  2020-01-29 10:08   ` [dpdk-dev] [PATCH v2 02/13] vdpa/mlx5: support queues number operation Matan Azrad
@ 2020-01-29 10:08   ` Matan Azrad
  2020-01-30 14:50     ` Maxime Coquelin
  2020-01-29 10:09   ` [dpdk-dev] [PATCH v2 04/13] vdpa/mlx5: prepare memory regions Matan Azrad
                     ` (11 subsequent siblings)
  14 siblings, 1 reply; 174+ messages in thread
From: Matan Azrad @ 2020-01-29 10:08 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Maxime Coquelin
Add support for get_features and get_protocol_features operations.
Some of the features are reported by the DevX capabilities.
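As an illustration of how such a feature mask can be composed, here is a small standalone sketch. The ex_* names and the reduced capability structure are hypothetical. The bit positions are written out locally so the example compiles without the kernel headers; they are meant to mirror the virtio and vhost-user definitions and should be checked against linux/virtio_net.h and linux/virtio_config.h before reuse.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of feature bit positions (assumed to match the headers). */
enum {
	EX_VIRTIO_NET_F_CSUM = 0,
	EX_VIRTIO_NET_F_GUEST_CSUM = 1,
	EX_VIRTIO_NET_F_HOST_TSO4 = 11,
	EX_VIRTIO_NET_F_HOST_TSO6 = 12,
	EX_VIRTIO_NET_F_GUEST_ANNOUNCE = 21,
	EX_VIRTIO_NET_F_MQ = 22,
	EX_VIRTIO_F_ANY_LAYOUT = 27,
	EX_VHOST_USER_F_PROTOCOL_FEATURES = 30,
	EX_VIRTIO_F_VERSION_1 = 32,
	EX_VIRTIO_F_RING_PACKED = 34,
	EX_VIRTIO_F_ORDER_PLATFORM = 36,
};

/* 1ULL keeps the shift 64-bit wide; several bits above are >= 32. */
#define EX_BIT(n) (1ULL << (n))

#define EX_DEFAULT_FEATURES (EX_BIT(EX_VHOST_USER_F_PROTOCOL_FEATURES) | \
			     EX_BIT(EX_VIRTIO_F_ANY_LAYOUT) | \
			     EX_BIT(EX_VIRTIO_NET_F_MQ) | \
			     EX_BIT(EX_VIRTIO_NET_F_GUEST_ANNOUNCE) | \
			     EX_BIT(EX_VIRTIO_F_ORDER_PLATFORM))

/* Hypothetical stand-in for the capabilities read from the device. */
struct ex_caps {
	int packed_ring;
	int tso_ipv4;
	int tso_ipv6;
	int tx_csum;
	int rx_csum;
	int version_1_0;
};

/* Start from the fixed defaults and add one bit per reported capability. */
static uint64_t
ex_features(const struct ex_caps *caps)
{
	uint64_t features = EX_DEFAULT_FEATURES;

	assert(caps != 0);
	if (caps->packed_ring)
		features |= EX_BIT(EX_VIRTIO_F_RING_PACKED);
	if (caps->tso_ipv4)
		features |= EX_BIT(EX_VIRTIO_NET_F_HOST_TSO4);
	if (caps->tso_ipv6)
		features |= EX_BIT(EX_VIRTIO_NET_F_HOST_TSO6);
	if (caps->tx_csum)
		features |= EX_BIT(EX_VIRTIO_NET_F_CSUM);
	if (caps->rx_csum)
		features |= EX_BIT(EX_VIRTIO_NET_F_GUEST_CSUM);
	if (caps->version_1_0)
		features |= EX_BIT(EX_VIRTIO_F_VERSION_1);
	return features;
}

/* Test one feature bit in a mask. */
static int
ex_has(uint64_t features, unsigned int bit)
{
	return (features & EX_BIT(bit)) != 0;
}
```

The split between a constant default mask and capability-driven bits follows the description above: the defaults do not depend on the device, while the offload, packed-ring and version bits do.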
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 doc/guides/vdpadevs/features/mlx5.ini |  7 ++++
 drivers/vdpa/mlx5/mlx5_vdpa.c         | 66 +++++++++++++++++++++++++++++++++--
 2 files changed, 71 insertions(+), 2 deletions(-)
diff --git a/doc/guides/vdpadevs/features/mlx5.ini b/doc/guides/vdpadevs/features/mlx5.ini
index d635bdf..fea491d 100644
--- a/doc/guides/vdpadevs/features/mlx5.ini
+++ b/doc/guides/vdpadevs/features/mlx5.ini
@@ -4,6 +4,13 @@
 ; Refer to default.ini for the full list of available driver features.
 ;
 [Features]
+
+any layout           = Y
+guest announce       = Y
+mq                   = Y
+proto mq             = Y
+proto log shmfd      = Y
+proto host notifier  = Y
 Other kdrv           = Y
 ARMv8                = Y
 Power8               = Y
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index 15e53f2..67e90fd 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -1,6 +1,8 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright 2019 Mellanox Technologies, Ltd
  */
+#include <linux/virtio_net.h>
+
 #include <rte_malloc.h>
 #include <rte_log.h>
 #include <rte_errno.h>
@@ -10,6 +12,7 @@
 #include <mlx5_glue.h>
 #include <mlx5_common.h>
 #include <mlx5_devx_cmds.h>
+#include <mlx5_prm.h>
 
 #include "mlx5_vdpa_utils.h"
 
@@ -22,6 +25,27 @@ struct mlx5_vdpa_priv {
 	struct mlx5_hca_vdpa_attr caps;
 };
 
+#ifndef VIRTIO_F_ORDER_PLATFORM
+#define VIRTIO_F_ORDER_PLATFORM 36
+#endif
+
+#ifndef VIRTIO_F_RING_PACKED
+#define VIRTIO_F_RING_PACKED 34
+#endif
+
+#define MLX5_VDPA_DEFAULT_FEATURES ((1ULL << VHOST_USER_F_PROTOCOL_FEATURES) | \
+			    (1ULL << VIRTIO_F_ANY_LAYOUT) | \
+			    (1ULL << VIRTIO_NET_F_MQ) | \
+			    (1ULL << VIRTIO_NET_F_GUEST_ANNOUNCE) | \
+			    (1ULL << VIRTIO_F_ORDER_PLATFORM))
+
+#define MLX5_VDPA_PROTOCOL_FEATURES \
+			    ((1ULL << VHOST_USER_PROTOCOL_F_SLAVE_REQ) | \
+			     (1ULL << VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD) | \
+			     (1ULL << VHOST_USER_PROTOCOL_F_HOST_NOTIFIER) | \
+			     (1ULL << VHOST_USER_PROTOCOL_F_LOG_SHMFD) | \
+			     (1ULL << VHOST_USER_PROTOCOL_F_MQ))
+
 TAILQ_HEAD(mlx5_vdpa_privs, mlx5_vdpa_priv) priv_list =
 					      TAILQ_HEAD_INITIALIZER(priv_list);
 static pthread_mutex_t priv_list_lock = PTHREAD_MUTEX_INITIALIZER;
@@ -62,10 +86,48 @@ struct mlx5_vdpa_priv {
 	return 0;
 }
 
+static int
+mlx5_vdpa_get_vdpa_features(int did, uint64_t *features)
+{
+	struct mlx5_vdpa_priv *priv = mlx5_vdpa_find_priv_resource_by_did(did);
+
+	if (priv == NULL) {
+		DRV_LOG(ERR, "Invalid device id: %d.", did);
+		return -1;
+	}
+	*features = MLX5_VDPA_DEFAULT_FEATURES;
+	if (priv->caps.virtio_queue_type & (1 << MLX5_VIRTQ_TYPE_PACKED))
+		*features |= (1ULL << VIRTIO_F_RING_PACKED);
+	if (priv->caps.tso_ipv4)
+		*features |= (1ULL << VIRTIO_NET_F_HOST_TSO4);
+	if (priv->caps.tso_ipv6)
+		*features |= (1ULL << VIRTIO_NET_F_HOST_TSO6);
+	if (priv->caps.tx_csum)
+		*features |= (1ULL << VIRTIO_NET_F_CSUM);
+	if (priv->caps.rx_csum)
+		*features |= (1ULL << VIRTIO_NET_F_GUEST_CSUM);
+	if (priv->caps.virtio_version_1_0)
+		*features |= (1ULL << VIRTIO_F_VERSION_1);
+	return 0;
+}
+
+static int
+mlx5_vdpa_get_protocol_features(int did, uint64_t *features)
+{
+	struct mlx5_vdpa_priv *priv = mlx5_vdpa_find_priv_resource_by_did(did);
+
+	if (priv == NULL) {
+		DRV_LOG(ERR, "Invalid device id: %d.", did);
+		return -1;
+	}
+	*features = MLX5_VDPA_PROTOCOL_FEATURES;
+	return 0;
+}
+
 static struct rte_vdpa_dev_ops mlx5_vdpa_ops = {
 	.get_queue_num = mlx5_vdpa_get_queue_num,
-	.get_features = NULL,
-	.get_protocol_features = NULL,
+	.get_features = mlx5_vdpa_get_vdpa_features,
+	.get_protocol_features = mlx5_vdpa_get_protocol_features,
 	.dev_conf = NULL,
 	.dev_close = NULL,
 	.set_vring_state = NULL,
-- 
1.8.3.1
* [dpdk-dev] [PATCH v2 04/13] vdpa/mlx5: prepare memory regions
  2020-01-29 10:08 ` [dpdk-dev] [PATCH v2 00/13] Introduce mlx5 vDPA driver Matan Azrad
                     ` (2 preceding siblings ...)
  2020-01-29 10:08   ` [dpdk-dev] [PATCH v2 03/13] vdpa/mlx5: support features get operations Matan Azrad
@ 2020-01-29 10:09   ` Matan Azrad
  2020-01-30 17:39     ` Maxime Coquelin
  2020-01-29 10:09   ` [dpdk-dev] [PATCH v2 05/13] vdpa/mlx5: prepare HW queues Matan Azrad
                     ` (10 subsequent siblings)
  14 siblings, 1 reply; 174+ messages in thread
From: Matan Azrad @ 2020-01-29 10:09 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Maxime Coquelin
Memory regions (MRs) are created in order to map the guest physical
addresses used by the guest side of the virtio device to the host
physical addresses used by the HW on the host side.
This way, for example, the HW can translate the addresses of the
packets posted by the guest and fetch the packets from the correct
place.
The design is to work with a single MR which will be configured to the
virtio queues in the HW, hence many direct MRs are grouped into a single
indirect MR.
Add functions to prepare and release the MRs together with all the
resources they require.
Add a new file, mlx5_vdpa_mem.c, to manage all the MR-related code
in the driver.
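The sizing arithmetic behind the indirect MR can be sketched separately from the driver. This is an illustration only: the ex_* names are hypothetical, the 2G limit is taken from the comments in this patch, and the power-of-2 adjustment uses a lowest-set-bit trick in place of the loop in the patch (both yield the largest power of two that divides the GCD). The final mode choice and its entry-count limit are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define EX_MAX_KLM_BYTE_COUNT 0x80000000ULL /* 2G, per the patch comments. */

struct ex_region {
	uint64_t gpa;  /* Guest physical address. */
	uint64_t size; /* Region size in bytes. */
};

struct ex_layout {
	uint64_t mem_size;        /* Span from first GPA to end of last region. */
	uint64_t gcd;             /* Fixed buffer size candidate (power of 2). */
	uint64_t klm_entries;     /* Entries needed in KLM mode. */
	uint64_t klm_fbs_entries; /* Entries needed in KLM_FBS mode. */
};

static uint64_t
ex_gcd(uint64_t a, uint64_t b)
{
	while (b != 0) {
		uint64_t t = a % b;

		a = b;
		b = t;
	}
	return a; /* ex_gcd(0, x) == x, so 0 is a neutral start value. */
}

static int
ex_region_cmp(const void *a, const void *b)
{
	const struct ex_region *ra = a;
	const struct ex_region *rb = b;

	return (ra->gpa > rb->gpa) - (ra->gpa < rb->gpa);
}

/* Number of entries of at most 2G needed to cover sz bytes. */
static uint64_t
ex_klm_chunks(uint64_t sz)
{
	return (sz + EX_MAX_KLM_BYTE_COUNT - 1) / EX_MAX_KLM_BYTE_COUNT;
}

static struct ex_layout
ex_layout_compute(struct ex_region *regs, size_t n)
{
	struct ex_layout l = { 0, 0, 0, 0 };
	size_t i;

	assert(n > 0);
	qsort(regs, n, sizeof(regs[0]), ex_region_cmp);
	l.mem_size = regs[n - 1].gpa + regs[n - 1].size - regs[0].gpa;
	for (i = 0; i < n; i++) {
		if (i > 0) {
			/* Hole between the previous region and this one. */
			uint64_t hole = regs[i].gpa -
					(regs[i - 1].gpa + regs[i - 1].size);

			l.gcd = ex_gcd(l.gcd, hole);
			l.klm_entries += ex_klm_chunks(hole);
		}
		l.gcd = ex_gcd(l.gcd, regs[i].size);
		l.klm_entries += ex_klm_chunks(regs[i].size);
	}
	assert(l.gcd != 0); /* At least one region must be non-empty. */
	if (l.gcd > EX_MAX_KLM_BYTE_COUNT)
		l.gcd = ex_gcd(l.gcd, EX_MAX_KLM_BYTE_COUNT);
	/* Keep only the lowest set bit: largest power of 2 dividing gcd. */
	l.gcd &= ~l.gcd + 1;
	l.klm_fbs_entries = l.mem_size / l.gcd;
	return l;
}
```

For example, a 640K region at GPA 0 followed by a region of 1G minus 1M at GPA 1M gives a 384K hole, a GCD of 128K, 3 KLM entries and 8192 KLM_FBS entries, which shows how quickly the fixed-size mode grows when region sizes share only a small common divisor.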
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/vdpa/mlx5/Makefile        |   4 +-
 drivers/vdpa/mlx5/meson.build     |   3 +-
 drivers/vdpa/mlx5/mlx5_vdpa.c     |  11 +-
 drivers/vdpa/mlx5/mlx5_vdpa.h     |  60 +++++++
 drivers/vdpa/mlx5/mlx5_vdpa_mem.c | 346 ++++++++++++++++++++++++++++++++++++++
 5 files changed, 413 insertions(+), 11 deletions(-)
 create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa.h
 create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_mem.c
diff --git a/drivers/vdpa/mlx5/Makefile b/drivers/vdpa/mlx5/Makefile
index c1c8cc0..5472797 100644
--- a/drivers/vdpa/mlx5/Makefile
+++ b/drivers/vdpa/mlx5/Makefile
@@ -8,6 +8,7 @@ LIB = librte_pmd_mlx5_vdpa.a
 
 # Sources.
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa.c
+SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa_mem.c
 
 # Basic CFLAGS.
 CFLAGS += -O3
@@ -15,6 +16,7 @@ CFLAGS += -std=c11 -Wall -Wextra
 CFLAGS += -g
 CFLAGS += -I$(RTE_SDK)/drivers/common/mlx5
 CFLAGS += -I$(RTE_SDK)/drivers/net/mlx5_vdpa
+CFLAGS += -I$(RTE_SDK)/lib/librte_sched
 CFLAGS += -I$(BUILDDIR)/drivers/common/mlx5
 CFLAGS += -D_BSD_SOURCE
 CFLAGS += -D_DEFAULT_SOURCE
@@ -22,7 +24,7 @@ CFLAGS += -D_XOPEN_SOURCE=600
 CFLAGS += $(WERROR_FLAGS)
 CFLAGS += -Wno-strict-prototypes
 LDLIBS += -lrte_common_mlx5
-LDLIBS += -lrte_eal -lrte_vhost -lrte_kvargs -lrte_bus_pci
+LDLIBS += -lrte_eal -lrte_vhost -lrte_kvargs -lrte_bus_pci -lrte_sched
 
 # A few warnings cannot be avoided in external headers.
 CFLAGS += -Wno-error=cast-qual
diff --git a/drivers/vdpa/mlx5/meson.build b/drivers/vdpa/mlx5/meson.build
index 4bca6ea..7e5dd95 100644
--- a/drivers/vdpa/mlx5/meson.build
+++ b/drivers/vdpa/mlx5/meson.build
@@ -9,9 +9,10 @@ endif
 
 fmt_name = 'mlx5_vdpa'
 allow_experimental_apis = true
-deps += ['hash', 'common_mlx5', 'vhost', 'bus_pci', 'eal']
+deps += ['hash', 'common_mlx5', 'vhost', 'bus_pci', 'eal', 'sched']
 sources = files(
 	'mlx5_vdpa.c',
+	'mlx5_vdpa_mem.c',
 )
 cflags_options = [
 	'-std=c11',
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index 67e90fd..c67f93d 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -7,7 +7,6 @@
 #include <rte_log.h>
 #include <rte_errno.h>
 #include <rte_bus_pci.h>
-#include <rte_vdpa.h>
 
 #include <mlx5_glue.h>
 #include <mlx5_common.h>
@@ -15,16 +14,9 @@
 #include <mlx5_prm.h>
 
 #include "mlx5_vdpa_utils.h"
+#include "mlx5_vdpa.h"
 
 
-struct mlx5_vdpa_priv {
-	TAILQ_ENTRY(mlx5_vdpa_priv) next;
-	int id; /* vDPA device id. */
-	struct ibv_context *ctx; /* Device context. */
-	struct rte_vdpa_dev_addr dev_addr;
-	struct mlx5_hca_vdpa_attr caps;
-};
-
 #ifndef VIRTIO_F_ORDER_PLATFORM
 #define VIRTIO_F_ORDER_PLATFORM 36
 #endif
@@ -236,6 +228,7 @@ struct mlx5_vdpa_priv {
 		rte_errno = rte_errno ? rte_errno : EINVAL;
 		goto error;
 	}
+	SLIST_INIT(&priv->mr_list);
 	pthread_mutex_lock(&priv_list_lock);
 	TAILQ_INSERT_TAIL(&priv_list, priv, next);
 	pthread_mutex_unlock(&priv_list_lock);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
new file mode 100644
index 0000000..e27baea
--- /dev/null
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -0,0 +1,60 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2019 Mellanox Technologies, Ltd
+ */
+
+#ifndef RTE_PMD_MLX5_VDPA_H_
+#define RTE_PMD_MLX5_VDPA_H_
+
+#include <sys/queue.h>
+
+#include <rte_vdpa.h>
+#include <rte_vhost.h>
+
+#include <mlx5_glue.h>
+#include <mlx5_devx_cmds.h>
+
+struct mlx5_vdpa_query_mr {
+	SLIST_ENTRY(mlx5_vdpa_query_mr) next;
+	void *addr;
+	uint64_t length;
+	struct mlx5dv_devx_umem *umem;
+	struct mlx5_devx_obj *mkey;
+	int is_indirect;
+};
+
+struct mlx5_vdpa_priv {
+	TAILQ_ENTRY(mlx5_vdpa_priv) next;
+	int id; /* vDPA device id. */
+	int vid; /* vhost device id. */
+	struct ibv_context *ctx; /* Device context. */
+	struct rte_vdpa_dev_addr dev_addr;
+	struct mlx5_hca_vdpa_attr caps;
+	uint32_t pdn; /* Protection Domain number. */
+	struct ibv_pd *pd;
+	uint32_t gpa_mkey_index;
+	struct ibv_mr *null_mr;
+	struct rte_vhost_memory *vmem;
+	SLIST_HEAD(mr_list, mlx5_vdpa_query_mr) mr_list;
+};
+
+/**
+ * Release all the prepared memory regions and all their related resources.
+ *
+ * @param[in] priv
+ *   The vdpa driver private structure.
+ */
+void mlx5_vdpa_mem_dereg(struct mlx5_vdpa_priv *priv);
+
+/**
+ * Register all the memory regions of the virtio device to the HW and allocate
+ * all their related resources.
+ *
+ * @param[in] priv
+ *   The vdpa driver private structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv);
+
+#endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_mem.c b/drivers/vdpa/mlx5/mlx5_vdpa_mem.c
new file mode 100644
index 0000000..398ca35
--- /dev/null
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_mem.c
@@ -0,0 +1,346 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2019 Mellanox Technologies, Ltd
+ */
+#include <stdlib.h>
+
+#include <rte_malloc.h>
+#include <rte_errno.h>
+#include <rte_common.h>
+#include <rte_sched_common.h>
+
+#include <mlx5_prm.h>
+#include <mlx5_common.h>
+
+#include "mlx5_vdpa_utils.h"
+#include "mlx5_vdpa.h"
+
+static int
+mlx5_vdpa_pd_prepare(struct mlx5_vdpa_priv *priv)
+{
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+	if (priv->pd)
+		return 0;
+	priv->pd = mlx5_glue->alloc_pd(priv->ctx);
+	if (priv->pd == NULL) {
+		DRV_LOG(ERR, "Failed to allocate PD.");
+		return errno ? -errno : -ENOMEM;
+	}
+	struct mlx5dv_obj obj;
+	struct mlx5dv_pd pd_info;
+	int ret = 0;
+
+	obj.pd.in = priv->pd;
+	obj.pd.out = &pd_info;
+	ret = mlx5_glue->dv_init_obj(&obj, MLX5DV_OBJ_PD);
+	if (ret) {
+		DRV_LOG(ERR, "Fail to get PD object info.");
+		mlx5_glue->dealloc_pd(priv->pd);
+		priv->pd = NULL;
+		return -errno;
+	}
+	priv->pdn = pd_info.pdn;
+	return 0;
+#else
+	(void)priv;
+	DRV_LOG(ERR, "Cannot get pdn - no DV support.");
+	return -ENOTSUP;
+#endif /* HAVE_IBV_FLOW_DV_SUPPORT */
+}
+
+void
+mlx5_vdpa_mem_dereg(struct mlx5_vdpa_priv *priv)
+{
+	struct mlx5_vdpa_query_mr *entry;
+	struct mlx5_vdpa_query_mr *next;
+
+	entry = SLIST_FIRST(&priv->mr_list);
+	while (entry) {
+		next = SLIST_NEXT(entry, next);
+		claim_zero(mlx5_devx_cmd_destroy(entry->mkey));
+		if (!entry->is_indirect)
+			claim_zero(mlx5_glue->devx_umem_dereg(entry->umem));
+		SLIST_REMOVE(&priv->mr_list, entry, mlx5_vdpa_query_mr, next);
+		rte_free(entry);
+		entry = next;
+	}
+	SLIST_INIT(&priv->mr_list);
+	if (priv->null_mr) {
+		claim_zero(mlx5_glue->dereg_mr(priv->null_mr));
+		priv->null_mr = NULL;
+	}
+	if (priv->pd) {
+		claim_zero(mlx5_glue->dealloc_pd(priv->pd));
+		priv->pd = NULL;
+	}
+	if (priv->vmem) {
+		free(priv->vmem);
+		priv->vmem = NULL;
+	}
+}
+
+static int
+mlx5_vdpa_regions_addr_cmp(const void *a, const void *b)
+{
+	const struct rte_vhost_mem_region *region_a = a;
+	const struct rte_vhost_mem_region *region_b = b;
+
+	if (region_a->guest_phys_addr < region_b->guest_phys_addr)
+		return -1;
+	if (region_a->guest_phys_addr > region_b->guest_phys_addr)
+		return 1;
+	return 0;
+}
+
+#define KLM_NUM_MAX_ALIGN(sz) (RTE_ALIGN_CEIL(sz, MLX5_MAX_KLM_BYTE_COUNT) / \
+			       MLX5_MAX_KLM_BYTE_COUNT)
+
+/*
+ * Allocate and sort the region list and choose indirect mkey mode:
+ *   1. Calculate GCD, guest memory size and indirect mkey entries num per mode.
+ *   2. Align GCD to the maximum allowed size(2G) and to be power of 2.
+ *   3. Decide the indirect mkey mode according to the following rules:
+ *         a. If both KLM_FBS entries number and KLM entries number are bigger
+ *            than the maximum allowed(MLX5_DEVX_MAX_KLM_ENTRIES) - error.
+ *         b. KLM mode if KLM_FBS entries number is bigger than the maximum
+ *            allowed(MLX5_DEVX_MAX_KLM_ENTRIES).
+ *         c. KLM mode if GCD is smaller than the minimum allowed(4K).
+ *         d. KLM mode if the total size of KLM entries is in one cache line
+ *            and the total size of KLM_FBS entries is not in one cache line.
+ *         e. Otherwise, KLM_FBS mode.
+ */
+static struct rte_vhost_memory *
+mlx5_vdpa_vhost_mem_regions_prepare(int vid, uint8_t *mode, uint64_t *mem_size,
+				    uint64_t *gcd, uint32_t *entries_num)
+{
+	struct rte_vhost_memory *mem;
+	uint64_t size;
+	uint64_t klm_entries_num = 0;
+	uint64_t klm_fbs_entries_num;
+	uint32_t i;
+	int ret = rte_vhost_get_mem_table(vid, &mem);
+
+	if (ret < 0) {
+		DRV_LOG(ERR, "Failed to get VM memory layout vid =%d.", vid);
+		rte_errno = EINVAL;
+		return NULL;
+	}
+	qsort(mem->regions, mem->nregions, sizeof(mem->regions[0]),
+	      mlx5_vdpa_regions_addr_cmp);
+	*mem_size = (mem->regions[(mem->nregions - 1)].guest_phys_addr) +
+				      (mem->regions[(mem->nregions - 1)].size) -
+					      (mem->regions[0].guest_phys_addr);
+	*gcd = 0;
+	for (i = 0; i < mem->nregions; ++i) {
+		DRV_LOG(INFO,  "Region %u: HVA 0x%" PRIx64 ", GPA 0x%" PRIx64
+			", size 0x%" PRIx64 ".", i,
+			mem->regions[i].host_user_addr,
+			mem->regions[i].guest_phys_addr, mem->regions[i].size);
+		if (i > 0) {
+			/* Hole handle. */
+			size = mem->regions[i].guest_phys_addr -
+				(mem->regions[i - 1].guest_phys_addr +
+				 mem->regions[i - 1].size);
+			*gcd = rte_get_gcd(*gcd, size);
+			klm_entries_num += KLM_NUM_MAX_ALIGN(size);
+		}
+		size = mem->regions[i].size;
+		*gcd = rte_get_gcd(*gcd, size);
+		klm_entries_num += KLM_NUM_MAX_ALIGN(size);
+	}
+	if (*gcd > MLX5_MAX_KLM_BYTE_COUNT)
+		*gcd = rte_get_gcd(*gcd, MLX5_MAX_KLM_BYTE_COUNT);
+	if (!RTE_IS_POWER_OF_2(*gcd)) {
+		uint64_t candidate_gcd = rte_align64prevpow2(*gcd);
+
+		while (candidate_gcd > 1 && (*gcd % candidate_gcd))
+			candidate_gcd /= 2;
+		DRV_LOG(DEBUG, "GCD 0x%" PRIx64 " is not power of 2. Adjusted "
+			"GCD is 0x%" PRIx64 ".", *gcd, candidate_gcd);
+		*gcd = candidate_gcd;
+	}
+	klm_fbs_entries_num = *mem_size / *gcd;
+	if (*gcd < MLX5_MIN_KLM_FIXED_BUFFER_SIZE || klm_fbs_entries_num >
+	    MLX5_DEVX_MAX_KLM_ENTRIES ||
+	    ((klm_entries_num * sizeof(struct mlx5_klm)) <=
+	    RTE_CACHE_LINE_SIZE && (klm_fbs_entries_num *
+				    sizeof(struct mlx5_klm)) >
+							RTE_CACHE_LINE_SIZE)) {
+		*mode = MLX5_MKC_ACCESS_MODE_KLM;
+		*entries_num = klm_entries_num;
+		DRV_LOG(INFO, "Indirect mkey mode is KLM.");
+	} else {
+		*mode = MLX5_MKC_ACCESS_MODE_KLM_FBS;
+		*entries_num = klm_fbs_entries_num;
+		DRV_LOG(INFO, "Indirect mkey mode is KLM Fixed Buffer Size.");
+	}
+	DRV_LOG(DEBUG, "Memory registration information: nregions = %u, "
+		"mem_size = 0x%" PRIx64 ", GCD = 0x%" PRIx64
+		", klm_fbs_entries_num = 0x%" PRIx64 ", klm_entries_num = 0x%"
+		PRIx64 ".", mem->nregions, *mem_size, *gcd, klm_fbs_entries_num,
+		klm_entries_num);
+	if (*entries_num > MLX5_DEVX_MAX_KLM_ENTRIES) {
+		DRV_LOG(ERR, "Failed to prepare memory of vid %d - memory is "
+			"too fragmented.", vid);
+		free(mem);
+		return NULL;
+	}
+	return mem;
+}
+
+#define KLM_SIZE_MAX_ALIGN(sz) ((sz) > MLX5_MAX_KLM_BYTE_COUNT ? \
+				MLX5_MAX_KLM_BYTE_COUNT : (sz))
+
+/*
+ * The target here is to group all the physical memory regions of the
+ * virtio device in one indirect mkey.
+ * For KLM Fixed Buffer Size mode (HW finds the translation entry in one
+ * read according to the guest physical address):
+ * All its sub-direct mkeys must have the same size, hence each one of
+ * them should have the GCD size of all the virtio memory regions and
+ * the holes between them.
+ * For KLM mode (entries may have different sizes, so HW must iterate
+ * over the entries):
+ * Each virtio memory region and each hole between them has one entry;
+ * entries whose associated memory regions are bigger than the maximum
+ * allowed size (2G) are split to fit it.
+ * It means that each virtio memory region may be mapped to more than
+ * one direct mkey in both modes.
+ * All the holes of invalid memory between the virtio memory regions
+ * will be mapped to the null memory region for security.
+ */
+int
+mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv)
+{
+	struct mlx5_devx_mkey_attr mkey_attr;
+	struct mlx5_vdpa_query_mr *entry = NULL;
+	struct rte_vhost_mem_region *reg = NULL;
+	uint8_t mode;
+	uint32_t entries_num = 0;
+	uint32_t i;
+	uint64_t gcd;
+	uint64_t klm_size;
+	uint64_t mem_size;
+	uint64_t k;
+	int klm_index = 0;
+	int ret;
+	struct rte_vhost_memory *mem = mlx5_vdpa_vhost_mem_regions_prepare
+			      (priv->vid, &mode, &mem_size, &gcd, &entries_num);
+	struct mlx5_klm klm_array[entries_num];
+
+	if (!mem)
+		return -rte_errno;
+	priv->vmem = mem;
+	ret = mlx5_vdpa_pd_prepare(priv);
+	if (ret)
+		goto error;
+	priv->null_mr = mlx5_glue->alloc_null_mr(priv->pd);
+	if (!priv->null_mr) {
+		DRV_LOG(ERR, "Failed to allocate null MR.");
+		ret = -errno;
+		goto error;
+	}
+	DRV_LOG(DEBUG, "Dump fill Mkey = %u.", priv->null_mr->lkey);
+	for (i = 0; i < mem->nregions; i++) {
+		reg = &mem->regions[i];
+		entry = rte_zmalloc(__func__, sizeof(*entry), 0);
+		if (!entry) {
+			ret = -ENOMEM;
+			DRV_LOG(ERR, "Failed to allocate mem entry memory.");
+			goto error;
+		}
+		entry->umem = mlx5_glue->devx_umem_reg(priv->ctx,
+					 (void *)(uintptr_t)reg->host_user_addr,
+					     reg->size, IBV_ACCESS_LOCAL_WRITE);
+		if (!entry->umem) {
+			DRV_LOG(ERR, "Failed to register Umem by Devx.");
+			ret = -errno;
+			goto error;
+		}
+		mkey_attr.addr = (uintptr_t)(reg->guest_phys_addr);
+		mkey_attr.size = reg->size;
+		mkey_attr.umem_id = entry->umem->umem_id;
+		mkey_attr.pd = priv->pdn;
+		mkey_attr.pg_access = 1;
+		mkey_attr.klm_array = NULL;
+		mkey_attr.klm_num = 0;
+		entry->mkey = mlx5_devx_cmd_mkey_create(priv->ctx, &mkey_attr);
+		if (!entry->mkey) {
+			DRV_LOG(ERR, "Failed to create direct Mkey.");
+			ret = -rte_errno;
+			goto error;
+		}
+		entry->addr = (void *)(uintptr_t)(reg->host_user_addr);
+		entry->length = reg->size;
+		entry->is_indirect = 0;
+		if (i > 0) {
+			uint64_t sadd;
+			uint64_t empty_region_sz = reg->guest_phys_addr -
+					  (mem->regions[i - 1].guest_phys_addr +
+					   mem->regions[i - 1].size);
+
+			if (empty_region_sz > 0) {
+				sadd = mem->regions[i - 1].guest_phys_addr +
+				       mem->regions[i - 1].size;
+				klm_size = mode == MLX5_MKC_ACCESS_MODE_KLM ?
+				      KLM_SIZE_MAX_ALIGN(empty_region_sz) : gcd;
+				for (k = 0; k < empty_region_sz;
+				     k += klm_size) {
+					klm_array[klm_index].byte_count =
+						k + klm_size > empty_region_sz ?
+						 empty_region_sz - k : klm_size;
+					klm_array[klm_index].mkey =
+							    priv->null_mr->lkey;
+					klm_array[klm_index].address = sadd + k;
+					klm_index++;
+				}
+			}
+		}
+		klm_size = mode == MLX5_MKC_ACCESS_MODE_KLM ?
+					    KLM_SIZE_MAX_ALIGN(reg->size) : gcd;
+		for (k = 0; k < reg->size; k += klm_size) {
+			klm_array[klm_index].byte_count = k + klm_size >
+					   reg->size ? reg->size - k : klm_size;
+			klm_array[klm_index].mkey = entry->mkey->id;
+			klm_array[klm_index].address = reg->guest_phys_addr + k;
+			klm_index++;
+		}
+		SLIST_INSERT_HEAD(&priv->mr_list, entry, next);
+	}
+	mkey_attr.addr = (uintptr_t)(mem->regions[0].guest_phys_addr);
+	mkey_attr.size = mem_size;
+	mkey_attr.pd = priv->pdn;
+	mkey_attr.umem_id = 0;
+	/* Must be zero for KLM mode. */
+	mkey_attr.log_entity_size = mode == MLX5_MKC_ACCESS_MODE_KLM_FBS ?
+							  rte_log2_u64(gcd) : 0;
+	mkey_attr.pg_access = 0;
+	mkey_attr.klm_array = klm_array;
+	mkey_attr.klm_num = klm_index;
+	entry = rte_zmalloc(__func__, sizeof(*entry), 0);
+	if (!entry) {
+		DRV_LOG(ERR, "Failed to allocate memory for indirect entry.");
+		ret = -ENOMEM;
+		goto error;
+	}
+	entry->mkey = mlx5_devx_cmd_mkey_create(priv->ctx, &mkey_attr);
+	if (!entry->mkey) {
+		DRV_LOG(ERR, "Failed to create indirect Mkey.");
+		ret = -rte_errno;
+		goto error;
+	}
+	entry->is_indirect = 1;
+	SLIST_INSERT_HEAD(&priv->mr_list, entry, next);
+	priv->gpa_mkey_index = entry->mkey->id;
+	return 0;
+error:
+	if (entry) {
+		if (entry->mkey)
+			mlx5_devx_cmd_destroy(entry->mkey);
+		if (entry->umem)
+			mlx5_glue->devx_umem_dereg(entry->umem);
+		rte_free(entry);
+	}
+	mlx5_vdpa_mem_dereg(priv);
+	rte_errno = -ret;
+	return ret;
+}
-- 
1.8.3.1
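
For readers following the comment above mlx5_vdpa_mem_register(), here is a minimal standalone sketch (not the driver code) of how the two entry counts could be derived. It assumes the regions are sorted by guest physical address and do not overlap, takes the 2G cap from the comment, and leaves out the power-of-2 GCD adjustment done by the prepare function. All sketch_* names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed from the "2G" limit mentioned in the patch comment. */
#define SKETCH_MAX_KLM_BYTE_COUNT 0x80000000ULL

struct sketch_region {
	uint64_t gpa;  /* Guest physical address. */
	uint64_t size; /* Region size in bytes, must be non-zero. */
};

/* Euclid's algorithm; sketch_gcd(0, x) returns x. */
static uint64_t
sketch_gcd(uint64_t a, uint64_t b)
{
	while (b != 0) {
		uint64_t t = a % b;

		a = b;
		b = t;
	}
	return a;
}

/* Number of KLM entries needed to cover sz bytes under the 2G cap. */
static uint64_t
sketch_klm_split(uint64_t sz)
{
	return (sz + SKETCH_MAX_KLM_BYTE_COUNT - 1) / SKETCH_MAX_KLM_BYTE_COUNT;
}

/*
 * Walk the regions and the holes between them (n must be at least 1).
 * *klm gets the entry count for KLM mode, *fbs the entry count for
 * KLM Fixed Buffer Size mode, *gcd the common entry size used by FBS.
 */
static void
sketch_count_entries(const struct sketch_region *r, size_t n,
		     uint64_t *gcd, uint64_t *klm, uint64_t *fbs)
{
	uint64_t mem_size;
	size_t i;

	*gcd = 0;
	*klm = 0;
	for (i = 0; i < n; i++) {
		if (i > 0) {
			uint64_t hole = r[i].gpa -
					(r[i - 1].gpa + r[i - 1].size);

			if (hole > 0) {
				*gcd = sketch_gcd(*gcd, hole);
				*klm += sketch_klm_split(hole);
			}
		}
		*gcd = sketch_gcd(*gcd, r[i].size);
		*klm += sketch_klm_split(r[i].size);
	}
	mem_size = r[n - 1].gpa + r[n - 1].size - r[0].gpa;
	*fbs = mem_size / *gcd;
}
```

With two regions [0x0, 0xA0000) and [0x100000, 0x80000000), the hole is 0x60000 bytes and the GCD is 0x20000, so KLM mode needs 3 entries while KLM FBS mode needs 0x4000. This is the kind of trade-off the mode selection in the prepare function weighs.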
^ permalink raw reply	[flat|nested] 174+ messages in thread
* [dpdk-dev] [PATCH v2 05/13] vdpa/mlx5: prepare HW queues
  2020-01-29 10:08 ` [dpdk-dev] [PATCH v2 00/13] Introduce mlx5 vDPA driver Matan Azrad
                     ` (3 preceding siblings ...)
  2020-01-29 10:09   ` [dpdk-dev] [PATCH v2 04/13] vdpa/mlx5: prepare memory regions Matan Azrad
@ 2020-01-29 10:09   ` Matan Azrad
  2020-01-30 18:17     ` Maxime Coquelin
  2020-01-29 10:09   ` [dpdk-dev] [PATCH v2 06/13] vdpa/mlx5: prepare virtio queues Matan Azrad
                     ` (9 subsequent siblings)
  14 siblings, 1 reply; 174+ messages in thread
From: Matan Azrad @ 2020-01-29 10:09 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Maxime Coquelin
As a preparation for the virtio queues creation, 2 QPs and a CQ may be
created for each virtio queue.
The design is to trigger an event for the guest and for the vDPA driver
when a new CQE is posted by the HW after the packet transition.
This patch adds the basic operations to create and destroy the above HW
objects and to trigger the CQE events when a new CQE is posted.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/common/mlx5/mlx5_prm.h      |   4 +
 drivers/vdpa/mlx5/Makefile          |   1 +
 drivers/vdpa/mlx5/meson.build       |   1 +
 drivers/vdpa/mlx5/mlx5_vdpa.h       |  89 ++++++++
 drivers/vdpa/mlx5/mlx5_vdpa_event.c | 399 ++++++++++++++++++++++++++++++++++++
 5 files changed, 494 insertions(+)
 create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_event.c
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index b48cd0a..b533798 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -392,6 +392,10 @@ struct mlx5_cqe {
 /* CQE format value. */
 #define MLX5_COMPRESSED 0x3
 
+/* CQ doorbell cmd types. */
+#define MLX5_CQ_DBR_CMD_SOL_ONLY (1 << 24)
+#define MLX5_CQ_DBR_CMD_ALL (0 << 24)
+
 /* Action type of header modification. */
 enum {
 	MLX5_MODIFICATION_TYPE_SET = 0x1,
diff --git a/drivers/vdpa/mlx5/Makefile b/drivers/vdpa/mlx5/Makefile
index 5472797..7f13756 100644
--- a/drivers/vdpa/mlx5/Makefile
+++ b/drivers/vdpa/mlx5/Makefile
@@ -9,6 +9,7 @@ LIB = librte_pmd_mlx5_vdpa.a
 # Sources.
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa_mem.c
+SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa_event.c
 
 # Basic CFLAGS.
 CFLAGS += -O3
diff --git a/drivers/vdpa/mlx5/meson.build b/drivers/vdpa/mlx5/meson.build
index 7e5dd95..c609f7c 100644
--- a/drivers/vdpa/mlx5/meson.build
+++ b/drivers/vdpa/mlx5/meson.build
@@ -13,6 +13,7 @@ deps += ['hash', 'common_mlx5', 'vhost', 'bus_pci', 'eal', 'sched']
 sources = files(
 	'mlx5_vdpa.c',
 	'mlx5_vdpa_mem.c',
+	'mlx5_vdpa_event.c',
 )
 cflags_options = [
 	'-std=c11',
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index e27baea..30030b7 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -9,9 +9,40 @@
 
 #include <rte_vdpa.h>
 #include <rte_vhost.h>
+#include <rte_spinlock.h>
+#include <rte_interrupts.h>
 
 #include <mlx5_glue.h>
 #include <mlx5_devx_cmds.h>
+#include <mlx5_prm.h>
+
+
+#define MLX5_VDPA_INTR_RETRIES 256
+#define MLX5_VDPA_INTR_RETRIES_USEC 1000
+
+struct mlx5_vdpa_cq {
+	uint16_t log_desc_n;
+	uint32_t cq_ci:24;
+	uint32_t arm_sn:2;
+	rte_spinlock_t sl;
+	struct mlx5_devx_obj *cq;
+	struct mlx5dv_devx_umem *umem_obj;
+	union {
+		volatile void *umem_buf;
+		volatile struct mlx5_cqe *cqes;
+	};
+	volatile uint32_t *db_rec;
+	uint64_t errors;
+};
+
+struct mlx5_vdpa_event_qp {
+	struct mlx5_vdpa_cq cq;
+	struct mlx5_devx_obj *fw_qp;
+	struct mlx5_devx_obj *sw_qp;
+	struct mlx5dv_devx_umem *umem_obj;
+	void *umem_buf;
+	volatile uint32_t *db_rec;
+};
 
 struct mlx5_vdpa_query_mr {
 	SLIST_ENTRY(mlx5_vdpa_query_mr) next;
@@ -34,6 +65,10 @@ struct mlx5_vdpa_priv {
 	uint32_t gpa_mkey_index;
 	struct ibv_mr *null_mr;
 	struct rte_vhost_memory *vmem;
+	uint32_t eqn;
+	struct mlx5dv_devx_event_channel *eventc;
+	struct mlx5dv_devx_uar *uar;
+	struct rte_intr_handle intr_handle;
 	SLIST_HEAD(mr_list, mlx5_vdpa_query_mr) mr_list;
 };
 
@@ -57,4 +92,58 @@ struct mlx5_vdpa_priv {
  */
 int mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv);
 
+
+/**
+ * Create an event QP and all its related resources.
+ *
+ * @param[in] priv
+ *   The vdpa driver private structure.
+ * @param[in] desc_n
+ *   Number of descriptors.
+ * @param[in] callfd
+ *   The guest notification file descriptor.
+ * @param[in, out] eqp
+ *   Pointer to the event QP structure.
+ *
+ * @return
+ *   0 on success, -1 otherwise and rte_errno is set.
+ */
+int mlx5_vdpa_event_qp_create(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
+			      int callfd, struct mlx5_vdpa_event_qp *eqp);
+
+/**
+ * Destroy an event QP and all its related resources.
+ *
+ * @param[in, out] eqp
+ *   Pointer to the event QP structure.
+ */
+void mlx5_vdpa_event_qp_destroy(struct mlx5_vdpa_event_qp *eqp);
+
+/**
+ * Release all the event global resources.
+ *
+ * @param[in] priv
+ *   The vdpa driver private structure.
+ */
+void mlx5_vdpa_event_qp_global_release(struct mlx5_vdpa_priv *priv);
+
+/**
+ * Setup CQE event.
+ *
+ * @param[in] priv
+ *   The vdpa driver private structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int mlx5_vdpa_cqe_event_setup(struct mlx5_vdpa_priv *priv);
+
+/**
+ * Unset CQE event.
+ *
+ * @param[in] priv
+ *   The vdpa driver private structure.
+ */
+void mlx5_vdpa_cqe_event_unset(struct mlx5_vdpa_priv *priv);
+
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_event.c b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
new file mode 100644
index 0000000..35518ad
--- /dev/null
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
@@ -0,0 +1,399 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2019 Mellanox Technologies, Ltd
+ */
+#include <unistd.h>
+#include <stdint.h>
+#include <fcntl.h>
+
+#include <rte_malloc.h>
+#include <rte_errno.h>
+#include <rte_lcore.h>
+#include <rte_atomic.h>
+#include <rte_common.h>
+#include <rte_io.h>
+
+#include <mlx5_common.h>
+
+#include "mlx5_vdpa_utils.h"
+#include "mlx5_vdpa.h"
+
+
+void
+mlx5_vdpa_event_qp_global_release(struct mlx5_vdpa_priv *priv)
+{
+	if (priv->uar) {
+		mlx5_glue->devx_free_uar(priv->uar);
+		priv->uar = NULL;
+	}
+	if (priv->eventc) {
+		mlx5_glue->devx_destroy_event_channel(priv->eventc);
+		priv->eventc = NULL;
+	}
+	priv->eqn = 0;
+}
+
+/* Prepare all the global resources for all the event objects. */
+static int
+mlx5_vdpa_event_qp_global_prepare(struct mlx5_vdpa_priv *priv)
+{
+	uint32_t lcore;
+
+	if (priv->eventc)
+		return 0;
+	lcore = (uint32_t)rte_lcore_to_cpu_id(-1);
+	if (mlx5_glue->devx_query_eqn(priv->ctx, lcore, &priv->eqn)) {
+		rte_errno = errno;
+		DRV_LOG(ERR, "Failed to query EQ number %d.", rte_errno);
+		return -1;
+	}
+	priv->eventc = mlx5_glue->devx_create_event_channel(priv->ctx,
+			   MLX5DV_DEVX_CREATE_EVENT_CHANNEL_FLAGS_OMIT_EV_DATA);
+	if (!priv->eventc) {
+		rte_errno = errno;
+		DRV_LOG(ERR, "Failed to create event channel %d.",
+			rte_errno);
+		goto error;
+	}
+	priv->uar = mlx5_glue->devx_alloc_uar(priv->ctx, 0);
+	if (!priv->uar) {
+		rte_errno = errno;
+		DRV_LOG(ERR, "Failed to allocate UAR.");
+		goto error;
+	}
+	return 0;
+error:
+	mlx5_vdpa_event_qp_global_release(priv);
+	return -1;
+}
+
+static void
+mlx5_vdpa_cq_destroy(struct mlx5_vdpa_cq *cq)
+{
+	if (cq->cq)
+		claim_zero(mlx5_devx_cmd_destroy(cq->cq));
+	if (cq->umem_obj)
+		claim_zero(mlx5_glue->devx_umem_dereg(cq->umem_obj));
+	if (cq->umem_buf)
+		rte_free((void *)(uintptr_t)cq->umem_buf);
+	memset(cq, 0, sizeof(*cq));
+}
+
+static inline void
+mlx5_vdpa_cq_arm(struct mlx5_vdpa_priv *priv, struct mlx5_vdpa_cq *cq)
+{
+	const unsigned int cqe_mask = (1 << cq->log_desc_n) - 1;
+	uint32_t arm_sn = cq->arm_sn << MLX5_CQ_SQN_OFFSET;
+	uint32_t cq_ci = cq->cq_ci & MLX5_CI_MASK & cqe_mask;
+	uint32_t doorbell_hi = arm_sn | MLX5_CQ_DBR_CMD_ALL | cq_ci;
+	uint64_t doorbell = ((uint64_t)doorbell_hi << 32) | cq->cq->id;
+	uint64_t db_be = rte_cpu_to_be_64(doorbell);
+	uint32_t *addr = RTE_PTR_ADD(priv->uar->base_addr, MLX5_CQ_DOORBELL);
+
+	rte_io_wmb();
+	cq->db_rec[MLX5_CQ_ARM_DB] = rte_cpu_to_be_32(doorbell_hi);
+	rte_wmb();
+#ifdef RTE_ARCH_64
+	*(uint64_t *)addr = db_be;
+#else
+	*(uint32_t *)addr = db_be;
+	rte_io_wmb();
+	*((uint32_t *)addr + 1) = db_be >> 32;
+#endif
+	cq->arm_sn++;
+}
+
+static int
+mlx5_vdpa_cq_create(struct mlx5_vdpa_priv *priv, uint16_t log_desc_n,
+		    int callfd, struct mlx5_vdpa_cq *cq)
+{
+	struct mlx5_devx_cq_attr attr;
+	size_t pgsize = sysconf(_SC_PAGESIZE);
+	uint32_t umem_size;
+	int ret;
+	uint16_t event_nums[1] = {0};
+
+	cq->log_desc_n = log_desc_n;
+	umem_size = sizeof(struct mlx5_cqe) * (1 << log_desc_n) +
+							sizeof(*cq->db_rec) * 2;
+	cq->umem_buf = rte_zmalloc(__func__, umem_size, 4096);
+	if (!cq->umem_buf) {
+		DRV_LOG(ERR, "Failed to allocate memory for CQ.");
+		rte_errno = ENOMEM;
+		return -ENOMEM;
+	}
+	cq->umem_obj = mlx5_glue->devx_umem_reg(priv->ctx,
+						(void *)(uintptr_t)cq->umem_buf,
+						umem_size,
+						IBV_ACCESS_LOCAL_WRITE);
+	if (!cq->umem_obj) {
+		DRV_LOG(ERR, "Failed to register umem for CQ.");
+		goto error;
+	}
+	attr.q_umem_valid = 1;
+	attr.db_umem_valid = 1;
+	attr.use_first_only = 0;
+	attr.overrun_ignore = 0;
+	attr.uar_page_id = priv->uar->page_id;
+	attr.q_umem_id = cq->umem_obj->umem_id;
+	attr.q_umem_offset = 0;
+	attr.db_umem_id = cq->umem_obj->umem_id;
+	attr.db_umem_offset = sizeof(struct mlx5_cqe) * (1 << log_desc_n);
+	attr.eqn = priv->eqn;
+	attr.log_cq_size = log_desc_n;
+	attr.log_page_size = rte_log2_u32(pgsize);
+	cq->cq = mlx5_devx_cmd_create_cq(priv->ctx, &attr);
+	if (!cq->cq)
+		goto error;
+	cq->db_rec = RTE_PTR_ADD(cq->umem_buf, (uintptr_t)attr.db_umem_offset);
+	cq->cq_ci = 0;
+	rte_spinlock_init(&cq->sl);
+	/* Subscribe CQ event to the event channel controlled by the driver. */
+	ret = mlx5_glue->devx_subscribe_devx_event(priv->eventc, cq->cq->obj,
+						   sizeof(event_nums),
+						   event_nums,
+						   (uint64_t)(uintptr_t)cq);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to subscribe CQE event.");
+		rte_errno = errno;
+		goto error;
+	}
+	/* Subscribe CQ event to the guest FD only if it is not in poll mode. */
+	if (callfd != -1) {
+		ret = mlx5_glue->devx_subscribe_devx_event_fd(priv->eventc,
+							      callfd,
+							      cq->cq->obj, 0);
+		if (ret) {
+			DRV_LOG(ERR, "Failed to subscribe CQE event fd.");
+			rte_errno = errno;
+			goto error;
+		}
+	}
+	/* First arming. */
+	mlx5_vdpa_cq_arm(priv, cq);
+	return 0;
+error:
+	mlx5_vdpa_cq_destroy(cq);
+	return -1;
+}
+
+static inline void __rte_unused
+mlx5_vdpa_cq_poll(struct mlx5_vdpa_priv *priv __rte_unused,
+		  struct mlx5_vdpa_cq *cq)
+{
+	struct mlx5_vdpa_event_qp *eqp =
+				container_of(cq, struct mlx5_vdpa_event_qp, cq);
+	const unsigned int cqe_size = 1 << cq->log_desc_n;
+	const unsigned int cqe_mask = cqe_size - 1;
+	int ret;
+
+	do {
+		volatile struct mlx5_cqe *cqe = cq->cqes + (cq->cq_ci &
+							    cqe_mask);
+
+		ret = check_cqe(cqe, cqe_size, cq->cq_ci);
+		switch (ret) {
+		case MLX5_CQE_STATUS_ERR:
+			cq->errors++;
+			/*fall-through*/
+		case MLX5_CQE_STATUS_SW_OWN:
+			cq->cq_ci++;
+			break;
+		case MLX5_CQE_STATUS_HW_OWN:
+		default:
+			break;
+		}
+	} while (ret != MLX5_CQE_STATUS_HW_OWN);
+	rte_io_wmb();
+	/* Ring CQ doorbell record. */
+	cq->db_rec[0] = rte_cpu_to_be_32(cq->cq_ci);
+	rte_io_wmb();
+	/* Ring SW QP doorbell record. */
+	eqp->db_rec[0] = rte_cpu_to_be_32(cq->cq_ci + cqe_size);
+}
+
+static void
+mlx5_vdpa_interrupt_handler(void *cb_arg)
+{
+#ifndef HAVE_IBV_DEVX_EVENT
+	(void)cb_arg;
+	return;
+#else
+	struct mlx5_vdpa_priv *priv = cb_arg;
+	union {
+		struct mlx5dv_devx_async_event_hdr event_resp;
+		uint8_t buf[sizeof(struct mlx5dv_devx_async_event_hdr) + 128];
+	} out;
+
+	while (mlx5_glue->devx_get_event(priv->eventc, &out.event_resp,
+					 sizeof(out.buf)) >=
+				       (ssize_t)sizeof(out.event_resp.cookie)) {
+		struct mlx5_vdpa_cq *cq = (struct mlx5_vdpa_cq *)
+					       (uintptr_t)out.event_resp.cookie;
+		rte_spinlock_lock(&cq->sl);
+		mlx5_vdpa_cq_poll(priv, cq);
+		mlx5_vdpa_cq_arm(priv, cq);
+		rte_spinlock_unlock(&cq->sl);
+		DRV_LOG(DEBUG, "CQ %p event: new cq_ci = %u.", cq, cq->cq_ci);
+	}
+#endif /* HAVE_IBV_DEVX_EVENT */
+}
+
+int
+mlx5_vdpa_cqe_event_setup(struct mlx5_vdpa_priv *priv)
+{
+	int flags = fcntl(priv->eventc->fd, F_GETFL);
+	int ret = fcntl(priv->eventc->fd, F_SETFL, flags | O_NONBLOCK);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to change event channel FD.");
+		rte_errno = errno;
+		return -rte_errno;
+	}
+	priv->intr_handle.fd = priv->eventc->fd;
+	priv->intr_handle.type = RTE_INTR_HANDLE_EXT;
+	if (rte_intr_callback_register(&priv->intr_handle,
+				       mlx5_vdpa_interrupt_handler, priv)) {
+		priv->intr_handle.fd = 0;
+		DRV_LOG(ERR, "Failed to register CQE interrupt %d.", rte_errno);
+		return -rte_errno;
+	}
+	return 0;
+}
+
+void
+mlx5_vdpa_cqe_event_unset(struct mlx5_vdpa_priv *priv)
+{
+	int retries = MLX5_VDPA_INTR_RETRIES;
+	int ret = -EAGAIN;
+
+	if (priv->intr_handle.fd) {
+		while (retries-- && ret == -EAGAIN) {
+			ret = rte_intr_callback_unregister(&priv->intr_handle,
+						    mlx5_vdpa_interrupt_handler,
+						    priv);
+			if (ret == -EAGAIN) {
+				DRV_LOG(DEBUG, "Try again to unregister fd %d "
+					"of CQ interrupt, retries = %d.",
+					priv->intr_handle.fd, retries);
+				usleep(MLX5_VDPA_INTR_RETRIES_USEC);
+			}
+		}
+		memset(&priv->intr_handle, 0, sizeof(priv->intr_handle));
+	}
+}
+
+void
+mlx5_vdpa_event_qp_destroy(struct mlx5_vdpa_event_qp *eqp)
+{
+	if (eqp->sw_qp)
+		claim_zero(mlx5_devx_cmd_destroy(eqp->sw_qp));
+	if (eqp->umem_obj)
+		claim_zero(mlx5_glue->devx_umem_dereg(eqp->umem_obj));
+	if (eqp->umem_buf)
+		rte_free(eqp->umem_buf);
+	if (eqp->fw_qp)
+		claim_zero(mlx5_devx_cmd_destroy(eqp->fw_qp));
+	mlx5_vdpa_cq_destroy(&eqp->cq);
+	memset(eqp, 0, sizeof(*eqp));
+}
+
+static int
+mlx5_vdpa_qps2rts(struct mlx5_vdpa_event_qp *eqp)
+{
+	if (mlx5_devx_cmd_modify_qp_state(eqp->fw_qp, MLX5_CMD_OP_RST2INIT_QP,
+					  eqp->sw_qp->id)) {
+		DRV_LOG(ERR, "Failed to modify FW QP to INIT state(%u).",
+			rte_errno);
+		return -1;
+	}
+	if (mlx5_devx_cmd_modify_qp_state(eqp->sw_qp, MLX5_CMD_OP_RST2INIT_QP,
+					  eqp->fw_qp->id)) {
+		DRV_LOG(ERR, "Failed to modify SW QP to INIT state(%u).",
+			rte_errno);
+		return -1;
+	}
+	if (mlx5_devx_cmd_modify_qp_state(eqp->fw_qp, MLX5_CMD_OP_INIT2RTR_QP,
+					  eqp->sw_qp->id)) {
+		DRV_LOG(ERR, "Failed to modify FW QP to RTR state(%u).",
+			rte_errno);
+		return -1;
+	}
+	if (mlx5_devx_cmd_modify_qp_state(eqp->sw_qp, MLX5_CMD_OP_INIT2RTR_QP,
+					  eqp->fw_qp->id)) {
+		DRV_LOG(ERR, "Failed to modify SW QP to RTR state(%u).",
+			rte_errno);
+		return -1;
+	}
+	if (mlx5_devx_cmd_modify_qp_state(eqp->fw_qp, MLX5_CMD_OP_RTR2RTS_QP,
+					  eqp->sw_qp->id)) {
+		DRV_LOG(ERR, "Failed to modify FW QP to RTS state(%u).",
+			rte_errno);
+		return -1;
+	}
+	if (mlx5_devx_cmd_modify_qp_state(eqp->sw_qp, MLX5_CMD_OP_RTR2RTS_QP,
+					  eqp->fw_qp->id)) {
+		DRV_LOG(ERR, "Failed to modify SW QP to RTS state(%u).",
+			rte_errno);
+		return -1;
+	}
+	return 0;
+}
+
+int
+mlx5_vdpa_event_qp_create(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
+			  int callfd, struct mlx5_vdpa_event_qp *eqp)
+{
+	struct mlx5_devx_qp_attr attr = {0};
+	uint16_t log_desc_n = rte_log2_u32(desc_n);
+	uint32_t umem_size = (1 << log_desc_n) * MLX5_WSEG_SIZE +
+						       sizeof(*eqp->db_rec) * 2;
+
+	if (mlx5_vdpa_event_qp_global_prepare(priv))
+		return -1;
+	if (mlx5_vdpa_cq_create(priv, log_desc_n, callfd, &eqp->cq))
+		return -1;
+	attr.pd = priv->pdn;
+	eqp->fw_qp = mlx5_devx_cmd_create_qp(priv->ctx, &attr);
+	if (!eqp->fw_qp) {
+		DRV_LOG(ERR, "Failed to create FW QP(%u).", rte_errno);
+		goto error;
+	}
+	eqp->umem_buf = rte_zmalloc(__func__, umem_size, 4096);
+	if (!eqp->umem_buf) {
+		DRV_LOG(ERR, "Failed to allocate memory for SW QP.");
+		rte_errno = ENOMEM;
+		goto error;
+	}
+	eqp->umem_obj = mlx5_glue->devx_umem_reg(priv->ctx,
+					       (void *)(uintptr_t)eqp->umem_buf,
+					       umem_size,
+					       IBV_ACCESS_LOCAL_WRITE);
+	if (!eqp->umem_obj) {
+		DRV_LOG(ERR, "Failed to register umem for SW QP.");
+		goto error;
+	}
+	attr.uar_index = priv->uar->page_id;
+	attr.cqn = eqp->cq.cq->id;
+	attr.log_page_size = rte_log2_u32(sysconf(_SC_PAGESIZE));
+	attr.rq_size = 1 << log_desc_n;
+	attr.log_rq_stride = rte_log2_u32(MLX5_WSEG_SIZE);
+	attr.sq_size = 0; /* No SQ is needed. */
+	attr.dbr_umem_valid = 1;
+	attr.wq_umem_id = eqp->umem_obj->umem_id;
+	attr.wq_umem_offset = 0;
+	attr.dbr_umem_id = eqp->umem_obj->umem_id;
+	attr.dbr_address = (1 << log_desc_n) * MLX5_WSEG_SIZE;
+	eqp->sw_qp = mlx5_devx_cmd_create_qp(priv->ctx, &attr);
+	if (!eqp->sw_qp) {
+		DRV_LOG(ERR, "Failed to create SW QP(%u).", rte_errno);
+		goto error;
+	}
+	eqp->db_rec = RTE_PTR_ADD(eqp->umem_buf, (uintptr_t)attr.dbr_address);
+	if (mlx5_vdpa_qps2rts(eqp))
+		goto error;
+	/* First ringing. */
+	rte_write32(rte_cpu_to_be_32(1 << log_desc_n), &eqp->db_rec[0]);
+	return 0;
+error:
+	mlx5_vdpa_event_qp_destroy(eqp);
+	return -1;
+}
-- 
1.8.3.1
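
As an illustration of what mlx5_vdpa_cq_arm() writes to the UAR, the following standalone sketch composes the 64-bit arm doorbell value in host byte order. The constant values are assumptions made for this example (the real MLX5_CQ_SQN_OFFSET and MLX5_CI_MASK come from mlx5_prm.h and are not shown in this patch), the sketch_* names are hypothetical, and the big-endian conversion and memory barriers of the real function are left out.

```c
#include <stdint.h>

/* Assumed values for illustration; the real macros are in mlx5_prm.h. */
#define SKETCH_CQ_SQN_OFFSET 28
#define SKETCH_CI_MASK 0xffffffu
#define SKETCH_CQ_DBR_CMD_ALL (0u << 24)

/* Build the host-order 64-bit arm doorbell; byte swapping is left out. */
static uint64_t
sketch_cq_arm_doorbell(uint32_t arm_sn, uint32_t cq_ci, uint16_t log_desc_n,
		       uint32_t cq_id)
{
	const uint32_t cqe_mask = (1u << log_desc_n) - 1;
	/* arm_sn is a 2-bit counter in struct mlx5_vdpa_cq. */
	uint32_t hi = ((arm_sn & 0x3u) << SKETCH_CQ_SQN_OFFSET) |
		      SKETCH_CQ_DBR_CMD_ALL |
		      (cq_ci & SKETCH_CI_MASK & cqe_mask);

	/* High word: arm command and consumer index; low word: CQ id. */
	return ((uint64_t)hi << 32) | cq_id;
}
```

Under these assumed constants, arming CQ 0x42 with arm_sn = 1, cq_ci = 0x13 and a 16-entry CQ gives 0x1000000300000042. The driver stores the high word in the arm doorbell record first and only then writes the full value to the UAR.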
^ permalink raw reply	[flat|nested] 174+ messages in thread
* [dpdk-dev] [PATCH v2 06/13] vdpa/mlx5: prepare virtio queues
  2020-01-29 10:08 ` [dpdk-dev] [PATCH v2 00/13] Introduce mlx5 vDPA driver Matan Azrad
                     ` (4 preceding siblings ...)
  2020-01-29 10:09   ` [dpdk-dev] [PATCH v2 05/13] vdpa/mlx5: prepare HW queues Matan Azrad
@ 2020-01-29 10:09   ` Matan Azrad
  2020-01-30 20:00     ` Maxime Coquelin
  2020-01-29 10:09   ` [dpdk-dev] [PATCH v2 07/13] vdpa/mlx5: support stateless offloads Matan Azrad
                     ` (8 subsequent siblings)
  14 siblings, 1 reply; 174+ messages in thread
From: Matan Azrad @ 2020-01-29 10:09 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Maxime Coquelin
The HW virtq object represents an emulated context for a VIRTIO_NET
virtqueue which was created and managed by a VIRTIO_NET driver as
defined in VIRTIO Specification.
Add support to prepare and release all the basic HW resources needed
for the user virtqs emulation according to the rte_vhost configurations.
This patch prepares the basic configurations needed by DevX commands to
create a virtq.
Add new file mlx5_vdpa_virtq.c to manage virtq operations.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/vdpa/mlx5/Makefile          |   1 +
 drivers/vdpa/mlx5/meson.build       |   1 +
 drivers/vdpa/mlx5/mlx5_vdpa.c       |   1 +
 drivers/vdpa/mlx5/mlx5_vdpa.h       |  36 ++++++
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 212 ++++++++++++++++++++++++++++++++++++
 5 files changed, 251 insertions(+)
 create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
diff --git a/drivers/vdpa/mlx5/Makefile b/drivers/vdpa/mlx5/Makefile
index 7f13756..353e262 100644
--- a/drivers/vdpa/mlx5/Makefile
+++ b/drivers/vdpa/mlx5/Makefile
@@ -10,6 +10,7 @@ LIB = librte_pmd_mlx5_vdpa.a
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa_mem.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa_event.c
+SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa_virtq.c
 
 # Basic CFLAGS.
 CFLAGS += -O3
diff --git a/drivers/vdpa/mlx5/meson.build b/drivers/vdpa/mlx5/meson.build
index c609f7c..e017f95 100644
--- a/drivers/vdpa/mlx5/meson.build
+++ b/drivers/vdpa/mlx5/meson.build
@@ -14,6 +14,7 @@ sources = files(
 	'mlx5_vdpa.c',
 	'mlx5_vdpa_mem.c',
 	'mlx5_vdpa_event.c',
+	'mlx5_vdpa_virtq.c',
 )
 cflags_options = [
 	'-std=c11',
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index c67f93d..4d30b35 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -229,6 +229,7 @@
 		goto error;
 	}
 	SLIST_INIT(&priv->mr_list);
+	SLIST_INIT(&priv->virtq_list);
 	pthread_mutex_lock(&priv_list_lock);
 	TAILQ_INSERT_TAIL(&priv_list, priv, next);
 	pthread_mutex_unlock(&priv_list_lock);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 30030b7..a7e2185 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -53,6 +53,19 @@ struct mlx5_vdpa_query_mr {
 	int is_indirect;
 };
 
+struct mlx5_vdpa_virtq {
+	SLIST_ENTRY(mlx5_vdpa_virtq) next;
+	uint16_t index;
+	uint16_t vq_size;
+	struct mlx5_devx_obj *virtq;
+	struct mlx5_vdpa_event_qp eqp;
+	struct {
+		struct mlx5dv_devx_umem *obj;
+		void *buf;
+		uint32_t size;
+	} umems[3];
+};
+
 struct mlx5_vdpa_priv {
 	TAILQ_ENTRY(mlx5_vdpa_priv) next;
 	int id; /* vDPA device id. */
@@ -69,6 +82,10 @@ struct mlx5_vdpa_priv {
 	struct mlx5dv_devx_event_channel *eventc;
 	struct mlx5dv_devx_uar *uar;
 	struct rte_intr_handle intr_handle;
+	struct mlx5_devx_obj *td;
+	struct mlx5_devx_obj *tis;
+	uint16_t nr_virtqs;
+	SLIST_HEAD(virtq_list, mlx5_vdpa_virtq) virtq_list;
 	SLIST_HEAD(mr_list, mlx5_vdpa_query_mr) mr_list;
 };
 
@@ -146,4 +163,23 @@ int mlx5_vdpa_event_qp_create(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
  */
 void mlx5_vdpa_cqe_event_unset(struct mlx5_vdpa_priv *priv);
 
+/**
+ * Release all the virtqs and all their related resources.
+ *
+ * @param[in] priv
+ *   The vdpa driver private structure.
+ */
+void mlx5_vdpa_virtqs_release(struct mlx5_vdpa_priv *priv);
+
+/**
+ * Create all the HW virtqs resources and all their related resources.
+ *
+ * @param[in] priv
+ *   The vdpa driver private structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv);
+
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
new file mode 100644
index 0000000..781bccf
--- /dev/null
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -0,0 +1,212 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2019 Mellanox Technologies, Ltd
+ */
+#include <string.h>
+
+#include <rte_malloc.h>
+#include <rte_errno.h>
+
+#include <mlx5_common.h>
+
+#include "mlx5_vdpa_utils.h"
+#include "mlx5_vdpa.h"
+
+
+static int
+mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
+{
+	int i;
+
+	if (virtq->virtq) {
+		claim_zero(mlx5_devx_cmd_destroy(virtq->virtq));
+		virtq->virtq = NULL;
+	}
+	for (i = 0; i < 3; ++i) {
+		if (virtq->umems[i].obj)
+			claim_zero(mlx5_glue->devx_umem_dereg
+							 (virtq->umems[i].obj));
+		if (virtq->umems[i].buf)
+			rte_free(virtq->umems[i].buf);
+	}
+	memset(&virtq->umems, 0, sizeof(virtq->umems));
+	if (virtq->eqp.fw_qp)
+		mlx5_vdpa_event_qp_destroy(&virtq->eqp);
+	return 0;
+}
+
+void
+mlx5_vdpa_virtqs_release(struct mlx5_vdpa_priv *priv)
+{
+	struct mlx5_vdpa_virtq *entry;
+	struct mlx5_vdpa_virtq *next;
+
+	entry = SLIST_FIRST(&priv->virtq_list);
+	while (entry) {
+		next = SLIST_NEXT(entry, next);
+		mlx5_vdpa_virtq_unset(entry);
+		SLIST_REMOVE(&priv->virtq_list, entry, mlx5_vdpa_virtq, next);
+		rte_free(entry);
+		entry = next;
+	}
+	SLIST_INIT(&priv->virtq_list);
+	if (priv->tis) {
+		claim_zero(mlx5_devx_cmd_destroy(priv->tis));
+		priv->tis = NULL;
+	}
+	if (priv->td) {
+		claim_zero(mlx5_devx_cmd_destroy(priv->td));
+		priv->td = NULL;
+	}
+}
+
+static uint64_t
+mlx5_vdpa_hva_to_gpa(struct rte_vhost_memory *mem, uint64_t hva)
+{
+	struct rte_vhost_mem_region *reg;
+	uint32_t i;
+	uint64_t gpa = 0;
+
+	for (i = 0; i < mem->nregions; i++) {
+		reg = &mem->regions[i];
+		if (hva >= reg->host_user_addr &&
+		    hva < reg->host_user_addr + reg->size) {
+			gpa = hva - reg->host_user_addr + reg->guest_phys_addr;
+			break;
+		}
+	}
+	return gpa;
+}
+
+static int
+mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv,
+		      struct mlx5_vdpa_virtq *virtq, int index)
+{
+	struct rte_vhost_vring vq;
+	struct mlx5_devx_virtq_attr attr = {0};
+	uint64_t gpa;
+	int ret;
+	int i;
+	uint16_t last_avail_idx;
+	uint16_t last_used_idx;
+
+	ret = rte_vhost_get_vhost_vring(priv->vid, index, &vq);
+	if (ret)
+		return -1;
+	virtq->index = index;
+	virtq->vq_size = vq.size;
+	/*
+	 * Event QPs are not needed when the guest is in poll mode and the
+	 * device capability allows the no-MSIX event mode.
+	 */
+	attr.event_mode = vq.callfd != -1 || !(priv->caps.event_mode & (1 <<
+					       MLX5_VIRTQ_EVENT_MODE_NO_MSIX)) ?
+						      MLX5_VIRTQ_EVENT_MODE_QP :
+						  MLX5_VIRTQ_EVENT_MODE_NO_MSIX;
+	if (attr.event_mode == MLX5_VIRTQ_EVENT_MODE_QP) {
+		ret = mlx5_vdpa_event_qp_create(priv, vq.size, vq.callfd,
+						&virtq->eqp);
+		if (ret) {
+			DRV_LOG(ERR, "Failed to create event QPs for virtq %d.",
+				index);
+			return -1;
+		}
+		attr.qp_id = virtq->eqp.fw_qp->id;
+	} else {
+		DRV_LOG(INFO, "Virtq %d works in poll mode, no need for event"
+			" QPs and event mechanism.", index);
+	}
+	/* Setup 3 UMEMs for each virtq. */
+	for (i = 0; i < 3; ++i) {
+		virtq->umems[i].size = priv->caps.umems[i].a * vq.size +
+							  priv->caps.umems[i].b;
+		virtq->umems[i].buf = rte_zmalloc(__func__,
+						  virtq->umems[i].size, 4096);
+		if (!virtq->umems[i].buf) {
+			DRV_LOG(ERR, "Cannot allocate umem %d memory for virtq"
+				" %u.", i, index);
+			goto error;
+		}
+		virtq->umems[i].obj = mlx5_glue->devx_umem_reg(priv->ctx,
+							virtq->umems[i].buf,
+							virtq->umems[i].size,
+							IBV_ACCESS_LOCAL_WRITE);
+		if (!virtq->umems[i].obj) {
+			DRV_LOG(ERR, "Failed to register umem %d for virtq %u.",
+				i, index);
+			goto error;
+		}
+		attr.umems[i].id = virtq->umems[i].obj->umem_id;
+		attr.umems[i].offset = 0;
+		attr.umems[i].size = virtq->umems[i].size;
+	}
+	gpa = mlx5_vdpa_hva_to_gpa(priv->vmem, (uint64_t)(uintptr_t)vq.desc);
+	if (!gpa) {
+		DRV_LOG(ERR, "Failed to get GPA for descriptor ring.");
+		goto error;
+	}
+	attr.desc_addr = gpa;
+	gpa = mlx5_vdpa_hva_to_gpa(priv->vmem, (uint64_t)(uintptr_t)vq.used);
+	if (!gpa) {
+		DRV_LOG(ERR, "Failed to get GPA for used ring.");
+		goto error;
+	}
+	attr.used_addr = gpa;
+	gpa = mlx5_vdpa_hva_to_gpa(priv->vmem, (uint64_t)(uintptr_t)vq.avail);
+	if (!gpa) {
+		DRV_LOG(ERR, "Failed to get GPA for available ring.");
+		goto error;
+	}
+	attr.available_addr = gpa;
+	rte_vhost_get_vring_base(priv->vid, index, &last_avail_idx,
+				 &last_used_idx);
+	DRV_LOG(INFO, "vid %d: Init last_avail_idx=%d, last_used_idx=%d for "
+		"virtq %d.", priv->vid, last_avail_idx, last_used_idx, index);
+	attr.hw_available_index = last_avail_idx;
+	attr.hw_used_index = last_used_idx;
+	attr.q_size = vq.size;
+	attr.mkey = priv->gpa_mkey_index;
+	attr.tis_id = priv->tis->id;
+	attr.queue_index = index;
+	virtq->virtq = mlx5_devx_cmd_create_virtq(priv->ctx, &attr);
+	if (!virtq->virtq)
+		goto error;
+	return 0;
+error:
+	mlx5_vdpa_virtq_unset(virtq);
+	return -1;
+}
+
+int
+mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
+{
+	struct mlx5_devx_tis_attr tis_attr = {0};
+	struct mlx5_vdpa_virtq *virtq;
+	uint32_t i;
+	uint16_t nr_vring = rte_vhost_get_vring_num(priv->vid);
+
+	priv->td = mlx5_devx_cmd_create_td(priv->ctx);
+	if (!priv->td) {
+		DRV_LOG(ERR, "Failed to create transport domain.");
+		return -rte_errno;
+	}
+	tis_attr.transport_domain = priv->td->id;
+	priv->tis = mlx5_devx_cmd_create_tis(priv->ctx, &tis_attr);
+	if (!priv->tis) {
+		DRV_LOG(ERR, "Failed to create TIS.");
+		goto error;
+	}
+	for (i = 0; i < nr_vring; i++) {
+		virtq = rte_zmalloc(__func__, sizeof(*virtq), 0);
+		if (!virtq || mlx5_vdpa_virtq_setup(priv, virtq, i)) {
+			if (virtq)
+				rte_free(virtq);
+			goto error;
+		}
+		SLIST_INSERT_HEAD(&priv->virtq_list, virtq, next);
+	}
+	priv->nr_virtqs = nr_vring;
+	return 0;
+error:
+	mlx5_vdpa_virtqs_release(priv);
+	return -1;
+}
-- 
1.8.3.1
* [dpdk-dev] [PATCH v2 07/13] vdpa/mlx5: support stateless offloads
  2020-01-29 10:08 ` [dpdk-dev] [PATCH v2 00/13] Introduce mlx5 vDPA driver Matan Azrad
                     ` (5 preceding siblings ...)
  2020-01-29 10:09   ` [dpdk-dev] [PATCH v2 06/13] vdpa/mlx5: prepare virtio queues Matan Azrad
@ 2020-01-29 10:09   ` Matan Azrad
  2020-01-30 20:08     ` Maxime Coquelin
  2020-01-29 10:09   ` [dpdk-dev] [PATCH v2 08/13] vdpa/mlx5: add basic steering configurations Matan Azrad
                     ` (7 subsequent siblings)
  14 siblings, 1 reply; 174+ messages in thread
From: Matan Azrad @ 2020-01-29 10:09 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Maxime Coquelin
Add support for the following features in the virtq configuration:
	VIRTIO_F_RING_PACKED,
	VIRTIO_NET_F_HOST_TSO4,
	VIRTIO_NET_F_HOST_TSO6,
	VIRTIO_NET_F_CSUM,
	VIRTIO_NET_F_GUEST_CSUM,
	VIRTIO_F_VERSION_1.
Support for these features depends on the DevX capabilities reported by
the device.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
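Note: the capability check described above can be sketched in a
table-driven form. This is only an illustration, not the code in this
patch: struct ex_caps, ex_features_validate() and the "missing" output
parameter are hypothetical names, and the packed-ring capability is
flattened to a boolean here, whereas the driver tests a bit inside
caps.virtio_queue_type. The bit positions are the ones defined by the
virtio specification.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Feature bit positions as defined by the virtio specification. */
#ifndef VIRTIO_NET_F_CSUM
#define VIRTIO_NET_F_CSUM 0
#endif
#ifndef VIRTIO_NET_F_GUEST_CSUM
#define VIRTIO_NET_F_GUEST_CSUM 1
#endif
#ifndef VIRTIO_NET_F_HOST_TSO4
#define VIRTIO_NET_F_HOST_TSO4 11
#endif
#ifndef VIRTIO_NET_F_HOST_TSO6
#define VIRTIO_NET_F_HOST_TSO6 12
#endif
#ifndef VIRTIO_F_VERSION_1
#define VIRTIO_F_VERSION_1 32
#endif
#ifndef VIRTIO_F_RING_PACKED
#define VIRTIO_F_RING_PACKED 34
#endif

/* Hypothetical, flattened stand-in for the DevX vDPA capabilities. */
struct ex_caps {
	uint8_t packed;
	uint8_t tso_ipv4;
	uint8_t tso_ipv6;
	uint8_t tx_csum;
	uint8_t rx_csum;
	uint8_t version_1;
};

/*
 * Return 0 when every negotiated feature is backed by a capability,
 * -ENOTSUP otherwise. On failure, *missing (if not NULL) names the
 * first unsupported feature.
 */
static int
ex_features_validate(uint64_t features, const struct ex_caps *caps,
		     const char **missing)
{
	static const struct {
		unsigned int bit;   /* Negotiated feature bit. */
		size_t cap_off;     /* Offset of the matching capability. */
		const char *name;   /* Name used for reporting. */
	} reqs[] = {
		{ VIRTIO_F_RING_PACKED, offsetof(struct ex_caps, packed),
		  "PACKED" },
		{ VIRTIO_NET_F_HOST_TSO4, offsetof(struct ex_caps, tso_ipv4),
		  "TSO4" },
		{ VIRTIO_NET_F_HOST_TSO6, offsetof(struct ex_caps, tso_ipv6),
		  "TSO6" },
		{ VIRTIO_NET_F_CSUM, offsetof(struct ex_caps, tx_csum),
		  "CSUM" },
		{ VIRTIO_NET_F_GUEST_CSUM, offsetof(struct ex_caps, rx_csum),
		  "GUEST CSUM" },
		{ VIRTIO_F_VERSION_1, offsetof(struct ex_caps, version_1),
		  "version 1" },
	};
	size_t i;

	for (i = 0; i < sizeof(reqs) / sizeof(reqs[0]); ++i) {
		const uint8_t *cap = (const uint8_t *)caps + reqs[i].cap_off;

		if ((features & (1ULL << reqs[i].bit)) && !*cap) {
			if (missing)
				*missing = reqs[i].name;
			return -ENOTSUP;
		}
	}
	return 0;
}
```

A caller would pass the negotiated feature mask together with the
queried capabilities and log the returned name on failure.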
 doc/guides/vdpadevs/features/mlx5.ini |   7 ++-
 drivers/vdpa/mlx5/mlx5_vdpa.c         |  10 ----
 drivers/vdpa/mlx5/mlx5_vdpa.h         |  10 ++++
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   | 108 ++++++++++++++++++++++++++++------
 4 files changed, 107 insertions(+), 28 deletions(-)
diff --git a/doc/guides/vdpadevs/features/mlx5.ini b/doc/guides/vdpadevs/features/mlx5.ini
index fea491d..e4ee34b 100644
--- a/doc/guides/vdpadevs/features/mlx5.ini
+++ b/doc/guides/vdpadevs/features/mlx5.ini
@@ -4,10 +4,15 @@
 ; Refer to default.ini for the full list of available driver features.
 ;
 [Features]
-
+csum                 = Y
+guest csum           = Y
+host tso4            = Y
+host tso6            = Y
+version 1            = Y
 any layout           = Y
 guest announce       = Y
 mq                   = Y
+packed               = Y
 proto mq             = Y
 proto log shmfd      = Y
 proto host notifier  = Y
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index 4d30b35..dfbd0af 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -1,8 +1,6 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright 2019 Mellanox Technologies, Ltd
  */
-#include <linux/virtio_net.h>
-
 #include <rte_malloc.h>
 #include <rte_log.h>
 #include <rte_errno.h>
@@ -17,14 +15,6 @@
 #include "mlx5_vdpa.h"
 
 
-#ifndef VIRTIO_F_ORDER_PLATFORM
-#define VIRTIO_F_ORDER_PLATFORM 36
-#endif
-
-#ifndef VIRTIO_F_RING_PACKED
-#define VIRTIO_F_RING_PACKED 34
-#endif
-
 #define MLX5_VDPA_DEFAULT_FEATURES ((1ULL << VHOST_USER_F_PROTOCOL_FEATURES) | \
 			    (1ULL << VIRTIO_F_ANY_LAYOUT) | \
 			    (1ULL << VIRTIO_NET_F_MQ) | \
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index a7e2185..e530058 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -5,6 +5,7 @@
 #ifndef RTE_PMD_MLX5_VDPA_H_
 #define RTE_PMD_MLX5_VDPA_H_
 
+#include <linux/virtio_net.h>
 #include <sys/queue.h>
 
 #include <rte_vdpa.h>
@@ -20,6 +21,14 @@
 #define MLX5_VDPA_INTR_RETRIES 256
 #define MLX5_VDPA_INTR_RETRIES_USEC 1000
 
+#ifndef VIRTIO_F_ORDER_PLATFORM
+#define VIRTIO_F_ORDER_PLATFORM 36
+#endif
+
+#ifndef VIRTIO_F_RING_PACKED
+#define VIRTIO_F_RING_PACKED 34
+#endif
+
 struct mlx5_vdpa_cq {
 	uint16_t log_desc_n;
 	uint32_t cq_ci:24;
@@ -85,6 +94,7 @@ struct mlx5_vdpa_priv {
 	struct mlx5_devx_obj *td;
 	struct mlx5_devx_obj *tis;
 	uint16_t nr_virtqs;
+	uint64_t features; /* Negotiated features. */
 	SLIST_HEAD(virtq_list, mlx5_vdpa_virtq) virtq_list;
 	SLIST_HEAD(mr_list, mlx5_vdpa_query_mr) mr_list;
 };
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 781bccf..e27af28 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -57,6 +57,7 @@
 		claim_zero(mlx5_devx_cmd_destroy(priv->td));
 		priv->td = NULL;
 	}
+	priv->features = 0;
 }
 
 static uint64_t
@@ -94,6 +95,14 @@
 		return -1;
 	virtq->index = index;
 	virtq->vq_size = vq.size;
+	attr.tso_ipv4 = !!(priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO4));
+	attr.tso_ipv6 = !!(priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO6));
+	attr.tx_csum = !!(priv->features & (1ULL << VIRTIO_NET_F_CSUM));
+	attr.rx_csum = !!(priv->features & (1ULL << VIRTIO_NET_F_GUEST_CSUM));
+	attr.virtio_version_1_0 = !!(priv->features & (1ULL <<
+							VIRTIO_F_VERSION_1));
+	attr.type = (priv->features & (1ULL << VIRTIO_F_RING_PACKED)) ?
+			MLX5_VIRTQ_TYPE_PACKED : MLX5_VIRTQ_TYPE_SPLIT;
 	/*
 	 * No need event QPs creation when the guest in poll mode or when the
 	 * capability allows it.
@@ -139,24 +148,29 @@
 		attr.umems[i].offset = 0;
 		attr.umems[i].size = virtq->umems[i].size;
 	}
-	gpa = mlx5_vdpa_hva_to_gpa(priv->vmem, (uint64_t)(uintptr_t)vq.desc);
-	if (!gpa) {
-		DRV_LOG(ERR, "Fail to get GPA for descriptor ring.");
-		goto error;
-	}
-	attr.desc_addr = gpa;
-	gpa = mlx5_vdpa_hva_to_gpa(priv->vmem, (uint64_t)(uintptr_t)vq.used);
-	if (!gpa) {
-		DRV_LOG(ERR, "Fail to get GPA for used ring.");
-		goto error;
-	}
-	attr.used_addr = gpa;
-	gpa = mlx5_vdpa_hva_to_gpa(priv->vmem, (uint64_t)(uintptr_t)vq.avail);
-	if (!gpa) {
-		DRV_LOG(ERR, "Fail to get GPA for available ring.");
-		goto error;
+	if (attr.type == MLX5_VIRTQ_TYPE_SPLIT) {
+		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem,
+					   (uint64_t)(uintptr_t)vq.desc);
+		if (!gpa) {
+			DRV_LOG(ERR, "Failed to get descriptor ring GPA.");
+			goto error;
+		}
+		attr.desc_addr = gpa;
+		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem,
+					   (uint64_t)(uintptr_t)vq.used);
+		if (!gpa) {
+			DRV_LOG(ERR, "Failed to get GPA for used ring.");
+			goto error;
+		}
+		attr.used_addr = gpa;
+		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem,
+					   (uint64_t)(uintptr_t)vq.avail);
+		if (!gpa) {
+			DRV_LOG(ERR, "Failed to get GPA for available ring.");
+			goto error;
+		}
+		attr.available_addr = gpa;
 	}
-	attr.available_addr = gpa;
 	rte_vhost_get_vring_base(priv->vid, index, &last_avail_idx,
 				 &last_used_idx);
 	DRV_LOG(INFO, "vid %d: Init last_avail_idx=%d, last_used_idx=%d for "
@@ -176,6 +190,61 @@
 	return -1;
 }
 
+static int
+mlx5_vdpa_features_validate(struct mlx5_vdpa_priv *priv)
+{
+	if (priv->features & (1ULL << VIRTIO_F_RING_PACKED)) {
+		if (!(priv->caps.virtio_queue_type & (1 <<
+						     MLX5_VIRTQ_TYPE_PACKED))) {
+			DRV_LOG(ERR, "Failed to configure PACKED mode for vdev "
+				"%d - it was not reported by HW/driver"
+				" capability.", priv->vid);
+			return -ENOTSUP;
+		}
+	}
+	if (priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO4)) {
+		if (!priv->caps.tso_ipv4) {
+			DRV_LOG(ERR, "Failed to enable TSO4 for vdev %d - TSO4"
+				" was not reported by HW/driver capability.",
+				priv->vid);
+			return -ENOTSUP;
+		}
+	}
+	if (priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO6)) {
+		if (!priv->caps.tso_ipv6) {
+			DRV_LOG(ERR, "Failed to enable TSO6 for vdev %d - TSO6"
+				" was not reported by HW/driver capability.",
+				priv->vid);
+			return -ENOTSUP;
+		}
+	}
+	if (priv->features & (1ULL << VIRTIO_NET_F_CSUM)) {
+		if (!priv->caps.tx_csum) {
+			DRV_LOG(ERR, "Failed to enable CSUM for vdev %d - CSUM"
+				" was not reported by HW/driver capability.",
+				priv->vid);
+			return -ENOTSUP;
+		}
+	}
+	if (priv->features & (1ULL << VIRTIO_NET_F_GUEST_CSUM)) {
+		if (!priv->caps.rx_csum) {
+			DRV_LOG(ERR, "Failed to enable GUEST CSUM for vdev %d"
+				" - GUEST CSUM was not reported by HW/driver "
+				"capability.", priv->vid);
+			return -ENOTSUP;
+		}
+	}
+	if (priv->features & (1ULL << VIRTIO_F_VERSION_1)) {
+		if (!priv->caps.virtio_version_1_0) {
+			DRV_LOG(ERR, "Failed to enable version 1 for vdev %d "
+				"- version 1 was not reported by HW/driver"
+				" capability.", priv->vid);
+			return -ENOTSUP;
+		}
+	}
+	return 0;
+}
+
 int
 mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
 {
@@ -183,7 +252,12 @@
 	struct mlx5_vdpa_virtq *virtq;
 	uint32_t i;
 	uint16_t nr_vring = rte_vhost_get_vring_num(priv->vid);
+	int ret = rte_vhost_get_negotiated_features(priv->vid, &priv->features);
 
+	if (ret || mlx5_vdpa_features_validate(priv)) {
+		DRV_LOG(ERR, "Failed to configure negotiated features.");
+		return -1;
+	}
 	priv->td = mlx5_devx_cmd_create_td(priv->ctx);
 	if (!priv->td) {
 		DRV_LOG(ERR, "Failed to create transport domain.");
-- 
1.8.3.1
* [dpdk-dev] [PATCH v2 08/13] vdpa/mlx5: add basic steering configurations
  2020-01-29 10:08 ` [dpdk-dev] [PATCH v2 00/13] Introduce mlx5 vDPA driver Matan Azrad
                     ` (6 preceding siblings ...)
  2020-01-29 10:09   ` [dpdk-dev] [PATCH v2 07/13] vdpa/mlx5: support stateless offloads Matan Azrad
@ 2020-01-29 10:09   ` Matan Azrad
  2020-01-31 15:10     ` Maxime Coquelin
  2020-01-29 10:09   ` [dpdk-dev] [PATCH v2 09/13] vdpa/mlx5: support queue state operation Matan Azrad
                     ` (6 subsequent siblings)
  14 siblings, 1 reply; 174+ messages in thread
From: Matan Azrad @ 2020-01-29 10:09 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Maxime Coquelin
Add a steering object, managed by the new file mlx5_vdpa_steer.c.
Add a promiscuous flow that scatters the device Rx packets to the virtio
queues using an RSS action.
To get correct RSS on L3 and L4, split the flow into 7 flows, as
required by the device.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
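Note: the way the RQT is populated can be sketched as below. This is a
simplified illustration under stated assumptions, not the driver code:
ex_is_recvq() and ex_rqt_fill() are hypothetical names, the queue ids
come from a plain array indexed by virtq index instead of the driver
list, and the guards against an empty or overfull table are additions
of this sketch. The receive queue ids are written first and then
repeated cyclically until the table is full, so that the RSS hash
spreads evenly over the receive queues.

```c
#include <stdint.h>

/*
 * Per the VIRTIO_NET layout, even indexes are receive queues, except
 * the last index, which is the control queue when the ring count is odd.
 */
static int
ex_is_recvq(uint32_t virtq_index, uint32_t nr_vring)
{
	return virtq_index % 2 == 0 && virtq_index != nr_vring - 1;
}

/*
 * Fill rq_list[0..rqt_n) from virtq_ids[0..nr_vring).
 * Return the number of distinct receive queues placed in the table,
 * or 0 when there is no receive queue (the table is left untouched).
 */
static uint32_t
ex_rqt_fill(uint32_t *rq_list, uint32_t rqt_n, const uint32_t *virtq_ids,
	    uint32_t nr_vring)
{
	uint32_t i = 0, j, idx, n;

	/* First pass: one entry per receive queue. */
	for (idx = 0; idx < nr_vring && i < rqt_n; ++idx)
		if (ex_is_recvq(idx, nr_vring))
			rq_list[i++] = virtq_ids[idx];
	if (i == 0)
		return 0;
	n = i;
	/* Second pass: repeat the entries cyclically up to rqt_n. */
	for (j = 0; i < rqt_n; ++i, ++j)
		rq_list[i] = rq_list[j];
	return n;
}
```

With two queue pairs plus a control queue (5 rings) and a table of 8
entries, the table alternates between the ids of rings 0 and 2.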
 drivers/vdpa/mlx5/Makefile          |   2 +
 drivers/vdpa/mlx5/meson.build       |   1 +
 drivers/vdpa/mlx5/mlx5_vdpa.c       |   1 +
 drivers/vdpa/mlx5/mlx5_vdpa.h       |  34 +++++
 drivers/vdpa/mlx5/mlx5_vdpa_steer.c | 265 ++++++++++++++++++++++++++++++++++++
 5 files changed, 303 insertions(+)
 create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_steer.c
diff --git a/drivers/vdpa/mlx5/Makefile b/drivers/vdpa/mlx5/Makefile
index 353e262..2f70a98 100644
--- a/drivers/vdpa/mlx5/Makefile
+++ b/drivers/vdpa/mlx5/Makefile
@@ -11,6 +11,8 @@ SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa_mem.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa_event.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa_virtq.c
+SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa_steer.c
+
 
 # Basic CFLAGS.
 CFLAGS += -O3
diff --git a/drivers/vdpa/mlx5/meson.build b/drivers/vdpa/mlx5/meson.build
index e017f95..2849178 100644
--- a/drivers/vdpa/mlx5/meson.build
+++ b/drivers/vdpa/mlx5/meson.build
@@ -15,6 +15,7 @@ sources = files(
 	'mlx5_vdpa_mem.c',
 	'mlx5_vdpa_event.c',
 	'mlx5_vdpa_virtq.c',
+	'mlx5_vdpa_steer.c',
 )
 cflags_options = [
 	'-std=c11',
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index dfbd0af..12cfee2 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -208,6 +208,7 @@
 			goto error;
 		}
 		priv->caps = attr.vdpa;
+		priv->log_max_rqt_size = attr.log_max_rqt_size;
 	}
 	priv->ctx = ctx;
 	priv->dev_addr.pci_addr = pci_dev->addr;
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index e530058..2b0b285 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -75,6 +75,18 @@ struct mlx5_vdpa_virtq {
 	} umems[3];
 };
 
+struct mlx5_vdpa_steer {
+	struct mlx5_devx_obj *rqt;
+	void *domain;
+	void *tbl;
+	struct {
+		struct mlx5dv_flow_matcher *matcher;
+		struct mlx5_devx_obj *tir;
+		void *tir_action;
+		void *flow;
+	} rss[7];
+};
+
 struct mlx5_vdpa_priv {
 	TAILQ_ENTRY(mlx5_vdpa_priv) next;
 	int id; /* vDPA device id. */
@@ -95,7 +107,9 @@ struct mlx5_vdpa_priv {
 	struct mlx5_devx_obj *tis;
 	uint16_t nr_virtqs;
 	uint64_t features; /* Negotiated features. */
+	uint16_t log_max_rqt_size;
 	SLIST_HEAD(virtq_list, mlx5_vdpa_virtq) virtq_list;
+	struct mlx5_vdpa_steer steer;
 	SLIST_HEAD(mr_list, mlx5_vdpa_query_mr) mr_list;
 };
 
@@ -192,4 +206,24 @@ int mlx5_vdpa_event_qp_create(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
  */
 int mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv);
 
+/**
+ * Unset steering and release all its related resources- stop traffic.
+ *
+ * @param[in] priv
+ *   The vdpa driver private structure.
+ */
+int mlx5_vdpa_steer_unset(struct mlx5_vdpa_priv *priv);
+
+/**
+ * Set up steering and all its related resources to enable RSS traffic from
+ * the device to all the Rx host queues.
+ *
+ * @param[in] priv
+ *   The vdpa driver private structure.
+ *
+ * @return
+ *   0 on success, a negative value otherwise.
+ */
+int mlx5_vdpa_steer_setup(struct mlx5_vdpa_priv *priv);
+
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_steer.c b/drivers/vdpa/mlx5/mlx5_vdpa_steer.c
new file mode 100644
index 0000000..f365c10
--- /dev/null
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_steer.c
@@ -0,0 +1,265 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2019 Mellanox Technologies, Ltd
+ */
+#include <netinet/in.h>
+
+#include <rte_malloc.h>
+#include <rte_errno.h>
+#include <rte_common.h>
+
+#include <mlx5_common.h>
+
+#include "mlx5_vdpa_utils.h"
+#include "mlx5_vdpa.h"
+
+int
+mlx5_vdpa_steer_unset(struct mlx5_vdpa_priv *priv)
+{
+	int ret __rte_unused;
+	unsigned i;
+
+	for (i = 0; i < RTE_DIM(priv->steer.rss); ++i) {
+		if (priv->steer.rss[i].flow) {
+			claim_zero(mlx5_glue->dv_destroy_flow
+						     (priv->steer.rss[i].flow));
+			priv->steer.rss[i].flow = NULL;
+		}
+		if (priv->steer.rss[i].tir_action) {
+			claim_zero(mlx5_glue->destroy_flow_action
+					       (priv->steer.rss[i].tir_action));
+			priv->steer.rss[i].tir_action = NULL;
+		}
+		if (priv->steer.rss[i].tir) {
+			claim_zero(mlx5_devx_cmd_destroy
+						      (priv->steer.rss[i].tir));
+			priv->steer.rss[i].tir = NULL;
+		}
+		if (priv->steer.rss[i].matcher) {
+			claim_zero(mlx5_glue->dv_destroy_flow_matcher
+						  (priv->steer.rss[i].matcher));
+			priv->steer.rss[i].matcher = NULL;
+		}
+	}
+	if (priv->steer.tbl) {
+		claim_zero(mlx5_glue->dr_destroy_flow_tbl(priv->steer.tbl));
+		priv->steer.tbl = NULL;
+	}
+	if (priv->steer.domain) {
+		claim_zero(mlx5_glue->dr_destroy_domain(priv->steer.domain));
+		priv->steer.domain = NULL;
+	}
+	if (priv->steer.rqt) {
+		claim_zero(mlx5_devx_cmd_destroy(priv->steer.rqt));
+		priv->steer.rqt = NULL;
+	}
+	return 0;
+}
+
+/*
+ * According to the VIRTIO_NET spec, the virtqueue index identifies its type:
+ * 0 receiveq1
+ * 1 transmitq1
+ * ...
+ * 2(N-1) receiveqN
+ * 2(N-1)+1 transmitqN
+ * 2N controlq
+ */
+static uint8_t
+is_virtq_recvq(int virtq_index, int nr_vring)
+{
+	if (virtq_index % 2 == 0 && virtq_index != nr_vring - 1)
+		return 1;
+	return 0;
+}
+
+#define MLX5_VDPA_DEFAULT_RQT_SIZE 512
+static int __rte_unused
+mlx5_vdpa_rqt_prepare(struct mlx5_vdpa_priv *priv)
+{
+	struct mlx5_vdpa_virtq *virtq;
+	uint32_t rqt_n = RTE_MIN(MLX5_VDPA_DEFAULT_RQT_SIZE,
+				 1 << priv->log_max_rqt_size);
+	struct mlx5_devx_rqt_attr *attr = rte_zmalloc(__func__, sizeof(*attr)
+						      + rqt_n *
+						      sizeof(uint32_t), 0);
+	uint32_t i = 0, j;
+	int ret = 0;
+
+	if (!attr) {
+		DRV_LOG(ERR, "Failed to allocate RQT attributes memory.");
+		rte_errno = ENOMEM;
+		return -ENOMEM;
+	}
+	SLIST_FOREACH(virtq, &priv->virtq_list, next) {
+		if (is_virtq_recvq(virtq->index, priv->nr_virtqs)) {
+			attr->rq_list[i] = virtq->virtq->id;
+			i++;
+		}
+	}
+	for (j = 0; i != rqt_n; ++i, ++j)
+		attr->rq_list[i] = attr->rq_list[j];
+	attr->rq_type = MLX5_INLINE_Q_TYPE_VIRTQ;
+	attr->rqt_max_size = rqt_n;
+	attr->rqt_actual_size = rqt_n;
+	if (!priv->steer.rqt) {
+		priv->steer.rqt = mlx5_devx_cmd_create_rqt(priv->ctx, attr);
+		if (!priv->steer.rqt) {
+			DRV_LOG(ERR, "Failed to create RQT.");
+			ret = -rte_errno;
+		}
+	} else {
+		ret = mlx5_devx_cmd_modify_rqt(priv->steer.rqt, attr);
+		if (ret)
+			DRV_LOG(ERR, "Failed to modify RQT.");
+	}
+	rte_free(attr);
+	return ret;
+}
+
+static int __rte_unused
+mlx5_vdpa_rss_flows_create(struct mlx5_vdpa_priv *priv)
+{
+#ifdef HAVE_MLX5DV_DR
+	struct mlx5_devx_tir_attr tir_att = {
+		.disp_type = MLX5_TIRC_DISP_TYPE_INDIRECT,
+		.rx_hash_fn = MLX5_RX_HASH_FN_TOEPLITZ,
+		.transport_domain = priv->td->id,
+		.indirect_table = priv->steer.rqt->id,
+		.rx_hash_symmetric = 1,
+		.rx_hash_toeplitz_key = { 0x2cc681d1, 0x5bdbf4f7, 0xfca28319,
+					  0xdb1a3e94, 0x6b9e38d9, 0x2c9c03d1,
+					  0xad9944a7, 0xd9563d59, 0x063c25f3,
+					  0xfc1fdc2a },
+	};
+	struct {
+		size_t size;
+		/**< Size of match value. Do NOT split size and key! */
+		uint32_t buf[MLX5_ST_SZ_DW(fte_match_param)];
+		/**< Matcher value. This value is used as the mask or a key. */
+	} matcher_mask = {
+				.size = sizeof(matcher_mask.buf),
+			},
+	  matcher_value = {
+				.size = sizeof(matcher_value.buf),
+			};
+	struct mlx5dv_flow_matcher_attr dv_attr = {
+		.type = IBV_FLOW_ATTR_NORMAL,
+		.match_mask = (void *)&matcher_mask,
+	};
+	void *match_m = matcher_mask.buf;
+	void *match_v = matcher_value.buf;
+	void *headers_m = MLX5_ADDR_OF(fte_match_param, match_m, outer_headers);
+	void *headers_v = MLX5_ADDR_OF(fte_match_param, match_v, outer_headers);
+	void *actions[1];
+	const uint8_t l3_hash =
+		(1 << MLX5_RX_HASH_FIELD_SELECT_SELECTED_FIELDS_SRC_IP) |
+		(1 << MLX5_RX_HASH_FIELD_SELECT_SELECTED_FIELDS_DST_IP);
+	const uint8_t l4_hash =
+		(1 << MLX5_RX_HASH_FIELD_SELECT_SELECTED_FIELDS_L4_SPORT) |
+		(1 << MLX5_RX_HASH_FIELD_SELECT_SELECTED_FIELDS_L4_DPORT);
+	enum { PRIO, CRITERIA, IP_VER_M, IP_VER_V, IP_PROT_M, IP_PROT_V, L3_BIT,
+	       L4_BIT, HASH, END};
+	const uint8_t vars[RTE_DIM(priv->steer.rss)][END] = {
+		{ 7, 0, 0, 0, 0, 0, 0, 0, 0 },
+		{ 6, 1 << MLX5_MATCH_CRITERIA_ENABLE_OUTER_BIT, 0xf, 4, 0, 0,
+		 MLX5_L3_PROT_TYPE_IPV4, 0, l3_hash },
+		{ 6, 1 << MLX5_MATCH_CRITERIA_ENABLE_OUTER_BIT, 0xf, 6, 0, 0,
+		 MLX5_L3_PROT_TYPE_IPV6, 0, l3_hash },
+		{ 5, 1 << MLX5_MATCH_CRITERIA_ENABLE_OUTER_BIT, 0xf, 4, 0xff,
+		 IPPROTO_UDP, MLX5_L3_PROT_TYPE_IPV4, MLX5_L4_PROT_TYPE_UDP,
+		 l3_hash | l4_hash },
+		{ 5, 1 << MLX5_MATCH_CRITERIA_ENABLE_OUTER_BIT, 0xf, 4, 0xff,
+		 IPPROTO_TCP, MLX5_L3_PROT_TYPE_IPV4, MLX5_L4_PROT_TYPE_TCP,
+		 l3_hash | l4_hash },
+		{ 5, 1 << MLX5_MATCH_CRITERIA_ENABLE_OUTER_BIT, 0xf, 6, 0xff,
+		 IPPROTO_UDP, MLX5_L3_PROT_TYPE_IPV6, MLX5_L4_PROT_TYPE_UDP,
+		 l3_hash | l4_hash },
+		{ 5, 1 << MLX5_MATCH_CRITERIA_ENABLE_OUTER_BIT, 0xf, 6, 0xff,
+		 IPPROTO_TCP, MLX5_L3_PROT_TYPE_IPV6, MLX5_L4_PROT_TYPE_TCP,
+		 l3_hash | l4_hash },
+	};
+	unsigned i;
+
+	for (i = 0; i < RTE_DIM(priv->steer.rss); ++i) {
+		dv_attr.priority = vars[i][PRIO];
+		dv_attr.match_criteria_enable = vars[i][CRITERIA];
+		MLX5_SET(fte_match_set_lyr_2_4, headers_m, ip_version,
+			 vars[i][IP_VER_M]);
+		MLX5_SET(fte_match_set_lyr_2_4, headers_v, ip_version,
+			 vars[i][IP_VER_V]);
+		MLX5_SET(fte_match_set_lyr_2_4, headers_m, ip_protocol,
+			 vars[i][IP_PROT_M]);
+		MLX5_SET(fte_match_set_lyr_2_4, headers_v, ip_protocol,
+			 vars[i][IP_PROT_V]);
+		tir_att.rx_hash_field_selector_outer.l3_prot_type =
+								vars[i][L3_BIT];
+		tir_att.rx_hash_field_selector_outer.l4_prot_type =
+								vars[i][L4_BIT];
+		tir_att.rx_hash_field_selector_outer.selected_fields =
+								  vars[i][HASH];
+		priv->steer.rss[i].matcher = mlx5_glue->dv_create_flow_matcher
+					 (priv->ctx, &dv_attr, priv->steer.tbl);
+		if (!priv->steer.rss[i].matcher) {
+			DRV_LOG(ERR, "Failed to create matcher %d.", i);
+			goto error;
+		}
+		priv->steer.rss[i].tir = mlx5_devx_cmd_create_tir(priv->ctx,
+								  &tir_att);
+		if (!priv->steer.rss[i].tir) {
+			DRV_LOG(ERR, "Failed to create TIR %d.", i);
+			goto error;
+		}
+		priv->steer.rss[i].tir_action =
+				mlx5_glue->dv_create_flow_action_dest_devx_tir
+						  (priv->steer.rss[i].tir->obj);
+		if (!priv->steer.rss[i].tir_action) {
+			DRV_LOG(ERR, "Failed to create TIR action %d.", i);
+			goto error;
+		}
+		actions[0] = priv->steer.rss[i].tir_action;
+		priv->steer.rss[i].flow = mlx5_glue->dv_create_flow
+					(priv->steer.rss[i].matcher,
+					 (void *)&matcher_value, 1, actions);
+		if (!priv->steer.rss[i].flow) {
+			DRV_LOG(ERR, "Failed to create flow %d.", i);
+			goto error;
+		}
+	}
+	return 0;
+error:
+	/* Resources will be freed by the caller. */
+	return -1;
+#else
+	(void)priv;
+	return -ENOTSUP;
+#endif /* HAVE_MLX5DV_DR */
+}
+
+int
+mlx5_vdpa_steer_setup(struct mlx5_vdpa_priv *priv)
+{
+#ifdef HAVE_MLX5DV_DR
+	if (mlx5_vdpa_rqt_prepare(priv))
+		return -1;
+	priv->steer.domain = mlx5_glue->dr_create_domain(priv->ctx,
+						  MLX5DV_DR_DOMAIN_TYPE_NIC_RX);
+	if (!priv->steer.domain) {
+		DRV_LOG(ERR, "Failed to create Rx domain.");
+		goto error;
+	}
+	priv->steer.tbl = mlx5_glue->dr_create_flow_tbl(priv->steer.domain, 0);
+	if (!priv->steer.tbl) {
+		DRV_LOG(ERR, "Failed to create table 0 with Rx domain.");
+		goto error;
+	}
+	if (mlx5_vdpa_rss_flows_create(priv))
+		goto error;
+	return 0;
+error:
+	mlx5_vdpa_steer_unset(priv);
+	return -1;
+#else
+	(void)priv;
+	return -ENOTSUP;
+#endif /* HAVE_MLX5DV_DR */
+}
-- 
1.8.3.1
* [dpdk-dev] [PATCH v2 09/13] vdpa/mlx5: support queue state operation
  2020-01-29 10:08 ` [dpdk-dev] [PATCH v2 00/13] Introduce mlx5 vDPA driver Matan Azrad
                     ` (7 preceding siblings ...)
  2020-01-29 10:09   ` [dpdk-dev] [PATCH v2 08/13] vdpa/mlx5: add basic steering configurations Matan Azrad
@ 2020-01-29 10:09   ` Matan Azrad
  2020-01-31 15:32     ` Maxime Coquelin
  2020-01-29 10:09   ` [dpdk-dev] [PATCH v2 10/13] vdpa/mlx5: map doorbell Matan Azrad
                     ` (5 subsequent siblings)
  14 siblings, 1 reply; 174+ messages in thread
From: Matan Azrad @ 2020-01-29 10:09 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Maxime Coquelin
Add support for the set_vring_state operation.
Using the DevX API, the virtq state can be changed as described in the PRM:
	enable - move to ready state.
	disable - move to suspend state.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
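Note: the enable/disable flow can be sketched as a toggle with
rollback. This is an illustration only: struct ex_virtq,
ex_virtq_enable() and the rebuild callback are hypothetical stand-ins
(the callback plays the role of the RQT update), and struct ex_stub
exists only to exercise the sketch. The state is flipped first, the
receive table is rebuilt only for receive queues, and the previous
state is restored if the rebuild fails.

```c
#include <stdint.h>

struct ex_virtq {
	uint8_t enable;  /* Current state: 1 enabled, 0 disabled. */
	uint16_t index;  /* Virtq index. */
};

/* Callback that rebuilds the receive table; returns 0 on success. */
typedef int (*ex_rebuild_fn)(void *ctx);

/* Stub used to exercise the sketch: counts calls, returns 'result'. */
struct ex_stub {
	int calls;
	int result;
};

static int
ex_stub_rebuild(void *ctx)
{
	struct ex_stub *stub = ctx;

	stub->calls++;
	return stub->result;
}

static int
ex_virtq_enable(struct ex_virtq *vq, int enable, uint32_t nr_vring,
		ex_rebuild_fn rebuild, void *ctx)
{
	uint8_t prev = vq->enable;
	int ret = 0;

	if (prev == !!enable)
		return 0; /* Nothing to change. */
	vq->enable = !!enable;
	/* Only receive queues are part of the receive table. */
	if (vq->index % 2 == 0 && vq->index != nr_vring - 1) {
		ret = rebuild(ctx);
		if (ret)
			vq->enable = prev; /* Roll back on failure. */
	}
	return ret;
}
```

Transmit queues only record the new state, since they never appear in
the receive table.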
 drivers/vdpa/mlx5/mlx5_vdpa.c       | 23 ++++++++++++++++++++++-
 drivers/vdpa/mlx5/mlx5_vdpa.h       | 15 +++++++++++++++
 drivers/vdpa/mlx5/mlx5_vdpa_steer.c | 22 ++++++++++++++++++++--
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 25 +++++++++++++++++++++----
 4 files changed, 78 insertions(+), 7 deletions(-)
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index 12cfee2..71189c4 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -106,13 +106,34 @@
 	return 0;
 }
 
+static int
+mlx5_vdpa_set_vring_state(int vid, int vring, int state)
+{
+	int did = rte_vhost_get_vdpa_device_id(vid);
+	struct mlx5_vdpa_priv *priv = mlx5_vdpa_find_priv_resource_by_did(did);
+	struct mlx5_vdpa_virtq *virtq = NULL;
+
+	if (priv == NULL) {
+		DRV_LOG(ERR, "Invalid device id: %d.", did);
+		return -EINVAL;
+	}
+	SLIST_FOREACH(virtq, &priv->virtq_list, next)
+		if (virtq->index == vring)
+			break;
+	if (!virtq) {
+		DRV_LOG(ERR, "Invalid or unconfigured vring id: %d.", vring);
+		return -EINVAL;
+	}
+	return mlx5_vdpa_virtq_enable(virtq, state);
+}
+
 static struct rte_vdpa_dev_ops mlx5_vdpa_ops = {
 	.get_queue_num = mlx5_vdpa_get_queue_num,
 	.get_features = mlx5_vdpa_get_vdpa_features,
 	.get_protocol_features = mlx5_vdpa_get_protocol_features,
 	.dev_conf = NULL,
 	.dev_close = NULL,
-	.set_vring_state = NULL,
+	.set_vring_state = mlx5_vdpa_set_vring_state,
 	.set_features = NULL,
 	.migration_done = NULL,
 	.get_vfio_group_fd = NULL,
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 2b0b285..383a33e 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -64,8 +64,10 @@ struct mlx5_vdpa_query_mr {
 
 struct mlx5_vdpa_virtq {
 	SLIST_ENTRY(mlx5_vdpa_virtq) next;
+	uint8_t enable;
 	uint16_t index;
 	uint16_t vq_size;
+	struct mlx5_vdpa_priv *priv;
 	struct mlx5_devx_obj *virtq;
 	struct mlx5_vdpa_event_qp eqp;
 	struct {
@@ -207,6 +209,19 @@ int mlx5_vdpa_event_qp_create(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
 int mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv);
 
 /**
  * Enable/disable a virtq.
+ *
+ * @param[in] virtq
+ *   The vdpa driver private virtq structure.
+ * @param[in] enable
+ *   Set to enable, otherwise disable.
+ *
+ * @return
+ *   0 on success, a negative value otherwise.
+ */
+int mlx5_vdpa_virtq_enable(struct mlx5_vdpa_virtq *virtq, int enable);
+
+/**
  * Unset steering and release all its related resources- stop traffic.
  *
  * @param[in] priv
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_steer.c b/drivers/vdpa/mlx5/mlx5_vdpa_steer.c
index f365c10..36017f1 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_steer.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_steer.c
@@ -73,7 +73,7 @@
 }
 
 #define MLX5_VDPA_DEFAULT_RQT_SIZE 512
-static int __rte_unused
+static int
 mlx5_vdpa_rqt_prepare(struct mlx5_vdpa_priv *priv)
 {
 	struct mlx5_vdpa_virtq *virtq;
@@ -91,7 +91,8 @@
 		return -ENOMEM;
 	}
 	SLIST_FOREACH(virtq, &priv->virtq_list, next) {
-		if (is_virtq_recvq(virtq->index, priv->nr_virtqs)) {
+		if (is_virtq_recvq(virtq->index, priv->nr_virtqs) &&
+		    virtq->enable) {
 			attr->rq_list[i] = virtq->virtq->id;
 			i++;
 		}
@@ -116,6 +117,23 @@
 	return ret;
 }
 
+int
+mlx5_vdpa_virtq_enable(struct mlx5_vdpa_virtq *virtq, int enable)
+{
+	struct mlx5_vdpa_priv *priv = virtq->priv;
+	int ret = 0;
+
+	if (virtq->enable == !!enable)
+		return 0;
+	virtq->enable = !!enable;
+	if (is_virtq_recvq(virtq->index, priv->nr_virtqs)) {
+		ret = mlx5_vdpa_rqt_prepare(priv);
+		if (ret)
+			virtq->enable = !enable;
+	}
+	return ret;
+}
+
 static int __rte_unused
 mlx5_vdpa_rss_flows_create(struct mlx5_vdpa_priv *priv)
 {
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index e27af28..60aa040 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -15,13 +15,13 @@
 static int
 mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 {
-	int i;
+	unsigned int;
 
 	if (virtq->virtq) {
 		claim_zero(mlx5_devx_cmd_destroy(virtq->virtq));
 		virtq->virtq = NULL;
 	}
-	for (i = 0; i < 3; ++i) {
+	for (i = 0; i < RTE_DIM(virtq->umems); ++i) {
 		if (virtq->umems[i].obj)
 			claim_zero(mlx5_glue->devx_umem_dereg
 							 (virtq->umems[i].obj));
@@ -60,6 +60,19 @@
 	priv->features = 0;
 }
 
+static int
+mlx5_vdpa_virtq_modify(struct mlx5_vdpa_virtq *virtq, int state)
+{
+	struct mlx5_devx_virtq_attr attr = {
+			.type = MLX5_VIRTQ_MODIFY_TYPE_STATE,
+			.state = state ? MLX5_VIRTQ_STATE_RDY :
+					 MLX5_VIRTQ_STATE_SUSPEND,
+			.queue_index = virtq->index,
+	};
+
+	return mlx5_devx_cmd_modify_virtq(virtq->virtq, &attr);
+}
+
 static uint64_t
 mlx5_vdpa_hva_to_gpa(struct rte_vhost_memory *mem, uint64_t hva)
 {
@@ -86,7 +99,7 @@
 	struct mlx5_devx_virtq_attr attr = {0};
 	uint64_t gpa;
 	int ret;
-	int i;
+	unsigned i;
 	uint16_t last_avail_idx;
 	uint16_t last_used_idx;
 
@@ -125,7 +138,7 @@
 			" need event QPs and event mechanism.", index);
 	}
 	/* Setup 3 UMEMs for each virtq. */
-	for (i = 0; i < 3; ++i) {
+	for (i = 0; i < RTE_DIM(virtq->umems); ++i) {
 		virtq->umems[i].size = priv->caps.umems[i].a * vq.size +
 							  priv->caps.umems[i].b;
 		virtq->umems[i].buf = rte_zmalloc(__func__,
@@ -182,8 +195,12 @@
 	attr.tis_id = priv->tis->id;
 	attr.queue_index = index;
 	virtq->virtq = mlx5_devx_cmd_create_virtq(priv->ctx, &attr);
+	virtq->priv = priv;
 	if (!virtq->virtq)
 		goto error;
+	if (mlx5_vdpa_virtq_modify(virtq, 1))
+		goto error;
+	virtq->enable = 1;
 	return 0;
 error:
 	mlx5_vdpa_virtq_unset(virtq);
-- 
1.8.3.1
* [dpdk-dev] [PATCH v2 10/13] vdpa/mlx5: map doorbell
  2020-01-29 10:08 ` [dpdk-dev] [PATCH v2 00/13] Introduce mlx5 vDPA driver Matan Azrad
                     ` (8 preceding siblings ...)
  2020-01-29 10:09   ` [dpdk-dev] [PATCH v2 09/13] vdpa/mlx5: support queue state operation Matan Azrad
@ 2020-01-29 10:09   ` Matan Azrad
  2020-01-31 15:40     ` Maxime Coquelin
  2020-01-29 10:09   ` [dpdk-dev] [PATCH v2 11/13] vdpa/mlx5: support live migration Matan Azrad
                     ` (4 subsequent siblings)
  14 siblings, 1 reply; 174+ messages in thread
From: Matan Azrad @ 2020-01-29 10:09 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Maxime Coquelin
The HW detects doorbell writes of 4 bytes only, but the virtio device
writes only 2 bytes when it rings the doorbell.
Map the virtio doorbell, detected through the virtio queue kickfd, to
the HW VAR space where the HW expects the virtio emulation doorbell.
Use the EAL interrupt mechanism to get a notification when the guest
raises a new event on kickfd, and write 4 bytes to the HW doorbell space
in the notification callback.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
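Note: the relay described above can be sketched with plain POSIX
calls. This is an illustration, not the driver callback:
ex_kick_relay() is a hypothetical name, a volatile 32-bit store stands
in for rte_write32() on the mapped VAR page, and, unlike the patch,
EAGAIN is treated as "nothing left to drain" instead of being retried.
The 8-byte kick counter is drained from the descriptor and a full
4-byte doorbell carrying the virtq index is written.

```c
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Drain one kick event from kickfd and ring the 4-byte doorbell.
 * Return 0 on success or a negative errno value when the read fails
 * for a reason other than EINTR/EAGAIN.
 */
static int
ex_kick_relay(int kickfd, volatile uint32_t *db, uint16_t index)
{
	uint64_t cnt;
	ssize_t n;

	/* Retry only when the read was interrupted by a signal. */
	do {
		n = read(kickfd, &cnt, sizeof(cnt));
	} while (n < 0 && errno == EINTR);
	if (n < 0 && errno != EAGAIN && errno != EWOULDBLOCK)
		return -errno;
	/* The guest wrote 2 bytes; the device needs a 4-byte write. */
	*db = (uint32_t)index;
	return 0;
}
```

In the driver, such a function would be called from the interrupt
callback registered on the kickfd of each virtq.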
 drivers/vdpa/mlx5/mlx5_vdpa.h       |  3 ++
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 82 ++++++++++++++++++++++++++++++++++++-
 2 files changed, 84 insertions(+), 1 deletion(-)
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 383a33e..af78ea1 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -75,6 +75,7 @@ struct mlx5_vdpa_virtq {
 		void *buf;
 		uint32_t size;
 	} umems[3];
+	struct rte_intr_handle intr_handle;
 };
 
 struct mlx5_vdpa_steer {
@@ -112,6 +113,8 @@ struct mlx5_vdpa_priv {
 	uint16_t log_max_rqt_size;
 	SLIST_HEAD(virtq_list, mlx5_vdpa_virtq) virtq_list;
 	struct mlx5_vdpa_steer steer;
+	struct mlx5dv_var *var;
+	void *virtq_db_addr;
 	SLIST_HEAD(mr_list, mlx5_vdpa_query_mr) mr_list;
 };
 
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 60aa040..91347e9 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -2,9 +2,12 @@
  * Copyright 2019 Mellanox Technologies, Ltd
  */
 #include <string.h>
+#include <unistd.h>
+#include <sys/mman.h>
 
 #include <rte_malloc.h>
 #include <rte_errno.h>
+#include <rte_io.h>
 
 #include <mlx5_common.h>
 
@@ -12,11 +15,52 @@
 #include "mlx5_vdpa.h"
 
 
+static void
+mlx5_vdpa_virtq_handler(void *cb_arg)
+{
+	struct mlx5_vdpa_virtq *virtq = cb_arg;
+	struct mlx5_vdpa_priv *priv = virtq->priv;
+	uint64_t buf;
+	int nbytes;
+
+	do {
+		nbytes = read(virtq->intr_handle.fd, &buf, 8);
+		if (nbytes < 0) {
+			if (errno == EINTR ||
+			    errno == EWOULDBLOCK ||
+			    errno == EAGAIN)
+				continue;
+			DRV_LOG(ERR,  "Failed to read kickfd of virtq %d: %s",
+				virtq->index, strerror(errno));
+		}
+		break;
+	} while (1);
+	rte_write32(virtq->index, priv->virtq_db_addr);
+	DRV_LOG(DEBUG, "Ring virtq %u doorbell.", virtq->index);
+}
+
 static int
 mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 {
-	unsigned int;
+	unsigned int i;
+	int retries = MLX5_VDPA_INTR_RETRIES;
+	int ret = -EAGAIN;
 
+	if (virtq->intr_handle.fd) {
+		while (retries-- && ret == -EAGAIN) {
+			ret = rte_intr_callback_unregister(&virtq->intr_handle,
+							mlx5_vdpa_virtq_handler,
+							virtq);
+			if (ret == -EAGAIN) {
+				DRV_LOG(DEBUG, "Try again to unregister fd %d "
+					"of virtq %d interrupt, retries = %d.",
+					virtq->intr_handle.fd,
+					(int)virtq->index, retries);
+				usleep(MLX5_VDPA_INTR_RETRIES_USEC);
+			}
+		}
+		memset(&virtq->intr_handle, 0, sizeof(virtq->intr_handle));
+	}
 	if (virtq->virtq) {
 		claim_zero(mlx5_devx_cmd_destroy(virtq->virtq));
 		virtq->virtq = NULL;
@@ -57,6 +101,14 @@
 		claim_zero(mlx5_devx_cmd_destroy(priv->td));
 		priv->td = NULL;
 	}
+	if (priv->virtq_db_addr) {
+		claim_zero(munmap(priv->virtq_db_addr, priv->var->length));
+		priv->virtq_db_addr = NULL;
+	}
+	if (priv->var) {
+		mlx5_glue->dv_free_var(priv->var);
+		priv->var = NULL;
+	}
 	priv->features = 0;
 }
 
@@ -201,6 +253,17 @@
 	if (mlx5_vdpa_virtq_modify(virtq, 1))
 		goto error;
 	virtq->enable = 1;
+	virtq->intr_handle.fd = vq.kickfd;
+	virtq->intr_handle.type = RTE_INTR_HANDLE_EXT;
+	if (rte_intr_callback_register(&virtq->intr_handle,
+				       mlx5_vdpa_virtq_handler, virtq)) {
+		virtq->intr_handle.fd = 0;
+		DRV_LOG(ERR, "Failed to register virtq %d interrupt.", index);
+		goto error;
+	} else {
+		DRV_LOG(DEBUG, "Register fd %d interrupt for virtq %d.",
+			virtq->intr_handle.fd, index);
+	}
 	return 0;
 error:
 	mlx5_vdpa_virtq_unset(virtq);
@@ -275,6 +338,23 @@
 		DRV_LOG(ERR, "Failed to configure negotiated features.");
 		return -1;
 	}
+	priv->var = mlx5_glue->dv_alloc_var(priv->ctx, 0);
+	if (!priv->var) {
+		DRV_LOG(ERR, "Failed to allocate VAR %u.", errno);
+		return -1;
+	}
+	/* Always map the entire page. */
+	priv->virtq_db_addr = mmap(NULL, priv->var->length, PROT_READ |
+				   PROT_WRITE, MAP_SHARED, priv->ctx->cmd_fd,
+				   priv->var->mmap_off);
+	if (priv->virtq_db_addr == MAP_FAILED) {
+		DRV_LOG(ERR, "Failed to map doorbell page %u.", errno);
+		priv->virtq_db_addr = NULL;
+		goto error;
+	} else {
+		DRV_LOG(DEBUG, "VAR address of doorbell mapping is %p.",
+			priv->virtq_db_addr);
+	}
 	priv->td = mlx5_devx_cmd_create_td(priv->ctx);
 	if (!priv->td) {
 		DRV_LOG(ERR, "Failed to create transport domain.");
-- 
1.8.3.1
^ permalink raw reply	[flat|nested] 174+ messages in thread
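The kick handler in the patch above reads the 8-byte counter from the virtq kick file descriptor before ringing the doorbell. A minimal sketch of that read step, assuming a blocking descriptor and retrying only on EINTR, could look as follows. The name kick_read is hypothetical and is not part of the driver.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Read one 8-byte kick counter from fd.
 * Retries when the read is interrupted by a signal.
 * Returns 0 when a full counter was read, -1 otherwise.
 */
static int
kick_read(int fd, uint64_t *count)
{
	ssize_t n;

	assert(count != NULL);
	do {
		n = read(fd, count, sizeof(*count));
	} while (n < 0 && errno == EINTR);
	return n == (ssize_t)sizeof(*count) ? 0 : -1;
}
```

On a non-blocking descriptor, EAGAIN would normally be treated as "nothing to read" rather than retried in a tight loop; that policy is a design choice of this sketch, not something stated in the patch.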
* [dpdk-dev] [PATCH v2 11/13] vdpa/mlx5: support live migration
  2020-01-29 10:08 ` [dpdk-dev] [PATCH v2 00/13] Introduce mlx5 vDPA driver Matan Azrad
                     ` (9 preceding siblings ...)
  2020-01-29 10:09   ` [dpdk-dev] [PATCH v2 10/13] vdpa/mlx5: map doorbell Matan Azrad
@ 2020-01-29 10:09   ` Matan Azrad
  2020-01-31 16:01     ` Maxime Coquelin
  2020-01-29 10:09   ` [dpdk-dev] [PATCH v2 12/13] vdpa/mlx5: support close and config operations Matan Azrad
                     ` (3 subsequent siblings)
  14 siblings, 1 reply; 174+ messages in thread
From: Matan Azrad @ 2020-01-29 10:09 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Maxime Coquelin
Add support for live migration feature by the HW:
	Create a single Mkey that maps the memory address space of the
		VHOST live migration log file.
	Modify VIRTIO_NET_Q object and provide vhost_log_page,
		dirty_bitmap_mkey, dirty_bitmap_size, dirty_bitmap_addr
		and dirty_bitmap_dump_enable.
	Modify VIRTIO_NET_Q object and move state to SUSPEND.
	Query VIRTIO_NET_Q and get hw_available_idx and hw_used_idx.
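For readers unfamiliar with the vhost dirty log, the bitmap that the HW fills can be pictured with the sketch below. It assumes one bit per 4096-byte guest page, which is the granularity used by the vhost log; the helper names log_page_set and log_range_set are hypothetical and only illustrate the layout, they are not what the device or the driver executes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LOG_PAGE_SIZE 4096u /* Assumed log granularity: one bit per page. */

/* Mark one page dirty; returns -1 if the page is outside the bitmap. */
static int
log_page_set(uint8_t *bitmap, size_t bitmap_size, uint64_t page)
{
	assert(bitmap != NULL);
	if (page / 8 >= bitmap_size)
		return -1;
	bitmap[page / 8] |= (uint8_t)(1u << (page % 8));
	return 0;
}

/* Mark every page touched by [addr, addr + len) dirty. */
static int
log_range_set(uint8_t *bitmap, size_t bitmap_size, uint64_t addr,
	      uint64_t len)
{
	uint64_t page;
	uint64_t last;

	if (len == 0)
		return 0;
	last = (addr + len - 1) / LOG_PAGE_SIZE;
	for (page = addr / LOG_PAGE_SIZE; page <= last; page++)
		if (log_page_set(bitmap, bitmap_size, page))
			return -1;
	return 0;
}
```

A 2-byte write at address 4095 crosses a page boundary, so it marks both page 0 and page 1.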
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 doc/guides/vdpadevs/features/mlx5.ini |   1 +
 drivers/vdpa/mlx5/Makefile            |   1 +
 drivers/vdpa/mlx5/meson.build         |   1 +
 drivers/vdpa/mlx5/mlx5_vdpa.c         |  44 +++++++++++-
 drivers/vdpa/mlx5/mlx5_vdpa.h         |  55 ++++++++++++++
 drivers/vdpa/mlx5/mlx5_vdpa_lm.c      | 130 ++++++++++++++++++++++++++++++++++
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   |   7 +-
 7 files changed, 236 insertions(+), 3 deletions(-)
 create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_lm.c
diff --git a/doc/guides/vdpadevs/features/mlx5.ini b/doc/guides/vdpadevs/features/mlx5.ini
index e4ee34b..1da9c1b 100644
--- a/doc/guides/vdpadevs/features/mlx5.ini
+++ b/doc/guides/vdpadevs/features/mlx5.ini
@@ -9,6 +9,7 @@ guest csum           = Y
 host tso4            = Y
 host tso6            = Y
 version 1            = Y
+log all              = Y
 any layout           = Y
 guest announce       = Y
 mq                   = Y
diff --git a/drivers/vdpa/mlx5/Makefile b/drivers/vdpa/mlx5/Makefile
index 2f70a98..4d1f528 100644
--- a/drivers/vdpa/mlx5/Makefile
+++ b/drivers/vdpa/mlx5/Makefile
@@ -12,6 +12,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa_mem.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa_event.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa_virtq.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa_steer.c
+SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa_lm.c
 
 
 # Basic CFLAGS.
diff --git a/drivers/vdpa/mlx5/meson.build b/drivers/vdpa/mlx5/meson.build
index 2849178..2e521b8 100644
--- a/drivers/vdpa/mlx5/meson.build
+++ b/drivers/vdpa/mlx5/meson.build
@@ -16,6 +16,7 @@ sources = files(
 	'mlx5_vdpa_event.c',
 	'mlx5_vdpa_virtq.c',
 	'mlx5_vdpa_steer.c',
+	'mlx5_vdpa_lm.c',
 )
 cflags_options = [
 	'-std=c11',
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index 71189c4..4ce0ba0 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -19,7 +19,8 @@
 			    (1ULL << VIRTIO_F_ANY_LAYOUT) | \
 			    (1ULL << VIRTIO_NET_F_MQ) | \
 			    (1ULL << VIRTIO_NET_F_GUEST_ANNOUNCE) | \
-			    (1ULL << VIRTIO_F_ORDER_PLATFORM))
+			    (1ULL << VIRTIO_F_ORDER_PLATFORM) | \
+			    (1ULL << VHOST_F_LOG_ALL))
 
 #define MLX5_VDPA_PROTOCOL_FEATURES \
 			    ((1ULL << VHOST_USER_PROTOCOL_F_SLAVE_REQ) | \
@@ -127,6 +128,45 @@
 	return mlx5_vdpa_virtq_enable(virtq, state);
 }
 
+static int
+mlx5_vdpa_features_set(int vid)
+{
+	int did = rte_vhost_get_vdpa_device_id(vid);
+	struct mlx5_vdpa_priv *priv = mlx5_vdpa_find_priv_resource_by_did(did);
+	uint64_t log_base, log_size;
+	uint64_t features;
+	int ret;
+
+	if (priv == NULL) {
+		DRV_LOG(ERR, "Invalid device id: %d.", did);
+		return -EINVAL;
+	}
+	ret = rte_vhost_get_negotiated_features(vid, &features);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to get negotiated features.");
+		return ret;
+	}
+	if (RTE_VHOST_NEED_LOG(features)) {
+		ret = rte_vhost_get_log_base(vid, &log_base, &log_size);
+		if (ret) {
+			DRV_LOG(ERR, "Failed to get log base.");
+			return ret;
+		}
+		ret = mlx5_vdpa_dirty_bitmap_set(priv, log_base, log_size);
+		if (ret) {
+			DRV_LOG(ERR, "Failed to set dirty bitmap.");
+			return ret;
+		}
+		DRV_LOG(INFO, "mlx5 vdpa: enabling dirty logging...");
+		ret = mlx5_vdpa_logging_enable(priv, 1);
+		if (ret) {
+			DRV_LOG(ERR, "Failed to enable dirty logging.");
+			return ret;
+		}
+	}
+	return 0;
+}
+
 static struct rte_vdpa_dev_ops mlx5_vdpa_ops = {
 	.get_queue_num = mlx5_vdpa_get_queue_num,
 	.get_features = mlx5_vdpa_get_vdpa_features,
@@ -134,7 +174,7 @@
 	.dev_conf = NULL,
 	.dev_close = NULL,
 	.set_vring_state = mlx5_vdpa_set_vring_state,
-	.set_features = NULL,
+	.set_features = mlx5_vdpa_features_set,
 	.migration_done = NULL,
 	.get_vfio_group_fd = NULL,
 	.get_vfio_device_fd = NULL,
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index af78ea1..70264e4 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -244,4 +244,59 @@ int mlx5_vdpa_event_qp_create(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
  */
 int mlx5_vdpa_steer_setup(struct mlx5_vdpa_priv *priv);
 
+/**
+ * Enable/Disable live migration logging.
+ *
+ * @param[in] priv
+ *   The vdpa driver private structure.
+ * @param[in] enable
+ *   Set for enable, unset for disable.
+ *
+ * @return
+ *   0 on success, a negative value otherwise.
+ */
+int mlx5_vdpa_logging_enable(struct mlx5_vdpa_priv *priv, int enable);
+
+/**
+ * Set dirty bitmap logging to allow live migration.
+ *
+ * @param[in] priv
+ *   The vdpa driver private structure.
+ * @param[in] log_base
+ *   Vhost log base.
+ * @param[in] log_size
+ *   Vhost log size.
+ *
+ * @return
+ *   0 on success, a negative value otherwise.
+ */
+int mlx5_vdpa_dirty_bitmap_set(struct mlx5_vdpa_priv *priv, uint64_t log_base,
+			       uint64_t log_size);
+
+/**
+ * Log all virtqs information for live migration.
+ *
+ * @param[in] priv
+ *   The vdpa driver private structure.
+ * @param[in] enable
+ *   Set for enable, unset for disable.
+ *
+ * @return
+ *   0 on success, a negative value otherwise.
+ */
+int mlx5_vdpa_lm_log(struct mlx5_vdpa_priv *priv);
+
+/**
+ * Modify virtq state to be ready or suspend.
+ *
+ * @param[in] virtq
+ *   The vdpa driver private virtq structure.
+ * @param[in] state
+ *   Set for ready, otherwise suspend.
+ *
+ * @return
+ *   0 on success, a negative value otherwise.
+ */
+int mlx5_vdpa_virtq_modify(struct mlx5_vdpa_virtq *virtq, int state);
+
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
new file mode 100644
index 0000000..cfeec5f
--- /dev/null
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
@@ -0,0 +1,130 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2019 Mellanox Technologies, Ltd
+ */
+#include <rte_malloc.h>
+#include <rte_errno.h>
+
+#include "mlx5_vdpa_utils.h"
+#include "mlx5_vdpa.h"
+
+
+int
+mlx5_vdpa_logging_enable(struct mlx5_vdpa_priv *priv, int enable)
+{
+	struct mlx5_devx_virtq_attr attr = {
+		.type = MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE,
+		.dirty_bitmap_dump_enable = enable,
+	};
+	struct mlx5_vdpa_virtq *virtq;
+
+	SLIST_FOREACH(virtq, &priv->virtq_list, next) {
+		attr.queue_index = virtq->index;
+		if (mlx5_devx_cmd_modify_virtq(virtq->virtq, &attr)) {
+			DRV_LOG(ERR, "Failed to modify virtq %d logging.",
+				virtq->index);
+			return -1;
+		}
+	}
+	return 0;
+}
+
+int
+mlx5_vdpa_dirty_bitmap_set(struct mlx5_vdpa_priv *priv, uint64_t log_base,
+			   uint64_t log_size)
+{
+	struct mlx5_devx_mkey_attr mkey_attr = {
+			.addr = (uintptr_t)log_base,
+			.size = log_size,
+			.pd = priv->pdn,
+			.pg_access = 1,
+			.klm_array = NULL,
+			.klm_num = 0,
+	};
+	struct mlx5_devx_virtq_attr attr = {
+		.type = MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS,
+		.dirty_bitmap_addr = log_base,
+		.dirty_bitmap_size = log_size,
+	};
+	struct mlx5_vdpa_query_mr *mr = rte_malloc(__func__, sizeof(*mr), 0);
+	struct mlx5_vdpa_virtq *virtq;
+
+	if (!mr) {
+		DRV_LOG(ERR, "Failed to allocate mem for lm mr.");
+		return -1;
+	}
+	mr->umem = mlx5_glue->devx_umem_reg(priv->ctx,
+					    (void *)(uintptr_t)log_base,
+					    log_size, IBV_ACCESS_LOCAL_WRITE);
+	if (!mr->umem) {
+		DRV_LOG(ERR, "Failed to register umem for lm mr.");
+		goto err;
+	}
+	mkey_attr.umem_id = mr->umem->umem_id;
+	mr->mkey = mlx5_devx_cmd_mkey_create(priv->ctx, &mkey_attr);
+	if (!mr->mkey) {
+		DRV_LOG(ERR, "Failed to create Mkey for lm.");
+		goto err;
+	}
+	attr.dirty_bitmap_mkey = mr->mkey->id;
+	SLIST_FOREACH(virtq, &priv->virtq_list, next) {
+		attr.queue_index = virtq->index;
+		if (mlx5_devx_cmd_modify_virtq(virtq->virtq, &attr)) {
+			DRV_LOG(ERR, "Failed to modify virtq %d for lm.",
+				virtq->index);
+			goto err;
+		}
+	}
+	mr->is_indirect = 0;
+	SLIST_INSERT_HEAD(&priv->mr_list, mr, next);
+	return 0;
+err:
+	if (mr->mkey)
+		mlx5_devx_cmd_destroy(mr->mkey);
+	if (mr->umem)
+		mlx5_glue->devx_umem_dereg(mr->umem);
+	rte_free(mr);
+	return -1;
+}
+
+#define MLX5_VDPA_USED_RING_LEN(size) \
+	((size) * sizeof(struct vring_used_elem) + sizeof(uint16_t) * 3)
+
+int
+mlx5_vdpa_lm_log(struct mlx5_vdpa_priv *priv)
+{
+	struct mlx5_devx_virtq_attr attr = {0};
+	struct mlx5_vdpa_virtq *virtq;
+	uint64_t features;
+	int ret = rte_vhost_get_negotiated_features(priv->vid, &features);
+
+	if (ret) {
+		DRV_LOG(ERR, "Failed to get negotiated features.");
+		return -1;
+	}
+	if (RTE_VHOST_NEED_LOG(features)) {
+		SLIST_FOREACH(virtq, &priv->virtq_list, next) {
+			ret = mlx5_vdpa_virtq_modify(virtq, 0);
+			if (ret)
+				return -1;
+			if (mlx5_devx_cmd_query_virtq(virtq->virtq, &attr)) {
+				DRV_LOG(ERR, "Failed to query virtq %d.",
+					virtq->index);
+				return -1;
+			}
+			DRV_LOG(INFO, "Query vid %d vring %d: hw_available_idx="
+				"%d, hw_used_index=%d", priv->vid, virtq->index,
+				attr.hw_available_index, attr.hw_used_index);
+			ret = rte_vhost_set_vring_base(priv->vid, virtq->index,
+						       attr.hw_available_index,
+						       attr.hw_used_index);
+			if (ret) {
+				DRV_LOG(ERR, "Failed to set virtq %d base.",
+					virtq->index);
+				return -1;
+			}
+			rte_vhost_log_used_vring(priv->vid, virtq->index, 0,
+				       MLX5_VDPA_USED_RING_LEN(virtq->vq_size));
+		}
+	}
+	return 0;
+}
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 91347e9..af058a1 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -112,7 +112,7 @@
 	priv->features = 0;
 }
 
-static int
+int
 mlx5_vdpa_virtq_modify(struct mlx5_vdpa_virtq *virtq, int state)
 {
 	struct mlx5_devx_virtq_attr attr = {
@@ -253,6 +253,11 @@
 	if (mlx5_vdpa_virtq_modify(virtq, 1))
 		goto error;
 	virtq->enable = 1;
+	virtq->priv = priv;
+	/* Be sure notifications are not missed during configuration. */
+	claim_zero(rte_vhost_enable_guest_notification(priv->vid, index, 1));
+	rte_write32(virtq->index, priv->virtq_db_addr);
+	/* Setup doorbell mapping. */
 	virtq->intr_handle.fd = vq.kickfd;
 	virtq->intr_handle.type = RTE_INTR_HANDLE_EXT;
 	if (rte_intr_callback_register(&virtq->intr_handle,
-- 
1.8.3.1
^ permalink raw reply	[flat|nested] 174+ messages in thread
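The MLX5_VDPA_USED_RING_LEN macro in mlx5_vdpa_lm.c above computes how many bytes of the used ring are logged as dirty when the queues are suspended. The sketch below restates that arithmetic with a local stand-in for the used-ring element, assuming the split-virtqueue layout in which the three 16-bit fields are flags, idx and the trailing avail_event. The names used_elem and used_ring_len are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for the virtio used-ring element (id + len). */
struct used_elem {
	uint32_t id;
	uint32_t len;
};

/*
 * Bytes covered by a used ring with `size` entries:
 * the elements plus three 16-bit fields.
 */
static size_t
used_ring_len(uint16_t size)
{
	assert(size != 0);
	return (size_t)size * sizeof(struct used_elem) +
	       sizeof(uint16_t) * 3;
}
```

For a 256-entry queue this gives 256 * 8 + 6 = 2054 bytes.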
* [dpdk-dev] [PATCH v2 12/13] vdpa/mlx5: support close and config operations
  2020-01-29 10:08 ` [dpdk-dev] [PATCH v2 00/13] Introduce mlx5 vDPA driver Matan Azrad
                     ` (10 preceding siblings ...)
  2020-01-29 10:09   ` [dpdk-dev] [PATCH v2 11/13] vdpa/mlx5: support live migration Matan Azrad
@ 2020-01-29 10:09   ` Matan Azrad
  2020-01-31 16:06     ` Maxime Coquelin
  2020-01-29 10:09   ` [dpdk-dev] [PATCH v2 13/13] vdpa/mlx5: disable ROCE Matan Azrad
                     ` (2 subsequent siblings)
  14 siblings, 1 reply; 174+ messages in thread
From: Matan Azrad @ 2020-01-29 10:09 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Maxime Coquelin
Support dev_conf and dev_close operations.
These operations allow vdpa traffic.
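The intended lifecycle can be sketched as a small state machine: configuring an already configured device closes it first, and a failure in any setup step releases everything through the same close path. The code below is only an illustration under those assumptions; dev_state, dev_config, dev_close and the STEP_* names are hypothetical stand-ins for the driver's memory, virtq, steering and CQE event setup steps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { STEP_MEM, STEP_VIRTQS, STEP_STEER, STEP_CQE, STEP_MAX };

struct dev_state {
	uint8_t configured;      /* Set only after all steps succeeded. */
	int vid;                 /* vhost device id. */
	unsigned int steps_done; /* Bitmask of completed setup steps. */
};

/* Release everything; safe to call on a partially configured device. */
static int
dev_close(struct dev_state *dev)
{
	assert(dev != NULL);
	dev->steps_done = 0;
	dev->configured = 0;
	dev->vid = 0;
	return 0;
}

/* Run the setup steps in order; fail_step (or -1) simulates an error. */
static int
dev_config(struct dev_state *dev, int vid, int fail_step)
{
	int step;

	assert(dev != NULL);
	if (dev->configured && dev_close(dev))
		return -1;
	dev->vid = vid;
	for (step = 0; step < STEP_MAX; step++) {
		if (step == fail_step) {
			dev_close(dev);
			return -1;
		}
		dev->steps_done |= 1u << step;
	}
	dev->configured = 1;
	return 0;
}
```

Routing the error path through the close function keeps a single teardown sequence, at the cost of requiring every release step to tolerate resources that were never created.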
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c | 58 ++++++++++++++++++++++++++++++++++++++++---
 drivers/vdpa/mlx5/mlx5_vdpa.h |  1 +
 2 files changed, 55 insertions(+), 4 deletions(-)
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index 4ce0ba0..c8ef3b4 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -167,12 +167,59 @@
 	return 0;
 }
 
+static int
+mlx5_vdpa_dev_close(int vid)
+{
+	int did = rte_vhost_get_vdpa_device_id(vid);
+	struct mlx5_vdpa_priv *priv = mlx5_vdpa_find_priv_resource_by_did(did);
+	int ret = 0;
+
+	if (priv == NULL) {
+		DRV_LOG(ERR, "Invalid device id: %d.", did);
+		return -1;
+	}
+	if (priv->configured)
+		ret |= mlx5_vdpa_lm_log(priv);
+	mlx5_vdpa_cqe_event_unset(priv);
+	ret |= mlx5_vdpa_steer_unset(priv);
+	mlx5_vdpa_virtqs_release(priv);
+	mlx5_vdpa_event_qp_global_release(priv);
+	mlx5_vdpa_mem_dereg(priv);
+	priv->configured = 0;
+	priv->vid = 0;
+	return ret;
+}
+
+static int
+mlx5_vdpa_dev_config(int vid)
+{
+	int did = rte_vhost_get_vdpa_device_id(vid);
+	struct mlx5_vdpa_priv *priv = mlx5_vdpa_find_priv_resource_by_did(did);
+
+	if (priv == NULL) {
+		DRV_LOG(ERR, "Invalid device id: %d.", did);
+		return -EINVAL;
+	}
+	if (priv->configured && mlx5_vdpa_dev_close(vid)) {
+		DRV_LOG(ERR, "Failed to reconfigure vid %d.", vid);
+		return -1;
+	}
+	priv->vid = vid;
+	if (mlx5_vdpa_mem_register(priv) || mlx5_vdpa_virtqs_prepare(priv) ||
+	    mlx5_vdpa_steer_setup(priv) || mlx5_vdpa_cqe_event_setup(priv)) {
+		mlx5_vdpa_dev_close(vid);
+		return -1;
+	}
+	priv->configured = 1;
+	return 0;
+}
+
 static struct rte_vdpa_dev_ops mlx5_vdpa_ops = {
 	.get_queue_num = mlx5_vdpa_get_queue_num,
 	.get_features = mlx5_vdpa_get_vdpa_features,
 	.get_protocol_features = mlx5_vdpa_get_protocol_features,
-	.dev_conf = NULL,
-	.dev_close = NULL,
+	.dev_conf = mlx5_vdpa_dev_config,
+	.dev_close = mlx5_vdpa_dev_close,
 	.set_vring_state = mlx5_vdpa_set_vring_state,
 	.set_features = mlx5_vdpa_features_set,
 	.migration_done = NULL,
@@ -320,12 +367,15 @@
 			break;
 		}
 	}
-	if (found) {
+	if (found)
 		TAILQ_REMOVE(&priv_list, priv, next);
+	pthread_mutex_unlock(&priv_list_lock);
+	if (found) {
+		if (priv->configured)
+			mlx5_vdpa_dev_close(priv->vid);
 		mlx5_glue->close_device(priv->ctx);
 		rte_free(priv);
 	}
-	pthread_mutex_unlock(&priv_list_lock);
 	return 0;
 }
 
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 70264e4..75e96d6 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -92,6 +92,7 @@ struct mlx5_vdpa_steer {
 
 struct mlx5_vdpa_priv {
 	TAILQ_ENTRY(mlx5_vdpa_priv) next;
+	uint8_t configured;
 	int id; /* vDPA device id. */
 	int vid; /* vhost device id. */
 	struct ibv_context *ctx; /* Device context. */
-- 
1.8.3.1
^ permalink raw reply	[flat|nested] 174+ messages in thread
* [dpdk-dev] [PATCH v2 13/13] vdpa/mlx5: disable ROCE
  2020-01-29 10:08 ` [dpdk-dev] [PATCH v2 00/13] Introduce mlx5 vDPA driver Matan Azrad
                     ` (11 preceding siblings ...)
  2020-01-29 10:09   ` [dpdk-dev] [PATCH v2 12/13] vdpa/mlx5: support close and config operations Matan Azrad
@ 2020-01-29 10:09   ` Matan Azrad
  2020-01-31 16:42     ` Maxime Coquelin
  2020-02-02 16:03   ` [dpdk-dev] [PATCH v3 00/13] Introduce mlx5 vDPA driver Matan Azrad
  2020-02-03 13:24   ` [dpdk-dev] [PATCH v2 " Maxime Coquelin
  14 siblings, 1 reply; 174+ messages in thread
From: Matan Azrad @ 2020-01-29 10:09 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Maxime Coquelin
In order to support virtio queue creation by the FW, ROCE mode
should be disabled in the device.
Do it by Netlink, equivalent to the devlink tool commands:
	1. devlink dev param set pci/[pci] name enable_roce value false
	   cmode driverinit
	2. devlink dev reload pci/[pci]
Or by sysfs, equivalent to:
	echo 0 > /sys/bus/pci/devices/[pci]/roce_enable
The IB device is matched again after ROCE disabling.
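Re-matching the IB device after the reload amounts to a bounded polling loop. A minimal sketch, assuming a caller-supplied lookup callback and a fixed delay between attempts, is shown below; wait_for_device, lookup_fn and lookup_after_n_calls are hypothetical names used only for this illustration.

```c
#define _DEFAULT_SOURCE /* For usleep() under strict C standards. */
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Lookup callback: returns the device handle, or NULL if not found yet. */
typedef void *(*lookup_fn)(void *arg);

/* Poll lookup() up to `retries` times, sleeping delay_us between tries. */
static void *
wait_for_device(lookup_fn lookup, void *arg, int retries,
		unsigned int delay_us)
{
	assert(lookup != NULL);
	while (retries-- > 0) {
		void *dev = lookup(arg);

		if (dev != NULL)
			return dev;
		usleep(delay_us);
	}
	return NULL;
}

/* Demo lookup: fails while the counter in arg is positive. */
static void *
lookup_after_n_calls(void *arg)
{
	int *remaining = arg;

	return (*remaining)-- > 0 ? NULL : arg;
}
```

With 20 retries and a 1000 microsecond delay, as in the patch, the loop gives up after roughly 20 ms; whether that is long enough for a devlink reload depends on the device and is not claimed here.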
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/vdpa/mlx5/Makefile    |   2 +-
 drivers/vdpa/mlx5/meson.build |   2 +-
 drivers/vdpa/mlx5/mlx5_vdpa.c | 192 ++++++++++++++++++++++++++++++++++--------
 3 files changed, 161 insertions(+), 35 deletions(-)
diff --git a/drivers/vdpa/mlx5/Makefile b/drivers/vdpa/mlx5/Makefile
index 4d1f528..5cdcdd4 100644
--- a/drivers/vdpa/mlx5/Makefile
+++ b/drivers/vdpa/mlx5/Makefile
@@ -29,7 +29,7 @@ CFLAGS += -D_XOPEN_SOURCE=600
 CFLAGS += $(WERROR_FLAGS)
 CFLAGS += -Wno-strict-prototypes
 LDLIBS += -lrte_common_mlx5
-LDLIBS += -lrte_eal -lrte_vhost -lrte_kvargs -lrte_bus_pci -lrte_sched
+LDLIBS += -lrte_eal -lrte_vhost -lrte_kvargs -lrte_pci -lrte_bus_pci -lrte_sched
 
 # A few warnings cannot be avoided in external headers.
 CFLAGS += -Wno-error=cast-qual
diff --git a/drivers/vdpa/mlx5/meson.build b/drivers/vdpa/mlx5/meson.build
index 2e521b8..e5236b3 100644
--- a/drivers/vdpa/mlx5/meson.build
+++ b/drivers/vdpa/mlx5/meson.build
@@ -9,7 +9,7 @@ endif
 
 fmt_name = 'mlx5_vdpa'
 allow_experimental_apis = true
-deps += ['hash', 'common_mlx5', 'vhost', 'bus_pci', 'eal', 'sched']
+deps += ['hash', 'common_mlx5', 'vhost', 'pci', 'bus_pci', 'eal', 'sched']
 sources = files(
 	'mlx5_vdpa.c',
 	'mlx5_vdpa_mem.c',
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index c8ef3b4..e098be9 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -1,15 +1,19 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright 2019 Mellanox Technologies, Ltd
  */
+#include <unistd.h>
+
 #include <rte_malloc.h>
 #include <rte_log.h>
 #include <rte_errno.h>
 #include <rte_bus_pci.h>
+#include <rte_pci.h>
 
 #include <mlx5_glue.h>
 #include <mlx5_common.h>
 #include <mlx5_devx_cmds.h>
 #include <mlx5_prm.h>
+#include <mlx5_nl.h>
 
 #include "mlx5_vdpa_utils.h"
 #include "mlx5_vdpa.h"
@@ -228,6 +232,145 @@
 	.get_notify_area = NULL,
 };
 
+static struct ibv_device *
+mlx5_vdpa_get_ib_device_match(struct rte_pci_addr *addr)
+{
+	int n;
+	struct ibv_device **ibv_list = mlx5_glue->get_device_list(&n);
+	struct ibv_device *ibv_match = NULL;
+
+	if (!ibv_list) {
+		rte_errno = ENOSYS;
+		return NULL;
+	}
+	while (n-- > 0) {
+		struct rte_pci_addr pci_addr;
+
+		DRV_LOG(DEBUG, "Checking device \"%s\"..", ibv_list[n]->name);
+		if (mlx5_dev_to_pci_addr(ibv_list[n]->ibdev_path, &pci_addr))
+			continue;
+		if (memcmp(addr, &pci_addr, sizeof(pci_addr)))
+			continue;
+		ibv_match = ibv_list[n];
+		break;
+	}
+	if (!ibv_match)
+		rte_errno = ENOENT;
+	mlx5_glue->free_device_list(ibv_list);
+	return ibv_match;
+}
+
+/* Try to disable ROCE by Netlink/Devlink. */
+static int
+mlx5_vdpa_nl_roce_disable(const char *addr)
+{
+	int nlsk_fd = mlx5_nl_init(NETLINK_GENERIC);
+	int devlink_id;
+	int enable;
+	int ret;
+
+	if (nlsk_fd < 0)
+		return nlsk_fd;
+	devlink_id = mlx5_nl_devlink_family_id_get(nlsk_fd);
+	if (devlink_id < 0) {
+		ret = devlink_id;
+		DRV_LOG(DEBUG, "Failed to get devlink id for ROCE operations by"
+			" Netlink.");
+		goto close;
+	}
+	ret = mlx5_nl_enable_roce_get(nlsk_fd, devlink_id, addr, &enable);
+	if (ret) {
+		DRV_LOG(DEBUG, "Failed to get ROCE enable by Netlink: %d.",
+			ret);
+		goto close;
+	} else if (!enable) {
+		DRV_LOG(INFO, "ROCE is already disabled (Netlink).");
+		goto close;
+	}
+	ret = mlx5_nl_enable_roce_set(nlsk_fd, devlink_id, addr, 0);
+	if (ret)
+		DRV_LOG(DEBUG, "Failed to disable ROCE by Netlink: %d.", ret);
+	else
+		DRV_LOG(INFO, "ROCE is disabled by Netlink successfully.");
+close:
+	close(nlsk_fd);
+	return ret;
+}
+
+/* Try to disable ROCE by sysfs. */
+static int
+mlx5_vdpa_sys_roce_disable(const char *addr)
+{
+	FILE *file_o;
+	int enable;
+	int ret;
+
+	MKSTR(file_p, "/sys/bus/pci/devices/%s/roce_enable", addr);
+	file_o = fopen(file_p, "rb");
+	if (!file_o) {
+		rte_errno = ENOTSUP;
+		return -ENOTSUP;
+	}
+	ret = fscanf(file_o, "%d", &enable);
+	if (ret != 1) {
+		rte_errno = EINVAL;
+		ret = -EINVAL;
+		goto close;
+	} else if (!enable) {
+		ret = 0;
+		DRV_LOG(INFO, "ROCE is already disabled (sysfs).");
+		goto close;
+	}
+	fclose(file_o);
+	file_o = fopen(file_p, "wb");
+	if (!file_o) {
+		rte_errno = ENOTSUP;
+		return -ENOTSUP;
+	}
+	fprintf(file_o, "0\n");
+	ret = 0;
+close:
+	if (ret)
+		DRV_LOG(DEBUG, "Failed to disable ROCE by sysfs: %d.", ret);
+	else
+		DRV_LOG(INFO, "ROCE is disabled by sysfs successfully.");
+	fclose(file_o);
+	return ret;
+}
+
+#define MLX5_VDPA_MAX_RETRIES 20
+#define MLX5_VDPA_USEC 1000
+static int
+mlx5_vdpa_roce_disable(struct rte_pci_addr *addr, struct ibv_device **ibv)
+{
+	char addr_name[64] = {0};
+
+	rte_pci_device_name(addr, addr_name, sizeof(addr_name));
+	/* First try to disable ROCE by Netlink, then fall back to sysfs. */
+	if (mlx5_vdpa_nl_roce_disable(addr_name) == 0 ||
+	    mlx5_vdpa_sys_roce_disable(addr_name) == 0) {
+		/*
+		 * ROCE was disabled successfully; wait for the IB device to
+		 * appear again after reload.
+		 */
+		int r;
+		struct ibv_device *ibv_new;
+
+		for (r = MLX5_VDPA_MAX_RETRIES; r; r--) {
+			ibv_new = mlx5_vdpa_get_ib_device_match(addr);
+			if (ibv_new) {
+				*ibv = ibv_new;
+				return 0;
+			}
+			usleep(MLX5_VDPA_USEC);
+		}
+		DRV_LOG(ERR, "Cannot match device %s after ROCE disable, "
+			"retries exceed %d", addr_name, MLX5_VDPA_MAX_RETRIES);
+		rte_errno = EAGAIN;
+	}
+	return -rte_errno;
+}
+
 /**
  * DPDK callback to register a PCI device.
  *
@@ -245,8 +388,7 @@
 mlx5_vdpa_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 		    struct rte_pci_device *pci_dev __rte_unused)
 {
-	struct ibv_device **ibv_list;
-	struct ibv_device *ibv_match = NULL;
+	struct ibv_device *ibv;
 	struct mlx5_vdpa_priv *priv = NULL;
 	struct ibv_context *ctx = NULL;
 	struct mlx5_hca_attr attr;
@@ -257,42 +399,26 @@
 			" driver.");
 		return 1;
 	}
-	errno = 0;
-	ibv_list = mlx5_glue->get_device_list(&ret);
-	if (!ibv_list) {
-		rte_errno = errno;
-		DRV_LOG(ERR, "Failed to get device list, is ib_uverbs loaded?");
-		return -ENOSYS;
-	}
-	while (ret-- > 0) {
-		struct rte_pci_addr pci_addr;
-
-		DRV_LOG(DEBUG, "Checking device \"%s\"..", ibv_list[ret]->name);
-		if (mlx5_dev_to_pci_addr(ibv_list[ret]->ibdev_path, &pci_addr))
-			continue;
-		if (pci_dev->addr.domain != pci_addr.domain ||
-		    pci_dev->addr.bus != pci_addr.bus ||
-		    pci_dev->addr.devid != pci_addr.devid ||
-		    pci_dev->addr.function != pci_addr.function)
-			continue;
-		DRV_LOG(INFO, "PCI information matches for device \"%s\".",
-			ibv_list[ret]->name);
-		ibv_match = ibv_list[ret];
-		break;
-	}
-	mlx5_glue->free_device_list(ibv_list);
-	if (!ibv_match) {
+	ibv = mlx5_vdpa_get_ib_device_match(&pci_dev->addr);
+	if (!ibv) {
 		DRV_LOG(ERR, "No matching IB device for PCI slot "
-			"%" SCNx32 ":%" SCNx8 ":%" SCNx8 ".%" SCNx8 ".",
-			pci_dev->addr.domain, pci_dev->addr.bus,
-			pci_dev->addr.devid, pci_dev->addr.function);
+			PCI_PRI_FMT ".", pci_dev->addr.domain,
+			pci_dev->addr.bus, pci_dev->addr.devid,
+			pci_dev->addr.function);
 		rte_errno = ENOENT;
 		return -rte_errno;
+	} else {
+		DRV_LOG(INFO, "PCI information matches for device \"%s\".",
+			ibv->name);
+	}
+	if (mlx5_vdpa_roce_disable(&pci_dev->addr, &ibv) != 0) {
+		DRV_LOG(WARNING, "Failed to disable ROCE for \"%s\".",
+			ibv->name);
+		//return -rte_errno;
 	}
-	ctx = mlx5_glue->dv_open_device(ibv_match);
+	ctx = mlx5_glue->dv_open_device(ibv);
 	if (!ctx) {
-		DRV_LOG(ERR, "Failed to open IB device \"%s\".",
-			ibv_match->name);
+		DRV_LOG(ERR, "Failed to open IB device \"%s\".", ibv->name);
 		rte_errno = ENODEV;
 		return -rte_errno;
 	}
-- 
1.8.3.1
^ permalink raw reply	[flat|nested] 174+ messages in thread
* [dpdk-dev] [PATCH v4 00/25] Introduce mlx5 common library
  2020-01-28 16:27   ` [dpdk-dev] [PATCH v3 00/25] Introduce mlx5 common library Matan Azrad
                       ` (24 preceding siblings ...)
  2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 25/25] common/mlx5: support ROCE disable through Netlink Matan Azrad
@ 2020-01-29 12:38     ` Matan Azrad
  2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 01/25] net/mlx5: separate DevX commands interface Matan Azrad
                         ` (25 more replies)
  25 siblings, 26 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-29 12:38 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
Steps:
- Prepare net/mlx5 for code sharing.
- Introduce new common lib for mlx5 devices.
- Share code from net/mlx5 to common/mlx5.
v2:
- Split the patches into 2 series: this is the first one, for the common directory and vDPA preparation;
  the second will be sent later for the new vDPA driver part.
- Fix spelling and per-patch compilation issues.
- Move to claim_zero instead of pure asserts.
- Improve title names.
v3:
rebase.
v4:
Change devargs argument to get class name.
Only the last 4 patches here were changed.
Matan Azrad (25):
  net/mlx5: separate DevX commands interface
  drivers: introduce mlx5 common library
  common/mlx5: share the mlx5 glue reference
  common/mlx5: share mlx5 PCI device detection
  common/mlx5: share mlx5 devices information
  common/mlx5: share CQ entry check
  common/mlx5: add query vDPA DevX capabilities
  common/mlx5: glue null memory region allocation
  common/mlx5: support DevX indirect mkey creation
  common/mlx5: glue event queue query
  common/mlx5: glue event interrupt commands
  common/mlx5: glue UAR allocation
  common/mlx5: add DevX command to create CQ
  common/mlx5: glue VAR allocation
  common/mlx5: add DevX virtq commands
  common/mlx5: add support for DevX QP operations
  common/mlx5: allow type configuration for DevX RQT
  common/mlx5: add TIR field constants
  common/mlx5: add DevX command to modify RQT
  common/mlx5: get DevX capability for max RQT size
  net/mlx5: select driver by class device argument
  net/mlx5: separate Netlink command interface
  net/mlx5: reduce Netlink commands dependencies
  common/mlx5: share Netlink commands
  common/mlx5: support ROCE disable through Netlink
 MAINTAINERS                                     |    1 +
 drivers/common/Makefile                         |    4 +
 drivers/common/meson.build                      |    2 +-
 drivers/common/mlx5/Makefile                    |  347 ++++
 drivers/common/mlx5/meson.build                 |  210 ++
 drivers/common/mlx5/mlx5_common.c               |  332 +++
 drivers/common/mlx5/mlx5_common.h               |  223 ++
 drivers/common/mlx5/mlx5_common_utils.h         |   20 +
 drivers/common/mlx5/mlx5_devx_cmds.c            | 1530 ++++++++++++++
 drivers/common/mlx5/mlx5_devx_cmds.h            |  351 ++++
 drivers/common/mlx5/mlx5_glue.c                 | 1296 ++++++++++++
 drivers/common/mlx5/mlx5_glue.h                 |  305 +++
 drivers/common/mlx5/mlx5_nl.c                   | 1699 +++++++++++++++
 drivers/common/mlx5/mlx5_nl.h                   |   63 +
 drivers/common/mlx5/mlx5_prm.h                  | 2542 +++++++++++++++++++++++
 drivers/common/mlx5/rte_common_mlx5_version.map |   51 +
 drivers/net/mlx5/Makefile                       |  307 +--
 drivers/net/mlx5/meson.build                    |  257 +--
 drivers/net/mlx5/mlx5.c                         |  197 +-
 drivers/net/mlx5/mlx5.h                         |  326 +--
 drivers/net/mlx5/mlx5_defs.h                    |    8 -
 drivers/net/mlx5/mlx5_devx_cmds.c               |  969 ---------
 drivers/net/mlx5/mlx5_ethdev.c                  |  161 +-
 drivers/net/mlx5/mlx5_flow.c                    |   12 +-
 drivers/net/mlx5/mlx5_flow.h                    |    3 +-
 drivers/net/mlx5/mlx5_flow_dv.c                 |   12 +-
 drivers/net/mlx5/mlx5_flow_meter.c              |    2 +
 drivers/net/mlx5/mlx5_flow_verbs.c              |    7 +-
 drivers/net/mlx5/mlx5_glue.c                    | 1150 ----------
 drivers/net/mlx5/mlx5_glue.h                    |  264 ---
 drivers/net/mlx5/mlx5_mac.c                     |   16 +-
 drivers/net/mlx5/mlx5_mr.c                      |    3 +-
 drivers/net/mlx5/mlx5_nl.c                      | 1402 -------------
 drivers/net/mlx5/mlx5_prm.h                     | 1888 -----------------
 drivers/net/mlx5/mlx5_rss.c                     |    2 +-
 drivers/net/mlx5/mlx5_rxmode.c                  |   12 +-
 drivers/net/mlx5/mlx5_rxq.c                     |    7 +-
 drivers/net/mlx5/mlx5_rxtx.c                    |    7 +-
 drivers/net/mlx5/mlx5_rxtx.h                    |   46 +-
 drivers/net/mlx5/mlx5_rxtx_vec.c                |    5 +-
 drivers/net/mlx5/mlx5_rxtx_vec.h                |    3 +-
 drivers/net/mlx5/mlx5_rxtx_vec_altivec.h        |    5 +-
 drivers/net/mlx5/mlx5_rxtx_vec_neon.h           |    5 +-
 drivers/net/mlx5/mlx5_rxtx_vec_sse.h            |    5 +-
 drivers/net/mlx5/mlx5_stats.c                   |    5 +-
 drivers/net/mlx5/mlx5_txq.c                     |    7 +-
 drivers/net/mlx5/mlx5_utils.h                   |   79 +-
 drivers/net/mlx5/mlx5_vlan.c                    |  137 +-
 mk/rte.app.mk                                   |    1 +
 49 files changed, 9286 insertions(+), 7000 deletions(-)
 create mode 100644 drivers/common/mlx5/Makefile
 create mode 100644 drivers/common/mlx5/meson.build
 create mode 100644 drivers/common/mlx5/mlx5_common.c
 create mode 100644 drivers/common/mlx5/mlx5_common.h
 create mode 100644 drivers/common/mlx5/mlx5_common_utils.h
 create mode 100644 drivers/common/mlx5/mlx5_devx_cmds.c
 create mode 100644 drivers/common/mlx5/mlx5_devx_cmds.h
 create mode 100644 drivers/common/mlx5/mlx5_glue.c
 create mode 100644 drivers/common/mlx5/mlx5_glue.h
 create mode 100644 drivers/common/mlx5/mlx5_nl.c
 create mode 100644 drivers/common/mlx5/mlx5_nl.h
 create mode 100644 drivers/common/mlx5/mlx5_prm.h
 create mode 100644 drivers/common/mlx5/rte_common_mlx5_version.map
 delete mode 100644 drivers/net/mlx5/mlx5_devx_cmds.c
 delete mode 100644 drivers/net/mlx5/mlx5_glue.c
 delete mode 100644 drivers/net/mlx5/mlx5_glue.h
 delete mode 100644 drivers/net/mlx5/mlx5_nl.c
 delete mode 100644 drivers/net/mlx5/mlx5_prm.h
-- 
1.8.3.1
^ permalink raw reply	[flat|nested] 174+ messages in thread
* [dpdk-dev] [PATCH v4 01/25] net/mlx5: separate DevX commands interface
  2020-01-29 12:38     ` [dpdk-dev] [PATCH v4 00/25] Introduce mlx5 common library Matan Azrad
@ 2020-01-29 12:38       ` Matan Azrad
  2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 02/25] drivers: introduce mlx5 common library Matan Azrad
                         ` (24 subsequent siblings)
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-29 12:38 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
The DevX commands interface is declared in the mlx5.h file together
with many other PMD interfaces.
To prepare for sharing the DevX commands between different PMDs,
this patch moves the DevX interface to a new file called mlx5_devx_cmds.h.
Also remove the dependency of the DevX commands on the shared device
structure.
Switch the DevX commands logging from the mlx5 driver log mechanism
to the EAL log mechanism.
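The dependency removal can be pictured with a minimal, self-contained
sketch. It mirrors the new mlx5_devx_cmd_flow_dump() signature from this
patch, but the names flow_dump() and dump_domain() are hypothetical:
dump_domain() only stands in for the glue-layer call, and the sketch
returns -EINVAL for a missing RX/TX domain where the patch uses assert().
The helper receives only the opaque domain handles it needs, so it no
longer has to see the driver-private shared structure.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for the glue-layer dump call (not the real API). */
static int
dump_domain(FILE *file, void *domain)
{
	assert(file != NULL && domain != NULL);
	return fprintf(file, "domain %p\n", domain) < 0 ? -EIO : 0;
}

/*
 * Dump the FDB (optional), RX and TX domains to a file stream.
 * Only opaque handles are passed in, so this function does not depend
 * on the layout of any driver-private structure.
 */
int
flow_dump(void *fdb_domain, void *rx_domain, void *tx_domain, FILE *file)
{
	int ret;

	if (rx_domain == NULL || tx_domain == NULL || file == NULL)
		return -EINVAL;
	if (fdb_domain != NULL) {
		ret = dump_domain(file, fdb_domain);
		if (ret)
			return ret;
	}
	ret = dump_domain(file, rx_domain);
	if (ret)
		return ret;
	return dump_domain(file, tx_domain);
}
```

A caller that owns the shared structure unpacks the three handles itself
and passes them in, which is what the mlx5_flow.c hunk in this patch does.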
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.c           |   1 +
 drivers/net/mlx5/mlx5.h           | 221 +-----------------------------------
 drivers/net/mlx5/mlx5_devx_cmds.c |  33 +++---
 drivers/net/mlx5/mlx5_devx_cmds.h | 231 ++++++++++++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_ethdev.c    |   1 +
 drivers/net/mlx5/mlx5_flow.c      |   5 +-
 drivers/net/mlx5/mlx5_flow_dv.c   |   1 +
 drivers/net/mlx5/mlx5_rxq.c       |   1 +
 drivers/net/mlx5/mlx5_rxtx.c      |   1 +
 drivers/net/mlx5/mlx5_txq.c       |   1 +
 drivers/net/mlx5/mlx5_vlan.c      |   1 +
 11 files changed, 263 insertions(+), 234 deletions(-)
 create mode 100644 drivers/net/mlx5/mlx5_devx_cmds.h
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 2049370..7126edf 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -46,6 +46,7 @@
 #include "mlx5_glue.h"
 #include "mlx5_mr.h"
 #include "mlx5_flow.h"
+#include "mlx5_devx_cmds.h"
 
 /* Device parameter to enable RX completion queue compression. */
 #define MLX5_RXQ_CQE_COMP_EN "rxq_cqe_comp_en"
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 5818349..4d0485d 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -38,6 +38,7 @@
 #include "mlx5_defs.h"
 #include "mlx5_glue.h"
 #include "mlx5_prm.h"
+#include "mlx5_devx_cmds.h"
 
 enum {
 	PCI_VENDOR_ID_MELLANOX = 0x15b3,
@@ -156,62 +157,6 @@ struct mlx5_stats_ctrl {
 	uint64_t imissed_base;
 };
 
-/* devX creation object */
-struct mlx5_devx_obj {
-	struct mlx5dv_devx_obj *obj; /* The DV object. */
-	int id; /* The object ID. */
-};
-
-struct mlx5_devx_mkey_attr {
-	uint64_t addr;
-	uint64_t size;
-	uint32_t umem_id;
-	uint32_t pd;
-};
-
-/* HCA qos attributes. */
-struct mlx5_hca_qos_attr {
-	uint32_t sup:1;	/* Whether QOS is supported. */
-	uint32_t srtcm_sup:1; /* Whether srTCM mode is supported. */
-	uint32_t flow_meter_reg_share:1;
-	/* Whether reg_c share is supported. */
-	uint8_t log_max_flow_meter;
-	/* Power of the maximum supported meters. */
-	uint8_t flow_meter_reg_c_ids;
-	/* Bitmap of the reg_Cs available for flow meter to use. */
-
-};
-
-/* HCA supports this number of time periods for LRO. */
-#define MLX5_LRO_NUM_SUPP_PERIODS 4
-
-/* HCA attributes. */
-struct mlx5_hca_attr {
-	uint32_t eswitch_manager:1;
-	uint32_t flow_counters_dump:1;
-	uint8_t flow_counter_bulk_alloc_bitmap;
-	uint32_t eth_net_offloads:1;
-	uint32_t eth_virt:1;
-	uint32_t wqe_vlan_insert:1;
-	uint32_t wqe_inline_mode:2;
-	uint32_t vport_inline_mode:3;
-	uint32_t tunnel_stateless_geneve_rx:1;
-	uint32_t geneve_max_opt_len:1; /* 0x0: 14DW, 0x1: 63DW */
-	uint32_t tunnel_stateless_gtp:1;
-	uint32_t lro_cap:1;
-	uint32_t tunnel_lro_gre:1;
-	uint32_t tunnel_lro_vxlan:1;
-	uint32_t lro_max_msg_sz_mode:2;
-	uint32_t lro_timer_supported_periods[MLX5_LRO_NUM_SUPP_PERIODS];
-	uint32_t flex_parser_protocols;
-	uint32_t hairpin:1;
-	uint32_t log_max_hairpin_queues:5;
-	uint32_t log_max_hairpin_wq_data_sz:5;
-	uint32_t log_max_hairpin_num_packets:5;
-	uint32_t vhca_id:16;
-	struct mlx5_hca_qos_attr qos;
-};
-
 /* Flow list . */
 TAILQ_HEAD(mlx5_flows, rte_flow);
 
@@ -291,133 +236,6 @@ struct mlx5_dev_config {
 	struct mlx5_lro_config lro; /* LRO configuration. */
 };
 
-struct mlx5_devx_wq_attr {
-	uint32_t wq_type:4;
-	uint32_t wq_signature:1;
-	uint32_t end_padding_mode:2;
-	uint32_t cd_slave:1;
-	uint32_t hds_skip_first_sge:1;
-	uint32_t log2_hds_buf_size:3;
-	uint32_t page_offset:5;
-	uint32_t lwm:16;
-	uint32_t pd:24;
-	uint32_t uar_page:24;
-	uint64_t dbr_addr;
-	uint32_t hw_counter;
-	uint32_t sw_counter;
-	uint32_t log_wq_stride:4;
-	uint32_t log_wq_pg_sz:5;
-	uint32_t log_wq_sz:5;
-	uint32_t dbr_umem_valid:1;
-	uint32_t wq_umem_valid:1;
-	uint32_t log_hairpin_num_packets:5;
-	uint32_t log_hairpin_data_sz:5;
-	uint32_t single_wqe_log_num_of_strides:4;
-	uint32_t two_byte_shift_en:1;
-	uint32_t single_stride_log_num_of_bytes:3;
-	uint32_t dbr_umem_id;
-	uint32_t wq_umem_id;
-	uint64_t wq_umem_offset;
-};
-
-/* Create RQ attributes structure, used by create RQ operation. */
-struct mlx5_devx_create_rq_attr {
-	uint32_t rlky:1;
-	uint32_t delay_drop_en:1;
-	uint32_t scatter_fcs:1;
-	uint32_t vsd:1;
-	uint32_t mem_rq_type:4;
-	uint32_t state:4;
-	uint32_t flush_in_error_en:1;
-	uint32_t hairpin:1;
-	uint32_t user_index:24;
-	uint32_t cqn:24;
-	uint32_t counter_set_id:8;
-	uint32_t rmpn:24;
-	struct mlx5_devx_wq_attr wq_attr;
-};
-
-/* Modify RQ attributes structure, used by modify RQ operation. */
-struct mlx5_devx_modify_rq_attr {
-	uint32_t rqn:24;
-	uint32_t rq_state:4; /* Current RQ state. */
-	uint32_t state:4; /* Required RQ state. */
-	uint32_t scatter_fcs:1;
-	uint32_t vsd:1;
-	uint32_t counter_set_id:8;
-	uint32_t hairpin_peer_sq:24;
-	uint32_t hairpin_peer_vhca:16;
-	uint64_t modify_bitmask;
-	uint32_t lwm:16; /* Contained WQ lwm. */
-};
-
-struct mlx5_rx_hash_field_select {
-	uint32_t l3_prot_type:1;
-	uint32_t l4_prot_type:1;
-	uint32_t selected_fields:30;
-};
-
-/* TIR attributes structure, used by TIR operations. */
-struct mlx5_devx_tir_attr {
-	uint32_t disp_type:4;
-	uint32_t lro_timeout_period_usecs:16;
-	uint32_t lro_enable_mask:4;
-	uint32_t lro_max_msg_sz:8;
-	uint32_t inline_rqn:24;
-	uint32_t rx_hash_symmetric:1;
-	uint32_t tunneled_offload_en:1;
-	uint32_t indirect_table:24;
-	uint32_t rx_hash_fn:4;
-	uint32_t self_lb_block:2;
-	uint32_t transport_domain:24;
-	uint32_t rx_hash_toeplitz_key[10];
-	struct mlx5_rx_hash_field_select rx_hash_field_selector_outer;
-	struct mlx5_rx_hash_field_select rx_hash_field_selector_inner;
-};
-
-/* RQT attributes structure, used by RQT operations. */
-struct mlx5_devx_rqt_attr {
-	uint32_t rqt_max_size:16;
-	uint32_t rqt_actual_size:16;
-	uint32_t rq_list[];
-};
-
-/* TIS attributes structure. */
-struct mlx5_devx_tis_attr {
-	uint32_t strict_lag_tx_port_affinity:1;
-	uint32_t tls_en:1;
-	uint32_t lag_tx_port_affinity:4;
-	uint32_t prio:4;
-	uint32_t transport_domain:24;
-};
-
-/* SQ attributes structure, used by SQ create operation. */
-struct mlx5_devx_create_sq_attr {
-	uint32_t rlky:1;
-	uint32_t cd_master:1;
-	uint32_t fre:1;
-	uint32_t flush_in_error_en:1;
-	uint32_t allow_multi_pkt_send_wqe:1;
-	uint32_t min_wqe_inline_mode:3;
-	uint32_t state:4;
-	uint32_t reg_umr:1;
-	uint32_t allow_swp:1;
-	uint32_t hairpin:1;
-	uint32_t user_index:24;
-	uint32_t cqn:24;
-	uint32_t packet_pacing_rate_limit_index:16;
-	uint32_t tis_lst_sz:16;
-	uint32_t tis_num:24;
-	struct mlx5_devx_wq_attr wq_attr;
-};
-
-/* SQ attributes structure, used by SQ modify operation. */
-struct mlx5_devx_modify_sq_attr {
-	uint32_t sq_state:4;
-	uint32_t state:4;
-	uint32_t hairpin_peer_rq:24;
-	uint32_t hairpin_peer_vhca:16;
-};
 
 /**
  * Type of object being allocated.
@@ -1026,43 +844,6 @@ void mlx5_vlan_vmwa_release(struct rte_eth_dev *dev,
 void mlx5_vlan_vmwa_acquire(struct rte_eth_dev *dev,
 			    struct mlx5_vf_vlan *vf_vlan);
 
-/* mlx5_devx_cmds.c */
-
-struct mlx5_devx_obj *mlx5_devx_cmd_flow_counter_alloc(struct ibv_context *ctx,
-						       uint32_t bulk_sz);
-int mlx5_devx_cmd_destroy(struct mlx5_devx_obj *obj);
-int mlx5_devx_cmd_flow_counter_query(struct mlx5_devx_obj *dcs,
-				     int clear, uint32_t n_counters,
-				     uint64_t *pkts, uint64_t *bytes,
-				     uint32_t mkey, void *addr,
-				     struct mlx5dv_devx_cmd_comp *cmd_comp,
-				     uint64_t async_id);
-int mlx5_devx_cmd_query_hca_attr(struct ibv_context *ctx,
-				 struct mlx5_hca_attr *attr);
-struct mlx5_devx_obj *mlx5_devx_cmd_mkey_create(struct ibv_context *ctx,
-					     struct mlx5_devx_mkey_attr *attr);
-int mlx5_devx_get_out_command_status(void *out);
-int mlx5_devx_cmd_qp_query_tis_td(struct ibv_qp *qp, uint32_t tis_num,
-				  uint32_t *tis_td);
-struct mlx5_devx_obj *mlx5_devx_cmd_create_rq(struct ibv_context *ctx,
-				struct mlx5_devx_create_rq_attr *rq_attr,
-				int socket);
-int mlx5_devx_cmd_modify_rq(struct mlx5_devx_obj *rq,
-			    struct mlx5_devx_modify_rq_attr *rq_attr);
-struct mlx5_devx_obj *mlx5_devx_cmd_create_tir(struct ibv_context *ctx,
-					struct mlx5_devx_tir_attr *tir_attr);
-struct mlx5_devx_obj *mlx5_devx_cmd_create_rqt(struct ibv_context *ctx,
-					struct mlx5_devx_rqt_attr *rqt_attr);
-struct mlx5_devx_obj *mlx5_devx_cmd_create_sq
-	(struct ibv_context *ctx, struct mlx5_devx_create_sq_attr *sq_attr);
-int mlx5_devx_cmd_modify_sq
-	(struct mlx5_devx_obj *sq, struct mlx5_devx_modify_sq_attr *sq_attr);
-struct mlx5_devx_obj *mlx5_devx_cmd_create_tis
-	(struct ibv_context *ctx, struct mlx5_devx_tis_attr *tis_attr);
-struct mlx5_devx_obj *mlx5_devx_cmd_create_td(struct ibv_context *ctx);
-
-int mlx5_devx_cmd_flow_dump(struct mlx5_ibv_shared *sh, FILE *file);
-
 /* mlx5_flow_meter.c */
 
 int mlx5_flow_meter_ops_get(struct rte_eth_dev *dev, void *arg);
diff --git a/drivers/net/mlx5/mlx5_devx_cmds.c b/drivers/net/mlx5/mlx5_devx_cmds.c
index 282d501..62ca590 100644
--- a/drivers/net/mlx5/mlx5_devx_cmds.c
+++ b/drivers/net/mlx5/mlx5_devx_cmds.c
@@ -1,13 +1,15 @@
 // SPDX-License-Identifier: BSD-3-Clause
 /* Copyright 2018 Mellanox Technologies, Ltd */
 
+#include <unistd.h>
+
 #include <rte_flow_driver.h>
 #include <rte_malloc.h>
-#include <unistd.h>
 
-#include "mlx5.h"
-#include "mlx5_glue.h"
 #include "mlx5_prm.h"
+#include "mlx5_devx_cmds.h"
+#include "mlx5_utils.h"
+
 
 /**
  * Allocate flow counters via devx interface.
@@ -936,8 +938,12 @@ struct mlx5_devx_obj *
 /**
  * Dump all flows to file.
  *
- * @param[in] sh
- *   Pointer to context.
+ * @param[in] fdb_domain
+ *   FDB domain.
+ * @param[in] rx_domain
+ *   RX domain.
+ * @param[in] tx_domain
+ *   TX domain.
  * @param[out] file
  *   Pointer to file stream.
  *
@@ -945,23 +951,24 @@ struct mlx5_devx_obj *
  *   0 on success, a nagative value otherwise.
  */
 int
-mlx5_devx_cmd_flow_dump(struct mlx5_ibv_shared *sh __rte_unused,
-			FILE *file __rte_unused)
+mlx5_devx_cmd_flow_dump(void *fdb_domain __rte_unused,
+			void *rx_domain __rte_unused,
+			void *tx_domain __rte_unused, FILE *file __rte_unused)
 {
 	int ret = 0;
 
 #ifdef HAVE_MLX5_DR_FLOW_DUMP
-	if (sh->fdb_domain) {
-		ret = mlx5_glue->dr_dump_domain(file, sh->fdb_domain);
+	if (fdb_domain) {
+		ret = mlx5_glue->dr_dump_domain(file, fdb_domain);
 		if (ret)
 			return ret;
 	}
-	assert(sh->rx_domain);
-	ret = mlx5_glue->dr_dump_domain(file, sh->rx_domain);
+	assert(rx_domain);
+	ret = mlx5_glue->dr_dump_domain(file, rx_domain);
 	if (ret)
 		return ret;
-	assert(sh->tx_domain);
-	ret = mlx5_glue->dr_dump_domain(file, sh->tx_domain);
+	assert(tx_domain);
+	ret = mlx5_glue->dr_dump_domain(file, tx_domain);
 #else
 	ret = ENOTSUP;
 #endif
diff --git a/drivers/net/mlx5/mlx5_devx_cmds.h b/drivers/net/mlx5/mlx5_devx_cmds.h
new file mode 100644
index 0000000..2d58d96
--- /dev/null
+++ b/drivers/net/mlx5/mlx5_devx_cmds.h
@@ -0,0 +1,231 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2019 Mellanox Technologies, Ltd
+ */
+
+#ifndef RTE_PMD_MLX5_DEVX_CMDS_H_
+#define RTE_PMD_MLX5_DEVX_CMDS_H_
+
+#include "mlx5_glue.h"
+
+
+/* devX creation object */
+struct mlx5_devx_obj {
+	struct mlx5dv_devx_obj *obj; /* The DV object. */
+	int id; /* The object ID. */
+};
+
+struct mlx5_devx_mkey_attr {
+	uint64_t addr;
+	uint64_t size;
+	uint32_t umem_id;
+	uint32_t pd;
+};
+
+/* HCA qos attributes. */
+struct mlx5_hca_qos_attr {
+	uint32_t sup:1;	/* Whether QOS is supported. */
+	uint32_t srtcm_sup:1; /* Whether srTCM mode is supported. */
+	uint32_t flow_meter_reg_share:1;
+	/* Whether reg_c share is supported. */
+	uint8_t log_max_flow_meter;
+	/* Power of the maximum supported meters. */
+	uint8_t flow_meter_reg_c_ids;
+	/* Bitmap of the reg_Cs available for flow meter to use. */
+
+};
+
+/* HCA supports this number of time periods for LRO. */
+#define MLX5_LRO_NUM_SUPP_PERIODS 4
+
+/* HCA attributes. */
+struct mlx5_hca_attr {
+	uint32_t eswitch_manager:1;
+	uint32_t flow_counters_dump:1;
+	uint8_t flow_counter_bulk_alloc_bitmap;
+	uint32_t eth_net_offloads:1;
+	uint32_t eth_virt:1;
+	uint32_t wqe_vlan_insert:1;
+	uint32_t wqe_inline_mode:2;
+	uint32_t vport_inline_mode:3;
+	uint32_t tunnel_stateless_geneve_rx:1;
+	uint32_t geneve_max_opt_len:1; /* 0x0: 14DW, 0x1: 63DW */
+	uint32_t tunnel_stateless_gtp:1;
+	uint32_t lro_cap:1;
+	uint32_t tunnel_lro_gre:1;
+	uint32_t tunnel_lro_vxlan:1;
+	uint32_t lro_max_msg_sz_mode:2;
+	uint32_t lro_timer_supported_periods[MLX5_LRO_NUM_SUPP_PERIODS];
+	uint32_t flex_parser_protocols;
+	uint32_t hairpin:1;
+	uint32_t log_max_hairpin_queues:5;
+	uint32_t log_max_hairpin_wq_data_sz:5;
+	uint32_t log_max_hairpin_num_packets:5;
+	uint32_t vhca_id:16;
+	struct mlx5_hca_qos_attr qos;
+};
+
+struct mlx5_devx_wq_attr {
+	uint32_t wq_type:4;
+	uint32_t wq_signature:1;
+	uint32_t end_padding_mode:2;
+	uint32_t cd_slave:1;
+	uint32_t hds_skip_first_sge:1;
+	uint32_t log2_hds_buf_size:3;
+	uint32_t page_offset:5;
+	uint32_t lwm:16;
+	uint32_t pd:24;
+	uint32_t uar_page:24;
+	uint64_t dbr_addr;
+	uint32_t hw_counter;
+	uint32_t sw_counter;
+	uint32_t log_wq_stride:4;
+	uint32_t log_wq_pg_sz:5;
+	uint32_t log_wq_sz:5;
+	uint32_t dbr_umem_valid:1;
+	uint32_t wq_umem_valid:1;
+	uint32_t log_hairpin_num_packets:5;
+	uint32_t log_hairpin_data_sz:5;
+	uint32_t single_wqe_log_num_of_strides:4;
+	uint32_t two_byte_shift_en:1;
+	uint32_t single_stride_log_num_of_bytes:3;
+	uint32_t dbr_umem_id;
+	uint32_t wq_umem_id;
+	uint64_t wq_umem_offset;
+};
+
+/* Create RQ attributes structure, used by create RQ operation. */
+struct mlx5_devx_create_rq_attr {
+	uint32_t rlky:1;
+	uint32_t delay_drop_en:1;
+	uint32_t scatter_fcs:1;
+	uint32_t vsd:1;
+	uint32_t mem_rq_type:4;
+	uint32_t state:4;
+	uint32_t flush_in_error_en:1;
+	uint32_t hairpin:1;
+	uint32_t user_index:24;
+	uint32_t cqn:24;
+	uint32_t counter_set_id:8;
+	uint32_t rmpn:24;
+	struct mlx5_devx_wq_attr wq_attr;
+};
+
+/* Modify RQ attributes structure, used by modify RQ operation. */
+struct mlx5_devx_modify_rq_attr {
+	uint32_t rqn:24;
+	uint32_t rq_state:4; /* Current RQ state. */
+	uint32_t state:4; /* Required RQ state. */
+	uint32_t scatter_fcs:1;
+	uint32_t vsd:1;
+	uint32_t counter_set_id:8;
+	uint32_t hairpin_peer_sq:24;
+	uint32_t hairpin_peer_vhca:16;
+	uint64_t modify_bitmask;
+	uint32_t lwm:16; /* Contained WQ lwm. */
+};
+
+struct mlx5_rx_hash_field_select {
+	uint32_t l3_prot_type:1;
+	uint32_t l4_prot_type:1;
+	uint32_t selected_fields:30;
+};
+
+/* TIR attributes structure, used by TIR operations. */
+struct mlx5_devx_tir_attr {
+	uint32_t disp_type:4;
+	uint32_t lro_timeout_period_usecs:16;
+	uint32_t lro_enable_mask:4;
+	uint32_t lro_max_msg_sz:8;
+	uint32_t inline_rqn:24;
+	uint32_t rx_hash_symmetric:1;
+	uint32_t tunneled_offload_en:1;
+	uint32_t indirect_table:24;
+	uint32_t rx_hash_fn:4;
+	uint32_t self_lb_block:2;
+	uint32_t transport_domain:24;
+	uint32_t rx_hash_toeplitz_key[10];
+	struct mlx5_rx_hash_field_select rx_hash_field_selector_outer;
+	struct mlx5_rx_hash_field_select rx_hash_field_selector_inner;
+};
+
+/* RQT attributes structure, used by RQT operations. */
+struct mlx5_devx_rqt_attr {
+	uint32_t rqt_max_size:16;
+	uint32_t rqt_actual_size:16;
+	uint32_t rq_list[];
+};
+
+/* TIS attributes structure. */
+struct mlx5_devx_tis_attr {
+	uint32_t strict_lag_tx_port_affinity:1;
+	uint32_t tls_en:1;
+	uint32_t lag_tx_port_affinity:4;
+	uint32_t prio:4;
+	uint32_t transport_domain:24;
+};
+
+/* SQ attributes structure, used by SQ create operation. */
+struct mlx5_devx_create_sq_attr {
+	uint32_t rlky:1;
+	uint32_t cd_master:1;
+	uint32_t fre:1;
+	uint32_t flush_in_error_en:1;
+	uint32_t allow_multi_pkt_send_wqe:1;
+	uint32_t min_wqe_inline_mode:3;
+	uint32_t state:4;
+	uint32_t reg_umr:1;
+	uint32_t allow_swp:1;
+	uint32_t hairpin:1;
+	uint32_t user_index:24;
+	uint32_t cqn:24;
+	uint32_t packet_pacing_rate_limit_index:16;
+	uint32_t tis_lst_sz:16;
+	uint32_t tis_num:24;
+	struct mlx5_devx_wq_attr wq_attr;
+};
+
+/* SQ attributes structure, used by SQ modify operation. */
+struct mlx5_devx_modify_sq_attr {
+	uint32_t sq_state:4;
+	uint32_t state:4;
+	uint32_t hairpin_peer_rq:24;
+	uint32_t hairpin_peer_vhca:16;
+};
+
+/* mlx5_devx_cmds.c */
+
+struct mlx5_devx_obj *mlx5_devx_cmd_flow_counter_alloc(struct ibv_context *ctx,
+						       uint32_t bulk_sz);
+int mlx5_devx_cmd_destroy(struct mlx5_devx_obj *obj);
+int mlx5_devx_cmd_flow_counter_query(struct mlx5_devx_obj *dcs,
+				     int clear, uint32_t n_counters,
+				     uint64_t *pkts, uint64_t *bytes,
+				     uint32_t mkey, void *addr,
+				     struct mlx5dv_devx_cmd_comp *cmd_comp,
+				     uint64_t async_id);
+int mlx5_devx_cmd_query_hca_attr(struct ibv_context *ctx,
+				 struct mlx5_hca_attr *attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_mkey_create(struct ibv_context *ctx,
+					      struct mlx5_devx_mkey_attr *attr);
+int mlx5_devx_get_out_command_status(void *out);
+int mlx5_devx_cmd_qp_query_tis_td(struct ibv_qp *qp, uint32_t tis_num,
+				  uint32_t *tis_td);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_rq(struct ibv_context *ctx,
+				       struct mlx5_devx_create_rq_attr *rq_attr,
+				       int socket);
+int mlx5_devx_cmd_modify_rq(struct mlx5_devx_obj *rq,
+			    struct mlx5_devx_modify_rq_attr *rq_attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_tir(struct ibv_context *ctx,
+					   struct mlx5_devx_tir_attr *tir_attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_rqt(struct ibv_context *ctx,
+					   struct mlx5_devx_rqt_attr *rqt_attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_sq(struct ibv_context *ctx,
+				      struct mlx5_devx_create_sq_attr *sq_attr);
+int mlx5_devx_cmd_modify_sq(struct mlx5_devx_obj *sq,
+			    struct mlx5_devx_modify_sq_attr *sq_attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_tis(struct ibv_context *ctx,
+					   struct mlx5_devx_tis_attr *tis_attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_td(struct ibv_context *ctx);
+int mlx5_devx_cmd_flow_dump(void *fdb_domain, void *rx_domain, void *tx_domain,
+			    FILE *file);
+#endif /* RTE_PMD_MLX5_DEVX_CMDS_H_ */
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 3b4c5db..ce0109c 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -38,6 +38,7 @@
 
 #include "mlx5.h"
 #include "mlx5_glue.h"
+#include "mlx5_devx_cmds.h"
 #include "mlx5_rxtx.h"
 #include "mlx5_utils.h"
 
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 5c9fea6..983b1c3 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -31,6 +31,7 @@
 #include "mlx5_defs.h"
 #include "mlx5_flow.h"
 #include "mlx5_glue.h"
+#include "mlx5_devx_cmds.h"
 #include "mlx5_prm.h"
 #include "mlx5_rxtx.h"
 
@@ -5692,6 +5693,8 @@ struct mlx5_flow_counter *
 		   struct rte_flow_error *error __rte_unused)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_ibv_shared *sh = priv->sh;
 
-	return mlx5_devx_cmd_flow_dump(priv->sh, file);
+	return mlx5_devx_cmd_flow_dump(sh->fdb_domain, sh->rx_domain,
+				       sh->tx_domain, file);
 }
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index b90734e..653d649 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -32,6 +32,7 @@
 #include "mlx5.h"
 #include "mlx5_defs.h"
 #include "mlx5_glue.h"
+#include "mlx5_devx_cmds.h"
 #include "mlx5_flow.h"
 #include "mlx5_prm.h"
 #include "mlx5_rxtx.h"
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 4092cb7..371b996 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -37,6 +37,7 @@
 #include "mlx5_defs.h"
 #include "mlx5_glue.h"
 #include "mlx5_flow.h"
+#include "mlx5_devx_cmds.h"
 
 /* Default RSS hash key also used for ConnectX-3. */
 uint8_t rss_hash_default_key[] = {
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 5e31f01..5a03556 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -29,6 +29,7 @@
 #include <rte_flow.h>
 
 #include "mlx5.h"
+#include "mlx5_devx_cmds.h"
 #include "mlx5_utils.h"
 #include "mlx5_rxtx.h"
 #include "mlx5_autoconf.h"
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 1a76f6e..5adb4dc 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -34,6 +34,7 @@
 #include "mlx5_rxtx.h"
 #include "mlx5_autoconf.h"
 #include "mlx5_glue.h"
+#include "mlx5_devx_cmds.h"
 
 /**
  * Allocate TX queue elements.
diff --git a/drivers/net/mlx5/mlx5_vlan.c b/drivers/net/mlx5/mlx5_vlan.c
index 5f6554a..feac0f1 100644
--- a/drivers/net/mlx5/mlx5_vlan.c
+++ b/drivers/net/mlx5/mlx5_vlan.c
@@ -30,6 +30,7 @@
 #include "mlx5.h"
 #include "mlx5_autoconf.h"
 #include "mlx5_glue.h"
+#include "mlx5_devx_cmds.h"
 #include "mlx5_rxtx.h"
 #include "mlx5_utils.h"
 
-- 
1.8.3.1
* [dpdk-dev] [PATCH v4 02/25] drivers: introduce mlx5 common library
  2020-01-29 12:38     ` [dpdk-dev] [PATCH v4 00/25] Introduce mlx5 common library Matan Azrad
  2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 01/25] net/mlx5: separate DevX commands interface Matan Azrad
@ 2020-01-29 12:38       ` Matan Azrad
  2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 03/25] common/mlx5: share the mlx5 glue reference Matan Azrad
                         ` (23 subsequent siblings)
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-29 12:38 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
A new Mellanox vDPA PMD will be added to support vDPA operations on
Mellanox adapters.
The design of this vDPA PMD includes mlx5_glue and mlx5_devx operations,
large parts of which are shared with the net/mlx5 PMD.
Create a new common library in drivers/common for mlx5 PMDs.
Move mlx5_glue, mlx5_devx_cmds and their dependencies to the new mlx5
common library in drivers/common.
The files mlx5_devx_cmds.c, mlx5_devx_cmds.h, mlx5_glue.c,
mlx5_glue.h and mlx5_prm.h are moved as is from drivers/net/mlx5 to
drivers/common/mlx5.
Share the log mechanism macros.
Also separate the log mechanism so that the common library has its own
log level control.
Build files and version files are adjusted accordingly.
Include lines are adjusted accordingly.
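The idea behind a separate log level for the common library can be
sketched as follows. This is only an illustration under assumptions: it
does not use the EAL logging API, and the names sk_log_level,
log_component and comp_log() are hypothetical. Each component carries
its own level, so the verbosity of the common library can be changed
without affecting the net driver.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical severity levels; a higher value is more verbose. */
enum sk_log_level { SK_LOG_ERR = 1, SK_LOG_WARNING, SK_LOG_INFO, SK_LOG_DEBUG };

/* One instance per component, each with an independent level. */
struct log_component {
	const char *name;
	enum sk_log_level level; /* Messages above this level are dropped. */
};

/*
 * Print a message prefixed with the component name.
 * Returns the number of characters written, 0 if the message was
 * filtered out, or a negative value on output error.
 */
int
comp_log(const struct log_component *c, enum sk_log_level lvl, FILE *out,
	 const char *fmt, ...)
{
	va_list ap;
	int n, r;

	if (lvl > c->level)
		return 0;
	n = fprintf(out, "%s: ", c->name);
	if (n < 0)
		return n;
	va_start(ap, fmt);
	r = vfprintf(out, fmt, ap);
	va_end(ap);
	return r < 0 ? r : n + r;
}
```

With two instances, one for the common library and one for the net
driver, raising the level of one leaves the other untouched.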
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 MAINTAINERS                                     |    1 +
 drivers/common/Makefile                         |    4 +
 drivers/common/meson.build                      |    2 +-
 drivers/common/mlx5/Makefile                    |  331 ++++
 drivers/common/mlx5/meson.build                 |  205 +++
 drivers/common/mlx5/mlx5_common.c               |   17 +
 drivers/common/mlx5/mlx5_common.h               |   87 ++
 drivers/common/mlx5/mlx5_common_utils.h         |   20 +
 drivers/common/mlx5/mlx5_devx_cmds.c            |  976 ++++++++++++
 drivers/common/mlx5/mlx5_devx_cmds.h            |  231 +++
 drivers/common/mlx5/mlx5_glue.c                 | 1138 ++++++++++++++
 drivers/common/mlx5/mlx5_glue.h                 |  265 ++++
 drivers/common/mlx5/mlx5_prm.h                  | 1889 +++++++++++++++++++++++
 drivers/common/mlx5/rte_common_mlx5_version.map |   20 +
 drivers/net/mlx5/Makefile                       |  309 +---
 drivers/net/mlx5/meson.build                    |  256 +--
 drivers/net/mlx5/mlx5.c                         |    7 +-
 drivers/net/mlx5/mlx5.h                         |    9 +-
 drivers/net/mlx5/mlx5_devx_cmds.c               |  976 ------------
 drivers/net/mlx5/mlx5_devx_cmds.h               |  231 ---
 drivers/net/mlx5/mlx5_ethdev.c                  |    5 +-
 drivers/net/mlx5/mlx5_flow.c                    |    9 +-
 drivers/net/mlx5/mlx5_flow.h                    |    3 +-
 drivers/net/mlx5/mlx5_flow_dv.c                 |    9 +-
 drivers/net/mlx5/mlx5_flow_meter.c              |    2 +
 drivers/net/mlx5/mlx5_flow_verbs.c              |    7 +-
 drivers/net/mlx5/mlx5_glue.c                    | 1150 --------------
 drivers/net/mlx5/mlx5_glue.h                    |  264 ----
 drivers/net/mlx5/mlx5_mac.c                     |    2 +-
 drivers/net/mlx5/mlx5_mr.c                      |    3 +-
 drivers/net/mlx5/mlx5_prm.h                     | 1888 ----------------------
                      |    2 +-
 drivers/net/mlx5/mlx5_rxq.c                     |    8 +-
 drivers/net/mlx5/mlx5_rxtx.c                    |    7 +-
 drivers/net/mlx5/mlx5_rxtx.h                    |    7 +-
 drivers/net/mlx5/mlx5_rxtx_vec.c                |    5 +-
 drivers/net/mlx5/mlx5_rxtx_vec.h                |    3 +-
 drivers/net/mlx5/mlx5_rxtx_vec_altivec.h        |    5 +-
 drivers/net/mlx5/mlx5_rxtx_vec_neon.h           |    5 +-
 drivers/net/mlx5/mlx5_rxtx_vec_sse.h            |    5 +-
 drivers/net/mlx5/mlx5_stats.c                   |    2 +-
 drivers/net/mlx5/mlx5_txq.c                     |    7 +-
 drivers/net/mlx5/mlx5_utils.h                   |   79 +-
 drivers/net/mlx5/mlx5_vlan.c                    |    5 +-
 mk/rte.app.mk                                   |    1 +
 45 files changed, 5313 insertions(+), 5144 deletions(-)
 create mode 100644 drivers/common/mlx5/Makefile
 create mode 100644 drivers/common/mlx5/meson.build
 create mode 100644 drivers/common/mlx5/mlx5_common.c
 create mode 100644 drivers/common/mlx5/mlx5_common.h
 create mode 100644 drivers/common/mlx5/mlx5_common_utils.h
 create mode 100644 drivers/common/mlx5/mlx5_devx_cmds.c
 create mode 100644 drivers/common/mlx5/mlx5_devx_cmds.h
 create mode 100644 drivers/common/mlx5/mlx5_glue.c
 create mode 100644 drivers/common/mlx5/mlx5_glue.h
 create mode 100644 drivers/common/mlx5/mlx5_prm.h
 create mode 100644 drivers/common/mlx5/rte_common_mlx5_version.map
 delete mode 100644 drivers/net/mlx5/mlx5_devx_cmds.c
 delete mode 100644 drivers/net/mlx5/mlx5_devx_cmds.h
 delete mode 100644 drivers/net/mlx5/mlx5_glue.c
 delete mode 100644 drivers/net/mlx5/mlx5_glue.h
 delete mode 100644 drivers/net/mlx5/mlx5_prm.h
diff --git a/MAINTAINERS b/MAINTAINERS
index 94bccae..150d507 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -737,6 +737,7 @@ M: Matan Azrad <matan@mellanox.com>
 M: Shahaf Shuler <shahafs@mellanox.com>
 M: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
 T: git://dpdk.org/next/dpdk-next-net-mlx
+F: drivers/common/mlx5/
 F: drivers/net/mlx5/
 F: buildtools/options-ibverbs-static.sh
 F: doc/guides/nics/mlx5.rst
diff --git a/drivers/common/Makefile b/drivers/common/Makefile
index 3254c52..4775d4b 100644
--- a/drivers/common/Makefile
+++ b/drivers/common/Makefile
@@ -35,4 +35,8 @@ ifneq (,$(findstring y,$(IAVF-y)))
 DIRS-y += iavf
 endif
 
+ifeq ($(CONFIG_RTE_LIBRTE_MLX5_PMD),y)
+DIRS-y += mlx5
+endif
+
 include $(RTE_SDK)/mk/rte.subdir.mk
diff --git a/drivers/common/meson.build b/drivers/common/meson.build
index fc620f7..ffd06e2 100644
--- a/drivers/common/meson.build
+++ b/drivers/common/meson.build
@@ -2,6 +2,6 @@
 # Copyright(c) 2018 Cavium, Inc
 
 std_deps = ['eal']
-drivers = ['cpt', 'dpaax', 'iavf', 'mvep', 'octeontx', 'octeontx2', 'qat']
+drivers = ['cpt', 'dpaax', 'iavf', 'mlx5', 'mvep', 'octeontx', 'octeontx2', 'qat']
 config_flag_fmt = 'RTE_LIBRTE_@0@_COMMON'
 driver_name_fmt = 'rte_common_@0@'
diff --git a/drivers/common/mlx5/Makefile b/drivers/common/mlx5/Makefile
new file mode 100644
index 0000000..b94d3c0
--- /dev/null
+++ b/drivers/common/mlx5/Makefile
@@ -0,0 +1,331 @@
+#   SPDX-License-Identifier: BSD-3-Clause
+#   Copyright 2019 Mellanox Technologies, Ltd
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# Library name.
+LIB = librte_common_mlx5.a
+LIB_GLUE = $(LIB_GLUE_BASE).$(LIB_GLUE_VERSION)
+LIB_GLUE_BASE = librte_pmd_mlx5_glue.so
+LIB_GLUE_VERSION = 20.02.0
+
+# Sources.
+ifneq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
+SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_glue.c
+endif
+SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_devx_cmds.c
+SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_common.c
+
+ifeq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
+INSTALL-$(CONFIG_RTE_LIBRTE_MLX5_PMD)-lib += $(LIB_GLUE)
+endif
+
+# Basic CFLAGS.
+CFLAGS += -O3
+CFLAGS += -std=c11 -Wall -Wextra
+CFLAGS += -g
+CFLAGS += -I.
+CFLAGS += -D_BSD_SOURCE
+CFLAGS += -D_DEFAULT_SOURCE
+CFLAGS += -D_XOPEN_SOURCE=600
+CFLAGS += $(WERROR_FLAGS)
+CFLAGS += -Wno-strict-prototypes
+ifeq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
+CFLAGS += -DMLX5_GLUE='"$(LIB_GLUE)"'
+CFLAGS += -DMLX5_GLUE_VERSION='"$(LIB_GLUE_VERSION)"'
+CFLAGS_mlx5_glue.o += -fPIC
+LDLIBS += -ldl
+else ifeq ($(CONFIG_RTE_IBVERBS_LINK_STATIC),y)
+LDLIBS += $(shell $(RTE_SDK)/buildtools/options-ibverbs-static.sh)
+else
+LDLIBS += -libverbs -lmlx5
+endif
+
+LDLIBS += -lrte_eal
+
+# A few warnings cannot be avoided in external headers.
+CFLAGS += -Wno-error=cast-qual -DNDEBUG -UPEDANTIC
+
+EXPORT_MAP := rte_common_mlx5_version.map
+
+include $(RTE_SDK)/mk/rte.lib.mk
+
+# Generate and clean-up mlx5_autoconf.h.
+
+export CC CFLAGS CPPFLAGS EXTRA_CFLAGS EXTRA_CPPFLAGS
+export AUTO_CONFIG_CFLAGS = -Wno-error
+
+ifndef V
+AUTOCONF_OUTPUT := >/dev/null
+endif
+
+mlx5_autoconf.h.new: FORCE
+
+mlx5_autoconf.h.new: $(RTE_SDK)/buildtools/auto-config-h.sh
+	$Q $(RM) -f -- '$@'
+	$Q sh -- '$<' '$@' \
+		HAVE_IBV_DEVICE_STRIDING_RQ_SUPPORT \
+		infiniband/mlx5dv.h \
+		enum MLX5DV_CQE_RES_FORMAT_CSUM_STRIDX \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IBV_DEVICE_TUNNEL_SUPPORT \
+		infiniband/mlx5dv.h \
+		enum MLX5DV_CONTEXT_MASK_TUNNEL_OFFLOADS \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IBV_DEVICE_MPLS_SUPPORT \
+		infiniband/verbs.h \
+		enum IBV_FLOW_SPEC_MPLS \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IBV_WQ_FLAGS_PCI_WRITE_END_PADDING \
+		infiniband/verbs.h \
+		enum IBV_WQ_FLAGS_PCI_WRITE_END_PADDING \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IBV_WQ_FLAG_RX_END_PADDING \
+		infiniband/verbs.h \
+		enum IBV_WQ_FLAG_RX_END_PADDING \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IBV_MLX5_MOD_SWP \
+		infiniband/mlx5dv.h \
+		type 'struct mlx5dv_sw_parsing_caps' \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IBV_MLX5_MOD_MPW \
+		infiniband/mlx5dv.h \
+		enum MLX5DV_CONTEXT_FLAGS_MPW_ALLOWED \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IBV_MLX5_MOD_CQE_128B_COMP \
+		infiniband/mlx5dv.h \
+		enum MLX5DV_CONTEXT_FLAGS_CQE_128B_COMP \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IBV_MLX5_MOD_CQE_128B_PAD \
+		infiniband/mlx5dv.h \
+		enum MLX5DV_CQ_INIT_ATTR_FLAGS_CQE_PAD \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IBV_FLOW_DV_SUPPORT \
+		infiniband/mlx5dv.h \
+		func mlx5dv_create_flow_action_packet_reformat \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_MLX5DV_DR \
+		infiniband/mlx5dv.h \
+		enum MLX5DV_DR_DOMAIN_TYPE_NIC_RX \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_MLX5DV_DR_ESWITCH \
+		infiniband/mlx5dv.h \
+		enum MLX5DV_DR_DOMAIN_TYPE_FDB \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_MLX5DV_DR_VLAN \
+		infiniband/mlx5dv.h \
+		func mlx5dv_dr_action_create_push_vlan \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_MLX5DV_DR_DEVX_PORT \
+		infiniband/mlx5dv.h \
+		func mlx5dv_query_devx_port \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IBV_DEVX_OBJ \
+		infiniband/mlx5dv.h \
+		func mlx5dv_devx_obj_create \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IBV_FLOW_DEVX_COUNTERS \
+		infiniband/mlx5dv.h \
+		enum MLX5DV_FLOW_ACTION_COUNTERS_DEVX \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IBV_DEVX_ASYNC \
+		infiniband/mlx5dv.h \
+		func mlx5dv_devx_obj_query_async \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_MLX5DV_DR_ACTION_DEST_DEVX_TIR \
+		infiniband/mlx5dv.h \
+		func mlx5dv_dr_action_create_dest_devx_tir \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_MLX5_DR_CREATE_ACTION_FLOW_METER \
+		infiniband/mlx5dv.h \
+		func mlx5dv_dr_action_create_flow_meter \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_MLX5_DR_FLOW_DUMP \
+		infiniband/mlx5dv.h \
+		func mlx5dv_dump_dr_domain \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_MLX5DV_MMAP_GET_NC_PAGES_CMD \
+		infiniband/mlx5dv.h \
+		enum MLX5_MMAP_GET_NC_PAGES_CMD \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_ETHTOOL_LINK_MODE_25G \
+		/usr/include/linux/ethtool.h \
+		enum ETHTOOL_LINK_MODE_25000baseCR_Full_BIT \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_ETHTOOL_LINK_MODE_50G \
+		/usr/include/linux/ethtool.h \
+		enum ETHTOOL_LINK_MODE_50000baseCR2_Full_BIT \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_ETHTOOL_LINK_MODE_100G \
+		/usr/include/linux/ethtool.h \
+		enum ETHTOOL_LINK_MODE_100000baseKR4_Full_BIT \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IBV_DEVICE_COUNTERS_SET_V42 \
+		infiniband/verbs.h \
+		type 'struct ibv_counter_set_init_attr' \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IBV_DEVICE_COUNTERS_SET_V45 \
+		infiniband/verbs.h \
+		type 'struct ibv_counters_init_attr' \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_RDMA_NL_NLDEV \
+		rdma/rdma_netlink.h \
+		enum RDMA_NL_NLDEV \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_RDMA_NLDEV_CMD_GET \
+		rdma/rdma_netlink.h \
+		enum RDMA_NLDEV_CMD_GET \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_RDMA_NLDEV_CMD_PORT_GET \
+		rdma/rdma_netlink.h \
+		enum RDMA_NLDEV_CMD_PORT_GET \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_RDMA_NLDEV_ATTR_DEV_INDEX \
+		rdma/rdma_netlink.h \
+		enum RDMA_NLDEV_ATTR_DEV_INDEX \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_RDMA_NLDEV_ATTR_DEV_NAME \
+		rdma/rdma_netlink.h \
+		enum RDMA_NLDEV_ATTR_DEV_NAME \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_RDMA_NLDEV_ATTR_PORT_INDEX \
+		rdma/rdma_netlink.h \
+		enum RDMA_NLDEV_ATTR_PORT_INDEX \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_RDMA_NLDEV_ATTR_NDEV_INDEX \
+		rdma/rdma_netlink.h \
+		enum RDMA_NLDEV_ATTR_NDEV_INDEX \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IFLA_NUM_VF \
+		linux/if_link.h \
+		enum IFLA_NUM_VF \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IFLA_EXT_MASK \
+		linux/if_link.h \
+		enum IFLA_EXT_MASK \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IFLA_PHYS_SWITCH_ID \
+		linux/if_link.h \
+		enum IFLA_PHYS_SWITCH_ID \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IFLA_PHYS_PORT_NAME \
+		linux/if_link.h \
+		enum IFLA_PHYS_PORT_NAME \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_SUPPORTED_40000baseKR4_Full \
+		/usr/include/linux/ethtool.h \
+		define SUPPORTED_40000baseKR4_Full \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_SUPPORTED_40000baseCR4_Full \
+		/usr/include/linux/ethtool.h \
+		define SUPPORTED_40000baseCR4_Full \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_SUPPORTED_40000baseSR4_Full \
+		/usr/include/linux/ethtool.h \
+		define SUPPORTED_40000baseSR4_Full \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_SUPPORTED_40000baseLR4_Full \
+		/usr/include/linux/ethtool.h \
+		define SUPPORTED_40000baseLR4_Full \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_SUPPORTED_56000baseKR4_Full \
+		/usr/include/linux/ethtool.h \
+		define SUPPORTED_56000baseKR4_Full \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_SUPPORTED_56000baseCR4_Full \
+		/usr/include/linux/ethtool.h \
+		define SUPPORTED_56000baseCR4_Full \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_SUPPORTED_56000baseSR4_Full \
+		/usr/include/linux/ethtool.h \
+		define SUPPORTED_56000baseSR4_Full \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_SUPPORTED_56000baseLR4_Full \
+		/usr/include/linux/ethtool.h \
+		define SUPPORTED_56000baseLR4_Full \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_STATIC_ASSERT \
+		/usr/include/assert.h \
+		define static_assert \
+		$(AUTOCONF_OUTPUT)
+
+# Create mlx5_autoconf.h or update it in case it differs from the new one.
+
+mlx5_autoconf.h: mlx5_autoconf.h.new
+	$Q [ -f '$@' ] && \
+		cmp '$<' '$@' $(AUTOCONF_OUTPUT) || \
+		mv '$<' '$@'
+
+$(SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD):.c=.o): mlx5_autoconf.h
+
+# Generate dependency plug-in for rdma-core when the PMD must not be linked
+# directly, so that applications do not inherit this dependency.
+
+ifeq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
+
+$(LIB): $(LIB_GLUE)
+
+ifeq ($(LINK_USING_CC),1)
+GLUE_LDFLAGS := $(call linkerprefix,$(LDFLAGS))
+else
+GLUE_LDFLAGS := $(LDFLAGS)
+endif
+$(LIB_GLUE): mlx5_glue.o
+	$Q $(LD) $(GLUE_LDFLAGS) $(EXTRA_LDFLAGS) \
+		-Wl,-h,$(LIB_GLUE) \
+		-shared -o $@ $< -libverbs -lmlx5
+
+mlx5_glue.o: mlx5_autoconf.h
+
+endif
+
+clean_mlx5: FORCE
+	$Q rm -f -- mlx5_autoconf.h mlx5_autoconf.h.new
+	$Q rm -f -- mlx5_glue.o $(LIB_GLUE_BASE)*
+
+clean: clean_mlx5
diff --git a/drivers/common/mlx5/meson.build b/drivers/common/mlx5/meson.build
new file mode 100644
index 0000000..718cef2
--- /dev/null
+++ b/drivers/common/mlx5/meson.build
@@ -0,0 +1,205 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2019 Mellanox Technologies, Ltd
+
+if not is_linux
+	build = false
+	reason = 'only supported on Linux'
+	subdir_done()
+endif
+build = true
+
+pmd_dlopen = (get_option('ibverbs_link') == 'dlopen')
+LIB_GLUE_BASE = 'librte_pmd_mlx5_glue.so'
+LIB_GLUE_VERSION = '20.02.0'
+LIB_GLUE = LIB_GLUE_BASE + '.' + LIB_GLUE_VERSION
+if pmd_dlopen
+	dpdk_conf.set('RTE_IBVERBS_LINK_DLOPEN', 1)
+	cflags += [
+		'-DMLX5_GLUE="@0@"'.format(LIB_GLUE),
+		'-DMLX5_GLUE_VERSION="@0@"'.format(LIB_GLUE_VERSION),
+	]
+endif
+
+libnames = [ 'mlx5', 'ibverbs' ]
+libs = []
+foreach libname:libnames
+	lib = dependency('lib' + libname, required:false)
+	if not lib.found()
+		lib = cc.find_library(libname, required:false)
+	endif
+	if lib.found()
+		libs += lib
+	else
+		build = false
+		reason = 'missing dependency, "' + libname + '"'
+	endif
+endforeach
+
+if build
+	allow_experimental_apis = true
+	deps += ['hash', 'pci', 'net', 'eal']
+	ext_deps += libs
+	sources = files(
+		'mlx5_devx_cmds.c',
+		'mlx5_common.c',
+	)
+	if not pmd_dlopen
+		sources += files('mlx5_glue.c')
+	endif
+	cflags_options = [
+		'-std=c11',
+		'-Wno-strict-prototypes',
+		'-D_BSD_SOURCE',
+		'-D_DEFAULT_SOURCE',
+		'-D_XOPEN_SOURCE=600'
+	]
+	foreach option:cflags_options
+		if cc.has_argument(option)
+			cflags += option
+		endif
+	endforeach
+	if get_option('buildtype').contains('debug')
+		cflags += [ '-pedantic', '-UNDEBUG', '-DPEDANTIC' ]
+	else
+		cflags += [ '-DNDEBUG', '-UPEDANTIC' ]
+	endif
+	# To maintain the compatibility with the make build system
+	# mlx5_autoconf.h file is still generated.
+	# input array for meson member search:
+	# [ "MACRO to define if found", "header for the search",
+	#   "symbol to search", "struct member to search" ]
+	has_member_args = [
+		[ 'HAVE_IBV_MLX5_MOD_SWP', 'infiniband/mlx5dv.h',
+		'struct mlx5dv_sw_parsing_caps', 'sw_parsing_offloads' ],
+		[ 'HAVE_IBV_DEVICE_COUNTERS_SET_V42', 'infiniband/verbs.h',
+		'struct ibv_counter_set_init_attr', 'counter_set_id' ],
+		[ 'HAVE_IBV_DEVICE_COUNTERS_SET_V45', 'infiniband/verbs.h',
+		'struct ibv_counters_init_attr', 'comp_mask' ],
+	]
+	# input array for meson symbol search:
+	# [ "MACRO to define if found", "header for the search",
+	#   "symbol to search" ]
+	has_sym_args = [
+		[ 'HAVE_IBV_DEVICE_STRIDING_RQ_SUPPORT', 'infiniband/mlx5dv.h',
+		'MLX5DV_CQE_RES_FORMAT_CSUM_STRIDX' ],
+		[ 'HAVE_IBV_DEVICE_TUNNEL_SUPPORT', 'infiniband/mlx5dv.h',
+		'MLX5DV_CONTEXT_MASK_TUNNEL_OFFLOADS' ],
+		[ 'HAVE_IBV_MLX5_MOD_MPW', 'infiniband/mlx5dv.h',
+		'MLX5DV_CONTEXT_FLAGS_MPW_ALLOWED' ],
+		[ 'HAVE_IBV_MLX5_MOD_CQE_128B_COMP', 'infiniband/mlx5dv.h',
+		'MLX5DV_CONTEXT_FLAGS_CQE_128B_COMP' ],
+		[ 'HAVE_IBV_MLX5_MOD_CQE_128B_PAD', 'infiniband/mlx5dv.h',
+		'MLX5DV_CQ_INIT_ATTR_FLAGS_CQE_PAD' ],
+		[ 'HAVE_IBV_FLOW_DV_SUPPORT', 'infiniband/mlx5dv.h',
+		'mlx5dv_create_flow_action_packet_reformat' ],
+		[ 'HAVE_IBV_DEVICE_MPLS_SUPPORT', 'infiniband/verbs.h',
+		'IBV_FLOW_SPEC_MPLS' ],
+		[ 'HAVE_IBV_WQ_FLAGS_PCI_WRITE_END_PADDING', 'infiniband/verbs.h',
+		'IBV_WQ_FLAGS_PCI_WRITE_END_PADDING' ],
+		[ 'HAVE_IBV_WQ_FLAG_RX_END_PADDING', 'infiniband/verbs.h',
+		'IBV_WQ_FLAG_RX_END_PADDING' ],
+		[ 'HAVE_MLX5DV_DR_DEVX_PORT', 'infiniband/mlx5dv.h',
+		'mlx5dv_query_devx_port' ],
+		[ 'HAVE_IBV_DEVX_OBJ', 'infiniband/mlx5dv.h',
+		'mlx5dv_devx_obj_create' ],
+		[ 'HAVE_IBV_FLOW_DEVX_COUNTERS', 'infiniband/mlx5dv.h',
+		'MLX5DV_FLOW_ACTION_COUNTERS_DEVX' ],
+		[ 'HAVE_IBV_DEVX_ASYNC', 'infiniband/mlx5dv.h',
+		'mlx5dv_devx_obj_query_async' ],
+		[ 'HAVE_MLX5DV_DR_ACTION_DEST_DEVX_TIR', 'infiniband/mlx5dv.h',
+		'mlx5dv_dr_action_create_dest_devx_tir' ],
+		[ 'HAVE_MLX5_DR_CREATE_ACTION_FLOW_METER', 'infiniband/mlx5dv.h',
+		'mlx5dv_dr_action_create_flow_meter' ],
+		[ 'HAVE_MLX5DV_MMAP_GET_NC_PAGES_CMD', 'infiniband/mlx5dv.h',
+		'MLX5_MMAP_GET_NC_PAGES_CMD' ],
+		[ 'HAVE_MLX5DV_DR', 'infiniband/mlx5dv.h',
+		'MLX5DV_DR_DOMAIN_TYPE_NIC_RX' ],
+		[ 'HAVE_MLX5DV_DR_ESWITCH', 'infiniband/mlx5dv.h',
+		'MLX5DV_DR_DOMAIN_TYPE_FDB' ],
+		[ 'HAVE_MLX5DV_DR_VLAN', 'infiniband/mlx5dv.h',
+		'mlx5dv_dr_action_create_push_vlan' ],
+		[ 'HAVE_SUPPORTED_40000baseKR4_Full', 'linux/ethtool.h',
+		'SUPPORTED_40000baseKR4_Full' ],
+		[ 'HAVE_SUPPORTED_40000baseCR4_Full', 'linux/ethtool.h',
+		'SUPPORTED_40000baseCR4_Full' ],
+		[ 'HAVE_SUPPORTED_40000baseSR4_Full', 'linux/ethtool.h',
+		'SUPPORTED_40000baseSR4_Full' ],
+		[ 'HAVE_SUPPORTED_40000baseLR4_Full', 'linux/ethtool.h',
+		'SUPPORTED_40000baseLR4_Full' ],
+		[ 'HAVE_SUPPORTED_56000baseKR4_Full', 'linux/ethtool.h',
+		'SUPPORTED_56000baseKR4_Full' ],
+		[ 'HAVE_SUPPORTED_56000baseCR4_Full', 'linux/ethtool.h',
+		'SUPPORTED_56000baseCR4_Full' ],
+		[ 'HAVE_SUPPORTED_56000baseSR4_Full', 'linux/ethtool.h',
+		'SUPPORTED_56000baseSR4_Full' ],
+		[ 'HAVE_SUPPORTED_56000baseLR4_Full', 'linux/ethtool.h',
+		'SUPPORTED_56000baseLR4_Full' ],
+		[ 'HAVE_ETHTOOL_LINK_MODE_25G', 'linux/ethtool.h',
+		'ETHTOOL_LINK_MODE_25000baseCR_Full_BIT' ],
+		[ 'HAVE_ETHTOOL_LINK_MODE_50G', 'linux/ethtool.h',
+		'ETHTOOL_LINK_MODE_50000baseCR2_Full_BIT' ],
+		[ 'HAVE_ETHTOOL_LINK_MODE_100G', 'linux/ethtool.h',
+		'ETHTOOL_LINK_MODE_100000baseKR4_Full_BIT' ],
+		[ 'HAVE_IFLA_NUM_VF', 'linux/if_link.h',
+		'IFLA_NUM_VF' ],
+		[ 'HAVE_IFLA_EXT_MASK', 'linux/if_link.h',
+		'IFLA_EXT_MASK' ],
+		[ 'HAVE_IFLA_PHYS_SWITCH_ID', 'linux/if_link.h',
+		'IFLA_PHYS_SWITCH_ID' ],
+		[ 'HAVE_IFLA_PHYS_PORT_NAME', 'linux/if_link.h',
+		'IFLA_PHYS_PORT_NAME' ],
+		[ 'HAVE_RDMA_NL_NLDEV', 'rdma/rdma_netlink.h',
+		'RDMA_NL_NLDEV' ],
+		[ 'HAVE_RDMA_NLDEV_CMD_GET', 'rdma/rdma_netlink.h',
+		'RDMA_NLDEV_CMD_GET' ],
+		[ 'HAVE_RDMA_NLDEV_CMD_PORT_GET', 'rdma/rdma_netlink.h',
+		'RDMA_NLDEV_CMD_PORT_GET' ],
+		[ 'HAVE_RDMA_NLDEV_ATTR_DEV_INDEX', 'rdma/rdma_netlink.h',
+		'RDMA_NLDEV_ATTR_DEV_INDEX' ],
+		[ 'HAVE_RDMA_NLDEV_ATTR_DEV_NAME', 'rdma/rdma_netlink.h',
+		'RDMA_NLDEV_ATTR_DEV_NAME' ],
+		[ 'HAVE_RDMA_NLDEV_ATTR_PORT_INDEX', 'rdma/rdma_netlink.h',
+		'RDMA_NLDEV_ATTR_PORT_INDEX' ],
+		[ 'HAVE_RDMA_NLDEV_ATTR_NDEV_INDEX', 'rdma/rdma_netlink.h',
+		'RDMA_NLDEV_ATTR_NDEV_INDEX' ],
+		[ 'HAVE_MLX5_DR_FLOW_DUMP', 'infiniband/mlx5dv.h',
+		'mlx5dv_dump_dr_domain'],
+	]
+	config = configuration_data()
+	foreach arg:has_sym_args
+		config.set(arg[0], cc.has_header_symbol(arg[1], arg[2],
+			dependencies: libs))
+	endforeach
+	foreach arg:has_member_args
+		file_prefix = '#include <' + arg[1] + '>'
+		config.set(arg[0], cc.has_member(arg[2], arg[3],
+			prefix : file_prefix, dependencies: libs))
+	endforeach
+	configure_file(output : 'mlx5_autoconf.h', configuration : config)
+endif
+# Build Glue Library
+if pmd_dlopen and build
+	dlopen_name = 'mlx5_glue'
+	dlopen_lib_name = driver_name_fmt.format(dlopen_name)
+	dlopen_so_version = LIB_GLUE_VERSION
+	dlopen_sources = files('mlx5_glue.c')
+	dlopen_install_dir = [ eal_pmd_path + '-glue' ]
+	dlopen_includes = [global_inc]
+	dlopen_includes += include_directories(
+		'../../../lib/librte_eal/common/include/generic',
+	)
+	shared_lib = shared_library(
+		dlopen_lib_name,
+		dlopen_sources,
+		include_directories: dlopen_includes,
+		c_args: cflags,
+		dependencies: libs,
+		link_args: [
+		'-Wl,-export-dynamic',
+		'-Wl,-h,@0@'.format(LIB_GLUE),
+		],
+		soversion: dlopen_so_version,
+		install: true,
+		install_dir: dlopen_install_dir,
+	)
+endif
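
A minimal sketch, not part of the patch, of how C sources can consume the HAVE_* macros that the checks above write into mlx5_autoconf.h. The #define/#undef pair stands in for the generated header, and devx_obj_supported() and devx_async_supported() are hypothetical helper names used only for illustration.

```c
/*
 * Stand-in for the generated mlx5_autoconf.h. Assumption: in a real build
 * these two lines come from the build system, not from the source file.
 */
#define HAVE_IBV_DEVX_OBJ
#undef HAVE_IBV_DEVX_ASYNC

/* Hypothetical helper: 1 when the DevX object API was detected. */
static int
devx_obj_supported(void)
{
#ifdef HAVE_IBV_DEVX_OBJ
	return 1;
#else
	return 0;
#endif
}

/* Hypothetical helper: 1 when the asynchronous DevX query was detected. */
static int
devx_async_supported(void)
{
#ifdef HAVE_IBV_DEVX_ASYNC
	return 1;
#else
	return 0;
#endif
}
```

Testing with #ifdef rather than #if keeps the consumer independent of whether the generator defines a macro with or without a value.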
diff --git a/drivers/common/mlx5/mlx5_common.c b/drivers/common/mlx5/mlx5_common.c
new file mode 100644
index 0000000..14ebd30
--- /dev/null
+++ b/drivers/common/mlx5/mlx5_common.c
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2019 Mellanox Technologies, Ltd
+ */
+
+#include "mlx5_common.h"
+
+
+int mlx5_common_logtype;
+
+
+RTE_INIT(rte_mlx5_common_pmd_init)
+{
+	/* Initialize driver log type. */
+	mlx5_common_logtype = rte_log_register("pmd.common.mlx5");
+	if (mlx5_common_logtype >= 0)
+		rte_log_set_level(mlx5_common_logtype, RTE_LOG_NOTICE);
+}
diff --git a/drivers/common/mlx5/mlx5_common.h b/drivers/common/mlx5/mlx5_common.h
new file mode 100644
index 0000000..9f10def
--- /dev/null
+++ b/drivers/common/mlx5/mlx5_common.h
@@ -0,0 +1,87 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2019 Mellanox Technologies, Ltd
+ */
+
+#ifndef RTE_PMD_MLX5_COMMON_H_
+#define RTE_PMD_MLX5_COMMON_H_
+
+#include <assert.h>
+
+#include <rte_log.h>
+
+
+/*
+ * Helper macros to work around __VA_ARGS__ limitations in a C99 compliant
+ * manner.
+ */
+#define PMD_DRV_LOG_STRIP(a, b) a
+#define PMD_DRV_LOG_OPAREN (
+#define PMD_DRV_LOG_CPAREN )
+#define PMD_DRV_LOG_COMMA ,
+
+/* Return the file name part of a path. */
+static inline const char *
+pmd_drv_log_basename(const char *s)
+{
+	const char *n = s;
+
+	while (*n)
+		if (*(n++) == '/')
+			s = n;
+	return s;
+}
+
+#define PMD_DRV_LOG___(level, type, name, ...) \
+	rte_log(RTE_LOG_ ## level, \
+		type, \
+		RTE_FMT(name ": " \
+			RTE_FMT_HEAD(__VA_ARGS__,), \
+		RTE_FMT_TAIL(__VA_ARGS__,)))
+
+/*
+ * When debugging is enabled (NDEBUG not defined), file, line and function
+ * information replace the driver name (MLX5_DRIVER_NAME) in log messages.
+ */
+#ifndef NDEBUG
+
+#define PMD_DRV_LOG__(level, type, name, ...) \
+	PMD_DRV_LOG___(level, type, name, "%s:%u: %s(): " __VA_ARGS__)
+#define PMD_DRV_LOG_(level, type, name, s, ...) \
+	PMD_DRV_LOG__(level, type, name,\
+		s "\n" PMD_DRV_LOG_COMMA \
+		pmd_drv_log_basename(__FILE__) PMD_DRV_LOG_COMMA \
+		__LINE__ PMD_DRV_LOG_COMMA \
+		__func__, \
+		__VA_ARGS__)
+
+#else /* NDEBUG */
+#define PMD_DRV_LOG__(level, type, name, ...) \
+	PMD_DRV_LOG___(level, type, name, __VA_ARGS__)
+#define PMD_DRV_LOG_(level, type, name, s, ...) \
+	PMD_DRV_LOG__(level, type, name, s "\n", __VA_ARGS__)
+
+#endif /* NDEBUG */
+
+/* claim_zero() does not perform any check when debugging is disabled. */
+#ifndef NDEBUG
+
+#define DEBUG(...) DRV_LOG(DEBUG, __VA_ARGS__)
+#define claim_zero(...) assert((__VA_ARGS__) == 0)
+#define claim_nonzero(...) assert((__VA_ARGS__) != 0)
+
+#else /* NDEBUG */
+
+#define DEBUG(...) (void)0
+#define claim_zero(...) (__VA_ARGS__)
+#define claim_nonzero(...) (__VA_ARGS__)
+
+#endif /* NDEBUG */
+
+/* Allocate a buffer on the stack and fill it with a printf format string. */
+#define MKSTR(name, ...) \
+	int mkstr_size_##name = snprintf(NULL, 0, "" __VA_ARGS__); \
+	char name[mkstr_size_##name + 1]; \
+	\
+	snprintf(name, sizeof(name), "" __VA_ARGS__)
+
+#endif /* RTE_PMD_MLX5_COMMON_H_ */
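
A small standalone sketch of two helpers from mlx5_common.h, for readers who want to try them outside DPDK. log_basename() repeats the logic of pmd_drv_log_basename(), MKSTR is copied as is, and port_label_len() is a hypothetical caller added only to show the macro in use.

```c
#include <stdio.h>
#include <string.h>

/* Same logic as pmd_drv_log_basename(): return the text after the last '/'. */
static const char *
log_basename(const char *s)
{
	const char *n = s;

	while (*n)
		if (*(n++) == '/')
			s = n;
	return s;
}

/*
 * Same shape as MKSTR(): a dry-run snprintf() computes the length, then a
 * variable-length array of exactly that size (plus the terminator) is filled.
 */
#define MKSTR(name, ...) \
	int mkstr_size_##name = snprintf(NULL, 0, "" __VA_ARGS__); \
	char name[mkstr_size_##name + 1]; \
	\
	snprintf(name, sizeof(name), "" __VA_ARGS__)

/* Hypothetical caller: build "<ifname>/<idx>" on the stack, return its length. */
static size_t
port_label_len(const char *ifname, unsigned int idx)
{
	MKSTR(label, "%s/%u", ifname, idx);
	return strlen(label);
}
```

Because MKSTR() expands to two declarations followed by a statement, it can only be used where declarations are allowed, and the buffer lives until the end of the enclosing block.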
diff --git a/drivers/common/mlx5/mlx5_common_utils.h b/drivers/common/mlx5/mlx5_common_utils.h
new file mode 100644
index 0000000..32c3adf
--- /dev/null
+++ b/drivers/common/mlx5/mlx5_common_utils.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2019 Mellanox Technologies, Ltd
+ */
+
+#ifndef RTE_PMD_MLX5_COMMON_UTILS_H_
+#define RTE_PMD_MLX5_COMMON_UTILS_H_
+
+#include "mlx5_common.h"
+
+
+extern int mlx5_common_logtype;
+
+#define MLX5_COMMON_LOG_PREFIX "common_mlx5"
+/* Generic printf()-like logging macro with automatic line feed. */
+#define DRV_LOG(level, ...) \
+	PMD_DRV_LOG_(level, mlx5_common_logtype, MLX5_COMMON_LOG_PREFIX, \
+		__VA_ARGS__ PMD_DRV_LOG_STRIP PMD_DRV_LOG_OPAREN, \
+		PMD_DRV_LOG_CPAREN)
+
+#endif /* RTE_PMD_MLX5_COMMON_UTILS_H_ */
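
The STRIP/OPAREN/CPAREN helpers let DRV_LOG() append "\n" to the format string whether or not extra arguments follow it, without relying on the GNU ", ##__VA_ARGS__" extension. The sketch below reproduces the trick under simplifying assumptions: snprintf() replaces rte_log(), the debug (file/line) variant is left out, and every LOG* name is a hypothetical stand-in for the corresponding PMD_DRV_LOG* macro.

```c
#include <stdio.h>

#define LOG_STRIP(a, b) a
#define LOG_OPAREN (
#define LOG_CPAREN )

/* Innermost layer: by now the arguments form an ordinary printf() list. */
#define LOG__(buf, len, ...) snprintf(buf, len, __VA_ARGS__)

/* Middle layer: 's' is the first user argument; "\n" is placed right after it. */
#define LOG_(buf, len, s, ...) LOG__(buf, len, s "\n", __VA_ARGS__)

/*
 * Entry point. The trailing "LOG_STRIP (" and ")" only become a macro call on
 * the next rescan:
 *  - format only:        "fmt" LOG_STRIP("\n", )       ->  "fmt" "\n"
 *  - format + arguments: "fmt" "\n", arg LOG_STRIP(, ) ->  "fmt" "\n", arg
 */
#define LOG(buf, len, ...) \
	LOG_(buf, len, __VA_ARGS__ LOG_STRIP LOG_OPAREN, LOG_CPAREN)
```

A caller would write LOG(buf, sizeof(buf), "queue %u ready", idx) or LOG(buf, sizeof(buf), "ready"); both forms end up with exactly one trailing newline.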
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
new file mode 100644
index 0000000..4d94f92
--- /dev/null
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -0,0 +1,976 @@
+// SPDX-License-Identifier: BSD-3-Clause
+/* Copyright 2018 Mellanox Technologies, Ltd */
+
+#include <unistd.h>
+
+#include <rte_errno.h>
+#include <rte_malloc.h>
+
+#include "mlx5_prm.h"
+#include "mlx5_devx_cmds.h"
+#include "mlx5_common_utils.h"
+
+
+/**
+ * Allocate flow counters via devx interface.
+ *
+ * @param[in] ctx
+ *   ibv context returned from mlx5dv_open_device.
+ * @param[in] bulk_n_128
+ *   Number of counters to allocate in bulk, in units of 128 counters.
+ *
+ * @return
+ *   Pointer to the counter object on success, NULL otherwise and
+ *   rte_errno is set.
+ *   The object is allocated by the routine and is released by
+ *   mlx5_devx_cmd_destroy().
+ */
+struct mlx5_devx_obj *
+mlx5_devx_cmd_flow_counter_alloc(struct ibv_context *ctx, uint32_t bulk_n_128)
+{
+	struct mlx5_devx_obj *dcs = rte_zmalloc("dcs", sizeof(*dcs), 0);
+	uint32_t in[MLX5_ST_SZ_DW(alloc_flow_counter_in)]   = {0};
+	uint32_t out[MLX5_ST_SZ_DW(alloc_flow_counter_out)] = {0};
+
+	if (!dcs) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	MLX5_SET(alloc_flow_counter_in, in, opcode,
+		 MLX5_CMD_OP_ALLOC_FLOW_COUNTER);
+	MLX5_SET(alloc_flow_counter_in, in, flow_counter_bulk, bulk_n_128);
+	dcs->obj = mlx5_glue->devx_obj_create(ctx, in,
+					      sizeof(in), out, sizeof(out));
+	if (!dcs->obj) {
+		DRV_LOG(ERR, "Can't allocate counters - error %d", errno);
+		rte_errno = errno;
+		rte_free(dcs);
+		return NULL;
+	}
+	dcs->id = MLX5_GET(alloc_flow_counter_out, out, flow_counter_id);
+	return dcs;
+}
+
+/**
+ * Query flow counters values.
+ *
+ * @param[in] dcs
+ *   devx object that was obtained from mlx5_devx_cmd_flow_counter_alloc.
+ * @param[in] clear
+ *   Whether hardware should clear the counters after the query or not.
+ * @param[in] n_counters
+ *   0 to read a single counter, otherwise the number of counters to read.
+ * @param[out] pkts
+ *   The number of packets that matched the flow.
+ * @param[out] bytes
+ *   The number of bytes that matched the flow.
+ * @param[in] mkey
+ *   The mkey key for batch query.
+ * @param[in] addr
+ *   The address in the mkey range for batch query.
+ * @param[in] cmd_comp
+ *   The completion object for asynchronous batch query.
+ * @param[in] async_id
+ *   The ID to be returned in the asynchronous batch query response.
+ *
+ * @return
+ *   0 on success, a negative value otherwise.
+ */
+int
+mlx5_devx_cmd_flow_counter_query(struct mlx5_devx_obj *dcs,
+				 int clear, uint32_t n_counters,
+				 uint64_t *pkts, uint64_t *bytes,
+				 uint32_t mkey, void *addr,
+				 struct mlx5dv_devx_cmd_comp *cmd_comp,
+				 uint64_t async_id)
+{
+	int out_len = MLX5_ST_SZ_BYTES(query_flow_counter_out) +
+			MLX5_ST_SZ_BYTES(traffic_counter);
+	uint32_t out[out_len];
+	uint32_t in[MLX5_ST_SZ_DW(query_flow_counter_in)] = {0};
+	void *stats;
+	int rc;
+
+	MLX5_SET(query_flow_counter_in, in, opcode,
+		 MLX5_CMD_OP_QUERY_FLOW_COUNTER);
+	MLX5_SET(query_flow_counter_in, in, op_mod, 0);
+	MLX5_SET(query_flow_counter_in, in, flow_counter_id, dcs->id);
+	MLX5_SET(query_flow_counter_in, in, clear, !!clear);
+
+	if (n_counters) {
+		MLX5_SET(query_flow_counter_in, in, num_of_counters,
+			 n_counters);
+		MLX5_SET(query_flow_counter_in, in, dump_to_memory, 1);
+		MLX5_SET(query_flow_counter_in, in, mkey, mkey);
+		MLX5_SET64(query_flow_counter_in, in, address,
+			   (uint64_t)(uintptr_t)addr);
+	}
+	if (!cmd_comp)
+		rc = mlx5_glue->devx_obj_query(dcs->obj, in, sizeof(in), out,
+					       out_len);
+	else
+		rc = mlx5_glue->devx_obj_query_async(dcs->obj, in, sizeof(in),
+						     out_len, async_id,
+						     cmd_comp);
+	if (rc) {
+		DRV_LOG(ERR, "Failed to query devx counters with rc %d", rc);
+		rte_errno = rc;
+		return -rc;
+	}
+	if (!n_counters) {
+		stats = MLX5_ADDR_OF(query_flow_counter_out,
+				     out, flow_statistics);
+		*pkts = MLX5_GET64(traffic_counter, stats, packets);
+		*bytes = MLX5_GET64(traffic_counter, stats, octets);
+	}
+	return 0;
+}
+
+/**
+ * Create a new mkey.
+ *
+ * @param[in] ctx
+ *   ibv context returned from mlx5dv_open_device.
+ * @param[in] attr
+ *   Attributes of the requested mkey.
+ *
+ * @return
+ *   Pointer to Devx mkey on success, NULL otherwise and rte_errno
+ *   is set.
+ */
+struct mlx5_devx_obj *
+mlx5_devx_cmd_mkey_create(struct ibv_context *ctx,
+			  struct mlx5_devx_mkey_attr *attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(create_mkey_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(create_mkey_out)] = {0};
+	void *mkc;
+	struct mlx5_devx_obj *mkey = rte_zmalloc("mkey", sizeof(*mkey), 0);
+	size_t pgsize;
+	uint32_t translation_size;
+
+	if (!mkey) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	pgsize = sysconf(_SC_PAGESIZE);
+	translation_size = (RTE_ALIGN(attr->size, pgsize) * 8) / 16;
+	MLX5_SET(create_mkey_in, in, opcode, MLX5_CMD_OP_CREATE_MKEY);
+	MLX5_SET(create_mkey_in, in, translations_octword_actual_size,
+		 translation_size);
+	MLX5_SET(create_mkey_in, in, mkey_umem_id, attr->umem_id);
+	mkc = MLX5_ADDR_OF(create_mkey_in, in, memory_key_mkey_entry);
+	MLX5_SET(mkc, mkc, lw, 0x1);
+	MLX5_SET(mkc, mkc, lr, 0x1);
+	MLX5_SET(mkc, mkc, access_mode_1_0, MLX5_MKC_ACCESS_MODE_MTT);
+	MLX5_SET(mkc, mkc, qpn, 0xffffff);
+	MLX5_SET(mkc, mkc, pd, attr->pd);
+	MLX5_SET(mkc, mkc, mkey_7_0, attr->umem_id & 0xFF);
+	MLX5_SET(mkc, mkc, translations_octword_size, translation_size);
+	MLX5_SET64(mkc, mkc, start_addr, attr->addr);
+	MLX5_SET64(mkc, mkc, len, attr->size);
+	MLX5_SET(mkc, mkc, log_page_size, rte_log2_u32(pgsize));
+	mkey->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in), out,
+					       sizeof(out));
+	if (!mkey->obj) {
+		DRV_LOG(ERR, "Can't create mkey - error %d", errno);
+		rte_errno = errno;
+		rte_free(mkey);
+		return NULL;
+	}
+	mkey->id = MLX5_GET(create_mkey_out, out, mkey_index);
+	mkey->id = (mkey->id << 8) | (attr->umem_id & 0xFF);
+	return mkey;
+}
+
+/**
+ * Get status of devx command response.
+ * Mainly used for asynchronous commands.
+ *
+ * @param[in] out
+ *   The out response buffer.
+ *
+ * @return
+ *   0 on success, non-zero value otherwise.
+ */
+int
+mlx5_devx_get_out_command_status(void *out)
+{
+	int status;
+
+	if (!out)
+		return -EINVAL;
+	status = MLX5_GET(query_flow_counter_out, out, status);
+	if (status) {
+		int syndrome = MLX5_GET(query_flow_counter_out, out, syndrome);
+
+		DRV_LOG(ERR, "Bad devX status %x, syndrome = %x", status,
+			syndrome);
+	}
+	return status;
+}
+
+/**
+ * Destroy any object allocated by a Devx API.
+ *
+ * @param[in] obj
+ *   Pointer to a general object.
+ *
+ * @return
+ *   0 on success, a negative value otherwise.
+ */
+int
+mlx5_devx_cmd_destroy(struct mlx5_devx_obj *obj)
+{
+	int ret;
+
+	if (!obj)
+		return 0;
+	ret = mlx5_glue->devx_obj_destroy(obj->obj);
+	rte_free(obj);
+	return ret;
+}
+
+/**
+ * Query NIC vport context.
+ * Fills minimal inline attribute.
+ *
+ * @param[in] ctx
+ *   ibv context returned from mlx5dv_open_device.
+ * @param[in] vport
+ *   vport index
+ * @param[out] attr
+ *   Device attribute values.
+ *
+ * @return
+ *   0 on success, a negative value otherwise.
+ */
+static int
+mlx5_devx_cmd_query_nic_vport_context(struct ibv_context *ctx,
+				      unsigned int vport,
+				      struct mlx5_hca_attr *attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(query_nic_vport_context_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(query_nic_vport_context_out)] = {0};
+	void *vctx;
+	int status, syndrome, rc;
+
+	/* Query NIC vport context to determine inline mode. */
+	MLX5_SET(query_nic_vport_context_in, in, opcode,
+		 MLX5_CMD_OP_QUERY_NIC_VPORT_CONTEXT);
+	MLX5_SET(query_nic_vport_context_in, in, vport_number, vport);
+	if (vport)
+		MLX5_SET(query_nic_vport_context_in, in, other_vport, 1);
+	rc = mlx5_glue->devx_general_cmd(ctx,
+					 in, sizeof(in),
+					 out, sizeof(out));
+	if (rc)
+		goto error;
+	status = MLX5_GET(query_nic_vport_context_out, out, status);
+	syndrome = MLX5_GET(query_nic_vport_context_out, out, syndrome);
+	if (status) {
+		DRV_LOG(DEBUG, "Failed to query NIC vport context, "
+			"status %x, syndrome = %x",
+			status, syndrome);
+		return -1;
+	}
+	vctx = MLX5_ADDR_OF(query_nic_vport_context_out, out,
+			    nic_vport_context);
+	attr->vport_inline_mode = MLX5_GET(nic_vport_context, vctx,
+					   min_wqe_inline_mode);
+	return 0;
+error:
+	rc = (rc > 0) ? -rc : rc;
+	return rc;
+}
+
+/**
+ * Query HCA attributes.
+ * Using those attributes we can check at run time whether the device
+ * has the required capabilities.
+ *
+ * @param[in] ctx
+ *   ibv context returned from mlx5dv_open_device.
+ * @param[out] attr
+ *   Device attribute values.
+ *
+ * @return
+ *   0 on success, a negative value otherwise.
+ */
+int
+mlx5_devx_cmd_query_hca_attr(struct ibv_context *ctx,
+			     struct mlx5_hca_attr *attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(query_hca_cap_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(query_hca_cap_out)] = {0};
+	void *hcattr;
+	int status, syndrome, rc;
+
+	MLX5_SET(query_hca_cap_in, in, opcode, MLX5_CMD_OP_QUERY_HCA_CAP);
+	MLX5_SET(query_hca_cap_in, in, op_mod,
+		 MLX5_GET_HCA_CAP_OP_MOD_GENERAL_DEVICE |
+		 MLX5_HCA_CAP_OPMOD_GET_CUR);
+
+	rc = mlx5_glue->devx_general_cmd(ctx,
+					 in, sizeof(in), out, sizeof(out));
+	if (rc)
+		goto error;
+	status = MLX5_GET(query_hca_cap_out, out, status);
+	syndrome = MLX5_GET(query_hca_cap_out, out, syndrome);
+	if (status) {
+		DRV_LOG(DEBUG, "Failed to query devx HCA capabilities, "
+			"status %x, syndrome = %x",
+			status, syndrome);
+		return -1;
+	}
+	hcattr = MLX5_ADDR_OF(query_hca_cap_out, out, capability);
+	attr->flow_counter_bulk_alloc_bitmap =
+			MLX5_GET(cmd_hca_cap, hcattr, flow_counter_bulk_alloc);
+	attr->flow_counters_dump = MLX5_GET(cmd_hca_cap, hcattr,
+					    flow_counters_dump);
+	attr->eswitch_manager = MLX5_GET(cmd_hca_cap, hcattr, eswitch_manager);
+	attr->hairpin = MLX5_GET(cmd_hca_cap, hcattr, hairpin);
+	attr->log_max_hairpin_queues = MLX5_GET(cmd_hca_cap, hcattr,
+						log_max_hairpin_queues);
+	attr->log_max_hairpin_wq_data_sz = MLX5_GET(cmd_hca_cap, hcattr,
+						    log_max_hairpin_wq_data_sz);
+	attr->log_max_hairpin_num_packets = MLX5_GET
+		(cmd_hca_cap, hcattr, log_min_hairpin_wq_data_sz);
+	attr->vhca_id = MLX5_GET(cmd_hca_cap, hcattr, vhca_id);
+	attr->eth_net_offloads = MLX5_GET(cmd_hca_cap, hcattr,
+					  eth_net_offloads);
+	attr->eth_virt = MLX5_GET(cmd_hca_cap, hcattr, eth_virt);
+	attr->flex_parser_protocols = MLX5_GET(cmd_hca_cap, hcattr,
+					       flex_parser_protocols);
+	attr->qos.sup = MLX5_GET(cmd_hca_cap, hcattr, qos);
+	if (attr->qos.sup) {
+		MLX5_SET(query_hca_cap_in, in, op_mod,
+			 MLX5_GET_HCA_CAP_OP_MOD_QOS_CAP |
+			 MLX5_HCA_CAP_OPMOD_GET_CUR);
+		rc = mlx5_glue->devx_general_cmd(ctx, in, sizeof(in),
+						 out, sizeof(out));
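+		if (rc)
+			goto error;
+		status = MLX5_GET(query_hca_cap_out, out, status);
+		syndrome = MLX5_GET(query_hca_cap_out, out, syndrome);
+		if (status) {
+			DRV_LOG(DEBUG, "Failed to query devx QOS capabilities,"
+				" status %x, syndrome = %x", status, syndrome);
+			return -1;
+		}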
+		hcattr = MLX5_ADDR_OF(query_hca_cap_out, out, capability);
+		attr->qos.srtcm_sup =
+				MLX5_GET(qos_cap, hcattr, flow_meter_srtcm);
+		attr->qos.log_max_flow_meter =
+				MLX5_GET(qos_cap, hcattr, log_max_flow_meter);
+		attr->qos.flow_meter_reg_c_ids =
+			MLX5_GET(qos_cap, hcattr, flow_meter_reg_id);
+		attr->qos.flow_meter_reg_share =
+			MLX5_GET(qos_cap, hcattr, flow_meter_reg_share);
+	}
+	if (!attr->eth_net_offloads)
+		return 0;
+
+	/* Query HCA offloads for Ethernet protocol. */
+	memset(in, 0, sizeof(in));
+	memset(out, 0, sizeof(out));
+	MLX5_SET(query_hca_cap_in, in, opcode, MLX5_CMD_OP_QUERY_HCA_CAP);
+	MLX5_SET(query_hca_cap_in, in, op_mod,
+		 MLX5_GET_HCA_CAP_OP_MOD_ETHERNET_OFFLOAD_CAPS |
+		 MLX5_HCA_CAP_OPMOD_GET_CUR);
+
+	rc = mlx5_glue->devx_general_cmd(ctx,
+					 in, sizeof(in),
+					 out, sizeof(out));
+	if (rc) {
+		attr->eth_net_offloads = 0;
+		goto error;
+	}
+	status = MLX5_GET(query_hca_cap_out, out, status);
+	syndrome = MLX5_GET(query_hca_cap_out, out, syndrome);
+	if (status) {
+		DRV_LOG(DEBUG, "Failed to query devx HCA capabilities, "
+			"status %x, syndrome = %x",
+			status, syndrome);
+		attr->eth_net_offloads = 0;
+		return -1;
+	}
+	hcattr = MLX5_ADDR_OF(query_hca_cap_out, out, capability);
+	attr->wqe_vlan_insert = MLX5_GET(per_protocol_networking_offload_caps,
+					 hcattr, wqe_vlan_insert);
+	attr->lro_cap = MLX5_GET(per_protocol_networking_offload_caps, hcattr,
+				 lro_cap);
+	attr->tunnel_lro_gre = MLX5_GET(per_protocol_networking_offload_caps,
+					hcattr, tunnel_lro_gre);
+	attr->tunnel_lro_vxlan = MLX5_GET(per_protocol_networking_offload_caps,
+					  hcattr, tunnel_lro_vxlan);
+	attr->lro_max_msg_sz_mode = MLX5_GET
+					(per_protocol_networking_offload_caps,
+					 hcattr, lro_max_msg_sz_mode);
+	for (int i = 0 ; i < MLX5_LRO_NUM_SUPP_PERIODS ; i++) {
+		attr->lro_timer_supported_periods[i] =
+			MLX5_GET(per_protocol_networking_offload_caps, hcattr,
+				 lro_timer_supported_periods[i]);
+	}
+	attr->tunnel_stateless_geneve_rx =
+			    MLX5_GET(per_protocol_networking_offload_caps,
+				     hcattr, tunnel_stateless_geneve_rx);
+	attr->geneve_max_opt_len =
+		    MLX5_GET(per_protocol_networking_offload_caps,
+			     hcattr, max_geneve_opt_len);
+	attr->wqe_inline_mode = MLX5_GET(per_protocol_networking_offload_caps,
+					 hcattr, wqe_inline_mode);
+	attr->tunnel_stateless_gtp = MLX5_GET
+					(per_protocol_networking_offload_caps,
+					 hcattr, tunnel_stateless_gtp);
+	if (attr->wqe_inline_mode != MLX5_CAP_INLINE_MODE_VPORT_CONTEXT)
+		return 0;
+	if (attr->eth_virt) {
+		rc = mlx5_devx_cmd_query_nic_vport_context(ctx, 0, attr);
+		if (rc) {
+			attr->eth_virt = 0;
+			goto error;
+		}
+	}
+	return 0;
+error:
+	rc = (rc > 0) ? -rc : rc;
+	return rc;
+}
+
+/**
+ * Query TIS transport domain from QP verbs object using DevX API.
+ *
+ * @param[in] qp
+ *   Pointer to verbs QP returned by ibv_create_qp.
+ * @param[in] tis_num
+ *   TIS number of TIS to query.
+ * @param[out] tis_td
+ *   Pointer to TIS transport domain variable, to be set by the routine.
+ *
+ * @return
+ *   0 on success, a negative value otherwise.
+ */
+int
+mlx5_devx_cmd_qp_query_tis_td(struct ibv_qp *qp, uint32_t tis_num,
+			      uint32_t *tis_td)
+{
+	uint32_t in[MLX5_ST_SZ_DW(query_tis_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(query_tis_out)] = {0};
+	int rc;
+	void *tis_ctx;
+
+	MLX5_SET(query_tis_in, in, opcode, MLX5_CMD_OP_QUERY_TIS);
+	MLX5_SET(query_tis_in, in, tisn, tis_num);
+	rc = mlx5_glue->devx_qp_query(qp, in, sizeof(in), out, sizeof(out));
+	if (rc) {
+		DRV_LOG(ERR, "Failed to query QP using DevX");
+		return -rc;
+	}
+	tis_ctx = MLX5_ADDR_OF(query_tis_out, out, tis_context);
+	*tis_td = MLX5_GET(tisc, tis_ctx, transport_domain);
+	return 0;
+}
+
+/**
+ * Fill WQ data for DevX API command.
+ * Utility function for use when creating DevX objects containing a WQ.
+ *
+ * @param[in] wq_ctx
+ *   Pointer to WQ context to fill with data.
+ * @param[in] wq_attr
+ *   Pointer to WQ attributes structure to fill in WQ context.
+ */
+static void
+devx_cmd_fill_wq_data(void *wq_ctx, struct mlx5_devx_wq_attr *wq_attr)
+{
+	MLX5_SET(wq, wq_ctx, wq_type, wq_attr->wq_type);
+	MLX5_SET(wq, wq_ctx, wq_signature, wq_attr->wq_signature);
+	MLX5_SET(wq, wq_ctx, end_padding_mode, wq_attr->end_padding_mode);
+	MLX5_SET(wq, wq_ctx, cd_slave, wq_attr->cd_slave);
+	MLX5_SET(wq, wq_ctx, hds_skip_first_sge, wq_attr->hds_skip_first_sge);
+	MLX5_SET(wq, wq_ctx, log2_hds_buf_size, wq_attr->log2_hds_buf_size);
+	MLX5_SET(wq, wq_ctx, page_offset, wq_attr->page_offset);
+	MLX5_SET(wq, wq_ctx, lwm, wq_attr->lwm);
+	MLX5_SET(wq, wq_ctx, pd, wq_attr->pd);
+	MLX5_SET(wq, wq_ctx, uar_page, wq_attr->uar_page);
+	MLX5_SET64(wq, wq_ctx, dbr_addr, wq_attr->dbr_addr);
+	MLX5_SET(wq, wq_ctx, hw_counter, wq_attr->hw_counter);
+	MLX5_SET(wq, wq_ctx, sw_counter, wq_attr->sw_counter);
+	MLX5_SET(wq, wq_ctx, log_wq_stride, wq_attr->log_wq_stride);
+	MLX5_SET(wq, wq_ctx, log_wq_pg_sz, wq_attr->log_wq_pg_sz);
+	MLX5_SET(wq, wq_ctx, log_wq_sz, wq_attr->log_wq_sz);
+	MLX5_SET(wq, wq_ctx, dbr_umem_valid, wq_attr->dbr_umem_valid);
+	MLX5_SET(wq, wq_ctx, wq_umem_valid, wq_attr->wq_umem_valid);
+	MLX5_SET(wq, wq_ctx, log_hairpin_num_packets,
+		 wq_attr->log_hairpin_num_packets);
+	MLX5_SET(wq, wq_ctx, log_hairpin_data_sz, wq_attr->log_hairpin_data_sz);
+	MLX5_SET(wq, wq_ctx, single_wqe_log_num_of_strides,
+		 wq_attr->single_wqe_log_num_of_strides);
+	MLX5_SET(wq, wq_ctx, two_byte_shift_en, wq_attr->two_byte_shift_en);
+	MLX5_SET(wq, wq_ctx, single_stride_log_num_of_bytes,
+		 wq_attr->single_stride_log_num_of_bytes);
+	MLX5_SET(wq, wq_ctx, dbr_umem_id, wq_attr->dbr_umem_id);
+	MLX5_SET(wq, wq_ctx, wq_umem_id, wq_attr->wq_umem_id);
+	MLX5_SET64(wq, wq_ctx, wq_umem_offset, wq_attr->wq_umem_offset);
+}
+
+/**
+ * Create RQ using DevX API.
+ *
+ * @param[in] ctx
+ *   ibv_context returned from mlx5dv_open_device.
+ * @param[in] rq_attr
+ *   Pointer to create RQ attributes structure.
+ * @param[in] socket
+ *   CPU socket ID for allocations.
+ *
+ * @return
+ *   The DevX object created, NULL otherwise and rte_errno is set.
+ */
+struct mlx5_devx_obj *
+mlx5_devx_cmd_create_rq(struct ibv_context *ctx,
+			struct mlx5_devx_create_rq_attr *rq_attr,
+			int socket)
+{
+	uint32_t in[MLX5_ST_SZ_DW(create_rq_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(create_rq_out)] = {0};
+	void *rq_ctx, *wq_ctx;
+	struct mlx5_devx_wq_attr *wq_attr;
+	struct mlx5_devx_obj *rq = NULL;
+
+	rq = rte_calloc_socket(__func__, 1, sizeof(*rq), 0, socket);
+	if (!rq) {
+		DRV_LOG(ERR, "Failed to allocate RQ data");
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	MLX5_SET(create_rq_in, in, opcode, MLX5_CMD_OP_CREATE_RQ);
+	rq_ctx = MLX5_ADDR_OF(create_rq_in, in, ctx);
+	MLX5_SET(rqc, rq_ctx, rlky, rq_attr->rlky);
+	MLX5_SET(rqc, rq_ctx, delay_drop_en, rq_attr->delay_drop_en);
+	MLX5_SET(rqc, rq_ctx, scatter_fcs, rq_attr->scatter_fcs);
+	MLX5_SET(rqc, rq_ctx, vsd, rq_attr->vsd);
+	MLX5_SET(rqc, rq_ctx, mem_rq_type, rq_attr->mem_rq_type);
+	MLX5_SET(rqc, rq_ctx, state, rq_attr->state);
+	MLX5_SET(rqc, rq_ctx, flush_in_error_en, rq_attr->flush_in_error_en);
+	MLX5_SET(rqc, rq_ctx, hairpin, rq_attr->hairpin);
+	MLX5_SET(rqc, rq_ctx, user_index, rq_attr->user_index);
+	MLX5_SET(rqc, rq_ctx, cqn, rq_attr->cqn);
+	MLX5_SET(rqc, rq_ctx, counter_set_id, rq_attr->counter_set_id);
+	MLX5_SET(rqc, rq_ctx, rmpn, rq_attr->rmpn);
+	wq_ctx = MLX5_ADDR_OF(rqc, rq_ctx, wq);
+	wq_attr = &rq_attr->wq_attr;
+	devx_cmd_fill_wq_data(wq_ctx, wq_attr);
+	rq->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in),
+					     out, sizeof(out));
+	if (!rq->obj) {
+		DRV_LOG(ERR, "Failed to create RQ using DevX");
+		rte_errno = errno;
+		rte_free(rq);
+		return NULL;
+	}
+	rq->id = MLX5_GET(create_rq_out, out, rqn);
+	return rq;
+}
+
+/**
+ * Modify RQ using DevX API.
+ *
+ * @param[in] rq
+ *   Pointer to RQ object structure.
+ * @param[in] rq_attr
+ *   Pointer to modify RQ attributes structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_devx_cmd_modify_rq(struct mlx5_devx_obj *rq,
+			struct mlx5_devx_modify_rq_attr *rq_attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(modify_rq_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(modify_rq_out)] = {0};
+	void *rq_ctx, *wq_ctx;
+	int ret;
+
+	MLX5_SET(modify_rq_in, in, opcode, MLX5_CMD_OP_MODIFY_RQ);
+	MLX5_SET(modify_rq_in, in, rq_state, rq_attr->rq_state);
+	MLX5_SET(modify_rq_in, in, rqn, rq->id);
+	MLX5_SET64(modify_rq_in, in, modify_bitmask, rq_attr->modify_bitmask);
+	rq_ctx = MLX5_ADDR_OF(modify_rq_in, in, ctx);
+	MLX5_SET(rqc, rq_ctx, state, rq_attr->state);
+	if (rq_attr->modify_bitmask &
+			MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_SCATTER_FCS)
+		MLX5_SET(rqc, rq_ctx, scatter_fcs, rq_attr->scatter_fcs);
+	if (rq_attr->modify_bitmask & MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_VSD)
+		MLX5_SET(rqc, rq_ctx, vsd, rq_attr->vsd);
+	if (rq_attr->modify_bitmask &
+			MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_RQ_COUNTER_SET_ID)
+		MLX5_SET(rqc, rq_ctx, counter_set_id, rq_attr->counter_set_id);
+	MLX5_SET(rqc, rq_ctx, hairpin_peer_sq, rq_attr->hairpin_peer_sq);
+	MLX5_SET(rqc, rq_ctx, hairpin_peer_vhca, rq_attr->hairpin_peer_vhca);
+	if (rq_attr->modify_bitmask & MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_WQ_LWM) {
+		wq_ctx = MLX5_ADDR_OF(rqc, rq_ctx, wq);
+		MLX5_SET(wq, wq_ctx, lwm, rq_attr->lwm);
+	}
+	ret = mlx5_glue->devx_obj_modify(rq->obj, in, sizeof(in),
+					 out, sizeof(out));
+	if (ret) {
+		DRV_LOG(ERR, "Failed to modify RQ using DevX");
+		rte_errno = errno;
+		return -errno;
+	}
+	return ret;
+}
+
+/**
+ * Create TIR using DevX API.
+ *
+ * @param[in] ctx
+ *   ibv_context returned from mlx5dv_open_device.
+ * @param[in] tir_attr
+ *   Pointer to TIR attributes structure.
+ *
+ * @return
+ *   The DevX object created, NULL otherwise and rte_errno is set.
+ */
+struct mlx5_devx_obj *
+mlx5_devx_cmd_create_tir(struct ibv_context *ctx,
+			 struct mlx5_devx_tir_attr *tir_attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(create_tir_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(create_tir_out)] = {0};
+	void *tir_ctx, *outer, *inner;
+	struct mlx5_devx_obj *tir = NULL;
+	int i;
+
+	tir = rte_calloc(__func__, 1, sizeof(*tir), 0);
+	if (!tir) {
+		DRV_LOG(ERR, "Failed to allocate TIR data");
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	MLX5_SET(create_tir_in, in, opcode, MLX5_CMD_OP_CREATE_TIR);
+	tir_ctx = MLX5_ADDR_OF(create_tir_in, in, ctx);
+	MLX5_SET(tirc, tir_ctx, disp_type, tir_attr->disp_type);
+	MLX5_SET(tirc, tir_ctx, lro_timeout_period_usecs,
+		 tir_attr->lro_timeout_period_usecs);
+	MLX5_SET(tirc, tir_ctx, lro_enable_mask, tir_attr->lro_enable_mask);
+	MLX5_SET(tirc, tir_ctx, lro_max_msg_sz, tir_attr->lro_max_msg_sz);
+	MLX5_SET(tirc, tir_ctx, inline_rqn, tir_attr->inline_rqn);
+	MLX5_SET(tirc, tir_ctx, rx_hash_symmetric, tir_attr->rx_hash_symmetric);
+	MLX5_SET(tirc, tir_ctx, tunneled_offload_en,
+		 tir_attr->tunneled_offload_en);
+	MLX5_SET(tirc, tir_ctx, indirect_table, tir_attr->indirect_table);
+	MLX5_SET(tirc, tir_ctx, rx_hash_fn, tir_attr->rx_hash_fn);
+	MLX5_SET(tirc, tir_ctx, self_lb_block, tir_attr->self_lb_block);
+	MLX5_SET(tirc, tir_ctx, transport_domain, tir_attr->transport_domain);
+	for (i = 0; i < 10; i++) {
+		MLX5_SET(tirc, tir_ctx, rx_hash_toeplitz_key[i],
+			 tir_attr->rx_hash_toeplitz_key[i]);
+	}
+	outer = MLX5_ADDR_OF(tirc, tir_ctx, rx_hash_field_selector_outer);
+	MLX5_SET(rx_hash_field_select, outer, l3_prot_type,
+		 tir_attr->rx_hash_field_selector_outer.l3_prot_type);
+	MLX5_SET(rx_hash_field_select, outer, l4_prot_type,
+		 tir_attr->rx_hash_field_selector_outer.l4_prot_type);
+	MLX5_SET(rx_hash_field_select, outer, selected_fields,
+		 tir_attr->rx_hash_field_selector_outer.selected_fields);
+	inner = MLX5_ADDR_OF(tirc, tir_ctx, rx_hash_field_selector_inner);
+	MLX5_SET(rx_hash_field_select, inner, l3_prot_type,
+		 tir_attr->rx_hash_field_selector_inner.l3_prot_type);
+	MLX5_SET(rx_hash_field_select, inner, l4_prot_type,
+		 tir_attr->rx_hash_field_selector_inner.l4_prot_type);
+	MLX5_SET(rx_hash_field_select, inner, selected_fields,
+		 tir_attr->rx_hash_field_selector_inner.selected_fields);
+	tir->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in),
+					      out, sizeof(out));
+	if (!tir->obj) {
+		DRV_LOG(ERR, "Failed to create TIR using DevX");
+		rte_errno = errno;
+		rte_free(tir);
+		return NULL;
+	}
+	tir->id = MLX5_GET(create_tir_out, out, tirn);
+	return tir;
+}
+
+/**
+ * Create RQT using DevX API.
+ *
+ * @param[in] ctx
+ *   ibv_context returned from mlx5dv_open_device.
+ * @param[in] rqt_attr
+ *   Pointer to RQT attributes structure.
+ *
+ * @return
+ *   The DevX object created, NULL otherwise and rte_errno is set.
+ */
+struct mlx5_devx_obj *
+mlx5_devx_cmd_create_rqt(struct ibv_context *ctx,
+			 struct mlx5_devx_rqt_attr *rqt_attr)
+{
+	uint32_t *in = NULL;
+	uint32_t inlen = MLX5_ST_SZ_BYTES(create_rqt_in) +
+			 rqt_attr->rqt_actual_size * sizeof(uint32_t);
+	uint32_t out[MLX5_ST_SZ_DW(create_rqt_out)] = {0};
+	void *rqt_ctx;
+	struct mlx5_devx_obj *rqt = NULL;
+	int i;
+
+	in = rte_calloc(__func__, 1, inlen, 0);
+	if (!in) {
+		DRV_LOG(ERR, "Failed to allocate RQT IN data");
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	rqt = rte_calloc(__func__, 1, sizeof(*rqt), 0);
+	if (!rqt) {
+		DRV_LOG(ERR, "Failed to allocate RQT data");
+		rte_errno = ENOMEM;
+		rte_free(in);
+		return NULL;
+	}
+	MLX5_SET(create_rqt_in, in, opcode, MLX5_CMD_OP_CREATE_RQT);
+	rqt_ctx = MLX5_ADDR_OF(create_rqt_in, in, rqt_context);
+	MLX5_SET(rqtc, rqt_ctx, rqt_max_size, rqt_attr->rqt_max_size);
+	MLX5_SET(rqtc, rqt_ctx, rqt_actual_size, rqt_attr->rqt_actual_size);
+	for (i = 0; i < rqt_attr->rqt_actual_size; i++)
+		MLX5_SET(rqtc, rqt_ctx, rq_num[i], rqt_attr->rq_list[i]);
+	rqt->obj = mlx5_glue->devx_obj_create(ctx, in, inlen, out, sizeof(out));
+	rte_free(in);
+	if (!rqt->obj) {
+		DRV_LOG(ERR, "Failed to create RQT using DevX");
+		rte_errno = errno;
+		rte_free(rqt);
+		return NULL;
+	}
+	rqt->id = MLX5_GET(create_rqt_out, out, rqtn);
+	return rqt;
+}
+
+/**
+ * Create SQ using DevX API.
+ *
+ * @param[in] ctx
+ *   ibv_context returned from mlx5dv_open_device.
+ * @param[in] sq_attr
+ *   Pointer to SQ attributes structure.
+ * @param[in] socket
+ *   CPU socket ID for allocations.
+ *
+ * @return
+ *   The DevX object created, NULL otherwise and rte_errno is set.
+ */
+struct mlx5_devx_obj *
+mlx5_devx_cmd_create_sq(struct ibv_context *ctx,
+			struct mlx5_devx_create_sq_attr *sq_attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(create_sq_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(create_sq_out)] = {0};
+	void *sq_ctx;
+	void *wq_ctx;
+	struct mlx5_devx_wq_attr *wq_attr;
+	struct mlx5_devx_obj *sq = NULL;
+
+	sq = rte_calloc(__func__, 1, sizeof(*sq), 0);
+	if (!sq) {
+		DRV_LOG(ERR, "Failed to allocate SQ data");
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	MLX5_SET(create_sq_in, in, opcode, MLX5_CMD_OP_CREATE_SQ);
+	sq_ctx = MLX5_ADDR_OF(create_sq_in, in, ctx);
+	MLX5_SET(sqc, sq_ctx, rlky, sq_attr->rlky);
+	MLX5_SET(sqc, sq_ctx, cd_master, sq_attr->cd_master);
+	MLX5_SET(sqc, sq_ctx, fre, sq_attr->fre);
+	MLX5_SET(sqc, sq_ctx, flush_in_error_en, sq_attr->flush_in_error_en);
+	MLX5_SET(sqc, sq_ctx, allow_multi_pkt_send_wqe,
+		 sq_attr->allow_multi_pkt_send_wqe);
+	MLX5_SET(sqc, sq_ctx, min_wqe_inline_mode,
+		 sq_attr->min_wqe_inline_mode);
+	MLX5_SET(sqc, sq_ctx, state, sq_attr->state);
+	MLX5_SET(sqc, sq_ctx, reg_umr, sq_attr->reg_umr);
+	MLX5_SET(sqc, sq_ctx, allow_swp, sq_attr->allow_swp);
+	MLX5_SET(sqc, sq_ctx, hairpin, sq_attr->hairpin);
+	MLX5_SET(sqc, sq_ctx, user_index, sq_attr->user_index);
+	MLX5_SET(sqc, sq_ctx, cqn, sq_attr->cqn);
+	MLX5_SET(sqc, sq_ctx, packet_pacing_rate_limit_index,
+		 sq_attr->packet_pacing_rate_limit_index);
+	MLX5_SET(sqc, sq_ctx, tis_lst_sz, sq_attr->tis_lst_sz);
+	MLX5_SET(sqc, sq_ctx, tis_num_0, sq_attr->tis_num);
+	wq_ctx = MLX5_ADDR_OF(sqc, sq_ctx, wq);
+	wq_attr = &sq_attr->wq_attr;
+	devx_cmd_fill_wq_data(wq_ctx, wq_attr);
+	sq->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in),
+					     out, sizeof(out));
+	if (!sq->obj) {
+		DRV_LOG(ERR, "Failed to create SQ using DevX");
+		rte_errno = errno;
+		rte_free(sq);
+		return NULL;
+	}
+	sq->id = MLX5_GET(create_sq_out, out, sqn);
+	return sq;
+}
+
+/**
+ * Modify SQ using DevX API.
+ *
+ * @param[in] sq
+ *   Pointer to SQ object structure.
+ * @param[in] sq_attr
+ *   Pointer to SQ attributes structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_devx_cmd_modify_sq(struct mlx5_devx_obj *sq,
+			struct mlx5_devx_modify_sq_attr *sq_attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(modify_sq_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(modify_sq_out)] = {0};
+	void *sq_ctx;
+	int ret;
+
+	MLX5_SET(modify_sq_in, in, opcode, MLX5_CMD_OP_MODIFY_SQ);
+	MLX5_SET(modify_sq_in, in, sq_state, sq_attr->sq_state);
+	MLX5_SET(modify_sq_in, in, sqn, sq->id);
+	sq_ctx = MLX5_ADDR_OF(modify_sq_in, in, ctx);
+	MLX5_SET(sqc, sq_ctx, state, sq_attr->state);
+	MLX5_SET(sqc, sq_ctx, hairpin_peer_rq, sq_attr->hairpin_peer_rq);
+	MLX5_SET(sqc, sq_ctx, hairpin_peer_vhca, sq_attr->hairpin_peer_vhca);
+	ret = mlx5_glue->devx_obj_modify(sq->obj, in, sizeof(in),
+					 out, sizeof(out));
+	if (ret) {
+		DRV_LOG(ERR, "Failed to modify SQ using DevX");
+		rte_errno = errno;
+		return -errno;
+	}
+	return ret;
+}
+
+/**
+ * Create TIS using DevX API.
+ *
+ * @param[in] ctx
+ *   ibv_context returned from mlx5dv_open_device.
+ * @param[in] tis_attr
+ *   Pointer to TIS attributes structure.
+ *
+ * @return
+ *   The DevX object created, NULL otherwise and rte_errno is set.
+ */
+struct mlx5_devx_obj *
+mlx5_devx_cmd_create_tis(struct ibv_context *ctx,
+			 struct mlx5_devx_tis_attr *tis_attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(create_tis_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(create_tis_out)] = {0};
+	struct mlx5_devx_obj *tis = NULL;
+	void *tis_ctx;
+
+	tis = rte_calloc(__func__, 1, sizeof(*tis), 0);
+	if (!tis) {
+		DRV_LOG(ERR, "Failed to allocate TIS object");
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	MLX5_SET(create_tis_in, in, opcode, MLX5_CMD_OP_CREATE_TIS);
+	tis_ctx = MLX5_ADDR_OF(create_tis_in, in, ctx);
+	MLX5_SET(tisc, tis_ctx, strict_lag_tx_port_affinity,
+		 tis_attr->strict_lag_tx_port_affinity);
+	MLX5_SET(tisc, tis_ctx, lag_tx_port_affinity,
+		 tis_attr->lag_tx_port_affinity);
+	MLX5_SET(tisc, tis_ctx, prio, tis_attr->prio);
+	MLX5_SET(tisc, tis_ctx, transport_domain,
+		 tis_attr->transport_domain);
+	tis->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in),
+					      out, sizeof(out));
+	if (!tis->obj) {
+		DRV_LOG(ERR, "Failed to create TIS using DevX");
+		rte_errno = errno;
+		rte_free(tis);
+		return NULL;
+	}
+	tis->id = MLX5_GET(create_tis_out, out, tisn);
+	return tis;
+}
+
+/**
+ * Create transport domain using DevX API.
+ *
+ * @param[in] ctx
+ *   ibv_context returned from mlx5dv_open_device.
+ *
+ * @return
+ *   The DevX object created, NULL otherwise and rte_errno is set.
+ */
+struct mlx5_devx_obj *
+mlx5_devx_cmd_create_td(struct ibv_context *ctx)
+{
+	uint32_t in[MLX5_ST_SZ_DW(alloc_transport_domain_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(alloc_transport_domain_out)] = {0};
+	struct mlx5_devx_obj *td = NULL;
+
+	td = rte_calloc(__func__, 1, sizeof(*td), 0);
+	if (!td) {
+		DRV_LOG(ERR, "Failed to allocate TD object");
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	MLX5_SET(alloc_transport_domain_in, in, opcode,
+		 MLX5_CMD_OP_ALLOC_TRANSPORT_DOMAIN);
+	td->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in),
+					     out, sizeof(out));
+	if (!td->obj) {
+		DRV_LOG(ERR, "Failed to create TD using DevX");
+		rte_errno = errno;
+		rte_free(td);
+		return NULL;
+	}
+	td->id = MLX5_GET(alloc_transport_domain_out, out,
+			   transport_domain);
+	return td;
+}
+
+/**
+ * Dump all flows to file.
+ *
+ * @param[in] fdb_domain
+ *   FDB domain.
+ * @param[in] rx_domain
+ *   RX domain.
+ * @param[in] tx_domain
+ *   TX domain.
+ * @param[out] file
+ *   Pointer to file stream.
+ *
+ * @return
+ *   0 on success, a negative value otherwise.
+ */
+int
+mlx5_devx_cmd_flow_dump(void *fdb_domain __rte_unused,
+			void *rx_domain __rte_unused,
+			void *tx_domain __rte_unused, FILE *file __rte_unused)
+{
+	int ret = 0;
+
+#ifdef HAVE_MLX5_DR_FLOW_DUMP
+	if (fdb_domain) {
+		ret = mlx5_glue->dr_dump_domain(file, fdb_domain);
+		if (ret)
+			return ret;
+	}
+	assert(rx_domain);
+	ret = mlx5_glue->dr_dump_domain(file, rx_domain);
+	if (ret)
+		return ret;
+	assert(tx_domain);
+	ret = mlx5_glue->dr_dump_domain(file, tx_domain);
+#else
+	ret = ENOTSUP;
+#endif
+	return -ret;
+}
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
new file mode 100644
index 0000000..2d58d96
--- /dev/null
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -0,0 +1,231 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2019 Mellanox Technologies, Ltd
+ */
+
+#ifndef RTE_PMD_MLX5_DEVX_CMDS_H_
+#define RTE_PMD_MLX5_DEVX_CMDS_H_
+
+#include "mlx5_glue.h"
+
+
+/* DevX creation object. */
+struct mlx5_devx_obj {
+	struct mlx5dv_devx_obj *obj; /* The DV object. */
+	int id; /* The object ID. */
+};
+
+struct mlx5_devx_mkey_attr {
+	uint64_t addr;
+	uint64_t size;
+	uint32_t umem_id;
+	uint32_t pd;
+};
+
+/* HCA QoS attributes. */
+struct mlx5_hca_qos_attr {
+	uint32_t sup:1;	/* Whether QoS is supported. */
+	uint32_t srtcm_sup:1; /* Whether srTCM mode is supported. */
+	uint32_t flow_meter_reg_share:1;
+	/* Whether reg_c share is supported. */
+	uint8_t log_max_flow_meter;
+	/* Power of the maximum supported meters. */
+	uint8_t flow_meter_reg_c_ids;
+	/* Bitmap of the reg_Cs available for flow meter to use. */
+
+};
+
+/* HCA supports this number of time periods for LRO. */
+#define MLX5_LRO_NUM_SUPP_PERIODS 4
+
+/* HCA attributes. */
+struct mlx5_hca_attr {
+	uint32_t eswitch_manager:1;
+	uint32_t flow_counters_dump:1;
+	uint8_t flow_counter_bulk_alloc_bitmap;
+	uint32_t eth_net_offloads:1;
+	uint32_t eth_virt:1;
+	uint32_t wqe_vlan_insert:1;
+	uint32_t wqe_inline_mode:2;
+	uint32_t vport_inline_mode:3;
+	uint32_t tunnel_stateless_geneve_rx:1;
+	uint32_t geneve_max_opt_len:1; /* 0x0: 14DW, 0x1: 63DW */
+	uint32_t tunnel_stateless_gtp:1;
+	uint32_t lro_cap:1;
+	uint32_t tunnel_lro_gre:1;
+	uint32_t tunnel_lro_vxlan:1;
+	uint32_t lro_max_msg_sz_mode:2;
+	uint32_t lro_timer_supported_periods[MLX5_LRO_NUM_SUPP_PERIODS];
+	uint32_t flex_parser_protocols;
+	uint32_t hairpin:1;
+	uint32_t log_max_hairpin_queues:5;
+	uint32_t log_max_hairpin_wq_data_sz:5;
+	uint32_t log_max_hairpin_num_packets:5;
+	uint32_t vhca_id:16;
+	struct mlx5_hca_qos_attr qos;
+};
+
+struct mlx5_devx_wq_attr {
+	uint32_t wq_type:4;
+	uint32_t wq_signature:1;
+	uint32_t end_padding_mode:2;
+	uint32_t cd_slave:1;
+	uint32_t hds_skip_first_sge:1;
+	uint32_t log2_hds_buf_size:3;
+	uint32_t page_offset:5;
+	uint32_t lwm:16;
+	uint32_t pd:24;
+	uint32_t uar_page:24;
+	uint64_t dbr_addr;
+	uint32_t hw_counter;
+	uint32_t sw_counter;
+	uint32_t log_wq_stride:4;
+	uint32_t log_wq_pg_sz:5;
+	uint32_t log_wq_sz:5;
+	uint32_t dbr_umem_valid:1;
+	uint32_t wq_umem_valid:1;
+	uint32_t log_hairpin_num_packets:5;
+	uint32_t log_hairpin_data_sz:5;
+	uint32_t single_wqe_log_num_of_strides:4;
+	uint32_t two_byte_shift_en:1;
+	uint32_t single_stride_log_num_of_bytes:3;
+	uint32_t dbr_umem_id;
+	uint32_t wq_umem_id;
+	uint64_t wq_umem_offset;
+};
+
+/* Create RQ attributes structure, used by create RQ operation. */
+struct mlx5_devx_create_rq_attr {
+	uint32_t rlky:1;
+	uint32_t delay_drop_en:1;
+	uint32_t scatter_fcs:1;
+	uint32_t vsd:1;
+	uint32_t mem_rq_type:4;
+	uint32_t state:4;
+	uint32_t flush_in_error_en:1;
+	uint32_t hairpin:1;
+	uint32_t user_index:24;
+	uint32_t cqn:24;
+	uint32_t counter_set_id:8;
+	uint32_t rmpn:24;
+	struct mlx5_devx_wq_attr wq_attr;
+};
+
+/* Modify RQ attributes structure, used by modify RQ operation. */
+struct mlx5_devx_modify_rq_attr {
+	uint32_t rqn:24;
+	uint32_t rq_state:4; /* Current RQ state. */
+	uint32_t state:4; /* Required RQ state. */
+	uint32_t scatter_fcs:1;
+	uint32_t vsd:1;
+	uint32_t counter_set_id:8;
+	uint32_t hairpin_peer_sq:24;
+	uint32_t hairpin_peer_vhca:16;
+	uint64_t modify_bitmask;
+	uint32_t lwm:16; /* Contained WQ lwm. */
+};
+
+struct mlx5_rx_hash_field_select {
+	uint32_t l3_prot_type:1;
+	uint32_t l4_prot_type:1;
+	uint32_t selected_fields:30;
+};
+
+/* TIR attributes structure, used by TIR operations. */
+struct mlx5_devx_tir_attr {
+	uint32_t disp_type:4;
+	uint32_t lro_timeout_period_usecs:16;
+	uint32_t lro_enable_mask:4;
+	uint32_t lro_max_msg_sz:8;
+	uint32_t inline_rqn:24;
+	uint32_t rx_hash_symmetric:1;
+	uint32_t tunneled_offload_en:1;
+	uint32_t indirect_table:24;
+	uint32_t rx_hash_fn:4;
+	uint32_t self_lb_block:2;
+	uint32_t transport_domain:24;
+	uint32_t rx_hash_toeplitz_key[10];
+	struct mlx5_rx_hash_field_select rx_hash_field_selector_outer;
+	struct mlx5_rx_hash_field_select rx_hash_field_selector_inner;
+};
+
+/* RQT attributes structure, used by RQT operations. */
+struct mlx5_devx_rqt_attr {
+	uint32_t rqt_max_size:16;
+	uint32_t rqt_actual_size:16;
+	uint32_t rq_list[];
+};
+
+/* TIS attributes structure. */
+struct mlx5_devx_tis_attr {
+	uint32_t strict_lag_tx_port_affinity:1;
+	uint32_t tls_en:1;
+	uint32_t lag_tx_port_affinity:4;
+	uint32_t prio:4;
+	uint32_t transport_domain:24;
+};
+
+/* SQ attributes structure, used by SQ create operation. */
+struct mlx5_devx_create_sq_attr {
+	uint32_t rlky:1;
+	uint32_t cd_master:1;
+	uint32_t fre:1;
+	uint32_t flush_in_error_en:1;
+	uint32_t allow_multi_pkt_send_wqe:1;
+	uint32_t min_wqe_inline_mode:3;
+	uint32_t state:4;
+	uint32_t reg_umr:1;
+	uint32_t allow_swp:1;
+	uint32_t hairpin:1;
+	uint32_t user_index:24;
+	uint32_t cqn:24;
+	uint32_t packet_pacing_rate_limit_index:16;
+	uint32_t tis_lst_sz:16;
+	uint32_t tis_num:24;
+	struct mlx5_devx_wq_attr wq_attr;
+};
+
+/* SQ attributes structure, used by SQ modify operation. */
+struct mlx5_devx_modify_sq_attr {
+	uint32_t sq_state:4;
+	uint32_t state:4;
+	uint32_t hairpin_peer_rq:24;
+	uint32_t hairpin_peer_vhca:16;
+};
+
+/* mlx5_devx_cmds.c */
+
+struct mlx5_devx_obj *mlx5_devx_cmd_flow_counter_alloc(struct ibv_context *ctx,
+						       uint32_t bulk_sz);
+int mlx5_devx_cmd_destroy(struct mlx5_devx_obj *obj);
+int mlx5_devx_cmd_flow_counter_query(struct mlx5_devx_obj *dcs,
+				     int clear, uint32_t n_counters,
+				     uint64_t *pkts, uint64_t *bytes,
+				     uint32_t mkey, void *addr,
+				     struct mlx5dv_devx_cmd_comp *cmd_comp,
+				     uint64_t async_id);
+int mlx5_devx_cmd_query_hca_attr(struct ibv_context *ctx,
+				 struct mlx5_hca_attr *attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_mkey_create(struct ibv_context *ctx,
+					      struct mlx5_devx_mkey_attr *attr);
+int mlx5_devx_get_out_command_status(void *out);
+int mlx5_devx_cmd_qp_query_tis_td(struct ibv_qp *qp, uint32_t tis_num,
+				  uint32_t *tis_td);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_rq(struct ibv_context *ctx,
+				       struct mlx5_devx_create_rq_attr *rq_attr,
+				       int socket);
+int mlx5_devx_cmd_modify_rq(struct mlx5_devx_obj *rq,
+			    struct mlx5_devx_modify_rq_attr *rq_attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_tir(struct ibv_context *ctx,
+					   struct mlx5_devx_tir_attr *tir_attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_rqt(struct ibv_context *ctx,
+					   struct mlx5_devx_rqt_attr *rqt_attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_sq(struct ibv_context *ctx,
+				      struct mlx5_devx_create_sq_attr *sq_attr);
+int mlx5_devx_cmd_modify_sq(struct mlx5_devx_obj *sq,
+			    struct mlx5_devx_modify_sq_attr *sq_attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_tis(struct ibv_context *ctx,
+					   struct mlx5_devx_tis_attr *tis_attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_td(struct ibv_context *ctx);
+int mlx5_devx_cmd_flow_dump(void *fdb_domain, void *rx_domain, void *tx_domain,
+			    FILE *file);
+#endif /* RTE_PMD_MLX5_DEVX_CMDS_H_ */
diff --git a/drivers/common/mlx5/mlx5_glue.c b/drivers/common/mlx5/mlx5_glue.c
new file mode 100644
index 0000000..d5bc84e
--- /dev/null
+++ b/drivers/common/mlx5/mlx5_glue.c
@@ -0,0 +1,1138 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2018 6WIND S.A.
+ * Copyright 2018 Mellanox Technologies, Ltd
+ */
+
+#include <errno.h>
+#include <stdalign.h>
+#include <stddef.h>
+#include <stdint.h>
+#include <stdlib.h>
+/*
+ * Not needed by this file; included to work around the lack of off_t
+ * definition for mlx5dv.h with unpatched rdma-core versions.
+ */
+#include <sys/types.h>
+
+#include <rte_config.h>
+
+#include "mlx5_glue.h"
+
+static int
+mlx5_glue_fork_init(void)
+{
+	return ibv_fork_init();
+}
+
+static struct ibv_pd *
+mlx5_glue_alloc_pd(struct ibv_context *context)
+{
+	return ibv_alloc_pd(context);
+}
+
+static int
+mlx5_glue_dealloc_pd(struct ibv_pd *pd)
+{
+	return ibv_dealloc_pd(pd);
+}
+
+static struct ibv_device **
+mlx5_glue_get_device_list(int *num_devices)
+{
+	return ibv_get_device_list(num_devices);
+}
+
+static void
+mlx5_glue_free_device_list(struct ibv_device **list)
+{
+	ibv_free_device_list(list);
+}
+
+static struct ibv_context *
+mlx5_glue_open_device(struct ibv_device *device)
+{
+	return ibv_open_device(device);
+}
+
+static int
+mlx5_glue_close_device(struct ibv_context *context)
+{
+	return ibv_close_device(context);
+}
+
+static int
+mlx5_glue_query_device(struct ibv_context *context,
+		       struct ibv_device_attr *device_attr)
+{
+	return ibv_query_device(context, device_attr);
+}
+
+static int
+mlx5_glue_query_device_ex(struct ibv_context *context,
+			  const struct ibv_query_device_ex_input *input,
+			  struct ibv_device_attr_ex *attr)
+{
+	return ibv_query_device_ex(context, input, attr);
+}
+
+static int
+mlx5_glue_query_rt_values_ex(struct ibv_context *context,
+			     struct ibv_values_ex *values)
+{
+	return ibv_query_rt_values_ex(context, values);
+}
+
+static int
+mlx5_glue_query_port(struct ibv_context *context, uint8_t port_num,
+		     struct ibv_port_attr *port_attr)
+{
+	return ibv_query_port(context, port_num, port_attr);
+}
+
+static struct ibv_comp_channel *
+mlx5_glue_create_comp_channel(struct ibv_context *context)
+{
+	return ibv_create_comp_channel(context);
+}
+
+static int
+mlx5_glue_destroy_comp_channel(struct ibv_comp_channel *channel)
+{
+	return ibv_destroy_comp_channel(channel);
+}
+
+static struct ibv_cq *
+mlx5_glue_create_cq(struct ibv_context *context, int cqe, void *cq_context,
+		    struct ibv_comp_channel *channel, int comp_vector)
+{
+	return ibv_create_cq(context, cqe, cq_context, channel, comp_vector);
+}
+
+static int
+mlx5_glue_destroy_cq(struct ibv_cq *cq)
+{
+	return ibv_destroy_cq(cq);
+}
+
+static int
+mlx5_glue_get_cq_event(struct ibv_comp_channel *channel, struct ibv_cq **cq,
+		       void **cq_context)
+{
+	return ibv_get_cq_event(channel, cq, cq_context);
+}
+
+static void
+mlx5_glue_ack_cq_events(struct ibv_cq *cq, unsigned int nevents)
+{
+	ibv_ack_cq_events(cq, nevents);
+}
+
+static struct ibv_rwq_ind_table *
+mlx5_glue_create_rwq_ind_table(struct ibv_context *context,
+			       struct ibv_rwq_ind_table_init_attr *init_attr)
+{
+	return ibv_create_rwq_ind_table(context, init_attr);
+}
+
+static int
+mlx5_glue_destroy_rwq_ind_table(struct ibv_rwq_ind_table *rwq_ind_table)
+{
+	return ibv_destroy_rwq_ind_table(rwq_ind_table);
+}
+
+static struct ibv_wq *
+mlx5_glue_create_wq(struct ibv_context *context,
+		    struct ibv_wq_init_attr *wq_init_attr)
+{
+	return ibv_create_wq(context, wq_init_attr);
+}
+
+static int
+mlx5_glue_destroy_wq(struct ibv_wq *wq)
+{
+	return ibv_destroy_wq(wq);
+}
+static int
+mlx5_glue_modify_wq(struct ibv_wq *wq, struct ibv_wq_attr *wq_attr)
+{
+	return ibv_modify_wq(wq, wq_attr);
+}
+
+static struct ibv_flow *
+mlx5_glue_create_flow(struct ibv_qp *qp, struct ibv_flow_attr *flow)
+{
+	return ibv_create_flow(qp, flow);
+}
+
+static int
+mlx5_glue_destroy_flow(struct ibv_flow *flow_id)
+{
+	return ibv_destroy_flow(flow_id);
+}
+
+static int
+mlx5_glue_destroy_flow_action(void *action)
+{
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+#ifdef HAVE_MLX5DV_DR
+	return mlx5dv_dr_action_destroy(action);
+#else
+	struct mlx5dv_flow_action_attr *attr = action;
+	int res = 0;
+	switch (attr->type) {
+	case MLX5DV_FLOW_ACTION_TAG:
+		break;
+	default:
+		res = ibv_destroy_flow_action(attr->action);
+		break;
+	}
+	free(action);
+	return res;
+#endif
+#else
+	(void)action;
+	return ENOTSUP;
+#endif
+}
+
+static struct ibv_qp *
+mlx5_glue_create_qp(struct ibv_pd *pd, struct ibv_qp_init_attr *qp_init_attr)
+{
+	return ibv_create_qp(pd, qp_init_attr);
+}
+
+static struct ibv_qp *
+mlx5_glue_create_qp_ex(struct ibv_context *context,
+		       struct ibv_qp_init_attr_ex *qp_init_attr_ex)
+{
+	return ibv_create_qp_ex(context, qp_init_attr_ex);
+}
+
+static int
+mlx5_glue_destroy_qp(struct ibv_qp *qp)
+{
+	return ibv_destroy_qp(qp);
+}
+
+static int
+mlx5_glue_modify_qp(struct ibv_qp *qp, struct ibv_qp_attr *attr, int attr_mask)
+{
+	return ibv_modify_qp(qp, attr, attr_mask);
+}
+
+static struct ibv_mr *
+mlx5_glue_reg_mr(struct ibv_pd *pd, void *addr, size_t length, int access)
+{
+	return ibv_reg_mr(pd, addr, length, access);
+}
+
+static int
+mlx5_glue_dereg_mr(struct ibv_mr *mr)
+{
+	return ibv_dereg_mr(mr);
+}
+
+static struct ibv_counter_set *
+mlx5_glue_create_counter_set(struct ibv_context *context,
+			     struct ibv_counter_set_init_attr *init_attr)
+{
+#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V42
+	(void)context;
+	(void)init_attr;
+	return NULL;
+#else
+	return ibv_create_counter_set(context, init_attr);
+#endif
+}
+
+static int
+mlx5_glue_destroy_counter_set(struct ibv_counter_set *cs)
+{
+#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V42
+	(void)cs;
+	return ENOTSUP;
+#else
+	return ibv_destroy_counter_set(cs);
+#endif
+}
+
+static int
+mlx5_glue_describe_counter_set(struct ibv_context *context,
+			       uint16_t counter_set_id,
+			       struct ibv_counter_set_description *cs_desc)
+{
+#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V42
+	(void)context;
+	(void)counter_set_id;
+	(void)cs_desc;
+	return ENOTSUP;
+#else
+	return ibv_describe_counter_set(context, counter_set_id, cs_desc);
+#endif
+}
+
+static int
+mlx5_glue_query_counter_set(struct ibv_query_counter_set_attr *query_attr,
+			    struct ibv_counter_set_data *cs_data)
+{
+#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V42
+	(void)query_attr;
+	(void)cs_data;
+	return ENOTSUP;
+#else
+	return ibv_query_counter_set(query_attr, cs_data);
+#endif
+}
+
+static struct ibv_counters *
+mlx5_glue_create_counters(struct ibv_context *context,
+			  struct ibv_counters_init_attr *init_attr)
+{
+#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V45
+	(void)context;
+	(void)init_attr;
+	errno = ENOTSUP;
+	return NULL;
+#else
+	return ibv_create_counters(context, init_attr);
+#endif
+}
+
+static int
+mlx5_glue_destroy_counters(struct ibv_counters *counters)
+{
+#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V45
+	(void)counters;
+	return ENOTSUP;
+#else
+	return ibv_destroy_counters(counters);
+#endif
+}
+
+static int
+mlx5_glue_attach_counters(struct ibv_counters *counters,
+			  struct ibv_counter_attach_attr *attr,
+			  struct ibv_flow *flow)
+{
+#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V45
+	(void)counters;
+	(void)attr;
+	(void)flow;
+	return ENOTSUP;
+#else
+	return ibv_attach_counters_point_flow(counters, attr, flow);
+#endif
+}
+
+static int
+mlx5_glue_query_counters(struct ibv_counters *counters,
+			 uint64_t *counters_value,
+			 uint32_t ncounters,
+			 uint32_t flags)
+{
+#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V45
+	(void)counters;
+	(void)counters_value;
+	(void)ncounters;
+	(void)flags;
+	return ENOTSUP;
+#else
+	return ibv_read_counters(counters, counters_value, ncounters, flags);
+#endif
+}
+
+static void
+mlx5_glue_ack_async_event(struct ibv_async_event *event)
+{
+	ibv_ack_async_event(event);
+}
+
+static int
+mlx5_glue_get_async_event(struct ibv_context *context,
+			  struct ibv_async_event *event)
+{
+	return ibv_get_async_event(context, event);
+}
+
+static const char *
+mlx5_glue_port_state_str(enum ibv_port_state port_state)
+{
+	return ibv_port_state_str(port_state);
+}
+
+static struct ibv_cq *
+mlx5_glue_cq_ex_to_cq(struct ibv_cq_ex *cq)
+{
+	return ibv_cq_ex_to_cq(cq);
+}
+
+static void *
+mlx5_glue_dr_create_flow_action_dest_flow_tbl(void *tbl)
+{
+#ifdef HAVE_MLX5DV_DR
+	return mlx5dv_dr_action_create_dest_table(tbl);
+#else
+	(void)tbl;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static void *
+mlx5_glue_dr_create_flow_action_dest_port(void *domain, uint32_t port)
+{
+#ifdef HAVE_MLX5DV_DR_DEVX_PORT
+	return mlx5dv_dr_action_create_dest_ib_port(domain, port);
+#else
+#ifdef HAVE_MLX5DV_DR_ESWITCH
+	return mlx5dv_dr_action_create_dest_vport(domain, port);
+#else
+	(void)domain;
+	(void)port;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+#endif
+}
+
+static void *
+mlx5_glue_dr_create_flow_action_drop(void)
+{
+#ifdef HAVE_MLX5DV_DR_ESWITCH
+	return mlx5dv_dr_action_create_drop();
+#else
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static void *
+mlx5_glue_dr_create_flow_action_push_vlan(struct mlx5dv_dr_domain *domain,
+					  rte_be32_t vlan_tag)
+{
+#ifdef HAVE_MLX5DV_DR_VLAN
+	return mlx5dv_dr_action_create_push_vlan(domain, vlan_tag);
+#else
+	(void)domain;
+	(void)vlan_tag;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static void *
+mlx5_glue_dr_create_flow_action_pop_vlan(void)
+{
+#ifdef HAVE_MLX5DV_DR_VLAN
+	return mlx5dv_dr_action_create_pop_vlan();
+#else
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static void *
+mlx5_glue_dr_create_flow_tbl(void *domain, uint32_t level)
+{
+#ifdef HAVE_MLX5DV_DR
+	return mlx5dv_dr_table_create(domain, level);
+#else
+	(void)domain;
+	(void)level;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static int
+mlx5_glue_dr_destroy_flow_tbl(void *tbl)
+{
+#ifdef HAVE_MLX5DV_DR
+	return mlx5dv_dr_table_destroy(tbl);
+#else
+	(void)tbl;
+	errno = ENOTSUP;
+	return errno;
+#endif
+}
+
+static void *
+mlx5_glue_dr_create_domain(struct ibv_context *ctx,
+			   enum  mlx5dv_dr_domain_type domain)
+{
+#ifdef HAVE_MLX5DV_DR
+	return mlx5dv_dr_domain_create(ctx, domain);
+#else
+	(void)ctx;
+	(void)domain;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static int
+mlx5_glue_dr_destroy_domain(void *domain)
+{
+#ifdef HAVE_MLX5DV_DR
+	return mlx5dv_dr_domain_destroy(domain);
+#else
+	(void)domain;
+	errno = ENOTSUP;
+	return errno;
+#endif
+}
+
+static struct ibv_cq_ex *
+mlx5_glue_dv_create_cq(struct ibv_context *context,
+		       struct ibv_cq_init_attr_ex *cq_attr,
+		       struct mlx5dv_cq_init_attr *mlx5_cq_attr)
+{
+	return mlx5dv_create_cq(context, cq_attr, mlx5_cq_attr);
+}
+
+static struct ibv_wq *
+mlx5_glue_dv_create_wq(struct ibv_context *context,
+		       struct ibv_wq_init_attr *wq_attr,
+		       struct mlx5dv_wq_init_attr *mlx5_wq_attr)
+{
+#ifndef HAVE_IBV_DEVICE_STRIDING_RQ_SUPPORT
+	(void)context;
+	(void)wq_attr;
+	(void)mlx5_wq_attr;
+	errno = ENOTSUP;
+	return NULL;
+#else
+	return mlx5dv_create_wq(context, wq_attr, mlx5_wq_attr);
+#endif
+}
+
+static int
+mlx5_glue_dv_query_device(struct ibv_context *ctx,
+			  struct mlx5dv_context *attrs_out)
+{
+	return mlx5dv_query_device(ctx, attrs_out);
+}
+
+static int
+mlx5_glue_dv_set_context_attr(struct ibv_context *ibv_ctx,
+			      enum mlx5dv_set_ctx_attr_type type, void *attr)
+{
+	return mlx5dv_set_context_attr(ibv_ctx, type, attr);
+}
+
+static int
+mlx5_glue_dv_init_obj(struct mlx5dv_obj *obj, uint64_t obj_type)
+{
+	return mlx5dv_init_obj(obj, obj_type);
+}
+
+static struct ibv_qp *
+mlx5_glue_dv_create_qp(struct ibv_context *context,
+		       struct ibv_qp_init_attr_ex *qp_init_attr_ex,
+		       struct mlx5dv_qp_init_attr *dv_qp_init_attr)
+{
+#ifdef HAVE_IBV_DEVICE_TUNNEL_SUPPORT
+	return mlx5dv_create_qp(context, qp_init_attr_ex, dv_qp_init_attr);
+#else
+	(void)context;
+	(void)qp_init_attr_ex;
+	(void)dv_qp_init_attr;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static void *
+mlx5_glue_dv_create_flow_matcher(struct ibv_context *context,
+				 struct mlx5dv_flow_matcher_attr *matcher_attr,
+				 void *tbl)
+{
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+#ifdef HAVE_MLX5DV_DR
+	(void)context;
+	return mlx5dv_dr_matcher_create(tbl, matcher_attr->priority,
+					matcher_attr->match_criteria_enable,
+					matcher_attr->match_mask);
+#else
+	(void)tbl;
+	return mlx5dv_create_flow_matcher(context, matcher_attr);
+#endif
+#else
+	(void)context;
+	(void)matcher_attr;
+	(void)tbl;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static void *
+mlx5_glue_dv_create_flow(void *matcher,
+			 void *match_value,
+			 size_t num_actions,
+			 void *actions[])
+{
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+#ifdef HAVE_MLX5DV_DR
+	return mlx5dv_dr_rule_create(matcher, match_value, num_actions,
+				     (struct mlx5dv_dr_action **)actions);
+#else
+	struct mlx5dv_flow_action_attr actions_attr[8];
+
+	if (num_actions > 8)
+		return NULL;
+	for (size_t i = 0; i < num_actions; i++)
+		actions_attr[i] =
+			*((struct mlx5dv_flow_action_attr *)(actions[i]));
+	return mlx5dv_create_flow(matcher, match_value,
+				  num_actions, actions_attr);
+#endif
+#else
+	(void)matcher;
+	(void)match_value;
+	(void)num_actions;
+	(void)actions;
+	return NULL;
+#endif
+}
+
+static void *
+mlx5_glue_dv_create_flow_action_counter(void *counter_obj, uint32_t offset)
+{
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+#ifdef HAVE_MLX5DV_DR
+	return mlx5dv_dr_action_create_flow_counter(counter_obj, offset);
+#else
+	struct mlx5dv_flow_action_attr *action;
+
+	(void)offset;
+	action = malloc(sizeof(*action));
+	if (!action)
+		return NULL;
+	action->type = MLX5DV_FLOW_ACTION_COUNTERS_DEVX;
+	action->obj = counter_obj;
+	return action;
+#endif
+#else
+	(void)counter_obj;
+	(void)offset;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static void *
+mlx5_glue_dv_create_flow_action_dest_ibv_qp(void *qp)
+{
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+#ifdef HAVE_MLX5DV_DR
+	return mlx5dv_dr_action_create_dest_ibv_qp(qp);
+#else
+	struct mlx5dv_flow_action_attr *action;
+
+	action = malloc(sizeof(*action));
+	if (!action)
+		return NULL;
+	action->type = MLX5DV_FLOW_ACTION_DEST_IBV_QP;
+	action->obj = qp;
+	return action;
+#endif
+#else
+	(void)qp;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static void *
+mlx5_glue_dv_create_flow_action_dest_devx_tir(void *tir)
+{
+#ifdef HAVE_MLX5DV_DR_ACTION_DEST_DEVX_TIR
+	return mlx5dv_dr_action_create_dest_devx_tir(tir);
+#else
+	(void)tir;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static void *
+mlx5_glue_dv_create_flow_action_modify_header
+					(struct ibv_context *ctx,
+					 enum mlx5dv_flow_table_type ft_type,
+					 void *domain, uint64_t flags,
+					 size_t actions_sz,
+					 uint64_t actions[])
+{
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+#ifdef HAVE_MLX5DV_DR
+	(void)ctx;
+	(void)ft_type;
+	return mlx5dv_dr_action_create_modify_header(domain, flags, actions_sz,
+						     (__be64 *)actions);
+#else
+	struct mlx5dv_flow_action_attr *action;
+
+	(void)domain;
+	(void)flags;
+	action = malloc(sizeof(*action));
+	if (!action)
+		return NULL;
+	action->type = MLX5DV_FLOW_ACTION_IBV_FLOW_ACTION;
+	action->action = mlx5dv_create_flow_action_modify_header
+		(ctx, actions_sz, actions, ft_type);
+	return action;
+#endif
+#else
+	(void)ctx;
+	(void)ft_type;
+	(void)domain;
+	(void)flags;
+	(void)actions_sz;
+	(void)actions;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static void *
+mlx5_glue_dv_create_flow_action_packet_reformat
+		(struct ibv_context *ctx,
+		 enum mlx5dv_flow_action_packet_reformat_type reformat_type,
+		 enum mlx5dv_flow_table_type ft_type,
+		 struct mlx5dv_dr_domain *domain,
+		 uint32_t flags, size_t data_sz, void *data)
+{
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+#ifdef HAVE_MLX5DV_DR
+	(void)ctx;
+	(void)ft_type;
+	return mlx5dv_dr_action_create_packet_reformat(domain, flags,
+						       reformat_type, data_sz,
+						       data);
+#else
+	(void)domain;
+	(void)flags;
+	struct mlx5dv_flow_action_attr *action;
+
+	action = malloc(sizeof(*action));
+	if (!action)
+		return NULL;
+	action->type = MLX5DV_FLOW_ACTION_IBV_FLOW_ACTION;
+	action->action = mlx5dv_create_flow_action_packet_reformat
+		(ctx, data_sz, data, reformat_type, ft_type);
+	return action;
+#endif
+#else
+	(void)ctx;
+	(void)reformat_type;
+	(void)ft_type;
+	(void)domain;
+	(void)flags;
+	(void)data_sz;
+	(void)data;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static void *
+mlx5_glue_dv_create_flow_action_tag(uint32_t tag)
+{
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+#ifdef HAVE_MLX5DV_DR
+	return mlx5dv_dr_action_create_tag(tag);
+#else
+	struct mlx5dv_flow_action_attr *action;
+	action = malloc(sizeof(*action));
+	if (!action)
+		return NULL;
+	action->type = MLX5DV_FLOW_ACTION_TAG;
+	action->tag_value = tag;
+	return action;
+#endif
+#endif
+	(void)tag;
+	errno = ENOTSUP;
+	return NULL;
+}
+
+static void *
+mlx5_glue_dv_create_flow_action_meter(struct mlx5dv_dr_flow_meter_attr *attr)
+{
+#if defined(HAVE_MLX5DV_DR) && defined(HAVE_MLX5_DR_CREATE_ACTION_FLOW_METER)
+	return mlx5dv_dr_action_create_flow_meter(attr);
+#else
+	(void)attr;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static int
+mlx5_glue_dv_modify_flow_action_meter(void *action,
+				      struct mlx5dv_dr_flow_meter_attr *attr,
+				      uint64_t modify_bits)
+{
+#if defined(HAVE_MLX5DV_DR) && defined(HAVE_MLX5_DR_CREATE_ACTION_FLOW_METER)
+	return mlx5dv_dr_action_modify_flow_meter(action, attr, modify_bits);
+#else
+	(void)action;
+	(void)attr;
+	(void)modify_bits;
+	errno = ENOTSUP;
+	return errno;
+#endif
+}
+
+static int
+mlx5_glue_dv_destroy_flow(void *flow_id)
+{
+#ifdef HAVE_MLX5DV_DR
+	return mlx5dv_dr_rule_destroy(flow_id);
+#else
+	return ibv_destroy_flow(flow_id);
+#endif
+}
+
+static int
+mlx5_glue_dv_destroy_flow_matcher(void *matcher)
+{
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+#ifdef HAVE_MLX5DV_DR
+	return mlx5dv_dr_matcher_destroy(matcher);
+#else
+	return mlx5dv_destroy_flow_matcher(matcher);
+#endif
+#else
+	(void)matcher;
+	errno = ENOTSUP;
+	return errno;
+#endif
+}
+
+static struct ibv_context *
+mlx5_glue_dv_open_device(struct ibv_device *device)
+{
+#ifdef HAVE_IBV_DEVX_OBJ
+	return mlx5dv_open_device(device,
+				  &(struct mlx5dv_context_attr){
+					.flags = MLX5DV_CONTEXT_FLAGS_DEVX,
+				  });
+#else
+	(void)device;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static struct mlx5dv_devx_obj *
+mlx5_glue_devx_obj_create(struct ibv_context *ctx,
+			  const void *in, size_t inlen,
+			  void *out, size_t outlen)
+{
+#ifdef HAVE_IBV_DEVX_OBJ
+	return mlx5dv_devx_obj_create(ctx, in, inlen, out, outlen);
+#else
+	(void)ctx;
+	(void)in;
+	(void)inlen;
+	(void)out;
+	(void)outlen;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static int
+mlx5_glue_devx_obj_destroy(struct mlx5dv_devx_obj *obj)
+{
+#ifdef HAVE_IBV_DEVX_OBJ
+	return mlx5dv_devx_obj_destroy(obj);
+#else
+	(void)obj;
+	return -ENOTSUP;
+#endif
+}
+
+static int
+mlx5_glue_devx_obj_query(struct mlx5dv_devx_obj *obj,
+			 const void *in, size_t inlen,
+			 void *out, size_t outlen)
+{
+#ifdef HAVE_IBV_DEVX_OBJ
+	return mlx5dv_devx_obj_query(obj, in, inlen, out, outlen);
+#else
+	(void)obj;
+	(void)in;
+	(void)inlen;
+	(void)out;
+	(void)outlen;
+	return -ENOTSUP;
+#endif
+}
+
+static int
+mlx5_glue_devx_obj_modify(struct mlx5dv_devx_obj *obj,
+			  const void *in, size_t inlen,
+			  void *out, size_t outlen)
+{
+#ifdef HAVE_IBV_DEVX_OBJ
+	return mlx5dv_devx_obj_modify(obj, in, inlen, out, outlen);
+#else
+	(void)obj;
+	(void)in;
+	(void)inlen;
+	(void)out;
+	(void)outlen;
+	return -ENOTSUP;
+#endif
+}
+
+static int
+mlx5_glue_devx_general_cmd(struct ibv_context *ctx,
+			   const void *in, size_t inlen,
+			   void *out, size_t outlen)
+{
+#ifdef HAVE_IBV_DEVX_OBJ
+	return mlx5dv_devx_general_cmd(ctx, in, inlen, out, outlen);
+#else
+	(void)ctx;
+	(void)in;
+	(void)inlen;
+	(void)out;
+	(void)outlen;
+	return -ENOTSUP;
+#endif
+}
+
+static struct mlx5dv_devx_cmd_comp *
+mlx5_glue_devx_create_cmd_comp(struct ibv_context *ctx)
+{
+#ifdef HAVE_IBV_DEVX_ASYNC
+	return mlx5dv_devx_create_cmd_comp(ctx);
+#else
+	(void)ctx;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static void
+mlx5_glue_devx_destroy_cmd_comp(struct mlx5dv_devx_cmd_comp *cmd_comp)
+{
+#ifdef HAVE_IBV_DEVX_ASYNC
+	mlx5dv_devx_destroy_cmd_comp(cmd_comp);
+#else
+	(void)cmd_comp;
+	errno = ENOTSUP;
+#endif
+}
+
+static int
+mlx5_glue_devx_obj_query_async(struct mlx5dv_devx_obj *obj, const void *in,
+			       size_t inlen, size_t outlen, uint64_t wr_id,
+			       struct mlx5dv_devx_cmd_comp *cmd_comp)
+{
+#ifdef HAVE_IBV_DEVX_ASYNC
+	return mlx5dv_devx_obj_query_async(obj, in, inlen, outlen, wr_id,
+					   cmd_comp);
+#else
+	(void)obj;
+	(void)in;
+	(void)inlen;
+	(void)outlen;
+	(void)wr_id;
+	(void)cmd_comp;
+	return -ENOTSUP;
+#endif
+}
+
+static int
+mlx5_glue_devx_get_async_cmd_comp(struct mlx5dv_devx_cmd_comp *cmd_comp,
+				  struct mlx5dv_devx_async_cmd_hdr *cmd_resp,
+				  size_t cmd_resp_len)
+{
+#ifdef HAVE_IBV_DEVX_ASYNC
+	return mlx5dv_devx_get_async_cmd_comp(cmd_comp, cmd_resp,
+					      cmd_resp_len);
+#else
+	(void)cmd_comp;
+	(void)cmd_resp;
+	(void)cmd_resp_len;
+	return -ENOTSUP;
+#endif
+}
+
+static struct mlx5dv_devx_umem *
+mlx5_glue_devx_umem_reg(struct ibv_context *context, void *addr, size_t size,
+			uint32_t access)
+{
+#ifdef HAVE_IBV_DEVX_OBJ
+	return mlx5dv_devx_umem_reg(context, addr, size, access);
+#else
+	(void)context;
+	(void)addr;
+	(void)size;
+	(void)access;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static int
+mlx5_glue_devx_umem_dereg(struct mlx5dv_devx_umem *dv_devx_umem)
+{
+#ifdef HAVE_IBV_DEVX_OBJ
+	return mlx5dv_devx_umem_dereg(dv_devx_umem);
+#else
+	(void)dv_devx_umem;
+	return -ENOTSUP;
+#endif
+}
+
+static int
+mlx5_glue_devx_qp_query(struct ibv_qp *qp,
+			const void *in, size_t inlen,
+			void *out, size_t outlen)
+{
+#ifdef HAVE_IBV_DEVX_OBJ
+	return mlx5dv_devx_qp_query(qp, in, inlen, out, outlen);
+#else
+	(void)qp;
+	(void)in;
+	(void)inlen;
+	(void)out;
+	(void)outlen;
+	errno = ENOTSUP;
+	return errno;
+#endif
+}
+
+static int
+mlx5_glue_devx_port_query(struct ibv_context *ctx,
+			  uint32_t port_num,
+			  struct mlx5dv_devx_port *mlx5_devx_port)
+{
+#ifdef HAVE_MLX5DV_DR_DEVX_PORT
+	return mlx5dv_query_devx_port(ctx, port_num, mlx5_devx_port);
+#else
+	(void)ctx;
+	(void)port_num;
+	(void)mlx5_devx_port;
+	errno = ENOTSUP;
+	return errno;
+#endif
+}
+
+static int
+mlx5_glue_dr_dump_domain(FILE *file, void *domain)
+{
+#ifdef HAVE_MLX5_DR_FLOW_DUMP
+	return mlx5dv_dump_dr_domain(file, domain);
+#else
+	RTE_SET_USED(file);
+	RTE_SET_USED(domain);
+	return -ENOTSUP;
+#endif
+}
+
+alignas(RTE_CACHE_LINE_SIZE)
+const struct mlx5_glue *mlx5_glue = &(const struct mlx5_glue){
+	.version = MLX5_GLUE_VERSION,
+	.fork_init = mlx5_glue_fork_init,
+	.alloc_pd = mlx5_glue_alloc_pd,
+	.dealloc_pd = mlx5_glue_dealloc_pd,
+	.get_device_list = mlx5_glue_get_device_list,
+	.free_device_list = mlx5_glue_free_device_list,
+	.open_device = mlx5_glue_open_device,
+	.close_device = mlx5_glue_close_device,
+	.query_device = mlx5_glue_query_device,
+	.query_device_ex = mlx5_glue_query_device_ex,
+	.query_rt_values_ex = mlx5_glue_query_rt_values_ex,
+	.query_port = mlx5_glue_query_port,
+	.create_comp_channel = mlx5_glue_create_comp_channel,
+	.destroy_comp_channel = mlx5_glue_destroy_comp_channel,
+	.create_cq = mlx5_glue_create_cq,
+	.destroy_cq = mlx5_glue_destroy_cq,
+	.get_cq_event = mlx5_glue_get_cq_event,
+	.ack_cq_events = mlx5_glue_ack_cq_events,
+	.create_rwq_ind_table = mlx5_glue_create_rwq_ind_table,
+	.destroy_rwq_ind_table = mlx5_glue_destroy_rwq_ind_table,
+	.create_wq = mlx5_glue_create_wq,
+	.destroy_wq = mlx5_glue_destroy_wq,
+	.modify_wq = mlx5_glue_modify_wq,
+	.create_flow = mlx5_glue_create_flow,
+	.destroy_flow = mlx5_glue_destroy_flow,
+	.destroy_flow_action = mlx5_glue_destroy_flow_action,
+	.create_qp = mlx5_glue_create_qp,
+	.create_qp_ex = mlx5_glue_create_qp_ex,
+	.destroy_qp = mlx5_glue_destroy_qp,
+	.modify_qp = mlx5_glue_modify_qp,
+	.reg_mr = mlx5_glue_reg_mr,
+	.dereg_mr = mlx5_glue_dereg_mr,
+	.create_counter_set = mlx5_glue_create_counter_set,
+	.destroy_counter_set = mlx5_glue_destroy_counter_set,
+	.describe_counter_set = mlx5_glue_describe_counter_set,
+	.query_counter_set = mlx5_glue_query_counter_set,
+	.create_counters = mlx5_glue_create_counters,
+	.destroy_counters = mlx5_glue_destroy_counters,
+	.attach_counters = mlx5_glue_attach_counters,
+	.query_counters = mlx5_glue_query_counters,
+	.ack_async_event = mlx5_glue_ack_async_event,
+	.get_async_event = mlx5_glue_get_async_event,
+	.port_state_str = mlx5_glue_port_state_str,
+	.cq_ex_to_cq = mlx5_glue_cq_ex_to_cq,
+	.dr_create_flow_action_dest_flow_tbl =
+		mlx5_glue_dr_create_flow_action_dest_flow_tbl,
+	.dr_create_flow_action_dest_port =
+		mlx5_glue_dr_create_flow_action_dest_port,
+	.dr_create_flow_action_drop =
+		mlx5_glue_dr_create_flow_action_drop,
+	.dr_create_flow_action_push_vlan =
+		mlx5_glue_dr_create_flow_action_push_vlan,
+	.dr_create_flow_action_pop_vlan =
+		mlx5_glue_dr_create_flow_action_pop_vlan,
+	.dr_create_flow_tbl = mlx5_glue_dr_create_flow_tbl,
+	.dr_destroy_flow_tbl = mlx5_glue_dr_destroy_flow_tbl,
+	.dr_create_domain = mlx5_glue_dr_create_domain,
+	.dr_destroy_domain = mlx5_glue_dr_destroy_domain,
+	.dv_create_cq = mlx5_glue_dv_create_cq,
+	.dv_create_wq = mlx5_glue_dv_create_wq,
+	.dv_query_device = mlx5_glue_dv_query_device,
+	.dv_set_context_attr = mlx5_glue_dv_set_context_attr,
+	.dv_init_obj = mlx5_glue_dv_init_obj,
+	.dv_create_qp = mlx5_glue_dv_create_qp,
+	.dv_create_flow_matcher = mlx5_glue_dv_create_flow_matcher,
+	.dv_create_flow = mlx5_glue_dv_create_flow,
+	.dv_create_flow_action_counter =
+		mlx5_glue_dv_create_flow_action_counter,
+	.dv_create_flow_action_dest_ibv_qp =
+		mlx5_glue_dv_create_flow_action_dest_ibv_qp,
+	.dv_create_flow_action_dest_devx_tir =
+		mlx5_glue_dv_create_flow_action_dest_devx_tir,
+	.dv_create_flow_action_modify_header =
+		mlx5_glue_dv_create_flow_action_modify_header,
+	.dv_create_flow_action_packet_reformat =
+		mlx5_glue_dv_create_flow_action_packet_reformat,
+	.dv_create_flow_action_tag = mlx5_glue_dv_create_flow_action_tag,
+	.dv_create_flow_action_meter = mlx5_glue_dv_create_flow_action_meter,
+	.dv_modify_flow_action_meter = mlx5_glue_dv_modify_flow_action_meter,
+	.dv_destroy_flow = mlx5_glue_dv_destroy_flow,
+	.dv_destroy_flow_matcher = mlx5_glue_dv_destroy_flow_matcher,
+	.dv_open_device = mlx5_glue_dv_open_device,
+	.devx_obj_create = mlx5_glue_devx_obj_create,
+	.devx_obj_destroy = mlx5_glue_devx_obj_destroy,
+	.devx_obj_query = mlx5_glue_devx_obj_query,
+	.devx_obj_modify = mlx5_glue_devx_obj_modify,
+	.devx_general_cmd = mlx5_glue_devx_general_cmd,
+	.devx_create_cmd_comp = mlx5_glue_devx_create_cmd_comp,
+	.devx_destroy_cmd_comp = mlx5_glue_devx_destroy_cmd_comp,
+	.devx_obj_query_async = mlx5_glue_devx_obj_query_async,
+	.devx_get_async_cmd_comp = mlx5_glue_devx_get_async_cmd_comp,
+	.devx_umem_reg = mlx5_glue_devx_umem_reg,
+	.devx_umem_dereg = mlx5_glue_devx_umem_dereg,
+	.devx_qp_query = mlx5_glue_devx_qp_query,
+	.devx_port_query = mlx5_glue_devx_port_query,
+	.dr_dump_domain = mlx5_glue_dr_dump_domain,
+};
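
The wrappers above all follow one pattern: each entry in the glue table
either forwards to the rdma-core call or, when the feature was not detected
at build time, fails with ENOTSUP. A minimal sketch of that pattern follows.
It is an illustration only, not code from this patch: struct demo_glue,
demo_glue_create_thing(), demo_glue_destroy_thing() and HAVE_DEMO_FEATURE
are hypothetical names standing in for struct mlx5_glue, a wrapper pair and
a HAVE_* macro from mlx5_autoconf.h.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mlx5_glue: a table of function pointers. */
struct demo_glue {
	const char *version;
	void *(*create_thing)(int arg);
	int (*destroy_thing)(void *thing);
};

#ifdef HAVE_DEMO_FEATURE
static int demo_object; /* Dummy object handed out when the feature exists. */
#endif

static void *
demo_glue_create_thing(int arg)
{
#ifdef HAVE_DEMO_FEATURE
	(void)arg;
	return &demo_object;
#else
	/* Feature compiled out: pointer-returning stubs set errno. */
	(void)arg;
	errno = ENOTSUP; /* errno values are positive. */
	return NULL;
#endif
}

static int
demo_glue_destroy_thing(void *thing)
{
#ifdef HAVE_DEMO_FEATURE
	(void)thing;
	return 0;
#else
	/* Feature compiled out: int-returning stubs also return the code. */
	(void)thing;
	errno = ENOTSUP;
	return errno;
#endif
}

/* Single exported table, filled with designated initializers. */
const struct demo_glue *demo_glue = &(const struct demo_glue){
	.version = "demo",
	.create_thing = demo_glue_create_thing,
	.destroy_thing = demo_glue_destroy_thing,
};
```

A caller only ever goes through the table, for example
demo_glue->create_thing(1), and checks for NULL plus errno. It therefore
does not need to know which HAVE_* macros were set when the glue was built.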
diff --git a/drivers/common/mlx5/mlx5_glue.h b/drivers/common/mlx5/mlx5_glue.h
new file mode 100644
index 0000000..f4c3180
--- /dev/null
+++ b/drivers/common/mlx5/mlx5_glue.h
@@ -0,0 +1,265 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2018 6WIND S.A.
+ * Copyright 2018 Mellanox Technologies, Ltd
+ */
+
+#ifndef MLX5_GLUE_H_
+#define MLX5_GLUE_H_
+
+#include <stddef.h>
+#include <stdint.h>
+/* Verbs headers do not support -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-Wpedantic"
+#endif
+#include <infiniband/mlx5dv.h>
+#include <infiniband/verbs.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-Wpedantic"
+#endif
+
+#include <rte_byteorder.h>
+
+#include "mlx5_autoconf.h"
+
+#ifndef MLX5_GLUE_VERSION
+#define MLX5_GLUE_VERSION ""
+#endif
+
+#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V42
+struct ibv_counter_set;
+struct ibv_counter_set_data;
+struct ibv_counter_set_description;
+struct ibv_counter_set_init_attr;
+struct ibv_query_counter_set_attr;
+#endif
+
+#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V45
+struct ibv_counters;
+struct ibv_counters_init_attr;
+struct ibv_counter_attach_attr;
+#endif
+
+#ifndef HAVE_IBV_DEVICE_TUNNEL_SUPPORT
+struct mlx5dv_qp_init_attr;
+#endif
+
+#ifndef HAVE_IBV_DEVICE_STRIDING_RQ_SUPPORT
+struct mlx5dv_wq_init_attr;
+#endif
+
+#ifndef HAVE_IBV_FLOW_DV_SUPPORT
+struct mlx5dv_flow_matcher;
+struct mlx5dv_flow_matcher_attr;
+struct mlx5dv_flow_action_attr;
+struct mlx5dv_flow_match_parameters;
+struct mlx5dv_dr_flow_meter_attr;
+struct ibv_flow_action;
+enum mlx5dv_flow_action_packet_reformat_type { packet_reformat_type = 0, };
+enum mlx5dv_flow_table_type { flow_table_type = 0, };
+#endif
+
+#ifndef HAVE_IBV_FLOW_DEVX_COUNTERS
+#define MLX5DV_FLOW_ACTION_COUNTERS_DEVX 0
+#endif
+
+#ifndef HAVE_IBV_DEVX_OBJ
+struct mlx5dv_devx_obj;
+struct mlx5dv_devx_umem { uint32_t umem_id; };
+#endif
+
+#ifndef HAVE_IBV_DEVX_ASYNC
+struct mlx5dv_devx_cmd_comp;
+struct mlx5dv_devx_async_cmd_hdr;
+#endif
+
+#ifndef HAVE_MLX5DV_DR
+enum  mlx5dv_dr_domain_type { unused, };
+struct mlx5dv_dr_domain;
+#endif
+
+#ifndef HAVE_MLX5DV_DR_DEVX_PORT
+struct mlx5dv_devx_port;
+#endif
+
+#ifndef HAVE_MLX5_DR_CREATE_ACTION_FLOW_METER
+struct mlx5dv_dr_flow_meter_attr;
+#endif
+
+/* LIB_GLUE_VERSION must be updated every time this structure is modified. */
+struct mlx5_glue {
+	const char *version;
+	int (*fork_init)(void);
+	struct ibv_pd *(*alloc_pd)(struct ibv_context *context);
+	int (*dealloc_pd)(struct ibv_pd *pd);
+	struct ibv_device **(*get_device_list)(int *num_devices);
+	void (*free_device_list)(struct ibv_device **list);
+	struct ibv_context *(*open_device)(struct ibv_device *device);
+	int (*close_device)(struct ibv_context *context);
+	int (*query_device)(struct ibv_context *context,
+			    struct ibv_device_attr *device_attr);
+	int (*query_device_ex)(struct ibv_context *context,
+			       const struct ibv_query_device_ex_input *input,
+			       struct ibv_device_attr_ex *attr);
+	int (*query_rt_values_ex)(struct ibv_context *context,
+			       struct ibv_values_ex *values);
+	int (*query_port)(struct ibv_context *context, uint8_t port_num,
+			  struct ibv_port_attr *port_attr);
+	struct ibv_comp_channel *(*create_comp_channel)
+		(struct ibv_context *context);
+	int (*destroy_comp_channel)(struct ibv_comp_channel *channel);
+	struct ibv_cq *(*create_cq)(struct ibv_context *context, int cqe,
+				    void *cq_context,
+				    struct ibv_comp_channel *channel,
+				    int comp_vector);
+	int (*destroy_cq)(struct ibv_cq *cq);
+	int (*get_cq_event)(struct ibv_comp_channel *channel,
+			    struct ibv_cq **cq, void **cq_context);
+	void (*ack_cq_events)(struct ibv_cq *cq, unsigned int nevents);
+	struct ibv_rwq_ind_table *(*create_rwq_ind_table)
+		(struct ibv_context *context,
+		 struct ibv_rwq_ind_table_init_attr *init_attr);
+	int (*destroy_rwq_ind_table)(struct ibv_rwq_ind_table *rwq_ind_table);
+	struct ibv_wq *(*create_wq)(struct ibv_context *context,
+				    struct ibv_wq_init_attr *wq_init_attr);
+	int (*destroy_wq)(struct ibv_wq *wq);
+	int (*modify_wq)(struct ibv_wq *wq, struct ibv_wq_attr *wq_attr);
+	struct ibv_flow *(*create_flow)(struct ibv_qp *qp,
+					struct ibv_flow_attr *flow);
+	int (*destroy_flow)(struct ibv_flow *flow_id);
+	int (*destroy_flow_action)(void *action);
+	struct ibv_qp *(*create_qp)(struct ibv_pd *pd,
+				    struct ibv_qp_init_attr *qp_init_attr);
+	struct ibv_qp *(*create_qp_ex)
+		(struct ibv_context *context,
+		 struct ibv_qp_init_attr_ex *qp_init_attr_ex);
+	int (*destroy_qp)(struct ibv_qp *qp);
+	int (*modify_qp)(struct ibv_qp *qp, struct ibv_qp_attr *attr,
+			 int attr_mask);
+	struct ibv_mr *(*reg_mr)(struct ibv_pd *pd, void *addr,
+				 size_t length, int access);
+	int (*dereg_mr)(struct ibv_mr *mr);
+	struct ibv_counter_set *(*create_counter_set)
+		(struct ibv_context *context,
+		 struct ibv_counter_set_init_attr *init_attr);
+	int (*destroy_counter_set)(struct ibv_counter_set *cs);
+	int (*describe_counter_set)
+		(struct ibv_context *context,
+		 uint16_t counter_set_id,
+		 struct ibv_counter_set_description *cs_desc);
+	int (*query_counter_set)(struct ibv_query_counter_set_attr *query_attr,
+				 struct ibv_counter_set_data *cs_data);
+	struct ibv_counters *(*create_counters)
+		(struct ibv_context *context,
+		 struct ibv_counters_init_attr *init_attr);
+	int (*destroy_counters)(struct ibv_counters *counters);
+	int (*attach_counters)(struct ibv_counters *counters,
+			       struct ibv_counter_attach_attr *attr,
+			       struct ibv_flow *flow);
+	int (*query_counters)(struct ibv_counters *counters,
+			      uint64_t *counters_value,
+			      uint32_t ncounters,
+			      uint32_t flags);
+	void (*ack_async_event)(struct ibv_async_event *event);
+	int (*get_async_event)(struct ibv_context *context,
+			       struct ibv_async_event *event);
+	const char *(*port_state_str)(enum ibv_port_state port_state);
+	struct ibv_cq *(*cq_ex_to_cq)(struct ibv_cq_ex *cq);
+	void *(*dr_create_flow_action_dest_flow_tbl)(void *tbl);
+	void *(*dr_create_flow_action_dest_port)(void *domain,
+						 uint32_t port);
+	void *(*dr_create_flow_action_drop)(void);
+	void *(*dr_create_flow_action_push_vlan)
+					(struct mlx5dv_dr_domain *domain,
+					 rte_be32_t vlan_tag);
+	void *(*dr_create_flow_action_pop_vlan)(void);
+	void *(*dr_create_flow_tbl)(void *domain, uint32_t level);
+	int (*dr_destroy_flow_tbl)(void *tbl);
+	void *(*dr_create_domain)(struct ibv_context *ctx,
+				  enum mlx5dv_dr_domain_type domain);
+	int (*dr_destroy_domain)(void *domain);
+	struct ibv_cq_ex *(*dv_create_cq)
+		(struct ibv_context *context,
+		 struct ibv_cq_init_attr_ex *cq_attr,
+		 struct mlx5dv_cq_init_attr *mlx5_cq_attr);
+	struct ibv_wq *(*dv_create_wq)
+		(struct ibv_context *context,
+		 struct ibv_wq_init_attr *wq_attr,
+		 struct mlx5dv_wq_init_attr *mlx5_wq_attr);
+	int (*dv_query_device)(struct ibv_context *ctx_in,
+			       struct mlx5dv_context *attrs_out);
+	int (*dv_set_context_attr)(struct ibv_context *ibv_ctx,
+				   enum mlx5dv_set_ctx_attr_type type,
+				   void *attr);
+	int (*dv_init_obj)(struct mlx5dv_obj *obj, uint64_t obj_type);
+	struct ibv_qp *(*dv_create_qp)
+		(struct ibv_context *context,
+		 struct ibv_qp_init_attr_ex *qp_init_attr_ex,
+		 struct mlx5dv_qp_init_attr *dv_qp_init_attr);
+	void *(*dv_create_flow_matcher)
+		(struct ibv_context *context,
+		 struct mlx5dv_flow_matcher_attr *matcher_attr,
+		 void *tbl);
+	void *(*dv_create_flow)(void *matcher, void *match_value,
+			  size_t num_actions, void *actions[]);
+	void *(*dv_create_flow_action_counter)(void *obj, uint32_t offset);
+	void *(*dv_create_flow_action_dest_ibv_qp)(void *qp);
+	void *(*dv_create_flow_action_dest_devx_tir)(void *tir);
+	void *(*dv_create_flow_action_modify_header)
+		(struct ibv_context *ctx, enum mlx5dv_flow_table_type ft_type,
+		 void *domain, uint64_t flags, size_t actions_sz,
+		 uint64_t actions[]);
+	void *(*dv_create_flow_action_packet_reformat)
+		(struct ibv_context *ctx,
+		 enum mlx5dv_flow_action_packet_reformat_type reformat_type,
+		 enum mlx5dv_flow_table_type ft_type,
+		 struct mlx5dv_dr_domain *domain,
+		 uint32_t flags, size_t data_sz, void *data);
+	void *(*dv_create_flow_action_tag)(uint32_t tag);
+	void *(*dv_create_flow_action_meter)
+		(struct mlx5dv_dr_flow_meter_attr *attr);
+	int (*dv_modify_flow_action_meter)(void *action,
+		struct mlx5dv_dr_flow_meter_attr *attr, uint64_t modify_bits);
+	int (*dv_destroy_flow)(void *flow);
+	int (*dv_destroy_flow_matcher)(void *matcher);
+	struct ibv_context *(*dv_open_device)(struct ibv_device *device);
+	struct mlx5dv_devx_obj *(*devx_obj_create)
+					(struct ibv_context *ctx,
+					 const void *in, size_t inlen,
+					 void *out, size_t outlen);
+	int (*devx_obj_destroy)(struct mlx5dv_devx_obj *obj);
+	int (*devx_obj_query)(struct mlx5dv_devx_obj *obj,
+			      const void *in, size_t inlen,
+			      void *out, size_t outlen);
+	int (*devx_obj_modify)(struct mlx5dv_devx_obj *obj,
+			       const void *in, size_t inlen,
+			       void *out, size_t outlen);
+	int (*devx_general_cmd)(struct ibv_context *context,
+				const void *in, size_t inlen,
+				void *out, size_t outlen);
+	struct mlx5dv_devx_cmd_comp *(*devx_create_cmd_comp)
+					(struct ibv_context *context);
+	void (*devx_destroy_cmd_comp)(struct mlx5dv_devx_cmd_comp *cmd_comp);
+	int (*devx_obj_query_async)(struct mlx5dv_devx_obj *obj,
+				    const void *in, size_t inlen,
+				    size_t outlen, uint64_t wr_id,
+				    struct mlx5dv_devx_cmd_comp *cmd_comp);
+	int (*devx_get_async_cmd_comp)(struct mlx5dv_devx_cmd_comp *cmd_comp,
+				       struct mlx5dv_devx_async_cmd_hdr *resp,
+				       size_t cmd_resp_len);
+	struct mlx5dv_devx_umem *(*devx_umem_reg)(struct ibv_context *context,
+						  void *addr, size_t size,
+						  uint32_t access);
+	int (*devx_umem_dereg)(struct mlx5dv_devx_umem *dv_devx_umem);
+	int (*devx_qp_query)(struct ibv_qp *qp,
+			     const void *in, size_t inlen,
+			     void *out, size_t outlen);
+	int (*devx_port_query)(struct ibv_context *ctx,
+			       uint32_t port_num,
+			       struct mlx5dv_devx_port *mlx5_devx_port);
+	int (*dr_dump_domain)(FILE *file, void *domain);
+};
+
+extern const struct mlx5_glue *mlx5_glue;
+
+#endif /* MLX5_GLUE_H_ */
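
The comment above struct mlx5_glue asks for a version bump on every layout
change. The reason is that a consumer can then reject a glue table that was
built against a different layout. A minimal sketch of such a check follows.
It assumes a plain string comparison; DEMO_GLUE_VERSION, struct demo_vglue
and demo_glue_version_ok() are hypothetical names, not part of this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical version string the consumer was compiled against. */
#define DEMO_GLUE_VERSION "20.02.0"

/* Hypothetical stand-in for the first member of struct mlx5_glue. */
struct demo_vglue {
	const char *version;
};

/* Return 1 when the table matches the expected version, 0 otherwise. */
int
demo_glue_version_ok(const struct demo_vglue *glue)
{
	if (glue == NULL || glue->version == NULL)
		return 0;
	return strcmp(glue->version, DEMO_GLUE_VERSION) == 0;
}
```

A driver would typically run such a check once at initialization and refuse
to probe devices on a mismatch, rather than call through function pointers
whose offsets may have moved.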
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
new file mode 100644
index 0000000..5730ad1
--- /dev/null
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -0,0 +1,1889 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2016 6WIND S.A.
+ * Copyright 2016 Mellanox Technologies, Ltd
+ */
+
+#ifndef RTE_PMD_MLX5_PRM_H_
+#define RTE_PMD_MLX5_PRM_H_
+
+#include <assert.h>
+/* Verbs header. */
+/* ISO C doesn't support unnamed structs/unions, disabling -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-Wpedantic"
+#endif
+#include <infiniband/mlx5dv.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-Wpedantic"
+#endif
+
+#include <rte_vect.h>
+#include <rte_byteorder.h>
+
+#include "mlx5_autoconf.h"
+
+/* RSS hash key size. */
+#define MLX5_RSS_HASH_KEY_LEN 40
+
+/* Get CQE owner bit. */
+#define MLX5_CQE_OWNER(op_own) ((op_own) & MLX5_CQE_OWNER_MASK)
+
+/* Get CQE format. */
+#define MLX5_CQE_FORMAT(op_own) (((op_own) & MLX5E_CQE_FORMAT_MASK) >> 2)
+
+/* Get CQE opcode. */
+#define MLX5_CQE_OPCODE(op_own) (((op_own) & 0xf0) >> 4)
+
+/* Get CQE solicited event. */
+#define MLX5_CQE_SE(op_own) (((op_own) >> 1) & 1)
+
+/* Invalidate a CQE. */
+#define MLX5_CQE_INVALIDATE (MLX5_CQE_INVALID << 4)
+
+/* WQE Segment sizes in bytes. */
+#define MLX5_WSEG_SIZE 16u
+#define MLX5_WQE_CSEG_SIZE sizeof(struct mlx5_wqe_cseg)
+#define MLX5_WQE_DSEG_SIZE sizeof(struct mlx5_wqe_dseg)
+#define MLX5_WQE_ESEG_SIZE sizeof(struct mlx5_wqe_eseg)
+
+/* WQE/WQEBB size in bytes. */
+#define MLX5_WQE_SIZE sizeof(struct mlx5_wqe)
+
+/*
+ * Max size of a WQE session.
+ * The absolute maximum is 63 (MLX5_DSEG_MAX) segments, because
+ * the WQE size field in the Control Segment is 6 bits wide.
+ */
+#define MLX5_WQE_SIZE_MAX (60 * MLX5_WSEG_SIZE)
+
+/*
+ * Default minimum number of Tx queues for inlining packets.
+ * If there are fewer queues than specified, we assume there are
+ * not enough CPU resources (cycles) to perform inlining and that
+ * PCIe throughput is not the bottleneck, so inlining is
+ * disabled.
+ */
+#define MLX5_INLINE_MAX_TXQS 8u
+#define MLX5_INLINE_MAX_TXQS_BLUEFIELD 16u
+
+/*
+ * Default packet length threshold to be inlined with
+ * enhanced MPW. If the packet length exceeds the threshold,
+ * the data is not inlined. The value should be aligned to a
+ * WQEBB boundary, taking into account the title Control and
+ * Ethernet segments.
+ */
+#define MLX5_EMPW_DEF_INLINE_LEN (4u * MLX5_WQE_SIZE + \
+				  MLX5_DSEG_MIN_INLINE_SIZE)
+/*
+ * Maximal inline data length sent with enhanced MPW.
+ * Is based on maximal WQE size.
+ */
+#define MLX5_EMPW_MAX_INLINE_LEN (MLX5_WQE_SIZE_MAX - \
+				  MLX5_WQE_CSEG_SIZE - \
+				  MLX5_WQE_ESEG_SIZE - \
+				  MLX5_WQE_DSEG_SIZE + \
+				  MLX5_DSEG_MIN_INLINE_SIZE)
+/*
+ * Minimal amount of packets to be sent with EMPW.
+ * This limits the minimal required size of sent EMPW.
+ * If there are not enough resources to build a minimal
+ * EMPW, the sending loop exits.
+ */
+#define MLX5_EMPW_MIN_PACKETS (2u + 3u * 4u)
+/*
+ * Maximal amount of packets to be sent with EMPW.
+ * This value should not exceed MLX5_TX_COMP_THRESH,
+ * otherwise up to MLX5_EMPW_MAX_PACKETS mbufs may be sent
+ * without a CQE generation request. Multiplied by
+ * MLX5_TX_COMP_MAX_CQE, this may cause significant latency
+ * in the Tx burst routine when multiple mbufs are freed at once.
+ */
+#define MLX5_EMPW_MAX_PACKETS MLX5_TX_COMP_THRESH
+#define MLX5_MPW_MAX_PACKETS 6
+#define MLX5_MPW_INLINE_MAX_PACKETS 2
+
+/*
+ * Default packet length threshold to be inlined with
+ * ordinary SEND. Inlining saves the MR key search
+ * and extra PCIe data fetch transaction, but eats the
+ * CPU cycles.
+ */
+#define MLX5_SEND_DEF_INLINE_LEN (5U * MLX5_WQE_SIZE + \
+				  MLX5_ESEG_MIN_INLINE_SIZE - \
+				  MLX5_WQE_CSEG_SIZE - \
+				  MLX5_WQE_ESEG_SIZE - \
+				  MLX5_WQE_DSEG_SIZE)
+/*
+ * Maximal inline data length sent with ordinary SEND.
+ * Is based on maximal WQE size.
+ */
+#define MLX5_SEND_MAX_INLINE_LEN (MLX5_WQE_SIZE_MAX - \
+				  MLX5_WQE_CSEG_SIZE - \
+				  MLX5_WQE_ESEG_SIZE - \
+				  MLX5_WQE_DSEG_SIZE + \
+				  MLX5_ESEG_MIN_INLINE_SIZE)
+
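The inline length limits above are plain arithmetic over the segment sizes. The sketch below works them out, assuming the sizes implied by the structure layouts in this header (16 bytes for each of the control, Ethernet and data segments, 64 bytes for the title WQEBB). The DEMO_ and demo_ names are hypothetical and only mirror the MLX5_ macros for illustration.

```c
#include <assert.h>

/* Assumed sizes in bytes, derived from the packed layouts in this header. */
enum {
	DEMO_WSEG_SIZE = 16,            /* MLX5_WSEG_SIZE */
	DEMO_WQE_CSEG_SIZE = 16,        /* sizeof(struct mlx5_wqe_cseg) */
	DEMO_WQE_ESEG_SIZE = 16,        /* sizeof(struct mlx5_wqe_eseg) */
	DEMO_WQE_DSEG_SIZE = 16,        /* sizeof(struct mlx5_wqe_dseg) */
	DEMO_WQE_SIZE = 64,             /* sizeof(struct mlx5_wqe) */
	DEMO_DSEG_MIN_INLINE_SIZE = 12, /* MLX5_DSEG_MIN_INLINE_SIZE */
	DEMO_ESEG_MIN_INLINE_SIZE = 18, /* MLX5_ESEG_MIN_INLINE_SIZE */
	DEMO_WQE_SIZE_MAX = 60 * DEMO_WSEG_SIZE,
};

/* Mirrors MLX5_EMPW_DEF_INLINE_LEN. */
static unsigned int
demo_empw_def_inline_len(void)
{
	return 4 * DEMO_WQE_SIZE + DEMO_DSEG_MIN_INLINE_SIZE;
}

/* Mirrors MLX5_EMPW_MAX_INLINE_LEN. */
static unsigned int
demo_empw_max_inline_len(void)
{
	return DEMO_WQE_SIZE_MAX - DEMO_WQE_CSEG_SIZE - DEMO_WQE_ESEG_SIZE -
	       DEMO_WQE_DSEG_SIZE + DEMO_DSEG_MIN_INLINE_SIZE;
}

/* Mirrors MLX5_SEND_DEF_INLINE_LEN. */
static unsigned int
demo_send_def_inline_len(void)
{
	return 5 * DEMO_WQE_SIZE + DEMO_ESEG_MIN_INLINE_SIZE -
	       DEMO_WQE_CSEG_SIZE - DEMO_WQE_ESEG_SIZE - DEMO_WQE_DSEG_SIZE;
}

/* Mirrors MLX5_SEND_MAX_INLINE_LEN. */
static unsigned int
demo_send_max_inline_len(void)
{
	return DEMO_WQE_SIZE_MAX - DEMO_WQE_CSEG_SIZE - DEMO_WQE_ESEG_SIZE -
	       DEMO_WQE_DSEG_SIZE + DEMO_ESEG_MIN_INLINE_SIZE;
}
```

Under these assumptions the defaults come to 268 bytes (EMPW) and 290 bytes (SEND), and the maxima to 924 and 930 bytes respectively, all within the 960-byte MLX5_WQE_SIZE_MAX.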
+/* Missing from mlx5dv.h, so it is defined here. */
+#define MLX5_OPCODE_ENHANCED_MPSW 0x29u
+
+/* CQE value to inform that VLAN is stripped. */
+#define MLX5_CQE_VLAN_STRIPPED (1u << 0)
+
+/* IPv4 options. */
+#define MLX5_CQE_RX_IP_EXT_OPTS_PACKET (1u << 1)
+
+/* IPv6 packet. */
+#define MLX5_CQE_RX_IPV6_PACKET (1u << 2)
+
+/* IPv4 packet. */
+#define MLX5_CQE_RX_IPV4_PACKET (1u << 3)
+
+/* TCP packet. */
+#define MLX5_CQE_RX_TCP_PACKET (1u << 4)
+
+/* UDP packet. */
+#define MLX5_CQE_RX_UDP_PACKET (1u << 5)
+
+/* IP is fragmented. */
+#define MLX5_CQE_RX_IP_FRAG_PACKET (1u << 7)
+
+/* L2 header is valid. */
+#define MLX5_CQE_RX_L2_HDR_VALID (1u << 8)
+
+/* L3 header is valid. */
+#define MLX5_CQE_RX_L3_HDR_VALID (1u << 9)
+
+/* L4 header is valid. */
+#define MLX5_CQE_RX_L4_HDR_VALID (1u << 10)
+
+/* Outer packet, 0 IPv4, 1 IPv6. */
+#define MLX5_CQE_RX_OUTER_PACKET (1u << 1)
+
+/* Tunnel packet bit in the CQE. */
+#define MLX5_CQE_RX_TUNNEL_PACKET (1u << 0)
+
+/* Mask for LRO push flag in the CQE lro_tcppsh_abort_dupack field. */
+#define MLX5_CQE_LRO_PUSH_MASK 0x40
+
+/* Mask for L4 type in the CQE hdr_type_etc field. */
+#define MLX5_CQE_L4_TYPE_MASK 0x70
+
+/* The bit index of L4 type in CQE hdr_type_etc field. */
+#define MLX5_CQE_L4_TYPE_SHIFT 0x4
+
+/* L4 type to indicate TCP packet without acknowledgment. */
+#define MLX5_L4_HDR_TYPE_TCP_EMPTY_ACK 0x3
+
+/* L4 type to indicate TCP packet with acknowledgment. */
+#define MLX5_L4_HDR_TYPE_TCP_WITH_ACL 0x4
+
+/* Inner L3 checksum offload (Tunneled packets only). */
+#define MLX5_ETH_WQE_L3_INNER_CSUM (1u << 4)
+
+/* Inner L4 checksum offload (Tunneled packets only). */
+#define MLX5_ETH_WQE_L4_INNER_CSUM (1u << 5)
+
+/* Outer L4 type is TCP. */
+#define MLX5_ETH_WQE_L4_OUTER_TCP  (0u << 5)
+
+/* Outer L4 type is UDP. */
+#define MLX5_ETH_WQE_L4_OUTER_UDP  (1u << 5)
+
+/* Outer L3 type is IPV4. */
+#define MLX5_ETH_WQE_L3_OUTER_IPV4 (0u << 4)
+
+/* Outer L3 type is IPV6. */
+#define MLX5_ETH_WQE_L3_OUTER_IPV6 (1u << 4)
+
+/* Inner L4 type is TCP. */
+#define MLX5_ETH_WQE_L4_INNER_TCP (0u << 1)
+
+/* Inner L4 type is UDP. */
+#define MLX5_ETH_WQE_L4_INNER_UDP (1u << 1)
+
+/* Inner L3 type is IPV4. */
+#define MLX5_ETH_WQE_L3_INNER_IPV4 (0u << 0)
+
+/* Inner L3 type is IPV6. */
+#define MLX5_ETH_WQE_L3_INNER_IPV6 (1u << 0)
+
+/* VLAN insertion flag. */
+#define MLX5_ETH_WQE_VLAN_INSERT (1u << 31)
+
+/* Data inline segment flag. */
+#define MLX5_ETH_WQE_DATA_INLINE (1u << 31)
+
+/* Is flow mark valid. */
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+#define MLX5_FLOW_MARK_IS_VALID(val) ((val) & 0xffffff00)
+#else
+#define MLX5_FLOW_MARK_IS_VALID(val) ((val) & 0xffffff)
+#endif
+
+/* INVALID is used by packets matching no flow rules. */
+#define MLX5_FLOW_MARK_INVALID 0
+
+/* Maximum allowed value to mark a packet. */
+#define MLX5_FLOW_MARK_MAX 0xfffff0
+
+/* Default mark value used when none is provided. */
+#define MLX5_FLOW_MARK_DEFAULT 0xffffff
+
+/* Default mark mask for metadata legacy mode. */
+#define MLX5_FLOW_MARK_MASK 0xffffff
+
+/* Maximum number of DS in WQE. Limited by 6-bit field. */
+#define MLX5_DSEG_MAX 63
+
+/* The completion mode offset in the WQE control segment line 2. */
+#define MLX5_COMP_MODE_OFFSET 2
+
+/* Amount of data bytes in minimal inline data segment. */
+#define MLX5_DSEG_MIN_INLINE_SIZE 12u
+
+/* Amount of data bytes in minimal inline eth segment. */
+#define MLX5_ESEG_MIN_INLINE_SIZE 18u
+
+/* Amount of data bytes after eth data segment. */
+#define MLX5_ESEG_EXTRA_DATA_SIZE 32u
+
+/* The maximum log value of segments per RQ WQE. */
+#define MLX5_MAX_LOG_RQ_SEGS 5u
+
+/* The alignment needed for WQ buffer. */
+#define MLX5_WQE_BUF_ALIGNMENT 512
+
+/* Completion mode. */
+enum mlx5_completion_mode {
+	MLX5_COMP_ONLY_ERR = 0x0,
+	MLX5_COMP_ONLY_FIRST_ERR = 0x1,
+	MLX5_COMP_ALWAYS = 0x2,
+	MLX5_COMP_CQE_AND_EQE = 0x3,
+};
+
+/* MPW mode. */
+enum mlx5_mpw_mode {
+	MLX5_MPW_DISABLED,
+	MLX5_MPW,
+	MLX5_MPW_ENHANCED, /* Enhanced Multi-Packet Send WQE, a.k.a MPWv2. */
+};
+
+/* WQE Control segment. */
+struct mlx5_wqe_cseg {
+	uint32_t opcode;
+	uint32_t sq_ds;
+	uint32_t flags;
+	uint32_t misc;
+} __rte_packed __rte_aligned(MLX5_WSEG_SIZE);
+
+/* Header of data segment. Minimal size Data Segment. */
+struct mlx5_wqe_dseg {
+	uint32_t bcount;
+	union {
+		uint8_t inline_data[MLX5_DSEG_MIN_INLINE_SIZE];
+		struct {
+			uint32_t lkey;
+			uint64_t pbuf;
+		} __rte_packed;
+	};
+} __rte_packed;
+
+/* Subset of struct WQE Ethernet Segment. */
+struct mlx5_wqe_eseg {
+	union {
+		struct {
+			uint32_t swp_offs;
+			uint8_t	cs_flags;
+			uint8_t	swp_flags;
+			uint16_t mss;
+			uint32_t metadata;
+			uint16_t inline_hdr_sz;
+			union {
+				uint16_t inline_data;
+				uint16_t vlan_tag;
+			};
+		} __rte_packed;
+		struct {
+			uint32_t offsets;
+			uint32_t flags;
+			uint32_t flow_metadata;
+			uint32_t inline_hdr;
+		} __rte_packed;
+	};
+} __rte_packed;
+
+/* The title WQEBB, header of WQE. */
+struct mlx5_wqe {
+	union {
+		struct mlx5_wqe_cseg cseg;
+		uint32_t ctrl[4];
+	};
+	struct mlx5_wqe_eseg eseg;
+	union {
+		struct mlx5_wqe_dseg dseg[2];
+		uint8_t data[MLX5_ESEG_EXTRA_DATA_SIZE];
+	};
+} __rte_packed;
+
+/* WQE for Multi-Packet RQ. */
+struct mlx5_wqe_mprq {
+	struct mlx5_wqe_srq_next_seg next_seg;
+	struct mlx5_wqe_data_seg dseg;
+};
+
+#define MLX5_MPRQ_LEN_MASK 0x0000ffff
+#define MLX5_MPRQ_LEN_SHIFT 0
+#define MLX5_MPRQ_STRIDE_NUM_MASK 0x3fff0000
+#define MLX5_MPRQ_STRIDE_NUM_SHIFT 16
+#define MLX5_MPRQ_FILLER_MASK 0x80000000
+#define MLX5_MPRQ_FILLER_SHIFT 31
+
+#define MLX5_MPRQ_STRIDE_SHIFT_BYTE 2
+
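The MPRQ masks and shifts above split one 32-bit word into a length, a stride count and a filler flag. The sketch below shows that decoding, assuming the word has already been converted to host byte order. Which CQE field carries the word is not stated in this hunk, so the example takes a plain integer. The DEMO_ and demo_ names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Same values as the MLX5_MPRQ_* masks and shifts. */
#define DEMO_MPRQ_LEN_MASK 0x0000ffffu
#define DEMO_MPRQ_LEN_SHIFT 0
#define DEMO_MPRQ_STRIDE_NUM_MASK 0x3fff0000u
#define DEMO_MPRQ_STRIDE_NUM_SHIFT 16
#define DEMO_MPRQ_FILLER_MASK 0x80000000u
#define DEMO_MPRQ_FILLER_SHIFT 31

/* Byte length: bits 15:0. */
static uint32_t
demo_mprq_len(uint32_t word)
{
	return (word & DEMO_MPRQ_LEN_MASK) >> DEMO_MPRQ_LEN_SHIFT;
}

/* Number of strides: bits 29:16. */
static uint32_t
demo_mprq_stride_num(uint32_t word)
{
	return (word & DEMO_MPRQ_STRIDE_NUM_MASK) >>
	       DEMO_MPRQ_STRIDE_NUM_SHIFT;
}

/* Filler flag: bit 31. */
static uint32_t
demo_mprq_filler(uint32_t word)
{
	return (word & DEMO_MPRQ_FILLER_MASK) >> DEMO_MPRQ_FILLER_SHIFT;
}
```

For the word 0x80030040 this gives a length of 64 bytes, 3 strides and the filler flag set.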
+/* CQ element structure - should be equal to the cache line size */
+struct mlx5_cqe {
+#if (RTE_CACHE_LINE_SIZE == 128)
+	uint8_t padding[64];
+#endif
+	uint8_t pkt_info;
+	uint8_t rsvd0;
+	uint16_t wqe_id;
+	uint8_t lro_tcppsh_abort_dupack;
+	uint8_t lro_min_ttl;
+	uint16_t lro_tcp_win;
+	uint32_t lro_ack_seq_num;
+	uint32_t rx_hash_res;
+	uint8_t rx_hash_type;
+	uint8_t rsvd1[3];
+	uint16_t csum;
+	uint8_t rsvd2[6];
+	uint16_t hdr_type_etc;
+	uint16_t vlan_info;
+	uint8_t lro_num_seg;
+	uint8_t rsvd3[3];
+	uint32_t flow_table_metadata;
+	uint8_t rsvd4[4];
+	uint32_t byte_cnt;
+	uint64_t timestamp;
+	uint32_t sop_drop_qpn;
+	uint16_t wqe_counter;
+	uint8_t rsvd5;
+	uint8_t op_own;
+};
+
+/* Adding direct verbs to data-path. */
+
+/* CQ sequence number mask. */
+#define MLX5_CQ_SQN_MASK 0x3
+
+/* CQ sequence number index. */
+#define MLX5_CQ_SQN_OFFSET 28
+
+/* CQ doorbell index mask. */
+#define MLX5_CI_MASK 0xffffff
+
+/* CQ doorbell offset. */
+#define MLX5_CQ_ARM_DB 1
+
+/* CQ doorbell offset. */
+#define MLX5_CQ_DOORBELL 0x20
+
+/* CQE format value. */
+#define MLX5_COMPRESSED 0x3
+
+/* Action type of header modification. */
+enum {
+	MLX5_MODIFICATION_TYPE_SET = 0x1,
+	MLX5_MODIFICATION_TYPE_ADD = 0x2,
+	MLX5_MODIFICATION_TYPE_COPY = 0x3,
+};
+
+/* The field of packet to be modified. */
+enum mlx5_modification_field {
+	MLX5_MODI_OUT_NONE = -1,
+	MLX5_MODI_OUT_SMAC_47_16 = 1,
+	MLX5_MODI_OUT_SMAC_15_0,
+	MLX5_MODI_OUT_ETHERTYPE,
+	MLX5_MODI_OUT_DMAC_47_16,
+	MLX5_MODI_OUT_DMAC_15_0,
+	MLX5_MODI_OUT_IP_DSCP,
+	MLX5_MODI_OUT_TCP_FLAGS,
+	MLX5_MODI_OUT_TCP_SPORT,
+	MLX5_MODI_OUT_TCP_DPORT,
+	MLX5_MODI_OUT_IPV4_TTL,
+	MLX5_MODI_OUT_UDP_SPORT,
+	MLX5_MODI_OUT_UDP_DPORT,
+	MLX5_MODI_OUT_SIPV6_127_96,
+	MLX5_MODI_OUT_SIPV6_95_64,
+	MLX5_MODI_OUT_SIPV6_63_32,
+	MLX5_MODI_OUT_SIPV6_31_0,
+	MLX5_MODI_OUT_DIPV6_127_96,
+	MLX5_MODI_OUT_DIPV6_95_64,
+	MLX5_MODI_OUT_DIPV6_63_32,
+	MLX5_MODI_OUT_DIPV6_31_0,
+	MLX5_MODI_OUT_SIPV4,
+	MLX5_MODI_OUT_DIPV4,
+	MLX5_MODI_OUT_FIRST_VID,
+	MLX5_MODI_IN_SMAC_47_16 = 0x31,
+	MLX5_MODI_IN_SMAC_15_0,
+	MLX5_MODI_IN_ETHERTYPE,
+	MLX5_MODI_IN_DMAC_47_16,
+	MLX5_MODI_IN_DMAC_15_0,
+	MLX5_MODI_IN_IP_DSCP,
+	MLX5_MODI_IN_TCP_FLAGS,
+	MLX5_MODI_IN_TCP_SPORT,
+	MLX5_MODI_IN_TCP_DPORT,
+	MLX5_MODI_IN_IPV4_TTL,
+	MLX5_MODI_IN_UDP_SPORT,
+	MLX5_MODI_IN_UDP_DPORT,
+	MLX5_MODI_IN_SIPV6_127_96,
+	MLX5_MODI_IN_SIPV6_95_64,
+	MLX5_MODI_IN_SIPV6_63_32,
+	MLX5_MODI_IN_SIPV6_31_0,
+	MLX5_MODI_IN_DIPV6_127_96,
+	MLX5_MODI_IN_DIPV6_95_64,
+	MLX5_MODI_IN_DIPV6_63_32,
+	MLX5_MODI_IN_DIPV6_31_0,
+	MLX5_MODI_IN_SIPV4,
+	MLX5_MODI_IN_DIPV4,
+	MLX5_MODI_OUT_IPV6_HOPLIMIT,
+	MLX5_MODI_IN_IPV6_HOPLIMIT,
+	MLX5_MODI_META_DATA_REG_A,
+	MLX5_MODI_META_DATA_REG_B = 0x50,
+	MLX5_MODI_META_REG_C_0,
+	MLX5_MODI_META_REG_C_1,
+	MLX5_MODI_META_REG_C_2,
+	MLX5_MODI_META_REG_C_3,
+	MLX5_MODI_META_REG_C_4,
+	MLX5_MODI_META_REG_C_5,
+	MLX5_MODI_META_REG_C_6,
+	MLX5_MODI_META_REG_C_7,
+	MLX5_MODI_OUT_TCP_SEQ_NUM,
+	MLX5_MODI_IN_TCP_SEQ_NUM,
+	MLX5_MODI_OUT_TCP_ACK_NUM,
+	MLX5_MODI_IN_TCP_ACK_NUM = 0x5C,
+};
+
+/* Total number of metadata reg_c's. */
+#define MLX5_MREG_C_NUM (MLX5_MODI_META_REG_C_7 - MLX5_MODI_META_REG_C_0 + 1)
+
+enum modify_reg {
+	REG_NONE = 0,
+	REG_A,
+	REG_B,
+	REG_C_0,
+	REG_C_1,
+	REG_C_2,
+	REG_C_3,
+	REG_C_4,
+	REG_C_5,
+	REG_C_6,
+	REG_C_7,
+};
+
+/* Modification sub command. */
+struct mlx5_modification_cmd {
+	union {
+		uint32_t data0;
+		struct {
+			unsigned int length:5;
+			unsigned int rsvd0:3;
+			unsigned int offset:5;
+			unsigned int rsvd1:3;
+			unsigned int field:12;
+			unsigned int action_type:4;
+		};
+	};
+	union {
+		uint32_t data1;
+		uint8_t data[4];
+		struct {
+			unsigned int rsvd2:8;
+			unsigned int dst_offset:5;
+			unsigned int rsvd3:3;
+			unsigned int dst_field:12;
+			unsigned int rsvd4:4;
+		};
+	};
+};
+
+typedef uint32_t u32;
+typedef uint16_t u16;
+typedef uint8_t u8;
+
+#define __mlx5_nullp(typ) ((struct mlx5_ifc_##typ##_bits *)0)
+#define __mlx5_bit_sz(typ, fld) sizeof(__mlx5_nullp(typ)->fld)
+#define __mlx5_bit_off(typ, fld) ((unsigned int)(unsigned long) \
+				  (&(__mlx5_nullp(typ)->fld)))
+#define __mlx5_dw_bit_off(typ, fld) (32 - __mlx5_bit_sz(typ, fld) - \
+				    (__mlx5_bit_off(typ, fld) & 0x1f))
+#define __mlx5_dw_off(typ, fld) (__mlx5_bit_off(typ, fld) / 32)
+#define __mlx5_64_off(typ, fld) (__mlx5_bit_off(typ, fld) / 64)
+#define __mlx5_dw_mask(typ, fld) (__mlx5_mask(typ, fld) << \
+				  __mlx5_dw_bit_off(typ, fld))
+#define __mlx5_mask(typ, fld) ((u32)((1ull << __mlx5_bit_sz(typ, fld)) - 1))
+#define __mlx5_16_off(typ, fld) (__mlx5_bit_off(typ, fld) / 16)
+#define __mlx5_16_bit_off(typ, fld) (16 - __mlx5_bit_sz(typ, fld) - \
+				    (__mlx5_bit_off(typ, fld) & 0xf))
+#define __mlx5_mask16(typ, fld) ((u16)((1ull << __mlx5_bit_sz(typ, fld)) - 1))
+#define MLX5_ST_SZ_BYTES(typ) (sizeof(struct mlx5_ifc_##typ##_bits) / 8)
+#define MLX5_ST_SZ_DW(typ) (sizeof(struct mlx5_ifc_##typ##_bits) / 32)
+#define MLX5_BYTE_OFF(typ, fld) (__mlx5_bit_off(typ, fld) / 8)
+#define MLX5_ADDR_OF(typ, p, fld) ((char *)(p) + MLX5_BYTE_OFF(typ, fld))
+
+/* Insert a value into a struct field. */
+#define MLX5_SET(typ, p, fld, v) \
+	do { \
+		u32 _v = v; \
+		*((__be32 *)(p) + __mlx5_dw_off(typ, fld)) = \
+		rte_cpu_to_be_32((rte_be_to_cpu_32(*((u32 *)(p) + \
+				  __mlx5_dw_off(typ, fld))) & \
+				  (~__mlx5_dw_mask(typ, fld))) | \
+				 (((_v) & __mlx5_mask(typ, fld)) << \
+				   __mlx5_dw_bit_off(typ, fld))); \
+	} while (0)
+
+#define MLX5_SET64(typ, p, fld, v) \
+	do { \
+		assert(__mlx5_bit_sz(typ, fld) == 64); \
+		*((__be64 *)(p) + __mlx5_64_off(typ, fld)) = \
+			rte_cpu_to_be_64(v); \
+	} while (0)
+
+#define MLX5_GET(typ, p, fld) \
+	((rte_be_to_cpu_32(*((__be32 *)(p) +\
+	__mlx5_dw_off(typ, fld))) >> __mlx5_dw_bit_off(typ, fld)) & \
+	__mlx5_mask(typ, fld))
+#define MLX5_GET16(typ, p, fld) \
+	((rte_be_to_cpu_16(*((__be16 *)(p) + \
+	  __mlx5_16_off(typ, fld))) >> __mlx5_16_bit_off(typ, fld)) & \
+	 __mlx5_mask16(typ, fld))
+#define MLX5_GET64(typ, p, fld) rte_be_to_cpu_64(*((__be64 *)(p) + \
+						   __mlx5_64_off(typ, fld)))
+#define MLX5_FLD_SZ_BYTES(typ, fld) (__mlx5_bit_sz(typ, fld) / 8)
+
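The accessor macros above rely on the mlx5_ifc_*_bits convention that every u8 array element stands for one bit, so the byte offset of a member within the layout struct equals the bit offset of the field in the big-endian command buffer. The standalone sketch below reproduces that arithmetic for a small hypothetical layout. It replaces the rte_cpu_to_be_32()/rte_be_to_cpu_32() conversions with explicit byte loads and stores so that it builds without DPDK, and every demo_/DEMO_ name is an assumption made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: one array element per bit, 64 bits in total. */
struct demo_ifc_hdr_bits {
	uint8_t opcode[0x10];
	uint8_t reserved_at_10[0x10];
	uint8_t reserved_at_20[0x10];
	uint8_t op_mod[0x10];
};

#define DEMO_BIT_SZ(fld) sizeof(((struct demo_ifc_hdr_bits *)0)->fld)
#define DEMO_BIT_OFF(fld) offsetof(struct demo_ifc_hdr_bits, fld)
#define DEMO_DW_OFF(fld) (DEMO_BIT_OFF(fld) / 32)
#define DEMO_DW_BIT_OFF(fld) \
	(32 - DEMO_BIT_SZ(fld) - (DEMO_BIT_OFF(fld) & 0x1f))
#define DEMO_MASK(fld) ((uint32_t)((1ull << DEMO_BIT_SZ(fld)) - 1))

/* Load one big-endian dword from a byte buffer. */
static uint32_t
demo_load_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Store one dword to a byte buffer in big-endian order. */
static void
demo_store_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

/* Read-modify-write of one field inside its dword. */
static void
demo_set(uint8_t *buf, size_t dw, unsigned int shift, uint32_t mask,
	 uint32_t v)
{
	uint32_t cur = demo_load_be32(buf + dw * 4);

	cur = (cur & ~(mask << shift)) | ((v & mask) << shift);
	demo_store_be32(buf + dw * 4, cur);
}

/* Extract one field from its dword. */
static uint32_t
demo_get(const uint8_t *buf, size_t dw, unsigned int shift, uint32_t mask)
{
	return (demo_load_be32(buf + dw * 4) >> shift) & mask;
}

#define DEMO_SET(buf, fld, v) \
	demo_set((buf), DEMO_DW_OFF(fld), (unsigned int)DEMO_DW_BIT_OFF(fld), \
		 DEMO_MASK(fld), (v))
#define DEMO_GET(buf, fld) \
	demo_get((buf), DEMO_DW_OFF(fld), (unsigned int)DEMO_DW_BIT_OFF(fld), \
		 DEMO_MASK(fld))
```

With an 8-byte zeroed buffer (sizeof(struct demo_ifc_hdr_bits) / 8, the same rule as MLX5_ST_SZ_BYTES), DEMO_SET(buf, opcode, 0x100) writes bytes 01 00 into the top half of dword 0, and DEMO_SET(buf, op_mod, 0x1) sets the last byte of dword 1.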
+struct mlx5_ifc_fte_match_set_misc_bits {
+	u8 gre_c_present[0x1];
+	u8 reserved_at_1[0x1];
+	u8 gre_k_present[0x1];
+	u8 gre_s_present[0x1];
+	u8 source_vhci_port[0x4];
+	u8 source_sqn[0x18];
+	u8 reserved_at_20[0x10];
+	u8 source_port[0x10];
+	u8 outer_second_prio[0x3];
+	u8 outer_second_cfi[0x1];
+	u8 outer_second_vid[0xc];
+	u8 inner_second_prio[0x3];
+	u8 inner_second_cfi[0x1];
+	u8 inner_second_vid[0xc];
+	u8 outer_second_cvlan_tag[0x1];
+	u8 inner_second_cvlan_tag[0x1];
+	u8 outer_second_svlan_tag[0x1];
+	u8 inner_second_svlan_tag[0x1];
+	u8 reserved_at_64[0xc];
+	u8 gre_protocol[0x10];
+	u8 gre_key_h[0x18];
+	u8 gre_key_l[0x8];
+	u8 vxlan_vni[0x18];
+	u8 reserved_at_b8[0x8];
+	u8 geneve_vni[0x18];
+	u8 reserved_at_e4[0x7];
+	u8 geneve_oam[0x1];
+	u8 reserved_at_e0[0xc];
+	u8 outer_ipv6_flow_label[0x14];
+	u8 reserved_at_100[0xc];
+	u8 inner_ipv6_flow_label[0x14];
+	u8 reserved_at_120[0xa];
+	u8 geneve_opt_len[0x6];
+	u8 geneve_protocol_type[0x10];
+	u8 reserved_at_140[0xc0];
+};
+
+struct mlx5_ifc_ipv4_layout_bits {
+	u8 reserved_at_0[0x60];
+	u8 ipv4[0x20];
+};
+
+struct mlx5_ifc_ipv6_layout_bits {
+	u8 ipv6[16][0x8];
+};
+
+union mlx5_ifc_ipv6_layout_ipv4_layout_auto_bits {
+	struct mlx5_ifc_ipv6_layout_bits ipv6_layout;
+	struct mlx5_ifc_ipv4_layout_bits ipv4_layout;
+	u8 reserved_at_0[0x80];
+};
+
+struct mlx5_ifc_fte_match_set_lyr_2_4_bits {
+	u8 smac_47_16[0x20];
+	u8 smac_15_0[0x10];
+	u8 ethertype[0x10];
+	u8 dmac_47_16[0x20];
+	u8 dmac_15_0[0x10];
+	u8 first_prio[0x3];
+	u8 first_cfi[0x1];
+	u8 first_vid[0xc];
+	u8 ip_protocol[0x8];
+	u8 ip_dscp[0x6];
+	u8 ip_ecn[0x2];
+	u8 cvlan_tag[0x1];
+	u8 svlan_tag[0x1];
+	u8 frag[0x1];
+	u8 ip_version[0x4];
+	u8 tcp_flags[0x9];
+	u8 tcp_sport[0x10];
+	u8 tcp_dport[0x10];
+	u8 reserved_at_c0[0x20];
+	u8 udp_sport[0x10];
+	u8 udp_dport[0x10];
+	union mlx5_ifc_ipv6_layout_ipv4_layout_auto_bits src_ipv4_src_ipv6;
+	union mlx5_ifc_ipv6_layout_ipv4_layout_auto_bits dst_ipv4_dst_ipv6;
+};
+
+struct mlx5_ifc_fte_match_mpls_bits {
+	u8 mpls_label[0x14];
+	u8 mpls_exp[0x3];
+	u8 mpls_s_bos[0x1];
+	u8 mpls_ttl[0x8];
+};
+
+struct mlx5_ifc_fte_match_set_misc2_bits {
+	struct mlx5_ifc_fte_match_mpls_bits outer_first_mpls;
+	struct mlx5_ifc_fte_match_mpls_bits inner_first_mpls;
+	struct mlx5_ifc_fte_match_mpls_bits outer_first_mpls_over_gre;
+	struct mlx5_ifc_fte_match_mpls_bits outer_first_mpls_over_udp;
+	u8 metadata_reg_c_7[0x20];
+	u8 metadata_reg_c_6[0x20];
+	u8 metadata_reg_c_5[0x20];
+	u8 metadata_reg_c_4[0x20];
+	u8 metadata_reg_c_3[0x20];
+	u8 metadata_reg_c_2[0x20];
+	u8 metadata_reg_c_1[0x20];
+	u8 metadata_reg_c_0[0x20];
+	u8 metadata_reg_a[0x20];
+	u8 metadata_reg_b[0x20];
+	u8 reserved_at_1c0[0x40];
+};
+
+struct mlx5_ifc_fte_match_set_misc3_bits {
+	u8 inner_tcp_seq_num[0x20];
+	u8 outer_tcp_seq_num[0x20];
+	u8 inner_tcp_ack_num[0x20];
+	u8 outer_tcp_ack_num[0x20];
+	u8 reserved_at_auto1[0x8];
+	u8 outer_vxlan_gpe_vni[0x18];
+	u8 outer_vxlan_gpe_next_protocol[0x8];
+	u8 outer_vxlan_gpe_flags[0x8];
+	u8 reserved_at_a8[0x10];
+	u8 icmp_header_data[0x20];
+	u8 icmpv6_header_data[0x20];
+	u8 icmp_type[0x8];
+	u8 icmp_code[0x8];
+	u8 icmpv6_type[0x8];
+	u8 icmpv6_code[0x8];
+	u8 reserved_at_120[0x20];
+	u8 gtpu_teid[0x20];
+	u8 gtpu_msg_type[0x08];
+	u8 gtpu_msg_flags[0x08];
+	u8 reserved_at_170[0x90];
+};
+
+/* Flow matcher. */
+struct mlx5_ifc_fte_match_param_bits {
+	struct mlx5_ifc_fte_match_set_lyr_2_4_bits outer_headers;
+	struct mlx5_ifc_fte_match_set_misc_bits misc_parameters;
+	struct mlx5_ifc_fte_match_set_lyr_2_4_bits inner_headers;
+	struct mlx5_ifc_fte_match_set_misc2_bits misc_parameters_2;
+	struct mlx5_ifc_fte_match_set_misc3_bits misc_parameters_3;
+};
+
+enum {
+	MLX5_MATCH_CRITERIA_ENABLE_OUTER_BIT,
+	MLX5_MATCH_CRITERIA_ENABLE_MISC_BIT,
+	MLX5_MATCH_CRITERIA_ENABLE_INNER_BIT,
+	MLX5_MATCH_CRITERIA_ENABLE_MISC2_BIT,
+	MLX5_MATCH_CRITERIA_ENABLE_MISC3_BIT
+};
+
+enum {
+	MLX5_CMD_OP_QUERY_HCA_CAP = 0x100,
+	MLX5_CMD_OP_CREATE_MKEY = 0x200,
+	MLX5_CMD_OP_QUERY_NIC_VPORT_CONTEXT = 0x754,
+	MLX5_CMD_OP_ALLOC_TRANSPORT_DOMAIN = 0x816,
+	MLX5_CMD_OP_CREATE_TIR = 0x900,
+	MLX5_CMD_OP_CREATE_SQ = 0x904,
+	MLX5_CMD_OP_MODIFY_SQ = 0x905,
+	MLX5_CMD_OP_CREATE_RQ = 0x908,
+	MLX5_CMD_OP_MODIFY_RQ = 0x909,
+	MLX5_CMD_OP_CREATE_TIS = 0x912,
+	MLX5_CMD_OP_QUERY_TIS = 0x915,
+	MLX5_CMD_OP_CREATE_RQT = 0x916,
+	MLX5_CMD_OP_ALLOC_FLOW_COUNTER = 0x939,
+	MLX5_CMD_OP_QUERY_FLOW_COUNTER = 0x93b,
+};
+
+enum {
+	MLX5_MKC_ACCESS_MODE_MTT   = 0x1,
+};
+
+/* Flow counters. */
+struct mlx5_ifc_alloc_flow_counter_out_bits {
+	u8         status[0x8];
+	u8         reserved_at_8[0x18];
+	u8         syndrome[0x20];
+	u8         flow_counter_id[0x20];
+	u8         reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_alloc_flow_counter_in_bits {
+	u8         opcode[0x10];
+	u8         reserved_at_10[0x10];
+	u8         reserved_at_20[0x10];
+	u8         op_mod[0x10];
+	u8         flow_counter_id[0x20];
+	u8         reserved_at_40[0x18];
+	u8         flow_counter_bulk[0x8];
+};
+
+struct mlx5_ifc_dealloc_flow_counter_out_bits {
+	u8         status[0x8];
+	u8         reserved_at_8[0x18];
+	u8         syndrome[0x20];
+	u8         reserved_at_40[0x40];
+};
+
+struct mlx5_ifc_dealloc_flow_counter_in_bits {
+	u8         opcode[0x10];
+	u8         reserved_at_10[0x10];
+	u8         reserved_at_20[0x10];
+	u8         op_mod[0x10];
+	u8         flow_counter_id[0x20];
+	u8         reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_traffic_counter_bits {
+	u8         packets[0x40];
+	u8         octets[0x40];
+};
+
+struct mlx5_ifc_query_flow_counter_out_bits {
+	u8         status[0x8];
+	u8         reserved_at_8[0x18];
+	u8         syndrome[0x20];
+	u8         reserved_at_40[0x40];
+	struct mlx5_ifc_traffic_counter_bits flow_statistics[];
+};
+
+struct mlx5_ifc_query_flow_counter_in_bits {
+	u8         opcode[0x10];
+	u8         reserved_at_10[0x10];
+	u8         reserved_at_20[0x10];
+	u8         op_mod[0x10];
+	u8         reserved_at_40[0x20];
+	u8         mkey[0x20];
+	u8         address[0x40];
+	u8         clear[0x1];
+	u8         dump_to_memory[0x1];
+	u8         num_of_counters[0x1e];
+	u8         flow_counter_id[0x20];
+};
+
+struct mlx5_ifc_mkc_bits {
+	u8         reserved_at_0[0x1];
+	u8         free[0x1];
+	u8         reserved_at_2[0x1];
+	u8         access_mode_4_2[0x3];
+	u8         reserved_at_6[0x7];
+	u8         relaxed_ordering_write[0x1];
+	u8         reserved_at_e[0x1];
+	u8         small_fence_on_rdma_read_response[0x1];
+	u8         umr_en[0x1];
+	u8         a[0x1];
+	u8         rw[0x1];
+	u8         rr[0x1];
+	u8         lw[0x1];
+	u8         lr[0x1];
+	u8         access_mode_1_0[0x2];
+	u8         reserved_at_18[0x8];
+
+	u8         qpn[0x18];
+	u8         mkey_7_0[0x8];
+
+	u8         reserved_at_40[0x20];
+
+	u8         length64[0x1];
+	u8         bsf_en[0x1];
+	u8         sync_umr[0x1];
+	u8         reserved_at_63[0x2];
+	u8         expected_sigerr_count[0x1];
+	u8         reserved_at_66[0x1];
+	u8         en_rinval[0x1];
+	u8         pd[0x18];
+
+	u8         start_addr[0x40];
+
+	u8         len[0x40];
+
+	u8         bsf_octword_size[0x20];
+
+	u8         reserved_at_120[0x80];
+
+	u8         translations_octword_size[0x20];
+
+	u8         reserved_at_1c0[0x1b];
+	u8         log_page_size[0x5];
+
+	u8         reserved_at_1e0[0x20];
+};
+
+struct mlx5_ifc_create_mkey_out_bits {
+	u8         status[0x8];
+	u8         reserved_at_8[0x18];
+
+	u8         syndrome[0x20];
+
+	u8         reserved_at_40[0x8];
+	u8         mkey_index[0x18];
+
+	u8         reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_create_mkey_in_bits {
+	u8         opcode[0x10];
+	u8         reserved_at_10[0x10];
+
+	u8         reserved_at_20[0x10];
+	u8         op_mod[0x10];
+
+	u8         reserved_at_40[0x20];
+
+	u8         pg_access[0x1];
+	u8         reserved_at_61[0x1f];
+
+	struct mlx5_ifc_mkc_bits memory_key_mkey_entry;
+
+	u8         reserved_at_280[0x80];
+
+	u8         translations_octword_actual_size[0x20];
+
+	u8         mkey_umem_id[0x20];
+
+	u8         mkey_umem_offset[0x40];
+
+	u8         reserved_at_380[0x500];
+
+	u8         klm_pas_mtt[][0x20];
+};
+
+enum {
+	MLX5_GET_HCA_CAP_OP_MOD_GENERAL_DEVICE = 0x0 << 1,
+	MLX5_GET_HCA_CAP_OP_MOD_ETHERNET_OFFLOAD_CAPS = 0x1 << 1,
+	MLX5_GET_HCA_CAP_OP_MOD_QOS_CAP = 0xc << 1,
+};
+
+enum {
+	MLX5_HCA_CAP_OPMOD_GET_MAX   = 0,
+	MLX5_HCA_CAP_OPMOD_GET_CUR   = 1,
+};
+
+enum {
+	MLX5_CAP_INLINE_MODE_L2,
+	MLX5_CAP_INLINE_MODE_VPORT_CONTEXT,
+	MLX5_CAP_INLINE_MODE_NOT_REQUIRED,
+};
+
+enum {
+	MLX5_INLINE_MODE_NONE,
+	MLX5_INLINE_MODE_L2,
+	MLX5_INLINE_MODE_IP,
+	MLX5_INLINE_MODE_TCP_UDP,
+	MLX5_INLINE_MODE_RESERVED4,
+	MLX5_INLINE_MODE_INNER_L2,
+	MLX5_INLINE_MODE_INNER_IP,
+	MLX5_INLINE_MODE_INNER_TCP_UDP,
+};
+
+/* HCA bit masks indicating which Flex parser protocols are already enabled. */
+#define MLX5_HCA_FLEX_IPV4_OVER_VXLAN_ENABLED (1UL << 0)
+#define MLX5_HCA_FLEX_IPV6_OVER_VXLAN_ENABLED (1UL << 1)
+#define MLX5_HCA_FLEX_IPV6_OVER_IP_ENABLED (1UL << 2)
+#define MLX5_HCA_FLEX_GENEVE_ENABLED (1UL << 3)
+#define MLX5_HCA_FLEX_CW_MPLS_OVER_GRE_ENABLED (1UL << 4)
+#define MLX5_HCA_FLEX_CW_MPLS_OVER_UDP_ENABLED (1UL << 5)
+#define MLX5_HCA_FLEX_P_BIT_VXLAN_GPE_ENABLED (1UL << 6)
+#define MLX5_HCA_FLEX_VXLAN_GPE_ENABLED (1UL << 7)
+#define MLX5_HCA_FLEX_ICMP_ENABLED (1UL << 8)
+#define MLX5_HCA_FLEX_ICMPV6_ENABLED (1UL << 9)
+
+struct mlx5_ifc_cmd_hca_cap_bits {
+	u8 reserved_at_0[0x30];
+	u8 vhca_id[0x10];
+	u8 reserved_at_40[0x40];
+	u8 log_max_srq_sz[0x8];
+	u8 log_max_qp_sz[0x8];
+	u8 reserved_at_90[0xb];
+	u8 log_max_qp[0x5];
+	u8 reserved_at_a0[0xb];
+	u8 log_max_srq[0x5];
+	u8 reserved_at_b0[0x10];
+	u8 reserved_at_c0[0x8];
+	u8 log_max_cq_sz[0x8];
+	u8 reserved_at_d0[0xb];
+	u8 log_max_cq[0x5];
+	u8 log_max_eq_sz[0x8];
+	u8 reserved_at_e8[0x2];
+	u8 log_max_mkey[0x6];
+	u8 reserved_at_f0[0x8];
+	u8 dump_fill_mkey[0x1];
+	u8 reserved_at_f9[0x3];
+	u8 log_max_eq[0x4];
+	u8 max_indirection[0x8];
+	u8 fixed_buffer_size[0x1];
+	u8 log_max_mrw_sz[0x7];
+	u8 force_teardown[0x1];
+	u8 reserved_at_111[0x1];
+	u8 log_max_bsf_list_size[0x6];
+	u8 umr_extended_translation_offset[0x1];
+	u8 null_mkey[0x1];
+	u8 log_max_klm_list_size[0x6];
+	u8 reserved_at_120[0xa];
+	u8 log_max_ra_req_dc[0x6];
+	u8 reserved_at_130[0xa];
+	u8 log_max_ra_res_dc[0x6];
+	u8 reserved_at_140[0xa];
+	u8 log_max_ra_req_qp[0x6];
+	u8 reserved_at_150[0xa];
+	u8 log_max_ra_res_qp[0x6];
+	u8 end_pad[0x1];
+	u8 cc_query_allowed[0x1];
+	u8 cc_modify_allowed[0x1];
+	u8 start_pad[0x1];
+	u8 cache_line_128byte[0x1];
+	u8 reserved_at_165[0xa];
+	u8 qcam_reg[0x1];
+	u8 gid_table_size[0x10];
+	u8 out_of_seq_cnt[0x1];
+	u8 vport_counters[0x1];
+	u8 retransmission_q_counters[0x1];
+	u8 debug[0x1];
+	u8 modify_rq_counter_set_id[0x1];
+	u8 rq_delay_drop[0x1];
+	u8 max_qp_cnt[0xa];
+	u8 pkey_table_size[0x10];
+	u8 vport_group_manager[0x1];
+	u8 vhca_group_manager[0x1];
+	u8 ib_virt[0x1];
+	u8 eth_virt[0x1];
+	u8 vnic_env_queue_counters[0x1];
+	u8 ets[0x1];
+	u8 nic_flow_table[0x1];
+	u8 eswitch_manager[0x1];
+	u8 device_memory[0x1];
+	u8 mcam_reg[0x1];
+	u8 pcam_reg[0x1];
+	u8 local_ca_ack_delay[0x5];
+	u8 port_module_event[0x1];
+	u8 enhanced_error_q_counters[0x1];
+	u8 ports_check[0x1];
+	u8 reserved_at_1b3[0x1];
+	u8 disable_link_up[0x1];
+	u8 beacon_led[0x1];
+	u8 port_type[0x2];
+	u8 num_ports[0x8];
+	u8 reserved_at_1c0[0x1];
+	u8 pps[0x1];
+	u8 pps_modify[0x1];
+	u8 log_max_msg[0x5];
+	u8 reserved_at_1c8[0x4];
+	u8 max_tc[0x4];
+	u8 temp_warn_event[0x1];
+	u8 dcbx[0x1];
+	u8 general_notification_event[0x1];
+	u8 reserved_at_1d3[0x2];
+	u8 fpga[0x1];
+	u8 rol_s[0x1];
+	u8 rol_g[0x1];
+	u8 reserved_at_1d8[0x1];
+	u8 wol_s[0x1];
+	u8 wol_g[0x1];
+	u8 wol_a[0x1];
+	u8 wol_b[0x1];
+	u8 wol_m[0x1];
+	u8 wol_u[0x1];
+	u8 wol_p[0x1];
+	u8 stat_rate_support[0x10];
+	u8 reserved_at_1f0[0xc];
+	u8 cqe_version[0x4];
+	u8 compact_address_vector[0x1];
+	u8 striding_rq[0x1];
+	u8 reserved_at_202[0x1];
+	u8 ipoib_enhanced_offloads[0x1];
+	u8 ipoib_basic_offloads[0x1];
+	u8 reserved_at_205[0x1];
+	u8 repeated_block_disabled[0x1];
+	u8 umr_modify_entity_size_disabled[0x1];
+	u8 umr_modify_atomic_disabled[0x1];
+	u8 umr_indirect_mkey_disabled[0x1];
+	u8 umr_fence[0x2];
+	u8 reserved_at_20c[0x3];
+	u8 drain_sigerr[0x1];
+	u8 cmdif_checksum[0x2];
+	u8 sigerr_cqe[0x1];
+	u8 reserved_at_213[0x1];
+	u8 wq_signature[0x1];
+	u8 sctr_data_cqe[0x1];
+	u8 reserved_at_216[0x1];
+	u8 sho[0x1];
+	u8 tph[0x1];
+	u8 rf[0x1];
+	u8 dct[0x1];
+	u8 qos[0x1];
+	u8 eth_net_offloads[0x1];
+	u8 roce[0x1];
+	u8 atomic[0x1];
+	u8 reserved_at_21f[0x1];
+	u8 cq_oi[0x1];
+	u8 cq_resize[0x1];
+	u8 cq_moderation[0x1];
+	u8 reserved_at_223[0x3];
+	u8 cq_eq_remap[0x1];
+	u8 pg[0x1];
+	u8 block_lb_mc[0x1];
+	u8 reserved_at_229[0x1];
+	u8 scqe_break_moderation[0x1];
+	u8 cq_period_start_from_cqe[0x1];
+	u8 cd[0x1];
+	u8 reserved_at_22d[0x1];
+	u8 apm[0x1];
+	u8 vector_calc[0x1];
+	u8 umr_ptr_rlky[0x1];
+	u8 imaicl[0x1];
+	u8 reserved_at_232[0x4];
+	u8 qkv[0x1];
+	u8 pkv[0x1];
+	u8 set_deth_sqpn[0x1];
+	u8 reserved_at_239[0x3];
+	u8 xrc[0x1];
+	u8 ud[0x1];
+	u8 uc[0x1];
+	u8 rc[0x1];
+	u8 uar_4k[0x1];
+	u8 reserved_at_241[0x9];
+	u8 uar_sz[0x6];
+	u8 reserved_at_250[0x8];
+	u8 log_pg_sz[0x8];
+	u8 bf[0x1];
+	u8 driver_version[0x1];
+	u8 pad_tx_eth_packet[0x1];
+	u8 reserved_at_263[0x8];
+	u8 log_bf_reg_size[0x5];
+	u8 reserved_at_270[0xb];
+	u8 lag_master[0x1];
+	u8 num_lag_ports[0x4];
+	u8 reserved_at_280[0x10];
+	u8 max_wqe_sz_sq[0x10];
+	u8 reserved_at_2a0[0x10];
+	u8 max_wqe_sz_rq[0x10];
+	u8 max_flow_counter_31_16[0x10];
+	u8 max_wqe_sz_sq_dc[0x10];
+	u8 reserved_at_2e0[0x7];
+	u8 max_qp_mcg[0x19];
+	u8 reserved_at_300[0x10];
+	u8 flow_counter_bulk_alloc[0x08];
+	u8 log_max_mcg[0x8];
+	u8 reserved_at_320[0x3];
+	u8 log_max_transport_domain[0x5];
+	u8 reserved_at_328[0x3];
+	u8 log_max_pd[0x5];
+	u8 reserved_at_330[0xb];
+	u8 log_max_xrcd[0x5];
+	u8 nic_receive_steering_discard[0x1];
+	u8 receive_discard_vport_down[0x1];
+	u8 transmit_discard_vport_down[0x1];
+	u8 reserved_at_343[0x5];
+	u8 log_max_flow_counter_bulk[0x8];
+	u8 max_flow_counter_15_0[0x10];
+	u8 modify_tis[0x1];
+	u8 flow_counters_dump[0x1];
+	u8 reserved_at_360[0x1];
+	u8 log_max_rq[0x5];
+	u8 reserved_at_368[0x3];
+	u8 log_max_sq[0x5];
+	u8 reserved_at_370[0x3];
+	u8 log_max_tir[0x5];
+	u8 reserved_at_378[0x3];
+	u8 log_max_tis[0x5];
+	u8 basic_cyclic_rcv_wqe[0x1];
+	u8 reserved_at_381[0x2];
+	u8 log_max_rmp[0x5];
+	u8 reserved_at_388[0x3];
+	u8 log_max_rqt[0x5];
+	u8 reserved_at_390[0x3];
+	u8 log_max_rqt_size[0x5];
+	u8 reserved_at_398[0x3];
+	u8 log_max_tis_per_sq[0x5];
+	u8 ext_stride_num_range[0x1];
+	u8 reserved_at_3a1[0x2];
+	u8 log_max_stride_sz_rq[0x5];
+	u8 reserved_at_3a8[0x3];
+	u8 log_min_stride_sz_rq[0x5];
+	u8 reserved_at_3b0[0x3];
+	u8 log_max_stride_sz_sq[0x5];
+	u8 reserved_at_3b8[0x3];
+	u8 log_min_stride_sz_sq[0x5];
+	u8 hairpin[0x1];
+	u8 reserved_at_3c1[0x2];
+	u8 log_max_hairpin_queues[0x5];
+	u8 reserved_at_3c8[0x3];
+	u8 log_max_hairpin_wq_data_sz[0x5];
+	u8 reserved_at_3d0[0x3];
+	u8 log_max_hairpin_num_packets[0x5];
+	u8 reserved_at_3d8[0x3];
+	u8 log_max_wq_sz[0x5];
+	u8 nic_vport_change_event[0x1];
+	u8 disable_local_lb_uc[0x1];
+	u8 disable_local_lb_mc[0x1];
+	u8 log_min_hairpin_wq_data_sz[0x5];
+	u8 reserved_at_3e8[0x3];
+	u8 log_max_vlan_list[0x5];
+	u8 reserved_at_3f0[0x3];
+	u8 log_max_current_mc_list[0x5];
+	u8 reserved_at_3f8[0x3];
+	u8 log_max_current_uc_list[0x5];
+	u8 general_obj_types[0x40];
+	u8 reserved_at_440[0x20];
+	u8 reserved_at_460[0x10];
+	u8 max_num_eqs[0x10];
+	u8 reserved_at_480[0x3];
+	u8 log_max_l2_table[0x5];
+	u8 reserved_at_488[0x8];
+	u8 log_uar_page_sz[0x10];
+	u8 reserved_at_4a0[0x20];
+	u8 device_frequency_mhz[0x20];
+	u8 device_frequency_khz[0x20];
+	u8 reserved_at_500[0x20];
+	u8 num_of_uars_per_page[0x20];
+	u8 flex_parser_protocols[0x20];
+	u8 reserved_at_560[0x20];
+	u8 reserved_at_580[0x3c];
+	u8 mini_cqe_resp_stride_index[0x1];
+	u8 cqe_128_always[0x1];
+	u8 cqe_compression_128[0x1];
+	u8 cqe_compression[0x1];
+	u8 cqe_compression_timeout[0x10];
+	u8 cqe_compression_max_num[0x10];
+	u8 reserved_at_5e0[0x10];
+	u8 tag_matching[0x1];
+	u8 rndv_offload_rc[0x1];
+	u8 rndv_offload_dc[0x1];
+	u8 log_tag_matching_list_sz[0x5];
+	u8 reserved_at_5f8[0x3];
+	u8 log_max_xrq[0x5];
+	u8 affiliate_nic_vport_criteria[0x8];
+	u8 native_port_num[0x8];
+	u8 num_vhca_ports[0x8];
+	u8 reserved_at_618[0x6];
+	u8 sw_owner_id[0x1];
+	u8 reserved_at_61f[0x1e1];
+};
+
+struct mlx5_ifc_qos_cap_bits {
+	u8 packet_pacing[0x1];
+	u8 esw_scheduling[0x1];
+	u8 esw_bw_share[0x1];
+	u8 esw_rate_limit[0x1];
+	u8 reserved_at_4[0x1];
+	u8 packet_pacing_burst_bound[0x1];
+	u8 packet_pacing_typical_size[0x1];
+	u8 flow_meter_srtcm[0x1];
+	u8 reserved_at_8[0x8];
+	u8 log_max_flow_meter[0x8];
+	u8 flow_meter_reg_id[0x8];
+	u8 reserved_at_25[0x8];
+	u8 flow_meter_reg_share[0x1];
+	u8 reserved_at_2e[0x17];
+	u8 packet_pacing_max_rate[0x20];
+	u8 packet_pacing_min_rate[0x20];
+	u8 reserved_at_80[0x10];
+	u8 packet_pacing_rate_table_size[0x10];
+	u8 esw_element_type[0x10];
+	u8 esw_tsar_type[0x10];
+	u8 reserved_at_c0[0x10];
+	u8 max_qos_para_vport[0x10];
+	u8 max_tsar_bw_share[0x20];
+	u8 reserved_at_100[0x6e8];
+};
+
+struct mlx5_ifc_per_protocol_networking_offload_caps_bits {
+	u8 csum_cap[0x1];
+	u8 vlan_cap[0x1];
+	u8 lro_cap[0x1];
+	u8 lro_psh_flag[0x1];
+	u8 lro_time_stamp[0x1];
+	u8 lro_max_msg_sz_mode[0x2];
+	u8 wqe_vlan_insert[0x1];
+	u8 self_lb_en_modifiable[0x1];
+	u8 self_lb_mc[0x1];
+	u8 self_lb_uc[0x1];
+	u8 max_lso_cap[0x5];
+	u8 multi_pkt_send_wqe[0x2];
+	u8 wqe_inline_mode[0x2];
+	u8 rss_ind_tbl_cap[0x4];
+	u8 reg_umr_sq[0x1];
+	u8 scatter_fcs[0x1];
+	u8 enhanced_multi_pkt_send_wqe[0x1];
+	u8 tunnel_lso_const_out_ip_id[0x1];
+	u8 tunnel_lro_gre[0x1];
+	u8 tunnel_lro_vxlan[0x1];
+	u8 tunnel_stateless_gre[0x1];
+	u8 tunnel_stateless_vxlan[0x1];
+	u8 swp[0x1];
+	u8 swp_csum[0x1];
+	u8 swp_lso[0x1];
+	u8 reserved_at_23[0x8];
+	u8 tunnel_stateless_gtp[0x1];
+	u8 reserved_at_25[0x4];
+	u8 max_vxlan_udp_ports[0x8];
+	u8 reserved_at_38[0x6];
+	u8 max_geneve_opt_len[0x1];
+	u8 tunnel_stateless_geneve_rx[0x1];
+	u8 reserved_at_40[0x10];
+	u8 lro_min_mss_size[0x10];
+	u8 reserved_at_60[0x120];
+	u8 lro_timer_supported_periods[4][0x20];
+	u8 reserved_at_200[0x600];
+};
+
+union mlx5_ifc_hca_cap_union_bits {
+	struct mlx5_ifc_cmd_hca_cap_bits cmd_hca_cap;
+	struct mlx5_ifc_per_protocol_networking_offload_caps_bits
+	       per_protocol_networking_offload_caps;
+	struct mlx5_ifc_qos_cap_bits qos_cap;
+	u8 reserved_at_0[0x8000];
+};
+
+struct mlx5_ifc_query_hca_cap_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x40];
+	union mlx5_ifc_hca_cap_union_bits capability;
+};
+
+struct mlx5_ifc_query_hca_cap_in_bits {
+	u8 opcode[0x10];
+	u8 reserved_at_10[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0x40];
+};
+
+struct mlx5_ifc_mac_address_layout_bits {
+	u8 reserved_at_0[0x10];
+	u8 mac_addr_47_32[0x10];
+	u8 mac_addr_31_0[0x20];
+};
+
+struct mlx5_ifc_nic_vport_context_bits {
+	u8 reserved_at_0[0x5];
+	u8 min_wqe_inline_mode[0x3];
+	u8 reserved_at_8[0x15];
+	u8 disable_mc_local_lb[0x1];
+	u8 disable_uc_local_lb[0x1];
+	u8 roce_en[0x1];
+	u8 arm_change_event[0x1];
+	u8 reserved_at_21[0x1a];
+	u8 event_on_mtu[0x1];
+	u8 event_on_promisc_change[0x1];
+	u8 event_on_vlan_change[0x1];
+	u8 event_on_mc_address_change[0x1];
+	u8 event_on_uc_address_change[0x1];
+	u8 reserved_at_40[0xc];
+	u8 affiliation_criteria[0x4];
+	u8 affiliated_vhca_id[0x10];
+	u8 reserved_at_60[0xd0];
+	u8 mtu[0x10];
+	u8 system_image_guid[0x40];
+	u8 port_guid[0x40];
+	u8 node_guid[0x40];
+	u8 reserved_at_200[0x140];
+	u8 qkey_violation_counter[0x10];
+	u8 reserved_at_350[0x430];
+	u8 promisc_uc[0x1];
+	u8 promisc_mc[0x1];
+	u8 promisc_all[0x1];
+	u8 reserved_at_783[0x2];
+	u8 allowed_list_type[0x3];
+	u8 reserved_at_788[0xc];
+	u8 allowed_list_size[0xc];
+	struct mlx5_ifc_mac_address_layout_bits permanent_address;
+	u8 reserved_at_7e0[0x20];
+};
+
+struct mlx5_ifc_query_nic_vport_context_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x40];
+	struct mlx5_ifc_nic_vport_context_bits nic_vport_context;
+};
+
+struct mlx5_ifc_query_nic_vport_context_in_bits {
+	u8 opcode[0x10];
+	u8 reserved_at_10[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 other_vport[0x1];
+	u8 reserved_at_41[0xf];
+	u8 vport_number[0x10];
+	u8 reserved_at_60[0x5];
+	u8 allowed_list_type[0x3];
+	u8 reserved_at_68[0x18];
+};
+
+struct mlx5_ifc_tisc_bits {
+	u8 strict_lag_tx_port_affinity[0x1];
+	u8 reserved_at_1[0x3];
+	u8 lag_tx_port_affinity[0x04];
+	u8 reserved_at_8[0x4];
+	u8 prio[0x4];
+	u8 reserved_at_10[0x10];
+	u8 reserved_at_20[0x100];
+	u8 reserved_at_120[0x8];
+	u8 transport_domain[0x18];
+	u8 reserved_at_140[0x8];
+	u8 underlay_qpn[0x18];
+	u8 reserved_at_160[0x3a0];
+};
+
+struct mlx5_ifc_query_tis_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x40];
+	struct mlx5_ifc_tisc_bits tis_context;
+};
+
+struct mlx5_ifc_query_tis_in_bits {
+	u8 opcode[0x10];
+	u8 reserved_at_10[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0x8];
+	u8 tisn[0x18];
+	u8 reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_alloc_transport_domain_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x8];
+	u8 transport_domain[0x18];
+	u8 reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_alloc_transport_domain_in_bits {
+	u8 opcode[0x10];
+	u8 reserved_at_10[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0x40];
+};
+
+enum {
+	MLX5_WQ_TYPE_LINKED_LIST                = 0x0,
+	MLX5_WQ_TYPE_CYCLIC                     = 0x1,
+	MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ    = 0x2,
+	MLX5_WQ_TYPE_CYCLIC_STRIDING_RQ         = 0x3,
+};
+
+enum {
+	MLX5_WQ_END_PAD_MODE_NONE  = 0x0,
+	MLX5_WQ_END_PAD_MODE_ALIGN = 0x1,
+};
+
+struct mlx5_ifc_wq_bits {
+	u8 wq_type[0x4];
+	u8 wq_signature[0x1];
+	u8 end_padding_mode[0x2];
+	u8 cd_slave[0x1];
+	u8 reserved_at_8[0x18];
+	u8 hds_skip_first_sge[0x1];
+	u8 log2_hds_buf_size[0x3];
+	u8 reserved_at_24[0x7];
+	u8 page_offset[0x5];
+	u8 lwm[0x10];
+	u8 reserved_at_40[0x8];
+	u8 pd[0x18];
+	u8 reserved_at_60[0x8];
+	u8 uar_page[0x18];
+	u8 dbr_addr[0x40];
+	u8 hw_counter[0x20];
+	u8 sw_counter[0x20];
+	u8 reserved_at_100[0xc];
+	u8 log_wq_stride[0x4];
+	u8 reserved_at_110[0x3];
+	u8 log_wq_pg_sz[0x5];
+	u8 reserved_at_118[0x3];
+	u8 log_wq_sz[0x5];
+	u8 dbr_umem_valid[0x1];
+	u8 wq_umem_valid[0x1];
+	u8 reserved_at_122[0x1];
+	u8 log_hairpin_num_packets[0x5];
+	u8 reserved_at_128[0x3];
+	u8 log_hairpin_data_sz[0x5];
+	u8 reserved_at_130[0x4];
+	u8 single_wqe_log_num_of_strides[0x4];
+	u8 two_byte_shift_en[0x1];
+	u8 reserved_at_139[0x4];
+	u8 single_stride_log_num_of_bytes[0x3];
+	u8 dbr_umem_id[0x20];
+	u8 wq_umem_id[0x20];
+	u8 wq_umem_offset[0x40];
+	u8 reserved_at_1c0[0x440];
+};
+
+enum {
+	MLX5_RQC_MEM_RQ_TYPE_MEMORY_RQ_INLINE  = 0x0,
+	MLX5_RQC_MEM_RQ_TYPE_MEMORY_RQ_RMP     = 0x1,
+};
+
+enum {
+	MLX5_RQC_STATE_RST  = 0x0,
+	MLX5_RQC_STATE_RDY  = 0x1,
+	MLX5_RQC_STATE_ERR  = 0x3,
+};
+
+struct mlx5_ifc_rqc_bits {
+	u8 rlky[0x1];
+	u8 delay_drop_en[0x1];
+	u8 scatter_fcs[0x1];
+	u8 vsd[0x1];
+	u8 mem_rq_type[0x4];
+	u8 state[0x4];
+	u8 reserved_at_c[0x1];
+	u8 flush_in_error_en[0x1];
+	u8 hairpin[0x1];
+	u8 reserved_at_f[0x11];
+	u8 reserved_at_20[0x8];
+	u8 user_index[0x18];
+	u8 reserved_at_40[0x8];
+	u8 cqn[0x18];
+	u8 counter_set_id[0x8];
+	u8 reserved_at_68[0x18];
+	u8 reserved_at_80[0x8];
+	u8 rmpn[0x18];
+	u8 reserved_at_a0[0x8];
+	u8 hairpin_peer_sq[0x18];
+	u8 reserved_at_c0[0x10];
+	u8 hairpin_peer_vhca[0x10];
+	u8 reserved_at_e0[0xa0];
+	struct mlx5_ifc_wq_bits wq; /* Not used in LRO RQ. */
+};
+
+struct mlx5_ifc_create_rq_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x8];
+	u8 rqn[0x18];
+	u8 reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_create_rq_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0xc0];
+	struct mlx5_ifc_rqc_bits ctx;
+};
+
+struct mlx5_ifc_modify_rq_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x40];
+};
+
+struct mlx5_ifc_create_tis_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x8];
+	u8 tisn[0x18];
+	u8 reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_create_tis_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0xc0];
+	struct mlx5_ifc_tisc_bits ctx;
+};
+
+enum {
+	MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_WQ_LWM = 1ULL << 0,
+	MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_VSD = 1ULL << 1,
+	MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_SCATTER_FCS = 1ULL << 2,
+	MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_RQ_COUNTER_SET_ID = 1ULL << 3,
+};
+
+struct mlx5_ifc_modify_rq_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 rq_state[0x4];
+	u8 reserved_at_44[0x4];
+	u8 rqn[0x18];
+	u8 reserved_at_60[0x20];
+	u8 modify_bitmask[0x40];
+	u8 reserved_at_c0[0x40];
+	struct mlx5_ifc_rqc_bits ctx;
+};
+
+enum {
+	MLX5_RX_HASH_FIELD_SELECT_SELECTED_FIELDS_SRC_IP     = 0x0,
+	MLX5_RX_HASH_FIELD_SELECT_SELECTED_FIELDS_DST_IP     = 0x1,
+	MLX5_RX_HASH_FIELD_SELECT_SELECTED_FIELDS_L4_SPORT   = 0x2,
+	MLX5_RX_HASH_FIELD_SELECT_SELECTED_FIELDS_L4_DPORT   = 0x3,
+	MLX5_RX_HASH_FIELD_SELECT_SELECTED_FIELDS_IPSEC_SPI  = 0x4,
+};
+
+struct mlx5_ifc_rx_hash_field_select_bits {
+	u8 l3_prot_type[0x1];
+	u8 l4_prot_type[0x1];
+	u8 selected_fields[0x1e];
+};
+
+enum {
+	MLX5_TIRC_DISP_TYPE_DIRECT    = 0x0,
+	MLX5_TIRC_DISP_TYPE_INDIRECT  = 0x1,
+};
+
+enum {
+	MLX5_TIRC_LRO_ENABLE_MASK_IPV4_LRO  = 0x1,
+	MLX5_TIRC_LRO_ENABLE_MASK_IPV6_LRO  = 0x2,
+};
+
+enum {
+	MLX5_RX_HASH_FN_NONE           = 0x0,
+	MLX5_RX_HASH_FN_INVERTED_XOR8  = 0x1,
+	MLX5_RX_HASH_FN_TOEPLITZ       = 0x2,
+};
+
+enum {
+	MLX5_TIRC_SELF_LB_BLOCK_BLOCK_UNICAST    = 0x1,
+	MLX5_TIRC_SELF_LB_BLOCK_BLOCK_MULTICAST  = 0x2,
+};
+
+enum {
+	MLX5_LRO_MAX_MSG_SIZE_START_FROM_L4  = 0x0,
+	MLX5_LRO_MAX_MSG_SIZE_START_FROM_L2  = 0x1,
+};
+
+struct mlx5_ifc_tirc_bits {
+	u8 reserved_at_0[0x20];
+	u8 disp_type[0x4];
+	u8 reserved_at_24[0x1c];
+	u8 reserved_at_40[0x40];
+	u8 reserved_at_80[0x4];
+	u8 lro_timeout_period_usecs[0x10];
+	u8 lro_enable_mask[0x4];
+	u8 lro_max_msg_sz[0x8];
+	u8 reserved_at_a0[0x40];
+	u8 reserved_at_e0[0x8];
+	u8 inline_rqn[0x18];
+	u8 rx_hash_symmetric[0x1];
+	u8 reserved_at_101[0x1];
+	u8 tunneled_offload_en[0x1];
+	u8 reserved_at_103[0x5];
+	u8 indirect_table[0x18];
+	u8 rx_hash_fn[0x4];
+	u8 reserved_at_124[0x2];
+	u8 self_lb_block[0x2];
+	u8 transport_domain[0x18];
+	u8 rx_hash_toeplitz_key[10][0x20];
+	struct mlx5_ifc_rx_hash_field_select_bits rx_hash_field_selector_outer;
+	struct mlx5_ifc_rx_hash_field_select_bits rx_hash_field_selector_inner;
+	u8 reserved_at_2c0[0x4c0];
+};
+
+struct mlx5_ifc_create_tir_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x8];
+	u8 tirn[0x18];
+	u8 reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_create_tir_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0xc0];
+	struct mlx5_ifc_tirc_bits ctx;
+};
+
+struct mlx5_ifc_rq_num_bits {
+	u8 reserved_at_0[0x8];
+	u8 rq_num[0x18];
+};
+
+struct mlx5_ifc_rqtc_bits {
+	u8 reserved_at_0[0xa0];
+	u8 reserved_at_a0[0x10];
+	u8 rqt_max_size[0x10];
+	u8 reserved_at_c0[0x10];
+	u8 rqt_actual_size[0x10];
+	u8 reserved_at_e0[0x6a0];
+	struct mlx5_ifc_rq_num_bits rq_num[];
+};
+
+struct mlx5_ifc_create_rqt_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x8];
+	u8 rqtn[0x18];
+	u8 reserved_at_60[0x20];
+};
+
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-Wpedantic"
+#endif
+struct mlx5_ifc_create_rqt_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0xc0];
+	struct mlx5_ifc_rqtc_bits rqt_context;
+};
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-Wpedantic"
+#endif
+
+enum {
+	MLX5_SQC_STATE_RST  = 0x0,
+	MLX5_SQC_STATE_RDY  = 0x1,
+	MLX5_SQC_STATE_ERR  = 0x3,
+};
+
+struct mlx5_ifc_sqc_bits {
+	u8 rlky[0x1];
+	u8 cd_master[0x1];
+	u8 fre[0x1];
+	u8 flush_in_error_en[0x1];
+	u8 allow_multi_pkt_send_wqe[0x1];
+	u8 min_wqe_inline_mode[0x3];
+	u8 state[0x4];
+	u8 reg_umr[0x1];
+	u8 allow_swp[0x1];
+	u8 hairpin[0x1];
+	u8 reserved_at_f[0x11];
+	u8 reserved_at_20[0x8];
+	u8 user_index[0x18];
+	u8 reserved_at_40[0x8];
+	u8 cqn[0x18];
+	u8 reserved_at_60[0x8];
+	u8 hairpin_peer_rq[0x18];
+	u8 reserved_at_80[0x10];
+	u8 hairpin_peer_vhca[0x10];
+	u8 reserved_at_a0[0x50];
+	u8 packet_pacing_rate_limit_index[0x10];
+	u8 tis_lst_sz[0x10];
+	u8 reserved_at_110[0x10];
+	u8 reserved_at_120[0x40];
+	u8 reserved_at_160[0x8];
+	u8 tis_num_0[0x18];
+	struct mlx5_ifc_wq_bits wq;
+};
+
+struct mlx5_ifc_query_sq_in_bits {
+	u8 opcode[0x10];
+	u8 reserved_at_10[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0x8];
+	u8 sqn[0x18];
+	u8 reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_modify_sq_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x40];
+};
+
+struct mlx5_ifc_modify_sq_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 sq_state[0x4];
+	u8 reserved_at_44[0x4];
+	u8 sqn[0x18];
+	u8 reserved_at_60[0x20];
+	u8 modify_bitmask[0x40];
+	u8 reserved_at_c0[0x40];
+	struct mlx5_ifc_sqc_bits ctx;
+};
+
+struct mlx5_ifc_create_sq_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x8];
+	u8 sqn[0x18];
+	u8 reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_create_sq_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0xc0];
+	struct mlx5_ifc_sqc_bits ctx;
+};
+
+enum {
+	MLX5_FLOW_METER_OBJ_MODIFY_FIELD_ACTIVE = (1ULL << 0),
+	MLX5_FLOW_METER_OBJ_MODIFY_FIELD_CBS = (1ULL << 1),
+	MLX5_FLOW_METER_OBJ_MODIFY_FIELD_CIR = (1ULL << 2),
+	MLX5_FLOW_METER_OBJ_MODIFY_FIELD_EBS = (1ULL << 3),
+	MLX5_FLOW_METER_OBJ_MODIFY_FIELD_EIR = (1ULL << 4),
+};
+
+struct mlx5_ifc_flow_meter_parameters_bits {
+	u8         valid[0x1];			/* 00h */
+	u8         bucket_overflow[0x1];
+	u8         start_color[0x2];
+	u8         both_buckets_on_green[0x1];
+	u8         meter_mode[0x2];
+	u8         reserved_at_1[0x19];
+	u8         reserved_at_2[0x20];		/* 04h */
+	u8         reserved_at_3[0x3];
+	u8         cbs_exponent[0x5];		/* 08h */
+	u8         cbs_mantissa[0x8];
+	u8         reserved_at_4[0x3];
+	u8         cir_exponent[0x5];
+	u8         cir_mantissa[0x8];
+	u8         reserved_at_5[0x20];		/* 0Ch */
+	u8         reserved_at_6[0x3];
+	u8         ebs_exponent[0x5];		/* 10h */
+	u8         ebs_mantissa[0x8];
+	u8         reserved_at_7[0x3];
+	u8         eir_exponent[0x5];
+	u8         eir_mantissa[0x8];
+	u8         reserved_at_8[0x60];		/* 14h-1Ch */
+};
+
+/* CQE format mask. */
+#define MLX5E_CQE_FORMAT_MASK 0xc
+
+/* MPW opcode. */
+#define MLX5_OPC_MOD_MPW 0x01
+
+/* Compressed Rx CQE structure. */
+struct mlx5_mini_cqe8 {
+	union {
+		uint32_t rx_hash_result;
+		struct {
+			uint16_t checksum;
+			uint16_t stride_idx;
+		};
+		struct {
+			uint16_t wqe_counter;
+			uint8_t  s_wqe_opcode;
+			uint8_t  reserved;
+		} s_wqe_info;
+	};
+	uint32_t byte_cnt;
+};
+
+/* srTCM PRM flow meter parameters. */
+enum {
+	MLX5_FLOW_COLOR_RED = 0,
+	MLX5_FLOW_COLOR_YELLOW,
+	MLX5_FLOW_COLOR_GREEN,
+	MLX5_FLOW_COLOR_UNDEFINED,
+};
+
+/* Maximum value of srTCM metering parameters. */
+#define MLX5_SRTCM_CBS_MAX (0xFF * (1ULL << 0x1F))
+#define MLX5_SRTCM_CIR_MAX (8 * (1ULL << 30) * 0xFF)
+#define MLX5_SRTCM_EBS_MAX 0
+
+/* Number of bits used by the meter color. */
+#define MLX5_MTR_COLOR_BITS 8
+
+/**
+ * Convert a user mark to flow mark.
+ *
+ * @param val
+ *   Mark value to convert.
+ *
+ * @return
+ *   Converted mark value.
+ */
+static inline uint32_t
+mlx5_flow_mark_set(uint32_t val)
+{
+	uint32_t ret;
+
+	/*
+	 * Add one to the user value to differentiate unmarked flows from
+	 * marked flows. If the ID is equal to MLX5_FLOW_MARK_DEFAULT, it
+	 * remains untouched.
+	 */
+	if (val != MLX5_FLOW_MARK_DEFAULT)
+		++val;
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+	/*
+	 * Mark is 24 bits (minus reserved values) but is stored on a 32-bit
+	 * word, byte-swapped by the kernel on little-endian systems. In this
+	 * case, right-shifting the resulting big-endian value ensures the
+	 * least significant 24 bits are retained when converting it back.
+	 */
+	ret = rte_cpu_to_be_32(val) >> 8;
+#else
+	ret = val;
+#endif
+	return ret;
+}
+
+/**
+ * Convert a flow mark to user mark.
+ *
+ * @param val
+ *   Mark value to convert.
+ *
+ * @return
+ *   Converted mark value.
+ */
+static inline uint32_t
+mlx5_flow_mark_get(uint32_t val)
+{
+	/*
+	 * Subtract one from the retrieved value. It was added by
+	 * mlx5_flow_mark_set() to distinguish unmarked flows.
+	 */
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+	return (val >> 8) - 1;
+#else
+	return val - 1;
+#endif
+}
+
+#endif /* RTE_PMD_MLX5_PRM_H_ */
diff --git a/drivers/common/mlx5/rte_common_mlx5_version.map b/drivers/common/mlx5/rte_common_mlx5_version.map
new file mode 100644
index 0000000..e4f85e2
--- /dev/null
+++ b/drivers/common/mlx5/rte_common_mlx5_version.map
@@ -0,0 +1,20 @@
+DPDK_20.02 {
+	global:
+
+	mlx5_devx_cmd_create_rq;
+	mlx5_devx_cmd_create_rqt;
+	mlx5_devx_cmd_create_sq;
+	mlx5_devx_cmd_create_tir;
+	mlx5_devx_cmd_create_td;
+	mlx5_devx_cmd_create_tis;
+	mlx5_devx_cmd_destroy;
+	mlx5_devx_cmd_flow_counter_alloc;
+	mlx5_devx_cmd_flow_counter_query;
+	mlx5_devx_cmd_flow_dump;
+	mlx5_devx_cmd_mkey_create;
+	mlx5_devx_cmd_modify_rq;
+	mlx5_devx_cmd_modify_sq;
+	mlx5_devx_cmd_qp_query_tis_td;
+	mlx5_devx_cmd_query_hca_attr;
+	mlx5_devx_get_out_command_status;
+};
diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index 0466d9d..a9558ca 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -10,11 +10,14 @@ LIB_GLUE = $(LIB_GLUE_BASE).$(LIB_GLUE_VERSION)
 LIB_GLUE_BASE = librte_pmd_mlx5_glue.so
 LIB_GLUE_VERSION = 20.02.0
 
+ifeq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
+CFLAGS += -DMLX5_GLUE='"$(LIB_GLUE)"'
+CFLAGS += -DMLX5_GLUE_VERSION='"$(LIB_GLUE_VERSION)"'
+LDLIBS += -ldl
+endif
+
 # Sources.
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5.c
-ifneq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
-SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_glue.c
-endif
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_rxq.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_txq.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_rxtx.c
@@ -37,34 +40,22 @@ SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_flow_dv.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_flow_verbs.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_mp.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_nl.c
-SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_devx_cmds.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_utils.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_socket.c
 
-ifeq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
-INSTALL-$(CONFIG_RTE_LIBRTE_MLX5_PMD)-lib += $(LIB_GLUE)
-endif
-
 # Basic CFLAGS.
 CFLAGS += -O3
 CFLAGS += -std=c11 -Wall -Wextra
 CFLAGS += -g
-CFLAGS += -I.
+CFLAGS += -I$(RTE_SDK)/drivers/common/mlx5
+CFLAGS += -I$(RTE_SDK)/drivers/net/mlx5
+CFLAGS += -I$(BUILDDIR)/drivers/common/mlx5
 CFLAGS += -D_BSD_SOURCE
 CFLAGS += -D_DEFAULT_SOURCE
 CFLAGS += -D_XOPEN_SOURCE=600
 CFLAGS += $(WERROR_FLAGS)
 CFLAGS += -Wno-strict-prototypes
-ifeq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
-CFLAGS += -DMLX5_GLUE='"$(LIB_GLUE)"'
-CFLAGS += -DMLX5_GLUE_VERSION='"$(LIB_GLUE_VERSION)"'
-CFLAGS_mlx5_glue.o += -fPIC
-LDLIBS += -ldl
-else ifeq ($(CONFIG_RTE_IBVERBS_LINK_STATIC),y)
-LDLIBS += $(shell $(RTE_SDK)/buildtools/options-ibverbs-static.sh)
-else
-LDLIBS += -libverbs -lmlx5
-endif
+LDLIBS += -lrte_common_mlx5
 LDLIBS += -lm
 LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring
 LDLIBS += -lrte_ethdev -lrte_net -lrte_kvargs
@@ -74,6 +65,7 @@ LDLIBS += -lrte_bus_pci
 CFLAGS += -Wno-error=cast-qual
 
 EXPORT_MAP := rte_pmd_mlx5_version.map
+
 # memseg walk is not part of stable API
 CFLAGS += -DALLOW_EXPERIMENTAL_API
 
@@ -96,282 +88,3 @@ endif
 
 include $(RTE_SDK)/mk/rte.lib.mk
 
-# Generate and clean-up mlx5_autoconf.h.
-
-export CC CFLAGS CPPFLAGS EXTRA_CFLAGS EXTRA_CPPFLAGS
-export AUTO_CONFIG_CFLAGS += -Wno-error
-
-ifndef V
-AUTOCONF_OUTPUT := >/dev/null
-endif
-
-mlx5_autoconf.h.new: FORCE
-
-mlx5_autoconf.h.new: $(RTE_SDK)/buildtools/auto-config-h.sh
-	$Q $(RM) -f -- '$@'
-	$Q sh -- '$<' '$@' \
-		HAVE_IBV_DEVICE_STRIDING_RQ_SUPPORT \
-		infiniband/mlx5dv.h \
-		enum MLX5DV_CQE_RES_FORMAT_CSUM_STRIDX \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IBV_DEVICE_TUNNEL_SUPPORT \
-		infiniband/mlx5dv.h \
-		enum MLX5DV_CONTEXT_MASK_TUNNEL_OFFLOADS \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IBV_DEVICE_MPLS_SUPPORT \
-		infiniband/verbs.h \
-		enum IBV_FLOW_SPEC_MPLS \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IBV_WQ_FLAGS_PCI_WRITE_END_PADDING \
-		infiniband/verbs.h \
-		enum IBV_WQ_FLAGS_PCI_WRITE_END_PADDING \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IBV_WQ_FLAG_RX_END_PADDING \
-		infiniband/verbs.h \
-		enum IBV_WQ_FLAG_RX_END_PADDING \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IBV_MLX5_MOD_SWP \
-		infiniband/mlx5dv.h \
-		type 'struct mlx5dv_sw_parsing_caps' \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IBV_MLX5_MOD_MPW \
-		infiniband/mlx5dv.h \
-		enum MLX5DV_CONTEXT_FLAGS_MPW_ALLOWED \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IBV_MLX5_MOD_CQE_128B_COMP \
-		infiniband/mlx5dv.h \
-		enum MLX5DV_CONTEXT_FLAGS_CQE_128B_COMP \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IBV_MLX5_MOD_CQE_128B_PAD \
-		infiniband/mlx5dv.h \
-		enum MLX5DV_CQ_INIT_ATTR_FLAGS_CQE_PAD \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IBV_FLOW_DV_SUPPORT \
-		infiniband/mlx5dv.h \
-		func mlx5dv_create_flow_action_packet_reformat \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_MLX5DV_DR \
-		infiniband/mlx5dv.h \
-		enum MLX5DV_DR_DOMAIN_TYPE_NIC_RX \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_MLX5DV_DR_ESWITCH \
-		infiniband/mlx5dv.h \
-		enum MLX5DV_DR_DOMAIN_TYPE_FDB \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_MLX5DV_DR_VLAN \
-		infiniband/mlx5dv.h \
-		func mlx5dv_dr_action_create_push_vlan \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_MLX5DV_DR_DEVX_PORT \
-		infiniband/mlx5dv.h \
-		func mlx5dv_query_devx_port \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IBV_DEVX_OBJ \
-		infiniband/mlx5dv.h \
-		func mlx5dv_devx_obj_create \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IBV_FLOW_DEVX_COUNTERS \
-		infiniband/mlx5dv.h \
-		enum MLX5DV_FLOW_ACTION_COUNTERS_DEVX \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IBV_DEVX_ASYNC \
-		infiniband/mlx5dv.h \
-		func mlx5dv_devx_obj_query_async \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_MLX5DV_DR_ACTION_DEST_DEVX_TIR \
-		infiniband/mlx5dv.h \
-		func mlx5dv_dr_action_create_dest_devx_tir \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_MLX5_DR_CREATE_ACTION_FLOW_METER \
-		infiniband/mlx5dv.h \
-		func mlx5dv_dr_action_create_flow_meter \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_MLX5_DR_FLOW_DUMP \
-		infiniband/mlx5dv.h \
-		func mlx5dv_dump_dr_domain \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_MLX5DV_MMAP_GET_NC_PAGES_CMD \
-		infiniband/mlx5dv.h \
-		enum MLX5_MMAP_GET_NC_PAGES_CMD \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_ETHTOOL_LINK_MODE_25G \
-		/usr/include/linux/ethtool.h \
-		enum ETHTOOL_LINK_MODE_25000baseCR_Full_BIT \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_ETHTOOL_LINK_MODE_50G \
-		/usr/include/linux/ethtool.h \
-		enum ETHTOOL_LINK_MODE_50000baseCR2_Full_BIT \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_ETHTOOL_LINK_MODE_100G \
-		/usr/include/linux/ethtool.h \
-		enum ETHTOOL_LINK_MODE_100000baseKR4_Full_BIT \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IBV_DEVICE_COUNTERS_SET_V42 \
-		infiniband/verbs.h \
-		type 'struct ibv_counter_set_init_attr' \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IBV_DEVICE_COUNTERS_SET_V45 \
-		infiniband/verbs.h \
-		type 'struct ibv_counters_init_attr' \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_RDMA_NL_NLDEV \
-		rdma/rdma_netlink.h \
-		enum RDMA_NL_NLDEV \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_RDMA_NLDEV_CMD_GET \
-		rdma/rdma_netlink.h \
-		enum RDMA_NLDEV_CMD_GET \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_RDMA_NLDEV_CMD_PORT_GET \
-		rdma/rdma_netlink.h \
-		enum RDMA_NLDEV_CMD_PORT_GET \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_RDMA_NLDEV_ATTR_DEV_INDEX \
-		rdma/rdma_netlink.h \
-		enum RDMA_NLDEV_ATTR_DEV_INDEX \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_RDMA_NLDEV_ATTR_DEV_NAME \
-		rdma/rdma_netlink.h \
-		enum RDMA_NLDEV_ATTR_DEV_NAME \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_RDMA_NLDEV_ATTR_PORT_INDEX \
-		rdma/rdma_netlink.h \
-		enum RDMA_NLDEV_ATTR_PORT_INDEX \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_RDMA_NLDEV_ATTR_NDEV_INDEX \
-		rdma/rdma_netlink.h \
-		enum RDMA_NLDEV_ATTR_NDEV_INDEX \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IFLA_NUM_VF \
-		linux/if_link.h \
-		enum IFLA_NUM_VF \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IFLA_EXT_MASK \
-		linux/if_link.h \
-		enum IFLA_EXT_MASK \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IFLA_PHYS_SWITCH_ID \
-		linux/if_link.h \
-		enum IFLA_PHYS_SWITCH_ID \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_IFLA_PHYS_PORT_NAME \
-		linux/if_link.h \
-		enum IFLA_PHYS_PORT_NAME \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_SUPPORTED_40000baseKR4_Full \
-		/usr/include/linux/ethtool.h \
-		define SUPPORTED_40000baseKR4_Full \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_SUPPORTED_40000baseCR4_Full \
-		/usr/include/linux/ethtool.h \
-		define SUPPORTED_40000baseCR4_Full \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_SUPPORTED_40000baseSR4_Full \
-		/usr/include/linux/ethtool.h \
-		define SUPPORTED_40000baseSR4_Full \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_SUPPORTED_40000baseLR4_Full \
-		/usr/include/linux/ethtool.h \
-		define SUPPORTED_40000baseLR4_Full \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_SUPPORTED_56000baseKR4_Full \
-		/usr/include/linux/ethtool.h \
-		define SUPPORTED_56000baseKR4_Full \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_SUPPORTED_56000baseCR4_Full \
-		/usr/include/linux/ethtool.h \
-		define SUPPORTED_56000baseCR4_Full \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_SUPPORTED_56000baseSR4_Full \
-		/usr/include/linux/ethtool.h \
-		define SUPPORTED_56000baseSR4_Full \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_SUPPORTED_56000baseLR4_Full \
-		/usr/include/linux/ethtool.h \
-		define SUPPORTED_56000baseLR4_Full \
-		$(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_STATIC_ASSERT \
-		/usr/include/assert.h \
-		define static_assert \
-		$(AUTOCONF_OUTPUT)
-
-# Create mlx5_autoconf.h or update it in case it differs from the new one.
-
-mlx5_autoconf.h: mlx5_autoconf.h.new
-	$Q [ -f '$@' ] && \
-		cmp '$<' '$@' $(AUTOCONF_OUTPUT) || \
-		mv '$<' '$@'
-
-$(SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD):.c=.o): mlx5_autoconf.h
-
-# Generate dependency plug-in for rdma-core when the PMD must not be linked
-# directly, so that applications do not inherit this dependency.
-
-ifeq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
-
-$(LIB): $(LIB_GLUE)
-
-ifeq ($(LINK_USING_CC),1)
-GLUE_LDFLAGS := $(call linkerprefix,$(LDFLAGS))
-else
-GLUE_LDFLAGS := $(LDFLAGS)
-endif
-$(LIB_GLUE): mlx5_glue.o
-	$Q $(LD) $(GLUE_LDFLAGS) $(EXTRA_LDFLAGS) \
-		-Wl,-h,$(LIB_GLUE) \
-		-shared -o $@ $< -libverbs -lmlx5
-
-mlx5_glue.o: mlx5_autoconf.h
-
-endif
-
-clean_mlx5: FORCE
-	$Q rm -f -- mlx5_autoconf.h mlx5_autoconf.h.new
-	$Q rm -f -- mlx5_glue.o $(LIB_GLUE_BASE)*
-
-clean: clean_mlx5
diff --git a/drivers/net/mlx5/meson.build b/drivers/net/mlx5/meson.build
index 3ad4f02..f6d0db9 100644
--- a/drivers/net/mlx5/meson.build
+++ b/drivers/net/mlx5/meson.build
@@ -7,224 +7,54 @@ if not is_linux
 	reason = 'only supported on Linux'
 	subdir_done()
 endif
-build = true
 
-pmd_dlopen = (get_option('ibverbs_link') == 'dlopen')
 LIB_GLUE_BASE = 'librte_pmd_mlx5_glue.so'
 LIB_GLUE_VERSION = '20.02.0'
 LIB_GLUE = LIB_GLUE_BASE + '.' + LIB_GLUE_VERSION
-if pmd_dlopen
-	dpdk_conf.set('RTE_IBVERBS_LINK_DLOPEN', 1)
-	cflags += [
-		'-DMLX5_GLUE="@0@"'.format(LIB_GLUE),
-		'-DMLX5_GLUE_VERSION="@0@"'.format(LIB_GLUE_VERSION),
-	]
-endif
 
-libnames = [ 'mlx5', 'ibverbs' ]
-libs = []
-foreach libname:libnames
-	lib = dependency('lib' + libname, required:false)
-	if not lib.found()
-		lib = cc.find_library(libname, required:false)
-	endif
-	if lib.found()
-		libs += [ lib ]
-	else
-		build = false
-		reason = 'missing dependency, "' + libname + '"'
+allow_experimental_apis = true
+deps += ['hash', 'common_mlx5']
+sources = files(
+	'mlx5.c',
+	'mlx5_ethdev.c',
+	'mlx5_flow.c',
+	'mlx5_flow_meter.c',
+	'mlx5_flow_dv.c',
+	'mlx5_flow_verbs.c',
+	'mlx5_mac.c',
+	'mlx5_mr.c',
+	'mlx5_nl.c',
+	'mlx5_rss.c',
+	'mlx5_rxmode.c',
+	'mlx5_rxq.c',
+	'mlx5_rxtx.c',
+	'mlx5_mp.c',
+	'mlx5_stats.c',
+	'mlx5_trigger.c',
+	'mlx5_txq.c',
+	'mlx5_vlan.c',
+	'mlx5_utils.c',
+	'mlx5_socket.c',
+)
+if (dpdk_conf.has('RTE_ARCH_X86_64')
+	or dpdk_conf.has('RTE_ARCH_ARM64')
+	or dpdk_conf.has('RTE_ARCH_PPC_64'))
+	sources += files('mlx5_rxtx_vec.c')
+endif
+cflags_options = [
+	'-std=c11',
+	'-Wno-strict-prototypes',
+	'-D_BSD_SOURCE',
+	'-D_DEFAULT_SOURCE',
+	'-D_XOPEN_SOURCE=600'
+]
+foreach option:cflags_options
+	if cc.has_argument(option)
+		cflags += option
 	endif
 endforeach
-
-if build
-	allow_experimental_apis = true
-	deps += ['hash']
-	ext_deps += libs
-	sources = files(
-		'mlx5.c',
-		'mlx5_ethdev.c',
-		'mlx5_flow.c',
-		'mlx5_flow_meter.c',
-		'mlx5_flow_dv.c',
-		'mlx5_flow_verbs.c',
-		'mlx5_mac.c',
-		'mlx5_mr.c',
-		'mlx5_nl.c',
-		'mlx5_rss.c',
-		'mlx5_rxmode.c',
-		'mlx5_rxq.c',
-		'mlx5_rxtx.c',
-		'mlx5_mp.c',
-		'mlx5_stats.c',
-		'mlx5_trigger.c',
-		'mlx5_txq.c',
-		'mlx5_vlan.c',
-		'mlx5_devx_cmds.c',
-		'mlx5_utils.c',
-		'mlx5_socket.c',
-	)
-	if (dpdk_conf.has('RTE_ARCH_X86_64')
-		or dpdk_conf.has('RTE_ARCH_ARM64')
-		or dpdk_conf.has('RTE_ARCH_PPC_64'))
-		sources += files('mlx5_rxtx_vec.c')
-	endif
-	if not pmd_dlopen
-		sources += files('mlx5_glue.c')
-	endif
-	cflags_options = [
-		'-std=c11',
-		'-Wno-strict-prototypes',
-		'-D_BSD_SOURCE',
-		'-D_DEFAULT_SOURCE',
-		'-D_XOPEN_SOURCE=600'
-	]
-	foreach option:cflags_options
-		if cc.has_argument(option)
-			cflags += option
-		endif
-	endforeach
-	if get_option('buildtype').contains('debug')
-		cflags += [ '-pedantic', '-UNDEBUG', '-DPEDANTIC' ]
-	else
-		cflags += [ '-DNDEBUG', '-UPEDANTIC' ]
-	endif
-	# To maintain the compatibility with the make build system
-	# mlx5_autoconf.h file is still generated.
-	# input array for meson member search:
-	# [ "MACRO to define if found", "header for the search",
-	#   "symbol to search", "struct member to search" ]
-	has_member_args = [
-		[ 'HAVE_IBV_MLX5_MOD_SWP', 'infiniband/mlx5dv.h',
-		'struct mlx5dv_sw_parsing_caps', 'sw_parsing_offloads' ],
-		[ 'HAVE_IBV_DEVICE_COUNTERS_SET_V42', 'infiniband/verbs.h',
-		'struct ibv_counter_set_init_attr', 'counter_set_id' ],
-		[ 'HAVE_IBV_DEVICE_COUNTERS_SET_V45', 'infiniband/verbs.h',
-		'struct ibv_counters_init_attr', 'comp_mask' ],
-	]
-	# input array for meson symbol search:
-	# [ "MACRO to define if found", "header for the search",
-	#   "symbol to search" ]
-	has_sym_args = [
-		[ 'HAVE_IBV_DEVICE_STRIDING_RQ_SUPPORT', 'infiniband/mlx5dv.h',
-		'MLX5DV_CQE_RES_FORMAT_CSUM_STRIDX' ],
-		[ 'HAVE_IBV_DEVICE_TUNNEL_SUPPORT', 'infiniband/mlx5dv.h',
-		'MLX5DV_CONTEXT_MASK_TUNNEL_OFFLOADS' ],
-		[ 'HAVE_IBV_MLX5_MOD_MPW', 'infiniband/mlx5dv.h',
-		'MLX5DV_CONTEXT_FLAGS_MPW_ALLOWED' ],
-		[ 'HAVE_IBV_MLX5_MOD_CQE_128B_COMP', 'infiniband/mlx5dv.h',
-		'MLX5DV_CONTEXT_FLAGS_CQE_128B_COMP' ],
-		[ 'HAVE_IBV_MLX5_MOD_CQE_128B_PAD', 'infiniband/mlx5dv.h',
-		'MLX5DV_CQ_INIT_ATTR_FLAGS_CQE_PAD' ],
-		[ 'HAVE_IBV_FLOW_DV_SUPPORT', 'infiniband/mlx5dv.h',
-		'mlx5dv_create_flow_action_packet_reformat' ],
-		[ 'HAVE_IBV_DEVICE_MPLS_SUPPORT', 'infiniband/verbs.h',
-		'IBV_FLOW_SPEC_MPLS' ],
-		[ 'HAVE_IBV_WQ_FLAGS_PCI_WRITE_END_PADDING', 'infiniband/verbs.h',
-		'IBV_WQ_FLAGS_PCI_WRITE_END_PADDING' ],
-		[ 'HAVE_IBV_WQ_FLAG_RX_END_PADDING', 'infiniband/verbs.h',
-		'IBV_WQ_FLAG_RX_END_PADDING' ],
-		[ 'HAVE_MLX5DV_DR_DEVX_PORT', 'infiniband/mlx5dv.h',
-		'mlx5dv_query_devx_port' ],
-		[ 'HAVE_IBV_DEVX_OBJ', 'infiniband/mlx5dv.h',
-		'mlx5dv_devx_obj_create' ],
-		[ 'HAVE_IBV_FLOW_DEVX_COUNTERS', 'infiniband/mlx5dv.h',
-		'MLX5DV_FLOW_ACTION_COUNTERS_DEVX' ],
-		[ 'HAVE_IBV_DEVX_ASYNC', 'infiniband/mlx5dv.h',
-		'mlx5dv_devx_obj_query_async' ],
-		[ 'HAVE_MLX5DV_DR_ACTION_DEST_DEVX_TIR', 'infiniband/mlx5dv.h',
-		'mlx5dv_dr_action_create_dest_devx_tir' ],
-		[ 'HAVE_MLX5_DR_CREATE_ACTION_FLOW_METER', 'infiniband/mlx5dv.h',
-		'mlx5dv_dr_action_create_flow_meter' ],
-		[ 'HAVE_MLX5DV_MMAP_GET_NC_PAGES_CMD', 'infiniband/mlx5dv.h',
-		'MLX5_MMAP_GET_NC_PAGES_CMD' ],
-		[ 'HAVE_MLX5DV_DR', 'infiniband/mlx5dv.h',
-		'MLX5DV_DR_DOMAIN_TYPE_NIC_RX' ],
-		[ 'HAVE_MLX5DV_DR_ESWITCH', 'infiniband/mlx5dv.h',
-		'MLX5DV_DR_DOMAIN_TYPE_FDB' ],
-		[ 'HAVE_MLX5DV_DR_VLAN', 'infiniband/mlx5dv.h',
-		'mlx5dv_dr_action_create_push_vlan' ],
-		[ 'HAVE_SUPPORTED_40000baseKR4_Full', 'linux/ethtool.h',
-		'SUPPORTED_40000baseKR4_Full' ],
-		[ 'HAVE_SUPPORTED_40000baseCR4_Full', 'linux/ethtool.h',
-		'SUPPORTED_40000baseCR4_Full' ],
-		[ 'HAVE_SUPPORTED_40000baseSR4_Full', 'linux/ethtool.h',
-		'SUPPORTED_40000baseSR4_Full' ],
-		[ 'HAVE_SUPPORTED_40000baseLR4_Full', 'linux/ethtool.h',
-		'SUPPORTED_40000baseLR4_Full' ],
-		[ 'HAVE_SUPPORTED_56000baseKR4_Full', 'linux/ethtool.h',
-		'SUPPORTED_56000baseKR4_Full' ],
-		[ 'HAVE_SUPPORTED_56000baseCR4_Full', 'linux/ethtool.h',
-		'SUPPORTED_56000baseCR4_Full' ],
-		[ 'HAVE_SUPPORTED_56000baseSR4_Full', 'linux/ethtool.h',
-		'SUPPORTED_56000baseSR4_Full' ],
-		[ 'HAVE_SUPPORTED_56000baseLR4_Full', 'linux/ethtool.h',
-		'SUPPORTED_56000baseLR4_Full' ],
-		[ 'HAVE_ETHTOOL_LINK_MODE_25G', 'linux/ethtool.h',
-		'ETHTOOL_LINK_MODE_25000baseCR_Full_BIT' ],
-		[ 'HAVE_ETHTOOL_LINK_MODE_50G', 'linux/ethtool.h',
-		'ETHTOOL_LINK_MODE_50000baseCR2_Full_BIT' ],
-		[ 'HAVE_ETHTOOL_LINK_MODE_100G', 'linux/ethtool.h',
-		'ETHTOOL_LINK_MODE_100000baseKR4_Full_BIT' ],
-		[ 'HAVE_IFLA_NUM_VF', 'linux/if_link.h',
-		'IFLA_NUM_VF' ],
-		[ 'HAVE_IFLA_EXT_MASK', 'linux/if_link.h',
-		'IFLA_EXT_MASK' ],
-		[ 'HAVE_IFLA_PHYS_SWITCH_ID', 'linux/if_link.h',
-		'IFLA_PHYS_SWITCH_ID' ],
-		[ 'HAVE_IFLA_PHYS_PORT_NAME', 'linux/if_link.h',
-		'IFLA_PHYS_PORT_NAME' ],
-		[ 'HAVE_RDMA_NL_NLDEV', 'rdma/rdma_netlink.h',
-		'RDMA_NL_NLDEV' ],
-		[ 'HAVE_RDMA_NLDEV_CMD_GET', 'rdma/rdma_netlink.h',
-		'RDMA_NLDEV_CMD_GET' ],
-		[ 'HAVE_RDMA_NLDEV_CMD_PORT_GET', 'rdma/rdma_netlink.h',
-		'RDMA_NLDEV_CMD_PORT_GET' ],
-		[ 'HAVE_RDMA_NLDEV_ATTR_DEV_INDEX', 'rdma/rdma_netlink.h',
-		'RDMA_NLDEV_ATTR_DEV_INDEX' ],
-		[ 'HAVE_RDMA_NLDEV_ATTR_DEV_NAME', 'rdma/rdma_netlink.h',
-		'RDMA_NLDEV_ATTR_DEV_NAME' ],
-		[ 'HAVE_RDMA_NLDEV_ATTR_PORT_INDEX', 'rdma/rdma_netlink.h',
-		'RDMA_NLDEV_ATTR_PORT_INDEX' ],
-		[ 'HAVE_RDMA_NLDEV_ATTR_NDEV_INDEX', 'rdma/rdma_netlink.h',
-		'RDMA_NLDEV_ATTR_NDEV_INDEX' ],
-		[ 'HAVE_MLX5_DR_FLOW_DUMP', 'infiniband/mlx5dv.h',
-		'mlx5dv_dump_dr_domain'],
-	]
-	config = configuration_data()
-	foreach arg:has_sym_args
-		config.set(arg[0], cc.has_header_symbol(arg[1], arg[2],
-			dependencies: libs))
-	endforeach
-	foreach arg:has_member_args
-		file_prefix = '#include <' + arg[1] + '>'
-		config.set(arg[0], cc.has_member(arg[2], arg[3],
-			prefix : file_prefix, dependencies: libs))
-	endforeach
-	configure_file(output : 'mlx5_autoconf.h', configuration : config)
-endif
-# Build Glue Library
-if pmd_dlopen and build
-	dlopen_name = 'mlx5_glue'
-	dlopen_lib_name = driver_name_fmt.format(dlopen_name)
-	dlopen_so_version = LIB_GLUE_VERSION
-	dlopen_sources = files('mlx5_glue.c')
-	dlopen_install_dir = [ eal_pmd_path + '-glue' ]
-	dlopen_includes = [global_inc]
-	dlopen_includes += include_directories(
-		'../../../lib/librte_eal/common/include/generic',
-	)
-	shared_lib = shared_library(
-		dlopen_lib_name,
-		dlopen_sources,
-		include_directories: dlopen_includes,
-		c_args: cflags,
-		dependencies: libs,
-		link_args: [
-		'-Wl,-export-dynamic',
-		'-Wl,-h,@0@'.format(LIB_GLUE),
-		],
-		soversion: dlopen_so_version,
-		install: true,
-		install_dir: dlopen_install_dir,
-	)
+if get_option('buildtype').contains('debug')
+	cflags += [ '-pedantic', '-UNDEBUG', '-DPEDANTIC' ]
+else
+	cflags += [ '-DNDEBUG', '-UPEDANTIC' ]
 endif
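
Note on the build flags above: the debug branch undefines NDEBUG and defines PEDANTIC, while the non-debug branch does the opposite. The sketch below only illustrates what those two macros toggle at compile time. It is not part of the patch, and the helper names (asserts_enabled, pedantic_build, ring_mask) are hypothetical, chosen for illustration.

```c
#include <assert.h>

/* Returns 1 when assert() is active (NDEBUG undefined, as in the debug
 * branch), 0 when assert() is compiled out (-DNDEBUG). */
static int asserts_enabled(void)
{
#ifdef NDEBUG
	return 0;
#else
	return 1;
#endif
}

/* Returns 1 when the translation unit was built with -DPEDANTIC. */
static int pedantic_build(void)
{
#ifdef PEDANTIC
	return 1;
#else
	return 0;
#endif
}

/* Hypothetical helper: the precondition check below is only evaluated
 * when NDEBUG is undefined; with -DNDEBUG it expands to nothing. */
static unsigned int ring_mask(unsigned int size)
{
	assert(size != 0 && (size & (size - 1)) == 0);
	return size - 1;
}
```

Built with "-UNDEBUG -DPEDANTIC", asserts_enabled() and pedantic_build() both return 1; built with "-DNDEBUG -UPEDANTIC", both return 0 and the check inside ring_mask() is dropped.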
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 7126edf..7cf357d 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -38,15 +38,16 @@
 #include <rte_string_fns.h>
 #include <rte_alarm.h>
 
+#include <mlx5_glue.h>
+#include <mlx5_devx_cmds.h>
+
+#include "mlx5_defs.h"
 #include "mlx5.h"
 #include "mlx5_utils.h"
 #include "mlx5_rxtx.h"
 #include "mlx5_autoconf.h"
-#include "mlx5_defs.h"
-#include "mlx5_glue.h"
 #include "mlx5_mr.h"
 #include "mlx5_flow.h"
-#include "mlx5_devx_cmds.h"
 
 /* Device parameter to enable RX completion queue compression. */
 #define MLX5_RXQ_CQE_COMP_EN "rxq_cqe_comp_en"
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 4d0485d..872fccb 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -32,13 +32,14 @@
 #include <rte_errno.h>
 #include <rte_flow.h>
 
+#include <mlx5_glue.h>
+#include <mlx5_devx_cmds.h>
+#include <mlx5_prm.h>
+
+#include "mlx5_defs.h"
 #include "mlx5_utils.h"
 #include "mlx5_mr.h"
 #include "mlx5_autoconf.h"
-#include "mlx5_defs.h"
-#include "mlx5_glue.h"
-#include "mlx5_prm.h"
-#include "mlx5_devx_cmds.h"
 
 enum {
 	PCI_VENDOR_ID_MELLANOX = 0x15b3,
diff --git a/drivers/net/mlx5/mlx5_devx_cmds.c b/drivers/net/mlx5/mlx5_devx_cmds.c
deleted file mode 100644
index 62ca590..0000000
--- a/drivers/net/mlx5/mlx5_devx_cmds.c
+++ /dev/null
@@ -1,976 +0,0 @@
-// SPDX-License-Identifier: BSD-3-Clause
-/* Copyright 2018 Mellanox Technologies, Ltd */
-
-#include <unistd.h>
-
-#include <rte_flow_driver.h>
-#include <rte_malloc.h>
-
-#include "mlx5_prm.h"
-#include "mlx5_devx_cmds.h"
-#include "mlx5_utils.h"
-
-
-/**
- * Allocate flow counters via devx interface.
- *
- * @param[in] ctx
- *   ibv contexts returned from mlx5dv_open_device.
- * @param dcs
- *   Pointer to counters properties structure to be filled by the routine.
- * @param bulk_n_128
- *   Bulk counter numbers in 128 counters units.
- *
- * @return
- *   Pointer to counter object on success, a negative value otherwise and
- *   rte_errno is set.
- */
-struct mlx5_devx_obj *
-mlx5_devx_cmd_flow_counter_alloc(struct ibv_context *ctx, uint32_t bulk_n_128)
-{
-	struct mlx5_devx_obj *dcs = rte_zmalloc("dcs", sizeof(*dcs), 0);
-	uint32_t in[MLX5_ST_SZ_DW(alloc_flow_counter_in)]   = {0};
-	uint32_t out[MLX5_ST_SZ_DW(alloc_flow_counter_out)] = {0};
-
-	if (!dcs) {
-		rte_errno = ENOMEM;
-		return NULL;
-	}
-	MLX5_SET(alloc_flow_counter_in, in, opcode,
-		 MLX5_CMD_OP_ALLOC_FLOW_COUNTER);
-	MLX5_SET(alloc_flow_counter_in, in, flow_counter_bulk, bulk_n_128);
-	dcs->obj = mlx5_glue->devx_obj_create(ctx, in,
-					      sizeof(in), out, sizeof(out));
-	if (!dcs->obj) {
-		DRV_LOG(ERR, "Can't allocate counters - error %d", errno);
-		rte_errno = errno;
-		rte_free(dcs);
-		return NULL;
-	}
-	dcs->id = MLX5_GET(alloc_flow_counter_out, out, flow_counter_id);
-	return dcs;
-}
-
-/**
- * Query flow counters values.
- *
- * @param[in] dcs
- *   devx object that was obtained from mlx5_devx_cmd_fc_alloc.
- * @param[in] clear
- *   Whether hardware should clear the counters after the query or not.
- * @param[in] n_counters
- *   0 in case of 1 counter to read, otherwise the counter number to read.
- *  @param pkts
- *   The number of packets that matched the flow.
- *  @param bytes
- *    The number of bytes that matched the flow.
- *  @param mkey
- *   The mkey key for batch query.
- *  @param addr
- *    The address in the mkey range for batch query.
- *  @param cmd_comp
- *   The completion object for asynchronous batch query.
- *  @param async_id
- *    The ID to be returned in the asynchronous batch query response.
- *
- * @return
- *   0 on success, a negative value otherwise.
- */
-int
-mlx5_devx_cmd_flow_counter_query(struct mlx5_devx_obj *dcs,
-				 int clear, uint32_t n_counters,
-				 uint64_t *pkts, uint64_t *bytes,
-				 uint32_t mkey, void *addr,
-				 struct mlx5dv_devx_cmd_comp *cmd_comp,
-				 uint64_t async_id)
-{
-	int out_len = MLX5_ST_SZ_BYTES(query_flow_counter_out) +
-			MLX5_ST_SZ_BYTES(traffic_counter);
-	uint32_t out[out_len];
-	uint32_t in[MLX5_ST_SZ_DW(query_flow_counter_in)] = {0};
-	void *stats;
-	int rc;
-
-	MLX5_SET(query_flow_counter_in, in, opcode,
-		 MLX5_CMD_OP_QUERY_FLOW_COUNTER);
-	MLX5_SET(query_flow_counter_in, in, op_mod, 0);
-	MLX5_SET(query_flow_counter_in, in, flow_counter_id, dcs->id);
-	MLX5_SET(query_flow_counter_in, in, clear, !!clear);
-
-	if (n_counters) {
-		MLX5_SET(query_flow_counter_in, in, num_of_counters,
-			 n_counters);
-		MLX5_SET(query_flow_counter_in, in, dump_to_memory, 1);
-		MLX5_SET(query_flow_counter_in, in, mkey, mkey);
-		MLX5_SET64(query_flow_counter_in, in, address,
-			   (uint64_t)(uintptr_t)addr);
-	}
-	if (!cmd_comp)
-		rc = mlx5_glue->devx_obj_query(dcs->obj, in, sizeof(in), out,
-					       out_len);
-	else
-		rc = mlx5_glue->devx_obj_query_async(dcs->obj, in, sizeof(in),
-						     out_len, async_id,
-						     cmd_comp);
-	if (rc) {
-		DRV_LOG(ERR, "Failed to query devx counters with rc %d", rc);
-		rte_errno = rc;
-		return -rc;
-	}
-	if (!n_counters) {
-		stats = MLX5_ADDR_OF(query_flow_counter_out,
-				     out, flow_statistics);
-		*pkts = MLX5_GET64(traffic_counter, stats, packets);
-		*bytes = MLX5_GET64(traffic_counter, stats, octets);
-	}
-	return 0;
-}
-
-/**
- * Create a new mkey.
- *
- * @param[in] ctx
- *   ibv contexts returned from mlx5dv_open_device.
- * @param[in] attr
- *   Attributes of the requested mkey.
- *
- * @return
- *   Pointer to Devx mkey on success, a negative value otherwise and rte_errno
- *   is set.
- */
-struct mlx5_devx_obj *
-mlx5_devx_cmd_mkey_create(struct ibv_context *ctx,
-			  struct mlx5_devx_mkey_attr *attr)
-{
-	uint32_t in[MLX5_ST_SZ_DW(create_mkey_in)] = {0};
-	uint32_t out[MLX5_ST_SZ_DW(create_mkey_out)] = {0};
-	void *mkc;
-	struct mlx5_devx_obj *mkey = rte_zmalloc("mkey", sizeof(*mkey), 0);
-	size_t pgsize;
-	uint32_t translation_size;
-
-	if (!mkey) {
-		rte_errno = ENOMEM;
-		return NULL;
-	}
-	pgsize = sysconf(_SC_PAGESIZE);
-	translation_size = (RTE_ALIGN(attr->size, pgsize) * 8) / 16;
-	MLX5_SET(create_mkey_in, in, opcode, MLX5_CMD_OP_CREATE_MKEY);
-	MLX5_SET(create_mkey_in, in, translations_octword_actual_size,
-		 translation_size);
-	MLX5_SET(create_mkey_in, in, mkey_umem_id, attr->umem_id);
-	mkc = MLX5_ADDR_OF(create_mkey_in, in, memory_key_mkey_entry);
-	MLX5_SET(mkc, mkc, lw, 0x1);
-	MLX5_SET(mkc, mkc, lr, 0x1);
-	MLX5_SET(mkc, mkc, access_mode_1_0, MLX5_MKC_ACCESS_MODE_MTT);
-	MLX5_SET(mkc, mkc, qpn, 0xffffff);
-	MLX5_SET(mkc, mkc, pd, attr->pd);
-	MLX5_SET(mkc, mkc, mkey_7_0, attr->umem_id & 0xFF);
-	MLX5_SET(mkc, mkc, translations_octword_size, translation_size);
-	MLX5_SET64(mkc, mkc, start_addr, attr->addr);
-	MLX5_SET64(mkc, mkc, len, attr->size);
-	MLX5_SET(mkc, mkc, log_page_size, rte_log2_u32(pgsize));
-	mkey->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in), out,
-					       sizeof(out));
-	if (!mkey->obj) {
-		DRV_LOG(ERR, "Can't create mkey - error %d", errno);
-		rte_errno = errno;
-		rte_free(mkey);
-		return NULL;
-	}
-	mkey->id = MLX5_GET(create_mkey_out, out, mkey_index);
-	mkey->id = (mkey->id << 8) | (attr->umem_id & 0xFF);
-	return mkey;
-}
-
-/**
- * Get status of devx command response.
- * Mainly used for asynchronous commands.
- *
- * @param[in] out
- *   The out response buffer.
- *
- * @return
- *   0 on success, non-zero value otherwise.
- */
-int
-mlx5_devx_get_out_command_status(void *out)
-{
-	int status;
-
-	if (!out)
-		return -EINVAL;
-	status = MLX5_GET(query_flow_counter_out, out, status);
-	if (status) {
-		int syndrome = MLX5_GET(query_flow_counter_out, out, syndrome);
-
-		DRV_LOG(ERR, "Bad devX status %x, syndrome = %x", status,
-			syndrome);
-	}
-	return status;
-}
-
-/**
- * Destroy any object allocated by a Devx API.
- *
- * @param[in] obj
- *   Pointer to a general object.
- *
- * @return
- *   0 on success, a negative value otherwise.
- */
-int
-mlx5_devx_cmd_destroy(struct mlx5_devx_obj *obj)
-{
-	int ret;
-
-	if (!obj)
-		return 0;
-	ret =  mlx5_glue->devx_obj_destroy(obj->obj);
-	rte_free(obj);
-	return ret;
-}
-
-/**
- * Query NIC vport context.
- * Fills minimal inline attribute.
- *
- * @param[in] ctx
- *   ibv contexts returned from mlx5dv_open_device.
- * @param[in] vport
- *   vport index
- * @param[out] attr
- *   Attributes device values.
- *
- * @return
- *   0 on success, a negative value otherwise.
- */
-static int
-mlx5_devx_cmd_query_nic_vport_context(struct ibv_context *ctx,
-				      unsigned int vport,
-				      struct mlx5_hca_attr *attr)
-{
-	uint32_t in[MLX5_ST_SZ_DW(query_nic_vport_context_in)] = {0};
-	uint32_t out[MLX5_ST_SZ_DW(query_nic_vport_context_out)] = {0};
-	void *vctx;
-	int status, syndrome, rc;
-
-	/* Query NIC vport context to determine inline mode. */
-	MLX5_SET(query_nic_vport_context_in, in, opcode,
-		 MLX5_CMD_OP_QUERY_NIC_VPORT_CONTEXT);
-	MLX5_SET(query_nic_vport_context_in, in, vport_number, vport);
-	if (vport)
-		MLX5_SET(query_nic_vport_context_in, in, other_vport, 1);
-	rc = mlx5_glue->devx_general_cmd(ctx,
-					 in, sizeof(in),
-					 out, sizeof(out));
-	if (rc)
-		goto error;
-	status = MLX5_GET(query_nic_vport_context_out, out, status);
-	syndrome = MLX5_GET(query_nic_vport_context_out, out, syndrome);
-	if (status) {
-		DRV_LOG(DEBUG, "Failed to query NIC vport context, "
-			"status %x, syndrome = %x",
-			status, syndrome);
-		return -1;
-	}
-	vctx = MLX5_ADDR_OF(query_nic_vport_context_out, out,
-			    nic_vport_context);
-	attr->vport_inline_mode = MLX5_GET(nic_vport_context, vctx,
-					   min_wqe_inline_mode);
-	return 0;
-error:
-	rc = (rc > 0) ? -rc : rc;
-	return rc;
-}
-
-/**
- * Query HCA attributes.
- * Using those attributes we can check on run time if the device
- * is having the required capabilities.
- *
- * @param[in] ctx
- *   ibv contexts returned from mlx5dv_open_device.
- * @param[out] attr
- *   Attributes device values.
- *
- * @return
- *   0 on success, a negative value otherwise.
- */
-int
-mlx5_devx_cmd_query_hca_attr(struct ibv_context *ctx,
-			     struct mlx5_hca_attr *attr)
-{
-	uint32_t in[MLX5_ST_SZ_DW(query_hca_cap_in)] = {0};
-	uint32_t out[MLX5_ST_SZ_DW(query_hca_cap_out)] = {0};
-	void *hcattr;
-	int status, syndrome, rc;
-
-	MLX5_SET(query_hca_cap_in, in, opcode, MLX5_CMD_OP_QUERY_HCA_CAP);
-	MLX5_SET(query_hca_cap_in, in, op_mod,
-		 MLX5_GET_HCA_CAP_OP_MOD_GENERAL_DEVICE |
-		 MLX5_HCA_CAP_OPMOD_GET_CUR);
-
-	rc = mlx5_glue->devx_general_cmd(ctx,
-					 in, sizeof(in), out, sizeof(out));
-	if (rc)
-		goto error;
-	status = MLX5_GET(query_hca_cap_out, out, status);
-	syndrome = MLX5_GET(query_hca_cap_out, out, syndrome);
-	if (status) {
-		DRV_LOG(DEBUG, "Failed to query devx HCA capabilities, "
-			"status %x, syndrome = %x",
-			status, syndrome);
-		return -1;
-	}
-	hcattr = MLX5_ADDR_OF(query_hca_cap_out, out, capability);
-	attr->flow_counter_bulk_alloc_bitmap =
-			MLX5_GET(cmd_hca_cap, hcattr, flow_counter_bulk_alloc);
-	attr->flow_counters_dump = MLX5_GET(cmd_hca_cap, hcattr,
-					    flow_counters_dump);
-	attr->eswitch_manager = MLX5_GET(cmd_hca_cap, hcattr, eswitch_manager);
-	attr->hairpin = MLX5_GET(cmd_hca_cap, hcattr, hairpin);
-	attr->log_max_hairpin_queues = MLX5_GET(cmd_hca_cap, hcattr,
-						log_max_hairpin_queues);
-	attr->log_max_hairpin_wq_data_sz = MLX5_GET(cmd_hca_cap, hcattr,
-						    log_max_hairpin_wq_data_sz);
-	attr->log_max_hairpin_num_packets = MLX5_GET
-		(cmd_hca_cap, hcattr, log_min_hairpin_wq_data_sz);
-	attr->vhca_id = MLX5_GET(cmd_hca_cap, hcattr, vhca_id);
-	attr->eth_net_offloads = MLX5_GET(cmd_hca_cap, hcattr,
-					  eth_net_offloads);
-	attr->eth_virt = MLX5_GET(cmd_hca_cap, hcattr, eth_virt);
-	attr->flex_parser_protocols = MLX5_GET(cmd_hca_cap, hcattr,
-					       flex_parser_protocols);
-	attr->qos.sup = MLX5_GET(cmd_hca_cap, hcattr, qos);
-	if (attr->qos.sup) {
-		MLX5_SET(query_hca_cap_in, in, op_mod,
-			 MLX5_GET_HCA_CAP_OP_MOD_QOS_CAP |
-			 MLX5_HCA_CAP_OPMOD_GET_CUR);
-		rc = mlx5_glue->devx_general_cmd(ctx, in, sizeof(in),
-						 out, sizeof(out));
-		if (rc)
-			goto error;
-		if (status) {
-			DRV_LOG(DEBUG, "Failed to query devx QOS capabilities,"
-				" status %x, syndrome = %x",
-				status, syndrome);
-			return -1;
-		}
-		hcattr = MLX5_ADDR_OF(query_hca_cap_out, out, capability);
-		attr->qos.srtcm_sup =
-				MLX5_GET(qos_cap, hcattr, flow_meter_srtcm);
-		attr->qos.log_max_flow_meter =
-				MLX5_GET(qos_cap, hcattr, log_max_flow_meter);
-		attr->qos.flow_meter_reg_c_ids =
-			MLX5_GET(qos_cap, hcattr, flow_meter_reg_id);
-		attr->qos.flow_meter_reg_share =
-			MLX5_GET(qos_cap, hcattr, flow_meter_reg_share);
-	}
-	if (!attr->eth_net_offloads)
-		return 0;
-
-	/* Query HCA offloads for Ethernet protocol. */
-	memset(in, 0, sizeof(in));
-	memset(out, 0, sizeof(out));
-	MLX5_SET(query_hca_cap_in, in, opcode, MLX5_CMD_OP_QUERY_HCA_CAP);
-	MLX5_SET(query_hca_cap_in, in, op_mod,
-		 MLX5_GET_HCA_CAP_OP_MOD_ETHERNET_OFFLOAD_CAPS |
-		 MLX5_HCA_CAP_OPMOD_GET_CUR);
-
-	rc = mlx5_glue->devx_general_cmd(ctx,
-					 in, sizeof(in),
-					 out, sizeof(out));
-	if (rc) {
-		attr->eth_net_offloads = 0;
-		goto error;
-	}
-	status = MLX5_GET(query_hca_cap_out, out, status);
-	syndrome = MLX5_GET(query_hca_cap_out, out, syndrome);
-	if (status) {
-		DRV_LOG(DEBUG, "Failed to query devx HCA capabilities, "
-			"status %x, syndrome = %x",
-			status, syndrome);
-		attr->eth_net_offloads = 0;
-		return -1;
-	}
-	hcattr = MLX5_ADDR_OF(query_hca_cap_out, out, capability);
-	attr->wqe_vlan_insert = MLX5_GET(per_protocol_networking_offload_caps,
-					 hcattr, wqe_vlan_insert);
-	attr->lro_cap = MLX5_GET(per_protocol_networking_offload_caps, hcattr,
-				 lro_cap);
-	attr->tunnel_lro_gre = MLX5_GET(per_protocol_networking_offload_caps,
-					hcattr, tunnel_lro_gre);
-	attr->tunnel_lro_vxlan = MLX5_GET(per_protocol_networking_offload_caps,
-					  hcattr, tunnel_lro_vxlan);
-	attr->lro_max_msg_sz_mode = MLX5_GET
-					(per_protocol_networking_offload_caps,
-					 hcattr, lro_max_msg_sz_mode);
-	for (int i = 0 ; i < MLX5_LRO_NUM_SUPP_PERIODS ; i++) {
-		attr->lro_timer_supported_periods[i] =
-			MLX5_GET(per_protocol_networking_offload_caps, hcattr,
-				 lro_timer_supported_periods[i]);
-	}
-	attr->tunnel_stateless_geneve_rx =
-			    MLX5_GET(per_protocol_networking_offload_caps,
-				     hcattr, tunnel_stateless_geneve_rx);
-	attr->geneve_max_opt_len =
-		    MLX5_GET(per_protocol_networking_offload_caps,
-			     hcattr, max_geneve_opt_len);
-	attr->wqe_inline_mode = MLX5_GET(per_protocol_networking_offload_caps,
-					 hcattr, wqe_inline_mode);
-	attr->tunnel_stateless_gtp = MLX5_GET
-					(per_protocol_networking_offload_caps,
-					 hcattr, tunnel_stateless_gtp);
-	if (attr->wqe_inline_mode != MLX5_CAP_INLINE_MODE_VPORT_CONTEXT)
-		return 0;
-	if (attr->eth_virt) {
-		rc = mlx5_devx_cmd_query_nic_vport_context(ctx, 0, attr);
-		if (rc) {
-			attr->eth_virt = 0;
-			goto error;
-		}
-	}
-	return 0;
-error:
-	rc = (rc > 0) ? -rc : rc;
-	return rc;
-}
-
-/**
- * Query TIS transport domain from QP verbs object using DevX API.
- *
- * @param[in] qp
- *   Pointer to verbs QP returned by ibv_create_qp .
- * @param[in] tis_num
- *   TIS number of TIS to query.
- * @param[out] tis_td
- *   Pointer to TIS transport domain variable, to be set by the routine.
- *
- * @return
- *   0 on success, a negative value otherwise.
- */
-int
-mlx5_devx_cmd_qp_query_tis_td(struct ibv_qp *qp, uint32_t tis_num,
-			      uint32_t *tis_td)
-{
-	uint32_t in[MLX5_ST_SZ_DW(query_tis_in)] = {0};
-	uint32_t out[MLX5_ST_SZ_DW(query_tis_out)] = {0};
-	int rc;
-	void *tis_ctx;
-
-	MLX5_SET(query_tis_in, in, opcode, MLX5_CMD_OP_QUERY_TIS);
-	MLX5_SET(query_tis_in, in, tisn, tis_num);
-	rc = mlx5_glue->devx_qp_query(qp, in, sizeof(in), out, sizeof(out));
-	if (rc) {
-		DRV_LOG(ERR, "Failed to query QP using DevX");
-		return -rc;
-	};
-	tis_ctx = MLX5_ADDR_OF(query_tis_out, out, tis_context);
-	*tis_td = MLX5_GET(tisc, tis_ctx, transport_domain);
-	return 0;
-}
-
-/**
- * Fill WQ data for DevX API command.
- * Utility function for use when creating DevX objects containing a WQ.
- *
- * @param[in] wq_ctx
- *   Pointer to WQ context to fill with data.
- * @param [in] wq_attr
- *   Pointer to WQ attributes structure to fill in WQ context.
- */
-static void
-devx_cmd_fill_wq_data(void *wq_ctx, struct mlx5_devx_wq_attr *wq_attr)
-{
-	MLX5_SET(wq, wq_ctx, wq_type, wq_attr->wq_type);
-	MLX5_SET(wq, wq_ctx, wq_signature, wq_attr->wq_signature);
-	MLX5_SET(wq, wq_ctx, end_padding_mode, wq_attr->end_padding_mode);
-	MLX5_SET(wq, wq_ctx, cd_slave, wq_attr->cd_slave);
-	MLX5_SET(wq, wq_ctx, hds_skip_first_sge, wq_attr->hds_skip_first_sge);
-	MLX5_SET(wq, wq_ctx, log2_hds_buf_size, wq_attr->log2_hds_buf_size);
-	MLX5_SET(wq, wq_ctx, page_offset, wq_attr->page_offset);
-	MLX5_SET(wq, wq_ctx, lwm, wq_attr->lwm);
-	MLX5_SET(wq, wq_ctx, pd, wq_attr->pd);
-	MLX5_SET(wq, wq_ctx, uar_page, wq_attr->uar_page);
-	MLX5_SET64(wq, wq_ctx, dbr_addr, wq_attr->dbr_addr);
-	MLX5_SET(wq, wq_ctx, hw_counter, wq_attr->hw_counter);
-	MLX5_SET(wq, wq_ctx, sw_counter, wq_attr->sw_counter);
-	MLX5_SET(wq, wq_ctx, log_wq_stride, wq_attr->log_wq_stride);
-	MLX5_SET(wq, wq_ctx, log_wq_pg_sz, wq_attr->log_wq_pg_sz);
-	MLX5_SET(wq, wq_ctx, log_wq_sz, wq_attr->log_wq_sz);
-	MLX5_SET(wq, wq_ctx, dbr_umem_valid, wq_attr->dbr_umem_valid);
-	MLX5_SET(wq, wq_ctx, wq_umem_valid, wq_attr->wq_umem_valid);
-	MLX5_SET(wq, wq_ctx, log_hairpin_num_packets,
-		 wq_attr->log_hairpin_num_packets);
-	MLX5_SET(wq, wq_ctx, log_hairpin_data_sz, wq_attr->log_hairpin_data_sz);
-	MLX5_SET(wq, wq_ctx, single_wqe_log_num_of_strides,
-		 wq_attr->single_wqe_log_num_of_strides);
-	MLX5_SET(wq, wq_ctx, two_byte_shift_en, wq_attr->two_byte_shift_en);
-	MLX5_SET(wq, wq_ctx, single_stride_log_num_of_bytes,
-		 wq_attr->single_stride_log_num_of_bytes);
-	MLX5_SET(wq, wq_ctx, dbr_umem_id, wq_attr->dbr_umem_id);
-	MLX5_SET(wq, wq_ctx, wq_umem_id, wq_attr->wq_umem_id);
-	MLX5_SET64(wq, wq_ctx, wq_umem_offset, wq_attr->wq_umem_offset);
-}
-
-/**
- * Create RQ using DevX API.
- *
- * @param[in] ctx
- *   ibv_context returned from mlx5dv_open_device.
- * @param [in] rq_attr
- *   Pointer to create RQ attributes structure.
- * @param [in] socket
- *   CPU socket ID for allocations.
- *
- * @return
- *   The DevX object created, NULL otherwise and rte_errno is set.
- */
-struct mlx5_devx_obj *
-mlx5_devx_cmd_create_rq(struct ibv_context *ctx,
-			struct mlx5_devx_create_rq_attr *rq_attr,
-			int socket)
-{
-	uint32_t in[MLX5_ST_SZ_DW(create_rq_in)] = {0};
-	uint32_t out[MLX5_ST_SZ_DW(create_rq_out)] = {0};
-	void *rq_ctx, *wq_ctx;
-	struct mlx5_devx_wq_attr *wq_attr;
-	struct mlx5_devx_obj *rq = NULL;
-
-	rq = rte_calloc_socket(__func__, 1, sizeof(*rq), 0, socket);
-	if (!rq) {
-		DRV_LOG(ERR, "Failed to allocate RQ data");
-		rte_errno = ENOMEM;
-		return NULL;
-	}
-	MLX5_SET(create_rq_in, in, opcode, MLX5_CMD_OP_CREATE_RQ);
-	rq_ctx = MLX5_ADDR_OF(create_rq_in, in, ctx);
-	MLX5_SET(rqc, rq_ctx, rlky, rq_attr->rlky);
-	MLX5_SET(rqc, rq_ctx, delay_drop_en, rq_attr->delay_drop_en);
-	MLX5_SET(rqc, rq_ctx, scatter_fcs, rq_attr->scatter_fcs);
-	MLX5_SET(rqc, rq_ctx, vsd, rq_attr->vsd);
-	MLX5_SET(rqc, rq_ctx, mem_rq_type, rq_attr->mem_rq_type);
-	MLX5_SET(rqc, rq_ctx, state, rq_attr->state);
-	MLX5_SET(rqc, rq_ctx, flush_in_error_en, rq_attr->flush_in_error_en);
-	MLX5_SET(rqc, rq_ctx, hairpin, rq_attr->hairpin);
-	MLX5_SET(rqc, rq_ctx, user_index, rq_attr->user_index);
-	MLX5_SET(rqc, rq_ctx, cqn, rq_attr->cqn);
-	MLX5_SET(rqc, rq_ctx, counter_set_id, rq_attr->counter_set_id);
-	MLX5_SET(rqc, rq_ctx, rmpn, rq_attr->rmpn);
-	wq_ctx = MLX5_ADDR_OF(rqc, rq_ctx, wq);
-	wq_attr = &rq_attr->wq_attr;
-	devx_cmd_fill_wq_data(wq_ctx, wq_attr);
-	rq->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in),
-						  out, sizeof(out));
-	if (!rq->obj) {
-		DRV_LOG(ERR, "Failed to create RQ using DevX");
-		rte_errno = errno;
-		rte_free(rq);
-		return NULL;
-	}
-	rq->id = MLX5_GET(create_rq_out, out, rqn);
-	return rq;
-}
-
-/**
- * Modify RQ using DevX API.
- *
- * @param[in] rq
- *   Pointer to RQ object structure.
- * @param [in] rq_attr
- *   Pointer to modify RQ attributes structure.
- *
- * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
- */
-int
-mlx5_devx_cmd_modify_rq(struct mlx5_devx_obj *rq,
-			struct mlx5_devx_modify_rq_attr *rq_attr)
-{
-	uint32_t in[MLX5_ST_SZ_DW(modify_rq_in)] = {0};
-	uint32_t out[MLX5_ST_SZ_DW(modify_rq_out)] = {0};
-	void *rq_ctx, *wq_ctx;
-	int ret;
-
-	MLX5_SET(modify_rq_in, in, opcode, MLX5_CMD_OP_MODIFY_RQ);
-	MLX5_SET(modify_rq_in, in, rq_state, rq_attr->rq_state);
-	MLX5_SET(modify_rq_in, in, rqn, rq->id);
-	MLX5_SET64(modify_rq_in, in, modify_bitmask, rq_attr->modify_bitmask);
-	rq_ctx = MLX5_ADDR_OF(modify_rq_in, in, ctx);
-	MLX5_SET(rqc, rq_ctx, state, rq_attr->state);
-	if (rq_attr->modify_bitmask &
-			MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_SCATTER_FCS)
-		MLX5_SET(rqc, rq_ctx, scatter_fcs, rq_attr->scatter_fcs);
-	if (rq_attr->modify_bitmask & MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_VSD)
-		MLX5_SET(rqc, rq_ctx, vsd, rq_attr->vsd);
-	if (rq_attr->modify_bitmask &
-			MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_RQ_COUNTER_SET_ID)
-		MLX5_SET(rqc, rq_ctx, counter_set_id, rq_attr->counter_set_id);
-	MLX5_SET(rqc, rq_ctx, hairpin_peer_sq, rq_attr->hairpin_peer_sq);
-	MLX5_SET(rqc, rq_ctx, hairpin_peer_vhca, rq_attr->hairpin_peer_vhca);
-	if (rq_attr->modify_bitmask & MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_WQ_LWM) {
-		wq_ctx = MLX5_ADDR_OF(rqc, rq_ctx, wq);
-		MLX5_SET(wq, wq_ctx, lwm, rq_attr->lwm);
-	}
-	ret = mlx5_glue->devx_obj_modify(rq->obj, in, sizeof(in),
-					 out, sizeof(out));
-	if (ret) {
-		DRV_LOG(ERR, "Failed to modify RQ using DevX");
-		rte_errno = errno;
-		return -errno;
-	}
-	return ret;
-}
-
-/**
- * Create TIR using DevX API.
- *
- * @param[in] ctx
- *   ibv_context returned from mlx5dv_open_device.
- * @param [in] tir_attr
- *   Pointer to TIR attributes structure.
- *
- * @return
- *   The DevX object created, NULL otherwise and rte_errno is set.
- */
-struct mlx5_devx_obj *
-mlx5_devx_cmd_create_tir(struct ibv_context *ctx,
-			 struct mlx5_devx_tir_attr *tir_attr)
-{
-	uint32_t in[MLX5_ST_SZ_DW(create_tir_in)] = {0};
-	uint32_t out[MLX5_ST_SZ_DW(create_tir_out)] = {0};
-	void *tir_ctx, *outer, *inner;
-	struct mlx5_devx_obj *tir = NULL;
-	int i;
-
-	tir = rte_calloc(__func__, 1, sizeof(*tir), 0);
-	if (!tir) {
-		DRV_LOG(ERR, "Failed to allocate TIR data");
-		rte_errno = ENOMEM;
-		return NULL;
-	}
-	MLX5_SET(create_tir_in, in, opcode, MLX5_CMD_OP_CREATE_TIR);
-	tir_ctx = MLX5_ADDR_OF(create_tir_in, in, ctx);
-	MLX5_SET(tirc, tir_ctx, disp_type, tir_attr->disp_type);
-	MLX5_SET(tirc, tir_ctx, lro_timeout_period_usecs,
-		 tir_attr->lro_timeout_period_usecs);
-	MLX5_SET(tirc, tir_ctx, lro_enable_mask, tir_attr->lro_enable_mask);
-	MLX5_SET(tirc, tir_ctx, lro_max_msg_sz, tir_attr->lro_max_msg_sz);
-	MLX5_SET(tirc, tir_ctx, inline_rqn, tir_attr->inline_rqn);
-	MLX5_SET(tirc, tir_ctx, rx_hash_symmetric, tir_attr->rx_hash_symmetric);
-	MLX5_SET(tirc, tir_ctx, tunneled_offload_en,
-		 tir_attr->tunneled_offload_en);
-	MLX5_SET(tirc, tir_ctx, indirect_table, tir_attr->indirect_table);
-	MLX5_SET(tirc, tir_ctx, rx_hash_fn, tir_attr->rx_hash_fn);
-	MLX5_SET(tirc, tir_ctx, self_lb_block, tir_attr->self_lb_block);
-	MLX5_SET(tirc, tir_ctx, transport_domain, tir_attr->transport_domain);
-	for (i = 0; i < 10; i++) {
-		MLX5_SET(tirc, tir_ctx, rx_hash_toeplitz_key[i],
-			 tir_attr->rx_hash_toeplitz_key[i]);
-	}
-	outer = MLX5_ADDR_OF(tirc, tir_ctx, rx_hash_field_selector_outer);
-	MLX5_SET(rx_hash_field_select, outer, l3_prot_type,
-		 tir_attr->rx_hash_field_selector_outer.l3_prot_type);
-	MLX5_SET(rx_hash_field_select, outer, l4_prot_type,
-		 tir_attr->rx_hash_field_selector_outer.l4_prot_type);
-	MLX5_SET(rx_hash_field_select, outer, selected_fields,
-		 tir_attr->rx_hash_field_selector_outer.selected_fields);
-	inner = MLX5_ADDR_OF(tirc, tir_ctx, rx_hash_field_selector_inner);
-	MLX5_SET(rx_hash_field_select, inner, l3_prot_type,
-		 tir_attr->rx_hash_field_selector_inner.l3_prot_type);
-	MLX5_SET(rx_hash_field_select, inner, l4_prot_type,
-		 tir_attr->rx_hash_field_selector_inner.l4_prot_type);
-	MLX5_SET(rx_hash_field_select, inner, selected_fields,
-		 tir_attr->rx_hash_field_selector_inner.selected_fields);
-	tir->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in),
-						   out, sizeof(out));
-	if (!tir->obj) {
-		DRV_LOG(ERR, "Failed to create TIR using DevX");
-		rte_errno = errno;
-		rte_free(tir);
-		return NULL;
-	}
-	tir->id = MLX5_GET(create_tir_out, out, tirn);
-	return tir;
-}
-
-/**
- * Create RQT using DevX API.
- *
- * @param[in] ctx
- *   ibv_context returned from mlx5dv_open_device.
- * @param [in] rqt_attr
- *   Pointer to RQT attributes structure.
- *
- * @return
- *   The DevX object created, NULL otherwise and rte_errno is set.
- */
-struct mlx5_devx_obj *
-mlx5_devx_cmd_create_rqt(struct ibv_context *ctx,
-			 struct mlx5_devx_rqt_attr *rqt_attr)
-{
-	uint32_t *in = NULL;
-	uint32_t inlen = MLX5_ST_SZ_BYTES(create_rqt_in) +
-			 rqt_attr->rqt_actual_size * sizeof(uint32_t);
-	uint32_t out[MLX5_ST_SZ_DW(create_rqt_out)] = {0};
-	void *rqt_ctx;
-	struct mlx5_devx_obj *rqt = NULL;
-	int i;
-
-	in = rte_calloc(__func__, 1, inlen, 0);
-	if (!in) {
-		DRV_LOG(ERR, "Failed to allocate RQT IN data");
-		rte_errno = ENOMEM;
-		return NULL;
-	}
-	rqt = rte_calloc(__func__, 1, sizeof(*rqt), 0);
-	if (!rqt) {
-		DRV_LOG(ERR, "Failed to allocate RQT data");
-		rte_errno = ENOMEM;
-		rte_free(in);
-		return NULL;
-	}
-	MLX5_SET(create_rqt_in, in, opcode, MLX5_CMD_OP_CREATE_RQT);
-	rqt_ctx = MLX5_ADDR_OF(create_rqt_in, in, rqt_context);
-	MLX5_SET(rqtc, rqt_ctx, rqt_max_size, rqt_attr->rqt_max_size);
-	MLX5_SET(rqtc, rqt_ctx, rqt_actual_size, rqt_attr->rqt_actual_size);
-	for (i = 0; i < rqt_attr->rqt_actual_size; i++)
-		MLX5_SET(rqtc, rqt_ctx, rq_num[i], rqt_attr->rq_list[i]);
-	rqt->obj = mlx5_glue->devx_obj_create(ctx, in, inlen, out, sizeof(out));
-	rte_free(in);
-	if (!rqt->obj) {
-		DRV_LOG(ERR, "Failed to create RQT using DevX");
-		rte_errno = errno;
-		rte_free(rqt);
-		return NULL;
-	}
-	rqt->id = MLX5_GET(create_rqt_out, out, rqtn);
-	return rqt;
-}
-
-/**
- * Create SQ using DevX API.
- *
- * @param[in] ctx
- *   ibv_context returned from mlx5dv_open_device.
- * @param [in] sq_attr
- *   Pointer to SQ attributes structure.
- * @param [in] socket
- *   CPU socket ID for allocations.
- *
- * @return
- *   The DevX object created, NULL otherwise and rte_errno is set.
- **/
-struct mlx5_devx_obj *
-mlx5_devx_cmd_create_sq(struct ibv_context *ctx,
-			struct mlx5_devx_create_sq_attr *sq_attr)
-{
-	uint32_t in[MLX5_ST_SZ_DW(create_sq_in)] = {0};
-	uint32_t out[MLX5_ST_SZ_DW(create_sq_out)] = {0};
-	void *sq_ctx;
-	void *wq_ctx;
-	struct mlx5_devx_wq_attr *wq_attr;
-	struct mlx5_devx_obj *sq = NULL;
-
-	sq = rte_calloc(__func__, 1, sizeof(*sq), 0);
-	if (!sq) {
-		DRV_LOG(ERR, "Failed to allocate SQ data");
-		rte_errno = ENOMEM;
-		return NULL;
-	}
-	MLX5_SET(create_sq_in, in, opcode, MLX5_CMD_OP_CREATE_SQ);
-	sq_ctx = MLX5_ADDR_OF(create_sq_in, in, ctx);
-	MLX5_SET(sqc, sq_ctx, rlky, sq_attr->rlky);
-	MLX5_SET(sqc, sq_ctx, cd_master, sq_attr->cd_master);
-	MLX5_SET(sqc, sq_ctx, fre, sq_attr->fre);
-	MLX5_SET(sqc, sq_ctx, flush_in_error_en, sq_attr->flush_in_error_en);
-	MLX5_SET(sqc, sq_ctx, allow_multi_pkt_send_wqe,
-		 sq_attr->flush_in_error_en);
-	MLX5_SET(sqc, sq_ctx, min_wqe_inline_mode,
-		 sq_attr->min_wqe_inline_mode);
-	MLX5_SET(sqc, sq_ctx, state, sq_attr->state);
-	MLX5_SET(sqc, sq_ctx, reg_umr, sq_attr->reg_umr);
-	MLX5_SET(sqc, sq_ctx, allow_swp, sq_attr->allow_swp);
-	MLX5_SET(sqc, sq_ctx, hairpin, sq_attr->hairpin);
-	MLX5_SET(sqc, sq_ctx, user_index, sq_attr->user_index);
-	MLX5_SET(sqc, sq_ctx, cqn, sq_attr->cqn);
-	MLX5_SET(sqc, sq_ctx, packet_pacing_rate_limit_index,
-		 sq_attr->packet_pacing_rate_limit_index);
-	MLX5_SET(sqc, sq_ctx, tis_lst_sz, sq_attr->tis_lst_sz);
-	MLX5_SET(sqc, sq_ctx, tis_num_0, sq_attr->tis_num);
-	wq_ctx = MLX5_ADDR_OF(sqc, sq_ctx, wq);
-	wq_attr = &sq_attr->wq_attr;
-	devx_cmd_fill_wq_data(wq_ctx, wq_attr);
-	sq->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in),
-					     out, sizeof(out));
-	if (!sq->obj) {
-		DRV_LOG(ERR, "Failed to create SQ using DevX");
-		rte_errno = errno;
-		rte_free(sq);
-		return NULL;
-	}
-	sq->id = MLX5_GET(create_sq_out, out, sqn);
-	return sq;
-}
-
-/**
- * Modify SQ using DevX API.
- *
- * @param[in] sq
- *   Pointer to SQ object structure.
- * @param [in] sq_attr
- *   Pointer to SQ attributes structure.
- *
- * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
- */
-int
-mlx5_devx_cmd_modify_sq(struct mlx5_devx_obj *sq,
-			struct mlx5_devx_modify_sq_attr *sq_attr)
-{
-	uint32_t in[MLX5_ST_SZ_DW(modify_sq_in)] = {0};
-	uint32_t out[MLX5_ST_SZ_DW(modify_sq_out)] = {0};
-	void *sq_ctx;
-	int ret;
-
-	MLX5_SET(modify_sq_in, in, opcode, MLX5_CMD_OP_MODIFY_SQ);
-	MLX5_SET(modify_sq_in, in, sq_state, sq_attr->sq_state);
-	MLX5_SET(modify_sq_in, in, sqn, sq->id);
-	sq_ctx = MLX5_ADDR_OF(modify_sq_in, in, ctx);
-	MLX5_SET(sqc, sq_ctx, state, sq_attr->state);
-	MLX5_SET(sqc, sq_ctx, hairpin_peer_rq, sq_attr->hairpin_peer_rq);
-	MLX5_SET(sqc, sq_ctx, hairpin_peer_vhca, sq_attr->hairpin_peer_vhca);
-	ret = mlx5_glue->devx_obj_modify(sq->obj, in, sizeof(in),
-					 out, sizeof(out));
-	if (ret) {
-		DRV_LOG(ERR, "Failed to modify SQ using DevX");
-		rte_errno = errno;
-		return -errno;
-	}
-	return ret;
-}
-
-/**
- * Create TIS using DevX API.
- *
- * @param[in] ctx
- *   ibv_context returned from mlx5dv_open_device.
- * @param[in] tis_attr
- *   Pointer to TIS attributes structure.
- *
- * @return
- *   The DevX object created, NULL otherwise and rte_errno is set.
- */
-struct mlx5_devx_obj *
-mlx5_devx_cmd_create_tis(struct ibv_context *ctx,
-			 struct mlx5_devx_tis_attr *tis_attr)
-{
-	uint32_t in[MLX5_ST_SZ_DW(create_tis_in)] = {0};
-	uint32_t out[MLX5_ST_SZ_DW(create_tis_out)] = {0};
-	struct mlx5_devx_obj *tis = NULL;
-	void *tis_ctx;
-
-	tis = rte_calloc(__func__, 1, sizeof(*tis), 0);
-	if (!tis) {
-		DRV_LOG(ERR, "Failed to allocate TIS object");
-		rte_errno = ENOMEM;
-		return NULL;
-	}
-	MLX5_SET(create_tis_in, in, opcode, MLX5_CMD_OP_CREATE_TIS);
-	tis_ctx = MLX5_ADDR_OF(create_tis_in, in, ctx);
-	MLX5_SET(tisc, tis_ctx, strict_lag_tx_port_affinity,
-		 tis_attr->strict_lag_tx_port_affinity);
-	MLX5_SET(tisc, tis_ctx, lag_tx_port_affinity,
-		 tis_attr->lag_tx_port_affinity);
-	MLX5_SET(tisc, tis_ctx, prio, tis_attr->prio);
-	MLX5_SET(tisc, tis_ctx, transport_domain,
-		 tis_attr->transport_domain);
-	tis->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in),
-					      out, sizeof(out));
-	if (!tis->obj) {
-		DRV_LOG(ERR, "Failed to create TIS using DevX");
-		rte_errno = errno;
-		rte_free(tis);
-		return NULL;
-	}
-	tis->id = MLX5_GET(create_tis_out, out, tisn);
-	return tis;
-}
-
-/**
- * Create transport domain using DevX API.
- *
- * @param[in] ctx
- *   ibv_context returned from mlx5dv_open_device.
- *
- * @return
- *   The DevX object created, NULL otherwise and rte_errno is set.
- */
-struct mlx5_devx_obj *
-mlx5_devx_cmd_create_td(struct ibv_context *ctx)
-{
-	uint32_t in[MLX5_ST_SZ_DW(alloc_transport_domain_in)] = {0};
-	uint32_t out[MLX5_ST_SZ_DW(alloc_transport_domain_out)] = {0};
-	struct mlx5_devx_obj *td = NULL;
-
-	td = rte_calloc(__func__, 1, sizeof(*td), 0);
-	if (!td) {
-		DRV_LOG(ERR, "Failed to allocate TD object");
-		rte_errno = ENOMEM;
-		return NULL;
-	}
-	MLX5_SET(alloc_transport_domain_in, in, opcode,
-		 MLX5_CMD_OP_ALLOC_TRANSPORT_DOMAIN);
-	td->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in),
-					     out, sizeof(out));
-	if (!td->obj) {
-		DRV_LOG(ERR, "Failed to create TD using DevX");
-		rte_errno = errno;
-		rte_free(td);
-		return NULL;
-	}
-	td->id = MLX5_GET(alloc_transport_domain_out, out,
-			   transport_domain);
-	return td;
-}
-
-/**
- * Dump all flows to file.
- *
- * @param[in] fdb_domain
- *   FDB domain.
- * @param[in] rx_domain
- *   RX domain.
- * @param[in] tx_domain
- *   TX domain.
- * @param[out] file
- *   Pointer to file stream.
- *
- * @return
- *   0 on success, a negative value otherwise.
- */
-int
-mlx5_devx_cmd_flow_dump(void *fdb_domain __rte_unused,
-			void *rx_domain __rte_unused,
-			void *tx_domain __rte_unused, FILE *file __rte_unused)
-{
-	int ret = 0;
-
-#ifdef HAVE_MLX5_DR_FLOW_DUMP
-	if (fdb_domain) {
-		ret = mlx5_glue->dr_dump_domain(file, fdb_domain);
-		if (ret)
-			return ret;
-	}
-	assert(rx_domain);
-	ret = mlx5_glue->dr_dump_domain(file, rx_domain);
-	if (ret)
-		return ret;
-	assert(tx_domain);
-	ret = mlx5_glue->dr_dump_domain(file, tx_domain);
-#else
-	ret = ENOTSUP;
-#endif
-	return -ret;
-}
diff --git a/drivers/net/mlx5/mlx5_devx_cmds.h b/drivers/net/mlx5/mlx5_devx_cmds.h
deleted file mode 100644
index 2d58d96..0000000
--- a/drivers/net/mlx5/mlx5_devx_cmds.h
+++ /dev/null
@@ -1,231 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause
- * Copyright 2019 Mellanox Technologies, Ltd
- */
-
-#ifndef RTE_PMD_MLX5_DEVX_CMDS_H_
-#define RTE_PMD_MLX5_DEVX_CMDS_H_
-
-#include "mlx5_glue.h"
-
-
-/* devX creation object */
-struct mlx5_devx_obj {
-	struct mlx5dv_devx_obj *obj; /* The DV object. */
-	int id; /* The object ID. */
-};
-
-struct mlx5_devx_mkey_attr {
-	uint64_t addr;
-	uint64_t size;
-	uint32_t umem_id;
-	uint32_t pd;
-};
-
-/* HCA qos attributes. */
-struct mlx5_hca_qos_attr {
-	uint32_t sup:1;	/* Whether QOS is supported. */
-	uint32_t srtcm_sup:1; /* Whether srTCM mode is supported. */
-	uint32_t flow_meter_reg_share:1;
-	/* Whether reg_c share is supported. */
-	uint8_t log_max_flow_meter;
-	/* Power of the maximum supported meters. */
-	uint8_t flow_meter_reg_c_ids;
-	/* Bitmap of the reg_Cs available for flow meter to use. */
-
-};
-
-/* HCA supports this number of time periods for LRO. */
-#define MLX5_LRO_NUM_SUPP_PERIODS 4
-
-/* HCA attributes. */
-struct mlx5_hca_attr {
-	uint32_t eswitch_manager:1;
-	uint32_t flow_counters_dump:1;
-	uint8_t flow_counter_bulk_alloc_bitmap;
-	uint32_t eth_net_offloads:1;
-	uint32_t eth_virt:1;
-	uint32_t wqe_vlan_insert:1;
-	uint32_t wqe_inline_mode:2;
-	uint32_t vport_inline_mode:3;
-	uint32_t tunnel_stateless_geneve_rx:1;
-	uint32_t geneve_max_opt_len:1; /* 0x0: 14DW, 0x1: 63DW */
-	uint32_t tunnel_stateless_gtp:1;
-	uint32_t lro_cap:1;
-	uint32_t tunnel_lro_gre:1;
-	uint32_t tunnel_lro_vxlan:1;
-	uint32_t lro_max_msg_sz_mode:2;
-	uint32_t lro_timer_supported_periods[MLX5_LRO_NUM_SUPP_PERIODS];
-	uint32_t flex_parser_protocols;
-	uint32_t hairpin:1;
-	uint32_t log_max_hairpin_queues:5;
-	uint32_t log_max_hairpin_wq_data_sz:5;
-	uint32_t log_max_hairpin_num_packets:5;
-	uint32_t vhca_id:16;
-	struct mlx5_hca_qos_attr qos;
-};
-
-struct mlx5_devx_wq_attr {
-	uint32_t wq_type:4;
-	uint32_t wq_signature:1;
-	uint32_t end_padding_mode:2;
-	uint32_t cd_slave:1;
-	uint32_t hds_skip_first_sge:1;
-	uint32_t log2_hds_buf_size:3;
-	uint32_t page_offset:5;
-	uint32_t lwm:16;
-	uint32_t pd:24;
-	uint32_t uar_page:24;
-	uint64_t dbr_addr;
-	uint32_t hw_counter;
-	uint32_t sw_counter;
-	uint32_t log_wq_stride:4;
-	uint32_t log_wq_pg_sz:5;
-	uint32_t log_wq_sz:5;
-	uint32_t dbr_umem_valid:1;
-	uint32_t wq_umem_valid:1;
-	uint32_t log_hairpin_num_packets:5;
-	uint32_t log_hairpin_data_sz:5;
-	uint32_t single_wqe_log_num_of_strides:4;
-	uint32_t two_byte_shift_en:1;
-	uint32_t single_stride_log_num_of_bytes:3;
-	uint32_t dbr_umem_id;
-	uint32_t wq_umem_id;
-	uint64_t wq_umem_offset;
-};
-
-/* Create RQ attributes structure, used by create RQ operation. */
-struct mlx5_devx_create_rq_attr {
-	uint32_t rlky:1;
-	uint32_t delay_drop_en:1;
-	uint32_t scatter_fcs:1;
-	uint32_t vsd:1;
-	uint32_t mem_rq_type:4;
-	uint32_t state:4;
-	uint32_t flush_in_error_en:1;
-	uint32_t hairpin:1;
-	uint32_t user_index:24;
-	uint32_t cqn:24;
-	uint32_t counter_set_id:8;
-	uint32_t rmpn:24;
-	struct mlx5_devx_wq_attr wq_attr;
-};
-
-/* Modify RQ attributes structure, used by modify RQ operation. */
-struct mlx5_devx_modify_rq_attr {
-	uint32_t rqn:24;
-	uint32_t rq_state:4; /* Current RQ state. */
-	uint32_t state:4; /* Required RQ state. */
-	uint32_t scatter_fcs:1;
-	uint32_t vsd:1;
-	uint32_t counter_set_id:8;
-	uint32_t hairpin_peer_sq:24;
-	uint32_t hairpin_peer_vhca:16;
-	uint64_t modify_bitmask;
-	uint32_t lwm:16; /* Contained WQ lwm. */
-};
-
-struct mlx5_rx_hash_field_select {
-	uint32_t l3_prot_type:1;
-	uint32_t l4_prot_type:1;
-	uint32_t selected_fields:30;
-};
-
-/* TIR attributes structure, used by TIR operations. */
-struct mlx5_devx_tir_attr {
-	uint32_t disp_type:4;
-	uint32_t lro_timeout_period_usecs:16;
-	uint32_t lro_enable_mask:4;
-	uint32_t lro_max_msg_sz:8;
-	uint32_t inline_rqn:24;
-	uint32_t rx_hash_symmetric:1;
-	uint32_t tunneled_offload_en:1;
-	uint32_t indirect_table:24;
-	uint32_t rx_hash_fn:4;
-	uint32_t self_lb_block:2;
-	uint32_t transport_domain:24;
-	uint32_t rx_hash_toeplitz_key[10];
-	struct mlx5_rx_hash_field_select rx_hash_field_selector_outer;
-	struct mlx5_rx_hash_field_select rx_hash_field_selector_inner;
-};
-
-/* RQT attributes structure, used by RQT operations. */
-struct mlx5_devx_rqt_attr {
-	uint32_t rqt_max_size:16;
-	uint32_t rqt_actual_size:16;
-	uint32_t rq_list[];
-};
-
-/* TIS attributes structure. */
-struct mlx5_devx_tis_attr {
-	uint32_t strict_lag_tx_port_affinity:1;
-	uint32_t tls_en:1;
-	uint32_t lag_tx_port_affinity:4;
-	uint32_t prio:4;
-	uint32_t transport_domain:24;
-};
-
-/* SQ attributes structure, used by SQ create operation. */
-struct mlx5_devx_create_sq_attr {
-	uint32_t rlky:1;
-	uint32_t cd_master:1;
-	uint32_t fre:1;
-	uint32_t flush_in_error_en:1;
-	uint32_t allow_multi_pkt_send_wqe:1;
-	uint32_t min_wqe_inline_mode:3;
-	uint32_t state:4;
-	uint32_t reg_umr:1;
-	uint32_t allow_swp:1;
-	uint32_t hairpin:1;
-	uint32_t user_index:24;
-	uint32_t cqn:24;
-	uint32_t packet_pacing_rate_limit_index:16;
-	uint32_t tis_lst_sz:16;
-	uint32_t tis_num:24;
-	struct mlx5_devx_wq_attr wq_attr;
-};
-
-/* SQ attributes structure, used by SQ modify operation. */
-struct mlx5_devx_modify_sq_attr {
-	uint32_t sq_state:4;
-	uint32_t state:4;
-	uint32_t hairpin_peer_rq:24;
-	uint32_t hairpin_peer_vhca:16;
-};
-
-/* mlx5_devx_cmds.c */
-
-struct mlx5_devx_obj *mlx5_devx_cmd_flow_counter_alloc(struct ibv_context *ctx,
-						       uint32_t bulk_sz);
-int mlx5_devx_cmd_destroy(struct mlx5_devx_obj *obj);
-int mlx5_devx_cmd_flow_counter_query(struct mlx5_devx_obj *dcs,
-				     int clear, uint32_t n_counters,
-				     uint64_t *pkts, uint64_t *bytes,
-				     uint32_t mkey, void *addr,
-				     struct mlx5dv_devx_cmd_comp *cmd_comp,
-				     uint64_t async_id);
-int mlx5_devx_cmd_query_hca_attr(struct ibv_context *ctx,
-				 struct mlx5_hca_attr *attr);
-struct mlx5_devx_obj *mlx5_devx_cmd_mkey_create(struct ibv_context *ctx,
-					      struct mlx5_devx_mkey_attr *attr);
-int mlx5_devx_get_out_command_status(void *out);
-int mlx5_devx_cmd_qp_query_tis_td(struct ibv_qp *qp, uint32_t tis_num,
-				  uint32_t *tis_td);
-struct mlx5_devx_obj *mlx5_devx_cmd_create_rq(struct ibv_context *ctx,
-				       struct mlx5_devx_create_rq_attr *rq_attr,
-				       int socket);
-int mlx5_devx_cmd_modify_rq(struct mlx5_devx_obj *rq,
-			    struct mlx5_devx_modify_rq_attr *rq_attr);
-struct mlx5_devx_obj *mlx5_devx_cmd_create_tir(struct ibv_context *ctx,
-					   struct mlx5_devx_tir_attr *tir_attr);
-struct mlx5_devx_obj *mlx5_devx_cmd_create_rqt(struct ibv_context *ctx,
-					   struct mlx5_devx_rqt_attr *rqt_attr);
-struct mlx5_devx_obj *mlx5_devx_cmd_create_sq(struct ibv_context *ctx,
-				      struct mlx5_devx_create_sq_attr *sq_attr);
-int mlx5_devx_cmd_modify_sq(struct mlx5_devx_obj *sq,
-			    struct mlx5_devx_modify_sq_attr *sq_attr);
-struct mlx5_devx_obj *mlx5_devx_cmd_create_tis(struct ibv_context *ctx,
-					   struct mlx5_devx_tis_attr *tis_attr);
-struct mlx5_devx_obj *mlx5_devx_cmd_create_td(struct ibv_context *ctx);
-int mlx5_devx_cmd_flow_dump(void *fdb_domain, void *rx_domain, void *tx_domain,
-			    FILE *file);
-#endif /* RTE_PMD_MLX5_DEVX_CMDS_H_ */
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index ce0109c..eddf888 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -36,9 +36,10 @@
 #include <rte_rwlock.h>
 #include <rte_cycles.h>
 
+#include <mlx5_glue.h>
+#include <mlx5_devx_cmds.h>
+
 #include "mlx5.h"
-#include "mlx5_glue.h"
-#include "mlx5_devx_cmds.h"
 #include "mlx5_rxtx.h"
 #include "mlx5_utils.h"
 
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 983b1c3..47ba521 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -27,12 +27,13 @@
 #include <rte_malloc.h>
 #include <rte_ip.h>
 
-#include "mlx5.h"
+#include <mlx5_glue.h>
+#include <mlx5_devx_cmds.h>
+#include <mlx5_prm.h>
+
 #include "mlx5_defs.h"
+#include "mlx5.h"
 #include "mlx5_flow.h"
-#include "mlx5_glue.h"
-#include "mlx5_devx_cmds.h"
-#include "mlx5_prm.h"
 #include "mlx5_rxtx.h"
 
 /* Dev ops structure defined in mlx5.c */
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 39be5ba..4255472 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -25,8 +25,9 @@
 #include <rte_alarm.h>
 #include <rte_mtr.h>
 
+#include <mlx5_prm.h>
+
 #include "mlx5.h"
-#include "mlx5_prm.h"
 
 /* Private rte flow items. */
 enum mlx5_rte_flow_item_type {
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 653d649..1b31602 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -29,12 +29,13 @@
 #include <rte_vxlan.h>
 #include <rte_gtp.h>
 
-#include "mlx5.h"
+#include <mlx5_glue.h>
+#include <mlx5_devx_cmds.h>
+#include <mlx5_prm.h>
+
 #include "mlx5_defs.h"
-#include "mlx5_glue.h"
-#include "mlx5_devx_cmds.h"
+#include "mlx5.h"
 #include "mlx5_flow.h"
-#include "mlx5_prm.h"
 #include "mlx5_rxtx.h"
 
 #ifdef HAVE_IBV_FLOW_DV_SUPPORT
diff --git a/drivers/net/mlx5/mlx5_flow_meter.c b/drivers/net/mlx5/mlx5_flow_meter.c
index c4d28b2..32d51c0 100644
--- a/drivers/net/mlx5/mlx5_flow_meter.c
+++ b/drivers/net/mlx5/mlx5_flow_meter.c
@@ -9,6 +9,8 @@
 #include <rte_mtr.h>
 #include <rte_mtr_driver.h>
 
+#include <mlx5_devx_cmds.h>
+
 #include "mlx5.h"
 #include "mlx5_flow.h"
 
diff --git a/drivers/net/mlx5/mlx5_flow_verbs.c b/drivers/net/mlx5/mlx5_flow_verbs.c
index 72fb1e4..9231451 100644
--- a/drivers/net/mlx5/mlx5_flow_verbs.c
+++ b/drivers/net/mlx5/mlx5_flow_verbs.c
@@ -26,11 +26,12 @@
 #include <rte_malloc.h>
 #include <rte_ip.h>
 
-#include "mlx5.h"
+#include <mlx5_glue.h>
+#include <mlx5_prm.h>
+
 #include "mlx5_defs.h"
+#include "mlx5.h"
 #include "mlx5_flow.h"
-#include "mlx5_glue.h"
-#include "mlx5_prm.h"
 #include "mlx5_rxtx.h"
 
 #define VERBS_SPEC_INNER(item_flags) \
diff --git a/drivers/net/mlx5/mlx5_glue.c b/drivers/net/mlx5/mlx5_glue.c
deleted file mode 100644
index 4906eeb..0000000
--- a/drivers/net/mlx5/mlx5_glue.c
+++ /dev/null
@@ -1,1150 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause
- * Copyright 2018 6WIND S.A.
- * Copyright 2018 Mellanox Technologies, Ltd
- */
-
-#include <errno.h>
-#include <stdalign.h>
-#include <stddef.h>
-#include <stdint.h>
-#include <stdlib.h>
-
-/*
- * Not needed by this file; included to work around the lack of off_t
- * definition for mlx5dv.h with unpatched rdma-core versions.
- */
-#include <sys/types.h>
-
-/* Verbs headers do not support -pedantic. */
-#ifdef PEDANTIC
-#pragma GCC diagnostic ignored "-Wpedantic"
-#endif
-#include <infiniband/mlx5dv.h>
-#include <infiniband/verbs.h>
-#ifdef PEDANTIC
-#pragma GCC diagnostic error "-Wpedantic"
-#endif
-
-#include <rte_config.h>
-
-#include "mlx5_autoconf.h"
-#include "mlx5_glue.h"
-
-static int
-mlx5_glue_fork_init(void)
-{
-	return ibv_fork_init();
-}
-
-static struct ibv_pd *
-mlx5_glue_alloc_pd(struct ibv_context *context)
-{
-	return ibv_alloc_pd(context);
-}
-
-static int
-mlx5_glue_dealloc_pd(struct ibv_pd *pd)
-{
-	return ibv_dealloc_pd(pd);
-}
-
-static struct ibv_device **
-mlx5_glue_get_device_list(int *num_devices)
-{
-	return ibv_get_device_list(num_devices);
-}
-
-static void
-mlx5_glue_free_device_list(struct ibv_device **list)
-{
-	ibv_free_device_list(list);
-}
-
-static struct ibv_context *
-mlx5_glue_open_device(struct ibv_device *device)
-{
-	return ibv_open_device(device);
-}
-
-static int
-mlx5_glue_close_device(struct ibv_context *context)
-{
-	return ibv_close_device(context);
-}
-
-static int
-mlx5_glue_query_device(struct ibv_context *context,
-		       struct ibv_device_attr *device_attr)
-{
-	return ibv_query_device(context, device_attr);
-}
-
-static int
-mlx5_glue_query_device_ex(struct ibv_context *context,
-			  const struct ibv_query_device_ex_input *input,
-			  struct ibv_device_attr_ex *attr)
-{
-	return ibv_query_device_ex(context, input, attr);
-}
-
-static int
-mlx5_glue_query_rt_values_ex(struct ibv_context *context,
-			  struct ibv_values_ex *values)
-{
-	return ibv_query_rt_values_ex(context, values);
-}
-
-static int
-mlx5_glue_query_port(struct ibv_context *context, uint8_t port_num,
-		     struct ibv_port_attr *port_attr)
-{
-	return ibv_query_port(context, port_num, port_attr);
-}
-
-static struct ibv_comp_channel *
-mlx5_glue_create_comp_channel(struct ibv_context *context)
-{
-	return ibv_create_comp_channel(context);
-}
-
-static int
-mlx5_glue_destroy_comp_channel(struct ibv_comp_channel *channel)
-{
-	return ibv_destroy_comp_channel(channel);
-}
-
-static struct ibv_cq *
-mlx5_glue_create_cq(struct ibv_context *context, int cqe, void *cq_context,
-		    struct ibv_comp_channel *channel, int comp_vector)
-{
-	return ibv_create_cq(context, cqe, cq_context, channel, comp_vector);
-}
-
-static int
-mlx5_glue_destroy_cq(struct ibv_cq *cq)
-{
-	return ibv_destroy_cq(cq);
-}
-
-static int
-mlx5_glue_get_cq_event(struct ibv_comp_channel *channel, struct ibv_cq **cq,
-		       void **cq_context)
-{
-	return ibv_get_cq_event(channel, cq, cq_context);
-}
-
-static void
-mlx5_glue_ack_cq_events(struct ibv_cq *cq, unsigned int nevents)
-{
-	ibv_ack_cq_events(cq, nevents);
-}
-
-static struct ibv_rwq_ind_table *
-mlx5_glue_create_rwq_ind_table(struct ibv_context *context,
-			       struct ibv_rwq_ind_table_init_attr *init_attr)
-{
-	return ibv_create_rwq_ind_table(context, init_attr);
-}
-
-static int
-mlx5_glue_destroy_rwq_ind_table(struct ibv_rwq_ind_table *rwq_ind_table)
-{
-	return ibv_destroy_rwq_ind_table(rwq_ind_table);
-}
-
-static struct ibv_wq *
-mlx5_glue_create_wq(struct ibv_context *context,
-		    struct ibv_wq_init_attr *wq_init_attr)
-{
-	return ibv_create_wq(context, wq_init_attr);
-}
-
-static int
-mlx5_glue_destroy_wq(struct ibv_wq *wq)
-{
-	return ibv_destroy_wq(wq);
-}
-static int
-mlx5_glue_modify_wq(struct ibv_wq *wq, struct ibv_wq_attr *wq_attr)
-{
-	return ibv_modify_wq(wq, wq_attr);
-}
-
-static struct ibv_flow *
-mlx5_glue_create_flow(struct ibv_qp *qp, struct ibv_flow_attr *flow)
-{
-	return ibv_create_flow(qp, flow);
-}
-
-static int
-mlx5_glue_destroy_flow(struct ibv_flow *flow_id)
-{
-	return ibv_destroy_flow(flow_id);
-}
-
-static int
-mlx5_glue_destroy_flow_action(void *action)
-{
-#ifdef HAVE_IBV_FLOW_DV_SUPPORT
-#ifdef HAVE_MLX5DV_DR
-	return mlx5dv_dr_action_destroy(action);
-#else
-	struct mlx5dv_flow_action_attr *attr = action;
-	int res = 0;
-	switch (attr->type) {
-	case MLX5DV_FLOW_ACTION_TAG:
-		break;
-	default:
-		res = ibv_destroy_flow_action(attr->action);
-		break;
-	}
-	free(action);
-	return res;
-#endif
-#else
-	(void)action;
-	return ENOTSUP;
-#endif
-}
-
-static struct ibv_qp *
-mlx5_glue_create_qp(struct ibv_pd *pd, struct ibv_qp_init_attr *qp_init_attr)
-{
-	return ibv_create_qp(pd, qp_init_attr);
-}
-
-static struct ibv_qp *
-mlx5_glue_create_qp_ex(struct ibv_context *context,
-		       struct ibv_qp_init_attr_ex *qp_init_attr_ex)
-{
-	return ibv_create_qp_ex(context, qp_init_attr_ex);
-}
-
-static int
-mlx5_glue_destroy_qp(struct ibv_qp *qp)
-{
-	return ibv_destroy_qp(qp);
-}
-
-static int
-mlx5_glue_modify_qp(struct ibv_qp *qp, struct ibv_qp_attr *attr, int attr_mask)
-{
-	return ibv_modify_qp(qp, attr, attr_mask);
-}
-
-static struct ibv_mr *
-mlx5_glue_reg_mr(struct ibv_pd *pd, void *addr, size_t length, int access)
-{
-	return ibv_reg_mr(pd, addr, length, access);
-}
-
-static int
-mlx5_glue_dereg_mr(struct ibv_mr *mr)
-{
-	return ibv_dereg_mr(mr);
-}
-
-static struct ibv_counter_set *
-mlx5_glue_create_counter_set(struct ibv_context *context,
-			     struct ibv_counter_set_init_attr *init_attr)
-{
-#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V42
-	(void)context;
-	(void)init_attr;
-	return NULL;
-#else
-	return ibv_create_counter_set(context, init_attr);
-#endif
-}
-
-static int
-mlx5_glue_destroy_counter_set(struct ibv_counter_set *cs)
-{
-#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V42
-	(void)cs;
-	return ENOTSUP;
-#else
-	return ibv_destroy_counter_set(cs);
-#endif
-}
-
-static int
-mlx5_glue_describe_counter_set(struct ibv_context *context,
-			       uint16_t counter_set_id,
-			       struct ibv_counter_set_description *cs_desc)
-{
-#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V42
-	(void)context;
-	(void)counter_set_id;
-	(void)cs_desc;
-	return ENOTSUP;
-#else
-	return ibv_describe_counter_set(context, counter_set_id, cs_desc);
-#endif
-}
-
-static int
-mlx5_glue_query_counter_set(struct ibv_query_counter_set_attr *query_attr,
-			    struct ibv_counter_set_data *cs_data)
-{
-#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V42
-	(void)query_attr;
-	(void)cs_data;
-	return ENOTSUP;
-#else
-	return ibv_query_counter_set(query_attr, cs_data);
-#endif
-}
-
-static struct ibv_counters *
-mlx5_glue_create_counters(struct ibv_context *context,
-			  struct ibv_counters_init_attr *init_attr)
-{
-#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V45
-	(void)context;
-	(void)init_attr;
-	errno = ENOTSUP;
-	return NULL;
-#else
-	return ibv_create_counters(context, init_attr);
-#endif
-}
-
-static int
-mlx5_glue_destroy_counters(struct ibv_counters *counters)
-{
-#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V45
-	(void)counters;
-	return ENOTSUP;
-#else
-	return ibv_destroy_counters(counters);
-#endif
-}
-
-static int
-mlx5_glue_attach_counters(struct ibv_counters *counters,
-			  struct ibv_counter_attach_attr *attr,
-			  struct ibv_flow *flow)
-{
-#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V45
-	(void)counters;
-	(void)attr;
-	(void)flow;
-	return ENOTSUP;
-#else
-	return ibv_attach_counters_point_flow(counters, attr, flow);
-#endif
-}
-
-static int
-mlx5_glue_query_counters(struct ibv_counters *counters,
-			 uint64_t *counters_value,
-			 uint32_t ncounters,
-			 uint32_t flags)
-{
-#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V45
-	(void)counters;
-	(void)counters_value;
-	(void)ncounters;
-	(void)flags;
-	return ENOTSUP;
-#else
-	return ibv_read_counters(counters, counters_value, ncounters, flags);
-#endif
-}
-
-static void
-mlx5_glue_ack_async_event(struct ibv_async_event *event)
-{
-	ibv_ack_async_event(event);
-}
-
-static int
-mlx5_glue_get_async_event(struct ibv_context *context,
-			  struct ibv_async_event *event)
-{
-	return ibv_get_async_event(context, event);
-}
-
-static const char *
-mlx5_glue_port_state_str(enum ibv_port_state port_state)
-{
-	return ibv_port_state_str(port_state);
-}
-
-static struct ibv_cq *
-mlx5_glue_cq_ex_to_cq(struct ibv_cq_ex *cq)
-{
-	return ibv_cq_ex_to_cq(cq);
-}
-
-static void *
-mlx5_glue_dr_create_flow_action_dest_flow_tbl(void *tbl)
-{
-#ifdef HAVE_MLX5DV_DR
-	return mlx5dv_dr_action_create_dest_table(tbl);
-#else
-	(void)tbl;
-	errno = ENOTSUP;
-	return NULL;
-#endif
-}
-
-static void *
-mlx5_glue_dr_create_flow_action_dest_port(void *domain, uint32_t port)
-{
-#ifdef HAVE_MLX5DV_DR_DEVX_PORT
-	return mlx5dv_dr_action_create_dest_ib_port(domain, port);
-#else
-#ifdef HAVE_MLX5DV_DR_ESWITCH
-	return mlx5dv_dr_action_create_dest_vport(domain, port);
-#else
-	(void)domain;
-	(void)port;
-	errno = ENOTSUP;
-	return NULL;
-#endif
-#endif
-}
-
-static void *
-mlx5_glue_dr_create_flow_action_drop(void)
-{
-#ifdef HAVE_MLX5DV_DR_ESWITCH
-	return mlx5dv_dr_action_create_drop();
-#else
-	errno = ENOTSUP;
-	return NULL;
-#endif
-}
-
-static void *
-mlx5_glue_dr_create_flow_action_push_vlan(struct mlx5dv_dr_domain *domain,
-					  rte_be32_t vlan_tag)
-{
-#ifdef HAVE_MLX5DV_DR_VLAN
-	return mlx5dv_dr_action_create_push_vlan(domain, vlan_tag);
-#else
-	(void)domain;
-	(void)vlan_tag;
-	errno = ENOTSUP;
-	return NULL;
-#endif
-}
-
-static void *
-mlx5_glue_dr_create_flow_action_pop_vlan(void)
-{
-#ifdef HAVE_MLX5DV_DR_VLAN
-	return mlx5dv_dr_action_create_pop_vlan();
-#else
-	errno = ENOTSUP;
-	return NULL;
-#endif
-}
-
-static void *
-mlx5_glue_dr_create_flow_tbl(void *domain, uint32_t level)
-{
-#ifdef HAVE_MLX5DV_DR
-	return mlx5dv_dr_table_create(domain, level);
-#else
-	(void)domain;
-	(void)level;
-	errno = ENOTSUP;
-	return NULL;
-#endif
-}
-
-static int
-mlx5_glue_dr_destroy_flow_tbl(void *tbl)
-{
-#ifdef HAVE_MLX5DV_DR
-	return mlx5dv_dr_table_destroy(tbl);
-#else
-	(void)tbl;
-	errno = ENOTSUP;
-	return errno;
-#endif
-}
-
-static void *
-mlx5_glue_dr_create_domain(struct ibv_context *ctx,
-			   enum  mlx5dv_dr_domain_type domain)
-{
-#ifdef HAVE_MLX5DV_DR
-	return mlx5dv_dr_domain_create(ctx, domain);
-#else
-	(void)ctx;
-	(void)domain;
-	errno = ENOTSUP;
-	return NULL;
-#endif
-}
-
-static int
-mlx5_glue_dr_destroy_domain(void *domain)
-{
-#ifdef HAVE_MLX5DV_DR
-	return mlx5dv_dr_domain_destroy(domain);
-#else
-	(void)domain;
-	errno = ENOTSUP;
-	return errno;
-#endif
-}
-
-static struct ibv_cq_ex *
-mlx5_glue_dv_create_cq(struct ibv_context *context,
-		       struct ibv_cq_init_attr_ex *cq_attr,
-		       struct mlx5dv_cq_init_attr *mlx5_cq_attr)
-{
-	return mlx5dv_create_cq(context, cq_attr, mlx5_cq_attr);
-}
-
-static struct ibv_wq *
-mlx5_glue_dv_create_wq(struct ibv_context *context,
-		       struct ibv_wq_init_attr *wq_attr,
-		       struct mlx5dv_wq_init_attr *mlx5_wq_attr)
-{
-#ifndef HAVE_IBV_DEVICE_STRIDING_RQ_SUPPORT
-	(void)context;
-	(void)wq_attr;
-	(void)mlx5_wq_attr;
-	errno = ENOTSUP;
-	return NULL;
-#else
-	return mlx5dv_create_wq(context, wq_attr, mlx5_wq_attr);
-#endif
-}
-
-static int
-mlx5_glue_dv_query_device(struct ibv_context *ctx,
-			  struct mlx5dv_context *attrs_out)
-{
-	return mlx5dv_query_device(ctx, attrs_out);
-}
-
-static int
-mlx5_glue_dv_set_context_attr(struct ibv_context *ibv_ctx,
-			      enum mlx5dv_set_ctx_attr_type type, void *attr)
-{
-	return mlx5dv_set_context_attr(ibv_ctx, type, attr);
-}
-
-static int
-mlx5_glue_dv_init_obj(struct mlx5dv_obj *obj, uint64_t obj_type)
-{
-	return mlx5dv_init_obj(obj, obj_type);
-}
-
-static struct ibv_qp *
-mlx5_glue_dv_create_qp(struct ibv_context *context,
-		       struct ibv_qp_init_attr_ex *qp_init_attr_ex,
-		       struct mlx5dv_qp_init_attr *dv_qp_init_attr)
-{
-#ifdef HAVE_IBV_DEVICE_TUNNEL_SUPPORT
-	return mlx5dv_create_qp(context, qp_init_attr_ex, dv_qp_init_attr);
-#else
-	(void)context;
-	(void)qp_init_attr_ex;
-	(void)dv_qp_init_attr;
-	errno = ENOTSUP;
-	return NULL;
-#endif
-}
-
-static void *
-mlx5_glue_dv_create_flow_matcher(struct ibv_context *context,
-				 struct mlx5dv_flow_matcher_attr *matcher_attr,
-				 void *tbl)
-{
-#ifdef HAVE_IBV_FLOW_DV_SUPPORT
-#ifdef HAVE_MLX5DV_DR
-	(void)context;
-	return mlx5dv_dr_matcher_create(tbl, matcher_attr->priority,
-					matcher_attr->match_criteria_enable,
-					matcher_attr->match_mask);
-#else
-	(void)tbl;
-	return mlx5dv_create_flow_matcher(context, matcher_attr);
-#endif
-#else
-	(void)context;
-	(void)matcher_attr;
-	(void)tbl;
-	errno = ENOTSUP;
-	return NULL;
-#endif
-}
-
-static void *
-mlx5_glue_dv_create_flow(void *matcher,
-			 void *match_value,
-			 size_t num_actions,
-			 void *actions[])
-{
-#ifdef HAVE_IBV_FLOW_DV_SUPPORT
-#ifdef HAVE_MLX5DV_DR
-	return mlx5dv_dr_rule_create(matcher, match_value, num_actions,
-				     (struct mlx5dv_dr_action **)actions);
-#else
-	struct mlx5dv_flow_action_attr actions_attr[8];
-
-	if (num_actions > 8)
-		return NULL;
-	for (size_t i = 0; i < num_actions; i++)
-		actions_attr[i] =
-			*((struct mlx5dv_flow_action_attr *)(actions[i]));
-	return mlx5dv_create_flow(matcher, match_value,
-				  num_actions, actions_attr);
-#endif
-#else
-	(void)matcher;
-	(void)match_value;
-	(void)num_actions;
-	(void)actions;
-	return NULL;
-#endif
-}
-
-static void *
-mlx5_glue_dv_create_flow_action_counter(void *counter_obj, uint32_t offset)
-{
-#ifdef HAVE_IBV_FLOW_DV_SUPPORT
-#ifdef HAVE_MLX5DV_DR
-	return mlx5dv_dr_action_create_flow_counter(counter_obj, offset);
-#else
-	struct mlx5dv_flow_action_attr *action;
-
-	(void)offset;
-	action = malloc(sizeof(*action));
-	if (!action)
-		return NULL;
-	action->type = MLX5DV_FLOW_ACTION_COUNTERS_DEVX;
-	action->obj = counter_obj;
-	return action;
-#endif
-#else
-	(void)counter_obj;
-	(void)offset;
-	errno = ENOTSUP;
-	return NULL;
-#endif
-}
-
-static void *
-mlx5_glue_dv_create_flow_action_dest_ibv_qp(void *qp)
-{
-#ifdef HAVE_IBV_FLOW_DV_SUPPORT
-#ifdef HAVE_MLX5DV_DR
-	return mlx5dv_dr_action_create_dest_ibv_qp(qp);
-#else
-	struct mlx5dv_flow_action_attr *action;
-
-	action = malloc(sizeof(*action));
-	if (!action)
-		return NULL;
-	action->type = MLX5DV_FLOW_ACTION_DEST_IBV_QP;
-	action->obj = qp;
-	return action;
-#endif
-#else
-	(void)qp;
-	errno = ENOTSUP;
-	return NULL;
-#endif
-}
-
-static void *
-mlx5_glue_dv_create_flow_action_dest_devx_tir(void *tir)
-{
-#ifdef HAVE_MLX5DV_DR_ACTION_DEST_DEVX_TIR
-	return mlx5dv_dr_action_create_dest_devx_tir(tir);
-#else
-	(void)tir;
-	errno = ENOTSUP;
-	return NULL;
-#endif
-}
-
-static void *
-mlx5_glue_dv_create_flow_action_modify_header
-					(struct ibv_context *ctx,
-					 enum mlx5dv_flow_table_type ft_type,
-					 void *domain, uint64_t flags,
-					 size_t actions_sz,
-					 uint64_t actions[])
-{
-#ifdef HAVE_IBV_FLOW_DV_SUPPORT
-#ifdef HAVE_MLX5DV_DR
-	(void)ctx;
-	(void)ft_type;
-	return mlx5dv_dr_action_create_modify_header(domain, flags, actions_sz,
-						     (__be64 *)actions);
-#else
-	struct mlx5dv_flow_action_attr *action;
-
-	(void)domain;
-	(void)flags;
-	action = malloc(sizeof(*action));
-	if (!action)
-		return NULL;
-	action->type = MLX5DV_FLOW_ACTION_IBV_FLOW_ACTION;
-	action->action = mlx5dv_create_flow_action_modify_header
-		(ctx, actions_sz, actions, ft_type);
-	return action;
-#endif
-#else
-	(void)ctx;
-	(void)ft_type;
-	(void)domain;
-	(void)flags;
-	(void)actions_sz;
-	(void)actions;
-	errno = ENOTSUP;
-	return NULL;
-#endif
-}
-
-static void *
-mlx5_glue_dv_create_flow_action_packet_reformat
-		(struct ibv_context *ctx,
-		 enum mlx5dv_flow_action_packet_reformat_type reformat_type,
-		 enum mlx5dv_flow_table_type ft_type,
-		 struct mlx5dv_dr_domain *domain,
-		 uint32_t flags, size_t data_sz, void *data)
-{
-#ifdef HAVE_IBV_FLOW_DV_SUPPORT
-#ifdef HAVE_MLX5DV_DR
-	(void)ctx;
-	(void)ft_type;
-	return mlx5dv_dr_action_create_packet_reformat(domain, flags,
-						       reformat_type, data_sz,
-						       data);
-#else
-	(void)domain;
-	(void)flags;
-	struct mlx5dv_flow_action_attr *action;
-
-	action = malloc(sizeof(*action));
-	if (!action)
-		return NULL;
-	action->type = MLX5DV_FLOW_ACTION_IBV_FLOW_ACTION;
-	action->action = mlx5dv_create_flow_action_packet_reformat
-		(ctx, data_sz, data, reformat_type, ft_type);
-	return action;
-#endif
-#else
-	(void)ctx;
-	(void)reformat_type;
-	(void)ft_type;
-	(void)domain;
-	(void)flags;
-	(void)data_sz;
-	(void)data;
-	errno = ENOTSUP;
-	return NULL;
-#endif
-}
-
-static void *
-mlx5_glue_dv_create_flow_action_tag(uint32_t tag)
-{
-#ifdef HAVE_IBV_FLOW_DV_SUPPORT
-#ifdef HAVE_MLX5DV_DR
-	return mlx5dv_dr_action_create_tag(tag);
-#else
-	struct mlx5dv_flow_action_attr *action;
-	action = malloc(sizeof(*action));
-	if (!action)
-		return NULL;
-	action->type = MLX5DV_FLOW_ACTION_TAG;
-	action->tag_value = tag;
-	return action;
-#endif
-#endif
-	(void)tag;
-	errno = ENOTSUP;
-	return NULL;
-}
-
-static void *
-mlx5_glue_dv_create_flow_action_meter(struct mlx5dv_dr_flow_meter_attr *attr)
-{
-#if defined(HAVE_MLX5DV_DR) && defined(HAVE_MLX5_DR_CREATE_ACTION_FLOW_METER)
-	return mlx5dv_dr_action_create_flow_meter(attr);
-#else
-	(void)attr;
-	errno = ENOTSUP;
-	return NULL;
-#endif
-}
-
-static int
-mlx5_glue_dv_modify_flow_action_meter(void *action,
-				      struct mlx5dv_dr_flow_meter_attr *attr,
-				      uint64_t modify_bits)
-{
-#if defined(HAVE_MLX5DV_DR) && defined(HAVE_MLX5_DR_CREATE_ACTION_FLOW_METER)
-	return mlx5dv_dr_action_modify_flow_meter(action, attr, modify_bits);
-#else
-	(void)action;
-	(void)attr;
-	(void)modify_bits;
-	errno = ENOTSUP;
-	return errno;
-#endif
-}
-
-static int
-mlx5_glue_dv_destroy_flow(void *flow_id)
-{
-#ifdef HAVE_MLX5DV_DR
-	return mlx5dv_dr_rule_destroy(flow_id);
-#else
-	return ibv_destroy_flow(flow_id);
-#endif
-}
-
-static int
-mlx5_glue_dv_destroy_flow_matcher(void *matcher)
-{
-#ifdef HAVE_IBV_FLOW_DV_SUPPORT
-#ifdef HAVE_MLX5DV_DR
-	return mlx5dv_dr_matcher_destroy(matcher);
-#else
-	return mlx5dv_destroy_flow_matcher(matcher);
-#endif
-#else
-	(void)matcher;
-	errno = ENOTSUP;
-	return errno;
-#endif
-}
-
-static struct ibv_context *
-mlx5_glue_dv_open_device(struct ibv_device *device)
-{
-#ifdef HAVE_IBV_DEVX_OBJ
-	return mlx5dv_open_device(device,
-				  &(struct mlx5dv_context_attr){
-					.flags = MLX5DV_CONTEXT_FLAGS_DEVX,
-				  });
-#else
-	(void)device;
-	errno = ENOTSUP;
-	return NULL;
-#endif
-}
-
-static struct mlx5dv_devx_obj *
-mlx5_glue_devx_obj_create(struct ibv_context *ctx,
-			  const void *in, size_t inlen,
-			  void *out, size_t outlen)
-{
-#ifdef HAVE_IBV_DEVX_OBJ
-	return mlx5dv_devx_obj_create(ctx, in, inlen, out, outlen);
-#else
-	(void)ctx;
-	(void)in;
-	(void)inlen;
-	(void)out;
-	(void)outlen;
-	errno = ENOTSUP;
-	return NULL;
-#endif
-}
-
-static int
-mlx5_glue_devx_obj_destroy(struct mlx5dv_devx_obj *obj)
-{
-#ifdef HAVE_IBV_DEVX_OBJ
-	return mlx5dv_devx_obj_destroy(obj);
-#else
-	(void)obj;
-	return -ENOTSUP;
-#endif
-}
-
-static int
-mlx5_glue_devx_obj_query(struct mlx5dv_devx_obj *obj,
-			 const void *in, size_t inlen,
-			 void *out, size_t outlen)
-{
-#ifdef HAVE_IBV_DEVX_OBJ
-	return mlx5dv_devx_obj_query(obj, in, inlen, out, outlen);
-#else
-	(void)obj;
-	(void)in;
-	(void)inlen;
-	(void)out;
-	(void)outlen;
-	return -ENOTSUP;
-#endif
-}
-
-static int
-mlx5_glue_devx_obj_modify(struct mlx5dv_devx_obj *obj,
-			  const void *in, size_t inlen,
-			  void *out, size_t outlen)
-{
-#ifdef HAVE_IBV_DEVX_OBJ
-	return mlx5dv_devx_obj_modify(obj, in, inlen, out, outlen);
-#else
-	(void)obj;
-	(void)in;
-	(void)inlen;
-	(void)out;
-	(void)outlen;
-	return -ENOTSUP;
-#endif
-}
-
-static int
-mlx5_glue_devx_general_cmd(struct ibv_context *ctx,
-			   const void *in, size_t inlen,
-			   void *out, size_t outlen)
-{
-#ifdef HAVE_IBV_DEVX_OBJ
-	return mlx5dv_devx_general_cmd(ctx, in, inlen, out, outlen);
-#else
-	(void)ctx;
-	(void)in;
-	(void)inlen;
-	(void)out;
-	(void)outlen;
-	return -ENOTSUP;
-#endif
-}
-
-static struct mlx5dv_devx_cmd_comp *
-mlx5_glue_devx_create_cmd_comp(struct ibv_context *ctx)
-{
-#ifdef HAVE_IBV_DEVX_ASYNC
-	return mlx5dv_devx_create_cmd_comp(ctx);
-#else
-	(void)ctx;
-	errno = -ENOTSUP;
-	return NULL;
-#endif
-}
-
-static void
-mlx5_glue_devx_destroy_cmd_comp(struct mlx5dv_devx_cmd_comp *cmd_comp)
-{
-#ifdef HAVE_IBV_DEVX_ASYNC
-	mlx5dv_devx_destroy_cmd_comp(cmd_comp);
-#else
-	(void)cmd_comp;
-	errno = -ENOTSUP;
-#endif
-}
-
-static int
-mlx5_glue_devx_obj_query_async(struct mlx5dv_devx_obj *obj, const void *in,
-			       size_t inlen, size_t outlen, uint64_t wr_id,
-			       struct mlx5dv_devx_cmd_comp *cmd_comp)
-{
-#ifdef HAVE_IBV_DEVX_ASYNC
-	return mlx5dv_devx_obj_query_async(obj, in, inlen, outlen, wr_id,
-					   cmd_comp);
-#else
-	(void)obj;
-	(void)in;
-	(void)inlen;
-	(void)outlen;
-	(void)wr_id;
-	(void)cmd_comp;
-	return -ENOTSUP;
-#endif
-}
-
-static int
-mlx5_glue_devx_get_async_cmd_comp(struct mlx5dv_devx_cmd_comp *cmd_comp,
-				  struct mlx5dv_devx_async_cmd_hdr *cmd_resp,
-				  size_t cmd_resp_len)
-{
-#ifdef HAVE_IBV_DEVX_ASYNC
-	return mlx5dv_devx_get_async_cmd_comp(cmd_comp, cmd_resp,
-					      cmd_resp_len);
-#else
-	(void)cmd_comp;
-	(void)cmd_resp;
-	(void)cmd_resp_len;
-	return -ENOTSUP;
-#endif
-}
-
-static struct mlx5dv_devx_umem *
-mlx5_glue_devx_umem_reg(struct ibv_context *context, void *addr, size_t size,
-			uint32_t access)
-{
-#ifdef HAVE_IBV_DEVX_OBJ
-	return mlx5dv_devx_umem_reg(context, addr, size, access);
-#else
-	(void)context;
-	(void)addr;
-	(void)size;
-	(void)access;
-	errno = -ENOTSUP;
-	return NULL;
-#endif
-}
-
-static int
-mlx5_glue_devx_umem_dereg(struct mlx5dv_devx_umem *dv_devx_umem)
-{
-#ifdef HAVE_IBV_DEVX_OBJ
-	return mlx5dv_devx_umem_dereg(dv_devx_umem);
-#else
-	(void)dv_devx_umem;
-	return -ENOTSUP;
-#endif
-}
-
-static int
-mlx5_glue_devx_qp_query(struct ibv_qp *qp,
-			const void *in, size_t inlen,
-			void *out, size_t outlen)
-{
-#ifdef HAVE_IBV_DEVX_OBJ
-	return mlx5dv_devx_qp_query(qp, in, inlen, out, outlen);
-#else
-	(void)qp;
-	(void)in;
-	(void)inlen;
-	(void)out;
-	(void)outlen;
-	errno = ENOTSUP;
-	return errno;
-#endif
-}
-
-static int
-mlx5_glue_devx_port_query(struct ibv_context *ctx,
-			  uint32_t port_num,
-			  struct mlx5dv_devx_port *mlx5_devx_port)
-{
-#ifdef HAVE_MLX5DV_DR_DEVX_PORT
-	return mlx5dv_query_devx_port(ctx, port_num, mlx5_devx_port);
-#else
-	(void)ctx;
-	(void)port_num;
-	(void)mlx5_devx_port;
-	errno = ENOTSUP;
-	return errno;
-#endif
-}
-
-static int
-mlx5_glue_dr_dump_domain(FILE *file, void *domain)
-{
-#ifdef HAVE_MLX5_DR_FLOW_DUMP
-	return mlx5dv_dump_dr_domain(file, domain);
-#else
-	RTE_SET_USED(file);
-	RTE_SET_USED(domain);
-	return -ENOTSUP;
-#endif
-}
-
-alignas(RTE_CACHE_LINE_SIZE)
-const struct mlx5_glue *mlx5_glue = &(const struct mlx5_glue){
-	.version = MLX5_GLUE_VERSION,
-	.fork_init = mlx5_glue_fork_init,
-	.alloc_pd = mlx5_glue_alloc_pd,
-	.dealloc_pd = mlx5_glue_dealloc_pd,
-	.get_device_list = mlx5_glue_get_device_list,
-	.free_device_list = mlx5_glue_free_device_list,
-	.open_device = mlx5_glue_open_device,
-	.close_device = mlx5_glue_close_device,
-	.query_device = mlx5_glue_query_device,
-	.query_device_ex = mlx5_glue_query_device_ex,
-	.query_rt_values_ex = mlx5_glue_query_rt_values_ex,
-	.query_port = mlx5_glue_query_port,
-	.create_comp_channel = mlx5_glue_create_comp_channel,
-	.destroy_comp_channel = mlx5_glue_destroy_comp_channel,
-	.create_cq = mlx5_glue_create_cq,
-	.destroy_cq = mlx5_glue_destroy_cq,
-	.get_cq_event = mlx5_glue_get_cq_event,
-	.ack_cq_events = mlx5_glue_ack_cq_events,
-	.create_rwq_ind_table = mlx5_glue_create_rwq_ind_table,
-	.destroy_rwq_ind_table = mlx5_glue_destroy_rwq_ind_table,
-	.create_wq = mlx5_glue_create_wq,
-	.destroy_wq = mlx5_glue_destroy_wq,
-	.modify_wq = mlx5_glue_modify_wq,
-	.create_flow = mlx5_glue_create_flow,
-	.destroy_flow = mlx5_glue_destroy_flow,
-	.destroy_flow_action = mlx5_glue_destroy_flow_action,
-	.create_qp = mlx5_glue_create_qp,
-	.create_qp_ex = mlx5_glue_create_qp_ex,
-	.destroy_qp = mlx5_glue_destroy_qp,
-	.modify_qp = mlx5_glue_modify_qp,
-	.reg_mr = mlx5_glue_reg_mr,
-	.dereg_mr = mlx5_glue_dereg_mr,
-	.create_counter_set = mlx5_glue_create_counter_set,
-	.destroy_counter_set = mlx5_glue_destroy_counter_set,
-	.describe_counter_set = mlx5_glue_describe_counter_set,
-	.query_counter_set = mlx5_glue_query_counter_set,
-	.create_counters = mlx5_glue_create_counters,
-	.destroy_counters = mlx5_glue_destroy_counters,
-	.attach_counters = mlx5_glue_attach_counters,
-	.query_counters = mlx5_glue_query_counters,
-	.ack_async_event = mlx5_glue_ack_async_event,
-	.get_async_event = mlx5_glue_get_async_event,
-	.port_state_str = mlx5_glue_port_state_str,
-	.cq_ex_to_cq = mlx5_glue_cq_ex_to_cq,
-	.dr_create_flow_action_dest_flow_tbl =
-		mlx5_glue_dr_create_flow_action_dest_flow_tbl,
-	.dr_create_flow_action_dest_port =
-		mlx5_glue_dr_create_flow_action_dest_port,
-	.dr_create_flow_action_drop =
-		mlx5_glue_dr_create_flow_action_drop,
-	.dr_create_flow_action_push_vlan =
-		mlx5_glue_dr_create_flow_action_push_vlan,
-	.dr_create_flow_action_pop_vlan =
-		mlx5_glue_dr_create_flow_action_pop_vlan,
-	.dr_create_flow_tbl = mlx5_glue_dr_create_flow_tbl,
-	.dr_destroy_flow_tbl = mlx5_glue_dr_destroy_flow_tbl,
-	.dr_create_domain = mlx5_glue_dr_create_domain,
-	.dr_destroy_domain = mlx5_glue_dr_destroy_domain,
-	.dv_create_cq = mlx5_glue_dv_create_cq,
-	.dv_create_wq = mlx5_glue_dv_create_wq,
-	.dv_query_device = mlx5_glue_dv_query_device,
-	.dv_set_context_attr = mlx5_glue_dv_set_context_attr,
-	.dv_init_obj = mlx5_glue_dv_init_obj,
-	.dv_create_qp = mlx5_glue_dv_create_qp,
-	.dv_create_flow_matcher = mlx5_glue_dv_create_flow_matcher,
-	.dv_create_flow = mlx5_glue_dv_create_flow,
-	.dv_create_flow_action_counter =
-		mlx5_glue_dv_create_flow_action_counter,
-	.dv_create_flow_action_dest_ibv_qp =
-		mlx5_glue_dv_create_flow_action_dest_ibv_qp,
-	.dv_create_flow_action_dest_devx_tir =
-		mlx5_glue_dv_create_flow_action_dest_devx_tir,
-	.dv_create_flow_action_modify_header =
-		mlx5_glue_dv_create_flow_action_modify_header,
-	.dv_create_flow_action_packet_reformat =
-		mlx5_glue_dv_create_flow_action_packet_reformat,
-	.dv_create_flow_action_tag =  mlx5_glue_dv_create_flow_action_tag,
-	.dv_create_flow_action_meter = mlx5_glue_dv_create_flow_action_meter,
-	.dv_modify_flow_action_meter = mlx5_glue_dv_modify_flow_action_meter,
-	.dv_destroy_flow = mlx5_glue_dv_destroy_flow,
-	.dv_destroy_flow_matcher = mlx5_glue_dv_destroy_flow_matcher,
-	.dv_open_device = mlx5_glue_dv_open_device,
-	.devx_obj_create = mlx5_glue_devx_obj_create,
-	.devx_obj_destroy = mlx5_glue_devx_obj_destroy,
-	.devx_obj_query = mlx5_glue_devx_obj_query,
-	.devx_obj_modify = mlx5_glue_devx_obj_modify,
-	.devx_general_cmd = mlx5_glue_devx_general_cmd,
-	.devx_create_cmd_comp = mlx5_glue_devx_create_cmd_comp,
-	.devx_destroy_cmd_comp = mlx5_glue_devx_destroy_cmd_comp,
-	.devx_obj_query_async = mlx5_glue_devx_obj_query_async,
-	.devx_get_async_cmd_comp = mlx5_glue_devx_get_async_cmd_comp,
-	.devx_umem_reg = mlx5_glue_devx_umem_reg,
-	.devx_umem_dereg = mlx5_glue_devx_umem_dereg,
-	.devx_qp_query = mlx5_glue_devx_qp_query,
-	.devx_port_query = mlx5_glue_devx_port_query,
-	.dr_dump_domain = mlx5_glue_dr_dump_domain,
-};
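For readers following the removed file: every wrapper above has the same shape. It forwards to the rdma-core call when the matching HAVE_* macro is defined; otherwise it marks its arguments unused and reports ENOTSUP. The wrappers are then published through a single const table of function pointers. The following is a minimal standalone sketch of that pattern, not the driver code. The names HAVE_DEMO_FEATURE, demo_glue, demo_glue_create and demo_glue_destroy are hypothetical and exist in neither mlx5 nor rdma-core.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical table type, standing in for struct mlx5_glue. */
struct demo_glue {
	const char *version;
	void *(*create)(int value);
	int (*destroy)(void *obj);
};

/* Pointer-returning wrapper: set errno and return NULL when unsupported. */
static void *
demo_glue_create(int value)
{
#ifdef HAVE_DEMO_FEATURE
	static int slot;

	slot = value;
	return &slot;
#else
	(void)value;
	errno = ENOTSUP;
	return NULL;
#endif
}

/* Integer-returning wrapper: return a negative errno value when unsupported. */
static int
demo_glue_destroy(void *obj)
{
#ifdef HAVE_DEMO_FEATURE
	assert(obj != NULL);
	return 0;
#else
	(void)obj;
	return -ENOTSUP;
#endif
}

/* Single const table; callers only ever go through this pointer. */
const struct demo_glue *demo_glue = &(const struct demo_glue){
	.version = "demo",
	.create = demo_glue_create,
	.destroy = demo_glue_destroy,
};
```

A caller would write demo_glue->create(7) and check for NULL plus errno, or check demo_glue->destroy(obj) for a non-zero return. The removed file is not uniform on the sign convention: some integer wrappers return the positive errno value, others return -ENOTSUP, and a few pointer wrappers store -ENOTSUP in errno. Callers therefore should not rely on the sign.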
diff --git a/drivers/net/mlx5/mlx5_glue.h b/drivers/net/mlx5/mlx5_glue.h
deleted file mode 100644
index 6771a18..0000000
--- a/drivers/net/mlx5/mlx5_glue.h
+++ /dev/null
@@ -1,264 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause
- * Copyright 2018 6WIND S.A.
- * Copyright 2018 Mellanox Technologies, Ltd
- */
-
-#ifndef MLX5_GLUE_H_
-#define MLX5_GLUE_H_
-
-#include <stddef.h>
-#include <stdint.h>
-
-#include "rte_byteorder.h"
-
-/* Verbs headers do not support -pedantic. */
-#ifdef PEDANTIC
-#pragma GCC diagnostic ignored "-Wpedantic"
-#endif
-#include <infiniband/mlx5dv.h>
-#include <infiniband/verbs.h>
-#ifdef PEDANTIC
-#pragma GCC diagnostic error "-Wpedantic"
-#endif
-
-#ifndef MLX5_GLUE_VERSION
-#define MLX5_GLUE_VERSION ""
-#endif
-
-#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V42
-struct ibv_counter_set;
-struct ibv_counter_set_data;
-struct ibv_counter_set_description;
-struct ibv_counter_set_init_attr;
-struct ibv_query_counter_set_attr;
-#endif
-
-#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_V45
-struct ibv_counters;
-struct ibv_counters_init_attr;
-struct ibv_counter_attach_attr;
-#endif
-
-#ifndef HAVE_IBV_DEVICE_TUNNEL_SUPPORT
-struct mlx5dv_qp_init_attr;
-#endif
-
-#ifndef HAVE_IBV_DEVICE_STRIDING_RQ_SUPPORT
-struct mlx5dv_wq_init_attr;
-#endif
-
-#ifndef HAVE_IBV_FLOW_DV_SUPPORT
-struct mlx5dv_flow_matcher;
-struct mlx5dv_flow_matcher_attr;
-struct mlx5dv_flow_action_attr;
-struct mlx5dv_flow_match_parameters;
-struct mlx5dv_dr_flow_meter_attr;
-struct ibv_flow_action;
-enum mlx5dv_flow_action_packet_reformat_type { packet_reformat_type = 0, };
-enum mlx5dv_flow_table_type { flow_table_type = 0, };
-#endif
-
-#ifndef HAVE_IBV_FLOW_DEVX_COUNTERS
-#define MLX5DV_FLOW_ACTION_COUNTERS_DEVX 0
-#endif
-
-#ifndef HAVE_IBV_DEVX_OBJ
-struct mlx5dv_devx_obj;
-struct mlx5dv_devx_umem { uint32_t umem_id; };
-#endif
-
-#ifndef HAVE_IBV_DEVX_ASYNC
-struct mlx5dv_devx_cmd_comp;
-struct mlx5dv_devx_async_cmd_hdr;
-#endif
-
-#ifndef HAVE_MLX5DV_DR
-enum  mlx5dv_dr_domain_type { unused, };
-struct mlx5dv_dr_domain;
-#endif
-
-#ifndef HAVE_MLX5DV_DR_DEVX_PORT
-struct mlx5dv_devx_port;
-#endif
-
-#ifndef HAVE_MLX5_DR_CREATE_ACTION_FLOW_METER
-struct mlx5dv_dr_flow_meter_attr;
-#endif
-
-/* LIB_GLUE_VERSION must be updated every time this structure is modified. */
-struct mlx5_glue {
-	const char *version;
-	int (*fork_init)(void);
-	struct ibv_pd *(*alloc_pd)(struct ibv_context *context);
-	int (*dealloc_pd)(struct ibv_pd *pd);
-	struct ibv_device **(*get_device_list)(int *num_devices);
-	void (*free_device_list)(struct ibv_device **list);
-	struct ibv_context *(*open_device)(struct ibv_device *device);
-	int (*close_device)(struct ibv_context *context);
-	int (*query_device)(struct ibv_context *context,
-			    struct ibv_device_attr *device_attr);
-	int (*query_device_ex)(struct ibv_context *context,
-			       const struct ibv_query_device_ex_input *input,
-			       struct ibv_device_attr_ex *attr);
-	int (*query_rt_values_ex)(struct ibv_context *context,
-			       struct ibv_values_ex *values);
-	int (*query_port)(struct ibv_context *context, uint8_t port_num,
-			  struct ibv_port_attr *port_attr);
-	struct ibv_comp_channel *(*create_comp_channel)
-		(struct ibv_context *context);
-	int (*destroy_comp_channel)(struct ibv_comp_channel *channel);
-	struct ibv_cq *(*create_cq)(struct ibv_context *context, int cqe,
-				    void *cq_context,
-				    struct ibv_comp_channel *channel,
-				    int comp_vector);
-	int (*destroy_cq)(struct ibv_cq *cq);
-	int (*get_cq_event)(struct ibv_comp_channel *channel,
-			    struct ibv_cq **cq, void **cq_context);
-	void (*ack_cq_events)(struct ibv_cq *cq, unsigned int nevents);
-	struct ibv_rwq_ind_table *(*create_rwq_ind_table)
-		(struct ibv_context *context,
-		 struct ibv_rwq_ind_table_init_attr *init_attr);
-	int (*destroy_rwq_ind_table)(struct ibv_rwq_ind_table *rwq_ind_table);
-	struct ibv_wq *(*create_wq)(struct ibv_context *context,
-				    struct ibv_wq_init_attr *wq_init_attr);
-	int (*destroy_wq)(struct ibv_wq *wq);
-	int (*modify_wq)(struct ibv_wq *wq, struct ibv_wq_attr *wq_attr);
-	struct ibv_flow *(*create_flow)(struct ibv_qp *qp,
-					struct ibv_flow_attr *flow);
-	int (*destroy_flow)(struct ibv_flow *flow_id);
-	int (*destroy_flow_action)(void *action);
-	struct ibv_qp *(*create_qp)(struct ibv_pd *pd,
-				    struct ibv_qp_init_attr *qp_init_attr);
-	struct ibv_qp *(*create_qp_ex)
-		(struct ibv_context *context,
-		 struct ibv_qp_init_attr_ex *qp_init_attr_ex);
-	int (*destroy_qp)(struct ibv_qp *qp);
-	int (*modify_qp)(struct ibv_qp *qp, struct ibv_qp_attr *attr,
-			 int attr_mask);
-	struct ibv_mr *(*reg_mr)(struct ibv_pd *pd, void *addr,
-				 size_t length, int access);
-	int (*dereg_mr)(struct ibv_mr *mr);
-	struct ibv_counter_set *(*create_counter_set)
-		(struct ibv_context *context,
-		 struct ibv_counter_set_init_attr *init_attr);
-	int (*destroy_counter_set)(struct ibv_counter_set *cs);
-	int (*describe_counter_set)
-		(struct ibv_context *context,
-		 uint16_t counter_set_id,
-		 struct ibv_counter_set_description *cs_desc);
-	int (*query_counter_set)(struct ibv_query_counter_set_attr *query_attr,
-				 struct ibv_counter_set_data *cs_data);
-	struct ibv_counters *(*create_counters)
-		(struct ibv_context *context,
-		 struct ibv_counters_init_attr *init_attr);
-	int (*destroy_counters)(struct ibv_counters *counters);
-	int (*attach_counters)(struct ibv_counters *counters,
-			       struct ibv_counter_attach_attr *attr,
-			       struct ibv_flow *flow);
-	int (*query_counters)(struct ibv_counters *counters,
-			      uint64_t *counters_value,
-			      uint32_t ncounters,
-			      uint32_t flags);
-	void (*ack_async_event)(struct ibv_async_event *event);
-	int (*get_async_event)(struct ibv_context *context,
-			       struct ibv_async_event *event);
-	const char *(*port_state_str)(enum ibv_port_state port_state);
-	struct ibv_cq *(*cq_ex_to_cq)(struct ibv_cq_ex *cq);
-	void *(*dr_create_flow_action_dest_flow_tbl)(void *tbl);
-	void *(*dr_create_flow_action_dest_port)(void *domain,
-						 uint32_t port);
-	void *(*dr_create_flow_action_drop)();
-	void *(*dr_create_flow_action_push_vlan)
-					(struct mlx5dv_dr_domain *domain,
-					 rte_be32_t vlan_tag);
-	void *(*dr_create_flow_action_pop_vlan)();
-	void *(*dr_create_flow_tbl)(void *domain, uint32_t level);
-	int (*dr_destroy_flow_tbl)(void *tbl);
-	void *(*dr_create_domain)(struct ibv_context *ctx,
-				  enum mlx5dv_dr_domain_type domain);
-	int (*dr_destroy_domain)(void *domain);
-	struct ibv_cq_ex *(*dv_create_cq)
-		(struct ibv_context *context,
-		 struct ibv_cq_init_attr_ex *cq_attr,
-		 struct mlx5dv_cq_init_attr *mlx5_cq_attr);
-	struct ibv_wq *(*dv_create_wq)
-		(struct ibv_context *context,
-		 struct ibv_wq_init_attr *wq_attr,
-		 struct mlx5dv_wq_init_attr *mlx5_wq_attr);
-	int (*dv_query_device)(struct ibv_context *ctx_in,
-			       struct mlx5dv_context *attrs_out);
-	int (*dv_set_context_attr)(struct ibv_context *ibv_ctx,
-				   enum mlx5dv_set_ctx_attr_type type,
-				   void *attr);
-	int (*dv_init_obj)(struct mlx5dv_obj *obj, uint64_t obj_type);
-	struct ibv_qp *(*dv_create_qp)
-		(struct ibv_context *context,
-		 struct ibv_qp_init_attr_ex *qp_init_attr_ex,
-		 struct mlx5dv_qp_init_attr *dv_qp_init_attr);
-	void *(*dv_create_flow_matcher)
-		(struct ibv_context *context,
-		 struct mlx5dv_flow_matcher_attr *matcher_attr,
-		 void *tbl);
-	void *(*dv_create_flow)(void *matcher, void *match_value,
-			  size_t num_actions, void *actions[]);
-	void *(*dv_create_flow_action_counter)(void *obj, uint32_t  offset);
-	void *(*dv_create_flow_action_dest_ibv_qp)(void *qp);
-	void *(*dv_create_flow_action_dest_devx_tir)(void *tir);
-	void *(*dv_create_flow_action_modify_header)
-		(struct ibv_context *ctx, enum mlx5dv_flow_table_type ft_type,
-		 void *domain, uint64_t flags, size_t actions_sz,
-		 uint64_t actions[]);
-	void *(*dv_create_flow_action_packet_reformat)
-		(struct ibv_context *ctx,
-		 enum mlx5dv_flow_action_packet_reformat_type reformat_type,
-		 enum mlx5dv_flow_table_type ft_type,
-		 struct mlx5dv_dr_domain *domain,
-		 uint32_t flags, size_t data_sz, void *data);
-	void *(*dv_create_flow_action_tag)(uint32_t tag);
-	void *(*dv_create_flow_action_meter)
-		(struct mlx5dv_dr_flow_meter_attr *attr);
-	int (*dv_modify_flow_action_meter)(void *action,
-		struct mlx5dv_dr_flow_meter_attr *attr, uint64_t modify_bits);
-	int (*dv_destroy_flow)(void *flow);
-	int (*dv_destroy_flow_matcher)(void *matcher);
-	struct ibv_context *(*dv_open_device)(struct ibv_device *device);
-	struct mlx5dv_devx_obj *(*devx_obj_create)
-					(struct ibv_context *ctx,
-					 const void *in, size_t inlen,
-					 void *out, size_t outlen);
-	int (*devx_obj_destroy)(struct mlx5dv_devx_obj *obj);
-	int (*devx_obj_query)(struct mlx5dv_devx_obj *obj,
-			      const void *in, size_t inlen,
-			      void *out, size_t outlen);
-	int (*devx_obj_modify)(struct mlx5dv_devx_obj *obj,
-			       const void *in, size_t inlen,
-			       void *out, size_t outlen);
-	int (*devx_general_cmd)(struct ibv_context *context,
-				const void *in, size_t inlen,
-				void *out, size_t outlen);
-	struct mlx5dv_devx_cmd_comp *(*devx_create_cmd_comp)
-					(struct ibv_context *context);
-	void (*devx_destroy_cmd_comp)(struct mlx5dv_devx_cmd_comp *cmd_comp);
-	int (*devx_obj_query_async)(struct mlx5dv_devx_obj *obj,
-				    const void *in, size_t inlen,
-				    size_t outlen, uint64_t wr_id,
-				    struct mlx5dv_devx_cmd_comp *cmd_comp);
-	int (*devx_get_async_cmd_comp)(struct mlx5dv_devx_cmd_comp *cmd_comp,
-				       struct mlx5dv_devx_async_cmd_hdr *resp,
-				       size_t cmd_resp_len);
-	struct mlx5dv_devx_umem *(*devx_umem_reg)(struct ibv_context *context,
-						  void *addr, size_t size,
-						  uint32_t access);
-	int (*devx_umem_dereg)(struct mlx5dv_devx_umem *dv_devx_umem);
-	int (*devx_qp_query)(struct ibv_qp *qp,
-			     const void *in, size_t inlen,
-			     void *out, size_t outlen);
-	int (*devx_port_query)(struct ibv_context *ctx,
-			       uint32_t port_num,
-			       struct mlx5dv_devx_port *mlx5_devx_port);
-	int (*dr_dump_domain)(FILE *file, void *domain);
-};
-
-const struct mlx5_glue *mlx5_glue;
-
-#endif /* MLX5_GLUE_H_ */
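A side note on the last declaration in the removed header: "const struct mlx5_glue *mlx5_glue;" carries no extern, so each translation unit that includes the header gets a tentative definition. This links only when the toolchain merges common symbols; GCC 10 and later default to -fno-common, under which it would be expected to fail with multiple-definition errors. The sketch below shows the usual declaration/definition split in a single file, under the assumption that the shared header would carry only the extern line. The names sketch_glue and sketch_glue_version are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table type, standing in for struct mlx5_glue. */
struct sketch_glue {
	const char *version;
};

/* What a header would carry: a declaration only, no storage. */
extern const struct sketch_glue *sketch_glue;

/* What exactly one .c file would carry: the definition. */
const struct sketch_glue *sketch_glue = &(const struct sketch_glue){
	.version = "demo",
};

/* Small accessor so the table pointer is checked before use. */
const char *
sketch_glue_version(void)
{
	assert(sketch_glue != NULL);
	return sketch_glue->version;
}
```

With this split, any number of source files can include the header, and the linker sees exactly one definition of the pointer.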
diff --git a/drivers/net/mlx5/mlx5_mac.c b/drivers/net/mlx5/mlx5_mac.c
index 7bdaa2a..a646b90 100644
--- a/drivers/net/mlx5/mlx5_mac.c
+++ b/drivers/net/mlx5/mlx5_mac.c
@@ -27,10 +27,10 @@
 #include <rte_ethdev_driver.h>
 #include <rte_common.h>
 
+#include "mlx5_defs.h"
 #include "mlx5.h"
 #include "mlx5_utils.h"
 #include "mlx5_rxtx.h"
-#include "mlx5_defs.h"
 
 /**
  * Get MAC address by querying netdevice.
diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index 0d549b6..b1cd9f7 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -17,10 +17,11 @@
 #include <rte_rwlock.h>
 #include <rte_bus_pci.h>
 
+#include <mlx5_glue.h>
+
 #include "mlx5.h"
 #include "mlx5_mr.h"
 #include "mlx5_rxtx.h"
-#include "mlx5_glue.h"
 
 struct mr_find_contig_memsegs_data {
 	uintptr_t addr;
diff --git a/drivers/net/mlx5/mlx5_prm.h b/drivers/net/mlx5/mlx5_prm.h
deleted file mode 100644
index 8a67025..0000000
--- a/drivers/net/mlx5/mlx5_prm.h
+++ /dev/null
@@ -1,1888 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause
- * Copyright 2016 6WIND S.A.
- * Copyright 2016 Mellanox Technologies, Ltd
- */
-
-#ifndef RTE_PMD_MLX5_PRM_H_
-#define RTE_PMD_MLX5_PRM_H_
-
-#include <assert.h>
-
-/* Verbs header. */
-/* ISO C doesn't support unnamed structs/unions, disabling -pedantic. */
-#ifdef PEDANTIC
-#pragma GCC diagnostic ignored "-Wpedantic"
-#endif
-#include <infiniband/mlx5dv.h>
-#ifdef PEDANTIC
-#pragma GCC diagnostic error "-Wpedantic"
-#endif
-
-#include <rte_vect.h>
-#include "mlx5_autoconf.h"
-
-/* RSS hash key size. */
-#define MLX5_RSS_HASH_KEY_LEN 40
-
-/* Get CQE owner bit. */
-#define MLX5_CQE_OWNER(op_own) ((op_own) & MLX5_CQE_OWNER_MASK)
-
-/* Get CQE format. */
-#define MLX5_CQE_FORMAT(op_own) (((op_own) & MLX5E_CQE_FORMAT_MASK) >> 2)
-
-/* Get CQE opcode. */
-#define MLX5_CQE_OPCODE(op_own) (((op_own) & 0xf0) >> 4)
-
-/* Get CQE solicited event. */
-#define MLX5_CQE_SE(op_own) (((op_own) >> 1) & 1)
-
-/* Invalidate a CQE. */
-#define MLX5_CQE_INVALIDATE (MLX5_CQE_INVALID << 4)
-
-/* WQE Segment sizes in bytes. */
-#define MLX5_WSEG_SIZE 16u
-#define MLX5_WQE_CSEG_SIZE sizeof(struct mlx5_wqe_cseg)
-#define MLX5_WQE_DSEG_SIZE sizeof(struct mlx5_wqe_dseg)
-#define MLX5_WQE_ESEG_SIZE sizeof(struct mlx5_wqe_eseg)
-
-/* WQE/WQEBB size in bytes. */
-#define MLX5_WQE_SIZE sizeof(struct mlx5_wqe)
-
-/*
- * Max size of a WQE session.
- * Absolute maximum size is 63 (MLX5_DSEG_MAX) segments,
- * the WQE size field in Control Segment is 6 bits wide.
- */
-#define MLX5_WQE_SIZE_MAX (60 * MLX5_WSEG_SIZE)
-
-/*
- * Default minimum number of Tx queues for inlining packets.
- * If there are less queues as specified we assume we have
- * no enough CPU resources (cycles) to perform inlining,
- * the PCIe throughput is not supposed as bottleneck and
- * inlining is disabled.
- */
-#define MLX5_INLINE_MAX_TXQS 8u
-#define MLX5_INLINE_MAX_TXQS_BLUEFIELD 16u
-
-/*
- * Default packet length threshold to be inlined with
- * enhanced MPW. If packet length exceeds the threshold
- * the data are not inlined. Should be aligned in WQEBB
- * boundary with accounting the title Control and Ethernet
- * segments.
- */
-#define MLX5_EMPW_DEF_INLINE_LEN (4u * MLX5_WQE_SIZE + \
-				  MLX5_DSEG_MIN_INLINE_SIZE)
-/*
- * Maximal inline data length sent with enhanced MPW.
- * Is based on maximal WQE size.
- */
-#define MLX5_EMPW_MAX_INLINE_LEN (MLX5_WQE_SIZE_MAX - \
-				  MLX5_WQE_CSEG_SIZE - \
-				  MLX5_WQE_ESEG_SIZE - \
-				  MLX5_WQE_DSEG_SIZE + \
-				  MLX5_DSEG_MIN_INLINE_SIZE)
-/*
- * Minimal amount of packets to be sent with EMPW.
- * This limits the minimal required size of sent EMPW.
- * If there are no enough resources to built minimal
- * EMPW the sending loop exits.
- */
-#define MLX5_EMPW_MIN_PACKETS (2u + 3u * 4u)
-/*
- * Maximal amount of packets to be sent with EMPW.
- * This value is not recommended to exceed MLX5_TX_COMP_THRESH,
- * otherwise there might be up to MLX5_EMPW_MAX_PACKETS mbufs
- * without CQE generation request, being multiplied by
- * MLX5_TX_COMP_MAX_CQE it may cause significant latency
- * in tx burst routine at the moment of freeing multiple mbufs.
- */
-#define MLX5_EMPW_MAX_PACKETS MLX5_TX_COMP_THRESH
-#define MLX5_MPW_MAX_PACKETS 6
-#define MLX5_MPW_INLINE_MAX_PACKETS 2
-
-/*
- * Default packet length threshold to be inlined with
- * ordinary SEND. Inlining saves the MR key search
- * and extra PCIe data fetch transaction, but eats the
- * CPU cycles.
- */
-#define MLX5_SEND_DEF_INLINE_LEN (5U * MLX5_WQE_SIZE + \
-				  MLX5_ESEG_MIN_INLINE_SIZE - \
-				  MLX5_WQE_CSEG_SIZE - \
-				  MLX5_WQE_ESEG_SIZE - \
-				  MLX5_WQE_DSEG_SIZE)
-/*
- * Maximal inline data length sent with ordinary SEND.
- * Is based on maximal WQE size.
- */
-#define MLX5_SEND_MAX_INLINE_LEN (MLX5_WQE_SIZE_MAX - \
-				  MLX5_WQE_CSEG_SIZE - \
-				  MLX5_WQE_ESEG_SIZE - \
-				  MLX5_WQE_DSEG_SIZE + \
-				  MLX5_ESEG_MIN_INLINE_SIZE)
-
-/* Missed in mlv5dv.h, should define here. */
-#define MLX5_OPCODE_ENHANCED_MPSW 0x29u
-
-/* CQE value to inform that VLAN is stripped. */
-#define MLX5_CQE_VLAN_STRIPPED (1u << 0)
-
-/* IPv4 options. */
-#define MLX5_CQE_RX_IP_EXT_OPTS_PACKET (1u << 1)
-
-/* IPv6 packet. */
-#define MLX5_CQE_RX_IPV6_PACKET (1u << 2)
-
-/* IPv4 packet. */
-#define MLX5_CQE_RX_IPV4_PACKET (1u << 3)
-
-/* TCP packet. */
-#define MLX5_CQE_RX_TCP_PACKET (1u << 4)
-
-/* UDP packet. */
-#define MLX5_CQE_RX_UDP_PACKET (1u << 5)
-
-/* IP is fragmented. */
-#define MLX5_CQE_RX_IP_FRAG_PACKET (1u << 7)
-
-/* L2 header is valid. */
-#define MLX5_CQE_RX_L2_HDR_VALID (1u << 8)
-
-/* L3 header is valid. */
-#define MLX5_CQE_RX_L3_HDR_VALID (1u << 9)
-
-/* L4 header is valid. */
-#define MLX5_CQE_RX_L4_HDR_VALID (1u << 10)
-
-/* Outer packet, 0 IPv4, 1 IPv6. */
-#define MLX5_CQE_RX_OUTER_PACKET (1u << 1)
-
-/* Tunnel packet bit in the CQE. */
-#define MLX5_CQE_RX_TUNNEL_PACKET (1u << 0)
-
-/* Mask for LRO push flag in the CQE lro_tcppsh_abort_dupack field. */
-#define MLX5_CQE_LRO_PUSH_MASK 0x40
-
-/* Mask for L4 type in the CQE hdr_type_etc field. */
-#define MLX5_CQE_L4_TYPE_MASK 0x70
-
-/* The bit index of L4 type in CQE hdr_type_etc field. */
-#define MLX5_CQE_L4_TYPE_SHIFT 0x4
-
-/* L4 type to indicate TCP packet without acknowledgment. */
-#define MLX5_L4_HDR_TYPE_TCP_EMPTY_ACK 0x3
-
-/* L4 type to indicate TCP packet with acknowledgment. */
-#define MLX5_L4_HDR_TYPE_TCP_WITH_ACL 0x4
-
-/* Inner L3 checksum offload (Tunneled packets only). */
-#define MLX5_ETH_WQE_L3_INNER_CSUM (1u << 4)
-
-/* Inner L4 checksum offload (Tunneled packets only). */
-#define MLX5_ETH_WQE_L4_INNER_CSUM (1u << 5)
-
-/* Outer L4 type is TCP. */
-#define MLX5_ETH_WQE_L4_OUTER_TCP  (0u << 5)
-
-/* Outer L4 type is UDP. */
-#define MLX5_ETH_WQE_L4_OUTER_UDP  (1u << 5)
-
-/* Outer L3 type is IPV4. */
-#define MLX5_ETH_WQE_L3_OUTER_IPV4 (0u << 4)
-
-/* Outer L3 type is IPV6. */
-#define MLX5_ETH_WQE_L3_OUTER_IPV6 (1u << 4)
-
-/* Inner L4 type is TCP. */
-#define MLX5_ETH_WQE_L4_INNER_TCP (0u << 1)
-
-/* Inner L4 type is UDP. */
-#define MLX5_ETH_WQE_L4_INNER_UDP (1u << 1)
-
-/* Inner L3 type is IPV4. */
-#define MLX5_ETH_WQE_L3_INNER_IPV4 (0u << 0)
-
-/* Inner L3 type is IPV6. */
-#define MLX5_ETH_WQE_L3_INNER_IPV6 (1u << 0)
-
-/* VLAN insertion flag. */
-#define MLX5_ETH_WQE_VLAN_INSERT (1u << 31)
-
-/* Data inline segment flag. */
-#define MLX5_ETH_WQE_DATA_INLINE (1u << 31)
-
-/* Is flow mark valid. */
-#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
-#define MLX5_FLOW_MARK_IS_VALID(val) ((val) & 0xffffff00)
-#else
-#define MLX5_FLOW_MARK_IS_VALID(val) ((val) & 0xffffff)
-#endif
-
-/* INVALID is used by packets matching no flow rules. */
-#define MLX5_FLOW_MARK_INVALID 0
-
-/* Maximum allowed value to mark a packet. */
-#define MLX5_FLOW_MARK_MAX 0xfffff0
-
-/* Default mark value used when none is provided. */
-#define MLX5_FLOW_MARK_DEFAULT 0xffffff
-
-/* Default mark mask for metadata legacy mode. */
-#define MLX5_FLOW_MARK_MASK 0xffffff
-
-/* Maximum number of DS in WQE. Limited by 6-bit field. */
-#define MLX5_DSEG_MAX 63
-
-/* The completion mode offset in the WQE control segment line 2. */
-#define MLX5_COMP_MODE_OFFSET 2
-
-/* Amount of data bytes in minimal inline data segment. */
-#define MLX5_DSEG_MIN_INLINE_SIZE 12u
-
-/* Amount of data bytes in minimal inline eth segment. */
-#define MLX5_ESEG_MIN_INLINE_SIZE 18u
-
-/* Amount of data bytes after eth data segment. */
-#define MLX5_ESEG_EXTRA_DATA_SIZE 32u
-
-/* The maximum log value of segments per RQ WQE. */
-#define MLX5_MAX_LOG_RQ_SEGS 5u
-
-/* The alignment needed for WQ buffer. */
-#define MLX5_WQE_BUF_ALIGNMENT 512
-
-/* Completion mode. */
-enum mlx5_completion_mode {
-	MLX5_COMP_ONLY_ERR = 0x0,
-	MLX5_COMP_ONLY_FIRST_ERR = 0x1,
-	MLX5_COMP_ALWAYS = 0x2,
-	MLX5_COMP_CQE_AND_EQE = 0x3,
-};
-
-/* MPW mode. */
-enum mlx5_mpw_mode {
-	MLX5_MPW_DISABLED,
-	MLX5_MPW,
-	MLX5_MPW_ENHANCED, /* Enhanced Multi-Packet Send WQE, a.k.a MPWv2. */
-};
-
-/* WQE Control segment. */
-struct mlx5_wqe_cseg {
-	uint32_t opcode;
-	uint32_t sq_ds;
-	uint32_t flags;
-	uint32_t misc;
-} __rte_packed __rte_aligned(MLX5_WSEG_SIZE);
-
-/* Header of data segment. Minimal size Data Segment */
-struct mlx5_wqe_dseg {
-	uint32_t bcount;
-	union {
-		uint8_t inline_data[MLX5_DSEG_MIN_INLINE_SIZE];
-		struct {
-			uint32_t lkey;
-			uint64_t pbuf;
-		} __rte_packed;
-	};
-} __rte_packed;
-
-/* Subset of struct WQE Ethernet Segment. */
-struct mlx5_wqe_eseg {
-	union {
-		struct {
-			uint32_t swp_offs;
-			uint8_t	cs_flags;
-			uint8_t	swp_flags;
-			uint16_t mss;
-			uint32_t metadata;
-			uint16_t inline_hdr_sz;
-			union {
-				uint16_t inline_data;
-				uint16_t vlan_tag;
-			};
-		} __rte_packed;
-		struct {
-			uint32_t offsets;
-			uint32_t flags;
-			uint32_t flow_metadata;
-			uint32_t inline_hdr;
-		} __rte_packed;
-	};
-} __rte_packed;
-
-/* The title WQEBB, header of WQE. */
-struct mlx5_wqe {
-	union {
-		struct mlx5_wqe_cseg cseg;
-		uint32_t ctrl[4];
-	};
-	struct mlx5_wqe_eseg eseg;
-	union {
-		struct mlx5_wqe_dseg dseg[2];
-		uint8_t data[MLX5_ESEG_EXTRA_DATA_SIZE];
-	};
-} __rte_packed;
-
-/* WQE for Multi-Packet RQ. */
-struct mlx5_wqe_mprq {
-	struct mlx5_wqe_srq_next_seg next_seg;
-	struct mlx5_wqe_data_seg dseg;
-};
-
-#define MLX5_MPRQ_LEN_MASK 0x000ffff
-#define MLX5_MPRQ_LEN_SHIFT 0
-#define MLX5_MPRQ_STRIDE_NUM_MASK 0x3fff0000
-#define MLX5_MPRQ_STRIDE_NUM_SHIFT 16
-#define MLX5_MPRQ_FILLER_MASK 0x80000000
-#define MLX5_MPRQ_FILLER_SHIFT 31
-
-#define MLX5_MPRQ_STRIDE_SHIFT_BYTE 2
-
-/* CQ element structure - should be equal to the cache line size */
-struct mlx5_cqe {
-#if (RTE_CACHE_LINE_SIZE == 128)
-	uint8_t padding[64];
-#endif
-	uint8_t pkt_info;
-	uint8_t rsvd0;
-	uint16_t wqe_id;
-	uint8_t lro_tcppsh_abort_dupack;
-	uint8_t lro_min_ttl;
-	uint16_t lro_tcp_win;
-	uint32_t lro_ack_seq_num;
-	uint32_t rx_hash_res;
-	uint8_t rx_hash_type;
-	uint8_t rsvd1[3];
-	uint16_t csum;
-	uint8_t rsvd2[6];
-	uint16_t hdr_type_etc;
-	uint16_t vlan_info;
-	uint8_t lro_num_seg;
-	uint8_t rsvd3[3];
-	uint32_t flow_table_metadata;
-	uint8_t rsvd4[4];
-	uint32_t byte_cnt;
-	uint64_t timestamp;
-	uint32_t sop_drop_qpn;
-	uint16_t wqe_counter;
-	uint8_t rsvd5;
-	uint8_t op_own;
-};
-
-/* Adding direct verbs to data-path. */
-
-/* CQ sequence number mask. */
-#define MLX5_CQ_SQN_MASK 0x3
-
-/* CQ sequence number index. */
-#define MLX5_CQ_SQN_OFFSET 28
-
-/* CQ doorbell index mask. */
-#define MLX5_CI_MASK 0xffffff
-
-/* CQ doorbell offset. */
-#define MLX5_CQ_ARM_DB 1
-
-/* CQ doorbell offset*/
-#define MLX5_CQ_DOORBELL 0x20
-
-/* CQE format value. */
-#define MLX5_COMPRESSED 0x3
-
-/* Action type of header modification. */
-enum {
-	MLX5_MODIFICATION_TYPE_SET = 0x1,
-	MLX5_MODIFICATION_TYPE_ADD = 0x2,
-	MLX5_MODIFICATION_TYPE_COPY = 0x3,
-};
-
-/* The field of packet to be modified. */
-enum mlx5_modification_field {
-	MLX5_MODI_OUT_NONE = -1,
-	MLX5_MODI_OUT_SMAC_47_16 = 1,
-	MLX5_MODI_OUT_SMAC_15_0,
-	MLX5_MODI_OUT_ETHERTYPE,
-	MLX5_MODI_OUT_DMAC_47_16,
-	MLX5_MODI_OUT_DMAC_15_0,
-	MLX5_MODI_OUT_IP_DSCP,
-	MLX5_MODI_OUT_TCP_FLAGS,
-	MLX5_MODI_OUT_TCP_SPORT,
-	MLX5_MODI_OUT_TCP_DPORT,
-	MLX5_MODI_OUT_IPV4_TTL,
-	MLX5_MODI_OUT_UDP_SPORT,
-	MLX5_MODI_OUT_UDP_DPORT,
-	MLX5_MODI_OUT_SIPV6_127_96,
-	MLX5_MODI_OUT_SIPV6_95_64,
-	MLX5_MODI_OUT_SIPV6_63_32,
-	MLX5_MODI_OUT_SIPV6_31_0,
-	MLX5_MODI_OUT_DIPV6_127_96,
-	MLX5_MODI_OUT_DIPV6_95_64,
-	MLX5_MODI_OUT_DIPV6_63_32,
-	MLX5_MODI_OUT_DIPV6_31_0,
-	MLX5_MODI_OUT_SIPV4,
-	MLX5_MODI_OUT_DIPV4,
-	MLX5_MODI_OUT_FIRST_VID,
-	MLX5_MODI_IN_SMAC_47_16 = 0x31,
-	MLX5_MODI_IN_SMAC_15_0,
-	MLX5_MODI_IN_ETHERTYPE,
-	MLX5_MODI_IN_DMAC_47_16,
-	MLX5_MODI_IN_DMAC_15_0,
-	MLX5_MODI_IN_IP_DSCP,
-	MLX5_MODI_IN_TCP_FLAGS,
-	MLX5_MODI_IN_TCP_SPORT,
-	MLX5_MODI_IN_TCP_DPORT,
-	MLX5_MODI_IN_IPV4_TTL,
-	MLX5_MODI_IN_UDP_SPORT,
-	MLX5_MODI_IN_UDP_DPORT,
-	MLX5_MODI_IN_SIPV6_127_96,
-	MLX5_MODI_IN_SIPV6_95_64,
-	MLX5_MODI_IN_SIPV6_63_32,
-	MLX5_MODI_IN_SIPV6_31_0,
-	MLX5_MODI_IN_DIPV6_127_96,
-	MLX5_MODI_IN_DIPV6_95_64,
-	MLX5_MODI_IN_DIPV6_63_32,
-	MLX5_MODI_IN_DIPV6_31_0,
-	MLX5_MODI_IN_SIPV4,
-	MLX5_MODI_IN_DIPV4,
-	MLX5_MODI_OUT_IPV6_HOPLIMIT,
-	MLX5_MODI_IN_IPV6_HOPLIMIT,
-	MLX5_MODI_META_DATA_REG_A,
-	MLX5_MODI_META_DATA_REG_B = 0x50,
-	MLX5_MODI_META_REG_C_0,
-	MLX5_MODI_META_REG_C_1,
-	MLX5_MODI_META_REG_C_2,
-	MLX5_MODI_META_REG_C_3,
-	MLX5_MODI_META_REG_C_4,
-	MLX5_MODI_META_REG_C_5,
-	MLX5_MODI_META_REG_C_6,
-	MLX5_MODI_META_REG_C_7,
-	MLX5_MODI_OUT_TCP_SEQ_NUM,
-	MLX5_MODI_IN_TCP_SEQ_NUM,
-	MLX5_MODI_OUT_TCP_ACK_NUM,
-	MLX5_MODI_IN_TCP_ACK_NUM = 0x5C,
-};
-
-/* Total number of metadata reg_c's. */
-#define MLX5_MREG_C_NUM (MLX5_MODI_META_REG_C_7 - MLX5_MODI_META_REG_C_0 + 1)
-
-enum modify_reg {
-	REG_NONE = 0,
-	REG_A,
-	REG_B,
-	REG_C_0,
-	REG_C_1,
-	REG_C_2,
-	REG_C_3,
-	REG_C_4,
-	REG_C_5,
-	REG_C_6,
-	REG_C_7,
-};
-
-/* Modification sub command. */
-struct mlx5_modification_cmd {
-	union {
-		uint32_t data0;
-		struct {
-			unsigned int length:5;
-			unsigned int rsvd0:3;
-			unsigned int offset:5;
-			unsigned int rsvd1:3;
-			unsigned int field:12;
-			unsigned int action_type:4;
-		};
-	};
-	union {
-		uint32_t data1;
-		uint8_t data[4];
-		struct {
-			unsigned int rsvd2:8;
-			unsigned int dst_offset:5;
-			unsigned int rsvd3:3;
-			unsigned int dst_field:12;
-			unsigned int rsvd4:4;
-		};
-	};
-};
-
-typedef uint32_t u32;
-typedef uint16_t u16;
-typedef uint8_t u8;
-
-#define __mlx5_nullp(typ) ((struct mlx5_ifc_##typ##_bits *)0)
-#define __mlx5_bit_sz(typ, fld) sizeof(__mlx5_nullp(typ)->fld)
-#define __mlx5_bit_off(typ, fld) ((unsigned int)(unsigned long) \
-				  (&(__mlx5_nullp(typ)->fld)))
-#define __mlx5_dw_bit_off(typ, fld) (32 - __mlx5_bit_sz(typ, fld) - \
-				    (__mlx5_bit_off(typ, fld) & 0x1f))
-#define __mlx5_dw_off(typ, fld) (__mlx5_bit_off(typ, fld) / 32)
-#define __mlx5_64_off(typ, fld) (__mlx5_bit_off(typ, fld) / 64)
-#define __mlx5_dw_mask(typ, fld) (__mlx5_mask(typ, fld) << \
-				  __mlx5_dw_bit_off(typ, fld))
-#define __mlx5_mask(typ, fld) ((u32)((1ull << __mlx5_bit_sz(typ, fld)) - 1))
-#define __mlx5_16_off(typ, fld) (__mlx5_bit_off(typ, fld) / 16)
-#define __mlx5_16_bit_off(typ, fld) (16 - __mlx5_bit_sz(typ, fld) - \
-				    (__mlx5_bit_off(typ, fld) & 0xf))
-#define __mlx5_mask16(typ, fld) ((u16)((1ull << __mlx5_bit_sz(typ, fld)) - 1))
-#define MLX5_ST_SZ_BYTES(typ) (sizeof(struct mlx5_ifc_##typ##_bits) / 8)
-#define MLX5_ST_SZ_DW(typ) (sizeof(struct mlx5_ifc_##typ##_bits) / 32)
-#define MLX5_BYTE_OFF(typ, fld) (__mlx5_bit_off(typ, fld) / 8)
-#define MLX5_ADDR_OF(typ, p, fld) ((char *)(p) + MLX5_BYTE_OFF(typ, fld))
-
-/* insert a value to a struct */
-#define MLX5_SET(typ, p, fld, v) \
-	do { \
-		u32 _v = v; \
-		*((__be32 *)(p) + __mlx5_dw_off(typ, fld)) = \
-		rte_cpu_to_be_32((rte_be_to_cpu_32(*((u32 *)(p) + \
-				  __mlx5_dw_off(typ, fld))) & \
-				  (~__mlx5_dw_mask(typ, fld))) | \
-				 (((_v) & __mlx5_mask(typ, fld)) << \
-				   __mlx5_dw_bit_off(typ, fld))); \
-	} while (0)
-
-#define MLX5_SET64(typ, p, fld, v) \
-	do { \
-		assert(__mlx5_bit_sz(typ, fld) == 64); \
-		*((__be64 *)(p) + __mlx5_64_off(typ, fld)) = \
-			rte_cpu_to_be_64(v); \
-	} while (0)
-
-#define MLX5_GET(typ, p, fld) \
-	((rte_be_to_cpu_32(*((__be32 *)(p) +\
-	__mlx5_dw_off(typ, fld))) >> __mlx5_dw_bit_off(typ, fld)) & \
-	__mlx5_mask(typ, fld))
-#define MLX5_GET16(typ, p, fld) \
-	((rte_be_to_cpu_16(*((__be16 *)(p) + \
-	  __mlx5_16_off(typ, fld))) >> __mlx5_16_bit_off(typ, fld)) & \
-	 __mlx5_mask16(typ, fld))
-#define MLX5_GET64(typ, p, fld) rte_be_to_cpu_64(*((__be64 *)(p) + \
-						   __mlx5_64_off(typ, fld)))
-#define MLX5_FLD_SZ_BYTES(typ, fld) (__mlx5_bit_sz(typ, fld) / 8)
-
-struct mlx5_ifc_fte_match_set_misc_bits {
-	u8 gre_c_present[0x1];
-	u8 reserved_at_1[0x1];
-	u8 gre_k_present[0x1];
-	u8 gre_s_present[0x1];
-	u8 source_vhci_port[0x4];
-	u8 source_sqn[0x18];
-	u8 reserved_at_20[0x10];
-	u8 source_port[0x10];
-	u8 outer_second_prio[0x3];
-	u8 outer_second_cfi[0x1];
-	u8 outer_second_vid[0xc];
-	u8 inner_second_prio[0x3];
-	u8 inner_second_cfi[0x1];
-	u8 inner_second_vid[0xc];
-	u8 outer_second_cvlan_tag[0x1];
-	u8 inner_second_cvlan_tag[0x1];
-	u8 outer_second_svlan_tag[0x1];
-	u8 inner_second_svlan_tag[0x1];
-	u8 reserved_at_64[0xc];
-	u8 gre_protocol[0x10];
-	u8 gre_key_h[0x18];
-	u8 gre_key_l[0x8];
-	u8 vxlan_vni[0x18];
-	u8 reserved_at_b8[0x8];
-	u8 geneve_vni[0x18];
-	u8 reserved_at_e4[0x7];
-	u8 geneve_oam[0x1];
-	u8 reserved_at_e0[0xc];
-	u8 outer_ipv6_flow_label[0x14];
-	u8 reserved_at_100[0xc];
-	u8 inner_ipv6_flow_label[0x14];
-	u8 reserved_at_120[0xa];
-	u8 geneve_opt_len[0x6];
-	u8 geneve_protocol_type[0x10];
-	u8 reserved_at_140[0xc0];
-};
-
-struct mlx5_ifc_ipv4_layout_bits {
-	u8 reserved_at_0[0x60];
-	u8 ipv4[0x20];
-};
-
-struct mlx5_ifc_ipv6_layout_bits {
-	u8 ipv6[16][0x8];
-};
-
-union mlx5_ifc_ipv6_layout_ipv4_layout_auto_bits {
-	struct mlx5_ifc_ipv6_layout_bits ipv6_layout;
-	struct mlx5_ifc_ipv4_layout_bits ipv4_layout;
-	u8 reserved_at_0[0x80];
-};
-
-struct mlx5_ifc_fte_match_set_lyr_2_4_bits {
-	u8 smac_47_16[0x20];
-	u8 smac_15_0[0x10];
-	u8 ethertype[0x10];
-	u8 dmac_47_16[0x20];
-	u8 dmac_15_0[0x10];
-	u8 first_prio[0x3];
-	u8 first_cfi[0x1];
-	u8 first_vid[0xc];
-	u8 ip_protocol[0x8];
-	u8 ip_dscp[0x6];
-	u8 ip_ecn[0x2];
-	u8 cvlan_tag[0x1];
-	u8 svlan_tag[0x1];
-	u8 frag[0x1];
-	u8 ip_version[0x4];
-	u8 tcp_flags[0x9];
-	u8 tcp_sport[0x10];
-	u8 tcp_dport[0x10];
-	u8 reserved_at_c0[0x20];
-	u8 udp_sport[0x10];
-	u8 udp_dport[0x10];
-	union mlx5_ifc_ipv6_layout_ipv4_layout_auto_bits src_ipv4_src_ipv6;
-	union mlx5_ifc_ipv6_layout_ipv4_layout_auto_bits dst_ipv4_dst_ipv6;
-};
-
-struct mlx5_ifc_fte_match_mpls_bits {
-	u8 mpls_label[0x14];
-	u8 mpls_exp[0x3];
-	u8 mpls_s_bos[0x1];
-	u8 mpls_ttl[0x8];
-};
-
-struct mlx5_ifc_fte_match_set_misc2_bits {
-	struct mlx5_ifc_fte_match_mpls_bits outer_first_mpls;
-	struct mlx5_ifc_fte_match_mpls_bits inner_first_mpls;
-	struct mlx5_ifc_fte_match_mpls_bits outer_first_mpls_over_gre;
-	struct mlx5_ifc_fte_match_mpls_bits outer_first_mpls_over_udp;
-	u8 metadata_reg_c_7[0x20];
-	u8 metadata_reg_c_6[0x20];
-	u8 metadata_reg_c_5[0x20];
-	u8 metadata_reg_c_4[0x20];
-	u8 metadata_reg_c_3[0x20];
-	u8 metadata_reg_c_2[0x20];
-	u8 metadata_reg_c_1[0x20];
-	u8 metadata_reg_c_0[0x20];
-	u8 metadata_reg_a[0x20];
-	u8 metadata_reg_b[0x20];
-	u8 reserved_at_1c0[0x40];
-};
-
-struct mlx5_ifc_fte_match_set_misc3_bits {
-	u8 inner_tcp_seq_num[0x20];
-	u8 outer_tcp_seq_num[0x20];
-	u8 inner_tcp_ack_num[0x20];
-	u8 outer_tcp_ack_num[0x20];
-	u8 reserved_at_auto1[0x8];
-	u8 outer_vxlan_gpe_vni[0x18];
-	u8 outer_vxlan_gpe_next_protocol[0x8];
-	u8 outer_vxlan_gpe_flags[0x8];
-	u8 reserved_at_a8[0x10];
-	u8 icmp_header_data[0x20];
-	u8 icmpv6_header_data[0x20];
-	u8 icmp_type[0x8];
-	u8 icmp_code[0x8];
-	u8 icmpv6_type[0x8];
-	u8 icmpv6_code[0x8];
-	u8 reserved_at_120[0x20];
-	u8 gtpu_teid[0x20];
-	u8 gtpu_msg_type[0x08];
-	u8 gtpu_msg_flags[0x08];
-	u8 reserved_at_170[0x90];
-};
-
-/* Flow matcher. */
-struct mlx5_ifc_fte_match_param_bits {
-	struct mlx5_ifc_fte_match_set_lyr_2_4_bits outer_headers;
-	struct mlx5_ifc_fte_match_set_misc_bits misc_parameters;
-	struct mlx5_ifc_fte_match_set_lyr_2_4_bits inner_headers;
-	struct mlx5_ifc_fte_match_set_misc2_bits misc_parameters_2;
-	struct mlx5_ifc_fte_match_set_misc3_bits misc_parameters_3;
-};
-
-enum {
-	MLX5_MATCH_CRITERIA_ENABLE_OUTER_BIT,
-	MLX5_MATCH_CRITERIA_ENABLE_MISC_BIT,
-	MLX5_MATCH_CRITERIA_ENABLE_INNER_BIT,
-	MLX5_MATCH_CRITERIA_ENABLE_MISC2_BIT,
-	MLX5_MATCH_CRITERIA_ENABLE_MISC3_BIT
-};
-
-enum {
-	MLX5_CMD_OP_QUERY_HCA_CAP = 0x100,
-	MLX5_CMD_OP_CREATE_MKEY = 0x200,
-	MLX5_CMD_OP_QUERY_NIC_VPORT_CONTEXT = 0x754,
-	MLX5_CMD_OP_ALLOC_TRANSPORT_DOMAIN = 0x816,
-	MLX5_CMD_OP_CREATE_TIR = 0x900,
-	MLX5_CMD_OP_CREATE_SQ = 0X904,
-	MLX5_CMD_OP_MODIFY_SQ = 0X905,
-	MLX5_CMD_OP_CREATE_RQ = 0x908,
-	MLX5_CMD_OP_MODIFY_RQ = 0x909,
-	MLX5_CMD_OP_CREATE_TIS = 0x912,
-	MLX5_CMD_OP_QUERY_TIS = 0x915,
-	MLX5_CMD_OP_CREATE_RQT = 0x916,
-	MLX5_CMD_OP_ALLOC_FLOW_COUNTER = 0x939,
-	MLX5_CMD_OP_QUERY_FLOW_COUNTER = 0x93b,
-};
-
-enum {
-	MLX5_MKC_ACCESS_MODE_MTT   = 0x1,
-};
-
-/* Flow counters. */
-struct mlx5_ifc_alloc_flow_counter_out_bits {
-	u8         status[0x8];
-	u8         reserved_at_8[0x18];
-	u8         syndrome[0x20];
-	u8         flow_counter_id[0x20];
-	u8         reserved_at_60[0x20];
-};
-
-struct mlx5_ifc_alloc_flow_counter_in_bits {
-	u8         opcode[0x10];
-	u8         reserved_at_10[0x10];
-	u8         reserved_at_20[0x10];
-	u8         op_mod[0x10];
-	u8         flow_counter_id[0x20];
-	u8         reserved_at_40[0x18];
-	u8         flow_counter_bulk[0x8];
-};
-
-struct mlx5_ifc_dealloc_flow_counter_out_bits {
-	u8         status[0x8];
-	u8         reserved_at_8[0x18];
-	u8         syndrome[0x20];
-	u8         reserved_at_40[0x40];
-};
-
-struct mlx5_ifc_dealloc_flow_counter_in_bits {
-	u8         opcode[0x10];
-	u8         reserved_at_10[0x10];
-	u8         reserved_at_20[0x10];
-	u8         op_mod[0x10];
-	u8         flow_counter_id[0x20];
-	u8         reserved_at_60[0x20];
-};
-
-struct mlx5_ifc_traffic_counter_bits {
-	u8         packets[0x40];
-	u8         octets[0x40];
-};
-
-struct mlx5_ifc_query_flow_counter_out_bits {
-	u8         status[0x8];
-	u8         reserved_at_8[0x18];
-	u8         syndrome[0x20];
-	u8         reserved_at_40[0x40];
-	struct mlx5_ifc_traffic_counter_bits flow_statistics[];
-};
-
-struct mlx5_ifc_query_flow_counter_in_bits {
-	u8         opcode[0x10];
-	u8         reserved_at_10[0x10];
-	u8         reserved_at_20[0x10];
-	u8         op_mod[0x10];
-	u8         reserved_at_40[0x20];
-	u8         mkey[0x20];
-	u8         address[0x40];
-	u8         clear[0x1];
-	u8         dump_to_memory[0x1];
-	u8         num_of_counters[0x1e];
-	u8         flow_counter_id[0x20];
-};
-
-struct mlx5_ifc_mkc_bits {
-	u8         reserved_at_0[0x1];
-	u8         free[0x1];
-	u8         reserved_at_2[0x1];
-	u8         access_mode_4_2[0x3];
-	u8         reserved_at_6[0x7];
-	u8         relaxed_ordering_write[0x1];
-	u8         reserved_at_e[0x1];
-	u8         small_fence_on_rdma_read_response[0x1];
-	u8         umr_en[0x1];
-	u8         a[0x1];
-	u8         rw[0x1];
-	u8         rr[0x1];
-	u8         lw[0x1];
-	u8         lr[0x1];
-	u8         access_mode_1_0[0x2];
-	u8         reserved_at_18[0x8];
-
-	u8         qpn[0x18];
-	u8         mkey_7_0[0x8];
-
-	u8         reserved_at_40[0x20];
-
-	u8         length64[0x1];
-	u8         bsf_en[0x1];
-	u8         sync_umr[0x1];
-	u8         reserved_at_63[0x2];
-	u8         expected_sigerr_count[0x1];
-	u8         reserved_at_66[0x1];
-	u8         en_rinval[0x1];
-	u8         pd[0x18];
-
-	u8         start_addr[0x40];
-
-	u8         len[0x40];
-
-	u8         bsf_octword_size[0x20];
-
-	u8         reserved_at_120[0x80];
-
-	u8         translations_octword_size[0x20];
-
-	u8         reserved_at_1c0[0x1b];
-	u8         log_page_size[0x5];
-
-	u8         reserved_at_1e0[0x20];
-};
-
-struct mlx5_ifc_create_mkey_out_bits {
-	u8         status[0x8];
-	u8         reserved_at_8[0x18];
-
-	u8         syndrome[0x20];
-
-	u8         reserved_at_40[0x8];
-	u8         mkey_index[0x18];
-
-	u8         reserved_at_60[0x20];
-};
-
-struct mlx5_ifc_create_mkey_in_bits {
-	u8         opcode[0x10];
-	u8         reserved_at_10[0x10];
-
-	u8         reserved_at_20[0x10];
-	u8         op_mod[0x10];
-
-	u8         reserved_at_40[0x20];
-
-	u8         pg_access[0x1];
-	u8         reserved_at_61[0x1f];
-
-	struct mlx5_ifc_mkc_bits memory_key_mkey_entry;
-
-	u8         reserved_at_280[0x80];
-
-	u8         translations_octword_actual_size[0x20];
-
-	u8         mkey_umem_id[0x20];
-
-	u8         mkey_umem_offset[0x40];
-
-	u8         reserved_at_380[0x500];
-
-	u8         klm_pas_mtt[][0x20];
-};
-
-enum {
-	MLX5_GET_HCA_CAP_OP_MOD_GENERAL_DEVICE = 0x0 << 1,
-	MLX5_GET_HCA_CAP_OP_MOD_ETHERNET_OFFLOAD_CAPS = 0x1 << 1,
-	MLX5_GET_HCA_CAP_OP_MOD_QOS_CAP = 0xc << 1,
-};
-
-enum {
-	MLX5_HCA_CAP_OPMOD_GET_MAX   = 0,
-	MLX5_HCA_CAP_OPMOD_GET_CUR   = 1,
-};
-
-enum {
-	MLX5_CAP_INLINE_MODE_L2,
-	MLX5_CAP_INLINE_MODE_VPORT_CONTEXT,
-	MLX5_CAP_INLINE_MODE_NOT_REQUIRED,
-};
-
-enum {
-	MLX5_INLINE_MODE_NONE,
-	MLX5_INLINE_MODE_L2,
-	MLX5_INLINE_MODE_IP,
-	MLX5_INLINE_MODE_TCP_UDP,
-	MLX5_INLINE_MODE_RESERVED4,
-	MLX5_INLINE_MODE_INNER_L2,
-	MLX5_INLINE_MODE_INNER_IP,
-	MLX5_INLINE_MODE_INNER_TCP_UDP,
-};
-
-/* HCA bit masks indicating which Flex parser protocols are already enabled. */
-#define MLX5_HCA_FLEX_IPV4_OVER_VXLAN_ENABLED (1UL << 0)
-#define MLX5_HCA_FLEX_IPV6_OVER_VXLAN_ENABLED (1UL << 1)
-#define MLX5_HCA_FLEX_IPV6_OVER_IP_ENABLED (1UL << 2)
-#define MLX5_HCA_FLEX_GENEVE_ENABLED (1UL << 3)
-#define MLX5_HCA_FLEX_CW_MPLS_OVER_GRE_ENABLED (1UL << 4)
-#define MLX5_HCA_FLEX_CW_MPLS_OVER_UDP_ENABLED (1UL << 5)
-#define MLX5_HCA_FLEX_P_BIT_VXLAN_GPE_ENABLED (1UL << 6)
-#define MLX5_HCA_FLEX_VXLAN_GPE_ENABLED (1UL << 7)
-#define MLX5_HCA_FLEX_ICMP_ENABLED (1UL << 8)
-#define MLX5_HCA_FLEX_ICMPV6_ENABLED (1UL << 9)
-
-struct mlx5_ifc_cmd_hca_cap_bits {
-	u8 reserved_at_0[0x30];
-	u8 vhca_id[0x10];
-	u8 reserved_at_40[0x40];
-	u8 log_max_srq_sz[0x8];
-	u8 log_max_qp_sz[0x8];
-	u8 reserved_at_90[0xb];
-	u8 log_max_qp[0x5];
-	u8 reserved_at_a0[0xb];
-	u8 log_max_srq[0x5];
-	u8 reserved_at_b0[0x10];
-	u8 reserved_at_c0[0x8];
-	u8 log_max_cq_sz[0x8];
-	u8 reserved_at_d0[0xb];
-	u8 log_max_cq[0x5];
-	u8 log_max_eq_sz[0x8];
-	u8 reserved_at_e8[0x2];
-	u8 log_max_mkey[0x6];
-	u8 reserved_at_f0[0x8];
-	u8 dump_fill_mkey[0x1];
-	u8 reserved_at_f9[0x3];
-	u8 log_max_eq[0x4];
-	u8 max_indirection[0x8];
-	u8 fixed_buffer_size[0x1];
-	u8 log_max_mrw_sz[0x7];
-	u8 force_teardown[0x1];
-	u8 reserved_at_111[0x1];
-	u8 log_max_bsf_list_size[0x6];
-	u8 umr_extended_translation_offset[0x1];
-	u8 null_mkey[0x1];
-	u8 log_max_klm_list_size[0x6];
-	u8 reserved_at_120[0xa];
-	u8 log_max_ra_req_dc[0x6];
-	u8 reserved_at_130[0xa];
-	u8 log_max_ra_res_dc[0x6];
-	u8 reserved_at_140[0xa];
-	u8 log_max_ra_req_qp[0x6];
-	u8 reserved_at_150[0xa];
-	u8 log_max_ra_res_qp[0x6];
-	u8 end_pad[0x1];
-	u8 cc_query_allowed[0x1];
-	u8 cc_modify_allowed[0x1];
-	u8 start_pad[0x1];
-	u8 cache_line_128byte[0x1];
-	u8 reserved_at_165[0xa];
-	u8 qcam_reg[0x1];
-	u8 gid_table_size[0x10];
-	u8 out_of_seq_cnt[0x1];
-	u8 vport_counters[0x1];
-	u8 retransmission_q_counters[0x1];
-	u8 debug[0x1];
-	u8 modify_rq_counter_set_id[0x1];
-	u8 rq_delay_drop[0x1];
-	u8 max_qp_cnt[0xa];
-	u8 pkey_table_size[0x10];
-	u8 vport_group_manager[0x1];
-	u8 vhca_group_manager[0x1];
-	u8 ib_virt[0x1];
-	u8 eth_virt[0x1];
-	u8 vnic_env_queue_counters[0x1];
-	u8 ets[0x1];
-	u8 nic_flow_table[0x1];
-	u8 eswitch_manager[0x1];
-	u8 device_memory[0x1];
-	u8 mcam_reg[0x1];
-	u8 pcam_reg[0x1];
-	u8 local_ca_ack_delay[0x5];
-	u8 port_module_event[0x1];
-	u8 enhanced_error_q_counters[0x1];
-	u8 ports_check[0x1];
-	u8 reserved_at_1b3[0x1];
-	u8 disable_link_up[0x1];
-	u8 beacon_led[0x1];
-	u8 port_type[0x2];
-	u8 num_ports[0x8];
-	u8 reserved_at_1c0[0x1];
-	u8 pps[0x1];
-	u8 pps_modify[0x1];
-	u8 log_max_msg[0x5];
-	u8 reserved_at_1c8[0x4];
-	u8 max_tc[0x4];
-	u8 temp_warn_event[0x1];
-	u8 dcbx[0x1];
-	u8 general_notification_event[0x1];
-	u8 reserved_at_1d3[0x2];
-	u8 fpga[0x1];
-	u8 rol_s[0x1];
-	u8 rol_g[0x1];
-	u8 reserved_at_1d8[0x1];
-	u8 wol_s[0x1];
-	u8 wol_g[0x1];
-	u8 wol_a[0x1];
-	u8 wol_b[0x1];
-	u8 wol_m[0x1];
-	u8 wol_u[0x1];
-	u8 wol_p[0x1];
-	u8 stat_rate_support[0x10];
-	u8 reserved_at_1f0[0xc];
-	u8 cqe_version[0x4];
-	u8 compact_address_vector[0x1];
-	u8 striding_rq[0x1];
-	u8 reserved_at_202[0x1];
-	u8 ipoib_enhanced_offloads[0x1];
-	u8 ipoib_basic_offloads[0x1];
-	u8 reserved_at_205[0x1];
-	u8 repeated_block_disabled[0x1];
-	u8 umr_modify_entity_size_disabled[0x1];
-	u8 umr_modify_atomic_disabled[0x1];
-	u8 umr_indirect_mkey_disabled[0x1];
-	u8 umr_fence[0x2];
-	u8 reserved_at_20c[0x3];
-	u8 drain_sigerr[0x1];
-	u8 cmdif_checksum[0x2];
-	u8 sigerr_cqe[0x1];
-	u8 reserved_at_213[0x1];
-	u8 wq_signature[0x1];
-	u8 sctr_data_cqe[0x1];
-	u8 reserved_at_216[0x1];
-	u8 sho[0x1];
-	u8 tph[0x1];
-	u8 rf[0x1];
-	u8 dct[0x1];
-	u8 qos[0x1];
-	u8 eth_net_offloads[0x1];
-	u8 roce[0x1];
-	u8 atomic[0x1];
-	u8 reserved_at_21f[0x1];
-	u8 cq_oi[0x1];
-	u8 cq_resize[0x1];
-	u8 cq_moderation[0x1];
-	u8 reserved_at_223[0x3];
-	u8 cq_eq_remap[0x1];
-	u8 pg[0x1];
-	u8 block_lb_mc[0x1];
-	u8 reserved_at_229[0x1];
-	u8 scqe_break_moderation[0x1];
-	u8 cq_period_start_from_cqe[0x1];
-	u8 cd[0x1];
-	u8 reserved_at_22d[0x1];
-	u8 apm[0x1];
-	u8 vector_calc[0x1];
-	u8 umr_ptr_rlky[0x1];
-	u8 imaicl[0x1];
-	u8 reserved_at_232[0x4];
-	u8 qkv[0x1];
-	u8 pkv[0x1];
-	u8 set_deth_sqpn[0x1];
-	u8 reserved_at_239[0x3];
-	u8 xrc[0x1];
-	u8 ud[0x1];
-	u8 uc[0x1];
-	u8 rc[0x1];
-	u8 uar_4k[0x1];
-	u8 reserved_at_241[0x9];
-	u8 uar_sz[0x6];
-	u8 reserved_at_250[0x8];
-	u8 log_pg_sz[0x8];
-	u8 bf[0x1];
-	u8 driver_version[0x1];
-	u8 pad_tx_eth_packet[0x1];
-	u8 reserved_at_263[0x8];
-	u8 log_bf_reg_size[0x5];
-	u8 reserved_at_270[0xb];
-	u8 lag_master[0x1];
-	u8 num_lag_ports[0x4];
-	u8 reserved_at_280[0x10];
-	u8 max_wqe_sz_sq[0x10];
-	u8 reserved_at_2a0[0x10];
-	u8 max_wqe_sz_rq[0x10];
-	u8 max_flow_counter_31_16[0x10];
-	u8 max_wqe_sz_sq_dc[0x10];
-	u8 reserved_at_2e0[0x7];
-	u8 max_qp_mcg[0x19];
-	u8 reserved_at_300[0x10];
-	u8 flow_counter_bulk_alloc[0x08];
-	u8 log_max_mcg[0x8];
-	u8 reserved_at_320[0x3];
-	u8 log_max_transport_domain[0x5];
-	u8 reserved_at_328[0x3];
-	u8 log_max_pd[0x5];
-	u8 reserved_at_330[0xb];
-	u8 log_max_xrcd[0x5];
-	u8 nic_receive_steering_discard[0x1];
-	u8 receive_discard_vport_down[0x1];
-	u8 transmit_discard_vport_down[0x1];
-	u8 reserved_at_343[0x5];
-	u8 log_max_flow_counter_bulk[0x8];
-	u8 max_flow_counter_15_0[0x10];
-	u8 modify_tis[0x1];
-	u8 flow_counters_dump[0x1];
-	u8 reserved_at_360[0x1];
-	u8 log_max_rq[0x5];
-	u8 reserved_at_368[0x3];
-	u8 log_max_sq[0x5];
-	u8 reserved_at_370[0x3];
-	u8 log_max_tir[0x5];
-	u8 reserved_at_378[0x3];
-	u8 log_max_tis[0x5];
-	u8 basic_cyclic_rcv_wqe[0x1];
-	u8 reserved_at_381[0x2];
-	u8 log_max_rmp[0x5];
-	u8 reserved_at_388[0x3];
-	u8 log_max_rqt[0x5];
-	u8 reserved_at_390[0x3];
-	u8 log_max_rqt_size[0x5];
-	u8 reserved_at_398[0x3];
-	u8 log_max_tis_per_sq[0x5];
-	u8 ext_stride_num_range[0x1];
-	u8 reserved_at_3a1[0x2];
-	u8 log_max_stride_sz_rq[0x5];
-	u8 reserved_at_3a8[0x3];
-	u8 log_min_stride_sz_rq[0x5];
-	u8 reserved_at_3b0[0x3];
-	u8 log_max_stride_sz_sq[0x5];
-	u8 reserved_at_3b8[0x3];
-	u8 log_min_stride_sz_sq[0x5];
-	u8 hairpin[0x1];
-	u8 reserved_at_3c1[0x2];
-	u8 log_max_hairpin_queues[0x5];
-	u8 reserved_at_3c8[0x3];
-	u8 log_max_hairpin_wq_data_sz[0x5];
-	u8 reserved_at_3d0[0x3];
-	u8 log_max_hairpin_num_packets[0x5];
-	u8 reserved_at_3d8[0x3];
-	u8 log_max_wq_sz[0x5];
-	u8 nic_vport_change_event[0x1];
-	u8 disable_local_lb_uc[0x1];
-	u8 disable_local_lb_mc[0x1];
-	u8 log_min_hairpin_wq_data_sz[0x5];
-	u8 reserved_at_3e8[0x3];
-	u8 log_max_vlan_list[0x5];
-	u8 reserved_at_3f0[0x3];
-	u8 log_max_current_mc_list[0x5];
-	u8 reserved_at_3f8[0x3];
-	u8 log_max_current_uc_list[0x5];
-	u8 general_obj_types[0x40];
-	u8 reserved_at_440[0x20];
-	u8 reserved_at_460[0x10];
-	u8 max_num_eqs[0x10];
-	u8 reserved_at_480[0x3];
-	u8 log_max_l2_table[0x5];
-	u8 reserved_at_488[0x8];
-	u8 log_uar_page_sz[0x10];
-	u8 reserved_at_4a0[0x20];
-	u8 device_frequency_mhz[0x20];
-	u8 device_frequency_khz[0x20];
-	u8 reserved_at_500[0x20];
-	u8 num_of_uars_per_page[0x20];
-	u8 flex_parser_protocols[0x20];
-	u8 reserved_at_560[0x20];
-	u8 reserved_at_580[0x3c];
-	u8 mini_cqe_resp_stride_index[0x1];
-	u8 cqe_128_always[0x1];
-	u8 cqe_compression_128[0x1];
-	u8 cqe_compression[0x1];
-	u8 cqe_compression_timeout[0x10];
-	u8 cqe_compression_max_num[0x10];
-	u8 reserved_at_5e0[0x10];
-	u8 tag_matching[0x1];
-	u8 rndv_offload_rc[0x1];
-	u8 rndv_offload_dc[0x1];
-	u8 log_tag_matching_list_sz[0x5];
-	u8 reserved_at_5f8[0x3];
-	u8 log_max_xrq[0x5];
-	u8 affiliate_nic_vport_criteria[0x8];
-	u8 native_port_num[0x8];
-	u8 num_vhca_ports[0x8];
-	u8 reserved_at_618[0x6];
-	u8 sw_owner_id[0x1];
-	u8 reserved_at_61f[0x1e1];
-};
-
-struct mlx5_ifc_qos_cap_bits {
-	u8 packet_pacing[0x1];
-	u8 esw_scheduling[0x1];
-	u8 esw_bw_share[0x1];
-	u8 esw_rate_limit[0x1];
-	u8 reserved_at_4[0x1];
-	u8 packet_pacing_burst_bound[0x1];
-	u8 packet_pacing_typical_size[0x1];
-	u8 flow_meter_srtcm[0x1];
-	u8 reserved_at_8[0x8];
-	u8 log_max_flow_meter[0x8];
-	u8 flow_meter_reg_id[0x8];
-	u8 reserved_at_25[0x8];
-	u8 flow_meter_reg_share[0x1];
-	u8 reserved_at_2e[0x17];
-	u8 packet_pacing_max_rate[0x20];
-	u8 packet_pacing_min_rate[0x20];
-	u8 reserved_at_80[0x10];
-	u8 packet_pacing_rate_table_size[0x10];
-	u8 esw_element_type[0x10];
-	u8 esw_tsar_type[0x10];
-	u8 reserved_at_c0[0x10];
-	u8 max_qos_para_vport[0x10];
-	u8 max_tsar_bw_share[0x20];
-	u8 reserved_at_100[0x6e8];
-};
-
-struct mlx5_ifc_per_protocol_networking_offload_caps_bits {
-	u8 csum_cap[0x1];
-	u8 vlan_cap[0x1];
-	u8 lro_cap[0x1];
-	u8 lro_psh_flag[0x1];
-	u8 lro_time_stamp[0x1];
-	u8 lro_max_msg_sz_mode[0x2];
-	u8 wqe_vlan_insert[0x1];
-	u8 self_lb_en_modifiable[0x1];
-	u8 self_lb_mc[0x1];
-	u8 self_lb_uc[0x1];
-	u8 max_lso_cap[0x5];
-	u8 multi_pkt_send_wqe[0x2];
-	u8 wqe_inline_mode[0x2];
-	u8 rss_ind_tbl_cap[0x4];
-	u8 reg_umr_sq[0x1];
-	u8 scatter_fcs[0x1];
-	u8 enhanced_multi_pkt_send_wqe[0x1];
-	u8 tunnel_lso_const_out_ip_id[0x1];
-	u8 tunnel_lro_gre[0x1];
-	u8 tunnel_lro_vxlan[0x1];
-	u8 tunnel_stateless_gre[0x1];
-	u8 tunnel_stateless_vxlan[0x1];
-	u8 swp[0x1];
-	u8 swp_csum[0x1];
-	u8 swp_lso[0x1];
-	u8 reserved_at_23[0x8];
-	u8 tunnel_stateless_gtp[0x1];
-	u8 reserved_at_25[0x4];
-	u8 max_vxlan_udp_ports[0x8];
-	u8 reserved_at_38[0x6];
-	u8 max_geneve_opt_len[0x1];
-	u8 tunnel_stateless_geneve_rx[0x1];
-	u8 reserved_at_40[0x10];
-	u8 lro_min_mss_size[0x10];
-	u8 reserved_at_60[0x120];
-	u8 lro_timer_supported_periods[4][0x20];
-	u8 reserved_at_200[0x600];
-};
-
-union mlx5_ifc_hca_cap_union_bits {
-	struct mlx5_ifc_cmd_hca_cap_bits cmd_hca_cap;
-	struct mlx5_ifc_per_protocol_networking_offload_caps_bits
-	       per_protocol_networking_offload_caps;
-	struct mlx5_ifc_qos_cap_bits qos_cap;
-	u8 reserved_at_0[0x8000];
-};
-
-struct mlx5_ifc_query_hca_cap_out_bits {
-	u8 status[0x8];
-	u8 reserved_at_8[0x18];
-	u8 syndrome[0x20];
-	u8 reserved_at_40[0x40];
-	union mlx5_ifc_hca_cap_union_bits capability;
-};
-
-struct mlx5_ifc_query_hca_cap_in_bits {
-	u8 opcode[0x10];
-	u8 reserved_at_10[0x10];
-	u8 reserved_at_20[0x10];
-	u8 op_mod[0x10];
-	u8 reserved_at_40[0x40];
-};
-
-struct mlx5_ifc_mac_address_layout_bits {
-	u8 reserved_at_0[0x10];
-	u8 mac_addr_47_32[0x10];
-	u8 mac_addr_31_0[0x20];
-};
-
-struct mlx5_ifc_nic_vport_context_bits {
-	u8 reserved_at_0[0x5];
-	u8 min_wqe_inline_mode[0x3];
-	u8 reserved_at_8[0x15];
-	u8 disable_mc_local_lb[0x1];
-	u8 disable_uc_local_lb[0x1];
-	u8 roce_en[0x1];
-	u8 arm_change_event[0x1];
-	u8 reserved_at_21[0x1a];
-	u8 event_on_mtu[0x1];
-	u8 event_on_promisc_change[0x1];
-	u8 event_on_vlan_change[0x1];
-	u8 event_on_mc_address_change[0x1];
-	u8 event_on_uc_address_change[0x1];
-	u8 reserved_at_40[0xc];
-	u8 affiliation_criteria[0x4];
-	u8 affiliated_vhca_id[0x10];
-	u8 reserved_at_60[0xd0];
-	u8 mtu[0x10];
-	u8 system_image_guid[0x40];
-	u8 port_guid[0x40];
-	u8 node_guid[0x40];
-	u8 reserved_at_200[0x140];
-	u8 qkey_violation_counter[0x10];
-	u8 reserved_at_350[0x430];
-	u8 promisc_uc[0x1];
-	u8 promisc_mc[0x1];
-	u8 promisc_all[0x1];
-	u8 reserved_at_783[0x2];
-	u8 allowed_list_type[0x3];
-	u8 reserved_at_788[0xc];
-	u8 allowed_list_size[0xc];
-	struct mlx5_ifc_mac_address_layout_bits permanent_address;
-	u8 reserved_at_7e0[0x20];
-};
-
-struct mlx5_ifc_query_nic_vport_context_out_bits {
-	u8 status[0x8];
-	u8 reserved_at_8[0x18];
-	u8 syndrome[0x20];
-	u8 reserved_at_40[0x40];
-	struct mlx5_ifc_nic_vport_context_bits nic_vport_context;
-};
-
-struct mlx5_ifc_query_nic_vport_context_in_bits {
-	u8 opcode[0x10];
-	u8 reserved_at_10[0x10];
-	u8 reserved_at_20[0x10];
-	u8 op_mod[0x10];
-	u8 other_vport[0x1];
-	u8 reserved_at_41[0xf];
-	u8 vport_number[0x10];
-	u8 reserved_at_60[0x5];
-	u8 allowed_list_type[0x3];
-	u8 reserved_at_68[0x18];
-};
-
-struct mlx5_ifc_tisc_bits {
-	u8 strict_lag_tx_port_affinity[0x1];
-	u8 reserved_at_1[0x3];
-	u8 lag_tx_port_affinity[0x04];
-	u8 reserved_at_8[0x4];
-	u8 prio[0x4];
-	u8 reserved_at_10[0x10];
-	u8 reserved_at_20[0x100];
-	u8 reserved_at_120[0x8];
-	u8 transport_domain[0x18];
-	u8 reserved_at_140[0x8];
-	u8 underlay_qpn[0x18];
-	u8 reserved_at_160[0x3a0];
-};
-
-struct mlx5_ifc_query_tis_out_bits {
-	u8 status[0x8];
-	u8 reserved_at_8[0x18];
-	u8 syndrome[0x20];
-	u8 reserved_at_40[0x40];
-	struct mlx5_ifc_tisc_bits tis_context;
-};
-
-struct mlx5_ifc_query_tis_in_bits {
-	u8 opcode[0x10];
-	u8 reserved_at_10[0x10];
-	u8 reserved_at_20[0x10];
-	u8 op_mod[0x10];
-	u8 reserved_at_40[0x8];
-	u8 tisn[0x18];
-	u8 reserved_at_60[0x20];
-};
-
-struct mlx5_ifc_alloc_transport_domain_out_bits {
-	u8 status[0x8];
-	u8 reserved_at_8[0x18];
-	u8 syndrome[0x20];
-	u8 reserved_at_40[0x8];
-	u8 transport_domain[0x18];
-	u8 reserved_at_60[0x20];
-};
-
-struct mlx5_ifc_alloc_transport_domain_in_bits {
-	u8 opcode[0x10];
-	u8 reserved_at_10[0x10];
-	u8 reserved_at_20[0x10];
-	u8 op_mod[0x10];
-	u8 reserved_at_40[0x40];
-};
-
-enum {
-	MLX5_WQ_TYPE_LINKED_LIST                = 0x0,
-	MLX5_WQ_TYPE_CYCLIC                     = 0x1,
-	MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ    = 0x2,
-	MLX5_WQ_TYPE_CYCLIC_STRIDING_RQ         = 0x3,
-};
-
-enum {
-	MLX5_WQ_END_PAD_MODE_NONE  = 0x0,
-	MLX5_WQ_END_PAD_MODE_ALIGN = 0x1,
-};
-
-struct mlx5_ifc_wq_bits {
-	u8 wq_type[0x4];
-	u8 wq_signature[0x1];
-	u8 end_padding_mode[0x2];
-	u8 cd_slave[0x1];
-	u8 reserved_at_8[0x18];
-	u8 hds_skip_first_sge[0x1];
-	u8 log2_hds_buf_size[0x3];
-	u8 reserved_at_24[0x7];
-	u8 page_offset[0x5];
-	u8 lwm[0x10];
-	u8 reserved_at_40[0x8];
-	u8 pd[0x18];
-	u8 reserved_at_60[0x8];
-	u8 uar_page[0x18];
-	u8 dbr_addr[0x40];
-	u8 hw_counter[0x20];
-	u8 sw_counter[0x20];
-	u8 reserved_at_100[0xc];
-	u8 log_wq_stride[0x4];
-	u8 reserved_at_110[0x3];
-	u8 log_wq_pg_sz[0x5];
-	u8 reserved_at_118[0x3];
-	u8 log_wq_sz[0x5];
-	u8 dbr_umem_valid[0x1];
-	u8 wq_umem_valid[0x1];
-	u8 reserved_at_122[0x1];
-	u8 log_hairpin_num_packets[0x5];
-	u8 reserved_at_128[0x3];
-	u8 log_hairpin_data_sz[0x5];
-	u8 reserved_at_130[0x4];
-	u8 single_wqe_log_num_of_strides[0x4];
-	u8 two_byte_shift_en[0x1];
-	u8 reserved_at_139[0x4];
-	u8 single_stride_log_num_of_bytes[0x3];
-	u8 dbr_umem_id[0x20];
-	u8 wq_umem_id[0x20];
-	u8 wq_umem_offset[0x40];
-	u8 reserved_at_1c0[0x440];
-};
-
-enum {
-	MLX5_RQC_MEM_RQ_TYPE_MEMORY_RQ_INLINE  = 0x0,
-	MLX5_RQC_MEM_RQ_TYPE_MEMORY_RQ_RMP     = 0x1,
-};
-
-enum {
-	MLX5_RQC_STATE_RST  = 0x0,
-	MLX5_RQC_STATE_RDY  = 0x1,
-	MLX5_RQC_STATE_ERR  = 0x3,
-};
-
-struct mlx5_ifc_rqc_bits {
-	u8 rlky[0x1];
-	u8 delay_drop_en[0x1];
-	u8 scatter_fcs[0x1];
-	u8 vsd[0x1];
-	u8 mem_rq_type[0x4];
-	u8 state[0x4];
-	u8 reserved_at_c[0x1];
-	u8 flush_in_error_en[0x1];
-	u8 hairpin[0x1];
-	u8 reserved_at_f[0x11];
-	u8 reserved_at_20[0x8];
-	u8 user_index[0x18];
-	u8 reserved_at_40[0x8];
-	u8 cqn[0x18];
-	u8 counter_set_id[0x8];
-	u8 reserved_at_68[0x18];
-	u8 reserved_at_80[0x8];
-	u8 rmpn[0x18];
-	u8 reserved_at_a0[0x8];
-	u8 hairpin_peer_sq[0x18];
-	u8 reserved_at_c0[0x10];
-	u8 hairpin_peer_vhca[0x10];
-	u8 reserved_at_e0[0xa0];
-	struct mlx5_ifc_wq_bits wq; /* Not used in LRO RQ. */
-};
-
-struct mlx5_ifc_create_rq_out_bits {
-	u8 status[0x8];
-	u8 reserved_at_8[0x18];
-	u8 syndrome[0x20];
-	u8 reserved_at_40[0x8];
-	u8 rqn[0x18];
-	u8 reserved_at_60[0x20];
-};
-
-struct mlx5_ifc_create_rq_in_bits {
-	u8 opcode[0x10];
-	u8 uid[0x10];
-	u8 reserved_at_20[0x10];
-	u8 op_mod[0x10];
-	u8 reserved_at_40[0xc0];
-	struct mlx5_ifc_rqc_bits ctx;
-};
-
-struct mlx5_ifc_modify_rq_out_bits {
-	u8 status[0x8];
-	u8 reserved_at_8[0x18];
-	u8 syndrome[0x20];
-	u8 reserved_at_40[0x40];
-};
-
-struct mlx5_ifc_create_tis_out_bits {
-	u8 status[0x8];
-	u8 reserved_at_8[0x18];
-	u8 syndrome[0x20];
-	u8 reserved_at_40[0x8];
-	u8 tisn[0x18];
-	u8 reserved_at_60[0x20];
-};
-
-struct mlx5_ifc_create_tis_in_bits {
-	u8 opcode[0x10];
-	u8 uid[0x10];
-	u8 reserved_at_20[0x10];
-	u8 op_mod[0x10];
-	u8 reserved_at_40[0xc0];
-	struct mlx5_ifc_tisc_bits ctx;
-};
-
-enum {
-	MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_WQ_LWM = 1ULL << 0,
-	MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_VSD = 1ULL << 1,
-	MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_SCATTER_FCS = 1ULL << 2,
-	MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_RQ_COUNTER_SET_ID = 1ULL << 3,
-};
-
-struct mlx5_ifc_modify_rq_in_bits {
-	u8 opcode[0x10];
-	u8 uid[0x10];
-	u8 reserved_at_20[0x10];
-	u8 op_mod[0x10];
-	u8 rq_state[0x4];
-	u8 reserved_at_44[0x4];
-	u8 rqn[0x18];
-	u8 reserved_at_60[0x20];
-	u8 modify_bitmask[0x40];
-	u8 reserved_at_c0[0x40];
-	struct mlx5_ifc_rqc_bits ctx;
-};
-
-enum {
-	MLX5_RX_HASH_FIELD_SELECT_SELECTED_FIELDS_SRC_IP     = 0x0,
-	MLX5_RX_HASH_FIELD_SELECT_SELECTED_FIELDS_DST_IP     = 0x1,
-	MLX5_RX_HASH_FIELD_SELECT_SELECTED_FIELDS_L4_SPORT   = 0x2,
-	MLX5_RX_HASH_FIELD_SELECT_SELECTED_FIELDS_L4_DPORT   = 0x3,
-	MLX5_RX_HASH_FIELD_SELECT_SELECTED_FIELDS_IPSEC_SPI  = 0x4,
-};
-
-struct mlx5_ifc_rx_hash_field_select_bits {
-	u8 l3_prot_type[0x1];
-	u8 l4_prot_type[0x1];
-	u8 selected_fields[0x1e];
-};
-
-enum {
-	MLX5_TIRC_DISP_TYPE_DIRECT    = 0x0,
-	MLX5_TIRC_DISP_TYPE_INDIRECT  = 0x1,
-};
-
-enum {
-	MLX5_TIRC_LRO_ENABLE_MASK_IPV4_LRO  = 0x1,
-	MLX5_TIRC_LRO_ENABLE_MASK_IPV6_LRO  = 0x2,
-};
-
-enum {
-	MLX5_RX_HASH_FN_NONE           = 0x0,
-	MLX5_RX_HASH_FN_INVERTED_XOR8  = 0x1,
-	MLX5_RX_HASH_FN_TOEPLITZ       = 0x2,
-};
-
-enum {
-	MLX5_TIRC_SELF_LB_BLOCK_BLOCK_UNICAST    = 0x1,
-	MLX5_TIRC_SELF_LB_BLOCK_BLOCK_MULTICAST  = 0x2,
-};
-
-enum {
-	MLX5_LRO_MAX_MSG_SIZE_START_FROM_L4    = 0x0,
-	MLX5_LRO_MAX_MSG_SIZE_START_FROM_L2  = 0x1,
-};
-
-struct mlx5_ifc_tirc_bits {
-	u8 reserved_at_0[0x20];
-	u8 disp_type[0x4];
-	u8 reserved_at_24[0x1c];
-	u8 reserved_at_40[0x40];
-	u8 reserved_at_80[0x4];
-	u8 lro_timeout_period_usecs[0x10];
-	u8 lro_enable_mask[0x4];
-	u8 lro_max_msg_sz[0x8];
-	u8 reserved_at_a0[0x40];
-	u8 reserved_at_e0[0x8];
-	u8 inline_rqn[0x18];
-	u8 rx_hash_symmetric[0x1];
-	u8 reserved_at_101[0x1];
-	u8 tunneled_offload_en[0x1];
-	u8 reserved_at_103[0x5];
-	u8 indirect_table[0x18];
-	u8 rx_hash_fn[0x4];
-	u8 reserved_at_124[0x2];
-	u8 self_lb_block[0x2];
-	u8 transport_domain[0x18];
-	u8 rx_hash_toeplitz_key[10][0x20];
-	struct mlx5_ifc_rx_hash_field_select_bits rx_hash_field_selector_outer;
-	struct mlx5_ifc_rx_hash_field_select_bits rx_hash_field_selector_inner;
-	u8 reserved_at_2c0[0x4c0];
-};
-
-struct mlx5_ifc_create_tir_out_bits {
-	u8 status[0x8];
-	u8 reserved_at_8[0x18];
-	u8 syndrome[0x20];
-	u8 reserved_at_40[0x8];
-	u8 tirn[0x18];
-	u8 reserved_at_60[0x20];
-};
-
-struct mlx5_ifc_create_tir_in_bits {
-	u8 opcode[0x10];
-	u8 uid[0x10];
-	u8 reserved_at_20[0x10];
-	u8 op_mod[0x10];
-	u8 reserved_at_40[0xc0];
-	struct mlx5_ifc_tirc_bits ctx;
-};
-
-struct mlx5_ifc_rq_num_bits {
-	u8 reserved_at_0[0x8];
-	u8 rq_num[0x18];
-};
-
-struct mlx5_ifc_rqtc_bits {
-	u8 reserved_at_0[0xa0];
-	u8 reserved_at_a0[0x10];
-	u8 rqt_max_size[0x10];
-	u8 reserved_at_c0[0x10];
-	u8 rqt_actual_size[0x10];
-	u8 reserved_at_e0[0x6a0];
-	struct mlx5_ifc_rq_num_bits rq_num[];
-};
-
-struct mlx5_ifc_create_rqt_out_bits {
-	u8 status[0x8];
-	u8 reserved_at_8[0x18];
-	u8 syndrome[0x20];
-	u8 reserved_at_40[0x8];
-	u8 rqtn[0x18];
-	u8 reserved_at_60[0x20];
-};
-
-#ifdef PEDANTIC
-#pragma GCC diagnostic ignored "-Wpedantic"
-#endif
-struct mlx5_ifc_create_rqt_in_bits {
-	u8 opcode[0x10];
-	u8 uid[0x10];
-	u8 reserved_at_20[0x10];
-	u8 op_mod[0x10];
-	u8 reserved_at_40[0xc0];
-	struct mlx5_ifc_rqtc_bits rqt_context;
-};
-#ifdef PEDANTIC
-#pragma GCC diagnostic error "-Wpedantic"
-#endif
-
-enum {
-	MLX5_SQC_STATE_RST  = 0x0,
-	MLX5_SQC_STATE_RDY  = 0x1,
-	MLX5_SQC_STATE_ERR  = 0x3,
-};
-
-struct mlx5_ifc_sqc_bits {
-	u8 rlky[0x1];
-	u8 cd_master[0x1];
-	u8 fre[0x1];
-	u8 flush_in_error_en[0x1];
-	u8 allow_multi_pkt_send_wqe[0x1];
-	u8 min_wqe_inline_mode[0x3];
-	u8 state[0x4];
-	u8 reg_umr[0x1];
-	u8 allow_swp[0x1];
-	u8 hairpin[0x1];
-	u8 reserved_at_f[0x11];
-	u8 reserved_at_20[0x8];
-	u8 user_index[0x18];
-	u8 reserved_at_40[0x8];
-	u8 cqn[0x18];
-	u8 reserved_at_60[0x8];
-	u8 hairpin_peer_rq[0x18];
-	u8 reserved_at_80[0x10];
-	u8 hairpin_peer_vhca[0x10];
-	u8 reserved_at_a0[0x50];
-	u8 packet_pacing_rate_limit_index[0x10];
-	u8 tis_lst_sz[0x10];
-	u8 reserved_at_110[0x10];
-	u8 reserved_at_120[0x40];
-	u8 reserved_at_160[0x8];
-	u8 tis_num_0[0x18];
-	struct mlx5_ifc_wq_bits wq;
-};
-
-struct mlx5_ifc_query_sq_in_bits {
-	u8 opcode[0x10];
-	u8 reserved_at_10[0x10];
-	u8 reserved_at_20[0x10];
-	u8 op_mod[0x10];
-	u8 reserved_at_40[0x8];
-	u8 sqn[0x18];
-	u8 reserved_at_60[0x20];
-};
-
-struct mlx5_ifc_modify_sq_out_bits {
-	u8 status[0x8];
-	u8 reserved_at_8[0x18];
-	u8 syndrome[0x20];
-	u8 reserved_at_40[0x40];
-};
-
-struct mlx5_ifc_modify_sq_in_bits {
-	u8 opcode[0x10];
-	u8 uid[0x10];
-	u8 reserved_at_20[0x10];
-	u8 op_mod[0x10];
-	u8 sq_state[0x4];
-	u8 reserved_at_44[0x4];
-	u8 sqn[0x18];
-	u8 reserved_at_60[0x20];
-	u8 modify_bitmask[0x40];
-	u8 reserved_at_c0[0x40];
-	struct mlx5_ifc_sqc_bits ctx;
-};
-
-struct mlx5_ifc_create_sq_out_bits {
-	u8 status[0x8];
-	u8 reserved_at_8[0x18];
-	u8 syndrome[0x20];
-	u8 reserved_at_40[0x8];
-	u8 sqn[0x18];
-	u8 reserved_at_60[0x20];
-};
-
-struct mlx5_ifc_create_sq_in_bits {
-	u8 opcode[0x10];
-	u8 uid[0x10];
-	u8 reserved_at_20[0x10];
-	u8 op_mod[0x10];
-	u8 reserved_at_40[0xc0];
-	struct mlx5_ifc_sqc_bits ctx;
-};
-
-enum {
-	MLX5_FLOW_METER_OBJ_MODIFY_FIELD_ACTIVE = (1ULL << 0),
-	MLX5_FLOW_METER_OBJ_MODIFY_FIELD_CBS = (1ULL << 1),
-	MLX5_FLOW_METER_OBJ_MODIFY_FIELD_CIR = (1ULL << 2),
-	MLX5_FLOW_METER_OBJ_MODIFY_FIELD_EBS = (1ULL << 3),
-	MLX5_FLOW_METER_OBJ_MODIFY_FIELD_EIR = (1ULL << 4),
-};
-
-struct mlx5_ifc_flow_meter_parameters_bits {
-	u8         valid[0x1];			// 00h
-	u8         bucket_overflow[0x1];
-	u8         start_color[0x2];
-	u8         both_buckets_on_green[0x1];
-	u8         meter_mode[0x2];
-	u8         reserved_at_1[0x19];
-	u8         reserved_at_2[0x20]; //04h
-	u8         reserved_at_3[0x3];
-	u8         cbs_exponent[0x5];		// 08h
-	u8         cbs_mantissa[0x8];
-	u8         reserved_at_4[0x3];
-	u8         cir_exponent[0x5];
-	u8         cir_mantissa[0x8];
-	u8         reserved_at_5[0x20];		// 0Ch
-	u8         reserved_at_6[0x3];
-	u8         ebs_exponent[0x5];		// 10h
-	u8         ebs_mantissa[0x8];
-	u8         reserved_at_7[0x3];
-	u8         eir_exponent[0x5];
-	u8         eir_mantissa[0x8];
-	u8         reserved_at_8[0x60];		// 14h-1Ch
-};
-
-/* CQE format mask. */
-#define MLX5E_CQE_FORMAT_MASK 0xc
-
-/* MPW opcode. */
-#define MLX5_OPC_MOD_MPW 0x01
-
-/* Compressed Rx CQE structure. */
-struct mlx5_mini_cqe8 {
-	union {
-		uint32_t rx_hash_result;
-		struct {
-			uint16_t checksum;
-			uint16_t stride_idx;
-		};
-		struct {
-			uint16_t wqe_counter;
-			uint8_t  s_wqe_opcode;
-			uint8_t  reserved;
-		} s_wqe_info;
-	};
-	uint32_t byte_cnt;
-};
-
-/* srTCM PRM flow meter parameters. */
-enum {
-	MLX5_FLOW_COLOR_RED = 0,
-	MLX5_FLOW_COLOR_YELLOW,
-	MLX5_FLOW_COLOR_GREEN,
-	MLX5_FLOW_COLOR_UNDEFINED,
-};
-
-/* Maximum value of srTCM metering parameters. */
-#define MLX5_SRTCM_CBS_MAX (0xFF * (1ULL << 0x1F))
-#define MLX5_SRTCM_CIR_MAX (8 * (1ULL << 30) * 0xFF)
-#define MLX5_SRTCM_EBS_MAX 0
-
-/* The bits meter color use. */
-#define MLX5_MTR_COLOR_BITS 8
-
-/**
- * Convert a user mark to flow mark.
- *
- * @param val
- *   Mark value to convert.
- *
- * @return
- *   Converted mark value.
- */
-static inline uint32_t
-mlx5_flow_mark_set(uint32_t val)
-{
-	uint32_t ret;
-
-	/*
-	 * Add one to the user value to differentiate un-marked flows from
-	 * marked flows, if the ID is equal to MLX5_FLOW_MARK_DEFAULT it
-	 * remains untouched.
-	 */
-	if (val != MLX5_FLOW_MARK_DEFAULT)
-		++val;
-#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
-	/*
-	 * Mark is 24 bits (minus reserved values) but is stored on a 32 bit
-	 * word, byte-swapped by the kernel on little-endian systems. In this
-	 * case, left-shifting the resulting big-endian value ensures the
-	 * least significant 24 bits are retained when converting it back.
-	 */
-	ret = rte_cpu_to_be_32(val) >> 8;
-#else
-	ret = val;
-#endif
-	return ret;
-}
-
-/**
- * Convert a mark to user mark.
- *
- * @param val
- *   Mark value to convert.
- *
- * @return
- *   Converted mark value.
- */
-static inline uint32_t
-mlx5_flow_mark_get(uint32_t val)
-{
-	/*
-	 * Subtract one from the retrieved value. It was added by
-	 * mlx5_flow_mark_set() to distinguish unmarked flows.
-	 */
-#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
-	return (val >> 8) - 1;
-#else
-	return val - 1;
-#endif
-}
-
-#endif /* RTE_PMD_MLX5_PRM_H_ */
 --git a/drivers/net/mlx5/mlx5_rss.c b/drivers/net/mlx5/mlx5_rss.c
index 1028264..345ce3a 100644
--- a/drivers/net/mlx5/mlx5_rss.c
+++ b/drivers/net/mlx5/mlx5_rss.c
@@ -22,8 +22,8 @@
 #include <rte_malloc.h>
 #include <rte_ethdev_driver.h>
 
-#include "mlx5.h"
 #include "mlx5_defs.h"
+#include "mlx5.h"
 #include "mlx5_rxtx.h"
 
 /**
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 371b996..e01cbfd 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -30,14 +30,16 @@
 #include <rte_debug.h>
 #include <rte_io.h>
 
+#include <mlx5_glue.h>
+#include <mlx5_devx_cmds.h>
+
+#include "mlx5_defs.h"
 #include "mlx5.h"
 #include "mlx5_rxtx.h"
 #include "mlx5_utils.h"
 #include "mlx5_autoconf.h"
-#include "mlx5_defs.h"
-#include "mlx5_glue.h"
 #include "mlx5_flow.h"
-#include "mlx5_devx_cmds.h"
+
 
 /* Default RSS hash key also used for ConnectX-3. */
 uint8_t rss_hash_default_key[] = {
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 5a03556..d8f6671 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -28,13 +28,14 @@
 #include <rte_cycles.h>
 #include <rte_flow.h>
 
+#include <mlx5_devx_cmds.h>
+#include <mlx5_prm.h>
+
+#include "mlx5_defs.h"
 #include "mlx5.h"
-#include "mlx5_devx_cmds.h"
 #include "mlx5_utils.h"
 #include "mlx5_rxtx.h"
 #include "mlx5_autoconf.h"
-#include "mlx5_defs.h"
-#include "mlx5_prm.h"
 
 /* TX burst subroutines return codes. */
 enum mlx5_txcmp_code {
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 3f659d2..fb13919 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -31,13 +31,14 @@
 #include <rte_bus_pci.h>
 #include <rte_malloc.h>
 
+#include <mlx5_glue.h>
+#include <mlx5_prm.h>
+
+#include "mlx5_defs.h"
 #include "mlx5_utils.h"
 #include "mlx5.h"
 #include "mlx5_mr.h"
 #include "mlx5_autoconf.h"
-#include "mlx5_defs.h"
-#include "mlx5_prm.h"
-#include "mlx5_glue.h"
 
 /* Support tunnel matching. */
 #define MLX5_FLOW_TUNNEL 10
diff --git a/drivers/net/mlx5/mlx5_rxtx_vec.c b/drivers/net/mlx5/mlx5_rxtx_vec.c
index d85f908..5505762 100644
--- a/drivers/net/mlx5/mlx5_rxtx_vec.c
+++ b/drivers/net/mlx5/mlx5_rxtx_vec.c
@@ -23,13 +23,14 @@
 #include <rte_mempool.h>
 #include <rte_prefetch.h>
 
+#include <mlx5_prm.h>
+
+#include "mlx5_defs.h"
 #include "mlx5.h"
 #include "mlx5_utils.h"
 #include "mlx5_rxtx.h"
 #include "mlx5_rxtx_vec.h"
 #include "mlx5_autoconf.h"
-#include "mlx5_defs.h"
-#include "mlx5_prm.h"
 
 #if defined RTE_ARCH_X86_64
 #include "mlx5_rxtx_vec_sse.h"
diff --git a/drivers/net/mlx5/mlx5_rxtx_vec.h b/drivers/net/mlx5/mlx5_rxtx_vec.h
index d8c07f2..82f77e5 100644
--- a/drivers/net/mlx5/mlx5_rxtx_vec.h
+++ b/drivers/net/mlx5/mlx5_rxtx_vec.h
@@ -9,8 +9,9 @@
 #include <rte_common.h>
 #include <rte_mbuf.h>
 
+#include <mlx5_prm.h>
+
 #include "mlx5_autoconf.h"
-#include "mlx5_prm.h"
 
 /* HW checksum offload capabilities of vectorized Tx. */
 #define MLX5_VEC_TX_CKSUM_OFFLOAD_CAP \
diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h b/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h
index 9e5c6ee..1467a42 100644
--- a/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h
+++ b/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h
@@ -17,13 +17,14 @@
 #include <rte_mempool.h>
 #include <rte_prefetch.h>
 
+#include <mlx5_prm.h>
+
+#include "mlx5_defs.h"
 #include "mlx5.h"
 #include "mlx5_utils.h"
 #include "mlx5_rxtx.h"
 #include "mlx5_rxtx_vec.h"
 #include "mlx5_autoconf.h"
-#include "mlx5_defs.h"
-#include "mlx5_prm.h"
 
 #ifndef __INTEL_COMPILER
 #pragma GCC diagnostic ignored "-Wcast-qual"
diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
index 332e9ac..5b846c1 100644
--- a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
+++ b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
@@ -16,13 +16,14 @@
 #include <rte_mempool.h>
 #include <rte_prefetch.h>
 
+#include <mlx5_prm.h>
+
+#include "mlx5_defs.h"
 #include "mlx5.h"
 #include "mlx5_utils.h"
 #include "mlx5_rxtx.h"
 #include "mlx5_rxtx_vec.h"
 #include "mlx5_autoconf.h"
-#include "mlx5_defs.h"
-#include "mlx5_prm.h"
 
 #pragma GCC diagnostic ignored "-Wcast-qual"
 
diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h
index 07d40d5..6e1b967 100644
--- a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h
+++ b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h
@@ -16,13 +16,14 @@
 #include <rte_mempool.h>
 #include <rte_prefetch.h>
 
+#include <mlx5_prm.h>
+
+#include "mlx5_defs.h"
 #include "mlx5.h"
 #include "mlx5_utils.h"
 #include "mlx5_rxtx.h"
 #include "mlx5_rxtx_vec.h"
 #include "mlx5_autoconf.h"
-#include "mlx5_defs.h"
-#include "mlx5_prm.h"
 
 #ifndef __INTEL_COMPILER
 #pragma GCC diagnostic ignored "-Wcast-qual"
diff --git a/drivers/net/mlx5/mlx5_stats.c b/drivers/net/mlx5/mlx5_stats.c
index 205e4fe..0ed7170 100644
--- a/drivers/net/mlx5/mlx5_stats.c
+++ b/drivers/net/mlx5/mlx5_stats.c
@@ -13,9 +13,9 @@
 #include <rte_common.h>
 #include <rte_malloc.h>
 
+#include "mlx5_defs.h"
 #include "mlx5.h"
 #include "mlx5_rxtx.h"
-#include "mlx5_defs.h"
 
 static const struct mlx5_counter_ctrl mlx5_counters_init[] = {
 	{
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 5adb4dc..1d2ba8a 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -28,13 +28,14 @@
 #include <rte_ethdev_driver.h>
 #include <rte_common.h>
 
-#include "mlx5_utils.h"
+#include <mlx5_glue.h>
+#include <mlx5_devx_cmds.h>
+
 #include "mlx5_defs.h"
+#include "mlx5_utils.h"
 #include "mlx5.h"
 #include "mlx5_rxtx.h"
 #include "mlx5_autoconf.h"
-#include "mlx5_glue.h"
-#include "mlx5_devx_cmds.h"
 
 /**
  * Allocate TX queue elements.
diff --git a/drivers/net/mlx5/mlx5_utils.h b/drivers/net/mlx5/mlx5_utils.h
index ebf79b8..c868aee 100644
--- a/drivers/net/mlx5/mlx5_utils.h
+++ b/drivers/net/mlx5/mlx5_utils.h
@@ -13,8 +13,11 @@
 #include <assert.h>
 #include <errno.h>
 
+#include <mlx5_common.h>
+
 #include "mlx5_defs.h"
 
+
 /*
  * Compilation workaround for PPC64 when AltiVec is fully enabled, e.g. std=c11.
  * Otherwise there would be a type conflict between stdbool and altivec.
@@ -50,81 +53,14 @@
 /* Save and restore errno around argument evaluation. */
 #define ERRNO_SAFE(x) ((errno = (int []){ errno, ((x), 0) }[0]))
 
-/*
- * Helper macros to work around __VA_ARGS__ limitations in a C99 compliant
- * manner.
- */
-#define PMD_DRV_LOG_STRIP(a, b) a
-#define PMD_DRV_LOG_OPAREN (
-#define PMD_DRV_LOG_CPAREN )
-#define PMD_DRV_LOG_COMMA ,
-
-/* Return the file name part of a path. */
-static inline const char *
-pmd_drv_log_basename(const char *s)
-{
-	const char *n = s;
-
-	while (*n)
-		if (*(n++) == '/')
-			s = n;
-	return s;
-}
-
 extern int mlx5_logtype;
 
-#define PMD_DRV_LOG___(level, ...) \
-	rte_log(RTE_LOG_ ## level, \
-		mlx5_logtype, \
-		RTE_FMT(MLX5_DRIVER_NAME ": " \
-			RTE_FMT_HEAD(__VA_ARGS__,), \
-		RTE_FMT_TAIL(__VA_ARGS__,)))
-
-/*
- * When debugging is enabled (NDEBUG not defined), file, line and function
- * information replace the driver name (MLX5_DRIVER_NAME) in log messages.
- */
-#ifndef NDEBUG
-
-#define PMD_DRV_LOG__(level, ...) \
-	PMD_DRV_LOG___(level, "%s:%u: %s(): " __VA_ARGS__)
-#define PMD_DRV_LOG_(level, s, ...) \
-	PMD_DRV_LOG__(level, \
-		s "\n" PMD_DRV_LOG_COMMA \
-		pmd_drv_log_basename(__FILE__) PMD_DRV_LOG_COMMA \
-		__LINE__ PMD_DRV_LOG_COMMA \
-		__func__, \
-		__VA_ARGS__)
-
-#else /* NDEBUG */
-#define PMD_DRV_LOG__(level, ...) \
-	PMD_DRV_LOG___(level, __VA_ARGS__)
-#define PMD_DRV_LOG_(level, s, ...) \
-	PMD_DRV_LOG__(level, s "\n", __VA_ARGS__)
-
-#endif /* NDEBUG */
-
 /* Generic printf()-like logging macro with automatic line feed. */
 #define DRV_LOG(level, ...) \
-	PMD_DRV_LOG_(level, \
+	PMD_DRV_LOG_(level, mlx5_logtype, MLX5_DRIVER_NAME, \
 		__VA_ARGS__ PMD_DRV_LOG_STRIP PMD_DRV_LOG_OPAREN, \
 		PMD_DRV_LOG_CPAREN)
 
-/* claim_zero() does not perform any check when debugging is disabled. */
-#ifndef NDEBUG
-
-#define DEBUG(...) DRV_LOG(DEBUG, __VA_ARGS__)
-#define claim_zero(...) assert((__VA_ARGS__) == 0)
-#define claim_nonzero(...) assert((__VA_ARGS__) != 0)
-
-#else /* NDEBUG */
-
-#define DEBUG(...) (void)0
-#define claim_zero(...) (__VA_ARGS__)
-#define claim_nonzero(...) (__VA_ARGS__)
-
-#endif /* NDEBUG */
-
 #define INFO(...) DRV_LOG(INFO, __VA_ARGS__)
 #define WARN(...) DRV_LOG(WARNING, __VA_ARGS__)
 #define ERROR(...) DRV_LOG(ERR, __VA_ARGS__)
@@ -144,13 +80,6 @@
 	 (((val) & (from)) / ((from) / (to))) : \
 	 (((val) & (from)) * ((to) / (from))))
 
-/* Allocate a buffer on the stack and fill it with a printf format string. */
-#define MKSTR(name, ...) \
-	int mkstr_size_##name = snprintf(NULL, 0, "" __VA_ARGS__); \
-	char name[mkstr_size_##name + 1]; \
-	\
-	snprintf(name, sizeof(name), "" __VA_ARGS__)
-
 /**
  * Return logarithm of the nearest power of two above input value.
  *
diff --git a/drivers/net/mlx5/mlx5_vlan.c b/drivers/net/mlx5/mlx5_vlan.c
index feac0f1..b0fa31a 100644
--- a/drivers/net/mlx5/mlx5_vlan.c
+++ b/drivers/net/mlx5/mlx5_vlan.c
@@ -27,10 +27,11 @@
 #include <rte_ethdev_driver.h>
 #include <rte_common.h>
 
+#include <mlx5_glue.h>
+#include <mlx5_devx_cmds.h>
+
 #include "mlx5.h"
 #include "mlx5_autoconf.h"
-#include "mlx5_glue.h"
-#include "mlx5_devx_cmds.h"
 #include "mlx5_rxtx.h"
 #include "mlx5_utils.h"
 
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 15acf95..45f4cad 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -196,6 +196,7 @@ endif
 _LDLIBS-$(CONFIG_RTE_LIBRTE_LIO_PMD)        += -lrte_pmd_lio
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_MEMIF)      += -lrte_pmd_memif
 _LDLIBS-$(CONFIG_RTE_LIBRTE_MLX4_PMD)       += -lrte_pmd_mlx4
+_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX5_PMD)       += -lrte_common_mlx5
 _LDLIBS-$(CONFIG_RTE_LIBRTE_MLX5_PMD)       += -lrte_pmd_mlx5
 ifeq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
 _LDLIBS-$(CONFIG_RTE_LIBRTE_MLX4_PMD)       += -ldl
-- 
1.8.3.1
* [dpdk-dev] [PATCH v4 03/25] common/mlx5: share the mlx5 glue reference
  2020-01-29 12:38     ` [dpdk-dev] [PATCH v4 00/25] Introduce mlx5 common library Matan Azrad
  2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 01/25] net/mlx5: separate DevX commands interface Matan Azrad
  2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 02/25] drivers: introduce mlx5 common library Matan Azrad
@ 2020-01-29 12:38       ` Matan Azrad
  2020-01-30  8:10         ` Matan Azrad
  2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 04/25] common/mlx5: share mlx5 PCI device detection Matan Azrad
                         ` (22 subsequent siblings)
  25 siblings, 1 reply; 174+ messages in thread
From: Matan Azrad @ 2020-01-29 12:38 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
A new Mellanox vDPA PMD will be added to support vDPA operations on
Mellanox adapters.
Both the mlx5 PMD and the vDPA mlx5 PMD need to initialize the glue.
The glue must be initialized only once per process, so all the mlx5
PMDs using the glue should share the same glue object.
Move the glue initialization into the common/mlx5 library, where its
constructor runs it only once.
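The sharing scheme described above can be sketched roughly as follows. This is a minimal illustration and not the code from this patch: every `demo_*` name is hypothetical, the numeric constructor priorities are arbitrary values chosen only to order the two constructors, and `__attribute__((constructor(N)))` is a GCC/Clang extension standing in for the RTE_INIT_PRIO()/RTE_INIT() macros.

```c
#include <stddef.h>

/* Hypothetical stand-in for the glue function table. */
struct demo_glue {
	const char *version;
};

static const struct demo_glue demo_glue_impl = { .version = "demo" };

/* Shared pointer: stays NULL unless the common constructor succeeds. */
const struct demo_glue *demo_glue;
/* Counts constructor runs, only to make the "once" property visible. */
int demo_glue_init_count;
/* Set by the driver constructor when it decides to register. */
int demo_driver_registered;

/* Common library constructor: lower priority number, so it runs first. */
__attribute__((constructor(120)))
static void
demo_common_init(void)
{
	demo_glue = &demo_glue_impl;
	++demo_glue_init_count;
}

/* Driver constructor: runs later and only checks the shared pointer. */
__attribute__((constructor(65535)))
static void
demo_driver_init(void)
{
	if (demo_glue != NULL)
		demo_driver_registered = 1;
}
```

With this layout a second driver would add its own late constructor and test the same pointer, so the initialization work itself is never repeated.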
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/common/mlx5/mlx5_common.c | 173 +++++++++++++++++++++++++++++++++++++-
 drivers/net/mlx5/Makefile         |   9 --
 drivers/net/mlx5/meson.build      |   4 -
 drivers/net/mlx5/mlx5.c           | 172 +------------------------------------
 4 files changed, 173 insertions(+), 185 deletions(-)
diff --git a/drivers/common/mlx5/mlx5_common.c b/drivers/common/mlx5/mlx5_common.c
index 14ebd30..9c88a63 100644
--- a/drivers/common/mlx5/mlx5_common.c
+++ b/drivers/common/mlx5/mlx5_common.c
@@ -2,16 +2,185 @@
  * Copyright 2019 Mellanox Technologies, Ltd
  */
 
+#include <dlfcn.h>
+#include <unistd.h>
+#include <string.h>
+
+#include <rte_errno.h>
+
 #include "mlx5_common.h"
+#include "mlx5_common_utils.h"
+#include "mlx5_glue.h"
 
 
 int mlx5_common_logtype;
 
 
-RTE_INIT(rte_mlx5_common_pmd_init)
+#ifdef RTE_IBVERBS_LINK_DLOPEN
+
+/**
+ * Suffix RTE_EAL_PMD_PATH with "-glue".
+ *
+ * This function performs a sanity check on RTE_EAL_PMD_PATH before
+ * suffixing its last component.
+ *
+ * @param[out] buf
+ *   Output buffer, should be large enough otherwise NULL is returned.
+ * @param size
+ *   Size of @p buf.
+ *
+ * @return
+ *   Pointer to @p buf or @p NULL in case suffix cannot be appended.
+ */
+static char *
+mlx5_glue_path(char *buf, size_t size)
+{
+	static const char *const bad[] = { "/", ".", "..", NULL };
+	const char *path = RTE_EAL_PMD_PATH;
+	size_t len = strlen(path);
+	size_t off;
+	int i;
+
+	while (len && path[len - 1] == '/')
+		--len;
+	for (off = len; off && path[off - 1] != '/'; --off)
+		;
+	for (i = 0; bad[i]; ++i)
+		if (!strncmp(path + off, bad[i], (int)(len - off)))
+			goto error;
+	i = snprintf(buf, size, "%.*s-glue", (int)len, path);
+	if (i == -1 || (size_t)i >= size)
+		goto error;
+	return buf;
+error:
+	RTE_LOG(ERR, PMD, "unable to append \"-glue\" to last component of"
+		" RTE_EAL_PMD_PATH (\"" RTE_EAL_PMD_PATH "\"), please"
+		" re-configure DPDK\n");
+	return NULL;
+}
+#endif
+
+/**
+ * Initialization routine for run-time dependency on rdma-core.
+ */
+RTE_INIT_PRIO(mlx5_glue_init, CLASS)
 {
-	/* Initialize driver log type. */
+	void *handle = NULL;
+
+	/* Initialize common log type. */
 	mlx5_common_logtype = rte_log_register("pmd.common.mlx5");
 	if (mlx5_common_logtype >= 0)
 		rte_log_set_level(mlx5_common_logtype, RTE_LOG_NOTICE);
+	/*
+	 * RDMAV_HUGEPAGES_SAFE tells ibv_fork_init() we intend to use
+	 * huge pages. Calling ibv_fork_init() during init allows
+	 * applications to use fork() safely for purposes other than
+	 * using this PMD, which is not supported in forked processes.
+	 */
+	setenv("RDMAV_HUGEPAGES_SAFE", "1", 1);
+	/* Match the size of Rx completion entry to the size of a cacheline. */
+	if (RTE_CACHE_LINE_SIZE == 128)
+		setenv("MLX5_CQE_SIZE", "128", 0);
+	/*
+	 * MLX5_DEVICE_FATAL_CLEANUP tells ibv_destroy functions to
+	 * cleanup all the Verbs resources even when the device was removed.
+	 */
+	setenv("MLX5_DEVICE_FATAL_CLEANUP", "1", 1);
+	/* Load the glue library when rdma-core is linked through dlopen(). */
+#ifdef RTE_IBVERBS_LINK_DLOPEN
+	char glue_path[sizeof(RTE_EAL_PMD_PATH) - 1 + sizeof("-glue")];
+	const char *path[] = {
+		/*
+		 * A basic security check is necessary before trusting
+		 * MLX5_GLUE_PATH, which may override RTE_EAL_PMD_PATH.
+		 */
+		(geteuid() == getuid() && getegid() == getgid() ?
+		 getenv("MLX5_GLUE_PATH") : NULL),
+		/*
+		 * When RTE_EAL_PMD_PATH is set, use its glue-suffixed
+		 * variant, otherwise let dlopen() look up libraries on its
+		 * own.
+		 */
+		(*RTE_EAL_PMD_PATH ?
+		 mlx5_glue_path(glue_path, sizeof(glue_path)) : ""),
+	};
+	unsigned int i = 0;
+	void **sym;
+	const char *dlmsg;
+
+	while (!handle && i != RTE_DIM(path)) {
+		const char *end;
+		size_t len;
+		int ret;
+
+		if (!path[i]) {
+			++i;
+			continue;
+		}
+		end = strpbrk(path[i], ":;");
+		if (!end)
+			end = path[i] + strlen(path[i]);
+		len = end - path[i];
+		ret = 0;
+		do {
+			char name[ret + 1];
+
+			ret = snprintf(name, sizeof(name), "%.*s%s" MLX5_GLUE,
+				       (int)len, path[i],
+				       (!len || *(end - 1) == '/') ? "" : "/");
+			if (ret == -1)
+				break;
+			if (sizeof(name) != (size_t)ret + 1)
+				continue;
+			DRV_LOG(DEBUG, "Looking for rdma-core glue as "
+				"\"%s\"", name);
+			handle = dlopen(name, RTLD_LAZY);
+			break;
+		} while (1);
+		path[i] = end + 1;
+		if (!*end)
+			++i;
+	}
+	if (!handle) {
+		rte_errno = EINVAL;
+		dlmsg = dlerror();
+		if (dlmsg)
+			DRV_LOG(WARNING, "Cannot load glue library: %s", dlmsg);
+		goto glue_error;
+	}
+	sym = dlsym(handle, "mlx5_glue");
+	if (!sym || !*sym) {
+		rte_errno = EINVAL;
+		dlmsg = dlerror();
+		if (dlmsg)
+			DRV_LOG(ERR, "Cannot resolve glue symbol: %s", dlmsg);
+		goto glue_error;
+	}
+	mlx5_glue = *sym;
+#endif /* RTE_IBVERBS_LINK_DLOPEN */
+#ifndef NDEBUG
+	/* Glue structure must not contain any NULL pointers. */
+	{
+		unsigned int i;
+
+		for (i = 0; i != sizeof(*mlx5_glue) / sizeof(void *); ++i)
+			assert(((const void *const *)mlx5_glue)[i]);
+	}
+#endif
+	if (strcmp(mlx5_glue->version, MLX5_GLUE_VERSION)) {
+		rte_errno = EINVAL;
+		DRV_LOG(ERR, "rdma-core glue \"%s\" mismatch: \"%s\" is "
+			"required", mlx5_glue->version, MLX5_GLUE_VERSION);
+		goto glue_error;
+	}
+	mlx5_glue->fork_init();
+	return;
+glue_error:
+	if (handle)
+		dlclose(handle);
+	DRV_LOG(WARNING, "Cannot initialize MLX5 common due to missing"
+		" run-time dependency on rdma-core libraries (libibverbs,"
+		" libmlx5)");
+	mlx5_glue = NULL;
+	return;
 }
diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index a9558ca..dc6b3c8 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -6,15 +6,6 @@ include $(RTE_SDK)/mk/rte.vars.mk
 
 # Library name.
 LIB = librte_pmd_mlx5.a
-LIB_GLUE = $(LIB_GLUE_BASE).$(LIB_GLUE_VERSION)
-LIB_GLUE_BASE = librte_pmd_mlx5_glue.so
-LIB_GLUE_VERSION = 20.02.0
-
-ifeq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
-CFLAGS += -DMLX5_GLUE='"$(LIB_GLUE)"'
-CFLAGS += -DMLX5_GLUE_VERSION='"$(LIB_GLUE_VERSION)"'
-LDLIBS += -ldl
-endif
 
 # Sources.
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5.c
diff --git a/drivers/net/mlx5/meson.build b/drivers/net/mlx5/meson.build
index f6d0db9..e10ef3a 100644
--- a/drivers/net/mlx5/meson.build
+++ b/drivers/net/mlx5/meson.build
@@ -8,10 +8,6 @@ if not is_linux
 	subdir_done()
 endif
 
-LIB_GLUE_BASE = 'librte_pmd_mlx5_glue.so'
-LIB_GLUE_VERSION = '20.02.0'
-LIB_GLUE = LIB_GLUE_BASE + '.' + LIB_GLUE_VERSION
-
 allow_experimental_apis = true
 deps += ['hash', 'common_mlx5']
 sources = files(
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 7cf357d..8fbe826 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -7,7 +7,6 @@
 #include <unistd.h>
 #include <string.h>
 #include <assert.h>
-#include <dlfcn.h>
 #include <stdint.h>
 #include <stdlib.h>
 #include <errno.h>
@@ -3505,138 +3504,6 @@ struct mlx5_flow_id_pool *
 		     RTE_PCI_DRV_PROBE_AGAIN,
 };
 
-#ifdef RTE_IBVERBS_LINK_DLOPEN
-
-/**
- * Suffix RTE_EAL_PMD_PATH with "-glue".
- *
- * This function performs a sanity check on RTE_EAL_PMD_PATH before
- * suffixing its last component.
- *
- * @param buf[out]
- *   Output buffer, should be large enough otherwise NULL is returned.
- * @param size
- *   Size of @p out.
- *
- * @return
- *   Pointer to @p buf or @p NULL in case suffix cannot be appended.
- */
-static char *
-mlx5_glue_path(char *buf, size_t size)
-{
-	static const char *const bad[] = { "/", ".", "..", NULL };
-	const char *path = RTE_EAL_PMD_PATH;
-	size_t len = strlen(path);
-	size_t off;
-	int i;
-
-	while (len && path[len - 1] == '/')
-		--len;
-	for (off = len; off && path[off - 1] != '/'; --off)
-		;
-	for (i = 0; bad[i]; ++i)
-		if (!strncmp(path + off, bad[i], (int)(len - off)))
-			goto error;
-	i = snprintf(buf, size, "%.*s-glue", (int)len, path);
-	if (i == -1 || (size_t)i >= size)
-		goto error;
-	return buf;
-error:
-	DRV_LOG(ERR,
-		"unable to append \"-glue\" to last component of"
-		" RTE_EAL_PMD_PATH (\"" RTE_EAL_PMD_PATH "\"),"
-		" please re-configure DPDK");
-	return NULL;
-}
-
-/**
- * Initialization routine for run-time dependency on rdma-core.
- */
-static int
-mlx5_glue_init(void)
-{
-	char glue_path[sizeof(RTE_EAL_PMD_PATH) - 1 + sizeof("-glue")];
-	const char *path[] = {
-		/*
-		 * A basic security check is necessary before trusting
-		 * MLX5_GLUE_PATH, which may override RTE_EAL_PMD_PATH.
-		 */
-		(geteuid() == getuid() && getegid() == getgid() ?
-		 getenv("MLX5_GLUE_PATH") : NULL),
-		/*
-		 * When RTE_EAL_PMD_PATH is set, use its glue-suffixed
-		 * variant, otherwise let dlopen() look up libraries on its
-		 * own.
-		 */
-		(*RTE_EAL_PMD_PATH ?
-		 mlx5_glue_path(glue_path, sizeof(glue_path)) : ""),
-	};
-	unsigned int i = 0;
-	void *handle = NULL;
-	void **sym;
-	const char *dlmsg;
-
-	while (!handle && i != RTE_DIM(path)) {
-		const char *end;
-		size_t len;
-		int ret;
-
-		if (!path[i]) {
-			++i;
-			continue;
-		}
-		end = strpbrk(path[i], ":;");
-		if (!end)
-			end = path[i] + strlen(path[i]);
-		len = end - path[i];
-		ret = 0;
-		do {
-			char name[ret + 1];
-
-			ret = snprintf(name, sizeof(name), "%.*s%s" MLX5_GLUE,
-				       (int)len, path[i],
-				       (!len || *(end - 1) == '/') ? "" : "/");
-			if (ret == -1)
-				break;
-			if (sizeof(name) != (size_t)ret + 1)
-				continue;
-			DRV_LOG(DEBUG, "looking for rdma-core glue as \"%s\"",
-				name);
-			handle = dlopen(name, RTLD_LAZY);
-			break;
-		} while (1);
-		path[i] = end + 1;
-		if (!*end)
-			++i;
-	}
-	if (!handle) {
-		rte_errno = EINVAL;
-		dlmsg = dlerror();
-		if (dlmsg)
-			DRV_LOG(WARNING, "cannot load glue library: %s", dlmsg);
-		goto glue_error;
-	}
-	sym = dlsym(handle, "mlx5_glue");
-	if (!sym || !*sym) {
-		rte_errno = EINVAL;
-		dlmsg = dlerror();
-		if (dlmsg)
-			DRV_LOG(ERR, "cannot resolve glue symbol: %s", dlmsg);
-		goto glue_error;
-	}
-	mlx5_glue = *sym;
-	return 0;
-glue_error:
-	if (handle)
-		dlclose(handle);
-	DRV_LOG(WARNING,
-		"cannot initialize PMD due to missing run-time dependency on"
-		" rdma-core libraries (libibverbs, libmlx5)");
-	return -rte_errno;
-}
-
-#endif
-
 /**
  * Driver initialization routine.
  */
@@ -3651,43 +3518,8 @@ struct mlx5_flow_id_pool *
 	mlx5_set_ptype_table();
 	mlx5_set_cksum_table();
 	mlx5_set_swp_types_table();
-	/*
-	 * RDMAV_HUGEPAGES_SAFE tells ibv_fork_init() we intend to use
-	 * huge pages. Calling ibv_fork_init() during init allows
-	 * applications to use fork() safely for purposes other than
-	 * using this PMD, which is not supported in forked processes.
-	 */
-	setenv("RDMAV_HUGEPAGES_SAFE", "1", 1);
-	/* Match the size of Rx completion entry to the size of a cacheline. */
-	if (RTE_CACHE_LINE_SIZE == 128)
-		setenv("MLX5_CQE_SIZE", "128", 0);
-	/*
-	 * MLX5_DEVICE_FATAL_CLEANUP tells ibv_destroy functions to
-	 * cleanup all the Verbs resources even when the device was removed.
-	 */
-	setenv("MLX5_DEVICE_FATAL_CLEANUP", "1", 1);
-#ifdef RTE_IBVERBS_LINK_DLOPEN
-	if (mlx5_glue_init())
-		return;
-	assert(mlx5_glue);
-#endif
-#ifndef NDEBUG
-	/* Glue structure must not contain any NULL pointers. */
-	{
-		unsigned int i;
-
-		for (i = 0; i != sizeof(*mlx5_glue) / sizeof(void *); ++i)
-			assert(((const void *const *)mlx5_glue)[i]);
-	}
-#endif
-	if (strcmp(mlx5_glue->version, MLX5_GLUE_VERSION)) {
-		DRV_LOG(ERR,
-			"rdma-core glue \"%s\" mismatch: \"%s\" is required",
-			mlx5_glue->version, MLX5_GLUE_VERSION);
-		return;
-	}
-	mlx5_glue->fork_init();
-	rte_pci_register(&mlx5_driver);
+	if (mlx5_glue)
+		rte_pci_register(&mlx5_driver);
 }
 
 RTE_PMD_EXPORT_NAME(net_mlx5, __COUNTER__);
-- 
1.8.3.1
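For readers following the mlx5_glue_path() move in the patch above, the "-glue" suffixing can be illustrated with a stand-alone sketch. It assumes the same intent as the patch (strip trailing slashes, refuse an empty, "." or ".." last component, then append the suffix) but is not the patch code: `demo_glue_path_suffix` is a hypothetical name and the path is passed as an argument instead of being taken from RTE_EAL_PMD_PATH.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-alone variant of the "-glue" suffixing logic. */
static char *
demo_glue_path_suffix(const char *path, char *buf, size_t size)
{
	size_t len = strlen(path);
	size_t off;
	int n;

	/* Drop trailing slashes so the suffix lands on the last component. */
	while (len && path[len - 1] == '/')
		--len;
	/* Find where the last component starts. */
	for (off = len; off && path[off - 1] != '/'; --off)
		;
	/* Reject an empty last component (e.g. "/"), "." and "..". */
	if (len == off ||
	    (len - off == 1 && path[off] == '.') ||
	    (len - off == 2 && path[off] == '.' && path[off + 1] == '.'))
		return NULL;
	/* snprintf() reports the needed length, which exposes truncation. */
	n = snprintf(buf, size, "%.*s-glue", (int)len, path);
	if (n < 0 || (size_t)n >= size)
		return NULL;
	return buf;
}
```

A caller would pass a buffer at least as large as the path plus the suffix and treat a NULL return as "do not search a glue directory".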
* [dpdk-dev] [PATCH v4 04/25] common/mlx5: share mlx5 PCI device detection
  2020-01-29 12:38     ` [dpdk-dev] [PATCH v4 00/25] Introduce mlx5 common library Matan Azrad
                         ` (2 preceding siblings ...)
  2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 03/25] common/mlx5: share the mlx5 glue reference Matan Azrad
@ 2020-01-29 12:38       ` Matan Azrad
  2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 05/25] common/mlx5: share mlx5 devices information Matan Azrad
                         ` (21 subsequent siblings)
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-29 12:38 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
Move the detection of an IB device's PCI address from the mlx5 PMD to
the common code.
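The core of that detection is parsing the PCI_SLOT_NAME entry of the sysfs uevent file. A minimal sketch of just the parsing step, assuming a line has already been read from the file: `struct demo_pci_addr` is a hypothetical stand-in for struct rte_pci_addr and `demo_parse_pci_slot_name` is not a function from this series.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for struct rte_pci_addr. */
struct demo_pci_addr {
	uint32_t domain;
	uint8_t bus;
	uint8_t devid;
	uint8_t function;
};

/* Return 0 when the line holds a full PCI_SLOT_NAME entry, -1 otherwise. */
static int
demo_parse_pci_slot_name(const char *line, struct demo_pci_addr *addr)
{
	/* All four fields are hexadecimal: domain:bus:device.function. */
	int n = sscanf(line,
		       "PCI_SLOT_NAME=%" SCNx32 ":%" SCNx8 ":%" SCNx8 ".%" SCNx8,
		       &addr->domain, &addr->bus, &addr->devid,
		       &addr->function);

	return n == 4 ? 0 : -1;
}
```

Lines that carry other uevent keys fail on the literal prefix, so a caller can simply try every line until one parses.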
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/common/mlx5/Makefile                    |  2 +-
 drivers/common/mlx5/mlx5_common.c               | 55 +++++++++++++++++++++++++
 drivers/common/mlx5/mlx5_common.h               |  4 ++
 drivers/common/mlx5/rte_common_mlx5_version.map |  2 +
 drivers/net/mlx5/mlx5.c                         |  1 +
 drivers/net/mlx5/mlx5.h                         |  2 -
 drivers/net/mlx5/mlx5_ethdev.c                  | 53 +-----------------------
 drivers/net/mlx5/mlx5_rxtx.c                    |  1 +
 drivers/net/mlx5/mlx5_stats.c                   |  3 ++
 9 files changed, 68 insertions(+), 55 deletions(-)
diff --git a/drivers/common/mlx5/Makefile b/drivers/common/mlx5/Makefile
index b94d3c0..66585b2 100644
--- a/drivers/common/mlx5/Makefile
+++ b/drivers/common/mlx5/Makefile
@@ -41,7 +41,7 @@ else
 LDLIBS += -libverbs -lmlx5
 endif
 
-LDLIBS += -lrte_eal
+LDLIBS += -lrte_eal -lrte_pci
 
 # A few warnings cannot be avoided in external headers.
 CFLAGS += -Wno-error=cast-qual -DNDEBUG -UPEDANTIC
diff --git a/drivers/common/mlx5/mlx5_common.c b/drivers/common/mlx5/mlx5_common.c
index 9c88a63..2381208 100644
--- a/drivers/common/mlx5/mlx5_common.c
+++ b/drivers/common/mlx5/mlx5_common.c
@@ -5,6 +5,9 @@
 #include <dlfcn.h>
 #include <unistd.h>
 #include <string.h>
+#include <stdio.h>
+
+#include <rte_errno.h>
 
 #include <rte_errno.h>
 
@@ -16,6 +19,58 @@
 int mlx5_common_logtype;
 
 
+/**
+ * Get PCI information by sysfs device path.
+ *
+ * @param dev_path
+ *   Pointer to device sysfs folder name.
+ * @param[out] pci_addr
+ *   PCI bus address output buffer.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_dev_to_pci_addr(const char *dev_path,
+		     struct rte_pci_addr *pci_addr)
+{
+	FILE *file;
+	char line[32];
+	MKSTR(path, "%s/device/uevent", dev_path);
+
+	file = fopen(path, "rb");
+	if (file == NULL) {
+		rte_errno = errno;
+		return -rte_errno;
+	}
+	while (fgets(line, sizeof(line), file) == line) {
+		size_t len = strlen(line);
+		int ret;
+
+		/* Truncate long lines. */
+		if (len == (sizeof(line) - 1))
+			while (line[(len - 1)] != '\n') {
+				ret = fgetc(file);
+				if (ret == EOF)
+					break;
+				line[(len - 1)] = ret;
+			}
+		/* Extract information. */
+		if (sscanf(line,
+			   "PCI_SLOT_NAME="
+			   "%" SCNx32 ":%" SCNx8 ":%" SCNx8 ".%" SCNx8 "\n",
+			   &pci_addr->domain,
+			   &pci_addr->bus,
+			   &pci_addr->devid,
+			   &pci_addr->function) == 4) {
+			ret = 0;
+			break;
+		}
+	}
+	fclose(file);
+	return 0;
+}
+
 #ifdef RTE_IBVERBS_LINK_DLOPEN
 
 /**
diff --git a/drivers/common/mlx5/mlx5_common.h b/drivers/common/mlx5/mlx5_common.h
index 9f10def..107ab8d 100644
--- a/drivers/common/mlx5/mlx5_common.h
+++ b/drivers/common/mlx5/mlx5_common.h
@@ -6,7 +6,9 @@
 #define RTE_PMD_MLX5_COMMON_H_
 
 #include <assert.h>
+#include <stdio.h>
 
+#include <rte_pci.h>
 #include <rte_log.h>
 
 
@@ -84,4 +86,6 @@
 	\
 	snprintf(name, sizeof(name), "" __VA_ARGS__)
 
+int mlx5_dev_to_pci_addr(const char *dev_path, struct rte_pci_addr *pci_addr);
+
 #endif /* RTE_PMD_MLX5_COMMON_H_ */
diff --git a/drivers/common/mlx5/rte_common_mlx5_version.map b/drivers/common/mlx5/rte_common_mlx5_version.map
index e4f85e2..0c01172 100644
--- a/drivers/common/mlx5/rte_common_mlx5_version.map
+++ b/drivers/common/mlx5/rte_common_mlx5_version.map
@@ -17,4 +17,6 @@ DPDK_20.02 {
 	mlx5_devx_cmd_qp_query_tis_td;
 	mlx5_devx_cmd_query_hca_attr;
 	mlx5_devx_get_out_command_status;
+
+	mlx5_dev_to_pci_addr;
 };
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 8fbe826..d0fa2da 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -39,6 +39,7 @@
 
 #include <mlx5_glue.h>
 #include <mlx5_devx_cmds.h>
+#include <mlx5_common.h>
 
 #include "mlx5_defs.h"
 #include "mlx5.h"
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 872fccb..261a8fc 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -655,8 +655,6 @@ int mlx5_dev_get_flow_ctrl(struct rte_eth_dev *dev,
 			   struct rte_eth_fc_conf *fc_conf);
 int mlx5_dev_set_flow_ctrl(struct rte_eth_dev *dev,
 			   struct rte_eth_fc_conf *fc_conf);
-int mlx5_dev_to_pci_addr(const char *dev_path,
-			 struct rte_pci_addr *pci_addr);
 void mlx5_dev_link_status_handler(void *arg);
 void mlx5_dev_interrupt_handler(void *arg);
 void mlx5_dev_interrupt_handler_devx(void *arg);
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index eddf888..2628e64 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -38,6 +38,7 @@
 
 #include <mlx5_glue.h>
 #include <mlx5_devx_cmds.h>
+#include <mlx5_common.h>
 
 #include "mlx5.h"
 #include "mlx5_rxtx.h"
@@ -1212,58 +1213,6 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 }
 
 /**
- * Get PCI information by sysfs device path.
- *
- * @param dev_path
- *   Pointer to device sysfs folder name.
- * @param[out] pci_addr
- *   PCI bus address output buffer.
- *
- * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
- */
-int
-mlx5_dev_to_pci_addr(const char *dev_path,
-		     struct rte_pci_addr *pci_addr)
-{
-	FILE *file;
-	char line[32];
-	MKSTR(path, "%s/device/uevent", dev_path);
-
-	file = fopen(path, "rb");
-	if (file == NULL) {
-		rte_errno = errno;
-		return -rte_errno;
-	}
-	while (fgets(line, sizeof(line), file) == line) {
-		size_t len = strlen(line);
-		int ret;
-
-		/* Truncate long lines. */
-		if (len == (sizeof(line) - 1))
-			while (line[(len - 1)] != '\n') {
-				ret = fgetc(file);
-				if (ret == EOF)
-					break;
-				line[(len - 1)] = ret;
-			}
-		/* Extract information. */
-		if (sscanf(line,
-			   "PCI_SLOT_NAME="
-			   "%" SCNx32 ":%" SCNx8 ":%" SCNx8 ".%" SCNx8 "\n",
-			   &pci_addr->domain,
-			   &pci_addr->bus,
-			   &pci_addr->devid,
-			   &pci_addr->function) == 4) {
-			ret = 0;
-			break;
-		}
-	}
-	fclose(file);
-	return 0;
-}
-
-/**
  * Handle asynchronous removal event for entire multiport device.
  *
  * @param sh
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index d8f6671..b14ae31 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -30,6 +30,7 @@
 
 #include <mlx5_devx_cmds.h>
 #include <mlx5_prm.h>
+#include <mlx5_common.h>
 
 #include "mlx5_defs.h"
 #include "mlx5.h"
diff --git a/drivers/net/mlx5/mlx5_stats.c b/drivers/net/mlx5/mlx5_stats.c
index 0ed7170..4c69e77 100644
--- a/drivers/net/mlx5/mlx5_stats.c
+++ b/drivers/net/mlx5/mlx5_stats.c
@@ -13,10 +13,13 @@
 #include <rte_common.h>
 #include <rte_malloc.h>
 
+#include <mlx5_common.h>
+
 #include "mlx5_defs.h"
 #include "mlx5.h"
 #include "mlx5_rxtx.h"
 
+
 static const struct mlx5_counter_ctrl mlx5_counters_init[] = {
 	{
 		.dpdk_name = "rx_port_unicast_bytes",
-- 
1.8.3.1
* [dpdk-dev] [PATCH v4 05/25] common/mlx5: share mlx5 devices information
  2020-01-29 12:38     ` [dpdk-dev] [PATCH v4 00/25] Introduce mlx5 common library Matan Azrad
                         ` (3 preceding siblings ...)
  2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 04/25] common/mlx5: share mlx5 PCI device detection Matan Azrad
@ 2020-01-29 12:38       ` Matan Azrad
  2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 06/25] common/mlx5: share CQ entry check Matan Azrad
                         ` (20 subsequent siblings)
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-29 12:38 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
Move the vendor information, vendor ID and device IDs from net/mlx5 PMD
to the common mlx5 file.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/common/mlx5/mlx5_common.h | 21 +++++++++++++++++++++
 drivers/net/mlx5/mlx5.h           | 21 ---------------------
 drivers/net/mlx5/mlx5_txq.c       |  1 +
 3 files changed, 22 insertions(+), 21 deletions(-)
diff --git a/drivers/common/mlx5/mlx5_common.h b/drivers/common/mlx5/mlx5_common.h
index 107ab8d..0f57a27 100644
--- a/drivers/common/mlx5/mlx5_common.h
+++ b/drivers/common/mlx5/mlx5_common.h
@@ -86,6 +86,27 @@
 	\
 	snprintf(name, sizeof(name), "" __VA_ARGS__)
 
+enum {
+	PCI_VENDOR_ID_MELLANOX = 0x15b3,
+};
+
+enum {
+	PCI_DEVICE_ID_MELLANOX_CONNECTX4 = 0x1013,
+	PCI_DEVICE_ID_MELLANOX_CONNECTX4VF = 0x1014,
+	PCI_DEVICE_ID_MELLANOX_CONNECTX4LX = 0x1015,
+	PCI_DEVICE_ID_MELLANOX_CONNECTX4LXVF = 0x1016,
+	PCI_DEVICE_ID_MELLANOX_CONNECTX5 = 0x1017,
+	PCI_DEVICE_ID_MELLANOX_CONNECTX5VF = 0x1018,
+	PCI_DEVICE_ID_MELLANOX_CONNECTX5EX = 0x1019,
+	PCI_DEVICE_ID_MELLANOX_CONNECTX5EXVF = 0x101a,
+	PCI_DEVICE_ID_MELLANOX_CONNECTX5BF = 0xa2d2,
+	PCI_DEVICE_ID_MELLANOX_CONNECTX5BFVF = 0xa2d3,
+	PCI_DEVICE_ID_MELLANOX_CONNECTX6 = 0x101b,
+	PCI_DEVICE_ID_MELLANOX_CONNECTX6VF = 0x101c,
+	PCI_DEVICE_ID_MELLANOX_CONNECTX6DX = 0x101d,
+	PCI_DEVICE_ID_MELLANOX_CONNECTX6DXVF = 0x101e,
+};
+
 int mlx5_dev_to_pci_addr(const char *dev_path, struct rte_pci_addr *pci_addr);
 
 #endif /* RTE_PMD_MLX5_COMMON_H_ */
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 261a8fc..3daf0db 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -41,27 +41,6 @@
 #include "mlx5_mr.h"
 #include "mlx5_autoconf.h"
 
-enum {
-	PCI_VENDOR_ID_MELLANOX = 0x15b3,
-};
-
-enum {
-	PCI_DEVICE_ID_MELLANOX_CONNECTX4 = 0x1013,
-	PCI_DEVICE_ID_MELLANOX_CONNECTX4VF = 0x1014,
-	PCI_DEVICE_ID_MELLANOX_CONNECTX4LX = 0x1015,
-	PCI_DEVICE_ID_MELLANOX_CONNECTX4LXVF = 0x1016,
-	PCI_DEVICE_ID_MELLANOX_CONNECTX5 = 0x1017,
-	PCI_DEVICE_ID_MELLANOX_CONNECTX5VF = 0x1018,
-	PCI_DEVICE_ID_MELLANOX_CONNECTX5EX = 0x1019,
-	PCI_DEVICE_ID_MELLANOX_CONNECTX5EXVF = 0x101a,
-	PCI_DEVICE_ID_MELLANOX_CONNECTX5BF = 0xa2d2,
-	PCI_DEVICE_ID_MELLANOX_CONNECTX5BFVF = 0xa2d3,
-	PCI_DEVICE_ID_MELLANOX_CONNECTX6 = 0x101b,
-	PCI_DEVICE_ID_MELLANOX_CONNECTX6VF = 0x101c,
-	PCI_DEVICE_ID_MELLANOX_CONNECTX6DX = 0x101d,
-	PCI_DEVICE_ID_MELLANOX_CONNECTX6DXVF = 0x101e,
-};
-
 /* Request types for IPC. */
 enum mlx5_mp_req_type {
 	MLX5_MP_REQ_VERBS_CMD_FD = 1,
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 1d2ba8a..7bff769 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -30,6 +30,7 @@
 
 #include <mlx5_glue.h>
 #include <mlx5_devx_cmds.h>
+#include <mlx5_common.h>
 
 #include "mlx5_defs.h"
 #include "mlx5_utils.h"
-- 
1.8.3.1
* [dpdk-dev] [PATCH v4 06/25] common/mlx5: share CQ entry check
  2020-01-29 12:38     ` [dpdk-dev] [PATCH v4 00/25] Introduce mlx5 common library Matan Azrad
                         ` (4 preceding siblings ...)
  2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 05/25] common/mlx5: share mlx5 devices information Matan Azrad
@ 2020-01-29 12:38       ` Matan Azrad
  2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 07/25] common/mlx5: add query vDPA DevX capabilities Matan Azrad
                         ` (19 subsequent siblings)
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-29 12:38 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
The CQE has an owner bit that indicates whether it is under SW or HW
control.
Share the CQE check among all the mlx5 drivers.
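
For readers unfamiliar with the ownership scheme, the check moved by this
patch can be sketched in isolation as below. This is only an illustration:
the demo_* names are hypothetical, the bit layout of op_own (owner in bit 0,
opcode in the upper nibble) and the opcode values are assumptions standing in
for the definitions in mlx5_prm.h, and the read barrier used by the real
check_cqe() is left out.

```c
#include <stdint.h>

/* Illustrative stand-ins; the real values live in mlx5_prm.h. */
enum demo_cqe_status {
	DEMO_CQE_SW_OWN = -1,
	DEMO_CQE_HW_OWN = -2,
	DEMO_CQE_ERR = -3,
};

#define DEMO_OWNER(op_own) ((op_own) & 0x1)          /* assumed layout */
#define DEMO_OPCODE(op_own) (((op_own) & 0xf0) >> 4) /* assumed layout */
#define DEMO_OP_INVALID 0xf                          /* assumed value */
#define DEMO_OP_RESP_ERR 0xe                         /* assumed value */
#define DEMO_OP_REQ_ERR 0xd                          /* assumed value */

static enum demo_cqe_status
demo_check_cqe(uint8_t op_own, uint16_t cqes_n, uint16_t ci)
{
	/* cqes_n is a power of two, so the AND isolates the wrap parity. */
	const uint16_t parity = ci & cqes_n;
	const uint8_t owner = DEMO_OWNER(op_own);
	const uint8_t opcode = DEMO_OPCODE(op_own);

	/* Owner bit must match the parity of the current pass. */
	if (owner != !!parity || opcode == DEMO_OP_INVALID)
		return DEMO_CQE_HW_OWN;
	if (opcode == DEMO_OP_RESP_ERR || opcode == DEMO_OP_REQ_ERR)
		return DEMO_CQE_ERR;
	return DEMO_CQE_SW_OWN;
}
```

With a queue of 8 entries, an entry whose owner bit is 0 is valid for
consumer indexes 0..7 and becomes HW-owned again once the index wraps to 8.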
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/common/mlx5/mlx5_common.h | 41 +++++++++++++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_rxtx.h      | 39 +------------------------------------
 2 files changed, 42 insertions(+), 38 deletions(-)
diff --git a/drivers/common/mlx5/mlx5_common.h b/drivers/common/mlx5/mlx5_common.h
index 0f57a27..9d464d4 100644
--- a/drivers/common/mlx5/mlx5_common.h
+++ b/drivers/common/mlx5/mlx5_common.h
@@ -9,8 +9,11 @@
 #include <stdio.h>
 
 #include <rte_pci.h>
+#include <rte_atomic.h>
 #include <rte_log.h>
 
+#include "mlx5_prm.h"
+
 
 /*
  * Helper macros to work around __VA_ARGS__ limitations in a C99 compliant
@@ -107,6 +110,44 @@ enum {
 	PCI_DEVICE_ID_MELLANOX_CONNECTX6DXVF = 0x101e,
 };
 
+/* CQE status. */
+enum mlx5_cqe_status {
+	MLX5_CQE_STATUS_SW_OWN = -1,
+	MLX5_CQE_STATUS_HW_OWN = -2,
+	MLX5_CQE_STATUS_ERR = -3,
+};
+
+/**
+ * Check whether CQE is valid.
+ *
+ * @param cqe
+ *   Pointer to CQE.
+ * @param cqes_n
+ *   Size of completion queue.
+ * @param ci
+ *   Consumer index.
+ *
+ * @return
+ *   The CQE status.
+ */
+static __rte_always_inline enum mlx5_cqe_status
+check_cqe(volatile struct mlx5_cqe *cqe, const uint16_t cqes_n,
+	  const uint16_t ci)
+{
+	const uint16_t idx = ci & cqes_n;
+	const uint8_t op_own = cqe->op_own;
+	const uint8_t op_owner = MLX5_CQE_OWNER(op_own);
+	const uint8_t op_code = MLX5_CQE_OPCODE(op_own);
+
+	if (unlikely((op_owner != (!!(idx))) || (op_code == MLX5_CQE_INVALID)))
+		return MLX5_CQE_STATUS_HW_OWN;
+	rte_cio_rmb();
+	if (unlikely(op_code == MLX5_CQE_RESP_ERR ||
+		     op_code == MLX5_CQE_REQ_ERR))
+		return MLX5_CQE_STATUS_ERR;
+	return MLX5_CQE_STATUS_SW_OWN;
+}
+
 int mlx5_dev_to_pci_addr(const char *dev_path, struct rte_pci_addr *pci_addr);
 
 #endif /* RTE_PMD_MLX5_COMMON_H_ */
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index fb13919..c2cd23b 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -33,6 +33,7 @@
 
 #include <mlx5_glue.h>
 #include <mlx5_prm.h>
+#include <mlx5_common.h>
 
 #include "mlx5_defs.h"
 #include "mlx5_utils.h"
@@ -549,44 +550,6 @@ int mlx5_dma_unmap(struct rte_pci_device *pdev, void *addr, uint64_t iova,
 #define mlx5_uar_write64(val, dst, lock) __mlx5_uar_write64(val, dst, lock)
 #endif
 
-/* CQE status. */
-enum mlx5_cqe_status {
-	MLX5_CQE_STATUS_SW_OWN = -1,
-	MLX5_CQE_STATUS_HW_OWN = -2,
-	MLX5_CQE_STATUS_ERR = -3,
-};
-
-/**
- * Check whether CQE is valid.
- *
- * @param cqe
- *   Pointer to CQE.
- * @param cqes_n
- *   Size of completion queue.
- * @param ci
- *   Consumer index.
- *
- * @return
- *   The CQE status.
- */
-static __rte_always_inline enum mlx5_cqe_status
-check_cqe(volatile struct mlx5_cqe *cqe, const uint16_t cqes_n,
-	  const uint16_t ci)
-{
-	const uint16_t idx = ci & cqes_n;
-	const uint8_t op_own = cqe->op_own;
-	const uint8_t op_owner = MLX5_CQE_OWNER(op_own);
-	const uint8_t op_code = MLX5_CQE_OPCODE(op_own);
-
-	if (unlikely((op_owner != (!!(idx))) || (op_code == MLX5_CQE_INVALID)))
-		return MLX5_CQE_STATUS_HW_OWN;
-	rte_cio_rmb();
-	if (unlikely(op_code == MLX5_CQE_RESP_ERR ||
-		     op_code == MLX5_CQE_REQ_ERR))
-		return MLX5_CQE_STATUS_ERR;
-	return MLX5_CQE_STATUS_SW_OWN;
-}
-
 /**
  * Get Memory Pool (MP) from mbuf. If mbuf is indirect, the pool from which the
  * cloned mbuf is allocated is returned instead.
-- 
1.8.3.1
* [dpdk-dev] [PATCH v4 07/25] common/mlx5: add query vDPA DevX capabilities
  2020-01-29 12:38     ` [dpdk-dev] [PATCH v4 00/25] Introduce mlx5 common library Matan Azrad
                         ` (5 preceding siblings ...)
  2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 06/25] common/mlx5: share CQ entry check Matan Azrad
@ 2020-01-29 12:38       ` Matan Azrad
  2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 08/25] common/mlx5: glue null memory region allocation Matan Azrad
                         ` (18 subsequent siblings)
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-29 12:38 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
Add a query of the DevX capabilities that describe the vDPA configuration
and information of Mellanox devices.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/common/mlx5/mlx5_devx_cmds.c | 90 ++++++++++++++++++++++++++++++++++++
 drivers/common/mlx5/mlx5_devx_cmds.h | 24 ++++++++++
 drivers/common/mlx5/mlx5_prm.h       | 45 ++++++++++++++++++
 3 files changed, 159 insertions(+)
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index 4d94f92..3a10ff0 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -285,6 +285,91 @@ struct mlx5_devx_obj *
 }
 
 /**
+ * Query NIC vDPA attributes.
+ *
+ * @param[in] ctx
+ *   ibv contexts returned from mlx5dv_open_device.
+ * @param[out] vdpa_attr
+ *   vDPA Attributes structure to fill.
+ */
+static void
+mlx5_devx_cmd_query_hca_vdpa_attr(struct ibv_context *ctx,
+				  struct mlx5_hca_vdpa_attr *vdpa_attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(query_hca_cap_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(query_hca_cap_out)] = {0};
+	void *hcattr = MLX5_ADDR_OF(query_hca_cap_out, out, capability);
+	int status, syndrome, rc;
+
+	MLX5_SET(query_hca_cap_in, in, opcode, MLX5_CMD_OP_QUERY_HCA_CAP);
+	MLX5_SET(query_hca_cap_in, in, op_mod,
+		 MLX5_GET_HCA_CAP_OP_MOD_VDPA_EMULATION |
+		 MLX5_HCA_CAP_OPMOD_GET_CUR);
+	rc = mlx5_glue->devx_general_cmd(ctx, in, sizeof(in), out, sizeof(out));
+	status = MLX5_GET(query_hca_cap_out, out, status);
+	syndrome = MLX5_GET(query_hca_cap_out, out, syndrome);
+	if (rc || status) {
+		RTE_LOG(DEBUG, PMD, "Failed to query devx VDPA capabilities,"
+			" status %x, syndrome = %x", status, syndrome);
+		vdpa_attr->valid = 0;
+	} else {
+		vdpa_attr->valid = 1;
+		vdpa_attr->desc_tunnel_offload_type =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 desc_tunnel_offload_type);
+		vdpa_attr->eth_frame_offload_type =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 eth_frame_offload_type);
+		vdpa_attr->virtio_version_1_0 =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 virtio_version_1_0);
+		vdpa_attr->tso_ipv4 = MLX5_GET(virtio_emulation_cap, hcattr,
+					       tso_ipv4);
+		vdpa_attr->tso_ipv6 = MLX5_GET(virtio_emulation_cap, hcattr,
+					       tso_ipv6);
+		vdpa_attr->tx_csum = MLX5_GET(virtio_emulation_cap, hcattr,
+					      tx_csum);
+		vdpa_attr->rx_csum = MLX5_GET(virtio_emulation_cap, hcattr,
+					      rx_csum);
+		vdpa_attr->event_mode = MLX5_GET(virtio_emulation_cap, hcattr,
+						 event_mode);
+		vdpa_attr->virtio_queue_type =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 virtio_queue_type);
+		vdpa_attr->log_doorbell_stride =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 log_doorbell_stride);
+		vdpa_attr->log_doorbell_bar_size =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 log_doorbell_bar_size);
+		vdpa_attr->doorbell_bar_offset =
+			MLX5_GET64(virtio_emulation_cap, hcattr,
+				   doorbell_bar_offset);
+		vdpa_attr->max_num_virtio_queues =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 max_num_virtio_queues);
+		vdpa_attr->umem_1_buffer_param_a =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 umem_1_buffer_param_a);
+		vdpa_attr->umem_1_buffer_param_b =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 umem_1_buffer_param_b);
+		vdpa_attr->umem_2_buffer_param_a =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 umem_2_buffer_param_a);
+		vdpa_attr->umem_2_buffer_param_b =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 umem_2_buffer_param_b);
+		vdpa_attr->umem_3_buffer_param_a =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 umem_3_buffer_param_a);
+		vdpa_attr->umem_3_buffer_param_b =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 umem_3_buffer_param_b);
+	}
+}
+
+/**
  * Query HCA attributes.
  * Using those attributes we can check on run time if the device
  * is having the required capabilities.
@@ -343,6 +428,9 @@ struct mlx5_devx_obj *
 	attr->flex_parser_protocols = MLX5_GET(cmd_hca_cap, hcattr,
 					       flex_parser_protocols);
 	attr->qos.sup = MLX5_GET(cmd_hca_cap, hcattr, qos);
+	attr->vdpa.valid = !!(MLX5_GET64(cmd_hca_cap, hcattr,
+					 general_obj_types) &
+			      MLX5_GENERAL_OBJ_TYPES_CAP_VIRTQ_NET_Q);
 	if (attr->qos.sup) {
 		MLX5_SET(query_hca_cap_in, in, op_mod,
 			 MLX5_GET_HCA_CAP_OP_MOD_QOS_CAP |
@@ -367,6 +455,8 @@ struct mlx5_devx_obj *
 		attr->qos.flow_meter_reg_share =
 			MLX5_GET(qos_cap, hcattr, flow_meter_reg_share);
 	}
+	if (attr->vdpa.valid)
+		mlx5_devx_cmd_query_hca_vdpa_attr(ctx, &attr->vdpa);
 	if (!attr->eth_net_offloads)
 		return 0;
 
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index 2d58d96..c1c9e99 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -34,6 +34,29 @@ struct mlx5_hca_qos_attr {
 
 };
 
+struct mlx5_hca_vdpa_attr {
+	uint8_t virtio_queue_type;
+	uint32_t valid:1;
+	uint32_t desc_tunnel_offload_type:1;
+	uint32_t eth_frame_offload_type:1;
+	uint32_t virtio_version_1_0:1;
+	uint32_t tso_ipv4:1;
+	uint32_t tso_ipv6:1;
+	uint32_t tx_csum:1;
+	uint32_t rx_csum:1;
+	uint32_t event_mode:3;
+	uint32_t log_doorbell_stride:5;
+	uint32_t log_doorbell_bar_size:5;
+	uint32_t max_num_virtio_queues;
+	uint32_t umem_1_buffer_param_a;
+	uint32_t umem_1_buffer_param_b;
+	uint32_t umem_2_buffer_param_a;
+	uint32_t umem_2_buffer_param_b;
+	uint32_t umem_3_buffer_param_a;
+	uint32_t umem_3_buffer_param_b;
+	uint64_t doorbell_bar_offset;
+};
+
 /* HCA supports this number of time periods for LRO. */
 #define MLX5_LRO_NUM_SUPP_PERIODS 4
 
@@ -62,6 +85,7 @@ struct mlx5_hca_attr {
 	uint32_t log_max_hairpin_num_packets:5;
 	uint32_t vhca_id:16;
 	struct mlx5_hca_qos_attr qos;
+	struct mlx5_hca_vdpa_attr vdpa;
 };
 
 struct mlx5_devx_wq_attr {
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index 5730ad1..efd6ad4 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -881,6 +881,11 @@ enum {
 	MLX5_GET_HCA_CAP_OP_MOD_GENERAL_DEVICE = 0x0 << 1,
 	MLX5_GET_HCA_CAP_OP_MOD_ETHERNET_OFFLOAD_CAPS = 0x1 << 1,
 	MLX5_GET_HCA_CAP_OP_MOD_QOS_CAP = 0xc << 1,
+	MLX5_GET_HCA_CAP_OP_MOD_VDPA_EMULATION = 0x13 << 1,
+};
+
+enum {
+	MLX5_GENERAL_OBJ_TYPES_CAP_VIRTQ_NET_Q = (1ULL << 0xd),
 };
 
 enum {
@@ -1256,11 +1261,51 @@ struct mlx5_ifc_per_protocol_networking_offload_caps_bits {
 	u8 reserved_at_200[0x600];
 };
 
+enum {
+	MLX5_VIRTQ_TYPE_SPLIT = 0,
+	MLX5_VIRTQ_TYPE_PACKED = 1,
+};
+
+enum {
+	MLX5_VIRTQ_EVENT_MODE_NO_MSIX = 0,
+	MLX5_VIRTQ_EVENT_MODE_QP = 1,
+	MLX5_VIRTQ_EVENT_MODE_MSIX = 2,
+};
+
+struct mlx5_ifc_virtio_emulation_cap_bits {
+	u8 desc_tunnel_offload_type[0x1];
+	u8 eth_frame_offload_type[0x1];
+	u8 virtio_version_1_0[0x1];
+	u8 tso_ipv4[0x1];
+	u8 tso_ipv6[0x1];
+	u8 tx_csum[0x1];
+	u8 rx_csum[0x1];
+	u8 reserved_at_7[0x1][0x9];
+	u8 event_mode[0x8];
+	u8 virtio_queue_type[0x8];
+	u8 reserved_at_20[0x13];
+	u8 log_doorbell_stride[0x5];
+	u8 reserved_at_3b[0x3];
+	u8 log_doorbell_bar_size[0x5];
+	u8 doorbell_bar_offset[0x40];
+	u8 reserved_at_80[0x8];
+	u8 max_num_virtio_queues[0x18];
+	u8 reserved_at_a0[0x60];
+	u8 umem_1_buffer_param_a[0x20];
+	u8 umem_1_buffer_param_b[0x20];
+	u8 umem_2_buffer_param_a[0x20];
+	u8 umem_2_buffer_param_b[0x20];
+	u8 umem_3_buffer_param_a[0x20];
+	u8 umem_3_buffer_param_b[0x20];
+	u8 reserved_at_1c0[0x620];
+};
+
 union mlx5_ifc_hca_cap_union_bits {
 	struct mlx5_ifc_cmd_hca_cap_bits cmd_hca_cap;
 	struct mlx5_ifc_per_protocol_networking_offload_caps_bits
 	       per_protocol_networking_offload_caps;
 	struct mlx5_ifc_qos_cap_bits qos_cap;
+	struct mlx5_ifc_virtio_emulation_cap_bits vdpa_caps;
 	u8 reserved_at_0[0x8000];
 };
 
-- 
1.8.3.1
* [dpdk-dev] [PATCH v4 08/25] common/mlx5: glue null memory region allocation
  2020-01-29 12:38     ` [dpdk-dev] [PATCH v4 00/25] Introduce mlx5 common library Matan Azrad
                         ` (6 preceding siblings ...)
  2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 07/25] common/mlx5: add query vDPA DevX capabilities Matan Azrad
@ 2020-01-29 12:38       ` Matan Azrad
  2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 09/25] common/mlx5: support DevX indirect mkey creation Matan Azrad
                         ` (17 subsequent siblings)
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-29 12:38 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
Add support for the rdma-core API to allocate a NULL MR.
When the device HW gets a NULL MR address, it does nothing with the
address: no read and no write.
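
The glue wrapper follows the usual compile-time fallback pattern: call the
library when the build detected it, otherwise fail with ENOTSUP. A minimal
standalone sketch of that pattern is shown here. All demo_* names and the
DEMO_HAVE_NULL_MR macro are hypothetical placeholders, not part of rdma-core
or DPDK.

```c
#include <errno.h>
#include <stddef.h>

/* Opaque stand-ins for the verbs PD and MR types. */
struct demo_pd;
struct demo_mr;

#ifdef DEMO_HAVE_NULL_MR
/* Hypothetical library entry point, declared only when available. */
struct demo_mr *demo_lib_alloc_null_mr(struct demo_pd *pd);
#endif

static struct demo_mr *
demo_glue_alloc_null_mr(struct demo_pd *pd)
{
#ifdef DEMO_HAVE_NULL_MR
	return demo_lib_alloc_null_mr(pd);
#else
	/* Library support missing at build time: report it via errno. */
	(void)pd;
	errno = ENOTSUP;
	return NULL;
#endif
}

/* Function table through which drivers reach the wrapper. */
struct demo_glue {
	struct demo_mr *(*alloc_null_mr)(struct demo_pd *pd);
};

static const struct demo_glue demo_glue_ops = {
	.alloc_null_mr = demo_glue_alloc_null_mr,
};
```

A caller is expected to check for a NULL return and read errno to tell a
missing feature apart from an allocation failure.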
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/common/mlx5/mlx5_glue.c | 13 +++++++++++++
 drivers/common/mlx5/mlx5_glue.h |  1 +
 2 files changed, 14 insertions(+)
diff --git a/drivers/common/mlx5/mlx5_glue.c b/drivers/common/mlx5/mlx5_glue.c
index d5bc84e..e75e6bc 100644
--- a/drivers/common/mlx5/mlx5_glue.c
+++ b/drivers/common/mlx5/mlx5_glue.c
@@ -226,6 +226,18 @@
 	return ibv_reg_mr(pd, addr, length, access);
 }
 
+static struct ibv_mr *
+mlx5_glue_alloc_null_mr(struct ibv_pd *pd)
+{
+#ifdef HAVE_IBV_DEVX_OBJ
+	return ibv_alloc_null_mr(pd);
+#else
+	(void)pd;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
 static int
 mlx5_glue_dereg_mr(struct ibv_mr *mr)
 {
@@ -1070,6 +1082,7 @@
 	.destroy_qp = mlx5_glue_destroy_qp,
 	.modify_qp = mlx5_glue_modify_qp,
 	.reg_mr = mlx5_glue_reg_mr,
+	.alloc_null_mr = mlx5_glue_alloc_null_mr,
 	.dereg_mr = mlx5_glue_dereg_mr,
 	.create_counter_set = mlx5_glue_create_counter_set,
 	.destroy_counter_set = mlx5_glue_destroy_counter_set,
diff --git a/drivers/common/mlx5/mlx5_glue.h b/drivers/common/mlx5/mlx5_glue.h
index f4c3180..33afaf4 100644
--- a/drivers/common/mlx5/mlx5_glue.h
+++ b/drivers/common/mlx5/mlx5_glue.h
@@ -138,6 +138,7 @@ struct mlx5_glue {
 			 int attr_mask);
 	struct ibv_mr *(*reg_mr)(struct ibv_pd *pd, void *addr,
 				 size_t length, int access);
+	struct ibv_mr *(*alloc_null_mr)(struct ibv_pd *pd);
 	int (*dereg_mr)(struct ibv_mr *mr);
 	struct ibv_counter_set *(*create_counter_set)
 		(struct ibv_context *context,
-- 
1.8.3.1
* [dpdk-dev] [PATCH v4 09/25] common/mlx5: support DevX indirect mkey creation
  2020-01-29 12:38     ` [dpdk-dev] [PATCH v4 00/25] Introduce mlx5 common library Matan Azrad
                         ` (7 preceding siblings ...)
  2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 08/25] common/mlx5: glue null memory region allocation Matan Azrad
@ 2020-01-29 12:38       ` Matan Azrad
  2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 10/25] common/mlx5: glue event queue query Matan Azrad
                         ` (16 subsequent siblings)
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-29 12:38 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
Add an option to create an indirect mkey through the existing
mlx5_devx_cmd_mkey_create command.
An indirect mkey points to a set of direct mkeys.
This way, the HW/SW can reference fragmented memory through one object.
Align the net/mlx5 driver's usage of the above command.
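
The command input now grows with the number of KLM entries, rounded up to a
multiple of 4. The sizing can be sketched as follows. The demo_* names are
hypothetical, and base_bytes stands for the size of the fixed create_mkey_in
layout, which is passed in here instead of being assumed. The 16-byte entry
size follows from the klm layout added by this patch (32-bit byte_count,
32-bit mkey, 64-bit address).

```c
#include <stddef.h>
#include <stdint.h>

/* One KLM entry: byte_count (4) + mkey (4) + address (8) bytes. */
#define DEMO_KLM_BYTES 16u

/* Round v up to a multiple of align, which must be a power of two. */
static inline uint32_t
demo_align_up(uint32_t v, uint32_t align)
{
	return (v + align - 1) & ~(align - 1);
}

/* Total command input size for a mkey with klm_num indirect entries. */
static size_t
demo_mkey_in_bytes(size_t base_bytes, uint32_t klm_num)
{
	/* A direct mkey (klm_num == 0) carries no KLM array at all. */
	uint32_t entries = klm_num ? demo_align_up(klm_num, 4) : 0;

	return base_bytes + (size_t)entries * DEMO_KLM_BYTES;
}
```

For example, 5 entries are padded to 8, which adds 128 bytes to the fixed
part of the command; the padding entries are the ones the patch fills with a
zero mkey and address.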
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/common/mlx5/mlx5_devx_cmds.c | 43 ++++++++++++++++++++++++++++++------
 drivers/common/mlx5/mlx5_devx_cmds.h | 17 ++++++++++++++
 drivers/common/mlx5/mlx5_prm.h       | 12 ++++++++++
 drivers/net/mlx5/mlx5_flow_dv.c      |  4 ++++
 4 files changed, 69 insertions(+), 7 deletions(-)
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index 3a10ff0..2197705 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -142,7 +142,11 @@ struct mlx5_devx_obj *
 mlx5_devx_cmd_mkey_create(struct ibv_context *ctx,
 			  struct mlx5_devx_mkey_attr *attr)
 {
-	uint32_t in[MLX5_ST_SZ_DW(create_mkey_in)] = {0};
+	struct mlx5_klm *klm_array = attr->klm_array;
+	int klm_num = attr->klm_num;
+	int in_size_dw = MLX5_ST_SZ_DW(create_mkey_in) +
+		     (klm_num ? RTE_ALIGN(klm_num, 4) : 0) * MLX5_ST_SZ_DW(klm);
+	uint32_t in[in_size_dw];
 	uint32_t out[MLX5_ST_SZ_DW(create_mkey_out)] = {0};
 	void *mkc;
 	struct mlx5_devx_obj *mkey = rte_zmalloc("mkey", sizeof(*mkey), 0);
@@ -153,27 +157,52 @@ struct mlx5_devx_obj *
 		rte_errno = ENOMEM;
 		return NULL;
 	}
+	memset(in, 0, in_size_dw * 4);
 	pgsize = sysconf(_SC_PAGESIZE);
-	translation_size = (RTE_ALIGN(attr->size, pgsize) * 8) / 16;
 	MLX5_SET(create_mkey_in, in, opcode, MLX5_CMD_OP_CREATE_MKEY);
+	mkc = MLX5_ADDR_OF(create_mkey_in, in, memory_key_mkey_entry);
+	if (klm_num > 0) {
+		int i;
+		uint8_t *klm = (uint8_t *)MLX5_ADDR_OF(create_mkey_in, in,
+						       klm_pas_mtt);
+		translation_size = RTE_ALIGN(klm_num, 4);
+		for (i = 0; i < klm_num; i++) {
+			MLX5_SET(klm, klm, byte_count, klm_array[i].byte_count);
+			MLX5_SET(klm, klm, mkey, klm_array[i].mkey);
+			MLX5_SET64(klm, klm, address, klm_array[i].address);
+			klm += MLX5_ST_SZ_BYTES(klm);
+		}
+		for (; i < (int)translation_size; i++) {
+			MLX5_SET(klm, klm, mkey, 0x0);
+			MLX5_SET64(klm, klm, address, 0x0);
+			klm += MLX5_ST_SZ_BYTES(klm);
+		}
+		MLX5_SET(mkc, mkc, access_mode_1_0, attr->log_entity_size ?
+			 MLX5_MKC_ACCESS_MODE_KLM_FBS :
+			 MLX5_MKC_ACCESS_MODE_KLM);
+		MLX5_SET(mkc, mkc, log_page_size, attr->log_entity_size);
+	} else {
+		translation_size = (RTE_ALIGN(attr->size, pgsize) * 8) / 16;
+		MLX5_SET(mkc, mkc, access_mode_1_0, MLX5_MKC_ACCESS_MODE_MTT);
+		MLX5_SET(mkc, mkc, log_page_size, rte_log2_u32(pgsize));
+	}
 	MLX5_SET(create_mkey_in, in, translations_octword_actual_size,
 		 translation_size);
 	MLX5_SET(create_mkey_in, in, mkey_umem_id, attr->umem_id);
-	mkc = MLX5_ADDR_OF(create_mkey_in, in, memory_key_mkey_entry);
+	MLX5_SET(create_mkey_in, in, pg_access, attr->pg_access);
 	MLX5_SET(mkc, mkc, lw, 0x1);
 	MLX5_SET(mkc, mkc, lr, 0x1);
-	MLX5_SET(mkc, mkc, access_mode_1_0, MLX5_MKC_ACCESS_MODE_MTT);
 	MLX5_SET(mkc, mkc, qpn, 0xffffff);
 	MLX5_SET(mkc, mkc, pd, attr->pd);
 	MLX5_SET(mkc, mkc, mkey_7_0, attr->umem_id & 0xFF);
 	MLX5_SET(mkc, mkc, translations_octword_size, translation_size);
 	MLX5_SET64(mkc, mkc, start_addr, attr->addr);
 	MLX5_SET64(mkc, mkc, len, attr->size);
-	MLX5_SET(mkc, mkc, log_page_size, rte_log2_u32(pgsize));
-	mkey->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in), out,
+	mkey->obj = mlx5_glue->devx_obj_create(ctx, in, in_size_dw * 4, out,
 					       sizeof(out));
 	if (!mkey->obj) {
-		DRV_LOG(ERR, "Can't create mkey - error %d", errno);
+		DRV_LOG(ERR, "Can't create %sdirect mkey - error %d\n",
+			klm_num ? "an in" : "a ", errno);
 		rte_errno = errno;
 		rte_free(mkey);
 		return NULL;
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index c1c9e99..c76c172 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -6,6 +6,7 @@
 #define RTE_PMD_MLX5_DEVX_CMDS_H_
 
 #include "mlx5_glue.h"
+#include "mlx5_prm.h"
 
 
 /* devX creation object */
@@ -14,11 +15,26 @@ struct mlx5_devx_obj {
 	int id; /* The object ID. */
 };
 
+/* UMR memory buffer used to define 1 entry in indirect mkey. */
+struct mlx5_klm {
+	uint32_t byte_count;
+	uint32_t mkey;
+	uint64_t address;
+};
+
+/* Limitation of libibverbs: the input length variable type is u16. */
+#define MLX5_DEVX_MAX_KLM_ENTRIES ((UINT16_MAX - \
+		MLX5_ST_SZ_DW(create_mkey_in) * 4) / (MLX5_ST_SZ_DW(klm) * 4))
+
 struct mlx5_devx_mkey_attr {
 	uint64_t addr;
 	uint64_t size;
 	uint32_t umem_id;
 	uint32_t pd;
+	uint32_t log_entity_size;
+	uint32_t pg_access:1;
+	struct mlx5_klm *klm_array;
+	int klm_num;
 };
 
 /* HCA qos attributes. */
@@ -216,6 +232,7 @@ struct mlx5_devx_modify_sq_attr {
 	uint32_t hairpin_peer_vhca:16;
 };
 
+
 /* mlx5_devx_cmds.c */
 
 struct mlx5_devx_obj *mlx5_devx_cmd_flow_counter_alloc(struct ibv_context *ctx,
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index efd6ad4..db15bb6 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -726,6 +726,8 @@ enum {
 
 enum {
 	MLX5_MKC_ACCESS_MODE_MTT   = 0x1,
+	MLX5_MKC_ACCESS_MODE_KLM   = 0x2,
+	MLX5_MKC_ACCESS_MODE_KLM_FBS = 0x3,
 };
 
 /* Flow counters. */
@@ -790,6 +792,16 @@ struct mlx5_ifc_query_flow_counter_in_bits {
 	u8         flow_counter_id[0x20];
 };
 
+#define MLX5_MAX_KLM_BYTE_COUNT 0x80000000u
+#define MLX5_MIN_KLM_FIXED_BUFFER_SIZE 0x1000u
+
+
+struct mlx5_ifc_klm_bits {
+	u8         byte_count[0x20];
+	u8         mkey[0x20];
+	u8         address[0x40];
+};
+
 struct mlx5_ifc_mkc_bits {
 	u8         reserved_at_0[0x1];
 	u8         free[0x1];
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 1b31602..5610d94 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -3885,6 +3885,10 @@ struct field_modify_info modify_tcp[] = {
 	mkey_attr.size = size;
 	mkey_attr.umem_id = mem_mng->umem->umem_id;
 	mkey_attr.pd = sh->pdn;
+	mkey_attr.log_entity_size = 0;
+	mkey_attr.pg_access = 0;
+	mkey_attr.klm_array = NULL;
+	mkey_attr.klm_num = 0;
 	mem_mng->dm = mlx5_devx_cmd_mkey_create(sh->ctx, &mkey_attr);
 	if (!mem_mng->dm) {
 		mlx5_glue->devx_umem_dereg(mem_mng->umem);
-- 
1.8.3.1
* [dpdk-dev] [PATCH v4 10/25] common/mlx5: glue event queue query
  2020-01-29 12:38     ` [dpdk-dev] [PATCH v4 00/25] Introduce mlx5 common library Matan Azrad
                         ` (8 preceding siblings ...)
  2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 09/25] common/mlx5: support DevX indirect mkey creation Matan Azrad
@ 2020-01-29 12:38       ` Matan Azrad
  2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 11/25] common/mlx5: glue event interrupt commands Matan Azrad
                         ` (15 subsequent siblings)
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-29 12:38 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
The event queue is managed only by the kernel.
Add the rdma-core command in glue to query the kernel event queue
details.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/common/mlx5/mlx5_glue.c | 15 +++++++++++++++
 drivers/common/mlx5/mlx5_glue.h |  2 ++
 2 files changed, 17 insertions(+)
diff --git a/drivers/common/mlx5/mlx5_glue.c b/drivers/common/mlx5/mlx5_glue.c
index e75e6bc..fedce77 100644
--- a/drivers/common/mlx5/mlx5_glue.c
+++ b/drivers/common/mlx5/mlx5_glue.c
@@ -1049,6 +1049,20 @@
 #endif
 }
 
+static int
+mlx5_glue_devx_query_eqn(struct ibv_context *ctx, uint32_t cpus,
+			 uint32_t *eqn)
+{
+#ifdef HAVE_IBV_DEVX_OBJ
+	return mlx5dv_devx_query_eqn(ctx, cpus, eqn);
+#else
+	(void)ctx;
+	(void)cpus;
+	(void)eqn;
+	return -ENOTSUP;
+#endif
+}
+
 alignas(RTE_CACHE_LINE_SIZE)
 const struct mlx5_glue *mlx5_glue = &(const struct mlx5_glue){
 	.version = MLX5_GLUE_VERSION,
@@ -1148,4 +1162,5 @@
 	.devx_qp_query = mlx5_glue_devx_qp_query,
 	.devx_port_query = mlx5_glue_devx_port_query,
 	.dr_dump_domain = mlx5_glue_dr_dump_domain,
+	.devx_query_eqn = mlx5_glue_devx_query_eqn,
 };
diff --git a/drivers/common/mlx5/mlx5_glue.h b/drivers/common/mlx5/mlx5_glue.h
index 33afaf4..fe51f97 100644
--- a/drivers/common/mlx5/mlx5_glue.h
+++ b/drivers/common/mlx5/mlx5_glue.h
@@ -259,6 +259,8 @@ struct mlx5_glue {
 			       uint32_t port_num,
 			       struct mlx5dv_devx_port *mlx5_devx_port);
 	int (*dr_dump_domain)(FILE *file, void *domain);
+	int (*devx_query_eqn)(struct ibv_context *context, uint32_t cpus,
+			      uint32_t *eqn);
 };
 
 const struct mlx5_glue *mlx5_glue;
-- 
1.8.3.1
* [dpdk-dev] [PATCH v4 11/25] common/mlx5: glue event interrupt commands
  2020-01-29 12:38     ` [dpdk-dev] [PATCH v4 00/25] Introduce mlx5 common library Matan Azrad
                         ` (9 preceding siblings ...)
  2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 10/25] common/mlx5: glue event queue query Matan Azrad
@ 2020-01-29 12:38       ` Matan Azrad
  2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 12/25] common/mlx5: glue UAR allocation Matan Azrad
                         ` (14 subsequent siblings)
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-29 12:38 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
Add the following commands to glue in order to support interrupt event
channel operations associated with events in the EQ:
	devx_create_event_channel,
	devx_destroy_event_channel,
	devx_subscribe_devx_event,
	devx_subscribe_devx_event_fd,
	devx_get_event.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
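Note (not part of the commit message): the fallback stubs added below
signal "not supported" in three ways, depending on the return type. The
pointer-returning wrapper returns NULL with errno set to ENOTSUP, the two
subscribe wrappers return -ENOTSUP directly, and the get_event wrapper
returns -1 with errno set. A caller that prefers a single negative-errno
convention could fold the first and last cases as sketched here. The demo_*
helpers are hypothetical and are not part of this series.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Pointer-returning wrapper (create_event_channel): NULL plus errno. */
static int
demo_status_from_ptr(const void *ptr)
{
	return ptr != NULL ? 0 : -errno;
}

/* ssize_t-returning wrapper (get_event): byte count, or -1 plus errno. */
static ssize_t
demo_status_from_ssize(ssize_t n)
{
	return n >= 0 ? n : -(ssize_t)errno;
}

/* Usage check mimicking what the fallback stubs hand back. */
static void
demo_status_usage(void)
{
	errno = ENOTSUP;
	assert(demo_status_from_ptr(NULL) == -ENOTSUP);
	errno = ENOTSUP;
	assert(demo_status_from_ssize(-1) == -ENOTSUP);
	/* A non-negative length passes through unchanged. */
	assert(demo_status_from_ssize(16) == 16);
}
```

The subscribe wrappers need no helper in this scheme, since their stubs
already return a negative errno value.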
 drivers/common/mlx5/Makefile    |  5 +++
 drivers/common/mlx5/meson.build |  2 ++
 drivers/common/mlx5/mlx5_glue.c | 79 +++++++++++++++++++++++++++++++++++++++++
 drivers/common/mlx5/mlx5_glue.h | 25 +++++++++++++
 4 files changed, 111 insertions(+)
diff --git a/drivers/common/mlx5/Makefile b/drivers/common/mlx5/Makefile
index 66585b2..7110231 100644
--- a/drivers/common/mlx5/Makefile
+++ b/drivers/common/mlx5/Makefile
@@ -154,6 +154,11 @@ mlx5_autoconf.h.new: $(RTE_SDK)/buildtools/auto-config-h.sh
 		func mlx5dv_dr_action_create_dest_devx_tir \
 		$(AUTOCONF_OUTPUT)
 	$Q sh -- '$<' '$@' \
+		HAVE_IBV_DEVX_EVENT \
+		infiniband/mlx5dv.h \
+		func mlx5dv_devx_get_event \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
 		HAVE_MLX5_DR_CREATE_ACTION_FLOW_METER \
 		infiniband/mlx5dv.h \
 		func mlx5dv_dr_action_create_flow_meter \
diff --git a/drivers/common/mlx5/meson.build b/drivers/common/mlx5/meson.build
index 718cef2..76ca7d7 100644
--- a/drivers/common/mlx5/meson.build
+++ b/drivers/common/mlx5/meson.build
@@ -108,6 +108,8 @@ if build
 		'mlx5dv_devx_obj_query_async' ],
 		[ 'HAVE_MLX5DV_DR_ACTION_DEST_DEVX_TIR', 'infiniband/mlx5dv.h',
 		'mlx5dv_dr_action_create_dest_devx_tir' ],
+		[ 'HAVE_IBV_DEVX_EVENT', 'infiniband/mlx5dv.h',
+		'mlx5dv_devx_get_event' ],
 		[ 'HAVE_MLX5_DR_CREATE_ACTION_FLOW_METER', 'infiniband/mlx5dv.h',
 		'mlx5dv_dr_action_create_flow_meter' ],
 		[ 'HAVE_MLX5DV_MMAP_GET_NC_PAGES_CMD', 'infiniband/mlx5dv.h',
diff --git a/drivers/common/mlx5/mlx5_glue.c b/drivers/common/mlx5/mlx5_glue.c
index fedce77..e4eabdb 100644
--- a/drivers/common/mlx5/mlx5_glue.c
+++ b/drivers/common/mlx5/mlx5_glue.c
@@ -1063,6 +1063,80 @@
 #endif
 }
 
+static struct mlx5dv_devx_event_channel *
+mlx5_glue_devx_create_event_channel(struct ibv_context *ctx, int flags)
+{
+#ifdef HAVE_IBV_DEVX_EVENT
+	return mlx5dv_devx_create_event_channel(ctx, flags);
+#else
+	(void)ctx;
+	(void)flags;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static void
+mlx5_glue_devx_destroy_event_channel(struct mlx5dv_devx_event_channel *eventc)
+{
+#ifdef HAVE_IBV_DEVX_EVENT
+	mlx5dv_devx_destroy_event_channel(eventc);
+#else
+	(void)eventc;
+#endif
+}
+
+static int
+mlx5_glue_devx_subscribe_devx_event(struct mlx5dv_devx_event_channel *eventc,
+				    struct mlx5dv_devx_obj *obj,
+				    uint16_t events_sz, uint16_t events_num[],
+				    uint64_t cookie)
+{
+#ifdef HAVE_IBV_DEVX_EVENT
+	return mlx5dv_devx_subscribe_devx_event(eventc, obj, events_sz,
+						events_num, cookie);
+#else
+	(void)eventc;
+	(void)obj;
+	(void)events_sz;
+	(void)events_num;
+	(void)cookie;
+	return -ENOTSUP;
+#endif
+}
+
+static int
+mlx5_glue_devx_subscribe_devx_event_fd(struct mlx5dv_devx_event_channel *eventc,
+				       int fd, struct mlx5dv_devx_obj *obj,
+				       uint16_t event_num)
+{
+#ifdef HAVE_IBV_DEVX_EVENT
+	return mlx5dv_devx_subscribe_devx_event_fd(eventc, fd, obj, event_num);
+#else
+	(void)eventc;
+	(void)fd;
+	(void)obj;
+	(void)event_num;
+	return -ENOTSUP;
+#endif
+}
+
+static ssize_t
+mlx5_glue_devx_get_event(struct mlx5dv_devx_event_channel *eventc,
+			 struct mlx5dv_devx_async_event_hdr *event_data,
+			 size_t event_resp_len)
+{
+#ifdef HAVE_IBV_DEVX_EVENT
+	return mlx5dv_devx_get_event(eventc, event_data, event_resp_len);
+#else
+	(void)eventc;
+	(void)event_data;
+	(void)event_resp_len;
+	errno = ENOTSUP;
+	return -1;
+#endif
+}
+
 alignas(RTE_CACHE_LINE_SIZE)
 const struct mlx5_glue *mlx5_glue = &(const struct mlx5_glue){
 	.version = MLX5_GLUE_VERSION,
@@ -1163,4 +1237,9 @@
 	.devx_port_query = mlx5_glue_devx_port_query,
 	.dr_dump_domain = mlx5_glue_dr_dump_domain,
 	.devx_query_eqn = mlx5_glue_devx_query_eqn,
+	.devx_create_event_channel = mlx5_glue_devx_create_event_channel,
+	.devx_destroy_event_channel = mlx5_glue_devx_destroy_event_channel,
+	.devx_subscribe_devx_event = mlx5_glue_devx_subscribe_devx_event,
+	.devx_subscribe_devx_event_fd = mlx5_glue_devx_subscribe_devx_event_fd,
+	.devx_get_event = mlx5_glue_devx_get_event,
 };
diff --git a/drivers/common/mlx5/mlx5_glue.h b/drivers/common/mlx5/mlx5_glue.h
index fe51f97..6fc00dd 100644
--- a/drivers/common/mlx5/mlx5_glue.h
+++ b/drivers/common/mlx5/mlx5_glue.h
@@ -86,6 +86,12 @@
 struct mlx5dv_dr_flow_meter_attr;
 #endif
 
+#ifndef HAVE_IBV_DEVX_EVENT
+struct mlx5dv_devx_event_channel { int fd; };
+struct mlx5dv_devx_async_event_hdr;
+#define MLX5DV_DEVX_CREATE_EVENT_CHANNEL_FLAGS_OMIT_EV_DATA 1
+#endif
+
 /* LIB_GLUE_VERSION must be updated every time this structure is modified. */
 struct mlx5_glue {
 	const char *version;
@@ -261,6 +267,25 @@ struct mlx5_glue {
 	int (*dr_dump_domain)(FILE *file, void *domain);
 	int (*devx_query_eqn)(struct ibv_context *context, uint32_t cpus,
 			      uint32_t *eqn);
+	struct mlx5dv_devx_event_channel *(*devx_create_event_channel)
+				(struct ibv_context *context, int flags);
+	void (*devx_destroy_event_channel)
+			(struct mlx5dv_devx_event_channel *event_channel);
+	int (*devx_subscribe_devx_event)
+			(struct mlx5dv_devx_event_channel *event_channel,
+			 struct mlx5dv_devx_obj *obj,
+			 uint16_t events_sz,
+			 uint16_t events_num[],
+			 uint64_t cookie);
+	int (*devx_subscribe_devx_event_fd)
+			(struct mlx5dv_devx_event_channel *event_channel,
+			 int fd,
+			 struct mlx5dv_devx_obj *obj,
+			 uint16_t event_num);
+	ssize_t (*devx_get_event)
+			(struct mlx5dv_devx_event_channel *event_channel,
+			 struct mlx5dv_devx_async_event_hdr *event_data,
+			 size_t event_resp_len);
 };
 
 const struct mlx5_glue *mlx5_glue;
-- 
1.8.3.1
* [dpdk-dev] [PATCH v4 12/25] common/mlx5: glue UAR allocation
  2020-01-29 12:38     ` [dpdk-dev] [PATCH v4 00/25] Introduce mlx5 common library Matan Azrad
                         ` (10 preceding siblings ...)
  2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 11/25] common/mlx5: glue event interrupt commands Matan Azrad
@ 2020-01-29 12:38       ` Matan Azrad
  2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 13/25] common/mlx5: add DevX command to create CQ Matan Azrad
                         ` (13 subsequent siblings)
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-29 12:38 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
Isolated, protected and independent direct access to the HW by multiple
processes is implemented via the User Access Region (UAR) mechanism.
The UAR is the part of the PCI address space that is mapped for direct
access to the HW from the CPU.
A UAR comprises multiple pages, each containing registers that control
the HW operation.
The UAR mechanism is used to post execution or control requests to the
HW, and the HW uses it to enforce protection and isolation between
different processes.
Add glue commands to allocate and free a UAR.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/common/mlx5/mlx5_glue.c | 25 +++++++++++++++++++++++++
 drivers/common/mlx5/mlx5_glue.h |  4 ++++
 2 files changed, 29 insertions(+)
diff --git a/drivers/common/mlx5/mlx5_glue.c b/drivers/common/mlx5/mlx5_glue.c
index e4eabdb..5691636 100644
--- a/drivers/common/mlx5/mlx5_glue.c
+++ b/drivers/common/mlx5/mlx5_glue.c
@@ -1137,6 +1137,29 @@
 #endif
 }
 
+static struct mlx5dv_devx_uar *
+mlx5_glue_devx_alloc_uar(struct ibv_context *context, uint32_t flags)
+{
+#ifdef HAVE_IBV_DEVX_OBJ
+	return mlx5dv_devx_alloc_uar(context, flags);
+#else
+	(void)context;
+	(void)flags;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static void
+mlx5_glue_devx_free_uar(struct mlx5dv_devx_uar *devx_uar)
+{
+#ifdef HAVE_IBV_DEVX_OBJ
+	mlx5dv_devx_free_uar(devx_uar);
+#else
+	(void)devx_uar;
+#endif
+}
+
 alignas(RTE_CACHE_LINE_SIZE)
 const struct mlx5_glue *mlx5_glue = &(const struct mlx5_glue){
 	.version = MLX5_GLUE_VERSION,
@@ -1242,4 +1265,6 @@
 	.devx_subscribe_devx_event = mlx5_glue_devx_subscribe_devx_event,
 	.devx_subscribe_devx_event_fd = mlx5_glue_devx_subscribe_devx_event_fd,
 	.devx_get_event = mlx5_glue_devx_get_event,
+	.devx_alloc_uar = mlx5_glue_devx_alloc_uar,
+	.devx_free_uar = mlx5_glue_devx_free_uar,
 };
diff --git a/drivers/common/mlx5/mlx5_glue.h b/drivers/common/mlx5/mlx5_glue.h
index 6fc00dd..7d9256e 100644
--- a/drivers/common/mlx5/mlx5_glue.h
+++ b/drivers/common/mlx5/mlx5_glue.h
@@ -66,6 +66,7 @@
 #ifndef HAVE_IBV_DEVX_OBJ
 struct mlx5dv_devx_obj;
 struct mlx5dv_devx_umem { uint32_t umem_id; };
+struct mlx5dv_devx_uar { void *reg_addr; void *base_addr; uint32_t page_id; };
 #endif
 
 #ifndef HAVE_IBV_DEVX_ASYNC
@@ -230,6 +231,9 @@ struct mlx5_glue {
 	int (*dv_destroy_flow)(void *flow);
 	int (*dv_destroy_flow_matcher)(void *matcher);
 	struct ibv_context *(*dv_open_device)(struct ibv_device *device);
+	struct mlx5dv_devx_uar *(*devx_alloc_uar)(struct ibv_context *context,
+						  uint32_t flags);
+	void (*devx_free_uar)(struct mlx5dv_devx_uar *devx_uar);
 	struct mlx5dv_devx_obj *(*devx_obj_create)
 					(struct ibv_context *ctx,
 					 const void *in, size_t inlen,
-- 
1.8.3.1
* [dpdk-dev] [PATCH v4 13/25] common/mlx5: add DevX command to create CQ
  2020-01-29 12:38     ` [dpdk-dev] [PATCH v4 00/25] Introduce mlx5 common library Matan Azrad
                         ` (11 preceding siblings ...)
  2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 12/25] common/mlx5: glue UAR allocation Matan Azrad
@ 2020-01-29 12:38       ` Matan Azrad
  2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 14/25] common/mlx5: glue VAR allocation Matan Azrad
                         ` (12 subsequent siblings)
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-29 12:38 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
The HW implements completion queues (CQs), to which it posts completion
reports when a work request completes.
CQs are used by both the Rx and Tx datapaths.
Add a DevX command to create a CQ.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
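Note (not part of the commit message): several fields of the attribute
structure are wider than the CQ context fields they feed. In the cqc layout
of this patch, c_eqn is 8 bits and uar_page is 24 bits, while the matching
attributes are uint32_t. The sketch below range-checks a caller's values
before filling the attributes. It assumes log_cq_size carries log2 of the
number of CQ entries and that this number is a power of two, as the field
name suggests; the patch itself does not state this. struct demo_cq_attr is
a trimmed, hypothetical copy of the real structure and the demo_* functions
are not part of this series.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Trimmed, hypothetical copy of the attribute fields used here. */
struct demo_cq_attr {
	uint32_t log_cq_size:5;
	uint32_t uar_page_id;
	uint32_t eqn;
};

/* Return log2(n) when n is a non-zero power of two, -EINVAL otherwise. */
static int
demo_log2_pow2(uint32_t n)
{
	int log = 0;

	if (n == 0 || (n & (n - 1)) != 0)
		return -EINVAL;
	while ((n >>= 1) != 0)
		log++;
	return log;
}

/* Fill the attributes, rejecting values the context fields cannot hold. */
static int
demo_cq_attr_init(struct demo_cq_attr *attr, uint32_t n_cqe,
		  uint32_t uar_page_id, uint32_t eqn)
{
	int log = demo_log2_pow2(n_cqe);

	assert(attr != NULL);
	if (log < 0)
		return log;
	if (eqn > 0xff)			/* c_eqn[0x8] */
		return -EINVAL;
	if (uar_page_id > 0xffffff)	/* uar_page[0x18] */
		return -EINVAL;
	/* log2 of a uint32_t is at most 31, so it fits the 5-bit field. */
	attr->log_cq_size = (uint32_t)log;
	attr->uar_page_id = uar_page_id;
	attr->eqn = eqn;
	return 0;
}
```

With these checks, a request for 256 entries yields log_cq_size 8, while a
non-power-of-two size or an EQ number above 0xff is rejected with -EINVAL
before any command is built.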
 drivers/common/mlx5/mlx5_devx_cmds.c            | 57 ++++++++++++++++++++
 drivers/common/mlx5/mlx5_devx_cmds.h            | 19 +++++++
 drivers/common/mlx5/mlx5_prm.h                  | 71 +++++++++++++++++++++++++
 drivers/common/mlx5/rte_common_mlx5_version.map |  1 +
 4 files changed, 148 insertions(+)
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index 2197705..cdc041b 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -1093,3 +1093,60 @@ struct mlx5_devx_obj *
 #endif
 	return -ret;
 }
+
+/**
+ * Create CQ using DevX API.
+ *
+ * @param[in] ctx
+ *   ibv_context returned from mlx5dv_open_device.
+ * @param [in] attr
+ *   Pointer to CQ attributes structure.
+ *
+ * @return
+ *   The DevX object created, NULL otherwise and rte_errno is set.
+ */
+struct mlx5_devx_obj *
+mlx5_devx_cmd_create_cq(struct ibv_context *ctx, struct mlx5_devx_cq_attr *attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(create_cq_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(create_cq_out)] = {0};
+	struct mlx5_devx_obj *cq_obj = rte_zmalloc(__func__, sizeof(*cq_obj),
+						   0);
+	void *cqctx = MLX5_ADDR_OF(create_cq_in, in, cq_context);
+
+	if (!cq_obj) {
+		DRV_LOG(ERR, "Failed to allocate CQ object memory.");
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	MLX5_SET(create_cq_in, in, opcode, MLX5_CMD_OP_CREATE_CQ);
+	if (attr->db_umem_valid) {
+		MLX5_SET(cqc, cqctx, dbr_umem_valid, attr->db_umem_valid);
+		MLX5_SET(cqc, cqctx, dbr_umem_id, attr->db_umem_id);
+		MLX5_SET64(cqc, cqctx, dbr_addr, attr->db_umem_offset);
+	} else {
+		MLX5_SET64(cqc, cqctx, dbr_addr, attr->db_addr);
+	}
+	MLX5_SET(cqc, cqctx, cc, attr->use_first_only);
+	MLX5_SET(cqc, cqctx, oi, attr->overrun_ignore);
+	MLX5_SET(cqc, cqctx, log_cq_size, attr->log_cq_size);
+	MLX5_SET(cqc, cqctx, log_page_size, attr->log_page_size);
+	MLX5_SET(cqc, cqctx, c_eqn, attr->eqn);
+	MLX5_SET(cqc, cqctx, uar_page, attr->uar_page_id);
+	if (attr->q_umem_valid) {
+		MLX5_SET(create_cq_in, in, cq_umem_valid, attr->q_umem_valid);
+		MLX5_SET(create_cq_in, in, cq_umem_id, attr->q_umem_id);
+		MLX5_SET64(create_cq_in, in, cq_umem_offset,
+			   attr->q_umem_offset);
+	}
+	cq_obj->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in), out,
+						 sizeof(out));
+	if (!cq_obj->obj) {
+		rte_errno = errno;
+		DRV_LOG(ERR, "Failed to create CQ using DevX errno=%d.", errno);
+		rte_free(cq_obj);
+		return NULL;
+	}
+	cq_obj->id = MLX5_GET(create_cq_out, out, cqn);
+	return cq_obj;
+}
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index c76c172..581658b 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -233,6 +233,23 @@ struct mlx5_devx_modify_sq_attr {
 };
 
 
+/* CQ attributes structure, used by CQ operations. */
+struct mlx5_devx_cq_attr {
+	uint32_t q_umem_valid:1;
+	uint32_t db_umem_valid:1;
+	uint32_t use_first_only:1;
+	uint32_t overrun_ignore:1;
+	uint32_t log_cq_size:5;
+	uint32_t log_page_size:5;
+	uint32_t uar_page_id;
+	uint32_t q_umem_id;
+	uint64_t q_umem_offset;
+	uint32_t db_umem_id;
+	uint64_t db_umem_offset;
+	uint32_t eqn;
+	uint64_t db_addr;
+};
+
 /* mlx5_devx_cmds.c */
 
 struct mlx5_devx_obj *mlx5_devx_cmd_flow_counter_alloc(struct ibv_context *ctx,
@@ -269,4 +286,6 @@ struct mlx5_devx_obj *mlx5_devx_cmd_create_tis(struct ibv_context *ctx,
 struct mlx5_devx_obj *mlx5_devx_cmd_create_td(struct ibv_context *ctx);
 int mlx5_devx_cmd_flow_dump(void *fdb_domain, void *rx_domain, void *tx_domain,
 			    FILE *file);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_cq(struct ibv_context *ctx,
+					      struct mlx5_devx_cq_attr *attr);
 #endif /* RTE_PMD_MLX5_DEVX_CMDS_H_ */
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index db15bb6..a4082b9 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -710,6 +710,7 @@ enum {
 enum {
 	MLX5_CMD_OP_QUERY_HCA_CAP = 0x100,
 	MLX5_CMD_OP_CREATE_MKEY = 0x200,
+	MLX5_CMD_OP_CREATE_CQ = 0x400,
 	MLX5_CMD_OP_QUERY_NIC_VPORT_CONTEXT = 0x754,
 	MLX5_CMD_OP_ALLOC_TRANSPORT_DOMAIN = 0x816,
 	MLX5_CMD_OP_CREATE_TIR = 0x900,
@@ -1846,6 +1847,76 @@ struct mlx5_ifc_flow_meter_parameters_bits {
 	u8         reserved_at_8[0x60];		// 14h-1Ch
 };
 
+struct mlx5_ifc_cqc_bits {
+	u8 status[0x4];
+	u8 as_notify[0x1];
+	u8 initiator_src_dct[0x1];
+	u8 dbr_umem_valid[0x1];
+	u8 reserved_at_7[0x1];
+	u8 cqe_sz[0x3];
+	u8 cc[0x1];
+	u8 reserved_at_c[0x1];
+	u8 scqe_break_moderation_en[0x1];
+	u8 oi[0x1];
+	u8 cq_period_mode[0x2];
+	u8 cqe_comp_en[0x1];
+	u8 mini_cqe_res_format[0x2];
+	u8 st[0x4];
+	u8 reserved_at_18[0x8];
+	u8 dbr_umem_id[0x20];
+	u8 reserved_at_40[0x14];
+	u8 page_offset[0x6];
+	u8 reserved_at_5a[0x6];
+	u8 reserved_at_60[0x3];
+	u8 log_cq_size[0x5];
+	u8 uar_page[0x18];
+	u8 reserved_at_80[0x4];
+	u8 cq_period[0xc];
+	u8 cq_max_count[0x10];
+	u8 reserved_at_a0[0x18];
+	u8 c_eqn[0x8];
+	u8 reserved_at_c0[0x3];
+	u8 log_page_size[0x5];
+	u8 reserved_at_c8[0x18];
+	u8 reserved_at_e0[0x20];
+	u8 reserved_at_100[0x8];
+	u8 last_notified_index[0x18];
+	u8 reserved_at_120[0x8];
+	u8 last_solicit_index[0x18];
+	u8 reserved_at_140[0x8];
+	u8 consumer_counter[0x18];
+	u8 reserved_at_160[0x8];
+	u8 producer_counter[0x18];
+	u8 local_partition_id[0xc];
+	u8 process_id[0x14];
+	u8 reserved_at_1A0[0x20];
+	u8 dbr_addr[0x40];
+};
+
+struct mlx5_ifc_create_cq_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x8];
+	u8 cqn[0x18];
+	u8 reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_create_cq_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0x40];
+	struct mlx5_ifc_cqc_bits cq_context;
+	u8 cq_umem_offset[0x40];
+	u8 cq_umem_id[0x20];
+	u8 cq_umem_valid[0x1];
+	u8 reserved_at_2e1[0x1f];
+	u8 reserved_at_300[0x580];
+	u8 pas[];
+};
+
 /* CQE format mask. */
 #define MLX5E_CQE_FORMAT_MASK 0xc
 
diff --git a/drivers/common/mlx5/rte_common_mlx5_version.map b/drivers/common/mlx5/rte_common_mlx5_version.map
index 0c01172..c6a203d 100644
--- a/drivers/common/mlx5/rte_common_mlx5_version.map
+++ b/drivers/common/mlx5/rte_common_mlx5_version.map
@@ -1,6 +1,7 @@
 DPDK_20.02 {
 	global:
 
+	mlx5_devx_cmd_create_cq;
 	mlx5_devx_cmd_create_rq;
 	mlx5_devx_cmd_create_rqt;
 	mlx5_devx_cmd_create_sq;
-- 
1.8.3.1
* [dpdk-dev] [PATCH v4 14/25] common/mlx5: glue VAR allocation
  2020-01-29 12:38     ` [dpdk-dev] [PATCH v4 00/25] Introduce mlx5 common library Matan Azrad
                         ` (12 preceding siblings ...)
  2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 13/25] common/mlx5: add DevX command to create CQ Matan Azrad
@ 2020-01-29 12:38       ` Matan Azrad
  2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 15/25] common/mlx5: add DevX virtq commands Matan Azrad
                         ` (11 subsequent siblings)
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-29 12:38 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
The virtio access region (VAR) is the UAR that is allocated for virtio
emulation access.
Add rdma-core operations to allocate and free a VAR.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/common/mlx5/Makefile    |  5 +++++
 drivers/common/mlx5/meson.build |  1 +
 drivers/common/mlx5/mlx5_glue.c | 26 ++++++++++++++++++++++++++
 drivers/common/mlx5/mlx5_glue.h |  8 ++++++++
 4 files changed, 40 insertions(+)
diff --git a/drivers/common/mlx5/Makefile b/drivers/common/mlx5/Makefile
index 7110231..d1de3ec 100644
--- a/drivers/common/mlx5/Makefile
+++ b/drivers/common/mlx5/Makefile
@@ -174,6 +174,11 @@ mlx5_autoconf.h.new: $(RTE_SDK)/buildtools/auto-config-h.sh
 		enum MLX5_MMAP_GET_NC_PAGES_CMD \
 		$(AUTOCONF_OUTPUT)
 	$Q sh -- '$<' '$@' \
+		HAVE_IBV_VAR \
+		infiniband/mlx5dv.h \
+		func mlx5dv_alloc_var \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
 		HAVE_ETHTOOL_LINK_MODE_25G \
 		/usr/include/linux/ethtool.h \
 		enum ETHTOOL_LINK_MODE_25000baseCR_Full_BIT \
diff --git a/drivers/common/mlx5/meson.build b/drivers/common/mlx5/meson.build
index 76ca7d7..3e130cb 100644
--- a/drivers/common/mlx5/meson.build
+++ b/drivers/common/mlx5/meson.build
@@ -120,6 +120,7 @@ if build
 		'MLX5DV_DR_DOMAIN_TYPE_FDB' ],
 		[ 'HAVE_MLX5DV_DR_VLAN', 'infiniband/mlx5dv.h',
 		'mlx5dv_dr_action_create_push_vlan' ],
+		[ 'HAVE_IBV_VAR', 'infiniband/mlx5dv.h', 'mlx5dv_alloc_var' ],
 		[ 'HAVE_SUPPORTED_40000baseKR4_Full', 'linux/ethtool.h',
 		'SUPPORTED_40000baseKR4_Full' ],
 		[ 'HAVE_SUPPORTED_40000baseCR4_Full', 'linux/ethtool.h',
diff --git a/drivers/common/mlx5/mlx5_glue.c b/drivers/common/mlx5/mlx5_glue.c
index 5691636..27cf33c 100644
--- a/drivers/common/mlx5/mlx5_glue.c
+++ b/drivers/common/mlx5/mlx5_glue.c
@@ -1160,6 +1160,30 @@
 #endif
 }
 
+static struct mlx5dv_var *
+mlx5_glue_dv_alloc_var(struct ibv_context *context, uint32_t flags)
+{
+#ifdef HAVE_IBV_VAR
+	return mlx5dv_alloc_var(context, flags);
+#else
+	(void)context;
+	(void)flags;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static void
+mlx5_glue_dv_free_var(struct mlx5dv_var *var)
+{
+#ifdef HAVE_IBV_VAR
+	mlx5dv_free_var(var);
+#else
+	(void)var;
+	errno = ENOTSUP;
+#endif
+}
+
 alignas(RTE_CACHE_LINE_SIZE)
 const struct mlx5_glue *mlx5_glue = &(const struct mlx5_glue){
 	.version = MLX5_GLUE_VERSION,
@@ -1267,4 +1291,6 @@
 	.devx_get_event = mlx5_glue_devx_get_event,
 	.devx_alloc_uar = mlx5_glue_devx_alloc_uar,
 	.devx_free_uar = mlx5_glue_devx_free_uar,
+	.dv_alloc_var = mlx5_glue_dv_alloc_var,
+	.dv_free_var = mlx5_glue_dv_free_var,
 };
diff --git a/drivers/common/mlx5/mlx5_glue.h b/drivers/common/mlx5/mlx5_glue.h
index 7d9256e..6238b43 100644
--- a/drivers/common/mlx5/mlx5_glue.h
+++ b/drivers/common/mlx5/mlx5_glue.h
@@ -93,6 +93,11 @@
 #define MLX5DV_DEVX_CREATE_EVENT_CHANNEL_FLAGS_OMIT_EV_DATA 1
 #endif
 
+#ifndef HAVE_IBV_VAR
+struct mlx5dv_var { uint32_t page_id; uint32_t length; off_t mmap_off;
+			uint64_t comp_mask; };
+#endif
+
 /* LIB_GLUE_VERSION must be updated every time this structure is modified. */
 struct mlx5_glue {
 	const char *version;
@@ -231,6 +236,9 @@ struct mlx5_glue {
 	int (*dv_destroy_flow)(void *flow);
 	int (*dv_destroy_flow_matcher)(void *matcher);
 	struct ibv_context *(*dv_open_device)(struct ibv_device *device);
+	struct mlx5dv_var *(*dv_alloc_var)(struct ibv_context *context,
+					   uint32_t flags);
+	void (*dv_free_var)(struct mlx5dv_var *var);
 	struct mlx5dv_devx_uar *(*devx_alloc_uar)(struct ibv_context *context,
 						  uint32_t flags);
 	void (*devx_free_uar)(struct mlx5dv_devx_uar *devx_uar);
-- 
1.8.3.1
* [dpdk-dev] [PATCH v4 15/25] common/mlx5: add DevX virtq commands
  2020-01-29 12:38     ` [dpdk-dev] [PATCH v4 00/25] Introduce mlx5 common library Matan Azrad
                         ` (13 preceding siblings ...)
  2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 14/25] common/mlx5: glue VAR allocation Matan Azrad
@ 2020-01-29 12:38       ` Matan Azrad
  2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 16/25] common/mlx5: add support for DevX QP operations Matan Azrad
                         ` (10 subsequent siblings)
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-29 12:38 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
Virtio emulation offload allows SW to offload the I/O operations of a
virtio virtqueue to the device, improving performance for its users.
Once SW supplies all the relevant virtqueue information (type, size,
memory location, doorbell information, etc.), the device can offload
the I/O operations of this queue according to its device type
characteristics.
Some of the virtio features can be supported according to the device
capability, for example, TSO and checksum.
Add virtio queue create, modify and query DevX commands.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
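Note (not part of the commit message): the modify command below switches on
attr->type and sets one group of fields per call; any other value, including
a combination of two types, takes the default branch and fails with EINVAL.
The sketch below isolates that check. The DEMO_* constants use placeholder
values chosen for illustration (the real MLX5_VIRTQ_MODIFY_TYPE_* values are
defined in mlx5_prm.h and are not shown in this message), and
demo_virtq_modify_check is a hypothetical helper.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Placeholder values; the real constants are defined in mlx5_prm.h. */
enum demo_virtq_modify_type {
	DEMO_VIRTQ_MODIFY_TYPE_STATE = 1u << 0,
	DEMO_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS = 1u << 1,
	DEMO_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE = 1u << 2,
};

/* Accept exactly one known modify type, like the switch in the patch. */
static int
demo_virtq_modify_check(uint64_t type)
{
	switch (type) {
	case DEMO_VIRTQ_MODIFY_TYPE_STATE:
	case DEMO_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS:
	case DEMO_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE:
		return 0;
	default:
		return -EINVAL;
	}
}

/* Usage: one type per call is fine, an OR of two types is rejected. */
static void
demo_virtq_modify_usage(void)
{
	uint64_t both = DEMO_VIRTQ_MODIFY_TYPE_STATE |
			DEMO_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE;

	assert(demo_virtq_modify_check(DEMO_VIRTQ_MODIFY_TYPE_STATE) == 0);
	assert(demo_virtq_modify_check(both) == -EINVAL);
	(void)both;
}
```

Under this reading, a caller that needs to change both the queue state and
the dirty bitmap settings issues one modify call per type.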
 drivers/common/mlx5/mlx5_devx_cmds.c            | 199 +++++++++++++++++++++---
 drivers/common/mlx5/mlx5_devx_cmds.h            |  48 +++++-
 drivers/common/mlx5/mlx5_prm.h                  | 117 ++++++++++++++
 drivers/common/mlx5/rte_common_mlx5_version.map |   3 +
 4 files changed, 343 insertions(+), 24 deletions(-)
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index cdc041b..2425513 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -377,24 +377,18 @@ struct mlx5_devx_obj *
 		vdpa_attr->max_num_virtio_queues =
 			MLX5_GET(virtio_emulation_cap, hcattr,
 				 max_num_virtio_queues);
-		vdpa_attr->umem_1_buffer_param_a =
-			MLX5_GET(virtio_emulation_cap, hcattr,
-				 umem_1_buffer_param_a);
-		vdpa_attr->umem_1_buffer_param_b =
-			MLX5_GET(virtio_emulation_cap, hcattr,
-				 umem_1_buffer_param_b);
-		vdpa_attr->umem_2_buffer_param_a =
-			MLX5_GET(virtio_emulation_cap, hcattr,
-				 umem_2_buffer_param_a);
-		vdpa_attr->umem_2_buffer_param_b =
-			MLX5_GET(virtio_emulation_cap, hcattr,
-				 umem_2_buffer_param_a);
-		vdpa_attr->umem_3_buffer_param_a =
-			MLX5_GET(virtio_emulation_cap, hcattr,
-				 umem_3_buffer_param_a);
-		vdpa_attr->umem_3_buffer_param_b =
-			MLX5_GET(virtio_emulation_cap, hcattr,
-				 umem_3_buffer_param_b);
+		vdpa_attr->umems[0].a = MLX5_GET(virtio_emulation_cap, hcattr,
+						 umem_1_buffer_param_a);
+		vdpa_attr->umems[0].b = MLX5_GET(virtio_emulation_cap, hcattr,
+						 umem_1_buffer_param_b);
+		vdpa_attr->umems[1].a = MLX5_GET(virtio_emulation_cap, hcattr,
+						 umem_2_buffer_param_a);
+		vdpa_attr->umems[1].b = MLX5_GET(virtio_emulation_cap, hcattr,
+						 umem_2_buffer_param_b);
+		vdpa_attr->umems[2].a = MLX5_GET(virtio_emulation_cap, hcattr,
+						 umem_3_buffer_param_a);
+		vdpa_attr->umems[2].b = MLX5_GET(virtio_emulation_cap, hcattr,
+						 umem_3_buffer_param_b);
 	}
 }
 
@@ -1150,3 +1144,172 @@ struct mlx5_devx_obj *
 	cq_obj->id = MLX5_GET(create_cq_out, out, cqn);
 	return cq_obj;
 }
+
+/**
+ * Create VIRTQ using DevX API.
+ *
+ * @param[in] ctx
+ *   ibv_context returned from mlx5dv_open_device.
+ * @param [in] attr
+ *   Pointer to VIRTQ attributes structure.
+ *
+ * @return
+ *   The DevX object created, NULL otherwise and rte_errno is set.
+ */
+struct mlx5_devx_obj *
+mlx5_devx_cmd_create_virtq(struct ibv_context *ctx,
+			   struct mlx5_devx_virtq_attr *attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(create_virtq_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(general_obj_out_cmd_hdr)] = {0};
+	struct mlx5_devx_obj *virtq_obj = rte_zmalloc(__func__,
+						     sizeof(*virtq_obj), 0);
+	void *virtq = MLX5_ADDR_OF(create_virtq_in, in, virtq);
+	void *hdr = MLX5_ADDR_OF(create_virtq_in, in, hdr);
+	void *virtctx = MLX5_ADDR_OF(virtio_net_q, virtq, virtio_q_context);
+
+	if (!virtq_obj) {
+		DRV_LOG(ERR, "Failed to allocate virtq data.");
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	MLX5_SET(general_obj_in_cmd_hdr, hdr, opcode,
+		 MLX5_CMD_OP_CREATE_GENERAL_OBJECT);
+	MLX5_SET(general_obj_in_cmd_hdr, hdr, obj_type,
+		 MLX5_GENERAL_OBJ_TYPE_VIRTQ);
+	MLX5_SET16(virtio_net_q, virtq, hw_available_index,
+		   attr->hw_available_index);
+	MLX5_SET16(virtio_net_q, virtq, hw_used_index, attr->hw_used_index);
+	MLX5_SET16(virtio_net_q, virtq, tso_ipv4, attr->tso_ipv4);
+	MLX5_SET16(virtio_net_q, virtq, tso_ipv6, attr->tso_ipv6);
+	MLX5_SET16(virtio_net_q, virtq, tx_csum, attr->tx_csum);
+	MLX5_SET16(virtio_net_q, virtq, rx_csum, attr->rx_csum);
+	MLX5_SET16(virtio_q, virtctx, virtio_version_1_0,
+		   attr->virtio_version_1_0);
+	MLX5_SET16(virtio_q, virtctx, event_mode, attr->event_mode);
+	MLX5_SET(virtio_q, virtctx, event_qpn_or_msix, attr->qp_id);
+	MLX5_SET64(virtio_q, virtctx, desc_addr, attr->desc_addr);
+	MLX5_SET64(virtio_q, virtctx, used_addr, attr->used_addr);
+	MLX5_SET64(virtio_q, virtctx, available_addr, attr->available_addr);
+	MLX5_SET16(virtio_q, virtctx, queue_index, attr->queue_index);
+	MLX5_SET16(virtio_q, virtctx, queue_size, attr->q_size);
+	MLX5_SET(virtio_q, virtctx, virtio_q_mkey, attr->mkey);
+	MLX5_SET(virtio_q, virtctx, umem_1_id, attr->umems[0].id);
+	MLX5_SET(virtio_q, virtctx, umem_1_size, attr->umems[0].size);
+	MLX5_SET64(virtio_q, virtctx, umem_1_offset, attr->umems[0].offset);
+	MLX5_SET(virtio_q, virtctx, umem_2_id, attr->umems[1].id);
+	MLX5_SET(virtio_q, virtctx, umem_2_size, attr->umems[1].size);
+	MLX5_SET64(virtio_q, virtctx, umem_2_offset, attr->umems[1].offset);
+	MLX5_SET(virtio_q, virtctx, umem_3_id, attr->umems[2].id);
+	MLX5_SET(virtio_q, virtctx, umem_3_size, attr->umems[2].size);
+	MLX5_SET64(virtio_q, virtctx, umem_3_offset, attr->umems[2].offset);
+	MLX5_SET(virtio_net_q, virtq, tisn_or_qpn, attr->tis_id);
+	virtq_obj->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in), out,
+						    sizeof(out));
+	if (!virtq_obj->obj) {
+		rte_errno = errno;
+		DRV_LOG(ERR, "Failed to create VIRTQ Obj using DevX.");
+		rte_free(virtq_obj);
+		return NULL;
+	}
+	virtq_obj->id = MLX5_GET(general_obj_out_cmd_hdr, out, obj_id);
+	return virtq_obj;
+}
+
+/**
+ * Modify VIRTQ using DevX API.
+ *
+ * @param[in] virtq_obj
+ *   Pointer to virtq object structure.
+ * @param [in] attr
+ *   Pointer to modify virtq attributes structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_devx_cmd_modify_virtq(struct mlx5_devx_obj *virtq_obj,
+			   struct mlx5_devx_virtq_attr *attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(create_virtq_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(general_obj_out_cmd_hdr)] = {0};
+	void *virtq = MLX5_ADDR_OF(create_virtq_in, in, virtq);
+	void *hdr = MLX5_ADDR_OF(create_virtq_in, in, hdr);
+	void *virtctx = MLX5_ADDR_OF(virtio_net_q, virtq, virtio_q_context);
+	int ret;
+
+	MLX5_SET(general_obj_in_cmd_hdr, hdr, opcode,
+		 MLX5_CMD_OP_MODIFY_GENERAL_OBJECT);
+	MLX5_SET(general_obj_in_cmd_hdr, hdr, obj_type,
+		 MLX5_GENERAL_OBJ_TYPE_VIRTQ);
+	MLX5_SET(general_obj_in_cmd_hdr, hdr, obj_id, virtq_obj->id);
+	MLX5_SET64(virtio_net_q, virtq, modify_field_select, attr->type);
+	MLX5_SET16(virtio_q, virtctx, queue_index, attr->queue_index);
+	switch (attr->type) {
+	case MLX5_VIRTQ_MODIFY_TYPE_STATE:
+		MLX5_SET16(virtio_net_q, virtq, state, attr->state);
+		break;
+	case MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS:
+		MLX5_SET(virtio_net_q, virtq, dirty_bitmap_mkey,
+			 attr->dirty_bitmap_mkey);
+		MLX5_SET64(virtio_net_q, virtq, dirty_bitmap_addr,
+			 attr->dirty_bitmap_addr);
+		MLX5_SET(virtio_net_q, virtq, dirty_bitmap_size,
+			 attr->dirty_bitmap_size);
+		break;
+	case MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE:
+		MLX5_SET(virtio_net_q, virtq, dirty_bitmap_dump_enable,
+			 attr->dirty_bitmap_dump_enable);
+		break;
+	default:
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	ret = mlx5_glue->devx_obj_modify(virtq_obj->obj, in, sizeof(in),
+					 out, sizeof(out));
+	if (ret) {
+		DRV_LOG(ERR, "Failed to modify VIRTQ using DevX.");
+		rte_errno = errno;
+		return -errno;
+	}
+	return ret;
+}
+
+/**
+ * Query VIRTQ using DevX API.
+ *
+ * @param[in] virtq_obj
+ *   Pointer to virtq object structure.
+ * @param[in,out] attr
+ *   Pointer to virtq attributes structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_devx_cmd_query_virtq(struct mlx5_devx_obj *virtq_obj,
+			   struct mlx5_devx_virtq_attr *attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(general_obj_in_cmd_hdr)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(query_virtq_out)] = {0};
+	void *hdr = MLX5_ADDR_OF(query_virtq_out, in, hdr);
+	void *virtq = MLX5_ADDR_OF(query_virtq_out, out, virtq);
+	int ret;
+
+	MLX5_SET(general_obj_in_cmd_hdr, hdr, opcode,
+		 MLX5_CMD_OP_QUERY_GENERAL_OBJECT);
+	MLX5_SET(general_obj_in_cmd_hdr, hdr, obj_type,
+		 MLX5_GENERAL_OBJ_TYPE_VIRTQ);
+	MLX5_SET(general_obj_in_cmd_hdr, hdr, obj_id, virtq_obj->id);
+	ret = mlx5_glue->devx_obj_query(virtq_obj->obj, in, sizeof(in),
+					 out, sizeof(out));
+	if (ret) {
+		DRV_LOG(ERR, "Failed to query VIRTQ using DevX.");
+		rte_errno = errno;
+		return -errno;
+	}
+	attr->hw_available_index = MLX5_GET16(virtio_net_q, virtq,
+					      hw_available_index);
+	attr->hw_used_index = MLX5_GET16(virtio_net_q, virtq, hw_used_index);
+	return ret;
+}
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index 581658b..1631c08 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -64,12 +64,10 @@ struct mlx5_hca_vdpa_attr {
 	uint32_t log_doorbell_stride:5;
 	uint32_t log_doorbell_bar_size:5;
 	uint32_t max_num_virtio_queues;
-	uint32_t umem_1_buffer_param_a;
-	uint32_t umem_1_buffer_param_b;
-	uint32_t umem_2_buffer_param_a;
-	uint32_t umem_2_buffer_param_b;
-	uint32_t umem_3_buffer_param_a;
-	uint32_t umem_3_buffer_param_b;
+	struct {
+		uint32_t a;
+		uint32_t b;
+	} umems[3];
 	uint64_t doorbell_bar_offset;
 };
 
@@ -250,6 +248,37 @@ struct mlx5_devx_cq_attr {
 	uint64_t db_addr;
 };
 
+/* Virtq attributes structure, used by VIRTQ operations. */
+struct mlx5_devx_virtq_attr {
+	uint16_t hw_available_index;
+	uint16_t hw_used_index;
+	uint16_t q_size;
+	uint32_t virtio_version_1_0:1;
+	uint32_t tso_ipv4:1;
+	uint32_t tso_ipv6:1;
+	uint32_t tx_csum:1;
+	uint32_t rx_csum:1;
+	uint32_t event_mode:3;
+	uint32_t state:4;
+	uint32_t dirty_bitmap_dump_enable:1;
+	uint32_t dirty_bitmap_mkey;
+	uint32_t dirty_bitmap_size;
+	uint32_t mkey;
+	uint32_t qp_id;
+	uint32_t queue_index;
+	uint32_t tis_id;
+	uint64_t dirty_bitmap_addr;
+	uint64_t type;
+	uint64_t desc_addr;
+	uint64_t used_addr;
+	uint64_t available_addr;
+	struct {
+		uint32_t id;
+		uint32_t size;
+		uint64_t offset;
+	} umems[3];
+};
+
 /* mlx5_devx_cmds.c */
 
 struct mlx5_devx_obj *mlx5_devx_cmd_flow_counter_alloc(struct ibv_context *ctx,
@@ -288,4 +317,11 @@ int mlx5_devx_cmd_flow_dump(void *fdb_domain, void *rx_domain, void *tx_domain,
 			    FILE *file);
 struct mlx5_devx_obj *mlx5_devx_cmd_create_cq(struct ibv_context *ctx,
 					      struct mlx5_devx_cq_attr *attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_virtq(struct ibv_context *ctx,
+					     struct mlx5_devx_virtq_attr *attr);
+int mlx5_devx_cmd_modify_virtq(struct mlx5_devx_obj *virtq_obj,
+			       struct mlx5_devx_virtq_attr *attr);
+int mlx5_devx_cmd_query_virtq(struct mlx5_devx_obj *virtq_obj,
+			      struct mlx5_devx_virtq_attr *attr);
+
 #endif /* RTE_PMD_MLX5_DEVX_CMDS_H_ */
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index a4082b9..4b8a34c 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -527,6 +527,8 @@ struct mlx5_modification_cmd {
 #define __mlx5_16_bit_off(typ, fld) (16 - __mlx5_bit_sz(typ, fld) - \
 				    (__mlx5_bit_off(typ, fld) & 0xf))
 #define __mlx5_mask16(typ, fld) ((u16)((1ull << __mlx5_bit_sz(typ, fld)) - 1))
+#define __mlx5_16_mask(typ, fld) (__mlx5_mask16(typ, fld) << \
+				  __mlx5_16_bit_off(typ, fld))
 #define MLX5_ST_SZ_BYTES(typ) (sizeof(struct mlx5_ifc_##typ##_bits) / 8)
 #define MLX5_ST_SZ_DW(typ) (sizeof(struct mlx5_ifc_##typ##_bits) / 32)
 #define MLX5_BYTE_OFF(typ, fld) (__mlx5_bit_off(typ, fld) / 8)
@@ -551,6 +553,17 @@ struct mlx5_modification_cmd {
 			rte_cpu_to_be_64(v); \
 	} while (0)
 
+#define MLX5_SET16(typ, p, fld, v) \
+	do { \
+		u16 _v = v; \
+		*((__be16 *)(p) + __mlx5_16_off(typ, fld)) = \
+		rte_cpu_to_be_16((rte_be_to_cpu_16(*((__be16 *)(p) + \
+				  __mlx5_16_off(typ, fld))) & \
+				  (~__mlx5_16_mask(typ, fld))) | \
+				 (((_v) & __mlx5_mask16(typ, fld)) << \
+				  __mlx5_16_bit_off(typ, fld))); \
+	} while (0)
+
 #define MLX5_GET(typ, p, fld) \
 	((rte_be_to_cpu_32(*((__be32 *)(p) +\
 	__mlx5_dw_off(typ, fld))) >> __mlx5_dw_bit_off(typ, fld)) & \
@@ -723,6 +736,9 @@ enum {
 	MLX5_CMD_OP_CREATE_RQT = 0x916,
 	MLX5_CMD_OP_ALLOC_FLOW_COUNTER = 0x939,
 	MLX5_CMD_OP_QUERY_FLOW_COUNTER = 0x93b,
+	MLX5_CMD_OP_CREATE_GENERAL_OBJECT = 0xa00,
+	MLX5_CMD_OP_MODIFY_GENERAL_OBJECT = 0xa01,
+	MLX5_CMD_OP_QUERY_GENERAL_OBJECT = 0xa02,
 };
 
 enum {
@@ -1691,6 +1707,11 @@ struct mlx5_ifc_create_tir_in_bits {
 	struct mlx5_ifc_tirc_bits ctx;
 };
 
+enum {
+	MLX5_INLINE_Q_TYPE_RQ = 0x0,
+	MLX5_INLINE_Q_TYPE_VIRTQ = 0x1,
+};
+
 struct mlx5_ifc_rq_num_bits {
 	u8 reserved_at_0[0x8];
 	u8 rq_num[0x18];
@@ -1917,6 +1938,102 @@ struct mlx5_ifc_create_cq_in_bits {
 	u8 pas[];
 };
 
+enum {
+	MLX5_GENERAL_OBJ_TYPE_VIRTQ = 0x000d,
+};
+
+struct mlx5_ifc_general_obj_in_cmd_hdr_bits {
+	u8 opcode[0x10];
+	u8 reserved_at_10[0x20];
+	u8 obj_type[0x10];
+	u8 obj_id[0x20];
+	u8 reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_general_obj_out_cmd_hdr_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 obj_id[0x20];
+	u8 reserved_at_60[0x20];
+};
+
+enum {
+	MLX5_VIRTQ_STATE_INIT = 0,
+	MLX5_VIRTQ_STATE_RDY = 1,
+	MLX5_VIRTQ_STATE_SUSPEND = 2,
+	MLX5_VIRTQ_STATE_ERROR = 3,
+};
+
+enum {
+	MLX5_VIRTQ_MODIFY_TYPE_STATE = (1UL << 0),
+	MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS = (1UL << 3),
+	MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE = (1UL << 4),
+};
+
+struct mlx5_ifc_virtio_q_bits {
+	u8 virtio_q_type[0x8];
+	u8 reserved_at_8[0x5];
+	u8 event_mode[0x3];
+	u8 queue_index[0x10];
+	u8 full_emulation[0x1];
+	u8 virtio_version_1_0[0x1];
+	u8 reserved_at_22[0x2];
+	u8 offload_type[0x4];
+	u8 event_qpn_or_msix[0x18];
+	u8 doorbell_stride_idx[0x10];
+	u8 queue_size[0x10];
+	u8 device_emulation_id[0x20];
+	u8 desc_addr[0x40];
+	u8 used_addr[0x40];
+	u8 available_addr[0x40];
+	u8 virtio_q_mkey[0x20];
+	u8 reserved_at_160[0x20];
+	u8 umem_1_id[0x20];
+	u8 umem_1_size[0x20];
+	u8 umem_1_offset[0x40];
+	u8 umem_2_id[0x20];
+	u8 umem_2_size[0x20];
+	u8 umem_2_offset[0x40];
+	u8 umem_3_id[0x20];
+	u8 umem_3_size[0x20];
+	u8 umem_3_offset[0x40];
+	u8 reserved_at_300[0x100];
+};
+
+struct mlx5_ifc_virtio_net_q_bits {
+	u8 modify_field_select[0x40];
+	u8 reserved_at_40[0x40];
+	u8 tso_ipv4[0x1];
+	u8 tso_ipv6[0x1];
+	u8 tx_csum[0x1];
+	u8 rx_csum[0x1];
+	u8 reserved_at_84[0x6];
+	u8 dirty_bitmap_dump_enable[0x1];
+	u8 vhost_log_page[0x5];
+	u8 reserved_at_90[0xc];
+	u8 state[0x4];
+	u8 error_type[0x8];
+	u8 tisn_or_qpn[0x18];
+	u8 dirty_bitmap_mkey[0x20];
+	u8 dirty_bitmap_size[0x20];
+	u8 dirty_bitmap_addr[0x40];
+	u8 hw_available_index[0x10];
+	u8 hw_used_index[0x10];
+	u8 reserved_at_160[0xa0];
+	struct mlx5_ifc_virtio_q_bits virtio_q_context;
+};
+
+struct mlx5_ifc_create_virtq_in_bits {
+	struct mlx5_ifc_general_obj_in_cmd_hdr_bits hdr;
+	struct mlx5_ifc_virtio_net_q_bits virtq;
+};
+
+struct mlx5_ifc_query_virtq_out_bits {
+	struct mlx5_ifc_general_obj_in_cmd_hdr_bits hdr;
+	struct mlx5_ifc_virtio_net_q_bits virtq;
+};
+
 /* CQE format mask. */
 #define MLX5E_CQE_FORMAT_MASK 0xc
 
diff --git a/drivers/common/mlx5/rte_common_mlx5_version.map b/drivers/common/mlx5/rte_common_mlx5_version.map
index c6a203d..f3082ce 100644
--- a/drivers/common/mlx5/rte_common_mlx5_version.map
+++ b/drivers/common/mlx5/rte_common_mlx5_version.map
@@ -8,6 +8,7 @@ DPDK_20.02 {
 	mlx5_devx_cmd_create_tir;
 	mlx5_devx_cmd_create_td;
 	mlx5_devx_cmd_create_tis;
+	mlx5_devx_cmd_create_virtq;
 	mlx5_devx_cmd_destroy;
 	mlx5_devx_cmd_flow_counter_alloc;
 	mlx5_devx_cmd_flow_counter_query;
@@ -15,8 +16,10 @@ DPDK_20.02 {
 	mlx5_devx_cmd_mkey_create;
 	mlx5_devx_cmd_modify_rq;
 	mlx5_devx_cmd_modify_sq;
+	mlx5_devx_cmd_modify_virtq;
 	mlx5_devx_cmd_qp_query_tis_td;
 	mlx5_devx_cmd_query_hca_attr;
+	mlx5_devx_cmd_query_virtq;
 	mlx5_devx_get_out_command_status;
 
 	mlx5_dev_to_pci_addr;
-- 
1.8.3.1
* [dpdk-dev] [PATCH v4 16/25] common/mlx5: add support for DevX QP operations
  2020-01-29 12:38     ` [dpdk-dev] [PATCH v4 00/25] Introduce mlx5 common library Matan Azrad
                         ` (14 preceding siblings ...)
  2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 15/25] common/mlx5: add DevX virtq commands Matan Azrad
@ 2020-01-29 12:38       ` Matan Azrad
  2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 17/25] common/mlx5: allow type configuration for DevX RQT Matan Azrad
                         ` (9 subsequent siblings)
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-29 12:38 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
QP creation is needed for vDPA virtq support.
Add 2 DevX commands to create a QP and to modify the QP state.
Only RC QPs in force loopback address mode are supported.
This way, packets can be sent to other internal destinations in
the NIC, for example other QPs or virtqs.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
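A note on the size encodings used by mlx5_devx_cmd_create_qp() below: the
command does not write the caller's sizes into the QP context as-is. Queue
sizes are stored as log2 values, log_page_size is stored relative to
MLX5_ADAPTER_PAGE_SHIFT, and log_rq_stride relative to
MLX5_LOG_RQ_STRIDE_SHIFT. The following is a minimal standalone sketch of
that arithmetic only. The struct qpc_size_fields, encode_qpc_sizes() and
log2_u32() names are hypothetical and exist only for this illustration;
log2_u32() stands in for rte_log2_u32() and is only meant for powers of 2.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the mlx5_prm.h hunk of this patch. */
#define MLX5_ADAPTER_PAGE_SHIFT 12
#define MLX5_LOG_RQ_STRIDE_SHIFT 4

/* Hypothetical container for the encoded QP context size fields. */
struct qpc_size_fields {
	uint32_t log_page_size;
	uint32_t log_sq_size;
	uint32_t log_rq_size;
	uint32_t log_rq_stride;
	uint32_t no_sq;
};

/* Hypothetical stand-in for rte_log2_u32(); exact for powers of 2. */
static uint32_t
log2_u32(uint32_t v)
{
	uint32_t n = 0;

	while (v > 1) {
		v >>= 1;
		n++;
	}
	return n;
}

/* Encode sizes the way the create QP command in this patch does. */
static struct qpc_size_fields
encode_qpc_sizes(uint32_t log_page_size, uint32_t sq_size,
		 uint32_t rq_size, uint32_t log_rq_stride)
{
	struct qpc_size_fields f = {0};

	assert(log_page_size >= MLX5_ADAPTER_PAGE_SHIFT);
	f.log_page_size = log_page_size - MLX5_ADAPTER_PAGE_SHIFT;
	if (sq_size) {
		/* Sizes must be powers of 2, as the attr struct notes. */
		assert((sq_size & (sq_size - 1)) == 0);
		f.log_sq_size = log2_u32(sq_size);
	} else {
		f.no_sq = 1;
	}
	if (rq_size) {
		assert((rq_size & (rq_size - 1)) == 0);
		assert(log_rq_stride >= MLX5_LOG_RQ_STRIDE_SHIFT);
		f.log_rq_size = log2_u32(rq_size);
		f.log_rq_stride = log_rq_stride - MLX5_LOG_RQ_STRIDE_SHIFT;
	}
	return f;
}
```

For example, a 4 KB page (log_page_size 12), a 256-entry SQ, a 128-entry RQ
and a 64-byte RQ stride (log_rq_stride 6) encode to 0, 8, 7 and 2.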
 drivers/common/mlx5/mlx5_devx_cmds.c            | 167 ++++++++++-
 drivers/common/mlx5/mlx5_devx_cmds.h            |  20 ++
 drivers/common/mlx5/mlx5_prm.h                  | 376 ++++++++++++++++++++++++
 drivers/common/mlx5/rte_common_mlx5_version.map |   2 +
 4 files changed, 564 insertions(+), 1 deletion(-)
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index 2425513..e7288c8 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -1124,7 +1124,8 @@ struct mlx5_devx_obj *
 	MLX5_SET(cqc, cqctx, cc, attr->use_first_only);
 	MLX5_SET(cqc, cqctx, oi, attr->overrun_ignore);
 	MLX5_SET(cqc, cqctx, log_cq_size, attr->log_cq_size);
-	MLX5_SET(cqc, cqctx, log_page_size, attr->log_page_size);
+	MLX5_SET(cqc, cqctx, log_page_size, attr->log_page_size -
+		 MLX5_ADAPTER_PAGE_SHIFT);
 	MLX5_SET(cqc, cqctx, c_eqn, attr->eqn);
 	MLX5_SET(cqc, cqctx, uar_page, attr->uar_page_id);
 	if (attr->q_umem_valid) {
@@ -1313,3 +1314,167 @@ struct mlx5_devx_obj *
 	attr->hw_used_index = MLX5_GET16(virtio_net_q, virtq, hw_used_index);
 	return ret;
 }
+
+/**
+ * Create QP using DevX API.
+ *
+ * @param[in] ctx
+ *   ibv_context returned from mlx5dv_open_device.
+ * @param [in] attr
+ *   Pointer to QP attributes structure.
+ *
+ * @return
+ *   The DevX object created, NULL otherwise and rte_errno is set.
+ */
+struct mlx5_devx_obj *
+mlx5_devx_cmd_create_qp(struct ibv_context *ctx,
+			struct mlx5_devx_qp_attr *attr)
+{
+	uint32_t in[MLX5_ST_SZ_DW(create_qp_in)] = {0};
+	uint32_t out[MLX5_ST_SZ_DW(create_qp_out)] = {0};
+	struct mlx5_devx_obj *qp_obj = rte_zmalloc(__func__, sizeof(*qp_obj),
+						   0);
+	void *qpc = MLX5_ADDR_OF(create_qp_in, in, qpc);
+
+	if (!qp_obj) {
+		DRV_LOG(ERR, "Failed to allocate QP data.");
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	MLX5_SET(create_qp_in, in, opcode, MLX5_CMD_OP_CREATE_QP);
+	MLX5_SET(qpc, qpc, st, MLX5_QP_ST_RC);
+	MLX5_SET(qpc, qpc, pd, attr->pd);
+	if (attr->uar_index) {
+		MLX5_SET(qpc, qpc, pm_state, MLX5_QP_PM_MIGRATED);
+		MLX5_SET(qpc, qpc, uar_page, attr->uar_index);
+		MLX5_SET(qpc, qpc, log_page_size, attr->log_page_size -
+			 MLX5_ADAPTER_PAGE_SHIFT);
+		if (attr->sq_size) {
+			RTE_ASSERT(RTE_IS_POWER_OF_2(attr->sq_size));
+			MLX5_SET(qpc, qpc, cqn_snd, attr->cqn);
+			MLX5_SET(qpc, qpc, log_sq_size,
+				 rte_log2_u32(attr->sq_size));
+		} else {
+			MLX5_SET(qpc, qpc, no_sq, 1);
+		}
+		if (attr->rq_size) {
+			RTE_ASSERT(RTE_IS_POWER_OF_2(attr->rq_size));
+			MLX5_SET(qpc, qpc, cqn_rcv, attr->cqn);
+			MLX5_SET(qpc, qpc, log_rq_stride, attr->log_rq_stride -
+				 MLX5_LOG_RQ_STRIDE_SHIFT);
+			MLX5_SET(qpc, qpc, log_rq_size,
+				 rte_log2_u32(attr->rq_size));
+			MLX5_SET(qpc, qpc, rq_type, MLX5_NON_ZERO_RQ);
+		} else {
+			MLX5_SET(qpc, qpc, rq_type, MLX5_ZERO_LEN_RQ);
+		}
+		if (attr->dbr_umem_valid) {
+			MLX5_SET(qpc, qpc, dbr_umem_valid,
+				 attr->dbr_umem_valid);
+			MLX5_SET(qpc, qpc, dbr_umem_id, attr->dbr_umem_id);
+		}
+		MLX5_SET64(qpc, qpc, dbr_addr, attr->dbr_address);
+		MLX5_SET64(create_qp_in, in, wq_umem_offset,
+			   attr->wq_umem_offset);
+		MLX5_SET(create_qp_in, in, wq_umem_id, attr->wq_umem_id);
+		MLX5_SET(create_qp_in, in, wq_umem_valid, 1);
+	} else {
+		/* Special QP to be managed by FW - no SQ/RQ/CQ/UAR/DB rec. */
+		MLX5_SET(qpc, qpc, rq_type, MLX5_ZERO_LEN_RQ);
+		MLX5_SET(qpc, qpc, no_sq, 1);
+	}
+	qp_obj->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in), out,
+						 sizeof(out));
+	if (!qp_obj->obj) {
+		rte_errno = errno;
+		DRV_LOG(ERR, "Failed to create QP Obj using DevX.");
+		rte_free(qp_obj);
+		return NULL;
+	}
+	qp_obj->id = MLX5_GET(create_qp_out, out, qpn);
+	return qp_obj;
+}
+
+/**
+ * Modify QP using DevX API.
+ * Currently supports only force loop-back QP.
+ *
+ * @param[in] qp
+ *   Pointer to QP object structure.
+ * @param [in] qp_st_mod_op
+ *   The QP state modification operation.
+ * @param [in] remote_qp_id
+ *   The remote QP ID for MLX5_CMD_OP_INIT2RTR_QP operation.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_devx_cmd_modify_qp_state(struct mlx5_devx_obj *qp, uint32_t qp_st_mod_op,
+			      uint32_t remote_qp_id)
+{
+	union {
+		uint32_t rst2init[MLX5_ST_SZ_DW(rst2init_qp_in)];
+		uint32_t init2rtr[MLX5_ST_SZ_DW(init2rtr_qp_in)];
+		uint32_t rtr2rts[MLX5_ST_SZ_DW(rtr2rts_qp_in)];
+	} in;
+	union {
+		uint32_t rst2init[MLX5_ST_SZ_DW(rst2init_qp_out)];
+		uint32_t init2rtr[MLX5_ST_SZ_DW(init2rtr_qp_out)];
+		uint32_t rtr2rts[MLX5_ST_SZ_DW(rtr2rts_qp_out)];
+	} out;
+	void *qpc;
+	int ret;
+	unsigned int inlen;
+	unsigned int outlen;
+
+	memset(&in, 0, sizeof(in));
+	memset(&out, 0, sizeof(out));
+	MLX5_SET(rst2init_qp_in, &in, opcode, qp_st_mod_op);
+	switch (qp_st_mod_op) {
+	case MLX5_CMD_OP_RST2INIT_QP:
+		MLX5_SET(rst2init_qp_in, &in, qpn, qp->id);
+		qpc = MLX5_ADDR_OF(rst2init_qp_in, &in, qpc);
+		MLX5_SET(qpc, qpc, primary_address_path.vhca_port_num, 1);
+		MLX5_SET(qpc, qpc, rre, 1);
+		MLX5_SET(qpc, qpc, rwe, 1);
+		MLX5_SET(qpc, qpc, pm_state, MLX5_QP_PM_MIGRATED);
+		inlen = sizeof(in.rst2init);
+		outlen = sizeof(out.rst2init);
+		break;
+	case MLX5_CMD_OP_INIT2RTR_QP:
+		MLX5_SET(init2rtr_qp_in, &in, qpn, qp->id);
+		qpc = MLX5_ADDR_OF(init2rtr_qp_in, &in, qpc);
+		MLX5_SET(qpc, qpc, primary_address_path.fl, 1);
+		MLX5_SET(qpc, qpc, primary_address_path.vhca_port_num, 1);
+		MLX5_SET(qpc, qpc, mtu, 1);
+		MLX5_SET(qpc, qpc, log_msg_max, 30);
+		MLX5_SET(qpc, qpc, remote_qpn, remote_qp_id);
+		MLX5_SET(qpc, qpc, min_rnr_nak, 0);
+		inlen = sizeof(in.init2rtr);
+		outlen = sizeof(out.init2rtr);
+		break;
+	case MLX5_CMD_OP_RTR2RTS_QP:
+		qpc = MLX5_ADDR_OF(rtr2rts_qp_in, &in, qpc);
+		MLX5_SET(rtr2rts_qp_in, &in, qpn, qp->id);
+		MLX5_SET(qpc, qpc, primary_address_path.ack_timeout, 14);
+		MLX5_SET(qpc, qpc, log_ack_req_freq, 0);
+		MLX5_SET(qpc, qpc, retry_count, 7);
+		MLX5_SET(qpc, qpc, rnr_retry, 7);
+		inlen = sizeof(in.rtr2rts);
+		outlen = sizeof(out.rtr2rts);
+		break;
+	default:
+		DRV_LOG(ERR, "Invalid or unsupported QP modify op %u.",
+			qp_st_mod_op);
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	ret = mlx5_glue->devx_obj_modify(qp->obj, &in, inlen, &out, outlen);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to modify QP using DevX.");
+		rte_errno = errno;
+		return -errno;
+	}
+	return ret;
+}
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index 1631c08..d1a21b8 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -279,6 +279,22 @@ struct mlx5_devx_virtq_attr {
 	} umems[3];
 };
 
+
+struct mlx5_devx_qp_attr {
+	uint32_t pd:24;
+	uint32_t uar_index:24;
+	uint32_t cqn:24;
+	uint32_t log_page_size:5;
+	uint32_t rq_size:17; /* Must be power of 2. */
+	uint32_t log_rq_stride:3;
+	uint32_t sq_size:17; /* Must be power of 2. */
+	uint32_t dbr_umem_valid:1;
+	uint32_t dbr_umem_id;
+	uint64_t dbr_address;
+	uint32_t wq_umem_id;
+	uint64_t wq_umem_offset;
+};
+
 /* mlx5_devx_cmds.c */
 
 struct mlx5_devx_obj *mlx5_devx_cmd_flow_counter_alloc(struct ibv_context *ctx,
@@ -323,5 +339,9 @@ int mlx5_devx_cmd_modify_virtq(struct mlx5_devx_obj *virtq_obj,
 			       struct mlx5_devx_virtq_attr *attr);
 int mlx5_devx_cmd_query_virtq(struct mlx5_devx_obj *virtq_obj,
 			      struct mlx5_devx_virtq_attr *attr);
+struct mlx5_devx_obj *mlx5_devx_cmd_create_qp(struct ibv_context *ctx,
+					      struct mlx5_devx_qp_attr *attr);
+int mlx5_devx_cmd_modify_qp_state(struct mlx5_devx_obj *qp,
+				  uint32_t qp_st_mod_op, uint32_t remote_qp_id);
 
 #endif /* RTE_PMD_MLX5_DEVX_CMDS_H_ */
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index 4b8a34c..e53dd61 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -724,6 +724,19 @@ enum {
 	MLX5_CMD_OP_QUERY_HCA_CAP = 0x100,
 	MLX5_CMD_OP_CREATE_MKEY = 0x200,
 	MLX5_CMD_OP_CREATE_CQ = 0x400,
+	MLX5_CMD_OP_CREATE_QP = 0x500,
+	MLX5_CMD_OP_RST2INIT_QP = 0x502,
+	MLX5_CMD_OP_INIT2RTR_QP = 0x503,
+	MLX5_CMD_OP_RTR2RTS_QP = 0x504,
+	MLX5_CMD_OP_RTS2RTS_QP = 0x505,
+	MLX5_CMD_OP_SQERR2RTS_QP = 0x506,
+	MLX5_CMD_OP_QP_2ERR = 0x507,
+	MLX5_CMD_OP_QP_2RST = 0x50A,
+	MLX5_CMD_OP_QUERY_QP = 0x50B,
+	MLX5_CMD_OP_SQD2RTS_QP = 0x50C,
+	MLX5_CMD_OP_INIT2INIT_QP = 0x50E,
+	MLX5_CMD_OP_SUSPEND_QP = 0x50F,
+	MLX5_CMD_OP_RESUME_QP = 0x510,
 	MLX5_CMD_OP_QUERY_NIC_VPORT_CONTEXT = 0x754,
 	MLX5_CMD_OP_ALLOC_TRANSPORT_DOMAIN = 0x816,
 	MLX5_CMD_OP_CREATE_TIR = 0x900,
@@ -747,6 +760,9 @@ enum {
 	MLX5_MKC_ACCESS_MODE_KLM_FBS = 0x3,
 };
 
+#define MLX5_ADAPTER_PAGE_SHIFT 12
+#define MLX5_LOG_RQ_STRIDE_SHIFT 4
+
 /* Flow counters. */
 struct mlx5_ifc_alloc_flow_counter_out_bits {
 	u8         status[0x8];
@@ -2034,6 +2050,366 @@ struct mlx5_ifc_query_virtq_out_bits {
 	struct mlx5_ifc_virtio_net_q_bits virtq;
 };
 
+enum {
+	MLX5_QP_ST_RC = 0x0,
+};
+
+enum {
+	MLX5_QP_PM_MIGRATED = 0x3,
+};
+
+enum {
+	MLX5_NON_ZERO_RQ = 0x0,
+	MLX5_SRQ_RQ = 0x1,
+	MLX5_CRQ_RQ = 0x2,
+	MLX5_ZERO_LEN_RQ = 0x3,
+};
+
+struct mlx5_ifc_ads_bits {
+	u8 fl[0x1];
+	u8 free_ar[0x1];
+	u8 reserved_at_2[0xe];
+	u8 pkey_index[0x10];
+	u8 reserved_at_20[0x8];
+	u8 grh[0x1];
+	u8 mlid[0x7];
+	u8 rlid[0x10];
+	u8 ack_timeout[0x5];
+	u8 reserved_at_45[0x3];
+	u8 src_addr_index[0x8];
+	u8 reserved_at_50[0x4];
+	u8 stat_rate[0x4];
+	u8 hop_limit[0x8];
+	u8 reserved_at_60[0x4];
+	u8 tclass[0x8];
+	u8 flow_label[0x14];
+	u8 rgid_rip[16][0x8];
+	u8 reserved_at_100[0x4];
+	u8 f_dscp[0x1];
+	u8 f_ecn[0x1];
+	u8 reserved_at_106[0x1];
+	u8 f_eth_prio[0x1];
+	u8 ecn[0x2];
+	u8 dscp[0x6];
+	u8 udp_sport[0x10];
+	u8 dei_cfi[0x1];
+	u8 eth_prio[0x3];
+	u8 sl[0x4];
+	u8 vhca_port_num[0x8];
+	u8 rmac_47_32[0x10];
+	u8 rmac_31_0[0x20];
+};
+
+struct mlx5_ifc_qpc_bits {
+	u8 state[0x4];
+	u8 lag_tx_port_affinity[0x4];
+	u8 st[0x8];
+	u8 reserved_at_10[0x3];
+	u8 pm_state[0x2];
+	u8 reserved_at_15[0x1];
+	u8 req_e2e_credit_mode[0x2];
+	u8 offload_type[0x4];
+	u8 end_padding_mode[0x2];
+	u8 reserved_at_1e[0x2];
+	u8 wq_signature[0x1];
+	u8 block_lb_mc[0x1];
+	u8 atomic_like_write_en[0x1];
+	u8 latency_sensitive[0x1];
+	u8 reserved_at_24[0x1];
+	u8 drain_sigerr[0x1];
+	u8 reserved_at_26[0x2];
+	u8 pd[0x18];
+	u8 mtu[0x3];
+	u8 log_msg_max[0x5];
+	u8 reserved_at_48[0x1];
+	u8 log_rq_size[0x4];
+	u8 log_rq_stride[0x3];
+	u8 no_sq[0x1];
+	u8 log_sq_size[0x4];
+	u8 reserved_at_55[0x6];
+	u8 rlky[0x1];
+	u8 ulp_stateless_offload_mode[0x4];
+	u8 counter_set_id[0x8];
+	u8 uar_page[0x18];
+	u8 reserved_at_80[0x8];
+	u8 user_index[0x18];
+	u8 reserved_at_a0[0x3];
+	u8 log_page_size[0x5];
+	u8 remote_qpn[0x18];
+	struct mlx5_ifc_ads_bits primary_address_path;
+	struct mlx5_ifc_ads_bits secondary_address_path;
+	u8 log_ack_req_freq[0x4];
+	u8 reserved_at_384[0x4];
+	u8 log_sra_max[0x3];
+	u8 reserved_at_38b[0x2];
+	u8 retry_count[0x3];
+	u8 rnr_retry[0x3];
+	u8 reserved_at_393[0x1];
+	u8 fre[0x1];
+	u8 cur_rnr_retry[0x3];
+	u8 cur_retry_count[0x3];
+	u8 reserved_at_39b[0x5];
+	u8 reserved_at_3a0[0x20];
+	u8 reserved_at_3c0[0x8];
+	u8 next_send_psn[0x18];
+	u8 reserved_at_3e0[0x8];
+	u8 cqn_snd[0x18];
+	u8 reserved_at_400[0x8];
+	u8 deth_sqpn[0x18];
+	u8 reserved_at_420[0x20];
+	u8 reserved_at_440[0x8];
+	u8 last_acked_psn[0x18];
+	u8 reserved_at_460[0x8];
+	u8 ssn[0x18];
+	u8 reserved_at_480[0x8];
+	u8 log_rra_max[0x3];
+	u8 reserved_at_48b[0x1];
+	u8 atomic_mode[0x4];
+	u8 rre[0x1];
+	u8 rwe[0x1];
+	u8 rae[0x1];
+	u8 reserved_at_493[0x1];
+	u8 page_offset[0x6];
+	u8 reserved_at_49a[0x3];
+	u8 cd_slave_receive[0x1];
+	u8 cd_slave_send[0x1];
+	u8 cd_master[0x1];
+	u8 reserved_at_4a0[0x3];
+	u8 min_rnr_nak[0x5];
+	u8 next_rcv_psn[0x18];
+	u8 reserved_at_4c0[0x8];
+	u8 xrcd[0x18];
+	u8 reserved_at_4e0[0x8];
+	u8 cqn_rcv[0x18];
+	u8 dbr_addr[0x40];
+	u8 q_key[0x20];
+	u8 reserved_at_560[0x5];
+	u8 rq_type[0x3];
+	u8 srqn_rmpn_xrqn[0x18];
+	u8 reserved_at_580[0x8];
+	u8 rmsn[0x18];
+	u8 hw_sq_wqebb_counter[0x10];
+	u8 sw_sq_wqebb_counter[0x10];
+	u8 hw_rq_counter[0x20];
+	u8 sw_rq_counter[0x20];
+	u8 reserved_at_600[0x20];
+	u8 reserved_at_620[0xf];
+	u8 cgs[0x1];
+	u8 cs_req[0x8];
+	u8 cs_res[0x8];
+	u8 dc_access_key[0x40];
+	u8 reserved_at_680[0x3];
+	u8 dbr_umem_valid[0x1];
+	u8 reserved_at_684[0x9c];
+	u8 dbr_umem_id[0x20];
+};
+
+struct mlx5_ifc_create_qp_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x8];
+	u8 qpn[0x18];
+	u8 reserved_at_60[0x20];
+};
+
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-Wpedantic"
+#endif
+struct mlx5_ifc_create_qp_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0x40];
+	u8 opt_param_mask[0x20];
+	u8 reserved_at_a0[0x20];
+	struct mlx5_ifc_qpc_bits qpc;
+	u8 wq_umem_offset[0x40];
+	u8 wq_umem_id[0x20];
+	u8 wq_umem_valid[0x1];
+	u8 reserved_at_861[0x1f];
+	u8 pas[0][0x40];
+};
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-Wpedantic"
+#endif
+
+struct mlx5_ifc_sqerr2rts_qp_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x40];
+};
+
+struct mlx5_ifc_sqerr2rts_qp_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0x8];
+	u8 qpn[0x18];
+	u8 reserved_at_60[0x20];
+	u8 opt_param_mask[0x20];
+	u8 reserved_at_a0[0x20];
+	struct mlx5_ifc_qpc_bits qpc;
+	u8 reserved_at_800[0x80];
+};
+
+struct mlx5_ifc_sqd2rts_qp_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x40];
+};
+
+struct mlx5_ifc_sqd2rts_qp_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0x8];
+	u8 qpn[0x18];
+	u8 reserved_at_60[0x20];
+	u8 opt_param_mask[0x20];
+	u8 reserved_at_a0[0x20];
+	struct mlx5_ifc_qpc_bits qpc;
+	u8 reserved_at_800[0x80];
+};
+
+struct mlx5_ifc_rts2rts_qp_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x40];
+};
+
+struct mlx5_ifc_rts2rts_qp_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0x8];
+	u8 qpn[0x18];
+	u8 reserved_at_60[0x20];
+	u8 opt_param_mask[0x20];
+	u8 reserved_at_a0[0x20];
+	struct mlx5_ifc_qpc_bits qpc;
+	u8 reserved_at_800[0x80];
+};
+
+struct mlx5_ifc_rtr2rts_qp_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x40];
+};
+
+struct mlx5_ifc_rtr2rts_qp_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0x8];
+	u8 qpn[0x18];
+	u8 reserved_at_60[0x20];
+	u8 opt_param_mask[0x20];
+	u8 reserved_at_a0[0x20];
+	struct mlx5_ifc_qpc_bits qpc;
+	u8 reserved_at_800[0x80];
+};
+
+struct mlx5_ifc_rst2init_qp_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x40];
+};
+
+struct mlx5_ifc_rst2init_qp_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0x8];
+	u8 qpn[0x18];
+	u8 reserved_at_60[0x20];
+	u8 opt_param_mask[0x20];
+	u8 reserved_at_a0[0x20];
+	struct mlx5_ifc_qpc_bits qpc;
+	u8 reserved_at_800[0x80];
+};
+
+struct mlx5_ifc_init2rtr_qp_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x40];
+};
+
+struct mlx5_ifc_init2rtr_qp_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0x8];
+	u8 qpn[0x18];
+	u8 reserved_at_60[0x20];
+	u8 opt_param_mask[0x20];
+	u8 reserved_at_a0[0x20];
+	struct mlx5_ifc_qpc_bits qpc;
+	u8 reserved_at_800[0x80];
+};
+
+struct mlx5_ifc_init2init_qp_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x40];
+};
+
+struct mlx5_ifc_init2init_qp_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0x8];
+	u8 qpn[0x18];
+	u8 reserved_at_60[0x20];
+	u8 opt_param_mask[0x20];
+	u8 reserved_at_a0[0x20];
+	struct mlx5_ifc_qpc_bits qpc;
+	u8 reserved_at_800[0x80];
+};
+
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-Wpedantic"
+#endif
+struct mlx5_ifc_query_qp_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x40];
+	u8 opt_param_mask[0x20];
+	u8 reserved_at_a0[0x20];
+	struct mlx5_ifc_qpc_bits qpc;
+	u8 reserved_at_800[0x80];
+	u8 pas[0][0x40];
+};
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-Wpedantic"
+#endif
+
+struct mlx5_ifc_query_qp_in_bits {
+	u8 opcode[0x10];
+	u8 reserved_at_10[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0x8];
+	u8 qpn[0x18];
+	u8 reserved_at_60[0x20];
+};
+
 /* CQE format mask. */
 #define MLX5E_CQE_FORMAT_MASK 0xc
 
diff --git a/drivers/common/mlx5/rte_common_mlx5_version.map b/drivers/common/mlx5/rte_common_mlx5_version.map
index f3082ce..df8e064 100644
--- a/drivers/common/mlx5/rte_common_mlx5_version.map
+++ b/drivers/common/mlx5/rte_common_mlx5_version.map
@@ -2,6 +2,7 @@ DPDK_20.02 {
 	global:
 
 	mlx5_devx_cmd_create_cq;
+	mlx5_devx_cmd_create_qp;
 	mlx5_devx_cmd_create_rq;
 	mlx5_devx_cmd_create_rqt;
 	mlx5_devx_cmd_create_sq;
@@ -14,6 +15,7 @@ DPDK_20.02 {
 	mlx5_devx_cmd_flow_counter_query;
 	mlx5_devx_cmd_flow_dump;
 	mlx5_devx_cmd_mkey_create;
+	mlx5_devx_cmd_modify_qp_state;
 	mlx5_devx_cmd_modify_rq;
 	mlx5_devx_cmd_modify_sq;
 	mlx5_devx_cmd_modify_virtq;
-- 
1.8.3.1
* [dpdk-dev] [PATCH v4 17/25] common/mlx5: allow type configuration for DevX RQT
  2020-01-29 12:38     ` [dpdk-dev] [PATCH v4 00/25] Introduce mlx5 common library Matan Azrad
                         ` (15 preceding siblings ...)
  2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 16/25] common/mlx5: add support for DevX QP operations Matan Azrad
@ 2020-01-29 12:38       ` Matan Azrad
  2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 18/25] common/mlx5: add TIR field constants Matan Azrad
                         ` (8 subsequent siblings)
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-29 12:38 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
Allow virtio queue type configuration in the RQ table.
The needed fields and configuration were added.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
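For reviewers checking the new rqtc layout: list_q_type is declared at bit
offset 0xa5 with a width of 3 bits, and rqt_max_size follows at 0xb0 with a
width of 16 bits, so both land in the same 32-bit word. The sketch below
reproduces the dword index and shift arithmetic that the PRM accessor
macros rely on, under the assumption that a field never crosses a dword
boundary. The struct field_pos, locate_field() and set_field() names are
hypothetical, and the conversion to big-endian that MLX5_SET performs is
left out on purpose.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of where a PRM field lives. */
struct field_pos {
	uint32_t dword; /* Index of the 32-bit word holding the field. */
	uint32_t shift; /* Left shift of the field inside that word. */
	uint32_t mask;  /* Unshifted mask covering the field width. */
};

/* Locate a field given its bit offset and width in the ifc struct. */
static struct field_pos
locate_field(uint32_t bit_off, uint32_t bit_sz)
{
	struct field_pos p;

	assert(bit_sz >= 1 && bit_sz <= 32);
	/* Assumption: the field does not cross a dword boundary. */
	assert((bit_off & 0x1f) + bit_sz <= 32);
	p.dword = bit_off / 32;
	/* Offsets count from the most significant bit of the word. */
	p.shift = 32 - bit_sz - (bit_off & 0x1f);
	p.mask = (bit_sz == 32) ? 0xffffffffu : ((1u << bit_sz) - 1u);
	return p;
}

/* Update one field in a host-order word, keeping the other bits. */
static uint32_t
set_field(uint32_t dw, struct field_pos p, uint32_t v)
{
	return (dw & ~(p.mask << p.shift)) | ((v & p.mask) << p.shift);
}
```

With these offsets, list_q_type occupies bits 26:24 of dword 5 and
rqt_max_size occupies bits 15:0 of the same dword, so setting the queue
type does not disturb the table size.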
 drivers/common/mlx5/mlx5_devx_cmds.c | 1 +
 drivers/common/mlx5/mlx5_devx_cmds.h | 1 +
 drivers/common/mlx5/mlx5_prm.h       | 5 +++--
 3 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index e7288c8..e372df6 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -846,6 +846,7 @@ struct mlx5_devx_obj *
 	}
 	MLX5_SET(create_rqt_in, in, opcode, MLX5_CMD_OP_CREATE_RQT);
 	rqt_ctx = MLX5_ADDR_OF(create_rqt_in, in, rqt_context);
+	MLX5_SET(rqtc, rqt_ctx, list_q_type, rqt_attr->rq_type);
 	MLX5_SET(rqtc, rqt_ctx, rqt_max_size, rqt_attr->rqt_max_size);
 	MLX5_SET(rqtc, rqt_ctx, rqt_actual_size, rqt_attr->rqt_actual_size);
 	for (i = 0; i < rqt_attr->rqt_actual_size; i++)
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index d1a21b8..9ef3ce2 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -188,6 +188,7 @@ struct mlx5_devx_tir_attr {
 
 /* RQT attributes structure, used by RQT operations. */
 struct mlx5_devx_rqt_attr {
+	uint8_t rq_type;
 	uint32_t rqt_max_size:16;
 	uint32_t rqt_actual_size:16;
 	uint32_t rq_list[];
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index e53dd61..000ba1f 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -1734,8 +1734,9 @@ struct mlx5_ifc_rq_num_bits {
 };
 
 struct mlx5_ifc_rqtc_bits {
-	u8 reserved_at_0[0xa0];
-	u8 reserved_at_a0[0x10];
+	u8 reserved_at_0[0xa5];
+	u8 list_q_type[0x3];
+	u8 reserved_at_a8[0x8];
 	u8 rqt_max_size[0x10];
 	u8 reserved_at_c0[0x10];
 	u8 rqt_actual_size[0x10];
-- 
1.8.3.1
* [dpdk-dev] [PATCH v4 18/25] common/mlx5: add TIR field constants
  2020-01-29 12:38     ` [dpdk-dev] [PATCH v4 00/25] Introduce mlx5 common library Matan Azrad
                         ` (16 preceding siblings ...)
  2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 17/25] common/mlx5: allow type configuration for DevX RQT Matan Azrad
@ 2020-01-29 12:38       ` Matan Azrad
  2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 19/25] common/mlx5: add DevX command to modify RQT Matan Azrad
                         ` (7 subsequent siblings)
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-29 12:38 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
The DevX TIR object configuration should specify the L3 and L4 protocols
expected to be forwarded by the TIR.
Add the PRM constant values needed to configure the L3 and L4 protocols.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/common/mlx5/mlx5_prm.h | 10 ++++++++++
 1 file changed, 10 insertions(+)
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index 000ba1f..e326868 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -1639,6 +1639,16 @@ struct mlx5_ifc_modify_rq_in_bits {
 };
 
 enum {
+	MLX5_L3_PROT_TYPE_IPV4 = 0,
+	MLX5_L3_PROT_TYPE_IPV6 = 1,
+};
+
+enum {
+	MLX5_L4_PROT_TYPE_TCP = 0,
+	MLX5_L4_PROT_TYPE_UDP = 1,
+};
+
+enum {
 	MLX5_RX_HASH_FIELD_SELECT_SELECTED_FIELDS_SRC_IP     = 0x0,
 	MLX5_RX_HASH_FIELD_SELECT_SELECTED_FIELDS_DST_IP     = 0x1,
 	MLX5_RX_HASH_FIELD_SELECT_SELECTED_FIELDS_L4_SPORT   = 0x2,
-- 
1.8.3.1
* [dpdk-dev] [PATCH v4 19/25] common/mlx5: add DevX command to modify RQT
  2020-01-29 12:38     ` [dpdk-dev] [PATCH v4 00/25] Introduce mlx5 common library Matan Azrad
                         ` (17 preceding siblings ...)
  2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 18/25] common/mlx5: add TIR field constants Matan Azrad
@ 2020-01-29 12:38       ` Matan Azrad
  2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 20/25] common/mlx5: get DevX capability for max RQT size Matan Azrad
                         ` (6 subsequent siblings)
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-29 12:38 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
The RQ table can be changed to support a different list of queues.
Add a DevX command to modify a DevX RQT object to point to a new RQ list.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
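The modify command below takes a variable-length input: a fixed part plus
one 32-bit entry per queue in rq_list, which is why the caller's attribute
structure ends in a flexible array. A minimal sketch of how a caller might
prepare such an attribute block and how the input length scales follows.
It uses a local mirror of struct mlx5_devx_rqt_attr as it looks after
patch 17/25; rqt_attr_alloc() and rqt_modify_inlen() are hypothetical
helpers, calloc() stands in for the DPDK allocators, and fixed_bytes stands
in for MLX5_ST_SZ_BYTES(modify_rqt_in).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Local mirror of struct mlx5_devx_rqt_attr after patch 17/25. */
struct rqt_attr {
	uint8_t rq_type;
	uint32_t rqt_max_size:16;
	uint32_t rqt_actual_size:16;
	uint32_t rq_list[];
};

/* Hypothetical helper: build an attribute block for n queues. */
static struct rqt_attr *
rqt_attr_alloc(uint8_t rq_type, uint16_t max_size,
	       const uint32_t *queues, uint16_t n)
{
	struct rqt_attr *attr;
	uint16_t i;

	/* The actual list may not exceed the table capacity. */
	if (n > max_size)
		return NULL;
	attr = calloc(1, sizeof(*attr) + n * sizeof(attr->rq_list[0]));
	if (attr == NULL)
		return NULL;
	attr->rq_type = rq_type;
	attr->rqt_max_size = max_size;
	attr->rqt_actual_size = n;
	for (i = 0; i < n; i++)
		attr->rq_list[i] = queues[i];
	return attr;
}

/* Hypothetical helper: fixed part plus one dword per listed queue. */
static size_t
rqt_modify_inlen(size_t fixed_bytes, uint16_t actual_size)
{
	return fixed_bytes + (size_t)actual_size * sizeof(uint32_t);
}
```

The caller owns the returned block and releases it with free() once the
command has been issued.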
 drivers/common/mlx5/mlx5_devx_cmds.c            | 47 +++++++++++++++++++++++++
 drivers/common/mlx5/mlx5_devx_cmds.h            |  2 ++
 drivers/common/mlx5/mlx5_prm.h                  | 21 +++++++++++
 drivers/common/mlx5/rte_common_mlx5_version.map |  1 +
 4 files changed, 71 insertions(+)
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index e372df6..1d3a729 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -864,6 +864,53 @@ struct mlx5_devx_obj *
 }
 
 /**
+ * Modify RQT using DevX API.
+ *
+ * @param[in] rqt
+ *   Pointer to RQT DevX object structure.
+ * @param [in] rqt_attr
+ *   Pointer to RQT attributes structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_devx_cmd_modify_rqt(struct mlx5_devx_obj *rqt,
+			 struct mlx5_devx_rqt_attr *rqt_attr)
+{
+	uint32_t inlen = MLX5_ST_SZ_BYTES(modify_rqt_in) +
+			 rqt_attr->rqt_actual_size * sizeof(uint32_t);
+	uint32_t out[MLX5_ST_SZ_DW(modify_rqt_out)] = {0};
+	uint32_t *in = rte_calloc(__func__, 1, inlen, 0);
+	void *rqt_ctx;
+	int i;
+	int ret;
+
+	if (!in) {
+		DRV_LOG(ERR, "Failed to allocate RQT modify IN data.");
+		rte_errno = ENOMEM;
+		return -ENOMEM;
+	}
+	MLX5_SET(modify_rqt_in, in, opcode, MLX5_CMD_OP_MODIFY_RQT);
+	MLX5_SET(modify_rqt_in, in, rqtn, rqt->id);
+	MLX5_SET64(modify_rqt_in, in, modify_bitmask, 0x1);
+	rqt_ctx = MLX5_ADDR_OF(modify_rqt_in, in, rqt_context);
+	MLX5_SET(rqtc, rqt_ctx, list_q_type, rqt_attr->rq_type);
+	MLX5_SET(rqtc, rqt_ctx, rqt_max_size, rqt_attr->rqt_max_size);
+	MLX5_SET(rqtc, rqt_ctx, rqt_actual_size, rqt_attr->rqt_actual_size);
+	for (i = 0; i < rqt_attr->rqt_actual_size; i++)
+		MLX5_SET(rqtc, rqt_ctx, rq_num[i], rqt_attr->rq_list[i]);
+	ret = mlx5_glue->devx_obj_modify(rqt->obj, in, inlen, out, sizeof(out));
+	rte_free(in);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to modify RQT using DevX.");
+		rte_errno = errno;
+		return -rte_errno;
+	}
+	return ret;
+}
+
+/**
  * Create SQ using DevX API.
  *
  * @param[in] ctx
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index 9ef3ce2..b99c54b 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -344,5 +344,7 @@ struct mlx5_devx_obj *mlx5_devx_cmd_create_qp(struct ibv_context *ctx,
 					      struct mlx5_devx_qp_attr *attr);
 int mlx5_devx_cmd_modify_qp_state(struct mlx5_devx_obj *qp,
 				  uint32_t qp_st_mod_op, uint32_t remote_qp_id);
+int mlx5_devx_cmd_modify_rqt(struct mlx5_devx_obj *rqt,
+			     struct mlx5_devx_rqt_attr *rqt_attr);
 
 #endif /* RTE_PMD_MLX5_DEVX_CMDS_H_ */
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index e326868..b48cd0a 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -747,6 +747,7 @@ enum {
 	MLX5_CMD_OP_CREATE_TIS = 0x912,
 	MLX5_CMD_OP_QUERY_TIS = 0x915,
 	MLX5_CMD_OP_CREATE_RQT = 0x916,
+	MLX5_CMD_OP_MODIFY_RQT = 0x917,
 	MLX5_CMD_OP_ALLOC_FLOW_COUNTER = 0x939,
 	MLX5_CMD_OP_QUERY_FLOW_COUNTER = 0x93b,
 	MLX5_CMD_OP_CREATE_GENERAL_OBJECT = 0xa00,
@@ -1774,10 +1775,30 @@ struct mlx5_ifc_create_rqt_in_bits {
 	u8 reserved_at_40[0xc0];
 	struct mlx5_ifc_rqtc_bits rqt_context;
 };
+
+struct mlx5_ifc_modify_rqt_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0x8];
+	u8 rqtn[0x18];
+	u8 reserved_at_60[0x20];
+	u8 modify_bitmask[0x40];
+	u8 reserved_at_c0[0x40];
+	struct mlx5_ifc_rqtc_bits rqt_context;
+};
 #ifdef PEDANTIC
 #pragma GCC diagnostic error "-Wpedantic"
 #endif
 
+struct mlx5_ifc_modify_rqt_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x40];
+};
+
 enum {
 	MLX5_SQC_STATE_RST  = 0x0,
 	MLX5_SQC_STATE_RDY  = 0x1,
diff --git a/drivers/common/mlx5/rte_common_mlx5_version.map b/drivers/common/mlx5/rte_common_mlx5_version.map
index df8e064..95ca54a 100644
--- a/drivers/common/mlx5/rte_common_mlx5_version.map
+++ b/drivers/common/mlx5/rte_common_mlx5_version.map
@@ -17,6 +17,7 @@ DPDK_20.02 {
 	mlx5_devx_cmd_mkey_create;
 	mlx5_devx_cmd_modify_qp_state;
 	mlx5_devx_cmd_modify_rq;
+	mlx5_devx_cmd_modify_rqt;
 	mlx5_devx_cmd_modify_sq;
 	mlx5_devx_cmd_modify_virtq;
 	mlx5_devx_cmd_qp_query_tis_td;
-- 
1.8.3.1
^ permalink raw reply	[flat|nested] 174+ messages in thread
* [dpdk-dev] [PATCH v4 20/25] common/mlx5: get DevX capability for max RQT size
  2020-01-29 12:38     ` [dpdk-dev] [PATCH v4 00/25] Introduce mlx5 common library Matan Azrad
                         ` (18 preceding siblings ...)
  2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 19/25] common/mlx5: add DevX command to modify RQT Matan Azrad
@ 2020-01-29 12:38       ` Matan Azrad
  2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 21/25] net/mlx5: select driver by class device argument Matan Azrad
                         ` (5 subsequent siblings)
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-29 12:38 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
To allow the RQT size to be configured within the maximum value
supported by the device, add log_max_rqt_size to the DevX capability
structure.
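A caller might use the new capability as sketched below. This assumes,
based on the field name only, that log_max_rqt_size holds the base-2
logarithm of the maximum RQT size; the helper sketch_rqt_size_clamp is
hypothetical and not part of the patch.

```c
#include <stdint.h>

/*
 * Sketch only: limit a requested RQT size to the device maximum,
 * assuming the maximum is 2^log_max_rqt_size. The capability is a
 * 5-bit field, so the shift amount is masked to stay below 32.
 */
static uint32_t
sketch_rqt_size_clamp(uint32_t requested, uint8_t log_max_rqt_size)
{
	uint32_t max = UINT32_C(1) << (log_max_rqt_size & 0x1f);

	return requested > max ? max : requested;
}
```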
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/common/mlx5/mlx5_devx_cmds.c | 2 ++
 drivers/common/mlx5/mlx5_devx_cmds.h | 1 +
 2 files changed, 3 insertions(+)
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index 1d3a729..b0803ac 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -436,6 +436,8 @@ struct mlx5_devx_obj *
 			MLX5_GET(cmd_hca_cap, hcattr, flow_counter_bulk_alloc);
 	attr->flow_counters_dump = MLX5_GET(cmd_hca_cap, hcattr,
 					    flow_counters_dump);
+	attr->log_max_rqt_size = MLX5_GET(cmd_hca_cap, hcattr,
+					  log_max_rqt_size);
 	attr->eswitch_manager = MLX5_GET(cmd_hca_cap, hcattr, eswitch_manager);
 	attr->hairpin = MLX5_GET(cmd_hca_cap, hcattr, hairpin);
 	attr->log_max_hairpin_queues = MLX5_GET(cmd_hca_cap, hcattr,
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index b99c54b..6912dc6 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -78,6 +78,7 @@ struct mlx5_hca_vdpa_attr {
 struct mlx5_hca_attr {
 	uint32_t eswitch_manager:1;
 	uint32_t flow_counters_dump:1;
+	uint32_t log_max_rqt_size:5;
 	uint8_t flow_counter_bulk_alloc_bitmap;
 	uint32_t eth_net_offloads:1;
 	uint32_t eth_virt:1;
-- 
1.8.3.1
^ permalink raw reply	[flat|nested] 174+ messages in thread
* [dpdk-dev] [PATCH v4 21/25] net/mlx5: select driver by class device argument
  2020-01-29 12:38     ` [dpdk-dev] [PATCH v4 00/25] Introduce mlx5 common library Matan Azrad
                         ` (19 preceding siblings ...)
  2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 20/25] common/mlx5: get DevX capability for max RQT size Matan Azrad
@ 2020-01-29 12:38       ` Matan Azrad
  2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 22/25] net/mlx5: separate Netlink command interface Matan Azrad
                         ` (4 subsequent siblings)
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-29 12:38 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
A single Mellanox device may be probed by multiple mlx5 drivers.
For example, any mlx5 vDPA device can be probed by both net/mlx5
and vdpa/mlx5.
Add a new mlx5 common API to get the requested driver from the
devargs: class=[net/vdpa].
Skip net/mlx5 PMD probing when the device is selected to be probed by
the vDPA driver.
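The selection logic can be illustrated without the kvargs library. The
sketch below scans a comma-separated argument string for a class= key
and defaults to the net class when the key is absent. All names
prefixed with sketch_ are hypothetical; the patch itself relies on
rte_kvargs for parsing, and unlike this sketch it processes every
occurrence of the key rather than stopping at the first.

```c
#include <string.h>

enum sketch_class {
	SKETCH_CLASS_NET,
	SKETCH_CLASS_VDPA,
	SKETCH_CLASS_INVALID,
};

/*
 * Sketch only: return the class requested in a "k=v,k=v" string.
 * A NULL string or a missing "class" key selects the net class.
 */
static enum sketch_class
sketch_class_get(const char *args)
{
	const char *p = args;

	if (args == NULL)
		return SKETCH_CLASS_NET;
	while (*p != '\0') {
		const char *end = strchr(p, ',');
		size_t len = end != NULL ? (size_t)(end - p) : strlen(p);

		if (len >= 6 && strncmp(p, "class=", 6) == 0) {
			const char *val = p + 6;
			size_t vlen = len - 6;

			if (vlen == 4 && strncmp(val, "vdpa", 4) == 0)
				return SKETCH_CLASS_VDPA;
			if (vlen == 3 && strncmp(val, "net", 3) == 0)
				return SKETCH_CLASS_NET;
			return SKETCH_CLASS_INVALID;
		}
		if (end == NULL)
			break;
		p = end + 1;
	}
	return SKETCH_CLASS_NET;
}
```

Defaulting to the net class keeps existing setups, which pass no class
argument, probing with net/mlx5 as before.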
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/common/mlx5/Makefile                    |  2 +-
 drivers/common/mlx5/meson.build                 |  2 +-
 drivers/common/mlx5/mlx5_common.c               | 36 +++++++++++++++++++++++++
 drivers/common/mlx5/mlx5_common.h               | 11 ++++++++
 drivers/common/mlx5/rte_common_mlx5_version.map |  2 ++
 drivers/net/mlx5/mlx5.c                         |  8 ++++++
 6 files changed, 59 insertions(+), 2 deletions(-)
diff --git a/drivers/common/mlx5/Makefile b/drivers/common/mlx5/Makefile
index d1de3ec..b9e9803 100644
--- a/drivers/common/mlx5/Makefile
+++ b/drivers/common/mlx5/Makefile
@@ -41,7 +41,7 @@ else
 LDLIBS += -libverbs -lmlx5
 endif
 
-LDLIBS += -lrte_eal -lrte_pci
+LDLIBS += -lrte_eal -lrte_pci -lrte_kvargs
 
 # A few warnings cannot be avoided in external headers.
 CFLAGS += -Wno-error=cast-qual -DNDEBUG -UPEDANTIC
diff --git a/drivers/common/mlx5/meson.build b/drivers/common/mlx5/meson.build
index 3e130cb..b88822e 100644
--- a/drivers/common/mlx5/meson.build
+++ b/drivers/common/mlx5/meson.build
@@ -37,7 +37,7 @@ endforeach
 
 if build
 	allow_experimental_apis = true
-	deps += ['hash', 'pci', 'net', 'eal']
+	deps += ['hash', 'pci', 'net', 'eal', 'kvargs']
 	ext_deps += libs
 	sources = files(
 		'mlx5_devx_cmds.c',
diff --git a/drivers/common/mlx5/mlx5_common.c b/drivers/common/mlx5/mlx5_common.c
index 2381208..52f191a 100644
--- a/drivers/common/mlx5/mlx5_common.c
+++ b/drivers/common/mlx5/mlx5_common.c
@@ -71,6 +71,42 @@
 	return 0;
 }
 
+static int
+mlx5_class_check_handler(__rte_unused const char *key, const char *value,
+			 void *opaque)
+{
+	enum mlx5_class *ret = opaque;
+
+	if (strcmp(value, "vdpa") == 0) {
+		*ret = MLX5_CLASS_VDPA;
+	} else if (strcmp(value, "net") == 0) {
+		*ret = MLX5_CLASS_NET;
+	} else {
+		DRV_LOG(ERR, "Invalid mlx5 class %s. Maybe typo in device"
+			" class argument setting?", value);
+		*ret = MLX5_CLASS_INVALID;
+	}
+	return 0;
+}
+
+enum mlx5_class
+mlx5_class_get(struct rte_devargs *devargs)
+{
+	struct rte_kvargs *kvlist;
+	const char *key = MLX5_CLASS_ARG_NAME;
+	enum mlx5_class ret = MLX5_CLASS_NET;
+
+	if (devargs == NULL)
+		return ret;
+	kvlist = rte_kvargs_parse(devargs->args, NULL);
+	if (kvlist == NULL)
+		return ret;
+	if (rte_kvargs_count(kvlist, key))
+		rte_kvargs_process(kvlist, key, mlx5_class_check_handler, &ret);
+	rte_kvargs_free(kvlist);
+	return ret;
+}
+
 #ifdef RTE_IBVERBS_LINK_DLOPEN
 
 /**
diff --git a/drivers/common/mlx5/mlx5_common.h b/drivers/common/mlx5/mlx5_common.h
index 9d464d4..2988f4b 100644
--- a/drivers/common/mlx5/mlx5_common.h
+++ b/drivers/common/mlx5/mlx5_common.h
@@ -11,6 +11,8 @@
 #include <rte_pci.h>
 #include <rte_atomic.h>
 #include <rte_log.h>
+#include <rte_kvargs.h>
+#include <rte_devargs.h>
 
 #include "mlx5_prm.h"
 
@@ -150,4 +152,13 @@ enum mlx5_cqe_status {
 
 int mlx5_dev_to_pci_addr(const char *dev_path, struct rte_pci_addr *pci_addr);
 
+#define MLX5_CLASS_ARG_NAME "class"
+
+enum mlx5_class {
+	MLX5_CLASS_NET,
+	MLX5_CLASS_VDPA,
+	MLX5_CLASS_INVALID,
+};
+enum mlx5_class mlx5_class_get(struct rte_devargs *devargs);
+
 #endif /* RTE_PMD_MLX5_COMMON_H_ */
diff --git a/drivers/common/mlx5/rte_common_mlx5_version.map b/drivers/common/mlx5/rte_common_mlx5_version.map
index 95ca54a..3e7038b 100644
--- a/drivers/common/mlx5/rte_common_mlx5_version.map
+++ b/drivers/common/mlx5/rte_common_mlx5_version.map
@@ -1,6 +1,8 @@
 DPDK_20.02 {
 	global:
 
+	mlx5_class_get;
+
 	mlx5_devx_cmd_create_cq;
 	mlx5_devx_cmd_create_qp;
 	mlx5_devx_cmd_create_rq;
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index d0fa2da..ca5bc14 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1565,6 +1565,8 @@ struct mlx5_flow_id_pool *
 		config->max_dump_files_num = tmp;
 	} else if (strcmp(MLX5_LRO_TIMEOUT_USEC, key) == 0) {
 		config->lro.timeout = tmp;
+	} else if (strcmp(MLX5_CLASS_ARG_NAME, key) == 0) {
+		DRV_LOG(DEBUG, "class argument is %s.", val);
 	} else {
 		DRV_LOG(WARNING, "%s: unknown parameter", key);
 		rte_errno = EINVAL;
@@ -1616,6 +1618,7 @@ struct mlx5_flow_id_pool *
 		MLX5_REPRESENTOR,
 		MLX5_MAX_DUMP_FILES_NUM,
 		MLX5_LRO_TIMEOUT_USEC,
+		MLX5_CLASS_ARG_NAME,
 		NULL,
 	};
 	struct rte_kvargs *kvlist;
@@ -2967,6 +2970,11 @@ struct mlx5_flow_id_pool *
 	struct mlx5_dev_config dev_config;
 	int ret;
 
+	if (mlx5_class_get(pci_dev->device.devargs) != MLX5_CLASS_NET) {
+		DRV_LOG(DEBUG, "Skip probing - should be probed by other mlx5"
+			" driver.");
+		return 1;
+	}
 	if (rte_eal_process_type() == RTE_PROC_PRIMARY)
 		mlx5_pmd_socket_init();
 	ret = mlx5_init_once();
-- 
1.8.3.1
^ permalink raw reply	[flat|nested] 174+ messages in thread
* [dpdk-dev] [PATCH v4 22/25] net/mlx5: separate Netlink command interface
  2020-01-29 12:38     ` [dpdk-dev] [PATCH v4 00/25] Introduce mlx5 common library Matan Azrad
                         ` (20 preceding siblings ...)
  2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 21/25] net/mlx5: select driver by class device argument Matan Azrad
@ 2020-01-29 12:38       ` Matan Azrad
  2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 23/25] net/mlx5: reduce Netlink commands dependencies Matan Azrad
                         ` (3 subsequent siblings)
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-29 12:38 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
The Netlink command interface is included in the mlx5.h file along
with many other PMD interfaces.
In preparation for sharing the Netlink commands between different
PMDs, this patch moves the Netlink interface to a new file called
mlx5_nl.h.
Move the pure VLAN commands, which are not Netlink commands, from
mlx5_nl.c to mlx5_vlan.c.
Rename Netlink commands and structures to use the mlx5_nl prefix.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.h      |  72 +++------------------
 drivers/net/mlx5/mlx5_nl.c   | 149 +++----------------------------------------
 drivers/net/mlx5/mlx5_nl.h   |  69 ++++++++++++++++++++
 drivers/net/mlx5/mlx5_vlan.c | 134 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 220 insertions(+), 204 deletions(-)
 create mode 100644 drivers/net/mlx5/mlx5_nl.h
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 3daf0db..01d0051 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -39,6 +39,7 @@
 #include "mlx5_defs.h"
 #include "mlx5_utils.h"
 #include "mlx5_mr.h"
+#include "mlx5_nl.h"
 #include "mlx5_autoconf.h"
 
 /* Request types for IPC. */
@@ -75,24 +76,6 @@ struct mlx5_mp_param {
 /** Key string for IPC. */
 #define MLX5_MP_NAME "net_mlx5_mp"
 
-/* Recognized Infiniband device physical port name types. */
-enum mlx5_phys_port_name_type {
-	MLX5_PHYS_PORT_NAME_TYPE_NOTSET = 0, /* Not set. */
-	MLX5_PHYS_PORT_NAME_TYPE_LEGACY, /* before kernel ver < 5.0 */
-	MLX5_PHYS_PORT_NAME_TYPE_UPLINK, /* p0, kernel ver >= 5.0 */
-	MLX5_PHYS_PORT_NAME_TYPE_PFVF, /* pf0vf0, kernel ver >= 5.0 */
-	MLX5_PHYS_PORT_NAME_TYPE_UNKNOWN, /* Unrecognized. */
-};
-
-/** Switch information returned by mlx5_nl_switch_info(). */
-struct mlx5_switch_info {
-	uint32_t master:1; /**< Master device. */
-	uint32_t representor:1; /**< Representor device. */
-	enum mlx5_phys_port_name_type name_type; /** < Port name type. */
-	int32_t pf_num; /**< PF number (valid for pfxvfx format only). */
-	int32_t port_name; /**< Representor port name. */
-	uint64_t switch_id; /**< Switch identifier. */
-};
 
 LIST_HEAD(mlx5_dev_list, mlx5_ibv_shared);
 
@@ -226,30 +209,12 @@ enum mlx5_verbs_alloc_type {
 	MLX5_VERBS_ALLOC_TYPE_RX_QUEUE,
 };
 
-/* VLAN netdev for VLAN workaround. */
-struct mlx5_vlan_dev {
-	uint32_t refcnt;
-	uint32_t ifindex; /**< Own interface index. */
-};
-
 /* Structure for VF VLAN workaround. */
 struct mlx5_vf_vlan {
 	uint32_t tag:12;
 	uint32_t created:1;
 };
 
-/*
- * Array of VLAN devices created on the base of VF
- * used for workaround in virtual environments.
- */
-struct mlx5_vlan_vmwa_context {
-	int nl_socket;
-	uint32_t nl_sn;
-	uint32_t vf_ifindex;
-	struct rte_eth_dev *dev;
-	struct mlx5_vlan_dev vlan_dev[4096];
-};
-
 /**
  * Verbs allocator needs a context to know in the callback which kind of
  * resources it is allocating.
@@ -576,7 +541,7 @@ struct mlx5_priv {
 	int nl_socket_route; /* Netlink socket (NETLINK_ROUTE). */
 	uint32_t nl_sn; /* Netlink message sequence number. */
 	LIST_HEAD(dbrpage, mlx5_devx_dbr_page) dbrpgs; /* Door-bell pages. */
-	struct mlx5_vlan_vmwa_context *vmwa_context; /* VLAN WA context. */
+	struct mlx5_nl_vlan_vmwa_context *vmwa_context; /* VLAN WA context. */
 	struct mlx5_flow_id_pool *qrss_id_pool;
 	struct mlx5_hlist *mreg_cp_tbl;
 	/* Hash table of Rx metadata register copy table. */
@@ -672,6 +637,8 @@ int mlx5_hairpin_cap_get(struct rte_eth_dev *dev,
 void mlx5_mac_addr_remove(struct rte_eth_dev *dev, uint32_t index);
 int mlx5_mac_addr_add(struct rte_eth_dev *dev, struct rte_ether_addr *mac,
 		      uint32_t index, uint32_t vmdq);
+struct mlx5_nl_vlan_vmwa_context *mlx5_vlan_vmwa_init
+				    (struct rte_eth_dev *dev, uint32_t ifindex);
 int mlx5_mac_addr_set(struct rte_eth_dev *dev, struct rte_ether_addr *mac_addr);
 int mlx5_set_mc_addr_list(struct rte_eth_dev *dev,
 			struct rte_ether_addr *mc_addr_set,
@@ -715,6 +682,11 @@ int mlx5_xstats_get_names(struct rte_eth_dev *dev __rte_unused,
 int mlx5_vlan_filter_set(struct rte_eth_dev *dev, uint16_t vlan_id, int on);
 void mlx5_vlan_strip_queue_set(struct rte_eth_dev *dev, uint16_t queue, int on);
 int mlx5_vlan_offload_set(struct rte_eth_dev *dev, int mask);
+void mlx5_vlan_vmwa_exit(struct mlx5_nl_vlan_vmwa_context *ctx);
+void mlx5_vlan_vmwa_release(struct rte_eth_dev *dev,
+			    struct mlx5_vf_vlan *vf_vlan);
+void mlx5_vlan_vmwa_acquire(struct rte_eth_dev *dev,
+			    struct mlx5_vf_vlan *vf_vlan);
 
 /* mlx5_trigger.c */
 
@@ -796,32 +768,6 @@ int mlx5_mp_req_queue_state_modify(struct rte_eth_dev *dev,
 int mlx5_pmd_socket_init(void);
 void mlx5_pmd_socket_uninit(void);
 
-/* mlx5_nl.c */
-
-int mlx5_nl_init(int protocol);
-int mlx5_nl_mac_addr_add(struct rte_eth_dev *dev, struct rte_ether_addr *mac,
-			 uint32_t index);
-int mlx5_nl_mac_addr_remove(struct rte_eth_dev *dev, struct rte_ether_addr *mac,
-			    uint32_t index);
-void mlx5_nl_mac_addr_sync(struct rte_eth_dev *dev);
-void mlx5_nl_mac_addr_flush(struct rte_eth_dev *dev);
-int mlx5_nl_promisc(struct rte_eth_dev *dev, int enable);
-int mlx5_nl_allmulti(struct rte_eth_dev *dev, int enable);
-unsigned int mlx5_nl_portnum(int nl, const char *name);
-unsigned int mlx5_nl_ifindex(int nl, const char *name, uint32_t pindex);
-int mlx5_nl_vf_mac_addr_modify(struct rte_eth_dev *dev,
-			       struct rte_ether_addr *mac, int vf_index);
-int mlx5_nl_switch_info(int nl, unsigned int ifindex,
-			struct mlx5_switch_info *info);
-
-struct mlx5_vlan_vmwa_context *mlx5_vlan_vmwa_init(struct rte_eth_dev *dev,
-						   uint32_t ifindex);
-void mlx5_vlan_vmwa_exit(struct mlx5_vlan_vmwa_context *ctx);
-void mlx5_vlan_vmwa_release(struct rte_eth_dev *dev,
-			    struct mlx5_vf_vlan *vf_vlan);
-void mlx5_vlan_vmwa_acquire(struct rte_eth_dev *dev,
-			    struct mlx5_vf_vlan *vf_vlan);
-
 /* mlx5_flow_meter.c */
 
 int mlx5_flow_meter_ops_get(struct rte_eth_dev *dev, void *arg);
diff --git a/drivers/net/mlx5/mlx5_nl.c b/drivers/net/mlx5/mlx5_nl.c
index e7ba034..3fe4b6f 100644
--- a/drivers/net/mlx5/mlx5_nl.c
+++ b/drivers/net/mlx5/mlx5_nl.c
@@ -5,7 +5,6 @@
 
 #include <errno.h>
 #include <linux/if_link.h>
-#include <linux/netlink.h>
 #include <linux/rtnetlink.h>
 #include <net/if.h>
 #include <rdma/rdma_netlink.h>
@@ -18,8 +17,6 @@
 #include <unistd.h>
 
 #include <rte_errno.h>
-#include <rte_malloc.h>
-#include <rte_hypervisor.h>
 
 #include "mlx5.h"
 #include "mlx5_utils.h"
@@ -1072,7 +1069,8 @@ struct mlx5_nl_ifindex_data {
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
-mlx5_nl_switch_info(int nl, unsigned int ifindex, struct mlx5_switch_info *info)
+mlx5_nl_switch_info(int nl, unsigned int ifindex,
+		    struct mlx5_switch_info *info)
 {
 	uint32_t seq = random();
 	struct {
@@ -1116,12 +1114,12 @@ struct mlx5_nl_ifindex_data {
  * Delete VLAN network device by ifindex.
  *
  * @param[in] tcf
- *   Context object initialized by mlx5_vlan_vmwa_init().
+ *   Context object initialized by mlx5_nl_vlan_vmwa_init().
  * @param[in] ifindex
  *   Interface index of network device to delete.
  */
-static void
-mlx5_vlan_vmwa_delete(struct mlx5_vlan_vmwa_context *vmwa,
+void
+mlx5_nl_vlan_vmwa_delete(struct mlx5_nl_vlan_vmwa_context *vmwa,
 		      uint32_t ifindex)
 {
 	int ret;
@@ -1196,14 +1194,14 @@ struct mlx5_nl_ifindex_data {
  * Create network VLAN device with specified VLAN tag.
  *
  * @param[in] tcf
- *   Context object initialized by mlx5_vlan_vmwa_init().
+ *   Context object initialized by mlx5_nl_vlan_vmwa_init().
  * @param[in] ifindex
  *   Base network interface index.
  * @param[in] tag
  *   VLAN tag for VLAN network device to create.
  */
-static uint32_t
-mlx5_vlan_vmwa_create(struct mlx5_vlan_vmwa_context *vmwa,
+uint32_t
+mlx5_nl_vlan_vmwa_create(struct mlx5_nl_vlan_vmwa_context *vmwa,
 		      uint32_t ifindex,
 		      uint16_t tag)
 {
@@ -1269,134 +1267,3 @@ struct mlx5_nl_ifindex_data {
 	}
 	return ret;
 }
-
-/*
- * Release VLAN network device, created for VM workaround.
- *
- * @param[in] dev
- *   Ethernet device object, Netlink context provider.
- * @param[in] vlan
- *   Object representing the network device to release.
- */
-void mlx5_vlan_vmwa_release(struct rte_eth_dev *dev,
-			    struct mlx5_vf_vlan *vlan)
-{
-	struct mlx5_priv *priv = dev->data->dev_private;
-	struct mlx5_vlan_vmwa_context *vmwa = priv->vmwa_context;
-	struct mlx5_vlan_dev *vlan_dev = &vmwa->vlan_dev[0];
-
-	assert(vlan->created);
-	assert(priv->vmwa_context);
-	if (!vlan->created || !vmwa)
-		return;
-	vlan->created = 0;
-	assert(vlan_dev[vlan->tag].refcnt);
-	if (--vlan_dev[vlan->tag].refcnt == 0 &&
-	    vlan_dev[vlan->tag].ifindex) {
-		mlx5_vlan_vmwa_delete(vmwa, vlan_dev[vlan->tag].ifindex);
-		vlan_dev[vlan->tag].ifindex = 0;
-	}
-}
-
-/**
- * Acquire VLAN interface with specified tag for VM workaround.
- *
- * @param[in] dev
- *   Ethernet device object, Netlink context provider.
- * @param[in] vlan
- *   Object representing the network device to acquire.
- */
-void mlx5_vlan_vmwa_acquire(struct rte_eth_dev *dev,
-			    struct mlx5_vf_vlan *vlan)
-{
-	struct mlx5_priv *priv = dev->data->dev_private;
-	struct mlx5_vlan_vmwa_context *vmwa = priv->vmwa_context;
-	struct mlx5_vlan_dev *vlan_dev = &vmwa->vlan_dev[0];
-
-	assert(!vlan->created);
-	assert(priv->vmwa_context);
-	if (vlan->created || !vmwa)
-		return;
-	if (vlan_dev[vlan->tag].refcnt == 0) {
-		assert(!vlan_dev[vlan->tag].ifindex);
-		vlan_dev[vlan->tag].ifindex =
-			mlx5_vlan_vmwa_create(vmwa,
-					      vmwa->vf_ifindex,
-					      vlan->tag);
-	}
-	if (vlan_dev[vlan->tag].ifindex) {
-		vlan_dev[vlan->tag].refcnt++;
-		vlan->created = 1;
-	}
-}
-
-/*
- * Create per ethernet device VLAN VM workaround context
- */
-struct mlx5_vlan_vmwa_context *
-mlx5_vlan_vmwa_init(struct rte_eth_dev *dev,
-		    uint32_t ifindex)
-{
-	struct mlx5_priv *priv = dev->data->dev_private;
-	struct mlx5_dev_config *config = &priv->config;
-	struct mlx5_vlan_vmwa_context *vmwa;
-	enum rte_hypervisor hv_type;
-
-	/* Do not engage workaround over PF. */
-	if (!config->vf)
-		return NULL;
-	/* Check whether there is desired virtual environment */
-	hv_type = rte_hypervisor_get();
-	switch (hv_type) {
-	case RTE_HYPERVISOR_UNKNOWN:
-	case RTE_HYPERVISOR_VMWARE:
-		/*
-		 * The "white list" of configurations
-		 * to engage the workaround.
-		 */
-		break;
-	default:
-		/*
-		 * The configuration is not found in the "white list".
-		 * We should not engage the VLAN workaround.
-		 */
-		return NULL;
-	}
-	vmwa = rte_zmalloc(__func__, sizeof(*vmwa), sizeof(uint32_t));
-	if (!vmwa) {
-		DRV_LOG(WARNING,
-			"Can not allocate memory"
-			" for VLAN workaround context");
-		return NULL;
-	}
-	vmwa->nl_socket = mlx5_nl_init(NETLINK_ROUTE);
-	if (vmwa->nl_socket < 0) {
-		DRV_LOG(WARNING,
-			"Can not create Netlink socket"
-			" for VLAN workaround context");
-		rte_free(vmwa);
-		return NULL;
-	}
-	vmwa->nl_sn = random();
-	vmwa->vf_ifindex = ifindex;
-	vmwa->dev = dev;
-	/* Cleanup for existing VLAN devices. */
-	return vmwa;
-}
-
-/*
- * Destroy per ethernet device VLAN VM workaround context
- */
-void mlx5_vlan_vmwa_exit(struct mlx5_vlan_vmwa_context *vmwa)
-{
-	unsigned int i;
-
-	/* Delete all remaining VLAN devices. */
-	for (i = 0; i < RTE_DIM(vmwa->vlan_dev); i++) {
-		if (vmwa->vlan_dev[i].ifindex)
-			mlx5_vlan_vmwa_delete(vmwa, vmwa->vlan_dev[i].ifindex);
-	}
-	if (vmwa->nl_socket >= 0)
-		close(vmwa->nl_socket);
-	rte_free(vmwa);
-}
diff --git a/drivers/net/mlx5/mlx5_nl.h b/drivers/net/mlx5/mlx5_nl.h
new file mode 100644
index 0000000..7903673
--- /dev/null
+++ b/drivers/net/mlx5/mlx5_nl.h
@@ -0,0 +1,69 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2019 Mellanox Technologies, Ltd
+ */
+
+#ifndef RTE_PMD_MLX5_NL_H_
+#define RTE_PMD_MLX5_NL_H_
+
+#include <linux/netlink.h>
+
+
+/* Recognized Infiniband device physical port name types. */
+enum mlx5_nl_phys_port_name_type {
+	MLX5_PHYS_PORT_NAME_TYPE_NOTSET = 0, /* Not set. */
+	MLX5_PHYS_PORT_NAME_TYPE_LEGACY, /* before kernel ver < 5.0 */
+	MLX5_PHYS_PORT_NAME_TYPE_UPLINK, /* p0, kernel ver >= 5.0 */
+	MLX5_PHYS_PORT_NAME_TYPE_PFVF, /* pf0vf0, kernel ver >= 5.0 */
+	MLX5_PHYS_PORT_NAME_TYPE_UNKNOWN, /* Unrecognized. */
+};
+
+/** Switch information returned by mlx5_nl_switch_info(). */
+struct mlx5_switch_info {
+	uint32_t master:1; /**< Master device. */
+	uint32_t representor:1; /**< Representor device. */
+	enum mlx5_nl_phys_port_name_type name_type; /** < Port name type. */
+	int32_t pf_num; /**< PF number (valid for pfxvfx format only). */
+	int32_t port_name; /**< Representor port name. */
+	uint64_t switch_id; /**< Switch identifier. */
+};
+
+/* VLAN netdev for VLAN workaround. */
+struct mlx5_nl_vlan_dev {
+	uint32_t refcnt;
+	uint32_t ifindex; /**< Own interface index. */
+};
+
+/*
+ * Array of VLAN devices created on the base of VF
+ * used for workaround in virtual environments.
+ */
+struct mlx5_nl_vlan_vmwa_context {
+	int nl_socket;
+	uint32_t nl_sn;
+	uint32_t vf_ifindex;
+	struct mlx5_nl_vlan_dev vlan_dev[4096];
+};
+
+
+int mlx5_nl_init(int protocol);
+int mlx5_nl_mac_addr_add(struct rte_eth_dev *dev, struct rte_ether_addr *mac,
+			 uint32_t index);
+int mlx5_nl_mac_addr_remove(struct rte_eth_dev *dev, struct rte_ether_addr *mac,
+			    uint32_t index);
+void mlx5_nl_mac_addr_sync(struct rte_eth_dev *dev);
+void mlx5_nl_mac_addr_flush(struct rte_eth_dev *dev);
+int mlx5_nl_promisc(struct rte_eth_dev *dev, int enable);
+int mlx5_nl_allmulti(struct rte_eth_dev *dev, int enable);
+unsigned int mlx5_nl_portnum(int nl, const char *name);
+unsigned int mlx5_nl_ifindex(int nl, const char *name, uint32_t pindex);
+int mlx5_nl_vf_mac_addr_modify(struct rte_eth_dev *dev,
+			       struct rte_ether_addr *mac, int vf_index);
+int mlx5_nl_switch_info(int nl, unsigned int ifindex,
+			struct mlx5_switch_info *info);
+
+void mlx5_nl_vlan_vmwa_delete(struct mlx5_nl_vlan_vmwa_context *vmwa,
+			   uint32_t ifindex);
+uint32_t mlx5_nl_vlan_vmwa_create(struct mlx5_nl_vlan_vmwa_context *vmwa,
+				  uint32_t ifindex, uint16_t tag);
+
+#endif /* RTE_PMD_MLX5_NL_H_ */
diff --git a/drivers/net/mlx5/mlx5_vlan.c b/drivers/net/mlx5/mlx5_vlan.c
index b0fa31a..fb52d8f 100644
--- a/drivers/net/mlx5/mlx5_vlan.c
+++ b/drivers/net/mlx5/mlx5_vlan.c
@@ -7,6 +7,8 @@
 #include <errno.h>
 #include <assert.h>
 #include <stdint.h>
+#include <unistd.h>
+
 
 /*
  * Not needed by this file; included to work around the lack of off_t
@@ -26,6 +28,8 @@
 
 #include <rte_ethdev_driver.h>
 #include <rte_common.h>
+#include <rte_malloc.h>
+#include <rte_hypervisor.h>
 
 #include <mlx5_glue.h>
 #include <mlx5_devx_cmds.h>
@@ -33,6 +37,7 @@
 #include "mlx5.h"
 #include "mlx5_autoconf.h"
 #include "mlx5_rxtx.h"
+#include "mlx5_nl.h"
 #include "mlx5_utils.h"
 
 /**
@@ -193,3 +198,132 @@
 	}
 	return 0;
 }
+
+/*
+ * Release VLAN network device, created for VM workaround.
+ *
+ * @param[in] dev
+ *   Ethernet device object, Netlink context provider.
+ * @param[in] vlan
+ *   Object representing the network device to release.
+ */
+void mlx5_vlan_vmwa_release(struct rte_eth_dev *dev,
+			    struct mlx5_vf_vlan *vlan)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_nl_vlan_vmwa_context *vmwa = priv->vmwa_context;
+	struct mlx5_nl_vlan_dev *vlan_dev = &vmwa->vlan_dev[0];
+
+	assert(vlan->created);
+	assert(priv->vmwa_context);
+	if (!vlan->created || !vmwa)
+		return;
+	vlan->created = 0;
+	assert(vlan_dev[vlan->tag].refcnt);
+	if (--vlan_dev[vlan->tag].refcnt == 0 &&
+	    vlan_dev[vlan->tag].ifindex) {
+		mlx5_nl_vlan_vmwa_delete(vmwa, vlan_dev[vlan->tag].ifindex);
+		vlan_dev[vlan->tag].ifindex = 0;
+	}
+}
+
+/**
+ * Acquire VLAN interface with specified tag for VM workaround.
+ *
+ * @param[in] dev
+ *   Ethernet device object, Netlink context provider.
+ * @param[in] vlan
+ *   Object representing the network device to acquire.
+ */
+void mlx5_vlan_vmwa_acquire(struct rte_eth_dev *dev,
+			    struct mlx5_vf_vlan *vlan)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_nl_vlan_vmwa_context *vmwa = priv->vmwa_context;
+	struct mlx5_nl_vlan_dev *vlan_dev = &vmwa->vlan_dev[0];
+
+	assert(!vlan->created);
+	assert(priv->vmwa_context);
+	if (vlan->created || !vmwa)
+		return;
+	if (vlan_dev[vlan->tag].refcnt == 0) {
+		assert(!vlan_dev[vlan->tag].ifindex);
+		vlan_dev[vlan->tag].ifindex =
+			mlx5_nl_vlan_vmwa_create(vmwa, vmwa->vf_ifindex,
+						 vlan->tag);
+	}
+	if (vlan_dev[vlan->tag].ifindex) {
+		vlan_dev[vlan->tag].refcnt++;
+		vlan->created = 1;
+	}
+}
+
+/*
+ * Create per ethernet device VLAN VM workaround context
+ */
+struct mlx5_nl_vlan_vmwa_context *
+mlx5_vlan_vmwa_init(struct rte_eth_dev *dev, uint32_t ifindex)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_dev_config *config = &priv->config;
+	struct mlx5_nl_vlan_vmwa_context *vmwa;
+	enum rte_hypervisor hv_type;
+
+	/* Do not engage workaround over PF. */
+	if (!config->vf)
+		return NULL;
+	/* Check whether there is desired virtual environment */
+	hv_type = rte_hypervisor_get();
+	switch (hv_type) {
+	case RTE_HYPERVISOR_UNKNOWN:
+	case RTE_HYPERVISOR_VMWARE:
+		/*
+		 * The "white list" of configurations
+		 * to engage the workaround.
+		 */
+		break;
+	default:
+		/*
+		 * The configuration is not found in the "white list".
+		 * We should not engage the VLAN workaround.
+		 */
+		return NULL;
+	}
+	vmwa = rte_zmalloc(__func__, sizeof(*vmwa), sizeof(uint32_t));
+	if (!vmwa) {
+		DRV_LOG(WARNING,
+			"Can not allocate memory"
+			" for VLAN workaround context");
+		return NULL;
+	}
+	vmwa->nl_socket = mlx5_nl_init(NETLINK_ROUTE);
+	if (vmwa->nl_socket < 0) {
+		DRV_LOG(WARNING,
+			"Can not create Netlink socket"
+			" for VLAN workaround context");
+		rte_free(vmwa);
+		return NULL;
+	}
+	vmwa->nl_sn = random();
+	vmwa->vf_ifindex = ifindex;
+	/* Cleanup for existing VLAN devices. */
+	return vmwa;
+}
+
+/*
+ * Destroy per ethernet device VLAN VM workaround context
+ */
+void mlx5_vlan_vmwa_exit(struct mlx5_nl_vlan_vmwa_context *vmwa)
+{
+	unsigned int i;
+
+	/* Delete all remaining VLAN devices. */
+	for (i = 0; i < RTE_DIM(vmwa->vlan_dev); i++) {
+		if (vmwa->vlan_dev[i].ifindex)
+			mlx5_nl_vlan_vmwa_delete(vmwa,
+						 vmwa->vlan_dev[i].ifindex);
+	}
+	if (vmwa->nl_socket >= 0)
+		close(vmwa->nl_socket);
+	rte_free(vmwa);
+}
-- 
1.8.3.1
^ permalink raw reply	[flat|nested] 174+ messages in thread
* [dpdk-dev] [PATCH v4 23/25] net/mlx5: reduce Netlink commands dependencies
  2020-01-29 12:38     ` [dpdk-dev] [PATCH v4 00/25] Introduce mlx5 common library Matan Azrad
                         ` (21 preceding siblings ...)
  2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 22/25] net/mlx5: separate Netlink command interface Matan Azrad
@ 2020-01-29 12:38       ` Matan Azrad
  2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 24/25] common/mlx5: share Netlink commands Matan Azrad
                         ` (2 subsequent siblings)
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-29 12:38 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
In preparation for moving the Netlink commands to the common library,
reduce their net/mlx5 dependencies.
Replace the ethdev class parameters of the Netlink commands with the
specific values they need (e.g. Netlink socket, interface index and
MAC address array).
Let the mlx5 Netlink code manage the Netlink sequence number itself.
Move mlx5_nl_check_switch_info to mlx5_nl.c since it is the only file
that uses it.
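One way to keep the sequence number inside the Netlink code, instead of
in per-device private data, is a module-level atomic counter. The sketch
below uses C11 atomics; the names sketch_nl_sn and sketch_nl_sn_next are
hypothetical and the patch may implement this differently.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Sketch only: counter owned by the Netlink module, not by a device. */
static _Atomic uint32_t sketch_nl_sn;

/*
 * Sketch only: return the next sequence number. The counter wraps
 * around on overflow, which is harmless for matching replies.
 */
static uint32_t
sketch_nl_sn_next(void)
{
	uint32_t prev = atomic_fetch_add_explicit(&sketch_nl_sn, 1,
						  memory_order_relaxed);

	return (uint32_t)(prev + 1);
}
```

With such a generator, callers no longer need to pass a device object
just to obtain a sequence number.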
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.c        |  10 +-
 drivers/net/mlx5/mlx5.h        |   3 -
 drivers/net/mlx5/mlx5_ethdev.c |  49 ------
 drivers/net/mlx5/mlx5_mac.c    |  14 +-
 drivers/net/mlx5/mlx5_nl.c     | 329 +++++++++++++++++++++++++----------------
 drivers/net/mlx5/mlx5_nl.h     |  23 +--
 drivers/net/mlx5/mlx5_rxmode.c |  12 +-
 drivers/net/mlx5/mlx5_vlan.c   |   1 -
 8 files changed, 236 insertions(+), 205 deletions(-)
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index ca5bc14..3b8de8f 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1270,7 +1270,9 @@ struct mlx5_flow_id_pool *
 	if (priv->reta_idx != NULL)
 		rte_free(priv->reta_idx);
 	if (priv->config.vf)
-		mlx5_nl_mac_addr_flush(dev);
+		mlx5_nl_mac_addr_flush(priv->nl_socket_route, mlx5_ifindex(dev),
+				       dev->data->mac_addrs,
+				       MLX5_MAX_MAC_ADDRESSES, priv->mac_own);
 	if (priv->nl_socket_route >= 0)
 		close(priv->nl_socket_route);
 	if (priv->nl_socket_rdma >= 0)
@@ -2330,7 +2332,6 @@ struct mlx5_flow_id_pool *
 	/* Some internal functions rely on Netlink sockets, open them now. */
 	priv->nl_socket_rdma = mlx5_nl_init(NETLINK_RDMA);
 	priv->nl_socket_route =	mlx5_nl_init(NETLINK_ROUTE);
-	priv->nl_sn = 0;
 	priv->representor = !!switch_info->representor;
 	priv->master = !!switch_info->master;
 	priv->domain_id = RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID;
@@ -2650,7 +2651,10 @@ struct mlx5_flow_id_pool *
 	/* Register MAC address. */
 	claim_zero(mlx5_mac_addr_add(eth_dev, &mac, 0, 0));
 	if (config.vf && config.vf_nl_en)
-		mlx5_nl_mac_addr_sync(eth_dev);
+		mlx5_nl_mac_addr_sync(priv->nl_socket_route,
+				      mlx5_ifindex(eth_dev),
+				      eth_dev->data->mac_addrs,
+				      MLX5_MAX_MAC_ADDRESSES);
 	TAILQ_INIT(&priv->flows);
 	TAILQ_INIT(&priv->ctrl_flows);
 	TAILQ_INIT(&priv->flow_meters);
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 01d0051..9864aa7 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -539,7 +539,6 @@ struct mlx5_priv {
 	/* Context for Verbs allocator. */
 	int nl_socket_rdma; /* Netlink socket (NETLINK_RDMA). */
 	int nl_socket_route; /* Netlink socket (NETLINK_ROUTE). */
-	uint32_t nl_sn; /* Netlink message sequence number. */
 	LIST_HEAD(dbrpage, mlx5_devx_dbr_page) dbrpgs; /* Door-bell pages. */
 	struct mlx5_nl_vlan_vmwa_context *vmwa_context; /* VLAN WA context. */
 	struct mlx5_flow_id_pool *qrss_id_pool;
@@ -617,8 +616,6 @@ int mlx5_sysfs_switch_info(unsigned int ifindex,
 			   struct mlx5_switch_info *info);
 void mlx5_sysfs_check_switch_info(bool device_dir,
 				  struct mlx5_switch_info *switch_info);
-void mlx5_nl_check_switch_info(bool nun_vf_set,
-			       struct mlx5_switch_info *switch_info);
 void mlx5_translate_port_name(const char *port_name_in,
 			      struct mlx5_switch_info *port_info_out);
 void mlx5_intr_callback_unregister(const struct rte_intr_handle *handle,
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 2628e64..5484104 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -1891,55 +1891,6 @@ struct mlx5_priv *
 }
 
 /**
- * Analyze gathered port parameters via Netlink to recognize master
- * and representor devices for E-Switch configuration.
- *
- * @param[in] num_vf_set
- *   flag of presence of number of VFs port attribute.
- * @param[inout] switch_info
- *   Port information, including port name as a number and port name
- *   type if recognized
- *
- * @return
- *   master and representor flags are set in switch_info according to
- *   recognized parameters (if any).
- */
-void
-mlx5_nl_check_switch_info(bool num_vf_set,
-			  struct mlx5_switch_info *switch_info)
-{
-	switch (switch_info->name_type) {
-	case MLX5_PHYS_PORT_NAME_TYPE_UNKNOWN:
-		/*
-		 * Name is not recognized, assume the master,
-		 * check the number of VFs key presence.
-		 */
-		switch_info->master = num_vf_set;
-		break;
-	case MLX5_PHYS_PORT_NAME_TYPE_NOTSET:
-		/*
-		 * Name is not set, this assumes the legacy naming
-		 * schema for master, just check if there is a
-		 * number of VFs key.
-		 */
-		switch_info->master = num_vf_set;
-		break;
-	case MLX5_PHYS_PORT_NAME_TYPE_UPLINK:
-		/* New uplink naming schema recognized. */
-		switch_info->master = 1;
-		break;
-	case MLX5_PHYS_PORT_NAME_TYPE_LEGACY:
-		/* Legacy representors naming schema. */
-		switch_info->representor = !num_vf_set;
-		break;
-	case MLX5_PHYS_PORT_NAME_TYPE_PFVF:
-		/* New representors naming schema. */
-		switch_info->representor = 1;
-		break;
-	}
-}
-
-/**
  * Analyze gathered port parameters via sysfs to recognize master
  * and representor devices for E-Switch configuration.
  *
diff --git a/drivers/net/mlx5/mlx5_mac.c b/drivers/net/mlx5/mlx5_mac.c
index a646b90..0ab2a0e 100644
--- a/drivers/net/mlx5/mlx5_mac.c
+++ b/drivers/net/mlx5/mlx5_mac.c
@@ -74,8 +74,9 @@
 	if (rte_is_zero_ether_addr(&dev->data->mac_addrs[index]))
 		return;
 	if (vf)
-		mlx5_nl_mac_addr_remove(dev, &dev->data->mac_addrs[index],
-					index);
+		mlx5_nl_mac_addr_remove(priv->nl_socket_route,
+					mlx5_ifindex(dev), priv->mac_own,
+					&dev->data->mac_addrs[index], index);
 	memset(&dev->data->mac_addrs[index], 0, sizeof(struct rte_ether_addr));
 }
 
@@ -117,7 +118,9 @@
 		return -rte_errno;
 	}
 	if (vf) {
-		int ret = mlx5_nl_mac_addr_add(dev, mac, index);
+		int ret = mlx5_nl_mac_addr_add(priv->nl_socket_route,
+					       mlx5_ifindex(dev), priv->mac_own,
+					       mac, index);
 
 		if (ret)
 			return ret;
@@ -209,8 +212,9 @@
 			if (priv->master == 1) {
 				priv = dev->data->dev_private;
 				return mlx5_nl_vf_mac_addr_modify
-					(&rte_eth_devices[port_id],
-					 mac_addr, priv->representor_id);
+				       (priv->nl_socket_route,
+					mlx5_ifindex(&rte_eth_devices[port_id]),
+					mac_addr, priv->representor_id);
 			}
 		}
 		rte_errno = -ENOTSUP;
diff --git a/drivers/net/mlx5/mlx5_nl.c b/drivers/net/mlx5/mlx5_nl.c
index 3fe4b6f..6b8ca00 100644
--- a/drivers/net/mlx5/mlx5_nl.c
+++ b/drivers/net/mlx5/mlx5_nl.c
@@ -17,8 +17,11 @@
 #include <unistd.h>
 
 #include <rte_errno.h>
+#include <rte_atomic.h>
+#include <rte_ether.h>
 
 #include "mlx5.h"
+#include "mlx5_nl.h"
 #include "mlx5_utils.h"
 
 /* Size of the buffer to receive kernel messages */
@@ -109,6 +112,11 @@ struct mlx5_nl_ifindex_data {
 	uint32_t portnum; /**< IB device max port number (out). */
 };
 
+rte_atomic32_t atomic_sn = RTE_ATOMIC32_INIT(0);
+
+/* Generate Netlink sequence number. */
+#define MLX5_NL_SN_GENERATE ((uint32_t)rte_atomic32_add_return(&atomic_sn, 1))
+
 /**
  * Opens a Netlink socket.
  *
@@ -369,8 +377,10 @@ struct mlx5_nl_ifindex_data {
 /**
  * Get bridge MAC addresses.
  *
- * @param dev
- *   Pointer to Ethernet device.
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
  * @param mac[out]
  *   Pointer to the array table of MAC addresses to fill.
  *   Its size should be of MLX5_MAX_MAC_ADDRESSES.
@@ -381,11 +391,9 @@ struct mlx5_nl_ifindex_data {
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-mlx5_nl_mac_addr_list(struct rte_eth_dev *dev, struct rte_ether_addr (*mac)[],
-		      int *mac_n)
+mlx5_nl_mac_addr_list(int nlsk_fd, unsigned int iface_idx,
+		      struct rte_ether_addr (*mac)[], int *mac_n)
 {
-	struct mlx5_priv *priv = dev->data->dev_private;
-	unsigned int iface_idx = mlx5_ifindex(dev);
 	struct {
 		struct nlmsghdr	hdr;
 		struct ifinfomsg ifm;
@@ -404,33 +412,33 @@ struct mlx5_nl_ifindex_data {
 		.mac = mac,
 		.mac_n = 0,
 	};
-	int fd;
+	uint32_t sn = MLX5_NL_SN_GENERATE;
 	int ret;
-	uint32_t sn = priv->nl_sn++;
 
-	if (priv->nl_socket_route == -1)
+	if (nlsk_fd == -1)
 		return 0;
-	fd = priv->nl_socket_route;
-	ret = mlx5_nl_request(fd, &req.hdr, sn, &req.ifm,
+	ret = mlx5_nl_request(nlsk_fd, &req.hdr, sn, &req.ifm,
 			      sizeof(struct ifinfomsg));
 	if (ret < 0)
 		goto error;
-	ret = mlx5_nl_recv(fd, sn, mlx5_nl_mac_addr_cb, &data);
+	ret = mlx5_nl_recv(nlsk_fd, sn, mlx5_nl_mac_addr_cb, &data);
 	if (ret < 0)
 		goto error;
 	*mac_n = data.mac_n;
 	return 0;
 error:
-	DRV_LOG(DEBUG, "port %u cannot retrieve MAC address list %s",
-		dev->data->port_id, strerror(rte_errno));
+	DRV_LOG(DEBUG, "Interface %u cannot retrieve MAC address list %s",
+		iface_idx, strerror(rte_errno));
 	return -rte_errno;
 }
 
 /**
  * Modify the MAC address neighbour table with Netlink.
  *
- * @param dev
- *   Pointer to Ethernet device.
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
  * @param mac
  *   MAC address to consider.
  * @param add
@@ -440,11 +448,9 @@ struct mlx5_nl_ifindex_data {
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-mlx5_nl_mac_addr_modify(struct rte_eth_dev *dev, struct rte_ether_addr *mac,
-			int add)
+mlx5_nl_mac_addr_modify(int nlsk_fd, unsigned int iface_idx,
+			struct rte_ether_addr *mac, int add)
 {
-	struct mlx5_priv *priv = dev->data->dev_private;
-	unsigned int iface_idx = mlx5_ifindex(dev);
 	struct {
 		struct nlmsghdr hdr;
 		struct ndmsg ndm;
@@ -468,28 +474,26 @@ struct mlx5_nl_ifindex_data {
 			.rta_len = RTA_LENGTH(RTE_ETHER_ADDR_LEN),
 		},
 	};
-	int fd;
+	uint32_t sn = MLX5_NL_SN_GENERATE;
 	int ret;
-	uint32_t sn = priv->nl_sn++;
 
-	if (priv->nl_socket_route == -1)
+	if (nlsk_fd == -1)
 		return 0;
-	fd = priv->nl_socket_route;
 	memcpy(RTA_DATA(&req.rta), mac, RTE_ETHER_ADDR_LEN);
 	req.hdr.nlmsg_len = NLMSG_ALIGN(req.hdr.nlmsg_len) +
 		RTA_ALIGN(req.rta.rta_len);
-	ret = mlx5_nl_send(fd, &req.hdr, sn);
+	ret = mlx5_nl_send(nlsk_fd, &req.hdr, sn);
 	if (ret < 0)
 		goto error;
-	ret = mlx5_nl_recv(fd, sn, NULL, NULL);
+	ret = mlx5_nl_recv(nlsk_fd, sn, NULL, NULL);
 	if (ret < 0)
 		goto error;
 	return 0;
 error:
 	DRV_LOG(DEBUG,
-		"port %u cannot %s MAC address %02X:%02X:%02X:%02X:%02X:%02X"
-		" %s",
-		dev->data->port_id,
+		"Interface %u cannot %s MAC address"
+		" %02X:%02X:%02X:%02X:%02X:%02X %s",
+		iface_idx,
 		add ? "add" : "remove",
 		mac->addr_bytes[0], mac->addr_bytes[1],
 		mac->addr_bytes[2], mac->addr_bytes[3],
@@ -501,8 +505,10 @@ struct mlx5_nl_ifindex_data {
 /**
  * Modify the VF MAC address neighbour table with Netlink.
  *
- * @param dev
- *    Pointer to Ethernet device.
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
  * @param mac
  *    MAC address to consider.
  * @param vf_index
@@ -512,12 +518,10 @@ struct mlx5_nl_ifindex_data {
  *    0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
-mlx5_nl_vf_mac_addr_modify(struct rte_eth_dev *dev,
+mlx5_nl_vf_mac_addr_modify(int nlsk_fd, unsigned int iface_idx,
 			   struct rte_ether_addr *mac, int vf_index)
 {
-	int fd, ret;
-	struct mlx5_priv *priv = dev->data->dev_private;
-	unsigned int iface_idx = mlx5_ifindex(dev);
+	int ret;
 	struct {
 		struct nlmsghdr hdr;
 		struct ifinfomsg ifm;
@@ -546,10 +550,10 @@ struct mlx5_nl_ifindex_data {
 			.rta_type = IFLA_VF_MAC,
 		},
 	};
-	uint32_t sn = priv->nl_sn++;
 	struct ifla_vf_mac ivm = {
 		.vf = vf_index,
 	};
+	uint32_t sn = MLX5_NL_SN_GENERATE;
 
 	memcpy(&ivm.mac, mac, RTE_ETHER_ADDR_LEN);
 	memcpy(RTA_DATA(&req.vf_mac_rta), &ivm, sizeof(ivm));
@@ -564,13 +568,12 @@ struct mlx5_nl_ifindex_data {
 	req.vf_info_rta.rta_len = RTE_PTR_DIFF(NLMSG_TAIL(&req.hdr),
 					       &req.vf_info_rta);
 
-	fd = priv->nl_socket_route;
-	if (fd < 0)
+	if (nlsk_fd < 0)
 		return -1;
-	ret = mlx5_nl_send(fd, &req.hdr, sn);
+	ret = mlx5_nl_send(nlsk_fd, &req.hdr, sn);
 	if (ret < 0)
 		goto error;
-	ret = mlx5_nl_recv(fd, sn, NULL, NULL);
+	ret = mlx5_nl_recv(nlsk_fd, sn, NULL, NULL);
 	if (ret < 0)
 		goto error;
 	return 0;
@@ -589,8 +592,12 @@ struct mlx5_nl_ifindex_data {
 /**
  * Add a MAC address.
  *
- * @param dev
- *   Pointer to Ethernet device.
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
+ * @param mac_own
+ *   BITFIELD_DECLARE array to store the mac.
  * @param mac
  *   MAC address to register.
  * @param index
@@ -600,15 +607,15 @@ struct mlx5_nl_ifindex_data {
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
-mlx5_nl_mac_addr_add(struct rte_eth_dev *dev, struct rte_ether_addr *mac,
+mlx5_nl_mac_addr_add(int nlsk_fd, unsigned int iface_idx,
+		     uint64_t *mac_own, struct rte_ether_addr *mac,
 		     uint32_t index)
 {
-	struct mlx5_priv *priv = dev->data->dev_private;
 	int ret;
 
-	ret = mlx5_nl_mac_addr_modify(dev, mac, 1);
+	ret = mlx5_nl_mac_addr_modify(nlsk_fd, iface_idx, mac, 1);
 	if (!ret)
-		BITFIELD_SET(priv->mac_own, index);
+		BITFIELD_SET(mac_own, index);
 	if (ret == -EEXIST)
 		return 0;
 	return ret;
@@ -617,8 +624,12 @@ struct mlx5_nl_ifindex_data {
 /**
  * Remove a MAC address.
  *
- * @param dev
- *   Pointer to Ethernet device.
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
+ * @param mac_own
+ *   BITFIELD_DECLARE array to store the mac.
  * @param mac
  *   MAC address to remove.
  * @param index
@@ -628,46 +639,50 @@ struct mlx5_nl_ifindex_data {
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
-mlx5_nl_mac_addr_remove(struct rte_eth_dev *dev, struct rte_ether_addr *mac,
-			uint32_t index)
+mlx5_nl_mac_addr_remove(int nlsk_fd, unsigned int iface_idx, uint64_t *mac_own,
+			struct rte_ether_addr *mac, uint32_t index)
 {
-	struct mlx5_priv *priv = dev->data->dev_private;
-
-	BITFIELD_RESET(priv->mac_own, index);
-	return mlx5_nl_mac_addr_modify(dev, mac, 0);
+	BITFIELD_RESET(mac_own, index);
+	return mlx5_nl_mac_addr_modify(nlsk_fd, iface_idx, mac, 0);
 }
 
 /**
  * Synchronize Netlink bridge table to the internal table.
  *
- * @param dev
- *   Pointer to Ethernet device.
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
+ * @param mac_addrs
+ *   Mac addresses array to sync.
+ * @param n
+ *   @p mac_addrs array size.
  */
 void
-mlx5_nl_mac_addr_sync(struct rte_eth_dev *dev)
+mlx5_nl_mac_addr_sync(int nlsk_fd, unsigned int iface_idx,
+		      struct rte_ether_addr *mac_addrs, int n)
 {
-	struct rte_ether_addr macs[MLX5_MAX_MAC_ADDRESSES];
+	struct rte_ether_addr macs[n];
 	int macs_n = 0;
 	int i;
 	int ret;
 
-	ret = mlx5_nl_mac_addr_list(dev, &macs, &macs_n);
+	ret = mlx5_nl_mac_addr_list(nlsk_fd, iface_idx, &macs, &macs_n);
 	if (ret)
 		return;
 	for (i = 0; i != macs_n; ++i) {
 		int j;
 
 		/* Verify the address is not in the array yet. */
-		for (j = 0; j != MLX5_MAX_MAC_ADDRESSES; ++j)
-			if (rte_is_same_ether_addr(&macs[i],
-					       &dev->data->mac_addrs[j]))
+		for (j = 0; j != n; ++j)
+			if (rte_is_same_ether_addr(&macs[i], &mac_addrs[j]))
 				break;
-		if (j != MLX5_MAX_MAC_ADDRESSES)
+		if (j != n)
 			continue;
 		/* Find the first entry available. */
-		for (j = 0; j != MLX5_MAX_MAC_ADDRESSES; ++j) {
-			if (rte_is_zero_ether_addr(&dev->data->mac_addrs[j])) {
-				dev->data->mac_addrs[j] = macs[i];
+		for (j = 0; j != n; ++j) {
+			if (rte_is_zero_ether_addr(&mac_addrs[j])) {
+				mac_addrs[j] = macs[i];
 				break;
 			}
 		}
@@ -677,28 +692,40 @@ struct mlx5_nl_ifindex_data {
 /**
  * Flush all added MAC addresses.
  *
- * @param dev
- *   Pointer to Ethernet device.
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
+ * @param[in] mac_addrs
+ *   Mac addresses array to flush.
+ * @param n
+ *   @p mac_addrs array size.
+ * @param mac_own
+ *   BITFIELD_DECLARE array to store the mac.
  */
 void
-mlx5_nl_mac_addr_flush(struct rte_eth_dev *dev)
+mlx5_nl_mac_addr_flush(int nlsk_fd, unsigned int iface_idx,
+		       struct rte_ether_addr *mac_addrs, int n,
+		       uint64_t *mac_own)
 {
-	struct mlx5_priv *priv = dev->data->dev_private;
 	int i;
 
-	for (i = MLX5_MAX_MAC_ADDRESSES - 1; i >= 0; --i) {
-		struct rte_ether_addr *m = &dev->data->mac_addrs[i];
+	for (i = n - 1; i >= 0; --i) {
+		struct rte_ether_addr *m = &mac_addrs[i];
 
-		if (BITFIELD_ISSET(priv->mac_own, i))
-			mlx5_nl_mac_addr_remove(dev, m, i);
+		if (BITFIELD_ISSET(mac_own, i))
+			mlx5_nl_mac_addr_remove(nlsk_fd, iface_idx, mac_own, m,
+						i);
 	}
 }
 
 /**
  * Enable promiscuous / all multicast mode through Netlink.
  *
- * @param dev
- *   Pointer to Ethernet device structure.
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
  * @param flags
  *   IFF_PROMISC for promiscuous, IFF_ALLMULTI for allmulti.
  * @param enable
@@ -708,10 +735,9 @@ struct mlx5_nl_ifindex_data {
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-mlx5_nl_device_flags(struct rte_eth_dev *dev, uint32_t flags, int enable)
+mlx5_nl_device_flags(int nlsk_fd, unsigned int iface_idx, uint32_t flags,
+		     int enable)
 {
-	struct mlx5_priv *priv = dev->data->dev_private;
-	unsigned int iface_idx = mlx5_ifindex(dev);
 	struct {
 		struct nlmsghdr hdr;
 		struct ifinfomsg ifi;
@@ -727,14 +753,13 @@ struct mlx5_nl_ifindex_data {
 			.ifi_index = iface_idx,
 		},
 	};
-	int fd;
+	uint32_t sn = MLX5_NL_SN_GENERATE;
 	int ret;
 
 	assert(!(flags & ~(IFF_PROMISC | IFF_ALLMULTI)));
-	if (priv->nl_socket_route < 0)
+	if (nlsk_fd < 0)
 		return 0;
-	fd = priv->nl_socket_route;
-	ret = mlx5_nl_send(fd, &req.hdr, priv->nl_sn++);
+	ret = mlx5_nl_send(nlsk_fd, &req.hdr, sn);
 	if (ret < 0)
 		return ret;
 	return 0;
@@ -743,8 +768,10 @@ struct mlx5_nl_ifindex_data {
 /**
  * Enable promiscuous mode through Netlink.
  *
- * @param dev
- *   Pointer to Ethernet device structure.
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
  * @param enable
  *   Nonzero to enable, disable otherwise.
  *
@@ -752,14 +779,14 @@ struct mlx5_nl_ifindex_data {
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
-mlx5_nl_promisc(struct rte_eth_dev *dev, int enable)
+mlx5_nl_promisc(int nlsk_fd, unsigned int iface_idx, int enable)
 {
-	int ret = mlx5_nl_device_flags(dev, IFF_PROMISC, enable);
+	int ret = mlx5_nl_device_flags(nlsk_fd, iface_idx, IFF_PROMISC, enable);
 
 	if (ret)
 		DRV_LOG(DEBUG,
-			"port %u cannot %s promisc mode: Netlink error %s",
-			dev->data->port_id, enable ? "enable" : "disable",
+			"Interface %u cannot %s promisc mode: Netlink error %s",
+			iface_idx, enable ? "enable" : "disable",
 			strerror(rte_errno));
 	return ret;
 }
@@ -767,8 +794,10 @@ struct mlx5_nl_ifindex_data {
 /**
  * Enable all multicast mode through Netlink.
  *
- * @param dev
- *   Pointer to Ethernet device structure.
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
  * @param enable
  *   Nonzero to enable, disable otherwise.
  *
@@ -776,14 +805,15 @@ struct mlx5_nl_ifindex_data {
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
-mlx5_nl_allmulti(struct rte_eth_dev *dev, int enable)
+mlx5_nl_allmulti(int nlsk_fd, unsigned int iface_idx, int enable)
 {
-	int ret = mlx5_nl_device_flags(dev, IFF_ALLMULTI, enable);
+	int ret = mlx5_nl_device_flags(nlsk_fd, iface_idx, IFF_ALLMULTI,
+				       enable);
 
 	if (ret)
 		DRV_LOG(DEBUG,
-			"port %u cannot %s allmulti mode: Netlink error %s",
-			dev->data->port_id, enable ? "enable" : "disable",
+			"Interface %u cannot %s allmulti : Netlink error %s",
+			iface_idx, enable ? "enable" : "disable",
 			strerror(rte_errno));
 	return ret;
 }
@@ -879,7 +909,6 @@ struct mlx5_nl_ifindex_data {
 unsigned int
 mlx5_nl_ifindex(int nl, const char *name, uint32_t pindex)
 {
-	uint32_t seq = random();
 	struct mlx5_nl_ifindex_data data = {
 		.name = name,
 		.flags = 0,
@@ -900,19 +929,20 @@ struct mlx5_nl_ifindex_data {
 		},
 	};
 	struct nlattr *na;
+	uint32_t sn = MLX5_NL_SN_GENERATE;
 	int ret;
 
-	ret = mlx5_nl_send(nl, &req.nh, seq);
+	ret = mlx5_nl_send(nl, &req.nh, sn);
 	if (ret < 0)
 		return 0;
-	ret = mlx5_nl_recv(nl, seq, mlx5_nl_cmdget_cb, &data);
+	ret = mlx5_nl_recv(nl, sn, mlx5_nl_cmdget_cb, &data);
 	if (ret < 0)
 		return 0;
 	if (!(data.flags & MLX5_NL_CMD_GET_IB_NAME) ||
 	    !(data.flags & MLX5_NL_CMD_GET_IB_INDEX))
 		goto error;
 	data.flags = 0;
-	++seq;
+	sn = MLX5_NL_SN_GENERATE;
 	req.nh.nlmsg_type = RDMA_NL_GET_TYPE(RDMA_NL_NLDEV,
 					     RDMA_NLDEV_CMD_PORT_GET);
 	req.nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
@@ -927,10 +957,10 @@ struct mlx5_nl_ifindex_data {
 	na->nla_type = RDMA_NLDEV_ATTR_PORT_INDEX;
 	memcpy((void *)((uintptr_t)na + NLA_HDRLEN),
 	       &pindex, sizeof(pindex));
-	ret = mlx5_nl_send(nl, &req.nh, seq);
+	ret = mlx5_nl_send(nl, &req.nh, sn);
 	if (ret < 0)
 		return 0;
-	ret = mlx5_nl_recv(nl, seq, mlx5_nl_cmdget_cb, &data);
+	ret = mlx5_nl_recv(nl, sn, mlx5_nl_cmdget_cb, &data);
 	if (ret < 0)
 		return 0;
 	if (!(data.flags & MLX5_NL_CMD_GET_IB_NAME) ||
@@ -959,7 +989,6 @@ struct mlx5_nl_ifindex_data {
 unsigned int
 mlx5_nl_portnum(int nl, const char *name)
 {
-	uint32_t seq = random();
 	struct mlx5_nl_ifindex_data data = {
 		.flags = 0,
 		.name = name,
@@ -972,12 +1001,13 @@ struct mlx5_nl_ifindex_data {
 					       RDMA_NLDEV_CMD_GET),
 		.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK | NLM_F_DUMP,
 	};
+	uint32_t sn = MLX5_NL_SN_GENERATE;
 	int ret;
 
-	ret = mlx5_nl_send(nl, &req, seq);
+	ret = mlx5_nl_send(nl, &req, sn);
 	if (ret < 0)
 		return 0;
-	ret = mlx5_nl_recv(nl, seq, mlx5_nl_cmdget_cb, &data);
+	ret = mlx5_nl_recv(nl, sn, mlx5_nl_cmdget_cb, &data);
 	if (ret < 0)
 		return 0;
 	if (!(data.flags & MLX5_NL_CMD_GET_IB_NAME) ||
@@ -992,6 +1022,55 @@ struct mlx5_nl_ifindex_data {
 }
 
 /**
+ * Analyze gathered port parameters via Netlink to recognize master
+ * and representor devices for E-Switch configuration.
+ *
+ * @param[in] num_vf_set
+ *   flag of presence of number of VFs port attribute.
+ * @param[inout] switch_info
+ *   Port information, including port name as a number and port name
+ *   type if recognized
+ *
+ * @return
+ *   master and representor flags are set in switch_info according to
+ *   recognized parameters (if any).
+ */
+static void
+mlx5_nl_check_switch_info(bool num_vf_set,
+			  struct mlx5_switch_info *switch_info)
+{
+	switch (switch_info->name_type) {
+	case MLX5_PHYS_PORT_NAME_TYPE_UNKNOWN:
+		/*
+		 * Name is not recognized, assume the master,
+		 * check the number of VFs key presence.
+		 */
+		switch_info->master = num_vf_set;
+		break;
+	case MLX5_PHYS_PORT_NAME_TYPE_NOTSET:
+		/*
+		 * Name is not set, this assumes the legacy naming
+		 * schema for master, just check if there is a
+		 * number of VFs key.
+		 */
+		switch_info->master = num_vf_set;
+		break;
+	case MLX5_PHYS_PORT_NAME_TYPE_UPLINK:
+		/* New uplink naming schema recognized. */
+		switch_info->master = 1;
+		break;
+	case MLX5_PHYS_PORT_NAME_TYPE_LEGACY:
+		/* Legacy representors naming schema. */
+		switch_info->representor = !num_vf_set;
+		break;
+	case MLX5_PHYS_PORT_NAME_TYPE_PFVF:
+		/* New representors naming schema. */
+		switch_info->representor = 1;
+		break;
+	}
+}
+
+/**
  * Process switch information from Netlink message.
  *
  * @param nh
@@ -1072,7 +1151,6 @@ struct mlx5_nl_ifindex_data {
 mlx5_nl_switch_info(int nl, unsigned int ifindex,
 		    struct mlx5_switch_info *info)
 {
-	uint32_t seq = random();
 	struct {
 		struct nlmsghdr nh;
 		struct ifinfomsg info;
@@ -1096,11 +1174,12 @@ struct mlx5_nl_ifindex_data {
 		},
 		.extmask = RTE_LE32(1),
 	};
+	uint32_t sn = MLX5_NL_SN_GENERATE;
 	int ret;
 
-	ret = mlx5_nl_send(nl, &req.nh, seq);
+	ret = mlx5_nl_send(nl, &req.nh, sn);
 	if (ret >= 0)
-		ret = mlx5_nl_recv(nl, seq, mlx5_nl_switch_info_cb, info);
+		ret = mlx5_nl_recv(nl, sn, mlx5_nl_switch_info_cb, info);
 	if (info->master && info->representor) {
 		DRV_LOG(ERR, "ifindex %u device is recognized as master"
 			     " and as representor", ifindex);
@@ -1122,6 +1201,7 @@ struct mlx5_nl_ifindex_data {
 mlx5_nl_vlan_vmwa_delete(struct mlx5_nl_vlan_vmwa_context *vmwa,
 		      uint32_t ifindex)
 {
+	uint32_t sn = MLX5_NL_SN_GENERATE;
 	int ret;
 	struct {
 		struct nlmsghdr nh;
@@ -1139,18 +1219,12 @@ struct mlx5_nl_ifindex_data {
 	};
 
 	if (ifindex) {
-		++vmwa->nl_sn;
-		if (!vmwa->nl_sn)
-			++vmwa->nl_sn;
-		ret = mlx5_nl_send(vmwa->nl_socket, &req.nh, vmwa->nl_sn);
+		ret = mlx5_nl_send(vmwa->nl_socket, &req.nh, sn);
 		if (ret >= 0)
-			ret = mlx5_nl_recv(vmwa->nl_socket,
-					   vmwa->nl_sn,
-					   NULL, NULL);
+			ret = mlx5_nl_recv(vmwa->nl_socket, sn, NULL, NULL);
 		if (ret < 0)
-			DRV_LOG(WARNING, "netlink: error deleting"
-					 " VLAN WA ifindex %u, %d",
-					 ifindex, ret);
+			DRV_LOG(WARNING, "netlink: error deleting VLAN WA"
+				" ifindex %u, %d", ifindex, ret);
 	}
 }
 
@@ -1202,8 +1276,7 @@ struct mlx5_nl_ifindex_data {
  */
 uint32_t
 mlx5_nl_vlan_vmwa_create(struct mlx5_nl_vlan_vmwa_context *vmwa,
-		      uint32_t ifindex,
-		      uint16_t tag)
+			 uint32_t ifindex, uint16_t tag)
 {
 	struct nlmsghdr *nlh;
 	struct ifinfomsg *ifm;
@@ -1220,12 +1293,10 @@ struct mlx5_nl_ifindex_data {
 		    NLMSG_ALIGN(sizeof(uint16_t)) + 16];
 	struct nlattr *na_info;
 	struct nlattr *na_vlan;
+	uint32_t sn = MLX5_NL_SN_GENERATE;
 	int ret;
 
 	memset(buf, 0, sizeof(buf));
-	++vmwa->nl_sn;
-	if (!vmwa->nl_sn)
-		++vmwa->nl_sn;
 	nlh = (struct nlmsghdr *)buf;
 	nlh->nlmsg_len = sizeof(struct nlmsghdr);
 	nlh->nlmsg_type = RTM_NEWLINK;
@@ -1249,20 +1320,18 @@ struct mlx5_nl_ifindex_data {
 	nl_attr_nest_end(nlh, na_vlan);
 	nl_attr_nest_end(nlh, na_info);
 	assert(sizeof(buf) >= nlh->nlmsg_len);
-	ret = mlx5_nl_send(vmwa->nl_socket, nlh, vmwa->nl_sn);
+	ret = mlx5_nl_send(vmwa->nl_socket, nlh, sn);
 	if (ret >= 0)
-		ret = mlx5_nl_recv(vmwa->nl_socket, vmwa->nl_sn, NULL, NULL);
+		ret = mlx5_nl_recv(vmwa->nl_socket, sn, NULL, NULL);
 	if (ret < 0) {
-		DRV_LOG(WARNING,
-			"netlink: VLAN %s create failure (%d)",
-			name, ret);
+		DRV_LOG(WARNING, "netlink: VLAN %s create failure (%d)", name,
+			ret);
 	}
 	/* Try to get ifindex of created or pre-existing device. */
 	ret = if_nametoindex(name);
 	if (!ret) {
-		DRV_LOG(WARNING,
-			"VLAN %s failed to get index (%d)",
-			name, errno);
+		DRV_LOG(WARNING, "VLAN %s failed to get index (%d)", name,
+			errno);
 		return 0;
 	}
 	return ret;
diff --git a/drivers/net/mlx5/mlx5_nl.h b/drivers/net/mlx5/mlx5_nl.h
index 7903673..9be87c0 100644
--- a/drivers/net/mlx5/mlx5_nl.h
+++ b/drivers/net/mlx5/mlx5_nl.h
@@ -39,30 +39,33 @@ struct mlx5_nl_vlan_dev {
  */
 struct mlx5_nl_vlan_vmwa_context {
 	int nl_socket;
-	uint32_t nl_sn;
 	uint32_t vf_ifindex;
 	struct mlx5_nl_vlan_dev vlan_dev[4096];
 };
 
 
 int mlx5_nl_init(int protocol);
-int mlx5_nl_mac_addr_add(struct rte_eth_dev *dev, struct rte_ether_addr *mac,
-			 uint32_t index);
-int mlx5_nl_mac_addr_remove(struct rte_eth_dev *dev, struct rte_ether_addr *mac,
+int mlx5_nl_mac_addr_add(int nlsk_fd, unsigned int iface_idx, uint64_t *mac_own,
+			 struct rte_ether_addr *mac, uint32_t index);
+int mlx5_nl_mac_addr_remove(int nlsk_fd, unsigned int iface_idx,
+			    uint64_t *mac_own, struct rte_ether_addr *mac,
 			    uint32_t index);
-void mlx5_nl_mac_addr_sync(struct rte_eth_dev *dev);
-void mlx5_nl_mac_addr_flush(struct rte_eth_dev *dev);
-int mlx5_nl_promisc(struct rte_eth_dev *dev, int enable);
-int mlx5_nl_allmulti(struct rte_eth_dev *dev, int enable);
+void mlx5_nl_mac_addr_sync(int nlsk_fd, unsigned int iface_idx,
+			   struct rte_ether_addr *mac_addrs, int n);
+void mlx5_nl_mac_addr_flush(int nlsk_fd, unsigned int iface_idx,
+			    struct rte_ether_addr *mac_addrs, int n,
+			    uint64_t *mac_own);
+int mlx5_nl_promisc(int nlsk_fd, unsigned int iface_idx, int enable);
+int mlx5_nl_allmulti(int nlsk_fd, unsigned int iface_idx, int enable);
 unsigned int mlx5_nl_portnum(int nl, const char *name);
 unsigned int mlx5_nl_ifindex(int nl, const char *name, uint32_t pindex);
-int mlx5_nl_vf_mac_addr_modify(struct rte_eth_dev *dev,
+int mlx5_nl_vf_mac_addr_modify(int nlsk_fd, unsigned int iface_idx,
 			       struct rte_ether_addr *mac, int vf_index);
 int mlx5_nl_switch_info(int nl, unsigned int ifindex,
 			struct mlx5_switch_info *info);
 
 void mlx5_nl_vlan_vmwa_delete(struct mlx5_nl_vlan_vmwa_context *vmwa,
-			   uint32_t ifindex);
+			      uint32_t ifindex);
 uint32_t mlx5_nl_vlan_vmwa_create(struct mlx5_nl_vlan_vmwa_context *vmwa,
 				  uint32_t ifindex, uint16_t tag);
 
diff --git a/drivers/net/mlx5/mlx5_rxmode.c b/drivers/net/mlx5/mlx5_rxmode.c
index 760cc2f..84c8b05 100644
--- a/drivers/net/mlx5/mlx5_rxmode.c
+++ b/drivers/net/mlx5/mlx5_rxmode.c
@@ -47,7 +47,8 @@
 		return 0;
 	}
 	if (priv->config.vf) {
-		ret = mlx5_nl_promisc(dev, 1);
+		ret = mlx5_nl_promisc(priv->nl_socket_route, mlx5_ifindex(dev),
+				      1);
 		if (ret)
 			return ret;
 	}
@@ -80,7 +81,8 @@
 
 	dev->data->promiscuous = 0;
 	if (priv->config.vf) {
-		ret = mlx5_nl_promisc(dev, 0);
+		ret = mlx5_nl_promisc(priv->nl_socket_route, mlx5_ifindex(dev),
+				      0);
 		if (ret)
 			return ret;
 	}
@@ -120,7 +122,8 @@
 		return 0;
 	}
 	if (priv->config.vf) {
-		ret = mlx5_nl_allmulti(dev, 1);
+		ret = mlx5_nl_allmulti(priv->nl_socket_route, mlx5_ifindex(dev),
+				       1);
 		if (ret)
 			goto error;
 	}
@@ -153,7 +156,8 @@
 
 	dev->data->all_multicast = 0;
 	if (priv->config.vf) {
-		ret = mlx5_nl_allmulti(dev, 0);
+		ret = mlx5_nl_allmulti(priv->nl_socket_route, mlx5_ifindex(dev),
+				       0);
 		if (ret)
 			goto error;
 	}
diff --git a/drivers/net/mlx5/mlx5_vlan.c b/drivers/net/mlx5/mlx5_vlan.c
index fb52d8f..fc1a91c 100644
--- a/drivers/net/mlx5/mlx5_vlan.c
+++ b/drivers/net/mlx5/mlx5_vlan.c
@@ -304,7 +304,6 @@ struct mlx5_nl_vlan_vmwa_context *
 		rte_free(vmwa);
 		return NULL;
 	}
-	vmwa->nl_sn = random();
 	vmwa->vf_ifindex = ifindex;
 	/* Cleanup for existing VLAN devices. */
 	return vmwa;
-- 
1.8.3.1
* [dpdk-dev] [PATCH v4 24/25] common/mlx5: share Netlink commands
  2020-01-29 12:38     ` [dpdk-dev] [PATCH v4 00/25] Introduce mlx5 common library Matan Azrad
                         ` (22 preceding siblings ...)
  2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 23/25] net/mlx5: reduce Netlink commands dependencies Matan Azrad
@ 2020-01-29 12:38       ` Matan Azrad
  2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 25/25] common/mlx5: support ROCE disable through Netlink Matan Azrad
  2020-01-30 12:26       ` [dpdk-dev] [PATCH v4 00/25] Introduce mlx5 common library Raslan Darawsheh
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-29 12:38 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
Move the Netlink mechanism and its dependencies from net/mlx5 to
common/mlx5 so that other mlx5 drivers can use it.
The dependencies are the BITFIELD defines, the ppc64 compilation
workaround for the bool type and the function mlx5_translate_port_name.
Update the build files accordingly.
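As a rough illustration of why the BITFIELD defines travel with the
Netlink code: the MAC helpers take the ownership bitmap as a plain
uint64_t pointer and set, reset or test one bit per MAC table index.
The sketch below shows that idea with hypothetical helper names
(own_set, own_reset, own_isset); it is not the driver's BITFIELD_*
macro implementation.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Number of bits held by one word of the ownership bitmap. */
#define OWN_WORD_BITS (sizeof(uint64_t) * CHAR_BIT)

/* Mark table entry idx as owned. */
static inline void
own_set(uint64_t *own, unsigned int idx)
{
	own[idx / OWN_WORD_BITS] |= UINT64_C(1) << (idx % OWN_WORD_BITS);
}

/* Clear the ownership mark of table entry idx. */
static inline void
own_reset(uint64_t *own, unsigned int idx)
{
	own[idx / OWN_WORD_BITS] &= ~(UINT64_C(1) << (idx % OWN_WORD_BITS));
}

/* Tell whether table entry idx is marked as owned. */
static inline bool
own_isset(const uint64_t *own, unsigned int idx)
{
	return (own[idx / OWN_WORD_BITS] >> (idx % OWN_WORD_BITS)) & 1;
}
```

The caller is assumed to size the array so that every valid index has
a bit, for example two 64-bit words for up to 128 entries.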
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/common/mlx5/Makefile                    |    3 +-
 drivers/common/mlx5/meson.build                 |    1 +
 drivers/common/mlx5/mlx5_common.c               |   55 +
 drivers/common/mlx5/mlx5_common.h               |   59 +
 drivers/common/mlx5/mlx5_nl.c                   | 1337 ++++++++++++++++++++++
 drivers/common/mlx5/mlx5_nl.h                   |   57 +
 drivers/common/mlx5/rte_common_mlx5_version.map |   16 +
 drivers/net/mlx5/Makefile                       |    1 -
 drivers/net/mlx5/meson.build                    |    1 -
 drivers/net/mlx5/mlx5.h                         |    2 +-
 drivers/net/mlx5/mlx5_defs.h                    |    8 -
 drivers/net/mlx5/mlx5_ethdev.c                  |   55 -
 drivers/net/mlx5/mlx5_nl.c                      | 1338 -----------------------
 drivers/net/mlx5/mlx5_nl.h                      |   72 --
 drivers/net/mlx5/mlx5_vlan.c                    |    2 +-
 15 files changed, 1529 insertions(+), 1478 deletions(-)
 create mode 100644 drivers/common/mlx5/mlx5_nl.c
 create mode 100644 drivers/common/mlx5/mlx5_nl.h
 delete mode 100644 drivers/net/mlx5/mlx5_nl.c
 delete mode 100644 drivers/net/mlx5/mlx5_nl.h
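Note for readers: a minimal sketch of how the BITFIELD macros moved by
this patch can track which MAC table entries are owned. The three macros
are copied from the mlx5_common.h hunk below; the sketch_* names, the
table struct and its size are hypothetical. The macros rely on
sizeof(bf), so the sketch applies them to the array member itself rather
than to a pointer to it.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Macros as they appear in the mlx5_common.h hunk of this patch. */
#define BITFIELD_DECLARE(bf, type, size) \
	type bf[(((size_t)(size) / (sizeof(type) * CHAR_BIT)) + \
		 !!((size_t)(size) % (sizeof(type) * CHAR_BIT)))]
#define BITFIELD_SET(bf, b) \
	(assert((size_t)(b) < (sizeof(bf) * CHAR_BIT)), \
	 (void)((bf)[((b) / (sizeof((bf)[0]) * CHAR_BIT))] |= \
		((size_t)1 << ((b) % (sizeof((bf)[0]) * CHAR_BIT)))))
#define BITFIELD_RESET(bf, b) \
	(assert((size_t)(b) < (sizeof(bf) * CHAR_BIT)), \
	 (void)((bf)[((b) / (sizeof((bf)[0]) * CHAR_BIT))] &= \
		~((size_t)1 << ((b) % (sizeof((bf)[0]) * CHAR_BIT)))))
#define BITFIELD_ISSET(bf, b) \
	(assert((size_t)(b) < (sizeof(bf) * CHAR_BIT)), \
	 !!(((bf)[((b) / (sizeof((bf)[0]) * CHAR_BIT))] & \
	     ((size_t)1 << ((b) % (sizeof((bf)[0]) * CHAR_BIT))))))

/* Hypothetical table size for this sketch (256 entries, 4 words). */
#define SKETCH_MAX_MACS 256

/* Hypothetical container: one ownership bit per MAC table entry. */
struct sketch_mac_table {
	BITFIELD_DECLARE(mac_own, uint64_t, SKETCH_MAX_MACS);
};

/* Mark entry idx as owned. */
static void
sketch_mark_owned(struct sketch_mac_table *t, unsigned int idx)
{
	BITFIELD_SET(t->mac_own, idx);
}

/* Clear the ownership bit of entry idx. */
static void
sketch_release(struct sketch_mac_table *t, unsigned int idx)
{
	BITFIELD_RESET(t->mac_own, idx);
}

/* Return 1 if entry idx is owned, 0 otherwise. */
static int
sketch_is_owned(struct sketch_mac_table *t, unsigned int idx)
{
	return BITFIELD_ISSET(t->mac_own, idx);
}

/* Count owned entries by testing every bit. */
static unsigned int
sketch_count_owned(struct sketch_mac_table *t)
{
	unsigned int i;
	unsigned int n = 0;

	for (i = 0; i != SKETCH_MAX_MACS; ++i)
		n += BITFIELD_ISSET(t->mac_own, i);
	return n;
}
```

With a zero-initialized table, marking entries 0 and 65 and then releasing
65 should leave exactly one owned entry. Entry 65 lives in the second
64-bit word, which shows the word/bit split done by the macros.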
diff --git a/drivers/common/mlx5/Makefile b/drivers/common/mlx5/Makefile
index b9e9803..6a14b7d 100644
--- a/drivers/common/mlx5/Makefile
+++ b/drivers/common/mlx5/Makefile
@@ -15,6 +15,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_glue.c
 endif
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_devx_cmds.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_common.c
+SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_nl.c
 
 ifeq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
 INSTALL-$(CONFIG_RTE_LIBRTE_MLX5_PMD)-lib += $(LIB_GLUE)
@@ -41,7 +42,7 @@ else
 LDLIBS += -libverbs -lmlx5
 endif
 
-LDLIBS += -lrte_eal -lrte_pci -lrte_kvargs
+LDLIBS += -lrte_eal -lrte_pci -lrte_kvargs -lrte_net
 
 # A few warnings cannot be avoided in external headers.
 CFLAGS += -Wno-error=cast-qual -DNDEBUG -UPEDANTIC
diff --git a/drivers/common/mlx5/meson.build b/drivers/common/mlx5/meson.build
index b88822e..34cb7b9 100644
--- a/drivers/common/mlx5/meson.build
+++ b/drivers/common/mlx5/meson.build
@@ -42,6 +42,7 @@ if build
 	sources = files(
 		'mlx5_devx_cmds.c',
 		'mlx5_common.c',
+		'mlx5_nl.c',
 	)
 	if not pmd_dlopen
 		sources += files('mlx5_glue.c')
diff --git a/drivers/common/mlx5/mlx5_common.c b/drivers/common/mlx5/mlx5_common.c
index 52f191a..922794e 100644
--- a/drivers/common/mlx5/mlx5_common.c
+++ b/drivers/common/mlx5/mlx5_common.c
@@ -107,6 +107,61 @@ enum mlx5_class
 	return ret;
 }
 
+/**
+ * Extract port name, as a number, from sysfs or netlink information.
+ *
+ * @param[in] port_name_in
+ *   String representing the port name.
+ * @param[out] port_info_out
+ *   Port information, including port name as a number and port name
+ *   type if recognized.
+ *
+ * @return
+ *   port_name field set according to recognized name format.
+ */
+void
+mlx5_translate_port_name(const char *port_name_in,
+			 struct mlx5_switch_info *port_info_out)
+{
+	char pf_c1, pf_c2, vf_c1, vf_c2;
+	char *end;
+	int sc_items;
+
+	/*
+	 * Check for port-name as a string of the form pf0vf0
+	 * (support kernel ver >= 5.0 or OFED ver >= 4.6).
+	 */
+	sc_items = sscanf(port_name_in, "%c%c%d%c%c%d",
+			  &pf_c1, &pf_c2, &port_info_out->pf_num,
+			  &vf_c1, &vf_c2, &port_info_out->port_name);
+	if (sc_items == 6 &&
+	    pf_c1 == 'p' && pf_c2 == 'f' &&
+	    vf_c1 == 'v' && vf_c2 == 'f') {
+		port_info_out->name_type = MLX5_PHYS_PORT_NAME_TYPE_PFVF;
+		return;
+	}
+	/*
+	 * Check for port-name as a string of the form p0
+	 * (support kernel ver >= 5.0, or OFED ver >= 4.6).
+	 */
+	sc_items = sscanf(port_name_in, "%c%d",
+			  &pf_c1, &port_info_out->port_name);
+	if (sc_items == 2 && pf_c1 == 'p') {
+		port_info_out->name_type = MLX5_PHYS_PORT_NAME_TYPE_UPLINK;
+		return;
+	}
+	/* Check for port-name as a number (support kernel ver < 5.0). */
+	errno = 0;
+	port_info_out->port_name = strtol(port_name_in, &end, 0);
+	if (!errno &&
+	    (size_t)(end - port_name_in) == strlen(port_name_in)) {
+		port_info_out->name_type = MLX5_PHYS_PORT_NAME_TYPE_LEGACY;
+		return;
+	}
+	port_info_out->name_type = MLX5_PHYS_PORT_NAME_TYPE_UNKNOWN;
+	return;
+}
+
 #ifdef RTE_IBVERBS_LINK_DLOPEN
 
 /**
diff --git a/drivers/common/mlx5/mlx5_common.h b/drivers/common/mlx5/mlx5_common.h
index 2988f4b..d9c2d26 100644
--- a/drivers/common/mlx5/mlx5_common.h
+++ b/drivers/common/mlx5/mlx5_common.h
@@ -18,6 +18,35 @@
 
 
 /*
+ * Compilation workaround for PPC64 when AltiVec is fully enabled, e.g. std=c11.
+ * Otherwise there would be a type conflict between stdbool and altivec.
+ */
+#if defined(__PPC64__) && !defined(__APPLE_ALTIVEC__)
+#undef bool
+/* redefine as in stdbool.h */
+#define bool _Bool
+#endif
+
+/* Bit-field manipulation. */
+#define BITFIELD_DECLARE(bf, type, size) \
+	type bf[(((size_t)(size) / (sizeof(type) * CHAR_BIT)) + \
+		 !!((size_t)(size) % (sizeof(type) * CHAR_BIT)))]
+#define BITFIELD_DEFINE(bf, type, size) \
+	BITFIELD_DECLARE((bf), type, (size)) = { 0 }
+#define BITFIELD_SET(bf, b) \
+	(assert((size_t)(b) < (sizeof(bf) * CHAR_BIT)), \
+	 (void)((bf)[((b) / (sizeof((bf)[0]) * CHAR_BIT))] |= \
+		((size_t)1 << ((b) % (sizeof((bf)[0]) * CHAR_BIT)))))
+#define BITFIELD_RESET(bf, b) \
+	(assert((size_t)(b) < (sizeof(bf) * CHAR_BIT)), \
+	 (void)((bf)[((b) / (sizeof((bf)[0]) * CHAR_BIT))] &= \
+		~((size_t)1 << ((b) % (sizeof((bf)[0]) * CHAR_BIT)))))
+#define BITFIELD_ISSET(bf, b) \
+	(assert((size_t)(b) < (sizeof(bf) * CHAR_BIT)), \
+	 !!(((bf)[((b) / (sizeof((bf)[0]) * CHAR_BIT))] & \
+	     ((size_t)1 << ((b) % (sizeof((bf)[0]) * CHAR_BIT))))))
+
+/*
  * Helper macros to work around __VA_ARGS__ limitations in a C99 compliant
  * manner.
  */
@@ -112,6 +141,33 @@ enum {
 	PCI_DEVICE_ID_MELLANOX_CONNECTX6DXVF = 0x101e,
 };
 
+/* Maximum number of simultaneous unicast MAC addresses. */
+#define MLX5_MAX_UC_MAC_ADDRESSES 128
+/* Maximum number of simultaneous multicast MAC addresses. */
+#define MLX5_MAX_MC_MAC_ADDRESSES 128
+/* Maximum number of simultaneous MAC addresses. */
+#define MLX5_MAX_MAC_ADDRESSES \
+	(MLX5_MAX_UC_MAC_ADDRESSES + MLX5_MAX_MC_MAC_ADDRESSES)
+
+/* Recognized Infiniband device physical port name types. */
+enum mlx5_nl_phys_port_name_type {
+	MLX5_PHYS_PORT_NAME_TYPE_NOTSET = 0, /* Not set. */
+	MLX5_PHYS_PORT_NAME_TYPE_LEGACY, /* kernel ver < 5.0 */
+	MLX5_PHYS_PORT_NAME_TYPE_UPLINK, /* p0, kernel ver >= 5.0 */
+	MLX5_PHYS_PORT_NAME_TYPE_PFVF, /* pf0vf0, kernel ver >= 5.0 */
+	MLX5_PHYS_PORT_NAME_TYPE_UNKNOWN, /* Unrecognized. */
+};
+
+/** Switch information returned by mlx5_nl_switch_info(). */
+struct mlx5_switch_info {
+	uint32_t master:1; /**< Master device. */
+	uint32_t representor:1; /**< Representor device. */
+	enum mlx5_nl_phys_port_name_type name_type; /**< Port name type. */
+	int32_t pf_num; /**< PF number (valid for pfxvfx format only). */
+	int32_t port_name; /**< Representor port name. */
+	uint64_t switch_id; /**< Switch identifier. */
+};
+
 /* CQE status. */
 enum mlx5_cqe_status {
 	MLX5_CQE_STATUS_SW_OWN = -1,
@@ -159,6 +215,9 @@ enum mlx5_class {
 	MLX5_CLASS_VDPA,
 	MLX5_CLASS_INVALID,
 };
+
 enum mlx5_class mlx5_class_get(struct rte_devargs *devargs);
+void mlx5_translate_port_name(const char *port_name_in,
+			      struct mlx5_switch_info *port_info_out);
 
 #endif /* RTE_PMD_MLX5_COMMON_H_ */
diff --git a/drivers/common/mlx5/mlx5_nl.c b/drivers/common/mlx5/mlx5_nl.c
new file mode 100644
index 0000000..b4fc053
--- /dev/null
+++ b/drivers/common/mlx5/mlx5_nl.c
@@ -0,0 +1,1337 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2018 6WIND S.A.
+ * Copyright 2018 Mellanox Technologies, Ltd
+ */
+
+#include <errno.h>
+#include <linux/if_link.h>
+#include <linux/rtnetlink.h>
+#include <net/if.h>
+#include <rdma/rdma_netlink.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <stdalign.h>
+#include <string.h>
+#include <sys/socket.h>
+#include <unistd.h>
+#include <stdbool.h>
+
+#include <rte_errno.h>
+#include <rte_atomic.h>
+
+#include "mlx5_nl.h"
+#include "mlx5_common_utils.h"
+
+/* Size of the buffer to receive kernel messages */
+#define MLX5_NL_BUF_SIZE (32 * 1024)
+/* Send buffer size for the Netlink socket */
+#define MLX5_SEND_BUF_SIZE 32768
+/* Receive buffer size for the Netlink socket */
+#define MLX5_RECV_BUF_SIZE 32768
+
+/** Parameters of VLAN devices created by driver. */
+#define MLX5_VMWA_VLAN_DEVICE_PFX "evmlx"
+/*
+ * Define NDA_RTA as defined in iproute2 sources.
+ *
+ * see in iproute2 sources file include/libnetlink.h
+ */
+#ifndef MLX5_NDA_RTA
+#define MLX5_NDA_RTA(r) \
+	((struct rtattr *)(((char *)(r)) + NLMSG_ALIGN(sizeof(struct ndmsg))))
+#endif
+/*
+ * Define NLMSG_TAIL as defined in iproute2 sources.
+ *
+ * see in iproute2 sources file include/libnetlink.h
+ */
+#ifndef NLMSG_TAIL
+#define NLMSG_TAIL(nmsg) \
+	((struct rtattr *)(((char *)(nmsg)) + NLMSG_ALIGN((nmsg)->nlmsg_len)))
+#endif
+/*
+ * The following definitions are normally found in rdma/rdma_netlink.h,
+ * however they are so recent that most systems do not expose them yet.
+ */
+#ifndef HAVE_RDMA_NL_NLDEV
+#define RDMA_NL_NLDEV 5
+#endif
+#ifndef HAVE_RDMA_NLDEV_CMD_GET
+#define RDMA_NLDEV_CMD_GET 1
+#endif
+#ifndef HAVE_RDMA_NLDEV_CMD_PORT_GET
+#define RDMA_NLDEV_CMD_PORT_GET 5
+#endif
+#ifndef HAVE_RDMA_NLDEV_ATTR_DEV_INDEX
+#define RDMA_NLDEV_ATTR_DEV_INDEX 1
+#endif
+#ifndef HAVE_RDMA_NLDEV_ATTR_DEV_NAME
+#define RDMA_NLDEV_ATTR_DEV_NAME 2
+#endif
+#ifndef HAVE_RDMA_NLDEV_ATTR_PORT_INDEX
+#define RDMA_NLDEV_ATTR_PORT_INDEX 3
+#endif
+#ifndef HAVE_RDMA_NLDEV_ATTR_NDEV_INDEX
+#define RDMA_NLDEV_ATTR_NDEV_INDEX 50
+#endif
+
+/* These are normally found in linux/if_link.h. */
+#ifndef HAVE_IFLA_NUM_VF
+#define IFLA_NUM_VF 21
+#endif
+#ifndef HAVE_IFLA_EXT_MASK
+#define IFLA_EXT_MASK 29
+#endif
+#ifndef HAVE_IFLA_PHYS_SWITCH_ID
+#define IFLA_PHYS_SWITCH_ID 36
+#endif
+#ifndef HAVE_IFLA_PHYS_PORT_NAME
+#define IFLA_PHYS_PORT_NAME 38
+#endif
+
+/* Add/remove MAC address through Netlink */
+struct mlx5_nl_mac_addr {
+	struct rte_ether_addr (*mac)[];
+	/**< MAC address handled by the device. */
+	int mac_n; /**< Number of addresses in the array. */
+};
+
+#define MLX5_NL_CMD_GET_IB_NAME (1 << 0)
+#define MLX5_NL_CMD_GET_IB_INDEX (1 << 1)
+#define MLX5_NL_CMD_GET_NET_INDEX (1 << 2)
+#define MLX5_NL_CMD_GET_PORT_INDEX (1 << 3)
+
+/** Data structure used by mlx5_nl_cmdget_cb(). */
+struct mlx5_nl_ifindex_data {
+	const char *name; /**< IB device name (in). */
+	uint32_t flags; /**< found attribute flags (out). */
+	uint32_t ibindex; /**< IB device index (out). */
+	uint32_t ifindex; /**< Network interface index (out). */
+	uint32_t portnum; /**< IB device max port number (out). */
+};
+
+rte_atomic32_t atomic_sn = RTE_ATOMIC32_INIT(0);
+
+/* Generate Netlink sequence number. */
+#define MLX5_NL_SN_GENERATE ((uint32_t)rte_atomic32_add_return(&atomic_sn, 1))
+
+/**
+ * Opens a Netlink socket.
+ *
+ * @param protocol
+ *   Netlink protocol (e.g. NETLINK_ROUTE, NETLINK_RDMA).
+ *
+ * @return
+ *   A file descriptor on success, a negative errno value otherwise and
+ *   rte_errno is set.
+ */
+int
+mlx5_nl_init(int protocol)
+{
+	int fd;
+	int sndbuf_size = MLX5_SEND_BUF_SIZE;
+	int rcvbuf_size = MLX5_RECV_BUF_SIZE;
+	struct sockaddr_nl local = {
+		.nl_family = AF_NETLINK,
+	};
+	int ret;
+
+	fd = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC, protocol);
+	if (fd == -1) {
+		rte_errno = errno;
+		return -rte_errno;
+	}
+	ret = setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &sndbuf_size, sizeof(int));
+	if (ret == -1) {
+		rte_errno = errno;
+		goto error;
+	}
+	ret = setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf_size, sizeof(int));
+	if (ret == -1) {
+		rte_errno = errno;
+		goto error;
+	}
+	ret = bind(fd, (struct sockaddr *)&local, sizeof(local));
+	if (ret == -1) {
+		rte_errno = errno;
+		goto error;
+	}
+	return fd;
+error:
+	close(fd);
+	return -rte_errno;
+}
+
+/**
+ * Send a request message to the kernel on the Netlink socket.
+ *
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] nh
+ *   The Netlink message sent to the kernel.
+ * @param[in] sn
+ *   Sequence number.
+ * @param[in] req
+ *   Pointer to the request structure.
+ * @param[in] len
+ *   Length of the request in bytes.
+ *
+ * @return
+ *   The number of sent bytes on success, a negative errno value otherwise and
+ *   rte_errno is set.
+ */
+static int
+mlx5_nl_request(int nlsk_fd, struct nlmsghdr *nh, uint32_t sn, void *req,
+		int len)
+{
+	struct sockaddr_nl sa = {
+		.nl_family = AF_NETLINK,
+	};
+	struct iovec iov[2] = {
+		{ .iov_base = nh, .iov_len = sizeof(*nh), },
+		{ .iov_base = req, .iov_len = len, },
+	};
+	struct msghdr msg = {
+		.msg_name = &sa,
+		.msg_namelen = sizeof(sa),
+		.msg_iov = iov,
+		.msg_iovlen = 2,
+	};
+	int send_bytes;
+
+	nh->nlmsg_pid = 0; /* communication with the kernel uses pid 0 */
+	nh->nlmsg_seq = sn;
+	send_bytes = sendmsg(nlsk_fd, &msg, 0);
+	if (send_bytes < 0) {
+		rte_errno = errno;
+		return -rte_errno;
+	}
+	return send_bytes;
+}
+
+/**
+ * Send a message to the kernel on the Netlink socket.
+ *
+ * @param[in] nlsk_fd
+ *   The Netlink socket file descriptor used for communication.
+ * @param[in] nh
+ *   The Netlink message sent to the kernel.
+ * @param[in] sn
+ *   Sequence number.
+ *
+ * @return
+ *   The number of sent bytes on success, a negative errno value otherwise and
+ *   rte_errno is set.
+ */
+static int
+mlx5_nl_send(int nlsk_fd, struct nlmsghdr *nh, uint32_t sn)
+{
+	struct sockaddr_nl sa = {
+		.nl_family = AF_NETLINK,
+	};
+	struct iovec iov = {
+		.iov_base = nh,
+		.iov_len = nh->nlmsg_len,
+	};
+	struct msghdr msg = {
+		.msg_name = &sa,
+		.msg_namelen = sizeof(sa),
+		.msg_iov = &iov,
+		.msg_iovlen = 1,
+	};
+	int send_bytes;
+
+	nh->nlmsg_pid = 0; /* communication with the kernel uses pid 0 */
+	nh->nlmsg_seq = sn;
+	send_bytes = sendmsg(nlsk_fd, &msg, 0);
+	if (send_bytes < 0) {
+		rte_errno = errno;
+		return -rte_errno;
+	}
+	return send_bytes;
+}
+
+/**
+ * Receive a message from the kernel on the Netlink socket, following
+ * mlx5_nl_send().
+ *
+ * @param[in] nlsk_fd
+ *   The Netlink socket file descriptor used for communication.
+ * @param[in] sn
+ *   Sequence number.
+ * @param[in] cb
+ *   The callback function to call for each Netlink message received.
+ * @param[in, out] arg
+ *   Custom arguments for the callback.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_nl_recv(int nlsk_fd, uint32_t sn, int (*cb)(struct nlmsghdr *, void *arg),
+	     void *arg)
+{
+	struct sockaddr_nl sa;
+	char buf[MLX5_RECV_BUF_SIZE];
+	struct iovec iov = {
+		.iov_base = buf,
+		.iov_len = sizeof(buf),
+	};
+	struct msghdr msg = {
+		.msg_name = &sa,
+		.msg_namelen = sizeof(sa),
+		.msg_iov = &iov,
+		/* One message at a time */
+		.msg_iovlen = 1,
+	};
+	int multipart = 0;
+	int ret = 0;
+
+	do {
+		struct nlmsghdr *nh;
+		int recv_bytes = 0;
+
+		do {
+			recv_bytes = recvmsg(nlsk_fd, &msg, 0);
+			if (recv_bytes == -1) {
+				rte_errno = errno;
+				return -rte_errno;
+			}
+			nh = (struct nlmsghdr *)buf;
+		} while (nh->nlmsg_seq != sn);
+		for (;
+		     NLMSG_OK(nh, (unsigned int)recv_bytes);
+		     nh = NLMSG_NEXT(nh, recv_bytes)) {
+			if (nh->nlmsg_type == NLMSG_ERROR) {
+				struct nlmsgerr *err_data = NLMSG_DATA(nh);
+
+				if (err_data->error < 0) {
+					rte_errno = -err_data->error;
+					return -rte_errno;
+				}
+				/* Ack message. */
+				return 0;
+			}
+			/* Multi-part msgs and their trailing DONE message. */
+			if (nh->nlmsg_flags & NLM_F_MULTI) {
+				if (nh->nlmsg_type == NLMSG_DONE)
+					return 0;
+				multipart = 1;
+			}
+			if (cb) {
+				ret = cb(nh, arg);
+				if (ret < 0)
+					return ret;
+			}
+		}
+	} while (multipart);
+	return ret;
+}
+
+/**
+ * Parse Netlink message to retrieve the bridge MAC address.
+ *
+ * @param nh
+ *   Pointer to Netlink Message Header.
+ * @param arg
+ *   PMD data registered with this callback.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_nl_mac_addr_cb(struct nlmsghdr *nh, void *arg)
+{
+	struct mlx5_nl_mac_addr *data = arg;
+	struct ndmsg *r = NLMSG_DATA(nh);
+	struct rtattr *attribute;
+	int len;
+
+	len = nh->nlmsg_len - NLMSG_LENGTH(sizeof(*r));
+	for (attribute = MLX5_NDA_RTA(r);
+	     RTA_OK(attribute, len);
+	     attribute = RTA_NEXT(attribute, len)) {
+		if (attribute->rta_type == NDA_LLADDR) {
+			if (data->mac_n == MLX5_MAX_MAC_ADDRESSES) {
+				DRV_LOG(WARNING,
+					"not enough room to finalize the"
+					" request");
+				rte_errno = ENOMEM;
+				return -rte_errno;
+			}
+#ifndef NDEBUG
+			char m[18];
+
+			rte_ether_format_addr(m, 18, RTA_DATA(attribute));
+			DRV_LOG(DEBUG, "bridge MAC address %s", m);
+#endif
+			memcpy(&(*data->mac)[data->mac_n++],
+			       RTA_DATA(attribute), RTE_ETHER_ADDR_LEN);
+		}
+	}
+	return 0;
+}
+
+/**
+ * Get bridge MAC addresses.
+ *
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
+ * @param[out] mac
+ *   Pointer to the array of MAC addresses to fill.
+ *   Its size should be MLX5_MAX_MAC_ADDRESSES.
+ * @param[out] mac_n
+ *   Number of entries filled in MAC array.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_nl_mac_addr_list(int nlsk_fd, unsigned int iface_idx,
+		      struct rte_ether_addr (*mac)[], int *mac_n)
+{
+	struct {
+		struct nlmsghdr	hdr;
+		struct ifinfomsg ifm;
+	} req = {
+		.hdr = {
+			.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
+			.nlmsg_type = RTM_GETNEIGH,
+			.nlmsg_flags = NLM_F_DUMP | NLM_F_REQUEST,
+		},
+		.ifm = {
+			.ifi_family = PF_BRIDGE,
+			.ifi_index = iface_idx,
+		},
+	};
+	struct mlx5_nl_mac_addr data = {
+		.mac = mac,
+		.mac_n = 0,
+	};
+	uint32_t sn = MLX5_NL_SN_GENERATE;
+	int ret;
+
+	if (nlsk_fd == -1)
+		return 0;
+	ret = mlx5_nl_request(nlsk_fd, &req.hdr, sn, &req.ifm,
+			      sizeof(struct ifinfomsg));
+	if (ret < 0)
+		goto error;
+	ret = mlx5_nl_recv(nlsk_fd, sn, mlx5_nl_mac_addr_cb, &data);
+	if (ret < 0)
+		goto error;
+	*mac_n = data.mac_n;
+	return 0;
+error:
+	DRV_LOG(DEBUG, "Interface %u cannot retrieve MAC address list %s",
+		iface_idx, strerror(rte_errno));
+	return -rte_errno;
+}
+
+/**
+ * Modify the MAC address neighbour table with Netlink.
+ *
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
+ * @param mac
+ *   MAC address to consider.
+ * @param add
+ *   1 to add the MAC address, 0 to remove the MAC address.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_nl_mac_addr_modify(int nlsk_fd, unsigned int iface_idx,
+			struct rte_ether_addr *mac, int add)
+{
+	struct {
+		struct nlmsghdr hdr;
+		struct ndmsg ndm;
+		struct rtattr rta;
+		uint8_t buffer[RTE_ETHER_ADDR_LEN];
+	} req = {
+		.hdr = {
+			.nlmsg_len = NLMSG_LENGTH(sizeof(struct ndmsg)),
+			.nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE |
+				NLM_F_EXCL | NLM_F_ACK,
+			.nlmsg_type = add ? RTM_NEWNEIGH : RTM_DELNEIGH,
+		},
+		.ndm = {
+			.ndm_family = PF_BRIDGE,
+			.ndm_state = NUD_NOARP | NUD_PERMANENT,
+			.ndm_ifindex = iface_idx,
+			.ndm_flags = NTF_SELF,
+		},
+		.rta = {
+			.rta_type = NDA_LLADDR,
+			.rta_len = RTA_LENGTH(RTE_ETHER_ADDR_LEN),
+		},
+	};
+	uint32_t sn = MLX5_NL_SN_GENERATE;
+	int ret;
+
+	if (nlsk_fd == -1)
+		return 0;
+	memcpy(RTA_DATA(&req.rta), mac, RTE_ETHER_ADDR_LEN);
+	req.hdr.nlmsg_len = NLMSG_ALIGN(req.hdr.nlmsg_len) +
+		RTA_ALIGN(req.rta.rta_len);
+	ret = mlx5_nl_send(nlsk_fd, &req.hdr, sn);
+	if (ret < 0)
+		goto error;
+	ret = mlx5_nl_recv(nlsk_fd, sn, NULL, NULL);
+	if (ret < 0)
+		goto error;
+	return 0;
+error:
+	DRV_LOG(DEBUG,
+		"Interface %u cannot %s MAC address"
+		" %02X:%02X:%02X:%02X:%02X:%02X %s",
+		iface_idx,
+		add ? "add" : "remove",
+		mac->addr_bytes[0], mac->addr_bytes[1],
+		mac->addr_bytes[2], mac->addr_bytes[3],
+		mac->addr_bytes[4], mac->addr_bytes[5],
+		strerror(rte_errno));
+	return -rte_errno;
+}
+
+/**
+ * Modify the VF MAC address neighbour table with Netlink.
+ *
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
+ * @param mac
+ *    MAC address to consider.
+ * @param vf_index
+ *    VF index.
+ *
+ * @return
+ *    0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_nl_vf_mac_addr_modify(int nlsk_fd, unsigned int iface_idx,
+			   struct rte_ether_addr *mac, int vf_index)
+{
+	int ret;
+	struct {
+		struct nlmsghdr hdr;
+		struct ifinfomsg ifm;
+		struct rtattr vf_list_rta;
+		struct rtattr vf_info_rta;
+		struct rtattr vf_mac_rta;
+		struct ifla_vf_mac ivm;
+	} req = {
+		.hdr = {
+			.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
+			.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK,
+			.nlmsg_type = RTM_BASE,
+		},
+		.ifm = {
+			.ifi_index = iface_idx,
+		},
+		.vf_list_rta = {
+			.rta_type = IFLA_VFINFO_LIST,
+			.rta_len = RTA_ALIGN(RTA_LENGTH(0)),
+		},
+		.vf_info_rta = {
+			.rta_type = IFLA_VF_INFO,
+			.rta_len = RTA_ALIGN(RTA_LENGTH(0)),
+		},
+		.vf_mac_rta = {
+			.rta_type = IFLA_VF_MAC,
+		},
+	};
+	struct ifla_vf_mac ivm = {
+		.vf = vf_index,
+	};
+	uint32_t sn = MLX5_NL_SN_GENERATE;
+
+	memcpy(&ivm.mac, mac, RTE_ETHER_ADDR_LEN);
+	memcpy(RTA_DATA(&req.vf_mac_rta), &ivm, sizeof(ivm));
+
+	req.vf_mac_rta.rta_len = RTA_LENGTH(sizeof(ivm));
+	req.hdr.nlmsg_len = NLMSG_ALIGN(req.hdr.nlmsg_len) +
+		RTA_ALIGN(req.vf_list_rta.rta_len) +
+		RTA_ALIGN(req.vf_info_rta.rta_len) +
+		RTA_ALIGN(req.vf_mac_rta.rta_len);
+	req.vf_list_rta.rta_len = RTE_PTR_DIFF(NLMSG_TAIL(&req.hdr),
+					       &req.vf_list_rta);
+	req.vf_info_rta.rta_len = RTE_PTR_DIFF(NLMSG_TAIL(&req.hdr),
+					       &req.vf_info_rta);
+
+	if (nlsk_fd < 0)
+		return -1;
+	ret = mlx5_nl_send(nlsk_fd, &req.hdr, sn);
+	if (ret < 0)
+		goto error;
+	ret = mlx5_nl_recv(nlsk_fd, sn, NULL, NULL);
+	if (ret < 0)
+		goto error;
+	return 0;
+error:
+	DRV_LOG(ERR,
+		"representor %u cannot set VF MAC address "
+		"%02X:%02X:%02X:%02X:%02X:%02X : %s",
+		vf_index,
+		mac->addr_bytes[0], mac->addr_bytes[1],
+		mac->addr_bytes[2], mac->addr_bytes[3],
+		mac->addr_bytes[4], mac->addr_bytes[5],
+		strerror(rte_errno));
+	return -rte_errno;
+}
+
+/**
+ * Add a MAC address.
+ *
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
+ * @param mac_own
+ *   BITFIELD_DECLARE array to store the mac.
+ * @param mac
+ *   MAC address to register.
+ * @param index
+ *   MAC address index.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_nl_mac_addr_add(int nlsk_fd, unsigned int iface_idx,
+		     uint64_t *mac_own, struct rte_ether_addr *mac,
+		     uint32_t index)
+{
+	int ret;
+
+	ret = mlx5_nl_mac_addr_modify(nlsk_fd, iface_idx, mac, 1);
+	if (!ret)
+		BITFIELD_SET(mac_own, index);
+	if (ret == -EEXIST)
+		return 0;
+	return ret;
+}
+
+/**
+ * Remove a MAC address.
+ *
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
+ * @param mac_own
+ *   BITFIELD_DECLARE array to store the mac.
+ * @param mac
+ *   MAC address to remove.
+ * @param index
+ *   MAC address index.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_nl_mac_addr_remove(int nlsk_fd, unsigned int iface_idx, uint64_t *mac_own,
+			struct rte_ether_addr *mac, uint32_t index)
+{
+	BITFIELD_RESET(mac_own, index);
+	return mlx5_nl_mac_addr_modify(nlsk_fd, iface_idx, mac, 0);
+}
+
+/**
+ * Synchronize Netlink bridge table to the internal table.
+ *
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
+ * @param mac_addrs
+ *   Mac addresses array to sync.
+ * @param n
+ *   @p mac_addrs array size.
+ */
+void
+mlx5_nl_mac_addr_sync(int nlsk_fd, unsigned int iface_idx,
+		      struct rte_ether_addr *mac_addrs, int n)
+{
+	struct rte_ether_addr macs[n];
+	int macs_n = 0;
+	int i;
+	int ret;
+
+	ret = mlx5_nl_mac_addr_list(nlsk_fd, iface_idx, &macs, &macs_n);
+	if (ret)
+		return;
+	for (i = 0; i != macs_n; ++i) {
+		int j;
+
+		/* Verify the address is not in the array yet. */
+		for (j = 0; j != n; ++j)
+			if (rte_is_same_ether_addr(&macs[i], &mac_addrs[j]))
+				break;
+		if (j != n)
+			continue;
+		/* Find the first entry available. */
+		for (j = 0; j != n; ++j) {
+			if (rte_is_zero_ether_addr(&mac_addrs[j])) {
+				mac_addrs[j] = macs[i];
+				break;
+			}
+		}
+	}
+}
+
+/**
+ * Flush all added MAC addresses.
+ *
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
+ * @param[in] mac_addrs
+ *   Mac addresses array to flush.
+ * @param n
+ *   @p mac_addrs array size.
+ * @param mac_own
+ *   BITFIELD_DECLARE array to store the mac.
+ */
+void
+mlx5_nl_mac_addr_flush(int nlsk_fd, unsigned int iface_idx,
+		       struct rte_ether_addr *mac_addrs, int n,
+		       uint64_t *mac_own)
+{
+	int i;
+
+	for (i = n - 1; i >= 0; --i) {
+		struct rte_ether_addr *m = &mac_addrs[i];
+
+		if (BITFIELD_ISSET(mac_own, i))
+			mlx5_nl_mac_addr_remove(nlsk_fd, iface_idx, mac_own, m,
+						i);
+	}
+}
+
+/**
+ * Enable promiscuous / all multicast mode through Netlink.
+ *
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
+ * @param flags
+ *   IFF_PROMISC for promiscuous, IFF_ALLMULTI for allmulti.
+ * @param enable
+ *   Nonzero to enable, disable otherwise.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_nl_device_flags(int nlsk_fd, unsigned int iface_idx, uint32_t flags,
+		     int enable)
+{
+	struct {
+		struct nlmsghdr hdr;
+		struct ifinfomsg ifi;
+	} req = {
+		.hdr = {
+			.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
+			.nlmsg_type = RTM_NEWLINK,
+			.nlmsg_flags = NLM_F_REQUEST,
+		},
+		.ifi = {
+			.ifi_flags = enable ? flags : 0,
+			.ifi_change = flags,
+			.ifi_index = iface_idx,
+		},
+	};
+	uint32_t sn = MLX5_NL_SN_GENERATE;
+	int ret;
+
+	assert(!(flags & ~(IFF_PROMISC | IFF_ALLMULTI)));
+	if (nlsk_fd < 0)
+		return 0;
+	ret = mlx5_nl_send(nlsk_fd, &req.hdr, sn);
+	if (ret < 0)
+		return ret;
+	return 0;
+}
+
+/**
+ * Enable promiscuous mode through Netlink.
+ *
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
+ * @param enable
+ *   Nonzero to enable, disable otherwise.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_nl_promisc(int nlsk_fd, unsigned int iface_idx, int enable)
+{
+	int ret = mlx5_nl_device_flags(nlsk_fd, iface_idx, IFF_PROMISC, enable);
+
+	if (ret)
+		DRV_LOG(DEBUG,
+			"Interface %u cannot %s promisc mode: Netlink error %s",
+			iface_idx, enable ? "enable" : "disable",
+			strerror(rte_errno));
+	return ret;
+}
+
+/**
+ * Enable all multicast mode through Netlink.
+ *
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] iface_idx
+ *   Net device interface index.
+ * @param enable
+ *   Nonzero to enable, disable otherwise.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_nl_allmulti(int nlsk_fd, unsigned int iface_idx, int enable)
+{
+	int ret = mlx5_nl_device_flags(nlsk_fd, iface_idx, IFF_ALLMULTI,
+				       enable);
+
+	if (ret)
+		DRV_LOG(DEBUG,
+			"Interface %u cannot %s allmulti : Netlink error %s",
+			iface_idx, enable ? "enable" : "disable",
+			strerror(rte_errno));
+	return ret;
+}
+
+/**
+ * Process network interface information from Netlink message.
+ *
+ * @param nh
+ *   Pointer to Netlink message header.
+ * @param arg
+ *   Opaque data pointer for this callback.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_nl_cmdget_cb(struct nlmsghdr *nh, void *arg)
+{
+	struct mlx5_nl_ifindex_data *data = arg;
+	struct mlx5_nl_ifindex_data local = {
+		.flags = 0,
+	};
+	size_t off = NLMSG_HDRLEN;
+
+	if (nh->nlmsg_type !=
+	    RDMA_NL_GET_TYPE(RDMA_NL_NLDEV, RDMA_NLDEV_CMD_GET) &&
+	    nh->nlmsg_type !=
+	    RDMA_NL_GET_TYPE(RDMA_NL_NLDEV, RDMA_NLDEV_CMD_PORT_GET))
+		goto error;
+	while (off < nh->nlmsg_len) {
+		struct nlattr *na = (void *)((uintptr_t)nh + off);
+		void *payload = (void *)((uintptr_t)na + NLA_HDRLEN);
+
+		if (na->nla_len > nh->nlmsg_len - off)
+			goto error;
+		switch (na->nla_type) {
+		case RDMA_NLDEV_ATTR_DEV_INDEX:
+			local.ibindex = *(uint32_t *)payload;
+			local.flags |= MLX5_NL_CMD_GET_IB_INDEX;
+			break;
+		case RDMA_NLDEV_ATTR_DEV_NAME:
+			if (!strcmp(payload, data->name))
+				local.flags |= MLX5_NL_CMD_GET_IB_NAME;
+			break;
+		case RDMA_NLDEV_ATTR_NDEV_INDEX:
+			local.ifindex = *(uint32_t *)payload;
+			local.flags |= MLX5_NL_CMD_GET_NET_INDEX;
+			break;
+		case RDMA_NLDEV_ATTR_PORT_INDEX:
+			local.portnum = *(uint32_t *)payload;
+			local.flags |= MLX5_NL_CMD_GET_PORT_INDEX;
+			break;
+		default:
+			break;
+		}
+		off += NLA_ALIGN(na->nla_len);
+	}
+	/*
+	 * A dump may return one message for each Infiniband
+	 * device in the system, each with its own name.
+	 * So gather the parameters locally and copy them to the
+	 * query context only when the device name matches.
+	 */
+	if (local.flags & MLX5_NL_CMD_GET_IB_NAME) {
+		data->flags = local.flags;
+		data->ibindex = local.ibindex;
+		data->ifindex = local.ifindex;
+		data->portnum = local.portnum;
+	}
+	return 0;
+error:
+	rte_errno = EINVAL;
+	return -rte_errno;
+}
+
+/**
+ * Get index of network interface associated with some IB device.
+ *
+ * This is the only somewhat safe method to avoid resorting to heuristics
+ * when faced with port representors. Unfortunately it requires at least
+ * Linux 4.17.
+ *
+ * @param nl
+ *   Netlink socket of the RDMA kind (NETLINK_RDMA).
+ * @param[in] name
+ *   IB device name.
+ * @param[in] pindex
+ *   IB device port index, starting from 1.
+ * @return
+ *   A valid (nonzero) interface index on success, 0 otherwise and rte_errno
+ *   is set.
+ */
+unsigned int
+mlx5_nl_ifindex(int nl, const char *name, uint32_t pindex)
+{
+	struct mlx5_nl_ifindex_data data = {
+		.name = name,
+		.flags = 0,
+		.ibindex = 0, /* Determined during first pass. */
+		.ifindex = 0, /* Determined during second pass. */
+	};
+	union {
+		struct nlmsghdr nh;
+		uint8_t buf[NLMSG_HDRLEN +
+			    NLA_HDRLEN + NLA_ALIGN(sizeof(data.ibindex)) +
+			    NLA_HDRLEN + NLA_ALIGN(sizeof(pindex))];
+	} req = {
+		.nh = {
+			.nlmsg_len = NLMSG_LENGTH(0),
+			.nlmsg_type = RDMA_NL_GET_TYPE(RDMA_NL_NLDEV,
+						       RDMA_NLDEV_CMD_GET),
+			.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK | NLM_F_DUMP,
+		},
+	};
+	struct nlattr *na;
+	uint32_t sn = MLX5_NL_SN_GENERATE;
+	int ret;
+
+	ret = mlx5_nl_send(nl, &req.nh, sn);
+	if (ret < 0)
+		return 0;
+	ret = mlx5_nl_recv(nl, sn, mlx5_nl_cmdget_cb, &data);
+	if (ret < 0)
+		return 0;
+	if (!(data.flags & MLX5_NL_CMD_GET_IB_NAME) ||
+	    !(data.flags & MLX5_NL_CMD_GET_IB_INDEX))
+		goto error;
+	data.flags = 0;
+	sn = MLX5_NL_SN_GENERATE;
+	req.nh.nlmsg_type = RDMA_NL_GET_TYPE(RDMA_NL_NLDEV,
+					     RDMA_NLDEV_CMD_PORT_GET);
+	req.nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
+	req.nh.nlmsg_len = NLMSG_LENGTH(sizeof(req.buf) - NLMSG_HDRLEN);
+	na = (void *)((uintptr_t)req.buf + NLMSG_HDRLEN);
+	na->nla_len = NLA_HDRLEN + sizeof(data.ibindex);
+	na->nla_type = RDMA_NLDEV_ATTR_DEV_INDEX;
+	memcpy((void *)((uintptr_t)na + NLA_HDRLEN),
+	       &data.ibindex, sizeof(data.ibindex));
+	na = (void *)((uintptr_t)na + NLA_ALIGN(na->nla_len));
+	na->nla_len = NLA_HDRLEN + sizeof(pindex);
+	na->nla_type = RDMA_NLDEV_ATTR_PORT_INDEX;
+	memcpy((void *)((uintptr_t)na + NLA_HDRLEN),
+	       &pindex, sizeof(pindex));
+	ret = mlx5_nl_send(nl, &req.nh, sn);
+	if (ret < 0)
+		return 0;
+	ret = mlx5_nl_recv(nl, sn, mlx5_nl_cmdget_cb, &data);
+	if (ret < 0)
+		return 0;
+	if (!(data.flags & MLX5_NL_CMD_GET_IB_NAME) ||
+	    !(data.flags & MLX5_NL_CMD_GET_IB_INDEX) ||
+	    !(data.flags & MLX5_NL_CMD_GET_NET_INDEX) ||
+	    !data.ifindex)
+		goto error;
+	return data.ifindex;
+error:
+	rte_errno = ENODEV;
+	return 0;
+}
+
+/**
+ * Get the number of physical ports of given IB device.
+ *
+ * @param nl
+ *   Netlink socket of the RDMA kind (NETLINK_RDMA).
+ * @param[in] name
+ *   IB device name.
+ *
+ * @return
+ *   A valid (nonzero) number of ports on success, 0 otherwise
+ *   and rte_errno is set.
+ */
+unsigned int
+mlx5_nl_portnum(int nl, const char *name)
+{
+	struct mlx5_nl_ifindex_data data = {
+		.flags = 0,
+		.name = name,
+		.ifindex = 0,
+		.portnum = 0,
+	};
+	struct nlmsghdr req = {
+		.nlmsg_len = NLMSG_LENGTH(0),
+		.nlmsg_type = RDMA_NL_GET_TYPE(RDMA_NL_NLDEV,
+					       RDMA_NLDEV_CMD_GET),
+		.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK | NLM_F_DUMP,
+	};
+	uint32_t sn = MLX5_NL_SN_GENERATE;
+	int ret;
+
+	ret = mlx5_nl_send(nl, &req, sn);
+	if (ret < 0)
+		return 0;
+	ret = mlx5_nl_recv(nl, sn, mlx5_nl_cmdget_cb, &data);
+	if (ret < 0)
+		return 0;
+	if (!(data.flags & MLX5_NL_CMD_GET_IB_NAME) ||
+	    !(data.flags & MLX5_NL_CMD_GET_IB_INDEX) ||
+	    !(data.flags & MLX5_NL_CMD_GET_PORT_INDEX)) {
+		rte_errno = ENODEV;
+		return 0;
+	}
+	if (!data.portnum)
+		rte_errno = EINVAL;
+	return data.portnum;
+}
+
+/**
+ * Analyze gathered port parameters via Netlink to recognize master
+ * and representor devices for E-Switch configuration.
+ *
+ * @param[in] num_vf_set
+ *   Flag indicating whether the number-of-VFs port attribute is present.
+ * @param[inout] switch_info
+ *   Port information, including port name as a number and port name
+ *   type if recognized.
+ *
+ * @return
+ *   None. The master and representor flags are set in switch_info
+ *   according to the recognized parameters (if any).
+ */
+static void
+mlx5_nl_check_switch_info(bool num_vf_set,
+			  struct mlx5_switch_info *switch_info)
+{
+	switch (switch_info->name_type) {
+	case MLX5_PHYS_PORT_NAME_TYPE_UNKNOWN:
+		/*
+		 * Name is not recognized, assume the master,
+		 * check the number of VFs key presence.
+		 */
+		switch_info->master = num_vf_set;
+		break;
+	case MLX5_PHYS_PORT_NAME_TYPE_NOTSET:
+		/*
+		 * Name is not set, this assumes the legacy naming
+		 * schema for master, just check if there is a
+		 * number of VFs key.
+		 */
+		switch_info->master = num_vf_set;
+		break;
+	case MLX5_PHYS_PORT_NAME_TYPE_UPLINK:
+		/* New uplink naming schema recognized. */
+		switch_info->master = 1;
+		break;
+	case MLX5_PHYS_PORT_NAME_TYPE_LEGACY:
+		/* Legacy representors naming schema. */
+		switch_info->representor = !num_vf_set;
+		break;
+	case MLX5_PHYS_PORT_NAME_TYPE_PFVF:
+		/* New representors naming schema. */
+		switch_info->representor = 1;
+		break;
+	}
+}
+
+/**
+ * Process switch information from Netlink message.
+ *
+ * @param nh
+ *   Pointer to Netlink message header.
+ * @param arg
+ *   Opaque data pointer for this callback.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_nl_switch_info_cb(struct nlmsghdr *nh, void *arg)
+{
+	struct mlx5_switch_info info = {
+		.master = 0,
+		.representor = 0,
+		.name_type = MLX5_PHYS_PORT_NAME_TYPE_NOTSET,
+		.port_name = 0,
+		.switch_id = 0,
+	};
+	size_t off = NLMSG_LENGTH(sizeof(struct ifinfomsg));
+	bool switch_id_set = false;
+	bool num_vf_set = false;
+
+	if (nh->nlmsg_type != RTM_NEWLINK)
+		goto error;
+	while (off < nh->nlmsg_len) {
+		struct rtattr *ra = (void *)((uintptr_t)nh + off);
+		void *payload = RTA_DATA(ra);
+		unsigned int i;
+
+		if (ra->rta_len > nh->nlmsg_len - off)
+			goto error;
+		switch (ra->rta_type) {
+		case IFLA_NUM_VF:
+			num_vf_set = true;
+			break;
+		case IFLA_PHYS_PORT_NAME:
+			mlx5_translate_port_name((char *)payload, &info);
+			break;
+		case IFLA_PHYS_SWITCH_ID:
+			info.switch_id = 0;
+			for (i = 0; i < RTA_PAYLOAD(ra); ++i) {
+				info.switch_id <<= 8;
+				info.switch_id |= ((uint8_t *)payload)[i];
+			}
+			switch_id_set = true;
+			break;
+		}
+		off += RTA_ALIGN(ra->rta_len);
+	}
+	if (switch_id_set) {
+		/* We have some E-Switch configuration. */
+		mlx5_nl_check_switch_info(num_vf_set, &info);
+	}
+	assert(!(info.master && info.representor));
+	memcpy(arg, &info, sizeof(info));
+	return 0;
+error:
+	rte_errno = EINVAL;
+	return -rte_errno;
+}
+
+/**
+ * Get switch information associated with network interface.
+ *
+ * @param nl
+ *   Netlink socket of the ROUTE kind (NETLINK_ROUTE).
+ * @param ifindex
+ *   Network interface index.
+ * @param[out] info
+ *   Switch information object, populated in case of success.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_nl_switch_info(int nl, unsigned int ifindex,
+		    struct mlx5_switch_info *info)
+{
+	struct {
+		struct nlmsghdr nh;
+		struct ifinfomsg info;
+		struct rtattr rta;
+		uint32_t extmask;
+	} req = {
+		.nh = {
+			.nlmsg_len = NLMSG_LENGTH
+					(sizeof(req.info) +
+					 RTA_LENGTH(sizeof(uint32_t))),
+			.nlmsg_type = RTM_GETLINK,
+			.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK,
+		},
+		.info = {
+			.ifi_family = AF_UNSPEC,
+			.ifi_index = ifindex,
+		},
+		.rta = {
+			.rta_type = IFLA_EXT_MASK,
+			.rta_len = RTA_LENGTH(sizeof(uint32_t)),
+		},
+		.extmask = RTE_LE32(1),
+	};
+	uint32_t sn = MLX5_NL_SN_GENERATE;
+	int ret;
+
+	ret = mlx5_nl_send(nl, &req.nh, sn);
+	if (ret >= 0)
+		ret = mlx5_nl_recv(nl, sn, mlx5_nl_switch_info_cb, info);
+	if (info->master && info->representor) {
+		DRV_LOG(ERR, "ifindex %u device is recognized as master"
+			     " and as representor", ifindex);
+		rte_errno = ENODEV;
+		ret = -rte_errno;
+	}
+	return ret;
+}
+
+/*
+ * Delete VLAN network device by ifindex.
+ *
+ * @param[in] vmwa
+ *   Context object initialized by mlx5_nl_vlan_vmwa_init().
+ * @param[in] ifindex
+ *   Interface index of network device to delete.
+ */
+void
+mlx5_nl_vlan_vmwa_delete(struct mlx5_nl_vlan_vmwa_context *vmwa,
+			 uint32_t ifindex)
+{
+	uint32_t sn = MLX5_NL_SN_GENERATE;
+	int ret;
+	struct {
+		struct nlmsghdr nh;
+		struct ifinfomsg info;
+	} req = {
+		.nh = {
+			.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
+			.nlmsg_type = RTM_DELLINK,
+			.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK,
+		},
+		.info = {
+			.ifi_family = AF_UNSPEC,
+			.ifi_index = ifindex,
+		},
+	};
+
+	if (ifindex) {
+		ret = mlx5_nl_send(vmwa->nl_socket, &req.nh, sn);
+		if (ret >= 0)
+			ret = mlx5_nl_recv(vmwa->nl_socket, sn, NULL, NULL);
+		if (ret < 0)
+			DRV_LOG(WARNING, "netlink: error deleting VLAN WA"
+				" ifindex %u, %d", ifindex, ret);
+	}
+}
+
+/* Set of subroutines to build Netlink message. */
+static struct nlattr *
+nl_msg_tail(struct nlmsghdr *nlh)
+{
+	return (struct nlattr *)
+		(((uint8_t *)nlh) + NLMSG_ALIGN(nlh->nlmsg_len));
+}
+
+static void
+nl_attr_put(struct nlmsghdr *nlh, int type, const void *data, int alen)
+{
+	struct nlattr *nla = nl_msg_tail(nlh);
+
+	nla->nla_type = type;
+	nla->nla_len = NLMSG_ALIGN(sizeof(struct nlattr) + alen);
+	nlh->nlmsg_len = NLMSG_ALIGN(nlh->nlmsg_len) + nla->nla_len;
+
+	if (alen)
+		memcpy((uint8_t *)nla + sizeof(struct nlattr), data, alen);
+}
+
+static struct nlattr *
+nl_attr_nest_start(struct nlmsghdr *nlh, int type)
+{
+	struct nlattr *nest = (struct nlattr *)nl_msg_tail(nlh);
+
+	nl_attr_put(nlh, type, NULL, 0);
+	return nest;
+}
+
+static void
+nl_attr_nest_end(struct nlmsghdr *nlh, struct nlattr *nest)
+{
+	nest->nla_len = (uint8_t *)nl_msg_tail(nlh) - (uint8_t *)nest;
+}
+
+/*
+ * Create network VLAN device with specified VLAN tag.
+ *
+ * @param[in] vmwa
+ *   Context object initialized by mlx5_nl_vlan_vmwa_init().
+ * @param[in] ifindex
+ *   Base network interface index.
+ * @param[in] tag
+ *   VLAN tag for VLAN network device to create.
+ */
+uint32_t
+mlx5_nl_vlan_vmwa_create(struct mlx5_nl_vlan_vmwa_context *vmwa,
+			 uint32_t ifindex, uint16_t tag)
+{
+	struct nlmsghdr *nlh;
+	struct ifinfomsg *ifm;
+	char name[sizeof(MLX5_VMWA_VLAN_DEVICE_PFX) + 32];
+
+	alignas(RTE_CACHE_LINE_SIZE)
+	uint8_t buf[NLMSG_ALIGN(sizeof(struct nlmsghdr)) +
+		    NLMSG_ALIGN(sizeof(struct ifinfomsg)) +
+		    NLMSG_ALIGN(sizeof(struct nlattr)) * 8 +
+		    NLMSG_ALIGN(sizeof(uint32_t)) +
+		    NLMSG_ALIGN(sizeof(name)) +
+		    NLMSG_ALIGN(sizeof("vlan")) +
+		    NLMSG_ALIGN(sizeof(uint32_t)) +
+		    NLMSG_ALIGN(sizeof(uint16_t)) + 16];
+	struct nlattr *na_info;
+	struct nlattr *na_vlan;
+	uint32_t sn = MLX5_NL_SN_GENERATE;
+	int ret;
+
+	memset(buf, 0, sizeof(buf));
+	nlh = (struct nlmsghdr *)buf;
+	nlh->nlmsg_len = sizeof(struct nlmsghdr);
+	nlh->nlmsg_type = RTM_NEWLINK;
+	nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE |
+			   NLM_F_EXCL | NLM_F_ACK;
+	ifm = (struct ifinfomsg *)nl_msg_tail(nlh);
+	nlh->nlmsg_len += sizeof(struct ifinfomsg);
+	ifm->ifi_family = AF_UNSPEC;
+	ifm->ifi_type = 0;
+	ifm->ifi_index = 0;
+	ifm->ifi_flags = IFF_UP;
+	ifm->ifi_change = 0xffffffff;
+	nl_attr_put(nlh, IFLA_LINK, &ifindex, sizeof(ifindex));
+	ret = snprintf(name, sizeof(name), "%s.%u.%u",
+		       MLX5_VMWA_VLAN_DEVICE_PFX, ifindex, tag);
+	nl_attr_put(nlh, IFLA_IFNAME, name, ret + 1);
+	na_info = nl_attr_nest_start(nlh, IFLA_LINKINFO);
+	nl_attr_put(nlh, IFLA_INFO_KIND, "vlan", sizeof("vlan"));
+	na_vlan = nl_attr_nest_start(nlh, IFLA_INFO_DATA);
+	nl_attr_put(nlh, IFLA_VLAN_ID, &tag, sizeof(tag));
+	nl_attr_nest_end(nlh, na_vlan);
+	nl_attr_nest_end(nlh, na_info);
+	assert(sizeof(buf) >= nlh->nlmsg_len);
+	ret = mlx5_nl_send(vmwa->nl_socket, nlh, sn);
+	if (ret >= 0)
+		ret = mlx5_nl_recv(vmwa->nl_socket, sn, NULL, NULL);
+	if (ret < 0) {
+		DRV_LOG(WARNING, "netlink: VLAN %s create failure (%d)", name,
+			ret);
+	}
+	/* Try to get ifindex of created or pre-existing device. */
+	ret = if_nametoindex(name);
+	if (!ret) {
+		DRV_LOG(WARNING, "VLAN %s failed to get index (%d)", name,
+			errno);
+		return 0;
+	}
+	return ret;
+}
diff --git a/drivers/common/mlx5/mlx5_nl.h b/drivers/common/mlx5/mlx5_nl.h
new file mode 100644
index 0000000..8e66a98
--- /dev/null
+++ b/drivers/common/mlx5/mlx5_nl.h
@@ -0,0 +1,57 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2019 Mellanox Technologies, Ltd
+ */
+
+#ifndef RTE_PMD_MLX5_NL_H_
+#define RTE_PMD_MLX5_NL_H_
+
+#include <linux/netlink.h>
+
+#include <rte_ether.h>
+
+#include "mlx5_common.h"
+
+
+/* VLAN netdev for VLAN workaround. */
+struct mlx5_nl_vlan_dev {
+	uint32_t refcnt;
+	uint32_t ifindex; /**< Own interface index. */
+};
+
+/*
+ * Array of VLAN devices created on top of a VF,
+ * used as a workaround in virtual environments.
+ */
+struct mlx5_nl_vlan_vmwa_context {
+	int nl_socket;
+	uint32_t vf_ifindex;
+	struct mlx5_nl_vlan_dev vlan_dev[4096];
+};
+
+
+int mlx5_nl_init(int protocol);
+int mlx5_nl_mac_addr_add(int nlsk_fd, unsigned int iface_idx, uint64_t *mac_own,
+			 struct rte_ether_addr *mac, uint32_t index);
+int mlx5_nl_mac_addr_remove(int nlsk_fd, unsigned int iface_idx,
+			    uint64_t *mac_own, struct rte_ether_addr *mac,
+			    uint32_t index);
+void mlx5_nl_mac_addr_sync(int nlsk_fd, unsigned int iface_idx,
+			   struct rte_ether_addr *mac_addrs, int n);
+void mlx5_nl_mac_addr_flush(int nlsk_fd, unsigned int iface_idx,
+			    struct rte_ether_addr *mac_addrs, int n,
+			    uint64_t *mac_own);
+int mlx5_nl_promisc(int nlsk_fd, unsigned int iface_idx, int enable);
+int mlx5_nl_allmulti(int nlsk_fd, unsigned int iface_idx, int enable);
+unsigned int mlx5_nl_portnum(int nl, const char *name);
+unsigned int mlx5_nl_ifindex(int nl, const char *name, uint32_t pindex);
+int mlx5_nl_vf_mac_addr_modify(int nlsk_fd, unsigned int iface_idx,
+			       struct rte_ether_addr *mac, int vf_index);
+int mlx5_nl_switch_info(int nl, unsigned int ifindex,
+			struct mlx5_switch_info *info);
+
+void mlx5_nl_vlan_vmwa_delete(struct mlx5_nl_vlan_vmwa_context *vmwa,
+			      uint32_t ifindex);
+uint32_t mlx5_nl_vlan_vmwa_create(struct mlx5_nl_vlan_vmwa_context *vmwa,
+				  uint32_t ifindex, uint16_t tag);
+
+#endif /* RTE_PMD_MLX5_NL_H_ */
diff --git a/drivers/common/mlx5/rte_common_mlx5_version.map b/drivers/common/mlx5/rte_common_mlx5_version.map
index 3e7038b..f93f5cb 100644
--- a/drivers/common/mlx5/rte_common_mlx5_version.map
+++ b/drivers/common/mlx5/rte_common_mlx5_version.map
@@ -28,4 +28,20 @@ DPDK_20.02 {
 	mlx5_devx_get_out_command_status;
 
 	mlx5_dev_to_pci_addr;
+
+	mlx5_nl_allmulti;
+	mlx5_nl_ifindex;
+	mlx5_nl_init;
+	mlx5_nl_mac_addr_add;
+	mlx5_nl_mac_addr_flush;
+	mlx5_nl_mac_addr_remove;
+	mlx5_nl_mac_addr_sync;
+	mlx5_nl_portnum;
+	mlx5_nl_promisc;
+	mlx5_nl_switch_info;
+	mlx5_nl_vf_mac_addr_modify;
+	mlx5_nl_vlan_vmwa_create;
+	mlx5_nl_vlan_vmwa_delete;
+
+	mlx5_translate_port_name;
 };
diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index dc6b3c8..d26afbb 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -30,7 +30,6 @@ SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_flow_meter.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_flow_dv.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_flow_verbs.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_mp.c
-SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_nl.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_utils.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_socket.c
 
diff --git a/drivers/net/mlx5/meson.build b/drivers/net/mlx5/meson.build
index e10ef3a..d45be00 100644
--- a/drivers/net/mlx5/meson.build
+++ b/drivers/net/mlx5/meson.build
@@ -19,7 +19,6 @@ sources = files(
 	'mlx5_flow_verbs.c',
 	'mlx5_mac.c',
 	'mlx5_mr.c',
-	'mlx5_nl.c',
 	'mlx5_rss.c',
 	'mlx5_rxmode.c',
 	'mlx5_rxq.c',
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 9864aa7..a7e7089 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -35,11 +35,11 @@
 #include <mlx5_glue.h>
 #include <mlx5_devx_cmds.h>
 #include <mlx5_prm.h>
+#include <mlx5_nl.h>
 
 #include "mlx5_defs.h"
 #include "mlx5_utils.h"
 #include "mlx5_mr.h"
-#include "mlx5_nl.h"
 #include "mlx5_autoconf.h"
 
 /* Request types for IPC. */
diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h
index dc9b965..9b392ed 100644
--- a/drivers/net/mlx5/mlx5_defs.h
+++ b/drivers/net/mlx5/mlx5_defs.h
@@ -14,14 +14,6 @@
 /* Reported driver name. */
 #define MLX5_DRIVER_NAME "net_mlx5"
 
-/* Maximum number of simultaneous unicast MAC addresses. */
-#define MLX5_MAX_UC_MAC_ADDRESSES 128
-/* Maximum number of simultaneous Multicast MAC addresses. */
-#define MLX5_MAX_MC_MAC_ADDRESSES 128
-/* Maximum number of simultaneous MAC addresses. */
-#define MLX5_MAX_MAC_ADDRESSES \
-	(MLX5_MAX_UC_MAC_ADDRESSES + MLX5_MAX_MC_MAC_ADDRESSES)
-
 /* Maximum number of simultaneous VLAN filters. */
 #define MLX5_MAX_VLAN_IDS 128
 
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 5484104..b765636 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -1940,61 +1940,6 @@ struct mlx5_priv *
 }
 
 /**
- * Extract port name, as a number, from sysfs or netlink information.
- *
- * @param[in] port_name_in
- *   String representing the port name.
- * @param[out] port_info_out
- *   Port information, including port name as a number and port name
- *   type if recognized
- *
- * @return
- *   port_name field set according to recognized name format.
- */
-void
-mlx5_translate_port_name(const char *port_name_in,
-			 struct mlx5_switch_info *port_info_out)
-{
-	char pf_c1, pf_c2, vf_c1, vf_c2;
-	char *end;
-	int sc_items;
-
-	/*
-	 * Check for port-name as a string of the form pf0vf0
-	 * (support kernel ver >= 5.0 or OFED ver >= 4.6).
-	 */
-	sc_items = sscanf(port_name_in, "%c%c%d%c%c%d",
-			  &pf_c1, &pf_c2, &port_info_out->pf_num,
-			  &vf_c1, &vf_c2, &port_info_out->port_name);
-	if (sc_items == 6 &&
-	    pf_c1 == 'p' && pf_c2 == 'f' &&
-	    vf_c1 == 'v' && vf_c2 == 'f') {
-		port_info_out->name_type = MLX5_PHYS_PORT_NAME_TYPE_PFVF;
-		return;
-	}
-	/*
-	 * Check for port-name as a string of the form p0
-	 * (support kernel ver >= 5.0, or OFED ver >= 4.6).
-	 */
-	sc_items = sscanf(port_name_in, "%c%d",
-			  &pf_c1, &port_info_out->port_name);
-	if (sc_items == 2 && pf_c1 == 'p') {
-		port_info_out->name_type = MLX5_PHYS_PORT_NAME_TYPE_UPLINK;
-		return;
-	}
-	/* Check for port-name as a number (support kernel ver < 5.0 */
-	errno = 0;
-	port_info_out->port_name = strtol(port_name_in, &end, 0);
-	if (!errno &&
-	    (size_t)(end - port_name_in) == strlen(port_name_in)) {
-		port_info_out->name_type = MLX5_PHYS_PORT_NAME_TYPE_LEGACY;
-		return;
-	}
-	port_info_out->name_type = MLX5_PHYS_PORT_NAME_TYPE_UNKNOWN;
-	return;
-}
-
-/**
  * DPDK callback to retrieve plug-in module EEPROM information (type and size).
  *
  * @param dev
diff --git a/drivers/net/mlx5/mlx5_nl.c b/drivers/net/mlx5/mlx5_nl.c
deleted file mode 100644
index 6b8ca00..0000000
--- a/drivers/net/mlx5/mlx5_nl.c
+++ /dev/null
@@ -1,1338 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause
- * Copyright 2018 6WIND S.A.
- * Copyright 2018 Mellanox Technologies, Ltd
- */
-
-#include <errno.h>
-#include <linux/if_link.h>
-#include <linux/rtnetlink.h>
-#include <net/if.h>
-#include <rdma/rdma_netlink.h>
-#include <stdbool.h>
-#include <stdint.h>
-#include <stdlib.h>
-#include <stdalign.h>
-#include <string.h>
-#include <sys/socket.h>
-#include <unistd.h>
-
-#include <rte_errno.h>
-#include <rte_atomic.h>
-#include <rte_ether.h>
-
-#include "mlx5.h"
-#include "mlx5_nl.h"
-#include "mlx5_utils.h"
-
-/* Size of the buffer to receive kernel messages */
-#define MLX5_NL_BUF_SIZE (32 * 1024)
-/* Send buffer size for the Netlink socket */
-#define MLX5_SEND_BUF_SIZE 32768
-/* Receive buffer size for the Netlink socket */
-#define MLX5_RECV_BUF_SIZE 32768
-
-/** Parameters of VLAN devices created by driver. */
-#define MLX5_VMWA_VLAN_DEVICE_PFX "evmlx"
-/*
- * Define NDA_RTA as defined in iproute2 sources.
- *
- * see in iproute2 sources file include/libnetlink.h
- */
-#ifndef MLX5_NDA_RTA
-#define MLX5_NDA_RTA(r) \
-	((struct rtattr *)(((char *)(r)) + NLMSG_ALIGN(sizeof(struct ndmsg))))
-#endif
-/*
- * Define NLMSG_TAIL as defined in iproute2 sources.
- *
- * see in iproute2 sources file include/libnetlink.h
- */
-#ifndef NLMSG_TAIL
-#define NLMSG_TAIL(nmsg) \
-	((struct rtattr *)(((char *)(nmsg)) + NLMSG_ALIGN((nmsg)->nlmsg_len)))
-#endif
-/*
- * The following definitions are normally found in rdma/rdma_netlink.h,
- * however they are so recent that most systems do not expose them yet.
- */
-#ifndef HAVE_RDMA_NL_NLDEV
-#define RDMA_NL_NLDEV 5
-#endif
-#ifndef HAVE_RDMA_NLDEV_CMD_GET
-#define RDMA_NLDEV_CMD_GET 1
-#endif
-#ifndef HAVE_RDMA_NLDEV_CMD_PORT_GET
-#define RDMA_NLDEV_CMD_PORT_GET 5
-#endif
-#ifndef HAVE_RDMA_NLDEV_ATTR_DEV_INDEX
-#define RDMA_NLDEV_ATTR_DEV_INDEX 1
-#endif
-#ifndef HAVE_RDMA_NLDEV_ATTR_DEV_NAME
-#define RDMA_NLDEV_ATTR_DEV_NAME 2
-#endif
-#ifndef HAVE_RDMA_NLDEV_ATTR_PORT_INDEX
-#define RDMA_NLDEV_ATTR_PORT_INDEX 3
-#endif
-#ifndef HAVE_RDMA_NLDEV_ATTR_NDEV_INDEX
-#define RDMA_NLDEV_ATTR_NDEV_INDEX 50
-#endif
-
-/* These are normally found in linux/if_link.h. */
-#ifndef HAVE_IFLA_NUM_VF
-#define IFLA_NUM_VF 21
-#endif
-#ifndef HAVE_IFLA_EXT_MASK
-#define IFLA_EXT_MASK 29
-#endif
-#ifndef HAVE_IFLA_PHYS_SWITCH_ID
-#define IFLA_PHYS_SWITCH_ID 36
-#endif
-#ifndef HAVE_IFLA_PHYS_PORT_NAME
-#define IFLA_PHYS_PORT_NAME 38
-#endif
-
-/* Add/remove MAC address through Netlink */
-struct mlx5_nl_mac_addr {
-	struct rte_ether_addr (*mac)[];
-	/**< MAC address handled by the device. */
-	int mac_n; /**< Number of addresses in the array. */
-};
-
-#define MLX5_NL_CMD_GET_IB_NAME (1 << 0)
-#define MLX5_NL_CMD_GET_IB_INDEX (1 << 1)
-#define MLX5_NL_CMD_GET_NET_INDEX (1 << 2)
-#define MLX5_NL_CMD_GET_PORT_INDEX (1 << 3)
-
-/** Data structure used by mlx5_nl_cmdget_cb(). */
-struct mlx5_nl_ifindex_data {
-	const char *name; /**< IB device name (in). */
-	uint32_t flags; /**< found attribute flags (out). */
-	uint32_t ibindex; /**< IB device index (out). */
-	uint32_t ifindex; /**< Network interface index (out). */
-	uint32_t portnum; /**< IB device max port number (out). */
-};
-
-rte_atomic32_t atomic_sn = RTE_ATOMIC32_INIT(0);
-
-/* Generate Netlink sequence number. */
-#define MLX5_NL_SN_GENERATE ((uint32_t)rte_atomic32_add_return(&atomic_sn, 1))
-
-/**
- * Opens a Netlink socket.
- *
- * @param protocol
- *   Netlink protocol (e.g. NETLINK_ROUTE, NETLINK_RDMA).
- *
- * @return
- *   A file descriptor on success, a negative errno value otherwise and
- *   rte_errno is set.
- */
-int
-mlx5_nl_init(int protocol)
-{
-	int fd;
-	int sndbuf_size = MLX5_SEND_BUF_SIZE;
-	int rcvbuf_size = MLX5_RECV_BUF_SIZE;
-	struct sockaddr_nl local = {
-		.nl_family = AF_NETLINK,
-	};
-	int ret;
-
-	fd = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC, protocol);
-	if (fd == -1) {
-		rte_errno = errno;
-		return -rte_errno;
-	}
-	ret = setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &sndbuf_size, sizeof(int));
-	if (ret == -1) {
-		rte_errno = errno;
-		goto error;
-	}
-	ret = setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf_size, sizeof(int));
-	if (ret == -1) {
-		rte_errno = errno;
-		goto error;
-	}
-	ret = bind(fd, (struct sockaddr *)&local, sizeof(local));
-	if (ret == -1) {
-		rte_errno = errno;
-		goto error;
-	}
-	return fd;
-error:
-	close(fd);
-	return -rte_errno;
-}
-
-/**
- * Send a request message to the kernel on the Netlink socket.
- *
- * @param[in] nlsk_fd
- *   Netlink socket file descriptor.
- * @param[in] nh
- *   The Netlink message send to the kernel.
- * @param[in] ssn
- *   Sequence number.
- * @param[in] req
- *   Pointer to the request structure.
- * @param[in] len
- *   Length of the request in bytes.
- *
- * @return
- *   The number of sent bytes on success, a negative errno value otherwise and
- *   rte_errno is set.
- */
-static int
-mlx5_nl_request(int nlsk_fd, struct nlmsghdr *nh, uint32_t sn, void *req,
-		int len)
-{
-	struct sockaddr_nl sa = {
-		.nl_family = AF_NETLINK,
-	};
-	struct iovec iov[2] = {
-		{ .iov_base = nh, .iov_len = sizeof(*nh), },
-		{ .iov_base = req, .iov_len = len, },
-	};
-	struct msghdr msg = {
-		.msg_name = &sa,
-		.msg_namelen = sizeof(sa),
-		.msg_iov = iov,
-		.msg_iovlen = 2,
-	};
-	int send_bytes;
-
-	nh->nlmsg_pid = 0; /* communication with the kernel uses pid 0 */
-	nh->nlmsg_seq = sn;
-	send_bytes = sendmsg(nlsk_fd, &msg, 0);
-	if (send_bytes < 0) {
-		rte_errno = errno;
-		return -rte_errno;
-	}
-	return send_bytes;
-}
-
-/**
- * Send a message to the kernel on the Netlink socket.
- *
- * @param[in] nlsk_fd
- *   The Netlink socket file descriptor used for communication.
- * @param[in] nh
- *   The Netlink message send to the kernel.
- * @param[in] sn
- *   Sequence number.
- *
- * @return
- *   The number of sent bytes on success, a negative errno value otherwise and
- *   rte_errno is set.
- */
-static int
-mlx5_nl_send(int nlsk_fd, struct nlmsghdr *nh, uint32_t sn)
-{
-	struct sockaddr_nl sa = {
-		.nl_family = AF_NETLINK,
-	};
-	struct iovec iov = {
-		.iov_base = nh,
-		.iov_len = nh->nlmsg_len,
-	};
-	struct msghdr msg = {
-		.msg_name = &sa,
-		.msg_namelen = sizeof(sa),
-		.msg_iov = &iov,
-		.msg_iovlen = 1,
-	};
-	int send_bytes;
-
-	nh->nlmsg_pid = 0; /* communication with the kernel uses pid 0 */
-	nh->nlmsg_seq = sn;
-	send_bytes = sendmsg(nlsk_fd, &msg, 0);
-	if (send_bytes < 0) {
-		rte_errno = errno;
-		return -rte_errno;
-	}
-	return send_bytes;
-}
-
-/**
- * Receive a message from the kernel on the Netlink socket, following
- * mlx5_nl_send().
- *
- * @param[in] nlsk_fd
- *   The Netlink socket file descriptor used for communication.
- * @param[in] sn
- *   Sequence number.
- * @param[in] cb
- *   The callback function to call for each Netlink message received.
- * @param[in, out] arg
- *   Custom arguments for the callback.
- *
- * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
- */
-static int
-mlx5_nl_recv(int nlsk_fd, uint32_t sn, int (*cb)(struct nlmsghdr *, void *arg),
-	     void *arg)
-{
-	struct sockaddr_nl sa;
-	char buf[MLX5_RECV_BUF_SIZE];
-	struct iovec iov = {
-		.iov_base = buf,
-		.iov_len = sizeof(buf),
-	};
-	struct msghdr msg = {
-		.msg_name = &sa,
-		.msg_namelen = sizeof(sa),
-		.msg_iov = &iov,
-		/* One message at a time */
-		.msg_iovlen = 1,
-	};
-	int multipart = 0;
-	int ret = 0;
-
-	do {
-		struct nlmsghdr *nh;
-		int recv_bytes = 0;
-
-		do {
-			recv_bytes = recvmsg(nlsk_fd, &msg, 0);
-			if (recv_bytes == -1) {
-				rte_errno = errno;
-				return -rte_errno;
-			}
-			nh = (struct nlmsghdr *)buf;
-		} while (nh->nlmsg_seq != sn);
-		for (;
-		     NLMSG_OK(nh, (unsigned int)recv_bytes);
-		     nh = NLMSG_NEXT(nh, recv_bytes)) {
-			if (nh->nlmsg_type == NLMSG_ERROR) {
-				struct nlmsgerr *err_data = NLMSG_DATA(nh);
-
-				if (err_data->error < 0) {
-					rte_errno = -err_data->error;
-					return -rte_errno;
-				}
-				/* Ack message. */
-				return 0;
-			}
-			/* Multi-part msgs and their trailing DONE message. */
-			if (nh->nlmsg_flags & NLM_F_MULTI) {
-				if (nh->nlmsg_type == NLMSG_DONE)
-					return 0;
-				multipart = 1;
-			}
-			if (cb) {
-				ret = cb(nh, arg);
-				if (ret < 0)
-					return ret;
-			}
-		}
-	} while (multipart);
-	return ret;
-}
-
-/**
- * Parse Netlink message to retrieve the bridge MAC address.
- *
- * @param nh
- *   Pointer to Netlink Message Header.
- * @param arg
- *   PMD data register with this callback.
- *
- * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
- */
-static int
-mlx5_nl_mac_addr_cb(struct nlmsghdr *nh, void *arg)
-{
-	struct mlx5_nl_mac_addr *data = arg;
-	struct ndmsg *r = NLMSG_DATA(nh);
-	struct rtattr *attribute;
-	int len;
-
-	len = nh->nlmsg_len - NLMSG_LENGTH(sizeof(*r));
-	for (attribute = MLX5_NDA_RTA(r);
-	     RTA_OK(attribute, len);
-	     attribute = RTA_NEXT(attribute, len)) {
-		if (attribute->rta_type == NDA_LLADDR) {
-			if (data->mac_n == MLX5_MAX_MAC_ADDRESSES) {
-				DRV_LOG(WARNING,
-					"not enough room to finalize the"
-					" request");
-				rte_errno = ENOMEM;
-				return -rte_errno;
-			}
-#ifndef NDEBUG
-			char m[18];
-
-			rte_ether_format_addr(m, 18, RTA_DATA(attribute));
-			DRV_LOG(DEBUG, "bridge MAC address %s", m);
-#endif
-			memcpy(&(*data->mac)[data->mac_n++],
-			       RTA_DATA(attribute), RTE_ETHER_ADDR_LEN);
-		}
-	}
-	return 0;
-}
-
-/**
- * Get bridge MAC addresses.
- *
- * @param[in] nlsk_fd
- *   Netlink socket file descriptor.
- * @param[in] iface_idx
- *   Net device interface index.
- * @param mac[out]
- *   Pointer to the array table of MAC addresses to fill.
- *   Its size should be of MLX5_MAX_MAC_ADDRESSES.
- * @param mac_n[out]
- *   Number of entries filled in MAC array.
- *
- * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
- */
-static int
-mlx5_nl_mac_addr_list(int nlsk_fd, unsigned int iface_idx,
-		      struct rte_ether_addr (*mac)[], int *mac_n)
-{
-	struct {
-		struct nlmsghdr	hdr;
-		struct ifinfomsg ifm;
-	} req = {
-		.hdr = {
-			.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
-			.nlmsg_type = RTM_GETNEIGH,
-			.nlmsg_flags = NLM_F_DUMP | NLM_F_REQUEST,
-		},
-		.ifm = {
-			.ifi_family = PF_BRIDGE,
-			.ifi_index = iface_idx,
-		},
-	};
-	struct mlx5_nl_mac_addr data = {
-		.mac = mac,
-		.mac_n = 0,
-	};
-	uint32_t sn = MLX5_NL_SN_GENERATE;
-	int ret;
-
-	if (nlsk_fd == -1)
-		return 0;
-	ret = mlx5_nl_request(nlsk_fd, &req.hdr, sn, &req.ifm,
-			      sizeof(struct ifinfomsg));
-	if (ret < 0)
-		goto error;
-	ret = mlx5_nl_recv(nlsk_fd, sn, mlx5_nl_mac_addr_cb, &data);
-	if (ret < 0)
-		goto error;
-	*mac_n = data.mac_n;
-	return 0;
-error:
-	DRV_LOG(DEBUG, "Interface %u cannot retrieve MAC address list %s",
-		iface_idx, strerror(rte_errno));
-	return -rte_errno;
-}
-
-/**
- * Modify the MAC address neighbour table with Netlink.
- *
- * @param[in] nlsk_fd
- *   Netlink socket file descriptor.
- * @param[in] iface_idx
- *   Net device interface index.
- * @param mac
- *   MAC address to consider.
- * @param add
- *   1 to add the MAC address, 0 to remove the MAC address.
- *
- * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
- */
-static int
-mlx5_nl_mac_addr_modify(int nlsk_fd, unsigned int iface_idx,
-			struct rte_ether_addr *mac, int add)
-{
-	struct {
-		struct nlmsghdr hdr;
-		struct ndmsg ndm;
-		struct rtattr rta;
-		uint8_t buffer[RTE_ETHER_ADDR_LEN];
-	} req = {
-		.hdr = {
-			.nlmsg_len = NLMSG_LENGTH(sizeof(struct ndmsg)),
-			.nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE |
-				NLM_F_EXCL | NLM_F_ACK,
-			.nlmsg_type = add ? RTM_NEWNEIGH : RTM_DELNEIGH,
-		},
-		.ndm = {
-			.ndm_family = PF_BRIDGE,
-			.ndm_state = NUD_NOARP | NUD_PERMANENT,
-			.ndm_ifindex = iface_idx,
-			.ndm_flags = NTF_SELF,
-		},
-		.rta = {
-			.rta_type = NDA_LLADDR,
-			.rta_len = RTA_LENGTH(RTE_ETHER_ADDR_LEN),
-		},
-	};
-	uint32_t sn = MLX5_NL_SN_GENERATE;
-	int ret;
-
-	if (nlsk_fd == -1)
-		return 0;
-	memcpy(RTA_DATA(&req.rta), mac, RTE_ETHER_ADDR_LEN);
-	req.hdr.nlmsg_len = NLMSG_ALIGN(req.hdr.nlmsg_len) +
-		RTA_ALIGN(req.rta.rta_len);
-	ret = mlx5_nl_send(nlsk_fd, &req.hdr, sn);
-	if (ret < 0)
-		goto error;
-	ret = mlx5_nl_recv(nlsk_fd, sn, NULL, NULL);
-	if (ret < 0)
-		goto error;
-	return 0;
-error:
-	DRV_LOG(DEBUG,
-		"Interface %u cannot %s MAC address"
-		" %02X:%02X:%02X:%02X:%02X:%02X %s",
-		iface_idx,
-		add ? "add" : "remove",
-		mac->addr_bytes[0], mac->addr_bytes[1],
-		mac->addr_bytes[2], mac->addr_bytes[3],
-		mac->addr_bytes[4], mac->addr_bytes[5],
-		strerror(rte_errno));
-	return -rte_errno;
-}
-
-/**
- * Modify the VF MAC address neighbour table with Netlink.
- *
- * @param[in] nlsk_fd
- *   Netlink socket file descriptor.
- * @param[in] iface_idx
- *   Net device interface index.
- * @param mac
- *    MAC address to consider.
- * @param vf_index
- *    VF index.
- *
- * @return
- *    0 on success, a negative errno value otherwise and rte_errno is set.
- */
-int
-mlx5_nl_vf_mac_addr_modify(int nlsk_fd, unsigned int iface_idx,
-			   struct rte_ether_addr *mac, int vf_index)
-{
-	int ret;
-	struct {
-		struct nlmsghdr hdr;
-		struct ifinfomsg ifm;
-		struct rtattr vf_list_rta;
-		struct rtattr vf_info_rta;
-		struct rtattr vf_mac_rta;
-		struct ifla_vf_mac ivm;
-	} req = {
-		.hdr = {
-			.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
-			.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK,
-			.nlmsg_type = RTM_BASE,
-		},
-		.ifm = {
-			.ifi_index = iface_idx,
-		},
-		.vf_list_rta = {
-			.rta_type = IFLA_VFINFO_LIST,
-			.rta_len = RTA_ALIGN(RTA_LENGTH(0)),
-		},
-		.vf_info_rta = {
-			.rta_type = IFLA_VF_INFO,
-			.rta_len = RTA_ALIGN(RTA_LENGTH(0)),
-		},
-		.vf_mac_rta = {
-			.rta_type = IFLA_VF_MAC,
-		},
-	};
-	struct ifla_vf_mac ivm = {
-		.vf = vf_index,
-	};
-	uint32_t sn = MLX5_NL_SN_GENERATE;
-
-	memcpy(&ivm.mac, mac, RTE_ETHER_ADDR_LEN);
-	memcpy(RTA_DATA(&req.vf_mac_rta), &ivm, sizeof(ivm));
-
-	req.vf_mac_rta.rta_len = RTA_LENGTH(sizeof(ivm));
-	req.hdr.nlmsg_len = NLMSG_ALIGN(req.hdr.nlmsg_len) +
-		RTA_ALIGN(req.vf_list_rta.rta_len) +
-		RTA_ALIGN(req.vf_info_rta.rta_len) +
-		RTA_ALIGN(req.vf_mac_rta.rta_len);
-	req.vf_list_rta.rta_len = RTE_PTR_DIFF(NLMSG_TAIL(&req.hdr),
-					       &req.vf_list_rta);
-	req.vf_info_rta.rta_len = RTE_PTR_DIFF(NLMSG_TAIL(&req.hdr),
-					       &req.vf_info_rta);
-
-	if (nlsk_fd < 0)
-		return -1;
-	ret = mlx5_nl_send(nlsk_fd, &req.hdr, sn);
-	if (ret < 0)
-		goto error;
-	ret = mlx5_nl_recv(nlsk_fd, sn, NULL, NULL);
-	if (ret < 0)
-		goto error;
-	return 0;
-error:
-	DRV_LOG(ERR,
-		"representor %u cannot set VF MAC address "
-		"%02X:%02X:%02X:%02X:%02X:%02X : %s",
-		vf_index,
-		mac->addr_bytes[0], mac->addr_bytes[1],
-		mac->addr_bytes[2], mac->addr_bytes[3],
-		mac->addr_bytes[4], mac->addr_bytes[5],
-		strerror(rte_errno));
-	return -rte_errno;
-}
-
-/**
- * Add a MAC address.
- *
- * @param[in] nlsk_fd
- *   Netlink socket file descriptor.
- * @param[in] iface_idx
- *   Net device interface index.
- * @param mac_own
- *   BITFIELD_DECLARE array to store the mac.
- * @param mac
- *   MAC address to register.
- * @param index
- *   MAC address index.
- *
- * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
- */
-int
-mlx5_nl_mac_addr_add(int nlsk_fd, unsigned int iface_idx,
-		     uint64_t *mac_own, struct rte_ether_addr *mac,
-		     uint32_t index)
-{
-	int ret;
-
-	ret = mlx5_nl_mac_addr_modify(nlsk_fd, iface_idx, mac, 1);
-	if (!ret)
-		BITFIELD_SET(mac_own, index);
-	if (ret == -EEXIST)
-		return 0;
-	return ret;
-}
-
-/**
- * Remove a MAC address.
- *
- * @param[in] nlsk_fd
- *   Netlink socket file descriptor.
- * @param[in] iface_idx
- *   Net device interface index.
- * @param mac_own
- *   BITFIELD_DECLARE array to store the mac.
- * @param mac
- *   MAC address to remove.
- * @param index
- *   MAC address index.
- *
- * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
- */
-int
-mlx5_nl_mac_addr_remove(int nlsk_fd, unsigned int iface_idx, uint64_t *mac_own,
-			struct rte_ether_addr *mac, uint32_t index)
-{
-	BITFIELD_RESET(mac_own, index);
-	return mlx5_nl_mac_addr_modify(nlsk_fd, iface_idx, mac, 0);
-}
-
-/**
- * Synchronize Netlink bridge table to the internal table.
- *
- * @param[in] nlsk_fd
- *   Netlink socket file descriptor.
- * @param[in] iface_idx
- *   Net device interface index.
- * @param mac_addrs
- *   Mac addresses array to sync.
- * @param n
- *   @p mac_addrs array size.
- */
-void
-mlx5_nl_mac_addr_sync(int nlsk_fd, unsigned int iface_idx,
-		      struct rte_ether_addr *mac_addrs, int n)
-{
-	struct rte_ether_addr macs[n];
-	int macs_n = 0;
-	int i;
-	int ret;
-
-	ret = mlx5_nl_mac_addr_list(nlsk_fd, iface_idx, &macs, &macs_n);
-	if (ret)
-		return;
-	for (i = 0; i != macs_n; ++i) {
-		int j;
-
-		/* Verify the address is not in the array yet. */
-		for (j = 0; j != n; ++j)
-			if (rte_is_same_ether_addr(&macs[i], &mac_addrs[j]))
-				break;
-		if (j != n)
-			continue;
-		/* Find the first entry available. */
-		for (j = 0; j != n; ++j) {
-			if (rte_is_zero_ether_addr(&mac_addrs[j])) {
-				mac_addrs[j] = macs[i];
-				break;
-			}
-		}
-	}
-}
-
-/**
- * Flush all added MAC addresses.
- *
- * @param[in] nlsk_fd
- *   Netlink socket file descriptor.
- * @param[in] iface_idx
- *   Net device interface index.
- * @param[in] mac_addrs
- *   Mac addresses array to flush.
- * @param n
- *   @p mac_addrs array size.
- * @param mac_own
- *   BITFIELD_DECLARE array to store the mac.
- */
-void
-mlx5_nl_mac_addr_flush(int nlsk_fd, unsigned int iface_idx,
-		       struct rte_ether_addr *mac_addrs, int n,
-		       uint64_t *mac_own)
-{
-	int i;
-
-	for (i = n - 1; i >= 0; --i) {
-		struct rte_ether_addr *m = &mac_addrs[i];
-
-		if (BITFIELD_ISSET(mac_own, i))
-			mlx5_nl_mac_addr_remove(nlsk_fd, iface_idx, mac_own, m,
-						i);
-	}
-}
-
-/**
- * Enable promiscuous / all multicast mode through Netlink.
- *
- * @param[in] nlsk_fd
- *   Netlink socket file descriptor.
- * @param[in] iface_idx
- *   Net device interface index.
- * @param flags
- *   IFF_PROMISC for promiscuous, IFF_ALLMULTI for allmulti.
- * @param enable
- *   Nonzero to enable, disable otherwise.
- *
- * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
- */
-static int
-mlx5_nl_device_flags(int nlsk_fd, unsigned int iface_idx, uint32_t flags,
-		     int enable)
-{
-	struct {
-		struct nlmsghdr hdr;
-		struct ifinfomsg ifi;
-	} req = {
-		.hdr = {
-			.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
-			.nlmsg_type = RTM_NEWLINK,
-			.nlmsg_flags = NLM_F_REQUEST,
-		},
-		.ifi = {
-			.ifi_flags = enable ? flags : 0,
-			.ifi_change = flags,
-			.ifi_index = iface_idx,
-		},
-	};
-	uint32_t sn = MLX5_NL_SN_GENERATE;
-	int ret;
-
-	assert(!(flags & ~(IFF_PROMISC | IFF_ALLMULTI)));
-	if (nlsk_fd < 0)
-		return 0;
-	ret = mlx5_nl_send(nlsk_fd, &req.hdr, sn);
-	if (ret < 0)
-		return ret;
-	return 0;
-}
-
-/**
- * Enable promiscuous mode through Netlink.
- *
- * @param[in] nlsk_fd
- *   Netlink socket file descriptor.
- * @param[in] iface_idx
- *   Net device interface index.
- * @param enable
- *   Nonzero to enable, disable otherwise.
- *
- * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
- */
-int
-mlx5_nl_promisc(int nlsk_fd, unsigned int iface_idx, int enable)
-{
-	int ret = mlx5_nl_device_flags(nlsk_fd, iface_idx, IFF_PROMISC, enable);
-
-	if (ret)
-		DRV_LOG(DEBUG,
-			"Interface %u cannot %s promisc mode: Netlink error %s",
-			iface_idx, enable ? "enable" : "disable",
-			strerror(rte_errno));
-	return ret;
-}
-
-/**
- * Enable all multicast mode through Netlink.
- *
- * @param[in] nlsk_fd
- *   Netlink socket file descriptor.
- * @param[in] iface_idx
- *   Net device interface index.
- * @param enable
- *   Nonzero to enable, disable otherwise.
- *
- * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
- */
-int
-mlx5_nl_allmulti(int nlsk_fd, unsigned int iface_idx, int enable)
-{
-	int ret = mlx5_nl_device_flags(nlsk_fd, iface_idx, IFF_ALLMULTI,
-				       enable);
-
-	if (ret)
-		DRV_LOG(DEBUG,
-			"Interface %u cannot %s allmulti : Netlink error %s",
-			iface_idx, enable ? "enable" : "disable",
-			strerror(rte_errno));
-	return ret;
-}
-
-/**
- * Process network interface information from Netlink message.
- *
- * @param nh
- *   Pointer to Netlink message header.
- * @param arg
- *   Opaque data pointer for this callback.
- *
- * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
- */
-static int
-mlx5_nl_cmdget_cb(struct nlmsghdr *nh, void *arg)
-{
-	struct mlx5_nl_ifindex_data *data = arg;
-	struct mlx5_nl_ifindex_data local = {
-		.flags = 0,
-	};
-	size_t off = NLMSG_HDRLEN;
-
-	if (nh->nlmsg_type !=
-	    RDMA_NL_GET_TYPE(RDMA_NL_NLDEV, RDMA_NLDEV_CMD_GET) &&
-	    nh->nlmsg_type !=
-	    RDMA_NL_GET_TYPE(RDMA_NL_NLDEV, RDMA_NLDEV_CMD_PORT_GET))
-		goto error;
-	while (off < nh->nlmsg_len) {
-		struct nlattr *na = (void *)((uintptr_t)nh + off);
-		void *payload = (void *)((uintptr_t)na + NLA_HDRLEN);
-
-		if (na->nla_len > nh->nlmsg_len - off)
-			goto error;
-		switch (na->nla_type) {
-		case RDMA_NLDEV_ATTR_DEV_INDEX:
-			local.ibindex = *(uint32_t *)payload;
-			local.flags |= MLX5_NL_CMD_GET_IB_INDEX;
-			break;
-		case RDMA_NLDEV_ATTR_DEV_NAME:
-			if (!strcmp(payload, data->name))
-				local.flags |= MLX5_NL_CMD_GET_IB_NAME;
-			break;
-		case RDMA_NLDEV_ATTR_NDEV_INDEX:
-			local.ifindex = *(uint32_t *)payload;
-			local.flags |= MLX5_NL_CMD_GET_NET_INDEX;
-			break;
-		case RDMA_NLDEV_ATTR_PORT_INDEX:
-			local.portnum = *(uint32_t *)payload;
-			local.flags |= MLX5_NL_CMD_GET_PORT_INDEX;
-			break;
-		default:
-			break;
-		}
-		off += NLA_ALIGN(na->nla_len);
-	}
-	/*
-	 * It is possible to have multiple messages for all
-	 * Infiniband devices in the system with appropriate name.
-	 * So we should gather parameters locally and copy to
-	 * query context only in case of coinciding device name.
-	 */
-	if (local.flags & MLX5_NL_CMD_GET_IB_NAME) {
-		data->flags = local.flags;
-		data->ibindex = local.ibindex;
-		data->ifindex = local.ifindex;
-		data->portnum = local.portnum;
-	}
-	return 0;
-error:
-	rte_errno = EINVAL;
-	return -rte_errno;
-}
-
-/**
- * Get index of network interface associated with some IB device.
- *
- * This is the only somewhat safe method to avoid resorting to heuristics
- * when faced with port representors. Unfortunately it requires at least
- * Linux 4.17.
- *
- * @param nl
- *   Netlink socket of the RDMA kind (NETLINK_RDMA).
- * @param[in] name
- *   IB device name.
- * @param[in] pindex
- *   IB device port index, starting from 1
- * @return
- *   A valid (nonzero) interface index on success, 0 otherwise and rte_errno
- *   is set.
- */
-unsigned int
-mlx5_nl_ifindex(int nl, const char *name, uint32_t pindex)
-{
-	struct mlx5_nl_ifindex_data data = {
-		.name = name,
-		.flags = 0,
-		.ibindex = 0, /* Determined during first pass. */
-		.ifindex = 0, /* Determined during second pass. */
-	};
-	union {
-		struct nlmsghdr nh;
-		uint8_t buf[NLMSG_HDRLEN +
-			    NLA_HDRLEN + NLA_ALIGN(sizeof(data.ibindex)) +
-			    NLA_HDRLEN + NLA_ALIGN(sizeof(pindex))];
-	} req = {
-		.nh = {
-			.nlmsg_len = NLMSG_LENGTH(0),
-			.nlmsg_type = RDMA_NL_GET_TYPE(RDMA_NL_NLDEV,
-						       RDMA_NLDEV_CMD_GET),
-			.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK | NLM_F_DUMP,
-		},
-	};
-	struct nlattr *na;
-	uint32_t sn = MLX5_NL_SN_GENERATE;
-	int ret;
-
-	ret = mlx5_nl_send(nl, &req.nh, sn);
-	if (ret < 0)
-		return 0;
-	ret = mlx5_nl_recv(nl, sn, mlx5_nl_cmdget_cb, &data);
-	if (ret < 0)
-		return 0;
-	if (!(data.flags & MLX5_NL_CMD_GET_IB_NAME) ||
-	    !(data.flags & MLX5_NL_CMD_GET_IB_INDEX))
-		goto error;
-	data.flags = 0;
-	sn = MLX5_NL_SN_GENERATE;
-	req.nh.nlmsg_type = RDMA_NL_GET_TYPE(RDMA_NL_NLDEV,
-					     RDMA_NLDEV_CMD_PORT_GET);
-	req.nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
-	req.nh.nlmsg_len = NLMSG_LENGTH(sizeof(req.buf) - NLMSG_HDRLEN);
-	na = (void *)((uintptr_t)req.buf + NLMSG_HDRLEN);
-	na->nla_len = NLA_HDRLEN + sizeof(data.ibindex);
-	na->nla_type = RDMA_NLDEV_ATTR_DEV_INDEX;
-	memcpy((void *)((uintptr_t)na + NLA_HDRLEN),
-	       &data.ibindex, sizeof(data.ibindex));
-	na = (void *)((uintptr_t)na + NLA_ALIGN(na->nla_len));
-	na->nla_len = NLA_HDRLEN + sizeof(pindex);
-	na->nla_type = RDMA_NLDEV_ATTR_PORT_INDEX;
-	memcpy((void *)((uintptr_t)na + NLA_HDRLEN),
-	       &pindex, sizeof(pindex));
-	ret = mlx5_nl_send(nl, &req.nh, sn);
-	if (ret < 0)
-		return 0;
-	ret = mlx5_nl_recv(nl, sn, mlx5_nl_cmdget_cb, &data);
-	if (ret < 0)
-		return 0;
-	if (!(data.flags & MLX5_NL_CMD_GET_IB_NAME) ||
-	    !(data.flags & MLX5_NL_CMD_GET_IB_INDEX) ||
-	    !(data.flags & MLX5_NL_CMD_GET_NET_INDEX) ||
-	    !data.ifindex)
-		goto error;
-	return data.ifindex;
-error:
-	rte_errno = ENODEV;
-	return 0;
-}
-
-/**
- * Get the number of physical ports of given IB device.
- *
- * @param nl
- *   Netlink socket of the RDMA kind (NETLINK_RDMA).
- * @param[in] name
- *   IB device name.
- *
- * @return
- *   A valid (nonzero) number of ports on success, 0 otherwise
- *   and rte_errno is set.
- */
-unsigned int
-mlx5_nl_portnum(int nl, const char *name)
-{
-	struct mlx5_nl_ifindex_data data = {
-		.flags = 0,
-		.name = name,
-		.ifindex = 0,
-		.portnum = 0,
-	};
-	struct nlmsghdr req = {
-		.nlmsg_len = NLMSG_LENGTH(0),
-		.nlmsg_type = RDMA_NL_GET_TYPE(RDMA_NL_NLDEV,
-					       RDMA_NLDEV_CMD_GET),
-		.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK | NLM_F_DUMP,
-	};
-	uint32_t sn = MLX5_NL_SN_GENERATE;
-	int ret;
-
-	ret = mlx5_nl_send(nl, &req, sn);
-	if (ret < 0)
-		return 0;
-	ret = mlx5_nl_recv(nl, sn, mlx5_nl_cmdget_cb, &data);
-	if (ret < 0)
-		return 0;
-	if (!(data.flags & MLX5_NL_CMD_GET_IB_NAME) ||
-	    !(data.flags & MLX5_NL_CMD_GET_IB_INDEX) ||
-	    !(data.flags & MLX5_NL_CMD_GET_PORT_INDEX)) {
-		rte_errno = ENODEV;
-		return 0;
-	}
-	if (!data.portnum)
-		rte_errno = EINVAL;
-	return data.portnum;
-}
-
-/**
- * Analyze gathered port parameters via Netlink to recognize master
- * and representor devices for E-Switch configuration.
- *
- * @param[in] num_vf_set
- *   flag of presence of number of VFs port attribute.
- * @param[inout] switch_info
- *   Port information, including port name as a number and port name
- *   type if recognized
- *
- * @return
- *   master and representor flags are set in switch_info according to
- *   recognized parameters (if any).
- */
-static void
-mlx5_nl_check_switch_info(bool num_vf_set,
-			  struct mlx5_switch_info *switch_info)
-{
-	switch (switch_info->name_type) {
-	case MLX5_PHYS_PORT_NAME_TYPE_UNKNOWN:
-		/*
-		 * Name is not recognized, assume the master,
-		 * check the number of VFs key presence.
-		 */
-		switch_info->master = num_vf_set;
-		break;
-	case MLX5_PHYS_PORT_NAME_TYPE_NOTSET:
-		/*
-		 * Name is not set, this assumes the legacy naming
-		 * schema for master, just check if there is a
-		 * number of VFs key.
-		 */
-		switch_info->master = num_vf_set;
-		break;
-	case MLX5_PHYS_PORT_NAME_TYPE_UPLINK:
-		/* New uplink naming schema recognized. */
-		switch_info->master = 1;
-		break;
-	case MLX5_PHYS_PORT_NAME_TYPE_LEGACY:
-		/* Legacy representors naming schema. */
-		switch_info->representor = !num_vf_set;
-		break;
-	case MLX5_PHYS_PORT_NAME_TYPE_PFVF:
-		/* New representors naming schema. */
-		switch_info->representor = 1;
-		break;
-	}
-}
-
-/**
- * Process switch information from Netlink message.
- *
- * @param nh
- *   Pointer to Netlink message header.
- * @param arg
- *   Opaque data pointer for this callback.
- *
- * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
- */
-static int
-mlx5_nl_switch_info_cb(struct nlmsghdr *nh, void *arg)
-{
-	struct mlx5_switch_info info = {
-		.master = 0,
-		.representor = 0,
-		.name_type = MLX5_PHYS_PORT_NAME_TYPE_NOTSET,
-		.port_name = 0,
-		.switch_id = 0,
-	};
-	size_t off = NLMSG_LENGTH(sizeof(struct ifinfomsg));
-	bool switch_id_set = false;
-	bool num_vf_set = false;
-
-	if (nh->nlmsg_type != RTM_NEWLINK)
-		goto error;
-	while (off < nh->nlmsg_len) {
-		struct rtattr *ra = (void *)((uintptr_t)nh + off);
-		void *payload = RTA_DATA(ra);
-		unsigned int i;
-
-		if (ra->rta_len > nh->nlmsg_len - off)
-			goto error;
-		switch (ra->rta_type) {
-		case IFLA_NUM_VF:
-			num_vf_set = true;
-			break;
-		case IFLA_PHYS_PORT_NAME:
-			mlx5_translate_port_name((char *)payload, &info);
-			break;
-		case IFLA_PHYS_SWITCH_ID:
-			info.switch_id = 0;
-			for (i = 0; i < RTA_PAYLOAD(ra); ++i) {
-				info.switch_id <<= 8;
-				info.switch_id |= ((uint8_t *)payload)[i];
-			}
-			switch_id_set = true;
-			break;
-		}
-		off += RTA_ALIGN(ra->rta_len);
-	}
-	if (switch_id_set) {
-		/* We have some E-Switch configuration. */
-		mlx5_nl_check_switch_info(num_vf_set, &info);
-	}
-	assert(!(info.master && info.representor));
-	memcpy(arg, &info, sizeof(info));
-	return 0;
-error:
-	rte_errno = EINVAL;
-	return -rte_errno;
-}
-
-/**
- * Get switch information associated with network interface.
- *
- * @param nl
- *   Netlink socket of the ROUTE kind (NETLINK_ROUTE).
- * @param ifindex
- *   Network interface index.
- * @param[out] info
- *   Switch information object, populated in case of success.
- *
- * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
- */
-int
-mlx5_nl_switch_info(int nl, unsigned int ifindex,
-		    struct mlx5_switch_info *info)
-{
-	struct {
-		struct nlmsghdr nh;
-		struct ifinfomsg info;
-		struct rtattr rta;
-		uint32_t extmask;
-	} req = {
-		.nh = {
-			.nlmsg_len = NLMSG_LENGTH
-					(sizeof(req.info) +
-					 RTA_LENGTH(sizeof(uint32_t))),
-			.nlmsg_type = RTM_GETLINK,
-			.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK,
-		},
-		.info = {
-			.ifi_family = AF_UNSPEC,
-			.ifi_index = ifindex,
-		},
-		.rta = {
-			.rta_type = IFLA_EXT_MASK,
-			.rta_len = RTA_LENGTH(sizeof(int32_t)),
-		},
-		.extmask = RTE_LE32(1),
-	};
-	uint32_t sn = MLX5_NL_SN_GENERATE;
-	int ret;
-
-	ret = mlx5_nl_send(nl, &req.nh, sn);
-	if (ret >= 0)
-		ret = mlx5_nl_recv(nl, sn, mlx5_nl_switch_info_cb, info);
-	if (info->master && info->representor) {
-		DRV_LOG(ERR, "ifindex %u device is recognized as master"
-			     " and as representor", ifindex);
-		rte_errno = ENODEV;
-		ret = -rte_errno;
-	}
-	return ret;
-}
-
-/*
- * Delete VLAN network device by ifindex.
- *
- * @param[in] tcf
- *   Context object initialized by mlx5_nl_vlan_vmwa_init().
- * @param[in] ifindex
- *   Interface index of network device to delete.
- */
-void
-mlx5_nl_vlan_vmwa_delete(struct mlx5_nl_vlan_vmwa_context *vmwa,
-		      uint32_t ifindex)
-{
-	uint32_t sn = MLX5_NL_SN_GENERATE;
-	int ret;
-	struct {
-		struct nlmsghdr nh;
-		struct ifinfomsg info;
-	} req = {
-		.nh = {
-			.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
-			.nlmsg_type = RTM_DELLINK,
-			.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK,
-		},
-		.info = {
-			.ifi_family = AF_UNSPEC,
-			.ifi_index = ifindex,
-		},
-	};
-
-	if (ifindex) {
-		ret = mlx5_nl_send(vmwa->nl_socket, &req.nh, sn);
-		if (ret >= 0)
-			ret = mlx5_nl_recv(vmwa->nl_socket, sn, NULL, NULL);
-		if (ret < 0)
-			DRV_LOG(WARNING, "netlink: error deleting VLAN WA"
-				" ifindex %u, %d", ifindex, ret);
-	}
-}
-
-/* Set of subroutines to build Netlink message. */
-static struct nlattr *
-nl_msg_tail(struct nlmsghdr *nlh)
-{
-	return (struct nlattr *)
-		(((uint8_t *)nlh) + NLMSG_ALIGN(nlh->nlmsg_len));
-}
-
-static void
-nl_attr_put(struct nlmsghdr *nlh, int type, const void *data, int alen)
-{
-	struct nlattr *nla = nl_msg_tail(nlh);
-
-	nla->nla_type = type;
-	nla->nla_len = NLMSG_ALIGN(sizeof(struct nlattr) + alen);
-	nlh->nlmsg_len = NLMSG_ALIGN(nlh->nlmsg_len) + nla->nla_len;
-
-	if (alen)
-		memcpy((uint8_t *)nla + sizeof(struct nlattr), data, alen);
-}
-
-static struct nlattr *
-nl_attr_nest_start(struct nlmsghdr *nlh, int type)
-{
-	struct nlattr *nest = (struct nlattr *)nl_msg_tail(nlh);
-
-	nl_attr_put(nlh, type, NULL, 0);
-	return nest;
-}
-
-static void
-nl_attr_nest_end(struct nlmsghdr *nlh, struct nlattr *nest)
-{
-	nest->nla_len = (uint8_t *)nl_msg_tail(nlh) - (uint8_t *)nest;
-}
-
-/*
- * Create network VLAN device with specified VLAN tag.
- *
- * @param[in] tcf
- *   Context object initialized by mlx5_nl_vlan_vmwa_init().
- * @param[in] ifindex
- *   Base network interface index.
- * @param[in] tag
- *   VLAN tag for VLAN network device to create.
- */
-uint32_t
-mlx5_nl_vlan_vmwa_create(struct mlx5_nl_vlan_vmwa_context *vmwa,
-			 uint32_t ifindex, uint16_t tag)
-{
-	struct nlmsghdr *nlh;
-	struct ifinfomsg *ifm;
-	char name[sizeof(MLX5_VMWA_VLAN_DEVICE_PFX) + 32];
-
-	alignas(RTE_CACHE_LINE_SIZE)
-	uint8_t buf[NLMSG_ALIGN(sizeof(struct nlmsghdr)) +
-		    NLMSG_ALIGN(sizeof(struct ifinfomsg)) +
-		    NLMSG_ALIGN(sizeof(struct nlattr)) * 8 +
-		    NLMSG_ALIGN(sizeof(uint32_t)) +
-		    NLMSG_ALIGN(sizeof(name)) +
-		    NLMSG_ALIGN(sizeof("vlan")) +
-		    NLMSG_ALIGN(sizeof(uint32_t)) +
-		    NLMSG_ALIGN(sizeof(uint16_t)) + 16];
-	struct nlattr *na_info;
-	struct nlattr *na_vlan;
-	uint32_t sn = MLX5_NL_SN_GENERATE;
-	int ret;
-
-	memset(buf, 0, sizeof(buf));
-	nlh = (struct nlmsghdr *)buf;
-	nlh->nlmsg_len = sizeof(struct nlmsghdr);
-	nlh->nlmsg_type = RTM_NEWLINK;
-	nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE |
-			   NLM_F_EXCL | NLM_F_ACK;
-	ifm = (struct ifinfomsg *)nl_msg_tail(nlh);
-	nlh->nlmsg_len += sizeof(struct ifinfomsg);
-	ifm->ifi_family = AF_UNSPEC;
-	ifm->ifi_type = 0;
-	ifm->ifi_index = 0;
-	ifm->ifi_flags = IFF_UP;
-	ifm->ifi_change = 0xffffffff;
-	nl_attr_put(nlh, IFLA_LINK, &ifindex, sizeof(ifindex));
-	ret = snprintf(name, sizeof(name), "%s.%u.%u",
-		       MLX5_VMWA_VLAN_DEVICE_PFX, ifindex, tag);
-	nl_attr_put(nlh, IFLA_IFNAME, name, ret + 1);
-	na_info = nl_attr_nest_start(nlh, IFLA_LINKINFO);
-	nl_attr_put(nlh, IFLA_INFO_KIND, "vlan", sizeof("vlan"));
-	na_vlan = nl_attr_nest_start(nlh, IFLA_INFO_DATA);
-	nl_attr_put(nlh, IFLA_VLAN_ID, &tag, sizeof(tag));
-	nl_attr_nest_end(nlh, na_vlan);
-	nl_attr_nest_end(nlh, na_info);
-	assert(sizeof(buf) >= nlh->nlmsg_len);
-	ret = mlx5_nl_send(vmwa->nl_socket, nlh, sn);
-	if (ret >= 0)
-		ret = mlx5_nl_recv(vmwa->nl_socket, sn, NULL, NULL);
-	if (ret < 0) {
-		DRV_LOG(WARNING, "netlink: VLAN %s create failure (%d)", name,
-			ret);
-	}
-	/* Try to get ifindex of created or pre-existing device. */
-	ret = if_nametoindex(name);
-	if (!ret) {
-		DRV_LOG(WARNING, "VLAN %s failed to get index (%d)", name,
-			errno);
-		return 0;
-	}
-	return ret;
-}
diff --git a/drivers/net/mlx5/mlx5_nl.h b/drivers/net/mlx5/mlx5_nl.h
deleted file mode 100644
index 9be87c0..0000000
--- a/drivers/net/mlx5/mlx5_nl.h
+++ /dev/null
@@ -1,72 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause
- * Copyright 2019 Mellanox Technologies, Ltd
- */
-
-#ifndef RTE_PMD_MLX5_NL_H_
-#define RTE_PMD_MLX5_NL_H_
-
-#include <linux/netlink.h>
-
-
-/* Recognized Infiniband device physical port name types. */
-enum mlx5_nl_phys_port_name_type {
-	MLX5_PHYS_PORT_NAME_TYPE_NOTSET = 0, /* Not set. */
-	MLX5_PHYS_PORT_NAME_TYPE_LEGACY, /* before kernel ver < 5.0 */
-	MLX5_PHYS_PORT_NAME_TYPE_UPLINK, /* p0, kernel ver >= 5.0 */
-	MLX5_PHYS_PORT_NAME_TYPE_PFVF, /* pf0vf0, kernel ver >= 5.0 */
-	MLX5_PHYS_PORT_NAME_TYPE_UNKNOWN, /* Unrecognized. */
-};
-
-/** Switch information returned by mlx5_nl_switch_info(). */
-struct mlx5_switch_info {
-	uint32_t master:1; /**< Master device. */
-	uint32_t representor:1; /**< Representor device. */
-	enum mlx5_nl_phys_port_name_type name_type; /** < Port name type. */
-	int32_t pf_num; /**< PF number (valid for pfxvfx format only). */
-	int32_t port_name; /**< Representor port name. */
-	uint64_t switch_id; /**< Switch identifier. */
-};
-
-/* VLAN netdev for VLAN workaround. */
-struct mlx5_nl_vlan_dev {
-	uint32_t refcnt;
-	uint32_t ifindex; /**< Own interface index. */
-};
-
-/*
- * Array of VLAN devices created on the base of VF
- * used for workaround in virtual environments.
- */
-struct mlx5_nl_vlan_vmwa_context {
-	int nl_socket;
-	uint32_t vf_ifindex;
-	struct mlx5_nl_vlan_dev vlan_dev[4096];
-};
-
-
-int mlx5_nl_init(int protocol);
-int mlx5_nl_mac_addr_add(int nlsk_fd, unsigned int iface_idx, uint64_t *mac_own,
-			 struct rte_ether_addr *mac, uint32_t index);
-int mlx5_nl_mac_addr_remove(int nlsk_fd, unsigned int iface_idx,
-			    uint64_t *mac_own, struct rte_ether_addr *mac,
-			    uint32_t index);
-void mlx5_nl_mac_addr_sync(int nlsk_fd, unsigned int iface_idx,
-			   struct rte_ether_addr *mac_addrs, int n);
-void mlx5_nl_mac_addr_flush(int nlsk_fd, unsigned int iface_idx,
-			    struct rte_ether_addr *mac_addrs, int n,
-			    uint64_t *mac_own);
-int mlx5_nl_promisc(int nlsk_fd, unsigned int iface_idx, int enable);
-int mlx5_nl_allmulti(int nlsk_fd, unsigned int iface_idx, int enable);
-unsigned int mlx5_nl_portnum(int nl, const char *name);
-unsigned int mlx5_nl_ifindex(int nl, const char *name, uint32_t pindex);
-int mlx5_nl_vf_mac_addr_modify(int nlsk_fd, unsigned int iface_idx,
-			       struct rte_ether_addr *mac, int vf_index);
-int mlx5_nl_switch_info(int nl, unsigned int ifindex,
-			struct mlx5_switch_info *info);
-
-void mlx5_nl_vlan_vmwa_delete(struct mlx5_nl_vlan_vmwa_context *vmwa,
-			      uint32_t ifindex);
-uint32_t mlx5_nl_vlan_vmwa_create(struct mlx5_nl_vlan_vmwa_context *vmwa,
-				  uint32_t ifindex, uint16_t tag);
-
-#endif /* RTE_PMD_MLX5_NL_H_ */
diff --git a/drivers/net/mlx5/mlx5_vlan.c b/drivers/net/mlx5/mlx5_vlan.c
index fc1a91c..8e63b67 100644
--- a/drivers/net/mlx5/mlx5_vlan.c
+++ b/drivers/net/mlx5/mlx5_vlan.c
@@ -33,11 +33,11 @@
 
 #include <mlx5_glue.h>
 #include <mlx5_devx_cmds.h>
+#include <mlx5_nl.h>
 
 #include "mlx5.h"
 #include "mlx5_autoconf.h"
 #include "mlx5_rxtx.h"
-#include "mlx5_nl.h"
 #include "mlx5_utils.h"
 
 /**
-- 
1.8.3.1
* [dpdk-dev] [PATCH v4 25/25] common/mlx5: support ROCE disable through Netlink
  2020-01-29 12:38     ` [dpdk-dev] [PATCH v4 00/25] Introduce mlx5 common library Matan Azrad
                         ` (23 preceding siblings ...)
  2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 24/25] common/mlx5: share Netlink commands Matan Azrad
@ 2020-01-29 12:38       ` Matan Azrad
  2020-01-30 12:26       ` [dpdk-dev] [PATCH v4 00/25] Introduce mlx5 common library Raslan Darawsheh
  25 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-01-29 12:38 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Raslan Darawsheh
Add 4 new Netlink commands to support enabling and disabling ROCE:
        1. mlx5_nl_devlink_family_id_get - to get the Devlink generic
           Netlink family ID.
        2. mlx5_nl_enable_roce_get - to get the current ROCE status.
        3. mlx5_nl_driver_reload - to reload the device kernel driver.
        4. mlx5_nl_enable_roce_set - to set the ROCE status.
When the user changes the ROCE status, the IB device may disappear and
appear again, so the DPDK driver should wait for it and restart itself.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
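Note for reviewers (not part of the commit): the family ID lookup and the
nl_attr_put() change in this patch both depend on how Netlink attributes
are laid out. nla_len holds the unpadded attribute length, while the
message length advances by the padded length. The sketch below is only an
illustration under that assumption. The sketch_* names are hypothetical
and are not part of mlx5_nl.c, and the length update aligns nlmsg_len
first, which is a small variation on the patched helper. It builds a
CTRL_CMD_GETFAMILY request in a caller-provided, 4-byte aligned buffer and
walks the attributes back.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <linux/netlink.h>
#include <linux/genetlink.h>

/* Hypothetical stand-in for the driver's nl_msg_tail(). */
static struct nlattr *
sketch_msg_tail(struct nlmsghdr *nlh)
{
	return (struct nlattr *)((uint8_t *)nlh + NLMSG_ALIGN(nlh->nlmsg_len));
}

/*
 * Append one attribute. nla_len is the unpadded length (header + payload);
 * nlmsg_len grows by the padded length so the next attribute stays aligned.
 */
static void
sketch_attr_put(struct nlmsghdr *nlh, int type, const void *data, size_t alen)
{
	struct nlattr *nla = sketch_msg_tail(nlh);

	nla->nla_type = (uint16_t)type;
	nla->nla_len = (uint16_t)(NLA_HDRLEN + alen);
	nlh->nlmsg_len = NLMSG_ALIGN(nlh->nlmsg_len) + NLA_ALIGN(nla->nla_len);
	if (alen)
		memcpy((uint8_t *)nla + NLA_HDRLEN, data, alen);
}

/* Build a CTRL_CMD_GETFAMILY request for family "name" into "buf". */
static struct nlmsghdr *
sketch_family_req(void *buf, size_t size, const char *name)
{
	struct nlmsghdr *nlh = buf;
	struct genlmsghdr *genl;
	size_t name_size = strlen(name) + 1; /* Include the terminating NUL. */

	assert(size >= NLMSG_HDRLEN + GENL_HDRLEN +
		       NLA_ALIGN(NLA_HDRLEN + name_size));
	memset(buf, 0, size);
	nlh->nlmsg_len = NLMSG_HDRLEN;
	nlh->nlmsg_type = GENL_ID_CTRL;
	nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
	genl = (struct genlmsghdr *)sketch_msg_tail(nlh);
	nlh->nlmsg_len += GENL_HDRLEN;
	genl->cmd = CTRL_CMD_GETFAMILY;
	genl->version = 1;
	sketch_attr_put(nlh, CTRL_ATTR_FAMILY_NAME, name, name_size);
	return nlh;
}

/* Find the first attribute of "type" after the generic header, or NULL. */
static struct nlattr *
sketch_attr_find(struct nlmsghdr *nlh, int type)
{
	uint8_t *end = (uint8_t *)nlh + nlh->nlmsg_len;
	uint8_t *pos = (uint8_t *)nlh + NLMSG_HDRLEN + GENL_HDRLEN;

	while (pos + NLA_HDRLEN <= end) {
		struct nlattr *nla = (struct nlattr *)pos;

		/* Stop on a malformed or truncated attribute. */
		if (nla->nla_len < NLA_HDRLEN || pos + nla->nla_len > end)
			break;
		if (nla->nla_type == type)
			return nla;
		pos += NLA_ALIGN(nla->nla_len);
	}
	return NULL;
}
```

With a 13-byte payload such as a PCI address string plus its NUL, nla_len
is 17 while the message grows by 20 bytes. The helper before this patch
stored the padded value in nla_len, so the two lengths were not
distinguished in that case.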
 drivers/common/mlx5/Makefile                    |   5 +
 drivers/common/mlx5/meson.build                 |   1 +
 drivers/common/mlx5/mlx5_nl.c                   | 366 +++++++++++++++++++++++-
 drivers/common/mlx5/mlx5_nl.h                   |   6 +
 drivers/common/mlx5/rte_common_mlx5_version.map |   4 +
 5 files changed, 380 insertions(+), 2 deletions(-)
diff --git a/drivers/common/mlx5/Makefile b/drivers/common/mlx5/Makefile
index 6a14b7d..9d4d81f 100644
--- a/drivers/common/mlx5/Makefile
+++ b/drivers/common/mlx5/Makefile
@@ -260,6 +260,11 @@ mlx5_autoconf.h.new: $(RTE_SDK)/buildtools/auto-config-h.sh
 		enum IFLA_PHYS_PORT_NAME \
 		$(AUTOCONF_OUTPUT)
 	$Q sh -- '$<' '$@' \
+		HAVE_DEVLINK \
+		linux/devlink.h \
+		define DEVLINK_GENL_NAME \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
 		HAVE_SUPPORTED_40000baseKR4_Full \
 		/usr/include/linux/ethtool.h \
 		define SUPPORTED_40000baseKR4_Full \
diff --git a/drivers/common/mlx5/meson.build b/drivers/common/mlx5/meson.build
index 34cb7b9..fdd1e85 100644
--- a/drivers/common/mlx5/meson.build
+++ b/drivers/common/mlx5/meson.build
@@ -168,6 +168,7 @@ if build
 		'RDMA_NLDEV_ATTR_NDEV_INDEX' ],
 		[ 'HAVE_MLX5_DR_FLOW_DUMP', 'infiniband/mlx5dv.h',
 		'mlx5dv_dump_dr_domain'],
+		[ 'HAVE_DEVLINK', 'linux/devlink.h', 'DEVLINK_GENL_NAME' ],
 	]
 	config = configuration_data()
 	foreach arg:has_sym_args
diff --git a/drivers/common/mlx5/mlx5_nl.c b/drivers/common/mlx5/mlx5_nl.c
index b4fc053..0d1efd2 100644
--- a/drivers/common/mlx5/mlx5_nl.c
+++ b/drivers/common/mlx5/mlx5_nl.c
@@ -6,6 +6,7 @@
 #include <errno.h>
 #include <linux/if_link.h>
 #include <linux/rtnetlink.h>
+#include <linux/genetlink.h>
 #include <net/if.h>
 #include <rdma/rdma_netlink.h>
 #include <stdbool.h>
@@ -22,6 +23,10 @@
 
 #include "mlx5_nl.h"
 #include "mlx5_common_utils.h"
+#ifdef HAVE_DEVLINK
+#include <linux/devlink.h>
+#endif
+
 
 /* Size of the buffer to receive kernel messages */
 #define MLX5_NL_BUF_SIZE (32 * 1024)
@@ -90,6 +95,59 @@
 #define IFLA_PHYS_PORT_NAME 38
 #endif
 
+/*
+ * Some Devlink defines may be missing in old kernel versions,
+ * define the ones used here if they are absent.
+ */
+#ifndef DEVLINK_GENL_NAME
+#define DEVLINK_GENL_NAME "devlink"
+#endif
+#ifndef DEVLINK_GENL_VERSION
+#define DEVLINK_GENL_VERSION 1
+#endif
+#ifndef DEVLINK_ATTR_BUS_NAME
+#define DEVLINK_ATTR_BUS_NAME 1
+#endif
+#ifndef DEVLINK_ATTR_DEV_NAME
+#define DEVLINK_ATTR_DEV_NAME 2
+#endif
+#ifndef DEVLINK_ATTR_PARAM
+#define DEVLINK_ATTR_PARAM 80
+#endif
+#ifndef DEVLINK_ATTR_PARAM_NAME
+#define DEVLINK_ATTR_PARAM_NAME 81
+#endif
+#ifndef DEVLINK_ATTR_PARAM_TYPE
+#define DEVLINK_ATTR_PARAM_TYPE 83
+#endif
+#ifndef DEVLINK_ATTR_PARAM_VALUES_LIST
+#define DEVLINK_ATTR_PARAM_VALUES_LIST 84
+#endif
+#ifndef DEVLINK_ATTR_PARAM_VALUE
+#define DEVLINK_ATTR_PARAM_VALUE 85
+#endif
+#ifndef DEVLINK_ATTR_PARAM_VALUE_DATA
+#define DEVLINK_ATTR_PARAM_VALUE_DATA 86
+#endif
+#ifndef DEVLINK_ATTR_PARAM_VALUE_CMODE
+#define DEVLINK_ATTR_PARAM_VALUE_CMODE 87
+#endif
+#ifndef DEVLINK_PARAM_CMODE_DRIVERINIT
+#define DEVLINK_PARAM_CMODE_DRIVERINIT 1
+#endif
+#ifndef DEVLINK_CMD_RELOAD
+#define DEVLINK_CMD_RELOAD 37
+#endif
+#ifndef DEVLINK_CMD_PARAM_GET
+#define DEVLINK_CMD_PARAM_GET 38
+#endif
+#ifndef DEVLINK_CMD_PARAM_SET
+#define DEVLINK_CMD_PARAM_SET 39
+#endif
+#ifndef NLA_FLAG
+#define NLA_FLAG 6
+#endif
+
 /* Add/remove MAC address through Netlink */
 struct mlx5_nl_mac_addr {
 	struct rte_ether_addr (*mac)[];
@@ -1241,8 +1299,8 @@ struct mlx5_nl_ifindex_data {
 	struct nlattr *nla = nl_msg_tail(nlh);
 
 	nla->nla_type = type;
-	nla->nla_len = NLMSG_ALIGN(sizeof(struct nlattr) + alen);
-	nlh->nlmsg_len = NLMSG_ALIGN(nlh->nlmsg_len) + nla->nla_len;
+	nla->nla_len = NLMSG_ALIGN(sizeof(struct nlattr)) + alen;
+	nlh->nlmsg_len += NLMSG_ALIGN(nla->nla_len);
 
 	if (alen)
 		memcpy((uint8_t *)nla + sizeof(struct nlattr), data, alen);
@@ -1335,3 +1393,307 @@ struct mlx5_nl_ifindex_data {
 	}
 	return ret;
 }
+
+/**
+ * Parse Netlink message to retrieve the general family ID.
+ *
+ * @param nh
+ *   Pointer to Netlink Message Header.
+ * @param arg
+ *   PMD data register with this callback.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_nl_family_id_cb(struct nlmsghdr *nh, void *arg)
+{
+
+	struct nlattr *tail = RTE_PTR_ADD(nh, nh->nlmsg_len);
+	struct nlattr *nla = RTE_PTR_ADD(nh, NLMSG_ALIGN(sizeof(*nh)) +
+					NLMSG_ALIGN(sizeof(struct genlmsghdr)));
+
+	for (; nla->nla_len && nla < tail;
+	     nla = RTE_PTR_ADD(nla, NLMSG_ALIGN(nla->nla_len))) {
+		if (nla->nla_type == CTRL_ATTR_FAMILY_ID) {
+			*(uint16_t *)arg = *(uint16_t *)(nla + 1);
+			return 0;
+		}
+	}
+	return -EINVAL;
+}
+
+#define MLX5_NL_MAX_ATTR_SIZE 100
+/**
+ * Get generic netlink family ID.
+ *
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] name
+ *   The family name.
+ *
+ * @return
+ *   ID >= 0 on success, a negative errno value otherwise and rte_errno
+ *   is set.
+ */
+static int
+mlx5_nl_generic_family_id_get(int nlsk_fd, const char *name)
+{
+	struct nlmsghdr *nlh;
+	struct genlmsghdr *genl;
+	uint32_t sn = MLX5_NL_SN_GENERATE;
+	int name_size = strlen(name) + 1;
+	int ret;
+	uint16_t id = -1;
+	uint8_t buf[NLMSG_ALIGN(sizeof(struct nlmsghdr)) +
+		    NLMSG_ALIGN(sizeof(struct genlmsghdr)) +
+		    NLMSG_ALIGN(sizeof(struct nlattr)) +
+		    NLMSG_ALIGN(MLX5_NL_MAX_ATTR_SIZE)];
+
+	memset(buf, 0, sizeof(buf));
+	nlh = (struct nlmsghdr *)buf;
+	nlh->nlmsg_len = sizeof(struct nlmsghdr);
+	nlh->nlmsg_type = GENL_ID_CTRL;
+	nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
+	genl = (struct genlmsghdr *)nl_msg_tail(nlh);
+	nlh->nlmsg_len += sizeof(struct genlmsghdr);
+	genl->cmd = CTRL_CMD_GETFAMILY;
+	genl->version = 1;
+	nl_attr_put(nlh, CTRL_ATTR_FAMILY_NAME, name, name_size);
+	ret = mlx5_nl_send(nlsk_fd, nlh, sn);
+	if (ret >= 0)
+		ret = mlx5_nl_recv(nlsk_fd, sn, mlx5_nl_family_id_cb, &id);
+	if (ret < 0) {
+		DRV_LOG(DEBUG, "Failed to get Netlink %s family ID: %d.", name,
+			ret);
+		return ret;
+	}
+	DRV_LOG(DEBUG, "Netlink \"%s\" family ID is %u.", name, id);
+	return (int)id;
+}
+
+/**
+ * Get Devlink family ID.
+ *
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ *
+ * @return
+ *   ID >= 0 on success, a negative errno value otherwise and rte_errno
+ *   is set.
+ */
+
+int
+mlx5_nl_devlink_family_id_get(int nlsk_fd)
+{
+	return mlx5_nl_generic_family_id_get(nlsk_fd, DEVLINK_GENL_NAME);
+}
+
+/**
+ * Parse Netlink message to retrieve the ROCE enable status.
+ *
+ * @param nh
+ *   Pointer to Netlink Message Header.
+ * @param arg
+ *   PMD data register with this callback.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_nl_roce_cb(struct nlmsghdr *nh, void *arg)
+{
+
+	int ret = -EINVAL;
+	int *enable = arg;
+	struct nlattr *tail = RTE_PTR_ADD(nh, nh->nlmsg_len);
+	struct nlattr *nla = RTE_PTR_ADD(nh, NLMSG_ALIGN(sizeof(*nh)) +
+					NLMSG_ALIGN(sizeof(struct genlmsghdr)));
+
+	while (nla->nla_len && nla < tail) {
+		switch (nla->nla_type) {
+		/* Expected nested attributes case. */
+		case DEVLINK_ATTR_PARAM:
+		case DEVLINK_ATTR_PARAM_VALUES_LIST:
+		case DEVLINK_ATTR_PARAM_VALUE:
+			ret = 0;
+			nla += 1;
+			break;
+		case DEVLINK_ATTR_PARAM_VALUE_DATA:
+			*enable = 1;
+			return 0;
+		default:
+			nla = RTE_PTR_ADD(nla, NLMSG_ALIGN(nla->nla_len));
+		}
+	}
+	*enable = 0;
+	return ret;
+}
+
+/**
+ * Get ROCE enable status through Netlink.
+ *
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] family_id
+ *   the Devlink family ID.
+ * @param pci_addr
+ *   The device PCI address.
+ * @param[out] enable
+ *   Where to store the enable status.
+ *
+ * @return
+ *   0 on success and @p enable is updated, a negative errno value otherwise
+ *   and rte_errno is set.
+ */
+int
+mlx5_nl_enable_roce_get(int nlsk_fd, int family_id, const char *pci_addr,
+			int *enable)
+{
+	struct nlmsghdr *nlh;
+	struct genlmsghdr *genl;
+	uint32_t sn = MLX5_NL_SN_GENERATE;
+	int ret;
+	int cur_en;
+	uint8_t buf[NLMSG_ALIGN(sizeof(struct nlmsghdr)) +
+		    NLMSG_ALIGN(sizeof(struct genlmsghdr)) +
+		    NLMSG_ALIGN(sizeof(struct nlattr)) * 4 +
+		    NLMSG_ALIGN(MLX5_NL_MAX_ATTR_SIZE) * 4];
+
+	memset(buf, 0, sizeof(buf));
+	nlh = (struct nlmsghdr *)buf;
+	nlh->nlmsg_len = sizeof(struct nlmsghdr);
+	nlh->nlmsg_type = family_id;
+	nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
+	genl = (struct genlmsghdr *)nl_msg_tail(nlh);
+	nlh->nlmsg_len += sizeof(struct genlmsghdr);
+	genl->cmd = DEVLINK_CMD_PARAM_GET;
+	genl->version = DEVLINK_GENL_VERSION;
+	nl_attr_put(nlh, DEVLINK_ATTR_BUS_NAME, "pci", 4);
+	nl_attr_put(nlh, DEVLINK_ATTR_DEV_NAME, pci_addr, strlen(pci_addr) + 1);
+	nl_attr_put(nlh, DEVLINK_ATTR_PARAM_NAME, "enable_roce", 12);
+	ret = mlx5_nl_send(nlsk_fd, nlh, sn);
+	if (ret >= 0)
+		ret = mlx5_nl_recv(nlsk_fd, sn, mlx5_nl_roce_cb, &cur_en);
+	if (ret < 0) {
+		DRV_LOG(DEBUG, "Failed to get ROCE enable on device %s: %d.",
+			pci_addr, ret);
+		return ret;
+	}
+	*enable = cur_en;
+	DRV_LOG(DEBUG, "ROCE is %sabled for device \"%s\".",
+		cur_en ? "en" : "dis", pci_addr);
+	return ret;
+}
+
+/**
+ * Reload mlx5 device kernel driver through Netlink.
+ *
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] family_id
+ *   The Devlink family ID.
+ * @param[in] pci_addr
+ *   The device PCI address.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_nl_driver_reload(int nlsk_fd, int family_id, const char *pci_addr)
+{
+	struct nlmsghdr *nlh;
+	struct genlmsghdr *genl;
+	uint32_t sn = MLX5_NL_SN_GENERATE;
+	int ret;
+	uint8_t buf[NLMSG_ALIGN(sizeof(struct nlmsghdr)) +
+		    NLMSG_ALIGN(sizeof(struct genlmsghdr)) +
+		    NLMSG_ALIGN(sizeof(struct nlattr)) * 2 +
+		    NLMSG_ALIGN(MLX5_NL_MAX_ATTR_SIZE) * 2];
+
+	memset(buf, 0, sizeof(buf));
+	nlh = (struct nlmsghdr *)buf;
+	nlh->nlmsg_len = sizeof(struct nlmsghdr);
+	nlh->nlmsg_type = family_id;
+	nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
+	genl = (struct genlmsghdr *)nl_msg_tail(nlh);
+	nlh->nlmsg_len += sizeof(struct genlmsghdr);
+	genl->cmd = DEVLINK_CMD_RELOAD;
+	genl->version = DEVLINK_GENL_VERSION;
+	nl_attr_put(nlh, DEVLINK_ATTR_BUS_NAME, "pci", 4);
+	nl_attr_put(nlh, DEVLINK_ATTR_DEV_NAME, pci_addr, strlen(pci_addr) + 1);
+	ret = mlx5_nl_send(nlsk_fd, nlh, sn);
+	if (ret >= 0)
+		ret = mlx5_nl_recv(nlsk_fd, sn, NULL, NULL);
+	if (ret < 0) {
+		DRV_LOG(DEBUG, "Failed to reload %s device by Netlink - %d",
+			pci_addr, ret);
+		return ret;
+	}
+	DRV_LOG(DEBUG, "Device \"%s\" was reloaded by Netlink successfully.",
+		pci_addr);
+	return 0;
+}
+
+/**
+ * Set ROCE enable status through Netlink.
+ *
+ * @param[in] nlsk_fd
+ *   Netlink socket file descriptor.
+ * @param[in] family_id
+ *   The Devlink family ID.
+ * @param[in] pci_addr
+ *   The device PCI address.
+ * @param[in] enable
+ *   The enable status to set.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_nl_enable_roce_set(int nlsk_fd, int family_id, const char *pci_addr,
+			int enable)
+{
+	struct nlmsghdr *nlh;
+	struct genlmsghdr *genl;
+	uint32_t sn = MLX5_NL_SN_GENERATE;
+	int ret;
+	uint8_t buf[NLMSG_ALIGN(sizeof(struct nlmsghdr)) +
+		    NLMSG_ALIGN(sizeof(struct genlmsghdr)) +
+		    NLMSG_ALIGN(sizeof(struct nlattr)) * 6 +
+		    NLMSG_ALIGN(MLX5_NL_MAX_ATTR_SIZE) * 6];
+	uint8_t cmode = DEVLINK_PARAM_CMODE_DRIVERINIT;
+	uint8_t ptype = NLA_FLAG;
+
+	memset(buf, 0, sizeof(buf));
+	nlh = (struct nlmsghdr *)buf;
+	nlh->nlmsg_len = sizeof(struct nlmsghdr);
+	nlh->nlmsg_type = family_id;
+	nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
+	genl = (struct genlmsghdr *)nl_msg_tail(nlh);
+	nlh->nlmsg_len += sizeof(struct genlmsghdr);
+	genl->cmd = DEVLINK_CMD_PARAM_SET;
+	genl->version = DEVLINK_GENL_VERSION;
+	nl_attr_put(nlh, DEVLINK_ATTR_BUS_NAME, "pci", 4);
+	nl_attr_put(nlh, DEVLINK_ATTR_DEV_NAME, pci_addr, strlen(pci_addr) + 1);
+	nl_attr_put(nlh, DEVLINK_ATTR_PARAM_NAME, "enable_roce", 12);
+	nl_attr_put(nlh, DEVLINK_ATTR_PARAM_VALUE_CMODE, &cmode, sizeof(cmode));
+	nl_attr_put(nlh, DEVLINK_ATTR_PARAM_TYPE, &ptype, sizeof(ptype));
+	if (enable)
+		nl_attr_put(nlh, DEVLINK_ATTR_PARAM_VALUE_DATA, NULL, 0);
+	ret = mlx5_nl_send(nlsk_fd, nlh, sn);
+	if (ret >= 0)
+		ret = mlx5_nl_recv(nlsk_fd, sn, NULL, NULL);
+	if (ret < 0) {
+		DRV_LOG(DEBUG, "Failed to %sable ROCE for device %s by Netlink:"
+			" %d.", enable ? "en" : "dis", pci_addr, ret);
+		return ret;
+	}
+	DRV_LOG(DEBUG, "Device %s ROCE was %sabled by Netlink successfully.",
+		pci_addr, enable ? "en" : "dis");
+	/* Now, need to reload the driver. */
+	return mlx5_nl_driver_reload(nlsk_fd, family_id, pci_addr);
+}
diff --git a/drivers/common/mlx5/mlx5_nl.h b/drivers/common/mlx5/mlx5_nl.h
index 8e66a98..2c3f837 100644
--- a/drivers/common/mlx5/mlx5_nl.h
+++ b/drivers/common/mlx5/mlx5_nl.h
@@ -53,5 +53,11 @@ void mlx5_nl_vlan_vmwa_delete(struct mlx5_nl_vlan_vmwa_context *vmwa,
 			      uint32_t ifindex);
 uint32_t mlx5_nl_vlan_vmwa_create(struct mlx5_nl_vlan_vmwa_context *vmwa,
 				  uint32_t ifindex, uint16_t tag);
+int mlx5_nl_devlink_family_id_get(int nlsk_fd);
+int mlx5_nl_enable_roce_get(int nlsk_fd, int family_id, const char *pci_addr,
+			    int *enable);
+int mlx5_nl_driver_reload(int nlsk_fd, int family_id, const char *pci_addr);
+int mlx5_nl_enable_roce_set(int nlsk_fd, int family_id, const char *pci_addr,
+			    int enable);
 
 #endif /* RTE_PMD_MLX5_NL_H_ */
diff --git a/drivers/common/mlx5/rte_common_mlx5_version.map b/drivers/common/mlx5/rte_common_mlx5_version.map
index f93f5cb..b1a2014 100644
--- a/drivers/common/mlx5/rte_common_mlx5_version.map
+++ b/drivers/common/mlx5/rte_common_mlx5_version.map
@@ -30,6 +30,10 @@ DPDK_20.02 {
 	mlx5_dev_to_pci_addr;
 
 	mlx5_nl_allmulti;
+	mlx5_nl_devlink_family_id_get;
+	mlx5_nl_driver_reload;
+	mlx5_nl_enable_roce_get;
+	mlx5_nl_enable_roce_set;
 	mlx5_nl_ifindex;
 	mlx5_nl_init;
 	mlx5_nl_mac_addr_add;
-- 
1.8.3.1
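
For readers following the attribute walk in mlx5_nl_roce_cb(), below is a minimal standalone sketch (not part of the patch) of building and then scanning a flat run of Netlink attributes, with the bounds check done before each header is read. The helper names attr_put() and attr_find() and the attribute type numbers are hypothetical; only struct nlattr and the NLA_* macros from <linux/netlink.h> are real.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <linux/netlink.h>

/* Hypothetical helper: append one attribute at *off. Returns 0 or -1. */
static int
attr_put(uint8_t *buf, size_t cap, size_t *off, uint16_t type,
	 const void *data, uint16_t len)
{
	struct nlattr nla;
	size_t total;

	assert(buf != NULL && off != NULL);
	/* nla_len is 16 bits wide, so header + payload must fit in it. */
	if (len > UINT16_MAX - NLA_HDRLEN)
		return -1;
	total = (size_t)NLA_ALIGN(NLA_HDRLEN + len);
	if (*off > cap || total > cap - *off)
		return -1;
	nla.nla_type = type;
	nla.nla_len = (uint16_t)(NLA_HDRLEN + len);
	/* Zero the slot first so the alignment padding is defined. */
	memset(buf + *off, 0, total);
	memcpy(buf + *off, &nla, sizeof(nla));
	if (len)
		memcpy(buf + *off + NLA_HDRLEN, data, len);
	*off += total;
	return 0;
}

/* Hypothetical helper: return the payload of the first attribute of the
 * given type, or NULL if it is absent or the buffer is malformed. */
static const void *
attr_find(const uint8_t *buf, size_t len, uint16_t type, uint16_t *plen)
{
	size_t off = 0;

	assert(buf != NULL && plen != NULL);
	/* Make sure a whole header fits before reading it. */
	while (off < len && len - off >= (size_t)NLA_HDRLEN) {
		struct nlattr nla;

		memcpy(&nla, buf + off, sizeof(nla));
		/* Reject truncated or undersized attributes. */
		if (nla.nla_len < NLA_HDRLEN || nla.nla_len > len - off)
			return NULL;
		if ((nla.nla_type & NLA_TYPE_MASK) == type) {
			*plen = (uint16_t)(nla.nla_len - NLA_HDRLEN);
			return buf + off + NLA_HDRLEN;
		}
		off += (size_t)NLA_ALIGN(nla.nla_len);
	}
	return NULL;
}
```

The sketch copies each header out with memcpy() instead of casting the buffer pointer, so it makes no assumption about the alignment of the byte buffer. It does not descend into nested attributes, which the callback in the patch does.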
^ permalink raw reply	[flat|nested] 174+ messages in thread
* Re: [dpdk-dev] [PATCH v4 03/25] common/mlx5: share the mlx5 glue reference
  2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 03/25] common/mlx5: share the mlx5 glue reference Matan Azrad
@ 2020-01-30  8:10         ` Matan Azrad
  2020-01-30  8:38           ` Raslan Darawsheh
  0 siblings, 1 reply; 174+ messages in thread
From: Matan Azrad @ 2020-01-30  8:10 UTC (permalink / raw)
  To: Matan Azrad, dev, Slava Ovsiienko; +Cc: Raslan Darawsheh
Self-suggestion.
It makes sense to squash this patch into the previous one, since the glue already started to move there.
Raslan, can we do this during integration, or do we need to resend the whole series?
From: Matan Azrad
> A new Mellanox vDPA PMD will be added to support vDPA operations on
> Mellanox adapters.
> 
> Both the mlx5 PMD and the vDPA mlx5 PMD need to initialize the glue.
> 
> The glue must be initialized only once per process, so all the mlx5 PMDs
> using the glue should share the same glue object.
> 
> Move the glue initialization into the common/mlx5 library, where its
> constructor performs it only once.
> 
> Signed-off-by: Matan Azrad <matan@mellanox.com>
> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> ---
>  drivers/common/mlx5/mlx5_common.c | 173 +++++++++++++++++++++++++++++++++++++-
>  drivers/net/mlx5/Makefile         |   9 --
>  drivers/net/mlx5/meson.build      |   4 -
>  drivers/net/mlx5/mlx5.c           | 172 +------------------------------------
>  4 files changed, 173 insertions(+), 185 deletions(-)
> 
> diff --git a/drivers/common/mlx5/mlx5_common.c b/drivers/common/mlx5/mlx5_common.c
> index 14ebd30..9c88a63 100644
> --- a/drivers/common/mlx5/mlx5_common.c
> +++ b/drivers/common/mlx5/mlx5_common.c
> @@ -2,16 +2,185 @@
>   * Copyright 2019 Mellanox Technologies, Ltd
>   */
> 
> +#include <dlfcn.h>
> +#include <unistd.h>
> +#include <string.h>
> +
> +#include <rte_errno.h>
> +
>  #include "mlx5_common.h"
> +#include "mlx5_common_utils.h"
> +#include "mlx5_glue.h"
> 
> 
>  int mlx5_common_logtype;
> 
> 
> -RTE_INIT(rte_mlx5_common_pmd_init)
> +#ifdef RTE_IBVERBS_LINK_DLOPEN
> +
> +/**
> + * Suffix RTE_EAL_PMD_PATH with "-glue".
> + *
> + * This function performs a sanity check on RTE_EAL_PMD_PATH before
> + * suffixing its last component.
> + *
> + * @param[out] buf
> + *   Output buffer, should be large enough otherwise NULL is returned.
> + * @param size
> + *   Size of @p buf.
> + *
> + * @return
> + *   Pointer to @p buf or @p NULL in case suffix cannot be appended.
> + */
> +static char *
> +mlx5_glue_path(char *buf, size_t size)
> +{
> +	static const char *const bad[] = { "/", ".", "..", NULL };
> +	const char *path = RTE_EAL_PMD_PATH;
> +	size_t len = strlen(path);
> +	size_t off;
> +	int i;
> +
> +	while (len && path[len - 1] == '/')
> +		--len;
> +	for (off = len; off && path[off - 1] != '/'; --off)
> +		;
> +	for (i = 0; bad[i]; ++i)
> +		if (!strncmp(path + off, bad[i], (int)(len - off)))
> +			goto error;
> +	i = snprintf(buf, size, "%.*s-glue", (int)len, path);
> +	if (i == -1 || (size_t)i >= size)
> +		goto error;
> +	return buf;
> +error:
> +	RTE_LOG(ERR, PMD, "unable to append \"-glue\" to last component of"
> +		" RTE_EAL_PMD_PATH (\"" RTE_EAL_PMD_PATH "\"), please"
> +		" re-configure DPDK");
> +	return NULL;
> +}
> +#endif
> +
> +/**
> + * Initialization routine for run-time dependency on rdma-core.
> + */
> +RTE_INIT_PRIO(mlx5_glue_init, CLASS)
>  {
> -	/* Initialize driver log type. */
> +	void *handle = NULL;
> +
> +	/* Initialize common log type. */
>  	mlx5_common_logtype = rte_log_register("pmd.common.mlx5");
>  	if (mlx5_common_logtype >= 0)
>  		rte_log_set_level(mlx5_common_logtype, RTE_LOG_NOTICE);
> +	/*
> +	 * RDMAV_HUGEPAGES_SAFE tells ibv_fork_init() we intend to use
> +	 * huge pages. Calling ibv_fork_init() during init allows
> +	 * applications to use fork() safely for purposes other than
> +	 * using this PMD, which is not supported in forked processes.
> +	 */
> +	setenv("RDMAV_HUGEPAGES_SAFE", "1", 1);
> +	/* Match the size of Rx completion entry to the size of a cacheline. */
> +	if (RTE_CACHE_LINE_SIZE == 128)
> +		setenv("MLX5_CQE_SIZE", "128", 0);
> +	/*
> +	 * MLX5_DEVICE_FATAL_CLEANUP tells ibv_destroy functions to
> +	 * cleanup all the Verbs resources even when the device was removed.
> +	 */
> +	setenv("MLX5_DEVICE_FATAL_CLEANUP", "1", 1);
> +	/* The glue initialization was done earlier by mlx5 common library. */
> +#ifdef RTE_IBVERBS_LINK_DLOPEN
> +	char glue_path[sizeof(RTE_EAL_PMD_PATH) - 1 + sizeof("-glue")];
> +	const char *path[] = {
> +		/*
> +		 * A basic security check is necessary before trusting
> +		 * MLX5_GLUE_PATH, which may override RTE_EAL_PMD_PATH.
> +		 */
> +		(geteuid() == getuid() && getegid() == getgid() ?
> +		 getenv("MLX5_GLUE_PATH") : NULL),
> +		/*
> +		 * When RTE_EAL_PMD_PATH is set, use its glue-suffixed
> +		 * variant, otherwise let dlopen() look up libraries on its
> +		 * own.
> +		 */
> +		(*RTE_EAL_PMD_PATH ?
> +		 mlx5_glue_path(glue_path, sizeof(glue_path)) : ""),
> +	};
> +	unsigned int i = 0;
> +	void **sym;
> +	const char *dlmsg;
> +
> +	while (!handle && i != RTE_DIM(path)) {
> +		const char *end;
> +		size_t len;
> +		int ret;
> +
> +		if (!path[i]) {
> +			++i;
> +			continue;
> +		}
> +		end = strpbrk(path[i], ":;");
> +		if (!end)
> +			end = path[i] + strlen(path[i]);
> +		len = end - path[i];
> +		ret = 0;
> +		do {
> +			char name[ret + 1];
> +
> +			ret = snprintf(name, sizeof(name), "%.*s%s" MLX5_GLUE,
> +				       (int)len, path[i],
> +				       (!len || *(end - 1) == '/') ? "" : "/");
> +			if (ret == -1)
> +				break;
> +			if (sizeof(name) != (size_t)ret + 1)
> +				continue;
> +			DRV_LOG(DEBUG, "Looking for rdma-core glue as "
> +				"\"%s\"", name);
> +			handle = dlopen(name, RTLD_LAZY);
> +			break;
> +		} while (1);
> +		path[i] = end + 1;
> +		if (!*end)
> +			++i;
> +	}
> +	if (!handle) {
> +		rte_errno = EINVAL;
> +		dlmsg = dlerror();
> +		if (dlmsg)
> +			DRV_LOG(WARNING, "Cannot load glue library: %s", dlmsg);
> +		goto glue_error;
> +	}
> +	sym = dlsym(handle, "mlx5_glue");
> +	if (!sym || !*sym) {
> +		rte_errno = EINVAL;
> +		dlmsg = dlerror();
> +		if (dlmsg)
> +			DRV_LOG(ERR, "Cannot resolve glue symbol: %s", dlmsg);
> +		goto glue_error;
> +	}
> +	mlx5_glue = *sym;
> +#endif /* RTE_IBVERBS_LINK_DLOPEN */
> +#ifndef NDEBUG
> +	/* Glue structure must not contain any NULL pointers. */
> +	{
> +		unsigned int i;
> +
> +		for (i = 0; i != sizeof(*mlx5_glue) / sizeof(void *); ++i)
> +			assert(((const void *const *)mlx5_glue)[i]);
> +	}
> +#endif
> +	if (strcmp(mlx5_glue->version, MLX5_GLUE_VERSION)) {
> +		rte_errno = EINVAL;
> +		DRV_LOG(ERR, "rdma-core glue \"%s\" mismatch: \"%s\" is "
> +			"required", mlx5_glue->version, MLX5_GLUE_VERSION);
> +		goto glue_error;
> +	}
> +	mlx5_glue->fork_init();
> +	return;
> +glue_error:
> +	if (handle)
> +		dlclose(handle);
> +	DRV_LOG(WARNING, "Cannot initialize MLX5 common due to missing"
> +		" run-time dependency on rdma-core libraries (libibverbs,"
> +		" libmlx5)");
> +	mlx5_glue = NULL;
> +	return;
>  }
> diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
> index a9558ca..dc6b3c8 100644
> --- a/drivers/net/mlx5/Makefile
> +++ b/drivers/net/mlx5/Makefile
> @@ -6,15 +6,6 @@ include $(RTE_SDK)/mk/rte.vars.mk
> 
>  # Library name.
>  LIB = librte_pmd_mlx5.a
> -LIB_GLUE = $(LIB_GLUE_BASE).$(LIB_GLUE_VERSION)
> -LIB_GLUE_BASE = librte_pmd_mlx5_glue.so
> -LIB_GLUE_VERSION = 20.02.0
> -
> -ifeq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
> -CFLAGS += -DMLX5_GLUE='"$(LIB_GLUE)"'
> -CFLAGS += -DMLX5_GLUE_VERSION='"$(LIB_GLUE_VERSION)"'
> -LDLIBS += -ldl
> -endif
> 
>  # Sources.
>  SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5.c
> diff --git a/drivers/net/mlx5/meson.build b/drivers/net/mlx5/meson.build
> index f6d0db9..e10ef3a 100644
> --- a/drivers/net/mlx5/meson.build
> +++ b/drivers/net/mlx5/meson.build
> @@ -8,10 +8,6 @@ if not is_linux
>  	subdir_done()
>  endif
> 
> -LIB_GLUE_BASE = 'librte_pmd_mlx5_glue.so'
> -LIB_GLUE_VERSION = '20.02.0'
> -LIB_GLUE = LIB_GLUE_BASE + '.' + LIB_GLUE_VERSION
> -
>  allow_experimental_apis = true
>  deps += ['hash', 'common_mlx5']
>  sources = files(
> diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
> index 7cf357d..8fbe826 100644
> --- a/drivers/net/mlx5/mlx5.c
> +++ b/drivers/net/mlx5/mlx5.c
> @@ -7,7 +7,6 @@
>  #include <unistd.h>
>  #include <string.h>
>  #include <assert.h>
> -#include <dlfcn.h>
>  #include <stdint.h>
>  #include <stdlib.h>
>  #include <errno.h>
> @@ -3505,138 +3504,6 @@ struct mlx5_flow_id_pool *
>  		     RTE_PCI_DRV_PROBE_AGAIN,
>  };
> 
> -#ifdef RTE_IBVERBS_LINK_DLOPEN
> -
> -/**
> - * Suffix RTE_EAL_PMD_PATH with "-glue".
> - *
> - * This function performs a sanity check on RTE_EAL_PMD_PATH before
> - * suffixing its last component.
> - *
> - * @param buf[out]
> - *   Output buffer, should be large enough otherwise NULL is returned.
> - * @param size
> - *   Size of @p out.
> - *
> - * @return
> - *   Pointer to @p buf or @p NULL in case suffix cannot be appended.
> - */
> -static char *
> -mlx5_glue_path(char *buf, size_t size)
> -{
> -	static const char *const bad[] = { "/", ".", "..", NULL };
> -	const char *path = RTE_EAL_PMD_PATH;
> -	size_t len = strlen(path);
> -	size_t off;
> -	int i;
> -
> -	while (len && path[len - 1] == '/')
> -		--len;
> -	for (off = len; off && path[off - 1] != '/'; --off)
> -		;
> -	for (i = 0; bad[i]; ++i)
> -		if (!strncmp(path + off, bad[i], (int)(len - off)))
> -			goto error;
> -	i = snprintf(buf, size, "%.*s-glue", (int)len, path);
> -	if (i == -1 || (size_t)i >= size)
> -		goto error;
> -	return buf;
> -error:
> -	DRV_LOG(ERR,
> -		"unable to append \"-glue\" to last component of"
> -		" RTE_EAL_PMD_PATH (\"" RTE_EAL_PMD_PATH "\"),"
> -		" please re-configure DPDK");
> -	return NULL;
> -}
> -
> -/**
> - * Initialization routine for run-time dependency on rdma-core.
> - */
> -static int
> -mlx5_glue_init(void)
> -{
> -	char glue_path[sizeof(RTE_EAL_PMD_PATH) - 1 + sizeof("-glue")];
> -	const char *path[] = {
> -		/*
> -		 * A basic security check is necessary before trusting
> -		 * MLX5_GLUE_PATH, which may override RTE_EAL_PMD_PATH.
> -		 */
> -		(geteuid() == getuid() && getegid() == getgid() ?
> -		 getenv("MLX5_GLUE_PATH") : NULL),
> -		/*
> -		 * When RTE_EAL_PMD_PATH is set, use its glue-suffixed
> -		 * variant, otherwise let dlopen() look up libraries on its
> -		 * own.
> -		 */
> -		(*RTE_EAL_PMD_PATH ?
> -		 mlx5_glue_path(glue_path, sizeof(glue_path)) : ""),
> -	};
> -	unsigned int i = 0;
> -	void *handle = NULL;
> -	void **sym;
> -	const char *dlmsg;
> -
> -	while (!handle && i != RTE_DIM(path)) {
> -		const char *end;
> -		size_t len;
> -		int ret;
> -
> -		if (!path[i]) {
> -			++i;
> -			continue;
> -		}
> -		end = strpbrk(path[i], ":;");
> -		if (!end)
> -			end = path[i] + strlen(path[i]);
> -		len = end - path[i];
> -		ret = 0;
> -		do {
> -			char name[ret + 1];
> -
> -			ret = snprintf(name, sizeof(name), "%.*s%s" MLX5_GLUE,
> -				       (int)len, path[i],
> -				       (!len || *(end - 1) == '/') ? "" : "/");
> -			if (ret == -1)
> -				break;
> -			if (sizeof(name) != (size_t)ret + 1)
> -				continue;
> -			DRV_LOG(DEBUG, "looking for rdma-core glue as \"%s\"",
> -				name);
> -			handle = dlopen(name, RTLD_LAZY);
> -			break;
> -		} while (1);
> -		path[i] = end + 1;
> -		if (!*end)
> -			++i;
> -	}
> -	if (!handle) {
> -		rte_errno = EINVAL;
> -		dlmsg = dlerror();
> -		if (dlmsg)
> -			DRV_LOG(WARNING, "cannot load glue library: %s", dlmsg);
> -		goto glue_error;
> -	}
> -	sym = dlsym(handle, "mlx5_glue");
> -	if (!sym || !*sym) {
> -		rte_errno = EINVAL;
> -		dlmsg = dlerror();
> -		if (dlmsg)
> -			DRV_LOG(ERR, "cannot resolve glue symbol: %s", dlmsg);
> -		goto glue_error;
> -	}
> -	mlx5_glue = *sym;
> -	return 0;
> -glue_error:
> -	if (handle)
> -		dlclose(handle);
> -	DRV_LOG(WARNING,
> -		"cannot initialize PMD due to missing run-time dependency on"
> -		" rdma-core libraries (libibverbs, libmlx5)");
> -	return -rte_errno;
> -}
> -
> -#endif
> -
>  /**
>   * Driver initialization routine.
>   */
> @@ -3651,43 +3518,8 @@ struct mlx5_flow_id_pool *
>  	mlx5_set_ptype_table();
>  	mlx5_set_cksum_table();
>  	mlx5_set_swp_types_table();
> -	/*
> -	 * RDMAV_HUGEPAGES_SAFE tells ibv_fork_init() we intend to use
> -	 * huge pages. Calling ibv_fork_init() during init allows
> -	 * applications to use fork() safely for purposes other than
> -	 * using this PMD, which is not supported in forked processes.
> -	 */
> -	setenv("RDMAV_HUGEPAGES_SAFE", "1", 1);
> -	/* Match the size of Rx completion entry to the size of a cacheline. */
> -	if (RTE_CACHE_LINE_SIZE == 128)
> -		setenv("MLX5_CQE_SIZE", "128", 0);
> -	/*
> -	 * MLX5_DEVICE_FATAL_CLEANUP tells ibv_destroy functions to
> -	 * cleanup all the Verbs resources even when the device was removed.
> -	 */
> -	setenv("MLX5_DEVICE_FATAL_CLEANUP", "1", 1);
> -#ifdef RTE_IBVERBS_LINK_DLOPEN
> -	if (mlx5_glue_init())
> -		return;
> -	assert(mlx5_glue);
> -#endif
> -#ifndef NDEBUG
> -	/* Glue structure must not contain any NULL pointers. */
> -	{
> -		unsigned int i;
> -
> -		for (i = 0; i != sizeof(*mlx5_glue) / sizeof(void *); ++i)
> -			assert(((const void *const *)mlx5_glue)[i]);
> -	}
> -#endif
> -	if (strcmp(mlx5_glue->version, MLX5_GLUE_VERSION)) {
> -		DRV_LOG(ERR,
> -			"rdma-core glue \"%s\" mismatch: \"%s\" is required",
> -			mlx5_glue->version, MLX5_GLUE_VERSION);
> -		return;
> -	}
> -	mlx5_glue->fork_init();
> -	rte_pci_register(&mlx5_driver);
> +	if (mlx5_glue)
> +		rte_pci_register(&mlx5_driver);
>  }
> 
>  RTE_PMD_EXPORT_NAME(net_mlx5, __COUNTER__);
> --
> 1.8.3.1
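
As an aside on the mlx5_glue_path() logic quoted above, here is a minimal standalone sketch of the same suffixing step, assuming the path is passed as a parameter instead of coming from the compile-time RTE_EAL_PMD_PATH. The name glue_path_suffix() is hypothetical and this is not the code from the patch; it only illustrates stripping trailing slashes, rejecting an empty, "." or ".." last component, and checking the snprintf() result against the buffer size.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical variant of mlx5_glue_path() taking the path as a parameter.
 * Returns buf on success, NULL if the suffix cannot be appended. */
static char *
glue_path_suffix(const char *path, char *buf, size_t size)
{
	static const char *const bad[] = { ".", "..", NULL };
	size_t len;
	size_t off;
	int i;

	assert(path != NULL && buf != NULL);
	len = strlen(path);
	/* Drop trailing slashes. */
	while (len && path[len - 1] == '/')
		--len;
	/* Locate the start of the last path component. */
	for (off = len; off && path[off - 1] != '/'; --off)
		;
	/* Empty component: the path was "" or consisted only of slashes. */
	if (len == off)
		return NULL;
	/* Refuse to suffix "." or "..". */
	for (i = 0; bad[i]; ++i)
		if (strlen(bad[i]) == len - off &&
		    !strncmp(path + off, bad[i], len - off))
			return NULL;
	/* snprintf() reports the length it needed, so truncation shows up
	 * as a return value greater than or equal to the buffer size. */
	i = snprintf(buf, size, "%.*s-glue", (int)len, path);
	if (i < 0 || (size_t)i >= size)
		return NULL;
	return buf;
}
```

With this sketch, a path such as "/usr/lib/dpdk/pmds//" yields "/usr/lib/dpdk/pmds-glue", while "/", "./" and "/usr/.." are rejected.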
^ permalink raw reply	[flat|nested] 174+ messages in thread
* Re: [dpdk-dev] [PATCH v4 03/25] common/mlx5: share the mlx5 glue reference
  2020-01-30  8:10         ` Matan Azrad
@ 2020-01-30  8:38           ` Raslan Darawsheh
  0 siblings, 0 replies; 174+ messages in thread
From: Raslan Darawsheh @ 2020-01-30  8:38 UTC (permalink / raw)
  To: Matan Azrad, dev, Slava Ovsiienko
I agree with you,
I'll work on squashing the two patches together during integration.
Kindest regards,
Raslan Darawsheh
> -----Original Message-----
> From: Matan Azrad <matan@mellanox.com>
> Sent: Thursday, January 30, 2020 10:10 AM
> To: Matan Azrad <matan@mellanox.com>; dev@dpdk.org; Slava Ovsiienko
> <viacheslavo@mellanox.com>
> Cc: Raslan Darawsheh <rasland@mellanox.com>
> Subject: RE: [dpdk-dev] [PATCH v4 03/25] common/mlx5: share the mlx5 glue
> reference
> 
> Self-suggestion.
> It makes sense to squash this patch into the previous one, since the glue
> already started to move there.
> 
> Raslan, can we do this during integration, or do we need to resend the whole
> series?
> 
> From: Matan Azrad
> > A new Mellanox vdpa PMD will be added to support vdpa operations by
> > Mellanox adapters.
> >
> > Both, the mlx5 PMD and the vdpa mlx5 PMD should initialize the glue.
> >
> > The glue initialization should be only one per process, so all the
> > mlx5 PMDs using the glue should share the same glue object.
> >
> > Move the glue initialization to be in common/mlx5 library to be
> > initialized by its constructor only once.
> >
> > Signed-off-by: Matan Azrad <matan@mellanox.com>
> > Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> > ---
> >  drivers/common/mlx5/mlx5_common.c | 173
> > +++++++++++++++++++++++++++++++++++++-
> >  drivers/net/mlx5/Makefile         |   9 --
> >  drivers/net/mlx5/meson.build      |   4 -
> >  drivers/net/mlx5/mlx5.c           | 172 +------------------------------------
> >  4 files changed, 173 insertions(+), 185 deletions(-)
> >
> > diff --git a/drivers/common/mlx5/mlx5_common.c
> > b/drivers/common/mlx5/mlx5_common.c
> > index 14ebd30..9c88a63 100644
> > --- a/drivers/common/mlx5/mlx5_common.c
> > +++ b/drivers/common/mlx5/mlx5_common.c
> > @@ -2,16 +2,185 @@
> >   * Copyright 2019 Mellanox Technologies, Ltd
> >   */
> >
> > +#include <dlfcn.h>
> > +#include <unistd.h>
> > +#include <string.h>
> > +
> > +#include <rte_errno.h>
> > +
> >  #include "mlx5_common.h"
> > +#include "mlx5_common_utils.h"
> > +#include "mlx5_glue.h"
> >
> >
> >  int mlx5_common_logtype;
> >
> >
> > -RTE_INIT(rte_mlx5_common_pmd_init)
> > +#ifdef RTE_IBVERBS_LINK_DLOPEN
> > +
> > +/**
> > + * Suffix RTE_EAL_PMD_PATH with "-glue".
> > + *
> > + * This function performs a sanity check on RTE_EAL_PMD_PATH before
> > + * suffixing its last component.
> > + *
> > + * @param buf[out]
> > + *   Output buffer, should be large enough otherwise NULL is returned.
> > + * @param size
> > + *   Size of @p out.
> > + *
> > + * @return
> > + *   Pointer to @p buf or @p NULL in case suffix cannot be appended.
> > + */
> > +static char *
> > +mlx5_glue_path(char *buf, size_t size) {
> > +	static const char *const bad[] = { "/", ".", "..", NULL };
> > +	const char *path = RTE_EAL_PMD_PATH;
> > +	size_t len = strlen(path);
> > +	size_t off;
> > +	int i;
> > +
> > +	while (len && path[len - 1] == '/')
> > +		--len;
> > +	for (off = len; off && path[off - 1] != '/'; --off)
> > +		;
> > +	for (i = 0; bad[i]; ++i)
> > +		if (!strncmp(path + off, bad[i], (int)(len - off)))
> > +			goto error;
> > +	i = snprintf(buf, size, "%.*s-glue", (int)len, path);
> > +	if (i == -1 || (size_t)i >= size)
> > +		goto error;
> > +	return buf;
> > +error:
> > +	RTE_LOG(ERR, PMD, "unable to append \"-glue\" to last component
> > of"
> > +		" RTE_EAL_PMD_PATH (\"" RTE_EAL_PMD_PATH "\"),
> > please"
> > +		" re-configure DPDK");
> > +	return NULL;
> > +}
> > +#endif
> > +
> > +/**
> > + * Initialization routine for run-time dependency on rdma-core.
> > + */
> > +RTE_INIT_PRIO(mlx5_glue_init, CLASS)
> >  {
> > -	/* Initialize driver log type. */
> > +	void *handle = NULL;
> > +
> > +	/* Initialize common log type. */
> >  	mlx5_common_logtype = rte_log_register("pmd.common.mlx5");
> >  	if (mlx5_common_logtype >= 0)
> >  		rte_log_set_level(mlx5_common_logtype,
> > RTE_LOG_NOTICE);
> > +	/*
> > +	 * RDMAV_HUGEPAGES_SAFE tells ibv_fork_init() we intend to use
> > +	 * huge pages. Calling ibv_fork_init() during init allows
> > +	 * applications to use fork() safely for purposes other than
> > +	 * using this PMD, which is not supported in forked processes.
> > +	 */
> > +	setenv("RDMAV_HUGEPAGES_SAFE", "1", 1);
> > +	/* Match the size of Rx completion entry to the size of a cacheline. */
> > +	if (RTE_CACHE_LINE_SIZE == 128)
> > +		setenv("MLX5_CQE_SIZE", "128", 0);
> > +	/*
> > +	 * MLX5_DEVICE_FATAL_CLEANUP tells ibv_destroy functions to
> > +	 * cleanup all the Verbs resources even when the device was
> > removed.
> > +	 */
> > +	setenv("MLX5_DEVICE_FATAL_CLEANUP", "1", 1);
> > +	/* The glue initialization was done earlier by mlx5 common library.
> > +*/ #ifdef RTE_IBVERBS_LINK_DLOPEN
> > +	char glue_path[sizeof(RTE_EAL_PMD_PATH) - 1 + sizeof("-glue")];
> > +	const char *path[] = {
> > +		/*
> > +		 * A basic security check is necessary before trusting
> > +		 * MLX5_GLUE_PATH, which may override
> > RTE_EAL_PMD_PATH.
> > +		 */
> > +		(geteuid() == getuid() && getegid() == getgid() ?
> > +		 getenv("MLX5_GLUE_PATH") : NULL),
> > +		/*
> > +		 * When RTE_EAL_PMD_PATH is set, use its glue-suffixed
> > +		 * variant, otherwise let dlopen() look up libraries on its
> > +		 * own.
> > +		 */
> > +		(*RTE_EAL_PMD_PATH ?
> > +		 mlx5_glue_path(glue_path, sizeof(glue_path)) : ""),
> > +	};
> > +	unsigned int i = 0;
> > +	void **sym;
> > +	const char *dlmsg;
> > +
> > +	while (!handle && i != RTE_DIM(path)) {
> > +		const char *end;
> > +		size_t len;
> > +		int ret;
> > +
> > +		if (!path[i]) {
> > +			++i;
> > +			continue;
> > +		}
> > +		end = strpbrk(path[i], ":;");
> > +		if (!end)
> > +			end = path[i] + strlen(path[i]);
> > +		len = end - path[i];
> > +		ret = 0;
> > +		do {
> > +			char name[ret + 1];
> > +
> > +			ret = snprintf(name, sizeof(name), "%.*s%s"
> > MLX5_GLUE,
> > +				       (int)len, path[i],
> > +				       (!len || *(end - 1) == '/') ? "" : "/");
> > +			if (ret == -1)
> > +				break;
> > +			if (sizeof(name) != (size_t)ret + 1)
> > +				continue;
> > +			DRV_LOG(DEBUG, "Looking for rdma-core glue as "
> > +				"\"%s\"", name);
> > +			handle = dlopen(name, RTLD_LAZY);
> > +			break;
> > +		} while (1);
> > +		path[i] = end + 1;
> > +		if (!*end)
> > +			++i;
> > +	}
> > +	if (!handle) {
> > +		rte_errno = EINVAL;
> > +		dlmsg = dlerror();
> > +		if (dlmsg)
> > +			DRV_LOG(WARNING, "Cannot load glue library: %s",
> > dlmsg);
> > +		goto glue_error;
> > +	}
> > +	sym = dlsym(handle, "mlx5_glue");
> > +	if (!sym || !*sym) {
> > +		rte_errno = EINVAL;
> > +		dlmsg = dlerror();
> > +		if (dlmsg)
> > +			DRV_LOG(ERR, "Cannot resolve glue symbol: %s",
> > dlmsg);
> > +		goto glue_error;
> > +	}
> > +	mlx5_glue = *sym;
> > +#endif /* RTE_IBVERBS_LINK_DLOPEN */
> > +#ifndef NDEBUG
> > +	/* Glue structure must not contain any NULL pointers. */
> > +	{
> > +		unsigned int i;
> > +
> > +		for (i = 0; i != sizeof(*mlx5_glue) / sizeof(void *); ++i)
> > +			assert(((const void *const *)mlx5_glue)[i]);
> > +	}
> > +#endif
> > +	if (strcmp(mlx5_glue->version, MLX5_GLUE_VERSION)) {
> > +		rte_errno = EINVAL;
> > +		DRV_LOG(ERR, "rdma-core glue \"%s\" mismatch: \"%s\" is "
> > +			"required", mlx5_glue->version,
> > MLX5_GLUE_VERSION);
> > +		goto glue_error;
> > +	}
> > +	mlx5_glue->fork_init();
> > +	return;
> > +glue_error:
> > +	if (handle)
> > +		dlclose(handle);
> > +	DRV_LOG(WARNING, "Cannot initialize MLX5 common due to
> > missing"
> > +		" run-time dependency on rdma-core libraries (libibverbs,"
> > +		" libmlx5)");
> > +	mlx5_glue = NULL;
> > +	return;
> >  }
> > diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
> > index
> > a9558ca..dc6b3c8 100644
> > --- a/drivers/net/mlx5/Makefile
> > +++ b/drivers/net/mlx5/Makefile
> > @@ -6,15 +6,6 @@ include $(RTE_SDK)/mk/rte.vars.mk
> >
> >  # Library name.
> >  LIB = librte_pmd_mlx5.a
> > -LIB_GLUE = $(LIB_GLUE_BASE).$(LIB_GLUE_VERSION)
> > -LIB_GLUE_BASE = librte_pmd_mlx5_glue.so -LIB_GLUE_VERSION =
> 20.02.0
> > -
> > -ifeq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
> > -CFLAGS += -DMLX5_GLUE='"$(LIB_GLUE)"'
> > -CFLAGS += -DMLX5_GLUE_VERSION='"$(LIB_GLUE_VERSION)"'
> > -LDLIBS += -ldl
> > -endif
> >
> >  # Sources.
> >  SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5.c diff --git
> > a/drivers/net/mlx5/meson.build b/drivers/net/mlx5/meson.build index
> > f6d0db9..e10ef3a 100644
> > --- a/drivers/net/mlx5/meson.build
> > +++ b/drivers/net/mlx5/meson.build
> > @@ -8,10 +8,6 @@ if not is_linux
> >  	subdir_done()
> >  endif
> >
> > -LIB_GLUE_BASE = 'librte_pmd_mlx5_glue.so'
> > -LIB_GLUE_VERSION = '20.02.0'
> > -LIB_GLUE = LIB_GLUE_BASE + '.' + LIB_GLUE_VERSION
> > -
> >  allow_experimental_apis = true
> >  deps += ['hash', 'common_mlx5']
> >  sources = files(
> > diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index
> > 7cf357d..8fbe826 100644
> > --- a/drivers/net/mlx5/mlx5.c
> > +++ b/drivers/net/mlx5/mlx5.c
> > @@ -7,7 +7,6 @@
> >  #include <unistd.h>
> >  #include <string.h>
> >  #include <assert.h>
> > -#include <dlfcn.h>
> >  #include <stdint.h>
> >  #include <stdlib.h>
> >  #include <errno.h>
> > @@ -3505,138 +3504,6 @@ struct mlx5_flow_id_pool *
> >  		     RTE_PCI_DRV_PROBE_AGAIN,
> >  };
> >
> > -#ifdef RTE_IBVERBS_LINK_DLOPEN
> > -
> > -/**
> > - * Suffix RTE_EAL_PMD_PATH with "-glue".
> > - *
> > - * This function performs a sanity check on RTE_EAL_PMD_PATH before
> > - * suffixing its last component.
> > - *
> > - * @param buf[out]
> > - *   Output buffer, should be large enough otherwise NULL is returned.
> > - * @param size
> > - *   Size of @p out.
> > - *
> > - * @return
> > - *   Pointer to @p buf or @p NULL in case suffix cannot be appended.
> > - */
> > -static char *
> > -mlx5_glue_path(char *buf, size_t size) -{
> > -	static const char *const bad[] = { "/", ".", "..", NULL };
> > -	const char *path = RTE_EAL_PMD_PATH;
> > -	size_t len = strlen(path);
> > -	size_t off;
> > -	int i;
> > -
> > -	while (len && path[len - 1] == '/')
> > -		--len;
> > -	for (off = len; off && path[off - 1] != '/'; --off)
> > -		;
> > -	for (i = 0; bad[i]; ++i)
> > -		if (!strncmp(path + off, bad[i], (int)(len - off)))
> > -			goto error;
> > -	i = snprintf(buf, size, "%.*s-glue", (int)len, path);
> > -	if (i == -1 || (size_t)i >= size)
> > -		goto error;
> > -	return buf;
> > -error:
> > -	DRV_LOG(ERR,
> > -		"unable to append \"-glue\" to last component of"
> > -		" RTE_EAL_PMD_PATH (\"" RTE_EAL_PMD_PATH "\"),"
> > -		" please re-configure DPDK");
> > -	return NULL;
> > -}
> > -
> > -/**
> > - * Initialization routine for run-time dependency on rdma-core.
> > - */
> > -static int
> > -mlx5_glue_init(void)
> > -{
> > -	char glue_path[sizeof(RTE_EAL_PMD_PATH) - 1 + sizeof("-glue")];
> > -	const char *path[] = {
> > -		/*
> > -		 * A basic security check is necessary before trusting
> > -		 * MLX5_GLUE_PATH, which may override
> > RTE_EAL_PMD_PATH.
> > -		 */
> > -		(geteuid() == getuid() && getegid() == getgid() ?
> > -		 getenv("MLX5_GLUE_PATH") : NULL),
> > -		/*
> > -		 * When RTE_EAL_PMD_PATH is set, use its glue-suffixed
> > -		 * variant, otherwise let dlopen() look up libraries on its
> > -		 * own.
> > -		 */
> > -		(*RTE_EAL_PMD_PATH ?
> > -		 mlx5_glue_path(glue_path, sizeof(glue_path)) : ""),
> > -	};
> > -	unsigned int i = 0;
> > -	void *handle = NULL;
> > -	void **sym;
> > -	const char *dlmsg;
> > -
> > -	while (!handle && i != RTE_DIM(path)) {
> > -		const char *end;
> > -		size_t len;
> > -		int ret;
> > -
> > -		if (!path[i]) {
> > -			++i;
> > -			continue;
> > -		}
> > -		end = strpbrk(path[i], ":;");
> > -		if (!end)
> > -			end = path[i] + strlen(path[i]);
> > -		len = end - path[i];
> > -		ret = 0;
> > -		do {
> > -			char name[ret + 1];
> > -
> > -			ret = snprintf(name, sizeof(name), "%.*s%s"
> > MLX5_GLUE,
> > -				       (int)len, path[i],
> > -				       (!len || *(end - 1) == '/') ? "" : "/");
> > -			if (ret == -1)
> > -				break;
> > -			if (sizeof(name) != (size_t)ret + 1)
> > -				continue;
> > -			DRV_LOG(DEBUG, "looking for rdma-core glue as \"%s\"",
> > -				name);
> > -			handle = dlopen(name, RTLD_LAZY);
> > -			break;
> > -		} while (1);
> > -		path[i] = end + 1;
> > -		if (!*end)
> > -			++i;
> > -	}
> > -	if (!handle) {
> > -		rte_errno = EINVAL;
> > -		dlmsg = dlerror();
> > -		if (dlmsg)
> > -			DRV_LOG(WARNING, "cannot load glue library: %s", dlmsg);
> > -		goto glue_error;
> > -	}
> > -	sym = dlsym(handle, "mlx5_glue");
> > -	if (!sym || !*sym) {
> > -		rte_errno = EINVAL;
> > -		dlmsg = dlerror();
> > -		if (dlmsg)
> > -			DRV_LOG(ERR, "cannot resolve glue symbol: %s", dlmsg);
> > -		goto glue_error;
> > -	}
> > -	mlx5_glue = *sym;
> > -	return 0;
> > -glue_error:
> > -	if (handle)
> > -		dlclose(handle);
> > -	DRV_LOG(WARNING,
> > -		"cannot initialize PMD due to missing run-time dependency on"
> > -		" rdma-core libraries (libibverbs, libmlx5)");
> > -	return -rte_errno;
> > -}
> > -
> > -#endif
> > -
> >  /**
> >   * Driver initialization routine.
> >   */
> > @@ -3651,43 +3518,8 @@ struct mlx5_flow_id_pool *
> >  	mlx5_set_ptype_table();
> >  	mlx5_set_cksum_table();
> >  	mlx5_set_swp_types_table();
> > -	/*
> > -	 * RDMAV_HUGEPAGES_SAFE tells ibv_fork_init() we intend to use
> > -	 * huge pages. Calling ibv_fork_init() during init allows
> > -	 * applications to use fork() safely for purposes other than
> > -	 * using this PMD, which is not supported in forked processes.
> > -	 */
> > -	setenv("RDMAV_HUGEPAGES_SAFE", "1", 1);
> > -	/* Match the size of Rx completion entry to the size of a cacheline. */
> > -	if (RTE_CACHE_LINE_SIZE == 128)
> > -		setenv("MLX5_CQE_SIZE", "128", 0);
> > -	/*
> > -	 * MLX5_DEVICE_FATAL_CLEANUP tells ibv_destroy functions to
> > -	 * cleanup all the Verbs resources even when the device was removed.
> > -	 */
> > -	setenv("MLX5_DEVICE_FATAL_CLEANUP", "1", 1);
> > -#ifdef RTE_IBVERBS_LINK_DLOPEN
> > -	if (mlx5_glue_init())
> > -		return;
> > -	assert(mlx5_glue);
> > -#endif
> > -#ifndef NDEBUG
> > -	/* Glue structure must not contain any NULL pointers. */
> > -	{
> > -		unsigned int i;
> > -
> > -		for (i = 0; i != sizeof(*mlx5_glue) / sizeof(void *); ++i)
> > -			assert(((const void *const *)mlx5_glue)[i]);
> > -	}
> > -#endif
> > -	if (strcmp(mlx5_glue->version, MLX5_GLUE_VERSION)) {
> > -		DRV_LOG(ERR,
> > -			"rdma-core glue \"%s\" mismatch: \"%s\" is required",
> > -			mlx5_glue->version, MLX5_GLUE_VERSION);
> > -		return;
> > -	}
> > -	mlx5_glue->fork_init();
> > -	rte_pci_register(&mlx5_driver);
> > +	if (mlx5_glue)
> > +		rte_pci_register(&mlx5_driver);
> >  }
> >
> >  RTE_PMD_EXPORT_NAME(net_mlx5, __COUNTER__);
> > --
> > 1.8.3.1
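For readers following the removed mlx5_glue_path() helper quoted above, here is a
minimal standalone sketch of the same idea: strip trailing slashes, locate the
last path component, refuse names that cannot take a suffix, then append "-glue".
The function name sketch_glue_path() and the list of rejected names are
assumptions made for illustration (the original list is defined outside the
quoted excerpt); this is not the driver's code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Append "-glue" to the last component of pmd_path, ignoring trailing
 * slashes. Returns buf on success, NULL if the last component is not a
 * usable directory name or if buf is too small.
 * The rejected names below are an assumption for this sketch.
 */
static const char *
sketch_glue_path(const char *pmd_path, char *buf, size_t size)
{
	static const char *const bad[] = { "/", ".", "..", NULL };
	size_t len;
	size_t off;
	int i;

	assert(pmd_path != NULL && buf != NULL);
	len = strlen(pmd_path);
	/* Drop trailing slashes. */
	while (len && pmd_path[len - 1] == '/')
		--len;
	/* Find where the last component starts. */
	for (off = len; off && pmd_path[off - 1] != '/'; --off)
		;
	/* An empty component, "." and ".." cannot take a suffix. */
	for (i = 0; bad[i]; ++i)
		if (!strncmp(pmd_path + off, bad[i], len - off))
			return NULL;
	i = snprintf(buf, size, "%.*s-glue", (int)len, pmd_path);
	if (i < 0 || (size_t)i >= size)
		return NULL;
	return buf;
}
```

With a 64-byte buffer, "/usr/lib/dpdk-pmds/" yields "/usr/lib/dpdk-pmds-glue",
while "/" and "/usr/lib/.." are rejected with NULL.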
^ permalink raw reply	[flat|nested] 174+ messages in thread
* Re: [dpdk-dev] [PATCH v4 00/25] Introduce mlx5 common library
  2020-01-29 12:38     ` [dpdk-dev] [PATCH v4 00/25] Introduce mlx5 common library Matan Azrad
                         ` (24 preceding siblings ...)
  2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 25/25] common/mlx5: support ROCE disable through Netlink Matan Azrad
@ 2020-01-30 12:26       ` Raslan Darawsheh
  25 siblings, 0 replies; 174+ messages in thread
From: Raslan Darawsheh @ 2020-01-30 12:26 UTC (permalink / raw)
  To: Matan Azrad, dev, Slava Ovsiienko
Hi,
> -----Original Message-----
> From: Matan Azrad <matan@mellanox.com>
> Sent: Wednesday, January 29, 2020 2:38 PM
> To: dev@dpdk.org; Slava Ovsiienko <viacheslavo@mellanox.com>
> Cc: Raslan Darawsheh <rasland@mellanox.com>
> Subject: [PATCH v4 00/25] Introduce mlx5 common library
> 
> Steps:
> - Prepare net/mlx5 for code sharing.
> - Introduce new common lib for mlx5 devices.
> - Share code from net/mlx5 to common/mlx5.
> 
> v2:
> - Reorder patches into 2 series - this is the first one, for the common
>   directory and vDPA preparation;
>   the second will be sent later for the new vDPA driver part.
> - Fix spelling and per-patch compilation issues.
> - Move to claim_zero instead of pure asserts.
> - Improve title names.
> 
> v3:
> Rebase.
> 
> v4:
> Change devargs argument to get class name.
> Actually only the last 4 patches here were changed.
> 
> Matan Azrad (25):
>   net/mlx5: separate DevX commands interface
>   drivers: introduce mlx5 common library
>   common/mlx5: share the mlx5 glue reference
>   common/mlx5: share mlx5 PCI device detection
>   common/mlx5: share mlx5 devices information
>   common/mlx5: share CQ entry check
>   common/mlx5: add query vDPA DevX capabilities
>   common/mlx5: glue null memory region allocation
>   common/mlx5: support DevX indirect mkey creation
>   common/mlx5: glue event queue query
>   common/mlx5: glue event interrupt commands
>   common/mlx5: glue UAR allocation
>   common/mlx5: add DevX command to create CQ
>   common/mlx5: glue VAR allocation
>   common/mlx5: add DevX virtq commands
>   common/mlx5: add support for DevX QP operations
>   common/mlx5: allow type configuration for DevX RQT
>   common/mlx5: add TIR field constants
>   common/mlx5: add DevX command to modify RQT
>   common/mlx5: get DevX capability for max RQT size
>   net/mlx5: select driver by class device argument
>   net/mlx5: separate Netlink command interface
>   net/mlx5: reduce Netlink commands dependencies
>   common/mlx5: share Netlink commands
>   common/mlx5: support ROCE disable through Netlink
> 
>  MAINTAINERS                                     |    1 +
>  drivers/common/Makefile                         |    4 +
>  drivers/common/meson.build                      |    2 +-
>  drivers/common/mlx5/Makefile                    |  347 ++++
>  drivers/common/mlx5/meson.build                 |  210 ++
>  drivers/common/mlx5/mlx5_common.c               |  332 +++
>  drivers/common/mlx5/mlx5_common.h               |  223 ++
>  drivers/common/mlx5/mlx5_common_utils.h         |   20 +
>  drivers/common/mlx5/mlx5_devx_cmds.c            | 1530 ++++++++++++++
>  drivers/common/mlx5/mlx5_devx_cmds.h            |  351 ++++
>  drivers/common/mlx5/mlx5_glue.c                 | 1296 ++++++++++++
>  drivers/common/mlx5/mlx5_glue.h                 |  305 +++
>  drivers/common/mlx5/mlx5_nl.c                   | 1699 +++++++++++++++
>  drivers/common/mlx5/mlx5_nl.h                   |   63 +
>  drivers/common/mlx5/mlx5_prm.h                  | 2542 +++++++++++++++++++++++
>  drivers/common/mlx5/rte_common_mlx5_version.map |   51 +
>  drivers/net/mlx5/Makefile                       |  307 +--
>  drivers/net/mlx5/meson.build                    |  257 +--
>  drivers/net/mlx5/mlx5.c                         |  197 +-
>  drivers/net/mlx5/mlx5.h                         |  326 +--
>  drivers/net/mlx5/mlx5_defs.h                    |    8 -
>  drivers/net/mlx5/mlx5_devx_cmds.c               |  969 ---------
>  drivers/net/mlx5/mlx5_ethdev.c                  |  161 +-
>  drivers/net/mlx5/mlx5_flow.c                    |   12 +-
>  drivers/net/mlx5/mlx5_flow.h                    |    3 +-
>  drivers/net/mlx5/mlx5_flow_dv.c                 |   12 +-
>  drivers/net/mlx5/mlx5_flow_meter.c              |    2 +
>  drivers/net/mlx5/mlx5_flow_verbs.c              |    7 +-
>  drivers/net/mlx5/mlx5_glue.c                    | 1150 ----------
>  drivers/net/mlx5/mlx5_glue.h                    |  264 ---
>  drivers/net/mlx5/mlx5_mac.c                     |   16 +-
>  drivers/net/mlx5/mlx5_mr.c                      |    3 +-
>  drivers/net/mlx5/mlx5_nl.c                      | 1402 -------------
>  drivers/net/mlx5/mlx5_prm.h                     | 1888 -----------------
>  drivers/net/mlx5/mlx5_rss.c                     |    2 +-
>  drivers/net/mlx5/mlx5_rxmode.c                  |   12 +-
>  drivers/net/mlx5/mlx5_rxq.c                     |    7 +-
>  drivers/net/mlx5/mlx5_rxtx.c                    |    7 +-
>  drivers/net/mlx5/mlx5_rxtx.h                    |   46 +-
>  drivers/net/mlx5/mlx5_rxtx_vec.c                |    5 +-
>  drivers/net/mlx5/mlx5_rxtx_vec.h                |    3 +-
>  drivers/net/mlx5/mlx5_rxtx_vec_altivec.h        |    5 +-
>  drivers/net/mlx5/mlx5_rxtx_vec_neon.h           |    5 +-
>  drivers/net/mlx5/mlx5_rxtx_vec_sse.h            |    5 +-
>  drivers/net/mlx5/mlx5_stats.c                   |    5 +-
>  drivers/net/mlx5/mlx5_txq.c                     |    7 +-
>  drivers/net/mlx5/mlx5_utils.h                   |   79 +-
>  drivers/net/mlx5/mlx5_vlan.c                    |  137 +-
>  mk/rte.app.mk                                   |    1 +
>  49 files changed, 9286 insertions(+), 7000 deletions(-)
>  create mode 100644 drivers/common/mlx5/Makefile
>  create mode 100644 drivers/common/mlx5/meson.build
>  create mode 100644 drivers/common/mlx5/mlx5_common.c
>  create mode 100644 drivers/common/mlx5/mlx5_common.h
>  create mode 100644 drivers/common/mlx5/mlx5_common_utils.h
>  create mode 100644 drivers/common/mlx5/mlx5_devx_cmds.c
>  create mode 100644 drivers/common/mlx5/mlx5_devx_cmds.h
>  create mode 100644 drivers/common/mlx5/mlx5_glue.c
>  create mode 100644 drivers/common/mlx5/mlx5_glue.h
>  create mode 100644 drivers/common/mlx5/mlx5_nl.c
>  create mode 100644 drivers/common/mlx5/mlx5_nl.h
>  create mode 100644 drivers/common/mlx5/mlx5_prm.h
>  create mode 100644 drivers/common/mlx5/rte_common_mlx5_version.map
>  delete mode 100644 drivers/net/mlx5/mlx5_devx_cmds.c
>  delete mode 100644 drivers/net/mlx5/mlx5_glue.c
>  delete mode 100644 drivers/net/mlx5/mlx5_glue.h
>  delete mode 100644 drivers/net/mlx5/mlx5_nl.c
>  delete mode 100644 drivers/net/mlx5/mlx5_prm.h
> 
> --
> 1.8.3.1
Squashed patches 2 and 3,
Series applied to next-net-mlx,
Kindest regards,
Raslan Darawsheh
^ permalink raw reply	[flat|nested] 174+ messages in thread
* Re: [dpdk-dev] [PATCH v2 01/13] drivers: introduce mlx5 vDPA driver
  2020-01-29 10:08   ` [dpdk-dev] [PATCH v2 01/13] drivers: introduce " Matan Azrad
@ 2020-01-30 14:38     ` Maxime Coquelin
  2020-02-01 17:53       ` Matan Azrad
  0 siblings, 1 reply; 174+ messages in thread
From: Maxime Coquelin @ 2020-01-30 14:38 UTC (permalink / raw)
  To: Matan Azrad, dev, Viacheslav Ovsiienko
On 1/29/20 11:08 AM, Matan Azrad wrote:
> Add a new driver to support vDPA operations by Mellanox devices.
> 
> The first Mellanox devices which support vDPA operations are
> ConnectX6DX and Bluefield1 HCA for their PF ports and VF ports.
> 
> This driver depends on rdma-core, like the mlx5 PMD, and it is also
> going to use mlx5 DevX to create HW objects directly by the FW.
> Hence, the common/mlx5 library is linked to the mlx5_vdpa driver.
If possible, I would really appreciate having information on
the versions required for the above dependencies. Even better if it is
also mentioned in the guide.
> This driver will not be compiled by default due to the above
> dependencies.
> 
> Register a new log type for this driver.
> 
> Signed-off-by: Matan Azrad <matan@mellanox.com>
> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> ---
>  MAINTAINERS                                     |   7 +
>  config/common_base                              |   5 +
>  doc/guides/rel_notes/release_20_02.rst          |   5 +
>  doc/guides/vdpadevs/features/mlx5.ini           |  14 ++
>  doc/guides/vdpadevs/index.rst                   |   1 +
>  doc/guides/vdpadevs/mlx5.rst                    | 111 ++++++++++++
>  drivers/common/Makefile                         |   2 +-
>  drivers/common/mlx5/Makefile                    |  17 +-
>  drivers/meson.build                             |   8 +-
>  drivers/vdpa/Makefile                           |   2 +
>  drivers/vdpa/meson.build                        |   3 +-
>  drivers/vdpa/mlx5/Makefile                      |  36 ++++
>  drivers/vdpa/mlx5/meson.build                   |  29 +++
>  drivers/vdpa/mlx5/mlx5_vdpa.c                   | 227 ++++++++++++++++++++++++
>  drivers/vdpa/mlx5/mlx5_vdpa_utils.h             |  20 +++
>  drivers/vdpa/mlx5/rte_pmd_mlx5_vdpa_version.map |   3 +
>  mk/rte.app.mk                                   |  15 +-
>  17 files changed, 488 insertions(+), 17 deletions(-)
>  create mode 100644 doc/guides/vdpadevs/features/mlx5.ini
>  create mode 100644 doc/guides/vdpadevs/mlx5.rst
>  create mode 100644 drivers/vdpa/mlx5/Makefile
>  create mode 100644 drivers/vdpa/mlx5/meson.build
>  create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa.c
>  create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_utils.h
>  create mode 100644 drivers/vdpa/mlx5/rte_pmd_mlx5_vdpa_version.map
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 150d507..f697e9a 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -1103,6 +1103,13 @@ F: drivers/vdpa/ifc/
>  F: doc/guides/vdpadevs/ifc.rst
>  F: doc/guides/vdpadevs/features/ifcvf.ini
>  
> +Mellanox mlx5 vDPA
> +M: Matan Azrad <matan@mellanox.com>
> +M: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> +F: drivers/vdpa/mlx5/
> +F: doc/guides/vdpadevs/mlx5.rst
> +F: doc/guides/vdpadevs/features/mlx5.ini
> +
>  
>  Eventdev Drivers
>  ----------------
> diff --git a/config/common_base b/config/common_base
> index c897dd0..6ea9c63 100644
> --- a/config/common_base
> +++ b/config/common_base
> @@ -366,6 +366,11 @@ CONFIG_RTE_LIBRTE_MLX4_DEBUG=n
>  CONFIG_RTE_LIBRTE_MLX5_PMD=n
>  CONFIG_RTE_LIBRTE_MLX5_DEBUG=n
>  
> +#
> +# Compile vdpa-oriented Mellanox ConnectX-6 & Bluefield (MLX5) PMD
> +#
> +CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD=n
> +
>  # Linking method for mlx4/5 dependency on ibverbs and related libraries
>  # Default linking is dynamic by linker.
>  # Other options are: dynamic by dlopen at run-time, or statically embedded.
> diff --git a/doc/guides/rel_notes/release_20_02.rst b/doc/guides/rel_notes/release_20_02.rst
> index 50e2c14..690e7db 100644
> --- a/doc/guides/rel_notes/release_20_02.rst
> +++ b/doc/guides/rel_notes/release_20_02.rst
> @@ -113,6 +113,11 @@ New Features
>    * Added support for RSS using L3/L4 source/destination only.
>    * Added support for matching on GTP tunnel header item.
>  
> +* **Add new vDPA PMD based on Mellanox devices**
> +
> +  Added a new Mellanox vDPA  (``mlx5_vdpa``) PMD.
> +  See the :doc:`../vdpadevs/mlx5` guide for more details on this driver.
> +
>  * **Updated testpmd application.**
>  
>    Added support for ESP and L2TPv3 over IP rte_flow patterns to the testpmd
> diff --git a/doc/guides/vdpadevs/features/mlx5.ini b/doc/guides/vdpadevs/features/mlx5.ini
> new file mode 100644
> index 0000000..d635bdf
> --- /dev/null
> +++ b/doc/guides/vdpadevs/features/mlx5.ini
> @@ -0,0 +1,14 @@
> +;
> +; Supported features of the 'mlx5' VDPA driver.
> +;
> +; Refer to default.ini for the full list of available driver features.
> +;
> +[Features]
> +Other kdrv           = Y
> +ARMv8                = Y
> +Power8               = Y
> +x86-32               = Y
> +x86-64               = Y
> +Usage doc            = Y
> +Design doc           = Y
> +
> diff --git a/doc/guides/vdpadevs/index.rst b/doc/guides/vdpadevs/index.rst
> index 9657108..1a13efe 100644
> --- a/doc/guides/vdpadevs/index.rst
> +++ b/doc/guides/vdpadevs/index.rst
> @@ -13,3 +13,4 @@ which can be used from an application through vhost API.
>  
>      features_overview
>      ifc
> +    mlx5
> diff --git a/doc/guides/vdpadevs/mlx5.rst b/doc/guides/vdpadevs/mlx5.rst
> new file mode 100644
> index 0000000..1861e71
> --- /dev/null
> +++ b/doc/guides/vdpadevs/mlx5.rst
> @@ -0,0 +1,111 @@
> +..  SPDX-License-Identifier: BSD-3-Clause
> +    Copyright 2019 Mellanox Technologies, Ltd
> +
> +MLX5 vDPA driver
> +================
> +
> +The MLX5 vDPA (vhost data path acceleration) driver library
> +(**librte_pmd_mlx5_vdpa**) provides support for **Mellanox ConnectX-6**,
> +**Mellanox ConnectX-6DX** and **Mellanox BlueField** families of
> +10/25/40/50/100/200 Gb/s adapters as well as their virtual functions (VF) in
> +SR-IOV context.
> +
> +.. note::
> +
> +   Due to external dependencies, this driver is disabled in default
> +   configuration of the "make" build. It can be enabled with
> +   ``CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD=y`` or by using "meson" build system which
> +   will detect dependencies.
> +
> +
> +Design
> +------
> +
> +For security reasons and robustness, this driver only deals with virtual
> +memory addresses. The way resources allocations are handled by the kernel,
> +combined with hardware specifications that allow to handle virtual memory
> +addresses directly, ensure that DPDK applications cannot access random
> +physical memory (or memory that does not belong to the current process).
> +
> +The PMD can use libibverbs and libmlx5 to access the device firmware
> +or directly the hardware components.
> +There are different levels of objects and bypassing abilities
> +to get the best performances:
> +
> +- Verbs is a complete high-level generic API
> +- Direct Verbs is a device-specific API
> +- DevX allows to access firmware objects
> +- Direct Rules manages flow steering at low-level hardware layer
> +
> +Enabling librte_pmd_mlx5_vdpa causes DPDK applications to be linked against
> +libibverbs.
> +
> +A Mellanox mlx5 PCI device can be probed by either net/mlx5 driver or vdpa/mlx5
> +driver but not in parallel. Hence, the user should decide the driver by the
> +``class`` parameter in the device argument list.
> +By default, the mlx5 device will be probed by the net/mlx5 driver. 
> +
> +Supported NICs
> +--------------
> +
> +* Mellanox(R) ConnectX(R)-6 200G MCX654106A-HCAT (4x200G)
> +* Mellanox(R) ConnectX(R)-6DX EN 100G MCX623106AN-CDAT (2*100G)
> +* Mellanox(R) ConnectX(R)-6DX EN 200G MCX623105AN-VDAT (1*200G)
> +* Mellanox(R) BlueField SmartNIC 25G MBF1M332A-ASCAT (2*25G)
> +
> +Prerequisites
> +-------------
> +
> +- Mellanox OFED version: **4.7**
> +  see :doc:`../../nics/mlx5` guide for more Mellanox OFED details.
> +
> +Compilation options
> +~~~~~~~~~~~~~~~~~~~
> +
> +These options can be modified in the ``.config`` file.
> +
> +- ``CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD`` (default **n**)
> +
> +  Toggle compilation of librte_pmd_mlx5 itself.
> +
> +- ``CONFIG_RTE_IBVERBS_LINK_DLOPEN`` (default **n**)
> +
> +  Build PMD with additional code to make it loadable without hard
> +  dependencies on **libibverbs** nor **libmlx5**, which may not be installed
> +  on the target system.
> +
> +  In this mode, their presence is still required for it to run properly,
> +  however their absence won't prevent a DPDK application from starting (with
> +  ``CONFIG_RTE_BUILD_SHARED_LIB`` disabled) and they won't show up as
> +  missing with ``ldd(1)``.
> +
> +  It works by moving these dependencies to a purpose-built rdma-core "glue"
> +  plug-in which must either be installed in a directory whose name is based
> +  on ``CONFIG_RTE_EAL_PMD_PATH`` suffixed with ``-glue`` if set, or in a
> +  standard location for the dynamic linker (e.g. ``/lib``) if left to the
> +  default empty string (``""``).
> +
> +  This option has no performance impact.
> +
> +- ``CONFIG_RTE_IBVERBS_LINK_STATIC`` (default **n**)
> +
> +  Embed static flavor of the dependencies **libibverbs** and **libmlx5**
> +  in the PMD shared library or the executable static binary.
> +
> +.. note::
> +
> +   For BlueField, target should be set to ``arm64-bluefield-linux-gcc``. This
> +   will enable ``CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD`` and set
> +   ``RTE_CACHE_LINE_SIZE`` to 64. Default armv8a configuration of make build and
> +   meson build set it to 128 then brings performance degradation.
> +
> +Run-time configuration
> +~~~~~~~~~~~~~~~~~~~~~~
> +
> +- **ethtool** operations on related kernel interfaces also affect the PMD.
> +
> +- ``class`` parameter [string]
> +
> +  Select the class of the driver that should probe the device.
> +  `vdpa` for the mlx5 vDPA driver.
> +
> diff --git a/drivers/common/Makefile b/drivers/common/Makefile
> index 4775d4b..96bd7ac 100644
> --- a/drivers/common/Makefile
> +++ b/drivers/common/Makefile
> @@ -35,7 +35,7 @@ ifneq (,$(findstring y,$(IAVF-y)))
>  DIRS-y += iavf
>  endif
>  
> -ifeq ($(CONFIG_RTE_LIBRTE_MLX5_PMD),y)
> +ifeq ($(findstring y,$(CONFIG_RTE_LIBRTE_MLX5_PMD)$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD)),y)
>  DIRS-y += mlx5
>  endif
>  
> diff --git a/drivers/common/mlx5/Makefile b/drivers/common/mlx5/Makefile
> index 9d4d81f..c4b7999 100644
> --- a/drivers/common/mlx5/Makefile
> +++ b/drivers/common/mlx5/Makefile
> @@ -10,15 +10,16 @@ LIB_GLUE_BASE = librte_pmd_mlx5_glue.so
>  LIB_GLUE_VERSION = 20.02.0
>  
>  # Sources.
> +ifeq ($(findstring y,$(CONFIG_RTE_LIBRTE_MLX5_PMD)$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD)),y)
>  ifneq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
> -SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_glue.c
> +SRCS-y += mlx5_glue.c
>  endif
> -SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_devx_cmds.c
> -SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_common.c
> -SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_nl.c
> -
> +SRCS-y += mlx5_devx_cmds.c
> +SRCS-y += mlx5_common.c
> +SRCS-y += mlx5_nl.c
>  ifeq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
> -INSTALL-$(CONFIG_RTE_LIBRTE_MLX5_PMD)-lib += $(LIB_GLUE)
> +INSTALL-y-lib += $(LIB_GLUE)
> +endif
>  endif
>  
>  # Basic CFLAGS.
> @@ -317,7 +318,9 @@ mlx5_autoconf.h: mlx5_autoconf.h.new
>  		cmp '$<' '$@' $(AUTOCONF_OUTPUT) || \
>  		mv '$<' '$@'
>  
> -$(SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD):.c=.o): mlx5_autoconf.h
> +ifeq ($(findstring y,$(CONFIG_RTE_LIBRTE_MLX5_PMD)$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD)),y)
> +$(SRCS-y:.c=.o): mlx5_autoconf.h
> +endif
>  
>  # Generate dependency plug-in for rdma-core when the PMD must not be linked
>  # directly, so that applications do not inherit this dependency.
> diff --git a/drivers/meson.build b/drivers/meson.build
> index 29708cc..bd154fa 100644
> --- a/drivers/meson.build
> +++ b/drivers/meson.build
> @@ -42,6 +42,7 @@ foreach class:dpdk_driver_classes
>  		build = true # set to false to disable, e.g. missing deps
>  		reason = '<unknown reason>' # set if build == false to explain
>  		name = drv
> +		fmt_name = ''
>  		allow_experimental_apis = false
>  		sources = []
>  		objs = []
> @@ -98,8 +99,11 @@ foreach class:dpdk_driver_classes
>  		else
>  			class_drivers += name
>  
> -			dpdk_conf.set(config_flag_fmt.format(name.to_upper()),1)
> -			lib_name = driver_name_fmt.format(name)
> +			if fmt_name == ''
> +				fmt_name = name
> +			endif
> +			dpdk_conf.set(config_flag_fmt.format(fmt_name.to_upper()),1)
> +			lib_name = driver_name_fmt.format(fmt_name)
>  
>  			if allow_experimental_apis
>  				cflags += '-DALLOW_EXPERIMENTAL_API'
> diff --git a/drivers/vdpa/Makefile b/drivers/vdpa/Makefile
> index b5a7a11..6e88359 100644
> --- a/drivers/vdpa/Makefile
> +++ b/drivers/vdpa/Makefile
> @@ -7,4 +7,6 @@ ifeq ($(CONFIG_RTE_EAL_VFIO),y)
>  DIRS-$(CONFIG_RTE_LIBRTE_IFC_PMD) += ifc
>  endif
>  
> +DIRS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5
> +
>  include $(RTE_SDK)/mk/rte.subdir.mk
> diff --git a/drivers/vdpa/meson.build b/drivers/vdpa/meson.build
> index 2f047b5..e3ed54a 100644
> --- a/drivers/vdpa/meson.build
> +++ b/drivers/vdpa/meson.build
> @@ -1,7 +1,8 @@
>  # SPDX-License-Identifier: BSD-3-Clause
>  # Copyright 2019 Mellanox Technologies, Ltd
>  
> -drivers = ['ifc']
> +drivers = ['ifc',
> +	   'mlx5',]
>  std_deps = ['bus_pci', 'kvargs']
>  std_deps += ['vhost']
>  config_flag_fmt = 'RTE_LIBRTE_@0@_PMD'
> diff --git a/drivers/vdpa/mlx5/Makefile b/drivers/vdpa/mlx5/Makefile
> new file mode 100644
> index 0000000..c1c8cc0
> --- /dev/null
> +++ b/drivers/vdpa/mlx5/Makefile
> @@ -0,0 +1,36 @@
> +#   SPDX-License-Identifier: BSD-3-Clause
> +#   Copyright 2019 Mellanox Technologies, Ltd
> +
> +include $(RTE_SDK)/mk/rte.vars.mk
> +
> +# Library name.
> +LIB = librte_pmd_mlx5_vdpa.a
> +
> +# Sources.
> +SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa.c
> +
> +# Basic CFLAGS.
> +CFLAGS += -O3
> +CFLAGS += -std=c11 -Wall -Wextra
> +CFLAGS += -g
> +CFLAGS += -I$(RTE_SDK)/drivers/common/mlx5
> +CFLAGS += -I$(RTE_SDK)/drivers/net/mlx5_vdpa
> +CFLAGS += -I$(BUILDDIR)/drivers/common/mlx5
> +CFLAGS += -D_BSD_SOURCE
> +CFLAGS += -D_DEFAULT_SOURCE
> +CFLAGS += -D_XOPEN_SOURCE=600
> +CFLAGS += $(WERROR_FLAGS)
> +CFLAGS += -Wno-strict-prototypes
> +LDLIBS += -lrte_common_mlx5
> +LDLIBS += -lrte_eal -lrte_vhost -lrte_kvargs -lrte_bus_pci
> +
> +# A few warnings cannot be avoided in external headers.
> +CFLAGS += -Wno-error=cast-qual
> +
> +EXPORT_MAP := rte_pmd_mlx5_vdpa_version.map
> +# memseg walk is not part of stable API
> +CFLAGS += -DALLOW_EXPERIMENTAL_API
> +
> +CFLAGS += -DNDEBUG -UPEDANTIC
> +
> +include $(RTE_SDK)/mk/rte.lib.mk
> diff --git a/drivers/vdpa/mlx5/meson.build b/drivers/vdpa/mlx5/meson.build
> new file mode 100644
> index 0000000..4bca6ea
> --- /dev/null
> +++ b/drivers/vdpa/mlx5/meson.build
> @@ -0,0 +1,29 @@
> +# SPDX-License-Identifier: BSD-3-Clause
> +# Copyright 2019 Mellanox Technologies, Ltd
> +
> +if not is_linux
> +	build = false
> +	reason = 'only supported on Linux'
> +	subdir_done()
> +endif
> +
> +fmt_name = 'mlx5_vdpa'
> +allow_experimental_apis = true
> +deps += ['hash', 'common_mlx5', 'vhost', 'bus_pci', 'eal']
> +sources = files(
> +	'mlx5_vdpa.c',
> +)
> +cflags_options = [
> +	'-std=c11',
> +	'-Wno-strict-prototypes',
> +	'-D_BSD_SOURCE',
> +	'-D_DEFAULT_SOURCE',
> +	'-D_XOPEN_SOURCE=600'
> +]
> +foreach option:cflags_options
> +	if cc.has_argument(option)
> +		cflags += option
> +	endif
> +endforeach
> +
> +cflags += [ '-DNDEBUG', '-UPEDANTIC' ]
> diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
> new file mode 100644
> index 0000000..6286d7a
> --- /dev/null
> +++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
> @@ -0,0 +1,227 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright 2019 Mellanox Technologies, Ltd
> + */
> +#include <rte_malloc.h>
> +#include <rte_log.h>
> +#include <rte_errno.h>
> +#include <rte_bus_pci.h>
> +#include <rte_vdpa.h>
> +
> +#include <mlx5_glue.h>
> +#include <mlx5_common.h>
> +
> +#include "mlx5_vdpa_utils.h"
> +
> +
> +struct mlx5_vdpa_priv {
> +	TAILQ_ENTRY(mlx5_vdpa_priv) next;
> +	int id; /* vDPA device id. */
> +	struct ibv_context *ctx; /* Device context. */
> +	struct rte_vdpa_dev_addr dev_addr;
> +};
> +
> +TAILQ_HEAD(mlx5_vdpa_privs, mlx5_vdpa_priv) priv_list =
> +					      TAILQ_HEAD_INITIALIZER(priv_list);
> +static pthread_mutex_t priv_list_lock = PTHREAD_MUTEX_INITIALIZER;
> +int mlx5_vdpa_logtype;
> +
> +static struct rte_vdpa_dev_ops mlx5_vdpa_ops = {
> +	.get_queue_num = NULL,
> +	.get_features = NULL,
> +	.get_protocol_features = NULL,
> +	.dev_conf = NULL,
> +	.dev_close = NULL,
> +	.set_vring_state = NULL,
> +	.set_features = NULL,
> +	.migration_done = NULL,
> +	.get_vfio_group_fd = NULL,
> +	.get_vfio_device_fd = NULL,
> +	.get_notify_area = NULL,
> +};
> +
> +/**
> + * DPDK callback to register a PCI device.
> + *
> + * This function spawns vdpa device out of a given PCI device.
> + *
> + * @param[in] pci_drv
> + *   PCI driver structure (mlx5_vpda_driver).
> + * @param[in] pci_dev
> + *   PCI device information.
> + *
> + * @return
> + *   0 on success, a negative errno value otherwise and rte_errno is set.
> + */
> +static int
> +mlx5_vdpa_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
> +		    struct rte_pci_device *pci_dev __rte_unused)
> +{
> +	struct ibv_device **ibv_list;
> +	struct ibv_device *ibv_match = NULL;
> +	struct mlx5_vdpa_priv *priv = NULL;
> +	struct ibv_context *ctx = NULL;
> +	int ret;
> +
> +	if (mlx5_class_get(pci_dev->device.devargs) != MLX5_CLASS_VDPA) {
> +		DRV_LOG(DEBUG, "Skip probing - should be probed by other mlx5"
> +			" driver.");
> +		return 1;
Maybe the function doc should be updated to cover the return 1 case:
* @return
*   0 on success, a negative errno value otherwise and rte_errno is set.
> +	}
> +	errno = 0;
> +	ibv_list = mlx5_glue->get_device_list(&ret);
> +	if (!ibv_list) {
> +		rte_errno = errno;
> +		DRV_LOG(ERR, "Failed to get device list, is ib_uverbs loaded?");
> +		return -ENOSYS;
Shouldn't this return -rte_errno, to be consistent with the rest of the
function?
For the sake of consistency, you could also goto error instead.
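A minimal sketch of what the two suggestions above could look like together,
assuming hypothetical stand-ins: sketch_errno plays the role of rte_errno,
struct sketch_steps only exists to force each step to fail, and malloc()
stands in for opening the device and allocating the private data. It is an
illustration of the pattern, not the driver's code.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Stand-in for rte_errno in this sketch. */
static int sketch_errno;

/* Hypothetical knobs that let a caller force each step to fail. */
struct sketch_steps {
	int class_is_vdpa; /* 0: device belongs to another mlx5 driver. */
	int list_ok;       /* 0: device list lookup fails. */
	int alloc_ok;      /* 0: private data allocation fails. */
};

/**
 * Probe sketch.
 *
 * @return
 *   0 on success, 1 if the device should be probed by another driver,
 *   a negative errno value otherwise and sketch_errno is set.
 */
static int
sketch_probe(const struct sketch_steps *steps)
{
	void *ctx = NULL;
	void *priv = NULL;

	assert(steps != NULL);
	if (!steps->class_is_vdpa)
		return 1; /* Not an error: let another driver take it. */
	if (!steps->list_ok) {
		sketch_errno = ENOSYS;
		goto error;
	}
	ctx = malloc(1); /* Stands in for opening the device. */
	if (ctx == NULL) {
		sketch_errno = ENODEV;
		goto error;
	}
	priv = steps->alloc_ok ? malloc(1) : NULL;
	if (priv == NULL) {
		sketch_errno = ENOMEM;
		goto error;
	}
	/* A real driver would keep priv and ctx; the sketch releases them. */
	free(priv);
	free(ctx);
	return 0;
error:
	/* Single exit for every failure; free(NULL) is a no-op. */
	free(priv);
	free(ctx);
	return -sketch_errno;
}
```

Every failure then leaves through the same label and returns the negated
error variable, and the skip case is documented separately as 1.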
> +	}
> +	while (ret-- > 0) {
> +		struct rte_pci_addr pci_addr;
> +
> +		DRV_LOG(DEBUG, "Checking device \"%s\"..", ibv_list[ret]->name);
> +		if (mlx5_dev_to_pci_addr(ibv_list[ret]->ibdev_path, &pci_addr))
> +			continue;
> +		if (pci_dev->addr.domain != pci_addr.domain ||
> +		    pci_dev->addr.bus != pci_addr.bus ||
> +		    pci_dev->addr.devid != pci_addr.devid ||
> +		    pci_dev->addr.function != pci_addr.function)
> +			continue;
> +		DRV_LOG(INFO, "PCI information matches for device \"%s\".",
> +			ibv_list[ret]->name);
> +		ibv_match = ibv_list[ret];
> +		break;
> +	}
> +	mlx5_glue->free_device_list(ibv_list);
> +	if (!ibv_match) {
> +		DRV_LOG(ERR, "No matching IB device for PCI slot "
> +			"%" SCNx32 ":%" SCNx8 ":%" SCNx8 ".%" SCNx8 ".",
> +			pci_dev->addr.domain, pci_dev->addr.bus,
> +			pci_dev->addr.devid, pci_dev->addr.function);
> +		rte_errno = ENOENT;
> +		return -rte_errno;
> +	}
> +	ctx = mlx5_glue->dv_open_device(ibv_match);
> +	if (!ctx) {
> +		DRV_LOG(ERR, "Failed to open IB device \"%s\".",
> +			ibv_match->name);
> +		rte_errno = ENODEV;
> +		return -rte_errno;
> +	}
> +	priv = rte_zmalloc("mlx5 vDPA device private", sizeof(*priv),
> +			   RTE_CACHE_LINE_SIZE);
> +	if (!priv) {
> +		DRV_LOG(ERR, "Failed to allocate private memory.");
> +		rte_errno = ENOMEM;
> +		goto error;
> +	}
> +	priv->ctx = ctx;
> +	priv->dev_addr.pci_addr = pci_dev->addr;
> +	priv->dev_addr.type = PCI_ADDR;
> +	priv->id = rte_vdpa_register_device(&priv->dev_addr, &mlx5_vdpa_ops);
> +	if (priv->id < 0) {
> +		DRV_LOG(ERR, "Failed to register vDPA device.");
> +		rte_errno = rte_errno ? rte_errno : EINVAL;
> +		goto error;
> +	}
> +	pthread_mutex_lock(&priv_list_lock);
> +	TAILQ_INSERT_TAIL(&priv_list, priv, next);
> +	pthread_mutex_unlock(&priv_list_lock);
> +	return 0;
> +
> +error:
> +	if (priv)
> +		rte_free(priv);
> +	if (ctx)
> +		mlx5_glue->close_device(ctx);
> +	return -rte_errno;
> +}
> +
These are minor comments.
If directly fixed in v3, feel free to add my:
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Thanks,
Maxime
^ permalink raw reply	[flat|nested] 174+ messages in thread
* Re: [dpdk-dev] [PATCH v2 02/13] vdpa/mlx5: support queues number operation
  2020-01-29 10:08   ` [dpdk-dev] [PATCH v2 02/13] vdpa/mlx5: support queues number operation Matan Azrad
@ 2020-01-30 14:46     ` Maxime Coquelin
  0 siblings, 0 replies; 174+ messages in thread
From: Maxime Coquelin @ 2020-01-30 14:46 UTC (permalink / raw)
  To: Matan Azrad, dev, Viacheslav Ovsiienko
On 1/29/20 11:08 AM, Matan Azrad wrote:
> Support get_queue_num operation to get the maximum number of queues
> supported by the device.
> 
> This number comes from the DevX capabilities.
> 
> Signed-off-by: Matan Azrad <matan@mellanox.com>
> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> ---
>  drivers/vdpa/mlx5/mlx5_vdpa.c | 54 ++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 53 insertions(+), 1 deletion(-)
> 
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Thanks,
Maxime
^ permalink raw reply	[flat|nested] 174+ messages in thread
* Re: [dpdk-dev] [PATCH v2 03/13] vdpa/mlx5: support features get operations
  2020-01-29 10:08   ` [dpdk-dev] [PATCH v2 03/13] vdpa/mlx5: support features get operations Matan Azrad
@ 2020-01-30 14:50     ` Maxime Coquelin
  0 siblings, 0 replies; 174+ messages in thread
From: Maxime Coquelin @ 2020-01-30 14:50 UTC (permalink / raw)
  To: Matan Azrad, dev, Viacheslav Ovsiienko
On 1/29/20 11:08 AM, Matan Azrad wrote:
> Add support for get_features and get_protocol_features operations.
> 
> Some of the features are reported by the DevX capabilities.
> 
> Signed-off-by: Matan Azrad <matan@mellanox.com>
> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> ---
>  doc/guides/vdpadevs/features/mlx5.ini |  7 ++++
>  drivers/vdpa/mlx5/mlx5_vdpa.c         | 66 +++++++++++++++++++++++++++++++++--
>  2 files changed, 71 insertions(+), 2 deletions(-)
> 
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Thanks,
Maxime
^ permalink raw reply	[flat|nested] 174+ messages in thread
* Re: [dpdk-dev] [PATCH v2 04/13] vdpa/mlx5: prepare memory regions
  2020-01-29 10:09   ` [dpdk-dev] [PATCH v2 04/13] vdpa/mlx5: prepare memory regions Matan Azrad
@ 2020-01-30 17:39     ` Maxime Coquelin
  0 siblings, 0 replies; 174+ messages in thread
From: Maxime Coquelin @ 2020-01-30 17:39 UTC (permalink / raw)
  To: Matan Azrad, dev, Viacheslav Ovsiienko
On 1/29/20 11:09 AM, Matan Azrad wrote:
> In order to map the guest physical addresses used by the virtio device
> guest side to the host physical addresses used by the HW as the host
> side, memory regions are created.
> 
> This way, for example, the HW can translate the addresses of the
> packets posted by the guest and take the packets from the correct
> place.
> 
> The design is to work with a single MR which will be configured to the
> virtio queues in the HW, hence many direct MRs are grouped into a single
> indirect MR.
> 
> Create functions to prepare and release MRs with all the related
> resources that are required for it.
> 
> Create a new file mlx5_vdpa_mem.c to manage all the MR related code
> in the driver.
> 
> Signed-off-by: Matan Azrad <matan@mellanox.com>
> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> ---
>  drivers/vdpa/mlx5/Makefile        |   4 +-
>  drivers/vdpa/mlx5/meson.build     |   3 +-
>  drivers/vdpa/mlx5/mlx5_vdpa.c     |  11 +-
>  drivers/vdpa/mlx5/mlx5_vdpa.h     |  60 +++++++
>  drivers/vdpa/mlx5/mlx5_vdpa_mem.c | 346 ++++++++++++++++++++++++++++++++++++++
>  5 files changed, 413 insertions(+), 11 deletions(-)
>  create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa.h
>  create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_mem.c
> 
I don't have enough knowledge of Mellanox HW to find any issues in the
mem regions programming, but it looks correct from a C point of view:
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Thanks,
Maxime
^ permalink raw reply	[flat|nested] 174+ messages in thread
* Re: [dpdk-dev] [PATCH v2 05/13] vdpa/mlx5: prepare HW queues
  2020-01-29 10:09   ` [dpdk-dev] [PATCH v2 05/13] vdpa/mlx5: prepare HW queues Matan Azrad
@ 2020-01-30 18:17     ` Maxime Coquelin
  2020-01-31  6:56       ` Matan Azrad
  0 siblings, 1 reply; 174+ messages in thread
From: Maxime Coquelin @ 2020-01-30 18:17 UTC (permalink / raw)
  To: Matan Azrad, dev, Viacheslav Ovsiienko
On 1/29/20 11:09 AM, Matan Azrad wrote:
> As preparation for the virtio queues creation, 2 QPs and a CQ may be
> created for each virtio queue.
> 
> The design is to trigger an event for the guest and for the vdpa driver
> when a new CQE is posted by the HW after the packet transition.
> 
> This patch adds the basic operations to create and destroy the above HW
> objects and to trigger the CQE events when a new CQE is posted.
> 
> Signed-off-by: Matan Azrad <matan@mellanox.com>
> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> ---
>  drivers/common/mlx5/mlx5_prm.h      |   4 +
>  drivers/vdpa/mlx5/Makefile          |   1 +
>  drivers/vdpa/mlx5/meson.build       |   1 +
>  drivers/vdpa/mlx5/mlx5_vdpa.h       |  89 ++++++++
>  drivers/vdpa/mlx5/mlx5_vdpa_event.c | 399 ++++++++++++++++++++++++++++++++++++
>  5 files changed, 494 insertions(+)
>  create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_event.c
> 
> diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
> index b48cd0a..b533798 100644
> --- a/drivers/common/mlx5/mlx5_prm.h
> +++ b/drivers/common/mlx5/mlx5_prm.h
> @@ -392,6 +392,10 @@ struct mlx5_cqe {
>  /* CQE format value. */
>  #define MLX5_COMPRESSED 0x3
>  
> +/* CQ doorbell cmd types. */
> +#define MLX5_CQ_DBR_CMD_SOL_ONLY (1 << 24)
> +#define MLX5_CQ_DBR_CMD_ALL (0 << 24)
> +
>  /* Action type of header modification. */
>  enum {
>  	MLX5_MODIFICATION_TYPE_SET = 0x1,
> diff --git a/drivers/vdpa/mlx5/Makefile b/drivers/vdpa/mlx5/Makefile
> index 5472797..7f13756 100644
> --- a/drivers/vdpa/mlx5/Makefile
> +++ b/drivers/vdpa/mlx5/Makefile
> @@ -9,6 +9,7 @@ LIB = librte_pmd_mlx5_vdpa.a
>  # Sources.
>  SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa.c
>  SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa_mem.c
> +SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa_event.c
>  
>  # Basic CFLAGS.
>  CFLAGS += -O3
> diff --git a/drivers/vdpa/mlx5/meson.build b/drivers/vdpa/mlx5/meson.build
> index 7e5dd95..c609f7c 100644
> --- a/drivers/vdpa/mlx5/meson.build
> +++ b/drivers/vdpa/mlx5/meson.build
> @@ -13,6 +13,7 @@ deps += ['hash', 'common_mlx5', 'vhost', 'bus_pci', 'eal', 'sched']
>  sources = files(
>  	'mlx5_vdpa.c',
>  	'mlx5_vdpa_mem.c',
> +	'mlx5_vdpa_event.c',
>  )
>  cflags_options = [
>  	'-std=c11',
> diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
> index e27baea..30030b7 100644
> --- a/drivers/vdpa/mlx5/mlx5_vdpa.h
> +++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
> @@ -9,9 +9,40 @@
>  
>  #include <rte_vdpa.h>
>  #include <rte_vhost.h>
> +#include <rte_spinlock.h>
> +#include <rte_interrupts.h>
>  
>  #include <mlx5_glue.h>
>  #include <mlx5_devx_cmds.h>
> +#include <mlx5_prm.h>
> +
> +
> +#define MLX5_VDPA_INTR_RETRIES 256
> +#define MLX5_VDPA_INTR_RETRIES_USEC 1000
> +
> +struct mlx5_vdpa_cq {
> +	uint16_t log_desc_n;
> +	uint32_t cq_ci:24;
> +	uint32_t arm_sn:2;
> +	rte_spinlock_t sl;
> +	struct mlx5_devx_obj *cq;
> +	struct mlx5dv_devx_umem *umem_obj;
> +	union {
> +		volatile void *umem_buf;
> +		volatile struct mlx5_cqe *cqes;
> +	};
> +	volatile uint32_t *db_rec;
> +	uint64_t errors;
> +};
> +
> +struct mlx5_vdpa_event_qp {
> +	struct mlx5_vdpa_cq cq;
> +	struct mlx5_devx_obj *fw_qp;
> +	struct mlx5_devx_obj *sw_qp;
> +	struct mlx5dv_devx_umem *umem_obj;
> +	void *umem_buf;
> +	volatile uint32_t *db_rec;
> +};
>  
>  struct mlx5_vdpa_query_mr {
>  	SLIST_ENTRY(mlx5_vdpa_query_mr) next;
> @@ -34,6 +65,10 @@ struct mlx5_vdpa_priv {
>  	uint32_t gpa_mkey_index;
>  	struct ibv_mr *null_mr;
>  	struct rte_vhost_memory *vmem;
> +	uint32_t eqn;
> +	struct mlx5dv_devx_event_channel *eventc;
> +	struct mlx5dv_devx_uar *uar;
> +	struct rte_intr_handle intr_handle;
>  	SLIST_HEAD(mr_list, mlx5_vdpa_query_mr) mr_list;
>  };
>  
> @@ -57,4 +92,58 @@ struct mlx5_vdpa_priv {
>   */
>  int mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv);
>  
> +
> +/**
> + * Create an event QP and all its related resources.
> + *
> + * @param[in] priv
> + *   The vdpa driver private structure.
> + * @param[in] desc_n
> + *   Number of descriptors.
> + * @param[in] callfd
> + *   The guest notification file descriptor.
> + * @param[in/out] eqp
> + *   Pointer to the event QP structure.
> + *
> + * @return
> + *   0 on success, -1 otherwise and rte_errno is set.
> + */
> +int mlx5_vdpa_event_qp_create(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
> +			      int callfd, struct mlx5_vdpa_event_qp *eqp);
> +
> +/**
> + * Destroy an event QP and all its related resources.
> + *
> + * @param[in/out] eqp
> + *   Pointer to the event QP structure.
> + */
> +void mlx5_vdpa_event_qp_destroy(struct mlx5_vdpa_event_qp *eqp);
> +
> +/**
> + * Release all the event global resources.
> + *
> + * @param[in] priv
> + *   The vdpa driver private structure.
> + */
> +void mlx5_vdpa_event_qp_global_release(struct mlx5_vdpa_priv *priv);
> +
> +/**
> + * Setup CQE event.
> + *
> + * @param[in] priv
> + *   The vdpa driver private structure.
> + *
> + * @return
> + *   0 on success, a negative errno value otherwise and rte_errno is set.
> + */
> +int mlx5_vdpa_cqe_event_setup(struct mlx5_vdpa_priv *priv);
> +
> +/**
> + * Unset CQE event .
> + *
> + * @param[in] priv
> + *   The vdpa driver private structure.
> + */
> +void mlx5_vdpa_cqe_event_unset(struct mlx5_vdpa_priv *priv);
> +
>  #endif /* RTE_PMD_MLX5_VDPA_H_ */
> diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_event.c b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
> new file mode 100644
> index 0000000..35518ad
> --- /dev/null
> +++ b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
> @@ -0,0 +1,399 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright 2019 Mellanox Technologies, Ltd
> + */
> +#include <unistd.h>
> +#include <stdint.h>
> +#include <fcntl.h>
> +
> +#include <rte_malloc.h>
> +#include <rte_errno.h>
> +#include <rte_lcore.h>
> +#include <rte_atomic.h>
> +#include <rte_common.h>
> +#include <rte_io.h>
> +
> +#include <mlx5_common.h>
> +
> +#include "mlx5_vdpa_utils.h"
> +#include "mlx5_vdpa.h"
> +
> +
> +void
> +mlx5_vdpa_event_qp_global_release(struct mlx5_vdpa_priv *priv)
> +{
> +	if (priv->uar) {
> +		mlx5_glue->devx_free_uar(priv->uar);
> +		priv->uar = NULL;
> +	}
> +	if (priv->eventc) {
> +		mlx5_glue->devx_destroy_event_channel(priv->eventc);
> +		priv->eventc = NULL;
> +	}
> +	priv->eqn = 0;
> +}
> +
> +/* Prepare all the global resources for all the event objects.*/
> +static int
> +mlx5_vdpa_event_qp_global_prepare(struct mlx5_vdpa_priv *priv)
> +{
> +	uint32_t lcore;
> +
> +	if (priv->eventc)
> +		return 0;
> +	lcore = (uint32_t)rte_lcore_to_cpu_id(-1);
> +	if (mlx5_glue->devx_query_eqn(priv->ctx, lcore, &priv->eqn)) {
> +		rte_errno = errno;
> +		DRV_LOG(ERR, "Failed to query EQ number %d.", rte_errno);
> +		return -1;
> +	}
> +	priv->eventc = mlx5_glue->devx_create_event_channel(priv->ctx,
> +			   MLX5DV_DEVX_CREATE_EVENT_CHANNEL_FLAGS_OMIT_EV_DATA);
> +	if (!priv->eventc) {
> +		rte_errno = errno;
> +		DRV_LOG(ERR, "Failed to create event channel %d.",
> +			rte_errno);
> +		goto error;
> +	}
> +	priv->uar = mlx5_glue->devx_alloc_uar(priv->ctx, 0);
> +	if (!priv->uar) {
> +		rte_errno = errno;
> +		DRV_LOG(ERR, "Failed to allocate UAR.");
> +		goto error;
> +	}
> +	return 0;
> +error:
> +	mlx5_vdpa_event_qp_global_release(priv);
> +	return -1;
> +}
> +
> +static void
> +mlx5_vdpa_cq_destroy(struct mlx5_vdpa_cq *cq)
> +{
> +	if (cq->cq)
> +		claim_zero(mlx5_devx_cmd_destroy(cq->cq));
> +	if (cq->umem_obj)
> +		claim_zero(mlx5_glue->devx_umem_dereg(cq->umem_obj));
> +	if (cq->umem_buf)
> +		rte_free((void *)(uintptr_t)cq->umem_buf);
> +	memset(cq, 0, sizeof(*cq));
> +}
> +
> +static inline void
> +mlx5_vdpa_cq_arm(struct mlx5_vdpa_priv *priv, struct mlx5_vdpa_cq *cq)
> +{
> +	const unsigned int cqe_mask = (1 << cq->log_desc_n) - 1;
> +	uint32_t arm_sn = cq->arm_sn << MLX5_CQ_SQN_OFFSET;
> +	uint32_t cq_ci = cq->cq_ci & MLX5_CI_MASK & cqe_mask;
> +	uint32_t doorbell_hi = arm_sn | MLX5_CQ_DBR_CMD_ALL | cq_ci;
> +	uint64_t doorbell = ((uint64_t)doorbell_hi << 32) | cq->cq->id;
> +	uint64_t db_be = rte_cpu_to_be_64(doorbell);
> +	uint32_t *addr = RTE_PTR_ADD(priv->uar->base_addr, MLX5_CQ_DOORBELL);
> +
> +	rte_io_wmb();
> +	cq->db_rec[MLX5_CQ_ARM_DB] = rte_cpu_to_be_32(doorbell_hi);
> +	rte_wmb();
> +#ifdef RTE_ARCH_64
> +	*(uint64_t *)addr = db_be;
> +#else
> +	*(uint32_t *)addr = db_be;
> +	rte_io_wmb();
> +	*((uint32_t *)addr + 1) = db_be >> 32;
> +#endif
> +	cq->arm_sn++;
> +}
> +
> +static int
> +mlx5_vdpa_cq_create(struct mlx5_vdpa_priv *priv, uint16_t log_desc_n,
> +		    int callfd, struct mlx5_vdpa_cq *cq)
> +{
> +	struct mlx5_devx_cq_attr attr;
> +	size_t pgsize = sysconf(_SC_PAGESIZE);
> +	uint32_t umem_size;
> +	int ret;
> +	uint16_t event_nums[1] = {0};
> +
> +	cq->log_desc_n = log_desc_n;
> +	umem_size = sizeof(struct mlx5_cqe) * (1 << log_desc_n) +
> +							sizeof(*cq->db_rec) * 2;
> +	cq->umem_buf = rte_zmalloc(__func__, umem_size, 4096);
> +	if (!cq->umem_buf) {
> +		DRV_LOG(ERR, "Failed to allocate memory for CQ.");
> +		rte_errno = ENOMEM;
> +		return -ENOMEM;
> +	}
> +	cq->umem_obj = mlx5_glue->devx_umem_reg(priv->ctx,
> +						(void *)(uintptr_t)cq->umem_buf,
> +						umem_size,
> +						IBV_ACCESS_LOCAL_WRITE);
> +	if (!cq->umem_obj) {
> +		DRV_LOG(ERR, "Failed to register umem for CQ.");
> +		goto error;
> +	}
> +	attr.q_umem_valid = 1;
> +	attr.db_umem_valid = 1;
> +	attr.use_first_only = 0;
> +	attr.overrun_ignore = 0;
> +	attr.uar_page_id = priv->uar->page_id;
> +	attr.q_umem_id = cq->umem_obj->umem_id;
> +	attr.q_umem_offset = 0;
> +	attr.db_umem_id = cq->umem_obj->umem_id;
> +	attr.db_umem_offset = sizeof(struct mlx5_cqe) * (1 << log_desc_n);
> +	attr.eqn = priv->eqn;
> +	attr.log_cq_size = log_desc_n;
> +	attr.log_page_size = rte_log2_u32(pgsize);
> +	cq->cq = mlx5_devx_cmd_create_cq(priv->ctx, &attr);
> +	if (!cq->cq)
> +		goto error;
> +	cq->db_rec = RTE_PTR_ADD(cq->umem_buf, (uintptr_t)attr.db_umem_offset);
> +	cq->cq_ci = 0;
> +	rte_spinlock_init(&cq->sl);
> +	/* Subscribe CQ event to the event channel controlled by the driver. */
> +	ret = mlx5_glue->devx_subscribe_devx_event(priv->eventc, cq->cq->obj,
> +						   sizeof(event_nums),
> +						   event_nums,
> +						   (uint64_t)(uintptr_t)cq);
> +	if (ret) {
> +		DRV_LOG(ERR, "Failed to subscribe CQE event.");
> +		rte_errno = errno;
> +		goto error;
> +	}
> +	/* Subscribe CQ event to the guest FD only if it is not in poll mode. */
> +	if (callfd != -1) {
> +		ret = mlx5_glue->devx_subscribe_devx_event_fd(priv->eventc,
> +							      callfd,
> +							      cq->cq->obj, 0);
> +		if (ret) {
> +			DRV_LOG(ERR, "Failed to subscribe CQE event fd.");
> +			rte_errno = errno;
> +			goto error;
> +		}
> +	}
> +	/* First arming. */
> +	mlx5_vdpa_cq_arm(priv, cq);
> +	return 0;
> +error:
> +	mlx5_vdpa_cq_destroy(cq);
> +	return -1;
> +}
> +
> +static inline void __rte_unused
> +mlx5_vdpa_cq_poll(struct mlx5_vdpa_priv *priv __rte_unused,
> +		  struct mlx5_vdpa_cq *cq)
> +{
> +	struct mlx5_vdpa_event_qp *eqp =
> +				container_of(cq, struct mlx5_vdpa_event_qp, cq);
> +	const unsigned int cqe_size = 1 << cq->log_desc_n;
> +	const unsigned int cqe_mask = cqe_size - 1;
> +	int ret;
> +
> +	do {
> +		volatile struct mlx5_cqe *cqe = cq->cqes + (cq->cq_ci &
> +							    cqe_mask);
> +
> +		ret = check_cqe(cqe, cqe_size, cq->cq_ci);
> +		switch (ret) {
> +		case MLX5_CQE_STATUS_ERR:
> +			cq->errors++;
> +			/*fall-through*/
> +		case MLX5_CQE_STATUS_SW_OWN:
> +			cq->cq_ci++;
> +			break;
> +		case MLX5_CQE_STATUS_HW_OWN:
> +		default:
> +			break;
> +		}
> +	} while (ret != MLX5_CQE_STATUS_HW_OWN);
Isn't there a risk of an endless loop here?
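For illustration only, one way to bound such a loop is to cap the number
of entries consumed per call at the ring size. This is a minimal sketch
and not the mlx5 code: toy_cq, toy_cqe_status and toy_cq_poll_bounded are
hypothetical stand-ins, and the per-slot status array replaces the real
check_cqe() ownership test.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the driver's CQE status values. */
enum toy_cqe_status { TOY_CQE_HW_OWN, TOY_CQE_SW_OWN, TOY_CQE_ERR };

struct toy_cq {
	uint16_t log_desc_n;               /* log2 of the ring size */
	uint32_t cq_ci;                    /* consumer index */
	uint64_t errors;                   /* error completions seen */
	const enum toy_cqe_status *status; /* one status per ring slot */
};

/*
 * Consume completions until a HW-owned entry is found, but never more
 * than one full ring per call. Returns the number of entries consumed.
 */
static uint32_t
toy_cq_poll_bounded(struct toy_cq *cq)
{
	const uint32_t ring_size = 1u << cq->log_desc_n;
	const uint32_t mask = ring_size - 1;
	uint32_t done;

	for (done = 0; done < ring_size; done++) {
		enum toy_cqe_status st = cq->status[cq->cq_ci & mask];

		if (st == TOY_CQE_HW_OWN)
			break;
		if (st == TOY_CQE_ERR)
			cq->errors++;
		cq->cq_ci++;
	}
	return done;
}
```

With a cap like this the caller can simply poll again later if the
budget was exhausted, instead of spinning inside a single call.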
> +	rte_io_wmb();
> +	/* Ring CQ doorbell record. */
> +	cq->db_rec[0] = rte_cpu_to_be_32(cq->cq_ci);
> +	rte_io_wmb();
> +	/* Ring SW QP doorbell record. */
> +	eqp->db_rec[0] = rte_cpu_to_be_32(cq->cq_ci + cqe_size);
> +}
> +
^ permalink raw reply	[flat|nested] 174+ messages in thread
* Re: [dpdk-dev] [PATCH v2 06/13] vdpa/mlx5: prepare virtio queues
  2020-01-29 10:09   ` [dpdk-dev] [PATCH v2 06/13] vdpa/mlx5: prepare virtio queues Matan Azrad
@ 2020-01-30 20:00     ` Maxime Coquelin
  2020-01-31  7:34       ` Matan Azrad
  0 siblings, 1 reply; 174+ messages in thread
From: Maxime Coquelin @ 2020-01-30 20:00 UTC (permalink / raw)
  To: Matan Azrad, dev, Viacheslav Ovsiienko
On 1/29/20 11:09 AM, Matan Azrad wrote:
> The HW virtq object represents an emulated context for a VIRTIO_NET
> virtqueue which was created and managed by a VIRTIO_NET driver as
> defined in VIRTIO Specification.
> 
> Add support to prepare and release all the basic HW resources needed
> for the user virtqs emulation according to the rte_vhost configurations.
> 
> This patch prepares the basic configurations needed by DevX commands to
> create a virtq.
> 
> Add new file mlx5_vdpa_virtq.c to manage virtq operations.
> 
> Signed-off-by: Matan Azrad <matan@mellanox.com>
> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> ---
>  drivers/vdpa/mlx5/Makefile          |   1 +
>  drivers/vdpa/mlx5/meson.build       |   1 +
>  drivers/vdpa/mlx5/mlx5_vdpa.c       |   1 +
>  drivers/vdpa/mlx5/mlx5_vdpa.h       |  36 ++++++
>  drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 212 ++++++++++++++++++++++++++++++++++++
>  5 files changed, 251 insertions(+)
>  create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
> 
> diff --git a/drivers/vdpa/mlx5/Makefile b/drivers/vdpa/mlx5/Makefile
> index 7f13756..353e262 100644
> --- a/drivers/vdpa/mlx5/Makefile
> +++ b/drivers/vdpa/mlx5/Makefile
> @@ -10,6 +10,7 @@ LIB = librte_pmd_mlx5_vdpa.a
>  SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa.c
>  SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa_mem.c
>  SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa_event.c
> +SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa_virtq.c
>  
>  # Basic CFLAGS.
>  CFLAGS += -O3
> diff --git a/drivers/vdpa/mlx5/meson.build b/drivers/vdpa/mlx5/meson.build
> index c609f7c..e017f95 100644
> --- a/drivers/vdpa/mlx5/meson.build
> +++ b/drivers/vdpa/mlx5/meson.build
> @@ -14,6 +14,7 @@ sources = files(
>  	'mlx5_vdpa.c',
>  	'mlx5_vdpa_mem.c',
>  	'mlx5_vdpa_event.c',
> +	'mlx5_vdpa_virtq.c',
>  )
>  cflags_options = [
>  	'-std=c11',
> diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
> index c67f93d..4d30b35 100644
> --- a/drivers/vdpa/mlx5/mlx5_vdpa.c
> +++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
> @@ -229,6 +229,7 @@
>  		goto error;
>  	}
>  	SLIST_INIT(&priv->mr_list);
> +	SLIST_INIT(&priv->virtq_list);
>  	pthread_mutex_lock(&priv_list_lock);
>  	TAILQ_INSERT_TAIL(&priv_list, priv, next);
>  	pthread_mutex_unlock(&priv_list_lock);
> diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
> index 30030b7..a7e2185 100644
> --- a/drivers/vdpa/mlx5/mlx5_vdpa.h
> +++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
> @@ -53,6 +53,19 @@ struct mlx5_vdpa_query_mr {
>  	int is_indirect;
>  };
>  
> +struct mlx5_vdpa_virtq {
> +	SLIST_ENTRY(mlx5_vdpa_virtq) next;
> +	uint16_t index;
> +	uint16_t vq_size;
> +	struct mlx5_devx_obj *virtq;
> +	struct mlx5_vdpa_event_qp eqp;
> +	struct {
> +		struct mlx5dv_devx_umem *obj;
> +		void *buf;
> +		uint32_t size;
> +	} umems[3];
> +};
> +
>  struct mlx5_vdpa_priv {
>  	TAILQ_ENTRY(mlx5_vdpa_priv) next;
>  	int id; /* vDPA device id. */
> @@ -69,6 +82,10 @@ struct mlx5_vdpa_priv {
>  	struct mlx5dv_devx_event_channel *eventc;
>  	struct mlx5dv_devx_uar *uar;
>  	struct rte_intr_handle intr_handle;
> +	struct mlx5_devx_obj *td;
> +	struct mlx5_devx_obj *tis;
> +	uint16_t nr_virtqs;
> +	SLIST_HEAD(virtq_list, mlx5_vdpa_virtq) virtq_list;
>  	SLIST_HEAD(mr_list, mlx5_vdpa_query_mr) mr_list;
>  };
>  
> @@ -146,4 +163,23 @@ int mlx5_vdpa_event_qp_create(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
>   */
>  void mlx5_vdpa_cqe_event_unset(struct mlx5_vdpa_priv *priv);
>  
> +/**
> + * Release a virtq and all its related resources.
> + *
> + * @param[in] priv
> + *   The vdpa driver private structure.
> + */
> +void mlx5_vdpa_virtqs_release(struct mlx5_vdpa_priv *priv);
> +
> +/**
> + * Create all the HW virtqs resources and all their related resources.
> + *
> + * @param[in] priv
> + *   The vdpa driver private structure.
> + *
> + * @return
> + *   0 on success, a negative errno value otherwise and rte_errno is set.
> + */
> +int mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv);
> +
>  #endif /* RTE_PMD_MLX5_VDPA_H_ */
> diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
> new file mode 100644
> index 0000000..781bccf
> --- /dev/null
> +++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
> @@ -0,0 +1,212 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright 2019 Mellanox Technologies, Ltd
> + */
> +#include <string.h>
> +
> +#include <rte_malloc.h>
> +#include <rte_errno.h>
> +
> +#include <mlx5_common.h>
> +
> +#include "mlx5_vdpa_utils.h"
> +#include "mlx5_vdpa.h"
> +
> +
> +static int
> +mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
> +{
> +	int i;
> +
> +	if (virtq->virtq) {
> +		claim_zero(mlx5_devx_cmd_destroy(virtq->virtq));
> +		virtq->virtq = NULL;
> +	}
> +	for (i = 0; i < 3; ++i) {
> +		if (virtq->umems[i].obj)
> +			claim_zero(mlx5_glue->devx_umem_dereg
> +							 (virtq->umems[i].obj));
> +		if (virtq->umems[i].buf)
> +			rte_free(virtq->umems[i].buf);
> +	}
> +	memset(&virtq->umems, 0, sizeof(virtq->umems));
> +	if (virtq->eqp.fw_qp)
> +		mlx5_vdpa_event_qp_destroy(&virtq->eqp);
> +	return 0;
> +}
> +
> +void
> +mlx5_vdpa_virtqs_release(struct mlx5_vdpa_priv *priv)
> +{
> +	struct mlx5_vdpa_virtq *entry;
> +	struct mlx5_vdpa_virtq *next;
> +
> +	entry = SLIST_FIRST(&priv->virtq_list);
> +	while (entry) {
> +		next = SLIST_NEXT(entry, next);
> +		mlx5_vdpa_virtq_unset(entry);
> +		SLIST_REMOVE(&priv->virtq_list, entry, mlx5_vdpa_virtq, next);
> +		rte_free(entry);
> +		entry = next;
> +	}
> +	SLIST_INIT(&priv->virtq_list);
> +	if (priv->tis) {
> +		claim_zero(mlx5_devx_cmd_destroy(priv->tis));
> +		priv->tis = NULL;
> +	}
> +	if (priv->td) {
> +		claim_zero(mlx5_devx_cmd_destroy(priv->td));
> +		priv->td = NULL;
> +	}
> +}
> +
> +static uint64_t
> +mlx5_vdpa_hva_to_gpa(struct rte_vhost_memory *mem, uint64_t hva)
> +{
> +	struct rte_vhost_mem_region *reg;
> +	uint32_t i;
> +	uint64_t gpa = 0;
> +
> +	for (i = 0; i < mem->nregions; i++) {
> +		reg = &mem->regions[i];
> +		if (hva >= reg->host_user_addr &&
> +		    hva < reg->host_user_addr + reg->size) {
> +			gpa = hva - reg->host_user_addr + reg->guest_phys_addr;
> +			break;
> +		}
> +	}
> +	return gpa;
> +}
I think you may need a third parameter for the size to map.
Otherwise, you would be vulnerable to CVE-2018-1059.
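A minimal sketch of what a size-aware translation could look like,
assuming simplified region structures. toy_mem_region, toy_memory and
toy_hva_to_gpa are hypothetical names, not the rte_vhost API. The point
is that the whole range [hva, hva + len) must fit inside one region, and
the result is returned through a pointer so that a GPA of 0 is not
confused with a failure.

```c
#include <stdint.h>

/* Hypothetical, simplified view of one guest memory region. */
struct toy_mem_region {
	uint64_t guest_phys_addr;
	uint64_t host_user_addr;
	uint64_t size;
};

struct toy_memory {
	uint32_t nregions;
	const struct toy_mem_region *regions;
};

/*
 * Translate hva to a GPA only if [hva, hva + len) lies in one region.
 * Returns 0 and stores the result in *gpa on success, -1 otherwise.
 */
static int
toy_hva_to_gpa(const struct toy_memory *mem, uint64_t hva, uint64_t len,
	       uint64_t *gpa)
{
	uint32_t i;

	for (i = 0; i < mem->nregions; i++) {
		const struct toy_mem_region *reg = &mem->regions[i];
		uint64_t off;

		if (hva < reg->host_user_addr)
			continue;
		off = hva - reg->host_user_addr;
		/* Compare against the remaining bytes to avoid overflow. */
		if (off < reg->size && len <= reg->size - off) {
			*gpa = reg->guest_phys_addr + off;
			return 0;
		}
	}
	return -1;
}
```

A caller would pass the byte length of the ring being mapped (for
example the descriptor table size) as len.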
> +
> +static int
> +mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv,
> +		      struct mlx5_vdpa_virtq *virtq, int index)
> +{
> +	struct rte_vhost_vring vq;
> +	struct mlx5_devx_virtq_attr attr = {0};
> +	uint64_t gpa;
> +	int ret;
> +	int i;
> +	uint16_t last_avail_idx;
> +	uint16_t last_used_idx;
> +
> +	ret = rte_vhost_get_vhost_vring(priv->vid, index, &vq);
> +	if (ret)
> +		return -1;
> +	virtq->index = index;
> +	virtq->vq_size = vq.size;
> +	/*
> +	 * No need event QPs creation when the guest in poll mode or when the
> +	 * capability allows it.
> +	 */
> +	attr.event_mode = vq.callfd != -1 || !(priv->caps.event_mode & (1 <<
> +					       MLX5_VIRTQ_EVENT_MODE_NO_MSIX)) ?
> +						      MLX5_VIRTQ_EVENT_MODE_QP :
> +						  MLX5_VIRTQ_EVENT_MODE_NO_MSIX;
> +	if (attr.event_mode == MLX5_VIRTQ_EVENT_MODE_QP) {
> +		ret = mlx5_vdpa_event_qp_create(priv, vq.size, vq.callfd,
> +						&virtq->eqp);
> +		if (ret) {
> +			DRV_LOG(ERR, "Failed to create event QPs for virtq %d.",
> +				index);
> +			return -1;
> +		}
> +		attr.qp_id = virtq->eqp.fw_qp->id;
> +	} else {
> +		DRV_LOG(INFO, "Virtq %d is, for sure, working by poll mode, no"
> +			" need event QPs and event mechanism.", index);
> +	}
> +	/* Setup 3 UMEMs for each virtq. */
> +	for (i = 0; i < 3; ++i) {
> +		virtq->umems[i].size = priv->caps.umems[i].a * vq.size +
> +							  priv->caps.umems[i].b;
> +		virtq->umems[i].buf = rte_zmalloc(__func__,
> +						  virtq->umems[i].size, 4096);
> +		if (!virtq->umems[i].buf) {
> +			DRV_LOG(ERR, "Cannot allocate umem %d memory for virtq"
> +				" %u.", i, index);
> +			goto error;
> +		}
> +		virtq->umems[i].obj = mlx5_glue->devx_umem_reg(priv->ctx,
> +							virtq->umems[i].buf,
> +							virtq->umems[i].size,
> +							IBV_ACCESS_LOCAL_WRITE);
> +		if (!virtq->umems[i].obj) {
> +			DRV_LOG(ERR, "Failed to register umem %d for virtq %u.",
> +				i, index);
> +			goto error;
> +		}
> +		attr.umems[i].id = virtq->umems[i].obj->umem_id;
> +		attr.umems[i].offset = 0;
> +		attr.umems[i].size = virtq->umems[i].size;
> +	}
> +	gpa = mlx5_vdpa_hva_to_gpa(priv->vmem, (uint64_t)(uintptr_t)vq.desc);
> +	if (!gpa) {
> +		DRV_LOG(ERR, "Fail to get GPA for descriptor ring.");
> +		goto error;
> +	}
> +	attr.desc_addr = gpa;
> +	gpa = mlx5_vdpa_hva_to_gpa(priv->vmem, (uint64_t)(uintptr_t)vq.used);
> +	if (!gpa) {
> +		DRV_LOG(ERR, "Fail to get GPA for used ring.");
> +		goto error;
> +	}
> +	attr.used_addr = gpa;
> +	gpa = mlx5_vdpa_hva_to_gpa(priv->vmem, (uint64_t)(uintptr_t)vq.avail);
> +	if (!gpa) {
> +		DRV_LOG(ERR, "Fail to get GPA for available ring.");
> +		goto error;
> +	}
> +	attr.available_addr = gpa;
> +	rte_vhost_get_vring_base(priv->vid, index, &last_avail_idx,
> +				 &last_used_idx);
> +	DRV_LOG(INFO, "vid %d: Init last_avail_idx=%d, last_used_idx=%d for "
> +		"virtq %d.", priv->vid, last_avail_idx, last_used_idx, index);
> +	attr.hw_available_index = last_avail_idx;
> +	attr.hw_used_index = last_used_idx;
> +	attr.q_size = vq.size;
> +	attr.mkey = priv->gpa_mkey_index;
> +	attr.tis_id = priv->tis->id;
> +	attr.queue_index = index;
> +	virtq->virtq = mlx5_devx_cmd_create_virtq(priv->ctx, &attr);
> +	if (!virtq->virtq)
> +		goto error;
> +	return 0;
> +error:
> +	mlx5_vdpa_virtq_unset(virtq);
> +	return -1;
> +}
> +
> +int
> +mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
> +{
> +	struct mlx5_devx_tis_attr tis_attr = {0};
> +	struct mlx5_vdpa_virtq *virtq;
> +	uint32_t i;
> +	uint16_t nr_vring = rte_vhost_get_vring_num(priv->vid);
> +
> +	priv->td = mlx5_devx_cmd_create_td(priv->ctx);
> +	if (!priv->td) {
> +		DRV_LOG(ERR, "Failed to create transport domain.");
> +		return -rte_errno;
> +	}
> +	tis_attr.transport_domain = priv->td->id;
> +	priv->tis = mlx5_devx_cmd_create_tis(priv->ctx, &tis_attr);
> +	if (!priv->tis) {
> +		DRV_LOG(ERR, "Failed to create TIS.");
> +		goto error;
> +	}
> +	for (i = 0; i < nr_vring; i++) {
> +		virtq = rte_zmalloc(__func__, sizeof(*virtq), 0);
> +		if (!virtq || mlx5_vdpa_virtq_setup(priv, virtq, i)) {
> +			if (virtq)
> +				rte_free(virtq);
> +			goto error;
> +		}
> +		SLIST_INSERT_HEAD(&priv->virtq_list, virtq, next);
> +	}
> +	priv->nr_virtqs = nr_vring;
> +	return 0;
> +error:
> +	mlx5_vdpa_virtqs_release(priv);
> +	return -1;
> +}
> 
^ permalink raw reply	[flat|nested] 174+ messages in thread
* Re: [dpdk-dev] [PATCH v2 07/13] vdpa/mlx5: support stateless offloads
  2020-01-29 10:09   ` [dpdk-dev] [PATCH v2 07/13] vdpa/mlx5: support stateless offloads Matan Azrad
@ 2020-01-30 20:08     ` Maxime Coquelin
  0 siblings, 0 replies; 174+ messages in thread
From: Maxime Coquelin @ 2020-01-30 20:08 UTC (permalink / raw)
  To: Matan Azrad, dev, Viacheslav Ovsiienko
On 1/29/20 11:09 AM, Matan Azrad wrote:
> Add support for the following features in virtq configuration:
> 	VIRTIO_F_RING_PACKED,
> 	VIRTIO_NET_F_HOST_TSO4,
> 	VIRTIO_NET_F_HOST_TSO6,
> 	VIRTIO_NET_F_CSUM,
> 	VIRTIO_NET_F_GUEST_CSUM,
> 	VIRTIO_F_VERSION_1,
> 
> Support for these features depends on the DevX capabilities reported by
> the device.
> 
> Signed-off-by: Matan Azrad <matan@mellanox.com>
> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> ---
>  doc/guides/vdpadevs/features/mlx5.ini |   7 ++-
>  drivers/vdpa/mlx5/mlx5_vdpa.c         |  10 ----
>  drivers/vdpa/mlx5/mlx5_vdpa.h         |  10 ++++
>  drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   | 108 ++++++++++++++++++++++++++++------
>  4 files changed, 107 insertions(+), 28 deletions(-)
> 
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Maxime
^ permalink raw reply	[flat|nested] 174+ messages in thread
* Re: [dpdk-dev] [PATCH v2 05/13] vdpa/mlx5: prepare HW queues
  2020-01-30 18:17     ` Maxime Coquelin
@ 2020-01-31  6:56       ` Matan Azrad
  2020-01-31 14:47         ` Maxime Coquelin
  0 siblings, 1 reply; 174+ messages in thread
From: Matan Azrad @ 2020-01-31  6:56 UTC (permalink / raw)
  To: Maxime Coquelin, dev, Slava Ovsiienko
Hi
From: Maxime Coquelin
> On 1/29/20 11:09 AM, Matan Azrad wrote:
> > As preparation for the virtio queues creation, 2 QPs and a CQ may be
> > created for each virtio queue.
> >
> > The design is to trigger an event for the guest and for the vdpa
> > driver when a new CQE is posted by the HW after the packet transition.
> >
> > This patch adds the basic operations to create and destroy the above HW
> > objects and to trigger the CQE events when a new CQE is posted.
> >
> > Signed-off-by: Matan Azrad <matan@mellanox.com>
> > Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> > ---
> >  drivers/common/mlx5/mlx5_prm.h      |   4 +
> >  drivers/vdpa/mlx5/Makefile          |   1 +
> >  drivers/vdpa/mlx5/meson.build       |   1 +
> >  drivers/vdpa/mlx5/mlx5_vdpa.h       |  89 ++++++++
> >  drivers/vdpa/mlx5/mlx5_vdpa_event.c | 399 ++++++++++++++++++++++++++++++++++++
> >  5 files changed, 494 insertions(+)
> >  create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_event.c
> >
> > diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
> > index b48cd0a..b533798 100644
> > --- a/drivers/common/mlx5/mlx5_prm.h
> > +++ b/drivers/common/mlx5/mlx5_prm.h
> > @@ -392,6 +392,10 @@ struct mlx5_cqe {
> >  /* CQE format value. */
> >  #define MLX5_COMPRESSED 0x3
> >
> > +/* CQ doorbell cmd types. */
> > +#define MLX5_CQ_DBR_CMD_SOL_ONLY (1 << 24)
> > +#define MLX5_CQ_DBR_CMD_ALL (0 << 24)
> > +
> >  /* Action type of header modification. */
> >  enum {
> >  	MLX5_MODIFICATION_TYPE_SET = 0x1,
> > diff --git a/drivers/vdpa/mlx5/Makefile b/drivers/vdpa/mlx5/Makefile
> > index 5472797..7f13756 100644
> > --- a/drivers/vdpa/mlx5/Makefile
> > +++ b/drivers/vdpa/mlx5/Makefile
> > @@ -9,6 +9,7 @@ LIB = librte_pmd_mlx5_vdpa.a
> >  # Sources.
> >  SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa.c
> >  SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa_mem.c
> > +SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa_event.c
> >
> >  # Basic CFLAGS.
> >  CFLAGS += -O3
> > diff --git a/drivers/vdpa/mlx5/meson.build b/drivers/vdpa/mlx5/meson.build
> > index 7e5dd95..c609f7c 100644
> > --- a/drivers/vdpa/mlx5/meson.build
> > +++ b/drivers/vdpa/mlx5/meson.build
> > @@ -13,6 +13,7 @@ deps += ['hash', 'common_mlx5', 'vhost', 'bus_pci', 'eal', 'sched']
> >  sources = files(
> >  	'mlx5_vdpa.c',
> >  	'mlx5_vdpa_mem.c',
> > +	'mlx5_vdpa_event.c',
> >  )
> >  cflags_options = [
> >  	'-std=c11',
> > diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
> > index e27baea..30030b7 100644
> > --- a/drivers/vdpa/mlx5/mlx5_vdpa.h
> > +++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
> > @@ -9,9 +9,40 @@
> >
> >  #include <rte_vdpa.h>
> >  #include <rte_vhost.h>
> > +#include <rte_spinlock.h>
> > +#include <rte_interrupts.h>
> >
> >  #include <mlx5_glue.h>
> >  #include <mlx5_devx_cmds.h>
> > +#include <mlx5_prm.h>
> > +
> > +
> > +#define MLX5_VDPA_INTR_RETRIES 256
> > +#define MLX5_VDPA_INTR_RETRIES_USEC 1000
> > +
> > +struct mlx5_vdpa_cq {
> > +	uint16_t log_desc_n;
> > +	uint32_t cq_ci:24;
> > +	uint32_t arm_sn:2;
> > +	rte_spinlock_t sl;
> > +	struct mlx5_devx_obj *cq;
> > +	struct mlx5dv_devx_umem *umem_obj;
> > +	union {
> > +		volatile void *umem_buf;
> > +		volatile struct mlx5_cqe *cqes;
> > +	};
> > +	volatile uint32_t *db_rec;
> > +	uint64_t errors;
> > +};
> > +
> > +struct mlx5_vdpa_event_qp {
> > +	struct mlx5_vdpa_cq cq;
> > +	struct mlx5_devx_obj *fw_qp;
> > +	struct mlx5_devx_obj *sw_qp;
> > +	struct mlx5dv_devx_umem *umem_obj;
> > +	void *umem_buf;
> > +	volatile uint32_t *db_rec;
> > +};
> >
> >  struct mlx5_vdpa_query_mr {
> >  	SLIST_ENTRY(mlx5_vdpa_query_mr) next;
> > @@ -34,6 +65,10 @@ struct mlx5_vdpa_priv {
> >  	uint32_t gpa_mkey_index;
> >  	struct ibv_mr *null_mr;
> >  	struct rte_vhost_memory *vmem;
> > +	uint32_t eqn;
> > +	struct mlx5dv_devx_event_channel *eventc;
> > +	struct mlx5dv_devx_uar *uar;
> > +	struct rte_intr_handle intr_handle;
> >  	SLIST_HEAD(mr_list, mlx5_vdpa_query_mr) mr_list;
> >  };
> >
> > @@ -57,4 +92,58 @@ struct mlx5_vdpa_priv {
> >   */
> >  int mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv);
> >
> > +
> > +/**
> > + * Create an event QP and all its related resources.
> > + *
> > + * @param[in] priv
> > + *   The vdpa driver private structure.
> > + * @param[in] desc_n
> > + *   Number of descriptors.
> > + * @param[in] callfd
> > + *   The guest notification file descriptor.
> > + * @param[in/out] eqp
> > + *   Pointer to the event QP structure.
> > + *
> > + * @return
> > + *   0 on success, -1 otherwise and rte_errno is set.
> > + */
> > +int mlx5_vdpa_event_qp_create(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
> > +			      int callfd, struct mlx5_vdpa_event_qp *eqp);
> > +
> > +/**
> > + * Destroy an event QP and all its related resources.
> > + *
> > + * @param[in/out] eqp
> > + *   Pointer to the event QP structure.
> > + */
> > +void mlx5_vdpa_event_qp_destroy(struct mlx5_vdpa_event_qp *eqp);
> > +
> > +/**
> > + * Release all the event global resources.
> > + *
> > + * @param[in] priv
> > + *   The vdpa driver private structure.
> > + */
> > +void mlx5_vdpa_event_qp_global_release(struct mlx5_vdpa_priv *priv);
> > +
> > +/**
> > + * Setup CQE event.
> > + *
> > + * @param[in] priv
> > + *   The vdpa driver private structure.
> > + *
> > + * @return
> > + *   0 on success, a negative errno value otherwise and rte_errno is set.
> > + */
> > +int mlx5_vdpa_cqe_event_setup(struct mlx5_vdpa_priv *priv);
> > +
> > +/**
> > + * Unset CQE event.
> > + *
> > + * @param[in] priv
> > + *   The vdpa driver private structure.
> > + */
> > +void mlx5_vdpa_cqe_event_unset(struct mlx5_vdpa_priv *priv);
> > +
> >  #endif /* RTE_PMD_MLX5_VDPA_H_ */
> > diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_event.c
> > b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
> > new file mode 100644
> > index 0000000..35518ad
> > --- /dev/null
> > +++ b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
> > @@ -0,0 +1,399 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright 2019 Mellanox Technologies, Ltd
> > + */
> > +#include <unistd.h>
> > +#include <stdint.h>
> > +#include <fcntl.h>
> > +
> > +#include <rte_malloc.h>
> > +#include <rte_errno.h>
> > +#include <rte_lcore.h>
> > +#include <rte_atomic.h>
> > +#include <rte_common.h>
> > +#include <rte_io.h>
> > +
> > +#include <mlx5_common.h>
> > +
> > +#include "mlx5_vdpa_utils.h"
> > +#include "mlx5_vdpa.h"
> > +
> > +
> > +void
> > +mlx5_vdpa_event_qp_global_release(struct mlx5_vdpa_priv *priv)
> > +{
> > +	if (priv->uar) {
> > +		mlx5_glue->devx_free_uar(priv->uar);
> > +		priv->uar = NULL;
> > +	}
> > +	if (priv->eventc) {
> > +		mlx5_glue->devx_destroy_event_channel(priv->eventc);
> > +		priv->eventc = NULL;
> > +	}
> > +	priv->eqn = 0;
> > +}
> > +
> > +/* Prepare all the global resources for all the event objects.*/
> > +static int
> > +mlx5_vdpa_event_qp_global_prepare(struct mlx5_vdpa_priv *priv)
> > +{
> > +	uint32_t lcore;
> > +
> > +	if (priv->eventc)
> > +		return 0;
> > +	lcore = (uint32_t)rte_lcore_to_cpu_id(-1);
> > +	if (mlx5_glue->devx_query_eqn(priv->ctx, lcore, &priv->eqn)) {
> > +		rte_errno = errno;
> > +		DRV_LOG(ERR, "Failed to query EQ number %d.", rte_errno);
> > +		return -1;
> > +	}
> > +	priv->eventc = mlx5_glue->devx_create_event_channel(priv->ctx,
> > +			MLX5DV_DEVX_CREATE_EVENT_CHANNEL_FLAGS_OMIT_EV_DATA);
> > +	if (!priv->eventc) {
> > +		rte_errno = errno;
> > +		DRV_LOG(ERR, "Failed to create event channel %d.",
> > +			rte_errno);
> > +		goto error;
> > +	}
> > +	priv->uar = mlx5_glue->devx_alloc_uar(priv->ctx, 0);
> > +	if (!priv->uar) {
> > +		rte_errno = errno;
> > +		DRV_LOG(ERR, "Failed to allocate UAR.");
> > +		goto error;
> > +	}
> > +	return 0;
> > +error:
> > +	mlx5_vdpa_event_qp_global_release(priv);
> > +	return -1;
> > +}
> > +
> > +static void
> > +mlx5_vdpa_cq_destroy(struct mlx5_vdpa_cq *cq)
> > +{
> > +	if (cq->cq)
> > +		claim_zero(mlx5_devx_cmd_destroy(cq->cq));
> > +	if (cq->umem_obj)
> > +		claim_zero(mlx5_glue->devx_umem_dereg(cq->umem_obj));
> > +	if (cq->umem_buf)
> > +		rte_free((void *)(uintptr_t)cq->umem_buf);
> > +	memset(cq, 0, sizeof(*cq));
> > +}
> > +
> > +static inline void
> > +mlx5_vdpa_cq_arm(struct mlx5_vdpa_priv *priv, struct mlx5_vdpa_cq *cq)
> > +{
> > +	const unsigned int cqe_mask = (1 << cq->log_desc_n) - 1;
> > +	uint32_t arm_sn = cq->arm_sn << MLX5_CQ_SQN_OFFSET;
> > +	uint32_t cq_ci = cq->cq_ci & MLX5_CI_MASK & cqe_mask;
> > +	uint32_t doorbell_hi = arm_sn | MLX5_CQ_DBR_CMD_ALL | cq_ci;
> > +	uint64_t doorbell = ((uint64_t)doorbell_hi << 32) | cq->cq->id;
> > +	uint64_t db_be = rte_cpu_to_be_64(doorbell);
> > +	uint32_t *addr = RTE_PTR_ADD(priv->uar->base_addr, MLX5_CQ_DOORBELL);
> > +
> > +	rte_io_wmb();
> > +	cq->db_rec[MLX5_CQ_ARM_DB] = rte_cpu_to_be_32(doorbell_hi);
> > +	rte_wmb();
> > +#ifdef RTE_ARCH_64
> > +	*(uint64_t *)addr = db_be;
> > +#else
> > +	*(uint32_t *)addr = db_be;
> > +	rte_io_wmb();
> > +	*((uint32_t *)addr + 1) = db_be >> 32;
> > +#endif
> > +	cq->arm_sn++;
> > +}
> > +
> > +static int
> > +mlx5_vdpa_cq_create(struct mlx5_vdpa_priv *priv, uint16_t log_desc_n,
> > +		    int callfd, struct mlx5_vdpa_cq *cq)
> > +{
> > +	struct mlx5_devx_cq_attr attr;
> > +	size_t pgsize = sysconf(_SC_PAGESIZE);
> > +	uint32_t umem_size;
> > +	int ret;
> > +	uint16_t event_nums[1] = {0};
> > +
> > +	cq->log_desc_n = log_desc_n;
> > +	umem_size = sizeof(struct mlx5_cqe) * (1 << log_desc_n) +
> > +							sizeof(*cq->db_rec) * 2;
> > +	cq->umem_buf = rte_zmalloc(__func__, umem_size, 4096);
> > +	if (!cq->umem_buf) {
> > +		DRV_LOG(ERR, "Failed to allocate memory for CQ.");
> > +		rte_errno = ENOMEM;
> > +		return -ENOMEM;
> > +	}
> > +	cq->umem_obj = mlx5_glue->devx_umem_reg(priv->ctx,
> > +						(void *)(uintptr_t)cq->umem_buf,
> > +						umem_size,
> > +						IBV_ACCESS_LOCAL_WRITE);
> > +	if (!cq->umem_obj) {
> > +		DRV_LOG(ERR, "Failed to register umem for CQ.");
> > +		goto error;
> > +	}
> > +	attr.q_umem_valid = 1;
> > +	attr.db_umem_valid = 1;
> > +	attr.use_first_only = 0;
> > +	attr.overrun_ignore = 0;
> > +	attr.uar_page_id = priv->uar->page_id;
> > +	attr.q_umem_id = cq->umem_obj->umem_id;
> > +	attr.q_umem_offset = 0;
> > +	attr.db_umem_id = cq->umem_obj->umem_id;
> > +	attr.db_umem_offset = sizeof(struct mlx5_cqe) * (1 << log_desc_n);
> > +	attr.eqn = priv->eqn;
> > +	attr.log_cq_size = log_desc_n;
> > +	attr.log_page_size = rte_log2_u32(pgsize);
> > +	cq->cq = mlx5_devx_cmd_create_cq(priv->ctx, &attr);
> > +	if (!cq->cq)
> > +		goto error;
> > +	cq->db_rec = RTE_PTR_ADD(cq->umem_buf, (uintptr_t)attr.db_umem_offset);
> > +	cq->cq_ci = 0;
> > +	rte_spinlock_init(&cq->sl);
> > +	/* Subscribe CQ event to the event channel controlled by the driver. */
> > +	ret = mlx5_glue->devx_subscribe_devx_event(priv->eventc, cq->cq->obj,
> > +						   sizeof(event_nums),
> > +						   event_nums,
> > +						   (uint64_t)(uintptr_t)cq);
> > +	if (ret) {
> > +		DRV_LOG(ERR, "Failed to subscribe CQE event.");
> > +		rte_errno = errno;
> > +		goto error;
> > +	}
> > +	/* Subscribe CQ event to the guest FD only if it is not in poll mode. */
> > +	if (callfd != -1) {
> > +		ret = mlx5_glue->devx_subscribe_devx_event_fd(priv->eventc,
> > +							      callfd,
> > +							      cq->cq->obj, 0);
> > +		if (ret) {
> > +			DRV_LOG(ERR, "Failed to subscribe CQE event fd.");
> > +			rte_errno = errno;
> > +			goto error;
> > +		}
> > +	}
> > +	/* First arming. */
> > +	mlx5_vdpa_cq_arm(priv, cq);
> > +	return 0;
> > +error:
> > +	mlx5_vdpa_cq_destroy(cq);
> > +	return -1;
> > +}
> > +
> > +static inline void __rte_unused
> > +mlx5_vdpa_cq_poll(struct mlx5_vdpa_priv *priv __rte_unused,
> > +		  struct mlx5_vdpa_cq *cq)
> > +{
> > +	struct mlx5_vdpa_event_qp *eqp =
> > +				container_of(cq, struct mlx5_vdpa_event_qp, cq);
> > +	const unsigned int cqe_size = 1 << cq->log_desc_n;
> > +	const unsigned int cqe_mask = cqe_size - 1;
> > +	int ret;
> > +
> > +	do {
> > +		volatile struct mlx5_cqe *cqe = cq->cqes + (cq->cq_ci &
> > +							    cqe_mask);
> > +
> > +		ret = check_cqe(cqe, cqe_size, cq->cq_ci);
> > +		switch (ret) {
> > +		case MLX5_CQE_STATUS_ERR:
> > +			cq->errors++;
> > +			/*fall-through*/
> > +		case MLX5_CQE_STATUS_SW_OWN:
> > +			cq->cq_ci++;
> > +			break;
> > +		case MLX5_CQE_STATUS_HW_OWN:
> > +		default:
> > +			break;
> > +		}
> > +	} while (ret != MLX5_CQE_STATUS_HW_OWN);
> 
> Isn't there a risk of endless loop here?
No. The maximum number of iterations is the CQ size, since the HW cannot write more CQEs before the doorbell record is updated.
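To illustrate the bound, here is a toy model of such a ring. It is only a sketch: the toy_* names and RING_SIZE are made up for the example, and the CQE is reduced to a single owner bit rather than the real mlx5 layout. The producer refuses to run more than one ring ahead of the published consumer index, so a single poll can consume at most RING_SIZE entries:

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE 8u /* hypothetical CQ size, must be a power of two */

/* Simplified CQE: only an owner bit, not the real mlx5 CQE. */
struct toy_cqe {
	uint8_t owner;
};

struct toy_cq {
	struct toy_cqe ring[RING_SIZE];
	uint32_t ci;     /* consumer index, owned by software */
	uint32_t pi;     /* producer index, owned by the "HW" model */
	uint32_t db_rec; /* last consumer index published to the "HW" */
};

static void
toy_cq_init(struct toy_cq *cq)
{
	uint32_t i;

	assert((RING_SIZE & (RING_SIZE - 1)) == 0);
	/* Lap 0 expects owner 0, so every entry starts as HW-owned. */
	for (i = 0; i < RING_SIZE; i++)
		cq->ring[i].owner = 1;
	cq->ci = 0;
	cq->pi = 0;
	cq->db_rec = 0;
}

/* "HW" side: cannot get RING_SIZE or more entries ahead of db_rec. */
static int
toy_hw_post(struct toy_cq *cq)
{
	if (cq->pi - cq->db_rec >= RING_SIZE)
		return -1; /* blocked until the doorbell record moves */
	cq->ring[cq->pi & (RING_SIZE - 1)].owner = (cq->pi / RING_SIZE) & 1;
	cq->pi++;
	return 0;
}

/* Software side: consume until the first HW-owned entry. */
static unsigned int
toy_cq_poll(struct toy_cq *cq)
{
	unsigned int consumed = 0;

	while (cq->ring[cq->ci & (RING_SIZE - 1)].owner ==
	       ((cq->ci / RING_SIZE) & 1)) {
		cq->ci++;
		consumed++;
	}
	/* The doorbell record is published only after the loop. */
	cq->db_rec = cq->ci;
	return consumed;
}
```

After a full lap the consumer index wraps onto entries that still carry the previous lap's owner bit, which is what stops the loop.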
> 
> > +	rte_io_wmb();
> > +	/* Ring CQ doorbell record. */
> > +	cq->db_rec[0] = rte_cpu_to_be_32(cq->cq_ci);
> > +	rte_io_wmb();
> > +	/* Ring SW QP doorbell record. */
> > +	eqp->db_rec[0] = rte_cpu_to_be_32(cq->cq_ci + cqe_size);
> > +}
> > +
^ permalink raw reply	[flat|nested] 174+ messages in thread
* Re: [dpdk-dev] [PATCH v2 06/13] vdpa/mlx5: prepare virtio queues
  2020-01-30 20:00     ` Maxime Coquelin
@ 2020-01-31  7:34       ` Matan Azrad
  2020-01-31 14:46         ` Maxime Coquelin
  0 siblings, 1 reply; 174+ messages in thread
From: Matan Azrad @ 2020-01-31  7:34 UTC (permalink / raw)
  To: Maxime Coquelin, dev, Slava Ovsiienko
From: Maxime Coquelin
> On 1/29/20 11:09 AM, Matan Azrad wrote:
> > The HW virtq object represents an emulated context for a VIRTIO_NET
> > virtqueue which was created and managed by a VIRTIO_NET driver as
> > defined in VIRTIO Specification.
> >
> > Add support to prepare and release all the basic HW resources needed
> > for the user virtqs emulation according to the rte_vhost configurations.
> >
> > This patch prepares the basic configurations needed by DevX commands
> > to create a virtq.
> >
> > Add new file mlx5_vdpa_virtq.c to manage virtq operations.
> >
> > Signed-off-by: Matan Azrad <matan@mellanox.com>
> > Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> > ---
> >  drivers/vdpa/mlx5/Makefile          |   1 +
> >  drivers/vdpa/mlx5/meson.build       |   1 +
> >  drivers/vdpa/mlx5/mlx5_vdpa.c       |   1 +
> >  drivers/vdpa/mlx5/mlx5_vdpa.h       |  36 ++++++
> >  drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 212
> > ++++++++++++++++++++++++++++++++++++
> >  5 files changed, 251 insertions(+)
> >  create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
> >
> > diff --git a/drivers/vdpa/mlx5/Makefile b/drivers/vdpa/mlx5/Makefile
> > index 7f13756..353e262 100644
> > --- a/drivers/vdpa/mlx5/Makefile
> > +++ b/drivers/vdpa/mlx5/Makefile
> > @@ -10,6 +10,7 @@ LIB = librte_pmd_mlx5_vdpa.a
> >  SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa.c
> >  SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa_mem.c
> >  SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa_event.c
> > +SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa_virtq.c
> >
> >  # Basic CFLAGS.
> >  CFLAGS += -O3
> > diff --git a/drivers/vdpa/mlx5/meson.build b/drivers/vdpa/mlx5/meson.build
> > index c609f7c..e017f95 100644
> > --- a/drivers/vdpa/mlx5/meson.build
> > +++ b/drivers/vdpa/mlx5/meson.build
> > @@ -14,6 +14,7 @@ sources = files(
> >  	'mlx5_vdpa.c',
> >  	'mlx5_vdpa_mem.c',
> >  	'mlx5_vdpa_event.c',
> > +	'mlx5_vdpa_virtq.c',
> >  )
> >  cflags_options = [
> >  	'-std=c11',
> > diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
> > index c67f93d..4d30b35 100644
> > --- a/drivers/vdpa/mlx5/mlx5_vdpa.c
> > +++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
> > @@ -229,6 +229,7 @@
> >  		goto error;
> >  	}
> >  	SLIST_INIT(&priv->mr_list);
> > +	SLIST_INIT(&priv->virtq_list);
> >  	pthread_mutex_lock(&priv_list_lock);
> >  	TAILQ_INSERT_TAIL(&priv_list, priv, next);
> >  	pthread_mutex_unlock(&priv_list_lock);
> > diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
> > index 30030b7..a7e2185 100644
> > --- a/drivers/vdpa/mlx5/mlx5_vdpa.h
> > +++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
> > @@ -53,6 +53,19 @@ struct mlx5_vdpa_query_mr {
> >  	int is_indirect;
> >  };
> >
> > +struct mlx5_vdpa_virtq {
> > +	SLIST_ENTRY(mlx5_vdpa_virtq) next;
> > +	uint16_t index;
> > +	uint16_t vq_size;
> > +	struct mlx5_devx_obj *virtq;
> > +	struct mlx5_vdpa_event_qp eqp;
> > +	struct {
> > +		struct mlx5dv_devx_umem *obj;
> > +		void *buf;
> > +		uint32_t size;
> > +	} umems[3];
> > +};
> > +
> >  struct mlx5_vdpa_priv {
> >  	TAILQ_ENTRY(mlx5_vdpa_priv) next;
> >  	int id; /* vDPA device id. */
> > @@ -69,6 +82,10 @@ struct mlx5_vdpa_priv {
> >  	struct mlx5dv_devx_event_channel *eventc;
> >  	struct mlx5dv_devx_uar *uar;
> >  	struct rte_intr_handle intr_handle;
> > +	struct mlx5_devx_obj *td;
> > +	struct mlx5_devx_obj *tis;
> > +	uint16_t nr_virtqs;
> > +	SLIST_HEAD(virtq_list, mlx5_vdpa_virtq) virtq_list;
> >  	SLIST_HEAD(mr_list, mlx5_vdpa_query_mr) mr_list;
> >  };
> >
> > @@ -146,4 +163,23 @@ int mlx5_vdpa_event_qp_create(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
> >   */
> >  void mlx5_vdpa_cqe_event_unset(struct mlx5_vdpa_priv *priv);
> >
> > +/**
> > + * Release a virtq and all its related resources.
> > + *
> > + * @param[in] priv
> > + *   The vdpa driver private structure.
> > + */
> > +void mlx5_vdpa_virtqs_release(struct mlx5_vdpa_priv *priv);
> > +
> > +/**
> > + * Create all the HW virtqs resources and all their related resources.
> > + *
> > + * @param[in] priv
> > + *   The vdpa driver private structure.
> > + *
> > + * @return
> > + *   0 on success, a negative errno value otherwise and rte_errno is set.
> > + */
> > +int mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv);
> > +
> >  #endif /* RTE_PMD_MLX5_VDPA_H_ */
> > diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
> > b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
> > new file mode 100644
> > index 0000000..781bccf
> > --- /dev/null
> > +++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
> > @@ -0,0 +1,212 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright 2019 Mellanox Technologies, Ltd
> > + */
> > +#include <string.h>
> > +
> > +#include <rte_malloc.h>
> > +#include <rte_errno.h>
> > +
> > +#include <mlx5_common.h>
> > +
> > +#include "mlx5_vdpa_utils.h"
> > +#include "mlx5_vdpa.h"
> > +
> > +
> > +static int
> > +mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
> > +{
> > +	int i;
> > +
> > +	if (virtq->virtq) {
> > +		claim_zero(mlx5_devx_cmd_destroy(virtq->virtq));
> > +		virtq->virtq = NULL;
> > +	}
> > +	for (i = 0; i < 3; ++i) {
> > +		if (virtq->umems[i].obj)
> > +			claim_zero(mlx5_glue->devx_umem_dereg
> > +							 (virtq->umems[i].obj));
> > +		if (virtq->umems[i].buf)
> > +			rte_free(virtq->umems[i].buf);
> > +	}
> > +	memset(&virtq->umems, 0, sizeof(virtq->umems));
> > +	if (virtq->eqp.fw_qp)
> > +		mlx5_vdpa_event_qp_destroy(&virtq->eqp);
> > +	return 0;
> > +}
> > +
> > +void
> > +mlx5_vdpa_virtqs_release(struct mlx5_vdpa_priv *priv)
> > +{
> > +	struct mlx5_vdpa_virtq *entry;
> > +	struct mlx5_vdpa_virtq *next;
> > +
> > +	entry = SLIST_FIRST(&priv->virtq_list);
> > +	while (entry) {
> > +		next = SLIST_NEXT(entry, next);
> > +		mlx5_vdpa_virtq_unset(entry);
> > +		SLIST_REMOVE(&priv->virtq_list, entry, mlx5_vdpa_virtq, next);
> > +		rte_free(entry);
> > +		entry = next;
> > +	}
> > +	SLIST_INIT(&priv->virtq_list);
> > +	if (priv->tis) {
> > +		claim_zero(mlx5_devx_cmd_destroy(priv->tis));
> > +		priv->tis = NULL;
> > +	}
> > +	if (priv->td) {
> > +		claim_zero(mlx5_devx_cmd_destroy(priv->td));
> > +		priv->td = NULL;
> > +	}
> > +}
> > +
> > +static uint64_t
> > +mlx5_vdpa_hva_to_gpa(struct rte_vhost_memory *mem, uint64_t hva)
> > +{
> > +	struct rte_vhost_mem_region *reg;
> > +	uint32_t i;
> > +	uint64_t gpa = 0;
> > +
> > +	for (i = 0; i < mem->nregions; i++) {
> > +		reg = &mem->regions[i];
> > +		if (hva >= reg->host_user_addr &&
> > +		    hva < reg->host_user_addr + reg->size) {
> > +			gpa = hva - reg->host_user_addr + reg->guest_phys_addr;
> > +			break;
> > +		}
> > +	}
> > +	return gpa;
> > +}
> 
> I think you may need a third parameter for the size to map.
> Otherwise, you would be vulnerable to CVE-2018-1059.
Yes, I just read it and understood that the virtio descriptor queues/packet data may be non-contiguous in guest physical memory, and may even be undefined here in the rte_vhost library. Is that right?
Don't you think that rte_vhost should validate it? At least, that all the queue memory is mapped?
Can you elaborate on why it may happen? A QEMU bug?
In any case, from the Mellanox perspective, at least for the packet data, it is OK: if the guest tries to access a physical address which is not mapped, the packet will be ignored by the HW.
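For reference, a size-aware version of such a lookup could look roughly like the sketch below. It is only an illustration: toy_mem_region and toy_hva_to_gpa are made-up stand-ins rather than the rte_vhost types, and like the function quoted above it keeps 0 as the failure value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one guest memory region. */
struct toy_mem_region {
	uint64_t guest_phys_addr;
	uint64_t host_user_addr;
	uint64_t size;
};

/*
 * Translate the range [hva, hva + len) to a guest physical address.
 * Returns 0 unless the whole range lies inside a single region.
 */
static uint64_t
toy_hva_to_gpa(const struct toy_mem_region *regs, size_t n,
	       uint64_t hva, uint64_t len)
{
	size_t i;

	assert(regs != NULL || n == 0);
	for (i = 0; i < n; i++) {
		const struct toy_mem_region *reg = &regs[i];
		uint64_t off;

		if (hva < reg->host_user_addr)
			continue;
		off = hva - reg->host_user_addr;
		/* Compare against the remaining space to avoid overflow. */
		if (off < reg->size && len <= reg->size - off)
			return reg->guest_phys_addr + off;
	}
	return 0;
}
```

A range that starts inside a region but runs past its end is rejected instead of being translated from its first byte only.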
 
> > +
> > +static int
> > +mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv,
> > +		      struct mlx5_vdpa_virtq *virtq, int index)
> > +{
> > +	struct rte_vhost_vring vq;
> > +	struct mlx5_devx_virtq_attr attr = {0};
> > +	uint64_t gpa;
> > +	int ret;
> > +	int i;
> > +	uint16_t last_avail_idx;
> > +	uint16_t last_used_idx;
> > +
> > +	ret = rte_vhost_get_vhost_vring(priv->vid, index, &vq);
> > +	if (ret)
> > +		return -1;
> > +	virtq->index = index;
> > +	virtq->vq_size = vq.size;
> > +	/*
> > +	 * Event QPs are not needed when the guest is in poll mode and the
> > +	 * capability allows it.
> > +	 */
> > +	attr.event_mode = vq.callfd != -1 || !(priv->caps.event_mode & (1 <<
> > +					       MLX5_VIRTQ_EVENT_MODE_NO_MSIX)) ?
> > +						      MLX5_VIRTQ_EVENT_MODE_QP :
> > +						  MLX5_VIRTQ_EVENT_MODE_NO_MSIX;
> > +	if (attr.event_mode == MLX5_VIRTQ_EVENT_MODE_QP) {
> > +		ret = mlx5_vdpa_event_qp_create(priv, vq.size, vq.callfd,
> > +						&virtq->eqp);
> > +		if (ret) {
> > +			DRV_LOG(ERR, "Failed to create event QPs for virtq %d.",
> > +				index);
> > +			return -1;
> > +		}
> > +		attr.qp_id = virtq->eqp.fw_qp->id;
> > +	} else {
> > +		DRV_LOG(INFO, "Virtq %d is, for sure, working by poll mode, no"
> > +			" need event QPs and event mechanism.", index);
> > +	}
> > +	/* Setup 3 UMEMs for each virtq. */
> > +	for (i = 0; i < 3; ++i) {
> > +		virtq->umems[i].size = priv->caps.umems[i].a * vq.size +
> > +							  priv->caps.umems[i].b;
> > +		virtq->umems[i].buf = rte_zmalloc(__func__,
> > +						  virtq->umems[i].size, 4096);
> > +		if (!virtq->umems[i].buf) {
> > +			DRV_LOG(ERR, "Cannot allocate umem %d memory for virtq"
> > +				" %u.", i, index);
> > +			goto error;
> > +		}
> > +		virtq->umems[i].obj = mlx5_glue->devx_umem_reg(priv->ctx,
> > +							virtq->umems[i].buf,
> > +							virtq->umems[i].size,
> > +							IBV_ACCESS_LOCAL_WRITE);
> > +		if (!virtq->umems[i].obj) {
> > +			DRV_LOG(ERR, "Failed to register umem %d for virtq %u.",
> > +				i, index);
> > +			goto error;
> > +		}
> > +		attr.umems[i].id = virtq->umems[i].obj->umem_id;
> > +		attr.umems[i].offset = 0;
> > +		attr.umems[i].size = virtq->umems[i].size;
> > +	}
> > +	gpa = mlx5_vdpa_hva_to_gpa(priv->vmem, (uint64_t)(uintptr_t)vq.desc);
> > +	if (!gpa) {
> > +		DRV_LOG(ERR, "Fail to get GPA for descriptor ring.");
> > +		goto error;
> > +	}
> > +	attr.desc_addr = gpa;
> > +	gpa = mlx5_vdpa_hva_to_gpa(priv->vmem, (uint64_t)(uintptr_t)vq.used);
> > +	if (!gpa) {
> > +		DRV_LOG(ERR, "Fail to get GPA for used ring.");
> > +		goto error;
> > +	}
> > +	attr.used_addr = gpa;
> > +	gpa = mlx5_vdpa_hva_to_gpa(priv->vmem, (uint64_t)(uintptr_t)vq.avail);
> > +	if (!gpa) {
> > +		DRV_LOG(ERR, "Fail to get GPA for available ring.");
> > +		goto error;
> > +	}
> > +	attr.available_addr = gpa;
> > +	rte_vhost_get_vring_base(priv->vid, index, &last_avail_idx,
> > +				 &last_used_idx);
> > +	DRV_LOG(INFO, "vid %d: Init last_avail_idx=%d, last_used_idx=%d for "
> > +		"virtq %d.", priv->vid, last_avail_idx, last_used_idx, index);
> > +	attr.hw_available_index = last_avail_idx;
> > +	attr.hw_used_index = last_used_idx;
> > +	attr.q_size = vq.size;
> > +	attr.mkey = priv->gpa_mkey_index;
> > +	attr.tis_id = priv->tis->id;
> > +	attr.queue_index = index;
> > +	virtq->virtq = mlx5_devx_cmd_create_virtq(priv->ctx, &attr);
> > +	if (!virtq->virtq)
> > +		goto error;
> > +	return 0;
> > +error:
> > +	mlx5_vdpa_virtq_unset(virtq);
> > +	return -1;
> > +}
> > +
> > +int
> > +mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
> > +{
> > +	struct mlx5_devx_tis_attr tis_attr = {0};
> > +	struct mlx5_vdpa_virtq *virtq;
> > +	uint32_t i;
> > +	uint16_t nr_vring = rte_vhost_get_vring_num(priv->vid);
> > +
> > +	priv->td = mlx5_devx_cmd_create_td(priv->ctx);
> > +	if (!priv->td) {
> > +		DRV_LOG(ERR, "Failed to create transport domain.");
> > +		return -rte_errno;
> > +	}
> > +	tis_attr.transport_domain = priv->td->id;
> > +	priv->tis = mlx5_devx_cmd_create_tis(priv->ctx, &tis_attr);
> > +	if (!priv->tis) {
> > +		DRV_LOG(ERR, "Failed to create TIS.");
> > +		goto error;
> > +	}
> > +	for (i = 0; i < nr_vring; i++) {
> > +		virtq = rte_zmalloc(__func__, sizeof(*virtq), 0);
> > +		if (!virtq || mlx5_vdpa_virtq_setup(priv, virtq, i)) {
> > +			if (virtq)
> > +				rte_free(virtq);
> > +			goto error;
> > +		}
> > +		SLIST_INSERT_HEAD(&priv->virtq_list, virtq, next);
> > +	}
> > +	priv->nr_virtqs = nr_vring;
> > +	return 0;
> > +error:
> > +	mlx5_vdpa_virtqs_release(priv);
> > +	return -1;
> > +}
> >
^ permalink raw reply	[flat|nested] 174+ messages in thread
* Re: [dpdk-dev] [PATCH v2 06/13] vdpa/mlx5: prepare virtio queues
  2020-01-31  7:34       ` Matan Azrad
@ 2020-01-31 14:46         ` Maxime Coquelin
  0 siblings, 0 replies; 174+ messages in thread
From: Maxime Coquelin @ 2020-01-31 14:46 UTC (permalink / raw)
  To: Matan Azrad, dev, Slava Ovsiienko
On 1/31/20 8:34 AM, Matan Azrad wrote:
> 
> 
> From: Maxime Coquelin
>> On 1/29/20 11:09 AM, Matan Azrad wrote:
>>> The HW virtq object represents an emulated context for a VIRTIO_NET
>>> virtqueue which was created and managed by a VIRTIO_NET driver as
>>> defined in VIRTIO Specification.
>>>
>>> Add support to prepare and release all the basic HW resources needed
>>> for the user virtqs emulation according to the rte_vhost configurations.
>>>
>>> This patch prepares the basic configurations needed by DevX commands
>>> to create a virtq.
>>>
>>> Add new file mlx5_vdpa_virtq.c to manage virtq operations.
>>>
>>> Signed-off-by: Matan Azrad <matan@mellanox.com>
>>> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
>>> ---
>>>  drivers/vdpa/mlx5/Makefile          |   1 +
>>>  drivers/vdpa/mlx5/meson.build       |   1 +
>>>  drivers/vdpa/mlx5/mlx5_vdpa.c       |   1 +
>>>  drivers/vdpa/mlx5/mlx5_vdpa.h       |  36 ++++++
>>>  drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 212
>>> ++++++++++++++++++++++++++++++++++++
>>>  5 files changed, 251 insertions(+)
>>>  create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
>>>
>>> diff --git a/drivers/vdpa/mlx5/Makefile b/drivers/vdpa/mlx5/Makefile
>>> index 7f13756..353e262 100644
>>> --- a/drivers/vdpa/mlx5/Makefile
>>> +++ b/drivers/vdpa/mlx5/Makefile
>>> @@ -10,6 +10,7 @@ LIB = librte_pmd_mlx5_vdpa.a
>>>  SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa.c
>>>  SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa_mem.c
>>>  SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa_event.c
>>> +SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa_virtq.c
>>>
>>>  # Basic CFLAGS.
>>>  CFLAGS += -O3
>>> diff --git a/drivers/vdpa/mlx5/meson.build b/drivers/vdpa/mlx5/meson.build
>>> index c609f7c..e017f95 100644
>>> --- a/drivers/vdpa/mlx5/meson.build
>>> +++ b/drivers/vdpa/mlx5/meson.build
>>> @@ -14,6 +14,7 @@ sources = files(
>>>  	'mlx5_vdpa.c',
>>>  	'mlx5_vdpa_mem.c',
>>>  	'mlx5_vdpa_event.c',
>>> +	'mlx5_vdpa_virtq.c',
>>>  )
>>>  cflags_options = [
>>>  	'-std=c11',
>>> diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
>>> index c67f93d..4d30b35 100644
>>> --- a/drivers/vdpa/mlx5/mlx5_vdpa.c
>>> +++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
>>> @@ -229,6 +229,7 @@
>>>  		goto error;
>>>  	}
>>>  	SLIST_INIT(&priv->mr_list);
>>> +	SLIST_INIT(&priv->virtq_list);
>>>  	pthread_mutex_lock(&priv_list_lock);
>>>  	TAILQ_INSERT_TAIL(&priv_list, priv, next);
>>>  	pthread_mutex_unlock(&priv_list_lock);
>>> diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
>>> index 30030b7..a7e2185 100644
>>> --- a/drivers/vdpa/mlx5/mlx5_vdpa.h
>>> +++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
>>> @@ -53,6 +53,19 @@ struct mlx5_vdpa_query_mr {
>>>  	int is_indirect;
>>>  };
>>>
>>> +struct mlx5_vdpa_virtq {
>>> +	SLIST_ENTRY(mlx5_vdpa_virtq) next;
>>> +	uint16_t index;
>>> +	uint16_t vq_size;
>>> +	struct mlx5_devx_obj *virtq;
>>> +	struct mlx5_vdpa_event_qp eqp;
>>> +	struct {
>>> +		struct mlx5dv_devx_umem *obj;
>>> +		void *buf;
>>> +		uint32_t size;
>>> +	} umems[3];
>>> +};
>>> +
>>>  struct mlx5_vdpa_priv {
>>>  	TAILQ_ENTRY(mlx5_vdpa_priv) next;
>>>  	int id; /* vDPA device id. */
>>> @@ -69,6 +82,10 @@ struct mlx5_vdpa_priv {
>>>  	struct mlx5dv_devx_event_channel *eventc;
>>>  	struct mlx5dv_devx_uar *uar;
>>>  	struct rte_intr_handle intr_handle;
>>> +	struct mlx5_devx_obj *td;
>>> +	struct mlx5_devx_obj *tis;
>>> +	uint16_t nr_virtqs;
>>> +	SLIST_HEAD(virtq_list, mlx5_vdpa_virtq) virtq_list;
>>>  	SLIST_HEAD(mr_list, mlx5_vdpa_query_mr) mr_list;
>>>  };
>>>
>>> @@ -146,4 +163,23 @@ int mlx5_vdpa_event_qp_create(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
>>>   */
>>>  void mlx5_vdpa_cqe_event_unset(struct mlx5_vdpa_priv *priv);
>>>
>>> +/**
>>> + * Release a virtq and all its related resources.
>>> + *
>>> + * @param[in] priv
>>> + *   The vdpa driver private structure.
>>> + */
>>> +void mlx5_vdpa_virtqs_release(struct mlx5_vdpa_priv *priv);
>>> +
>>> +/**
>>> + * Create all the HW virtqs resources and all their related resources.
>>> + *
>>> + * @param[in] priv
>>> + *   The vdpa driver private structure.
>>> + *
>>> + * @return
>>> + *   0 on success, a negative errno value otherwise and rte_errno is set.
>>> + */
>>> +int mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv);
>>> +
>>>  #endif /* RTE_PMD_MLX5_VDPA_H_ */
>>> diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
>>> b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
>>> new file mode 100644
>>> index 0000000..781bccf
>>> --- /dev/null
>>> +++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
>>> @@ -0,0 +1,212 @@
>>> +/* SPDX-License-Identifier: BSD-3-Clause
>>> + * Copyright 2019 Mellanox Technologies, Ltd
>>> + */
>>> +#include <string.h>
>>> +
>>> +#include <rte_malloc.h>
>>> +#include <rte_errno.h>
>>> +
>>> +#include <mlx5_common.h>
>>> +
>>> +#include "mlx5_vdpa_utils.h"
>>> +#include "mlx5_vdpa.h"
>>> +
>>> +
>>> +static int
>>> +mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
>>> +{
>>> +	int i;
>>> +
>>> +	if (virtq->virtq) {
>>> +		claim_zero(mlx5_devx_cmd_destroy(virtq->virtq));
>>> +		virtq->virtq = NULL;
>>> +	}
>>> +	for (i = 0; i < 3; ++i) {
>>> +		if (virtq->umems[i].obj)
>>> +			claim_zero(mlx5_glue->devx_umem_dereg
>>> +							 (virtq->umems[i].obj));
>>> +		if (virtq->umems[i].buf)
>>> +			rte_free(virtq->umems[i].buf);
>>> +	}
>>> +	memset(&virtq->umems, 0, sizeof(virtq->umems));
>>> +	if (virtq->eqp.fw_qp)
>>> +		mlx5_vdpa_event_qp_destroy(&virtq->eqp);
>>> +	return 0;
>>> +}
>>> +
>>> +void
>>> +mlx5_vdpa_virtqs_release(struct mlx5_vdpa_priv *priv)
>>> +{
>>> +	struct mlx5_vdpa_virtq *entry;
>>> +	struct mlx5_vdpa_virtq *next;
>>> +
>>> +	entry = SLIST_FIRST(&priv->virtq_list);
>>> +	while (entry) {
>>> +		next = SLIST_NEXT(entry, next);
>>> +		mlx5_vdpa_virtq_unset(entry);
>>> +		SLIST_REMOVE(&priv->virtq_list, entry, mlx5_vdpa_virtq, next);
>>> +		rte_free(entry);
>>> +		entry = next;
>>> +	}
>>> +	SLIST_INIT(&priv->virtq_list);
>>> +	if (priv->tis) {
>>> +		claim_zero(mlx5_devx_cmd_destroy(priv->tis));
>>> +		priv->tis = NULL;
>>> +	}
>>> +	if (priv->td) {
>>> +		claim_zero(mlx5_devx_cmd_destroy(priv->td));
>>> +		priv->td = NULL;
>>> +	}
>>> +}
>>> +
>>> +static uint64_t
>>> +mlx5_vdpa_hva_to_gpa(struct rte_vhost_memory *mem, uint64_t hva)
>>> +{
>>> +	struct rte_vhost_mem_region *reg;
>>> +	uint32_t i;
>>> +	uint64_t gpa = 0;
>>> +
>>> +	for (i = 0; i < mem->nregions; i++) {
>>> +		reg = &mem->regions[i];
>>> +		if (hva >= reg->host_user_addr &&
>>> +		    hva < reg->host_user_addr + reg->size) {
>>> +			gpa = hva - reg->host_user_addr + reg->guest_phys_addr;
>>> +			break;
>>> +		}
>>> +	}
>>> +	return gpa;
>>> +}
>>
>> I think you may need a third parameter for the size to map.
>> Otherwise, you would be vulnerable to CVE-2018-1059.
> 
> Yes, I just read it and understood that the virtio descriptor queues/packet data may be non-contiguous in guest physical memory, and may even be undefined here in the rte_vhost library. Is that right?
> 
> Don't you think that rte_vhost should validate it? At least, that all the queue memory is mapped?
I just checked the vhost lib again, and you're right, it already does the
check.
Basically, if translate_ring_addresses() fails because the rings aren't
fully mapped, then virtio_is_ready() will return false and so the
vdpa .dev_conf() callback won't be called.
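To make the gating explicit, here is a toy model of that flow. It is a sketch under loud assumptions: every toy_* name below is invented for illustration and none of it matches the vhost library internals.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these types do not exist in rte_vhost. */
struct toy_vring {
	const void *desc;  /* NULL if the ring was not fully mapped */
	const void *avail;
	const void *used;
};

struct toy_dev {
	struct toy_vring *vrings;
	size_t nr_vring;
	int conf_calls; /* how many times the driver callback ran */
};

/* Stand-in for the driver's configuration callback. */
static int
toy_dev_conf(struct toy_dev *dev)
{
	dev->conf_calls++;
	return 0;
}

/* Ready only if every ring has all three areas translated. */
static bool
toy_dev_is_ready(const struct toy_dev *dev)
{
	size_t i;

	assert(dev != NULL);
	if (dev->nr_vring == 0)
		return false;
	for (i = 0; i < dev->nr_vring; i++) {
		const struct toy_vring *vr = &dev->vrings[i];

		if (!vr->desc || !vr->avail || !vr->used)
			return false;
	}
	return true;
}

/* The callback is reachable only through the readiness check. */
static int
toy_dev_try_configure(struct toy_dev *dev)
{
	if (!toy_dev_is_ready(dev))
		return -1;
	return toy_dev_conf(dev);
}
```

With one ring left untranslated the callback is never reached, so the driver only ever sees ring addresses that passed the mapping check.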
> Can you elaborate on why it may happen? A QEMU bug?
It could happen with a malicious or compromised vhost-user
master, such as QEMU or a virtio-user based application.
> In any case, from the Mellanox perspective, at least for the packet data, it is OK: if the guest tries to access a physical address which is not mapped, the packet will be ignored by the HW.
Ok!
Thanks,
Maxime
^ permalink raw reply	[flat|nested] 174+ messages in thread
* Re: [dpdk-dev] [PATCH v2 05/13] vdpa/mlx5: prepare HW queues
  2020-01-31  6:56       ` Matan Azrad
@ 2020-01-31 14:47         ` Maxime Coquelin
  0 siblings, 0 replies; 174+ messages in thread
From: Maxime Coquelin @ 2020-01-31 14:47 UTC (permalink / raw)
  To: Matan Azrad, dev, Slava Ovsiienko
On 1/31/20 7:56 AM, Matan Azrad wrote:
> 
> Hi
> 
> From: Maxime Coquelin
>> On 1/29/20 11:09 AM, Matan Azrad wrote:
>>> In preparation for virtio queue creation, two QPs and a CQ may be
>>> created for each virtio queue.
>>>
>>> The design is to trigger an event for the guest and for the vdpa
>>> driver when a new CQE is posted by the HW after the packet transition.
>>>
>>> This patch adds the basic operations to create and destroy the above HW
>>> objects and to trigger the CQE events when a new CQE is posted.
>>>
>>> Signed-off-by: Matan Azrad <matan@mellanox.com>
>>> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
>>> ---
>>>  drivers/common/mlx5/mlx5_prm.h      |   4 +
>>>  drivers/vdpa/mlx5/Makefile          |   1 +
>>>  drivers/vdpa/mlx5/meson.build       |   1 +
>>>  drivers/vdpa/mlx5/mlx5_vdpa.h       |  89 ++++++++
>>>  drivers/vdpa/mlx5/mlx5_vdpa_event.c | 399
>>> ++++++++++++++++++++++++++++++++++++
>>>  5 files changed, 494 insertions(+)
>>>  create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_event.c
>>>
>>> diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
>>> index b48cd0a..b533798 100644
>>> --- a/drivers/common/mlx5/mlx5_prm.h
>>> +++ b/drivers/common/mlx5/mlx5_prm.h
>>> @@ -392,6 +392,10 @@ struct mlx5_cqe {
>>>  /* CQE format value. */
>>>  #define MLX5_COMPRESSED 0x3
>>>
>>> +/* CQ doorbell cmd types. */
>>> +#define MLX5_CQ_DBR_CMD_SOL_ONLY (1 << 24)
>>> +#define MLX5_CQ_DBR_CMD_ALL (0 << 24)
>>> +
>>>  /* Action type of header modification. */
>>>  enum {
>>>  	MLX5_MODIFICATION_TYPE_SET = 0x1,
>>> diff --git a/drivers/vdpa/mlx5/Makefile b/drivers/vdpa/mlx5/Makefile
>>> index 5472797..7f13756 100644
>>> --- a/drivers/vdpa/mlx5/Makefile
>>> +++ b/drivers/vdpa/mlx5/Makefile
>>> @@ -9,6 +9,7 @@ LIB = librte_pmd_mlx5_vdpa.a
>>>  # Sources.
>>>  SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa.c
>>>  SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa_mem.c
>>> +SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa_event.c
>>>
>>>  # Basic CFLAGS.
>>>  CFLAGS += -O3
>>> diff --git a/drivers/vdpa/mlx5/meson.build b/drivers/vdpa/mlx5/meson.build
>>> index 7e5dd95..c609f7c 100644
>>> --- a/drivers/vdpa/mlx5/meson.build
>>> +++ b/drivers/vdpa/mlx5/meson.build
>>> @@ -13,6 +13,7 @@ deps += ['hash', 'common_mlx5', 'vhost', 'bus_pci', 'eal', 'sched']
>>>  sources = files(
>>>  	'mlx5_vdpa.c',
>>>  	'mlx5_vdpa_mem.c',
>>> +	'mlx5_vdpa_event.c',
>>>  )
>>>  cflags_options = [
>>>  	'-std=c11',
>>> diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
>>> index e27baea..30030b7 100644
>>> --- a/drivers/vdpa/mlx5/mlx5_vdpa.h
>>> +++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
>>> @@ -9,9 +9,40 @@
>>>
>>>  #include <rte_vdpa.h>
>>>  #include <rte_vhost.h>
>>> +#include <rte_spinlock.h>
>>> +#include <rte_interrupts.h>
>>>
>>>  #include <mlx5_glue.h>
>>>  #include <mlx5_devx_cmds.h>
>>> +#include <mlx5_prm.h>
>>> +
>>> +
>>> +#define MLX5_VDPA_INTR_RETRIES 256
>>> +#define MLX5_VDPA_INTR_RETRIES_USEC 1000
>>> +
>>> +struct mlx5_vdpa_cq {
>>> +	uint16_t log_desc_n;
>>> +	uint32_t cq_ci:24;
>>> +	uint32_t arm_sn:2;
>>> +	rte_spinlock_t sl;
>>> +	struct mlx5_devx_obj *cq;
>>> +	struct mlx5dv_devx_umem *umem_obj;
>>> +	union {
>>> +		volatile void *umem_buf;
>>> +		volatile struct mlx5_cqe *cqes;
>>> +	};
>>> +	volatile uint32_t *db_rec;
>>> +	uint64_t errors;
>>> +};
>>> +
>>> +struct mlx5_vdpa_event_qp {
>>> +	struct mlx5_vdpa_cq cq;
>>> +	struct mlx5_devx_obj *fw_qp;
>>> +	struct mlx5_devx_obj *sw_qp;
>>> +	struct mlx5dv_devx_umem *umem_obj;
>>> +	void *umem_buf;
>>> +	volatile uint32_t *db_rec;
>>> +};
>>>
>>>  struct mlx5_vdpa_query_mr {
>>>  	SLIST_ENTRY(mlx5_vdpa_query_mr) next;
>>> @@ -34,6 +65,10 @@ struct mlx5_vdpa_priv {
>>>  	uint32_t gpa_mkey_index;
>>>  	struct ibv_mr *null_mr;
>>>  	struct rte_vhost_memory *vmem;
>>> +	uint32_t eqn;
>>> +	struct mlx5dv_devx_event_channel *eventc;
>>> +	struct mlx5dv_devx_uar *uar;
>>> +	struct rte_intr_handle intr_handle;
>>>  	SLIST_HEAD(mr_list, mlx5_vdpa_query_mr) mr_list;
>>>  };
>>>
>>> @@ -57,4 +92,58 @@ struct mlx5_vdpa_priv {
>>>   */
>>>  int mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv);
>>>
>>> +
>>> +/**
>>> + * Create an event QP and all its related resources.
>>> + *
>>> + * @param[in] priv
>>> + *   The vdpa driver private structure.
>>> + * @param[in] desc_n
>>> + *   Number of descriptors.
>>> + * @param[in] callfd
>>> + *   The guest notification file descriptor.
>>> + * @param[in/out] eqp
>>> + *   Pointer to the event QP structure.
>>> + *
>>> + * @return
>>> + *   0 on success, -1 otherwise and rte_errno is set.
>>> + */
>>> +int mlx5_vdpa_event_qp_create(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
>>> +			      int callfd, struct mlx5_vdpa_event_qp *eqp);
>>> +
>>> +/**
>>> + * Destroy an event QP and all its related resources.
>>> + *
>>> + * @param[in/out] eqp
>>> + *   Pointer to the event QP structure.
>>> + */
>>> +void mlx5_vdpa_event_qp_destroy(struct mlx5_vdpa_event_qp *eqp);
>>> +
>>> +/**
>>> + * Release all the event global resources.
>>> + *
>>> + * @param[in] priv
>>> + *   The vdpa driver private structure.
>>> + */
>>> +void mlx5_vdpa_event_qp_global_release(struct mlx5_vdpa_priv *priv);
>>> +
>>> +/**
>>> + * Setup CQE event.
>>> + *
>>> + * @param[in] priv
>>> + *   The vdpa driver private structure.
>>> + *
>>> + * @return
>>> + *   0 on success, a negative errno value otherwise and rte_errno is set.
>>> + */
>>> +int mlx5_vdpa_cqe_event_setup(struct mlx5_vdpa_priv *priv);
>>> +
>>> +/**
>>> + * Unset CQE event.
>>> + *
>>> + * @param[in] priv
>>> + *   The vdpa driver private structure.
>>> + */
>>> +void mlx5_vdpa_cqe_event_unset(struct mlx5_vdpa_priv *priv);
>>> +
>>>  #endif /* RTE_PMD_MLX5_VDPA_H_ */
>>> diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_event.c b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
>>> new file mode 100644
>>> index 0000000..35518ad
>>> --- /dev/null
>>> +++ b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
>>> @@ -0,0 +1,399 @@
>>> +/* SPDX-License-Identifier: BSD-3-Clause
>>> + * Copyright 2019 Mellanox Technologies, Ltd
>>> + */
>>> +#include <unistd.h>
>>> +#include <stdint.h>
>>> +#include <fcntl.h>
>>> +
>>> +#include <rte_malloc.h>
>>> +#include <rte_errno.h>
>>> +#include <rte_lcore.h>
>>> +#include <rte_atomic.h>
>>> +#include <rte_common.h>
>>> +#include <rte_io.h>
>>> +
>>> +#include <mlx5_common.h>
>>> +
>>> +#include "mlx5_vdpa_utils.h"
>>> +#include "mlx5_vdpa.h"
>>> +
>>> +
>>> +void
>>> +mlx5_vdpa_event_qp_global_release(struct mlx5_vdpa_priv *priv)
>>> +{
>>> +	if (priv->uar) {
>>> +		mlx5_glue->devx_free_uar(priv->uar);
>>> +		priv->uar = NULL;
>>> +	}
>>> +	if (priv->eventc) {
>>> +		mlx5_glue->devx_destroy_event_channel(priv->eventc);
>>> +		priv->eventc = NULL;
>>> +	}
>>> +	priv->eqn = 0;
>>> +}
>>> +
>>> +/* Prepare all the global resources for all the event objects.*/
>>> +static int
>>> +mlx5_vdpa_event_qp_global_prepare(struct mlx5_vdpa_priv *priv)
>>> +{
>>> +	uint32_t lcore;
>>> +
>>> +	if (priv->eventc)
>>> +		return 0;
>>> +	lcore = (uint32_t)rte_lcore_to_cpu_id(-1);
>>> +	if (mlx5_glue->devx_query_eqn(priv->ctx, lcore, &priv->eqn)) {
>>> +		rte_errno = errno;
>>> +		DRV_LOG(ERR, "Failed to query EQ number %d.", rte_errno);
>>> +		return -1;
>>> +	}
>>> +	priv->eventc = mlx5_glue->devx_create_event_channel(priv->ctx,
>>> +			   MLX5DV_DEVX_CREATE_EVENT_CHANNEL_FLAGS_OMIT_EV_DATA);
>>> +	if (!priv->eventc) {
>>> +		rte_errno = errno;
>>> +		DRV_LOG(ERR, "Failed to create event channel %d.",
>>> +			rte_errno);
>>> +		goto error;
>>> +	}
>>> +	priv->uar = mlx5_glue->devx_alloc_uar(priv->ctx, 0);
>>> +	if (!priv->uar) {
>>> +		rte_errno = errno;
>>> +		DRV_LOG(ERR, "Failed to allocate UAR.");
>>> +		goto error;
>>> +	}
>>> +	return 0;
>>> +error:
>>> +	mlx5_vdpa_event_qp_global_release(priv);
>>> +	return -1;
>>> +}
>>> +
>>> +static void
>>> +mlx5_vdpa_cq_destroy(struct mlx5_vdpa_cq *cq)
>>> +{
>>> +	if (cq->cq)
>>> +		claim_zero(mlx5_devx_cmd_destroy(cq->cq));
>>> +	if (cq->umem_obj)
>>> +		claim_zero(mlx5_glue->devx_umem_dereg(cq->umem_obj));
>>> +	if (cq->umem_buf)
>>> +		rte_free((void *)(uintptr_t)cq->umem_buf);
>>> +	memset(cq, 0, sizeof(*cq));
>>> +}
>>> +
>>> +static inline void
>>> +mlx5_vdpa_cq_arm(struct mlx5_vdpa_priv *priv, struct mlx5_vdpa_cq *cq)
>>> +{
>>> +	const unsigned int cqe_mask = (1 << cq->log_desc_n) - 1;
>>> +	uint32_t arm_sn = cq->arm_sn << MLX5_CQ_SQN_OFFSET;
>>> +	uint32_t cq_ci = cq->cq_ci & MLX5_CI_MASK & cqe_mask;
>>> +	uint32_t doorbell_hi = arm_sn | MLX5_CQ_DBR_CMD_ALL | cq_ci;
>>> +	uint64_t doorbell = ((uint64_t)doorbell_hi << 32) | cq->cq->id;
>>> +	uint64_t db_be = rte_cpu_to_be_64(doorbell);
>>> +	uint32_t *addr = RTE_PTR_ADD(priv->uar->base_addr, MLX5_CQ_DOORBELL);
>>> +
>>> +	rte_io_wmb();
>>> +	cq->db_rec[MLX5_CQ_ARM_DB] = rte_cpu_to_be_32(doorbell_hi);
>>> +	rte_wmb();
>>> +#ifdef RTE_ARCH_64
>>> +	*(uint64_t *)addr = db_be;
>>> +#else
>>> +	*(uint32_t *)addr = db_be;
>>> +	rte_io_wmb();
>>> +	*((uint32_t *)addr + 1) = db_be >> 32;
>>> +#endif
>>> +	cq->arm_sn++;
>>> +}
>>> +
>>> +static int
>>> +mlx5_vdpa_cq_create(struct mlx5_vdpa_priv *priv, uint16_t log_desc_n,
>>> +		    int callfd, struct mlx5_vdpa_cq *cq)
>>> +{
>>> +	struct mlx5_devx_cq_attr attr;
>>> +	size_t pgsize = sysconf(_SC_PAGESIZE);
>>> +	uint32_t umem_size;
>>> +	int ret;
>>> +	uint16_t event_nums[1] = {0};
>>> +
>>> +	cq->log_desc_n = log_desc_n;
>>> +	umem_size = sizeof(struct mlx5_cqe) * (1 << log_desc_n) +
>>> +							sizeof(*cq->db_rec) * 2;
>>> +	cq->umem_buf = rte_zmalloc(__func__, umem_size, 4096);
>>> +	if (!cq->umem_buf) {
>>> +		DRV_LOG(ERR, "Failed to allocate memory for CQ.");
>>> +		rte_errno = ENOMEM;
>>> +		return -ENOMEM;
>>> +	}
>>> +	cq->umem_obj = mlx5_glue->devx_umem_reg(priv->ctx,
>>> +						(void *)(uintptr_t)cq->umem_buf,
>>> +						umem_size,
>>> +						IBV_ACCESS_LOCAL_WRITE);
>>> +	if (!cq->umem_obj) {
>>> +		DRV_LOG(ERR, "Failed to register umem for CQ.");
>>> +		goto error;
>>> +	}
>>> +	attr.q_umem_valid = 1;
>>> +	attr.db_umem_valid = 1;
>>> +	attr.use_first_only = 0;
>>> +	attr.overrun_ignore = 0;
>>> +	attr.uar_page_id = priv->uar->page_id;
>>> +	attr.q_umem_id = cq->umem_obj->umem_id;
>>> +	attr.q_umem_offset = 0;
>>> +	attr.db_umem_id = cq->umem_obj->umem_id;
>>> +	attr.db_umem_offset = sizeof(struct mlx5_cqe) * (1 << log_desc_n);
>>> +	attr.eqn = priv->eqn;
>>> +	attr.log_cq_size = log_desc_n;
>>> +	attr.log_page_size = rte_log2_u32(pgsize);
>>> +	cq->cq = mlx5_devx_cmd_create_cq(priv->ctx, &attr);
>>> +	if (!cq->cq)
>>> +		goto error;
>>> +	cq->db_rec = RTE_PTR_ADD(cq->umem_buf, (uintptr_t)attr.db_umem_offset);
>>> +	cq->cq_ci = 0;
>>> +	rte_spinlock_init(&cq->sl);
>>> +	/* Subscribe CQ event to the event channel controlled by the driver. */
>>> +	ret = mlx5_glue->devx_subscribe_devx_event(priv->eventc, cq->cq->obj,
>>> +						   sizeof(event_nums),
>>> +						   event_nums,
>>> +						   (uint64_t)(uintptr_t)cq);
>>> +	if (ret) {
>>> +		DRV_LOG(ERR, "Failed to subscribe CQE event.");
>>> +		rte_errno = errno;
>>> +		goto error;
>>> +	}
>>> +	/* Subscribe CQ event to the guest FD only if it is not in poll mode. */
>>> +	if (callfd != -1) {
>>> +		ret = mlx5_glue->devx_subscribe_devx_event_fd(priv->eventc,
>>> +							      callfd,
>>> +							      cq->cq->obj, 0);
>>> +		if (ret) {
>>> +			DRV_LOG(ERR, "Failed to subscribe CQE event fd.");
>>> +			rte_errno = errno;
>>> +			goto error;
>>> +		}
>>> +	}
>>> +	/* First arming. */
>>> +	mlx5_vdpa_cq_arm(priv, cq);
>>> +	return 0;
>>> +error:
>>> +	mlx5_vdpa_cq_destroy(cq);
>>> +	return -1;
>>> +}
>>> +
>>> +static inline void __rte_unused
>>> +mlx5_vdpa_cq_poll(struct mlx5_vdpa_priv *priv __rte_unused,
>>> +		  struct mlx5_vdpa_cq *cq)
>>> +{
>>> +	struct mlx5_vdpa_event_qp *eqp =
>>> +				container_of(cq, struct mlx5_vdpa_event_qp, cq);
>>> +	const unsigned int cqe_size = 1 << cq->log_desc_n;
>>> +	const unsigned int cqe_mask = cqe_size - 1;
>>> +	int ret;
>>> +
>>> +	do {
>>> +		volatile struct mlx5_cqe *cqe = cq->cqes + (cq->cq_ci &
>>> +							    cqe_mask);
>>> +
>>> +		ret = check_cqe(cqe, cqe_size, cq->cq_ci);
>>> +		switch (ret) {
>>> +		case MLX5_CQE_STATUS_ERR:
>>> +			cq->errors++;
>>> +			/*fall-through*/
>>> +		case MLX5_CQE_STATUS_SW_OWN:
>>> +			cq->cq_ci++;
>>> +			break;
>>> +		case MLX5_CQE_STATUS_HW_OWN:
>>> +		default:
>>> +			break;
>>> +		}
>>> +	} while (ret != MLX5_CQE_STATUS_HW_OWN);
>>
>> Isn't there a risk of endless loop here?
> 
> No. The maximum number of iterations is the CQ size, since HW cannot write more CQEs before the doorbell record is updated.
Ok.
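
For readers following the exchange above, the bound can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions: the toy_* names, the one-bit ownership scheme and the db_rec field are simplified stand-ins invented for illustration, not the driver's check_cqe() or its real CQE layout. It shows that a poll loop which stops at the first HW-owned entry cannot run for more than the ring size when the producer is limited by the published consumer index.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_LOG_CQ_SIZE 3u
#define TOY_CQ_SIZE (1u << TOY_LOG_CQ_SIZE)

/* Hypothetical CQ model: only the ownership bit of each CQE is kept. */
struct toy_cq {
	uint8_t owner[TOY_CQ_SIZE]; /* Ownership bit written by "HW". */
	uint32_t pi;     /* Producer index, advanced by "HW". */
	uint32_t ci;     /* Consumer index, advanced by the poll loop. */
	uint32_t db_rec; /* Doorbell record: last ci published to "HW". */
};

static void
toy_cq_init(struct toy_cq *cq)
{
	memset(cq, 0, sizeof(*cq));
	/* Start with every entry HW-owned for the first pass. */
	memset(cq->owner, 1, sizeof(cq->owner));
}

/* "HW" side: write up to n CQEs, never more than CQ size past db_rec. */
static uint32_t
toy_hw_produce(struct toy_cq *cq, uint32_t n)
{
	uint32_t written = 0;

	while (written < n && cq->pi - cq->db_rec < TOY_CQ_SIZE) {
		cq->owner[cq->pi & (TOY_CQ_SIZE - 1)] =
			(cq->pi >> TOY_LOG_CQ_SIZE) & 1u;
		cq->pi++;
		written++;
	}
	return written;
}

/* SW side: consume until a HW-owned entry is met, then publish ci. */
static uint32_t
toy_cq_poll(struct toy_cq *cq)
{
	uint32_t consumed = 0;

	for (;;) {
		uint8_t expected = (cq->ci >> TOY_LOG_CQ_SIZE) & 1u;

		if (cq->owner[cq->ci & (TOY_CQ_SIZE - 1)] != expected)
			break; /* HW-owned: nothing more to consume. */
		cq->ci++;
		consumed++;
	}
	/* The loop cannot exceed the ring size in a single call. */
	assert(consumed <= TOY_CQ_SIZE);
	cq->db_rec = cq->ci; /* Only now may "HW" reuse the entries. */
	return consumed;
}
```

In this model, asking the producer for far more entries than the ring holds still yields at most TOY_CQ_SIZE completions, and the next poll stops as soon as it wraps onto an entry carrying the previous pass's ownership bit.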
^ permalink raw reply	[flat|nested] 174+ messages in thread
* Re: [dpdk-dev] [PATCH v2 08/13] vdpa/mlx5: add basic steering configurations
  2020-01-29 10:09   ` [dpdk-dev] [PATCH v2 08/13] vdpa/mlx5: add basic steering configurations Matan Azrad
@ 2020-01-31 15:10     ` Maxime Coquelin
  0 siblings, 0 replies; 174+ messages in thread
From: Maxime Coquelin @ 2020-01-31 15:10 UTC (permalink / raw)
  To: Matan Azrad, dev, Viacheslav Ovsiienko
On 1/29/20 11:09 AM, Matan Azrad wrote:
> Add a steering object to be managed by a new file mlx5_vdpa_steer.c.
> 
> Allow promiscuous flow to scatter the device Rx packets to the virtio
> queues using RSS action.
> 
> In order to allow correct RSS in L3 and L4, split the flow to 7 flows
> as required by the device.
> 
> Signed-off-by: Matan Azrad <matan@mellanox.com>
> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> ---
>  drivers/vdpa/mlx5/Makefile          |   2 +
>  drivers/vdpa/mlx5/meson.build       |   1 +
>  drivers/vdpa/mlx5/mlx5_vdpa.c       |   1 +
>  drivers/vdpa/mlx5/mlx5_vdpa.h       |  34 +++++
>  drivers/vdpa/mlx5/mlx5_vdpa_steer.c | 265 ++++++++++++++++++++++++++++++++++++
>  5 files changed, 303 insertions(+)
>  create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_steer.c
> 
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Thanks,
Maxime
^ permalink raw reply	[flat|nested] 174+ messages in thread
* Re: [dpdk-dev] [PATCH v2 09/13] vdpa/mlx5: support queue state operation
  2020-01-29 10:09   ` [dpdk-dev] [PATCH v2 09/13] vdpa/mlx5: support queue state operation Matan Azrad
@ 2020-01-31 15:32     ` Maxime Coquelin
  0 siblings, 0 replies; 174+ messages in thread
From: Maxime Coquelin @ 2020-01-31 15:32 UTC (permalink / raw)
  To: Matan Azrad, dev, Viacheslav Ovsiienko
On 1/29/20 11:09 AM, Matan Azrad wrote:
> Add support for set_vring_state operation.
> 
> Using DevX API the virtq state can be changed as described in PRM:
> 	enable - move to ready state.
> 	disable - move to suspend state.
> 
> Signed-off-by: Matan Azrad <matan@mellanox.com>
> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> ---
>  drivers/vdpa/mlx5/mlx5_vdpa.c       | 23 ++++++++++++++++++++++-
>  drivers/vdpa/mlx5/mlx5_vdpa.h       | 15 +++++++++++++++
>  drivers/vdpa/mlx5/mlx5_vdpa_steer.c | 22 ++++++++++++++++++++--
>  drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 25 +++++++++++++++++++++----
>  4 files changed, 78 insertions(+), 7 deletions(-)
> 
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Thanks,
Maxime
^ permalink raw reply	[flat|nested] 174+ messages in thread
* Re: [dpdk-dev] [PATCH v2 10/13] vdpa/mlx5: map doorbell
  2020-01-29 10:09   ` [dpdk-dev] [PATCH v2 10/13] vdpa/mlx5: map doorbell Matan Azrad
@ 2020-01-31 15:40     ` Maxime Coquelin
  0 siblings, 0 replies; 174+ messages in thread
From: Maxime Coquelin @ 2020-01-31 15:40 UTC (permalink / raw)
  To: Matan Azrad, dev, Viacheslav Ovsiienko
On 1/29/20 11:09 AM, Matan Azrad wrote:
> The HW detects doorbell writes of 4 bytes only.
> The virtio device sets only 2 bytes when it rings the doorbell.
> 
> Map the virtio doorbell detected by the virtio queue kickfd to the HW
> VAR space when it expects to get the virtio emulation doorbell.
> 
> Use the EAL interrupt mechanism to get notification when a new event
> appears in kickfd by the guest and write 4 bytes to the HW doorbell space
> in the notification callback.
> 
> Signed-off-by: Matan Azrad <matan@mellanox.com>
> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> ---
>  drivers/vdpa/mlx5/mlx5_vdpa.h       |  3 ++
>  drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 82 ++++++++++++++++++++++++++++++++++++-
>  2 files changed, 84 insertions(+), 1 deletion(-)
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Thanks,
Maxime
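
The mechanism described in the commit message can be sketched in isolation as follows. This is a minimal illustration, assuming a Linux eventfd stands in for the virtio kickfd and an ordinary 32-bit variable stands in for the mapped HW doorbell; toy_kick_handler and the choice of the queue index as the doorbell value are assumptions of the sketch, not the driver's actual callback.

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/eventfd.h>

/*
 * Hypothetical kick handler: drain the eventfd counter, then ring the
 * doorbell with a full 32-bit store. "db" stands in for the mapped HW
 * doorbell address; in this sketch it is ordinary memory.
 */
static int
toy_kick_handler(int kickfd, volatile uint32_t *db, uint16_t queue_index)
{
	uint64_t counter;

	assert(db != NULL);
	if (read(kickfd, &counter, sizeof(counter)) != (ssize_t)sizeof(counter))
		return -1; /* No pending kick (or read error). */
	*db = (uint32_t)queue_index; /* 4-byte write, as the HW expects. */
	return 0;
}
```

With a descriptor created by eventfd(0, EFD_NONBLOCK), the handler returns -1 while no kick is pending and performs the 4-byte store once an 8-byte counter value has been written to the descriptor.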
^ permalink raw reply	[flat|nested] 174+ messages in thread
* Re: [dpdk-dev] [PATCH v2 11/13] vdpa/mlx5: support live migration
  2020-01-29 10:09   ` [dpdk-dev] [PATCH v2 11/13] vdpa/mlx5: support live migration Matan Azrad
@ 2020-01-31 16:01     ` Maxime Coquelin
  0 siblings, 0 replies; 174+ messages in thread
From: Maxime Coquelin @ 2020-01-31 16:01 UTC (permalink / raw)
  To: Matan Azrad, dev, Viacheslav Ovsiienko
On 1/29/20 11:09 AM, Matan Azrad wrote:
> Add support for live migration feature by the HW:
> 	Create a single Mkey that maps the memory address space of the
> 		VHOST live migration log file.
> 	Modify VIRTIO_NET_Q object and provide vhost_log_page,
> 		dirty_bitmap_mkey, dirty_bitmap_size, dirty_bitmap_addr
> 		and dirty_bitmap_dump_enable.
> 	Modify VIRTIO_NET_Q object and move state to SUSPEND.
> 	Query VIRTIO_NET_Q and get hw_available_idx and hw_used_idx.
> 
> Signed-off-by: Matan Azrad <matan@mellanox.com>
> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> ---
>  doc/guides/vdpadevs/features/mlx5.ini |   1 +
>  drivers/vdpa/mlx5/Makefile            |   1 +
>  drivers/vdpa/mlx5/meson.build         |   1 +
>  drivers/vdpa/mlx5/mlx5_vdpa.c         |  44 +++++++++++-
>  drivers/vdpa/mlx5/mlx5_vdpa.h         |  55 ++++++++++++++
>  drivers/vdpa/mlx5/mlx5_vdpa_lm.c      | 130 ++++++++++++++++++++++++++++++++++
>  drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   |   7 +-
>  7 files changed, 236 insertions(+), 3 deletions(-)
>  create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_lm.c
> 
> diff --git a/doc/guides/vdpadevs/features/mlx5.ini b/doc/guides/vdpadevs/features/mlx5.ini
> index e4ee34b..1da9c1b 100644
> --- a/doc/guides/vdpadevs/features/mlx5.ini
> +++ b/doc/guides/vdpadevs/features/mlx5.ini
> @@ -9,6 +9,7 @@ guest csum           = Y
>  host tso4            = Y
>  host tso6            = Y
>  version 1            = Y
> +log all              = Y
>  any layout           = Y
>  guest announce       = Y
>  mq                   = Y
> diff --git a/drivers/vdpa/mlx5/Makefile b/drivers/vdpa/mlx5/Makefile
> index 2f70a98..4d1f528 100644
> --- a/drivers/vdpa/mlx5/Makefile
> +++ b/drivers/vdpa/mlx5/Makefile
> @@ -12,6 +12,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa_mem.c
>  SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa_event.c
>  SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa_virtq.c
>  SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa_steer.c
> +SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa_lm.c
>  
>  
>  # Basic CFLAGS.
> diff --git a/drivers/vdpa/mlx5/meson.build b/drivers/vdpa/mlx5/meson.build
> index 2849178..2e521b8 100644
> --- a/drivers/vdpa/mlx5/meson.build
> +++ b/drivers/vdpa/mlx5/meson.build
> @@ -16,6 +16,7 @@ sources = files(
>  	'mlx5_vdpa_event.c',
>  	'mlx5_vdpa_virtq.c',
>  	'mlx5_vdpa_steer.c',
> +	'mlx5_vdpa_lm.c',
>  )
>  cflags_options = [
>  	'-std=c11',
> diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
> index 71189c4..4ce0ba0 100644
> --- a/drivers/vdpa/mlx5/mlx5_vdpa.c
> +++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
> @@ -19,7 +19,8 @@
>  			    (1ULL << VIRTIO_F_ANY_LAYOUT) | \
>  			    (1ULL << VIRTIO_NET_F_MQ) | \
>  			    (1ULL << VIRTIO_NET_F_GUEST_ANNOUNCE) | \
> -			    (1ULL << VIRTIO_F_ORDER_PLATFORM))
> +			    (1ULL << VIRTIO_F_ORDER_PLATFORM) | \
> +			    (1ULL << VHOST_F_LOG_ALL))
>  
>  #define MLX5_VDPA_PROTOCOL_FEATURES \
>  			    ((1ULL << VHOST_USER_PROTOCOL_F_SLAVE_REQ) | \
> @@ -127,6 +128,45 @@
>  	return mlx5_vdpa_virtq_enable(virtq, state);
>  }
>  
> +static int
> +mlx5_vdpa_features_set(int vid)
> +{
> +	int did = rte_vhost_get_vdpa_device_id(vid);
> +	struct mlx5_vdpa_priv *priv = mlx5_vdpa_find_priv_resource_by_did(did);
> +	uint64_t log_base, log_size;
> +	uint64_t features;
> +	int ret;
> +
> +	if (priv == NULL) {
> +		DRV_LOG(ERR, "Invalid device id: %d.", did);
> +		return -EINVAL;
> +	}
> +	ret = rte_vhost_get_negotiated_features(vid, &features);
> +	if (ret) {
> +		DRV_LOG(ERR, "Failed to get negotiated features.");
> +		return ret;
> +	}
> +	if (RTE_VHOST_NEED_LOG(features)) {
> +		ret = rte_vhost_get_log_base(vid, &log_base, &log_size);
> +		if (ret) {
> +			DRV_LOG(ERR, "Failed to get log base.");
> +			return ret;
> +		}
> +		ret = mlx5_vdpa_dirty_bitmap_set(priv, log_base, log_size);
> +		if (ret) {
> +			DRV_LOG(ERR, "Failed to set dirty bitmap.");
> +			return ret;
> +		}
> +		DRV_LOG(INFO, "mlx5 vdpa: enabling dirty logging...");
> +		ret = mlx5_vdpa_logging_enable(priv, 1);
> +		if (ret) {
> +			DRV_LOG(ERR, "Failed to enable dirty logging.");
> +			return ret;
> +		}
> +	}
> +	return 0;
> +}
> +
>  static struct rte_vdpa_dev_ops mlx5_vdpa_ops = {
>  	.get_queue_num = mlx5_vdpa_get_queue_num,
>  	.get_features = mlx5_vdpa_get_vdpa_features,
> @@ -134,7 +174,7 @@
>  	.dev_conf = NULL,
>  	.dev_close = NULL,
>  	.set_vring_state = mlx5_vdpa_set_vring_state,
> -	.set_features = NULL,
> +	.set_features = mlx5_vdpa_features_set,
>  	.migration_done = NULL,
>  	.get_vfio_group_fd = NULL,
>  	.get_vfio_device_fd = NULL,
> diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
> index af78ea1..70264e4 100644
> --- a/drivers/vdpa/mlx5/mlx5_vdpa.h
> +++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
> @@ -244,4 +244,59 @@ int mlx5_vdpa_event_qp_create(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
>   */
>  int mlx5_vdpa_steer_setup(struct mlx5_vdpa_priv *priv);
>  
> +/**
> + * Enable/Disable live migration logging.
> + *
> + * @param[in] priv
> + *   The vdpa driver private structure.
> + * @param[in] enable
> + *   Set for enable, unset for disable.
> + *
> + * @return
> + *   0 on success, a negative value otherwise.
> + */
> +int mlx5_vdpa_logging_enable(struct mlx5_vdpa_priv *priv, int enable);
> +
> +/**
> + * Set dirty bitmap logging to allow live migration.
> + *
> + * @param[in] priv
> + *   The vdpa driver private structure.
> + * @param[in] log_base
> + *   Vhost log base.
> + * @param[in] log_size
> + *   Vhost log size.
> + *
> + * @return
> + *   0 on success, a negative value otherwise.
> + */
> +int mlx5_vdpa_dirty_bitmap_set(struct mlx5_vdpa_priv *priv, uint64_t log_base,
> +			       uint64_t log_size);
> +
> +/**
> + * Log all virtqs information for live migration.
> + *
> + * @param[in] priv
> + *   The vdpa driver private structure.
> + * @param[in] enable
> + *   Set for enable, unset for disable.
> + *
> + * @return
> + *   0 on success, a negative value otherwise.
> + */
> +int mlx5_vdpa_lm_log(struct mlx5_vdpa_priv *priv);
> +
> +/**
> + * Modify virtq state to be ready or suspend.
> + *
> + * @param[in] virtq
> + *   The vdpa driver private virtq structure.
> + * @param[in] state
> + *   Set for ready, otherwise suspend.
> + *
> + * @return
> + *   0 on success, a negative value otherwise.
> + */
> +int mlx5_vdpa_virtq_modify(struct mlx5_vdpa_virtq *virtq, int state);
> +
>  #endif /* RTE_PMD_MLX5_VDPA_H_ */
> diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
> new file mode 100644
> index 0000000..cfeec5f
> --- /dev/null
> +++ b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
> @@ -0,0 +1,130 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright 2019 Mellanox Technologies, Ltd
> + */
> +#include <rte_malloc.h>
> +#include <rte_errno.h>
> +
> +#include "mlx5_vdpa_utils.h"
> +#include "mlx5_vdpa.h"
> +
> +
> +int
> +mlx5_vdpa_logging_enable(struct mlx5_vdpa_priv *priv, int enable)
> +{
> +	struct mlx5_devx_virtq_attr attr = {
> +		.type = MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE,
> +		.dirty_bitmap_dump_enable = enable,
> +	};
> +	struct mlx5_vdpa_virtq *virtq;
> +
> +	SLIST_FOREACH(virtq, &priv->virtq_list, next) {
> +		attr.queue_index = virtq->index;
> +		if (mlx5_devx_cmd_modify_virtq(virtq->virtq, &attr)) {
> +			DRV_LOG(ERR, "Failed to modify virtq %d logging.",
> +				virtq->index);
> +			return -1;
> +		}
> +	}
> +	return 0;
> +}
> +
> +int
> +mlx5_vdpa_dirty_bitmap_set(struct mlx5_vdpa_priv *priv, uint64_t log_base,
> +			   uint64_t log_size)
> +{
> +	struct mlx5_devx_mkey_attr mkey_attr = {
> +			.addr = (uintptr_t)log_base,
> +			.size = log_size,
> +			.pd = priv->pdn,
> +			.pg_access = 1,
> +			.klm_array = NULL,
> +			.klm_num = 0,
> +	};
> +	struct mlx5_devx_virtq_attr attr = {
> +		.type = MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS,
> +		.dirty_bitmap_addr = log_base,
> +		.dirty_bitmap_size = log_size,
> +	};
> +	struct mlx5_vdpa_query_mr *mr = rte_malloc(__func__, sizeof(*mr), 0);
> +	struct mlx5_vdpa_virtq *virtq;
> +
> +	if (!mr) {
> +		DRV_LOG(ERR, "Failed to allocate mem for lm mr.");
> +		return -1;
> +	}
> +	mr->umem = mlx5_glue->devx_umem_reg(priv->ctx,
> +					    (void *)(uintptr_t)log_base,
> +					    log_size, IBV_ACCESS_LOCAL_WRITE);
> +	if (!mr->umem) {
> +		DRV_LOG(ERR, "Failed to register umem for lm mr.");
> +		goto err;
> +	}
> +	mkey_attr.umem_id = mr->umem->umem_id;
> +	mr->mkey = mlx5_devx_cmd_mkey_create(priv->ctx, &mkey_attr);
> +	if (!mr->mkey) {
> +		DRV_LOG(ERR, "Failed to create Mkey for lm.");
> +		goto err;
> +	}
> +	attr.dirty_bitmap_mkey = mr->mkey->id;
> +	SLIST_FOREACH(virtq, &priv->virtq_list, next) {
> +		attr.queue_index = virtq->index;
> +		if (mlx5_devx_cmd_modify_virtq(virtq->virtq, &attr)) {
> +			DRV_LOG(ERR, "Failed to modify virtq %d for lm.",
> +				virtq->index);
> +			goto err;
> +		}
> +	}
> +	mr->is_indirect = 0;
> +	SLIST_INSERT_HEAD(&priv->mr_list, mr, next);
> +	return 0;
> +err:
> +	if (mr->mkey)
> +		mlx5_devx_cmd_destroy(mr->mkey);
> +	if (mr->umem)
> +		mlx5_glue->devx_umem_dereg(mr->umem);
> +	rte_free(mr);
> +	return -1;
> +}
> +
> +#define MLX5_VDPA_USED_RING_LEN(size) \
> +	((size) * sizeof(struct vring_used_elem) + sizeof(uint16_t) * 3)
> +
> +int
> +mlx5_vdpa_lm_log(struct mlx5_vdpa_priv *priv)
> +{
> +	struct mlx5_devx_virtq_attr attr = {0};
> +	struct mlx5_vdpa_virtq *virtq;
> +	uint64_t features;
> +	int ret = rte_vhost_get_negotiated_features(priv->vid, &features);
> +
> +	if (ret) {
> +		DRV_LOG(ERR, "Failed to get negotiated features.");
> +		return -1;
> +	}
> +	if (RTE_VHOST_NEED_LOG(features)) {
> +		SLIST_FOREACH(virtq, &priv->virtq_list, next) {
> +			ret = mlx5_vdpa_virtq_modify(virtq, 0);
> +			if (ret)
> +				return -1;
> +			if (mlx5_devx_cmd_query_virtq(virtq->virtq, &attr)) {
> +				DRV_LOG(ERR, "Failed to query virtq %d.",
> +					virtq->index);
> +				return -1;
> +			}
> +			DRV_LOG(INFO, "Query vid %d vring %d: hw_available_idx="
> +				"%d, hw_used_index=%d", priv->vid, virtq->index,
> +				attr.hw_available_index, attr.hw_used_index);
> +			ret = rte_vhost_set_vring_base(priv->vid, virtq->index,
> +						       attr.hw_available_index,
> +						       attr.hw_used_index);
> +			if (ret) {
> +				DRV_LOG(ERR, "Failed to set virtq %d base.",
> +					virtq->index);
> +				return -1;
> +			}
> +			rte_vhost_log_used_vring(priv->vid, virtq->index, 0,
> +				       MLX5_VDPA_USED_RING_LEN(virtq->vq_size));
> +		}
> +	}
> +	return 0;
> +}
To avoid one more level of indentation, I would do:
if (!RTE_VHOST_NEED_LOG(features))
	return 0;
Other than that:
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Thanks,
Maxime
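
The early-return shape suggested above can be sketched on its own. This is a hedged illustration only: TOY_F_LOG_ALL, TOY_NEED_LOG, struct toy_virtq and toy_lm_log are hypothetical stand-ins for VHOST_F_LOG_ALL, RTE_VHOST_NEED_LOG(), the driver's virtq structure and mlx5_vdpa_lm_log(), and the loop body is reduced to a single flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for VHOST_F_LOG_ALL and RTE_VHOST_NEED_LOG(). */
#define TOY_F_LOG_ALL 26
#define TOY_NEED_LOG(features) (((features) & (1ULL << TOY_F_LOG_ALL)) != 0)

struct toy_virtq {
	int index;
	int suspended;
};

static int
toy_lm_log(uint64_t features, struct toy_virtq *virtqs, size_t n)
{
	size_t i;

	assert(virtqs != NULL || n == 0);
	/* Guard clause: nothing to do unless dirty logging was negotiated. */
	if (!TOY_NEED_LOG(features))
		return 0;
	for (i = 0; i < n; i++) {
		/* Stand-in for suspend + query + set vring base + log. */
		virtqs[i].suspended = 1;
	}
	return 0;
}
```

The per-queue work then sits one indentation level shallower, and the behaviour is unchanged: without the logging feature bit the queues are left untouched and 0 is returned.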
^ permalink raw reply	[flat|nested] 174+ messages in thread
* Re: [dpdk-dev] [PATCH v2 12/13] vdpa/mlx5: support close and config operations
  2020-01-29 10:09   ` [dpdk-dev] [PATCH v2 12/13] vdpa/mlx5: support close and config operations Matan Azrad
@ 2020-01-31 16:06     ` Maxime Coquelin
  0 siblings, 0 replies; 174+ messages in thread
From: Maxime Coquelin @ 2020-01-31 16:06 UTC (permalink / raw)
  To: Matan Azrad, dev, Viacheslav Ovsiienko
On 1/29/20 11:09 AM, Matan Azrad wrote:
> Support dev_conf and dev_close operations.
> These operations allow vdpa traffic.
> 
> Signed-off-by: Matan Azrad <matan@mellanox.com>
> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> ---
>  drivers/vdpa/mlx5/mlx5_vdpa.c | 58 ++++++++++++++++++++++++++++++++++++++++---
>  drivers/vdpa/mlx5/mlx5_vdpa.h |  1 +
>  2 files changed, 55 insertions(+), 4 deletions(-)
> 
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Thanks,
Maxime
^ permalink raw reply	[flat|nested] 174+ messages in thread
* Re: [dpdk-dev] [PATCH v2 13/13] vdpa/mlx5: disable ROCE
  2020-01-29 10:09   ` [dpdk-dev] [PATCH v2 13/13] vdpa/mlx5: disable ROCE Matan Azrad
@ 2020-01-31 16:42     ` Maxime Coquelin
  0 siblings, 0 replies; 174+ messages in thread
From: Maxime Coquelin @ 2020-01-31 16:42 UTC (permalink / raw)
  To: Matan Azrad, dev, Viacheslav Ovsiienko
On 1/29/20 11:09 AM, Matan Azrad wrote:
> In order to support virtio queue creation by the FW, ROCE mode
> should be disabled in the device.
> 
> Do it by netlink which is like the devlink tool commands:
> 	1. devlink dev param set pci/[pci] name enable_roce value false
> 	   cmode driverinit
> 	2. devlink dev reload pci/[pci]
> Or by sysfs which is like:
> 	echo 0 >  /sys/bus/pci/devices/[pci]/roce_enable
> 
> The IB device is matched again after ROCE disabling.
> 
> Signed-off-by: Matan Azrad <matan@mellanox.com>
> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> ---
>  drivers/vdpa/mlx5/Makefile    |   2 +-
>  drivers/vdpa/mlx5/meson.build |   2 +-
>  drivers/vdpa/mlx5/mlx5_vdpa.c | 192 ++++++++++++++++++++++++++++++++++--------
>  3 files changed, 161 insertions(+), 35 deletions(-)
> 
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Thanks,
Maxime
^ permalink raw reply	[flat|nested] 174+ messages in thread
* Re: [dpdk-dev] [PATCH v2 01/13] drivers: introduce mlx5 vDPA driver
  2020-01-30 14:38     ` Maxime Coquelin
@ 2020-02-01 17:53       ` Matan Azrad
  0 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-02-01 17:53 UTC (permalink / raw)
  To: Maxime Coquelin, dev, Slava Ovsiienko
Hi Maxime
Thank you very much for the review.
Please see answers inline.
From: Maxime Coquelin
> On 1/29/20 11:08 AM, Matan Azrad wrote:
> > Add a new driver to support vDPA operations by Mellanox devices.
> >
> > The first Mellanox devices which support vDPA operations are
> > ConnectX6DX and Bluefield1 HCA for their PF ports and VF ports.
> >
> > This driver depends on rdma-core, like the mlx5 PMD, and it is also
> > going to use mlx5 DevX to create HW objects directly by the FW.
> > Hence, the common/mlx5 library is linked to the mlx5_vdpa driver.
> 
> If possible, I would really appreciate having the information on the versions
> required for the above dependencies. Better if it is also mentioned in the
> guide.
> 
It is already in the guide 😊
> > This driver will not be compiled by default due to the above
> > dependencies.
> >
> > Register a new log type for this driver.
> >
> > Signed-off-by: Matan Azrad <matan@mellanox.com>
> > Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> > ---
> >  MAINTAINERS                                     |   7 +
> >  config/common_base                              |   5 +
> >  doc/guides/rel_notes/release_20_02.rst          |   5 +
> >  doc/guides/vdpadevs/features/mlx5.ini           |  14 ++
> >  doc/guides/vdpadevs/index.rst                   |   1 +
> >  doc/guides/vdpadevs/mlx5.rst                    | 111 ++++++++++++
> >  drivers/common/Makefile                         |   2 +-
> >  drivers/common/mlx5/Makefile                    |  17 +-
> >  drivers/meson.build                             |   8 +-
> >  drivers/vdpa/Makefile                           |   2 +
> >  drivers/vdpa/meson.build                        |   3 +-
> >  drivers/vdpa/mlx5/Makefile                      |  36 ++++
> >  drivers/vdpa/mlx5/meson.build                   |  29 +++
> >  drivers/vdpa/mlx5/mlx5_vdpa.c                   | 227 ++++++++++++++++++++++++
> >  drivers/vdpa/mlx5/mlx5_vdpa_utils.h             |  20 +++
> >  drivers/vdpa/mlx5/rte_pmd_mlx5_vdpa_version.map |   3 +
> >  mk/rte.app.mk                                   |  15 +-
> >  17 files changed, 488 insertions(+), 17 deletions(-)
> >  create mode 100644 doc/guides/vdpadevs/features/mlx5.ini
> >  create mode 100644 doc/guides/vdpadevs/mlx5.rst
> >  create mode 100644 drivers/vdpa/mlx5/Makefile
> >  create mode 100644 drivers/vdpa/mlx5/meson.build
> >  create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa.c
> >  create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_utils.h
> >  create mode 100644 drivers/vdpa/mlx5/rte_pmd_mlx5_vdpa_version.map
> >
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 150d507..f697e9a 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -1103,6 +1103,13 @@ F: drivers/vdpa/ifc/
> >  F: doc/guides/vdpadevs/ifc.rst
> >  F: doc/guides/vdpadevs/features/ifcvf.ini
> >
> > +Mellanox mlx5 vDPA
> > +M: Matan Azrad <matan@mellanox.com>
> > +M: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> > +F: drivers/vdpa/mlx5/
> > +F: doc/guides/vdpadevs/mlx5.rst
> > +F: doc/guides/vdpadevs/features/mlx5.ini
> > +
> >
> >  Eventdev Drivers
> >  ----------------
> > diff --git a/config/common_base b/config/common_base
> > index c897dd0..6ea9c63 100644
> > --- a/config/common_base
> > +++ b/config/common_base
> > @@ -366,6 +366,11 @@ CONFIG_RTE_LIBRTE_MLX4_DEBUG=n
> >  CONFIG_RTE_LIBRTE_MLX5_PMD=n
> >  CONFIG_RTE_LIBRTE_MLX5_DEBUG=n
> >
> > +#
> > +# Compile vdpa-oriented Mellanox ConnectX-6 & Bluefield (MLX5) PMD
> > +#
> > +CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD=n
> > +
> >  # Linking method for mlx4/5 dependency on ibverbs and related libraries
> >  # Default linking is dynamic by linker.
> >  # Other options are: dynamic by dlopen at run-time, or statically embedded.
> > diff --git a/doc/guides/rel_notes/release_20_02.rst b/doc/guides/rel_notes/release_20_02.rst
> > index 50e2c14..690e7db 100644
> > --- a/doc/guides/rel_notes/release_20_02.rst
> > +++ b/doc/guides/rel_notes/release_20_02.rst
> > @@ -113,6 +113,11 @@ New Features
> >    * Added support for RSS using L3/L4 source/destination only.
> >    * Added support for matching on GTP tunnel header item.
> >
> > +* **Add new vDPA PMD based on Mellanox devices**
> > +
> > +  Added a new Mellanox vDPA  (``mlx5_vdpa``) PMD.
> > +  See the :doc:`../vdpadevs/mlx5` guide for more details on this driver.
> > +
> >  * **Updated testpmd application.**
> >
> >    Added support for ESP and L2TPv3 over IP rte_flow patterns to the testpmd
> > diff --git a/doc/guides/vdpadevs/features/mlx5.ini b/doc/guides/vdpadevs/features/mlx5.ini
> > new file mode 100644
> > index 0000000..d635bdf
> > --- /dev/null
> > +++ b/doc/guides/vdpadevs/features/mlx5.ini
> > @@ -0,0 +1,14 @@
> > +;
> > +; Supported features of the 'mlx5' VDPA driver.
> > +;
> > +; Refer to default.ini for the full list of available driver features.
> > +;
> > +[Features]
> > +Other kdrv           = Y
> > +ARMv8                = Y
> > +Power8               = Y
> > +x86-32               = Y
> > +x86-64               = Y
> > +Usage doc            = Y
> > +Design doc           = Y
> > +
> > diff --git a/doc/guides/vdpadevs/index.rst b/doc/guides/vdpadevs/index.rst
> > index 9657108..1a13efe 100644
> > --- a/doc/guides/vdpadevs/index.rst
> > +++ b/doc/guides/vdpadevs/index.rst
> > @@ -13,3 +13,4 @@ which can be used from an application through vhost API.
> >
> >      features_overview
> >      ifc
> > +    mlx5
> > diff --git a/doc/guides/vdpadevs/mlx5.rst b/doc/guides/vdpadevs/mlx5.rst
> > new file mode 100644
> > index 0000000..1861e71
> > --- /dev/null
> > +++ b/doc/guides/vdpadevs/mlx5.rst
> > @@ -0,0 +1,111 @@
> > +..  SPDX-License-Identifier: BSD-3-Clause
> > +    Copyright 2019 Mellanox Technologies, Ltd
> > +
> > +MLX5 vDPA driver
> > +================
> > +
> > +The MLX5 vDPA (vhost data path acceleration) driver library
> > +(**librte_pmd_mlx5_vdpa**) provides support for **Mellanox
> > +ConnectX-6**, **Mellanox ConnectX-6DX** and **Mellanox BlueField**
> > +families of
> > +10/25/40/50/100/200 Gb/s adapters as well as their virtual functions
> > +(VF) in SR-IOV context.
> > +
> > +.. note::
> > +
> > +   Due to external dependencies, this driver is disabled in default
> > +   configuration of the "make" build. It can be enabled with
> > +   ``CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD=y`` or by using "meson" build
> system which
> > +   will detect dependencies.
> > +
> > +
> > +Design
> > +------
> > +
> > +For security reasons and robustness, this driver only deals with
> > +virtual memory addresses. The way resources allocations are handled
> > +by the kernel, combined with hardware specifications that allow to
> > +handle virtual memory addresses directly, ensure that DPDK
> > +applications cannot access random physical memory (or memory that
> does not belong to the current process).
> > +
> > +The PMD can use libibverbs and libmlx5 to access the device firmware
> > +or directly the hardware components.
> > +There are different levels of objects and bypassing abilities to get
> > +the best performances:
> > +
> > +- Verbs is a complete high-level generic API
> > +- Direct Verbs is a device-specific API
> > +- DevX allows to access firmware objects
> > +- Direct Rules manages flow steering at low-level hardware layer
> > +
> > +Enabling librte_pmd_mlx5_vdpa causes DPDK applications to be linked
> > +against libibverbs.
> > +
> > +A Mellanox mlx5 PCI device can be probed by either net/mlx5 driver or
> > +vdpa/mlx5 driver but not in parallel. Hence, the user should decide
> > +the driver by the ``class`` parameter in the device argument list.
> > +By default, the mlx5 device will be probed by the net/mlx5 driver.
> > +
> > +Supported NICs
> > +--------------
> > +
> > +* Mellanox(R) ConnectX(R)-6 200G MCX654106A-HCAT (4x200G)
> > +* Mellanox(R) ConnectX(R)-6DX EN 100G MCX623106AN-CDAT (2*100G)
> > +* Mellanox(R) ConnectX(R)-6DX EN 200G MCX623105AN-VDAT (1*200G)
> > +* Mellanox(R) BlueField SmartNIC 25G MBF1M332A-ASCAT (2*25G)
> > +
> > +Prerequisites
> > +-------------
> > +
> > +- Mellanox OFED version: **4.7**
> > +  see :doc:`../../nics/mlx5` guide for more Mellanox OFED details.
> > +
> > +Compilation options
> > +~~~~~~~~~~~~~~~~~~~
> > +
> > +These options can be modified in the ``.config`` file.
> > +
> > +- ``CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD`` (default **n**)
> > +
> > +  Toggle compilation of librte_pmd_mlx5 itself.
> > +
> > +- ``CONFIG_RTE_IBVERBS_LINK_DLOPEN`` (default **n**)
> > +
> > +  Build PMD with additional code to make it loadable without hard
> > + dependencies on **libibverbs** nor **libmlx5**, which may not be
> > + installed  on the target system.
> > +
> > +  In this mode, their presence is still required for it to run
> > + properly,  however their absence won't prevent a DPDK application
> > + from starting (with  ``CONFIG_RTE_BUILD_SHARED_LIB`` disabled) and
> > + they won't show up as  missing with ``ldd(1)``.
> > +
> > +  It works by moving these dependencies to a purpose-built rdma-core
> "glue"
> > +  plug-in which must either be installed in a directory whose name is
> > + based  on ``CONFIG_RTE_EAL_PMD_PATH`` suffixed with ``-glue`` if
> > + set, or in a  standard location for the dynamic linker (e.g.
> > + ``/lib``) if left to the  default empty string (``""``).
> > +
> > +  This option has no performance impact.
> > +
> > +- ``CONFIG_RTE_IBVERBS_LINK_STATIC`` (default **n**)
> > +
> > +  Embed static flavor of the dependencies **libibverbs** and
> > + **libmlx5**  in the PMD shared library or the executable static binary.
> > +
> > +.. note::
> > +
> > +   For BlueField, target should be set to ``arm64-bluefield-linux-gcc``. This
> > +   will enable ``CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD`` and set
> > +   ``RTE_CACHE_LINE_SIZE`` to 64. Default armv8a configuration of make
> build and
> > +   meson build set it to 128 then brings performance degradation.
> > +
> > +Run-time configuration
> > +~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +- **ethtool** operations on related kernel interfaces also affect the
> PMD.
> > +
> > +- ``class`` parameter [string]
> > +
> > +  Select the class of the driver that should probe the device.
> > +  `vdpa` for the mlx5 vDPA driver.
> > +
> > diff --git a/drivers/common/Makefile b/drivers/common/Makefile index
> > 4775d4b..96bd7ac 100644
> > --- a/drivers/common/Makefile
> > +++ b/drivers/common/Makefile
> > @@ -35,7 +35,7 @@ ifneq (,$(findstring y,$(IAVF-y)))
> >  DIRS-y += iavf
> >  endif
> >
> > -ifeq ($(CONFIG_RTE_LIBRTE_MLX5_PMD),y)
> > +ifeq ($(findstring y,$(CONFIG_RTE_LIBRTE_MLX5_PMD)$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD)),y)
> >  DIRS-y += mlx5
> >  endif
> >
> > diff --git a/drivers/common/mlx5/Makefile
> > b/drivers/common/mlx5/Makefile index 9d4d81f..c4b7999 100644
> > --- a/drivers/common/mlx5/Makefile
> > +++ b/drivers/common/mlx5/Makefile
> > @@ -10,15 +10,16 @@ LIB_GLUE_BASE = librte_pmd_mlx5_glue.so
> >  LIB_GLUE_VERSION = 20.02.0
> >
> >  # Sources.
> > +ifeq ($(findstring y,$(CONFIG_RTE_LIBRTE_MLX5_PMD)$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD)),y)
> >  ifneq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
> > -SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_glue.c
> > +SRCS-y += mlx5_glue.c
> >  endif
> > -SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_devx_cmds.c
> > -SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_common.c
> > -SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_nl.c
> > -
> > +SRCS-y += mlx5_devx_cmds.c
> > +SRCS-y += mlx5_common.c
> > +SRCS-y += mlx5_nl.c
> >  ifeq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
> > -INSTALL-$(CONFIG_RTE_LIBRTE_MLX5_PMD)-lib += $(LIB_GLUE)
> > +INSTALL-y-lib += $(LIB_GLUE)
> > +endif
> >  endif
> >
> >  # Basic CFLAGS.
> > @@ -317,7 +318,9 @@ mlx5_autoconf.h: mlx5_autoconf.h.new
> >  		cmp '$<' '$@' $(AUTOCONF_OUTPUT) || \
> >  		mv '$<' '$@'
> >
> > -$(SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD):.c=.o): mlx5_autoconf.h
> > +ifeq ($(findstring y,$(CONFIG_RTE_LIBRTE_MLX5_PMD)$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD)),y)
> > +$(SRCS-y:.c=.o): mlx5_autoconf.h
> > +endif
> >
> >  # Generate dependency plug-in for rdma-core when the PMD must not be linked
> >  # directly, so that applications do not inherit this dependency.
> > diff --git a/drivers/meson.build b/drivers/meson.build
> > index 29708cc..bd154fa 100644
> > --- a/drivers/meson.build
> > +++ b/drivers/meson.build
> > @@ -42,6 +42,7 @@ foreach class:dpdk_driver_classes
> >  		build = true # set to false to disable, e.g. missing deps
> >  		reason = '<unknown reason>' # set if build == false to explain
> >  		name = drv
> > +		fmt_name = ''
> >  		allow_experimental_apis = false
> >  		sources = []
> >  		objs = []
> > @@ -98,8 +99,11 @@ foreach class:dpdk_driver_classes
> >  		else
> >  			class_drivers += name
> >
> > -			dpdk_conf.set(config_flag_fmt.format(name.to_upper()),1)
> > -			lib_name = driver_name_fmt.format(name)
> > +			if fmt_name == ''
> > +				fmt_name = name
> > +			endif
> > +			dpdk_conf.set(config_flag_fmt.format(fmt_name.to_upper()),1)
> > +			lib_name = driver_name_fmt.format(fmt_name)
> >
> >  			if allow_experimental_apis
> >  				cflags += '-DALLOW_EXPERIMENTAL_API'
> > diff --git a/drivers/vdpa/Makefile b/drivers/vdpa/Makefile index
> > b5a7a11..6e88359 100644
> > --- a/drivers/vdpa/Makefile
> > +++ b/drivers/vdpa/Makefile
> > @@ -7,4 +7,6 @@ ifeq ($(CONFIG_RTE_EAL_VFIO),y)
> >  DIRS-$(CONFIG_RTE_LIBRTE_IFC_PMD) += ifc
> >  endif
> >
> > +DIRS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5
> > +
> >  include $(RTE_SDK)/mk/rte.subdir.mk
> > diff --git a/drivers/vdpa/meson.build b/drivers/vdpa/meson.build index
> > 2f047b5..e3ed54a 100644
> > --- a/drivers/vdpa/meson.build
> > +++ b/drivers/vdpa/meson.build
> > @@ -1,7 +1,8 @@
> >  # SPDX-License-Identifier: BSD-3-Clause
> >  # Copyright 2019 Mellanox Technologies, Ltd
> >
> > -drivers = ['ifc']
> > +drivers = ['ifc',
> > +	   'mlx5',]
> >  std_deps = ['bus_pci', 'kvargs']
> >  std_deps += ['vhost']
> >  config_flag_fmt = 'RTE_LIBRTE_@0@_PMD'
> > diff --git a/drivers/vdpa/mlx5/Makefile b/drivers/vdpa/mlx5/Makefile
> > new file mode 100644 index 0000000..c1c8cc0
> > --- /dev/null
> > +++ b/drivers/vdpa/mlx5/Makefile
> > @@ -0,0 +1,36 @@
> > +#   SPDX-License-Identifier: BSD-3-Clause
> > +#   Copyright 2019 Mellanox Technologies, Ltd
> > +
> > +include $(RTE_SDK)/mk/rte.vars.mk
> > +
> > +# Library name.
> > +LIB = librte_pmd_mlx5_vdpa.a
> > +
> > +# Sources.
> > +SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa.c
> > +
> > +# Basic CFLAGS.
> > +CFLAGS += -O3
> > +CFLAGS += -std=c11 -Wall -Wextra
> > +CFLAGS += -g
> > +CFLAGS += -I$(RTE_SDK)/drivers/common/mlx5
> > +CFLAGS += -I$(RTE_SDK)/drivers/net/mlx5_vdpa
> > +CFLAGS += -I$(BUILDDIR)/drivers/common/mlx5
> > +CFLAGS += -D_BSD_SOURCE
> > +CFLAGS += -D_DEFAULT_SOURCE
> > +CFLAGS += -D_XOPEN_SOURCE=600
> > +CFLAGS += $(WERROR_FLAGS)
> > +CFLAGS += -Wno-strict-prototypes
> > +LDLIBS += -lrte_common_mlx5
> > +LDLIBS += -lrte_eal -lrte_vhost -lrte_kvargs -lrte_bus_pci
> > +
> > +# A few warnings cannot be avoided in external headers.
> > +CFLAGS += -Wno-error=cast-qual
> > +
> > +EXPORT_MAP := rte_pmd_mlx5_vdpa_version.map
> > +# memseg walk is not part of stable API
> > +CFLAGS += -DALLOW_EXPERIMENTAL_API
> > +
> > +CFLAGS += -DNDEBUG -UPEDANTIC
> > +
> > +include $(RTE_SDK)/mk/rte.lib.mk
> > diff --git a/drivers/vdpa/mlx5/meson.build
> > b/drivers/vdpa/mlx5/meson.build new file mode 100644 index
> > 0000000..4bca6ea
> > --- /dev/null
> > +++ b/drivers/vdpa/mlx5/meson.build
> > @@ -0,0 +1,29 @@
> > +# SPDX-License-Identifier: BSD-3-Clause
> > +# Copyright 2019 Mellanox Technologies, Ltd
> > +
> > +if not is_linux
> > +	build = false
> > +	reason = 'only supported on Linux'
> > +	subdir_done()
> > +endif
> > +
> > +fmt_name = 'mlx5_vdpa'
> > +allow_experimental_apis = true
> > +deps += ['hash', 'common_mlx5', 'vhost', 'bus_pci', 'eal']
> > +sources = files(
> > +	'mlx5_vdpa.c',
> > +)
> > +cflags_options = [
> > +	'-std=c11',
> > +	'-Wno-strict-prototypes',
> > +	'-D_BSD_SOURCE',
> > +	'-D_DEFAULT_SOURCE',
> > +	'-D_XOPEN_SOURCE=600'
> > +]
> > +foreach option:cflags_options
> > +	if cc.has_argument(option)
> > +		cflags += option
> > +	endif
> > +endforeach
> > +
> > +cflags += [ '-DNDEBUG', '-UPEDANTIC' ]
> > diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c
> > b/drivers/vdpa/mlx5/mlx5_vdpa.c new file mode 100644 index
> > 0000000..6286d7a
> > --- /dev/null
> > +++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
> > @@ -0,0 +1,227 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright 2019 Mellanox Technologies, Ltd
> > + */
> > +#include <rte_malloc.h>
> > +#include <rte_log.h>
> > +#include <rte_errno.h>
> > +#include <rte_bus_pci.h>
> > +#include <rte_vdpa.h>
> > +
> > +#include <mlx5_glue.h>
> > +#include <mlx5_common.h>
> > +
> > +#include "mlx5_vdpa_utils.h"
> > +
> > +
> > +struct mlx5_vdpa_priv {
> > +	TAILQ_ENTRY(mlx5_vdpa_priv) next;
> > +	int id; /* vDPA device id. */
> > +	struct ibv_context *ctx; /* Device context. */
> > +	struct rte_vdpa_dev_addr dev_addr;
> > +};
> > +
> > +TAILQ_HEAD(mlx5_vdpa_privs, mlx5_vdpa_priv) priv_list =
> > +					      TAILQ_HEAD_INITIALIZER(priv_list);
> > +static pthread_mutex_t priv_list_lock = PTHREAD_MUTEX_INITIALIZER;
> > +int mlx5_vdpa_logtype;
> > +
> > +static struct rte_vdpa_dev_ops mlx5_vdpa_ops = {
> > +	.get_queue_num = NULL,
> > +	.get_features = NULL,
> > +	.get_protocol_features = NULL,
> > +	.dev_conf = NULL,
> > +	.dev_close = NULL,
> > +	.set_vring_state = NULL,
> > +	.set_features = NULL,
> > +	.migration_done = NULL,
> > +	.get_vfio_group_fd = NULL,
> > +	.get_vfio_device_fd = NULL,
> > +	.get_notify_area = NULL,
> > +};
> > +
> > +/**
> > + * DPDK callback to register a PCI device.
> > + *
> > + * This function spawns vdpa device out of a given PCI device.
> > + *
> > + * @param[in] pci_drv
> > + *   PCI driver structure (mlx5_vpda_driver).
> > + * @param[in] pci_dev
> > + *   PCI device information.
> > + *
> > + * @return
> > + *   0 on success, a negative errno value otherwise and rte_errno is set.
> > + */
> > +static int
> > +mlx5_vdpa_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
> > +		    struct rte_pci_device *pci_dev __rte_unused)
> > +{
> > +	struct ibv_device **ibv_list;
> > +	struct ibv_device *ibv_match = NULL;
> > +	struct mlx5_vdpa_priv *priv = NULL;
> > +	struct ibv_context *ctx = NULL;
> > +	int ret;
> > +
> > +	if (mlx5_class_get(pci_dev->device.devargs) != MLX5_CLASS_VDPA) {
> > +		DRV_LOG(DEBUG, "Skip probing - should be probed by other mlx5"
> > +			" driver.");
> > +		return 1;
> 
> Maybe the function doc should be updated to cover the return 1 case:
> 
> * @return
> *   0 on success, a negative errno value otherwise and rte_errno is set.
> 
Sure, will fix it.
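For illustration only, the three outcomes discussed here (0 on success,
1 when the device is left to another mlx5 driver, a negative errno value
on failure) could be sketched as below. This is a standalone
approximation, not the driver code: probe_sketch(), class_is_vdpa() and
the plain string parsing are hypothetical stand-ins for
mlx5_vdpa_pci_probe() and mlx5_class_get(), and the -ENODEV branch is
only a placeholder for any failing probe step.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for mlx5_class_get(): report whether a
 * comma-separated devargs string carries the pair "class=vdpa".
 */
static bool
class_is_vdpa(const char *args)
{
	static const char key[] = "class=";
	const size_t klen = sizeof(key) - 1;
	const char *p = args;

	while (p != NULL && *p != '\0') {
		/* Length of the current comma-delimited token. */
		size_t len = strcspn(p, ",");

		if (len > klen && strncmp(p, key, klen) == 0)
			return len - klen == 4 &&
			       strncmp(p + klen, "vdpa", 4) == 0;
		p += len;
		if (*p == ',')
			p++;
	}
	return false;
}

/*
 * Sketch of the return contract:
 *   0 on success,
 *   1 when probing is skipped (device belongs to another mlx5 driver),
 *   a negative errno value otherwise.
 */
static int
probe_sketch(const char *args, bool device_found)
{
	if (!class_is_vdpa(args))
		return 1;
	if (!device_found)
		return -ENODEV; /* Placeholder for any failing probe step. */
	return 0;
}
```

With this convention a caller can tell "not mine" (positive) apart from
"mine, but failed" (negative), which is what the updated @return text
needs to spell out.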
> > +	}
> > +	errno = 0;
> > +	ibv_list = mlx5_glue->get_device_list(&ret);
> > +	if (!ibv_list) {
> > +		rte_errno = errno;
> > +		DRV_LOG(ERR, "Failed to get device list, is ib_uverbs loaded?");
> > +		return -ENOSYS;
> 
> Shouldn't this return -rte_errno, to be consistent with the rest of the
> function? For the sake of consistency, you could also goto error instead.
OK, will keep it consistent.
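A minimal sketch of such a single-exit error path, assuming every
failing step records its reason in an errno-like variable and jumps to
one label that releases whatever was acquired and returns the negated
value. sketch_errno, open_ctx(), close_ctx() and
probe_with_single_exit() are hypothetical stand-ins for rte_errno, the
glue calls and the probe function; this is not the v3 code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for rte_errno. */
static int sketch_errno;
/* Counts live contexts so that leaks on the error path are visible. */
static int open_contexts;

struct ctx {
	int unused;
};

static struct ctx *
open_ctx(bool ok)
{
	struct ctx *c = ok ? malloc(sizeof(*c)) : NULL;

	if (c != NULL)
		open_contexts++;
	return c;
}

static void
close_ctx(struct ctx *c)
{
	free(c);
	open_contexts--;
}

static int
probe_with_single_exit(bool list_ok, bool open_ok, bool alloc_ok)
{
	struct ctx *ctx = NULL;
	void *priv = NULL;

	if (!list_ok) {
		sketch_errno = ENOSYS;
		goto error;
	}
	ctx = open_ctx(open_ok);
	if (ctx == NULL) {
		sketch_errno = ENODEV;
		goto error;
	}
	priv = alloc_ok ? calloc(1, 64) : NULL;
	if (priv == NULL) {
		sketch_errno = ENOMEM;
		goto error;
	}
	/* A real driver would keep both; the sketch releases them. */
	free(priv);
	close_ctx(ctx);
	return 0;
error:
	/* Single exit: undo whatever was acquired, return -errno. */
	free(priv);
	if (ctx != NULL)
		close_ctx(ctx);
	return -sketch_errno;
}
```

Every failure then returns the negated recorded value, and a step added
later only has to set the variable and jump to the label.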
> 
> > +	}
> > +	while (ret-- > 0) {
> > +		struct rte_pci_addr pci_addr;
> > +
> > +		DRV_LOG(DEBUG, "Checking device \"%s\"..", ibv_list[ret]->name);
> > +		if (mlx5_dev_to_pci_addr(ibv_list[ret]->ibdev_path, &pci_addr))
> > +			continue;
> > +		if (pci_dev->addr.domain != pci_addr.domain ||
> > +		    pci_dev->addr.bus != pci_addr.bus ||
> > +		    pci_dev->addr.devid != pci_addr.devid ||
> > +		    pci_dev->addr.function != pci_addr.function)
> > +			continue;
> > +		DRV_LOG(INFO, "PCI information matches for device \"%s\".",
> > +			ibv_list[ret]->name);
> > +		ibv_match = ibv_list[ret];
> > +		break;
> > +	}
> > +	mlx5_glue->free_device_list(ibv_list);
> > +	if (!ibv_match) {
> > +		DRV_LOG(ERR, "No matching IB device for PCI slot "
> > +			"%" SCNx32 ":%" SCNx8 ":%" SCNx8 ".%" SCNx8 ".",
> > +			pci_dev->addr.domain, pci_dev->addr.bus,
> > +			pci_dev->addr.devid, pci_dev->addr.function);
> > +		rte_errno = ENOENT;
> > +		return -rte_errno;
> > +	}
> > +	ctx = mlx5_glue->dv_open_device(ibv_match);
> > +	if (!ctx) {
> > +		DRV_LOG(ERR, "Failed to open IB device \"%s\".",
> > +			ibv_match->name);
> > +		rte_errno = ENODEV;
> > +		return -rte_errno;
> > +	}
> > +	priv = rte_zmalloc("mlx5 vDPA device private", sizeof(*priv),
> > +			   RTE_CACHE_LINE_SIZE);
> > +	if (!priv) {
> > +		DRV_LOG(ERR, "Failed to allocate private memory.");
> > +		rte_errno = ENOMEM;
> > +		goto error;
> > +	}
> > +	priv->ctx = ctx;
> > +	priv->dev_addr.pci_addr = pci_dev->addr;
> > +	priv->dev_addr.type = PCI_ADDR;
> > +	priv->id = rte_vdpa_register_device(&priv->dev_addr, &mlx5_vdpa_ops);
> > +	if (priv->id < 0) {
> > +		DRV_LOG(ERR, "Failed to register vDPA device.");
> > +		rte_errno = rte_errno ? rte_errno : EINVAL;
> > +		goto error;
> > +	}
> > +	pthread_mutex_lock(&priv_list_lock);
> > +	TAILQ_INSERT_TAIL(&priv_list, priv, next);
> > +	pthread_mutex_unlock(&priv_list_lock);
> > +	return 0;
> > +
> > +error:
> > +	if (priv)
> > +		rte_free(priv);
> > +	if (ctx)
> > +		mlx5_glue->close_device(ctx);
> > +	return -rte_errno;
> > +}
> > +
> 
> These are minor comments.
> If directly fixed in v3, feel free to add my:
> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> 
> Thanks,
> Maxime
* [dpdk-dev] [PATCH v3 00/13] Introduce mlx5 vDPA driver
  2020-01-29 10:08 ` [dpdk-dev] [PATCH v2 00/13] Introduce mlx5 vDPA driver Matan Azrad
                     ` (12 preceding siblings ...)
  2020-01-29 10:09   ` [dpdk-dev] [PATCH v2 13/13] vdpa/mlx5: disable ROCE Matan Azrad
@ 2020-02-02 16:03   ` Matan Azrad
  2020-02-02 16:03     ` [dpdk-dev] [PATCH v3 01/13] drivers: introduce " Matan Azrad
                       ` (14 more replies)
  2020-02-03 13:24   ` [dpdk-dev] [PATCH v2 " Maxime Coquelin
  14 siblings, 15 replies; 174+ messages in thread
From: Matan Azrad @ 2020-02-02 16:03 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Maxime Coquelin
v2:
- Reorder the patches into 2 series; this is the second part of the split of the previous series.
- Fix spelling and per-patch compilation issues.
- Move to claim_zero instead of pure asserts.
v3:
- Address Maxime C's comments.
- Adjust to the recent introduction of MLX5_ASSERT.

Matan Azrad (13):
  drivers: introduce mlx5 vDPA driver
  vdpa/mlx5: support queues number operation
  vdpa/mlx5: support features get operations
  vdpa/mlx5: prepare memory regions
  vdpa/mlx5: prepare HW queues
  vdpa/mlx5: prepare virtio queues
  vdpa/mlx5: support stateless offloads
  vdpa/mlx5: add basic steering configurations
  vdpa/mlx5: support queue state operation
  vdpa/mlx5: map doorbell
  vdpa/mlx5: support live migration
  vdpa/mlx5: support close and config operations
  vdpa/mlx5: disable ROCE
 MAINTAINERS                                     |   7 +
 config/common_base                              |   5 +
 doc/guides/rel_notes/release_20_02.rst          |   5 +
 doc/guides/vdpadevs/features/mlx5.ini           |  27 ++
 doc/guides/vdpadevs/index.rst                   |   1 +
 doc/guides/vdpadevs/mlx5.rst                    | 111 +++++
 drivers/common/Makefile                         |   2 +-
 drivers/common/mlx5/Makefile                    |  17 +-
 drivers/common/mlx5/mlx5_prm.h                  |   4 +
 drivers/meson.build                             |   8 +-
 drivers/vdpa/Makefile                           |   2 +
 drivers/vdpa/meson.build                        |   3 +-
 drivers/vdpa/mlx5/Makefile                      |  58 +++
 drivers/vdpa/mlx5/meson.build                   |  38 ++
 drivers/vdpa/mlx5/mlx5_vdpa.c                   | 563 ++++++++++++++++++++++++
 drivers/vdpa/mlx5/mlx5_vdpa.h                   | 309 +++++++++++++
 drivers/vdpa/mlx5/mlx5_vdpa_event.c             | 400 +++++++++++++++++
 drivers/vdpa/mlx5/mlx5_vdpa_lm.c                | 129 ++++++
 drivers/vdpa/mlx5/mlx5_vdpa_mem.c               | 346 +++++++++++++++
 drivers/vdpa/mlx5/mlx5_vdpa_steer.c             | 283 ++++++++++++
 drivers/vdpa/mlx5/mlx5_vdpa_utils.h             |  20 +
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c             | 388 ++++++++++++++++
 drivers/vdpa/mlx5/rte_pmd_mlx5_vdpa_version.map |   3 +
 mk/rte.app.mk                                   |  15 +-
 24 files changed, 2727 insertions(+), 17 deletions(-)
 create mode 100644 doc/guides/vdpadevs/features/mlx5.ini
 create mode 100644 doc/guides/vdpadevs/mlx5.rst
 create mode 100644 drivers/vdpa/mlx5/Makefile
 create mode 100644 drivers/vdpa/mlx5/meson.build
 create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa.c
 create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa.h
 create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_event.c
 create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_lm.c
 create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_mem.c
 create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_steer.c
 create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_utils.h
 create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
 create mode 100644 drivers/vdpa/mlx5/rte_pmd_mlx5_vdpa_version.map
-- 
1.8.3.1
* [dpdk-dev] [PATCH v3 01/13] drivers: introduce mlx5 vDPA driver
  2020-02-02 16:03   ` [dpdk-dev] [PATCH v3 00/13] Introduce mlx5 vDPA driver Matan Azrad
@ 2020-02-02 16:03     ` Matan Azrad
  2020-02-02 16:03     ` [dpdk-dev] [PATCH v3 02/13] vdpa/mlx5: support queues number operation Matan Azrad
                       ` (13 subsequent siblings)
  14 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-02-02 16:03 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Maxime Coquelin
Add a new driver to support vDPA operations on Mellanox devices.
The first Mellanox devices that support vDPA operations are the
ConnectX6DX and Bluefield1 HCAs, on both their PF and VF ports.
Like the mlx5 PMD, this driver depends on rdma-core; it also uses
mlx5 DevX to create HW objects directly through the FW.
Hence, the common/mlx5 library is linked to the mlx5_vdpa driver.
Because of these dependencies, this driver is not compiled by default.
Register a new log type for this driver.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 MAINTAINERS                                     |   7 +
 config/common_base                              |   5 +
 doc/guides/rel_notes/release_20_02.rst          |   5 +
 doc/guides/vdpadevs/features/mlx5.ini           |  14 ++
 doc/guides/vdpadevs/index.rst                   |   1 +
 doc/guides/vdpadevs/mlx5.rst                    | 111 +++++++++++
 drivers/common/Makefile                         |   2 +-
 drivers/common/mlx5/Makefile                    |  17 +-
 drivers/meson.build                             |   8 +-
 drivers/vdpa/Makefile                           |   2 +
 drivers/vdpa/meson.build                        |   3 +-
 drivers/vdpa/mlx5/Makefile                      |  51 ++++++
 drivers/vdpa/mlx5/meson.build                   |  33 ++++
 drivers/vdpa/mlx5/mlx5_vdpa.c                   | 234 ++++++++++++++++++++++++
 drivers/vdpa/mlx5/mlx5_vdpa_utils.h             |  20 ++
 drivers/vdpa/mlx5/rte_pmd_mlx5_vdpa_version.map |   3 +
 mk/rte.app.mk                                   |  15 +-
 17 files changed, 514 insertions(+), 17 deletions(-)
 create mode 100644 doc/guides/vdpadevs/features/mlx5.ini
 create mode 100644 doc/guides/vdpadevs/mlx5.rst
 create mode 100644 drivers/vdpa/mlx5/Makefile
 create mode 100644 drivers/vdpa/mlx5/meson.build
 create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa.c
 create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_utils.h
 create mode 100644 drivers/vdpa/mlx5/rte_pmd_mlx5_vdpa_version.map
diff --git a/MAINTAINERS b/MAINTAINERS
index 150d507..f697e9a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1103,6 +1103,13 @@ F: drivers/vdpa/ifc/
 F: doc/guides/vdpadevs/ifc.rst
 F: doc/guides/vdpadevs/features/ifcvf.ini
 
+Mellanox mlx5 vDPA
+M: Matan Azrad <matan@mellanox.com>
+M: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
+F: drivers/vdpa/mlx5/
+F: doc/guides/vdpadevs/mlx5.rst
+F: doc/guides/vdpadevs/features/mlx5.ini
+
 
 Eventdev Drivers
 ----------------
diff --git a/config/common_base b/config/common_base
index c897dd0..6ea9c63 100644
--- a/config/common_base
+++ b/config/common_base
@@ -366,6 +366,11 @@ CONFIG_RTE_LIBRTE_MLX4_DEBUG=n
 CONFIG_RTE_LIBRTE_MLX5_PMD=n
 CONFIG_RTE_LIBRTE_MLX5_DEBUG=n
 
+#
+# Compile vdpa-oriented Mellanox ConnectX-6 & Bluefield (MLX5) PMD
+#
+CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD=n
+
 # Linking method for mlx4/5 dependency on ibverbs and related libraries
 # Default linking is dynamic by linker.
 # Other options are: dynamic by dlopen at run-time, or statically embedded.
diff --git a/doc/guides/rel_notes/release_20_02.rst b/doc/guides/rel_notes/release_20_02.rst
index 50e2c14..690e7db 100644
--- a/doc/guides/rel_notes/release_20_02.rst
+++ b/doc/guides/rel_notes/release_20_02.rst
@@ -113,6 +113,11 @@ New Features
   * Added support for RSS using L3/L4 source/destination only.
   * Added support for matching on GTP tunnel header item.
 
+* **Add new vDPA PMD based on Mellanox devices**
+
+  Added a new Mellanox vDPA (``mlx5_vdpa``) PMD.
+  See the :doc:`../vdpadevs/mlx5` guide for more details on this driver.
+
 * **Updated testpmd application.**
 
   Added support for ESP and L2TPv3 over IP rte_flow patterns to the testpmd
diff --git a/doc/guides/vdpadevs/features/mlx5.ini b/doc/guides/vdpadevs/features/mlx5.ini
new file mode 100644
index 0000000..d635bdf
--- /dev/null
+++ b/doc/guides/vdpadevs/features/mlx5.ini
@@ -0,0 +1,14 @@
+;
+; Supported features of the 'mlx5' VDPA driver.
+;
+; Refer to default.ini for the full list of available driver features.
+;
+[Features]
+Other kdrv           = Y
+ARMv8                = Y
+Power8               = Y
+x86-32               = Y
+x86-64               = Y
+Usage doc            = Y
+Design doc           = Y
+
diff --git a/doc/guides/vdpadevs/index.rst b/doc/guides/vdpadevs/index.rst
index 9657108..1a13efe 100644
--- a/doc/guides/vdpadevs/index.rst
+++ b/doc/guides/vdpadevs/index.rst
@@ -13,3 +13,4 @@ which can be used from an application through vhost API.
 
     features_overview
     ifc
+    mlx5
diff --git a/doc/guides/vdpadevs/mlx5.rst b/doc/guides/vdpadevs/mlx5.rst
new file mode 100644
index 0000000..ce7c8a7
--- /dev/null
+++ b/doc/guides/vdpadevs/mlx5.rst
@@ -0,0 +1,111 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright 2019 Mellanox Technologies, Ltd
+
+MLX5 vDPA driver
+================
+
+The MLX5 vDPA (vhost data path acceleration) driver library
+(**librte_pmd_mlx5_vdpa**) provides support for **Mellanox ConnectX-6**,
+**Mellanox ConnectX-6DX** and **Mellanox BlueField** families of
+10/25/40/50/100/200 Gb/s adapters as well as their virtual functions (VF) in
+SR-IOV context.
+
+.. note::
+
+   Due to external dependencies, this driver is disabled in default
+   configuration of the "make" build. It can be enabled with
+   ``CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD=y`` or by using "meson" build system which
+   will detect dependencies.
+
+
+Design
+------
+
+For security reasons and robustness, this driver only deals with virtual
+memory addresses. The way resources allocations are handled by the kernel,
+combined with hardware specifications that allow to handle virtual memory
+addresses directly, ensure that DPDK applications cannot access random
+physical memory (or memory that does not belong to the current process).
+
+The PMD can use libibverbs and libmlx5 to access the device firmware
+or directly the hardware components.
+There are different levels of objects and bypassing abilities
+to get the best performances:
+
+- Verbs is a complete high-level generic API
+- Direct Verbs is a device-specific API
+- DevX allows to access firmware objects
+- Direct Rules manages flow steering at low-level hardware layer
+
+Enabling librte_pmd_mlx5_vdpa causes DPDK applications to be linked against
+libibverbs.
+
+A Mellanox mlx5 PCI device can be probed by either net/mlx5 driver or vdpa/mlx5
+driver but not in parallel. Hence, the user should decide the driver by the
+``class`` parameter in the device argument list.
+By default, the mlx5 device will be probed by the net/mlx5 driver.
+
+Supported NICs
+--------------
+
+* Mellanox(R) ConnectX(R)-6 200G MCX654106A-HCAT (4x200G)
+* Mellanox(R) ConnectX(R)-6DX EN 100G MCX623106AN-CDAT (2*100G)
+* Mellanox(R) ConnectX(R)-6DX EN 200G MCX623105AN-VDAT (1*200G)
+* Mellanox(R) BlueField SmartNIC 25G MBF1M332A-ASCAT (2*25G)
+
+Prerequisites
+-------------
+
+- Mellanox OFED version: **4.7**
+  see :doc:`../../nics/mlx5` guide for more Mellanox OFED details.
+
+Compilation options
+~~~~~~~~~~~~~~~~~~~
+
+These options can be modified in the ``.config`` file.
+
+- ``CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD`` (default **n**)
+
+  Toggle compilation of librte_pmd_mlx5_vdpa itself.
+
+- ``CONFIG_RTE_IBVERBS_LINK_DLOPEN`` (default **n**)
+
+  Build PMD with additional code to make it loadable without hard
+  dependencies on **libibverbs** nor **libmlx5**, which may not be installed
+  on the target system.
+
+  In this mode, their presence is still required for it to run properly,
+  however their absence won't prevent a DPDK application from starting (with
+  ``CONFIG_RTE_BUILD_SHARED_LIB`` disabled) and they won't show up as
+  missing with ``ldd(1)``.
+
+  It works by moving these dependencies to a purpose-built rdma-core "glue"
+  plug-in which must either be installed in a directory whose name is based
+  on ``CONFIG_RTE_EAL_PMD_PATH`` suffixed with ``-glue`` if set, or in a
+  standard location for the dynamic linker (e.g. ``/lib``) if left to the
+  default empty string (``""``).
+
+  This option has no performance impact.
+
+- ``CONFIG_RTE_IBVERBS_LINK_STATIC`` (default **n**)
+
+  Embed static flavor of the dependencies **libibverbs** and **libmlx5**
+  in the PMD shared library or the executable static binary.
+
+.. note::
+
+   For BlueField, target should be set to ``arm64-bluefield-linux-gcc``. This
+   will enable ``CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD`` and set
+   ``RTE_CACHE_LINE_SIZE`` to 64. The default armv8a configuration of the make
+   and meson builds sets it to 128, which degrades performance.
+
+Run-time configuration
+~~~~~~~~~~~~~~~~~~~~~~
+
+- **ethtool** operations on related kernel interfaces also affect the PMD.
+
+- ``class`` parameter [string]
+
+  Select the class of the driver that should probe the device.
+  Use ``vdpa`` to select the mlx5 vDPA driver.
+
diff --git a/drivers/common/Makefile b/drivers/common/Makefile
index 4775d4b..96bd7ac 100644
--- a/drivers/common/Makefile
+++ b/drivers/common/Makefile
@@ -35,7 +35,7 @@ ifneq (,$(findstring y,$(IAVF-y)))
 DIRS-y += iavf
 endif
 
-ifeq ($(CONFIG_RTE_LIBRTE_MLX5_PMD),y)
+ifeq ($(findstring y,$(CONFIG_RTE_LIBRTE_MLX5_PMD)$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD)),y)
 DIRS-y += mlx5
 endif
 
diff --git a/drivers/common/mlx5/Makefile b/drivers/common/mlx5/Makefile
index 624d331..f32933d 100644
--- a/drivers/common/mlx5/Makefile
+++ b/drivers/common/mlx5/Makefile
@@ -10,15 +10,16 @@ LIB_GLUE_BASE = librte_pmd_mlx5_glue.so
 LIB_GLUE_VERSION = 20.02.0
 
 # Sources.
+ifeq ($(findstring y,$(CONFIG_RTE_LIBRTE_MLX5_PMD)$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD)),y)
 ifneq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
-SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_glue.c
+SRCS-y += mlx5_glue.c
 endif
-SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_devx_cmds.c
-SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_common.c
-SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_nl.c
-
+SRCS-y += mlx5_devx_cmds.c
+SRCS-y += mlx5_common.c
+SRCS-y += mlx5_nl.c
 ifeq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
-INSTALL-$(CONFIG_RTE_LIBRTE_MLX5_PMD)-lib += $(LIB_GLUE)
+INSTALL-y-lib += $(LIB_GLUE)
+endif
 endif
 
 # Basic CFLAGS.
@@ -317,7 +318,9 @@ mlx5_autoconf.h: mlx5_autoconf.h.new
 		cmp '$<' '$@' $(AUTOCONF_OUTPUT) || \
 		mv '$<' '$@'
 
-$(SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD):.c=.o): mlx5_autoconf.h
+ifeq ($(findstring y,$(CONFIG_RTE_LIBRTE_MLX5_PMD)$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD)),y)
+$(SRCS-y:.c=.o): mlx5_autoconf.h
+endif
 
 # Generate dependency plug-in for rdma-core when the PMD must not be linked
 # directly, so that applications do not inherit this dependency.
diff --git a/drivers/meson.build b/drivers/meson.build
index 29708cc..bd154fa 100644
--- a/drivers/meson.build
+++ b/drivers/meson.build
@@ -42,6 +42,7 @@ foreach class:dpdk_driver_classes
 		build = true # set to false to disable, e.g. missing deps
 		reason = '<unknown reason>' # set if build == false to explain
 		name = drv
+		fmt_name = ''
 		allow_experimental_apis = false
 		sources = []
 		objs = []
@@ -98,8 +99,11 @@ foreach class:dpdk_driver_classes
 		else
 			class_drivers += name
 
-			dpdk_conf.set(config_flag_fmt.format(name.to_upper()),1)
-			lib_name = driver_name_fmt.format(name)
+			if fmt_name == ''
+				fmt_name = name
+			endif
+			dpdk_conf.set(config_flag_fmt.format(fmt_name.to_upper()),1)
+			lib_name = driver_name_fmt.format(fmt_name)
 
 			if allow_experimental_apis
 				cflags += '-DALLOW_EXPERIMENTAL_API'
diff --git a/drivers/vdpa/Makefile b/drivers/vdpa/Makefile
index b5a7a11..6e88359 100644
--- a/drivers/vdpa/Makefile
+++ b/drivers/vdpa/Makefile
@@ -7,4 +7,6 @@ ifeq ($(CONFIG_RTE_EAL_VFIO),y)
 DIRS-$(CONFIG_RTE_LIBRTE_IFC_PMD) += ifc
 endif
 
+DIRS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5
+
 include $(RTE_SDK)/mk/rte.subdir.mk
diff --git a/drivers/vdpa/meson.build b/drivers/vdpa/meson.build
index 2f047b5..e3ed54a 100644
--- a/drivers/vdpa/meson.build
+++ b/drivers/vdpa/meson.build
@@ -1,7 +1,8 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright 2019 Mellanox Technologies, Ltd
 
-drivers = ['ifc']
+drivers = ['ifc',
+	   'mlx5',]
 std_deps = ['bus_pci', 'kvargs']
 std_deps += ['vhost']
 config_flag_fmt = 'RTE_LIBRTE_@0@_PMD'
diff --git a/drivers/vdpa/mlx5/Makefile b/drivers/vdpa/mlx5/Makefile
new file mode 100644
index 0000000..1ab5296
--- /dev/null
+++ b/drivers/vdpa/mlx5/Makefile
@@ -0,0 +1,51 @@
+#   SPDX-License-Identifier: BSD-3-Clause
+#   Copyright 2019 Mellanox Technologies, Ltd
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# Library name.
+LIB = librte_pmd_mlx5_vdpa.a
+
+# Sources.
+SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa.c
+
+# Basic CFLAGS.
+CFLAGS += -O3
+CFLAGS += -std=c11 -Wall -Wextra
+CFLAGS += -g
+CFLAGS += -I$(RTE_SDK)/drivers/common/mlx5
+CFLAGS += -I$(RTE_SDK)/drivers/net/mlx5_vdpa
+CFLAGS += -I$(BUILDDIR)/drivers/common/mlx5
+CFLAGS += -D_BSD_SOURCE
+CFLAGS += -D_DEFAULT_SOURCE
+CFLAGS += -D_XOPEN_SOURCE=600
+CFLAGS += $(WERROR_FLAGS)
+CFLAGS += -Wno-strict-prototypes
+LDLIBS += -lrte_common_mlx5
+LDLIBS += -lrte_eal -lrte_vhost -lrte_kvargs -lrte_bus_pci
+
+# A few warnings cannot be avoided in external headers.
+CFLAGS += -Wno-error=cast-qual
+
+EXPORT_MAP := rte_pmd_mlx5_vdpa_version.map
+# memseg walk is not part of stable API
+CFLAGS += -DALLOW_EXPERIMENTAL_API
+
+# DEBUG which is usually provided on the command-line may enable
+# CONFIG_RTE_LIBRTE_MLX5_DEBUG.
+ifeq ($(DEBUG),1)
+CONFIG_RTE_LIBRTE_MLX5_DEBUG := y
+endif
+
+# User-defined CFLAGS.
+ifeq ($(CONFIG_RTE_LIBRTE_MLX5_DEBUG),y)
+CFLAGS += -pedantic
+ifneq ($(CONFIG_RTE_TOOLCHAIN_ICC),y)
+CFLAGS += -DPEDANTIC
+endif
+AUTO_CONFIG_CFLAGS += -Wno-pedantic
+else
+CFLAGS += -UPEDANTIC
+endif
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/vdpa/mlx5/meson.build b/drivers/vdpa/mlx5/meson.build
new file mode 100644
index 0000000..6d3ab98
--- /dev/null
+++ b/drivers/vdpa/mlx5/meson.build
@@ -0,0 +1,33 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2019 Mellanox Technologies, Ltd
+
+if not is_linux
+	build = false
+	reason = 'only supported on Linux'
+	subdir_done()
+endif
+
+fmt_name = 'mlx5_vdpa'
+allow_experimental_apis = true
+deps += ['hash', 'common_mlx5', 'vhost', 'bus_pci', 'eal']
+sources = files(
+	'mlx5_vdpa.c',
+)
+cflags_options = [
+	'-std=c11',
+	'-Wno-strict-prototypes',
+	'-D_BSD_SOURCE',
+	'-D_DEFAULT_SOURCE',
+	'-D_XOPEN_SOURCE=600'
+]
+foreach option:cflags_options
+	if cc.has_argument(option)
+		cflags += option
+	endif
+endforeach
+
+if get_option('buildtype').contains('debug')
+	cflags += [ '-pedantic', '-DPEDANTIC' ]
+else
+	cflags += [ '-UPEDANTIC' ]
+endif
\ No newline at end of file
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
new file mode 100644
index 0000000..80204b3
--- /dev/null
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -0,0 +1,234 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2019 Mellanox Technologies, Ltd
+ */
+#include <rte_malloc.h>
+#include <rte_log.h>
+#include <rte_errno.h>
+#include <rte_bus_pci.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-Wpedantic"
+#endif
+#include <rte_vdpa.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-Wpedantic"
+#endif
+
+#include <mlx5_glue.h>
+#include <mlx5_common.h>
+
+#include "mlx5_vdpa_utils.h"
+
+
+struct mlx5_vdpa_priv {
+	TAILQ_ENTRY(mlx5_vdpa_priv) next;
+	int id; /* vDPA device id. */
+	struct ibv_context *ctx; /* Device context. */
+	struct rte_vdpa_dev_addr dev_addr;
+};
+
+TAILQ_HEAD(mlx5_vdpa_privs, mlx5_vdpa_priv) priv_list =
+					      TAILQ_HEAD_INITIALIZER(priv_list);
+static pthread_mutex_t priv_list_lock = PTHREAD_MUTEX_INITIALIZER;
+int mlx5_vdpa_logtype;
+
+static struct rte_vdpa_dev_ops mlx5_vdpa_ops = {
+	.get_queue_num = NULL,
+	.get_features = NULL,
+	.get_protocol_features = NULL,
+	.dev_conf = NULL,
+	.dev_close = NULL,
+	.set_vring_state = NULL,
+	.set_features = NULL,
+	.migration_done = NULL,
+	.get_vfio_group_fd = NULL,
+	.get_vfio_device_fd = NULL,
+	.get_notify_area = NULL,
+};
+
+/**
+ * DPDK callback to register a PCI device.
+ *
+ * This function spawns vdpa device out of a given PCI device.
+ *
+ * @param[in] pci_drv
+ *   PCI driver structure (mlx5_vdpa_driver).
+ * @param[in] pci_dev
+ *   PCI device information.
+ *
+ * @return
+ *   0 on success, 1 to skip this driver, a negative errno value otherwise
+ *   and rte_errno is set.
+ */
+static int
+mlx5_vdpa_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
+		    struct rte_pci_device *pci_dev)
+{
+	struct ibv_device **ibv_list;
+	struct ibv_device *ibv_match = NULL;
+	struct mlx5_vdpa_priv *priv = NULL;
+	struct ibv_context *ctx = NULL;
+	int ret;
+
+	if (mlx5_class_get(pci_dev->device.devargs) != MLX5_CLASS_VDPA) {
+		DRV_LOG(DEBUG, "Skip probing - should be probed by other mlx5"
+			" driver.");
+		return 1;
+	}
+	errno = 0;
+	ibv_list = mlx5_glue->get_device_list(&ret);
+	if (!ibv_list) {
+		rte_errno = ENOSYS;
+		DRV_LOG(ERR, "Failed to get device list, is ib_uverbs loaded?");
+		return -rte_errno;
+	}
+	while (ret-- > 0) {
+		struct rte_pci_addr pci_addr;
+
+		DRV_LOG(DEBUG, "Checking device \"%s\"..", ibv_list[ret]->name);
+		if (mlx5_dev_to_pci_addr(ibv_list[ret]->ibdev_path, &pci_addr))
+			continue;
+		if (pci_dev->addr.domain != pci_addr.domain ||
+		    pci_dev->addr.bus != pci_addr.bus ||
+		    pci_dev->addr.devid != pci_addr.devid ||
+		    pci_dev->addr.function != pci_addr.function)
+			continue;
+		DRV_LOG(INFO, "PCI information matches for device \"%s\".",
+			ibv_list[ret]->name);
+		ibv_match = ibv_list[ret];
+		break;
+	}
+	mlx5_glue->free_device_list(ibv_list);
+	if (!ibv_match) {
+		DRV_LOG(ERR, "No matching IB device for PCI slot "
+			"%" SCNx32 ":%" SCNx8 ":%" SCNx8 ".%" SCNx8 ".",
+			pci_dev->addr.domain, pci_dev->addr.bus,
+			pci_dev->addr.devid, pci_dev->addr.function);
+		rte_errno = ENOENT;
+		return -rte_errno;
+	}
+	ctx = mlx5_glue->dv_open_device(ibv_match);
+	if (!ctx) {
+		DRV_LOG(ERR, "Failed to open IB device \"%s\".",
+			ibv_match->name);
+		rte_errno = ENODEV;
+		return -rte_errno;
+	}
+	priv = rte_zmalloc("mlx5 vDPA device private", sizeof(*priv),
+			   RTE_CACHE_LINE_SIZE);
+	if (!priv) {
+		DRV_LOG(ERR, "Failed to allocate private memory.");
+		rte_errno = ENOMEM;
+		goto error;
+	}
+	priv->ctx = ctx;
+	priv->dev_addr.pci_addr = pci_dev->addr;
+	priv->dev_addr.type = PCI_ADDR;
+	priv->id = rte_vdpa_register_device(&priv->dev_addr, &mlx5_vdpa_ops);
+	if (priv->id < 0) {
+		DRV_LOG(ERR, "Failed to register vDPA device.");
+		rte_errno = rte_errno ? rte_errno : EINVAL;
+		goto error;
+	}
+	pthread_mutex_lock(&priv_list_lock);
+	TAILQ_INSERT_TAIL(&priv_list, priv, next);
+	pthread_mutex_unlock(&priv_list_lock);
+	return 0;
+
+error:
+	if (priv)
+		rte_free(priv);
+	if (ctx)
+		mlx5_glue->close_device(ctx);
+	return -rte_errno;
+}
+
+/**
+ * DPDK callback to remove a PCI device.
+ *
+ * This function removes all vDPA devices belonging to a given PCI device.
+ *
+ * @param[in] pci_dev
+ *   Pointer to the PCI device.
+ *
+ * @return
+ *   0 on success, the function cannot fail.
+ */
+static int
+mlx5_vdpa_pci_remove(struct rte_pci_device *pci_dev)
+{
+	struct mlx5_vdpa_priv *priv = NULL;
+	int found = 0;
+
+	pthread_mutex_lock(&priv_list_lock);
+	TAILQ_FOREACH(priv, &priv_list, next) {
+		if (memcmp(&priv->dev_addr.pci_addr, &pci_dev->addr,
+			   sizeof(pci_dev->addr)) == 0) {
+			found = 1;
+			break;
+		}
+	}
+	if (found) {
+		TAILQ_REMOVE(&priv_list, priv, next);
+		mlx5_glue->close_device(priv->ctx);
+		rte_free(priv);
+	}
+	pthread_mutex_unlock(&priv_list_lock);
+	return 0;
+}
+
+static const struct rte_pci_id mlx5_vdpa_pci_id_map[] = {
+	{
+		RTE_PCI_DEVICE(PCI_VENDOR_ID_MELLANOX,
+			       PCI_DEVICE_ID_MELLANOX_CONNECTX5BF)
+	},
+	{
+		RTE_PCI_DEVICE(PCI_VENDOR_ID_MELLANOX,
+			       PCI_DEVICE_ID_MELLANOX_CONNECTX5BFVF)
+	},
+	{
+		RTE_PCI_DEVICE(PCI_VENDOR_ID_MELLANOX,
+				PCI_DEVICE_ID_MELLANOX_CONNECTX6)
+	},
+	{
+		RTE_PCI_DEVICE(PCI_VENDOR_ID_MELLANOX,
+				PCI_DEVICE_ID_MELLANOX_CONNECTX6VF)
+	},
+	{
+		RTE_PCI_DEVICE(PCI_VENDOR_ID_MELLANOX,
+				PCI_DEVICE_ID_MELLANOX_CONNECTX6DX)
+	},
+	{
+		RTE_PCI_DEVICE(PCI_VENDOR_ID_MELLANOX,
+				PCI_DEVICE_ID_MELLANOX_CONNECTX6DXVF)
+	},
+	{
+		.vendor_id = 0
+	}
+};
+
+static struct rte_pci_driver mlx5_vdpa_driver = {
+	.driver = {
+		.name = "mlx5_vdpa",
+	},
+	.id_table = mlx5_vdpa_pci_id_map,
+	.probe = mlx5_vdpa_pci_probe,
+	.remove = mlx5_vdpa_pci_remove,
+	.drv_flags = 0,
+};
+
+/**
+ * Driver initialization routine.
+ */
+RTE_INIT(rte_mlx5_vdpa_init)
+{
+	/* Initialize common log type. */
+	mlx5_vdpa_logtype = rte_log_register("pmd.vdpa.mlx5");
+	if (mlx5_vdpa_logtype >= 0)
+		rte_log_set_level(mlx5_vdpa_logtype, RTE_LOG_NOTICE);
+	if (mlx5_glue)
+		rte_pci_register(&mlx5_vdpa_driver);
+}
+
+RTE_PMD_EXPORT_NAME(net_mlx5_vdpa, __COUNTER__);
+RTE_PMD_REGISTER_PCI_TABLE(net_mlx5_vdpa, mlx5_vdpa_pci_id_map);
+RTE_PMD_REGISTER_KMOD_DEP(net_mlx5_vdpa, "* ib_uverbs & mlx5_core & mlx5_ib");
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_utils.h b/drivers/vdpa/mlx5/mlx5_vdpa_utils.h
new file mode 100644
index 0000000..a239df9
--- /dev/null
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_utils.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2019 Mellanox Technologies, Ltd
+ */
+
+#ifndef RTE_PMD_MLX5_VDPA_UTILS_H_
+#define RTE_PMD_MLX5_VDPA_UTILS_H_
+
+#include <mlx5_common.h>
+
+
+extern int mlx5_vdpa_logtype;
+
+#define MLX5_VDPA_LOG_PREFIX "mlx5_vdpa"
+/* Generic printf()-like logging macro with automatic line feed. */
+#define DRV_LOG(level, ...) \
+	PMD_DRV_LOG_(level, mlx5_vdpa_logtype, MLX5_VDPA_LOG_PREFIX, \
+		__VA_ARGS__ PMD_DRV_LOG_STRIP PMD_DRV_LOG_OPAREN, \
+		PMD_DRV_LOG_CPAREN)
+
+#endif /* RTE_PMD_MLX5_VDPA_UTILS_H_ */
diff --git a/drivers/vdpa/mlx5/rte_pmd_mlx5_vdpa_version.map b/drivers/vdpa/mlx5/rte_pmd_mlx5_vdpa_version.map
new file mode 100644
index 0000000..143836e
--- /dev/null
+++ b/drivers/vdpa/mlx5/rte_pmd_mlx5_vdpa_version.map
@@ -0,0 +1,3 @@
+DPDK_20.02 {
+	local: *;
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 45f4cad..b33cd8a 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -196,18 +196,21 @@ endif
 _LDLIBS-$(CONFIG_RTE_LIBRTE_LIO_PMD)        += -lrte_pmd_lio
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_MEMIF)      += -lrte_pmd_memif
 _LDLIBS-$(CONFIG_RTE_LIBRTE_MLX4_PMD)       += -lrte_pmd_mlx4
-_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX5_PMD)       += -lrte_common_mlx5
+ifeq ($(findstring y,$(CONFIG_RTE_LIBRTE_MLX5_PMD)$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD)),y)
+_LDLIBS-y                                   += -lrte_common_mlx5
+endif
 _LDLIBS-$(CONFIG_RTE_LIBRTE_MLX5_PMD)       += -lrte_pmd_mlx5
+_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD)  += -lrte_pmd_mlx5_vdpa
 ifeq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
-_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX4_PMD)       += -ldl
-_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX5_PMD)       += -ldl
+_LDLIBS-y                                   += -ldl
 else ifeq ($(CONFIG_RTE_IBVERBS_LINK_STATIC),y)
 LIBS_IBVERBS_STATIC = $(shell $(RTE_SDK)/buildtools/options-ibverbs-static.sh)
-_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX4_PMD)       += $(LIBS_IBVERBS_STATIC)
-_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX5_PMD)       += $(LIBS_IBVERBS_STATIC)
+_LDLIBS-y                                   += $(LIBS_IBVERBS_STATIC)
 else
+ifeq ($(findstring y,$(CONFIG_RTE_LIBRTE_MLX5_PMD)$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD)),y)
+_LDLIBS-y                                   += -libverbs -lmlx5
+endif
 _LDLIBS-$(CONFIG_RTE_LIBRTE_MLX4_PMD)       += -libverbs -lmlx4
-_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX5_PMD)       += -libverbs -lmlx5
 endif
 _LDLIBS-$(CONFIG_RTE_LIBRTE_MVPP2_PMD)      += -lrte_pmd_mvpp2
 _LDLIBS-$(CONFIG_RTE_LIBRTE_MVNETA_PMD)     += -lrte_pmd_mvneta
-- 
1.8.3.1
* [dpdk-dev] [PATCH v3 02/13] vdpa/mlx5: support queues number operation
  2020-02-02 16:03   ` [dpdk-dev] [PATCH v3 00/13] Introduce mlx5 vDPA driver Matan Azrad
  2020-02-02 16:03     ` [dpdk-dev] [PATCH v3 01/13] drivers: introduce " Matan Azrad
@ 2020-02-02 16:03     ` Matan Azrad
  2020-02-02 16:03     ` [dpdk-dev] [PATCH v3 03/13] vdpa/mlx5: support features get operations Matan Azrad
                       ` (12 subsequent siblings)
  14 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-02-02 16:03 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Maxime Coquelin
Support get_queue_num operation to get the maximum number of queues
supported by the device.
This number comes from the DevX capabilities.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
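Note for readers (not part of the commit): the lookup behind
get_queue_num can be sketched in isolation as below. This is a minimal
sketch, assuming a stand-in structure; all demo_* names are hypothetical
and only mirror the shape of the driver code in this patch, where the
queue limit is cached at probe time and returned by device id.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/queue.h>

/* Hypothetical stand-in for the driver private structure. */
struct demo_priv {
	TAILQ_ENTRY(demo_priv) next;
	int id;                         /* vDPA device id. */
	uint32_t max_num_virtio_queues; /* Cached when the device is probed. */
};

static TAILQ_HEAD(demo_privs, demo_priv) demo_list =
	TAILQ_HEAD_INITIALIZER(demo_list);
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;

/* Add a device to the global list under the list lock. */
static void
demo_add(struct demo_priv *priv)
{
	pthread_mutex_lock(&demo_lock);
	TAILQ_INSERT_TAIL(&demo_list, priv, next);
	pthread_mutex_unlock(&demo_lock);
}

/* Return the device with the given id, or NULL when there is none. */
static struct demo_priv *
demo_find_by_did(int did)
{
	struct demo_priv *priv;

	pthread_mutex_lock(&demo_lock);
	TAILQ_FOREACH(priv, &demo_list, next) {
		if (priv->id == did)
			break;
	}
	pthread_mutex_unlock(&demo_lock);
	return priv; /* NULL when the loop reached the end of the list. */
}

/* Report the cached queue limit; -1 when the id is unknown. */
static int
demo_get_queue_num(int did, uint32_t *queue_num)
{
	struct demo_priv *priv = demo_find_by_did(did);

	assert(queue_num != NULL);
	if (priv == NULL)
		return -1;
	*queue_num = priv->max_num_virtio_queues;
	return 0;
}
```

The sketch returns the cached value without touching the device, which
is why the capability query has to succeed before the device is
registered.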
 drivers/vdpa/mlx5/mlx5_vdpa.c | 54 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 53 insertions(+), 1 deletion(-)
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index 80204b3..5246fd2 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -15,6 +15,7 @@
 
 #include <mlx5_glue.h>
 #include <mlx5_common.h>
+#include <mlx5_devx_cmds.h>
 
 #include "mlx5_vdpa_utils.h"
 
@@ -24,6 +25,7 @@ struct mlx5_vdpa_priv {
 	int id; /* vDPA device id. */
 	struct ibv_context *ctx; /* Device context. */
 	struct rte_vdpa_dev_addr dev_addr;
+	struct mlx5_hca_vdpa_attr caps;
 };
 
 TAILQ_HEAD(mlx5_vdpa_privs, mlx5_vdpa_priv) priv_list =
@@ -31,8 +33,43 @@ struct mlx5_vdpa_priv {
 static pthread_mutex_t priv_list_lock = PTHREAD_MUTEX_INITIALIZER;
 int mlx5_vdpa_logtype;
 
+static struct mlx5_vdpa_priv *
+mlx5_vdpa_find_priv_resource_by_did(int did)
+{
+	struct mlx5_vdpa_priv *priv;
+	int found = 0;
+
+	pthread_mutex_lock(&priv_list_lock);
+	TAILQ_FOREACH(priv, &priv_list, next) {
+		if (did == priv->id) {
+			found = 1;
+			break;
+		}
+	}
+	pthread_mutex_unlock(&priv_list_lock);
+	if (!found) {
+		DRV_LOG(ERR, "Invalid device id: %d.", did);
+		rte_errno = EINVAL;
+		return NULL;
+	}
+	return priv;
+}
+
+static int
+mlx5_vdpa_get_queue_num(int did, uint32_t *queue_num)
+{
+	struct mlx5_vdpa_priv *priv = mlx5_vdpa_find_priv_resource_by_did(did);
+
+	if (priv == NULL) {
+		DRV_LOG(ERR, "Invalid device id: %d.", did);
+		return -1;
+	}
+	*queue_num = priv->caps.max_num_virtio_queues;
+	return 0;
+}
+
 static struct rte_vdpa_dev_ops mlx5_vdpa_ops = {
-	.get_queue_num = NULL,
+	.get_queue_num = mlx5_vdpa_get_queue_num,
 	.get_features = NULL,
 	.get_protocol_features = NULL,
 	.dev_conf = NULL,
@@ -67,6 +104,7 @@ struct mlx5_vdpa_priv {
 	struct ibv_device *ibv_match = NULL;
 	struct mlx5_vdpa_priv *priv = NULL;
 	struct ibv_context *ctx = NULL;
+	struct mlx5_hca_attr attr;
 	int ret;
 
 	if (mlx5_class_get(pci_dev->device.devargs) != MLX5_CLASS_VDPA) {
@@ -120,6 +158,20 @@ struct mlx5_vdpa_priv {
 		rte_errno = ENOMEM;
 		goto error;
 	}
+	ret = mlx5_devx_cmd_query_hca_attr(ctx, &attr);
+	if (ret) {
+		DRV_LOG(ERR, "Unable to read HCA capabilities.");
+		rte_errno = ENOTSUP;
+		goto error;
+	} else {
+		if (!attr.vdpa.valid || !attr.vdpa.max_num_virtio_queues) {
+			DRV_LOG(ERR, "Not enough capabilities to support vdpa,"
+				" maybe old FW/OFED version?");
+			rte_errno = ENOTSUP;
+			goto error;
+		}
+		priv->caps = attr.vdpa;
+	}
 	priv->ctx = ctx;
 	priv->dev_addr.pci_addr = pci_dev->addr;
 	priv->dev_addr.type = PCI_ADDR;
-- 
1.8.3.1
* [dpdk-dev] [PATCH v3 03/13] vdpa/mlx5: support features get operations
  2020-02-02 16:03   ` [dpdk-dev] [PATCH v3 00/13] Introduce mlx5 vDPA driver Matan Azrad
  2020-02-02 16:03     ` [dpdk-dev] [PATCH v3 01/13] drivers: introduce " Matan Azrad
  2020-02-02 16:03     ` [dpdk-dev] [PATCH v3 02/13] vdpa/mlx5: support queues number operation Matan Azrad
@ 2020-02-02 16:03     ` Matan Azrad
  2020-02-02 16:03     ` [dpdk-dev] [PATCH v3 04/13] vdpa/mlx5: prepare memory regions Matan Azrad
                       ` (11 subsequent siblings)
  14 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-02-02 16:03 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Maxime Coquelin
Add support for get_features and get_protocol_features operations.
Some of the features are reported by the DevX capabilities.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 doc/guides/vdpadevs/features/mlx5.ini |  7 ++++
 drivers/vdpa/mlx5/mlx5_vdpa.c         | 66 +++++++++++++++++++++++++++++++++--
 2 files changed, 71 insertions(+), 2 deletions(-)
diff --git a/doc/guides/vdpadevs/features/mlx5.ini b/doc/guides/vdpadevs/features/mlx5.ini
index d635bdf..fea491d 100644
--- a/doc/guides/vdpadevs/features/mlx5.ini
+++ b/doc/guides/vdpadevs/features/mlx5.ini
@@ -4,6 +4,13 @@
 ; Refer to default.ini for the full list of available driver features.
 ;
 [Features]
+
+any layout           = Y
+guest announce       = Y
+mq                   = Y
+proto mq             = Y
+proto log shmfd      = Y
+proto host notifier  = Y
 Other kdrv           = Y
 ARMv8                = Y
 Power8               = Y
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index 5246fd2..00d3a19 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -1,6 +1,8 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright 2019 Mellanox Technologies, Ltd
  */
+#include <linux/virtio_net.h>
+
 #include <rte_malloc.h>
 #include <rte_log.h>
 #include <rte_errno.h>
@@ -16,6 +18,7 @@
 #include <mlx5_glue.h>
 #include <mlx5_common.h>
 #include <mlx5_devx_cmds.h>
+#include <mlx5_prm.h>
 
 #include "mlx5_vdpa_utils.h"
 
@@ -28,6 +31,27 @@ struct mlx5_vdpa_priv {
 	struct mlx5_hca_vdpa_attr caps;
 };
 
+#ifndef VIRTIO_F_ORDER_PLATFORM
+#define VIRTIO_F_ORDER_PLATFORM 36
+#endif
+
+#ifndef VIRTIO_F_RING_PACKED
+#define VIRTIO_F_RING_PACKED 34
+#endif
+
+#define MLX5_VDPA_DEFAULT_FEATURES ((1ULL << VHOST_USER_F_PROTOCOL_FEATURES) | \
+			    (1ULL << VIRTIO_F_ANY_LAYOUT) | \
+			    (1ULL << VIRTIO_NET_F_MQ) | \
+			    (1ULL << VIRTIO_NET_F_GUEST_ANNOUNCE) | \
+			    (1ULL << VIRTIO_F_ORDER_PLATFORM))
+
+#define MLX5_VDPA_PROTOCOL_FEATURES \
+			    ((1ULL << VHOST_USER_PROTOCOL_F_SLAVE_REQ) | \
+			     (1ULL << VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD) | \
+			     (1ULL << VHOST_USER_PROTOCOL_F_HOST_NOTIFIER) | \
+			     (1ULL << VHOST_USER_PROTOCOL_F_LOG_SHMFD) | \
+			     (1ULL << VHOST_USER_PROTOCOL_F_MQ))
+
 TAILQ_HEAD(mlx5_vdpa_privs, mlx5_vdpa_priv) priv_list =
 					      TAILQ_HEAD_INITIALIZER(priv_list);
 static pthread_mutex_t priv_list_lock = PTHREAD_MUTEX_INITIALIZER;
@@ -68,10 +92,48 @@ struct mlx5_vdpa_priv {
 	return 0;
 }
 
+static int
+mlx5_vdpa_get_vdpa_features(int did, uint64_t *features)
+{
+	struct mlx5_vdpa_priv *priv = mlx5_vdpa_find_priv_resource_by_did(did);
+
+	if (priv == NULL) {
+		DRV_LOG(ERR, "Invalid device id: %d.", did);
+		return -1;
+	}
+	*features = MLX5_VDPA_DEFAULT_FEATURES;
+	if (priv->caps.virtio_queue_type & (1 << MLX5_VIRTQ_TYPE_PACKED))
+		*features |= (1ULL << VIRTIO_F_RING_PACKED);
+	if (priv->caps.tso_ipv4)
+		*features |= (1ULL << VIRTIO_NET_F_HOST_TSO4);
+	if (priv->caps.tso_ipv6)
+		*features |= (1ULL << VIRTIO_NET_F_HOST_TSO6);
+	if (priv->caps.tx_csum)
+		*features |= (1ULL << VIRTIO_NET_F_CSUM);
+	if (priv->caps.rx_csum)
+		*features |= (1ULL << VIRTIO_NET_F_GUEST_CSUM);
+	if (priv->caps.virtio_version_1_0)
+		*features |= (1ULL << VIRTIO_F_VERSION_1);
+	return 0;
+}
+
+static int
+mlx5_vdpa_get_protocol_features(int did, uint64_t *features)
+{
+	struct mlx5_vdpa_priv *priv = mlx5_vdpa_find_priv_resource_by_did(did);
+
+	if (priv == NULL) {
+		DRV_LOG(ERR, "Invalid device id: %d.", did);
+		return -1;
+	}
+	*features = MLX5_VDPA_PROTOCOL_FEATURES;
+	return 0;
+}
+
 static struct rte_vdpa_dev_ops mlx5_vdpa_ops = {
 	.get_queue_num = mlx5_vdpa_get_queue_num,
-	.get_features = NULL,
-	.get_protocol_features = NULL,
+	.get_features = mlx5_vdpa_get_vdpa_features,
+	.get_protocol_features = mlx5_vdpa_get_protocol_features,
 	.dev_conf = NULL,
 	.dev_close = NULL,
 	.set_vring_state = NULL,
-- 
1.8.3.1
* [dpdk-dev] [PATCH v3 04/13] vdpa/mlx5: prepare memory regions
  2020-02-02 16:03   ` [dpdk-dev] [PATCH v3 00/13] Introduce mlx5 vDPA driver Matan Azrad
                       ` (2 preceding siblings ...)
  2020-02-02 16:03     ` [dpdk-dev] [PATCH v3 03/13] vdpa/mlx5: support features get operations Matan Azrad
@ 2020-02-02 16:03     ` Matan Azrad
  2020-02-02 16:03     ` [dpdk-dev] [PATCH v3 05/13] vdpa/mlx5: prepare HW queues Matan Azrad
                       ` (10 subsequent siblings)
  14 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-02-02 16:03 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Maxime Coquelin
Memory regions (MRs) are created to map the guest physical addresses
used by the guest side of the virtio device to the host physical
addresses used by the HW on the host side.
This way, for example, the HW can translate the addresses of the
packets posted by the guest and fetch the packets from the correct
place.
The design works with a single MR that is configured on the virtio
queues in the HW, hence many direct MRs are grouped into a single
indirect MR.
Add functions to prepare and release the MRs together with all the
resources they require.
Add a new file, mlx5_vdpa_mem.c, to hold all the MR-related code of
the driver.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
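Note for readers (not part of the commit): grouping the direct MRs
under one fixed-size indirect MR needs an entry size that divides every
guest region and every hole between regions. A minimal sketch of that
calculation follows, assuming regions already sorted by guest physical
address. All demo_* names are hypothetical, and the sketch leaves out
the 2G cap, the power-of-2 adjustment and the mode selection done in
the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical guest memory region: start address and length. */
struct demo_region {
	uint64_t gpa;
	uint64_t size;
};

/* Euclid's algorithm; demo_gcd(0, x) returns x. */
static uint64_t
demo_gcd(uint64_t a, uint64_t b)
{
	while (b != 0) {
		uint64_t t = a % b;

		a = b;
		b = t;
	}
	return a;
}

/*
 * Return the number of fixed-size entries needed to cover the span
 * from the first region to the end of the last one, holes included.
 * The common entry size is stored in *gcd_out.
 */
static uint64_t
demo_fixed_entries(const struct demo_region *r, size_t n, uint64_t *gcd_out)
{
	uint64_t gcd = 0;
	uint64_t span;
	size_t i;

	assert(r != NULL && n > 0 && gcd_out != NULL);
	for (i = 0; i < n; i++) {
		if (i > 0) {
			/* The hole before this region takes part too. */
			uint64_t hole = r[i].gpa -
					(r[i - 1].gpa + r[i - 1].size);

			gcd = demo_gcd(gcd, hole);
		}
		gcd = demo_gcd(gcd, r[i].size);
	}
	assert(gcd != 0); /* Requires at least one non-empty region. */
	span = r[n - 1].gpa + r[n - 1].size - r[0].gpa;
	*gcd_out = gcd;
	return span / gcd;
}
```

For two 1G regions separated by a 1G hole the sketch yields an entry
size of 1G and three entries, one of which covers the hole.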
 drivers/vdpa/mlx5/Makefile        |   4 +-
 drivers/vdpa/mlx5/meson.build     |   5 +-
 drivers/vdpa/mlx5/mlx5_vdpa.c     |  17 +-
 drivers/vdpa/mlx5/mlx5_vdpa.h     |  66 ++++++++
 drivers/vdpa/mlx5/mlx5_vdpa_mem.c | 346 ++++++++++++++++++++++++++++++++++++++
 5 files changed, 420 insertions(+), 18 deletions(-)
 create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa.h
 create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_mem.c
diff --git a/drivers/vdpa/mlx5/Makefile b/drivers/vdpa/mlx5/Makefile
index 1ab5296..bceab1e 100644
--- a/drivers/vdpa/mlx5/Makefile
+++ b/drivers/vdpa/mlx5/Makefile
@@ -8,6 +8,7 @@ LIB = librte_pmd_mlx5_vdpa.a
 
 # Sources.
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa.c
+SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa_mem.c
 
 # Basic CFLAGS.
 CFLAGS += -O3
@@ -15,6 +16,7 @@ CFLAGS += -std=c11 -Wall -Wextra
 CFLAGS += -g
 CFLAGS += -I$(RTE_SDK)/drivers/common/mlx5
 CFLAGS += -I$(RTE_SDK)/drivers/net/mlx5_vdpa
+CFLAGS += -I$(RTE_SDK)/lib/librte_sched
 CFLAGS += -I$(BUILDDIR)/drivers/common/mlx5
 CFLAGS += -D_BSD_SOURCE
 CFLAGS += -D_DEFAULT_SOURCE
@@ -22,7 +24,7 @@ CFLAGS += -D_XOPEN_SOURCE=600
 CFLAGS += $(WERROR_FLAGS)
 CFLAGS += -Wno-strict-prototypes
 LDLIBS += -lrte_common_mlx5
-LDLIBS += -lrte_eal -lrte_vhost -lrte_kvargs -lrte_bus_pci
+LDLIBS += -lrte_eal -lrte_vhost -lrte_kvargs -lrte_bus_pci -lrte_sched
 
 # A few warnings cannot be avoided in external headers.
 CFLAGS += -Wno-error=cast-qual
diff --git a/drivers/vdpa/mlx5/meson.build b/drivers/vdpa/mlx5/meson.build
index 6d3ab98..47f9537 100644
--- a/drivers/vdpa/mlx5/meson.build
+++ b/drivers/vdpa/mlx5/meson.build
@@ -9,9 +9,10 @@ endif
 
 fmt_name = 'mlx5_vdpa'
 allow_experimental_apis = true
-deps += ['hash', 'common_mlx5', 'vhost', 'bus_pci', 'eal']
+deps += ['hash', 'common_mlx5', 'vhost', 'bus_pci', 'eal', 'sched']
 sources = files(
 	'mlx5_vdpa.c',
+	'mlx5_vdpa_mem.c',
 )
 cflags_options = [
 	'-std=c11',
@@ -30,4 +31,4 @@ if get_option('buildtype').contains('debug')
 	cflags += [ '-pedantic', '-DPEDANTIC' ]
 else
 	cflags += [ '-UPEDANTIC' ]
-endif
\ No newline at end of file
+endif
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index 00d3a19..16107cf 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -7,13 +7,6 @@
 #include <rte_log.h>
 #include <rte_errno.h>
 #include <rte_bus_pci.h>
-#ifdef PEDANTIC
-#pragma GCC diagnostic ignored "-Wpedantic"
-#endif
-#include <rte_vdpa.h>
-#ifdef PEDANTIC
-#pragma GCC diagnostic error "-Wpedantic"
-#endif
 
 #include <mlx5_glue.h>
 #include <mlx5_common.h>
@@ -21,16 +14,9 @@
 #include <mlx5_prm.h>
 
 #include "mlx5_vdpa_utils.h"
+#include "mlx5_vdpa.h"
 
 
-struct mlx5_vdpa_priv {
-	TAILQ_ENTRY(mlx5_vdpa_priv) next;
-	int id; /* vDPA device id. */
-	struct ibv_context *ctx; /* Device context. */
-	struct rte_vdpa_dev_addr dev_addr;
-	struct mlx5_hca_vdpa_attr caps;
-};
-
 #ifndef VIRTIO_F_ORDER_PLATFORM
 #define VIRTIO_F_ORDER_PLATFORM 36
 #endif
@@ -243,6 +229,7 @@ struct mlx5_vdpa_priv {
 		rte_errno = rte_errno ? rte_errno : EINVAL;
 		goto error;
 	}
+	SLIST_INIT(&priv->mr_list);
 	pthread_mutex_lock(&priv_list_lock);
 	TAILQ_INSERT_TAIL(&priv_list, priv, next);
 	pthread_mutex_unlock(&priv_list_lock);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
new file mode 100644
index 0000000..f367991
--- /dev/null
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -0,0 +1,66 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2019 Mellanox Technologies, Ltd
+ */
+
+#ifndef RTE_PMD_MLX5_VDPA_H_
+#define RTE_PMD_MLX5_VDPA_H_
+
+#include <sys/queue.h>
+
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-Wpedantic"
+#endif
+#include <rte_vdpa.h>
+#include <rte_vhost.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-Wpedantic"
+#endif
+
+#include <mlx5_glue.h>
+#include <mlx5_devx_cmds.h>
+
+struct mlx5_vdpa_query_mr {
+	SLIST_ENTRY(mlx5_vdpa_query_mr) next;
+	void *addr;
+	uint64_t length;
+	struct mlx5dv_devx_umem *umem;
+	struct mlx5_devx_obj *mkey;
+	int is_indirect;
+};
+
+struct mlx5_vdpa_priv {
+	TAILQ_ENTRY(mlx5_vdpa_priv) next;
+	int id; /* vDPA device id. */
+	int vid; /* vhost device id. */
+	struct ibv_context *ctx; /* Device context. */
+	struct rte_vdpa_dev_addr dev_addr;
+	struct mlx5_hca_vdpa_attr caps;
+	uint32_t pdn; /* Protection Domain number. */
+	struct ibv_pd *pd;
+	uint32_t gpa_mkey_index;
+	struct ibv_mr *null_mr;
+	struct rte_vhost_memory *vmem;
+	SLIST_HEAD(mr_list, mlx5_vdpa_query_mr) mr_list;
+};
+
+/**
+ * Release all the prepared memory regions and all their related resources.
+ *
+ * @param[in] priv
+ *   The vdpa driver private structure.
+ */
+void mlx5_vdpa_mem_dereg(struct mlx5_vdpa_priv *priv);
+
+/**
+ * Register all the memory regions of the virtio device to the HW and allocate
+ * all their related resources.
+ *
+ * @param[in] priv
+ *   The vdpa driver private structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv);
+
+#endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_mem.c b/drivers/vdpa/mlx5/mlx5_vdpa_mem.c
new file mode 100644
index 0000000..398ca35
--- /dev/null
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_mem.c
@@ -0,0 +1,346 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2019 Mellanox Technologies, Ltd
+ */
+#include <stdlib.h>
+
+#include <rte_malloc.h>
+#include <rte_errno.h>
+#include <rte_common.h>
+#include <rte_sched_common.h>
+
+#include <mlx5_prm.h>
+#include <mlx5_common.h>
+
+#include "mlx5_vdpa_utils.h"
+#include "mlx5_vdpa.h"
+
+static int
+mlx5_vdpa_pd_prepare(struct mlx5_vdpa_priv *priv)
+{
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+	if (priv->pd)
+		return 0;
+	priv->pd = mlx5_glue->alloc_pd(priv->ctx);
+	if (priv->pd == NULL) {
+		DRV_LOG(ERR, "Failed to allocate PD.");
+		return errno ? -errno : -ENOMEM;
+	}
+	struct mlx5dv_obj obj;
+	struct mlx5dv_pd pd_info;
+	int ret = 0;
+
+	obj.pd.in = priv->pd;
+	obj.pd.out = &pd_info;
+	ret = mlx5_glue->dv_init_obj(&obj, MLX5DV_OBJ_PD);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to get PD object info.");
+		mlx5_glue->dealloc_pd(priv->pd);
+		priv->pd = NULL;
+		return -errno;
+	}
+	priv->pdn = pd_info.pdn;
+	return 0;
+#else
+	(void)priv;
+	DRV_LOG(ERR, "Cannot get pdn - no DV support.");
+	return -ENOTSUP;
+#endif /* HAVE_IBV_FLOW_DV_SUPPORT */
+}
+
+void
+mlx5_vdpa_mem_dereg(struct mlx5_vdpa_priv *priv)
+{
+	struct mlx5_vdpa_query_mr *entry;
+	struct mlx5_vdpa_query_mr *next;
+
+	entry = SLIST_FIRST(&priv->mr_list);
+	while (entry) {
+		next = SLIST_NEXT(entry, next);
+		claim_zero(mlx5_devx_cmd_destroy(entry->mkey));
+		if (!entry->is_indirect)
+			claim_zero(mlx5_glue->devx_umem_dereg(entry->umem));
+		SLIST_REMOVE(&priv->mr_list, entry, mlx5_vdpa_query_mr, next);
+		rte_free(entry);
+		entry = next;
+	}
+	SLIST_INIT(&priv->mr_list);
+	if (priv->null_mr) {
+		claim_zero(mlx5_glue->dereg_mr(priv->null_mr));
+		priv->null_mr = NULL;
+	}
+	if (priv->pd) {
+		claim_zero(mlx5_glue->dealloc_pd(priv->pd));
+		priv->pd = NULL;
+	}
+	if (priv->vmem) {
+		free(priv->vmem);
+		priv->vmem = NULL;
+	}
+}
+
+static int
+mlx5_vdpa_regions_addr_cmp(const void *a, const void *b)
+{
+	const struct rte_vhost_mem_region *region_a = a;
+	const struct rte_vhost_mem_region *region_b = b;
+
+	if (region_a->guest_phys_addr < region_b->guest_phys_addr)
+		return -1;
+	if (region_a->guest_phys_addr > region_b->guest_phys_addr)
+		return 1;
+	return 0;
+}
+
+#define KLM_NUM_MAX_ALIGN(sz) (RTE_ALIGN_CEIL(sz, MLX5_MAX_KLM_BYTE_COUNT) / \
+			       MLX5_MAX_KLM_BYTE_COUNT)
+
+/*
+ * Allocate and sort the region list and choose indirect mkey mode:
+ *   1. Calculate GCD, guest memory size and indirect mkey entries num per mode.
+ *   2. Align GCD to the maximum allowed size(2G) and to be power of 2.
+ *   3. Decide the indirect mkey mode according to the following rules:
+ *         a. If both KLM_FBS entries number and KLM entries number are bigger
+ *            than the maximum allowed(MLX5_DEVX_MAX_KLM_ENTRIES) - error.
+ *         b. KLM mode if KLM_FBS entries number is bigger than the maximum
+ *            allowed(MLX5_DEVX_MAX_KLM_ENTRIES).
+ *         c. KLM mode if GCD is smaller than the minimum allowed(4K).
+ *         d. KLM mode if the total size of KLM entries is in one cache line
+ *            and the total size of KLM_FBS entries is not in one cache line.
+ *         e. Otherwise, KLM_FBS mode.
+ */
+static struct rte_vhost_memory *
+mlx5_vdpa_vhost_mem_regions_prepare(int vid, uint8_t *mode, uint64_t *mem_size,
+				    uint64_t *gcd, uint32_t *entries_num)
+{
+	struct rte_vhost_memory *mem;
+	uint64_t size;
+	uint64_t klm_entries_num = 0;
+	uint64_t klm_fbs_entries_num;
+	uint32_t i;
+	int ret = rte_vhost_get_mem_table(vid, &mem);
+
+	if (ret < 0) {
+		DRV_LOG(ERR, "Failed to get VM memory layout vid = %d.", vid);
+		rte_errno = EINVAL;
+		return NULL;
+	}
+	qsort(mem->regions, mem->nregions, sizeof(mem->regions[0]),
+	      mlx5_vdpa_regions_addr_cmp);
+	*mem_size = (mem->regions[(mem->nregions - 1)].guest_phys_addr) +
+				      (mem->regions[(mem->nregions - 1)].size) -
+					      (mem->regions[0].guest_phys_addr);
+	*gcd = 0;
+	for (i = 0; i < mem->nregions; ++i) {
+		DRV_LOG(INFO,  "Region %u: HVA 0x%" PRIx64 ", GPA 0x%" PRIx64
+			", size 0x%" PRIx64 ".", i,
+			mem->regions[i].host_user_addr,
+			mem->regions[i].guest_phys_addr, mem->regions[i].size);
+		if (i > 0) {
+			/* Hole handle. */
+			size = mem->regions[i].guest_phys_addr -
+				(mem->regions[i - 1].guest_phys_addr +
+				 mem->regions[i - 1].size);
+			*gcd = rte_get_gcd(*gcd, size);
+			klm_entries_num += KLM_NUM_MAX_ALIGN(size);
+		}
+		size = mem->regions[i].size;
+		*gcd = rte_get_gcd(*gcd, size);
+		klm_entries_num += KLM_NUM_MAX_ALIGN(size);
+	}
+	if (*gcd > MLX5_MAX_KLM_BYTE_COUNT)
+		*gcd = rte_get_gcd(*gcd, MLX5_MAX_KLM_BYTE_COUNT);
+	if (!RTE_IS_POWER_OF_2(*gcd)) {
+		uint64_t candidate_gcd = rte_align64prevpow2(*gcd);
+
+		while (candidate_gcd > 1 && (*gcd % candidate_gcd))
+			candidate_gcd /= 2;
+		DRV_LOG(DEBUG, "GCD 0x%" PRIx64 " is not power of 2. Adjusted "
+			"GCD is 0x%" PRIx64 ".", *gcd, candidate_gcd);
+		*gcd = candidate_gcd;
+	}
+	klm_fbs_entries_num = *mem_size / *gcd;
+	if (*gcd < MLX5_MIN_KLM_FIXED_BUFFER_SIZE || klm_fbs_entries_num >
+	    MLX5_DEVX_MAX_KLM_ENTRIES ||
+	    ((klm_entries_num * sizeof(struct mlx5_klm)) <=
+	    RTE_CACHE_LINE_SIZE && (klm_fbs_entries_num *
+				    sizeof(struct mlx5_klm)) >
+							RTE_CACHE_LINE_SIZE)) {
+		*mode = MLX5_MKC_ACCESS_MODE_KLM;
+		*entries_num = klm_entries_num;
+		DRV_LOG(INFO, "Indirect mkey mode is KLM.");
+	} else {
+		*mode = MLX5_MKC_ACCESS_MODE_KLM_FBS;
+		*entries_num = klm_fbs_entries_num;
+		DRV_LOG(INFO, "Indirect mkey mode is KLM Fixed Buffer Size.");
+	}
+	DRV_LOG(DEBUG, "Memory registration information: nregions = %u, "
+		"mem_size = 0x%" PRIx64 ", GCD = 0x%" PRIx64
+		", klm_fbs_entries_num = 0x%" PRIx64 ", klm_entries_num = 0x%"
+		PRIx64 ".", mem->nregions, *mem_size, *gcd, klm_fbs_entries_num,
+		klm_entries_num);
+	if (*entries_num > MLX5_DEVX_MAX_KLM_ENTRIES) {
+		DRV_LOG(ERR, "Failed to prepare memory of vid %d - memory is "
+			"too fragmented.", vid);
+		free(mem);
+		return NULL;
+	}
+	return mem;
+}
+
+#define KLM_SIZE_MAX_ALIGN(sz) ((sz) > MLX5_MAX_KLM_BYTE_COUNT ? \
+				MLX5_MAX_KLM_BYTE_COUNT : (sz))
+
+/*
+ * The target here is to group all the physical memory regions of the
+ * virtio device in one indirect mkey.
+ * For KLM Fixed Buffer Size mode (HW finds the translation entry in one
+ * read according to the guest physical address):
+ * All the sub-direct mkeys of it must be in the same size, hence, each
+ * one of them should be in the GCD size of all the virtio memory
+ * regions and the holes between them.
+ * For KLM mode (each entry may be in different size so HW must iterate
+ * the entries):
+ * Each virtio memory region and each hole between them has one entry;
+ * entries whose associated memory regions are bigger than the maximum
+ * allowed size (2G) are split to stay within it.
+ * It means that each virtio memory region may be mapped to more than
+ * one direct mkey in the 2 modes.
+ * All the holes of invalid memory between the virtio memory regions
+ * will be mapped to the null memory region for security.
+ */
+int
+mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv)
+{
+	struct mlx5_devx_mkey_attr mkey_attr;
+	struct mlx5_vdpa_query_mr *entry = NULL;
+	struct rte_vhost_mem_region *reg = NULL;
+	uint8_t mode;
+	uint32_t entries_num = 0;
+	uint32_t i;
+	uint64_t gcd;
+	uint64_t klm_size;
+	uint64_t mem_size;
+	uint64_t k;
+	int klm_index = 0;
+	int ret;
+	struct rte_vhost_memory *mem = mlx5_vdpa_vhost_mem_regions_prepare
+			      (priv->vid, &mode, &mem_size, &gcd, &entries_num);
+	struct mlx5_klm klm_array[entries_num];
+
+	if (!mem)
+		return -rte_errno;
+	priv->vmem = mem;
+	ret = mlx5_vdpa_pd_prepare(priv);
+	if (ret)
+		goto error;
+	priv->null_mr = mlx5_glue->alloc_null_mr(priv->pd);
+	if (!priv->null_mr) {
+		DRV_LOG(ERR, "Failed to allocate null MR.");
+		ret = -errno;
+		goto error;
+	}
+	DRV_LOG(DEBUG, "Dump fill Mkey = %u.", priv->null_mr->lkey);
+	for (i = 0; i < mem->nregions; i++) {
+		reg = &mem->regions[i];
+		entry = rte_zmalloc(__func__, sizeof(*entry), 0);
+		if (!entry) {
+			ret = -ENOMEM;
+			DRV_LOG(ERR, "Failed to allocate mem entry memory.");
+			goto error;
+		}
+		entry->umem = mlx5_glue->devx_umem_reg(priv->ctx,
+					 (void *)(uintptr_t)reg->host_user_addr,
+					     reg->size, IBV_ACCESS_LOCAL_WRITE);
+		if (!entry->umem) {
+			DRV_LOG(ERR, "Failed to register Umem by Devx.");
+			ret = -errno;
+			goto error;
+		}
+		mkey_attr.addr = (uintptr_t)(reg->guest_phys_addr);
+		mkey_attr.size = reg->size;
+		mkey_attr.umem_id = entry->umem->umem_id;
+		mkey_attr.pd = priv->pdn;
+		mkey_attr.pg_access = 1;
+		mkey_attr.klm_array = NULL;
+		mkey_attr.klm_num = 0;
+		entry->mkey = mlx5_devx_cmd_mkey_create(priv->ctx, &mkey_attr);
+		if (!entry->mkey) {
+			DRV_LOG(ERR, "Failed to create direct Mkey.");
+			ret = -rte_errno;
+			goto error;
+		}
+		entry->addr = (void *)(uintptr_t)(reg->host_user_addr);
+		entry->length = reg->size;
+		entry->is_indirect = 0;
+		if (i > 0) {
+			uint64_t sadd;
+			uint64_t empty_region_sz = reg->guest_phys_addr -
+					  (mem->regions[i - 1].guest_phys_addr +
+					   mem->regions[i - 1].size);
+
+			if (empty_region_sz > 0) {
+				sadd = mem->regions[i - 1].guest_phys_addr +
+				       mem->regions[i - 1].size;
+				klm_size = mode == MLX5_MKC_ACCESS_MODE_KLM ?
+				      KLM_SIZE_MAX_ALIGN(empty_region_sz) : gcd;
+				for (k = 0; k < empty_region_sz;
+				     k += klm_size) {
+					klm_array[klm_index].byte_count =
+						k + klm_size > empty_region_sz ?
+						 empty_region_sz - k : klm_size;
+					klm_array[klm_index].mkey =
+							    priv->null_mr->lkey;
+					klm_array[klm_index].address = sadd + k;
+					klm_index++;
+				}
+			}
+		}
+		klm_size = mode == MLX5_MKC_ACCESS_MODE_KLM ?
+					    KLM_SIZE_MAX_ALIGN(reg->size) : gcd;
+		for (k = 0; k < reg->size; k += klm_size) {
+			klm_array[klm_index].byte_count = k + klm_size >
+					   reg->size ? reg->size - k : klm_size;
+			klm_array[klm_index].mkey = entry->mkey->id;
+			klm_array[klm_index].address = reg->guest_phys_addr + k;
+			klm_index++;
+		}
+		SLIST_INSERT_HEAD(&priv->mr_list, entry, next);
+	}
+	mkey_attr.addr = (uintptr_t)(mem->regions[0].guest_phys_addr);
+	mkey_attr.size = mem_size;
+	mkey_attr.pd = priv->pdn;
+	mkey_attr.umem_id = 0;
+	/* Must be zero for KLM mode. */
+	mkey_attr.log_entity_size = mode == MLX5_MKC_ACCESS_MODE_KLM_FBS ?
+							  rte_log2_u64(gcd) : 0;
+	mkey_attr.pg_access = 0;
+	mkey_attr.klm_array = klm_array;
+	mkey_attr.klm_num = klm_index;
+	entry = rte_zmalloc(__func__, sizeof(*entry), 0);
+	if (!entry) {
+		DRV_LOG(ERR, "Failed to allocate memory for indirect entry.");
+		ret = -ENOMEM;
+		goto error;
+	}
+	entry->mkey = mlx5_devx_cmd_mkey_create(priv->ctx, &mkey_attr);
+	if (!entry->mkey) {
+		DRV_LOG(ERR, "Failed to create indirect Mkey.");
+		ret = -rte_errno;
+		goto error;
+	}
+	entry->is_indirect = 1;
+	SLIST_INSERT_HEAD(&priv->mr_list, entry, next);
+	priv->gpa_mkey_index = entry->mkey->id;
+	return 0;
+error:
+	if (entry) {
+		if (entry->mkey)
+			mlx5_devx_cmd_destroy(entry->mkey);
+		if (entry->umem)
+			mlx5_glue->devx_umem_dereg(entry->umem);
+		rte_free(entry);
+	}
+	mlx5_vdpa_mem_dereg(priv);
+	rte_errno = -ret;
+	return ret;
+}
-- 
1.8.3.1
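The mode selection in mlx5_vdpa_vhost_mem_regions_prepare() above compares two
entry counts: one entry per region or hole for KLM (split at 2G), and
mem_size / GCD entries for KLM Fixed Buffer Size. The following is a minimal,
self-contained sketch of that counting, not the driver code. All sketch_*
names are hypothetical, the 2G limit is taken from the comment in the patch,
and the ceiling division stands in for KLM_NUM_MAX_ALIGN, whose definition is
not shown here. The sketch also omits the power-of-2 adjustment and the 2G cap
that the patch applies to the GCD.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed per-entry limit (2G), as stated in the patch comment. */
#define SKETCH_MAX_KLM_BYTE_COUNT 0x80000000ull

/* Hypothetical, reduced view of a guest memory region. */
struct sketch_region {
	uint64_t gpa;  /* guest physical address */
	uint64_t size; /* region size in bytes */
};

/* Euclid's algorithm; gcd(0, x) == x, so 0 is a neutral start value. */
static uint64_t
sketch_gcd(uint64_t a, uint64_t b)
{
	while (b != 0) {
		uint64_t t = a % b;

		a = b;
		b = t;
	}
	return a;
}

/* Number of KLM entries needed to cover sz bytes (ceiling division). */
static uint64_t
sketch_klm_entries(uint64_t sz)
{
	return (sz + SKETCH_MAX_KLM_BYTE_COUNT - 1) / SKETCH_MAX_KLM_BYTE_COUNT;
}

/*
 * Walk regions sorted by GPA and count entries for both modes.
 * Holes between regions are counted like regions.
 */
static void
sketch_count_entries(const struct sketch_region *r, size_t n,
		     uint64_t *gcd, uint64_t *klm_num, uint64_t *fbs_num)
{
	uint64_t g = 0;
	uint64_t klm = 0;
	uint64_t mem_size;
	size_t i;

	assert(n > 0);
	for (i = 0; i < n; i++) {
		if (i > 0) {
			uint64_t hole = r[i].gpa -
					(r[i - 1].gpa + r[i - 1].size);

			g = sketch_gcd(g, hole);
			klm += sketch_klm_entries(hole);
		}
		g = sketch_gcd(g, r[i].size);
		klm += sketch_klm_entries(r[i].size);
	}
	assert(g != 0);
	mem_size = r[n - 1].gpa + r[n - 1].size - r[0].gpa;
	*gcd = g;
	*klm_num = klm;
	*fbs_num = mem_size / g;
}
```

For a typical layout with a small low region, a hole, and one large region,
the KLM count stays at a handful of entries while the fixed-size count grows
with mem_size / GCD, which is why the patch falls back to KLM when the
fixed-size table would be too large.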
* [dpdk-dev] [PATCH v3 05/13] vdpa/mlx5: prepare HW queues
  2020-02-02 16:03   ` [dpdk-dev] [PATCH v3 00/13] Introduce mlx5 vDPA driver Matan Azrad
                       ` (3 preceding siblings ...)
  2020-02-02 16:03     ` [dpdk-dev] [PATCH v3 04/13] vdpa/mlx5: prepare memory regions Matan Azrad
@ 2020-02-02 16:03     ` Matan Azrad
  2020-02-02 16:03     ` [dpdk-dev] [PATCH v3 06/13] vdpa/mlx5: prepare virtio queues Matan Azrad
                       ` (9 subsequent siblings)
  14 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-02-02 16:03 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Maxime Coquelin
In preparation for the virtio queues creation, two QPs and a CQ may be
created for each virtio queue.
The design is to trigger an event for the guest and for the vDPA driver
when a new CQE is posted by the HW after the packet transition.
This patch adds the basic operations to create and destroy the above HW
objects and to trigger the CQE events when a new CQE is posted.
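As a rough illustration of the CQ handling in this patch, the sketch below
shows two pieces of arithmetic: the layout of the CQ umem (the CQE ring
followed by two 32-bit doorbell records) and the composition of the 64-bit
"arm" doorbell value. It is a simplified model, not the driver code. The
sketch_* names are hypothetical, and the constant values (64-byte CQE,
sequence number at bit 28, 24-bit consumer index) are assumptions standing in
for the definitions in mlx5_prm.h. Unlike mlx5_vdpa_cq_arm(), the sketch does
not additionally mask the consumer index with the CQE ring mask, and it leaves
out the big-endian conversion and memory barriers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration; the real ones live in mlx5_prm.h. */
#define SKETCH_CQE_SIZE 64u
#define SKETCH_CQ_SQN_OFFSET 28
#define SKETCH_CI_MASK 0xffffffu
#define SKETCH_CQ_DBR_CMD_ALL (0u << 24)

/* Byte offset of the doorbell records: they follow the CQE ring. */
static uint32_t
sketch_cq_db_offset(uint16_t log_desc_n)
{
	assert(log_desc_n < 16);
	return SKETCH_CQE_SIZE * (1u << log_desc_n);
}

/* Total umem size: the CQE ring plus two 32-bit doorbell records. */
static uint32_t
sketch_cq_umem_size(uint16_t log_desc_n)
{
	return sketch_cq_db_offset(log_desc_n) + 2 * (uint32_t)sizeof(uint32_t);
}

/* High 32 bits: 2-bit arm sequence number, command, consumer index. */
static uint32_t
sketch_arm_doorbell_hi(uint32_t arm_sn, uint32_t cq_ci)
{
	return ((arm_sn & 0x3u) << SKETCH_CQ_SQN_OFFSET) |
	       SKETCH_CQ_DBR_CMD_ALL | (cq_ci & SKETCH_CI_MASK);
}

/* Full 64-bit doorbell in CPU byte order: high word, then CQ number. */
static uint64_t
sketch_arm_doorbell(uint32_t arm_sn, uint32_t cq_ci, uint32_t cqn)
{
	return ((uint64_t)sketch_arm_doorbell_hi(arm_sn, cq_ci) << 32) | cqn;
}
```

In the patch, the high word is also written to the arm doorbell record in the
umem before the 64-bit value is written to the UAR, and the sequence number is
incremented after each arming.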
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/common/mlx5/mlx5_prm.h      |   4 +
 drivers/vdpa/mlx5/Makefile          |   1 +
 drivers/vdpa/mlx5/meson.build       |   1 +
 drivers/vdpa/mlx5/mlx5_vdpa.h       |  89 ++++++++
 drivers/vdpa/mlx5/mlx5_vdpa_event.c | 400 ++++++++++++++++++++++++++++++++++++
 5 files changed, 495 insertions(+)
 create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_event.c
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index 15940c4..855b37a 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -391,6 +391,10 @@ struct mlx5_cqe {
 /* CQE format value. */
 #define MLX5_COMPRESSED 0x3
 
+/* CQ doorbell cmd types. */
+#define MLX5_CQ_DBR_CMD_SOL_ONLY (1 << 24)
+#define MLX5_CQ_DBR_CMD_ALL (0 << 24)
+
 /* Action type of header modification. */
 enum {
 	MLX5_MODIFICATION_TYPE_SET = 0x1,
diff --git a/drivers/vdpa/mlx5/Makefile b/drivers/vdpa/mlx5/Makefile
index bceab1e..086af1b 100644
--- a/drivers/vdpa/mlx5/Makefile
+++ b/drivers/vdpa/mlx5/Makefile
@@ -9,6 +9,7 @@ LIB = librte_pmd_mlx5_vdpa.a
 # Sources.
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa_mem.c
+SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa_event.c
 
 # Basic CFLAGS.
 CFLAGS += -O3
diff --git a/drivers/vdpa/mlx5/meson.build b/drivers/vdpa/mlx5/meson.build
index 47f9537..3da0d76 100644
--- a/drivers/vdpa/mlx5/meson.build
+++ b/drivers/vdpa/mlx5/meson.build
@@ -13,6 +13,7 @@ deps += ['hash', 'common_mlx5', 'vhost', 'bus_pci', 'eal', 'sched']
 sources = files(
 	'mlx5_vdpa.c',
 	'mlx5_vdpa_mem.c',
+	'mlx5_vdpa_event.c',
 )
 cflags_options = [
 	'-std=c11',
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index f367991..6282635 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -15,9 +15,40 @@
 #ifdef PEDANTIC
 #pragma GCC diagnostic error "-Wpedantic"
 #endif
+#include <rte_spinlock.h>
+#include <rte_interrupts.h>
 
 #include <mlx5_glue.h>
 #include <mlx5_devx_cmds.h>
+#include <mlx5_prm.h>
+
+
+#define MLX5_VDPA_INTR_RETRIES 256
+#define MLX5_VDPA_INTR_RETRIES_USEC 1000
+
+struct mlx5_vdpa_cq {
+	uint16_t log_desc_n;
+	uint32_t cq_ci:24;
+	uint32_t arm_sn:2;
+	rte_spinlock_t sl;
+	struct mlx5_devx_obj *cq;
+	struct mlx5dv_devx_umem *umem_obj;
+	union {
+		volatile void *umem_buf;
+		volatile struct mlx5_cqe *cqes;
+	};
+	volatile uint32_t *db_rec;
+	uint64_t errors;
+};
+
+struct mlx5_vdpa_event_qp {
+	struct mlx5_vdpa_cq cq;
+	struct mlx5_devx_obj *fw_qp;
+	struct mlx5_devx_obj *sw_qp;
+	struct mlx5dv_devx_umem *umem_obj;
+	void *umem_buf;
+	volatile uint32_t *db_rec;
+};
 
 struct mlx5_vdpa_query_mr {
 	SLIST_ENTRY(mlx5_vdpa_query_mr) next;
@@ -40,6 +71,10 @@ struct mlx5_vdpa_priv {
 	uint32_t gpa_mkey_index;
 	struct ibv_mr *null_mr;
 	struct rte_vhost_memory *vmem;
+	uint32_t eqn;
+	struct mlx5dv_devx_event_channel *eventc;
+	struct mlx5dv_devx_uar *uar;
+	struct rte_intr_handle intr_handle;
 	SLIST_HEAD(mr_list, mlx5_vdpa_query_mr) mr_list;
 };
 
@@ -63,4 +98,58 @@ struct mlx5_vdpa_priv {
  */
 int mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv);
 
+
+/**
+ * Create an event QP and all its related resources.
+ *
+ * @param[in] priv
+ *   The vdpa driver private structure.
+ * @param[in] desc_n
+ *   Number of descriptors.
+ * @param[in] callfd
+ *   The guest notification file descriptor.
+ * @param[in,out] eqp
+ *   Pointer to the event QP structure.
+ *
+ * @return
+ *   0 on success, -1 otherwise and rte_errno is set.
+ */
+int mlx5_vdpa_event_qp_create(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
+			      int callfd, struct mlx5_vdpa_event_qp *eqp);
+
+/**
+ * Destroy an event QP and all its related resources.
+ *
+ * @param[in,out] eqp
+ *   Pointer to the event QP structure.
+ */
+void mlx5_vdpa_event_qp_destroy(struct mlx5_vdpa_event_qp *eqp);
+
+/**
+ * Release all the event global resources.
+ *
+ * @param[in] priv
+ *   The vdpa driver private structure.
+ */
+void mlx5_vdpa_event_qp_global_release(struct mlx5_vdpa_priv *priv);
+
+/**
+ * Setup CQE event.
+ *
+ * @param[in] priv
+ *   The vdpa driver private structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int mlx5_vdpa_cqe_event_setup(struct mlx5_vdpa_priv *priv);
+
+/**
+ * Unset CQE event.
+ *
+ * @param[in] priv
+ *   The vdpa driver private structure.
+ */
+void mlx5_vdpa_cqe_event_unset(struct mlx5_vdpa_priv *priv);
+
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_event.c b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
new file mode 100644
index 0000000..c50e58e
--- /dev/null
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
@@ -0,0 +1,400 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2019 Mellanox Technologies, Ltd
+ */
+#include <unistd.h>
+#include <stdint.h>
+#include <fcntl.h>
+
+#include <rte_malloc.h>
+#include <rte_errno.h>
+#include <rte_lcore.h>
+#include <rte_atomic.h>
+#include <rte_common.h>
+#include <rte_io.h>
+
+#include <mlx5_common.h>
+
+#include "mlx5_vdpa_utils.h"
+#include "mlx5_vdpa.h"
+
+
+void
+mlx5_vdpa_event_qp_global_release(struct mlx5_vdpa_priv *priv)
+{
+	if (priv->uar) {
+		mlx5_glue->devx_free_uar(priv->uar);
+		priv->uar = NULL;
+	}
+	if (priv->eventc) {
+		mlx5_glue->devx_destroy_event_channel(priv->eventc);
+		priv->eventc = NULL;
+	}
+	priv->eqn = 0;
+}
+
+/* Prepare all the global resources for all the event objects. */
+static int
+mlx5_vdpa_event_qp_global_prepare(struct mlx5_vdpa_priv *priv)
+{
+	uint32_t lcore;
+
+	if (priv->eventc)
+		return 0;
+	lcore = (uint32_t)rte_lcore_to_cpu_id(-1);
+	if (mlx5_glue->devx_query_eqn(priv->ctx, lcore, &priv->eqn)) {
+		rte_errno = errno;
+		DRV_LOG(ERR, "Failed to query EQ number %d.", rte_errno);
+		return -1;
+	}
+	priv->eventc = mlx5_glue->devx_create_event_channel(priv->ctx,
+			   MLX5DV_DEVX_CREATE_EVENT_CHANNEL_FLAGS_OMIT_EV_DATA);
+	if (!priv->eventc) {
+		rte_errno = errno;
+		DRV_LOG(ERR, "Failed to create event channel %d.",
+			rte_errno);
+		goto error;
+	}
+	priv->uar = mlx5_glue->devx_alloc_uar(priv->ctx, 0);
+	if (!priv->uar) {
+		rte_errno = errno;
+		DRV_LOG(ERR, "Failed to allocate UAR.");
+		goto error;
+	}
+	return 0;
+error:
+	mlx5_vdpa_event_qp_global_release(priv);
+	return -1;
+}
+
+static void
+mlx5_vdpa_cq_destroy(struct mlx5_vdpa_cq *cq)
+{
+	if (cq->cq)
+		claim_zero(mlx5_devx_cmd_destroy(cq->cq));
+	if (cq->umem_obj)
+		claim_zero(mlx5_glue->devx_umem_dereg(cq->umem_obj));
+	if (cq->umem_buf)
+		rte_free((void *)(uintptr_t)cq->umem_buf);
+	memset(cq, 0, sizeof(*cq));
+}
+
+static inline void
+mlx5_vdpa_cq_arm(struct mlx5_vdpa_priv *priv, struct mlx5_vdpa_cq *cq)
+{
+	const unsigned int cqe_mask = (1 << cq->log_desc_n) - 1;
+	uint32_t arm_sn = cq->arm_sn << MLX5_CQ_SQN_OFFSET;
+	uint32_t cq_ci = cq->cq_ci & MLX5_CI_MASK & cqe_mask;
+	uint32_t doorbell_hi = arm_sn | MLX5_CQ_DBR_CMD_ALL | cq_ci;
+	uint64_t doorbell = ((uint64_t)doorbell_hi << 32) | cq->cq->id;
+	uint64_t db_be = rte_cpu_to_be_64(doorbell);
+	uint32_t *addr = RTE_PTR_ADD(priv->uar->base_addr, MLX5_CQ_DOORBELL);
+
+	rte_io_wmb();
+	cq->db_rec[MLX5_CQ_ARM_DB] = rte_cpu_to_be_32(doorbell_hi);
+	rte_wmb();
+#ifdef RTE_ARCH_64
+	*(uint64_t *)addr = db_be;
+#else
+	*(uint32_t *)addr = db_be;
+	rte_io_wmb();
+	*((uint32_t *)addr + 1) = db_be >> 32;
+#endif
+	cq->arm_sn++;
+}
+
+static int
+mlx5_vdpa_cq_create(struct mlx5_vdpa_priv *priv, uint16_t log_desc_n,
+		    int callfd, struct mlx5_vdpa_cq *cq)
+{
+	struct mlx5_devx_cq_attr attr;
+	size_t pgsize = sysconf(_SC_PAGESIZE);
+	uint32_t umem_size;
+	int ret;
+	uint16_t event_nums[1] = {0};
+
+	cq->log_desc_n = log_desc_n;
+	umem_size = sizeof(struct mlx5_cqe) * (1 << log_desc_n) +
+							sizeof(*cq->db_rec) * 2;
+	cq->umem_buf = rte_zmalloc(__func__, umem_size, 4096);
+	if (!cq->umem_buf) {
+		DRV_LOG(ERR, "Failed to allocate memory for CQ.");
+		rte_errno = ENOMEM;
+		return -ENOMEM;
+	}
+	cq->umem_obj = mlx5_glue->devx_umem_reg(priv->ctx,
+						(void *)(uintptr_t)cq->umem_buf,
+						umem_size,
+						IBV_ACCESS_LOCAL_WRITE);
+	if (!cq->umem_obj) {
+		DRV_LOG(ERR, "Failed to register umem for CQ.");
+		goto error;
+	}
+	attr.q_umem_valid = 1;
+	attr.db_umem_valid = 1;
+	attr.use_first_only = 0;
+	attr.overrun_ignore = 0;
+	attr.uar_page_id = priv->uar->page_id;
+	attr.q_umem_id = cq->umem_obj->umem_id;
+	attr.q_umem_offset = 0;
+	attr.db_umem_id = cq->umem_obj->umem_id;
+	attr.db_umem_offset = sizeof(struct mlx5_cqe) * (1 << log_desc_n);
+	attr.eqn = priv->eqn;
+	attr.log_cq_size = log_desc_n;
+	attr.log_page_size = rte_log2_u32(pgsize);
+	cq->cq = mlx5_devx_cmd_create_cq(priv->ctx, &attr);
+	if (!cq->cq)
+		goto error;
+	cq->db_rec = RTE_PTR_ADD(cq->umem_buf, (uintptr_t)attr.db_umem_offset);
+	cq->cq_ci = 0;
+	rte_spinlock_init(&cq->sl);
+	/* Subscribe CQ event to the event channel controlled by the driver. */
+	ret = mlx5_glue->devx_subscribe_devx_event(priv->eventc, cq->cq->obj,
+						   sizeof(event_nums),
+						   event_nums,
+						   (uint64_t)(uintptr_t)cq);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to subscribe CQE event.");
+		rte_errno = errno;
+		goto error;
+	}
+	/* Subscribe CQ event to the guest FD only if it is not in poll mode. */
+	if (callfd != -1) {
+		ret = mlx5_glue->devx_subscribe_devx_event_fd(priv->eventc,
+							      callfd,
+							      cq->cq->obj, 0);
+		if (ret) {
+			DRV_LOG(ERR, "Failed to subscribe CQE event fd.");
+			rte_errno = errno;
+			goto error;
+		}
+	}
+	/* First arming. */
+	mlx5_vdpa_cq_arm(priv, cq);
+	return 0;
+error:
+	mlx5_vdpa_cq_destroy(cq);
+	return -1;
+}
+
+static inline void __rte_unused
+mlx5_vdpa_cq_poll(struct mlx5_vdpa_priv *priv __rte_unused,
+		  struct mlx5_vdpa_cq *cq)
+{
+	struct mlx5_vdpa_event_qp *eqp =
+				container_of(cq, struct mlx5_vdpa_event_qp, cq);
+	const unsigned int cqe_size = 1 << cq->log_desc_n;
+	const unsigned int cqe_mask = cqe_size - 1;
+	int ret;
+
+	do {
+		volatile struct mlx5_cqe *cqe = cq->cqes + (cq->cq_ci &
+							    cqe_mask);
+
+		ret = check_cqe(cqe, cqe_size, cq->cq_ci);
+		switch (ret) {
+		case MLX5_CQE_STATUS_ERR:
+			cq->errors++;
+			/*fall-through*/
+		case MLX5_CQE_STATUS_SW_OWN:
+			cq->cq_ci++;
+			break;
+		case MLX5_CQE_STATUS_HW_OWN:
+		default:
+			break;
+		}
+	} while (ret != MLX5_CQE_STATUS_HW_OWN);
+	rte_io_wmb();
+	/* Ring CQ doorbell record. */
+	cq->db_rec[0] = rte_cpu_to_be_32(cq->cq_ci);
+	rte_io_wmb();
+	/* Ring SW QP doorbell record. */
+	eqp->db_rec[0] = rte_cpu_to_be_32(cq->cq_ci + cqe_size);
+}
+
+static void
+mlx5_vdpa_interrupt_handler(void *cb_arg)
+{
+#ifndef HAVE_IBV_DEVX_EVENT
+	(void)cb_arg;
+	return;
+#else
+	struct mlx5_vdpa_priv *priv = cb_arg;
+	union {
+		struct mlx5dv_devx_async_event_hdr event_resp;
+		uint8_t buf[sizeof(struct mlx5dv_devx_async_event_hdr) + 128];
+	} out;
+
+	while (mlx5_glue->devx_get_event(priv->eventc, &out.event_resp,
+					 sizeof(out.buf)) >=
+				       (ssize_t)sizeof(out.event_resp.cookie)) {
+		struct mlx5_vdpa_cq *cq = (struct mlx5_vdpa_cq *)
+					       (uintptr_t)out.event_resp.cookie;
+		rte_spinlock_lock(&cq->sl);
+		mlx5_vdpa_cq_poll(priv, cq);
+		mlx5_vdpa_cq_arm(priv, cq);
+		rte_spinlock_unlock(&cq->sl);
+		DRV_LOG(DEBUG, "CQ %d event: new cq_ci = %u.", cq->cq->id,
+			cq->cq_ci);
+	}
+#endif /* HAVE_IBV_DEVX_EVENT */
+}
+
+int
+mlx5_vdpa_cqe_event_setup(struct mlx5_vdpa_priv *priv)
+{
+	int flags = fcntl(priv->eventc->fd, F_GETFL);
+	int ret = fcntl(priv->eventc->fd, F_SETFL, flags | O_NONBLOCK);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to change event channel FD.");
+		rte_errno = errno;
+		return -rte_errno;
+	}
+	priv->intr_handle.fd = priv->eventc->fd;
+	priv->intr_handle.type = RTE_INTR_HANDLE_EXT;
+	if (rte_intr_callback_register(&priv->intr_handle,
+				       mlx5_vdpa_interrupt_handler, priv)) {
+		priv->intr_handle.fd = 0;
+		DRV_LOG(ERR, "Failed to register CQE interrupt %d.", rte_errno);
+		return -rte_errno;
+	}
+	return 0;
+}
+
+void
+mlx5_vdpa_cqe_event_unset(struct mlx5_vdpa_priv *priv)
+{
+	int retries = MLX5_VDPA_INTR_RETRIES;
+	int ret = -EAGAIN;
+
+	if (priv->intr_handle.fd) {
+		while (retries-- && ret == -EAGAIN) {
+			ret = rte_intr_callback_unregister(&priv->intr_handle,
+						    mlx5_vdpa_interrupt_handler,
+						    priv);
+			if (ret == -EAGAIN) {
+				DRV_LOG(DEBUG, "Try again to unregister fd %d "
+					"of CQ interrupt, retries = %d.",
+					priv->intr_handle.fd, retries);
+				usleep(MLX5_VDPA_INTR_RETRIES_USEC);
+			}
+		}
+		memset(&priv->intr_handle, 0, sizeof(priv->intr_handle));
+	}
+}
+
+void
+mlx5_vdpa_event_qp_destroy(struct mlx5_vdpa_event_qp *eqp)
+{
+	if (eqp->sw_qp)
+		claim_zero(mlx5_devx_cmd_destroy(eqp->sw_qp));
+	if (eqp->umem_obj)
+		claim_zero(mlx5_glue->devx_umem_dereg(eqp->umem_obj));
+	if (eqp->umem_buf)
+		rte_free(eqp->umem_buf);
+	if (eqp->fw_qp)
+		claim_zero(mlx5_devx_cmd_destroy(eqp->fw_qp));
+	mlx5_vdpa_cq_destroy(&eqp->cq);
+	memset(eqp, 0, sizeof(*eqp));
+}
+
+static int
+mlx5_vdpa_qps2rts(struct mlx5_vdpa_event_qp *eqp)
+{
+	if (mlx5_devx_cmd_modify_qp_state(eqp->fw_qp, MLX5_CMD_OP_RST2INIT_QP,
+					  eqp->sw_qp->id)) {
+		DRV_LOG(ERR, "Failed to modify FW QP to INIT state(%u).",
+			rte_errno);
+		return -1;
+	}
+	if (mlx5_devx_cmd_modify_qp_state(eqp->sw_qp, MLX5_CMD_OP_RST2INIT_QP,
+					  eqp->fw_qp->id)) {
+		DRV_LOG(ERR, "Failed to modify SW QP to INIT state(%u).",
+			rte_errno);
+		return -1;
+	}
+	if (mlx5_devx_cmd_modify_qp_state(eqp->fw_qp, MLX5_CMD_OP_INIT2RTR_QP,
+					  eqp->sw_qp->id)) {
+		DRV_LOG(ERR, "Failed to modify FW QP to RTR state(%u).",
+			rte_errno);
+		return -1;
+	}
+	if (mlx5_devx_cmd_modify_qp_state(eqp->sw_qp, MLX5_CMD_OP_INIT2RTR_QP,
+					  eqp->fw_qp->id)) {
+		DRV_LOG(ERR, "Failed to modify SW QP to RTR state(%u).",
+			rte_errno);
+		return -1;
+	}
+	if (mlx5_devx_cmd_modify_qp_state(eqp->fw_qp, MLX5_CMD_OP_RTR2RTS_QP,
+					  eqp->sw_qp->id)) {
+		DRV_LOG(ERR, "Failed to modify FW QP to RTS state(%u).",
+			rte_errno);
+		return -1;
+	}
+	if (mlx5_devx_cmd_modify_qp_state(eqp->sw_qp, MLX5_CMD_OP_RTR2RTS_QP,
+					  eqp->fw_qp->id)) {
+		DRV_LOG(ERR, "Failed to modify SW QP to RTS state(%u).",
+			rte_errno);
+		return -1;
+	}
+	return 0;
+}
+
+int
+mlx5_vdpa_event_qp_create(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
+			  int callfd, struct mlx5_vdpa_event_qp *eqp)
+{
+	struct mlx5_devx_qp_attr attr = {0};
+	uint16_t log_desc_n = rte_log2_u32(desc_n);
+	uint32_t umem_size = (1 << log_desc_n) * MLX5_WSEG_SIZE +
+						       sizeof(*eqp->db_rec) * 2;
+
+	if (mlx5_vdpa_event_qp_global_prepare(priv))
+		return -1;
+	if (mlx5_vdpa_cq_create(priv, log_desc_n, callfd, &eqp->cq))
+		return -1;
+	attr.pd = priv->pdn;
+	eqp->fw_qp = mlx5_devx_cmd_create_qp(priv->ctx, &attr);
+	if (!eqp->fw_qp) {
+		DRV_LOG(ERR, "Failed to create FW QP(%u).", rte_errno);
+		goto error;
+	}
+	eqp->umem_buf = rte_zmalloc(__func__, umem_size, 4096);
+	if (!eqp->umem_buf) {
+		DRV_LOG(ERR, "Failed to allocate memory for SW QP.");
+		rte_errno = ENOMEM;
+		goto error;
+	}
+	eqp->umem_obj = mlx5_glue->devx_umem_reg(priv->ctx,
+					       (void *)(uintptr_t)eqp->umem_buf,
+					       umem_size,
+					       IBV_ACCESS_LOCAL_WRITE);
+	if (!eqp->umem_obj) {
+		DRV_LOG(ERR, "Failed to register umem for SW QP.");
+		goto error;
+	}
+	attr.uar_index = priv->uar->page_id;
+	attr.cqn = eqp->cq.cq->id;
+	attr.log_page_size = rte_log2_u32(sysconf(_SC_PAGESIZE));
+	attr.rq_size = 1 << log_desc_n;
+	attr.log_rq_stride = rte_log2_u32(MLX5_WSEG_SIZE);
+	attr.sq_size = 0; /* SQ is not needed. */
+	attr.dbr_umem_valid = 1;
+	attr.wq_umem_id = eqp->umem_obj->umem_id;
+	attr.wq_umem_offset = 0;
+	attr.dbr_umem_id = eqp->umem_obj->umem_id;
+	attr.dbr_address = (1 << log_desc_n) * MLX5_WSEG_SIZE;
+	eqp->sw_qp = mlx5_devx_cmd_create_qp(priv->ctx, &attr);
+	if (!eqp->sw_qp) {
+		DRV_LOG(ERR, "Failed to create SW QP(%u).", rte_errno);
+		goto error;
+	}
+	eqp->db_rec = RTE_PTR_ADD(eqp->umem_buf, (uintptr_t)attr.dbr_address);
+	if (mlx5_vdpa_qps2rts(eqp))
+		goto error;
+	/* First ringing. */
+	rte_write32(rte_cpu_to_be_32(1 << log_desc_n), &eqp->db_rec[0]);
+	return 0;
+error:
+	mlx5_vdpa_event_qp_destroy(eqp);
+	return -1;
+}
-- 
1.8.3.1
* [dpdk-dev] [PATCH v3 06/13] vdpa/mlx5: prepare virtio queues
  2020-02-02 16:03   ` [dpdk-dev] [PATCH v3 00/13] Introduce mlx5 vDPA driver Matan Azrad
                       ` (4 preceding siblings ...)
  2020-02-02 16:03     ` [dpdk-dev] [PATCH v3 05/13] vdpa/mlx5: prepare HW queues Matan Azrad
@ 2020-02-02 16:03     ` Matan Azrad
  2020-02-02 16:03     ` [dpdk-dev] [PATCH v3 07/13] vdpa/mlx5: support stateless offloads Matan Azrad
                       ` (8 subsequent siblings)
  14 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-02-02 16:03 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Maxime Coquelin
The HW virtq object represents an emulated context for a VIRTIO_NET
virtqueue which was created and managed by a VIRTIO_NET driver as
defined in the VIRTIO Specification.
Add support to prepare and release all the basic HW resources needed
for the user virtqs emulation according to the rte_vhost configurations.
This patch prepares the basic configurations needed by DevX commands to
create a virtq.
Add new file mlx5_vdpa_virtq.c to manage virtq operations.
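Two small calculations sit at the core of the virtq setup in this patch:
translating a host virtual address (HVA) of a vring into a guest physical
address (GPA) by scanning the vhost memory regions, and sizing each of the
three per-virtq umems as a * queue_size + b from the device capabilities.
The sketch below models both in isolation. The sketch_* names and the reduced
region struct are hypothetical stand-ins for the rte_vhost and driver types.
As in the patch, a return value of 0 from the translation is treated as
"not found", which assumes no ring is legitimately placed at GPA 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced view of a vhost memory region. */
struct sketch_mem_region {
	uint64_t guest_phys_addr;
	uint64_t host_user_addr;
	uint64_t size;
};

/* Return the GPA matching hva, or 0 when no region contains it. */
static uint64_t
sketch_hva_to_gpa(const struct sketch_mem_region *regs, size_t n,
		  uint64_t hva)
{
	size_t i;

	assert(regs != NULL || n == 0);
	for (i = 0; i < n; i++) {
		const struct sketch_mem_region *r = &regs[i];

		/* Offset comparison avoids overflow of base + size. */
		if (hva >= r->host_user_addr &&
		    hva - r->host_user_addr < r->size)
			return hva - r->host_user_addr + r->guest_phys_addr;
	}
	return 0;
}

/* Umem size for one virtq: linear in the queue size. */
static uint32_t
sketch_umem_size(uint32_t a, uint32_t b, uint16_t vq_size)
{
	return a * vq_size + b;
}
```

The driver performs this translation once per ring (descriptor, used,
available) and fails the virtq setup if any of them cannot be resolved.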
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/vdpa/mlx5/Makefile          |   1 +
 drivers/vdpa/mlx5/meson.build       |   1 +
 drivers/vdpa/mlx5/mlx5_vdpa.c       |   1 +
 drivers/vdpa/mlx5/mlx5_vdpa.h       |  36 ++++++
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 212 ++++++++++++++++++++++++++++++++++++
 5 files changed, 251 insertions(+)
 create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
diff --git a/drivers/vdpa/mlx5/Makefile b/drivers/vdpa/mlx5/Makefile
index 086af1b..1b24400 100644
--- a/drivers/vdpa/mlx5/Makefile
+++ b/drivers/vdpa/mlx5/Makefile
@@ -10,6 +10,7 @@ LIB = librte_pmd_mlx5_vdpa.a
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa_mem.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa_event.c
+SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa_virtq.c
 
 # Basic CFLAGS.
 CFLAGS += -O3
diff --git a/drivers/vdpa/mlx5/meson.build b/drivers/vdpa/mlx5/meson.build
index 3da0d76..732ddce 100644
--- a/drivers/vdpa/mlx5/meson.build
+++ b/drivers/vdpa/mlx5/meson.build
@@ -14,6 +14,7 @@ sources = files(
 	'mlx5_vdpa.c',
 	'mlx5_vdpa_mem.c',
 	'mlx5_vdpa_event.c',
+	'mlx5_vdpa_virtq.c',
 )
 cflags_options = [
 	'-std=c11',
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index 16107cf..d76c3aa 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -230,6 +230,7 @@
 		goto error;
 	}
 	SLIST_INIT(&priv->mr_list);
+	SLIST_INIT(&priv->virtq_list);
 	pthread_mutex_lock(&priv_list_lock);
 	TAILQ_INSERT_TAIL(&priv_list, priv, next);
 	pthread_mutex_unlock(&priv_list_lock);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 6282635..9284420 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -59,6 +59,19 @@ struct mlx5_vdpa_query_mr {
 	int is_indirect;
 };
 
+struct mlx5_vdpa_virtq {
+	SLIST_ENTRY(mlx5_vdpa_virtq) next;
+	uint16_t index;
+	uint16_t vq_size;
+	struct mlx5_devx_obj *virtq;
+	struct mlx5_vdpa_event_qp eqp;
+	struct {
+		struct mlx5dv_devx_umem *obj;
+		void *buf;
+		uint32_t size;
+	} umems[3];
+};
+
 struct mlx5_vdpa_priv {
 	TAILQ_ENTRY(mlx5_vdpa_priv) next;
 	int id; /* vDPA device id. */
@@ -75,6 +88,10 @@ struct mlx5_vdpa_priv {
 	struct mlx5dv_devx_event_channel *eventc;
 	struct mlx5dv_devx_uar *uar;
 	struct rte_intr_handle intr_handle;
+	struct mlx5_devx_obj *td;
+	struct mlx5_devx_obj *tis;
+	uint16_t nr_virtqs;
+	SLIST_HEAD(virtq_list, mlx5_vdpa_virtq) virtq_list;
 	SLIST_HEAD(mr_list, mlx5_vdpa_query_mr) mr_list;
 };
 
@@ -152,4 +169,23 @@ int mlx5_vdpa_event_qp_create(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
  */
 void mlx5_vdpa_cqe_event_unset(struct mlx5_vdpa_priv *priv);
 
+/**
+ * Release all the virtqs and all their related resources.
+ *
+ * @param[in] priv
+ *   The vdpa driver private structure.
+ */
+void mlx5_vdpa_virtqs_release(struct mlx5_vdpa_priv *priv);
+
+/**
+ * Create all the HW virtqs resources and all their related resources.
+ *
+ * @param[in] priv
+ *   The vdpa driver private structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv);
+
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
new file mode 100644
index 0000000..781bccf
--- /dev/null
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -0,0 +1,212 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2019 Mellanox Technologies, Ltd
+ */
+#include <string.h>
+
+#include <rte_malloc.h>
+#include <rte_errno.h>
+
+#include <mlx5_common.h>
+
+#include "mlx5_vdpa_utils.h"
+#include "mlx5_vdpa.h"
+
+
+static int
+mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
+{
+	int i;
+
+	if (virtq->virtq) {
+		claim_zero(mlx5_devx_cmd_destroy(virtq->virtq));
+		virtq->virtq = NULL;
+	}
+	for (i = 0; i < 3; ++i) {
+		if (virtq->umems[i].obj)
+			claim_zero(mlx5_glue->devx_umem_dereg
+							 (virtq->umems[i].obj));
+		if (virtq->umems[i].buf)
+			rte_free(virtq->umems[i].buf);
+	}
+	memset(&virtq->umems, 0, sizeof(virtq->umems));
+	if (virtq->eqp.fw_qp)
+		mlx5_vdpa_event_qp_destroy(&virtq->eqp);
+	return 0;
+}
+
+void
+mlx5_vdpa_virtqs_release(struct mlx5_vdpa_priv *priv)
+{
+	struct mlx5_vdpa_virtq *entry;
+	struct mlx5_vdpa_virtq *next;
+
+	entry = SLIST_FIRST(&priv->virtq_list);
+	while (entry) {
+		next = SLIST_NEXT(entry, next);
+		mlx5_vdpa_virtq_unset(entry);
+		SLIST_REMOVE(&priv->virtq_list, entry, mlx5_vdpa_virtq, next);
+		rte_free(entry);
+		entry = next;
+	}
+	SLIST_INIT(&priv->virtq_list);
+	if (priv->tis) {
+		claim_zero(mlx5_devx_cmd_destroy(priv->tis));
+		priv->tis = NULL;
+	}
+	if (priv->td) {
+		claim_zero(mlx5_devx_cmd_destroy(priv->td));
+		priv->td = NULL;
+	}
+}
+
+static uint64_t
+mlx5_vdpa_hva_to_gpa(struct rte_vhost_memory *mem, uint64_t hva)
+{
+	struct rte_vhost_mem_region *reg;
+	uint32_t i;
+	uint64_t gpa = 0;
+
+	for (i = 0; i < mem->nregions; i++) {
+		reg = &mem->regions[i];
+		if (hva >= reg->host_user_addr &&
+		    hva < reg->host_user_addr + reg->size) {
+			gpa = hva - reg->host_user_addr + reg->guest_phys_addr;
+			break;
+		}
+	}
+	return gpa;
+}
+
+static int
+mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv,
+		      struct mlx5_vdpa_virtq *virtq, int index)
+{
+	struct rte_vhost_vring vq;
+	struct mlx5_devx_virtq_attr attr = {0};
+	uint64_t gpa;
+	int ret;
+	int i;
+	uint16_t last_avail_idx;
+	uint16_t last_used_idx;
+
+	ret = rte_vhost_get_vhost_vring(priv->vid, index, &vq);
+	if (ret)
+		return -1;
+	virtq->index = index;
+	virtq->vq_size = vq.size;
+	/*
+	 * Event QPs are not needed when the guest is in poll mode and the
+	 * capability allows it.
+	 */
+	attr.event_mode = vq.callfd != -1 || !(priv->caps.event_mode & (1 <<
+					       MLX5_VIRTQ_EVENT_MODE_NO_MSIX)) ?
+						      MLX5_VIRTQ_EVENT_MODE_QP :
+						  MLX5_VIRTQ_EVENT_MODE_NO_MSIX;
+	if (attr.event_mode == MLX5_VIRTQ_EVENT_MODE_QP) {
+		ret = mlx5_vdpa_event_qp_create(priv, vq.size, vq.callfd,
+						&virtq->eqp);
+		if (ret) {
+			DRV_LOG(ERR, "Failed to create event QPs for virtq %d.",
+				index);
+			return -1;
+		}
+		attr.qp_id = virtq->eqp.fw_qp->id;
+	} else {
+		DRV_LOG(INFO, "Virtq %d is, for sure, working by poll mode, no"
+			" need event QPs and event mechanism.", index);
+	}
+	/* Setup 3 UMEMs for each virtq. */
+	for (i = 0; i < 3; ++i) {
+		virtq->umems[i].size = priv->caps.umems[i].a * vq.size +
+							  priv->caps.umems[i].b;
+		virtq->umems[i].buf = rte_zmalloc(__func__,
+						  virtq->umems[i].size, 4096);
+		if (!virtq->umems[i].buf) {
+			DRV_LOG(ERR, "Cannot allocate umem %d memory for virtq"
+				" %u.", i, index);
+			goto error;
+		}
+		virtq->umems[i].obj = mlx5_glue->devx_umem_reg(priv->ctx,
+							virtq->umems[i].buf,
+							virtq->umems[i].size,
+							IBV_ACCESS_LOCAL_WRITE);
+		if (!virtq->umems[i].obj) {
+			DRV_LOG(ERR, "Failed to register umem %d for virtq %u.",
+				i, index);
+			goto error;
+		}
+		attr.umems[i].id = virtq->umems[i].obj->umem_id;
+		attr.umems[i].offset = 0;
+		attr.umems[i].size = virtq->umems[i].size;
+	}
+	gpa = mlx5_vdpa_hva_to_gpa(priv->vmem, (uint64_t)(uintptr_t)vq.desc);
+	if (!gpa) {
+		DRV_LOG(ERR, "Failed to get GPA for descriptor ring.");
+		goto error;
+	}
+	attr.desc_addr = gpa;
+	gpa = mlx5_vdpa_hva_to_gpa(priv->vmem, (uint64_t)(uintptr_t)vq.used);
+	if (!gpa) {
+		DRV_LOG(ERR, "Fail to get GPA for used ring.");
+		goto error;
+	}
+	attr.used_addr = gpa;
+	gpa = mlx5_vdpa_hva_to_gpa(priv->vmem, (uint64_t)(uintptr_t)vq.avail);
+	if (!gpa) {
+		DRV_LOG(ERR, "Fail to get GPA for available ring.");
+		goto error;
+	}
+	attr.available_addr = gpa;
+	rte_vhost_get_vring_base(priv->vid, index, &last_avail_idx,
+				 &last_used_idx);
+	DRV_LOG(INFO, "vid %d: Init last_avail_idx=%d, last_used_idx=%d for "
+		"virtq %d.", priv->vid, last_avail_idx, last_used_idx, index);
+	attr.hw_available_index = last_avail_idx;
+	attr.hw_used_index = last_used_idx;
+	attr.q_size = vq.size;
+	attr.mkey = priv->gpa_mkey_index;
+	attr.tis_id = priv->tis->id;
+	attr.queue_index = index;
+	virtq->virtq = mlx5_devx_cmd_create_virtq(priv->ctx, &attr);
+	if (!virtq->virtq)
+		goto error;
+	return 0;
+error:
+	mlx5_vdpa_virtq_unset(virtq);
+	return -1;
+}
+
+int
+mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
+{
+	struct mlx5_devx_tis_attr tis_attr = {0};
+	struct mlx5_vdpa_virtq *virtq;
+	uint32_t i;
+	uint16_t nr_vring = rte_vhost_get_vring_num(priv->vid);
+
+	priv->td = mlx5_devx_cmd_create_td(priv->ctx);
+	if (!priv->td) {
+		DRV_LOG(ERR, "Failed to create transport domain.");
+		return -rte_errno;
+	}
+	tis_attr.transport_domain = priv->td->id;
+	priv->tis = mlx5_devx_cmd_create_tis(priv->ctx, &tis_attr);
+	if (!priv->tis) {
+		DRV_LOG(ERR, "Failed to create TIS.");
+		goto error;
+	}
+	for (i = 0; i < nr_vring; i++) {
+		virtq = rte_zmalloc(__func__, sizeof(*virtq), 0);
+		if (!virtq || mlx5_vdpa_virtq_setup(priv, virtq, i)) {
+			if (virtq)
+				rte_free(virtq);
+			goto error;
+		}
+		SLIST_INSERT_HEAD(&priv->virtq_list, virtq, next);
+	}
+	priv->nr_virtqs = nr_vring;
+	return 0;
+error:
+	mlx5_vdpa_virtqs_release(priv);
+	return -1;
+}
-- 
1.8.3.1
* [dpdk-dev] [PATCH v3 07/13] vdpa/mlx5: support stateless offloads
  2020-02-02 16:03   ` [dpdk-dev] [PATCH v3 00/13] Introduce mlx5 vDPA driver Matan Azrad
                       ` (5 preceding siblings ...)
  2020-02-02 16:03     ` [dpdk-dev] [PATCH v3 06/13] vdpa/mlx5: prepare virtio queues Matan Azrad
@ 2020-02-02 16:03     ` Matan Azrad
  2020-02-02 16:03     ` [dpdk-dev] [PATCH v3 08/13] vdpa/mlx5: add basic steering configurations Matan Azrad
                       ` (7 subsequent siblings)
  14 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-02-02 16:03 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Maxime Coquelin
Add support for the following features in virtq configuration:
	VIRTIO_F_RING_PACKED,
	VIRTIO_NET_F_HOST_TSO4,
	VIRTIO_NET_F_HOST_TSO6,
	VIRTIO_NET_F_CSUM,
	VIRTIO_NET_F_GUEST_CSUM,
	VIRTIO_F_VERSION_1.
Support for these features depends on the DevX capabilities reported by
the device.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
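Note: the per-feature capability check described above can also be
sketched in a table-driven form. This is only an illustration under
stated assumptions: struct sketch_caps and sketch_features_validate()
are hypothetical names that do not exist in the driver, and the feature
bit positions are copied from the virtio specification rather than taken
from <linux/virtio_net.h>.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Feature bit positions as defined by the virtio specification. */
#define SKETCH_F_CSUM        0  /* VIRTIO_NET_F_CSUM */
#define SKETCH_F_GUEST_CSUM  1  /* VIRTIO_NET_F_GUEST_CSUM */
#define SKETCH_F_HOST_TSO4   11 /* VIRTIO_NET_F_HOST_TSO4 */
#define SKETCH_F_HOST_TSO6   12 /* VIRTIO_NET_F_HOST_TSO6 */
#define SKETCH_F_VERSION_1   32 /* VIRTIO_F_VERSION_1 */
#define SKETCH_F_RING_PACKED 34 /* VIRTIO_F_RING_PACKED */

/* Hypothetical stand-in for the DevX capabilities, not the driver struct. */
struct sketch_caps {
	uint8_t tso_ipv4;
	uint8_t tso_ipv6;
	uint8_t tx_csum;
	uint8_t rx_csum;
	uint8_t virtio_version_1_0;
	uint8_t packed;
};

/* Return 0 if every negotiated feature is backed by a capability. */
static int
sketch_features_validate(uint64_t features, const struct sketch_caps *caps)
{
	const struct {
		unsigned int bit;  /* Negotiated feature bit. */
		uint8_t supported; /* Matching device capability. */
	} map[] = {
		{ SKETCH_F_RING_PACKED, caps->packed },
		{ SKETCH_F_HOST_TSO4, caps->tso_ipv4 },
		{ SKETCH_F_HOST_TSO6, caps->tso_ipv6 },
		{ SKETCH_F_CSUM, caps->tx_csum },
		{ SKETCH_F_GUEST_CSUM, caps->rx_csum },
		{ SKETCH_F_VERSION_1, caps->virtio_version_1_0 },
	};
	size_t i;

	for (i = 0; i < sizeof(map) / sizeof(map[0]); ++i)
		if ((features & (1ULL << map[i].bit)) && !map[i].supported)
			return -ENOTSUP;
	return 0;
}
```

The patch itself keeps one explicit check per feature so that each
failure gets its own log message; the table form trades that for brevity.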
 doc/guides/vdpadevs/features/mlx5.ini |   7 ++-
 drivers/vdpa/mlx5/mlx5_vdpa.c         |  10 ----
 drivers/vdpa/mlx5/mlx5_vdpa.h         |  10 ++++
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   | 108 ++++++++++++++++++++++++++++------
 4 files changed, 107 insertions(+), 28 deletions(-)
diff --git a/doc/guides/vdpadevs/features/mlx5.ini b/doc/guides/vdpadevs/features/mlx5.ini
index fea491d..e4ee34b 100644
--- a/doc/guides/vdpadevs/features/mlx5.ini
+++ b/doc/guides/vdpadevs/features/mlx5.ini
@@ -4,10 +4,15 @@
 ; Refer to default.ini for the full list of available driver features.
 ;
 [Features]
-
+csum                 = Y
+guest csum           = Y
+host tso4            = Y
+host tso6            = Y
+version 1            = Y
 any layout           = Y
 guest announce       = Y
 mq                   = Y
+packed               = Y
 proto mq             = Y
 proto log shmfd      = Y
 proto host notifier  = Y
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index d76c3aa..f625b5e 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -1,8 +1,6 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright 2019 Mellanox Technologies, Ltd
  */
-#include <linux/virtio_net.h>
-
 #include <rte_malloc.h>
 #include <rte_log.h>
 #include <rte_errno.h>
@@ -17,14 +15,6 @@
 #include "mlx5_vdpa.h"
 
 
-#ifndef VIRTIO_F_ORDER_PLATFORM
-#define VIRTIO_F_ORDER_PLATFORM 36
-#endif
-
-#ifndef VIRTIO_F_RING_PACKED
-#define VIRTIO_F_RING_PACKED 34
-#endif
-
 #define MLX5_VDPA_DEFAULT_FEATURES ((1ULL << VHOST_USER_F_PROTOCOL_FEATURES) | \
 			    (1ULL << VIRTIO_F_ANY_LAYOUT) | \
 			    (1ULL << VIRTIO_NET_F_MQ) | \
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 9284420..02cf139 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -5,6 +5,7 @@
 #ifndef RTE_PMD_MLX5_VDPA_H_
 #define RTE_PMD_MLX5_VDPA_H_
 
+#include <linux/virtio_net.h>
 #include <sys/queue.h>
 
 #ifdef PEDANTIC
@@ -26,6 +27,14 @@
 #define MLX5_VDPA_INTR_RETRIES 256
 #define MLX5_VDPA_INTR_RETRIES_USEC 1000
 
+#ifndef VIRTIO_F_ORDER_PLATFORM
+#define VIRTIO_F_ORDER_PLATFORM 36
+#endif
+
+#ifndef VIRTIO_F_RING_PACKED
+#define VIRTIO_F_RING_PACKED 34
+#endif
+
 struct mlx5_vdpa_cq {
 	uint16_t log_desc_n;
 	uint32_t cq_ci:24;
@@ -91,6 +100,7 @@ struct mlx5_vdpa_priv {
 	struct mlx5_devx_obj *td;
 	struct mlx5_devx_obj *tis;
 	uint16_t nr_virtqs;
+	uint64_t features; /* Negotiated features. */
 	SLIST_HEAD(virtq_list, mlx5_vdpa_virtq) virtq_list;
 	SLIST_HEAD(mr_list, mlx5_vdpa_query_mr) mr_list;
 };
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 781bccf..e27af28 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -57,6 +57,7 @@
 		claim_zero(mlx5_devx_cmd_destroy(priv->td));
 		priv->td = NULL;
 	}
+	priv->features = 0;
 }
 
 static uint64_t
@@ -94,6 +95,14 @@
 		return -1;
 	virtq->index = index;
 	virtq->vq_size = vq.size;
+	attr.tso_ipv4 = !!(priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO4));
+	attr.tso_ipv6 = !!(priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO6));
+	attr.tx_csum = !!(priv->features & (1ULL << VIRTIO_NET_F_CSUM));
+	attr.rx_csum = !!(priv->features & (1ULL << VIRTIO_NET_F_GUEST_CSUM));
+	attr.virtio_version_1_0 = !!(priv->features & (1ULL <<
+							VIRTIO_F_VERSION_1));
+	attr.type = (priv->features & (1ULL << VIRTIO_F_RING_PACKED)) ?
+			MLX5_VIRTQ_TYPE_PACKED : MLX5_VIRTQ_TYPE_SPLIT;
 	/*
 	 * No need event QPs creation when the guest in poll mode or when the
 	 * capability allows it.
@@ -139,24 +148,29 @@
 		attr.umems[i].offset = 0;
 		attr.umems[i].size = virtq->umems[i].size;
 	}
-	gpa = mlx5_vdpa_hva_to_gpa(priv->vmem, (uint64_t)(uintptr_t)vq.desc);
-	if (!gpa) {
-		DRV_LOG(ERR, "Fail to get GPA for descriptor ring.");
-		goto error;
-	}
-	attr.desc_addr = gpa;
-	gpa = mlx5_vdpa_hva_to_gpa(priv->vmem, (uint64_t)(uintptr_t)vq.used);
-	if (!gpa) {
-		DRV_LOG(ERR, "Fail to get GPA for used ring.");
-		goto error;
-	}
-	attr.used_addr = gpa;
-	gpa = mlx5_vdpa_hva_to_gpa(priv->vmem, (uint64_t)(uintptr_t)vq.avail);
-	if (!gpa) {
-		DRV_LOG(ERR, "Fail to get GPA for available ring.");
-		goto error;
+	if (attr.type == MLX5_VIRTQ_TYPE_SPLIT) {
+		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem,
+					   (uint64_t)(uintptr_t)vq.desc);
+		if (!gpa) {
+			DRV_LOG(ERR, "Failed to get descriptor ring GPA.");
+			goto error;
+		}
+		attr.desc_addr = gpa;
+		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem,
+					   (uint64_t)(uintptr_t)vq.used);
+		if (!gpa) {
+			DRV_LOG(ERR, "Failed to get GPA for used ring.");
+			goto error;
+		}
+		attr.used_addr = gpa;
+		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem,
+					   (uint64_t)(uintptr_t)vq.avail);
+		if (!gpa) {
+			DRV_LOG(ERR, "Failed to get GPA for available ring.");
+			goto error;
+		}
+		attr.available_addr = gpa;
 	}
-	attr.available_addr = gpa;
 	rte_vhost_get_vring_base(priv->vid, index, &last_avail_idx,
 				 &last_used_idx);
 	DRV_LOG(INFO, "vid %d: Init last_avail_idx=%d, last_used_idx=%d for "
@@ -176,6 +190,61 @@
 	return -1;
 }
 
+static int
+mlx5_vdpa_features_validate(struct mlx5_vdpa_priv *priv)
+{
+	if (priv->features & (1ULL << VIRTIO_F_RING_PACKED)) {
+		if (!(priv->caps.virtio_queue_type & (1 <<
+						     MLX5_VIRTQ_TYPE_PACKED))) {
+			DRV_LOG(ERR, "Failed to configure PACKED mode for vdev "
+				"%d - it was not reported by HW/driver"
+				" capability.", priv->vid);
+			return -ENOTSUP;
+		}
+	}
+	if (priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO4)) {
+		if (!priv->caps.tso_ipv4) {
+			DRV_LOG(ERR, "Failed to enable TSO4 for vdev %d - TSO4"
+				" was not reported by HW/driver capability.",
+				priv->vid);
+			return -ENOTSUP;
+		}
+	}
+	if (priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO6)) {
+		if (!priv->caps.tso_ipv6) {
+			DRV_LOG(ERR, "Failed to enable TSO6 for vdev %d - TSO6"
+				" was not reported by HW/driver capability.",
+				priv->vid);
+			return -ENOTSUP;
+		}
+	}
+	if (priv->features & (1ULL << VIRTIO_NET_F_CSUM)) {
+		if (!priv->caps.tx_csum) {
+			DRV_LOG(ERR, "Failed to enable CSUM for vdev %d - CSUM"
+				" was not reported by HW/driver capability.",
+				priv->vid);
+			return -ENOTSUP;
+		}
+	}
+	if (priv->features & (1ULL << VIRTIO_NET_F_GUEST_CSUM)) {
+		if (!priv->caps.rx_csum) {
+			DRV_LOG(ERR, "Failed to enable GUEST CSUM for vdev %d"
+				" - GUEST CSUM was not reported by HW/driver "
+				"capability.", priv->vid);
+			return -ENOTSUP;
+		}
+	}
+	if (priv->features & (1ULL << VIRTIO_F_VERSION_1)) {
+		if (!priv->caps.virtio_version_1_0) {
+			DRV_LOG(ERR, "Failed to enable version 1 for vdev %d "
+				"- version 1 was not reported by HW/driver"
+				" capability.", priv->vid);
+			return -ENOTSUP;
+		}
+	}
+	return 0;
+}
+
 int
 mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
 {
@@ -183,7 +252,12 @@
 	struct mlx5_vdpa_virtq *virtq;
 	uint32_t i;
 	uint16_t nr_vring = rte_vhost_get_vring_num(priv->vid);
+	int ret = rte_vhost_get_negotiated_features(priv->vid, &priv->features);
 
+	if (ret || mlx5_vdpa_features_validate(priv)) {
+		DRV_LOG(ERR, "Failed to configure negotiated features.");
+		return -1;
+	}
 	priv->td = mlx5_devx_cmd_create_td(priv->ctx);
 	if (!priv->td) {
 		DRV_LOG(ERR, "Failed to create transport domain.");
-- 
1.8.3.1
* [dpdk-dev] [PATCH v3 08/13] vdpa/mlx5: add basic steering configurations
  2020-02-02 16:03   ` [dpdk-dev] [PATCH v3 00/13] Introduce mlx5 vDPA driver Matan Azrad
                       ` (6 preceding siblings ...)
  2020-02-02 16:03     ` [dpdk-dev] [PATCH v3 07/13] vdpa/mlx5: support stateless offloads Matan Azrad
@ 2020-02-02 16:03     ` Matan Azrad
  2020-02-02 16:03     ` [dpdk-dev] [PATCH v3 09/13] vdpa/mlx5: support queue state operation Matan Azrad
                       ` (6 subsequent siblings)
  14 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-02-02 16:03 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Maxime Coquelin
Add a steering object to be managed by a new file mlx5_vdpa_steer.c.
Allow a promiscuous flow to scatter the device Rx packets to the virtio
queues using an RSS action.
In order to allow correct RSS in L3 and L4, split the flow into 7 flows
as required by the device.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
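Note: the RSS indirection table (RQT) is filled with the receive queues
and then padded by repeating them until the table is full. A minimal
sketch of that idea follows, assuming the virtio-net queue numbering
(even index = receive queue, last odd-count index = control queue). The
names sketch_is_recvq() and sketch_rqt_fill() are hypothetical, and the
sketch stores plain queue indexes where the driver stores DevX object
ids. Unlike the loop in the patch, it returns early when no receive
queue exists and never writes past rqt_n entries.

```c
#include <stdint.h>

/* Even indexes are receive queues; index nr_vring - 1 is excluded. */
static int
sketch_is_recvq(unsigned int index, unsigned int nr_vring)
{
	return index % 2 == 0 && index != nr_vring - 1;
}

/*
 * Fill rq_list[0..rqt_n) by cycling through the receive queue indexes.
 * Return the number of distinct receive queues found, 0 if there is none
 * (in which case rq_list is left untouched).
 */
static uint32_t
sketch_rqt_fill(uint32_t *rq_list, uint32_t rqt_n, unsigned int nr_vring)
{
	uint32_t n = 0;
	uint32_t i;
	unsigned int q;

	/* Collect the receive queues, never exceeding the table size. */
	for (q = 0; q < nr_vring && n < rqt_n; ++q)
		if (sketch_is_recvq(q, nr_vring))
			rq_list[n++] = q;
	if (n == 0)
		return 0;
	/* Pad the rest of the table by repeating the collected entries. */
	for (i = n; i < rqt_n; ++i)
		rq_list[i] = rq_list[i - n];
	return n;
}
```

With 5 vrings (two queue pairs plus a control queue) and an 8-entry
table, the table cycles through queues 0 and 2.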
 drivers/vdpa/mlx5/Makefile          |   2 +
 drivers/vdpa/mlx5/meson.build       |   1 +
 drivers/vdpa/mlx5/mlx5_vdpa.c       |   1 +
 drivers/vdpa/mlx5/mlx5_vdpa.h       |  34 +++++
 drivers/vdpa/mlx5/mlx5_vdpa_steer.c | 265 ++++++++++++++++++++++++++++++++++++
 5 files changed, 303 insertions(+)
 create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_steer.c
diff --git a/drivers/vdpa/mlx5/Makefile b/drivers/vdpa/mlx5/Makefile
index 1b24400..8362fef 100644
--- a/drivers/vdpa/mlx5/Makefile
+++ b/drivers/vdpa/mlx5/Makefile
@@ -11,6 +11,8 @@ SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa_mem.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa_event.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa_virtq.c
+SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa_steer.c
+
 
 # Basic CFLAGS.
 CFLAGS += -O3
diff --git a/drivers/vdpa/mlx5/meson.build b/drivers/vdpa/mlx5/meson.build
index 732ddce..3f85dda 100644
--- a/drivers/vdpa/mlx5/meson.build
+++ b/drivers/vdpa/mlx5/meson.build
@@ -15,6 +15,7 @@ sources = files(
 	'mlx5_vdpa_mem.c',
 	'mlx5_vdpa_event.c',
 	'mlx5_vdpa_virtq.c',
+	'mlx5_vdpa_steer.c',
 )
 cflags_options = [
 	'-std=c11',
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index f625b5e..28b94a3 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -209,6 +209,7 @@
 			goto error;
 		}
 		priv->caps = attr.vdpa;
+		priv->log_max_rqt_size = attr.log_max_rqt_size;
 	}
 	priv->ctx = ctx;
 	priv->dev_addr.pci_addr = pci_dev->addr;
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 02cf139..d7eb5ee 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -81,6 +81,18 @@ struct mlx5_vdpa_virtq {
 	} umems[3];
 };
 
+struct mlx5_vdpa_steer {
+	struct mlx5_devx_obj *rqt;
+	void *domain;
+	void *tbl;
+	struct {
+		struct mlx5dv_flow_matcher *matcher;
+		struct mlx5_devx_obj *tir;
+		void *tir_action;
+		void *flow;
+	} rss[7];
+};
+
 struct mlx5_vdpa_priv {
 	TAILQ_ENTRY(mlx5_vdpa_priv) next;
 	int id; /* vDPA device id. */
@@ -101,7 +113,9 @@ struct mlx5_vdpa_priv {
 	struct mlx5_devx_obj *tis;
 	uint16_t nr_virtqs;
 	uint64_t features; /* Negotiated features. */
+	uint16_t log_max_rqt_size;
 	SLIST_HEAD(virtq_list, mlx5_vdpa_virtq) virtq_list;
+	struct mlx5_vdpa_steer steer;
 	SLIST_HEAD(mr_list, mlx5_vdpa_query_mr) mr_list;
 };
 
@@ -198,4 +212,24 @@ int mlx5_vdpa_event_qp_create(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
  */
 int mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv);
 
+/**
+ * Unset steering and release all its related resources- stop traffic.
+ *
+ * @param[in] priv
+ *   The vdpa driver private structure.
+ */
+int mlx5_vdpa_steer_unset(struct mlx5_vdpa_priv *priv);
+
+/**
+ * Setup steering and all its related resources to enable RSS traffic from the
+ * device to all the Rx host queues.
+ *
+ * @param[in] priv
+ *   The vdpa driver private structure.
+ *
+ * @return
+ *   0 on success, a negative value otherwise.
+ */
+int mlx5_vdpa_steer_setup(struct mlx5_vdpa_priv *priv);
+
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_steer.c b/drivers/vdpa/mlx5/mlx5_vdpa_steer.c
new file mode 100644
index 0000000..f365c10
--- /dev/null
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_steer.c
@@ -0,0 +1,265 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2019 Mellanox Technologies, Ltd
+ */
+#include <netinet/in.h>
+
+#include <rte_malloc.h>
+#include <rte_errno.h>
+#include <rte_common.h>
+
+#include <mlx5_common.h>
+
+#include "mlx5_vdpa_utils.h"
+#include "mlx5_vdpa.h"
+
+int
+mlx5_vdpa_steer_unset(struct mlx5_vdpa_priv *priv)
+{
+	int ret __rte_unused;
+	unsigned i;
+
+	for (i = 0; i < RTE_DIM(priv->steer.rss); ++i) {
+		if (priv->steer.rss[i].flow) {
+			claim_zero(mlx5_glue->dv_destroy_flow
+						     (priv->steer.rss[i].flow));
+			priv->steer.rss[i].flow = NULL;
+		}
+		if (priv->steer.rss[i].tir_action) {
+			claim_zero(mlx5_glue->destroy_flow_action
+					       (priv->steer.rss[i].tir_action));
+			priv->steer.rss[i].tir_action = NULL;
+		}
+		if (priv->steer.rss[i].tir) {
+			claim_zero(mlx5_devx_cmd_destroy
+						      (priv->steer.rss[i].tir));
+			priv->steer.rss[i].tir = NULL;
+		}
+		if (priv->steer.rss[i].matcher) {
+			claim_zero(mlx5_glue->dv_destroy_flow_matcher
+						  (priv->steer.rss[i].matcher));
+			priv->steer.rss[i].matcher = NULL;
+		}
+	}
+	if (priv->steer.tbl) {
+		claim_zero(mlx5_glue->dr_destroy_flow_tbl(priv->steer.tbl));
+		priv->steer.tbl = NULL;
+	}
+	if (priv->steer.domain) {
+		claim_zero(mlx5_glue->dr_destroy_domain(priv->steer.domain));
+		priv->steer.domain = NULL;
+	}
+	if (priv->steer.rqt) {
+		claim_zero(mlx5_devx_cmd_destroy(priv->steer.rqt));
+		priv->steer.rqt = NULL;
+	}
+	return 0;
+}
+
+/*
+ * According to the VIRTIO_NET spec, the virtqueue index identifies its type:
+ * 0 receiveq1
+ * 1 transmitq1
+ * ...
+ * 2(N-1) receiveqN
+ * 2(N-1)+1 transmitqN
+ * 2N controlq
+ */
+static uint8_t
+is_virtq_recvq(int virtq_index, int nr_vring)
+{
+	if (virtq_index % 2 == 0 && virtq_index != nr_vring - 1)
+		return 1;
+	return 0;
+}
+
+#define MLX5_VDPA_DEFAULT_RQT_SIZE 512
+static int __rte_unused
+mlx5_vdpa_rqt_prepare(struct mlx5_vdpa_priv *priv)
+{
+	struct mlx5_vdpa_virtq *virtq;
+	uint32_t rqt_n = RTE_MIN(MLX5_VDPA_DEFAULT_RQT_SIZE,
+				 1 << priv->log_max_rqt_size);
+	struct mlx5_devx_rqt_attr *attr = rte_zmalloc(__func__, sizeof(*attr)
+						      + rqt_n *
+						      sizeof(uint32_t), 0);
+	uint32_t i = 0, j;
+	int ret = 0;
+
+	if (!attr) {
+		DRV_LOG(ERR, "Failed to allocate RQT attributes memory.");
+		rte_errno = ENOMEM;
+		return -ENOMEM;
+	}
+	SLIST_FOREACH(virtq, &priv->virtq_list, next) {
+		if (is_virtq_recvq(virtq->index, priv->nr_virtqs)) {
+			attr->rq_list[i] = virtq->virtq->id;
+			i++;
+		}
+	}
+	for (j = 0; i != rqt_n; ++i, ++j)
+		attr->rq_list[i] = attr->rq_list[j];
+	attr->rq_type = MLX5_INLINE_Q_TYPE_VIRTQ;
+	attr->rqt_max_size = rqt_n;
+	attr->rqt_actual_size = rqt_n;
+	if (!priv->steer.rqt) {
+		priv->steer.rqt = mlx5_devx_cmd_create_rqt(priv->ctx, attr);
+		if (!priv->steer.rqt) {
+			DRV_LOG(ERR, "Failed to create RQT.");
+			ret = -rte_errno;
+		}
+	} else {
+		ret = mlx5_devx_cmd_modify_rqt(priv->steer.rqt, attr);
+		if (ret)
+			DRV_LOG(ERR, "Failed to modify RQT.");
+	}
+	rte_free(attr);
+	return ret;
+}
+
+static int __rte_unused
+mlx5_vdpa_rss_flows_create(struct mlx5_vdpa_priv *priv)
+{
+#ifdef HAVE_MLX5DV_DR
+	struct mlx5_devx_tir_attr tir_att = {
+		.disp_type = MLX5_TIRC_DISP_TYPE_INDIRECT,
+		.rx_hash_fn = MLX5_RX_HASH_FN_TOEPLITZ,
+		.transport_domain = priv->td->id,
+		.indirect_table = priv->steer.rqt->id,
+		.rx_hash_symmetric = 1,
+		.rx_hash_toeplitz_key = { 0x2cc681d1, 0x5bdbf4f7, 0xfca28319,
+					  0xdb1a3e94, 0x6b9e38d9, 0x2c9c03d1,
+					  0xad9944a7, 0xd9563d59, 0x063c25f3,
+					  0xfc1fdc2a },
+	};
+	struct {
+		size_t size;
+		/**< Size of match value. Do NOT split size and key! */
+		uint32_t buf[MLX5_ST_SZ_DW(fte_match_param)];
+		/**< Matcher value. This value is used as the mask or a key. */
+	} matcher_mask = {
+				.size = sizeof(matcher_mask.buf),
+			},
+	  matcher_value = {
+				.size = sizeof(matcher_value.buf),
+			};
+	struct mlx5dv_flow_matcher_attr dv_attr = {
+		.type = IBV_FLOW_ATTR_NORMAL,
+		.match_mask = (void *)&matcher_mask,
+	};
+	void *match_m = matcher_mask.buf;
+	void *match_v = matcher_value.buf;
+	void *headers_m = MLX5_ADDR_OF(fte_match_param, match_m, outer_headers);
+	void *headers_v = MLX5_ADDR_OF(fte_match_param, match_v, outer_headers);
+	void *actions[1];
+	const uint8_t l3_hash =
+		(1 << MLX5_RX_HASH_FIELD_SELECT_SELECTED_FIELDS_SRC_IP) |
+		(1 << MLX5_RX_HASH_FIELD_SELECT_SELECTED_FIELDS_DST_IP);
+	const uint8_t l4_hash =
+		(1 << MLX5_RX_HASH_FIELD_SELECT_SELECTED_FIELDS_L4_SPORT) |
+		(1 << MLX5_RX_HASH_FIELD_SELECT_SELECTED_FIELDS_L4_DPORT);
+	enum { PRIO, CRITERIA, IP_VER_M, IP_VER_V, IP_PROT_M, IP_PROT_V, L3_BIT,
+	       L4_BIT, HASH, END};
+	const uint8_t vars[RTE_DIM(priv->steer.rss)][END] = {
+		{ 7, 0, 0, 0, 0, 0, 0, 0, 0 },
+		{ 6, 1 << MLX5_MATCH_CRITERIA_ENABLE_OUTER_BIT, 0xf, 4, 0, 0,
+		 MLX5_L3_PROT_TYPE_IPV4, 0, l3_hash },
+		{ 6, 1 << MLX5_MATCH_CRITERIA_ENABLE_OUTER_BIT, 0xf, 6, 0, 0,
+		 MLX5_L3_PROT_TYPE_IPV6, 0, l3_hash },
+		{ 5, 1 << MLX5_MATCH_CRITERIA_ENABLE_OUTER_BIT, 0xf, 4, 0xff,
+		 IPPROTO_UDP, MLX5_L3_PROT_TYPE_IPV4, MLX5_L4_PROT_TYPE_UDP,
+		 l3_hash | l4_hash },
+		{ 5, 1 << MLX5_MATCH_CRITERIA_ENABLE_OUTER_BIT, 0xf, 4, 0xff,
+		 IPPROTO_TCP, MLX5_L3_PROT_TYPE_IPV4, MLX5_L4_PROT_TYPE_TCP,
+		 l3_hash | l4_hash },
+		{ 5, 1 << MLX5_MATCH_CRITERIA_ENABLE_OUTER_BIT, 0xf, 6, 0xff,
+		 IPPROTO_UDP, MLX5_L3_PROT_TYPE_IPV6, MLX5_L4_PROT_TYPE_UDP,
+		 l3_hash | l4_hash },
+		{ 5, 1 << MLX5_MATCH_CRITERIA_ENABLE_OUTER_BIT, 0xf, 6, 0xff,
+		 IPPROTO_TCP, MLX5_L3_PROT_TYPE_IPV6, MLX5_L4_PROT_TYPE_TCP,
+		 l3_hash | l4_hash },
+	};
+	unsigned i;
+
+	for (i = 0; i < RTE_DIM(priv->steer.rss); ++i) {
+		dv_attr.priority = vars[i][PRIO];
+		dv_attr.match_criteria_enable = vars[i][CRITERIA];
+		MLX5_SET(fte_match_set_lyr_2_4, headers_m, ip_version,
+			 vars[i][IP_VER_M]);
+		MLX5_SET(fte_match_set_lyr_2_4, headers_v, ip_version,
+			 vars[i][IP_VER_V]);
+		MLX5_SET(fte_match_set_lyr_2_4, headers_m, ip_protocol,
+			 vars[i][IP_PROT_M]);
+		MLX5_SET(fte_match_set_lyr_2_4, headers_v, ip_protocol,
+			 vars[i][IP_PROT_V]);
+		tir_att.rx_hash_field_selector_outer.l3_prot_type =
+								vars[i][L3_BIT];
+		tir_att.rx_hash_field_selector_outer.l4_prot_type =
+								vars[i][L4_BIT];
+		tir_att.rx_hash_field_selector_outer.selected_fields =
+								  vars[i][HASH];
+		priv->steer.rss[i].matcher = mlx5_glue->dv_create_flow_matcher
+					 (priv->ctx, &dv_attr, priv->steer.tbl);
+		if (!priv->steer.rss[i].matcher) {
+			DRV_LOG(ERR, "Failed to create matcher %d.", i);
+			goto error;
+		}
+		priv->steer.rss[i].tir = mlx5_devx_cmd_create_tir(priv->ctx,
+								  &tir_att);
+		if (!priv->steer.rss[i].tir) {
+			DRV_LOG(ERR, "Failed to create TIR %d.", i);
+			goto error;
+		}
+		priv->steer.rss[i].tir_action =
+				mlx5_glue->dv_create_flow_action_dest_devx_tir
+						  (priv->steer.rss[i].tir->obj);
+		if (!priv->steer.rss[i].tir_action) {
+			DRV_LOG(ERR, "Failed to create TIR action %d.", i);
+			goto error;
+		}
+		actions[0] = priv->steer.rss[i].tir_action;
+		priv->steer.rss[i].flow = mlx5_glue->dv_create_flow
+					(priv->steer.rss[i].matcher,
+					 (void *)&matcher_value, 1, actions);
+		if (!priv->steer.rss[i].flow) {
+			DRV_LOG(ERR, "Failed to create flow %d.", i);
+			goto error;
+		}
+	}
+	return 0;
+error:
+	/* Resources will be freed by the caller. */
+	return -1;
+#else
+	(void)priv;
+	return -ENOTSUP;
+#endif /* HAVE_MLX5DV_DR */
+}
+
+int
+mlx5_vdpa_steer_setup(struct mlx5_vdpa_priv *priv)
+{
+#ifdef HAVE_MLX5DV_DR
+	if (mlx5_vdpa_rqt_prepare(priv))
+		return -1;
+	priv->steer.domain = mlx5_glue->dr_create_domain(priv->ctx,
+						  MLX5DV_DR_DOMAIN_TYPE_NIC_RX);
+	if (!priv->steer.domain) {
+		DRV_LOG(ERR, "Failed to create Rx domain.");
+		goto error;
+	}
+	priv->steer.tbl = mlx5_glue->dr_create_flow_tbl(priv->steer.domain, 0);
+	if (!priv->steer.tbl) {
+		DRV_LOG(ERR, "Failed to create table 0 with Rx domain.");
+		goto error;
+	}
+	if (mlx5_vdpa_rss_flows_create(priv))
+		goto error;
+	return 0;
+error:
+	mlx5_vdpa_steer_unset(priv);
+	return -1;
+#else
+	(void)priv;
+	return -ENOTSUP;
+#endif /* HAVE_MLX5DV_DR */
+}
-- 
1.8.3.1
* [dpdk-dev] [PATCH v3 09/13] vdpa/mlx5: support queue state operation
  2020-02-02 16:03   ` [dpdk-dev] [PATCH v3 00/13] Introduce mlx5 vDPA driver Matan Azrad
                       ` (7 preceding siblings ...)
  2020-02-02 16:03     ` [dpdk-dev] [PATCH v3 08/13] vdpa/mlx5: add basic steering configurations Matan Azrad
@ 2020-02-02 16:03     ` Matan Azrad
  2020-02-02 16:03     ` [dpdk-dev] [PATCH v3 10/13] vdpa/mlx5: map doorbell Matan Azrad
                       ` (5 subsequent siblings)
  14 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-02-02 16:03 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Maxime Coquelin
Add support for the set_vring_state operation.
Using the DevX API, the virtq state can be changed as described in the PRM:
	enable - move to ready state.
	disable - move to suspend state.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
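Note: the enable/disable path is a toggle with rollback. The flag is
updated first so that the RQT rebuild sees the new state, and it is
restored if the rebuild fails. The sketch below illustrates only that
control flow. All names in it (struct sketch_virtq, sketch_rqt_update_t,
sketch_virtq_enable() and the two callbacks) are hypothetical, and the
callback stands in for the driver's RQT preparation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced view of a virtq. */
struct sketch_virtq {
	uint8_t enable; /* Current state: 0 or 1. */
	int is_recvq;   /* Non-zero for receive queues. */
};

/* Hypothetical callback type standing in for the RQT rebuild. */
typedef int (*sketch_rqt_update_t)(void *ctx);

/* Example callbacks: one that succeeds and one that fails. */
static int sketch_update_ok(void *ctx) { (void)ctx; return 0; }
static int sketch_update_fail(void *ctx) { (void)ctx; return -1; }

static int
sketch_virtq_enable(struct sketch_virtq *vq, int enable,
		    sketch_rqt_update_t update, void *ctx)
{
	uint8_t want = !!enable; /* Normalize any non-zero value to 1. */
	int ret = 0;

	if (vq->enable == want)
		return 0; /* Nothing to do. */
	vq->enable = want;
	if (vq->is_recvq) {
		/* Only receive queues are part of the RQT. */
		ret = update(ctx);
		if (ret)
			vq->enable = !want; /* Roll back on failure. */
	}
	return ret;
}
```

Transmit queues only flip the flag, since they are not referenced by the
indirection table.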
 drivers/vdpa/mlx5/mlx5_vdpa.c       | 23 ++++++++++++++++++++++-
 drivers/vdpa/mlx5/mlx5_vdpa.h       | 15 +++++++++++++++
 drivers/vdpa/mlx5/mlx5_vdpa_steer.c | 22 ++++++++++++++++++++--
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 25 +++++++++++++++++++++----
 4 files changed, 78 insertions(+), 7 deletions(-)
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index 28b94a3..3615681 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -106,13 +106,34 @@
 	return 0;
 }
 
+static int
+mlx5_vdpa_set_vring_state(int vid, int vring, int state)
+{
+	int did = rte_vhost_get_vdpa_device_id(vid);
+	struct mlx5_vdpa_priv *priv = mlx5_vdpa_find_priv_resource_by_did(did);
+	struct mlx5_vdpa_virtq *virtq = NULL;
+
+	if (priv == NULL) {
+		DRV_LOG(ERR, "Invalid device id: %d.", did);
+		return -EINVAL;
+	}
+	SLIST_FOREACH(virtq, &priv->virtq_list, next)
+		if (virtq->index == vring)
+			break;
+	if (!virtq) {
+		DRV_LOG(ERR, "Invalid or unconfigured vring id: %d.", vring);
+		return -EINVAL;
+	}
+	return mlx5_vdpa_virtq_enable(virtq, state);
+}
+
 static struct rte_vdpa_dev_ops mlx5_vdpa_ops = {
 	.get_queue_num = mlx5_vdpa_get_queue_num,
 	.get_features = mlx5_vdpa_get_vdpa_features,
 	.get_protocol_features = mlx5_vdpa_get_protocol_features,
 	.dev_conf = NULL,
 	.dev_close = NULL,
-	.set_vring_state = NULL,
+	.set_vring_state = mlx5_vdpa_set_vring_state,
 	.set_features = NULL,
 	.migration_done = NULL,
 	.get_vfio_group_fd = NULL,
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index d7eb5ee..629a282 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -70,8 +70,10 @@ struct mlx5_vdpa_query_mr {
 
 struct mlx5_vdpa_virtq {
 	SLIST_ENTRY(mlx5_vdpa_virtq) next;
+	uint8_t enable;
 	uint16_t index;
 	uint16_t vq_size;
+	struct mlx5_vdpa_priv *priv;
 	struct mlx5_devx_obj *virtq;
 	struct mlx5_vdpa_event_qp eqp;
 	struct {
@@ -213,6 +215,19 @@ int mlx5_vdpa_event_qp_create(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
 int mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv);
 
 /**
+ * Enable/disable virtq.
+ *
+ * @param[in] virtq
+ *   The vdpa driver private virtq structure.
+ * @param[in] enable
+ *   Set to enable, otherwise disable.
+ *
+ * @return
+ *   0 on success, a negative value otherwise.
+ */
+int mlx5_vdpa_virtq_enable(struct mlx5_vdpa_virtq *virtq, int enable);
+
+/**
  * Unset steering and release all its related resources- stop traffic.
  *
  * @param[in] priv
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_steer.c b/drivers/vdpa/mlx5/mlx5_vdpa_steer.c
index f365c10..36017f1 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_steer.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_steer.c
@@ -73,7 +73,7 @@
 }
 
 #define MLX5_VDPA_DEFAULT_RQT_SIZE 512
-static int __rte_unused
+static int
 mlx5_vdpa_rqt_prepare(struct mlx5_vdpa_priv *priv)
 {
 	struct mlx5_vdpa_virtq *virtq;
@@ -91,7 +91,8 @@
 		return -ENOMEM;
 	}
 	SLIST_FOREACH(virtq, &priv->virtq_list, next) {
-		if (is_virtq_recvq(virtq->index, priv->nr_virtqs)) {
+		if (is_virtq_recvq(virtq->index, priv->nr_virtqs) &&
+		    virtq->enable) {
 			attr->rq_list[i] = virtq->virtq->id;
 			i++;
 		}
@@ -116,6 +117,23 @@
 	return ret;
 }
 
+int
+mlx5_vdpa_virtq_enable(struct mlx5_vdpa_virtq *virtq, int enable)
+{
+	struct mlx5_vdpa_priv *priv = virtq->priv;
+	int ret = 0;
+
+	if (virtq->enable == !!enable)
+		return 0;
+	virtq->enable = !!enable;
+	if (is_virtq_recvq(virtq->index, priv->nr_virtqs)) {
+		ret = mlx5_vdpa_rqt_prepare(priv);
+		if (ret)
+			virtq->enable = !enable;
+	}
+	return ret;
+}
+
 static int __rte_unused
 mlx5_vdpa_rss_flows_create(struct mlx5_vdpa_priv *priv)
 {
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index e27af28..9967be3 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -15,13 +15,13 @@
 static int
 mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 {
-	int i;
+	unsigned int i;
 
 	if (virtq->virtq) {
 		claim_zero(mlx5_devx_cmd_destroy(virtq->virtq));
 		virtq->virtq = NULL;
 	}
-	for (i = 0; i < 3; ++i) {
+	for (i = 0; i < RTE_DIM(virtq->umems); ++i) {
 		if (virtq->umems[i].obj)
 			claim_zero(mlx5_glue->devx_umem_dereg
 							 (virtq->umems[i].obj));
@@ -60,6 +60,19 @@
 	priv->features = 0;
 }
 
+static int
+mlx5_vdpa_virtq_modify(struct mlx5_vdpa_virtq *virtq, int state)
+{
+	struct mlx5_devx_virtq_attr attr = {
+			.type = MLX5_VIRTQ_MODIFY_TYPE_STATE,
+			.state = state ? MLX5_VIRTQ_STATE_RDY :
+					 MLX5_VIRTQ_STATE_SUSPEND,
+			.queue_index = virtq->index,
+	};
+
+	return mlx5_devx_cmd_modify_virtq(virtq->virtq, &attr);
+}
+
 static uint64_t
 mlx5_vdpa_hva_to_gpa(struct rte_vhost_memory *mem, uint64_t hva)
 {
@@ -86,7 +99,7 @@
 	struct mlx5_devx_virtq_attr attr = {0};
 	uint64_t gpa;
 	int ret;
-	int i;
+	unsigned int i;
 	uint16_t last_avail_idx;
 	uint16_t last_used_idx;
 
@@ -125,7 +138,7 @@
 			" need event QPs and event mechanism.", index);
 	}
 	/* Setup 3 UMEMs for each virtq. */
-	for (i = 0; i < 3; ++i) {
+	for (i = 0; i < RTE_DIM(virtq->umems); ++i) {
 		virtq->umems[i].size = priv->caps.umems[i].a * vq.size +
 							  priv->caps.umems[i].b;
 		virtq->umems[i].buf = rte_zmalloc(__func__,
@@ -182,8 +195,12 @@
 	attr.tis_id = priv->tis->id;
 	attr.queue_index = index;
 	virtq->virtq = mlx5_devx_cmd_create_virtq(priv->ctx, &attr);
+	virtq->priv = priv;
 	if (!virtq->virtq)
 		goto error;
+	if (mlx5_vdpa_virtq_modify(virtq, 1))
+		goto error;
+	virtq->enable = 1;
 	return 0;
 error:
 	mlx5_vdpa_virtq_unset(virtq);
-- 
1.8.3.1
* [dpdk-dev] [PATCH v3 10/13] vdpa/mlx5: map doorbell
  2020-02-02 16:03   ` [dpdk-dev] [PATCH v3 00/13] Introduce mlx5 vDPA driver Matan Azrad
                       ` (8 preceding siblings ...)
  2020-02-02 16:03     ` [dpdk-dev] [PATCH v3 09/13] vdpa/mlx5: support queue state operation Matan Azrad
@ 2020-02-02 16:03     ` Matan Azrad
  2020-02-02 16:03     ` [dpdk-dev] [PATCH v3 11/13] vdpa/mlx5: support live migration Matan Azrad
                       ` (4 subsequent siblings)
  14 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-02-02 16:03 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Maxime Coquelin
The HW can detect only 4-byte doorbell writes, while the virtio device
writes only 2 bytes when it rings the doorbell.
Map the virtio doorbell, detected through the virtio queue kickfd, to the
HW VAR space, where the HW expects to get the virtio emulation doorbell.
Use the EAL interrupt mechanism to get a notification when the guest
signals a new event on kickfd, and write 4 bytes to the HW doorbell space
in the notification callback.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
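Note: the relay step can be sketched outside of DPDK as "drain the kick
descriptor, then issue one 32-bit store of the queue index". The
function name sketch_kick_relay() is hypothetical, and a plain volatile
store stands in for rte_write32(), so the sketch carries none of the
ordering guarantees of the real I/O helper. As a choice of this sketch
only, a hard read error is reported to the caller and the doorbell is
not written.

```c
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Consume one 8-byte event from kickfd and relay it as a 4-byte doorbell
 * write of the queue index. Return 0 on success, a negative errno value
 * on a hard read error.
 */
static int
sketch_kick_relay(int kickfd, volatile uint32_t *db, uint16_t index)
{
	uint64_t buf;
	ssize_t n;

	/* Retry only when the read was interrupted by a signal. */
	do {
		n = read(kickfd, &buf, sizeof(buf));
	} while (n < 0 && errno == EINTR);
	/* An empty non-blocking descriptor is not treated as an error. */
	if (n < 0 && errno != EAGAIN && errno != EWOULDBLOCK)
		return -errno;
	/* The guest wrote 2 bytes; the HW expects a full 32-bit store. */
	*db = index;
	return 0;
}
```

In the driver the destination is the mmap'ed VAR region and the callback
is registered through the EAL interrupt API; here any descriptor that can
deliver 8 bytes is enough to exercise the logic.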
 drivers/vdpa/mlx5/mlx5_vdpa.h       |  3 ++
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 80 +++++++++++++++++++++++++++++++++++++
 2 files changed, 83 insertions(+)
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 629a282..5424be5 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -81,6 +81,7 @@ struct mlx5_vdpa_virtq {
 		void *buf;
 		uint32_t size;
 	} umems[3];
+	struct rte_intr_handle intr_handle;
 };
 
 struct mlx5_vdpa_steer {
@@ -118,6 +119,8 @@ struct mlx5_vdpa_priv {
 	uint16_t log_max_rqt_size;
 	SLIST_HEAD(virtq_list, mlx5_vdpa_virtq) virtq_list;
 	struct mlx5_vdpa_steer steer;
+	struct mlx5dv_var *var;
+	void *virtq_db_addr;
 	SLIST_HEAD(mr_list, mlx5_vdpa_query_mr) mr_list;
 };
 
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 9967be3..32a13ce 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -2,9 +2,12 @@
  * Copyright 2019 Mellanox Technologies, Ltd
  */
 #include <string.h>
+#include <unistd.h>
+#include <sys/mman.h>
 
 #include <rte_malloc.h>
 #include <rte_errno.h>
+#include <rte_io.h>
 
 #include <mlx5_common.h>
 
@@ -12,11 +15,52 @@
 #include "mlx5_vdpa.h"
 
 
+static void
+mlx5_vdpa_virtq_handler(void *cb_arg)
+{
+	struct mlx5_vdpa_virtq *virtq = cb_arg;
+	struct mlx5_vdpa_priv *priv = virtq->priv;
+	uint64_t buf;
+	int nbytes;
+
+	do {
+		nbytes = read(virtq->intr_handle.fd, &buf, 8);
+		if (nbytes < 0) {
+			if (errno == EINTR ||
+			    errno == EWOULDBLOCK ||
+			    errno == EAGAIN)
+				continue;
+			DRV_LOG(ERR,  "Failed to read kickfd of virtq %d: %s",
+				virtq->index, strerror(errno));
+		}
+		break;
+	} while (1);
+	rte_write32(virtq->index, priv->virtq_db_addr);
+	DRV_LOG(DEBUG, "Ring virtq %u doorbell.", virtq->index);
+}
+
 static int
 mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 {
 	unsigned int i;
+	int retries = MLX5_VDPA_INTR_RETRIES;
+	int ret = -EAGAIN;
 
+	if (virtq->intr_handle.fd) {
+		while (retries-- && ret == -EAGAIN) {
+			ret = rte_intr_callback_unregister(&virtq->intr_handle,
+							mlx5_vdpa_virtq_handler,
+							virtq);
+			if (ret == -EAGAIN) {
+				DRV_LOG(DEBUG, "Try again to unregister fd %d "
+					"of virtq %d interrupt, retries = %d.",
+					virtq->intr_handle.fd,
+					(int)virtq->index, retries);
+				usleep(MLX5_VDPA_INTR_RETRIES_USEC);
+			}
+		}
+		memset(&virtq->intr_handle, 0, sizeof(virtq->intr_handle));
+	}
 	if (virtq->virtq) {
 		claim_zero(mlx5_devx_cmd_destroy(virtq->virtq));
 		virtq->virtq = NULL;
@@ -57,6 +101,14 @@
 		claim_zero(mlx5_devx_cmd_destroy(priv->td));
 		priv->td = NULL;
 	}
+	if (priv->virtq_db_addr) {
+		claim_zero(munmap(priv->virtq_db_addr, priv->var->length));
+		priv->virtq_db_addr = NULL;
+	}
+	if (priv->var) {
+		mlx5_glue->dv_free_var(priv->var);
+		priv->var = NULL;
+	}
 	priv->features = 0;
 }
 
@@ -201,6 +253,17 @@
 	if (mlx5_vdpa_virtq_modify(virtq, 1))
 		goto error;
 	virtq->enable = 1;
+	virtq->intr_handle.fd = vq.kickfd;
+	virtq->intr_handle.type = RTE_INTR_HANDLE_EXT;
+	if (rte_intr_callback_register(&virtq->intr_handle,
+				       mlx5_vdpa_virtq_handler, virtq)) {
+		virtq->intr_handle.fd = 0;
+		DRV_LOG(ERR, "Failed to register virtq %d interrupt.", index);
+		goto error;
+	} else {
+		DRV_LOG(DEBUG, "Register fd %d interrupt for virtq %d.",
+			virtq->intr_handle.fd, index);
+	}
 	return 0;
 error:
 	mlx5_vdpa_virtq_unset(virtq);
@@ -275,6 +338,23 @@
 		DRV_LOG(ERR, "Failed to configure negotiated features.");
 		return -1;
 	}
+	priv->var = mlx5_glue->dv_alloc_var(priv->ctx, 0);
+	if (!priv->var) {
+		DRV_LOG(ERR, "Failed to allocate VAR %u.", errno);
+		return -1;
+	}
+	/* Always map the entire page. */
+	priv->virtq_db_addr = mmap(NULL, priv->var->length, PROT_READ |
+				   PROT_WRITE, MAP_SHARED, priv->ctx->cmd_fd,
+				   priv->var->mmap_off);
+	if (priv->virtq_db_addr == MAP_FAILED) {
+		DRV_LOG(ERR, "Failed to map doorbell page %u.", errno);
+		priv->virtq_db_addr = NULL;
+		goto error;
+	} else {
+		DRV_LOG(DEBUG, "VAR address of doorbell mapping is %p.",
+			priv->virtq_db_addr);
+	}
 	priv->td = mlx5_devx_cmd_create_td(priv->ctx);
 	if (!priv->td) {
 		DRV_LOG(ERR, "Failed to create transport domain.");
-- 
1.8.3.1
* [dpdk-dev] [PATCH v3 11/13] vdpa/mlx5: support live migration
  2020-02-02 16:03   ` [dpdk-dev] [PATCH v3 00/13] Introduce mlx5 vDPA driver Matan Azrad
                       ` (9 preceding siblings ...)
  2020-02-02 16:03     ` [dpdk-dev] [PATCH v3 10/13] vdpa/mlx5: map doorbell Matan Azrad
@ 2020-02-02 16:03     ` Matan Azrad
  2020-02-02 16:03     ` [dpdk-dev] [PATCH v3 12/13] vdpa/mlx5: support close and config operations Matan Azrad
                       ` (3 subsequent siblings)
  14 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-02-02 16:03 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Maxime Coquelin
Add support for the live migration feature in the HW:
- Create a single Mkey that maps the memory address space of the
  VHOST live migration log file.
- Modify the VIRTIO_NET_Q object to provide vhost_log_page,
  dirty_bitmap_mkey, dirty_bitmap_size, dirty_bitmap_addr
  and dirty_bitmap_dump_enable.
- Modify the VIRTIO_NET_Q object to move its state to SUSPEND.
- Query VIRTIO_NET_Q to get hw_available_idx and hw_used_idx.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
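As a worked example of the used-ring length that this patch logs on close
(MLX5_VDPA_USED_RING_LEN), the arithmetic can be checked in isolation. This
is a sketch under assumptions: `struct used_elem` is a local mirror of the
virtio split-ring used element (a 32-bit id and a 32-bit len), and
`used_ring_len` is a hypothetical helper name. The three extra 16-bit
fields are taken to be flags, idx and the trailing avail_event.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the virtio split-ring used element (id + len). */
struct used_elem {
	uint32_t id;
	uint32_t len;
};

static_assert(sizeof(struct used_elem) == 8, "used element is 8 bytes");

/* Bytes covered by a used ring with queue_size entries. */
static size_t
used_ring_len(uint16_t queue_size)
{
	/* ring[] entries plus three 16-bit fields around them. */
	return (size_t)queue_size * sizeof(struct used_elem) +
	       3 * sizeof(uint16_t);
}
```

For a 256-entry queue this gives 256 * 8 + 6 = 2054 bytes.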
 doc/guides/vdpadevs/features/mlx5.ini |   1 +
 drivers/vdpa/mlx5/Makefile            |   1 +
 drivers/vdpa/mlx5/meson.build         |   1 +
 drivers/vdpa/mlx5/mlx5_vdpa.c         |  44 +++++++++++-
 drivers/vdpa/mlx5/mlx5_vdpa.h         |  55 +++++++++++++++
 drivers/vdpa/mlx5/mlx5_vdpa_lm.c      | 129 ++++++++++++++++++++++++++++++++++
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   |   7 +-
 7 files changed, 235 insertions(+), 3 deletions(-)
 create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_lm.c
diff --git a/doc/guides/vdpadevs/features/mlx5.ini b/doc/guides/vdpadevs/features/mlx5.ini
index e4ee34b..1da9c1b 100644
--- a/doc/guides/vdpadevs/features/mlx5.ini
+++ b/doc/guides/vdpadevs/features/mlx5.ini
@@ -9,6 +9,7 @@ guest csum           = Y
 host tso4            = Y
 host tso6            = Y
 version 1            = Y
+log all              = Y
 any layout           = Y
 guest announce       = Y
 mq                   = Y
diff --git a/drivers/vdpa/mlx5/Makefile b/drivers/vdpa/mlx5/Makefile
index 8362fef..d4a544c 100644
--- a/drivers/vdpa/mlx5/Makefile
+++ b/drivers/vdpa/mlx5/Makefile
@@ -12,6 +12,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa_mem.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa_event.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa_virtq.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa_steer.c
+SRCS-$(CONFIG_RTE_LIBRTE_MLX5_VDPA_PMD) += mlx5_vdpa_lm.c
 
 
 # Basic CFLAGS.
diff --git a/drivers/vdpa/mlx5/meson.build b/drivers/vdpa/mlx5/meson.build
index 3f85dda..bb96dad 100644
--- a/drivers/vdpa/mlx5/meson.build
+++ b/drivers/vdpa/mlx5/meson.build
@@ -16,6 +16,7 @@ sources = files(
 	'mlx5_vdpa_event.c',
 	'mlx5_vdpa_virtq.c',
 	'mlx5_vdpa_steer.c',
+	'mlx5_vdpa_lm.c',
 )
 cflags_options = [
 	'-std=c11',
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index 3615681..1bb6c68 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -19,7 +19,8 @@
 			    (1ULL << VIRTIO_F_ANY_LAYOUT) | \
 			    (1ULL << VIRTIO_NET_F_MQ) | \
 			    (1ULL << VIRTIO_NET_F_GUEST_ANNOUNCE) | \
-			    (1ULL << VIRTIO_F_ORDER_PLATFORM))
+			    (1ULL << VIRTIO_F_ORDER_PLATFORM) | \
+			    (1ULL << VHOST_F_LOG_ALL))
 
 #define MLX5_VDPA_PROTOCOL_FEATURES \
 			    ((1ULL << VHOST_USER_PROTOCOL_F_SLAVE_REQ) | \
@@ -127,6 +128,45 @@
 	return mlx5_vdpa_virtq_enable(virtq, state);
 }
 
+static int
+mlx5_vdpa_features_set(int vid)
+{
+	int did = rte_vhost_get_vdpa_device_id(vid);
+	struct mlx5_vdpa_priv *priv = mlx5_vdpa_find_priv_resource_by_did(did);
+	uint64_t log_base, log_size;
+	uint64_t features;
+	int ret;
+
+	if (priv == NULL) {
+		DRV_LOG(ERR, "Invalid device id: %d.", did);
+		return -EINVAL;
+	}
+	ret = rte_vhost_get_negotiated_features(vid, &features);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to get negotiated features.");
+		return ret;
+	}
+	if (RTE_VHOST_NEED_LOG(features)) {
+		ret = rte_vhost_get_log_base(vid, &log_base, &log_size);
+		if (ret) {
+			DRV_LOG(ERR, "Failed to get log base.");
+			return ret;
+		}
+		ret = mlx5_vdpa_dirty_bitmap_set(priv, log_base, log_size);
+		if (ret) {
+			DRV_LOG(ERR, "Failed to set dirty bitmap.");
+			return ret;
+		}
+		DRV_LOG(INFO, "mlx5 vdpa: enabling dirty logging...");
+		ret = mlx5_vdpa_logging_enable(priv, 1);
+		if (ret) {
+			DRV_LOG(ERR, "Failed to enable dirty logging.");
+			return ret;
+		}
+	}
+	return 0;
+}
+
 static struct rte_vdpa_dev_ops mlx5_vdpa_ops = {
 	.get_queue_num = mlx5_vdpa_get_queue_num,
 	.get_features = mlx5_vdpa_get_vdpa_features,
@@ -134,7 +174,7 @@
 	.dev_conf = NULL,
 	.dev_close = NULL,
 	.set_vring_state = mlx5_vdpa_set_vring_state,
-	.set_features = NULL,
+	.set_features = mlx5_vdpa_features_set,
 	.migration_done = NULL,
 	.get_vfio_group_fd = NULL,
 	.get_vfio_device_fd = NULL,
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 5424be5..527436d 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -250,4 +250,59 @@ int mlx5_vdpa_event_qp_create(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
  */
 int mlx5_vdpa_steer_setup(struct mlx5_vdpa_priv *priv);
 
+/**
+ * Enable/Disable live migration logging.
+ *
+ * @param[in] priv
+ *   The vdpa driver private structure.
+ * @param[in] enable
+ *   Set for enable, unset for disable.
+ *
+ * @return
+ *   0 on success, a negative value otherwise.
+ */
+int mlx5_vdpa_logging_enable(struct mlx5_vdpa_priv *priv, int enable);
+
+/**
+ * Set dirty bitmap logging to allow live migration.
+ *
+ * @param[in] priv
+ *   The vdpa driver private structure.
+ * @param[in] log_base
+ *   Vhost log base.
+ * @param[in] log_size
+ *   Vhost log size.
+ *
+ * @return
+ *   0 on success, a negative value otherwise.
+ */
+int mlx5_vdpa_dirty_bitmap_set(struct mlx5_vdpa_priv *priv, uint64_t log_base,
+			       uint64_t log_size);
+
+/**
+ * Log all virtqs information for live migration.
+ *
+ * @param[in] priv
+ *   The vdpa driver private structure.
+ * @param[in] enable
+ *   Set for enable, unset for disable.
+ *
+ * @return
+ *   0 on success, a negative value otherwise.
+ */
+int mlx5_vdpa_lm_log(struct mlx5_vdpa_priv *priv);
+
+/**
+ * Modify virtq state to be ready or suspend.
+ *
+ * @param[in] virtq
+ *   The vdpa driver private virtq structure.
+ * @param[in] state
+ *   Set for ready, otherwise suspend.
+ *
+ * @return
+ *   0 on success, a negative value otherwise.
+ */
+int mlx5_vdpa_virtq_modify(struct mlx5_vdpa_virtq *virtq, int state);
+
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
new file mode 100644
index 0000000..3358704
--- /dev/null
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
@@ -0,0 +1,129 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2019 Mellanox Technologies, Ltd
+ */
+#include <rte_malloc.h>
+#include <rte_errno.h>
+
+#include "mlx5_vdpa_utils.h"
+#include "mlx5_vdpa.h"
+
+
+int
+mlx5_vdpa_logging_enable(struct mlx5_vdpa_priv *priv, int enable)
+{
+	struct mlx5_devx_virtq_attr attr = {
+		.type = MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE,
+		.dirty_bitmap_dump_enable = enable,
+	};
+	struct mlx5_vdpa_virtq *virtq;
+
+	SLIST_FOREACH(virtq, &priv->virtq_list, next) {
+		attr.queue_index = virtq->index;
+		if (mlx5_devx_cmd_modify_virtq(virtq->virtq, &attr)) {
+			DRV_LOG(ERR, "Failed to modify virtq %d logging.",
+				virtq->index);
+			return -1;
+		}
+	}
+	return 0;
+}
+
+int
+mlx5_vdpa_dirty_bitmap_set(struct mlx5_vdpa_priv *priv, uint64_t log_base,
+			   uint64_t log_size)
+{
+	struct mlx5_devx_mkey_attr mkey_attr = {
+			.addr = (uintptr_t)log_base,
+			.size = log_size,
+			.pd = priv->pdn,
+			.pg_access = 1,
+			.klm_array = NULL,
+			.klm_num = 0,
+	};
+	struct mlx5_devx_virtq_attr attr = {
+		.type = MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS,
+		.dirty_bitmap_addr = log_base,
+		.dirty_bitmap_size = log_size,
+	};
+	struct mlx5_vdpa_query_mr *mr = rte_malloc(__func__, sizeof(*mr), 0);
+	struct mlx5_vdpa_virtq *virtq;
+
+	if (!mr) {
+		DRV_LOG(ERR, "Failed to allocate mem for lm mr.");
+		return -1;
+	}
+	mr->umem = mlx5_glue->devx_umem_reg(priv->ctx,
+					    (void *)(uintptr_t)log_base,
+					    log_size, IBV_ACCESS_LOCAL_WRITE);
+	if (!mr->umem) {
+		DRV_LOG(ERR, "Failed to register umem for lm mr.");
+		goto err;
+	}
+	mkey_attr.umem_id = mr->umem->umem_id;
+	mr->mkey = mlx5_devx_cmd_mkey_create(priv->ctx, &mkey_attr);
+	if (!mr->mkey) {
+		DRV_LOG(ERR, "Failed to create Mkey for lm.");
+		goto err;
+	}
+	attr.dirty_bitmap_mkey = mr->mkey->id;
+	SLIST_FOREACH(virtq, &priv->virtq_list, next) {
+		attr.queue_index = virtq->index;
+		if (mlx5_devx_cmd_modify_virtq(virtq->virtq, &attr)) {
+			DRV_LOG(ERR, "Failed to modify virtq %d for lm.",
+				virtq->index);
+			goto err;
+		}
+	}
+	mr->is_indirect = 0;
+	SLIST_INSERT_HEAD(&priv->mr_list, mr, next);
+	return 0;
+err:
+	if (mr->mkey)
+		mlx5_devx_cmd_destroy(mr->mkey);
+	if (mr->umem)
+		mlx5_glue->devx_umem_dereg(mr->umem);
+	rte_free(mr);
+	return -1;
+}
+
+#define MLX5_VDPA_USED_RING_LEN(size) \
+	((size) * sizeof(struct vring_used_elem) + sizeof(uint16_t) * 3)
+
+int
+mlx5_vdpa_lm_log(struct mlx5_vdpa_priv *priv)
+{
+	struct mlx5_devx_virtq_attr attr = {0};
+	struct mlx5_vdpa_virtq *virtq;
+	uint64_t features;
+	int ret = rte_vhost_get_negotiated_features(priv->vid, &features);
+
+	if (ret) {
+		DRV_LOG(ERR, "Failed to get negotiated features.");
+		return -1;
+	}
+	if (!RTE_VHOST_NEED_LOG(features))
+		return 0;
+	SLIST_FOREACH(virtq, &priv->virtq_list, next) {
+		ret = mlx5_vdpa_virtq_modify(virtq, 0);
+		if (ret)
+			return -1;
+		if (mlx5_devx_cmd_query_virtq(virtq->virtq, &attr)) {
+			DRV_LOG(ERR, "Failed to query virtq %d.", virtq->index);
+			return -1;
+		}
+		DRV_LOG(INFO, "Query vid %d vring %d: hw_available_idx=%d, "
+			"hw_used_index=%d", priv->vid, virtq->index,
+			attr.hw_available_index, attr.hw_used_index);
+		ret = rte_vhost_set_vring_base(priv->vid, virtq->index,
+					       attr.hw_available_index,
+					       attr.hw_used_index);
+		if (ret) {
+			DRV_LOG(ERR, "Failed to set virtq %d base.",
+				virtq->index);
+			return -1;
+		}
+		rte_vhost_log_used_vring(priv->vid, virtq->index, 0,
+				       MLX5_VDPA_USED_RING_LEN(virtq->vq_size));
+	}
+	return 0;
+}
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 32a13ce..2312331 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -112,7 +112,7 @@
 	priv->features = 0;
 }
 
-static int
+int
 mlx5_vdpa_virtq_modify(struct mlx5_vdpa_virtq *virtq, int state)
 {
 	struct mlx5_devx_virtq_attr attr = {
@@ -253,6 +253,11 @@
 	if (mlx5_vdpa_virtq_modify(virtq, 1))
 		goto error;
 	virtq->enable = 1;
+	virtq->priv = priv;
+	/* Be sure notifications are not missed during configuration. */
+	claim_zero(rte_vhost_enable_guest_notification(priv->vid, index, 1));
+	rte_write32(virtq->index, priv->virtq_db_addr);
+	/* Setup doorbell mapping. */
 	virtq->intr_handle.fd = vq.kickfd;
 	virtq->intr_handle.type = RTE_INTR_HANDLE_EXT;
 	if (rte_intr_callback_register(&virtq->intr_handle,
-- 
1.8.3.1
* [dpdk-dev] [PATCH v3 12/13] vdpa/mlx5: support close and config operations
  2020-02-02 16:03   ` [dpdk-dev] [PATCH v3 00/13] Introduce mlx5 vDPA driver Matan Azrad
                       ` (10 preceding siblings ...)
  2020-02-02 16:03     ` [dpdk-dev] [PATCH v3 11/13] vdpa/mlx5: support live migration Matan Azrad
@ 2020-02-02 16:03     ` Matan Azrad
  2020-02-02 16:03     ` [dpdk-dev] [PATCH v3 13/13] vdpa/mlx5: disable ROCE Matan Azrad
                       ` (2 subsequent siblings)
  14 siblings, 0 replies; 174+ messages in thread
From: Matan Azrad @ 2020-02-02 16:03 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Maxime Coquelin
Support dev_conf and dev_close operations.
These operations allow vdpa traffic.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
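The control flow of the two operations (close before reconfiguring, and
close again when any setup step fails) can be sketched with a small model.
This is an illustration only: `struct dev`, `dev_config`, `dev_close` and
the `fail_setup` knob are hypothetical, and the counters replace the real
memory, virtq, steering and event setup steps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device state; the fields only model the control flow. */
struct dev {
	bool configured;
	int vid;
	int setup_calls;
	int teardown_calls;
	bool fail_setup; /* knob: make the setup step fail */
};

/* Release everything; safe to call on a partially configured device. */
static int
dev_close(struct dev *d)
{
	assert(d != NULL);
	d->teardown_calls++; /* stands in for steering/virtq/memory release */
	d->configured = false;
	d->vid = 0;
	return 0;
}

/* Configure the device for vid, closing any previous configuration. */
static int
dev_config(struct dev *d, int vid)
{
	assert(d != NULL);
	if (d->configured && dev_close(d))
		return -1;
	d->vid = vid;
	d->setup_calls++; /* stands in for the chained setup steps */
	if (d->fail_setup) {
		dev_close(d); /* roll back whatever was set up */
		return -1;
	}
	d->configured = true;
	return 0;
}
```

In this model a second dev_config() call on a configured device performs
exactly one teardown before the new setup, and a failed setup leaves the
device unconfigured with vid reset to 0.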
 drivers/vdpa/mlx5/mlx5_vdpa.c | 58 ++++++++++++++++++++++++++++++++++++++++---
 drivers/vdpa/mlx5/mlx5_vdpa.h |  1 +
 2 files changed, 55 insertions(+), 4 deletions(-)
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index 1bb6c68..57619d2 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -167,12 +167,59 @@
 	return 0;
 }
 
+static int
+mlx5_vdpa_dev_close(int vid)
+{
+	int did = rte_vhost_get_vdpa_device_id(vid);
+	struct mlx5_vdpa_priv *priv = mlx5_vdpa_find_priv_resource_by_did(did);
+	int ret = 0;
+
+	if (priv == NULL) {
+		DRV_LOG(ERR, "Invalid device id: %d.", did);
+		return -1;
+	}
+	if (priv->configured)
+		ret |= mlx5_vdpa_lm_log(priv);
+	mlx5_vdpa_cqe_event_unset(priv);
+	ret |= mlx5_vdpa_steer_unset(priv);
+	mlx5_vdpa_virtqs_release(priv);
+	mlx5_vdpa_event_qp_global_release(priv);
+	mlx5_vdpa_mem_dereg(priv);
+	priv->configured = 0;
+	priv->vid = 0;
+	return ret;
+}
+
+static int
+mlx5_vdpa_dev_config(int vid)
+{
+	int did = rte_vhost_get_vdpa_device_id(vid);
+	struct mlx5_vdpa_priv *priv = mlx5_vdpa_find_priv_resource_by_did(did);
+
+	if (priv == NULL) {
+		DRV_LOG(ERR, "Invalid device id: %d.", did);
+		return -EINVAL;
+	}
+	if (priv->configured && mlx5_vdpa_dev_close(vid)) {
+		DRV_LOG(ERR, "Failed to reconfigure vid %d.", vid);
+		return -1;
+	}
+	priv->vid = vid;
+	if (mlx5_vdpa_mem_register(priv) || mlx5_vdpa_virtqs_prepare(priv) ||
+	    mlx5_vdpa_steer_setup(priv) || mlx5_vdpa_cqe_event_setup(priv)) {
+		mlx5_vdpa_dev_close(vid);
+		return -1;
+	}
+	priv->configured = 1;
+	return 0;
+}
+
 static struct rte_vdpa_dev_ops mlx5_vdpa_ops = {
 	.get_queue_num = mlx5_vdpa_get_queue_num,
 	.get_features = mlx5_vdpa_get_vdpa_features,
 	.get_protocol_features = mlx5_vdpa_get_protocol_features,
-	.dev_conf = NULL,
-	.dev_close = NULL,
+	.dev_conf = mlx5_vdpa_dev_config,
+	.dev_close = mlx5_vdpa_dev_close,
 	.set_vring_state = mlx5_vdpa_set_vring_state,
 	.set_features = mlx5_vdpa_features_set,
 	.migration_done = NULL,
@@ -321,12 +368,15 @@
 			break;
 		}
 	}
-	if (found) {
+	if (found)
 		TAILQ_REMOVE(&priv_list, priv, next);
+	pthread_mutex_unlock(&priv_list_lock);
+	if (found) {
+		if (priv->configured)
+			mlx5_vdpa_dev_close(priv->vid);
 		mlx5_glue->close_device(priv->ctx);
 		rte_free(priv);
 	}
-	pthread_mutex_unlock(&priv_list_lock);
 	return 0;
 }
 
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 527436d..824e174 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -98,6 +98,7 @@ struct mlx5_vdpa_steer {
 
 struct mlx5_vdpa_priv {
 	TAILQ_ENTRY(mlx5_vdpa_priv) next;
+	uint8_t configured;
 	int id; /* vDPA device id. */
 	int vid; /* vhost device id. */
 	struct ibv_context *ctx; /* Device context. */
-- 
1.8.3.1
* [dpdk-dev] [PATCH v3 13/13] vdpa/mlx5: disable ROCE
  2020-02-02 16:03   ` [dpdk-dev] [PATCH v3 00/13] Introduce mlx5 vDPA driver Matan Azrad
                       ` (11 preceding siblings ...)
  2020-02-02 16:03     ` [dpdk-dev] [PATCH v3 12/13] vdpa/mlx5: support close and config operations Matan Azrad
@ 2020-02-02 16:03     ` Matan Azrad
  2020-02-03  9:27       ` Maxime Coquelin
  2020-02-03  8:34     ` [dpdk-dev] [PATCH v3 00/13] Introduce mlx5 vDPA driver Maxime Coquelin
  2020-02-03 16:42     ` Maxime Coquelin
  14 siblings, 1 reply; 174+ messages in thread
From: Matan Azrad @ 2020-02-02 16:03 UTC (permalink / raw)
  To: dev, Viacheslav Ovsiienko; +Cc: Maxime Coquelin
In order to support virtio queue creation by the FW, ROCE mode
should be disabled in the device.
Do it by Netlink, which is equivalent to the devlink tool commands:
	1. devlink dev param set pci/[pci] name enable_roce value false
	   cmode driverinit
	2. devlink dev reload pci/[pci]
Or by sysfs, which is equivalent to:
	echo 0 > /sys/bus/pci/devices/[pci]/roce_enable
The IB device is matched again after ROCE is disabled.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
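The "try Netlink, fall back to sysfs, then poll until the IB device
reappears" flow can be sketched independently of the mlx5 helpers. This is
a minimal sketch under assumptions: `disable_and_rematch` and the callback
types are hypothetical, and the stub callbacks exist only to exercise the
control flow.

```c
#define _DEFAULT_SOURCE /* for usleep() under strict -std=c11 */
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

typedef int (*disable_fn)(const char *addr);  /* returns 0 on success */
typedef void *(*lookup_fn)(const char *addr); /* NULL until device is back */

/*
 * Try the primary method, then the fallback; on success poll lookup()
 * up to max_retries times, sleeping delay_us between attempts.
 */
static void *
disable_and_rematch(const char *addr, disable_fn primary, disable_fn fallback,
		    lookup_fn lookup, int max_retries, unsigned int delay_us)
{
	int r;

	assert(addr != NULL);
	if (primary(addr) != 0 && fallback(addr) != 0)
		return NULL; /* both methods failed */
	for (r = max_retries; r > 0; r--) {
		void *dev = lookup(addr);

		if (dev != NULL)
			return dev;
		usleep(delay_us);
	}
	return NULL; /* device did not reappear in time */
}

/* Illustration stubs. */
static int lookup_calls;

static int
always_fail(const char *addr)
{
	(void)addr;
	return -1;
}

static int
always_ok(const char *addr)
{
	(void)addr;
	return 0;
}

static void *
appears_on_third_try(const char *addr)
{
	static int dummy;

	(void)addr;
	return ++lookup_calls >= 3 ? &dummy : NULL;
}
```

With these stubs, a failing primary and a succeeding fallback still reach
the polling loop, while two failing methods return without any lookup.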
 drivers/vdpa/mlx5/Makefile    |   2 +-
 drivers/vdpa/mlx5/meson.build |   2 +-
 drivers/vdpa/mlx5/mlx5_vdpa.c | 191 ++++++++++++++++++++++++++++++++++--------
 3 files changed, 160 insertions(+), 35 deletions(-)
diff --git a/drivers/vdpa/mlx5/Makefile b/drivers/vdpa/mlx5/Makefile
index d4a544c..7153217 100644
--- a/drivers/vdpa/mlx5/Makefile
+++ b/drivers/vdpa/mlx5/Makefile
@@ -29,7 +29,7 @@ CFLAGS += -D_XOPEN_SOURCE=600
 CFLAGS += $(WERROR_FLAGS)
 CFLAGS += -Wno-strict-prototypes
 LDLIBS += -lrte_common_mlx5
-LDLIBS += -lrte_eal -lrte_vhost -lrte_kvargs -lrte_bus_pci -lrte_sched
+LDLIBS += -lrte_eal -lrte_vhost -lrte_kvargs -lrte_pci -lrte_bus_pci -lrte_sched
 
 # A few warnings cannot be avoided in external headers.
 CFLAGS += -Wno-error=cast-qual
diff --git a/drivers/vdpa/mlx5/meson.build b/drivers/vdpa/mlx5/meson.build
index bb96dad..9c152e5 100644
--- a/drivers/vdpa/mlx5/meson.build
+++ b/drivers/vdpa/mlx5/meson.build
@@ -9,7 +9,7 @@ endif
 
 fmt_name = 'mlx5_vdpa'
 allow_experimental_apis = true
-deps += ['hash', 'common_mlx5', 'vhost', 'bus_pci', 'eal', 'sched']
+deps += ['hash', 'common_mlx5', 'vhost', 'pci', 'bus_pci', 'eal', 'sched']
 sources = files(
 	'mlx5_vdpa.c',
 	'mlx5_vdpa_mem.c',
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index 57619d2..710f305 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -1,15 +1,19 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright 2019 Mellanox Technologies, Ltd
  */
+#include <unistd.h>
+
 #include <rte_malloc.h>
 #include <rte_log.h>
 #include <rte_errno.h>
 #include <rte_bus_pci.h>
+#include <rte_pci.h>
 
 #include <mlx5_glue.h>
 #include <mlx5_common.h>
 #include <mlx5_devx_cmds.h>
 #include <mlx5_prm.h>
+#include <mlx5_nl.h>
 
 #include "mlx5_vdpa_utils.h"
 #include "mlx5_vdpa.h"
@@ -228,6 +232,145 @@
 	.get_notify_area = NULL,
 };
 
+static struct ibv_device *
+mlx5_vdpa_get_ib_device_match(struct rte_pci_addr *addr)
+{
+	int n;
+	struct ibv_device **ibv_list = mlx5_glue->get_device_list(&n);
+	struct ibv_device *ibv_match = NULL;
+
+	if (!ibv_list) {
+		rte_errno = ENOSYS;
+		return NULL;
+	}
+	while (n-- > 0) {
+		struct rte_pci_addr pci_addr;
+
+		DRV_LOG(DEBUG, "Checking device \"%s\"..", ibv_list[n]->name);
+		if (mlx5_dev_to_pci_addr(ibv_list[n]->ibdev_path, &pci_addr))
+			continue;
+		if (memcmp(addr, &pci_addr, sizeof(pci_addr)))
+			continue;
+		ibv_match = ibv_list[n];
+		break;
+	}
+	if (!ibv_match)
+		rte_errno = ENOENT;
+	mlx5_glue->free_device_list(ibv_list);
+	return ibv_match;
+}
+
+/* Try to disable ROCE by Netlink/Devlink. */
+static int
+mlx5_vdpa_nl_roce_disable(const char *addr)
+{
+	int nlsk_fd = mlx5_nl_init(NETLINK_GENERIC);
+	int devlink_id;
+	int enable;
+	int ret;
+
+	if (nlsk_fd < 0)
+		return nlsk_fd;
+	devlink_id = mlx5_nl_devlink_family_id_get(nlsk_fd);
+	if (devlink_id < 0) {
+		ret = devlink_id;
+		DRV_LOG(DEBUG, "Failed to get devlink id for ROCE operations by"
+			" Netlink.");
+		goto close;
+	}
+	ret = mlx5_nl_enable_roce_get(nlsk_fd, devlink_id, addr, &enable);
+	if (ret) {
+		DRV_LOG(DEBUG, "Failed to get ROCE enable by Netlink: %d.",
+			ret);
+		goto close;
+	} else if (!enable) {
+		DRV_LOG(INFO, "ROCE is already disabled (Netlink).");
+		goto close;
+	}
+	ret = mlx5_nl_enable_roce_set(nlsk_fd, devlink_id, addr, 0);
+	if (ret)
+		DRV_LOG(DEBUG, "Failed to disable ROCE by Netlink: %d.", ret);
+	else
+		DRV_LOG(INFO, "ROCE is disabled by Netlink successfully.");
+close:
+	close(nlsk_fd);
+	return ret;
+}
+
+/* Try to disable ROCE by sysfs. */
+static int
+mlx5_vdpa_sys_roce_disable(const char *addr)
+{
+	FILE *file_o;
+	int enable;
+	int ret;
+
+	MKSTR(file_p, "/sys/bus/pci/devices/%s/roce_enable", addr);
+	file_o = fopen(file_p, "rb");
+	if (!file_o) {
+		rte_errno = ENOTSUP;
+		return -ENOTSUP;
+	}
+	ret = fscanf(file_o, "%d", &enable);
+	if (ret != 1) {
+		rte_errno = EINVAL;
+		ret = -EINVAL;
+		goto close;
+	} else if (!enable) {
+		ret = 0;
+		DRV_LOG(INFO, "ROCE is already disabled (sysfs).");
+		goto close;
+	}
+	fclose(file_o);
+	file_o = fopen(file_p, "wb");
+	if (!file_o) {
+		rte_errno = ENOTSUP;
+		return -ENOTSUP;
+	}
+	fprintf(file_o, "0\n");
+	ret = 0;
+close:
+	if (ret)
+		DRV_LOG(DEBUG, "Failed to disable ROCE by sysfs: %d.", ret);
+	else
+		DRV_LOG(INFO, "ROCE is disabled by sysfs successfully.");
+	fclose(file_o);
+	return ret;
+}
+
+#define MLX5_VDPA_MAX_RETRIES 20
+#define MLX5_VDPA_USEC 1000
+static int
+mlx5_vdpa_roce_disable(struct rte_pci_addr *addr, struct ibv_device **ibv)
+{
+	char addr_name[64] = {0};
+
+	rte_pci_device_name(addr, addr_name, sizeof(addr_name));
+	/* First try to disable ROCE by Netlink, then fall back to sysfs. */
+	if (mlx5_vdpa_nl_roce_disable(addr_name) == 0 ||
+	    mlx5_vdpa_sys_roce_disable(addr_name) == 0) {
+		/*
+		 * ROCE was disabled successfully; wait for the IB device to
+		 * appear again after reload.
+		 */
+		int r;
+		struct ibv_device *ibv_new;
+
+		for (r = MLX5_VDPA_MAX_RETRIES; r; r--) {
+			ibv_new = mlx5_vdpa_get_ib_device_match(addr);
+			if (ibv_new) {
+				*ibv = ibv_new;
+				return 0;
+			}
+			usleep(MLX5_VDPA_USEC);
+		}
+		DRV_LOG(ERR, "Cannot match device %s after ROCE disable, "
+			"retries exceed %d", addr_name, MLX5_VDPA_MAX_RETRIES);
+		rte_errno = EAGAIN;
+	}
+	return -rte_errno;
+}
+
 /**
  * DPDK callback to register a PCI device.
  *
@@ -246,8 +389,7 @@
 mlx5_vdpa_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 		    struct rte_pci_device *pci_dev __rte_unused)
 {
-	struct ibv_device **ibv_list;
-	struct ibv_device *ibv_match = NULL;
+	struct ibv_device *ibv;
 	struct mlx5_vdpa_priv *priv = NULL;
 	struct ibv_context *ctx = NULL;
 	struct mlx5_hca_attr attr;
@@ -258,42 +400,25 @@
 			" driver.");
 		return 1;
 	}
-	errno = 0;
-	ibv_list = mlx5_glue->get_device_list(&ret);
-	if (!ibv_list) {
-		rte_errno = ENOSYS;
-		DRV_LOG(ERR, "Failed to get device list, is ib_uverbs loaded?");
+	ibv = mlx5_vdpa_get_ib_device_match(&pci_dev->addr);
+	if (!ibv) {
+		DRV_LOG(ERR, "No matching IB device for PCI slot "
+			PCI_PRI_FMT ".", pci_dev->addr.domain,
+			pci_dev->addr.bus, pci_dev->addr.devid,
+			pci_dev->addr.function);
 		return -rte_errno;
-	}
-	while (ret-- > 0) {
-		struct rte_pci_addr pci_addr;
-
-		DRV_LOG(DEBUG, "Checking device \"%s\"..", ibv_list[ret]->name);
-		if (mlx5_dev_to_pci_addr(ibv_list[ret]->ibdev_path, &pci_addr))
-			continue;
-		if (pci_dev->addr.domain != pci_addr.domain ||
-		    pci_dev->addr.bus != pci_addr.bus ||
-		    pci_dev->addr.devid != pci_addr.devid ||
-		    pci_dev->addr.function != pci_addr.function)
-			continue;
+	} else {
 		DRV_LOG(INFO, "PCI information matches for device \"%s\".",
-			ibv_list[ret]->name);
-		ibv_match = ibv_list[ret];
-		break;
+			ibv->name);
 	}
-	mlx5_glue->free_device_list(ibv_list);
-	if (!ibv_match) {
-		DRV_LOG(ERR, "No matching IB device for PCI slot "
-			"%" SCNx32 ":%" SCNx8 ":%" SCNx8 ".%" SCNx8 ".",
-			pci_dev->addr.domain, pci_dev->addr.bus,
-			pci_dev->addr.devid, pci_dev->addr.function);
-		rte_errno = ENOENT;
-		return -rte_errno;
+	if (mlx5_vdpa_roce_disable(&pci_dev->addr, &ibv) != 0) {
+		DRV_LOG(WARNING, "Failed to disable ROCE for \"%s\".",
+			ibv->name);
+		//return -rte_errno;
 	}
-	ctx = mlx5_glue->dv_open_device(ibv_match);
+	ctx = mlx5_glue->dv_open_device(ibv);
 	if (!ctx) {
-		DRV_LOG(ERR, "Failed to open IB device \"%s\".",
-			ibv_match->name);
+		DRV_LOG(ERR, "Failed to open IB device \"%s\".", ibv->name);
 		rte_errno = ENODEV;
 		return -rte_errno;
 	}
-- 
1.8.3.1
* Re: [dpdk-dev] [PATCH v3 00/13] Introduce mlx5 vDPA driver
  2020-02-02 16:03   ` [dpdk-dev] [PATCH v3 00/13] Introduce mlx5 vDPA driver Matan Azrad
                       ` (12 preceding siblings ...)
  2020-02-02 16:03     ` [dpdk-dev] [PATCH v3 13/13] vdpa/mlx5: disable ROCE Matan Azrad
@ 2020-02-03  8:34     ` Maxime Coquelin
  2020-02-03 16:42     ` Maxime Coquelin
  14 siblings, 0 replies; 174+ messages in thread
From: Maxime Coquelin @ 2020-02-03  8:34 UTC (permalink / raw)
  To: Matan Azrad, dev, Viacheslav Ovsiienko
Thanks Matan, it looks OK to me.
On 2/2/20 5:03 PM, Matan Azrad wrote:
> v2:
> - Reorder patches into 2 series - this is the second part of the previous series splitting.
> - Fix spelling and per-patch compilation issues.
> - moved to use claim_zero instead of pure asserts.
> 
> v3:
> - Address Maxime C comments.
> - Adjust to the last MLX5_ASSERT introduction.
>  
> 
> Matan Azrad (13):
>   drivers: introduce mlx5 vDPA driver
>   vdpa/mlx5: support queues number operation
>   vdpa/mlx5: support features get operations
>   vdpa/mlx5: prepare memory regions
>   vdpa/mlx5: prepare HW queues
>   vdpa/mlx5: prepare virtio queues
>   vdpa/mlx5: support stateless offloads
>   vdpa/mlx5: add basic steering configurations
>   vdpa/mlx5: support queue state operation
>   vdpa/mlx5: map doorbell
>   vdpa/mlx5: support live migration
>   vdpa/mlx5: support close and config operations
>   vdpa/mlx5: disable ROCE
> 
>  MAINTAINERS                                     |   7 +
>  config/common_base                              |   5 +
>  doc/guides/rel_notes/release_20_02.rst          |   5 +
>  doc/guides/vdpadevs/features/mlx5.ini           |  27 ++
>  doc/guides/vdpadevs/index.rst                   |   1 +
>  doc/guides/vdpadevs/mlx5.rst                    | 111 +++++
>  drivers/common/Makefile                         |   2 +-
>  drivers/common/mlx5/Makefile                    |  17 +-
>  drivers/common/mlx5/mlx5_prm.h                  |   4 +
>  drivers/meson.build                             |   8 +-
>  drivers/vdpa/Makefile                           |   2 +
>  drivers/vdpa/meson.build                        |   3 +-
>  drivers/vdpa/mlx5/Makefile                      |  58 +++
>  drivers/vdpa/mlx5/meson.build                   |  38 ++
>  drivers/vdpa/mlx5/mlx5_vdpa.c                   | 563 ++++++++++++++++++++++++
>  drivers/vdpa/mlx5/mlx5_vdpa.h                   | 309 +++++++++++++
>  drivers/vdpa/mlx5/mlx5_vdpa_event.c             | 400 +++++++++++++++++
>  drivers/vdpa/mlx5/mlx5_vdpa_lm.c                | 129 ++++++
>  drivers/vdpa/mlx5/mlx5_vdpa_mem.c               | 346 +++++++++++++++
>  drivers/vdpa/mlx5/mlx5_vdpa_steer.c             | 283 ++++++++++++
>  drivers/vdpa/mlx5/mlx5_vdpa_utils.h             |  20 +
>  drivers/vdpa/mlx5/mlx5_vdpa_virtq.c             | 388 ++++++++++++++++
>  drivers/vdpa/mlx5/rte_pmd_mlx5_vdpa_version.map |   3 +
>  mk/rte.app.mk                                   |  15 +-
>  24 files changed, 2727 insertions(+), 17 deletions(-)
>  create mode 100644 doc/guides/vdpadevs/features/mlx5.ini
>  create mode 100644 doc/guides/vdpadevs/mlx5.rst
>  create mode 100644 drivers/vdpa/mlx5/Makefile
>  create mode 100644 drivers/vdpa/mlx5/meson.build
>  create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa.c
>  create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa.h
>  create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_event.c
>  create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_lm.c
>  create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_mem.c
>  create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_steer.c
>  create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_utils.h
>  create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
>  create mode 100644 drivers/vdpa/mlx5/rte_pmd_mlx5_vdpa_version.map
> 
* Re: [dpdk-dev] [PATCH v3 13/13] vdpa/mlx5: disable ROCE
  2020-02-02 16:03     ` [dpdk-dev] [PATCH v3 13/13] vdpa/mlx5: disable ROCE Matan Azrad
@ 2020-02-03  9:27       ` Maxime Coquelin
  2020-02-03 11:00         ` Maxime Coquelin
  0 siblings, 1 reply; 174+ messages in thread
From: Maxime Coquelin @ 2020-02-03  9:27 UTC (permalink / raw)
  To: Matan Azrad, dev, Viacheslav Ovsiienko
Hi Matan,
On 2/2/20 5:03 PM, Matan Azrad wrote:
> In order to support virtio queue creation by the FW, ROCE mode
> should be disabled in the device.
> 
> Do it by netlink which is like the devlink tool commands:
> 	1. devlink dev param set pci/[pci] name enable_roce value false
> 	   cmode driverinit
>     	2. devlink dev reload pci/[pci]
> Or by sysfs which is like:
> 	echo 0 >  /sys/bus/pci/devices/[pci]/roce_enable
> 
> The IB device is matched again after ROCE disabling.
> 
> Signed-off-by: Matan Azrad <matan@mellanox.com>
> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>  drivers/vdpa/mlx5/Makefile    |   2 +-
>  drivers/vdpa/mlx5/meson.build |   2 +-
>  drivers/vdpa/mlx5/mlx5_vdpa.c | 191 ++++++++++++++++++++++++++++++++++--------
>  3 files changed, 160 insertions(+), 35 deletions(-)
...
> diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
> index 57619d2..710f305 100644
> --- a/drivers/vdpa/mlx5/mlx5_vdpa.c
> +++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
...
> @@ -246,8 +389,7 @@
>  mlx5_vdpa_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
>  		    struct rte_pci_device *pci_dev __rte_unused)
>  {
> -	struct ibv_device **ibv_list;
> -	struct ibv_device *ibv_match = NULL;
> +	struct ibv_device *ibv;
>  	struct mlx5_vdpa_priv *priv = NULL;
>  	struct ibv_context *ctx = NULL;
>  	struct mlx5_hca_attr attr;
> @@ -258,42 +400,25 @@
>  			" driver.");
>  		return 1;
>  	}
> -	errno = 0;
> -	ibv_list = mlx5_glue->get_device_list(&ret);
> -	if (!ibv_list) {
> -		rte_errno = ENOSYS;
> -		DRV_LOG(ERR, "Failed to get device list, is ib_uverbs loaded?");
> +	ibv = mlx5_vdpa_get_ib_device_match(&pci_dev->addr);
> +	if (!ibv) {
> +		DRV_LOG(ERR, "No matching IB device for PCI slot "
> +			PCI_PRI_FMT ".", pci_dev->addr.domain,
> +			pci_dev->addr.bus, pci_dev->addr.devid,
> +			pci_dev->addr.function);
>  		return -rte_errno;
> -	}
> -	while (ret-- > 0) {
> -		struct rte_pci_addr pci_addr;
> -
> -		DRV_LOG(DEBUG, "Checking device \"%s\"..", ibv_list[ret]->name);
> -		if (mlx5_dev_to_pci_addr(ibv_list[ret]->ibdev_path, &pci_addr))
> -			continue;
> -		if (pci_dev->addr.domain != pci_addr.domain ||
> -		    pci_dev->addr.bus != pci_addr.bus ||
> -		    pci_dev->addr.devid != pci_addr.devid ||
> -		    pci_dev->addr.function != pci_addr.function)
> -			continue;
> +	} else {
>  		DRV_LOG(INFO, "PCI information matches for device \"%s\".",
> -			ibv_list[ret]->name);
> -		ibv_match = ibv_list[ret];
> -		break;
> +			ibv->name);
>  	}
> -	mlx5_glue->free_device_list(ibv_list);
> -	if (!ibv_match) {
> -		DRV_LOG(ERR, "No matching IB device for PCI slot "
> -			"%" SCNx32 ":%" SCNx8 ":%" SCNx8 ".%" SCNx8 ".",
> -			pci_dev->addr.domain, pci_dev->addr.bus,
> -			pci_dev->addr.devid, pci_dev->addr.function);
> -		rte_errno = ENOENT;
> -		return -rte_errno;
> +	if (mlx5_vdpa_roce_disable(&pci_dev->addr, &ibv) != 0) {
> +		DRV_LOG(WARNING, "Failed to disable ROCE for \"%s\".",
> +			ibv->name);
> +		//return -rte_errno;
>  	}
Is that commented return expected?
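
For reference, the sysfs route described in the commit message could be sketched in C roughly as below. This is only an illustration, not the code from the patch: the function name sketch_roce_disable_sysfs and the sysfs_devices_dir parameter are hypothetical, and it assumes the roce_enable attribute exists under the PCI device directory as the commit message describes.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not part of the patch): write "0" to
 * <sysfs_devices_dir>/<pci>/roce_enable, mirroring
 *   echo 0 > /sys/bus/pci/devices/[pci]/roce_enable
 * The base directory is a parameter so the sketch can be tried
 * against a scratch directory instead of the real sysfs.
 * Returns 0 on success, -1 on failure.
 */
static int
sketch_roce_disable_sysfs(const char *sysfs_devices_dir, const char *pci)
{
	char path[512];
	FILE *f;
	int n;
	int ret = 0;

	n = snprintf(path, sizeof(path), "%s/%s/roce_enable",
		     sysfs_devices_dir, pci);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1; /* Path did not fit in the buffer. */
	f = fopen(path, "w");
	if (f == NULL)
		return -1; /* Attribute missing or not writable. */
	if (fputs("0", f) == EOF)
		ret = -1;
	if (fclose(f) != 0)
		ret = -1; /* The write may only fail at flush time. */
	return ret;
}
```

On a real system the call would presumably look like sketch_roce_disable_sysfs("/sys/bus/pci/devices", "0000:03:00.0"), where the PCI address is just an example value.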
* Re: [dpdk-dev] [PATCH v3 13/13] vdpa/mlx5: disable ROCE
  2020-02-03  9:27       ` Maxime Coquelin
@ 2020-02-03 11:00         ` Maxime Coquelin
  2020-02-03 12:44           ` Matan Azrad
  0 siblings, 1 reply; 174+ messages in thread
From: Maxime Coquelin @ 2020-02-03 11:00 UTC (permalink / raw)
  To: Matan Azrad, dev, Viacheslav Ovsiienko
On 2/3/20 10:27 AM, Maxime Coquelin wrote:
> Hi Matan,
> 
> On 2/2/20 5:03 PM, Matan Azrad wrote:
>> In order to support virtio queue creation by the FW, ROCE mode
>> should be disabled in the device.
>>
>> Do it by netlink which is like the devlink tool commands:
>> 	1. devlink dev param set pci/[pci] name enable_roce value false
>> 	   cmode driverinit
>>     	2. devlink dev reload pci/[pci]
>> Or by sysfs which is like:
>> 	echo 0 >  /sys/bus/pci/devices/[pci]/roce_enable
>>
>> The IB device is matched again after ROCE disabling.
>>
>> Signed-off-by: Matan Azrad <matan@mellanox.com>
>> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
>> Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>> ---
>>  drivers/vdpa/mlx5/Makefile    |   2 +-
>>  drivers/vdpa/mlx5/meson.build |   2 +-
>>  drivers/vdpa/mlx5/mlx5_vdpa.c | 191 ++++++++++++++++++++++++++++++++++--------
>>  3 files changed, 160 insertions(+), 35 deletions(-)
> ...
>> diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
>> index 57619d2..710f305 100644
>> --- a/drivers/vdpa/mlx5/mlx5_vdpa.c
>> +++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
> 
> ...
> 
>> @@ -246,8 +389,7 @@
>>  mlx5_vdpa_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
>>  		    struct rte_pci_device *pci_dev __rte_unused)
>>  {
>> -	struct ibv_device **ibv_list;
>> -	struct ibv_device *ibv_match = NULL;
>> +	struct ibv_device *ibv;
>>  	struct mlx5_vdpa_priv *priv = NULL;
>>  	struct ibv_context *ctx = NULL;
>>  	struct mlx5_hca_attr attr;
>> @@ -258,42 +400,25 @@
>>  			" driver.");
>>  		return 1;
>>  	}
>> -	errno = 0;
>> -	ibv_list = mlx5_glue->get_device_list(&ret);
>> -	if (!ibv_list) {
>> -		rte_errno = ENOSYS;
>> -		DRV_LOG(ERR, "Failed to get device list, is ib_uverbs loaded?");
>> +	ibv = mlx5_vdpa_get_ib_device_match(&pci_dev->addr);
>> +	if (!ibv) {
>> +		DRV_LOG(ERR, "No matching IB device for PCI slot "
>> +			PCI_PRI_FMT ".", pci_dev->addr.domain,
>> +			pci_dev->addr.bus, pci_dev->addr.devid,
>> +			pci_dev->addr.function);
>>  		return -rte_errno;
>> -	}
>> -	while (ret-- > 0) {
>> -		struct rte_pci_addr pci_addr;
>> -
>> -		DRV_LOG(DEBUG, "Checking device \"%s\"..", ibv_list[ret]->name);
>> -		if (mlx5_dev_to_pci_addr(ibv_list[ret]->ibdev_path, &pci_addr))
>> -			continue;
>> -		if (pci_dev->addr.domain != pci_addr.domain ||
>> -		    pci_dev->addr.bus != pci_addr.bus ||
>> -		    pci_dev->addr.devid != pci_addr.devid ||
>> -		    pci_dev->addr.function != pci_addr.function)
>> -			continue;
>> +	} else {
>>  		DRV_LOG(INFO, "PCI information matches for device \"%s\".",
>> -			ibv_list[ret]->name);
>> -		ibv_match = ibv_list[ret];
>> -		break;
>> +			ibv->name);
>>  	}
>> -	mlx5_glue->free_device_list(ibv_list);
>> -	if (!ibv_match) {
>> -		DRV_LOG(ERR, "No matching IB device for PCI slot "
>> -			"%" SCNx32 ":%" SCNx8 ":%" SCNx8 ".%" SCNx8 ".",
>> -			pci_dev->addr.domain, pci_dev->addr.bus,
>> -			pci_dev->addr.devid, pci_dev->addr.function);
>> -		rte_errno = ENOENT;
>> -		return -rte_errno;
>> +	if (mlx5_vdpa_roce_disable(&pci_dev->addr, &ibv) != 0) {
>> +		DRV_LOG(WARNING, "Failed to disable ROCE for \"%s\".",
>> +			ibv->name);
>> +		//return -rte_errno;
>>  	}
> 
> Is that commented return expected?
> 
Please let me know if I should remove the comment, or remove the return.
Thanks,
Maxime
* Re: [dpdk-dev] [PATCH v3 13/13] vdpa/mlx5: disable ROCE
  2020-02-03 11:00         ` Maxime Coquelin
@ 2020-02-03 12:44           ` Matan Azrad
  2020-02-03 12:45             ` Maxime Coquelin
  0 siblings, 1 reply; 174+ messages in thread
From: Matan Azrad @ 2020-02-03 12:44 UTC (permalink / raw)
  To: Maxime Coquelin, dev, Slava Ovsiienko
From: Maxime Coquelin
> Sent: Monday, February 3, 2020 1:00 PM
> To: Matan Azrad <matan@mellanox.com>; dev@dpdk.org; Slava Ovsiienko
> <viacheslavo@mellanox.com>
> Subject: Re: [PATCH v3 13/13] vdpa/mlx5: disable ROCE
> 
> 
> 
> On 2/3/20 10:27 AM, Maxime Coquelin wrote:
> > Hi Matan,
> >
> > On 2/2/20 5:03 PM, Matan Azrad wrote:
> >> In order to support virtio queue creation by the FW, ROCE mode should
> >> be disabled in the device.
> >>
> >> Do it by netlink which is like the devlink tool commands:
> >> 	1. devlink dev param set pci/[pci] name enable_roce value false
> >> 	   cmode driverinit
> >>     	2. devlink dev reload pci/[pci]
> >> Or by sysfs which is like:
> >> 	echo 0 >  /sys/bus/pci/devices/[pci]/roce_enable
> >>
> >> The IB device is matched again after ROCE disabling.
> >>
> >> Signed-off-by: Matan Azrad <matan@mellanox.com>
> >> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> >> Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> >> ---
> >>  drivers/vdpa/mlx5/Makefile    |   2 +-
> >>  drivers/vdpa/mlx5/meson.build |   2 +-
> >>  drivers/vdpa/mlx5/mlx5_vdpa.c | 191 ++++++++++++++++++++++++++++++++++--------
> >>  3 files changed, 160 insertions(+), 35 deletions(-)
> > ...
> >> diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
> >> index 57619d2..710f305 100644
> >> --- a/drivers/vdpa/mlx5/mlx5_vdpa.c
> >> +++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
> >
> > ...
> >
> >> @@ -246,8 +389,7 @@
> >>  mlx5_vdpa_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
> >>  		    struct rte_pci_device *pci_dev __rte_unused)
> >>  {
> >> -	struct ibv_device **ibv_list;
> >> -	struct ibv_device *ibv_match = NULL;
> >> +	struct ibv_device *ibv;
> >>  	struct mlx5_vdpa_priv *priv = NULL;
> >>  	struct ibv_context *ctx = NULL;
> >>  	struct mlx5_hca_attr attr;
> >> @@ -258,42 +400,25 @@
> >>  			" driver.");
> >>  		return 1;
> >>  	}
> >> -	errno = 0;
> >> -	ibv_list = mlx5_glue->get_device_list(&ret);
> >> -	if (!ibv_list) {
> >> -		rte_errno = ENOSYS;
> >> -		DRV_LOG(ERR, "Failed to get device list, is ib_uverbs loaded?");
> >> +	ibv = mlx5_vdpa_get_ib_device_match(&pci_dev->addr);
> >> +	if (!ibv) {
> >> +		DRV_LOG(ERR, "No matching IB device for PCI slot "
> >> +			PCI_PRI_FMT ".", pci_dev->addr.domain,
> >> +			pci_dev->addr.bus, pci_dev->addr.devid,
> >> +			pci_dev->addr.function);
> >>  		return -rte_errno;
> >> -	}
> >> -	while (ret-- > 0) {
> >> -		struct rte_pci_addr pci_addr;
> >> -
> >> -		DRV_LOG(DEBUG, "Checking device \"%s\"..", ibv_list[ret]->name);
> >> -		if (mlx5_dev_to_pci_addr(ibv_list[ret]->ibdev_path, &pci_addr))
> >> -			continue;
> >> -		if (pci_dev->addr.domain != pci_addr.domain ||
> >> -		    pci_dev->addr.bus != pci_addr.bus ||
> >> -		    pci_dev->addr.devid != pci_addr.devid ||
> >> -		    pci_dev->addr.function != pci_addr.function)
> >> -			continue;
> >> +	} else {
> >>  		DRV_LOG(INFO, "PCI information matches for device \"%s\".",
> >> -			ibv_list[ret]->name);
> >> -		ibv_match = ibv_list[ret];
> >> -		break;
> >> +			ibv->name);
> >>  	}
> >> -	mlx5_glue->free_device_list(ibv_list);
> >> -	if (!ibv_match) {
> >> -		DRV_LOG(ERR, "No matching IB device for PCI slot "
> >> -			"%" SCNx32 ":%" SCNx8 ":%" SCNx8 ".%" SCNx8 ".",
> >> -			pci_dev->addr.domain, pci_dev->addr.bus,
> >> -			pci_dev->addr.devid, pci_dev->addr.function);
> >> -		rte_errno = ENOENT;
> >> -		return -rte_errno;
> >> +	if (mlx5_vdpa_roce_disable(&pci_dev->addr, &ibv) != 0) {
> >> +		DRV_LOG(WARNING, "Failed to disable ROCE for \"%s\".",
> >> +			ibv->name);
> >> +		//return -rte_errno;
> >>  	}
> >
> > Is that commented return expected?
> >
> 
> Please let me know if I should remove the comment, or remove the return.
Sorry, I forgot the comment marker, good catch!
The return should not be commented out; just remove the "//".
Can you do it in integration?
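
For reference, a minimal sketch of the error path with the return active, which is the behaviour asked for above. The names sketch_errno, sketch_roce_disable and sketch_probe are hypothetical stand-ins (for rte_errno, mlx5_vdpa_roce_disable() and the probe callback); this is not the driver code, and the EIO value is only an assumption for illustration.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for rte_errno. */
static int sketch_errno;

/* Hypothetical stand-in for mlx5_vdpa_roce_disable(); fails on request. */
static int
sketch_roce_disable(int should_fail)
{
	if (should_fail) {
		sketch_errno = EIO; /* Assumed error code, for illustration. */
		return -1;
	}
	return 0;
}

/*
 * Probe path with the return restored: a ROCE disable failure is
 * logged and aborts the probe instead of being silently ignored.
 */
static int
sketch_probe(int roce_disable_fails)
{
	if (sketch_roce_disable(roce_disable_fails) != 0) {
		fprintf(stderr, "Failed to disable ROCE.\n");
		return -sketch_errno;
	}
	/* The rest of the probe would continue here. */
	return 0;
}
```

With the "//" left in place, the failing case would log the warning and fall through to the rest of the probe, which is the difference the review question was about.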
> 
> Thanks,
> Maxime
* Re: [dpdk-dev] [PATCH v3 13/13] vdpa/mlx5: disable ROCE
  2020-02-03 12:44           ` Matan Azrad
@ 2020-02-03 12:45             ` Maxime Coquelin
  0 siblings, 0 replies; 174+ messages in thread
From: Maxime Coquelin @ 2020-02-03 12:45 UTC (permalink / raw)
  To: Matan Azrad, dev, Slava Ovsiienko
On 2/3/20 1:44 PM, Matan Azrad wrote:
> 
> 
> From: Maxime Coquelin
>> Sent: Monday, February 3, 2020 1:00 PM
>> To: Matan Azrad <matan@mellanox.com>; dev@dpdk.org; Slava Ovsiienko
>> <viacheslavo@mellanox.com>
>> Subject: Re: [PATCH v3 13/13] vdpa/mlx5: disable ROCE
>>
>>
>>
>> On 2/3/20 10:27 AM, Maxime Coquelin wrote:
>>> Hi Matan,
>>>
>>> On 2/2/20 5:03 PM, Matan Azrad wrote:
>>>> In order to support virtio queue creation by the FW, ROCE mode should
>>>> be disabled in the device.
>>>>
>>>> Do it by netlink which is like the devlink tool commands:
>>>> 	1. devlink dev param set pci/[pci] name enable_roce value false
>>>> 	   cmode driverinit
>>>>     	2. devlink dev reload pci/[pci]
>>>> Or by sysfs which is like:
>>>> 	echo 0 >  /sys/bus/pci/devices/[pci]/roce_enable
>>>>
>>>> The IB device is matched again after ROCE disabling.
>>>>
>>>> Signed-off-by: Matan Azrad <matan@mellanox.com>
>>>> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
>>>> Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>>>> ---
>>>>  drivers/vdpa/mlx5/Makefile    |   2 +-
>>>>  drivers/vdpa/mlx5/meson.build |   2 +-
>>>>  drivers/vdpa/mlx5/mlx5_vdpa.c | 191 ++++++++++++++++++++++++++++++++++--------
>>>>  3 files changed, 160 insertions(+), 35 deletions(-)
>>> ...
>>>> diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
>>>> index 57619d2..710f305 100644
>>>> --- a/drivers/vdpa/mlx5/mlx5_vdpa.c
>>>> +++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
>>>
>>> ...
>>>
>>>> @@ -246,8 +389,7 @@
>>>>  mlx5_vdpa_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
>>>>  		    struct rte_pci_device *pci_dev __rte_unused)
>>>>  {
>>>> -	struct ibv_device **ibv_list;
>>>> -	struct ibv_device *ibv_match = NULL;
>>>> +	struct ibv_device *ibv;
>>>>  	struct mlx5_vdpa_priv *priv = NULL;
>>>>  	struct ibv_context *ctx = NULL;
>>>>  	struct mlx5_hca_attr attr;
>>>> @@ -258,42 +400,25 @@
>>>>  			" driver.");
>>>>  		return 1;
>>>>  	}
>>>> -	errno = 0;
>>>> -	ibv_list = mlx5_glue->get_device_list(&ret);
>>>> -	if (!ibv_list) {
>>>> -		rte_errno = ENOSYS;
>>>> -		DRV_LOG(ERR, "Failed to get device list, is ib_uverbs loaded?");
>>>> +	ibv = mlx5_vdpa_get_ib_device_match(&pci_dev->addr);
>>>> +	if (!ibv) {
>>>> +		DRV_LOG(ERR, "No matching IB device for PCI slot "
>>>> +			PCI_PRI_FMT ".", pci_dev->addr.domain,
>>>> +			pci_dev->addr.bus, pci_dev->addr.devid,
>>>> +			pci_dev->addr.function);
>>>>  		return -rte_errno;
>>>> -	}
>>>> -	while (ret-- > 0) {
>>>> -		struct rte_pci_addr pci_addr;
>>>> -
>>>> -		DRV_LOG(DEBUG, "Checking device \"%s\"..", ibv_list[ret]->name);
>>>> -		if (mlx5_dev_to_pci_addr(ibv_list[ret]->ibdev_path, &pci_addr))
>>>> -			continue;
>>>> -		if (pci_dev->addr.domain != pci_addr.domain ||
>>>> -		    pci_dev->addr.bus != pci_addr.bus ||
>>>> -		    pci_dev->addr.devid != pci_addr.devid ||
>>>> -		    pci_dev->addr.function != pci_addr.function)
>>>> -			continue;
>>>> +	} else {
>>>>  		DRV_LOG(INFO, "PCI information matches for device \"%s\".",
>>>> -			ibv_list[ret]->name);
>>>> -		ibv_match = ibv_list[ret];
>>>> -		break;
>>>> +			ibv->name);
>>>>  	}
>>>> -	mlx5_glue->free_device_list(ibv_list);
>>>> -	if (!ibv_match) {
>>>> -		DRV_LOG(ERR, "No matching IB device for PCI slot "
>>>> -			"%" SCNx32 ":%" SCNx8 ":%" SCNx8 ".%" SCNx8 ".",
>>>> -			pci_dev->addr.domain, pci_dev->addr.bus,
>>>> -			pci_dev->addr.devid, pci_dev->addr.function);
>>>> -		rte_errno = ENOENT;
>>>> -		return -rte_errno;
>>>> +	if (mlx5_vdpa_roce_disable(&pci_dev->addr, &ibv) != 0) {
>>>> +		DRV_LOG(WARNING, "Failed to disable ROCE for \"%s\".",
>>>> +			ibv->name);
>>>> +		//return -rte_errno;
>>>>  	}
>>>
>>> Is that commented return expected?
>>>
>>
>> Please let me know if I should remove the comment, or remove the return.
> 
> Sorry, I forgot the comment marker, good catch!
> The return should not be commented out; just remove the "//".
Thanks Matan.
> Can you do it in integration?
Sure, will do now.
Maxime
>>
>> Thanks,
>> Maxime
> 
* Re: [dpdk-dev] [PATCH v2 00/13] Introduce mlx5 vDPA driver
  2020-01-29 10:08 ` [dpdk-dev] [PATCH v2 00/13] Introduce mlx5 vDPA driver Matan Azrad
                     ` (13 preceding siblings ...)
  2020-02-02 16:03   ` [dpdk-dev] [PATCH v3 00/13] Introduce mlx5 vDPA driver Matan Azrad
@ 2020-02-03 13:24   ` Maxime Coquelin
  2020-02-03 16:41     ` Maxime Coquelin
  14 siblings, 1 reply; 174+ messages in thread
From: Maxime Coquelin @ 2020-02-03 13:24 UTC (permalink / raw)
  To: Matan Azrad, dev, Viacheslav Ovsiienko
On 1/29/20 11:08 AM, Matan Azrad wrote:
> v2:
> - Reorder patches into 2 series - this is the second part of the previous series split.
> - Fix spelling and per-patch compilation issues.
> - moved to use claim_zero instead of pure asserts.
> 
> Matan Azrad (13):
>   drivers: introduce mlx5 vDPA driver
>   vdpa/mlx5: support queues number operation
>   vdpa/mlx5: support features get operations
>   vdpa/mlx5: prepare memory regions
>   vdpa/mlx5: prepare HW queues
>   vdpa/mlx5: prepare virtio queues
>   vdpa/mlx5: support stateless offloads
>   vdpa/mlx5: add basic steering configurations
>   vdpa/mlx5: support queue state operation
>   vdpa/mlx5: map doorbell
>   vdpa/mlx5: support live migration
>   vdpa/mlx5: support close and config operations
>   vdpa/mlx5: disable ROCE
> 
>  MAINTAINERS                                     |   7 +
>  config/common_base                              |   5 +
>  doc/guides/rel_notes/release_20_02.rst          |   5 +
>  doc/guides/vdpadevs/features/mlx5.ini           |  27 ++
>  doc/guides/vdpadevs/index.rst                   |   1 +
>  doc/guides/vdpadevs/mlx5.rst                    | 111 +++++
>  drivers/common/Makefile                         |   2 +-
>  drivers/common/mlx5/Makefile                    |  17 +-
>  drivers/common/mlx5/mlx5_prm.h                  |   4 +
>  drivers/meson.build                             |   8 +-
>  drivers/vdpa/Makefile                           |   2 +
>  drivers/vdpa/meson.build                        |   3 +-
>  drivers/vdpa/mlx5/Makefile                      |  43 ++
>  drivers/vdpa/mlx5/meson.build                   |  34 ++
>  drivers/vdpa/mlx5/mlx5_vdpa.c                   | 563 ++++++++++++++++++++++++
>  drivers/vdpa/mlx5/mlx5_vdpa.h                   | 303 +++++++++++++
>  drivers/vdpa/mlx5/mlx5_vdpa_event.c             | 399 +++++++++++++++++
>  drivers/vdpa/mlx5/mlx5_vdpa_lm.c                | 130 ++++++
>  drivers/vdpa/mlx5/mlx5_vdpa_mem.c               | 346 +++++++++++++++
>  drivers/vdpa/mlx5/mlx5_vdpa_steer.c             | 283 ++++++++++++
>  drivers/vdpa/mlx5/mlx5_vdpa_utils.h             |  20 +
>  drivers/vdpa/mlx5/mlx5_vdpa_virtq.c             | 388 ++++++++++++++++
>  drivers/vdpa/mlx5/rte_pmd_mlx5_vdpa_version.map |   3 +
>  mk/rte.app.mk                                   |  15 +-
>  24 files changed, 2702 insertions(+), 17 deletions(-)
>  create mode 100644 doc/guides/vdpadevs/features/mlx5.ini
>  create mode 100644 doc/guides/vdpadevs/mlx5.rst
>  create mode 100644 drivers/vdpa/mlx5/Makefile
>  create mode 100644 drivers/vdpa/mlx5/meson.build
>  create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa.c
>  create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa.h
>  create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_event.c
>  create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_lm.c
>  create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_mem.c
>  create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_steer.c
>  create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_utils.h
>  create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
>  create mode 100644 drivers/vdpa/mlx5/rte_pmd_mlx5_vdpa_version.map
> 
Applied to dpdk-next-virtio.
Thanks,
Maxime
* Re: [dpdk-dev] [PATCH v2 00/13] Introduce mlx5 vDPA driver
  2020-02-03 13:24   ` [dpdk-dev] [PATCH v2 " Maxime Coquelin
@ 2020-02-03 16:41     ` Maxime Coquelin
  0 siblings, 0 replies; 174+ messages in thread
From: Maxime Coquelin @ 2020-02-03 16:41 UTC (permalink / raw)
  To: Matan Azrad, dev, Viacheslav Ovsiienko
On 2/3/20 2:24 PM, Maxime Coquelin wrote:
> 
> 
> On 1/29/20 11:08 AM, Matan Azrad wrote:
>> v2:
>> - Reorder patches into 2 series - this is the second part of the previous series split.
>> - Fix spelling and per-patch compilation issues.
>> - moved to use claim_zero instead of pure asserts.
>>
>> Matan Azrad (13):
>>   drivers: introduce mlx5 vDPA driver
>>   vdpa/mlx5: support queues number operation
>>   vdpa/mlx5: support features get operations
>>   vdpa/mlx5: prepare memory regions
>>   vdpa/mlx5: prepare HW queues
>>   vdpa/mlx5: prepare virtio queues
>>   vdpa/mlx5: support stateless offloads
>>   vdpa/mlx5: add basic steering configurations
>>   vdpa/mlx5: support queue state operation
>>   vdpa/mlx5: map doorbell
>>   vdpa/mlx5: support live migration
>>   vdpa/mlx5: support close and config operations
>>   vdpa/mlx5: disable ROCE
>>
>>  MAINTAINERS                                     |   7 +
>>  config/common_base                              |   5 +
>>  doc/guides/rel_notes/release_20_02.rst          |   5 +
>>  doc/guides/vdpadevs/features/mlx5.ini           |  27 ++
>>  doc/guides/vdpadevs/index.rst                   |   1 +
>>  doc/guides/vdpadevs/mlx5.rst                    | 111 +++++
>>  drivers/common/Makefile                         |   2 +-
>>  drivers/common/mlx5/Makefile                    |  17 +-
>>  drivers/common/mlx5/mlx5_prm.h                  |   4 +
>>  drivers/meson.build                             |   8 +-
>>  drivers/vdpa/Makefile                           |   2 +
>>  drivers/vdpa/meson.build                        |   3 +-
>>  drivers/vdpa/mlx5/Makefile                      |  43 ++
>>  drivers/vdpa/mlx5/meson.build                   |  34 ++
>>  drivers/vdpa/mlx5/mlx5_vdpa.c                   | 563 ++++++++++++++++++++++++
>>  drivers/vdpa/mlx5/mlx5_vdpa.h                   | 303 +++++++++++++
>>  drivers/vdpa/mlx5/mlx5_vdpa_event.c             | 399 +++++++++++++++++
>>  drivers/vdpa/mlx5/mlx5_vdpa_lm.c                | 130 ++++++
>>  drivers/vdpa/mlx5/mlx5_vdpa_mem.c               | 346 +++++++++++++++
>>  drivers/vdpa/mlx5/mlx5_vdpa_steer.c             | 283 ++++++++++++
>>  drivers/vdpa/mlx5/mlx5_vdpa_utils.h             |  20 +
>>  drivers/vdpa/mlx5/mlx5_vdpa_virtq.c             | 388 ++++++++++++++++
>>  drivers/vdpa/mlx5/rte_pmd_mlx5_vdpa_version.map |   3 +
>>  mk/rte.app.mk                                   |  15 +-
>>  24 files changed, 2702 insertions(+), 17 deletions(-)
>>  create mode 100644 doc/guides/vdpadevs/features/mlx5.ini
>>  create mode 100644 doc/guides/vdpadevs/mlx5.rst
>>  create mode 100644 drivers/vdpa/mlx5/Makefile
>>  create mode 100644 drivers/vdpa/mlx5/meson.build
>>  create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa.c
>>  create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa.h
>>  create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_event.c
>>  create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_lm.c
>>  create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_mem.c
>>  create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_steer.c
>>  create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_utils.h
>>  create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
>>  create mode 100644 drivers/vdpa/mlx5/rte_pmd_mlx5_vdpa_version.map
>>
> 
> Applied to dpdk-next-virtio.
Thanks Matan for notifying me that I didn't reply to the right version of
the series.
I properly applied version 3.
Maxime
* Re: [dpdk-dev] [PATCH v3 00/13] Introduce mlx5 vDPA driver
  2020-02-02 16:03   ` [dpdk-dev] [PATCH v3 00/13] Introduce mlx5 vDPA driver Matan Azrad
                       ` (13 preceding siblings ...)
  2020-02-03  8:34     ` [dpdk-dev] [PATCH v3 00/13] Introduce mlx5 vDPA driver Maxime Coquelin
@ 2020-02-03 16:42     ` Maxime Coquelin
  14 siblings, 0 replies; 174+ messages in thread
From: Maxime Coquelin @ 2020-02-03 16:42 UTC (permalink / raw)
  To: Matan Azrad, dev, Viacheslav Ovsiienko
On 2/2/20 5:03 PM, Matan Azrad wrote:
> v2:
> - Reorder patches into 2 series - this is the second part of the previous series split.
> - Fix spelling and per-patch compilation issues.
> - moved to use claim_zero instead of pure asserts.
> 
> v3:
> - Address Maxime C comments.
> - Adjust to the last MLX5_ASSERT introduction.
>  
> 
> Matan Azrad (13):
>   drivers: introduce mlx5 vDPA driver
>   vdpa/mlx5: support queues number operation
>   vdpa/mlx5: support features get operations
>   vdpa/mlx5: prepare memory regions
>   vdpa/mlx5: prepare HW queues
>   vdpa/mlx5: prepare virtio queues
>   vdpa/mlx5: support stateless offloads
>   vdpa/mlx5: add basic steering configurations
>   vdpa/mlx5: support queue state operation
>   vdpa/mlx5: map doorbell
>   vdpa/mlx5: support live migration
>   vdpa/mlx5: support close and config operations
>   vdpa/mlx5: disable ROCE
> 
>  MAINTAINERS                                     |   7 +
>  config/common_base                              |   5 +
>  doc/guides/rel_notes/release_20_02.rst          |   5 +
>  doc/guides/vdpadevs/features/mlx5.ini           |  27 ++
>  doc/guides/vdpadevs/index.rst                   |   1 +
>  doc/guides/vdpadevs/mlx5.rst                    | 111 +++++
>  drivers/common/Makefile                         |   2 +-
>  drivers/common/mlx5/Makefile                    |  17 +-
>  drivers/common/mlx5/mlx5_prm.h                  |   4 +
>  drivers/meson.build                             |   8 +-
>  drivers/vdpa/Makefile                           |   2 +
>  drivers/vdpa/meson.build                        |   3 +-
>  drivers/vdpa/mlx5/Makefile                      |  58 +++
>  drivers/vdpa/mlx5/meson.build                   |  38 ++
>  drivers/vdpa/mlx5/mlx5_vdpa.c                   | 563 ++++++++++++++++++++++++
>  drivers/vdpa/mlx5/mlx5_vdpa.h                   | 309 +++++++++++++
>  drivers/vdpa/mlx5/mlx5_vdpa_event.c             | 400 +++++++++++++++++
>  drivers/vdpa/mlx5/mlx5_vdpa_lm.c                | 129 ++++++
>  drivers/vdpa/mlx5/mlx5_vdpa_mem.c               | 346 +++++++++++++++
>  drivers/vdpa/mlx5/mlx5_vdpa_steer.c             | 283 ++++++++++++
>  drivers/vdpa/mlx5/mlx5_vdpa_utils.h             |  20 +
>  drivers/vdpa/mlx5/mlx5_vdpa_virtq.c             | 388 ++++++++++++++++
>  drivers/vdpa/mlx5/rte_pmd_mlx5_vdpa_version.map |   3 +
>  mk/rte.app.mk                                   |  15 +-
>  24 files changed, 2727 insertions(+), 17 deletions(-)
>  create mode 100644 doc/guides/vdpadevs/features/mlx5.ini
>  create mode 100644 doc/guides/vdpadevs/mlx5.rst
>  create mode 100644 drivers/vdpa/mlx5/Makefile
>  create mode 100644 drivers/vdpa/mlx5/meson.build
>  create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa.c
>  create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa.h
>  create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_event.c
>  create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_lm.c
>  create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_mem.c
>  create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_steer.c
>  create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_utils.h
>  create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
>  create mode 100644 drivers/vdpa/mlx5/rte_pmd_mlx5_vdpa_version.map
> 
Applied to dpdk-next-virtio.
Thanks,
Maxime
end of thread, other threads:[~2020-02-03 16:42 UTC | newest]
Thread overview: 174+ messages
2020-01-20 17:02 [dpdk-dev] [PATCH v1 00/38] Introduce mlx5 vDPA driver Matan Azrad
2020-01-20 17:02 ` [dpdk-dev] [PATCH v1 01/38] net/mlx5: separate DevX commands interface Matan Azrad
2020-01-20 17:02 ` [dpdk-dev] [PATCH v1 02/38] mlx5: prepare common library Matan Azrad
2020-01-20 17:02 ` [dpdk-dev] [PATCH v1 03/38] mlx5: share the mlx5 glue reference Matan Azrad
2020-01-20 17:02 ` [dpdk-dev] [PATCH v1 04/38] mlx5: share mlx5 PCI device detection Matan Azrad
2020-01-20 17:02 ` [dpdk-dev] [PATCH v1 05/38] mlx5: share mlx5 devices information Matan Azrad
2020-01-20 17:02 ` [dpdk-dev] [PATCH v1 06/38] drivers: introduce mlx5 vDPA driver Matan Azrad
2020-01-20 17:02 ` [dpdk-dev] [PATCH v1 07/38] common/mlx5: expose vDPA DevX capabilities Matan Azrad
2020-01-20 17:02 ` [dpdk-dev] [PATCH v1 08/38] vdpa/mlx5: support queues number operation Matan Azrad
2020-01-20 17:02 ` [dpdk-dev] [PATCH v1 09/38] vdpa/mlx5: support features get operations Matan Azrad
2020-01-20 17:02 ` [dpdk-dev] [PATCH v1 10/38] common/mlx5: glue null memory region allocation Matan Azrad
2020-01-20 17:02 ` [dpdk-dev] [PATCH v1 11/38] common/mlx5: support DevX indirect mkey creation Matan Azrad
2020-01-20 17:02 ` [dpdk-dev] [PATCH v1 12/38] common/mlx5: glue event queue query Matan Azrad
2020-01-20 17:02 ` [dpdk-dev] [PATCH v1 13/38] common/mlx5: glue event interrupt commands Matan Azrad
2020-01-20 17:02 ` [dpdk-dev] [PATCH v1 14/38] common/mlx5: glue UAR allocation Matan Azrad
2020-01-20 17:02 ` [dpdk-dev] [PATCH v1 15/38] common/mlx5: add DevX command to create CQ Matan Azrad
2020-01-20 17:02 ` [dpdk-dev] [PATCH v1 16/38] common/mlx5: glue VAR allocation Matan Azrad
2020-01-20 17:02 ` [dpdk-dev] [PATCH v1 17/38] common/mlx5: add DevX virtio emulation commands Matan Azrad
2020-01-20 17:02 ` [dpdk-dev] [PATCH v1 18/38] vdpa/mlx5: prepare memory regions Matan Azrad
2020-01-20 17:02 ` [dpdk-dev] [PATCH v1 19/38] mlx5: share CQ entry check Matan Azrad
2020-01-20 17:02 ` [dpdk-dev] [PATCH v1 20/38] vdpa/mlx5: prepare completion queues Matan Azrad
2020-01-20 17:02 ` [dpdk-dev] [PATCH v1 21/38] vdpa/mlx5: handle completions Matan Azrad
2020-01-20 17:02 ` [dpdk-dev] [PATCH v1 22/38] vdpa/mlx5: prepare virtio queues Matan Azrad
2020-01-20 17:02 ` [dpdk-dev] [PATCH v1 23/38] vdpa/mlx5: support stateless offloads Matan Azrad
2020-01-20 17:02 ` [dpdk-dev] [PATCH v1 24/38] common/mlx5: allow type configuration for DevX RQT Matan Azrad
2020-01-20 17:02 ` [dpdk-dev] [PATCH v1 25/38] common/mlx5: add TIR fields constants Matan Azrad
2020-01-20 17:02 ` [dpdk-dev] [PATCH v1 26/38] common/mlx5: add DevX command to modify RQT Matan Azrad
2020-01-20 17:02 ` [dpdk-dev] [PATCH v1 27/38] common/mlx5: get DevX capability for max RQT size Matan Azrad
2020-01-20 17:03 ` [dpdk-dev] [PATCH v1 28/38] vdpa/mlx5: add basic steering configurations Matan Azrad
2020-01-20 17:03 ` [dpdk-dev] [PATCH v1 29/38] vdpa/mlx5: support queue state operation Matan Azrad
2020-01-20 17:03 ` [dpdk-dev] [PATCH v1 30/38] vdpa/mlx5: map doorbell Matan Azrad
2020-01-20 17:03 ` [dpdk-dev] [PATCH v1 31/38] vdpa/mlx5: support live migration Matan Azrad
2020-01-20 17:03 ` [dpdk-dev] [PATCH v1 32/38] vdpa/mlx5: support close and config operations Matan Azrad
2020-01-20 17:03 ` [dpdk-dev] [PATCH v1 33/38] mlx5: skip probing according to the vDPA mode Matan Azrad
2020-01-20 17:03 ` [dpdk-dev] [PATCH v1 34/38] net/mlx5: separate Netlink commands interface Matan Azrad
2020-01-20 17:03 ` [dpdk-dev] [PATCH v1 35/38] net/mlx5: reduce Netlink commands dependencies Matan Azrad
2020-01-20 17:03 ` [dpdk-dev] [PATCH v1 36/38] mlx5: share Netlink commands Matan Azrad
2020-01-20 17:03 ` [dpdk-dev] [PATCH v1 37/38] common/mlx5: support ROCE disable through Netlink Matan Azrad
2020-01-20 17:03 ` [dpdk-dev] [PATCH v1 38/38] vdpa/mlx5: disable ROCE Matan Azrad
2020-01-28 10:05 ` [dpdk-dev] [PATCH v2 00/25] Introduce mlx5 common library Matan Azrad
2020-01-28 10:05   ` [dpdk-dev] [PATCH v2 01/25] net/mlx5: separate DevX commands interface Matan Azrad
2020-01-28 10:05   ` [dpdk-dev] [PATCH v2 02/25] drivers: introduce mlx5 common library Matan Azrad
2020-01-28 10:05   ` [dpdk-dev] [PATCH v2 03/25] common/mlx5: share the mlx5 glue reference Matan Azrad
2020-01-28 10:05   ` [dpdk-dev] [PATCH v2 04/25] common/mlx5: share mlx5 PCI device detection Matan Azrad
2020-01-28 10:05   ` [dpdk-dev] [PATCH v2 05/25] common/mlx5: share mlx5 devices information Matan Azrad
2020-01-28 10:05   ` [dpdk-dev] [PATCH v2 06/25] common/mlx5: share CQ entry check Matan Azrad
2020-01-28 10:05   ` [dpdk-dev] [PATCH v2 07/25] common/mlx5: add query vDPA DevX capabilities Matan Azrad
2020-01-28 10:05   ` [dpdk-dev] [PATCH v2 08/25] common/mlx5: glue null memory region allocation Matan Azrad
2020-01-28 10:05   ` [dpdk-dev] [PATCH v2 09/25] common/mlx5: support DevX indirect mkey creation Matan Azrad
2020-01-28 10:05   ` [dpdk-dev] [PATCH v2 10/25] common/mlx5: glue event queue query Matan Azrad
2020-01-28 10:05   ` [dpdk-dev] [PATCH v2 11/25] common/mlx5: glue event interrupt commands Matan Azrad
2020-01-28 10:05   ` [dpdk-dev] [PATCH v2 12/25] common/mlx5: glue UAR allocation Matan Azrad
2020-01-28 10:05   ` [dpdk-dev] [PATCH v2 13/25] common/mlx5: add DevX command to create CQ Matan Azrad
2020-01-28 10:05   ` [dpdk-dev] [PATCH v2 14/25] common/mlx5: glue VAR allocation Matan Azrad
2020-01-28 10:05   ` [dpdk-dev] [PATCH v2 15/25] common/mlx5: add DevX virtq commands Matan Azrad
2020-01-28 10:05   ` [dpdk-dev] [PATCH v2 16/25] common/mlx5: add support for DevX QP operations Matan Azrad
2020-01-28 10:05   ` [dpdk-dev] [PATCH v2 17/25] common/mlx5: allow type configuration for DevX RQT Matan Azrad
2020-01-28 10:05   ` [dpdk-dev] [PATCH v2 18/25] common/mlx5: add TIR field constants Matan Azrad
2020-01-28 10:05   ` [dpdk-dev] [PATCH v2 19/25] common/mlx5: add DevX command to modify RQT Matan Azrad
2020-01-28 10:06   ` [dpdk-dev] [PATCH v2 20/25] common/mlx5: get DevX capability for max RQT size Matan Azrad
2020-01-28 10:06   ` [dpdk-dev] [PATCH v2 21/25] net/mlx5: select driver by vDPA device argument Matan Azrad
2020-01-28 10:06   ` [dpdk-dev] [PATCH v2 22/25] net/mlx5: separate Netlink command interface Matan Azrad
2020-01-28 10:06   ` [dpdk-dev] [PATCH v2 23/25] net/mlx5: reduce Netlink commands dependencies Matan Azrad
2020-01-28 10:06   ` [dpdk-dev] [PATCH v2 24/25] common/mlx5: share Netlink commands Matan Azrad
2020-01-28 10:06   ` [dpdk-dev] [PATCH v2 25/25] common/mlx5: support ROCE disable through Netlink Matan Azrad
2020-01-28 16:27   ` [dpdk-dev] [PATCH v3 00/25] Introduce mlx5 common library Matan Azrad
2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 01/25] net/mlx5: separate DevX commands interface Matan Azrad
2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 02/25] drivers: introduce mlx5 common library Matan Azrad
2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 03/25] common/mlx5: share the mlx5 glue reference Matan Azrad
2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 04/25] common/mlx5: share mlx5 PCI device detection Matan Azrad
2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 05/25] common/mlx5: share mlx5 devices information Matan Azrad
2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 06/25] common/mlx5: share CQ entry check Matan Azrad
2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 07/25] common/mlx5: add query vDPA DevX capabilities Matan Azrad
2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 08/25] common/mlx5: glue null memory region allocation Matan Azrad
2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 09/25] common/mlx5: support DevX indirect mkey creation Matan Azrad
2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 10/25] common/mlx5: glue event queue query Matan Azrad
2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 11/25] common/mlx5: glue event interrupt commands Matan Azrad
2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 12/25] common/mlx5: glue UAR allocation Matan Azrad
2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 13/25] common/mlx5: add DevX command to create CQ Matan Azrad
2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 14/25] common/mlx5: glue VAR allocation Matan Azrad
2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 15/25] common/mlx5: add DevX virtq commands Matan Azrad
2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 16/25] common/mlx5: add support for DevX QP operations Matan Azrad
2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 17/25] common/mlx5: allow type configuration for DevX RQT Matan Azrad
2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 18/25] common/mlx5: add TIR field constants Matan Azrad
2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 19/25] common/mlx5: add DevX command to modify RQT Matan Azrad
2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 20/25] common/mlx5: get DevX capability for max RQT size Matan Azrad
2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 21/25] net/mlx5: select driver by vDPA device argument Matan Azrad
2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 22/25] net/mlx5: separate Netlink command interface Matan Azrad
2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 23/25] net/mlx5: reduce Netlink commands dependencies Matan Azrad
2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 24/25] common/mlx5: share Netlink commands Matan Azrad
2020-01-28 16:27     ` [dpdk-dev] [PATCH v3 25/25] common/mlx5: support ROCE disable through Netlink Matan Azrad
2020-01-29 12:38     ` [dpdk-dev] [PATCH v4 00/25] Introduce mlx5 common library Matan Azrad
2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 01/25] net/mlx5: separate DevX commands interface Matan Azrad
2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 02/25] drivers: introduce mlx5 common library Matan Azrad
2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 03/25] common/mlx5: share the mlx5 glue reference Matan Azrad
2020-01-30  8:10         ` Matan Azrad
2020-01-30  8:38           ` Raslan Darawsheh
2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 04/25] common/mlx5: share mlx5 PCI device detection Matan Azrad
2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 05/25] common/mlx5: share mlx5 devices information Matan Azrad
2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 06/25] common/mlx5: share CQ entry check Matan Azrad
2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 07/25] common/mlx5: add query vDPA DevX capabilities Matan Azrad
2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 08/25] common/mlx5: glue null memory region allocation Matan Azrad
2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 09/25] common/mlx5: support DevX indirect mkey creation Matan Azrad
2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 10/25] common/mlx5: glue event queue query Matan Azrad
2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 11/25] common/mlx5: glue event interrupt commands Matan Azrad
2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 12/25] common/mlx5: glue UAR allocation Matan Azrad
2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 13/25] common/mlx5: add DevX command to create CQ Matan Azrad
2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 14/25] common/mlx5: glue VAR allocation Matan Azrad
2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 15/25] common/mlx5: add DevX virtq commands Matan Azrad
2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 16/25] common/mlx5: add support for DevX QP operations Matan Azrad
2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 17/25] common/mlx5: allow type configuration for DevX RQT Matan Azrad
2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 18/25] common/mlx5: add TIR field constants Matan Azrad
2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 19/25] common/mlx5: add DevX command to modify RQT Matan Azrad
2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 20/25] common/mlx5: get DevX capability for max RQT size Matan Azrad
2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 21/25] net/mlx5: select driver by class device argument Matan Azrad
2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 22/25] net/mlx5: separate Netlink command interface Matan Azrad
2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 23/25] net/mlx5: reduce Netlink commands dependencies Matan Azrad
2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 24/25] common/mlx5: share Netlink commands Matan Azrad
2020-01-29 12:38       ` [dpdk-dev] [PATCH v4 25/25] common/mlx5: support ROCE disable through Netlink Matan Azrad
2020-01-30 12:26       ` [dpdk-dev] [PATCH v4 00/25] Introduce mlx5 common library Raslan Darawsheh
2020-01-29 10:08 ` [dpdk-dev] [PATCH v2 00/13] Introduce mlx5 vDPA driver Matan Azrad
2020-01-29 10:08   ` [dpdk-dev] [PATCH v2 01/13] drivers: introduce " Matan Azrad
2020-01-30 14:38     ` Maxime Coquelin
2020-02-01 17:53       ` Matan Azrad
2020-01-29 10:08   ` [dpdk-dev] [PATCH v2 02/13] vdpa/mlx5: support queues number operation Matan Azrad
2020-01-30 14:46     ` Maxime Coquelin
2020-01-29 10:08   ` [dpdk-dev] [PATCH v2 03/13] vdpa/mlx5: support features get operations Matan Azrad
2020-01-30 14:50     ` Maxime Coquelin
2020-01-29 10:09   ` [dpdk-dev] [PATCH v2 04/13] vdpa/mlx5: prepare memory regions Matan Azrad
2020-01-30 17:39     ` Maxime Coquelin
2020-01-29 10:09   ` [dpdk-dev] [PATCH v2 05/13] vdpa/mlx5: prepare HW queues Matan Azrad
2020-01-30 18:17     ` Maxime Coquelin
2020-01-31  6:56       ` Matan Azrad
2020-01-31 14:47         ` Maxime Coquelin
2020-01-29 10:09   ` [dpdk-dev] [PATCH v2 06/13] vdpa/mlx5: prepare virtio queues Matan Azrad
2020-01-30 20:00     ` Maxime Coquelin
2020-01-31  7:34       ` Matan Azrad
2020-01-31 14:46         ` Maxime Coquelin
2020-01-29 10:09   ` [dpdk-dev] [PATCH v2 07/13] vdpa/mlx5: support stateless offloads Matan Azrad
2020-01-30 20:08     ` Maxime Coquelin
2020-01-29 10:09   ` [dpdk-dev] [PATCH v2 08/13] vdpa/mlx5: add basic steering configurations Matan Azrad
2020-01-31 15:10     ` Maxime Coquelin
2020-01-29 10:09   ` [dpdk-dev] [PATCH v2 09/13] vdpa/mlx5: support queue state operation Matan Azrad
2020-01-31 15:32     ` Maxime Coquelin
2020-01-29 10:09   ` [dpdk-dev] [PATCH v2 10/13] vdpa/mlx5: map doorbell Matan Azrad
2020-01-31 15:40     ` Maxime Coquelin
2020-01-29 10:09   ` [dpdk-dev] [PATCH v2 11/13] vdpa/mlx5: support live migration Matan Azrad
2020-01-31 16:01     ` Maxime Coquelin
2020-01-29 10:09   ` [dpdk-dev] [PATCH v2 12/13] vdpa/mlx5: support close and config operations Matan Azrad
2020-01-31 16:06     ` Maxime Coquelin
2020-01-29 10:09   ` [dpdk-dev] [PATCH v2 13/13] vdpa/mlx5: disable ROCE Matan Azrad
2020-01-31 16:42     ` Maxime Coquelin
2020-02-02 16:03   ` [dpdk-dev] [PATCH v3 00/13] Introduce mlx5 vDPA driver Matan Azrad
2020-02-02 16:03     ` [dpdk-dev] [PATCH v3 01/13] drivers: introduce " Matan Azrad
2020-02-02 16:03     ` [dpdk-dev] [PATCH v3 02/13] vdpa/mlx5: support queues number operation Matan Azrad
2020-02-02 16:03     ` [dpdk-dev] [PATCH v3 03/13] vdpa/mlx5: support features get operations Matan Azrad
2020-02-02 16:03     ` [dpdk-dev] [PATCH v3 04/13] vdpa/mlx5: prepare memory regions Matan Azrad
2020-02-02 16:03     ` [dpdk-dev] [PATCH v3 05/13] vdpa/mlx5: prepare HW queues Matan Azrad
2020-02-02 16:03     ` [dpdk-dev] [PATCH v3 06/13] vdpa/mlx5: prepare virtio queues Matan Azrad
2020-02-02 16:03     ` [dpdk-dev] [PATCH v3 07/13] vdpa/mlx5: support stateless offloads Matan Azrad
2020-02-02 16:03     ` [dpdk-dev] [PATCH v3 08/13] vdpa/mlx5: add basic steering configurations Matan Azrad
2020-02-02 16:03     ` [dpdk-dev] [PATCH v3 09/13] vdpa/mlx5: support queue state operation Matan Azrad
2020-02-02 16:03     ` [dpdk-dev] [PATCH v3 10/13] vdpa/mlx5: map doorbell Matan Azrad
2020-02-02 16:03     ` [dpdk-dev] [PATCH v3 11/13] vdpa/mlx5: support live migration Matan Azrad
2020-02-02 16:03     ` [dpdk-dev] [PATCH v3 12/13] vdpa/mlx5: support close and config operations Matan Azrad
2020-02-02 16:03     ` [dpdk-dev] [PATCH v3 13/13] vdpa/mlx5: disable ROCE Matan Azrad
2020-02-03  9:27       ` Maxime Coquelin
2020-02-03 11:00         ` Maxime Coquelin
2020-02-03 12:44           ` Matan Azrad
2020-02-03 12:45             ` Maxime Coquelin
2020-02-03  8:34     ` [dpdk-dev] [PATCH v3 00/13] Introduce mlx5 vDPA driver Maxime Coquelin
2020-02-03 16:42     ` Maxime Coquelin
2020-02-03 13:24   ` [dpdk-dev] [PATCH v2 " Maxime Coquelin
2020-02-03 16:41     ` Maxime Coquelin