DPDK patches and discussions
From: Slava Ovsiienko <viacheslavo@nvidia.com>
To: Suanming Mou <suanmingm@nvidia.com>, Ori Kam <orika@nvidia.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>, Matan Azrad <matan@nvidia.com>,
	Raslan Darawsheh <rasland@nvidia.com>
Subject: Re: [dpdk-dev] [PATCH v3 2/4] regex/mlx5: add data path scattered mbuf process
Date: Tue, 30 Mar 2021 08:05:00 +0000	[thread overview]
Message-ID: <DM6PR12MB3753A90EEC7CF377873A931ADF7D9@DM6PR12MB3753.namprd12.prod.outlook.com> (raw)
In-Reply-To: <20210330013916.1319266-3-suanmingm@nvidia.com>

> -----Original Message-----
> From: Suanming Mou <suanmingm@nvidia.com>
> Sent: Tuesday, March 30, 2021 4:39
> To: Ori Kam <orika@nvidia.com>
> Cc: dev@dpdk.org; Slava Ovsiienko <viacheslavo@nvidia.com>; Matan Azrad
> <matan@nvidia.com>; Raslan Darawsheh <rasland@nvidia.com>
> Subject: [PATCH v3 2/4] regex/mlx5: add data path scattered mbuf process
> 
Nice feature, but I would fix the typos and reword a bit:

> UMR WQE can convert multiple mkey's memory sapce to contiguous space.
Typo: "sapce?"

And rather not "convert mkey" but "present data buffers scattered within
multiple mbufs with a single indirect mkey".
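
Just to illustrate the suggested wording (a scratch sketch, not the patch
code; seg_lkey() is a made-up placeholder for the lkey lookup): each mbuf
segment becomes one KLM entry, and the UMR WQE builds one indirect mkey
over the whole array, so the HW sees a single contiguous buffer:

    struct mlx5_klm *entry = klm_array;
    struct rte_mbuf *seg = op->mbuf;

    while (seg != NULL) {
        entry->byte_count = rte_pktmbuf_data_len(seg); /* segment length */
        entry->address = rte_pktmbuf_mtod(seg, uintptr_t); /* segment start */
        entry->mkey = seg_lkey(seg); /* placeholder for the lkey lookup */
        entry++;
        seg = seg->next;
    }
    /* A UMR WQE pointing at klm_array then presents all the segments
     * through a single indirect mkey. */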


> Take advantage of the UMR WQE, scattered mbuf in one operation can be
> converted to an indirect mkey. The RegEx which only accepts one mkey can
> now process the whole scattered mbuf.
I would add "in one operation" to the last sentence.

> 
> The maximum scattered mbuf can be supported in one UMR WQE is now
> defined as 64. Multiple operations scattered mbufs can be add to one UMR
Typos: "THE multiple", "added"

I would reword: "The mbufs from multiple operations can be combined into
one UMR." Also, I would add a few words about what UMR is.
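
For example: "UMR (User-Mode Memory Registration) WQE allows posting
memory registration commands via the send queue itself, so an indirect
mkey can be built from a KLM list inline with the data path, without a
separate registration call." - my wording, please check it against the
PRM terminology.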

> WQE if there is enough space in the KLM array, since the operations can
> address their own mbuf's content by the mkey's address and length.
> However, one operation's scattered mbuf's can't be placed in two different
> UMR WQE's KLM array, if the UMR WQE's KLM does not has enough free
> space for one operation, a new UMR WQE will be required.
I would say "the extra UMR WQE will be engaged".
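
(The check itself is simple, as the patch does it: the mbuf chain fits
the current mkey only if klm_num + mbuf->nb_segs <= MLX5_REGEX_MAX_KLM_NUM,
otherwise the next WQE set supplies a fresh indirect mkey.)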

> 
> In case the UMR WQE's indirect mkey will be over wrapped by the SQ's WQE
> move, the meky's index used by the UMR WQE should be the index of last
typo: "meky"

> the RegEX WQE in the operations. As one operation consumes one WQE set,
> build the RegEx WQE by reverse helps address the mkey more efficiently.
Typo: "helps address" should be "helps TO address".
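
Maybe also worth a comment near the doorbell code: one WQE set occupies
4 WQEBBs, so for db_pi = n the doorbell records the index of the last
WQE in the set, i.e. n * 4 + 3 - that is why the reverse build wants
the last RegEx WQE index.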

With best regards,
Slava

> Once the operations in one burst consumes multiple mkeys, when the mkey
> KLM array is full, the reverse WQE set index will always be the last of the new
> mkey's for the new UMR WQE.
> 
> In GGA mode, the SQ WQE's memory layout becomes UMR/NOP and RegEx
> WQE by interleave. The UMR and RegEx WQE can be called as WQE set. The
> SQ's pi and ci will also be increased as WQE set not as WQE.
> 
> For operations don't have scattered mbuf, uses the mbuf's mkey directly,
> the WQE set combination is NOP + RegEx.
> For operations have scattered mubf but share the UMR WQE with others,
> the WQE set combination is NOP + RegEx.
> For operations complete the UMR WQE, the WQE set combination is UMR +
> RegEx.
> 
> Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
> Acked-by: Ori Kam <orika@nvidia.com>
> ---
>  doc/guides/regexdevs/mlx5.rst            |   5 +
>  doc/guides/rel_notes/release_21_05.rst   |   4 +
>  drivers/regex/mlx5/mlx5_regex.c          |   9 +
>  drivers/regex/mlx5/mlx5_regex.h          |  26 +-
>  drivers/regex/mlx5/mlx5_regex_control.c  |  43 ++-
>  drivers/regex/mlx5/mlx5_regex_fastpath.c | 378 +++++++++++++++++++++--
>  6 files changed, 407 insertions(+), 58 deletions(-)
> 
> diff --git a/doc/guides/regexdevs/mlx5.rst b/doc/guides/regexdevs/mlx5.rst
> index faaa6ac11d..45a0b96980 100644
> --- a/doc/guides/regexdevs/mlx5.rst
> +++ b/doc/guides/regexdevs/mlx5.rst
> @@ -35,6 +35,11 @@ be specified as device parameter. The RegEx device can be probed and used with
>  other Mellanox devices, by adding more options in the class.
>  For example: ``class=net:regex`` will probe both the net PMD and the RegEx PMD.
> 
> +Features
> +--------
> +
> +- Multi segments mbuf support.
> +
>  Supported NICs
>  --------------
> 
> diff --git a/doc/guides/rel_notes/release_21_05.rst b/doc/guides/rel_notes/release_21_05.rst
> index 3c76148b11..c3d6b8e8ae 100644
> --- a/doc/guides/rel_notes/release_21_05.rst
> +++ b/doc/guides/rel_notes/release_21_05.rst
> @@ -119,6 +119,10 @@ New Features
>    * Added command to display Rx queue used descriptor count.
>      ``show port (port_id) rxq (queue_id) desc used count``
> 
> +* **Updated Mellanox RegEx PMD.**
> +
> +  * Added support for multi segments mbuf.
> +
> 
>  Removed Items
>  -------------
> diff --git a/drivers/regex/mlx5/mlx5_regex.c b/drivers/regex/mlx5/mlx5_regex.c
> index ac5b205fa9..82c485e50c 100644
> --- a/drivers/regex/mlx5/mlx5_regex.c
> +++ b/drivers/regex/mlx5/mlx5_regex.c
> @@ -199,6 +199,13 @@ mlx5_regex_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
>  	}
>  	priv->regexdev->dev_ops = &mlx5_regexdev_ops;
>  	priv->regexdev->enqueue = mlx5_regexdev_enqueue;
> +#ifdef HAVE_MLX5_UMR_IMKEY
> +	if (!attr.umr_indirect_mkey_disabled &&
> +	    !attr.umr_modify_entity_size_disabled)
> +		priv->has_umr = 1;
> +	if (priv->has_umr)
> +		priv->regexdev->enqueue = mlx5_regexdev_enqueue_gga;
> +#endif
>  	priv->regexdev->dequeue = mlx5_regexdev_dequeue;
>  	priv->regexdev->device = (struct rte_device *)pci_dev;
>  	priv->regexdev->data->dev_private = priv;
> @@ -213,6 +220,8 @@ mlx5_regex_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
>  		rte_errno = ENOMEM;
>  		goto error;
>  	}
> +	DRV_LOG(INFO, "RegEx GGA is %s.",
> +		priv->has_umr ? "supported" : "unsupported");
>  	return 0;
> 
>  error:
> diff --git a/drivers/regex/mlx5/mlx5_regex.h b/drivers/regex/mlx5/mlx5_regex.h
> index a2b3f0d9f3..51a2101e53 100644
> --- a/drivers/regex/mlx5/mlx5_regex.h
> +++ b/drivers/regex/mlx5/mlx5_regex.h
> @@ -15,6 +15,7 @@
>  #include <mlx5_common_devx.h>
> 
>  #include "mlx5_rxp.h"
> +#include "mlx5_regex_utils.h"
> 
>  struct mlx5_regex_sq {
>  	uint16_t log_nb_desc; /* Log 2 number of desc for this object. */
> @@ -40,6 +41,7 @@ struct mlx5_regex_qp {
>  	struct mlx5_regex_job *jobs;
>  	struct ibv_mr *metadata;
>  	struct ibv_mr *outputs;
> +	struct ibv_mr *imkey_addr; /* Indirect mkey array region. */
>  	size_t ci, pi;
>  	struct mlx5_mr_ctrl mr_ctrl;
>  };
> @@ -71,8 +73,29 @@ struct mlx5_regex_priv {
>  	struct mlx5_mr_share_cache mr_scache; /* Global shared MR cache. */
>  	uint8_t is_bf2; /* The device is BF2 device. */
>  	uint8_t sq_ts_format; /* Whether SQ supports timestamp formats. */
> +	uint8_t has_umr; /* The device supports UMR. */
>  };
> 
> +#ifdef HAVE_IBV_FLOW_DV_SUPPORT
> +static inline int
> +regex_get_pdn(void *pd, uint32_t *pdn)
> +{
> +	struct mlx5dv_obj obj;
> +	struct mlx5dv_pd pd_info;
> +	int ret = 0;
> +
> +	obj.pd.in = pd;
> +	obj.pd.out = &pd_info;
> +	ret = mlx5_glue->dv_init_obj(&obj, MLX5DV_OBJ_PD);
> +	if (ret) {
> +		DRV_LOG(DEBUG, "Fail to get PD object info");
> +		return ret;
> +	}
> +	*pdn = pd_info.pdn;
> +	return 0;
> +}
> +#endif
> +
>  /* mlx5_regex.c */
>  int mlx5_regex_start(struct rte_regexdev *dev);
>  int mlx5_regex_stop(struct rte_regexdev *dev);
> @@ -108,5 +131,6 @@ uint16_t mlx5_regexdev_enqueue(struct rte_regexdev *dev, uint16_t qp_id,
>  		       struct rte_regex_ops **ops, uint16_t nb_ops);
>  uint16_t mlx5_regexdev_dequeue(struct rte_regexdev *dev, uint16_t qp_id,
>  		       struct rte_regex_ops **ops, uint16_t nb_ops);
> -
> +uint16_t mlx5_regexdev_enqueue_gga(struct rte_regexdev *dev, uint16_t qp_id,
> +		       struct rte_regex_ops **ops, uint16_t nb_ops);
>  #endif /* MLX5_REGEX_H */
> diff --git a/drivers/regex/mlx5/mlx5_regex_control.c b/drivers/regex/mlx5/mlx5_regex_control.c
> index 55fbb419ed..eef0fe579d 100644
> --- a/drivers/regex/mlx5/mlx5_regex_control.c
> +++ b/drivers/regex/mlx5/mlx5_regex_control.c
> @@ -27,6 +27,9 @@
> 
>  #define MLX5_REGEX_NUM_WQE_PER_PAGE (4096/64)
> 
> +#define MLX5_REGEX_WQE_LOG_NUM(has_umr, log_desc) \
> +		((has_umr) ? ((log_desc) + 2) : (log_desc))
> +
>  /**
>   * Returns the number of qp obj to be created.
>   *
> @@ -91,26 +94,6 @@ regex_ctrl_create_cq(struct mlx5_regex_priv *priv, struct mlx5_regex_cq *cq)
>  	return 0;
>  }
> 
> -#ifdef HAVE_IBV_FLOW_DV_SUPPORT
> -static int
> -regex_get_pdn(void *pd, uint32_t *pdn)
> -{
> -	struct mlx5dv_obj obj;
> -	struct mlx5dv_pd pd_info;
> -	int ret = 0;
> -
> -	obj.pd.in = pd;
> -	obj.pd.out = &pd_info;
> -	ret = mlx5_glue->dv_init_obj(&obj, MLX5DV_OBJ_PD);
> -	if (ret) {
> -		DRV_LOG(DEBUG, "Fail to get PD object info");
> -		return ret;
> -	}
> -	*pdn = pd_info.pdn;
> -	return 0;
> -}
> -#endif
> -
>  /**
>   * Destroy the SQ object.
>   *
> @@ -168,14 +151,16 @@ regex_ctrl_create_sq(struct mlx5_regex_priv *priv, struct mlx5_regex_qp *qp,
>  	int ret;
> 
>  	sq->log_nb_desc = log_nb_desc;
> +	sq->sqn = q_ind;
>  	sq->ci = 0;
>  	sq->pi = 0;
>  	ret = regex_get_pdn(priv->pd, &pd_num);
>  	if (ret)
>  		return ret;
>  	attr.wq_attr.pd = pd_num;
> -	ret = mlx5_devx_sq_create(priv->ctx, &sq->sq_obj, log_nb_desc, &attr,
> -				  SOCKET_ID_ANY);
> +	ret = mlx5_devx_sq_create(priv->ctx, &sq->sq_obj,
> +			MLX5_REGEX_WQE_LOG_NUM(priv->has_umr, log_nb_desc),
> +			&attr, SOCKET_ID_ANY);
>  	if (ret) {
>  		DRV_LOG(ERR, "Can't create SQ object.");
>  		rte_errno = ENOMEM;
> @@ -225,10 +210,18 @@ mlx5_regex_qp_setup(struct rte_regexdev *dev, uint16_t qp_ind,
> 
>  	qp = &priv->qps[qp_ind];
>  	qp->flags = cfg->qp_conf_flags;
> -	qp->cq.log_nb_desc = rte_log2_u32(cfg->nb_desc);
> -	qp->nb_desc = 1 << qp->cq.log_nb_desc;
> +	log_desc = rte_log2_u32(cfg->nb_desc);
> +	/*
> +	 * UMR mode requires two WQEs(UMR and RegEx WQE) for one descriptor.
> +	 * For CQ, expand the CQE number multiple with 2.
> +	 * For SQ, the UMR and RegEx WQE for one descriptor consumes 4 WQEBBS,
> +	 * expand the WQE number multiple with 4.
> +	 */
> +	qp->cq.log_nb_desc = log_desc + (!!priv->has_umr);
> +	qp->nb_desc = 1 << log_desc;
>  	if (qp->flags & RTE_REGEX_QUEUE_PAIR_CFG_OOS_F)
> -		qp->nb_obj = regex_ctrl_get_nb_obj(qp->nb_desc);
> +		qp->nb_obj = regex_ctrl_get_nb_obj
> +			(1 << MLX5_REGEX_WQE_LOG_NUM(priv->has_umr, log_desc));
>  	else
>  		qp->nb_obj = 1;
>  	qp->sqs = rte_malloc(NULL,
> diff --git a/drivers/regex/mlx5/mlx5_regex_fastpath.c b/drivers/regex/mlx5/mlx5_regex_fastpath.c
> index beaea7b63f..4f9402c583 100644
> --- a/drivers/regex/mlx5/mlx5_regex_fastpath.c
> +++ b/drivers/regex/mlx5/mlx5_regex_fastpath.c
> @@ -32,6 +32,15 @@
>  #define MLX5_REGEX_WQE_GATHER_OFFSET 32
>  #define MLX5_REGEX_WQE_SCATTER_OFFSET 48
>  #define MLX5_REGEX_METADATA_OFF 32
> +#define MLX5_REGEX_UMR_WQE_SIZE 192
> +/* The maximum KLMs can be added to one UMR indirect mkey. */
> +#define MLX5_REGEX_MAX_KLM_NUM 128
> +/* The KLM array size for one job. */
> +#define MLX5_REGEX_KLMS_SIZE \
> +	((MLX5_REGEX_MAX_KLM_NUM) * sizeof(struct mlx5_klm))
> +/* In WQE set mode, the pi should be quarter of the MLX5_REGEX_MAX_WQE_INDEX. */
> +#define MLX5_REGEX_UMR_SQ_PI_IDX(pi, ops) \
> +	(((pi) + (ops)) & (MLX5_REGEX_MAX_WQE_INDEX >> 2))
> 
>  static inline uint32_t
>  sq_size_get(struct mlx5_regex_sq *sq)
> @@ -49,6 +58,8 @@ struct mlx5_regex_job {
>  	uint64_t user_id;
>  	volatile uint8_t *output;
>  	volatile uint8_t *metadata;
> +	struct mlx5_klm *imkey_array; /* Indirect mkey's KLM array. */
> +	struct mlx5_devx_obj *imkey; /* UMR WQE's indirect meky. */
>  } __rte_cached_aligned;
> 
>  static inline void
> @@ -99,12 +110,13 @@ set_wqe_ctrl_seg(struct mlx5_wqe_ctrl_seg *seg, uint16_t pi, uint8_t opcode,
>  }
> 
>  static inline void
> -prep_one(struct mlx5_regex_priv *priv, struct mlx5_regex_qp *qp,
> -	 struct mlx5_regex_sq *sq, struct rte_regex_ops *op,
> -	 struct mlx5_regex_job *job)
> +__prep_one(struct mlx5_regex_priv *priv, struct mlx5_regex_sq *sq,
> +	   struct rte_regex_ops *op, struct mlx5_regex_job *job,
> +	   size_t pi, struct mlx5_klm *klm)
>  {
> -	size_t wqe_offset = (sq->pi & (sq_size_get(sq) - 1)) * MLX5_SEND_WQE_BB;
> -	uint32_t lkey;
> +	size_t wqe_offset = (pi & (sq_size_get(sq) - 1)) *
> +			    (MLX5_SEND_WQE_BB << (priv->has_umr ? 2 : 0)) +
> +			    (priv->has_umr ? MLX5_REGEX_UMR_WQE_SIZE : 0);
>  	uint16_t group0 = op->req_flags & RTE_REGEX_OPS_REQ_GROUP_ID0_VALID_F ?
>  				op->group_id0 : 0;
>  	uint16_t group1 = op->req_flags & RTE_REGEX_OPS_REQ_GROUP_ID1_VALID_F ?
> @@ -122,14 +134,11 @@ prep_one(struct mlx5_regex_priv *priv, struct mlx5_regex_qp *qp,
>  			       RTE_REGEX_OPS_REQ_GROUP_ID2_VALID_F |
>  			       RTE_REGEX_OPS_REQ_GROUP_ID3_VALID_F)))
>  		group0 = op->group_id0;
> -	lkey = mlx5_mr_addr2mr_bh(priv->pd, 0,
> -				  &priv->mr_scache, &qp->mr_ctrl,
> -				  rte_pktmbuf_mtod(op->mbuf, uintptr_t),
> -				  !!(op->mbuf->ol_flags & EXT_ATTACHED_MBUF));
>  	uint8_t *wqe = (uint8_t *)(uintptr_t)sq->sq_obj.wqes + wqe_offset;
>  	int ds = 4; /*  ctrl + meta + input + output */
> 
> -	set_wqe_ctrl_seg((struct mlx5_wqe_ctrl_seg *)wqe, sq->pi,
> +	set_wqe_ctrl_seg((struct mlx5_wqe_ctrl_seg *)wqe,
> +			 (priv->has_umr ? (pi * 4 + 3) : pi),
>  			 MLX5_OPCODE_MMO, MLX5_OPC_MOD_MMO_REGEX,
>  			 sq->sq_obj.sq->id, 0, ds, 0, 0);
>  	set_regex_ctrl_seg(wqe + 12, 0, group0, group1, group2, group3,
> @@ -137,36 +146,54 @@ prep_one(struct mlx5_regex_priv *priv, struct mlx5_regex_qp *qp,
>  	struct mlx5_wqe_data_seg *input_seg =
>  		(struct mlx5_wqe_data_seg *)(wqe +
>  					     MLX5_REGEX_WQE_GATHER_OFFSET);
> -	input_seg->byte_count =
> -		rte_cpu_to_be_32(rte_pktmbuf_data_len(op->mbuf));
> -	input_seg->addr = rte_cpu_to_be_64(rte_pktmbuf_mtod(op->mbuf,
> -							    uintptr_t));
> -	input_seg->lkey = lkey;
> +	input_seg->byte_count = rte_cpu_to_be_32(klm->byte_count);
> +	input_seg->addr = rte_cpu_to_be_64(klm->address);
> +	input_seg->lkey = klm->mkey;
>  	job->user_id = op->user_id;
> +}
> +
> +static inline void
> +prep_one(struct mlx5_regex_priv *priv, struct mlx5_regex_qp *qp,
> +	 struct mlx5_regex_sq *sq, struct rte_regex_ops *op,
> +	 struct mlx5_regex_job *job)
> +{
> +	struct mlx5_klm klm;
> +
> +	klm.byte_count = rte_pktmbuf_data_len(op->mbuf);
> +	klm.mkey = mlx5_mr_addr2mr_bh(priv->pd, 0,
> +				  &priv->mr_scache, &qp->mr_ctrl,
> +				  rte_pktmbuf_mtod(op->mbuf, uintptr_t),
> +				  !!(op->mbuf->ol_flags & EXT_ATTACHED_MBUF));
> +	klm.address = rte_pktmbuf_mtod(op->mbuf, uintptr_t);
> +	__prep_one(priv, sq, op, job, sq->pi, &klm);
>  	sq->db_pi = sq->pi;
>  	sq->pi = (sq->pi + 1) & MLX5_REGEX_MAX_WQE_INDEX;
>  }
> 
>  static inline void
> -send_doorbell(struct mlx5dv_devx_uar *uar, struct mlx5_regex_sq *sq)
> +send_doorbell(struct mlx5_regex_priv *priv, struct mlx5_regex_sq *sq)
>  {
> +	struct mlx5dv_devx_uar *uar = priv->uar;
>  	size_t wqe_offset = (sq->db_pi & (sq_size_get(sq) - 1)) *
> -		MLX5_SEND_WQE_BB;
> +		(MLX5_SEND_WQE_BB << (priv->has_umr ? 2 : 0)) +
> +		(priv->has_umr ? MLX5_REGEX_UMR_WQE_SIZE : 0);
>  	uint8_t *wqe = (uint8_t *)(uintptr_t)sq->sq_obj.wqes + wqe_offset;
> -	((struct mlx5_wqe_ctrl_seg *)wqe)->fm_ce_se = MLX5_WQE_CTRL_CQ_UPDATE;
> +	/* Or the fm_ce_se instead of set, avoid the fence be cleared. */
> +	((struct mlx5_wqe_ctrl_seg *)wqe)->fm_ce_se |= MLX5_WQE_CTRL_CQ_UPDATE;
>  	uint64_t *doorbell_addr =
>  		(uint64_t *)((uint8_t *)uar->base_addr + 0x800);
>  	rte_io_wmb();
> -	sq->sq_obj.db_rec[MLX5_SND_DBR] = rte_cpu_to_be_32((sq->db_pi + 1) &
> -						MLX5_REGEX_MAX_WQE_INDEX);
> +	sq->sq_obj.db_rec[MLX5_SND_DBR] = rte_cpu_to_be_32((priv->has_umr ?
> +					(sq->db_pi * 4 + 3) : sq->db_pi) &
> +					MLX5_REGEX_MAX_WQE_INDEX);
>  	rte_wmb();
>  	*doorbell_addr = *(volatile uint64_t *)wqe;
>  	rte_wmb();
>  }
> 
>  static inline int
> -can_send(struct mlx5_regex_sq *sq) {
> -	return ((uint16_t)(sq->pi - sq->ci) < sq_size_get(sq));
> +get_free(struct mlx5_regex_sq *sq) {
> +	return (sq_size_get(sq) - (uint16_t)(sq->pi - sq->ci));
>  }
> 
>  static inline uint32_t
> @@ -174,6 +201,211 @@ job_id_get(uint32_t qid, size_t sq_size, size_t index) {
>  	return qid * sq_size + (index & (sq_size - 1));
>  }
> 
> +#ifdef HAVE_MLX5_UMR_IMKEY
> +static inline int
> +mkey_klm_available(struct mlx5_klm *klm, uint32_t pos, uint32_t new) {
> +	return (klm && ((pos + new) <= MLX5_REGEX_MAX_KLM_NUM));
> +}
> +
> +static inline void
> +complete_umr_wqe(struct mlx5_regex_qp *qp, struct mlx5_regex_sq *sq,
> +		 struct mlx5_regex_job *mkey_job,
> +		 size_t umr_index, uint32_t klm_size, uint32_t total_len) {
> +	size_t wqe_offset = (umr_index & (sq_size_get(sq) - 1)) *
> +		(MLX5_SEND_WQE_BB * 4);
> +	struct mlx5_wqe_ctrl_seg *wqe = (struct mlx5_wqe_ctrl_seg *)((uint8_t *)
> +				   (uintptr_t)sq->sq_obj.wqes + wqe_offset);
> +	struct mlx5_wqe_umr_ctrl_seg *ucseg =
> +				(struct mlx5_wqe_umr_ctrl_seg *)(wqe + 1);
> +	struct mlx5_wqe_mkey_context_seg *mkc =
> +				(struct mlx5_wqe_mkey_context_seg *)(ucseg + 1);
> +	struct mlx5_klm *iklm = (struct mlx5_klm *)(mkc + 1);
> +	uint16_t klm_align = RTE_ALIGN(klm_size, 4);
> +
> +	memset(wqe, 0, MLX5_REGEX_UMR_WQE_SIZE);
> +	/* Set WQE control seg. Non-inline KLM UMR WQE size must be 9 WQE_DS. */
> +	set_wqe_ctrl_seg(wqe, (umr_index * 4), MLX5_OPCODE_UMR,
> +			 0, sq->sq_obj.sq->id, 0, 9, 0,
> +			 rte_cpu_to_be_32(mkey_job->imkey->id));
> +	/* Set UMR WQE control seg. */
> +	ucseg->mkey_mask |= rte_cpu_to_be_64(MLX5_WQE_UMR_CTRL_MKEY_MASK_LEN |
> +				MLX5_WQE_UMR_CTRL_FLAG_TRNSLATION_OFFSET |
> +				MLX5_WQE_UMR_CTRL_MKEY_MASK_ACCESS_LOCAL_WRITE);
> +	ucseg->klm_octowords = rte_cpu_to_be_16(klm_align);
> +	/* Set mkey context seg. */
> +	mkc->len = rte_cpu_to_be_64(total_len);
> +	mkc->qpn_mkey = rte_cpu_to_be_32(0xffffff00 |
> +					(mkey_job->imkey->id & 0xff));
> +	/* Set UMR pointer to data seg. */
> +	iklm->address = rte_cpu_to_be_64
> +				((uintptr_t)((char *)mkey_job->imkey_array));
> +	iklm->mkey = rte_cpu_to_be_32(qp->imkey_addr->lkey);
> +	iklm->byte_count = rte_cpu_to_be_32(klm_align);
> +	/* Clear the padding memory. */
> +	memset((uint8_t *)&mkey_job->imkey_array[klm_size], 0,
> +	       sizeof(struct mlx5_klm) * (klm_align - klm_size));
> +
> +	/* Add the following RegEx WQE with fence. */
> +	wqe = (struct mlx5_wqe_ctrl_seg *)
> +				(((uint8_t *)wqe) + MLX5_REGEX_UMR_WQE_SIZE);
> +	wqe->fm_ce_se |= MLX5_WQE_CTRL_INITIATOR_SMALL_FENCE;
> +}
> +
> +static inline void
> +prep_nop_regex_wqe_set(struct mlx5_regex_priv *priv, struct mlx5_regex_sq *sq,
> +		       struct rte_regex_ops *op, struct mlx5_regex_job *job,
> +		       size_t pi, struct mlx5_klm *klm) {
> +	size_t wqe_offset = (pi & (sq_size_get(sq) - 1)) *
> +			    (MLX5_SEND_WQE_BB << 2);
> +	struct mlx5_wqe_ctrl_seg *wqe = (struct mlx5_wqe_ctrl_seg *)((uint8_t *)
> +				   (uintptr_t)sq->sq_obj.wqes + wqe_offset);
> +
> +	/* Clear the WQE memory used as UMR WQE previously. */
> +	if ((rte_be_to_cpu_32(wqe->opmod_idx_opcode) & 0xff) != MLX5_OPCODE_NOP)
> +		memset(wqe, 0, MLX5_REGEX_UMR_WQE_SIZE);
> +	/* UMR WQE size is 9 DS, align nop WQE to 3 WQEBBS(12 DS). */
> +	set_wqe_ctrl_seg(wqe, pi * 4, MLX5_OPCODE_NOP, 0, sq->sq_obj.sq->id,
> +			 0, 12, 0, 0);
> +	__prep_one(priv, sq, op, job, pi, klm);
> +}
> +
> +static inline void
> +prep_regex_umr_wqe_set(struct mlx5_regex_priv *priv, struct mlx5_regex_qp *qp,
> +	 struct mlx5_regex_sq *sq, struct rte_regex_ops **op, size_t nb_ops) {
> +	struct mlx5_regex_job *job = NULL;
> +	size_t sqid = sq->sqn, mkey_job_id = 0;
> +	size_t left_ops = nb_ops;
> +	uint32_t klm_num = 0, len;
> +	struct mlx5_klm *mkey_klm = NULL;
> +	struct mlx5_klm klm;
> +
> +	sqid = sq->sqn;
> +	while (left_ops--)
> +		rte_prefetch0(op[left_ops]);
> +	left_ops = nb_ops;
> +	/*
> +	 * Build the WQE set by reverse. In case the burst may consume
> +	 * multiple mkeys, build the WQE set as normal will hard to
> +	 * address the last mkey index, since we will only know the last
> +	 * RegEx WQE's index when finishes building.
> +	 */
> +	while (left_ops--) {
> +		struct rte_mbuf *mbuf = op[left_ops]->mbuf;
> +		size_t pi = MLX5_REGEX_UMR_SQ_PI_IDX(sq->pi, left_ops);
> +
> +		if (mbuf->nb_segs > 1) {
> +			size_t scatter_size = 0;
> +
> +			if (!mkey_klm_available(mkey_klm, klm_num,
> +						mbuf->nb_segs)) {
> +				/*
> +				 * The mkey's KLM is full, create the UMR
> +				 * WQE in the next WQE set.
> +				 */
> +				if (mkey_klm)
> +					complete_umr_wqe(qp, sq,
> +						&qp->jobs[mkey_job_id],
> +
> 	MLX5_REGEX_UMR_SQ_PI_IDX(pi, 1),
> +						klm_num, len);
> +				/*
> +				 * Get the indircet mkey and KLM array index
> +				 * from the last WQE set.
> +				 */
> +				mkey_job_id = job_id_get(sqid,
> +							 sq_size_get(sq), pi);
> +				mkey_klm = qp->jobs[mkey_job_id].imkey_array;
> +				klm_num = 0;
> +				len = 0;
> +			}
> +			/* Build RegEx WQE's data segment KLM. */
> +			klm.address = len;
> +			klm.mkey = rte_cpu_to_be_32
> +					(qp->jobs[mkey_job_id].imkey->id);
> +			while (mbuf) {
> +				/* Build indirect mkey seg's KLM. */
> +				mkey_klm->mkey = mlx5_mr_addr2mr_bh(priv->pd,
> +					NULL, &priv->mr_scache, &qp->mr_ctrl,
> +					rte_pktmbuf_mtod(mbuf, uintptr_t),
> +					!!(mbuf->ol_flags & EXT_ATTACHED_MBUF));
> +				mkey_klm->address = rte_cpu_to_be_64
> +					(rte_pktmbuf_mtod(mbuf, uintptr_t));
> +				mkey_klm->byte_count = rte_cpu_to_be_32
> +						(rte_pktmbuf_data_len(mbuf));
> +				/*
> +				 * Save the mbuf's total size for RegEx data
> +				 * segment.
> +				 */
> +				scatter_size += rte_pktmbuf_data_len(mbuf);
> +				mkey_klm++;
> +				klm_num++;
> +				mbuf = mbuf->next;
> +			}
> +			len += scatter_size;
> +			klm.byte_count = scatter_size;
> +		} else {
> +			/* The single mubf case. Build the KLM directly. */
> +			klm.mkey = mlx5_mr_addr2mr_bh(priv->pd, NULL,
> +					&priv->mr_scache, &qp->mr_ctrl,
> +					rte_pktmbuf_mtod(mbuf, uintptr_t),
> +					!!(mbuf->ol_flags & EXT_ATTACHED_MBUF));
> +			klm.address = rte_pktmbuf_mtod(mbuf, uintptr_t);
> +			klm.byte_count = rte_pktmbuf_data_len(mbuf);
> +		}
> +		job = &qp->jobs[job_id_get(sqid, sq_size_get(sq), pi)];
> +		/*
> +		 * Build the nop + RegEx WQE set by default. The fist nop WQE
> +		 * will be updated later as UMR WQE if scattered mubf exist.
> +		 */
> +		prep_nop_regex_wqe_set(priv, sq, op[left_ops], job, pi, &klm);
> +	}
> +	/*
> +	 * Scattered mbuf have been added to the KLM array. Complete the build
> +	 * of UMR WQE, update the first nop WQE as UMR WQE.
> +	 */
> +	if (mkey_klm)
> +		complete_umr_wqe(qp, sq, &qp->jobs[mkey_job_id], sq->pi,
> +				 klm_num, len);
> +	sq->db_pi = MLX5_REGEX_UMR_SQ_PI_IDX(sq->pi, nb_ops - 1);
> +	sq->pi = MLX5_REGEX_UMR_SQ_PI_IDX(sq->pi, nb_ops);
> +}
> +
> +uint16_t
> +mlx5_regexdev_enqueue_gga(struct rte_regexdev *dev, uint16_t qp_id,
> +			  struct rte_regex_ops **ops, uint16_t nb_ops) {
> +	struct mlx5_regex_priv *priv = dev->data->dev_private;
> +	struct mlx5_regex_qp *queue = &priv->qps[qp_id];
> +	struct mlx5_regex_sq *sq;
> +	size_t sqid, nb_left = nb_ops, nb_desc;
> +
> +	while ((sqid = ffs(queue->free_sqs))) {
> +		sqid--; /* ffs returns 1 for bit 0 */
> +		sq = &queue->sqs[sqid];
> +		nb_desc = get_free(sq);
> +		if (nb_desc) {
> +			/* The ops be handled can't exceed nb_ops. */
> +			if (nb_desc > nb_left)
> +				nb_desc = nb_left;
> +			else
> +				queue->free_sqs &= ~(1 << sqid);
> +			prep_regex_umr_wqe_set(priv, queue, sq, ops, nb_desc);
> +			send_doorbell(priv, sq);
> +			nb_left -= nb_desc;
> +		}
> +		if (!nb_left)
> +			break;
> +		ops += nb_desc;
> +	}
> +	nb_ops -= nb_left;
> +	queue->pi += nb_ops;
> +	return nb_ops;
> +}
> +#endif
> +
>  uint16_t
>  mlx5_regexdev_enqueue(struct rte_regexdev *dev, uint16_t qp_id,
>  		      struct rte_regex_ops **ops, uint16_t nb_ops)
> @@ -186,17 +418,17 @@ mlx5_regexdev_enqueue(struct rte_regexdev *dev, uint16_t qp_id,
>  	while ((sqid = ffs(queue->free_sqs))) {
>  		sqid--; /* ffs returns 1 for bit 0 */
>  		sq = &queue->sqs[sqid];
> -		while (can_send(sq)) {
> +		while (get_free(sq)) {
>  			job_id = job_id_get(sqid, sq_size_get(sq), sq->pi);
>  			prep_one(priv, queue, sq, ops[i], &queue->jobs[job_id]);
>  			i++;
>  			if (unlikely(i == nb_ops)) {
> -				send_doorbell(priv->uar, sq);
> +				send_doorbell(priv, sq);
>  				goto out;
>  			}
>  		}
>  		queue->free_sqs &= ~(1 << sqid);
> -		send_doorbell(priv->uar, sq);
> +		send_doorbell(priv, sq);
>  	}
> 
>  out:
> @@ -308,6 +540,10 @@ mlx5_regexdev_dequeue(struct rte_regexdev *dev, uint16_t qp_id,
>  			  MLX5_REGEX_MAX_WQE_INDEX;
>  		size_t sqid = cqe->rsvd3[2];
>  		struct mlx5_regex_sq *sq = &queue->sqs[sqid];
> +
> +		/* UMR mode WQE counter move as WQE set(4 WQEBBS).*/
> +		if (priv->has_umr)
> +			wq_counter >>= 2;
>  		while (sq->ci != wq_counter) {
>  			if (unlikely(i == nb_ops)) {
>  				/* Return without updating cq->ci */ @@ -
> 316,7 +552,9 @@ mlx5_regexdev_dequeue(struct rte_regexdev *dev,
> uint16_t qp_id,
>  			uint32_t job_id = job_id_get(sqid, sq_size_get(sq),
>  						     sq->ci);
>  			extract_result(ops[i], &queue->jobs[job_id]);
> -			sq->ci = (sq->ci + 1) & MLX5_REGEX_MAX_WQE_INDEX;
> +			sq->ci = (sq->ci + 1) & (priv->has_umr ?
> +				 (MLX5_REGEX_MAX_WQE_INDEX >> 2) :
> +				  MLX5_REGEX_MAX_WQE_INDEX);
>  			i++;
>  		}
>  		cq->ci = (cq->ci + 1) & 0xffffff;
> @@ -331,7 +569,7 @@ mlx5_regexdev_dequeue(struct rte_regexdev *dev, uint16_t qp_id,
>  }
> 
>  static void
> -setup_sqs(struct mlx5_regex_qp *queue)
> +setup_sqs(struct mlx5_regex_priv *priv, struct mlx5_regex_qp *queue)
>  {
>  	size_t sqid, entry;
>  	uint32_t job_id;
> @@ -342,6 +580,14 @@ setup_sqs(struct mlx5_regex_qp *queue)
>  			job_id = sqid * sq_size_get(sq) + entry;
>  			struct mlx5_regex_job *job = &queue->jobs[job_id];
> 
> +			/* Fill UMR WQE with NOP in advanced. */
> +			if (priv->has_umr) {
> +				set_wqe_ctrl_seg
> +					((struct mlx5_wqe_ctrl_seg *)wqe,
> +					 entry * 2, MLX5_OPCODE_NOP, 0,
> +					 sq->sq_obj.sq->id, 0, 12, 0, 0);
> +				wqe += MLX5_REGEX_UMR_WQE_SIZE;
> +			}
>  			set_metadata_seg((struct mlx5_wqe_metadata_seg *)
>  					 (wqe + MLX5_REGEX_WQE_METADATA_OFFSET),
>  					 0, queue->metadata->lkey,
> @@ -358,8 +604,9 @@ setup_sqs(struct mlx5_regex_qp *queue)
>  }
> 
>  static int
> -setup_buffers(struct mlx5_regex_qp *qp, struct ibv_pd *pd)
> +setup_buffers(struct mlx5_regex_priv *priv, struct mlx5_regex_qp *qp)
>  {
> +	struct ibv_pd *pd = priv->pd;
>  	uint32_t i;
>  	int err;
> 
> @@ -395,6 +642,24 @@ setup_buffers(struct mlx5_regex_qp *qp, struct ibv_pd *pd)
>  		goto err_output;
>  	}
> 
> +	if (priv->has_umr) {
> +		ptr = rte_calloc(__func__, qp->nb_desc, MLX5_REGEX_KLMS_SIZE,
> +				 MLX5_REGEX_KLMS_SIZE);
> +		if (!ptr) {
> +			err = -ENOMEM;
> +			goto err_imkey;
> +		}
> +		qp->imkey_addr = mlx5_glue->reg_mr(pd, ptr,
> +					MLX5_REGEX_KLMS_SIZE * qp->nb_desc,
> +					IBV_ACCESS_LOCAL_WRITE);
> +		if (!qp->imkey_addr) {
> +			rte_free(ptr);
> +			DRV_LOG(ERR, "Failed to register output");
> +			err = -EINVAL;
> +			goto err_imkey;
> +		}
> +	}
> +
>  	/* distribute buffers to jobs */
>  	for (i = 0; i < qp->nb_desc; i++) {
>  		qp->jobs[i].output =
> @@ -403,9 +668,18 @@ setup_buffers(struct mlx5_regex_qp *qp, struct ibv_pd *pd)
>  		qp->jobs[i].metadata =
>  			(uint8_t *)qp->metadata->addr +
>  			(i % qp->nb_desc) * MLX5_REGEX_METADATA_SIZE;
> +		if (qp->imkey_addr)
> +			qp->jobs[i].imkey_array = (struct mlx5_klm *)
> +				qp->imkey_addr->addr +
> +				(i % qp->nb_desc) * MLX5_REGEX_MAX_KLM_NUM;
>  	}
> +
>  	return 0;
> 
> +err_imkey:
> +	ptr = qp->outputs->addr;
> +	rte_free(ptr);
> +	mlx5_glue->dereg_mr(qp->outputs);
>  err_output:
>  	ptr = qp->metadata->addr;
>  	rte_free(ptr);
> @@ -417,23 +691,57 @@ int
>  mlx5_regexdev_setup_fastpath(struct mlx5_regex_priv *priv, uint32_t qp_id)
>  {
>  	struct mlx5_regex_qp *qp = &priv->qps[qp_id];
> -	int err;
> +	struct mlx5_klm klm = { 0 };
> +	struct mlx5_devx_mkey_attr attr = {
> +		.klm_array = &klm,
> +		.klm_num = 1,
> +		.umr_en = 1,
> +	};
> +	uint32_t i;
> +	int err = 0;
> 
>  	qp->jobs = rte_calloc(__func__, qp->nb_desc, sizeof(*qp->jobs), 64);
>  	if (!qp->jobs)
>  		return -ENOMEM;
> -	err = setup_buffers(qp, priv->pd);
> +	err = setup_buffers(priv, qp);
>  	if (err) {
>  		rte_free(qp->jobs);
>  		return err;
>  	}
> -	setup_sqs(qp);
> -	return 0;
> +
> +	setup_sqs(priv, qp);
> +
> +	if (priv->has_umr) {
> +#ifdef HAVE_IBV_FLOW_DV_SUPPORT
> +		if (regex_get_pdn(priv->pd, &attr.pd)) {
> +			err = -rte_errno;
> +			DRV_LOG(ERR, "Failed to get pdn.");
> +			mlx5_regexdev_teardown_fastpath(priv, qp_id);
> +			return err;
> +		}
> +#endif
> +		for (i = 0; i < qp->nb_desc; i++) {
> +			attr.klm_num = MLX5_REGEX_MAX_KLM_NUM;
> +			attr.klm_array = qp->jobs[i].imkey_array;
> +			qp->jobs[i].imkey = mlx5_devx_cmd_mkey_create(priv->ctx,
> +								      &attr);
> +			if (!qp->jobs[i].imkey) {
> +				err = -rte_errno;
> +				DRV_LOG(ERR, "Failed to allocate imkey.");
> +				mlx5_regexdev_teardown_fastpath(priv, qp_id);
> +			}
> +		}
> +	}
> +	return err;
>  }
> 
>  static void
>  free_buffers(struct mlx5_regex_qp *qp)
>  {
> +	if (qp->imkey_addr) {
> +		mlx5_glue->dereg_mr(qp->imkey_addr);
> +		rte_free(qp->imkey_addr->addr);
> +	}
>  	if (qp->metadata) {
>  		mlx5_glue->dereg_mr(qp->metadata);
>  		rte_free(qp->metadata->addr);
> @@ -448,8 +756,14 @@ void
>  mlx5_regexdev_teardown_fastpath(struct mlx5_regex_priv *priv, uint32_t qp_id)
>  {
>  	struct mlx5_regex_qp *qp = &priv->qps[qp_id];
> +	uint32_t i;
> 
>  	if (qp) {
> +		for (i = 0; i < qp->nb_desc; i++) {
> +			if (qp->jobs[i].imkey)
> +				claim_zero(mlx5_devx_cmd_destroy
> +							(qp->jobs[i].imkey));
> +		}
>  		free_buffers(qp);
>  		if (qp->jobs)
>  			rte_free(qp->jobs);
> --
> 2.25.1



Thread overview: 36+ messages
2021-03-09 23:57 [dpdk-dev] [PATCH 0/3] regex/mlx5: support scattered mbuf Suanming Mou
2021-03-09 23:57 ` [dpdk-dev] [PATCH 1/3] common/mlx5: add user memory registration bits Suanming Mou
2021-03-09 23:57 ` [dpdk-dev] [PATCH 2/3] regex/mlx5: add data path scattered mbuf process Suanming Mou
2021-03-09 23:57 ` [dpdk-dev] [PATCH 3/3] app/test-regex: support scattered mbuf input Suanming Mou
2021-03-24 21:14 ` [dpdk-dev] [PATCH 0/3] regex/mlx5: support scattered mbuf Thomas Monjalon
2021-03-25  4:32 ` [dpdk-dev] [PATCH v2 0/4] " Suanming Mou
2021-03-25  4:32   ` [dpdk-dev] [PATCH v2 1/4] common/mlx5: add user memory registration bits Suanming Mou
2021-03-29  9:29     ` Ori Kam
2021-03-25  4:32   ` [dpdk-dev] [PATCH v2 2/4] regex/mlx5: add data path scattered mbuf process Suanming Mou
2021-03-29  9:34     ` Ori Kam
2021-03-29  9:52       ` Suanming Mou
2021-03-25  4:32   ` [dpdk-dev] [PATCH v2 3/4] app/test-regex: support scattered mbuf input Suanming Mou
2021-03-29  9:27     ` Ori Kam
2021-03-25  4:32   ` [dpdk-dev] [PATCH v2 4/4] regex/mlx5: prevent wrong calculation of free sqs in umr mode Suanming Mou
2021-03-29  9:35     ` Ori Kam
2021-03-30  1:39 ` [dpdk-dev] [PATCH v3 0/4] regex/mlx5: support scattered mbuf Suanming Mou
2021-03-30  1:39   ` [dpdk-dev] [PATCH v3 1/4] common/mlx5: add user memory registration bits Suanming Mou
2021-03-30  1:39   ` [dpdk-dev] [PATCH v3 2/4] regex/mlx5: add data path scattered mbuf process Suanming Mou
2021-03-30  8:05     ` Slava Ovsiienko [this message]
2021-03-30  9:00       ` Suanming Mou
2021-03-30  1:39   ` [dpdk-dev] [PATCH v3 3/4] app/test-regex: support scattered mbuf input Suanming Mou
2021-03-30  1:39   ` [dpdk-dev] [PATCH v3 4/4] regex/mlx5: prevent wrong calculation of free sqs in umr mode Suanming Mou
2021-04-06 16:22     ` Thomas Monjalon
2021-04-07  1:00       ` Suanming Mou
2021-04-07  7:11         ` Thomas Monjalon
2021-04-07  7:14           ` Suanming Mou
2021-03-31  7:37 ` [dpdk-dev] [PATCH v4 0/4] regex/mlx5: support scattered mbuf Suanming Mou
2021-03-31  7:37   ` [dpdk-dev] [PATCH v4 1/4] common/mlx5: add user memory registration bits Suanming Mou
2021-03-31  7:37   ` [dpdk-dev] [PATCH v4 2/4] regex/mlx5: add data path scattered mbuf process Suanming Mou
2021-03-31  7:37   ` [dpdk-dev] [PATCH v4 3/4] app/test-regex: support scattered mbuf input Suanming Mou
2021-03-31  7:37   ` [dpdk-dev] [PATCH v4 4/4] regex/mlx5: prevent wrong calculation of free sqs in umr mode Suanming Mou
2021-04-07  7:21 ` [dpdk-dev] [PATCH v5 0/3] regex/mlx5: support scattered mbuf Suanming Mou
2021-04-07  7:21   ` [dpdk-dev] [PATCH v5 1/3] common/mlx5: add user memory registration bits Suanming Mou
2021-04-07  7:21   ` [dpdk-dev] [PATCH v5 2/3] regex/mlx5: add data path scattered mbuf process Suanming Mou
2021-04-07  7:21   ` [dpdk-dev] [PATCH v5 3/3] app/test-regex: support scattered mbuf input Suanming Mou
2021-04-08 20:53   ` [dpdk-dev] [PATCH v5 0/3] regex/mlx5: support scattered mbuf Thomas Monjalon
