From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <shahafs@mellanox.com>
Received: from EUR04-VI1-obe.outbound.protection.outlook.com
 (mail-eopbgr80075.outbound.protection.outlook.com [40.107.8.75])
 by dpdk.org (Postfix) with ESMTP id 58B662BD8
 for <dev@dpdk.org>; Mon,  8 Apr 2019 07:48:34 +0200 (CEST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Mellanox.com;
 s=selector1;
 h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck;
 bh=kJ6QLSMCwlbI8ZONRfQ6DyZvZALMy1b2IRzakAmH5cc=;
 b=MdYbVesnfsljI51S32Wrb1D/PbEhCEgxWwfj/vuKHprAr7JQlKUjbpgaryjwDnRJSpwQy8gtP1KbwwLhS79XnD5hhhHZUI46MfNA1Ler96Gzrt7x+7fHx29QdKG1jpbIbQEr9gOgDOMnhBGfAafCQe4duK+UV5kxnekNvj4Ti2E=
Received: from AM0PR0502MB3795.eurprd05.prod.outlook.com (52.133.45.150) by
 AM0PR0502MB3876.eurprd05.prod.outlook.com (52.133.48.18) with Microsoft SMTP
 Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id
 15.20.1771.21; Mon, 8 Apr 2019 05:48:32 +0000
Received: from AM0PR0502MB3795.eurprd05.prod.outlook.com
 ([fe80::4192:b468:41e1:c323]) by AM0PR0502MB3795.eurprd05.prod.outlook.com
 ([fe80::4192:b468:41e1:c323%4]) with mapi id 15.20.1771.011; Mon, 8 Apr 2019
 05:48:32 +0000
From: Shahaf Shuler <shahafs@mellanox.com>
To: Yongseok Koh <yskoh@mellanox.com>
CC: "dev@dpdk.org" <dev@dpdk.org>
Thread-Topic: [dpdk-dev] [PATCH v3 3/4] net/mlx5: remove device register remap
Thread-Index: AQHU60+89UhckMtnYU+tSQi7Kgnfk6YxxECw
Date: Mon, 8 Apr 2019 05:48:32 +0000
Message-ID: <AM0PR0502MB379559A3B4BB7509C74ACE9AC32C0@AM0PR0502MB3795.eurprd05.prod.outlook.com>
References: <20190325193627.19726-1-yskoh@mellanox.com>
 <20190405013357.14503-1-yskoh@mellanox.com>
 <20190405013357.14503-4-yskoh@mellanox.com>
In-Reply-To: <20190405013357.14503-4-yskoh@mellanox.com>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
authentication-results: spf=none (sender IP is )
 smtp.mailfrom=shahafs@mellanox.com; 
x-originating-ip: [193.47.165.251]
x-ms-publictraffictype: Email
x-ms-office365-filtering-correlation-id: e7547a13-e066-48c1-1bee-08d6bbe5d5ca
x-ms-office365-filtering-ht: Tenant
x-microsoft-antispam: BCL:0; PCL:0;
 RULEID:(2390118)(7020095)(4652040)(8989299)(4534185)(4627221)(201703031133081)(201702281549075)(8990200)(5600139)(711020)(4605104)(4618075)(2017052603328)(7193020);
 SRVR:AM0PR0502MB3876; 
x-ms-traffictypediagnostic: AM0PR0502MB3876:
x-microsoft-antispam-prvs: <AM0PR0502MB387689F8FFC483A880B1ECCFC32C0@AM0PR0502MB3876.eurprd05.prod.outlook.com>
x-forefront-prvs: 0001227049
x-forefront-antispam-report: SFV:NSPM;
 SFS:(10009020)(366004)(136003)(346002)(396003)(376002)(39860400002)(199004)(189003)(71190400001)(76176011)(6436002)(14454004)(81166006)(81156014)(9686003)(8676002)(229853002)(68736007)(105586002)(53946003)(106356001)(6116002)(71200400001)(33656002)(3846002)(478600001)(6506007)(30864003)(25786009)(102836004)(86362001)(7696005)(476003)(6862004)(486006)(26005)(74316002)(97736004)(2906002)(8936002)(316002)(4326008)(446003)(11346002)(186003)(7736002)(5660300002)(14444005)(5024004)(256004)(99286004)(305945005)(66066001)(53936002)(6246003)(52536014)(55016002)(6636002);
 DIR:OUT; SFP:1101; SCL:1; SRVR:AM0PR0502MB3876;
 H:AM0PR0502MB3795.eurprd05.prod.outlook.com; FPR:; SPF:None; LANG:en;
 PTR:InfoNoRecords; A:1; MX:1; 
received-spf: None (protection.outlook.com: mellanox.com does not designate
 permitted sender hosts)
x-ms-exchange-senderadcheck: 1
x-microsoft-antispam-message-info: inad015rUJ89cKsfxm6DXlcZC4qV76OVc2H8AtdK7J88yFcI9A67idQIuMCZYwp8tVQzOzS8j0hW+AhHDJmASpKRiVWIZHfteo1V1PBCGqNniE8BMR23oaTTvbGGgoI8F1jxzwUdalWggtmMzsY4HfAwhVzvzqBJO3qtGyffxj/vZ9NHgkuBlcIzh9W4hrnorltzQ3OtlX6hr2Qqzx9ciMLfCDXWiiq/NN4hyV/knC83cND2mDwvdcZ8mNL6t8HjbUGe/fFe/RahUQUX4+mkG7Y9cv8oSgVeNG8ap83wjKIVkJjq+dRGKfBxL+KdDVH3w+Fc09Px53iuhZMJMuEp3eZnE0+LOJCo2f7jqzCthLhX+zeYEqW2vBeeqHSfi+M78y87T9gppUNbGCMNiZjLhVGM6jhmz5/2TSQzacyK0HM=
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
X-OriginatorOrg: Mellanox.com
X-MS-Exchange-CrossTenant-Network-Message-Id: e7547a13-e066-48c1-1bee-08d6bbe5d5ca
X-MS-Exchange-CrossTenant-originalarrivaltime: 08 Apr 2019 05:48:32.7040 (UTC)
X-MS-Exchange-CrossTenant-fromentityheader: Hosted
X-MS-Exchange-CrossTenant-id: a652971c-7d2e-4d9b-a6a4-d149256f461b
X-MS-Exchange-CrossTenant-mailboxtype: HOSTED
X-MS-Exchange-Transport-CrossTenantHeadersStamped: AM0PR0502MB3876
Subject: Re: [dpdk-dev] [PATCH v3 3/4] net/mlx5: remove device register remap
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Mon, 08 Apr 2019 05:48:34 -0000

Hi Koh,

See small comments below. Same for mlx4 patch.


Friday, April 5, 2019 4:34 AM, Yongseok Koh:
> Subject: [dpdk-dev] [PATCH v3 3/4] net/mlx5: remove device register remap
>=20
> UAR (User Access Region) register does not need to be remapped for
> primary process but it should be remapped only for secondary process. UAR
> register table is in the process private structure in rte_eth_devices[],
> 	(struct mlx5_proc_priv *)rte_eth_devices[port_id].process_private
>=20
> The actual UAR table follows the data structure and the table is used for=
 both
> Tx and Rx.
>=20
> For Tx, BlueFlame in UAR is used to ring the doorbell. MLX5_TX_BFREG(txq)
> is defined to get a register for the txq. Processes access its own privat=
e data
> to acquire the register from the UAR table.
>=20
> For Rx, the doorbell in UAR is required in arming CQ event. However, it i=
s a
> known issue that the register isn't remapped for secondary process.
>=20
> Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
> ---
>  drivers/net/mlx5/mlx5.c         | 198 +++++-----------------------------=
------
>  drivers/net/mlx5/mlx5.h         |  15 ++-
>  drivers/net/mlx5/mlx5_ethdev.c  |  17 ++++
>  drivers/net/mlx5/mlx5_rxtx.h    |  11 ++-
>  drivers/net/mlx5/mlx5_trigger.c |   6 --
>  drivers/net/mlx5/mlx5_txq.c     | 180 ++++++++++++++++++++++------------=
-
> -

[...]

>  /**
> @@ -1182,12 +1010,32 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
>  	}
>  	DRV_LOG(DEBUG, "naming Ethernet device \"%s\"", name);
>  	if (rte_eal_process_type() =3D=3D RTE_PROC_SECONDARY) {
> +		struct mlx5_proc_priv *ppriv;
> +		size_t ppriv_size;
> +
>  		eth_dev =3D rte_eth_dev_attach_secondary(name);
>  		if (eth_dev =3D=3D NULL) {
>  			DRV_LOG(ERR, "can not attach rte ethdev");
>  			rte_errno =3D ENOMEM;
>  			return NULL;
>  		}
> +		priv =3D eth_dev->data->dev_private;
> +		/*
> +		 * UAR register table follows the process private structure.
> +		 * BlueFlame registers for Tx queues come first and registers
> +		 * for Rx queues follows.
> +		 */
> +		ppriv_size =3D sizeof(struct mlx5_proc_priv) +
> +			     (priv->rxqs_n + priv->txqs_n) * sizeof(void *);

Why you add also the rxqs_n? why not only the txqs?=20

> +		ppriv =3D rte_malloc_socket("mlx5_proc_priv", ppriv_size,
> +					  RTE_CACHE_LINE_SIZE,
> +					  dpdk_dev->numa_node);
> +		if (!ppriv) {
> +			rte_errno =3D ENOMEM;
> +			return NULL;
> +		}
> +		ppriv->uar_table_sz =3D ppriv_size;
> +		eth_dev->process_private =3D ppriv;
>  		eth_dev->device =3D dpdk_dev;
>  		eth_dev->dev_ops =3D &mlx5_dev_sec_ops;
>  		/* Receive command fd from primary process */ @@ -1195,7
> +1043,7 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
>  		if (err < 0)
>  			return NULL;
>  		/* Remap UAR for Tx queues. */
> -		err =3D mlx5_tx_uar_remap(eth_dev, err);
> +		err =3D mlx5_tx_uar_init_secondary(eth_dev, err);
>  		if (err)
>  			return NULL;
>  		/*
> diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h index
> 699c8fcf6d..1ac4ad71b1 100644
> --- a/drivers/net/mlx5/mlx5.h
> +++ b/drivers/net/mlx5/mlx5.h
> @@ -97,8 +97,6 @@ struct mlx5_shared_data {
>  	/* Global spinlock for primary and secondary processes. */
>  	int init_done; /* Whether primary has done initialization. */
>  	unsigned int secondary_cnt; /* Number of secondary processes
> init'd. */
> -	void *uar_base;
> -	/* Reserved UAR address space for TXQ UAR(hw doorbell) mapping.
> */
>  	struct mlx5_dev_list mem_event_cb_list;
>  	rte_rwlock_t mem_event_rwlock;
>  };
> @@ -106,8 +104,6 @@ struct mlx5_shared_data {
>  /* Per-process data structure, not visible to other processes. */  struc=
t
> mlx5_local_data {
>  	int init_done; /* Whether a secondary has done initialization. */
> -	void *uar_base;
> -	/* Reserved UAR address space for TXQ UAR(hw doorbell) mapping.
> */
>  };
>=20
>  extern struct mlx5_shared_data *mlx5_shared_data; @@ -282,6 +278,17
> @@ struct mlx5_ibv_shared {
>  	struct mlx5_ibv_shared_port port[]; /* per device port data array. */
> };
>=20
> +/* Per-process private structure. */
> +struct mlx5_proc_priv {
> +	size_t uar_table_sz;
> +	/* Size of UAR register table. */
> +	void *uar_table[];
> +	/* Table of UAR registers for each process. */ };
> +
> +#define MLX5_PROC_PRIV(port_id) \
> +	((struct mlx5_proc_priv *)rte_eth_devices[port_id].process_private)
> +
>  struct mlx5_priv {
>  	LIST_ENTRY(mlx5_priv) mem_event_cb;
>  	/**< Called by memory event callback. */ diff --git
> a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c index
> 9ae9dddd3c..42297f11c9 100644
> --- a/drivers/net/mlx5/mlx5_ethdev.c
> +++ b/drivers/net/mlx5/mlx5_ethdev.c
> @@ -382,6 +382,8 @@ int
>  mlx5_dev_configure(struct rte_eth_dev *dev)  {
>  	struct mlx5_priv *priv =3D dev->data->dev_private;
> +	struct mlx5_proc_priv *ppriv;
> +	size_t ppriv_size;
>  	unsigned int rxqs_n =3D dev->data->nb_rx_queues;
>  	unsigned int txqs_n =3D dev->data->nb_tx_queues;
>  	unsigned int i;
> @@ -450,6 +452,21 @@ mlx5_dev_configure(struct rte_eth_dev *dev)
>  		if (++j =3D=3D rxqs_n)
>  			j =3D 0;
>  	}
> +	/*
> +	 * UAR register table follows the process private structure. BlueFlame
> +	 * registers for Tx queues come first and registers for Rx queues
> +	 * follows.
> +	 */
> +	ppriv_size =3D sizeof(struct mlx5_proc_priv) +
> +		     (priv->rxqs_n + priv->txqs_n) * sizeof(void *);

Ditto.=20

> +	ppriv =3D rte_malloc_socket("mlx5_proc_priv", ppriv_size,
> +				  RTE_CACHE_LINE_SIZE, dev->device-
> >numa_node);
> +	if (!ppriv) {
> +		rte_errno =3D ENOMEM;
> +		return -rte_errno;
> +	}
> +	ppriv->uar_table_sz =3D ppriv_size;
> +	dev->process_private =3D ppriv;
>  	return 0;
>  }
>=20
> diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
> index 7b58063ceb..5d49892429 100644
> --- a/drivers/net/mlx5/mlx5_rxtx.h
> +++ b/drivers/net/mlx5/mlx5_rxtx.h
> @@ -201,8 +201,8 @@ struct mlx5_txq_data {
>  	volatile void *wqes; /* Work queue (use volatile to write into). */
>  	volatile uint32_t *qp_db; /* Work queue doorbell. */
>  	volatile uint32_t *cq_db; /* Completion queue doorbell. */
> -	volatile void *bf_reg; /* Blueflame register remapped. */
>  	struct rte_mbuf *(*elts)[]; /* TX elements. */
> +	uint16_t port_id; /* Port ID of device. */
>  	uint16_t idx; /* Queue index. */
>  	struct mlx5_txq_stats stats; /* TX queue counters. */  #ifndef
> RTE_ARCH_64 @@ -231,9 +231,12 @@ struct mlx5_txq_ctrl {
>  	struct mlx5_txq_ibv *ibv; /* Verbs queue object. */
>  	struct mlx5_priv *priv; /* Back pointer to private data. */
>  	off_t uar_mmap_offset; /* UAR mmap offset for non-primary
> process. */
> -	volatile void *bf_reg_orig; /* Blueflame register from verbs. */
> +	void *bf_reg; /* BlueFlame register from Verbs. */

I guess you keep this one in order to get the VA offset for the secondary m=
apping, right? Because otherwise we can take the bf_reg from the UAR table =
on the process private.

If so, better to rename it to uar_page_offset (or other name you like) in o=
rder to avoid fields duplication.=20

>  };
>=20
> +#define MLX5_TX_BFREG(txq) \
> +		(MLX5_PROC_PRIV((txq)->port_id)->uar_table[(txq)->idx])
> +
>  /* mlx5_rxq.c */
>=20
>  extern uint8_t rss_hash_default_key[];
> @@ -301,7 +304,7 @@ uint64_t mlx5_get_rx_queue_offloads(struct
> rte_eth_dev *dev);  int mlx5_tx_queue_setup(struct rte_eth_dev *dev,
> uint16_t idx, uint16_t desc,
>  			unsigned int socket, const struct rte_eth_txconf
> *conf);  void mlx5_tx_queue_release(void *dpdk_txq); -int
> mlx5_tx_uar_remap(struct rte_eth_dev *dev, int fd);
> +int mlx5_tx_uar_init_secondary(struct rte_eth_dev *dev, int fd);
>  struct mlx5_txq_ibv *mlx5_txq_ibv_new(struct rte_eth_dev *dev, uint16_t
> idx);  struct mlx5_txq_ibv *mlx5_txq_ibv_get(struct rte_eth_dev *dev,
> uint16_t idx);  int mlx5_txq_ibv_release(struct mlx5_txq_ibv *txq_ibv); @=
@
> -704,7 +707,7 @@ static __rte_always_inline void
> mlx5_tx_dbrec_cond_wmb(struct mlx5_txq_data *txq, volatile struct
> mlx5_wqe *wqe,
>  		       int cond)
>  {
> -	uint64_t *dst =3D (uint64_t *)((uintptr_t)txq->bf_reg);
> +	uint64_t *dst =3D MLX5_TX_BFREG(txq);

I guess no perf penalty due to this change right?
Would you consider to prefetch the addr before the db logic just to be on t=
he safe side?

>  	volatile uint64_t *src =3D ((volatile uint64_t *)wqe);
>=20
>  	rte_cio_wmb();
> diff --git a/drivers/net/mlx5/mlx5_trigger.c
> b/drivers/net/mlx5/mlx5_trigger.c index 7c1e5594d6..b7fde35758 100644
> --- a/drivers/net/mlx5/mlx5_trigger.c
> +++ b/drivers/net/mlx5/mlx5_trigger.c
> @@ -58,12 +58,6 @@ mlx5_txq_start(struct rte_eth_dev *dev)
>  			goto error;
>  		}
>  	}
> -	ret =3D mlx5_tx_uar_remap(dev, priv->sh->ctx->cmd_fd);
> -	if (ret) {
> -		/* Adjust index for rollback. */
> -		i =3D priv->txqs_n - 1;
> -		goto error;
> -	}
>  	return 0;
>  error:
>  	ret =3D rte_errno; /* Save rte_errno before cleanup. */ diff --git
> a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c index
> 4bd08cb035..5fb1761955 100644
> --- a/drivers/net/mlx5/mlx5_txq.c
> +++ b/drivers/net/mlx5/mlx5_txq.c
> @@ -229,13 +229,99 @@ mlx5_tx_queue_release(void *dpdk_txq)
>  		}
>  }
>=20
> +/**
> + * Initialize Tx UAR registers for primary process.
> + *
> + * @param txq_ctrl
> + *   Pointer to Tx queue control structure.
> + */
> +static void
> +txq_uar_init(struct mlx5_txq_ctrl *txq_ctrl) {
> +	struct mlx5_priv *priv =3D txq_ctrl->priv;
> +	struct mlx5_proc_priv *ppriv =3D MLX5_PROC_PRIV(PORT_ID(priv));
> +
> +	assert(rte_eal_process_type() =3D=3D RTE_PROC_PRIMARY);
> +	assert(ppriv);
> +	ppriv->uar_table[txq_ctrl->txq.idx] =3D txq_ctrl->bf_reg; #ifndef
> +RTE_ARCH_64
> +	struct mlx5_priv *priv =3D txq_ctrl->priv;
> +	struct mlx5_txq_data *txq =3D &txq_ctrl->txq;
> +	unsigned int lock_idx;
> +	/* Assign an UAR lock according to UAR page number */
> +	lock_idx =3D (txq_ctrl->uar_mmap_offset / page_size) &
> +		   MLX5_UAR_PAGE_NUM_MASK;
> +	txq->uar_lock =3D &priv->uar_lock[lock_idx]; #endif }
>=20
>  /**
> - * Mmap TX UAR(HW doorbell) pages into reserved UAR address space.
> - * Both primary and secondary process do mmap to make UAR address
> - * aligned.
> + * Remap UAR register of a Tx queue for secondary process.
>   *
> - * @param[in] dev
> + * Remapped address is stored at the table in the process private
> +structure of
> + * the device, indexed by queue index.
> + *
> + * @param txq_ctrl
> + *   Pointer to Tx queue control structure.
> + * @param fd
> + *   Verbs file descriptor to map UAR pages.
> + *
> + * @return
> + *   0 on success, a negative errno value otherwise and rte_errno is set=
.
> + */
> +static int
> +txq_uar_init_secondary(struct mlx5_txq_ctrl *txq_ctrl, int fd) {
> +	struct mlx5_priv *priv =3D txq_ctrl->priv;
> +	struct mlx5_proc_priv *ppriv =3D MLX5_PROC_PRIV(PORT_ID(priv));
> +	struct mlx5_txq_data *txq =3D &txq_ctrl->txq;
> +	void *addr;
> +	uintptr_t uar_va;
> +	uintptr_t offset;
> +	const size_t page_size =3D sysconf(_SC_PAGESIZE);
> +
> +	assert(ppriv);
> +	/*
> +	 * As rdma-core, UARs are mapped in size of OS page
> +	 * size. Ref to libmlx5 function: mlx5_init_context()
> +	 */
> +	uar_va =3D (uintptr_t)txq_ctrl->bf_reg;
> +	offset =3D uar_va & (page_size - 1); /* Offset in page. */
> +	addr =3D mmap(NULL, page_size, PROT_WRITE, MAP_SHARED, fd,
> +			txq_ctrl->uar_mmap_offset);
> +	if (addr =3D=3D MAP_FAILED) {
> +		DRV_LOG(ERR,
> +			"port %u mmap failed for BF reg of txq %u",
> +			txq->port_id, txq->idx);
> +		rte_errno =3D ENXIO;
> +		return -rte_errno;
> +	}
> +	addr =3D RTE_PTR_ADD(addr, offset);
> +	ppriv->uar_table[txq->idx] =3D addr;
> +	return 0;
> +}
> +
> +/**
> + * Unmap UAR register of a Tx queue for secondary process.
> + *
> + * @param txq_ctrl
> + *   Pointer to Tx queue control structure.
> + */
> +static void
> +txq_uar_uninit_secondary(struct mlx5_txq_ctrl *txq_ctrl) {
> +	struct mlx5_proc_priv *ppriv =3D MLX5_PROC_PRIV(PORT_ID(txq_ctrl-
> >priv));
> +	const size_t page_size =3D sysconf(_SC_PAGESIZE);
> +	void *addr;
> +
> +	addr =3D ppriv->uar_table[txq_ctrl->txq.idx];
> +	munmap(RTE_PTR_ALIGN_FLOOR(addr, page_size), page_size); }
> +
> +/**
> + * Initialize Tx UAR registers for secondary process.
> + *
> + * @param dev
>   *   Pointer to Ethernet device.
>   * @param fd
>   *   Verbs file descriptor to map UAR pages.
> @@ -244,81 +330,36 @@ mlx5_tx_queue_release(void *dpdk_txq)
>   *   0 on success, a negative errno value otherwise and rte_errno is set=
.
>   */
>  int
> -mlx5_tx_uar_remap(struct rte_eth_dev *dev, int fd)
> +mlx5_tx_uar_init_secondary(struct rte_eth_dev *dev, int fd)
>  {
>  	struct mlx5_priv *priv =3D dev->data->dev_private;
> -	unsigned int i, j;
> -	uintptr_t pages[priv->txqs_n];
> -	unsigned int pages_n =3D 0;
> -	uintptr_t uar_va;
> -	uintptr_t off;
> -	void *addr;
> -	void *ret;
>  	struct mlx5_txq_data *txq;
>  	struct mlx5_txq_ctrl *txq_ctrl;
> -	int already_mapped;
> -	size_t page_size =3D sysconf(_SC_PAGESIZE);
> -#ifndef RTE_ARCH_64
> -	unsigned int lock_idx;
> -#endif
> +	unsigned int i;
> +	int ret;
>=20
> -	memset(pages, 0, priv->txqs_n * sizeof(uintptr_t));
> -	/*
> -	 * As rdma-core, UARs are mapped in size of OS page size.
> -	 * Use aligned address to avoid duplicate mmap.
> -	 * Ref to libmlx5 function: mlx5_init_context()
> -	 */
> +	assert(rte_eal_process_type() =3D=3D RTE_PROC_SECONDARY);
>  	for (i =3D 0; i !=3D priv->txqs_n; ++i) {
>  		if (!(*priv->txqs)[i])
>  			continue;
>  		txq =3D (*priv->txqs)[i];
>  		txq_ctrl =3D container_of(txq, struct mlx5_txq_ctrl, txq);
>  		assert(txq->idx =3D=3D (uint16_t)i);
> -		/* UAR addr form verbs used to find dup and offset in page.
> */
> -		uar_va =3D (uintptr_t)txq_ctrl->bf_reg_orig;
> -		off =3D uar_va & (page_size - 1); /* offset in page. */
> -		uar_va =3D RTE_ALIGN_FLOOR(uar_va, page_size); /* page
> addr. */
> -		already_mapped =3D 0;
> -		for (j =3D 0; j !=3D pages_n; ++j) {
> -			if (pages[j] =3D=3D uar_va) {
> -				already_mapped =3D 1;
> -				break;
> -			}
> -		}
> -		/* new address in reserved UAR address space. */
> -		addr =3D RTE_PTR_ADD(mlx5_shared_data->uar_base,
> -				   uar_va & (uintptr_t)(MLX5_UAR_SIZE - 1));
> -		if (!already_mapped) {
> -			pages[pages_n++] =3D uar_va;
> -			/* fixed mmap to specified address in reserved
> -			 * address space.
> -			 */
> -			ret =3D mmap(addr, page_size,
> -				   PROT_WRITE, MAP_FIXED | MAP_SHARED,
> fd,
> -				   txq_ctrl->uar_mmap_offset);
> -			if (ret !=3D addr) {
> -				/* fixed mmap have to return same address
> */
> -				DRV_LOG(ERR,
> -					"port %u call to mmap failed on UAR"
> -					" for txq %u",
> -					dev->data->port_id, txq->idx);
> -				rte_errno =3D ENXIO;
> -				return -rte_errno;
> -			}
> -		}
> -		if (rte_eal_process_type() =3D=3D RTE_PROC_PRIMARY) /* save
> once */
> -			txq_ctrl->txq.bf_reg =3D RTE_PTR_ADD((void *)addr,
> off);
> -		else
> -			assert(txq_ctrl->txq.bf_reg =3D=3D
> -			       RTE_PTR_ADD((void *)addr, off));
> -#ifndef RTE_ARCH_64
> -		/* Assign a UAR lock according to UAR page number */
> -		lock_idx =3D (txq_ctrl->uar_mmap_offset / page_size) &
> -			   MLX5_UAR_PAGE_NUM_MASK;
> -		txq->uar_lock =3D &priv->uar_lock[lock_idx];
> -#endif
> +		ret =3D txq_uar_init_secondary(txq_ctrl, fd);
> +		if (ret)
> +			goto error;
>  	}
>  	return 0;
> +error:
> +	/* Rollback. */
> +	do {
> +		if (!(*priv->txqs)[i])
> +			continue;
> +		txq =3D (*priv->txqs)[i];
> +		txq_ctrl =3D container_of(txq, struct mlx5_txq_ctrl, txq);
> +		txq_uar_uninit_secondary(txq_ctrl);
> +	} while (i--);
> +	return -rte_errno;
>  }
>=20
>  /**
> @@ -507,7 +548,6 @@ mlx5_txq_ibv_new(struct rte_eth_dev *dev,
> uint16_t idx)
>  	txq_data->wqes =3D qp.sq.buf;
>  	txq_data->wqe_n =3D log2above(qp.sq.wqe_cnt);
>  	txq_data->qp_db =3D &qp.dbrec[MLX5_SND_DBR];
> -	txq_ctrl->bf_reg_orig =3D qp.bf.reg;
>  	txq_data->cq_db =3D cq_info.dbrec;
>  	txq_data->cqes =3D
>  		(volatile struct mlx5_cqe (*)[])
> @@ -521,6 +561,8 @@ mlx5_txq_ibv_new(struct rte_eth_dev *dev,
> uint16_t idx)
>  	txq_ibv->qp =3D tmpl.qp;
>  	txq_ibv->cq =3D tmpl.cq;
>  	rte_atomic32_inc(&txq_ibv->refcnt);
> +	txq_ctrl->bf_reg =3D qp.bf.reg;
> +	txq_uar_init(txq_ctrl);
>  	if (qp.comp_mask & MLX5DV_QP_MASK_UAR_MMAP_OFFSET) {
>  		txq_ctrl->uar_mmap_offset =3D qp.uar_mmap_offset;
>  		DRV_LOG(DEBUG, "port %u: uar_mmap_offset 0x%lx", @@ -
> 778,6 +820,7 @@ mlx5_txq_new(struct rte_eth_dev *dev, uint16_t idx,
> uint16_t desc,
>  	tmpl->priv =3D priv;
>  	tmpl->socket =3D socket;
>  	tmpl->txq.elts_n =3D log2above(desc);
> +	tmpl->txq.port_id =3D dev->data->port_id;
>  	tmpl->txq.idx =3D idx;
>  	txq_set_params(tmpl);
>  	DRV_LOG(DEBUG, "port %u device_attr.max_qp_wr is %d", @@ -
> 836,15 +879,12 @@ mlx5_txq_release(struct rte_eth_dev *dev, uint16_t idx)
> {
>  	struct mlx5_priv *priv =3D dev->data->dev_private;
>  	struct mlx5_txq_ctrl *txq;
> -	size_t page_size =3D sysconf(_SC_PAGESIZE);
>=20
>  	if (!(*priv->txqs)[idx])
>  		return 0;
>  	txq =3D container_of((*priv->txqs)[idx], struct mlx5_txq_ctrl, txq);
>  	if (txq->ibv && !mlx5_txq_ibv_release(txq->ibv))
>  		txq->ibv =3D NULL;
> -	munmap((void *)RTE_ALIGN_FLOOR((uintptr_t)txq->txq.bf_reg,
> page_size),
> -	       page_size);
>  	if (rte_atomic32_dec_and_test(&txq->refcnt)) {
>  		txq_free_elts(txq);
>  		mlx5_mr_btree_free(&txq->txq.mr_ctrl.cache_bh);
> --
> 2.11.0

From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <dev-bounces@dpdk.org>
Received: from dpdk.org (dpdk.org [92.243.14.124])
	by dpdk.space (Postfix) with ESMTP id AC3E1A0096
	for <public@inbox.dpdk.org>; Mon,  8 Apr 2019 07:48:36 +0200 (CEST)
Received: from [92.243.14.124] (localhost [127.0.0.1])
	by dpdk.org (Postfix) with ESMTP id 7FF082C54;
	Mon,  8 Apr 2019 07:48:35 +0200 (CEST)
Received: from EUR04-VI1-obe.outbound.protection.outlook.com
 (mail-eopbgr80075.outbound.protection.outlook.com [40.107.8.75])
 by dpdk.org (Postfix) with ESMTP id 58B662BD8
 for <dev@dpdk.org>; Mon,  8 Apr 2019 07:48:34 +0200 (CEST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Mellanox.com;
 s=selector1;
 h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck;
 bh=kJ6QLSMCwlbI8ZONRfQ6DyZvZALMy1b2IRzakAmH5cc=;
 b=MdYbVesnfsljI51S32Wrb1D/PbEhCEgxWwfj/vuKHprAr7JQlKUjbpgaryjwDnRJSpwQy8gtP1KbwwLhS79XnD5hhhHZUI46MfNA1Ler96Gzrt7x+7fHx29QdKG1jpbIbQEr9gOgDOMnhBGfAafCQe4duK+UV5kxnekNvj4Ti2E=
Received: from AM0PR0502MB3795.eurprd05.prod.outlook.com (52.133.45.150) by
 AM0PR0502MB3876.eurprd05.prod.outlook.com (52.133.48.18) with Microsoft SMTP
 Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id
 15.20.1771.21; Mon, 8 Apr 2019 05:48:32 +0000
Received: from AM0PR0502MB3795.eurprd05.prod.outlook.com
 ([fe80::4192:b468:41e1:c323]) by AM0PR0502MB3795.eurprd05.prod.outlook.com
 ([fe80::4192:b468:41e1:c323%4]) with mapi id 15.20.1771.011; Mon, 8 Apr 2019
 05:48:32 +0000
From: Shahaf Shuler <shahafs@mellanox.com>
To: Yongseok Koh <yskoh@mellanox.com>
CC: "dev@dpdk.org" <dev@dpdk.org>
Thread-Topic: [dpdk-dev] [PATCH v3 3/4] net/mlx5: remove device register remap
Thread-Index: AQHU60+89UhckMtnYU+tSQi7Kgnfk6YxxECw
Date: Mon, 8 Apr 2019 05:48:32 +0000
Message-ID:
 <AM0PR0502MB379559A3B4BB7509C74ACE9AC32C0@AM0PR0502MB3795.eurprd05.prod.outlook.com>
References: <20190325193627.19726-1-yskoh@mellanox.com>
 <20190405013357.14503-1-yskoh@mellanox.com>
 <20190405013357.14503-4-yskoh@mellanox.com>
In-Reply-To: <20190405013357.14503-4-yskoh@mellanox.com>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
authentication-results: spf=none (sender IP is )
 smtp.mailfrom=shahafs@mellanox.com; 
x-originating-ip: [193.47.165.251]
x-ms-publictraffictype: Email
x-ms-office365-filtering-correlation-id: e7547a13-e066-48c1-1bee-08d6bbe5d5ca
x-ms-office365-filtering-ht: Tenant
x-microsoft-antispam: BCL:0; PCL:0;
 RULEID:(2390118)(7020095)(4652040)(8989299)(4534185)(4627221)(201703031133081)(201702281549075)(8990200)(5600139)(711020)(4605104)(4618075)(2017052603328)(7193020);
 SRVR:AM0PR0502MB3876; 
x-ms-traffictypediagnostic: AM0PR0502MB3876:
x-microsoft-antispam-prvs: <AM0PR0502MB387689F8FFC483A880B1ECCFC32C0@AM0PR0502MB3876.eurprd05.prod.outlook.com>
x-forefront-prvs: 0001227049
x-forefront-antispam-report: SFV:NSPM;
 SFS:(10009020)(366004)(136003)(346002)(396003)(376002)(39860400002)(199004)(189003)(71190400001)(76176011)(6436002)(14454004)(81166006)(81156014)(9686003)(8676002)(229853002)(68736007)(105586002)(53946003)(106356001)(6116002)(71200400001)(33656002)(3846002)(478600001)(6506007)(30864003)(25786009)(102836004)(86362001)(7696005)(476003)(6862004)(486006)(26005)(74316002)(97736004)(2906002)(8936002)(316002)(4326008)(446003)(11346002)(186003)(7736002)(5660300002)(14444005)(5024004)(256004)(99286004)(305945005)(66066001)(53936002)(6246003)(52536014)(55016002)(6636002);
 DIR:OUT; SFP:1101; SCL:1; SRVR:AM0PR0502MB3876;
 H:AM0PR0502MB3795.eurprd05.prod.outlook.com; FPR:; SPF:None; LANG:en;
 PTR:InfoNoRecords; A:1; MX:1; 
received-spf: None (protection.outlook.com: mellanox.com does not designate
 permitted sender hosts)
x-ms-exchange-senderadcheck: 1
x-microsoft-antispam-message-info: inad015rUJ89cKsfxm6DXlcZC4qV76OVc2H8AtdK7J88yFcI9A67idQIuMCZYwp8tVQzOzS8j0hW+AhHDJmASpKRiVWIZHfteo1V1PBCGqNniE8BMR23oaTTvbGGgoI8F1jxzwUdalWggtmMzsY4HfAwhVzvzqBJO3qtGyffxj/vZ9NHgkuBlcIzh9W4hrnorltzQ3OtlX6hr2Qqzx9ciMLfCDXWiiq/NN4hyV/knC83cND2mDwvdcZ8mNL6t8HjbUGe/fFe/RahUQUX4+mkG7Y9cv8oSgVeNG8ap83wjKIVkJjq+dRGKfBxL+KdDVH3w+Fc09Px53iuhZMJMuEp3eZnE0+LOJCo2f7jqzCthLhX+zeYEqW2vBeeqHSfi+M78y87T9gppUNbGCMNiZjLhVGM6jhmz5/2TSQzacyK0HM=
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
X-OriginatorOrg: Mellanox.com
X-MS-Exchange-CrossTenant-Network-Message-Id: e7547a13-e066-48c1-1bee-08d6bbe5d5ca
X-MS-Exchange-CrossTenant-originalarrivaltime: 08 Apr 2019 05:48:32.7040 (UTC)
X-MS-Exchange-CrossTenant-fromentityheader: Hosted
X-MS-Exchange-CrossTenant-id: a652971c-7d2e-4d9b-a6a4-d149256f461b
X-MS-Exchange-CrossTenant-mailboxtype: HOSTED
X-MS-Exchange-Transport-CrossTenantHeadersStamped: AM0PR0502MB3876
Subject: Re: [dpdk-dev] [PATCH v3 3/4] net/mlx5: remove device register remap
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
Errors-To: dev-bounces@dpdk.org
Sender: "dev" <dev-bounces@dpdk.org>
Message-ID: <20190408054832.dpvHb-bsMhdttHG5T141ys9Arj1Fx7tqy-UWe1LN2Zs@z>

Hi Koh,

See small comments below. Same for mlx4 patch.


Friday, April 5, 2019 4:34 AM, Yongseok Koh:
> Subject: [dpdk-dev] [PATCH v3 3/4] net/mlx5: remove device register remap
>=20
> UAR (User Access Region) register does not need to be remapped for
> primary process but it should be remapped only for secondary process. UAR
> register table is in the process private structure in rte_eth_devices[],
> 	(struct mlx5_proc_priv *)rte_eth_devices[port_id].process_private
>=20
> The actual UAR table follows the data structure and the table is used for=
 both
> Tx and Rx.
>=20
> For Tx, BlueFlame in UAR is used to ring the doorbell. MLX5_TX_BFREG(txq)
> is defined to get a register for the txq. Processes access its own privat=
e data
> to acquire the register from the UAR table.
>=20
> For Rx, the doorbell in UAR is required in arming CQ event. However, it i=
s a
> known issue that the register isn't remapped for secondary process.
>=20
> Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
> ---
>  drivers/net/mlx5/mlx5.c         | 198 +++++-----------------------------=
------
>  drivers/net/mlx5/mlx5.h         |  15 ++-
>  drivers/net/mlx5/mlx5_ethdev.c  |  17 ++++
>  drivers/net/mlx5/mlx5_rxtx.h    |  11 ++-
>  drivers/net/mlx5/mlx5_trigger.c |   6 --
>  drivers/net/mlx5/mlx5_txq.c     | 180 ++++++++++++++++++++++------------=
-
> -

[...]

>  /**
> @@ -1182,12 +1010,32 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
>  	}
>  	DRV_LOG(DEBUG, "naming Ethernet device \"%s\"", name);
>  	if (rte_eal_process_type() =3D=3D RTE_PROC_SECONDARY) {
> +		struct mlx5_proc_priv *ppriv;
> +		size_t ppriv_size;
> +
>  		eth_dev =3D rte_eth_dev_attach_secondary(name);
>  		if (eth_dev =3D=3D NULL) {
>  			DRV_LOG(ERR, "can not attach rte ethdev");
>  			rte_errno =3D ENOMEM;
>  			return NULL;
>  		}
> +		priv =3D eth_dev->data->dev_private;
> +		/*
> +		 * UAR register table follows the process private structure.
> +		 * BlueFlame registers for Tx queues come first and registers
> +		 * for Rx queues follows.
> +		 */
> +		ppriv_size =3D sizeof(struct mlx5_proc_priv) +
> +			     (priv->rxqs_n + priv->txqs_n) * sizeof(void *);

Why you add also the rxqs_n? why not only the txqs?=20

> +		ppriv =3D rte_malloc_socket("mlx5_proc_priv", ppriv_size,
> +					  RTE_CACHE_LINE_SIZE,
> +					  dpdk_dev->numa_node);
> +		if (!ppriv) {
> +			rte_errno =3D ENOMEM;
> +			return NULL;
> +		}
> +		ppriv->uar_table_sz =3D ppriv_size;
> +		eth_dev->process_private =3D ppriv;
>  		eth_dev->device =3D dpdk_dev;
>  		eth_dev->dev_ops =3D &mlx5_dev_sec_ops;
>  		/* Receive command fd from primary process */ @@ -1195,7
> +1043,7 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
>  		if (err < 0)
>  			return NULL;
>  		/* Remap UAR for Tx queues. */
> -		err =3D mlx5_tx_uar_remap(eth_dev, err);
> +		err =3D mlx5_tx_uar_init_secondary(eth_dev, err);
>  		if (err)
>  			return NULL;
>  		/*
> diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h index
> 699c8fcf6d..1ac4ad71b1 100644
> --- a/drivers/net/mlx5/mlx5.h
> +++ b/drivers/net/mlx5/mlx5.h
> @@ -97,8 +97,6 @@ struct mlx5_shared_data {
>  	/* Global spinlock for primary and secondary processes. */
>  	int init_done; /* Whether primary has done initialization. */
>  	unsigned int secondary_cnt; /* Number of secondary processes
> init'd. */
> -	void *uar_base;
> -	/* Reserved UAR address space for TXQ UAR(hw doorbell) mapping.
> */
>  	struct mlx5_dev_list mem_event_cb_list;
>  	rte_rwlock_t mem_event_rwlock;
>  };
> @@ -106,8 +104,6 @@ struct mlx5_shared_data {
>  /* Per-process data structure, not visible to other processes. */  struc=
t
> mlx5_local_data {
>  	int init_done; /* Whether a secondary has done initialization. */
> -	void *uar_base;
> -	/* Reserved UAR address space for TXQ UAR(hw doorbell) mapping.
> */
>  };
>=20
>  extern struct mlx5_shared_data *mlx5_shared_data; @@ -282,6 +278,17
> @@ struct mlx5_ibv_shared {
>  	struct mlx5_ibv_shared_port port[]; /* per device port data array. */
> };
>=20
> +/* Per-process private structure. */
> +struct mlx5_proc_priv {
> +	size_t uar_table_sz;
> +	/* Size of UAR register table. */
> +	void *uar_table[];
> +	/* Table of UAR registers for each process. */ };
> +
> +#define MLX5_PROC_PRIV(port_id) \
> +	((struct mlx5_proc_priv *)rte_eth_devices[port_id].process_private)
> +
>  struct mlx5_priv {
>  	LIST_ENTRY(mlx5_priv) mem_event_cb;
>  	/**< Called by memory event callback. */ diff --git
> a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c index
> 9ae9dddd3c..42297f11c9 100644
> --- a/drivers/net/mlx5/mlx5_ethdev.c
> +++ b/drivers/net/mlx5/mlx5_ethdev.c
> @@ -382,6 +382,8 @@ int
>  mlx5_dev_configure(struct rte_eth_dev *dev)  {
>  	struct mlx5_priv *priv =3D dev->data->dev_private;
> +	struct mlx5_proc_priv *ppriv;
> +	size_t ppriv_size;
>  	unsigned int rxqs_n =3D dev->data->nb_rx_queues;
>  	unsigned int txqs_n =3D dev->data->nb_tx_queues;
>  	unsigned int i;
> @@ -450,6 +452,21 @@ mlx5_dev_configure(struct rte_eth_dev *dev)
>  		if (++j =3D=3D rxqs_n)
>  			j =3D 0;
>  	}
> +	/*
> +	 * UAR register table follows the process private structure. BlueFlame
> +	 * registers for Tx queues come first and registers for Rx queues
> +	 * follows.
> +	 */
> +	ppriv_size =3D sizeof(struct mlx5_proc_priv) +
> +		     (priv->rxqs_n + priv->txqs_n) * sizeof(void *);

Ditto.=20

> +	ppriv =3D rte_malloc_socket("mlx5_proc_priv", ppriv_size,
> +				  RTE_CACHE_LINE_SIZE, dev->device-
> >numa_node);
> +	if (!ppriv) {
> +		rte_errno =3D ENOMEM;
> +		return -rte_errno;
> +	}
> +	ppriv->uar_table_sz =3D ppriv_size;
> +	dev->process_private =3D ppriv;
>  	return 0;
>  }
>=20
> diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
> index 7b58063ceb..5d49892429 100644
> --- a/drivers/net/mlx5/mlx5_rxtx.h
> +++ b/drivers/net/mlx5/mlx5_rxtx.h
> @@ -201,8 +201,8 @@ struct mlx5_txq_data {
>  	volatile void *wqes; /* Work queue (use volatile to write into). */
>  	volatile uint32_t *qp_db; /* Work queue doorbell. */
>  	volatile uint32_t *cq_db; /* Completion queue doorbell. */
> -	volatile void *bf_reg; /* Blueflame register remapped. */
>  	struct rte_mbuf *(*elts)[]; /* TX elements. */
> +	uint16_t port_id; /* Port ID of device. */
>  	uint16_t idx; /* Queue index. */
>  	struct mlx5_txq_stats stats; /* TX queue counters. */  #ifndef
> RTE_ARCH_64 @@ -231,9 +231,12 @@ struct mlx5_txq_ctrl {
>  	struct mlx5_txq_ibv *ibv; /* Verbs queue object. */
>  	struct mlx5_priv *priv; /* Back pointer to private data. */
>  	off_t uar_mmap_offset; /* UAR mmap offset for non-primary
> process. */
> -	volatile void *bf_reg_orig; /* Blueflame register from verbs. */
> +	void *bf_reg; /* BlueFlame register from Verbs. */

I guess you keep this one in order to get the VA offset for the secondary m=
apping, right? Because otherwise we can take the bf_reg from the UAR table =
on the process private.

If so, better to rename it to uar_page_offset (or other name you like) in o=
rder to avoid fields duplication.=20

>  };
>=20
> +#define MLX5_TX_BFREG(txq) \
> +		(MLX5_PROC_PRIV((txq)->port_id)->uar_table[(txq)->idx])
> +
>  /* mlx5_rxq.c */
>=20
>  extern uint8_t rss_hash_default_key[];
> @@ -301,7 +304,7 @@ uint64_t mlx5_get_rx_queue_offloads(struct
> rte_eth_dev *dev);  int mlx5_tx_queue_setup(struct rte_eth_dev *dev,
> uint16_t idx, uint16_t desc,
>  			unsigned int socket, const struct rte_eth_txconf
> *conf);  void mlx5_tx_queue_release(void *dpdk_txq); -int
> mlx5_tx_uar_remap(struct rte_eth_dev *dev, int fd);
> +int mlx5_tx_uar_init_secondary(struct rte_eth_dev *dev, int fd);
>  struct mlx5_txq_ibv *mlx5_txq_ibv_new(struct rte_eth_dev *dev, uint16_t
> idx);  struct mlx5_txq_ibv *mlx5_txq_ibv_get(struct rte_eth_dev *dev,
> uint16_t idx);  int mlx5_txq_ibv_release(struct mlx5_txq_ibv *txq_ibv); @=
@
> -704,7 +707,7 @@ static __rte_always_inline void
> mlx5_tx_dbrec_cond_wmb(struct mlx5_txq_data *txq, volatile struct
> mlx5_wqe *wqe,
>  		       int cond)
>  {
> -	uint64_t *dst =3D (uint64_t *)((uintptr_t)txq->bf_reg);
> +	uint64_t *dst =3D MLX5_TX_BFREG(txq);

I guess no perf penalty due to this change right?
Would you consider to prefetch the addr before the db logic just to be on t=
he safe side?

>  	volatile uint64_t *src =3D ((volatile uint64_t *)wqe);
>=20
>  	rte_cio_wmb();
> diff --git a/drivers/net/mlx5/mlx5_trigger.c
> b/drivers/net/mlx5/mlx5_trigger.c index 7c1e5594d6..b7fde35758 100644
> --- a/drivers/net/mlx5/mlx5_trigger.c
> +++ b/drivers/net/mlx5/mlx5_trigger.c
> @@ -58,12 +58,6 @@ mlx5_txq_start(struct rte_eth_dev *dev)
>  			goto error;
>  		}
>  	}
> -	ret =3D mlx5_tx_uar_remap(dev, priv->sh->ctx->cmd_fd);
> -	if (ret) {
> -		/* Adjust index for rollback. */
> -		i =3D priv->txqs_n - 1;
> -		goto error;
> -	}
>  	return 0;
>  error:
>  	ret =3D rte_errno; /* Save rte_errno before cleanup. */ diff --git
> a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c index
> 4bd08cb035..5fb1761955 100644
> --- a/drivers/net/mlx5/mlx5_txq.c
> +++ b/drivers/net/mlx5/mlx5_txq.c
> @@ -229,13 +229,99 @@ mlx5_tx_queue_release(void *dpdk_txq)
>  		}
>  }
>=20
> +/**
> + * Initialize Tx UAR registers for primary process.
> + *
> + * @param txq_ctrl
> + *   Pointer to Tx queue control structure.
> + */
> +static void
> +txq_uar_init(struct mlx5_txq_ctrl *txq_ctrl) {
> +	struct mlx5_priv *priv =3D txq_ctrl->priv;
> +	struct mlx5_proc_priv *ppriv =3D MLX5_PROC_PRIV(PORT_ID(priv));
> +
> +	assert(rte_eal_process_type() =3D=3D RTE_PROC_PRIMARY);
> +	assert(ppriv);
> +	ppriv->uar_table[txq_ctrl->txq.idx] =3D txq_ctrl->bf_reg; #ifndef
> +RTE_ARCH_64
> +	struct mlx5_priv *priv =3D txq_ctrl->priv;
> +	struct mlx5_txq_data *txq =3D &txq_ctrl->txq;
> +	unsigned int lock_idx;
> +	/* Assign an UAR lock according to UAR page number */
> +	lock_idx =3D (txq_ctrl->uar_mmap_offset / page_size) &
> +		   MLX5_UAR_PAGE_NUM_MASK;
> +	txq->uar_lock =3D &priv->uar_lock[lock_idx]; #endif }
>=20
>  /**
> - * Mmap TX UAR(HW doorbell) pages into reserved UAR address space.
> - * Both primary and secondary process do mmap to make UAR address
> - * aligned.
> + * Remap UAR register of a Tx queue for secondary process.
>   *
> - * @param[in] dev
> + * Remapped address is stored at the table in the process private
> +structure of
> + * the device, indexed by queue index.
> + *
> + * @param txq_ctrl
> + *   Pointer to Tx queue control structure.
> + * @param fd
> + *   Verbs file descriptor to map UAR pages.
> + *
> + * @return
> + *   0 on success, a negative errno value otherwise and rte_errno is set=
.
> + */
> +static int
> +txq_uar_init_secondary(struct mlx5_txq_ctrl *txq_ctrl, int fd) {
> +	struct mlx5_priv *priv =3D txq_ctrl->priv;
> +	struct mlx5_proc_priv *ppriv =3D MLX5_PROC_PRIV(PORT_ID(priv));
> +	struct mlx5_txq_data *txq =3D &txq_ctrl->txq;
> +	void *addr;
> +	uintptr_t uar_va;
> +	uintptr_t offset;
> +	const size_t page_size =3D sysconf(_SC_PAGESIZE);
> +
> +	assert(ppriv);
> +	/*
> +	 * As rdma-core, UARs are mapped in size of OS page
> +	 * size. Ref to libmlx5 function: mlx5_init_context()
> +	 */
> +	uar_va =3D (uintptr_t)txq_ctrl->bf_reg;
> +	offset =3D uar_va & (page_size - 1); /* Offset in page. */
> +	addr =3D mmap(NULL, page_size, PROT_WRITE, MAP_SHARED, fd,
> +			txq_ctrl->uar_mmap_offset);
> +	if (addr =3D=3D MAP_FAILED) {
> +		DRV_LOG(ERR,
> +			"port %u mmap failed for BF reg of txq %u",
> +			txq->port_id, txq->idx);
> +		rte_errno =3D ENXIO;
> +		return -rte_errno;
> +	}
> +	addr =3D RTE_PTR_ADD(addr, offset);
> +	ppriv->uar_table[txq->idx] =3D addr;
> +	return 0;
> +}
> +
> +/**
> + * Unmap UAR register of a Tx queue for secondary process.
> + *
> + * @param txq_ctrl
> + *   Pointer to Tx queue control structure.
> + */
> +static void
> +txq_uar_uninit_secondary(struct mlx5_txq_ctrl *txq_ctrl) {
> +	struct mlx5_proc_priv *ppriv =3D MLX5_PROC_PRIV(PORT_ID(txq_ctrl-
> >priv));
> +	const size_t page_size =3D sysconf(_SC_PAGESIZE);
> +	void *addr;
> +
> +	addr =3D ppriv->uar_table[txq_ctrl->txq.idx];
> +	munmap(RTE_PTR_ALIGN_FLOOR(addr, page_size), page_size); }
> +
> +/**
> + * Initialize Tx UAR registers for secondary process.
> + *
> + * @param dev
>   *   Pointer to Ethernet device.
>   * @param fd
>   *   Verbs file descriptor to map UAR pages.
> @@ -244,81 +330,36 @@ mlx5_tx_queue_release(void *dpdk_txq)
>   *   0 on success, a negative errno value otherwise and rte_errno is set=
.
>   */
>  int
> -mlx5_tx_uar_remap(struct rte_eth_dev *dev, int fd)
> +mlx5_tx_uar_init_secondary(struct rte_eth_dev *dev, int fd)
>  {
>  	struct mlx5_priv *priv =3D dev->data->dev_private;
> -	unsigned int i, j;
> -	uintptr_t pages[priv->txqs_n];
> -	unsigned int pages_n =3D 0;
> -	uintptr_t uar_va;
> -	uintptr_t off;
> -	void *addr;
> -	void *ret;
>  	struct mlx5_txq_data *txq;
>  	struct mlx5_txq_ctrl *txq_ctrl;
> -	int already_mapped;
> -	size_t page_size =3D sysconf(_SC_PAGESIZE);
> -#ifndef RTE_ARCH_64
> -	unsigned int lock_idx;
> -#endif
> +	unsigned int i;
> +	int ret;
>=20
> -	memset(pages, 0, priv->txqs_n * sizeof(uintptr_t));
> -	/*
> -	 * As rdma-core, UARs are mapped in size of OS page size.
> -	 * Use aligned address to avoid duplicate mmap.
> -	 * Ref to libmlx5 function: mlx5_init_context()
> -	 */
> +	assert(rte_eal_process_type() =3D=3D RTE_PROC_SECONDARY);
>  	for (i =3D 0; i !=3D priv->txqs_n; ++i) {
>  		if (!(*priv->txqs)[i])
>  			continue;
>  		txq =3D (*priv->txqs)[i];
>  		txq_ctrl =3D container_of(txq, struct mlx5_txq_ctrl, txq);
>  		assert(txq->idx =3D=3D (uint16_t)i);
> -		/* UAR addr form verbs used to find dup and offset in page.
> */
> -		uar_va =3D (uintptr_t)txq_ctrl->bf_reg_orig;
> -		off =3D uar_va & (page_size - 1); /* offset in page. */
> -		uar_va =3D RTE_ALIGN_FLOOR(uar_va, page_size); /* page
> addr. */
> -		already_mapped =3D 0;
> -		for (j =3D 0; j !=3D pages_n; ++j) {
> -			if (pages[j] =3D=3D uar_va) {
> -				already_mapped =3D 1;
> -				break;
> -			}
> -		}
> -		/* new address in reserved UAR address space. */
> -		addr =3D RTE_PTR_ADD(mlx5_shared_data->uar_base,
> -				   uar_va & (uintptr_t)(MLX5_UAR_SIZE - 1));
> -		if (!already_mapped) {
> -			pages[pages_n++] =3D uar_va;
> -			/* fixed mmap to specified address in reserved
> -			 * address space.
> -			 */
> -			ret =3D mmap(addr, page_size,
> -				   PROT_WRITE, MAP_FIXED | MAP_SHARED,
> fd,
> -				   txq_ctrl->uar_mmap_offset);
> -			if (ret !=3D addr) {
> -				/* fixed mmap have to return same address
> */
> -				DRV_LOG(ERR,
> -					"port %u call to mmap failed on UAR"
> -					" for txq %u",
> -					dev->data->port_id, txq->idx);
> -				rte_errno =3D ENXIO;
> -				return -rte_errno;
> -			}
> -		}
> -		if (rte_eal_process_type() =3D=3D RTE_PROC_PRIMARY) /* save
> once */
> -			txq_ctrl->txq.bf_reg =3D RTE_PTR_ADD((void *)addr,
> off);
> -		else
> -			assert(txq_ctrl->txq.bf_reg =3D=3D
> -			       RTE_PTR_ADD((void *)addr, off));
> -#ifndef RTE_ARCH_64
> -		/* Assign a UAR lock according to UAR page number */
> -		lock_idx =3D (txq_ctrl->uar_mmap_offset / page_size) &
> -			   MLX5_UAR_PAGE_NUM_MASK;
> -		txq->uar_lock =3D &priv->uar_lock[lock_idx];
> -#endif
> +		ret =3D txq_uar_init_secondary(txq_ctrl, fd);
> +		if (ret)
> +			goto error;
>  	}
>  	return 0;
> +error:
> +	/* Rollback. */
> +	do {
> +		if (!(*priv->txqs)[i])
> +			continue;
> +		txq =3D (*priv->txqs)[i];
> +		txq_ctrl =3D container_of(txq, struct mlx5_txq_ctrl, txq);
> +		txq_uar_uninit_secondary(txq_ctrl);
> +	} while (i--);
> +	return -rte_errno;
>  }
>=20
>  /**
> @@ -507,7 +548,6 @@ mlx5_txq_ibv_new(struct rte_eth_dev *dev,
> uint16_t idx)
>  	txq_data->wqes =3D qp.sq.buf;
>  	txq_data->wqe_n =3D log2above(qp.sq.wqe_cnt);
>  	txq_data->qp_db =3D &qp.dbrec[MLX5_SND_DBR];
> -	txq_ctrl->bf_reg_orig =3D qp.bf.reg;
>  	txq_data->cq_db =3D cq_info.dbrec;
>  	txq_data->cqes =3D
>  		(volatile struct mlx5_cqe (*)[])
> @@ -521,6 +561,8 @@ mlx5_txq_ibv_new(struct rte_eth_dev *dev,
> uint16_t idx)
>  	txq_ibv->qp =3D tmpl.qp;
>  	txq_ibv->cq =3D tmpl.cq;
>  	rte_atomic32_inc(&txq_ibv->refcnt);
> +	txq_ctrl->bf_reg =3D qp.bf.reg;
> +	txq_uar_init(txq_ctrl);
>  	if (qp.comp_mask & MLX5DV_QP_MASK_UAR_MMAP_OFFSET) {
>  		txq_ctrl->uar_mmap_offset =3D qp.uar_mmap_offset;
>  		DRV_LOG(DEBUG, "port %u: uar_mmap_offset 0x%lx", @@ -
> 778,6 +820,7 @@ mlx5_txq_new(struct rte_eth_dev *dev, uint16_t idx,
> uint16_t desc,
>  	tmpl->priv =3D priv;
>  	tmpl->socket =3D socket;
>  	tmpl->txq.elts_n =3D log2above(desc);
> +	tmpl->txq.port_id =3D dev->data->port_id;
>  	tmpl->txq.idx =3D idx;
>  	txq_set_params(tmpl);
>  	DRV_LOG(DEBUG, "port %u device_attr.max_qp_wr is %d", @@ -
> 836,15 +879,12 @@ mlx5_txq_release(struct rte_eth_dev *dev, uint16_t idx)
> {
>  	struct mlx5_priv *priv =3D dev->data->dev_private;
>  	struct mlx5_txq_ctrl *txq;
> -	size_t page_size =3D sysconf(_SC_PAGESIZE);
>=20
>  	if (!(*priv->txqs)[idx])
>  		return 0;
>  	txq =3D container_of((*priv->txqs)[idx], struct mlx5_txq_ctrl, txq);
>  	if (txq->ibv && !mlx5_txq_ibv_release(txq->ibv))
>  		txq->ibv =3D NULL;
> -	munmap((void *)RTE_ALIGN_FLOOR((uintptr_t)txq->txq.bf_reg,
> page_size),
> -	       page_size);
>  	if (rte_atomic32_dec_and_test(&txq->refcnt)) {
>  		txq_free_elts(txq);
>  		mlx5_mr_btree_free(&txq->txq.mr_ctrl.cache_bh);
> --
> 2.11.0