From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <yskoh@mellanox.com>
Received: from EUR01-HE1-obe.outbound.protection.outlook.com
 (mail-eopbgr130054.outbound.protection.outlook.com [40.107.13.54])
 by dpdk.org (Postfix) with ESMTP id BE4FE1B1FE
 for <dev@dpdk.org>; Fri, 12 Apr 2019 21:22:30 +0200 (CEST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Mellanox.com;
 s=selector1;
 h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck;
 bh=Z90ZIUvudy/VoHFhZzJlUbR2/nvb3/FiCP6fFqcmx9s=;
 b=rfnwTfFZmEmm6JdOvoYWd/ePt5rvRpP5yDAA7GXBfH7//c9YvMXQqfaI1SWPnEtQYVCBoLqrhgIHlmMgKL5Bm2n52SKbxsA9lPxDD48R5KGDB5j0BW6agatg2ULNW6qMro2s2xXC1+XGIwEa5KyJ+Y+wPlBq7NaBYDYxLmsgyVc=
Received: from DB3PR0502MB3980.eurprd05.prod.outlook.com (52.134.72.27) by
 DB3PR0502MB4075.eurprd05.prod.outlook.com (52.134.67.13) with Microsoft SMTP
 Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id
 15.20.1792.14; Fri, 12 Apr 2019 19:22:28 +0000
Received: from DB3PR0502MB3980.eurprd05.prod.outlook.com
 ([fe80::6072:43be:7c2d:103a]) by DB3PR0502MB3980.eurprd05.prod.outlook.com
 ([fe80::6072:43be:7c2d:103a%3]) with mapi id 15.20.1792.009; Fri, 12 Apr 2019
 19:22:28 +0000
From: Yongseok Koh <yskoh@mellanox.com>
To: Slava Ovsiienko <viacheslavo@mellanox.com>
CC: "dev@dpdk.org" <dev@dpdk.org>, Shahaf Shuler <shahafs@mellanox.com>
Thread-Topic: [dpdk-dev] [PATCH 1/1] net/mlx5: share Memory Regions for
 multiport device
Thread-Index: AQHU8WURRZsxq7tda0WzT4lPBI9p0A==
Date: Fri, 12 Apr 2019 19:22:28 +0000
Message-ID: <20190412192218.GA14250@mtidpdk.mti.labs.mlnx>
References: <1555083940-24539-1-git-send-email-viacheslavo@mellanox.com>
In-Reply-To: <1555083940-24539-1-git-send-email-viacheslavo@mellanox.com>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
x-clientproxiedby: BYAPR11CA0084.namprd11.prod.outlook.com
 (2603:10b6:a03:f4::25) To DB3PR0502MB3980.eurprd05.prod.outlook.com
 (2603:10a6:8:10::27)
authentication-results: spf=none (sender IP is )
 smtp.mailfrom=yskoh@mellanox.com; 
x-ms-exchange-messagesentrepresentingtype: 1
x-originating-ip: [209.116.155.178]
x-ms-publictraffictype: Email
x-ms-office365-filtering-correlation-id: d5dfed32-1593-4adb-0ce9-08d6bf7c3350
x-ms-office365-filtering-ht: Tenant
x-microsoft-antispam: BCL:0; PCL:0;
 RULEID:(2390118)(7020095)(4652040)(8989299)(4534185)(4627221)(201703031133081)(201702281549075)(8990200)(5600139)(711020)(4605104)(4618075)(2017052603328)(7193020);
 SRVR:DB3PR0502MB4075; 
x-ms-traffictypediagnostic: DB3PR0502MB4075:
x-ms-exchange-purlcount: 1
x-microsoft-antispam-prvs: <DB3PR0502MB407543209E65CEA7EBA5C3EAC3280@DB3PR0502MB4075.eurprd05.prod.outlook.com>
x-forefront-prvs: 0005B05917
x-forefront-antispam-report: SFV:NSPM;
 SFS:(10009020)(366004)(396003)(376002)(136003)(39860400002)(346002)(189003)(199004)(81166006)(76176011)(256004)(229853002)(30864003)(5660300002)(478600001)(186003)(52116002)(6636002)(6486002)(7736002)(6436002)(305945005)(53936002)(966005)(26005)(107886003)(86362001)(316002)(25786009)(1076003)(3846002)(53946003)(9686003)(54906003)(6116002)(33656002)(71200400001)(8676002)(45080400002)(386003)(14444005)(106356001)(6306002)(81156014)(6506007)(6512007)(6246003)(105586002)(14454004)(66066001)(4326008)(486006)(6862004)(102836004)(476003)(97736004)(68736007)(446003)(99286004)(11346002)(71190400001)(8936002)(2906002)(579004)(559001);
 DIR:OUT; SFP:1101; SCL:1; SRVR:DB3PR0502MB4075;
 H:DB3PR0502MB3980.eurprd05.prod.outlook.com; FPR:; SPF:None; LANG:en;
 PTR:InfoNoRecords; MX:1; A:1; 
received-spf: None (protection.outlook.com: mellanox.com does not designate
 permitted sender hosts)
x-ms-exchange-senderadcheck: 1
x-microsoft-antispam-message-info: x+cZpjPwDYpA89qo4qVLZphPNxLqkD7yBDYUyaTZVJCxBb24MpFJmmhpirW5VhYNn4oVGsRBxO/hm2JJKT2QN+suaIrRwkUhHCLlN8olue+07tioN4j3JCB37FhtzEI0ZDWW5Fg009pwLIciMHN6jliDOJ64TGUapEkziaouHrPtvHxBwlCqXQR6lcIFl+75BtII3ksbshGWQ/OvuQnoHJsP7ooLgoetAsSnZCTD+qDBdXlR/g3gZED2tq+SGgudH4nIXBB9P5DE694MQ4v3n/hdJqVF5CsSuvhAD1sdOYqDWMuEkVnG/IEUoofEv30E5jOd6u3mfHGtLlyxvqeJXQwlrgE3DOeW7c095WKtlHNTfv3PUUpGT0kau2dyRp6oLiwwW7oYZOQ6ebrzVbs1uiuPOfFZw9GR+BHDppQ5Oa0=
Content-Type: text/plain; charset="us-ascii"
Content-ID: <26288DF300794244B60DFFE58DEDEBA7@eurprd05.prod.outlook.com>
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
X-OriginatorOrg: Mellanox.com
X-MS-Exchange-CrossTenant-Network-Message-Id: d5dfed32-1593-4adb-0ce9-08d6bf7c3350
X-MS-Exchange-CrossTenant-originalarrivaltime: 12 Apr 2019 19:22:28.2041 (UTC)
X-MS-Exchange-CrossTenant-fromentityheader: Hosted
X-MS-Exchange-CrossTenant-id: a652971c-7d2e-4d9b-a6a4-d149256f461b
X-MS-Exchange-CrossTenant-mailboxtype: HOSTED
X-MS-Exchange-Transport-CrossTenantHeadersStamped: DB3PR0502MB4075
Subject: Re: [dpdk-dev] [PATCH 1/1] net/mlx5: share Memory Regions for
 multiport device
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Fri, 12 Apr 2019 19:22:31 -0000

On Fri, Apr 12, 2019 at 03:45:40PM +0000, Viacheslav Ovsiienko wrote:
> The multiport InfiniBand device support was introduced in [1].
> All active ports belonging to the same InfiniBand device use the single
> shared InfiniBand context of that device and share its resources:
>   - QPs are created within the shared context
>   - Verbs flows are created with the port index specified
>   - DV/DR resources
>   - Protection Domain
>   - Event Handlers
>
> This patchset adds support for sharing Memory Regions between
> the ports created on the base of a multiport InfiniBand device.
> The mlx5 datapath uses a layered cache subsystem for
> allocating/releasing Memory Regions; only the lowest layer, L3,
> is shared, due to performance considerations.
>
> [1] http://patches.dpdk.org/cover/51800/
>
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> ---
>  drivers/net/mlx5/mlx5.c     |  40 +++++++----
>  drivers/net/mlx5/mlx5.h     |  15 ++--
>  drivers/net/mlx5/mlx5_mr.c  | 163 ++++++++++++++++++++++----------------------
>  drivers/net/mlx5/mlx5_mr.h  |   5 +-
>  drivers/net/mlx5/mlx5_txq.c |   2 +-
>  5 files changed, 121 insertions(+), 104 deletions(-)
>
> diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
> index 9ff50df..7ff5c8b 100644
> --- a/drivers/net/mlx5/mlx5.c
> +++ b/drivers/net/mlx5/mlx5.c
> @@ -144,6 +144,7 @@ struct mlx5_dev_spawn_data {
>  	struct mlx5_switch_info info; /**< Switch information. */
>  	struct ibv_device *ibv_dev; /**< Associated IB device. */
>  	struct rte_eth_dev *eth_dev; /**< Associated Ethernet device. */
> +	struct rte_pci_device *pci_dev; /**< Backend PCI device. */
>  };
>
>  static LIST_HEAD(, mlx5_ibv_shared) mlx5_ibv_list = LIST_HEAD_INITIALIZER();
> @@ -222,6 +223,7 @@ struct mlx5_dev_spawn_data {
>  		sizeof(sh->ibdev_name));
>  	strncpy(sh->ibdev_path, sh->ctx->device->ibdev_path,
>  		sizeof(sh->ibdev_path));
> +	sh->pci_dev = spawn->pci_dev;
>  	pthread_mutex_init(&sh->intr_mutex, NULL);
>  	/*
>  	 * Setting port_id to max unallowed value means
> @@ -236,6 +238,22 @@ struct mlx5_dev_spawn_data {
>  		err = ENOMEM;
>  		goto error;
>  	}
> +	/*
> +	 * Once the device is added to the list of memory event
> +	 * callback, its global MR cache table cannot be expanded
> +	 * on the fly because of deadlock. If it overflows, lookup
> +	 * should be done by searching MR list linearly, which is slow.
> +	 *
> +	 * At this point the device is not added to the memory
> +	 * event list yet, context is just being created.
> +	 */
> +	err = mlx5_mr_btree_init(&sh->mr.cache,
> +				 MLX5_MR_BTREE_CACHE_N * 2,
> +				 sh->pci_dev->device.numa_node);
> +	if (err) {
> +		err = rte_errno;
> +		goto error;
> +	}
>  	LIST_INSERT_HEAD(&mlx5_ibv_list, sh, next);
>  exit:
>  	pthread_mutex_unlock(&mlx5_ibv_list_mutex);
> @@ -283,6 +301,8 @@ struct mlx5_dev_spawn_data {
>  	assert(rte_eal_process_type() == RTE_PROC_PRIMARY);
>  	if (--sh->refcnt)
>  		goto exit;
> +	/* Release created Memory Regions. */
> +	mlx5_mr_release(sh);
>  	LIST_REMOVE(sh, next);
>  	/*
>  	 *  Ensure there is no async event handler installed.
> @@ -615,7 +635,10 @@ struct mlx5_dev_spawn_data {
>  	}
>  	mlx5_proc_priv_uninit(dev);
>  	mlx5_mprq_free_mp(dev);
> -	mlx5_mr_release(dev);
> +	/* Remove from memory callback device list. */
> +	rte_rwlock_write_lock(&mlx5_shared_data->mem_event_rwlock);
> +	LIST_REMOVE(priv, mem_event_cb);
> +	rte_rwlock_write_unlock(&mlx5_shared_data->mem_event_rwlock);

I'm almost okay with this patch, but I'm concerned about this part. As devices
sharing the same context are added to the callback list,
mlx5_mr_mem_event_free_cb() will be called multiple times with the same
addr/len. That is redundant and unnecessary. How about adding the shared
ctx to the list instead?

>  	assert(priv->sh);
>  	mlx5_free_shared_dr(priv);
>  	if (priv->rss_conf.rss_key != NULL)
> @@ -1493,19 +1516,6 @@ struct mlx5_dev_spawn_data {
>  		goto error;
>  	}
>  	priv->config.flow_prio = err;
> -	/*
> -	 * Once the device is added to the list of memory event
> -	 * callback, its global MR cache table cannot be expanded
> -	 * on the fly because of deadlock. If it overflows, lookup
> -	 * should be done by searching MR list linearly, which is slow.
> -	 */
> -	err = mlx5_mr_btree_init(&priv->mr.cache,
> -				 MLX5_MR_BTREE_CACHE_N * 2,
> -				 eth_dev->device->numa_node);
> -	if (err) {
> -		err = rte_errno;
> -		goto error;
> -	}
>  	/* Add device to memory callback list. */
>  	rte_rwlock_write_lock(&mlx5_shared_data->mem_event_rwlock);
>  	LIST_INSERT_HEAD(&mlx5_shared_data->mem_event_cb_list,
> @@ -1702,6 +1712,7 @@ struct mlx5_dev_spawn_data {
>  			list[ns].ibv_port = i;
>  			list[ns].ibv_dev = ibv_match[0];
>  			list[ns].eth_dev = NULL;
> +			list[ns].pci_dev = pci_dev;
>  			list[ns].ifindex = mlx5_nl_ifindex
>  					(nl_rdma, list[ns].ibv_dev->name, i);
>  			if (!list[ns].ifindex) {
> @@ -1768,6 +1779,7 @@ struct mlx5_dev_spawn_data {
>  			list[ns].ibv_port = 1;
>  			list[ns].ibv_dev = ibv_match[i];
>  			list[ns].eth_dev = NULL;
> +			list[ns].pci_dev = pci_dev;
>  			list[ns].ifindex = 0;
>  			if (nl_rdma >= 0)
>  				list[ns].ifindex = mlx5_nl_ifindex
> diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
> index 14c7f3c..8eb1019 100644
> --- a/drivers/net/mlx5/mlx5.h
> +++ b/drivers/net/mlx5/mlx5.h
> @@ -257,6 +257,14 @@ struct mlx5_ibv_shared {
>  	char ibdev_name[IBV_SYSFS_NAME_MAX]; /* IB device name. */
>  	char ibdev_path[IBV_SYSFS_PATH_MAX]; /* IB device path for secondary */
>  	struct ibv_device_attr_ex device_attr; /* Device properties. */
> +	struct rte_pci_device *pci_dev; /* Backend PCI device. */
> +	struct { /* Global device shared L3 MR cache. */

The comment sounds redundant, and this structure holds not only the global
cache but also the MR lists.

> +		uint32_t dev_gen; /* Generation number to flush local caches. */
> +		rte_rwlock_t rwlock; /* MR Lock. */
> +		struct mlx5_mr_btree cache; /* Global MR cache table. */
> +		struct mlx5_mr_list mr_list; /* Registered MR list. */
> +		struct mlx5_mr_list mr_free_list; /* Freed MR list. */
> +	} mr;
>  	/* Shared DV/DR flow data section. */
>  	pthread_mutex_t dv_mutex; /* DV context mutex. */
>  	uint32_t dv_refcnt; /* DV/DR data reference counter. */
> @@ -323,13 +331,6 @@ struct mlx5_priv {
>  	struct mlx5_flows ctrl_flows; /* Control flow rules. */
>  	LIST_HEAD(counters, mlx5_flow_counter) flow_counters;
>  	/* Flow counters. */
> -	struct {
> -		uint32_t dev_gen; /* Generation number to flush local caches. */
> -		rte_rwlock_t rwlock; /* MR Lock. */
> -		struct mlx5_mr_btree cache; /* Global MR cache table. */
> -		struct mlx5_mr_list mr_list; /* Registered MR list. */
> -		struct mlx5_mr_list mr_free_list; /* Freed MR list. */
> -	} mr;
>  	LIST_HEAD(rxq, mlx5_rxq_ctrl) rxqsctrl; /* DPDK Rx queues. */
>  	LIST_HEAD(rxqibv, mlx5_rxq_ibv) rxqsibv; /* Verbs Rx queues. */
>  	LIST_HEAD(hrxq, mlx5_hrxq) hrxqs; /* Verbs Hash Rx queues. */
> diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
> index a3732d4..5140cc2 100644
> --- a/drivers/net/mlx5/mlx5_mr.c
> +++ b/drivers/net/mlx5/mlx5_mr.c
> @@ -36,7 +36,7 @@ struct mr_update_mp_data {
>
>  /**
>   * Expand B-tree table to a given size. Can't be called with holding
> - * memory_hotplug_lock or priv->mr.rwlock due to rte_realloc().
> + * memory_hotplug_lock or sh->mr.rwlock due to rte_realloc().
>   *
>   * @param bt
>   *   Pointer to B-tree structure.
> @@ -350,7 +350,7 @@ struct mr_update_mp_data {
>  		n = mr_find_next_chunk(mr, &entry, n);
>  		if (!entry.end)
>  			break;
> -		if (mr_btree_insert(&priv->mr.cache, &entry) < 0) {
> +		if (mr_btree_insert(&priv->sh->mr.cache, &entry) < 0) {
>  			/*
>  			 * Overflowed, but the global table cannot be expanded
>  			 * because of deadlock.
> @@ -382,7 +382,7 @@ struct mr_update_mp_data {
>  	struct mlx5_mr *mr;
>
>  	/* Iterate all the existing MRs. */
> -	LIST_FOREACH(mr, &priv->mr.mr_list, mr) {
> +	LIST_FOREACH(mr, &priv->sh->mr.mr_list, mr) {
>  		unsigned int n;
>
>  		if (mr->ms_n == 0)
> @@ -420,6 +420,7 @@ struct mr_update_mp_data {
>  	      uintptr_t addr)
>  {
>  	struct mlx5_priv *priv = dev->data->dev_private;
> +	struct mlx5_ibv_shared *sh = priv->sh;
>  	uint16_t idx;
>  	uint32_t lkey = UINT32_MAX;
>  	struct mlx5_mr *mr;
> @@ -430,10 +431,10 @@ struct mr_update_mp_data {
>  	 * has to be searched by traversing the original MR list instead, which
>  	 * is very slow path. Otherwise, the global cache is all inclusive.
>  	 */
> -	if (!unlikely(priv->mr.cache.overflow)) {
> -		lkey = mr_btree_lookup(&priv->mr.cache, &idx, addr);
> +	if (!unlikely(sh->mr.cache.overflow)) {
> +		lkey = mr_btree_lookup(&sh->mr.cache, &idx, addr);
>  		if (lkey != UINT32_MAX)
> -			*entry = (*priv->mr.cache.table)[idx];
> +			*entry = (*sh->mr.cache.table)[idx];
>  	} else {
>  		/* Falling back to the slowest path. */
>  		mr = mr_lookup_dev_list(dev, entry, addr);
> @@ -468,13 +469,12 @@ struct mr_update_mp_data {
>  /**
>   * Release resources of detached MR having no online entry.
>   *
> - * @param dev
> - *   Pointer to Ethernet device.
> + * @param sh
> + *   Pointer to Ethernet device shared context.
>   */
>  static void
> -mlx5_mr_garbage_collect(struct rte_eth_dev *dev)
> +mlx5_mr_garbage_collect(struct mlx5_ibv_shared *sh)
>  {
> -	struct mlx5_priv *priv = dev->data->dev_private;
>  	struct mlx5_mr *mr_next;
>  	struct mlx5_mr_list free_list = LIST_HEAD_INITIALIZER(free_list);
>
> @@ -484,11 +484,11 @@ struct mr_update_mp_data {
>  	 * MR can't be freed with holding the lock because rte_free() could call
>  	 * memory free callback function. This will be a deadlock situation.
>  	 */
> -	rte_rwlock_write_lock(&priv->mr.rwlock);
> +	rte_rwlock_write_lock(&sh->mr.rwlock);
>  	/* Detach the whole free list and release it after unlocking. */
> -	free_list = priv->mr.mr_free_list;
> -	LIST_INIT(&priv->mr.mr_free_list);
> -	rte_rwlock_write_unlock(&priv->mr.rwlock);
> +	free_list = sh->mr.mr_free_list;
> +	LIST_INIT(&sh->mr.mr_free_list);
> +	rte_rwlock_write_unlock(&sh->mr.rwlock);
>  	/* Release resources. */
>  	mr_next = LIST_FIRST(&free_list);
>  	while (mr_next != NULL) {
> @@ -548,12 +548,12 @@ struct mr_update_mp_data {
>  		      dev->data->port_id, (void *)addr);
>  		return UINT32_MAX;
>  	}
> -	rte_rwlock_read_lock(&priv->mr.rwlock);
> +	rte_rwlock_read_lock(&priv->sh->mr.rwlock);
>  	/* Fill in output data. */
>  	mr_lookup_dev(dev, entry, addr);
>  	/* Lookup can't fail. */
>  	assert(entry->lkey != UINT32_MAX);
> -	rte_rwlock_read_unlock(&priv->mr.rwlock);
> +	rte_rwlock_read_unlock(&priv->sh->mr.rwlock);
>  	DEBUG("port %u MR CREATED by primary process for %p:\n"
>  	      "  [0x%" PRIxPTR ", 0x%" PRIxPTR "), lkey=0x%x",
>  	      dev->data->port_id, (void *)addr,
> @@ -582,6 +582,7 @@ struct mr_update_mp_data {
>  		       uintptr_t addr)
>  {
>  	struct mlx5_priv *priv = dev->data->dev_private;
> +	struct mlx5_ibv_shared *sh = priv->sh;
>  	struct mlx5_dev_config *config = &priv->config;
>  	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
>  	const struct rte_memseg_list *msl;
> @@ -602,12 +603,12 @@ struct mr_update_mp_data {
>  		dev->data->port_id, (void *)addr);
>  	/*
>  	 * Release detached MRs if any. This can't be called with holding either
> -	 * memory_hotplug_lock or priv->mr.rwlock. MRs on the free list have
> +	 * memory_hotplug_lock or sh->mr.rwlock. MRs on the free list have
>  	 * been detached by the memory free event but it couldn't be released
>  	 * inside the callback due to deadlock. As a result, releasing resources
>  	 * is quite opportunistic.
>  	 */
> -	mlx5_mr_garbage_collect(dev);
> +	mlx5_mr_garbage_collect(sh);
>  	/*
>  	 * If enabled, find out a contiguous virtual address chunk in use, to
>  	 * which the given address belongs, in order to register maximum range.
> @@ -710,7 +711,7 @@ struct mr_update_mp_data {
>  		goto alloc_resources;
>  	}
>  	assert(data.msl == data_re.msl);
> -	rte_rwlock_write_lock(&priv->mr.rwlock);
> +	rte_rwlock_write_lock(&sh->mr.rwlock);
>  	/*
>  	 * Check the address is really missing. If other thread already created
>  	 * one or it is not found due to overflow, abort and return.
> @@ -721,10 +722,10 @@ struct mr_update_mp_data {
>  		 * low-on-memory. Then, this entry will have to be searched
>  		 * here again.
>  		 */
> -		mr_btree_insert(&priv->mr.cache, entry);
> +		mr_btree_insert(&sh->mr.cache, entry);
>  		DEBUG("port %u found MR for %p on final lookup, abort",
>  		      dev->data->port_id, (void *)addr);
> -		rte_rwlock_write_unlock(&priv->mr.rwlock);
> +		rte_rwlock_write_unlock(&sh->mr.rwlock);
>  		rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
>  		/*
>  		 * Must be unlocked before calling rte_free() because
> @@ -769,7 +770,7 @@ struct mr_update_mp_data {
>  	 * mlx5_alloc_buf_extern() which eventually calls rte_malloc_socket()
>  	 * through mlx5_alloc_verbs_buf().
>  	 */
> -	mr->ibv_mr = mlx5_glue->reg_mr(priv->sh->pd, (void *)data.start, len,
> +	mr->ibv_mr = mlx5_glue->reg_mr(sh->pd, (void *)data.start, len,
>  				       IBV_ACCESS_LOCAL_WRITE);
>  	if (mr->ibv_mr == NULL) {
>  		DEBUG("port %u fail to create a verbs MR for address (%p)",
> @@ -779,7 +780,7 @@ struct mr_update_mp_data {
>  	}
>  	assert((uintptr_t)mr->ibv_mr->addr == data.start);
>  	assert(mr->ibv_mr->length == len);
> -	LIST_INSERT_HEAD(&priv->mr.mr_list, mr, mr);
> +	LIST_INSERT_HEAD(&sh->mr.mr_list, mr, mr);
>  	DEBUG("port %u MR CREATED (%p) for %p:\n"
>  	      "  [0x%" PRIxPTR ", 0x%" PRIxPTR "),"
>  	      " lkey=0x%x base_idx=%u ms_n=%u, ms_bmp_n=%u",
> @@ -792,11 +793,11 @@ struct mr_update_mp_data {
>  	mr_lookup_dev(dev, entry, addr);
>  	/* Lookup can't fail. */
>  	assert(entry->lkey != UINT32_MAX);
> -	rte_rwlock_write_unlock(&priv->mr.rwlock);
> +	rte_rwlock_write_unlock(&sh->mr.rwlock);
>  	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
>  	return entry->lkey;
>  err_mrlock:
> -	rte_rwlock_write_unlock(&priv->mr.rwlock);
> +	rte_rwlock_write_unlock(&sh->mr.rwlock);
>  err_memlock:
>  	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
>  err_nolock:
> @@ -854,14 +855,15 @@ struct mr_update_mp_data {
>  mr_rebuild_dev_cache(struct rte_eth_dev *dev)
>  {
>  	struct mlx5_priv *priv = dev->data->dev_private;
> +	struct mlx5_ibv_shared *sh = priv->sh;
>  	struct mlx5_mr *mr;
>
>  	DRV_LOG(DEBUG, "port %u rebuild dev cache[]", dev->data->port_id);
>  	/* Flush cache to rebuild. */
> -	priv->mr.cache.len = 1;
> -	priv->mr.cache.overflow = 0;
> +	sh->mr.cache.len = 1;
> +	sh->mr.cache.overflow = 0;
>  	/* Iterate all the existing MRs. */
> -	LIST_FOREACH(mr, &priv->mr.mr_list, mr)
> +	LIST_FOREACH(mr, &sh->mr.mr_list, mr)
>  		if (mr_insert_dev_cache(dev, mr) < 0)
>  			return;
>  }
> @@ -888,6 +890,7 @@ struct mr_update_mp_data {
> mlx5_mr_mem_event_free_cb(struct rte_eth_dev *dev, const void *addr, size_t len)
>  {
>  	struct mlx5_priv *priv = dev->data->dev_private;
> +	struct mlx5_ibv_shared *sh = priv->sh;
>  	const struct rte_memseg_list *msl;
>  	struct mlx5_mr *mr;
>  	int ms_n;
> @@ -901,7 +904,7 @@ struct mr_update_mp_data {
>  	assert((uintptr_t)addr == RTE_ALIGN((uintptr_t)addr, msl->page_sz));
>  	assert(len == RTE_ALIGN(len, msl->page_sz));
>  	ms_n = len / msl->page_sz;
> -	rte_rwlock_write_lock(&priv->mr.rwlock);
> +	rte_rwlock_write_lock(&sh->mr.rwlock);
>  	/* Clear bits of freed memsegs from MR. */
>  	for (i = 0; i < ms_n; ++i) {
>  		const struct rte_memseg *ms;
> @@ -928,7 +931,7 @@ struct mr_update_mp_data {
>  		rte_bitmap_clear(mr->ms_bmp, pos);
>  		if (--mr->ms_n == 0) {
>  			LIST_REMOVE(mr, mr);
> -			LIST_INSERT_HEAD(&priv->mr.mr_free_list, mr, mr);
> +			LIST_INSERT_HEAD(&sh->mr.mr_free_list, mr, mr);
>  			DEBUG("port %u remove MR(%p) from list",
>  			      dev->data->port_id, (void *)mr);
>  		}
> @@ -949,12 +952,12 @@ struct mr_update_mp_data {
>  		 * generation below) will be guaranteed to be seen by other core
>  		 * before the core sees the newly allocated memory.
>  		 */
> -		++priv->mr.dev_gen;
> +		++sh->mr.dev_gen;
>  		DEBUG("broadcasting local cache flush, gen=%d",
> -		      priv->mr.dev_gen);
> +		      sh->mr.dev_gen);
>  		rte_smp_wmb();
>  	}
> -	rte_rwlock_write_unlock(&priv->mr.rwlock);
> +	rte_rwlock_write_unlock(&sh->mr.rwlock);
>  }
>
>  /**
> @@ -1013,6 +1016,7 @@ struct mr_update_mp_data {
>  		   struct mlx5_mr_cache *entry, uintptr_t addr)
>  {
>  	struct mlx5_priv *priv = dev->data->dev_private;
> +	struct mlx5_ibv_shared *sh = priv->sh;
>  	struct mlx5_mr_btree *bt = &mr_ctrl->cache_bh;
>  	uint16_t idx;
>  	uint32_t lkey;
> @@ -1021,12 +1025,12 @@ struct mr_update_mp_data {
>  	if (unlikely(bt->len == bt->size))
>  		mr_btree_expand(bt, bt->size << 1);
>  	/* Look up in the global cache. */
> -	rte_rwlock_read_lock(&priv->mr.rwlock);
> -	lkey = mr_btree_lookup(&priv->mr.cache, &idx, addr);
> +	rte_rwlock_read_lock(&sh->mr.rwlock);
> +	lkey = mr_btree_lookup(&sh->mr.cache, &idx, addr);
>  	if (lkey != UINT32_MAX) {
>  		/* Found. */
> -		*entry = (*priv->mr.cache.table)[idx];
> -		rte_rwlock_read_unlock(&priv->mr.rwlock);
> +		*entry = (*sh->mr.cache.table)[idx];
> +		rte_rwlock_read_unlock(&sh->mr.rwlock);
>  		/*
>  		 * Update local cache. Even if it fails, return the found entry
>  		 * to update top-half cache. Next time, this entry will be found
> @@ -1035,7 +1039,7 @@ struct mr_update_mp_data {
>  		mr_btree_insert(bt, entry);
>  		return lkey;
>  	}
> -	rte_rwlock_read_unlock(&priv->mr.rwlock);
> +	rte_rwlock_read_unlock(&sh->mr.rwlock);
>  	/* First time to see the address? Create a new MR. */
>  	lkey = mlx5_mr_create(dev, entry, addr);
>  	/*
> @@ -1261,6 +1265,7 @@ struct mr_update_mp_data {
>  	struct mr_update_mp_data *data = opaque;
>  	struct rte_eth_dev *dev = data->dev;
>  	struct mlx5_priv *priv = dev->data->dev_private;
> +	struct mlx5_ibv_shared *sh = priv->sh;
>  	struct mlx5_mr_ctrl *mr_ctrl = data->mr_ctrl;
>  	struct mlx5_mr *mr = NULL;
>  	uintptr_t addr = (uintptr_t)memhdr->addr;
> @@ -1270,9 +1275,9 @@ struct mr_update_mp_data {
>
>  	assert(rte_eal_process_type() == RTE_PROC_PRIMARY);
>  	/* If already registered, it should return. */
> -	rte_rwlock_read_lock(&priv->mr.rwlock);
> +	rte_rwlock_read_lock(&sh->mr.rwlock);
>  	lkey = mr_lookup_dev(dev, &entry, addr);
> -	rte_rwlock_read_unlock(&priv->mr.rwlock);
> +	rte_rwlock_read_unlock(&sh->mr.rwlock);
>  	if (lkey != UINT32_MAX)
>  		return;
>  	DRV_LOG(DEBUG, "port %u register MR for chunk #%d of mempool (%s)",
> @@ -1286,11 +1291,11 @@ struct mr_update_mp_data {
>  		data->ret = -1;
>  		return;
>  	}
> -	rte_rwlock_write_lock(&priv->mr.rwlock);
> -	LIST_INSERT_HEAD(&priv->mr.mr_list, mr, mr);
> +	rte_rwlock_write_lock(&sh->mr.rwlock);
> +	LIST_INSERT_HEAD(&sh->mr.mr_list, mr, mr);
>  	/* Insert to the global cache table. */
>  	mr_insert_dev_cache(dev, mr);
> -	rte_rwlock_write_unlock(&priv->mr.rwlock);
> +	rte_rwlock_write_unlock(&sh->mr.rwlock);
>  	/* Insert to the local cache table */
>  	mlx5_mr_addr2mr_bh(dev, mr_ctrl, addr);
>  }
> @@ -1345,6 +1350,7 @@ struct mr_update_mp_data {
>  	struct rte_eth_dev *dev;
>  	struct mlx5_mr *mr;
>  	struct mlx5_priv *priv;
> +	struct mlx5_ibv_shared *sh;
>
>  	dev = pci_dev_to_eth_dev(pdev);
>  	if (!dev) {
> @@ -1361,11 +1367,12 @@ struct mr_update_mp_data {
>  		rte_errno = EINVAL;
>  		return -1;
>  	}
> -	rte_rwlock_write_lock(&priv->mr.rwlock);
> -	LIST_INSERT_HEAD(&priv->mr.mr_list, mr, mr);
> +	sh = priv->sh;
> +	rte_rwlock_write_lock(&sh->mr.rwlock);
> +	LIST_INSERT_HEAD(&sh->mr.mr_list, mr, mr);
>  	/* Insert to the global cache table. */
>  	mr_insert_dev_cache(dev, mr);
> -	rte_rwlock_write_unlock(&priv->mr.rwlock);
> +	rte_rwlock_write_unlock(&sh->mr.rwlock);
>  	return 0;
>  }
>
> @@ -1390,6 +1397,7 @@ struct mr_update_mp_data {
>  {
>  	struct rte_eth_dev *dev;
>  	struct mlx5_priv *priv;
> +	struct mlx5_ibv_shared *sh;
>  	struct mlx5_mr *mr;
>  	struct mlx5_mr_cache entry;
>
> @@ -1401,10 +1409,11 @@ struct mr_update_mp_data {
>  		return -1;
>  	}
>  	priv = dev->data->dev_private;
> -	rte_rwlock_read_lock(&priv->mr.rwlock);
> +	sh = priv->sh;
> +	rte_rwlock_read_lock(&sh->mr.rwlock);
>  	mr = mr_lookup_dev_list(dev, &entry, (uintptr_t)addr);
>  	if (!mr) {
> -		rte_rwlock_read_unlock(&priv->mr.rwlock);
> +		rte_rwlock_read_unlock(&sh->mr.rwlock);
>  		DRV_LOG(WARNING, "address 0x%" PRIxPTR " wasn't registered "
>  				 "to PCI device %p", (uintptr_t)addr,
>  				 (void *)pdev);
> @@ -1412,7 +1421,7 @@ struct mr_update_mp_data {
>  		return -1;
>  	}
>  	LIST_REMOVE(mr, mr);
> -	LIST_INSERT_HEAD(&priv->mr.mr_free_list, mr, mr);
> +	LIST_INSERT_HEAD(&sh->mr.mr_free_list, mr, mr);
>  	DEBUG("port %u remove MR(%p) from list", dev->data->port_id,
>  	      (void *)mr);
>  	mr_rebuild_dev_cache(dev);
> @@ -1425,11 +1434,11 @@ struct mr_update_mp_data {
>  	 * generation below) will be guaranteed to be seen by other core
>  	 * before the core sees the newly allocated memory.
>  	 */
> -	++priv->mr.dev_gen;
> +	++sh->mr.dev_gen;
>  	DEBUG("broadcasting local cache flush, gen=%d",
> -			priv->mr.dev_gen);
> +			sh->mr.dev_gen);

Looks like the indentation was wrong even before your change.
Please correct it.

Thanks,
Yongseok

>  	rte_smp_wmb();
> -	rte_rwlock_read_unlock(&priv->mr.rwlock);
> +	rte_rwlock_read_unlock(&sh->mr.rwlock);
>  	return 0;
>  }
>
> @@ -1550,25 +1559,24 @@ struct mr_update_mp_data {
>  /**
>   * Dump all the created MRs and the global cache entries.
>   *
> - * @param dev
> - *   Pointer to Ethernet device.
> + * @param sh
> + *   Pointer to Ethernet device shared context.
>   */
>  void
> -mlx5_mr_dump_dev(struct rte_eth_dev *dev __rte_unused)
> +mlx5_mr_dump_dev(struct mlx5_ibv_shared *sh __rte_unused)
>  {
>  #ifndef NDEBUG
> -	struct mlx5_priv *priv = dev->data->dev_private;
>  	struct mlx5_mr *mr;
>  	int mr_n = 0;
>  	int chunk_n = 0;
>
> -	rte_rwlock_read_lock(&priv->mr.rwlock);
> +	rte_rwlock_read_lock(&sh->mr.rwlock);
>  	/* Iterate all the existing MRs. */
> -	LIST_FOREACH(mr, &priv->mr.mr_list, mr) {
> +	LIST_FOREACH(mr, &sh->mr.mr_list, mr) {
>  		unsigned int n;
>
> -		DEBUG("port %u MR[%u], LKey = 0x%x, ms_n = %u, ms_bmp_n = %u",
> -		      dev->data->port_id, mr_n++,
> +		DEBUG("device %s MR[%u], LKey = 0x%x, ms_n = %u, ms_bmp_n = %u",
> +		      sh->ibdev_name, mr_n++,
>  		      rte_cpu_to_be_32(mr->ibv_mr->lkey),
>  		      mr->ms_n, mr->ms_bmp_n);
>  		if (mr->ms_n == 0)
> @@ -1583,45 +1591,40 @@ struct mr_update_mp_data {
>  			      chunk_n++, ret.start, ret.end);
>  		}
>  	}
> -	DEBUG("port %u dumping global cache", dev->data->port_id);
> -	mlx5_mr_btree_dump(&priv->mr.cache);
> -	rte_rwlock_read_unlock(&priv->mr.rwlock);
> +	DEBUG("device %s dumping global cache", sh->ibdev_name);
> +	mlx5_mr_btree_dump(&sh->mr.cache);
> +	rte_rwlock_read_unlock(&sh->mr.rwlock);
>  #endif
>  }
>
>  /**
> - * Release all the created MRs and resources. Remove device from memory callback
> + * Release all the created MRs and resources for shared device context.
>   * list.
>   *
> - * @param dev
> - *   Pointer to Ethernet device.
> + * @param sh
> + *   Pointer to Ethernet device shared context.
>   */
>  void
> -mlx5_mr_release(struct rte_eth_dev *dev)
> +mlx5_mr_release(struct mlx5_ibv_shared *sh)
>  {
> -	struct mlx5_priv *priv = dev->data->dev_private;
>  	struct mlx5_mr *mr_next;
>
> -	/* Remove from memory callback device list. */
> -	rte_rwlock_write_lock(&mlx5_shared_data->mem_event_rwlock);
> -	LIST_REMOVE(priv, mem_event_cb);
> -	rte_rwlock_write_unlock(&mlx5_shared_data->mem_event_rwlock);
>  	if (rte_log_get_level(mlx5_logtype) == RTE_LOG_DEBUG)
> -		mlx5_mr_dump_dev(dev);
> -	rte_rwlock_write_lock(&priv->mr.rwlock);
> +		mlx5_mr_dump_dev(sh);
> +	rte_rwlock_write_lock(&sh->mr.rwlock);
>  	/* Detach from MR list and move to free list. */
> -	mr_next = LIST_FIRST(&priv->mr.mr_list);
> +	mr_next = LIST_FIRST(&sh->mr.mr_list);
>  	while (mr_next != NULL) {
>  		struct mlx5_mr *mr = mr_next;
>
>  		mr_next = LIST_NEXT(mr, mr);
>  		LIST_REMOVE(mr, mr);
> -		LIST_INSERT_HEAD(&priv->mr.mr_free_list, mr, mr);
> +		LIST_INSERT_HEAD(&sh->mr.mr_free_list, mr, mr);
>  	}
> -	LIST_INIT(&priv->mr.mr_list);
> +	LIST_INIT(&sh->mr.mr_list);
>  	/* Free global cache. */
> -	mlx5_mr_btree_free(&priv->mr.cache);
> -	rte_rwlock_write_unlock(&priv->mr.rwlock);
> +	mlx5_mr_btree_free(&sh->mr.cache);
> +	rte_rwlock_write_unlock(&sh->mr.rwlock);
>  	/* Free all remaining MRs. */
> -	mlx5_mr_garbage_collect(dev);
> +	mlx5_mr_garbage_collect(sh);
>  }
> diff --git a/drivers/net/mlx5/mlx5_mr.h b/drivers/net/mlx5/mlx5_mr.h
> index 786f6a3..89e89b7 100644
> --- a/drivers/net/mlx5/mlx5_mr.h
> +++ b/drivers/net/mlx5/mlx5_mr.h
> @@ -62,6 +62,7 @@ struct mlx5_mr_ctrl {
>  	struct mlx5_mr_btree cache_bh; /* Cache for bottom-half. */
>  } __rte_packed;
>
> +struct mlx5_ibv_shared;
>  extern struct mlx5_dev_list  mlx5_mem_event_cb_list;
>  extern rte_rwlock_t mlx5_mem_event_rwlock;
>
> @@ -76,11 +77,11 @@ void mlx5_mr_mem_event_cb(enum rte_mem_event event_type, const void *addr,
>  			  size_t len, void *arg);
> int mlx5_mr_update_mp(struct rte_eth_dev *dev, struct mlx5_mr_ctrl *mr_ctrl,
>  		      struct rte_mempool *mp);
> -void mlx5_mr_release(struct rte_eth_dev *dev);
> +void mlx5_mr_release(struct mlx5_ibv_shared *sh);
>
>  /* Debug purpose functions. */
>  void mlx5_mr_btree_dump(struct mlx5_mr_btree *bt);
> -void mlx5_mr_dump_dev(struct rte_eth_dev *dev);
> +void mlx5_mr_dump_dev(struct mlx5_ibv_shared *sh);
>
>  /**
>   * Look up LKey from given lookup table by linear search. Firstly look up the
> diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
> index 9965b2b..ed82594 100644
> --- a/drivers/net/mlx5/mlx5_txq.c
> +++ b/drivers/net/mlx5/mlx5_txq.c
> @@ -814,7 +814,7 @@ struct mlx5_txq_ctrl *
>  		goto error;
>  	}
>  	/* Save pointer of global generation number to check memory event. */
> -	tmpl->txq.mr_ctrl.dev_gen_ptr =3D &priv->mr.dev_gen;
> +	tmpl->txq.mr_ctrl.dev_gen_ptr =3D &priv->sh->mr.dev_gen;
>  	assert(desc > MLX5_TX_COMP_THRESH);
>  	tmpl->txq.offloads =3D conf->offloads |
>  			     dev->data->dev_conf.txmode.offloads;
> --=20
> 1.8.3.1
>=20

From: Yongseok Koh <yskoh@mellanox.com>
To: Slava Ovsiienko <viacheslavo@mellanox.com>
CC: "dev@dpdk.org" <dev@dpdk.org>, Shahaf Shuler <shahafs@mellanox.com>
Thread-Topic: [dpdk-dev] [PATCH 1/1] net/mlx5: share Memory Regions for
 multiport device
Thread-Index: AQHU8WURRZsxq7tda0WzT4lPBI9p0A==
Date: Fri, 12 Apr 2019 19:22:28 +0000
Message-ID: <20190412192218.GA14250@mtidpdk.mti.labs.mlnx>
References: <1555083940-24539-1-git-send-email-viacheslavo@mellanox.com>
In-Reply-To: <1555083940-24539-1-git-send-email-viacheslavo@mellanox.com>
Subject: Re: [dpdk-dev] [PATCH 1/1] net/mlx5: share Memory Regions for
 multiport device

On Fri, Apr 12, 2019 at 03:45:40PM +0000, Viacheslav Ovsiienko wrote:
> The multiport Infiniband device support was introduced in [1].
> All active ports belonging to the same Infiniband device use the single
> shared Infiniband context of that device and share its resources:
>   - QPs are created within the shared context
>   - Verbs flows are also created with the port index specified
>   - DV/DR resources
>   - Protection Domain
>   - Event Handlers
>
> This patch adds support for sharing Memory Regions between the
> ports created on the base of a multiport Infiniband device.
> The datapath of mlx5 uses a layered cache subsystem for
> allocating/releasing Memory Regions; only the lowest layer, L3,
> is shared, due to performance considerations.
>
> [1] https://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2Fpatches.dpdk.org%2Fcover%2F51800%2F&amp;data=02%7C01%7Cyskoh%40mellanox.com%7Cf16a8f547c3b4b3a1f7608d6bf5df4cd%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C636906807620990767&amp;sdata=uinIUW9euBYb74BhK9f8INkhaTM6jwyCXDhmGtmEIME%3D&amp;reserved=0
>
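For readers following the thread, the layered lookup the commit message refers to can be sketched roughly as below. The types, names, and sizes are hypothetical simplifications, not the actual mlx5 structures — in particular, the real bottom-half/global caches are b-trees, not linear tables.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical sketch: small per-queue MR caches (top layers) backed by a
 * device-global cache, the "L3" layer this patch moves into the shared IB
 * context so all ports of a multiport device use one copy. */

#define LOCAL_N  2  /* tiny per-queue cache */
#define GLOBAL_N 8  /* device-global (shared L3) cache */

struct mr_entry { uintptr_t start, end; uint32_t lkey; };

struct global_cache { struct mr_entry tbl[GLOBAL_N]; int len; };
struct queue_cache  { struct mr_entry tbl[LOCAL_N]; int len; struct global_cache *l3; };

static uint32_t cache_search(const struct mr_entry *tbl, int len, uintptr_t addr)
{
	for (int i = 0; i < len; i++)
		if (addr >= tbl[i].start && addr < tbl[i].end)
			return tbl[i].lkey;
	return UINT32_MAX; /* miss */
}

/* Datapath lookup: local cache first; on miss, consult the shared L3 and
 * promote the found entry into the local cache for next time. */
static uint32_t mr_addr2lkey(struct queue_cache *qc, uintptr_t addr)
{
	uint32_t lkey = cache_search(qc->tbl, qc->len, addr);

	if (lkey != UINT32_MAX)
		return lkey;
	for (int i = 0; i < qc->l3->len; i++) {
		const struct mr_entry *e = &qc->l3->tbl[i];

		if (addr >= e->start && addr < e->end) {
			if (qc->len < LOCAL_N)
				qc->tbl[qc->len++] = *e; /* promote */
			return e->lkey;
		}
	}
	return UINT32_MAX;
}

/* Two queues sharing one L3: both resolve the same registered region. */
static uint32_t demo_lookup(void)
{
	struct global_cache l3 = { .tbl = { { 0x1000, 0x2000, 0xBEEF } }, .len = 1 };
	struct queue_cache rxq = { .l3 = &l3 }, txq = { .l3 = &l3 };

	(void)mr_addr2lkey(&rxq, 0x1800); /* local miss, L3 hit, promoted */
	return mr_addr2lkey(&txq, 0x1800); /* other queue, same shared L3 */
}
```

With the L3 layer shared, a registration made through one port of the device can be found by the datapath of any other port.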
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> ---
>  drivers/net/mlx5/mlx5.c     |  40 +++++++----
>  drivers/net/mlx5/mlx5.h     |  15 ++--
>  drivers/net/mlx5/mlx5_mr.c  | 163 ++++++++++++++++++++++----------------------
>  drivers/net/mlx5/mlx5_mr.h  |   5 +-
>  drivers/net/mlx5/mlx5_txq.c |   2 +-
>  5 files changed, 121 insertions(+), 104 deletions(-)
>
> diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
> index 9ff50df..7ff5c8b 100644
> --- a/drivers/net/mlx5/mlx5.c
> +++ b/drivers/net/mlx5/mlx5.c
> @@ -144,6 +144,7 @@ struct mlx5_dev_spawn_data {
>  	struct mlx5_switch_info info; /**< Switch information. */
>  	struct ibv_device *ibv_dev; /**< Associated IB device. */
>  	struct rte_eth_dev *eth_dev; /**< Associated Ethernet device. */
> +	struct rte_pci_device *pci_dev; /**< Backend PCI device. */
>  };
>
>  static LIST_HEAD(, mlx5_ibv_shared) mlx5_ibv_list = LIST_HEAD_INITIALIZER();
> @@ -222,6 +223,7 @@ struct mlx5_dev_spawn_data {
>  		sizeof(sh->ibdev_name));
>  	strncpy(sh->ibdev_path, sh->ctx->device->ibdev_path,
>  		sizeof(sh->ibdev_path));
> +	sh->pci_dev = spawn->pci_dev;
>  	pthread_mutex_init(&sh->intr_mutex, NULL);
>  	/*
>  	 * Setting port_id to max unallowed value means
> @@ -236,6 +238,22 @@ struct mlx5_dev_spawn_data {
>  		err = ENOMEM;
>  		goto error;
>  	}
> +	/*
> +	 * Once the device is added to the list of memory event
> +	 * callback, its global MR cache table cannot be expanded
> +	 * on the fly because of deadlock. If it overflows, lookup
> +	 * should be done by searching MR list linearly, which is slow.
> +	 *
> +	 * At this point the device is not added to the memory
> +	 * event list yet, context is just being created.
> +	 */
> +	err = mlx5_mr_btree_init(&sh->mr.cache,
> +				 MLX5_MR_BTREE_CACHE_N * 2,
> +				 sh->pci_dev->device.numa_node);
> +	if (err) {
> +		err = rte_errno;
> +		goto error;
> +	}
>  	LIST_INSERT_HEAD(&mlx5_ibv_list, sh, next);
>  exit:
>  	pthread_mutex_unlock(&mlx5_ibv_list_mutex);
> @@ -283,6 +301,8 @@ struct mlx5_dev_spawn_data {
>  	assert(rte_eal_process_type() == RTE_PROC_PRIMARY);
>  	if (--sh->refcnt)
>  		goto exit;
> +	/* Release created Memory Regions. */
> +	mlx5_mr_release(sh);
>  	LIST_REMOVE(sh, next);
>  	/*
>  	 *  Ensure there is no async event handler installed.
> @@ -615,7 +635,10 @@ struct mlx5_dev_spawn_data {
>  	}
>  	mlx5_proc_priv_uninit(dev);
>  	mlx5_mprq_free_mp(dev);
> -	mlx5_mr_release(dev);
> +	/* Remove from memory callback device list. */
> +	rte_rwlock_write_lock(&mlx5_shared_data->mem_event_rwlock);
> +	LIST_REMOVE(priv, mem_event_cb);
> +	rte_rwlock_write_unlock(&mlx5_shared_data->mem_event_rwlock);

I'm almost okay with this patch but I'm concerned about this part. As devices
sharing the same context are added to the callback list,
mlx5_mr_mem_event_free_cb() will be called multiple times with the same
addr/len. This is redundant and unnecessary. How about adding the shared
ctx to the list instead?

>  	assert(priv->sh);
>  	mlx5_free_shared_dr(priv);
>  	if (priv->rss_conf.rss_key != NULL)
> @@ -1493,19 +1516,6 @@ struct mlx5_dev_spawn_data {
>  		goto error;
>  	}
>  	priv->config.flow_prio = err;
> -	/*
> -	 * Once the device is added to the list of memory event
> -	 * callback, its global MR cache table cannot be expanded
> -	 * on the fly because of deadlock. If it overflows, lookup
> -	 * should be done by searching MR list linearly, which is slow.
> -	 */
> -	err = mlx5_mr_btree_init(&priv->mr.cache,
> -				 MLX5_MR_BTREE_CACHE_N * 2,
> -				 eth_dev->device->numa_node);
> -	if (err) {
> -		err = rte_errno;
> -		goto error;
> -	}
>  	/* Add device to memory callback list. */
>  	rte_rwlock_write_lock(&mlx5_shared_data->mem_event_rwlock);
>  	LIST_INSERT_HEAD(&mlx5_shared_data->mem_event_cb_list,
> @@ -1702,6 +1712,7 @@ struct mlx5_dev_spawn_data {
>  			list[ns].ibv_port = i;
>  			list[ns].ibv_dev = ibv_match[0];
>  			list[ns].eth_dev = NULL;
> +			list[ns].pci_dev = pci_dev;
>  			list[ns].ifindex = mlx5_nl_ifindex
>  					(nl_rdma, list[ns].ibv_dev->name, i);
>  			if (!list[ns].ifindex) {
> @@ -1768,6 +1779,7 @@ struct mlx5_dev_spawn_data {
>  			list[ns].ibv_port = 1;
>  			list[ns].ibv_dev = ibv_match[i];
>  			list[ns].eth_dev = NULL;
> +			list[ns].pci_dev = pci_dev;
>  			list[ns].ifindex = 0;
>  			if (nl_rdma >= 0)
>  				list[ns].ifindex = mlx5_nl_ifindex
> diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
> index 14c7f3c..8eb1019 100644
> --- a/drivers/net/mlx5/mlx5.h
> +++ b/drivers/net/mlx5/mlx5.h
> @@ -257,6 +257,14 @@ struct mlx5_ibv_shared {
>  	char ibdev_name[IBV_SYSFS_NAME_MAX]; /* IB device name. */
>  	char ibdev_path[IBV_SYSFS_PATH_MAX]; /* IB device path for secondary */
>  	struct ibv_device_attr_ex device_attr; /* Device properties. */
> +	struct rte_pci_device *pci_dev; /* Backend PCI device. */
> +	struct { /* Global device shared L3 MR cache. */

The comment sounds redundant, and this structure holds not only the global
cache but also the MR lists.

> +		uint32_t dev_gen; /* Generation number to flush local caches. */
> +		rte_rwlock_t rwlock; /* MR Lock. */
> +		struct mlx5_mr_btree cache; /* Global MR cache table. */
> +		struct mlx5_mr_list mr_list; /* Registered MR list. */
> +		struct mlx5_mr_list mr_free_list; /* Freed MR list. */
> +	} mr;
>  	/* Shared DV/DR flow data section. */
>  	pthread_mutex_t dv_mutex; /* DV context mutex. */
>  	uint32_t dv_refcnt; /* DV/DR data reference counter. */
> @@ -323,13 +331,6 @@ struct mlx5_priv {
>  	struct mlx5_flows ctrl_flows; /* Control flow rules. */
>  	LIST_HEAD(counters, mlx5_flow_counter) flow_counters;
>  	/* Flow counters. */
> -	struct {
> -		uint32_t dev_gen; /* Generation number to flush local caches. */
> -		rte_rwlock_t rwlock; /* MR Lock. */
> -		struct mlx5_mr_btree cache; /* Global MR cache table. */
> -		struct mlx5_mr_list mr_list; /* Registered MR list. */
> -		struct mlx5_mr_list mr_free_list; /* Freed MR list. */
> -	} mr;
>  	LIST_HEAD(rxq, mlx5_rxq_ctrl) rxqsctrl; /* DPDK Rx queues. */
>  	LIST_HEAD(rxqibv, mlx5_rxq_ibv) rxqsibv; /* Verbs Rx queues. */
>  	LIST_HEAD(hrxq, mlx5_hrxq) hrxqs; /* Verbs Hash Rx queues. */
> diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
> index a3732d4..5140cc2 100644
> --- a/drivers/net/mlx5/mlx5_mr.c
> +++ b/drivers/net/mlx5/mlx5_mr.c
> @@ -36,7 +36,7 @@ struct mr_update_mp_data {
>
>  /**
>   * Expand B-tree table to a given size. Can't be called with holding
> - * memory_hotplug_lock or priv->mr.rwlock due to rte_realloc().
> + * memory_hotplug_lock or sh->mr.rwlock due to rte_realloc().
>   *
>   * @param bt
>   *   Pointer to B-tree structure.
> @@ -350,7 +350,7 @@ struct mr_update_mp_data {
>  		n = mr_find_next_chunk(mr, &entry, n);
>  		if (!entry.end)
>  			break;
> -		if (mr_btree_insert(&priv->mr.cache, &entry) < 0) {
> +		if (mr_btree_insert(&priv->sh->mr.cache, &entry) < 0) {
>  			/*
>  			 * Overflowed, but the global table cannot be expanded
>  			 * because of deadlock.
> @@ -382,7 +382,7 @@ struct mr_update_mp_data {
>  	struct mlx5_mr *mr;
>
>  	/* Iterate all the existing MRs. */
> -	LIST_FOREACH(mr, &priv->mr.mr_list, mr) {
> +	LIST_FOREACH(mr, &priv->sh->mr.mr_list, mr) {
>  		unsigned int n;
>
>  		if (mr->ms_n == 0)
> @@ -420,6 +420,7 @@ struct mr_update_mp_data {
>  	      uintptr_t addr)
>  {
>  	struct mlx5_priv *priv = dev->data->dev_private;
> +	struct mlx5_ibv_shared *sh = priv->sh;
>  	uint16_t idx;
>  	uint32_t lkey = UINT32_MAX;
>  	struct mlx5_mr *mr;
> @@ -430,10 +431,10 @@ struct mr_update_mp_data {
>  	 * has to be searched by traversing the original MR list instead, which
>  	 * is very slow path. Otherwise, the global cache is all inclusive.
>  	 */
> -	if (!unlikely(priv->mr.cache.overflow)) {
> -		lkey = mr_btree_lookup(&priv->mr.cache, &idx, addr);
> +	if (!unlikely(sh->mr.cache.overflow)) {
> +		lkey = mr_btree_lookup(&sh->mr.cache, &idx, addr);
>  		if (lkey != UINT32_MAX)
> -			*entry = (*priv->mr.cache.table)[idx];
> +			*entry = (*sh->mr.cache.table)[idx];
>  	} else {
>  		/* Falling back to the slowest path. */
>  		mr = mr_lookup_dev_list(dev, entry, addr);
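As a side note for readers, the overflow fallback in the hunk above can be sketched like this (hypothetical simplified types; the real global cache is a b-tree and the MR list is a linked LIST):

```c
#include <stdint.h>

/* Sketch: the global cache table has a fixed capacity because it cannot
 * be reallocated while the device sits on the memory-event callback list
 * (deadlock risk). Once it overflows, lookups fall back to a linear walk
 * of the authoritative registered-MR list. */

#define CACHE_N 2
#define MR_MAX  8

struct mr { uintptr_t start, end; uint32_t lkey; };

struct dev_cache {
	struct mr cache[CACHE_N]; int cache_len; int overflow;
	struct mr mrs[MR_MAX];    int mr_n; /* authoritative MR list */
};

static void mr_register(struct dev_cache *d, struct mr m)
{
	d->mrs[d->mr_n++] = m;
	if (d->cache_len < CACHE_N)
		d->cache[d->cache_len++] = m;
	else
		d->overflow = 1; /* can't expand the table under the lock */
}

static uint32_t mr_lookup(const struct dev_cache *d, uintptr_t addr)
{
	/* Fast path: cache table; slow path on overflow: full MR list. */
	const struct mr *tbl = d->overflow ? d->mrs : d->cache;
	int n = d->overflow ? d->mr_n : d->cache_len;

	for (int i = 0; i < n; i++)
		if (addr >= tbl[i].start && addr < tbl[i].end)
			return tbl[i].lkey;
	return UINT32_MAX;
}

static uint32_t demo_overflow_lookup(void)
{
	struct dev_cache d = { 0 };

	mr_register(&d, (struct mr){ 0x1000, 0x2000, 0x11 });
	mr_register(&d, (struct mr){ 0x2000, 0x3000, 0x22 });
	mr_register(&d, (struct mr){ 0x3000, 0x4000, 0x33 }); /* overflows */
	return mr_lookup(&d, 0x3800); /* only findable via the MR list */
}
```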
> @@ -468,13 +469,12 @@ struct mr_update_mp_data {
>  /**
>   * Release resources of detached MR having no online entry.
>   *
> - * @param dev
> - *   Pointer to Ethernet device.
> + * @param sh
> + *   Pointer to Ethernet device shared context.
>   */
>  static void
> -mlx5_mr_garbage_collect(struct rte_eth_dev *dev)
> +mlx5_mr_garbage_collect(struct mlx5_ibv_shared *sh)
>  {
> -	struct mlx5_priv *priv = dev->data->dev_private;
>  	struct mlx5_mr *mr_next;
>  	struct mlx5_mr_list free_list = LIST_HEAD_INITIALIZER(free_list);
>
> @@ -484,11 +484,11 @@ struct mr_update_mp_data {
>  	 * MR can't be freed with holding the lock because rte_free() could call
>  	 * memory free callback function. This will be a deadlock situation.
>  	 * memory free callback function. This will be a deadlock situation.
>  	 */
> -	rte_rwlock_write_lock(&priv->mr.rwlock);
> +	rte_rwlock_write_lock(&sh->mr.rwlock);
>  	/* Detach the whole free list and release it after unlocking. */
> -	free_list = priv->mr.mr_free_list;
> -	LIST_INIT(&priv->mr.mr_free_list);
> -	rte_rwlock_write_unlock(&priv->mr.rwlock);
> +	free_list = sh->mr.mr_free_list;
> +	LIST_INIT(&sh->mr.mr_free_list);
> +	rte_rwlock_write_unlock(&sh->mr.rwlock);
>  	/* Release resources. */
>  	mr_next = LIST_FIRST(&free_list);
>  	while (mr_next != NULL) {
> @@ -548,12 +548,12 @@ struct mr_update_mp_data {
>  		      dev->data->port_id, (void *)addr);
>  		return UINT32_MAX;
>  	}
> -	rte_rwlock_read_lock(&priv->mr.rwlock);
> +	rte_rwlock_read_lock(&priv->sh->mr.rwlock);
>  	/* Fill in output data. */
>  	mr_lookup_dev(dev, entry, addr);
>  	/* Lookup can't fail. */
>  	assert(entry->lkey != UINT32_MAX);
> -	rte_rwlock_read_unlock(&priv->mr.rwlock);
> +	rte_rwlock_read_unlock(&priv->sh->mr.rwlock);
>  	DEBUG("port %u MR CREATED by primary process for %p:\n"
>  	      "  [0x%" PRIxPTR ", 0x%" PRIxPTR "), lkey=0x%x",
>  	      dev->data->port_id, (void *)addr,
> @@ -582,6 +582,7 @@ struct mr_update_mp_data {
>  		       uintptr_t addr)
>  {
>  	struct mlx5_priv *priv = dev->data->dev_private;
> +	struct mlx5_ibv_shared *sh = priv->sh;
>  	struct mlx5_dev_config *config = &priv->config;
>  	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
>  	const struct rte_memseg_list *msl;
> @@ -602,12 +603,12 @@ struct mr_update_mp_data {
>  		dev->data->port_id, (void *)addr);
>  	/*
>  	 * Release detached MRs if any. This can't be called with holding either
> -	 * memory_hotplug_lock or priv->mr.rwlock. MRs on the free list have
> +	 * memory_hotplug_lock or sh->mr.rwlock. MRs on the free list have
>  	 * been detached by the memory free event but it couldn't be released
>  	 * inside the callback due to deadlock. As a result, releasing resources
>  	 * is quite opportunistic.
>  	 */
> -	mlx5_mr_garbage_collect(dev);
> +	mlx5_mr_garbage_collect(sh);
>  	/*
>  	 * If enabled, find out a contiguous virtual address chunk in use, to
>  	 * which the given address belongs, in order to register maximum range.
> @@ -710,7 +711,7 @@ struct mr_update_mp_data {
>  		goto alloc_resources;
>  	}
>  	assert(data.msl == data_re.msl);
> -	rte_rwlock_write_lock(&priv->mr.rwlock);
> +	rte_rwlock_write_lock(&sh->mr.rwlock);
>  	/*
>  	 * Check the address is really missing. If other thread already created
>  	 * one or it is not found due to overflow, abort and return.
> @@ -721,10 +722,10 @@ struct mr_update_mp_data {
>  		 * low-on-memory. Then, this entry will have to be searched
>  		 * here again.
>  		 */
> -		mr_btree_insert(&priv->mr.cache, entry);
> +		mr_btree_insert(&sh->mr.cache, entry);
>  		DEBUG("port %u found MR for %p on final lookup, abort",
>  		      dev->data->port_id, (void *)addr);
> -		rte_rwlock_write_unlock(&priv->mr.rwlock);
> +		rte_rwlock_write_unlock(&sh->mr.rwlock);
>  		rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
>  		/*
>  		 * Must be unlocked before calling rte_free() because
> @@ -769,7 +770,7 @@ struct mr_update_mp_data {
>  	 * mlx5_alloc_buf_extern() which eventually calls rte_malloc_socket()
>  	 * through mlx5_alloc_verbs_buf().
>  	 */
> -	mr->ibv_mr = mlx5_glue->reg_mr(priv->sh->pd, (void *)data.start, len,
> +	mr->ibv_mr = mlx5_glue->reg_mr(sh->pd, (void *)data.start, len,
>  				       IBV_ACCESS_LOCAL_WRITE);
>  	if (mr->ibv_mr == NULL) {
>  		DEBUG("port %u fail to create a verbs MR for address (%p)",
> @@ -779,7 +780,7 @@ struct mr_update_mp_data {
>  	}
>  	assert((uintptr_t)mr->ibv_mr->addr == data.start);
>  	assert(mr->ibv_mr->length == len);
> -	LIST_INSERT_HEAD(&priv->mr.mr_list, mr, mr);
> +	LIST_INSERT_HEAD(&sh->mr.mr_list, mr, mr);
>  	DEBUG("port %u MR CREATED (%p) for %p:\n"
>  	      "  [0x%" PRIxPTR ", 0x%" PRIxPTR "),"
>  	      " lkey=0x%x base_idx=%u ms_n=%u, ms_bmp_n=%u",
> @@ -792,11 +793,11 @@ struct mr_update_mp_data {
>  	mr_lookup_dev(dev, entry, addr);
>  	/* Lookup can't fail. */
>  	assert(entry->lkey != UINT32_MAX);
> -	rte_rwlock_write_unlock(&priv->mr.rwlock);
> +	rte_rwlock_write_unlock(&sh->mr.rwlock);
>  	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
>  	return entry->lkey;
>  err_mrlock:
> -	rte_rwlock_write_unlock(&priv->mr.rwlock);
> +	rte_rwlock_write_unlock(&sh->mr.rwlock);
>  err_memlock:
>  	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
>  err_nolock:
> @@ -854,14 +855,15 @@ struct mr_update_mp_data {
>  mr_rebuild_dev_cache(struct rte_eth_dev *dev)
>  {
>  	struct mlx5_priv *priv = dev->data->dev_private;
> +	struct mlx5_ibv_shared *sh = priv->sh;
>  	struct mlx5_mr *mr;
>
>  	DRV_LOG(DEBUG, "port %u rebuild dev cache[]", dev->data->port_id);
>  	/* Flush cache to rebuild. */
> -	priv->mr.cache.len = 1;
> -	priv->mr.cache.overflow = 0;
> +	sh->mr.cache.len = 1;
> +	sh->mr.cache.overflow = 0;
>  	/* Iterate all the existing MRs. */
> -	LIST_FOREACH(mr, &priv->mr.mr_list, mr)
> +	LIST_FOREACH(mr, &sh->mr.mr_list, mr)
>  		if (mr_insert_dev_cache(dev, mr) < 0)
>  			return;
>  }
> @@ -888,6 +890,7 @@ struct mr_update_mp_data {
>  mlx5_mr_mem_event_free_cb(struct rte_eth_dev *dev, const void *addr, size_t len)
>  {
>  	struct mlx5_priv *priv = dev->data->dev_private;
> +	struct mlx5_ibv_shared *sh = priv->sh;
>  	const struct rte_memseg_list *msl;
>  	struct mlx5_mr *mr;
>  	int ms_n;
> @@ -901,7 +904,7 @@ struct mr_update_mp_data {
>  	assert((uintptr_t)addr == RTE_ALIGN((uintptr_t)addr, msl->page_sz));
>  	assert(len == RTE_ALIGN(len, msl->page_sz));
>  	ms_n = len / msl->page_sz;
> -	rte_rwlock_write_lock(&priv->mr.rwlock);
> +	rte_rwlock_write_lock(&sh->mr.rwlock);
>  	/* Clear bits of freed memsegs from MR. */
>  	for (i = 0; i < ms_n; ++i) {
>  		const struct rte_memseg *ms;
> @@ -928,7 +931,7 @@ struct mr_update_mp_data {
>  		rte_bitmap_clear(mr->ms_bmp, pos);
>  		if (--mr->ms_n == 0) {
>  			LIST_REMOVE(mr, mr);
> -			LIST_INSERT_HEAD(&priv->mr.mr_free_list, mr, mr);
> +			LIST_INSERT_HEAD(&sh->mr.mr_free_list, mr, mr);
>  			DEBUG("port %u remove MR(%p) from list",
>  			      dev->data->port_id, (void *)mr);
>  		}
> @@ -949,12 +952,12 @@ struct mr_update_mp_data {
>  		 * generation below) will be guaranteed to be seen by other core
>  		 * before the core sees the newly allocated memory.
>  		 */
> -		++priv->mr.dev_gen;
> +		++sh->mr.dev_gen;
>  		DEBUG("broadcasting local cache flush, gen=%d",
> -		      priv->mr.dev_gen);
> +		      sh->mr.dev_gen);
>  		rte_smp_wmb();
>  	}
> -	rte_rwlock_write_unlock(&priv->mr.rwlock);
> +	rte_rwlock_write_unlock(&sh->mr.rwlock);
>  }
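The generation-number protocol in the hunk above can be sketched as follows (hypothetical simplified types; the real code pairs the increment with rte_smp_wmb(), and each queue checks the generation in the datapath):

```c
#include <stdint.h>

/* Sketch: after the global cache changes, the control path bumps dev_gen;
 * each datapath queue keeps its own cur_gen and flushes its local cache
 * when the two differ. */

struct dev_mr { uint32_t dev_gen; };

struct queue_mr_ctrl {
	const uint32_t *dev_gen_ptr; /* points at the shared dev_gen */
	uint32_t cur_gen;
	int cached_entries;
};

/* Control path: broadcast a local-cache flush after freeing an MR. */
static void mr_flush_broadcast(struct dev_mr *d)
{
	d->dev_gen++;
	/* real code: rte_smp_wmb() so the new generation is visible before
	 * the freed memory can be reused */
}

/* Datapath: cheap per-lookup check, flush only on generation mismatch. */
static int mr_ctrl_sync(struct queue_mr_ctrl *c)
{
	if (c->cur_gen != *c->dev_gen_ptr) {
		c->cached_entries = 0;        /* drop stale local cache */
		c->cur_gen = *c->dev_gen_ptr;
		return 1;                     /* flushed */
	}
	return 0;
}

static int demo_gen_flush(void)
{
	struct dev_mr dev = { 0 };
	struct queue_mr_ctrl txq = { &dev.dev_gen, 0, 5 };

	mr_flush_broadcast(&dev);  /* an MR was freed somewhere */
	return mr_ctrl_sync(&txq); /* the queue notices and flushes */
}
```

With the generation counter in the shared context, one bump invalidates the local caches of every queue on every port of the device.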
>
>  /**
> @@ -1013,6 +1016,7 @@ struct mr_update_mp_data {
>  		   struct mlx5_mr_cache *entry, uintptr_t addr)
>  {
>  	struct mlx5_priv *priv = dev->data->dev_private;
> +	struct mlx5_ibv_shared *sh = priv->sh;
>  	struct mlx5_mr_btree *bt = &mr_ctrl->cache_bh;
>  	uint16_t idx;
>  	uint32_t lkey;
> @@ -1021,12 +1025,12 @@ struct mr_update_mp_data {
>  	if (unlikely(bt->len == bt->size))
>  		mr_btree_expand(bt, bt->size << 1);
>  	/* Look up in the global cache. */
> -	rte_rwlock_read_lock(&priv->mr.rwlock);
> -	lkey = mr_btree_lookup(&priv->mr.cache, &idx, addr);
> +	rte_rwlock_read_lock(&sh->mr.rwlock);
> +	lkey = mr_btree_lookup(&sh->mr.cache, &idx, addr);
>  	if (lkey != UINT32_MAX) {
>  		/* Found. */
> -		*entry = (*priv->mr.cache.table)[idx];
> -		rte_rwlock_read_unlock(&priv->mr.rwlock);
> +		*entry = (*sh->mr.cache.table)[idx];
> +		rte_rwlock_read_unlock(&sh->mr.rwlock);
>  		/*
>  		 * Update local cache. Even if it fails, return the found entry
>  		 * to update top-half cache. Next time, this entry will be found
> @@ -1035,7 +1039,7 @@ struct mr_update_mp_data {
>  		mr_btree_insert(bt, entry);
>  		return lkey;
>  	}
> -	rte_rwlock_read_unlock(&priv->mr.rwlock);
> +	rte_rwlock_read_unlock(&sh->mr.rwlock);
>  	/* First time to see the address? Create a new MR. */
>  	lkey = mlx5_mr_create(dev, entry, addr);
>  	/*
> @@ -1261,6 +1265,7 @@ struct mr_update_mp_data {
>  	struct mr_update_mp_data *data = opaque;
>  	struct rte_eth_dev *dev = data->dev;
>  	struct mlx5_priv *priv = dev->data->dev_private;
> +	struct mlx5_ibv_shared *sh = priv->sh;
>  	struct mlx5_mr_ctrl *mr_ctrl = data->mr_ctrl;
>  	struct mlx5_mr *mr = NULL;
>  	uintptr_t addr = (uintptr_t)memhdr->addr;
> @@ -1270,9 +1275,9 @@ struct mr_update_mp_data {
>
>  	assert(rte_eal_process_type() == RTE_PROC_PRIMARY);
>  	/* If already registered, it should return. */
> -	rte_rwlock_read_lock(&priv->mr.rwlock);
> +	rte_rwlock_read_lock(&sh->mr.rwlock);
>  	lkey = mr_lookup_dev(dev, &entry, addr);
> -	rte_rwlock_read_unlock(&priv->mr.rwlock);
> +	rte_rwlock_read_unlock(&sh->mr.rwlock);
>  	if (lkey != UINT32_MAX)
>  		return;
>  	DRV_LOG(DEBUG, "port %u register MR for chunk #%d of mempool (%s)",
> @@ -1286,11 +1291,11 @@ struct mr_update_mp_data {
>  		data->ret = -1;
>  		return;
>  	}
> -	rte_rwlock_write_lock(&priv->mr.rwlock);
> -	LIST_INSERT_HEAD(&priv->mr.mr_list, mr, mr);
> +	rte_rwlock_write_lock(&sh->mr.rwlock);
> +	LIST_INSERT_HEAD(&sh->mr.mr_list, mr, mr);
>  	/* Insert to the global cache table. */
>  	mr_insert_dev_cache(dev, mr);
> -	rte_rwlock_write_unlock(&priv->mr.rwlock);
> +	rte_rwlock_write_unlock(&sh->mr.rwlock);
>  	/* Insert to the local cache table */
>  	mlx5_mr_addr2mr_bh(dev, mr_ctrl, addr);
>  }
> @@ -1345,6 +1350,7 @@ struct mr_update_mp_data {
>  	struct rte_eth_dev *dev;
>  	struct mlx5_mr *mr;
>  	struct mlx5_priv *priv;
> +	struct mlx5_ibv_shared *sh;
> =20
>  	dev = pci_dev_to_eth_dev(pdev);
>  	if (!dev) {
> @@ -1361,11 +1367,12 @@ struct mr_update_mp_data {
>  		rte_errno = EINVAL;
>  		return -1;
>  	}
> -	rte_rwlock_write_lock(&priv->mr.rwlock);
> -	LIST_INSERT_HEAD(&priv->mr.mr_list, mr, mr);
> +	sh = priv->sh;
> +	rte_rwlock_write_lock(&sh->mr.rwlock);
> +	LIST_INSERT_HEAD(&sh->mr.mr_list, mr, mr);
>  	/* Insert to the global cache table. */
>  	mr_insert_dev_cache(dev, mr);
> -	rte_rwlock_write_unlock(&priv->mr.rwlock);
> +	rte_rwlock_write_unlock(&sh->mr.rwlock);
>  	return 0;
>  }
>
> @@ -1390,6 +1397,7 @@ struct mr_update_mp_data {
>  {
>  	struct rte_eth_dev *dev;
>  	struct mlx5_priv *priv;
> +	struct mlx5_ibv_shared *sh;
>  	struct mlx5_mr *mr;
>  	struct mlx5_mr_cache entry;
>
> @@ -1401,10 +1409,11 @@ struct mr_update_mp_data {
>  		return -1;
>  	}
>  	priv = dev->data->dev_private;
> -	rte_rwlock_read_lock(&priv->mr.rwlock);
> +	sh = priv->sh;
> +	rte_rwlock_read_lock(&sh->mr.rwlock);
>  	mr = mr_lookup_dev_list(dev, &entry, (uintptr_t)addr);
>  	if (!mr) {
> -		rte_rwlock_read_unlock(&priv->mr.rwlock);
> +		rte_rwlock_read_unlock(&sh->mr.rwlock);
>  		DRV_LOG(WARNING, "address 0x%" PRIxPTR " wasn't registered "
>  				 "to PCI device %p", (uintptr_t)addr,
>  				 (void *)pdev);
> @@ -1412,7 +1421,7 @@ struct mr_update_mp_data {
>  		return -1;
>  	}
>  	LIST_REMOVE(mr, mr);
> -	LIST_INSERT_HEAD(&priv->mr.mr_free_list, mr, mr);
> +	LIST_INSERT_HEAD(&sh->mr.mr_free_list, mr, mr);
>  	DEBUG("port %u remove MR(%p) from list", dev->data->port_id,
>  	      (void *)mr);
>  	mr_rebuild_dev_cache(dev);
> @@ -1425,11 +1434,11 @@ struct mr_update_mp_data {
>  	 * generation below) will be guaranteed to be seen by other core
>  	 * before the core sees the newly allocated memory.
>  	 */
> -	++priv->mr.dev_gen;
> +	++sh->mr.dev_gen;
>  	DEBUG("broadcasting local cache flush, gen=%d",
> -			priv->mr.dev_gen);
> +			sh->mr.dev_gen);

Looks like the indentation was wrong even before your change.
Please correct it.

Thanks,
Yongseok

>  	rte_smp_wmb();
> -	rte_rwlock_read_unlock(&priv->mr.rwlock);
> +	rte_rwlock_read_unlock(&sh->mr.rwlock);
>  	return 0;
>  }
>
> @@ -1550,25 +1559,24 @@ struct mr_update_mp_data {
>  /**
>   * Dump all the created MRs and the global cache entries.
>   *
> - * @param dev
> - *   Pointer to Ethernet device.
> + * @param sh
> + *   Pointer to Ethernet device shared context.
>   */
>  void
> -mlx5_mr_dump_dev(struct rte_eth_dev *dev __rte_unused)
> +mlx5_mr_dump_dev(struct mlx5_ibv_shared *sh __rte_unused)
>  {
>  #ifndef NDEBUG
> -	struct mlx5_priv *priv = dev->data->dev_private;
>  	struct mlx5_mr *mr;
>  	int mr_n = 0;
>  	int chunk_n = 0;
>
> -	rte_rwlock_read_lock(&priv->mr.rwlock);
> +	rte_rwlock_read_lock(&sh->mr.rwlock);
>  	/* Iterate all the existing MRs. */
> -	LIST_FOREACH(mr, &priv->mr.mr_list, mr) {
> +	LIST_FOREACH(mr, &sh->mr.mr_list, mr) {
>  		unsigned int n;
> =20
> -		DEBUG("port %u MR[%u], LKey = 0x%x, ms_n = %u, ms_bmp_n = %u",
> -		      dev->data->port_id, mr_n++,
> +		DEBUG("device %s MR[%u], LKey = 0x%x, ms_n = %u, ms_bmp_n = %u",
> +		      sh->ibdev_name, mr_n++,
>  		      rte_cpu_to_be_32(mr->ibv_mr->lkey),
>  		      mr->ms_n, mr->ms_bmp_n);
>  		if (mr->ms_n == 0)
> @@ -1583,45 +1591,40 @@ struct mr_update_mp_data {
>  			      chunk_n++, ret.start, ret.end);
>  		}
>  	}
> -	DEBUG("port %u dumping global cache", dev->data->port_id);
> -	mlx5_mr_btree_dump(&priv->mr.cache);
> -	rte_rwlock_read_unlock(&priv->mr.rwlock);
> +	DEBUG("device %s dumping global cache", sh->ibdev_name);
> +	mlx5_mr_btree_dump(&sh->mr.cache);
> +	rte_rwlock_read_unlock(&sh->mr.rwlock);
>  #endif
>  }
>
>  /**
> - * Release all the created MRs and resources. Remove device from memory callback
> + * Release all the created MRs and resources for shared device context.
>   * list.
>   *
> - * @param dev
> - *   Pointer to Ethernet device.
> + * @param sh
> + *   Pointer to Ethernet device shared context.
>   */
>  void
> -mlx5_mr_release(struct rte_eth_dev *dev)
> +mlx5_mr_release(struct mlx5_ibv_shared *sh)
>  {
> -	struct mlx5_priv *priv = dev->data->dev_private;
>  	struct mlx5_mr *mr_next;
>
> -	/* Remove from memory callback device list. */
> -	rte_rwlock_write_lock(&mlx5_shared_data->mem_event_rwlock);
> -	LIST_REMOVE(priv, mem_event_cb);
> -	rte_rwlock_write_unlock(&mlx5_shared_data->mem_event_rwlock);
>  	if (rte_log_get_level(mlx5_logtype) == RTE_LOG_DEBUG)
> -		mlx5_mr_dump_dev(dev);
> -	rte_rwlock_write_lock(&priv->mr.rwlock);
> +		mlx5_mr_dump_dev(sh);
> +	rte_rwlock_write_lock(&sh->mr.rwlock);
>  	/* Detach from MR list and move to free list. */
> -	mr_next = LIST_FIRST(&priv->mr.mr_list);
> +	mr_next = LIST_FIRST(&sh->mr.mr_list);
>  	while (mr_next != NULL) {
>  		struct mlx5_mr *mr = mr_next;
>
>  		mr_next = LIST_NEXT(mr, mr);
>  		LIST_REMOVE(mr, mr);
> -		LIST_INSERT_HEAD(&priv->mr.mr_free_list, mr, mr);
> +		LIST_INSERT_HEAD(&sh->mr.mr_free_list, mr, mr);
>  	}
> -	LIST_INIT(&priv->mr.mr_list);
> +	LIST_INIT(&sh->mr.mr_list);
>  	/* Free global cache. */
> -	mlx5_mr_btree_free(&priv->mr.cache);
> -	rte_rwlock_write_unlock(&priv->mr.rwlock);
> +	mlx5_mr_btree_free(&sh->mr.cache);
> +	rte_rwlock_write_unlock(&sh->mr.rwlock);
>  	/* Free all remaining MRs. */
> -	mlx5_mr_garbage_collect(dev);
> +	mlx5_mr_garbage_collect(sh);
>  }
> diff --git a/drivers/net/mlx5/mlx5_mr.h b/drivers/net/mlx5/mlx5_mr.h
> index 786f6a3..89e89b7 100644
> --- a/drivers/net/mlx5/mlx5_mr.h
> +++ b/drivers/net/mlx5/mlx5_mr.h
> @@ -62,6 +62,7 @@ struct mlx5_mr_ctrl {
>  	struct mlx5_mr_btree cache_bh; /* Cache for bottom-half. */
>  } __rte_packed;
>
> +struct mlx5_ibv_shared;
>  extern struct mlx5_dev_list  mlx5_mem_event_cb_list;
>  extern rte_rwlock_t mlx5_mem_event_rwlock;
>
> @@ -76,11 +77,11 @@ void mlx5_mr_mem_event_cb(enum rte_mem_event event_type, const void *addr,
>  			  size_t len, void *arg);
>  int mlx5_mr_update_mp(struct rte_eth_dev *dev, struct mlx5_mr_ctrl *mr_ctrl,
>  		      struct rte_mempool *mp);
> -void mlx5_mr_release(struct rte_eth_dev *dev);
> +void mlx5_mr_release(struct mlx5_ibv_shared *sh);
>
>  /* Debug purpose functions. */
>  void mlx5_mr_btree_dump(struct mlx5_mr_btree *bt);
> -void mlx5_mr_dump_dev(struct rte_eth_dev *dev);
> +void mlx5_mr_dump_dev(struct mlx5_ibv_shared *sh);
>
>  /**
>   * Look up LKey from given lookup table by linear search. Firstly look up the
> diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
> index 9965b2b..ed82594 100644
> --- a/drivers/net/mlx5/mlx5_txq.c
> +++ b/drivers/net/mlx5/mlx5_txq.c
> @@ -814,7 +814,7 @@ struct mlx5_txq_ctrl *
>  		goto error;
>  	}
>  	/* Save pointer of global generation number to check memory event. */
> -	tmpl->txq.mr_ctrl.dev_gen_ptr = &priv->mr.dev_gen;
> +	tmpl->txq.mr_ctrl.dev_gen_ptr = &priv->sh->mr.dev_gen;
>  	assert(desc > MLX5_TX_COMP_THRESH);
>  	tmpl->txq.offloads = conf->offloads |
>  			     dev->data->dev_conf.txmode.offloads;
> --
> 1.8.3.1
>