Re: [dpdk-stable] [PATCH v1 3/4] net/mlx5: fix rebuild bug for Memory Region cache

patches for DPDK stable branches
 help / color / mirror / Atom feed

From: Slava Ovsiienko <viacheslavo@nvidia.com>
To: Feifei Wang <feifei.wang2@arm.com>,
	Matan Azrad <matan@nvidia.com>,
	Shahaf Shuler <shahafs@nvidia.com>,
	Yongseok Koh <yskoh@mellanox.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>, "nd@arm.com" <nd@arm.com>,
	"stable@dpdk.org" <stable@dpdk.org>,
	Ruifeng Wang <ruifeng.wang@arm.com>
Subject: Re: [dpdk-stable] [PATCH v1 3/4] net/mlx5: fix rebuild bug for Memory Region cache
Date: Mon, 12 Apr 2021 08:27:30 +0000	[thread overview]
Message-ID: <DM6PR12MB3753D18DAFD3B5FCB4782EA3DF709@DM6PR12MB3753.namprd12.prod.outlook.com> (raw)
In-Reply-To: <20210318071840.359957-4-feifei.wang2@arm.com>

Hi, Feifei

Sorry, I do not follow what this patch fixes. Do we have some issue/bug with MR cache in practice?

Each Tx queue has its own dedicated "local" cache for MRs to convert buffer address in mbufs being transmitted
to LKeys (HW-related entity handle) and the "global" cache for all MR registered on the device.

AFAIK, how conversion happens in datapath:
- check the local queue cache flush request
- lookup in local cache
- if not found:
- acquire lock for global cache read access
- lookup in global cache
- release lock for global cache

How cache update on memory freeing/unregistering happens:
- acquire lock for global cache write access 
- [a] remove relevant MRs from the global cache
- [b] set local caches flush request 
- free global cache lock

If I understand correctly, your patch swaps [a] and [b],
and local caches flush is requested earlier. What problem does it solve?
It is not supposed there are in datapath some mbufs referencing
to the memory being freed. Application must ensure this and must not allocate
new mbufs from this memory regions being freed. Hence, the lookups for these
MRs in caches should not occur.

For other side, the cache flush has negative effect - the local cache is
getting empty and can't provide translation for other valid (not being removed) MRs,
and the translation has to look up in the global cache, that is locked now
for rebuilding, this causes the delays in datapatch on acquiring global cache lock.
So, I see some potential performance impact. 

With best regards,
Slava

> -----Original Message-----
> From: Feifei Wang <feifei.wang2@arm.com>
> Sent: Thursday, March 18, 2021 9:19
> To: Matan Azrad <matan@nvidia.com>; Shahaf Shuler
> <shahafs@nvidia.com>; Slava Ovsiienko <viacheslavo@nvidia.com>;
> Yongseok Koh <yskoh@mellanox.com>
> Cc: dev@dpdk.org; nd@arm.com; Feifei Wang <feifei.wang2@arm.com>;
> stable@dpdk.org; Ruifeng Wang <ruifeng.wang@arm.com>
> Subject: [PATCH v1 3/4] net/mlx5: fix rebuild bug for Memory Region cache
> 
> 'dev_gen' is a variable to inform other cores to flush their local cache when
> global cache is rebuilt.
> 
> However, if 'dev_gen' is updated after global cache is rebuilt, other cores
> may load a wrong memory region lkey value from old local cache.
> 
> Timeslot        main core               worker core
>   1         rebuild global cache
>   2                                  load unchanged dev_gen
>   3            update dev_gen
>   4                                  look up old local cache
> 
> From the example above, we can see that though global cache is rebuilt, due
> to that dev_gen is not updated, the worker core may look up old cache table
> and receive a wrong memory region lkey value.
> 
> To fix this, updating 'dev_gen' should be moved before rebuilding global
> cache to inform worker cores to flush their local cache when global cache
> start rebuilding. And wmb can ensure the sequence of this process.
> 
> Fixes: 974f1e7ef146 ("net/mlx5: add new memory region support")
> Cc: stable@dpdk.org
> 
> Suggested-by: Ruifeng Wang <ruifeng.wang@arm.com>
> Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
>  drivers/net/mlx5/mlx5_mr.c | 37 +++++++++++++++++--------------------
>  1 file changed, 17 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c index
> da4e91fc2..7ce1d3e64 100644
> --- a/drivers/net/mlx5/mlx5_mr.c
> +++ b/drivers/net/mlx5/mlx5_mr.c
> @@ -103,20 +103,18 @@ mlx5_mr_mem_event_free_cb(struct
> mlx5_dev_ctx_shared *sh,
>  		rebuild = 1;
>  	}
>  	if (rebuild) {
> -		mlx5_mr_rebuild_cache(&sh->share_cache);
> +		++sh->share_cache.dev_gen;
> +		DEBUG("broadcasting local cache flush, gen=%d",
> +			sh->share_cache.dev_gen);
> +
>  		/*
>  		 * Flush local caches by propagating invalidation across cores.
> -		 * rte_smp_wmb() is enough to synchronize this event. If
> one of
> -		 * freed memsegs is seen by other core, that means the
> memseg
> -		 * has been allocated by allocator, which will come after this
> -		 * free call. Therefore, this store instruction (incrementing
> -		 * generation below) will be guaranteed to be seen by other
> core
> -		 * before the core sees the newly allocated memory.
> +		 * rte_smp_wmb() is to keep the order that dev_gen
> updated before
> +		 * rebuilding global cache. Therefore, other core can flush
> their
> +		 * local cache on time.
>  		 */
> -		++sh->share_cache.dev_gen;
> -		DEBUG("broadcasting local cache flush, gen=%d",
> -		      sh->share_cache.dev_gen);
>  		rte_smp_wmb();
> +		mlx5_mr_rebuild_cache(&sh->share_cache);
>  	}
>  	rte_rwlock_write_unlock(&sh->share_cache.rwlock);
>  }
> @@ -407,20 +405,19 @@ mlx5_dma_unmap(struct rte_pci_device *pdev,
> void *addr,
>  	mlx5_mr_free(mr, sh->share_cache.dereg_mr_cb);
>  	DEBUG("port %u remove MR(%p) from list", dev->data->port_id,
>  	      (void *)mr);
> -	mlx5_mr_rebuild_cache(&sh->share_cache);
> +
> +	++sh->share_cache.dev_gen;
> +	DEBUG("broadcasting local cache flush, gen=%d",
> +		sh->share_cache.dev_gen);
> +
>  	/*
>  	 * Flush local caches by propagating invalidation across cores.
> -	 * rte_smp_wmb() is enough to synchronize this event. If one of
> -	 * freed memsegs is seen by other core, that means the memseg
> -	 * has been allocated by allocator, which will come after this
> -	 * free call. Therefore, this store instruction (incrementing
> -	 * generation below) will be guaranteed to be seen by other core
> -	 * before the core sees the newly allocated memory.
> +	 * rte_smp_wmb() is to keep the order that dev_gen updated
> before
> +	 * rebuilding global cache. Therefore, other core can flush their
> +	 * local cache on time.
>  	 */
> -	++sh->share_cache.dev_gen;
> -	DEBUG("broadcasting local cache flush, gen=%d",
> -	      sh->share_cache.dev_gen);
>  	rte_smp_wmb();
> +	mlx5_mr_rebuild_cache(&sh->share_cache);
>  	rte_rwlock_read_unlock(&sh->share_cache.rwlock);
>  	return 0;
>  }
> --
> 2.25.1

next prev parent reply	other threads:[~2021-04-12  8:27 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20210318071840.359957-1-feifei.wang2@arm.com>
2021-03-18  7:18 ` [dpdk-stable] [PATCH v1 1/4] net/mlx4: " Feifei Wang
2021-03-18  7:18 ` [dpdk-stable] [PATCH v1 3/4] net/mlx5: " Feifei Wang
2021-04-12  8:27   ` Slava Ovsiienko [this message]
2021-04-13  5:20     ` [dpdk-stable] 回复: " Feifei Wang
2021-04-19 18:50       ` [dpdk-stable] " Slava Ovsiienko
2021-04-20  5:53         ` [dpdk-stable] 回复: " Feifei Wang
2021-04-20  7:29           ` Feifei Wang
2021-04-20  7:53             ` [dpdk-stable] " Slava Ovsiienko
2021-04-20  8:42               ` [dpdk-stable] 回复: " Feifei Wang
2021-05-06  2:52                 ` Feifei Wang
2021-05-06 11:21                   ` [dpdk-stable] " Slava Ovsiienko
2021-05-07  6:36                     ` [dpdk-stable] 回复: " Feifei Wang
2021-05-07 10:14                       ` [dpdk-stable] " Slava Ovsiienko
2021-05-08  3:13                         ` [dpdk-stable] 回复: " Feifei Wang
2021-05-11  8:18                           ` [dpdk-stable] " Slava Ovsiienko
2021-05-12  5:34                             ` [dpdk-stable] 回复: " Feifei Wang
2021-05-12 11:07                               ` [dpdk-stable] " Slava Ovsiienko
2021-05-13  5:49                                 ` [dpdk-stable] 回复: " Feifei Wang
2021-05-13 10:49                                   ` [dpdk-stable] " Slava Ovsiienko
2021-05-14  5:18                                     ` [dpdk-stable] 回复: " Feifei Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=DM6PR12MB3753D18DAFD3B5FCB4782EA3DF709@DM6PR12MB3753.namprd12.prod.outlook.com \
    --to=viacheslavo@nvidia.com \
    --cc=dev@dpdk.org \
    --cc=feifei.wang2@arm.com \
    --cc=matan@nvidia.com \
    --cc=nd@arm.com \
    --cc=ruifeng.wang@arm.com \
    --cc=shahafs@nvidia.com \
    --cc=stable@dpdk.org \
    --cc=yskoh@mellanox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).