DPDK patches and discussions
 help / color / mirror / Atom feed
From: Slava Ovsiienko <viacheslavo@nvidia.com>
To: Feifei Wang <Feifei.Wang2@arm.com>,
	Matan Azrad <matan@nvidia.com>,
	Shahaf Shuler <shahafs@nvidia.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>, nd <nd@arm.com>,
	"stable@dpdk.org" <stable@dpdk.org>,
	Ruifeng Wang <Ruifeng.Wang@arm.com>, nd <nd@arm.com>
Subject: Re: [dpdk-dev] [PATCH v1 3/4] net/mlx5: fix rebuild bug for Memory Region cache
Date: Wed, 12 May 2021 11:07:47 +0000
Message-ID: <DM6PR12MB375370B85D291BE7DC8D2A45DF529@DM6PR12MB3753.namprd12.prod.outlook.com> (raw)
In-Reply-To: <DB9PR08MB6923F8CB86CF59921CA3DD31C8529@DB9PR08MB6923.eurprd08.prod.outlook.com>

Hi, Feifei

..ship..
> 
> If I understand correctly, your meaning is that if without wmb, maybe other
> agents observe changed "dev_gen", but they also observe unchanged
> "global" cache.
> This can be defined as  memory inconsistent state.
> 
> 					Fig1
> ----------------------------------------------------------------------------------------------
> -------
> Timeslot        	      agent_1               		               agent_2
> 1		     take_lock
> 2          		update dev_gen
> 3                       					     observe changed dev_gen
> 4						         clear local cache
> 
> 5            		rebuild global cache		               wait_lock
> 6		      free_lock
> 7		         wmb			                take_lock
> 8							get(new MR)
> 9                           				             	   free_lock
> ----------------------------------------------------------------------------------------------
> -------

Yes, something like that.

 
> 1. However, in out-of-order platform, though adding a 'wmb at last',
> 'dev_gen' maybe updated before global cache rebuild, and then other
> agents can observe changed 'dev_ge'
> before rebuilding global cache.
> 
> Thus, though add a 'wmb at last', It is still unable to prevent other agents
> from observing some inconsistent state. As a result, 'wmb at last' fails to
> keep consistence.
> 
> 2. On the other hand, due to lock, agent_2 will wait to take a lock until global
> cache rebuilt by agent_1, and this ensures agent_2 can get a correct new MR
> and update new local cache correctly.
> 
> In summary, 'wmb at last' cannot guarantee other agents to observe the
> consistent state.
> But lock can fix this error. So, the existence of wmb at last is redundant and
> we can remove it.

If dev_gen change is committed and cache's one is not yet - the agent_2 might see
inconsistent state even inside the lock-protected section. Hence, we must commit
all writes before leaving the locked section in agent_1.

Let’s suppose there is no wmb in agent_1 at all, and dev_gen is arbitrary committed
by CPU and MR cache data change is not. We leave the locked section in agent_1, agent_2
sees dev_gen changed, takes the lock and sees inconsistent MR-cache state due to not all changes
made in agent_1 are committed. With wmb we have now in the existing code - there is no issue like that.

With best regards,
Slava

> 
> Best Regards
> Feifei
> 
> > With best regards,
> > Slava
> >
> >
> > >
> > > Best Regards
> > > Feifei
> > >
> > > > -----邮件原件-----
> > > > 发件人: Slava Ovsiienko <viacheslavo@nvidia.com>
> > > > 发送时间: 2021年5月7日 18:15
> > > > 收件人: Feifei Wang <Feifei.Wang2@arm.com>; Matan Azrad
> > > > <matan@nvidia.com>; Shahaf Shuler <shahafs@nvidia.com>
> > > > 抄送: dev@dpdk.org; nd <nd@arm.com>; stable@dpdk.org; Ruifeng
> > Wang
> > > > <Ruifeng.Wang@arm.com>; nd <nd@arm.com>
> > > > 主题: RE: [PATCH v1 3/4] net/mlx5: fix rebuild bug for Memory Region
> > > > cache
> > > >
> > > > Hi, Feifei
> > > >
> > > > We should consider the locks in your scenario - it is crucial for
> > > > the complete model description:
> > > >
> > > > How agent_1 (in your terms) rebuilds global cache:
> > > >
> > > > 1a) lock()
> > > > 1b) rebuild(global cache)
> > > > 1c) update(dev_gen)
> > > > 1d) wmb()
> > > > 1e) unlock()
> > > >
> > > > How agent_2 checks:
> > > >
> > > > 2a) check(dev_gen) (assume positive - changed)
> > > > 2b) clear(local_cache)
> > > > 2c) miss(on empty local_cache) - > eventually it goes to
> > > > mr_lookup_caches()
> > > > 2d) lock()
> > > > 2e) get(new MR)
> > > > 2f) unlock()
> > > > 2g) update(local cache with obtained new MR)
> > > >
> > > > Hence, even if 1c) becomes visible in 2a) before 1b) committed
> > > > (say, due to out-of-order Arch) - the agent 2 would be blocked on
> > > > 2d) and scenario depicted on your Fig2 would not happen (agent_2
> > > > will wait before step 3 till agent 1 unlocks after its step 5).
> > > >
> > > > With best regards,
> > > > Slava
> > > >
> > > > > -----Original Message-----
> > > > > From: Feifei Wang <Feifei.Wang2@arm.com>
> > > > > Sent: Friday, May 7, 2021 9:36> To: Slava Ovsiienko
> > > > > <viacheslavo@nvidia.com>; Matan Azrad <matan@nvidia.com>;
> Shahaf
> > > > > Shuler <shahafs@nvidia.com>
> > > > > Cc: dev@dpdk.org; nd <nd@arm.com>; stable@dpdk.org; Ruifeng
> > Wang
> > > > > <Ruifeng.Wang@arm.com>; nd <nd@arm.com>
> > > > > Subject: 回复: [PATCH v1 3/4] net/mlx5: fix rebuild bug for Memory
> > > > > Region cache
> > > > >
> > > > > Hi, Slava
> > > > >
> > > > > Thanks very much for your reply.
> > > > >
> > > > > > -----邮件原件-----
> > > > > > 发件人: Slava Ovsiienko <viacheslavo@nvidia.com>
> > > > > > 发送时间: 2021年5月6日 19:22
> > > > > > 收件人: Feifei Wang <Feifei.Wang2@arm.com>; Matan Azrad
> > > > > > <matan@nvidia.com>; Shahaf Shuler <shahafs@nvidia.com>
> > > > > > 抄送: dev@dpdk.org; nd <nd@arm.com>; stable@dpdk.org;
> Ruifeng
> > > > Wang
> > > > > > <Ruifeng.Wang@arm.com>; nd <nd@arm.com>
> > > > > > 主题: RE: [PATCH v1 3/4] net/mlx5: fix rebuild bug for Memory
> > > > > > Region cache
> > > > > >
> > > > > > Hi, Feifei
> > > > > >
> > > > > > Sorry, I do not follow why we should get rid of the last
> > > > > > (after dev_gen update) wmb.
> > > > > > We've rebuilt the global cache, we should notify other agents
> > > > > > it's happened and they should flush local caches. So, dev_gen
> > > > > > change should be made visible to other agents to trigger this
> > > > > > activity and the second wmb is here to ensure this.
> > > > >
> > > > > 1. For the first problem why we should get rid of the last wmb
> > > > > and move it before dev_gen updated, I think our attention is how
> > > > > the wmb implements the synchronization between multiple agents.
> > > > > 					Fig1
> > > > > ----------------------------------------------------------------
> > > > > --
> > > > > --
> > > > > --
> > > > > ------------------------
> > > > > -------
> > > > > Timeslot        		agent_1               		   agent_2
> > > > > 1          		rebuild global cache
> > > > > 2                       		wmb
> > > > > 3            		     update dev_gen ----------------------- load changed
> > > > > dev_gen
> > > > > 4                                  			        	           rebuild local
> cache
> > > > > ----------------------------------------------------------------
> > > > > --
> > > > > --
> > > > > --
> > > > > ------------------------
> > > > > -------
> > > > >
> > > > > First, wmb is only for local thread to keep the order between
> > > > > local
> > > > > write- write :
> > > > > Based on the picture above, for agent_1, wmb keeps the order
> > > > > that rebuilding global cache is always before updating dev_gen.
> > > > >
> > > > > Second, agent_1 communicates with agent_2 by the global variable
> > > > > "dev_gen" :
> > > > > If agent_1 updates dev_gen, agent_2 will load it and then it
> > > > > knows it should rebuild local cache
> > > > >
> > > > > Finally, agent_2 rebuilds local cache according to whether
> > > > > agent_1 has rebuilt global cache, and agent_2 knows this
> > > > > information by the variable
> > > > "dev_gen".
> > > > > 					Fig2
> > > > > ----------------------------------------------------------------
> > > > > --
> > > > > --
> > > > > --
> > > > > ------------------------
> > > > > -------
> > > > > Timeslot        		agent_1               		   agent_2
> > > > > 1		        update dev_gen
> > > > > 2						      load changed
> dev_gen
> > > > > 3						          rebuild local
> cache
> > > > > 4        		    rebuild global cache
> > > > > 5			 wmb
> > > > > ----------------------------------------------------------------
> > > > > --
> > > > > --
> > > > > --
> > > > > ------------------------
> > > > > -------
> > > > >
> > > > > However, in arm platform, if wmb is after dev_gen updated,
> "dev_gen"
> > > > > may be updated before agent_1 rebuilding global cache, then
> > > > > agent_2 maybe receive error message and rebuild its local cache
> > > > > in
> > advance.
> > > > >
> > > > > To summarize, it is not important which time other agents can
> > > > > see the changed global variable "dev_gen".
> > > > > (Actually, wmb after "dev_gen" cannot ensure changed "dev_gen"
> > > > > is committed to the global).
> > > > > It is more important that if other agents see the changed
> > > > > "dev_gen", they also can know global cache has been updated.
> > > > >
> > > > > > One more point, due to registering new/destroying existing MR
> > > > > > involves FW (via kernel) calls, it takes so many CPU cycles
> > > > > > that we could neglect wmb overhead at all.
> > > > >
> > > > > We just move the last wmb into the right place, and not delete
> > > > > it for performance.
> > > > >
> > > > > >
> > > > > > Also, regarding this:
> > > > > >
> > > > > >  > > Another question suddenly occurred to me, in order to
> > > > > > keep the
> > > > > > >
> > > > > > > order that rebuilding global cache before updating
> > > > > > > ”dev_gen“, the
> > > > > > > > wmb should be before updating "dev_gen" rather than after it.
> > > > > >  > > Otherwise, in the out-of-order platforms, current order
> > > > > > cannot be
> > > > > kept.
> > > > > >
> > > > > > it is not clear why ordering is important - global cache
> > > > > > update and dev_gen change happen under spinlock protection, so
> > > > > > only the last wmb is meaningful.
> > > > > >
> > > > >
> > > > > 2. The second function of wmb before "dev_gen" updated is for
> > > > > performance according to our previous discussion.
> > > > > According to Fig2, if there is no wmb between "global cache updated"
> > > > > and "dev_gen updated", "dev_gen" may update before global cache
> > > > updated.
> > > > >
> > > > > Then agent_2 may see the changed "dev_gen" and flush entire
> > > > > local cache in advance.
> > > > >
> > > > > This entire flush can degrade the performance:
> > > > > "the local cache is getting empty and can't provide translation
> > > > > for other valid (not being removed) MRs, and the translation has
> > > > > to look up in the global cache, that is locked now for
> > > > > rebuilding, this causes the delays in data path on acquiring global
> cache lock."
> > > > >
> > > > > Furthermore, spinlock is just for global cache, not for dev_gen
> > > > > and local cache.
> > > > >
> > > > > > To summarize, in my opinion:
> > > > > > - if you see some issue with ordering of global cache
> > > > > > update/dev_gen signalling,
> > > > > >   could you, please, elaborate? I'm not sure we should
> > > > > > maintain an order (due to spinlock protection)
> > > > > > - the last rte_smp_wmb() after dev_gen incrementing should be
> > > > > > kept intact
> > > > > >
> > > > >
> > > > > At last, for my view, there are two functions that moving wmb
> > > > > before "dev_gen"
> > > > > for the write-write order:
> > > > > --------------------------------
> > > > > a) rebuild global cache;
> > > > > b) rte_smp_wmb();
> > > > > c) updating dev_gen
> > > > > -------------------------------- 1. Achieve synchronization
> > > > > between multiple threads in the right way 2.
> > > > > Prevent other agents from flushing local cache early to ensure
> > > > > performance
> > > > >
> > > > > Best Regards
> > > > > Feifei
> > > > >
> > > > > > With best regards,
> > > > > > Slava
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Feifei Wang <Feifei.Wang2@arm.com>
> > > > > > > Sent: Thursday, May 6, 2021 5:52
> > > > > > > To: Slava Ovsiienko <viacheslavo@nvidia.com>; Matan Azrad
> > > > > > > <matan@nvidia.com>; Shahaf Shuler <shahafs@nvidia.com>
> > > > > > > Cc: dev@dpdk.org; nd <nd@arm.com>; stable@dpdk.org; Ruifeng
> > > > Wang
> > > > > > > <Ruifeng.Wang@arm.com>; nd <nd@arm.com>
> > > > > > > Subject: 回复: [PATCH v1 3/4] net/mlx5: fix rebuild bug for
> > > > > > > Memory Region cache
> > > > > > >
> > > > > > > Hi, Slava
> > > > > > >
> > > > > > > Would you have more comments about this patch?
> > > > > > > For my sight, only one wmb before "dev_gen" updating is
> > > > > > > enough to synchronize.
> > > > > > >
> > > > > > > Thanks very much for your attention.
> > > > > > >
> > > > > > >
> > > > > > > Best Regards
> > > > > > > Feifei
> > > > > > >
> > > > > > > > -----邮件原件-----
> > > > > > > > 发件人: Feifei Wang
> > > > > > > > 发送时间: 2021年4月20日 16:42
> > > > > > > > 收件人: Slava Ovsiienko <viacheslavo@nvidia.com>; Matan
> Azrad
> > > > > > > > <matan@nvidia.com>; Shahaf Shuler <shahafs@nvidia.com>
> > > > > > > > 抄送: dev@dpdk.org; nd <nd@arm.com>; stable@dpdk.org;
> > > Ruifeng
> > > > > > Wang
> > > > > > > > <Ruifeng.Wang@arm.com>; nd <nd@arm.com>
> > > > > > > > 主题: 回复: [PATCH v1 3/4] net/mlx5: fix rebuild bug for
> > > > > > > > Memory
> > > > > Region
> > > > > > > > cache
> > > > > > > >
> > > > > > > > Hi, Slava
> > > > > > > >
> > > > > > > > I think the second wmb can be removed.
> > > > > > > > As I know, wmb is just a barrier to keep the order between
> > > > > > > > write and
> > > > > > write.
> > > > > > > > and it cannot tell the CPU when it should commit the changes.
> > > > > > > >
> > > > > > > > It is usually used before guard variable to keep the order
> > > > > > > > that updating guard variable after some changes, which you
> > > > > > > > want to release,
> > > > > > > have been done.
> > > > > > > >
> > > > > > > > For example, for the wmb  after global cache update/before
> > > > > > > > altering dev_gen, it can ensure the order that updating
> > > > > > > > global cache before altering
> > > > > > > > dev_gen:
> > > > > > > > 1)If other agent load the changed "dev_gen", it can know
> > > > > > > > the global cache has been updated.
> > > > > > > > 2)If other agents load the unchanged, "dev_gen", it means
> > > > > > > > the global cache has not been updated, and the local cache
> > > > > > > > will not be
> > > > > flushed.
> > > > > > > >
> > > > > > > > As a result, we use  wmb and guard variable "dev_gen" to
> > > > > > > > ensure the global cache updating is "visible".
> > > > > > > > The "visible" means when updating guard variable "dev_gen"
> > > > > > > > is known by other agents, they also can confirm global
> > > > > > > > cache has been updated in the meanwhile. Thus, just one
> > > > > > > > wmb before altering dev_gen can ensure
> > > > > > > this.
> > > > > > > >
> > > > > > > > Best Regards
> > > > > > > > Feifei
> > > > > > > >
> > > > > > > > > -----邮件原件-----
> > > > > > > > > 发件人: Slava Ovsiienko <viacheslavo@nvidia.com>
> > > > > > > > > 发送时间: 2021年4月20日 15:54
> > > > > > > > > 收件人: Feifei Wang <Feifei.Wang2@arm.com>; Matan Azrad
> > > > > > > > > <matan@nvidia.com>; Shahaf Shuler <shahafs@nvidia.com>
> > > > > > > > > 抄送: dev@dpdk.org; nd <nd@arm.com>; stable@dpdk.org;
> > > > Ruifeng
> > > > > > > Wang
> > > > > > > > > <Ruifeng.Wang@arm.com>; nd <nd@arm.com>; nd
> > > > <nd@arm.com>;
> > > > > > nd
> > > > > > > > > <nd@arm.com>
> > > > > > > > > 主题: RE: [PATCH v1 3/4] net/mlx5: fix rebuild bug for
> > > > > > > > > Memory Region cache
> > > > > > > > >
> > > > > > > > > Hi, Feifei
> > > > > > > > >
> > > > > > > > > In my opinion, there should be 2 barriers:
> > > > > > > > >  - after global cache update/before altering dev_gen, to
> > > > > > > > > ensure the correct order
> > > > > > > > >  - after altering dev_gen to make this change visible
> > > > > > > > > for other agents and to trigger local cache update
> > > > > > > > >
> > > > > > > > > With best regards,
> > > > > > > > > Slava
> > > > > > > > >
> > > > > > > > > > -----Original Message-----
> > > > > > > > > > From: Feifei Wang <Feifei.Wang2@arm.com>
> > > > > > > > > > Sent: Tuesday, April 20, 2021 10:30
> > > > > > > > > > To: Slava Ovsiienko <viacheslavo@nvidia.com>; Matan
> > > > > > > > > > Azrad <matan@nvidia.com>; Shahaf Shuler
> > > > > > > > > > <shahafs@nvidia.com>
> > > > > > > > > > Cc: dev@dpdk.org; nd <nd@arm.com>; stable@dpdk.org;
> > > > > > > > > > Ruifeng
> > > > > > > Wang
> > > > > > > > > > <Ruifeng.Wang@arm.com>; nd <nd@arm.com>; nd
> > > > > <nd@arm.com>;
> > > > > > > nd
> > > > > > > > > > <nd@arm.com>
> > > > > > > > > > Subject: 回复: [PATCH v1 3/4] net/mlx5: fix rebuild bug
> > > > > > > > > > for Memory Region cache
> > > > > > > > > >
> > > > > > > > > > Hi, Slava
> > > > > > > > > >
> > > > > > > > > > Another question suddenly occurred to me, in order to
> > > > > > > > > > keep the order that rebuilding global cache before
> > > > > > > > > > updating ”dev_gen“, the wmb should be before updating
> > "dev_gen"
> > > > > > > > > > rather
> > > > than after it.
> > > > > > > > > > Otherwise, in the out-of-order platforms, current
> > > > > > > > > > order cannot be
> > > > > > kept.
> > > > > > > > > >
> > > > > > > > > > Thus, we should change the code as:
> > > > > > > > > > a) rebuild global cache;
> > > > > > > > > > b) rte_smp_wmb();
> > > > > > > > > > c) updating dev_gen
> > > > > > > > > >
> > > > > > > > > > Best Regards
> > > > > > > > > > Feifei
> > > > > > > > > > > -----邮件原件-----
> > > > > > > > > > > 发件人: Feifei Wang
> > > > > > > > > > > 发送时间: 2021年4月20日 13:54
> > > > > > > > > > > 收件人: Slava Ovsiienko <viacheslavo@nvidia.com>;
> Matan
> > > > Azrad
> > > > > > > > > > > <matan@nvidia.com>; Shahaf Shuler
> > > > > > > > > > > <shahafs@nvidia.com>
> > > > > > > > > > > 抄送: dev@dpdk.org; nd <nd@arm.com>;
> stable@dpdk.org;
> > > > > > Ruifeng
> > > > > > > > > Wang
> > > > > > > > > > > <Ruifeng.Wang@arm.com>; nd <nd@arm.com>; nd
> > > > > <nd@arm.com>
> > > > > > > > > > > 主题: 回复: [PATCH v1 3/4] net/mlx5: fix rebuild bug for
> > > > > > > > > > > Memory
> > > > > > > > Region
> > > > > > > > > > > cache
> > > > > > > > > > >
> > > > > > > > > > > Hi, Slava
> > > > > > > > > > >
> > > > > > > > > > > Thanks very much for your explanation.
> > > > > > > > > > >
> > > > > > > > > > > I can understand the app can wait all mbufs are
> > > > > > > > > > > returned to the memory pool, and then it can free
> > > > > > > > > > > this mbufs, I agree with
> > > > > this.
> > > > > > > > > > >
> > > > > > > > > > > As a result, I will remove the bug fix patch from
> > > > > > > > > > > this series and just replace the smp barrier with
> > > > > > > > > > > C11 thread fence. Thanks very much for your patient
> > > > > > > > > > > explanation
> > again.
> > > > > > > > > > >
> > > > > > > > > > > Best Regards
> > > > > > > > > > > Feifei
> > > > > > > > > > >
> > > > > > > > > > > > -----邮件原件-----
> > > > > > > > > > > > 发件人: Slava Ovsiienko <viacheslavo@nvidia.com>
> > > > > > > > > > > > 发送时间: 2021年4月20日 2:51
> > > > > > > > > > > > 收件人: Feifei Wang <Feifei.Wang2@arm.com>; Matan
> > > Azrad
> > > > > > > > > > > > <matan@nvidia.com>; Shahaf Shuler
> > > > > > > > > > > > <shahafs@nvidia.com>
> > > > > > > > > > > > 抄送: dev@dpdk.org; nd <nd@arm.com>;
> > stable@dpdk.org;
> > > > > > > Ruifeng
> > > > > > > > > > Wang
> > > > > > > > > > > > <Ruifeng.Wang@arm.com>; nd <nd@arm.com>
> > > > > > > > > > > > 主题: RE: [PATCH v1 3/4] net/mlx5: fix rebuild bug
> > > > > > > > > > > > for Memory Region cache
> > > > > > > > > > > >
> > > > > > > > > > > > Hi, Feifei
> > > > > > > > > > > >
> > > > > > > > > > > > Please, see below
> > > > > > > > > > > >
> > > > > > > > > > > > ....
> > > > > > > > > > > >
> > > > > > > > > > > > > > Hi, Feifei
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Sorry, I do not follow what this patch fixes.
> > > > > > > > > > > > > > Do we have some issue/bug with MR cache in
> practice?
> > > > > > > > > > > > >
> > > > > > > > > > > > > This patch fixes the bug which is based on
> > > > > > > > > > > > > logical deduction, and it doesn't actually happen.
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Each Tx queue has its own dedicated "local"
> > > > > > > > > > > > > > cache for MRs to convert buffer address in
> > > > > > > > > > > > > > mbufs being transmitted to LKeys (HW-related
> > > > > > > > > > > > > > entity
> > > > > > > > > > > > > > handle) and the "global" cache for all MR
> > > > > > > > > > > > > > registered on the
> > > > > > > device.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > AFAIK, how conversion happens in datapath:
> > > > > > > > > > > > > > - check the local queue cache flush request
> > > > > > > > > > > > > > - lookup in local cache
> > > > > > > > > > > > > > - if not found:
> > > > > > > > > > > > > > - acquire lock for global cache read access
> > > > > > > > > > > > > > - lookup in global cache
> > > > > > > > > > > > > > - release lock for global cache
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > How cache update on memory
> > > > > > > > > > > > > > freeing/unregistering
> > > > > happens:
> > > > > > > > > > > > > > - acquire lock for global cache write access
> > > > > > > > > > > > > > - [a] remove relevant MRs from the global
> > > > > > > > > > > > > > cache
> > > > > > > > > > > > > > - [b] set local caches flush request
> > > > > > > > > > > > > > - free global cache lock
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > If I understand correctly, your patch swaps
> > > > > > > > > > > > > > [a] and [b], and local caches flush is requested earlier.
> > > > > > > > > > > > > > What problem does it
> > > > > > > solve?
> > > > > > > > > > > > > > It is not supposed there are in datapath some
> > > > > > > > > > > > > > mbufs referencing to the memory being freed.
> > > > > > > > > > > > > > Application must ensure this and must not
> > > > > > > > > > > > > > allocate new mbufs from this memory regions
> > > > > > > > > being freed.
> > > > > > > > > > > > > > Hence, the lookups for these MRs in caches
> > > > > > > > > > > > > > should not
> > > > > occur.
> > > > > > > > > > > > >
> > > > > > > > > > > > > For your first point that, application can take
> > > > > > > > > > > > > charge of preventing MR freed memory being
> > > > > > > > > > > > > allocated to
> > > data path.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Does it means that If there is an emergency of
> > > > > > > > > > > > > MR fragment, such as hotplug, the application
> > > > > > > > > > > > > must inform thedata path in advance, and this
> > > > > > > > > > > > > memory will not be allocated, and then the
> > > > > > > > > > > > > control path will free this memory? If
> > > > > > > > > > > > > application  can do like this, I agree that this
> > > > > > > > > > > > > bug
> > > > > > > > > > > cannot happen.
> > > > > > > > > > > >
> > > > > > > > > > > > Actually,  this is the only correct way for
> > > > > > > > > > > > application to
> > > operate.
> > > > > > > > > > > > Let's suppose we have some memory area that
> > > > > > > > > > > > application wants to
> > > > > > > > > free.
> > > > > > > > > > > > ALL references to this area must be removed. If we
> > > > > > > > > > > > have some mbufs allocated from this area, it means
> > > > > > > > > > > > that we have memory pool created
> > > > > > > > > > there.
> > > > > > > > > > > >
> > > > > > > > > > > > What application should do:
> > > > > > > > > > > > - notify all its components/agents the memory area
> > > > > > > > > > > > is going to be freed
> > > > > > > > > > > > - all components/agents free the mbufs they might
> > > > > > > > > > > > own
> > > > > > > > > > > > - PMD might not support freeing for some mbufs
> > > > > > > > > > > > (for example being sent and awaiting for
> > > > > > > > > > > > completion), so app should just wait
> > > > > > > > > > > > - wait till all mbufs are returned to the memory
> > > > > > > > > > > > pool (by monitoring available obj == pool size)
> > > > > > > > > > > >
> > > > > > > > > > > > Otherwise - it is dangerous to free the memory.
> > > > > > > > > > > > There are just some mbufs still allocated, it is
> > > > > > > > > > > > regardless to buf address to MR translation. We
> > > > > > > > > > > > just can't free the memory - the mapping will be
> > > > > > > > > > > > destroyed and might cause the segmentation fault
> > > > > > > > > > > > by SW or some HW issues on DMA access to unmapped
> > > > > > > > > > > > memory.  It is very generic safety approach - do
> > > > > > > > > > > > not free the memory that is still in
> > > > > > > use.
> > > > > > > > > > > > Hence, at the moment of freeing and unregistering
> > > > > > > > > > > > the MR, there MUST BE NO any
> > > > > > > > > > > mbufs in flight referencing to the addresses being freed.
> > > > > > > > > > > > No translation to MR being invalidated can happen.
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > > For other side, the cache flush has negative
> > > > > > > > > > > > > > effect
> > > > > > > > > > > > > > - the local cache is getting empty and can't
> > > > > > > > > > > > > > provide translation for other valid (not being
> > > > > > > > > > > > > > removed) MRs, and the translation has to look
> > > > > > > > > > > > > > up in the global cache, that is locked now for
> > > > > > > > > > > > > > rebuilding, this causes the delays in
> > > > > > > > > > > > > > datapatch
> > > > > > > > > > > > > on acquiring global cache lock.
> > > > > > > > > > > > > > So, I see some potential performance impact.
> > > > > > > > > > > > >
> > > > > > > > > > > > > If above assumption is true, we can go to your
> > > > > > > > > > > > > second
> > > point.
> > > > > > > > > > > > > I think this is a problem of the tradeoff
> > > > > > > > > > > > > between cache coherence and
> > > > > > > > > > > > performance.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I can understand your meaning that though global
> > > > > > > > > > > > > cache has been changed, we should keep the valid
> > > > > > > > > > > > > MR in local cache as long as possible to ensure
> > > > > > > > > > > > > the fast
> > > searching speed.
> > > > > > > > > > > > > In the meanwhile, the local cache can be rebuilt
> > > > > > > > > > > > > later to reduce its waiting time for acquiring
> > > > > > > > > > > > > the global
> > > cache lock.
> > > > > > > > > > > > >
> > > > > > > > > > > > > However,  this mechanism just ensures the
> > > > > > > > > > > > > performance unchanged for the first few mbufs.
> > > > > > > > > > > > > During the next mbufs lkey searching after 'dev_gen'
> > > > > > > > > > > > > updated, it is still necessary to update the local cache.
> > > > > > > > > > > > > And the performance can firstly reduce and then
> returns.
> > > > > > > > > > > > > Thus, no matter whether there is this patch or
> > > > > > > > > > > > > not, the performance will jitter in a certain
> > > > > > > > > > > > > period of
> > > > > > > > > > time.
> > > > > > > > > > > >
> > > > > > > > > > > > Local cache should be updated to remove MRs no
> > > > > > > > > > > > longer
> > > valid.
> > > > > > > > > > > > But we just flush the entire cache.
> > > > > > > > > > > > Let's suppose we have valid MR0, MR1, and not
> > > > > > > > > > > > valid MRX in local
> > > > > > > > cache.
> > > > > > > > > > > > And there are traffic in the datapath for MR0 and
> > > > > > > > > > > > MR1, and no traffic for MRX anymore.
> > > > > > > > > > > >
> > > > > > > > > > > > 1) If we do as you propose:
> > > > > > > > > > > > a) take a lock
> > > > > > > > > > > > b) request flush local cache first - all MR0, MR1,
> > > > > > > > > > > > MRX will be removed on translation in datapath
> > > > > > > > > > > > c) update global cache,
> > > > > > > > > > > > d) free lock
> > > > > > > > > > > > All the traffic for valid MR0, MR1 ALWAYS will be
> > > > > > > > > > > > blocked on lock taken for cache update since point
> > > > > > > > > > > > b) till point
> > > > d).
> > > > > > > > > > > >
> > > > > > > > > > > > 2) If we do as it is implemented now:
> > > > > > > > > > > > a) take a lock
> > > > > > > > > > > > b) update global cache
> > > > > > > > > > > > c) request flush local cache
> > > > > > > > > > > > d) free lock
> > > > > > > > > > > > The traffic MIGHT be locked ONLY for MRs
> > > > > > > > > > > > non-existing in local cache (not happens for MR0
> > > > > > > > > > > > and MR1, must not happen for MRX), and probability
> > > > > > > > > > > > should be minor. And lock might happen since
> > > > > > > > > > > > c) till
> > > > > > > > > > > > d)
> > > > > > > > > > > > - quite short period of time
> > > > > > > > > > > >
> > > > > > > > > > > > Summary, the difference between 1) and 2)
> > > > > > > > > > > >
> > > > > > > > > > > > Lock probability:
> > > > > > > > > > > > - 1) lock ALWAYS happen for ANY MR translation after b),
> > > > > > > > > > > >   2) lock MIGHT happen, for cache miss ONLY, after
> > > > > > > > > > > > c)
> > > > > > > > > > > >
> > > > > > > > > > > > Lock duration:
> > > > > > > > > > > > - 1) lock since b) till d),
> > > > > > > > > > > >   2) lock since c) till d), that seems to be  much shorter.
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Finally, in conclusion, I tend to think that the
> > > > > > > > > > > > > bottom layer can do more things to ensure the
> > > > > > > > > > > > > correct execution of the program, which may have
> > > > > > > > > > > > > a negative impact on the performance in a short
> > > > > > > > > > > > > time, but in the long run, the performance will
> > > > > > > > > > > > > eventually
> > > > > > > > > come back.
> > > > > > > > > > > > > Furthermore, maybe we should pay attention to
> > > > > > > > > > > > > the performance in the stable period, and try
> > > > > > > > > > > > > our best to ensure the correctness of the
> > > > > > > > > > > > > program in case of
> > > > > > > > > > > > emergencies.
> > > > > > > > > > > >
> > > > > > > > > > > > If we have some mbufs still allocated in memory
> > > > > > > > > > > > being freed
> > > > > > > > > > > > - there is nothing to say about correctness, it is
> > > > > > > > > > > > totally incorrect. In my opinion, we should not
> > > > > > > > > > > > think how to mitigate this incorrect behavior, we
> > > > > > > > > > > > should not encourage application developers to
> > > > > > > > > > > > follow the wrong
> > > > > > > > > > > approaches.
> > > > > > > > > > > >
> > > > > > > > > > > > With best regards, Slava
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Best Regards
> > > > > > > > > > > > > Feifei
> > > > > > > > > > > > >
> > > > > > > > > > > > > > With best regards, Slava

  reply	other threads:[~2021-05-12 11:07 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-18  7:18 [dpdk-dev] [PATCH v1 0/4] refactor SMP barriers for net/mlx Feifei Wang
2021-03-18  7:18 ` [dpdk-dev] [PATCH v1 1/4] net/mlx4: fix rebuild bug for Memory Region cache Feifei Wang
2021-03-18  7:18 ` [dpdk-dev] [PATCH v1 2/4] net/mlx4: replace SMP barrier with C11 barriers Feifei Wang
2021-03-18  7:18 ` [dpdk-dev] [PATCH v1 3/4] net/mlx5: fix rebuild bug for Memory Region cache Feifei Wang
2021-04-12  8:27   ` Slava Ovsiienko
2021-04-13  5:20     ` [dpdk-dev] 回复: " Feifei Wang
2021-04-19 18:50       ` [dpdk-dev] " Slava Ovsiienko
2021-04-20  5:53         ` [dpdk-dev] 回复: " Feifei Wang
2021-04-20  7:29           ` Feifei Wang
2021-04-20  7:53             ` [dpdk-dev] " Slava Ovsiienko
2021-04-20  8:42               ` [dpdk-dev] 回复: " Feifei Wang
2021-05-06  2:52                 ` Feifei Wang
2021-05-06 11:21                   ` [dpdk-dev] " Slava Ovsiienko
2021-05-07  6:36                     ` [dpdk-dev] 回复: " Feifei Wang
2021-05-07 10:14                       ` [dpdk-dev] " Slava Ovsiienko
2021-05-08  3:13                         ` [dpdk-dev] 回复: " Feifei Wang
2021-05-11  8:18                           ` [dpdk-dev] " Slava Ovsiienko
2021-05-12  5:34                             ` [dpdk-dev] 回复: " Feifei Wang
2021-05-12 11:07                               ` Slava Ovsiienko [this message]
2021-05-13  5:49                                 ` Feifei Wang
2021-05-13 10:49                                   ` [dpdk-dev] " Slava Ovsiienko
2021-05-14  5:18                                     ` [dpdk-dev] 回复: " Feifei Wang
2021-03-18  7:18 ` [dpdk-dev] [PATCH v1 4/4] net/mlx5: replace SMP barriers with C11 barriers Feifei Wang
2021-04-07  1:45 ` [dpdk-dev] [PATCH v1 0/4] refactor SMP barriers for net/mlx Alexander Kozyrev
2021-05-17 10:00 ` [dpdk-dev] [PATCH v2 0/2] remove wmb " Feifei Wang
2021-05-17 10:00   ` [dpdk-dev] [PATCH v2 1/2] net/mlx4: remove unnecessary wmb for Memory Region cache Feifei Wang
2021-05-17 10:00   ` [dpdk-dev] [PATCH v2 2/2] net/mlx5: " Feifei Wang
2021-05-17 14:15     ` Slava Ovsiienko
2021-05-18  8:52       ` [dpdk-dev] 回复: " Feifei Wang
2021-05-18  8:50 ` [dpdk-dev] [PATCH v3 0/2] remove wmb for net/mlx Feifei Wang
2021-05-18  8:50   ` [dpdk-dev] [PATCH v3 1/2] net/mlx4: remove unnecessary wmb for Memory Region cache Feifei Wang
2021-05-18 12:13     ` Slava Ovsiienko
2021-05-18  8:50   ` [dpdk-dev] [PATCH v3 2/2] net/mlx5: " Feifei Wang
2021-05-18 10:17     ` Slava Ovsiienko
2021-05-19  1:54       ` [dpdk-dev] 回复: " Feifei Wang
2021-05-27  8:37   ` [dpdk-dev] [PATCH v3 0/2] remove wmb for net/mlx Raslan Darawsheh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=DM6PR12MB375370B85D291BE7DC8D2A45DF529@DM6PR12MB3753.namprd12.prod.outlook.com \
    --to=viacheslavo@nvidia.com \
    --cc=Feifei.Wang2@arm.com \
    --cc=Ruifeng.Wang@arm.com \
    --cc=dev@dpdk.org \
    --cc=matan@nvidia.com \
    --cc=nd@arm.com \
    --cc=shahafs@nvidia.com \
    --cc=stable@dpdk.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

DPDK patches and discussions

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://inbox.dpdk.org/dev/0 dev/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 dev dev/ https://inbox.dpdk.org/dev \
		dev@dpdk.org
	public-inbox-index dev

Example config snippet for mirrors.
Newsgroup available over NNTP:
	nntp://inbox.dpdk.org/inbox.dpdk.dev


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git