DPDK patches and discussions
From: "Medvedkin, Vladimir" <vladimir.medvedkin@intel.com>
To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>,
	"Ruifeng Wang (Arm Technology China)" <Ruifeng.Wang@arm.com>,
	"bruce.richardson@intel.com" <bruce.richardson@intel.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>,
	"Gavin Hu (Arm Technology China)" <Gavin.Hu@arm.com>,
	nd <nd@arm.com>
Subject: Re: [dpdk-dev] [PATCH v1 1/2] lib/lpm: memory orderings to avoid race conditions for v1604
Date: Mon, 10 Jun 2019 16:22:45 +0100	[thread overview]
Message-ID: <d5d563ab-0411-3faf-39ec-4994f2bc9f6f@intel.com> (raw)
In-Reply-To: <AM0PR08MB513828AB6F3ADA5BB20C0CE798160@AM0PR08MB5138.eurprd08.prod.outlook.com>

Hi Honnappa, Wang,

On 05/06/2019 20:23, Honnappa Nagarahalli wrote:
>>> Hi Wang,
>>>
>>> On 05/06/2019 06:54, Ruifeng Wang wrote:
>>>> When a tbl8 group is getting attached to a tbl24 entry, a lookup
>>>> might fail even though the entry is configured in the table.
>>>>
>>>> For example, consider an LPM table configured with 10.10.10.1/24.
>>>> When a new entry 10.10.10.32/28 is being added, a new tbl8 group is
>>>> allocated and the tbl24 entry is changed to point to the tbl8 group. If
>>>> the tbl24 entry is written before the tbl8 group entries are updated, a
>>>> lookup on 10.10.10.9 will return a failure.
>>>>
>>>> Correct memory orderings are required to ensure that the store to
>>>> tbl24 does not happen before the stores to tbl8 group entries
>>>> complete.
>>>>
>>>> The orderings have an impact on the LPM performance test.
>>>> On the Arm A72 platform, the delete operation shows 2.7% degradation,
>>>> while add / lookup shows no notable performance change.
>>>> On the x86 E5 platform, the add operation shows 4.3% degradation, the
>>>> delete operation shows 2.2% - 10.2% degradation, and lookup shows no
>>>> performance change.
>>> I think it is possible to avoid add/del performance degradation
> My understanding was that the degradation on x86 is happening because of the additional compiler barriers this patch introduces. For the Arm platform, the degradation is caused by the store-release memory barriers.
Just made some tests on Skylake and Sandy Bridge. On Skylake there is 
no performance degradation after applying this patchset. On Sandy 
Bridge there is a performance drop for rte_lpm_add() (from 460k cycles 
to 530k cycles in the lpm_performance unit test). This is caused by one 
chunk of this patchset (add_depth_small_v1604()), and it looks like 
after uninlining this function, performance goes back to the original 
460k cycles it was at before the patch.
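To illustrate what I mean by uninlining, a minimal sketch of the 
declaration change (only a sketch, assuming the current `static inline` 
qualifier in rte_lpm.c; the noinline attribute is just one way to do 
it, the actual patch may simply drop the inline keyword):

#include <rte_lpm.h>

/* Currently (roughly):
 *     static inline int32_t
 *     add_depth_small_v1604(struct rte_lpm *lpm, uint32_t ip,
 *                           uint8_t depth, uint32_t next_hop);
 * Keeping it out of line instead, so the hot caller rte_lpm_add()
 * stays compact; the call overhead is negligible on the control-plane
 * add path.
 */
static __attribute__((noinline)) int32_t
add_depth_small_v1604(struct rte_lpm *lpm, uint32_t ip, uint8_t depth,
		uint32_t next_hop);
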
>
>>> 1. Explicitly mark struct rte_lpm_tbl_entry 4-byte aligned
> The 'rte_lpm_tbl_entry' is already 32b, shouldn't it be aligned on a 4-byte boundary already?
>
>>> 2. Cast value to uint32_t (uint16_t for 2.0 version) on memory write
>>>
>>> 3. Use rte_wmb() after memory write
> (It would be good to point out the locations in the patch.) I assume you are referring to __atomic_store(__ATOMIC_RELEASE). I am wondering if rte_wmb() is required? My understanding is that x86 would require just a compiler barrier. So, should it be rte_smp_wmb()? __atomic_store(__ATOMIC_RELEASE) just adds a compiler barrier for x86.
You're right, it needs just a compiler barrier for x86 and a memory 
barrier instruction (dmb?) for arm, so rte_smp_wmb() looks appropriate 
here, as does __atomic_store(__ATOMIC_RELEASE).
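To make the two options concrete, a minimal sketch of both idioms (the 
slot/entry parameters are placeholders for the tbl24/tbl8 entry being 
published, not the real LPM code):

#include <stdint.h>
#include <rte_atomic.h>

/* C11-style release store: on x86 this compiles to a compiler barrier
 * plus a plain store; on arm it becomes an ordered store or a barrier
 * followed by a store.
 */
static void
publish_entry_c11(uint32_t *slot, uint32_t entry)
{
	__atomic_store_n(slot, entry, __ATOMIC_RELEASE);
}

/* DPDK barrier API: rte_smp_wmb() orders all earlier stores (e.g. the
 * tbl8 group writes) before the store that publishes the entry.
 */
static void
publish_entry_wmb(uint32_t *slot, uint32_t entry)
{
	rte_smp_wmb();
	*slot = entry;
}

Either form gives the ordering needed here.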
>
>> Thanks for your suggestions.
>> Point 1 & 2 make sense.
>>
>> For point 3, are you suggesting using rte_wmb() instead of __atomic_store()?
>> rte_wmb() is part of DPDK's own memory model. Maybe we can use __atomic_store()
>> with 'RTE_USE_C11_MEM_MODEL=y', and use rte_wmb() otherwise?
> IMO, code becomes difficult to manage.
>
>>>> Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
>>>> Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
>>>> ---
>>>>    lib/librte_lpm/rte_lpm.c | 32 +++++++++++++++++++++++++-------
>>>>    lib/librte_lpm/rte_lpm.h |  4 ++++
>>>>    2 files changed, 29 insertions(+), 7 deletions(-)
>>>>
>>>> diff --git a/lib/librte_lpm/rte_lpm.c b/lib/librte_lpm/rte_lpm.c
>>>> index
>>>> 6b7b28a2e..6ec450a08 100644
>>>> --- a/lib/librte_lpm/rte_lpm.c
>>>> +++ b/lib/librte_lpm/rte_lpm.c
>>>> @@ -806,7 +806,8 @@ add_depth_small_v1604(struct rte_lpm *lpm,
>>> uint32_t ip, uint8_t depth,
>>>>    			/* Setting tbl24 entry in one go to avoid race
>>>>    			 * conditions
>>>>    			 */
>>>> -			lpm->tbl24[i] = new_tbl24_entry;
>>>> +			__atomic_store(&lpm->tbl24[i], &new_tbl24_entry,
>>>> +					__ATOMIC_RELEASE);

I don't see a reordering issue in this patch chunk. However, the direct 
assignment was translated into two separate MOV stores:

mov    (%rdi,%rcx,4),%edx      <-- get lpm->tbl24[i]
and    $0xff000000,%edx        <-- clear .next_hop
or     %r9d,%edx               <-- or in the new next_hop
mov    %edx,(%rdi,%rcx,4)      <-- store the entry with the new next_hop
                                   but the old depth and valid bitfields
mov    %r11b,0x3(%rdi,%rcx,4)  <-- store the new depth and valid bitfields

so I agree with __atomic_store() here.
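
For completeness, a minimal sketch of what the single-store version 
looks like (not the exact code from the patch; the field values are 
only illustrative):

#include <rte_lpm.h>

/* Build the new entry locally and publish it with one 32-bit release
 * store, so a concurrent reader can never observe the intermediate
 * state from the disassembly above (new next_hop with stale depth and
 * valid bitfields).
 */
static void
set_tbl24_entry(struct rte_lpm *lpm, uint32_t index, uint32_t next_hop,
		uint8_t depth)
{
	struct rte_lpm_tbl_entry new_entry = {
		.valid = 1,
		.valid_group = 0,
		.depth = depth,
		.next_hop = next_hop,
	};

	__atomic_store(&lpm->tbl24[index], &new_entry, __ATOMIC_RELEASE);
}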

>>>>
>>>>    			continue;
>>>>    		}
>>>> @@ -1017,7 +1018,11 @@ add_depth_big_v1604(struct rte_lpm *lpm,
>>> uint32_t ip_masked, uint8_t depth,
>>>>    			.depth = 0,
>>>>    		};
>>>>
>>>> -		lpm->tbl24[tbl24_index] = new_tbl24_entry;
>>>> +		/* The tbl24 entry must be written only after the
>>>> +		 * tbl8 entries are written.
>>>> +		 */
>>>> +		__atomic_store(&lpm->tbl24[tbl24_index],
>>> &new_tbl24_entry,
>>>> +				__ATOMIC_RELEASE);
>>>>
>>>>    	} /* If valid entry but not extended calculate the index into Table8. */
>>>>    	else if (lpm->tbl24[tbl24_index].valid_group == 0) { @@ -1063,7
>>>> +1068,11 @@ add_depth_big_v1604(struct rte_lpm *lpm, uint32_t
>>> ip_masked, uint8_t depth,
>>>>    				.depth = 0,
>>>>    		};
>>>>
>>>> -		lpm->tbl24[tbl24_index] = new_tbl24_entry;
>>>> +		/* The tbl24 entry must be written only after the
>>>> +		 * tbl8 entries are written.
>>>> +		 */
>>>> +		__atomic_store(&lpm->tbl24[tbl24_index],
>>> &new_tbl24_entry,
>>>> +				__ATOMIC_RELEASE);
>>>>
>>>>    	} else { /*
>>>>    		* If it is valid, extended entry calculate the index into tbl8.
>>>> @@ -1391,6 +1400,7 @@ delete_depth_small_v1604(struct rte_lpm *lpm,
>>> uint32_t ip_masked,
>>>>    	/* Calculate the range and index into Table24. */
>>>>    	tbl24_range = depth_to_range(depth);
>>>>    	tbl24_index = (ip_masked >> 8);
>>>> +	struct rte_lpm_tbl_entry zero_tbl24_entry = {0};
>>>>
>>>>    	/*
>>>>    	 * Firstly check the sub_rule_index. A -1 indicates no
>>>> replacement rule @@ -1405,7 +1415,8 @@
>>>> delete_depth_small_v1604(struct rte_lpm *lpm, uint32_t ip_masked,
>>>>
>>>>    			if (lpm->tbl24[i].valid_group == 0 &&
>>>>    					lpm->tbl24[i].depth <= depth) {
>>>> -				lpm->tbl24[i].valid = INVALID;
>>>> +				__atomic_store(&lpm->tbl24[i],
>>>> +					&zero_tbl24_entry,
>>> __ATOMIC_RELEASE);
>>>>    			} else if (lpm->tbl24[i].valid_group == 1) {
>>>>    				/*
>>>>    				 * If TBL24 entry is extended, then there has
>>> @@ -1450,7 +1461,8
>>>> @@ delete_depth_small_v1604(struct rte_lpm *lpm, uint32_t ip_masked,
>>>>
>>>>    			if (lpm->tbl24[i].valid_group == 0 &&
>>>>    					lpm->tbl24[i].depth <= depth) {
>>>> -				lpm->tbl24[i] = new_tbl24_entry;
>>>> +				__atomic_store(&lpm->tbl24[i],
>>> &new_tbl24_entry,
>>>> +						__ATOMIC_RELEASE);
>>>>    			} else  if (lpm->tbl24[i].valid_group == 1) {
>>>>    				/*
>>>>    				 * If TBL24 entry is extended, then there has
>>> @@ -1713,8
>>>> +1725,11 @@ delete_depth_big_v1604(struct rte_lpm *lpm, uint32_t
>>> ip_masked,
>>>>    	tbl8_recycle_index = tbl8_recycle_check_v1604(lpm->tbl8,
>>>> tbl8_group_start);
>>>>
>>>>    	if (tbl8_recycle_index == -EINVAL) {
>>>> -		/* Set tbl24 before freeing tbl8 to avoid race condition. */
>>>> +		/* Set tbl24 before freeing tbl8 to avoid race condition.
>>>> +		 * Prevent the free of the tbl8 group from hoisting.
>>>> +		 */
>>>>    		lpm->tbl24[tbl24_index].valid = 0;
>>>> +		__atomic_thread_fence(__ATOMIC_RELEASE);
>>>>    		tbl8_free_v1604(lpm->tbl8, tbl8_group_start);
>>>>    	} else if (tbl8_recycle_index > -1) {
>>>>    		/* Update tbl24 entry. */
>>>> @@ -1725,8 +1740,11 @@ delete_depth_big_v1604(struct rte_lpm *lpm,
>>> uint32_t ip_masked,
>>>>    			.depth = lpm->tbl8[tbl8_recycle_index].depth,
>>>>    		};
>>>>
>>>> -		/* Set tbl24 before freeing tbl8 to avoid race condition. */
>>>> +		/* Set tbl24 before freeing tbl8 to avoid race condition.
>>>> +		 * Prevent the free of the tbl8 group from hoisting.
>>>> +		 */
>>>>    		lpm->tbl24[tbl24_index] = new_tbl24_entry;
>>>> +		__atomic_thread_fence(__ATOMIC_RELEASE);
>>>>    		tbl8_free_v1604(lpm->tbl8, tbl8_group_start);
>>>>    	}
>>>>    #undef group_idx
>>>> diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h
>>>> index b886f54b4..6f5704c5c 100644
>>>> --- a/lib/librte_lpm/rte_lpm.h
>>>> +++ b/lib/librte_lpm/rte_lpm.h
>>>> @@ -354,6 +354,10 @@ rte_lpm_lookup(struct rte_lpm *lpm, uint32_t
>>>> ip,
>>> uint32_t *next_hop)
>>>>    	ptbl = (const uint32_t *)(&lpm->tbl24[tbl24_index]);
>>>>    	tbl_entry = *ptbl;
>>>>
>>>> +	/* Memory ordering is not required in lookup. Because dataflow
>>>> +	 * dependency exists, compiler or HW won't be able to re-order
>>>> +	 * the operations.
>>>> +	 */
>>>>    	/* Copy tbl8 entry (only if needed) */
>>>>    	if (unlikely((tbl_entry & RTE_LPM_VALID_EXT_ENTRY_BITMASK) ==
>>>>    			RTE_LPM_VALID_EXT_ENTRY_BITMASK)) {
>>> --
>>> Regards,
>>> Vladimir
>> Regards,
>> /Ruifeng

-- 
Regards,
Vladimir



Thread overview: 24+ messages
2019-06-05  5:54 Ruifeng Wang
2019-06-05  5:54 ` [dpdk-dev] [PATCH v1 2/2] lib/lpm: memory orderings to avoid race conditions for v20 Ruifeng Wang
2019-06-05 10:50 ` [dpdk-dev] [PATCH v1 1/2] lib/lpm: memory orderings to avoid race conditions for v1604 Medvedkin, Vladimir
2019-06-05 14:12   ` Ruifeng Wang (Arm Technology China)
2019-06-05 19:23     ` Honnappa Nagarahalli
2019-06-10 15:22       ` Medvedkin, Vladimir [this message]
2019-06-17 15:27         ` Ruifeng Wang (Arm Technology China)
2019-06-17 15:33           ` Medvedkin, Vladimir
2019-07-12  3:09 ` [dpdk-dev] [PATCH v5 0/6] LPM4 memory ordering changes Ruifeng Wang
2019-07-12  3:09   ` [dpdk-dev] [PATCH v5 1/6] lib/lpm: not inline unnecessary functions Ruifeng Wang
2019-07-12  3:09   ` [dpdk-dev] [PATCH v5 2/6] lib/lpm: memory orderings to avoid race conditions for v1604 Ruifeng Wang
2019-07-12  3:09   ` [dpdk-dev] [PATCH v5 3/6] lib/lpm: memory orderings to avoid race conditions for v20 Ruifeng Wang
2019-07-12  3:09   ` [dpdk-dev] [PATCH v5 4/6] lib/lpm: use atomic store to avoid partial update Ruifeng Wang
2019-07-12  3:09   ` [dpdk-dev] [PATCH v5 5/6] lib/lpm: data update optimization for v1604 Ruifeng Wang
2019-07-12 20:08     ` Honnappa Nagarahalli
2019-07-12  3:09   ` [dpdk-dev] [PATCH v5 6/6] lib/lpm: data update optimization for v20 Ruifeng Wang
2019-07-12 20:09     ` Honnappa Nagarahalli
2019-07-18  6:22 ` [dpdk-dev] [PATCH v6 0/4] LPM4 memory ordering changes Ruifeng Wang
2019-07-18  6:22   ` [dpdk-dev] [PATCH v6 1/4] lib/lpm: not inline unnecessary functions Ruifeng Wang
2019-07-18  6:22   ` [dpdk-dev] [PATCH v6 2/4] lib/lpm: memory orderings to avoid race conditions for v1604 Ruifeng Wang
2019-07-18  6:22   ` [dpdk-dev] [PATCH v6 3/4] lib/lpm: memory orderings to avoid race conditions for v20 Ruifeng Wang
2019-07-18  6:22   ` [dpdk-dev] [PATCH v6 4/4] lib/lpm: use atomic store to avoid partial update Ruifeng Wang
2019-07-18 14:00   ` [dpdk-dev] [PATCH v6 0/4] LPM4 memory ordering changes Medvedkin, Vladimir
2019-07-19 10:37     ` Thomas Monjalon
