DPDK patches and discussions
From: "Medvedkin, Vladimir" <vladimir.medvedkin@intel.com>
To: "Ruifeng Wang (Arm Technology China)" <Ruifeng.Wang@arm.com>,
	Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>,
	"bruce.richardson@intel.com" <bruce.richardson@intel.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>,
	"Gavin Hu (Arm Technology China)" <Gavin.Hu@arm.com>,
	nd <nd@arm.com>
Subject: Re: [dpdk-dev] [PATCH v1 1/2] lib/lpm: memory orderings to avoid race conditions for v1604
Date: Mon, 17 Jun 2019 16:33:51 +0100	[thread overview]
Message-ID: <10f3353e-5dab-0bd3-3e4c-b42080e76fd0@intel.com> (raw)
In-Reply-To: <AM0PR08MB44187EEAA93F5E5AA0871C2C9EEB0@AM0PR08MB4418.eurprd08.prod.outlook.com>

Hi Wang,

On 17/06/2019 16:27, Ruifeng Wang (Arm Technology China) wrote:
> Hi Vladimir,
>
>
> From: Medvedkin, Vladimir <vladimir.medvedkin@intel.com>
> Sent: Monday, June 10, 2019 23:23
> To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; Ruifeng Wang (Arm Technology China) <Ruifeng.Wang@arm.com>; bruce.richardson@intel.com
> Cc: dev@dpdk.org; Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>; nd <nd@arm.com>
> Subject: Re: [PATCH v1 1/2] lib/lpm: memory orderings to avoid race conditions for v1604
>
> Hi Honnappa, Wang,
>
> On 05/06/2019 20:23, Honnappa Nagarahalli wrote:
>
> Hi Wang,
>
> On 05/06/2019 06:54, Ruifeng Wang wrote:
> When a tbl8 group is being attached to a tbl24 entry, a lookup might
> fail even though the entry is configured in the table.
>
> For example, consider an LPM table configured with 10.10.10.1/24.
> When a new entry 10.10.10.32/28 is being added, a new tbl8 group is
> allocated and the tbl24 entry is changed to point to the tbl8 group. If
> the tbl24 entry is written before the tbl8 group entries are updated, a
> lookup on 10.10.10.9 will return failure.
>
> Correct memory orderings are required to ensure that the store to
> tbl24 does not happen before the stores to tbl8 group entries
> complete.
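
To make the requirement concrete, here is a minimal self-contained sketch of the publish pattern (simplified entry type and a hypothetical helper, not the actual rte_lpm code):

#include <stdint.h>

/* Simplified 32-bit table entry standing in for struct rte_lpm_tbl_entry. */
struct tbl_entry {
	uint32_t next_hop    : 24;
	uint32_t valid       : 1;
	uint32_t valid_group : 1;
	uint32_t depth       : 6;
};

/*
 * Writer side: fill the tbl8 group with plain stores first, then publish
 * the tbl24 entry that points to it with a store-release.  A reader that
 * observes the new tbl24 entry is then guaranteed to also observe the
 * initialized tbl8 entries, so the 10.10.10.9 lookup above cannot fail.
 */
static void
publish_tbl8_group(struct tbl_entry *tbl24, uint32_t tbl24_index,
		struct tbl_entry *tbl8, uint32_t group_start,
		uint32_t group_size, struct tbl_entry fill,
		struct tbl_entry new_tbl24_entry)
{
	uint32_t i;

	for (i = group_start; i < group_start + group_size; i++)
		tbl8[i] = fill;			/* step 1: tbl8 entries */

	__atomic_store(&tbl24[tbl24_index], &new_tbl24_entry,
			__ATOMIC_RELEASE);	/* step 2: publish tbl24 last */
}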
>
> The orderings have an impact on the LPM performance tests.
> On an Arm A72 platform, the delete operation shows 2.7% degradation, while
> add / lookup shows no notable performance change.
> On an x86 E5 platform, the add operation shows 4.3% degradation, the delete
> operation shows 2.2% - 10.2% degradation, and lookup shows no performance
> change.
>
> I think it is possible to avoid the add/del performance degradation.
> My understanding was that the degradation on x86 is happening because of the additional compiler barriers this patch introduces. For the Arm platform, the degradation is caused by the store-release memory barriers.
> I just ran some tests on Skylake and Sandy Bridge. On Skylake there is no performance degradation after applying this patchset. On Sandy Bridge there is a performance drop for rte_lpm_add() (from 460k cycles to 530k cycles in the lpm_performance unit test). This is caused by one chunk of this patchset (add_depth_small_v1604()). It looks like after un-inlining this function, performance goes back to the original 460k cycles it had before the patch.
>
> [Ruifeng] Are you suggesting un-inlining add_depth_small_v1604()? I'm OK with such a change, since the function is too big and does not need to be inlined.
That's right. Try un-inlining it (and maybe add_depth_big(), depending
on your set of prefixes) and run your performance tests.
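
For reference, the change we are talking about is only a qualifier change; a sketch in diff form (assuming the current declaration is static inline int32_t, body unchanged):

-static inline int32_t
+static int32_t
 add_depth_small_v1604(struct rte_lpm *lpm, uint32_t ip, uint8_t depth,
 		uint32_t next_hop)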
>
>
> 1. Explicitly mark struct rte_lpm_tbl_entry 4-byte aligned
> The 'rte_lpm_tbl_entry' is already 32 bits wide; shouldn't it be aligned on a 4-byte boundary already?
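
To make point 1 concrete, a sketch of what the explicit marking could look like. This is not what rte_lpm.h has today; the field layout shown is the little-endian variant, and __attribute__((__aligned__)) is the plain-GCC spelling of DPDK's __rte_aligned():

#include <stdint.h>

__extension__
struct rte_lpm_tbl_entry {
	uint32_t next_hop    : 24;
	uint32_t valid       : 1;
	uint32_t valid_group : 1;
	uint32_t depth       : 6;
} __attribute__((__aligned__(sizeof(uint32_t))));

/* Document the assumption that the atomic 32-bit stores rely on. */
_Static_assert(sizeof(struct rte_lpm_tbl_entry) == sizeof(uint32_t),
		"tbl entry must be exactly one aligned 32-bit word");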
>
>
> 2. Cast value to uint32_t (uint16_t for 2.0 version) on memory write
>
> 3. Use rte_wmb() after memory write
> (It would be good to point out the locations in the patch.) I assume you are referring to __atomic_store(__ATOMIC_RELEASE). I am wondering if rte_wmb() is required. My understanding is that x86 would require just a compiler barrier; so, should it be rte_smp_wmb()? __atomic_store(__ATOMIC_RELEASE) just adds a compiler barrier for x86.
> You're right, it needs just a compiler barrier for x86 and a memory barrier instruction (dmb?) for Arm, so rte_smp_wmb() looks as appropriate here as __atomic_store(__ATOMIC_RELEASE).
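
In code, the two forms under discussion look roughly like this (a fragment only, reusing the variables from the patch chunks further down; not a proposal to change the patch):

	/* Variant A: DPDK barrier + plain store.  rte_smp_wmb() is a
	 * compiler barrier on x86 and a store dmb on Arm.
	 */
	rte_smp_wmb();
	lpm->tbl24[i] = new_tbl24_entry;

	/* Variant B: C11 builtin used by the patch.  On x86 the release
	 * store is an ordinary MOV plus a compiler barrier; on Arm it
	 * becomes a store-release, so both variants order the tbl8 stores
	 * before the tbl24 publish.
	 */
	__atomic_store(&lpm->tbl24[i], &new_tbl24_entry, __ATOMIC_RELEASE);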
>
>
>
> Thanks for your suggestions.
> Point 1 & 2 make sense.
>
> For point 3, are you suggesting using rte_wmb() instead of __atomic_store()?
> rte_wmb() belongs to DPDK's own memory model. Maybe we could use __atomic_store()
> when 'RTE_USE_C11_MEM_MODEL=y' is set, and use rte_wmb() otherwise?
> IMO, the code becomes difficult to manage.
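
A hypothetical sketch of that dual-path version, just to illustrate the maintenance concern (not something this patch proposes; every publishing store site would need the same switch):

#ifdef RTE_USE_C11_MEM_MODEL
		__atomic_store(&lpm->tbl24[i], &new_tbl24_entry,
				__ATOMIC_RELEASE);
#else
		rte_smp_wmb();	/* or rte_wmb(), as discussed above */
		lpm->tbl24[i] = new_tbl24_entry;
#endif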
>
>
>
> Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
>    lib/librte_lpm/rte_lpm.c | 32 +++++++++++++++++++++++++-------
>    lib/librte_lpm/rte_lpm.h |  4 ++++
>    2 files changed, 29 insertions(+), 7 deletions(-)
>
> diff --git a/lib/librte_lpm/rte_lpm.c b/lib/librte_lpm/rte_lpm.c
> index 6b7b28a2e..6ec450a08 100644
> --- a/lib/librte_lpm/rte_lpm.c
> +++ b/lib/librte_lpm/rte_lpm.c
> @@ -806,7 +806,8 @@ add_depth_small_v1604(struct rte_lpm *lpm, uint32_t ip, uint8_t depth,
>    			/* Setting tbl24 entry in one go to avoid race
>    			 * conditions
>    			 */
> -			lpm->tbl24[i] = new_tbl24_entry;
> +			__atomic_store(&lpm->tbl24[i], &new_tbl24_entry,
> +					__ATOMIC_RELEASE);
> I don't see a reordering issue here in this patch chunk. However, the direct assignment was translated into two separate MOV stores:
> mov    (%rdi,%rcx,4),%edx  <-- get lpm->tbl24[i]
> and    $0xff000000,%edx    <-- clean .next_hop
> or     %r9d,%edx        <-- save new next_hop
> mov    %edx,(%rdi,%rcx,4)  <-- save an entry with new next_hop but old depth and valid bitfields
> mov    %r11b,0x3(%rdi,%rcx,4)  <-- save new depth and valid bitfields
> so I agree with __atomic_store() here.
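
To spell out why the single store matters: the entry is built in full on the stack and then written with one aligned 32-bit store, so a reader can never see the intermediate state visible between the last two MOVs above (new next_hop with stale depth/valid bits). Roughly the shape the patched code ends up with (entry initialization abbreviated from the existing function):

	struct rte_lpm_tbl_entry new_tbl24_entry = {
		.next_hop = next_hop,
		.valid = VALID,
		.valid_group = 0,
		.depth = depth,
	};

	/* One aligned 32-bit store replaces the load/and/or/store sequence. */
	__atomic_store(&lpm->tbl24[i], &new_tbl24_entry, __ATOMIC_RELEASE);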
>
>
>    			continue;
>    		}
> @@ -1017,7 +1018,11 @@ add_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth,
>    			.depth = 0,
>    		};
>
> -		lpm->tbl24[tbl24_index] = new_tbl24_entry;
> +		/* The tbl24 entry must be written only after the
> +		 * tbl8 entries are written.
> +		 */
> +		__atomic_store(&lpm->tbl24[tbl24_index], &new_tbl24_entry,
> +				__ATOMIC_RELEASE);
>
>    	} /* If valid entry but not extended calculate the index into Table8. */
>    	else if (lpm->tbl24[tbl24_index].valid_group == 0) {
> @@ -1063,7 +1068,11 @@ add_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth,
>    				.depth = 0,
>    		};
>
> -		lpm->tbl24[tbl24_index] = new_tbl24_entry;
> +		/* The tbl24 entry must be written only after the
> +		 * tbl8 entries are written.
> +		 */
> +		__atomic_store(&lpm->tbl24[tbl24_index], &new_tbl24_entry,
> +				__ATOMIC_RELEASE);
>
>    	} else { /*
>    		* If it is valid, extended entry calculate the index into tbl8.
> @@ -1391,6 +1400,7 @@ delete_depth_small_v1604(struct rte_lpm *lpm, uint32_t ip_masked,
>    	/* Calculate the range and index into Table24. */
>    	tbl24_range = depth_to_range(depth);
>    	tbl24_index = (ip_masked >> 8);
> +	struct rte_lpm_tbl_entry zero_tbl24_entry = {0};
>
>    	/*
>    	 * Firstly check the sub_rule_index. A -1 indicates no replacement rule
> @@ -1405,7 +1415,8 @@ delete_depth_small_v1604(struct rte_lpm *lpm, uint32_t ip_masked,
>
>    			if (lpm->tbl24[i].valid_group == 0 &&
>    					lpm->tbl24[i].depth <= depth) {
> -				lpm->tbl24[i].valid = INVALID;
> +				__atomic_store(&lpm->tbl24[i],
> +					&zero_tbl24_entry, __ATOMIC_RELEASE);
>    			} else if (lpm->tbl24[i].valid_group == 1) {
>    				/*
>    				 * If TBL24 entry is extended, then there has
> @@ -1450,7 +1461,8 @@ delete_depth_small_v1604(struct rte_lpm *lpm, uint32_t ip_masked,
>
>    			if (lpm->tbl24[i].valid_group == 0 &&
>    					lpm->tbl24[i].depth <= depth) {
> -				lpm->tbl24[i] = new_tbl24_entry;
> +				__atomic_store(&lpm->tbl24[i], &new_tbl24_entry,
> +						__ATOMIC_RELEASE);
>    			} else  if (lpm->tbl24[i].valid_group == 1) {
>    				/*
>    				 * If TBL24 entry is extended, then there has
> @@ -1713,8 +1725,11 @@ delete_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked,
>    	tbl8_recycle_index = tbl8_recycle_check_v1604(lpm->tbl8,
>    			tbl8_group_start);
>
>    	if (tbl8_recycle_index == -EINVAL) {
> -		/* Set tbl24 before freeing tbl8 to avoid race condition. */
> +		/* Set tbl24 before freeing tbl8 to avoid race condition.
> +		 * Prevent the free of the tbl8 group from hoisting.
> +		 */
>    		lpm->tbl24[tbl24_index].valid = 0;
> +		__atomic_thread_fence(__ATOMIC_RELEASE);
>    		tbl8_free_v1604(lpm->tbl8, tbl8_group_start);
>    	} else if (tbl8_recycle_index > -1) {
>    		/* Update tbl24 entry. */
> @@ -1725,8 +1740,11 @@ delete_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked,
>    			.depth = lpm->tbl8[tbl8_recycle_index].depth,
>    		};
>
> -		/* Set tbl24 before freeing tbl8 to avoid race condition. */
> +		/* Set tbl24 before freeing tbl8 to avoid race condition.
> +		 * Prevent the free of the tbl8 group from hoisting.
> +		 */
>    		lpm->tbl24[tbl24_index] = new_tbl24_entry;
> +		__atomic_thread_fence(__ATOMIC_RELEASE);
>    		tbl8_free_v1604(lpm->tbl8, tbl8_group_start);
>    	}
>    #undef group_idx
> diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h
> index b886f54b4..6f5704c5c 100644
> --- a/lib/librte_lpm/rte_lpm.h
> +++ b/lib/librte_lpm/rte_lpm.h
> @@ -354,6 +354,10 @@ rte_lpm_lookup(struct rte_lpm *lpm, uint32_t ip, uint32_t *next_hop)
>    	ptbl = (const uint32_t *)(&lpm->tbl24[tbl24_index]);
>    	tbl_entry = *ptbl;
>
> +	/* Memory ordering is not required in lookup. Because dataflow
> +	 * dependency exists, compiler or HW won't be able to re-order
> +	 * the operations.
> +	 */
>    	/* Copy tbl8 entry (only if needed) */
>    	if (unlikely((tbl_entry & RTE_LPM_VALID_EXT_ENTRY_BITMASK) ==
>    			RTE_LPM_VALID_EXT_ENTRY_BITMASK)) {
>
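
For completeness, a sketch of the reader side this relies on, abbreviated from the existing rte_lpm_lookup(): the tbl8 address is computed from the value just loaded from tbl24, so the two loads are address-dependent and no acquire barrier is needed to pair with the writer's store-release.

	uint32_t tbl_entry;
	const uint32_t *ptbl;

	/* Load the whole tbl24 entry as a single 32-bit value. */
	ptbl = (const uint32_t *)&lpm->tbl24[ip >> 8];
	tbl_entry = *ptbl;

	if (unlikely((tbl_entry & RTE_LPM_VALID_EXT_ENTRY_BITMASK) ==
			RTE_LPM_VALID_EXT_ENTRY_BITMASK)) {
		/* tbl8_index depends on tbl_entry, hence on the load above. */
		unsigned int tbl8_index = (uint8_t)ip +
				(((uint32_t)tbl_entry & 0x00FFFFFF) *
				RTE_LPM_TBL8_GROUP_NUM_ENTRIES);

		ptbl = (const uint32_t *)&lpm->tbl8[tbl8_index];
		tbl_entry = *ptbl;
	}
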
> --
> Regards,
> Vladimir
>
> Regards,
> /Ruifeng

-- 
Regards,
Vladimir



Thread overview: 24+ messages
2019-06-05  5:54 Ruifeng Wang
2019-06-05  5:54 ` [dpdk-dev] [PATCH v1 2/2] lib/lpm: memory orderings to avoid race conditions for v20 Ruifeng Wang
2019-06-05 10:50 ` [dpdk-dev] [PATCH v1 1/2] lib/lpm: memory orderings to avoid race conditions for v1604 Medvedkin, Vladimir
2019-06-05 14:12   ` Ruifeng Wang (Arm Technology China)
2019-06-05 19:23     ` Honnappa Nagarahalli
2019-06-10 15:22       ` Medvedkin, Vladimir
2019-06-17 15:27         ` Ruifeng Wang (Arm Technology China)
2019-06-17 15:33           ` Medvedkin, Vladimir [this message]
2019-07-12  3:09 ` [dpdk-dev] [PATCH v5 0/6] LPM4 memory ordering changes Ruifeng Wang
2019-07-12  3:09   ` [dpdk-dev] [PATCH v5 1/6] lib/lpm: not inline unnecessary functions Ruifeng Wang
2019-07-12  3:09   ` [dpdk-dev] [PATCH v5 2/6] lib/lpm: memory orderings to avoid race conditions for v1604 Ruifeng Wang
2019-07-12  3:09   ` [dpdk-dev] [PATCH v5 3/6] lib/lpm: memory orderings to avoid race conditions for v20 Ruifeng Wang
2019-07-12  3:09   ` [dpdk-dev] [PATCH v5 4/6] lib/lpm: use atomic store to avoid partial update Ruifeng Wang
2019-07-12  3:09   ` [dpdk-dev] [PATCH v5 5/6] lib/lpm: data update optimization for v1604 Ruifeng Wang
2019-07-12 20:08     ` Honnappa Nagarahalli
2019-07-12  3:09   ` [dpdk-dev] [PATCH v5 6/6] lib/lpm: data update optimization for v20 Ruifeng Wang
2019-07-12 20:09     ` Honnappa Nagarahalli
2019-07-18  6:22 ` [dpdk-dev] [PATCH v6 0/4] LPM4 memory ordering changes Ruifeng Wang
2019-07-18  6:22   ` [dpdk-dev] [PATCH v6 1/4] lib/lpm: not inline unnecessary functions Ruifeng Wang
2019-07-18  6:22   ` [dpdk-dev] [PATCH v6 2/4] lib/lpm: memory orderings to avoid race conditions for v1604 Ruifeng Wang
2019-07-18  6:22   ` [dpdk-dev] [PATCH v6 3/4] lib/lpm: memory orderings to avoid race conditions for v20 Ruifeng Wang
2019-07-18  6:22   ` [dpdk-dev] [PATCH v6 4/4] lib/lpm: use atomic store to avoid partial update Ruifeng Wang
2019-07-18 14:00   ` [dpdk-dev] [PATCH v6 0/4] LPM4 memory ordering changes Medvedkin, Vladimir
2019-07-19 10:37     ` Thomas Monjalon
