patches for DPDK stable branches
 help / color / mirror / Atom feed
From: Yongseok Koh <yskoh@mellanox.com>
To: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
Cc: Gavin Hu <gavin.hu@arm.com>, dpdk stable <stable@dpdk.org>
Subject: [dpdk-stable] patch 'timer: fix race condition' has been queued to LTS release 17.11.6
Date: Fri,  8 Mar 2019 09:46:52 -0800	[thread overview]
Message-ID: <20190308174749.30771-14-yskoh@mellanox.com> (raw)
In-Reply-To: <20190308174749.30771-1-yskoh@mellanox.com>

Hi,

FYI, your patch has been queued to LTS release 17.11.6

Note it hasn't been pushed to http://dpdk.org/browse/dpdk-stable yet.
It will be pushed if I get no objection by 03/13/19. So please
shout if anyone has objection.

Also note that after the patch there's a diff of the upstream commit vs the patch applied
to the branch. If the code is different (ie: not only metadata diffs), due for example to
a change in context or macro names, please double check it.

Thanks.

Yongseok

---
>From 692b6919438f84d1fb1f422cb56d7dc275601337 Mon Sep 17 00:00:00 2001
From: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
Date: Wed, 19 Dec 2018 10:09:34 -0600
Subject: [PATCH] timer: fix race condition

[ upstream commit 7079e29f7f28661b712620f46e6a8514eb0a708a ]

rte_timer_manage() adds expired timers to a "run list", and walks the
list, transitioning each timer from the PENDING to the RUNNING state.
If another lcore resets or stops the timer at precisely this
moment, the timer state would instead be set to CONFIG by that other
lcore, which would cause timer_manage() to skip over it. This is
expected behavior.

However, if a timer expires quickly enough, there exists the
following race condition that causes the timer_manage() routine to
misinterpret a timer in CONFIG state, resulting in lost timers:

- Thread A:
  - starts a timer with rte_timer_reset()
  - the timer is moved to CONFIG state
  - the spinlock associated with the appropriate skiplist is acquired
  - timer is inserted into the skiplist
  - the spinlock is released
- Thread B:
  - executes rte_timer_manage()
  - find above timer as expired, add it to run list
  - walk run list, see above timer still in CONFIG state, unlink it from
    run list and continue on
- Thread A:
  - move timer to PENDING state
  - return from rte_timer_reset()
  - timer is now in PENDING state, but not actually linked into a
    pending list or a run list and will never get processed further
    by rte_timer_manage()

This commit fixes this race condition by only releasing the spinlock
after the timer state has been transitioned from CONFIG to PENDING,
which prevents rte_timer_manage() from seeing an incorrect state.

Fixes: 9b15ba895b9f ("timer: use a skip list")

Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
---
 lib/librte_timer/rte_timer.c | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/lib/librte_timer/rte_timer.c b/lib/librte_timer/rte_timer.c
index ba436cd87..56f9d6275 100644
--- a/lib/librte_timer/rte_timer.c
+++ b/lib/librte_timer/rte_timer.c
@@ -270,24 +270,17 @@ timer_get_prev_entries_for_node(struct rte_timer *tim, unsigned tim_lcore,
 	}
 }
 
-/*
- * add in list, lock if needed
+/* call with lock held as necessary
+ * add in list
  * timer must be in config state
  * timer must not be in a list
  */
 static void
-timer_add(struct rte_timer *tim, unsigned tim_lcore, int local_is_locked)
+timer_add(struct rte_timer *tim, unsigned int tim_lcore)
 {
-	unsigned lcore_id = rte_lcore_id();
 	unsigned lvl;
 	struct rte_timer *prev[MAX_SKIPLIST_DEPTH+1];
 
-	/* if timer needs to be scheduled on another core, we need to
-	 * lock the list; if it is on local core, we need to lock if
-	 * we are not called from rte_timer_manage() */
-	if (tim_lcore != lcore_id || !local_is_locked)
-		rte_spinlock_lock(&priv_timer[tim_lcore].list_lock);
-
 	/* find where exactly this element goes in the list of elements
 	 * for each depth. */
 	timer_get_prev_entries(tim->expire, tim_lcore, prev);
@@ -311,9 +304,6 @@ timer_add(struct rte_timer *tim, unsigned tim_lcore, int local_is_locked)
 	 * NOTE: this is not atomic on 32-bit*/
 	priv_timer[tim_lcore].pending_head.expire = priv_timer[tim_lcore].\
 			pending_head.sl_next[0]->expire;
-
-	if (tim_lcore != lcore_id || !local_is_locked)
-		rte_spinlock_unlock(&priv_timer[tim_lcore].list_lock);
 }
 
 /*
@@ -408,8 +398,15 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire,
 	tim->f = fct;
 	tim->arg = arg;
 
+	/* if timer needs to be scheduled on another core, we need to
+	 * lock the destination list; if it is on local core, we need to lock if
+	 * we are not called from rte_timer_manage()
+	 */
+	if (tim_lcore != lcore_id || !local_is_locked)
+		rte_spinlock_lock(&priv_timer[tim_lcore].list_lock);
+
 	__TIMER_STAT_ADD(pending, 1);
-	timer_add(tim, tim_lcore, local_is_locked);
+	timer_add(tim, tim_lcore);
 
 	/* update state: as we are in CONFIG state, only us can modify
 	 * the state so we don't need to use cmpset() here */
@@ -418,6 +415,9 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire,
 	status.owner = (int16_t)tim_lcore;
 	tim->status.u32 = status.u32;
 
+	if (tim_lcore != lcore_id || !local_is_locked)
+		rte_spinlock_unlock(&priv_timer[tim_lcore].list_lock);
+
 	return 0;
 }
 
-- 
2.11.0

---
  Diff of the applied patch vs upstream commit (please double-check if non-empty:
---
--- -	2019-03-08 09:46:41.076676989 -0800
+++ 0014-timer-fix-race-condition.patch	2019-03-08 09:46:40.015398000 -0800
@@ -1,8 +1,10 @@
-From 7079e29f7f28661b712620f46e6a8514eb0a708a Mon Sep 17 00:00:00 2001
+From 692b6919438f84d1fb1f422cb56d7dc275601337 Mon Sep 17 00:00:00 2001
 From: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
 Date: Wed, 19 Dec 2018 10:09:34 -0600
 Subject: [PATCH] timer: fix race condition
 
+[ upstream commit 7079e29f7f28661b712620f46e6a8514eb0a708a ]
+
 rte_timer_manage() adds expired timers to a "run list", and walks the
 list, transitioning each timer from the PENDING to the RUNNING state.
 If another lcore resets or stops the timer at precisely this
@@ -37,7 +39,6 @@
 which prevents rte_timer_manage() from seeing an incorrect state.
 
 Fixes: 9b15ba895b9f ("timer: use a skip list")
-Cc: stable@dpdk.org
 
 Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
 Reviewed-by: Gavin Hu <gavin.hu@arm.com>
@@ -46,10 +47,10 @@
  1 file changed, 14 insertions(+), 14 deletions(-)
 
 diff --git a/lib/librte_timer/rte_timer.c b/lib/librte_timer/rte_timer.c
-index 590488c7e..30c7b0ab4 100644
+index ba436cd87..56f9d6275 100644
 --- a/lib/librte_timer/rte_timer.c
 +++ b/lib/librte_timer/rte_timer.c
-@@ -241,24 +241,17 @@ timer_get_prev_entries_for_node(struct rte_timer *tim, unsigned tim_lcore,
+@@ -270,24 +270,17 @@ timer_get_prev_entries_for_node(struct rte_timer *tim, unsigned tim_lcore,
  	}
  }
  
@@ -77,7 +78,7 @@
  	/* find where exactly this element goes in the list of elements
  	 * for each depth. */
  	timer_get_prev_entries(tim->expire, tim_lcore, prev);
-@@ -282,9 +275,6 @@ timer_add(struct rte_timer *tim, unsigned tim_lcore, int local_is_locked)
+@@ -311,9 +304,6 @@ timer_add(struct rte_timer *tim, unsigned tim_lcore, int local_is_locked)
  	 * NOTE: this is not atomic on 32-bit*/
  	priv_timer[tim_lcore].pending_head.expire = priv_timer[tim_lcore].\
  			pending_head.sl_next[0]->expire;
@@ -87,7 +88,7 @@
  }
  
  /*
-@@ -379,8 +369,15 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire,
+@@ -408,8 +398,15 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire,
  	tim->f = fct;
  	tim->arg = arg;
  
@@ -104,7 +105,7 @@
  
  	/* update state: as we are in CONFIG state, only us can modify
  	 * the state so we don't need to use cmpset() here */
-@@ -389,6 +386,9 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire,
+@@ -418,6 +415,9 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire,
  	status.owner = (int16_t)tim_lcore;
  	tim->status.u32 = status.u32;
  

  parent reply	other threads:[~2019-03-08 17:48 UTC|newest]

Thread overview: 75+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-03-08 17:46 [dpdk-stable] patch 'net/bnx2x: cleanup info logs' " Yongseok Koh
2019-03-08 17:46 ` [dpdk-stable] patch 'net/i40e: fix getting RSS configuration' " Yongseok Koh
2019-03-08 17:46 ` [dpdk-stable] patch 'net/i40e: fix using recovery mode firmware' " Yongseok Koh
2019-03-08 17:46 ` [dpdk-stable] patch 'net/ixgbe: fix overwriting RSS RETA' " Yongseok Koh
2019-03-08 17:46 ` [dpdk-stable] patch 'net/mlx5: fix validation of Rx queue number' " Yongseok Koh
2019-03-08 17:46 ` [dpdk-stable] patch 'ethdev: fix typo in queue setup error log' " Yongseok Koh
2019-03-08 17:46 ` [dpdk-stable] patch 'drivers/net: fix several Tx prepare functions' " Yongseok Koh
2019-03-08 17:46 ` [dpdk-stable] patch 'vhost: fix crash after mmap failure' " Yongseok Koh
2019-03-08 17:46 ` [dpdk-stable] patch 'net/i40e: revert fix offload not supported mask' " Yongseok Koh
2019-03-08 17:46 ` [dpdk-stable] patch 'net/i40e: remove redundant reset of queue number' " Yongseok Koh
2019-03-08 17:46 ` [dpdk-stable] patch 'doc: fix garbage text in generated HTML guides' " Yongseok Koh
2019-03-08 17:46 ` [dpdk-stable] patch 'eventdev: fix xstats documentation typo' " Yongseok Koh
2019-03-08 17:46 ` [dpdk-stable] patch 'crypto/qat: fix block size error handling' " Yongseok Koh
2019-03-08 17:46 ` Yongseok Koh [this message]
2019-03-08 17:46 ` [dpdk-stable] patch 'net/i40e: fix statistics inconsistency' " Yongseok Koh
2019-03-08 17:46 ` [dpdk-stable] patch 'vhost: fix race condition when adding fd in the fdset' " Yongseok Koh
2019-03-08 17:46 ` [dpdk-stable] patch 'net/ena: add supported RSS offloads types' " Yongseok Koh
2019-03-08 17:46 ` [dpdk-stable] patch 'net/ena: update completion queue after cleanup' " Yongseok Koh
2019-03-08 17:46 ` [dpdk-stable] patch 'net: fix underflow for checksum of invalid IPv4 packets' " Yongseok Koh
2019-03-08 17:46 ` [dpdk-stable] patch 'net/tap: add buffer overflow checks before checksum' " Yongseok Koh
2019-03-08 17:46 ` [dpdk-stable] patch 'net/af_packet: fix setting MTU decrements sockaddr twice' " Yongseok Koh
2019-03-08 17:47 ` [dpdk-stable] patch 'net/tap: fix possible uninitialized variable access' " Yongseok Koh
2019-03-08 17:47 ` [dpdk-stable] patch 'app/testpmd: expand RED queue thresholds to 64 bits' " Yongseok Koh
2019-03-08 17:47 ` [dpdk-stable] patch 'net/i40e: fix get RSS conf' " Yongseok Koh
2019-03-08 17:47 ` [dpdk-stable] patch 'devtools: fix wrong headline lowercase for arm' " Yongseok Koh
2019-03-08 17:47 ` [dpdk-stable] patch 'drivers/crypto: fix PMDs memory leak' " Yongseok Koh
2019-03-08 17:47 ` [dpdk-stable] patch 'test/crypto: fix misleading trace message' " Yongseok Koh
2019-03-08 17:47 ` [dpdk-stable] patch 'examples/ipsec-secgw: fix outbound codepath for single SA' " Yongseok Koh
2019-03-08 17:47 ` [dpdk-stable] patch 'examples/ipsec-secgw: make local variables static' " Yongseok Koh
2019-03-08 17:47 ` [dpdk-stable] patch 'kni: fix build on RHEL 8' " Yongseok Koh
2019-03-08 17:47 ` [dpdk-stable] patch 'kni: fix build on RHEL8 for arm and Power9' " Yongseok Koh
2019-03-08 17:47 ` [dpdk-stable] patch 'mk: fix scope of disabling AVX512F support' " Yongseok Koh
2019-03-08 17:47 ` [dpdk-stable] patch 'net/ixgbe: fix over using multicast table for VF' " Yongseok Koh
2019-03-08 17:47 ` [dpdk-stable] patch 'vhost: fix possible out of bound access in vector filling' " Yongseok Koh
2019-03-08 17:47 ` [dpdk-stable] patch 'net/fm10k: fix internal switch initial status' " Yongseok Koh
2019-03-08 17:47 ` [dpdk-stable] patch 'net/dpaa: fix secondary process' " Yongseok Koh
2019-03-08 17:47 ` [dpdk-stable] patch 'examples/flow_filtering: fix example documentation' " Yongseok Koh
2019-03-08 17:47 ` [dpdk-stable] patch 'doc: fix a parameter name in testpmd guide' " Yongseok Koh
2019-03-08 17:47 ` [dpdk-stable] patch 'app/testpmd: fix quit to stop all ports before close' " Yongseok Koh
2019-03-08 17:47 ` [dpdk-stable] patch 'net/bonding: fix possible null pointer reference' " Yongseok Koh
2019-03-08 17:47 ` [dpdk-stable] patch 'net/sfc: discard last seen VLAN TCI if Tx packet is dropped' " Yongseok Koh
2019-03-08 17:47 ` [dpdk-stable] patch 'net/dpaa2: fix device init for secondary process' " Yongseok Koh
2019-03-08 17:47 ` [dpdk-stable] patch 'net/sfc: fix typo in preprocessor check' " Yongseok Koh
2019-03-08 17:47 ` [dpdk-stable] patch 'drivers: fix sprintf with snprintf' " Yongseok Koh
2019-03-08 17:47 ` [dpdk-stable] patch 'net/mlx5: fix instruction hotspot on replenishing Rx buffer' " Yongseok Koh
2019-03-08 17:47 ` [dpdk-stable] patch 'examples/tep_term: remove unused constant' " Yongseok Koh
2019-03-08 17:47 ` [dpdk-stable] patch 'eal: fix core number validation' " Yongseok Koh
2019-03-08 17:47 ` [dpdk-stable] patch 'eal: fix out of bound access when no CPU available' " Yongseok Koh
2019-03-08 17:47 ` [dpdk-stable] patch 'eal: check string parameter lengths' " Yongseok Koh
2019-03-08 17:47 ` [dpdk-stable] patch 'gro: check invalid TCP header length' " Yongseok Koh
2019-03-08 17:47 ` [dpdk-stable] patch 'net/i40e: fix VF overwrite PF RSS LUT for X722' " Yongseok Koh
2019-03-08 17:47 ` [dpdk-stable] patch 'net/sfc: fix VF error/missed stats mapping' " Yongseok Koh
2019-03-08 17:47 ` [dpdk-stable] patch 'net/sfc: fix datapath name references in logs' " Yongseok Koh
2019-03-08 17:47 ` [dpdk-stable] patch 'vhost: fix memory leak on realloc failure' " Yongseok Koh
2019-03-08 17:47 ` [dpdk-stable] patch 'examples/vhost: fix path allocation failure handling' " Yongseok Koh
2019-03-08 17:47 ` [dpdk-stable] patch 'net/sfc: fix Rx packets counter' " Yongseok Koh
2019-03-08 17:47 ` [dpdk-stable] patch 'doc: add missing loopback option in testpmd guide' " Yongseok Koh
2019-03-08 17:47 ` [dpdk-stable] patch 'efd: fix tail queue leak' " Yongseok Koh
2019-03-08 17:47 ` [dpdk-stable] patch 'net/qede: fix performance bottleneck in Rx path' " Yongseok Koh
2019-03-12 17:04   ` [dpdk-stable] [EXT] " Shahed Shaikh
2019-03-12 22:03     ` Yongseok Koh
2019-03-19 19:12       ` Shahed Shaikh
2019-03-27 18:47         ` Yongseok Koh
2019-03-08 17:47 ` [dpdk-stable] patch 'net/qede: remove prefetch in Tx " Yongseok Koh
2019-03-08 17:47 ` [dpdk-stable] patch 'doc: fix references in power management guide' " Yongseok Koh
2019-03-08 17:47 ` [dpdk-stable] patch 'ethdev: fix errno to have positive value' " Yongseok Koh
2019-03-08 17:47 ` [dpdk-stable] patch 'gso: fix VxLAN/GRE tunnel checks' " Yongseok Koh
2019-03-08 17:47 ` [dpdk-stable] patch 'kni: fix build for dev_open in Linux 5.0' " Yongseok Koh
2019-03-08 17:47 ` [dpdk-stable] patch 'kni: fix build for igb_ndo_bridge_setlink " Yongseok Koh
2019-03-08 17:47 ` [dpdk-stable] patch 'vfio: fix error message' " Yongseok Koh
2019-03-08 17:47 ` [dpdk-stable] patch 'net/i40e: fix queue region DCB configure' " Yongseok Koh
2019-03-08 17:47 ` [dpdk-stable] patch 'net/virtio-user: fix used ring in cvq handling' " Yongseok Koh
2019-03-08 17:47 ` [dpdk-stable] patch 'net/ena: fix dev init with multi-process' " Yongseok Koh
2019-03-08 17:47 ` [dpdk-stable] patch 'net/ena: fix errno to positive value' " Yongseok Koh
2019-03-08 17:47 ` [dpdk-stable] patch 'doc: add dependency for PDF in contributing guide' " Yongseok Koh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190308174749.30771-14-yskoh@mellanox.com \
    --to=yskoh@mellanox.com \
    --cc=erik.g.carrillo@intel.com \
    --cc=gavin.hu@arm.com \
    --cc=stable@dpdk.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).