From: rsanford2@gmail.com
To: dev@dpdk.org
Date: Mon, 27 Jul 2015 18:46:03 -0400
Message-Id: <1438037168-639-1-git-send-email-rsanford2@gmail.com>
In-Reply-To: <1437691347-58708-1-git-send-email-rsanford2@gmail.com>
References: <1437691347-58708-1-git-send-email-rsanford2@gmail.com>
Subject: [dpdk-dev] [PATCH v2 0/3] timer: fix rte_timer_manage and improve unit tests
X-Mailer: git-send-email 1.7.1
List-Id: patches and discussions about DPDK

From: Robert Sanford

This patchset fixes a bug in timer stress test 2, adds a new stress test
to expose a race condition bug in API
rte_timer_manage(), and then fixes the rte_timer_manage() bug.

Description of rte_timer_manage() race condition bug: Through code
inspection, we notice a potential problem in rte_timer_manage() that
leads to corruption of per-lcore pending-lists (implemented as
skip-lists). The race condition occurs when rte_timer_manage() expires
multiple timers on lcore A, while lcore B simultaneously invokes
rte_timer_reset() for one of the expiring timers (other than the first
one).

Lcore A splits its pending-list, creating a local list of expired timers
linked through their sl_next[0] pointers, and sets the first expired
timer to the RUNNING state, all during one list-lock round trip. Lcore A
then unlocks the list-lock to run the first callback, and that is when
A and B can have different interpretations of the subsequent expired
timers' true state. Lcore B sees an expired timer still in the PENDING
state, atomically changes the timer to the CONFIG state, locks lcore A's
list-lock, and reinserts the timer into A's pending-list. The two lcores
try to use the same next-pointers to maintain both lists!

v2 changes:
Move patch descriptions to their respective patches.
Correct checkpatch warnings.

Robert Sanford (3):
  fix stress test 2 sync bug
  add timer manage race condition test
  fix race condition in rte_timer_manage

 app/test/Makefile              |   1 +
 app/test/test_timer.c          | 154 +++++++++++++++++++++++-------
 app/test/test_timer_racecond.c | 209 ++++++++++++++++++++++++++++++++++++++++
 lib/librte_timer/rte_timer.c   |  56 +++++++----
 4 files changed, 366 insertions(+), 54 deletions(-)
 create mode 100644 app/test/test_timer_racecond.c
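
The interleaving described in the race condition paragraph can be
replayed deterministically in a single thread. What follows is only a
toy model under stated assumptions, not code from lib/librte_timer: a
plain singly linked list stands in for the skip-list, locking is implied
rather than implemented, and the names toy_timer, split_expired,
reset_timer and simulate_race are hypothetical, chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the timer states named in the description. */
enum tstate { T_STOP, T_PENDING, T_RUNNING, T_CONFIG };

struct toy_timer {
	unsigned expire;        /* expiry tick */
	enum tstate state;
	struct toy_timer *next; /* plays the role of sl_next[0] */
};

/* Sorted insert into a pending-list; the caller "holds the list-lock". */
static void pending_insert(struct toy_timer **head, struct toy_timer *t)
{
	struct toy_timer **pp = head;

	while (*pp != NULL && (*pp)->expire <= t->expire)
		pp = &(*pp)->next;
	t->next = *pp;
	*pp = t;
	t->state = T_PENDING;
}

/* Unlink t if it is reachable from head; otherwise do nothing. */
static void pending_remove(struct toy_timer **head, struct toy_timer *t)
{
	struct toy_timer **pp = head;

	while (*pp != NULL && *pp != t)
		pp = &(*pp)->next;
	if (*pp == t)
		*pp = t->next;
}

/* Lcore A: detach every timer with expire <= now into a local run-list
 * and mark only the first one RUNNING; the others stay PENDING. */
static struct toy_timer *split_expired(struct toy_timer **head, unsigned now)
{
	struct toy_timer *run = *head, *last = NULL, *t;

	for (t = *head; t != NULL && t->expire <= now; t = t->next)
		last = t;
	if (last == NULL)
		return NULL;
	*head = last->next;
	last->next = NULL;
	run->state = T_RUNNING;
	return run;
}

/* Lcore B: reset a timer that still looks PENDING on lcore A. */
static void reset_timer(struct toy_timer **head, struct toy_timer *t,
			unsigned expire)
{
	if (t->state != T_PENDING)
		return;
	t->state = T_CONFIG;
	pending_remove(head, t);  /* finds nothing: t sits on A's run-list */
	t->expire = expire;
	pending_insert(head, t);  /* overwrites t->next, which A still uses */
}

static int list_len(const struct toy_timer *t)
{
	int n = 0;

	for (; t != NULL; t = t->next)
		n++;
	return n;
}

static int list_has(const struct toy_timer *l, const struct toy_timer *x)
{
	for (; l != NULL; l = l->next)
		if (l == x)
			return 1;
	return 0;
}

struct race_result {
	int run_len;     /* timers reachable from A's run-list afterwards */
	int run_has_t3;  /* is expired timer t3 still reachable? */
	int run_has_t4;  /* is unexpired timer t4 reachable? */
	int t2_on_both;  /* is t2 on the run-list and the pending-list? */
};

static struct race_result simulate_race(void)
{
	struct toy_timer t1 = {10, T_STOP, NULL}, t2 = {20, T_STOP, NULL};
	struct toy_timer t3 = {30, T_STOP, NULL}, t4 = {100, T_STOP, NULL};
	struct toy_timer *pending = NULL, *run;
	struct race_result r;

	pending_insert(&pending, &t1);
	pending_insert(&pending, &t2);
	pending_insert(&pending, &t3);
	pending_insert(&pending, &t4);
	assert(list_len(pending) == 4);

	run = split_expired(&pending, 50); /* run: t1,t2,t3  pending: t4 */
	/* A drops the list-lock here to run t1's callback ... */
	reset_timer(&pending, &t2, 60);    /* ... and B resets t2 */

	r.run_len = list_len(run);
	r.run_has_t3 = list_has(run, &t3);
	r.run_has_t4 = list_has(run, &t4);
	r.t2_on_both = list_has(run, &t2) && list_has(pending, &t2);
	return r;
}
```

In this model, calling simulate_race() leaves lcore A's run-list as
t1, t2, t4: the expired timer t3 is no longer reachable, the unexpired
timer t4 has leaked onto the run-list, and t2 is linked into both lists
through the same next pointer. The real skip-list has more levels and
more ways to go wrong, so this shows only the simplest form of the
corruption.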