From: luca.boccassi@gmail.com
To: Konstantin Ananyev
Cc: Bruce Richardson, dpdk stable
Date: Wed, 7 Feb 2018 16:46:33 +0000
Message-Id: <20180207164705.29052-2-luca.boccassi@gmail.com>
In-Reply-To: <20180207164705.29052-1-luca.boccassi@gmail.com>
References: <20180126131332.15346-62-luca.boccassi@gmail.com> <20180207164705.29052-1-luca.boccassi@gmail.com>
Subject: [dpdk-stable] patch 'eal/x86: use lock-prefixed instructions for SMP barrier' has been queued to LTS release 16.11.5
List-Id: patches for DPDK stable branches

Hi,

FYI, your patch has been queued to LTS release 16.11.5
Note it hasn't been pushed to http://dpdk.org/browse/dpdk-stable yet.
It will be pushed if I get no objections before 02/09/18. So please
shout if anyone has objections.

Thanks.

Luca Boccassi

---
From 4b93724891bb8e1ee7c5c2e5e269728416595320 Mon Sep 17 00:00:00 2001
From: Konstantin Ananyev
Date: Mon, 15 Jan 2018 15:09:31 +0000
Subject: [PATCH] eal/x86: use lock-prefixed instructions for SMP barrier

[ upstream commit 096ffd811fe21d652e51f07a7859967ffaabc72c ]

On x86 it is possible to use lock-prefixed instructions to get
the similar effect as mfence.
As pointed by Java guys, on most modern HW that gives a better
performance than using mfence:
https://shipilev.net/blog/2014/on-the-fence-with-dependencies/
That patch adopts that technique for rte_smp_mb() implementation.
On BDW 2.2 mb_autotest on single lcore reports 2X cycle reduction,
i.e. from ~110 to ~55 cycles per operation.

Signed-off-by: Konstantin Ananyev
Acked-by: Bruce Richardson
---
 .../common/include/arch/x86/rte_atomic.h           | 44 +++++++++++++++++++++-
 1 file changed, 42 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/common/include/arch/x86/rte_atomic.h b/lib/librte_eal/common/include/arch/x86/rte_atomic.h
index 00b1cdf5d..d12b679a3 100644
--- a/lib/librte_eal/common/include/arch/x86/rte_atomic.h
+++ b/lib/librte_eal/common/include/arch/x86/rte_atomic.h
@@ -55,12 +55,52 @@ extern "C" {
 
 #define rte_rmb() _mm_lfence()
 
-#define rte_smp_mb() rte_mb()
-
 #define rte_smp_wmb() rte_compiler_barrier()
 
 #define rte_smp_rmb() rte_compiler_barrier()
 
+/*
+ * From Intel Software Development Manual; Vol 3;
+ * 8.2.2 Memory Ordering in P6 and More Recent Processor Families:
+ * ...
+ * . Reads are not reordered with other reads.
+ * . Writes are not reordered with older reads.
+ * . Writes to memory are not reordered with other writes,
+ *   with the following exceptions:
+ *   . streaming stores (writes) executed with the non-temporal move
+ *     instructions (MOVNTI, MOVNTQ, MOVNTDQ, MOVNTPS, and MOVNTPD); and
+ *   . string operations (see Section 8.2.4.1).
+ *  ...
+ * . Reads may be reordered with older writes to different locations but not
+ * with older writes to the same location.
+ * . Reads or writes cannot be reordered with I/O instructions,
+ * locked instructions, or serializing instructions.
+ * . Reads cannot pass earlier LFENCE and MFENCE instructions.
+ * . Writes ... cannot pass earlier LFENCE, SFENCE, and MFENCE instructions.
+ * . LFENCE instructions cannot pass earlier reads.
+ * . SFENCE instructions cannot pass earlier writes ...
+ * . MFENCE instructions cannot pass earlier reads, writes ...
+ *
+ * As pointed by Java guys, that makes possible to use lock-prefixed
+ * instructions to get the same effect as mfence and on most modern HW
+ * that gives a better performance than using mfence:
+ * https://shipilev.net/blog/2014/on-the-fence-with-dependencies/
+ * Basic idea is to use lock prefixed add with some dummy memory location
+ * as the destination. From their experiments 128B(2 cache lines) below
+ * current stack pointer looks like a good candidate.
+ * So below we use that technique for rte_smp_mb() implementation.
+ */
+
+static inline void __attribute__((always_inline))
+rte_smp_mb(void)
+{
+#ifdef RTE_ARCH_I686
+	asm volatile("lock addl $0, -128(%%esp); " ::: "memory");
+#else
+	asm volatile("lock addl $0, -128(%%rsp); " ::: "memory");
+#endif
+}
+
 /*------------------------- 16 bit atomic operations -------------------------*/
 
 #ifndef RTE_FORCE_INTRINSICS
-- 
2.14.2
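
For readers who want to try the barrier outside the DPDK tree, here is a minimal standalone sketch of the same idea. It is not part of the patch. The names smp_mb_sketch(), announce_and_check(), flag_a and flag_b are hypothetical and exist only for illustration, and the __atomic_thread_fence() fallback for non-x86 builds is an assumption added so the snippet compiles elsewhere. The store-then-load handshake is the usual case where x86 needs a full barrier, since reads may be reordered with older writes to different locations.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for rte_smp_mb(); not the DPDK header itself.
 * On x86 it issues a locked add of 0 to a dummy slot 128 bytes below
 * the stack pointer, which leaves memory contents unchanged.
 */
static inline void smp_mb_sketch(void)
{
#if defined(__x86_64__)
	__asm__ volatile("lock addl $0, -128(%%rsp); " ::: "memory");
#elif defined(__i386__)
	__asm__ volatile("lock addl $0, -128(%%esp); " ::: "memory");
#else
	/* Assumption: GCC/Clang builtin used as a fallback on other CPUs. */
	__atomic_thread_fence(__ATOMIC_SEQ_CST);
#endif
}

/* Illustrative flags, one per participant in a Dekker-style handshake. */
static volatile uint32_t flag_a, flag_b;

/*
 * Publish our own flag, then look at the peer's flag. The full barrier
 * keeps the load of *peer from being satisfied before the store to *mine
 * is globally visible. Returns 1 if the peer has not announced itself.
 */
static int announce_and_check(volatile uint32_t *mine,
			      const volatile uint32_t *peer)
{
	*mine = 1;
	smp_mb_sketch();
	return *peer == 0;
}
```

With two threads, one would call announce_and_check(&flag_a, &flag_b) and the other announce_and_check(&flag_b, &flag_a). With the barrier in place, at most one of them should see the peer's flag still clear.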