From: Luc Pelletier
Date: Sun, 16 Jan 2022 09:09:49 -0500
Subject: Re: [PATCH v2] eal: fix unaligned loads/stores in rte_memcpy_generic
To: stephen@networkplumber.org
Cc: bruce.richardson@intel.com, konstantin.ananyev@intel.com, dev, Xiaoyun Li, dpdk stable
References: <20220115194102.444140-1-lucp.at.work@gmail.com> <20220115213949.449313-1-lucp.at.work@gmail.com> <20220115141342.396a5f3a@hermes.local>
In-Reply-To: <20220115141342.396a5f3a@hermes.local>
List-Id: DPDK patches and discussions

> X86 always allows unaligned access. Irregardless of what tools say.
> Why impose additional overhead in performance critical code.

Let me preface my response by saying that I'm not a C compiler developer. Hopefully someone who is will read this and chime in.

I agree that X86 allows unaligned store/load. However, the C standard doesn't; it says that it's undefined behavior, which means that the code relies on undefined behavior. It may do the right thing all the time, almost all the time, some of the time... it's undefined. It may work now but it may stop working in the future.

Here's a good discussion on SO about unaligned accesses in C on x86:

https://stackoverflow.com/questions/46790550/c-undefined-behavior-strict-aliasing-rule-or-incorrect-alignment/46790815#46790815

There's no way to do the unaligned store/load in C (that I know of) without invoking undefined behavior. I can see 2 options: either write the code in assembly, or use some other C construct that doesn't rely on undefined behavior.
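To make the second option concrete, here is a minimal sketch of a byte-wise copy loop. The name copy_bytes is hypothetical and this is not the code from the patch; it only illustrates the kind of construct that stays within defined behavior, because accesses through unsigned char have no alignment requirement.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not the function from the patch).
 *
 * The cast-based pattern under discussion looks roughly like
 *     *(uint64_t *)dst = *(const uint64_t *)src;
 * which is undefined behavior in C when dst or src is not suitably
 * aligned for uint64_t, even though x86 executes it without trouble.
 *
 * Copying through unsigned char pointers avoids that: character-type
 * accesses are valid at any address.
 */
void copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(d != NULL && s != NULL);
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
}
```

Whether the compiler turns this loop into wide loads/stores depends on the compiler, the optimization level, and whether n is known at compile time.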
While the for loop may seem slower than the other options, it surprisingly results in fewer load/store operations in certain scenarios. For example, if n == 15 and it's known at compile-time, the compiler will generate 2 overlapping qword load/store operations (rather than the 4 that the current code performs).

All that being said, I can go back to something similar to my first patch: using inline assembly, and making sure this time that it works for 32-bit too. I will post a patch in a few minutes that does exactly that. Maintainers can then chime in with their preferred option.
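For readers unfamiliar with the "2 overlapping qword" pattern mentioned above, the sketch below spells it out by hand for n == 15: one 8-byte move covers bytes 0..7 and a second covers bytes 7..14, so byte 7 is copied twice (8 + 8 - 1 = 15). The function name copy15_overlapping is hypothetical, and the fixed-size memcpy calls are just my way of keeping the sketch itself well-defined; this shows the access pattern only, not actual compiler output or the patch's code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical illustration: copy exactly 15 bytes using two 8-byte
 * moves that overlap by one byte. src and dst must not overlap.
 */
void copy15_overlapping(void *dst, const void *src)
{
    uint64_t lo, hi;

    assert(dst != NULL && src != NULL);

    /* Two loads: bytes 0..7 and bytes 7..14 of the source. */
    memcpy(&lo, src, 8);
    memcpy(&hi, (const unsigned char *)src + 7, 8);

    /* Two stores at the same offsets; byte 7 is written twice
     * with the same value. */
    memcpy(dst, &lo, 8);
    memcpy((unsigned char *)dst + 7, &hi, 8);
}
```

Compilers commonly lower fixed-size 8-byte memcpy calls like these to single load/store instructions on x86, but that is an optimization detail to check in the generated assembly rather than a guarantee.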