From: Ferruh Yigit
To: "Van Haaren, Harry", Morten Brørup, "dev@dpdk.org"
Cc: Olivier Matz, "Ananyev, Konstantin"
Date: Fri, 26 Jun 2020 16:54:23 +0100
Subject: Re: [dpdk-dev] rte_ether_addr_copy() strange comment
References: <98CBD80474FA8B44BF855DF32C47DC35C610C4@smartserver.smartshare.dk> <6b67ce84-92ee-550d-2fba-af8c4c1bb2aa@intel.com>
List-Id: DPDK patches and discussions

On 6/26/2020 1:41 PM, Van Haaren, Harry wrote:
>> -----Original Message-----
>> From: Yigit, Ferruh
>> Sent: Friday, June 26, 2020 1:08 PM
>> To: Morten Brørup; dev@dpdk.org
>> Cc: Olivier Matz; Van Haaren, Harry; Ananyev, Konstantin
>> Subject: Re: [dpdk-dev] rte_ether_addr_copy() strange comment
>>
>> On 6/25/2020 4:45 PM, Morten Brørup wrote:
>>> The function rte_ether_addr_copy() checks for __INTEL_COMPILER and has a
>>> comment about "a strange gcc warning".
>>> It says:
>>>
>>> static inline void rte_ether_addr_copy(const struct rte_ether_addr *ea_from,
>>>                                        struct rte_ether_addr *ea_to)
>>> {
>>> #ifdef __INTEL_COMPILER
>>>     uint16_t *from_words = (uint16_t *)(ea_from->addr_bytes);
>>>     uint16_t *to_words = (uint16_t *)(ea_to->addr_bytes);
>>>
>>>     to_words[0] = from_words[0];
>>>     to_words[1] = from_words[1];
>>>     to_words[2] = from_words[2];
>>> #else
>>>     /*
>>>      * Use the common way, because of a strange gcc warning.
>>>      */
>>>     *ea_to = *ea_from;
>>> #endif
>>> }
>>>
>>> I can see that from_words discards the const qualifier. Is that the "strange" gcc
>>> warning the comment is referring to?
>>>
>>> This goes back to before the first public release of DPDK in 2013, ref.
>>> https://git.dpdk.org/dpdk/log/lib/librte_ether/rte_ether.h
>>>
>>>
>>> It should be fixed as follows:
>>>
>>> -uint16_t *from_words = (uint16_t *)(ea_from->addr_bytes);
>>> -uint16_t *to_words = (uint16_t *)(ea_to->addr_bytes);
>>> +const uint16_t *from_words = (const uint16_t *)ea_from;
>>> +uint16_t *to_words = (uint16_t *)ea_to;
>>>
>>> And the consequential question: Is copying the three shorts faster than
>>> copying the struct? In other words: Should we get rid of the #ifdef and use the
>>> first method only?
>>
>>
>> I tried to investigate this in godbolt: https://godbolt.org/z/YSmvDn
>
> There was a small hiccup in the struct mac definition, it is aligned to 2, not 16 as the above Godbolt link...
> With the aligned attribute changed to 2 (as per DPDK header https://git.dpdk.org/dpdk/tree/lib/librte_net/rte_ether.h#n59 )
> we get the required (but less performant) smaller stores:
>
> WORD_COPY = 0, Aligned = 16
> NOTE: Incorrect alignment provided, and invalid ASM as it stores over the 10 bytes after eth addr.
> This code is from a GodBolt test only, and this bug is NOT present in upstream DPDK.

Thanks for the clarification.
(not sure how I ended up testing 16 byte alignment)

<...>

> PS: For extra bonus points, here's a SIMD version that only uses one store
> https://godbolt.org/z/VAR2La. Unless you intend on copying billions of
> L1 resident eth addrs, this may or may not be a useful optimization.
> Note that it requires the 10 bytes after the ether addr to be valid to read.
> It loads 16B across both SRC and DST, blends 48 bits of SRC into DST and
> writes the result back to DST.
>         movdqu  (%rsi), %xmm0
>         movdqu  (%rdi), %xmm1
>         pblendw $7, %xmm1, %xmm0
>         movups  %xmm0, (%rdi)
>         ret
>
> Actually, it's possible to do this using a uint64_t (8 byte scalar) load/store too,
> with some masking and bitwise OR... left as an exercise to the reader? :)
>

Does the below work? (not for real life usage, just to experiment with
single-store solutions :) [https://godbolt.org/z/TmqwQh]

        movzwl  6(%rdi), %eax
        salq    $48, %rax
        orq     (%rsi), %rax
        movq    %rax, (%rdi)
        ret

----

void copy(struct mac *dst, const struct mac *src)
{
        uint64_t *s = (uint64_t *) &src->addr;
        uint64_t *d = (uint64_t *) &dst->addr;
        /* Save the two bytes that follow the 6-byte address in dst. */
        uint16_t dd = ((uint16_t *)d)[3];
        /* Low 48 bits come from src, top 16 bits are restored from dst. */
        *d = (*s & ~(0xffffULL << 48)) | ((uint64_t)dd << 48);
}
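
For anyone experimenting further, here is a minimal sketch of the single 8-byte store idea with the source masked as well. The assembly listing above ORs the full 8 bytes at (%rsi) into the result without an AND step, so it is only guaranteed to give the right answer when the two bytes following the source address are zero. The function name mac_copy_one_store is hypothetical (it is not a DPDK API). The sketch assumes a little-endian target and that 8 bytes are accessible at both pointers. It uses memcpy for the loads and the store to avoid alignment and aliasing problems; compilers usually lower these to plain 8-byte moves, but that is not guaranteed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper, for experiments only: copy a 6-byte Ethernet
 * address using one 8-byte store.
 *
 * Assumptions (not checked beyond the NULL test):
 *  - little-endian byte order, so bytes 6..7 are the top 16 bits;
 *  - 8 bytes are readable at src and readable/writable at dst.
 */
static void mac_copy_one_store(uint8_t *dst, const uint8_t *src)
{
    /* Mask covering bytes 6..7, which must keep their dst value. */
    const uint64_t keep = UINT64_C(0xffff) << 48;
    uint64_t s, d;

    assert(dst != NULL && src != NULL);

    memcpy(&s, src, sizeof(s));   /* 8-byte load from src */
    memcpy(&d, dst, sizeof(d));   /* 8-byte load from dst */

    /* Low 48 bits from src, top 16 bits preserved from dst. */
    d = (s & ~keep) | (d & keep);

    memcpy(dst, &d, sizeof(d));   /* single 8-byte store */
}
```

Like the SIMD variant, this reads and rewrites bytes outside the address itself, so it is not safe when another thread may modify the two trailing bytes concurrently.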