From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jiawei Zhu <17826875952@163.com>
To: Slava Ovsiienko , "dev@dpdk.org"
Cc: "zhujiawei12@huawei.com" , Matan Azrad , Shahaf Shuler
Subject: Re: [dpdk-dev] [PATCH] net/mlx5: add Rx checksum offload flag return bad
Date: Sun, 28 Mar 2021 21:39:14 +0800
Message-ID: <9baf6e83-4c77-49a7-e640-49555bf4ccfd@163.com>
References: <1616427966-3481-1-git-send-email-17826875952@163.com>
List-Id: DPDK patches and discussions

Hi, Slava

Thanks for your detailed explanation! You are right, I didn't look carefully!

With best regards,
Jiawei

On 2021/3/25 7:55 PM, Slava Ovsiienko wrote:
> Hi, Jiawei
>
>> -----Original Message-----
>> From: Jiawei Zhu <17826875952@163.com>
>> Sent: Wednesday, March 24, 2021 18:22
>> To: Slava Ovsiienko ; dev@dpdk.org
>> Cc: zhujiawei12@huawei.com; Matan Azrad ; Shahaf
>> Shuler
>> Subject: Re: [PATCH] net/mlx5: add Rx checksum offload flag return bad
>>
>> Hi, Slava
>>
>> Thanks for your explanation. The multiplications and divisions are in
>> TRANSPOSE, not in rte_be_to_cpu_16.
>
> [SO]
> Yes, TRANSPOSE is a macro with mul and div operators. But these are
> translated by the compiler to simple shifts (because the operands are powers of 2).
> The only place where TRANSPOSE is used is the rxq_cq_to_ol_flags() routine.
> I've compiled it and provided the assembly listing - please see the one
> in my previous reply. It illustrates what TRANSPOSE was compiled to and
> presents the x86 code - we see only shifts:
>
> 43 0047 48C1EA02 shrq $2,%rdx
> 44 004b 48C1E802 shrq $2,%rax
>
> No mul/div at all, exactly as we expected.
>
>> So I think using if-else directly could improve the performance.
>
> [SO]
> The if/else construct is usually compiled to conditional jumps. Branch
> prediction in the CPU might not work well over the various ingress traffic
> patterns (we are analyzing the flags of the received packets), and we would
> get a performance penalty.
> Hence, the best practice seems to be to have no conditional jumps at all.
> The existing code follows this approach, as we can see from the assembly listing - there
> are no conditional jumps.
>
> With best regards,
> Slava
>
> PS. I removed the distracting details from the listing - this is merely the resulting code
> of rxq_cq_to_ol_flags(). I removed static and made it non-inline to see the
> isolated piece of code:
>
> rxq_cq_to_ol_flags:
>     movzwl 28(%rdi),%edx    // endianness conversion optimized out entirely
>     movl   %edx,%eax
>     andw   $512,%dx
>     andw   $1024,%ax
>     movzwl %dx,%edx
>     movzwl %ax,%eax
>     shrq   $2,%rdx
>     shrq   $2,%rax
>     orl    %edx,%eax
>     ret
>
> PPS. As we can see, the shift values are the same for both flags, so there might be some room to optimize
> (we could have only one shift and only one masking with AND).
>
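
For readers following the thread, here is a minimal C sketch of the three variants discussed in the quoted reply: the mul/div bit-transposition, the if/else alternative, and the single-mask, single-shift idea from the PPS. It is an illustration under stated assumptions, not the driver code. All names are hypothetical, the bit values are simply read off the listing (masks 512 and 1024, both shifted right by 2), and the macro body is an assumed shape, since the real TRANSPOSE definition is not shown in this message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names. Values come from the quoted listing:
 * andw $512 / andw $1024, then shrq $2 on each. */
#define CQE_L3_OK        0x0200u  /* source bit, mask 512  */
#define CQE_L4_OK        0x0400u  /* source bit, mask 1024 */
#define OL_IP_CKSUM_GOOD 0x0080u  /* 512  >> 2 */
#define OL_L4_CKSUM_GOOD 0x0100u  /* 1024 >> 2 */

/* Assumed shape of a bit-transposition macro (not copied from the driver).
 * `from` and `to` must be compile-time powers of two, so the division or
 * multiplication by their ratio can be reduced to a shift. */
#define TRANSPOSE_SKETCH(val, from, to) \
	(((from) >= (to)) ? \
	 (((val) & (from)) / ((from) / (to))) : \
	 (((val) & (from)) * ((to) / (from))))

/* Variant 1: branch-free, written with mul/div on power-of-two constants. */
static uint32_t ol_flags_transpose(uint16_t cqe)
{
	return TRANSPOSE_SKETCH(cqe, CQE_L3_OK, OL_IP_CKSUM_GOOD) |
	       TRANSPOSE_SKETCH(cqe, CQE_L4_OK, OL_L4_CKSUM_GOOD);
}

/* Variant 2: the if/else form proposed in the thread. Whether a compiler
 * emits conditional jumps for it depends on the compiler and its flags. */
static uint32_t ol_flags_branchy(uint16_t cqe)
{
	uint32_t ol = 0;

	if (cqe & CQE_L3_OK)
		ol |= OL_IP_CKSUM_GOOD;
	if (cqe & CQE_L4_OK)
		ol |= OL_L4_CKSUM_GOOD;
	return ol;
}

/* Variant 3: the PPS idea. Both bits move by the same distance, so one
 * AND with the combined mask and one shift are enough. */
static uint32_t ol_flags_single_shift(uint16_t cqe)
{
	return (cqe & (CQE_L3_OK | CQE_L4_OK)) >> 2;
}

/* Returns 1 if all three variants agree for every 16-bit input. */
static int variants_agree(void)
{
	for (uint32_t v = 0; v <= 0xFFFFu; v++) {
		uint16_t cqe = (uint16_t)v;
		uint32_t ref = ol_flags_transpose(cqe);

		if (ref != ol_flags_branchy(cqe) ||
		    ref != ol_flags_single_shift(cqe))
			return 0;
	}
	return 1;
}
```

Calling variants_agree() from a small main() checks that the three forms compute the same result, so any difference between them is in the generated code only. Compiling each function with optimization enabled and inspecting the assembly (for example with gcc -O2 -S) is one way to see which form a given compiler turns into shifts and which, if any, into conditional jumps.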