DPDK patches and discussions
From: Slava Ovsiienko <viacheslavo@nvidia.com>
To: Jiawei Zhu <17826875952@163.com>, "dev@dpdk.org" <dev@dpdk.org>
Cc: "zhujiawei12@huawei.com" <zhujiawei12@huawei.com>,
	Matan Azrad <matan@nvidia.com>,
	Shahaf Shuler <shahafs@nvidia.com>
Subject: Re: [dpdk-dev] [PATCH] net/mlx5: add Rx checksum offload flag return bad
Date: Thu, 25 Mar 2021 11:55:50 +0000
Message-ID: <DM6PR12MB3753A18CEE790690F61176D2DF629@DM6PR12MB3753.namprd12.prod.outlook.com>
In-Reply-To: <edc6d950-c3a6-29ff-22b4-8514c4597d9a@163.com>

Hi, Jiawei

> -----Original Message-----
> From: Jiawei Zhu <17826875952@163.com>
> Sent: Wednesday, March 24, 2021 18:22
> To: Slava Ovsiienko <viacheslavo@nvidia.com>; dev@dpdk.org
> Cc: zhujiawei12@huawei.com; Matan Azrad <matan@nvidia.com>; Shahaf
> Shuler <shahafs@nvidia.com>
> Subject: Re: [PATCH] net/mlx5: add Rx checksum offload flag return bad
> Hi,Slava
> Thanks for your explain,the multiplications and divisions are in the
> TRANSPOSE,not in the rte_be_to_cpu_16. 

Yes, TRANSPOSE is a macro containing mul and div operators. But the compiler
translates these into simple shifts (because the operands are powers of 2).
The only place where TRANSPOSE is used is the rxq_cq_to_ol_flags() routine.
I've compiled it and provided the assembly listing, please see
my previous reply. It shows what TRANSPOSE was compiled to, as x86 code,
and we see only shifts:

  shrq $2,%rdx
  shrq $2,%rax

No mul/div at all, exactly as we expected.
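
For illustration, a minimal sketch of how a TRANSPOSE-style macro reduces to
shifts. The macro body and all names below are assumptions written for this
example, not copied from the driver. Only the mask values 512 and 1024 and the
shift by 2 come from the listing; the target values 128 and 256 are simply
those masks shifted right by 2.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag names; values inferred from the assembly listing. */
#define HW_L3_OK (UINT32_C(1) << 9)
#define HW_L4_OK (UINT32_C(1) << 10)
#define SW_IP_OK (UINT32_C(1) << 7)
#define SW_L4_OK (UINT32_C(1) << 8)

/*
 * Assumed shape of a TRANSPOSE-like macro: move the single bit `from`
 * of `val` to the bit position of `to`. Both `from` and `to` are
 * compile-time powers of 2, so the division or multiplication by
 * their ratio can be emitted as a shift.
 */
#define TRANSPOSE_SKETCH(val, from, to) \
	(((from) >= (to)) ? \
	 (((val) & (from)) / ((from) / (to))) : \
	 (((val) & (from)) * ((to) / (from))))

/* Translate the two hardware bits into the two software bits. */
static inline uint32_t
flags_transpose(uint16_t hw)
{
	return TRANSPOSE_SKETCH((uint32_t)hw, HW_L3_OK, SW_IP_OK) |
	       TRANSPOSE_SKETCH((uint32_t)hw, HW_L4_OK, SW_L4_OK);
}

/* Small self-check: both bits set in, both bits set out. */
static inline void
transpose_self_check(void)
{
	assert(flags_transpose(HW_L3_OK | HW_L4_OK) == (SW_IP_OK | SW_L4_OK));
}
```

Compiling such a function with optimization enabled and inspecting the
assembly is the way to confirm what a particular compiler actually emits.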

> So I think use if-else directly could improves the performance.

An if/else construct is usually compiled to conditional jumps. Branch
prediction in the CPU might not work well across varied ingress traffic
patterns (we are analyzing the flags of the received packets), and we’ll get
a performance penalty.
Hence, the best practice seems to be to have no conditional jumps at all.
The existing code follows this approach, as we can see from the assembly
listing: there are no conditional jumps.
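
To make the comparison concrete, here is a hedged sketch of the two styles
side by side. The function and flag names are hypothetical and the if/else
variant is only a guess at the shape of the proposal. Both compute the same
result; whether the first one really becomes conditional jumps depends on the
compiler and flags, so the assembly should be checked rather than assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag names; values inferred from the assembly listing. */
#define HW_L3_OK (UINT32_C(1) << 9)
#define HW_L4_OK (UINT32_C(1) << 10)
#define SW_IP_OK (UINT32_C(1) << 7)
#define SW_L4_OK (UINT32_C(1) << 8)

/* if-based variant: each test may be compiled to a conditional jump. */
static inline uint32_t
flags_branchy(uint16_t hw)
{
	uint32_t ol = 0;

	if (hw & HW_L3_OK)
		ol |= SW_IP_OK;
	if (hw & HW_L4_OK)
		ol |= SW_L4_OK;
	return ol;
}

/* Mask-and-shift variant: pure data flow, no condition to predict. */
static inline uint32_t
flags_branchless(uint16_t hw)
{
	return ((hw & HW_L3_OK) >> 2) | ((hw & HW_L4_OK) >> 2);
}

/* Count the 16-bit inputs for which the two variants disagree. */
static inline unsigned int
branch_mismatches(void)
{
	unsigned int bad = 0;

	for (uint32_t v = 0; v <= UINT16_MAX; v++)
		bad += flags_branchy((uint16_t)v) != flags_branchless((uint16_t)v);
	return bad;
}

/* Self-check: the variants must agree on every input. */
static inline void
branch_self_check(void)
{
	assert(branch_mismatches() == 0);
}
```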

With best regards,

PS. I removed the cluttering details from the listing, so this is merely the
resulting code of rxq_cq_to_ol_flags(). I removed static and made the function
non-inline to see the isolated piece of code:

  movzwl 28(%rdi),%edx   // endianness conversion optimized out entirely
  movl %edx,%eax
  andw $512,%dx
  andw $1024,%ax
  movzwl %dx,%edx
  movzwl %ax,%eax
  shrq $2,%rdx
  shrq $2,%rax
  orl %edx,%eax

PPS. As we can see, the shift values are the same for both flags, so there might be some room to optimize
(we could have only one shift and only one masking with AND).
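
A minimal sketch of that possible optimization, under the same assumptions as
the listing (masks 512 and 1024, both shifted right by 2). All names are
hypothetical. Because the shift distance is identical for both bits, the two
masks can be ORed into one and shifted once; the helper below checks that this
matches the two-shift form for every 16-bit input.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag names; values inferred from the assembly listing. */
#define HW_L3_OK (UINT32_C(1) << 9)
#define HW_L4_OK (UINT32_C(1) << 10)
#define SW_IP_OK (UINT32_C(1) << 7)
#define SW_L4_OK (UINT32_C(1) << 8)

/* Form seen in the listing: two ANDs, two shifts, one OR. */
static inline uint32_t
flags_two_shifts(uint16_t hw)
{
	return ((hw & HW_L3_OK) >> 2) | ((hw & HW_L4_OK) >> 2);
}

/* Combined form: one AND with the merged mask, one shift. */
static inline uint32_t
flags_one_shift(uint16_t hw)
{
	return ((uint32_t)hw & (HW_L3_OK | HW_L4_OK)) >> 2;
}

/* Count the 16-bit inputs for which the two forms disagree. */
static inline unsigned int
shift_mismatches(void)
{
	unsigned int bad = 0;

	for (uint32_t v = 0; v <= UINT16_MAX; v++)
		bad += flags_two_shifts((uint16_t)v) != flags_one_shift((uint16_t)v);
	return bad;
}

/* Self-check: the combined form must be a drop-in replacement. */
static inline void
shift_self_check(void)
{
	assert(shift_mismatches() == 0);
}
```

This only works while both flags keep the same distance between their source
and destination bits; if either definition changes, the separate per-flag form
is the safer one.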
