DPDK patches and discussions
From: "Liu, Yong" <yong.liu@intel.com>
To: Jason Wang <jasowang@redhat.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>, "Bie, Tiwei" <tiwei.bie@intel.com>,
	"maxime.coquelin@redhat.com" <maxime.coquelin@redhat.com>
Subject: Re: [dpdk-dev] [RFC PATCH 02/13] add vhost packed ring fast enqueue function
Date: Tue, 13 Aug 2019 09:02:25 +0000
Message-ID: <86228AFD5BCD8E4EBFD2B90117B5E81E63369B33@SHSMSX103.ccr.corp.intel.com>
In-Reply-To: <25f7d7e5-899b-ca04-30c0-c84ce1fd4210@redhat.com>

Hi Jason,
The effect of the unroll option depends heavily on the compiler. I just tried the compilers available on my side.
The vhost enqueue/dequeue path is split into small parts so that compilers can apply unroll optimization.
Since only GCC 8 supports the unroll pragma, "#pragma GCC unroll" is added only for GCC 8.
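
A rough sketch of what such a guarded pragma can look like on one small loop (an illustration, not the patch code). The helper name batch_len_sum and the 64-byte cache line / 16-byte descriptor sizes are assumptions for the example. The unroll count is written as a literal because macro expansion inside this pragma may not be available on every GCC 8 release.

```c
#include <stdint.h>

/* Assumption: 64-byte cache line and 16-byte packed descriptor. */
#define CACHE_LINE_SIZE 64
#define PACKED_DESC_SIZE 16
#define PACKED_DESC_PER_CACHELINE (CACHE_LINE_SIZE / PACKED_DESC_SIZE)

/*
 * Hypothetical helper: sum the lengths of one cache line worth of
 * descriptors. The loop is small and has a constant trip count, so the
 * compiler has a realistic chance of unrolling it fully.
 */
static uint32_t batch_len_sum(const uint32_t *lens)
{
	uint32_t sum = 0;
	int i;

	/* Emit the pragma only for real GCC >= 8 (not Clang or ICC). */
#if defined(__GNUC__) && !defined(__clang__) && \
	!defined(__INTEL_COMPILER) && (__GNUC__ >= 8)
#pragma GCC unroll 4
#endif
	for (i = 0; i < PACKED_DESC_PER_CACHELINE; i++)
		sum += lens[i];

	return sum;
}
```

On other compilers the loop is left as-is and unrolling is up to the optimizer, which matches the "auto unrolled" behaviour measured below.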

GCC 8 and Clang show a much smaller performance gap than ICC and older GCC.
So we now have two versions: the fixed batch code with better performance, and the auto unrolled code with lower performance.
What's your opinion on the choice? Thanks in advance.

|----------------|---------------|-------------|------|
| Compiler       | Auto unrolled | Fixed batch | Gap  |
|----------------|---------------|-------------|------|
| Clang 6.0.0    | 13.1M         | 13.5M       | 0.4M |
|----------------|---------------|-------------|------|
| GCC 8.3.0      | 13.9M         | 14.4M       | 0.5M |
|----------------|---------------|-------------|------|
| GCC 7.4.0      | 12.6M         | 13.5M       | 0.9M |
|----------------|---------------|-------------|------|
| ICC 19.0.4.243 | 11.0M         | 12.3M       | 1.3M |
|----------------|---------------|-------------|------|
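
To compare compilers on the same footing, the gap can also be read relative to the fixed batch figure. A minimal sketch of that arithmetic follows; the function name gap_percent is made up for the example, and the inputs are assumed to be in the same unit as the table (presumably Mpps).

```c
/*
 * Hypothetical helper: gap between the two versions as a percentage of
 * the fixed batch throughput. Both inputs must use the same unit.
 */
static double gap_percent(double auto_unrolled, double fixed_batch)
{
	return (fixed_batch - auto_unrolled) / fixed_batch * 100.0;
}
```

With the numbers above this gives roughly 3.0% for Clang 6.0.0, 3.5% for GCC 8.3.0, 6.7% for GCC 7.4.0 and 10.6% for ICC 19.0.4.243.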

Regards,
Marvin

> -----Original Message-----
> From: Jason Wang [mailto:jasowang@redhat.com]
> Sent: Thursday, July 11, 2019 5:55 PM
> To: Liu, Yong <yong.liu@intel.com>; Bie, Tiwei <tiwei.bie@intel.com>;
> maxime.coquelin@redhat.com; dev@dpdk.org
> Subject: Re: [dpdk-dev] [RFC PATCH 02/13] add vhost packed ring fast
> enqueue function
> 
> 
> On 2019/7/11 5:49 PM, Liu, Yong wrote:
> >
> >> -----Original Message-----
> >> From: Jason Wang [mailto:jasowang@redhat.com]
> >> Sent: Thursday, July 11, 2019 12:11 PM
> >> To: Liu, Yong <yong.liu@intel.com>; Bie, Tiwei <tiwei.bie@intel.com>;
> >> maxime.coquelin@redhat.com; dev@dpdk.org
> >> Subject: Re: [dpdk-dev] [RFC PATCH 02/13] add vhost packed ring fast
> >> enqueue function
> >>
> >>
> >> On 2019/7/10 3:30 PM, Liu, Yong wrote:
> >>>> -----Original Message-----
> >>>> From: Jason Wang [mailto:jasowang@redhat.com]
> >>>> Sent: Wednesday, July 10, 2019 12:28 PM
> >>>> To: Liu, Yong <yong.liu@intel.com>; Bie, Tiwei <tiwei.bie@intel.com>;
> >>>> maxime.coquelin@redhat.com; dev@dpdk.org
> >>>> Subject: Re: [dpdk-dev] [RFC PATCH 02/13] add vhost packed ring fast
> >>>> enqueue function
> >>>>
> >>>>
> >>>> On 2019/7/9 1:13 AM, Marvin Liu wrote:
> >>>>> The fast enqueue function checks its prerequisites at the
> >>>>> beginning, first of all whether the descriptors are cache
> >>>>> aligned. It does not support chained mbufs; the normal
> >>>>> function handles those.
> >>>>>
> >>>>> Signed-off-by: Marvin Liu <yong.liu@intel.com>
> >>>> Any reason for not letting the compiler unroll the loops?
> >>>>
> >>> Hi Jason,
> >>> I'm not sure how much the compiler can help with unrolling loops, as it
> >>> can't know how many iterations a loop will run in one call.
> >>> After disabling the unroll-loop optimization with "-fno-unroll-loops",
> >>> the size of the virtio_dev_rx_packed function remained the same.
> >>> So it looks like the gcc unroll-loop optimization does not help here.
> >>
> >> I meant something like "pragma GCC unroll N" just before the loop you
> >> want unrolled.
> >>
> >> Thanks
> >>
> > Hi Jason,
> > Just tried with gcc 8.3.0 and master code: only a 0.1 Mpps performance
> > gain with "#pragma GCC unroll".
> > I think this compiler pragma is not helpful in a big loop that
> > contains so many functions.
> >
> > Thanks,
> > Marvin
> 
> 
> Yes, it probably needs some tricks, e.g. breaking the big loop into small
> ones. What I want to do here is unroll the loop based on
> PACKED_DESC_PER_CACHELINE instead of a hard-coded 4.
> 
> Thanks
> 
> 
> >>> And the fast enqueue function not only unrolls the loop, it also checks
> >>> cache alignment, which helps performance in another way.
> >>> Regards,
> >>> Marvin
> >>>
> >>>> Thanks
> >>>>
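
For readers following the quoted discussion, the prerequisite checks described for the fast enqueue path (cache-aligned descriptors, no chained mbufs) could be sketched roughly as below. This is not the code from the patch. The function name can_use_fast_path, the PACKED_BATCH_MASK macro, the parameter names and the 64-byte / 16-byte sizes are all assumptions, and the sketch further assumes the descriptor ring itself starts on a cache line boundary.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 64-byte cache line and 16-byte packed descriptor. */
#define CACHE_LINE_SIZE 64
#define PACKED_DESC_SIZE 16
#define PACKED_DESC_PER_CACHELINE (CACHE_LINE_SIZE / PACKED_DESC_SIZE)
#define PACKED_BATCH_MASK (PACKED_DESC_PER_CACHELINE - 1)

/*
 * Hypothetical prerequisite check for a batched (fast) enqueue:
 *  - the next available index must start a cache line of descriptors,
 *  - the whole batch must fit before the end of the ring,
 *  - the packet must be a single segment (no chained mbufs).
 * When this returns false the caller falls back to the normal path.
 */
static bool can_use_fast_path(uint16_t avail_idx, uint16_t ring_size,
			      uint16_t nb_segs)
{
	if (avail_idx & PACKED_BATCH_MASK)
		return false;
	if ((uint32_t)avail_idx + PACKED_DESC_PER_CACHELINE > ring_size)
		return false;
	if (nb_segs != 1)
		return false;
	return true;
}
```

Deriving the batch size from PACKED_DESC_PER_CACHELINE, rather than writing 4 directly, keeps the check and any unrolled loop tied to the same constant.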


Thread overview: 25+ messages
2019-07-08 17:13 [dpdk-dev] [RFC] vhost packed ring performance optimization Marvin Liu
2019-07-08 17:13 ` [dpdk-dev] [RFC PATCH 01/13] add vhost normal enqueue function Marvin Liu
2019-07-08 17:13 ` [dpdk-dev] [RFC PATCH 02/13] add vhost packed ring fast " Marvin Liu
     [not found]   ` <CGME20190708113801eucas1p25d89717d8b298790326077852c9933c8@eucas1p2.samsung.com>
2019-07-08 11:37     ` Ilya Maximets
2019-07-09  1:15       ` Liu, Yong
2019-07-10  4:28   ` Jason Wang
2019-07-10  7:30     ` Liu, Yong
2019-07-11  4:11       ` Jason Wang
2019-07-11  9:49         ` Liu, Yong
2019-07-11  9:54           ` Jason Wang
2019-08-13  9:02             ` Liu, Yong [this message]
2019-07-11  8:35   ` Jason Wang
2019-07-11  9:37     ` Liu, Yong
2019-07-11  9:44       ` Jason Wang
2019-07-08 17:13 ` [dpdk-dev] [RFC PATCH 03/13] add vhost packed ring normal dequeue function Marvin Liu
2019-07-08 17:13 ` [dpdk-dev] [RFC PATCH 04/13] add vhost packed ring fast " Marvin Liu
2019-07-08 17:13 ` [dpdk-dev] [RFC PATCH 05/13] add enqueue shadow used descs update and flush functions Marvin Liu
2019-07-08 17:13 ` [dpdk-dev] [RFC PATCH 06/13] add vhost fast enqueue flush function Marvin Liu
2019-07-08 17:13 ` [dpdk-dev] [RFC PATCH 07/13] add vhost dequeue shadow descs update function Marvin Liu
2019-07-08 17:13 ` [dpdk-dev] [RFC PATCH 08/13] add vhost fast dequeue flush function Marvin Liu
2019-07-08 17:13 ` [dpdk-dev] [RFC PATCH 09/13] replace vhost enqueue packed ring function Marvin Liu
2019-07-08 17:13 ` [dpdk-dev] [RFC PATCH 10/13] add vhost fast zero copy dequeue " Marvin Liu
2019-07-08 17:13 ` [dpdk-dev] [RFC PATCH 11/13] replace vhost " Marvin Liu
2019-07-08 17:13 ` [dpdk-dev] [RFC PATCH 12/13] support inorder in vhost dequeue path Marvin Liu
2019-07-08 17:13 ` [dpdk-dev] [RFC PATCH 13/13] remove useless vhost functions Marvin Liu
