From: "Liu, Yong"
To: Maxime Coquelin, "Ye, Xiaolong", "Wang, Zhihong"
CC: "dev@dpdk.org", Honnappa Nagarahalli, "jerinj@marvell.com"
Date: Tue, 28 Apr 2020 15:55:55 +0000
Message-ID: <86228AFD5BCD8E4EBFD2B90117B5E81E63548814@SHSMSX103.ccr.corp.intel.com>
References:
 <20200313174230.74661-1-yong.liu@intel.com>
 <20200426021943.43158-1-yong.liu@intel.com>
 <20200426021943.43158-7-yong.liu@intel.com>
 <672a584a-46d1-c78b-7b21-9ed7bc060814@redhat.com>
 <86228AFD5BCD8E4EBFD2B90117B5E81E63546D0C@SHSMSX103.ccr.corp.intel.com>
 <86228AFD5BCD8E4EBFD2B90117B5E81E6354755C@SHSMSX103.ccr.corp.intel.com>
 <679f4dc1-a4ac-ab1b-a7b1-297d66158820@redhat.com>
 <86228AFD5BCD8E4EBFD2B90117B5E81E635476DC@SHSMSX103.ccr.corp.intel.com>
 <2c383833-c67f-c1fb-ea3f-c3802b24de79@redhat.com>
 <86228AFD5BCD8E4EBFD2B90117B5E81E63547785@SHSMSX103.ccr.corp.intel.com>
 <77b2e42a-f02a-4731-ebc7-3565e6fd1f28@redhat.com>
In-Reply-To: <77b2e42a-f02a-4731-ebc7-3565e6fd1f28@redhat.com>
Subject: Re: [dpdk-dev] [PATCH v10 6/9] net/virtio: add vectorized packed ring Rx path

> -----Original Message-----
> From: Maxime Coquelin
> Sent: Tuesday, April 28, 2020 11:40 PM
> To: Liu, Yong; Ye, Xiaolong; Wang, Zhihong
> Cc: dev@dpdk.org; Honnappa Nagarahalli; jerinj@marvell.com
> Subject: Re: [PATCH v10 6/9] net/virtio: add vectorized packed ring Rx path
>
> On 4/28/20 5:35 PM, Liu, Yong wrote:
> >
> >> -----Original Message-----
> >> From: Maxime Coquelin
> >> Sent: Tuesday, April 28, 2020 10:50 PM
> >> To: Liu, Yong; Ye, Xiaolong; Wang, Zhihong
> >> Cc: dev@dpdk.org; Honnappa Nagarahalli; jerinj@marvell.com
> >> Subject: Re: [PATCH v10 6/9] net/virtio: add vectorized packed ring Rx path
> >>
> >> On 4/28/20 4:43 PM, Liu, Yong wrote:
> >>>
> >>>> -----Original Message-----
> >>>> From: Maxime Coquelin
> >>>> Sent: Tuesday, April 28, 2020 9:46 PM
> >>>> To: Liu, Yong; Ye, Xiaolong; Wang, Zhihong
> >>>> Cc: dev@dpdk.org; Honnappa Nagarahalli; jerinj@marvell.com
> >>>> Subject: Re: [PATCH v10 6/9] net/virtio: add vectorized packed ring Rx path
> >>>>
> >>>> On 4/28/20 3:01 PM, Liu, Yong wrote:
> >>>>>>> Maxime,
> >>>>>>> Thanks for pointing it out; it will add an extra cache miss in the
> >>>>>>> datapath, and its impact on performance is around 1% in the loopback
> >>>>>>> case.
> >>>>>> Ok, thanks for doing the test. I'll try to run some PVP benchmarks
> >>>>>> on my side, because when doing IO loopback the cache pressure is
> >>>>>> much less important.
> >>>>>>
> >>>>>>> While the benefit of the vectorized path will be more than that number.
> >>>>>> Ok, but I disagree for two reasons:
> >>>>>> 1. You have to keep in mind that non-vectorized is the default and
> >>>>>> encouraged mode to use. Indeed, it takes a lot of shortcuts, like not
> >>>>>> checking header length (so no error stats), etc.
> >>>>>>
> >>>>> Ok, I will keep non-vectorized the same as before.
> >>>>>
> >>>>>> 2. It's like saying it's OK that it degrades by 5% on $CPU_VENDOR_A
> >>>>>> because the gain is 20% on $CPU_VENDOR_B.
> >>>>>>
> >>>>>> In case we see more degradation in a real-world scenario, you might
> >>>>>> want to consider using ifdefs to avoid adding padding in the
> >>>>>> non-vectorized case, like you did to differentiate the Virtio PMD
> >>>>>> from the Virtio-user PMD in patch 7.
> >>>>>>
> >>>>> Maxime,
> >>>>> The performance difference is so slight that I ignored it; it looks
> >>>>> like a sampling error.
> >>>>
> >>>> Agree for IO loopback, but it adds one more cache line access per burst,
> >>>> which might be seen in some real-life use cases.
> >>>>
> >>>>> It may not be suitable to add a new configuration option for such a
> >>>>> setting, which is only used inside the driver.
> >>>>
> >>>> Wait, the Virtio-user #ifdef is based on the defconfig options? How can
> >>>> it work, since both the Virtio PMD and the Virtio-user PMD can be
> >>>> selected at the same time?
> >>>>
> >>>> I thought it was a define set before the header inclusion and unset
> >>>> afterwards, but I didn't check carefully.
> >>>>
> >>>
> >>> Maxime,
> >>> The difference between virtio PMD and Virtio-user PMD addresses is
> >>> handled by vq->offset.
> >>>
> >>> When the virtio PMD is running, offset will be set to buf_iova:
> >>> vq->offset = offsetof(struct rte_mbuf, buf_iova);
> >>>
> >>> When the virtio_user PMD is running, offset will be set to buf_addr:
> >>> vq->offset = offsetof(struct rte_mbuf, buf_addr);
> >>
> >> Ok, but below is a build-time check:
> >>
> >> +#ifdef RTE_VIRTIO_USER
> >> +	__m128i flag_offset = _mm_set_epi64x(flags_temp, (uint64_t)vq->offset);
> >> +#else
> >> +	__m128i flag_offset = _mm_set_epi64x(flags_temp, 0);
> >> +#endif
> >>
> >> So how can it work for a single build for both Virtio and Virtio-user?
> >>
> >
> > Sorry, this is an implementation error. vq->offset should be used in
> > descs_base for getting the iova address.
> > It will work the same way as the VIRTIO_MBUF_ADDR macro.
> >
> >>>>> The virtio driver can check whether a virtqueue is using the vectorized
> >>>>> path at initialization time, and will use the padded structure if it is.
> >>>>> I have added some tested code and now the performance came back. Since
> >>>>> the code has changed in the initialization process, it needs some time
> >>>>> for regression checks.
> >>>>
> >>>> Ok, works for me.
> >>>>
> >>>> I am investigating a linkage issue with your series, which does not
> >>>> happen systematically (see below; it happens also with clang). David
> >>>> pointed me to some Intel patches removing the usage of __rte_weak;
> >>>> could it be related?
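[Editor's note: the vq->offset mechanism discussed above can be illustrated with a small self-contained sketch. `mbuf_stub` and `buf_from_offset` are hypothetical stand-ins for `struct rte_mbuf` and the PMD's address lookup, not actual DPDK code. The point is that one binary reads either `buf_iova` or `buf_addr` through a field offset chosen at initialization, so no build-time `#ifdef` is needed.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for struct rte_mbuf: only the two fields discussed. */
struct mbuf_stub {
	void     *buf_addr;  /* virtual address, used by the virtio-user PMD */
	uint64_t  buf_iova;  /* IO address, used by the virtio PMD */
};

/*
 * Read the buffer address through a runtime-selected field offset,
 * the same trick the virtio PMD plays with vq->offset: the caller
 * picks offsetof(..., buf_iova) or offsetof(..., buf_addr) once at
 * queue setup, and the datapath stays branch-free and ifdef-free.
 */
static uint64_t
buf_from_offset(const struct mbuf_stub *m, size_t offset)
{
	uint64_t v;

	/* Both fields are 8 bytes on a 64-bit target; copy avoids
	 * strict-aliasing issues. */
	memcpy(&v, (const char *)m + offset, sizeof(v));
	return v;
}
```

A queue initialized for the virtio PMD would store `offsetof(struct mbuf_stub, buf_iova)` and for virtio-user `offsetof(struct mbuf_stub, buf_addr)`; the same read path then serves both.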
> >>>
> >>> I checked David's patch; it only changed the i40e driver. Meanwhile, the
> >>> __rte_weak attribute should still be in virtio_rxtx.c.
> >>> I will follow David's patch and eliminate the usage of the weak attribute.
> >>
> >> Yeah, I meant the issue below could be linked to __rte_weak, not that the
> >> i40e patch was the cause of this problem.
> >>
> >
> > Maxime,
> > I haven't seen any build issue related to __rte_weak with either gcc or
> > clang.
>
> Note that this build (which does not fail systematically) is done using
> binutils 2.30, which causes AVX512 support to be disabled.
>

I just changed to binutils 2.30, and the AVX512 code is skipped as expected in
the meson build.
Could you please supply more information? I will try to reproduce it.

> > Thanks,
> > Marvin
> >
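[Editor's note: for readers following the __rte_weak thread above, DPDK's `__rte_weak` macro expands to GCC's weak-symbol attribute, sketched below. `virtio_use_vector_path` is a made-up symbol for illustration. The weak definition is a fallback that a strong definition of the same symbol in another object file (e.g. one compiled only when AVX512 is available) silently replaces at link time, which is also why toolchain quirks around weak symbols can surface as non-systematic linkage failures.]

```c
#include <assert.h>

/*
 * Weak default implementation: the linker keeps it only if no strong
 * definition of the same symbol is provided by another object file
 * (e.g. a vectorized variant built only when the toolchain supports
 * AVX512). With no override linked in, this fallback is what runs.
 */
__attribute__((weak)) int
virtio_use_vector_path(void)
{
	return 0;  /* fallback: take the scalar Rx path */
}
```

Linking a second object file containing a non-weak `int virtio_use_vector_path(void)` would make every caller resolve to that strong definition instead, with no change to the calling code.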