From: "Xie, Huawei"
To: Linhaifeng
Cc: "dev@dpdk.org"
Date: Mon, 30 Mar 2015 15:56:18 +0000
Subject: Re: [dpdk-dev] [PATCH] cast used->idx to volatile

On 3/30/2015 5:21 PM, Linhaifeng wrote:
>
> On 2015/3/24 18:06, Xie, Huawei wrote:
>> On 3/24/2015 3:44 PM, Linhaifeng wrote:
>>> On 2015/3/24 9:53, Xie, Huawei wrote:
>>>> On 3/24/2015 9:00 AM, Linhaifeng wrote:
>>>>> On 2015/3/23 20:54, Xie, Huawei wrote:
>>>>>>> -----Original Message-----
>>>>>>> From: Linhaifeng [mailto:haifeng.lin@huawei.com]
>>>>>>> Sent: Monday, March 23, 2015 8:24 PM
>>>>>>> To: dev@dpdk.org
>>>>>>> Cc: Ouyang, Changchun; Xie, Huawei
>>>>>>> Subject: Re: [dpdk-dev] [PATCH] cast used->idx to volatile
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 2015/3/21 16:07, linhaifeng wrote:
>>>>>>>> From: Linhaifeng
>>>>>>>>
>>>>>>>> As in rte_vhost_enqueue_burst, we should cast used->idx
>>>>>>>> to volatile before notifying the guest.
>>>>>>>>
>>>>>>>> Signed-off-by: Linhaifeng
>>>>>>>> ---
>>>>>>>>  lib/librte_vhost/vhost_rxtx.c | 2 +-
>>>>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>>>
>>>>>>>> diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
>>>>>>>> index 535c7a1..8d674d1 100644
>>>>>>>> --- a/lib/librte_vhost/vhost_rxtx.c
>>>>>>>> +++ b/lib/librte_vhost/vhost_rxtx.c
>>>>>>>> @@ -722,7 +722,7 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id,
>>>>>>>>  	}
>>>>>>>>
>>>>>>>>  	rte_compiler_barrier();
>>>>>>>> -	vq->used->idx += entry_success;
>>>>>>>> +	*(volatile uint16_t *)&vq->used->idx += entry_success;
>>>>>> Haifeng:
>>>>>> We have a compiler barrier before and an external function call after, so we don't need volatile here.
>>>>>> Did you hit an issue?
>>>>>>
>>>>> Tx_q is sometimes stopped when we use virtio_net, because vhost thinks there are no buffers in tx_q
>>>>> while virtio_net thinks vhost hasn't handled all packets, so we have to restart the VM to restore service.
>>>>>
>>>>> The status in VM is:
>>>>> Mar 18 17:11:10 linux-b2ij kernel: [46337.246687] net eth7: virtnet_poll
>>>>> Mar 18 17:11:10 linux-b2ij kernel: [46337.246690] net eth7: receive_buf
>>>>> Mar 18 17:11:10 linux-b2ij kernel: [46337.246693] net eth7: vi->num=239
>>>>> Mar 18 17:11:10 linux-b2ij kernel: [46337.246695] net eth7: svq:avail->idx=52939 used->idx=52939 num_free=18 num_added=0 svq->last_used_idx=52820
>>>>> Mar 18 17:11:10 linux-b2ij kernel: [46337.246699] net eth7: rvq:avail->idx=36215 used->idx=35977 num_free=18 num_added=0 rvq->last_used_idx=35977
>>>>> Mar 18 17:11:11 linux-b2ij kernel: [46337.901038] net eth7: dev_queue_xmit, qdisc->flags=4, qdisc->state deactiveed=0
>>>>> Mar 18 17:11:12 linux-b2ij kernel: [46337.901042] net eth7: dev_queue_xmit, txq->state=1, stopped=1
>>>>>
>>>>> Why does the compiler barrier not take effect in our case? Does the compiler barrier depend on the -O3 option? We use -O2.
>>>> The compiler barrier always works regardless of the optimization option.
>>>> I don't follow your story, but the key thing is: did you check the asm code?
>>>> If it is called from outside as an API, how could it be optimized away?
>>>> There is only one update to used->idx in that function.
>>> Do you mean rte_vhost_enqueue_burst also does not need to cast used->idx to volatile? Why not remove it?
>> I checked the code. It seems we can remove it. That is another issue.
>> For your issue, you hit a problem and submitted this patch, but I am
>> not sure this is the root cause. Did you check in the asm code whether
>> the write is actually optimized away?
>>
> I wrote a demo to try to find the difference between rte_compiler_barrier and volatile.
> It seems rte_compiler_barrier() has no effect.

Haifeng:

I think it doesn't make much sense to use volatile for local variables.

In our rte_vhost_dequeue_burst, there is one memory write to
used->idx, and there is a compiler barrier to keep the order.
Besides, as an API, how could that memory write be optimized into a
register access?

Even if you call rte_vhost_dequeue_burst from the same source file, that
is, the same translation unit, there is a function call afterwards that
has side effects, so the write still couldn't be optimized away.

Anyway, could we directly check the asm code of rte_vhost_dequeue_burst
to see whether it is optimized?

-huawei
>
> -------->test1: without rte_compiler_barrier and volatile
>
> #include <stdio.h>
>
> int main()
> {
>     int i,j;
>
>     *(int*)&i = 2;
>     *(int*)&j = 3;
>     printf("i=%d j=%d", i, j);
> }
> linux-LOubNs:/mnt/sdc/linhf/test # gcc -S test.c -I /usr/include/dpdk-1.7.0/x86_64-native-linuxapp-gcc/include/ -O3
> linux-LOubNs:/mnt/sdc/linhf/test # cat test.s |grep main -B 10
>     .file "test.c"
>     .section .rodata.str1.1,"aMS",@progbits,1
> .LC0:
>     .string "i=%d j=%d"
>     .text
>     .p2align 4,,15
>     .globl main
>     .type main, @function
> main:
> .LFB571:
>     movl $3, %edx
>     movl $2, %esi
>     movl $.LC0, %edi
>     xorl %eax, %eax
>     jmp printf
> .LFE571:
>     .size main, .-main
>
>
> -------->test2: use rte_compiler_barrier
> note: the asm code is the same as test1
>
> linux-LOubNs:/mnt/sdc/linhf/test # cat test.c
> #include <stdio.h>
> #include <rte_atomic.h>
>
> int main()
> {
>     int i,j;
>
>     *(int*)&i = 2;
>     rte_compiler_barrier();
>     *(int*)&j = 3;
>     printf("i=%d j=%d", i, j);
> }
> linux-LOubNs:/mnt/sdc/linhf/test # gcc -S test.c -I /usr/include/dpdk-1.7.0/x86_64-native-linuxapp-gcc/include/ -O3
> linux-LOubNs:/mnt/sdc/linhf/test # cat test.s |grep main -B 10
>     .file "test.c"
>     .section .rodata.str1.1,"aMS",@progbits,1
> .LC0:
>     .string "i=%d j=%d"
>     .text
>     .p2align 4,,15
>     .globl main
>     .type main, @function
> main:
> .LFB571:
>     movl $3, %edx
>     movl $2, %esi
>     movl $.LC0, %edi
>     xorl %eax, %eax
>     jmp printf
> .LFE571:
>     .size main, .-main
>
>
> -------->test3: use volatile
>
> linux-LOubNs:/mnt/sdc/linhf/test # cat test.c
> #include <stdio.h>
> #include <rte_atomic.h>
>
> int main()
> {
>     int i,j;
>
>     *(volatile int*)&i = 2;
>     *(volatile int*)&j = 3;
>     printf("i=%d j=%d", i, j);
> }
> linux-LOubNs:/mnt/sdc/linhf/test # gcc -S test.c -I /usr/include/dpdk-1.7.0/x86_64-native-linuxapp-gcc/include/ -O3
> linux-LOubNs:/mnt/sdc/linhf/test # cat test.s |grep main -B 10
>     .file "test.c"
>     .section .rodata.str1.1,"aMS",@progbits,1
> .LC0:
>     .string "i=%d j=%d"
>     .text
>     .p2align 4,,15
>     .globl main
>     .type main, @function
> main:
> .LFB571:
>     movl $2, -4(%rsp)
>     movl $3, -8(%rsp)
>     movl $.LC0, %edi
>     movl -8(%rsp), %edx
>     movl -4(%rsp), %esi
>     xorl %eax, %eax
>     jmp printf
> .LFE571:
>     .size main, .-main
>
>>>>>>>>  	/* Kick guest if required. */
>>>>>>>>  	if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT))
>>>>>>>>  		eventfd_write((int)vq->callfd, 1);
>>>>>>>>
>>>>>>> --
>>>>>>> Regards,
>>>>>>> Haifeng
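
For reference, a minimal sketch of the case argued in the reply above. The quoted demo stores to locals whose addresses never escape, so a compiler barrier has nothing it must keep in memory. vq->used->idx is different: it is reached through a pointer to memory shared with the guest. This sketch assumes GCC or Clang. struct used_ring, kick_guest() and publish_used() are hypothetical stand-ins, not the librte_vhost code, and compiler_barrier() is assumed to behave like rte_compiler_barrier(). The second barrier stands in for the external eventfd_write() call, which the compiler cannot see through.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: same effect as DPDK's rte_compiler_barrier() on GCC/Clang.
 * It emits no instruction; it only stops the compiler from moving or
 * caching memory accesses across this point. */
#define compiler_barrier() __asm__ volatile ("" : : : "memory")

/* Hypothetical, simplified stand-in for the header of the virtio used ring. */
struct used_ring {
    uint16_t flags;
    uint16_t idx;
};

/* Records the index value that was in memory when the "kick" happened. */
uint16_t idx_seen_at_kick;

/* Hypothetical stand-in for the eventfd_write() notification. */
void kick_guest(const struct used_ring *used)
{
    idx_seen_at_kick = used->idx;
}

/* Publish n completed entries and notify; returns the new index. */
uint16_t publish_used(struct used_ring *used, uint16_t n)
{
    assert(used != NULL);

    compiler_barrier();  /* earlier ring writes cannot sink below this point */
    used->idx += n;      /* plain (non-volatile) store to shared memory */
    compiler_barrier();  /* the store cannot be delayed past this point */

    kick_guest(used);
    return used->idx;
}
```

Building this with gcc -O2 -S and inspecting publish_used should show whether the store to used->idx is emitted before the call, which is the same check requested above for rte_vhost_dequeue_burst. Note that the barrier only constrains the compiler; it is not a CPU memory fence.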