From: Ilya Maximets
To: Yuanhan Liu
Cc: dev@dpdk.org, Huawei Xie, Dyasly Sergey, Heetae Ahn, Jianfeng Tan
Date: Wed, 06 Jul 2016 14:19:12 +0300
Subject: Re: [dpdk-dev] [PATCH] vhost: fix segfault on bad descriptor address.

On 01.07.2016 10:35, Yuanhan Liu wrote:
> Hi,
>
> Sorry for the long delay.
>
> On Fri, May 20, 2016 at 03:50:04PM +0300, Ilya Maximets wrote:
>> In the current implementation, the guest application can reinitialize
>> the vrings by executing start after stop. At the same time, the host
>> application can still poll the virtqueue while the device is stopped in
>> the guest, and it will crash with a segmentation fault during vring
>> reinitialization because it dereferences bad descriptor addresses.
>
> Yes, you are right that the vring will be reinitialized after restart.
> But even so, I don't see why it would cause a vhost crash, since the
> reinitialization resets all of the vring memory to 0:
>
>     memset(vq->vq_ring_virt_mem, 0, vq->vq_ring_size);
>
> That means those bad descriptors will be skipped, safely, on the vhost
> side by:
>
>     if (unlikely(desc->len < dev->vhost_hlen))
>         return -1;
>
>>
>> Example of an OVS crash:
>> <------------------------------------------------------------------------>
>> [test-pmd inside guest VM]
>>
>> testpmd> port stop all
>> Stopping ports...
>> Checking link statuses...
>> Port 0 Link Up - speed 10000 Mbps - full-duplex
>> Done
>> testpmd> port config all rxq 2
>> testpmd> port config all txq 2
>> testpmd> port start all
>> Configuring Port 0 (socket 0)
>> Port 0: 52:54:00:CB:44:C8
>> Checking link statuses...
>> Port 0 Link Up - speed 10000 Mbps - full-duplex
>> Done
>>
>> [OVS on host]
>> Program received signal SIGSEGV, Segmentation fault.
>> rte_memcpy (n=2056, src=0xc, dst=0x7ff4d5247000) at rte_memcpy.h
>
> Interesting, so it bypasses the above check because desc->len is non-zero
> while desc->addr is zero. The size (2056) also looks weird.
>
> Would you mind looking into this issue a bit deeper, e.g. why desc->addr
> is zero while desc->len is not?

OK. I checked this a few more times. Actually, desc->addr is not zero.
All of the descriptor memory looks like garbage:

<------------------------------------------------------------------------------>
(gdb)
#3 copy_desc_to_mbuf (mbuf_pool=0x7fe9da9f4480, desc_idx=65363, m=0x7fe9db269400, vq=0x7fe9fff7bac0, dev=0x7fe9fff7cbc0)
        desc = 0x2aabc00ff530
        desc_addr = 0
        mbuf_offset = 0
        prev = 0x7fe9db269400
        nr_desc = 1
        desc_offset = 12
        cur = 0x7fe9db269400
        hdr = 0x0
        desc_avail = 1012591375
        mbuf_avail = 1526
        cpy_len = 1526
(gdb) p *desc
$2 = {addr = 8507655620301055744, len = 1012591387, flags = 3845, next = 48516}
<------------------------------------------------------------------------------>

'desc_addr' equals zero because 'gpa_to_vva' simply cannot map this huge
guest address to a host address.

The scenario was the same: SIGSEGV was received right after 'port start all'.

Another thought: there is a race window between the 'memset' in the guest
and the reads of 'desc->len' and 'desc->addr' on the host. So it is
possible to read a non-zero 'len' and a zero 'addr' right after that.
But you're right, this case should be very rare.

>
>> (gdb) bt
>> #0 rte_memcpy (n=2056, src=0xc, dst=0x7ff4d5247000)
>> #1 copy_desc_to_mbuf
>> #2 rte_vhost_dequeue_burst
>> #3 netdev_dpdk_vhost_rxq_recv
>> ...
>>
>> (gdb) bt full
>> #0 rte_memcpy
>> ...
>> #1 copy_desc_to_mbuf
>>         desc_addr = 0
>>         mbuf_offset = 0
>>         desc_offset = 12
>> ...
>> <------------------------------------------------------------------------>
>>
>> Fix that by checking addresses of descriptors before using them.
>>
>> Note: For mergeable buffers, this patch checks only the guest's address
>> for zero, but in the non-mergeable case the host's address is checked.
>> This is done because checking the host's address in the mergeable case
>> requires additional refactoring to keep the virtqueue in a consistent
>> state in case of an error.
>>
>> Signed-off-by: Ilya Maximets
>> ---
>>
>> Actually, the current virtio implementation looks broken to me, because
>> 'virtio_dev_start' breaks the virtqueue while it is still available from
>> the vhost side.
>
> Yes, this sounds buggy. Maybe we could avoid resetting the avail idx? In
> that case vhost dequeue/enqueue would just return, as there would be no
> more packets to dequeue and no more space to enqueue, respectively.

That may be a good fix for virtio, because vhost will then not try to
receive from invalid descriptors. But it will not help if vhost is already
receiving something at the moment the guest reconfigures its queues.

Best regards,
Ilya Maximets.