From mboxrd@z Thu Jan 1 00:00:00 1970
From: Kevin Traynor <ktraynor@redhat.com>
To: Maxime Coquelin, Yuanhan Liu
Cc: dev@dpdk.org, stable@dpdk.org
Date: Mon, 20 Mar 2017 11:13:49 +0000
Organization: Red Hat
In-Reply-To: <9cd39232-5b26-30cd-c51d-c6ce11068bee@redhat.com>
References: <1489605049-18686-1-git-send-email-ktraynor@redhat.com>
 <20170316062122.GN18844@yliu-dev.sh.intel.com>
 <20170317054725.GC18844@yliu-dev.sh.intel.com>
 <9cd39232-5b26-30cd-c51d-c6ce11068bee@redhat.com>
Subject: Re: [dpdk-stable] [PATCH] vhost: fix virtio_net cache sharing of broadcast_rarp
List-Id: patches for DPDK stable
 branches

On 03/17/2017 10:01 AM, Maxime Coquelin wrote:
>
> On 03/17/2017 06:47 AM, Yuanhan Liu wrote:
>> On Thu, Mar 16, 2017 at 10:10:05AM +0000, Kevin Traynor wrote:
>>> On 03/16/2017 06:21 AM, Yuanhan Liu wrote:
>>>> On Wed, Mar 15, 2017 at 07:10:49PM +0000, Kevin Traynor wrote:
>>>>> The virtio_net structure is used in both the enqueue and dequeue
>>>>> datapaths. broadcast_rarp is checked with cmpset in the dequeue
>>>>> datapath regardless of whether descriptors are available or not.
>>>>>
>>>>> It is observed that in some cases, where dequeue and enqueue are
>>>>> performed by different cores and no packets are available on the
>>>>> dequeue datapath (i.e. uni-directional traffic), the frequent
>>>>> checking of broadcast_rarp in dequeue causes performance
>>>>> degradation for the enqueue datapath.
>>>>>
>>>>> In OVS the issue can cause a uni-directional performance drop of
>>>>> up to 15%.
>>>>>
>>>>> Fix that by moving broadcast_rarp to a different cache line in
>>>>> the virtio_net struct.
>>>>
>>>> Thanks, but I'm a bit confused. The drop looks like it is caused by
>>>> cache false sharing, but I don't see anything that would lead to
>>>> false sharing. I mean, there is no write in the cache line where
>>>> broadcast_rarp belongs. Or is the "volatile" type the culprit here?
>>>>
>>>
>>> Yes, the cmpset code uses cmpxchg, and that performs a write
>>> regardless of the result - it either writes the new value or writes
>>> back the old value.
>>
>> Oh, right, I missed this part!
>>
>>>> Talking about that, I had actually considered turning
>>>> "broadcast_rarp" into a simple "int" or "uint16_t" type, to make it
>>>> more lightweight. The reason I used an atomic type is to send
>>>> exactly one broadcast RARP packet once a SEND_RARP request is
>>>> received. Otherwise, we may send more than one RARP packet when MQ
>>>> is involved.
>>>> But I think we don't have to be that accurate: it's
>>>> tolerable when more RARPs are sent. I saw 4 SEND_RARP requests
>>>> (aka 4 RARP packets) the last time I tried vhost-user live
>>>> migration, after all. I don't quite remember why it was 4 though.
>>>>
>>>> That said, I think it also would resolve the performance issue if
>>>> you changed "rte_atomic16_t" to "uint16_t", without moving it?
>>>>
>>>
>>> Yes, that should work fine, with the side effect you mentioned of
>>> possibly some more RARPs - no big deal.
>>>
>>> I tested another solution also - as it is unlikely we would need to
>>> send the broadcast_rarp, you can first read and only do the cmpset
>>> if it is likely to succeed. This resolved the issue too.
>>>
>>> --- a/lib/librte_vhost/virtio_net.c
>>> +++ b/lib/librte_vhost/virtio_net.c
>>> @@ -1057,7 +1057,8 @@ static inline bool __attribute__((always_inline))
>>>  	 *
>>>  	 * Check user_send_rarp() for more information.
>>>  	 */
>>> -	if (unlikely(rte_atomic16_cmpset((volatile uint16_t *)
>>> +	if (unlikely(rte_atomic16_read(&dev->broadcast_rarp) &&
>>> +			rte_atomic16_cmpset((volatile uint16_t *)
>>>  			&dev->broadcast_rarp.cnt, 1, 0))) {
>>>  		rarp_mbuf = rte_pktmbuf_alloc(mbuf_pool);
>>>  		if (rarp_mbuf == NULL) {
>>
>> I'm okay with this one. It's simple and clean enough that it could
>> be picked up for a stable release. Later, I'd like to send another
>> patch to turn it into a "uint16_t". Since that changes the behaviour
>> a bit, it is not a good candidate for a stable release.
>>
>> BTW, would you please include the root cause (false sharing) in
>> your commit log?
>
> And maybe also add the info to the comment just above?
> It will help people wondering why we read before cmpset.
>

Sure, I will re-spin, do some testing and submit a v2.

> Maxime