From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pf0-f180.google.com (mail-pf0-f180.google.com [209.85.192.180]) by dpdk.org (Postfix) with ESMTP id 965E8199C8 for ; Fri, 20 Oct 2017 03:58:08 +0200 (CEST) Received: by mail-pf0-f180.google.com with SMTP id b6so8718479pfh.7 for ; Thu, 19 Oct 2017 18:58:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=subject:to:cc:references:from:message-id:date:user-agent :mime-version:in-reply-to:content-transfer-encoding; bh=F5b3BwseqIUPw1GTZPKsXAaYkJfxON2hSr6lHjBt3Lk=; b=rMN2lk7uSLC6h/IW+Wvf0omaDcj0wvVAa3inji1guZKdm5JqBTHMd5CsDwyDUfNYDP Yj4gVovghf9ToBqz6ncIlniYL3y8hJoGAkrqm54a2RaIg/ri/jPufAkC55amk4DXdTna g9PtNL5VwPN2ih8cYtZUun3BmE+r9pejtHI5qUvsWrnkLNIvQ/QVbLyEl0DZ5Tdpm/wB hsC+k5Qe9vqDOcKH+4cF3vV7gFIM6l57YKpFsDv6tILnBFmYBtaNirbOW3mbHVt1H0oB KVZK5JM3wM7HhrjRCxaUPV0lLzUrs53dSLMDOieIsAWCxSdPWZednNcbWVdPIkjV1bfS NGig== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-transfer-encoding; bh=F5b3BwseqIUPw1GTZPKsXAaYkJfxON2hSr6lHjBt3Lk=; b=pVjCalO0s226o7Z73JazV0qIJGOW30FXSyNBvJsU9OrC5jw6GSx4Ff4DGKaFOi4kHb C3leICR48a6MLlVhp4YSs5UfMjivTA72OH9bsseJ7YRTp2Wd73xjDqNO6X19I54XbnLO UysBBsPfIX1ND8XE4+cxZE7SyWSL9HAwB31tvxRAw3XfPQYETm8RYhYR1vfKbNFa41Gl dAg2l89FU3ntbNJByvN+4TIXvinVqjCVGZWtURLTbDWI55lifWu2avaxnYAgucxU7Pmd +Qw0HXsxBTprmtnhOX6bJOR4JOTnkcfLNutiGRQpDUGnBHtiKtXGCAKajNqrDdJmy9J6 sQSQ== X-Gm-Message-State: AMCzsaVGsWH68QFOM0En8Af+tf2Sj01eOOz7UGVnFwEGlBfiHGYR58Q0 RDNczuo1z/ACzHGnChtqjwk= X-Google-Smtp-Source: ABhQp+RbvAow0yfLVSCXwFBvAzBkzFXrzT/LiJAJrqBn50iIBIE4ogynd9qkq7JDOCsFcqO2gV4cbA== X-Received: by 10.99.188.9 with SMTP id q9mr3026379pge.104.1508464687496; Thu, 19 Oct 2017 18:58:07 -0700 (PDT) Received: from [0.0.0.0] (67.209.179.165.16clouds.com. [67.209.179.165]) by smtp.gmail.com with ESMTPSA id b9sm6982622pff.48.2017.10.19.18.58.02 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 19 Oct 2017 18:58:06 -0700 (PDT) To: "Ananyev, Konstantin" , "Zhao, Bing" , Jerin Jacob Cc: Olivier MATZ , "dev@dpdk.org" , "jia.he@hxt-semitech.com" , "jie2.liu@hxt-semitech.com" , "bing.zhao@hxt-semitech.com" , "Richardson, Bruce" References: <20171010095636.4507-1-hejianet@gmail.com> <20171012155350.j34ddtivxzd27pag@platinum> <2601191342CEEE43887BDE71AB9772585FAA859F@IRSMSX103.ger.corp.intel.com> <20171012172311.GA8524@jerin> <2601191342CEEE43887BDE71AB9772585FAAB171@IRSMSX103.ger.corp.intel.com> <8806e2bd-c57b-03ff-a315-0a311690f1d9@163.com> <2601191342CEEE43887BDE71AB9772585FAAB404@IRSMSX103.ger.corp.intel.com> <2601191342CEEE43887BDE71AB9772585FAAB570@IRSMSX103.ger.corp.intel.com> From: Jia He Message-ID: <3e580cd7-2854-d855-be9c-7c4ce06e3ed5@gmail.com> Date: Fri, 20 Oct 2017 09:57:58 +0800 User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.4.0 MIME-Version: 1.0 In-Reply-To: <2601191342CEEE43887BDE71AB9772585FAAB570@IRSMSX103.ger.corp.intel.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Subject: Re: [dpdk-dev] [PATCH] ring: guarantee ordering of cons/prod loading when doing enqueue/dequeue X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 20 Oct 2017 01:58:08 -0000 On 10/20/2017 4:02 AM, Ananyev, Konstantin Wrote: > >> -----Original Message----- >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Ananyev, Konstantin >> Sent: Thursday, October 19, 2017 3:15 PM >> To: Zhao, Bing ; Jia He ; Jerin Jacob >> Cc: Olivier MATZ ; dev@dpdk.org; jia.he@hxt-semitech.com; jie2.liu@hxt-semitech.com; bing.zhao@hxt- >> semitech.com >> Subject: Re: [dpdk-dev] [PATCH] ring: guarantee ordering of cons/prod loading when doing enqueue/dequeue >> >>> Hi, >>> >>> On 2017/10/19 18:02, Ananyev, Konstantin wrote: >>>> Hi Jia, >>>> >>>>> Hi >>>>> >>>>> >>>>> On 10/13/2017 9:02 AM, Jia He Wrote: >>>>>> Hi Jerin >>>>>> >>>>>> >>>>>> On 10/13/2017 1:23 AM, Jerin Jacob Wrote: >>>>>>> -----Original Message----- >>>>>>>> Date: Thu, 12 Oct 2017 17:05:50 +0000 >>>>>>>> >>>>> [...] >>>>>>> On the same lines, >>>>>>> >>>>>>> Jia He, jie2.liu, bing.zhao, >>>>>>> >>>>>>> Is this patch based on code review or do you saw this issue on any of >>>>>>> the >>>>>>> arm/ppc target? arm64 will have performance impact with this change. >>>>> sorry, miss one important information >>>>> Our platform is an aarch64 server with 46 cpus. >>>>> If we reduced the involved cpu numbers, the bug occurred less frequently. >>>>> >>>>> Yes, mb barrier impact the performance, but correctness is more >>>>> important, isn't it ;-) >>>>> Maybe we can  find any other lightweight barrier here? >>>>> >>>>> Cheers, >>>>> Jia >>>>>> Based on mbuf_autotest, the rte_panic will be invoked in seconds. >>>>>> >>>>>> PANIC in test_refcnt_iter(): >>>>>> (lcore=0, iter=0): after 10s only 61 of 64 mbufs left free >>>>>> 1: [./test(rte_dump_stack+0x38) [0x58d868]] >>>>>> Aborted (core dumped) >>>>>> >>>> So is it only reproducible with mbuf refcnt test? >>>> Could it be reproduced with some 'pure' ring test >>>> (no mempools/mbufs refcnt, etc.)? >>>> The reason I am asking - in that test we also have mbuf refcnt updates >>>> (that's what for that test was created) and we are doing some optimizations here too >>>> to avoid excessive atomic updates. >>>> BTW, if the problem is not reproducible without mbuf refcnt, >>>> can I suggest to extend the test with: >>>> - add a check that enqueue() operation was successful >>>> - walk through the pool and check/printf refcnt of each mbuf. >>>> Hopefully that would give us some extra information what is going wrong here. >>>> Konstantin >>>> >>>> >>> Currently, the issue is only found in this case here on the ARM >>> platform, not sure how it is going with the X86_64 platform >> I understand that it is only reproducible on arm so far. >> What I am asking - with dpdk is there any other way to reproduce it (on arm) >> except then running mbuf_autotest? >> Something really simple that not using mbuf/mempool etc? >> Just do dequeue/enqueue from multiple threads and check data integrity at the end? >> If not - what makes you think that the problem is precisely in rte_ring code? >> Why not in rte_mbuf let say? > Actually I reread your original mail and finally get your point. > If I understand you correctly the problem with read reordering here is that > after we read prot.tail but before we read cons.head > both cons.head and prod.tail might be updated, > but for us prod.tail change might be unnoticed. > As an example: > time 0 (cons.head == 0, prod.tail == 0): > prod_tail = r->prod.tail; /* due read reordering */ > /* prod_tail == 0 */ > > time 1 (cons.head ==5, prod.tail == 5): > *old_head = r->cons.head; > /* cons.head == 5 */ > *entries = (prod_tail - *old_head); > /* *entries == (0 - 5) == 0xFFFFFFFB */ > > And we will move our cons.head forward, even though there are no filled entries in the ring. > Is that right? Yes > As I side notice, I wonder why we have here: > *entries = (prod_tail - *old_head); > instead of: > *entries = r->size + prod_tail - *old_head; > ? Yes, I agree with you at this code line. But reordering will still mess up things even after this change(I have tested, still the same as before) I thought the *entries is a door to prevent consumer from moving forward too fast than the producer. But in some cases, it is correct that prod_tail is smaller than *old_head due to  the cirular queue. In other cases, it is incorrect because of read/read reordering. AFAICT, the root cause here is the producer tail and cosumer head are dependant on each other. Thus a memory barrier is neccessary. Cheers, Jia > > Konstantin > >>> . In another >>> mail of this thread, we've made a simple test based on this and captured >>> some information and I pasted there.(I pasted the patch there :-)) >> Are you talking about that one: >> http://dpdk.org/dev/patchwork/patch/30405/ >> ? >> It still uses test/test/test_mbuf.c..., >> but anyway I don't really understand how mbuf_autotest supposed >> to work with these changes: >> @@ -730,7 +739,7 @@ test_refcnt_iter(unsigned int lcore, unsigned int iter, >> rte_ring_enqueue(refcnt_mbuf_ring, m); >> } >> } >> - rte_pktmbuf_free(m); >> + // rte_pktmbuf_free(m); >> } >> @@ -741,6 +750,12 @@ test_refcnt_iter(unsigned int lcore, unsigned int iter, >> while (!rte_ring_empty(refcnt_mbuf_ring)) >> ; >> >> + if (NULL != m) { >> + if (1 != rte_mbuf_refcnt_read(m)) >> + printf("m ref is %u\n", rte_mbuf_refcnt_read(m)); >> + rte_pktmbuf_free(m); >> + } >> + >> /* check that all mbufs are back into mempool by now */ >> for (wn = 0; wn != REFCNT_MAX_TIMEOUT; wn++) { >> if ((i = rte_mempool_avail_count(refcnt_pool)) == n) { >> >> That means all your mbufs (except the last one) will still be allocated. >> So the test would fail - as it should, I think. >> >>> And >>> it seems that Juhamatti & Jacod found some reverting action several >>> months ago. >> Didn't get that one either. >> Konstantin