From: Jianbo Liu
Date: Mon, 10 Oct 2016 14:57:46 +0800
To: "Wang, Zhihong"
Cc: Yuanhan Liu, Maxime Coquelin, dev@dpdk.org
Subject: Re: [dpdk-dev] [PATCH v3 0/5] vhost: optimize enqueue

On 10 October 2016 at 14:22, Wang, Zhihong wrote:
>
>> -----Original Message-----
>> From: Jianbo Liu [mailto:jianbo.liu@linaro.org]
>> Sent: Monday, October 10, 2016 1:32 PM
>> To: Yuanhan Liu
>> Cc: Wang, Zhihong; Maxime Coquelin; dev@dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH v3 0/5] vhost: optimize enqueue
>>
>> On 10 October 2016 at 10:44, Yuanhan Liu wrote:
>> > On Sun, Oct 09, 2016 at 12:09:07PM +0000, Wang, Zhihong wrote:
>> >> > > > Tested with testpmd, host: txonly, guest: rxonly
>> >> > > > size (bytes)    improvement (%)
>> >> > > > 64              4.12
>> >> > > > 128             6
>> >> > > > 256             2.65
>> >> > > > 512             -1.12
>> >> > > > 1024            -7.02
>> >> > >
>> >> > > There is a difference between Zhihong's code and the old code that I
>> >> > > spotted the first time: Zhihong removed the avail_idx prefetch.
>> >> > > I understand the prefetch becomes a bit tricky when the mrg-rx code
>> >> > > path is considered; thus, I didn't comment on that.
>> >> > >
>> >> > > That's one of the differences that, IMO, could cause a regression. I
>> >> > > then finally got a chance to add it back.
>> >> > >
>> >> > > A rough test shows it greatly improves the performance of the 1400B
>> >> > > packet size in the "txonly in host and rxonly in guest" case: +33% is
>> >> > > the number I get with my test server (Ivybridge).
>> >> >
>> >> > Thanks Yuanhan! I'll validate this on x86.
>> >>
>> >> Hi Yuanhan,
>> >>
>> >> It seems your code doesn't perform correctly. I wrote a new version
>> >> of the avail idx prefetch but didn't see any perf benefit.
>> >>
>> >> To be honest, I doubt the benefit of this idea. The previous mrg_off
>> >> code has this method but doesn't give any benefit.
>> >
>> > Good point. I thought of that before, too. But you know that I made it
>> > in a rush, and didn't think further or test more.
>> >
>> > I looked at the code a bit closer this time, and spotted a bug: the
>> > prefetch actually didn't happen, due to the following code piece:
>> >
>> >     if (vq->next_avail_idx >= NR_AVAIL_IDX_PREFETCH) {
>> >             prefetch_avail_idx(vq);
>> >             ...
>> >     }
>> >
>> > Since vq->next_avail_idx is set to 0 at the entrance of the enqueue path,
>> > prefetch_avail_idx() will not be called. The fix is easy though: just put
>> > prefetch_avail_idx before invoking enqueue_packet.
>> >
>> > In summary, Zhihong is right, I see no more gains with that fix :(
>> >
>> > However, as stated, that's kind of the only difference I found between
>> > your code and the old code, so maybe it's still worthwhile to have a
>> > test on ARM, Jianbo?
>> >
>> I haven't tested it, but I think there could be no improvement for ARM either.
>>
>> A small suggestion for enqueue_packet:
>>
>>     .....
>>     +	/* start copy from mbuf to desc */
>>     +	while (mbuf_avail || mbuf->next) {
>>     .....
>>
>> Considering pkt_len is in the first cache line (same as data_len),
>> while the next pointer is in the second cache line, is it better to
>> check the total packet len, instead of the last mbuf's next pointer,
>> to jump out of the while loop and avoid a possible cache miss?
>
> Jianbo,
>
> Thanks for the reply!
>
> This idea sounds good, but it won't help the general perf in my
> opinion, since the 2nd cache line is accessed anyway earlier, in
> virtio_enqueue_offload.
>

Yes, you are right. I'm thinking of prefetching it beforehand. And if it's
a chained mbuf, virtio_enqueue_offload will not be called in the next loop.

> Also this would bring a NULL check when actually accessing mbuf->next.
>
> BTW, could you please publish the numbers for:
>
> 1. mrg_rxbuf=on, comparison between original and original + this patch
>
> 2. mrg_rxbuf=off, comparison between original and original + this patch
>
> So we can have a whole picture of how this patch impacts the ARM platform.
>

I think you have already got many results in my previous emails. Sorry,
I can't test right now as I'm busy with other things.
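
For readers following the fix Yuanhan describes above, a minimal sketch of
"put prefetch_avail_idx before invoking enqueue_packet" could look like the
following. prefetch_avail_idx(), enqueue_packet(), vq->next_avail_idx and
NR_AVAIL_IDX_PREFETCH are names taken from the thread; the loop body, the
ring indexing and the use of rte_prefetch0() are assumptions for
illustration, not the actual patch:

    #include <rte_prefetch.h>

    /* Assumed helper: pull a later avail ring entry into cache ahead of
     * use (ring size assumed to be a power of two, as virtio requires). */
    static inline void
    prefetch_avail_idx(struct vhost_virtqueue *vq)
    {
            uint16_t idx = (vq->next_avail_idx + NR_AVAIL_IDX_PREFETCH) &
                           (vq->size - 1);
            rte_prefetch0(&vq->avail->ring[idx]);
    }

    /* Inside the enqueue burst loop: issue the prefetch unconditionally,
     * before the packet is enqueued, instead of guarding it with
     * "if (vq->next_avail_idx >= NR_AVAIL_IDX_PREFETCH)", which never
     * fires because next_avail_idx starts at 0. */
    for (i = 0; i < count; i++) {
            prefetch_avail_idx(vq);
            enqueue_packet(dev, vq, pkts[i]);
    }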
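
And a sketch of the length-based loop exit Jianbo suggests: leave the copy
loop based on the remaining packet length (pkt_len, first mbuf cache line)
rather than on mbuf->next (second cache line). The variable names and the
simplification of copying a whole segment per iteration are illustrative,
not the actual enqueue_packet() code:

    uint32_t pkt_left   = rte_pktmbuf_pkt_len(m);   /* 1st cache line */
    uint32_t mbuf_avail = rte_pktmbuf_data_len(m);  /* 1st cache line */

    /* Exit on remaining packet bytes, so m->next is only dereferenced
     * when a new segment is actually needed. */
    while (pkt_left) {
            if (mbuf_avail == 0) {
                    m = m->next;                    /* chained mbuf */
                    mbuf_avail = rte_pktmbuf_data_len(m);
            }
            uint32_t cpy_len = mbuf_avail;          /* simplified: whole segment */
            /* ... rte_memcpy() from rte_pktmbuf_mtod(m, void *) into the
             *     vring descriptor goes here ... */
            pkt_left   -= cpy_len;
            mbuf_avail -= cpy_len;
    }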