From: Jianbo Liu
Date: Mon, 10 Oct 2016 13:31:48 +0800
To: Yuanhan Liu
Cc: "Wang, Zhihong", Maxime Coquelin, dev@dpdk.org
Subject: Re: [dpdk-dev] [PATCH v3 0/5] vhost: optimize enqueue

On 10 October 2016 at 10:44, Yuanhan Liu wrote:
> On Sun, Oct 09, 2016 at 12:09:07PM +0000, Wang, Zhihong wrote:
>> > > > Tested with testpmd, host: txonly, guest: rxonly
>> > > >
>> > > >     size (bytes)    improvement (%)
>> > > >     64                4.12
>> > > >     128               6
>> > > >     256               2.65
>> > > >     512              -1.12
>> > > >     1024             -7.02
>> > >
>> > > There is a difference between Zhihong's code and the old code that I
>> > > spotted the first time: Zhihong removed the avail_idx prefetch. I
>> > > understand the prefetch becomes a bit tricky when the mrg-rx code path
>> > > is considered; thus, I didn't comment on that.
>> > >
>> > > That's one of the differences that, IMO, could cause a regression. I
>> > > then finally got a chance to add it back.
>> > >
>> > > A rough test shows it greatly improves the performance for the 1400B
>> > > packet size in the "txonly in host and rxonly in guest" case: +33% is
>> > > the number I get with my test server (Ivybridge).
>> >
>> > Thanks Yuanhan!
>> > I'll validate this on x86.
>>
>> Hi Yuanhan,
>>
>> Seems your code doesn't perform correctly. I wrote a new version of
>> the avail idx prefetch but didn't see any perf benefit.
>>
>> To be honest I doubt the benefit of this idea. The previous mrg_off
>> code has this method but doesn't give any benefit.
>
> Good point. I thought of that before, too. But you know I made it in a
> rush, so I didn't think further or test more.
>
> I looked at the code a bit closer this time, and spotted a bug: the
> prefetch actually didn't happen, due to the following code piece:
>
>     if (vq->next_avail_idx >= NR_AVAIL_IDX_PREFETCH) {
>         prefetch_avail_idx(vq);
>         ...
>     }
>
> Since vq->next_avail_idx is set to 0 at the entrance of the enqueue
> path, prefetch_avail_idx() will not be called. The fix is easy though:
> just put prefetch_avail_idx before invoking enqueue_packet.
>
> In summary, Zhihong is right, and I see no more gains with that fix :(
>
> However, as stated, that's kind of the only difference I found between
> your code and the old code, so maybe it's still worthwhile to have a
> test on ARM, Jianbo?

I haven't tested it, but I think there may be no improvement on ARM
either.

A small suggestion for enqueue_packet:

.....
+	/* start copy from mbuf to desc */
+	while (mbuf_avail || mbuf->next) {
.....

Considering that pkt_len is in the first cache line of rte_mbuf (same as
data_len), while the next pointer is in the second cache line, would it
be better to check the total packet length, instead of the last mbuf's
next pointer, to exit the while loop and avoid a possible cache miss?
Something like the rough sketch below.
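
To make the idea concrete, here is an untested fragment of what I mean.
mbuf_avail is the variable already used in your loop; pkt_left,
mbuf_offset, desc_avail and cpy_len are placeholder names I made up for
this sketch. Only rte_pktmbuf_pkt_len()/rte_pktmbuf_data_len() and
RTE_MIN() are real DPDK definitions.

    /* Sketch: drive the copy loop off the total packet length (pkt_len,
     * first cache line of rte_mbuf) instead of testing mbuf->next
     * (second cache line) on every iteration.
     */
    uint32_t pkt_left = rte_pktmbuf_pkt_len(mbuf); /* total bytes to copy */

    while (pkt_left > 0) {
        if (mbuf_avail == 0) {
            /* current segment consumed; only now touch ->next */
            mbuf = mbuf->next;
            mbuf_avail = rte_pktmbuf_data_len(mbuf);
            mbuf_offset = 0;
        }

        cpy_len = RTE_MIN(desc_avail, mbuf_avail);
        /* ... copy cpy_len bytes from mbuf to the desc buffer ... */

        mbuf_avail  -= cpy_len;
        mbuf_offset += cpy_len;
        pkt_left    -= cpy_len;
        /* ... advance desc_avail / desc address as in the original ... */
    }

For a single-segment packet (the common case) the loop then exits without
ever reading the second cache line of the mbuf.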
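
And just so I test the right thing on ARM: I assume the prefetch fix you
describe above is essentially the fragment below. prefetch_avail_idx()
and enqueue_packet() are the names from your mail; the surrounding loop,
count, pkts and dev are only my guess at the context in your experimental
patch, so please correct me if I read it wrong.

    /* Sketch of the fix as I understand it: issue the prefetch
     * unconditionally before each enqueue_packet() call, instead of
     * gating it on next_avail_idx >= NR_AVAIL_IDX_PREFETCH, which is
     * false when the enqueue path starts at next_avail_idx == 0.
     */
    for (i = 0; i < count; i++) {
        /* prefetch the avail entry needed for the next packet */
        prefetch_avail_idx(vq);

        enqueue_packet(dev, vq, pkts[i]); /* error handling omitted */
    }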