Date: Thu, 26 Oct 2017 14:12:19 +0200
From: Nélio Laranjeiro
To: Matan Azrad
Cc: Ophir Munk, Adrien Mazarguil, "dev@dpdk.org", Thomas Monjalon,
 Olga Shern, Mordechay Haimovsky
Message-ID: <20171026121219.ke3dz7hv4a5zfpih@laranjeiro-vm>
Subject: Re: [dpdk-dev] [PATCH v2 4/7] net/mlx4: merge Tx path functions

On Thu, Oct 26, 2017 at 10:31:06AM +0000, Matan Azrad wrote:
> Hi Nelio
>
> I think the memory barrier discussion is not relevant for this patch
> (if it becomes relevant I will create a new one).
> Please see my comments inline.

That was not my only comment. There is also useless code, such as handling
null segments in packets, which are not allowed in DPDK.

> Regarding this specific patch, I didn't see any comment from you. Do
> you agree with it?
>
> > -----Original Message-----
> > From: Nélio Laranjeiro [mailto:nelio.laranjeiro@6wind.com]
> > Sent: Wednesday, October 25, 2017 10:50 AM
> > To: Ophir Munk
> > Cc: Adrien Mazarguil; dev@dpdk.org; Thomas Monjalon; Olga Shern;
> > Matan Azrad
> > Subject: Re: [dpdk-dev] [PATCH v2 4/7] net/mlx4: merge Tx path functions
> >
> > On Tue, Oct 24, 2017 at 08:36:52PM +0000, Ophir Munk wrote:
> > > Hi,
> > >
> > > On Tuesday, October 24, 2017 4:52 PM, Nélio Laranjeiro wrote:
> > > >
> > > > On Mon, Oct 23, 2017 at 02:21:57PM +0000, Ophir Munk wrote:
> > > > > From: Matan Azrad
> > > > >
> > > > > Merge tx_burst and mlx4_post_send functions to prevent double
> > > > > asking about WQ remain space.
> > > > >
> > > > > This should improve performance.
> > > > >
> > > > > Signed-off-by: Matan Azrad
> > > > > ---
> > > > >  drivers/net/mlx4/mlx4_rxtx.c | 353 +++++++++++++++++++++----------------------
> > > > >  1 file changed, 170 insertions(+), 183 deletions(-)
> > > >
> > > > What are the real expectations you have for the remaining patches
> > > > of the series?
> > > >
> > > > The commit log says "This should improve performance", but there
> > > > are too many barriers at each packet/segment level to improve
> > > > anything.
> > > >
> > > > The point is, mlx4_burst_tx() should write all the WQEs without any
> > > > barrier, as it is processing a burst of packets (whereas Verbs
> > > > functions may only process a single packet).
> > > >
> > > > The only barrier which should be present is the one ensuring that
> > > > all the host memory is flushed before triggering the Tx doorbell.
> > > >
> > >
> > > There is a known ConnectX-3 HW limitation: the first 4 bytes of every
> > > TXWBB (64-byte chunks) should be written in reverse order (from the
> > > last TXWBB to the first TXWBB).
> >
> > This means the first WQE filled by the burst function is the doorbell.
> > In that situation, its first four bytes can be written just before
> > leaving the burst function, after a write memory barrier.
> >
> > Until this first WQE is complete, the NIC won't start processing the
> > packets. Per-packet memory barriers become useless.
>
> I think this is not true, since mlx4 HW can prefetch later TXbbs if their
> first 4 bytes are valid, even though the first WQE is still not valid
> (please read the spec).

On x86 a compiler barrier is enough, because the CPU itself keeps stores in
program order and only the compiler has to be prevented from re-ordering
them. On Arm you need a real memory barrier; DPDK has a macro for that,
rte_io_wmb().

Before triggering the doorbell you must flush the cache; this is the only
place where rte_wmb() should be used.

> > It gives something like:
> >
> >  uint32_t tx_bb_db = 0;
> >  void *first_wqe = NULL;
> >
> >  /*
> >   * Prepare all packets by writing the WQEs without the first 4 bytes
> >   * of the first WQE.
> >   */
> >  for () {
> >  	if (!first_wqe) {
> >  		/* Remember the first WQE and hold back its first 4 bytes. */
> >  		first_wqe = wqe;
> >  		tx_bb_db = foo;
> >  	}
> >  }
> >  /* Leaving: make the WQEs visible, then validate the first one. */
> >  rte_wmb();
> >  *(uint32_t *)first_wqe = tx_bb_db;
> >  return n;
> >
>
> I will check whether we can do 2 loops:
> Write the last 60B of every TXbb.
> Memory barrier.
> Write the first 4B of every TXbb.
>
> > > The last 60 bytes of any TXWBB can be written in any order (before
> > > writing the first 4 bytes).
> > > Is your last statement (using a single barrier) in accordance with
> > > this limitation? Please explain.
> > >
> > > > There are also too many cases handled which are useless in a burst
> > > > situation; this function needs to be rewritten for its minimal use
> > > > case, i.e. processing a valid burst of packets/segments and
> > > > triggering the Tx doorbell at the end of the burst.
> > > >
> > > > Regards,
> > > >
> > > > --
> > > > Nélio Laranjeiro
> > > > 6WIND

Regards,

-- 
Nélio Laranjeiro
6WIND
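
For reference, the two-loop idea quoted above (last 60 bytes of every TXBB,
a barrier, then the first 4 bytes from the last TXBB back to the first) could
be sketched roughly as below. This is only an illustration, not the mlx4
driver code. The names sketch_wmb() and sketch_fill_txbbs() are hypothetical,
a C11 release fence stands in for rte_io_wmb()/rte_wmb(), and plain memory is
used in place of the real WQ ring. Whether the head writes also need barriers
between them on weakly ordered CPUs is exactly the open question of this
thread, so the sketch does not try to settle it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define TXBB_SIZE 64u /* one Tx basic block, 64 bytes as stated in the thread */
#define TXBB_HEAD 4u  /* leading bytes that mark a block as valid */

/* Hypothetical stand-in for rte_io_wmb()/rte_wmb(); not the DPDK macro. */
static inline void
sketch_wmb(void)
{
	atomic_thread_fence(memory_order_release);
}

/*
 * Copy n prepared TXBBs from src into ring in two passes.
 * Both buffers must hold at least n * TXBB_SIZE bytes.
 * Returns the number of blocks written.
 */
static unsigned int
sketch_fill_txbbs(uint8_t *ring, const uint8_t *src, unsigned int n)
{
	unsigned int i;

	assert(ring != NULL && src != NULL);
	/* Pass 1: the last 60 bytes of every block, in any order. */
	for (i = 0; i < n; i++)
		memcpy(ring + i * TXBB_SIZE + TXBB_HEAD,
		       src + i * TXBB_SIZE + TXBB_HEAD,
		       TXBB_SIZE - TXBB_HEAD);
	/* The payload must be visible before any block is marked valid. */
	sketch_wmb();
	/* Pass 2: the first 4 bytes, from the last block back to the first. */
	for (i = n; i-- > 0; )
		memcpy(ring + i * TXBB_SIZE, src + i * TXBB_SIZE, TXBB_HEAD);
	/* Everything must be visible before the doorbell would be rung. */
	sketch_wmb();
	return n;
}
```

A caller would prepare the burst in a scratch buffer, call
sketch_fill_txbbs(ring, scratch, n), and only then ring the doorbell. A real
implementation would write each 4-byte head with a single 32-bit store rather
than memcpy(), and would have to handle ring wrap-around, both of which are
left out here.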