From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wr0-f196.google.com (mail-wr0-f196.google.com [209.85.128.196]) by dpdk.org (Postfix) with ESMTP id BB43E1BAEE for ; Thu, 26 Oct 2017 15:44:35 +0200 (CEST) Received: by mail-wr0-f196.google.com with SMTP id z55so3235068wrz.1 for ; Thu, 26 Oct 2017 06:44:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=6wind-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:content-transfer-encoding:in-reply-to :user-agent; bh=259RlH5eIXDE7e5HPO+snfOOjPL0inbhfxmK+9WrGMk=; b=QKu51msh7VMBXeChP0WdMlxtlzFBTM164L3F3ewB8LtLAe3fEJlE6PNTbT96bdEfrt eZnsuhJ2QWm7sVBPZggGXfjMCaRtyPiOcTHQChlSiBEGf7kWku2qLGuZoqEBe7CZOzaA 9rZN6W57gMSxOxTpYLUVTe1mPzqG9nxaET525QItQFDQDGW3F3UwEZrLnveu3q2E/rto k+kMVaZX5l6JVlINmuqvfSRKcShtxp0Bmtj05zJh6EehzrJ2oDXhQWW4hygG6jimObvA 8umkPA3518dxXTErCGeU1K9Jaa43N0zReR5rmj3ysbY+NNTVuwsRdl9Ol9UPWU7lXF+H EyWg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:content-transfer-encoding :in-reply-to:user-agent; bh=259RlH5eIXDE7e5HPO+snfOOjPL0inbhfxmK+9WrGMk=; b=A/mpbV4Q7/OpKUqi++fBtr33uby14PazWsuVtYyau0jrqCFFF5L87z/AsRnnXIFNzt kU1IqewEVSUoidP93BHY1+k3SwHQmWMmeOY40TH0GmIGwx68thearQMwBh23RQZlT2Fv i/GmdB9qgr9WRhJ/Ix9c96JKzsAPB9+GigDDJPppW4+nIfWMIXWdfboGVsm6Y+11x3+K cr8z3fd7BzxA29GoeDpBTgzAVpH0W5OjOhnr145EYPI/mhXaTmzfccDyaiWc7kkI0Exc WvwPHV42eSMyVYKnxgh4whoGVfypSwCjton3Pac/nstFxSr4vhsSxEwY1+aejFQL366S K2mg== X-Gm-Message-State: AMCzsaUB9Pt/7Fgv+malL2gm6Q9IjR/BOj97uA3F6vzQMIGYkz38MAL2 FXtOHIx6AYSggrwwY6X131J5 X-Google-Smtp-Source: ABhQp+RlKC0U7ScYoO8I59zwcXZ5f/t641U6M3qvoNneuOax26vewjeoepeRqFmg9HT32HlqGr0VmA== X-Received: by 10.223.186.140 with SMTP id p12mr5212351wrg.235.1509025475124; Thu, 26 Oct 2017 06:44:35 -0700 (PDT) Received: from laranjeiro-vm (host.78.145.23.62.rev.coltfrance.com. [62.23.145.78]) by smtp.gmail.com with ESMTPSA id q4sm1100542wmd.19.2017.10.26.06.44.34 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Thu, 26 Oct 2017 06:44:34 -0700 (PDT) Date: Thu, 26 Oct 2017 15:44:24 +0200 From: =?iso-8859-1?Q?N=E9lio?= Laranjeiro To: Matan Azrad Cc: Ophir Munk , Adrien Mazarguil , "dev@dpdk.org" , Thomas Monjalon , Olga Shern , Mordechay Haimovsky Message-ID: <20171026134424.6hww2zyc3crbe322@laranjeiro-vm> References: <1508752838-30408-1-git-send-email-ophirmu@mellanox.com> <1508768520-4810-1-git-send-email-ophirmu@mellanox.com> <1508768520-4810-5-git-send-email-ophirmu@mellanox.com> <20171024135149.fyg4nzcbygo2amtz@laranjeiro-vm> <20171025075006.znxl7mezy4pfyzsj@laranjeiro-vm> <20171026121219.ke3dz7hv4a5zfpih@laranjeiro-vm> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: User-Agent: NeoMutt/20170113 (1.7.2) Subject: Re: [dpdk-dev] [PATCH v2 4/7] net/mlx4: merge Tx path functions X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 26 Oct 2017 13:44:35 -0000 On Thu, Oct 26, 2017 at 12:30:54PM +0000, Matan Azrad wrote: > Hi Nelio > Please see my comments below (3). > > > > -----Original Message----- > > From: Nélio Laranjeiro [mailto:nelio.laranjeiro@6wind.com] > > Sent: Thursday, October 26, 2017 3:12 PM > > To: Matan Azrad > > Cc: Ophir Munk ; Adrien Mazarguil > > ; dev@dpdk.org; Thomas Monjalon > > ; Olga Shern ; Mordechay > > Haimovsky > > Subject: Re: [dpdk-dev] [PATCH v2 4/7] net/mlx4: merge Tx path functions > > > > On Thu, Oct 26, 2017 at 10:31:06AM +0000, Matan Azrad wrote: > > > Hi Nelio > > > > > > I think the memory barrier discussion is not relevant for this patch > > > (if it will be relevant I will create new one). > > > Please see my comments inline. > > > > It was not my single comment. There is also useless code like having null > > segments in the packets which is not allowed on DPDK. > > Sorry, but I can't find comments in the previous mails. You should search in the series, > Moreover this comment(first time I see it) is not relevant to this patch and asking something else. > All what this patch does is to merge 2 functions to prevent double > asking about WQ remain space... Again in the series itself. The point, this series embed 7 patches for "performance improvement", whereas the single improvement is avoiding to call an outside function by copy/pasting it into the PMD. In fact it will save few cycles, but this improvements could have been much more if the it was not a bare copy/paste. The real question is what is the improvement? If the improvement is significant, it worse having this series, otherwise it does not as it may also bring some bugs which may be resolve from its original source whereas this one will remain. > Remove memory\compiler barriers or dealing with null segments are not in the scope here. > > > > > > Regarding this specific patch, I didn't see any comment from you, Are > > > you agree with it? > > > > > > > -----Original Message----- > > > > From: Nélio Laranjeiro [mailto:nelio.laranjeiro@6wind.com] > > > > Sent: Wednesday, October 25, 2017 10:50 AM > > > > To: Ophir Munk > > > > Cc: Adrien Mazarguil ; dev@dpdk.org; > > > > Thomas Monjalon ; Olga Shern > > > > ; Matan Azrad > > > > Subject: Re: [dpdk-dev] [PATCH v2 4/7] net/mlx4: merge Tx path > > > > functions > > > > > > > > On Tue, Oct 24, 2017 at 08:36:52PM +0000, Ophir Munk wrote: > > > > > Hi, > > > > > > > > > > On Tuesday, October 24, 2017 4:52 PM, Nélio Laranjeiro wrote: > > > > > > > > > > > > On Mon, Oct 23, 2017 at 02:21:57PM +0000, Ophir Munk wrote: > > > > > > > From: Matan Azrad > > > > > > > > > > > > > > Merge tx_burst and mlx4_post_send functions to prevent double > > > > > > > asking about WQ remain space. > > > > > > > > > > > > > > This should improve performance. > > > > > > > > > > > > > > Signed-off-by: Matan Azrad > > > > > > > --- > > > > > > > drivers/net/mlx4/mlx4_rxtx.c | 353 > > > > > > > +++++++++++++++++++++---------------------- > > > > > > > 1 file changed, 170 insertions(+), 183 deletions(-) > > > > > > > > > > > > What are the real expectation you have on the remaining patches > > > > > > of the series? > > > > > > > > > > > > According to the comment of this commit log "This should improve > > > > > > performance" there are too many barriers at each packet/segment > > > > > > level to improve something. > > > > > > > > > > > > The point is, mlx4_burst_tx() should write all the WQE without > > > > > > any barrier as it is processing a burst of packets (whereas > > > > > > Verbs functions which may only process a single packet). > > > > > > > > > > > The lonely barrier which should be present is the one to ensure > > > > > > that all the host memory is flushed before triggering the Tx doorbell. > > > > > > > > > > > > > > > > There is a known ConnectX-3 HW limitation: the first 4 bytes of > > > > > every TXWBB (64 bytes chunks) should be written in a reversed > > > > > order (from last TXWBB to first TXWBB). > > > > > > > > This means the first WQE filled by the burst function is the doorbell. > > > > In such situation, the first four bytes of it can be written before > > > > leaving the burst function and after a write memory barrier. > > > > > > > > Until this first WQE is not complete, the NIC won't start processing > > > > the packets. Memory barriers per packets becomes useless. > > > > > > I think this is not true, Since mlx4 HW can prefetch advanced TXbbs if > > > their first 4 bytes are valid in spite of the first WQE is still not valid (please > > read the spec). > > > > A compiler barrier is enough on x86 to forbid the CPU to re-order the > > instructions, on arm you need a memory barrier, there is a macro in DPDK for > > that, rte_io_wmb(). > > > We are also using compiler barrier here. > > > Before triggering the doorbell you must flush the case, this is the only place > > where the rte_wmb() should be used. > > > > We are also using memory barrier only for this reason. > > > > > It gives something like: > > > > > > > > uint32_t tx_bb_db = 0; > > > > void *first_wqe = NULL; > > > > > > > > /* > > > > * Prepare all Packets by writing the WQEs without the 4 first bytes of > > > > * the first WQE. > > > > */ > > > > for () { > > > > if (!wqe) { > > > > first_wqe = wqe; > > > > tx_bb_db = foo; > > > > } > > > > } > > > > /* Leaving. */ > > > > rte_wmb(); > > > > *(uin32_t*)wqe = tx_bb_db; > > > > return n; > > > > > > > > > > I will take care to check if we can do 2 loops: > > > Write all last 60B per TXbb. > > > Memory barrier. > > > Write all first 4B per TXbbs. > > > > > > > > The last 60 bytes of any TXWBB can be written in any order (before > > > > > writing the first 4 bytes). > > > > > Is your last statement (using lonely barrier) is in accordance > > > > > with this limitation? Please explain. > > > > > > > > > > > There is also too many cases handled which are useless in bursts > > > > situation, > > > > > > this function needs to be re-written to its minimal use case i.e. > > > > processing a > > > > > > valid burst of packets/segments and triggering at the end of the > > > > > > burst the > > > > Tx > > > > > > doorbell. > > > > > > > > > > > > > > Regards, > > > > > > > > -- > > > > Nélio Laranjeiro > > > > 6WIND > > > > Regards, > > > > -- > > Nélio Laranjeiro > > 6WIND -- Nélio Laranjeiro 6WIND