From: Nélio Laranjeiro <nelio.laranjeiro@6wind.com>
Date: Mon, 23 Oct 2017 09:50:14 +0200
To: Yongseok Koh
Cc: Sagi Grimberg, adrien.mazarguil@6wind.com, nelio.laranjeiro@6wind.com, dev@dpdk.org, stable@dpdk.org, Alexander Solganik
Message-ID: <20171023075014.bzubdh2y37s3dirk@laranjeiro-vm>
References: <20171022080022.13528-1-yskoh@mellanox.com> <20171022220103.GA18571@minint-98vp2qg>
In-Reply-To: <20171022220103.GA18571@minint-98vp2qg>
Subject: Re: [dpdk-dev] [PATCH] net/mlx5: fix Tx doorbell memory barrier

Yongseok, Sagi, my small contribution to this discussion,

On Sun, Oct 22, 2017 at 03:01:04PM -0700, Yongseok Koh wrote:
> On Sun, Oct 22, 2017 at 12:46:53PM +0300, Sagi Grimberg wrote:
> >
> > > Configuring UAR as IO-mapped makes maximum throughput decline by a
> > > noticeable amount. If UAR is configured as a write-combining register,
> > > a write memory barrier is needed on ringing a doorbell. rte_wmb() is
> > > mostly effective when the size of a burst is comparatively small.
> >
> > Personally I don't think that the flag is really a good interface
> > choice. But also I'm not convinced that it's dependent on the burst
> > size.
> >
> > What guarantees that even for larger bursts the mmio write was flushed?
> > it comes after a set of writes that were flushed prior to the db update,
> > and it's not guaranteed that the application will immediately have more
> > data to trigger these writes to flush.
>
> Yes, I was already aware of the concern. I don't know whether you knew,
> but that can only happen when the burst size is an exact multiple of 32
> in the vectorized Tx. If you look at mlx5_tx_burst_raw_vec(), every Tx
> burst having more than 32 packets will call txq_burst_v() more than once.
> For example, if pkts_n is 45, it will first call txq_burst_v(32), and
> txq_burst_v(13) will follow, setting the barrier at the end. The only
> pitfall is when pkts_n is an exact multiple of 32, e.g. 32, 64, 96 and so
> on. This is not likely when an app is forwarding packets and the rate is
> low (if the packet rate is high, we are good).
>
> So, the only possible case is when an app generates traffic at a
> comparatively low rate in a bursty way, with the burst size being a
> multiple of 32.

A routing application will consume far more than the 50 cycles the PMD
needs to process such a burst. This is not a rare case, it is the real one:
the routing lookup, parsing the packet to find the layers, and modifying
them (decreasing the TTL, changing addresses and updating the checksum) is
not fast. The probability of a full 32-packet burst entering the PMD is
something "normal".

> If a user encounters such a rare case and latency is critical in their
> app, we will recommend setting MLX5_SHUT_UP_BF=1, either by exporting it
> in a shell or by embedding it in the app's initialization. Or, they can
> use the other, non-vectorized tx_burst() functions, because the barrier
> is still enforced in those functions, as you first suggested.
>
> It is always true that we can't make everyone satisfied. Some apps prefer
> better performance to better latency. As vectorized Tx outperforms all
> the other tx_burst() functions, I want to leave it as the only
> exceptional case.
> Actually, we already received a complaint that 1-core performance of the
> vPMD declined by 10% (53Mpps -> 48Mpps) due to the patch
> (MLX5_SHUT_UP_BF=1). So, I wanted to give users/apps more versatile
> options/knobs.

DPDK is written for throughput; that is the reason why the Tx/Rx burst
functions are "burst" functions, written on the assumption that they will
process a large number of packets at each call, precisely to amortize the
cost of all the memory barriers and doorbells. If some applications also
need latency, it is maybe time to introduce a new API for that, i.e. one
dedicated to sending/receiving a single packet (or a few more) in an
efficient way. By the way, I am not sure such applications handle such big
burst sizes; Sagi, you may have more information that you may want to
share.

> Before sending out this patch, I've done RFC2544 latency tests with Ixia
> and the result was as good as before (actually the same). That's why we
> think it is a good compromise.

You cannot do that with testpmd; it does not match real application
behavior, as it receives a burst of packets and sends them back without
touching them. An application will at least process/route all received
packets to some other destination or port. The send action will only be
triggered once the whole routing process is finished, to maximize the burst
sizes. Depending on the traffic, the latency will change. From what I know,
we don't have such an example/tool in DPDK.

> Thanks for your comment,
> Yongseok

Thanks,

-- 
Nélio Laranjeiro
6WIND