From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm0-f41.google.com (mail-wm0-f41.google.com [74.125.82.41]) by dpdk.org (Postfix) with ESMTP id 9269F2C6B for ; Fri, 13 Jan 2017 17:44:13 +0100 (CET) Received: by mail-wm0-f41.google.com with SMTP id c85so72575167wmi.1 for ; Fri, 13 Jan 2017 08:44:13 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=6wind-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=xP1fplQkcsJhztkhncmXUGOth2AaBKekuOFrvWDk+0A=; b=SSZ6tDdG4gbzmsQS/g2orbW7Dyjj7+jiBZ+/KBKmFpazKjfPzNXWjmHAROqwmuIptX R11KR5Ixt5aZYPTYCTY50xHmu5n1PTlwIIYcXRhjbnC59QVkgNHnJhbTmHxyfe2tm274 08vuySolSLdWvvPh9F4UTJfqiN/YKYI33uH/5XadyMRDBdah9udRpeHOEzLvmfWv4Gq8 52V4WpNNDFY/Ucw+2pKwEtlHY/y8kLT8fEHbnBGOArQrCHYC5tsHxnbrh4HuR6J9D2Eq qDc1g87qBU/QvX9GqcDq30Lo9SGhihedfj4ByWNgIDcZky2LQTr8GhiZ+60/C2ZPEqK0 uzQg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=xP1fplQkcsJhztkhncmXUGOth2AaBKekuOFrvWDk+0A=; b=SUTxPC9g4dYc2S09B697rfvbpO/6j62jJVDLxzUeOGbDC5h/3hO0oAQ9ahWmL949It ciXOQMP7YpRoXQr3kX90QwXKyWuIAQbyKcfINayozXNo4QTK+PWdzp0fbEvze8Vb85Lp ZRxVuXc18zKWX9HJGpVmAJ5wnrVkYF4q6eAjYRytgtrtJuA3v9XzKDV1vQZWc1ksKuGl cubvu0n79rtplTa9MfbFdqZKJz5upvfZuaiQY6VdszoBiyPihXoMa2oWAY7OuwwDUOjE Xdmwolzf2tDQB2li9D7fVM52kP10uX5KjgW3tQOYWt8q6FMkvhpME8M+ku1MNs9ARul0 gYrw== X-Gm-Message-State: AIkVDXLHzpAqgTOXttgKMc+cZZJ4E6EuWaRt6n8VVQljjCAImXnfboateopIaJ4x9VnAPSlf X-Received: by 10.28.144.66 with SMTP id s63mr3160550wmd.134.1484325853137; Fri, 13 Jan 2017 08:44:13 -0800 (PST) Received: from platinum (2a01cb0c03c651000226b0fffeed02fc.ipv6.abo.wanadoo.fr. [2a01:cb0c:3c6:5100:226:b0ff:feed:2fc]) by smtp.gmail.com with ESMTPSA id 14sm5417823wmk.1.2017.01.13.08.44.12 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Fri, 13 Jan 2017 08:44:13 -0800 (PST) Date: Fri, 13 Jan 2017 17:44:09 +0100 From: Olivier Matz To: dev@dpdk.org Cc: thomas.monjalon@6wind.com, konstantin.ananyev@intel.com, wenzhuo.lu@intel.com, helin.zhang@intel.com, "Richardson, Bruce" Message-ID: <20170113174409.2f1be0b5@platinum> In-Reply-To: <1479981261-19512-1-git-send-email-olivier.matz@6wind.com> References: <1479981261-19512-1-git-send-email-olivier.matz@6wind.com> X-Mailer: Claws Mail 3.14.1 (GTK+ 2.24.31; x86_64-pc-linux-gnu) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Subject: Re: [dpdk-dev] [RFC 0/9] get Rx and Tx used descriptors X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 13 Jan 2017 16:44:13 -0000 Hi, On Thu, 24 Nov 2016 10:54:12 +0100, Olivier Matz wrote: > This RFC patchset introduces a new ethdev API function > rte_eth_tx_queue_count() which is the tx counterpart of > rte_eth_rx_queue_count(). It implements this API on some > Intel drivers for reference, and it also optimizes the > implementation of rte_eth_rx_queue_count(). > I'm planning to send a new version of this patchset, fixing the issues seen by Ferruh, plus a bug fix in the e1000 implementation. Does anyone have any comment about the new API or about questions raised in the cover letter? Especially about the real meaning of "used descriptor": should it include the descriptors hold by the driver? Any comment about the method (binary search to find the used descriptors)? I'm also wondering about adding rte_eth_tx_descriptor_done() in the API at the same time. Regards, Olivier > The usage of these functions can be: > - on Rx, anticipate that the cpu is not fast enough to process > all incoming packets, and take dispositions to solve the > problem (add more cpus, drop specific packets, ...) > - on Tx, detect that the link is overloaded, and take dispositions > to solve the problem (notify flow control, drop specific > packets) > > The tests I've done (instrumenting testpmd) show that browsing > the descriptors linearly is slow when the ring size increases. > Accessing the head/tail registers through pci is also slow > whatever the size of the ring. A binary search is a good compromise > that gives quite good performance whatever the size of the ring. > > Remaining question are about: > - should we keep this name? I'd say "queue_count" is quite confusing, > and I would expect it returns the number of queues, not the > number of used descriptors > - how shall we count the used descriptors, knowing that the driver > can hold some to free them by bulk, which reduces the effective > size of the ring > > I would be happy to have some feedback about this RFC before > I send it as a patch. > > Here are some helpers to understand the code more easily (I sometimes > make some shortcuts between like 1 pkt == 1 desc). > > RX side > ======= > > - sw advances the tail pointer > - hw advances the head pointer > - the software populates the ring with descs to buffers that are > filled when the hw receives packets > - head == tail means there is no available buffer for hw to receive a > packet > - head points to the next descriptor to be filled > - hw owns all descriptors between [head...tail] > - when a packet is written in a descriptor, the DD (descriptor done) > bit is set, and the head is advanced > - the driver never reads the head (needs a pci transaction), instead > it monitors the DD bit of next descriptor > - when a filled packet is retrieved by the software, the descriptor > has to be populated with a new empty buffer. This is not done for each > packet: the driver holds them and waits until it has many > descriptors to populate, and do it by bulk. > (by the way, it means that the effective size a queue of size=N is > lower than N since these descriptors cannot be used by the hw) > > rxq->rx_tail: current value of the sw tail (the idx of the next > packet to be received). The real tail (hw) can be different since the > driver can hold descriptors. > rxq->nb_rx_hold: number of held descriptors > rxq->rxrearm_nb: same, but for vector driver > rxq->rx_free_thresh: when the number of held descriptors reaches this > threshold, descriptors are populated with buffers to be filled, and > sw advances the tail > > Example with a ring size of 64: > > |----------------------------------------------------------------| > | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx | > | x buffers filled with data by hw x | > | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx | > |----------------------------------------------------------------| > ^hw_tail=20 > ^sw_tail=35 > ^hw_head=54 > <--- nb_hold --> > <-pkts in hw queue-> > > The descriptors marked with 'x' has their DD bit set, the other > (' ') reference empty buffers. > The next packet to be received by software is at index 35. > The software holds 15 descriptors that will be rearmed later. > There are 19 packets waiting in the hw queue. > > We want the function rx_queue_count() to return the number of > "used" descriptors. The first question is: what does that mean > exactly? Should it be pkts_in_hw_queue or pkts_in_hw_queue + nb_hold? > The current implementation returns pkts_in_hw_queue, but including > nb_hold could be useful to know how many descriptors are really > free (= size - used). > > The current implementation checks the DD bit starting from sw_tail, > every 4 packets. It can be quite slow for large rings. An alternative > is to read the head register, but it's also slow. > > This patchset optimizes rx_queue_count() by doing a binary > search (checking for DD) between sw_tail and hw_tail, instead of a > linear search. > > TX side > ======= > > - sw advances the tail pointer > - hw advances the head pointer > - the software populates the ring with full buffers to be sent by > the hw > - head points to the in-progress descriptor. > - sw writes new descriptors at tail > - head == tail means that the transmit queue is empty > - when the hw has processed a descriptor, it sets the DD bit if > the descriptor has the RS (report status) bit. > - the driver never reads the head (needs a pci transaction), instead > it monitors the DD bit of a descriptor that has the RS bit > > txq->tx_tail: sw value for tail register > txq->tx_free_thresh: free buffers if count(free descriptors) < this > value txq->tx_rs_thresh: RS bit is set every X descriptor > txq->tx_next_dd: next desc to scan for DD bit > txq->tx_next_rs: next desc to set RS bit > txq->last_desc_cleaned: last descriptor that have been cleaned > txq->nb_tx_free: number of free descriptors > > Example: > > |----------------------------------------------------------------| > | D R R R | > | ............xxxxxxxxxxxxxxxxxxxxxxxxx | > | <- descs not sent yet -> | > | ............xxxxxxxxxxxxxxxxxxxxxxxxx | > |----------------------------------------------------------------| > ^last_desc_cleaned=8 ^next_rs=47 > ^next_dd=15 ^tail=45 > ^hw_head=20 > > <---- nb_used ---------> > > The hardware is currently processing the descriptor 20 > 'R' means the descriptor has the RS bit > 'D' means the descriptor has the DD + RS bits > 'x' are packets in txq (not sent) > '.' are packet already sent but not freed by sw > > In this example, we have rs_thres=8. On next call to > ixgbe_tx_free_bufs(), some buffers will be freed. > > The new implementation does a binary search (checking for DD) between > next_dd and tail. > > > > Olivier Matz (9): > ethdev: clarify api comments of rx queue count > ethdev: move queue id check in generic layer > ethdev: add handler for Tx queue descriptor count > net/ixgbe: optimize Rx queue descriptor count > net/ixgbe: add handler for Tx queue descriptor count > net/igb: optimize rx queue descriptor count > net/igb: add handler for tx queue descriptor count > net/e1000: optimize rx queue descriptor count > net/e1000: add handler for tx queue descriptor count > > drivers/net/e1000/e1000_ethdev.h | 10 +++- > drivers/net/e1000/em_ethdev.c | 1 + > drivers/net/e1000/em_rxtx.c | 109 > ++++++++++++++++++++++++++++------ drivers/net/e1000/igb_ethdev.c > | 1 + drivers/net/e1000/igb_rxtx.c | 109 > ++++++++++++++++++++++++++++------ drivers/net/i40e/i40e_rxtx.c > | 5 -- drivers/net/ixgbe/ixgbe_ethdev.c | 1 + > drivers/net/ixgbe/ixgbe_ethdev.h | 4 +- > drivers/net/ixgbe/ixgbe_rxtx.c | 123 > +++++++++++++++++++++++++++++++++------ > drivers/net/ixgbe/ixgbe_rxtx.h | 2 + > drivers/net/nfp/nfp_net.c | 6 -- > lib/librte_ether/rte_ethdev.h | 48 +++++++++++++-- 12 files > changed, 344 insertions(+), 75 deletions(-) >