From: Olivier Matz
To: dev@dpdk.org
Cc: thomas.monjalon@6wind.com, konstantin.ananyev@intel.com,
    wenzhuo.lu@intel.com, helin.zhang@intel.com
Date: Thu, 24 Nov 2016 10:54:12 +0100
Message-Id: <1479981261-19512-1-git-send-email-olivier.matz@6wind.com>
Subject: [dpdk-dev] [RFC 0/9] get Rx and Tx used descriptors

This RFC patchset introduces a new ethdev API function,
rte_eth_tx_queue_count(), which is the Tx counterpart of
rte_eth_rx_queue_count(). It implements this API in some Intel drivers
for reference, and it also optimizes the implementation of
rte_eth_rx_queue_count().

These functions can be used to:

- on Rx, anticipate that the CPU is not fast enough to process all
  incoming packets, and take measures to solve the problem (add more
  CPUs, drop specific packets, ...)

- on Tx, detect that the link is overloaded, and take measures to
  solve the problem (notify flow control, drop specific packets)

The tests I've done (instrumenting testpmd) show that browsing the
descriptors linearly becomes slow as the ring size increases.
Accessing the head/tail registers through PCI is also slow, whatever
the size of the ring. A binary search is a good compromise that gives
quite good performance whatever the size of the ring.

The remaining questions are:

- should we keep this name? "queue_count" is quite confusing: I would
  expect it to return the number of queues, not the number of used
  descriptors

- how shall we count the used descriptors, knowing that the driver can
  hold some of them to free them in bulk, which reduces the effective
  size of the ring

I would be happy to have some feedback about this RFC before I send it
as a patch.
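To make the intended usage more concrete, below is a rough sketch of a
congestion check (not part of the patchset). It assumes the new
rte_eth_tx_queue_count() keeps the same prototype as the existing
rte_eth_rx_queue_count(); the check_queue_pressure() helper and the
3/4 threshold are only examples.

#include <stdint.h>
#include <stdio.h>
#include <rte_ethdev.h>

/* Example-only helper: the 3/4 threshold is arbitrary, and
 * rte_eth_tx_queue_count() is the function proposed by this series
 * (assumed to have the same prototype as rte_eth_rx_queue_count()). */
static void
check_queue_pressure(uint8_t port, uint16_t queue, uint16_t ring_size)
{
	int used;

	/* Rx: if most descriptors hold received packets, the cpu is
	 * not consuming them fast enough */
	used = rte_eth_rx_queue_count(port, queue);
	if (used > (ring_size * 3) / 4)
		printf("port %u rxq %u: cpu overloaded (%d used descs)\n",
		       port, queue, used);

	/* Tx: if most descriptors still wait to be sent, the link or
	 * the peer is the bottleneck */
	used = rte_eth_tx_queue_count(port, queue);
	if (used > (ring_size * 3) / 4)
		printf("port %u txq %u: link overloaded (%d used descs)\n",
		       port, queue, used);
}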
Here are some helpers to understand the code more easily (I sometimes
take shortcuts, like treating 1 pkt == 1 desc).

RX side
=======

- sw advances the tail pointer
- hw advances the head pointer
- the software populates the ring with descriptors pointing to buffers
  that are filled when the hw receives packets
- head == tail means there is no available buffer for the hw to
  receive a packet
- head points to the next descriptor to be filled
- hw owns all descriptors between [head...tail]
- when a packet is written to a descriptor, the DD (descriptor done)
  bit is set, and the head is advanced
- the driver never reads the head (it would need a PCI transaction);
  instead, it monitors the DD bit of the next descriptor
- when a filled packet is retrieved by the software, the descriptor
  has to be repopulated with a new empty buffer. This is not done for
  each packet: the driver holds them and waits until it has many
  descriptors to populate, then does it in bulk. (By the way, this
  means that the effective size of a queue of size=N is lower than N,
  since these descriptors cannot be used by the hw.)

rxq->rx_tail: current value of the sw tail (the idx of the next packet
  to be received). The real tail (hw) can be different since the
  driver can hold descriptors.
rxq->nb_rx_hold: number of held descriptors
rxq->rxrearm_nb: same, but for the vector driver
rxq->rx_free_thresh: when the number of held descriptors reaches this
  threshold, descriptors are populated with buffers to be filled, and
  sw advances the tail

Example with a ring size of 64:

 |----------------------------------------------------------------|
 |                    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx          |
 |                    x buffers filled with data by hw x          |
 |                    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx          |
 |----------------------------------------------------------------|
                      ^hw_tail=20    ^sw_tail=35      ^hw_head=54
                      <-- nb_hold --><-pkts in hw queue->

The descriptors marked with 'x' have their DD bit set, the others
(' ') reference empty buffers. The next packet to be received by the
software is at index 35. The software holds 15 descriptors that will
be rearmed later. There are 19 packets waiting in the hw queue.

We want the function rx_queue_count() to return the number of "used"
descriptors. The first question is: what does that mean exactly?
Should it be pkts_in_hw_queue, or pkts_in_hw_queue + nb_hold? The
current implementation returns pkts_in_hw_queue, but including nb_hold
could be useful to know how many descriptors are really free
(= size - used).

The current implementation checks the DD bit starting from sw_tail,
every 4 packets. This can be quite slow for large rings. An
alternative is to read the head register, but it is also slow. This
patchset optimizes rx_queue_count() by doing a binary search (checking
for DD) between sw_tail and hw_tail, instead of a linear search.

TX side
=======

- sw advances the tail pointer
- hw advances the head pointer
- the software populates the ring with full buffers to be sent by
  the hw
- head points to the in-progress descriptor
- sw writes new descriptors at tail
- head == tail means that the transmit queue is empty
- when the hw has processed a descriptor, it sets the DD bit if the
  descriptor has the RS (report status) bit
- the driver never reads the head (it would need a PCI transaction);
  instead, it monitors the DD bit of a descriptor that has the RS bit

txq->tx_tail: sw value for the tail register
txq->tx_free_thresh: free buffers if count(free descriptors) < this
  value
txq->tx_rs_thresh: the RS bit is set every X descriptors
txq->tx_next_dd: next desc to scan for the DD bit
txq->tx_next_rs: next desc on which to set the RS bit
txq->last_desc_cleaned: last descriptor that has been cleaned
txq->nb_tx_free: number of free descriptors

Example with a ring size of 64:

 |----------------------------------------------------------------|
 |               D       R       R       R                        |
 |         ............xxxxxxxxxxxxxxxxxxxxxxxxx                  |
 |                     <-- descs not sent yet ->                  |
 |         ............xxxxxxxxxxxxxxxxxxxxxxxxx                  |
 |----------------------------------------------------------------|
          ^last_desc_cleaned=8
                 ^next_dd=15
                      ^hw_head=20
                                               ^tail=45
                                                 ^next_rs=47
           <------------- nb_used ------------->

The hardware is currently processing descriptor 20.
'R' means the descriptor has the RS bit.
'D' means the descriptor has the DD + RS bits.
'x' are packets in the txq (not sent yet).
'.' are packets already sent but not freed by sw.

In this example, we have rs_thresh=8. On the next call to
ixgbe_tx_free_bufs(), some buffers will be freed.

The new implementation does a binary search (checking for DD) between
next_dd and tail.
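To make the search itself concrete, here is a small standalone sketch
of the binary search on the DD bit (not the actual driver code):
desc_done() and the status array are toy stand-ins for the real
descriptor ring, and the sketch deliberately ignores ring wrap-around
and the Tx-side RS granularity.

#include <stdint.h>
#include <stdio.h>

#define DD_BIT 0x01	/* "descriptor done": hw has filled/processed it */

/* Toy stand-in for the descriptor ring: one status byte per desc.
 * In the real drivers, the DD bit lives in the descriptor status word
 * and must be read with the proper volatile/ordering semantics. */
static int
desc_done(const uint8_t *status, uint16_t idx)
{
	return status[idx] & DD_BIT;
}

/* Count used descriptors between 'first' (sw_tail on Rx, next_dd on
 * Tx) and 'last' by binary search instead of a linear scan. DD bits
 * are set contiguously starting at 'first', so we look for the last
 * descriptor whose DD bit is set. */
static uint16_t
count_used_descs(const uint8_t *status, uint16_t first, uint16_t last)
{
	uint16_t lo = first, hi = last;

	if (!desc_done(status, first))
		return 0;

	while (lo < hi) {
		uint16_t mid = lo + (hi - lo + 1) / 2;	/* upper middle */

		if (desc_done(status, mid))
			lo = mid;	/* used region reaches at least mid */
		else
			hi = mid - 1;	/* boundary is below mid */
	}
	return lo - first + 1;	/* lo == index of last DD-set desc */
}

int
main(void)
{
	uint8_t ring[64] = {0};

	/* mimic the Rx example above: DD set on slots 20..53, sw_tail=35 */
	for (int i = 20; i < 54; i++)
		ring[i] = DD_BIT;

	printf("%u pkts in hw queue\n",
	       (unsigned)count_used_descs(ring, 35, 63));
	return 0;
}

With the values of the Rx example, this returns 19, the number of
packets waiting in the hw queue.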
Olivier Matz (9):
  ethdev: clarify api comments of rx queue count
  ethdev: move queue id check in generic layer
  ethdev: add handler for Tx queue descriptor count
  net/ixgbe: optimize Rx queue descriptor count
  net/ixgbe: add handler for Tx queue descriptor count
  net/igb: optimize rx queue descriptor count
  net/igb: add handler for tx queue descriptor count
  net/e1000: optimize rx queue descriptor count
  net/e1000: add handler for tx queue descriptor count

 drivers/net/e1000/e1000_ethdev.h |  10 +++-
 drivers/net/e1000/em_ethdev.c    |   1 +
 drivers/net/e1000/em_rxtx.c      | 109 ++++++++++++++++++++++++++++------
 drivers/net/e1000/igb_ethdev.c   |   1 +
 drivers/net/e1000/igb_rxtx.c     | 109 ++++++++++++++++++++++++++++------
 drivers/net/i40e/i40e_rxtx.c     |   5 --
 drivers/net/ixgbe/ixgbe_ethdev.c |   1 +
 drivers/net/ixgbe/ixgbe_ethdev.h |   4 +-
 drivers/net/ixgbe/ixgbe_rxtx.c   | 123 +++++++++++++++++++++++++++++++------
 drivers/net/ixgbe/ixgbe_rxtx.h   |   2 +
 drivers/net/nfp/nfp_net.c        |   6 --
 lib/librte_ether/rte_ethdev.h    |  48 +++++++++++++--
 12 files changed, 344 insertions(+), 75 deletions(-)

-- 
2.8.1