DPDK patches and discussions
 help / color / mirror / Atom feed
From: Olivier Matz <olivier.matz@6wind.com>
To: dev@dpdk.org
Cc: thomas.monjalon@6wind.com, konstantin.ananyev@intel.com,
	wenzhuo.lu@intel.com, helin.zhang@intel.com, "Richardson,
	Bruce" <bruce.richardson@intel.com>
Subject: Re: [dpdk-dev] [RFC 0/9] get Rx and Tx used descriptors
Date: Fri, 13 Jan 2017 17:44:09 +0100	[thread overview]
Message-ID: <20170113174409.2f1be0b5@platinum> (raw)
In-Reply-To: <1479981261-19512-1-git-send-email-olivier.matz@6wind.com>

Hi,

On Thu, 24 Nov 2016 10:54:12 +0100, Olivier Matz
<olivier.matz@6wind.com> wrote:
> This RFC patchset introduces a new ethdev API function
> rte_eth_tx_queue_count() which is the tx counterpart of
> rte_eth_rx_queue_count(). It implements this API on some
> Intel drivers for reference, and it also optimizes the
> implementation of rte_eth_rx_queue_count().
> 

I'm planning to send a new version of this patchset, fixing the issues
seen by Ferruh, plus a bug fix in the e1000 implementation.

Does anyone have any comment about the new API or about questions
raised in the cover letter? Especially about the real meaning of "used
descriptor": should it include the descriptors hold by the driver?

Any comment about the method (binary search to find the used
descriptors)?

I'm also wondering about adding rte_eth_tx_descriptor_done() in the API
at the same time.

Regards,
Olivier




> The usage of these functions can be:
> - on Rx, anticipate that the cpu is not fast enough to process
>   all incoming packets, and take dispositions to solve the
>   problem (add more cpus, drop specific packets, ...)
> - on Tx, detect that the link is overloaded, and take dispositions
>   to solve the problem (notify flow control, drop specific
>   packets)
> 
> The tests I've done (instrumenting testpmd) show that browsing
> the descriptors linearly is slow when the ring size increases.
> Accessing the head/tail registers through pci is also slow
> whatever the size of the ring. A binary search is a good compromise
> that gives quite good performance whatever the size of the ring.
> 
> Remaining question are about:
> - should we keep this name? I'd say "queue_count" is quite confusing,
>   and I would expect it returns the number of queues, not the
>   number of used descriptors
> - how shall we count the used descriptors, knowing that the driver
>   can hold some to free them by bulk, which reduces the effective
>   size of the ring
> 
> I would be happy to have some feedback about this RFC before
> I send it as a patch.
> 
> Here are some helpers to understand the code more easily (I sometimes
> make some shortcuts between like 1 pkt == 1 desc).
> 
> RX side
> =======
> 
> - sw advances the tail pointer
> - hw advances the head pointer
> - the software populates the ring with descs to buffers that are
> filled when the hw receives packets
> - head == tail means there is no available buffer for hw to receive a
> packet
> - head points to the next descriptor to be filled
> - hw owns all descriptors between [head...tail]
> - when a packet is written in a descriptor, the DD (descriptor done)
>   bit is set, and the head is advanced
> - the driver never reads the head (needs a pci transaction), instead
> it monitors the DD bit of next descriptor
> - when a filled packet is retrieved by the software, the descriptor
> has to be populated with a new empty buffer. This is not done for each
>   packet: the driver holds them and waits until it has many
> descriptors to populate, and do it by bulk.
>   (by the way, it means that the effective size a queue of size=N is
>   lower than N since these descriptors cannot be used by the hw)
> 
> rxq->rx_tail: current value of the sw tail (the idx of the next
> packet to be received). The real tail (hw) can be different since the
> driver can hold descriptors.
> rxq->nb_rx_hold: number of held descriptors
> rxq->rxrearm_nb: same, but for vector driver
> rxq->rx_free_thresh: when the number of held descriptors reaches this
> threshold, descriptors are populated with buffers to be filled, and
> sw advances the tail
> 
> Example with a ring size of 64:
> 
> |----------------------------------------------------------------|
> |                    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx          |
> |                    x buffers filled with data by hw x          |
> |                    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx          |
> |----------------------------------------------------------------|
>                      ^hw_tail=20
>                                     ^sw_tail=35
>                                                        ^hw_head=54
>                      <--- nb_hold -->
>                                     <-pkts in hw queue->
> 
> The descriptors marked with 'x' has their DD bit set, the other
>   (' ') reference empty buffers.
> The next packet to be received by software is at index 35.
> The software holds 15 descriptors that will be rearmed later.
> There are 19 packets waiting in the hw queue.
> 
> We want the function rx_queue_count() to return the number of
> "used" descriptors. The first question is: what does that mean
> exactly? Should it be pkts_in_hw_queue or pkts_in_hw_queue + nb_hold?
> The current implementation returns pkts_in_hw_queue, but including
> nb_hold could be useful to know how many descriptors are really
> free (= size - used).
> 
> The current implementation checks the DD bit starting from sw_tail,
> every 4 packets. It can be quite slow for large rings. An alternative
> is to read the head register, but it's also slow.
> 
> This patchset optimizes rx_queue_count() by doing a binary
> search (checking for DD) between sw_tail and hw_tail, instead of a
> linear search.
> 
> TX side
> =======
> 
> - sw advances the tail pointer
> - hw advances the head pointer
> - the software populates the ring with full buffers to be sent by
>   the hw
> - head points to the in-progress descriptor.
> - sw writes new descriptors at tail
> - head == tail means that the transmit queue is empty
> - when the hw has processed a descriptor, it sets the DD bit if
>   the descriptor has the RS (report status) bit.
> - the driver never reads the head (needs a pci transaction), instead
> it monitors the DD bit of a descriptor that has the RS bit
> 
> txq->tx_tail: sw value for tail register
> txq->tx_free_thresh: free buffers if count(free descriptors) < this
> value txq->tx_rs_thresh: RS bit is set every X descriptor
> txq->tx_next_dd: next desc to scan for DD bit
> txq->tx_next_rs: next desc to set RS bit
> txq->last_desc_cleaned: last descriptor that have been cleaned
> txq->nb_tx_free: number of free descriptors
> 
> Example:
> 
> |----------------------------------------------------------------|
> |               D       R       R       R                        |
> |        ............xxxxxxxxxxxxxxxxxxxxxxxxx                   |
> |        <descs sent><- descs not sent yet  ->                   |
> |        ............xxxxxxxxxxxxxxxxxxxxxxxxx                   |
> |----------------------------------------------------------------|
>         ^last_desc_cleaned=8                    ^next_rs=47
>                 ^next_dd=15                   ^tail=45
>                      ^hw_head=20
> 
>                      <----  nb_used  --------->
> 
> The hardware is currently processing the descriptor 20
> 'R' means the descriptor has the RS bit
> 'D' means the descriptor has the DD + RS bits
> 'x' are packets in txq (not sent)
> '.' are packet already sent but not freed by sw
> 
> In this example, we have rs_thres=8. On next call to
> ixgbe_tx_free_bufs(), some buffers will be freed.
> 
> The new implementation does a binary search (checking for DD) between
> next_dd and tail.
> 
> 
> 
> Olivier Matz (9):
>   ethdev: clarify api comments of rx queue count
>   ethdev: move queue id check in generic layer
>   ethdev: add handler for Tx queue descriptor count
>   net/ixgbe: optimize Rx queue descriptor count
>   net/ixgbe: add handler for Tx queue descriptor count
>   net/igb: optimize rx queue descriptor count
>   net/igb: add handler for tx queue descriptor count
>   net/e1000: optimize rx queue descriptor count
>   net/e1000: add handler for tx queue descriptor count
> 
>  drivers/net/e1000/e1000_ethdev.h |  10 +++-
>  drivers/net/e1000/em_ethdev.c    |   1 +
>  drivers/net/e1000/em_rxtx.c      | 109
> ++++++++++++++++++++++++++++------ drivers/net/e1000/igb_ethdev.c
> |   1 + drivers/net/e1000/igb_rxtx.c     | 109
> ++++++++++++++++++++++++++++------ drivers/net/i40e/i40e_rxtx.c
> |   5 -- drivers/net/ixgbe/ixgbe_ethdev.c |   1 +
>  drivers/net/ixgbe/ixgbe_ethdev.h |   4 +-
>  drivers/net/ixgbe/ixgbe_rxtx.c   | 123
> +++++++++++++++++++++++++++++++++------
> drivers/net/ixgbe/ixgbe_rxtx.h   |   2 +
> drivers/net/nfp/nfp_net.c        |   6 --
> lib/librte_ether/rte_ethdev.h    |  48 +++++++++++++-- 12 files
> changed, 344 insertions(+), 75 deletions(-)
> 

  parent reply	other threads:[~2017-01-13 16:44 UTC|newest]

Thread overview: 72+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-11-24  9:54 Olivier Matz
2016-11-24  9:54 ` [dpdk-dev] [RFC 1/9] ethdev: clarify api comments of rx queue count Olivier Matz
2016-11-24 10:52   ` Ferruh Yigit
2016-11-24 11:13     ` Olivier Matz
2016-11-24  9:54 ` [dpdk-dev] [RFC 2/9] ethdev: move queue id check in generic layer Olivier Matz
2016-11-24 10:59   ` Ferruh Yigit
2016-11-24 13:05     ` Olivier Matz
2016-11-24  9:54 ` [dpdk-dev] [RFC 3/9] ethdev: add handler for Tx queue descriptor count Olivier Matz
2016-11-24  9:54 ` [dpdk-dev] [RFC 4/9] net/ixgbe: optimize Rx " Olivier Matz
2016-11-24  9:54 ` [dpdk-dev] [RFC 5/9] net/ixgbe: add handler for Tx " Olivier Matz
2016-11-24  9:54 ` [dpdk-dev] [RFC 6/9] net/igb: optimize rx " Olivier Matz
2016-11-24  9:54 ` [dpdk-dev] [RFC 7/9] net/igb: add handler for tx " Olivier Matz
2016-11-24  9:54 ` [dpdk-dev] [RFC 8/9] net/e1000: optimize rx " Olivier Matz
2016-11-24  9:54 ` [dpdk-dev] [RFC 9/9] net/e1000: add handler for tx " Olivier Matz
2017-01-13 16:44 ` Olivier Matz [this message]
2017-01-13 17:32   ` [dpdk-dev] [RFC 0/9] get Rx and Tx used descriptors Richardson, Bruce
2017-01-17  8:24     ` Olivier Matz
2017-01-17 13:56       ` Bruce Richardson
2017-03-01 17:19 ` [dpdk-dev] [PATCH 0/6] get status of Rx and Tx descriptors Olivier Matz
2017-03-01 17:19   ` [dpdk-dev] [PATCH 1/6] ethdev: add descriptor status API Olivier Matz
2017-03-01 18:22     ` Andrew Rybchenko
2017-03-02 13:57       ` Olivier Matz
2017-03-02 14:19         ` Andrew Rybchenko
2017-03-02 14:54           ` Olivier Matz
2017-03-02 15:05             ` Andrew Rybchenko
2017-03-02 15:14               ` Olivier Matz
2017-03-01 17:19   ` [dpdk-dev] [PATCH 2/6] net/ixgbe: implement " Olivier Matz
2017-03-01 17:19   ` [dpdk-dev] [PATCH 3/6] net/e1000: implement descriptor status API (igb) Olivier Matz
2017-03-02  1:28     ` Lu, Wenzhuo
2017-03-02 13:58       ` Olivier Matz
2017-03-01 17:19   ` [dpdk-dev] [PATCH 4/6] net/e1000: implement descriptor status API (em) Olivier Matz
2017-03-02  1:22     ` Lu, Wenzhuo
2017-03-02 14:46       ` Olivier Matz
2017-03-03  1:15         ` Lu, Wenzhuo
2017-03-01 17:19   ` [dpdk-dev] [PATCH 5/6] net/mlx5: implement descriptor status API Olivier Matz
2017-03-02  7:56     ` Nélio Laranjeiro
2017-03-01 17:19   ` [dpdk-dev] [PATCH 6/6] net/i40e: " Olivier Matz
2017-03-01 18:02   ` [dpdk-dev] [PATCH 0/6] get status of Rx and Tx descriptors Andrew Rybchenko
2017-03-02 13:40     ` Olivier Matz
2017-03-06 10:41       ` Thomas Monjalon
2017-03-01 18:07   ` Stephen Hemminger
2017-03-02 13:43     ` Olivier Matz
2017-03-06 10:41       ` Thomas Monjalon
2017-03-02 15:32   ` Bruce Richardson
2017-03-02 16:14     ` Olivier Matz
2017-03-03 16:18       ` Venkatesan, Venky
2017-03-03 16:45         ` Olivier Matz
2017-03-03 18:46           ` Venkatesan, Venky
2017-03-04 20:45             ` Olivier Matz
2017-03-06 11:02               ` Thomas Monjalon
2017-03-07 15:59   ` [dpdk-dev] [PATCH v2 " Olivier Matz
2017-03-07 15:59     ` [dpdk-dev] [PATCH v2 1/6] ethdev: add descriptor status API Olivier Matz
2017-03-09 11:49       ` Andrew Rybchenko
2017-03-21  8:32       ` Yang, Qiming
2017-03-24 12:49         ` Olivier Matz
2017-03-27  1:28           ` Yang, Qiming
2017-03-07 15:59     ` [dpdk-dev] [PATCH v2 2/6] net/ixgbe: implement " Olivier Matz
2017-03-07 15:59     ` [dpdk-dev] [PATCH v2 3/6] net/e1000: implement descriptor status API (igb) Olivier Matz
2017-03-08  1:17       ` Lu, Wenzhuo
2017-03-07 15:59     ` [dpdk-dev] [PATCH v2 4/6] net/e1000: implement descriptor status API (em) Olivier Matz
2017-03-07 15:59     ` [dpdk-dev] [PATCH v2 5/6] net/mlx5: implement descriptor status API Olivier Matz
2017-03-07 15:59     ` [dpdk-dev] [PATCH v2 6/6] net/i40e: " Olivier Matz
2017-03-08  1:17       ` Lu, Wenzhuo
2017-03-29  8:36     ` [dpdk-dev] [PATCH v3 0/6] get status of Rx and Tx descriptors Olivier Matz
2017-03-29  8:36       ` [dpdk-dev] [PATCH v3 1/6] ethdev: add descriptor status API Olivier Matz
2017-03-29  8:36       ` [dpdk-dev] [PATCH v3 2/6] net/ixgbe: implement " Olivier Matz
2017-03-29  8:36       ` [dpdk-dev] [PATCH v3 3/6] net/e1000: implement descriptor status API (igb) Olivier Matz
2017-03-29  8:36       ` [dpdk-dev] [PATCH v3 4/6] net/e1000: implement descriptor status API (em) Olivier Matz
2017-03-29  8:36       ` [dpdk-dev] [PATCH v3 5/6] net/mlx5: implement descriptor status API Olivier Matz
2017-03-29  8:36       ` [dpdk-dev] [PATCH v3 6/6] net/i40e: " Olivier Matz
2017-03-30 13:30       ` [dpdk-dev] [PATCH v3 0/6] get status of Rx and Tx descriptors Thomas Monjalon
2017-04-19 15:50         ` Ferruh Yigit

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170113174409.2f1be0b5@platinum \
    --to=olivier.matz@6wind.com \
    --cc=bruce.richardson@intel.com \
    --cc=dev@dpdk.org \
    --cc=helin.zhang@intel.com \
    --cc=konstantin.ananyev@intel.com \
    --cc=thomas.monjalon@6wind.com \
    --cc=wenzhuo.lu@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).