DPDK patches and discussions
 help / color / Atom feed
From: "Di, ChenxuX" <chenxux.di@intel.com>
To: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>,
	"dev@dpdk.org" <dev@dpdk.org>
Cc: "Yang, Qiming" <qiming.yang@intel.com>
Subject: Re: [dpdk-dev] [PATCH v6 3/4] net/ixgbe: cleanup Tx buffers
Date: Tue, 7 Jan 2020 10:46:13 +0000
Message-ID: <3B926E44943CB04AA3A39AC16328CE39B9440F@SHSMSX101.ccr.corp.intel.com> (raw)
In-Reply-To: <SN6PR11MB25585FDADFE1F159829B305D9A3C0@SN6PR11MB2558.namprd11.prod.outlook.com>

Hi

> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Monday, January 6, 2020 9:26 PM
> To: Di, ChenxuX <chenxux.di@intel.com>; dev@dpdk.org
> Cc: Yang, Qiming <qiming.yang@intel.com>
> Subject: RE: [dpdk-dev] [PATCH v6 3/4] net/ixgbe: cleanup Tx buffers
> 
> 
> 
> > > > > > + * tx_tail is the last sent packet on the sw_ring. Goto the
> > > > > > + end
> > > > > > + * of that packet (the last segment in the packet chain) and
> > > > > > + * then the next segment will be the start of the oldest
> > > > > > + segment
> > > > > > + * in the sw_ring.
> > > > >
> > > > > Not sure I understand the sentence above.
> > > > > tx_tail is the value of TDT HW register (most recently armed by SW TD).
> > > > > last_id  is the index of last descriptor for multi-seg packet.
> > > > > next_id is just the index of next descriptor in HW TD ring.
> > > > > How do you conclude that it will be the ' oldest segment in the sw_ring'?
> > > > >
> > > >
> > > > The tx_tail is the last sent packet on the sw_ring. While the
> > > > xmit_cleanup or Tx_free_bufs will be call when the nb_tx_free <
> > > tx_free_thresh .
> > > > So the sw_ring[tx_tail].next_id must be the begin of mbufs which
> > > > are not used or  Already freed . then begin the loop until the
> > > > mbuf is used and
> > > begin to free them.
> > > >
> > > >
> > > >
> > > > > Another question why do you need to write your own functions?
> > > > > Why can't you reuse existing ixgbe_xmit_cleanup() for
> > > > > full(offload) path and
> > > > > ixgbe_tx_free_bufs() for simple path?
> > > > > Yes,  ixgbe_xmit_cleanup() doesn't free mbufs, but at least it
> > > > > could be used to determine finished TX descriptors.
> > > > > Based on that you can you can free appropriate sw_ring[] entries.
> > > > >
> > > >
> > > > The reason why I don't reuse existing function is that they all
> > > > free several mbufs While the free_cnt of the API
> > > > rte_eth_tx_done_cleanup() is the
> > > number of packets.
> > > > It also need to be done that check which mbuffs are from the same packet.
> > >
> > > At first, I don't see anything bad if tx_done_cleanup() will free
> > > only some segments from the packet. As long as it is safe - there is no
> problem with that.
> > > I think rte_eth_tx_done_cleanup() operates on mbuf, not packet quantities.
> > > But in our case I think it doesn't matter, as ixgbe_xmit_cleanup()
> > > mark TXDs as free only when HW is done with all TXDs for that packet.
> > > As long as there is a way to reuse existing code and avoid
> > > duplication (without introducing any degradation) - we should use it.
> > > And I think there is a very good opportunity here to reuse existing
> > > ixgbe_xmit_cleanup() for tx_done_cleanup() implementation.
> > > Moreover because your code doesn’t follow
> > > ixgbe_xmit_pkts()/ixgbe_xmit_cleanup()
> > > logic and infrastructure, it introduces unnecessary scans over TXD
> > > ring, and in some cases doesn't work as expected:
> > >
> > > +while (1) {
> > > +tx_last = sw_ring[tx_id].last_id;
> > > +
> > > +if (sw_ring[tx_last].mbuf) {
> > > +if (txr[tx_last].wb.status &
> > > +IXGBE_TXD_STAT_DD) {
> > > ...
> > > +} else {
> > > +/*
> > > + * mbuf still in use, nothing left to
> > > + * free.
> > > + */
> > > +break;
> > >
> > > It is not correct to expect that IXGBE_TXD_STAT_DD will be set on
> > > last TXD for
> > > *every* packet.
> > > We set IXGBE_TXD_CMD_RS bit only on threshold packet last descriptor.
> > > Plus ixgbe_xmit_cleanup() can cleanup TXD wb.status.
> > >
> > > So I strongly recommend to reuse ixgbe_xmit_cleanup() here.
> > > It would be much less error prone and will help to avoid code duplication.
> > >
> > > Konstantin
> > >
> >
> > At first.
> > The function ixgbe_xmit_cleanup(struct ixgbe_tx_queue *txq) will cleanup
> TXD wb.status.
> > the number of status cleanuped is always txq->tx_rs_thresh.
> 
> Yes, and what's wrong with it?
> 
> >
> > The  API rte_eth_tx_done_cleanup() in rte_eth_dev.h show that @param
> > free_cnt
> >  *   Maximum number of packets to free. Use 0 to indicate all possible packets
> >  *   should be freed. Note that a packet may be using multiple mbufs.
> 
> I don't think it is a good approach, better would be to report number of freed
> mbufs, but ok, as it is a public API, we probably need to keep it as it is.
> 
> > a number must be set while ixgbe_xmit_cleanup and ixgbe_tx_free_bufs only
> have one parameter txq.
> 
> Yes, ixgbe_xmit_cleanup() cleans up at least txq->tx_rs_thresh TXDs.
> So if user requested more packets to be freed we can call ixgbe_xmit_cleanup()
> in a loop.
> 

That is a great idea and I discuss with my workmate about it today. there is also some
Question that we don’t confirm. 
Actually it can call ixgbe_xmit_cleanup() in a loop if user requested more packets,
How to deal with the MOD. For example:
	The default tx_rs_thresh is 32.if the count of mbufs we need free is 50.
	We can call ixgbe_xmit_cleanup() one time to free 32 mbufs.
	Then how about other 18 mbufs. 
	1.If we do nothing, it looks not good. 
	2.if we call ixgbe_xmit_cleanup() successfully, we free 14 mbufs more.
	3.if we call ixgbe_xmit_cleanup() fail, the status of No.32 mbufs is not DD
	  While No .18 is DD. So we do not free 18 mbufs what we can and should free.

We have try some plans about it, likes add parameter for ixgbe_xmit_cleanup(), change 
 Tx_rs_thresh or copy the code or ixgbe_xmit_cleanup() as a new function with more 
Parameter. But all of them seem not perfect. 

So  can you give some comment about it? It seems not easy as we think by reuse function.


> > And what should do is not only free buffers and status but also check
> > which bufs are from  One packet and count the packet freed.
> 
> ixgbe_xmit_cleanup() itself doesn't free mbufs itself.
> It only cleans up TXDs.
> So in tx_done_cleanup() after calling ixgbe_xmit_cleanup() you'll still need to go
> through sw_ring[] entries that correspond to free TXDs and call mbuf_seg_free().
> You can count number of full packets here.
> 
> > So I think it can't be implemented that reuse function xmit_cleanup without
> change it.
> > And create a new function with the code of xmit_cleanup will cause many
> duplication.
> 
> I don't think it would.
> I think all we need here is something like that (note it is schematic one, doesn't
> take into accounf that TXD ring is circular):
> 
> tx_done_cleanup(..., uint32_t cnt)
> {
>       /* we have txq->nb_tx_free TXDs starting from txq->tx_tail.
>            Scan them first and free as many mbufs as we can.
>            If we need more mbufs to free call  ixgbe_xmit_cleanup()
>            to free more TXDs. */
> 
>        swr = txq->sw_ring;
>        txr     = txq->tx_ring;
>        id   = txq->tx_tail;
>        free =  txq->nb_tx_free;
> 
>        for (n = 0; n < cnt && free != 0; ) {
> 
>           for (j = 0; j != free && n < cnt; j++) {
>              swe = &swr[id + j];
>              if (swe->mbuf != NULL) {
>                    rte_pktmbuf_free_seg(swe->mbuf);
>                    swe->mbuf = NULL;
>              }
>              n += (swe->last_id == id + j)
>           }
> 
>           if (n < cnt) {
>                ixgbe_xmit_cleanup(txq);
>                free =   txq->nb_tx_free - free;
>           }
>      }
>      return n;
> }
> 
> >
> > Above all , it seem not a perfect idea to reuse ixgbe_xmit_cleanup().
> 
> Totally disagree, see above.
> 
> >
> > Second.
> > The function in patch is copy from code in igb_rxtx.c. it already
> > updated in 2017, The commit id is
> 8d907d2b79f7a54c809f1c44970ff455fa2865e1.
> 
> I realized that.
> But I think it as a problem, not a positive thing.
> While they do have some similarities, igb abd ixgbe are PMDs for different
> devices, and their TX code differs quite a lot. Let say igb doesn't use
> tx_rs_threshold, but instead set RS bit for each last TXD.
> So, just blindly copying tx_done_cleanup() from igb to ixgbe doesn't look like a
> good idea to me.
> 
> > I trust the logic of code is right.
> > Actually it don't complete for ixgbe, i40e and ice, while it don't
> > change the value of last_desc_cleaned and tx_next_dd. And it's
> > beginning prefer last_desc_cleaned or  tx_next_dd(for offload or simple) to
> tx_tail.
> >
> > So, I suggest to use the old function and fix the issue.
> >
> > > >
> > > >
> > > > > >This is the first packet that will be
> > > > > > + * attempted to be freed.
> > > > > > + */
> > > > > > +
> > > > > > +/* Get last segment in most recently added packet. */ tx_last
> > > > > > += sw_ring[txq->tx_tail].last_id;
> > > > > > +
> > > > > > +/* Get the next segment, which is the oldest segment in ring.
> > > > > > +*/ tx_first = sw_ring[tx_last].next_id;
> > > > > > +
> > > > > > +/* Set the current index to the first. */ tx_id = tx_first;
> > > > > > +
> > > > > > +/*
> > > > > > + * Loop through each packet. For each packet, verify that an
> > > > > > + * mbuf exists and that the last segment is free. If so, free
> > > > > > + * it and move on.
> > > > > > + */
> > > > > > +while (1) {
> > > > > > +tx_last = sw_ring[tx_id].last_id;
> > > > > > +
> > > > > > +if (sw_ring[tx_last].mbuf) {
> > > > > > +if (!(txr[tx_last].wb.status &
> > > > > > +IXGBE_TXD_STAT_DD))
> > > > > > +break;
> > > > > > +
> > > > > > +/* Get the start of the next packet. */ tx_next =
> > > > > > +sw_ring[tx_last].next_id;
> > > > > > +
> > > > > > +/*
> > > > > > + * Loop through all segments in a
> > > > > > + * packet.
> > > > > > + */
> > > > > > +do {
> > > > > > +rte_pktmbuf_free_seg(sw_ring[tx_id].mbuf);
> > > > > > +sw_ring[tx_id].mbuf = NULL;
> > > > > > +sw_ring[tx_id].last_id = tx_id;
> > > > > > +
> > > > > > +/* Move to next segment. */
> > > > > > +tx_id = sw_ring[tx_id].next_id;
> > > > > > +
> > > > > > +} while (tx_id != tx_next);
> > > > > > +
> > > > > > +/*
> > > > > > + * Increment the number of packets
> > > > > > + * freed.
> > > > > > + */
> > > > > > +count++;
> > > > > > +
> > > > > > +if (unlikely(count == (int)free_cnt)) break; } else {
> > > > > > +/*
> > > > > > + * There are multiple reasons to be here:
> > > > > > + * 1) All the packets on the ring have been
> > > > > > + *    freed - tx_id is equal to tx_first
> > > > > > + *    and some packets have been freed.
> > > > > > + *    - Done, exit
> > > > > > + * 2) Interfaces has not sent a rings worth of
> > > > > > + *    packets yet, so the segment after tail is
> > > > > > + *    still empty. Or a previous call to this
> > > > > > + *    function freed some of the segments but
> > > > > > + *    not all so there is a hole in the list.
> > > > > > + *    Hopefully this is a rare case.
> > > > > > + *    - Walk the list and find the next mbuf. If
> > > > > > + *      there isn't one, then done.
> > > > > > + */
> > > > > > +if (likely(tx_id == tx_first && count != 0)) break;
> > > > > > +
> > > > > > +/*
> > > > > > + * Walk the list and find the next mbuf, if any.
> > > > > > + */
> > > > > > +do {
> > > > > > +/* Move to next segment. */
> > > > > > +tx_id = sw_ring[tx_id].next_id;
> > > > > > +
> > > > > > +if (sw_ring[tx_id].mbuf)
> > > > > > +break;
> > > > > > +
> > > > > > +} while (tx_id != tx_first);
> > > > > > +
> > > > > > +/*
> > > > > > + * Determine why previous loop bailed. If there
> > > > > > + * is not an mbuf, done.
> > > > > > + */
> > > > > > +if (sw_ring[tx_id].mbuf == NULL) break; } }
> > > > > > +
> > > > > > +return count;
> > > > > > +}
> > > > > > +
> > > > > >  static void __attribute__((cold))
> > > > > > ixgbe_tx_free_swring(struct ixgbe_tx_queue *txq)  { diff --git
> > > > > > a/drivers/net/ixgbe/ixgbe_rxtx.h
> > > > > > b/drivers/net/ixgbe/ixgbe_rxtx.h index 505d344b9..2c3770af6
> > > > > > 100644
> > > > > > --- a/drivers/net/ixgbe/ixgbe_rxtx.h
> > > > > > +++ b/drivers/net/ixgbe/ixgbe_rxtx.h
> > > > > > @@ -285,6 +285,8 @@ int
> > > > > > ixgbe_rx_vec_dev_conf_condition_check(struct
> > > > > > rte_eth_dev *dev);  int ixgbe_rxq_vec_setup(struct
> > > > > > ixgbe_rx_queue *rxq);  void
> > > > > > ixgbe_rx_queue_release_mbufs_vec(struct
> > > > > > ixgbe_rx_queue *rxq);
> > > > > >
> > > > > > +int ixgbe_tx_done_cleanup(void *txq, uint32_t free_cnt);
> > > > > > +
> > > > > >  extern const uint32_t ptype_table[IXGBE_PACKET_TYPE_MAX];
> > > > > >  extern const uint32_t
> > > > > > ptype_table_tn[IXGBE_PACKET_TYPE_TN_MAX];
> > > > > >
> > > > > > --
> > > > > > 2.17.1
> > > > >
> > > >
> > >
> >
> 


  reply index

Thread overview: 74+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-12-03  5:51 [dpdk-dev] [PATCH 0/4] drivers/net: " Chenxu Di
2019-12-03  5:51 ` [dpdk-dev] [PATCH 1/4] net/fm10k: " Chenxu Di
2019-12-03  5:51 ` [dpdk-dev] [PATCH 2/4] net/i40e: " Chenxu Di
2019-12-03  5:51 ` [dpdk-dev] [PATCH 3/4] net/ice: " Chenxu Di
2019-12-03  5:51 ` [dpdk-dev] [PATCH 4/4] net/ixgbe: " Chenxu Di
2019-12-20  3:02 ` [dpdk-dev] [PATCH v2 0/5] drivers/net: " Chenxu Di
2019-12-20  3:02   ` [dpdk-dev] [PATCH v2 1/5] net/fm10k: " Chenxu Di
2019-12-20  3:02   ` [dpdk-dev] [PATCH v2 2/5] net/i40e: " Chenxu Di
2019-12-20  3:02   ` [dpdk-dev] [PATCH v2 3/5] net/ice: " Chenxu Di
2019-12-20  3:02   ` [dpdk-dev] [PATCH v2 4/5] net/ixgbe: " Chenxu Di
2019-12-20  3:02   ` [dpdk-dev] [PATCH v2 5/5] net/e1000: " Chenxu Di
2019-12-20  3:15 ` [dpdk-dev] [PATCH v3 0/5] drivers/net: " Chenxu Di
2019-12-20  3:15   ` [dpdk-dev] [PATCH v3 1/5] net/fm10k: " Chenxu Di
2019-12-20  3:15   ` [dpdk-dev] [PATCH v3 2/5] net/i40e: " Chenxu Di
2019-12-20  3:15   ` [dpdk-dev] [PATCH v3 3/5] net/ice: " Chenxu Di
2019-12-20  3:15   ` [dpdk-dev] [PATCH v3 4/5] net/ixgbe: " Chenxu Di
2019-12-20  3:15   ` [dpdk-dev] [PATCH v3 5/5] net/e1000: " Chenxu Di
2019-12-24  2:39 ` [dpdk-dev] [PATCH v4 0/5] drivers/net: " Chenxu Di
2019-12-24  2:39   ` [dpdk-dev] [PATCH v4 1/5] net/fm10k: " Chenxu Di
2019-12-24  2:39   ` [dpdk-dev] [PATCH v4 2/5] net/i40e: " Chenxu Di
2019-12-26  8:24     ` Xing, Beilei
2019-12-24  2:39   ` [dpdk-dev] [PATCH v4 3/5] net/ice: " Chenxu Di
2019-12-24  2:39   ` [dpdk-dev] [PATCH v4 4/5] net/ixgbe: " Chenxu Di
2019-12-24  2:39   ` [dpdk-dev] [PATCH v4 5/5] net/e1000: " Chenxu Di
2019-12-30  9:38 ` [dpdk-dev] [PATCH v6 0/4] drivers/net: " Chenxu Di
2019-12-30  9:38   ` [dpdk-dev] [PATCH v6 1/4] net/i40e: " Chenxu Di
2019-12-30 13:01     ` Ananyev, Konstantin
2019-12-30  9:38   ` [dpdk-dev] [PATCH v6 2/4] net/ice: " Chenxu Di
2019-12-30  9:38   ` [dpdk-dev] [PATCH v6 3/4] net/ixgbe: " Chenxu Di
2019-12-30 12:53     ` Ananyev, Konstantin
2020-01-03  9:01       ` Di, ChenxuX
2020-01-05 23:36         ` Ananyev, Konstantin
2020-01-06  9:03           ` Di, ChenxuX
2020-01-06 13:26             ` Ananyev, Konstantin
2020-01-07 10:46               ` Di, ChenxuX [this message]
2020-01-07 14:09                 ` Ananyev, Konstantin
2020-01-08 10:15                   ` Di, ChenxuX
2020-01-08 15:12                     ` Ananyev, Konstantin
2019-12-30  9:38   ` [dpdk-dev] [PATCH v6 4/4] net/e1000: " Chenxu Di
2020-01-09 10:38 ` [dpdk-dev] [PATCH v7 0/4] drivers/net: " Chenxu Di
2020-01-09 10:38   ` [dpdk-dev] [PATCH v7 1/4] net/i40e: " Chenxu Di
2020-01-09 10:38   ` [dpdk-dev] [PATCH v7 2/4] net/ice: " Chenxu Di
2020-01-09 10:38   ` [dpdk-dev] [PATCH v7 3/4] net/ixgbe: " Chenxu Di
2020-01-09 14:01     ` Ananyev, Konstantin
2020-01-10 10:08       ` Di, ChenxuX
2020-01-10 12:46         ` Ananyev, Konstantin
2020-01-09 10:38   ` [dpdk-dev] [PATCH v7 4/4] net/e1000: " Chenxu Di
2020-01-10  9:58 ` [dpdk-dev] [PATCH v8 0/4] drivers/net: " Chenxu Di
2020-01-10  9:58   ` [dpdk-dev] [PATCH v8 1/4] net/i40e: " Chenxu Di
2020-01-10  9:58   ` [dpdk-dev] [PATCH v8 2/4] net/ice: " Chenxu Di
2020-01-10  9:58   ` [dpdk-dev] [PATCH v8 3/4] net/ixgbe: " Chenxu Di
2020-01-10 13:49     ` Ananyev, Konstantin
2020-01-10  9:59   ` [dpdk-dev] [PATCH v8 4/4] net/e1000: " Chenxu Di
2020-01-13  9:57 ` [dpdk-dev] [PATCH v9 0/4] drivers/net: " Chenxu Di
2020-01-13  9:57   ` [dpdk-dev] [PATCH v9 1/4] net/i40e: " Chenxu Di
2020-01-13 11:08     ` Ananyev, Konstantin
2020-01-13  9:57   ` [dpdk-dev] [PATCH v9 2/4] net/ice: " Chenxu Di
2020-01-14  1:55     ` Yang, Qiming
2020-01-14 12:40     ` Ferruh Yigit
2020-01-15 14:34       ` Ferruh Yigit
2020-01-16  1:40         ` Di, ChenxuX
2020-01-16  7:09           ` [dpdk-dev] [PATCH] net/ice: cleanup for vec path check Xiaolong Ye
2020-01-16 10:19             ` Ferruh Yigit
2020-01-17  2:21             ` Yang, Qiming
2020-01-16  8:43     ` [dpdk-dev] [PATCH v9 2/4] net/ice: cleanup Tx buffers Ferruh Yigit
2020-01-13  9:57   ` [dpdk-dev] [PATCH v9 3/4] net/ixgbe: " Chenxu Di
2020-01-13 11:07     ` Ananyev, Konstantin
2020-01-16  8:44     ` Ferruh Yigit
2020-01-16 14:47     ` Ferruh Yigit
2020-01-16 15:23       ` Ferruh Yigit
2020-01-13  9:57   ` [dpdk-dev] [PATCH v9 4/4] net/e1000: " Chenxu Di
2020-01-13 11:08     ` Ananyev, Konstantin
2020-01-14  2:49     ` Ye Xiaolong
2020-01-14  2:22 ` [dpdk-dev] [PATCH 0/4] drivers/net: " Ye Xiaolong

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3B926E44943CB04AA3A39AC16328CE39B9440F@SHSMSX101.ccr.corp.intel.com \
    --to=chenxux.di@intel.com \
    --cc=dev@dpdk.org \
    --cc=konstantin.ananyev@intel.com \
    --cc=qiming.yang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

DPDK patches and discussions

Archives are clonable:
	git clone --mirror http://inbox.dpdk.org/dev/0 dev/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 dev dev/ http://inbox.dpdk.org/dev \
		dev@dpdk.org
	public-inbox-index dev


Newsgroup available over NNTP:
	nntp://inbox.dpdk.org/inbox.dpdk.dev


AGPL code for this site: git clone https://public-inbox.org/ public-inbox