DPDK patches and discussions
From: Feifei Wang <Feifei.Wang2@arm.com>
To: Ferruh Yigit <ferruh.yigit@intel.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>, nd <nd@arm.com>,
	"thomas@monjalon.net" <thomas@monjalon.net>,
	"Andrew Rybchenko" <andrew.rybchenko@oktetlabs.ru>,
	"Qi Zhang" <qi.z.zhang@intel.com>,
	"Beilei Xing" <beilei.xing@intel.com>,
	"Konstantin Ananyev" <konstantin.v.ananyev@yandex.ru>,
	"konstantin.ananyev@huawei.com" <konstantin.ananyev@huawei.com>,
	"Ruifeng Wang" <Ruifeng.Wang@arm.com>,
	"Honnappa Nagarahalli" <Honnappa.Nagarahalli@arm.com>,
	"Morten Brørup" <mb@smartsharesystems.com>, nd <nd@arm.com>
Subject: Re: Re: [RFC PATCH v1 0/4] Direct re-arming of buffers on receive side
Date: Tue, 28 Feb 2023 06:43:26 +0000	[thread overview]
Message-ID: <AS8PR08MB77180D81A851D524DF890690C8AC9@AS8PR08MB7718.eurprd08.prod.outlook.com> (raw)
In-Reply-To: <996c2239-1a3f-2fbd-d8af-40c3e17f375a@intel.com>

Hi, Ferruh

This email summarizes our latest improvement work on direct-rearm and, we hope,
addresses some of the concerns raised about direct-rearm.

Best Regards
Feifei

> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@intel.com>
> Sent: Tuesday, January 18, 2022 11:52 PM
> To: Feifei Wang <Feifei.Wang2@arm.com>; Morten Brørup
> <mb@smartsharesystems.com>
> Cc: dev@dpdk.org; nd <nd@arm.com>; thomas@monjalon.net; Andrew
> Rybchenko <andrew.rybchenko@oktetlabs.ru>; Qi Zhang
> <qi.z.zhang@intel.com>; Beilei Xing <beilei.xing@intel.com>
> Subject: Re: Re: [RFC PATCH v1 0/4] Direct re-arming of buffers on receive
> side
> 
> On 12/28/2021 6:55 AM, Feifei Wang wrote:
> > Thanks for your comments.
> >
> >> -----Original Message-----
> >> From: Morten Brørup <mb@smartsharesystems.com>
> >> Sent: Sunday, December 26, 2021 6:25 PM
> >> To: Feifei Wang <Feifei.Wang2@arm.com>
> >> Cc: dev@dpdk.org; nd <nd@arm.com>
> >> Subject: RE: [RFC PATCH v1 0/4] Direct re-arming of buffers on receive
> >> side
> >>
> >>> From: Feifei Wang [mailto:feifei.wang2@arm.com]
> >>> Sent: Friday, 24 December 2021 17.46
> >>>
> >>> Currently, the transmit side frees the buffers into the lcore cache
> >>> and the receive side allocates buffers from the lcore cache. The
> >>> transmit side typically frees 32 buffers resulting in 32*8=256B of
> >>> stores to lcore cache. The receive side allocates 32 buffers and
> >>> stores them in the receive side software ring, resulting in
> >>> 32*8=256B of stores and 256B of loads from the lcore cache.
> >>>
> >>> This patch proposes a mechanism to avoid freeing to/allocating from
> >>> the lcore cache. i.e. the receive side will free the buffers from
> >>> transmit side directly into its software ring. This will avoid the
> >>> 256B of loads and stores introduced by the lcore cache. It also
> >>> frees up the cache lines used by the lcore cache.
> >>>
> >>> However, this solution poses several constraints:
> >>>
> >>> 1)The receive queue needs to know which transmit queue it should
> >>> take the buffers from. The application logic decides which transmit
> >>> port to use to send out the packets. In many use cases the NIC might
> >>> have a single port ([1], [2], [3]), in which case a given transmit
> >>> queue is always mapped to a single receive queue (1:1 Rx queue: Tx
> >>> queue). This is easy to configure.
> >>>
> >>> If the NIC has 2 ports (there are several references), then we will
> >>> have
> >>> 1:2 (RX queue: TX queue) mapping which is still easy to configure.
> >>> However, if this is generalized to 'N' ports, the configuration can
> >>> be long. Moreover, the PMD would have to scan a list of transmit
> >>> queues to pull the buffers from.
> >>
> >> I disagree with the description of this constraint.
> >>
> >> As I understand it, it doesn't matter how many ports or queues are in
> >> a NIC or system.
> >>
> >> The constraint is more narrow:
> >>
> >> This patch requires that all packets ingressing on some port/queue
> >> must egress on the specific port/queue that it has been configured to
> >> rearm its buffers from. I.e. an application cannot route packets
> >> between multiple ports with this patch.
> >
> > First, I agree that direct-rearm mode is suitable for cases where the
> > user knows the direction of the flow in advance and maps Rx/Tx queues
> > to each other accordingly. It is not suitable for the normal case of
> > random packet routing.
> >
> > Second, our two proposed cases (a one-port NIC and a two-port NIC)
> > both assume the direction of the flow is determined. Furthermore, for
> > a two-port NIC there may be two flow directions: from port 0 to port 1,
> > or from port 0 to port 0. Thus we need to have a
> > 1:2 (Rx queue : Tx queue) mapping.
> >
> > Finally, maybe we can change our description as follows:
> > "The first constraint is that the user should know the direction of the
> > flow in advance, and based on this, map the Rx and Tx queues according
> > to the flow direction:
> > For example, if the NIC just has one port
> >   ......
> > Or if the NIC have two ports
> > ......."
> >
> >>
> >>>
> >>> 2)The other factor that needs to be considered is 'run-to-completion'
> >>> vs
> >>> 'pipeline' models. In the run-to-completion model, the receive side
> >>> and the transmit side are running on the same lcore serially. In the
> >>> pipeline model, the receive side and transmit side might be running
> >>> on different lcores in parallel. This requires locking. This is not
> >>> supported at this point.
> >>>
> >>> 3)Tx and Rx buffers must be from the same mempool. We must also
> >>> ensure that the number of buffers freed on Tx is equal to the number
> >>> of buffers refilled on Rx: (txq->tx_rs_thresh ==
> >>> RTE_I40E_RXQ_REARM_THRESH). Thus, 'tx_next_dd' can be updated
> >>> correctly in direct-rearm mode. This is because tx_next_dd is the
> >>> variable used to compute the next free location in the Tx sw-ring;
> >>> its value is one round ahead of the position where the next free
> >>> starts.
> >>>
> >>
> >> You are missing the fourth constraint:
> >>
> >> 4) The application must transmit all received packets immediately,
> >> i.e. QoS queueing and similar is prohibited.
> >>
> >
> > You are right and this is indeed one of the limitations.
> >
> >>> Current status in this RFC:
> >>> 1)An API is added to allow for mapping a TX queue to a RX queue.
> >>>    Currently it supports 1:1 mapping.
> >>> 2)The i40e driver is changed to do the direct re-arm of the receive
> >>>    side.
> >>> 3)L3fwd application is hacked to do the mapping for the following
> >>> command:
> >>>    one core two flows case:
> >>>    $./examples/dpdk-l3fwd -n 4 -l 1 -a 0001:01:00.0 -a 0001:01:00.1
> >>>    -- -p 0x3 -P --config='(0,0,1),(1,0,1)'
> >>>    where:
> >>>    Port 0 Rx queue 0 is mapped to Port 1 Tx queue 0
> >>>    Port 1 Rx queue 0 is mapped to Port 0 Tx queue 0
> >>>
> >>> Testing status:
> >>> 1)Tested L3fwd with the above command:
> >>> The testing results for L3fwd are as follows:
> >>> -------------------------------------------------------------------
> >>> N1SDP:
> >>> Base performance(with this patch)   with direct re-arm mode enabled
> >>>        0%                                  +14.1%
> >>>
> >>> Ampere Altra:
> >>> Base performance(with this patch)   with direct re-arm mode enabled
> >>>        0%                                  +17.1%
> >>> -------------------------------------------------------------------
> >>> This patch does not affect the performance of normal mode; with
> >>> direct-rearm mode enabled, performance improves by 14%-17% on N1SDP
> >>> and Ampere Altra.
> >>>
> >>> Feedback requested:
> >>> 1) Has anyone done any similar experiments, any lessons learnt?
> >>> 2) Feedback on API
> >>>
> >>> Next steps:
> >>> 1) Update the code for supporting 1:N(Rx : TX) mapping
> >>> 2) Automate the configuration in L3fwd sample application
> >>>
> >>> Reference:
> >>> [1] https://store.nvidia.com/en-us/networking/store/product/MCX623105AN-CDAT/NVIDIAMCX623105ANCDATConnectX6DxENAdapterCard100GbECryptoDisabled/
> >>> [2] https://www.intel.com/content/www/us/en/products/sku/192561/intel-ethernet-network-adapter-e810cqda1/specifications.html
> >>> [3] https://www.broadcom.com/products/ethernet-connectivity/network-adapters/100gb-nic-ocp/n1100g
> >>>
> >>> Feifei Wang (4):
> >>>    net/i40e: enable direct re-arm mode
> >>>    ethdev: add API for direct re-arm mode
> >>>    net/i40e: add direct re-arm mode internal API
> >>>    examples/l3fwd: give an example for direct rearm mode
> >>>
> >>>   drivers/net/i40e/i40e_ethdev.c        |  34 ++++++
> >>>   drivers/net/i40e/i40e_rxtx.h          |   4 +
> >>>   drivers/net/i40e/i40e_rxtx_vec_neon.c | 149 +++++++++++++++++++++++++-
> >>>   examples/l3fwd/main.c                 |   3 +
> >>>   lib/ethdev/ethdev_driver.h            |  15 +++
> >>>   lib/ethdev/rte_ethdev.c               |  14 +++
> >>>   lib/ethdev/rte_ethdev.h               |  31 ++++++
> >>>   lib/ethdev/version.map                |   3 +
> >>>   8 files changed, 251 insertions(+), 2 deletions(-)
> >>>
> >>> --
> >>> 2.25.1
> >>>
> >>
> >> The patch provides a significant performance improvement, but I am
> >> wondering if any real world applications exist that would use this.
> >> Only a "router on a stick" (i.e. a single-port router) comes to my
> >> mind, and that is probably sufficient to call it useful in the real
> >> world. Do you have any other examples to support the usefulness of
> >> this patch?
> >>
> > One case I have is network security. For a network firewall, all
> > packets need to ingress on a specified port and egress on a specified
> > port for packet filtering. In this case, we know the flow direction in
> > advance.
> >
> 
> I also have some concerns about how useful this API will be in real life,
> and whether the use case is worth the complexity it brings.
> It also looks like too much low-level detail for the application.

Concerns about direct rearm:
1. The earlier version of the design required the rxq/txq pairing to be done
before starting the data plane threads. This required the user to know the
direction of the packet flow in advance, which limited the use cases.

In the latest version, direct-rearm mode is packaged as a separate API.
This allows users to change the rxq/txq pairing at runtime in the data plane,
according to the application's analysis of the packet flow, for example:
------------------------------------------------------------------------------------------------------------
Step 1: the application analyses the flow direction
Step 2: rxq_rearm_data = rte_eth_rx_get_rearm_data(rx_portid, rx_queueid)
Step 3: rte_eth_dev_direct_rearm(rx_portid, rx_queueid, tx_portid, tx_queueid, rxq_rearm_data);
Step 4: rte_eth_rx_burst(rx_portid,rx_queueid);
Step 5: rte_eth_tx_burst(tx_portid,tx_queueid);
------------------------------------------------------------------------------------------------------------
The above allows the user to change the rxq/txq pairing at runtime, without
needing to know the direction of the flow in advance. This effectively expands
the direct-rearm use scenarios; a minimal application-side sketch is given below.
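
To make the sequence concrete, here is a minimal application-side sketch. The
API names and signatures (rte_eth_rx_get_rearm_data, rte_eth_dev_direct_rearm,
struct rte_eth_rxq_rearm_data) follow the pseudocode in this mail and are still
an RFC proposal rather than the upstream ethdev API, so the exact prototypes
may change:
------------------------------------------------------------------------------------------------------------
#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define BURST_SIZE 32

/* Forward one burst with direct-rearm, with the rxq/txq pairing chosen by
 * the application at runtime (Step 1) and changeable on every call. */
static inline void
fwd_burst_direct_rearm(uint16_t rx_portid, uint16_t rx_queueid,
		       uint16_t tx_portid, uint16_t tx_queueid)
{
	struct rte_mbuf *pkts[BURST_SIZE];
	struct rte_eth_rxq_rearm_data rxq_rearm_data;
	uint16_t nb_rx, nb_tx;

	/* Step 2: get the Rx queue's rearm data (buffer ring information). */
	rxq_rearm_data = rte_eth_rx_get_rearm_data(rx_portid, rx_queueid);

	/* Step 3: refill the Rx buffer ring directly from the Tx queue's
	 * used buffers, bypassing the mempool cache. */
	rte_eth_dev_direct_rearm(rx_portid, rx_queueid,
				 tx_portid, tx_queueid, rxq_rearm_data);

	/* Steps 4-5: normal receive and transmit bursts. */
	nb_rx = rte_eth_rx_burst(rx_portid, rx_queueid, pkts, BURST_SIZE);
	nb_tx = rte_eth_tx_burst(tx_portid, tx_queueid, pkts, nb_rx);

	/* Free any packets the Tx queue did not accept. */
	while (nb_tx < nb_rx)
		rte_pktmbuf_free(pkts[nb_tx++]);
}
------------------------------------------------------------------------------------------------------------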

2. The earlier version of direct rearm broke the independence between the Rx and Tx paths.
In the latest version, we use a structure to let Rx and Tx interact, for example:
-----------------------------------------------------------------------------------------------------------------------------------
struct rte_eth_rxq_rearm_data {
       struct rte_mbuf **buf_ring; /**< Buffer ring of the Rx queue. */
       uint16_t *refill_head;      /**< Head of the buffer ring for refilling descriptors. */
       uint16_t *receive_tail;     /**< Tail of the buffer ring for receiving pkts. */
       uint16_t nb_buf;            /**< Configured number of buffers in the ring. */
} rxq_rearm_data;

data path:
	/* Get direct-rearm info for a receive queue of an Ethernet device. */
	rxq_rearm_data = rte_eth_rx_get_rearm_data(rx_portid, rx_queueid);
	rte_eth_dev_direct_rearm(rx_portid, rx_queueid, tx_portid, tx_queueid, rxq_rearm_data) {

		/*  Using Tx used buffer to refill Rx buffer ring in direct rearm mode */
		nb_rearm = rte_eth_tx_fill_sw_ring(tx_portid, tx_queueid, rxq_rearm_data);

		/* Flush Rx descriptor in direct rearm mode */
		rte_eth_rx_flush_descs(rx_portid, rx_queueid, nb_rearm);
	}
	rte_eth_rx_burst(rx_portid,rx_queueid);
	rte_eth_tx_burst(tx_portid,tx_queueid);
-----------------------------------------------------------------------------------------------------------------------------------
Furthermore, direct-rearm usage is no longer limited to a single PMD: it can
move buffers between PMDs from different vendors, and it can even place the
buffers anywhere in your Rx buffer ring as long as the address of the buffer
ring is provided. In the latest version, we enable direct-rearm in the i40e
and ixgbe PMDs; we also tried using the i40e driver on Rx and the ixgbe driver
on Tx, and achieved a 7-9% performance improvement with direct-rearm. A
simplified sketch of the Tx-side fill step is shown below.
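
For illustration only, a simplified sketch of what the Tx-side fill step
(rte_eth_tx_fill_sw_ring() in the pseudocode above) could do with the rearm
handle. The struct sketch_txq type and its fields are hypothetical stand-ins
for a PMD's internal Tx queue, the ring size is assumed to be a power of two,
and the real driver code in the patches handles more corner cases:
------------------------------------------------------------------------------------------------------------
#include <stdint.h>
#include <rte_mbuf.h>

/* Hypothetical, simplified view of a PMD's Tx queue internals. */
struct sketch_txq {
	struct rte_mbuf **sw_ring; /* Tx software ring holding used mbufs */
	uint16_t tx_next_dd;       /* last descriptor of the next batch to free */
	uint16_t tx_rs_thresh;     /* number of buffers freed per batch */
};

/* Move one batch of used Tx mbuf pointers straight into the Rx buffer ring
 * described by the rearm handle; this single pointer copy replaces the
 * free-to-mempool-cache plus allocate-from-mempool-cache round trip.
 * It requires txq->tx_rs_thresh to equal the Rx rearm threshold and both
 * queues to use the same mempool. */
static uint16_t
sketch_tx_fill_rx_ring(struct sketch_txq *txq,
		       struct rte_eth_rxq_rearm_data *rd)
{
	uint16_t n = txq->tx_rs_thresh;
	struct rte_mbuf **txep = &txq->sw_ring[txq->tx_next_dd - (n - 1)];
	uint16_t head = *rd->refill_head;
	uint16_t i;

	for (i = 0; i < n; i++) /* nb_buf assumed to be a power of two */
		rd->buf_ring[(head + i) & (rd->nb_buf - 1)] = txep[i];

	*rd->refill_head = head + n;           /* advance the Rx refill position */
	txq->tx_next_dd = txq->tx_next_dd + n; /* next Tx batch to recycle */

	return n;
}
------------------------------------------------------------------------------------------------------------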

3. Difference between direct rearm, the ZC API used in the mempool, and the general path
For the general path:
                Rx: 32 pkts memcpy from mempool cache to rx_sw_ring
                Tx: 32 pkts memcpy from tx_sw_ring to temporary variable + 32 pkts memcpy from temporary variable to mempool cache
For the ZC API used in the mempool:
                Rx: 32 pkts memcpy from mempool cache to rx_sw_ring
                Tx: 32 pkts memcpy from tx_sw_ring to zero-copy mempool cache
                Refer link: http://patches.dpdk.org/project/dpdk/patch/20230221055205.22984-2-kamalakshitha.aligeri@arm.com/
For direct rearm:
                Rx/Tx: 32 pkts memcpy from tx_sw_ring to rx_sw_ring
Thus we can see that, in one loop, direct rearm saves 32+32=64 pkts of memcpy compared to the general path,
and 32 pkts of memcpy compared to the ZC API used in the mempool (see the back-of-envelope count below).
So direct rearm has its own benefits.
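
As a back-of-envelope check of the counts above (32 mbuf pointers of 8 bytes
each per burst), a small sketch:
------------------------------------------------------------------------------------------------------------
/* Mbuf-pointer copies per 32-packet loop, 8 bytes per pointer. */
enum {
	BURST = 32,
	GENERAL_PATH_COPIES = 32 + 32 + 32, /* Rx refill + Tx free via temp + temp to cache: 96 copies, 768 B */
	MEMPOOL_ZC_COPIES   = 32 + 32,      /* Rx refill + Tx free into ZC mempool cache:    64 copies, 512 B */
	DIRECT_REARM_COPIES = 32            /* Tx sw_ring to Rx sw_ring only:                32 copies, 256 B */
};
/* Savings: 96 - 32 = 64 copies vs. the general path, 64 - 32 = 32 copies vs. the mempool ZC API. */
------------------------------------------------------------------------------------------------------------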

4. Performance tests and real cases
In performance tests with l3fwd, we achieve a performance improvement of up to 15% on an Arm server.
For real cases, we have enabled direct-rearm in VPP and achieved a performance improvement there as well.

> 
> cc'ed a few more folks for comment.
> 
> >> Anyway, the patch doesn't do any harm if unused, and the only
> >> performance cost is the "if (rxq->direct_rxrearm_enable)" branch in
> >> the Ethdev driver. So I don't oppose it.
> >>
> >



Thread overview: 67+ messages
2021-12-24 16:46 Feifei Wang
2021-12-24 16:46 ` [RFC PATCH v1 1/4] net/i40e: enable direct re-arm mode Feifei Wang
2021-12-24 16:46 ` [RFC PATCH v1 2/4] ethdev: add API for " Feifei Wang
2021-12-24 19:38   ` Stephen Hemminger
2021-12-26  9:49     ` 回复: " Feifei Wang
2021-12-26 10:31       ` Morten Brørup
2021-12-24 16:46 ` [RFC PATCH v1 3/4] net/i40e: add direct re-arm mode internal API Feifei Wang
2021-12-24 16:46 ` [RFC PATCH v1 4/4] examples/l3fwd: give an example for direct rearm mode Feifei Wang
2021-12-26 10:25 ` [RFC PATCH v1 0/4] Direct re-arming of buffers on receive side Morten Brørup
2021-12-28  6:55   ` 回复: " Feifei Wang
2022-01-18 15:51     ` Ferruh Yigit
2022-01-18 16:53       ` Thomas Monjalon
2022-01-18 17:27         ` Morten Brørup
2022-01-27  5:24           ` Honnappa Nagarahalli
2022-01-27 16:45             ` Ananyev, Konstantin
2022-02-02 19:46               ` Honnappa Nagarahalli
2022-01-27  5:16         ` Honnappa Nagarahalli
2023-02-28  6:43       ` Feifei Wang [this message]
2023-02-28  6:52         ` 回复: " Feifei Wang
2022-01-27  4:06   ` Honnappa Nagarahalli
2022-01-27 17:13     ` Morten Brørup
2022-01-28 11:29     ` Morten Brørup
2023-03-23 10:43 ` [PATCH v4 0/3] Recycle buffers from Tx to Rx Feifei Wang
2023-03-23 10:43   ` [PATCH v4 1/3] ethdev: add API for buffer recycle mode Feifei Wang
2023-03-23 11:41     ` Morten Brørup
2023-03-29  2:16       ` Feifei Wang
2023-03-23 10:43   ` [PATCH v4 2/3] net/i40e: implement recycle buffer mode Feifei Wang
2023-03-23 10:43   ` [PATCH v4 3/3] net/ixgbe: " Feifei Wang
2023-03-30  6:29 ` [PATCH v5 0/3] Recycle buffers from Tx to Rx Feifei Wang
2023-03-30  6:29   ` [PATCH v5 1/3] ethdev: add API for buffer recycle mode Feifei Wang
2023-03-30  7:19     ` Morten Brørup
2023-03-30  9:31       ` Feifei Wang
2023-03-30 15:15         ` Morten Brørup
2023-03-30 15:58         ` Morten Brørup
2023-04-26  6:59           ` Feifei Wang
2023-04-19 14:46     ` Ferruh Yigit
2023-04-26  7:29       ` Feifei Wang
2023-03-30  6:29   ` [PATCH v5 2/3] net/i40e: implement recycle buffer mode Feifei Wang
2023-03-30  6:29   ` [PATCH v5 3/3] net/ixgbe: " Feifei Wang
2023-04-19 14:46     ` Ferruh Yigit
2023-04-26  7:36       ` Feifei Wang
2023-03-30 15:04   ` [PATCH v5 0/3] Recycle buffers from Tx to Rx Stephen Hemminger
2023-04-03  2:48     ` Feifei Wang
2023-04-19 14:56   ` Ferruh Yigit
2023-04-25  7:57     ` Feifei Wang
2023-05-25  9:45 ` [PATCH v6 0/4] Recycle mbufs from Tx queue to Rx queue Feifei Wang
2023-05-25  9:45   ` [PATCH v6 1/4] ethdev: add API for mbufs recycle mode Feifei Wang
2023-05-25 15:08     ` Morten Brørup
2023-05-31  6:10       ` Feifei Wang
2023-06-05 12:53     ` Константин Ананьев
2023-06-06  2:55       ` Feifei Wang
2023-06-06  7:10         ` Konstantin Ananyev
2023-06-06  7:31           ` Feifei Wang
2023-06-06  8:34             ` Konstantin Ananyev
2023-06-07  0:00               ` Ferruh Yigit
2023-06-12  3:25                 ` Feifei Wang
2023-05-25  9:45   ` [PATCH v6 2/4] net/i40e: implement " Feifei Wang
2023-06-05 13:02     ` Константин Ананьев
2023-06-06  3:16       ` Feifei Wang
2023-06-06  7:18         ` Konstantin Ananyev
2023-06-06  7:58           ` Feifei Wang
2023-06-06  8:27             ` Konstantin Ananyev
2023-06-12  3:05               ` Feifei Wang
2023-05-25  9:45   ` [PATCH v6 3/4] net/ixgbe: " Feifei Wang
2023-05-25  9:45   ` [PATCH v6 4/4] app/testpmd: add recycle mbufs engine Feifei Wang
2023-06-05 13:08     ` Константин Ананьев
2023-06-06  6:32       ` Feifei Wang
