DPDK patches and discussions
* [dpdk-dev] [RFC 0/2] add Tx prepare support for bonding device
@ 2021-04-16 11:04 Chengchang Tang
  2021-04-16 11:04 ` [dpdk-dev] [RFC 1/2] net/bonding: add Tx prepare for bonding Chengchang Tang
                   ` (4 more replies)
  0 siblings, 5 replies; 31+ messages in thread
From: Chengchang Tang @ 2021-04-16 11:04 UTC (permalink / raw)
  To: dev; +Cc: linuxarm, chas3, humin29, ferruh.yigit

This patch adds Tx prepare support for the bonding device.

Currently, the bonding driver does not implement the rte_eth_tx_prepare
callback. Therefore, the Tx prepare functions of the slave devices are
never invoked. When hardware offloads such as checksum and TSO are
enabled, some drivers need tx_prepare to adjust the packets (for
example, to set correct pseudo headers). Otherwise, the related
offloads fail and packets may even be sent incorrectly. Due to this
limitation, the bonded device cannot use these HW offloads in the Tx
direction.

Because the packet transmit algorithms in the bond PMD are numerous and
complex, it is hard to design a tx_prepare callback for it. Therefore,
this patch does not implement the tx_prepare callback of the bonding
PMD. Instead, rte_eth_tx_prepare is called in the tx_burst callbacks,
and a per-device flag is introduced to control whether the bonded
device calls rte_eth_tx_prepare. If upper-layer users need Tx offloads
that depend on tx_prepare, they should enable the preparation function.
The bonded device will then call rte_eth_tx_prepare for the fast path
packets in its tx_burst callbacks.

Chengchang Tang (2):
  net/bonding: add Tx prepare for bonding
  app/testpmd: add cmd for bonding Tx prepare

 app/test-pmd/cmdline.c                      | 66 +++++++++++++++++++++++++++++
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  9 ++++
 drivers/net/bonding/eth_bond_private.h      |  1 +
 drivers/net/bonding/rte_eth_bond.h          | 29 +++++++++++++
 drivers/net/bonding/rte_eth_bond_api.c      | 28 ++++++++++++
 drivers/net/bonding/rte_eth_bond_pmd.c      | 33 +++++++++++++--
 drivers/net/bonding/version.map             |  5 +++
 7 files changed, 167 insertions(+), 4 deletions(-)

--
2.7.4


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [dpdk-dev] [RFC 1/2] net/bonding: add Tx prepare for bonding
  2021-04-16 11:04 [dpdk-dev] [RFC 0/2] add Tx prepare support for bonding device Chengchang Tang
@ 2021-04-16 11:04 ` Chengchang Tang
  2021-04-16 11:04 ` [dpdk-dev] [RFC 2/2] app/testpmd: add cmd for bonding Tx prepare Chengchang Tang
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 31+ messages in thread
From: Chengchang Tang @ 2021-04-16 11:04 UTC (permalink / raw)
  To: dev; +Cc: linuxarm, chas3, humin29, ferruh.yigit

To use HW offload capabilities (e.g. checksum and TSO) in the Tx
direction, upper-layer users need to call rte_eth_tx_prepare to make
some adjustments to the packets before sending them (e.g. processing
pseudo headers when the Tx checksum offload is enabled). However, the
tx_prepare callback of the bond driver is not implemented. Therefore,
the related offloads cannot be used unless the upper-layer users
process the packets properly in their own applications, which is bad
for portability.

However, it is difficult to design a tx_prepare callback for the
bonding driver: when a bonded device sends packets, it distributes them
to different slave devices based on the real-time link status and
bonding mode. That is, it is very hard for the bonding device to
determine which slave device's prepare function should be invoked. In
addition, if the link status changes after the packets are prepared,
the packets may fail to be sent because the packet distribution may
change.

So, in this patch, the tx_prepare callback of the bonding driver is not
implemented. Instead, the prepare function of the slave device is
called in the tx_burst callbacks, and a per-device flag is introduced
to control whether the bonded device calls rte_eth_tx_prepare. If
upper-layer users need the related offloads, they should enable the
preparation function. The bonded device will then call
rte_eth_tx_prepare for the fast path packets in its tx_burst callbacks.

Note:
rte_eth_tx_prepare is not added to bond mode 3 (broadcast). This is
because in broadcast mode a packet needs to be sent by every slave
port, and different PMDs process packets differently in tx_prepare; as
a result, the prepared packet could be sent incorrectly on some slaves.

Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
---
 drivers/net/bonding/eth_bond_private.h |  1 +
 drivers/net/bonding/rte_eth_bond.h     | 29 +++++++++++++++++++++++++++++
 drivers/net/bonding/rte_eth_bond_api.c | 28 ++++++++++++++++++++++++++++
 drivers/net/bonding/rte_eth_bond_pmd.c | 33 +++++++++++++++++++++++++++++----
 drivers/net/bonding/version.map        |  5 +++++
 5 files changed, 92 insertions(+), 4 deletions(-)

diff --git a/drivers/net/bonding/eth_bond_private.h b/drivers/net/bonding/eth_bond_private.h
index 75fb8dc..72ec4a0 100644
--- a/drivers/net/bonding/eth_bond_private.h
+++ b/drivers/net/bonding/eth_bond_private.h
@@ -126,6 +126,7 @@ struct bond_dev_private {
 	/**< Flag for whether MAC address is user defined or not */

 	uint8_t link_status_polling_enabled;
+	uint8_t tx_prepare_enabled;
 	uint32_t link_status_polling_interval_ms;

 	uint32_t link_down_delay_ms;
diff --git a/drivers/net/bonding/rte_eth_bond.h b/drivers/net/bonding/rte_eth_bond.h
index 874aa91..8ec09eb 100644
--- a/drivers/net/bonding/rte_eth_bond.h
+++ b/drivers/net/bonding/rte_eth_bond.h
@@ -343,6 +343,35 @@ rte_eth_bond_link_up_prop_delay_set(uint16_t bonded_port_id,
 int
 rte_eth_bond_link_up_prop_delay_get(uint16_t bonded_port_id);

+/**
+ * Enable Tx prepare for bonded port
+ *
+ * To perform some HW offloads in the Tx direction, some PMDs need to call
+ * rte_eth_tx_prepare to do some adjustment for packets. This function
+ * enables packets preparation in the fast path for bonded device.
+ *
+ * @param bonded_port_id      Bonded device id
+ *
+ * @return
+ *   0 on success, negative value otherwise.
+ */
+__rte_experimental
+int
+rte_eth_bond_tx_prepare_enable(uint16_t bonded_port_id);
+
+/**
+ * Disable Tx prepare for bonded port
+ *
+ * This function disables Tx prepare for the fast path packets.
+ *
+ * @param bonded_port_id      Bonded device id
+ *
+ * @return
+ *   0 on success, negative value otherwise.
+ */
+__rte_experimental
+int
+rte_eth_bond_tx_prepare_disable(uint16_t bonded_port_id);

 #ifdef __cplusplus
 }
diff --git a/drivers/net/bonding/rte_eth_bond_api.c b/drivers/net/bonding/rte_eth_bond_api.c
index 17e6ff8..b04806a 100644
--- a/drivers/net/bonding/rte_eth_bond_api.c
+++ b/drivers/net/bonding/rte_eth_bond_api.c
@@ -1050,3 +1050,31 @@ rte_eth_bond_link_up_prop_delay_get(uint16_t bonded_port_id)

 	return internals->link_up_delay_ms;
 }
+
+int
+rte_eth_bond_tx_prepare_enable(uint16_t bonded_port_id)
+{
+	struct bond_dev_private *internals;
+
+	if (valid_bonded_port_id(bonded_port_id) != 0)
+		return -1;
+
+	internals = rte_eth_devices[bonded_port_id].data->dev_private;
+	internals->tx_prepare_enabled = 1;
+
+	return 0;
+}
+
+int
+rte_eth_bond_tx_prepare_disable(uint16_t bonded_port_id)
+{
+	struct bond_dev_private *internals;
+
+	if (valid_bonded_port_id(bonded_port_id) != 0)
+		return -1;
+
+	internals = rte_eth_devices[bonded_port_id].data->dev_private;
+	internals->tx_prepare_enabled = 0;
+
+	return 0;
+}
diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
index 2e9cea5..3b7870f 100644
--- a/drivers/net/bonding/rte_eth_bond_pmd.c
+++ b/drivers/net/bonding/rte_eth_bond_pmd.c
@@ -606,8 +606,14 @@ bond_ethdev_tx_burst_round_robin(void *queue, struct rte_mbuf **bufs,
 	/* Send packet burst on each slave device */
 	for (i = 0; i < num_of_slaves; i++) {
 		if (slave_nb_pkts[i] > 0) {
+			int nb_prep_pkts = slave_nb_pkts[i];
+			if (internals->tx_prepare_enabled)
+				nb_prep_pkts = rte_eth_tx_prepare(slaves[i],
+						bd_tx_q->queue_id,
+						slave_bufs[i], nb_prep_pkts);
+
 			num_tx_slave = rte_eth_tx_burst(slaves[i], bd_tx_q->queue_id,
-					slave_bufs[i], slave_nb_pkts[i]);
+					slave_bufs[i], nb_prep_pkts);

 			/* if tx burst fails move packets to end of bufs */
 			if (unlikely(num_tx_slave < slave_nb_pkts[i])) {
@@ -632,6 +638,7 @@ bond_ethdev_tx_burst_active_backup(void *queue,
 {
 	struct bond_dev_private *internals;
 	struct bond_tx_queue *bd_tx_q;
+	int nb_prep_pkts = nb_pkts;

 	bd_tx_q = (struct bond_tx_queue *)queue;
 	internals = bd_tx_q->dev_private;
@@ -639,8 +646,13 @@ bond_ethdev_tx_burst_active_backup(void *queue,
 	if (internals->active_slave_count < 1)
 		return 0;

+	if (internals->tx_prepare_enabled)
+		nb_prep_pkts =
+			rte_eth_tx_prepare(internals->current_primary_port,
+				bd_tx_q->queue_id, bufs, nb_prep_pkts);
+
 	return rte_eth_tx_burst(internals->current_primary_port, bd_tx_q->queue_id,
-			bufs, nb_pkts);
+			bufs, nb_prep_pkts);
 }

 static inline uint16_t
@@ -939,6 +951,7 @@ bond_ethdev_tx_burst_tlb(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 	}

 	for (i = 0; i < num_of_slaves; i++) {
+		int nb_prep_pkts;
 		rte_eth_macaddr_get(slaves[i], &active_slave_addr);
 		for (j = num_tx_total; j < nb_pkts; j++) {
 			if (j + 3 < nb_pkts)
@@ -955,8 +968,14 @@ bond_ethdev_tx_burst_tlb(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 #endif
 		}

+		nb_prep_pkts = nb_pkts - num_tx_total;
+		if (internals->tx_prepare_enabled)
+			nb_prep_pkts = rte_eth_tx_prepare(slaves[i],
+					bd_tx_q->queue_id, bufs + num_tx_total,
+					nb_prep_pkts);
+
 		num_tx_total += rte_eth_tx_burst(slaves[i], bd_tx_q->queue_id,
-				bufs + num_tx_total, nb_pkts - num_tx_total);
+				bufs + num_tx_total, nb_prep_pkts);

 		if (num_tx_total == nb_pkts)
 			break;
@@ -1159,12 +1178,18 @@ tx_burst_balance(void *queue, struct rte_mbuf **bufs, uint16_t nb_bufs,

 	/* Send packet burst on each slave device */
 	for (i = 0; i < slave_count; i++) {
+		int nb_prep_pkts;
 		if (slave_nb_bufs[i] == 0)
 			continue;
+		nb_prep_pkts = slave_nb_bufs[i];
+		if (internals->tx_prepare_enabled)
+			nb_prep_pkts = rte_eth_tx_prepare(slave_port_ids[i],
+					bd_tx_q->queue_id, slave_bufs[i],
+					nb_prep_pkts);

 		slave_tx_count = rte_eth_tx_burst(slave_port_ids[i],
 				bd_tx_q->queue_id, slave_bufs[i],
-				slave_nb_bufs[i]);
+				nb_prep_pkts);

 		total_tx_count += slave_tx_count;

diff --git a/drivers/net/bonding/version.map b/drivers/net/bonding/version.map
index df81ee7..b642729 100644
--- a/drivers/net/bonding/version.map
+++ b/drivers/net/bonding/version.map
@@ -31,3 +31,8 @@ DPDK_21 {

 	local: *;
 };
+
+EXPERIMENTAL {
+	rte_eth_bond_tx_prepare_disable;
+	rte_eth_bond_tx_prepare_enable;
+};
--
2.7.4



* [dpdk-dev] [RFC 2/2] app/testpmd: add cmd for bonding Tx prepare
  2021-04-16 11:04 [dpdk-dev] [RFC 0/2] add Tx prepare support for bonding device Chengchang Tang
  2021-04-16 11:04 ` [dpdk-dev] [RFC 1/2] net/bonding: add Tx prepare for bonding Chengchang Tang
@ 2021-04-16 11:04 ` Chengchang Tang
  2021-04-16 11:12 ` [dpdk-dev] [RFC 0/2] add Tx prepare support for bonding device Min Hu (Connor)
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 31+ messages in thread
From: Chengchang Tang @ 2021-04-16 11:04 UTC (permalink / raw)
  To: dev; +Cc: linuxarm, chas3, humin29, ferruh.yigit

Add a new command to enable/disable Tx prepare on each slave of a
bonded device. This helps to test Tx HW offloads (e.g. checksum and
TSO) for bonded devices in testpmd. The command is as follows:

set bonding tx_prepare <port_id> (enable|disable)

When this option is enabled, the bonding driver calls
rte_eth_tx_prepare in the fast path to adjust the packets to the
device's requirements for the enabled HW offloads (e.g. processing
pseudo headers when the Tx checksum offload is enabled). This helps
bonded devices use more Tx offloads.

Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
---
 app/test-pmd/cmdline.c                      | 66 +++++++++++++++++++++++++++++
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  9 ++++
 2 files changed, 75 insertions(+)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index f44116b..2d1b3b6 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -647,6 +647,9 @@ static void cmd_help_long_parsed(void *parsed_result,
 			"set bonding lacp dedicated_queues <port_id> (enable|disable)\n"
 			"	Enable/disable dedicated queues for LACP control traffic.\n\n"

+			"set bonding tx_prepare <port_id> (enable|disable)\n"
+			"	Enable/disable tx_prepare for fast path traffic.\n\n"
+
 #endif
 			"set link-up port (port_id)\n"
 			"	Set link up for a port.\n\n"
@@ -5886,6 +5889,68 @@ cmdline_parse_inst_t cmd_set_lacp_dedicated_queues = {
 		}
 };

+/* *** SET BONDING TX_PREPARE *** */
+struct cmd_set_bonding_tx_prepare_result {
+	cmdline_fixed_string_t set;
+	cmdline_fixed_string_t bonding;
+	cmdline_fixed_string_t tx_prepare;
+	portid_t port_id;
+	cmdline_fixed_string_t mode;
+};
+
+static void cmd_set_bonding_tx_prepare_parsed(void *parsed_result,
+		__rte_unused  struct cmdline *cl,
+		__rte_unused void *data)
+{
+	struct cmd_set_bonding_tx_prepare_result *res = parsed_result;
+	portid_t port_id = res->port_id;
+
+	if (!strcmp(res->mode, "enable")) {
+		if (rte_eth_bond_tx_prepare_enable(port_id) == 0)
+			printf("Tx prepare for bonding device enabled\n");
+		else
+			printf("Enabling bonding device Tx prepare "
+					"on port %d failed\n", port_id);
+	} else if (!strcmp(res->mode, "disable")) {
+		if (rte_eth_bond_tx_prepare_disable(port_id) == 0)
+			printf("Tx prepare for bonding device disabled\n");
+		else
+			printf("Disabling bonding device Tx prepare "
+					"on port %d failed\n", port_id);
+	}
+}
+
+cmdline_parse_token_string_t cmd_setbonding_tx_prepare_set =
+TOKEN_STRING_INITIALIZER(struct cmd_set_bonding_tx_prepare_result,
+		set, "set");
+cmdline_parse_token_string_t cmd_setbonding_tx_prepare_bonding =
+TOKEN_STRING_INITIALIZER(struct cmd_set_bonding_tx_prepare_result,
+		bonding, "bonding");
+cmdline_parse_token_string_t cmd_setbonding_tx_prepare_tx_prepare =
+TOKEN_STRING_INITIALIZER(struct cmd_set_bonding_tx_prepare_result,
+		tx_prepare, "tx_prepare");
+cmdline_parse_token_num_t cmd_setbonding_tx_prepare_port_id =
+TOKEN_NUM_INITIALIZER(struct cmd_set_bonding_tx_prepare_result,
+		port_id, RTE_UINT16);
+cmdline_parse_token_string_t cmd_setbonding_tx_prepare_mode =
+TOKEN_STRING_INITIALIZER(struct cmd_set_bonding_tx_prepare_result,
+		mode, "enable#disable");
+
+cmdline_parse_inst_t cmd_set_bond_tx_prepare = {
+		.f = cmd_set_bonding_tx_prepare_parsed,
+		.help_str = "set bonding tx_prepare <port_id> enable|disable: "
+			"Enable/disable tx_prepare for port_id",
+		.data = NULL,
+		.tokens = {
+			(void *)&cmd_setbonding_tx_prepare_set,
+			(void *)&cmd_setbonding_tx_prepare_bonding,
+			(void *)&cmd_setbonding_tx_prepare_tx_prepare,
+			(void *)&cmd_setbonding_tx_prepare_port_id,
+			(void *)&cmd_setbonding_tx_prepare_mode,
+			NULL
+		}
+};
+
 /* *** SET BALANCE XMIT POLICY *** */
 struct cmd_set_bonding_balance_xmit_policy_result {
 	cmdline_fixed_string_t set;
@@ -16966,6 +17031,7 @@ cmdline_parse_ctx_t main_ctx[] = {
 	(cmdline_parse_inst_t *) &cmd_set_balance_xmit_policy,
 	(cmdline_parse_inst_t *) &cmd_set_bond_mon_period,
 	(cmdline_parse_inst_t *) &cmd_set_lacp_dedicated_queues,
+	(cmdline_parse_inst_t *) &cmd_set_bond_tx_prepare,
 	(cmdline_parse_inst_t *) &cmd_set_bonding_agg_mode_policy,
 #endif
 	(cmdline_parse_inst_t *)&cmd_vlan_offload,
diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index 36f0a32..bdbf1ea 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -2590,6 +2590,15 @@ when in mode 4 (link-aggregation-802.3ad)::
    testpmd> set bonding lacp dedicated_queues (port_id) (enable|disable)


+set bonding tx_prepare
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Enable Tx prepare on bonding devices to help the slave devices prepare the
+packets for some HW offloading (e.g. checksum and TSO)::
+
+   testpmd> set bonding tx_prepare (port_id) (enable|disable)
+
+
 set bonding agg_mode
 ~~~~~~~~~~~~~~~~~~~~

--
2.7.4



* Re: [dpdk-dev] [RFC 0/2] add Tx prepare support for bonding device
  2021-04-16 11:04 [dpdk-dev] [RFC 0/2] add Tx prepare support for bonding device Chengchang Tang
  2021-04-16 11:04 ` [dpdk-dev] [RFC 1/2] net/bonding: add Tx prepare for bonding Chengchang Tang
  2021-04-16 11:04 ` [dpdk-dev] [RFC 2/2] app/testpmd: add cmd for bonding Tx prepare Chengchang Tang
@ 2021-04-16 11:12 ` Min Hu (Connor)
  2021-04-20  1:26 ` Ferruh Yigit
  2021-04-23  9:46 ` [dpdk-dev] [PATCH " Chengchang Tang
  4 siblings, 0 replies; 31+ messages in thread
From: Min Hu (Connor) @ 2021-04-16 11:12 UTC (permalink / raw)
  To: Chengchang Tang, dev; +Cc: linuxarm, chas3, ferruh.yigit

Looks good to me.

On 2021/4/16 19:04, Chengchang Tang wrote:
> This patch add Tx prepare for bonding device.
> 
> Currently, the bonding driver has not implemented the callback of
> rte_eth_tx_prepare function. Therefore, the TX prepare function of the
> slave devices will never be invoked. When hardware offloading such as
> CKSUM and TSO are enabled for some drivers, tx_prepare needs to be used
> to adjust packets (for example, set correct pseudo packet headers).
> Otherwise, related offloading fails and even packets are sent
> incorrectly. Due to this limitation, the bonded device cannot use these
> HW offloading in the Tx direction.
> 
> Because packet sending algorithms are numerous and complex in bond PMD,
> it is hard to design the callback for rte_eth_tx_prepare. In this patch,
> the tx_prepare callback of bonding PMD is not implemented. Instead,
> rte_eth_tx_prepare has been called in tx_burst callback. And a global
> variable is introduced to control whether the bonded device need call
> the rte_eth_tx_prepare. If upper-layer users need to use some TX
> offloading that depend on tx_prepare , they should enable the preparation
> function. In this way, the bonded device will call the rte_eth_tx_prepare
> for the fast path packets in the tx_burst callback.
> 
> Chengchang Tang (2):
>    net/bonding: add Tx prepare for bonding
>    app/testpmd: add cmd for bonding Tx prepare
> 
>   app/test-pmd/cmdline.c                      | 66 +++++++++++++++++++++++++++++
>   doc/guides/testpmd_app_ug/testpmd_funcs.rst |  9 ++++
>   drivers/net/bonding/eth_bond_private.h      |  1 +
>   drivers/net/bonding/rte_eth_bond.h          | 29 +++++++++++++
>   drivers/net/bonding/rte_eth_bond_api.c      | 28 ++++++++++++
>   drivers/net/bonding/rte_eth_bond_pmd.c      | 33 +++++++++++++--
>   drivers/net/bonding/version.map             |  5 +++
>   7 files changed, 167 insertions(+), 4 deletions(-)
> 
> --
> 2.7.4
> 
> .
> 


* Re: [dpdk-dev] [RFC 0/2] add Tx prepare support for bonding device
  2021-04-16 11:04 [dpdk-dev] [RFC 0/2] add Tx prepare support for bonding device Chengchang Tang
                   ` (2 preceding siblings ...)
  2021-04-16 11:12 ` [dpdk-dev] [RFC 0/2] add Tx prepare support for bonding device Min Hu (Connor)
@ 2021-04-20  1:26 ` Ferruh Yigit
  2021-04-20  2:44   ` Chengchang Tang
  2021-04-23  9:46 ` [dpdk-dev] [PATCH " Chengchang Tang
  4 siblings, 1 reply; 31+ messages in thread
From: Ferruh Yigit @ 2021-04-20  1:26 UTC (permalink / raw)
  To: Chengchang Tang, dev; +Cc: linuxarm, chas3, humin29

On 4/16/2021 12:04 PM, Chengchang Tang wrote:
> This patch add Tx prepare for bonding device.
> 
> Currently, the bonding driver has not implemented the callback of
> rte_eth_tx_prepare function. Therefore, the TX prepare function of the
> slave devices will never be invoked. When hardware offloading such as
> CKSUM and TSO are enabled for some drivers, tx_prepare needs to be used
> to adjust packets (for example, set correct pseudo packet headers).
> Otherwise, related offloading fails and even packets are sent
> incorrectly. Due to this limitation, the bonded device cannot use these
> HW offloading in the Tx direction.
> 
> Because packet sending algorithms are numerous and complex in bond PMD,
> it is hard to design the callback for rte_eth_tx_prepare. In this patch,
> the tx_prepare callback of bonding PMD is not implemented. Instead,
> rte_eth_tx_prepare has been called in tx_burst callback. And a global
> variable is introduced to control whether the bonded device need call
> the rte_eth_tx_prepare. If upper-layer users need to use some TX
> offloading that depend on tx_prepare , they should enable the preparation
> function. In this way, the bonded device will call the rte_eth_tx_prepare
> for the fast path packets in the tx_burst callback.
> 

What do you think about adding a devarg to the bonding PMD to control the
tx_prepare?
It won't be as dynamic as an API, since an API makes it possible to change
the behavior after the application has started, but do we really need that?

> Chengchang Tang (2):
>    net/bonding: add Tx prepare for bonding
>    app/testpmd: add cmd for bonding Tx prepare
> 
>   app/test-pmd/cmdline.c                      | 66 +++++++++++++++++++++++++++++
>   doc/guides/testpmd_app_ug/testpmd_funcs.rst |  9 ++++
>   drivers/net/bonding/eth_bond_private.h      |  1 +
>   drivers/net/bonding/rte_eth_bond.h          | 29 +++++++++++++
>   drivers/net/bonding/rte_eth_bond_api.c      | 28 ++++++++++++
>   drivers/net/bonding/rte_eth_bond_pmd.c      | 33 +++++++++++++--
>   drivers/net/bonding/version.map             |  5 +++
>   7 files changed, 167 insertions(+), 4 deletions(-)
> 
> --
> 2.7.4
> 



* Re: [dpdk-dev] [RFC 0/2] add Tx prepare support for bonding device
  2021-04-20  1:26 ` Ferruh Yigit
@ 2021-04-20  2:44   ` Chengchang Tang
  2021-04-20  8:33     ` Ananyev, Konstantin
  0 siblings, 1 reply; 31+ messages in thread
From: Chengchang Tang @ 2021-04-20  2:44 UTC (permalink / raw)
  To: Ferruh Yigit, dev; +Cc: linuxarm, chas3, humin29

On 2021/4/20 9:26, Ferruh Yigit wrote:
> On 4/16/2021 12:04 PM, Chengchang Tang wrote:
>> This patch add Tx prepare for bonding device.
>>
>> Currently, the bonding driver has not implemented the callback of
>> rte_eth_tx_prepare function. Therefore, the TX prepare function of the
>> slave devices will never be invoked. When hardware offloading such as
>> CKSUM and TSO are enabled for some drivers, tx_prepare needs to be used
>> to adjust packets (for example, set correct pseudo packet headers).
>> Otherwise, related offloading fails and even packets are sent
>> incorrectly. Due to this limitation, the bonded device cannot use these
>> HW offloading in the Tx direction.
>>
>> Because packet sending algorithms are numerous and complex in bond PMD,
>> it is hard to design the callback for rte_eth_tx_prepare. In this patch,
>> the tx_prepare callback of bonding PMD is not implemented. Instead,
>> rte_eth_tx_prepare has been called in tx_burst callback. And a global
>> variable is introduced to control whether the bonded device need call
>> the rte_eth_tx_prepare. If upper-layer users need to use some TX
>> offloading that depend on tx_prepare , they should enable the preparation
>> function. In this way, the bonded device will call the rte_eth_tx_prepare
>> for the fast path packets in the tx_burst callback.
>>
> 
> What do you think to add a devarg to bonding PMD to control the tx_prepare?
> It won't be as dynamic as API, since it can be possible to change the behavior after application is started with API, but do we really need this?

If an API is not added, unnecessary constraints may be introduced. If the
bonding device is created through the rte_eth_bond_create interface
instead of through the "vdev" devarg, this feature could not be used at
all, because devargs do not take effect in that case. But from an
ease-of-use perspective, adding a devarg is a good idea. I will add the
related implementation in the later official patches.

If I understand correctly, the community currently does not want to
introduce more private APIs for PMDs. However, the absence of an API on
this issue would introduce some unnecessary constraints, and from that
point of view, I think adding an API seems necessary.
> 
>> Chengchang Tang (2):
>>    net/bonding: add Tx prepare for bonding
>>    app/testpmd: add cmd for bonding Tx prepare
>>
>>   app/test-pmd/cmdline.c                      | 66 +++++++++++++++++++++++++++++
>>   doc/guides/testpmd_app_ug/testpmd_funcs.rst |  9 ++++
>>   drivers/net/bonding/eth_bond_private.h      |  1 +
>>   drivers/net/bonding/rte_eth_bond.h          | 29 +++++++++++++
>>   drivers/net/bonding/rte_eth_bond_api.c      | 28 ++++++++++++
>>   drivers/net/bonding/rte_eth_bond_pmd.c      | 33 +++++++++++++--
>>   drivers/net/bonding/version.map             |  5 +++
>>   7 files changed, 167 insertions(+), 4 deletions(-)
>>
>> -- 
>> 2.7.4
>>
> 
> 
> .
> 



* Re: [dpdk-dev] [RFC 0/2] add Tx prepare support for bonding device
  2021-04-20  2:44   ` Chengchang Tang
@ 2021-04-20  8:33     ` Ananyev, Konstantin
  2021-04-20 12:44       ` Chengchang Tang
  0 siblings, 1 reply; 31+ messages in thread
From: Ananyev, Konstantin @ 2021-04-20  8:33 UTC (permalink / raw)
  To: Chengchang Tang, Yigit, Ferruh, dev; +Cc: linuxarm, chas3, humin29

Hi everyone,

> 
> On 2021/4/20 9:26, Ferruh Yigit wrote:
> > On 4/16/2021 12:04 PM, Chengchang Tang wrote:
> >> This patch add Tx prepare for bonding device.
> >>
> >> Currently, the bonding driver has not implemented the callback of
> >> rte_eth_tx_prepare function. Therefore, the TX prepare function of the
> >> slave devices will never be invoked. When hardware offloading such as
> >> CKSUM and TSO are enabled for some drivers, tx_prepare needs to be used
> >> to adjust packets (for example, set correct pseudo packet headers).
> >> Otherwise, related offloading fails and even packets are sent
> >> incorrectly. Due to this limitation, the bonded device cannot use these
> >> HW offloading in the Tx direction.
> >>
> >> Because packet sending algorithms are numerous and complex in bond PMD,
> >> it is hard to design the callback for rte_eth_tx_prepare. In this patch,
> >> the tx_prepare callback of bonding PMD is not implemented. Instead,
> >> rte_eth_tx_prepare has been called in tx_burst callback. And a global
> >> variable is introduced to control whether the bonded device need call
> >> the rte_eth_tx_prepare. If upper-layer users need to use some TX
> >> offloading that depend on tx_prepare , they should enable the preparation
> >> function. In this way, the bonded device will call the rte_eth_tx_prepare
> >> for the fast path packets in the tx_burst callback.

I admit that I didn't look at the implementation yet, but it sounds like
an overcomplication to me. Can't we just have a new Tx burst function for
the bonding PMD when Tx offloads are enabled? Inside that function we
would do: tx_prepare(); tx_burst(); for the selected device.
We could select this function at the setup stage by analysing the Tx
offloads requested by the user.


> >>
> >
> > What do you think to add a devarg to bonding PMD to control the tx_prepare?
> > It won't be as dynamic as API, since it can be possible to change the behavior after application is started with API, but do we really need
> this?
> 
> If an API is not added, unnecessary constraints may be introduced. If the
> bonding device is created through the rte_eth_bond_create interface instead
> devarg "vdev", this function cannot be used because devargs does not take effect
> in this case. But from an ease-of-use perspective, adding a devarg is a good
> idea. I will add related implementations in the later official patches.

I am also against introducing a new devarg to control the tx_prepare()
invocation.
I think that at the dev_configure/queue_setup phase the PMD will have
enough information to decide.

> 
> If I understand correctly, the current community does not want to introduce
> more private APIs for PMDs. However, the absence of an API on this issue would
> introduce some unnecessary constraints, and from that point of view, I think
> adding an API seems necessary.
> >
> >> Chengchang Tang (2):
> >>    net/bonding: add Tx prepare for bonding
> >>    app/testpmd: add cmd for bonding Tx prepare
> >>
> >>   app/test-pmd/cmdline.c                      | 66 +++++++++++++++++++++++++++++
> >>   doc/guides/testpmd_app_ug/testpmd_funcs.rst |  9 ++++
> >>   drivers/net/bonding/eth_bond_private.h      |  1 +
> >>   drivers/net/bonding/rte_eth_bond.h          | 29 +++++++++++++
> >>   drivers/net/bonding/rte_eth_bond_api.c      | 28 ++++++++++++
> >>   drivers/net/bonding/rte_eth_bond_pmd.c      | 33 +++++++++++++--
> >>   drivers/net/bonding/version.map             |  5 +++
> >>   7 files changed, 167 insertions(+), 4 deletions(-)
> >>
> >> --
> >> 2.7.4
> >>
> >
> >
> > .
> >



* Re: [dpdk-dev] [RFC 0/2] add Tx prepare support for bonding device
  2021-04-20  8:33     ` Ananyev, Konstantin
@ 2021-04-20 12:44       ` Chengchang Tang
  2021-04-20 13:18         ` Ananyev, Konstantin
  0 siblings, 1 reply; 31+ messages in thread
From: Chengchang Tang @ 2021-04-20 12:44 UTC (permalink / raw)
  To: Ananyev, Konstantin, Yigit, Ferruh, dev; +Cc: linuxarm, chas3, humin29

Hi
On 2021/4/20 16:33, Ananyev, Konstantin wrote:
> Hi everyone,
> 
>>
>> On 2021/4/20 9:26, Ferruh Yigit wrote:
>>> On 4/16/2021 12:04 PM, Chengchang Tang wrote:
>>>> This patch add Tx prepare for bonding device.
>>>>
>>>> Currently, the bonding driver has not implemented the callback of
>>>> rte_eth_tx_prepare function. Therefore, the TX prepare function of the
>>>> slave devices will never be invoked. When hardware offloading such as
>>>> CKSUM and TSO are enabled for some drivers, tx_prepare needs to be used
>>>> to adjust packets (for example, set correct pseudo packet headers).
>>>> Otherwise, related offloading fails and even packets are sent
>>>> incorrectly. Due to this limitation, the bonded device cannot use these
>>>> HW offloading in the Tx direction.
>>>>
>>>> Because packet sending algorithms are numerous and complex in bond PMD,
>>>> it is hard to design the callback for rte_eth_tx_prepare. In this patch,
>>>> the tx_prepare callback of bonding PMD is not implemented. Instead,
>>>> rte_eth_tx_prepare has been called in tx_burst callback. And a global
>>>> variable is introduced to control whether the bonded device need call
>>>> the rte_eth_tx_prepare. If upper-layer users need to use some TX
>>>> offloading that depend on tx_prepare , they should enable the preparation
>>>> function. In this way, the bonded device will call the rte_eth_tx_prepare
>>>> for the fast path packets in the tx_burst callback.
> 
> I admit that I didn't look at the implementation yet, but it sounds like 
> overcomplication to me. Can't we just have a new TX function for bonding PMD
> when TX offloads are enabled? And inside that function we will do:
> tx_prepare(); tx_burst(); for selected device.

The solution you mentioned is workable and may perform better. However,
the current solution is also simple and has a limited impact on
performance. It is effectively:

if (tx_prepare_enabled)
	tx_prepare();
tx_burst();

Overall, it adds only a single branch in the case where the related Tx
offloads are not enabled.

> We can select this function at setup stage analysing requested by user TX offloads. 
> 

In PMDs, it is common practice to select different Tx/Rx functions
during the setup phase. But for a vdev device like bonding, we may need
to think more about it. The reasons are explained below.
> 
>>>>
>>>
>>> What do you think to add a devarg to bonding PMD to control the tx_prepare?
>>> It won't be as dynamic as API, since it can be possible to change the behavior after application is started with API, but do we really need
>> this?
>>
>> If an API is not added, unnecessary constraints may be introduced. If the
>> bonding device is created through the rte_eth_bond_create interface instead
>> devarg "vdev", this function cannot be used because devargs does not take effect
>> in this case. But from an ease-of-use perspective, adding a devarg is a good
>> idea. I will add related implementations in the later official patches.
> 
> I am also against introducing new devarg to control tx_prepare() invocation.
> I think at dev_config/queue_setup phase PMD will have enough information to decide.
> 
Currently, the community does not specify which Tx offloads require tx_prepare.
For vdev devices such as bonding, all NIC devices need to be considered. Generally,
tx_prepare is used for CKSUM and TSO. It is possible that for some NIC devices even
CKSUM and TSO do not require tx_prepare, while for other NIC devices there are
additional Tx offloads that do. From this perspective, leaving the choice to the
user seems to be the better option.
>>
>> If I understand correctly, the current community does not want to introduce
>> more private APIs for PMDs. However, the absence of an API on this issue would
>> introduce some unnecessary constraints, and from that point of view, I think
>> adding an API seems necessary.
>>>
>>>> Chengchang Tang (2):
>>>>    net/bonding: add Tx prepare for bonding
>>>>    app/testpmd: add cmd for bonding Tx prepare
>>>>
>>>>   app/test-pmd/cmdline.c                      | 66 +++++++++++++++++++++++++++++
>>>>   doc/guides/testpmd_app_ug/testpmd_funcs.rst |  9 ++++
>>>>   drivers/net/bonding/eth_bond_private.h      |  1 +
>>>>   drivers/net/bonding/rte_eth_bond.h          | 29 +++++++++++++
>>>>   drivers/net/bonding/rte_eth_bond_api.c      | 28 ++++++++++++
>>>>   drivers/net/bonding/rte_eth_bond_pmd.c      | 33 +++++++++++++--
>>>>   drivers/net/bonding/version.map             |  5 +++
>>>>   7 files changed, 167 insertions(+), 4 deletions(-)
>>>>
>>>> --
>>>> 2.7.4
>>>>
>>>
>>>
>>> .
>>>
> 


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [dpdk-dev] [RFC 0/2] add Tx prepare support for bonding device
  2021-04-20 12:44       ` Chengchang Tang
@ 2021-04-20 13:18         ` Ananyev, Konstantin
  2021-04-20 14:06           ` Chengchang Tang
  0 siblings, 1 reply; 31+ messages in thread
From: Ananyev, Konstantin @ 2021-04-20 13:18 UTC (permalink / raw)
  To: Chengchang Tang, Yigit, Ferruh, dev; +Cc: linuxarm, chas3, humin29



> -----Original Message-----
> From: Chengchang Tang <tangchengchang@huawei.com>
> Sent: Tuesday, April 20, 2021 1:44 PM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>; dev@dpdk.org
> Cc: linuxarm@huawei.com; chas3@att.com; humin29@huawei.com
> Subject: Re: [dpdk-dev] [RFC 0/2] add Tx prepare support for bonding device
> 
> Hi
> On 2021/4/20 16:33, Ananyev, Konstantin wrote:
> > Hi everyone,
> >
> >>
> >> On 2021/4/20 9:26, Ferruh Yigit wrote:
> >>> On 4/16/2021 12:04 PM, Chengchang Tang wrote:
> >>>> This patch add Tx prepare for bonding device.
> >>>>
> >>>> Currently, the bonding driver has not implemented the callback of
> >>>> rte_eth_tx_prepare function. Therefore, the TX prepare function of the
> >>>> slave devices will never be invoked. When hardware offloading such as
> >>>> CKSUM and TSO are enabled for some drivers, tx_prepare needs to be used
> >>>> to adjust packets (for example, set correct pseudo packet headers).
> >>>> Otherwise, related offloading fails and even packets are sent
> >>>> incorrectly. Due to this limitation, the bonded device cannot use these
> >>>> HW offloading in the Tx direction.
> >>>>
> >>>> Because packet sending algorithms are numerous and complex in bond PMD,
> >>>> it is hard to design the callback for rte_eth_tx_prepare. In this patch,
> >>>> the tx_prepare callback of bonding PMD is not implemented. Instead,
> >>>> rte_eth_tx_prepare has been called in tx_burst callback. And a global
> >>>> variable is introduced to control whether the bonded device need call
> >>>> the rte_eth_tx_prepare. If upper-layer users need to use some TX
> >>>> offloading that depend on tx_prepare , they should enable the preparation
> >>>> function. In this way, the bonded device will call the rte_eth_tx_prepare
> >>>> for the fast path packets in the tx_burst callback.
> >
> > I admit that I didn't look at the implementation yet, but it sounds like
> > overcomplication to me. Can't we just have a new TX function for bonding PMD
> > when TX offloads are enabled? And inside that function we will do:
> > tx_prepare(); tx_burst(); for selected device.
> 
> The solution you mentioned is workable and may perform better. However, the current
> solution is also simple and has a limited impact on performance. It is actually:
> if (tx_prepare_enable)
> 	tx_prepare();
> tx_burst();
> 
> Overall, it adds almost only one judgment to the case where the related Tx offloads
> is not turned on.
> 
> > We can select this function at setup stage analysing requested by user TX offloads.
> >
> 
> In PMDs, it is a common practice to select different Tx/Rx function during the setup
> phase. But for a 'vdev' device like Bonding, we may need to think more about it.
> The reasons are explained below.
> >
> >>>>
> >>>
> >>> What do you think to add a devarg to bonding PMD to control the tx_prepare?
> >>> It won't be as dynamic as API, since it can be possible to change the behavior after application is started with API, but do we really need
> >> this?
> >>
> >> If an API is not added, unnecessary constraints may be introduced. If the
> >> bonding device is created through the rte_eth_bond_create interface instead
> >> devarg "vdev", this function cannot be used because devargs does not take effect
> >> in this case. But from an ease-of-use perspective, adding a devarg is a good
> >> idea. I will add related implementations in the later official patches.
> >
> > I am also against introducing new devarg to control tx_prepare() invocation.
> > I think at dev_config/queue_setup phase PMD will have enough information to decide.
> >
> Currently, the community does not specify which Tx offloads need to invoke tx_prepare.

I think inside bond PMD we can safely assume that any TX offload does need tx_prepare().
If that's not the case then slave dev tx_prepare pointer will be NULL and rte_eth_tx_prepare()
will be just a NOOP. 

> For Vdev devices such as bond, all NIC devices need to be considered. Generally,
> tx_prepare is used in CKSUM and TSO. It is possible that for some NIC devices, even
> CKSUM and TSO do not need to invoke tx_prepare, or for some NIC devices, there are
> other Tx offloads that need to call tx_prepare. From this perspective, leaving the
> choice to the user seems to be a better choice.

Wonder how the user will know when to enable/disable it?
As you said, it depends on the underlying HW/PMD and can change from system to system.
I think it is the PMD that needs to take this decision, and I think the safest bet might be to enable
it when any TX offload was enabled by the user.

> >>
> >> If I understand correctly, the current community does not want to introduce
> >> more private APIs for PMDs. However, the absence of an API on this issue would
> >> introduce some unnecessary constraints, and from that point of view, I think
> >> adding an API seems necessary.
> >>>
> >>>> Chengchang Tang (2):
> >>>>    net/bonding: add Tx prepare for bonding
> >>>>    app/testpmd: add cmd for bonding Tx prepare
> >>>>
> >>>>   app/test-pmd/cmdline.c                      | 66 +++++++++++++++++++++++++++++
> >>>>   doc/guides/testpmd_app_ug/testpmd_funcs.rst |  9 ++++
> >>>>   drivers/net/bonding/eth_bond_private.h      |  1 +
> >>>>   drivers/net/bonding/rte_eth_bond.h          | 29 +++++++++++++
> >>>>   drivers/net/bonding/rte_eth_bond_api.c      | 28 ++++++++++++
> >>>>   drivers/net/bonding/rte_eth_bond_pmd.c      | 33 +++++++++++++--
> >>>>   drivers/net/bonding/version.map             |  5 +++
> >>>>   7 files changed, 167 insertions(+), 4 deletions(-)
> >>>>
> >>>> --
> >>>> 2.7.4
> >>>>
> >>>
> >>>
> >>> .
> >>>
> >


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [dpdk-dev] [RFC 0/2] add Tx prepare support for bonding device
  2021-04-20 13:18         ` Ananyev, Konstantin
@ 2021-04-20 14:06           ` Chengchang Tang
  0 siblings, 0 replies; 31+ messages in thread
From: Chengchang Tang @ 2021-04-20 14:06 UTC (permalink / raw)
  To: Ananyev, Konstantin, Yigit, Ferruh, dev; +Cc: linuxarm, chas3, humin29



On 2021/4/20 21:18, Ananyev, Konstantin wrote:
> 
> 
>> -----Original Message-----
>> From: Chengchang Tang <tangchengchang@huawei.com>
>> Sent: Tuesday, April 20, 2021 1:44 PM
>> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>; dev@dpdk.org
>> Cc: linuxarm@huawei.com; chas3@att.com; humin29@huawei.com
>> Subject: Re: [dpdk-dev] [RFC 0/2] add Tx prepare support for bonding device
>>
>> Hi
>> On 2021/4/20 16:33, Ananyev, Konstantin wrote:
>>> Hi everyone,
>>>
>>>>
>>>> On 2021/4/20 9:26, Ferruh Yigit wrote:
>>>>> On 4/16/2021 12:04 PM, Chengchang Tang wrote:
>>>>>> This patch add Tx prepare for bonding device.
>>>>>>
>>>>>> Currently, the bonding driver has not implemented the callback of
>>>>>> rte_eth_tx_prepare function. Therefore, the TX prepare function of the
>>>>>> slave devices will never be invoked. When hardware offloading such as
>>>>>> CKSUM and TSO are enabled for some drivers, tx_prepare needs to be used
>>>>>> to adjust packets (for example, set correct pseudo packet headers).
>>>>>> Otherwise, related offloading fails and even packets are sent
>>>>>> incorrectly. Due to this limitation, the bonded device cannot use these
>>>>>> HW offloading in the Tx direction.
>>>>>>
>>>>>> Because packet sending algorithms are numerous and complex in bond PMD,
>>>>>> it is hard to design the callback for rte_eth_tx_prepare. In this patch,
>>>>>> the tx_prepare callback of bonding PMD is not implemented. Instead,
>>>>>> rte_eth_tx_prepare has been called in tx_burst callback. And a global
>>>>>> variable is introduced to control whether the bonded device need call
>>>>>> the rte_eth_tx_prepare. If upper-layer users need to use some TX
>>>>>> offloading that depend on tx_prepare , they should enable the preparation
>>>>>> function. In this way, the bonded device will call the rte_eth_tx_prepare
>>>>>> for the fast path packets in the tx_burst callback.
>>>
>>> I admit that I didn't look at the implementation yet, but it sounds like
>>> overcomplication to me. Can't we just have a new TX function for bonding PMD
>>> when TX offloads are enabled? And inside that function we will do:
>>> tx_prepare(); tx_burst(); for selected device.
>>
>> The solution you mentioned is workable and may perform better. However, the current
>> solution is also simple and has a limited impact on performance. It is actually:
>> if (tx_prepare_enable)
>> 	tx_prepare();
>> tx_burst();
>>
>> Overall, it adds almost only one judgment to the case where the related Tx offloads
>> is not turned on.
>>
>>> We can select this function at setup stage analysing requested by user TX offloads.
>>>
>>
>> In PMDs, it is a common practice to select different Tx/Rx function during the setup
>> phase. But for a 'vdev' device like Bonding, we may need to think more about it.
>> The reasons are explained below.
>>>
>>>>>>
>>>>>
>>>>> What do you think to add a devarg to bonding PMD to control the tx_prepare?
>>>>> It won't be as dynamic as API, since it can be possible to change the behavior after application is started with API, but do we really need
>>>> this?
>>>>
>>>> If an API is not added, unnecessary constraints may be introduced. If the
>>>> bonding device is created through the rte_eth_bond_create interface instead
>>>> devarg "vdev", this function cannot be used because devargs does not take effect
>>>> in this case. But from an ease-of-use perspective, adding a devarg is a good
>>>> idea. I will add related implementations in the later official patches.
>>>
>>> I am also against introducing new devarg to control tx_prepare() invocation.
>>> I think at dev_config/queue_setup phase PMD will have enough information to decide.
>>>
>> Currently, the community does not specify which Tx offloads need to invoke tx_prepare.
> 
> I think inside bond PMD we can safely assume that any TX offload does need tx_prepare().
> If that's not the case then slave dev tx_prepare pointer will be NULL and rte_eth_tx_prepare()
> will be just a NOOP. 

Got it. I agree that these decisions should be delegated directly to the PMDs.
In the formal patch, the API used to control the enable state will be removed.
> 
>> For Vdev devices such as bond, all NIC devices need to be considered. Generally,
>> tx_prepare is used in CKSUM and TSO. It is possible that for some NIC devices, even
>> CKSUM and TSO do not need to invoke tx_prepare, or for some NIC devices, there are
>> other Tx offloads that need to call tx_prepare. From this perspective, leaving the
>> choice to the user seems to be a better choice.
> 
> Wonder how user will know when to enable/disable it?
> As you said it depends on the underlying HW/PMD and can change from system to system?

Generally, such decisions would have to be made based on debugging results, which is not good.
> I think it is PMD that needs to take this decision, and I think the safest bet might be to enable
> it when any TX offloads was enabled by user.
> 

I agree that these decisions should be made by the PMDs. In fact, I think tx_prepare()
should always be called in bonding; its impact on performance should be directly controlled
by the PMDs.
>>>>
>>>> If I understand correctly, the current community does not want to introduce
>>>> more private APIs for PMDs. However, the absence of an API on this issue would
>>>> introduce some unnecessary constraints, and from that point of view, I think
>>>> adding an API seems necessary.
>>>>>
>>>>>> Chengchang Tang (2):
>>>>>>    net/bonding: add Tx prepare for bonding
>>>>>>    app/testpmd: add cmd for bonding Tx prepare
>>>>>>
>>>>>>   app/test-pmd/cmdline.c                      | 66 +++++++++++++++++++++++++++++
>>>>>>   doc/guides/testpmd_app_ug/testpmd_funcs.rst |  9 ++++
>>>>>>   drivers/net/bonding/eth_bond_private.h      |  1 +
>>>>>>   drivers/net/bonding/rte_eth_bond.h          | 29 +++++++++++++
>>>>>>   drivers/net/bonding/rte_eth_bond_api.c      | 28 ++++++++++++
>>>>>>   drivers/net/bonding/rte_eth_bond_pmd.c      | 33 +++++++++++++--
>>>>>>   drivers/net/bonding/version.map             |  5 +++
>>>>>>   7 files changed, 167 insertions(+), 4 deletions(-)
>>>>>>
>>>>>> --
>>>>>> 2.7.4
>>>>>>
>>>>>
>>>>>
>>>>> .
>>>>>
>>>
> 


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [dpdk-dev] [PATCH 0/2] add Tx prepare support for bonding device
  2021-04-16 11:04 [dpdk-dev] [RFC 0/2] add Tx prepare support for bonding device Chengchang Tang
                   ` (3 preceding siblings ...)
  2021-04-20  1:26 ` Ferruh Yigit
@ 2021-04-23  9:46 ` Chengchang Tang
  2021-04-23  9:46   ` [dpdk-dev] [PATCH 1/2] net/bonding: support Tx prepare for bonding Chengchang Tang
                     ` (3 more replies)
  4 siblings, 4 replies; 31+ messages in thread
From: Chengchang Tang @ 2021-04-23  9:46 UTC (permalink / raw)
  To: dev; +Cc: linuxarm, chas3, humin29, ferruh.yigit, konstantin.ananyev

This patch set adds Tx prepare for the bonding device.

Currently, the bonding driver does not implement the rte_eth_tx_prepare
callback. Therefore, the Tx prepare functions of the slave devices are
never invoked. When hardware offloads such as CKSUM and TSO are enabled
for some drivers, tx_prepare needs to be used to adjust packets (for
example, to set correct pseudo headers). Otherwise, the related offloads
fail and packets may even be sent incorrectly. Due to this limitation,
the bonded device cannot use these HW offloads in the Tx direction.

Because the packet sending algorithms in the bond PMD are numerous and
complex, it is hard to design a tx_prepare callback for it. In this
patchset, the tx_prepare callback of the bonding PMD is therefore not
implemented. Instead, rte_eth_tx_prepare is called in the tx_burst
callbacks. In this way, all Tx offloads can be processed correctly for
all NIC devices. It is the responsibility of the slave PMDs to decide
when a real tx_prepare is needed. If tx_prepare is not required in some
cases, the slave PMD's tx_prepare pointer should be NULL and
rte_eth_tx_prepare() will be just a NOOP. That is, the effectiveness and
safety of tx_prepare, and its impact on performance, depend on the design
of the slave PMDs.

This patchset also adds support for configuring Tx offloads on the
bonding device. This removes the need to configure the slave devices one
by one when configuring Tx offloads.

Chengchang Tang (2):
  net/bonding: support Tx prepare for bonding
  net/bonding: support configuring Tx offloading for bonding

 drivers/net/bonding/rte_eth_bond.h     |  1 -
 drivers/net/bonding/rte_eth_bond_pmd.c | 41 ++++++++++++++++++++++++++++++----
 2 files changed, 37 insertions(+), 5 deletions(-)

--
2.7.4


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [dpdk-dev] [PATCH 1/2] net/bonding: support Tx prepare for bonding
  2021-04-23  9:46 ` [dpdk-dev] [PATCH " Chengchang Tang
@ 2021-04-23  9:46   ` Chengchang Tang
  2021-06-08  9:49     ` Andrew Rybchenko
  2021-04-23  9:46   ` [dpdk-dev] [PATCH 2/2] net/bonding: support configuring Tx offloading " Chengchang Tang
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 31+ messages in thread
From: Chengchang Tang @ 2021-04-23  9:46 UTC (permalink / raw)
  To: dev; +Cc: linuxarm, chas3, humin29, ferruh.yigit, konstantin.ananyev

To use HW offload capabilities (e.g. checksum and TSO) in the Tx
direction, upper-layer users need to call rte_eth_tx_prepare to make
some adjustments to the packets before sending them (e.g. processing
pseudo headers when Tx checksum offload is enabled). But the tx_prepare
callback of the bond driver is not implemented. Therefore, the related
offloads cannot be used unless the upper-layer users process the packets
properly in their own application, which is bad for portability.

However, it is difficult to design a tx_prepare callback for the bonding
driver, because when a bonded device sends packets, it distributes them
to different slave devices based on the real-time link status and
bonding mode. That is, it is very difficult for the bonding device to
determine which slave device's prepare function should be invoked. In
addition, if the link status changes after the packets are prepared, the
packets may fail to be sent because the packet distribution may change.

So, in this patch, the tx_prepare callback of the bonding driver is not
implemented. Instead, rte_eth_tx_prepare() is called for all the
fast-path packets in modes 0, 1, 2, 4, 5 and 6. In this way, all Tx
offloads can be processed correctly for all NIC devices in these modes.
If tx_prepare is not required in some cases, the slave PMD's tx_prepare
pointer should be NULL and rte_eth_tx_prepare() will be just a NOOP.
In these cases, the impact on performance will be very limited. It is
the responsibility of the slave PMDs to decide when the real tx_prepare
needs to be used. The information from dev_config/queue_setup is
sufficient for them to make these decisions.

Note:
rte_eth_tx_prepare is not added to bond mode 3 (Broadcast). This is
because in broadcast mode a packet needs to be sent by all slave ports,
and different PMDs process the packets differently in tx_prepare. As a
result, the sent packets may be incorrect.

Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
---
 drivers/net/bonding/rte_eth_bond.h     |  1 -
 drivers/net/bonding/rte_eth_bond_pmd.c | 28 ++++++++++++++++++++++++----
 2 files changed, 24 insertions(+), 5 deletions(-)

diff --git a/drivers/net/bonding/rte_eth_bond.h b/drivers/net/bonding/rte_eth_bond.h
index 874aa91..1e6cc6d 100644
--- a/drivers/net/bonding/rte_eth_bond.h
+++ b/drivers/net/bonding/rte_eth_bond.h
@@ -343,7 +343,6 @@ rte_eth_bond_link_up_prop_delay_set(uint16_t bonded_port_id,
 int
 rte_eth_bond_link_up_prop_delay_get(uint16_t bonded_port_id);

-
 #ifdef __cplusplus
 }
 #endif
diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
index 2e9cea5..84af348 100644
--- a/drivers/net/bonding/rte_eth_bond_pmd.c
+++ b/drivers/net/bonding/rte_eth_bond_pmd.c
@@ -606,8 +606,14 @@ bond_ethdev_tx_burst_round_robin(void *queue, struct rte_mbuf **bufs,
 	/* Send packet burst on each slave device */
 	for (i = 0; i < num_of_slaves; i++) {
 		if (slave_nb_pkts[i] > 0) {
+			int nb_prep_pkts;
+
+			nb_prep_pkts = rte_eth_tx_prepare(slaves[i],
+					bd_tx_q->queue_id, slave_bufs[i],
+					slave_nb_pkts[i]);
+
 			num_tx_slave = rte_eth_tx_burst(slaves[i], bd_tx_q->queue_id,
-					slave_bufs[i], slave_nb_pkts[i]);
+					slave_bufs[i], nb_prep_pkts);

 			/* if tx burst fails move packets to end of bufs */
 			if (unlikely(num_tx_slave < slave_nb_pkts[i])) {
@@ -632,6 +638,7 @@ bond_ethdev_tx_burst_active_backup(void *queue,
 {
 	struct bond_dev_private *internals;
 	struct bond_tx_queue *bd_tx_q;
+	int nb_prep_pkts;

 	bd_tx_q = (struct bond_tx_queue *)queue;
 	internals = bd_tx_q->dev_private;
@@ -639,8 +646,11 @@ bond_ethdev_tx_burst_active_backup(void *queue,
 	if (internals->active_slave_count < 1)
 		return 0;

+	nb_prep_pkts = rte_eth_tx_prepare(internals->current_primary_port,
+				bd_tx_q->queue_id, bufs, nb_pkts);
+
 	return rte_eth_tx_burst(internals->current_primary_port, bd_tx_q->queue_id,
-			bufs, nb_pkts);
+			bufs, nb_prep_pkts);
 }

 static inline uint16_t
@@ -939,6 +949,8 @@ bond_ethdev_tx_burst_tlb(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 	}

 	for (i = 0; i < num_of_slaves; i++) {
+		int nb_prep_pkts;
+
 		rte_eth_macaddr_get(slaves[i], &active_slave_addr);
 		for (j = num_tx_total; j < nb_pkts; j++) {
 			if (j + 3 < nb_pkts)
@@ -955,9 +967,12 @@ bond_ethdev_tx_burst_tlb(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 #endif
 		}

-		num_tx_total += rte_eth_tx_burst(slaves[i], bd_tx_q->queue_id,
+		nb_prep_pkts = rte_eth_tx_prepare(slaves[i], bd_tx_q->queue_id,
 				bufs + num_tx_total, nb_pkts - num_tx_total);

+		num_tx_total += rte_eth_tx_burst(slaves[i], bd_tx_q->queue_id,
+				bufs + num_tx_total, nb_prep_pkts);
+
 		if (num_tx_total == nb_pkts)
 			break;
 	}
@@ -1159,12 +1174,17 @@ tx_burst_balance(void *queue, struct rte_mbuf **bufs, uint16_t nb_bufs,

 	/* Send packet burst on each slave device */
 	for (i = 0; i < slave_count; i++) {
+		int nb_prep_pkts;
+
 		if (slave_nb_bufs[i] == 0)
 			continue;
+		nb_prep_pkts = rte_eth_tx_prepare(slave_port_ids[i],
+				bd_tx_q->queue_id, slave_bufs[i],
+				slave_nb_bufs[i]);

 		slave_tx_count = rte_eth_tx_burst(slave_port_ids[i],
 				bd_tx_q->queue_id, slave_bufs[i],
-				slave_nb_bufs[i]);
+				nb_prep_pkts);

 		total_tx_count += slave_tx_count;

--
2.7.4


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [dpdk-dev] [PATCH 2/2] net/bonding: support configuring Tx offloading for bonding
  2021-04-23  9:46 ` [dpdk-dev] [PATCH " Chengchang Tang
  2021-04-23  9:46   ` [dpdk-dev] [PATCH 1/2] net/bonding: support Tx prepare for bonding Chengchang Tang
@ 2021-04-23  9:46   ` Chengchang Tang
  2021-06-08  9:49     ` Andrew Rybchenko
  2021-04-30  6:26   ` [dpdk-dev] [PATCH 0/2] add Tx prepare support for bonding device Chengchang Tang
  2021-06-03  1:44   ` Chengchang Tang
  3 siblings, 1 reply; 31+ messages in thread
From: Chengchang Tang @ 2021-04-23  9:46 UTC (permalink / raw)
  To: dev; +Cc: linuxarm, chas3, humin29, ferruh.yigit, konstantin.ananyev

Currently, configuring Tx offloads on the bonding device via
dev_configure does not take effect, because the related configuration is
not delivered to the slave devices in this way.

The Tx offload capability of the bonding device is the intersection of
the capabilities of all slave devices. Based on this, the following
behavior is added to the bonding driver:
1. If a Tx offload is within the capability of the bonding device (i.e.
all the slave devices support this Tx offload), the enabling status of
the offload on all slave devices depends on the configuration of the
bonding device.

2. For a Tx offload that is not within the Tx offload capability of the
bonding device, the enabling status of the offload on the slave devices
is independent of the bonding device configuration, and depends on the
original configuration of the slave devices.

Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
---
 drivers/net/bonding/rte_eth_bond_pmd.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
index 84af348..9922657 100644
--- a/drivers/net/bonding/rte_eth_bond_pmd.c
+++ b/drivers/net/bonding/rte_eth_bond_pmd.c
@@ -1712,6 +1712,8 @@ slave_configure(struct rte_eth_dev *bonded_eth_dev,
 	struct rte_flow_error flow_error;

 	struct bond_dev_private *internals = bonded_eth_dev->data->dev_private;
+	uint64_t tx_offload_cap = internals->tx_offload_capa;
+	uint64_t tx_offload;

 	/* Stop slave */
 	errval = rte_eth_dev_stop(slave_eth_dev->data->port_id);
@@ -1759,6 +1761,17 @@ slave_configure(struct rte_eth_dev *bonded_eth_dev,
 		slave_eth_dev->data->dev_conf.rxmode.offloads &=
 				~DEV_RX_OFFLOAD_JUMBO_FRAME;

+	while (tx_offload_cap != 0) {
+		tx_offload = 1ULL << __builtin_ctzll(tx_offload_cap);
+		if (bonded_eth_dev->data->dev_conf.txmode.offloads & tx_offload)
+			slave_eth_dev->data->dev_conf.txmode.offloads |=
+				tx_offload;
+		else
+			slave_eth_dev->data->dev_conf.txmode.offloads &=
+				~tx_offload;
+		tx_offload_cap &= ~tx_offload;
+	}
+
 	nb_rx_queues = bonded_eth_dev->data->nb_rx_queues;
 	nb_tx_queues = bonded_eth_dev->data->nb_tx_queues;

--
2.7.4


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [dpdk-dev] [PATCH 0/2] add Tx prepare support for bonding device
  2021-04-23  9:46 ` [dpdk-dev] [PATCH " Chengchang Tang
  2021-04-23  9:46   ` [dpdk-dev] [PATCH 1/2] net/bonding: support Tx prepare for bonding Chengchang Tang
  2021-04-23  9:46   ` [dpdk-dev] [PATCH 2/2] net/bonding: support configuring Tx offloading " Chengchang Tang
@ 2021-04-30  6:26   ` Chengchang Tang
  2021-04-30  6:47     ` Min Hu (Connor)
  2021-06-03  1:44   ` Chengchang Tang
  3 siblings, 1 reply; 31+ messages in thread
From: Chengchang Tang @ 2021-04-30  6:26 UTC (permalink / raw)
  To: dev; +Cc: linuxarm, chas3, humin29, ferruh.yigit, konstantin.ananyev

Hi all,
Any comments?

On 2021/4/23 17:46, Chengchang Tang wrote:
> This patch set add Tx prepare for bonding device.
> 
> Currently, the bonding driver has not implemented the callback of
> rte_eth_tx_prepare function. Therefore, the TX prepare function of the
> slave devices will never be invoked. When hardware offloading such as
> CKSUM and TSO are enabled for some drivers, tx_prepare needs to be used
> to adjust packets (for example, set correct pseudo packet headers).
> Otherwise, related offloading fails and even packets are sent
> incorrectly. Due to this limitation, the bonded device cannot use these
> HW offloading in the Tx direction.
> 
> Because packet sending algorithms are numerous and complex in bond PMD,
> it is hard to design the callback for rte_eth_tx_prepare. In this
> patchset, the tx_prepare callback of bonding PMD is not implemented.
> Instead, rte_eth_tx_prepare has been called in tx_burst callback. In
> this way, all tx_offloads can be processed correctly for all NIC devices.
> It is the responsibility of the slave PMDs to decide when the real
> tx_prepare needs to be used. If tx_prepare is not required in some cases,
> then slave PMDs tx_prepare pointer should be NULL and rte_eth_tx_prepare()
> will be just a NOOP. That is, the effectiveness and security of tx_prepare
> and its impact on performance depend on the design of slave PMDs.
> 
> And configuring Tx offloading for bonding is also added in this patchset.
> This solves the problem that we need to configure slave devices one by one
> when configuring Tx offloading.
> 
> Chengchang Tang (2):
>   net/bonding: support Tx prepare for bonding
>   net/bonding: support configuring Tx offloading for bonding
> 
>  drivers/net/bonding/rte_eth_bond.h     |  1 -
>  drivers/net/bonding/rte_eth_bond_pmd.c | 41 ++++++++++++++++++++++++++++++----
>  2 files changed, 37 insertions(+), 5 deletions(-)
> 
> --
> 2.7.4
> 
> 
> .
> 


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [dpdk-dev] [PATCH 0/2] add Tx prepare support for bonding device
  2021-04-30  6:26   ` [dpdk-dev] [PATCH 0/2] add Tx prepare support for bonding device Chengchang Tang
@ 2021-04-30  6:47     ` Min Hu (Connor)
  0 siblings, 0 replies; 31+ messages in thread
From: Min Hu (Connor) @ 2021-04-30  6:47 UTC (permalink / raw)
  To: Chengchang Tang, dev; +Cc: linuxarm, chas3, ferruh.yigit, konstantin.ananyev



On 2021/4/30 14:26, Chengchang Tang wrote:
> Hi,all
> Any comments?
> 
> On 2021/4/23 17:46, Chengchang Tang wrote:
>> This patch set add Tx prepare for bonding device.
>>
>> Currently, the bonding driver has not implemented the callback of
>> rte_eth_tx_prepare function. Therefore, the TX prepare function of the
>> slave devices will never be invoked. When hardware offloading such as
>> CKSUM and TSO are enabled for some drivers, tx_prepare needs to be used
>> to adjust packets (for example, set correct pseudo packet headers).
>> Otherwise, related offloading fails and even packets are sent
>> incorrectly. Due to this limitation, the bonded device cannot use these
>> HW offloading in the Tx direction.
>>
>> Because packet sending algorithms are numerous and complex in bond PMD,
>> it is hard to design the callback for rte_eth_tx_prepare. In this
>> patchset, the tx_prepare callback of bonding PMD is not implemented.
>> Instead, rte_eth_tx_prepare has been called in tx_burst callback. In
>> this way, all tx_offloads can be processed correctly for all NIC devices.
>> It is the responsibility of the slave PMDs to decide when the real
>> tx_prepare needs to be used. If tx_prepare is not required in some cases,
>> then slave PMDs tx_prepare pointer should be NULL and rte_eth_tx_prepare()
>> will be just a NOOP. That is, the effectiveness and security of tx_prepare
>> and its impact on performance depend on the design of slave PMDs.
>>
>> And configuring Tx offloading for bonding is also added in this patchset.
>> This solves the problem that we need to configure slave devices one by one
>> when configuring Tx offloading.
>>
>> Chengchang Tang (2):
>>    net/bonding: support Tx prepare for bonding
>>    net/bonding: support configuring Tx offloading for bonding
>>
>>   drivers/net/bonding/rte_eth_bond.h     |  1 -
>>   drivers/net/bonding/rte_eth_bond_pmd.c | 41 ++++++++++++++++++++++++++++++----
>>   2 files changed, 37 insertions(+), 5 deletions(-)
>>
Acked-by: Min Hu (Connor) <humin29@huawei.com>
>> --
>> 2.7.4
>>
>>
>> .
>>
> 
> .
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [dpdk-dev] [PATCH 0/2] add Tx prepare support for bonding device
  2021-04-23  9:46 ` [dpdk-dev] [PATCH " Chengchang Tang
                     ` (2 preceding siblings ...)
  2021-04-30  6:26   ` [dpdk-dev] [PATCH 0/2] add Tx prepare support for bonding device Chengchang Tang
@ 2021-06-03  1:44   ` Chengchang Tang
  3 siblings, 0 replies; 31+ messages in thread
From: Chengchang Tang @ 2021-06-03  1:44 UTC (permalink / raw)
  To: dev; +Cc: linuxarm, chas3, humin29, ferruh.yigit, konstantin.ananyev

Hi all,
Any comments on these patches?

On 2021/4/23 17:46, Chengchang Tang wrote:
> This patch set adds Tx prepare for the bonding device.
> 
> Currently, the bonding driver does not implement the rte_eth_tx_prepare
> callback. Therefore, the Tx prepare functions of the slave devices are
> never invoked. When hardware offloads such as CKSUM and TSO are enabled
> for some drivers, tx_prepare needs to be used to adjust packets (for
> example, to set correct pseudo headers). Otherwise, the related offloads
> fail and packets may even be sent incorrectly. Due to this limitation,
> the bonded device cannot use these HW offloads in the Tx direction.
> 
> Because packet sending algorithms are numerous and complex in the bond PMD,
> it is hard to design the rte_eth_tx_prepare callback. In this patchset, the
> tx_prepare callback of the bonding PMD is therefore not implemented.
> Instead, rte_eth_tx_prepare is called in the tx_burst callback. In this
> way, all Tx offloads can be processed correctly for all NIC devices. It is
> the responsibility of the slave PMDs to decide when the real tx_prepare
> needs to be used. If tx_prepare is not required in some cases, the slave
> PMD's tx_prepare pointer should be NULL and rte_eth_tx_prepare() will be
> just a no-op. That is, the effectiveness and safety of tx_prepare and its
> impact on performance depend on the design of the slave PMDs.
> 
> Configuring Tx offloads on the bonding device is also added in this
> patchset. This removes the need to configure the slave devices one by one
> when configuring Tx offloads.
> 
> Chengchang Tang (2):
>   net/bonding: support Tx prepare for bonding
>   net/bonding: support configuring Tx offloading for bonding
> 
>  drivers/net/bonding/rte_eth_bond.h     |  1 -
>  drivers/net/bonding/rte_eth_bond_pmd.c | 41 ++++++++++++++++++++++++++++++----
>  2 files changed, 37 insertions(+), 5 deletions(-)
> 
> --
> 2.7.4
> 
> 
> .
> 


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [dpdk-dev] [PATCH 1/2] net/bonding: support Tx prepare for bonding
  2021-04-23  9:46   ` [dpdk-dev] [PATCH 1/2] net/bonding: support Tx prepare for bonding Chengchang Tang
@ 2021-06-08  9:49     ` Andrew Rybchenko
  2021-06-09  6:42       ` Chengchang Tang
  0 siblings, 1 reply; 31+ messages in thread
From: Andrew Rybchenko @ 2021-06-08  9:49 UTC (permalink / raw)
  To: Chengchang Tang, dev
  Cc: linuxarm, chas3, humin29, ferruh.yigit, konstantin.ananyev

"for bonding" is redundant in the summary since it is already "net/bonding".

On 4/23/21 12:46 PM, Chengchang Tang wrote:
> To use the HW offload capabilities (e.g. checksum and TSO) in the Tx
> direction, the upper-layer users need to call rte_eth_tx_prepare to do
> some adjustment to the packets before sending them (e.g. processing
> pseudo headers when Tx checksum offload is enabled). But the tx_prepare
> callback of the bond driver is not implemented. Therefore, related
> offloads cannot be used unless the upper-layer users process the packets
> properly in their own application, which is bad for portability.
> 
> However, it is difficult to design the tx_prepare callback for the bonding
> driver, because when a bonded device sends packets, it distributes them to
> different slave devices based on the real-time link status and bonding
> mode. That is, it is very difficult for the bonding device to determine
> which slave device's prepare function should be invoked. In addition, if
> the link status changes after the packets are prepared, the packets may
> fail to be sent because the packet distribution may change.
> 
> So, in this patch, the tx_prepare callback of the bonding driver is not
> implemented. Instead, rte_eth_tx_prepare() will be called for all the
> fast-path packets in modes 0, 1, 2, 4, 5 and 6. In this way, all Tx
> offloads can be processed correctly for all NIC devices in these modes.
> If tx_prepare is not required in some cases, the slave PMD's tx_prepare
> pointer should be NULL and rte_eth_tx_prepare() will be just a no-op.
> In these cases, the impact on performance will be very limited. It is
> the responsibility of the slave PMDs to decide when the real tx_prepare
> needs to be used. The information from dev_configure/queue_setup is
> sufficient for them to make these decisions.
> 
> Note:
> The rte_eth_tx_prepare is not added to bond mode 3 (Broadcast). This is
> because in broadcast mode, a packet needs to be sent by all slave ports.
> Different PMDs process the packets differently in tx_prepare. As a result,
> the sent packet may be incorrect.
> 
> Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
> ---
>  drivers/net/bonding/rte_eth_bond.h     |  1 -
>  drivers/net/bonding/rte_eth_bond_pmd.c | 28 ++++++++++++++++++++++++----
>  2 files changed, 24 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/bonding/rte_eth_bond.h b/drivers/net/bonding/rte_eth_bond.h
> index 874aa91..1e6cc6d 100644
> --- a/drivers/net/bonding/rte_eth_bond.h
> +++ b/drivers/net/bonding/rte_eth_bond.h
> @@ -343,7 +343,6 @@ rte_eth_bond_link_up_prop_delay_set(uint16_t bonded_port_id,
>  int
>  rte_eth_bond_link_up_prop_delay_get(uint16_t bonded_port_id);
> 
> -
>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
> index 2e9cea5..84af348 100644
> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
> @@ -606,8 +606,14 @@ bond_ethdev_tx_burst_round_robin(void *queue, struct rte_mbuf **bufs,
>  	/* Send packet burst on each slave device */
>  	for (i = 0; i < num_of_slaves; i++) {
>  		if (slave_nb_pkts[i] > 0) {
> +			int nb_prep_pkts;
> +
> +			nb_prep_pkts = rte_eth_tx_prepare(slaves[i],
> +					bd_tx_q->queue_id, slave_bufs[i],
> +					slave_nb_pkts[i]);
> +

Shouldn't it be called iff queue Tx offloads are not zero?
That would decrease the performance degradation when no
Tx offloads are enabled. Same in all cases below.

>  			num_tx_slave = rte_eth_tx_burst(slaves[i], bd_tx_q->queue_id,
> -					slave_bufs[i], slave_nb_pkts[i]);
> +					slave_bufs[i], nb_prep_pkts);

In fact there is a problem here, and a really big one.
Tx prepare may fail and return fewer packets. Tx prepare
of some packet may always fail. If the application tries
to send packets in a loop until success, it will loop
forever here. Since the application calls Tx burst, it is
100% legal behaviour of the function to return 0 if the
Tx ring is full; that is not an error indication.
However, in the case of Tx prepare it is an error
indication.

Should we change Tx burst description and enforce callers
to check for rte_errno? It sounds like a major change...

[snip]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [dpdk-dev] [PATCH 2/2] net/bonding: support configuring Tx offloading for bonding
  2021-04-23  9:46   ` [dpdk-dev] [PATCH 2/2] net/bonding: support configuring Tx offloading " Chengchang Tang
@ 2021-06-08  9:49     ` Andrew Rybchenko
  2021-06-09  6:57       ` Chengchang Tang
  0 siblings, 1 reply; 31+ messages in thread
From: Andrew Rybchenko @ 2021-06-08  9:49 UTC (permalink / raw)
  To: Chengchang Tang, dev
  Cc: linuxarm, chas3, humin29, ferruh.yigit, konstantin.ananyev

"for bonding" is redundant in the summary since it is already
"net/bonding"

On 4/23/21 12:46 PM, Chengchang Tang wrote:
> Currently, the TX offloading of the bonding device will not take effect by

TX -> Tx

> using dev_configure, because the related configuration will not be
> delivered to the slave devices in this way.

I think it is a major problem that Tx offloads are actually
ignored. It should be a patch with "Fixes:" which addresses
it.

> The Tx offloading capability of the bonding device is the intersection of
> the capability of all slave devices. Based on this, the following functions
> are added to the bonding driver:
> 1. If a Tx offloading is within the capability of the bonding device (i.e.
> all the slave devices support this Tx offloading), the enabling status of
> the offloading of all slave devices depends on the configuration of the
> bonding device.
> 
> 2. For the Tx offloading that is not within the Tx offloading capability
> of the bonding device, the enabling status of the offloading on the slave
> devices is irrelevant to the bonding device configuration. And it depends
> on the original configuration of the slave devices.
> 
> Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
> ---
>  drivers/net/bonding/rte_eth_bond_pmd.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
> index 84af348..9922657 100644
> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
> @@ -1712,6 +1712,8 @@ slave_configure(struct rte_eth_dev *bonded_eth_dev,
>  	struct rte_flow_error flow_error;
> 
>  	struct bond_dev_private *internals = bonded_eth_dev->data->dev_private;
> +	uint64_t tx_offload_cap = internals->tx_offload_capa;
> +	uint64_t tx_offload;
> 
>  	/* Stop slave */
>  	errval = rte_eth_dev_stop(slave_eth_dev->data->port_id);
> @@ -1759,6 +1761,17 @@ slave_configure(struct rte_eth_dev *bonded_eth_dev,
>  		slave_eth_dev->data->dev_conf.rxmode.offloads &=
>  				~DEV_RX_OFFLOAD_JUMBO_FRAME;
> 
> +	while (tx_offload_cap != 0) {
> +		tx_offload = 1ULL << __builtin_ctzll(tx_offload_cap);
> +		if (bonded_eth_dev->data->dev_conf.txmode.offloads & tx_offload)
> +			slave_eth_dev->data->dev_conf.txmode.offloads |=
> +				tx_offload;
> +		else
> +			slave_eth_dev->data->dev_conf.txmode.offloads &=
> +				~tx_offload;
> +		tx_offload_cap &= ~tx_offload;
> +	}
> +

Frankly speaking I don't understand why it is that complicated.
ethdev rejects unsupported Tx offloads. So, can't we simply:
slave_eth_dev->data->dev_conf.txmode.offloads =
    bonded_eth_dev->data->dev_conf.txmode.offloads;


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [dpdk-dev] [PATCH 1/2] net/bonding: support Tx prepare for bonding
  2021-06-08  9:49     ` Andrew Rybchenko
@ 2021-06-09  6:42       ` Chengchang Tang
  2021-06-09  9:35         ` Andrew Rybchenko
  2021-06-09 10:25         ` Ananyev, Konstantin
  0 siblings, 2 replies; 31+ messages in thread
From: Chengchang Tang @ 2021-06-09  6:42 UTC (permalink / raw)
  To: Andrew Rybchenko, dev
  Cc: linuxarm, chas3, humin29, ferruh.yigit, konstantin.ananyev

Hi, Andrew and Ferruh

On 2021/6/8 17:49, Andrew Rybchenko wrote:
> "for bonding" is redundant in the summary since it is already "net/bonding".
> 
> On 4/23/21 12:46 PM, Chengchang Tang wrote:
>> To use the HW offload capabilities (e.g. checksum and TSO) in the Tx
>> direction, the upper-layer users need to call rte_eth_tx_prepare to do
>> some adjustment to the packets before sending them (e.g. processing
>> pseudo headers when Tx checksum offload is enabled). But the tx_prepare
>> callback of the bond driver is not implemented. Therefore, related
>> offloads cannot be used unless the upper-layer users process the packets
>> properly in their own application, which is bad for portability.
>>
>> However, it is difficult to design the tx_prepare callback for the bonding
>> driver, because when a bonded device sends packets, it distributes them to
>> different slave devices based on the real-time link status and bonding
>> mode. That is, it is very difficult for the bonding device to determine
>> which slave device's prepare function should be invoked. In addition, if
>> the link status changes after the packets are prepared, the packets may
>> fail to be sent because the packet distribution may change.
>>
>> So, in this patch, the tx_prepare callback of the bonding driver is not
>> implemented. Instead, rte_eth_tx_prepare() will be called for all the
>> fast-path packets in modes 0, 1, 2, 4, 5 and 6. In this way, all Tx
>> offloads can be processed correctly for all NIC devices in these modes.
>> If tx_prepare is not required in some cases, the slave PMD's tx_prepare
>> pointer should be NULL and rte_eth_tx_prepare() will be just a no-op.
>> In these cases, the impact on performance will be very limited. It is
>> the responsibility of the slave PMDs to decide when the real tx_prepare
>> needs to be used. The information from dev_configure/queue_setup is
>> sufficient for them to make these decisions.
>>
>> Note:
>> The rte_eth_tx_prepare is not added to bond mode 3 (Broadcast). This is
>> because in broadcast mode, a packet needs to be sent by all slave ports.
>> Different PMDs process the packets differently in tx_prepare. As a result,
>> the sent packet may be incorrect.
>>
>> Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
>> ---
>>  drivers/net/bonding/rte_eth_bond.h     |  1 -
>>  drivers/net/bonding/rte_eth_bond_pmd.c | 28 ++++++++++++++++++++++++----
>>  2 files changed, 24 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/net/bonding/rte_eth_bond.h b/drivers/net/bonding/rte_eth_bond.h
>> index 874aa91..1e6cc6d 100644
>> --- a/drivers/net/bonding/rte_eth_bond.h
>> +++ b/drivers/net/bonding/rte_eth_bond.h
>> @@ -343,7 +343,6 @@ rte_eth_bond_link_up_prop_delay_set(uint16_t bonded_port_id,
>>  int
>>  rte_eth_bond_link_up_prop_delay_get(uint16_t bonded_port_id);
>>
>> -
>>  #ifdef __cplusplus
>>  }
>>  #endif
>> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
>> index 2e9cea5..84af348 100644
>> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
>> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
>> @@ -606,8 +606,14 @@ bond_ethdev_tx_burst_round_robin(void *queue, struct rte_mbuf **bufs,
>>  	/* Send packet burst on each slave device */
>>  	for (i = 0; i < num_of_slaves; i++) {
>>  		if (slave_nb_pkts[i] > 0) {
>> +			int nb_prep_pkts;
>> +
>> +			nb_prep_pkts = rte_eth_tx_prepare(slaves[i],
>> +					bd_tx_q->queue_id, slave_bufs[i],
>> +					slave_nb_pkts[i]);
>> +
> 
> Shouldn't it be called iff queue Tx offloads are not zero?
> That would decrease the performance degradation when no
> Tx offloads are enabled. Same in all cases below.

Regarding this point, it has been discussed in the previous RFC:
https://inbox.dpdk.org/dev/47f907cf-3933-1de9-9c45-6734b912eccd@huawei.com/

According to the Tx offload status of the current device, a PMD can
determine whether tx_prepare is currently needed. If it is not needed, the
PMD can set its pkt_tx_prepare pointer to NULL, so that the actual
tx_prepare processing will be skipped directly in rte_eth_tx_prepare().

> 
>>  			num_tx_slave = rte_eth_tx_burst(slaves[i], bd_tx_q->queue_id,
>> -					slave_bufs[i], slave_nb_pkts[i]);
>> +					slave_bufs[i], nb_prep_pkts);
> 
> In fact there is a problem here, and a really big one.
> Tx prepare may fail and return fewer packets. Tx prepare
> of some packet may always fail. If the application tries
> to send packets in a loop until success, it will loop
> forever here. Since the application calls Tx burst, it is
> 100% legal behaviour of the function to return 0 if the
> Tx ring is full; that is not an error indication.
> However, in the case of Tx prepare it is an error
> indication.
> 
> Should we change Tx burst description and enforce callers
> to check for rte_errno? It sounds like a major change...
> 

I agree that if the failure is caused by the Tx ring being full, it is
legal behaviour. But what about failures caused by other reasons? At
present, it is possible for some PMDs to fail during tx_burst for other
reasons. In this case, repeated attempts to send will also fail.

I'm not sure whether all PMDs need to support the behaviour of sending
packets in a loop until it succeeds. If not, I think the current problem
can be pointed out to users by adding a description to the bonding
documentation. If it is necessary, the description of tx_burst should also
add related instructions, so that PMD developers can better understand how
tx_burst should be designed, such as putting all hardware-related
constraint checks into tx_prepare. Another prerequisite for the above
behaviour is that the packets must be prepared (i.e. checked by
rte_eth_tx_prepare()); otherwise, the send may also fail. This means that
we would have to use rte_eth_tx_prepare() in more scenarios.

What's Ferruh's opinion on this?

> [snip]
> 
> .
> 


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [dpdk-dev] [PATCH 2/2] net/bonding: support configuring Tx offloading for bonding
  2021-06-08  9:49     ` Andrew Rybchenko
@ 2021-06-09  6:57       ` Chengchang Tang
  2021-06-09  9:11         ` Ananyev, Konstantin
  0 siblings, 1 reply; 31+ messages in thread
From: Chengchang Tang @ 2021-06-09  6:57 UTC (permalink / raw)
  To: Andrew Rybchenko, dev
  Cc: linuxarm, chas3, humin29, ferruh.yigit, konstantin.ananyev



On 2021/6/8 17:49, Andrew Rybchenko wrote:
> "for bonding" is redundant in the summary since it is already
> "net/bonding"
> 
> On 4/23/21 12:46 PM, Chengchang Tang wrote:
>> Currently, the TX offloading of the bonding device will not take effect by
> 
> TX -> Tx
> 
>> using dev_configure, because the related configuration will not be
>> delivered to the slave devices in this way.
> 
> I think it is a major problem that Tx offloads are actually
> ignored. It should be a patch with "Fixes:" which addresses
> it.
> 
>> The Tx offloading capability of the bonding device is the intersection of
>> the capability of all slave devices. Based on this, the following functions
>> are added to the bonding driver:
>> 1. If a Tx offloading is within the capability of the bonding device (i.e.
>> all the slave devices support this Tx offloading), the enabling status of
>> the offloading of all slave devices depends on the configuration of the
>> bonding device.
>>
>> 2. For the Tx offloading that is not within the Tx offloading capability
>> of the bonding device, the enabling status of the offloading on the slave
>> devices is irrelevant to the bonding device configuration. And it depends
>> on the original configuration of the slave devices.
>>
>> Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
>> ---
>>  drivers/net/bonding/rte_eth_bond_pmd.c | 13 +++++++++++++
>>  1 file changed, 13 insertions(+)
>>
>> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
>> index 84af348..9922657 100644
>> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
>> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
>> @@ -1712,6 +1712,8 @@ slave_configure(struct rte_eth_dev *bonded_eth_dev,
>>  	struct rte_flow_error flow_error;
>>
>>  	struct bond_dev_private *internals = bonded_eth_dev->data->dev_private;
>> +	uint64_t tx_offload_cap = internals->tx_offload_capa;
>> +	uint64_t tx_offload;
>>
>>  	/* Stop slave */
>>  	errval = rte_eth_dev_stop(slave_eth_dev->data->port_id);
>> @@ -1759,6 +1761,17 @@ slave_configure(struct rte_eth_dev *bonded_eth_dev,
>>  		slave_eth_dev->data->dev_conf.rxmode.offloads &=
>>  				~DEV_RX_OFFLOAD_JUMBO_FRAME;
>>
>> +	while (tx_offload_cap != 0) {
>> +		tx_offload = 1ULL << __builtin_ctzll(tx_offload_cap);
>> +		if (bonded_eth_dev->data->dev_conf.txmode.offloads & tx_offload)
>> +			slave_eth_dev->data->dev_conf.txmode.offloads |=
>> +				tx_offload;
>> +		else
>> +			slave_eth_dev->data->dev_conf.txmode.offloads &=
>> +				~tx_offload;
>> +		tx_offload_cap &= ~tx_offload;
>> +	}
>> +
> 
> Frankly speaking I don't understand why it is that complicated.
> ethdev rejects unsupported Tx offloads. So, can't we simply:
> slave_eth_dev->data->dev_conf.txmode.offloads =
>     bonded_eth_dev->data->dev_conf.txmode.offloads;
> 

This complicated method is used to increase the flexibility of the slave
devices, allowing the Tx offloads of a slave device to be not fully
consistent with those of the bond device. Offloads that can be turned on
without bond-device awareness are retained in this case.

> 
> .
> 


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [dpdk-dev] [PATCH 2/2] net/bonding: support configuring Tx offloading for bonding
  2021-06-09  6:57       ` Chengchang Tang
@ 2021-06-09  9:11         ` Ananyev, Konstantin
  2021-06-09  9:37           ` Andrew Rybchenko
  0 siblings, 1 reply; 31+ messages in thread
From: Ananyev, Konstantin @ 2021-06-09  9:11 UTC (permalink / raw)
  To: Chengchang Tang, Andrew Rybchenko, dev
  Cc: linuxarm, chas3, humin29, Yigit, Ferruh


> 
> 
> On 2021/6/8 17:49, Andrew Rybchenko wrote:
> > "for bonding" is redundant in the summary since it is already
> > "net/bonding"
> >
> > On 4/23/21 12:46 PM, Chengchang Tang wrote:
> >> Currently, the TX offloading of the bonding device will not take effect by
> >
> > TX -> Tx
> >
> >> using dev_configure, because the related configuration will not be
> >> delivered to the slave devices in this way.
> >
> > I think it is a major problem that Tx offloads are actually
> > ignored. It should be a patch with "Fixes:" which addresses
> > it.
> >
> >> The Tx offloading capability of the bonding device is the intersection of
> >> the capability of all slave devices. Based on this, the following functions
> >> are added to the bonding driver:
> >> 1. If a Tx offloading is within the capability of the bonding device (i.e.
> >> all the slave devices support this Tx offloading), the enabling status of
> >> the offloading of all slave devices depends on the configuration of the
> >> bonding device.
> >>
> >> 2. For the Tx offloading that is not within the Tx offloading capability
> >> of the bonding device, the enabling status of the offloading on the slave
> >> devices is irrelevant to the bonding device configuration. And it depends
> >> on the original configuration of the slave devices.
> >>
> >> Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
> >> ---
> >>  drivers/net/bonding/rte_eth_bond_pmd.c | 13 +++++++++++++
> >>  1 file changed, 13 insertions(+)
> >>
> >> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
> >> index 84af348..9922657 100644
> >> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
> >> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
> >> @@ -1712,6 +1712,8 @@ slave_configure(struct rte_eth_dev *bonded_eth_dev,
> >>  	struct rte_flow_error flow_error;
> >>
> >>  	struct bond_dev_private *internals = bonded_eth_dev->data->dev_private;
> >> +	uint64_t tx_offload_cap = internals->tx_offload_capa;
> >> +	uint64_t tx_offload;
> >>
> >>  	/* Stop slave */
> >>  	errval = rte_eth_dev_stop(slave_eth_dev->data->port_id);
> >> @@ -1759,6 +1761,17 @@ slave_configure(struct rte_eth_dev *bonded_eth_dev,
> >>  		slave_eth_dev->data->dev_conf.rxmode.offloads &=
> >>  				~DEV_RX_OFFLOAD_JUMBO_FRAME;
> >>
> >> +	while (tx_offload_cap != 0) {
> >> +		tx_offload = 1ULL << __builtin_ctzll(tx_offload_cap);
> >> +		if (bonded_eth_dev->data->dev_conf.txmode.offloads & tx_offload)
> >> +			slave_eth_dev->data->dev_conf.txmode.offloads |=
> >> +				tx_offload;
> >> +		else
> >> +			slave_eth_dev->data->dev_conf.txmode.offloads &=
> >> +				~tx_offload;
> >> +		tx_offload_cap &= ~tx_offload;
> >> +	}
> >> +
> >
> > Frankly speaking I don't understand why it is that complicated.
> > ethdev rejects unsupported Tx offloads. So, can't we simply:
> > slave_eth_dev->data->dev_conf.txmode.offloads =
> >     bonded_eth_dev->data->dev_conf.txmode.offloads;
> >
> 
> This complicated method is used to increase the flexibility of the slave
> devices, allowing the Tx offloads of a slave device to be not fully
> consistent with those of the bond device. Offloads that can be turned on
> without bond-device awareness are retained in this case.


Not sure how that can happen...
From my understanding, the tx_offloads of a bond device have to be the
intersection of the tx_offloads of all slaves, no? Otherwise the bond
device might be misconfigured.
Anyway, for that code snippet above, wouldn't the same be achieved by:
slave_eth_dev->data->dev_conf.txmode.offloads &= internals->tx_offload_capa & bonded_eth_dev->data->dev_conf.txmode.offloads;
?
 


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [dpdk-dev] [PATCH 1/2] net/bonding: support Tx prepare for bonding
  2021-06-09  6:42       ` Chengchang Tang
@ 2021-06-09  9:35         ` Andrew Rybchenko
  2021-06-10  7:32           ` Chengchang Tang
  2021-06-09 10:25         ` Ananyev, Konstantin
  1 sibling, 1 reply; 31+ messages in thread
From: Andrew Rybchenko @ 2021-06-09  9:35 UTC (permalink / raw)
  To: Chengchang Tang, dev
  Cc: linuxarm, chas3, humin29, ferruh.yigit, konstantin.ananyev

On 6/9/21 9:42 AM, Chengchang Tang wrote:
> Hi, Andrew and Ferruh
> 
> On 2021/6/8 17:49, Andrew Rybchenko wrote:
>> "for bonding" is redundant in the summary since it is already "net/bonding".
>>
>> On 4/23/21 12:46 PM, Chengchang Tang wrote:
>>> To use the HW offload capabilities (e.g. checksum and TSO) in the Tx
>>> direction, the upper-layer users need to call rte_eth_tx_prepare to do
>>> some adjustment to the packets before sending them (e.g. processing
>>> pseudo headers when Tx checksum offload is enabled). But the tx_prepare
>>> callback of the bond driver is not implemented. Therefore, related
>>> offloads cannot be used unless the upper-layer users process the packets
>>> properly in their own application, which is bad for portability.
>>>
>>> However, it is difficult to design the tx_prepare callback for the bonding
>>> driver, because when a bonded device sends packets, it distributes them to
>>> different slave devices based on the real-time link status and bonding
>>> mode. That is, it is very difficult for the bonding device to determine
>>> which slave device's prepare function should be invoked. In addition, if
>>> the link status changes after the packets are prepared, the packets may
>>> fail to be sent because the packet distribution may change.
>>>
>>> So, in this patch, the tx_prepare callback of the bonding driver is not
>>> implemented. Instead, rte_eth_tx_prepare() will be called for all the
>>> fast-path packets in modes 0, 1, 2, 4, 5 and 6. In this way, all Tx
>>> offloads can be processed correctly for all NIC devices in these modes.
>>> If tx_prepare is not required in some cases, the slave PMD's tx_prepare
>>> pointer should be NULL and rte_eth_tx_prepare() will be just a no-op.
>>> In these cases, the impact on performance will be very limited. It is
>>> the responsibility of the slave PMDs to decide when the real tx_prepare
>>> needs to be used. The information from dev_configure/queue_setup is
>>> sufficient for them to make these decisions.
>>>
>>> Note:
>>> The rte_eth_tx_prepare is not added to bond mode 3 (Broadcast). This is
>>> because in broadcast mode, a packet needs to be sent by all slave ports.
>>> Different PMDs process the packets differently in tx_prepare. As a result,
>>> the sent packet may be incorrect.
>>>
>>> Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
>>> ---
>>>  drivers/net/bonding/rte_eth_bond.h     |  1 -
>>>  drivers/net/bonding/rte_eth_bond_pmd.c | 28 ++++++++++++++++++++++++----
>>>  2 files changed, 24 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/net/bonding/rte_eth_bond.h b/drivers/net/bonding/rte_eth_bond.h
>>> index 874aa91..1e6cc6d 100644
>>> --- a/drivers/net/bonding/rte_eth_bond.h
>>> +++ b/drivers/net/bonding/rte_eth_bond.h
>>> @@ -343,7 +343,6 @@ rte_eth_bond_link_up_prop_delay_set(uint16_t bonded_port_id,
>>>  int
>>>  rte_eth_bond_link_up_prop_delay_get(uint16_t bonded_port_id);
>>>
>>> -
>>>  #ifdef __cplusplus
>>>  }
>>>  #endif
>>> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
>>> index 2e9cea5..84af348 100644
>>> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
>>> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
>>> @@ -606,8 +606,14 @@ bond_ethdev_tx_burst_round_robin(void *queue, struct rte_mbuf **bufs,
>>>  	/* Send packet burst on each slave device */
>>>  	for (i = 0; i < num_of_slaves; i++) {
>>>  		if (slave_nb_pkts[i] > 0) {
>>> +			int nb_prep_pkts;
>>> +
>>> +			nb_prep_pkts = rte_eth_tx_prepare(slaves[i],
>>> +					bd_tx_q->queue_id, slave_bufs[i],
>>> +					slave_nb_pkts[i]);
>>> +
>>
>> Shouldn't it be called iff queue Tx offloads are not zero?
>> That would decrease the performance degradation when no
>> Tx offloads are enabled. Same in all cases below.
> 
> Regarding this point, it has been discussed in the previous RFC:
> https://inbox.dpdk.org/dev/47f907cf-3933-1de9-9c45-6734b912eccd@huawei.com/
> 
> According to the Tx offload status of the current device, a PMD can
> determine whether tx_prepare is currently needed. If it is not needed,
> the PMD can set its pkt_tx_prepare pointer to NULL, so that the actual
> tx_prepare processing will be skipped directly in rte_eth_tx_prepare().

I still think that the following is right:
No Tx offloads at all => Tx prepare is not necessary

Am I wrong?

>>
>>>  			num_tx_slave = rte_eth_tx_burst(slaves[i], bd_tx_q->queue_id,
>>> -					slave_bufs[i], slave_nb_pkts[i]);
>>> +					slave_bufs[i], nb_prep_pkts);
>>
>> In fact there is a problem here, and a really big one.
>> Tx prepare may fail and return fewer packets. Tx prepare
>> of some packet may always fail. If the application tries
>> to send packets in a loop until success, it will loop
>> forever here. Since the application calls Tx burst, it is
>> 100% legal behaviour of the function to return 0 if the
>> Tx ring is full; that is not an error indication.
>> However, in the case of Tx prepare it is an error
>> indication.
>>
>> Should we change Tx burst description and enforce callers
>> to check for rte_errno? It sounds like a major change...
>>
> 
> I agree that if the failure is caused by the Tx ring being full, it is
> legal behaviour. But what about failures caused by other reasons? At
> present, it is possible for some PMDs to fail during tx_burst for other
> reasons. In this case, repeated attempts to send will also fail.

If so, the packet should simply be dropped by Tx burst and Tx burst
should move on. If a packet cannot be transmitted, it must be dropped
(counted) and Tx burst should move to the next packet.

> I'm not sure whether all PMDs need to support the behaviour of sending
> packets in a loop until it succeeds. If not, I think the current problem
> can be pointed out to users by adding a description to the bonding
> documentation. If it is necessary, the description of tx_burst should also
> add related instructions, so that PMD developers can better understand how
> tx_burst should be designed, such as putting all hardware-related
> constraint checks into tx_prepare. Another prerequisite for the above
> behaviour is that the packets must be prepared (i.e. checked by
> rte_eth_tx_prepare()); otherwise, the send may also fail. This means that
> we would have to use rte_eth_tx_prepare() in more scenarios.

IMHO any PMD-specific behaviour is a nightmare for application
developers and must be avoided. Ideally an application should not
care whether it is running on top of tap, virtio, failsafe or
bonding: it should talk to ethdev in terms of the ethdev API, that's
it. I know that net/bonding is designed so that the application knows
about it, but IMHO the places where it requires that knowledge must
be minimized to make applications more portable across various
PMDs/HW.

I think that the only sensible solution for the above problem is
to skip a packet which prepare dislikes, count it as dropped,
and try to prepare/transmit subsequent packets.

It is an interesting effect of doing Tx prepare just before
Tx burst inside the bonding PMD: if Tx burst fails to send
something because the ring is full, a number of packets will
be processed by Tx prepare again and again. I guess it is
unavoidable.

> What's Ferruh's opinion on this?
> 
>> [snip]


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [dpdk-dev] [PATCH 2/2] net/bonding: support configuring Tx offloading for bonding
  2021-06-09  9:11         ` Ananyev, Konstantin
@ 2021-06-09  9:37           ` Andrew Rybchenko
  2021-06-10  6:29             ` Chengchang Tang
  0 siblings, 1 reply; 31+ messages in thread
From: Andrew Rybchenko @ 2021-06-09  9:37 UTC (permalink / raw)
  To: Ananyev, Konstantin, Chengchang Tang, dev
  Cc: linuxarm, chas3, humin29, Yigit, Ferruh

On 6/9/21 12:11 PM, Ananyev, Konstantin wrote:
> 
>>
>>
>> On 2021/6/8 17:49, Andrew Rybchenko wrote:
>>> "for bonding" is redundant in the summary since it is already
>>> "net/bonding"
>>>
>>> On 4/23/21 12:46 PM, Chengchang Tang wrote:
>>>> Currently, the TX offloading of the bonding device will not take effect by
>>>
>>> TX -> Tx
>>>
>>>> using dev_configure. Because the related configuration will not be
>>>> delivered to the slave devices in this way.
>>>
>>> I think it is a major problem that Tx offloads are actually
>>> ignored. It should be a patches with "Fixes:" which addresses
>>> it.
>>>
>>>> The Tx offloading capability of the bonding device is the intersection of
>>>> the capability of all slave devices. Based on this, the following functions
>>>> are added to the bonding driver:
>>>> 1. If a Tx offloading is within the capability of the bonding device (i.e.
>>>> all the slave devices support this Tx offloading), the enabling status of
>>>> the offloading of all slave devices depends on the configuration of the
>>>> bonding device.
>>>>
>>>> 2. For the Tx offloading that is not within the Tx offloading capability
>>>> of the bonding device, the enabling status of the offloading on the slave
>>>> devices is irrelevant to the bonding device configuration. And it depends
>>>> on the original configuration of the slave devices.
>>>>
>>>> Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
>>>> ---
>>>>  drivers/net/bonding/rte_eth_bond_pmd.c | 13 +++++++++++++
>>>>  1 file changed, 13 insertions(+)
>>>>
>>>> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
>>>> index 84af348..9922657 100644
>>>> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
>>>> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
>>>> @@ -1712,6 +1712,8 @@ slave_configure(struct rte_eth_dev *bonded_eth_dev,
>>>>  	struct rte_flow_error flow_error;
>>>>
>>>>  	struct bond_dev_private *internals = bonded_eth_dev->data->dev_private;
>>>> +	uint64_t tx_offload_cap = internals->tx_offload_capa;
>>>> +	uint64_t tx_offload;
>>>>
>>>>  	/* Stop slave */
>>>>  	errval = rte_eth_dev_stop(slave_eth_dev->data->port_id);
>>>> @@ -1759,6 +1761,17 @@ slave_configure(struct rte_eth_dev *bonded_eth_dev,
>>>>  		slave_eth_dev->data->dev_conf.rxmode.offloads &=
>>>>  				~DEV_RX_OFFLOAD_JUMBO_FRAME;
>>>>
>>>> +	while (tx_offload_cap != 0) {
>>>> +		tx_offload = 1ULL << __builtin_ctzll(tx_offload_cap);
>>>> +		if (bonded_eth_dev->data->dev_conf.txmode.offloads & tx_offload)
>>>> +			slave_eth_dev->data->dev_conf.txmode.offloads |=
>>>> +				tx_offload;
>>>> +		else
>>>> +			slave_eth_dev->data->dev_conf.txmode.offloads &=
>>>> +				~tx_offload;
>>>> +		tx_offload_cap &= ~tx_offload;
>>>> +	}
>>>> +
>>>
>>> Frankly speaking I don't understand why it is that complicated.
>>> ethdev rejects of unsupported Tx offloads. So, can't we simply:
>>> slave_eth_dev->data->dev_conf.txmode.offloads =
>>>     bonded_eth_dev->data->dev_conf.txmode.offloads;
>>>
>>
>> Using such a complicated method is to increase the flexibility of the slave devices,
>> allowing the Tx offloading of the slave devices to be incompletely consistent with
>> the bond device. If some offloading can be turned on without bond device awareness,
>> they can be retained in this case.
> 
> 
> Not sure how that can that happen...

+1

@Chengchang could you provide an example of how it could happen.

> From my understanding tx_offload for bond device has to be intersection of tx_offloads
> of all slaves, no? Otherwise bond device might be misconfigured.
> Anyway for that code snippet above, wouldn't the same be achived by:
> slave_eth_dev->data->dev_conf.txmode.offloads &= internals->tx_offload_capa & bonded_eth_dev->data->dev_conf.txmode.offloads;
> ?

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [dpdk-dev] [PATCH 1/2] net/bonding: support Tx prepare for bonding
  2021-06-09  6:42       ` Chengchang Tang
  2021-06-09  9:35         ` Andrew Rybchenko
@ 2021-06-09 10:25         ` Ananyev, Konstantin
  2021-06-10  6:46           ` Chengchang Tang
  1 sibling, 1 reply; 31+ messages in thread
From: Ananyev, Konstantin @ 2021-06-09 10:25 UTC (permalink / raw)
  To: Chengchang Tang, Andrew Rybchenko, dev
  Cc: linuxarm, chas3, humin29, Yigit, Ferruh

> > On 4/23/21 12:46 PM, Chengchang Tang wrote:
> >> To use the HW offloads capability (e.g. checksum and TSO) in the Tx
> >> direction, the upper-layer users need to call rte_eth_dev_prepare to do
> >> some adjustment to the packets before sending them (e.g. processing
> >> pseudo headers when Tx checksum offoad enabled). But, the tx_prepare
> >> callback of the bond driver is not implemented. Therefore, related
> >> offloads can not be used unless the upper layer users process the packet
> >> properly in their own application. But it is bad for the
> >> transplantability.
> >>
> >> However, it is difficult to design the tx_prepare callback for bonding
> >> driver. Because when a bonded device sends packets, the bonded device
> >> allocates the packets to different slave devices based on the real-time
> >> link status and bonding mode. That is, it is very difficult for the
> >> bonding device to determine which slave device's prepare function should
> >> be invoked. In addition, if the link status changes after the packets are
> >> prepared, the packets may fail to be sent because packets allocation may
> >> change.
> >>
> >> So, in this patch, the tx_prepare callback of bonding driver is not
> >> implemented. Instead, the rte_eth_dev_tx_prepare() will be called for
> >> all the fast path packet in mode 0, 1, 2, 4, 5, 6. In this way, all
> >> tx_offloads can be processed correctly for all NIC devices in these modes.
> >> If tx_prepare is not required in some cases, then slave PMDs tx_prepare
> >> pointer should be NULL and rte_eth_tx_prepare() will be just a NOOP.
> >> In these cases, the impact on performance will be very limited. It is
> >> the responsibility of the slave PMDs to decide when the real tx_prepare
> >> needs to be used. The information from dev_config/queue_setup is
> >> sufficient for them to make these decisions.
> >>
> >> Note:
> >> The rte_eth_tx_prepare is not added to bond mode 3(Broadcast). This is
> >> because in broadcast mode, a packet needs to be sent by all slave ports.
> >> Different PMDs process the packets differently in tx_prepare. As a result,
> >> the sent packet may be incorrect.
> >>
> >> Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
> >> ---
> >>  drivers/net/bonding/rte_eth_bond.h     |  1 -
> >>  drivers/net/bonding/rte_eth_bond_pmd.c | 28 ++++++++++++++++++++++++----
> >>  2 files changed, 24 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/drivers/net/bonding/rte_eth_bond.h b/drivers/net/bonding/rte_eth_bond.h
> >> index 874aa91..1e6cc6d 100644
> >> --- a/drivers/net/bonding/rte_eth_bond.h
> >> +++ b/drivers/net/bonding/rte_eth_bond.h
> >> @@ -343,7 +343,6 @@ rte_eth_bond_link_up_prop_delay_set(uint16_t bonded_port_id,
> >>  int
> >>  rte_eth_bond_link_up_prop_delay_get(uint16_t bonded_port_id);
> >>
> >> -
> >>  #ifdef __cplusplus
> >>  }
> >>  #endif
> >> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
> >> index 2e9cea5..84af348 100644
> >> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
> >> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
> >> @@ -606,8 +606,14 @@ bond_ethdev_tx_burst_round_robin(void *queue, struct rte_mbuf **bufs,
> >>  	/* Send packet burst on each slave device */
> >>  	for (i = 0; i < num_of_slaves; i++) {
> >>  		if (slave_nb_pkts[i] > 0) {
> >> +			int nb_prep_pkts;
> >> +
> >> +			nb_prep_pkts = rte_eth_tx_prepare(slaves[i],
> >> +					bd_tx_q->queue_id, slave_bufs[i],
> >> +					slave_nb_pkts[i]);
> >> +
> >
> > Shouldn't it be called iff queue Tx offloads are not zero?
> > It will allow to decrease performance degradation if no
> > Tx offloads are enabled. Same in all cases below.
> 
> Regarding this point, it has been discussed in the previous RFC:
> https://inbox.dpdk.org/dev/47f907cf-3933-1de9-9c45-6734b912eccd@huawei.com/
> 
> According to the TX_OFFLOAD status of the current device, PMDs can determine
> whether tx_prepare is currently needed. If it is not needed, set pkt_tx_prepare
> to NULL, so that the actual tx_prepare processing will be skipped directly in
> rte_eth_tx_prepare().
> 
> >
> >>  			num_tx_slave = rte_eth_tx_burst(slaves[i], bd_tx_q->queue_id,
> >> -					slave_bufs[i], slave_nb_pkts[i]);
> >> +					slave_bufs[i], nb_prep_pkts);
> >
> > In fact it is a problem here and really big problems.
> > Tx prepare may fail and return less packets. Tx prepare
> > of some packet may always fail. If application tries to
> > send packets in a loop until success, it will be a
> > forever loop here. Since application calls Tx burst,
> > it is 100% legal behaviour of the function to return 0
> > if Tx ring is full. It is not an error indication.
> > However, in the case of Tx prepare it is an error
> > indication.

Yes, that sounds like a problem and existing apps might be affected.

> >
> > Should we change Tx burst description and enforce callers
> > to check for rte_errno? It sounds like a major change...
> >

Agree, rte_errno for tx_burst() is probably the simplest and sanest way,
but yes, it is a change in behaviour and apps will need to be updated.
Another option for the bond PMD: just silently free mbufs for which prepare()
fails (and probably update some stats counter).
Again it is a change in behaviour, but now just for one PMD, with Tx offloads enabled.
Also, as I can see, some tx_burst() functions for that PMD already free packets silently:
bond_ethdev_tx_burst_alb() and bond_ethdev_tx_burst_broadcast().

Actually, another question: why does the patch add tx_prepare() only to some
Tx modes but not all?
Is that intended?

> 
> I agree that if the failure is caused by Tx ring full, it is a legal behaviour.
> But what about the failure caused by other reasons? At present, it is possible
> for some PMDs to fail during tx_burst due to other reasons. In this case,
> repeated tries to send will also fail.
> 
> I'm not sure if all PMDs need to support the behavior of sending packets in a
> loop until it succeeds. If not, I think the current problem can be reminded to
> the user by adding a description to the bonding. If it is necessary, I think the
> description of tx_burst should also add related instructions, so that the developers
> of PMDs can better understand how tx_burst should be designed, such as putting all
> hardware-related constraint checks into tx_prepare. And another prerequisite for
> the above behavior is that the packets must be prepared (i.e. checked by
> rte_eth_tx_prepare()). Otherwise, it may also fail to send. This means that we have
> to use rte_eth_tx_prepare() in more scenarios.
> 
> What's Ferruh's opinion on this?
> 
> > [snip]
> >
> > .
> >


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [dpdk-dev] [PATCH 2/2] net/bonding: support configuring Tx offloading for bonding
  2021-06-09  9:37           ` Andrew Rybchenko
@ 2021-06-10  6:29             ` Chengchang Tang
  2021-06-14 11:05               ` Ananyev, Konstantin
  0 siblings, 1 reply; 31+ messages in thread
From: Chengchang Tang @ 2021-06-10  6:29 UTC (permalink / raw)
  To: Andrew Rybchenko, Ananyev, Konstantin, dev
  Cc: linuxarm, chas3, humin29, Yigit, Ferruh

Hi, Andrew and Ananyev

On 2021/6/9 17:37, Andrew Rybchenko wrote:
> On 6/9/21 12:11 PM, Ananyev, Konstantin wrote:
>>
>>>
>>>
>>> On 2021/6/8 17:49, Andrew Rybchenko wrote:
>>>> "for bonding" is redundant in the summary since it is already
>>>> "net/bonding"
>>>>
>>>> On 4/23/21 12:46 PM, Chengchang Tang wrote:
>>>>> Currently, the TX offloading of the bonding device will not take effect by
>>>>
>>>> TX -> Tx
>>>>
>>>>> using dev_configure. Because the related configuration will not be
>>>>> delivered to the slave devices in this way.
>>>>
>>>> I think it is a major problem that Tx offloads are actually
>>>> ignored. It should be a patches with "Fixes:" which addresses
>>>> it.
>>>>
>>>>> The Tx offloading capability of the bonding device is the intersection of
>>>>> the capability of all slave devices. Based on this, the following functions
>>>>> are added to the bonding driver:
>>>>> 1. If a Tx offloading is within the capability of the bonding device (i.e.
>>>>> all the slave devices support this Tx offloading), the enabling status of
>>>>> the offloading of all slave devices depends on the configuration of the
>>>>> bonding device.
>>>>>
>>>>> 2. For the Tx offloading that is not within the Tx offloading capability
>>>>> of the bonding device, the enabling status of the offloading on the slave
>>>>> devices is irrelevant to the bonding device configuration. And it depends
>>>>> on the original configuration of the slave devices.
>>>>>
>>>>> Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
>>>>> ---
>>>>>  drivers/net/bonding/rte_eth_bond_pmd.c | 13 +++++++++++++
>>>>>  1 file changed, 13 insertions(+)
>>>>>
>>>>> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
>>>>> index 84af348..9922657 100644
>>>>> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
>>>>> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
>>>>> @@ -1712,6 +1712,8 @@ slave_configure(struct rte_eth_dev *bonded_eth_dev,
>>>>>  	struct rte_flow_error flow_error;
>>>>>
>>>>>  	struct bond_dev_private *internals = bonded_eth_dev->data->dev_private;
>>>>> +	uint64_t tx_offload_cap = internals->tx_offload_capa;
>>>>> +	uint64_t tx_offload;
>>>>>
>>>>>  	/* Stop slave */
>>>>>  	errval = rte_eth_dev_stop(slave_eth_dev->data->port_id);
>>>>> @@ -1759,6 +1761,17 @@ slave_configure(struct rte_eth_dev *bonded_eth_dev,
>>>>>  		slave_eth_dev->data->dev_conf.rxmode.offloads &=
>>>>>  				~DEV_RX_OFFLOAD_JUMBO_FRAME;
>>>>>
>>>>> +	while (tx_offload_cap != 0) {
>>>>> +		tx_offload = 1ULL << __builtin_ctzll(tx_offload_cap);
>>>>> +		if (bonded_eth_dev->data->dev_conf.txmode.offloads & tx_offload)
>>>>> +			slave_eth_dev->data->dev_conf.txmode.offloads |=
>>>>> +				tx_offload;
>>>>> +		else
>>>>> +			slave_eth_dev->data->dev_conf.txmode.offloads &=
>>>>> +				~tx_offload;
>>>>> +		tx_offload_cap &= ~tx_offload;
>>>>> +	}
>>>>> +
>>>>
>>>> Frankly speaking I don't understand why it is that complicated.
>>>> ethdev rejects of unsupported Tx offloads. So, can't we simply:
>>>> slave_eth_dev->data->dev_conf.txmode.offloads =
>>>>     bonded_eth_dev->data->dev_conf.txmode.offloads;
>>>>
>>>
>>> Using such a complicated method is to increase the flexibility of the slave devices,
>>> allowing the Tx offloading of the slave devices to be incompletely consistent with
>>> the bond device. If some offloading can be turned on without bond device awareness,
>>> they can be retained in this case.
>>
>>
>> Not sure how that can that happen...
> 
> +1
> 
> @Chengchang could you provide an example how it could happen.
> 

For example:
device 1 capability: VLAN_INSERT | MBUF_FAST_FREE
device 2 capability: VLAN_INSERT
The capability of the bonded device will then be VLAN_INSERT only.
So we can only set VLAN_INSERT on the bonded device. But what if we want to enable
MBUF_FAST_FREE on device 1 to improve performance? As long as the application can
guarantee the condition mbuf ref_cnt == 1, it will run normally with
MBUF_FAST_FREE turned on.

In my logic, if device 1 has been configured with MBUF_FAST_FREE and is then
added to the bonded device as a slave, MBUF_FAST_FREE will be preserved.
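This merge policy can be expressed as a single bitmask operation: within the bonded device's capability (the intersection of all slaves' capabilities) the slave follows the bond configuration, and outside it the slave keeps its own setting. A hypothetical sketch, with illustrative flag values rather than the real DEV_TX_OFFLOAD_* macros:

```c
#include <stdint.h>

/* Illustrative flags, not the real DEV_TX_OFFLOAD_* values. */
#define OFFLOAD_VLAN_INSERT    (1ULL << 0)
#define OFFLOAD_MBUF_FAST_FREE (1ULL << 1)

/* Bits inside bond_capa come from the bond configuration;
 * bits outside bond_capa are preserved from the slave's own config. */
static uint64_t
merge_tx_offloads(uint64_t slave_conf, uint64_t bond_conf, uint64_t bond_capa)
{
	return (slave_conf & ~bond_capa) | (bond_conf & bond_capa);
}
```

In the scenario above (device 1 configured with MBUF_FAST_FREE, bond capability and configuration both VLAN_INSERT) the slave ends up with VLAN_INSERT | MBUF_FAST_FREE, i.e. MBUF_FAST_FREE survives, which the simple assignment or the plain `&=` intersection would discard.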

>> From my understanding tx_offload for bond device has to be intersection of tx_offloads
>> of all slaves, no? Otherwise bond device might be misconfigured.
>> Anyway for that code snippet above, wouldn't the same be achived by:
>> slave_eth_dev->data->dev_conf.txmode.offloads &= internals->tx_offload_capa & bonded_eth_dev->data->dev_conf.txmode.offloads;
>> ?
> 

I think that would not achieve my purpose in the scenario mentioned above.

> .
> 


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [dpdk-dev] [PATCH 1/2] net/bonding: support Tx prepare for bonding
  2021-06-09 10:25         ` Ananyev, Konstantin
@ 2021-06-10  6:46           ` Chengchang Tang
  2021-06-14 11:36             ` Ananyev, Konstantin
  0 siblings, 1 reply; 31+ messages in thread
From: Chengchang Tang @ 2021-06-10  6:46 UTC (permalink / raw)
  To: Ananyev, Konstantin, Andrew Rybchenko, dev
  Cc: linuxarm, chas3, humin29, Yigit, Ferruh



On 2021/6/9 18:25, Ananyev, Konstantin wrote:
>>> On 4/23/21 12:46 PM, Chengchang Tang wrote:
>>>> To use the HW offloads capability (e.g. checksum and TSO) in the Tx
>>>> direction, the upper-layer users need to call rte_eth_dev_prepare to do
>>>> some adjustment to the packets before sending them (e.g. processing
>>>> pseudo headers when Tx checksum offoad enabled). But, the tx_prepare
>>>> callback of the bond driver is not implemented. Therefore, related
>>>> offloads can not be used unless the upper layer users process the packet
>>>> properly in their own application. But it is bad for the
>>>> transplantability.
>>>>
>>>> However, it is difficult to design the tx_prepare callback for bonding
>>>> driver. Because when a bonded device sends packets, the bonded device
>>>> allocates the packets to different slave devices based on the real-time
>>>> link status and bonding mode. That is, it is very difficult for the
>>>> bonding device to determine which slave device's prepare function should
>>>> be invoked. In addition, if the link status changes after the packets are
>>>> prepared, the packets may fail to be sent because packets allocation may
>>>> change.
>>>>
>>>> So, in this patch, the tx_prepare callback of bonding driver is not
>>>> implemented. Instead, the rte_eth_dev_tx_prepare() will be called for
>>>> all the fast path packet in mode 0, 1, 2, 4, 5, 6. In this way, all
>>>> tx_offloads can be processed correctly for all NIC devices in these modes.
>>>> If tx_prepare is not required in some cases, then slave PMDs tx_prepare
>>>> pointer should be NULL and rte_eth_tx_prepare() will be just a NOOP.
>>>> In these cases, the impact on performance will be very limited. It is
>>>> the responsibility of the slave PMDs to decide when the real tx_prepare
>>>> needs to be used. The information from dev_config/queue_setup is
>>>> sufficient for them to make these decisions.
>>>>
>>>> Note:
>>>> The rte_eth_tx_prepare is not added to bond mode 3(Broadcast). This is
>>>> because in broadcast mode, a packet needs to be sent by all slave ports.
>>>> Different PMDs process the packets differently in tx_prepare. As a result,
>>>> the sent packet may be incorrect.
>>>>
>>>> Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
>>>> ---
>>>>  drivers/net/bonding/rte_eth_bond.h     |  1 -
>>>>  drivers/net/bonding/rte_eth_bond_pmd.c | 28 ++++++++++++++++++++++++----
>>>>  2 files changed, 24 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/drivers/net/bonding/rte_eth_bond.h b/drivers/net/bonding/rte_eth_bond.h
>>>> index 874aa91..1e6cc6d 100644
>>>> --- a/drivers/net/bonding/rte_eth_bond.h
>>>> +++ b/drivers/net/bonding/rte_eth_bond.h
>>>> @@ -343,7 +343,6 @@ rte_eth_bond_link_up_prop_delay_set(uint16_t bonded_port_id,
>>>>  int
>>>>  rte_eth_bond_link_up_prop_delay_get(uint16_t bonded_port_id);
>>>>
>>>> -
>>>>  #ifdef __cplusplus
>>>>  }
>>>>  #endif
>>>> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
>>>> index 2e9cea5..84af348 100644
>>>> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
>>>> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
>>>> @@ -606,8 +606,14 @@ bond_ethdev_tx_burst_round_robin(void *queue, struct rte_mbuf **bufs,
>>>>  	/* Send packet burst on each slave device */
>>>>  	for (i = 0; i < num_of_slaves; i++) {
>>>>  		if (slave_nb_pkts[i] > 0) {
>>>> +			int nb_prep_pkts;
>>>> +
>>>> +			nb_prep_pkts = rte_eth_tx_prepare(slaves[i],
>>>> +					bd_tx_q->queue_id, slave_bufs[i],
>>>> +					slave_nb_pkts[i]);
>>>> +
>>>
>>> Shouldn't it be called iff queue Tx offloads are not zero?
>>> It will allow to decrease performance degradation if no
>>> Tx offloads are enabled. Same in all cases below.
>>
>> Regarding this point, it has been discussed in the previous RFC:
>> https://inbox.dpdk.org/dev/47f907cf-3933-1de9-9c45-6734b912eccd@huawei.com/
>>
>> According to the TX_OFFLOAD status of the current device, PMDs can determine
>> whether tx_prepare is currently needed. If it is not needed, set pkt_tx_prepare
>> to NULL, so that the actual tx_prepare processing will be skipped directly in
>> rte_eth_tx_prepare().
>>
>>>
>>>>  			num_tx_slave = rte_eth_tx_burst(slaves[i], bd_tx_q->queue_id,
>>>> -					slave_bufs[i], slave_nb_pkts[i]);
>>>> +					slave_bufs[i], nb_prep_pkts);
>>>
>>> In fact it is a problem here and really big problems.
>>> Tx prepare may fail and return less packets. Tx prepare
>>> of some packet may always fail. If application tries to
>>> send packets in a loop until success, it will be a
>>> forever loop here. Since application calls Tx burst,
>>> it is 100% legal behaviour of the function to return 0
>>> if Tx ring is full. It is not an error indication.
>>> However, in the case of Tx prepare it is an error
>>> indication.
> 
> Yes, that sounds like a problem and existing apps might be affected.
> 
>>>
>>> Should we change Tx burst description and enforce callers
>>> to check for rte_errno? It sounds like a major change...
>>>
> 
> Agree, rte_errno for tx_burst() is probably a simplest and sanest way,
> but yes, it is a change in behaviour and apps will need to be updated.  
> Another option for bond PMD - just silently free mbufs for which prepare()
> fails (and probably update some stats counter).
> Again it is a change in behaviour, but now just for one PMD, with tx offloads enabled.
> Also as, I can see some tx_burst() function for that PMD already free packets silently:
> bond_ethdev_tx_burst_alb(), bond_ethdev_tx_burst_broadcast().
> 
> Actually another question - why the patch adds tx_prepare() only to some
> TX modes but not all?
> Is that intended? 
> 

Yes. Currently, I have no idea how to perform tx_prepare() in broadcast mode with limited
impact on performance. In broadcast mode, the same packets are sent on several devices:
we only increase the ref_cnt of the mbufs, without copying the packets. Since
tx_prepare() may change the packet data, performing tx_prepare() several times on the
same packet may cause problems.
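The hazard can be shown with a toy model: the same buffer (ref_cnt bumped, data not copied) is "sent" on every slave, and if each slave's prepare step mutates the packet, the mutation is applied repeatedly to the one shared buffer. Field names here are illustrative, not rte_mbuf's:

```c
#include <stdint.h>

/* Toy packet: a refcount and one field standing in for packet data. */
struct pkt { uint16_t refcnt; uint16_t cksum; };

/* Stand-in for a tx_prepare that rewrites e.g. a pseudo-header checksum;
 * it is not idempotent, which is what makes repeated calls unsafe. */
static void fake_tx_prepare(struct pkt *p)
{
	p->cksum += 1;
}

/* Broadcast mode: bump refcnt instead of copying, send on each slave.
 * BUG on purpose: prepare runs nb_slaves times on the one shared buffer. */
static void broadcast(struct pkt *p, int nb_slaves)
{
	p->refcnt += (uint16_t)(nb_slaves - 1);  /* no data copy */
	for (int i = 0; i < nb_slaves; i++)
		fake_tx_prepare(p);
}
```

With three slaves the "checksum" is rewritten three times instead of once, so the packet sent on every port is corrupted, which is why the patch leaves mode 3 out.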

>>
>> I agree that if the failure is caused by Tx ring full, it is a legal behaviour.
>> But what about the failure caused by other reasons? At present, it is possible
>> for some PMDs to fail during tx_burst due to other reasons. In this case,
>> repeated tries to send will also fail.
>>
>> I'm not sure if all PMDs need to support the behavior of sending packets in a
>> loop until it succeeds. If not, I think the current problem can be reminded to
>> the user by adding a description to the bonding. If it is necessary, I think the
>> description of tx_burst should also add related instructions, so that the developers
>> of PMDs can better understand how tx_burst should be designed, such as putting all
>> hardware-related constraint checks into tx_prepare. And another prerequisite for
>> the above behavior is that the packets must be prepared (i.e. checked by
>> rte_eth_tx_prepare()). Otherwise, it may also fail to send. This means that we have
>> to use rte_eth_tx_prepare() in more scenarios.
>>
>> What's Ferruh's opinion on this?
>>
>>> [snip]
>>>
>>> .
>>>
> 


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [dpdk-dev] [PATCH 1/2] net/bonding: support Tx prepare for bonding
  2021-06-09  9:35         ` Andrew Rybchenko
@ 2021-06-10  7:32           ` Chengchang Tang
  2021-06-14 14:16             ` Andrew Rybchenko
  0 siblings, 1 reply; 31+ messages in thread
From: Chengchang Tang @ 2021-06-10  7:32 UTC (permalink / raw)
  To: Andrew Rybchenko, dev
  Cc: linuxarm, chas3, humin29, ferruh.yigit, konstantin.ananyev



On 2021/6/9 17:35, Andrew Rybchenko wrote:
> On 6/9/21 9:42 AM, Chengchang Tang wrote:
>> Hi, Andrew and Ferruh
>>
>> On 2021/6/8 17:49, Andrew Rybchenko wrote:
>>> "for bonding" is redundant in the summary since it is already "net/bonding".
>>>
>>> On 4/23/21 12:46 PM, Chengchang Tang wrote:
>>>> To use the HW offloads capability (e.g. checksum and TSO) in the Tx
>>>> direction, the upper-layer users need to call rte_eth_dev_prepare to do
>>>> some adjustment to the packets before sending them (e.g. processing
>>>> pseudo headers when Tx checksum offoad enabled). But, the tx_prepare
>>>> callback of the bond driver is not implemented. Therefore, related
>>>> offloads can not be used unless the upper layer users process the packet
>>>> properly in their own application. But it is bad for the
>>>> transplantability.
>>>>
>>>> However, it is difficult to design the tx_prepare callback for bonding
>>>> driver. Because when a bonded device sends packets, the bonded device
>>>> allocates the packets to different slave devices based on the real-time
>>>> link status and bonding mode. That is, it is very difficult for the
>>>> bonding device to determine which slave device's prepare function should
>>>> be invoked. In addition, if the link status changes after the packets are
>>>> prepared, the packets may fail to be sent because packets allocation may
>>>> change.
>>>>
>>>> So, in this patch, the tx_prepare callback of bonding driver is not
>>>> implemented. Instead, the rte_eth_dev_tx_prepare() will be called for
>>>> all the fast path packet in mode 0, 1, 2, 4, 5, 6. In this way, all
>>>> tx_offloads can be processed correctly for all NIC devices in these modes.
>>>> If tx_prepare is not required in some cases, then slave PMDs tx_prepare
>>>> pointer should be NULL and rte_eth_tx_prepare() will be just a NOOP.
>>>> In these cases, the impact on performance will be very limited. It is
>>>> the responsibility of the slave PMDs to decide when the real tx_prepare
>>>> needs to be used. The information from dev_config/queue_setup is
>>>> sufficient for them to make these decisions.
>>>>
>>>> Note:
>>>> The rte_eth_tx_prepare is not added to bond mode 3(Broadcast). This is
>>>> because in broadcast mode, a packet needs to be sent by all slave ports.
>>>> Different PMDs process the packets differently in tx_prepare. As a result,
>>>> the sent packet may be incorrect.
>>>>
>>>> Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
>>>> ---
>>>>  drivers/net/bonding/rte_eth_bond.h     |  1 -
>>>>  drivers/net/bonding/rte_eth_bond_pmd.c | 28 ++++++++++++++++++++++++----
>>>>  2 files changed, 24 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/drivers/net/bonding/rte_eth_bond.h b/drivers/net/bonding/rte_eth_bond.h
>>>> index 874aa91..1e6cc6d 100644
>>>> --- a/drivers/net/bonding/rte_eth_bond.h
>>>> +++ b/drivers/net/bonding/rte_eth_bond.h
>>>> @@ -343,7 +343,6 @@ rte_eth_bond_link_up_prop_delay_set(uint16_t bonded_port_id,
>>>>  int
>>>>  rte_eth_bond_link_up_prop_delay_get(uint16_t bonded_port_id);
>>>>
>>>> -
>>>>  #ifdef __cplusplus
>>>>  }
>>>>  #endif
>>>> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
>>>> index 2e9cea5..84af348 100644
>>>> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
>>>> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
>>>> @@ -606,8 +606,14 @@ bond_ethdev_tx_burst_round_robin(void *queue, struct rte_mbuf **bufs,
>>>>  	/* Send packet burst on each slave device */
>>>>  	for (i = 0; i < num_of_slaves; i++) {
>>>>  		if (slave_nb_pkts[i] > 0) {
>>>> +			int nb_prep_pkts;
>>>> +
>>>> +			nb_prep_pkts = rte_eth_tx_prepare(slaves[i],
>>>> +					bd_tx_q->queue_id, slave_bufs[i],
>>>> +					slave_nb_pkts[i]);
>>>> +
>>>
>>> Shouldn't it be called iff queue Tx offloads are not zero?
>>> It will allow to decrease performance degradation if no
>>> Tx offloads are enabled. Same in all cases below.
>>
>> Regarding this point, it has been discussed in the previous RFC:
>> https://inbox.dpdk.org/dev/47f907cf-3933-1de9-9c45-6734b912eccd@huawei.com/
>>
>> According to the TX_OFFLOAD status of the current device, PMDs can determine
>> whether tx_prepare is currently needed. If it is not needed, set pkt_tx_prepare
>> to NULL, so that the actual tx_prepare processing will be skipped directly in
>> rte_eth_tx_prepare().
> 
> I still think that the following is right:
> No Tx offloads at all => Tx prepare is not necessary
> 
> Am I wrong?
> 

Letting PMDs determine whether tx_prepare() needs to be done can reduce the performance
loss in more scenarios. For example, some offloads do not need a Tx prepare, and PMDs
can set tx_prepare to NULL in that case. Even if rte_eth_tx_prepare() is called, it will
skip the tx_prepare callback and return immediately, so there is only one branch. If we
additionally check whether tx_offloads is non-zero, one more branch is added.

Of course, some PMDs currently do not optimize tx_prepare, which may have a performance
impact. We hope to force them to optimize tx_prepare in this way, just as they optimize
tx_burst. This makes tx_prepare() easier for users to call, since they no longer need to
worry that using it introduces unnecessary performance degradation.

IMHO tx_prepare() should be usable in all scenarios, with its performance impact
optimized by the PMDs. Leaving the application to decide when it should and should not
be used means that, in many cases, it will not be used and problems will be introduced.
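The fast path being argued for can be sketched as follows. This is a simplified model, not the exact rte_ethdev internals: a PMD that decides (from dev_configure/queue_setup) that no per-packet fixup is needed leaves its prepare callback NULL, and the wrapper degrades to a single branch returning nb_pkts.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified device with only the prepare callback slot. */
struct dev {
	uint16_t (*tx_pkt_prepare)(void **pkts, uint16_t nb);
};

/* Model of the rte_eth_tx_prepare() wrapper: with a NULL callback the
 * call is a near no-op, so no extra tx_offloads check is needed in the
 * caller (e.g. the bonding tx_burst). */
static uint16_t
tx_prepare(struct dev *d, void **pkts, uint16_t nb)
{
	if (d->tx_pkt_prepare == NULL)
		return nb;  /* nothing to fix up: pass every packet through */
	return d->tx_pkt_prepare(pkts, nb);
}
```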

>>>
>>>>  			num_tx_slave = rte_eth_tx_burst(slaves[i], bd_tx_q->queue_id,
>>>> -					slave_bufs[i], slave_nb_pkts[i]);
>>>> +					slave_bufs[i], nb_prep_pkts);
>>>
>>> In fact it is a problem here and really big problems.
>>> Tx prepare may fail and return less packets. Tx prepare
>>> of some packet may always fail. If application tries to
>>> send packets in a loop until success, it will be a
>>> forever loop here. Since application calls Tx burst,
>>> it is 100% legal behaviour of the function to return 0
>>> if Tx ring is full. It is not an error indication.
>>> However, in the case of Tx prepare it is an error
>>> indication.
>>>
>>> Should we change Tx burst description and enforce callers
>>> to check for rte_errno? It sounds like a major change...
>>>
>>
>> I agree that if the failure is caused by Tx ring full, it is a legal behaviour.
>> But what about the failure caused by other reasons? At present, it is possible
>> for some PMDs to fail during tx_burst due to other reasons. In this case,
>> repeated tries to send will also fail.
> 
> If so, packet should be simply dropped by Tx burst and Tx burst
> should move on. If a packet cannot be transmitted, it must be
> dropped (counted) and Tx burst should move to the next packet.
> 
>> I'm not sure if all PMDs need to support the behavior of sending packets in a
>> loop until it succeeds. If not, I think the current problem can be reminded to
>> the user by adding a description to the bonding. If it is necessary, I think the
>> description of tx_burst should also add related instructions, so that the developers
>> of PMDs can better understand how tx_burst should be designed, such as putting all
>> hardware-related constraint checks into tx_prepare. And another prerequisite for
>> the above behavior is that the packets must be prepared (i.e. checked by
>> rte_eth_tx_prepare()). Otherwise, it may also fail to send. This means that we have
>> to use rte_eth_tx_prepare() in more scenarios.
> 
> IMHO any PMD specific behaviour is a nightmare to application
> developer and must be avoided. Ideally application should not
> care if it is running on top of tap, virtio, failsafe or
> bonding. It should talk to ethdev in terms of ethdev API that's
> it. I know that net/bonding is designed that application should
> know about it, but IMHO the places where it requires the
> knowledge must be minimized to make applications more portable
> across various PMDs/HW.
> 
> I think that the only sensible solution for above problem is
> to skip a packet which prepare dislikes. count it as dropped
> and try to prepare/transmit subsequent packets.

Agree, I will fix this in the next version.

> 
> It is an interesting effect of the Tx prepare just before
> Tx burst inside bonding PMD. If Tx burst fails to send
> something because ring is full, a number of packets will
> be processed by Tx prepare again and again. I guess it is
> unavoidable.
> 
>> What's Ferruh's opinion on this?
>>
>>> [snip]
> 
> 
> 



* Re: [dpdk-dev] [PATCH 2/2] net/bonding: support configuring Tx offloading for bonding
  2021-06-10  6:29             ` Chengchang Tang
@ 2021-06-14 11:05               ` Ananyev, Konstantin
  2021-06-14 14:13                 ` Andrew Rybchenko
  0 siblings, 1 reply; 31+ messages in thread
From: Ananyev, Konstantin @ 2021-06-14 11:05 UTC (permalink / raw)
  To: Chengchang Tang, Andrew Rybchenko, dev
  Cc: linuxarm, chas3, humin29, Yigit, Ferruh



> Hi, Andrew and Ananyev
> 
> On 2021/6/9 17:37, Andrew Rybchenko wrote:
> > On 6/9/21 12:11 PM, Ananyev, Konstantin wrote:
> >>
> >>>
> >>>
> >>> On 2021/6/8 17:49, Andrew Rybchenko wrote:
> >>>> "for bonding" is redundant in the summary since it is already
> >>>> "net/bonding"
> >>>>
> >>>> On 4/23/21 12:46 PM, Chengchang Tang wrote:
> >>>>> Currently, the TX offloading of the bonding device will not take effect by
> >>>>
> >>>> TX -> Tx
> >>>>
> >>>>> using dev_configure. Because the related configuration will not be
> >>>>> delivered to the slave devices in this way.
> >>>>
> >>>> I think it is a major problem that Tx offloads are actually
> >>>> ignored. It should be a patches with "Fixes:" which addresses
> >>>> it.
> >>>>
> >>>>> The Tx offloading capability of the bonding device is the intersection of
> >>>>> the capability of all slave devices. Based on this, the following functions
> >>>>> are added to the bonding driver:
> >>>>> 1. If a Tx offloading is within the capability of the bonding device (i.e.
> >>>>> all the slave devices support this Tx offloading), the enabling status of
> >>>>> the offloading of all slave devices depends on the configuration of the
> >>>>> bonding device.
> >>>>>
> >>>>> 2. For the Tx offloading that is not within the Tx offloading capability
> >>>>> of the bonding device, the enabling status of the offloading on the slave
> >>>>> devices is irrelevant to the bonding device configuration. And it depends
> >>>>> on the original configuration of the slave devices.
> >>>>>
> >>>>> Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
> >>>>> ---
> >>>>>  drivers/net/bonding/rte_eth_bond_pmd.c | 13 +++++++++++++
> >>>>>  1 file changed, 13 insertions(+)
> >>>>>
> >>>>> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
> >>>>> index 84af348..9922657 100644
> >>>>> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
> >>>>> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
> >>>>> @@ -1712,6 +1712,8 @@ slave_configure(struct rte_eth_dev *bonded_eth_dev,
> >>>>>  	struct rte_flow_error flow_error;
> >>>>>
> >>>>>  	struct bond_dev_private *internals = bonded_eth_dev->data->dev_private;
> >>>>> +	uint64_t tx_offload_cap = internals->tx_offload_capa;
> >>>>> +	uint64_t tx_offload;
> >>>>>
> >>>>>  	/* Stop slave */
> >>>>>  	errval = rte_eth_dev_stop(slave_eth_dev->data->port_id);
> >>>>> @@ -1759,6 +1761,17 @@ slave_configure(struct rte_eth_dev *bonded_eth_dev,
> >>>>>  		slave_eth_dev->data->dev_conf.rxmode.offloads &=
> >>>>>  				~DEV_RX_OFFLOAD_JUMBO_FRAME;
> >>>>>
> >>>>> +	while (tx_offload_cap != 0) {
> >>>>> +		tx_offload = 1ULL << __builtin_ctzll(tx_offload_cap);
> >>>>> +		if (bonded_eth_dev->data->dev_conf.txmode.offloads & tx_offload)
> >>>>> +			slave_eth_dev->data->dev_conf.txmode.offloads |=
> >>>>> +				tx_offload;
> >>>>> +		else
> >>>>> +			slave_eth_dev->data->dev_conf.txmode.offloads &=
> >>>>> +				~tx_offload;
> >>>>> +		tx_offload_cap &= ~tx_offload;
> >>>>> +	}
> >>>>> +
> >>>>
> >>>> Frankly speaking I don't understand why it is that complicated.
> >>>> ethdev rejects of unsupported Tx offloads. So, can't we simply:
> >>>> slave_eth_dev->data->dev_conf.txmode.offloads =
> >>>>     bonded_eth_dev->data->dev_conf.txmode.offloads;
> >>>>
> >>>
> >>> The point of such a complicated method is to increase the flexibility of the
> >>> slave devices, allowing the Tx offloads of a slave device to differ from those
> >>> of the bond device. If some offloads can be turned on without the bond device
> >>> being aware of them, they can be retained in this case.
> >>
> >>
> >> Not sure how that can happen...
> >
> > +1
> >
> > @Chengchang could you provide an example how it could happen.
> >
> 
> For example:
> device 1 capability: VLAN_INSERT | MBUF_FAST_FREE
> device 2 capability: VLAN_INSERT
> And the capability of bonded device will be VLAN_INSERT.
> So, we can only set VLAN_INSERT on the bonded device. But what if we want to
> enable MBUF_FAST_FREE on device 1 to improve performance? As long as the
> application can guarantee that mbuf ref_cnt == 1, it will run normally with
> MBUF_FAST_FREE turned on.
> 
> In my logic, if device 1 has been configured with MBUF_FAST_FREE and is then
> added to the bonded device as a slave, MBUF_FAST_FREE will be retained.

So your intention is to allow a slave device to silently overrule the master's tx_offload settings?
If so, I don't think it is a good idea - it sounds like a potentially bogus and error-prone approach.
Second thing - I still don't see how the code above can help you with it.
From what I read in your code - you clear tx_offload bits that are not supported by the master.

> 
> >> From my understanding tx_offload for bond device has to be intersection of tx_offloads
> >> of all slaves, no? Otherwise bond device might be misconfigured.
>> Anyway for that code snippet above, wouldn't the same be achieved by:
> >> slave_eth_dev->data->dev_conf.txmode.offloads &= internals->tx_offload_capa & bonded_eth_dev->data->dev_conf.txmode.offloads;
> >> ?
> >
> 
> I think it will not achieved my purpose in the scenario I mentioned above.
> 
> >



* Re: [dpdk-dev] [PATCH 1/2] net/bonding: support Tx prepare for bonding
  2021-06-10  6:46           ` Chengchang Tang
@ 2021-06-14 11:36             ` Ananyev, Konstantin
  0 siblings, 0 replies; 31+ messages in thread
From: Ananyev, Konstantin @ 2021-06-14 11:36 UTC (permalink / raw)
  To: Chengchang Tang, Andrew Rybchenko, dev
  Cc: linuxarm, chas3, humin29, Yigit, Ferruh


> On 2021/6/9 18:25, Ananyev, Konstantin wrote:
> >>> On 4/23/21 12:46 PM, Chengchang Tang wrote:
> >>>> To use the HW offloads capability (e.g. checksum and TSO) in the Tx
> >>>> direction, the upper-layer users need to call rte_eth_tx_prepare to do
> >>>> some adjustment to the packets before sending them (e.g. processing
> >>>> pseudo headers when Tx checksum offload enabled). But, the tx_prepare
> >>>> callback of the bond driver is not implemented. Therefore, related
> >>>> offloads cannot be used unless the upper-layer users process the packets
> >>>> properly in their own applications. But this is bad for portability.
> >>>>
> >>>> However, it is difficult to design the tx_prepare callback for bonding
> >>>> driver. Because when a bonded device sends packets, the bonded device
> >>>> allocates the packets to different slave devices based on the real-time
> >>>> link status and bonding mode. That is, it is very difficult for the
> >>>> bonding device to determine which slave device's prepare function should
> >>>> be invoked. In addition, if the link status changes after the packets are
> >>>> prepared, the packets may fail to be sent because packets allocation may
> >>>> change.
> >>>>
> >>>> So, in this patch, the tx_prepare callback of bonding driver is not
> >>>> implemented. Instead, rte_eth_tx_prepare() will be called for
> >>>> all the fast path packets in modes 0, 1, 2, 4, 5, 6. In this way, all
> >>>> tx_offloads can be processed correctly for all NIC devices in these modes.
> >>>> If tx_prepare is not required in some cases, then slave PMDs tx_prepare
> >>>> pointer should be NULL and rte_eth_tx_prepare() will be just a NOOP.
> >>>> In these cases, the impact on performance will be very limited. It is
> >>>> the responsibility of the slave PMDs to decide when the real tx_prepare
> >>>> needs to be used. The information from dev_config/queue_setup is
> >>>> sufficient for them to make these decisions.
> >>>>
> >>>> Note:
> >>>> The rte_eth_tx_prepare is not added to bond mode 3(Broadcast). This is
> >>>> because in broadcast mode, a packet needs to be sent by all slave ports.
> >>>> Different PMDs process the packets differently in tx_prepare. As a result,
> >>>> the sent packet may be incorrect.
> >>>>
> >>>> Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
> >>>> ---
> >>>>  drivers/net/bonding/rte_eth_bond.h     |  1 -
> >>>>  drivers/net/bonding/rte_eth_bond_pmd.c | 28 ++++++++++++++++++++++++----
> >>>>  2 files changed, 24 insertions(+), 5 deletions(-)
> >>>>
> >>>> diff --git a/drivers/net/bonding/rte_eth_bond.h b/drivers/net/bonding/rte_eth_bond.h
> >>>> index 874aa91..1e6cc6d 100644
> >>>> --- a/drivers/net/bonding/rte_eth_bond.h
> >>>> +++ b/drivers/net/bonding/rte_eth_bond.h
> >>>> @@ -343,7 +343,6 @@ rte_eth_bond_link_up_prop_delay_set(uint16_t bonded_port_id,
> >>>>  int
> >>>>  rte_eth_bond_link_up_prop_delay_get(uint16_t bonded_port_id);
> >>>>
> >>>> -
> >>>>  #ifdef __cplusplus
> >>>>  }
> >>>>  #endif
> >>>> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
> >>>> index 2e9cea5..84af348 100644
> >>>> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
> >>>> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
> >>>> @@ -606,8 +606,14 @@ bond_ethdev_tx_burst_round_robin(void *queue, struct rte_mbuf **bufs,
> >>>>  	/* Send packet burst on each slave device */
> >>>>  	for (i = 0; i < num_of_slaves; i++) {
> >>>>  		if (slave_nb_pkts[i] > 0) {
> >>>> +			int nb_prep_pkts;
> >>>> +
> >>>> +			nb_prep_pkts = rte_eth_tx_prepare(slaves[i],
> >>>> +					bd_tx_q->queue_id, slave_bufs[i],
> >>>> +					slave_nb_pkts[i]);
> >>>> +
> >>>
> >>> Shouldn't it be called iff queue Tx offloads are not zero?
> >>> It will allow to decrease performance degradation if no
> >>> Tx offloads are enabled. Same in all cases below.
> >>
> >> Regarding this point, it has been discussed in the previous RFC:
> >> https://inbox.dpdk.org/dev/47f907cf-3933-1de9-9c45-6734b912eccd@huawei.com/
> >>
> >> According to the TX_OFFLOAD status of the current device, PMDs can determine
> >> whether tx_prepare is currently needed. If it is not needed, set pkt_tx_prepare
> >> to NULL, so that the actual tx_prepare processing will be skipped directly in
> >> rte_eth_tx_prepare().
> >>
> >>>
> >>>>  			num_tx_slave = rte_eth_tx_burst(slaves[i], bd_tx_q->queue_id,
> >>>> -					slave_bufs[i], slave_nb_pkts[i]);
> >>>> +					slave_bufs[i], nb_prep_pkts);
> >>>
> >>> In fact there is a problem here, and a really big one.
> >>> Tx prepare may fail and return less packets. Tx prepare
> >>> of some packet may always fail. If application tries to
> >>> send packets in a loop until success, it will be a
> >>> forever loop here. Since application calls Tx burst,
> >>> it is 100% legal behaviour of the function to return 0
> >>> if Tx ring is full. It is not an error indication.
> >>> However, in the case of Tx prepare it is an error
> >>> indication.
> >
> > Yes, that sounds like a problem and existing apps might be affected.
> >
> >>>
> >>> Should we change Tx burst description and enforce callers
> >>> to check for rte_errno? It sounds like a major change...
> >>>
> >
> > Agree, rte_errno for tx_burst() is probably the simplest and sanest way,
> > but yes, it is a change in behaviour and apps will need to be updated.
> > Another option for the bond PMD - just silently free mbufs for which prepare()
> > fails (and probably update some stats counter).
> > Again it is a change in behaviour, but now just for one PMD, with Tx offloads enabled.
> > Also, as I can see, some tx_burst() functions for that PMD already free packets silently:
> > bond_ethdev_tx_burst_alb(), bond_ethdev_tx_burst_broadcast().
> >
> > Actually another question - why does the patch add tx_prepare() only to some
> > Tx modes but not all?
> > Is that intended?
> >
> 
> Yes. Currently, I have no idea how to perform tx_prepare() in broadcast mode with
> limited impact on performance. In broadcast mode, the same packets are sent on
> several devices: we only update the ref_cnt of the mbufs, there is no copy of
> the packets. As we know, tx_prepare() may change the packet data, so it may
> cause problems if we perform tx_prepare() several times on the same packet.

You mean tx_prepare() for the second dev can undo changes made by tx_prepare() for the first dev?
I suppose in theory it is possible, even if it is probably not the case right now in practice
(at least I am not aware of such cases).
Actually that's an interesting topic - the same can happen even with a user implementing
multicast on their own (see examples/ipv4_multicast/).
I think these new limitations have to be documented clearly (at least).
Also we probably need extra changes to the bond device dev_configure()/dev_info_get():
check the currently selected mode and, based on that, allow/reject Tx offloads.
The question arises (again): how do we figure out for which Tx offloads dev->tx_prepare()
modifies the packet, and for which it does not?
Any thoughts here?

> 
> >>
> >> I agree that if the failure is caused by Tx ring full, it is a legal behaviour.
> >> But what about the failure caused by other reasons? At present, it is possible
> >> for some PMDs to fail during tx_burst due to other reasons. In this case,
> >> repeated tries to send will also fail.
> >>
> >> I'm not sure whether all PMDs need to support the behaviour of sending packets
> >> in a loop until it succeeds. If not, I think the current problem can be flagged
> >> to the user by adding a note to the bonding documentation. If it is necessary,
> >> the description of tx_burst should also gain related instructions, so that PMD
> >> developers can better understand how tx_burst should be designed, e.g. putting
> >> all hardware-related constraint checks into tx_prepare. Another prerequisite for
> >> the above behaviour is that the packets must be prepared (i.e. checked by
> >> rte_eth_tx_prepare()); otherwise, sending may still fail. This means that we
> >> have to use rte_eth_tx_prepare() in more scenarios.
> >>
> >> What's Ferruh's opinion on this?
> >>
> >>> [snip]
> >>>
> >>>
> >



* Re: [dpdk-dev] [PATCH 2/2] net/bonding: support configuring Tx offloading for bonding
  2021-06-14 11:05               ` Ananyev, Konstantin
@ 2021-06-14 14:13                 ` Andrew Rybchenko
  0 siblings, 0 replies; 31+ messages in thread
From: Andrew Rybchenko @ 2021-06-14 14:13 UTC (permalink / raw)
  To: Ananyev, Konstantin, Chengchang Tang, dev
  Cc: linuxarm, chas3, humin29, Yigit, Ferruh

On 6/14/21 2:05 PM, Ananyev, Konstantin wrote:
> 
> 
>> Hi, Andrew and Ananyev
>>
>> On 2021/6/9 17:37, Andrew Rybchenko wrote:
>>> On 6/9/21 12:11 PM, Ananyev, Konstantin wrote:
>>>>
>>>>>
>>>>>
>>>>> On 2021/6/8 17:49, Andrew Rybchenko wrote:
>>>>>> "for bonding" is redundant in the summary since it is already
>>>>>> "net/bonding"
>>>>>>
>>>>>> On 4/23/21 12:46 PM, Chengchang Tang wrote:
>>>>>>> Currently, the TX offloading of the bonding device will not take effect by
>>>>>>
>>>>>> TX -> Tx
>>>>>>
>>>>>>> using dev_configure. Because the related configuration will not be
>>>>>>> delivered to the slave devices in this way.
>>>>>>
>>>>>> I think it is a major problem that Tx offloads are actually
>>>>>> ignored. It should be a patches with "Fixes:" which addresses
>>>>>> it.
>>>>>>
>>>>>>> The Tx offloading capability of the bonding device is the intersection of
>>>>>>> the capability of all slave devices. Based on this, the following functions
>>>>>>> are added to the bonding driver:
>>>>>>> 1. If a Tx offloading is within the capability of the bonding device (i.e.
>>>>>>> all the slave devices support this Tx offloading), the enabling status of
>>>>>>> the offloading of all slave devices depends on the configuration of the
>>>>>>> bonding device.
>>>>>>>
>>>>>>> 2. For the Tx offloading that is not within the Tx offloading capability
>>>>>>> of the bonding device, the enabling status of the offloading on the slave
>>>>>>> devices is irrelevant to the bonding device configuration. And it depends
>>>>>>> on the original configuration of the slave devices.
>>>>>>>
>>>>>>> Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
>>>>>>> ---
>>>>>>>   drivers/net/bonding/rte_eth_bond_pmd.c | 13 +++++++++++++
>>>>>>>   1 file changed, 13 insertions(+)
>>>>>>>
>>>>>>> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
>>>>>>> index 84af348..9922657 100644
>>>>>>> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
>>>>>>> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
>>>>>>> @@ -1712,6 +1712,8 @@ slave_configure(struct rte_eth_dev *bonded_eth_dev,
>>>>>>>   	struct rte_flow_error flow_error;
>>>>>>>
>>>>>>>   	struct bond_dev_private *internals = bonded_eth_dev->data->dev_private;
>>>>>>> +	uint64_t tx_offload_cap = internals->tx_offload_capa;
>>>>>>> +	uint64_t tx_offload;
>>>>>>>
>>>>>>>   	/* Stop slave */
>>>>>>>   	errval = rte_eth_dev_stop(slave_eth_dev->data->port_id);
>>>>>>> @@ -1759,6 +1761,17 @@ slave_configure(struct rte_eth_dev *bonded_eth_dev,
>>>>>>>   		slave_eth_dev->data->dev_conf.rxmode.offloads &=
>>>>>>>   				~DEV_RX_OFFLOAD_JUMBO_FRAME;
>>>>>>>
>>>>>>> +	while (tx_offload_cap != 0) {
>>>>>>> +		tx_offload = 1ULL << __builtin_ctzll(tx_offload_cap);
>>>>>>> +		if (bonded_eth_dev->data->dev_conf.txmode.offloads & tx_offload)
>>>>>>> +			slave_eth_dev->data->dev_conf.txmode.offloads |=
>>>>>>> +				tx_offload;
>>>>>>> +		else
>>>>>>> +			slave_eth_dev->data->dev_conf.txmode.offloads &=
>>>>>>> +				~tx_offload;
>>>>>>> +		tx_offload_cap &= ~tx_offload;
>>>>>>> +	}
>>>>>>> +
>>>>>>
>>>>>> Frankly speaking I don't understand why it is that complicated.
>>>>>> ethdev rejects of unsupported Tx offloads. So, can't we simply:
>>>>>> slave_eth_dev->data->dev_conf.txmode.offloads =
>>>>>>      bonded_eth_dev->data->dev_conf.txmode.offloads;
>>>>>>
>>>>>
>>>>> The point of such a complicated method is to increase the flexibility of the
>>>>> slave devices, allowing the Tx offloads of a slave device to differ from those
>>>>> of the bond device. If some offloads can be turned on without the bond device
>>>>> being aware of them, they can be retained in this case.
>>>>
>>>>
>>>> Not sure how that can happen...
>>>
>>> +1
>>>
>>> @Chengchang could you provide an example how it could happen.
>>>
>>
>> For example:
>> device 1 capability: VLAN_INSERT | MBUF_FAST_FREE
>> device 2 capability: VLAN_INSERT
>> And the capability of bonded device will be VLAN_INSERT.
>> So, we can only set VLAN_INSERT on the bonded device. But what if we want to
>> enable MBUF_FAST_FREE on device 1 to improve performance? As long as the
>> application can guarantee that mbuf ref_cnt == 1, it will run normally with
>> MBUF_FAST_FREE turned on.
>>
>> In my logic, if device 1 has been configured with MBUF_FAST_FREE and is then
>> added to the bonded device as a slave, MBUF_FAST_FREE will be retained.
> 
> So your intention is to allow a slave device to silently overrule the master's tx_offload settings?
> If so, I don't think it is a good idea - it sounds like a potentially bogus and error-prone approach.

+1

> Second thing - I still don't see how the code above can help you with it.
> From what I read in your code - you clear tx_offload bits that are not supported by the master.

+1

>>
>>>>  From my understanding tx_offload for bond device has to be intersection of tx_offloads
>>>> of all slaves, no? Otherwise bond device might be misconfigured.
>>>> Anyway for that code snippet above, wouldn't the same be achieved by:
>>>> slave_eth_dev->data->dev_conf.txmode.offloads &= internals->tx_offload_capa & bonded_eth_dev->data->dev_conf.txmode.offloads;
>>>> ?
>>>
>>
>> I think it will not achieved my purpose in the scenario I mentioned above.
>>
>>>
> 



* Re: [dpdk-dev] [PATCH 1/2] net/bonding: support Tx prepare for bonding
  2021-06-10  7:32           ` Chengchang Tang
@ 2021-06-14 14:16             ` Andrew Rybchenko
  0 siblings, 0 replies; 31+ messages in thread
From: Andrew Rybchenko @ 2021-06-14 14:16 UTC (permalink / raw)
  To: Chengchang Tang, dev
  Cc: linuxarm, chas3, humin29, ferruh.yigit, konstantin.ananyev

On 6/10/21 10:32 AM, Chengchang Tang wrote:
> On 2021/6/9 17:35, Andrew Rybchenko wrote:
>> On 6/9/21 9:42 AM, Chengchang Tang wrote:
>>> Hi, Andrew and Ferruh
>>>
>>> On 2021/6/8 17:49, Andrew Rybchenko wrote:
>>>> "for bonding" is redundant in the summary since it is already "net/bonding".
>>>>
>>>> On 4/23/21 12:46 PM, Chengchang Tang wrote:
>>>>> To use the HW offloads capability (e.g. checksum and TSO) in the Tx
>>>>> direction, the upper-layer users need to call rte_eth_tx_prepare to do
>>>>> some adjustment to the packets before sending them (e.g. processing
>>>>> pseudo headers when Tx checksum offload enabled). But, the tx_prepare
>>>>> callback of the bond driver is not implemented. Therefore, related
>>>>> offloads cannot be used unless the upper-layer users process the packets
>>>>> properly in their own applications. But this is bad for portability.
>>>>>
>>>>> However, it is difficult to design the tx_prepare callback for bonding
>>>>> driver. Because when a bonded device sends packets, the bonded device
>>>>> allocates the packets to different slave devices based on the real-time
>>>>> link status and bonding mode. That is, it is very difficult for the
>>>>> bonding device to determine which slave device's prepare function should
>>>>> be invoked. In addition, if the link status changes after the packets are
>>>>> prepared, the packets may fail to be sent because packets allocation may
>>>>> change.
>>>>>
>>>>> So, in this patch, the tx_prepare callback of bonding driver is not
>>>>> implemented. Instead, rte_eth_tx_prepare() will be called for
>>>>> all the fast path packets in modes 0, 1, 2, 4, 5, 6. In this way, all
>>>>> tx_offloads can be processed correctly for all NIC devices in these modes.
>>>>> If tx_prepare is not required in some cases, then slave PMDs tx_prepare
>>>>> pointer should be NULL and rte_eth_tx_prepare() will be just a NOOP.
>>>>> In these cases, the impact on performance will be very limited. It is
>>>>> the responsibility of the slave PMDs to decide when the real tx_prepare
>>>>> needs to be used. The information from dev_config/queue_setup is
>>>>> sufficient for them to make these decisions.
>>>>>
>>>>> Note:
>>>>> The rte_eth_tx_prepare is not added to bond mode 3(Broadcast). This is
>>>>> because in broadcast mode, a packet needs to be sent by all slave ports.
>>>>> Different PMDs process the packets differently in tx_prepare. As a result,
>>>>> the sent packet may be incorrect.
>>>>>
>>>>> Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
>>>>> ---
>>>>>   drivers/net/bonding/rte_eth_bond.h     |  1 -
>>>>>   drivers/net/bonding/rte_eth_bond_pmd.c | 28 ++++++++++++++++++++++++----
>>>>>   2 files changed, 24 insertions(+), 5 deletions(-)
>>>>>
>>>>> diff --git a/drivers/net/bonding/rte_eth_bond.h b/drivers/net/bonding/rte_eth_bond.h
>>>>> index 874aa91..1e6cc6d 100644
>>>>> --- a/drivers/net/bonding/rte_eth_bond.h
>>>>> +++ b/drivers/net/bonding/rte_eth_bond.h
>>>>> @@ -343,7 +343,6 @@ rte_eth_bond_link_up_prop_delay_set(uint16_t bonded_port_id,
>>>>>   int
>>>>>   rte_eth_bond_link_up_prop_delay_get(uint16_t bonded_port_id);
>>>>>
>>>>> -
>>>>>   #ifdef __cplusplus
>>>>>   }
>>>>>   #endif
>>>>> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
>>>>> index 2e9cea5..84af348 100644
>>>>> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
>>>>> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
>>>>> @@ -606,8 +606,14 @@ bond_ethdev_tx_burst_round_robin(void *queue, struct rte_mbuf **bufs,
>>>>>   	/* Send packet burst on each slave device */
>>>>>   	for (i = 0; i < num_of_slaves; i++) {
>>>>>   		if (slave_nb_pkts[i] > 0) {
>>>>> +			int nb_prep_pkts;
>>>>> +
>>>>> +			nb_prep_pkts = rte_eth_tx_prepare(slaves[i],
>>>>> +					bd_tx_q->queue_id, slave_bufs[i],
>>>>> +					slave_nb_pkts[i]);
>>>>> +
>>>>
>>>> Shouldn't it be called iff queue Tx offloads are not zero?
>>>> It will allow to decrease performance degradation if no
>>>> Tx offloads are enabled. Same in all cases below.
>>>
>>> Regarding this point, it has been discussed in the previous RFC:
>>> https://inbox.dpdk.org/dev/47f907cf-3933-1de9-9c45-6734b912eccd@huawei.com/
>>>
>>> According to the TX_OFFLOAD status of the current device, PMDs can determine
>>> whether tx_prepare is currently needed. If it is not needed, set pkt_tx_prepare
>>> to NULL, so that the actual tx_prepare processing will be skipped directly in
>>> rte_eth_tx_prepare().
>>
>> I still think that the following is right:
>> No Tx offloads at all => Tx prepare is not necessary
>>
>> Am I wrong?
>>
> 
> Letting PMDs determine whether tx_prepare() needs to be done could reduce the
> performance loss in more scenarios. For example, some offloads do not need a Tx
> prepare, and PMDs could set tx_prepare to NULL in that case. Even if
> rte_eth_tx_prepare() is called, it will skip the tx_prepare callback and simply
> return, so there is only one branch. If we additionally test whether tx_offloads
> is non-zero, one more branch is added.

I'll wait for net/bonding maintainers decision here.

IMHO all above assumptions should be proven by performance measurements.

> Of course, some PMDs currently do not optimize tx_prepare, which may have a
> performance impact. We hope to force them to optimize tx_prepare in this way,
> just as they optimize tx_burst. This makes tx_prepare() easier to use: users no
> longer need to worry that calling it will introduce unnecessary performance
> degradation.
> 
> IMHO tx_prepare() should be usable in all scenarios, and the performance impact
> should be optimized away by the PMDs. If we leave it to the application to decide
> when it should and should not be used, in many cases it will simply not be used,
> and problems will be introduced.
> 
>>>>
>>>>>   			num_tx_slave = rte_eth_tx_burst(slaves[i], bd_tx_q->queue_id,
>>>>> -					slave_bufs[i], slave_nb_pkts[i]);
>>>>> +					slave_bufs[i], nb_prep_pkts);
>>>>
>>>> In fact there is a problem here, and a really big one.
>>>> Tx prepare may fail and return less packets. Tx prepare
>>>> of some packet may always fail. If application tries to
>>>> send packets in a loop until success, it will be a
>>>> forever loop here. Since application calls Tx burst,
>>>> it is 100% legal behaviour of the function to return 0
>>>> if Tx ring is full. It is not an error indication.
>>>> However, in the case of Tx prepare it is an error
>>>> indication.
>>>>
>>>> Should we change Tx burst description and enforce callers
>>>> to check for rte_errno? It sounds like a major change...
>>>>
>>>
>>> I agree that if the failure is caused by Tx ring full, it is a legal behaviour.
>>> But what about the failure caused by other reasons? At present, it is possible
>>> for some PMDs to fail during tx_burst due to other reasons. In this case,
>>> repeated tries to send will also fail.
>>
>> If so, packet should be simply dropped by Tx burst and Tx burst
>> should move on. If a packet cannot be transmitted, it must be
>> dropped (counted) and Tx burst should move to the next packet.
>>
>>> I'm not sure whether all PMDs need to support the behaviour of sending packets
>>> in a loop until it succeeds. If not, I think the current problem can be flagged
>>> to the user by adding a note to the bonding documentation. If it is necessary,
>>> the description of tx_burst should also gain related instructions, so that PMD
>>> developers can better understand how tx_burst should be designed, e.g. putting
>>> all hardware-related constraint checks into tx_prepare. Another prerequisite for
>>> the above behaviour is that the packets must be prepared (i.e. checked by
>>> rte_eth_tx_prepare()); otherwise, sending may still fail. This means that we
>>> have to use rte_eth_tx_prepare() in more scenarios.
>>
>> IMHO any PMD specific behaviour is a nightmare to application
>> developer and must be avoided. Ideally application should not
>> care if it is running on top of tap, virtio, failsafe or
>> bonding. It should talk to ethdev in terms of ethdev API that's
>> it. I know that net/bonding is designed that application should
>> know about it, but IMHO the places where it requires the
>> knowledge must be minimized to make applications more portable
>> across various PMDs/HW.
>>
>> I think that the only sensible solution for above problem is
>> to skip a packet which prepare dislikes. count it as dropped
>> and try to prepare/transmit subsequent packets.
> 
> Agree, I will fix this in the next version.
> 
>>
>> It is an interesting effect of the Tx prepare just before
>> Tx burst inside bonding PMD. If Tx burst fails to send
>> something because ring is full, a number of packets will
>> be processed by Tx prepare again and again. I guess it is
>> unavoidable.
>>
>>> What's Ferruh's opinion on this?
>>>
>>>> [snip]



end of thread, other threads:[~2021-06-14 14:16 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
2021-04-16 11:04 [dpdk-dev] [RFC 0/2] add Tx prepare support for bonding device Chengchang Tang
2021-04-16 11:04 ` [dpdk-dev] [RFC 1/2] net/bonding: add Tx prepare for bonding Chengchang Tang
2021-04-16 11:04 ` [dpdk-dev] [RFC 2/2] app/testpmd: add cmd for bonding Tx prepare Chengchang Tang
2021-04-16 11:12 ` [dpdk-dev] [RFC 0/2] add Tx prepare support for bonding device Min Hu (Connor)
2021-04-20  1:26 ` Ferruh Yigit
2021-04-20  2:44   ` Chengchang Tang
2021-04-20  8:33     ` Ananyev, Konstantin
2021-04-20 12:44       ` Chengchang Tang
2021-04-20 13:18         ` Ananyev, Konstantin
2021-04-20 14:06           ` Chengchang Tang
2021-04-23  9:46 ` [dpdk-dev] [PATCH " Chengchang Tang
2021-04-23  9:46   ` [dpdk-dev] [PATCH 1/2] net/bonding: support Tx prepare for bonding Chengchang Tang
2021-06-08  9:49     ` Andrew Rybchenko
2021-06-09  6:42       ` Chengchang Tang
2021-06-09  9:35         ` Andrew Rybchenko
2021-06-10  7:32           ` Chengchang Tang
2021-06-14 14:16             ` Andrew Rybchenko
2021-06-09 10:25         ` Ananyev, Konstantin
2021-06-10  6:46           ` Chengchang Tang
2021-06-14 11:36             ` Ananyev, Konstantin
2021-04-23  9:46   ` [dpdk-dev] [PATCH 2/2] net/bonding: support configuring Tx offloading " Chengchang Tang
2021-06-08  9:49     ` Andrew Rybchenko
2021-06-09  6:57       ` Chengchang Tang
2021-06-09  9:11         ` Ananyev, Konstantin
2021-06-09  9:37           ` Andrew Rybchenko
2021-06-10  6:29             ` Chengchang Tang
2021-06-14 11:05               ` Ananyev, Konstantin
2021-06-14 14:13                 ` Andrew Rybchenko
2021-04-30  6:26   ` [dpdk-dev] [PATCH 0/2] add Tx prepare support for bonding device Chengchang Tang
2021-04-30  6:47     ` Min Hu (Connor)
2021-06-03  1:44   ` Chengchang Tang
