* [PATCH v1] app/testpmd: use Tx preparation in txonly engine
@ 2024-01-03  1:29 Kaiwen Deng
  2024-01-04  1:03 ` Stephen Hemminger
                   ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Kaiwen Deng @ 2024-01-03  1:29 UTC
To: dev
Cc: stable, qiming.yang, yidingx.zhou, Kaiwen Deng, Aman Singh, Yuying Zhang, David Marchand, Ferruh Yigit

Txonly forwarding engine does not call the Tx preparation API
before transmitting packets. This may cause some problems.

TSO breaks when MSS spans more than 8 data fragments. Those
packets will be dropped by Tx preparation API, but it will cause
MDD event if txonly forwarding engine does not call the Tx preparation
API before transmitting packets.

We can reproduce this issue with the steps listed below on ICE and I40e.

./x86_64-native-linuxapp-gcc/app/dpdk-testpmd -c 0xf -n 4 -- -i
--tx-offloads=0x00008000

testpmd>set txpkts 64,128,256,512,64,128,256,512,512
testpmd>set burst 1
testpmd>start tx_first 1

This commit will use Tx preparation API in txonly forwarding engine.

Fixes: 655131ccf727 ("app/testpmd: factorize fwd engines Tx")
Cc: stable@dpdk.org

Signed-off-by: Kaiwen Deng <kaiwenx.deng@intel.com>
---
 app/test-pmd/txonly.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/app/test-pmd/txonly.c b/app/test-pmd/txonly.c
index c2b88764be..60d69be3f6 100644
--- a/app/test-pmd/txonly.c
+++ b/app/test-pmd/txonly.c
@@ -339,6 +339,7 @@ pkt_burst_transmit(struct fwd_stream *fs)
 	struct rte_ether_hdr eth_hdr;
 	uint16_t nb_tx;
 	uint16_t nb_pkt;
+	uint16_t nb_prep;
 	uint16_t vlan_tci, vlan_tci_outer;
 	uint64_t ol_flags = 0;
 	uint64_t tx_offloads;
@@ -396,7 +397,17 @@ pkt_burst_transmit(struct fwd_stream *fs)
 	if (nb_pkt == 0)
 		return false;
 
-	nb_tx = common_fwd_stream_transmit(fs, pkts_burst, nb_pkt);
+	nb_prep = rte_eth_tx_prepare(fs->tx_port, fs->tx_queue,
+			pkts_burst, nb_pkt);
+	if (unlikely(nb_prep != nb_pkt)) {
+		fprintf(stderr,
+			"Preparing packet burst to transmit failed: %s\n",
+			rte_strerror(rte_errno));
+		fs->fwd_dropped += (nb_pkt - nb_prep);
+		rte_pktmbuf_free_bulk(&pkts_burst[nb_prep], nb_pkt - nb_prep);
+	}
+
+	nb_tx = common_fwd_stream_transmit(fs, pkts_burst, nb_prep);
 
 	if (txonly_multi_flow)
 		RTE_PER_LCORE(_src_port_var) -= nb_pkt - nb_tx;
-- 
2.34.1

* Re: [PATCH v1] app/testpmd: use Tx preparation in txonly engine
From: Stephen Hemminger @ 2024-01-04  1:03 UTC
To: Kaiwen Deng
Cc: dev, stable, qiming.yang, yidingx.zhou, Aman Singh, Yuying Zhang, David Marchand, Ferruh Yigit

On Wed, 3 Jan 2024 09:29:12 +0800
Kaiwen Deng <kaiwenx.deng@intel.com> wrote:

>
> -	nb_tx = common_fwd_stream_transmit(fs, pkts_burst, nb_pkt);
> +	nb_prep = rte_eth_tx_prepare(fs->tx_port, fs->tx_queue,
> +			pkts_burst, nb_pkt);
> +	if (unlikely(nb_prep != nb_pkt)) {
> +		fprintf(stderr,
> +			"Preparing packet burst to transmit failed: %s\n",
> +			rte_strerror(rte_errno));

The most likely failure is mismatched offload flags, so it might be
helpful to print the offload flags of that mbuf.

> +		fs->fwd_dropped += (nb_pkt - nb_prep);

Nit: no parentheses needed here.

> +		rte_pktmbuf_free_bulk(&pkts_burst[nb_prep], nb_pkt - nb_prep);
> +	}
> +
> +	nb_tx = common_fwd_stream_transmit(fs, pkts_burst, nb_prep);
>

* Re: [PATCH v1] app/testpmd: use Tx preparation in txonly engine
From: Jerin Jacob @ 2024-01-04  5:52 UTC
To: Kaiwen Deng
Cc: dev, stable, qiming.yang, yidingx.zhou, Aman Singh, Yuying Zhang, David Marchand, Ferruh Yigit

On Wed, Jan 3, 2024 at 7:38 AM Kaiwen Deng <kaiwenx.deng@intel.com> wrote:
>
> Txonly forwarding engine does not call the Tx preparation API
> before transmitting packets. This may cause some problems.
>
> TSO breaks when MSS spans more than 8 data fragments. Those
> packets will be dropped by Tx preparation API, but it will cause
> MDD event if txonly forwarding engine does not call the Tx preparation
> API before transmitting packets.
>
> We can reproduce this issue with the steps listed below on ICE and I40e.
>
> ./x86_64-native-linuxapp-gcc/app/dpdk-testpmd -c 0xf -n 4 -- -i
> --tx-offloads=0x00008000
>
> testpmd>set txpkts 64,128,256,512,64,128,256,512,512
> testpmd>set burst 1
> testpmd>start tx_first 1
>
> This commit will use Tx preparation API in txonly forwarding engine.
>
> Fixes: 655131ccf727 ("app/testpmd: factorize fwd engines Tx")
> Cc: stable@dpdk.org
>
> Signed-off-by: Kaiwen Deng <kaiwenx.deng@intel.com>
> ---
>  app/test-pmd/txonly.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/app/test-pmd/txonly.c b/app/test-pmd/txonly.c
> index c2b88764be..60d69be3f6 100644
> --- a/app/test-pmd/txonly.c
> +++ b/app/test-pmd/txonly.c
> @@ -339,6 +339,7 @@ pkt_burst_transmit(struct fwd_stream *fs)
>  	struct rte_ether_hdr eth_hdr;
>  	uint16_t nb_tx;
>  	uint16_t nb_pkt;
> +	uint16_t nb_prep;
>  	uint16_t vlan_tci, vlan_tci_outer;
>  	uint64_t ol_flags = 0;
>  	uint64_t tx_offloads;
> @@ -396,7 +397,17 @@ pkt_burst_transmit(struct fwd_stream *fs)
>  	if (nb_pkt == 0)
>  		return false;
>
> -	nb_tx = common_fwd_stream_transmit(fs, pkts_burst, nb_pkt);
> +	nb_prep = rte_eth_tx_prepare(fs->tx_port, fs->tx_queue,

A more performant approach would be to have two versions of
fwd_engine.packet_fwd, one per offload case, and to fix up
tx_only_engine.packet_fwd at runtime based on the offload selected.

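A minimal sketch of the "two versions, fixed up at runtime" idea, for illustration only. It is self-contained and does not use testpmd or DPDK: `struct stream`, `packet_fwd_t`, `txonly_plain`, `txonly_prepared` and `select_txonly_fwd` are hypothetical stand-ins, and the selection rule (any Tx offload bit set) is an assumption, not something the thread has settled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a forwarding stream; not testpmd's fwd_stream. */
struct stream {
	unsigned int prepared; /* packets that went through the prepare step */
	unsigned int sent;     /* packets handed to the transmit step */
};

/* Hypothetical burst callback type, playing the role of packet_fwd. */
typedef bool (*packet_fwd_t)(struct stream *s, uint16_t nb_pkt);

/* Variant without a prepare step: the fast path for plain packets. */
bool txonly_plain(struct stream *s, uint16_t nb_pkt)
{
	assert(s != NULL);
	s->sent += nb_pkt;
	return nb_pkt > 0;
}

/* Variant that runs a (modelled) prepare step before transmitting. */
bool txonly_prepared(struct stream *s, uint16_t nb_pkt)
{
	assert(s != NULL);
	s->prepared += nb_pkt;
	s->sent += nb_pkt;
	return nb_pkt > 0;
}

/*
 * Pick the callback once, at configuration time, so the per-burst path
 * carries no extra branch. Assumption: any offload bit selects the
 * prepared variant.
 */
packet_fwd_t select_txonly_fwd(uint64_t tx_offloads)
{
	return tx_offloads != 0 ? txonly_prepared : txonly_plain;
}
```

The trade-off raised later in the thread still applies: two callbacks avoid a per-burst check but duplicate the burst-building code.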
* [PATCH v2] app/testpmd: use Tx preparation in txonly engine
From: Kaiwen Deng @ 2024-01-11  5:25 UTC
To: dev
Cc: stable, qiming.yang, yidingx.zhou, Kaiwen Deng, Aman Singh, Yuying Zhang, Ferruh Yigit, David Marchand

Txonly forwarding engine does not call the Tx preparation API
before transmitting packets. This may cause some problems.

TSO breaks when MSS spans more than 8 data fragments. Those
packets will be dropped by Tx preparation API, but it will cause
MDD event if txonly forwarding engine does not call the Tx preparation
API before transmitting packets.

We can reproduce this issue with the steps listed below on ICE and I40e.

./x86_64-native-linuxapp-gcc/app/dpdk-testpmd -c 0xf -n 4 -- -i
--tx-offloads=0x00008000

testpmd>set txpkts 64,128,256,512,64,128,256,512,512
testpmd>set burst 1
testpmd>start tx_first 1

This commit will use Tx preparation API in txonly forwarding engine.

Fixes: 655131ccf727 ("app/testpmd: factorize fwd engines Tx")
Cc: stable@dpdk.org

Signed-off-by: Kaiwen Deng <kaiwenx.deng@intel.com>
---
 app/test-pmd/txonly.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/app/test-pmd/txonly.c b/app/test-pmd/txonly.c
index c2b88764be..9dc53553a7 100644
--- a/app/test-pmd/txonly.c
+++ b/app/test-pmd/txonly.c
@@ -335,13 +335,16 @@ pkt_burst_transmit(struct fwd_stream *fs)
 	struct rte_mbuf *pkts_burst[MAX_PKT_BURST];
 	struct rte_port *txp;
 	struct rte_mbuf *pkt;
+	struct rte_mbuf *mb;
 	struct rte_mempool *mbp;
 	struct rte_ether_hdr eth_hdr;
 	uint16_t nb_tx;
 	uint16_t nb_pkt;
+	uint16_t nb_prep;
 	uint16_t vlan_tci, vlan_tci_outer;
 	uint64_t ol_flags = 0;
 	uint64_t tx_offloads;
+	char buf[256];
 
 	mbp = current_fwd_lcore()->mbp;
 	txp = &ports[fs->tx_port];
@@ -396,7 +399,19 @@ pkt_burst_transmit(struct fwd_stream *fs)
 	if (nb_pkt == 0)
 		return false;
 
-	nb_tx = common_fwd_stream_transmit(fs, pkts_burst, nb_pkt);
+	nb_prep = rte_eth_tx_prepare(fs->tx_port, fs->tx_queue,
+			pkts_burst, nb_pkt);
+	if (unlikely(nb_prep != nb_pkt)) {
+		mb = pkts_burst[nb_prep];
+		rte_get_tx_ol_flag_list(mb->ol_flags, buf, sizeof(buf));
+		fprintf(stderr,
+			"Preparing packet burst to transmit failed: %s ol_flags: %s\n",
+			rte_strerror(rte_errno), buf);
+		fs->fwd_dropped += nb_pkt - nb_prep;
+		rte_pktmbuf_free_bulk(&pkts_burst[nb_prep], nb_pkt - nb_prep);
+	}
+
+	nb_tx = common_fwd_stream_transmit(fs, pkts_burst, nb_prep);
 
 	if (txonly_multi_flow)
 		RTE_PER_LCORE(_src_port_var) -= nb_pkt - nb_tx;
-- 
2.34.1

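The accounting in the patch above can be modelled without DPDK. The following is a sketch under stated assumptions, not the testpmd code: `struct pkt`, `struct tx_stats`, `model_tx_prepare` and `model_burst` are hypothetical, the limit of 8 segments is taken from the commit message, and the model assumes the device accepts every prepared packet.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a packet: only the segment count matters here. */
struct pkt {
	uint16_t nb_segs;
};

/* Hypothetical per-stream counters. */
struct tx_stats {
	uint64_t sent;
	uint64_t dropped;
};

/* Assumption: the limit from the "more than 8 data fragments" case. */
#define MODEL_MAX_SEGS 8

/*
 * Model of a prepare step: returns how many leading packets pass the
 * check, stopping at the first packet that fails.
 */
uint16_t model_tx_prepare(const struct pkt *pkts, uint16_t nb_pkt)
{
	uint16_t i;

	for (i = 0; i < nb_pkt; i++)
		if (pkts[i].nb_segs == 0 || pkts[i].nb_segs > MODEL_MAX_SEGS)
			break;
	return i;
}

/*
 * Prepare the burst, count the rejected tail as dropped, and transmit
 * only the prepared head.
 */
uint16_t model_burst(const struct pkt *pkts, uint16_t nb_pkt,
		     struct tx_stats *st)
{
	uint16_t nb_prep = model_tx_prepare(pkts, nb_pkt);

	assert(nb_prep <= nb_pkt);
	st->dropped += (uint64_t)(nb_pkt - nb_prep);
	st->sent += nb_prep; /* model: every prepared packet is accepted */
	return nb_prep;
}
```

In this model a burst of packets with 1, 9 and 2 segments sends one packet and drops two: the valid third packet is discarded along with the rejected second one, because everything from the first failure onward is dropped.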
* Re: [PATCH v2] app/testpmd: use Tx preparation in txonly engine
From: lihuisong (C) @ 2024-01-11  6:34 UTC
To: Kaiwen Deng, dev
Cc: stable, qiming.yang, yidingx.zhou, Aman Singh, Yuying Zhang, Ferruh Yigit, David Marchand

lgtm,
Acked-by: Huisong Li <lihuisong@huawei.com>

On 2024/1/11 13:25, Kaiwen Deng wrote:
> Txonly forwarding engine does not call the Tx preparation API
> before transmitting packets. This may cause some problems.
>
> TSO breaks when MSS spans more than 8 data fragments. Those
> packets will be dropped by Tx preparation API, but it will cause
> MDD event if txonly forwarding engine does not call the Tx preparation
> API before transmitting packets.
>
> We can reproduce this issue with the steps listed below on ICE and I40e.
>
> ./x86_64-native-linuxapp-gcc/app/dpdk-testpmd -c 0xf -n 4 -- -i
> --tx-offloads=0x00008000
>
> testpmd>set txpkts 64,128,256,512,64,128,256,512,512
> testpmd>set burst 1
> testpmd>start tx_first 1
>
> This commit will use Tx preparation API in txonly forwarding engine.
>
> Fixes: 655131ccf727 ("app/testpmd: factorize fwd engines Tx")
> Cc: stable@dpdk.org
>
> Signed-off-by: Kaiwen Deng <kaiwenx.deng@intel.com>
> ---
>  app/test-pmd/txonly.c | 17 ++++++++++++++++-
>  1 file changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/app/test-pmd/txonly.c b/app/test-pmd/txonly.c
> index c2b88764be..9dc53553a7 100644
> --- a/app/test-pmd/txonly.c
> +++ b/app/test-pmd/txonly.c
> @@ -335,13 +335,16 @@ pkt_burst_transmit(struct fwd_stream *fs)
>  	struct rte_mbuf *pkts_burst[MAX_PKT_BURST];
>  	struct rte_port *txp;
>  	struct rte_mbuf *pkt;
> +	struct rte_mbuf *mb;
>  	struct rte_mempool *mbp;
>  	struct rte_ether_hdr eth_hdr;
>  	uint16_t nb_tx;
>  	uint16_t nb_pkt;
> +	uint16_t nb_prep;
>  	uint16_t vlan_tci, vlan_tci_outer;
>  	uint64_t ol_flags = 0;
>  	uint64_t tx_offloads;
> +	char buf[256];
>
>  	mbp = current_fwd_lcore()->mbp;
>  	txp = &ports[fs->tx_port];
> @@ -396,7 +399,19 @@ pkt_burst_transmit(struct fwd_stream *fs)
>  	if (nb_pkt == 0)
>  		return false;
>
> -	nb_tx = common_fwd_stream_transmit(fs, pkts_burst, nb_pkt);
> +	nb_prep = rte_eth_tx_prepare(fs->tx_port, fs->tx_queue,
> +			pkts_burst, nb_pkt);
> +	if (unlikely(nb_prep != nb_pkt)) {
> +		mb = pkts_burst[nb_prep];
> +		rte_get_tx_ol_flag_list(mb->ol_flags, buf, sizeof(buf));
> +		fprintf(stderr,
> +			"Preparing packet burst to transmit failed: %s ol_flags: %s\n",
> +			rte_strerror(rte_errno), buf);
> +		fs->fwd_dropped += nb_pkt - nb_prep;
> +		rte_pktmbuf_free_bulk(&pkts_burst[nb_prep], nb_pkt - nb_prep);
> +	}
> +
> +	nb_tx = common_fwd_stream_transmit(fs, pkts_burst, nb_prep);
>
>  	if (txonly_multi_flow)
>  		RTE_PER_LCORE(_src_port_var) -= nb_pkt - nb_tx;

* Re: [PATCH v2] app/testpmd: use Tx preparation in txonly engine
From: Stephen Hemminger @ 2024-01-11 16:57 UTC
To: Kaiwen Deng
Cc: dev, stable, qiming.yang, yidingx.zhou, Aman Singh, Yuying Zhang, Ferruh Yigit, David Marchand

On Thu, 11 Jan 2024 13:25:55 +0800
Kaiwen Deng <kaiwenx.deng@intel.com> wrote:

> Txonly forwarding engine does not call the Tx preparation API
> before transmitting packets. This may cause some problems.
>
> TSO breaks when MSS spans more than 8 data fragments. Those
> packets will be dropped by Tx preparation API, but it will cause
> MDD event if txonly forwarding engine does not call the Tx preparation
> API before transmitting packets.
>
> We can reproduce this issue by these steps list blow on ICE and I40e.
>
> ./x86_64-native-linuxapp-gcc/app/dpdk-testpmd -c 0xf -n 4 -- -i
> --tx-offloads=0x00008000
>
> testpmd>set txpkts 64,128,256,512,64,128,256,512,512
> testpmd>set burst 1
> testpmd>start tx_first 1
>
> This commit will use Tx preparation API in txonly forwarding engine.
>
> Fixes: 655131ccf727 ("app/testpmd: factorize fwd engines Tx")
> Cc: stable@dpdk.org
>
> Signed-off-by: Kaiwen Deng <kaiwenx.deng@intel.com>

Acked-by: Stephen Hemminger <stephen@networkplumber.org>

* Re: [PATCH v2] app/testpmd: use Tx preparation in txonly engine
From: David Marchand @ 2024-01-12 16:00 UTC
To: Kaiwen Deng
Cc: dev, stable, qiming.yang, yidingx.zhou, Aman Singh, Yuying Zhang, Ferruh Yigit, Jerin Jacob Kollanukkaran

On Thu, Jan 11, 2024 at 7:06 AM Kaiwen Deng <kaiwenx.deng@intel.com> wrote:
>
> Txonly forwarding engine does not call the Tx preparation API
> before transmitting packets. This may cause some problems.
>
> TSO breaks when MSS spans more than 8 data fragments. Those
> packets will be dropped by Tx preparation API, but it will cause
> MDD event if txonly forwarding engine does not call the Tx preparation
> API before transmitting packets.
>
> We can reproduce this issue with the steps listed below on ICE and I40e.
>
> ./x86_64-native-linuxapp-gcc/app/dpdk-testpmd -c 0xf -n 4 -- -i
> --tx-offloads=0x00008000
>
> testpmd>set txpkts 64,128,256,512,64,128,256,512,512
> testpmd>set burst 1
> testpmd>start tx_first 1
>
> This commit will use Tx preparation API in txonly forwarding engine.
>
> Fixes: 655131ccf727 ("app/testpmd: factorize fwd engines Tx")
> Cc: stable@dpdk.org
>
> Signed-off-by: Kaiwen Deng <kaiwenx.deng@intel.com>

I did not see a reply to Jerin's concern about the performance impact of
this change.

Some additional comments below.

> ---
>  app/test-pmd/txonly.c | 17 ++++++++++++++++-
>  1 file changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/app/test-pmd/txonly.c b/app/test-pmd/txonly.c
> index c2b88764be..9dc53553a7 100644
> --- a/app/test-pmd/txonly.c
> +++ b/app/test-pmd/txonly.c
> @@ -335,13 +335,16 @@ pkt_burst_transmit(struct fwd_stream *fs)
>  	struct rte_mbuf *pkts_burst[MAX_PKT_BURST];
>  	struct rte_port *txp;
>  	struct rte_mbuf *pkt;
> +	struct rte_mbuf *mb;
>  	struct rte_mempool *mbp;
>  	struct rte_ether_hdr eth_hdr;
>  	uint16_t nb_tx;
>  	uint16_t nb_pkt;
> +	uint16_t nb_prep;
>  	uint16_t vlan_tci, vlan_tci_outer;
>  	uint64_t ol_flags = 0;
>  	uint64_t tx_offloads;
> +	char buf[256];
>
>  	mbp = current_fwd_lcore()->mbp;
>  	txp = &ports[fs->tx_port];
> @@ -396,7 +399,19 @@ pkt_burst_transmit(struct fwd_stream *fs)
>  	if (nb_pkt == 0)
>  		return false;
>
> -	nb_tx = common_fwd_stream_transmit(fs, pkts_burst, nb_pkt);
> +	nb_prep = rte_eth_tx_prepare(fs->tx_port, fs->tx_queue,
> +			pkts_burst, nb_pkt);
> +	if (unlikely(nb_prep != nb_pkt)) {

The buf variable declaration can be moved to the start of this block,
as buf is not needed anywhere else.

> +		mb = pkts_burst[nb_prep];
> +		rte_get_tx_ol_flag_list(mb->ol_flags, buf, sizeof(buf));

You don't need a mb variable, simply pass pkts_burst[nb_prep]->ol_flags.

The rte_get_tx_ol_flag_list() return value must be checked for the
(theoretical) case where the 'buf' variable is too short; otherwise the
fprintf below may output garbage, since buf may be unterminated.

> +		fprintf(stderr,
> +			"Preparing packet burst to transmit failed: %s ol_flags: %s\n",
> +			rte_strerror(rte_errno), buf);
> +		fs->fwd_dropped += nb_pkt - nb_prep;
> +		rte_pktmbuf_free_bulk(&pkts_burst[nb_prep], nb_pkt - nb_prep);
> +	}
> +
> +	nb_tx = common_fwd_stream_transmit(fs, pkts_burst, nb_prep);
>
>  	if (txonly_multi_flow)
>  		RTE_PER_LCORE(_src_port_var) -= nb_pkt - nb_tx;
> --
> 2.34.1
>

-- 
David Marchand

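The return-value check requested in the review above can be sketched in isolation. This is an illustration only: `model_flag_list` is a hypothetical stand-in that mimics a "0 on success, -1 if the buffer is too small" contract, with made-up flag names, and `describe_flags` is a hypothetical caller-side helper that falls back to the raw hex value so the caller never prints an unchecked buffer.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for a flag-name formatter (not the DPDK function):
 * writes the names of the set bits into buf, returns 0 on success and -1
 * if buf is too small.
 */
int model_flag_list(uint64_t flags, char *buf, size_t buflen)
{
	static const char *const names[] = { "FLAG_A", "FLAG_B", "FLAG_C" };
	size_t used = 0;
	unsigned int i;

	if (buflen == 0)
		return -1;
	buf[0] = '\0';
	for (i = 0; i < 3; i++) {
		int n;

		if (!(flags & (UINT64_C(1) << i)))
			continue;
		n = snprintf(buf + used, buflen - used, "%s ", names[i]);
		if (n < 0 || (size_t)n >= buflen - used)
			return -1; /* name did not fit */
		used += (size_t)n;
	}
	return 0;
}

/*
 * Caller side: check the return value, and fall back to the raw hex value
 * when the names do not fit.
 */
const char *describe_flags(uint64_t flags, char *buf, size_t buflen)
{
	assert(buf != NULL && buflen > 0);
	if (model_flag_list(flags, buf, buflen) != 0)
		snprintf(buf, buflen, "0x%" PRIx64, flags);
	return buf;
}
```

Because the helper takes the buffer from its caller, the buffer can be declared inside the failure block, as the review also suggests.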
* Re: [PATCH v2] app/testpmd: use Tx preparation in txonly engine
From: Ferruh Yigit @ 2024-02-08  0:07 UTC
To: Kaiwen Deng, dev
Cc: stable, qiming.yang, yidingx.zhou, Aman Singh, Yuying Zhang, David Marchand

On 1/11/2024 5:25 AM, Kaiwen Deng wrote:
> Txonly forwarding engine does not call the Tx preparation API
> before transmitting packets. This may cause some problems.
>
> TSO breaks when MSS spans more than 8 data fragments. Those
> packets will be dropped by Tx preparation API, but it will cause
> MDD event if txonly forwarding engine does not call the Tx preparation
> API before transmitting packets.
>

txonly is commonly used; adding Tx prepare for a specific case may
impact performance for users.

What happens when the driver throws an MDD (Malicious Driver Detection)
event, can't it be ignored? As you are already OK with dropping the
packet, can the device be configured to drop these packets?


Or, as Jerin suggested, adding a new forwarding engine is a solution, but
that will create code duplication; I prefer not to have it if this can
be handled at device level.

* RE: [PATCH v2] app/testpmd: use Tx preparation in txonly engine
From: Konstantin Ananyev @ 2024-02-08 10:50 UTC
To: Ferruh Yigit, Kaiwen Deng, dev
Cc: stable, qiming.yang, yidingx.zhou, Aman Singh, Yuying Zhang, David Marchand

> On 1/11/2024 5:25 AM, Kaiwen Deng wrote:
> > Txonly forwarding engine does not call the Tx preparation API
> > before transmitting packets. This may cause some problems.
> >
> > TSO breaks when MSS spans more than 8 data fragments. Those
> > packets will be dropped by Tx preparation API, but it will cause
> > MDD event if txonly forwarding engine does not call the Tx preparation
> > API before transmitting packets.
> >
>
> txonly is used commonly, adding Tx prepare for a specific case may
> impact performance for users.
>
> What happens when driver throws MDD (Malicious Driver Detection) event,
> can't it be ignored? As you are already OK to drop the packet, can
> device be configured to drop these packages?
>
>
> Or as Jerin suggested adding a new forwarding engine is a solution, but
> that will create code duplication, I prefer to not have it if this can
> be handled in device level.

Actually I agree with the author of the patch - when TX offloads and/or multisegs are enabled,
the user is supposed to invoke eth_tx_prepare().
Not doing that seems like a bug to me.
If it still works for some cases, that's a lucky coincidence, but not the expected behavior.
About performance - first we can check whether there really is a drop.
Also, as I remember, most drivers set it to a non-NULL value only when some TX offloads were
enabled by the user on that port, so hopefully for the simple case (one segment, no tx offloads) it
should be negligible.
Again, we can add a manual check in the testpmd tx-only code to decide whether TX prepare
needs to be called or not.
Konstantin

* Re: [PATCH v2] app/testpmd: use Tx preparation in txonly engine
From: Ferruh Yigit @ 2024-02-08 11:35 UTC
To: Konstantin Ananyev, Kaiwen Deng, dev
Cc: stable, qiming.yang, yidingx.zhou, Aman Singh, Yuying Zhang, David Marchand

On 2/8/2024 10:50 AM, Konstantin Ananyev wrote:
>
>
>> On 1/11/2024 5:25 AM, Kaiwen Deng wrote:
>>> Txonly forwarding engine does not call the Tx preparation API
>>> before transmitting packets. This may cause some problems.
>>>
>>> TSO breaks when MSS spans more than 8 data fragments. Those
>>> packets will be dropped by Tx preparation API, but it will cause
>>> MDD event if txonly forwarding engine does not call the Tx preparation
>>> API before transmitting packets.
>>>
>>
>> txonly is used commonly, adding Tx prepare for a specific case may
>> impact performance for users.
>>
>> What happens when driver throws MDD (Malicious Driver Detection) event,
>> can't it be ignored? As you are already OK to drop the packet, can
>> device be configured to drop these packages?
>>
>>
>> Or as Jerin suggested adding a new forwarding engine is a solution, but
>> that will create code duplication, I prefer to not have it if this can
>> be handled in device level.
>
> Actually I am agree with the author of the patch - when TX offloads and/or multisegs are enabled,
> user supposed to invoke eth_tx_prepare().
> Not doing that seems like a bug to me.
> If it still works for some cases, that's a lucky coincidence, but not the expected behavior.
>

fair enough

> About performance - first we can check is that really a drop.
> Also as I remember most drivers set it to non-NULL value, only when some TX offloads were
> enabled by the user on that port, so hopefully for simple case (one segment, no tx offloads) it
> should be negligible.
>

+1 to measure the impact, that helps to decide

> Again, we can add manual check in testpmd tx-only code to decide do we need a TX prepare
> to be called or not.
>

What is the condition to call Tx prepare, is it only required when Tx
offload are enabled?

@Kaiwen can you please give some details on the problematic case, "MSS
spans more than 8 data fragments", and how to produce it?

* RE: [PATCH v2] app/testpmd: use Tx preparation in txonly engine
From: Konstantin Ananyev @ 2024-02-08 15:14 UTC
To: Ferruh Yigit, Kaiwen Deng, dev
Cc: stable, qiming.yang, yidingx.zhou, Aman Singh, Yuying Zhang, David Marchand

> >
> >> On 1/11/2024 5:25 AM, Kaiwen Deng wrote:
> >>> Txonly forwarding engine does not call the Tx preparation API
> >>> before transmitting packets. This may cause some problems.
> >>>
> >>> TSO breaks when MSS spans more than 8 data fragments. Those
> >>> packets will be dropped by Tx preparation API, but it will cause
> >>> MDD event if txonly forwarding engine does not call the Tx preparation
> >>> API before transmitting packets.
> >>>
> >>
> >> txonly is used commonly, adding Tx prepare for a specific case may
> >> impact performance for users.
> >>
> >> What happens when driver throws MDD (Malicious Driver Detection) event,
> >> can't it be ignored? As you are already OK to drop the packet, can
> >> device be configured to drop these packages?
> >>
> >>
> >> Or as Jerin suggested adding a new forwarding engine is a solution, but
> >> that will create code duplication, I prefer to not have it if this can
> >> be handled in device level.
> >
> > Actually I am agree with the author of the patch - when TX offloads and/or multisegs are enabled,
> > user supposed to invoke eth_tx_prepare().
> > Not doing that seems like a bug to me.
> > If it still works for some cases, that's a lucky coincidence, but not the expected behavior.
> >
>
> fair enough
>
> > About performance - first we can check is that really a drop.
> > Also as I remember most drivers set it to non-NULL value, only when some TX offloads were
> > enabled by the user on that port, so hopefully for simple case (one segment, no tx offloads) it
> > should be negligible.
> >
>
> +1 to measure the impact, that helps to decide
>
> > Again, we can add manual check in testpmd tx-only code to decide do we need a TX prepare
> > to be called or not.
> >
>
> What is the condition to call Tx prepare, is it only required when Tx
> offload are enabled?

Yes, as I remember, tx_prepare() needs to be called only when TX offloads (cksum, tso, etc.) are requested.
It also does some extra sanity checks - min pkt_len, max number of segments, etc., so there is no harm
in calling it even when no tx offloads are enabled, but then the user needs to be prepared for some extra overhead.
In theory the user can do all these things manually in his own code, but as requirements vary between HW models,
I suppose it will be a real pain for the user.

>
>
> @Kaiwen can you please give some details on the problematic case, "MSS
> spans more than 8 data fragments", and how to produce it?

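One possible shape for the "manual check" discussed above, as a rough sketch only. `needs_tx_prepare` is a hypothetical helper, and the rule it encodes (prepare when any Tx offload is enabled on the port or when packets have more than one segment) is an assumption drawn from the conditions mentioned in this thread, not a documented DPDK requirement.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: decide once, at configuration time, whether the
 * burst function should run a prepare step.
 *
 * port_tx_offloads: bitmask of Tx offloads enabled on the port.
 * segs_per_pkt:     number of segments in each generated packet.
 */
bool needs_tx_prepare(uint64_t port_tx_offloads, uint16_t segs_per_pkt)
{
	assert(segs_per_pkt > 0);
	/* Simple case from the thread: one segment, no Tx offloads. */
	if (port_tx_offloads == 0 && segs_per_pkt == 1)
		return false;
	return true;
}
```

With the reproduction settings quoted earlier (`--tx-offloads=0x00008000`, nine segments per packet) this helper would return true; a default single-segment run with no offloads would skip the prepare step.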
* RE: [PATCH v2] app/testpmd: use Tx preparation in txonly engine
From: Morten Brørup @ 2024-02-08 11:52 UTC
To: Konstantin Ananyev, Ferruh Yigit, Kaiwen Deng, dev
Cc: stable, qiming.yang, yidingx.zhou, Aman Singh, Yuying Zhang, David Marchand

> From: Konstantin Ananyev [mailto:konstantin.ananyev@huawei.com]
> Sent: Thursday, 8 February 2024 11.50
>
> > On 1/11/2024 5:25 AM, Kaiwen Deng wrote:
> > > Txonly forwarding engine does not call the Tx preparation API
> > > before transmitting packets. This may cause some problems.
> > >
> > > TSO breaks when MSS spans more than 8 data fragments. Those
> > > packets will be dropped by Tx preparation API, but it will cause
> > > MDD event if txonly forwarding engine does not call the Tx preparation
> > > API before transmitting packets.
> > >
> >
> > txonly is used commonly, adding Tx prepare for a specific case may
> > impact performance for users.
> >
> > What happens when driver throws MDD (Malicious Driver Detection) event,
> > can't it be ignored? As you are already OK to drop the packet, can
> > device be configured to drop these packages?
> >
> >
> > Or as Jerin suggested adding a new forwarding engine is a solution, but
> > that will create code duplication, I prefer to not have it if this can
> > be handled in device level.
>
> Actually I am agree with the author of the patch - when TX offloads
> and/or multisegs are enabled,
> user supposed to invoke eth_tx_prepare().
> Not doing that seems like a bug to me.

I strongly disagree with that statement, Konstantin!
It is not documented anywhere that using TX offloads and/or multisegs requires calling rte_eth_tx_prepare() before rte_eth_tx_burst(). And none of the examples do it.

In my opinion:
If some driver has limitations for a feature, e.g. max 8 fragments, it should be documented for that driver, so the application developer can make the appropriate decisions when designing the application.
Furthermore, we have APIs for the drivers to expose to the applications what the driver supports, so the application can configure itself optimally at startup. Perhaps those APIs need to be expanded.
And if a feature limitation is common across the majority of drivers, that limitation should be mentioned in the documentation of the feature itself.

We don't want to check in the fast path what can be checked at startup or build time!

> If it still works for some cases, that's a lucky coincidence, but not
> the expected behavior.
> About performance - first we can check is that really a drop.
> Also as I remember most drivers set it to non-NULL value, only when
> some TX offloads were
> enabled by the user on that port, so hopefully for simple case (one
> segment, no tx offloads) it
> should be negligible.
> Again, we can add manual check in testpmd tx-only code to decide do we
> need a TX prepare
> to be called or not.
> Konstantin

* RE: [PATCH v2] app/testpmd: use Tx preparation in txonly engine 2024-02-08 11:52 ` Morten Brørup @ 2024-02-11 15:04 ` Konstantin Ananyev 2024-02-13 10:27 ` Morten Brørup 0 siblings, 1 reply; 21+ messages in thread From: Konstantin Ananyev @ 2024-02-11 15:04 UTC (permalink / raw) To: Morten Brørup, Ferruh Yigit, Kaiwen Deng, dev Cc: stable, qiming.yang, yidingx.zhou, Aman Singh, Yuying Zhang, David Marchand > > > > TSO breaks when MSS spans more than 8 data fragments. Those > > > > packets will be dropped by Tx preparation API, but it will cause > > > > MDD event if txonly forwarding engine does not call the Tx > > preparation > > > > API before transmitting packets. > > > > > > > > > > txonly is used commonly, adding Tx prepare for a specific case may > > > impact performance for users. > > > > > > What happens when driver throws MDD (Malicious Driver Detection) > > event, > > > can't it be ignored? As you are already OK to drop the packet, can > > > device be configured to drop these packages? > > > > > > > > > Or as Jerin suggested adding a new forwarding engine is a solution, > > but > > > that will create code duplication, I prefer to not have it if this > > can > > > be handled in device level. > > > > Actually I am agree with the author of the patch - when TX offloads > > and/or multisegs are enabled, > > user supposed to invoke eth_tx_prepare(). > > Not doing that seems like a bug to me. > > I strongly disagree with that statement, Konstantin! > It is not documented anywhere that using TX offloads and/or multisegs requires calling rte_eth_tx_prepare() before > rte_eth_tx_burst(). And none of the examples do it. In fact, we do use it for test-pmd/csumonly.c. About other sample apps: AFAIK, not many of other DPDK apps do use L4 offloads. Right now special treatment (pseudo-header cksum calculation) is needed only for L4 offloads (CKSUM, SEG). 
So, majority of our apps who rely on other TX offloads (multi-seg, ipv4 cksum, vlan insertion) happily run without calling tx_prepare(), even though it is not the safest way. > > In my opinion: > If some driver has limitations for a feature, e.g. max 8 fragments, it should be documented for that driver, so the application > developer can make the appropriate decisions when designing the application. > Furthermore, we have APIs for the drivers to expose to the applications what the driver supports, so the application can configure > itself optimally at startup. Perhaps those APIs need to be expanded. > And if a feature limitation is common across the majority of drivers, that limitation should be mentioned in the documentation of the > feature itself. Many of such limitations *are* documented and in fact we do have an API to check max segments that each driver support, see struct rte_eth_desc_lim. The problem is: - none of our sample app does proper check on these values, so users don't have a good example how to do it. - with current DPDK API not all of HW/PMD requirements could be extracted programmatically: let say majority of Intel PMDs for TCP offloads expect pseudo-header cksum to be pre-calculated by the SW. another example, some HW expects pkt_len to be bigger then some threshold value, otherwise HW hang may appear. - As new HW and PMD keep appearing it is hard to predict what extra limitations/requirements will arise, that's why tx_prepare() was introduced as s driver op. > > We don't want to check in the fast path what can be checked at startup or build time! If your app supposed to work with just a few, known in advance, NIC models, then sure, you can do that. For apps that supposed to work 'in general' with any possible PMDs that DPDK supports - that might be a problem. That's why tx_prepare() was introduced and it is strongly recommended to use it by the apps that do use TX offloads. 
Probably tx_prepare() is not the best possible approach, but right now there are not many alternatives within DPDK.

>
> > If it still works for some cases, that's a lucky coincidence, but not
> > the expected behavior.
> > About performance - first we can check is that really a drop.
> > Also as I remember most drivers set it to non-NULL value, only when
> > some TX offloads were enabled by the user on that port, so hopefully for simple case (one
> > segment, no tx offloads) it should be negligible.
> > Again, we can add manual check in testpmd tx-only code to decide do we
> > need a TX prepare to be called or not.
> > Konstantin
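The segment-count check against struct rte_eth_desc_lim mentioned above could look roughly like the following. This is a minimal standalone sketch, not testpmd code: `desc_lim_sketch` is a hypothetical stand-in that copies only the `nb_seg_max` and `nb_mtu_seg_max` fields of the real structure, `tx_segs_within_limit()` is an illustrative helper name, and treating 0 as "no limit reported" is an assumption. The TSO branch is simplified to a whole-packet check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two segment fields of struct rte_eth_desc_lim. */
struct desc_lim_sketch {
	uint16_t nb_seg_max;     /* max segments per whole packet (TSO case) */
	uint16_t nb_mtu_seg_max; /* max segments per MTU-sized (non-TSO) packet */
};

/*
 * Return true when a packet built from nb_segs segments fits the limit
 * reported by the driver. Simplification: for TSO only the whole-packet
 * limit is checked, not the per-MSS span.
 */
static bool
tx_segs_within_limit(const struct desc_lim_sketch *lim, uint16_t nb_segs,
		     bool tso)
{
	uint16_t max;

	assert(lim != NULL);
	max = tso ? lim->nb_seg_max : lim->nb_mtu_seg_max;

	/* Assumption: a value of 0 means the driver reported no limit. */
	return max == 0 || nb_segs <= max;
}
```

With an illustrative limit of 8 segments per MTU-sized packet, the nine-segment layout from the reproducer (`set txpkts 64,128,256,512,64,128,256,512,512`) would be rejected at configuration time, before any packet reaches the Tx path.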
* RE: [PATCH v2] app/testpmd: use Tx preparation in txonly engine 2024-02-11 15:04 ` Konstantin Ananyev @ 2024-02-13 10:27 ` Morten Brørup 2024-02-22 18:28 ` Konstantin Ananyev 0 siblings, 1 reply; 21+ messages in thread From: Morten Brørup @ 2024-02-13 10:27 UTC (permalink / raw) To: Konstantin Ananyev, Ferruh Yigit, Kaiwen Deng, dev Cc: stable, qiming.yang, yidingx.zhou, Aman Singh, Yuying Zhang, David Marchand, Thomas Monjalon, Andrew Rybchenko, Jerin Jacob +CC: Ethernet API maintainers +CC: Jerin (commented on another branch of this thread) > From: Konstantin Ananyev [mailto:konstantin.ananyev@huawei.com] > Sent: Sunday, 11 February 2024 16.04 > > > > > > TSO breaks when MSS spans more than 8 data fragments. Those > > > > > packets will be dropped by Tx preparation API, but it will > cause > > > > > MDD event if txonly forwarding engine does not call the Tx > > > preparation > > > > > API before transmitting packets. > > > > > > > > > > > > > txonly is used commonly, adding Tx prepare for a specific case > may > > > > impact performance for users. > > > > > > > > What happens when driver throws MDD (Malicious Driver Detection) > > > event, > > > > can't it be ignored? As you are already OK to drop the packet, > can > > > > device be configured to drop these packages? > > > > > > > > > > > > Or as Jerin suggested adding a new forwarding engine is a > solution, > > > but > > > > that will create code duplication, I prefer to not have it if > this > > > can > > > > be handled in device level. > > > > > > Actually I am agree with the author of the patch - when TX offloads > > > and/or multisegs are enabled, > > > user supposed to invoke eth_tx_prepare(). > > > Not doing that seems like a bug to me. > > > > I strongly disagree with that statement, Konstantin! > > It is not documented anywhere that using TX offloads and/or multisegs > requires calling rte_eth_tx_prepare() before > > rte_eth_tx_burst(). And none of the examples do it. 
> > In fact, we do use it for test-pmd/csumonly.c. > About other sample apps: > AFAIK, not many of other DPDK apps do use L4 offloads. > Right now special treatment (pseudo-header cksum calculation) is needed > only for L4 offloads (CKSUM, SEG). > So, majority of our apps who rely on other TX offloads (multi-seg, ipv4 > cksum, vlan insertion) happily run without > calling tx_prepare(), even though it is not the safest way. > > > > > In my opinion: > > If some driver has limitations for a feature, e.g. max 8 fragments, > it should be documented for that driver, so the application > > developer can make the appropriate decisions when designing the > application. > > Furthermore, we have APIs for the drivers to expose to the > applications what the driver supports, so the application can configure > > itself optimally at startup. Perhaps those APIs need to be expanded. > > And if a feature limitation is common across the majority of drivers, > that limitation should be mentioned in the documentation of the > > feature itself. > > Many of such limitations *are* documented and in fact we do have an API > to check max segments that each driver support, > see struct rte_eth_desc_lim. Yes, this is the kind of API we should provide, so the application can configure itself appropriately. > The problem is: > - none of our sample app does proper check on these values, so users > don't have a good example how to do it. Agreed. Adding an example showing how to do it properly would be the best solution. Calling tx_prepare() in the examples is certainly not the solution. > - with current DPDK API not all of HW/PMD requirements could be > extracted programmatically: > let say majority of Intel PMDs for TCP offloads expect pseudo-header > cksum to be pre-calculated by the SW. I hope this requirement is documented somewhere. > another example, some HW expects pkt_len to be bigger then some > threshold value, otherwise HW hang may appear. 
I hope this requirement is also documented somewhere. Generally, if the requirements cannot be extracted programmatically, they must be prominently documented, like this note to rte_eth_rx_burst(): * @note * Some drivers using vector instructions require that *nb_pkts* is * divisible by 4 or 8, depending on the driver implementation. > - As new HW and PMD keep appearing it is hard to predict what extra > limitations/requirements will arise, > that's why tx_prepare() was introduced as s driver op. > > > > > We don't want to check in the fast path what can be checked at > startup or build time! > > If your app supposed to work with just a few, known in advance, NIC > models, then sure, you can do that. > For apps that supposed to work 'in general' with any possible PMDs > that DPDK supports - that might be a problem. > That's why tx_prepare() was introduced and it is strongly recommended > to use it by the apps that do use TX offloads. > Probably tx_prepare() is not the best possible approach, but right now > there are not many alternatives within DPDK. What exactly is an application supposed to do if tx_prepare() doesn't accept the full burst? It doesn't return information about what is wrong. Dropping the packets might not be an option, e.g. for applications used in life support or tele-medicine. If limitations are documented, an application can use the lowest common denominator of the NICs it supports. And if the application is supposed to work in general, that becomes the lowest common denominator of all NICs. It looks like tx_prepare() has become a horrible workaround for undocumented limitations. Limitations due to hardware and/or software tradeoffs are unavoidable, so we have to live with them; but we should not accept PMDs with undocumented limitations. > > > > > > If it still works for some cases, that's a lucky coincidence, but > not > > > the expected behavior. > > > About performance - first we can check is that really a drop. 
> > > Also as I remember most drivers set it to non-NULL value, only when
> > > some TX offloads were enabled by the user on that port, so hopefully for simple case (one
> > > segment, no tx offloads) it should be negligible.
> > > Again, we can add manual check in testpmd tx-only code to decide do we
> > > need a TX prepare to be called or not.
> > > Konstantin
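The "lowest common denominator" idea from this message can be sketched as a startup-time reduction over the limits each port reports. The code below is only an illustration under stated assumptions: `port_limits_sketch` is a hypothetical structure standing in for the `tx_offload_capa` bitmask and the Tx descriptor segment limit that a real application would read from the device information, and the offload bit values in the usage note are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-port summary of what the device information reports. */
struct port_limits_sketch {
	uint64_t tx_offload_capa; /* bitmask of supported Tx offloads */
	uint16_t nb_mtu_seg_max;  /* max segments per non-TSO packet */
};

/*
 * Reduce the limits of n ports to the set every port can honour:
 * offloads are intersected, the segment limit is the minimum.
 * For n == 0 the result is the neutral element (everything allowed).
 */
static struct port_limits_sketch
common_limits(const struct port_limits_sketch *ports, size_t n)
{
	struct port_limits_sketch c = { UINT64_MAX, UINT16_MAX };
	size_t i;

	assert(ports != NULL || n == 0);
	for (i = 0; i < n; i++) {
		c.tx_offload_capa &= ports[i].tx_offload_capa;
		if (ports[i].nb_mtu_seg_max < c.nb_mtu_seg_max)
			c.nb_mtu_seg_max = ports[i].nb_mtu_seg_max;
	}
	return c;
}
```

For two ports reporting offload masks 0x3 and 0x6 with segment limits 8 and 40, the application would configure itself with mask 0x2 and at most 8 segments per packet. This only covers limits that are exposed through an API; it does nothing for requirements that exist only in driver code.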
* RE: [PATCH v2] app/testpmd: use Tx preparation in txonly engine 2024-02-13 10:27 ` Morten Brørup @ 2024-02-22 18:28 ` Konstantin Ananyev 2024-02-23 8:36 ` Andrew Rybchenko 0 siblings, 1 reply; 21+ messages in thread From: Konstantin Ananyev @ 2024-02-22 18:28 UTC (permalink / raw) To: Morten Brørup, Ferruh Yigit, Kaiwen Deng, dev Cc: stable, qiming.yang, yidingx.zhou, Aman Singh, Yuying Zhang, David Marchand, Thomas Monjalon, Andrew Rybchenko, Jerin Jacob > +CC: Ethernet API maintainers > +CC: Jerin (commented on another branch of this thread) > > > From: Konstantin Ananyev [mailto:konstantin.ananyev@huawei.com] > > Sent: Sunday, 11 February 2024 16.04 > > > > > > > > TSO breaks when MSS spans more than 8 data fragments. Those > > > > > > packets will be dropped by Tx preparation API, but it will > > cause > > > > > > MDD event if txonly forwarding engine does not call the Tx > > > > preparation > > > > > > API before transmitting packets. > > > > > > > > > > > > > > > > txonly is used commonly, adding Tx prepare for a specific case > > may > > > > > impact performance for users. > > > > > > > > > > What happens when driver throws MDD (Malicious Driver Detection) > > > > event, > > > > > can't it be ignored? As you are already OK to drop the packet, > > can > > > > > device be configured to drop these packages? > > > > > > > > > > > > > > > Or as Jerin suggested adding a new forwarding engine is a > > solution, > > > > but > > > > > that will create code duplication, I prefer to not have it if > > this > > > > can > > > > > be handled in device level. > > > > > > > > Actually I am agree with the author of the patch - when TX offloads > > > > and/or multisegs are enabled, > > > > user supposed to invoke eth_tx_prepare(). > > > > Not doing that seems like a bug to me. > > > > > > I strongly disagree with that statement, Konstantin! 
> > > It is not documented anywhere that using TX offloads and/or multisegs > > requires calling rte_eth_tx_prepare() before > > > rte_eth_tx_burst(). And none of the examples do it. > > > > In fact, we do use it for test-pmd/csumonly.c. > > About other sample apps: > > AFAIK, not many of other DPDK apps do use L4 offloads. > > Right now special treatment (pseudo-header cksum calculation) is needed > > only for L4 offloads (CKSUM, SEG). > > So, majority of our apps who rely on other TX offloads (multi-seg, ipv4 > > cksum, vlan insertion) happily run without > > calling tx_prepare(), even though it is not the safest way. > > > > > > > > In my opinion: > > > If some driver has limitations for a feature, e.g. max 8 fragments, > > it should be documented for that driver, so the application > > > developer can make the appropriate decisions when designing the > > application. > > > Furthermore, we have APIs for the drivers to expose to the > > applications what the driver supports, so the application can configure > > > itself optimally at startup. Perhaps those APIs need to be expanded. > > > And if a feature limitation is common across the majority of drivers, > > that limitation should be mentioned in the documentation of the > > > feature itself. > > > > Many of such limitations *are* documented and in fact we do have an API > > to check max segments that each driver support, > > see struct rte_eth_desc_lim. > > Yes, this is the kind of API we should provide, so the application can configure itself appropriately. > > > The problem is: > > - none of our sample app does proper check on these values, so users > > don't have a good example how to do it. > > Agreed. > Adding an example showing how to do it properly would be the best solution. > Calling tx_prepare() in the examples is certainly not the solution. 
> > > - with current DPDK API not all of HW/PMD requirements could be > > extracted programmatically: > > let say majority of Intel PMDs for TCP offloads expect pseudo-header > > cksum to be pre-calculated by the SW. > > I hope this requirement is documented somewhere. > > > another example, some HW expects pkt_len to be bigger then some > > threshold value, otherwise HW hang may appear. > > I hope this requirement is also documented somewhere. No idea, I found it only in the code. > Generally, if the requirements cannot be extracted programmatically, they must be prominently documented, like this note to > rte_eth_rx_burst(): Obviously, more detailed documentation is always good, but... Right now we have 50+ different PMDs from different vendors. Even if each and every of them will carefully document all possible limitations and necessary preparation steps, how DPDK app developer supposed to deal with all that? Do you expect everyone, to read carefully through all of them, and handle all of them properly oh his own in each and every DPDK app he is going to write? That seems unrealistic. Again what to do with backward compatibility: when new driver (with new limitations) will arise *after* your app is already written and tested? > > * @note > * Some drivers using vector instructions require that *nb_pkts* is > * divisible by 4 or 8, depending on the driver implementation. > > > - As new HW and PMD keep appearing it is hard to predict what extra > > limitations/requirements will arise, > > that's why tx_prepare() was introduced as s driver op. > > > > > > > > We don't want to check in the fast path what can be checked at > > startup or build time! > > > > If your app supposed to work with just a few, known in advance, NIC > > models, then sure, you can do that. > > For apps that supposed to work 'in general' with any possible PMDs > > that DPDK supports - that might be a problem. 
> > That's why tx_prepare() was introduced and it is strongly recommended
> > to use it by the apps that do use TX offloads.
> > Probably tx_prepare() is not the best possible approach, but right now
> > there are not many alternatives within DPDK.
>
> What exactly is an application supposed to do if tx_prepare() doesn't accept the full burst? It doesn't return information about what is
> wrong.

It provides some information, but it is *very* limited: just the index of the 'bad' packet and an error code.
In theory an app can try to handle it in a 'smart' way: let's say if ENOTSUP is returned, then try to disable all HW offloads and do everything in SW.
But again, it is much better to do so *before* submitting packets for TX, so in practice everyone just drops such 'bad' packets.

> Dropping the packets might not be an option, e.g. for applications used in life support or tele-medicine.

If the packet is 'bad', then it is much better to drop it than to TX a corrupted packet or even hang the NIC HW completely.
Though of course it is much better to have an app that checks the limitations that can be checked through the provided API and enables only supported offloads.

> If limitations are documented, an application can use the lowest common denominator of the NICs it supports. And if the application is
> supposed to work in general, that becomes the lowest common denominator of all NICs.

I agree: for limitations that can be extracted with a generic API, like the number of segments per packet, supported TX offloads, mbuf fields that must be provided for each TX offload, etc., it is the responsibility of a well-written application to obey all of them.
Yes, many tx_prepare() implementations do such checks anyway, but from my perspective that is a sort of last line of defense.
For a well-written application that should just never happen.
But there is one more important responsibility of tx_prepare(): it performs PMD-specific packet modifications for the requested offloads.
As I already mentioned, for Intel NICs it does the pseudo-header cksum calculation; for packets smaller than the minimal size it could probably do padding (even if it doesn't do that right now); for some other PMDs it might be something else, I didn't check.
Obviously it saves the app developer the burden of doing all these things on his own.

> It looks like tx_prepare() has become a horrible workaround for undocumented limitations.
>
> Limitations due to hardware and/or software tradeoffs are unavoidable, so we have to live with them; but we should not accept
> PMDs with undocumented limitations.

As I already said, more documentation never hurts, but for that case I think it is not enough.
I expect the PMD to provide a tx_prepare() implementation that deals with such specific things.
Anyway, back to the original patch: I looked at it once again and realized that the problem is just the unsupported number of segments.
As we discussed above, such limitations should be handled by a well-written app, but none of the DPDK apps does it right now.
So it is probably a good opportunity to do things in a proper way and introduce such checks in testpmd ;)
Kaiwen, WDYT?

> > > > If it still works for some cases, that's a lucky coincidence, but not
> > > > the expected behavior.
> > > > About performance - first we can check is that really a drop.
> > > > Also as I remember most drivers set it to non-NULL value, only when
> > > > some TX offloads were enabled by the user on that port, so hopefully for simple case (one
> > > > segment, no tx offloads) it should be negligible.
> > > > Again, we can add manual check in testpmd tx-only code to decide do we
> > > > need a TX prepare to be called or not.
> > > > Konstantin
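Since the prepare step reports only how many leading packets passed, one way for an application to use that result is to send the accepted prefix, drop the single rejected packet, and retry with the remainder. The sketch below models that policy in standalone C. Everything in it is hypothetical: `pkt_sketch`, `prepare_sketch()` (a stand-in for a prepare call that rejects packets with more than `SKETCH_MAX_SEGS` segments) and `prepare_and_send()` are illustrative names, and no real transmit call is made. Note that this differs from the patch under discussion, which drops the whole tail of the burst after the first rejected packet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_SEGS 8 /* illustrative limit, taken from the thread */

/* Hypothetical packet model: only the segment count matters here. */
struct pkt_sketch {
	uint16_t nb_segs;
};

struct tx_stats_sketch {
	uint32_t sent;
	uint32_t dropped;
};

/* Stand-in for a prepare call: returns the number of leading packets accepted. */
static uint16_t
prepare_sketch(const struct pkt_sketch *pkts, uint16_t n)
{
	uint16_t i;

	for (i = 0; i < n; i++)
		if (pkts[i].nb_segs > SKETCH_MAX_SEGS)
			break;
	return i;
}

/* Send every packet that passes preparation; drop only the rejected ones. */
static void
prepare_and_send(const struct pkt_sketch *pkts, uint16_t n,
		 struct tx_stats_sketch *st)
{
	uint16_t done = 0;

	assert(st != NULL);
	while (done < n) {
		uint16_t ok = prepare_sketch(&pkts[done], (uint16_t)(n - done));

		st->sent += ok; /* a real application would transmit here */
		done = (uint16_t)(done + ok);
		if (done < n) {
			/* pkts[done] was rejected: count it and skip it */
			st->dropped++;
			done++;
		}
	}
}
```

The cost of this policy is one extra prepare call per rejected packet, which matters only when rejections are frequent; in that case the configuration is wrong and should be fixed at startup instead.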
* Re: [PATCH v2] app/testpmd: use Tx preparation in txonly engine 2024-02-22 18:28 ` Konstantin Ananyev @ 2024-02-23 8:36 ` Andrew Rybchenko 2024-02-26 13:26 ` Konstantin Ananyev 0 siblings, 1 reply; 21+ messages in thread From: Andrew Rybchenko @ 2024-02-23 8:36 UTC (permalink / raw) To: Konstantin Ananyev, Morten Brørup, Ferruh Yigit, Kaiwen Deng, dev Cc: stable, qiming.yang, yidingx.zhou, Aman Singh, Yuying Zhang, David Marchand, Thomas Monjalon, Jerin Jacob On 2/22/24 21:28, Konstantin Ananyev wrote: > >> +CC: Ethernet API maintainers >> +CC: Jerin (commented on another branch of this thread) >> >>> From: Konstantin Ananyev [mailto:konstantin.ananyev@huawei.com] >>> Sent: Sunday, 11 February 2024 16.04 >>> >>>>>>> TSO breaks when MSS spans more than 8 data fragments. Those >>>>>>> packets will be dropped by Tx preparation API, but it will >>> cause >>>>>>> MDD event if txonly forwarding engine does not call the Tx >>>>> preparation >>>>>>> API before transmitting packets. >>>>>>> >>>>>> >>>>>> txonly is used commonly, adding Tx prepare for a specific case >>> may >>>>>> impact performance for users. >>>>>> >>>>>> What happens when driver throws MDD (Malicious Driver Detection) >>>>> event, >>>>>> can't it be ignored? As you are already OK to drop the packet, >>> can >>>>>> device be configured to drop these packages? >>>>>> >>>>>> >>>>>> Or as Jerin suggested adding a new forwarding engine is a >>> solution, >>>>> but >>>>>> that will create code duplication, I prefer to not have it if >>> this >>>>> can >>>>>> be handled in device level. >>>>> >>>>> Actually I am agree with the author of the patch - when TX offloads >>>>> and/or multisegs are enabled, >>>>> user supposed to invoke eth_tx_prepare(). >>>>> Not doing that seems like a bug to me. >>>> >>>> I strongly disagree with that statement, Konstantin! >>>> It is not documented anywhere that using TX offloads and/or multisegs >>> requires calling rte_eth_tx_prepare() before >>>> rte_eth_tx_burst(). 
And none of the examples do it. >>> >>> In fact, we do use it for test-pmd/csumonly.c. >>> About other sample apps: >>> AFAIK, not many of other DPDK apps do use L4 offloads. >>> Right now special treatment (pseudo-header cksum calculation) is needed >>> only for L4 offloads (CKSUM, SEG). >>> So, majority of our apps who rely on other TX offloads (multi-seg, ipv4 >>> cksum, vlan insertion) happily run without >>> calling tx_prepare(), even though it is not the safest way. >>> >>>> >>>> In my opinion: >>>> If some driver has limitations for a feature, e.g. max 8 fragments, >>> it should be documented for that driver, so the application >>>> developer can make the appropriate decisions when designing the >>> application. >>>> Furthermore, we have APIs for the drivers to expose to the >>> applications what the driver supports, so the application can configure >>>> itself optimally at startup. Perhaps those APIs need to be expanded. >>>> And if a feature limitation is common across the majority of drivers, >>> that limitation should be mentioned in the documentation of the >>>> feature itself. >>> >>> Many of such limitations *are* documented and in fact we do have an API >>> to check max segments that each driver support, >>> see struct rte_eth_desc_lim. >> >> Yes, this is the kind of API we should provide, so the application can configure itself appropriately. >> >>> The problem is: >>> - none of our sample app does proper check on these values, so users >>> don't have a good example how to do it. >> >> Agreed. >> Adding an example showing how to do it properly would be the best solution. >> Calling tx_prepare() in the examples is certainly not the solution. >> >>> - with current DPDK API not all of HW/PMD requirements could be >>> extracted programmatically: >>> let say majority of Intel PMDs for TCP offloads expect pseudo-header >>> cksum to be pre-calculated by the SW. >> >> I hope this requirement is documented somewhere. 
>> >>> another example, some HW expects pkt_len to be bigger then some >>> threshold value, otherwise HW hang may appear. >> >> I hope this requirement is also documented somewhere. > > No idea, I found it only in the code. IMHO Tx burst must check such limitations. If you made your HW simpler (or just lost it on initial testing), pay in your drivers (or your HW+driver will be unusable because of such problems). >> Generally, if the requirements cannot be extracted programmatically, they must be prominently documented, like this note to >> rte_eth_rx_burst(): > > Obviously, more detailed documentation is always good, but... > Right now we have 50+ different PMDs from different vendors. > Even if each and every of them will carefully document all possible limitations and necessary preparation steps, > how DPDK app developer supposed to deal with all that? > Do you expect everyone, to read carefully through all of them, and handle all of them properly oh his own > in each and every DPDK app he is going to write? > That seems unrealistic. > Again what to do with backward compatibility: when new driver (with new limitations) will arise > *after* your app is already written and tested? +1 >> >> * @note >> * Some drivers using vector instructions require that *nb_pkts* is >> * divisible by 4 or 8, depending on the driver implementation. I'm wondering what application should do if it needs to send just one packet and do it now. IMHO, such limitations are not acceptable. >> >>> - As new HW and PMD keep appearing it is hard to predict what extra >>> limitations/requirements will arise, >>> that's why tx_prepare() was introduced as s driver op. >>> >>>> >>>> We don't want to check in the fast path what can be checked at >>> startup or build time! >>> >>> If your app supposed to work with just a few, known in advance, NIC >>> models, then sure, you can do that. 
>>> For apps that supposed to work 'in general' with any possible PMDs
>>> that DPDK supports - that might be a problem.
>>> That's why tx_prepare() was introduced and it is strongly recommended
>>> to use it by the apps that do use TX offloads.
>>> Probably tx_prepare() is not the best possible approach, but right now
>>> there are not many alternatives within DPDK.
>>
>> What exactly is an application supposed to do if tx_prepare() doesn't accept the full burst? It doesn't return information about what is
>> wrong.
>
> It provides some information, but it is *very* limited: just index of 'bad' packet and error code.
> In theory app can try to handle it in a 'smart' way: let say if ENOTSUP is returned, then try to disable all HW offloads
> and do all in SW. But again, it is much better to do so *before* submitting packets for TX, so in practice everyone
> just drop such 'bad' packets.
>
> Dropping the packets might not be an option, e.g. for applications used in life support or tele-medicine.

Critical applications should be able to do all Tx offloads in SW and retry. Of course, various statistics and logs should help to improve the application.

> If the packet is 'bad', then it is much better to drop it, then TX corrupted packet or even hang NIC HW completely.

IMHO Tx burst must drop a packet which could hang the NIC HW completely. I realize that it means extra checks and a performance drop, but the vendor should pay in performance if the HW is not good enough.

> Though off-course it is much better to have an app that would check for limitations that can be checked by API provided
> and enable only supported offloads.

Yes, that's why an API to get the limitations is much better than documentation.

>> If limitations are documented, an application can use the lowest common denominator of the NICs it supports. And if the application is
>> supposed to work in general, that becomes the lowest common denominator of all NICs.
> > I agree: for limitations that can be extracted with generic API, like: > number of segments per packet, supported TX offloads, mbuf fileds that must be provided for each TX offload, etc. - > it is responsibility of well-written application to obey all of them. > Yes, many tx_prepare() implementations do such checks anyway, but from my perspective it is sort of last-line of defense. > For well written application that should just never happen. > But there is one more important responsibility of tx_prepare() - it performs PMD specific packet modifications for requested offloads. > As I already mentioned for Intel NICs - it does pseudo-header cksum calcucation, for packets with size less then minimal, it can > probably do padding (even if doesn't do it right now), for some other PMDs - might be something else, I didn't check. > Obviously it saves app developer from a burden to do all these things on his own. > >> It looks like tx_prepare() has become a horrible workaround for undocumented limitations. I strongly disagree. Documentation is never a solution for a generic application which is intended to work on any HW and IMHO it is the goal to have more and more applications which work on any HW. >> Limitations due to hardware and/or software tradeoffs are unavoidable, so we have to live with them; but we should not accept >> PMDs with undocumented limitations. > > As I already said, more documentation never hurts, but for that case, I think it is not enough. > I expect PMD to provide a tx_prepare() implementation that would deal with such specific things. +1 > Anyway, back to the original patch - I looked at it once again, and realized that the problem > is just in the unsupported number of segments. > As we discussed above - such limitations should be handled by well written app, > but none of DPDK apps does it right now. 
> So probably it is good opportunity to do things in a proper way and introduce such checks
> in testpmd ;)

+1, since we already have fields in the device information to report such limitations. But that does not mean Tx prepare should be dropped. Drivers which don't need Tx prepare keep the callback NULL, so the call returns immediately from ethdev. Since it is done per burst, it should not affect performance much.

> Kaiwen, WDYT?
>
>>>>> If it still works for some cases, that's a lucky coincidence, but not
>>>>> the expected behavior.
>>>>> About performance - first we can check is that really a drop.
>>>>> Also as I remember most drivers set it to non-NULL value, only when
>>>>> some TX offloads were enabled by the user on that port, so hopefully for simple case (one
>>>>> segment, no tx offloads) it should be negligible.
>>>>> Again, we can add manual check in testpmd tx-only code to decide do we
>>>>> need a TX prepare to be called or not.
>>>>> Konstantin
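The point about a NULL callback returning immediately can be illustrated with a small dispatch wrapper. This is a standalone sketch of the pattern, not the ethdev source: `tx_prep_fn`, `tx_prepare_sketch()` and `reject_all()` are hypothetical names, and the packet array is left as opaque pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical signature of a per-queue prepare callback. */
typedef uint16_t (*tx_prep_fn)(void *txq, void **pkts, uint16_t n);

/*
 * Dispatch wrapper: when the driver registered no callback, the whole
 * burst is reported as prepared without touching any packet, so the
 * per-burst cost is a single pointer test.
 */
static uint16_t
tx_prepare_sketch(tx_prep_fn cb, void *txq, void **pkts, uint16_t n)
{
	if (cb == NULL)
		return n;
	return cb(txq, pkts, n);
}

/* Example callback that rejects the first packet of every burst. */
static uint16_t
reject_all(void *txq, void **pkts, uint16_t n)
{
	(void)txq;
	(void)pkts;
	(void)n;
	return 0;
}
```

Because the test happens once per burst rather than once per packet, the overhead for a driver without a callback is amortised over the burst size.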
* RE: [PATCH v2] app/testpmd: use Tx preparation in txonly engine 2024-02-23 8:36 ` Andrew Rybchenko @ 2024-02-26 13:26 ` Konstantin Ananyev 2024-02-26 13:56 ` Morten Brørup 0 siblings, 1 reply; 21+ messages in thread From: Konstantin Ananyev @ 2024-02-26 13:26 UTC (permalink / raw) To: Andrew Rybchenko, Morten Brørup, Ferruh Yigit, Kaiwen Deng, dev Cc: stable, qiming.yang, yidingx.zhou, Aman Singh, Yuying Zhang, David Marchand, Thomas Monjalon, Jerin Jacob > >>>>>>> TSO breaks when MSS spans more than 8 data fragments. Those > >>>>>>> packets will be dropped by Tx preparation API, but it will > >>> cause > >>>>>>> MDD event if txonly forwarding engine does not call the Tx > >>>>> preparation > >>>>>>> API before transmitting packets. > >>>>>>> > >>>>>> > >>>>>> txonly is used commonly, adding Tx prepare for a specific case > >>> may > >>>>>> impact performance for users. > >>>>>> > >>>>>> What happens when driver throws MDD (Malicious Driver Detection) > >>>>> event, > >>>>>> can't it be ignored? As you are already OK to drop the packet, > >>> can > >>>>>> device be configured to drop these packages? > >>>>>> > >>>>>> > >>>>>> Or as Jerin suggested adding a new forwarding engine is a > >>> solution, > >>>>> but > >>>>>> that will create code duplication, I prefer to not have it if > >>> this > >>>>> can > >>>>>> be handled in device level. > >>>>> > >>>>> Actually I am agree with the author of the patch - when TX offloads > >>>>> and/or multisegs are enabled, > >>>>> user supposed to invoke eth_tx_prepare(). > >>>>> Not doing that seems like a bug to me. > >>>> > >>>> I strongly disagree with that statement, Konstantin! > >>>> It is not documented anywhere that using TX offloads and/or multisegs > >>> requires calling rte_eth_tx_prepare() before > >>>> rte_eth_tx_burst(). And none of the examples do it. > >>> > >>> In fact, we do use it for test-pmd/csumonly.c. > >>> About other sample apps: > >>> AFAIK, not many of other DPDK apps do use L4 offloads. 
> >>> Right now special treatment (pseudo-header cksum calculation) is needed > >>> only for L4 offloads (CKSUM, SEG). > >>> So, majority of our apps who rely on other TX offloads (multi-seg, ipv4 > >>> cksum, vlan insertion) happily run without > >>> calling tx_prepare(), even though it is not the safest way. > >>> > >>>> > >>>> In my opinion: > >>>> If some driver has limitations for a feature, e.g. max 8 fragments, > >>> it should be documented for that driver, so the application > >>>> developer can make the appropriate decisions when designing the > >>> application. > >>>> Furthermore, we have APIs for the drivers to expose to the > >>> applications what the driver supports, so the application can configure > >>>> itself optimally at startup. Perhaps those APIs need to be expanded. > >>>> And if a feature limitation is common across the majority of drivers, > >>> that limitation should be mentioned in the documentation of the > >>>> feature itself. > >>> > >>> Many of such limitations *are* documented and in fact we do have an API > >>> to check max segments that each driver support, > >>> see struct rte_eth_desc_lim. > >> > >> Yes, this is the kind of API we should provide, so the application can configure itself appropriately. > >> > >>> The problem is: > >>> - none of our sample app does proper check on these values, so users > >>> don't have a good example how to do it. > >> > >> Agreed. > >> Adding an example showing how to do it properly would be the best solution. > >> Calling tx_prepare() in the examples is certainly not the solution. > >> > >>> - with current DPDK API not all of HW/PMD requirements could be > >>> extracted programmatically: > >>> let say majority of Intel PMDs for TCP offloads expect pseudo-header > >>> cksum to be pre-calculated by the SW. > >> > >> I hope this requirement is documented somewhere. > >> > >>> another example, some HW expects pkt_len to be bigger then some > >>> threshold value, otherwise HW hang may appear. 
> >> > >> I hope this requirement is also documented somewhere. > > > > No idea, I found it only in the code. > > IMHO Tx burst must check such limitations. If you made your HW simpler > (or just lost it on initial testing), pay in your drivers (or your > HW+driver will be unusable because of such problems). > > >> Generally, if the requirements cannot be extracted programmatically, they must be prominently documented, like this note to > >> rte_eth_rx_burst(): > > > > Obviously, more detailed documentation is always good, but... > > Right now we have 50+ different PMDs from different vendors. > > Even if each and every of them will carefully document all possible limitations and necessary preparation steps, > > how DPDK app developer supposed to deal with all that? > > Do you expect everyone, to read carefully through all of them, and handle all of them properly oh his own > > in each and every DPDK app he is going to write? > > That seems unrealistic. > > Again what to do with backward compatibility: when new driver (with new limitations) will arise > > *after* your app is already written and tested? > > +1 > > >> > >> * @note > >> * Some drivers using vector instructions require that *nb_pkts* is > >> * divisible by 4 or 8, depending on the driver implementation. > > I'm wondering what application should do if it needs to send just one > packet and do it now. IMHO, such limitations are not acceptable. > > >> > >>> - As new HW and PMD keep appearing it is hard to predict what extra > >>> limitations/requirements will arise, > >>> that's why tx_prepare() was introduced as s driver op. > >>> > >>>> > >>>> We don't want to check in the fast path what can be checked at > >>> startup or build time! > >>> > >>> If your app supposed to work with just a few, known in advance, NIC > >>> models, then sure, you can do that. > >>> For apps that supposed to work 'in general' with any possible PMDs > >>> that DPDK supports - that might be a problem. 
> >>> That's why tx_prepare() was introduced and it is strongly recommended > >>> to use it by the apps that do use TX offloads. > >>> Probably tx_prepare() is not the best possible approach, but right now > >>> there are not many alternatives within DPDK. > >> > >> What exactly is an application supposed to do if tx_prepare() doesn't accept the full burst? It doesn't return information about > what is > >> wrong. > > > > It provides some information, but it is *very* limited: just index of 'bad' packet and error code. > > In theory app can try to handle it in a 'smart' way: let say if ENOTSUP is returned, then try to disable all HW offloads > > and do all in SW. But again, it is much better to do so *before* submitting packets for TX, so in practice everyone > > just drop such 'bad' packets. > > > > Dropping the packets might not be an option, e.g. for applications used in life support or tele-medicine. > > Critical applications should be able to do all Tx offloads in SW and > retry. Of course, various statistics and logs should help to improve the > application. > > > If the packet is 'bad', then it is much better to drop it, then TX corrupted packet or even hang NIC HW completely. > > IMHO Tx burst must drop packet which could hang NIC HW completely. I > realize that it is an extra checks and performance drop, but vendor > should pay in performance if HW is not good enough. > > > Though off-course it is much better to have an app that would check for limitations that can be checked by API provided > > and enable only supported offloads. > > Yes, that's why API to get limitation is much better than documentation. > > >> If limitations are documented, an application can use the lowest common denominator of the NICs it supports. And if the > application is > >> supposed to work in general, that becomes the lowest common denominator of all NICs. 
> > > > I agree: for limitations that can be extracted with generic API, like: > > number of segments per packet, supported TX offloads, mbuf fileds that must be provided for each TX offload, etc. - > > it is responsibility of well-written application to obey all of them. > > Yes, many tx_prepare() implementations do such checks anyway, but from my perspective it is sort of last-line of defense. > > For well written application that should just never happen. > > But there is one more important responsibility of tx_prepare() - it performs PMD specific packet modifications for requested > offloads. > > As I already mentioned for Intel NICs - it does pseudo-header cksum calcucation, for packets with size less then minimal, it can > > probably do padding (even if doesn't do it right now), for some other PMDs - might be something else, I didn't check. > > Obviously it saves app developer from a burden to do all these things on his own. > > > >> It looks like tx_prepare() has become a horrible workaround for undocumented limitations. > > I strongly disagree. Documentation is never a solution for a generic > application which is intended to work on any HW and IMHO it is the > goal to have more and more applications which work on any HW. > > >> Limitations due to hardware and/or software tradeoffs are unavoidable, so we have to live with them; but we should not accept > >> PMDs with undocumented limitations. > > > > As I already said, more documentation never hurts, but for that case, I think it is not enough. > > I expect PMD to provide a tx_prepare() implementation that would deal with such specific things. > > +1 > > Anyway, back to the original patch - I looked at it once again, and realized that the problem > > is just in the unsupported number of segments. > > As we discussed above - such limitations should be handled by well written app, > > but none of DPDK apps does it right now. 
> > So probably it is good opportunity to do things in a proper way and introduce such checks
> > in testpmd ;)
>
> +1 since we already have fields in device information to report such
> limitations, but it does not say that Tx prepare should be dropped.
> Drivers which don't need Tx prepare keep it NULL and it returns
> immediately from ethdev. Since it is done per-burst, it should not
> affect performance a lot.
>

100% agree.
I think tx_prepare needs to stay, and in general has to be strongly recommended
for apps that do use TX offloads/multi-segs.
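For readers following the point about a NULL Tx prepare callback and about dropping what it rejects, here is a minimal sketch in plain C. It does not use DPDK headers: `struct pkt`, `tx_prepare_cb`, `mock_prepare_max8()`, `sketch_tx_prepare()` and `prepare_burst()` are hypothetical names invented for illustration, and the 8-segment limit is only the figure quoted in the patch description. The caller-side accounting mirrors what the v1 patch does (everything from the first rejected packet onward is counted as dropped); it is not a statement of how any particular PMD behaves.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an mbuf: only the segment count matters here. */
struct pkt {
	uint16_t nb_segs;
};

/* Hypothetical per-queue prepare callback; NULL means "nothing to prepare". */
typedef uint16_t (*tx_prepare_cb)(struct pkt **pkts, uint16_t nb_pkts, int *err);

/* Mock driver callback: stop at the first packet with more than 8 segments. */
static uint16_t
mock_prepare_max8(struct pkt **pkts, uint16_t nb_pkts, int *err)
{
	uint16_t i;

	for (i = 0; i < nb_pkts; i++) {
		if (pkts[i]->nb_segs > 8) {
			*err = EINVAL; /* error code for the packet at index i */
			break;
		}
	}
	return i; /* number of packets that passed, i.e. index of the bad one */
}

/* Generic-layer wrapper: a NULL callback accepts the whole burst immediately. */
static inline uint16_t
sketch_tx_prepare(tx_prepare_cb cb, struct pkt **pkts, uint16_t nb_pkts, int *err)
{
	assert(err != NULL);
	*err = 0;
	if (cb == NULL)
		return nb_pkts;
	return cb(pkts, nb_pkts, err);
}

/* Caller side: keep the prepared prefix, count the remainder as dropped. */
static uint16_t
prepare_burst(tx_prepare_cb cb, struct pkt **pkts, uint16_t nb_pkts,
	      uint64_t *dropped)
{
	int err;
	uint16_t nb_prep = sketch_tx_prepare(cb, pkts, nb_pkts, &err);

	*dropped += (uint64_t)(nb_pkts - nb_prep);
	return nb_prep; /* only this many packets would go on to Tx burst */
}
```

With a burst of three packets where the second has 9 segments, `prepare_burst()` returns 1 and adds 2 to the drop counter; with a NULL callback the whole burst passes through untouched.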
* RE: [PATCH v2] app/testpmd: use Tx preparation in txonly engine 2024-02-26 13:26 ` Konstantin Ananyev @ 2024-02-26 13:56 ` Morten Brørup 2024-02-27 10:41 ` Konstantin Ananyev 0 siblings, 1 reply; 21+ messages in thread From: Morten Brørup @ 2024-02-26 13:56 UTC (permalink / raw) To: Konstantin Ananyev, Andrew Rybchenko, Ferruh Yigit, Kaiwen Deng, dev Cc: stable, qiming.yang, yidingx.zhou, Aman Singh, Yuying Zhang, David Marchand, Thomas Monjalon, Jerin Jacob > > >>>>>>> TSO breaks when MSS spans more than 8 data fragments. Those > > >>>>>>> packets will be dropped by Tx preparation API, but it will > > >>> cause > > >>>>>>> MDD event if txonly forwarding engine does not call the Tx > > >>>>> preparation > > >>>>>>> API before transmitting packets. > > >>>>>>> > > >>>>>> > > >>>>>> txonly is used commonly, adding Tx prepare for a specific case > > >>> may > > >>>>>> impact performance for users. > > >>>>>> > > >>>>>> What happens when driver throws MDD (Malicious Driver > Detection) > > >>>>> event, > > >>>>>> can't it be ignored? As you are already OK to drop the packet, > > >>> can > > >>>>>> device be configured to drop these packages? > > >>>>>> > > >>>>>> > > >>>>>> Or as Jerin suggested adding a new forwarding engine is a > > >>> solution, > > >>>>> but > > >>>>>> that will create code duplication, I prefer to not have it if > > >>> this > > >>>>> can > > >>>>>> be handled in device level. > > >>>>> > > >>>>> Actually I am agree with the author of the patch - when TX > offloads > > >>>>> and/or multisegs are enabled, > > >>>>> user supposed to invoke eth_tx_prepare(). > > >>>>> Not doing that seems like a bug to me. > > >>>> > > >>>> I strongly disagree with that statement, Konstantin! > > >>>> It is not documented anywhere that using TX offloads and/or > multisegs > > >>> requires calling rte_eth_tx_prepare() before > > >>>> rte_eth_tx_burst(). And none of the examples do it. > > >>> > > >>> In fact, we do use it for test-pmd/csumonly.c. 
> > >>> About other sample apps: > > >>> AFAIK, not many of other DPDK apps do use L4 offloads. > > >>> Right now special treatment (pseudo-header cksum calculation) is > needed > > >>> only for L4 offloads (CKSUM, SEG). > > >>> So, majority of our apps who rely on other TX offloads (multi-seg, > ipv4 > > >>> cksum, vlan insertion) happily run without > > >>> calling tx_prepare(), even though it is not the safest way. > > >>> > > >>>> > > >>>> In my opinion: > > >>>> If some driver has limitations for a feature, e.g. max 8 > fragments, > > >>> it should be documented for that driver, so the application > > >>>> developer can make the appropriate decisions when designing the > > >>> application. > > >>>> Furthermore, we have APIs for the drivers to expose to the > > >>> applications what the driver supports, so the application can > configure > > >>>> itself optimally at startup. Perhaps those APIs need to be > expanded. > > >>>> And if a feature limitation is common across the majority of > drivers, > > >>> that limitation should be mentioned in the documentation of the > > >>>> feature itself. > > >>> > > >>> Many of such limitations *are* documented and in fact we do have > an API > > >>> to check max segments that each driver support, > > >>> see struct rte_eth_desc_lim. > > >> > > >> Yes, this is the kind of API we should provide, so the application > can configure itself appropriately. > > >> > > >>> The problem is: > > >>> - none of our sample app does proper check on these values, so > users > > >>> don't have a good example how to do it. > > >> > > >> Agreed. > > >> Adding an example showing how to do it properly would be the best > solution. > > >> Calling tx_prepare() in the examples is certainly not the solution. > > >> > > >>> - with current DPDK API not all of HW/PMD requirements could be > > >>> extracted programmatically: > > >>> let say majority of Intel PMDs for TCP offloads expect pseudo- > header > > >>> cksum to be pre-calculated by the SW. 
> > >> > > >> I hope this requirement is documented somewhere. > > >> > > >>> another example, some HW expects pkt_len to be bigger then some > > >>> threshold value, otherwise HW hang may appear. > > >> > > >> I hope this requirement is also documented somewhere. > > > > > > No idea, I found it only in the code. > > > > IMHO Tx burst must check such limitations. If you made your HW simpler > > (or just lost it on initial testing), pay in your drivers (or your > > HW+driver will be unusable because of such problems). > > > > >> Generally, if the requirements cannot be extracted > programmatically, they must be prominently documented, like this note to > > >> rte_eth_rx_burst(): > > > > > > Obviously, more detailed documentation is always good, but... > > > Right now we have 50+ different PMDs from different vendors. > > > Even if each and every of them will carefully document all possible > limitations and necessary preparation steps, > > > how DPDK app developer supposed to deal with all that? > > > Do you expect everyone, to read carefully through all of them, and > handle all of them properly oh his own > > > in each and every DPDK app he is going to write? > > > That seems unrealistic. > > > Again what to do with backward compatibility: when new driver (with > new limitations) will arise > > > *after* your app is already written and tested? > > > > +1 > > > > >> > > >> * @note > > >> * Some drivers using vector instructions require that *nb_pkts* > is > > >> * divisible by 4 or 8, depending on the driver implementation. > > > > I'm wondering what application should do if it needs to send just one > > packet and do it now. IMHO, such limitations are not acceptable. This common limitation for vector drivers is for RX burst, not TX burst. I agree such a limitation would be unacceptable for TX. 
> > > > >> > > >>> - As new HW and PMD keep appearing it is hard to predict what > extra > > >>> limitations/requirements will arise, > > >>> that's why tx_prepare() was introduced as s driver op. > > >>> > > >>>> > > >>>> We don't want to check in the fast path what can be checked at > > >>> startup or build time! > > >>> > > >>> If your app supposed to work with just a few, known in advance, > NIC > > >>> models, then sure, you can do that. > > >>> For apps that supposed to work 'in general' with any possible > PMDs > > >>> that DPDK supports - that might be a problem. > > >>> That's why tx_prepare() was introduced and it is strongly > recommended > > >>> to use it by the apps that do use TX offloads. > > >>> Probably tx_prepare() is not the best possible approach, but right > now > > >>> there are not many alternatives within DPDK. > > >> > > >> What exactly is an application supposed to do if tx_prepare() > doesn't accept the full burst? It doesn't return information about > > what is > > >> wrong. > > > > > > It provides some information, but it is *very* limited: just index > of 'bad' packet and error code. > > > In theory app can try to handle it in a 'smart' way: let say if > ENOTSUP is returned, then try to disable all HW offloads > > > and do all in SW. But again, it is much better to do so *before* > submitting packets for TX, so in practice everyone > > > just drop such 'bad' packets. > > > > > > Dropping the packets might not be an option, e.g. for applications > used in life support or tele-medicine. > > > > Critical applications should be able to do all Tx offloads in SW and > > retry. Of course, various statistics and logs should help to improve > the > > application. > > > > > If the packet is 'bad', then it is much better to drop it, then TX > corrupted packet or even hang NIC HW completely. > > > > IMHO Tx burst must drop packet which could hang NIC HW completely. 
I > > realize that it is an extra checks and performance drop, but vendor > > should pay in performance if HW is not good enough. > > > > > Though off-course it is much better to have an app that would check > for limitations that can be checked by API provided > > > and enable only supported offloads. > > > > Yes, that's why API to get limitation is much better than > documentation. > > > > >> If limitations are documented, an application can use the lowest > common denominator of the NICs it supports. And if the > > application is > > >> supposed to work in general, that becomes the lowest common > denominator of all NICs. > > > > > > I agree: for limitations that can be extracted with generic API, > like: > > > number of segments per packet, supported TX offloads, mbuf fileds > that must be provided for each TX offload, etc. - > > > it is responsibility of well-written application to obey all of > them. > > > Yes, many tx_prepare() implementations do such checks anyway, but > from my perspective it is sort of last-line of defense. > > > For well written application that should just never happen. > > > But there is one more important responsibility of tx_prepare() - it > performs PMD specific packet modifications for requested > > offloads. > > > As I already mentioned for Intel NICs - it does pseudo-header cksum > calcucation, for packets with size less then minimal, it can > > > probably do padding (even if doesn't do it right now), for some > other PMDs - might be something else, I didn't check. > > > Obviously it saves app developer from a burden to do all these > things on his own. > > > > > >> It looks like tx_prepare() has become a horrible workaround for > undocumented limitations. > > > > I strongly disagree. Documentation is never a solution for a generic > > application which is intended to work on any HW and IMHO it is the > > goal to have more and more applications which work on any HW. 
> >
> > >> Limitations due to hardware and/or software tradeoffs are unavoidable, so we have to live with them; but we should not accept
> > >> PMDs with undocumented limitations.
> > >
> > > As I already said, more documentation never hurts, but for that case, I think it is not enough.
> > > I expect PMD to provide a tx_prepare() implementation that would deal with such specific things.
> >
> > +1
> > > Anyway, back to the original patch - I looked at it once again, and realized that the problem
> > > is just in the unsupported number of segments.
> > > As we discussed above - such limitations should be handled by well written app,
> > > but none of DPDK apps does it right now.
> > > So probably it is good opportunity to do things in a proper way and introduce such checks
> > > in testpmd ;)
> >
> > +1 since we already have fields in device information to report such
> > limitations, but it does not say that Tx prepare should be dropped.
> > Drivers which don't need Tx prepare keep it NULL and it returns
> > immediately from ethdev. Since it is done per-burst, it should not
> > affect performance a lot.
> >
>
> 100% agree. I think tx_prepare needs to stay, and in general has to be strongly recommended
> for apps that do use TX offloads/multi-segs.

<irony>
Then tx_prepare should randomly reject packets and segments, to ensure the application has sufficient SW fallback implemented.
</irony>

No, seriously, considering the above arguments, I think PMD conformance requirements would be a better solution.
If a PMD claims support for some feature, it must conform to some minimum requirements for that feature.

E.g. support for multi-seg TX must be able to handle min. 8 segments (or some other reasonable number).

Common limitations and preconditions, such as minimum RX burst size for vector drivers and pseudo-header checksum precalculation
for TCP offload, should be accepted and prominently documented.

Unusual HW limitations, such as inability to pad short Ethernet frames, should be handled by the driver's TX function as workarounds
for unacceptable limitations (in DPDK API context) in the HW.
* RE: [PATCH v2] app/testpmd: use Tx preparation in txonly engine 2024-02-26 13:56 ` Morten Brørup @ 2024-02-27 10:41 ` Konstantin Ananyev 0 siblings, 0 replies; 21+ messages in thread From: Konstantin Ananyev @ 2024-02-27 10:41 UTC (permalink / raw) To: Morten Brørup, Andrew Rybchenko, Ferruh Yigit, Kaiwen Deng, dev Cc: stable, qiming.yang, yidingx.zhou, Aman Singh, Yuying Zhang, David Marchand, Thomas Monjalon, Jerin Jacob > Subject: RE: [PATCH v2] app/testpmd: use Tx preparation in txonly engine > > > > >>>>>>> TSO breaks when MSS spans more than 8 data fragments. Those > > > >>>>>>> packets will be dropped by Tx preparation API, but it will > > > >>> cause > > > >>>>>>> MDD event if txonly forwarding engine does not call the Tx > > > >>>>> preparation > > > >>>>>>> API before transmitting packets. > > > >>>>>>> > > > >>>>>> > > > >>>>>> txonly is used commonly, adding Tx prepare for a specific case > > > >>> may > > > >>>>>> impact performance for users. > > > >>>>>> > > > >>>>>> What happens when driver throws MDD (Malicious Driver > > Detection) > > > >>>>> event, > > > >>>>>> can't it be ignored? As you are already OK to drop the packet, > > > >>> can > > > >>>>>> device be configured to drop these packages? > > > >>>>>> > > > >>>>>> > > > >>>>>> Or as Jerin suggested adding a new forwarding engine is a > > > >>> solution, > > > >>>>> but > > > >>>>>> that will create code duplication, I prefer to not have it if > > > >>> this > > > >>>>> can > > > >>>>>> be handled in device level. > > > >>>>> > > > >>>>> Actually I am agree with the author of the patch - when TX > > offloads > > > >>>>> and/or multisegs are enabled, > > > >>>>> user supposed to invoke eth_tx_prepare(). > > > >>>>> Not doing that seems like a bug to me. > > > >>>> > > > >>>> I strongly disagree with that statement, Konstantin! 
> > > >>>> It is not documented anywhere that using TX offloads and/or > > multisegs > > > >>> requires calling rte_eth_tx_prepare() before > > > >>>> rte_eth_tx_burst(). And none of the examples do it. > > > >>> > > > >>> In fact, we do use it for test-pmd/csumonly.c. > > > >>> About other sample apps: > > > >>> AFAIK, not many of other DPDK apps do use L4 offloads. > > > >>> Right now special treatment (pseudo-header cksum calculation) is > > needed > > > >>> only for L4 offloads (CKSUM, SEG). > > > >>> So, majority of our apps who rely on other TX offloads (multi-seg, > > ipv4 > > > >>> cksum, vlan insertion) happily run without > > > >>> calling tx_prepare(), even though it is not the safest way. > > > >>> > > > >>>> > > > >>>> In my opinion: > > > >>>> If some driver has limitations for a feature, e.g. max 8 > > fragments, > > > >>> it should be documented for that driver, so the application > > > >>>> developer can make the appropriate decisions when designing the > > > >>> application. > > > >>>> Furthermore, we have APIs for the drivers to expose to the > > > >>> applications what the driver supports, so the application can > > configure > > > >>>> itself optimally at startup. Perhaps those APIs need to be > > expanded. > > > >>>> And if a feature limitation is common across the majority of > > drivers, > > > >>> that limitation should be mentioned in the documentation of the > > > >>>> feature itself. > > > >>> > > > >>> Many of such limitations *are* documented and in fact we do have > > an API > > > >>> to check max segments that each driver support, > > > >>> see struct rte_eth_desc_lim. > > > >> > > > >> Yes, this is the kind of API we should provide, so the application > > can configure itself appropriately. > > > >> > > > >>> The problem is: > > > >>> - none of our sample app does proper check on these values, so > > users > > > >>> don't have a good example how to do it. > > > >> > > > >> Agreed. 
> > > >> Adding an example showing how to do it properly would be the best > > solution. > > > >> Calling tx_prepare() in the examples is certainly not the solution. > > > >> > > > >>> - with current DPDK API not all of HW/PMD requirements could be > > > >>> extracted programmatically: > > > >>> let say majority of Intel PMDs for TCP offloads expect pseudo- > > header > > > >>> cksum to be pre-calculated by the SW. > > > >> > > > >> I hope this requirement is documented somewhere. > > > >> > > > >>> another example, some HW expects pkt_len to be bigger then some > > > >>> threshold value, otherwise HW hang may appear. > > > >> > > > >> I hope this requirement is also documented somewhere. > > > > > > > > No idea, I found it only in the code. > > > > > > IMHO Tx burst must check such limitations. If you made your HW simpler > > > (or just lost it on initial testing), pay in your drivers (or your > > > HW+driver will be unusable because of such problems). > > > > > > >> Generally, if the requirements cannot be extracted > > programmatically, they must be prominently documented, like this note to > > > >> rte_eth_rx_burst(): > > > > > > > > Obviously, more detailed documentation is always good, but... > > > > Right now we have 50+ different PMDs from different vendors. > > > > Even if each and every of them will carefully document all possible > > limitations and necessary preparation steps, > > > > how DPDK app developer supposed to deal with all that? > > > > Do you expect everyone, to read carefully through all of them, and > > handle all of them properly oh his own > > > > in each and every DPDK app he is going to write? > > > > That seems unrealistic. > > > > Again what to do with backward compatibility: when new driver (with > > new limitations) will arise > > > > *after* your app is already written and tested? 
> > > > > > +1 > > > > > > >> > > > >> * @note > > > >> * Some drivers using vector instructions require that *nb_pkts* > > is > > > >> * divisible by 4 or 8, depending on the driver implementation. > > > > > > I'm wondering what application should do if it needs to send just one > > > packet and do it now. IMHO, such limitations are not acceptable. > > This common limitation for vector drivers is for RX burst, not TX burst. > I agree such a limitation would be unacceptable for TX. > > > > > > > >> > > > >>> - As new HW and PMD keep appearing it is hard to predict what > > extra > > > >>> limitations/requirements will arise, > > > >>> that's why tx_prepare() was introduced as s driver op. > > > >>> > > > >>>> > > > >>>> We don't want to check in the fast path what can be checked at > > > >>> startup or build time! > > > >>> > > > >>> If your app supposed to work with just a few, known in advance, > > NIC > > > >>> models, then sure, you can do that. > > > >>> For apps that supposed to work 'in general' with any possible > > PMDs > > > >>> that DPDK supports - that might be a problem. > > > >>> That's why tx_prepare() was introduced and it is strongly > > recommended > > > >>> to use it by the apps that do use TX offloads. > > > >>> Probably tx_prepare() is not the best possible approach, but right > > now > > > >>> there are not many alternatives within DPDK. > > > >> > > > >> What exactly is an application supposed to do if tx_prepare() > > doesn't accept the full burst? It doesn't return information about > > > what is > > > >> wrong. > > > > > > > > It provides some information, but it is *very* limited: just index > > of 'bad' packet and error code. > > > > In theory app can try to handle it in a 'smart' way: let say if > > ENOTSUP is returned, then try to disable all HW offloads > > > > and do all in SW. But again, it is much better to do so *before* > > submitting packets for TX, so in practice everyone > > > > just drop such 'bad' packets. 
> > > > > > > > Dropping the packets might not be an option, e.g. for applications > > used in life support or tele-medicine. > > > > > > Critical applications should be able to do all Tx offloads in SW and > > > retry. Of course, various statistics and logs should help to improve > > the > > > application. > > > > > > > If the packet is 'bad', then it is much better to drop it, then TX > > corrupted packet or even hang NIC HW completely. > > > > > > IMHO Tx burst must drop packet which could hang NIC HW completely. I > > > realize that it is an extra checks and performance drop, but vendor > > > should pay in performance if HW is not good enough. > > > > > > > Though off-course it is much better to have an app that would check > > for limitations that can be checked by API provided > > > > and enable only supported offloads. > > > > > > Yes, that's why API to get limitation is much better than > > documentation. > > > > > > >> If limitations are documented, an application can use the lowest > > common denominator of the NICs it supports. And if the > > > application is > > > >> supposed to work in general, that becomes the lowest common > > denominator of all NICs. > > > > > > > > I agree: for limitations that can be extracted with generic API, > > like: > > > > number of segments per packet, supported TX offloads, mbuf fileds > > that must be provided for each TX offload, etc. - > > > > it is responsibility of well-written application to obey all of > > them. > > > > Yes, many tx_prepare() implementations do such checks anyway, but > > from my perspective it is sort of last-line of defense. > > > > For well written application that should just never happen. > > > > But there is one more important responsibility of tx_prepare() - it > > performs PMD specific packet modifications for requested > > > offloads. 
> > > > As I already mentioned for Intel NICs - it does pseudo-header cksum > > calcucation, for packets with size less then minimal, it can > > > > probably do padding (even if doesn't do it right now), for some > > other PMDs - might be something else, I didn't check. > > > > Obviously it saves app developer from a burden to do all these > > things on his own. > > > > > > > >> It looks like tx_prepare() has become a horrible workaround for > > undocumented limitations. > > > > > > I strongly disagree. Documentation is never a solution for a generic > > > application which is intended to work on any HW and IMHO it is the > > > goal to have more and more applications which work on any HW. > > > > > > >> Limitations due to hardware and/or software tradeoffs are > > unavoidable, so we have to live with them; but we should not accept > > > >> PMDs with undocumented limitations. > > > > > > > > As I already said, more documentation never hurts, but for that > > case, I think it is not enough. > > > > I expect PMD to provide a tx_prepare() implementation that would > > deal with such specific things. > > > > > > +1 > > > > Anyway, back to the original patch - I looked at it once again, and > > realized that the problem > > > > is just in the unsupported number of segments. > > > > As we discussed above - such limitations should be handled by well > > written app, > > > > but none of DPDK apps does it right now. > > > > So probably it is good opportunity to do things in a proper way and > > introduce such checks > > > > in testpmd ;) > > > > > > +1 since we already have fields in device information to report such > > > limitations, but it does not say that Tx prepare should be dropped. > > > Drivers which don't need Tx prepare keep it NULL and it returns > > > immediately from ethdev. Since it is done per-burst, it should not > > > affect performance a lot. > > > > > > > 100% agree. 
> > I think tx_prepare needs to stay, and in general has to be strongly recommended
> > for apps that do use TX offloads/multi-segs.
>
> <irony>
> Then tx_prepare should randomly reject packets and segments, to ensure the application has sufficient SW fallback implemented.
> </irony>
>
> No, seriously, considering the above arguments, I think PMD conformance requirements would be a better solution.
> If a PMD claims support for some feature, it must conform to some minimum requirements for that feature.
>
> E.g. support for multi-seg TX must be able to handle min. 8 segments (or some other reasonable number).

Hmm... why is that?
I thought we had a consensus here that a well-behaved app should use the existing API (rte_eth_desc_lim)
to deduce the minimum common denominator, and that we need some code example, or even better a lib function
that would do such a thing.

> Common limitations and preconditions, such as minimum RX burst size for vector drivers and pseudo-header checksum precalculation
> for TCP offload, should be accepted and prominently documented.

I still don't understand why these preconditions can't be handled by tx_prepare().

> Unusual HW limitations, such as inability to pad short Ethernet frames, should be handled by the driver's TX function as workarounds
> for unacceptable limitations (in DPDK API context) in the HW.

With the current design, rte_eth_tx_burst() is not allowed to modify mbufs whose refcnt > 1.
That's one of the reasons why tx_prepare() was introduced.
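The "minimum common denominator" idea can be sketched in plain C as below. This is an illustration under stated assumptions, not an existing DPDK helper: `struct desc_lim` is a hypothetical mirror of just the two segment fields of struct rte_eth_desc_lim (`nb_seg_max` and `nb_mtu_seg_max`), which a real application would read per port from the device information. The functions `min_limit()`, `common_desc_lim()` and `layout_fits()` are invented names, and treating a value of 0 as "no limit reported" is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the segment fields in struct rte_eth_desc_lim. */
struct desc_lim {
	uint16_t nb_seg_max;     /* max segments per whole (e.g. TSO) packet */
	uint16_t nb_mtu_seg_max; /* max segments per single MTU-sized packet */
};

/* Smaller of two limits; 0 is treated as "no limit reported" (assumption). */
static uint16_t
min_limit(uint16_t a, uint16_t b)
{
	if (a == 0)
		return b;
	if (b == 0)
		return a;
	return a < b ? a : b;
}

/* Fold the limits of all ports into the tightest common limit. */
static struct desc_lim
common_desc_lim(const struct desc_lim *ports, size_t nb_ports)
{
	struct desc_lim out = { 0, 0 };
	size_t i;

	assert(ports != NULL || nb_ports == 0);
	for (i = 0; i < nb_ports; i++) {
		out.nb_seg_max = min_limit(out.nb_seg_max, ports[i].nb_seg_max);
		out.nb_mtu_seg_max = min_limit(out.nb_mtu_seg_max,
					       ports[i].nb_mtu_seg_max);
	}
	return out;
}

/* Startup-time check: does a packet layout with nb_segs segments fit? */
static int
layout_fits(const struct desc_lim *lim, uint16_t nb_segs, int is_tso)
{
	uint16_t max = is_tso ? lim->nb_seg_max : lim->nb_mtu_seg_max;

	return max == 0 || nb_segs <= max;
}
```

Run once at startup, a check like this would reject the nine-segment layout from the reproduction steps in the patch description on any port set whose common limit is 8, before a single packet reaches the fast path.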
* Re: [PATCH v2] app/testpmd: use Tx preparation in txonly engine 2024-02-08 0:07 ` Ferruh Yigit 2024-02-08 10:50 ` Konstantin Ananyev @ 2024-02-08 12:09 ` Jerin Jacob 2024-02-09 19:18 ` Ferruh Yigit 1 sibling, 1 reply; 21+ messages in thread From: Jerin Jacob @ 2024-02-08 12:09 UTC (permalink / raw) To: Ferruh Yigit, Morten Brørup, Konstantin Ananyev Cc: Kaiwen Deng, dev, stable, qiming.yang, yidingx.zhou, Aman Singh, Yuying Zhang, David Marchand On Thu, Feb 8, 2024 at 6:15 AM Ferruh Yigit <ferruh.yigit@amd.com> wrote: > > On 1/11/2024 5:25 AM, Kaiwen Deng wrote: > > Txonly forwarding engine does not call the Tx preparation API > > before transmitting packets. This may cause some problems. > > > > TSO breaks when MSS spans more than 8 data fragments. Those > > packets will be dropped by Tx preparation API, but it will cause > > MDD event if txonly forwarding engine does not call the Tx preparation > > API before transmitting packets. > > > > txonly is used commonly, adding Tx prepare for a specific case may > impact performance for users. > > What happens when driver throws MDD (Malicious Driver Detection) event, > can't it be ignored? As you are already OK to drop the packet, can > device be configured to drop these packages? > > > Or as Jerin suggested adding a new forwarding engine is a solution, but > that will create code duplication, I prefer to not have it if this can We don't need to have full-blown NEW need forwarding engine. Just that, we need to select correct ".packet_fwd" based on the offload requirements. 
It is easy to avoid code duplication, without performance impact, with
the following: move the logic to compile time and use runtime only for
the fix-up.

static inline
generic_tx_only_packet_forward(...., const unsigned flag)
{
	/* logic common for both packet forward variants */
	if (flag & NEED_PREPARE) /* flag is constant in each wrapper */
		prepare specific code
}

static
generic_tx_only_packet_forward_without_prepare()
{
	generic_tx_only_packet_forward(..., 0);
}

static
generic_tx_only_packet_forward_with_prepare()
{
	generic_tx_only_packet_forward(..., NEED_PREPARE);
}

Select the correct .packet_fwd at runtime
(generic_tx_only_packet_forward_without_prepare() vs
generic_tx_only_packet_forward_with_prepare()).

> be handled in device level.
>
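The pattern described in this message can be sketched as compilable C. Everything here is hypothetical and stands in for testpmd internals: `struct stream_stats`, `packet_fwd_t`, `txonly_forward()` and `select_packet_fwd()` are invented for illustration, the "prepare" step is reduced to a counter, and choosing the variant by "any Tx offload bit set" is only an assumed selection rule. The point shown is that the flag is a compile-time constant in each wrapper, so the branch can be folded away when the common body is inlined, and the only runtime decision is which function pointer to install.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NEED_PREPARE 0x1u

/* Hypothetical per-stream counters standing in for real Tx work. */
struct stream_stats {
	uint64_t prepared;
	uint64_t sent;
};

/* Hypothetical equivalent of a forwarding engine's .packet_fwd hook. */
typedef uint16_t (*packet_fwd_t)(struct stream_stats *st, uint16_t nb_pkts);

/* Common body; flags is a literal constant at every call site below. */
static inline uint16_t
txonly_forward(struct stream_stats *st, uint16_t nb_pkts,
	       const unsigned int flags)
{
	assert(st != NULL);
	if (flags & NEED_PREPARE)
		st->prepared += nb_pkts; /* placeholder for the prepare step */
	st->sent += nb_pkts;             /* placeholder for the Tx burst */
	return nb_pkts;
}

static uint16_t
txonly_forward_without_prepare(struct stream_stats *st, uint16_t nb_pkts)
{
	return txonly_forward(st, nb_pkts, 0);
}

static uint16_t
txonly_forward_with_prepare(struct stream_stats *st, uint16_t nb_pkts)
{
	return txonly_forward(st, nb_pkts, NEED_PREPARE);
}

/* One-time selection at configuration time, not per burst. */
static packet_fwd_t
select_packet_fwd(uint64_t tx_offloads)
{
	return tx_offloads != 0 ? txonly_forward_with_prepare
				: txonly_forward_without_prepare;
}
```

The selected pointer would be stored once when forwarding is configured, so the per-burst path contains no mode check in either variant.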
* Re: [PATCH v2] app/testpmd: use Tx preparation in txonly engine 2024-02-08 12:09 ` Jerin Jacob @ 2024-02-09 19:18 ` Ferruh Yigit 0 siblings, 0 replies; 21+ messages in thread From: Ferruh Yigit @ 2024-02-09 19:18 UTC (permalink / raw) To: Jerin Jacob, Morten Brørup, Konstantin Ananyev Cc: Kaiwen Deng, dev, stable, qiming.yang, yidingx.zhou, Aman Singh, Yuying Zhang, David Marchand On 2/8/2024 12:09 PM, Jerin Jacob wrote: > On Thu, Feb 8, 2024 at 6:15 AM Ferruh Yigit <ferruh.yigit@amd.com> wrote: >> >> On 1/11/2024 5:25 AM, Kaiwen Deng wrote: >>> Txonly forwarding engine does not call the Tx preparation API >>> before transmitting packets. This may cause some problems. >>> >>> TSO breaks when MSS spans more than 8 data fragments. Those >>> packets will be dropped by Tx preparation API, but it will cause >>> MDD event if txonly forwarding engine does not call the Tx preparation >>> API before transmitting packets. >>> >> >> txonly is used commonly, adding Tx prepare for a specific case may >> impact performance for users. >> >> What happens when driver throws MDD (Malicious Driver Detection) event, >> can't it be ignored? As you are already OK to drop the packet, can >> device be configured to drop these packages? >> >> >> Or as Jerin suggested adding a new forwarding engine is a solution, but >> that will create code duplication, I prefer to not have it if this can > > We don't need to have full-blown NEW need forwarding engine. > Just that, we need to select correct ".packet_fwd" based on the > offload requirements. 
>
> It is easy to avoid code duplication, without performance impact, by
> moving the logic to compile time and using runtime only to select the
> function:
>
> static inline
> generic_tx_only_packet_forward(...., const unsigned flag)
> {
> 	/* logic common to both packet forward variants */
>
> 	if (flag & NEED_PREPARE) {
> 		/* prepare specific code */
> 	}
> }
>
> static
> generic_tx_only_packet_forward_without_prepare()
> {
> 	generic_tx_only_packet_forward(..., 0);
> }
>
> static
> generic_tx_only_packet_forward_with_prepare()
> {
> 	generic_tx_only_packet_forward(..., NEED_PREPARE);
> }
>
> Select the correct .packet_fwd at runtime
> (generic_tx_only_packet_forward_without_prepare() vs
> generic_tx_only_packet_forward_with_prepare()).
>

+1 to not duplicating code, and I guess we can get the requested mode as
a testpmd parameter. But how can we avoid the performance impact?
Checking the mode in .packet_fwd adds a check per burst, and we can't
overwrite .packet_fwd.
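The wrapper pattern discussed above can be sketched in plain C. This is a
minimal sketch under stated assumptions, not testpmd code: struct stream,
fake_tx_prepare(), packet_fwd_t and select_packet_fwd() are hypothetical
stand-ins (the real engine would call rte_eth_tx_prepare() on a
struct fwd_stream). The flag is tested with a plain `if` on a constant
argument, because the preprocessor cannot evaluate a function parameter.
Once the helper is inlined into each wrapper, an optimizing compiler can
typically drop the branch, so the mode is decided once when the callback is
selected rather than on every burst.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NEED_PREPARE (1u << 0)	/* flag name taken from the mail */

/* Hypothetical stand-in for testpmd's struct fwd_stream. */
struct stream {
	uint64_t prepared;	/* packets that went through the prepare step */
	uint64_t sent;		/* packets handed to the transmit step */
};

/* Hypothetical callback type modelling the .packet_fwd slot. */
typedef bool (*packet_fwd_t)(struct stream *s, uint16_t nb_pkt);

/* Stand-in for rte_eth_tx_prepare(): accepts every packet. */
static uint16_t
fake_tx_prepare(struct stream *s, uint16_t nb_pkt)
{
	s->prepared += nb_pkt;
	return nb_pkt;
}

/* Shared body; 'flag' is a compile-time constant at every call site. */
static inline bool
generic_tx_only_packet_forward(struct stream *s, uint16_t nb_pkt,
			       const unsigned int flag)
{
	assert(s != NULL);
	if (nb_pkt == 0)
		return false;

	/* Folded away by the optimizer when the wrapper passes 0. */
	if (flag & NEED_PREPARE)
		nb_pkt = fake_tx_prepare(s, nb_pkt);

	s->sent += nb_pkt;
	return true;
}

static bool
tx_only_without_prepare(struct stream *s, uint16_t nb_pkt)
{
	return generic_tx_only_packet_forward(s, nb_pkt, 0);
}

static bool
tx_only_with_prepare(struct stream *s, uint16_t nb_pkt)
{
	return generic_tx_only_packet_forward(s, nb_pkt, NEED_PREPARE);
}

/* Runtime selection happens once, outside the per-burst path. */
static packet_fwd_t
select_packet_fwd(bool need_prepare)
{
	return need_prepare ? tx_only_with_prepare : tx_only_without_prepare;
}
```

Whether testpmd can switch .packet_fwd like this is the open point raised
in the reply; the sketch only models the selection with a local function
pointer.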
end of thread, other threads:[~2024-02-27 10:41 UTC | newest]

Thread overview: 21+ messages
2024-01-03  1:29 [PATCH v1] app/testpmd: use Tx preparation in txonly engine Kaiwen Deng
2024-01-04  1:03 ` Stephen Hemminger
2024-01-04  5:52 ` Jerin Jacob
2024-01-11  5:25 ` [PATCH v2] " Kaiwen Deng
2024-01-11  6:34 ` lihuisong (C)
2024-01-11 16:57 ` Stephen Hemminger
2024-01-12 16:00 ` David Marchand
2024-02-08  0:07 ` Ferruh Yigit
2024-02-08 10:50 ` Konstantin Ananyev
2024-02-08 11:35 ` Ferruh Yigit
2024-02-08 15:14 ` Konstantin Ananyev
2024-02-08 11:52 ` Morten Brørup
2024-02-11 15:04 ` Konstantin Ananyev
2024-02-13 10:27 ` Morten Brørup
2024-02-22 18:28 ` Konstantin Ananyev
2024-02-23  8:36 ` Andrew Rybchenko
2024-02-26 13:26 ` Konstantin Ananyev
2024-02-26 13:56 ` Morten Brørup
2024-02-27 10:41 ` Konstantin Ananyev
2024-02-08 12:09 ` Jerin Jacob
2024-02-09 19:18 ` Ferruh Yigit