* [dpdk-dev] [PATCH 0/4] net/mlx5 add TSO support @ 2017-02-22 16:09 Shahaf Shuler 2017-02-22 16:09 ` [dpdk-dev] [PATCH 1/4] ethdev: add Tx offload limitations Shahaf Shuler ` (4 more replies) 0 siblings, 5 replies; 14+ messages in thread From: Shahaf Shuler @ 2017-02-22 16:09 UTC (permalink / raw) To: adrien.mazarguil, nelio.laranjeiro, thomas.monjalon, jingjing.wu; +Cc: dev This patchset adds support for hardware TSO on the mlx5 PMD. * Patches 1/4 and 2/4 add TSO flags and capabilities to the ethdev layer. * Patch 3/4 adds support for the flag introduced in patch 2/4. This patch also simplifies the testing of patch 4/4. * Patch 4/4 implements support for hardware TSO and demonstrates the use of patches 1/4 and 2/4. [PATCH 1/4] ethdev: add Tx offload limitations [PATCH 2/4] ethdev: add TSO disable flag [PATCH 3/4] app/testpmd: add TSO disable to test options [PATCH 4/4] net/mlx5: add hardware TSO support ^ permalink raw reply [flat|nested] 14+ messages in thread
* [dpdk-dev] [PATCH 1/4] ethdev: add Tx offload limitations 2017-02-22 16:09 [dpdk-dev] [PATCH 0/4] net/mlx5 add TSO support Shahaf Shuler @ 2017-02-22 16:09 ` Shahaf Shuler 2017-02-22 16:09 ` [dpdk-dev] [PATCH 2/4] ethdev: add TSO disable flag Shahaf Shuler ` (3 subsequent siblings) 4 siblings, 0 replies; 14+ messages in thread From: Shahaf Shuler @ 2017-02-22 16:09 UTC (permalink / raw) To: adrien.mazarguil, nelio.laranjeiro, thomas.monjalon, jingjing.wu; +Cc: dev Many Tx offloads are performed by hardware. As such, each offload has its own limitations. This commit adds the option to query Tx offload limitations in order to use them properly and avoid bugs. The limitations should be filled in by the PMD when the device info is queried. Signed-off-by: Shahaf Shuler <shahafs@mellanox.com> --- lib/librte_ether/rte_ethdev.h | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h index 97f3e2d..3ab8568 100644 --- a/lib/librte_ether/rte_ethdev.h +++ b/lib/librte_ether/rte_ethdev.h @@ -728,6 +728,17 @@ struct rte_eth_desc_lim { uint16_t nb_mtu_seg_max; }; +struct rte_eth_tx_offload_lim { + /** + * Max allowed size of network headers (L2+L3+L4) for TSO offload. + */ + uint32_t max_tso_headers_sz; + /** + * Max allowed size of TCP payload for TSO offload. + */ + uint32_t max_tso_payload_sz; +}; + /** * This enum indicates the flow control mode */ @@ -920,6 +931,8 @@ struct rte_eth_dev_info { uint16_t max_vmdq_pools; /**< Maximum number of VMDq pools. */ uint32_t rx_offload_capa; /**< Device RX offload capabilities. */ uint32_t tx_offload_capa; /**< Device TX offload capabilities. */ + struct rte_eth_tx_offload_lim tx_off_lim; + /**< Device TX offloads limits. */ uint16_t reta_size; /**< Device redirection table size, the total number of entries. */ uint8_t hash_key_size; /**< Hash key size in bytes */ -- 1.8.3.1 ^ permalink raw reply [flat|nested] 14+ messages in thread
* [dpdk-dev] [PATCH 2/4] ethdev: add TSO disable flag 2017-02-22 16:09 [dpdk-dev] [PATCH 0/4] net/mlx5 add TSO support Shahaf Shuler 2017-02-22 16:09 ` [dpdk-dev] [PATCH 1/4] ethdev: add Tx offload limitations Shahaf Shuler @ 2017-02-22 16:09 ` Shahaf Shuler 2017-02-22 16:09 ` [dpdk-dev] [PATCH 3/4] app/testpmd: add TSO disable to test options Shahaf Shuler ` (2 subsequent siblings) 4 siblings, 0 replies; 14+ messages in thread From: Shahaf Shuler @ 2017-02-22 16:09 UTC (permalink / raw) To: adrien.mazarguil, nelio.laranjeiro, thomas.monjalon, jingjing.wu; +Cc: dev This commit adds the option to disable TSO offload on a specific txq. Signed-off-by: Shahaf Shuler <shahafs@mellanox.com> --- lib/librte_ether/rte_ethdev.h | 1 + 1 file changed, 1 insertion(+) diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h index 3ab8568..b93be09 100644 --- a/lib/librte_ether/rte_ethdev.h +++ b/lib/librte_ether/rte_ethdev.h @@ -677,6 +677,7 @@ struct rte_eth_rxconf { #define ETH_TXQ_FLAGS_NOXSUMSCTP 0x0200 /**< disable SCTP checksum offload */ #define ETH_TXQ_FLAGS_NOXSUMUDP 0x0400 /**< disable UDP checksum offload */ #define ETH_TXQ_FLAGS_NOXSUMTCP 0x0800 /**< disable TCP checksum offload */ +#define ETH_TXQ_FLAGS_NOTSOOFFL 0x1000 /**< disable TSO offload */ #define ETH_TXQ_FLAGS_NOOFFLOADS \ (ETH_TXQ_FLAGS_NOVLANOFFL | ETH_TXQ_FLAGS_NOXSUMSCTP | \ ETH_TXQ_FLAGS_NOXSUMUDP | ETH_TXQ_FLAGS_NOXSUMTCP) -- 1.8.3.1 ^ permalink raw reply [flat|nested] 14+ messages in thread
* [dpdk-dev] [PATCH 3/4] app/testpmd: add TSO disable to test options 2017-02-22 16:09 [dpdk-dev] [PATCH 0/4] net/mlx5 add TSO support Shahaf Shuler 2017-02-22 16:09 ` [dpdk-dev] [PATCH 1/4] ethdev: add Tx offload limitations Shahaf Shuler 2017-02-22 16:09 ` [dpdk-dev] [PATCH 2/4] ethdev: add TSO disable flag Shahaf Shuler @ 2017-02-22 16:09 ` Shahaf Shuler 2017-02-22 16:10 ` [dpdk-dev] [PATCH 4/4] net/mlx5: add hardware TSO support Shahaf Shuler 2017-03-01 11:11 ` [dpdk-dev] [PATCH v2 0/1] net/mlx5: add " Shahaf Shuler 4 siblings, 0 replies; 14+ messages in thread From: Shahaf Shuler @ 2017-02-22 16:09 UTC (permalink / raw) To: adrien.mazarguil, nelio.laranjeiro, thomas.monjalon, jingjing.wu; +Cc: dev Add the option to globally disable hardware TSO offload from the command line. Signed-off-by: Shahaf Shuler <shahafs@mellanox.com> --- app/test-pmd/parameters.c | 9 ++++++++- doc/guides/testpmd_app_ug/run_app.rst | 4 ++++ 2 files changed, 12 insertions(+), 1 deletion(-) diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c index 28db8cd..7cd88bc 100644 --- a/app/test-pmd/parameters.c +++ b/app/test-pmd/parameters.c @@ -196,6 +196,7 @@ " or total packet length.\n"); printf(" --disable-link-check: disable check on link status when " "starting/stopping ports.\n"); + printf(" --disable-tso: disable hardware TCP segmentation offload.\n"); } #ifdef RTE_LIBRTE_CMDLINE @@ -561,6 +562,7 @@ { "no-flush-rx", 0, 0, 0 }, { "txpkts", 1, 0, 0 }, { "disable-link-check", 0, 0, 0 }, + { "disable-tso", 0, 0, 0 }, { 0, 0, 0, 0 }, }; @@ -978,7 +980,12 @@ no_flush_rx = 1; if (!strcmp(lgopts[opt_idx].name, "disable-link-check")) no_link_check = 1; - + if (!strcmp(lgopts[opt_idx].name, "disable-tso")) { + if (txq_flags < 0) + txq_flags = ETH_TXQ_FLAGS_NOTSOOFFL; + else + txq_flags |= ETH_TXQ_FLAGS_NOTSOOFFL; + } break; case 'h': usage(argv[0]); diff --git a/doc/guides/testpmd_app_ug/run_app.rst b/doc/guides/testpmd_app_ug/run_app.rst index 38a4025..58417ea 100644 --- 
a/doc/guides/testpmd_app_ug/run_app.rst +++ b/doc/guides/testpmd_app_ug/run_app.rst @@ -460,3 +460,7 @@ The commandline options are: * ``--disable-link-check`` Disable check on link status when starting/stopping ports. + +* ``--disable-tso`` + + Disable hardware TCP segmentation offload. -- 1.8.3.1 ^ permalink raw reply [flat|nested] 14+ messages in thread
* [dpdk-dev] [PATCH 4/4] net/mlx5: add hardware TSO support 2017-02-22 16:09 [dpdk-dev] [PATCH 0/4] net/mlx5 add TSO support Shahaf Shuler ` (2 preceding siblings ...) 2017-02-22 16:09 ` [dpdk-dev] [PATCH 3/4] app/testpmd: add TSO disable to test options Shahaf Shuler @ 2017-02-22 16:10 ` Shahaf Shuler 2017-03-01 11:11 ` [dpdk-dev] [PATCH v2 0/1] net/mlx5: add " Shahaf Shuler 4 siblings, 0 replies; 14+ messages in thread From: Shahaf Shuler @ 2017-02-22 16:10 UTC (permalink / raw) To: adrien.mazarguil, nelio.laranjeiro, thomas.monjalon, jingjing.wu; +Cc: dev Implement support for hardware TSO. Signed-off-by: Shahaf Shuler <shahafs@mellanox.com> --- Performance impact on tx function due to tso logic insertion is estimated ~4 clocks. --- doc/guides/nics/features/mlx5.ini | 1 + drivers/net/mlx5/mlx5.c | 8 +++ drivers/net/mlx5/mlx5.h | 2 + drivers/net/mlx5/mlx5_defs.h | 3 + drivers/net/mlx5/mlx5_ethdev.c | 5 ++ drivers/net/mlx5/mlx5_rxtx.c | 130 ++++++++++++++++++++++++++++++++++---- drivers/net/mlx5/mlx5_rxtx.h | 2 + drivers/net/mlx5/mlx5_txq.c | 10 +++ 8 files changed, 147 insertions(+), 14 deletions(-) diff --git a/doc/guides/nics/features/mlx5.ini b/doc/guides/nics/features/mlx5.ini index f20d214..c6948cb 100644 --- a/doc/guides/nics/features/mlx5.ini +++ b/doc/guides/nics/features/mlx5.ini @@ -19,6 +19,7 @@ RSS hash = Y RSS key update = Y RSS reta update = Y SR-IOV = Y +TSO = Y VLAN filter = Y Flow director = Y Flow API = Y diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index d4bd469..74cae6a 100644 --- a/drivers/net/mlx5/mlx5.c +++ b/drivers/net/mlx5/mlx5.c @@ -479,6 +479,7 @@ IBV_EXP_DEVICE_ATTR_RX_HASH | IBV_EXP_DEVICE_ATTR_VLAN_OFFLOADS | IBV_EXP_DEVICE_ATTR_RX_PAD_END_ALIGN | + IBV_EXP_DEVICE_ATTR_TSO_CAPS | 0; DEBUG("using port %u (%08" PRIx32 ")", port, test); @@ -586,6 +587,13 @@ err = ENOTSUP; goto port_error; } + priv->tso = ((!priv->mps) && + (exp_device_attr.tso_caps.max_tso > 0) && + (exp_device_attr.tso_caps.supported_qpts & + (1 << 
IBV_QPT_RAW_ETH))); + if (priv->tso) + priv->max_tso_payload_sz = + exp_device_attr.tso_caps.max_tso; /* Allocate and register default RSS hash keys. */ priv->rss_conf = rte_calloc(__func__, hash_rxq_init_n, sizeof((*priv->rss_conf)[0]), 0); diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h index 2b4345a..d2bb835 100644 --- a/drivers/net/mlx5/mlx5.h +++ b/drivers/net/mlx5/mlx5.h @@ -126,6 +126,8 @@ struct priv { unsigned int mps:1; /* Whether multi-packet send is supported. */ unsigned int cqe_comp:1; /* Whether CQE compression is enabled. */ unsigned int pending_alarm:1; /* An alarm is pending. */ + unsigned int tso:1; /* Whether TSO is supported. */ + unsigned int max_tso_payload_sz; /* Maximum TCP payload for TSO. */ unsigned int txq_inline; /* Maximum packet size for inlining. */ unsigned int txqs_inline; /* Queue number threshold for inlining. */ /* RX/TX queues. */ diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h index e91d245..eecb908 100644 --- a/drivers/net/mlx5/mlx5_defs.h +++ b/drivers/net/mlx5/mlx5_defs.h @@ -79,4 +79,7 @@ /* Maximum number of extended statistics counters. */ #define MLX5_MAX_XSTATS 32 +/* Maximum Packet headers size (L2+L3+L4) for TSO. 
*/ +#define MLX5_MAX_TSO_HEADER 128 + #endif /* RTE_PMD_MLX5_DEFS_H_ */ diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c index 6b64f44..5fc9511 100644 --- a/drivers/net/mlx5/mlx5_ethdev.c +++ b/drivers/net/mlx5/mlx5_ethdev.c @@ -693,6 +693,11 @@ struct priv * (DEV_TX_OFFLOAD_IPV4_CKSUM | DEV_TX_OFFLOAD_UDP_CKSUM | DEV_TX_OFFLOAD_TCP_CKSUM); + if (priv->tso) { + info->tx_offload_capa |= DEV_TX_OFFLOAD_TCP_TSO; + info->tx_off_lim.max_tso_payload_sz = priv->max_tso_payload_sz; + info->tx_off_lim.max_tso_headers_sz = MLX5_MAX_TSO_HEADER; + } if (priv_get_ifname(priv, &ifname) == 0) info->if_index = if_nametoindex(ifname); /* FIXME: RETA update/query API expects the callee to know the size of diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c index b2b7223..30b0096 100644 --- a/drivers/net/mlx5/mlx5_rxtx.c +++ b/drivers/net/mlx5/mlx5_rxtx.c @@ -365,6 +365,7 @@ const unsigned int elts_n = 1 << txq->elts_n; unsigned int i = 0; unsigned int j = 0; + unsigned int k = 0; unsigned int max; uint16_t max_wqe; unsigned int comp; @@ -392,8 +393,10 @@ uintptr_t addr; uint64_t naddr; uint16_t pkt_inline_sz = MLX5_WQE_DWORD_SIZE + 2; + uint16_t tso_header_sz = 0; uint16_t ehdr; uint8_t cs_flags = 0; + uint64_t tso = 0; #ifdef MLX5_PMD_SOFT_COUNTERS uint32_t total_length = 0; #endif @@ -465,6 +468,88 @@ length -= pkt_inline_sz; addr += pkt_inline_sz; } + if (txq->max_tso_inline) { + tso = buf->ol_flags & PKT_TX_TCP_SEG; + if (tso) { + uintptr_t end = (uintptr_t) + (((uintptr_t)txq->wqes) + + (1 << txq->wqe_n) * + MLX5_WQE_SIZE); + unsigned int max_tso_inline = + txq->max_tso_inline * + RTE_CACHE_LINE_SIZE - + MLX5_WQE_DWORD_SIZE; + unsigned int max_native_inline = + txq->max_inline ? 
+ txq->max_inline * + RTE_CACHE_LINE_SIZE - + MLX5_WQE_DWORD_SIZE : + 0; + unsigned int max_inline = + RTE_MAX(max_tso_inline, + max_native_inline); + uintptr_t addr_end = (addr + max_inline) & + ~(RTE_CACHE_LINE_SIZE - 1); + unsigned int copy_b; + uint16_t n; + + tso_header_sz = buf->l2_len + buf->l3_len + + buf->l4_len; + copy_b = tso_header_sz - pkt_inline_sz; + /* First seg must contain all headers. */ + assert(copy_b <= length); + raw += MLX5_WQE_DWORD_SIZE; + if (copy_b && + ((end - (uintptr_t)raw) > copy_b)) { +tso_inline: + n = (MLX5_WQE_DS(copy_b) - 1 + 3) / 4; + if (unlikely(max_wqe < n)) + break; + max_wqe -= n; + rte_memcpy((void *)raw, + (void *)addr, copy_b); + addr += copy_b; + length -= copy_b; + pkt_inline_sz += copy_b; + } else { + /* NOP WQE. */ + wqe->ctrl = (rte_v128u32_t){ + htonl(txq->wqe_ci << 8), + htonl(txq->qp_num_8s | 1), + 0, + 0, + }; + ds = 1; + total_length = 0; + pkts--; + pkts_n++; + elts_head = (elts_head - 1) & + (elts_n - 1); + k++; + goto next_wqe; + } + raw += MLX5_WQE_DS(copy_b) * + MLX5_WQE_DWORD_SIZE; + copy_b = (addr_end > addr) ? + RTE_MIN((addr_end - addr), length) : + 0; + if (copy_b) { + uint32_t inl = + htonl(copy_b | MLX5_INLINE_SEG); + + pkt_inline_sz = + MLX5_WQE_DS(tso_header_sz) * + MLX5_WQE_DWORD_SIZE; + rte_memcpy((void *)raw, + (void *)&inl, sizeof(inl)); + raw += sizeof(inl); + pkt_inline_sz += sizeof(inl); + goto tso_inline; + } else { + goto check_ptr; + } + } + } /* Inline if enough room. */ if (txq->max_inline) { uintptr_t end = (uintptr_t) @@ -496,6 +581,7 @@ length -= copy_b; pkt_inline_sz += copy_b; } +check_ptr: /* * 2 DWORDs consumed by the WQE header + ETH segment + * the size of the inline part of the packet. @@ -591,18 +677,34 @@ next_pkt: ++i; /* Initialize known and common part of the WQE structure. 
*/ - wqe->ctrl = (rte_v128u32_t){ - htonl((txq->wqe_ci << 8) | MLX5_OPCODE_SEND), - htonl(txq->qp_num_8s | ds), - 0, - 0, - }; - wqe->eseg = (rte_v128u32_t){ - 0, - cs_flags, - 0, - (ehdr << 16) | htons(pkt_inline_sz), - }; + if (tso) { + wqe->ctrl = (rte_v128u32_t){ + htonl((txq->wqe_ci << 8) | MLX5_OPCODE_TSO), + htonl(txq->qp_num_8s | ds), + 0, + 0, + }; + wqe->eseg = (rte_v128u32_t){ + 0, + cs_flags | (htons(buf->tso_segsz) << 16), + 0, + (ehdr << 16) | htons(tso_header_sz), + }; + } else { + wqe->ctrl = (rte_v128u32_t){ + htonl((txq->wqe_ci << 8) | MLX5_OPCODE_SEND), + htonl(txq->qp_num_8s | ds), + 0, + 0, + }; + wqe->eseg = (rte_v128u32_t){ + 0, + cs_flags, + 0, + (ehdr << 16) | htons(pkt_inline_sz), + }; + } +next_wqe: txq->wqe_ci += (ds + 3) / 4; #ifdef MLX5_PMD_SOFT_COUNTERS /* Increment sent bytes counter. */ @@ -610,10 +712,10 @@ #endif } while (pkts_n); /* Take a shortcut if nothing must be sent. */ - if (unlikely(i == 0)) + if (unlikely((i + k) == 0)) return 0; /* Check whether completion threshold has been reached. */ - comp = txq->elts_comp + i + j; + comp = txq->elts_comp + i + j + k; if (comp >= MLX5_TX_COMP_THRESH) { volatile struct mlx5_wqe_ctrl *w = (volatile struct mlx5_wqe_ctrl *)wqe; diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h index 41a34d7..a6b8bdd 100644 --- a/drivers/net/mlx5/mlx5_rxtx.h +++ b/drivers/net/mlx5/mlx5_rxtx.h @@ -254,6 +254,8 @@ struct txq { uint16_t cqe_n:4; /* Number of CQ elements (in log2). */ uint16_t wqe_n:4; /* Number of of WQ elements (in log2). */ uint16_t max_inline; /* Multiple of RTE_CACHE_LINE_SIZE to inline. */ + uint16_t max_tso_inline; + /* Multiple of RTE_CACHE_LINE_SIZE to inline. */ uint32_t qp_num_8s; /* QP number shifted by 8. */ volatile struct mlx5_cqe (*cqes)[]; /* Completion queue. */ volatile void *wqes; /* Work queue (use volatile to write into). 
*/ diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c index 949035b..8fe6328 100644 --- a/drivers/net/mlx5/mlx5_txq.c +++ b/drivers/net/mlx5/mlx5_txq.c @@ -343,6 +343,16 @@ attr.init.cap.max_inline_data = tmpl.txq.max_inline * RTE_CACHE_LINE_SIZE; } + if (priv->tso && !(conf->txq_flags & ETH_TXQ_FLAGS_NOTSOOFFL)) { + uint16_t max_tso_inline = ((MLX5_MAX_TSO_HEADER + + (RTE_CACHE_LINE_SIZE - 1)) / + RTE_CACHE_LINE_SIZE); + + attr.init.max_tso_header = + max_tso_inline * RTE_CACHE_LINE_SIZE; + attr.init.comp_mask |= IBV_EXP_QP_INIT_ATTR_MAX_TSO_HEADER; + tmpl.txq.max_tso_inline = max_tso_inline; + } tmpl.qp = ibv_exp_create_qp(priv->ctx, &attr.init); if (tmpl.qp == NULL) { ret = (errno ? errno : EINVAL); -- 1.8.3.1 ^ permalink raw reply [flat|nested] 14+ messages in thread
* [dpdk-dev] [PATCH v2 0/1] net/mlx5: add TSO support 2017-02-22 16:09 [dpdk-dev] [PATCH 0/4] net/mlx5 add TSO support Shahaf Shuler ` (3 preceding siblings ...) 2017-02-22 16:10 ` [dpdk-dev] [PATCH 4/4] net/mlx5: add hardware TSO support Shahaf Shuler @ 2017-03-01 11:11 ` Shahaf Shuler 2017-03-01 11:11 ` [dpdk-dev] [PATCH v2 1/1] net/mlx5: add hardware " Shahaf Shuler 2017-03-02 9:01 ` [dpdk-dev] [PATCH v3 " Shahaf Shuler 4 siblings, 2 replies; 14+ messages in thread From: Shahaf Shuler @ 2017-03-01 11:11 UTC (permalink / raw) To: nelio.laranjeiro, adrien.mazarguil; +Cc: dev on v2: * Suppressed patches: [PATCH 1/4] ethdev: add Tx offload limitations. [PATCH 2/4] ethdev: add TSO disable flag. [PATCH 3/4] app/testpmd: add TSO disable to test options. * The changes introduced by the above conflict with the tx_prepare API and break the ABI. A proposal to disable optional offloads by default, and a way to reflect HW offload limitations to the application, will be addressed in a different commit. * TSO support modifications [PATCH v2 1/1] net/mlx5: add hardware TSO support ^ permalink raw reply [flat|nested] 14+ messages in thread
* [dpdk-dev] [PATCH v2 1/1] net/mlx5: add hardware TSO support 2017-03-01 11:11 ` [dpdk-dev] [PATCH v2 0/1] net/mlx5: add " Shahaf Shuler @ 2017-03-01 11:11 ` Shahaf Shuler 2017-03-01 14:33 ` Nélio Laranjeiro 2017-03-02 9:01 ` [dpdk-dev] [PATCH v3 " Shahaf Shuler 1 sibling, 1 reply; 14+ messages in thread From: Shahaf Shuler @ 2017-03-01 11:11 UTC (permalink / raw) To: nelio.laranjeiro, adrien.mazarguil; +Cc: dev Implement support for hardware TSO. Signed-off-by: Shahaf Shuler <shahafs@mellanox.com> --- on v2: * Instead of exposing capability, TSO checks on data path. * PMD specific parameter to enable TSO. * different implementaion for the data path. Performance impact ~0.1-0.2Mpps --- doc/guides/nics/features/mlx5.ini | 1 + doc/guides/nics/mlx5.rst | 12 ++++ drivers/net/mlx5/mlx5.c | 18 ++++++ drivers/net/mlx5/mlx5.h | 2 + drivers/net/mlx5/mlx5_defs.h | 3 + drivers/net/mlx5/mlx5_ethdev.c | 2 + drivers/net/mlx5/mlx5_rxtx.c | 120 +++++++++++++++++++++++++++++++++----- drivers/net/mlx5/mlx5_rxtx.h | 2 + drivers/net/mlx5/mlx5_txq.c | 13 +++++ 9 files changed, 157 insertions(+), 16 deletions(-) diff --git a/doc/guides/nics/features/mlx5.ini b/doc/guides/nics/features/mlx5.ini index f20d214..c6948cb 100644 --- a/doc/guides/nics/features/mlx5.ini +++ b/doc/guides/nics/features/mlx5.ini @@ -19,6 +19,7 @@ RSS hash = Y RSS key update = Y RSS reta update = Y SR-IOV = Y +TSO = Y VLAN filter = Y Flow director = Y Flow API = Y diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst index 09922a0..8651456 100644 --- a/doc/guides/nics/mlx5.rst +++ b/doc/guides/nics/mlx5.rst @@ -90,6 +90,7 @@ Features - Secondary process TX is supported. - KVM and VMware ESX SR-IOV modes are supported. - RSS hash result is supported. +- Hardware TSO. Limitations ----------- @@ -186,9 +187,20 @@ Run-time configuration save PCI bandwidth and improve performance at the cost of a slightly higher CPU usage. + This option cannot be used in conjunction with ``tso`` below. 
When ``tso`` + is set, ``txq_mpw_en`` is disabled. + It is currently only supported on the ConnectX-4 Lx and ConnectX-5 families of adapters. Enabled by default. +- ``tso`` parameter [int] + + A nonzero value enables hardware TSO. + When hardware TSO is enabled, packets marked with TCP segmentation + offload will be divided into segments by the hardware. + + Disabled by default. + Prerequisites ------------- diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index d4bd469..3623fbe 100644 --- a/drivers/net/mlx5/mlx5.c +++ b/drivers/net/mlx5/mlx5.c @@ -84,6 +84,9 @@ /* Device parameter to enable multi-packet send WQEs. */ #define MLX5_TXQ_MPW_EN "txq_mpw_en" +/* Device parameter to enable hardware TSO offload. */ +#define MLX5_TSO "tso" + /** * Retrieve integer value from environment variable. * @@ -290,6 +293,8 @@ priv->txqs_inline = tmp; } else if (strcmp(MLX5_TXQ_MPW_EN, key) == 0) { priv->mps &= !!tmp; /* Enable MPW only if HW supports */ + } else if (strcmp(MLX5_TSO, key) == 0) { + priv->tso = !!tmp; } else { WARN("%s: unknown parameter", key); return -EINVAL; @@ -316,6 +321,7 @@ MLX5_TXQ_INLINE, MLX5_TXQS_MIN_INLINE, MLX5_TXQ_MPW_EN, + MLX5_TSO, NULL, }; struct rte_kvargs *kvlist; @@ -479,6 +485,7 @@ IBV_EXP_DEVICE_ATTR_RX_HASH | IBV_EXP_DEVICE_ATTR_VLAN_OFFLOADS | IBV_EXP_DEVICE_ATTR_RX_PAD_END_ALIGN | + IBV_EXP_DEVICE_ATTR_TSO_CAPS | 0; DEBUG("using port %u (%08" PRIx32 ")", port, test); @@ -580,11 +587,22 @@ priv_get_num_vfs(priv, &num_vfs); priv->sriov = (num_vfs || sriov); + priv->tso = ((priv->tso) && + (exp_device_attr.tso_caps.max_tso > 0) && + (exp_device_attr.tso_caps.supported_qpts & + (1 << IBV_QPT_RAW_ETH))); + if (priv->tso) + priv->max_tso_payload_sz = + exp_device_attr.tso_caps.max_tso; if (priv->mps && !mps) { ERROR("multi-packet send not supported on this device" " (" MLX5_TXQ_MPW_EN ")"); err = ENOTSUP; goto port_error; + } else if (priv->mps && priv->tso) { + WARN("multi-paclet send not supported in conjunction " + "with TSO. 
PMD overrides to 0"); + priv->mps = 0; } /* Allocate and register default RSS hash keys. */ priv->rss_conf = rte_calloc(__func__, hash_rxq_init_n, diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h index 2b4345a..d2bb835 100644 --- a/drivers/net/mlx5/mlx5.h +++ b/drivers/net/mlx5/mlx5.h @@ -126,6 +126,8 @@ struct priv { unsigned int mps:1; /* Whether multi-packet send is supported. */ unsigned int cqe_comp:1; /* Whether CQE compression is enabled. */ unsigned int pending_alarm:1; /* An alarm is pending. */ + unsigned int tso:1; /* Whether TSO is supported. */ + unsigned int max_tso_payload_sz; /* Maximum TCP payload for TSO. */ unsigned int txq_inline; /* Maximum packet size for inlining. */ unsigned int txqs_inline; /* Queue number threshold for inlining. */ /* RX/TX queues. */ diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h index e91d245..eecb908 100644 --- a/drivers/net/mlx5/mlx5_defs.h +++ b/drivers/net/mlx5/mlx5_defs.h @@ -79,4 +79,7 @@ /* Maximum number of extended statistics counters. */ #define MLX5_MAX_XSTATS 32 +/* Maximum Packet headers size (L2+L3+L4) for TSO. 
*/ +#define MLX5_MAX_TSO_HEADER 128 + #endif /* RTE_PMD_MLX5_DEFS_H_ */ diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c index 5677f03..5542193 100644 --- a/drivers/net/mlx5/mlx5_ethdev.c +++ b/drivers/net/mlx5/mlx5_ethdev.c @@ -693,6 +693,8 @@ struct priv * (DEV_TX_OFFLOAD_IPV4_CKSUM | DEV_TX_OFFLOAD_UDP_CKSUM | DEV_TX_OFFLOAD_TCP_CKSUM); + if (priv->tso) + info->tx_offload_capa |= DEV_TX_OFFLOAD_TCP_TSO; if (priv_get_ifname(priv, &ifname) == 0) info->if_index = if_nametoindex(ifname); /* FIXME: RETA update/query API expects the callee to know the size of diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c index b2b7223..3589aae 100644 --- a/drivers/net/mlx5/mlx5_rxtx.c +++ b/drivers/net/mlx5/mlx5_rxtx.c @@ -365,6 +365,7 @@ const unsigned int elts_n = 1 << txq->elts_n; unsigned int i = 0; unsigned int j = 0; + unsigned int k = 0; unsigned int max; uint16_t max_wqe; unsigned int comp; @@ -392,8 +393,10 @@ uintptr_t addr; uint64_t naddr; uint16_t pkt_inline_sz = MLX5_WQE_DWORD_SIZE + 2; + uint16_t tso_header_sz = 0; uint16_t ehdr; uint8_t cs_flags = 0; + uint64_t tso = 0; #ifdef MLX5_PMD_SOFT_COUNTERS uint32_t total_length = 0; #endif @@ -465,14 +468,71 @@ length -= pkt_inline_sz; addr += pkt_inline_sz; } + if (txq->tso_en) { + tso = buf->ol_flags & PKT_TX_TCP_SEG; + if (tso) { + uintptr_t end = (uintptr_t) + (((uintptr_t)txq->wqes) + + (1 << txq->wqe_n) * + MLX5_WQE_SIZE); + unsigned int copy_b; + + tso_header_sz = buf->l2_len + buf->l3_len + + buf->l4_len; + if (unlikely(tso_header_sz > + MLX5_MAX_TSO_HEADER)) + break; + copy_b = tso_header_sz - pkt_inline_sz; + /* First seg must contain all headers. 
*/ + assert(copy_b <= length); + raw += MLX5_WQE_DWORD_SIZE; + if (copy_b && + ((end - (uintptr_t)raw) > copy_b)) { + uint16_t n = (MLX5_WQE_DS(copy_b) - + 1 + 3) / 4; + + if (unlikely(max_wqe < n)) + break; + max_wqe -= n; + rte_memcpy((void *)raw, + (void *)addr, copy_b); + addr += copy_b; + length -= copy_b; + pkt_inline_sz += copy_b; + /* + * Another DWORD will be added + * in the inline part. + */ + raw += MLX5_WQE_DS(copy_b) * + MLX5_WQE_DWORD_SIZE - + MLX5_WQE_DWORD_SIZE; + } else { + /* NOP WQE. */ + wqe->ctrl = (rte_v128u32_t){ + htonl(txq->wqe_ci << 8), + htonl(txq->qp_num_8s | 1), + 0, + 0, + }; + ds = 1; + total_length = 0; + pkts--; + pkts_n++; + elts_head = (elts_head - 1) & + (elts_n - 1); + k++; + goto next_wqe; + } + } + } /* Inline if enough room. */ - if (txq->max_inline) { + if (txq->inline_en || tso) { uintptr_t end = (uintptr_t) (((uintptr_t)txq->wqes) + (1 << txq->wqe_n) * MLX5_WQE_SIZE); unsigned int max_inline = txq->max_inline * RTE_CACHE_LINE_SIZE - - MLX5_WQE_DWORD_SIZE; + (pkt_inline_sz - 2); uintptr_t addr_end = (addr + max_inline) & ~(RTE_CACHE_LINE_SIZE - 1); unsigned int copy_b = (addr_end > addr) ? @@ -491,6 +551,18 @@ if (unlikely(max_wqe < n)) break; max_wqe -= n; + if (tso) { + uint32_t inl = + htonl(copy_b | MLX5_INLINE_SEG); + + pkt_inline_sz = + MLX5_WQE_DS(tso_header_sz) * + MLX5_WQE_DWORD_SIZE; + rte_memcpy((void *)raw, + (void *)&inl, sizeof(inl)); + raw += sizeof(inl); + pkt_inline_sz += sizeof(inl); + } rte_memcpy((void *)raw, (void *)addr, copy_b); addr += copy_b; length -= copy_b; @@ -591,18 +663,34 @@ next_pkt: ++i; /* Initialize known and common part of the WQE structure. 
*/ - wqe->ctrl = (rte_v128u32_t){ - htonl((txq->wqe_ci << 8) | MLX5_OPCODE_SEND), - htonl(txq->qp_num_8s | ds), - 0, - 0, - }; - wqe->eseg = (rte_v128u32_t){ - 0, - cs_flags, - 0, - (ehdr << 16) | htons(pkt_inline_sz), - }; + if (tso) { + wqe->ctrl = (rte_v128u32_t){ + htonl((txq->wqe_ci << 8) | MLX5_OPCODE_TSO), + htonl(txq->qp_num_8s | ds), + 0, + 0, + }; + wqe->eseg = (rte_v128u32_t){ + 0, + cs_flags | (htons(buf->tso_segsz) << 16), + 0, + (ehdr << 16) | htons(tso_header_sz), + }; + } else { + wqe->ctrl = (rte_v128u32_t){ + htonl((txq->wqe_ci << 8) | MLX5_OPCODE_SEND), + htonl(txq->qp_num_8s | ds), + 0, + 0, + }; + wqe->eseg = (rte_v128u32_t){ + 0, + cs_flags, + 0, + (ehdr << 16) | htons(pkt_inline_sz), + }; + } +next_wqe: txq->wqe_ci += (ds + 3) / 4; #ifdef MLX5_PMD_SOFT_COUNTERS /* Increment sent bytes counter. */ @@ -610,10 +698,10 @@ #endif } while (pkts_n); /* Take a shortcut if nothing must be sent. */ - if (unlikely(i == 0)) + if (unlikely((i + k) == 0)) return 0; /* Check whether completion threshold has been reached. */ - comp = txq->elts_comp + i + j; + comp = txq->elts_comp + i + j + k; if (comp >= MLX5_TX_COMP_THRESH) { volatile struct mlx5_wqe_ctrl *w = (volatile struct mlx5_wqe_ctrl *)wqe; diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h index 41a34d7..6b328cf 100644 --- a/drivers/net/mlx5/mlx5_rxtx.h +++ b/drivers/net/mlx5/mlx5_rxtx.h @@ -254,6 +254,8 @@ struct txq { uint16_t cqe_n:4; /* Number of CQ elements (in log2). */ uint16_t wqe_n:4; /* Number of of WQ elements (in log2). */ uint16_t max_inline; /* Multiple of RTE_CACHE_LINE_SIZE to inline. */ + uint16_t inline_en:1; /* When set inline is enabled. */ + uint16_t tso_en:1; /* When set hardware TSO is enabled. */ uint32_t qp_num_8s; /* QP number shifted by 8. */ volatile struct mlx5_cqe (*cqes)[]; /* Completion queue. */ volatile void *wqes; /* Work queue (use volatile to write into). 
*/ diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c index 949035b..995b763 100644 --- a/drivers/net/mlx5/mlx5_txq.c +++ b/drivers/net/mlx5/mlx5_txq.c @@ -342,6 +342,19 @@ RTE_CACHE_LINE_SIZE); attr.init.cap.max_inline_data = tmpl.txq.max_inline * RTE_CACHE_LINE_SIZE; + tmpl.txq.inline_en = 1; + } + if (priv->tso) { + uint16_t max_tso_inline = ((MLX5_MAX_TSO_HEADER + + (RTE_CACHE_LINE_SIZE - 1)) / + RTE_CACHE_LINE_SIZE); + + attr.init.max_tso_header = + max_tso_inline * RTE_CACHE_LINE_SIZE; + attr.init.comp_mask |= IBV_EXP_QP_INIT_ATTR_MAX_TSO_HEADER; + tmpl.txq.max_inline = RTE_MAX(tmpl.txq.max_inline, + max_tso_inline); + tmpl.txq.tso_en = 1; } tmpl.qp = ibv_exp_create_qp(priv->ctx, &attr.init); if (tmpl.qp == NULL) { -- 1.8.3.1 ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [dpdk-dev] [PATCH v2 1/1] net/mlx5: add hardware TSO support 2017-03-01 11:11 ` [dpdk-dev] [PATCH v2 1/1] net/mlx5: add hardware " Shahaf Shuler @ 2017-03-01 14:33 ` Nélio Laranjeiro 0 siblings, 0 replies; 14+ messages in thread From: Nélio Laranjeiro @ 2017-03-01 14:33 UTC (permalink / raw) To: Shahaf Shuler; +Cc: adrien.mazarguil, dev Shahaf, A few remarks below. On Wed, Mar 01, 2017 at 01:11:42PM +0200, Shahaf Shuler wrote: > Implement support for hardware TSO. > > Signed-off-by: Shahaf Shuler <shahafs@mellanox.com> > --- > on v2: > * Instead of exposing capability, TSO checks on data path. > * PMD specific parameter to enable TSO. > * different implementaion for the data path. > Performance impact ~0.1-0.2Mpps > --- > doc/guides/nics/features/mlx5.ini | 1 + > doc/guides/nics/mlx5.rst | 12 ++++ > drivers/net/mlx5/mlx5.c | 18 ++++++ > drivers/net/mlx5/mlx5.h | 2 + > drivers/net/mlx5/mlx5_defs.h | 3 + > drivers/net/mlx5/mlx5_ethdev.c | 2 + > drivers/net/mlx5/mlx5_rxtx.c | 120 +++++++++++++++++++++++++++++++++----- > drivers/net/mlx5/mlx5_rxtx.h | 2 + > drivers/net/mlx5/mlx5_txq.c | 13 +++++ > 9 files changed, 157 insertions(+), 16 deletions(-) > > diff --git a/doc/guides/nics/features/mlx5.ini b/doc/guides/nics/features/mlx5.ini > index f20d214..c6948cb 100644 > --- a/doc/guides/nics/features/mlx5.ini > +++ b/doc/guides/nics/features/mlx5.ini > @@ -19,6 +19,7 @@ RSS hash = Y > RSS key update = Y > RSS reta update = Y > SR-IOV = Y > +TSO = Y This file expects spaces to align the equal sign. > VLAN filter = Y > Flow director = Y > Flow API = Y > diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst > index 09922a0..8651456 100644 > --- a/doc/guides/nics/mlx5.rst > +++ b/doc/guides/nics/mlx5.rst > @@ -90,6 +90,7 @@ Features > - Secondary process TX is supported. > - KVM and VMware ESX SR-IOV modes are supported. > - RSS hash result is supported. > +- Hardware TSO. 
> > Limitations > ----------- > @@ -186,9 +187,20 @@ Run-time configuration > save PCI bandwidth and improve performance at the cost of a slightly > higher CPU usage. > > + This option cannot be used in conjunction with ``tso`` below. When ``tso`` > + is set, ``txq_mpw_en`` is disabled. > + > It is currently only supported on the ConnectX-4 Lx and ConnectX-5 > families of adapters. Enabled by default. > > +- ``tso`` parameter [int] > + > + A nonzero value enables hardware TSO. > + When hardware TSO is enabled, packets marked with TCP segmentation > + offload will be divided into segments by the hardware. > + > + Disabled by default. > + > Prerequisites > ------------- > > diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c > index d4bd469..3623fbe 100644 > --- a/drivers/net/mlx5/mlx5.c > +++ b/drivers/net/mlx5/mlx5.c > @@ -84,6 +84,9 @@ > /* Device parameter to enable multi-packet send WQEs. */ > #define MLX5_TXQ_MPW_EN "txq_mpw_en" > > +/* Device parameter to enable hardware TSO offload. */ > +#define MLX5_TSO "tso" > + > /** > * Retrieve integer value from environment variable. 
> * > @@ -290,6 +293,8 @@ > priv->txqs_inline = tmp; > } else if (strcmp(MLX5_TXQ_MPW_EN, key) == 0) { > priv->mps &= !!tmp; /* Enable MPW only if HW supports */ > + } else if (strcmp(MLX5_TSO, key) == 0) { > + priv->tso = !!tmp; > } else { > WARN("%s: unknown parameter", key); > return -EINVAL; > @@ -316,6 +321,7 @@ > MLX5_TXQ_INLINE, > MLX5_TXQS_MIN_INLINE, > MLX5_TXQ_MPW_EN, > + MLX5_TSO, > NULL, > }; > struct rte_kvargs *kvlist; > @@ -479,6 +485,7 @@ > IBV_EXP_DEVICE_ATTR_RX_HASH | > IBV_EXP_DEVICE_ATTR_VLAN_OFFLOADS | > IBV_EXP_DEVICE_ATTR_RX_PAD_END_ALIGN | > + IBV_EXP_DEVICE_ATTR_TSO_CAPS | > 0; > > DEBUG("using port %u (%08" PRIx32 ")", port, test); > @@ -580,11 +587,22 @@ > > priv_get_num_vfs(priv, &num_vfs); > priv->sriov = (num_vfs || sriov); > + priv->tso = ((priv->tso) && > + (exp_device_attr.tso_caps.max_tso > 0) && > + (exp_device_attr.tso_caps.supported_qpts & > + (1 << IBV_QPT_RAW_ETH))); > + if (priv->tso) > + priv->max_tso_payload_sz = > + exp_device_attr.tso_caps.max_tso; > if (priv->mps && !mps) { > ERROR("multi-packet send not supported on this device" > " (" MLX5_TXQ_MPW_EN ")"); > err = ENOTSUP; > goto port_error; > + } else if (priv->mps && priv->tso) { > + WARN("multi-paclet send not supported in conjunction " > + "with TSO. PMD overrides to 0"); > + priv->mps = 0; > } It is not so clear which of both features are disabled unless reading the code. Is not it better to replace "PMD overrides to 0" by "MPS disabled"? > /* Allocate and register default RSS hash keys. */ > priv->rss_conf = rte_calloc(__func__, hash_rxq_init_n, > diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h > index 2b4345a..d2bb835 100644 > --- a/drivers/net/mlx5/mlx5.h > +++ b/drivers/net/mlx5/mlx5.h > @@ -126,6 +126,8 @@ struct priv { > unsigned int mps:1; /* Whether multi-packet send is supported. */ > unsigned int cqe_comp:1; /* Whether CQE compression is enabled. */ > unsigned int pending_alarm:1; /* An alarm is pending. 
*/ > + unsigned int tso:1; /* Whether TSO is supported. */ > + unsigned int max_tso_payload_sz; /* Maximum TCP payload for TSO. */ > unsigned int txq_inline; /* Maximum packet size for inlining. */ > unsigned int txqs_inline; /* Queue number threshold for inlining. */ > /* RX/TX queues. */ > diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h > index e91d245..eecb908 100644 > --- a/drivers/net/mlx5/mlx5_defs.h > +++ b/drivers/net/mlx5/mlx5_defs.h > @@ -79,4 +79,7 @@ > /* Maximum number of extended statistics counters. */ > #define MLX5_MAX_XSTATS 32 > > +/* Maximum Packet headers size (L2+L3+L4) for TSO. */ > +#define MLX5_MAX_TSO_HEADER 128 > + > #endif /* RTE_PMD_MLX5_DEFS_H_ */ > diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c > index 5677f03..5542193 100644 > --- a/drivers/net/mlx5/mlx5_ethdev.c > +++ b/drivers/net/mlx5/mlx5_ethdev.c > @@ -693,6 +693,8 @@ struct priv * > (DEV_TX_OFFLOAD_IPV4_CKSUM | > DEV_TX_OFFLOAD_UDP_CKSUM | > DEV_TX_OFFLOAD_TCP_CKSUM); > + if (priv->tso) > + info->tx_offload_capa |= DEV_TX_OFFLOAD_TCP_TSO; > if (priv_get_ifname(priv, &ifname) == 0) > info->if_index = if_nametoindex(ifname); > /* FIXME: RETA update/query API expects the callee to know the size of > diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c > index b2b7223..3589aae 100644 > --- a/drivers/net/mlx5/mlx5_rxtx.c > +++ b/drivers/net/mlx5/mlx5_rxtx.c > @@ -365,6 +365,7 @@ > const unsigned int elts_n = 1 << txq->elts_n; > unsigned int i = 0; > unsigned int j = 0; > + unsigned int k = 0; > unsigned int max; > uint16_t max_wqe; > unsigned int comp; > @@ -392,8 +393,10 @@ > uintptr_t addr; > uint64_t naddr; > uint16_t pkt_inline_sz = MLX5_WQE_DWORD_SIZE + 2; > + uint16_t tso_header_sz = 0; > uint16_t ehdr; > uint8_t cs_flags = 0; > + uint64_t tso = 0; > #ifdef MLX5_PMD_SOFT_COUNTERS > uint32_t total_length = 0; > #endif > @@ -465,14 +468,71 @@ > length -= pkt_inline_sz; > addr += pkt_inline_sz; > } > + if 
(txq->tso_en) { > + tso = buf->ol_flags & PKT_TX_TCP_SEG; > + if (tso) { > + uintptr_t end = (uintptr_t) > + (((uintptr_t)txq->wqes) + > + (1 << txq->wqe_n) * > + MLX5_WQE_SIZE); > + unsigned int copy_b; > + > + tso_header_sz = buf->l2_len + buf->l3_len + > + buf->l4_len; > + if (unlikely(tso_header_sz > > + MLX5_MAX_TSO_HEADER)) > + break; > + copy_b = tso_header_sz - pkt_inline_sz; > + /* First seg must contain all headers. */ > + assert(copy_b <= length); > + raw += MLX5_WQE_DWORD_SIZE; > + if (copy_b && > + ((end - (uintptr_t)raw) > copy_b)) { > + uint16_t n = (MLX5_WQE_DS(copy_b) - > + 1 + 3) / 4; > + > + if (unlikely(max_wqe < n)) > + break; > + max_wqe -= n; > + rte_memcpy((void *)raw, > + (void *)addr, copy_b); > + addr += copy_b; > + length -= copy_b; > + pkt_inline_sz += copy_b; > + /* > + * Another DWORD will be added > + * in the inline part. > + */ > + raw += MLX5_WQE_DS(copy_b) * > + MLX5_WQE_DWORD_SIZE - > + MLX5_WQE_DWORD_SIZE; > + } else { > + /* NOP WQE. */ > + wqe->ctrl = (rte_v128u32_t){ > + htonl(txq->wqe_ci << 8), > + htonl(txq->qp_num_8s | 1), > + 0, > + 0, > + }; > + ds = 1; > + total_length = 0; > + pkts--; > + pkts_n++; > + elts_head = (elts_head - 1) & > + (elts_n - 1); > + k++; > + goto next_wqe; > + } > + } > + } > /* Inline if enough room. */ > - if (txq->max_inline) { > + if (txq->inline_en || tso) { > uintptr_t end = (uintptr_t) > (((uintptr_t)txq->wqes) + > (1 << txq->wqe_n) * MLX5_WQE_SIZE); > unsigned int max_inline = txq->max_inline * > RTE_CACHE_LINE_SIZE - > - MLX5_WQE_DWORD_SIZE; > + (pkt_inline_sz - 2); > uintptr_t addr_end = (addr + max_inline) & > ~(RTE_CACHE_LINE_SIZE - 1); > unsigned int copy_b = (addr_end > addr) ? 
> @@ -491,6 +551,18 @@ > if (unlikely(max_wqe < n)) > break; > max_wqe -= n; > + if (tso) { > + uint32_t inl = > + htonl(copy_b | MLX5_INLINE_SEG); > + > + pkt_inline_sz = > + MLX5_WQE_DS(tso_header_sz) * > + MLX5_WQE_DWORD_SIZE; > + rte_memcpy((void *)raw, > + (void *)&inl, sizeof(inl)); > + raw += sizeof(inl); > + pkt_inline_sz += sizeof(inl); > + } > rte_memcpy((void *)raw, (void *)addr, copy_b); > addr += copy_b; > length -= copy_b; > @@ -591,18 +663,34 @@ > next_pkt: > ++i; > /* Initialize known and common part of the WQE structure. */ > - wqe->ctrl = (rte_v128u32_t){ > - htonl((txq->wqe_ci << 8) | MLX5_OPCODE_SEND), > - htonl(txq->qp_num_8s | ds), > - 0, > - 0, > - }; > - wqe->eseg = (rte_v128u32_t){ > - 0, > - cs_flags, > - 0, > - (ehdr << 16) | htons(pkt_inline_sz), > - }; > + if (tso) { > + wqe->ctrl = (rte_v128u32_t){ > + htonl((txq->wqe_ci << 8) | MLX5_OPCODE_TSO), > + htonl(txq->qp_num_8s | ds), > + 0, > + 0, > + }; > + wqe->eseg = (rte_v128u32_t){ > + 0, > + cs_flags | (htons(buf->tso_segsz) << 16), > + 0, > + (ehdr << 16) | htons(tso_header_sz), > + }; > + } else { > + wqe->ctrl = (rte_v128u32_t){ > + htonl((txq->wqe_ci << 8) | MLX5_OPCODE_SEND), > + htonl(txq->qp_num_8s | ds), > + 0, > + 0, > + }; > + wqe->eseg = (rte_v128u32_t){ > + 0, > + cs_flags, > + 0, > + (ehdr << 16) | htons(pkt_inline_sz), > + }; > + } > +next_wqe: > txq->wqe_ci += (ds + 3) / 4; > #ifdef MLX5_PMD_SOFT_COUNTERS > /* Increment sent bytes counter. */ > @@ -610,10 +698,10 @@ > #endif > } while (pkts_n); > /* Take a shortcut if nothing must be sent. */ > - if (unlikely(i == 0)) > + if (unlikely((i + k) == 0)) > return 0; > /* Check whether completion threshold has been reached. 
*/ > - comp = txq->elts_comp + i + j; > + comp = txq->elts_comp + i + j + k; > if (comp >= MLX5_TX_COMP_THRESH) { > volatile struct mlx5_wqe_ctrl *w = > (volatile struct mlx5_wqe_ctrl *)wqe; > diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h > index 41a34d7..6b328cf 100644 > --- a/drivers/net/mlx5/mlx5_rxtx.h > +++ b/drivers/net/mlx5/mlx5_rxtx.h > @@ -254,6 +254,8 @@ struct txq { > uint16_t cqe_n:4; /* Number of CQ elements (in log2). */ > uint16_t wqe_n:4; /* Number of of WQ elements (in log2). */ > uint16_t max_inline; /* Multiple of RTE_CACHE_LINE_SIZE to inline. */ > + uint16_t inline_en:1; /* When set inline is enabled. */ > + uint16_t tso_en:1; /* When set hardware TSO is enabled. */ > uint32_t qp_num_8s; /* QP number shifted by 8. */ > volatile struct mlx5_cqe (*cqes)[]; /* Completion queue. */ > volatile void *wqes; /* Work queue (use volatile to write into). */ > diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c > index 949035b..995b763 100644 > --- a/drivers/net/mlx5/mlx5_txq.c > +++ b/drivers/net/mlx5/mlx5_txq.c > @@ -342,6 +342,19 @@ > RTE_CACHE_LINE_SIZE); > attr.init.cap.max_inline_data = > tmpl.txq.max_inline * RTE_CACHE_LINE_SIZE; > + tmpl.txq.inline_en = 1; > + } > + if (priv->tso) { > + uint16_t max_tso_inline = ((MLX5_MAX_TSO_HEADER + > + (RTE_CACHE_LINE_SIZE - 1)) / > + RTE_CACHE_LINE_SIZE); > + > + attr.init.max_tso_header = > + max_tso_inline * RTE_CACHE_LINE_SIZE; > + attr.init.comp_mask |= IBV_EXP_QP_INIT_ATTR_MAX_TSO_HEADER; > + tmpl.txq.max_inline = RTE_MAX(tmpl.txq.max_inline, > + max_tso_inline); > + tmpl.txq.tso_en = 1; > } > tmpl.qp = ibv_exp_create_qp(priv->ctx, &attr.init); > if (tmpl.qp == NULL) { > -- > 1.8.3.1 Thanks, -- Nélio Laranjeiro 6WIND ^ permalink raw reply [flat|nested] 14+ messages in thread
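The TX-path hunk quoted above rejects any TSO packet whose whole L2+L3+L4 header exceeds MLX5_MAX_TSO_HEADER. A minimal standalone sketch of that bound check, assuming the constant from mlx5_defs.h (the helper name and header sizes are illustrative, not from the patch):

```c
#define MLX5_MAX_TSO_HEADER 128 /* from mlx5_defs.h in the patch */

/*
 * Mirrors the data-path condition: a TSO packet whose whole L2+L3+L4
 * header does not fit in MLX5_MAX_TSO_HEADER bytes cannot be inlined
 * into the WQE, so the burst loop gives up on it (break).
 * Helper name invented for illustration.
 */
int
tso_header_fits(unsigned int l2_len, unsigned int l3_len,
		unsigned int l4_len)
{
	unsigned int tso_header_sz = l2_len + l3_len + l4_len;

	return tso_header_sz <= MLX5_MAX_TSO_HEADER;
}
```

For a plain Ethernet (14) + IPv4 (20) + TCP (20) packet the 54-byte header fits; a heavily encapsulated 134-byte header would be rejected.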
* [dpdk-dev] [PATCH v3 1/1] net/mlx5: add hardware TSO support 2017-03-01 11:11 ` [dpdk-dev] [PATCH v2 0/1] net/mlx5: add " Shahaf Shuler 2017-03-01 11:11 ` [dpdk-dev] [PATCH v2 1/1] net/mlx5: add hardware " Shahaf Shuler @ 2017-03-02 9:01 ` Shahaf Shuler 2017-03-02 9:15 ` Nélio Laranjeiro 2017-03-06 8:50 ` Ferruh Yigit 1 sibling, 2 replies; 14+ messages in thread From: Shahaf Shuler @ 2017-03-02 9:01 UTC (permalink / raw) To: nelio.laranjeiro, adrien.mazarguil; +Cc: dev Implement support for hardware TSO. Signed-off-by: Shahaf Shuler <shahafs@mellanox.com> --- on v3: * fix alignment issues * for warn log on v2: * Instead of exposing capability, TSO checks on data path. * PMD specific parameter to enable TSO. * different implementaion for the data path. Performance impact ~0.1-0.2Mpps --- doc/guides/nics/features/mlx5.ini | 1 + doc/guides/nics/mlx5.rst | 12 ++++ drivers/net/mlx5/mlx5.c | 18 ++++++ drivers/net/mlx5/mlx5.h | 2 + drivers/net/mlx5/mlx5_defs.h | 3 + drivers/net/mlx5/mlx5_ethdev.c | 2 + drivers/net/mlx5/mlx5_rxtx.c | 120 +++++++++++++++++++++++++++++++++----- drivers/net/mlx5/mlx5_rxtx.h | 2 + drivers/net/mlx5/mlx5_txq.c | 13 +++++ 9 files changed, 157 insertions(+), 16 deletions(-) diff --git a/doc/guides/nics/features/mlx5.ini b/doc/guides/nics/features/mlx5.ini index f20d214..6e42150 100644 --- a/doc/guides/nics/features/mlx5.ini +++ b/doc/guides/nics/features/mlx5.ini @@ -19,6 +19,7 @@ RSS hash = Y RSS key update = Y RSS reta update = Y SR-IOV = Y +TSO = Y VLAN filter = Y Flow director = Y Flow API = Y diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst index 09922a0..8651456 100644 --- a/doc/guides/nics/mlx5.rst +++ b/doc/guides/nics/mlx5.rst @@ -90,6 +90,7 @@ Features - Secondary process TX is supported. - KVM and VMware ESX SR-IOV modes are supported. - RSS hash result is supported. +- Hardware TSO. 
Limitations ----------- @@ -186,9 +187,20 @@ Run-time configuration save PCI bandwidth and improve performance at the cost of a slightly higher CPU usage. + This option cannot be used in conjunction with ``tso`` below. When ``tso`` + is set, ``txq_mpw_en`` is disabled. + It is currently only supported on the ConnectX-4 Lx and ConnectX-5 families of adapters. Enabled by default. +- ``tso`` parameter [int] + + A nonzero value enables hardware TSO. + When hardware TSO is enabled, packets marked with TCP segmentation + offload will be divided into segments by the hardware. + + Disabled by default. + Prerequisites ------------- diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index d4bd469..03ed3b3 100644 --- a/drivers/net/mlx5/mlx5.c +++ b/drivers/net/mlx5/mlx5.c @@ -84,6 +84,9 @@ /* Device parameter to enable multi-packet send WQEs. */ #define MLX5_TXQ_MPW_EN "txq_mpw_en" +/* Device parameter to enable hardware TSO offload. */ +#define MLX5_TSO "tso" + /** * Retrieve integer value from environment variable. 
* @@ -290,6 +293,8 @@ priv->txqs_inline = tmp; } else if (strcmp(MLX5_TXQ_MPW_EN, key) == 0) { priv->mps &= !!tmp; /* Enable MPW only if HW supports */ + } else if (strcmp(MLX5_TSO, key) == 0) { + priv->tso = !!tmp; } else { WARN("%s: unknown parameter", key); return -EINVAL; @@ -316,6 +321,7 @@ MLX5_TXQ_INLINE, MLX5_TXQS_MIN_INLINE, MLX5_TXQ_MPW_EN, + MLX5_TSO, NULL, }; struct rte_kvargs *kvlist; @@ -479,6 +485,7 @@ IBV_EXP_DEVICE_ATTR_RX_HASH | IBV_EXP_DEVICE_ATTR_VLAN_OFFLOADS | IBV_EXP_DEVICE_ATTR_RX_PAD_END_ALIGN | + IBV_EXP_DEVICE_ATTR_TSO_CAPS | 0; DEBUG("using port %u (%08" PRIx32 ")", port, test); @@ -580,11 +587,22 @@ priv_get_num_vfs(priv, &num_vfs); priv->sriov = (num_vfs || sriov); + priv->tso = ((priv->tso) && + (exp_device_attr.tso_caps.max_tso > 0) && + (exp_device_attr.tso_caps.supported_qpts & + (1 << IBV_QPT_RAW_ETH))); + if (priv->tso) + priv->max_tso_payload_sz = + exp_device_attr.tso_caps.max_tso; if (priv->mps && !mps) { ERROR("multi-packet send not supported on this device" " (" MLX5_TXQ_MPW_EN ")"); err = ENOTSUP; goto port_error; + } else if (priv->mps && priv->tso) { + WARN("multi-packet send not supported in conjunction " + "with TSO. MPS disabled"); + priv->mps = 0; } /* Allocate and register default RSS hash keys. */ priv->rss_conf = rte_calloc(__func__, hash_rxq_init_n, diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h index 2b4345a..d2bb835 100644 --- a/drivers/net/mlx5/mlx5.h +++ b/drivers/net/mlx5/mlx5.h @@ -126,6 +126,8 @@ struct priv { unsigned int mps:1; /* Whether multi-packet send is supported. */ unsigned int cqe_comp:1; /* Whether CQE compression is enabled. */ unsigned int pending_alarm:1; /* An alarm is pending. */ + unsigned int tso:1; /* Whether TSO is supported. */ + unsigned int max_tso_payload_sz; /* Maximum TCP payload for TSO. */ unsigned int txq_inline; /* Maximum packet size for inlining. */ unsigned int txqs_inline; /* Queue number threshold for inlining. */ /* RX/TX queues. 
*/ diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h index e91d245..eecb908 100644 --- a/drivers/net/mlx5/mlx5_defs.h +++ b/drivers/net/mlx5/mlx5_defs.h @@ -79,4 +79,7 @@ /* Maximum number of extended statistics counters. */ #define MLX5_MAX_XSTATS 32 +/* Maximum Packet headers size (L2+L3+L4) for TSO. */ +#define MLX5_MAX_TSO_HEADER 128 + #endif /* RTE_PMD_MLX5_DEFS_H_ */ diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c index 5677f03..5542193 100644 --- a/drivers/net/mlx5/mlx5_ethdev.c +++ b/drivers/net/mlx5/mlx5_ethdev.c @@ -693,6 +693,8 @@ struct priv * (DEV_TX_OFFLOAD_IPV4_CKSUM | DEV_TX_OFFLOAD_UDP_CKSUM | DEV_TX_OFFLOAD_TCP_CKSUM); + if (priv->tso) + info->tx_offload_capa |= DEV_TX_OFFLOAD_TCP_TSO; if (priv_get_ifname(priv, &ifname) == 0) info->if_index = if_nametoindex(ifname); /* FIXME: RETA update/query API expects the callee to know the size of diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c index b2b7223..3589aae 100644 --- a/drivers/net/mlx5/mlx5_rxtx.c +++ b/drivers/net/mlx5/mlx5_rxtx.c @@ -365,6 +365,7 @@ const unsigned int elts_n = 1 << txq->elts_n; unsigned int i = 0; unsigned int j = 0; + unsigned int k = 0; unsigned int max; uint16_t max_wqe; unsigned int comp; @@ -392,8 +393,10 @@ uintptr_t addr; uint64_t naddr; uint16_t pkt_inline_sz = MLX5_WQE_DWORD_SIZE + 2; + uint16_t tso_header_sz = 0; uint16_t ehdr; uint8_t cs_flags = 0; + uint64_t tso = 0; #ifdef MLX5_PMD_SOFT_COUNTERS uint32_t total_length = 0; #endif @@ -465,14 +468,71 @@ length -= pkt_inline_sz; addr += pkt_inline_sz; } + if (txq->tso_en) { + tso = buf->ol_flags & PKT_TX_TCP_SEG; + if (tso) { + uintptr_t end = (uintptr_t) + (((uintptr_t)txq->wqes) + + (1 << txq->wqe_n) * + MLX5_WQE_SIZE); + unsigned int copy_b; + + tso_header_sz = buf->l2_len + buf->l3_len + + buf->l4_len; + if (unlikely(tso_header_sz > + MLX5_MAX_TSO_HEADER)) + break; + copy_b = tso_header_sz - pkt_inline_sz; + /* First seg must contain all 
headers. */ + assert(copy_b <= length); + raw += MLX5_WQE_DWORD_SIZE; + if (copy_b && + ((end - (uintptr_t)raw) > copy_b)) { + uint16_t n = (MLX5_WQE_DS(copy_b) - + 1 + 3) / 4; + + if (unlikely(max_wqe < n)) + break; + max_wqe -= n; + rte_memcpy((void *)raw, + (void *)addr, copy_b); + addr += copy_b; + length -= copy_b; + pkt_inline_sz += copy_b; + /* + * Another DWORD will be added + * in the inline part. + */ + raw += MLX5_WQE_DS(copy_b) * + MLX5_WQE_DWORD_SIZE - + MLX5_WQE_DWORD_SIZE; + } else { + /* NOP WQE. */ + wqe->ctrl = (rte_v128u32_t){ + htonl(txq->wqe_ci << 8), + htonl(txq->qp_num_8s | 1), + 0, + 0, + }; + ds = 1; + total_length = 0; + pkts--; + pkts_n++; + elts_head = (elts_head - 1) & + (elts_n - 1); + k++; + goto next_wqe; + } + } + } /* Inline if enough room. */ - if (txq->max_inline) { + if (txq->inline_en || tso) { uintptr_t end = (uintptr_t) (((uintptr_t)txq->wqes) + (1 << txq->wqe_n) * MLX5_WQE_SIZE); unsigned int max_inline = txq->max_inline * RTE_CACHE_LINE_SIZE - - MLX5_WQE_DWORD_SIZE; + (pkt_inline_sz - 2); uintptr_t addr_end = (addr + max_inline) & ~(RTE_CACHE_LINE_SIZE - 1); unsigned int copy_b = (addr_end > addr) ? @@ -491,6 +551,18 @@ if (unlikely(max_wqe < n)) break; max_wqe -= n; + if (tso) { + uint32_t inl = + htonl(copy_b | MLX5_INLINE_SEG); + + pkt_inline_sz = + MLX5_WQE_DS(tso_header_sz) * + MLX5_WQE_DWORD_SIZE; + rte_memcpy((void *)raw, + (void *)&inl, sizeof(inl)); + raw += sizeof(inl); + pkt_inline_sz += sizeof(inl); + } rte_memcpy((void *)raw, (void *)addr, copy_b); addr += copy_b; length -= copy_b; @@ -591,18 +663,34 @@ next_pkt: ++i; /* Initialize known and common part of the WQE structure. 
*/ - wqe->ctrl = (rte_v128u32_t){ - htonl((txq->wqe_ci << 8) | MLX5_OPCODE_SEND), - htonl(txq->qp_num_8s | ds), - 0, - 0, - }; - wqe->eseg = (rte_v128u32_t){ - 0, - cs_flags, - 0, - (ehdr << 16) | htons(pkt_inline_sz), - }; + if (tso) { + wqe->ctrl = (rte_v128u32_t){ + htonl((txq->wqe_ci << 8) | MLX5_OPCODE_TSO), + htonl(txq->qp_num_8s | ds), + 0, + 0, + }; + wqe->eseg = (rte_v128u32_t){ + 0, + cs_flags | (htons(buf->tso_segsz) << 16), + 0, + (ehdr << 16) | htons(tso_header_sz), + }; + } else { + wqe->ctrl = (rte_v128u32_t){ + htonl((txq->wqe_ci << 8) | MLX5_OPCODE_SEND), + htonl(txq->qp_num_8s | ds), + 0, + 0, + }; + wqe->eseg = (rte_v128u32_t){ + 0, + cs_flags, + 0, + (ehdr << 16) | htons(pkt_inline_sz), + }; + } +next_wqe: txq->wqe_ci += (ds + 3) / 4; #ifdef MLX5_PMD_SOFT_COUNTERS /* Increment sent bytes counter. */ @@ -610,10 +698,10 @@ #endif } while (pkts_n); /* Take a shortcut if nothing must be sent. */ - if (unlikely(i == 0)) + if (unlikely((i + k) == 0)) return 0; /* Check whether completion threshold has been reached. */ - comp = txq->elts_comp + i + j; + comp = txq->elts_comp + i + j + k; if (comp >= MLX5_TX_COMP_THRESH) { volatile struct mlx5_wqe_ctrl *w = (volatile struct mlx5_wqe_ctrl *)wqe; diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h index 41a34d7..6b328cf 100644 --- a/drivers/net/mlx5/mlx5_rxtx.h +++ b/drivers/net/mlx5/mlx5_rxtx.h @@ -254,6 +254,8 @@ struct txq { uint16_t cqe_n:4; /* Number of CQ elements (in log2). */ uint16_t wqe_n:4; /* Number of of WQ elements (in log2). */ uint16_t max_inline; /* Multiple of RTE_CACHE_LINE_SIZE to inline. */ + uint16_t inline_en:1; /* When set inline is enabled. */ + uint16_t tso_en:1; /* When set hardware TSO is enabled. */ uint32_t qp_num_8s; /* QP number shifted by 8. */ volatile struct mlx5_cqe (*cqes)[]; /* Completion queue. */ volatile void *wqes; /* Work queue (use volatile to write into). 
*/ diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c index 949035b..995b763 100644 --- a/drivers/net/mlx5/mlx5_txq.c +++ b/drivers/net/mlx5/mlx5_txq.c @@ -342,6 +342,19 @@ RTE_CACHE_LINE_SIZE); attr.init.cap.max_inline_data = tmpl.txq.max_inline * RTE_CACHE_LINE_SIZE; + tmpl.txq.inline_en = 1; + } + if (priv->tso) { + uint16_t max_tso_inline = ((MLX5_MAX_TSO_HEADER + + (RTE_CACHE_LINE_SIZE - 1)) / + RTE_CACHE_LINE_SIZE); + + attr.init.max_tso_header = + max_tso_inline * RTE_CACHE_LINE_SIZE; + attr.init.comp_mask |= IBV_EXP_QP_INIT_ATTR_MAX_TSO_HEADER; + tmpl.txq.max_inline = RTE_MAX(tmpl.txq.max_inline, + max_tso_inline); + tmpl.txq.tso_en = 1; } tmpl.qp = ibv_exp_create_qp(priv->ctx, &attr.init); if (tmpl.qp == NULL) { -- 1.8.3.1 ^ permalink raw reply [flat|nested] 14+ messages in thread
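The mlx5_txq.c hunk above sizes the inline area in whole cache lines. A standalone sketch of that arithmetic, using the constants from the patch and a typical 64-byte cache line (the function name is invented for illustration):

```c
#define RTE_CACHE_LINE_SIZE 64  /* typical x86 value */
#define MLX5_MAX_TSO_HEADER 128 /* from mlx5_defs.h in the patch */

/*
 * Same arithmetic as the mlx5_txq.c hunk: the TSO header budget is
 * rounded up to whole cache lines, and txq.max_inline (counted in
 * cache lines) is raised so an inlined TSO header always fits.
 */
unsigned int
txq_max_inline_with_tso(unsigned int max_inline)
{
	unsigned int max_tso_inline =
		(MLX5_MAX_TSO_HEADER + (RTE_CACHE_LINE_SIZE - 1)) /
		RTE_CACHE_LINE_SIZE;

	return max_inline > max_tso_inline ? max_inline : max_tso_inline;
}
```

With these values the 128 header bytes round up to 2 cache lines, so even a queue with inlining otherwise disabled (max_inline == 0) reserves 2 cache lines, and attr.init.max_tso_header becomes 2 * 64 = 128 bytes.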
* Re: [dpdk-dev] [PATCH v3 1/1] net/mlx5: add hardware TSO support 2017-03-02 9:01 ` [dpdk-dev] [PATCH v3 " Shahaf Shuler @ 2017-03-02 9:15 ` Nélio Laranjeiro 2017-03-06 9:32 ` Ferruh Yigit 2017-03-06 8:50 ` Ferruh Yigit 1 sibling, 1 reply; 14+ messages in thread From: Nélio Laranjeiro @ 2017-03-02 9:15 UTC (permalink / raw) To: Shahaf Shuler; +Cc: adrien.mazarguil, dev On Thu, Mar 02, 2017 at 11:01:31AM +0200, Shahaf Shuler wrote: > Implement support for hardware TSO. > > Signed-off-by: Shahaf Shuler <shahafs@mellanox.com> > --- > on v3: > * fix alignment issues > * for warn log > on v2: > * Instead of exposing capability, TSO checks on data path. > * PMD specific parameter to enable TSO. > * different implementaion for the data path. > Performance impact ~0.1-0.2Mpps > --- > doc/guides/nics/features/mlx5.ini | 1 + > doc/guides/nics/mlx5.rst | 12 ++++ > drivers/net/mlx5/mlx5.c | 18 ++++++ > drivers/net/mlx5/mlx5.h | 2 + > drivers/net/mlx5/mlx5_defs.h | 3 + > drivers/net/mlx5/mlx5_ethdev.c | 2 + > drivers/net/mlx5/mlx5_rxtx.c | 120 +++++++++++++++++++++++++++++++++----- > drivers/net/mlx5/mlx5_rxtx.h | 2 + > drivers/net/mlx5/mlx5_txq.c | 13 +++++ > 9 files changed, 157 insertions(+), 16 deletions(-) > > diff --git a/doc/guides/nics/features/mlx5.ini b/doc/guides/nics/features/mlx5.ini > index f20d214..6e42150 100644 > --- a/doc/guides/nics/features/mlx5.ini > +++ b/doc/guides/nics/features/mlx5.ini > @@ -19,6 +19,7 @@ RSS hash = Y > RSS key update = Y > RSS reta update = Y > SR-IOV = Y > +TSO = Y > VLAN filter = Y > Flow director = Y > Flow API = Y > diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst > index 09922a0..8651456 100644 > --- a/doc/guides/nics/mlx5.rst > +++ b/doc/guides/nics/mlx5.rst > @@ -90,6 +90,7 @@ Features > - Secondary process TX is supported. > - KVM and VMware ESX SR-IOV modes are supported. > - RSS hash result is supported. > +- Hardware TSO. 
> > Limitations > ----------- > @@ -186,9 +187,20 @@ Run-time configuration > save PCI bandwidth and improve performance at the cost of a slightly > higher CPU usage. > > + This option cannot be used in conjunction with ``tso`` below. When ``tso`` > + is set, ``txq_mpw_en`` is disabled. > + > It is currently only supported on the ConnectX-4 Lx and ConnectX-5 > families of adapters. Enabled by default. > > +- ``tso`` parameter [int] > + > + A nonzero value enables hardware TSO. > + When hardware TSO is enabled, packets marked with TCP segmentation > + offload will be divided into segments by the hardware. > + > + Disabled by default. > + > Prerequisites > ------------- > > diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c > index d4bd469..03ed3b3 100644 > --- a/drivers/net/mlx5/mlx5.c > +++ b/drivers/net/mlx5/mlx5.c > @@ -84,6 +84,9 @@ > /* Device parameter to enable multi-packet send WQEs. */ > #define MLX5_TXQ_MPW_EN "txq_mpw_en" > > +/* Device parameter to enable hardware TSO offload. */ > +#define MLX5_TSO "tso" > + > /** > * Retrieve integer value from environment variable. 
> * > @@ -290,6 +293,8 @@ > priv->txqs_inline = tmp; > } else if (strcmp(MLX5_TXQ_MPW_EN, key) == 0) { > priv->mps &= !!tmp; /* Enable MPW only if HW supports */ > + } else if (strcmp(MLX5_TSO, key) == 0) { > + priv->tso = !!tmp; > } else { > WARN("%s: unknown parameter", key); > return -EINVAL; > @@ -316,6 +321,7 @@ > MLX5_TXQ_INLINE, > MLX5_TXQS_MIN_INLINE, > MLX5_TXQ_MPW_EN, > + MLX5_TSO, > NULL, > }; > struct rte_kvargs *kvlist; > @@ -479,6 +485,7 @@ > IBV_EXP_DEVICE_ATTR_RX_HASH | > IBV_EXP_DEVICE_ATTR_VLAN_OFFLOADS | > IBV_EXP_DEVICE_ATTR_RX_PAD_END_ALIGN | > + IBV_EXP_DEVICE_ATTR_TSO_CAPS | > 0; > > DEBUG("using port %u (%08" PRIx32 ")", port, test); > @@ -580,11 +587,22 @@ > > priv_get_num_vfs(priv, &num_vfs); > priv->sriov = (num_vfs || sriov); > + priv->tso = ((priv->tso) && > + (exp_device_attr.tso_caps.max_tso > 0) && > + (exp_device_attr.tso_caps.supported_qpts & > + (1 << IBV_QPT_RAW_ETH))); > + if (priv->tso) > + priv->max_tso_payload_sz = > + exp_device_attr.tso_caps.max_tso; > if (priv->mps && !mps) { > ERROR("multi-packet send not supported on this device" > " (" MLX5_TXQ_MPW_EN ")"); > err = ENOTSUP; > goto port_error; > + } else if (priv->mps && priv->tso) { > + WARN("multi-packet send not supported in conjunction " > + "with TSO. MPS disabled"); > + priv->mps = 0; > } > /* Allocate and register default RSS hash keys. */ > priv->rss_conf = rte_calloc(__func__, hash_rxq_init_n, > diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h > index 2b4345a..d2bb835 100644 > --- a/drivers/net/mlx5/mlx5.h > +++ b/drivers/net/mlx5/mlx5.h > @@ -126,6 +126,8 @@ struct priv { > unsigned int mps:1; /* Whether multi-packet send is supported. */ > unsigned int cqe_comp:1; /* Whether CQE compression is enabled. */ > unsigned int pending_alarm:1; /* An alarm is pending. */ > + unsigned int tso:1; /* Whether TSO is supported. */ > + unsigned int max_tso_payload_sz; /* Maximum TCP payload for TSO. 
*/ > unsigned int txq_inline; /* Maximum packet size for inlining. */ > unsigned int txqs_inline; /* Queue number threshold for inlining. */ > /* RX/TX queues. */ > diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h > index e91d245..eecb908 100644 > --- a/drivers/net/mlx5/mlx5_defs.h > +++ b/drivers/net/mlx5/mlx5_defs.h > @@ -79,4 +79,7 @@ > /* Maximum number of extended statistics counters. */ > #define MLX5_MAX_XSTATS 32 > > +/* Maximum Packet headers size (L2+L3+L4) for TSO. */ > +#define MLX5_MAX_TSO_HEADER 128 > + > #endif /* RTE_PMD_MLX5_DEFS_H_ */ > diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c > index 5677f03..5542193 100644 > --- a/drivers/net/mlx5/mlx5_ethdev.c > +++ b/drivers/net/mlx5/mlx5_ethdev.c > @@ -693,6 +693,8 @@ struct priv * > (DEV_TX_OFFLOAD_IPV4_CKSUM | > DEV_TX_OFFLOAD_UDP_CKSUM | > DEV_TX_OFFLOAD_TCP_CKSUM); > + if (priv->tso) > + info->tx_offload_capa |= DEV_TX_OFFLOAD_TCP_TSO; > if (priv_get_ifname(priv, &ifname) == 0) > info->if_index = if_nametoindex(ifname); > /* FIXME: RETA update/query API expects the callee to know the size of > diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c > index b2b7223..3589aae 100644 > --- a/drivers/net/mlx5/mlx5_rxtx.c > +++ b/drivers/net/mlx5/mlx5_rxtx.c > @@ -365,6 +365,7 @@ > const unsigned int elts_n = 1 << txq->elts_n; > unsigned int i = 0; > unsigned int j = 0; > + unsigned int k = 0; > unsigned int max; > uint16_t max_wqe; > unsigned int comp; > @@ -392,8 +393,10 @@ > uintptr_t addr; > uint64_t naddr; > uint16_t pkt_inline_sz = MLX5_WQE_DWORD_SIZE + 2; > + uint16_t tso_header_sz = 0; > uint16_t ehdr; > uint8_t cs_flags = 0; > + uint64_t tso = 0; > #ifdef MLX5_PMD_SOFT_COUNTERS > uint32_t total_length = 0; > #endif > @@ -465,14 +468,71 @@ > length -= pkt_inline_sz; > addr += pkt_inline_sz; > } > + if (txq->tso_en) { > + tso = buf->ol_flags & PKT_TX_TCP_SEG; > + if (tso) { > + uintptr_t end = (uintptr_t) > + 
(((uintptr_t)txq->wqes) + > + (1 << txq->wqe_n) * > + MLX5_WQE_SIZE); > + unsigned int copy_b; > + > + tso_header_sz = buf->l2_len + buf->l3_len + > + buf->l4_len; > + if (unlikely(tso_header_sz > > + MLX5_MAX_TSO_HEADER)) > + break; > + copy_b = tso_header_sz - pkt_inline_sz; > + /* First seg must contain all headers. */ > + assert(copy_b <= length); > + raw += MLX5_WQE_DWORD_SIZE; > + if (copy_b && > + ((end - (uintptr_t)raw) > copy_b)) { > + uint16_t n = (MLX5_WQE_DS(copy_b) - > + 1 + 3) / 4; > + > + if (unlikely(max_wqe < n)) > + break; > + max_wqe -= n; > + rte_memcpy((void *)raw, > + (void *)addr, copy_b); > + addr += copy_b; > + length -= copy_b; > + pkt_inline_sz += copy_b; > + /* > + * Another DWORD will be added > + * in the inline part. > + */ > + raw += MLX5_WQE_DS(copy_b) * > + MLX5_WQE_DWORD_SIZE - > + MLX5_WQE_DWORD_SIZE; > + } else { > + /* NOP WQE. */ > + wqe->ctrl = (rte_v128u32_t){ > + htonl(txq->wqe_ci << 8), > + htonl(txq->qp_num_8s | 1), > + 0, > + 0, > + }; > + ds = 1; > + total_length = 0; > + pkts--; > + pkts_n++; > + elts_head = (elts_head - 1) & > + (elts_n - 1); > + k++; > + goto next_wqe; > + } > + } > + } > /* Inline if enough room. */ > - if (txq->max_inline) { > + if (txq->inline_en || tso) { > uintptr_t end = (uintptr_t) > (((uintptr_t)txq->wqes) + > (1 << txq->wqe_n) * MLX5_WQE_SIZE); > unsigned int max_inline = txq->max_inline * > RTE_CACHE_LINE_SIZE - > - MLX5_WQE_DWORD_SIZE; > + (pkt_inline_sz - 2); > uintptr_t addr_end = (addr + max_inline) & > ~(RTE_CACHE_LINE_SIZE - 1); > unsigned int copy_b = (addr_end > addr) ? 
> @@ -491,6 +551,18 @@ > if (unlikely(max_wqe < n)) > break; > max_wqe -= n; > + if (tso) { > + uint32_t inl = > + htonl(copy_b | MLX5_INLINE_SEG); > + > + pkt_inline_sz = > + MLX5_WQE_DS(tso_header_sz) * > + MLX5_WQE_DWORD_SIZE; > + rte_memcpy((void *)raw, > + (void *)&inl, sizeof(inl)); > + raw += sizeof(inl); > + pkt_inline_sz += sizeof(inl); > + } > rte_memcpy((void *)raw, (void *)addr, copy_b); > addr += copy_b; > length -= copy_b; > @@ -591,18 +663,34 @@ > next_pkt: > ++i; > /* Initialize known and common part of the WQE structure. */ > - wqe->ctrl = (rte_v128u32_t){ > - htonl((txq->wqe_ci << 8) | MLX5_OPCODE_SEND), > - htonl(txq->qp_num_8s | ds), > - 0, > - 0, > - }; > - wqe->eseg = (rte_v128u32_t){ > - 0, > - cs_flags, > - 0, > - (ehdr << 16) | htons(pkt_inline_sz), > - }; > + if (tso) { > + wqe->ctrl = (rte_v128u32_t){ > + htonl((txq->wqe_ci << 8) | MLX5_OPCODE_TSO), > + htonl(txq->qp_num_8s | ds), > + 0, > + 0, > + }; > + wqe->eseg = (rte_v128u32_t){ > + 0, > + cs_flags | (htons(buf->tso_segsz) << 16), > + 0, > + (ehdr << 16) | htons(tso_header_sz), > + }; > + } else { > + wqe->ctrl = (rte_v128u32_t){ > + htonl((txq->wqe_ci << 8) | MLX5_OPCODE_SEND), > + htonl(txq->qp_num_8s | ds), > + 0, > + 0, > + }; > + wqe->eseg = (rte_v128u32_t){ > + 0, > + cs_flags, > + 0, > + (ehdr << 16) | htons(pkt_inline_sz), > + }; > + } > +next_wqe: > txq->wqe_ci += (ds + 3) / 4; > #ifdef MLX5_PMD_SOFT_COUNTERS > /* Increment sent bytes counter. */ > @@ -610,10 +698,10 @@ > #endif > } while (pkts_n); > /* Take a shortcut if nothing must be sent. */ > - if (unlikely(i == 0)) > + if (unlikely((i + k) == 0)) > return 0; > /* Check whether completion threshold has been reached. 
*/ > - comp = txq->elts_comp + i + j; > + comp = txq->elts_comp + i + j + k; > if (comp >= MLX5_TX_COMP_THRESH) { > volatile struct mlx5_wqe_ctrl *w = > (volatile struct mlx5_wqe_ctrl *)wqe; > diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h > index 41a34d7..6b328cf 100644 > --- a/drivers/net/mlx5/mlx5_rxtx.h > +++ b/drivers/net/mlx5/mlx5_rxtx.h > @@ -254,6 +254,8 @@ struct txq { > uint16_t cqe_n:4; /* Number of CQ elements (in log2). */ > uint16_t wqe_n:4; /* Number of of WQ elements (in log2). */ > uint16_t max_inline; /* Multiple of RTE_CACHE_LINE_SIZE to inline. */ > + uint16_t inline_en:1; /* When set inline is enabled. */ > + uint16_t tso_en:1; /* When set hardware TSO is enabled. */ > uint32_t qp_num_8s; /* QP number shifted by 8. */ > volatile struct mlx5_cqe (*cqes)[]; /* Completion queue. */ > volatile void *wqes; /* Work queue (use volatile to write into). */ > diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c > index 949035b..995b763 100644 > --- a/drivers/net/mlx5/mlx5_txq.c > +++ b/drivers/net/mlx5/mlx5_txq.c > @@ -342,6 +342,19 @@ > RTE_CACHE_LINE_SIZE); > attr.init.cap.max_inline_data = > tmpl.txq.max_inline * RTE_CACHE_LINE_SIZE; > + tmpl.txq.inline_en = 1; > + } > + if (priv->tso) { > + uint16_t max_tso_inline = ((MLX5_MAX_TSO_HEADER + > + (RTE_CACHE_LINE_SIZE - 1)) / > + RTE_CACHE_LINE_SIZE); > + > + attr.init.max_tso_header = > + max_tso_inline * RTE_CACHE_LINE_SIZE; > + attr.init.comp_mask |= IBV_EXP_QP_INIT_ATTR_MAX_TSO_HEADER; > + tmpl.txq.max_inline = RTE_MAX(tmpl.txq.max_inline, > + max_tso_inline); > + tmpl.txq.tso_en = 1; > } > tmpl.qp = ibv_exp_create_qp(priv->ctx, &attr.init); > if (tmpl.qp == NULL) { > -- > 1.8.3.1 > Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com> -- Nélio Laranjeiro 6WIND ^ permalink raw reply [flat|nested] 14+ messages in thread
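As the quoted mlx5.rst hunk documents, the offload stays off until the ``tso`` device parameter is set. A hypothetical way to pass it through the EAL whitelist device arguments (the PCI address is illustrative; per the patch, ``txq_mpw_en`` is then forced off by the PMD):

```shell
# Illustrative only: enable the mlx5 "tso" run-time parameter via devargs.
testpmd -w 0000:05:00.0,tso=1 -- -i
```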
* Re: [dpdk-dev] [PATCH v3 1/1] net/mlx5: add hardware TSO support 2017-03-02 9:15 ` Nélio Laranjeiro @ 2017-03-06 9:32 ` Ferruh Yigit 0 siblings, 0 replies; 14+ messages in thread From: Ferruh Yigit @ 2017-03-06 9:32 UTC (permalink / raw) To: Nélio Laranjeiro, Shahaf Shuler; +Cc: adrien.mazarguil, dev On 3/2/2017 9:15 AM, Nélio Laranjeiro wrote: > On Thu, Mar 02, 2017 at 11:01:31AM +0200, Shahaf Shuler wrote: >> Implement support for hardware TSO. >> >> Signed-off-by: Shahaf Shuler <shahafs@mellanox.com> > Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com> Applied to dpdk-next-net/master, thanks. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [dpdk-dev] [PATCH v3 1/1] net/mlx5: add hardware TSO support 2017-03-02 9:01 ` [dpdk-dev] [PATCH v3 " Shahaf Shuler 2017-03-02 9:15 ` Nélio Laranjeiro @ 2017-03-06 8:50 ` Ferruh Yigit 2017-03-06 9:31 ` Ferruh Yigit 1 sibling, 1 reply; 14+ messages in thread From: Ferruh Yigit @ 2017-03-06 8:50 UTC (permalink / raw) To: Shahaf Shuler, nelio.laranjeiro, adrien.mazarguil; +Cc: dev On 3/2/2017 9:01 AM, Shahaf Shuler wrote: > Implement support for hardware TSO. > > Signed-off-by: Shahaf Shuler <shahafs@mellanox.com> > --- > on v3: > * fix alignment issues > * for warn log > on v2: > * Instead of exposing capability, TSO checks on data path. > * PMD specific parameter to enable TSO. > * different implementation for the data path. > Performance impact ~0.1-0.2Mpps Hi Shahaf, I think it is a good idea to update the release notes to announce mlx5 TSO support, what do you think? And if you send a new version of the patch, can you please put the "TSO" flag in the same order as default.ini? Thanks, ferruh ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [dpdk-dev] [PATCH v3 1/1] net/mlx5: add hardware TSO support
  2017-03-06  8:50 ` Ferruh Yigit
@ 2017-03-06  9:31 ` Ferruh Yigit
  2017-03-06 11:03   ` Shahaf Shuler
  0 siblings, 1 reply; 14+ messages in thread
From: Ferruh Yigit @ 2017-03-06  9:31 UTC (permalink / raw)
To: Shahaf Shuler, nelio.laranjeiro, adrien.mazarguil; +Cc: dev

On 3/6/2017 8:50 AM, Ferruh Yigit wrote:
> On 3/2/2017 9:01 AM, Shahaf Shuler wrote:
>> Implement support for hardware TSO.
>>
>> Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
>> ---
>> on v3:
>> * fix alignment issues
>> * fix warn log
>> on v2:
>> * Instead of exposing capability, TSO checks on data path.
>> * PMD specific parameter to enable TSO.
>> * different implementation for the data path.
>> Performance impact ~0.1-0.2Mpps
>
> Hi Shahaf,
>
> I think it is a good idea to update the release notes to announce mlx5
> TSO support, what do you think?

Since [1] depends on this patch, I will get both, but can you please send
a separate patch for the release notes?

[1] http://dpdk.org/dev/patchwork/patch/21065/

> And if you send a new version of the patch, can you please put the
> "TSO" flag in the same order as in default.ini?

I will update this.

> Thanks,
> ferruh

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: [dpdk-dev] [PATCH v3 1/1] net/mlx5: add hardware TSO support
  2017-03-06  9:31 ` Ferruh Yigit
@ 2017-03-06 11:03 ` Shahaf Shuler
  0 siblings, 0 replies; 14+ messages in thread
From: Shahaf Shuler @ 2017-03-06 11:03 UTC (permalink / raw)
To: Ferruh Yigit, Nélio Laranjeiro, Adrien Mazarguil; +Cc: dev

Monday, March 6, 2017 11:31 AM, Ferruh Yigit:
> On 3/6/2017 8:50 AM, Ferruh Yigit wrote:
> > On 3/2/2017 9:01 AM, Shahaf Shuler wrote:
> >> Implement support for hardware TSO.
> >>
> >> Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
> >> ---
> >> on v3:
> >> * fix alignment issues
> >> * fix warn log
> >> on v2:
> >> * Instead of exposing capability, TSO checks on data path.
> >> * PMD specific parameter to enable TSO.
> >> * different implementation for the data path.
> >> Performance impact ~0.1-0.2Mpps
> >
> > Hi Shahaf,
> >
> > I think it is a good idea to update the release notes to announce mlx5
> > TSO support, what do you think?
>
> Since [1] depends on this patch, I will get both, but can you please send
> a separate patch for the release notes?

Yes. I will work on one.

> [1] http://dpdk.org/dev/patchwork/patch/21065/
>
> > And if you send a new version of the patch, can you please put the
> > "TSO" flag in the same order as in default.ini?
>
> I will update this.
>
> > Thanks,
> > ferruh

^ permalink raw reply	[flat|nested] 14+ messages in thread
end of thread, other threads:[~2017-03-06 11:03 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-22 16:09 [dpdk-dev] [PATCH 0/4] net/mlx5 add TSO support Shahaf Shuler
2017-02-22 16:09 ` [dpdk-dev] [PATCH 1/4] ethdev: add Tx offload limitations Shahaf Shuler
2017-02-22 16:09 ` [dpdk-dev] [PATCH 2/4] ethdev: add TSO disable flag Shahaf Shuler
2017-02-22 16:09 ` [dpdk-dev] [PATCH 3/4] app/testpmd: add TSO disable to test options Shahaf Shuler
2017-02-22 16:10 ` [dpdk-dev] [PATCH 4/4] net/mlx5: add hardware TSO support Shahaf Shuler
2017-03-01 11:11   ` [dpdk-dev] [PATCH v2 0/1] net/mlx5: add " Shahaf Shuler
2017-03-01 11:11     ` [dpdk-dev] [PATCH v2 1/1] net/mlx5: add hardware " Shahaf Shuler
2017-03-01 14:33       ` Nélio Laranjeiro
2017-03-02  9:01         ` [dpdk-dev] [PATCH v3 " Shahaf Shuler
2017-03-02  9:15           ` Nélio Laranjeiro
2017-03-06  9:32             ` Ferruh Yigit
2017-03-06  8:50           ` Ferruh Yigit
2017-03-06  9:31             ` Ferruh Yigit
2017-03-06 11:03               ` Shahaf Shuler