* [dpdk-dev] [PATCH 0/2] LACP control packet filtering offload @ 2017-05-27 11:27 Tomasz Kulasek 2017-05-27 11:27 ` [dpdk-dev] [PATCH 1/2] " Tomasz Kulasek ` (2 more replies) 0 siblings, 3 replies; 22+ messages in thread From: Tomasz Kulasek @ 2017-05-27 11:27 UTC (permalink / raw) To: dev; +Cc: declan.doherty 1. Overview Packet processing in the current path for bonding in mode 4, requires parse all packets in the fast path, to classify and process LACP packets. The idea of performance improvement is to use hardware offloads to improve packet classification. 2. Scope of work a) Optimization of software LACP packet classification by using packet_type metadata to eliminate the requirement of parsing each packet in the received burst. b) Implementation of classification mechanism using flow director to redirect LACP packets to the dedicated queue (not visible by application). - Filter pattern choosing (not all filters are supported by all devices), - Changing processing path to speed up non-LACP packets processing, - Handle LACP packets from dedicated Rx queue and send to the dedicated Tx queue, c) Creation of fallback mechanism allowing to select the most preferable method of processing: - Flow director, - Packet type metadata, - Software parsing, 3. Implementation 3.1. Packet type The packet_type approach would result in a performance improvement as packets data would no longer be required to be read, but with this approach the bonded driver would still need to look at the mbuf of each packet thereby having an impact on the achievable Rx performance. There's not packet_type value describing LACP packets directly. However, it can be used to limit number of packets required to be parsed, e.g. if packet_type indicates >L2 packets. It should improve performance while well-known non-LACP packets can be skipped without the need to look up into its data. 3.2. Flow director Using rte_flow API and pattern on ethernet type of packet (0x8809), we can configure flow director to redirect slow packets to separated queue. An independent Rx queues for LACP would remove the requirement to filter all ingress traffic in sw which should result in a performance increase. Other queues stay untouched and processing of packets on the fast path will be reduced to simple packet collecting from slaves. Separated Tx queue for LACP daemon allows to send LACP responses immediately, without interfering into Tx fast path. RECEIVE .---------------. | Slave 0 | | .------. | | Fd | Rxq | | Rx ======o==>| |==============. | | +======+ | | .---------------. | `-->| LACP |--------. | | Bonding | | `------' | | | | .------. | `---------------' | | | | | | | >============>| |=======> Rx .---------------. | | | +======+ | | Slave 1 | | | | | XXXX | | | .------. | | | | `------' | | Fd | Rxq | | | | `---------------' Rx ======o==>| |==============' .-----------. | | +======+ | | / \ | `-->| LACP |--------+----------->+ LACP DAEMON | | `------' | Tx <---\ / `---------------' `-----------' All slow packets received by slaves in bonding are redirected to the separated queue using flow director. Other packets are collected from slaves and exposed to the application with Rx burst on bonded device. TRANSMIT .---------------. | Slave 0 | | .------. | | | | | Tx <=====+===| |<=============. | | |------| | | .---------------. | `---| LACP |<-------. | | Bonding | | `------' | | | | .------. | `---------------' | | | | | | | +<============| |<====== Tx .---------------. | | | +======+ | | Slave 1 | | | | | XXXX | | | .------. 
| | | | `------' | | | | | | | `---------------' Tx <=====+===| |<=============' Rx .-----------. | | |------| | | `-->/ \ | `---| LACP |<-------+------------+ LACP DAEMON | | `------' | \ / `---------------' `-----------' On transmit, packets are propagated on the slaves. While we have separated Tx queue for LACP responses, it can be sent regardless of the fast path. LACP DAEMON In this mode whole slow packets are handled in LACP DAEMON. Tomasz Kulasek (2): LACP control packet filtering offload test-pmd: add set bonding slow_queue hw/sw app/test-pmd/cmdline.c | 58 ++++ drivers/net/bonding/rte_eth_bond_8023ad.c | 141 +++++++-- drivers/net/bonding/rte_eth_bond_8023ad.h | 6 + drivers/net/bonding/rte_eth_bond_8023ad_private.h | 15 + drivers/net/bonding/rte_eth_bond_pmd.c | 345 +++++++++++++++++++++- drivers/net/bonding/rte_eth_bond_version.map | 9 + 6 files changed, 539 insertions(+), 35 deletions(-) -- 1.9.1 ^ permalink raw reply [flat|nested] 22+ messages in thread
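For readers less familiar with rte_flow, the redirection rule sketched in section 3.2 can be expressed roughly as below. This is a minimal, self-contained illustration rather than code from the patch; the function name redirect_lacp_to_queue() and the slow_rx_queue parameter are made up, and it assumes the slave port supports EtherType matching combined with the QUEUE action.

#include <stdint.h>
#include <rte_byteorder.h>
#include <rte_ether.h>
#include <rte_flow.h>

/* Steer IEEE 802.3 slow-protocol frames (EtherType 0x8809) received on
 * port_id to the dedicated queue slow_rx_queue. Returns 0 on success. */
static int
redirect_lacp_to_queue(uint8_t port_id, uint16_t slow_rx_queue)
{
        struct rte_flow_error error;
        struct rte_flow *flow;

        /* Ingress-only rule, default group and priority. */
        const struct rte_flow_attr attr = { .ingress = 1 };

        /* Match on EtherType only; MAC addresses stay fully wildcarded. */
        const struct rte_flow_item_eth eth_spec = {
                .type = rte_cpu_to_be_16(ETHER_TYPE_SLOW),
        };
        const struct rte_flow_item_eth eth_mask = {
                .type = 0xFFFF,
        };
        const struct rte_flow_item pattern[] = {
                {
                        .type = RTE_FLOW_ITEM_TYPE_ETH,
                        .spec = &eth_spec,
                        .mask = &eth_mask,
                },
                { .type = RTE_FLOW_ITEM_TYPE_END },
        };

        /* Send every matching frame to the slow queue. */
        const struct rte_flow_action_queue queue = { .index = slow_rx_queue };
        const struct rte_flow_action actions[] = {
                { .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &queue },
                { .type = RTE_FLOW_ACTION_TYPE_END },
        };

        /* Check that the port can handle the rule before creating it. */
        if (rte_flow_validate(port_id, &attr, pattern, actions, &error) < 0)
                return -1;

        /* The handle would normally be kept for rte_flow_destroy() later. */
        flow = rte_flow_create(port_id, &attr, pattern, actions, &error);
        return flow == NULL ? -1 : 0;
}

In the patch below the same attribute/pattern/action data is kept as static definitions and the rule is validated and created per slave during slave configuration.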
* [dpdk-dev] [PATCH 1/2] LACP control packet filtering offload 2017-05-27 11:27 [dpdk-dev] [PATCH 0/2] LACP control packet filtering offload Tomasz Kulasek @ 2017-05-27 11:27 ` Tomasz Kulasek 2017-05-29 8:10 ` Adrien Mazarguil 2017-06-29 9:18 ` Declan Doherty 2017-05-27 11:27 ` [dpdk-dev] [PATCH 2/2] test-pmd: add set bonding slow_queue hw/sw Tomasz Kulasek 2017-06-29 16:20 ` [dpdk-dev] [PATCH v2 0/2] LACP control packet filtering offload Tomasz Kulasek 2 siblings, 2 replies; 22+ messages in thread From: Tomasz Kulasek @ 2017-05-27 11:27 UTC (permalink / raw) To: dev; +Cc: declan.doherty New API funtions implemented: rte_eth_bond_8023ad_slow_queue_enable(uint8_t port_id); rte_eth_bond_8023ad_slow_queue_disable(uint8_t port_id); rte_eth_bond_8023ad_slow_queue_enable should be called before bonding port start to enable new path. When this option is enabled all slaves must support flow director's filtering by ethernet type and support one additional queue on slaves tx/rx. Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com> --- drivers/net/bonding/rte_eth_bond_8023ad.c | 141 +++++++-- drivers/net/bonding/rte_eth_bond_8023ad.h | 6 + drivers/net/bonding/rte_eth_bond_8023ad_private.h | 15 + drivers/net/bonding/rte_eth_bond_pmd.c | 345 +++++++++++++++++++++- drivers/net/bonding/rte_eth_bond_version.map | 9 + 5 files changed, 481 insertions(+), 35 deletions(-) diff --git a/drivers/net/bonding/rte_eth_bond_8023ad.c b/drivers/net/bonding/rte_eth_bond_8023ad.c index 7b863d6..125eb45 100644 --- a/drivers/net/bonding/rte_eth_bond_8023ad.c +++ b/drivers/net/bonding/rte_eth_bond_8023ad.c @@ -632,12 +632,20 @@ lacpdu->tlv_type_terminator = TLV_TYPE_TERMINATOR_INFORMATION; lacpdu->terminator_length = 0; - if (rte_ring_enqueue(port->tx_ring, lacp_pkt) == -ENOBUFS) { - /* If TX ring full, drop packet and free message. Retransmission - * will happen in next function call. */ - rte_pktmbuf_free(lacp_pkt); - set_warning_flags(port, WRN_TX_QUEUE_FULL); - return; + if (internals->mode4.slow_rx_queue == 0) { + if (rte_ring_enqueue(port->tx_ring, lacp_pkt) == -ENOBUFS) { + /* If TX ring full, drop packet and free message. Retransmission + * will happen in next function call. */ + rte_pktmbuf_free(lacp_pkt); + set_warning_flags(port, WRN_TX_QUEUE_FULL); + return; + } + } else { + if (rte_eth_tx_burst(slave_id, internals->mode4.slow_tx_queue, &lacp_pkt, 1) == 0) { + rte_pktmbuf_free(lacp_pkt); + set_warning_flags(port, WRN_TX_QUEUE_FULL); + return; + } } MODE4_DEBUG("sending LACP frame\n"); @@ -741,6 +749,25 @@ } static void +rx_machine_update(struct bond_dev_private *internals, uint8_t slave_id, + struct rte_mbuf *lacp_pkt) { + + /* Find LACP packet to this port. Do not check subtype, it is done in + * function that queued packet */ + if (lacp_pkt != NULL) { + struct lacpdu_header *lacp; + + lacp = rte_pktmbuf_mtod(lacp_pkt, struct lacpdu_header *); + RTE_ASSERT(lacp->lacpdu.subtype == SLOW_SUBTYPE_LACP); + + /* This is LACP frame so pass it to rx_machine */ + rx_machine(internals, slave_id, &lacp->lacpdu); + rte_pktmbuf_free(lacp_pkt); + } else + rx_machine(internals, slave_id, NULL); +} + +static void bond_mode_8023ad_periodic_cb(void *arg) { struct rte_eth_dev *bond_dev = arg; @@ -809,20 +836,21 @@ SM_FLAG_SET(port, LACP_ENABLED); - /* Find LACP packet to this port. 
Do not check subtype, it is done in - * function that queued packet */ - if (rte_ring_dequeue(port->rx_ring, &pkt) == 0) { - struct rte_mbuf *lacp_pkt = pkt; - struct lacpdu_header *lacp; + struct rte_mbuf *lacp_pkt = NULL; - lacp = rte_pktmbuf_mtod(lacp_pkt, struct lacpdu_header *); - RTE_ASSERT(lacp->lacpdu.subtype == SLOW_SUBTYPE_LACP); + if (internals->mode4.slow_rx_queue == 0) { + /* Find LACP packet to this port. Do not check subtype, it is done in + * function that queued packet */ + if (rte_ring_dequeue(port->rx_ring, &pkt) == 0) + lacp_pkt = pkt; - /* This is LACP frame so pass it to rx_machine */ - rx_machine(internals, slave_id, &lacp->lacpdu); - rte_pktmbuf_free(lacp_pkt); - } else - rx_machine(internals, slave_id, NULL); + rx_machine_update(internals, slave_id, lacp_pkt); + } else { + if (rte_eth_rx_burst(slave_id, internals->mode4.slow_rx_queue, &lacp_pkt, 1) == 1) + bond_mode_8023ad_handle_slow_pkt(internals, slave_id, lacp_pkt); + else + rx_machine_update(internals, slave_id, NULL); + } periodic_machine(internals, slave_id); mux_machine(internals, slave_id); @@ -1188,18 +1216,36 @@ m_hdr->marker.tlv_type_marker = MARKER_TLV_TYPE_RESP; rte_eth_macaddr_get(slave_id, &m_hdr->eth_hdr.s_addr); - if (unlikely(rte_ring_enqueue(port->tx_ring, pkt) == -ENOBUFS)) { - /* reset timer */ - port->rx_marker_timer = 0; - wrn = WRN_TX_QUEUE_FULL; - goto free_out; + if (internals->mode4.slow_tx_queue == 0) { + if (unlikely(rte_ring_enqueue(port->tx_ring, pkt) == + -ENOBUFS)) { + /* reset timer */ + port->rx_marker_timer = 0; + wrn = WRN_TX_QUEUE_FULL; + goto free_out; + } + } else { + /* Send packet directly to the slow queue */ + if (unlikely(rte_eth_tx_burst(slave_id, + internals->mode4.slow_tx_queue, + &pkt, 1) == 0)) { + /* reset timer */ + port->rx_marker_timer = 0; + wrn = WRN_TX_QUEUE_FULL; + goto free_out; + } } } else if (likely(subtype == SLOW_SUBTYPE_LACP)) { - if (unlikely(rte_ring_enqueue(port->rx_ring, pkt) == -ENOBUFS)) { - /* If RX fing full free lacpdu message and drop packet */ - wrn = WRN_RX_QUEUE_FULL; - goto free_out; - } + + if (internals->mode4.slow_rx_queue == 0) { + if (unlikely(rte_ring_enqueue(port->rx_ring, pkt) == -ENOBUFS)) { + /* If RX fing full free lacpdu message and drop packet */ + wrn = WRN_RX_QUEUE_FULL; + goto free_out; + } + } else + rx_machine_update(internals, slave_id, pkt); + } else { wrn = WRN_UNKNOWN_SLOW_TYPE; goto free_out; @@ -1504,3 +1550,42 @@ rte_eal_alarm_set(internals->mode4.update_timeout_us, bond_mode_8023ad_ext_periodic_cb, arg); } + +#define MBUF_CACHE_SIZE 250 +#define NUM_MBUFS 8191 + +int +rte_eth_bond_8023ad_slow_queue_enable(uint8_t port) +{ + int retval = 0; + struct rte_eth_dev *dev = &rte_eth_devices[port]; + struct bond_dev_private *internals = (struct bond_dev_private *) + dev->data->dev_private; + + if (check_for_bonded_ethdev(dev) != 0) + return -1; + + internals->mode4.slow_rx_queue = dev->data->nb_rx_queues; + internals->mode4.slow_tx_queue = dev->data->nb_tx_queues; + + bond_ethdev_mode_set(dev, internals->mode); + return retval; +} + +int +rte_eth_bond_8023ad_slow_queue_disable(uint8_t port) +{ + int retval = 0; + struct rte_eth_dev *dev = &rte_eth_devices[port]; + struct bond_dev_private *internals = (struct bond_dev_private *) + dev->data->dev_private; + + if (check_for_bonded_ethdev(dev) != 0) + return -1; + + internals->mode4.slow_rx_queue = 0; + internals->mode4.slow_tx_queue = 0; + + bond_ethdev_mode_set(dev, internals->mode); + return retval; +} diff --git a/drivers/net/bonding/rte_eth_bond_8023ad.h 
b/drivers/net/bonding/rte_eth_bond_8023ad.h index 6b8ff57..8d21c7a 100644 --- a/drivers/net/bonding/rte_eth_bond_8023ad.h +++ b/drivers/net/bonding/rte_eth_bond_8023ad.h @@ -302,4 +302,10 @@ struct rte_eth_bond_8023ad_slave_info { rte_eth_bond_8023ad_ext_slowtx(uint8_t port_id, uint8_t slave_id, struct rte_mbuf *lacp_pkt); +int +rte_eth_bond_8023ad_slow_queue_enable(uint8_t port_id); + +int +rte_eth_bond_8023ad_slow_queue_disable(uint8_t port_id); + #endif /* RTE_ETH_BOND_8023AD_H_ */ diff --git a/drivers/net/bonding/rte_eth_bond_8023ad_private.h b/drivers/net/bonding/rte_eth_bond_8023ad_private.h index ca8858b..3963714 100644 --- a/drivers/net/bonding/rte_eth_bond_8023ad_private.h +++ b/drivers/net/bonding/rte_eth_bond_8023ad_private.h @@ -39,6 +39,7 @@ #include <rte_ether.h> #include <rte_byteorder.h> #include <rte_atomic.h> +#include <rte_flow.h> #include "rte_eth_bond_8023ad.h" @@ -162,6 +163,9 @@ struct port { uint64_t warning_timer; volatile uint16_t warnings_to_show; + + /** Memory pool used to allocate slow queues */ + struct rte_mempool *slow_pool; }; struct mode8023ad_private { @@ -175,6 +179,10 @@ struct mode8023ad_private { uint64_t update_timeout_us; rte_eth_bond_8023ad_ext_slowrx_fn slowrx_cb; uint8_t external_sm; + + uint8_t slow_rx_queue; /**< Queue no for slow packets, or 0 if no accel */ + uint8_t slow_tx_queue; + struct rte_flow *slow_flow[RTE_MAX_ETHPORTS]; }; /** @@ -295,4 +303,11 @@ struct mode8023ad_private { void bond_mode_8023ad_mac_address_update(struct rte_eth_dev *bond_dev); +int +bond_ethdev_8023ad_flow_verify(struct rte_eth_dev *bond_dev, + uint8_t slave_port); + +int +bond_ethdev_8023ad_flow_set(struct rte_eth_dev *bond_dev, uint8_t slave_port); + #endif /* RTE_ETH_BOND_8023AD_H_ */ diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c index 82959ab..558682c 100644 --- a/drivers/net/bonding/rte_eth_bond_pmd.c +++ b/drivers/net/bonding/rte_eth_bond_pmd.c @@ -59,6 +59,12 @@ /* Table for statistics in mode 5 TLB */ static uint64_t tlb_last_obytets[RTE_MAX_ETHPORTS]; +#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ +#define _htons(x) ((uint16_t)((((x) & 0x00ffU) << 8) | (((x) & 0xff00U) >> 8))) +#else +#define _htons(x) (x) +#endif + static inline size_t get_vlan_offset(struct ether_hdr *eth_hdr, uint16_t *proto) { @@ -133,6 +139,215 @@ (subtype == SLOW_SUBTYPE_MARKER || subtype == SLOW_SUBTYPE_LACP)); } +/***************************************************************************** + * Flow director's setup for mode 4 optimization + */ + +static struct rte_flow_item_eth flow_item_eth_type_8023ad = { + .dst.addr_bytes = { 0 }, + .src.addr_bytes = { 0 }, + .type = _htons(ETHER_TYPE_SLOW), +}; + +static struct rte_flow_item_eth flow_item_eth_mask_type_8023ad = { + .dst.addr_bytes = { 0 }, + .src.addr_bytes = { 0 }, + .type = 0xFFFF, +}; + +static struct rte_flow_item flow_item_8023ad[] = { + { + .type = RTE_FLOW_ITEM_TYPE_ETH, + .spec = &flow_item_eth_type_8023ad, + .last = NULL, + .mask = &flow_item_eth_mask_type_8023ad, + }, + { + .type = RTE_FLOW_ITEM_TYPE_END, + .spec = NULL, + .last = NULL, + .mask = NULL, + } +}; + +const struct rte_flow_attr flow_attr_8023ad = { + .group = 0, + .priority = 0, + .ingress = 1, + .egress = 0, + .reserved = 0, +}; + +int +bond_ethdev_8023ad_flow_verify(struct rte_eth_dev *bond_dev, + uint8_t slave_port) { + + struct rte_flow_error error; + struct bond_dev_private *internals = (struct bond_dev_private *) + (bond_dev->data->dev_private); + + struct rte_flow_action_queue lacp_queue_conf = { + 
.index = internals->mode4.slow_rx_queue, + }; + + const struct rte_flow_action actions[] = { + { + .type = RTE_FLOW_ACTION_TYPE_QUEUE, + .conf = &lacp_queue_conf + }, + { + .type = RTE_FLOW_ACTION_TYPE_END, + } + }; + + int ret = rte_flow_validate(slave_port, &flow_attr_8023ad, + flow_item_8023ad, actions, &error); + if (ret < 0) + return -1; + + return 0; +} + +int +bond_ethdev_8023ad_flow_set(struct rte_eth_dev *bond_dev, uint8_t slave_port) { + + struct rte_flow_error error; + struct bond_dev_private *internals = (struct bond_dev_private *) + (bond_dev->data->dev_private); + + struct rte_flow_action_queue lacp_queue_conf = { + .index = internals->mode4.slow_rx_queue, + }; + + const struct rte_flow_action actions[] = { + { + .type = RTE_FLOW_ACTION_TYPE_QUEUE, + .conf = &lacp_queue_conf + }, + { + .type = RTE_FLOW_ACTION_TYPE_END, + } + }; + + internals->mode4.slow_flow[slave_port] = rte_flow_create(slave_port, + &flow_attr_8023ad, flow_item_8023ad, actions, &error); + if (internals->mode4.slow_flow[slave_port] == NULL) { + RTE_BOND_LOG(ERR, + "bond_ethdev_8023ad_flow_set: %s (slave_port=%d queue_id=%d)", + error.message, slave_port, internals->mode4.slow_rx_queue); + return -1; + } + + return 0; +} + +static uint16_t +bond_ethdev_rx_burst_8023ad_fast_queue(void *queue, struct rte_mbuf **bufs, + uint16_t nb_pkts) +{ + struct bond_rx_queue *bd_rx_q = (struct bond_rx_queue *)queue; + struct bond_dev_private *internals = bd_rx_q->dev_private; + uint16_t num_rx_total = 0; /* Total number of received packets */ + uint8_t slaves[RTE_MAX_ETHPORTS]; + uint8_t slave_count; + + uint8_t i; + + /* Copy slave list to protect against slave up/down changes during tx + * bursting */ + slave_count = internals->active_slave_count; + memcpy(slaves, internals->active_slaves, + sizeof(internals->active_slaves[0]) * slave_count); + + for (i = 0; i < slave_count && num_rx_total < nb_pkts; i++) { + /* Read packets from this slave */ + num_rx_total += rte_eth_rx_burst(slaves[i], bd_rx_q->queue_id, + &bufs[num_rx_total], nb_pkts - num_rx_total); + } + + return num_rx_total; +} + +static uint16_t +bond_ethdev_tx_burst_8023ad_fast_queue(void *queue, struct rte_mbuf **bufs, + uint16_t nb_pkts) +{ + struct bond_dev_private *internals; + struct bond_tx_queue *bd_tx_q; + + uint8_t num_of_slaves; + uint8_t slaves[RTE_MAX_ETHPORTS]; + /* positions in slaves, not ID */ + uint8_t distributing_offsets[RTE_MAX_ETHPORTS]; + uint8_t distributing_count; + + uint16_t num_tx_slave, num_tx_total = 0, num_tx_fail_total = 0; + uint16_t i, op_slave_idx; + + struct rte_mbuf *slave_bufs[RTE_MAX_ETHPORTS][nb_pkts]; + + /* Total amount of packets in slave_bufs */ + uint16_t slave_nb_pkts[RTE_MAX_ETHPORTS] = { 0 }; + /* Slow packets placed in each slave */ + + if (unlikely(nb_pkts == 0)) + return 0; + + bd_tx_q = (struct bond_tx_queue *)queue; + internals = bd_tx_q->dev_private; + + /* Copy slave list to protect against slave up/down changes during tx + * bursting */ + num_of_slaves = internals->active_slave_count; + if (num_of_slaves < 1) + return num_tx_total; + + memcpy(slaves, internals->active_slaves, sizeof(slaves[0]) * + num_of_slaves); + + distributing_count = 0; + for (i = 0; i < num_of_slaves; i++) { + struct port *port = &mode_8023ad_ports[slaves[i]]; + if (ACTOR_STATE(port, DISTRIBUTING)) + distributing_offsets[distributing_count++] = i; + } + + if (likely(distributing_count > 0)) { + /* Populate slaves mbuf with the packets which are to be sent on it */ + for (i = 0; i < nb_pkts; i++) { + /* Select output slave using hash 
based on xmit policy */ + op_slave_idx = internals->xmit_hash(bufs[i], distributing_count); + + /* Populate slave mbuf arrays with mbufs for that slave. Use only + * slaves that are currently distributing. */ + uint8_t slave_offset = distributing_offsets[op_slave_idx]; + slave_bufs[slave_offset][slave_nb_pkts[slave_offset]] = bufs[i]; + slave_nb_pkts[slave_offset]++; + } + } + + /* Send packet burst on each slave device */ + for (i = 0; i < num_of_slaves; i++) { + if (slave_nb_pkts[i] == 0) + continue; + + num_tx_slave = rte_eth_tx_burst(slaves[i], bd_tx_q->queue_id, + slave_bufs[i], slave_nb_pkts[i]); + + num_tx_total += num_tx_slave; + num_tx_fail_total += slave_nb_pkts[i] - num_tx_slave; + + /* If tx burst fails move packets to end of bufs */ + if (unlikely(num_tx_slave < slave_nb_pkts[i])) { + uint16_t j = nb_pkts - num_tx_fail_total; + for ( ; num_tx_slave < slave_nb_pkts[i]; j++, num_tx_slave++) + bufs[j] = slave_bufs[i][num_tx_slave]; + } + } + + return num_tx_total; +} + static uint16_t bond_ethdev_rx_burst_8023ad(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts) @@ -180,6 +395,13 @@ /* Handle slow protocol packets. */ while (j < num_rx_total) { + + /* if packet is not pure L2 and is known, skip it */ + if ((bufs[j]->packet_type & ~RTE_PTYPE_L2_ETHER) != 0) { + j++; + continue; + } + if (j + 3 < num_rx_total) rte_prefetch0(rte_pktmbuf_mtod(bufs[j + 3], void *)); @@ -1295,11 +1517,19 @@ struct bwg_slave { if (bond_mode_8023ad_enable(eth_dev) != 0) return -1; - eth_dev->rx_pkt_burst = bond_ethdev_rx_burst_8023ad; - eth_dev->tx_pkt_burst = bond_ethdev_tx_burst_8023ad; - RTE_LOG(WARNING, PMD, - "Using mode 4, it is necessary to do TX burst and RX burst " - "at least every 100ms.\n"); + if (internals->mode4.slow_rx_queue == 0) { + eth_dev->rx_pkt_burst = bond_ethdev_rx_burst_8023ad; + eth_dev->tx_pkt_burst = bond_ethdev_tx_burst_8023ad; + RTE_LOG(WARNING, PMD, + "Using mode 4, it is necessary to do TX burst " + "and RX burst at least every 100ms.\n"); + } else { + /* Use flow director's optimization */ + eth_dev->rx_pkt_burst = + bond_ethdev_rx_burst_8023ad_fast_queue; + eth_dev->tx_pkt_burst = + bond_ethdev_tx_burst_8023ad_fast_queue; + } break; case BONDING_MODE_TLB: eth_dev->tx_pkt_burst = bond_ethdev_tx_burst_tlb; @@ -1321,6 +1551,72 @@ struct bwg_slave { return 0; } +static int +slave_configure_slow_queue(struct rte_eth_dev *bonded_eth_dev, + struct rte_eth_dev *slave_eth_dev) +{ + int errval = 0; + struct bond_dev_private *internals = (struct bond_dev_private *) + bonded_eth_dev->data->dev_private; + struct port *port = &mode_8023ad_ports[slave_eth_dev->data->port_id]; + + if ((internals->mode != BONDING_MODE_8023AD) || + (internals->mode4.slow_rx_queue == 0) || + (internals->mode4.slow_tx_queue == 0)) + return 0; + + if (port->slow_pool == NULL) { + char mem_name[256]; + int slave_id = slave_eth_dev->data->port_id; + + snprintf(mem_name, RTE_DIM(mem_name), "slave_port%u_slow_pool", + slave_id); + port->slow_pool = rte_pktmbuf_pool_create(mem_name, 8191, + 250, 0, RTE_MBUF_DEFAULT_BUF_SIZE, + slave_eth_dev->data->numa_node); + + /* Any memory allocation failure in initialization is critical because + * resources can't be free, so reinitialization is impossible. 
*/ + if (port->slow_pool == NULL) { + rte_panic("Slave %u: Failed to create memory pool '%s': %s\n", + slave_id, mem_name, rte_strerror(rte_errno)); + } + } + + if (internals->mode4.slow_rx_queue > 0) { + /* Configure slow Rx queue */ + + errval = rte_eth_rx_queue_setup(slave_eth_dev->data->port_id, + internals->mode4.slow_rx_queue, 128, + rte_eth_dev_socket_id(slave_eth_dev->data->port_id), + NULL, port->slow_pool); + if (errval != 0) { + RTE_BOND_LOG(ERR, + "rte_eth_rx_queue_setup: port=%d queue_id %d, err (%d)", + slave_eth_dev->data->port_id, + internals->mode4.slow_rx_queue, + errval); + return errval; + } + } + + if (internals->mode4.slow_tx_queue > 0) { + errval = rte_eth_tx_queue_setup(slave_eth_dev->data->port_id, + internals->mode4.slow_tx_queue, 512, + rte_eth_dev_socket_id(slave_eth_dev->data->port_id), + NULL); + if (errval != 0) { + RTE_BOND_LOG(ERR, + "rte_eth_tx_queue_setup: port=%d queue_id %d, err (%d)", + slave_eth_dev->data->port_id, + internals->mode4.slow_tx_queue, + errval); + return errval; + } + } + return 0; +} + int slave_configure(struct rte_eth_dev *bonded_eth_dev, struct rte_eth_dev *slave_eth_dev) @@ -1330,6 +1626,10 @@ struct bwg_slave { int errval; uint16_t q_id; + struct rte_flow_error flow_error; + + struct bond_dev_private *internals = (struct bond_dev_private *) + bonded_eth_dev->data->dev_private; /* Stop slave */ rte_eth_dev_stop(slave_eth_dev->data->port_id); @@ -1359,10 +1659,19 @@ struct bwg_slave { slave_eth_dev->data->dev_conf.rxmode.hw_vlan_filter = bonded_eth_dev->data->dev_conf.rxmode.hw_vlan_filter; + uint16_t nb_rx_queues = bonded_eth_dev->data->nb_rx_queues; + uint16_t nb_tx_queues = bonded_eth_dev->data->nb_tx_queues; + + if (internals->mode == BONDING_MODE_8023AD) { + if (internals->mode4.slow_rx_queue > 0) + nb_rx_queues++; + if (internals->mode4.slow_tx_queue > 0) + nb_tx_queues++; + } + /* Configure device */ errval = rte_eth_dev_configure(slave_eth_dev->data->port_id, - bonded_eth_dev->data->nb_rx_queues, - bonded_eth_dev->data->nb_tx_queues, + nb_rx_queues, nb_tx_queues, &(slave_eth_dev->data->dev_conf)); if (errval != 0) { RTE_BOND_LOG(ERR, "Cannot configure slave device: port %u , err (%d)", @@ -1402,6 +1711,28 @@ struct bwg_slave { } } + slave_configure_slow_queue(bonded_eth_dev, slave_eth_dev); + + if ((internals->mode == BONDING_MODE_8023AD) && + (internals->mode4.slow_rx_queue > 0)) { + + if (bond_ethdev_8023ad_flow_verify(bonded_eth_dev, + slave_eth_dev->data->port_id) != 0) { + RTE_BOND_LOG(ERR, + "rte_eth_tx_queue_setup: port=%d queue_id %d, err (%d)", + slave_eth_dev->data->port_id, q_id, errval); + return -1; + } + + if (internals->mode4.slow_flow[slave_eth_dev->data->port_id] != NULL) + rte_flow_destroy(slave_eth_dev->data->port_id, + internals->mode4.slow_flow[slave_eth_dev->data->port_id], + &flow_error); + + bond_ethdev_8023ad_flow_set(bonded_eth_dev, + slave_eth_dev->data->port_id); + } + /* Start device */ errval = rte_eth_dev_start(slave_eth_dev->data->port_id); if (errval != 0) { diff --git a/drivers/net/bonding/rte_eth_bond_version.map b/drivers/net/bonding/rte_eth_bond_version.map index 2de0a7d..6f1f13a 100644 --- a/drivers/net/bonding/rte_eth_bond_version.map +++ b/drivers/net/bonding/rte_eth_bond_version.map @@ -43,3 +43,12 @@ DPDK_16.07 { rte_eth_bond_8023ad_setup; } DPDK_16.04; + +DPDK_17.08 { + global: + + rte_eth_bond_8023ad_slow_queue_enable; + rte_eth_bond_8023ad_slow_queue_disable; + + local: *; +} DPDK_16.07; -- 1.9.1 ^ permalink raw reply [flat|nested] 22+ messages in thread
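To show how the new calls are intended to be used from an application, here is a rough setup sketch, not part of the patch. It assumes the bonded device was already created in mode 4 (e.g. with rte_eth_bond_create()); the helper name, queue counts, descriptor numbers and mempool are illustrative. The key point is that rte_eth_bond_8023ad_slow_queue_enable() is called on the configured but not yet started bonded port, and that every slave must be able to provide one extra Rx/Tx queue plus EtherType filtering.

#include <stdint.h>
#include <rte_ethdev.h>
#include <rte_mempool.h>
#include <rte_eth_bond.h>
#include <rte_eth_bond_8023ad.h>

static const struct rte_eth_conf bond_port_conf;        /* all defaults */

static int
setup_bond_8023ad_hw_filtering(uint8_t bond_port, const uint8_t *slave_ports,
                uint8_t n_slaves, uint16_t nb_queues,
                struct rte_mempool *mb_pool)
{
        uint16_t q;
        uint8_t i;

        for (i = 0; i < n_slaves; i++)
                if (rte_eth_bond_slave_add(bond_port, slave_ports[i]) != 0)
                        return -1;

        /* The application keeps using nb_queues Rx/Tx queues on the bonded
         * port; the extra slow queue is added per slave internally. */
        if (rte_eth_dev_configure(bond_port, nb_queues, nb_queues,
                        &bond_port_conf) != 0)
                return -1;

        for (q = 0; q < nb_queues; q++) {
                if (rte_eth_rx_queue_setup(bond_port, q, 128,
                                rte_eth_dev_socket_id(bond_port), NULL,
                                mb_pool) != 0)
                        return -1;
                if (rte_eth_tx_queue_setup(bond_port, q, 512,
                                rte_eth_dev_socket_id(bond_port), NULL) != 0)
                        return -1;
        }

        /* Enable the hardware slow-packet path before the port is started;
         * falling back to the software path on failure is up to the caller. */
        if (rte_eth_bond_8023ad_slow_queue_enable(bond_port) != 0)
                return -1;

        return rte_eth_dev_start(bond_port);
}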
* Re: [dpdk-dev] [PATCH 1/2] LACP control packet filtering offload 2017-05-27 11:27 ` [dpdk-dev] [PATCH 1/2] " Tomasz Kulasek @ 2017-05-29 8:10 ` Adrien Mazarguil 2017-06-29 9:18 ` Declan Doherty 1 sibling, 0 replies; 22+ messages in thread From: Adrien Mazarguil @ 2017-05-29 8:10 UTC (permalink / raw) To: Tomasz Kulasek; +Cc: dev, declan.doherty Hi Tomasz, On Sat, May 27, 2017 at 01:27:43PM +0200, Tomasz Kulasek wrote: > New API funtions implemented: > > rte_eth_bond_8023ad_slow_queue_enable(uint8_t port_id); > rte_eth_bond_8023ad_slow_queue_disable(uint8_t port_id); > > rte_eth_bond_8023ad_slow_queue_enable should be called before bonding port > start to enable new path. > > When this option is enabled all slaves must support flow director's > filtering by ethernet type and support one additional queue on slaves > tx/rx. > > Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com> [...] > diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c > index 82959ab..558682c 100644 > --- a/drivers/net/bonding/rte_eth_bond_pmd.c > +++ b/drivers/net/bonding/rte_eth_bond_pmd.c > @@ -59,6 +59,12 @@ > /* Table for statistics in mode 5 TLB */ > static uint64_t tlb_last_obytets[RTE_MAX_ETHPORTS]; > > +#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ > +#define _htons(x) ((uint16_t)((((x) & 0x00ffU) << 8) | (((x) & 0xff00U) >> 8))) > +#else > +#define _htons(x) (x) > +#endif > + [...] > static inline size_t > get_vlan_offset(struct ether_hdr *eth_hdr, uint16_t *proto) > { > @@ -133,6 +139,215 @@ > (subtype == SLOW_SUBTYPE_MARKER || subtype == SLOW_SUBTYPE_LACP)); > } > > +/***************************************************************************** > + * Flow director's setup for mode 4 optimization > + */ > + > +static struct rte_flow_item_eth flow_item_eth_type_8023ad = { > + .dst.addr_bytes = { 0 }, > + .src.addr_bytes = { 0 }, > + .type = _htons(ETHER_TYPE_SLOW), > +}; Might I interest you in a more generic alternative [1]? [1] http://dpdk.org/ml/archives/dev/2017-May/066097.html -- Adrien Mazarguil 6WIND ^ permalink raw reply [flat|nested] 22+ messages in thread
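As the v2 changelog later in this thread notes, the locally defined _htons() was eventually replaced with the RTE_BE16() constant byte-order macro from rte_byteorder.h. A minimal sketch of the resulting initializer, assuming a DPDK revision that provides RTE_BE16():

#include <rte_byteorder.h>
#include <rte_ether.h>
#include <rte_flow.h>

/* EtherType match for IEEE 802.3 slow protocols, using the generic
 * compile-time byte-order conversion instead of a driver-local _htons(). */
static struct rte_flow_item_eth flow_item_eth_type_8023ad = {
        .type = RTE_BE16(ETHER_TYPE_SLOW),      /* 0x8809 in network order */
};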
* Re: [dpdk-dev] [PATCH 1/2] LACP control packet filtering offload 2017-05-27 11:27 ` [dpdk-dev] [PATCH 1/2] " Tomasz Kulasek 2017-05-29 8:10 ` Adrien Mazarguil @ 2017-06-29 9:18 ` Declan Doherty 1 sibling, 0 replies; 22+ messages in thread From: Declan Doherty @ 2017-06-29 9:18 UTC (permalink / raw) To: Tomasz Kulasek, dev On 27/05/17 12:27, Tomasz Kulasek wrote: > New API funtions implemented: > > rte_eth_bond_8023ad_slow_queue_enable(uint8_t port_id); > rte_eth_bond_8023ad_slow_queue_disable(uint8_t port_id); > > rte_eth_bond_8023ad_slow_queue_enable should be called before bonding port > start to enable new path. > > When this option is enabled all slaves must support flow director's > filtering by ethernet type and support one additional queue on slaves > tx/rx. > > Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com> > --- > drivers/net/bonding/rte_eth_bond_8023ad.c | 141 +++++++-- > drivers/net/bonding/rte_eth_bond_8023ad.h | 6 + > drivers/net/bonding/rte_eth_bond_8023ad_private.h | 15 + > drivers/net/bonding/rte_eth_bond_pmd.c | 345 +++++++++++++++++++++- > drivers/net/bonding/rte_eth_bond_version.map | 9 + > 5 files changed, 481 insertions(+), 35 deletions(-) > > diff --git a/drivers/net/bonding/rte_eth_bond_8023ad.c b/drivers/net/bonding/rte_eth_bond_8023ad.c > index 7b863d6..125eb45 100644 > --- a/drivers/net/bonding/rte_eth_bond_8023ad.c > +++ b/drivers/net/bonding/rte_eth_bond_8023ad.c > @@ -632,12 +632,20 @@ > lacpdu->tlv_type_terminator = TLV_TYPE_TERMINATOR_INFORMATION; > lacpdu->terminator_length = 0; > > - if (rte_ring_enqueue(port->tx_ring, lacp_pkt) == -ENOBUFS) { > - /* If TX ring full, drop packet and free message. Retransmission > - * will happen in next function call. */ > - rte_pktmbuf_free(lacp_pkt); > - set_warning_flags(port, WRN_TX_QUEUE_FULL); > - return; > + if (internals->mode4.slow_rx_queue == 0) { I think we should have an explicit flag set for if hw filtering of slow packets is enabled instead of checking the rx/tx queue id like above. > + if (rte_ring_enqueue(port->tx_ring, lacp_pkt) == -ENOBUFS) { > + /* If TX ring full, drop packet and free message. Retransmission > + * will happen in next function call. */ > + rte_pktmbuf_free(lacp_pkt); > + set_warning_flags(port, WRN_TX_QUEUE_FULL); > + return; > + } > + } else { > + if (rte_eth_tx_burst(slave_id, internals->mode4.slow_tx_queue, &lacp_pkt, 1) == 0) { > + rte_pktmbuf_free(lacp_pkt); > + set_warning_flags(port, WRN_TX_QUEUE_FULL); > + return; > + } > } > > MODE4_DEBUG("sending LACP frame\n"); > @@ -741,6 +749,25 @@ > } > > static void > +rx_machine_update(struct bond_dev_private *internals, uint8_t slave_id, > + struct rte_mbuf *lacp_pkt) { > + > + /* Find LACP packet to this port. Do not check subtype, it is done in > + * function that queued packet */ > + if (lacp_pkt != NULL) { > + struct lacpdu_header *lacp; > + > + lacp = rte_pktmbuf_mtod(lacp_pkt, struct lacpdu_header *); > + RTE_ASSERT(lacp->lacpdu.subtype == SLOW_SUBTYPE_LACP); > + > + /* This is LACP frame so pass it to rx_machine */ > + rx_machine(internals, slave_id, &lacp->lacpdu); > + rte_pktmbuf_free(lacp_pkt); > + } else > + rx_machine(internals, slave_id, NULL); > +} > + > +static void > bond_mode_8023ad_periodic_cb(void *arg) > { > struct rte_eth_dev *bond_dev = arg; > @@ -809,20 +836,21 @@ > > SM_FLAG_SET(port, LACP_ENABLED); > > - /* Find LACP packet to this port. 
Do not check subtype, it is done in > - * function that queued packet */ > - if (rte_ring_dequeue(port->rx_ring, &pkt) == 0) { > - struct rte_mbuf *lacp_pkt = pkt; > - struct lacpdu_header *lacp; > + struct rte_mbuf *lacp_pkt = NULL; > > - lacp = rte_pktmbuf_mtod(lacp_pkt, struct lacpdu_header *); > - RTE_ASSERT(lacp->lacpdu.subtype == SLOW_SUBTYPE_LACP); > + if (internals->mode4.slow_rx_queue == 0) { > As above instead of checking rx queue id and explicit enable/disable flag would be clearer. > + /* Find LACP packet to this port. Do not check subtype, it is done in > + * function that queued packet */ > + if (rte_ring_dequeue(port->rx_ring, &pkt) == 0) > + lacp_pkt = pkt; > > - /* This is LACP frame so pass it to rx_machine */ > - rx_machine(internals, slave_id, &lacp->lacpdu); > - rte_pktmbuf_free(lacp_pkt); > - } else > - rx_machine(internals, slave_id, NULL); > + rx_machine_update(internals, slave_id, lacp_pkt); > + } else { > + if (rte_eth_rx_burst(slave_id, internals->mode4.slow_rx_queue, &lacp_pkt, 1) == 1) > + bond_mode_8023ad_handle_slow_pkt(internals, slave_id, lacp_pkt); > + else > + rx_machine_update(internals, slave_id, NULL); > + } If possible it would be good if the hw filtered path and the using the sw queue followed the same code path here. We are now calling bond_mode_8023ad_handle_slow_pkt from both the bond_mode_8023ad_periodic_cb and bond_ethdev_tx_burst_8023ad, it would be clearer if both follow the same processing path and bond_mode_8023ad_handle_slow_pkt wasn't called within bond_ethdev_tx_burst_8023ad. > > periodic_machine(internals, slave_id); > mux_machine(internals, slave_id); > @@ -1188,18 +1216,36 @@ > m_hdr->marker.tlv_type_marker = MARKER_TLV_TYPE_RESP; > rte_eth_macaddr_get(slave_id, &m_hdr->eth_hdr.s_addr); > > - if (unlikely(rte_ring_enqueue(port->tx_ring, pkt) == -ENOBUFS)) { > - /* reset timer */ > - port->rx_marker_timer = 0; > - wrn = WRN_TX_QUEUE_FULL; > - goto free_out; > + if (internals->mode4.slow_tx_queue == 0) { > + if (unlikely(rte_ring_enqueue(port->tx_ring, pkt) == > + -ENOBUFS)) { > + /* reset timer */ > + port->rx_marker_timer = 0; > + wrn = WRN_TX_QUEUE_FULL; > + goto free_out; > + } > + } else { > + /* Send packet directly to the slow queue */ > + if (unlikely(rte_eth_tx_burst(slave_id, > + internals->mode4.slow_tx_queue, > + &pkt, 1) == 0)) { > + /* reset timer */ > + port->rx_marker_timer = 0; > + wrn = WRN_TX_QUEUE_FULL; > + goto free_out; > + } > } > } else if (likely(subtype == SLOW_SUBTYPE_LACP)) { > - if (unlikely(rte_ring_enqueue(port->rx_ring, pkt) == -ENOBUFS)) { > - /* If RX fing full free lacpdu message and drop packet */ > - wrn = WRN_RX_QUEUE_FULL; > - goto free_out; > - } > + > + if (internals->mode4.slow_rx_queue == 0) { > + if (unlikely(rte_ring_enqueue(port->rx_ring, pkt) == -ENOBUFS)) { > + /* If RX fing full free lacpdu message and drop packet */ > + wrn = WRN_RX_QUEUE_FULL; > + goto free_out; > + } > + } else > + rx_machine_update(internals, slave_id, pkt); > + > } else { > wrn = WRN_UNKNOWN_SLOW_TYPE; > goto free_out; > @@ -1504,3 +1550,42 @@ > rte_eal_alarm_set(internals->mode4.update_timeout_us, > bond_mode_8023ad_ext_periodic_cb, arg); > } > + > +#define MBUF_CACHE_SIZE 250 > +#define NUM_MBUFS 8191 > + > +int > +rte_eth_bond_8023ad_slow_queue_enable(uint8_t port) > +{ > + int retval = 0; > + struct rte_eth_dev *dev = &rte_eth_devices[port]; > + struct bond_dev_private *internals = (struct bond_dev_private *) > + dev->data->dev_private; > + > + if (check_for_bonded_ethdev(dev) != 0) > + return -1; > + > + 
internals->mode4.slow_rx_queue = dev->data->nb_rx_queues; > + internals->mode4.slow_tx_queue = dev->data->nb_tx_queues; > + We shouldn't be setting the slow queues here as they won't necessarily be the right values, as mentioned above just an enable flag would be sufficient. Also we should really be testing whether all the slaves of the bond can support applying the filtering rule required here and then fail enablement if they don't. > + bond_ethdev_mode_set(dev, internals->mode); > + return retval; > +} > + > +int > +rte_eth_bond_8023ad_slow_queue_disable(uint8_t port) > +{ > + int retval = 0; > + struct rte_eth_dev *dev = &rte_eth_devices[port]; > + struct bond_dev_private *internals = (struct bond_dev_private *) > + dev->data->dev_private; > + > + if (check_for_bonded_ethdev(dev) != 0) > + return -1; > + > + internals->mode4.slow_rx_queue = 0; > + internals->mode4.slow_tx_queue = 0; > + As above, in regards to the enable flag > + bond_ethdev_mode_set(dev, internals->mode); > + return retval; > +} > diff --git a/drivers/net/bonding/rte_eth_bond_8023ad.h b/drivers/net/bonding/rte_eth_bond_8023ad.h > index 6b8ff57..8d21c7a 100644 > --- a/drivers/net/bonding/rte_eth_bond_8023ad.h > +++ b/drivers/net/bonding/rte_eth_bond_8023ad.h > @@ -302,4 +302,10 @@ struct rte_eth_bond_8023ad_slave_info { > rte_eth_bond_8023ad_ext_slowtx(uint8_t port_id, uint8_t slave_id, > struct rte_mbuf *lacp_pkt); > > +int > +rte_eth_bond_8023ad_slow_queue_enable(uint8_t port_id); > > +int > +rte_eth_bond_8023ad_slow_queue_disable(uint8_t port_id); > + We need to include the doxygen here, with some details on what is being enable here, i.e. details that dedicated rx/tx queues on slaves are being created for filtering the lacp control plane traffic from data path traffic so filtering in the data path is not required. Also, I think that these functions purpose would be clearer if there where called rte_eth_bond_8023ad_slow_pkt_hw_filter_enable/disable > #endif /* RTE_ETH_BOND_8023AD_H_ */ > diff --git a/drivers/net/bonding/rte_eth_bond_8023ad_private.h b/drivers/net/bonding/rte_eth_bond_8023ad_private.h > index ca8858b..3963714 100644 .... > On thing missing is the reporting to the application that there is a reduced number of tx/rx queues available when hw filtering is enabled. Looking at the bond_ethdev_info() it doesn't look like this is getting reported correctly at the moment anyway but it should be smallest value of the max number of queues of the slave devices minus one. So if we had 3 slaves one which support 8 rx queues and the other 2 supported 16, then we should report 7 (8-1) as the maximum number of rx queues for the bonded devices. Finally, we are missing some updated documentation about this new feature. The information in the cover note should be added to the bonding documentation at a minimum. ^ permalink raw reply [flat|nested] 22+ messages in thread
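The queue-count reporting rule from the last comment (the smallest slave maximum minus the one queue reserved per slave) could be computed roughly as below; the helper name and parameters are illustrative and not part of the series. With three slaves supporting 8, 16 and 16 Rx queues it returns 7.

#include <stdint.h>
#include <rte_ethdev.h>

/* Rx queue count a bonded device could advertise: the smallest max_rx_queues
 * among its slaves, minus the queue each slave dedicates to slow packets
 * when hardware filtering is enabled. */
static uint16_t
bond_reportable_rx_queues(const uint8_t *slave_ports, uint8_t slave_count,
                int hw_slow_pkt_filtering)
{
        struct rte_eth_dev_info slave_info;
        uint16_t max_rx_queues = UINT16_MAX;
        uint8_t i;

        if (slave_count == 0)
                return 0;       /* nothing attached yet */

        for (i = 0; i < slave_count; i++) {
                rte_eth_dev_info_get(slave_ports[i], &slave_info);
                if (slave_info.max_rx_queues < max_rx_queues)
                        max_rx_queues = slave_info.max_rx_queues;
        }

        /* One Rx queue per slave is reserved for LACP traffic. */
        return hw_slow_pkt_filtering ? max_rx_queues - 1 : max_rx_queues;
}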
* [dpdk-dev] [PATCH 2/2] test-pmd: add set bonding slow_queue hw/sw 2017-05-27 11:27 [dpdk-dev] [PATCH 0/2] LACP control packet filtering offload Tomasz Kulasek 2017-05-27 11:27 ` [dpdk-dev] [PATCH 1/2] " Tomasz Kulasek @ 2017-05-27 11:27 ` Tomasz Kulasek 2017-06-29 16:20 ` [dpdk-dev] [PATCH v2 0/2] LACP control packet filtering offload Tomasz Kulasek 2 siblings, 0 replies; 22+ messages in thread From: Tomasz Kulasek @ 2017-05-27 11:27 UTC (permalink / raw) To: dev; +Cc: declan.doherty This patch adds new command: set bonding slow_queue <port_id> sw|hw "set bonding slow_queue <bonding_port_id> hw" sets hardware management of slow packets and chooses simplified paths for tx/rx bursts. "set bonding slow_queue <bonding_port_id> sw" turns back to the software handling of slow packets. This option is default. Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com> --- app/test-pmd/cmdline.c | 58 ++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 58 insertions(+) diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c index 0afac68..11fa4a5 100644 --- a/app/test-pmd/cmdline.c +++ b/app/test-pmd/cmdline.c @@ -87,6 +87,7 @@ #include <cmdline.h> #ifdef RTE_LIBRTE_PMD_BOND #include <rte_eth_bond.h> +#include <rte_eth_bond_8023ad.h> #endif #ifdef RTE_LIBRTE_IXGBE_PMD #include <rte_pmd_ixgbe.h> @@ -4279,6 +4280,62 @@ static void cmd_set_bonding_mode_parsed(void *parsed_result, } }; +/* *** SET BONDING SLOW_QUEUE SW/HW *** */ +struct cmd_set_bonding_slow_queue_result { + cmdline_fixed_string_t set; + cmdline_fixed_string_t bonding; + cmdline_fixed_string_t slow_queue; + uint8_t port_id; + cmdline_fixed_string_t mode; +}; + +static void cmd_set_bonding_slow_queue_parsed(void *parsed_result, + __attribute__((unused)) struct cmdline *cl, + __attribute__((unused)) void *data) +{ + struct cmd_set_bonding_slow_queue_result *res = parsed_result; + portid_t port_id = res->port_id; + + if (!strcmp(res->mode, "hw")) { + rte_eth_bond_8023ad_slow_queue_enable(port_id); + printf("Hardware slow queue enabled\n"); + } else if (!strcmp(res->mode, "sw")) { + rte_eth_bond_8023ad_slow_queue_disable(port_id); + } +} + +cmdline_parse_token_string_t cmd_setbonding_slow_queue_set = +TOKEN_STRING_INITIALIZER(struct cmd_set_bonding_slow_queue_result, + set, "set"); +cmdline_parse_token_string_t cmd_setbonding_slow_queue_bonding = +TOKEN_STRING_INITIALIZER(struct cmd_set_bonding_slow_queue_result, + bonding, "bonding"); +cmdline_parse_token_string_t cmd_setbonding_slow_queue_slow_queue = +TOKEN_STRING_INITIALIZER(struct cmd_set_bonding_slow_queue_result, + slow_queue, "slow_queue"); +cmdline_parse_token_num_t cmd_setbonding_slow_queue_port = +TOKEN_NUM_INITIALIZER(struct cmd_set_bonding_slow_queue_result, + port_id, UINT8); +cmdline_parse_token_string_t cmd_setbonding_slow_queue_mode = +TOKEN_STRING_INITIALIZER(struct cmd_set_bonding_slow_queue_result, + mode, "sw#hw"); + +cmdline_parse_inst_t cmd_set_slow_queue = { + .f = cmd_set_bonding_slow_queue_parsed, + .help_str = "set bonding slow_queue <port_id> " + "sw|hw: " + "Set the bonding slow queue acceleration for port_id", + .data = NULL, + .tokens = { + (void *)&cmd_setbonding_slow_queue_set, + (void *)&cmd_setbonding_slow_queue_bonding, + (void *)&cmd_setbonding_slow_queue_slow_queue, + (void *)&cmd_setbonding_slow_queue_port, + (void *)&cmd_setbonding_slow_queue_mode, + NULL + } +}; + /* *** SET BALANCE XMIT POLICY *** */ struct cmd_set_bonding_balance_xmit_policy_result { cmdline_fixed_string_t set; @@ -13613,6 +13670,7 @@ struct cmd_cmdfile_result { 
(cmdline_parse_inst_t *) &cmd_set_bond_mac_addr, (cmdline_parse_inst_t *) &cmd_set_balance_xmit_policy, (cmdline_parse_inst_t *) &cmd_set_bond_mon_period, + (cmdline_parse_inst_t *) &cmd_set_slow_queue, #endif (cmdline_parse_inst_t *)&cmd_vlan_offload, (cmdline_parse_inst_t *)&cmd_vlan_tpid, -- 1.9.1 ^ permalink raw reply [flat|nested] 22+ messages in thread
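An illustrative testpmd sequence exercising the new command (port numbering is an assumption: ports 0 and 1 are the slaves, so the bonded device created in mode 4 on socket 0 becomes port 2):

testpmd> create bonded device 4 0
testpmd> add bonding slave 0 2
testpmd> add bonding slave 1 2
testpmd> set bonding slow_queue 2 hw
testpmd> port start 2

The slow_queue command is meant to be issued before the bonded port is started, matching the ordering required by the underlying API.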
* [dpdk-dev] [PATCH v2 0/2] LACP control packet filtering offload 2017-05-27 11:27 [dpdk-dev] [PATCH 0/2] LACP control packet filtering offload Tomasz Kulasek 2017-05-27 11:27 ` [dpdk-dev] [PATCH 1/2] " Tomasz Kulasek 2017-05-27 11:27 ` [dpdk-dev] [PATCH 2/2] test-pmd: add set bonding slow_queue hw/sw Tomasz Kulasek @ 2017-06-29 16:20 ` Tomasz Kulasek 2017-06-29 16:20 ` [dpdk-dev] [PATCH v2 1/2] " Tomasz Kulasek ` (2 more replies) 2 siblings, 3 replies; 22+ messages in thread From: Tomasz Kulasek @ 2017-06-29 16:20 UTC (permalink / raw) To: dev 1. Overview Packet processing in the current path for bonding in mode 4, requires parse all packets in the fast path, to classify and process LACP packets. The idea of performance improvement is to use hardware offloads to improve packet classification. 2. Scope of work a) Optimization of software LACP packet classification by using packet_type metadata to eliminate the requirement of parsing each packet in the received burst. b) Implementation of classification mechanism using flow director to redirect LACP packets to the dedicated queue (not visible by application). - Filter pattern choosing (not all filters are supported by all devices), - Changing processing path to speed up non-LACP packets processing, - Handle LACP packets from dedicated Rx queue and send to the dedicated Tx queue, c) Creation of fallback mechanism allowing to select the most preferable method of processing: - Flow director, - Packet type metadata, - Software parsing, 3. Implementation 3.1. Packet type The packet_type approach would result in a performance improvement as packets data would no longer be required to be read, but with this approach the bonded driver would still need to look at the mbuf of each packet thereby having an impact on the achievable Rx performance. There's not packet_type value describing LACP packets directly. However, it can be used to limit number of packets required to be parsed, e.g. if packet_type indicates >L2 packets. It should improve performance while well-known non-LACP packets can be skipped without the need to look up into its data. 3.2. Flow director Using rte_flow API and pattern on ethernet type of packet (0x8809), we can configure flow director to redirect slow packets to separated queue. An independent Rx queues for LACP would remove the requirement to filter all ingress traffic in sw which should result in a performance increase. Other queues stay untouched and processing of packets on the fast path will be reduced to simple packet collecting from slaves. Separated Tx queue for LACP daemon allows to send LACP responses immediately, without interfering into Tx fast path. RECEIVE .---------------. | Slave 0 | | .------. | | Fd | Rxq | | Rx ======o==>| |==============. | | +======+ | | .---------------. | `-->| LACP |--------. | | Bonding | | `------' | | | | .------. | `---------------' | | | | | | | >============>| |=======> Rx .---------------. | | | +======+ | | Slave 1 | | | | | XXXX | | | .------. | | | | `------' | | Fd | Rxq | | | | `---------------' Rx ======o==>| |==============' .-----------. | | +======+ | | / \ | `-->| LACP |--------+----------->+ LACP DAEMON | | `------' | Tx <---\ / `---------------' `-----------' All slow packets received by slaves in bonding are redirected to the separated queue using flow director. Other packets are collected from slaves and exposed to the application with Rx burst on bonded device. TRANSMIT .---------------. | Slave 0 | | .------. | | | | | Tx <=====+===| |<=============. 
| | |------| | | .---------------. | `---| LACP |<-------. | | Bonding | | `------' | | | | .------. | `---------------' | | | | | | | +<============| |<====== Tx .---------------. | | | +======+ | | Slave 1 | | | | | XXXX | | | .------. | | | | `------' | | | | | | | `---------------' Tx <=====+===| |<=============' Rx .-----------. | | |------| | | `-->/ \ | `---| LACP |<-------+------------+ LACP DAEMON | | `------' | \ / `---------------' `-----------' On transmit, packets are propagated on the slaves. While we have separated Tx queue for LACP responses, it can be sent regardless of the fast path. LACP DAEMON In this mode whole slow packets are handled in LACP DAEMON. Tomasz Kulasek (2): LACP control packet filtering offload test-pmd: add set bonding slow_queue hw/sw app/test-pmd/cmdline.c | 75 ++++ doc/guides/testpmd_app_ug/testpmd_funcs.rst | 8 + drivers/net/bonding/rte_eth_bond_8023ad.c | 160 ++++++-- drivers/net/bonding/rte_eth_bond_8023ad.h | 35 ++ drivers/net/bonding/rte_eth_bond_8023ad_private.h | 24 ++ drivers/net/bonding/rte_eth_bond_pmd.c | 424 +++++++++++++++++++++- drivers/net/bonding/rte_eth_bond_version.map | 9 + 7 files changed, 693 insertions(+), 42 deletions(-) -- 1.9.1 ^ permalink raw reply [flat|nested] 22+ messages in thread
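To make the packet_type shortcut from section 3.1 concrete, here is a rough stand-alone illustration (not the patch code; the function name is made up): only packets whose packet_type carries nothing beyond RTE_PTYPE_L2_ETHER can be slow-protocol frames, so everything else is passed through without its data being read.

#include <stdint.h>
#include <rte_byteorder.h>
#include <rte_ether.h>
#include <rte_mbuf.h>

/* Walk a received burst and count the frames that still need the software
 * LACP/marker subtype check; known >L2 packets are skipped without touching
 * their data. */
static uint16_t
count_candidate_slow_pkts(struct rte_mbuf **bufs, uint16_t nb_rx)
{
        uint16_t i, candidates = 0;

        for (i = 0; i < nb_rx; i++) {
                const struct ether_hdr *eth;

                /* packet_type reports an L3/L4/tunnel layer: cannot be a
                 * slow-protocol frame, leave it on the fast path. */
                if ((bufs[i]->packet_type & ~RTE_PTYPE_L2_ETHER) != 0)
                        continue;

                /* Only now read packet data to check the EtherType. */
                eth = rte_pktmbuf_mtod(bufs[i], const struct ether_hdr *);
                if (eth->ether_type == rte_cpu_to_be_16(ETHER_TYPE_SLOW))
                        candidates++;
        }
        return candidates;
}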
* [dpdk-dev] [PATCH v2 1/2] LACP control packet filtering offload 2017-06-29 16:20 ` [dpdk-dev] [PATCH v2 0/2] LACP control packet filtering offload Tomasz Kulasek @ 2017-06-29 16:20 ` Tomasz Kulasek 2017-06-29 16:20 ` [dpdk-dev] [PATCH v2 2/2] test-pmd: add set bonding slow_queue hw/sw Tomasz Kulasek 2017-07-04 16:46 ` [dpdk-dev] [PATCH v3 0/4] LACP control packet filtering acceleration Declan Doherty 2 siblings, 0 replies; 22+ messages in thread From: Tomasz Kulasek @ 2017-06-29 16:20 UTC (permalink / raw) To: dev New API functions implemented: rte_eth_bond_8023ad_slow_pkt_hw_filter_enable(uint8_t port_id); rte_eth_bond_8023ad_slow_pkt_hw_filter_disable(uint8_t port_id); rte_eth_bond_8023ad_slow_pkt_hw_filter_enable should be called before bonding port start to enable new path. When this option is enabled all slaves must support flow director's filtering by ethernet type and support one additional queue on slaves tx/rx. Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com> --- v2 changes: - changed name of rte_eth_bond_8023ad_slow_queue_enable/disable to rte_eth_bond_8023ad_slow_pkt_hw_filter_enable/disable, - propagated number of tx/rx queues available for bonding based on the attached slaves and slow packet filtering requirements, - improved validation of slaves, - introduced one structure to organize all slow queue settings, - use of RTE_BE16() instead of locally defined macro, - some comments improvements --- drivers/net/bonding/rte_eth_bond_8023ad.c | 160 ++++++-- drivers/net/bonding/rte_eth_bond_8023ad.h | 35 ++ drivers/net/bonding/rte_eth_bond_8023ad_private.h | 24 ++ drivers/net/bonding/rte_eth_bond_pmd.c | 424 +++++++++++++++++++++- drivers/net/bonding/rte_eth_bond_version.map | 9 + 5 files changed, 610 insertions(+), 42 deletions(-) diff --git a/drivers/net/bonding/rte_eth_bond_8023ad.c b/drivers/net/bonding/rte_eth_bond_8023ad.c index d2b7592..6b37a61 100644 --- a/drivers/net/bonding/rte_eth_bond_8023ad.c +++ b/drivers/net/bonding/rte_eth_bond_8023ad.c @@ -632,15 +632,25 @@ lacpdu->tlv_type_terminator = TLV_TYPE_TERMINATOR_INFORMATION; lacpdu->terminator_length = 0; - if (rte_ring_enqueue(port->tx_ring, lacp_pkt) == -ENOBUFS) { - /* If TX ring full, drop packet and free message. Retransmission - * will happen in next function call. */ - rte_pktmbuf_free(lacp_pkt); - set_warning_flags(port, WRN_TX_QUEUE_FULL); - return; + if (!internals->mode4.slow_pkts.hw_filtering_en) { + if (rte_ring_enqueue(port->tx_ring, lacp_pkt) == -ENOBUFS) { + /* If TX ring full, drop packet and free message. + Retransmission will happen in next function call. 
*/ + rte_pktmbuf_free(lacp_pkt); + set_warning_flags(port, WRN_TX_QUEUE_FULL); + return; + } + } else { + if (rte_eth_tx_burst(slave_id, + internals->mode4.slow_pkts.tx_queue_id, + &lacp_pkt, 1) == 0) { + rte_pktmbuf_free(lacp_pkt); + set_warning_flags(port, WRN_TX_QUEUE_FULL); + return; + } } - MODE4_DEBUG("sending LACP frame\n"); + MODE4_DEBUG("Sending LACP frame\n"); BOND_PRINT_LACP(lacpdu); timer_set(&port->tx_machine_timer, internals->mode4.tx_period_timeout); @@ -741,6 +751,22 @@ } static void +rx_machine_update(struct bond_dev_private *internals, uint8_t slave_id, + struct rte_mbuf *lacp_pkt) { + struct lacpdu_header *lacp; + + if (lacp_pkt != NULL) { + lacp = rte_pktmbuf_mtod(lacp_pkt, struct lacpdu_header *); + RTE_ASSERT(lacp->lacpdu.subtype == SLOW_SUBTYPE_LACP); + + /* This is LACP frame so pass it to rx_machine */ + rx_machine(internals, slave_id, &lacp->lacpdu); + rte_pktmbuf_free(lacp_pkt); + } else + rx_machine(internals, slave_id, NULL); +} + +static void bond_mode_8023ad_periodic_cb(void *arg) { struct rte_eth_dev *bond_dev = arg; @@ -809,20 +835,24 @@ SM_FLAG_SET(port, LACP_ENABLED); - /* Find LACP packet to this port. Do not check subtype, it is done in - * function that queued packet */ - if (rte_ring_dequeue(port->rx_ring, &pkt) == 0) { - struct rte_mbuf *lacp_pkt = pkt; - struct lacpdu_header *lacp; + struct rte_mbuf *lacp_pkt = NULL; - lacp = rte_pktmbuf_mtod(lacp_pkt, struct lacpdu_header *); - RTE_ASSERT(lacp->lacpdu.subtype == SLOW_SUBTYPE_LACP); + if (!internals->mode4.slow_pkts.hw_filtering_en) { + /* Find LACP packet to this port. Do not check subtype, + * it is done in function that queued packet + */ + if (rte_ring_dequeue(port->rx_ring, &pkt) == 0) + lacp_pkt = pkt; - /* This is LACP frame so pass it to rx_machine */ - rx_machine(internals, slave_id, &lacp->lacpdu); - rte_pktmbuf_free(lacp_pkt); - } else - rx_machine(internals, slave_id, NULL); + rx_machine_update(internals, slave_id, lacp_pkt); + } else { + if (rte_eth_rx_burst(slave_id, + internals->mode4.slow_rx_queue, + &lacp_pkt, 1) == 1) + bond_mode_8023ad_handle_slow_pkt(internals, slave_id, lacp_pkt); + else + rx_machine_update(internals, slave_id, NULL); + } periodic_machine(internals, slave_id); mux_machine(internals, slave_id); @@ -1064,6 +1094,10 @@ mode4->tx_period_timeout = conf->tx_period_ms * ms_ticks; mode4->rx_marker_timeout = conf->rx_marker_period_ms * ms_ticks; mode4->update_timeout_us = conf->update_timeout_ms * 1000; + + mode4->slow_pkts.hw_filtering_en = 0; + mode4->slow_pkts.rx_queue_id = UINT16_MAX; + mode4->slow_pkts.tx_queue_id = UINT16_MAX; } static void @@ -1188,18 +1222,34 @@ m_hdr->marker.tlv_type_marker = MARKER_TLV_TYPE_RESP; rte_eth_macaddr_get(slave_id, &m_hdr->eth_hdr.s_addr); - if (unlikely(rte_ring_enqueue(port->tx_ring, pkt) == -ENOBUFS)) { - /* reset timer */ - port->rx_marker_timer = 0; - wrn = WRN_TX_QUEUE_FULL; - goto free_out; + if (internals->mode4.slow_pkts.hw_filtering_en == 0) { + if (unlikely(rte_ring_enqueue(port->tx_ring, pkt) == + -ENOBUFS)) { + /* reset timer */ + port->rx_marker_timer = 0; + wrn = WRN_TX_QUEUE_FULL; + goto free_out; + } + } else { + /* Send packet directly to the slow queue */ + if (unlikely(rte_eth_tx_burst(slave_id, + internals->mode4.slow_pkts.tx_queue_id, + &pkt, 1) == 0)) { + /* reset timer */ + port->rx_marker_timer = 0; + wrn = WRN_TX_QUEUE_FULL; + goto free_out; + } } } else if (likely(subtype == SLOW_SUBTYPE_LACP)) { - if (unlikely(rte_ring_enqueue(port->rx_ring, pkt) == -ENOBUFS)) { - /* If RX fing full free lacpdu message and 
drop packet */ - wrn = WRN_RX_QUEUE_FULL; - goto free_out; - } + if (!internals->mode4.slow_pkts.hw_filtering_en) { + if (unlikely(rte_ring_enqueue(port->rx_ring, pkt) == -ENOBUFS)) { + /* If RX fing full free lacpdu message and drop packet */ + wrn = WRN_RX_QUEUE_FULL; + goto free_out; + } + } else + rx_machine_update(internals, slave_id, pkt); } else { wrn = WRN_UNKNOWN_SLOW_TYPE; goto free_out; @@ -1504,3 +1554,55 @@ rte_eal_alarm_set(internals->mode4.update_timeout_us, bond_mode_8023ad_ext_periodic_cb, arg); } + +#define MBUF_CACHE_SIZE 250 +#define NUM_MBUFS 8191 + +int +rte_eth_bond_8023ad_slow_pkt_hw_filter_enable(uint8_t port) +{ + int retval = 0; + struct rte_eth_dev *dev = &rte_eth_devices[port]; + struct bond_dev_private *internals = (struct bond_dev_private *) + dev->data->dev_private; + + if (check_for_bonded_ethdev(dev) != 0) + return -1; + + if (bond_8023ad_slow_pkt_hw_filter_supported(port) != 0) + return -1; + + /* Device must be stopped to set up slow queue */ + if (dev->data->dev_started) + return -1; + + internals->mode4.slow_pkts.hw_filtering_en = 1; + + bond_ethdev_mode_set(dev, internals->mode); + return retval; +} + +int +rte_eth_bond_8023ad_slow_pkt_hw_filter_disable(uint8_t port) +{ + int retval = 0; + struct rte_eth_dev *dev = &rte_eth_devices[port]; + struct bond_dev_private *internals = (struct bond_dev_private *) + dev->data->dev_private; + + if (check_for_bonded_ethdev(dev) != 0) + return -1; + + /* Device must be stopped to set up slow queue */ + if (dev->data->dev_started) + return -1; + + internals->mode4.slow_pkts.hw_filtering_en = 0; + + bond_ethdev_mode_set(dev, internals->mode); + + internals->mode4.slow_pkts.rx_queue_id = UINT16_MAX; + internals->mode4.slow_pkts.tx_queue_id = UINT16_MAX; + + return retval; +} diff --git a/drivers/net/bonding/rte_eth_bond_8023ad.h b/drivers/net/bonding/rte_eth_bond_8023ad.h index 6b8ff57..d527970 100644 --- a/drivers/net/bonding/rte_eth_bond_8023ad.h +++ b/drivers/net/bonding/rte_eth_bond_8023ad.h @@ -302,4 +302,39 @@ struct rte_eth_bond_8023ad_slave_info { rte_eth_bond_8023ad_ext_slowtx(uint8_t port_id, uint8_t slave_id, struct rte_mbuf *lacp_pkt); +/** + * Enable slow queue on slaves + * + * This function creates additional queues on slaves to use flow director to + * redirect all slow packets to process it in LACP daemon. + * To use this feature all slaves must support at least one queue more than + * bonded device for receiving and transmit packets. + * + * Bonding port must be stopped to change this configuration. + * + * @param port_id Bonding device id + * + * @return + * 0 on success, negative value otherwise. + */ +int +rte_eth_bond_8023ad_slow_pkt_hw_filter_enable(uint8_t port_id); + +/** + * Disable slow queue on slaves + * + * This function disables hardware slow packet filter. + * + * Bonding port must be stopped to change this configuration. + * + * @see rte_eth_bond_8023ad_slow_pkt_hw_filter_enable + * + * @param port_id Bonding device id + * @return + * 0 on success, negative value otherwise. 
+ * + */ +int +rte_eth_bond_8023ad_slow_pkt_hw_filter_disable(uint8_t port_id); + #endif /* RTE_ETH_BOND_8023AD_H_ */ diff --git a/drivers/net/bonding/rte_eth_bond_8023ad_private.h b/drivers/net/bonding/rte_eth_bond_8023ad_private.h index ca8858b..40a4320 100644 --- a/drivers/net/bonding/rte_eth_bond_8023ad_private.h +++ b/drivers/net/bonding/rte_eth_bond_8023ad_private.h @@ -39,6 +39,7 @@ #include <rte_ether.h> #include <rte_byteorder.h> #include <rte_atomic.h> +#include <rte_flow.h> #include "rte_eth_bond_8023ad.h" @@ -162,6 +163,9 @@ struct port { uint64_t warning_timer; volatile uint16_t warnings_to_show; + + /** Memory pool used to allocate slow queues */ + struct rte_mempool *slow_pool; }; struct mode8023ad_private { @@ -175,6 +179,16 @@ struct mode8023ad_private { uint64_t update_timeout_us; rte_eth_bond_8023ad_ext_slowrx_fn slowrx_cb; uint8_t external_sm; + + + struct { + uint8_t hw_filtering_en; + + struct rte_flow *flow[RTE_MAX_ETHPORTS]; + + uint16_t rx_queue_id; + uint16_t tx_queue_id; + } slow_pkts; }; /** @@ -295,4 +309,14 @@ struct mode8023ad_private { void bond_mode_8023ad_mac_address_update(struct rte_eth_dev *bond_dev); +int +bond_ethdev_8023ad_flow_verify(struct rte_eth_dev *bond_dev, + uint8_t slave_port); + +int +bond_ethdev_8023ad_flow_set(struct rte_eth_dev *bond_dev, uint8_t slave_port); + +int +bond_8023ad_slow_pkt_hw_filter_supported(uint8_t port_id); + #endif /* RTE_ETH_BOND_8023AD_H_ */ diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c index 37f3d43..46b7d80 100644 --- a/drivers/net/bonding/rte_eth_bond_pmd.c +++ b/drivers/net/bonding/rte_eth_bond_pmd.c @@ -133,6 +133,250 @@ (subtype == SLOW_SUBTYPE_MARKER || subtype == SLOW_SUBTYPE_LACP)); } +/***************************************************************************** + * Flow director's setup for mode 4 optimization + */ + +static struct rte_flow_item_eth flow_item_eth_type_8023ad = { + .dst.addr_bytes = { 0 }, + .src.addr_bytes = { 0 }, + .type = RTE_BE16(ETHER_TYPE_SLOW), +}; + +static struct rte_flow_item_eth flow_item_eth_mask_type_8023ad = { + .dst.addr_bytes = { 0 }, + .src.addr_bytes = { 0 }, + .type = 0xFFFF, +}; + +static struct rte_flow_item flow_item_8023ad[] = { + { + .type = RTE_FLOW_ITEM_TYPE_ETH, + .spec = &flow_item_eth_type_8023ad, + .last = NULL, + .mask = &flow_item_eth_mask_type_8023ad, + }, + { + .type = RTE_FLOW_ITEM_TYPE_END, + .spec = NULL, + .last = NULL, + .mask = NULL, + } +}; + +const struct rte_flow_attr flow_attr_8023ad = { + .group = 0, + .priority = 0, + .ingress = 1, + .egress = 0, + .reserved = 0, +}; + +int +bond_ethdev_8023ad_flow_verify(struct rte_eth_dev *bond_dev, + uint8_t slave_port) { + struct rte_flow_error error; + struct bond_dev_private *internals = (struct bond_dev_private *) + (bond_dev->data->dev_private); + + struct rte_flow_action_queue lacp_queue_conf = { + .index = internals->mode4.slow_pkts.rx_queue_id, + }; + + const struct rte_flow_action actions[] = { + { + .type = RTE_FLOW_ACTION_TYPE_QUEUE, + .conf = &lacp_queue_conf + }, + { + .type = RTE_FLOW_ACTION_TYPE_END, + } + }; + + int ret = rte_flow_validate(slave_port, &flow_attr_8023ad, + flow_item_8023ad, actions, &error); + if (ret < 0) + return -1; + + return 0; +} + +int +bond_8023ad_slow_pkt_hw_filter_supported(uint8_t port_id) { + struct rte_eth_dev *bond_dev = &rte_eth_devices[port_id]; + struct bond_dev_private *internals = (struct bond_dev_private *) + (bond_dev->data->dev_private); + struct rte_eth_dev_info bond_info, slave_info; + uint8_t idx; + + /* 
Verify if all slaves in bonding supports flow director and */ + if (internals->slave_count > 0) { + rte_eth_dev_info_get(bond_dev->data->port_id, &bond_info); + internals->mode4.slow_pkts.rx_queue_id = bond_info.nb_rx_queues; + internals->mode4.slow_pkts.tx_queue_id = bond_info.nb_tx_queues; + for (idx = 0; idx < internals->slave_count; idx++) { + rte_eth_dev_info_get(internals->slaves[idx].port_id, + &slave_info); + if ((slave_info.max_rx_queues < bond_info.nb_rx_queues) + || (slave_info.max_rx_queues < + bond_info.nb_rx_queues)) + return -1; + + if (bond_ethdev_8023ad_flow_verify(bond_dev, + internals->slaves[idx].port_id) != 0) + return -1; + } + } + + return 0; +} + +int +bond_ethdev_8023ad_flow_set(struct rte_eth_dev *bond_dev, uint8_t slave_port) { + + struct rte_flow_error error; + struct bond_dev_private *internals = (struct bond_dev_private *) + (bond_dev->data->dev_private); + + struct rte_flow_action_queue lacp_queue_conf = { + .index = internals->mode4.slow_pkts.rx_queue_id, + }; + + const struct rte_flow_action actions[] = { + { + .type = RTE_FLOW_ACTION_TYPE_QUEUE, + .conf = &lacp_queue_conf + }, + { + .type = RTE_FLOW_ACTION_TYPE_END, + } + }; + + internals->mode4.slow_pkts.flow[slave_port] = rte_flow_create(slave_port, + &flow_attr_8023ad, flow_item_8023ad, actions, &error); + if (internals->mode4.slow_pkts.flow[slave_port] == NULL) { + RTE_BOND_LOG(ERR, "bond_ethdev_8023ad_flow_set: %s " + "(slave_port=%d queue_id=%d)", + error.message, slave_port, + internals->mode4.slow_pkts.rx_queue_id); + return -1; + } + + return 0; +} + +static uint16_t +bond_ethdev_rx_burst_8023ad_fast_queue(void *queue, struct rte_mbuf **bufs, + uint16_t nb_pkts) +{ + struct bond_rx_queue *bd_rx_q = (struct bond_rx_queue *)queue; + struct bond_dev_private *internals = bd_rx_q->dev_private; + uint16_t num_rx_total = 0; /* Total number of received packets */ + uint8_t slaves[RTE_MAX_ETHPORTS]; + uint8_t slave_count; + + uint8_t i; + + /* Copy slave list to protect against slave up/down changes during tx + * bursting */ + slave_count = internals->active_slave_count; + memcpy(slaves, internals->active_slaves, + sizeof(internals->active_slaves[0]) * slave_count); + + for (i = 0; i < slave_count && num_rx_total < nb_pkts; i++) { + /* Read packets from this slave */ + num_rx_total += rte_eth_rx_burst(slaves[i], bd_rx_q->queue_id, + &bufs[num_rx_total], nb_pkts - num_rx_total); + } + + return num_rx_total; +} + +static uint16_t +bond_ethdev_tx_burst_8023ad_fast_queue(void *queue, struct rte_mbuf **bufs, + uint16_t nb_pkts) +{ + struct bond_dev_private *internals; + struct bond_tx_queue *bd_tx_q; + + uint8_t num_of_slaves; + uint8_t slaves[RTE_MAX_ETHPORTS]; + /* positions in slaves, not ID */ + uint8_t distributing_offsets[RTE_MAX_ETHPORTS]; + uint8_t distributing_count; + + uint16_t num_tx_slave, num_tx_total = 0, num_tx_fail_total = 0; + uint16_t i, op_slave_idx; + + struct rte_mbuf *slave_bufs[RTE_MAX_ETHPORTS][nb_pkts]; + + /* Total amount of packets in slave_bufs */ + uint16_t slave_nb_pkts[RTE_MAX_ETHPORTS] = { 0 }; + /* Slow packets placed in each slave */ + + if (unlikely(nb_pkts == 0)) + return 0; + + bd_tx_q = (struct bond_tx_queue *)queue; + internals = bd_tx_q->dev_private; + + /* Copy slave list to protect against slave up/down changes during tx + * bursting */ + num_of_slaves = internals->active_slave_count; + if (num_of_slaves < 1) + return num_tx_total; + + memcpy(slaves, internals->active_slaves, sizeof(slaves[0]) * + num_of_slaves); + + distributing_count = 0; + for (i = 0; i < 
num_of_slaves; i++) { + struct port *port = &mode_8023ad_ports[slaves[i]]; + if (ACTOR_STATE(port, DISTRIBUTING)) + distributing_offsets[distributing_count++] = i; + } + + if (likely(distributing_count > 0)) { + /* Populate slaves mbuf with the packets which are to be sent */ + for (i = 0; i < nb_pkts; i++) { + /* Select output slave using hash based on xmit policy */ + op_slave_idx = internals->xmit_hash(bufs[i], + distributing_count); + + /* Populate slave mbuf arrays with mbufs for that slave. + * Use only slaves that are currently distributing. + */ + uint8_t slave_offset = + distributing_offsets[op_slave_idx]; + slave_bufs[slave_offset][slave_nb_pkts[slave_offset]] = + bufs[i]; + slave_nb_pkts[slave_offset]++; + } + } + + /* Send packet burst on each slave device */ + for (i = 0; i < num_of_slaves; i++) { + if (slave_nb_pkts[i] == 0) + continue; + + num_tx_slave = rte_eth_tx_burst(slaves[i], bd_tx_q->queue_id, + slave_bufs[i], slave_nb_pkts[i]); + + num_tx_total += num_tx_slave; + num_tx_fail_total += slave_nb_pkts[i] - num_tx_slave; + + /* If tx burst fails move packets to end of bufs */ + if (unlikely(num_tx_slave < slave_nb_pkts[i])) { + uint16_t j = nb_pkts - num_tx_fail_total; + for ( ; num_tx_slave < slave_nb_pkts[i]; j++, + num_tx_slave++) + bufs[j] = slave_bufs[i][num_tx_slave]; + } + } + + return num_tx_total; +} + static uint16_t bond_ethdev_rx_burst_8023ad(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts) @@ -180,6 +424,13 @@ /* Handle slow protocol packets. */ while (j < num_rx_total) { + + /* if packet is not pure L2 and is known, skip it */ + if ((bufs[j]->packet_type & ~RTE_PTYPE_L2_ETHER) != 0) { + j++; + continue; + } + if (j + 3 < num_rx_total) rte_prefetch0(rte_pktmbuf_mtod(bufs[j + 3], void *)); @@ -187,7 +438,7 @@ subtype = ((struct slow_protocol_frame *)hdr)->slow_protocol.subtype; /* Remove packet from array if it is slow packet or slave is not - * in collecting state or bondign interface is not in promiscus + * in collecting state or bonding interface is not in promiscuous * mode and packet address does not match. 
*/ if (unlikely(is_lacp_packets(hdr->ether_type, subtype, bufs[j]->vlan_tci) || !collecting || (!promisc && @@ -204,7 +455,8 @@ num_rx_total--; if (j < num_rx_total) { memmove(&bufs[j], &bufs[j + 1], sizeof(bufs[0]) * - (num_rx_total - j)); + (num_rx_total - j)); + } } else j++; @@ -1295,11 +1547,19 @@ struct bwg_slave { if (bond_mode_8023ad_enable(eth_dev) != 0) return -1; - eth_dev->rx_pkt_burst = bond_ethdev_rx_burst_8023ad; - eth_dev->tx_pkt_burst = bond_ethdev_tx_burst_8023ad; - RTE_LOG(WARNING, PMD, - "Using mode 4, it is necessary to do TX burst and RX burst " - "at least every 100ms.\n"); + if (!internals->mode4.slow_pkts.hw_filtering_en) { + eth_dev->rx_pkt_burst = bond_ethdev_rx_burst_8023ad; + eth_dev->tx_pkt_burst = bond_ethdev_tx_burst_8023ad; + RTE_LOG(WARNING, PMD, + "Using mode 4, it is necessary to do TX burst " + "and RX burst at least every 100ms.\n"); + } else { + /* Use flow director's optimization */ + eth_dev->rx_pkt_burst = + bond_ethdev_rx_burst_8023ad_fast_queue; + eth_dev->tx_pkt_burst = + bond_ethdev_tx_burst_8023ad_fast_queue; + } break; case BONDING_MODE_TLB: eth_dev->tx_pkt_burst = bond_ethdev_tx_burst_tlb; @@ -1321,15 +1581,80 @@ struct bwg_slave { return 0; } +static int +slave_configure_slow_queue(struct rte_eth_dev *bonded_eth_dev, + struct rte_eth_dev *slave_eth_dev) +{ + int errval = 0; + struct bond_dev_private *internals = (struct bond_dev_private *) + bonded_eth_dev->data->dev_private; + struct port *port = &mode_8023ad_ports[slave_eth_dev->data->port_id]; + + if (port->slow_pool == NULL) { + char mem_name[256]; + int slave_id = slave_eth_dev->data->port_id; + + snprintf(mem_name, RTE_DIM(mem_name), "slave_port%u_slow_pool", + slave_id); + port->slow_pool = rte_pktmbuf_pool_create(mem_name, 8191, + 250, 0, RTE_MBUF_DEFAULT_BUF_SIZE, + slave_eth_dev->data->numa_node); + + /* Any memory allocation failure in initialization is critical because + * resources can't be free, so reinitialization is impossible. 
*/ + if (port->slow_pool == NULL) { + rte_panic("Slave %u: Failed to create memory pool '%s': %s\n", + slave_id, mem_name, rte_strerror(rte_errno)); + } + } + + if (internals->mode4.slow_pkts.hw_filtering_en) { + /* Configure slow Rx queue */ + + errval = rte_eth_rx_queue_setup(slave_eth_dev->data->port_id, + internals->mode4.slow_pkts.rx_queue_id, 128, + rte_eth_dev_socket_id(slave_eth_dev->data->port_id), + NULL, port->slow_pool); + if (errval != 0) { + RTE_BOND_LOG(ERR, + "rte_eth_rx_queue_setup: port=%d queue_id %d, err (%d)", + slave_eth_dev->data->port_id, + internals->mode4.slow_pkts.rx_queue_id, + errval); + return errval; + } + + errval = rte_eth_tx_queue_setup(slave_eth_dev->data->port_id, + internals->mode4.slow_pkts.tx_queue_id, 512, + rte_eth_dev_socket_id(slave_eth_dev->data->port_id), + NULL); + if (errval != 0) { + RTE_BOND_LOG(ERR, + "rte_eth_tx_queue_setup: port=%d queue_id %d, err (%d)", + slave_eth_dev->data->port_id, + internals->mode4.slow_pkts.tx_queue_id, + errval); + return errval; + } + } + return 0; +} + int slave_configure(struct rte_eth_dev *bonded_eth_dev, struct rte_eth_dev *slave_eth_dev) { struct bond_rx_queue *bd_rx_q; struct bond_tx_queue *bd_tx_q; + uint16_t nb_rx_queues; + uint16_t nb_tx_queues; int errval; uint16_t q_id; + struct rte_flow_error flow_error; + + struct bond_dev_private *internals = (struct bond_dev_private *) + bonded_eth_dev->data->dev_private; /* Stop slave */ rte_eth_dev_stop(slave_eth_dev->data->port_id); @@ -1359,10 +1684,19 @@ struct bwg_slave { slave_eth_dev->data->dev_conf.rxmode.hw_vlan_filter = bonded_eth_dev->data->dev_conf.rxmode.hw_vlan_filter; + nb_rx_queues = bonded_eth_dev->data->nb_rx_queues; + nb_tx_queues = bonded_eth_dev->data->nb_tx_queues; + + if (internals->mode == BONDING_MODE_8023AD) { + if (internals->mode4.slow_pkts.hw_filtering_en) { + nb_rx_queues++; + nb_tx_queues++; + } + } + /* Configure device */ errval = rte_eth_dev_configure(slave_eth_dev->data->port_id, - bonded_eth_dev->data->nb_rx_queues, - bonded_eth_dev->data->nb_tx_queues, + nb_rx_queues, nb_tx_queues, &(slave_eth_dev->data->dev_conf)); if (errval != 0) { RTE_BOND_LOG(ERR, "Cannot configure slave device: port %u , err (%d)", @@ -1396,10 +1730,33 @@ struct bwg_slave { &bd_tx_q->tx_conf); if (errval != 0) { RTE_BOND_LOG(ERR, - "rte_eth_tx_queue_setup: port=%d queue_id %d, err (%d)", - slave_eth_dev->data->port_id, q_id, errval); + "rte_eth_tx_queue_setup: port=%d queue_id %d, err (%d)", + slave_eth_dev->data->port_id, q_id, errval); + return errval; + } + } + + if (internals->mode == BONDING_MODE_8023AD && + internals->mode4.slow_pkts.hw_filtering_en) { + if (slave_configure_slow_queue(bonded_eth_dev, slave_eth_dev) + != 0) return errval; + + if (bond_ethdev_8023ad_flow_verify(bonded_eth_dev, + slave_eth_dev->data->port_id) != 0) { + RTE_BOND_LOG(ERR, + "rte_eth_tx_queue_setup: port=%d queue_id %d, err (%d)", + slave_eth_dev->data->port_id, q_id, errval); + return -1; } + + if (internals->mode4.slow_pkts.flow[slave_eth_dev->data->port_id] != NULL) + rte_flow_destroy(slave_eth_dev->data->port_id, + internals->mode4.slow_pkts.flow[slave_eth_dev->data->port_id], + &flow_error); + + bond_ethdev_8023ad_flow_set(bonded_eth_dev, + slave_eth_dev->data->port_id); } /* Start device */ @@ -1559,6 +1916,15 @@ struct bwg_slave { if (internals->promiscuous_en) bond_ethdev_promiscuous_enable(eth_dev); + if (internals->mode == BONDING_MODE_8023AD) { + if (internals->mode4.slow_pkts.hw_filtering_en) { + internals->mode4.slow_pkts.rx_queue_id = + 
eth_dev->data->nb_rx_queues; + internals->mode4.slow_pkts.tx_queue_id = + eth_dev->data->nb_tx_queues; + } + } + /* Reconfigure each slave device if starting bonded device */ for (i = 0; i < internals->slave_count; i++) { if (slave_configure(eth_dev, @@ -1688,8 +2054,10 @@ struct bwg_slave { static void bond_ethdev_info(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info) + { struct bond_dev_private *internals = dev->data->dev_private; + uint16_t max_nb_rx_queues = 0, max_nb_tx_queues = 0; dev_info->max_mac_addrs = 1; @@ -1697,8 +2065,38 @@ struct bwg_slave { ? internals->candidate_max_rx_pktlen : ETHER_MAX_JUMBO_FRAME_LEN; - dev_info->max_rx_queues = (uint16_t)128; - dev_info->max_tx_queues = (uint16_t)512; + if (internals->slave_count > 0) { + /* Max number of tx/rx queues that the bonded device can + * support is the minimum values of the bonded slaves */ + struct rte_eth_dev_info slave_info; + uint8_t idx; + + max_nb_rx_queues = UINT16_MAX; + max_nb_tx_queues = UINT16_MAX; + for (idx = 0; idx < internals->slave_count; idx++) { + rte_eth_dev_info_get(internals->slaves[idx].port_id, + &slave_info); + + if (max_nb_rx_queues == 0 || + slave_info.max_rx_queues < max_nb_rx_queues) + max_nb_rx_queues = slave_info.max_rx_queues; + + if (max_nb_tx_queues == 0 || + slave_info.max_rx_queues < max_nb_tx_queues) + max_nb_tx_queues = slave_info.max_tx_queues; + } + dev_info->max_rx_queues = max_nb_rx_queues; + dev_info->max_tx_queues = max_nb_tx_queues; + } else { + dev_info->max_rx_queues = (uint16_t)128; + dev_info->max_tx_queues = (uint16_t)512; + } + + if (internals->mode == BONDING_MODE_8023AD && + internals->mode4.slow_pkts.hw_filtering_en) { + dev_info->max_rx_queues--; + dev_info->max_tx_queues--; + } dev_info->min_rx_bufsize = 0; diff --git a/drivers/net/bonding/rte_eth_bond_version.map b/drivers/net/bonding/rte_eth_bond_version.map index 2de0a7d..0ad2ba4 100644 --- a/drivers/net/bonding/rte_eth_bond_version.map +++ b/drivers/net/bonding/rte_eth_bond_version.map @@ -43,3 +43,12 @@ DPDK_16.07 { rte_eth_bond_8023ad_setup; } DPDK_16.04; + +DPDK_17.08 { + global: + + rte_eth_bond_8023ad_slow_pkt_hw_filter_enable; + rte_eth_bond_8023ad_slow_pkt_hw_filter_disable; + + local: *; +} DPDK_16.07; -- 1.9.1 ^ permalink raw reply [flat|nested] 22+ messages in thread
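As a usage sketch for the two functions introduced above (not part of the patch; the port id and error handling are illustrative only), an application running the bond in mode 4 could request hardware filtering of slow packets while the bonded port is stopped:

    #include <rte_ethdev.h>
    #include <rte_eth_bond_8023ad.h>

    static int
    enable_slow_pkt_hw_filter(uint8_t bond_port_id)
    {
            /* The bonded port must be stopped while changing this setting. */
            rte_eth_dev_stop(bond_port_id);

            if (rte_eth_bond_8023ad_slow_pkt_hw_filter_enable(bond_port_id) != 0)
                    return -1; /* fall back to software classification */

            /* Each slave is reconfigured with one extra rx/tx queue when the
             * bonded port is started again. */
            return rte_eth_dev_start(bond_port_id);
    }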
* [dpdk-dev] [PATCH v2 2/2] test-pmd: add set bonding slow_queue hw/sw 2017-06-29 16:20 ` [dpdk-dev] [PATCH v2 0/2] LACP control packet filtering offload Tomasz Kulasek 2017-06-29 16:20 ` [dpdk-dev] [PATCH v2 1/2] " Tomasz Kulasek @ 2017-06-29 16:20 ` Tomasz Kulasek 2017-07-04 16:46 ` [dpdk-dev] [PATCH v3 0/4] LACP control packet filtering acceleration Declan Doherty 2 siblings, 0 replies; 22+ messages in thread From: Tomasz Kulasek @ 2017-06-29 16:20 UTC (permalink / raw) To: dev This patch adds new command: set bonding slow_queue <port_id> [sw|hw] "set bonding slow_queue <bonding_port_id> hw" sets hardware management of slow packets and chooses simplified paths for tx/rx bursts. "set bonding slow_queue <bonding_port_id> sw" turns back to the software handling of slow packets. This option is default. Example: testpmd> create bonded device 4 0 testpmd> add bonding slave 0 <bond_id> testpmd> add bonding slave 1 <bond_id> testpmd> set bonding slow_queue <bond_id> [sw|hw] testpmd> port start <bond_id> Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com> --- v2 changes: - changed name of rte_eth_bond_8023ad_slow_queue_enable/disable to rte_eth_bond_8023ad_slow_pkt_hw_filter_enable/disable, - added "set bonding slow_queue <port_id> [sw|hw]" description in documentation --- app/test-pmd/cmdline.c | 75 +++++++++++++++++++++++++++++ doc/guides/testpmd_app_ug/testpmd_funcs.rst | 8 +++ 2 files changed, 83 insertions(+) diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c index 632d6f0..194d986 100644 --- a/app/test-pmd/cmdline.c +++ b/app/test-pmd/cmdline.c @@ -87,6 +87,7 @@ #include <cmdline.h> #ifdef RTE_LIBRTE_PMD_BOND #include <rte_eth_bond.h> +#include <rte_eth_bond_8023ad.h> #endif #ifdef RTE_LIBRTE_IXGBE_PMD #include <rte_pmd_ixgbe.h> @@ -4300,6 +4301,79 @@ static void cmd_set_bonding_mode_parsed(void *parsed_result, } }; +/* *** SET BONDING SLOW_QUEUE SW/HW *** */ +struct cmd_set_bonding_slow_queue_result { + cmdline_fixed_string_t set; + cmdline_fixed_string_t bonding; + cmdline_fixed_string_t slow_queue; + uint8_t port_id; + cmdline_fixed_string_t mode; +}; + +static void cmd_set_bonding_slow_queue_parsed(void *parsed_result, + __attribute__((unused)) struct cmdline *cl, + __attribute__((unused)) void *data) +{ + struct cmd_set_bonding_slow_queue_result *res = parsed_result; + portid_t port_id = res->port_id; + struct rte_port *port; + + port = &ports[port_id]; + + /** Check if the port is not started **/ + if (port->port_status != RTE_PORT_STOPPED) { + printf("Please stop port %d first\n", port_id); + return; + } + + if (!strcmp(res->mode, "hw")) { + if (rte_eth_bond_8023ad_slow_pkt_hw_filter_enable(port_id) == 0) + printf("Hardware slow queue enabled\n"); + else + printf("Enabling hardware slow queue on port %d " + "failed\n", port_id); + } else if (!strcmp(res->mode, "sw")) { + if (rte_eth_bond_8023ad_slow_pkt_hw_filter_disable(port_id) + == 0) + printf("Software slow queue enabled\n"); + else + printf("Enabling software slow queue on port %d " + "failed\n", port_id); + } +} + +cmdline_parse_token_string_t cmd_setbonding_slow_queue_set = +TOKEN_STRING_INITIALIZER(struct cmd_set_bonding_slow_queue_result, + set, "set"); +cmdline_parse_token_string_t cmd_setbonding_slow_queue_bonding = +TOKEN_STRING_INITIALIZER(struct cmd_set_bonding_slow_queue_result, + bonding, "bonding"); +cmdline_parse_token_string_t cmd_setbonding_slow_queue_slow_queue = +TOKEN_STRING_INITIALIZER(struct cmd_set_bonding_slow_queue_result, + slow_queue, "slow_queue"); +cmdline_parse_token_num_t 
cmd_setbonding_slow_queue_port = +TOKEN_NUM_INITIALIZER(struct cmd_set_bonding_slow_queue_result, + port_id, UINT8); +cmdline_parse_token_string_t cmd_setbonding_slow_queue_mode = +TOKEN_STRING_INITIALIZER(struct cmd_set_bonding_slow_queue_result, + mode, "sw#hw"); + +cmdline_parse_inst_t cmd_set_slow_queue = { + .f = cmd_set_bonding_slow_queue_parsed, + .help_str = "set bonding slow_queue <port_id> " + "sw|hw: " + "Set the bonding slow queue acceleration for port_id", + .data = NULL, + .tokens = { + (void *)&cmd_setbonding_slow_queue_set, + (void *)&cmd_setbonding_slow_queue_bonding, + (void *)&cmd_setbonding_slow_queue_slow_queue, + (void *)&cmd_setbonding_slow_queue_port, + (void *)&cmd_setbonding_slow_queue_mode, + NULL + } +}; + /* *** SET BALANCE XMIT POLICY *** */ struct cmd_set_bonding_balance_xmit_policy_result { cmdline_fixed_string_t set; @@ -13846,6 +13920,7 @@ struct cmd_cmdfile_result { (cmdline_parse_inst_t *) &cmd_set_bond_mac_addr, (cmdline_parse_inst_t *) &cmd_set_balance_xmit_policy, (cmdline_parse_inst_t *) &cmd_set_bond_mon_period, + (cmdline_parse_inst_t *) &cmd_set_slow_queue, #endif (cmdline_parse_inst_t *)&cmd_vlan_offload, (cmdline_parse_inst_t *)&cmd_vlan_tpid, diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst index 18ee8a3..3da2a38 100644 --- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst +++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst @@ -1759,6 +1759,14 @@ For example, to set the link status monitoring polling period of bonded device ( testpmd> set bonding mon_period 5 150 +set bonding slow_queue +~~~~~~~~~~~~~~~~~~~~~~ + +Set software or hardware slow packet processing in mode 4. + + testpmd> set bonding slow_queue (port_id) (sw|hw) + + show bonding config ~~~~~~~~~~~~~~~~~~~ -- 1.9.1 ^ permalink raw reply [flat|nested] 22+ messages in thread
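The testpmd command above is a thin wrapper; a hypothetical application-side equivalent (helper name and checks are assumptions, not part of the patch) would verify the bonding mode and then select the hardware or software path:

    #include <rte_eth_bond.h>
    #include <rte_eth_bond_8023ad.h>

    static int
    set_bonding_slow_queue(uint8_t port_id, int hw)
    {
            /* Caller is expected to have stopped the bonded port already. */
            if (rte_eth_bond_mode_get(port_id) != BONDING_MODE_8023AD)
                    return -1; /* the feature only applies to mode 4 */

            return hw ?
                    rte_eth_bond_8023ad_slow_pkt_hw_filter_enable(port_id) :
                    rte_eth_bond_8023ad_slow_pkt_hw_filter_disable(port_id);
    }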
* [dpdk-dev] [PATCH v3 0/4] LACP control packet filtering acceleration 2017-06-29 16:20 ` [dpdk-dev] [PATCH v2 0/2] LACP control packet filtering offload Tomasz Kulasek 2017-06-29 16:20 ` [dpdk-dev] [PATCH v2 1/2] " Tomasz Kulasek 2017-06-29 16:20 ` [dpdk-dev] [PATCH v2 2/2] test-pmd: add set bonding slow_queue hw/sw Tomasz Kulasek @ 2017-07-04 16:46 ` Declan Doherty 2017-07-04 16:46 ` [dpdk-dev] [PATCH v3 1/4] net/bond: calculate number of bonding tx/rx queues Declan Doherty ` (4 more replies) 2 siblings, 5 replies; 22+ messages in thread From: Declan Doherty @ 2017-07-04 16:46 UTC (permalink / raw) To: dev; +Cc: Declan Doherty 1. Overview Packet processing in the current path for bonding in mode 4, requires parse all packets in the fast path, to classify and process LACP packets. The idea of performance improvement is to use hardware offloads to improve packet classification. 2. Scope of work a) Optimization of software LACP packet classification by using packet_type metadata to eliminate the requirement of parsing each packet in the received burst. b) Implementation of classification mechanism using flow director to redirect LACP packets to the dedicated queue (not visible by application). - Filter pattern choosing (not all filters are supported by all devices), - Changing processing path to speed up non-LACP packets processing, - Handle LACP packets from dedicated Rx queue and send to the dedicated Tx queue, c) Creation of fallback mechanism allowing to select the most preferable method of processing: - Flow director, - Packet type metadata, - Software parsing, 3. Implementation 3.1. Packet type The packet_type approach would result in a performance improvement as packets data would no longer be required to be read, but with this approach the bonded driver would still need to look at the mbuf of each packet thereby having an impact on the achievable Rx performance. There's not packet_type value describing LACP packets directly. However, it can be used to limit number of packets required to be parsed, e.g. if packet_type indicates >L2 packets. It should improve performance while well-known non-LACP packets can be skipped without the need to look up into its data. 3.2. Flow director Using rte_flow API and pattern on ethernet type of packet (0x8809), we can configure flow director to redirect slow packets to separated queue. An independent Rx queues for LACP would remove the requirement to filter all ingress traffic in sw which should result in a performance increase. Other queues stay untouched and processing of packets on the fast path will be reduced to simple packet collecting from slaves. Separated Tx queue for LACP daemon allows to send LACP responses immediately, without interfering into Tx fast path. RECEIVE .---------------. | Slave 0 | | .------. | | Fd | Rxq | | Rx ======o==>| |==============. | | +======+ | | .---------------. | `-->| LACP |--------. | | Bonding | | `------' | | | | .------. | `---------------' | | | | | | | >============>| |=======> Rx .---------------. | | | +======+ | | Slave 1 | | | | | XXXX | | | .------. | | | | `------' | | Fd | Rxq | | | | `---------------' Rx ======o==>| |==============' .-----------. | | +======+ | | / \ | `-->| LACP |--------+----------->+ LACP DAEMON | | `------' | Tx <---\ / `---------------' `-----------' All slow packets received by slaves in bonding are redirected to the separated queue using flow director. Other packets are collected from slaves and exposed to the application with Rx burst on bonded device. 
TRANSMIT .---------------. | Slave 0 | | .------. | | | | | Tx <=====+===| |<=============. | | |------| | | .---------------. | `---| LACP |<-------. | | Bonding | | `------' | | | | .------. | `---------------' | | | | | | | +<============| |<====== Tx .---------------. | | | +======+ | | Slave 1 | | | | | XXXX | | | .------. | | | | `------' | | | | | | | `---------------' Tx <=====+===| |<=============' Rx .-----------. | | |------| | | `-->/ \ | `---| LACP |<-------+------------+ LACP DAEMON | | `------' | \ / `---------------' `-----------' On transmit, packets are propagated on the slaves. While we have separated Tx queue for LACP responses, it can be sent regardless of the fast path. LACP DAEMON In this mode whole slow packets are handled in LACP DAEMON. V3: - Split hw filtering patch into 3 patches: - fix for calculating maximum number of tx/rx queues of bonding device - enable use of ptype hint for filtering of control plane packets in default enablement - enablement of dedicated queues for LACP control packet filtering. Declan Doherty (1): net/bond: calculate number of bonding tx/rx queues Tomasz Kulasek (3): net/bond: use ptype flags for LACP rx filtering net/bond: dedicated hw queues for LACP control traffic app/test-pmd: add cmd for dedicated LACP rx/tx queues app/test-pmd/cmdline.c | 85 +++++ doc/guides/testpmd_app_ug/testpmd_funcs.rst | 9 + drivers/net/bonding/rte_eth_bond_8023ad.c | 167 ++++++-- drivers/net/bonding/rte_eth_bond_8023ad.h | 42 ++ drivers/net/bonding/rte_eth_bond_8023ad_private.h | 27 ++ drivers/net/bonding/rte_eth_bond_pmd.c | 445 +++++++++++++++++++++- drivers/net/bonding/rte_eth_bond_version.map | 9 + 7 files changed, 734 insertions(+), 50 deletions(-) -- 2.9.4 ^ permalink raw reply [flat|nested] 22+ messages in thread
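The redirection rule described in section 3.2 can be expressed with the generic rte_flow API roughly as follows (a standalone sketch, not the driver code; the queue id is whatever dedicated queue the caller has reserved):

    #include <rte_flow.h>
    #include <rte_byteorder.h>

    static struct rte_flow *
    redirect_lacp_to_queue(uint8_t port_id, uint16_t queue_id,
                    struct rte_flow_error *err)
    {
            static const struct rte_flow_attr attr = { .ingress = 1 };

            /* Match only the slow protocols EtherType (0x8809). */
            struct rte_flow_item_eth eth_spec = {
                    .type = rte_cpu_to_be_16(0x8809),
            };
            struct rte_flow_item_eth eth_mask = { .type = 0xFFFF };

            struct rte_flow_item pattern[] = {
                    { .type = RTE_FLOW_ITEM_TYPE_ETH,
                      .spec = &eth_spec, .mask = &eth_mask },
                    { .type = RTE_FLOW_ITEM_TYPE_END },
            };

            struct rte_flow_action_queue queue = { .index = queue_id };
            struct rte_flow_action actions[] = {
                    { .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &queue },
                    { .type = RTE_FLOW_ACTION_TYPE_END },
            };

            return rte_flow_create(port_id, &attr, pattern, actions, err);
    }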
* [dpdk-dev] [PATCH v3 1/4] net/bond: calculate number of bonding tx/rx queues 2017-07-04 16:46 ` [dpdk-dev] [PATCH v3 0/4] LACP control packet filtering acceleration Declan Doherty @ 2017-07-04 16:46 ` Declan Doherty 2017-07-04 16:46 ` [dpdk-dev] [PATCH v3 2/4] net/bond: use ptype flags for LACP rx filtering Declan Doherty ` (3 subsequent siblings) 4 siblings, 0 replies; 22+ messages in thread From: Declan Doherty @ 2017-07-04 16:46 UTC (permalink / raw) To: dev; +Cc: Declan Doherty Fixes: 2efb58cb ("bond: new link bonding library") This patch fixes the maximum number of tx an rx queues supported by a bonding device return by the rte_eth_dev_info_get function. The bonding device now calculates the maximum number of supported tx and rx queues based on the slaves bound to the bonded device, with the minimum values of tx and rx queues from the device slaves being the bonded devices maximum, as each slave must be able to support the same number of tx and rx queues. Signed-off-by: Declan Doherty <declan.doherty@intel.com> --- drivers/net/bonding/rte_eth_bond_pmd.c | 27 +++++++++++++++++++++++++-- 1 file changed, 25 insertions(+), 2 deletions(-) diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c index dccc016..f428e96 100644 --- a/drivers/net/bonding/rte_eth_bond_pmd.c +++ b/drivers/net/bonding/rte_eth_bond_pmd.c @@ -1691,6 +1691,8 @@ static void bond_ethdev_info(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info) { struct bond_dev_private *internals = dev->data->dev_private; + uint16_t max_nb_rx_queues = UINT16_MAX; + uint16_t max_nb_tx_queues = UINT16_MAX; dev_info->max_mac_addrs = 1; @@ -1698,8 +1700,29 @@ bond_ethdev_info(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info) ? internals->candidate_max_rx_pktlen : ETHER_MAX_JUMBO_FRAME_LEN; - dev_info->max_rx_queues = (uint16_t)128; - dev_info->max_tx_queues = (uint16_t)512; + if (internals->slave_count > 0) { + /* Max number of tx/rx queues that the bonded device can + * support is the minimum values of the bonded slaves, as + * all slaves must be capable of supporting the same number + * of tx/rx queues. + */ + struct rte_eth_dev_info slave_info; + uint8_t idx; + + for (idx = 0; idx < internals->slave_count; idx++) { + rte_eth_dev_info_get(internals->slaves[idx].port_id, + &slave_info); + + if (slave_info.max_rx_queues < max_nb_rx_queues) + max_nb_rx_queues = slave_info.max_rx_queues; + + if (slave_info.max_tx_queues < max_nb_tx_queues) + max_nb_tx_queues = slave_info.max_tx_queues; + } + } + + dev_info->max_rx_queues = max_nb_rx_queues; + dev_info->max_tx_queues = max_nb_tx_queues; dev_info->min_rx_bufsize = 0; -- 2.9.4 ^ permalink raw reply [flat|nested] 22+ messages in thread
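With this fix an application can size its queue configuration from the bonded port's reported capabilities; a minimal sketch (helper name is illustrative):

    #include <rte_ethdev.h>

    static uint16_t
    bonded_usable_rxqs(uint8_t bond_port_id, uint16_t wanted)
    {
            struct rte_eth_dev_info info;

            rte_eth_dev_info_get(bond_port_id, &info);

            /* info.max_rx_queues is now the minimum over all bound slaves. */
            return wanted < info.max_rx_queues ? wanted : info.max_rx_queues;
    }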
* [dpdk-dev] [PATCH v3 2/4] net/bond: use ptype flags for LACP rx filtering 2017-07-04 16:46 ` [dpdk-dev] [PATCH v3 0/4] LACP control packet filtering acceleration Declan Doherty 2017-07-04 16:46 ` [dpdk-dev] [PATCH v3 1/4] net/bond: calculate number of bonding tx/rx queues Declan Doherty @ 2017-07-04 16:46 ` Declan Doherty 2017-07-04 19:54 ` Declan Doherty 2017-07-04 16:46 ` [dpdk-dev] [PATCH v3 3/4] net/bond: dedicated hw queues for LACP control traffic Declan Doherty ` (2 subsequent siblings) 4 siblings, 1 reply; 22+ messages in thread From: Declan Doherty @ 2017-07-04 16:46 UTC (permalink / raw) To: dev; +Cc: Tomasz Kulasek, Declan Doherty From: Tomasz Kulasek <tomaszx.kulasek@intel.com> Use packet types flags in mbuf to provide hint for filtering of LACP control plane traffic from the data path. Signed-off-by: Declan Doherty <declan.doherty@intel.com> --- drivers/net/bonding/rte_eth_bond_pmd.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c index f428e96..9730ae0 100644 --- a/drivers/net/bonding/rte_eth_bond_pmd.c +++ b/drivers/net/bonding/rte_eth_bond_pmd.c @@ -180,6 +180,13 @@ bond_ethdev_rx_burst_8023ad(void *queue, struct rte_mbuf **bufs, /* Handle slow protocol packets. */ while (j < num_rx_total) { + + /* If packet is not pure L2 and is known, skip it */ + if ((bufs[j]->packet_type & ~RTE_PTYPE_L2_ETHER) != 0) { + j++; + continue; + } + if (j + 3 < num_rx_total) rte_prefetch0(rte_pktmbuf_mtod(bufs[j + 3], void *)); @@ -187,7 +194,7 @@ bond_ethdev_rx_burst_8023ad(void *queue, struct rte_mbuf **bufs, subtype = ((struct slow_protocol_frame *)hdr)->slow_protocol.subtype; /* Remove packet from array if it is slow packet or slave is not - * in collecting state or bondign interface is not in promiscus + * in collecting state or bonding interface is not in promiscuous * mode and packet address does not match. */ if (unlikely(is_lacp_packets(hdr->ether_type, subtype, bufs[j]->vlan_tci) || !collecting || (!promisc && -- 2.9.4 ^ permalink raw reply [flat|nested] 22+ messages in thread
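A minimal stand-alone illustration of the ptype short-cut used above (the constant and helper name are assumptions, not part of the patch): packet data is only read when the PMD could not classify the frame beyond plain L2, everything else is assumed not to be LACP.

    #include <rte_mbuf.h>
    #include <rte_ether.h>
    #include <rte_byteorder.h>

    #define ETHER_TYPE_SLOW_EXAMPLE 0x8809 /* IEEE 802.3ad slow protocols */

    static inline int
    maybe_lacp(const struct rte_mbuf *m)
    {
            const struct ether_hdr *eth;

            /* PMD says this is more than bare L2 (e.g. IPv4/UDP): skip it. */
            if ((m->packet_type & ~RTE_PTYPE_L2_ETHER) != 0)
                    return 0;

            eth = rte_pktmbuf_mtod(m, const struct ether_hdr *);
            return eth->ether_type == rte_cpu_to_be_16(ETHER_TYPE_SLOW_EXAMPLE);
    }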
* Re: [dpdk-dev] [PATCH v3 2/4] net/bond: use ptype flags for LACP rx filtering 2017-07-04 16:46 ` [dpdk-dev] [PATCH v3 2/4] net/bond: use ptype flags for LACP rx filtering Declan Doherty @ 2017-07-04 19:54 ` Declan Doherty 0 siblings, 0 replies; 22+ messages in thread From: Declan Doherty @ 2017-07-04 19:54 UTC (permalink / raw) To: dev; +Cc: Tomasz Kulasek On 04/07/17 17:46, Declan Doherty wrote: > From: Tomasz Kulasek <tomaszx.kulasek@intel.com> > > Use packet types flags in mbuf to provide hint for filtering of LACP > control plane traffic from the data path. > > Signed-off-by: Declan Doherty <declan.doherty@intel.com> > --- ... > Acked-by: Declan Doherty <declan.doherty@intel.com> ^ permalink raw reply [flat|nested] 22+ messages in thread
* [dpdk-dev] [PATCH v3 3/4] net/bond: dedicated hw queues for LACP control traffic 2017-07-04 16:46 ` [dpdk-dev] [PATCH v3 0/4] LACP control packet filtering acceleration Declan Doherty 2017-07-04 16:46 ` [dpdk-dev] [PATCH v3 1/4] net/bond: calculate number of bonding tx/rx queues Declan Doherty 2017-07-04 16:46 ` [dpdk-dev] [PATCH v3 2/4] net/bond: use ptype flags for LACP rx filtering Declan Doherty @ 2017-07-04 16:46 ` Declan Doherty 2017-07-04 19:55 ` Declan Doherty ` (3 more replies) 2017-07-04 16:46 ` [dpdk-dev] [PATCH v3 4/4] app/test-pmd: add cmd for dedicated LACP rx/tx queues Declan Doherty 2017-07-05 11:35 ` [dpdk-dev] [PATCH v3 0/4] LACP control packet filtering acceleration Ferruh Yigit 4 siblings, 4 replies; 22+ messages in thread From: Declan Doherty @ 2017-07-04 16:46 UTC (permalink / raw) To: dev; +Cc: Tomasz Kulasek, Declan Doherty From: Tomasz Kulasek <tomaszx.kulasek@intel.com> Add support for hardware flow classification of LACP control plane traffic to be redirect to a dedicated receive queue on each slave which is not visible to application. Also enables a dedicate transmit queue for LACP traffic which allows complete decoupling of control and data paths. This only applies to bonding devices running in mode 4 (link-aggegration-802.3ad). Introduce two new APIs to support enable/disabled of dedicated queues. - rte_eth_bond_8023ad_dedicated_queues_enable - rte_eth_bond_8023ad_dedicated_queues_disable rte_eth_bond_8023ad_dedicated_queues_enable must be called before bonding port is configured or started to reserved and configure the dedicated queuesh. When this option is enabled all slaves must support flow filtering by ethernet type and support one additional tx and rx queue on each slave. Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com> Signed-off-by: Declan Doherty <declan.doherty@intel.com> --- drivers/net/bonding/rte_eth_bond_8023ad.c | 167 +++++++-- drivers/net/bonding/rte_eth_bond_8023ad.h | 42 +++ drivers/net/bonding/rte_eth_bond_8023ad_private.h | 27 ++ drivers/net/bonding/rte_eth_bond_pmd.c | 419 ++++++++++++++++++++-- drivers/net/bonding/rte_eth_bond_version.map | 9 + 5 files changed, 612 insertions(+), 52 deletions(-) diff --git a/drivers/net/bonding/rte_eth_bond_8023ad.c b/drivers/net/bonding/rte_eth_bond_8023ad.c index 65dc75b..a2313b3 100644 --- a/drivers/net/bonding/rte_eth_bond_8023ad.c +++ b/drivers/net/bonding/rte_eth_bond_8023ad.c @@ -632,16 +632,29 @@ tx_machine(struct bond_dev_private *internals, uint8_t slave_id) lacpdu->tlv_type_terminator = TLV_TYPE_TERMINATOR_INFORMATION; lacpdu->terminator_length = 0; - if (rte_ring_enqueue(port->tx_ring, lacp_pkt) == -ENOBUFS) { - /* If TX ring full, drop packet and free message. Retransmission - * will happen in next function call. */ - rte_pktmbuf_free(lacp_pkt); - set_warning_flags(port, WRN_TX_QUEUE_FULL); - return; + MODE4_DEBUG("Sending LACP frame\n"); + BOND_PRINT_LACP(lacpdu); + + if (internals->mode4.dedicated_queues.enabled == 0) { + int retval = rte_ring_enqueue(port->tx_ring, lacp_pkt); + if (retval != 0) { + /* If TX ring full, drop packet and free message. + Retransmission will happen in next function call. 
*/ + rte_pktmbuf_free(lacp_pkt); + set_warning_flags(port, WRN_TX_QUEUE_FULL); + return; + } + } else { + uint16_t pkts_sent = rte_eth_tx_burst(slave_id, + internals->mode4.dedicated_queues.tx_qid, + &lacp_pkt, 1); + if (pkts_sent != 1) { + rte_pktmbuf_free(lacp_pkt); + set_warning_flags(port, WRN_TX_QUEUE_FULL); + return; + } } - MODE4_DEBUG("sending LACP frame\n"); - BOND_PRINT_LACP(lacpdu); timer_set(&port->tx_machine_timer, internals->mode4.tx_period_timeout); SM_FLAG_CLR(port, NTT); @@ -741,6 +754,22 @@ link_speed_key(uint16_t speed) { } static void +rx_machine_update(struct bond_dev_private *internals, uint8_t slave_id, + struct rte_mbuf *lacp_pkt) { + struct lacpdu_header *lacp; + + if (lacp_pkt != NULL) { + lacp = rte_pktmbuf_mtod(lacp_pkt, struct lacpdu_header *); + RTE_ASSERT(lacp->lacpdu.subtype == SLOW_SUBTYPE_LACP); + + /* This is LACP frame so pass it to rx_machine */ + rx_machine(internals, slave_id, &lacp->lacpdu); + rte_pktmbuf_free(lacp_pkt); + } else + rx_machine(internals, slave_id, NULL); +} + +static void bond_mode_8023ad_periodic_cb(void *arg) { struct rte_eth_dev *bond_dev = arg; @@ -748,8 +777,8 @@ bond_mode_8023ad_periodic_cb(void *arg) struct port *port; struct rte_eth_link link_info; struct ether_addr slave_addr; + struct rte_mbuf *lacp_pkt = NULL; - void *pkt = NULL; uint8_t i, slave_id; @@ -809,20 +838,28 @@ bond_mode_8023ad_periodic_cb(void *arg) SM_FLAG_SET(port, LACP_ENABLED); - /* Find LACP packet to this port. Do not check subtype, it is done in - * function that queued packet */ - if (rte_ring_dequeue(port->rx_ring, &pkt) == 0) { - struct rte_mbuf *lacp_pkt = pkt; - struct lacpdu_header *lacp; + if (internals->mode4.dedicated_queues.enabled == 0) { + /* Find LACP packet to this port. Do not check subtype, + * it is done in function that queued packet + */ + int retval = rte_ring_dequeue(port->rx_ring, + (void **)&lacp_pkt); - lacp = rte_pktmbuf_mtod(lacp_pkt, struct lacpdu_header *); - RTE_ASSERT(lacp->lacpdu.subtype == SLOW_SUBTYPE_LACP); + if (retval != 0) + lacp_pkt = NULL; - /* This is LACP frame so pass it to rx_machine */ - rx_machine(internals, slave_id, &lacp->lacpdu); - rte_pktmbuf_free(lacp_pkt); - } else - rx_machine(internals, slave_id, NULL); + rx_machine_update(internals, slave_id, lacp_pkt); + } else { + uint16_t rx_count = rte_eth_rx_burst(slave_id, + internals->mode4.dedicated_queues.rx_qid, + &lacp_pkt, 1); + + if (rx_count == 1) + bond_mode_8023ad_handle_slow_pkt(internals, + slave_id, lacp_pkt); + else + rx_machine_update(internals, slave_id, NULL); + } periodic_machine(internals, slave_id); mux_machine(internals, slave_id); @@ -1067,6 +1104,10 @@ bond_mode_8023ad_conf_assign(struct mode8023ad_private *mode4, mode4->tx_period_timeout = conf->tx_period_ms * ms_ticks; mode4->rx_marker_timeout = conf->rx_marker_period_ms * ms_ticks; mode4->update_timeout_us = conf->update_timeout_ms * 1000; + + mode4->dedicated_queues.enabled = 0; + mode4->dedicated_queues.rx_qid = UINT16_MAX; + mode4->dedicated_queues.tx_qid = UINT16_MAX; } static void @@ -1191,18 +1232,36 @@ bond_mode_8023ad_handle_slow_pkt(struct bond_dev_private *internals, m_hdr->marker.tlv_type_marker = MARKER_TLV_TYPE_RESP; rte_eth_macaddr_get(slave_id, &m_hdr->eth_hdr.s_addr); - if (unlikely(rte_ring_enqueue(port->tx_ring, pkt) == -ENOBUFS)) { - /* reset timer */ - port->rx_marker_timer = 0; - wrn = WRN_TX_QUEUE_FULL; - goto free_out; + if (internals->mode4.dedicated_queues.enabled == 0) { + int retval = rte_ring_enqueue(port->tx_ring, pkt); + if (retval != 0) { + /* reset timer 
*/ + port->rx_marker_timer = 0; + wrn = WRN_TX_QUEUE_FULL; + goto free_out; + } + } else { + /* Send packet directly to the slow queue */ + uint16_t tx_count = rte_eth_tx_burst(slave_id, + internals->mode4.dedicated_queues.tx_qid, + &pkt, 1); + if (tx_count != 1) { + /* reset timer */ + port->rx_marker_timer = 0; + wrn = WRN_TX_QUEUE_FULL; + goto free_out; + } } } else if (likely(subtype == SLOW_SUBTYPE_LACP)) { - if (unlikely(rte_ring_enqueue(port->rx_ring, pkt) == -ENOBUFS)) { - /* If RX fing full free lacpdu message and drop packet */ - wrn = WRN_RX_QUEUE_FULL; - goto free_out; - } + if (internals->mode4.dedicated_queues.enabled == 0) { + int retval = rte_ring_enqueue(port->rx_ring, pkt); + if (retval != 0) { + /* If RX fing full free lacpdu message and drop packet */ + wrn = WRN_RX_QUEUE_FULL; + goto free_out; + } + } else + rx_machine_update(internals, slave_id, pkt); } else { wrn = WRN_UNKNOWN_SLOW_TYPE; goto free_out; @@ -1507,3 +1566,49 @@ bond_mode_8023ad_ext_periodic_cb(void *arg) rte_eal_alarm_set(internals->mode4.update_timeout_us, bond_mode_8023ad_ext_periodic_cb, arg); } + +int +rte_eth_bond_8023ad_dedicated_queues_enable(uint8_t port) +{ + int retval = 0; + struct rte_eth_dev *dev = &rte_eth_devices[port]; + struct bond_dev_private *internals = (struct bond_dev_private *) + dev->data->dev_private; + + if (check_for_bonded_ethdev(dev) != 0) + return -1; + + if (bond_8023ad_slow_pkt_hw_filter_supported(port) != 0) + return -1; + + /* Device must be stopped to set up slow queue */ + if (dev->data->dev_started) + return -1; + + internals->mode4.dedicated_queues.enabled = 1; + + bond_ethdev_mode_set(dev, internals->mode); + return retval; +} + +int +rte_eth_bond_8023ad_dedicated_queues_disable(uint8_t port) +{ + int retval = 0; + struct rte_eth_dev *dev = &rte_eth_devices[port]; + struct bond_dev_private *internals = (struct bond_dev_private *) + dev->data->dev_private; + + if (check_for_bonded_ethdev(dev) != 0) + return -1; + + /* Device must be stopped to set up slow queue */ + if (dev->data->dev_started) + return -1; + + internals->mode4.dedicated_queues.enabled = 0; + + bond_ethdev_mode_set(dev, internals->mode); + + return retval; +} diff --git a/drivers/net/bonding/rte_eth_bond_8023ad.h b/drivers/net/bonding/rte_eth_bond_8023ad.h index 6b8ff57..5c61e66 100644 --- a/drivers/net/bonding/rte_eth_bond_8023ad.h +++ b/drivers/net/bonding/rte_eth_bond_8023ad.h @@ -302,4 +302,46 @@ int rte_eth_bond_8023ad_ext_slowtx(uint8_t port_id, uint8_t slave_id, struct rte_mbuf *lacp_pkt); +/** + * Enable dedicated hw queues for 802.3ad control plane traffic on on slaves + * + * This function creates an additional tx and rx queue on each slave for + * dedicated 802.3ad control plane traffic . A flow filtering rule is + * programmed on each slave to redirect all LACP slow packets to that rx queue + * for processing in the LACP state machine, this removes the need to filter + * these packets in the bonded devices data path. The additional tx queue is + * used to enable the LACP state machine to enqueue LACP packets directly to + * slave hw independently of the bonded devices data path. + * + * To use this feature all slaves must support the programming of the flow + * filter rule required for rx and have enough queues that one rx and tx queue + * can be reserved for the LACP state machines control packets. + * + * Bonding port must be stopped to change this configuration. + * + * @param port_id Bonding device id + * + * @return + * 0 on success, negative value otherwise. 
+ */ +int +rte_eth_bond_8023ad_dedicated_queues_enable(uint8_t port_id); + +/** + * Disable slow queue on slaves + * + * This function disables hardware slow packet filter. + * + * Bonding port must be stopped to change this configuration. + * + * @see rte_eth_bond_8023ad_slow_pkt_hw_filter_enable + * + * @param port_id Bonding device id + * @return + * 0 on success, negative value otherwise. + * + */ +int +rte_eth_bond_8023ad_dedicated_queues_disable(uint8_t port_id); + #endif /* RTE_ETH_BOND_8023AD_H_ */ diff --git a/drivers/net/bonding/rte_eth_bond_8023ad_private.h b/drivers/net/bonding/rte_eth_bond_8023ad_private.h index ca8858b..c16dba8 100644 --- a/drivers/net/bonding/rte_eth_bond_8023ad_private.h +++ b/drivers/net/bonding/rte_eth_bond_8023ad_private.h @@ -39,6 +39,7 @@ #include <rte_ether.h> #include <rte_byteorder.h> #include <rte_atomic.h> +#include <rte_flow.h> #include "rte_eth_bond_8023ad.h" @@ -162,6 +163,9 @@ struct port { uint64_t warning_timer; volatile uint16_t warnings_to_show; + + /** Memory pool used to allocate slow queues */ + struct rte_mempool *slow_pool; }; struct mode8023ad_private { @@ -175,6 +179,19 @@ struct mode8023ad_private { uint64_t update_timeout_us; rte_eth_bond_8023ad_ext_slowrx_fn slowrx_cb; uint8_t external_sm; + + /** + * Configuration of dedicated hardware queues for control plane + * traffic + */ + struct { + uint8_t enabled; + + struct rte_flow *flow[RTE_MAX_ETHPORTS]; + + uint16_t rx_qid; + uint16_t tx_qid; + } dedicated_queues; }; /** @@ -295,4 +312,14 @@ bond_mode_8023ad_deactivate_slave(struct rte_eth_dev *dev, uint8_t slave_pos); void bond_mode_8023ad_mac_address_update(struct rte_eth_dev *bond_dev); +int +bond_ethdev_8023ad_flow_verify(struct rte_eth_dev *bond_dev, + uint8_t slave_port); + +int +bond_ethdev_8023ad_flow_set(struct rte_eth_dev *bond_dev, uint8_t slave_port); + +int +bond_8023ad_slow_pkt_hw_filter_supported(uint8_t port_id); + #endif /* RTE_ETH_BOND_8023AD_H_ */ diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c index 9730ae0..4d1b262 100644 --- a/drivers/net/bonding/rte_eth_bond_pmd.c +++ b/drivers/net/bonding/rte_eth_bond_pmd.c @@ -133,6 +133,254 @@ is_lacp_packets(uint16_t ethertype, uint8_t subtype, uint16_t vlan_tci) (subtype == SLOW_SUBTYPE_MARKER || subtype == SLOW_SUBTYPE_LACP)); } +/***************************************************************************** + * Flow director's setup for mode 4 optimization + */ + +static struct rte_flow_item_eth flow_item_eth_type_8023ad = { + .dst.addr_bytes = { 0 }, + .src.addr_bytes = { 0 }, + .type = RTE_BE16(ETHER_TYPE_SLOW), +}; + +static struct rte_flow_item_eth flow_item_eth_mask_type_8023ad = { + .dst.addr_bytes = { 0 }, + .src.addr_bytes = { 0 }, + .type = 0xFFFF, +}; + +static struct rte_flow_item flow_item_8023ad[] = { + { + .type = RTE_FLOW_ITEM_TYPE_ETH, + .spec = &flow_item_eth_type_8023ad, + .last = NULL, + .mask = &flow_item_eth_mask_type_8023ad, + }, + { + .type = RTE_FLOW_ITEM_TYPE_END, + .spec = NULL, + .last = NULL, + .mask = NULL, + } +}; + +const struct rte_flow_attr flow_attr_8023ad = { + .group = 0, + .priority = 0, + .ingress = 1, + .egress = 0, + .reserved = 0, +}; + +int +bond_ethdev_8023ad_flow_verify(struct rte_eth_dev *bond_dev, + uint8_t slave_port) { + struct rte_flow_error error; + struct bond_dev_private *internals = (struct bond_dev_private *) + (bond_dev->data->dev_private); + + struct rte_flow_action_queue lacp_queue_conf = { + .index = internals->mode4.dedicated_queues.rx_qid, + }; + + const struct 
rte_flow_action actions[] = { + { + .type = RTE_FLOW_ACTION_TYPE_QUEUE, + .conf = &lacp_queue_conf + }, + { + .type = RTE_FLOW_ACTION_TYPE_END, + } + }; + + int ret = rte_flow_validate(slave_port, &flow_attr_8023ad, + flow_item_8023ad, actions, &error); + if (ret < 0) + return -1; + + return 0; +} + +int +bond_8023ad_slow_pkt_hw_filter_supported(uint8_t port_id) { + struct rte_eth_dev *bond_dev = &rte_eth_devices[port_id]; + struct bond_dev_private *internals = (struct bond_dev_private *) + (bond_dev->data->dev_private); + struct rte_eth_dev_info bond_info, slave_info; + uint8_t idx; + + /* Verify if all slaves in bonding supports flow director and */ + if (internals->slave_count > 0) { + rte_eth_dev_info_get(bond_dev->data->port_id, &bond_info); + + internals->mode4.dedicated_queues.rx_qid = bond_info.nb_rx_queues; + internals->mode4.dedicated_queues.tx_qid = bond_info.nb_tx_queues; + + for (idx = 0; idx < internals->slave_count; idx++) { + rte_eth_dev_info_get(internals->slaves[idx].port_id, + &slave_info); + + if (bond_ethdev_8023ad_flow_verify(bond_dev, + internals->slaves[idx].port_id) != 0) + return -1; + } + } + + return 0; +} + +int +bond_ethdev_8023ad_flow_set(struct rte_eth_dev *bond_dev, uint8_t slave_port) { + + struct rte_flow_error error; + struct bond_dev_private *internals = (struct bond_dev_private *) + (bond_dev->data->dev_private); + + struct rte_flow_action_queue lacp_queue_conf = { + .index = internals->mode4.dedicated_queues.rx_qid, + }; + + const struct rte_flow_action actions[] = { + { + .type = RTE_FLOW_ACTION_TYPE_QUEUE, + .conf = &lacp_queue_conf + }, + { + .type = RTE_FLOW_ACTION_TYPE_END, + } + }; + + internals->mode4.dedicated_queues.flow[slave_port] = rte_flow_create(slave_port, + &flow_attr_8023ad, flow_item_8023ad, actions, &error); + if (internals->mode4.dedicated_queues.flow[slave_port] == NULL) { + RTE_BOND_LOG(ERR, "bond_ethdev_8023ad_flow_set: %s " + "(slave_port=%d queue_id=%d)", + error.message, slave_port, + internals->mode4.dedicated_queues.rx_qid); + return -1; + } + + return 0; +} + +static uint16_t +bond_ethdev_rx_burst_8023ad_fast_queue(void *queue, struct rte_mbuf **bufs, + uint16_t nb_pkts) +{ + struct bond_rx_queue *bd_rx_q = (struct bond_rx_queue *)queue; + struct bond_dev_private *internals = bd_rx_q->dev_private; + uint16_t num_rx_total = 0; /* Total number of received packets */ + uint8_t slaves[RTE_MAX_ETHPORTS]; + uint8_t slave_count; + + uint8_t i, idx; + + /* Copy slave list to protect against slave up/down changes during tx + * bursting */ + slave_count = internals->active_slave_count; + memcpy(slaves, internals->active_slaves, + sizeof(internals->active_slaves[0]) * slave_count); + + for (i = 0, idx = internals->active_slave; + i < slave_count && num_rx_total < nb_pkts; i++, idx++) { + idx = idx % slave_count; + + /* Read packets from this slave */ + num_rx_total += rte_eth_rx_burst(slaves[idx], bd_rx_q->queue_id, + &bufs[num_rx_total], nb_pkts - num_rx_total); + } + + internals->active_slave = idx; + + return num_rx_total; +} + +static uint16_t +bond_ethdev_tx_burst_8023ad_fast_queue(void *queue, struct rte_mbuf **bufs, + uint16_t nb_pkts) +{ + struct bond_dev_private *internals; + struct bond_tx_queue *bd_tx_q; + + uint8_t num_of_slaves; + uint8_t slaves[RTE_MAX_ETHPORTS]; + /* positions in slaves, not ID */ + uint8_t distributing_offsets[RTE_MAX_ETHPORTS]; + uint8_t distributing_count; + + uint16_t num_tx_slave, num_tx_total = 0, num_tx_fail_total = 0; + uint16_t i, op_slave_idx; + + struct rte_mbuf 
*slave_bufs[RTE_MAX_ETHPORTS][nb_pkts]; + + /* Total amount of packets in slave_bufs */ + uint16_t slave_nb_pkts[RTE_MAX_ETHPORTS] = { 0 }; + /* Slow packets placed in each slave */ + + if (unlikely(nb_pkts == 0)) + return 0; + + bd_tx_q = (struct bond_tx_queue *)queue; + internals = bd_tx_q->dev_private; + + /* Copy slave list to protect against slave up/down changes during tx + * bursting */ + num_of_slaves = internals->active_slave_count; + if (num_of_slaves < 1) + return num_tx_total; + + memcpy(slaves, internals->active_slaves, sizeof(slaves[0]) * + num_of_slaves); + + distributing_count = 0; + for (i = 0; i < num_of_slaves; i++) { + struct port *port = &mode_8023ad_ports[slaves[i]]; + if (ACTOR_STATE(port, DISTRIBUTING)) + distributing_offsets[distributing_count++] = i; + } + + if (likely(distributing_count > 0)) { + /* Populate slaves mbuf with the packets which are to be sent */ + for (i = 0; i < nb_pkts; i++) { + /* Select output slave using hash based on xmit policy */ + op_slave_idx = internals->xmit_hash(bufs[i], + distributing_count); + + /* Populate slave mbuf arrays with mbufs for that slave. + * Use only slaves that are currently distributing. + */ + uint8_t slave_offset = + distributing_offsets[op_slave_idx]; + slave_bufs[slave_offset][slave_nb_pkts[slave_offset]] = + bufs[i]; + slave_nb_pkts[slave_offset]++; + } + } + + /* Send packet burst on each slave device */ + for (i = 0; i < num_of_slaves; i++) { + if (slave_nb_pkts[i] == 0) + continue; + + num_tx_slave = rte_eth_tx_burst(slaves[i], bd_tx_q->queue_id, + slave_bufs[i], slave_nb_pkts[i]); + + num_tx_total += num_tx_slave; + num_tx_fail_total += slave_nb_pkts[i] - num_tx_slave; + + /* If tx burst fails move packets to end of bufs */ + if (unlikely(num_tx_slave < slave_nb_pkts[i])) { + uint16_t j = nb_pkts - num_tx_fail_total; + for ( ; num_tx_slave < slave_nb_pkts[i]; j++, + num_tx_slave++) + bufs[j] = slave_bufs[i][num_tx_slave]; + } + } + + return num_tx_total; +} + + static uint16_t bond_ethdev_rx_burst_8023ad(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts) @@ -1302,11 +1550,19 @@ bond_ethdev_mode_set(struct rte_eth_dev *eth_dev, int mode) if (bond_mode_8023ad_enable(eth_dev) != 0) return -1; - eth_dev->rx_pkt_burst = bond_ethdev_rx_burst_8023ad; - eth_dev->tx_pkt_burst = bond_ethdev_tx_burst_8023ad; - RTE_LOG(WARNING, PMD, - "Using mode 4, it is necessary to do TX burst and RX burst " - "at least every 100ms.\n"); + if (internals->mode4.dedicated_queues.enabled == 0) { + eth_dev->rx_pkt_burst = bond_ethdev_rx_burst_8023ad; + eth_dev->tx_pkt_burst = bond_ethdev_tx_burst_8023ad; + RTE_LOG(WARNING, PMD, + "Using mode 4, it is necessary to do TX burst " + "and RX burst at least every 100ms.\n"); + } else { + /* Use flow director's optimization */ + eth_dev->rx_pkt_burst = + bond_ethdev_rx_burst_8023ad_fast_queue; + eth_dev->tx_pkt_burst = + bond_ethdev_tx_burst_8023ad_fast_queue; + } break; case BONDING_MODE_TLB: eth_dev->tx_pkt_burst = bond_ethdev_tx_burst_tlb; @@ -1328,15 +1584,81 @@ bond_ethdev_mode_set(struct rte_eth_dev *eth_dev, int mode) return 0; } + +static int +slave_configure_slow_queue(struct rte_eth_dev *bonded_eth_dev, + struct rte_eth_dev *slave_eth_dev) +{ + int errval = 0; + struct bond_dev_private *internals = (struct bond_dev_private *) + bonded_eth_dev->data->dev_private; + struct port *port = &mode_8023ad_ports[slave_eth_dev->data->port_id]; + + if (port->slow_pool == NULL) { + char mem_name[256]; + int slave_id = slave_eth_dev->data->port_id; + + snprintf(mem_name, RTE_DIM(mem_name), 
"slave_port%u_slow_pool", + slave_id); + port->slow_pool = rte_pktmbuf_pool_create(mem_name, 8191, + 250, 0, RTE_MBUF_DEFAULT_BUF_SIZE, + slave_eth_dev->data->numa_node); + + /* Any memory allocation failure in initialization is critical because + * resources can't be free, so reinitialization is impossible. */ + if (port->slow_pool == NULL) { + rte_panic("Slave %u: Failed to create memory pool '%s': %s\n", + slave_id, mem_name, rte_strerror(rte_errno)); + } + } + + if (internals->mode4.dedicated_queues.enabled == 1) { + /* Configure slow Rx queue */ + + errval = rte_eth_rx_queue_setup(slave_eth_dev->data->port_id, + internals->mode4.dedicated_queues.rx_qid, 128, + rte_eth_dev_socket_id(slave_eth_dev->data->port_id), + NULL, port->slow_pool); + if (errval != 0) { + RTE_BOND_LOG(ERR, + "rte_eth_rx_queue_setup: port=%d queue_id %d, err (%d)", + slave_eth_dev->data->port_id, + internals->mode4.dedicated_queues.rx_qid, + errval); + return errval; + } + + errval = rte_eth_tx_queue_setup(slave_eth_dev->data->port_id, + internals->mode4.dedicated_queues.tx_qid, 512, + rte_eth_dev_socket_id(slave_eth_dev->data->port_id), + NULL); + if (errval != 0) { + RTE_BOND_LOG(ERR, + "rte_eth_tx_queue_setup: port=%d queue_id %d, err (%d)", + slave_eth_dev->data->port_id, + internals->mode4.dedicated_queues.tx_qid, + errval); + return errval; + } + } + return 0; +} + int slave_configure(struct rte_eth_dev *bonded_eth_dev, struct rte_eth_dev *slave_eth_dev) { struct bond_rx_queue *bd_rx_q; struct bond_tx_queue *bd_tx_q; + uint16_t nb_rx_queues; + uint16_t nb_tx_queues; int errval; uint16_t q_id; + struct rte_flow_error flow_error; + + struct bond_dev_private *internals = (struct bond_dev_private *) + bonded_eth_dev->data->dev_private; /* Stop slave */ rte_eth_dev_stop(slave_eth_dev->data->port_id); @@ -1366,10 +1688,19 @@ slave_configure(struct rte_eth_dev *bonded_eth_dev, slave_eth_dev->data->dev_conf.rxmode.hw_vlan_filter = bonded_eth_dev->data->dev_conf.rxmode.hw_vlan_filter; + nb_rx_queues = bonded_eth_dev->data->nb_rx_queues; + nb_tx_queues = bonded_eth_dev->data->nb_tx_queues; + + if (internals->mode == BONDING_MODE_8023AD) { + if (internals->mode4.dedicated_queues.enabled == 1) { + nb_rx_queues++; + nb_tx_queues++; + } + } + /* Configure device */ errval = rte_eth_dev_configure(slave_eth_dev->data->port_id, - bonded_eth_dev->data->nb_rx_queues, - bonded_eth_dev->data->nb_tx_queues, + nb_rx_queues, nb_tx_queues, &(slave_eth_dev->data->dev_conf)); if (errval != 0) { RTE_BOND_LOG(ERR, "Cannot configure slave device: port %u , err (%d)", @@ -1403,12 +1734,35 @@ slave_configure(struct rte_eth_dev *bonded_eth_dev, &bd_tx_q->tx_conf); if (errval != 0) { RTE_BOND_LOG(ERR, - "rte_eth_tx_queue_setup: port=%d queue_id %d, err (%d)", - slave_eth_dev->data->port_id, q_id, errval); + "rte_eth_tx_queue_setup: port=%d queue_id %d, err (%d)", + slave_eth_dev->data->port_id, q_id, errval); return errval; } } + if (internals->mode == BONDING_MODE_8023AD && + internals->mode4.dedicated_queues.enabled == 1) { + if (slave_configure_slow_queue(bonded_eth_dev, slave_eth_dev) + != 0) + return errval; + + if (bond_ethdev_8023ad_flow_verify(bonded_eth_dev, + slave_eth_dev->data->port_id) != 0) { + RTE_BOND_LOG(ERR, + "rte_eth_tx_queue_setup: port=%d queue_id %d, err (%d)", + slave_eth_dev->data->port_id, q_id, errval); + return -1; + } + + if (internals->mode4.dedicated_queues.flow[slave_eth_dev->data->port_id] != NULL) + rte_flow_destroy(slave_eth_dev->data->port_id, + 
internals->mode4.dedicated_queues.flow[slave_eth_dev->data->port_id], + &flow_error); + + bond_ethdev_8023ad_flow_set(bonded_eth_dev, + slave_eth_dev->data->port_id); + } + /* Start device */ errval = rte_eth_dev_start(slave_eth_dev->data->port_id); if (errval != 0) { @@ -1567,13 +1921,26 @@ bond_ethdev_start(struct rte_eth_dev *eth_dev) if (internals->promiscuous_en) bond_ethdev_promiscuous_enable(eth_dev); + if (internals->mode == BONDING_MODE_8023AD) { + if (internals->mode4.dedicated_queues.enabled == 1) { + internals->mode4.dedicated_queues.rx_qid = + eth_dev->data->nb_rx_queues; + internals->mode4.dedicated_queues.tx_qid = + eth_dev->data->nb_tx_queues; + } + } + + /* Reconfigure each slave device if starting bonded device */ for (i = 0; i < internals->slave_count; i++) { - if (slave_configure(eth_dev, - &(rte_eth_devices[internals->slaves[i].port_id])) != 0) { + struct rte_eth_dev *slave_ethdev = + &(rte_eth_devices[internals->slaves[i].port_id]); + if (slave_configure(eth_dev, slave_ethdev) != 0) { RTE_BOND_LOG(ERR, - "bonded port (%d) failed to reconfigure slave device (%d)", - eth_dev->data->port_id, internals->slaves[i].port_id); + "bonded port (%d) failed to reconfigure" + "slave device (%d)", + eth_dev->data->port_id, + internals->slaves[i].port_id); return -1; } /* We will need to poll for link status if any slave doesn't @@ -1698,21 +2065,21 @@ static void bond_ethdev_info(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info) { struct bond_dev_private *internals = dev->data->dev_private; + uint16_t max_nb_rx_queues = UINT16_MAX; uint16_t max_nb_tx_queues = UINT16_MAX; dev_info->max_mac_addrs = 1; - dev_info->max_rx_pktlen = internals->candidate_max_rx_pktlen - ? internals->candidate_max_rx_pktlen - : ETHER_MAX_JUMBO_FRAME_LEN; + dev_info->max_rx_pktlen = internals->candidate_max_rx_pktlen ? + internals->candidate_max_rx_pktlen : + ETHER_MAX_JUMBO_FRAME_LEN; + /* Max number of tx/rx queues that the bonded device can support is the + * minimum values of the bonded slaves, as all slaves must be capable + * of supporting the same number of tx/rx queues. + */ if (internals->slave_count > 0) { - /* Max number of tx/rx queues that the bonded device can - * support is the minimum values of the bonded slaves, as - * all slaves must be capable of supporting the same number - * of tx/rx queues. - */ struct rte_eth_dev_info slave_info; uint8_t idx; @@ -1731,6 +2098,16 @@ bond_ethdev_info(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info) dev_info->max_rx_queues = max_nb_rx_queues; dev_info->max_tx_queues = max_nb_tx_queues; + /** + * If dedicated hw queues enabled for link bonding device in LACP mode + * then we need to reduce the maximum number of data path queues by 1. + */ + if (internals->mode == BONDING_MODE_8023AD && + internals->mode4.dedicated_queues.enabled == 1) { + dev_info->max_rx_queues--; + dev_info->max_tx_queues--; + } + dev_info->min_rx_bufsize = 0; dev_info->rx_offload_capa = internals->rx_offload_capa; diff --git a/drivers/net/bonding/rte_eth_bond_version.map b/drivers/net/bonding/rte_eth_bond_version.map index 2de0a7d..9c15864 100644 --- a/drivers/net/bonding/rte_eth_bond_version.map +++ b/drivers/net/bonding/rte_eth_bond_version.map @@ -43,3 +43,12 @@ DPDK_16.07 { rte_eth_bond_8023ad_setup; } DPDK_16.04; + +DPDK_17.08 { + global: + + rte_eth_bond_8023ad_dedicated_queues_enable; + rte_eth_bond_8023ad_dedicated_queues_disable; + + local: *; +} DPDK_17.05; -- 2.9.4 ^ permalink raw reply [flat|nested] 22+ messages in thread
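A possible enablement sequence for the final API names is sketched below, under the assumption that the application configures only its data-path queues and the bonding PMD reserves the extra slave queues internally; the port id, queue counts and error handling are placeholders:

    #include <rte_ethdev.h>
    #include <rte_eth_bond_8023ad.h>

    static int
    start_bond_with_dedicated_queues(uint8_t bond_port, uint16_t nb_rxq,
                    uint16_t nb_txq, const struct rte_eth_conf *conf)
    {
            /* Must be called while the bonded port is still stopped. */
            if (rte_eth_bond_8023ad_dedicated_queues_enable(bond_port) != 0)
                    return -1; /* e.g. a slave cannot filter by EtherType */

            /* Configure only the data-path queues; the additional rx/tx
             * queue for LACP is set up on each slave by the bonding PMD. */
            if (rte_eth_dev_configure(bond_port, nb_rxq, nb_txq, conf) != 0)
                    return -1;

            /* ... per-queue rx/tx setup for the data path goes here ... */

            return rte_eth_dev_start(bond_port);
    }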
* Re: [dpdk-dev] [PATCH v3 3/4] net/bond: dedicated hw queues for LACP control traffic
  2017-07-04 16:46 ` [dpdk-dev] [PATCH v3 3/4] net/bond: dedicated hw queues for LACP control traffic Declan Doherty
@ 2017-07-04 19:55   ` Declan Doherty
  2017-07-05 11:19   ` Ferruh Yigit
  ` (2 subsequent siblings)
  3 siblings, 0 replies; 22+ messages in thread
From: Declan Doherty @ 2017-07-04 19:55 UTC (permalink / raw)
  To: dev; +Cc: Tomasz Kulasek

On 04/07/17 17:46, Declan Doherty wrote:
> From: Tomasz Kulasek <tomaszx.kulasek@intel.com>
>
> Add support for hardware flow classification of LACP control plane
> traffic to be redirected to a dedicated receive queue on each slave which
> is not visible to the application. Also enables a dedicated transmit queue
> for LACP traffic which allows complete decoupling of control and data
> paths.
>
> This only applies to bonding devices running in mode 4
> (link-aggregation-802.3ad).
>
> Introduce two new APIs to support enable/disable of dedicated
> queues:
>
> - rte_eth_bond_8023ad_dedicated_queues_enable
> - rte_eth_bond_8023ad_dedicated_queues_disable
>
> rte_eth_bond_8023ad_dedicated_queues_enable must be called before the
> bonding port is configured or started, to reserve and configure the
> dedicated queues.
>
> When this option is enabled, all slaves must support flow filtering
> by Ethernet type and provide one additional tx and rx queue on
> each slave.
>
> Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
> Signed-off-by: Declan Doherty <declan.doherty@intel.com>
> ---
...
> Acked-by: Declan Doherty <declan.doherty@intel.com>

^ permalink raw reply	[flat|nested] 22+ messages in thread
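The "flow filtering by Ethernet type" requirement quoted above amounts to every slave accepting an rte_flow rule roughly of the shape sketched below: match the IEEE slow-protocols Ethertype 0x8809 (used by LACP) and steer it to the hidden queue. This is an illustration of the rule's shape, not the driver's exact code; slave_port and slow_rxq are placeholder parameters.

#include <stdint.h>
#include <rte_byteorder.h>
#include <rte_flow.h>

static struct rte_flow *
steer_lacp_to_queue(uint8_t slave_port, uint16_t slow_rxq,
		struct rte_flow_error *err)
{
	struct rte_flow_attr attr = { .ingress = 1 };
	struct rte_flow_item_eth spec = {
		.type = rte_cpu_to_be_16(0x8809), /* slow protocols (LACP) */
	};
	struct rte_flow_item_eth mask = { .type = 0xffff };
	struct rte_flow_item pattern[] = {
		{ .type = RTE_FLOW_ITEM_TYPE_ETH, .spec = &spec, .mask = &mask },
		{ .type = RTE_FLOW_ITEM_TYPE_END },
	};
	struct rte_flow_action_queue queue = { .index = slow_rxq };
	struct rte_flow_action actions[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &queue },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};

	/* Slaves that cannot validate such a rule cannot be used while
	 * dedicated queues are enabled. */
	if (rte_flow_validate(slave_port, &attr, pattern, actions, err) != 0)
		return NULL;

	return rte_flow_create(slave_port, &attr, pattern, actions, err);
}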
* Re: [dpdk-dev] [PATCH v3 3/4] net/bond: dedicated hw queues for LACP control traffic
  2017-07-04 16:46 ` [dpdk-dev] [PATCH v3 3/4] net/bond: dedicated hw queues for LACP control traffic Declan Doherty
  2017-07-04 19:55   ` Declan Doherty
@ 2017-07-05 11:19   ` Ferruh Yigit
  2017-07-05 11:33   ` Ferruh Yigit
  2017-12-13  8:16   ` linhaifeng
  3 siblings, 0 replies; 22+ messages in thread
From: Ferruh Yigit @ 2017-07-05 11:19 UTC (permalink / raw)
  To: Declan Doherty, dev; +Cc: Tomasz Kulasek

On 7/4/2017 5:46 PM, Declan Doherty wrote:
> From: Tomasz Kulasek <tomaszx.kulasek@intel.com>
>
> Add support for hardware flow classification of LACP control plane
> traffic to be redirected to a dedicated receive queue on each slave which
> is not visible to the application. Also enables a dedicated transmit queue
> for LACP traffic which allows complete decoupling of control and data
> paths.
>
> This only applies to bonding devices running in mode 4
> (link-aggregation-802.3ad).
>
> Introduce two new APIs to support enable/disable of dedicated
> queues:
>
> - rte_eth_bond_8023ad_dedicated_queues_enable
> - rte_eth_bond_8023ad_dedicated_queues_disable
>
> rte_eth_bond_8023ad_dedicated_queues_enable must be called before the
> bonding port is configured or started, to reserve and configure the
> dedicated queues.
>
> When this option is enabled, all slaves must support flow filtering
> by Ethernet type and provide one additional tx and rx queue on
> each slave.
>
> Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
> Signed-off-by: Declan Doherty <declan.doherty@intel.com>

<...>

> +
> +DPDK_17.08 {
> +	global:
> +
> +	rte_eth_bond_8023ad_dedicated_queues_enable;
> +	rte_eth_bond_8023ad_dedicated_queues_disable;
> +
> +	local: *;

This line is not required.

> +} DPDK_17.05;

And this should be DPDK_16.07, otherwise it breaks the shared library build.

I will fix the above while applying.

>

^ permalink raw reply	[flat|nested] 22+ messages in thread
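Taking the two comments together, the version node as applied would presumably read as follows (no local catch-all, and inheriting from the newest existing node in the file):

DPDK_17.08 {
	global:

	rte_eth_bond_8023ad_dedicated_queues_enable;
	rte_eth_bond_8023ad_dedicated_queues_disable;
} DPDK_16.07;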
* Re: [dpdk-dev] [PATCH v3 3/4] net/bond: dedicated hw queues for LACP control traffic
  2017-07-04 16:46 ` [dpdk-dev] [PATCH v3 3/4] net/bond: dedicated hw queues for LACP control traffic Declan Doherty
  2017-07-04 19:55   ` Declan Doherty
  2017-07-05 11:19   ` Ferruh Yigit
@ 2017-07-05 11:33   ` Ferruh Yigit
  2017-12-13  8:16   ` linhaifeng
  3 siblings, 0 replies; 22+ messages in thread
From: Ferruh Yigit @ 2017-07-05 11:33 UTC (permalink / raw)
  To: Declan Doherty, dev; +Cc: Tomasz Kulasek

On 7/4/2017 5:46 PM, Declan Doherty wrote:
> From: Tomasz Kulasek <tomaszx.kulasek@intel.com>
>
> Add support for hardware flow classification of LACP control plane
> traffic to be redirected to a dedicated receive queue on each slave which
> is not visible to the application. Also enables a dedicated transmit queue
> for LACP traffic which allows complete decoupling of control and data
> paths.
>
> This only applies to bonding devices running in mode 4
> (link-aggregation-802.3ad).
>
> Introduce two new APIs to support enable/disable of dedicated
> queues:
>
> - rte_eth_bond_8023ad_dedicated_queues_enable
> - rte_eth_bond_8023ad_dedicated_queues_disable
>
> rte_eth_bond_8023ad_dedicated_queues_enable must be called before the
> bonding port is configured or started, to reserve and configure the
> dedicated queues.
>
> When this option is enabled, all slaves must support flow filtering
> by Ethernet type and provide one additional tx and rx queue on
> each slave.
>
> Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
> Signed-off-by: Declan Doherty <declan.doherty@intel.com>

<...>

> -			"bonded port (%d) failed to reconfigure slave device (%d)",
> -			eth_dev->data->port_id, internals->slaves[i].port_id);
> +			"bonded port (%d) failed to reconfigure"
> +			"slave device (%d)",

Log string merged into a single line.

<...>

^ permalink raw reply	[flat|nested] 22+ messages in thread
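The concern is easy to miss in the hunk above: adjacent C string literals are concatenated, so a split format string still compiles and prints, but the split point here also dropped a space and the full message can no longer be found with a single grep of the tree. A standalone illustration (plain printf, not the bonding code):

#include <stdio.h>

int main(void)
{
	int port = 0, slave = 1;

	/* Split literal: compiles, but prints "...reconfigureslave device..."
	 * and defeats grepping for the full message. */
	printf("bonded port (%d) failed to reconfigure"
			"slave device (%d)\n", port, slave);

	/* Single literal, as applied. */
	printf("bonded port (%d) failed to reconfigure slave device (%d)\n",
			port, slave);

	return 0;
}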
* Re: [dpdk-dev] [PATCH v3 3/4] net/bond: dedicated hw queues for LACP control traffic
  2017-07-04 16:46 ` [dpdk-dev] [PATCH v3 3/4] net/bond: dedicated hw queues for LACP control traffic Declan Doherty
  ` (2 preceding siblings ...)
  2017-07-05 11:33   ` Ferruh Yigit
@ 2017-12-13  8:16   ` linhaifeng
  2017-12-13 12:41     ` Kulasek, TomaszX
  3 siblings, 1 reply; 22+ messages in thread
From: linhaifeng @ 2017-12-13 8:16 UTC (permalink / raw)
  To: Declan Doherty, dev; +Cc: Tomasz Kulasek

Hi,

What is the purpose of this patch? Is it to fix a problem or to improve performance?

On 2017/7/5 0:46, Declan Doherty wrote:
> From: Tomasz Kulasek <tomaszx.kulasek@intel.com>
>
> Add support for hardware flow classification of LACP control plane
> traffic to be redirected to a dedicated receive queue on each slave which
> is not visible to the application. Also enables a dedicated transmit queue
> for LACP traffic which allows complete decoupling of control and data
> paths.

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: [dpdk-dev] [PATCH v3 3/4] net/bond: dedicated hw queues for LACP control traffic
  2017-12-13  8:16 ` linhaifeng
@ 2017-12-13 12:41   ` Kulasek, TomaszX
  0 siblings, 0 replies; 22+ messages in thread
From: Kulasek, TomaszX @ 2017-12-13 12:41 UTC (permalink / raw)
  To: linhaifeng, Doherty, Declan, dev

Hi,

> -----Original Message-----
> From: linhaifeng [mailto:haifeng.lin@huawei.com]
> Sent: Wednesday, December 13, 2017 09:16
> To: Doherty, Declan <declan.doherty@intel.com>; dev@dpdk.org
> Cc: Kulasek, TomaszX <tomaszx.kulasek@intel.com>
> Subject: Re: [dpdk-dev] [PATCH v3 3/4] net/bond: dedicated hw queues for LACP
> control traffic
>
> Hi,
>
> What is the purpose of this patch? Is it to fix a problem or to improve performance?
>
> On 2017/7/5 0:46, Declan Doherty wrote:
> > From: Tomasz Kulasek <tomaszx.kulasek@intel.com>
> >
> > Add support for hardware flow classification of LACP control plane
> > traffic to be redirected to a dedicated receive queue on each slave which
> > is not visible to the application. Also enables a dedicated transmit queue
> > for LACP traffic which allows complete decoupling of control and data
> > paths.
>

This is a performance improvement.

Tomasz

^ permalink raw reply	[flat|nested] 22+ messages in thread
* [dpdk-dev] [PATCH v3 4/4] app/test-pmd: add cmd for dedicated LACP rx/tx queues 2017-07-04 16:46 ` [dpdk-dev] [PATCH v3 0/4] LACP control packet filtering acceleration Declan Doherty ` (2 preceding siblings ...) 2017-07-04 16:46 ` [dpdk-dev] [PATCH v3 3/4] net/bond: dedicated hw queues for LACP control traffic Declan Doherty @ 2017-07-04 16:46 ` Declan Doherty 2017-07-04 19:56 ` Declan Doherty 2017-07-05 11:33 ` Ferruh Yigit 2017-07-05 11:35 ` [dpdk-dev] [PATCH v3 0/4] LACP control packet filtering acceleration Ferruh Yigit 4 siblings, 2 replies; 22+ messages in thread From: Declan Doherty @ 2017-07-04 16:46 UTC (permalink / raw) To: dev; +Cc: Tomasz Kulasek, Declan Doherty From: Tomasz Kulasek <tomaszx.kulasek@intel.com> Add new command to support enable/disable of dedicated tx/rx queue on each slave of a bond device for LACP control plane traffic. set bonding lacp dedicated_queues <port_id> [enable|disable] When enabled this option creates dedicated queues on each slave device for LACP control plane traffic. This removes the need to filter control plane packets in the data path. Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com> Signed-off-by: Declan Doherty <declan.doherty@intel.com> --- app/test-pmd/cmdline.c | 85 +++++++++++++++++++++++++++++ doc/guides/testpmd_app_ug/testpmd_funcs.rst | 9 +++ 2 files changed, 94 insertions(+) diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c index 0fc40a6..486252a 100644 --- a/app/test-pmd/cmdline.c +++ b/app/test-pmd/cmdline.c @@ -87,6 +87,7 @@ #include <cmdline.h> #ifdef RTE_LIBRTE_PMD_BOND #include <rte_eth_bond.h> +#include <rte_eth_bond_8023ad.h> #endif #ifdef RTE_LIBRTE_IXGBE_PMD #include <rte_pmd_ixgbe.h> @@ -575,6 +576,10 @@ static void cmd_help_long_parsed(void *parsed_result, "set bonding mon_period (port_id) (value)\n" " Set the bonding link status monitoring polling period in ms.\n\n" + + "set bonding lacp dedicated_queues <port_id> (enable|disable)\n" + " Enable/disable dedicated queues for LACP control traffic.\n\n" + #endif "set link-up port (port_id)\n" " Set link up for a port.\n\n" @@ -4303,6 +4308,85 @@ cmdline_parse_inst_t cmd_set_bonding_mode = { } }; +/* *** SET BONDING SLOW_QUEUE SW/HW *** */ +struct cmd_set_bonding_lacp_dedicated_queues_result { + cmdline_fixed_string_t set; + cmdline_fixed_string_t bonding; + cmdline_fixed_string_t lacp; + cmdline_fixed_string_t dedicated_queues; + uint8_t port_id; + cmdline_fixed_string_t mode; +}; + +static void cmd_set_bonding_lacp_dedicated_queues_parsed(void *parsed_result, + __attribute__((unused)) struct cmdline *cl, + __attribute__((unused)) void *data) +{ + struct cmd_set_bonding_lacp_dedicated_queues_result *res = parsed_result; + portid_t port_id = res->port_id; + struct rte_port *port; + + port = &ports[port_id]; + + /** Check if the port is not started **/ + if (port->port_status != RTE_PORT_STOPPED) { + printf("Please stop port %d first\n", port_id); + return; + } + + if (!strcmp(res->mode, "enable")) { + if (rte_eth_bond_8023ad_dedicated_queues_enable(port_id) == 0) + printf("Dedicate queues for LACP control packets" + " enabled\n"); + else + printf("Enabling dedicate queues for LACP control " + "packets on port %d failed\n", port_id); + } else if (!strcmp(res->mode, "disable")) { + if (rte_eth_bond_8023ad_dedicated_queues_disable(port_id) == 0) + printf("Dedicated queues for LACP control packets " + "disabled\n"); + else + printf("Disabling dedicated queues for LACP control " + "traffic on port %d failed\n", port_id); + } +} + +cmdline_parse_token_string_t 
cmd_setbonding_lacp_dedicated_queues_set = +TOKEN_STRING_INITIALIZER(struct cmd_set_bonding_lacp_dedicated_queues_result, + set, "set"); +cmdline_parse_token_string_t cmd_setbonding_lacp_dedicated_queues_bonding = +TOKEN_STRING_INITIALIZER(struct cmd_set_bonding_lacp_dedicated_queues_result, + bonding, "bonding"); +cmdline_parse_token_string_t cmd_setbonding_lacp_dedicated_queues_lacp = +TOKEN_STRING_INITIALIZER(struct cmd_set_bonding_lacp_dedicated_queues_result, + lacp, "lacp"); +cmdline_parse_token_string_t cmd_setbonding_lacp_dedicated_queues_dedicated_queues = +TOKEN_STRING_INITIALIZER(struct cmd_set_bonding_lacp_dedicated_queues_result, + dedicated_queues, "dedicated_queues"); +cmdline_parse_token_num_t cmd_setbonding_lacp_dedicated_queues_port_id = +TOKEN_NUM_INITIALIZER(struct cmd_set_bonding_lacp_dedicated_queues_result, + port_id, UINT8); +cmdline_parse_token_string_t cmd_setbonding_lacp_dedicated_queues_mode = +TOKEN_STRING_INITIALIZER(struct cmd_set_bonding_lacp_dedicated_queues_result, + mode, "enable#disable"); + +cmdline_parse_inst_t cmd_set_lacp_dedicated_queues = { + .f = cmd_set_bonding_lacp_dedicated_queues_parsed, + .help_str = "set bonding lacp dedicated_queues <port_id> " + "enable|disable: " + "Enable/disable dedicated queues for LACP control traffic for port_id", + .data = NULL, + .tokens = { + (void *)&cmd_setbonding_lacp_dedicated_queues_set, + (void *)&cmd_setbonding_lacp_dedicated_queues_bonding, + (void *)&cmd_setbonding_lacp_dedicated_queues_lacp, + (void *)&cmd_setbonding_lacp_dedicated_queues_dedicated_queues, + (void *)&cmd_setbonding_lacp_dedicated_queues_port_id, + (void *)&cmd_setbonding_lacp_dedicated_queues_mode, + NULL + } +}; + /* *** SET BALANCE XMIT POLICY *** */ struct cmd_set_bonding_balance_xmit_policy_result { cmdline_fixed_string_t set; @@ -13934,6 +14018,7 @@ cmdline_parse_ctx_t main_ctx[] = { (cmdline_parse_inst_t *) &cmd_set_bond_mac_addr, (cmdline_parse_inst_t *) &cmd_set_balance_xmit_policy, (cmdline_parse_inst_t *) &cmd_set_bond_mon_period, + (cmdline_parse_inst_t *) &cmd_set_lacp_dedicated_queues, #endif (cmdline_parse_inst_t *)&cmd_vlan_offload, (cmdline_parse_inst_t *)&cmd_vlan_tpid, diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst index b8f47fd..35d0b1f 100644 --- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst +++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst @@ -1766,6 +1766,15 @@ For example, to set the link status monitoring polling period of bonded device ( testpmd> set bonding mon_period 5 150 +set bonding lacp dedicated_queue +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Enable dedicated tx/rx queues on bonding devices slaves to handle LACP control plane traffic +when in mode 4 (link-aggregration-802.3ad) + + testpmd> set bonding lacp dedicated_queues (port_id) (enable|disable) + + show bonding config ~~~~~~~~~~~~~~~~~~~ -- 2.9.4 ^ permalink raw reply [flat|nested] 22+ messages in thread
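A plausible testpmd session using the new command, following the syntax documented above; the bonded port id (4 here) is illustrative, and the command is only accepted while the port is stopped:

	testpmd> port stop 4
	testpmd> set bonding lacp dedicated_queues 4 enable
	testpmd> port start 4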
* Re: [dpdk-dev] [PATCH v3 4/4] app/test-pmd: add cmd for dedicated LACP rx/tx queues 2017-07-04 16:46 ` [dpdk-dev] [PATCH v3 4/4] app/test-pmd: add cmd for dedicated LACP rx/tx queues Declan Doherty @ 2017-07-04 19:56 ` Declan Doherty 2017-07-05 11:33 ` Ferruh Yigit 1 sibling, 0 replies; 22+ messages in thread From: Declan Doherty @ 2017-07-04 19:56 UTC (permalink / raw) To: dev; +Cc: Tomasz Kulasek On 04/07/17 17:46, Declan Doherty wrote: > From: Tomasz Kulasek <tomaszx.kulasek@intel.com> > > Add new command to support enable/disable of dedicated tx/rx queue on > each slave of a bond device for LACP control plane traffic. > > set bonding lacp dedicated_queues <port_id> [enable|disable] > > When enabled this option creates dedicated queues on each slave device > for LACP control plane traffic. This removes the need to filter control > plane packets in the data path. > > Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com> > Signed-off-by: Declan Doherty <declan.doherty@intel.com> > --- ... > Acked-by: Declan Doherty <declan.doherty@intel.com> ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [dpdk-dev] [PATCH v3 4/4] app/test-pmd: add cmd for dedicated LACP rx/tx queues 2017-07-04 16:46 ` [dpdk-dev] [PATCH v3 4/4] app/test-pmd: add cmd for dedicated LACP rx/tx queues Declan Doherty 2017-07-04 19:56 ` Declan Doherty @ 2017-07-05 11:33 ` Ferruh Yigit 1 sibling, 0 replies; 22+ messages in thread From: Ferruh Yigit @ 2017-07-05 11:33 UTC (permalink / raw) To: Declan Doherty, dev; +Cc: Tomasz Kulasek On 7/4/2017 5:46 PM, Declan Doherty wrote: > From: Tomasz Kulasek <tomaszx.kulasek@intel.com> > > Add new command to support enable/disable of dedicated tx/rx queue on > each slave of a bond device for LACP control plane traffic. > > set bonding lacp dedicated_queues <port_id> [enable|disable] > > When enabled this option creates dedicated queues on each slave device > for LACP control plane traffic. This removes the need to filter control > plane packets in the data path. > > Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com> > Signed-off-by: Declan Doherty <declan.doherty@intel.com> <...> > --- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst > +++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst > @@ -1766,6 +1766,15 @@ For example, to set the link status monitoring polling period of bonded device ( > testpmd> set bonding mon_period 5 150 > > > +set bonding lacp dedicated_queue trailing white-space removed. > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <...> ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [dpdk-dev] [PATCH v3 0/4] LACP control packet filtering acceleration 2017-07-04 16:46 ` [dpdk-dev] [PATCH v3 0/4] LACP control packet filtering acceleration Declan Doherty ` (3 preceding siblings ...) 2017-07-04 16:46 ` [dpdk-dev] [PATCH v3 4/4] app/test-pmd: add cmd for dedicated LACP rx/tx queues Declan Doherty @ 2017-07-05 11:35 ` Ferruh Yigit 4 siblings, 0 replies; 22+ messages in thread From: Ferruh Yigit @ 2017-07-05 11:35 UTC (permalink / raw) To: Declan Doherty, dev On 7/4/2017 5:46 PM, Declan Doherty wrote: > 1. Overview > > Packet processing in the current path for bonding in mode 4, requires > parse all packets in the fast path, to classify and process LACP > packets. > > The idea of performance improvement is to use hardware offloads to > improve packet classification. > > 2. Scope of work > > a) Optimization of software LACP packet classification by using > packet_type metadata to eliminate the requirement of parsing each > packet in the received burst. > > b) Implementation of classification mechanism using flow director to > redirect LACP packets to the dedicated queue (not visible by > application). > > - Filter pattern choosing (not all filters are supported by all > devices), > - Changing processing path to speed up non-LACP packets > processing, > - Handle LACP packets from dedicated Rx queue and send to the > dedicated Tx queue, > > c) Creation of fallback mechanism allowing to select the most > preferable method of processing: > > - Flow director, > - Packet type metadata, > - Software parsing, > > 3. Implementation > > 3.1. Packet type > > The packet_type approach would result in a performance improvement > as packets data would no longer be required to be read, but with this > approach the bonded driver would still need to look at the mbuf of > each packet thereby having an impact on the achievable Rx > performance. > > There's not packet_type value describing LACP packets directly. > However, it can be used to limit number of packets required to be > parsed, e.g. if packet_type indicates >L2 packets. > > It should improve performance while well-known non-LACP packets can > be skipped without the need to look up into its data. > > 3.2. Flow director > > Using rte_flow API and pattern on ethernet type of packet (0x8809), > we can configure flow director to redirect slow packets to separated > queue. > > An independent Rx queues for LACP would remove the requirement to > filter all ingress traffic in sw which should result in a performance > increase. Other queues stay untouched and processing of packets on > the fast path will be reduced to simple packet collecting from > slaves. > > Separated Tx queue for LACP daemon allows to send LACP responses > immediately, without interfering into Tx fast path. > > RECEIVE > > .---------------. > | Slave 0 | > | .------. | > | Fd | Rxq | | > Rx ======o==>| |==============. > | | +======+ | | .---------------. > | `-->| LACP |--------. | | Bonding | > | `------' | | | | .------. | > `---------------' | | | | | | > | >============>| |=======> Rx > .---------------. | | | +======+ | > | Slave 1 | | | | | XXXX | | > | .------. | | | | `------' | > | Fd | Rxq | | | | `---------------' > Rx ======o==>| |==============' .-----------. > | | +======+ | | / \ > | `-->| LACP |--------+----------->+ LACP DAEMON | > | `------' | Tx <---\ / > `---------------' `-----------' > > All slow packets received by slaves in bonding are redirected to the > separated queue using flow director. 
Other packets are collected from > slaves and exposed to the application with Rx burst on bonded device. > > TRANSMIT > > .---------------. > | Slave 0 | > | .------. | > | | | | > Tx <=====+===| |<=============. > | | |------| | | .---------------. > | `---| LACP |<-------. | | Bonding | > | `------' | | | | .------. | > `---------------' | | | | | | > | +<============| |<====== Tx > .---------------. | | | +======+ | > | Slave 1 | | | | | XXXX | | > | .------. | | | | `------' | > | | | | | | `---------------' > Tx <=====+===| |<=============' Rx .-----------. > | | |------| | | `-->/ \ > | `---| LACP |<-------+------------+ LACP DAEMON | > | `------' | \ / > `---------------' `-----------' > > On transmit, packets are propagated on the slaves. While we have > separated Tx queue for LACP responses, it can be sent regardless of > the fast path. > > LACP DAEMON > > In this mode whole slow packets are handled in LACP DAEMON. > > V3: > - Split hw filtering patch into 3 patches: > - fix for calculating maximum number of tx/rx queues of bonding device > - enable use of ptype hint for filtering of control plane packets in > default enablement > - enablement of dedicated queues for LACP control packet filtering. > > Declan Doherty (1): > net/bond: calculate number of bonding tx/rx queues > > Tomasz Kulasek (3): > net/bond: use ptype flags for LACP rx filtering > net/bond: dedicated hw queues for LACP control traffic > app/test-pmd: add cmd for dedicated LACP rx/tx queues Series applied to dpdk-next-net/master, thanks. (minor updates, mentioned in mail thread, done while applying.) ^ permalink raw reply [flat|nested] 22+ messages in thread
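For the "use ptype flags for LACP rx filtering" step listed in the series above, the helper below is an illustrative sketch (not the PMD's exact code) of how packet_type can rule frames out without touching their data: anything the NIC already classified at L3 cannot carry the slow-protocols Ethertype, so only unclassified frames need their header read.

#include <rte_mbuf.h>
#include <rte_ether.h>
#include <rte_byteorder.h>

static inline int
may_be_lacp(struct rte_mbuf *m)
{
	const struct ether_hdr *eth;

	/* NIC reported an L3 type (IPv4/IPv6/...): cannot be Ethertype 0x8809. */
	if ((m->packet_type & RTE_PTYPE_L3_MASK) != RTE_PTYPE_UNKNOWN)
		return 0;

	/* Fall back to reading the Ethertype for unclassified frames. */
	eth = rte_pktmbuf_mtod(m, const struct ether_hdr *);
	return eth->ether_type == rte_cpu_to_be_16(0x8809);
}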
end of thread, other threads:[~2017-12-13 12:41 UTC | newest] Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-05-27 11:27 [dpdk-dev] [PATCH 0/2] LACP control packet filtering offload Tomasz Kulasek 2017-05-27 11:27 ` [dpdk-dev] [PATCH 1/2] " Tomasz Kulasek 2017-05-29 8:10 ` Adrien Mazarguil 2017-06-29 9:18 ` Declan Doherty 2017-05-27 11:27 ` [dpdk-dev] [PATCH 2/2] test-pmd: add set bonding slow_queue hw/sw Tomasz Kulasek 2017-06-29 16:20 ` [dpdk-dev] [PATCH v2 0/2] LACP control packet filtering offload Tomasz Kulasek 2017-06-29 16:20 ` [dpdk-dev] [PATCH v2 1/2] " Tomasz Kulasek 2017-06-29 16:20 ` [dpdk-dev] [PATCH v2 2/2] test-pmd: add set bonding slow_queue hw/sw Tomasz Kulasek 2017-07-04 16:46 ` [dpdk-dev] [PATCH v3 0/4] LACP control packet filtering acceleration Declan Doherty 2017-07-04 16:46 ` [dpdk-dev] [PATCH v3 1/4] net/bond: calculate number of bonding tx/rx queues Declan Doherty 2017-07-04 16:46 ` [dpdk-dev] [PATCH v3 2/4] net/bond: use ptype flags for LACP rx filtering Declan Doherty 2017-07-04 19:54 ` Declan Doherty 2017-07-04 16:46 ` [dpdk-dev] [PATCH v3 3/4] net/bond: dedicated hw queues for LACP control traffic Declan Doherty 2017-07-04 19:55 ` Declan Doherty 2017-07-05 11:19 ` Ferruh Yigit 2017-07-05 11:33 ` Ferruh Yigit 2017-12-13 8:16 ` linhaifeng 2017-12-13 12:41 ` Kulasek, TomaszX 2017-07-04 16:46 ` [dpdk-dev] [PATCH v3 4/4] app/test-pmd: add cmd for dedicated LACP rx/tx queues Declan Doherty 2017-07-04 19:56 ` Declan Doherty 2017-07-05 11:33 ` Ferruh Yigit 2017-07-05 11:35 ` [dpdk-dev] [PATCH v3 0/4] LACP control packet filtering acceleration Ferruh Yigit