* Re: [PATCH v3] netdev-dpdk: add control plane protection support [not found] <20221021145308.141933-1-rjarry@redhat.com> @ 2022-11-11 18:15 ` Kevin Traynor 2022-11-14 9:41 ` Kevin Traynor 2022-11-14 12:46 ` Robin Jarry 0 siblings, 2 replies; 3+ messages in thread From: Kevin Traynor @ 2022-11-11 18:15 UTC (permalink / raw) To: Robin Jarry, dev; +Cc: Christophe Fontaine Hi Robin, On 21/10/2022 15:53, Robin Jarry wrote: > Some control protocols are used to maintain link status between > forwarding engines (e.g. LACP). When the system is not sized properly, > the PMD threads may not be able to process all incoming traffic from the > configured Rx queues. When a signaling packet of such protocols is > dropped, it can cause link flapping, worsening the situation. > > Use the RTE flow API to redirect these protocols into a dedicated Rx > queue. The assumption is made that the ratio between control protocol > traffic and user data traffic is very low and thus this dedicated Rx > queue will never get full. The RSS redirection table is re-programmed to > only use the other Rx queues. The RSS table size is stored in the > netdev_dpdk structure at port initialization to avoid requesting the > information again when changing the port configuration. > > The additional Rx queue will be assigned a PMD core like any other Rx > queue. Polling that extra queue may introduce increased latency and > a slight performance penalty at the benefit of preventing link flapping. > > This feature must be enabled per port on specific protocols via the > cp-protection option. This option takes a coma-separated list of > protocol names. It is only supported on ethernet ports. > > If the user has already configured multiple Rx queues on the port, an > additional one will be allocated for control plane packets. If the > hardware cannot satisfy the requested number of requested Rx queues, the > last Rx queue will be assigned for control plane. If only one Rx queue > is available, the cp-protection feature will be disabled. If the > hardware does not support the RTE flow matchers/actions, the feature > will be disabled. > > It cannot be enabled when other_config:hw-offload=true as it may > conflict with the offloaded RTE flows. Similarly, if hw-offload is > enabled while some ports already have cp-protection enabled, the RTE > flow offloading will be disabled on these ports. > > Example use: > > ovs-vsctl add-bond br-phy bond0 phy0 phy1 -- \ > set interface phy0 type=dpdk options:dpdk-devargs=0000:ca:00.0 -- \ > set interface phy0 options:cp-protection=lacp -- \ > set interface phy1 type=dpdk options:dpdk-devargs=0000:ca:00.1 -- \ > set interface phy1 options:cp-protection=lacp > > As a starting point, only one protocol is supported: LACP. Other > protocols can be added in the future. NIC compatibility should be > checked. > > To validate that this works as intended, I used a traffic generator to > generate random traffic slightly above the machine capacity at line rate > on a two ports bond interface. OVS is configured to receive traffic on > two VLANs and pop/push them in a br-int bridge based on tags set on > patch ports. 
> > +----------------------+ > | DUT | > |+--------------------+| > || br-int || default flow, action=NORMAL > || || > || patch10 patch11 || > |+---|-----------|----+| > | | | | > |+---|-----------|----+| > || patch00 patch01 || > || tag:10 tag:20 || > || || > || br-phy || default flow, action=NORMAL > || || > || bond0 || balance-slb, lacp=passive, lacp-time=fast > || phy0 phy1 || > |+------|-----|-------+| > +-------|-----|--------+ > | | > +-------|-----|--------+ > | port0 port1 | balance L3/L4, lacp=active, lacp-time=fast > | lag | mode trunk VLANs 10, 20 > | | > | switch | > | | > | vlan 10 vlan 20 | mode access > | port2 port3 | > +-----|----------|-----+ > | | > +-----|----------|-----+ > | port0 port1 | Random traffic that is properly balanced > | | across the bond ports in both directions. > | traffic generator | > +----------------------+ > > Without cp-protection, the bond0 links are randomly switching to > "defaulted" when one of the LACP packets sent by the switch is dropped > because the RX queues are full and the PMD threads did not process them > fast enough. When that happens, all traffic must go through a single > link which causes above line rate traffic to be dropped. > > When cp-protection is enabled, no LACP packet is dropped and the bond > links remain enabled at all times, maximizing the throughput. > > This feature may be considered as "QoS". However, it does not work by > limiting the rate of traffic explicitly. It only guarantees that some > protocols have a lower chance of being dropped because the PMD cores > cannot keep up with regular traffic. > > The choice of protocols is limited on purpose. This is not meant to be > configurable by users. Some limited configurability could be considered > in the future but it would expose to more potential issues if users are > accidentally redirecting all traffic in the control plane queue. > Not a full review, but rather send now with issue I saw on CX-5 that we discussed as it'll impact testing. > Cc: Christophe Fontaine <cfontain@redhat.com> > Cc: Kevin Traynor <ktraynor@redhat.com> > Signed-off-by: Robin Jarry <rjarry@redhat.com> > --- > v2 -> v3: > > * Added dry_run validation that rte_flows are all supported by the NIC > before configuring anything. > * Added check to make cp-protection and hw-offload mutually exclusive. > * Removed the "match-all" RSS flow that dealt with redirecting all > non-control-plane traffic to all but the control-plane Rx queue. Very > few NICs actually support "match-all" flows without any mask. This was > replaced by reconfiguring the RSS redirection table. The > * Made sure to unconfigure everything and remove the extra Rx queue in > the case the hardware does not support one of the RTE flows. > * Updated vswitchd/vswitch.xml > * Added diagnostics info in netdev_dpdk_get_status > * Tested under load on the following NICs: > - Intel E810 (2x 25G) > - Mellanox ConnectX-5 (2x 25G) > * Basic functionality tested on the following NICs: > - Intel 82599ES (2x 10G) > - Intel X710 (4x 10G) > - Mellanox ConnectX-4 (2x 25G) > > Documentation/topics/dpdk/phy.rst | 55 ++++++ > lib/netdev-dpdk.c | 293 +++++++++++++++++++++++++++++- > vswitchd/vswitch.xml | 26 +++ > 3 files changed, 373 insertions(+), 1 deletion(-) > > diff --git a/Documentation/topics/dpdk/phy.rst b/Documentation/topics/dpdk/phy.rst > index 937f4c40e5a8..86e69d79b104 100644 > --- a/Documentation/topics/dpdk/phy.rst > +++ b/Documentation/topics/dpdk/phy.rst > @@ -131,6 +131,61 @@ possible with DPDK acceleration. 
It is possible to configure multiple Rx queues > for ``dpdk`` ports, thus ensuring this is not a bottleneck for performance. For > information on configuring PMD threads, refer to :doc:`pmd`. > > +Control Plane Protection > +------------------------ > + > +Some control protocols are used to maintain link status between forwarding > +engines. In SDN environments, these packets share the same physical network > +than the user data traffic. > + > +When the system is not sized properly, the PMD threads may not be able to > +process all incoming traffic from the configured Rx queues. When a signaling > +packet of such protocols is dropped, it can cause link flapping, worsening the > +situation. > + > +Some physical NICs can be programmed to put these protocols in a dedicated > +hardware Rx queue using the rte_flow__ API. > + > +__ https://doc.dpdk.org/guides-21.11/prog_guide/rte_flow.html#device-compatibility > + > +The currently supported control plane protocols are: > + > +``lacp`` > + `Link Aggregation Control Protocol`__. Ether type ``0x8809``. > + > + __ https://www.ieee802.org/3/ad/public/mar99/seaman_1_0399.pdf > + > +.. warning:: > + > + This feature is not compatible with all NICs. Refer to vendor documentation > + for more information. > + > +Control plane protection must be enabled on specific protocols per port. The > +``cp-protection`` option requires a coma separated list of protocol names:: > + > + $ ovs-vsctl add-port br0 dpdk-p0 -- set Interface dpdk-p0 type=dpdk \ > + options:dpdk-devargs=0000:01:00.0 options:cp-protection=lacp > + > +.. note:: > + > + If multiple Rx queues are already configured, regular RSS (Receive Side > + Scaling) queue balancing is done on all but the extra control plane > + protection queue. > + > +.. tip:: > + > + You can check if control plane protection is supported on a port with the > + following command:: > + > + $ ovs-vsctl get interface dpdk-p0 status > + {cp_protection_queue="2", driver_name=..., rss_queues="0-1"} > + > + If the hardware does not support redirecting control plane traffic to > + a dedicated queue, it will be explicit:: > + > + $ ovs-vsctl get interface dpdk-p0 status > + {cp_protection_queue=unsupported, driver_name=..., rss_queues="0-1"} > + > .. _dpdk-phy-flow-control: > > Flow Control > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c > index 0dd655507b50..94f04437a641 100644 > --- a/lib/netdev-dpdk.c > +++ b/lib/netdev-dpdk.c > @@ -410,6 +410,11 @@ enum dpdk_hw_ol_features { > NETDEV_TX_SCTP_CHECKSUM_OFFLOAD = 1 << 4, > }; > > +enum dpdk_cp_prot_flags { > + DPDK_CP_PROT_UNSUPPORTED = 1 << 0, > + DPDK_CP_PROT_LACP = 1 << 1, > +}; > + > /* > * In order to avoid confusion in variables names, following naming convention > * should be used, if possible: > @@ -453,6 +458,7 @@ struct netdev_dpdk { > }; > struct dpdk_tx_queue *tx_q; > struct rte_eth_link link; > + uint16_t reta_size; > ); > > PADDED_MEMBERS_CACHELINE_MARKER(CACHE_LINE_SIZE, cacheline1, > @@ -529,6 +535,13 @@ struct netdev_dpdk { > > /* VF configuration. 
*/ > struct eth_addr requested_hwaddr; > + > + /* Requested control plane protection flags, > + * from the enum set 'dpdk_cp_prot_flags' */ > + uint64_t requested_cp_prot_flags; > + uint64_t cp_prot_flags; > + size_t cp_prot_flows_num; > + struct rte_flow **cp_prot_flows; > ); > > PADDED_MEMBERS(CACHE_LINE_SIZE, > @@ -1192,6 +1205,7 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev) > netdev_get_name(&dev->up)); > } > } > + dev->reta_size = info.reta_size; > > n_rxq = MIN(info.max_rx_queues, dev->up.n_rxq); > n_txq = MIN(info.max_tx_queues, dev->up.n_txq); > @@ -1309,6 +1323,10 @@ common_construct(struct netdev *netdev, dpdk_port_t port_no, > dev->requested_n_txq = NR_QUEUE; > dev->requested_rxq_size = NIC_PORT_DEFAULT_RXQ_SIZE; > dev->requested_txq_size = NIC_PORT_DEFAULT_TXQ_SIZE; > + dev->requested_cp_prot_flags = 0; > + dev->cp_prot_flags = 0; > + dev->cp_prot_flows_num = 0; > + dev->cp_prot_flows = NULL; > > /* Initialize the flow control to NULL */ > memset(&dev->fc_conf, 0, sizeof dev->fc_conf); > @@ -1904,6 +1922,9 @@ dpdk_set_rxq_config(struct netdev_dpdk *dev, const struct smap *args) > int new_n_rxq; > > new_n_rxq = MAX(smap_get_int(args, "n_rxq", NR_QUEUE), 1); > + if (dev->requested_cp_prot_flags) { > + new_n_rxq += 1; > + } > if (new_n_rxq != dev->requested_n_rxq) { > dev->requested_n_rxq = new_n_rxq; > netdev_request_reconfigure(&dev->up); > @@ -1927,6 +1948,53 @@ dpdk_process_queue_size(struct netdev *netdev, const struct smap *args, > } > } > > +static int > +dpdk_cp_prot_set_config(struct netdev *netdev, struct netdev_dpdk *dev, > + const struct smap *args, char **errp) > +{ > + const char *arg = smap_get_def(args, "cp-protection", ""); > + uint64_t flags = 0; > + char buf[256]; > + char *token, *saveptr; > + > + ovs_strzcpy(buf, arg, sizeof(buf)); > + buf[sizeof(buf) - 1] = '\0'; > + > + token = strtok_r(buf, ",", &saveptr); > + while (token) { > + if (strcmp(token, "lacp") == 0) { > + flags |= DPDK_CP_PROT_LACP; > + } else { > + VLOG_WARN_BUF( > + errp, "%s options:cp-protection unknown protocol '%s'", > + netdev_get_name(netdev), token); > + return -1; > + } > + token = strtok_r(NULL, ",", &saveptr); > + } > + > + if (flags && dev->type != DPDK_DEV_ETH) { > + VLOG_WARN_BUF( errp, > + "%s options:cp-protection is only supported on ethernet ports", > + netdev_get_name(netdev)); > + return -1; > + } > + > + if (flags && netdev_is_flow_api_enabled()) { > + VLOG_WARN_BUF(errp, > + "%s options:cp-protection is incompatible with hw-offload", > + netdev_get_name(netdev)); > + return -1; > + } > + > + if (flags != dev->requested_cp_prot_flags) { > + dev->requested_cp_prot_flags = flags; > + netdev_request_reconfigure(netdev); > + } > + > + return 0; > +} > + > static int > netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args, > char **errp) > @@ -1946,6 +2014,11 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args, > ovs_mutex_lock(&dpdk_mutex); > ovs_mutex_lock(&dev->mutex); > > + if (dpdk_cp_prot_set_config(netdev, dev, args, errp) < 0) { > + err = EINVAL; > + goto out; > + } > + > dpdk_set_rxq_config(dev, args); > > dpdk_process_queue_size(netdev, args, "n_rxq_desc", > @@ -3639,8 +3712,10 @@ netdev_dpdk_get_status(const struct netdev *netdev, struct smap *args) > { > struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); > struct rte_eth_dev_info dev_info; > + uint64_t cp_prot_flags; > uint32_t link_speed; > uint32_t dev_flags; > + int n_rxq; > > if (!rte_eth_dev_is_valid_port(dev->port_id)) { > return ENODEV; > @@ -3651,6 +3726,8 @@ 
netdev_dpdk_get_status(const struct netdev *netdev, struct smap *args) > rte_eth_dev_info_get(dev->port_id, &dev_info); > link_speed = dev->link.link_speed; > dev_flags = *dev_info.dev_flags; > + cp_prot_flags = dev->cp_prot_flags; > + n_rxq = netdev->n_rxq; > ovs_mutex_unlock(&dev->mutex); > const struct rte_bus *bus; > const struct rte_pci_device *pci_dev; > @@ -3703,6 +3780,24 @@ netdev_dpdk_get_status(const struct netdev *netdev, struct smap *args) > ETH_ADDR_ARGS(dev->hwaddr)); > } > > + if (cp_prot_flags) { > + if (cp_prot_flags & DPDK_CP_PROT_UNSUPPORTED) { > + smap_add(args, "cp_protection_queue", "unsupported"); > + if (n_rxq > 1) { > + smap_add_format(args, "rss_queues", "0-%d", n_rxq - 1); > + } else { > + smap_add(args, "rss_queues", "0"); > + } > + } else { > + smap_add_format(args, "cp_protection_queue", "%d", n_rxq - 1); > + if (n_rxq > 2) { > + smap_add_format(args, "rss_queues", "0-%d", n_rxq - 2); > + } else { > + smap_add(args, "rss_queues", "0"); > + } > + } > + } > + > return 0; > } > > @@ -4933,6 +5028,179 @@ static const struct dpdk_qos_ops trtcm_policer_ops = { > .qos_queue_dump_state_init = trtcm_policer_qos_queue_dump_state_init > }; > > +static int > +dpdk_cp_prot_add_flow(struct netdev_dpdk *dev, > + const struct rte_flow_attr *attr, > + const struct rte_flow_item items[], > + const struct rte_flow_action actions[], > + const char *desc, bool dry_run) > +{ > + struct rte_flow_error error; > + struct rte_flow *flow; > + size_t num; > + > + if (dry_run) { > + int ret; > + ret = rte_flow_validate(dev->port_id, attr, items, actions, &error); > + if (rte_flow_validate(dev->port_id, attr, items, actions, &error)) { 'if (ret)' > + VLOG_WARN("%s: cp-protection: device does not support %s flow: %s", > + netdev_get_name(&dev->up), desc, error.message); > + } > + return ret; > + } > + > + flow = rte_flow_create(dev->port_id, attr, items, actions, &error); > + if (flow == NULL) { > + VLOG_WARN("%s: cp-protection: failed to add %s flow: %s", > + netdev_get_name(&dev->up), desc, error.message); > + return rte_errno; > + } > + > + num = dev->cp_prot_flows_num + 1; > + dev->cp_prot_flows = xrealloc(dev->cp_prot_flows, sizeof(flow) * num); > + dev->cp_prot_flows[dev->cp_prot_flows_num] = flow; > + dev->cp_prot_flows_num = num; > + > + return 0; > +} > + > +static int > +dpdk_cp_prot_add_traffic_flow(struct netdev_dpdk *dev, > + const struct rte_flow_item items[], > + const char *desc, bool dry_run) > +{ > + const struct rte_flow_attr attr = { .ingress = 1 }; > + const struct rte_flow_action actions[] = { > + { > + .type = RTE_FLOW_ACTION_TYPE_QUEUE, > + .conf = &(const struct rte_flow_action_queue) { > + .index = dev->up.n_rxq - 1, > + }, > + }, > + { .type = RTE_FLOW_ACTION_TYPE_END }, > + }; > + > + if (!dry_run) { > + VLOG_INFO("%s: cp-protection: redirecting %s traffic to queue %d", > + netdev_get_name(&dev->up), desc, dev->up.n_rxq - 1); > + } > + return dpdk_cp_prot_add_flow(dev, &attr, items, actions, desc, dry_run); > +} > + > +static int > +dpdk_cp_prot_rss_configure(struct netdev_dpdk *dev, int rss_n_rxq) > +{ > + struct rte_eth_rss_reta_entry64 *reta_conf; > + size_t reta_conf_size; > + int err; > + > + if (rss_n_rxq == 1) { > + VLOG_INFO("%s: cp-protection: redirecting other traffic to queue 0", > + netdev_get_name(&dev->up)); > + } else { > + VLOG_INFO("%s: cp-protection: applying rss on queues 0-%d", > + netdev_get_name(&dev->up), rss_n_rxq - 1); > + } > + > + reta_conf_size = (dev->reta_size / RTE_ETH_RETA_GROUP_SIZE) > + * sizeof(struct rte_eth_rss_reta_entry64); 
In dpdk_eth_dev_init, we get reta_size from driver, mlx5_ethdev.c 333├> info->reta_size = priv->reta_idx_n ? 334│ priv->reta_idx_n : config->ind_table_max_size; (gdb) p priv->reta_idx_n $5 = 1 (gdb) p config->ind_table_max_size $6 = 512 and store: dev->reta_size = info.reta_size; Now we use it, dev->reta_size = 1 / RTE_ETH_RETA_GROUP_SIZE (64) but it results in reta_conf_size = 0 > + reta_conf = xmalloc(reta_conf_size); xmalloc only allocates 1 byte (void *p = malloc(size ? size : 1);) > + memset(reta_conf, 0, reta_conf_size); > + > + for (uint16_t i = 0; i < dev->reta_size; i++) { > + uint16_t idx = i / RTE_ETH_RETA_GROUP_SIZE; > + uint16_t shift = i % RTE_ETH_RETA_GROUP_SIZE; > + reta_conf[idx].mask |= 1ULL << shift; > + reta_conf[idx].reta[shift] = i % rss_n_rxq; > + } > + err = rte_eth_dev_rss_reta_update(dev->port_id, reta_conf, dev->reta_size); > + if (err < 0) { > + VLOG_DBG("%s: failed to configure RSS redirection table: err=%d", > + netdev_get_name(&dev->up), err); > + } > + > + free(reta_conf); > + > + return err; > +} > + > +static int > +dpdk_cp_prot_configure(struct netdev_dpdk *dev, bool dry_run) > +{ > + int err = 0; > + > + if (dev->requested_cp_prot_flags & DPDK_CP_PROT_UNSUPPORTED) { > + goto out; > + } > + if (dev->up.n_rxq < 2) { > + err = ENOTSUP; > + VLOG_DBG("%s: cp-protection: not enough available rx queues", > + netdev_get_name(&dev->up)); > + goto out; > + } > + > + if (dev->requested_cp_prot_flags & DPDK_CP_PROT_LACP) { > + err = dpdk_cp_prot_add_traffic_flow( > + dev, > + (const struct rte_flow_item []) { > + { > + .type = RTE_FLOW_ITEM_TYPE_ETH, > + .spec = &(const struct rte_flow_item_eth){ > + .type = htons(ETH_TYPE_LACP), > + }, > + .mask = &(const struct rte_flow_item_eth){ > + .type = htons(0xffff), > + }, > + }, > + { .type = RTE_FLOW_ITEM_TYPE_END }, > + }, > + "lacp", > + dry_run > + ); > + if (err) { > + goto out; > + } > + } > + > + if (!dry_run && dev->cp_prot_flows_num) { > + /* reconfigure RSS reta in all but the cp protection queue */ > + err = dpdk_cp_prot_rss_configure(dev, dev->up.n_rxq - 1); > + } > + > +out: > + if (!dry_run) { > + dev->cp_prot_flags = dev->requested_cp_prot_flags; > + } > + if (err) { > + dev->requested_cp_prot_flags |= DPDK_CP_PROT_UNSUPPORTED; > + } > + return err; > +} > + > +static void > +dpdk_cp_prot_unconfigure(struct netdev_dpdk *dev) > +{ > + struct rte_flow_error error; > + > + if (dev->cp_prot_flows_num == 0) { > + return; > + } > + > + VLOG_DBG("%s: cp-protection: reset flows", netdev_get_name(&dev->up)); > + > + for (int i = 0; i < dev->cp_prot_flows_num; i++) { > + if (rte_flow_destroy(dev->port_id, dev->cp_prot_flows[i], &error)) { > + VLOG_DBG("%s: cp-protection: failed to destroy flow: %s", > + netdev_get_name(&dev->up), error.message); > + } > + } > + free(dev->cp_prot_flows); > + dev->cp_prot_flows_num = 0; > + dev->cp_prot_flows = NULL; > + > + (void) dpdk_cp_prot_rss_configure(dev, dev->up.n_rxq); > +} > + > static int > netdev_dpdk_reconfigure(struct netdev *netdev) > { > @@ -4943,6 +5211,7 @@ netdev_dpdk_reconfigure(struct netdev *netdev) > > if (netdev->n_txq == dev->requested_n_txq > && netdev->n_rxq == dev->requested_n_rxq > + && dev->cp_prot_flags == dev->requested_cp_prot_flags > && dev->mtu == dev->requested_mtu > && dev->lsc_interrupt_mode == dev->requested_lsc_interrupt_mode > && dev->rxq_size == dev->requested_rxq_size > @@ -4987,6 +5256,8 @@ netdev_dpdk_reconfigure(struct netdev *netdev) > } > } > > + dpdk_cp_prot_unconfigure(dev); > + > err = dpdk_eth_dev_init(dev); > if (dev->hw_ol_features 
& NETDEV_TX_TSO_OFFLOAD) { > netdev->ol_flags |= NETDEV_TX_OFFLOAD_TCP_TSO; > @@ -5014,6 +5285,20 @@ netdev_dpdk_reconfigure(struct netdev *netdev) > if (!dev->tx_q) { > err = ENOMEM; > } > + if (!err && dev->requested_cp_prot_flags) { > + /* dry run first */ > + err = dpdk_cp_prot_configure(dev, true); > + if (!err) { > + /* if no error, apply configuration */ > + err = dpdk_cp_prot_configure(dev, false); > + } > + if (err) { > + /* no hw support, remove the extra queue & recover gracefully */ > + err = 0; > + dev->requested_n_rxq -= 1; > + netdev_request_reconfigure(netdev); > + } > + } > > netdev_change_seq_changed(netdev); > > @@ -5215,7 +5500,13 @@ netdev_dpdk_flow_api_supported(struct netdev *netdev) > ovs_mutex_lock(&dev->mutex); > if (dev->type == DPDK_DEV_ETH) { > /* TODO: Check if we able to offload some minimal flow. */ > - ret = true; > + if (dev->requested_cp_prot_flags || dev->cp_prot_flags) { > + VLOG_WARN( > + "%s: hw-offload is mutually exclusive with cp-protection", > + netdev_get_name(netdev)); > + } else { > + ret = true; > + } > } > ovs_mutex_unlock(&dev->mutex); > out: > diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml > index 36388e3c42d7..7e6ae3df7583 100644 > --- a/vswitchd/vswitch.xml > +++ b/vswitchd/vswitch.xml > @@ -3430,6 +3430,32 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \ > <p>This option may only be used with dpdk VF representors.</p> > </column> > > + <column name="options" key="cp-protection" > + type='{"type": "string", "enum": ["set", ["lacp"]]}'> > + <p> > + Allocate an extra Rx queue for control plane packets of the specified > + protocol(s). > + </p> > + <p> > + If the user has already configured multiple > + <code>options:n_rxq</code> on the port, an additional one will be > + allocated for control plane packets. If the hardware cannot satisfy > + the requested number of requested Rx queues, the last Rx queue will > + be assigned for control plane. If only one Rx queue is available or > + if the hardware does not support the RTE flow matchers/actions > + required to redirect the selected protocols, > + <code>cp-protection</code> will be disabled. > + </p> > + <p> > + This feature is multually exclusive with > + <code>other_options:hw-offload</code> as it may conflict with the > + offloaded RTE flows. > + </p> > + <p> > + Disabled by default. > + </p> > + </column> > + > <column name="other_config" key="tx-steering" > type='{"type": "string", > "enum": ["set", ["thread", "hash"]]}'> ^ permalink raw reply [flat|nested] 3+ messages in thread
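As a reference for the 'if (ret)' remark above, a minimal sketch of the dry-run branch of dpdk_cp_prot_add_flow() with the duplicated rte_flow_validate() call removed; this is only an illustration against the posted patch context, not code from any later revision:

    if (dry_run) {
        int ret;

        /* Validate once and reuse the result; the second call in the
         * posted patch was redundant. */
        ret = rte_flow_validate(dev->port_id, attr, items, actions, &error);
        if (ret) {
            VLOG_WARN("%s: cp-protection: device does not support %s flow: %s",
                      netdev_get_name(&dev->up), desc, error.message);
        }
        return ret;
    }

The caller still sees a single validation result and decides on that basis whether to go on to rte_flow_create().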
* Re: [PATCH v3] netdev-dpdk: add control plane protection support 2022-11-11 18:15 ` [PATCH v3] netdev-dpdk: add control plane protection support Kevin Traynor @ 2022-11-14 9:41 ` Kevin Traynor 2022-11-14 12:46 ` Robin Jarry 1 sibling, 0 replies; 3+ messages in thread From: Kevin Traynor @ 2022-11-14 9:41 UTC (permalink / raw) To: Robin Jarry, dev, dev; +Cc: Christophe Fontaine, Ori Kam Fixing To: to add back in OVS ML. +cc Ori, as mlx5 operation discussed below. On 11/11/2022 18:15, Kevin Traynor wrote: > Hi Robin, > > On 21/10/2022 15:53, Robin Jarry wrote: >> Some control protocols are used to maintain link status between >> forwarding engines (e.g. LACP). When the system is not sized properly, >> the PMD threads may not be able to process all incoming traffic from the >> configured Rx queues. When a signaling packet of such protocols is >> dropped, it can cause link flapping, worsening the situation. >> >> Use the RTE flow API to redirect these protocols into a dedicated Rx >> queue. The assumption is made that the ratio between control protocol >> traffic and user data traffic is very low and thus this dedicated Rx >> queue will never get full. The RSS redirection table is re-programmed to >> only use the other Rx queues. The RSS table size is stored in the >> netdev_dpdk structure at port initialization to avoid requesting the >> information again when changing the port configuration. >> >> The additional Rx queue will be assigned a PMD core like any other Rx >> queue. Polling that extra queue may introduce increased latency and >> a slight performance penalty at the benefit of preventing link flapping. >> >> This feature must be enabled per port on specific protocols via the >> cp-protection option. This option takes a coma-separated list of >> protocol names. It is only supported on ethernet ports. >> >> If the user has already configured multiple Rx queues on the port, an >> additional one will be allocated for control plane packets. If the >> hardware cannot satisfy the requested number of requested Rx queues, the >> last Rx queue will be assigned for control plane. If only one Rx queue >> is available, the cp-protection feature will be disabled. If the >> hardware does not support the RTE flow matchers/actions, the feature >> will be disabled. >> >> It cannot be enabled when other_config:hw-offload=true as it may >> conflict with the offloaded RTE flows. Similarly, if hw-offload is >> enabled while some ports already have cp-protection enabled, the RTE >> flow offloading will be disabled on these ports. >> >> Example use: >> >> ovs-vsctl add-bond br-phy bond0 phy0 phy1 -- \ >> set interface phy0 type=dpdk options:dpdk-devargs=0000:ca:00.0 -- \ >> set interface phy0 options:cp-protection=lacp -- \ >> set interface phy1 type=dpdk options:dpdk-devargs=0000:ca:00.1 -- \ >> set interface phy1 options:cp-protection=lacp >> >> As a starting point, only one protocol is supported: LACP. Other >> protocols can be added in the future. NIC compatibility should be >> checked. >> >> To validate that this works as intended, I used a traffic generator to >> generate random traffic slightly above the machine capacity at line rate >> on a two ports bond interface. OVS is configured to receive traffic on >> two VLANs and pop/push them in a br-int bridge based on tags set on >> patch ports. 
>> >> +----------------------+ >> | DUT | >> |+--------------------+| >> || br-int || default flow, action=NORMAL >> || || >> || patch10 patch11 || >> |+---|-----------|----+| >> | | | | >> |+---|-----------|----+| >> || patch00 patch01 || >> || tag:10 tag:20 || >> || || >> || br-phy || default flow, action=NORMAL >> || || >> || bond0 || balance-slb, lacp=passive, lacp-time=fast >> || phy0 phy1 || >> |+------|-----|-------+| >> +-------|-----|--------+ >> | | >> +-------|-----|--------+ >> | port0 port1 | balance L3/L4, lacp=active, lacp-time=fast >> | lag | mode trunk VLANs 10, 20 >> | | >> | switch | >> | | >> | vlan 10 vlan 20 | mode access >> | port2 port3 | >> +-----|----------|-----+ >> | | >> +-----|----------|-----+ >> | port0 port1 | Random traffic that is properly balanced >> | | across the bond ports in both directions. >> | traffic generator | >> +----------------------+ >> >> Without cp-protection, the bond0 links are randomly switching to >> "defaulted" when one of the LACP packets sent by the switch is dropped >> because the RX queues are full and the PMD threads did not process them >> fast enough. When that happens, all traffic must go through a single >> link which causes above line rate traffic to be dropped. >> >> When cp-protection is enabled, no LACP packet is dropped and the bond >> links remain enabled at all times, maximizing the throughput. >> >> This feature may be considered as "QoS". However, it does not work by >> limiting the rate of traffic explicitly. It only guarantees that some >> protocols have a lower chance of being dropped because the PMD cores >> cannot keep up with regular traffic. >> >> The choice of protocols is limited on purpose. This is not meant to be >> configurable by users. Some limited configurability could be considered >> in the future but it would expose to more potential issues if users are >> accidentally redirecting all traffic in the control plane queue. >> > > Not a full review, but rather send now with issue I saw on CX-5 that we > discussed as it'll impact testing. > >> Cc: Christophe Fontaine <cfontain@redhat.com> >> Cc: Kevin Traynor <ktraynor@redhat.com> >> Signed-off-by: Robin Jarry <rjarry@redhat.com> >> --- >> v2 -> v3: >> >> * Added dry_run validation that rte_flows are all supported by the NIC >> before configuring anything. >> * Added check to make cp-protection and hw-offload mutually exclusive. >> * Removed the "match-all" RSS flow that dealt with redirecting all >> non-control-plane traffic to all but the control-plane Rx queue. Very >> few NICs actually support "match-all" flows without any mask. This was >> replaced by reconfiguring the RSS redirection table. The >> * Made sure to unconfigure everything and remove the extra Rx queue in >> the case the hardware does not support one of the RTE flows. 
>> * Updated vswitchd/vswitch.xml >> * Added diagnostics info in netdev_dpdk_get_status >> * Tested under load on the following NICs: >> - Intel E810 (2x 25G) >> - Mellanox ConnectX-5 (2x 25G) >> * Basic functionality tested on the following NICs: >> - Intel 82599ES (2x 10G) >> - Intel X710 (4x 10G) >> - Mellanox ConnectX-4 (2x 25G) >> >> Documentation/topics/dpdk/phy.rst | 55 ++++++ >> lib/netdev-dpdk.c | 293 +++++++++++++++++++++++++++++- >> vswitchd/vswitch.xml | 26 +++ >> 3 files changed, 373 insertions(+), 1 deletion(-) >> >> diff --git a/Documentation/topics/dpdk/phy.rst b/Documentation/topics/dpdk/phy.rst >> index 937f4c40e5a8..86e69d79b104 100644 >> --- a/Documentation/topics/dpdk/phy.rst >> +++ b/Documentation/topics/dpdk/phy.rst >> @@ -131,6 +131,61 @@ possible with DPDK acceleration. It is possible to configure multiple Rx queues >> for ``dpdk`` ports, thus ensuring this is not a bottleneck for performance. For >> information on configuring PMD threads, refer to :doc:`pmd`. >> >> +Control Plane Protection >> +------------------------ >> + >> +Some control protocols are used to maintain link status between forwarding >> +engines. In SDN environments, these packets share the same physical network >> +than the user data traffic. >> + >> +When the system is not sized properly, the PMD threads may not be able to >> +process all incoming traffic from the configured Rx queues. When a signaling >> +packet of such protocols is dropped, it can cause link flapping, worsening the >> +situation. >> + >> +Some physical NICs can be programmed to put these protocols in a dedicated >> +hardware Rx queue using the rte_flow__ API. >> + >> +__ https://doc.dpdk.org/guides-21.11/prog_guide/rte_flow.html#device-compatibility >> + >> +The currently supported control plane protocols are: >> + >> +``lacp`` >> + `Link Aggregation Control Protocol`__. Ether type ``0x8809``. >> + >> + __ https://www.ieee802.org/3/ad/public/mar99/seaman_1_0399.pdf >> + >> +.. warning:: >> + >> + This feature is not compatible with all NICs. Refer to vendor documentation >> + for more information. >> + >> +Control plane protection must be enabled on specific protocols per port. The >> +``cp-protection`` option requires a coma separated list of protocol names:: >> + >> + $ ovs-vsctl add-port br0 dpdk-p0 -- set Interface dpdk-p0 type=dpdk \ >> + options:dpdk-devargs=0000:01:00.0 options:cp-protection=lacp >> + >> +.. note:: >> + >> + If multiple Rx queues are already configured, regular RSS (Receive Side >> + Scaling) queue balancing is done on all but the extra control plane >> + protection queue. >> + >> +.. tip:: >> + >> + You can check if control plane protection is supported on a port with the >> + following command:: >> + >> + $ ovs-vsctl get interface dpdk-p0 status >> + {cp_protection_queue="2", driver_name=..., rss_queues="0-1"} >> + >> + If the hardware does not support redirecting control plane traffic to >> + a dedicated queue, it will be explicit:: >> + >> + $ ovs-vsctl get interface dpdk-p0 status >> + {cp_protection_queue=unsupported, driver_name=..., rss_queues="0-1"} >> + >> .. 
_dpdk-phy-flow-control: >> >> Flow Control >> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c >> index 0dd655507b50..94f04437a641 100644 >> --- a/lib/netdev-dpdk.c >> +++ b/lib/netdev-dpdk.c >> @@ -410,6 +410,11 @@ enum dpdk_hw_ol_features { >> NETDEV_TX_SCTP_CHECKSUM_OFFLOAD = 1 << 4, >> }; >> >> +enum dpdk_cp_prot_flags { >> + DPDK_CP_PROT_UNSUPPORTED = 1 << 0, >> + DPDK_CP_PROT_LACP = 1 << 1, >> +}; >> + >> /* >> * In order to avoid confusion in variables names, following naming convention >> * should be used, if possible: >> @@ -453,6 +458,7 @@ struct netdev_dpdk { >> }; >> struct dpdk_tx_queue *tx_q; >> struct rte_eth_link link; >> + uint16_t reta_size; >> ); >> >> PADDED_MEMBERS_CACHELINE_MARKER(CACHE_LINE_SIZE, cacheline1, >> @@ -529,6 +535,13 @@ struct netdev_dpdk { >> >> /* VF configuration. */ >> struct eth_addr requested_hwaddr; >> + >> + /* Requested control plane protection flags, >> + * from the enum set 'dpdk_cp_prot_flags' */ >> + uint64_t requested_cp_prot_flags; >> + uint64_t cp_prot_flags; >> + size_t cp_prot_flows_num; >> + struct rte_flow **cp_prot_flows; >> ); >> >> PADDED_MEMBERS(CACHE_LINE_SIZE, >> @@ -1192,6 +1205,7 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev) >> netdev_get_name(&dev->up)); >> } >> } >> + dev->reta_size = info.reta_size; >> >> n_rxq = MIN(info.max_rx_queues, dev->up.n_rxq); >> n_txq = MIN(info.max_tx_queues, dev->up.n_txq); >> @@ -1309,6 +1323,10 @@ common_construct(struct netdev *netdev, dpdk_port_t port_no, >> dev->requested_n_txq = NR_QUEUE; >> dev->requested_rxq_size = NIC_PORT_DEFAULT_RXQ_SIZE; >> dev->requested_txq_size = NIC_PORT_DEFAULT_TXQ_SIZE; >> + dev->requested_cp_prot_flags = 0; >> + dev->cp_prot_flags = 0; >> + dev->cp_prot_flows_num = 0; >> + dev->cp_prot_flows = NULL; >> >> /* Initialize the flow control to NULL */ >> memset(&dev->fc_conf, 0, sizeof dev->fc_conf); >> @@ -1904,6 +1922,9 @@ dpdk_set_rxq_config(struct netdev_dpdk *dev, const struct smap *args) >> int new_n_rxq; >> >> new_n_rxq = MAX(smap_get_int(args, "n_rxq", NR_QUEUE), 1); >> + if (dev->requested_cp_prot_flags) { >> + new_n_rxq += 1; >> + } >> if (new_n_rxq != dev->requested_n_rxq) { >> dev->requested_n_rxq = new_n_rxq; >> netdev_request_reconfigure(&dev->up); >> @@ -1927,6 +1948,53 @@ dpdk_process_queue_size(struct netdev *netdev, const struct smap *args, >> } >> } >> >> +static int >> +dpdk_cp_prot_set_config(struct netdev *netdev, struct netdev_dpdk *dev, >> + const struct smap *args, char **errp) >> +{ >> + const char *arg = smap_get_def(args, "cp-protection", ""); >> + uint64_t flags = 0; >> + char buf[256]; >> + char *token, *saveptr; >> + >> + ovs_strzcpy(buf, arg, sizeof(buf)); >> + buf[sizeof(buf) - 1] = '\0'; >> + >> + token = strtok_r(buf, ",", &saveptr); >> + while (token) { >> + if (strcmp(token, "lacp") == 0) { >> + flags |= DPDK_CP_PROT_LACP; >> + } else { >> + VLOG_WARN_BUF( >> + errp, "%s options:cp-protection unknown protocol '%s'", >> + netdev_get_name(netdev), token); >> + return -1; >> + } >> + token = strtok_r(NULL, ",", &saveptr); >> + } >> + >> + if (flags && dev->type != DPDK_DEV_ETH) { >> + VLOG_WARN_BUF( errp, >> + "%s options:cp-protection is only supported on ethernet ports", >> + netdev_get_name(netdev)); >> + return -1; >> + } >> + >> + if (flags && netdev_is_flow_api_enabled()) { >> + VLOG_WARN_BUF(errp, >> + "%s options:cp-protection is incompatible with hw-offload", >> + netdev_get_name(netdev)); >> + return -1; >> + } >> + >> + if (flags != dev->requested_cp_prot_flags) { >> + dev->requested_cp_prot_flags = flags; >> + 
netdev_request_reconfigure(netdev); >> + } >> + >> + return 0; >> +} >> + >> static int >> netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args, >> char **errp) >> @@ -1946,6 +2014,11 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args, >> ovs_mutex_lock(&dpdk_mutex); >> ovs_mutex_lock(&dev->mutex); >> >> + if (dpdk_cp_prot_set_config(netdev, dev, args, errp) < 0) { >> + err = EINVAL; >> + goto out; >> + } >> + >> dpdk_set_rxq_config(dev, args); >> >> dpdk_process_queue_size(netdev, args, "n_rxq_desc", >> @@ -3639,8 +3712,10 @@ netdev_dpdk_get_status(const struct netdev *netdev, struct smap *args) >> { >> struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); >> struct rte_eth_dev_info dev_info; >> + uint64_t cp_prot_flags; >> uint32_t link_speed; >> uint32_t dev_flags; >> + int n_rxq; >> >> if (!rte_eth_dev_is_valid_port(dev->port_id)) { >> return ENODEV; >> @@ -3651,6 +3726,8 @@ netdev_dpdk_get_status(const struct netdev *netdev, struct smap *args) >> rte_eth_dev_info_get(dev->port_id, &dev_info); >> link_speed = dev->link.link_speed; >> dev_flags = *dev_info.dev_flags; >> + cp_prot_flags = dev->cp_prot_flags; >> + n_rxq = netdev->n_rxq; >> ovs_mutex_unlock(&dev->mutex); >> const struct rte_bus *bus; >> const struct rte_pci_device *pci_dev; >> @@ -3703,6 +3780,24 @@ netdev_dpdk_get_status(const struct netdev *netdev, struct smap *args) >> ETH_ADDR_ARGS(dev->hwaddr)); >> } >> >> + if (cp_prot_flags) { >> + if (cp_prot_flags & DPDK_CP_PROT_UNSUPPORTED) { >> + smap_add(args, "cp_protection_queue", "unsupported"); >> + if (n_rxq > 1) { >> + smap_add_format(args, "rss_queues", "0-%d", n_rxq - 1); >> + } else { >> + smap_add(args, "rss_queues", "0"); >> + } >> + } else { >> + smap_add_format(args, "cp_protection_queue", "%d", n_rxq - 1); >> + if (n_rxq > 2) { >> + smap_add_format(args, "rss_queues", "0-%d", n_rxq - 2); >> + } else { >> + smap_add(args, "rss_queues", "0"); >> + } >> + } >> + } >> + >> return 0; >> } >> >> @@ -4933,6 +5028,179 @@ static const struct dpdk_qos_ops trtcm_policer_ops = { >> .qos_queue_dump_state_init = trtcm_policer_qos_queue_dump_state_init >> }; >> >> +static int >> +dpdk_cp_prot_add_flow(struct netdev_dpdk *dev, >> + const struct rte_flow_attr *attr, >> + const struct rte_flow_item items[], >> + const struct rte_flow_action actions[], >> + const char *desc, bool dry_run) >> +{ >> + struct rte_flow_error error; >> + struct rte_flow *flow; >> + size_t num; >> + >> + if (dry_run) { >> + int ret; >> + ret = rte_flow_validate(dev->port_id, attr, items, actions, &error); >> + if (rte_flow_validate(dev->port_id, attr, items, actions, &error)) { > > 'if (ret)' > >> + VLOG_WARN("%s: cp-protection: device does not support %s flow: %s", >> + netdev_get_name(&dev->up), desc, error.message); >> + } >> + return ret; >> + } >> + >> + flow = rte_flow_create(dev->port_id, attr, items, actions, &error); >> + if (flow == NULL) { >> + VLOG_WARN("%s: cp-protection: failed to add %s flow: %s", >> + netdev_get_name(&dev->up), desc, error.message); >> + return rte_errno; >> + } >> + >> + num = dev->cp_prot_flows_num + 1; >> + dev->cp_prot_flows = xrealloc(dev->cp_prot_flows, sizeof(flow) * num); >> + dev->cp_prot_flows[dev->cp_prot_flows_num] = flow; >> + dev->cp_prot_flows_num = num; >> + >> + return 0; >> +} >> + >> +static int >> +dpdk_cp_prot_add_traffic_flow(struct netdev_dpdk *dev, >> + const struct rte_flow_item items[], >> + const char *desc, bool dry_run) >> +{ >> + const struct rte_flow_attr attr = { .ingress = 1 }; >> + const struct 
rte_flow_action actions[] = { >> + { >> + .type = RTE_FLOW_ACTION_TYPE_QUEUE, >> + .conf = &(const struct rte_flow_action_queue) { >> + .index = dev->up.n_rxq - 1, >> + }, >> + }, >> + { .type = RTE_FLOW_ACTION_TYPE_END }, >> + }; >> + >> + if (!dry_run) { >> + VLOG_INFO("%s: cp-protection: redirecting %s traffic to queue %d", >> + netdev_get_name(&dev->up), desc, dev->up.n_rxq - 1); >> + } >> + return dpdk_cp_prot_add_flow(dev, &attr, items, actions, desc, dry_run); >> +} >> + >> +static int >> +dpdk_cp_prot_rss_configure(struct netdev_dpdk *dev, int rss_n_rxq) >> +{ >> + struct rte_eth_rss_reta_entry64 *reta_conf; >> + size_t reta_conf_size; >> + int err; >> + >> + if (rss_n_rxq == 1) { >> + VLOG_INFO("%s: cp-protection: redirecting other traffic to queue 0", >> + netdev_get_name(&dev->up)); >> + } else { >> + VLOG_INFO("%s: cp-protection: applying rss on queues 0-%d", >> + netdev_get_name(&dev->up), rss_n_rxq - 1); >> + } >> + >> + reta_conf_size = (dev->reta_size / RTE_ETH_RETA_GROUP_SIZE) >> + * sizeof(struct rte_eth_rss_reta_entry64); > > In dpdk_eth_dev_init, we get reta_size from driver, > > mlx5_ethdev.c > > 333├> info->reta_size = priv->reta_idx_n ? > 334│ priv->reta_idx_n : config->ind_table_max_size; > > (gdb) p priv->reta_idx_n > $5 = 1 > (gdb) p config->ind_table_max_size > $6 = 512 > > and store: > dev->reta_size = info.reta_size; > > Now we use it, > dev->reta_size = 1 / RTE_ETH_RETA_GROUP_SIZE (64) > but it results in reta_conf_size = 0 > >> + reta_conf = xmalloc(reta_conf_size); > > xmalloc only allocates 1 byte (void *p = malloc(size ? size : 1);) > >> + memset(reta_conf, 0, reta_conf_size); >> + >> + for (uint16_t i = 0; i < dev->reta_size; i++) { >> + uint16_t idx = i / RTE_ETH_RETA_GROUP_SIZE; >> + uint16_t shift = i % RTE_ETH_RETA_GROUP_SIZE; >> + reta_conf[idx].mask |= 1ULL << shift; >> + reta_conf[idx].reta[shift] = i % rss_n_rxq; >> + } >> + err = rte_eth_dev_rss_reta_update(dev->port_id, reta_conf, dev->reta_size); >> + if (err < 0) { >> + VLOG_DBG("%s: failed to configure RSS redirection table: err=%d", >> + netdev_get_name(&dev->up), err); >> + } >> + >> + free(reta_conf); >> + >> + return err; >> +} >> + >> +static int >> +dpdk_cp_prot_configure(struct netdev_dpdk *dev, bool dry_run) >> +{ >> + int err = 0; >> + >> + if (dev->requested_cp_prot_flags & DPDK_CP_PROT_UNSUPPORTED) { >> + goto out; >> + } >> + if (dev->up.n_rxq < 2) { >> + err = ENOTSUP; >> + VLOG_DBG("%s: cp-protection: not enough available rx queues", >> + netdev_get_name(&dev->up)); >> + goto out; >> + } >> + >> + if (dev->requested_cp_prot_flags & DPDK_CP_PROT_LACP) { >> + err = dpdk_cp_prot_add_traffic_flow( >> + dev, >> + (const struct rte_flow_item []) { >> + { >> + .type = RTE_FLOW_ITEM_TYPE_ETH, >> + .spec = &(const struct rte_flow_item_eth){ >> + .type = htons(ETH_TYPE_LACP), >> + }, >> + .mask = &(const struct rte_flow_item_eth){ >> + .type = htons(0xffff), >> + }, >> + }, >> + { .type = RTE_FLOW_ITEM_TYPE_END }, >> + }, >> + "lacp", >> + dry_run >> + ); >> + if (err) { >> + goto out; >> + } >> + } >> + >> + if (!dry_run && dev->cp_prot_flows_num) { >> + /* reconfigure RSS reta in all but the cp protection queue */ >> + err = dpdk_cp_prot_rss_configure(dev, dev->up.n_rxq - 1); >> + } >> + >> +out: >> + if (!dry_run) { >> + dev->cp_prot_flags = dev->requested_cp_prot_flags; >> + } >> + if (err) { >> + dev->requested_cp_prot_flags |= DPDK_CP_PROT_UNSUPPORTED; >> + } >> + return err; >> +} >> + >> +static void >> +dpdk_cp_prot_unconfigure(struct netdev_dpdk *dev) >> +{ >> + struct 
rte_flow_error error; >> + >> + if (dev->cp_prot_flows_num == 0) { >> + return; >> + } >> + >> + VLOG_DBG("%s: cp-protection: reset flows", netdev_get_name(&dev->up)); >> + >> + for (int i = 0; i < dev->cp_prot_flows_num; i++) { >> + if (rte_flow_destroy(dev->port_id, dev->cp_prot_flows[i], &error)) { >> + VLOG_DBG("%s: cp-protection: failed to destroy flow: %s", >> + netdev_get_name(&dev->up), error.message); >> + } >> + } >> + free(dev->cp_prot_flows); >> + dev->cp_prot_flows_num = 0; >> + dev->cp_prot_flows = NULL; >> + >> + (void) dpdk_cp_prot_rss_configure(dev, dev->up.n_rxq); >> +} >> + >> static int >> netdev_dpdk_reconfigure(struct netdev *netdev) >> { >> @@ -4943,6 +5211,7 @@ netdev_dpdk_reconfigure(struct netdev *netdev) >> >> if (netdev->n_txq == dev->requested_n_txq >> && netdev->n_rxq == dev->requested_n_rxq >> + && dev->cp_prot_flags == dev->requested_cp_prot_flags >> && dev->mtu == dev->requested_mtu >> && dev->lsc_interrupt_mode == dev->requested_lsc_interrupt_mode >> && dev->rxq_size == dev->requested_rxq_size >> @@ -4987,6 +5256,8 @@ netdev_dpdk_reconfigure(struct netdev *netdev) >> } >> } >> >> + dpdk_cp_prot_unconfigure(dev); >> + >> err = dpdk_eth_dev_init(dev); >> if (dev->hw_ol_features & NETDEV_TX_TSO_OFFLOAD) { >> netdev->ol_flags |= NETDEV_TX_OFFLOAD_TCP_TSO; >> @@ -5014,6 +5285,20 @@ netdev_dpdk_reconfigure(struct netdev *netdev) >> if (!dev->tx_q) { >> err = ENOMEM; >> } >> + if (!err && dev->requested_cp_prot_flags) { >> + /* dry run first */ >> + err = dpdk_cp_prot_configure(dev, true); >> + if (!err) { >> + /* if no error, apply configuration */ >> + err = dpdk_cp_prot_configure(dev, false); >> + } >> + if (err) { >> + /* no hw support, remove the extra queue & recover gracefully */ >> + err = 0; >> + dev->requested_n_rxq -= 1; >> + netdev_request_reconfigure(netdev); >> + } >> + } >> >> netdev_change_seq_changed(netdev); >> >> @@ -5215,7 +5500,13 @@ netdev_dpdk_flow_api_supported(struct netdev *netdev) >> ovs_mutex_lock(&dev->mutex); >> if (dev->type == DPDK_DEV_ETH) { >> /* TODO: Check if we able to offload some minimal flow. */ >> - ret = true; >> + if (dev->requested_cp_prot_flags || dev->cp_prot_flags) { >> + VLOG_WARN( >> + "%s: hw-offload is mutually exclusive with cp-protection", >> + netdev_get_name(netdev)); >> + } else { >> + ret = true; >> + } >> } >> ovs_mutex_unlock(&dev->mutex); >> out: >> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml >> index 36388e3c42d7..7e6ae3df7583 100644 >> --- a/vswitchd/vswitch.xml >> +++ b/vswitchd/vswitch.xml >> @@ -3430,6 +3430,32 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \ >> <p>This option may only be used with dpdk VF representors.</p> >> </column> >> >> + <column name="options" key="cp-protection" >> + type='{"type": "string", "enum": ["set", ["lacp"]]}'> >> + <p> >> + Allocate an extra Rx queue for control plane packets of the specified >> + protocol(s). >> + </p> >> + <p> >> + If the user has already configured multiple >> + <code>options:n_rxq</code> on the port, an additional one will be >> + allocated for control plane packets. If the hardware cannot satisfy >> + the requested number of requested Rx queues, the last Rx queue will >> + be assigned for control plane. If only one Rx queue is available or >> + if the hardware does not support the RTE flow matchers/actions >> + required to redirect the selected protocols, >> + <code>cp-protection</code> will be disabled. 
>> + </p> >> + <p> >> + This feature is multually exclusive with >> + <code>other_options:hw-offload</code> as it may conflict with the >> + offloaded RTE flows. >> + </p> >> + <p> >> + Disabled by default. >> + </p> >> + </column> >> + >> <column name="other_config" key="tx-steering" >> type='{"type": "string", >> "enum": ["set", ["thread", "hash"]]}'> > > ^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH v3] netdev-dpdk: add control plane protection support
  2022-11-11 18:15 ` [PATCH v3] netdev-dpdk: add control plane protection support Kevin Traynor
  2022-11-14  9:41   ` Kevin Traynor
@ 2022-11-14 12:46   ` Robin Jarry
  1 sibling, 0 replies; 3+ messages in thread
From: Robin Jarry @ 2022-11-14 12:46 UTC (permalink / raw)
  To: Kevin Traynor; +Cc: Christophe Fontaine, dev, Ori Kam, dev

Hi Kevin,

Kevin Traynor, Nov 11, 2022 at 19:15:
> > +    reta_conf_size = (dev->reta_size / RTE_ETH_RETA_GROUP_SIZE)
> > +        * sizeof(struct rte_eth_rss_reta_entry64);
>
> In dpdk_eth_dev_init, we get reta_size from driver,
>
> mlx5_ethdev.c
>
> 333├>     info->reta_size = priv->reta_idx_n ?
> 334│         priv->reta_idx_n : config->ind_table_max_size;
>
> (gdb) p priv->reta_idx_n
> $5 = 1
> (gdb) p config->ind_table_max_size
> $6 = 512
>
> and store:
> dev->reta_size = info.reta_size;
>
> Now we use it,
> dev->reta_size = 1 / RTE_ETH_RETA_GROUP_SIZE (64)
> but it results in reta_conf_size = 0
>
> > +    reta_conf = xmalloc(reta_conf_size);
>
> xmalloc only allocates 1 byte (void *p = malloc(size ? size : 1);)

Hmm indeed this is bad :) Since reworking the RSS fallback I may not
have re-tested with CX-5, only with Intel NICs. I am quite surprised by
this return value. Could it be an issue with the mlx5 dpdk driver?

Thanks for testing!

^ permalink raw reply	[flat|nested] 3+ messages in thread
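To illustrate the sizing problem discussed above, a hypothetical variant of the allocation in dpdk_cp_prot_rss_configure() that rounds the RETA group count up so it cannot collapse to zero when a driver reports a reta_size smaller than RTE_ETH_RETA_GROUP_SIZE (as mlx5 does here with reta_size = 1). This is only a sketch of the arithmetic, not the fix adopted in the patch, and it leaves open whether a single-entry redirection table is usable at all:

    /* Round up instead of truncating: with reta_size = 1 the original
     * expression yields 0 groups and a zero-sized allocation. */
    uint16_t n_groups = (dev->reta_size + RTE_ETH_RETA_GROUP_SIZE - 1)
                        / RTE_ETH_RETA_GROUP_SIZE;
    size_t reta_conf_size = n_groups * sizeof(struct rte_eth_rss_reta_entry64);
    struct rte_eth_rss_reta_entry64 *reta_conf = xzalloc(reta_conf_size);

    for (uint16_t i = 0; i < dev->reta_size; i++) {
        uint16_t idx = i / RTE_ETH_RETA_GROUP_SIZE;
        uint16_t shift = i % RTE_ETH_RETA_GROUP_SIZE;

        reta_conf[idx].mask |= 1ULL << shift;
        reta_conf[idx].reta[shift] = i % rss_n_rxq;
    }

The subsequent rte_eth_dev_rss_reta_update() call would still pass dev->reta_size, so only the allocation size changes in this sketch.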