From: Yongseok Koh <yskoh@mellanox.com>
To: Slava Ovsiienko <viacheslavo@mellanox.com>
Cc: Shahaf Shuler <shahafs@mellanox.com>, "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH v3 00/13] net/mlx5: e-switch VXLAN encap/decap hardware offload
Date: Thu, 1 Nov 2018 20:32:08 +0000
Message-ID: <20181101203200.GA6118@mtidpdk.mti.labs.mlnx>
In-Reply-To: <1541074741-41368-1-git-send-email-viacheslavo@mellanox.com>

On Thu, Nov 01, 2018 at 05:19:21AM -0700, Slava Ovsiienko wrote:
> This patchset adds the VXLAN encapsulation/decapsulation hardware
> offload feature for E-Switch.
>
> A typical use case of tunneling infrastructure is port representors
> in switchdev mode, with VXLAN traffic encapsulation performed on
> traffic coming *from* a representor and decapsulation on traffic
> going *to* that representor, in order to transparently assign
> a given VXLAN to VF traffic.
>
> Since these actions are supported at the E-Switch level, the "transfer"
> attribute must be set on such flow rules. They must also be combined
> with a port redirection action to make sense.
>
> Since only ingress is supported, encapsulation flow rules are normally
> applied on a physical port and emit traffic to a port representor.
> The opposite order is used for decapsulation.
>
> Like other mlx5 E-Switch flow rule actions, these ones are implemented
> through Linux's TC flower API. Since the Linux interface for VXLAN
> encap/decap involves virtual network devices (i.e. ip link add type
> vxlan [...]), the PMD dynamically spawns them on an as-needed basis
> through Netlink calls. These implicitly created VXLAN devices are
> called VTEPs (Virtual Tunnel End Points).
>
> VXLAN interfaces are dynamically created for each local port of
> outer networks and then used as targets for TC "flower" filters
> in order to perform encapsulation. For decapsulation the VXLAN
> devices are created for each unique UDP port.
> These VXLAN interfaces are system-wide: only one device with a given
> UDP port can exist in the system (an attempt to create another device
> with the same local UDP port returns EEXIST), so the PMD should support
> a device database shared between PMD instances.
>
> Considerations for the rule samples:
>
> $PF - physical device, outer network
> $VF - representor for VF, outer/inner network
> $VXLAN - VTEP netdev name
> $PF_OUTER_IP - $PF IP (v4 or v6) within outer network
> $REMOTE_IP - remote peer IP (v4 or v6) within outer network
> $LOCAL_PORT - local UDP port
> $REMOTE_PORT - remote UDP port
>
> VXLAN VTEP creation with iproute2 (PMD does the same via Netlink):
>
> - for encapsulation:
>
>   ip link add $VXLAN type vxlan dstport $LOCAL_PORT external dev $PF
>   ip link set dev $VXLAN up
>   tc qdisc del dev $VXLAN ingress
>   tc qdisc add dev $VXLAN ingress
>
> $LOCAL_PORT for egress encapsulated traffic (note, this is not the
> source UDP port in the VXLAN header, it is just the UDP port assigned
> to the VTEP and has no practical usage) is selected automatically from
> the available UDP ports in the range 30000-60000.
>
> - for decapsulation:
>
>   ip link add $VXLAN type vxlan dstport $LOCAL_PORT external
>   ip link set dev $VXLAN up
>   tc qdisc del dev $VXLAN ingress
>   tc qdisc add dev $VXLAN ingress
>
> $LOCAL_PORT is the UDP port receiving the VXLAN traffic from outer networks.
>
> All ingress UDP traffic with the given UDP destination port from ALL existing
> netdevs is routed by the kernel to the $VXLAN net device. While applying the
> rule, the kernel checks the IP parameters within the rule, determines the
> appropriate underlying PF and tries to set up the rule hardware offload.
>
> VXLAN encapsulation
>
> VXLAN encap rules are applied to the VF ingress traffic and have the
> VTEP as actual redirection destinations instead of outer PF.
> The encapsulation rule should provide:
> - redirection action VF->PF
> - VF port ID
> - some inner network parameters (MACs)
> - the tunnel outer source IP (v4/v6), (IS A MUST)
> - the tunnel outer destination IP (v4/v6), (IS A MUST)
> - VNI - Virtual Network Identifier (IS A MUST)
>
> VXLAN encapsulation rule sample for tc utility:
>
>   tc filter add dev $VF protocol all parent ffff: flower skip_sw \
>     action tunnel_key set dst_port $REMOTE_PORT \
>     src_ip $PF_OUTER_IP dst_ip $REMOTE_IP id $VNI \
>     action mirred egress redirect dev $VXLAN
>
> VXLAN encapsulation rule sample for testpmd:
>
> - Setting up outer properties of VXLAN tunnel:
>
>   set vxlan ip-version ipv4 vni $VNI \
>     udp-src $IGNORED udp-dst $REMOTE_PORT \
>     ip-src $PF_OUTER_IP ip-dst $REMOTE_IP \
>     eth-src $IGNORED eth-dst $REMOTE_MAC
>
> - Creating a flow rule on port ID 4 performing VXLAN encapsulation
>   with the abovementioned properties and directing the resulting
>   traffic to port ID 0:
>
>   flow create 4 ingress transfer pattern eth src is $INNER_MAC / end
>     actions vxlan_encap / port_id id 0 / end
>
> No direct way was found to provide the kernel with all required
> encapsulation header parameters. The encapsulation VTEP is created
> attached to the outer interface and is assumed to be the default path
> for egress encapsulated traffic. The outer tunnel IP addresses are
> assigned to the interface using Netlink, and the implicit route is
> created like this:
>
>   ip addr add <src_ip> peer <dst_ip> dev <outer> scope link
>
> The peer address option provides the implicit route, and the scope link
> attribute reduces the risk of conflicts. At initialization time all
> local scope link addresses are flushed from the outer network device.
>
> The destination MAC address is provided via a permanent neigh rule:
>
>   ip neigh add dev <outer> lladdr <dst_mac> to <dst_ip> nud permanent
>
> At initialization time all neigh rules of permanent type are flushed
> from the outer network device.
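
The encapsulation-side steps quoted above (VTEP creation, implicit route,
permanent neigh entry, flower rule) can be collected into one dry-run
sketch. This is only an illustration, not what the PMD does internally:
the PMD issues the equivalent requests over Netlink. The function name
encap_cmds and its positional arguments are hypothetical; the commands it
prints are the ones from the cover letter, and nothing is executed.

```shell
#!/bin/sh
# Dry-run sketch (hypothetical helper): print, but do not execute, the
# encapsulation-side commands from the cover letter.
# Usage: encap_cmds VF PF VXLAN LOCAL_PORT REMOTE_PORT SRC_IP DST_IP DST_MAC VNI
encap_cmds() {
    vf=$1 pf=$2 vxlan=$3 local_port=$4 remote_port=$5
    src_ip=$6 dst_ip=$7 dst_mac=$8 vni=$9

    # VTEP attached to the outer device, with a fresh ingress qdisc.
    echo "ip link add $vxlan type vxlan dstport $local_port external dev $pf"
    echo "ip link set dev $vxlan up"
    echo "tc qdisc del dev $vxlan ingress"
    echo "tc qdisc add dev $vxlan ingress"

    # Implicit route to the peer and permanent neighbour entry on the outer device.
    echo "ip addr add $src_ip peer $dst_ip dev $pf scope link"
    echo "ip neigh add dev $pf lladdr $dst_mac to $dst_ip nud permanent"

    # Flower rule on the VF representor, redirecting to the VTEP.
    echo "tc filter add dev $vf protocol all parent ffff: flower skip_sw" \
         "action tunnel_key set dst_port $remote_port" \
         "src_ip $src_ip dst_ip $dst_ip id $vni" \
         "action mirred egress redirect dev $vxlan"
}
```

For example, "encap_cmds vf0 pf0 vtep0 30000 4789 192.0.2.1 192.0.2.2
00:11:22:33:44:55 42" prints seven commands that can be reviewed before
being run by hand as root. All device names and addresses in that call
are placeholders.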
>
> VXLAN decapsulation
>
> VXLAN decap rules are applied to the ingress traffic of the VTEP ($VXLAN)
> device instead of the PF. The decapsulation rule should provide:
> - redirection action PF->VF
> - VF port ID as redirection destination
> - $VXLAN device as ingress traffic source
> - the tunnel outer source IP (v4/v6), (optional)
> - the tunnel outer destination IP (v4/v6), (IS A MUST)
> - the tunnel local UDP port (IS A MUST, PMD looks for appropriate VTEP
>   with given local UDP port)
> - VNI - Virtual Network Identifier (IS A MUST)
>
> VXLAN decap rule sample for tc utility:
>
>   tc filter add dev $VXLAN protocol all parent ffff: flower skip_sw \
>     enc_src_ip $REMOTE_IP enc_dst_ip $PF_OUTER_IP enc_key_id $VNI \
>     enc_dst_port $LOCAL_PORT \
>     action tunnel_key unset action mirred egress redirect dev $VF
>
> VXLAN decap rule sample for testpmd:
>
> - Creating a flow on port ID 0 performing VXLAN decapsulation and directing
>   the result to port ID 4 with checking inner properties:
>
>   flow create 0 ingress transfer pattern /
>     ipv4 src is $REMOTE_IP dst is $PF_LOCAL_IP /
>     udp src is 9999 dst is $LOCAL_PORT / vxlan vni is $VNI /
>     eth src is 00:11:22:33:44:55 dst is $INNER_MAC / end
>     actions vxlan_decap / port_id id 4 / end
>
> The VXLAN encap/decap rule constraints (implied by current kernel support):
>
> - VXLAN decapsulation is provided for the PF->VF direction only
> - VXLAN encapsulation is provided for the VF->PF direction only
> - the current implementation supports only a non-shared database of VTEPs
>   (the same UDP port cannot be used simultaneously by several
>   instances of DPDK apps)
>
> Suggested-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> ---

Excellent commit log!! One nit. Please change e-switch in the title/log to
E-Switch.

Thanks,
Yongseok

> v3:
> * patchset is resplit into more dedicated parts
> * decapsulation rule takes MAC from inner eth item
> * appropriate RTE_BEx are replaced with runtime rte_cpu_xxx
> * E-Switch Flow counter deletion is fixed
> * VTEP management routines are refactored
> * found typos are corrected
>
> v2:
> * removed non-VXLAN related parts
> * multipart Netlink messages support
> * local IP and peer IP rules management
> * neigh IP address to MAC address rules
> * management rules cleanup at outer device initialization
> * attached devices cleanup at outer device initialization
>
> v1:
> * http://patches.dpdk.org/patch/45800/
> * Refactored code of initial experimental proposal
>
> v0:
> * http://patches.dpdk.org/cover/44080/
> * Initial proposal by Adrien Mazarguil <adrien.mazarguil@6wind.com>
>
> Viacheslav Ovsiienko (13):
>   net/mlx5: prepare makefile for adding e-switch VXLAN
>   net/mlx5: prepare meson.build for adding e-switch VXLAN
>   net/mlx5: add necessary definitions for e-switch VXLAN
>   net/mlx5: add necessary structures for e-switch VXLAN
>   net/mlx5: swap items/actions validations for e-switch rules
>   net/mlx5: add e-switch VXLAN support to validation routine
>   net/mlx5: add VXLAN support to flow prepare routine
>   net/mlx5: add VXLAN support to flow translate routine
>   net/mlx5: e-switch VXLAN netlink routines update
>   net/mlx5: fix e-switch Flow counter deletion
>   net/mlx5: add e-switch VXLAN tunnel devices management
>   net/mlx5: add e-switch VXLAN encapsulation rules
>   net/mlx5: add e-switch VXLAN rule cleanup routines
>
>  drivers/net/mlx5/Makefile        |   85 +
>  drivers/net/mlx5/meson.build     |   34 +
>  drivers/net/mlx5/mlx5_flow.h     |   11 +
>  drivers/net/mlx5/mlx5_flow_tcf.c | 5118 +++++++++++++++++++++++++++++---------
>  4 files changed, 4107 insertions(+), 1141 deletions(-)
>
> --
> 1.8.3.1
>
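
For the decapsulation side described in the quoted cover letter, a similar
dry-run sketch is shown here. It is an illustration under assumptions: the
function name decap_cmds and its positional arguments are hypothetical, the
flower key for the local UDP port is written as enc_dst_port, and the
commands are only printed, never executed.

```shell
#!/bin/sh
# Dry-run sketch (hypothetical helper): print, but do not execute, the
# decapsulation-side commands from the cover letter.
# Usage: decap_cmds VXLAN VF LOCAL_PORT REMOTE_IP PF_OUTER_IP VNI
decap_cmds() {
    vxlan=$1 vf=$2 local_port=$3 remote_ip=$4 local_ip=$5 vni=$6

    # VTEP bound to the local UDP port only (no "dev", so it is system-wide).
    echo "ip link add $vxlan type vxlan dstport $local_port external"
    echo "ip link set dev $vxlan up"
    echo "tc qdisc del dev $vxlan ingress"
    echo "tc qdisc add dev $vxlan ingress"

    # Flower rule on the VTEP ingress: match the outer header, strip the
    # tunnel and redirect the inner frame to the VF representor.
    echo "tc filter add dev $vxlan protocol all parent ffff: flower skip_sw" \
         "enc_src_ip $remote_ip enc_dst_ip $local_ip enc_key_id $vni" \
         "enc_dst_port $local_port" \
         "action tunnel_key unset action mirred egress redirect dev $vf"
}
```

Because only one VXLAN device per UDP port can exist in the system, running
the printed "ip link add" a second time with the same port is expected to
fail with EEXIST, which is the case the cover letter mentions.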