Hello,

I wanted to follow up with some additional test results. I believe this is a bug at the NVM firmware level, but I would still like someone else to confirm. We can easily retest or change testpmd parameters to provide additional information if desired. In parallel, we will try to reach out to Intel and Dell (Intel-branded card with firmware provided by Dell) to report this bug for further follow-up.

Device configuration:
traffic gen (trex) --> sw1 (basic vlan -- vl 200) --> sw2 (qinq push -- vl 300) --> dut (testpmd)

OS: CentOS 7.9
DPDK 21.11 (different from the initial report; we moved to a current version to try to rule out other issues, but the issue is the same)
testpmd cmd: sudo /tmp/dpdk-testpmd -c 0xffff -- -i --enable-hw-vlan --enable-hw-vlan-strip --enable-hw-vlan-extend --enable-hw-qinq-strip
NVM version(s): 8.15 (working) and 8.40 (non-working)

Offload configuration (identical under 8.15 and 8.40, so only one copy is provided):
testpmd> show port 0 rx_offload configuration
Rx Offloading Configuration of port 0 :
  Port : VLAN_STRIP QINQ_STRIP VLAN_FILTER VLAN_EXTEND
  Queue[ 0] : VLAN_STRIP QINQ_STRIP VLAN_FILTER VLAN_EXTEND

testpmd> show port 1 rx_offload configuration
Rx Offloading Configuration of port 1 :
  Port : VLAN_STRIP QINQ_STRIP VLAN_FILTER VLAN_EXTEND
  Queue[ 0] : VLAN_STRIP QINQ_STRIP VLAN_FILTER VLAN_EXTEND

testpmd> show port 2 rx_offload configuration
Rx Offloading Configuration of port 2 :
  Port : VLAN_STRIP QINQ_STRIP VLAN_FILTER VLAN_EXTEND
  Queue[ 0] : VLAN_STRIP QINQ_STRIP VLAN_FILTER VLAN_EXTEND

testpmd> show port 3 rx_offload configuration
Rx Offloading Configuration of port 3 :
  Port : VLAN_STRIP QINQ_STRIP VLAN_FILTER VLAN_EXTEND
  Queue[ 0] : VLAN_STRIP QINQ_STRIP VLAN_FILTER VLAN_EXTEND

Running testpmd with the above command-line parameters and then issuing "set fwd rxonly", we observe the following results with the two firmware versions:
8.15 (working)
      src=F8:F2:1E:31:96:D0 - dst=F8:F2:1E:31:96:D1 - type=0x0800 - length=74 - nb_segs=1 - QinQ VLAN tci=0xc8, VLAN tci outer=0x12c - hw ptype: L2_ETHER L3_IPV4_EXT_UNKNOWN L4_TCP  - sw ptype: L2_ETHER L3_IPV4 L4_TCP  - l2_len=14 - l3_len=20 - l4_len=40 - Receive queue=0x0
    ol_flags: RTE_MBUF_F_RX_VLAN RTE_MBUF_F_RX_L4_CKSUM_GOOD RTE_MBUF_F_RX_IP_CKSUM_GOOD RTE_MBUF_F_RX_VLAN_STRIPPED RTE_MBUF_F_RX_QINQ_STRIPPED RTE_MBUF_F_RX_QINQ RTE_MBUF_F_RX_OUTER_L4_CKSUM_UNKNOWN

8.40 (non-working)
     src=F8:F2:1E:31:96:D0 - dst=F8:F2:1E:31:96:D1 - type=0x8100 - length=78 - nb_segs=1 - VLAN tci=0xc8 - hw ptype: L2_ETHER L3_IPV4_EXT_UNKNOWN L4_TCP  - sw ptype: L2_ETHER_VLAN L3_IPV4 L4_TCP  - l2_len=18 - l3_len=20 - l4_len=40 - Receive queue=0x0
    ol_flags: RTE_MBUF_F_RX_VLAN RTE_MBUF_F_RX_L4_CKSUM_GOOD RTE_MBUF_F_RX_IP_CKSUM_GOOD RTE_MBUF_F_RX_VLAN_STRIPPED RTE_MBUF_F_RX_OUTER_L4_CKSUM_UNKNOWN
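As a quick cross-check of the dumps above: the TCI values decode to the VLAN IDs configured on the switches (0xc8 = 200 from sw1, 0x12c = 300 from sw2), and the 4-byte difference between length=78 and length=74 (and l2_len=18 vs 14) is exactly one 802.1Q tag (2-byte TPID + 2-byte TCI) still present in the packet under 8.40, consistent with type=0x8100. A minimal sketch of that arithmetic follows; the helper names are hypothetical and not part of DPDK.

```c
#include <assert.h>
#include <stdint.h>

/* IEEE 802.1Q Tag Control Information: PCP (3 bits) | DEI (1 bit) | VID (12 bits). */
struct vlan_tci_fields {
    uint8_t  pcp; /* priority code point */
    uint8_t  dei; /* drop eligible indicator */
    uint16_t vid; /* VLAN identifier */
};

/* Split a 16-bit TCI (e.g. mbuf vlan_tci / vlan_tci_outer) into its fields. */
static struct vlan_tci_fields decode_tci(uint16_t tci)
{
    struct vlan_tci_fields f;
    f.pcp = (uint8_t)(tci >> 13);
    f.dei = (uint8_t)((tci >> 12) & 0x1);
    f.vid = (uint16_t)(tci & 0x0fff);
    return f;
}

/* Hypothetical helper: each 802.1Q tag left in the frame adds 4 bytes
 * (TPID + TCI), so the length difference gives the number of tags left. */
static unsigned tags_left_in_frame(unsigned observed_len, unsigned fully_stripped_len)
{
    assert(observed_len >= fully_stripped_len);
    assert((observed_len - fully_stripped_len) % 4 == 0);
    return (observed_len - fully_stripped_len) / 4;
}
```

For example, decode_tci(0x12c).vid is 300 and tags_left_in_frame(78, 74) is 1.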


Thanks,

Ben

On Fri, Apr 1, 2022 at 11:13 AM Ben Magistro <koncept1@gmail.com> wrote:
Hello,

We recently needed to apply a firmware upgrade to some XXV710s to resolve a FEC issue (I'd have to find the details in email), and we applied the same firmware to other NICs (XL710s) to maintain a consistent baseline. In testing, NVM 8.40 resolves the FEC issue, but it introduces a problem with QinQ offloading and stripping. When running NVM 8.15 (the previous version), we could send QinQ traffic and the NIC would properly strip both tags and store their values in vlan_tci and vlan_tci_outer as expected. When running NVM 8.40 (the FEC fix version), only the inner tag of QinQ traffic is stripped. The code we are using has not changed.

I added some debug lines to drivers/net/i40e/i40e_rxtx.c to help troubleshoot this: one to log the VLANs and one to log ext_status. Comparing the two, ext_status is 0 under 8.40 but 1 under 8.15. This corresponds to the second-layer (QinQ) processing code in i40e_rxtx.c (around line 87) not being executed. We will continue to investigate, but we wanted to get this out sooner and ask for assistance in confirming this behavior.
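To illustrate why ext_status matters, here is a heavily simplified model of the Rx VLAN handling as we understand it (we believe the real logic lives in i40e_rxd_to_vlan_tci()): the first stripped tag comes from L2TAG1, and the swap into vlan_tci_outer only happens when the extended-status L2TAG2P bit is set. The structs, field names, and bit position below are simplified stand-ins (assumptions), not the real i40e/DPDK definitions, and the descriptor values are hypothetical, chosen to reproduce the vlan_tci/vlan_tci_outer values seen in the testpmd dumps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the Rx write-back descriptor and
 * the mbuf VLAN fields; NOT the real i40e/DPDK structures. */
struct fake_rx_desc {
    int      l2tag1_present; /* models the L2TAG1P status bit */
    uint16_t l2tag1;         /* first stripped tag */
    uint16_t ext_status;     /* bit 0 models L2TAG2P (second tag stripped) */
    uint16_t l2tag2_2;       /* second stripped tag */
};

struct fake_mbuf {
    uint16_t vlan_tci;
    uint16_t vlan_tci_outer;
    int      qinq_stripped;
};

#define FAKE_EXT_STATUS_L2TAG2P (1u << 0) /* assumption: bit 0 */

static void fake_rxd_to_vlan_tci(struct fake_mbuf *mb, const struct fake_rx_desc *d)
{
    assert(mb != NULL && d != NULL);

    /* First layer: a single stripped tag lands in vlan_tci. */
    mb->vlan_tci = d->l2tag1_present ? d->l2tag1 : 0;
    mb->vlan_tci_outer = 0;
    mb->qinq_stripped = 0;

    /* Second layer: only runs when the NIC reports L2TAG2P in ext_status.
     * With ext_status == 0 (as observed under 8.40) this block is skipped. */
    if (d->ext_status & FAKE_EXT_STATUS_L2TAG2P) {
        mb->vlan_tci_outer = mb->vlan_tci; /* first tag becomes the outer tag */
        mb->vlan_tci = d->l2tag2_2;        /* second tag becomes the inner tag */
        mb->qinq_stripped = 1;
    }
}
```

With ext_status = 1 the model yields vlan_tci=0xc8 and vlan_tci_outer=0x12c (the 8.15 behavior); with ext_status = 0 only vlan_tci is filled (the 8.40 behavior).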

This is a Dell-based card, so the firmware package used to upgrade/downgrade it comes from Dell rather than directly from Intel. We assume the firmware is generally consistent between the two.

Traffic is generated by TRex, with the VLAN nesting pushed by Juniper switches. Both VLAN tags use TPID 0x8100.
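With both tags using TPID 0x8100, a frame arriving at the DUT should carry the outer tag (VLAN 300 from sw2) followed by the inner tag (VLAN 200 from sw1) right after the MAC addresses. A minimal, hypothetical helper that walks those tags in raw packet data could be used to check how many tags are still present in the packet after offload; the function name and constants are ours, not DPDK's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_ADDRS_LEN 12     /* dst MAC + src MAC */
#define TPID_8021Q    0x8100 /* TPID used for both tags in this setup */

/* Count consecutive 0x8100 tags that follow the MAC addresses and store
 * their VIDs (outermost first) in vids[]. Returns the number of tags found. */
static size_t count_vlan_tags(const uint8_t *frame, size_t len,
                              uint16_t *vids, size_t max_vids)
{
    size_t off = ETH_ADDRS_LEN;
    size_t n = 0;

    while (off + 4 <= len) {
        uint16_t tpid = (uint16_t)((frame[off] << 8) | frame[off + 1]);
        if (tpid != TPID_8021Q)
            break; /* reached the real EtherType (e.g. 0x0800) */
        uint16_t tci = (uint16_t)((frame[off + 2] << 8) | frame[off + 3]);
        if (n < max_vids)
            vids[n] = (uint16_t)(tci & 0x0fff);
        n++;
        off += 4; /* skip TPID + TCI */
    }
    return n;
}
```

On a fully tagged frame this returns 2 with VIDs 300 then 200; on the 8.40 output, where one tag is left in the data, it would return 1.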

OS: CentOS 7.9
DPDK: 20.08 (we know it is no longer supported, but we were trying to defer that upgrade until some other changes were completed)
NIC: i40e XL710 -- net_i40e / firmware 8.15 0x800096d0 20.0.17

If any additional details are needed, please let us know.

Thanks,

Ben