DPDK patches and discussions
* [dpdk-dev] [PATCH 00/12] net/virtio: add offload support
@ 2016-07-21  8:08 Olivier Matz
  2016-07-21  8:08 ` [dpdk-dev] [PATCH 01/12] virtio: move device initialization in a function Olivier Matz
                   ` (13 more replies)
  0 siblings, 14 replies; 97+ messages in thread
From: Olivier Matz @ 2016-07-21  8:08 UTC (permalink / raw)
  To: dev, yuanhan.liu, konstantin.ananyev
  Cc: sugesh.chandran, bruce.richardson, jianfeng.tan, helin.zhang,
	adrien.mazarguil

This patchset, targeted for 16.11, introduces support for Rx and Tx
offload in the virtio PMD. To achieve this, some new mbuf flags must be
introduced, as discussed in [1].

It applies on top of:
- 16.07-rc3
- software packet type [2]
- testpmd enhancements [3]
- virtio packet corruption fix [4]

The new mbuf checksum flags are backward compatible with current
applications that assume that unknown_csum = good_csum (since there
was only a bad_csum flag). But if the patchset is integrated, we
should consider updating the PMDs to match the new API for 16.11.
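
To illustrate the compatibility argument, here is a minimal sketch that is
not code from the patchset. The bit positions and the sketch_* names are
hypothetical; only the idea (a legacy application tests the "bad" bit alone,
so "unknown" and "good" both pass) comes from the paragraph above.

```c
#include <stdint.h>

/* Hypothetical bit values, chosen for illustration only. */
#define SKETCH_RX_L4_CKSUM_UNKNOWN 0ULL        /* no information available */
#define SKETCH_RX_L4_CKSUM_BAD     (1ULL << 3) /* the only flag legacy apps know */
#define SKETCH_RX_L4_CKSUM_GOOD    (1ULL << 8) /* new explicit "good" state */
#define SKETCH_RX_L4_CKSUM_MASK \
	(SKETCH_RX_L4_CKSUM_BAD | SKETCH_RX_L4_CKSUM_GOOD)

enum sketch_l4_state {
	SKETCH_L4_UNKNOWN,
	SKETCH_L4_BAD,
	SKETCH_L4_GOOD,
	SKETCH_L4_OTHER, /* any further state is out of scope for this sketch */
};

/* Legacy check: everything that is not flagged bad is accepted. */
static inline int
sketch_legacy_l4_ok(uint64_t ol_flags)
{
	return (ol_flags & SKETCH_RX_L4_CKSUM_BAD) == 0;
}

/* New-style check: the application can tell unknown from good. */
static inline enum sketch_l4_state
sketch_l4_state(uint64_t ol_flags)
{
	switch (ol_flags & SKETCH_RX_L4_CKSUM_MASK) {
	case SKETCH_RX_L4_CKSUM_UNKNOWN:
		return SKETCH_L4_UNKNOWN;
	case SKETCH_RX_L4_CKSUM_BAD:
		return SKETCH_L4_BAD;
	case SKETCH_RX_L4_CKSUM_GOOD:
		return SKETCH_L4_GOOD;
	default:
		return SKETCH_L4_OTHER;
	}
}
```

With this layout, a legacy application keeps working unchanged, while an
updated one can decide to verify the checksum in software when the state
is unknown.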

[1] http://dpdk.org/ml/archives/dev/2016-May/039920.html
[2] http://dpdk.org/ml/archives/dev/2016-July/043333.html
[3] http://dpdk.org/ml/archives/dev/2016-July/043826.html
[4] http://dpdk.org/ml/archives/dev/2016-July/044266.html

Olivier Matz (12):
  virtio: move device initialization in a function
  virtio: setup and start cq in configure callback
  virtio: reinitialize the device in configure callback
  mbuf: add function to calculate a checksum
  mbuf: add new Rx checksum mbuf flags
  app/testpmd: fix checksum stats in csum engine
  mbuf: new flag for LRO
  app/testpmd: display lro segment size
  virtio: add Rx checksum offload support
  virtio: add Tx checksum offload support
  virtio: add Lro support
  virtio: add Tso support

 app/test-pmd/csumonly.c                |   8 +-
 doc/guides/rel_notes/release_16_11.rst |  16 ++
 drivers/net/virtio/virtio_ethdev.c     | 184 ++++++++++++++---------
 drivers/net/virtio/virtio_ethdev.h     |  14 +-
 drivers/net/virtio/virtio_pci.h        |   4 +-
 drivers/net/virtio/virtio_rxtx.c       | 267 ++++++++++++++++++++++++++++++---
 drivers/net/virtio/virtqueue.h         |   1 +
 lib/librte_mbuf/rte_mbuf.c             |  73 ++++++++-
 lib/librte_mbuf/rte_mbuf.h             |  71 ++++++++-
 lib/librte_mbuf/rte_mbuf_version.map   |   1 +
 10 files changed, 530 insertions(+), 109 deletions(-)

Test plan
=========

Platform description
--------------------

  guest (dpdk)
  +----------------+
  |                |
  |                |
  |         port0  +-----<---+
  |       ixgbe /  |         |
  |       directio |         |
  |                |         |
  |    port1       |         ^ flow1
  +----------------+         | (flow2 is the reverse)
         |                   |
         | virtio            |
         v                   |
  +----------------+         |
  |     tap0   /   |         |
  |1.1.1.1   /     |         |
  |ns-tap  /       |         |
  |      /         |         |
  |    /   ixgbe2  +------>--+
  |  /    1.1.1.2  |
  |/      ns-ixgbe |
  +----------------+
  host (linux, vhost-net)


flow1:
  host -(ixgbe)-> guest -(virtio)-> host
  1.1.1.2 -> 1.1.1.1

flow2:
  host -(virtio)-> guest -(ixgbe)-> host
  1.1.1.1 -> 1.1.1.2

Host configuration
------------------

Start qemu with:

- a ne2k management interface to avoid any conflict with dpdk
- 2 ixgbe interfaces given to the vm through vfio
- a virtio net device, connected to a tap interface through vhost-net

  /usr/bin/qemu-system-x86_64 -k fr -daemonize --enable-kvm -m 1G -cpu host \
    -smp 3 -serial telnet::40564,server,nowait -serial null \
    -qmp tcp::44340,server,nowait -monitor telnet::49229,server,nowait \
    -device ne2k_pci,mac=de:ad:de:01:02:03,netdev=user.0,addr=03 \
    -netdev user,id=user.0,hostfwd=tcp::34965-:22 \
    -device vfio-pci,host=0000:04:00.0 -device vfio-pci,host=0000:04:00.1 \
    -netdev type=tap,id=vhostnet0,script=no,vhost=on,queues=8 \
    -device virtio-net-pci,netdev=vhostnet0,ioeventfd=on,mq=on,vectors=17 \
    -hda "/path/to/ubuntu-14.04-template.qcow2" \
    -snapshot -vga none -display none

Move the tap interface into a netns, and configure it:

  ip netns add ns-tap
  ip netns exec ns-tap ip l set lo up
  ip link set tap0 netns ns-tap
  ip netns exec ns-tap ip l set tap0 down
  ip netns exec ns-tap ip l set addr 02:00:00:00:00:01 dev tap0
  ip netns exec ns-tap ip l set tap0 up
  ip netns exec ns-tap ip a a 1.1.1.1/24 dev tap0
  ip netns exec ns-tap arp -s 1.1.1.2 02:00:00:00:00:00
  ip netns exec ns-tap ip a

Move the ixgbe interface into a netns, and configure it:

  IXGBE=ixgbe2
  ip netns add ns-ixgbe
  ip netns exec ns-ixgbe ip l set lo up
  ip link set ${IXGBE} netns ns-ixgbe
  ip netns exec ns-ixgbe ip l set ${IXGBE} down
  ip netns exec ns-ixgbe ip l set addr 02:00:00:00:00:00 dev ${IXGBE}
  ip netns exec ns-ixgbe ip l set ${IXGBE} up
  ip netns exec ns-ixgbe ip a a 1.1.1.2/24 dev ${IXGBE}
  ip netns exec ns-ixgbe arp -s 1.1.1.1 02:00:00:00:00:01
  ip netns exec ns-ixgbe ip a

Guest configuration
-------------------

List of pci devices:

  00:02.0 Ethernet controller [0200]: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection [8086:10fb] (rev 01)
  00:03.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL-8029(AS) [10ec:8029]
  00:04.0 Ethernet controller [0200]: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection [8086:10fb] (rev 01)
  00:05.0 Ethernet controller [0200]: Red Hat, Inc Virtio network device [1af4:1000]

Compile dpdk:

  cd dpdk.org
  make config T=x86_64-native-linuxapp-gcc
  make -j4

Prepare environment:

  mkdir -p /mnt/huge
  mount -t hugetlbfs nodev /mnt/huge
  echo 256 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
  modprobe uio_pci_generic
  python tools/dpdk_nic_bind.py -b uio_pci_generic 0000:00:02.0
  python tools/dpdk_nic_bind.py -b uio_pci_generic 0000:00:05.0

Run test
========

The test uses iperf to validate connectivity between the 2 netns of the
host, through the guest.

Iperf is run with:

  # flow1: host -(ixgbe)-> guest -(virtio)-> host
  ip netns exec ns-tap iperf -s
  ip netns exec ns-ixgbe iperf -c 1.1.1.1 -t 10

  # flow2: host -(virtio)-> guest -(ixgbe)-> host
  ip netns exec ns-ixgbe iperf -s
  ip netns exec ns-tap iperf -c 1.1.1.2 -t 10

The guest runs testpmd with the csum forward engine; its configuration
depends on the test case.

test1: large packets (lro/tso)
------------------------------

Configuration of testpmd:

  ./build/app/testpmd -l 0,1 --log-level 8 -- --total-num-mbufs=16384 \
    -i --port-topology=chained --disable-hw-vlan-filter \
    --disable-hw-vlan-strip --enable-rx-cksum --enable-lro \
    --crc-strip --txqflags=0

  set fwd csum
  tso set 1440 0
  csum set ip hw 0
  csum set tcp hw 0
  tso set 1440 1
  #csum set ip hw 1 # not supported by virtio
  csum set tcp hw 1
  start

Iperf log:

  root@ubuntu1404:~# ip netns exec ns-ixgbe iperf -c 1.1.1.1 -t 10
  ------------------------------------------------------------
  Client connecting to 1.1.1.1, TCP port 5001
  TCP window size: 85.0 KByte (default)
  ------------------------------------------------------------
  [  3] local 1.1.1.2 port 54460 connected with 1.1.1.1 port 5001
  [ ID] Interval       Transfer     Bandwidth
  [  3]  0.0-10.0 sec  6.14 GBytes  5.27 Gbits/sec
  root@ubuntu1404:~# ip netns exec ns-tap iperf -c 1.1.1.2 -t 10
  ------------------------------------------------------------
  Client connecting to 1.1.1.2, TCP port 5001
  TCP window size: 85.0 KByte (default)
  ------------------------------------------------------------
  [  3] local 1.1.1.1 port 58312 connected with 1.1.1.2 port 5001
  [ ID] Interval       Transfer     Bandwidth
  [  3]  0.0-10.0 sec  6.70 GBytes  5.76 Gbits/sec

Example of what we see with "set verbose 1" in testpmd:

  -- flow1: ixgbe2 -> port0 (ixgbe) -> testpmd -> port1 (virtio) <-> tap0
  port=0, mbuf=0x7f968ad9fdc0, pkt_len=24682, nb_segs=13:
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=32 flags=PKT_RX_L4_CKSUM_UNKNOWN PKT_RX_IP_CKSUM_UNKNOWN
  tx: m->l2_len=14 m->l3_len=20 m->l4_len=32
  tx: m->tso_segsz=1440
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_L4_NO_CKSUM PKT_TX_TCP_SEG PKT_TX_IPV4

  -- flow2: tap0 -> port1 (virtio)-> testpmd -> port0 (ixgbe) -> ixgbe2
  port=1, mbuf=0x7f968acc9f40, pkt_len=42058, nb_segs=21:
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=32 flags=PKT_RX_L4_CKSUM_NONE PKT_RX_IP_CKSUM_UNKNOWN PKT_RX_LRO
  rx: m->lro_segsz=1440
  tx: m->l2_len=14 m->l3_len=20 m->l4_len=32
  tx: m->tso_segsz=1440
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_L4_NO_CKSUM PKT_TX_TCP_SEG PKT_TX_IPV4

test2: hardware checksum only
-----------------------------

Configuration of testpmd:

  ./build/app/testpmd -l 0,1 --log-level 8 -- --total-num-mbufs=16384 \
    -i --port-topology=chained --disable-hw-vlan-filter \
    --disable-hw-vlan-strip --enable-rx-cksum --crc-strip --txqflags=0

  set fwd csum
  csum set ip hw 0
  csum set tcp hw 0
  csum set tcp hw 1
  start

Iperf log:

  root@ubuntu1404:~# ip netns exec ns-ixgbe iperf -c 1.1.1.1 -t 10
  ------------------------------------------------------------
  Client connecting to 1.1.1.1, TCP port 5001
  TCP window size: 85.0 KByte (default)
  ------------------------------------------------------------
  [  3] local 1.1.1.2 port 54462 connected with 1.1.1.1 port 5001
  [ ID] Interval       Transfer     Bandwidth
  [  3]  0.0-10.0 sec  4.49 GBytes  3.86 Gbits/sec
  root@ubuntu1404:~# ip netns exec ns-tap iperf -c 1.1.1.2 -t 10
  ------------------------------------------------------------
  Client connecting to 1.1.1.2, TCP port 5001
  TCP window size: 85.0 KByte (default)
  ------------------------------------------------------------
  [  3] local 1.1.1.1 port 58314 connected with 1.1.1.2 port 5001
  [ ID] Interval       Transfer     Bandwidth
  [  3]  0.0-10.0 sec  6.66 GBytes  5.72 Gbits/sec

Example of what we see with "set verbose 1" in testpmd:

  -- flow1: ixgbe2 -> port0 (ixgbe) -> testpmd -> port1 (virtio) <-> tap0
  port=0, mbuf=0x7f0adca89b40, pkt_len=1514, nb_segs=1:
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=32 flags=PKT_RX_L4_CKSUM_UNKNOWN PKT_RX_IP_CKSUM_UNKNOWN
  tx: m->l2_len=14 m->l3_len=20 m->l4_len=32
  tx: flags=PKT_TX_TCP_CKSUM PKT_TX_IPV4

  -- flow2: tap0 -> port1 (virtio)-> testpmd -> port0 (ixgbe) -> ixgbe2
  port=1, mbuf=0x7f0adcb98d80, pkt_len=1514, nb_segs=1:
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=32 flags=PKT_RX_L4_CKSUM_NONE PKT_RX_IP_CKSUM_UNKNOWN
  tx: m->l2_len=14 m->l3_len=20 m->l4_len=32
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_CKSUM PKT_TX_IPV4

test3: no offload
-----------------

Configuration of testpmd:

  ./build/app/testpmd -l 0,1 --log-level 8 -- --total-num-mbufs=16384 \
    -i --port-topology=chained --disable-hw-vlan-filter --disable-hw-vlan-strip

  set fwd csum
  start

Iperf log:

  root@ubuntu1404:~# ip netns exec ns-ixgbe iperf -c 1.1.1.1 -t 10
  ------------------------------------------------------------
  Client connecting to 1.1.1.1, TCP port 5001
  TCP window size: 85.0 KByte (default)
  ------------------------------------------------------------
  [  3] local 1.1.1.2 port 54466 connected with 1.1.1.1 port 5001
  [ ID] Interval       Transfer     Bandwidth
  [  3]  0.0-10.0 sec  4.29 GBytes  3.68 Gbits/sec
  root@ubuntu1404:~# ip netns exec ns-tap iperf -c 1.1.1.2 -t 10
  ------------------------------------------------------------
  Client connecting to 1.1.1.2, TCP port 5001
  TCP window size: 85.0 KByte (default)
  ------------------------------------------------------------
  [  3] local 1.1.1.1 port 58316 connected with 1.1.1.2 port 5001
  [ ID] Interval       Transfer     Bandwidth
  [  3]  0.0-10.0 sec  6.66 GBytes  5.72 Gbits/sec

Example of what we see with "set verbose 1" in testpmd:

  -- flow1: ixgbe2 -> port0 (ixgbe) -> testpmd -> port1 (virtio) <-> tap0
  port=0, mbuf=0x7faf38b3e700, pkt_len=1514, nb_segs=1:
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=32 flags=PKT_RX_L4_CKSUM_UNKNOWN PKT_RX_IP_CKSUM_UNKNOWN
  tx: flags=PKT_TX_L4_NO_CKSUM PKT_TX_IPV4

  -- flow2: tap0 -> port1 (virtio)-> testpmd -> port0 (ixgbe) -> ixgbe2
  port=1, mbuf=0x7faf38b71500, pkt_len=1514, nb_segs=1:
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=32 flags=PKT_RX_L4_CKSUM_UNKNOWN PKT_RX_IP_CKSUM_UNKNOWN
  tx: flags=PKT_TX_L4_NO_CKSUM PKT_TX_IPV4


-- 
2.8.1


* [dpdk-dev] [PATCH 01/12] virtio: move device initialization in a function
  2016-07-21  8:08 [dpdk-dev] [PATCH 00/12] net/virtio: add offload support Olivier Matz
@ 2016-07-21  8:08 ` Olivier Matz
  2016-07-21  8:08 ` [dpdk-dev] [PATCH 02/12] virtio: setup and start cq in configure callback Olivier Matz
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 97+ messages in thread
From: Olivier Matz @ 2016-07-21  8:08 UTC (permalink / raw)
  To: dev, yuanhan.liu, konstantin.ananyev
  Cc: sugesh.chandran, bruce.richardson, jianfeng.tan, helin.zhang,
	adrien.mazarguil

Move all code related to device initialization into a new function
virtio_init_device().

This commit brings no functional change; it prepares the next commits
that will add the offload support. For that, the device will need to be
reinitialized from ethdev->configure(), using this new function.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 drivers/net/virtio/virtio_ethdev.c | 99 ++++++++++++++++++++++----------------
 1 file changed, 58 insertions(+), 41 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
index fcc996e..4926a2c 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -1118,46 +1118,13 @@ rx_func_get(struct rte_eth_dev *eth_dev)
 		eth_dev->rx_pkt_burst = &virtio_recv_pkts;
 }
 
-/*
- * This function is based on probe() function in virtio_pci.c
- * It returns 0 on success.
- */
-int
-eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
+static int
+virtio_init_device(struct rte_eth_dev *eth_dev)
 {
 	struct virtio_hw *hw = eth_dev->data->dev_private;
 	struct virtio_net_config *config;
 	struct virtio_net_config local_config;
-	struct rte_pci_device *pci_dev;
-	uint32_t dev_flags = RTE_ETH_DEV_DETACHABLE;
-	int ret;
-
-	RTE_BUILD_BUG_ON(RTE_PKTMBUF_HEADROOM < sizeof(struct virtio_net_hdr_mrg_rxbuf));
-
-	eth_dev->dev_ops = &virtio_eth_dev_ops;
-	eth_dev->tx_pkt_burst = &virtio_xmit_pkts;
-
-	if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
-		rx_func_get(eth_dev);
-		return 0;
-	}
-
-	/* Allocate memory for storing MAC addresses */
-	eth_dev->data->mac_addrs = rte_zmalloc("virtio", VIRTIO_MAX_MAC_ADDRS * ETHER_ADDR_LEN, 0);
-	if (eth_dev->data->mac_addrs == NULL) {
-		PMD_INIT_LOG(ERR,
-			"Failed to allocate %d bytes needed to store MAC addresses",
-			VIRTIO_MAX_MAC_ADDRS * ETHER_ADDR_LEN);
-		return -ENOMEM;
-	}
-
-	pci_dev = eth_dev->pci_dev;
-
-	if (pci_dev) {
-		ret = vtpci_init(pci_dev, hw, &dev_flags);
-		if (ret)
-			return ret;
-	}
+	struct rte_pci_device *pci_dev = eth_dev->pci_dev;
 
 	/* Reset the device although not necessary at startup */
 	vtpci_reset(hw);
@@ -1172,10 +1139,11 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
 
 	/* If host does not support status then disable LSC */
 	if (!vtpci_with_feature(hw, VIRTIO_NET_F_STATUS))
-		dev_flags &= ~RTE_ETH_DEV_INTR_LSC;
+		eth_dev->data->dev_flags &= ~RTE_ETH_DEV_INTR_LSC;
+	else
+		eth_dev->data->dev_flags |= RTE_ETH_DEV_INTR_LSC;
 
 	rte_eth_copy_pci_info(eth_dev, pci_dev);
-	eth_dev->data->dev_flags = dev_flags;
 
 	rx_func_get(eth_dev);
 
@@ -1254,12 +1222,61 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
 			eth_dev->data->port_id, pci_dev->id.vendor_id,
 			pci_dev->id.device_id);
 
+	virtio_dev_cq_start(eth_dev);
+
+	return 0;
+}
+
+/*
+ * This function is based on probe() function in virtio_pci.c
+ * It returns 0 on success.
+ */
+int
+eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
+{
+	struct virtio_hw *hw = eth_dev->data->dev_private;
+	struct rte_pci_device *pci_dev;
+	uint32_t dev_flags = RTE_ETH_DEV_DETACHABLE;
+	int ret;
+
+	RTE_BUILD_BUG_ON(RTE_PKTMBUF_HEADROOM < sizeof(struct virtio_net_hdr_mrg_rxbuf));
+
+	eth_dev->dev_ops = &virtio_eth_dev_ops;
+	eth_dev->tx_pkt_burst = &virtio_xmit_pkts;
+
+	if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
+		rx_func_get(eth_dev);
+		return 0;
+	}
+
+	/* Allocate memory for storing MAC addresses */
+	eth_dev->data->mac_addrs = rte_zmalloc("virtio", VIRTIO_MAX_MAC_ADDRS * ETHER_ADDR_LEN, 0);
+	if (eth_dev->data->mac_addrs == NULL) {
+		PMD_INIT_LOG(ERR,
+			"Failed to allocate %d bytes needed to store MAC addresses",
+			VIRTIO_MAX_MAC_ADDRS * ETHER_ADDR_LEN);
+		return -ENOMEM;
+	}
+
+	pci_dev = eth_dev->pci_dev;
+
+	if (pci_dev) {
+		ret = vtpci_init(pci_dev, hw, &dev_flags);
+		if (ret)
+			return ret;
+	}
+
+	eth_dev->data->dev_flags = dev_flags;
+
+	/* reset device and negotiate features */
+	ret = virtio_init_device(eth_dev);
+	if (ret < 0)
+		return ret;
+
 	/* Setup interrupt callback  */
 	if (eth_dev->data->dev_flags & RTE_ETH_DEV_INTR_LSC)
 		rte_intr_callback_register(&pci_dev->intr_handle,
-				   virtio_interrupt_handler, eth_dev);
-
-	virtio_dev_cq_start(eth_dev);
+			virtio_interrupt_handler, eth_dev);
 
 	return 0;
 }
-- 
2.8.1


* [dpdk-dev] [PATCH 02/12] virtio: setup and start cq in configure callback
  2016-07-21  8:08 [dpdk-dev] [PATCH 00/12] net/virtio: add offload support Olivier Matz
  2016-07-21  8:08 ` [dpdk-dev] [PATCH 01/12] virtio: move device initialization in a function Olivier Matz
@ 2016-07-21  8:08 ` Olivier Matz
  2016-07-21 21:15   ` Stephen Hemminger
  2016-07-21  8:08 ` [dpdk-dev] [PATCH 03/12] virtio: reinitialize the device " Olivier Matz
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 97+ messages in thread
From: Olivier Matz @ 2016-07-21  8:08 UTC (permalink / raw)
  To: dev, yuanhan.liu, konstantin.ananyev
  Cc: sugesh.chandran, bruce.richardson, jianfeng.tan, helin.zhang,
	adrien.mazarguil

Move the configuration of the control queue into the configure callback.
This is needed by the next commit, which introduces the reinitialization
of the device in the configure callback to change the feature flags.
Therefore, the control queue will have to be restarted at the same
place.

As virtio_dev_cq_queue_setup() is called from a place where
config->max_virtqueue_pairs is not available, we need to store this in
the private structure. It replaces max_rx_queues and max_tx_queues which
have the same value. The log showing the value of max_rx_queues and
max_tx_queues is also removed since config->max_virtqueue_pairs is
already displayed above.
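
For illustration, the derivation of the queue counts from the single stored
value can be sketched as below. This is not code from the patch: the
sketch_* names and the driver_limit parameter are hypothetical stand-ins
for hw->max_queue_pairs and the VIRTIO_MAX_RX_QUEUES/VIRTIO_MAX_TX_QUEUES
constants.

```c
#include <stdint.h>

/*
 * Number of Rx (or Tx) queues reported to the application: the number of
 * queue pairs offered by the device, clamped to the driver limit.
 */
static inline uint16_t
sketch_max_queues(uint32_t max_queue_pairs, uint32_t driver_limit)
{
	return (uint16_t)((driver_limit < max_queue_pairs) ?
		driver_limit : max_queue_pairs);
}

/*
 * Index of the control queue: it comes right after the Rx/Tx queues,
 * that is after max_queue_pairs * 2 data queues.
 */
static inline uint32_t
sketch_ctrl_queue_index(uint32_t max_queue_pairs)
{
	return max_queue_pairs * 2;
}
```

Storing only the number of pairs is enough because both the reported limits
and the control queue index can be recomputed from it where needed.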

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 drivers/net/virtio/virtio_ethdev.c | 45 +++++++++++++++++++-------------------
 drivers/net/virtio/virtio_pci.h    |  3 +--
 2 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
index 4926a2c..eea48ae 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -552,6 +552,9 @@ virtio_dev_close(struct rte_eth_dev *dev)
 	if (hw->started == 1)
 		virtio_dev_stop(dev);
 
+	if (hw->cvq)
+		virtio_dev_queue_release(hw->cvq->vq);
+
 	/* reset the NIC */
 	if (dev->data->dev_flags & RTE_ETH_DEV_INTR_LSC)
 		vtpci_irq_config(hw, VIRTIO_MSI_NO_VECTOR);
@@ -1191,16 +1194,7 @@ virtio_init_device(struct rte_eth_dev *eth_dev)
 			config->max_virtqueue_pairs = 1;
 		}
 
-		hw->max_rx_queues =
-			(VIRTIO_MAX_RX_QUEUES < config->max_virtqueue_pairs) ?
-			VIRTIO_MAX_RX_QUEUES : config->max_virtqueue_pairs;
-		hw->max_tx_queues =
-			(VIRTIO_MAX_TX_QUEUES < config->max_virtqueue_pairs) ?
-			VIRTIO_MAX_TX_QUEUES : config->max_virtqueue_pairs;
-
-		virtio_dev_cq_queue_setup(eth_dev,
-					config->max_virtqueue_pairs * 2,
-					SOCKET_ID_ANY);
+		hw->max_queue_pairs = config->max_virtqueue_pairs;
 
 		PMD_INIT_LOG(DEBUG, "config->max_virtqueue_pairs=%d",
 				config->max_virtqueue_pairs);
@@ -1211,19 +1205,15 @@ virtio_init_device(struct rte_eth_dev *eth_dev)
 				config->mac[2], config->mac[3],
 				config->mac[4], config->mac[5]);
 	} else {
-		hw->max_rx_queues = 1;
-		hw->max_tx_queues = 1;
+		PMD_INIT_LOG(DEBUG, "config->max_virtqueue_pairs=1");
+		hw->max_queue_pairs = 1;
 	}
 
-	PMD_INIT_LOG(DEBUG, "hw->max_rx_queues=%d   hw->max_tx_queues=%d",
-			hw->max_rx_queues, hw->max_tx_queues);
 	if (pci_dev)
 		PMD_INIT_LOG(DEBUG, "port %d vendorID=0x%x deviceID=0x%x",
 			eth_dev->data->port_id, pci_dev->id.vendor_id,
 			pci_dev->id.device_id);
 
-	virtio_dev_cq_start(eth_dev);
-
 	return 0;
 }
 
@@ -1285,7 +1275,6 @@ static int
 eth_virtio_dev_uninit(struct rte_eth_dev *eth_dev)
 {
 	struct rte_pci_device *pci_dev;
-	struct virtio_hw *hw = eth_dev->data->dev_private;
 
 	PMD_INIT_FUNC_TRACE();
 
@@ -1301,9 +1290,6 @@ eth_virtio_dev_uninit(struct rte_eth_dev *eth_dev)
 	eth_dev->tx_pkt_burst = NULL;
 	eth_dev->rx_pkt_burst = NULL;
 
-	if (hw->cvq)
-		virtio_dev_queue_release(hw->cvq->vq);
-
 	rte_free(eth_dev->data->mac_addrs);
 	eth_dev->data->mac_addrs = NULL;
 
@@ -1358,6 +1344,7 @@ virtio_dev_configure(struct rte_eth_dev *dev)
 {
 	const struct rte_eth_rxmode *rxmode = &dev->data->dev_conf.rxmode;
 	struct virtio_hw *hw = dev->data->dev_private;
+	int ret;
 
 	PMD_INIT_LOG(DEBUG, "configure");
 
@@ -1366,6 +1353,16 @@ virtio_dev_configure(struct rte_eth_dev *dev)
 		return -EINVAL;
 	}
 
+	/* Setup and start control queue */
+	if (vtpci_with_feature(hw, VIRTIO_NET_F_CTRL_VQ)) {
+		ret = virtio_dev_cq_queue_setup(dev,
+			hw->max_queue_pairs * 2,
+			SOCKET_ID_ANY);
+		if (ret < 0)
+			return ret;
+		virtio_dev_cq_start(dev);
+	}
+
 	hw->vlan_strip = rxmode->hw_vlan_strip;
 
 	if (rxmode->hw_vlan_filter
@@ -1559,8 +1556,12 @@ virtio_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 		dev_info->driver_name = dev->driver->pci_drv.name;
 	else
 		dev_info->driver_name = "virtio-user PMD";
-	dev_info->max_rx_queues = (uint16_t)hw->max_rx_queues;
-	dev_info->max_tx_queues = (uint16_t)hw->max_tx_queues;
+	dev_info->max_rx_queues = (uint16_t)
+		((VIRTIO_MAX_RX_QUEUES < hw->max_queue_pairs) ?
+			VIRTIO_MAX_RX_QUEUES : hw->max_queue_pairs);
+	dev_info->max_tx_queues = (uint16_t)
+		((VIRTIO_MAX_TX_QUEUES < hw->max_queue_pairs) ?
+			VIRTIO_MAX_TX_QUEUES : hw->max_queue_pairs);
 	dev_info->min_rx_bufsize = VIRTIO_MIN_RX_BUFSIZE;
 	dev_info->max_rx_pktlen = VIRTIO_MAX_RX_PKTLEN;
 	dev_info->max_mac_addrs = VIRTIO_MAX_MAC_ADDRS;
diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h
index dd7693f..552166d 100644
--- a/drivers/net/virtio/virtio_pci.h
+++ b/drivers/net/virtio/virtio_pci.h
@@ -246,8 +246,7 @@ struct virtio_hw {
 	struct virtnet_ctl *cvq;
 	struct rte_pci_ioport io;
 	uint64_t    guest_features;
-	uint32_t    max_tx_queues;
-	uint32_t    max_rx_queues;
+	uint32_t    max_queue_pairs;
 	uint16_t    vtnet_hdr_size;
 	uint8_t	    vlan_strip;
 	uint8_t	    use_msix;
-- 
2.8.1


* [dpdk-dev] [PATCH 03/12] virtio: reinitialize the device in configure callback
  2016-07-21  8:08 [dpdk-dev] [PATCH 00/12] net/virtio: add offload support Olivier Matz
  2016-07-21  8:08 ` [dpdk-dev] [PATCH 01/12] virtio: move device initialization in a function Olivier Matz
  2016-07-21  8:08 ` [dpdk-dev] [PATCH 02/12] virtio: setup and start cq in configure callback Olivier Matz
@ 2016-07-21  8:08 ` Olivier Matz
  2016-07-21  8:08 ` [dpdk-dev] [PATCH 04/12] mbuf: add function to calculate a checksum Olivier Matz
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 97+ messages in thread
From: Olivier Matz @ 2016-07-21  8:08 UTC (permalink / raw)
  To: dev, yuanhan.liu, konstantin.ananyev
  Cc: sugesh.chandran, bruce.richardson, jianfeng.tan, helin.zhang,
	adrien.mazarguil

Add the ability to reset the virtio device in the configure callback
if the requested feature flags changed since the previous reset. Such a
change becomes possible with the introduction of offload support in the
next commits.
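
The logic can be sketched with a simplified model, given here for
illustration only. The structure and the sketch_* functions are
hypothetical; in particular, modelling the negotiation as a bitwise AND
of the requested and host feature sets is an assumption of this sketch.

```c
#include <stdint.h>

/* Simplified, hypothetical stand-in for the driver private data. */
struct sketch_hw {
	uint64_t req_guest_features; /* features requested at last reset */
	uint64_t guest_features;     /* features actually negotiated */
	unsigned int nb_resets;      /* counts resets, for the example only */
};

/* Reset the device and negotiate the requested features. */
static inline int
sketch_init_device(struct sketch_hw *hw, uint64_t req_features,
	uint64_t host_features)
{
	hw->nb_resets++;
	/* assumption: only features known by both sides are kept */
	hw->guest_features = req_features & host_features;
	/* remember what was asked, to detect a later change */
	hw->req_guest_features = req_features;
	return 0;
}

/* Configure callback: reinitialize only if the request changed. */
static inline int
sketch_configure(struct sketch_hw *hw, uint64_t req_features,
	uint64_t host_features)
{
	if (req_features != hw->req_guest_features)
		return sketch_init_device(hw, req_features, host_features);
	return 0;
}
```

Comparing against the requested set rather than the negotiated one avoids
a useless reset when the host simply does not support a requested feature.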

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 drivers/net/virtio/virtio_ethdev.c | 26 +++++++++++++++++++-------
 drivers/net/virtio/virtio_pci.h    |  1 +
 2 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
index eea48ae..02eae94 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -1045,14 +1045,13 @@ virtio_vlan_filter_set(struct rte_eth_dev *dev, uint16_t vlan_id, int on)
 }
 
 static int
-virtio_negotiate_features(struct virtio_hw *hw)
+virtio_negotiate_features(struct virtio_hw *hw, uint64_t req_features)
 {
 	uint64_t host_features;
 
 	/* Prepare guest_features: feature that driver wants to support */
-	hw->guest_features = VIRTIO_PMD_GUEST_FEATURES;
 	PMD_INIT_LOG(DEBUG, "guest_features before negotiate = %" PRIx64,
-		hw->guest_features);
+		req_features);
 
 	/* Read device(host) feature bits */
 	host_features = hw->vtpci_ops->get_features(hw);
@@ -1063,6 +1062,7 @@ virtio_negotiate_features(struct virtio_hw *hw)
 	 * Negotiate features: Subset of device feature bits are written back
 	 * guest feature bits.
 	 */
+	hw->guest_features = req_features;
 	hw->guest_features = vtpci_negotiate_features(hw, host_features);
 	PMD_INIT_LOG(DEBUG, "features after negotiate = %" PRIx64,
 		hw->guest_features);
@@ -1081,6 +1081,8 @@ virtio_negotiate_features(struct virtio_hw *hw)
 		}
 	}
 
+	hw->req_guest_features = req_features;
+
 	return 0;
 }
 
@@ -1121,8 +1123,9 @@ rx_func_get(struct rte_eth_dev *eth_dev)
 		eth_dev->rx_pkt_burst = &virtio_recv_pkts;
 }
 
+/* reset device and renegotiate features if needed */
 static int
-virtio_init_device(struct rte_eth_dev *eth_dev)
+virtio_init_device(struct rte_eth_dev *eth_dev, uint64_t req_features)
 {
 	struct virtio_hw *hw = eth_dev->data->dev_private;
 	struct virtio_net_config *config;
@@ -1137,7 +1140,7 @@ virtio_init_device(struct rte_eth_dev *eth_dev)
 
 	/* Tell the host we've known how to drive the device. */
 	vtpci_set_status(hw, VIRTIO_CONFIG_STATUS_DRIVER);
-	if (virtio_negotiate_features(hw) < 0)
+	if (virtio_negotiate_features(hw, req_features) < 0)
 		return -1;
 
 	/* If host does not support status then disable LSC */
@@ -1258,8 +1261,8 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
 
 	eth_dev->data->dev_flags = dev_flags;
 
-	/* reset device and negotiate features */
-	ret = virtio_init_device(eth_dev);
+	/* reset device and negotiate default features */
+	ret = virtio_init_device(eth_dev, VIRTIO_PMD_GUEST_FEATURES);
 	if (ret < 0)
 		return ret;
 
@@ -1344,6 +1347,7 @@ virtio_dev_configure(struct rte_eth_dev *dev)
 {
 	const struct rte_eth_rxmode *rxmode = &dev->data->dev_conf.rxmode;
 	struct virtio_hw *hw = dev->data->dev_private;
+	uint64_t req_features;
 	int ret;
 
 	PMD_INIT_LOG(DEBUG, "configure");
@@ -1353,6 +1357,14 @@ virtio_dev_configure(struct rte_eth_dev *dev)
 		return -EINVAL;
 	}
 
+	req_features = VIRTIO_PMD_GUEST_FEATURES;
+	/* if request features changed, reinit the device */
+	if (req_features != hw->req_guest_features) {
+		ret = virtio_init_device(dev, req_features);
+		if (ret < 0)
+			return ret;
+	}
+
 	/* Setup and start control queue */
 	if (vtpci_with_feature(hw, VIRTIO_NET_F_CTRL_VQ)) {
 		ret = virtio_dev_cq_queue_setup(dev,
diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h
index 552166d..d1a7d1e 100644
--- a/drivers/net/virtio/virtio_pci.h
+++ b/drivers/net/virtio/virtio_pci.h
@@ -245,6 +245,7 @@ struct virtio_net_config;
 struct virtio_hw {
 	struct virtnet_ctl *cvq;
 	struct rte_pci_ioport io;
+	uint64_t    req_guest_features;
 	uint64_t    guest_features;
 	uint32_t    max_queue_pairs;
 	uint16_t    vtnet_hdr_size;
-- 
2.8.1


* [dpdk-dev] [PATCH 04/12] mbuf: add function to calculate a checksum
  2016-07-21  8:08 [dpdk-dev] [PATCH 00/12] net/virtio: add offload support Olivier Matz
                   ` (2 preceding siblings ...)
  2016-07-21  8:08 ` [dpdk-dev] [PATCH 03/12] virtio: reinitialize the device " Olivier Matz
@ 2016-07-21  8:08 ` Olivier Matz
  2016-07-21 10:51   ` Ananyev, Konstantin
  2016-07-21  8:08 ` [dpdk-dev] [PATCH 05/12] mbuf: add new Rx checksum mbuf flags Olivier Matz
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 97+ messages in thread
From: Olivier Matz @ 2016-07-21  8:08 UTC (permalink / raw)
  To: dev, yuanhan.liu, konstantin.ananyev
  Cc: sugesh.chandran, bruce.richardson, jianfeng.tan, helin.zhang,
	adrien.mazarguil

This function can be used to calculate the checksum of data embedded in
an mbuf, which can be composed of several segments.

This function will be used by the virtio pmd in the next commits to
calculate the checksum in software in case the protocol is not recognized.
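
The handling of several segments can be sketched in standalone C as below.
This is an illustration, not the code of the patch: struct sketch_seg and
the sketch_* helpers are hypothetical, bytes are summed as big-endian
16-bit words, and each partial sum is folded to 16 bits before being
swapped, which is a choice of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal segment chain, standing in for an mbuf chain. */
struct sketch_seg {
	const uint8_t *data;
	size_t len;
	const struct sketch_seg *next;
};

/* Sum big-endian 16-bit words; an odd trailing byte is zero-padded.
 * A 32-bit accumulator is enough for packet-sized buffers. */
static inline uint32_t
sketch_raw_sum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8;
	return sum;
}

/* Fold the carries back into the low 16 bits (one's complement sum). */
static inline uint16_t
sketch_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Raw (non complemented) checksum of all the data in the chain. */
static inline uint16_t
sketch_chain_cksum(const struct sketch_seg *seg)
{
	uint32_t sum = 0;
	size_t done = 0;

	for (; seg != NULL; seg = seg->next) {
		uint16_t part = sketch_fold(sketch_raw_sum(seg->data, seg->len));

		/* a segment starting at an odd offset has its bytes in the
		 * opposite halves of the 16-bit words: swap its partial sum */
		if (done & 1)
			part = (uint16_t)((part << 8) | (part >> 8));
		sum += part;
		done += seg->len;
	}
	return sketch_fold(sum);
}
```

The swap relies on a property of the one's complement sum: swapping the
bytes of the sum gives the sum of the byte-swapped words, so the result
does not depend on where the data is split.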

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 doc/guides/rel_notes/release_16_11.rst |  5 ++++
 lib/librte_mbuf/rte_mbuf.c             | 55 ++++++++++++++++++++++++++++++++--
 lib/librte_mbuf/rte_mbuf.h             | 13 ++++++++
 lib/librte_mbuf/rte_mbuf_version.map   |  1 +
 4 files changed, 72 insertions(+), 2 deletions(-)

diff --git a/doc/guides/rel_notes/release_16_11.rst b/doc/guides/rel_notes/release_16_11.rst
index 6a591e2..da70f3b 100644
--- a/doc/guides/rel_notes/release_16_11.rst
+++ b/doc/guides/rel_notes/release_16_11.rst
@@ -53,6 +53,11 @@ New Features
   Added two new functions ``rte_get_rx_ol_flag_list()`` and
   ``rte_get_tx_ol_flag_list()`` to dump offload flags as a string.
 
+* **Added a function to calculate the checksum of data in an mbuf.**
+
+  Added a new function ``rte_pktmbuf_cksum()`` to process the checksum of
+  data embedded in an mbuf chain.
+
 Resolved Issues
 ---------------
 
diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index 56f37e6..0304245 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -60,6 +60,7 @@
 #include <rte_hexdump.h>
 #include <rte_errno.h>
 #include <rte_memcpy.h>
+#include <rte_ip.h>
 
 /*
  * ctrlmbuf constructor, given as a callback function to
@@ -273,8 +274,7 @@ const void *__rte_pktmbuf_read(const struct rte_mbuf *m, uint32_t off,
 	if (off + len > rte_pktmbuf_pkt_len(m))
 		return NULL;
 
-	while (off >= rte_pktmbuf_data_len(seg) &&
-			rte_pktmbuf_data_len(seg) != 0) {
+	while (off >= rte_pktmbuf_data_len(seg)) {
 		off -= rte_pktmbuf_data_len(seg);
 		seg = seg->next;
 	}
@@ -432,3 +432,54 @@ int rte_get_tx_ol_flag_list(uint64_t mask, char *buf, size_t buflen)
 
 	return 0;
 }
+
+/* compute the raw (non complemented) checksum of a packet */
+uint16_t
+rte_pktmbuf_cksum(const struct rte_mbuf *m, uint32_t off, uint32_t len)
+{
+	const struct rte_mbuf *seg;
+	const char *buf;
+	uint32_t sum, tmp;
+	uint32_t seglen, done;
+
+	/* easy case: all data in the first segment */
+	if (off + len <= rte_pktmbuf_data_len(m))
+		return rte_raw_cksum(rte_pktmbuf_mtod_offset(m,
+				const char *, off), len);
+
+	if (off + len > rte_pktmbuf_pkt_len(m))
+		return 0; /* invalid params, return a dummy value */
+
+	/* else browse the segment to find offset */
+	seglen = 0;
+	for (seg = m; seg != NULL; seg = seg->next) {
+		seglen = rte_pktmbuf_data_len(seg);
+		if (off < seglen)
+			break;
+		off -= seglen;
+	}
+	seglen -= off;
+	buf = rte_pktmbuf_mtod_offset(seg, const char *, off);
+	if (seglen >= len) /* all in one segment */
+		return rte_raw_cksum(buf, len);
+
+	/* hard case: process checksum of several segments */
+	sum = 0;
+	done = 0;
+	for (;;) {
+		tmp = __rte_raw_cksum(buf, seglen, 0);
+		if (done & 1)
+			tmp = rte_bswap16(tmp);
+		sum += tmp;
+		done += seglen;
+		if (done == len)
+			break;
+		seg = seg->next;
+		buf = rte_pktmbuf_mtod(seg, const char *);
+		seglen = rte_pktmbuf_data_len(seg);
+		if (seglen > len - done)
+			seglen = len - done;
+	}
+
+	return __rte_raw_cksum_reduce(sum);
+}
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 3c21c71..7bbe096 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -1581,6 +1581,19 @@ static inline int rte_pktmbuf_chain(struct rte_mbuf *head, struct rte_mbuf *tail
  */
 void rte_pktmbuf_dump(FILE *f, const struct rte_mbuf *m, unsigned dump_len);
 
+/**
+ * Compute the raw (non complemented) checksum of a packet.
+ *
+ * @param m
+ *   The pointer to the mbuf.
+ * @param off
+ *   The offset in bytes to start the checksum.
+ * @param len
+ *   The length in bytes of the data to checksum.
+ */
+uint16_t
+rte_pktmbuf_cksum(const struct rte_mbuf *m, uint32_t off, uint32_t len);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_mbuf/rte_mbuf_version.map b/lib/librte_mbuf/rte_mbuf_version.map
index 6f83745..7b85dad 100644
--- a/lib/librte_mbuf/rte_mbuf_version.map
+++ b/lib/librte_mbuf/rte_mbuf_version.map
@@ -33,6 +33,7 @@ DPDK_16.11 {
 	rte_get_ptype_tunnel_name;
 	rte_get_rx_ol_flag_list;
 	rte_get_tx_ol_flag_list;
+	rte_pktmbuf_cksum;
 	rte_pktmbuf_get_ptype;
 
 } DPDK_2.1;
-- 
2.8.1
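
The multi-segment case of rte_pktmbuf_cksum() in this patch relies on a
property of the one's complement sum: a partial sum computed over a
segment that starts at an odd byte offset can be byte-swapped before it
is accumulated. The standalone sketch below shows that idea without any
DPDK header. `struct seg`, `raw_sum()`, `fold()`, `swap16()` and
`chain_cksum()` are hypothetical names used only here. As a
simplification, the sketch reads bytes as big-endian 16-bit words so the
result does not depend on host endianness, and it always sums whole
segments instead of taking off/len arguments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one mbuf segment (not struct rte_mbuf). */
struct seg {
	const uint8_t *data;
	size_t len;
};

/* Sum big-endian 16-bit words; a trailing odd byte is the high byte
 * of a zero-padded word. */
static uint32_t raw_sum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)buf[i] << 8 | buf[i + 1];
	if (i < len)
		sum += (uint32_t)buf[i] << 8;
	return sum;
}

/* Fold the carries back into the low 16 bits (one's complement add). */
static uint16_t fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

static uint16_t swap16(uint16_t v)
{
	return (uint16_t)(v << 8 | v >> 8);
}

/* Raw (non complemented) checksum of data spread over several segments. */
static uint16_t chain_cksum(const struct seg *segs, size_t nsegs)
{
	uint32_t sum = 0;
	size_t done = 0;

	for (size_t i = 0; i < nsegs; i++) {
		uint16_t part = fold(raw_sum(segs[i].data, segs[i].len));

		/* segment starts at an odd offset: its bytes are shifted
		 * by one position inside the global 16-bit words */
		if (done & 1)
			part = swap16(part);
		sum += part;
		done += segs[i].len;
	}
	return fold(sum);
}
```

Splitting the same bytes at an odd boundary should give the same result
as summing them in one piece, which is the property the patch depends on.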


* [dpdk-dev] [PATCH 05/12] mbuf: add new Rx checksum mbuf flags
  2016-07-21  8:08 [dpdk-dev] [PATCH 00/12] net/virtio: add offload support Olivier Matz
                   ` (3 preceding siblings ...)
  2016-07-21  8:08 ` [dpdk-dev] [PATCH 04/12] mbuf: add function to calculate a checksum Olivier Matz
@ 2016-07-21  8:08 ` Olivier Matz
  2016-07-21 21:22   ` Stephen Hemminger
  2016-07-21  8:08 ` [dpdk-dev] [PATCH 06/12] app/testpmd: fix checksum stats in csum engine Olivier Matz
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 97+ messages in thread
From: Olivier Matz @ 2016-07-21  8:08 UTC (permalink / raw)
  To: dev, yuanhan.liu, konstantin.ananyev
  Cc: sugesh.chandran, bruce.richardson, jianfeng.tan, helin.zhang,
	adrien.mazarguil

Following discussions in [1] and [2], introduce a new bit to
describe the Rx checksum status in mbuf.

Before this patch, only one flag was available:
  PKT_RX_L4_CKSUM_BAD: L4 cksum of RX pkt. is not OK.

And same for L3:
  PKT_RX_IP_CKSUM_BAD: IP cksum of RX pkt. is not OK.

This had 2 issues:
- it was not possible to differentiate "checksum good" from
  "checksum unknown".
- it was not possible for a virtual driver to say "the checksum
  in packet may be wrong, but data integrity is valid".

This patch tries to solve these issues by having 4 states (2 bits)
for the IP and L4 Rx checksums. New values are:

 - PKT_RX_L4_CKSUM_UNKNOWN: no information about the RX L4 checksum
   -> the application should verify the checksum by sw
 - PKT_RX_L4_CKSUM_BAD: the L4 checksum in the packet is wrong
   -> the application can drop the packet without additional check
 - PKT_RX_L4_CKSUM_GOOD: the L4 checksum in the packet is valid
   -> the application can accept the packet without verifying the
      checksum by sw
 - PKT_RX_L4_CKSUM_NONE: the L4 checksum is not correct in the packet
   data, but the integrity of the L4 data is verified.
   -> the application can process the packet but must not verify the
      checksum by sw. It has to take care to recalculate the cksum
      if the packet is transmitted (either by sw or using tx offload)

  And same for L3 (replace L4 by IP in description above).
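
As an illustration of the four states listed above, an application could
dispatch on the two L4 bits as in the sketch below. The macro values are
local copies of the ones proposed in this patch; the `l4_action` enum and
`l4_cksum_action()` are hypothetical names used only for this example and
are not part of the mbuf API.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the values proposed in this patch (not DPDK headers). */
#define PKT_RX_L4_CKSUM_MASK    ((1ULL << 3) | (1ULL << 8))
#define PKT_RX_L4_CKSUM_UNKNOWN 0
#define PKT_RX_L4_CKSUM_BAD     (1ULL << 3)
#define PKT_RX_L4_CKSUM_GOOD    (1ULL << 8)
#define PKT_RX_L4_CKSUM_NONE    ((1ULL << 3) | (1ULL << 8))

/* Hypothetical application-side decisions, for illustration only. */
enum l4_action {
	L4_VERIFY_IN_SW,           /* UNKNOWN: check the checksum by sw */
	L4_DROP,                   /* BAD: drop without additional check */
	L4_ACCEPT,                 /* GOOD: accept without verifying */
	L4_ACCEPT_RECOMPUTE_ON_TX  /* NONE: accept, recompute if forwarded */
};

static enum l4_action l4_cksum_action(uint64_t ol_flags)
{
	/* compare the masked value, do not test a single bit */
	switch (ol_flags & PKT_RX_L4_CKSUM_MASK) {
	case PKT_RX_L4_CKSUM_BAD:
		return L4_DROP;
	case PKT_RX_L4_CKSUM_GOOD:
		return L4_ACCEPT;
	case PKT_RX_L4_CKSUM_NONE:
		return L4_ACCEPT_RECOMPUTE_ON_TX;
	default: /* PKT_RX_L4_CKSUM_UNKNOWN */
		return L4_VERIFY_IN_SW;
	}
}
```

The same pattern would apply to the IP bits by using PKT_RX_IP_CKSUM_MASK
and the PKT_RX_IP_CKSUM_* values instead.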

This commit tries to be compatible with existing applications that
only check the existing flag (CKSUM_BAD).
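
To make the compatibility statement concrete, the sketch below applies
the old single-bit test to each of the four IP states. The values are
local copies of the definitions in this patch, and
`legacy_ip_cksum_is_bad()` is a hypothetical helper standing in for an
application written before this change. With these bit assignments,
UNKNOWN and GOOD both read as "not bad" for such an application, while
NONE shares the BAD bit and is therefore still reported as bad by the
old test.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the values proposed in this patch (not DPDK headers). */
#define PKT_RX_IP_CKSUM_MASK    ((1ULL << 4) | (1ULL << 7))
#define PKT_RX_IP_CKSUM_UNKNOWN 0
#define PKT_RX_IP_CKSUM_BAD     (1ULL << 4)
#define PKT_RX_IP_CKSUM_GOOD    (1ULL << 7)
#define PKT_RX_IP_CKSUM_NONE    ((1ULL << 4) | (1ULL << 7))

/* Hypothetical pre-patch application check: only the BAD bit is tested. */
static int legacy_ip_cksum_is_bad(uint64_t ol_flags)
{
	return (ol_flags & PKT_RX_IP_CKSUM_BAD) != 0;
}

/* Check in the style used by this series: compare the masked value. */
static int ip_cksum_is_bad(uint64_t ol_flags)
{
	return (ol_flags & PKT_RX_IP_CKSUM_MASK) == PKT_RX_IP_CKSUM_BAD;
}
```

The two helpers only disagree for the NONE state, which is the case an
updated application is expected to handle explicitly.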

[1] http://dpdk.org/ml/archives/dev/2016-May/039920.html
[2] http://dpdk.org/ml/archives/dev/2016-June/040007.html

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 doc/guides/rel_notes/release_16_11.rst |  6 ++++
 lib/librte_mbuf/rte_mbuf.c             | 16 +++++++++--
 lib/librte_mbuf/rte_mbuf.h             | 51 ++++++++++++++++++++++++++++++++--
 3 files changed, 68 insertions(+), 5 deletions(-)

diff --git a/doc/guides/rel_notes/release_16_11.rst b/doc/guides/rel_notes/release_16_11.rst
index da70f3b..8f4f24b 100644
--- a/doc/guides/rel_notes/release_16_11.rst
+++ b/doc/guides/rel_notes/release_16_11.rst
@@ -58,6 +58,12 @@ New Features
   Added a new function ``rte_pktmbuf_cksum()`` to process the checksum of
   data embedded in an mbuf chain.
 
+* **Added new Rx checksum mbuf flags.**
+
+  Added new Rx checksum flags in mbufs to describe more states: unknown,
+  good, bad, or not present (useful for virtual drivers). This modification
+  was done for IP and L4.
+
 Resolved Issues
 ---------------
 
diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index 0304245..c40b926 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -309,7 +309,11 @@ const char *rte_get_rx_ol_flag_name(uint64_t mask)
 	case PKT_RX_RSS_HASH: return "PKT_RX_RSS_HASH";
 	case PKT_RX_FDIR: return "PKT_RX_FDIR";
 	case PKT_RX_L4_CKSUM_BAD: return "PKT_RX_L4_CKSUM_BAD";
+	case PKT_RX_L4_CKSUM_GOOD: return "PKT_RX_L4_CKSUM_GOOD";
+	case PKT_RX_L4_CKSUM_NONE: return "PKT_RX_L4_CKSUM_NONE";
 	case PKT_RX_IP_CKSUM_BAD: return "PKT_RX_IP_CKSUM_BAD";
+	case PKT_RX_IP_CKSUM_GOOD: return "PKT_RX_IP_CKSUM_GOOD";
+	case PKT_RX_IP_CKSUM_NONE: return "PKT_RX_IP_CKSUM_NONE";
 	case PKT_RX_EIP_CKSUM_BAD: return "PKT_RX_EIP_CKSUM_BAD";
 	case PKT_RX_VLAN_STRIPPED: return "PKT_RX_VLAN_STRIPPED";
 	case PKT_RX_IEEE1588_PTP: return "PKT_RX_IEEE1588_PTP";
@@ -332,8 +336,16 @@ int rte_get_rx_ol_flag_list(uint64_t mask, char *buf, size_t buflen)
 		{ PKT_RX_VLAN_PKT, PKT_RX_VLAN_PKT, NULL },
 		{ PKT_RX_RSS_HASH, PKT_RX_RSS_HASH, NULL },
 		{ PKT_RX_FDIR, PKT_RX_FDIR, NULL },
-		{ PKT_RX_L4_CKSUM_BAD, PKT_RX_L4_CKSUM_BAD, NULL },
-		{ PKT_RX_IP_CKSUM_BAD, PKT_RX_IP_CKSUM_BAD, NULL },
+		{ PKT_RX_L4_CKSUM_BAD, PKT_RX_L4_CKSUM_MASK, NULL },
+		{ PKT_RX_L4_CKSUM_GOOD, PKT_RX_L4_CKSUM_MASK, NULL },
+		{ PKT_RX_L4_CKSUM_NONE, PKT_RX_L4_CKSUM_MASK, NULL },
+		{ PKT_RX_L4_CKSUM_UNKNOWN, PKT_RX_L4_CKSUM_MASK,
+		  "PKT_RX_L4_CKSUM_UNKNOWN" },
+		{ PKT_RX_IP_CKSUM_BAD, PKT_RX_IP_CKSUM_MASK, NULL },
+		{ PKT_RX_IP_CKSUM_GOOD, PKT_RX_IP_CKSUM_MASK, NULL },
+		{ PKT_RX_IP_CKSUM_NONE, PKT_RX_IP_CKSUM_MASK, NULL },
+		{ PKT_RX_IP_CKSUM_UNKNOWN, PKT_RX_IP_CKSUM_MASK,
+		  "PKT_RX_IP_CKSUM_UNKNOWN" },
 		{ PKT_RX_EIP_CKSUM_BAD, PKT_RX_EIP_CKSUM_BAD, NULL },
 		{ PKT_RX_VLAN_STRIPPED, PKT_RX_VLAN_STRIPPED, NULL },
 		{ PKT_RX_IEEE1588_PTP, PKT_RX_IEEE1588_PTP, NULL },
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 7bbe096..841326d 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -91,8 +91,25 @@ extern "C" {
 
 #define PKT_RX_RSS_HASH      (1ULL << 1)  /**< RX packet with RSS hash result. */
 #define PKT_RX_FDIR          (1ULL << 2)  /**< RX packet with FDIR match indicate. */
-#define PKT_RX_L4_CKSUM_BAD  (1ULL << 3)  /**< L4 cksum of RX pkt. is not OK. */
-#define PKT_RX_IP_CKSUM_BAD  (1ULL << 4)  /**< IP cksum of RX pkt. is not OK. */
+
+/**
+ * Deprecated.
+ * Checking this flag alone is deprecated: check the 2 bits of
+ * PKT_RX_L4_CKSUM_MASK.
+ * This flag was set when the L4 checksum of a packet was detected as
+ * wrong by the hardware.
+ */
+#define PKT_RX_L4_CKSUM_BAD  (1ULL << 3)
+
+/**
+ * Deprecated.
+ * Checking this flag alone is deprecated: check the 2 bits of
+ * PKT_RX_IP_CKSUM_MASK.
+ * This flag was set when the IP checksum of a packet was detected as
+ * wrong by the hardware.
+ */
+#define PKT_RX_IP_CKSUM_BAD  (1ULL << 4)
+
 #define PKT_RX_EIP_CKSUM_BAD (1ULL << 5)  /**< External IP header checksum error. */
 
 /**
@@ -102,7 +119,35 @@ extern "C" {
  */
 #define PKT_RX_VLAN_STRIPPED (1ULL << 6)
 
-/* hole, some bits can be reused here  */
+/**
+ * Mask of bits used to determine the status of RX IP checksum.
+ * - PKT_RX_IP_CKSUM_UNKNOWN: no information about the RX IP checksum
+ * - PKT_RX_IP_CKSUM_BAD: the IP checksum in the packet is wrong
+ * - PKT_RX_IP_CKSUM_GOOD: the IP checksum in the packet is valid
+ * - PKT_RX_IP_CKSUM_NONE: the IP checksum is not correct in the packet
+ *   data, but the integrity of the IP header is verified.
+ */
+#define PKT_RX_IP_CKSUM_MASK ((1ULL << 4) | (1ULL << 7))
+
+#define PKT_RX_IP_CKSUM_UNKNOWN 0
+#define PKT_RX_IP_CKSUM_BAD     (1ULL << 4)
+#define PKT_RX_IP_CKSUM_GOOD    (1ULL << 7)
+#define PKT_RX_IP_CKSUM_NONE    ((1ULL << 4) | (1ULL << 7))
+
+/**
+ * Mask of bits used to determine the status of RX L4 checksum.
+ * - PKT_RX_L4_CKSUM_UNKNOWN: no information about the RX L4 checksum
+ * - PKT_RX_L4_CKSUM_BAD: the L4 checksum in the packet is wrong
+ * - PKT_RX_L4_CKSUM_GOOD: the L4 checksum in the packet is valid
+ * - PKT_RX_L4_CKSUM_NONE: the L4 checksum is not correct in the packet
+ *   data, but the integrity of the L4 data is verified.
+ */
+#define PKT_RX_L4_CKSUM_MASK ((1ULL << 3) | (1ULL << 8))
+
+#define PKT_RX_L4_CKSUM_UNKNOWN 0
+#define PKT_RX_L4_CKSUM_BAD     (1ULL << 3)
+#define PKT_RX_L4_CKSUM_GOOD    (1ULL << 8)
+#define PKT_RX_L4_CKSUM_NONE    ((1ULL << 3) | (1ULL << 8))
 
 #define PKT_RX_IEEE1588_PTP  (1ULL << 9)  /**< RX IEEE1588 L2 Ethernet PT Packet. */
 #define PKT_RX_IEEE1588_TMST (1ULL << 10) /**< RX IEEE1588 L2/L4 timestamped packet.*/
-- 
2.8.1


* [dpdk-dev] [PATCH 06/12] app/testpmd: fix checksum stats in csum engine
  2016-07-21  8:08 [dpdk-dev] [PATCH 00/12] net/virtio: add offload support Olivier Matz
                   ` (4 preceding siblings ...)
  2016-07-21  8:08 ` [dpdk-dev] [PATCH 05/12] mbuf: add new Rx checksum mbuf flags Olivier Matz
@ 2016-07-21  8:08 ` Olivier Matz
  2016-07-21  8:08 ` [dpdk-dev] [PATCH 07/12] mbuf: new flag for LRO Olivier Matz
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 97+ messages in thread
From: Olivier Matz @ 2016-07-21  8:08 UTC (permalink / raw)
  To: dev, yuanhan.liu, konstantin.ananyev
  Cc: sugesh.chandran, bruce.richardson, jianfeng.tan, helin.zhang,
	adrien.mazarguil

---
 app/test-pmd/csumonly.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 4b36d74..34a2591 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -680,8 +680,10 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 		rx_ol_flags = m->ol_flags;
 
 		/* Update the L3/L4 checksum error packet statistics */
-		rx_bad_ip_csum += ((rx_ol_flags & PKT_RX_IP_CKSUM_BAD) != 0);
-		rx_bad_l4_csum += ((rx_ol_flags & PKT_RX_L4_CKSUM_BAD) != 0);
+		if ((rx_ol_flags & PKT_RX_IP_CKSUM_MASK) == PKT_RX_IP_CKSUM_BAD)
+			rx_bad_ip_csum += 1;
+		if ((rx_ol_flags & PKT_RX_L4_CKSUM_MASK) == PKT_RX_L4_CKSUM_BAD)
+			rx_bad_l4_csum += 1;
 
 		/* step 1: dissect packet, parsing optional vlan, ip4/ip6, vxlan
 		 * and inner headers */
-- 
2.8.1


* [dpdk-dev] [PATCH 07/12] mbuf: new flag for LRO
  2016-07-21  8:08 [dpdk-dev] [PATCH 00/12] net/virtio: add offload support Olivier Matz
                   ` (5 preceding siblings ...)
  2016-07-21  8:08 ` [dpdk-dev] [PATCH 06/12] app/testpmd: fix checksum stats in csum engine Olivier Matz
@ 2016-07-21  8:08 ` Olivier Matz
  2016-07-21  8:08 ` [dpdk-dev] [PATCH 08/12] app/testpmd: display lro segment size Olivier Matz
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 97+ messages in thread
From: Olivier Matz @ 2016-07-21  8:08 UTC (permalink / raw)
  To: dev, yuanhan.liu, konstantin.ananyev
  Cc: sugesh.chandran, bruce.richardson, jianfeng.tan, helin.zhang,
	adrien.mazarguil

When receiving coalesced packets in virtio, the original size of the
segments is provided. This is useful information because it allows
resegmenting with the same size.

Add a new RX flag in mbuf that can be set when packets are coalesced by
a hardware or virtual driver. It means that the m->tso_segsz field is
valid and is set to the segment size of the original packets.

This flag is used by the next commits in the virtio pmd.
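
As a rough sketch of how an application might consume this information,
the example below derives how many segments a coalesced payload would be
split into if it were resegmented at the original size. The PKT_RX_LRO
value is a local copy of the one proposed in this patch. `struct rx_meta`,
its `l4_payload_len` field and `resegment_count()` are hypothetical and
only stand in for the relevant mbuf fields.

```c
#include <assert.h>
#include <stdint.h>

/* Local copy of the value proposed in this patch (not DPDK headers). */
#define PKT_RX_LRO (1ULL << 16)

/* Hypothetical stand-in for the mbuf fields involved. */
struct rx_meta {
	uint64_t ol_flags;
	uint16_t tso_segsz;       /* valid only when PKT_RX_LRO is set */
	uint32_t l4_payload_len;  /* assumed to be computed by the caller */
};

/* Number of segments needed to re-emit the payload with the original
 * segment size. Returns 1 when no usable LRO information is present. */
static uint32_t resegment_count(const struct rx_meta *m)
{
	if (!(m->ol_flags & PKT_RX_LRO) || m->tso_segsz == 0)
		return 1;
	if (m->l4_payload_len == 0)
		return 1;
	/* round up: the last segment may be shorter */
	return (m->l4_payload_len + m->tso_segsz - 1) / m->tso_segsz;
}
```

The field must not be read when the flag is clear, since its content is
then unspecified from the receiver's point of view.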

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 doc/guides/rel_notes/release_16_11.rst | 5 +++++
 lib/librte_mbuf/rte_mbuf.c             | 2 ++
 lib/librte_mbuf/rte_mbuf.h             | 7 +++++++
 3 files changed, 14 insertions(+)

diff --git a/doc/guides/rel_notes/release_16_11.rst b/doc/guides/rel_notes/release_16_11.rst
index 8f4f24b..237a5ae 100644
--- a/doc/guides/rel_notes/release_16_11.rst
+++ b/doc/guides/rel_notes/release_16_11.rst
@@ -64,6 +64,11 @@ New Features
   good, bad, or not present (useful for virtual drivers). This modification
   was done for IP and L4.
 
+* **Added a LRO mbuf flag.**
+
+  Added a new RX LRO mbuf flag, used when packets are coalesced. This
+  flag indicates that the segment size of original packets is known.
+
 Resolved Issues
 ---------------
 
diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index c40b926..2df35d0 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -319,6 +319,7 @@ const char *rte_get_rx_ol_flag_name(uint64_t mask)
 	case PKT_RX_IEEE1588_PTP: return "PKT_RX_IEEE1588_PTP";
 	case PKT_RX_IEEE1588_TMST: return "PKT_RX_IEEE1588_TMST";
 	case PKT_RX_QINQ_STRIPPED: return "PKT_RX_QINQ_STRIPPED";
+	case PKT_RX_LRO: return "PKT_RX_LRO";
 	default: return NULL;
 	}
 }
@@ -351,6 +352,7 @@ int rte_get_rx_ol_flag_list(uint64_t mask, char *buf, size_t buflen)
 		{ PKT_RX_IEEE1588_PTP, PKT_RX_IEEE1588_PTP, NULL },
 		{ PKT_RX_IEEE1588_TMST, PKT_RX_IEEE1588_TMST, NULL },
 		{ PKT_RX_QINQ_STRIPPED, PKT_RX_QINQ_STRIPPED, NULL },
+		{ PKT_RX_LRO, PKT_RX_LRO, NULL },
 	};
 	const char *name;
 	unsigned int i;
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 841326d..a45bc02 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -170,6 +170,13 @@ extern "C" {
  */
 #define PKT_RX_QINQ_PKT      PKT_RX_QINQ_STRIPPED
 
+/**
+ * When packets are coalesced by a hardware or virtual driver, this flag
+ * can be set in the RX mbuf, meaning that the m->tso_segsz field is
+ * valid and is set to the segment size of original packets.
+ */
+#define PKT_RX_LRO           (1ULL << 16)
+
 /* add new RX flags here */
 
 /* add new TX flags here */
-- 
2.8.1


* [dpdk-dev] [PATCH 08/12] app/testpmd: display lro segment size
  2016-07-21  8:08 [dpdk-dev] [PATCH 00/12] net/virtio: add offload support Olivier Matz
                   ` (6 preceding siblings ...)
  2016-07-21  8:08 ` [dpdk-dev] [PATCH 07/12] mbuf: new flag for LRO Olivier Matz
@ 2016-07-21  8:08 ` Olivier Matz
  2016-07-21  8:08 ` [dpdk-dev] [PATCH 09/12] virtio: add Rx checksum offload support Olivier Matz
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 97+ messages in thread
From: Olivier Matz @ 2016-07-21  8:08 UTC (permalink / raw)
  To: dev, yuanhan.liu, konstantin.ananyev
  Cc: sugesh.chandran, bruce.richardson, jianfeng.tan, helin.zhang,
	adrien.mazarguil

In the csumonly engine, display the LRO segment size if the
LRO flag is set.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 app/test-pmd/csumonly.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 34a2591..3455a7e 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -793,6 +793,8 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 				"l4_proto=%d l4_len=%d flags=%s\n",
 				info.l2_len, rte_be_to_cpu_16(info.ethertype),
 				info.l3_len, info.l4_proto, info.l4_len, buf);
+			if (rx_ol_flags & PKT_RX_LRO)
+				printf("rx: m->lro_segsz=%u\n", m->tso_segsz);
 			if (info.is_tunnel == 1)
 				printf("rx: outer_l2_len=%d outer_ethertype=%x "
 					"outer_l3_len=%d\n", info.outer_l2_len,
-- 
2.8.1


* [dpdk-dev] [PATCH 09/12] virtio: add Rx checksum offload support
  2016-07-21  8:08 [dpdk-dev] [PATCH 00/12] net/virtio: add offload support Olivier Matz
                   ` (7 preceding siblings ...)
  2016-07-21  8:08 ` [dpdk-dev] [PATCH 08/12] app/testpmd: display lro segment size Olivier Matz
@ 2016-07-21  8:08 ` Olivier Matz
  2016-07-27  9:52   ` Wang, Xiao W
  2016-07-21  8:08 ` [dpdk-dev] [PATCH 10/12] virtio: add Tx " Olivier Matz
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 97+ messages in thread
From: Olivier Matz @ 2016-07-21  8:08 UTC (permalink / raw)
  To: dev, yuanhan.liu, konstantin.ananyev
  Cc: sugesh.chandran, bruce.richardson, jianfeng.tan, helin.zhang,
	adrien.mazarguil

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 drivers/net/virtio/virtio_ethdev.c | 14 ++++----
 drivers/net/virtio/virtio_ethdev.h |  2 +-
 drivers/net/virtio/virtio_rxtx.c   | 66 ++++++++++++++++++++++++++++++++++++++
 drivers/net/virtio/virtqueue.h     |  1 +
 4 files changed, 75 insertions(+), 8 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
index 02eae94..c0f1f21 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -1262,7 +1262,7 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
 	eth_dev->data->dev_flags = dev_flags;
 
 	/* reset device and negotiate default features */
-	ret = virtio_init_device(eth_dev, VIRTIO_PMD_GUEST_FEATURES);
+	ret = virtio_init_device(eth_dev, VIRTIO_PMD_DEFAULT_GUEST_FEATURES);
 	if (ret < 0)
 		return ret;
 
@@ -1351,13 +1351,10 @@ virtio_dev_configure(struct rte_eth_dev *dev)
 	int ret;
 
 	PMD_INIT_LOG(DEBUG, "configure");
+	req_features = VIRTIO_PMD_DEFAULT_GUEST_FEATURES;
+	if (rxmode->hw_ip_checksum)
+		req_features |= (1ULL << VIRTIO_NET_F_GUEST_CSUM);
 
-	if (rxmode->hw_ip_checksum) {
-		PMD_DRV_LOG(ERR, "HW IP checksum not supported");
-		return -EINVAL;
-	}
-
-	req_features = VIRTIO_PMD_GUEST_FEATURES;
 	/* if request features changed, reinit the device */
 	if (req_features != hw->req_guest_features) {
 		ret = virtio_init_device(dev, req_features);
@@ -1580,6 +1577,9 @@ virtio_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	dev_info->default_txconf = (struct rte_eth_txconf) {
 		.txq_flags = ETH_TXQ_FLAGS_NOOFFLOADS
 	};
+	dev_info->rx_offload_capa =
+		DEV_RX_OFFLOAD_TCP_CKSUM |
+		DEV_RX_OFFLOAD_UDP_CKSUM;
 }
 
 /*
diff --git a/drivers/net/virtio/virtio_ethdev.h b/drivers/net/virtio/virtio_ethdev.h
index 2ecec6e..701a22f 100644
--- a/drivers/net/virtio/virtio_ethdev.h
+++ b/drivers/net/virtio/virtio_ethdev.h
@@ -54,7 +54,7 @@
 #define VIRTIO_MAX_RX_PKTLEN  9728
 
 /* Features desired/implemented by this driver. */
-#define VIRTIO_PMD_GUEST_FEATURES		\
+#define VIRTIO_PMD_DEFAULT_GUEST_FEATURES	\
 	(1u << VIRTIO_NET_F_MAC		  |	\
 	 1u << VIRTIO_NET_F_STATUS	  |	\
 	 1u << VIRTIO_NET_F_MQ		  |	\
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 9aba044..a18798f 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -613,6 +613,54 @@ virtio_update_packet_stats(struct virtnet_stats *stats, struct rte_mbuf *mbuf)
 	}
 }
 
+/* Optionally fill offload information in structure */
+static int
+virtio_rx_offload(struct rte_mbuf *m, struct virtio_net_hdr *hdr)
+{
+	struct rte_mbuf_hdr_lens hdr_lens;
+	uint32_t hdrlen, ptype;
+	int l4_supported = 0;
+
+	/* nothing to do */
+	if (hdr->flags == 0 && hdr->gso_type == VIRTIO_NET_HDR_GSO_NONE)
+		return 0;
+
+	m->ol_flags |= PKT_RX_IP_CKSUM_UNKNOWN;
+
+	ptype = rte_pktmbuf_get_ptype(m, &hdr_lens, RTE_PTYPE_ALL_MASK);
+	m->packet_type = ptype;
+	if ((ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_TCP ||
+	    (ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_UDP ||
+	    (ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_SCTP)
+		l4_supported = 1;
+
+	if (hdr->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) {
+		hdrlen = hdr_lens.l2_len + hdr_lens.l3_len + hdr_lens.l4_len;
+		if (hdr->csum_start <= hdrlen && l4_supported) {
+			m->ol_flags |= PKT_RX_L4_CKSUM_NONE;
+		} else {
+			/* Unknown proto or tunnel, do sw cksum. We can assume
+			 * the cksum field is in the first segment since the
+			 * buffers we provided to the host are large enough.
+			 * In case of SCTP, this will be wrong since it's a CRC
+			 * but there's nothing we can do.
+			 */
+			uint16_t csum, off;
+
+			csum = ~rte_pktmbuf_cksum(m, hdr->csum_start,
+				rte_pktmbuf_pkt_len(m) - hdr->csum_start);
+			off = hdr->csum_offset + hdr->csum_start;
+			if (rte_pktmbuf_data_len(m) >= off + 2)
+				*rte_pktmbuf_mtod_offset(m, uint16_t *,
+					off) = csum;
+		}
+	} else if (hdr->flags & VIRTIO_NET_HDR_F_DATA_VALID && l4_supported) {
+		m->ol_flags |= PKT_RX_L4_CKSUM_GOOD;
+	}
+
+	return 0;
+}
+
 #define VIRTIO_MBUF_BURST_SZ 64
 #define DESC_PER_CACHELINE (RTE_CACHE_LINE_SIZE / sizeof(struct vring_desc))
 uint16_t
@@ -628,6 +676,7 @@ virtio_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 	int error;
 	uint32_t i, nb_enqueued;
 	uint32_t hdr_size;
+	struct virtio_net_hdr *hdr;
 
 	nb_used = VIRTQUEUE_NUSED(vq);
 
@@ -669,9 +718,19 @@ virtio_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 		rxm->pkt_len = (uint32_t)(len[i] - hdr_size);
 		rxm->data_len = (uint16_t)(len[i] - hdr_size);
 
+		hdr = (struct virtio_net_hdr *)((char *)rxm->buf_addr +
+			RTE_PKTMBUF_HEADROOM - hdr_size);
+
 		if (hw->vlan_strip)
 			rte_vlan_strip(rxm);
 
+		/* Update offload features */
+		if (virtio_rx_offload(rxm, hdr) < 0) {
+			virtio_discard_rxbuf(vq, rxm);
+			rxvq->stats.errors++;
+			continue;
+		}
+
 		VIRTIO_DUMP_PACKET(rxm, rxm->data_len);
 
 		rx_pkts[nb_rx++] = rxm;
@@ -791,6 +850,13 @@ virtio_recv_mergeable_pkts(void *rx_queue,
 		rx_pkts[nb_rx] = rxm;
 		prev = rxm;
 
+		/* Update offload features */
+		if (virtio_rx_offload(rxm, &header->hdr) < 0) {
+			virtio_discard_rxbuf(vq, rxm);
+			rxvq->stats.errors++;
+			continue;
+		}
+
 		seg_res = seg_num - 1;
 
 		while (seg_res != 0) {
diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h
index c452d04..cff0eef 100644
--- a/drivers/net/virtio/virtqueue.h
+++ b/drivers/net/virtio/virtqueue.h
@@ -223,6 +223,7 @@ struct virtqueue {
  */
 struct virtio_net_hdr {
 #define VIRTIO_NET_HDR_F_NEEDS_CSUM 1    /**< Use csum_start,csum_offset*/
+#define VIRTIO_NET_HDR_F_DATA_VALID 2    /**< Checksum is valid */
 	uint8_t flags;
 #define VIRTIO_NET_HDR_GSO_NONE     0    /**< Not a GSO frame */
 #define VIRTIO_NET_HDR_GSO_TCPV4    1    /**< GSO frame, IPv4 TCP (TSO) */
-- 
2.8.1


* [dpdk-dev] [PATCH 10/12] virtio: add Tx checksum offload support
  2016-07-21  8:08 [dpdk-dev] [PATCH 00/12] net/virtio: add offload support Olivier Matz
                   ` (8 preceding siblings ...)
  2016-07-21  8:08 ` [dpdk-dev] [PATCH 09/12] virtio: add Rx checksum offload support Olivier Matz
@ 2016-07-21  8:08 ` Olivier Matz
  2016-07-21  8:08 ` [dpdk-dev] [PATCH 11/12] virtio: add Lro support Olivier Matz
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 97+ messages in thread
From: Olivier Matz @ 2016-07-21  8:08 UTC (permalink / raw)
  To: dev, yuanhan.liu, konstantin.ananyev
  Cc: sugesh.chandran, bruce.richardson, jianfeng.tan, helin.zhang,
	adrien.mazarguil

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
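
Note on the constants used in the diff below: the csum_offset values 6
and 16 are the positions of the checksum field inside the UDP and TCP
headers. The sketch derives them with offsetof() from header layouts
written out locally. `struct udp_hdr_sketch`, `struct tcp_hdr_sketch`,
`struct csum_req` and `make_csum_req()` are hypothetical names for this
example, and `csum_req` only models the three virtio-net header fields
that the patch fills in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Header layouts written out only to derive the checksum field offsets. */
struct udp_hdr_sketch {
	uint16_t src_port;
	uint16_t dst_port;
	uint16_t dgram_len;
	uint16_t dgram_cksum;
};

struct tcp_hdr_sketch {
	uint16_t src_port;
	uint16_t dst_port;
	uint32_t sent_seq;
	uint32_t recv_ack;
	uint8_t  data_off;
	uint8_t  tcp_flags;
	uint16_t rx_win;
	uint16_t cksum;
	uint16_t tcp_urp;
};

/* Hypothetical model of the virtio-net header fields set on Tx. */
struct csum_req {
	uint8_t  needs_csum;   /* models VIRTIO_NET_HDR_F_NEEDS_CSUM */
	uint16_t csum_start;   /* where the host starts summing */
	uint16_t csum_offset;  /* where the result is stored, from csum_start */
};

enum l4_kind { L4_KIND_NONE, L4_KIND_UDP, L4_KIND_TCP };

static struct csum_req
make_csum_req(enum l4_kind kind, uint16_t l2_len, uint16_t l3_len)
{
	struct csum_req r = { 0, 0, 0 };

	switch (kind) {
	case L4_KIND_UDP:
		r.csum_offset = offsetof(struct udp_hdr_sketch, dgram_cksum);
		break;
	case L4_KIND_TCP:
		r.csum_offset = offsetof(struct tcp_hdr_sketch, cksum);
		break;
	default:
		return r; /* no offload requested: all fields stay 0 */
	}
	r.csum_start = (uint16_t)(l2_len + l3_len);
	r.needs_csum = 1;
	return r;
}
```

For an untagged Ethernet frame carrying IPv4 without options, l2_len is
14 and l3_len is 20, so csum_start would be 34 in both cases.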
 drivers/net/virtio/virtio_ethdev.c |  7 +++++
 drivers/net/virtio/virtio_ethdev.h |  1 +
 drivers/net/virtio/virtio_rxtx.c   | 57 +++++++++++++++++++++++++-------------
 3 files changed, 45 insertions(+), 20 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
index c0f1f21..2443b42 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -1580,6 +1580,13 @@ virtio_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	dev_info->rx_offload_capa =
 		DEV_RX_OFFLOAD_TCP_CKSUM |
 		DEV_RX_OFFLOAD_UDP_CKSUM;
+	dev_info->tx_offload_capa = 0;
+
+	if (hw->guest_features & (1ULL << VIRTIO_NET_F_CSUM)) {
+		dev_info->tx_offload_capa |=
+			DEV_TX_OFFLOAD_UDP_CKSUM |
+			DEV_TX_OFFLOAD_TCP_CKSUM;
+	}
 }
 
 /*
diff --git a/drivers/net/virtio/virtio_ethdev.h b/drivers/net/virtio/virtio_ethdev.h
index 701a22f..0dad91f 100644
--- a/drivers/net/virtio/virtio_ethdev.h
+++ b/drivers/net/virtio/virtio_ethdev.h
@@ -62,6 +62,7 @@
 	 1u << VIRTIO_NET_F_CTRL_VQ	  |	\
 	 1u << VIRTIO_NET_F_CTRL_RX	  |	\
 	 1u << VIRTIO_NET_F_CTRL_VLAN	  |	\
+	 1u << VIRTIO_NET_F_CSUM	  |	\
 	 1u << VIRTIO_NET_F_MRG_RXBUF	  |	\
 	 1ULL << VIRTIO_F_VERSION_1)
 
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index a18798f..063edf7 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -212,13 +212,14 @@ static inline void
 virtqueue_enqueue_xmit(struct virtnet_tx *txvq, struct rte_mbuf *cookie,
 		       uint16_t needed, int use_indirect, int can_push)
 {
+	struct virtio_tx_region *txr = txvq->virtio_net_hdr_mz->addr;
 	struct vq_desc_extra *dxp;
 	struct virtqueue *vq = txvq->vq;
 	struct vring_desc *start_dp;
 	uint16_t seg_num = cookie->nb_segs;
 	uint16_t head_idx, idx;
 	uint16_t head_size = vq->hw->vtnet_hdr_size;
-	unsigned long offs;
+	struct virtio_net_hdr *hdr;
 
 	head_idx = vq->vq_desc_head_idx;
 	idx = head_idx;
@@ -229,10 +230,9 @@ virtqueue_enqueue_xmit(struct virtnet_tx *txvq, struct rte_mbuf *cookie,
 	start_dp = vq->vq_ring.desc;
 
 	if (can_push) {
-		/* put on zero'd transmit header (no offloads) */
-		void *hdr = rte_pktmbuf_prepend(cookie, head_size);
-
-		memset(hdr, 0, head_size);
+		/* prepend cannot fail, checked by caller */
+		hdr = (struct virtio_net_hdr *)
+			rte_pktmbuf_prepend(cookie, head_size);
 	} else if (use_indirect) {
 		/* setup tx ring slot to point to indirect
 		 * descriptor list stored in reserved region.
@@ -240,14 +240,11 @@ virtqueue_enqueue_xmit(struct virtnet_tx *txvq, struct rte_mbuf *cookie,
 		 * the first slot in indirect ring is already preset
 		 * to point to the header in reserved region
 		 */
-		struct virtio_tx_region *txr = txvq->virtio_net_hdr_mz->addr;
-
-		offs = idx * sizeof(struct virtio_tx_region)
-			+ offsetof(struct virtio_tx_region, tx_indir);
-
-		start_dp[idx].addr  = txvq->virtio_net_hdr_mem + offs;
+		start_dp[idx].addr  = txvq->virtio_net_hdr_mem +
+			RTE_PTR_DIFF(&txr[idx].tx_indir, txr);
 		start_dp[idx].len   = (seg_num + 1) * sizeof(struct vring_desc);
 		start_dp[idx].flags = VRING_DESC_F_INDIRECT;
+		hdr = (struct virtio_net_hdr *)&txr[idx].tx_hdr;
 
 		/* loop below will fill in rest of the indirect elements */
 		start_dp = txr[idx].tx_indir;
@@ -256,15 +253,40 @@ virtqueue_enqueue_xmit(struct virtnet_tx *txvq, struct rte_mbuf *cookie,
 		/* setup first tx ring slot to point to header
 		 * stored in reserved region.
 		 */
-		offs = idx * sizeof(struct virtio_tx_region)
-			+ offsetof(struct virtio_tx_region, tx_hdr);
-
-		start_dp[idx].addr  = txvq->virtio_net_hdr_mem + offs;
+		start_dp[idx].addr  = txvq->virtio_net_hdr_mem +
+			RTE_PTR_DIFF(&txr[idx].tx_hdr, txr);
 		start_dp[idx].len   = vq->hw->vtnet_hdr_size;
 		start_dp[idx].flags = VRING_DESC_F_NEXT;
+		hdr = (struct virtio_net_hdr *)&txr[idx].tx_hdr;
+
 		idx = start_dp[idx].next;
 	}
 
+	/* Checksum Offload */
+	switch (cookie->ol_flags & PKT_TX_L4_MASK) {
+	case PKT_TX_UDP_CKSUM:
+		hdr->csum_start = cookie->l2_len + cookie->l3_len;
+		hdr->csum_offset = 6;
+		hdr->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
+		break;
+
+	case PKT_TX_TCP_CKSUM:
+		hdr->csum_start = cookie->l2_len + cookie->l3_len;
+		hdr->csum_offset = 16;
+		hdr->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
+		break;
+
+	default:
+		hdr->csum_start = 0;
+		hdr->csum_offset = 0;
+		hdr->flags = 0;
+		break;
+	}
+
+	hdr->gso_type = 0;
+	hdr->gso_size = 0;
+	hdr->hdr_len = 0;
+
 	do {
 		start_dp[idx].addr  = VIRTIO_MBUF_DATA_DMA_ADDR(cookie, vq);
 		start_dp[idx].len   = cookie->data_len;
@@ -505,11 +527,6 @@ virtio_dev_tx_queue_setup(struct rte_eth_dev *dev,
 
 	PMD_INIT_FUNC_TRACE();
 
-	if ((tx_conf->txq_flags & ETH_TXQ_FLAGS_NOXSUMS)
-	    != ETH_TXQ_FLAGS_NOXSUMS) {
-		PMD_INIT_LOG(ERR, "TX checksum offload not supported\n");
-		return -EINVAL;
-	}
 
 #ifdef RTE_MACHINE_CPUFLAG_SSSE3
 	/* Use simple rx/tx func if single segment and no offloads */
-- 
2.8.1


* [dpdk-dev] [PATCH 11/12] virtio: add Lro support
  2016-07-21  8:08 [dpdk-dev] [PATCH 00/12] net/virtio: add offload support Olivier Matz
                   ` (9 preceding siblings ...)
  2016-07-21  8:08 ` [dpdk-dev] [PATCH 10/12] virtio: add Tx " Olivier Matz
@ 2016-07-21  8:08 ` Olivier Matz
  2016-07-21  8:08 ` [dpdk-dev] [PATCH 12/12] virtio: add Tso support Olivier Matz
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 97+ messages in thread
From: Olivier Matz @ 2016-07-21  8:08 UTC (permalink / raw)
  To: dev, yuanhan.liu, konstantin.ananyev
  Cc: sugesh.chandran, bruce.richardson, jianfeng.tan, helin.zhang,
	adrien.mazarguil

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 drivers/net/virtio/virtio_ethdev.c |  7 ++++++-
 drivers/net/virtio/virtio_ethdev.h |  9 ---------
 drivers/net/virtio/virtio_rxtx.c   | 21 +++++++++++++++++++++
 3 files changed, 27 insertions(+), 10 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
index 2443b42..c9d85f6 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -1354,6 +1354,10 @@ virtio_dev_configure(struct rte_eth_dev *dev)
 	req_features = VIRTIO_PMD_DEFAULT_GUEST_FEATURES;
 	if (rxmode->hw_ip_checksum)
 		req_features |= (1ULL << VIRTIO_NET_F_GUEST_CSUM);
+	if (rxmode->enable_lro)
+		req_features |=
+			(1ULL << VIRTIO_NET_F_GUEST_TSO4) |
+			(1ULL << VIRTIO_NET_F_GUEST_TSO6);
 
 	/* if request features changed, reinit the device */
 	if (req_features != hw->req_guest_features) {
@@ -1579,7 +1583,8 @@ virtio_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	};
 	dev_info->rx_offload_capa =
 		DEV_RX_OFFLOAD_TCP_CKSUM |
-		DEV_RX_OFFLOAD_UDP_CKSUM;
+		DEV_RX_OFFLOAD_UDP_CKSUM |
+		DEV_RX_OFFLOAD_TCP_LRO;
 	dev_info->tx_offload_capa = 0;
 
 	if (hw->guest_features & (1ULL << VIRTIO_NET_F_CSUM)) {
diff --git a/drivers/net/virtio/virtio_ethdev.h b/drivers/net/virtio/virtio_ethdev.h
index 0dad91f..dec71e9 100644
--- a/drivers/net/virtio/virtio_ethdev.h
+++ b/drivers/net/virtio/virtio_ethdev.h
@@ -116,13 +116,4 @@ uint16_t virtio_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 int eth_virtio_dev_init(struct rte_eth_dev *eth_dev);
 
-/*
- * The VIRTIO_NET_F_GUEST_TSO[46] features permit the host to send us
- * frames larger than 1514 bytes. We do not yet support software LRO
- * via tcp_lro_rx().
- */
-#define VTNET_LRO_FEATURES (VIRTIO_NET_F_GUEST_TSO4 | \
-			    VIRTIO_NET_F_GUEST_TSO6 | VIRTIO_NET_F_GUEST_ECN)
-
-
 #endif /* _VIRTIO_ETHDEV_H_ */
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 063edf7..3f3b9d6 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -675,6 +675,27 @@ virtio_rx_offload(struct rte_mbuf *m, struct virtio_net_hdr *hdr)
 		m->ol_flags |= PKT_RX_L4_CKSUM_GOOD;
 	}
 
+	/* GSO request, save required information in mbuf */
+	if (hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE) {
+		/* Check unsupported modes */
+		if ((hdr->gso_type & VIRTIO_NET_HDR_GSO_ECN) ||
+		    (hdr->gso_size == 0)) {
+			return -EINVAL;
+		}
+
+		/* Update mss lengths in mbuf */
+		m->tso_segsz = hdr->gso_size;
+		switch (hdr->gso_type & ~VIRTIO_NET_HDR_GSO_ECN) {
+		case VIRTIO_NET_HDR_GSO_TCPV4:
+		case VIRTIO_NET_HDR_GSO_TCPV6:
+			m->ol_flags |= PKT_RX_LRO |
+				PKT_RX_L4_CKSUM_NONE;
+			break;
+		default:
+			return -EINVAL;
+		}
+	}
+
 	return 0;
 }
 
-- 
2.8.1


* [dpdk-dev] [PATCH 12/12] virtio: add Tso support
  2016-07-21  8:08 [dpdk-dev] [PATCH 00/12] net/virtio: add offload support Olivier Matz
                   ` (10 preceding siblings ...)
  2016-07-21  8:08 ` [dpdk-dev] [PATCH 11/12] virtio: add Lro support Olivier Matz
@ 2016-07-21  8:08 ` Olivier Matz
  2016-10-03  9:00 ` [dpdk-dev] [PATCH v2 00/12] net/virtio: add offload support Olivier Matz
  2016-10-13 14:15 ` [dpdk-dev] [PATCH v3 " Olivier Matz
  13 siblings, 0 replies; 97+ messages in thread
From: Olivier Matz @ 2016-07-21  8:08 UTC (permalink / raw)
  To: dev, yuanhan.liu, konstantin.ananyev
  Cc: sugesh.chandran, bruce.richardson, jianfeng.tan, helin.zhang,
	adrien.mazarguil

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 drivers/net/virtio/virtio_ethdev.c |   6 ++
 drivers/net/virtio/virtio_ethdev.h |   2 +
 drivers/net/virtio/virtio_rxtx.c   | 129 ++++++++++++++++++++++++++++++++++++-
 3 files changed, 134 insertions(+), 3 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
index c9d85f6..106b60d 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -1563,6 +1563,7 @@ virtio_dev_link_update(struct rte_eth_dev *dev, __rte_unused int wait_to_complet
 static void
 virtio_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 {
+	uint64_t tso_mask;
 	struct virtio_hw *hw = dev->data->dev_private;
 
 	if (dev->pci_dev)
@@ -1592,6 +1593,11 @@ virtio_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 			DEV_TX_OFFLOAD_UDP_CKSUM |
 			DEV_TX_OFFLOAD_TCP_CKSUM;
 	}
+
+	tso_mask = (1ULL << VIRTIO_NET_F_HOST_TSO4) |
+		(1ULL << VIRTIO_NET_F_HOST_TSO6);
+	if ((hw->guest_features & tso_mask) == tso_mask)
+		dev_info->tx_offload_capa |= DEV_TX_OFFLOAD_TCP_TSO;
 }
 
 /*
diff --git a/drivers/net/virtio/virtio_ethdev.h b/drivers/net/virtio/virtio_ethdev.h
index dec71e9..b2b7da2 100644
--- a/drivers/net/virtio/virtio_ethdev.h
+++ b/drivers/net/virtio/virtio_ethdev.h
@@ -63,6 +63,8 @@
 	 1u << VIRTIO_NET_F_CTRL_RX	  |	\
 	 1u << VIRTIO_NET_F_CTRL_VLAN	  |	\
 	 1u << VIRTIO_NET_F_CSUM	  |	\
+	 1u << VIRTIO_NET_F_HOST_TSO4	  |	\
+	 1u << VIRTIO_NET_F_HOST_TSO6	  |	\
 	 1u << VIRTIO_NET_F_MRG_RXBUF	  |	\
 	 1ULL << VIRTIO_F_VERSION_1)
 
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 3f3b9d6..e492bcf 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -50,6 +50,8 @@
 #include <rte_string_fns.h>
 #include <rte_errno.h>
 #include <rte_byteorder.h>
+#include <rte_ip.h>
+#include <rte_tcp.h>
 
 #include "virtio_logs.h"
 #include "virtio_ethdev.h"
@@ -208,6 +210,111 @@ virtqueue_enqueue_recv_refill(struct virtqueue *vq, struct rte_mbuf *cookie)
 	return 0;
 }
 
+/* When doing TSO, the IP length is not included in the pseudo header
+ * checksum of the packet given to the PMD, but for virtio it is
+ * expected.
+ */
+static void
+virtio_tso_fix_cksum(struct rte_mbuf *m)
+{
+	/* common case: header is not fragmented */
+	if (likely(rte_pktmbuf_data_len(m) >= m->l2_len + m->l3_len +
+			m->l4_len)) {
+		struct ipv4_hdr *iph;
+		struct ipv6_hdr *ip6h;
+		struct tcp_hdr *th;
+		uint16_t prev_cksum, new_cksum, ip_len, ip_paylen;
+		uint32_t tmp;
+
+		iph = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *, m->l2_len);
+		th = RTE_PTR_ADD(iph, m->l3_len);
+		if ((iph->version_ihl >> 4) == 4) {
+			iph->hdr_checksum = 0;
+			iph->hdr_checksum = rte_ipv4_cksum(iph);
+			ip_len = iph->total_length;
+			ip_paylen = rte_cpu_to_be_16(rte_be_to_cpu_16(ip_len) -
+				m->l3_len);
+		} else {
+			ip6h = (struct ipv6_hdr *)iph;
+			ip_paylen = ip6h->payload_len;
+		}
+
+		/* calculate the new phdr checksum not including ip_paylen */
+		prev_cksum = th->cksum;
+		tmp = prev_cksum;
+		tmp += ip_paylen;
+		tmp = (tmp & 0xffff) + (tmp >> 16);
+		new_cksum = tmp;
+
+		/* replace it in the packet */
+		th->cksum = new_cksum;
+	} else {
+		const struct ipv4_hdr *iph;
+		struct ipv4_hdr iph_copy;
+		union {
+			uint16_t u16;
+			uint8_t u8[2];
+		} prev_cksum, new_cksum, ip_len, ip_paylen, ip_csum;
+		uint32_t tmp;
+
+		/* Same code as above, but we use rte_pktmbuf_read()
+		 * or we read/write in mbuf data one byte at a time to
+		 * avoid issues if the packet is multi segmented.
+		 */
+
+		uint8_t ip_version;
+
+		ip_version = *rte_pktmbuf_mtod_offset(m, uint8_t *,
+			m->l2_len) >> 4;
+
+		/* calculate ip checksum (API imposes to set it to 0)
+		 * and get ip payload len */
+		if (ip_version == 4) {
+			*rte_pktmbuf_mtod_offset(m, uint8_t *,
+				m->l2_len + 10) = 0;
+			*rte_pktmbuf_mtod_offset(m, uint8_t *,
+				m->l2_len + 11) = 0;
+			iph = rte_pktmbuf_read(m, m->l2_len,
+				sizeof(*iph), &iph_copy);
+			ip_csum.u16 = rte_ipv4_cksum(iph);
+			*rte_pktmbuf_mtod_offset(m, uint8_t *,
+				m->l2_len + 10) = ip_csum.u8[0];
+			*rte_pktmbuf_mtod_offset(m, uint8_t *,
+				m->l2_len + 11) = ip_csum.u8[1];
+
+			ip_len.u8[0] = *rte_pktmbuf_mtod_offset(m, uint8_t *,
+				m->l2_len + 2);
+			ip_len.u8[1] = *rte_pktmbuf_mtod_offset(m, uint8_t *,
+				m->l2_len + 3);
+
+			ip_paylen.u16 = rte_cpu_to_be_16(
+				rte_be_to_cpu_16(ip_len.u16) - m->l3_len);
+		} else {
+			ip_paylen.u8[0] = *rte_pktmbuf_mtod_offset(m, uint8_t *,
+				m->l2_len + 4);
+			ip_paylen.u8[1] = *rte_pktmbuf_mtod_offset(m, uint8_t *,
+				m->l2_len + 5);
+		}
+
+		/* calculate the new phdr checksum not including ip_paylen */
+		/* get phdr cksum at offset 16 of TCP header */
+		prev_cksum.u8[0] = *rte_pktmbuf_mtod_offset(m, uint8_t *,
+			m->l2_len + m->l3_len + 16);
+		prev_cksum.u8[1] = *rte_pktmbuf_mtod_offset(m, uint8_t *,
+			m->l2_len + m->l3_len + 17);
+		tmp = prev_cksum.u16;
+		tmp += ip_paylen.u16;
+		tmp = (tmp & 0xffff) + (tmp >> 16);
+		new_cksum.u16 = tmp;
+
+		/* replace it in the packet */
+		*rte_pktmbuf_mtod_offset(m, uint8_t *,
+			m->l2_len + m->l3_len + 16) = new_cksum.u8[0];
+		*rte_pktmbuf_mtod_offset(m, uint8_t *,
+			m->l2_len + m->l3_len + 17) = new_cksum.u8[1];
+	}
+}
+
 static inline void
 virtqueue_enqueue_xmit(struct virtnet_tx *txvq, struct rte_mbuf *cookie,
 		       uint16_t needed, int use_indirect, int can_push)
@@ -263,6 +370,9 @@ virtqueue_enqueue_xmit(struct virtnet_tx *txvq, struct rte_mbuf *cookie,
 	}
 
 	/* Checksum Offload */
+	if (cookie->ol_flags & PKT_TX_TCP_SEG)
+		cookie->ol_flags |= PKT_TX_TCP_CKSUM;
+
 	switch (cookie->ol_flags & PKT_TX_L4_MASK) {
 	case PKT_TX_UDP_CKSUM:
 		hdr->csum_start = cookie->l2_len + cookie->l3_len;
@@ -283,9 +393,22 @@ virtqueue_enqueue_xmit(struct virtnet_tx *txvq, struct rte_mbuf *cookie,
 		break;
 	}
 
-	hdr->gso_type = 0;
-	hdr->gso_size = 0;
-	hdr->hdr_len = 0;
+	/* TCP Segmentation Offload */
+	if (cookie->ol_flags & PKT_TX_TCP_SEG) {
+		virtio_tso_fix_cksum(cookie);
+		hdr->gso_type = (cookie->ol_flags & PKT_TX_IPV6) ?
+			VIRTIO_NET_HDR_GSO_TCPV6 :
+			VIRTIO_NET_HDR_GSO_TCPV4;
+		hdr->gso_size = cookie->tso_segsz;
+		hdr->hdr_len =
+			cookie->l2_len +
+			cookie->l3_len +
+			cookie->l4_len;
+	} else {
+		hdr->gso_type = 0;
+		hdr->gso_size = 0;
+		hdr->hdr_len = 0;
+	}
 
 	do {
 		start_dp[idx].addr  = VIRTIO_MBUF_DATA_DMA_ADDR(cookie, vq);
-- 
2.8.1
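
The arithmetic in virtio_tso_fix_cksum() above is a one's-complement addition of the IP payload length to the pseudo-header checksum already stored in the TCP header. The sketch below only illustrates that arithmetic on plain 16-bit words; sk_csum_add16() and sk_sum_words() are hypothetical names, and the sample words used in the note after it are made up.

```c
#include <stdint.h>

/* One's-complement addition of two 16-bit words: add, then fold the
 * carry back into the low 16 bits (end-around carry). */
static uint16_t
sk_csum_add16(uint16_t a, uint16_t b)
{
	uint32_t tmp = (uint32_t)a + b;

	tmp = (tmp & 0xffff) + (tmp >> 16);
	return (uint16_t)tmp;
}

/* One's-complement sum of n 16-bit words, not complemented. */
static uint16_t
sk_sum_words(const uint16_t *w, unsigned int n)
{
	uint16_t sum = 0;
	unsigned int i;

	for (i = 0; i < n; i++)
		sum = sk_csum_add16(sum, w[i]);
	return sum;
}
```

Because one's-complement addition is commutative and associative, adding the length word afterwards gives the same result as summing a pseudo header that already contains it. For instance, summing the words 0xc0a8, 0x0001, 0xc0a8, 0x0002, 0x0006 and then adding 0x05b4 yields 0x870e, the same value as summing all six words in any order.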

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [dpdk-dev] [PATCH 04/12] mbuf: add function to calculate a checksum
  2016-07-21  8:08 ` [dpdk-dev] [PATCH 04/12] mbuf: add function to calculate a checksum Olivier Matz
@ 2016-07-21 10:51   ` Ananyev, Konstantin
  2016-07-21 16:26     ` Don Provan
  2016-07-22  8:24     ` Olivier Matz
  0 siblings, 2 replies; 97+ messages in thread
From: Ananyev, Konstantin @ 2016-07-21 10:51 UTC (permalink / raw)
  To: Olivier Matz, dev, yuanhan.liu
  Cc: Chandran, Sugesh, Richardson, Bruce, Tan, Jianfeng, adrien.mazarguil

Hi Olivier,

> 
> This function can be used to calculate the checksum of data embedded in mbuf, that can be composed of several segments.
> 
> This function will be used by the virtio pmd in next commits to calculate the checksum in software in case the protocol is not recognized.
> 
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> ---
>  doc/guides/rel_notes/release_16_11.rst |  5 ++++
>  lib/librte_mbuf/rte_mbuf.c             | 55 ++++++++++++++++++++++++++++++++--
>  lib/librte_mbuf/rte_mbuf.h             | 13 ++++++++
>  lib/librte_mbuf/rte_mbuf_version.map   |  1 +
>  4 files changed, 72 insertions(+), 2 deletions(-)
> 
> diff --git a/doc/guides/rel_notes/release_16_11.rst b/doc/guides/rel_notes/release_16_11.rst
> index 6a591e2..da70f3b 100644
> --- a/doc/guides/rel_notes/release_16_11.rst
> +++ b/doc/guides/rel_notes/release_16_11.rst
> @@ -53,6 +53,11 @@ New Features
>    Added two new functions ``rte_get_rx_ol_flag_list()`` and
>    ``rte_get_tx_ol_flag_list()`` to dump offload flags as a string.
> 
> +* **Added a function to calculate the checksum of data in a mbuf.**
> +
> +  Added a new function ``rte_pktmbuf_cksum()`` to process the checksum
> +  of data embedded in an mbuf chain.
> +
>  Resolved Issues
>  ---------------
> 
> diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
> index 56f37e6..0304245 100644
> --- a/lib/librte_mbuf/rte_mbuf.c
> +++ b/lib/librte_mbuf/rte_mbuf.c
> @@ -60,6 +60,7 @@
>  #include <rte_hexdump.h>
>  #include <rte_errno.h>
>  #include <rte_memcpy.h>
> +#include <rte_ip.h>

As a nit, do we need to introduce a dependency for librte_mbuf on librte_net?
Might be better to put this functionality into librte_net?
Konstantin

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [dpdk-dev] [PATCH 04/12] mbuf: add function to calculate a checksum
  2016-07-21 10:51   ` Ananyev, Konstantin
@ 2016-07-21 16:26     ` Don Provan
  2016-07-21 16:46       ` Olivier Matz
  2016-07-22  8:24     ` Olivier Matz
  1 sibling, 1 reply; 97+ messages in thread
From: Don Provan @ 2016-07-21 16:26 UTC (permalink / raw)
  To: Ananyev, Konstantin, Olivier Matz, dev, yuanhan.liu
  Cc: Chandran, Sugesh, Richardson, Bruce, Tan, Jianfeng, adrien.mazarguil

> -----Original Message-----
> From: Ananyev, Konstantin [mailto:konstantin.ananyev@intel.com]
> Sent: Thursday, July 21, 2016 3:51 AM
> Subject: Re: [dpdk-dev] [PATCH 04/12] mbuf: add function to calculate a
> checksum
> 
>...
> > +  Added a new function ``rte_pktmbuf_cksum()`` to process the checksum
> > +  of data embedded in an mbuf chain.
> >...
> > +#include <rte_ip.h>
>
> As a nit, do we need to introduce a dependency for librte_mbuf on librte_net?
> Might be better to put this functionality into librte_net?

That's not a nit at all. This is clearly a net function that has no place in the mbuf code.
That should be obvious even before we notice this circular dependency.
-don
dprovan@bivio.net

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [dpdk-dev] [PATCH 04/12] mbuf: add function to calculate a checksum
  2016-07-21 16:26     ` Don Provan
@ 2016-07-21 16:46       ` Olivier Matz
  0 siblings, 0 replies; 97+ messages in thread
From: Olivier Matz @ 2016-07-21 16:46 UTC (permalink / raw)
  To: Don Provan, Ananyev, Konstantin, dev, yuanhan.liu
  Cc: Chandran, Sugesh, Richardson, Bruce, Tan, Jianfeng, adrien.mazarguil

Dear Don,

On 07/21/2016 06:26 PM, Don Provan wrote:
>> -----Original Message-----
>> From: Ananyev, Konstantin [mailto:konstantin.ananyev@intel.com]
>> Sent: Thursday, July 21, 2016 3:51 AM
>> Subject: Re: [dpdk-dev] [PATCH 04/12] mbuf: add function to calculate a
>> checksum
>>
>> ...
>>> +  Added a new function ``rte_pktmbuf_cksum()`` to process the checksum
>>> +  of data embedded in an mbuf chain.
>>> ...
>>> +#include <rte_ip.h>
>>
>> As a nit, do we need to introduce a dependency for librte_mbuf on librte_net?
>> Might be better to put this functionality into librte_net?
> 
> That's not a nit at all. This is clearly a net function that has no place in the mbuf code.
> That should be obvious even before we notice this circular dependency.


The function is called rte_pktmbuf_cksum(), and takes a mbuf as a
parameter. You cannot haughtily say "it has no place in the mbuf code".

As you can see, librte_net only contains header files. The initial goal
of librte_net was to contain network headers and nothing more.
See:
http://dpdk.org/browse/dpdk/commit/lib/librte_net?id=af75078fece3615088e561357c1e97603e43a5fe

To me, the question of having a dependency in one direction or another
(librte_net needs librte_mbuf, or librte_mbuf needs librte_net) is an
open debate.

I asked myself the same question when working on software packet type
parsing, which needs the definitions of network headers. As packet_type
is a pure mbuf notion, my choice was to put this parser in the mbuf
library and use the network header definitions provided by librte_net.

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [dpdk-dev] [PATCH 02/12] virtio: setup and start cq in configure callback
  2016-07-21  8:08 ` [dpdk-dev] [PATCH 02/12] virtio: setup and start cq in configure callback Olivier Matz
@ 2016-07-21 21:15   ` Stephen Hemminger
  2016-07-22  7:54     ` Olivier Matz
  0 siblings, 1 reply; 97+ messages in thread
From: Stephen Hemminger @ 2016-07-21 21:15 UTC (permalink / raw)
  To: Olivier Matz
  Cc: dev, yuanhan.liu, konstantin.ananyev, sugesh.chandran,
	bruce.richardson, jianfeng.tan, helin.zhang, adrien.mazarguil

On Thu, 21 Jul 2016 10:08:20 +0200
Olivier Matz <olivier.matz@6wind.com> wrote:

> +	dev_info->max_rx_queues = (uint16_t)
> +		((VIRTIO_MAX_RX_QUEUES < hw->max_queue_pairs) ?
> +			VIRTIO_MAX_RX_QUEUES : hw->max_queue_pairs);
> +	dev_info->max_tx_queues = (uint16_t)
> +		((VIRTIO_MAX_TX_QUEUES < hw->max_queue_pairs) ?
> +			VIRTIO_MAX_TX_QUEUES : hw->max_queue_pairs);

cast here was always unnecessary.
Why not use RTE_MIN()
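
A minimal sketch of what this suggestion amounts to, written so that it builds without DPDK: SK_MIN() stands in for RTE_MIN() from rte_common.h, and the queue limit of 128 and struct sk_hw are illustrative assumptions rather than the driver's definitions.

```c
#include <stdint.h>

/* Stand-in for DPDK's RTE_MIN(); note that this simple form evaluates
 * its arguments more than once. */
#define SK_MIN(a, b) ((a) < (b) ? (a) : (b))

/* Illustrative limit, not the value used by the virtio PMD. */
#define SK_MAX_RX_QUEUES 128

/* Hypothetical subset of the device state used by this sketch. */
struct sk_hw {
	uint16_t max_queue_pairs;
};

/* Clamp the number of queues advertised to the application. The
 * result of the macro converts implicitly to the uint16_t return
 * type, so no explicit cast is needed. */
static uint16_t
sk_max_rx_queues(const struct sk_hw *hw)
{
	return SK_MIN(SK_MAX_RX_QUEUES, hw->max_queue_pairs);
}
```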

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [dpdk-dev] [PATCH 05/12] mbuf: add new Rx checksum mbuf flags
  2016-07-21  8:08 ` [dpdk-dev] [PATCH 05/12] mbuf: add new Rx checksum mbuf flags Olivier Matz
@ 2016-07-21 21:22   ` Stephen Hemminger
  2016-07-22  8:03     ` Olivier Matz
  0 siblings, 1 reply; 97+ messages in thread
From: Stephen Hemminger @ 2016-07-21 21:22 UTC (permalink / raw)
  To: Olivier Matz
  Cc: dev, yuanhan.liu, konstantin.ananyev, sugesh.chandran,
	bruce.richardson, jianfeng.tan, helin.zhang, adrien.mazarguil

On Thu, 21 Jul 2016 10:08:23 +0200
Olivier Matz <olivier.matz@6wind.com> wrote:

> +/**
> + * Deprecated.
> + * Checking this flag alone is deprecated: check the 2 bits of
> + * PKT_RX_L4_CKSUM_MASK.
> + * This flag was set when the L4 checksum of a packet was detected as
> + * wrong by the hardware.
> + */
> +#define PKT_RX_L4_CKSUM_BAD  (1ULL << 3)
> +
> +/**
> + * Deprecated.
> + * Checking this flag alone is deprecated: check the 2 bits of
> + * PKT_RX_IP_CKSUM_MASK.
> + * This flag was set when the IP checksum of a packet was detected as
> + * wrong by the hardware.
> + */
> +#define PKT_RX_IP_CKSUM_BAD  (1ULL << 4)

I think you should use the GCC deprecated attribute, not sure how though

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [dpdk-dev] [PATCH 02/12] virtio: setup and start cq in configure callback
  2016-07-21 21:15   ` Stephen Hemminger
@ 2016-07-22  7:54     ` Olivier Matz
  0 siblings, 0 replies; 97+ messages in thread
From: Olivier Matz @ 2016-07-22  7:54 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: dev, yuanhan.liu, konstantin.ananyev, sugesh.chandran,
	bruce.richardson, jianfeng.tan, helin.zhang, adrien.mazarguil



On 07/21/2016 11:15 PM, Stephen Hemminger wrote:
> On Thu, 21 Jul 2016 10:08:20 +0200
> Olivier Matz <olivier.matz@6wind.com> wrote:
> 
>> +	dev_info->max_rx_queues = (uint16_t)
>> +		((VIRTIO_MAX_RX_QUEUES < hw->max_queue_pairs) ?
>> +			VIRTIO_MAX_RX_QUEUES : hw->max_queue_pairs);
>> +	dev_info->max_tx_queues = (uint16_t)
>> +		((VIRTIO_MAX_TX_QUEUES < hw->max_queue_pairs) ?
>> +			VIRTIO_MAX_TX_QUEUES : hw->max_queue_pairs);
> 
> cast here was always unnecessary.
> Why not use RTE_MIN()
> 

Yes, good idea, I'll do that for the v2.

Thanks,
Olivier

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [dpdk-dev] [PATCH 05/12] mbuf: add new Rx checksum mbuf flags
  2016-07-21 21:22   ` Stephen Hemminger
@ 2016-07-22  8:03     ` Olivier Matz
  0 siblings, 0 replies; 97+ messages in thread
From: Olivier Matz @ 2016-07-22  8:03 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: dev, yuanhan.liu, konstantin.ananyev, sugesh.chandran,
	bruce.richardson, jianfeng.tan, helin.zhang, adrien.mazarguil

Hi Stephen,

On 07/21/2016 11:22 PM, Stephen Hemminger wrote:
> On Thu, 21 Jul 2016 10:08:23 +0200
> Olivier Matz <olivier.matz@6wind.com> wrote:
> 
>> +/**
>> + * Deprecated.
>> + * Checking this flag alone is deprecated: check the 2 bits of
>> + * PKT_RX_L4_CKSUM_MASK.
>> + * This flag was set when the L4 checksum of a packet was detected as
>> + * wrong by the hardware.
>> + */
>> +#define PKT_RX_L4_CKSUM_BAD  (1ULL << 3)
>> +
>> +/**
>> + * Deprecated.
>> + * Checking this flag alone is deprecated: check the 2 bits of
>> + * PKT_RX_IP_CKSUM_MASK.
>> + * This flag was set when the IP checksum of a packet was detected as
>> + * wrong by the hardware.
>> + */
>> +#define PKT_RX_IP_CKSUM_BAD  (1ULL << 4)
> 
> I think you should use the GCC deprecated attribute, not sure how though
> 

The reason why I did not use macro poisoning here is that this flag
is still valid when used with the mask. Actually, checking this flag
alone still works and does the same as before, but I wanted to highlight
that it should now be used with the mask.
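
To illustrate what "used with the mask" means, here is a minimal sketch of an application-side test on the two L4 checksum bits. The encoding is an assumption made for this example: the BAD bit is the one quoted above (1ULL << 3), while the position of the GOOD bit, the "both bits set means NONE" convention and all SK_* names are hypothetical.

```c
#include <stdint.h>

/* Assumed 2-bit encoding of the L4 checksum status */
#define SK_RX_L4_CKSUM_BAD     (1ULL << 3)
#define SK_RX_L4_CKSUM_GOOD    (1ULL << 8)	/* bit position assumed */
#define SK_RX_L4_CKSUM_MASK    (SK_RX_L4_CKSUM_BAD | SK_RX_L4_CKSUM_GOOD)
#define SK_RX_L4_CKSUM_UNKNOWN 0ULL
#define SK_RX_L4_CKSUM_NONE    SK_RX_L4_CKSUM_MASK

enum sk_cksum_state {
	SK_CKSUM_UNKNOWN,	/* no information, check in software */
	SK_CKSUM_BAD,		/* checksum verified and wrong */
	SK_CKSUM_GOOD,		/* checksum verified and correct */
	SK_CKSUM_NONE,		/* checksum field not filled in the packet */
};

/* Decode the status from the offload flags by comparing both bits
 * against each value, instead of testing the BAD bit alone. */
static enum sk_cksum_state
sk_l4_cksum_state(uint64_t ol_flags)
{
	switch (ol_flags & SK_RX_L4_CKSUM_MASK) {
	case SK_RX_L4_CKSUM_BAD:
		return SK_CKSUM_BAD;
	case SK_RX_L4_CKSUM_GOOD:
		return SK_CKSUM_GOOD;
	case SK_RX_L4_CKSUM_NONE:
		return SK_CKSUM_NONE;
	default:
		return SK_CKSUM_UNKNOWN;
	}
}
```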

Your comment makes me think that maybe the new flags could have
different names, to avoid keeping old-style tests on this flag. On the
other hand, I think the current name is already the right one, and
renaming would break the API and affect large pieces of code in dpdk.

Opinions are welcome here :)

Thanks for commenting
Olivier

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [dpdk-dev] [PATCH 04/12] mbuf: add function to calculate a checksum
  2016-07-21 10:51   ` Ananyev, Konstantin
  2016-07-21 16:26     ` Don Provan
@ 2016-07-22  8:24     ` Olivier Matz
  2016-08-29 14:52       ` Olivier Matz
  1 sibling, 1 reply; 97+ messages in thread
From: Olivier Matz @ 2016-07-22  8:24 UTC (permalink / raw)
  To: Ananyev, Konstantin, dev, yuanhan.liu
  Cc: Chandran, Sugesh, Richardson, Bruce, Tan, Jianfeng, adrien.mazarguil

Hi Konstantin,

On 07/21/2016 12:51 PM, Ananyev, Konstantin wrote:
> Hi Olivier,
> 
>>
>> This function can be used to calculate the checksum of data embedded in mbuf, that can be composed of several segments.
>>
>> This function will be used by the virtio pmd in next commits to calculate the checksum in software in case the protocol is not recognized.
>>
>> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
>> ---
>>  doc/guides/rel_notes/release_16_11.rst |  5 ++++
>>  lib/librte_mbuf/rte_mbuf.c             | 55 ++++++++++++++++++++++++++++++++--
>>  lib/librte_mbuf/rte_mbuf.h             | 13 ++++++++
>>  lib/librte_mbuf/rte_mbuf_version.map   |  1 +
>>  4 files changed, 72 insertions(+), 2 deletions(-)
>>
>> diff --git a/doc/guides/rel_notes/release_16_11.rst b/doc/guides/rel_notes/release_16_11.rst
>> index 6a591e2..da70f3b 100644
>> --- a/doc/guides/rel_notes/release_16_11.rst
>> +++ b/doc/guides/rel_notes/release_16_11.rst
>> @@ -53,6 +53,11 @@ New Features
>>    Added two new functions ``rte_get_rx_ol_flag_list()`` and
>>    ``rte_get_tx_ol_flag_list()`` to dump offload flags as a string.
>>
>> +* **Added a function to calculate the checksum of data in a mbuf.**
>> +
>> +  Added a new function ``rte_pktmbuf_cksum()`` to process the checksum
>> +  of data embedded in an mbuf chain.
>> +
>>  Resolved Issues
>>  ---------------
>>
>> diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
>> index 56f37e6..0304245 100644
>> --- a/lib/librte_mbuf/rte_mbuf.c
>> +++ b/lib/librte_mbuf/rte_mbuf.c
>> @@ -60,6 +60,7 @@
>>  #include <rte_hexdump.h>
>>  #include <rte_errno.h>
>>  #include <rte_memcpy.h>
>> +#include <rte_ip.h>
> 
> As a nit, do we need to introduce a dependency for librte_mbuf on librte_net?
> Might be better to put this functionality into librte_net?

I tried to have this code in librte_net, also when working on the
software packet type parser, and it did not really convince me, mainly
because librte_net is just header files as of today (it's not a real
library). But I can give it a try and post a patch so we can compare,
probably not in the coming days, but I keep a note on it.

Also, as I answered Don, it would make less sense to move the software
packet type parser into librte_net, since it's not a network feature but
more a dpdk mbuf feature. But the software packet type parser needs the
network header definitions... so the cat is eating its tail ;)

Regards,
Olivier

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [dpdk-dev] [PATCH 09/12] virtio: add Rx checksum offload support
  2016-07-21  8:08 ` [dpdk-dev] [PATCH 09/12] virtio: add Rx checksum offload support Olivier Matz
@ 2016-07-27  9:52   ` Wang, Xiao W
  0 siblings, 0 replies; 97+ messages in thread
From: Wang, Xiao W @ 2016-07-27  9:52 UTC (permalink / raw)
  To: Olivier Matz, dev, yuanhan.liu, Ananyev, Konstantin
  Cc: Chandran, Sugesh, Richardson, Bruce, Tan, Jianfeng, Zhang, Helin,
	adrien.mazarguil



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier Matz
> Sent: Thursday, July 21, 2016 4:08 PM
> To: dev@dpdk.org; yuanhan.liu@linux.intel.com; Ananyev, Konstantin
> <konstantin.ananyev@intel.com>
> Cc: Chandran, Sugesh <sugesh.chandran@intel.com>; Richardson, Bruce
> <bruce.richardson@intel.com>; Tan, Jianfeng <jianfeng.tan@intel.com>; Zhang,
> Helin <helin.zhang@intel.com>; adrien.mazarguil@6wind.com
> Subject: [dpdk-dev] [PATCH 09/12] virtio: add Rx checksum offload support
> 
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> ---
>  drivers/net/virtio/virtio_ethdev.c | 14 ++++----
>  drivers/net/virtio/virtio_ethdev.h |  2 +-
>  drivers/net/virtio/virtio_rxtx.c   | 66 ++++++++++++++++++++++++++++++++++++++
>  drivers/net/virtio/virtqueue.h     |  1 +
>  4 files changed, 75 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
> index 02eae94..c0f1f21 100644
> --- a/drivers/net/virtio/virtio_ethdev.c
> +++ b/drivers/net/virtio/virtio_ethdev.c
> @@ -1262,7 +1262,7 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
>  	eth_dev->data->dev_flags = dev_flags;
> 
>  	/* reset device and negotiate default features */
> -	ret = virtio_init_device(eth_dev, VIRTIO_PMD_GUEST_FEATURES);
> +	ret = virtio_init_device(eth_dev, VIRTIO_PMD_DEFAULT_GUEST_FEATURES);
>  	if (ret < 0)
>  		return ret;
> 
> @@ -1351,13 +1351,10 @@ virtio_dev_configure(struct rte_eth_dev *dev)
>  	int ret;
> 
>  	PMD_INIT_LOG(DEBUG, "configure");
> +	req_features = VIRTIO_PMD_DEFAULT_GUEST_FEATURES;
> +	if (rxmode->hw_ip_checksum)
> +		req_features |= (1ULL << VIRTIO_NET_F_GUEST_CSUM);
> 

....

> diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
> index 9aba044..a18798f 100644
> --- a/drivers/net/virtio/virtio_rxtx.c
> +++ b/drivers/net/virtio/virtio_rxtx.c
> @@ -613,6 +613,54 @@ virtio_update_packet_stats(struct virtnet_stats *stats, struct rte_mbuf *mbuf)
>  	}
>  }
> 
> +/* Optionally fill offload information in structure */
> +static int
> +virtio_rx_offload(struct rte_mbuf *m, struct virtio_net_hdr *hdr)
> +{
> +	struct rte_mbuf_hdr_lens hdr_lens;
> +	uint32_t hdrlen, ptype;
> +	int l4_supported = 0;
> +
> +	/* nothing to do */
> +	if (hdr->flags == 0 && hdr->gso_type == VIRTIO_NET_HDR_GSO_NONE)
> +		return 0;
> +
> +	m->ol_flags |= PKT_RX_IP_CKSUM_UNKNOWN;
> +
> +	ptype = rte_pktmbuf_get_ptype(m, &hdr_lens, RTE_PTYPE_ALL_MASK);
> +	m->packet_type = ptype;
> +	if ((ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_TCP ||
> +	    (ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_UDP ||
> +	    (ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_SCTP)
> +		l4_supported = 1;
> +
> +	if (hdr->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) {
> +		hdrlen = hdr_lens.l2_len + hdr_lens.l3_len + hdr_lens.l4_len;
> +		if (hdr->csum_start <= hdrlen && l4_supported) {
> +			m->ol_flags |= PKT_RX_L4_CKSUM_NONE;
> +		} else {
> +			/* Unknown proto or tunnel, do sw cksum. We can assume
> +			 * the cksum field is in the first segment since the
> +			 * buffers we provided to the host are large enough.
> +			 * In case of SCTP, this will be wrong since it's a CRC
> +			 * but there's nothing we can do.
> +			 */
> +			uint16_t csum, off;
> +
> +			csum = ~rte_pktmbuf_cksum(m, hdr->csum_start,
> +				rte_pktmbuf_pkt_len(m) - hdr->csum_start);

1. When translating raw_cksum to the final cksum, it should be like "(cksum == 0xffff) ? cksum : ~cksum".
2. How about making this function inline as it's called in fast path?
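
A minimal, self-contained sketch of point 1, under the assumption that the concern is protocols where a transmitted checksum of 0 means "no checksum": when the folded raw sum is 0xffff, plain complementing would give 0, so the value is kept as 0xffff instead. sk_raw_cksum() and sk_final_cksum() are hypothetical names and do not refer to the DPDK helpers.

```c
#include <stddef.h>
#include <stdint.h>

/* One's-complement sum of a byte buffer, folded to 16 bits and not
 * complemented. Bytes are paired as big-endian 16-bit words; a trailing
 * odd byte is padded with a zero low byte. */
static uint16_t
sk_raw_cksum(const uint8_t *buf, size_t len)
{
	uint64_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8;

	/* fold the carries back until the sum fits in 16 bits */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Final checksum: the complement of the raw sum, except that a raw sum
 * of 0xffff is kept as is so that the result is never 0. */
static uint16_t
sk_final_cksum(uint16_t raw)
{
	return (raw == 0xffff) ? raw : (uint16_t)~raw;
}
```

As a worked case, the three bytes 0x12 0x34 0x56 sum to 0x1234 + 0x5600 = 0x6834, so the final checksum is 0x97cb.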

Best Regards,
Xiao

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [dpdk-dev] [PATCH 04/12] mbuf: add function to calculate a checksum
  2016-07-22  8:24     ` Olivier Matz
@ 2016-08-29 14:52       ` Olivier Matz
  0 siblings, 0 replies; 97+ messages in thread
From: Olivier Matz @ 2016-08-29 14:52 UTC (permalink / raw)
  To: Ananyev, Konstantin, dev, yuanhan.liu
  Cc: Chandran, Sugesh, Richardson, Bruce, Tan, Jianfeng,
	adrien.mazarguil, dprovan

Hi guys,

On 07/22/2016 10:24 AM, Olivier Matz wrote:
>>> diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
>>> index 56f37e6..0304245 100644
>>> --- a/lib/librte_mbuf/rte_mbuf.c
>>> +++ b/lib/librte_mbuf/rte_mbuf.c
>>> @@ -60,6 +60,7 @@
>>>  #include <rte_hexdump.h>
>>>  #include <rte_errno.h>
>>>  #include <rte_memcpy.h>
>>> +#include <rte_ip.h>
>>
>> As a nit, do we need to introduce a dependency for librte_mbuf on librte_net?
>> Might be better to put this functionality into librte_net?
> 
> I tried to have this code in librte_net, also when working on the
> software packet type parser, and it did not really convince me, mainly
> because librte_net is just header files as of today (it's not a real
> library). But I can give it a try and post a patch so we can compare,
> probably not in the coming days, but I keep a note on it.

Back on this. I've just submitted a v2 for the software packet type
patchset:
http://dpdk.org/ml/archives/dev/2016-August/045876.html

As promised, I did the exercise of moving the software packet type
parser into librte_net. I did it on the sw ptype patchset, but it would be
almost the same for the mbuf checksum calculation function of this patchset.

The most notable differences between v1 and v2 are:

v1:
- rte_pktmbuf_get_ptype() is in librte_mbuf
- only headers in librte_net
- librte_mbuf depends on headers in librte_net

v2:
- rte_net_get_ptype() is in librte_net
- librte_net is now a real library (.so or .a)
- librte_net depends on librte_mbuf

Please let me know if you have any comments. Depending on them, I'll
adapt the v2 of this patchset too.


Olivier

^ permalink raw reply	[flat|nested] 97+ messages in thread

* [dpdk-dev] [PATCH v2 00/12] net/virtio: add offload support
  2016-07-21  8:08 [dpdk-dev] [PATCH 00/12] net/virtio: add offload support Olivier Matz
                   ` (11 preceding siblings ...)
  2016-07-21  8:08 ` [dpdk-dev] [PATCH 12/12] virtio: add Tso support Olivier Matz
@ 2016-10-03  9:00 ` Olivier Matz
  2016-10-03  9:00   ` [dpdk-dev] [PATCH v2 01/12] virtio: move device initialization in a function Olivier Matz
                     ` (12 more replies)
  2016-10-13 14:15 ` [dpdk-dev] [PATCH v3 " Olivier Matz
  13 siblings, 13 replies; 97+ messages in thread
From: Olivier Matz @ 2016-10-03  9:00 UTC (permalink / raw)
  To: dev, yuanhan.liu
  Cc: konstantin.ananyev, sugesh.chandran, bruce.richardson,
	jianfeng.tan, helin.zhang, adrien.mazarguil, stephen, dprovan,
	xiao.w.wang

This patchset, targeted for 16.11, adds rx and tx offload support to
the virtio pmd.  To achieve this, some new mbuf flags must be
introduced, as discussed in [1].

It applies on top of:
- software packet type [2]
- testpmd enhancements [3]

The new mbuf checksum flags are backward compatible for current
applications that assume that unknown_csum = good_csum (since there
was only a bad_csum flag). But if the patchset is integrated, we
should consider updating the PMDs to match the new API for 16.11.

[1] http://dpdk.org/ml/archives/dev/2016-May/039920.html
[2] http://dpdk.org/ml/archives/dev/2016-October/048073.html
[3] http://dpdk.org/ml/archives/dev/2016-September/046443.html

changes v1 -> v2
- make mbuf checksum calculation static inline
- fix checksum calculation for protocol where csum=0 means no csum
- move mbuf checksum calculation in librte_net
- use RTE_MIN() to set max rx/tx queue
- rebase on top of head

Olivier Matz (12):
  virtio: move device initialization in a function
  virtio: setup and start cq in configure callback
  virtio: reinitialize the device in configure callback
  net: add function to calculate a checksum in a mbuf
  mbuf: add new Rx checksum mbuf flags
  app/testpmd: fix checksum stats in csum engine
  mbuf: new flag for LRO
  app/testpmd: display lro segment size
  virtio: add Rx checksum offload support
  virtio: add Tx checksum offload support
  virtio: add Lro support
  virtio: add Tso support

 app/test-pmd/csumonly.c                |   8 +-
 doc/guides/rel_notes/release_16_11.rst |  16 ++
 drivers/net/virtio/virtio_ethdev.c     | 182 +++++++++++++---------
 drivers/net/virtio/virtio_ethdev.h     |  18 +--
 drivers/net/virtio/virtio_pci.h        |   4 +-
 drivers/net/virtio/virtio_rxtx.c       | 270 ++++++++++++++++++++++++++++++---
 drivers/net/virtio/virtqueue.h         |   1 +
 lib/librte_mbuf/rte_mbuf.c             |  18 ++-
 lib/librte_mbuf/rte_mbuf.h             |  58 ++++++-
 lib/librte_net/rte_ip.h                |  60 ++++++++
 10 files changed, 526 insertions(+), 109 deletions(-)

Test plan
=========

(not fully replayed on v2, but no major change)

Platform description
--------------------

  guest (dpdk)
  +----------------+
  |                |
  |                |
  |         port0  +-----<---+
  |       ixgbe /  |         |
  |       directio |         |
  |                |         |
  |    port1       |         ^ flow1
  +----------------+         | (flow2 is the reverse)
         |                   |
         | virtio            |
         v                   |
  +----------------+         |
  |     tap0   /   |         |
  |1.1.1.1   /     |         |
  |ns-tap  /       |         |
  |      /         |         |
  |    /   ixgbe2  +------>--+
  |  /    1.1.1.2  |
  |/      ns-ixgbe |
  +----------------+
  host (linux, vhost-net)


flow1:
  host -(ixgbe)-> guest -(virtio)-> host
  1.1.1.2 -> 1.1.1.1

flow2:
  host -(virtio)-> guest -(ixgbe)-> host
  1.1.1.1 -> 1.1.1.2

Host configuration
------------------

Start qemu with:

- a ne2k management interface to avoid any conflict with dpdk
- 2 ixgbe interfaces given to the vm through vfio
- a virtio net device, connected to a tap interface through vhost-net

  /usr/bin/qemu-system-x86_64 -k fr -daemonize --enable-kvm -m 1G -cpu host \
    -smp 3 -serial telnet::40564,server,nowait -serial null \
    -qmp tcp::44340,server,nowait -monitor telnet::49229,server,nowait \
    -device ne2k_pci,mac=de:ad:de:01:02:03,netdev=user.0,addr=03 \
    -netdev user,id=user.0,hostfwd=tcp::34965-:22 \
    -device vfio-pci,host=0000:04:00.0 -device vfio-pci,host=0000:04:00.1 \
    -netdev type=tap,id=vhostnet0,script=no,vhost=on,queues=8 \
    -device virtio-net-pci,netdev=vhostnet0,ioeventfd=on,mq=on,vectors=17 \
    -hda "/path/to/ubuntu-14.04-template.qcow2" \
    -snapshot -vga none -display none

Move the tap interface in a netns, and configure it:

  ip netns add ns-tap
  ip netns exec ns-tap ip l set lo up
  ip link set tap0 netns ns-tap
  ip netns exec ns-tap ip l set tap0 down
  ip netns exec ns-tap ip l set addr 02:00:00:00:00:01 dev tap0
  ip netns exec ns-tap ip l set tap0 up
  ip netns exec ns-tap ip a a 1.1.1.1/24 dev tap0
  ip netns exec ns-tap arp -s 1.1.1.2 02:00:00:00:00:00
  ip netns exec ns-tap ip a

Move the ixgbe interface into a netns, and configure it:

  IXGBE=ixgbe2
  ip netns add ns-ixgbe
  ip netns exec ns-ixgbe ip l set lo up
  ip link set ${IXGBE} netns ns-ixgbe
  ip netns exec ns-ixgbe ip l set ${IXGBE} down
  ip netns exec ns-ixgbe ip l set addr 02:00:00:00:00:00 dev ${IXGBE}
  ip netns exec ns-ixgbe ip l set ${IXGBE} up
  ip netns exec ns-ixgbe ip a a 1.1.1.2/24 dev ${IXGBE}
  ip netns exec ns-ixgbe arp -s 1.1.1.1 02:00:00:00:00:01
  ip netns exec ns-ixgbe ip a

Guest configuration
-------------------

List of pci devices:

  00:02.0 Ethernet controller [0200]: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection [8086:10fb] (rev 01)
  00:03.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL-8029(AS) [10ec:8029]
  00:04.0 Ethernet controller [0200]: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection [8086:10fb] (rev 01)
  00:05.0 Ethernet controller [0200]: Red Hat, Inc Virtio network device [1af4:1000]

Compile dpdk:

  cd dpdk.org
  make config T=x86_64-native-linuxapp-gcc
  make -j4

Prepare environment:

  mkdir -p /mnt/huge
  mount -t hugetlbfs nodev /mnt/huge
  echo 256 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
  modprobe uio_pci_generic
  python tools/dpdk_nic_bind.py -b uio_pci_generic 0000:00:02.0
  python tools/dpdk_nic_bind.py -b uio_pci_generic 0000:00:05.0

Run test
========

The test uses iperf to validate connectivity between the 2 netns of the
host, through the guest.

Iperf is run with:

  # flow1: host -(ixgbe)-> guest -(virtio)-> host
  ip netns exec ns-tap iperf -s
  ip netns exec ns-ixgbe iperf -c 1.1.1.1 -t 10

  # flow2: host -(virtio)-> guest -(ixgbe)-> host
  ip netns exec ns-ixgbe iperf -s
  ip netns exec ns-tap iperf -c 1.1.1.2 -t 10

The guest runs testpmd with the csum forward engine; its configuration
depends on the test case.

test1: large packets (lro/tso)
------------------------------

Configuration of testpmd:

  ./build/app/testpmd -l 0,1 --log-level 8 -- --total-num-mbufs=16384 \
    -i --port-topology=chained --disable-hw-vlan-filter \
    --disable-hw-vlan-strip --enable-rx-cksum --enable-lro \
    --crc-strip --txqflags=0

  set fwd csum
  tso set 1440 0
  csum set ip hw 0
  csum set tcp hw 0
  tso set 1440 1
  #csum set ip hw 1 # not supported by virtio
  csum set tcp hw 1
  start

Iperf log:

  root@ubuntu1404:~# ip netns exec ns-ixgbe iperf -c 1.1.1.1 -t 10
  ------------------------------------------------------------
  Client connecting to 1.1.1.1, TCP port 5001
  TCP window size: 85.0 KByte (default)
  ------------------------------------------------------------
  [  3] local 1.1.1.2 port 54460 connected with 1.1.1.1 port 5001
  [ ID] Interval       Transfer     Bandwidth
  [  3]  0.0-10.0 sec  6.14 GBytes  5.27 Gbits/sec
  root@ubuntu1404:~# ip netns exec ns-tap iperf -c 1.1.1.2 -t 10
  ------------------------------------------------------------
  Client connecting to 1.1.1.2, TCP port 5001
  TCP window size: 85.0 KByte (default)
  ------------------------------------------------------------
  [  3] local 1.1.1.1 port 58312 connected with 1.1.1.2 port 5001
  [ ID] Interval       Transfer     Bandwidth
  [  3]  0.0-10.0 sec  6.70 GBytes  5.76 Gbits/sec

Example of what we see with "set verbose 1" in testpmd:

  -- flow1: ixgbe2 -> port0 (ixgbe) -> testpmd -> port1 (virtio) <-> tap0
  port=0, mbuf=0x7f968ad9fdc0, pkt_len=24682, nb_segs=13:
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=32 flags=PKT_RX_L4_CKSUM_UNKNOWN PKT_RX_IP_CKSUM_UNKNOWN
  tx: m->l2_len=14 m->l3_len=20 m->l4_len=32
  tx: m->tso_segsz=1440
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_L4_NO_CKSUM PKT_TX_TCP_SEG PKT_TX_IPV4

  -- flow2: tap0 -> port1 (virtio)-> testpmd -> port0 (ixgbe) -> ixgbe2
  port=1, mbuf=0x7f968acc9f40, pkt_len=42058, nb_segs=21:
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=32 flags=PKT_RX_L4_CKSUM_NONE PKT_RX_IP_CKSUM_UNKNOWN PKT_RX_LRO
  rx: m->lro_segsz=1440
  tx: m->l2_len=14 m->l3_len=20 m->l4_len=32
  tx: m->tso_segsz=1440
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_L4_NO_CKSUM PKT_TX_TCP_SEG PKT_TX_IPV4

test2: hardware checksum only
-----------------------------

Configuration of testpmd:

  ./build/app/testpmd -l 0,1 --log-level 8 -- --total-num-mbufs=16384 \
    -i --port-topology=chained --disable-hw-vlan-filter \
    --disable-hw-vlan-strip --enable-rx-cksum --crc-strip --txqflags=0

  set fwd csum
  csum set ip hw 0
  csum set tcp hw 0
  csum set tcp hw 1
  start

Iperf log:

  root@ubuntu1404:~# ip netns exec ns-ixgbe iperf -c 1.1.1.1 -t 10
  ------------------------------------------------------------
  Client connecting to 1.1.1.1, TCP port 5001
  TCP window size: 85.0 KByte (default)
  ------------------------------------------------------------
  [  3] local 1.1.1.2 port 54462 connected with 1.1.1.1 port 5001
  [ ID] Interval       Transfer     Bandwidth
  [  3]  0.0-10.0 sec  4.49 GBytes  3.86 Gbits/sec
  root@ubuntu1404:~# ip netns exec ns-tap iperf -c 1.1.1.2 -t 10
  ------------------------------------------------------------
  Client connecting to 1.1.1.2, TCP port 5001
  TCP window size: 85.0 KByte (default)
  ------------------------------------------------------------
  [  3] local 1.1.1.1 port 58314 connected with 1.1.1.2 port 5001
  [ ID] Interval       Transfer     Bandwidth
  [  3]  0.0-10.0 sec  6.66 GBytes  5.72 Gbits/sec

Example of what we see with "set verbose 1" in testpmd:

  -- flow1: ixgbe2 -> port0 (ixgbe) -> testpmd -> port1 (virtio) <-> tap0
  port=0, mbuf=0x7f0adca89b40, pkt_len=1514, nb_segs=1:
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=32 flags=PKT_RX_L4_CKSUM_UNKNOWN PKT_RX_IP_CKSUM_UNKNOWN
  tx: m->l2_len=14 m->l3_len=20 m->l4_len=32
  tx: flags=PKT_TX_TCP_CKSUM PKT_TX_IPV4

  -- flow2: tap0 -> port1 (virtio)-> testpmd -> port0 (ixgbe) -> ixgbe2
  port=1, mbuf=0x7f0adcb98d80, pkt_len=1514, nb_segs=1:
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=32 flags=PKT_RX_L4_CKSUM_NONE PKT_RX_IP_CKSUM_UNKNOWN
  tx: m->l2_len=14 m->l3_len=20 m->l4_len=32
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_CKSUM PKT_TX_IPV4

test3: no offload
-----------------

Configuration of testpmd:

  ./build/app/testpmd -l 0,1 --log-level 8 -- --total-num-mbufs=16384 \
    -i --port-topology=chained --disable-hw-vlan-filter --disable-hw-vlan-strip

  set fwd csum
  start

Iperf log:

  root@ubuntu1404:~# ip netns exec ns-ixgbe iperf -c 1.1.1.1 -t 10
  ------------------------------------------------------------
  Client connecting to 1.1.1.1, TCP port 5001
  TCP window size: 85.0 KByte (default)
  ------------------------------------------------------------
  [  3] local 1.1.1.2 port 54466 connected with 1.1.1.1 port 5001
  [ ID] Interval       Transfer     Bandwidth
  [  3]  0.0-10.0 sec  4.29 GBytes  3.68 Gbits/sec
  root@ubuntu1404:~# ip netns exec ns-tap iperf -c 1.1.1.2 -t 10
  ------------------------------------------------------------
  Client connecting to 1.1.1.2, TCP port 5001
  TCP window size: 85.0 KByte (default)
  ------------------------------------------------------------
  [  3] local 1.1.1.1 port 58316 connected with 1.1.1.2 port 5001
  [ ID] Interval       Transfer     Bandwidth
  [  3]  0.0-10.0 sec  6.66 GBytes  5.72 Gbits/sec

Example of what we see with "set verbose 1" in testpmd:

  -- flow1: ixgbe2 -> port0 (ixgbe) -> testpmd -> port1 (virtio) <-> tap0
  port=0, mbuf=0x7faf38b3e700, pkt_len=1514, nb_segs=1:
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=32 flags=PKT_RX_L4_CKSUM_UNKNOWN PKT_RX_IP_CKSUM_UNKNOWN
  tx: flags=PKT_TX_L4_NO_CKSUM PKT_TX_IPV4

  -- flow2: tap0 -> port1 (virtio)-> testpmd -> port0 (ixgbe) -> ixgbe2
  port=1, mbuf=0x7faf38b71500, pkt_len=1514, nb_segs=1:
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=32 flags=PKT_RX_L4_CKSUM_UNKNOWN PKT_RX_IP_CKSUM_UNKNOWN
  tx: flags=PKT_TX_L4_NO_CKSUM PKT_TX_IPV4

-- 
2.8.1


* [dpdk-dev] [PATCH v2 01/12] virtio: move device initialization in a function
  2016-10-03  9:00 ` [dpdk-dev] [PATCH v2 00/12] net/virtio: add offload support Olivier Matz
@ 2016-10-03  9:00   ` Olivier Matz
  2016-10-11 12:30     ` Maxime Coquelin
  2016-10-03  9:00   ` [dpdk-dev] [PATCH v2 02/12] virtio: setup and start cq in configure callback Olivier Matz
                     ` (11 subsequent siblings)
  12 siblings, 1 reply; 97+ messages in thread
From: Olivier Matz @ 2016-10-03  9:00 UTC (permalink / raw)
  To: dev, yuanhan.liu
  Cc: konstantin.ananyev, sugesh.chandran, bruce.richardson,
	jianfeng.tan, helin.zhang, adrien.mazarguil, stephen, dprovan,
	xiao.w.wang

Move all code related to device initialization into a new function,
virtio_init_device().

This commit brings no functional change; it prepares the next commits,
which add offload support. They will need to reinitialize the device
from ethdev->configure() using this new function.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 drivers/net/virtio/virtio_ethdev.c | 99 ++++++++++++++++++++++----------------
 1 file changed, 58 insertions(+), 41 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
index ef0d6ee..21ed945 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -1118,46 +1118,13 @@ rx_func_get(struct rte_eth_dev *eth_dev)
 		eth_dev->rx_pkt_burst = &virtio_recv_pkts;
 }
 
-/*
- * This function is based on probe() function in virtio_pci.c
- * It returns 0 on success.
- */
-int
-eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
+static int
+virtio_init_device(struct rte_eth_dev *eth_dev)
 {
 	struct virtio_hw *hw = eth_dev->data->dev_private;
 	struct virtio_net_config *config;
 	struct virtio_net_config local_config;
-	struct rte_pci_device *pci_dev;
-	uint32_t dev_flags = RTE_ETH_DEV_DETACHABLE;
-	int ret;
-
-	RTE_BUILD_BUG_ON(RTE_PKTMBUF_HEADROOM < sizeof(struct virtio_net_hdr_mrg_rxbuf));
-
-	eth_dev->dev_ops = &virtio_eth_dev_ops;
-	eth_dev->tx_pkt_burst = &virtio_xmit_pkts;
-
-	if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
-		rx_func_get(eth_dev);
-		return 0;
-	}
-
-	/* Allocate memory for storing MAC addresses */
-	eth_dev->data->mac_addrs = rte_zmalloc("virtio", VIRTIO_MAX_MAC_ADDRS * ETHER_ADDR_LEN, 0);
-	if (eth_dev->data->mac_addrs == NULL) {
-		PMD_INIT_LOG(ERR,
-			"Failed to allocate %d bytes needed to store MAC addresses",
-			VIRTIO_MAX_MAC_ADDRS * ETHER_ADDR_LEN);
-		return -ENOMEM;
-	}
-
-	pci_dev = eth_dev->pci_dev;
-
-	if (pci_dev) {
-		ret = vtpci_init(pci_dev, hw, &dev_flags);
-		if (ret)
-			return ret;
-	}
+	struct rte_pci_device *pci_dev = eth_dev->pci_dev;
 
 	/* Reset the device although not necessary at startup */
 	vtpci_reset(hw);
@@ -1172,10 +1139,11 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
 
 	/* If host does not support status then disable LSC */
 	if (!vtpci_with_feature(hw, VIRTIO_NET_F_STATUS))
-		dev_flags &= ~RTE_ETH_DEV_INTR_LSC;
+		eth_dev->data->dev_flags &= ~RTE_ETH_DEV_INTR_LSC;
+	else
+		eth_dev->data->dev_flags |= RTE_ETH_DEV_INTR_LSC;
 
 	rte_eth_copy_pci_info(eth_dev, pci_dev);
-	eth_dev->data->dev_flags = dev_flags;
 
 	rx_func_get(eth_dev);
 
@@ -1254,12 +1222,61 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
 			eth_dev->data->port_id, pci_dev->id.vendor_id,
 			pci_dev->id.device_id);
 
+	virtio_dev_cq_start(eth_dev);
+
+	return 0;
+}
+
+/*
+ * This function is based on probe() function in virtio_pci.c
+ * It returns 0 on success.
+ */
+int
+eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
+{
+	struct virtio_hw *hw = eth_dev->data->dev_private;
+	struct rte_pci_device *pci_dev;
+	uint32_t dev_flags = RTE_ETH_DEV_DETACHABLE;
+	int ret;
+
+	RTE_BUILD_BUG_ON(RTE_PKTMBUF_HEADROOM < sizeof(struct virtio_net_hdr_mrg_rxbuf));
+
+	eth_dev->dev_ops = &virtio_eth_dev_ops;
+	eth_dev->tx_pkt_burst = &virtio_xmit_pkts;
+
+	if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
+		rx_func_get(eth_dev);
+		return 0;
+	}
+
+	/* Allocate memory for storing MAC addresses */
+	eth_dev->data->mac_addrs = rte_zmalloc("virtio", VIRTIO_MAX_MAC_ADDRS * ETHER_ADDR_LEN, 0);
+	if (eth_dev->data->mac_addrs == NULL) {
+		PMD_INIT_LOG(ERR,
+			"Failed to allocate %d bytes needed to store MAC addresses",
+			VIRTIO_MAX_MAC_ADDRS * ETHER_ADDR_LEN);
+		return -ENOMEM;
+	}
+
+	pci_dev = eth_dev->pci_dev;
+
+	if (pci_dev) {
+		ret = vtpci_init(pci_dev, hw, &dev_flags);
+		if (ret)
+			return ret;
+	}
+
+	eth_dev->data->dev_flags = dev_flags;
+
+	/* reset device and negotiate features */
+	ret = virtio_init_device(eth_dev);
+	if (ret < 0)
+		return ret;
+
 	/* Setup interrupt callback  */
 	if (eth_dev->data->dev_flags & RTE_ETH_DEV_INTR_LSC)
 		rte_intr_callback_register(&pci_dev->intr_handle,
-				   virtio_interrupt_handler, eth_dev);
-
-	virtio_dev_cq_start(eth_dev);
+			virtio_interrupt_handler, eth_dev);
 
 	return 0;
 }
-- 
2.8.1


* [dpdk-dev] [PATCH v2 02/12] virtio: setup and start cq in configure callback
  2016-10-03  9:00 ` [dpdk-dev] [PATCH v2 00/12] net/virtio: add offload support Olivier Matz
  2016-10-03  9:00   ` [dpdk-dev] [PATCH v2 01/12] virtio: move device initialization in a function Olivier Matz
@ 2016-10-03  9:00   ` Olivier Matz
  2016-10-11 12:47     ` Maxime Coquelin
  2016-10-03  9:00   ` [dpdk-dev] [PATCH v2 03/12] virtio: reinitialize the device " Olivier Matz
                     ` (10 subsequent siblings)
  12 siblings, 1 reply; 97+ messages in thread
From: Olivier Matz @ 2016-10-03  9:00 UTC (permalink / raw)
  To: dev, yuanhan.liu
  Cc: konstantin.ananyev, sugesh.chandran, bruce.richardson,
	jianfeng.tan, helin.zhang, adrien.mazarguil, stephen, dprovan,
	xiao.w.wang

Move the configuration of the control queue into the configure callback.
This is needed by the next commit, which introduces the reinitialization
of the device in the configure callback to change the feature flags.
Therefore, the control queue will have to be restarted at the same
place.

As virtio_dev_cq_queue_setup() is now called from a place where
config->max_virtqueue_pairs is not available, we need to store this value
in the private structure. It replaces max_rx_queues and max_tx_queues, which
have the same value. The log showing the value of max_rx_queues and
max_tx_queues is also removed since config->max_virtqueue_pairs is
already displayed above.
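
As an illustration only (not part of the patch), the relation between the
stored number of queue pairs, the Rx/Tx limits reported to the application,
and the index of the control queue can be sketched as below. The sketch_*
names are hypothetical stand-ins; the limit of 128 and the "pairs * 2" index
are taken from the diff that follows.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for VIRTIO_MAX_RX_QUEUES / VIRTIO_MAX_TX_QUEUES (128U in the diff). */
#define SKETCH_MAX_RX_QUEUES 128U
#define SKETCH_MAX_TX_QUEUES 128U

/* Hypothetical, reduced version of the private structure. */
struct sketch_hw {
	uint32_t max_queue_pairs; /* copied from config->max_virtqueue_pairs */
};

static uint32_t sketch_min(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}

/* Rx and Tx limits are both derived from the single stored value. */
uint32_t sketch_max_rx_queues(const struct sketch_hw *hw)
{
	return sketch_min(hw->max_queue_pairs, SKETCH_MAX_RX_QUEUES);
}

uint32_t sketch_max_tx_queues(const struct sketch_hw *hw)
{
	return sketch_min(hw->max_queue_pairs, SKETCH_MAX_TX_QUEUES);
}

/* Each pair uses two virtqueues, so the control queue comes right after. */
uint32_t sketch_ctrl_queue_idx(const struct sketch_hw *hw)
{
	return hw->max_queue_pairs * 2;
}
```

With 8 queue pairs, for instance, this gives 8 Rx queues, 8 Tx queues, and a
control queue at index 16.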

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 drivers/net/virtio/virtio_ethdev.c | 43 +++++++++++++++++++-------------------
 drivers/net/virtio/virtio_ethdev.h |  4 ++--
 drivers/net/virtio/virtio_pci.h    |  3 +--
 3 files changed, 24 insertions(+), 26 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
index 21ed945..b1056a1 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -552,6 +552,9 @@ virtio_dev_close(struct rte_eth_dev *dev)
 	if (hw->started == 1)
 		virtio_dev_stop(dev);
 
+	if (hw->cvq)
+		virtio_dev_queue_release(hw->cvq->vq);
+
 	/* reset the NIC */
 	if (dev->data->dev_flags & RTE_ETH_DEV_INTR_LSC)
 		vtpci_irq_config(hw, VIRTIO_MSI_NO_VECTOR);
@@ -1191,16 +1194,7 @@ virtio_init_device(struct rte_eth_dev *eth_dev)
 			config->max_virtqueue_pairs = 1;
 		}
 
-		hw->max_rx_queues =
-			(VIRTIO_MAX_RX_QUEUES < config->max_virtqueue_pairs) ?
-			VIRTIO_MAX_RX_QUEUES : config->max_virtqueue_pairs;
-		hw->max_tx_queues =
-			(VIRTIO_MAX_TX_QUEUES < config->max_virtqueue_pairs) ?
-			VIRTIO_MAX_TX_QUEUES : config->max_virtqueue_pairs;
-
-		virtio_dev_cq_queue_setup(eth_dev,
-					config->max_virtqueue_pairs * 2,
-					SOCKET_ID_ANY);
+		hw->max_queue_pairs = config->max_virtqueue_pairs;
 
 		PMD_INIT_LOG(DEBUG, "config->max_virtqueue_pairs=%d",
 				config->max_virtqueue_pairs);
@@ -1211,19 +1205,15 @@ virtio_init_device(struct rte_eth_dev *eth_dev)
 				config->mac[2], config->mac[3],
 				config->mac[4], config->mac[5]);
 	} else {
-		hw->max_rx_queues = 1;
-		hw->max_tx_queues = 1;
+		PMD_INIT_LOG(DEBUG, "config->max_virtqueue_pairs=1");
+		hw->max_queue_pairs = 1;
 	}
 
-	PMD_INIT_LOG(DEBUG, "hw->max_rx_queues=%d   hw->max_tx_queues=%d",
-			hw->max_rx_queues, hw->max_tx_queues);
 	if (pci_dev)
 		PMD_INIT_LOG(DEBUG, "port %d vendorID=0x%x deviceID=0x%x",
 			eth_dev->data->port_id, pci_dev->id.vendor_id,
 			pci_dev->id.device_id);
 
-	virtio_dev_cq_start(eth_dev);
-
 	return 0;
 }
 
@@ -1285,7 +1275,6 @@ static int
 eth_virtio_dev_uninit(struct rte_eth_dev *eth_dev)
 {
 	struct rte_pci_device *pci_dev;
-	struct virtio_hw *hw = eth_dev->data->dev_private;
 
 	PMD_INIT_FUNC_TRACE();
 
@@ -1301,9 +1290,6 @@ eth_virtio_dev_uninit(struct rte_eth_dev *eth_dev)
 	eth_dev->tx_pkt_burst = NULL;
 	eth_dev->rx_pkt_burst = NULL;
 
-	if (hw->cvq)
-		virtio_dev_queue_release(hw->cvq->vq);
-
 	rte_free(eth_dev->data->mac_addrs);
 	eth_dev->data->mac_addrs = NULL;
 
@@ -1358,6 +1344,7 @@ virtio_dev_configure(struct rte_eth_dev *dev)
 {
 	const struct rte_eth_rxmode *rxmode = &dev->data->dev_conf.rxmode;
 	struct virtio_hw *hw = dev->data->dev_private;
+	int ret;
 
 	PMD_INIT_LOG(DEBUG, "configure");
 
@@ -1366,6 +1353,16 @@ virtio_dev_configure(struct rte_eth_dev *dev)
 		return -EINVAL;
 	}
 
+	/* Setup and start control queue */
+	if (vtpci_with_feature(hw, VIRTIO_NET_F_CTRL_VQ)) {
+		ret = virtio_dev_cq_queue_setup(dev,
+			hw->max_queue_pairs * 2,
+			SOCKET_ID_ANY);
+		if (ret < 0)
+			return ret;
+		virtio_dev_cq_start(dev);
+	}
+
 	hw->vlan_strip = rxmode->hw_vlan_strip;
 
 	if (rxmode->hw_vlan_filter
@@ -1559,8 +1556,10 @@ virtio_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 		dev_info->driver_name = dev->driver->pci_drv.name;
 	else
 		dev_info->driver_name = "virtio_user PMD";
-	dev_info->max_rx_queues = (uint16_t)hw->max_rx_queues;
-	dev_info->max_tx_queues = (uint16_t)hw->max_tx_queues;
+	dev_info->max_rx_queues =
+		RTE_MIN(hw->max_queue_pairs, VIRTIO_MAX_RX_QUEUES);
+	dev_info->max_tx_queues =
+		RTE_MIN(hw->max_queue_pairs, VIRTIO_MAX_TX_QUEUES);
 	dev_info->min_rx_bufsize = VIRTIO_MIN_RX_BUFSIZE;
 	dev_info->max_rx_pktlen = VIRTIO_MAX_RX_PKTLEN;
 	dev_info->max_mac_addrs = VIRTIO_MAX_MAC_ADDRS;
diff --git a/drivers/net/virtio/virtio_ethdev.h b/drivers/net/virtio/virtio_ethdev.h
index 2ecec6e..5d5e788 100644
--- a/drivers/net/virtio/virtio_ethdev.h
+++ b/drivers/net/virtio/virtio_ethdev.h
@@ -47,8 +47,8 @@
 #define PAGE_SIZE 4096
 #endif
 
-#define VIRTIO_MAX_RX_QUEUES 128
-#define VIRTIO_MAX_TX_QUEUES 128
+#define VIRTIO_MAX_RX_QUEUES 128U
+#define VIRTIO_MAX_TX_QUEUES 128U
 #define VIRTIO_MAX_MAC_ADDRS 64
 #define VIRTIO_MIN_RX_BUFSIZE 64
 #define VIRTIO_MAX_RX_PKTLEN  9728
diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h
index dd7693f..552166d 100644
--- a/drivers/net/virtio/virtio_pci.h
+++ b/drivers/net/virtio/virtio_pci.h
@@ -246,8 +246,7 @@ struct virtio_hw {
 	struct virtnet_ctl *cvq;
 	struct rte_pci_ioport io;
 	uint64_t    guest_features;
-	uint32_t    max_tx_queues;
-	uint32_t    max_rx_queues;
+	uint32_t    max_queue_pairs;
 	uint16_t    vtnet_hdr_size;
 	uint8_t	    vlan_strip;
 	uint8_t	    use_msix;
-- 
2.8.1


* [dpdk-dev] [PATCH v2 03/12] virtio: reinitialize the device in configure callback
  2016-10-03  9:00 ` [dpdk-dev] [PATCH v2 00/12] net/virtio: add offload support Olivier Matz
  2016-10-03  9:00   ` [dpdk-dev] [PATCH v2 01/12] virtio: move device initialization in a function Olivier Matz
  2016-10-03  9:00   ` [dpdk-dev] [PATCH v2 02/12] virtio: setup and start cq in configure callback Olivier Matz
@ 2016-10-03  9:00   ` Olivier Matz
  2016-10-11 13:13     ` Maxime Coquelin
  2016-10-12 14:41     ` Yuanhan Liu
  2016-10-03  9:00   ` [dpdk-dev] [PATCH v2 04/12] net: add function to calculate a checksum in a mbuf Olivier Matz
                     ` (9 subsequent siblings)
  12 siblings, 2 replies; 97+ messages in thread
From: Olivier Matz @ 2016-10-03  9:00 UTC (permalink / raw)
  To: dev, yuanhan.liu
  Cc: konstantin.ananyev, sugesh.chandran, bruce.richardson,
	jianfeng.tan, helin.zhang, adrien.mazarguil, stephen, dprovan,
	xiao.w.wang

Add the ability to reset the virtio device in the configure callback
if the requested feature flags changed since the previous reset. Such a
change becomes possible once offload support is introduced in the next
commits.
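
As an illustration only (not the driver code), the "reinitialize only when
the request changed" logic can be sketched as follows. All sketch_* names are
hypothetical, and negotiation is reduced to a bitwise AND with the features
offered by the host, which is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, reduced version of the private structure. */
struct sketch_hw {
	uint64_t req_guest_features; /* features requested at the last reset */
	uint64_t guest_features;     /* features actually negotiated */
	unsigned int nb_resets;      /* reset counter, for illustration only */
};

/* Reset and negotiate: keep only the requested bits the host also offers. */
int sketch_init_device(struct sketch_hw *hw, uint64_t host_features,
		       uint64_t req_features)
{
	hw->nb_resets++;
	hw->guest_features = req_features & host_features;
	hw->req_guest_features = req_features;
	return 0;
}

/* Configure: reinitialize only if the requested set differs from last time. */
int sketch_configure(struct sketch_hw *hw, uint64_t host_features,
		     uint64_t req_features)
{
	if (req_features != hw->req_guest_features)
		return sketch_init_device(hw, host_features, req_features);
	return 0;
}
```

Comparing against the requested set rather than the negotiated one presumably
avoids a reset on every configure call when the host lacks one of the
requested features.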

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 drivers/net/virtio/virtio_ethdev.c | 26 +++++++++++++++++++-------
 drivers/net/virtio/virtio_pci.h    |  1 +
 2 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
index b1056a1..fa56032 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -1045,14 +1045,13 @@ virtio_vlan_filter_set(struct rte_eth_dev *dev, uint16_t vlan_id, int on)
 }
 
 static int
-virtio_negotiate_features(struct virtio_hw *hw)
+virtio_negotiate_features(struct virtio_hw *hw, uint64_t req_features)
 {
 	uint64_t host_features;
 
 	/* Prepare guest_features: feature that driver wants to support */
-	hw->guest_features = VIRTIO_PMD_GUEST_FEATURES;
 	PMD_INIT_LOG(DEBUG, "guest_features before negotiate = %" PRIx64,
-		hw->guest_features);
+		req_features);
 
 	/* Read device(host) feature bits */
 	host_features = hw->vtpci_ops->get_features(hw);
@@ -1063,6 +1062,7 @@ virtio_negotiate_features(struct virtio_hw *hw)
 	 * Negotiate features: Subset of device feature bits are written back
 	 * guest feature bits.
 	 */
+	hw->guest_features = req_features;
 	hw->guest_features = vtpci_negotiate_features(hw, host_features);
 	PMD_INIT_LOG(DEBUG, "features after negotiate = %" PRIx64,
 		hw->guest_features);
@@ -1081,6 +1081,8 @@ virtio_negotiate_features(struct virtio_hw *hw)
 		}
 	}
 
+	hw->req_guest_features = req_features;
+
 	return 0;
 }
 
@@ -1121,8 +1123,9 @@ rx_func_get(struct rte_eth_dev *eth_dev)
 		eth_dev->rx_pkt_burst = &virtio_recv_pkts;
 }
 
+/* reset device and renegotiate features if needed */
 static int
-virtio_init_device(struct rte_eth_dev *eth_dev)
+virtio_init_device(struct rte_eth_dev *eth_dev, uint64_t req_features)
 {
 	struct virtio_hw *hw = eth_dev->data->dev_private;
 	struct virtio_net_config *config;
@@ -1137,7 +1140,7 @@ virtio_init_device(struct rte_eth_dev *eth_dev)
 
 	/* Tell the host we've known how to drive the device. */
 	vtpci_set_status(hw, VIRTIO_CONFIG_STATUS_DRIVER);
-	if (virtio_negotiate_features(hw) < 0)
+	if (virtio_negotiate_features(hw, req_features) < 0)
 		return -1;
 
 	/* If host does not support status then disable LSC */
@@ -1258,8 +1261,8 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
 
 	eth_dev->data->dev_flags = dev_flags;
 
-	/* reset device and negotiate features */
-	ret = virtio_init_device(eth_dev);
+	/* reset device and negotiate default features */
+	ret = virtio_init_device(eth_dev, VIRTIO_PMD_GUEST_FEATURES);
 	if (ret < 0)
 		return ret;
 
@@ -1344,6 +1347,7 @@ virtio_dev_configure(struct rte_eth_dev *dev)
 {
 	const struct rte_eth_rxmode *rxmode = &dev->data->dev_conf.rxmode;
 	struct virtio_hw *hw = dev->data->dev_private;
+	uint64_t req_features;
 	int ret;
 
 	PMD_INIT_LOG(DEBUG, "configure");
@@ -1353,6 +1357,14 @@ virtio_dev_configure(struct rte_eth_dev *dev)
 		return -EINVAL;
 	}
 
+	req_features = VIRTIO_PMD_GUEST_FEATURES;
+	/* if request features changed, reinit the device */
+	if (req_features != hw->req_guest_features) {
+		ret = virtio_init_device(dev, req_features);
+		if (ret < 0)
+			return ret;
+	}
+
 	/* Setup and start control queue */
 	if (vtpci_with_feature(hw, VIRTIO_NET_F_CTRL_VQ)) {
 		ret = virtio_dev_cq_queue_setup(dev,
diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h
index 552166d..d1a7d1e 100644
--- a/drivers/net/virtio/virtio_pci.h
+++ b/drivers/net/virtio/virtio_pci.h
@@ -245,6 +245,7 @@ struct virtio_net_config;
 struct virtio_hw {
 	struct virtnet_ctl *cvq;
 	struct rte_pci_ioport io;
+	uint64_t    req_guest_features;
 	uint64_t    guest_features;
 	uint32_t    max_queue_pairs;
 	uint16_t    vtnet_hdr_size;
-- 
2.8.1


* [dpdk-dev] [PATCH v2 04/12] net: add function to calculate a checksum in a mbuf
  2016-10-03  9:00 ` [dpdk-dev] [PATCH v2 00/12] net/virtio: add offload support Olivier Matz
                     ` (2 preceding siblings ...)
  2016-10-03  9:00   ` [dpdk-dev] [PATCH v2 03/12] virtio: reinitialize the device " Olivier Matz
@ 2016-10-03  9:00   ` Olivier Matz
  2016-10-11 13:25     ` Maxime Coquelin
  2016-10-03  9:00   ` [dpdk-dev] [PATCH v2 05/12] mbuf: add new Rx checksum mbuf flags Olivier Matz
                     ` (8 subsequent siblings)
  12 siblings, 1 reply; 97+ messages in thread
From: Olivier Matz @ 2016-10-03  9:00 UTC (permalink / raw)
  To: dev, yuanhan.liu
  Cc: konstantin.ananyev, sugesh.chandran, bruce.richardson,
	jianfeng.tan, helin.zhang, adrien.mazarguil, stephen, dprovan,
	xiao.w.wang

This function can be used to calculate the checksum of data embedded in
an mbuf, which can be composed of several segments.

It will be used by the virtio pmd in the next commits to calculate
the checksum in software in case the protocol is not recognized.
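
As an illustration only, the multi-segment technique can be sketched in plain
C without DPDK. The sketch_* names and the segment structure are hypothetical
stand-ins for an mbuf chain; unlike rte_raw_cksum_mbuf(), the sketch sums the
whole chain (no offset or length arguments) and folds each partial sum to 16
bits before swapping it. The key point is that a segment starting at an odd
byte offset contributes a byte-swapped partial sum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one segment of an mbuf chain. */
struct sketch_seg {
	const uint8_t *data;
	size_t len;
	const struct sketch_seg *next;
};

/* Sum 16-bit words in memory order; a trailing byte is padded with zero.
 * The 32-bit accumulator is enough for buffers shorter than 128 KiB. */
static uint32_t sketch_sum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	uint16_t w;

	while (len >= 2) {
		memcpy(&w, buf, 2);
		sum += w;
		buf += 2;
		len -= 2;
	}
	if (len == 1) {
		w = 0;
		memcpy(&w, buf, 1);
		sum += w;
	}
	return sum;
}

/* Fold the carries back into 16 bits (one's complement addition). */
static uint16_t sketch_fold(uint32_t sum)
{
	sum = (sum >> 16) + (sum & 0xffff);
	sum = (sum >> 16) + (sum & 0xffff);
	return (uint16_t)sum;
}

/* Raw (non complemented) checksum of a contiguous buffer. */
uint16_t sketch_raw_cksum_flat(const uint8_t *buf, size_t len)
{
	return sketch_fold(sketch_sum(buf, len));
}

/* Raw (non complemented) checksum of a whole chain of segments. */
uint16_t sketch_raw_cksum_chain(const struct sketch_seg *seg)
{
	uint32_t sum = 0;
	size_t done = 0;

	for (; seg != NULL; seg = seg->next) {
		uint16_t part = sketch_fold(sketch_sum(seg->data, seg->len));

		/* segment starts at an odd offset: swap its partial sum */
		if (done & 1)
			part = (uint16_t)((part << 8) | (part >> 8));
		sum += part;
		done += seg->len;
	}
	return sketch_fold(sum);
}
```

Splitting a 10-byte buffer into segments of 3, 4 and 3 bytes should give the
same value as summing the flat buffer, since the second and third segments
both start at an odd offset and are swapped.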

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 doc/guides/rel_notes/release_16_11.rst |  5 +++
 lib/librte_net/rte_ip.h                | 60 ++++++++++++++++++++++++++++++++++
 2 files changed, 65 insertions(+)

diff --git a/doc/guides/rel_notes/release_16_11.rst b/doc/guides/rel_notes/release_16_11.rst
index 3d3c417..f29b44c 100644
--- a/doc/guides/rel_notes/release_16_11.rst
+++ b/doc/guides/rel_notes/release_16_11.rst
@@ -55,6 +55,11 @@ New Features
   Added two new functions ``rte_get_rx_ol_flag_list()`` and
   ``rte_get_tx_ol_flag_list()`` to dump offload flags as a string.
 
+* **Added a function to calculate the checksum of data in an mbuf.**
+
+  Added a new function ``rte_raw_cksum_mbuf()`` to process the checksum of
+  data embedded in an mbuf chain.
+
 Resolved Issues
 ---------------
 
diff --git a/lib/librte_net/rte_ip.h b/lib/librte_net/rte_ip.h
index 5b7554a..8499356 100644
--- a/lib/librte_net/rte_ip.h
+++ b/lib/librte_net/rte_ip.h
@@ -230,6 +230,66 @@ rte_raw_cksum(const void *buf, size_t len)
 }
 
 /**
+ * Compute the raw (non complemented) checksum of a packet.
+ *
+ * @param m
+ *   The pointer to the mbuf.
+ * @param off
+ *   The offset in bytes to start the checksum.
+ * @param len
+ *   The length in bytes of the data to checksum.
+ */
+static inline uint16_t
+rte_raw_cksum_mbuf(const struct rte_mbuf *m, uint32_t off, uint32_t len)
+{
+	const struct rte_mbuf *seg;
+	const char *buf;
+	uint32_t sum, tmp;
+	uint32_t seglen, done;
+
+	/* easy case: all data in the first segment */
+	if (off + len <= rte_pktmbuf_data_len(m))
+		return rte_raw_cksum(rte_pktmbuf_mtod_offset(m,
+				const char *, off), len);
+
+	if (off + len > rte_pktmbuf_pkt_len(m))
+		return 0; /* invalid params, return a dummy value */
+
+	/* else browse the segment to find offset */
+	seglen = 0;
+	for (seg = m; seg != NULL; seg = seg->next) {
+		seglen = rte_pktmbuf_data_len(seg);
+		if (off < seglen)
+			break;
+		off -= seglen;
+	}
+	seglen -= off;
+	buf = rte_pktmbuf_mtod_offset(seg, const char *, off);
+	if (seglen >= len) /* all in one segment */
+		return rte_raw_cksum(buf, len);
+
+	/* hard case: process checksum of several segments */
+	sum = 0;
+	done = 0;
+	for (;;) {
+		tmp = __rte_raw_cksum(buf, seglen, 0);
+		if (done & 1)
+			tmp = rte_bswap16(tmp);
+		sum += tmp;
+		done += seglen;
+		if (done == len)
+			break;
+		seg = seg->next;
+		buf = rte_pktmbuf_mtod(seg, const char *);
+		seglen = rte_pktmbuf_data_len(seg);
+		if (seglen > len - done)
+			seglen = len - done;
+	}
+
+	return __rte_raw_cksum_reduce(sum);
+}
+
+/**
  * Process the IPv4 checksum of an IPv4 header.
  *
  * The checksum field must be set to 0 by the caller.
-- 
2.8.1


* [dpdk-dev] [PATCH v2 05/12] mbuf: add new Rx checksum mbuf flags
  2016-10-03  9:00 ` [dpdk-dev] [PATCH v2 00/12] net/virtio: add offload support Olivier Matz
                     ` (3 preceding siblings ...)
  2016-10-03  9:00   ` [dpdk-dev] [PATCH v2 04/12] net: add function to calculate a checksum in a mbuf Olivier Matz
@ 2016-10-03  9:00   ` Olivier Matz
  2016-10-11 13:43     ` Maxime Coquelin
  2016-10-03  9:00   ` [dpdk-dev] [PATCH v2 06/12] app/testpmd: fix checksum stats in csum engine Olivier Matz
                     ` (7 subsequent siblings)
  12 siblings, 1 reply; 97+ messages in thread
From: Olivier Matz @ 2016-10-03  9:00 UTC (permalink / raw)
  To: dev, yuanhan.liu
  Cc: konstantin.ananyev, sugesh.chandran, bruce.richardson,
	jianfeng.tan, helin.zhang, adrien.mazarguil, stephen, dprovan,
	xiao.w.wang

Following discussions in [1] and [2], introduce a new bit to
describe the Rx checksum status in mbuf.

Before this patch, only one flag was available:
  PKT_RX_L4_CKSUM_BAD: L4 cksum of RX pkt. is not OK.

And same for L3:
  PKT_RX_IP_CKSUM_BAD: IP cksum of RX pkt. is not OK.

This had 2 issues:
- it was not possible to differentiate "checksum good" from
  "checksum unknown".
- it was not possible for a virtual driver to say "the checksum
  in packet may be wrong, but data integrity is valid".

This patch tries to solve these issues by having 4 states (2 bits)
for the IP and L4 Rx checksums. New values are:

 - PKT_RX_L4_CKSUM_UNKNOWN: no information about the RX L4 checksum
   -> the application should verify the checksum by sw
 - PKT_RX_L4_CKSUM_BAD: the L4 checksum in the packet is wrong
   -> the application can drop the packet without additional check
 - PKT_RX_L4_CKSUM_GOOD: the L4 checksum in the packet is valid
   -> the application can accept the packet without verifying the
      checksum by sw
 - PKT_RX_L4_CKSUM_NONE: the L4 checksum is not correct in the packet
   data, but the integrity of the L4 data is verified.
   -> the application can process the packet but must not verify the
      checksum by sw. It has to take care to recalculate the cksum
      if the packet is transmitted (either by sw or using tx offload)

  And same for L3 (replace L4 by IP in description above).
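
As an illustration only, an application could map the 4 L4 states to an
action as sketched below. The SKETCH_* macros are local stand-ins, not the
rte_mbuf.h definitions: BAD keeps the bit it had before this patch, while the
position of the second bit and the encoding of NONE as both bits set are
assumptions of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for the Rx L4 checksum flags (assumed encoding). */
#define SKETCH_RX_L4_CKSUM_BAD     (1ULL << 3)
#define SKETCH_RX_L4_CKSUM_GOOD    (1ULL << 8)
#define SKETCH_RX_L4_CKSUM_MASK    (SKETCH_RX_L4_CKSUM_BAD | SKETCH_RX_L4_CKSUM_GOOD)
#define SKETCH_RX_L4_CKSUM_UNKNOWN 0ULL
#define SKETCH_RX_L4_CKSUM_NONE    (SKETCH_RX_L4_CKSUM_BAD | SKETCH_RX_L4_CKSUM_GOOD)

enum sketch_action {
	SKETCH_VERIFY_IN_SW,        /* unknown: check the checksum in software */
	SKETCH_DROP,                /* bad: drop without additional check */
	SKETCH_ACCEPT,              /* good: accept without software check */
	SKETCH_ACCEPT_FIX_ON_TX,    /* none: accept, recompute cksum if forwarded */
};

/* Decide what to do with a packet from its 2-bit L4 checksum state. */
enum sketch_action sketch_l4_cksum_action(uint64_t ol_flags)
{
	switch (ol_flags & SKETCH_RX_L4_CKSUM_MASK) {
	case SKETCH_RX_L4_CKSUM_BAD:
		return SKETCH_DROP;
	case SKETCH_RX_L4_CKSUM_GOOD:
		return SKETCH_ACCEPT;
	case SKETCH_RX_L4_CKSUM_NONE:
		return SKETCH_ACCEPT_FIX_ON_TX;
	default: /* SKETCH_RX_L4_CKSUM_UNKNOWN */
		return SKETCH_VERIFY_IN_SW;
	}
}
```

The sketch compares the masked value against each state instead of testing a
single bit, because under this encoding NONE shares a bit with both BAD and
GOOD.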

This commit tries to be compatible with existing applications that
only check the existing flag (CKSUM_BAD).

[1] http://dpdk.org/ml/archives/dev/2016-May/039920.html
[2] http://dpdk.org/ml/archives/dev/2016-June/040007.html

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 doc/guides/rel_notes/release_16_11.rst |  6 ++++
 lib/librte_mbuf/rte_mbuf.c             | 16 +++++++++--
 lib/librte_mbuf/rte_mbuf.h             | 51 ++++++++++++++++++++++++++++++++--
 3 files changed, 68 insertions(+), 5 deletions(-)

diff --git a/doc/guides/rel_notes/release_16_11.rst b/doc/guides/rel_notes/release_16_11.rst
index f29b44c..2aff84c 100644
--- a/doc/guides/rel_notes/release_16_11.rst
+++ b/doc/guides/rel_notes/release_16_11.rst
@@ -60,6 +60,12 @@ New Features
   Added a new function ``rte_raw_cksum_mbuf()`` to process the checksum of
   data embedded in an mbuf chain.
 
+* **Added new Rx checksum mbuf flags.**
+
+  Added new Rx checksum flags in mbufs to describe more states: unknown,
+  good, bad, or not present (useful for virtual drivers). This modification
+  was done for IP and L4.
+
 Resolved Issues
 ---------------
 
diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index bd5bd48..c55cb57 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -307,7 +307,11 @@ const char *rte_get_rx_ol_flag_name(uint64_t mask)
 	case PKT_RX_RSS_HASH: return "PKT_RX_RSS_HASH";
 	case PKT_RX_FDIR: return "PKT_RX_FDIR";
 	case PKT_RX_L4_CKSUM_BAD: return "PKT_RX_L4_CKSUM_BAD";
+	case PKT_RX_L4_CKSUM_GOOD: return "PKT_RX_L4_CKSUM_GOOD";
+	case PKT_RX_L4_CKSUM_NONE: return "PKT_RX_L4_CKSUM_NONE";
 	case PKT_RX_IP_CKSUM_BAD: return "PKT_RX_IP_CKSUM_BAD";
+	case PKT_RX_IP_CKSUM_GOOD: return "PKT_RX_IP_CKSUM_GOOD";
+	case PKT_RX_IP_CKSUM_NONE: return "PKT_RX_IP_CKSUM_NONE";
 	case PKT_RX_EIP_CKSUM_BAD: return "PKT_RX_EIP_CKSUM_BAD";
 	case PKT_RX_VLAN_STRIPPED: return "PKT_RX_VLAN_STRIPPED";
 	case PKT_RX_IEEE1588_PTP: return "PKT_RX_IEEE1588_PTP";
@@ -330,8 +334,16 @@ int rte_get_rx_ol_flag_list(uint64_t mask, char *buf, size_t buflen)
 		{ PKT_RX_VLAN_PKT, PKT_RX_VLAN_PKT, NULL },
 		{ PKT_RX_RSS_HASH, PKT_RX_RSS_HASH, NULL },
 		{ PKT_RX_FDIR, PKT_RX_FDIR, NULL },
-		{ PKT_RX_L4_CKSUM_BAD, PKT_RX_L4_CKSUM_BAD, NULL },
-		{ PKT_RX_IP_CKSUM_BAD, PKT_RX_IP_CKSUM_BAD, NULL },
+		{ PKT_RX_L4_CKSUM_BAD, PKT_RX_L4_CKSUM_MASK, NULL },
+		{ PKT_RX_L4_CKSUM_GOOD, PKT_RX_L4_CKSUM_MASK, NULL },
+		{ PKT_RX_L4_CKSUM_NONE, PKT_RX_L4_CKSUM_MASK, NULL },
+		{ PKT_RX_L4_CKSUM_UNKNOWN, PKT_RX_L4_CKSUM_MASK,
+		  "PKT_RX_L4_CKSUM_UNKNOWN" },
+		{ PKT_RX_IP_CKSUM_BAD, PKT_RX_IP_CKSUM_MASK, NULL },
+		{ PKT_RX_IP_CKSUM_GOOD, PKT_RX_IP_CKSUM_MASK, NULL },
+		{ PKT_RX_IP_CKSUM_NONE, PKT_RX_IP_CKSUM_MASK, NULL },
+		{ PKT_RX_IP_CKSUM_UNKNOWN, PKT_RX_IP_CKSUM_MASK,
+		  "PKT_RX_IP_CKSUM_UNKNOWN" },
 		{ PKT_RX_EIP_CKSUM_BAD, PKT_RX_EIP_CKSUM_BAD, NULL },
 		{ PKT_RX_VLAN_STRIPPED, PKT_RX_VLAN_STRIPPED, NULL },
 		{ PKT_RX_IEEE1588_PTP, PKT_RX_IEEE1588_PTP, NULL },
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 5e349e7..7061cfc 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -91,8 +91,25 @@ extern "C" {
 
 #define PKT_RX_RSS_HASH      (1ULL << 1)  /**< RX packet with RSS hash result. */
 #define PKT_RX_FDIR          (1ULL << 2)  /**< RX packet with FDIR match indicate. */
-#define PKT_RX_L4_CKSUM_BAD  (1ULL << 3)  /**< L4 cksum of RX pkt. is not OK. */
-#define PKT_RX_IP_CKSUM_BAD  (1ULL << 4)  /**< IP cksum of RX pkt. is not OK. */
+
+/**
+ * Deprecated.
+ * Checking this flag alone is deprecated: check the 2 bits of
+ * PKT_RX_L4_CKSUM_MASK.
+ * This flag was set when the L4 checksum of a packet was detected as
+ * wrong by the hardware.
+ */
+#define PKT_RX_L4_CKSUM_BAD  (1ULL << 3)
+
+/**
+ * Deprecated.
+ * Checking this flag alone is deprecated: check the 2 bits of
+ * PKT_RX_IP_CKSUM_MASK.
+ * This flag was set when the IP checksum of a packet was detected as
+ * wrong by the hardware.
+ */
+#define PKT_RX_IP_CKSUM_BAD  (1ULL << 4)
+
 #define PKT_RX_EIP_CKSUM_BAD (1ULL << 5)  /**< External IP header checksum error. */
 
 /**
@@ -102,7 +119,35 @@ extern "C" {
  */
 #define PKT_RX_VLAN_STRIPPED (1ULL << 6)
 
-/* hole, some bits can be reused here  */
+/**
+ * Mask of bits used to determine the status of RX IP checksum.
+ * - PKT_RX_IP_CKSUM_UNKNOWN: no information about the RX IP checksum
+ * - PKT_RX_IP_CKSUM_BAD: the IP checksum in the packet is wrong
+ * - PKT_RX_IP_CKSUM_GOOD: the IP checksum in the packet is valid
+ * - PKT_RX_IP_CKSUM_NONE: the IP checksum is not correct in the packet
+ *   data, but the integrity of the IP header is verified.
+ */
+#define PKT_RX_IP_CKSUM_MASK ((1ULL << 4) | (1ULL << 7))
+
+#define PKT_RX_IP_CKSUM_UNKNOWN 0
+#define PKT_RX_IP_CKSUM_BAD     (1ULL << 4)
+#define PKT_RX_IP_CKSUM_GOOD    (1ULL << 7)
+#define PKT_RX_IP_CKSUM_NONE    ((1ULL << 4) | (1ULL << 7))
+
+/**
+ * Mask of bits used to determine the status of RX L4 checksum.
+ * - PKT_RX_L4_CKSUM_UNKNOWN: no information about the RX L4 checksum
+ * - PKT_RX_L4_CKSUM_BAD: the L4 checksum in the packet is wrong
+ * - PKT_RX_L4_CKSUM_GOOD: the L4 checksum in the packet is valid
+ * - PKT_RX_L4_CKSUM_NONE: the L4 checksum is not correct in the packet
+ *   data, but the integrity of the L4 data is verified.
+ */
+#define PKT_RX_L4_CKSUM_MASK ((1ULL << 3) | (1ULL << 8))
+
+#define PKT_RX_L4_CKSUM_UNKNOWN 0
+#define PKT_RX_L4_CKSUM_BAD     (1ULL << 3)
+#define PKT_RX_L4_CKSUM_GOOD    (1ULL << 8)
+#define PKT_RX_L4_CKSUM_NONE    ((1ULL << 3) | (1ULL << 8))
 
 #define PKT_RX_IEEE1588_PTP  (1ULL << 9)  /**< RX IEEE1588 L2 Ethernet PT Packet. */
 #define PKT_RX_IEEE1588_TMST (1ULL << 10) /**< RX IEEE1588 L2/L4 timestamped packet.*/
-- 
2.8.1


* [dpdk-dev] [PATCH v2 06/12] app/testpmd: fix checksum stats in csum engine
  2016-10-03  9:00 ` [dpdk-dev] [PATCH v2 00/12] net/virtio: add offload support Olivier Matz
                     ` (4 preceding siblings ...)
  2016-10-03  9:00   ` [dpdk-dev] [PATCH v2 05/12] mbuf: add new Rx checksum mbuf flags Olivier Matz
@ 2016-10-03  9:00   ` Olivier Matz
  2016-10-11 13:46     ` Maxime Coquelin
  2016-10-03  9:00   ` [dpdk-dev] [PATCH v2 07/12] mbuf: new flag for LRO Olivier Matz
                     ` (6 subsequent siblings)
  12 siblings, 1 reply; 97+ messages in thread
From: Olivier Matz @ 2016-10-03  9:00 UTC (permalink / raw)
  To: dev, yuanhan.liu
  Cc: konstantin.ananyev, sugesh.chandran, bruce.richardson,
	jianfeng.tan, helin.zhang, adrien.mazarguil, stephen, dprovan,
	xiao.w.wang

---
 app/test-pmd/csumonly.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index d5eb260..8c88ee8 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -679,8 +679,10 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 		rx_ol_flags = m->ol_flags;
 
 		/* Update the L3/L4 checksum error packet statistics */
-		rx_bad_ip_csum += ((rx_ol_flags & PKT_RX_IP_CKSUM_BAD) != 0);
-		rx_bad_l4_csum += ((rx_ol_flags & PKT_RX_L4_CKSUM_BAD) != 0);
+		if ((rx_ol_flags & PKT_RX_IP_CKSUM_MASK) == PKT_RX_IP_CKSUM_BAD)
+			rx_bad_ip_csum += 1;
+		if ((rx_ol_flags & PKT_RX_L4_CKSUM_MASK) == PKT_RX_L4_CKSUM_BAD)
+			rx_bad_l4_csum += 1;
 
 		/* step 1: dissect packet, parsing optional vlan, ip4/ip6, vxlan
 		 * and inner headers */
-- 
2.8.1


* [dpdk-dev] [PATCH v2 07/12] mbuf: new flag for LRO
  2016-10-03  9:00 ` [dpdk-dev] [PATCH v2 00/12] net/virtio: add offload support Olivier Matz
                     ` (5 preceding siblings ...)
  2016-10-03  9:00   ` [dpdk-dev] [PATCH v2 06/12] app/testpmd: fix checksum stats in csum engine Olivier Matz
@ 2016-10-03  9:00   ` Olivier Matz
  2016-10-11 13:48     ` Maxime Coquelin
  2016-10-03  9:00   ` [dpdk-dev] [PATCH v2 08/12] app/testpmd: display lro segment size Olivier Matz
                     ` (5 subsequent siblings)
  12 siblings, 1 reply; 97+ messages in thread
From: Olivier Matz @ 2016-10-03  9:00 UTC (permalink / raw)
  To: dev, yuanhan.liu
  Cc: konstantin.ananyev, sugesh.chandran, bruce.richardson,
	jianfeng.tan, helin.zhang, adrien.mazarguil, stephen, dprovan,
	xiao.w.wang

When receiving coalesced packets in virtio, the original size of the
segments is provided. This is useful information because it allows
resegmenting with the same size.

Add a new RX flag in mbuf that can be set when packets are coalesced by
a hardware or virtual driver. It means that the m->tso_segsz field is
valid and is set to the segment size of the original packets.

This flag is used in the next commits, in the virtio PMD.
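
As an illustration (not part of this patch), here is a minimal sketch of
what an application could derive from the flag: the number of segments
obtained when resegmenting a coalesced packet with the original size.
The flag value is the one added by this patch; the function, its
parameters and the SKETCH_ name are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Same bit as PKT_RX_LRO in this patch, renamed to avoid a clash. */
#define SKETCH_RX_LRO (1ULL << 16)

/*
 * Number of segments produced when a received packet is split again
 * into pieces of segsz payload bytes. hdr_len is the total length of
 * the L2 + L3 + L4 headers. Returns 1 when the LRO flag is not set or
 * when the inputs do not allow resegmentation.
 */
static uint32_t
sketch_lro_nb_segs(uint64_t ol_flags, uint32_t pkt_len, uint32_t hdr_len,
		   uint16_t segsz)
{
	uint32_t payload;

	if (!(ol_flags & SKETCH_RX_LRO) || segsz == 0 || pkt_len <= hdr_len)
		return 1;

	payload = pkt_len - hdr_len;
	/* round up: the last segment may be shorter than segsz */
	return (payload + segsz - 1) / segsz;
}
```

For example, 4000 bytes of TCP payload behind 54 bytes of headers with
a segment size of 1448 gives 3 segments.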

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 doc/guides/rel_notes/release_16_11.rst | 5 +++++
 lib/librte_mbuf/rte_mbuf.c             | 2 ++
 lib/librte_mbuf/rte_mbuf.h             | 7 +++++++
 3 files changed, 14 insertions(+)

diff --git a/doc/guides/rel_notes/release_16_11.rst b/doc/guides/rel_notes/release_16_11.rst
index 2aff84c..a8ad9ab 100644
--- a/doc/guides/rel_notes/release_16_11.rst
+++ b/doc/guides/rel_notes/release_16_11.rst
@@ -66,6 +66,11 @@ New Features
   good, bad, or not present (useful for virtual drivers). This modification
   was done for IP and L4.
 
+* **Added an LRO mbuf flag.**
+
+  Added a new RX LRO mbuf flag, used when packets are coalesced. This
+  flag indicates that the segment size of original packets is known.
+
 Resolved Issues
 ---------------
 
diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index c55cb57..61bcd7e 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -317,6 +317,7 @@ const char *rte_get_rx_ol_flag_name(uint64_t mask)
 	case PKT_RX_IEEE1588_PTP: return "PKT_RX_IEEE1588_PTP";
 	case PKT_RX_IEEE1588_TMST: return "PKT_RX_IEEE1588_TMST";
 	case PKT_RX_QINQ_STRIPPED: return "PKT_RX_QINQ_STRIPPED";
+	case PKT_RX_LRO: return "PKT_RX_LRO";
 	default: return NULL;
 	}
 }
@@ -349,6 +350,7 @@ int rte_get_rx_ol_flag_list(uint64_t mask, char *buf, size_t buflen)
 		{ PKT_RX_IEEE1588_PTP, PKT_RX_IEEE1588_PTP, NULL },
 		{ PKT_RX_IEEE1588_TMST, PKT_RX_IEEE1588_TMST, NULL },
 		{ PKT_RX_QINQ_STRIPPED, PKT_RX_QINQ_STRIPPED, NULL },
+		{ PKT_RX_LRO, PKT_RX_LRO, NULL },
 	};
 	const char *name;
 	unsigned int i;
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 7061cfc..f9d7bfa 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -170,6 +170,13 @@ extern "C" {
  */
 #define PKT_RX_QINQ_PKT      PKT_RX_QINQ_STRIPPED
 
+/**
+ * When packets are coalesced by a hardware or virtual driver, this flag
+ * can be set in the RX mbuf, meaning that the m->tso_segsz field is
+ * valid and is set to the segment size of original packets.
+ */
+#define PKT_RX_LRO           (1ULL << 16)
+
 /* add new RX flags here */
 
 /* add new TX flags here */
-- 
2.8.1


* [dpdk-dev] [PATCH v2 08/12] app/testpmd: display lro segment size
  2016-10-03  9:00 ` [dpdk-dev] [PATCH v2 00/12] net/virtio: add offload support Olivier Matz
                     ` (6 preceding siblings ...)
  2016-10-03  9:00   ` [dpdk-dev] [PATCH v2 07/12] mbuf: new flag for LRO Olivier Matz
@ 2016-10-03  9:00   ` Olivier Matz
  2016-10-11 13:49     ` Maxime Coquelin
  2016-10-03  9:00   ` [dpdk-dev] [PATCH v2 09/12] virtio: add Rx checksum offload support Olivier Matz
                     ` (4 subsequent siblings)
  12 siblings, 1 reply; 97+ messages in thread
From: Olivier Matz @ 2016-10-03  9:00 UTC (permalink / raw)
  To: dev, yuanhan.liu
  Cc: konstantin.ananyev, sugesh.chandran, bruce.richardson,
	jianfeng.tan, helin.zhang, adrien.mazarguil, stephen, dprovan,
	xiao.w.wang

In the csumonly engine, display the LRO segment size if the
LRO flag is set.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 app/test-pmd/csumonly.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 8c88ee8..3f71595 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -792,6 +792,8 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 				"l4_proto=%d l4_len=%d flags=%s\n",
 				info.l2_len, rte_be_to_cpu_16(info.ethertype),
 				info.l3_len, info.l4_proto, info.l4_len, buf);
+			if (rx_ol_flags & PKT_RX_LRO)
+				printf("rx: m->lro_segsz=%u\n", m->tso_segsz);
 			if (info.is_tunnel == 1)
 				printf("rx: outer_l2_len=%d outer_ethertype=%x "
 					"outer_l3_len=%d\n", info.outer_l2_len,
-- 
2.8.1


* [dpdk-dev] [PATCH v2 09/12] virtio: add Rx checksum offload support
  2016-10-03  9:00 ` [dpdk-dev] [PATCH v2 00/12] net/virtio: add offload support Olivier Matz
                     ` (7 preceding siblings ...)
  2016-10-03  9:00   ` [dpdk-dev] [PATCH v2 08/12] app/testpmd: display lro segment size Olivier Matz
@ 2016-10-03  9:00   ` Olivier Matz
  2016-10-03 12:51     ` Maxime Coquelin
  2016-10-11 14:04     ` Maxime Coquelin
  2016-10-03  9:00   ` [dpdk-dev] [PATCH v2 10/12] virtio: add Tx " Olivier Matz
                     ` (3 subsequent siblings)
  12 siblings, 2 replies; 97+ messages in thread
From: Olivier Matz @ 2016-10-03  9:00 UTC (permalink / raw)
  To: dev, yuanhan.liu
  Cc: konstantin.ananyev, sugesh.chandran, bruce.richardson,
	jianfeng.tan, helin.zhang, adrien.mazarguil, stephen, dprovan,
	xiao.w.wang

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 drivers/net/virtio/virtio_ethdev.c | 14 ++++----
 drivers/net/virtio/virtio_ethdev.h |  2 +-
 drivers/net/virtio/virtio_rxtx.c   | 69 ++++++++++++++++++++++++++++++++++++++
 drivers/net/virtio/virtqueue.h     |  1 +
 4 files changed, 78 insertions(+), 8 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
index fa56032..43cb096 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -1262,7 +1262,7 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
 	eth_dev->data->dev_flags = dev_flags;
 
 	/* reset device and negotiate default features */
-	ret = virtio_init_device(eth_dev, VIRTIO_PMD_GUEST_FEATURES);
+	ret = virtio_init_device(eth_dev, VIRTIO_PMD_DEFAULT_GUEST_FEATURES);
 	if (ret < 0)
 		return ret;
 
@@ -1351,13 +1351,10 @@ virtio_dev_configure(struct rte_eth_dev *dev)
 	int ret;
 
 	PMD_INIT_LOG(DEBUG, "configure");
+	req_features = VIRTIO_PMD_DEFAULT_GUEST_FEATURES;
+	if (rxmode->hw_ip_checksum)
+		req_features |= (1ULL << VIRTIO_NET_F_GUEST_CSUM);
 
-	if (rxmode->hw_ip_checksum) {
-		PMD_DRV_LOG(ERR, "HW IP checksum not supported");
-		return -EINVAL;
-	}
-
-	req_features = VIRTIO_PMD_GUEST_FEATURES;
 	/* if request features changed, reinit the device */
 	if (req_features != hw->req_guest_features) {
 		ret = virtio_init_device(dev, req_features);
@@ -1578,6 +1575,9 @@ virtio_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	dev_info->default_txconf = (struct rte_eth_txconf) {
 		.txq_flags = ETH_TXQ_FLAGS_NOOFFLOADS
 	};
+	dev_info->rx_offload_capa =
+		DEV_RX_OFFLOAD_TCP_CKSUM |
+		DEV_RX_OFFLOAD_UDP_CKSUM;
 }
 
 /*
diff --git a/drivers/net/virtio/virtio_ethdev.h b/drivers/net/virtio/virtio_ethdev.h
index 5d5e788..2fc9218 100644
--- a/drivers/net/virtio/virtio_ethdev.h
+++ b/drivers/net/virtio/virtio_ethdev.h
@@ -54,7 +54,7 @@
 #define VIRTIO_MAX_RX_PKTLEN  9728
 
 /* Features desired/implemented by this driver. */
-#define VIRTIO_PMD_GUEST_FEATURES		\
+#define VIRTIO_PMD_DEFAULT_GUEST_FEATURES	\
 	(1u << VIRTIO_NET_F_MAC		  |	\
 	 1u << VIRTIO_NET_F_STATUS	  |	\
 	 1u << VIRTIO_NET_F_MQ		  |	\
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 724517e..eda678a 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -50,6 +50,7 @@
 #include <rte_string_fns.h>
 #include <rte_errno.h>
 #include <rte_byteorder.h>
+#include <rte_net.h>
 
 #include "virtio_logs.h"
 #include "virtio_ethdev.h"
@@ -627,6 +628,56 @@ virtio_update_packet_stats(struct virtnet_stats *stats, struct rte_mbuf *mbuf)
 	}
 }
 
+/* Optionally fill offload information in structure */
+static int
+virtio_rx_offload(struct rte_mbuf *m, struct virtio_net_hdr *hdr)
+{
+	struct rte_net_hdr_lens hdr_lens;
+	uint32_t hdrlen, ptype;
+	int l4_supported = 0;
+
+	/* nothing to do */
+	if (hdr->flags == 0 && hdr->gso_type == VIRTIO_NET_HDR_GSO_NONE)
+		return 0;
+
+	m->ol_flags |= PKT_RX_IP_CKSUM_UNKNOWN;
+
+	ptype = rte_net_get_ptype(m, &hdr_lens, RTE_PTYPE_ALL_MASK);
+	m->packet_type = ptype;
+	if ((ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_TCP ||
+	    (ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_UDP ||
+	    (ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_SCTP)
+		l4_supported = 1;
+
+	if (hdr->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) {
+		hdrlen = hdr_lens.l2_len + hdr_lens.l3_len + hdr_lens.l4_len;
+		if (hdr->csum_start <= hdrlen && l4_supported) {
+			m->ol_flags |= PKT_RX_L4_CKSUM_NONE;
+		} else {
+			/* Unknown proto or tunnel, do sw cksum. We can assume
+			 * the cksum field is in the first segment since the
+			 * buffers we provided to the host are large enough.
+			 * In case of SCTP, this will be wrong since it's a CRC
+			 * but there's nothing we can do.
+			 */
+			uint16_t csum, off;
+
+			csum = rte_raw_cksum_mbuf(m, hdr->csum_start,
+				rte_pktmbuf_pkt_len(m) - hdr->csum_start);
+			if (csum != 0xffff)
+				csum = ~csum;
+			off = hdr->csum_offset + hdr->csum_start;
+			if (rte_pktmbuf_data_len(m) >= off + 2)
+				*rte_pktmbuf_mtod_offset(m, uint16_t *,
+					off) = csum;
+		}
+	} else if (hdr->flags & VIRTIO_NET_HDR_F_DATA_VALID && l4_supported) {
+		m->ol_flags |= PKT_RX_L4_CKSUM_GOOD;
+	}
+
+	return 0;
+}
+
 #define VIRTIO_MBUF_BURST_SZ 64
 #define DESC_PER_CACHELINE (RTE_CACHE_LINE_SIZE / sizeof(struct vring_desc))
 uint16_t
@@ -642,6 +693,7 @@ virtio_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 	int error;
 	uint32_t i, nb_enqueued;
 	uint32_t hdr_size;
+	struct virtio_net_hdr *hdr;
 
 	nb_used = VIRTQUEUE_NUSED(vq);
 
@@ -683,9 +735,19 @@ virtio_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 		rxm->pkt_len = (uint32_t)(len[i] - hdr_size);
 		rxm->data_len = (uint16_t)(len[i] - hdr_size);
 
+		hdr = (struct virtio_net_hdr *)((char *)rxm->buf_addr +
+			RTE_PKTMBUF_HEADROOM - hdr_size);
+
 		if (hw->vlan_strip)
 			rte_vlan_strip(rxm);
 
+		/* Update offload features */
+		if (virtio_rx_offload(rxm, hdr) < 0) {
+			virtio_discard_rxbuf(vq, rxm);
+			rxvq->stats.errors++;
+			continue;
+		}
+
 		VIRTIO_DUMP_PACKET(rxm, rxm->data_len);
 
 		rx_pkts[nb_rx++] = rxm;
@@ -805,6 +867,13 @@ virtio_recv_mergeable_pkts(void *rx_queue,
 		rx_pkts[nb_rx] = rxm;
 		prev = rxm;
 
+		/* Update offload features */
+		if (virtio_rx_offload(rxm, &header->hdr) < 0) {
+			virtio_discard_rxbuf(vq, rxm);
+			rxvq->stats.errors++;
+			continue;
+		}
+
 		seg_res = seg_num - 1;
 
 		while (seg_res != 0) {
diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h
index 6737b81..ef0027b 100644
--- a/drivers/net/virtio/virtqueue.h
+++ b/drivers/net/virtio/virtqueue.h
@@ -223,6 +223,7 @@ struct virtqueue {
  */
 struct virtio_net_hdr {
 #define VIRTIO_NET_HDR_F_NEEDS_CSUM 1    /**< Use csum_start,csum_offset*/
+#define VIRTIO_NET_HDR_F_DATA_VALID 2    /**< Checksum is valid */
 	uint8_t flags;
 #define VIRTIO_NET_HDR_GSO_NONE     0    /**< Not a GSO frame */
 #define VIRTIO_NET_HDR_GSO_TCPV4    1    /**< GSO frame, IPv4 TCP (TSO) */
-- 
2.8.1


* [dpdk-dev] [PATCH v2 10/12] virtio: add Tx checksum offload support
  2016-10-03  9:00 ` [dpdk-dev] [PATCH v2 00/12] net/virtio: add offload support Olivier Matz
                     ` (8 preceding siblings ...)
  2016-10-03  9:00   ` [dpdk-dev] [PATCH v2 09/12] virtio: add Rx checksum offload support Olivier Matz
@ 2016-10-03  9:00   ` Olivier Matz
  2016-10-07  7:25     ` Maxime Coquelin
  2016-10-13  8:38     ` Yuanhan Liu
  2016-10-03  9:00   ` [dpdk-dev] [PATCH v2 11/12] virtio: add Lro support Olivier Matz
                     ` (2 subsequent siblings)
  12 siblings, 2 replies; 97+ messages in thread
From: Olivier Matz @ 2016-10-03  9:00 UTC (permalink / raw)
  To: dev, yuanhan.liu
  Cc: konstantin.ananyev, sugesh.chandran, bruce.richardson,
	jianfeng.tan, helin.zhang, adrien.mazarguil, stephen, dprovan,
	xiao.w.wang
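
For reference, the csum_offset values used in the Tx path of this patch
(6 for UDP, 16 for TCP) are the byte offsets of the checksum field
inside the UDP and TCP headers, and csum_start is l2_len + l3_len. A
minimal sketch, not part of this patch: the sketch_ structs and the
function are hypothetical stand-ins that mirror the standard header
layouts so that the example builds without the DPDK headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical local mirrors of the UDP and TCP header layouts. */
struct sketch_udp_hdr {
	uint16_t src_port;
	uint16_t dst_port;
	uint16_t dgram_len;
	uint16_t dgram_cksum;
};

struct sketch_tcp_hdr {
	uint16_t src_port;
	uint16_t dst_port;
	uint32_t sent_seq;
	uint32_t recv_ack;
	uint8_t  data_off;
	uint8_t  tcp_flags;
	uint16_t rx_win;
	uint16_t cksum;
	uint16_t tcp_urp;
};

/* Hypothetical subset of the virtio net header fields used for Tx csum. */
struct sketch_csum_req {
	uint8_t  needs_csum;   /* 1 if the host must compute the checksum */
	uint16_t csum_start;   /* offset where checksumming starts */
	uint16_t csum_offset;  /* checksum field offset from csum_start */
};

static struct sketch_csum_req
sketch_tx_csum_req(uint8_t l4_proto, uint16_t l2_len, uint16_t l3_len)
{
	struct sketch_csum_req req = { 0, 0, 0 };

	switch (l4_proto) {
	case 17: /* UDP */
		req.csum_offset = offsetof(struct sketch_udp_hdr, dgram_cksum);
		break;
	case 6: /* TCP */
		req.csum_offset = offsetof(struct sketch_tcp_hdr, cksum);
		break;
	default: /* no offload requested: leave everything at 0 */
		return req;
	}
	req.needs_csum = 1;
	req.csum_start = l2_len + l3_len;
	return req;
}
```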

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 drivers/net/virtio/virtio_ethdev.c |  7 +++++
 drivers/net/virtio/virtio_ethdev.h |  1 +
 drivers/net/virtio/virtio_rxtx.c   | 57 +++++++++++++++++++++++++-------------
 3 files changed, 45 insertions(+), 20 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
index 43cb096..55024cd 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -1578,6 +1578,13 @@ virtio_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	dev_info->rx_offload_capa =
 		DEV_RX_OFFLOAD_TCP_CKSUM |
 		DEV_RX_OFFLOAD_UDP_CKSUM;
+	dev_info->tx_offload_capa = 0;
+
+	if (hw->guest_features & (1ULL << VIRTIO_NET_F_CSUM)) {
+		dev_info->tx_offload_capa |=
+			DEV_TX_OFFLOAD_UDP_CKSUM |
+			DEV_TX_OFFLOAD_TCP_CKSUM;
+	}
 }
 
 /*
diff --git a/drivers/net/virtio/virtio_ethdev.h b/drivers/net/virtio/virtio_ethdev.h
index 2fc9218..202aa2e 100644
--- a/drivers/net/virtio/virtio_ethdev.h
+++ b/drivers/net/virtio/virtio_ethdev.h
@@ -62,6 +62,7 @@
 	 1u << VIRTIO_NET_F_CTRL_VQ	  |	\
 	 1u << VIRTIO_NET_F_CTRL_RX	  |	\
 	 1u << VIRTIO_NET_F_CTRL_VLAN	  |	\
+	 1u << VIRTIO_NET_F_CSUM	  |	\
 	 1u << VIRTIO_NET_F_MRG_RXBUF	  |	\
 	 1ULL << VIRTIO_F_VERSION_1)
 
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index eda678a..4ae11e7 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -213,13 +213,14 @@ static inline void
 virtqueue_enqueue_xmit(struct virtnet_tx *txvq, struct rte_mbuf *cookie,
 		       uint16_t needed, int use_indirect, int can_push)
 {
+	struct virtio_tx_region *txr = txvq->virtio_net_hdr_mz->addr;
 	struct vq_desc_extra *dxp;
 	struct virtqueue *vq = txvq->vq;
 	struct vring_desc *start_dp;
 	uint16_t seg_num = cookie->nb_segs;
 	uint16_t head_idx, idx;
 	uint16_t head_size = vq->hw->vtnet_hdr_size;
-	unsigned long offs;
+	struct virtio_net_hdr *hdr;
 
 	head_idx = vq->vq_desc_head_idx;
 	idx = head_idx;
@@ -230,10 +231,9 @@ virtqueue_enqueue_xmit(struct virtnet_tx *txvq, struct rte_mbuf *cookie,
 	start_dp = vq->vq_ring.desc;
 
 	if (can_push) {
-		/* put on zero'd transmit header (no offloads) */
-		void *hdr = rte_pktmbuf_prepend(cookie, head_size);
-
-		memset(hdr, 0, head_size);
+		/* prepend cannot fail, checked by caller */
+		hdr = (struct virtio_net_hdr *)
+			rte_pktmbuf_prepend(cookie, head_size);
 	} else if (use_indirect) {
 		/* setup tx ring slot to point to indirect
 		 * descriptor list stored in reserved region.
@@ -241,14 +241,11 @@ virtqueue_enqueue_xmit(struct virtnet_tx *txvq, struct rte_mbuf *cookie,
 		 * the first slot in indirect ring is already preset
 		 * to point to the header in reserved region
 		 */
-		struct virtio_tx_region *txr = txvq->virtio_net_hdr_mz->addr;
-
-		offs = idx * sizeof(struct virtio_tx_region)
-			+ offsetof(struct virtio_tx_region, tx_indir);
-
-		start_dp[idx].addr  = txvq->virtio_net_hdr_mem + offs;
+		start_dp[idx].addr  = txvq->virtio_net_hdr_mem +
+			RTE_PTR_DIFF(&txr[idx].tx_indir, txr);
 		start_dp[idx].len   = (seg_num + 1) * sizeof(struct vring_desc);
 		start_dp[idx].flags = VRING_DESC_F_INDIRECT;
+		hdr = (struct virtio_net_hdr *)&txr[idx].tx_hdr;
 
 		/* loop below will fill in rest of the indirect elements */
 		start_dp = txr[idx].tx_indir;
@@ -257,15 +254,40 @@ virtqueue_enqueue_xmit(struct virtnet_tx *txvq, struct rte_mbuf *cookie,
 		/* setup first tx ring slot to point to header
 		 * stored in reserved region.
 		 */
-		offs = idx * sizeof(struct virtio_tx_region)
-			+ offsetof(struct virtio_tx_region, tx_hdr);
-
-		start_dp[idx].addr  = txvq->virtio_net_hdr_mem + offs;
+		start_dp[idx].addr  = txvq->virtio_net_hdr_mem +
+			RTE_PTR_DIFF(&txr[idx].tx_hdr, txr);
 		start_dp[idx].len   = vq->hw->vtnet_hdr_size;
 		start_dp[idx].flags = VRING_DESC_F_NEXT;
+		hdr = (struct virtio_net_hdr *)&txr[idx].tx_hdr;
+
 		idx = start_dp[idx].next;
 	}
 
+	/* Checksum Offload */
+	switch (cookie->ol_flags & PKT_TX_L4_MASK) {
+	case PKT_TX_UDP_CKSUM:
+		hdr->csum_start = cookie->l2_len + cookie->l3_len;
+		hdr->csum_offset = 6;
+		hdr->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
+		break;
+
+	case PKT_TX_TCP_CKSUM:
+		hdr->csum_start = cookie->l2_len + cookie->l3_len;
+		hdr->csum_offset = 16;
+		hdr->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
+		break;
+
+	default:
+		hdr->csum_start = 0;
+		hdr->csum_offset = 0;
+		hdr->flags = 0;
+		break;
+	}
+
+	hdr->gso_type = 0;
+	hdr->gso_size = 0;
+	hdr->hdr_len = 0;
+
 	do {
 		start_dp[idx].addr  = VIRTIO_MBUF_DATA_DMA_ADDR(cookie, vq);
 		start_dp[idx].len   = cookie->data_len;
@@ -512,11 +534,6 @@ virtio_dev_tx_queue_setup(struct rte_eth_dev *dev,
 
 	PMD_INIT_FUNC_TRACE();
 
-	if ((tx_conf->txq_flags & ETH_TXQ_FLAGS_NOXSUMS)
-	    != ETH_TXQ_FLAGS_NOXSUMS) {
-		PMD_INIT_LOG(ERR, "TX checksum offload not supported\n");
-		return -EINVAL;
-	}
 
 #ifdef RTE_MACHINE_CPUFLAG_SSSE3
 	/* Use simple rx/tx func if single segment and no offloads */
-- 
2.8.1


* [dpdk-dev] [PATCH v2 11/12] virtio: add Lro support
  2016-10-03  9:00 ` [dpdk-dev] [PATCH v2 00/12] net/virtio: add offload support Olivier Matz
                     ` (9 preceding siblings ...)
  2016-10-03  9:00   ` [dpdk-dev] [PATCH v2 10/12] virtio: add Tx " Olivier Matz
@ 2016-10-03  9:00   ` Olivier Matz
  2016-10-11 14:21     ` Maxime Coquelin
  2016-10-03  9:00   ` [dpdk-dev] [PATCH v2 12/12] virtio: add Tso support Olivier Matz
  2016-10-11 11:35   ` [dpdk-dev] [PATCH v2 00/12] net/virtio: add offload support Yuanhan Liu
  12 siblings, 1 reply; 97+ messages in thread
From: Olivier Matz @ 2016-10-03  9:00 UTC (permalink / raw)
  To: dev, yuanhan.liu
  Cc: konstantin.ananyev, sugesh.chandran, bruce.richardson,
	jianfeng.tan, helin.zhang, adrien.mazarguil, stephen, dprovan,
	xiao.w.wang

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 drivers/net/virtio/virtio_ethdev.c |  7 ++++++-
 drivers/net/virtio/virtio_ethdev.h |  9 ---------
 drivers/net/virtio/virtio_rxtx.c   | 21 +++++++++++++++++++++
 3 files changed, 27 insertions(+), 10 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
index 55024cd..fd33364 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -1354,6 +1354,10 @@ virtio_dev_configure(struct rte_eth_dev *dev)
 	req_features = VIRTIO_PMD_DEFAULT_GUEST_FEATURES;
 	if (rxmode->hw_ip_checksum)
 		req_features |= (1ULL << VIRTIO_NET_F_GUEST_CSUM);
+	if (rxmode->enable_lro)
+		req_features |=
+			(1ULL << VIRTIO_NET_F_GUEST_TSO4) |
+			(1ULL << VIRTIO_NET_F_GUEST_TSO6);
 
 	/* if request features changed, reinit the device */
 	if (req_features != hw->req_guest_features) {
@@ -1577,7 +1581,8 @@ virtio_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	};
 	dev_info->rx_offload_capa =
 		DEV_RX_OFFLOAD_TCP_CKSUM |
-		DEV_RX_OFFLOAD_UDP_CKSUM;
+		DEV_RX_OFFLOAD_UDP_CKSUM |
+		DEV_RX_OFFLOAD_TCP_LRO;
 	dev_info->tx_offload_capa = 0;
 
 	if (hw->guest_features & (1ULL << VIRTIO_NET_F_CSUM)) {
diff --git a/drivers/net/virtio/virtio_ethdev.h b/drivers/net/virtio/virtio_ethdev.h
index 202aa2e..daa6bff 100644
--- a/drivers/net/virtio/virtio_ethdev.h
+++ b/drivers/net/virtio/virtio_ethdev.h
@@ -116,13 +116,4 @@ uint16_t virtio_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 int eth_virtio_dev_init(struct rte_eth_dev *eth_dev);
 
-/*
- * The VIRTIO_NET_F_GUEST_TSO[46] features permit the host to send us
- * frames larger than 1514 bytes. We do not yet support software LRO
- * via tcp_lro_rx().
- */
-#define VTNET_LRO_FEATURES (VIRTIO_NET_F_GUEST_TSO4 | \
-			    VIRTIO_NET_F_GUEST_TSO6 | VIRTIO_NET_F_GUEST_ECN)
-
-
 #endif /* _VIRTIO_ETHDEV_H_ */
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 4ae11e7..0464bd1 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -692,6 +692,27 @@ virtio_rx_offload(struct rte_mbuf *m, struct virtio_net_hdr *hdr)
 		m->ol_flags |= PKT_RX_L4_CKSUM_GOOD;
 	}
 
+	/* GSO request, save required information in mbuf */
+	if (hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE) {
+		/* Check unsupported modes */
+		if ((hdr->gso_type & VIRTIO_NET_HDR_GSO_ECN) ||
+		    (hdr->gso_size == 0)) {
+			return -EINVAL;
+		}
+
+		/* Update mss lengths in mbuf */
+		m->tso_segsz = hdr->gso_size;
+		switch (hdr->gso_type & ~VIRTIO_NET_HDR_GSO_ECN) {
+			case VIRTIO_NET_HDR_GSO_TCPV4:
+			case VIRTIO_NET_HDR_GSO_TCPV6:
+				m->ol_flags |= PKT_RX_LRO | \
+					PKT_RX_L4_CKSUM_NONE;
+				break;
+			default:
+				return -EINVAL;
+		}
+	}
+
 	return 0;
 }
 
-- 
2.8.1


* [dpdk-dev] [PATCH v2 12/12] virtio: add Tso support
  2016-10-03  9:00 ` [dpdk-dev] [PATCH v2 00/12] net/virtio: add offload support Olivier Matz
                     ` (10 preceding siblings ...)
  2016-10-03  9:00   ` [dpdk-dev] [PATCH v2 11/12] virtio: add Lro support Olivier Matz
@ 2016-10-03  9:00   ` Olivier Matz
  2016-10-13  8:18     ` Yuanhan Liu
  2016-10-11 11:35   ` [dpdk-dev] [PATCH v2 00/12] net/virtio: add offload support Yuanhan Liu
  12 siblings, 1 reply; 97+ messages in thread
From: Olivier Matz @ 2016-10-03  9:00 UTC (permalink / raw)
  To: dev, yuanhan.liu
  Cc: konstantin.ananyev, sugesh.chandran, bruce.richardson,
	jianfeng.tan, helin.zhang, adrien.mazarguil, stephen, dprovan,
	xiao.w.wang

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 drivers/net/virtio/virtio_ethdev.c |   6 ++
 drivers/net/virtio/virtio_ethdev.h |   2 +
 drivers/net/virtio/virtio_rxtx.c   | 129 ++++++++++++++++++++++++++++++++++++-
 3 files changed, 134 insertions(+), 3 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
index fd33364..5728ca1 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -1563,6 +1563,7 @@ virtio_dev_link_update(struct rte_eth_dev *dev, __rte_unused int wait_to_complet
 static void
 virtio_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 {
+	uint64_t tso_mask;
 	struct virtio_hw *hw = dev->data->dev_private;
 
 	if (dev->pci_dev)
@@ -1590,6 +1591,11 @@ virtio_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 			DEV_TX_OFFLOAD_UDP_CKSUM |
 			DEV_TX_OFFLOAD_TCP_CKSUM;
 	}
+
+	tso_mask = (1ULL << VIRTIO_NET_F_HOST_TSO4) |
+		(1ULL << VIRTIO_NET_F_HOST_TSO6);
+	if ((hw->guest_features & tso_mask) == tso_mask)
+		dev_info->tx_offload_capa |= DEV_TX_OFFLOAD_TCP_TSO;
 }
 
 /*
diff --git a/drivers/net/virtio/virtio_ethdev.h b/drivers/net/virtio/virtio_ethdev.h
index daa6bff..ab3b138 100644
--- a/drivers/net/virtio/virtio_ethdev.h
+++ b/drivers/net/virtio/virtio_ethdev.h
@@ -63,6 +63,8 @@
 	 1u << VIRTIO_NET_F_CTRL_RX	  |	\
 	 1u << VIRTIO_NET_F_CTRL_VLAN	  |	\
 	 1u << VIRTIO_NET_F_CSUM	  |	\
+	 1u << VIRTIO_NET_F_HOST_TSO4	  |	\
+	 1u << VIRTIO_NET_F_HOST_TSO6	  |	\
 	 1u << VIRTIO_NET_F_MRG_RXBUF	  |	\
 	 1ULL << VIRTIO_F_VERSION_1)
 
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 0464bd1..134995e 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -51,6 +51,8 @@
 #include <rte_errno.h>
 #include <rte_byteorder.h>
 #include <rte_net.h>
+#include <rte_ip.h>
+#include <rte_tcp.h>
 
 #include "virtio_logs.h"
 #include "virtio_ethdev.h"
@@ -209,6 +211,111 @@ virtqueue_enqueue_recv_refill(struct virtqueue *vq, struct rte_mbuf *cookie)
 	return 0;
 }
 
+/* When doing TSO, the IP length is not included in the pseudo header
+ * checksum of the packet given to the PMD, but for virtio it is
+ * expected.
+ */
+static void
+virtio_tso_fix_cksum(struct rte_mbuf *m)
+{
+	/* common case: header is not fragmented */
+	if (likely(rte_pktmbuf_data_len(m) >= m->l2_len + m->l3_len +
+			m->l4_len)) {
+		struct ipv4_hdr *iph;
+		struct ipv6_hdr *ip6h;
+		struct tcp_hdr *th;
+		uint16_t prev_cksum, new_cksum, ip_len, ip_paylen;
+		uint32_t tmp;
+
+		iph = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *, m->l2_len);
+		th = RTE_PTR_ADD(iph, m->l3_len);
+		if ((iph->version_ihl >> 4) == 4) {
+			iph->hdr_checksum = 0;
+			iph->hdr_checksum = rte_ipv4_cksum(iph);
+			ip_len = iph->total_length;
+			ip_paylen = rte_cpu_to_be_16(rte_be_to_cpu_16(ip_len) -
+				m->l3_len);
+		} else {
+			ip6h = (struct ipv6_hdr *)iph;
+			ip_paylen = ip6h->payload_len;
+		}
+
+		/* calculate the new phdr checksum, now including ip_paylen */
+		prev_cksum = th->cksum;
+		tmp = prev_cksum;
+		tmp += ip_paylen;
+		tmp = (tmp & 0xffff) + (tmp >> 16);
+		new_cksum = tmp;
+
+		/* replace it in the packet */
+		th->cksum = new_cksum;
+	} else {
+		const struct ipv4_hdr *iph;
+		struct ipv4_hdr iph_copy;
+		union {
+			uint16_t u16;
+			uint8_t u8[2];
+		} prev_cksum, new_cksum, ip_len, ip_paylen, ip_csum;
+		uint32_t tmp;
+
+		/* Same code as above, but we use rte_pktmbuf_read()
+		 * or we read/write in mbuf data one byte at a time to
+		 * avoid issues if the packet is multi-segmented.
+		 */
+
+		uint8_t ip_version;
+
+		ip_version = *rte_pktmbuf_mtod_offset(m, uint8_t *,
+			m->l2_len) >> 4;
+
+		/* calculate ip checksum (the API requires the field to be 0)
+		 * and get ip payload len */
+		if (ip_version == 4) {
+			*rte_pktmbuf_mtod_offset(m, uint8_t *,
+				m->l2_len + 10) = 0;
+			*rte_pktmbuf_mtod_offset(m, uint8_t *,
+				m->l2_len + 11) = 0;
+			iph = rte_pktmbuf_read(m, m->l2_len,
+				sizeof(*iph), &iph_copy);
+			ip_csum.u16 = rte_ipv4_cksum(iph);
+			*rte_pktmbuf_mtod_offset(m, uint8_t *,
+				m->l2_len + 10) = ip_csum.u8[0];
+			*rte_pktmbuf_mtod_offset(m, uint8_t *,
+				m->l2_len + 11) = ip_csum.u8[1];
+
+			ip_len.u8[0] = *rte_pktmbuf_mtod_offset(m, uint8_t *,
+				m->l2_len + 2);
+			ip_len.u8[1] = *rte_pktmbuf_mtod_offset(m, uint8_t *,
+				m->l2_len + 3);
+
+			ip_paylen.u16 = rte_cpu_to_be_16(
+				rte_be_to_cpu_16(ip_len.u16) - m->l3_len);
+		} else {
+			ip_paylen.u8[0] = *rte_pktmbuf_mtod_offset(m, uint8_t *,
+				m->l2_len + 4);
+			ip_paylen.u8[1] = *rte_pktmbuf_mtod_offset(m, uint8_t *,
+				m->l2_len + 5);
+		}
+
+		/* calculate the new phdr checksum, now including ip_paylen */
+		/* get phdr cksum at offset 16 of TCP header */
+		prev_cksum.u8[0] = *rte_pktmbuf_mtod_offset(m, uint8_t *,
+			m->l2_len + m->l3_len + 16);
+		prev_cksum.u8[1] = *rte_pktmbuf_mtod_offset(m, uint8_t *,
+			m->l2_len + m->l3_len + 17);
+		tmp = prev_cksum.u16;
+		tmp += ip_paylen.u16;
+		tmp = (tmp & 0xffff) + (tmp >> 16);
+		new_cksum.u16 = tmp;
+
+		/* replace it in the packet */
+		*rte_pktmbuf_mtod_offset(m, uint8_t *,
+			m->l2_len + m->l3_len + 16) = new_cksum.u8[0];
+		*rte_pktmbuf_mtod_offset(m, uint8_t *,
+			m->l2_len + m->l3_len + 17) = new_cksum.u8[1];
+	}
+}
+
 static inline void
 virtqueue_enqueue_xmit(struct virtnet_tx *txvq, struct rte_mbuf *cookie,
 		       uint16_t needed, int use_indirect, int can_push)
@@ -264,6 +371,9 @@ virtqueue_enqueue_xmit(struct virtnet_tx *txvq, struct rte_mbuf *cookie,
 	}
 
 	/* Checksum Offload */
+	if (cookie->ol_flags & PKT_TX_TCP_SEG)
+		cookie->ol_flags |= PKT_TX_TCP_CKSUM;
+
 	switch (cookie->ol_flags & PKT_TX_L4_MASK) {
 	case PKT_TX_UDP_CKSUM:
 		hdr->csum_start = cookie->l2_len + cookie->l3_len;
@@ -284,9 +394,22 @@ virtqueue_enqueue_xmit(struct virtnet_tx *txvq, struct rte_mbuf *cookie,
 		break;
 	}
 
-	hdr->gso_type = 0;
-	hdr->gso_size = 0;
-	hdr->hdr_len = 0;
+	/* TCP Segmentation Offload */
+	if (cookie->ol_flags & PKT_TX_TCP_SEG) {
+		virtio_tso_fix_cksum(cookie);
+		hdr->gso_type = (cookie->ol_flags & PKT_TX_IPV6) ?
+			VIRTIO_NET_HDR_GSO_TCPV6 :
+			VIRTIO_NET_HDR_GSO_TCPV4;
+		hdr->gso_size = cookie->tso_segsz;
+		hdr->hdr_len =
+			cookie->l2_len +
+			cookie->l3_len +
+			cookie->l4_len;
+	} else {
+		hdr->gso_type = 0;
+		hdr->gso_size = 0;
+		hdr->hdr_len = 0;
+	}
 
 	do {
 		start_dp[idx].addr  = VIRTIO_MBUF_DATA_DMA_ADDR(cookie, vq);
-- 
2.8.1
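
The adjustment made by virtio_tso_fix_cksum() relies on a property of
one's-complement arithmetic: adding the L4 length to a pseudo-header sum
that was computed with a zero length gives the same value as summing with
the length from the start. Below is a minimal standalone sketch of that
property, in host byte order and without any DPDK dependency. The helper
names (csum_fold, csum_add, phdr_sum_v4) are made up for this illustration
and are not DPDK API.

```c
#include <stdint.h>

/* Fold a 32-bit accumulator into 16 bits with end-around carry. */
static uint16_t
csum_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* One's-complement addition of two 16-bit partial sums. */
static uint16_t
csum_add(uint16_t a, uint16_t b)
{
	return csum_fold((uint32_t)a + b);
}

/* Non-complemented IPv4 pseudo-header sum (host byte order):
 * source address, destination address, protocol and L4 length.
 */
static uint16_t
phdr_sum_v4(uint32_t src, uint32_t dst, uint8_t proto, uint16_t l4_len)
{
	uint32_t sum = 0;

	sum += src >> 16;
	sum += src & 0xffff;
	sum += dst >> 16;
	sum += dst & 0xffff;
	sum += proto;
	sum += l4_len;
	return csum_fold(sum);
}
```

With these helpers, csum_add(phdr_sum_v4(src, dst, 6, 0), len) equals
phdr_sum_v4(src, dst, 6, len), which is the correction the patch applies
to th->cksum. The patch can add the big-endian values directly because a
one's-complement sum does not depend on byte order (RFC 1071).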

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [dpdk-dev] [PATCH v2 09/12] virtio: add Rx checksum offload support
  2016-10-03  9:00   ` [dpdk-dev] [PATCH v2 09/12] virtio: add Rx checksum offload support Olivier Matz
@ 2016-10-03 12:51     ` Maxime Coquelin
  2016-10-05 11:56       ` Olivier Matz
  2016-10-11 14:04     ` Maxime Coquelin
  1 sibling, 1 reply; 97+ messages in thread
From: Maxime Coquelin @ 2016-10-03 12:51 UTC (permalink / raw)
  To: Olivier Matz, dev, yuanhan.liu
  Cc: konstantin.ananyev, sugesh.chandran, bruce.richardson,
	jianfeng.tan, helin.zhang, adrien.mazarguil, stephen, dprovan,
	xiao.w.wang

Hi Olivier,


On 10/03/2016 11:00 AM, Olivier Matz wrote:
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> ---
>  drivers/net/virtio/virtio_ethdev.c | 14 ++++----
>  drivers/net/virtio/virtio_ethdev.h |  2 +-
>  drivers/net/virtio/virtio_rxtx.c   | 69 ++++++++++++++++++++++++++++++++++++++
>  drivers/net/virtio/virtqueue.h     |  1 +
>  4 files changed, 78 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
> index fa56032..43cb096 100644
> --- a/drivers/net/virtio/virtio_ethdev.c
> +++ b/drivers/net/virtio/virtio_ethdev.c
> @@ -1262,7 +1262,7 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
>  	eth_dev->data->dev_flags = dev_flags;
>
>  	/* reset device and negotiate default features */
> -	ret = virtio_init_device(eth_dev, VIRTIO_PMD_GUEST_FEATURES);
> +	ret = virtio_init_device(eth_dev, VIRTIO_PMD_DEFAULT_GUEST_FEATURES);
>  	if (ret < 0)
>  		return ret;
>
> @@ -1351,13 +1351,10 @@ virtio_dev_configure(struct rte_eth_dev *dev)
>  	int ret;
>
>  	PMD_INIT_LOG(DEBUG, "configure");
> +	req_features = VIRTIO_PMD_DEFAULT_GUEST_FEATURES;
> +	if (rxmode->hw_ip_checksum)
> +		req_features |= (1ULL << VIRTIO_NET_F_GUEST_CSUM);
>
> -	if (rxmode->hw_ip_checksum) {
> -		PMD_DRV_LOG(ERR, "HW IP checksum not supported");
> -		return -EINVAL;
> -	}
> -
> -	req_features = VIRTIO_PMD_GUEST_FEATURES;
>  	/* if request features changed, reinit the device */
>  	if (req_features != hw->req_guest_features) {
>  		ret = virtio_init_device(dev, req_features);
> @@ -1578,6 +1575,9 @@ virtio_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
>  	dev_info->default_txconf = (struct rte_eth_txconf) {
>  		.txq_flags = ETH_TXQ_FLAGS_NOOFFLOADS
>  	};
> +	dev_info->rx_offload_capa =
> +		DEV_RX_OFFLOAD_TCP_CKSUM |
> +		DEV_RX_OFFLOAD_UDP_CKSUM;
>  }
>
>  /*
> diff --git a/drivers/net/virtio/virtio_ethdev.h b/drivers/net/virtio/virtio_ethdev.h
> index 5d5e788..2fc9218 100644
> --- a/drivers/net/virtio/virtio_ethdev.h
> +++ b/drivers/net/virtio/virtio_ethdev.h
> @@ -54,7 +54,7 @@
>  #define VIRTIO_MAX_RX_PKTLEN  9728
>
>  /* Features desired/implemented by this driver. */
> -#define VIRTIO_PMD_GUEST_FEATURES		\
> +#define VIRTIO_PMD_DEFAULT_GUEST_FEATURES	\
>  	(1u << VIRTIO_NET_F_MAC		  |	\
>  	 1u << VIRTIO_NET_F_STATUS	  |	\
>  	 1u << VIRTIO_NET_F_MQ		  |	\
> diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
> index 724517e..eda678a 100644
> --- a/drivers/net/virtio/virtio_rxtx.c
> +++ b/drivers/net/virtio/virtio_rxtx.c
> @@ -50,6 +50,7 @@
>  #include <rte_string_fns.h>
>  #include <rte_errno.h>
>  #include <rte_byteorder.h>
> +#include <rte_net.h>
>
>  #include "virtio_logs.h"
>  #include "virtio_ethdev.h"
> @@ -627,6 +628,56 @@ virtio_update_packet_stats(struct virtnet_stats *stats, struct rte_mbuf *mbuf)
>  	}
>  }
>
> +/* Optionally fill offload information in structure */
> +static int
> +virtio_rx_offload(struct rte_mbuf *m, struct virtio_net_hdr *hdr)
> +{
> +	struct rte_net_hdr_lens hdr_lens;
> +	uint32_t hdrlen, ptype;
> +	int l4_supported = 0;
> +
> +	/* nothing to do */
> +	if (hdr->flags == 0 && hdr->gso_type == VIRTIO_NET_HDR_GSO_NONE)
> +		return 0;
Maybe we could first check whether offload features were negotiated?
Doing this, we could return before accessing the header and so avoid a
cache miss.

Maxime

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [dpdk-dev] [PATCH v2 09/12] virtio: add Rx checksum offload support
  2016-10-03 12:51     ` Maxime Coquelin
@ 2016-10-05 11:56       ` Olivier Matz
  2016-10-05 13:27         ` Maxime Coquelin
  0 siblings, 1 reply; 97+ messages in thread
From: Olivier Matz @ 2016-10-05 11:56 UTC (permalink / raw)
  To: Maxime Coquelin, dev, yuanhan.liu
  Cc: konstantin.ananyev, sugesh.chandran, bruce.richardson,
	jianfeng.tan, helin.zhang, adrien.mazarguil, stephen, dprovan,
	xiao.w.wang

Hi Maxime,

On 10/03/2016 02:51 PM, Maxime Coquelin wrote:
>> --- a/drivers/net/virtio/virtio_rxtx.c
>> +++ b/drivers/net/virtio/virtio_rxtx.c
>> @@ -50,6 +50,7 @@
>>  #include <rte_string_fns.h>
>>  #include <rte_errno.h>
>>  #include <rte_byteorder.h>
>> +#include <rte_net.h>
>>
>>  #include "virtio_logs.h"
>>  #include "virtio_ethdev.h"
>> @@ -627,6 +628,56 @@ virtio_update_packet_stats(struct virtnet_stats
>> *stats, struct rte_mbuf *mbuf)
>>      }
>>  }
>>
>> +/* Optionally fill offload information in structure */
>> +static int
>> +virtio_rx_offload(struct rte_mbuf *m, struct virtio_net_hdr *hdr)
>> +{
>> +    struct rte_net_hdr_lens hdr_lens;
>> +    uint32_t hdrlen, ptype;
>> +    int l4_supported = 0;
>> +
>> +    /* nothing to do */
>> +    if (hdr->flags == 0 && hdr->gso_type == VIRTIO_NET_HDR_GSO_NONE)
>> +        return 0;
> Maybe we could first check whether offload features were negotiated?
> Doing this, we could return before accessing the header and so avoid a
> cache miss.

Yes, doing this would avoid reading the virtio header when the rx
function is virtio_recv_pkts(). When using virtio_recv_mergeable_pkts(),
it won't have a big impact since we already need to read hdr->num_buffers.


I plan to do something like this in both recv functions:

@@ -854,6 +854,7 @@ virtio_recv_pkts(void *rx_queue, struct rte_mbuf
**rx_pkts, uint16_t nb_pkts)
        int error;
        uint32_t i, nb_enqueued;
        uint32_t hdr_size;
+       uint64_t features;
        struct virtio_net_hdr *hdr;

        nb_used = VIRTQUEUE_NUSED(vq);
@@ -872,6 +873,7 @@ virtio_recv_pkts(void *rx_queue, struct rte_mbuf
**rx_pkts, uint16_t nb_pkts)
        nb_rx = 0;
        nb_enqueued = 0;
        hdr_size = hw->vtnet_hdr_size;
+       features = hw->guest_features;

        for (i = 0; i < num ; i++) {
                rxm = rcv_pkts[i];
@@ -903,7 +905,8 @@ virtio_recv_pkts(void *rx_queue, struct rte_mbuf
**rx_pkts, uint16_t nb_pkts)
                        rte_vlan_strip(rxm);

                /* Update offload features */
-               if (virtio_rx_offload(rxm, hdr) < 0) {
+               if ((features & VIRTIO_NET_F_GUEST_CSUM) &&
+                               virtio_rx_offload(rxm, hdr) < 0) {
                        virtio_discard_rxbuf(vq, rxm);
                        rxvq->stats.errors++;
                        continue;


Thank you for the feedback.
Olivier

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [dpdk-dev] [PATCH v2 09/12] virtio: add Rx checksum offload support
  2016-10-05 11:56       ` Olivier Matz
@ 2016-10-05 13:27         ` Maxime Coquelin
  2016-10-05 13:30           ` Olivier Matz
  2016-10-12 13:02           ` Yuanhan Liu
  0 siblings, 2 replies; 97+ messages in thread
From: Maxime Coquelin @ 2016-10-05 13:27 UTC (permalink / raw)
  To: Olivier Matz, dev, yuanhan.liu
  Cc: konstantin.ananyev, sugesh.chandran, bruce.richardson,
	jianfeng.tan, helin.zhang, adrien.mazarguil, stephen, dprovan,
	xiao.w.wang

Hi Olivier,

On 10/05/2016 01:56 PM, Olivier Matz wrote:
> Hi Maxime,
>
> On 10/03/2016 02:51 PM, Maxime Coquelin wrote:
>>> --- a/drivers/net/virtio/virtio_rxtx.c
>>> +++ b/drivers/net/virtio/virtio_rxtx.c
>>> @@ -50,6 +50,7 @@
>>>  #include <rte_string_fns.h>
>>>  #include <rte_errno.h>
>>>  #include <rte_byteorder.h>
>>> +#include <rte_net.h>
>>>
>>>  #include "virtio_logs.h"
>>>  #include "virtio_ethdev.h"
>>> @@ -627,6 +628,56 @@ virtio_update_packet_stats(struct virtnet_stats
>>> *stats, struct rte_mbuf *mbuf)
>>>      }
>>>  }
>>>
>>> +/* Optionally fill offload information in structure */
>>> +static int
>>> +virtio_rx_offload(struct rte_mbuf *m, struct virtio_net_hdr *hdr)
>>> +{
>>> +    struct rte_net_hdr_lens hdr_lens;
>>> +    uint32_t hdrlen, ptype;
>>> +    int l4_supported = 0;
>>> +
>>> +    /* nothing to do */
>>> +    if (hdr->flags == 0 && hdr->gso_type == VIRTIO_NET_HDR_GSO_NONE)
>>> +        return 0;
>> Maybe we could first check whether offload features were negotiated?
>> Doing this, we could return before accessing the header and so avoid a
>> cache miss.
>
> Yes, doing this would avoid reading the virtio header when the rx
> function is virtio_recv_pkts(). When using virtio_recv_mergeable_pkts(),
> it won't have a big impact since we already need to read hdr->num_buffers.
Right, it matters only for the non-mergeable buffers case.

>
>
> I plan to do something like this in both recv functions:
>
> @@ -854,6 +854,7 @@ virtio_recv_pkts(void *rx_queue, struct rte_mbuf
> **rx_pkts, uint16_t nb_pkts)
>         int error;
>         uint32_t i, nb_enqueued;
>         uint32_t hdr_size;
> +       uint64_t features;
>         struct virtio_net_hdr *hdr;
>
>         nb_used = VIRTQUEUE_NUSED(vq);
> @@ -872,6 +873,7 @@ virtio_recv_pkts(void *rx_queue, struct rte_mbuf
> **rx_pkts, uint16_t nb_pkts)
>         nb_rx = 0;
>         nb_enqueued = 0;
>         hdr_size = hw->vtnet_hdr_size;
> +       features = hw->guest_features;
>
>         for (i = 0; i < num ; i++) {
>                 rxm = rcv_pkts[i];
> @@ -903,7 +905,8 @@ virtio_recv_pkts(void *rx_queue, struct rte_mbuf
> **rx_pkts, uint16_t nb_pkts)
>                         rte_vlan_strip(rxm);
>
>                 /* Update offload features */
> -               if (virtio_rx_offload(rxm, hdr) < 0) {
> +               if ((features & VIRTIO_NET_F_GUEST_CSUM) &&
s/VIRTIO_NET_F_GUEST_CSUM/(1u << VIRTIO_NET_F_GUEST_CSUM)/
And don't forget to update the test for LRO patch.
Except this, it sounds good.

Thanks,
Maxime
> +                               virtio_rx_offload(rxm, hdr) < 0) {
>                         virtio_discard_rxbuf(vq, rxm);
>                         rxvq->stats.errors++;
>                         continue;
>
> Thank you for the feedback.
> Olivier
>

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [dpdk-dev] [PATCH v2 09/12] virtio: add Rx checksum offload support
  2016-10-05 13:27         ` Maxime Coquelin
@ 2016-10-05 13:30           ` Olivier Matz
  2016-10-12 13:02           ` Yuanhan Liu
  1 sibling, 0 replies; 97+ messages in thread
From: Olivier Matz @ 2016-10-05 13:30 UTC (permalink / raw)
  To: Maxime Coquelin, dev, yuanhan.liu
  Cc: konstantin.ananyev, sugesh.chandran, bruce.richardson,
	jianfeng.tan, helin.zhang, adrien.mazarguil, stephen, dprovan,
	xiao.w.wang



On 10/05/2016 03:27 PM, Maxime Coquelin wrote:
>> @@ -903,7 +905,8 @@ virtio_recv_pkts(void *rx_queue, struct rte_mbuf
>> **rx_pkts, uint16_t nb_pkts)
>>                         rte_vlan_strip(rxm);
>>
>>                 /* Update offload features */
>> -               if (virtio_rx_offload(rxm, hdr) < 0) {
>> +               if ((features & VIRTIO_NET_F_GUEST_CSUM) &&
> s/VIRTIO_NET_F_GUEST_CSUM/(1u << VIRTIO_NET_F_GUEST_CSUM)/

oooh good catch :)

> And don't forget to update the test for LRO patch.

yep

> Except this, it sounds good.

Thanks, I'll send a v3 soon.

Olivier
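
The mask issue caught above is easy to reproduce in isolation. The
VIRTIO_NET_F_* constants are bit numbers, not masks, so they have to be
shifted before being ANDed with the feature word. A minimal sketch,
assuming the bit numbers from the virtio specification (CSUM is bit 0,
GUEST_CSUM is bit 1); feature_is_set() is a hypothetical helper written
for this example:

```c
#include <stdint.h>

/* Feature bit numbers as defined by the virtio specification. */
#define VIRTIO_NET_F_CSUM       0
#define VIRTIO_NET_F_GUEST_CSUM 1

/* Hypothetical helper: nonzero if feature number "bit" is set.
 * 1ULL keeps the shift valid for bit numbers of 32 and above.
 */
static inline int
feature_is_set(uint64_t features, unsigned int bit)
{
	return (features & (1ULL << bit)) != 0;
}
```

If only VIRTIO_NET_F_CSUM was negotiated (features == 0x1), the unshifted
test "features & VIRTIO_NET_F_GUEST_CSUM" is nonzero, a false positive,
while feature_is_set(features, VIRTIO_NET_F_GUEST_CSUM) returns 0. If only
GUEST_CSUM was negotiated (features == 0x2), the unshifted test is zero
and the offload would be skipped.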

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [dpdk-dev] [PATCH v2 10/12] virtio: add Tx checksum offload support
  2016-10-03  9:00   ` [dpdk-dev] [PATCH v2 10/12] virtio: add Tx " Olivier Matz
@ 2016-10-07  7:25     ` Maxime Coquelin
  2016-10-07 16:36       ` Olivier Matz
  2016-10-13  8:38     ` Yuanhan Liu
  1 sibling, 1 reply; 97+ messages in thread
From: Maxime Coquelin @ 2016-10-07  7:25 UTC (permalink / raw)
  To: Olivier Matz, dev, yuanhan.liu
  Cc: konstantin.ananyev, sugesh.chandran, bruce.richardson,
	jianfeng.tan, helin.zhang, adrien.mazarguil, stephen, dprovan,
	xiao.w.wang

Hi Olivier,

On 10/03/2016 11:00 AM, Olivier Matz wrote:
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> ---
>  drivers/net/virtio/virtio_ethdev.c |  7 +++++
>  drivers/net/virtio/virtio_ethdev.h |  1 +
>  drivers/net/virtio/virtio_rxtx.c   | 57 +++++++++++++++++++++++++-------------
>  3 files changed, 45 insertions(+), 20 deletions(-)
>
> diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
> index 43cb096..55024cd 100644
> --- a/drivers/net/virtio/virtio_ethdev.c
> +++ b/drivers/net/virtio/virtio_ethdev.c
> @@ -1578,6 +1578,13 @@ virtio_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
>  	dev_info->rx_offload_capa =
>  		DEV_RX_OFFLOAD_TCP_CKSUM |
>  		DEV_RX_OFFLOAD_UDP_CKSUM;
> +	dev_info->tx_offload_capa = 0;
> +
> +	if (hw->guest_features & (1ULL << VIRTIO_NET_F_CSUM)) {
> +		dev_info->tx_offload_capa |=
> +			DEV_TX_OFFLOAD_UDP_CKSUM |
> +			DEV_TX_OFFLOAD_TCP_CKSUM;
> +	}
>  }
>
>  /*
> diff --git a/drivers/net/virtio/virtio_ethdev.h b/drivers/net/virtio/virtio_ethdev.h
> index 2fc9218..202aa2e 100644
> --- a/drivers/net/virtio/virtio_ethdev.h
> +++ b/drivers/net/virtio/virtio_ethdev.h
> @@ -62,6 +62,7 @@
>  	 1u << VIRTIO_NET_F_CTRL_VQ	  |	\
>  	 1u << VIRTIO_NET_F_CTRL_RX	  |	\
>  	 1u << VIRTIO_NET_F_CTRL_VLAN	  |	\
> +	 1u << VIRTIO_NET_F_CSUM	  |	\
>  	 1u << VIRTIO_NET_F_MRG_RXBUF	  |	\
>  	 1ULL << VIRTIO_F_VERSION_1)
>
> diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
> index eda678a..4ae11e7 100644
> --- a/drivers/net/virtio/virtio_rxtx.c
> +++ b/drivers/net/virtio/virtio_rxtx.c
> @@ -213,13 +213,14 @@ static inline void
>  virtqueue_enqueue_xmit(struct virtnet_tx *txvq, struct rte_mbuf *cookie,
>  		       uint16_t needed, int use_indirect, int can_push)
>  {
> +	struct virtio_tx_region *txr = txvq->virtio_net_hdr_mz->addr;
>  	struct vq_desc_extra *dxp;
>  	struct virtqueue *vq = txvq->vq;
>  	struct vring_desc *start_dp;
>  	uint16_t seg_num = cookie->nb_segs;
>  	uint16_t head_idx, idx;
>  	uint16_t head_size = vq->hw->vtnet_hdr_size;
> -	unsigned long offs;
> +	struct virtio_net_hdr *hdr;
>
>  	head_idx = vq->vq_desc_head_idx;
>  	idx = head_idx;
> @@ -230,10 +231,9 @@ virtqueue_enqueue_xmit(struct virtnet_tx *txvq, struct rte_mbuf *cookie,
>  	start_dp = vq->vq_ring.desc;
>
>  	if (can_push) {
> -		/* put on zero'd transmit header (no offloads) */
> -		void *hdr = rte_pktmbuf_prepend(cookie, head_size);
> -
> -		memset(hdr, 0, head_size);
> +		/* prepend cannot fail, checked by caller */
> +		hdr = (struct virtio_net_hdr *)
> +			rte_pktmbuf_prepend(cookie, head_size);
>  	} else if (use_indirect) {
>  		/* setup tx ring slot to point to indirect
>  		 * descriptor list stored in reserved region.
> @@ -241,14 +241,11 @@ virtqueue_enqueue_xmit(struct virtnet_tx *txvq, struct rte_mbuf *cookie,
>  		 * the first slot in indirect ring is already preset
>  		 * to point to the header in reserved region
>  		 */
> -		struct virtio_tx_region *txr = txvq->virtio_net_hdr_mz->addr;
> -
> -		offs = idx * sizeof(struct virtio_tx_region)
> -			+ offsetof(struct virtio_tx_region, tx_indir);
> -
> -		start_dp[idx].addr  = txvq->virtio_net_hdr_mem + offs;
> +		start_dp[idx].addr  = txvq->virtio_net_hdr_mem +
> +			RTE_PTR_DIFF(&txr[idx].tx_indir, txr);
>  		start_dp[idx].len   = (seg_num + 1) * sizeof(struct vring_desc);
>  		start_dp[idx].flags = VRING_DESC_F_INDIRECT;
> +		hdr = (struct virtio_net_hdr *)&txr[idx].tx_hdr;
>
>  		/* loop below will fill in rest of the indirect elements */
>  		start_dp = txr[idx].tx_indir;
> @@ -257,15 +254,40 @@ virtqueue_enqueue_xmit(struct virtnet_tx *txvq, struct rte_mbuf *cookie,
>  		/* setup first tx ring slot to point to header
>  		 * stored in reserved region.
>  		 */
> -		offs = idx * sizeof(struct virtio_tx_region)
> -			+ offsetof(struct virtio_tx_region, tx_hdr);
> -
> -		start_dp[idx].addr  = txvq->virtio_net_hdr_mem + offs;
> +		start_dp[idx].addr  = txvq->virtio_net_hdr_mem +
> +			RTE_PTR_DIFF(&txr[idx].tx_hdr, txr);
>  		start_dp[idx].len   = vq->hw->vtnet_hdr_size;
>  		start_dp[idx].flags = VRING_DESC_F_NEXT;
> +		hdr = (struct virtio_net_hdr *)&txr[idx].tx_hdr;
> +
>  		idx = start_dp[idx].next;
>  	}
>
> +	/* Checksum Offload */
> +	switch (cookie->ol_flags & PKT_TX_L4_MASK) {
> +	case PKT_TX_UDP_CKSUM:
> +		hdr->csum_start = cookie->l2_len + cookie->l3_len;
> +		hdr->csum_offset = 6;
> +		hdr->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
> +		break;
> +
> +	case PKT_TX_TCP_CKSUM:
> +		hdr->csum_start = cookie->l2_len + cookie->l3_len;
> +		hdr->csum_offset = 16;
> +		hdr->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
> +		break;
> +
> +	default:
> +		hdr->csum_start = 0;
> +		hdr->csum_offset = 0;
> +		hdr->flags = 0;
> +		break;
> +	}
> +
> +	hdr->gso_type = 0;
> +	hdr->gso_size = 0;
> +	hdr->hdr_len = 0;

In the case where we don't use any offload, have you measured the
performance regression against the current code when using a dedicated
descriptor for the header? I haven't tested your series, but I would
expect it to be more than 15% for 64-byte packets, based on my trials
with a single descriptor.

Indeed, without your series, when using a dedicated desc for the header,
the header is not accessed in the virtio transmit path. It is zeroed at
init time.

Could we keep the same behaviour when offloading features aren't
negotiated?

Thanks,
Maxime
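
For reference, the constants 6 and 16 written to csum_offset in the quoted
patch are the positions of the checksum field inside the UDP and TCP
headers, and csum_start is the offset of the L4 header in the frame. A
minimal sketch that derives them with offsetof(); the struct and function
names are stand-ins invented for this example, not the DPDK definitions:

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the UDP and TCP headers; only the layout matters. */
struct udp_hdr_sketch {
	uint16_t src_port;
	uint16_t dst_port;
	uint16_t dgram_len;
	uint16_t dgram_cksum;
};

struct tcp_hdr_sketch {
	uint16_t src_port;
	uint16_t dst_port;
	uint32_t sent_seq;
	uint32_t recv_ack;
	uint8_t  data_off;
	uint8_t  tcp_flags;
	uint16_t rx_win;
	uint16_t cksum;
	uint16_t tcp_urp;
};

/* The two virtio-net header fields used for checksum offload. */
struct csum_req {
	uint16_t csum_start;  /* where checksumming starts (L4 header) */
	uint16_t csum_offset; /* checksum field offset from csum_start */
};

enum l4_kind { L4_UDP, L4_TCP };

static struct csum_req
make_csum_req(uint16_t l2_len, uint16_t l3_len, enum l4_kind kind)
{
	struct csum_req req;

	req.csum_start = (uint16_t)(l2_len + l3_len);
	req.csum_offset = (uint16_t)((kind == L4_UDP) ?
		offsetof(struct udp_hdr_sketch, dgram_cksum) :
		offsetof(struct tcp_hdr_sketch, cksum));
	return req;
}
```

For an untagged Ethernet frame carrying IPv4 without options (l2_len 14,
l3_len 20), this gives csum_start 34, with csum_offset 6 for UDP and 16
for TCP.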

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [dpdk-dev] [PATCH v2 10/12] virtio: add Tx checksum offload support
  2016-10-07  7:25     ` Maxime Coquelin
@ 2016-10-07 16:36       ` Olivier Matz
  0 siblings, 0 replies; 97+ messages in thread
From: Olivier Matz @ 2016-10-07 16:36 UTC (permalink / raw)
  To: Maxime Coquelin, dev, yuanhan.liu
  Cc: konstantin.ananyev, sugesh.chandran, bruce.richardson,
	jianfeng.tan, helin.zhang, adrien.mazarguil, stephen, dprovan,
	xiao.w.wang

Hi Maxime,

On 10/07/2016 09:25 AM, Maxime Coquelin wrote:
> Hi Olivier,
> 
> On 10/03/2016 11:00 AM, Olivier Matz wrote:
>> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
>> ---
>>  drivers/net/virtio/virtio_ethdev.c |  7 +++++
>>  drivers/net/virtio/virtio_ethdev.h |  1 +
>>  drivers/net/virtio/virtio_rxtx.c   | 57
>> +++++++++++++++++++++++++-------------
>>  3 files changed, 45 insertions(+), 20 deletions(-)
>>
>> diff --git a/drivers/net/virtio/virtio_ethdev.c
>> b/drivers/net/virtio/virtio_ethdev.c
>> index 43cb096..55024cd 100644
>> --- a/drivers/net/virtio/virtio_ethdev.c
>> +++ b/drivers/net/virtio/virtio_ethdev.c
>> @@ -1578,6 +1578,13 @@ virtio_dev_info_get(struct rte_eth_dev *dev,
>> struct rte_eth_dev_info *dev_info)
>>      dev_info->rx_offload_capa =
>>          DEV_RX_OFFLOAD_TCP_CKSUM |
>>          DEV_RX_OFFLOAD_UDP_CKSUM;
>> +    dev_info->tx_offload_capa = 0;
>> +
>> +    if (hw->guest_features & (1ULL << VIRTIO_NET_F_CSUM)) {
>> +        dev_info->tx_offload_capa |=
>> +            DEV_TX_OFFLOAD_UDP_CKSUM |
>> +            DEV_TX_OFFLOAD_TCP_CKSUM;
>> +    }
>>  }
>>
>>  /*
>> diff --git a/drivers/net/virtio/virtio_ethdev.h
>> b/drivers/net/virtio/virtio_ethdev.h
>> index 2fc9218..202aa2e 100644
>> --- a/drivers/net/virtio/virtio_ethdev.h
>> +++ b/drivers/net/virtio/virtio_ethdev.h
>> @@ -62,6 +62,7 @@
>>       1u << VIRTIO_NET_F_CTRL_VQ      |    \
>>       1u << VIRTIO_NET_F_CTRL_RX      |    \
>>       1u << VIRTIO_NET_F_CTRL_VLAN      |    \
>> +     1u << VIRTIO_NET_F_CSUM      |    \
>>       1u << VIRTIO_NET_F_MRG_RXBUF      |    \
>>       1ULL << VIRTIO_F_VERSION_1)
>>
>> diff --git a/drivers/net/virtio/virtio_rxtx.c
>> b/drivers/net/virtio/virtio_rxtx.c
>> index eda678a..4ae11e7 100644
>> --- a/drivers/net/virtio/virtio_rxtx.c
>> +++ b/drivers/net/virtio/virtio_rxtx.c
>> @@ -213,13 +213,14 @@ static inline void
>>  virtqueue_enqueue_xmit(struct virtnet_tx *txvq, struct rte_mbuf *cookie,
>>                 uint16_t needed, int use_indirect, int can_push)
>>  {
>> +    struct virtio_tx_region *txr = txvq->virtio_net_hdr_mz->addr;
>>      struct vq_desc_extra *dxp;
>>      struct virtqueue *vq = txvq->vq;
>>      struct vring_desc *start_dp;
>>      uint16_t seg_num = cookie->nb_segs;
>>      uint16_t head_idx, idx;
>>      uint16_t head_size = vq->hw->vtnet_hdr_size;
>> -    unsigned long offs;
>> +    struct virtio_net_hdr *hdr;
>>
>>      head_idx = vq->vq_desc_head_idx;
>>      idx = head_idx;
>> @@ -230,10 +231,9 @@ virtqueue_enqueue_xmit(struct virtnet_tx *txvq,
>> struct rte_mbuf *cookie,
>>      start_dp = vq->vq_ring.desc;
>>
>>      if (can_push) {
>> -        /* put on zero'd transmit header (no offloads) */
>> -        void *hdr = rte_pktmbuf_prepend(cookie, head_size);
>> -
>> -        memset(hdr, 0, head_size);
>> +        /* prepend cannot fail, checked by caller */
>> +        hdr = (struct virtio_net_hdr *)
>> +            rte_pktmbuf_prepend(cookie, head_size);
>>      } else if (use_indirect) {
>>          /* setup tx ring slot to point to indirect
>>           * descriptor list stored in reserved region.
>> @@ -241,14 +241,11 @@ virtqueue_enqueue_xmit(struct virtnet_tx *txvq,
>> struct rte_mbuf *cookie,
>>           * the first slot in indirect ring is already preset
>>           * to point to the header in reserved region
>>           */
>> -        struct virtio_tx_region *txr = txvq->virtio_net_hdr_mz->addr;
>> -
>> -        offs = idx * sizeof(struct virtio_tx_region)
>> -            + offsetof(struct virtio_tx_region, tx_indir);
>> -
>> -        start_dp[idx].addr  = txvq->virtio_net_hdr_mem + offs;
>> +        start_dp[idx].addr  = txvq->virtio_net_hdr_mem +
>> +            RTE_PTR_DIFF(&txr[idx].tx_indir, txr);
>>          start_dp[idx].len   = (seg_num + 1) * sizeof(struct vring_desc);
>>          start_dp[idx].flags = VRING_DESC_F_INDIRECT;
>> +        hdr = (struct virtio_net_hdr *)&txr[idx].tx_hdr;
>>
>>          /* loop below will fill in rest of the indirect elements */
>>          start_dp = txr[idx].tx_indir;
>> @@ -257,15 +254,40 @@ virtqueue_enqueue_xmit(struct virtnet_tx *txvq,
>> struct rte_mbuf *cookie,
>>          /* setup first tx ring slot to point to header
>>           * stored in reserved region.
>>           */
>> -        offs = idx * sizeof(struct virtio_tx_region)
>> -            + offsetof(struct virtio_tx_region, tx_hdr);
>> -
>> -        start_dp[idx].addr  = txvq->virtio_net_hdr_mem + offs;
>> +        start_dp[idx].addr  = txvq->virtio_net_hdr_mem +
>> +            RTE_PTR_DIFF(&txr[idx].tx_hdr, txr);
>>          start_dp[idx].len   = vq->hw->vtnet_hdr_size;
>>          start_dp[idx].flags = VRING_DESC_F_NEXT;
>> +        hdr = (struct virtio_net_hdr *)&txr[idx].tx_hdr;
>> +
>>          idx = start_dp[idx].next;
>>      }
>>
>> +    /* Checksum Offload */
>> +    switch (cookie->ol_flags & PKT_TX_L4_MASK) {
>> +    case PKT_TX_UDP_CKSUM:
>> +        hdr->csum_start = cookie->l2_len + cookie->l3_len;
>> +        hdr->csum_offset = 6;
>> +        hdr->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
>> +        break;
>> +
>> +    case PKT_TX_TCP_CKSUM:
>> +        hdr->csum_start = cookie->l2_len + cookie->l3_len;
>> +        hdr->csum_offset = 16;
>> +        hdr->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
>> +        break;
>> +
>> +    default:
>> +        hdr->csum_start = 0;
>> +        hdr->csum_offset = 0;
>> +        hdr->flags = 0;
>> +        break;
>> +    }
>> +
>> +    hdr->gso_type = 0;
>> +    hdr->gso_size = 0;
>> +    hdr->hdr_len = 0;
> 
> In the case where we don't use any offload, have you measured the
> performance regression against the current code when using a dedicated
> descriptor for the header? I haven't tested your series, but I would
> expect it to be more than 15% for 64-byte packets, based on my trials
> with a single descriptor.
> 
> Indeed, without your series, when using a dedicated desc for the header,
> the header is not accessed in the virtio transmit path. It is zeroed at
> init time.
> 
> Could we keep the same behaviour when offloading features aren't
> negotiated?

You're right, it could have a performance impact. I'll try to restore
the initial behavior when offload is disabled.

Thanks,
Olivier
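
One possible shape for "restore the initial behavior" is to skip every
header store when no Tx offload was negotiated, so that a header living in
the reserved region keeps the zeroes written at init time. The sketch
below is only an illustration under that assumption, not the code that
ended up in the series; struct net_hdr_sketch mirrors the first fields of
struct virtio_net_hdr, and fill_tx_hdr() is a hypothetical helper.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for the first fields of struct virtio_net_hdr. */
struct net_hdr_sketch {
	uint8_t  flags;
	uint8_t  gso_type;
	uint16_t hdr_len;
	uint16_t gso_size;
	uint16_t csum_start;
	uint16_t csum_offset;
};

#define HDR_F_NEEDS_CSUM 1 /* same value as VIRTIO_NET_HDR_F_NEEDS_CSUM */

/* Write the header only when Tx offloads were negotiated.
 * Returns 1 if the header was written, 0 if it was left untouched.
 */
static int
fill_tx_hdr(struct net_hdr_sketch *hdr, int offload_negotiated,
	    int want_csum, uint16_t csum_start, uint16_t csum_offset)
{
	if (!offload_negotiated)
		return 0; /* header keeps its init-time zeroes */

	memset(hdr, 0, sizeof(*hdr));
	if (want_csum) {
		hdr->flags = HDR_F_NEEDS_CSUM;
		hdr->csum_start = csum_start;
		hdr->csum_offset = csum_offset;
	}
	return 1;
}
```

This shortcut only applies to headers stored in the reserved region. In
the can_push case the header is prepended to the mbuf data, so it still
has to be cleared for every packet, as the original memset() did.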

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [dpdk-dev] [PATCH v2 00/12] net/virtio: add offload support
  2016-10-03  9:00 ` [dpdk-dev] [PATCH v2 00/12] net/virtio: add offload support Olivier Matz
                     ` (11 preceding siblings ...)
  2016-10-03  9:00   ` [dpdk-dev] [PATCH v2 12/12] virtio: add Tso support Olivier Matz
@ 2016-10-11 11:35   ` Yuanhan Liu
  2016-10-11 12:14     ` Olivier MATZ
  12 siblings, 1 reply; 97+ messages in thread
From: Yuanhan Liu @ 2016-10-11 11:35 UTC (permalink / raw)
  To: Olivier Matz
  Cc: dev, konstantin.ananyev, sugesh.chandran, bruce.richardson,
	jianfeng.tan, helin.zhang, adrien.mazarguil, stephen, dprovan,
	xiao.w.wang

Hi,

Firstly, apologies for such a late review. It had been forgotten :(

BTW, please feel free to ping me in future if I haven't responded
within one or two weeks!

I haven't reviewed it carefully yet (something I will do tomorrow).
Before that, a few quick questions.

Firstly, would you write down some test steps? Honestly, I'm not
quite sure how that works without the TCP/IP stack.

On Mon, Oct 03, 2016 at 11:00:11AM +0200, Olivier Matz wrote:
> This patchset, targeted for 16.11, introduces the support of rx and tx
> offload in virtio pmd.  To achieve this, some new mbuf flags must be
> introduced, as discussed in [1].
> 
> It applies on top of:
> - software packet type [2]
> - testpmd enhancements [3]

I didn't do the search. Have the two got merged?

	--yliu
> 
> The new mbuf checksum flags are backward compatible for current
> applications that assume that unknown_csum = good_csum (since there
> was only a bad_csum flag). But if the patchset is integrated, we
> should consider updating the PMDs to match the new API for 16.11.
> 
> [1] http://dpdk.org/ml/archives/dev/2016-May/039920.html
> [2] http://dpdk.org/ml/archives/dev/2016-October/048073.html
> [3] http://dpdk.org/ml/archives/dev/2016-September/046443.html
> 
> changes v1 -> v2
> - change mbuf checksum calculation static inline
> - fix checksum calculation for protocol where csum=0 means no csum
> - move mbuf checksum calculation in librte_net
> - use RTE_MIN() to set max rx/tx queue
> - rebase on top of head
> 
> Olivier Matz (12):
>   virtio: move device initialization in a function
>   virtio: setup and start cq in configure callback
>   virtio: reinitialize the device in configure callback
>   net: add function to calculate a checksum in a mbuf
>   mbuf: add new Rx checksum mbuf flags
>   app/testpmd: fix checksum stats in csum engine
>   mbuf: new flag for LRO
>   app/testpmd: display lro segment size
>   virtio: add Rx checksum offload support
>   virtio: add Tx checksum offload support
>   virtio: add Lro support
>   virtio: add Tso support
> 
>  app/test-pmd/csumonly.c                |   8 +-
>  doc/guides/rel_notes/release_16_11.rst |  16 ++
>  drivers/net/virtio/virtio_ethdev.c     | 182 +++++++++++++---------
>  drivers/net/virtio/virtio_ethdev.h     |  18 +--
>  drivers/net/virtio/virtio_pci.h        |   4 +-
>  drivers/net/virtio/virtio_rxtx.c       | 270 ++++++++++++++++++++++++++++++---
>  drivers/net/virtio/virtqueue.h         |   1 +
>  lib/librte_mbuf/rte_mbuf.c             |  18 ++-
>  lib/librte_mbuf/rte_mbuf.h             |  58 ++++++-
>  lib/librte_net/rte_ip.h                |  60 ++++++++
>  10 files changed, 526 insertions(+), 109 deletions(-)
> 
> Test plan
> =========
> 
> (not fully replayed on v2, but no major change)
> 
> Platform description
> --------------------
> 
>   guest (dpdk)
>   +----------------+
>   |                |
>   |                |
>   |         port0  +-----<---+
>   |       ixgbe /  |         |
>   |       directio |         |
>   |                |         |
>   |    port1       |         ^ flow1
>   +----------------+         | (flow2 is the reverse)
>          |                   |
>          | virtio            |
>          v                   |
>   +----------------+         |
>   |     tap0   /   |         |
>   |1.1.1.1   /     |         |
>   |ns-tap  /       |         |
>   |      /         |         |
>   |    /   ixgbe2  +------>--+
>   |  /    1.1.1.2  |
>   |/      ns-ixgbe |
>   +----------------+
>   host (linux, vhost-net)
> 
> 
> flow1:
>   host -(ixgbe)-> guest -(virtio)-> host
>   1.1.1.2 -> 1.1.1.1
> 
> flow2:
>   host -(virtio)-> guest -(ixgbe)-> host
>   1.1.1.2 -> 1.1.1.1
> 
> Host configuration
> ------------------
> 
> Start qemu with:
> 
> - a ne2k management interface to avoid any conflict with dpdk
> - 2 ixgbe interfaces given to the vm through vfio
> - a virtio net device, connected to a tap interface through vhost-net
> 
>   /usr/bin/qemu-system-x86_64 -k fr -daemonize --enable-kvm -m 1G -cpu host \
>     -smp 3 -serial telnet::40564,server,nowait -serial null \
>     -qmp tcp::44340,server,nowait -monitor telnet::49229,server,nowait \
>     -device ne2k_pci,mac=de:ad:de:01:02:03,netdev=user.0,addr=03 \
>     -netdev user,id=user.0,hostfwd=tcp::34965-:22 \
>     -device vfio-pci,host=0000:04:00.0 -device vfio-pci,host=0000:04:00.1 \
>     -netdev type=tap,id=vhostnet0,script=no,vhost=on,queues=8 \
>     -device virtio-net-pci,netdev=vhostnet0,ioeventfd=on,mq=on,vectors=17 \
>     -hda "/path/to/ubuntu-14.04-template.qcow2" \
>     -snapshot -vga none -display none
> 
> Move the tap interface into a netns, and configure it:
> 
>   ip netns add ns-tap
>   ip netns exec ns-tap ip l set lo up
>   ip link set tap0 netns ns-tap
>   ip netns exec ns-tap ip l set tap0 down
>   ip netns exec ns-tap ip l set addr 02:00:00:00:00:01 dev tap0
>   ip netns exec ns-tap ip l set tap0 up
>   ip netns exec ns-tap ip a a 1.1.1.1/24 dev tap0
>   ip netns exec ns-tap arp -s 1.1.1.2 02:00:00:00:00:00
>   ip netns exec ns-tap ip a
> 
> Move the ixgbe interface into a netns, and configure it:
> 
>   IXGBE=ixgbe2
>   ip netns add ns-ixgbe
>   ip netns exec ns-ixgbe ip l set lo up
>   ip link set ${IXGBE} netns ns-ixgbe
>   ip netns exec ns-ixgbe ip l set ${IXGBE} down
>   ip netns exec ns-ixgbe ip l set addr 02:00:00:00:00:00 dev ${IXGBE}
>   ip netns exec ns-ixgbe ip l set ${IXGBE} up
>   ip netns exec ns-ixgbe ip a a 1.1.1.2/24 dev ${IXGBE}
>   ip netns exec ns-ixgbe arp -s 1.1.1.1 02:00:00:00:00:01
>   ip netns exec ns-ixgbe ip a
> 
> Guest configuration
> -------------------
> 
> List of pci devices:
> 
>   00:02.0 Ethernet controller [0200]: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection [8086:10fb] (rev 01)
>   00:03.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL-8029(AS) [10ec:8029]
>   00:04.0 Ethernet controller [0200]: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection [8086:10fb] (rev 01)
>   00:05.0 Ethernet controller [0200]: Red Hat, Inc Virtio network device [1af4:1000]
> 
> Compile dpdk:
> 
>   cd dpdk.org
>   make config T=x86_64-native-linuxapp-gcc
>   make -j4
> 
> Prepare environment:
> 
>   mkdir -p /mnt/huge
>   mount -t hugetlbfs nodev /mnt/huge
>   echo 256 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
>   modprobe uio_pci_generic
>   python tools/dpdk_nic_bind.py -b uio_pci_generic 0000:00:02.0
>   python tools/dpdk_nic_bind.py -b uio_pci_generic 0000:00:05.0
> 
> Run test
> ========
> 
> The test uses iperf to validate connectivity between the 2 netns of the
> host and through the guest.
> 
> Iperf is run with:
> 
>   # flow1: host -(ixgbe)-> guest -(virtio)-> host
>   ip netns exec ns-tap iperf -s
>   ip netns exec ns-ixgbe iperf -c 1.1.1.1 -t 10
> 
>   # flow2: host -(virtio)-> guest -(ixgbe)-> host
>   ip netns exec ns-ixgbe iperf -s
>   ip netns exec ns-tap iperf -c 1.1.1.2 -t 10
> 
> The guest runs testpmd with csum forward engine, its configuration
> depends on the test case.
> 
> test1: large packets (lro/tso)
> ------------------------------
> 
> Configuration of testpmd:
> 
>   ./build/app/testpmd -l 0,1 --log-level 8 -- --total-num-mbufs=16384 \
>     -i --port-topology=chained --disable-hw-vlan-filter \
>     --disable-hw-vlan-strip --enable-rx-cksum --enable-lro \
>     --crc-strip --txqflags=0
> 
>   set fwd csum
>   tso set 1440 0
>   csum set ip hw 0
>   csum set tcp hw 0
>   tso set 1440 1
>   #csum set ip hw 1 # not supported by virtio
>   csum set tcp hw 1
>   start
> 
> Iperf log:
> 
>   root@ubuntu1404:~# ip netns exec ns-ixgbe iperf -c 1.1.1.1 -t 10
>   ------------------------------------------------------------
>   Client connecting to 1.1.1.1, TCP port 5001
>   TCP window size: 85.0 KByte (default)
>   ------------------------------------------------------------
>   [  3] local 1.1.1.2 port 54460 connected with 1.1.1.1 port 5001
>   [ ID] Interval       Transfer     Bandwidth
>   [  3]  0.0-10.0 sec  6.14 GBytes  5.27 Gbits/sec
>   root@ubuntu1404:~# ip netns exec ns-tap iperf -c 1.1.1.2 -t 10
>   ------------------------------------------------------------
>   Client connecting to 1.1.1.2, TCP port 5001
>   TCP window size: 85.0 KByte (default)
>   ------------------------------------------------------------
>   [  3] local 1.1.1.1 port 58312 connected with 1.1.1.2 port 5001
>   [ ID] Interval       Transfer     Bandwidth
>   [  3]  0.0-10.0 sec  6.70 GBytes  5.76 Gbits/sec
> 
> Example of what we see with "set verbose 1" in testpmd:
> 
>   -- flow1: ixgbe2 -> port0 (ixgbe) -> testpmd -> port1 (virtio) <-> tap0
>   port=0, mbuf=0x7f968ad9fdc0, pkt_len=24682, nb_segs=13:
>   rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=32 flags=PKT_RX_L4_CKSUM_UNKNOWN PKT_RX_IP_CKSUM_UNKNOWN
>   tx: m->l2_len=14 m->l3_len=20 m->l4_len=32
>   tx: m->tso_segsz=1440
>   tx: flags=PKT_TX_IP_CKSUM PKT_TX_L4_NO_CKSUM PKT_TX_TCP_SEG PKT_TX_IPV4
> 
>   -- flow2: tap0 -> port1 (virtio)-> testpmd -> port0 (ixgbe) -> ixgbe2
>   port=1, mbuf=0x7f968acc9f40, pkt_len=42058, nb_segs=21:
>   rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=32 flags=PKT_RX_L4_CKSUM_NONE PKT_RX_IP_CKSUM_UNKNOWN PKT_RX_LRO
>   rx: m->lro_segsz=1440
>   tx: m->l2_len=14 m->l3_len=20 m->l4_len=32
>   tx: m->tso_segsz=1440
>   tx: flags=PKT_TX_IP_CKSUM PKT_TX_L4_NO_CKSUM PKT_TX_TCP_SEG PKT_TX_IPV4
> 
> test2: hardware checksum only
> -----------------------------
> 
> Configuration of testpmd:
> 
>   ./build/app/testpmd -l 0,1 --log-level 8 -- --total-num-mbufs=16384 \
>     -i --port-topology=chained --disable-hw-vlan-filter \
>     --disable-hw-vlan-strip --enable-rx-cksum --crc-strip --txqflags=0
> 
>   set fwd csum
>   csum set ip hw 0
>   csum set tcp hw 0
>   csum set tcp hw 1
>   start
> 
> Iperf log:
> 
>   root@ubuntu1404:~# ip netns exec ns-ixgbe iperf -c 1.1.1.1 -t 10
>   ------------------------------------------------------------
>   Client connecting to 1.1.1.1, TCP port 5001
>   TCP window size: 85.0 KByte (default)
>   ------------------------------------------------------------
>   [  3] local 1.1.1.2 port 54462 connected with 1.1.1.1 port 5001
>   [ ID] Interval       Transfer     Bandwidth
>   [  3]  0.0-10.0 sec  4.49 GBytes  3.86 Gbits/sec
>   root@ubuntu1404:~# ip netns exec ns-tap iperf -c 1.1.1.2 -t 10
>   ------------------------------------------------------------
>   Client connecting to 1.1.1.2, TCP port 5001
>   TCP window size: 85.0 KByte (default)
>   ------------------------------------------------------------
>   [  3] local 1.1.1.1 port 58314 connected with 1.1.1.2 port 5001
>   [ ID] Interval       Transfer     Bandwidth
>   [  3]  0.0-10.0 sec  6.66 GBytes  5.72 Gbits/sec
> 
> Example of what we see with "set verbose 1" in testpmd:
> 
>   -- flow1: ixgbe2 -> port0 (ixgbe) -> testpmd -> port1 (virtio) <-> tap0
>   port=0, mbuf=0x7f0adca89b40, pkt_len=1514, nb_segs=1:
>   rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=32 flags=PKT_RX_L4_CKSUM_UNKNOWN PKT_RX_IP_CKSUM_UNKNOWN
>   tx: m->l2_len=14 m->l3_len=20 m->l4_len=32
>   tx: flags=PKT_TX_TCP_CKSUM PKT_TX_IPV4
> 
>   -- flow2: tap0 -> port1 (virtio)-> testpmd -> port0 (ixgbe) -> ixgbe2
>   port=1, mbuf=0x7f0adcb98d80, pkt_len=1514, nb_segs=1:
>   rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=32 flags=PKT_RX_L4_CKSUM_NONE PKT_RX_IP_CKSUM_UNKNOWN
>   tx: m->l2_len=14 m->l3_len=20 m->l4_len=32
>   tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_CKSUM PKT_TX_IPV4
> 
> test3: no offload
> -----------------
> 
> Configuration of testpmd:
> 
>   ./build/app/testpmd -l 0,1 --log-level 8 -- --total-num-mbufs=16384 \
>     -i --port-topology=chained --disable-hw-vlan-filter --disable-hw-vlan-strip
> 
>   set fwd csum
>   start
> 
> Iperf log:
> 
>   root@ubuntu1404:~# ip netns exec ns-ixgbe iperf -c 1.1.1.1 -t 10
>   ------------------------------------------------------------
>   Client connecting to 1.1.1.1, TCP port 5001
>   TCP window size: 85.0 KByte (default)
>   ------------------------------------------------------------
>   [  3] local 1.1.1.2 port 54466 connected with 1.1.1.1 port 5001
>   [ ID] Interval       Transfer     Bandwidth
>   [  3]  0.0-10.0 sec  4.29 GBytes  3.68 Gbits/sec
>   root@ubuntu1404:~# ip netns exec ns-tap iperf -c 1.1.1.2 -t 10
>   ------------------------------------------------------------
>   Client connecting to 1.1.1.2, TCP port 5001
>   TCP window size: 85.0 KByte (default)
>   ------------------------------------------------------------
>   [  3] local 1.1.1.1 port 58316 connected with 1.1.1.2 port 5001
>   [ ID] Interval       Transfer     Bandwidth
>   [  3]  0.0-10.0 sec  6.66 GBytes  5.72 Gbits/sec
> 
> Example of what we see with "set verbose 1" in testpmd:
> 
>   -- flow1: ixgbe2 -> port0 (ixgbe) -> testpmd -> port1 (virtio) <-> tap0
>   port=0, mbuf=0x7faf38b3e700, pkt_len=1514, nb_segs=1:
>   rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=32 flags=PKT_RX_L4_CKSUM_UNKNOWN PKT_RX_IP_CKSUM_UNKNOWN
>   tx: flags=PKT_TX_L4_NO_CKSUM PKT_TX_IPV4
> 
>   -- flow2: tap0 -> port1 (virtio)-> testpmd -> port0 (ixgbe) -> ixgbe2
>   port=1, mbuf=0x7faf38b71500, pkt_len=1514, nb_segs=1:
>   rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=32 flags=PKT_RX_L4_CKSUM_UNKNOWN PKT_RX_IP_CKSUM_UNKNOWN
>   tx: flags=PKT_TX_L4_NO_CKSUM PKT_TX_IPV4
> 
> -- 
> 2.8.1


* Re: [dpdk-dev] [PATCH v2 00/12] net/virtio: add offload support
  2016-10-11 11:35   ` [dpdk-dev] [PATCH v2 00/12] net/virtio: add offload support Yuanhan Liu
@ 2016-10-11 12:14     ` Olivier MATZ
  2016-10-11 15:37       ` Yuanhan Liu
  0 siblings, 1 reply; 97+ messages in thread
From: Olivier MATZ @ 2016-10-11 12:14 UTC (permalink / raw)
  To: Yuanhan Liu
  Cc: dev, konstantin.ananyev, sugesh.chandran, bruce.richardson,
	jianfeng.tan, helin.zhang, adrien.mazarguil, stephen, dprovan,
	xiao.w.wang

Hi Yuanhan,

On 10/11/2016 01:35 PM, Yuanhan Liu wrote:
> Hi,
>
> First, apologies for such a late review. It's been forgotten :(
>
> BTW, please feel free to ping me in future if I haven't responded
> within one or two weeks!
>
> I haven't reviewed it carefully yet (something I will do tomorrow).
> Before that, a few quick questions.
>
> Firstly, would you write down some test steps? Honestly, I'm not
> quite sure how that works without the TCP/IP stack.

Not sure I'm getting your question.
The test plan described in the cover letter works without any dpdk 
tcp/ip stack. It uses testpmd, which is able to bridge packets and ask 
for TCP segmentation.


> On Mon, Oct 03, 2016 at 11:00:11AM +0200, Olivier Matz wrote:
>> This patchset, targeted for 16.11, introduces the support of rx and tx
>> offload in virtio pmd.  To achieve this, some new mbuf flags must be
>> introduced, as discussed in [1].
>>
>> It applies on top of:
>> - software packet type [2]
>> - testpmd enhancements [3]
>
> I didn't do the search. Have the two got merged?

As of now, it's not merged yet. I think Thomas is on it.

Regards,
Olivier


* Re: [dpdk-dev] [PATCH v2 01/12] virtio: move device initialization in a function
  2016-10-03  9:00   ` [dpdk-dev] [PATCH v2 01/12] virtio: move device initialization in a function Olivier Matz
@ 2016-10-11 12:30     ` Maxime Coquelin
  0 siblings, 0 replies; 97+ messages in thread
From: Maxime Coquelin @ 2016-10-11 12:30 UTC (permalink / raw)
  To: Olivier Matz, dev, yuanhan.liu
  Cc: konstantin.ananyev, sugesh.chandran, bruce.richardson,
	jianfeng.tan, helin.zhang, adrien.mazarguil, stephen, dprovan,
	xiao.w.wang

Hi Olivier,

On 10/03/2016 11:00 AM, Olivier Matz wrote:
> Move all code related to device initialization in a new function
> virtio_init_device().
>
> This commit brings no functional change, it prepares the next commits
> that will add the offload support. For that, it will be needed to
> reinitialize the device from ethdev->configure(), using this new
> function.
>
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> ---
>  drivers/net/virtio/virtio_ethdev.c | 99 ++++++++++++++++++++++----------------
>  1 file changed, 58 insertions(+), 41 deletions(-)

Makes sense, feel free to add my:
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime


* Re: [dpdk-dev] [PATCH v2 02/12] virtio: setup and start cq in configure callback
  2016-10-03  9:00   ` [dpdk-dev] [PATCH v2 02/12] virtio: setup and start cq in configure callback Olivier Matz
@ 2016-10-11 12:47     ` Maxime Coquelin
  0 siblings, 0 replies; 97+ messages in thread
From: Maxime Coquelin @ 2016-10-11 12:47 UTC (permalink / raw)
  To: Olivier Matz, dev, yuanhan.liu
  Cc: konstantin.ananyev, sugesh.chandran, bruce.richardson,
	jianfeng.tan, helin.zhang, adrien.mazarguil, stephen, dprovan,
	xiao.w.wang



On 10/03/2016 11:00 AM, Olivier Matz wrote:
> Move the configuration of control queue in the configure callback.
> This is needed by next commit, which introduces the reinitialization
> of the device in the configure callback to change the feature flags.
> Therefore, the control queue will have to be restarted at the same
> place.
>
> As virtio_dev_cq_queue_setup() is called from a place where
> config->max_virtqueue_pairs is not available, we need to store this in
> the private structure. It replaces max_rx_queues and max_tx_queues which
> have the same value. The log showing the value of max_rx_queues and
> max_tx_queues is also removed since config->max_virtqueue_pairs is
> already displayed above.
>
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> ---
>  drivers/net/virtio/virtio_ethdev.c | 43 +++++++++++++++++++-------------------
>  drivers/net/virtio/virtio_ethdev.h |  4 ++--
>  drivers/net/virtio/virtio_pci.h    |  3 +--
>  3 files changed, 24 insertions(+), 26 deletions(-)

Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime


* Re: [dpdk-dev] [PATCH v2 03/12] virtio: reinitialize the device in configure callback
  2016-10-03  9:00   ` [dpdk-dev] [PATCH v2 03/12] virtio: reinitialize the device " Olivier Matz
@ 2016-10-11 13:13     ` Maxime Coquelin
  2016-10-12 14:41     ` Yuanhan Liu
  1 sibling, 0 replies; 97+ messages in thread
From: Maxime Coquelin @ 2016-10-11 13:13 UTC (permalink / raw)
  To: Olivier Matz, dev, yuanhan.liu
  Cc: konstantin.ananyev, sugesh.chandran, bruce.richardson,
	jianfeng.tan, helin.zhang, adrien.mazarguil, stephen, dprovan,
	xiao.w.wang



On 10/03/2016 11:00 AM, Olivier Matz wrote:
> Add the ability to reset the virtio device in the configure callback
> if the features flag changed since previous reset. This will be possible
> with the introduction of offload support in next commits.
>
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> ---
>  drivers/net/virtio/virtio_ethdev.c | 26 +++++++++++++++++++-------
>  drivers/net/virtio/virtio_pci.h    |  1 +
>  2 files changed, 20 insertions(+), 7 deletions(-)

Looks good to me.
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime


* Re: [dpdk-dev] [PATCH v2 04/12] net: add function to calculate a checksum in a mbuf
  2016-10-03  9:00   ` [dpdk-dev] [PATCH v2 04/12] net: add function to calculate a checksum in a mbuf Olivier Matz
@ 2016-10-11 13:25     ` Maxime Coquelin
  2016-10-11 13:33       ` Olivier MATZ
  0 siblings, 1 reply; 97+ messages in thread
From: Maxime Coquelin @ 2016-10-11 13:25 UTC (permalink / raw)
  To: Olivier Matz, dev, yuanhan.liu
  Cc: konstantin.ananyev, sugesh.chandran, bruce.richardson,
	jianfeng.tan, helin.zhang, adrien.mazarguil, stephen, dprovan,
	xiao.w.wang



On 10/03/2016 11:00 AM, Olivier Matz wrote:
> This function can be used to calculate the checksum of data embedded in
> mbuf, that can be composed of several segments.
>
> This function will be used by the virtio pmd in next commits to calculate
> the checksum in software in case the protocol is not recognized.
>
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> ---
>  doc/guides/rel_notes/release_16_11.rst |  5 +++
>  lib/librte_net/rte_ip.h                | 60 ++++++++++++++++++++++++++++++++++
>  2 files changed, 65 insertions(+)
>
> diff --git a/doc/guides/rel_notes/release_16_11.rst b/doc/guides/rel_notes/release_16_11.rst
> index 3d3c417..f29b44c 100644
> --- a/doc/guides/rel_notes/release_16_11.rst
> +++ b/doc/guides/rel_notes/release_16_11.rst
> @@ -55,6 +55,11 @@ New Features
>    Added two new functions ``rte_get_rx_ol_flag_list()`` and
>    ``rte_get_tx_ol_flag_list()`` to dump offload flags as a string.
>
> +* **Added a function to calculate the checksum of data in an mbuf.**
> +
> +  Added a new function ``rte_raw_cksum_mbuf()`` to process the checksum of
> +  data embedded in an mbuf chain.
> +
>  Resolved Issues
>  ---------------
>
> diff --git a/lib/librte_net/rte_ip.h b/lib/librte_net/rte_ip.h
> index 5b7554a..8499356 100644
> --- a/lib/librte_net/rte_ip.h
> +++ b/lib/librte_net/rte_ip.h
> @@ -230,6 +230,66 @@ rte_raw_cksum(const void *buf, size_t len)
>  }
>
>  /**
> + * Compute the raw (non complemented) checksum of a packet.
> + *
> + * @param m
> + *   The pointer to the mbuf.
> + * @param off
> + *   The offset in bytes to start the checksum.
> + * @param len
> + *   The length in bytes of the data to checksum.
> + */
> +static inline uint16_t
> +rte_raw_cksum_mbuf(const struct rte_mbuf *m, uint32_t off, uint32_t len)
> +{
> +	const struct rte_mbuf *seg;
> +	const char *buf;
> +	uint32_t sum, tmp;
> +	uint32_t seglen, done;
> +
> +	/* easy case: all data in the first segment */
> +	if (off + len <= rte_pktmbuf_data_len(m))
> +		return rte_raw_cksum(rte_pktmbuf_mtod_offset(m,
> +				const char *, off), len);
> +
> +	if (off + len > rte_pktmbuf_pkt_len(m))
Should this be wrapped in unlikely()?
> +		return 0; /* invalid params, return a dummy value */
Wouldn't it be better to return an error, so that the caller has a chance
to see that it is passing wrong arguments?
The checksum would then be returned through an argument.

Thanks,
Maxime


* Re: [dpdk-dev] [PATCH v2 04/12] net: add function to calculate a checksum in a mbuf
  2016-10-11 13:25     ` Maxime Coquelin
@ 2016-10-11 13:33       ` Olivier MATZ
  0 siblings, 0 replies; 97+ messages in thread
From: Olivier MATZ @ 2016-10-11 13:33 UTC (permalink / raw)
  To: Maxime Coquelin, dev, yuanhan.liu
  Cc: konstantin.ananyev, sugesh.chandran, bruce.richardson,
	jianfeng.tan, helin.zhang, adrien.mazarguil, stephen, dprovan,
	xiao.w.wang

Hi Maxime,

On 10/11/2016 03:25 PM, Maxime Coquelin wrote:
>>  /**
>> + * Compute the raw (non complemented) checksum of a packet.
>> + *
>> + * @param m
>> + *   The pointer to the mbuf.
>> + * @param off
>> + *   The offset in bytes to start the checksum.
>> + * @param len
>> + *   The length in bytes of the data to checksum.
>> + */
>> +static inline uint16_t
>> +rte_raw_cksum_mbuf(const struct rte_mbuf *m, uint32_t off, uint32_t len)
>> +{
>> +    const struct rte_mbuf *seg;
>> +    const char *buf;
>> +    uint32_t sum, tmp;
>> +    uint32_t seglen, done;
>> +
>> +    /* easy case: all data in the first segment */
>> +    if (off + len <= rte_pktmbuf_data_len(m))
>> +        return rte_raw_cksum(rte_pktmbuf_mtod_offset(m,
>> +                const char *, off), len);
>> +
>> +    if (off + len > rte_pktmbuf_pkt_len(m))
> unlikely?

Yes, will add it.

>> +        return 0; /* invalid params, return a dummy value */
> Wouldn't it be better to return an error, so that the caller has a chance
> to see that it is passing wrong arguments?
> The checksum would then be returned through an argument.

Looks much better indeed. I'll change it for next revision.


Thanks,
Olivier
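
For readers following the thread, here is a minimal sketch of the interface
change agreed above: the checksum is returned through a pointer and the
return value reports invalid arguments. This is not the code of the next
revision. `struct seg`, `raw_cksum()` and `raw_cksum_chain()` are
hypothetical stand-ins for the mbuf chain and the rte_ip.h helpers, and the
sum is accumulated in big-endian word order to keep the example portable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one segment of a chained mbuf. */
struct seg {
	const uint8_t *data;
	uint32_t len;
	const struct seg *next;
};

/* Fold a wide accumulator down to a 16-bit one's complement sum. */
static uint16_t fold16(uint64_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Raw (non complemented) sum of a contiguous buffer. */
static uint16_t raw_cksum(const uint8_t *buf, uint32_t len)
{
	uint64_t sum = 0;
	uint32_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (len & 1)
		sum += (uint32_t)buf[i] << 8; /* pad the last byte with zero */
	return fold16(sum);
}

/* Returns 0 and fills *cksum on success, -1 if [off, off+len) does not
 * fit in the chain. *cksum is left untouched on error. */
static int raw_cksum_chain(const struct seg *s, uint32_t off, uint32_t len,
			   uint16_t *cksum)
{
	uint64_t sum = 0;
	uint32_t done = 0;

	assert(cksum != NULL);

	/* skip the segments located entirely before the offset */
	while (s != NULL && off >= s->len) {
		off -= s->len;
		s = s->next;
	}

	while (s != NULL && done < len) {
		uint32_t n = s->len - off;
		uint16_t part;

		if (n > len - done)
			n = len - done;
		part = raw_cksum(s->data + off, n);
		/* a piece starting at an odd position is shifted by one
		 * byte relative to the 16-bit grid: swap its bytes */
		if (done & 1)
			part = (uint16_t)((part << 8) | (part >> 8));
		sum += part;
		done += n;
		off = 0;
		s = s->next;
	}

	if (done < len)
		return -1; /* invalid parameters */

	*cksum = fold16(sum);
	return 0;
}
```

A caller would then check the return value before using the result, for
instance `if (raw_cksum_chain(m, off, len, &csum) < 0) return -1;`.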


* Re: [dpdk-dev] [PATCH v2 05/12] mbuf: add new Rx checksum mbuf flags
  2016-10-03  9:00   ` [dpdk-dev] [PATCH v2 05/12] mbuf: add new Rx checksum mbuf flags Olivier Matz
@ 2016-10-11 13:43     ` Maxime Coquelin
  0 siblings, 0 replies; 97+ messages in thread
From: Maxime Coquelin @ 2016-10-11 13:43 UTC (permalink / raw)
  To: Olivier Matz, dev, yuanhan.liu
  Cc: konstantin.ananyev, sugesh.chandran, bruce.richardson,
	jianfeng.tan, helin.zhang, adrien.mazarguil, stephen, dprovan,
	xiao.w.wang



On 10/03/2016 11:00 AM, Olivier Matz wrote:
> Following discussions in [1] and [2], introduce a new bit to
> describe the Rx checksum status in mbuf.
>
> Before this patch, only one flag was available:
>   PKT_RX_L4_CKSUM_BAD: L4 cksum of RX pkt. is not OK.
>
> And same for L3:
>   PKT_RX_IP_CKSUM_BAD: IP cksum of RX pkt. is not OK.
>
> This had 2 issues:
> - it was not possible to differentiate "checksum good" from
>   "checksum unknown".
> - it was not possible for a virtual driver to say "the checksum
>   in packet may be wrong, but data integrity is valid".
>
> This patch tries to solve this issue by having 4 states (2 bits)
> for the IP and L4 Rx checksums. New values are:
>
>  - PKT_RX_L4_CKSUM_UNKNOWN: no information about the RX L4 checksum
>    -> the application should verify the checksum by sw
>  - PKT_RX_L4_CKSUM_BAD: the L4 checksum in the packet is wrong
>    -> the application can drop the packet without additional check
>  - PKT_RX_L4_CKSUM_GOOD: the L4 checksum in the packet is valid
>    -> the application can accept the packet without verifying the
>       checksum by sw
>  - PKT_RX_L4_CKSUM_NONE: the L4 checksum is not correct in the packet
>    data, but the integrity of the L4 data is verified.
>    -> the application can process the packet but must not verify the
>       checksum by sw. It has to take care to recalculate the cksum
>       if the packet is transmitted (either by sw or using tx offload)
>
>   And same for L3 (replace L4 by IP in description above).
>
> This commit tries to be compatible with existing applications that
> only check the existing flag (CKSUM_BAD).
>
> [1] http://dpdk.org/ml/archives/dev/2016-May/039920.html
> [2] http://dpdk.org/ml/archives/dev/2016-June/040007.html
>
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> ---
>  doc/guides/rel_notes/release_16_11.rst |  6 ++++
>  lib/librte_mbuf/rte_mbuf.c             | 16 +++++++++--
>  lib/librte_mbuf/rte_mbuf.h             | 51 ++++++++++++++++++++++++++++++++--
>  3 files changed, 68 insertions(+), 5 deletions(-)
>
> diff --git a/doc/guides/rel_notes/release_16_11.rst b/doc/guides/rel_notes/release_16_11.rst
> index f29b44c..2aff84c 100644
> --- a/doc/guides/rel_notes/release_16_11.rst
> +++ b/doc/guides/rel_notes/release_16_11.rst
> @@ -60,6 +60,12 @@ New Features
>    Added a new function ``rte_raw_cksum_mbuf()`` to process the checksum of
>    data embedded in an mbuf chain.
>
> +* **Added new Rx checksum mbuf flags.**
> +
> +  Added new Rx checksum flags in mbufs to described more states: unknown,
s/described/describe/

With this typo fixed, it looks good to me:
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime
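
As an illustration of the four states listed in the commit log, here is a
minimal sketch of how an application might act on the L4 status. The
`EX_` macros, their bit positions, the enum and `l4_cksum_action()` are
assumptions made for this example; the rte_mbuf.h hunk is not quoted in
this message, so the actual values may differ.

```c
#include <stdint.h>

/* Assumed encoding, for illustration only: two bits, four states. */
#define EX_RX_L4_CKSUM_MASK    ((1ULL << 3) | (1ULL << 8))
#define EX_RX_L4_CKSUM_UNKNOWN 0ULL
#define EX_RX_L4_CKSUM_BAD     (1ULL << 3)
#define EX_RX_L4_CKSUM_GOOD    (1ULL << 8)
#define EX_RX_L4_CKSUM_NONE    ((1ULL << 3) | (1ULL << 8))

/* Hypothetical application-side decisions, one per state. */
enum l4_action {
	L4_VERIFY_IN_SW,          /* unknown: check the checksum in software */
	L4_DROP,                  /* bad: drop without additional check */
	L4_ACCEPT,                /* good: accept without software check */
	L4_ACCEPT_RECOMPUTE_ON_TX /* none: data is valid, the field is not */
};

static enum l4_action l4_cksum_action(uint64_t ol_flags)
{
	/* compare the whole 2-bit field, not a single bit */
	switch (ol_flags & EX_RX_L4_CKSUM_MASK) {
	case EX_RX_L4_CKSUM_BAD:
		return L4_DROP;
	case EX_RX_L4_CKSUM_GOOD:
		return L4_ACCEPT;
	case EX_RX_L4_CKSUM_NONE:
		return L4_ACCEPT_RECOMPUTE_ON_TX;
	default:
		return L4_VERIFY_IN_SW;
	}
}
```

The same pattern would apply to the IP checksum status by replacing L4 with
IP, as the commit log notes.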


* Re: [dpdk-dev] [PATCH v2 06/12] app/testpmd: fix checksum stats in csum engine
  2016-10-03  9:00   ` [dpdk-dev] [PATCH v2 06/12] app/testpmd: fix checksum stats in csum engine Olivier Matz
@ 2016-10-11 13:46     ` Maxime Coquelin
  0 siblings, 0 replies; 97+ messages in thread
From: Maxime Coquelin @ 2016-10-11 13:46 UTC (permalink / raw)
  To: Olivier Matz, dev, yuanhan.liu
  Cc: konstantin.ananyev, sugesh.chandran, bruce.richardson,
	jianfeng.tan, helin.zhang, adrien.mazarguil, stephen, dprovan,
	xiao.w.wang



On 10/03/2016 11:00 AM, Olivier Matz wrote:
> ---
>  app/test-pmd/csumonly.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
> index d5eb260..8c88ee8 100644
> --- a/app/test-pmd/csumonly.c
> +++ b/app/test-pmd/csumonly.c
> @@ -679,8 +679,10 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
>  		rx_ol_flags = m->ol_flags;
>
>  		/* Update the L3/L4 checksum error packet statistics */
> -		rx_bad_ip_csum += ((rx_ol_flags & PKT_RX_IP_CKSUM_BAD) != 0);
> -		rx_bad_l4_csum += ((rx_ol_flags & PKT_RX_L4_CKSUM_BAD) != 0);
> +		if ((rx_ol_flags & PKT_RX_IP_CKSUM_MASK) == PKT_RX_IP_CKSUM_BAD)
> +			rx_bad_ip_csum += 1;
> +		if ((rx_ol_flags & PKT_RX_L4_CKSUM_MASK) == PKT_RX_L4_CKSUM_BAD)
> +			rx_bad_l4_csum += 1;
>
>  		/* step 1: dissect packet, parsing optional vlan, ip4/ip6, vxlan
>  		 * and inner headers */
>

Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime
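
To make the reason for this fix concrete, here is a small sketch comparing
the old single-bit test with the masked comparison. It assumes an encoding
in which the "none" state sets both bits of the field, including the one
used by "bad"; that encoding and the `EX_` names are assumptions for this
example and are not quoted in this message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed 2-bit encoding, for illustration only. */
#define EX_RX_IP_CKSUM_BAD  (1ULL << 4)
#define EX_RX_IP_CKSUM_GOOD (1ULL << 7)
#define EX_RX_IP_CKSUM_NONE (EX_RX_IP_CKSUM_BAD | EX_RX_IP_CKSUM_GOOD)
#define EX_RX_IP_CKSUM_MASK (EX_RX_IP_CKSUM_BAD | EX_RX_IP_CKSUM_GOOD)

/* Old style: true whenever the "bad" bit is set, so "none" also matches. */
static bool bad_by_bit_test(uint64_t ol_flags)
{
	return (ol_flags & EX_RX_IP_CKSUM_BAD) != 0;
}

/* New style: true only when the whole field equals "bad". */
static bool bad_by_mask(uint64_t ol_flags)
{
	return (ol_flags & EX_RX_IP_CKSUM_MASK) == EX_RX_IP_CKSUM_BAD;
}
```

Under this assumption, a packet flagged "none" would be counted as a
checksum error by the bit test but not by the masked comparison.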


* Re: [dpdk-dev] [PATCH v2 07/12] mbuf: new flag for LRO
  2016-10-03  9:00   ` [dpdk-dev] [PATCH v2 07/12] mbuf: new flag for LRO Olivier Matz
@ 2016-10-11 13:48     ` Maxime Coquelin
  0 siblings, 0 replies; 97+ messages in thread
From: Maxime Coquelin @ 2016-10-11 13:48 UTC (permalink / raw)
  To: Olivier Matz, dev, yuanhan.liu
  Cc: konstantin.ananyev, sugesh.chandran, bruce.richardson,
	jianfeng.tan, helin.zhang, adrien.mazarguil, stephen, dprovan,
	xiao.w.wang



On 10/03/2016 11:00 AM, Olivier Matz wrote:
> When receiving coalesced packets in virtio, the original size of the
> segments is provided. This is useful information because it allows
> resegmenting with the same size.
>
> Add a new RX flag in mbuf, which can be set when packets are coalesced by
> a hardware or virtual driver and the m->tso_segsz field is valid and
> set to the segment size of the original packets.
>
> This flag is used in next commits in the virtio pmd.
>
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> ---
>  doc/guides/rel_notes/release_16_11.rst | 5 +++++
>  lib/librte_mbuf/rte_mbuf.c             | 2 ++
>  lib/librte_mbuf/rte_mbuf.h             | 7 +++++++
>  3 files changed, 14 insertions(+)

Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime


* Re: [dpdk-dev] [PATCH v2 08/12] app/testpmd: display lro segment size
  2016-10-03  9:00   ` [dpdk-dev] [PATCH v2 08/12] app/testpmd: display lro segment size Olivier Matz
@ 2016-10-11 13:49     ` Maxime Coquelin
  0 siblings, 0 replies; 97+ messages in thread
From: Maxime Coquelin @ 2016-10-11 13:49 UTC (permalink / raw)
  To: Olivier Matz, dev, yuanhan.liu
  Cc: konstantin.ananyev, sugesh.chandran, bruce.richardson,
	jianfeng.tan, helin.zhang, adrien.mazarguil, stephen, dprovan,
	xiao.w.wang



On 10/03/2016 11:00 AM, Olivier Matz wrote:
> In csumonly engine, display the value of LRO segment if the
> LRO flag is set.
>
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> ---
>  app/test-pmd/csumonly.c | 2 ++
>  1 file changed, 2 insertions(+)
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime


* Re: [dpdk-dev] [PATCH v2 09/12] virtio: add Rx checksum offload support
  2016-10-03  9:00   ` [dpdk-dev] [PATCH v2 09/12] virtio: add Rx checksum offload support Olivier Matz
  2016-10-03 12:51     ` Maxime Coquelin
@ 2016-10-11 14:04     ` Maxime Coquelin
  2016-10-11 14:29       ` Olivier MATZ
  1 sibling, 1 reply; 97+ messages in thread
From: Maxime Coquelin @ 2016-10-11 14:04 UTC (permalink / raw)
  To: Olivier Matz, dev, yuanhan.liu
  Cc: konstantin.ananyev, sugesh.chandran, bruce.richardson,
	jianfeng.tan, helin.zhang, adrien.mazarguil, stephen, dprovan,
	xiao.w.wang



On 10/03/2016 11:00 AM, Olivier Matz wrote:
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> ---
>  drivers/net/virtio/virtio_ethdev.c | 14 ++++----
>  drivers/net/virtio/virtio_ethdev.h |  2 +-
>  drivers/net/virtio/virtio_rxtx.c   | 69 ++++++++++++++++++++++++++++++++++++++
>  drivers/net/virtio/virtqueue.h     |  1 +
>  4 files changed, 78 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
> index fa56032..43cb096 100644
> --- a/drivers/net/virtio/virtio_ethdev.c
> +++ b/drivers/net/virtio/virtio_ethdev.c
> @@ -1262,7 +1262,7 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
>  	eth_dev->data->dev_flags = dev_flags;
>
>  	/* reset device and negotiate default features */
> -	ret = virtio_init_device(eth_dev, VIRTIO_PMD_GUEST_FEATURES);
> +	ret = virtio_init_device(eth_dev, VIRTIO_PMD_DEFAULT_GUEST_FEATURES);
>  	if (ret < 0)
>  		return ret;
>
> @@ -1351,13 +1351,10 @@ virtio_dev_configure(struct rte_eth_dev *dev)
>  	int ret;
>
>  	PMD_INIT_LOG(DEBUG, "configure");
> +	req_features = VIRTIO_PMD_DEFAULT_GUEST_FEATURES;
> +	if (rxmode->hw_ip_checksum)
> +		req_features |= (1ULL << VIRTIO_NET_F_GUEST_CSUM);
>
> -	if (rxmode->hw_ip_checksum) {
> -		PMD_DRV_LOG(ERR, "HW IP checksum not supported");
> -		return -EINVAL;
> -	}
> -
> -	req_features = VIRTIO_PMD_GUEST_FEATURES;
>  	/* if request features changed, reinit the device */
>  	if (req_features != hw->req_guest_features) {
>  		ret = virtio_init_device(dev, req_features);
> @@ -1578,6 +1575,9 @@ virtio_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
>  	dev_info->default_txconf = (struct rte_eth_txconf) {
>  		.txq_flags = ETH_TXQ_FLAGS_NOOFFLOADS
>  	};
> +	dev_info->rx_offload_capa =
> +		DEV_RX_OFFLOAD_TCP_CKSUM |
> +		DEV_RX_OFFLOAD_UDP_CKSUM;
>  }
>
>  /*
> diff --git a/drivers/net/virtio/virtio_ethdev.h b/drivers/net/virtio/virtio_ethdev.h
> index 5d5e788..2fc9218 100644
> --- a/drivers/net/virtio/virtio_ethdev.h
> +++ b/drivers/net/virtio/virtio_ethdev.h
> @@ -54,7 +54,7 @@
>  #define VIRTIO_MAX_RX_PKTLEN  9728
>
>  /* Features desired/implemented by this driver. */
> -#define VIRTIO_PMD_GUEST_FEATURES		\
> +#define VIRTIO_PMD_DEFAULT_GUEST_FEATURES	\
>  	(1u << VIRTIO_NET_F_MAC		  |	\
>  	 1u << VIRTIO_NET_F_STATUS	  |	\
>  	 1u << VIRTIO_NET_F_MQ		  |	\
> diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
> index 724517e..eda678a 100644
> --- a/drivers/net/virtio/virtio_rxtx.c
> +++ b/drivers/net/virtio/virtio_rxtx.c
> @@ -50,6 +50,7 @@
>  #include <rte_string_fns.h>
>  #include <rte_errno.h>
>  #include <rte_byteorder.h>
> +#include <rte_net.h>
>
>  #include "virtio_logs.h"
>  #include "virtio_ethdev.h"
> @@ -627,6 +628,56 @@ virtio_update_packet_stats(struct virtnet_stats *stats, struct rte_mbuf *mbuf)
>  	}
>  }
>
> +/* Optionally fill offload information in structure */
> +static int
> +virtio_rx_offload(struct rte_mbuf *m, struct virtio_net_hdr *hdr)
> +{
> +	struct rte_net_hdr_lens hdr_lens;
> +	uint32_t hdrlen, ptype;
> +	int l4_supported = 0;
> +
> +	/* nothing to do */
> +	if (hdr->flags == 0 && hdr->gso_type == VIRTIO_NET_HDR_GSO_NONE)
> +		return 0;
> +
> +	m->ol_flags |= PKT_RX_IP_CKSUM_UNKNOWN;
> +
> +	ptype = rte_net_get_ptype(m, &hdr_lens, RTE_PTYPE_ALL_MASK);
> +	m->packet_type = ptype;
> +	if ((ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_TCP ||
> +	    (ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_UDP ||
> +	    (ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_SCTP)
> +		l4_supported = 1;
> +
> +	if (hdr->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) {
> +		hdrlen = hdr_lens.l2_len + hdr_lens.l3_len + hdr_lens.l4_len;
> +		if (hdr->csum_start <= hdrlen && l4_supported) {
> +			m->ol_flags |= PKT_RX_L4_CKSUM_NONE;
> +		} else {
> +			/* Unknown proto or tunnel, do sw cksum. We can assume
> +			 * the cksum field is in the first segment since the
> +			 * buffers we provided to the host are large enough.
> +			 * In case of SCTP, this will be wrong since it's a CRC
> +			 * but there's nothing we can do.
> +			 */
> +			uint16_t csum, off;
> +
> +			csum = rte_raw_cksum_mbuf(m, hdr->csum_start,
> +				rte_pktmbuf_pkt_len(m) - hdr->csum_start);
> +			if (csum != 0xffff)
Why don't we take the one's complement when csum is 0xffff?
> +				csum = ~csum;
> +			off = hdr->csum_offset + hdr->csum_start;
> +			if (rte_pktmbuf_data_len(m) >= off + 1)
> +				*rte_pktmbuf_mtod_offset(m, uint16_t *,
> +					off) = csum;
> +		}
> +	} else if (hdr->flags & VIRTIO_NET_HDR_F_DATA_VALID && l4_supported) {
> +		m->ol_flags |= PKT_RX_L4_CKSUM_GOOD;
> +	}
> +
> +	return 0;
> +}
> +
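
On the 0xffff question above, here is a minimal sketch of what the two
quoted lines do, together with one plausible reading of the special case
(an interpretation, not a statement from the patch author). Complementing a
raw sum of 0xffff would store 0x0000, and for UDP a checksum field of zero
means that no checksum was computed. Since 0xffff and 0x0000 both represent
zero in one's complement arithmetic, leaving 0xffff in place stores an
equivalent value. `finalize_cksum()` is a hypothetical name.

```c
#include <stdint.h>

/* Turn a raw one's complement sum into the value stored in the packet.
 * Mirrors the quoted logic: complement unless the raw sum is 0xffff,
 * so that the stored field is never 0x0000. */
static uint16_t finalize_cksum(uint16_t raw)
{
	if (raw != 0xffff)
		raw = (uint16_t)~raw;
	return raw;
}
```

With this helper, the only input whose complement would be zero is left
unchanged, and every other raw sum is complemented as usual.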


* Re: [dpdk-dev] [PATCH v2 11/12] virtio: add Lro support
  2016-10-03  9:00   ` [dpdk-dev] [PATCH v2 11/12] virtio: add Lro support Olivier Matz
@ 2016-10-11 14:21     ` Maxime Coquelin
  0 siblings, 0 replies; 97+ messages in thread
From: Maxime Coquelin @ 2016-10-11 14:21 UTC (permalink / raw)
  To: Olivier Matz, dev, yuanhan.liu
  Cc: konstantin.ananyev, sugesh.chandran, bruce.richardson,
	jianfeng.tan, helin.zhang, adrien.mazarguil, stephen, dprovan,
	xiao.w.wang



On 10/03/2016 11:00 AM, Olivier Matz wrote:
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> ---
>  drivers/net/virtio/virtio_ethdev.c |  7 ++++++-
>  drivers/net/virtio/virtio_ethdev.h |  9 ---------
>  drivers/net/virtio/virtio_rxtx.c   | 21 +++++++++++++++++++++
>  3 files changed, 27 insertions(+), 10 deletions(-)

Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime


* Re: [dpdk-dev] [PATCH v2 09/12] virtio: add Rx checksum offload support
  2016-10-11 14:04     ` Maxime Coquelin
@ 2016-10-11 14:29       ` Olivier MATZ
  2016-10-11 14:36         ` Maxime Coquelin
  0 siblings, 1 reply; 97+ messages in thread
From: Olivier MATZ @ 2016-10-11 14:29 UTC (permalink / raw)
  To: Maxime Coquelin, dev, yuanhan.liu
  Cc: konstantin.ananyev, sugesh.chandran, bruce.richardson,
	jianfeng.tan, helin.zhang, adrien.mazarguil, stephen, dprovan,
	xiao.w.wang



On 10/11/2016 04:04 PM, Maxime Coquelin wrote:
>> +/* Optionally fill offload information in structure */
>> +static int
>> +virtio_rx_offload(struct rte_mbuf *m, struct virtio_net_hdr *hdr)
>> +{
>> +    struct rte_net_hdr_lens hdr_lens;
>> +    uint32_t hdrlen, ptype;
>> +    int l4_supported = 0;
>> +
>> +    /* nothing to do */
>> +    if (hdr->flags == 0 && hdr->gso_type == VIRTIO_NET_HDR_GSO_NONE)
>> +        return 0;
>> +
>> +    m->ol_flags |= PKT_RX_IP_CKSUM_UNKNOWN;
>> +
>> +    ptype = rte_net_get_ptype(m, &hdr_lens, RTE_PTYPE_ALL_MASK);
>> +    m->packet_type = ptype;
>> +    if ((ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_TCP ||
>> +        (ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_UDP ||
>> +        (ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_SCTP)
>> +        l4_supported = 1;
>> +
>> +    if (hdr->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) {
>> +        hdrlen = hdr_lens.l2_len + hdr_lens.l3_len + hdr_lens.l4_len;
>> +        if (hdr->csum_start <= hdrlen && l4_supported) {
>> +            m->ol_flags |= PKT_RX_L4_CKSUM_NONE;
>> +        } else {
>> +            /* Unknown proto or tunnel, do sw cksum. We can assume
>> +             * the cksum field is in the first segment since the
>> +             * buffers we provided to the host are large enough.
>> +             * In case of SCTP, this will be wrong since it's a CRC
>> +             * but there's nothing we can do.
>> +             */
>> +            uint16_t csum, off;
>> +
>> +            csum = rte_raw_cksum_mbuf(m, hdr->csum_start,
>> +                rte_pktmbuf_pkt_len(m) - hdr->csum_start);
>> +            if (csum != 0xffff)
> Why don't we do the 1-complement if 0xffff?

This was modified after a comment from Xiao.

In checksum arithmetic (ones' complement), there are 2 equivalent ways 
to say the checksum is 0: 0xffff (0-), and 0x0000 (0+).
Some protocols like UDP use this to differentiate between 0xffff (packet 
checksum is 0) and 0x0000 (packet checksum is not calculated).

Here, we want to avoid setting the checksum to 0, in case it would mean no
checksum for UDP packets. Instead, it is set to 0xffff, which is also a
valid checksum for this packet.
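
To make the two zeros concrete, here is a minimal sketch of the rule. It assumes a simplified raw_cksum() as a stand-in for rte_raw_cksum(), and cksum_field() is a hypothetical helper name; neither is the code from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Ones' complement sum of a byte buffer, folded to 16 bits.
 * Simplified stand-in for rte_raw_cksum(); bytes are paired in
 * network order. */
static uint16_t
raw_cksum(const uint8_t *buf, size_t len)
{
	uint64_t sum = 0;
	size_t i;

	assert(buf != NULL || len == 0);
	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (i < len)
		sum += (uint32_t)buf[i] << 8; /* odd trailing byte, zero padded */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16); /* fold the carries back in */
	return (uint16_t)sum;
}

/* Value to store in the checksum field (hypothetical helper): complement
 * the raw sum, except when it is 0xffff, because ~0xffff == 0x0000 would
 * read as "no checksum" for UDP. */
static uint16_t
cksum_field(uint16_t raw)
{
	if (raw != 0xffff)
		raw = (uint16_t)~raw;
	return raw;
}
```

With either representation of zero stored in the field, the receiver's sum over the data still folds to 0xffff, so the packet verifies in both cases.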

Regards,
Olivier


* Re: [dpdk-dev] [PATCH v2 09/12] virtio: add Rx checksum offload support
  2016-10-11 14:29       ` Olivier MATZ
@ 2016-10-11 14:36         ` Maxime Coquelin
  2016-10-11 14:49           ` Olivier MATZ
  0 siblings, 1 reply; 97+ messages in thread
From: Maxime Coquelin @ 2016-10-11 14:36 UTC (permalink / raw)
  To: Olivier MATZ, dev, yuanhan.liu
  Cc: konstantin.ananyev, sugesh.chandran, bruce.richardson,
	jianfeng.tan, helin.zhang, adrien.mazarguil, stephen, dprovan,
	xiao.w.wang



On 10/11/2016 04:29 PM, Olivier MATZ wrote:
>
>
> On 10/11/2016 04:04 PM, Maxime Coquelin wrote:
>>> +/* Optionally fill offload information in structure */
>>> +static int
>>> +virtio_rx_offload(struct rte_mbuf *m, struct virtio_net_hdr *hdr)
>>> +{
>>> +    struct rte_net_hdr_lens hdr_lens;
>>> +    uint32_t hdrlen, ptype;
>>> +    int l4_supported = 0;
>>> +
>>> +    /* nothing to do */
>>> +    if (hdr->flags == 0 && hdr->gso_type == VIRTIO_NET_HDR_GSO_NONE)
>>> +        return 0;
>>> +
>>> +    m->ol_flags |= PKT_RX_IP_CKSUM_UNKNOWN;
>>> +
>>> +    ptype = rte_net_get_ptype(m, &hdr_lens, RTE_PTYPE_ALL_MASK);
>>> +    m->packet_type = ptype;
>>> +    if ((ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_TCP ||
>>> +        (ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_UDP ||
>>> +        (ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_SCTP)
>>> +        l4_supported = 1;
>>> +
>>> +    if (hdr->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) {
>>> +        hdrlen = hdr_lens.l2_len + hdr_lens.l3_len + hdr_lens.l4_len;
>>> +        if (hdr->csum_start <= hdrlen && l4_supported) {
>>> +            m->ol_flags |= PKT_RX_L4_CKSUM_NONE;
>>> +        } else {
>>> +            /* Unknown proto or tunnel, do sw cksum. We can assume
>>> +             * the cksum field is in the first segment since the
>>> +             * buffers we provided to the host are large enough.
>>> +             * In case of SCTP, this will be wrong since it's a CRC
>>> +             * but there's nothing we can do.
>>> +             */
>>> +            uint16_t csum, off;
>>> +
>>> +            csum = rte_raw_cksum_mbuf(m, hdr->csum_start,
>>> +                rte_pktmbuf_pkt_len(m) - hdr->csum_start);
>>> +            if (csum != 0xffff)
>> Why don't we do the 1-complement if 0xffff?
>
> This was modified after a comment from Xiao.
>
> In checksum arithmetic (ones' complement), there are 2 equivalent ways
> to say the checksum is 0: 0xffff (0-), and 0x0000 (0+).
> Some protocols like UDP use this to differentiate between 0xffff (packet
> checksum is 0) and 0x0000 (packet checksum is not calculated).
>
> Here, we want to avoid setting the checksum to 0, in case it would mean no
> checksum for UDP packets. Instead, it is set to 0xffff, which is also a
> valid checksum for this packet.

Ha ok, I wasn't aware of this.
Thanks for the explanation!

Maybe not a big deal, but we could add likely around the test?

Maxime
>
> Regards,
> Olivier


* Re: [dpdk-dev] [PATCH v2 09/12] virtio: add Rx checksum offload support
  2016-10-11 14:36         ` Maxime Coquelin
@ 2016-10-11 14:49           ` Olivier MATZ
  0 siblings, 0 replies; 97+ messages in thread
From: Olivier MATZ @ 2016-10-11 14:49 UTC (permalink / raw)
  To: Maxime Coquelin, dev, yuanhan.liu
  Cc: konstantin.ananyev, sugesh.chandran, bruce.richardson,
	jianfeng.tan, helin.zhang, adrien.mazarguil, stephen, dprovan,
	xiao.w.wang



On 10/11/2016 04:36 PM, Maxime Coquelin wrote:
>
>
> On 10/11/2016 04:29 PM, Olivier MATZ wrote:
>>
>>
>> On 10/11/2016 04:04 PM, Maxime Coquelin wrote:
>>>> +/* Optionally fill offload information in structure */
>>>> +static int
>>>> +virtio_rx_offload(struct rte_mbuf *m, struct virtio_net_hdr *hdr)
>>>> +{
>>>> +    struct rte_net_hdr_lens hdr_lens;
>>>> +    uint32_t hdrlen, ptype;
>>>> +    int l4_supported = 0;
>>>> +
>>>> +    /* nothing to do */
>>>> +    if (hdr->flags == 0 && hdr->gso_type == VIRTIO_NET_HDR_GSO_NONE)
>>>> +        return 0;
>>>> +
>>>> +    m->ol_flags |= PKT_RX_IP_CKSUM_UNKNOWN;
>>>> +
>>>> +    ptype = rte_net_get_ptype(m, &hdr_lens, RTE_PTYPE_ALL_MASK);
>>>> +    m->packet_type = ptype;
>>>> +    if ((ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_TCP ||
>>>> +        (ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_UDP ||
>>>> +        (ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_SCTP)
>>>> +        l4_supported = 1;
>>>> +
>>>> +    if (hdr->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) {
>>>> +        hdrlen = hdr_lens.l2_len + hdr_lens.l3_len + hdr_lens.l4_len;
>>>> +        if (hdr->csum_start <= hdrlen && l4_supported) {
>>>> +            m->ol_flags |= PKT_RX_L4_CKSUM_NONE;
>>>> +        } else {
>>>> +            /* Unknown proto or tunnel, do sw cksum. We can assume
>>>> +             * the cksum field is in the first segment since the
>>>> +             * buffers we provided to the host are large enough.
>>>> +             * In case of SCTP, this will be wrong since it's a CRC
>>>> +             * but there's nothing we can do.
>>>> +             */
>>>> +            uint16_t csum, off;
>>>> +
>>>> +            csum = rte_raw_cksum_mbuf(m, hdr->csum_start,
>>>> +                rte_pktmbuf_pkt_len(m) - hdr->csum_start);
>>>> +            if (csum != 0xffff)
>>> Why don't we do the 1-complement if 0xffff?
>>
>> This was modified after a comment from Xiao.
>>
>> In checksum arithmetic (ones' complement), there are 2 equivalent ways
>> to say the checksum is 0: 0xffff (0-), and 0x0000 (0+).
>> Some protocols like UDP use this to differentiate between 0xffff (packet
>> checksum is 0) and 0x0000 (packet checksum is not calculated).
>>
>> Here, we want to avoid setting the checksum to 0, in case it would mean no
>> checksum for UDP packets. Instead, it is set to 0xffff, which is also a
>> valid checksum for this packet.
>
> Ha ok, I wasn't aware of this.
> Thanks for the explanation!
>
> Maybe not a big deal, but we could add likely around the test?

Yep, good idea.

Thanks!
Olivier


* Re: [dpdk-dev] [PATCH v2 00/12] net/virtio: add offload support
  2016-10-11 12:14     ` Olivier MATZ
@ 2016-10-11 15:37       ` Yuanhan Liu
  0 siblings, 0 replies; 97+ messages in thread
From: Yuanhan Liu @ 2016-10-11 15:37 UTC (permalink / raw)
  To: Olivier MATZ
  Cc: dev, konstantin.ananyev, sugesh.chandran, bruce.richardson,
	jianfeng.tan, helin.zhang, adrien.mazarguil, stephen, dprovan,
	xiao.w.wang

On Tue, Oct 11, 2016 at 02:14:10PM +0200, Olivier MATZ wrote:
> Hi Yuanhan,
> 
> On 10/11/2016 01:35 PM, Yuanhan Liu wrote:
> >Hi,
> >
> >Firstly, apologies for such a late review. It had been forgotten :(
> >
> >BTW, please feel free to ping me in future if I made no response
> >in one or two weeks!
> >
> >I haven't reviewed it carefully yet (something I will do tomorrow).
> >Before that, few quick questions.
> >
> >Firstly, would you write down some test steps? Honestly, I'm not
> >quite sure how that works without the TCP/IP stack.
> 
> Not sure I'm getting your question.
> The test plan described in the cover letter works without any dpdk tcp/ip
> stack. It uses testpmd, which is able to bridge packets and ask for TCP
> segmentation.

Oops, I thought the patch list was the end of the cover letter :(
It looks like a great doc at first glance.

I will look at your code tomorrow.

Thanks.

	--yliu


* Re: [dpdk-dev] [PATCH v2 09/12] virtio: add Rx checksum offload support
  2016-10-05 13:27         ` Maxime Coquelin
  2016-10-05 13:30           ` Olivier Matz
@ 2016-10-12 13:02           ` Yuanhan Liu
  2016-10-12 15:55             ` Olivier MATZ
  1 sibling, 1 reply; 97+ messages in thread
From: Yuanhan Liu @ 2016-10-12 13:02 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: Olivier Matz, dev, konstantin.ananyev, sugesh.chandran,
	bruce.richardson, jianfeng.tan, helin.zhang, adrien.mazarguil,
	stephen, dprovan, xiao.w.wang

On Wed, Oct 05, 2016 at 03:27:47PM +0200, Maxime Coquelin wrote:
> >                /* Update offload features */
> >-               if (virtio_rx_offload(rxm, hdr) < 0) {
> >+               if ((features & VIRTIO_NET_F_GUEST_CSUM) &&
> s/VIRTIO_NET_F_GUEST_CSUM/(1u << VIRTIO_NET_F_GUEST_CSUM)/

There is a helper function for that: vtpci_with_feature.
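
For illustration, a minimal sketch of why the bare "&" is wrong: VIRTIO_NET_F_GUEST_CSUM is a bit position, not a mask. The with_feature() helper below is a hypothetical stand-in modelled on vtpci_with_feature(); it takes the feature word directly rather than a struct virtio_hw pointer.

```c
#include <assert.h>
#include <stdint.h>

/* Bit position as defined by the virtio specification. */
#define VIRTIO_NET_F_GUEST_CSUM 1

/* Hypothetical stand-in for vtpci_with_feature(): test one feature bit
 * in a negotiated feature word. */
static inline int
with_feature(uint64_t features, unsigned int bit)
{
	assert(bit < 64);
	return (features & (1ULL << bit)) != 0;
}
```

With only GUEST_CSUM negotiated, the feature word is 0x2: with_feature() reports it as set, whereas "features & VIRTIO_NET_F_GUEST_CSUM" evaluates to 0x2 & 0x1 and wrongly reports it as missing.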

	--yliu


* Re: [dpdk-dev] [PATCH v2 03/12] virtio: reinitialize the device in configure callback
  2016-10-03  9:00   ` [dpdk-dev] [PATCH v2 03/12] virtio: reinitialize the device " Olivier Matz
  2016-10-11 13:13     ` Maxime Coquelin
@ 2016-10-12 14:41     ` Yuanhan Liu
  2016-10-12 16:01       ` Olivier MATZ
  1 sibling, 1 reply; 97+ messages in thread
From: Yuanhan Liu @ 2016-10-12 14:41 UTC (permalink / raw)
  To: Olivier Matz
  Cc: dev, konstantin.ananyev, sugesh.chandran, bruce.richardson,
	jianfeng.tan, helin.zhang, adrien.mazarguil, stephen, dprovan,
	xiao.w.wang

On Mon, Oct 03, 2016 at 11:00:14AM +0200, Olivier Matz wrote:
> @@ -1344,6 +1347,7 @@ virtio_dev_configure(struct rte_eth_dev *dev)
>  {
>  	const struct rte_eth_rxmode *rxmode = &dev->data->dev_conf.rxmode;
>  	struct virtio_hw *hw = dev->data->dev_private;
> +	uint64_t req_features;
>  	int ret;
>  
>  	PMD_INIT_LOG(DEBUG, "configure");
> @@ -1353,6 +1357,14 @@ virtio_dev_configure(struct rte_eth_dev *dev)
>  		return -EINVAL;
>  	}
>  
> +	req_features = VIRTIO_PMD_GUEST_FEATURES;
> +	/* if request features changed, reinit the device */
> +	if (req_features != hw->req_guest_features) {
> +		ret = virtio_init_device(dev, req_features);
> +		if (ret < 0)
> +			return ret;
> +	}

Why do you have to reset virtio here? This doesn't make too much sense
to me.

IIUC, you want to make sure those TSO-related features are unset at
init time, and enable them (by doing a reset) when they are asked to be
enabled (by rte_eth_dev_configure)?

Why not always set those features? We could do the actual offloads
when:

- those features have been negotiated

- they are enabled through rte_eth_dev_configure

With that, I think we could avoid the reset here?

	--yliu


* Re: [dpdk-dev] [PATCH v2 09/12] virtio: add Rx checksum offload support
  2016-10-12 13:02           ` Yuanhan Liu
@ 2016-10-12 15:55             ` Olivier MATZ
  0 siblings, 0 replies; 97+ messages in thread
From: Olivier MATZ @ 2016-10-12 15:55 UTC (permalink / raw)
  To: Yuanhan Liu, Maxime Coquelin
  Cc: dev, konstantin.ananyev, sugesh.chandran, bruce.richardson,
	jianfeng.tan, helin.zhang, adrien.mazarguil, stephen, dprovan,
	xiao.w.wang



On 10/12/2016 03:02 PM, Yuanhan Liu wrote:
> On Wed, Oct 05, 2016 at 03:27:47PM +0200, Maxime Coquelin wrote:
>>>                 /* Update offload features */
>>> -               if (virtio_rx_offload(rxm, hdr) < 0) {
>>> +               if ((features & VIRTIO_NET_F_GUEST_CSUM) &&
>> s/VIRTIO_NET_F_GUEST_CSUM/(1u << VIRTIO_NET_F_GUEST_CSUM)/
>
> There is a helper function for that: vtpci_with_feature.

Ok, will use it.

Thanks,
Olivier


* Re: [dpdk-dev] [PATCH v2 03/12] virtio: reinitialize the device in configure callback
  2016-10-12 14:41     ` Yuanhan Liu
@ 2016-10-12 16:01       ` Olivier MATZ
  2016-10-13  7:54         ` Yuanhan Liu
  0 siblings, 1 reply; 97+ messages in thread
From: Olivier MATZ @ 2016-10-12 16:01 UTC (permalink / raw)
  To: Yuanhan Liu
  Cc: dev, konstantin.ananyev, sugesh.chandran, bruce.richardson,
	jianfeng.tan, helin.zhang, adrien.mazarguil, stephen, dprovan,
	xiao.w.wang

Hello Yuanhan,

On 10/12/2016 04:41 PM, Yuanhan Liu wrote:
> On Mon, Oct 03, 2016 at 11:00:14AM +0200, Olivier Matz wrote:
>> @@ -1344,6 +1347,7 @@ virtio_dev_configure(struct rte_eth_dev *dev)
>>   {
>>   	const struct rte_eth_rxmode *rxmode = &dev->data->dev_conf.rxmode;
>>   	struct virtio_hw *hw = dev->data->dev_private;
>> +	uint64_t req_features;
>>   	int ret;
>>
>>   	PMD_INIT_LOG(DEBUG, "configure");
>> @@ -1353,6 +1357,14 @@ virtio_dev_configure(struct rte_eth_dev *dev)
>>   		return -EINVAL;
>>   	}
>>
>> +	req_features = VIRTIO_PMD_GUEST_FEATURES;
>> +	/* if request features changed, reinit the device */
>> +	if (req_features != hw->req_guest_features) {
>> +		ret = virtio_init_device(dev, req_features);
>> +		if (ret < 0)
>> +			return ret;
>> +	}
>
> Why do you have to reset virtio here? This doesn't make too much sense
> to me.
>
> IIUC, you want to make sure those TSO-related features are unset at
> init time, and enable them (by doing a reset) when they are asked to be
> enabled (by rte_eth_dev_configure)?
>
> Why not always set those features? We could do the actual offloads
> when:
>
> - those features have been negotiated
>
> - they are enabled through rte_eth_dev_configure
>
> With that, I think we could avoid the reset here?

It would work for TX, since you decide whether or not to use the feature.
But I think this won't work for RX: if you negotiate LRO at init, the host
may send you large packets, even if LRO is disabled in dev_configure.

Regards,
Olivier


* Re: [dpdk-dev] [PATCH v2 03/12] virtio: reinitialize the device in configure callback
  2016-10-12 16:01       ` Olivier MATZ
@ 2016-10-13  7:54         ` Yuanhan Liu
  2016-10-13 13:57           ` Olivier MATZ
  0 siblings, 1 reply; 97+ messages in thread
From: Yuanhan Liu @ 2016-10-13  7:54 UTC (permalink / raw)
  To: Olivier MATZ
  Cc: dev, konstantin.ananyev, sugesh.chandran, bruce.richardson,
	jianfeng.tan, helin.zhang, adrien.mazarguil, stephen, dprovan,
	xiao.w.wang

On Wed, Oct 12, 2016 at 06:01:25PM +0200, Olivier MATZ wrote:
> Hello Yuanhan,
> 
> On 10/12/2016 04:41 PM, Yuanhan Liu wrote:
> >On Mon, Oct 03, 2016 at 11:00:14AM +0200, Olivier Matz wrote:
> >>@@ -1344,6 +1347,7 @@ virtio_dev_configure(struct rte_eth_dev *dev)
> >>  {
> >>  	const struct rte_eth_rxmode *rxmode = &dev->data->dev_conf.rxmode;
> >>  	struct virtio_hw *hw = dev->data->dev_private;
> >>+	uint64_t req_features;
> >>  	int ret;
> >>
> >>  	PMD_INIT_LOG(DEBUG, "configure");
> >>@@ -1353,6 +1357,14 @@ virtio_dev_configure(struct rte_eth_dev *dev)
> >>  		return -EINVAL;
> >>  	}
> >>
> >>+	req_features = VIRTIO_PMD_GUEST_FEATURES;
> >>+	/* if request features changed, reinit the device */
> >>+	if (req_features != hw->req_guest_features) {
> >>+		ret = virtio_init_device(dev, req_features);
> >>+		if (ret < 0)
> >>+			return ret;
> >>+	}
> >
> >Why do you have to reset virtio here? This doesn't make too much sense
> >to me.
> >
> >IIUC, you want to make sure those TSO-related features are unset at
> >init time, and enable them (by doing a reset) when they are asked to be
> >enabled (by rte_eth_dev_configure)?
> >
> >Why not always set those features? We could do the actual offloads
> >when:
> >
> >- those features have been negotiated
> >
> >- they are enabled through rte_eth_dev_configure
> >
> >With that, I think we could avoid the reset here?
> 
> It would work for TX, since you decide whether or not to use the feature.
> But I think this won't work for RX: if you negotiate LRO at init, the host
> may send you large packets, even if LRO is disabled in dev_configure.

I see. Thanks.

Besides, I think you should return an error when LRO is not negotiated
after the reset (say, when it's disabled through the qemu command line)?

	--yliu


* Re: [dpdk-dev] [PATCH v2 12/12] virtio: add Tso support
  2016-10-03  9:00   ` [dpdk-dev] [PATCH v2 12/12] virtio: add Tso support Olivier Matz
@ 2016-10-13  8:18     ` Yuanhan Liu
  2016-10-13 14:02       ` Olivier MATZ
  2016-10-13 23:33       ` Stephen Hemminger
  0 siblings, 2 replies; 97+ messages in thread
From: Yuanhan Liu @ 2016-10-13  8:18 UTC (permalink / raw)
  To: Olivier Matz
  Cc: dev, konstantin.ananyev, sugesh.chandran, bruce.richardson,
	jianfeng.tan, helin.zhang, adrien.mazarguil, stephen, dprovan,
	xiao.w.wang

On Mon, Oct 03, 2016 at 11:00:23AM +0200, Olivier Matz wrote:
> +/* When doing TSO, the IP length is not included in the pseudo header
> + * checksum of the packet given to the PMD, but for virtio it is
> + * expected.
> + */
> +static void
> +virtio_tso_fix_cksum(struct rte_mbuf *m)
> +{
> +	/* common case: header is not fragmented */
> +	if (likely(rte_pktmbuf_data_len(m) >= m->l2_len + m->l3_len +
> +			m->l4_len)) {
...
> +		/* replace it in the packet */
> +		th->cksum = new_cksum;
> +	} else {
...
> +		/* replace it in the packet */
> +		*rte_pktmbuf_mtod_offset(m, uint8_t *,
> +			m->l2_len + m->l3_len + 16) = new_cksum.u8[0];
> +		*rte_pktmbuf_mtod_offset(m, uint8_t *,
> +			m->l2_len + m->l3_len + 17) = new_cksum.u8[1];
> +	}

The tcp header will always be in the mbuf, right? Otherwise, you can't
update the cksum field here. What's the point of introducing the "else
clause" then?

	--yliu


* Re: [dpdk-dev] [PATCH v2 10/12] virtio: add Tx checksum offload support
  2016-10-03  9:00   ` [dpdk-dev] [PATCH v2 10/12] virtio: add Tx " Olivier Matz
  2016-10-07  7:25     ` Maxime Coquelin
@ 2016-10-13  8:38     ` Yuanhan Liu
  2016-10-13 13:58       ` Olivier MATZ
  1 sibling, 1 reply; 97+ messages in thread
From: Yuanhan Liu @ 2016-10-13  8:38 UTC (permalink / raw)
  To: Olivier Matz
  Cc: dev, konstantin.ananyev, sugesh.chandran, bruce.richardson,
	jianfeng.tan, helin.zhang, adrien.mazarguil, stephen, dprovan,
	xiao.w.wang

On Mon, Oct 03, 2016 at 11:00:21AM +0200, Olivier Matz wrote:
> +	/* Checksum Offload */
> +	switch (cookie->ol_flags & PKT_TX_L4_MASK) {
> +	case PKT_TX_UDP_CKSUM:
> +		hdr->csum_start = cookie->l2_len + cookie->l3_len;
> +		hdr->csum_offset = 6;
> +		hdr->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
> +		break;
> +
> +	case PKT_TX_TCP_CKSUM:
> +		hdr->csum_start = cookie->l2_len + cookie->l3_len;
> +		hdr->csum_offset = 16;

I would suggest using "offsetof(...)" here, instead of a magic
number like 16.
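
Something along these lines, as a minimal sketch. The two structs are local copies of the UDP and TCP header layouts, written out only to keep the example self-contained (the driver would use the definitions from librte_net), and l4_csum_offset() is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local copies of the wire layouts, for illustration only. */
struct udp_hdr {
	uint16_t src_port;
	uint16_t dst_port;
	uint16_t dgram_len;
	uint16_t dgram_cksum;
};

struct tcp_hdr {
	uint16_t src_port;
	uint16_t dst_port;
	uint32_t sent_seq;
	uint32_t recv_ack;
	uint8_t  data_off;
	uint8_t  tcp_flags;
	uint16_t rx_win;
	uint16_t cksum;
	uint16_t tcp_urp;
};

/* Offset of the checksum field inside the L4 header (hypothetical helper),
 * i.e. the value to put in hdr->csum_offset. */
static uint16_t
l4_csum_offset(int is_tcp)
{
	uint16_t off = is_tcp ? offsetof(struct tcp_hdr, cksum)
			      : offsetof(struct udp_hdr, dgram_cksum);

	assert(off % 2 == 0); /* checksum fields are 16-bit aligned */
	return off;
}
```

This yields the same 16 and 6 as the hardcoded values, but the reader can see where they come from.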

	--yliu


* Re: [dpdk-dev] [PATCH v2 03/12] virtio: reinitialize the device in configure callback
  2016-10-13  7:54         ` Yuanhan Liu
@ 2016-10-13 13:57           ` Olivier MATZ
  0 siblings, 0 replies; 97+ messages in thread
From: Olivier MATZ @ 2016-10-13 13:57 UTC (permalink / raw)
  To: Yuanhan Liu
  Cc: dev, konstantin.ananyev, sugesh.chandran, bruce.richardson,
	jianfeng.tan, helin.zhang, adrien.mazarguil, stephen, dprovan,
	xiao.w.wang



On 10/13/2016 09:54 AM, Yuanhan Liu wrote:
> On Wed, Oct 12, 2016 at 06:01:25PM +0200, Olivier MATZ wrote:
>> Hello Yuanhan,
>>
>> On 10/12/2016 04:41 PM, Yuanhan Liu wrote:
>>> On Mon, Oct 03, 2016 at 11:00:14AM +0200, Olivier Matz wrote:
>>>> @@ -1344,6 +1347,7 @@ virtio_dev_configure(struct rte_eth_dev *dev)
>>>>   {
>>>>   	const struct rte_eth_rxmode *rxmode = &dev->data->dev_conf.rxmode;
>>>>   	struct virtio_hw *hw = dev->data->dev_private;
>>>> +	uint64_t req_features;
>>>>   	int ret;
>>>>
>>>>   	PMD_INIT_LOG(DEBUG, "configure");
>>>> @@ -1353,6 +1357,14 @@ virtio_dev_configure(struct rte_eth_dev *dev)
>>>>   		return -EINVAL;
>>>>   	}
>>>>
>>>> +	req_features = VIRTIO_PMD_GUEST_FEATURES;
>>>> +	/* if request features changed, reinit the device */
>>>> +	if (req_features != hw->req_guest_features) {
>>>> +		ret = virtio_init_device(dev, req_features);
>>>> +		if (ret < 0)
>>>> +			return ret;
>>>> +	}
>>>
>>> Why do you have to reset virtio here? This doesn't make too much sense
>>> to me.
>>>
>>> IIUC, you want to make sure those TSO-related features are unset at
>>> init time, and enable them (by doing a reset) when they are asked to be
>>> enabled (by rte_eth_dev_configure)?
>>>
>>> Why not always set those features? We could do the actual offloads
>>> when:
>>>
>>> - those features have been negotiated
>>>
>>> - they are enabled through rte_eth_dev_configure
>>>
>>> With that, I think we could avoid the reset here?
>>
>> It would work for TX, since you decide whether or not to use the feature.
>> But I think this won't work for RX: if you negotiate LRO at init, the host
>> may send you large packets, even if LRO is disabled in dev_configure.
>
> I see. Thanks.
>
> Besides, I think you should return an error when LRO is not negotiated
> after the reset (say, when it's disabled through the qemu command line)?

Good idea, I now return an error if offload cannot be negotiated.
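
Roughly, the check looks like the sketch below. This is a simplified illustration, not the code from the series: rx_req_features() and rx_check_negotiated() are hypothetical names, the flags mirror the rxmode fields, and the choice of -ENOTSUP as the error code is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Bit positions from the virtio specification. */
#define VIRTIO_NET_F_GUEST_CSUM 1
#define VIRTIO_NET_F_GUEST_TSO4 7
#define VIRTIO_NET_F_GUEST_TSO6 8

/* Hypothetical: features to request for a given Rx configuration. */
static uint64_t
rx_req_features(uint64_t base, int hw_ip_checksum, int enable_lro)
{
	uint64_t req = base;

	if (hw_ip_checksum)
		req |= 1ULL << VIRTIO_NET_F_GUEST_CSUM;
	if (enable_lro)
		req |= (1ULL << VIRTIO_NET_F_GUEST_TSO4) |
		       (1ULL << VIRTIO_NET_F_GUEST_TSO6);
	return req;
}

/* Hypothetical: after the re-init, fail if the host did not grant the
 * features the requested offloads depend on. */
static int
rx_check_negotiated(uint64_t requested, uint64_t negotiated,
		    int hw_ip_checksum, int enable_lro)
{
	uint64_t need = rx_req_features(0, hw_ip_checksum, enable_lro);

	/* the negotiated set is a subset of what the driver requested */
	assert((negotiated & ~requested) == 0);
	if ((negotiated & need) != need)
		return -ENOTSUP;
	return 0;
}
```

So if the guest TSO bits are switched off on the host side, enabling LRO in dev_configure fails instead of silently doing nothing.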

Olivier


* Re: [dpdk-dev] [PATCH v2 10/12] virtio: add Tx checksum offload support
  2016-10-13  8:38     ` Yuanhan Liu
@ 2016-10-13 13:58       ` Olivier MATZ
  0 siblings, 0 replies; 97+ messages in thread
From: Olivier MATZ @ 2016-10-13 13:58 UTC (permalink / raw)
  To: Yuanhan Liu
  Cc: dev, konstantin.ananyev, sugesh.chandran, bruce.richardson,
	jianfeng.tan, helin.zhang, adrien.mazarguil, stephen, dprovan,
	xiao.w.wang



On 10/13/2016 10:38 AM, Yuanhan Liu wrote:
> On Mon, Oct 03, 2016 at 11:00:21AM +0200, Olivier Matz wrote:
>> +	/* Checksum Offload */
>> +	switch (cookie->ol_flags & PKT_TX_L4_MASK) {
>> +	case PKT_TX_UDP_CKSUM:
>> +		hdr->csum_start = cookie->l2_len + cookie->l3_len;
>> +		hdr->csum_offset = 6;
>> +		hdr->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
>> +		break;
>> +
>> +	case PKT_TX_TCP_CKSUM:
>> +		hdr->csum_start = cookie->l2_len + cookie->l3_len;
>> +		hdr->csum_offset = 16;
>
> I would suggest using "offsetof(...)" here, instead of a magic
> number like 16.

Will do, it's actually clearer.

Olivier


* Re: [dpdk-dev] [PATCH v2 12/12] virtio: add Tso support
  2016-10-13  8:18     ` Yuanhan Liu
@ 2016-10-13 14:02       ` Olivier MATZ
  2016-10-13 14:16         ` Yuanhan Liu
  2016-10-13 23:33       ` Stephen Hemminger
  1 sibling, 1 reply; 97+ messages in thread
From: Olivier MATZ @ 2016-10-13 14:02 UTC (permalink / raw)
  To: Yuanhan Liu
  Cc: dev, konstantin.ananyev, sugesh.chandran, bruce.richardson,
	jianfeng.tan, helin.zhang, adrien.mazarguil, stephen, dprovan,
	xiao.w.wang



On 10/13/2016 10:18 AM, Yuanhan Liu wrote:
> On Mon, Oct 03, 2016 at 11:00:23AM +0200, Olivier Matz wrote:
>> +/* When doing TSO, the IP length is not included in the pseudo header
>> + * checksum of the packet given to the PMD, but for virtio it is
>> + * expected.
>> + */
>> +static void
>> +virtio_tso_fix_cksum(struct rte_mbuf *m)
>> +{
>> +	/* common case: header is not fragmented */
>> +	if (likely(rte_pktmbuf_data_len(m) >= m->l2_len + m->l3_len +
>> +			m->l4_len)) {
> ...
>> +		/* replace it in the packet */
>> +		th->cksum = new_cksum;
>> +	} else {
> ...
>> +		/* replace it in the packet */
>> +		*rte_pktmbuf_mtod_offset(m, uint8_t *,
>> +			m->l2_len + m->l3_len + 16) = new_cksum.u8[0];
>> +		*rte_pktmbuf_mtod_offset(m, uint8_t *,
>> +			m->l2_len + m->l3_len + 17) = new_cksum.u8[1];
>> +	}
>
> The tcp header will always be in the mbuf, right? Otherwise, you can't
> update the cksum field here. What's the point of introducing the "else
> clause" then?

Sorry, I don't see the problem you're pointing out here.

What I want here is to support the case where the mbuf is
segmented in the middle of the network header (which is probably a rare
case).

In the "else" part, I only access the mbuf byte by byte using the
rte_pktmbuf_mtod_offset() accessor. An alternative would have been to
copy the header into a linear buffer, fix the checksum, then copy it back
into the packet, but there are no mbuf helpers to do these copies for now.
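
To illustrate the byte-by-byte idea in isolation, here is a minimal sketch on a toy segment chain. struct seg, seg_byte() and seg_write16() are hypothetical names, not mbuf API, and unlike the patch the sketch walks the chain explicitly to find each byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a chained mbuf: one segment of packet data. */
struct seg {
	uint8_t *data;
	size_t len;
	struct seg *next;
};

/* Pointer to the byte at packet offset off, walking the chain; NULL if
 * the offset is beyond the end of the packet. */
static uint8_t *
seg_byte(struct seg *s, size_t off)
{
	while (s != NULL && off >= s->len) {
		off -= s->len;
		s = s->next;
	}
	return s != NULL ? &s->data[off] : NULL;
}

/* Store a 16-bit field, already in network byte order, one byte at a
 * time so that it may straddle a segment boundary. */
static int
seg_write16(struct seg *s, size_t off, const uint8_t val[2])
{
	uint8_t *b0, *b1;

	assert(val != NULL);
	b0 = seg_byte(s, off);
	b1 = seg_byte(s, off + 1);
	if (b0 == NULL || b1 == NULL)
		return -1; /* field not fully inside the packet */
	*b0 = val[0];
	*b1 = val[1];
	return 0;
}
```

Writing the two bytes separately avoids any assumption that the 16-bit field is contiguous in memory, at the cost of walking the chain twice.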

Regards,
Olivier


* [dpdk-dev] [PATCH v3 00/12] net/virtio: add offload support
  2016-07-21  8:08 [dpdk-dev] [PATCH 00/12] net/virtio: add offload support Olivier Matz
                   ` (12 preceding siblings ...)
  2016-10-03  9:00 ` [dpdk-dev] [PATCH v2 00/12] net/virtio: add offload support Olivier Matz
@ 2016-10-13 14:15 ` Olivier Matz
  2016-10-13 14:16   ` [dpdk-dev] [PATCH v3 01/12] net/virtio: move device initialization in a function Olivier Matz
                     ` (11 more replies)
  13 siblings, 12 replies; 97+ messages in thread
From: Olivier Matz @ 2016-10-13 14:15 UTC (permalink / raw)
  To: dev, yuanhan.liu
  Cc: konstantin.ananyev, sugesh.chandran, bruce.richardson,
	jianfeng.tan, helin.zhang, adrien.mazarguil, stephen, dprovan,
	xiao.w.wang, maxime.coquelin

This patchset, targeted for 16.11, introduces support for rx and tx
offload in the virtio pmd.  To achieve this, some new mbuf flags must be
introduced, as discussed in [1].

It applies on master + a patch fixing the testpmd csum engine:
http://dpdk.org/dev/patchwork/patch/16538/

The new mbuf checksum flags are backward compatible for current
applications that assume that unknown_csum = good_csum (since there
was only a bad_csum flag). But if the patchset is integrated, we
should consider updating the PMDs to match the new API for 16.11.

[1] http://dpdk.org/ml/archives/dev/2016-May/039920.html

changes v2 -> v3

- fix typo in release note
- add unlikely() in cksum calculation error case
- add likely() in virtio rx function when cksum != 0xffff
- return an error code instead of the cksum in rte_raw_cksum_mbuf()
- do not access the virtio header if no offload is negotiated (rx and tx)
- return an error if offload cannot be negotiated
- use offsetof() instead of magic hardcoded values for cksum offsets
- change/fix some commit titles

changes v1 -> v2

- make mbuf checksum calculation static inline
- fix checksum calculation for protocols where csum=0 means no csum
- move mbuf checksum calculation into librte_net
- use RTE_MIN() to set max rx/tx queue
- rebase on top of head

Olivier Matz (12):
  virtio: move device initialization in a function
  virtio: setup and start cq in configure callback
  virtio: reinitialize the device in configure callback
  net: add function to calculate a checksum in a mbuf
  mbuf: add new Rx checksum mbuf flags
  app/testpmd: fix checksum stats in csum engine
  mbuf: new flag for LRO
  app/testpmd: display lro segment size
  virtio: add Rx checksum offload support
  virtio: add Tx checksum offload support
  virtio: add Lro support
  virtio: add Tso support

 app/test-pmd/csumonly.c                |   8 +-
 doc/guides/rel_notes/release_16_11.rst |  16 ++
 drivers/net/virtio/virtio_ethdev.c     | 197 ++++++++++++++--------
 drivers/net/virtio/virtio_ethdev.h     |  18 +-
 drivers/net/virtio/virtio_pci.h        |   4 +-
 drivers/net/virtio/virtio_rxtx.c       | 298 ++++++++++++++++++++++++++++++---
 drivers/net/virtio/virtqueue.h         |   1 +
 lib/librte_mbuf/rte_mbuf.c             |  18 +-
 lib/librte_mbuf/rte_mbuf.h             |  58 ++++++-
 lib/librte_net/rte_ip.h                |  71 ++++++++
 10 files changed, 580 insertions(+), 109 deletions(-)

Test plan
=========

(replayed on v3)

Platform description
--------------------

  guest (dpdk)
  +----------------+
  |                |
  |                |
  |         port0  +-----<---+
  |       ixgbe /  |         |
  |       directio |         |
  |                |         |
  |    port1       |         ^ flow1
  +----------------+         | (flow2 is the reverse)
         |                   |
         | virtio            |
         v                   |
  +----------------+         |
  |     tap0   /   |         |
  |1.1.1.1   /     |         |
  |ns-tap  /       |         |
  |      /         |         |
  |    /   ixgbe2  +------>--+
  |  /    1.1.1.2  |
  |/      ns-ixgbe |
  +----------------+
  host (linux, vhost-net)


flow1:
  host -(ixgbe)-> guest -(virtio)-> host
  1.1.1.2 -> 1.1.1.1

flow2:
  host -(virtio)-> guest -(ixgbe)-> host
  1.1.1.1 -> 1.1.1.2

Host configuration
------------------

Start qemu with:

- a ne2k management interface to avoid any conflict with dpdk
- 2 ixgbe interfaces given to the vm through vfio
- a virtio net device, connected to a tap interface through vhost-net

  /usr/bin/qemu-system-x86_64 -k fr -daemonize --enable-kvm -m 1G -cpu host \
    -smp 3 -serial telnet::40564,server,nowait -serial null \
    -qmp tcp::44340,server,nowait -monitor telnet::49229,server,nowait \
    -device ne2k_pci,mac=de:ad:de:01:02:03,netdev=user.0,addr=03 \
    -netdev user,id=user.0,hostfwd=tcp::34965-:22 \
    -device vfio-pci,host=0000:04:00.0 -device vfio-pci,host=0000:04:00.1 \
    -netdev type=tap,id=vhostnet0,script=no,vhost=on,queues=8 \
    -device virtio-net-pci,netdev=vhostnet0,ioeventfd=on,mq=on,vectors=17 \
    -hda "/path/to/ubuntu-14.04-template.qcow2" \
    -snapshot -vga none -display none

Move the tap interface in a netns, and configure it:

  ip netns add ns-tap
  ip netns exec ns-tap ip l set lo up
  ip link set tap0 netns ns-tap
  ip netns exec ns-tap ip l set tap0 down
  ip netns exec ns-tap ip l set addr 02:00:00:00:00:01 dev tap0
  ip netns exec ns-tap ip l set tap0 up
  ip netns exec ns-tap ip a a 1.1.1.1/24 dev tap0
  ip netns exec ns-tap arp -s 1.1.1.2 02:00:00:00:00:00
  ip netns exec ns-tap ip a

Move the ixgbe interface in a netns, and configure it:

  IXGBE=ixgbe2
  ip netns add ns-ixgbe
  ip netns exec ns-ixgbe ip l set lo up
  ip link set ${IXGBE} netns ns-ixgbe
  ip netns exec ns-ixgbe ip l set ${IXGBE} down
  ip netns exec ns-ixgbe ip l set addr 02:00:00:00:00:00 dev ${IXGBE}
  ip netns exec ns-ixgbe ip l set ${IXGBE} up
  ip netns exec ns-ixgbe ip a a 1.1.1.2/24 dev ${IXGBE}
  ip netns exec ns-ixgbe arp -s 1.1.1.1 02:00:00:00:00:01
  ip netns exec ns-ixgbe ip a

Guest configuration
-------------------

List of pci devices:

  00:02.0 Ethernet controller [0200]: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection [8086:10fb] (rev 01)
  00:03.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL-8029(AS) [10ec:8029]
  00:04.0 Ethernet controller [0200]: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection [8086:10fb] (rev 01)
  00:05.0 Ethernet controller [0200]: Red Hat, Inc Virtio network device [1af4:1000]

Compile dpdk:

  cd dpdk.org
  make config T=x86_64-native-linuxapp-gcc
  make -j4

Prepare environment:

  mkdir -p /mnt/huge
  mount -t hugetlbfs nodev /mnt/huge
  echo 256 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
  modprobe uio_pci_generic
  python tools/dpdk_nic_bind.py -b uio_pci_generic 0000:00:02.0
  python tools/dpdk_nic_bind.py -b uio_pci_generic 0000:00:05.0

Run test
========

The test uses iperf to validate connectivity between the 2 netns of the
host and through the guest.

Iperf is run with:

  # flow1: host -(ixgbe)-> guest -(virtio)-> host
  ip netns exec ns-tap iperf -s
  ip netns exec ns-ixgbe iperf -c 1.1.1.1 -t 10

  # flow2: host -(virtio)-> guest -(ixgbe)-> host
  ip netns exec ns-ixgbe iperf -s
  ip netns exec ns-tap iperf -c 1.1.1.2 -t 10

The guest runs testpmd with the csum forward engine; its configuration
depends on the test case.

test1: large packets (lro/tso)
------------------------------

Configuration of testpmd:

  ./build/app/testpmd -l 0,1 --log-level 8 -- --total-num-mbufs=16384 \
    -i --port-topology=chained --disable-hw-vlan-filter \
    --disable-hw-vlan-strip --enable-rx-cksum --enable-lro \
    --crc-strip --txqflags=0

  set fwd csum
  tso set 1440 0
  csum set ip hw 0
  csum set tcp hw 0
  tso set 1440 1
  #csum set ip hw 1 # not supported by virtio
  csum set tcp hw 1
  start

Iperf log:

  root@ubuntu1404:~# ip netns exec ns-ixgbe iperf -c 1.1.1.1 -t 10
  ------------------------------------------------------------
  Client connecting to 1.1.1.1, TCP port 5001
  TCP window size: 85.0 KByte (default)
  ------------------------------------------------------------
  [  3] local 1.1.1.2 port 54460 connected with 1.1.1.1 port 5001
  [ ID] Interval       Transfer     Bandwidth
  [  3]  0.0-10.0 sec  6.14 GBytes  5.27 Gbits/sec
  root@ubuntu1404:~# ip netns exec ns-tap iperf -c 1.1.1.2 -t 10
  ------------------------------------------------------------
  Client connecting to 1.1.1.2, TCP port 5001
  TCP window size: 85.0 KByte (default)
  ------------------------------------------------------------
  [  3] local 1.1.1.1 port 58312 connected with 1.1.1.2 port 5001
  [ ID] Interval       Transfer     Bandwidth
  [  3]  0.0-10.0 sec  6.70 GBytes  5.76 Gbits/sec

Example of what we see with "set verbose 1" in testpmd:

  -- flow1: ixgbe2 -> port0 (ixgbe) -> testpmd -> port1 (virtio) <-> tap0
  port=0, mbuf=0x7f968ad9fdc0, pkt_len=24682, nb_segs=13:
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=32 flags=PKT_RX_L4_CKSUM_UNKNOWN PKT_RX_IP_CKSUM_UNKNOWN
  tx: m->l2_len=14 m->l3_len=20 m->l4_len=32
  tx: m->tso_segsz=1440
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_L4_NO_CKSUM PKT_TX_TCP_SEG PKT_TX_IPV4

  -- flow2: tap0 -> port1 (virtio)-> testpmd -> port0 (ixgbe) -> ixgbe2
  port=1, mbuf=0x7f968acc9f40, pkt_len=42058, nb_segs=21:
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=32 flags=PKT_RX_L4_CKSUM_NONE PKT_RX_IP_CKSUM_UNKNOWN PKT_RX_LRO
  rx: m->lro_segsz=1440
  tx: m->l2_len=14 m->l3_len=20 m->l4_len=32
  tx: m->tso_segsz=1440
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_L4_NO_CKSUM PKT_TX_TCP_SEG PKT_TX_IPV4

test2: hardware checksum only
-----------------------------

Configuration of testpmd:

  ./build/app/testpmd -l 0,1 --log-level 8 -- --total-num-mbufs=16384 \
    -i --port-topology=chained --disable-hw-vlan-filter \
    --disable-hw-vlan-strip --enable-rx-cksum --crc-strip --txqflags=0

  set fwd csum
  csum set ip hw 0
  csum set tcp hw 0
  csum set tcp hw 1
  start

Iperf log:

  root@ubuntu1404:~# ip netns exec ns-ixgbe iperf -c 1.1.1.1 -t 10
  ------------------------------------------------------------
  Client connecting to 1.1.1.1, TCP port 5001
  TCP window size: 85.0 KByte (default)
  ------------------------------------------------------------
  [  3] local 1.1.1.2 port 54462 connected with 1.1.1.1 port 5001
  [ ID] Interval       Transfer     Bandwidth
  [  3]  0.0-10.0 sec  4.49 GBytes  3.86 Gbits/sec
  root@ubuntu1404:~# ip netns exec ns-tap iperf -c 1.1.1.2 -t 10
  ------------------------------------------------------------
  Client connecting to 1.1.1.2, TCP port 5001
  TCP window size: 85.0 KByte (default)
  ------------------------------------------------------------
  [  3] local 1.1.1.1 port 58314 connected with 1.1.1.2 port 5001
  [ ID] Interval       Transfer     Bandwidth
  [  3]  0.0-10.0 sec  6.66 GBytes  5.72 Gbits/sec

Example of what we see with "set verbose 1" in testpmd:

  -- flow1: ixgbe2 -> port0 (ixgbe) -> testpmd -> port1 (virtio) <-> tap0
  port=0, mbuf=0x7f0adca89b40, pkt_len=1514, nb_segs=1:
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=32 flags=PKT_RX_L4_CKSUM_UNKNOWN PKT_RX_IP_CKSUM_UNKNOWN
  tx: m->l2_len=14 m->l3_len=20 m->l4_len=32
  tx: flags=PKT_TX_TCP_CKSUM PKT_TX_IPV4

  -- flow2: tap0 -> port1 (virtio)-> testpmd -> port0 (ixgbe) -> ixgbe2
  port=1, mbuf=0x7f0adcb98d80, pkt_len=1514, nb_segs=1:
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=32 flags=PKT_RX_L4_CKSUM_NONE PKT_RX_IP_CKSUM_UNKNOWN
  tx: m->l2_len=14 m->l3_len=20 m->l4_len=32
  tx: flags=PKT_TX_IP_CKSUM PKT_TX_TCP_CKSUM PKT_TX_IPV4

test3: no offload
-----------------

Configuration of testpmd:

  ./build/app/testpmd -l 0,1 --log-level 8 -- --total-num-mbufs=16384 \
    -i --port-topology=chained --disable-hw-vlan-filter --disable-hw-vlan-strip

  set fwd csum
  start

Iperf log:

  root@ubuntu1404:~# ip netns exec ns-ixgbe iperf -c 1.1.1.1 -t 10
  ------------------------------------------------------------
  Client connecting to 1.1.1.1, TCP port 5001
  TCP window size: 85.0 KByte (default)
  ------------------------------------------------------------
  [  3] local 1.1.1.2 port 54466 connected with 1.1.1.1 port 5001
  [ ID] Interval       Transfer     Bandwidth
  [  3]  0.0-10.0 sec  4.29 GBytes  3.68 Gbits/sec
  root@ubuntu1404:~# ip netns exec ns-tap iperf -c 1.1.1.2 -t 10
  ------------------------------------------------------------
  Client connecting to 1.1.1.2, TCP port 5001
  TCP window size: 85.0 KByte (default)
  ------------------------------------------------------------
  [  3] local 1.1.1.1 port 58316 connected with 1.1.1.2 port 5001
  [ ID] Interval       Transfer     Bandwidth
  [  3]  0.0-10.0 sec  6.66 GBytes  5.72 Gbits/sec

Example of what we see with "set verbose 1" in testpmd:

  -- flow1: ixgbe2 -> port0 (ixgbe) -> testpmd -> port1 (virtio) <-> tap0
  port=0, mbuf=0x7faf38b3e700, pkt_len=1514, nb_segs=1:
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=32 flags=PKT_RX_L4_CKSUM_UNKNOWN PKT_RX_IP_CKSUM_UNKNOWN
  tx: flags=PKT_TX_L4_NO_CKSUM PKT_TX_IPV4

  -- flow2: tap0 -> port1 (virtio)-> testpmd -> port0 (ixgbe) -> ixgbe2
  port=1, mbuf=0x7faf38b71500, pkt_len=1514, nb_segs=1:
  rx: l2_len=14 ethertype=800 l3_len=20 l4_proto=6 l4_len=32 flags=PKT_RX_L4_CKSUM_UNKNOWN PKT_RX_IP_CKSUM_UNKNOWN
  tx: flags=PKT_TX_L4_NO_CKSUM PKT_TX_IPV4

-- 
2.8.1

* [dpdk-dev] [PATCH v3 01/12] net/virtio: move device initialization in a function
  2016-10-13 14:15 ` [dpdk-dev] [PATCH v3 " Olivier Matz
@ 2016-10-13 14:16   ` Olivier Matz
  2016-10-13 14:16   ` [dpdk-dev] [PATCH v3 02/12] net/virtio: setup and start cq in configure callback Olivier Matz
                     ` (10 subsequent siblings)
  11 siblings, 0 replies; 97+ messages in thread
From: Olivier Matz @ 2016-10-13 14:16 UTC (permalink / raw)
  To: dev, yuanhan.liu
  Cc: konstantin.ananyev, sugesh.chandran, bruce.richardson,
	jianfeng.tan, helin.zhang, adrien.mazarguil, stephen, dprovan,
	xiao.w.wang, maxime.coquelin

Move all code related to device initialization in a new function
virtio_init_device().

This commit brings no functional change; it prepares the next commits,
which add offload support. For that, the device will need to be
reinitialized from ethdev->configure() using this new function.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 drivers/net/virtio/virtio_ethdev.c | 99 ++++++++++++++++++++++----------------
 1 file changed, 58 insertions(+), 41 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
index b4dfc0a..77ca569 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -1118,46 +1118,13 @@ rx_func_get(struct rte_eth_dev *eth_dev)
 		eth_dev->rx_pkt_burst = &virtio_recv_pkts;
 }
 
-/*
- * This function is based on probe() function in virtio_pci.c
- * It returns 0 on success.
- */
-int
-eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
+static int
+virtio_init_device(struct rte_eth_dev *eth_dev)
 {
 	struct virtio_hw *hw = eth_dev->data->dev_private;
 	struct virtio_net_config *config;
 	struct virtio_net_config local_config;
-	struct rte_pci_device *pci_dev;
-	uint32_t dev_flags = RTE_ETH_DEV_DETACHABLE;
-	int ret;
-
-	RTE_BUILD_BUG_ON(RTE_PKTMBUF_HEADROOM < sizeof(struct virtio_net_hdr_mrg_rxbuf));
-
-	eth_dev->dev_ops = &virtio_eth_dev_ops;
-	eth_dev->tx_pkt_burst = &virtio_xmit_pkts;
-
-	if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
-		rx_func_get(eth_dev);
-		return 0;
-	}
-
-	/* Allocate memory for storing MAC addresses */
-	eth_dev->data->mac_addrs = rte_zmalloc("virtio", VIRTIO_MAX_MAC_ADDRS * ETHER_ADDR_LEN, 0);
-	if (eth_dev->data->mac_addrs == NULL) {
-		PMD_INIT_LOG(ERR,
-			"Failed to allocate %d bytes needed to store MAC addresses",
-			VIRTIO_MAX_MAC_ADDRS * ETHER_ADDR_LEN);
-		return -ENOMEM;
-	}
-
-	pci_dev = eth_dev->pci_dev;
-
-	if (pci_dev) {
-		ret = vtpci_init(pci_dev, hw, &dev_flags);
-		if (ret)
-			return ret;
-	}
+	struct rte_pci_device *pci_dev = eth_dev->pci_dev;
 
 	/* Reset the device although not necessary at startup */
 	vtpci_reset(hw);
@@ -1172,10 +1139,11 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
 
 	/* If host does not support status then disable LSC */
 	if (!vtpci_with_feature(hw, VIRTIO_NET_F_STATUS))
-		dev_flags &= ~RTE_ETH_DEV_INTR_LSC;
+		eth_dev->data->dev_flags &= ~RTE_ETH_DEV_INTR_LSC;
+	else
+		eth_dev->data->dev_flags |= RTE_ETH_DEV_INTR_LSC;
 
 	rte_eth_copy_pci_info(eth_dev, pci_dev);
-	eth_dev->data->dev_flags = dev_flags;
 
 	rx_func_get(eth_dev);
 
@@ -1254,12 +1222,61 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
 			eth_dev->data->port_id, pci_dev->id.vendor_id,
 			pci_dev->id.device_id);
 
+	virtio_dev_cq_start(eth_dev);
+
+	return 0;
+}
+
+/*
+ * This function is based on probe() function in virtio_pci.c
+ * It returns 0 on success.
+ */
+int
+eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
+{
+	struct virtio_hw *hw = eth_dev->data->dev_private;
+	struct rte_pci_device *pci_dev;
+	uint32_t dev_flags = RTE_ETH_DEV_DETACHABLE;
+	int ret;
+
+	RTE_BUILD_BUG_ON(RTE_PKTMBUF_HEADROOM < sizeof(struct virtio_net_hdr_mrg_rxbuf));
+
+	eth_dev->dev_ops = &virtio_eth_dev_ops;
+	eth_dev->tx_pkt_burst = &virtio_xmit_pkts;
+
+	if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
+		rx_func_get(eth_dev);
+		return 0;
+	}
+
+	/* Allocate memory for storing MAC addresses */
+	eth_dev->data->mac_addrs = rte_zmalloc("virtio", VIRTIO_MAX_MAC_ADDRS * ETHER_ADDR_LEN, 0);
+	if (eth_dev->data->mac_addrs == NULL) {
+		PMD_INIT_LOG(ERR,
+			"Failed to allocate %d bytes needed to store MAC addresses",
+			VIRTIO_MAX_MAC_ADDRS * ETHER_ADDR_LEN);
+		return -ENOMEM;
+	}
+
+	pci_dev = eth_dev->pci_dev;
+
+	if (pci_dev) {
+		ret = vtpci_init(pci_dev, hw, &dev_flags);
+		if (ret)
+			return ret;
+	}
+
+	eth_dev->data->dev_flags = dev_flags;
+
+	/* reset device and negotiate features */
+	ret = virtio_init_device(eth_dev);
+	if (ret < 0)
+		return ret;
+
 	/* Setup interrupt callback  */
 	if (eth_dev->data->dev_flags & RTE_ETH_DEV_INTR_LSC)
 		rte_intr_callback_register(&pci_dev->intr_handle,
-				   virtio_interrupt_handler, eth_dev);
-
-	virtio_dev_cq_start(eth_dev);
+			virtio_interrupt_handler, eth_dev);
 
 	return 0;
 }
-- 
2.8.1

* [dpdk-dev] [PATCH v3 02/12] net/virtio: setup and start cq in configure callback
  2016-10-13 14:15 ` [dpdk-dev] [PATCH v3 " Olivier Matz
  2016-10-13 14:16   ` [dpdk-dev] [PATCH v3 01/12] net/virtio: move device initialization in a function Olivier Matz
@ 2016-10-13 14:16   ` Olivier Matz
  2016-11-02  1:38     ` Yao, Lei A
  2016-10-13 14:16   ` [dpdk-dev] [PATCH v3 03/12] net/virtio: reinitialize the device " Olivier Matz
                     ` (9 subsequent siblings)
  11 siblings, 1 reply; 97+ messages in thread
From: Olivier Matz @ 2016-10-13 14:16 UTC (permalink / raw)
  To: dev, yuanhan.liu
  Cc: konstantin.ananyev, sugesh.chandran, bruce.richardson,
	jianfeng.tan, helin.zhang, adrien.mazarguil, stephen, dprovan,
	xiao.w.wang, maxime.coquelin

Move the configuration of control queue in the configure callback.
This is needed by next commit, which introduces the reinitialization
of the device in the configure callback to change the feature flags.
Therefore, the control queue will have to be restarted at the same
place.

As virtio_dev_cq_queue_setup() is called from a place where
config->max_virtqueue_pairs is not available, we need to store this in
the private structure. It replaces max_rx_queues and max_tx_queues which
have the same value. The log showing the value of max_rx_queues and
max_tx_queues is also removed since config->max_virtqueue_pairs is
already displayed above.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 drivers/net/virtio/virtio_ethdev.c | 43 +++++++++++++++++++-------------------
 drivers/net/virtio/virtio_ethdev.h |  4 ++--
 drivers/net/virtio/virtio_pci.h    |  3 +--
 3 files changed, 24 insertions(+), 26 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
index 77ca569..f3921ac 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -552,6 +552,9 @@ virtio_dev_close(struct rte_eth_dev *dev)
 	if (hw->started == 1)
 		virtio_dev_stop(dev);
 
+	if (hw->cvq)
+		virtio_dev_queue_release(hw->cvq->vq);
+
 	/* reset the NIC */
 	if (dev->data->dev_flags & RTE_ETH_DEV_INTR_LSC)
 		vtpci_irq_config(hw, VIRTIO_MSI_NO_VECTOR);
@@ -1191,16 +1194,7 @@ virtio_init_device(struct rte_eth_dev *eth_dev)
 			config->max_virtqueue_pairs = 1;
 		}
 
-		hw->max_rx_queues =
-			(VIRTIO_MAX_RX_QUEUES < config->max_virtqueue_pairs) ?
-			VIRTIO_MAX_RX_QUEUES : config->max_virtqueue_pairs;
-		hw->max_tx_queues =
-			(VIRTIO_MAX_TX_QUEUES < config->max_virtqueue_pairs) ?
-			VIRTIO_MAX_TX_QUEUES : config->max_virtqueue_pairs;
-
-		virtio_dev_cq_queue_setup(eth_dev,
-					config->max_virtqueue_pairs * 2,
-					SOCKET_ID_ANY);
+		hw->max_queue_pairs = config->max_virtqueue_pairs;
 
 		PMD_INIT_LOG(DEBUG, "config->max_virtqueue_pairs=%d",
 				config->max_virtqueue_pairs);
@@ -1211,19 +1205,15 @@ virtio_init_device(struct rte_eth_dev *eth_dev)
 				config->mac[2], config->mac[3],
 				config->mac[4], config->mac[5]);
 	} else {
-		hw->max_rx_queues = 1;
-		hw->max_tx_queues = 1;
+		PMD_INIT_LOG(DEBUG, "config->max_virtqueue_pairs=1");
+		hw->max_queue_pairs = 1;
 	}
 
-	PMD_INIT_LOG(DEBUG, "hw->max_rx_queues=%d   hw->max_tx_queues=%d",
-			hw->max_rx_queues, hw->max_tx_queues);
 	if (pci_dev)
 		PMD_INIT_LOG(DEBUG, "port %d vendorID=0x%x deviceID=0x%x",
 			eth_dev->data->port_id, pci_dev->id.vendor_id,
 			pci_dev->id.device_id);
 
-	virtio_dev_cq_start(eth_dev);
-
 	return 0;
 }
 
@@ -1285,7 +1275,6 @@ static int
 eth_virtio_dev_uninit(struct rte_eth_dev *eth_dev)
 {
 	struct rte_pci_device *pci_dev;
-	struct virtio_hw *hw = eth_dev->data->dev_private;
 
 	PMD_INIT_FUNC_TRACE();
 
@@ -1301,9 +1290,6 @@ eth_virtio_dev_uninit(struct rte_eth_dev *eth_dev)
 	eth_dev->tx_pkt_burst = NULL;
 	eth_dev->rx_pkt_burst = NULL;
 
-	if (hw->cvq)
-		virtio_dev_queue_release(hw->cvq->vq);
-
 	rte_free(eth_dev->data->mac_addrs);
 	eth_dev->data->mac_addrs = NULL;
 
@@ -1352,6 +1338,7 @@ virtio_dev_configure(struct rte_eth_dev *dev)
 {
 	const struct rte_eth_rxmode *rxmode = &dev->data->dev_conf.rxmode;
 	struct virtio_hw *hw = dev->data->dev_private;
+	int ret;
 
 	PMD_INIT_LOG(DEBUG, "configure");
 
@@ -1360,6 +1347,16 @@ virtio_dev_configure(struct rte_eth_dev *dev)
 		return -EINVAL;
 	}
 
+	/* Setup and start control queue */
+	if (vtpci_with_feature(hw, VIRTIO_NET_F_CTRL_VQ)) {
+		ret = virtio_dev_cq_queue_setup(dev,
+			hw->max_queue_pairs * 2,
+			SOCKET_ID_ANY);
+		if (ret < 0)
+			return ret;
+		virtio_dev_cq_start(dev);
+	}
+
 	hw->vlan_strip = rxmode->hw_vlan_strip;
 
 	if (rxmode->hw_vlan_filter
@@ -1553,8 +1550,10 @@ virtio_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 		dev_info->driver_name = dev->driver->pci_drv.driver.name;
 	else
 		dev_info->driver_name = "virtio_user PMD";
-	dev_info->max_rx_queues = (uint16_t)hw->max_rx_queues;
-	dev_info->max_tx_queues = (uint16_t)hw->max_tx_queues;
+	dev_info->max_rx_queues =
+		RTE_MIN(hw->max_queue_pairs, VIRTIO_MAX_RX_QUEUES);
+	dev_info->max_tx_queues =
+		RTE_MIN(hw->max_queue_pairs, VIRTIO_MAX_TX_QUEUES);
 	dev_info->min_rx_bufsize = VIRTIO_MIN_RX_BUFSIZE;
 	dev_info->max_rx_pktlen = VIRTIO_MAX_RX_PKTLEN;
 	dev_info->max_mac_addrs = VIRTIO_MAX_MAC_ADDRS;
diff --git a/drivers/net/virtio/virtio_ethdev.h b/drivers/net/virtio/virtio_ethdev.h
index 04d626b..dc18341 100644
--- a/drivers/net/virtio/virtio_ethdev.h
+++ b/drivers/net/virtio/virtio_ethdev.h
@@ -47,8 +47,8 @@
 #define PAGE_SIZE 4096
 #endif
 
-#define VIRTIO_MAX_RX_QUEUES 128
-#define VIRTIO_MAX_TX_QUEUES 128
+#define VIRTIO_MAX_RX_QUEUES 128U
+#define VIRTIO_MAX_TX_QUEUES 128U
 #define VIRTIO_MAX_MAC_ADDRS 64
 #define VIRTIO_MIN_RX_BUFSIZE 64
 #define VIRTIO_MAX_RX_PKTLEN  9728
diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h
index b8295a7..6930cd6 100644
--- a/drivers/net/virtio/virtio_pci.h
+++ b/drivers/net/virtio/virtio_pci.h
@@ -246,8 +246,7 @@ struct virtio_hw {
 	struct virtnet_ctl *cvq;
 	struct rte_pci_ioport io;
 	uint64_t    guest_features;
-	uint32_t    max_tx_queues;
-	uint32_t    max_rx_queues;
+	uint32_t    max_queue_pairs;
 	uint16_t    vtnet_hdr_size;
 	uint8_t	    vlan_strip;
 	uint8_t	    use_msix;
-- 
2.8.1

* [dpdk-dev] [PATCH v3 03/12] net/virtio: reinitialize the device in configure callback
  2016-10-13 14:15 ` [dpdk-dev] [PATCH v3 " Olivier Matz
  2016-10-13 14:16   ` [dpdk-dev] [PATCH v3 01/12] net/virtio: move device initialization in a function Olivier Matz
  2016-10-13 14:16   ` [dpdk-dev] [PATCH v3 02/12] net/virtio: setup and start cq in configure callback Olivier Matz
@ 2016-10-13 14:16   ` Olivier Matz
  2016-10-13 14:16   ` [dpdk-dev] [PATCH v3 04/12] net: add function to calculate a checksum in a mbuf Olivier Matz
                     ` (8 subsequent siblings)
  11 siblings, 0 replies; 97+ messages in thread
From: Olivier Matz @ 2016-10-13 14:16 UTC (permalink / raw)
  To: dev, yuanhan.liu
  Cc: konstantin.ananyev, sugesh.chandran, bruce.richardson,
	jianfeng.tan, helin.zhang, adrien.mazarguil, stephen, dprovan,
	xiao.w.wang, maxime.coquelin

Add the ability to reset the virtio device in the configure callback
if the features flag changed since previous reset. This will be possible
with the introduction of offload support in next commits.
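
As an illustration only, the "reinit only if the requested features
changed" logic can be sketched outside of the driver as below. The
struct and function names are hypothetical stand-ins (they are not the
virtio PMD symbols), and the negotiated set is assumed to be the
intersection of requested and host features.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver private data (not struct virtio_hw). */
struct dev_state {
	uint64_t host_features; /* what the host offers */
	uint64_t req_features;  /* what was requested at the last reset */
	uint64_t features;      /* negotiated result */
	unsigned int resets;    /* number of device resets, for illustration */
};

/* Reset the device and negotiate: keep only features both sides support. */
static void
init_device(struct dev_state *d, uint64_t req)
{
	d->resets++;
	d->features = req & d->host_features;
	d->req_features = req;
}

/* Configure callback: returns 1 if a reinit was needed, 0 otherwise. */
static int
configure(struct dev_state *d, uint64_t req)
{
	if (req == d->req_features)
		return 0; /* same request as last time, nothing to do */
	init_device(d, req);
	return 1;
}
```

Calling configure() twice with the same request only resets the device
once; the comparison is done on the requested set, not on the
negotiated one, so a host that refuses a feature does not cause a reset
on every configure.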

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 drivers/net/virtio/virtio_ethdev.c | 26 +++++++++++++++++++-------
 drivers/net/virtio/virtio_pci.h    |  1 +
 2 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
index f3921ac..b5bc0ee 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -1045,14 +1045,13 @@ virtio_vlan_filter_set(struct rte_eth_dev *dev, uint16_t vlan_id, int on)
 }
 
 static int
-virtio_negotiate_features(struct virtio_hw *hw)
+virtio_negotiate_features(struct virtio_hw *hw, uint64_t req_features)
 {
 	uint64_t host_features;
 
 	/* Prepare guest_features: feature that driver wants to support */
-	hw->guest_features = VIRTIO_PMD_GUEST_FEATURES;
 	PMD_INIT_LOG(DEBUG, "guest_features before negotiate = %" PRIx64,
-		hw->guest_features);
+		req_features);
 
 	/* Read device(host) feature bits */
 	host_features = hw->vtpci_ops->get_features(hw);
@@ -1063,6 +1062,7 @@ virtio_negotiate_features(struct virtio_hw *hw)
 	 * Negotiate features: Subset of device feature bits are written back
 	 * guest feature bits.
 	 */
+	hw->guest_features = req_features;
 	hw->guest_features = vtpci_negotiate_features(hw, host_features);
 	PMD_INIT_LOG(DEBUG, "features after negotiate = %" PRIx64,
 		hw->guest_features);
@@ -1081,6 +1081,8 @@ virtio_negotiate_features(struct virtio_hw *hw)
 		}
 	}
 
+	hw->req_guest_features = req_features;
+
 	return 0;
 }
 
@@ -1121,8 +1123,9 @@ rx_func_get(struct rte_eth_dev *eth_dev)
 		eth_dev->rx_pkt_burst = &virtio_recv_pkts;
 }
 
+/* reset device and renegotiate features if needed */
 static int
-virtio_init_device(struct rte_eth_dev *eth_dev)
+virtio_init_device(struct rte_eth_dev *eth_dev, uint64_t req_features)
 {
 	struct virtio_hw *hw = eth_dev->data->dev_private;
 	struct virtio_net_config *config;
@@ -1137,7 +1140,7 @@ virtio_init_device(struct rte_eth_dev *eth_dev)
 
 	/* Tell the host we've known how to drive the device. */
 	vtpci_set_status(hw, VIRTIO_CONFIG_STATUS_DRIVER);
-	if (virtio_negotiate_features(hw) < 0)
+	if (virtio_negotiate_features(hw, req_features) < 0)
 		return -1;
 
 	/* If host does not support status then disable LSC */
@@ -1258,8 +1261,8 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
 
 	eth_dev->data->dev_flags = dev_flags;
 
-	/* reset device and negotiate features */
-	ret = virtio_init_device(eth_dev);
+	/* reset device and negotiate default features */
+	ret = virtio_init_device(eth_dev, VIRTIO_PMD_GUEST_FEATURES);
 	if (ret < 0)
 		return ret;
 
@@ -1338,6 +1341,7 @@ virtio_dev_configure(struct rte_eth_dev *dev)
 {
 	const struct rte_eth_rxmode *rxmode = &dev->data->dev_conf.rxmode;
 	struct virtio_hw *hw = dev->data->dev_private;
+	uint64_t req_features;
 	int ret;
 
 	PMD_INIT_LOG(DEBUG, "configure");
@@ -1347,6 +1351,14 @@ virtio_dev_configure(struct rte_eth_dev *dev)
 		return -EINVAL;
 	}
 
+	req_features = VIRTIO_PMD_GUEST_FEATURES;
+	/* if request features changed, reinit the device */
+	if (req_features != hw->req_guest_features) {
+		ret = virtio_init_device(dev, req_features);
+		if (ret < 0)
+			return ret;
+	}
+
 	/* Setup and start control queue */
 	if (vtpci_with_feature(hw, VIRTIO_NET_F_CTRL_VQ)) {
 		ret = virtio_dev_cq_queue_setup(dev,
diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h
index 6930cd6..bbf06ec 100644
--- a/drivers/net/virtio/virtio_pci.h
+++ b/drivers/net/virtio/virtio_pci.h
@@ -245,6 +245,7 @@ struct virtio_net_config;
 struct virtio_hw {
 	struct virtnet_ctl *cvq;
 	struct rte_pci_ioport io;
+	uint64_t    req_guest_features;
 	uint64_t    guest_features;
 	uint32_t    max_queue_pairs;
 	uint16_t    vtnet_hdr_size;
-- 
2.8.1

* [dpdk-dev] [PATCH v3 04/12] net: add function to calculate a checksum in a mbuf
  2016-10-13 14:15 ` [dpdk-dev] [PATCH v3 " Olivier Matz
                     ` (2 preceding siblings ...)
  2016-10-13 14:16   ` [dpdk-dev] [PATCH v3 03/12] net/virtio: reinitialize the device " Olivier Matz
@ 2016-10-13 14:16   ` Olivier Matz
  2016-10-13 14:16   ` [dpdk-dev] [PATCH v3 05/12] mbuf: add new Rx checksum mbuf flags Olivier Matz
                     ` (7 subsequent siblings)
  11 siblings, 0 replies; 97+ messages in thread
From: Olivier Matz @ 2016-10-13 14:16 UTC (permalink / raw)
  To: dev, yuanhan.liu
  Cc: konstantin.ananyev, sugesh.chandran, bruce.richardson,
	jianfeng.tan, helin.zhang, adrien.mazarguil, stephen, dprovan,
	xiao.w.wang, maxime.coquelin

This function can be used to calculate the checksum of data embedded in
mbuf, that can be composed of several segments.

This function will be used by the virtio pmd in next commits to calculate
the checksum in software in case the protocol is not recognized.
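
For readers who want to see the multi-segment idea in isolation, here
is a minimal standalone sketch. It does not use the DPDK API: struct
seg, raw_sum(), fold() and chain_raw_cksum() are hypothetical names,
and words are summed in big-endian order for simplicity. The point it
shows is that a segment starting at an odd byte position contributes
its partial sum byte-swapped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one mbuf segment (not struct rte_mbuf). */
struct seg {
	const uint8_t *data;
	size_t len;
	const struct seg *next;
};

/* Sum 16-bit big-endian words without folding; odd tail is zero-padded. */
static uint32_t
raw_sum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (i < len)
		sum += (uint32_t)buf[i] << 8;
	return sum;
}

/* Fold the carries back in (ones' complement addition). */
static uint16_t
fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

static uint16_t
swap16(uint16_t v)
{
	return (uint16_t)((v << 8) | (v >> 8));
}

/* Raw (non complemented) checksum of len bytes starting at off.
 * Returns 0 and fills *cksum on success, -1 if the chain is too short. */
static int
chain_raw_cksum(const struct seg *s, size_t off, size_t len, uint16_t *cksum)
{
	uint32_t sum = 0;
	size_t done = 0;

	assert(cksum != NULL);

	/* skip the segments located entirely before the offset */
	while (s != NULL && off >= s->len) {
		off -= s->len;
		s = s->next;
	}

	while (done < len) {
		size_t n;
		uint16_t part;

		if (s == NULL)
			return -1;
		n = s->len - off;
		if (n > len - done)
			n = len - done;
		part = fold(raw_sum(s->data + off, n));
		/* this chunk starts at an odd position: bytes are in the
		 * opposite lanes, so swap its partial sum */
		if (done & 1)
			part = swap16(part);
		sum += part;
		done += n;
		off = 0;
		s = s->next;
	}

	*cksum = fold(sum);
	return 0;
}
```

With a buffer split into a 3-byte and a 2-byte segment, the result is
expected to match the checksum of the same 5 bytes stored contiguously.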

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 doc/guides/rel_notes/release_16_11.rst |  5 +++
 lib/librte_net/rte_ip.h                | 71 ++++++++++++++++++++++++++++++++++
 2 files changed, 76 insertions(+)

diff --git a/doc/guides/rel_notes/release_16_11.rst b/doc/guides/rel_notes/release_16_11.rst
index 51fc707..fbc0cbd 100644
--- a/doc/guides/rel_notes/release_16_11.rst
+++ b/doc/guides/rel_notes/release_16_11.rst
@@ -104,6 +104,11 @@ New Features
   The config option ``RTE_MACHINE`` can be used to pass code names to the compiler as ``-march`` flag.
 
 
+* **Added a function to calculate the checksum of data in a mbuf.**
+
+  Added a new function ``rte_raw_cksum_mbuf()`` to process the checksum of
+  data embedded in an mbuf chain.
+
 Resolved Issues
 ---------------
 
diff --git a/lib/librte_net/rte_ip.h b/lib/librte_net/rte_ip.h
index 5b7554a..4491b86 100644
--- a/lib/librte_net/rte_ip.h
+++ b/lib/librte_net/rte_ip.h
@@ -230,6 +230,77 @@ rte_raw_cksum(const void *buf, size_t len)
 }
 
 /**
+ * Compute the raw (non complemented) checksum of a packet.
+ *
+ * @param m
+ *   The pointer to the mbuf.
+ * @param off
+ *   The offset in bytes to start the checksum.
+ * @param len
+ *   The length in bytes of the data to checksum.
+ * @param cksum
+ *   A pointer to the checksum, filled on success.
+ * @return
+ *   0 on success, -1 on error (bad length or offset).
+ */
+static inline int
+rte_raw_cksum_mbuf(const struct rte_mbuf *m, uint32_t off, uint32_t len,
+	uint16_t *cksum)
+{
+	const struct rte_mbuf *seg;
+	const char *buf;
+	uint32_t sum, tmp;
+	uint32_t seglen, done;
+
+	/* easy case: all data in the first segment */
+	if (off + len <= rte_pktmbuf_data_len(m)) {
+		*cksum = rte_raw_cksum(rte_pktmbuf_mtod_offset(m,
+				const char *, off), len);
+		return 0;
+	}
+
+	if (unlikely(off + len > rte_pktmbuf_pkt_len(m)))
+		return -1; /* invalid params, return a dummy value */
+
+	/* else browse the segment to find offset */
+	seglen = 0;
+	for (seg = m; seg != NULL; seg = seg->next) {
+		seglen = rte_pktmbuf_data_len(seg);
+		if (off < seglen)
+			break;
+		off -= seglen;
+	}
+	seglen -= off;
+	buf = rte_pktmbuf_mtod_offset(seg, const char *, off);
+	if (seglen >= len) {
+		/* all in one segment */
+		*cksum = rte_raw_cksum(buf, len);
+		return 0;
+	}
+
+	/* hard case: process checksum of several segments */
+	sum = 0;
+	done = 0;
+	for (;;) {
+		tmp = __rte_raw_cksum(buf, seglen, 0);
+		if (done & 1)
+			tmp = rte_bswap16(tmp);
+		sum += tmp;
+		done += seglen;
+		if (done == len)
+			break;
+		seg = seg->next;
+		buf = rte_pktmbuf_mtod(seg, const char *);
+		seglen = rte_pktmbuf_data_len(seg);
+		if (seglen > len - done)
+			seglen = len - done;
+	}
+
+	*cksum = __rte_raw_cksum_reduce(sum);
+	return 0;
+}
+
+/**
  * Process the IPv4 checksum of an IPv4 header.
  *
  * The checksum field must be set to 0 by the caller.
-- 
2.8.1

* [dpdk-dev] [PATCH v3 05/12] mbuf: add new Rx checksum mbuf flags
  2016-10-13 14:15 ` [dpdk-dev] [PATCH v3 " Olivier Matz
                     ` (3 preceding siblings ...)
  2016-10-13 14:16   ` [dpdk-dev] [PATCH v3 04/12] net: add function to calculate a checksum in a mbuf Olivier Matz
@ 2016-10-13 14:16   ` Olivier Matz
  2016-10-13 14:16   ` [dpdk-dev] [PATCH v3 06/12] app/testpmd: adapt checksum stats in csum engine Olivier Matz
                     ` (6 subsequent siblings)
  11 siblings, 0 replies; 97+ messages in thread
From: Olivier Matz @ 2016-10-13 14:16 UTC (permalink / raw)
  To: dev, yuanhan.liu
  Cc: konstantin.ananyev, sugesh.chandran, bruce.richardson,
	jianfeng.tan, helin.zhang, adrien.mazarguil, stephen, dprovan,
	xiao.w.wang, maxime.coquelin

Following discussions in [1] and [2], introduce a new bit to
describe the Rx checksum status in mbuf.

Before this patch, only one flag was available:
  PKT_RX_L4_CKSUM_BAD: L4 cksum of RX pkt. is not OK.

And same for L3:
  PKT_RX_IP_CKSUM_BAD: IP cksum of RX pkt. is not OK.

This had 2 issues:
- it was not possible to differentiate "checksum good" from
  "checksum unknown".
- it was not possible for a virtual driver to say "the checksum
  in packet may be wrong, but data integrity is valid".

This patch tries to solve this issue by having 4 states (2 bits)
for the IP and L4 Rx checksums. New values are:

 - PKT_RX_L4_CKSUM_UNKNOWN: no information about the RX L4 checksum
   -> the application should verify the checksum by sw
 - PKT_RX_L4_CKSUM_BAD: the L4 checksum in the packet is wrong
   -> the application can drop the packet without additional check
 - PKT_RX_L4_CKSUM_GOOD: the L4 checksum in the packet is valid
   -> the application can accept the packet without verifying the
      checksum by sw
 - PKT_RX_L4_CKSUM_NONE: the L4 checksum is not correct in the packet
   data, but the integrity of the L4 data is verified.
   -> the application can process the packet but must not verify the
      checksum by sw. It has to take care to recalculate the cksum
      if the packet is transmitted (either by sw or using tx offload)

  And same for L3 (replace L4 by IP in description above).

This commit tries to be compatible with existing applications that
only check the existing flag (CKSUM_BAD).
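
To make the four states concrete, here is a small sketch of how an
application could dispatch on them. The bit values and names below
(CK_BAD, CK_GOOD, ...) are illustrative only and are not the ones
defined in rte_mbuf.h; the sketch only assumes a 2-bit field where
"unknown" is 0.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative 2-bit encoding; the real flag values live in rte_mbuf.h. */
#define CK_BAD     (1u << 0)
#define CK_GOOD    (1u << 1)
#define CK_UNKNOWN 0u
#define CK_NONE    (CK_BAD | CK_GOOD)
#define CK_MASK    (CK_BAD | CK_GOOD)

/* What the application should do with a received packet. */
enum rx_action {
	RX_VERIFY_SW,        /* no information: check the checksum by sw */
	RX_DROP,             /* checksum is wrong */
	RX_ACCEPT,           /* checksum is valid */
	RX_ACCEPT_FIX_ON_TX, /* data is valid, checksum field is not:
	                      * do not verify, recalculate if transmitted */
};

static enum rx_action
l4_rx_action(uint32_t ol_flags)
{
	/* compare the whole 2-bit field, not individual bits */
	switch (ol_flags & CK_MASK) {
	case CK_BAD:
		return RX_DROP;
	case CK_GOOD:
		return RX_ACCEPT;
	case CK_NONE:
		return RX_ACCEPT_FIX_ON_TX;
	default:
		return RX_VERIFY_SW;
	}
}
```

The same dispatch applies to the IP checksum field with its own mask.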

[1] http://dpdk.org/ml/archives/dev/2016-May/039920.html
[2] http://dpdk.org/ml/archives/dev/2016-June/040007.html

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 doc/guides/rel_notes/release_16_11.rst |  6 ++++
 lib/librte_mbuf/rte_mbuf.c             | 16 +++++++++--
 lib/librte_mbuf/rte_mbuf.h             | 51 ++++++++++++++++++++++++++++++++--
 3 files changed, 68 insertions(+), 5 deletions(-)

diff --git a/doc/guides/rel_notes/release_16_11.rst b/doc/guides/rel_notes/release_16_11.rst
index fbc0cbd..2ec63b2 100644
--- a/doc/guides/rel_notes/release_16_11.rst
+++ b/doc/guides/rel_notes/release_16_11.rst
@@ -109,6 +109,12 @@ New Features
   Added a new function ``rte_raw_cksum_mbuf()`` to process the checksum of
   data embedded in an mbuf chain.
 
+* **Added new Rx checksum mbuf flags.**
+
+  Added new Rx checksum flags in mbufs to describe more states: unknown,
+  good, bad, or not present (useful for virtual drivers). This modification
+  was done for IP and L4.
+
 Resolved Issues
 ---------------
 
diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index 4e1fdd1..8d9b875 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -309,7 +309,11 @@ const char *rte_get_rx_ol_flag_name(uint64_t mask)
 	case PKT_RX_RSS_HASH: return "PKT_RX_RSS_HASH";
 	case PKT_RX_FDIR: return "PKT_RX_FDIR";
 	case PKT_RX_L4_CKSUM_BAD: return "PKT_RX_L4_CKSUM_BAD";
+	case PKT_RX_L4_CKSUM_GOOD: return "PKT_RX_L4_CKSUM_GOOD";
+	case PKT_RX_L4_CKSUM_NONE: return "PKT_RX_L4_CKSUM_NONE";
 	case PKT_RX_IP_CKSUM_BAD: return "PKT_RX_IP_CKSUM_BAD";
+	case PKT_RX_IP_CKSUM_GOOD: return "PKT_RX_IP_CKSUM_GOOD";
+	case PKT_RX_IP_CKSUM_NONE: return "PKT_RX_IP_CKSUM_NONE";
 	case PKT_RX_EIP_CKSUM_BAD: return "PKT_RX_EIP_CKSUM_BAD";
 	case PKT_RX_VLAN_STRIPPED: return "PKT_RX_VLAN_STRIPPED";
 	case PKT_RX_IEEE1588_PTP: return "PKT_RX_IEEE1588_PTP";
@@ -333,8 +337,16 @@ rte_get_rx_ol_flag_list(uint64_t mask, char *buf, size_t buflen)
 		{ PKT_RX_VLAN_PKT, PKT_RX_VLAN_PKT, NULL },
 		{ PKT_RX_RSS_HASH, PKT_RX_RSS_HASH, NULL },
 		{ PKT_RX_FDIR, PKT_RX_FDIR, NULL },
-		{ PKT_RX_L4_CKSUM_BAD, PKT_RX_L4_CKSUM_BAD, NULL },
-		{ PKT_RX_IP_CKSUM_BAD, PKT_RX_IP_CKSUM_BAD, NULL },
+		{ PKT_RX_L4_CKSUM_BAD, PKT_RX_L4_CKSUM_MASK, NULL },
+		{ PKT_RX_L4_CKSUM_GOOD, PKT_RX_L4_CKSUM_MASK, NULL },
+		{ PKT_RX_L4_CKSUM_NONE, PKT_RX_L4_CKSUM_MASK, NULL },
+		{ PKT_RX_L4_CKSUM_UNKNOWN, PKT_RX_L4_CKSUM_MASK,
+		  "PKT_RX_L4_CKSUM_UNKNOWN" },
+		{ PKT_RX_IP_CKSUM_BAD, PKT_RX_IP_CKSUM_MASK, NULL },
+		{ PKT_RX_IP_CKSUM_GOOD, PKT_RX_IP_CKSUM_MASK, NULL },
+		{ PKT_RX_IP_CKSUM_NONE, PKT_RX_IP_CKSUM_MASK, NULL },
+		{ PKT_RX_IP_CKSUM_UNKNOWN, PKT_RX_IP_CKSUM_MASK,
+		  "PKT_RX_IP_CKSUM_UNKNOWN" },
 		{ PKT_RX_EIP_CKSUM_BAD, PKT_RX_EIP_CKSUM_BAD, NULL },
 		{ PKT_RX_VLAN_STRIPPED, PKT_RX_VLAN_STRIPPED, NULL },
 		{ PKT_RX_IEEE1588_PTP, PKT_RX_IEEE1588_PTP, NULL },
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 7541070..38022a3 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -91,8 +91,25 @@ extern "C" {
 
 #define PKT_RX_RSS_HASH      (1ULL << 1)  /**< RX packet with RSS hash result. */
 #define PKT_RX_FDIR          (1ULL << 2)  /**< RX packet with FDIR match indicate. */
-#define PKT_RX_L4_CKSUM_BAD  (1ULL << 3)  /**< L4 cksum of RX pkt. is not OK. */
-#define PKT_RX_IP_CKSUM_BAD  (1ULL << 4)  /**< IP cksum of RX pkt. is not OK. */
+
+/**
+ * Deprecated.
+ * Checking this flag alone is deprecated: check the 2 bits of
+ * PKT_RX_L4_CKSUM_MASK.
+ * This flag was set when the L4 checksum of a packet was detected as
+ * wrong by the hardware.
+ */
+#define PKT_RX_L4_CKSUM_BAD  (1ULL << 3)
+
+/**
+ * Deprecated.
+ * Checking this flag alone is deprecated: check the 2 bits of
+ * PKT_RX_IP_CKSUM_MASK.
+ * This flag was set when the IP checksum of a packet was detected as
+ * wrong by the hardware.
+ */
+#define PKT_RX_IP_CKSUM_BAD  (1ULL << 4)
+
 #define PKT_RX_EIP_CKSUM_BAD (1ULL << 5)  /**< External IP header checksum error. */
 
 /**
@@ -102,7 +119,35 @@ extern "C" {
  */
 #define PKT_RX_VLAN_STRIPPED (1ULL << 6)
 
-/* hole, some bits can be reused here  */
+/**
+ * Mask of bits used to determine the status of RX IP checksum.
+ * - PKT_RX_IP_CKSUM_UNKNOWN: no information about the RX IP checksum
+ * - PKT_RX_IP_CKSUM_BAD: the IP checksum in the packet is wrong
+ * - PKT_RX_IP_CKSUM_GOOD: the IP checksum in the packet is valid
+ * - PKT_RX_IP_CKSUM_NONE: the IP checksum is not correct in the packet
+ *   data, but the integrity of the IP header is verified.
+ */
+#define PKT_RX_IP_CKSUM_MASK ((1ULL << 4) | (1ULL << 7))
+
+#define PKT_RX_IP_CKSUM_UNKNOWN 0
+#define PKT_RX_IP_CKSUM_BAD     (1ULL << 4)
+#define PKT_RX_IP_CKSUM_GOOD    (1ULL << 7)
+#define PKT_RX_IP_CKSUM_NONE    ((1ULL << 4) | (1ULL << 7))
+
+/**
+ * Mask of bits used to determine the status of RX L4 checksum.
+ * - PKT_RX_L4_CKSUM_UNKNOWN: no information about the RX L4 checksum
+ * - PKT_RX_L4_CKSUM_BAD: the L4 checksum in the packet is wrong
+ * - PKT_RX_L4_CKSUM_GOOD: the L4 checksum in the packet is valid
+ * - PKT_RX_L4_CKSUM_NONE: the L4 checksum is not correct in the packet
+ *   data, but the integrity of the L4 data is verified.
+ */
+#define PKT_RX_L4_CKSUM_MASK ((1ULL << 3) | (1ULL << 8))
+
+#define PKT_RX_L4_CKSUM_UNKNOWN 0
+#define PKT_RX_L4_CKSUM_BAD     (1ULL << 3)
+#define PKT_RX_L4_CKSUM_GOOD    (1ULL << 8)
+#define PKT_RX_L4_CKSUM_NONE    ((1ULL << 3) | (1ULL << 8))
 
 #define PKT_RX_IEEE1588_PTP  (1ULL << 9)  /**< RX IEEE1588 L2 Ethernet PT Packet. */
 #define PKT_RX_IEEE1588_TMST (1ULL << 10) /**< RX IEEE1588 L2/L4 timestamped packet.*/
-- 
2.8.1

* [dpdk-dev] [PATCH v3 06/12] app/testpmd: adapt checksum stats in csum engine
  2016-10-13 14:15 ` [dpdk-dev] [PATCH v3 " Olivier Matz
                     ` (4 preceding siblings ...)
  2016-10-13 14:16   ` [dpdk-dev] [PATCH v3 05/12] mbuf: add new Rx checksum mbuf flags Olivier Matz
@ 2016-10-13 14:16   ` Olivier Matz
  2016-10-13 14:16   ` [dpdk-dev] [PATCH v3 07/12] mbuf: new flag for LRO Olivier Matz
                     ` (5 subsequent siblings)
  11 siblings, 0 replies; 97+ messages in thread
From: Olivier Matz @ 2016-10-13 14:16 UTC (permalink / raw)
  To: dev, yuanhan.liu
  Cc: konstantin.ananyev, sugesh.chandran, bruce.richardson,
	jianfeng.tan, helin.zhang, adrien.mazarguil, stephen, dprovan,
	xiao.w.wang, maxime.coquelin

Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 app/test-pmd/csumonly.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 27d0f08..da15185 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -697,8 +697,10 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 		rx_ol_flags = m->ol_flags;
 
 		/* Update the L3/L4 checksum error packet statistics */
-		rx_bad_ip_csum += ((rx_ol_flags & PKT_RX_IP_CKSUM_BAD) != 0);
-		rx_bad_l4_csum += ((rx_ol_flags & PKT_RX_L4_CKSUM_BAD) != 0);
+		if ((rx_ol_flags & PKT_RX_IP_CKSUM_MASK) == PKT_RX_IP_CKSUM_BAD)
+			rx_bad_ip_csum += 1;
+		if ((rx_ol_flags & PKT_RX_L4_CKSUM_MASK) == PKT_RX_L4_CKSUM_BAD)
+			rx_bad_l4_csum += 1;
 
 		/* step 1: dissect packet, parsing optional vlan, ip4/ip6, vxlan
 		 * and inner headers */
-- 
2.8.1

* [dpdk-dev] [PATCH v3 07/12] mbuf: new flag for LRO
  2016-10-13 14:15 ` [dpdk-dev] [PATCH v3 " Olivier Matz
                     ` (5 preceding siblings ...)
  2016-10-13 14:16   ` [dpdk-dev] [PATCH v3 06/12] app/testpmd: adapt checksum stats in csum engine Olivier Matz
@ 2016-10-13 14:16   ` Olivier Matz
  2016-10-13 14:16   ` [dpdk-dev] [PATCH v3 08/12] app/testpmd: display lro segment size Olivier Matz
                     ` (4 subsequent siblings)
  11 siblings, 0 replies; 97+ messages in thread
From: Olivier Matz @ 2016-10-13 14:16 UTC (permalink / raw)
  To: dev, yuanhan.liu
  Cc: konstantin.ananyev, sugesh.chandran, bruce.richardson,
	jianfeng.tan, helin.zhang, adrien.mazarguil, stephen, dprovan,
	xiao.w.wang, maxime.coquelin

When receiving coalesced packets in virtio, the original size of the
segments is provided. This is useful information because it allows
resegmenting with the same size.

Add a new RX flag in mbuf, which can be set when packets are coalesced
by a hardware or virtual driver. It indicates that the m->tso_segsz
field is valid and is set to the segment size of the original packets.

This flag is used by the virtio PMD in subsequent commits.
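
A minimal sketch of how a receiver might consume the flag, assuming a
reduced stand-in for struct rte_mbuf. Only the PKT_RX_LRO value and the
ol_flags/tso_segsz field names come from this patch; struct mbuf_view
and resegment_size() are hypothetical names used for illustration.

```c
#include <stdint.h>

/* Value copied from the rte_mbuf.h hunk in this patch. */
#define PKT_RX_LRO (1ULL << 16)

/* Stand-in holding only the two mbuf fields used here; the real
 * struct rte_mbuf is much larger. */
struct mbuf_view {
	uint64_t ol_flags;
	uint16_t tso_segsz;
};

/* Hypothetical helper: choose the segment size for re-segmentation.
 * tso_segsz is only meaningful when PKT_RX_LRO is set, otherwise
 * fall back to a size chosen by the caller. */
static uint16_t
resegment_size(const struct mbuf_view *m, uint16_t fallback_mss)
{
	if ((m->ol_flags & PKT_RX_LRO) && m->tso_segsz != 0)
		return m->tso_segsz;
	return fallback_mss;
}
```

The fallback is needed because, without the flag, tso_segsz on an
RX mbuf carries no defined value.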

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 doc/guides/rel_notes/release_16_11.rst | 5 +++++
 lib/librte_mbuf/rte_mbuf.c             | 2 ++
 lib/librte_mbuf/rte_mbuf.h             | 7 +++++++
 3 files changed, 14 insertions(+)

diff --git a/doc/guides/rel_notes/release_16_11.rst b/doc/guides/rel_notes/release_16_11.rst
index 2ec63b2..c9fcfb9 100644
--- a/doc/guides/rel_notes/release_16_11.rst
+++ b/doc/guides/rel_notes/release_16_11.rst
@@ -115,6 +115,11 @@ New Features
   good, bad, or not present (useful for virtual drivers). This modification
   was done for IP and L4.
 
+* **Added a LRO mbuf flag.**
+
+  Added a new RX LRO mbuf flag, used when packets are coalesced. This
+  flag indicates that the segment size of original packets is known.
+
 Resolved Issues
 ---------------
 
diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index 8d9b875..63f43c8 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -319,6 +319,7 @@ const char *rte_get_rx_ol_flag_name(uint64_t mask)
 	case PKT_RX_IEEE1588_PTP: return "PKT_RX_IEEE1588_PTP";
 	case PKT_RX_IEEE1588_TMST: return "PKT_RX_IEEE1588_TMST";
 	case PKT_RX_QINQ_STRIPPED: return "PKT_RX_QINQ_STRIPPED";
+	case PKT_RX_LRO: return "PKT_RX_LRO";
 	default: return NULL;
 	}
 }
@@ -352,6 +353,7 @@ rte_get_rx_ol_flag_list(uint64_t mask, char *buf, size_t buflen)
 		{ PKT_RX_IEEE1588_PTP, PKT_RX_IEEE1588_PTP, NULL },
 		{ PKT_RX_IEEE1588_TMST, PKT_RX_IEEE1588_TMST, NULL },
 		{ PKT_RX_QINQ_STRIPPED, PKT_RX_QINQ_STRIPPED, NULL },
+		{ PKT_RX_LRO, PKT_RX_LRO, NULL },
 	};
 	const char *name;
 	unsigned int i;
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 38022a3..f5eedda 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -170,6 +170,13 @@ extern "C" {
  */
 #define PKT_RX_QINQ_PKT      PKT_RX_QINQ_STRIPPED
 
+/**
+ * When packets are coalesced by a hardware or virtual driver, this flag
+ * can be set in the RX mbuf, meaning that the m->tso_segsz field is
+ * valid and is set to the segment size of original packets.
+ */
+#define PKT_RX_LRO           (1ULL << 16)
+
 /* add new RX flags here */
 
 /* add new TX flags here */
-- 
2.8.1

* [dpdk-dev] [PATCH v3 08/12] app/testpmd: display lro segment size
  2016-10-13 14:15 ` [dpdk-dev] [PATCH v3 " Olivier Matz
                     ` (6 preceding siblings ...)
  2016-10-13 14:16   ` [dpdk-dev] [PATCH v3 07/12] mbuf: new flag for LRO Olivier Matz
@ 2016-10-13 14:16   ` Olivier Matz
  2016-10-13 14:16   ` [dpdk-dev] [PATCH v3 09/12] net/virtio: add Rx checksum offload support Olivier Matz
                     ` (3 subsequent siblings)
  11 siblings, 0 replies; 97+ messages in thread
From: Olivier Matz @ 2016-10-13 14:16 UTC (permalink / raw)
  To: dev, yuanhan.liu
  Cc: konstantin.ananyev, sugesh.chandran, bruce.richardson,
	jianfeng.tan, helin.zhang, adrien.mazarguil, stephen, dprovan,
	xiao.w.wang, maxime.coquelin

In the csumonly engine, display the LRO segment size if the
LRO flag is set.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 app/test-pmd/csumonly.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index da15185..57e6ae2 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -822,6 +822,8 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 				"l4_proto=%d l4_len=%d flags=%s\n",
 				info.l2_len, rte_be_to_cpu_16(info.ethertype),
 				info.l3_len, info.l4_proto, info.l4_len, buf);
+			if (rx_ol_flags & PKT_RX_LRO)
+				printf("rx: m->lro_segsz=%u\n", m->tso_segsz);
 			if (info.is_tunnel == 1)
 				printf("rx: outer_l2_len=%d outer_ethertype=%x "
 					"outer_l3_len=%d\n", info.outer_l2_len,
-- 
2.8.1

* [dpdk-dev] [PATCH v3 09/12] net/virtio: add Rx checksum offload support
  2016-10-13 14:15 ` [dpdk-dev] [PATCH v3 " Olivier Matz
                     ` (7 preceding siblings ...)
  2016-10-13 14:16   ` [dpdk-dev] [PATCH v3 08/12] app/testpmd: display lro segment size Olivier Matz
@ 2016-10-13 14:16   ` Olivier Matz
  2016-10-13 14:16   ` [dpdk-dev] [PATCH v3 10/12] net/virtio: add Tx " Olivier Matz
                     ` (2 subsequent siblings)
  11 siblings, 0 replies; 97+ messages in thread
From: Olivier Matz @ 2016-10-13 14:16 UTC (permalink / raw)
  To: dev, yuanhan.liu
  Cc: konstantin.ananyev, sugesh.chandran, bruce.richardson,
	jianfeng.tan, helin.zhang, adrien.mazarguil, stephen, dprovan,
	xiao.w.wang, maxime.coquelin

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 drivers/net/virtio/virtio_ethdev.c | 21 ++++++----
 drivers/net/virtio/virtio_ethdev.h |  2 +-
 drivers/net/virtio/virtio_rxtx.c   | 79 ++++++++++++++++++++++++++++++++++++++
 drivers/net/virtio/virtqueue.h     |  1 +
 4 files changed, 95 insertions(+), 8 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
index b5bc0ee..00b4c38 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -1262,7 +1262,7 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
 	eth_dev->data->dev_flags = dev_flags;
 
 	/* reset device and negotiate default features */
-	ret = virtio_init_device(eth_dev, VIRTIO_PMD_GUEST_FEATURES);
+	ret = virtio_init_device(eth_dev, VIRTIO_PMD_DEFAULT_GUEST_FEATURES);
 	if (ret < 0)
 		return ret;
 
@@ -1345,13 +1345,10 @@ virtio_dev_configure(struct rte_eth_dev *dev)
 	int ret;
 
 	PMD_INIT_LOG(DEBUG, "configure");
+	req_features = VIRTIO_PMD_DEFAULT_GUEST_FEATURES;
+	if (rxmode->hw_ip_checksum)
+		req_features |= (1ULL << VIRTIO_NET_F_GUEST_CSUM);
 
-	if (rxmode->hw_ip_checksum) {
-		PMD_DRV_LOG(ERR, "HW IP checksum not supported");
-		return -EINVAL;
-	}
-
-	req_features = VIRTIO_PMD_GUEST_FEATURES;
 	/* if request features changed, reinit the device */
 	if (req_features != hw->req_guest_features) {
 		ret = virtio_init_device(dev, req_features);
@@ -1359,6 +1356,13 @@ virtio_dev_configure(struct rte_eth_dev *dev)
 			return ret;
 	}
 
+	if (rxmode->hw_ip_checksum &&
+		!vtpci_with_feature(hw, VIRTIO_NET_F_GUEST_CSUM)) {
+		PMD_DRV_LOG(NOTICE,
+			"rx ip checksum not available on this host");
+		return -ENOTSUP;
+	}
+
 	/* Setup and start control queue */
 	if (vtpci_with_feature(hw, VIRTIO_NET_F_CTRL_VQ)) {
 		ret = virtio_dev_cq_queue_setup(dev,
@@ -1572,6 +1576,9 @@ virtio_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	dev_info->default_txconf = (struct rte_eth_txconf) {
 		.txq_flags = ETH_TXQ_FLAGS_NOOFFLOADS
 	};
+	dev_info->rx_offload_capa =
+		DEV_RX_OFFLOAD_TCP_CKSUM |
+		DEV_RX_OFFLOAD_UDP_CKSUM;
 }
 
 /*
diff --git a/drivers/net/virtio/virtio_ethdev.h b/drivers/net/virtio/virtio_ethdev.h
index dc18341..fd29a7f 100644
--- a/drivers/net/virtio/virtio_ethdev.h
+++ b/drivers/net/virtio/virtio_ethdev.h
@@ -54,7 +54,7 @@
 #define VIRTIO_MAX_RX_PKTLEN  9728
 
 /* Features desired/implemented by this driver. */
-#define VIRTIO_PMD_GUEST_FEATURES		\
+#define VIRTIO_PMD_DEFAULT_GUEST_FEATURES	\
 	(1u << VIRTIO_NET_F_MAC		  |	\
 	 1u << VIRTIO_NET_F_STATUS	  |	\
 	 1u << VIRTIO_NET_F_MQ		  |	\
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 9ab441b..fc0d84b 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -51,6 +51,8 @@
 #include <rte_errno.h>
 #include <rte_byteorder.h>
 #include <rte_cpuflags.h>
+#include <rte_net.h>
+#include <rte_ip.h>
 
 #include "virtio_logs.h"
 #include "virtio_ethdev.h"
@@ -632,6 +634,63 @@ virtio_update_packet_stats(struct virtnet_stats *stats, struct rte_mbuf *mbuf)
 	}
 }
 
+/* Optionally fill offload information in structure */
+static int
+virtio_rx_offload(struct rte_mbuf *m, struct virtio_net_hdr *hdr)
+{
+	struct rte_net_hdr_lens hdr_lens;
+	uint32_t hdrlen, ptype;
+	int l4_supported = 0;
+
+	/* nothing to do */
+	if (hdr->flags == 0 && hdr->gso_type == VIRTIO_NET_HDR_GSO_NONE)
+		return 0;
+
+	m->ol_flags |= PKT_RX_IP_CKSUM_UNKNOWN;
+
+	ptype = rte_net_get_ptype(m, &hdr_lens, RTE_PTYPE_ALL_MASK);
+	m->packet_type = ptype;
+	if ((ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_TCP ||
+	    (ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_UDP ||
+	    (ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_SCTP)
+		l4_supported = 1;
+
+	if (hdr->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) {
+		hdrlen = hdr_lens.l2_len + hdr_lens.l3_len + hdr_lens.l4_len;
+		if (hdr->csum_start <= hdrlen && l4_supported) {
+			m->ol_flags |= PKT_RX_L4_CKSUM_NONE;
+		} else {
+			/* Unknown proto or tunnel, do sw cksum. We can assume
+			 * the cksum field is in the first segment since the
+			 * buffers we provided to the host are large enough.
+			 * In case of SCTP, this will be wrong since it's a CRC
+			 * but there's nothing we can do.
+			 */
+			uint16_t csum, off;
+
+			rte_raw_cksum_mbuf(m, hdr->csum_start,
+				rte_pktmbuf_pkt_len(m) - hdr->csum_start,
+				&csum);
+			if (likely(csum != 0xffff))
+				csum = ~csum;
+			off = hdr->csum_offset + hdr->csum_start;
+			if (rte_pktmbuf_data_len(m) >= off + 1)
+				*rte_pktmbuf_mtod_offset(m, uint16_t *,
+					off) = csum;
+		}
+	} else if (hdr->flags & VIRTIO_NET_HDR_F_DATA_VALID && l4_supported) {
+		m->ol_flags |= PKT_RX_L4_CKSUM_GOOD;
+	}
+
+	return 0;
+}
+
+static inline int
+rx_offload_enabled(struct virtio_hw *hw)
+{
+	return vtpci_with_feature(hw, VIRTIO_NET_F_GUEST_CSUM);
+}
+
 #define VIRTIO_MBUF_BURST_SZ 64
 #define DESC_PER_CACHELINE (RTE_CACHE_LINE_SIZE / sizeof(struct vring_desc))
 uint16_t
@@ -647,6 +706,8 @@ virtio_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 	int error;
 	uint32_t i, nb_enqueued;
 	uint32_t hdr_size;
+	int offload;
+	struct virtio_net_hdr *hdr;
 
 	nb_used = VIRTQUEUE_NUSED(vq);
 
@@ -664,6 +725,7 @@ virtio_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 	nb_rx = 0;
 	nb_enqueued = 0;
 	hdr_size = hw->vtnet_hdr_size;
+	offload = rx_offload_enabled(hw);
 
 	for (i = 0; i < num ; i++) {
 		rxm = rcv_pkts[i];
@@ -688,9 +750,18 @@ virtio_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 		rxm->pkt_len = (uint32_t)(len[i] - hdr_size);
 		rxm->data_len = (uint16_t)(len[i] - hdr_size);
 
+		hdr = (struct virtio_net_hdr *)((char *)rxm->buf_addr +
+			RTE_PKTMBUF_HEADROOM - hdr_size);
+
 		if (hw->vlan_strip)
 			rte_vlan_strip(rxm);
 
+		if (offload && virtio_rx_offload(rxm, hdr) < 0) {
+			virtio_discard_rxbuf(vq, rxm);
+			rxvq->stats.errors++;
+			continue;
+		}
+
 		VIRTIO_DUMP_PACKET(rxm, rxm->data_len);
 
 		rx_pkts[nb_rx++] = rxm;
@@ -750,6 +821,7 @@ virtio_recv_mergeable_pkts(void *rx_queue,
 	uint16_t extra_idx;
 	uint32_t seg_res;
 	uint32_t hdr_size;
+	int offload;
 
 	nb_used = VIRTQUEUE_NUSED(vq);
 
@@ -765,6 +837,7 @@ virtio_recv_mergeable_pkts(void *rx_queue,
 	extra_idx = 0;
 	seg_res = 0;
 	hdr_size = hw->vtnet_hdr_size;
+	offload = rx_offload_enabled(hw);
 
 	while (i < nb_used) {
 		struct virtio_net_hdr_mrg_rxbuf *header;
@@ -810,6 +883,12 @@ virtio_recv_mergeable_pkts(void *rx_queue,
 		rx_pkts[nb_rx] = rxm;
 		prev = rxm;
 
+		if (offload && virtio_rx_offload(rxm, &header->hdr) < 0) {
+			virtio_discard_rxbuf(vq, rxm);
+			rxvq->stats.errors++;
+			continue;
+		}
+
 		seg_res = seg_num - 1;
 
 		while (seg_res != 0) {
diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h
index 6737b81..ef0027b 100644
--- a/drivers/net/virtio/virtqueue.h
+++ b/drivers/net/virtio/virtqueue.h
@@ -223,6 +223,7 @@ struct virtqueue {
  */
 struct virtio_net_hdr {
 #define VIRTIO_NET_HDR_F_NEEDS_CSUM 1    /**< Use csum_start,csum_offset*/
+#define VIRTIO_NET_HDR_F_DATA_VALID 2    /**< Checksum is valid */
 	uint8_t flags;
 #define VIRTIO_NET_HDR_GSO_NONE     0    /**< Not a GSO frame */
 #define VIRTIO_NET_HDR_GSO_TCPV4    1    /**< GSO frame, IPv4 TCP (TSO) */
-- 
2.8.1

* [dpdk-dev] [PATCH v3 10/12] net/virtio: add Tx checksum offload support
  2016-10-13 14:15 ` [dpdk-dev] [PATCH v3 " Olivier Matz
                     ` (8 preceding siblings ...)
  2016-10-13 14:16   ` [dpdk-dev] [PATCH v3 09/12] net/virtio: add Rx checksum offload support Olivier Matz
@ 2016-10-13 14:16   ` Olivier Matz
  2016-10-13 14:16   ` [dpdk-dev] [PATCH v3 11/12] net/virtio: add Lro support Olivier Matz
  2016-10-13 14:16   ` [dpdk-dev] [PATCH v3 12/12] net/virtio: add Tso support Olivier Matz
  11 siblings, 0 replies; 97+ messages in thread
From: Olivier Matz @ 2016-10-13 14:16 UTC (permalink / raw)
  To: dev, yuanhan.liu
  Cc: konstantin.ananyev, sugesh.chandran, bruce.richardson,
	jianfeng.tan, helin.zhang, adrien.mazarguil, stephen, dprovan,
	xiao.w.wang, maxime.coquelin

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 drivers/net/virtio/virtio_ethdev.c |  7 ++++
 drivers/net/virtio/virtio_ethdev.h |  1 +
 drivers/net/virtio/virtio_rxtx.c   | 73 +++++++++++++++++++++++++++-----------
 3 files changed, 61 insertions(+), 20 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
index 00b4c38..c3c53be 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -1579,6 +1579,13 @@ virtio_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	dev_info->rx_offload_capa =
 		DEV_RX_OFFLOAD_TCP_CKSUM |
 		DEV_RX_OFFLOAD_UDP_CKSUM;
+	dev_info->tx_offload_capa = 0;
+
+	if (hw->guest_features & (1ULL << VIRTIO_NET_F_CSUM)) {
+		dev_info->tx_offload_capa |=
+			DEV_TX_OFFLOAD_UDP_CKSUM |
+			DEV_TX_OFFLOAD_TCP_CKSUM;
+	}
 }
 
 /*
diff --git a/drivers/net/virtio/virtio_ethdev.h b/drivers/net/virtio/virtio_ethdev.h
index fd29a7f..adca6ba 100644
--- a/drivers/net/virtio/virtio_ethdev.h
+++ b/drivers/net/virtio/virtio_ethdev.h
@@ -62,6 +62,7 @@
 	 1u << VIRTIO_NET_F_CTRL_VQ	  |	\
 	 1u << VIRTIO_NET_F_CTRL_RX	  |	\
 	 1u << VIRTIO_NET_F_CTRL_VLAN	  |	\
+	 1u << VIRTIO_NET_F_CSUM	  |	\
 	 1u << VIRTIO_NET_F_MRG_RXBUF	  |	\
 	 1u << VIRTIO_RING_F_INDIRECT_DESC |    \
 	 1ULL << VIRTIO_F_VERSION_1)
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index fc0d84b..675dc43 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -53,6 +53,8 @@
 #include <rte_cpuflags.h>
 #include <rte_net.h>
 #include <rte_ip.h>
+#include <rte_udp.h>
+#include <rte_tcp.h>
 
 #include "virtio_logs.h"
 #include "virtio_ethdev.h"
@@ -207,18 +209,27 @@ virtqueue_enqueue_recv_refill(struct virtqueue *vq, struct rte_mbuf *cookie)
 	return 0;
 }
 
+static inline int
+tx_offload_enabled(struct virtio_hw *hw)
+{
+	return vtpci_with_feature(hw, VIRTIO_NET_F_CSUM);
+}
+
 static inline void
 virtqueue_enqueue_xmit(struct virtnet_tx *txvq, struct rte_mbuf *cookie,
 		       uint16_t needed, int use_indirect, int can_push)
 {
+	struct virtio_tx_region *txr = txvq->virtio_net_hdr_mz->addr;
 	struct vq_desc_extra *dxp;
 	struct virtqueue *vq = txvq->vq;
 	struct vring_desc *start_dp;
 	uint16_t seg_num = cookie->nb_segs;
 	uint16_t head_idx, idx;
 	uint16_t head_size = vq->hw->vtnet_hdr_size;
-	unsigned long offs;
+	struct virtio_net_hdr *hdr;
+	int offload;
 
+	offload = tx_offload_enabled(vq->hw);
 	head_idx = vq->vq_desc_head_idx;
 	idx = head_idx;
 	dxp = &vq->vq_descx[idx];
@@ -228,10 +239,12 @@ virtqueue_enqueue_xmit(struct virtnet_tx *txvq, struct rte_mbuf *cookie,
 	start_dp = vq->vq_ring.desc;
 
 	if (can_push) {
-		/* put on zero'd transmit header (no offloads) */
-		void *hdr = rte_pktmbuf_prepend(cookie, head_size);
-
-		memset(hdr, 0, head_size);
+		/* prepend cannot fail, checked by caller */
+		hdr = (struct virtio_net_hdr *)
+			rte_pktmbuf_prepend(cookie, head_size);
+		/* if offload disabled, it is not zeroed below, do it now */
+		if (offload == 0)
+			memset(hdr, 0, head_size);
 	} else if (use_indirect) {
 		/* setup tx ring slot to point to indirect
 		 * descriptor list stored in reserved region.
@@ -239,14 +252,11 @@ virtqueue_enqueue_xmit(struct virtnet_tx *txvq, struct rte_mbuf *cookie,
 		 * the first slot in indirect ring is already preset
 		 * to point to the header in reserved region
 		 */
-		struct virtio_tx_region *txr = txvq->virtio_net_hdr_mz->addr;
-
-		offs = idx * sizeof(struct virtio_tx_region)
-			+ offsetof(struct virtio_tx_region, tx_indir);
-
-		start_dp[idx].addr  = txvq->virtio_net_hdr_mem + offs;
+		start_dp[idx].addr  = txvq->virtio_net_hdr_mem +
+			RTE_PTR_DIFF(&txr[idx].tx_indir, txr);
 		start_dp[idx].len   = (seg_num + 1) * sizeof(struct vring_desc);
 		start_dp[idx].flags = VRING_DESC_F_INDIRECT;
+		hdr = (struct virtio_net_hdr *)&txr[idx].tx_hdr;
 
 		/* loop below will fill in rest of the indirect elements */
 		start_dp = txr[idx].tx_indir;
@@ -255,15 +265,43 @@ virtqueue_enqueue_xmit(struct virtnet_tx *txvq, struct rte_mbuf *cookie,
 		/* setup first tx ring slot to point to header
 		 * stored in reserved region.
 		 */
-		offs = idx * sizeof(struct virtio_tx_region)
-			+ offsetof(struct virtio_tx_region, tx_hdr);
-
-		start_dp[idx].addr  = txvq->virtio_net_hdr_mem + offs;
+		start_dp[idx].addr  = txvq->virtio_net_hdr_mem +
+			RTE_PTR_DIFF(&txr[idx].tx_hdr, txr);
 		start_dp[idx].len   = vq->hw->vtnet_hdr_size;
 		start_dp[idx].flags = VRING_DESC_F_NEXT;
+		hdr = (struct virtio_net_hdr *)&txr[idx].tx_hdr;
+
 		idx = start_dp[idx].next;
 	}
 
+	if (offload) {
+		/* Checksum Offload */
+		switch (cookie->ol_flags & PKT_TX_L4_MASK) {
+		case PKT_TX_UDP_CKSUM:
+			hdr->csum_start = cookie->l2_len + cookie->l3_len;
+			hdr->csum_offset = offsetof(struct udp_hdr,
+				dgram_cksum);
+			hdr->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
+			break;
+
+		case PKT_TX_TCP_CKSUM:
+			hdr->csum_start = cookie->l2_len + cookie->l3_len;
+			hdr->csum_offset = offsetof(struct tcp_hdr, cksum);
+			hdr->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
+			break;
+
+		default:
+			hdr->csum_start = 0;
+			hdr->csum_offset = 0;
+			hdr->flags = 0;
+			break;
+		}
+
+		hdr->gso_type = 0;
+		hdr->gso_size = 0;
+		hdr->hdr_len = 0;
+	}
+
 	do {
 		start_dp[idx].addr  = VIRTIO_MBUF_DATA_DMA_ADDR(cookie, vq);
 		start_dp[idx].len   = cookie->data_len;
@@ -527,11 +565,6 @@ virtio_dev_tx_queue_setup(struct rte_eth_dev *dev,
 
 	PMD_INIT_FUNC_TRACE();
 
-	if ((tx_conf->txq_flags & ETH_TXQ_FLAGS_NOXSUMS)
-	    != ETH_TXQ_FLAGS_NOXSUMS) {
-		PMD_INIT_LOG(ERR, "TX checksum offload not supported\n");
-		return -EINVAL;
-	}
 
 	virtio_update_rxtx_handler(dev, tx_conf);
 
-- 
2.8.1

* [dpdk-dev] [PATCH v3 11/12] net/virtio: add Lro support
  2016-10-13 14:15 ` [dpdk-dev] [PATCH v3 " Olivier Matz
                     ` (9 preceding siblings ...)
  2016-10-13 14:16   ` [dpdk-dev] [PATCH v3 10/12] net/virtio: add Tx " Olivier Matz
@ 2016-10-13 14:16   ` Olivier Matz
  2016-10-13 14:16   ` [dpdk-dev] [PATCH v3 12/12] net/virtio: add Tso support Olivier Matz
  11 siblings, 0 replies; 97+ messages in thread
From: Olivier Matz @ 2016-10-13 14:16 UTC (permalink / raw)
  To: dev, yuanhan.liu
  Cc: konstantin.ananyev, sugesh.chandran, bruce.richardson,
	jianfeng.tan, helin.zhang, adrien.mazarguil, stephen, dprovan,
	xiao.w.wang, maxime.coquelin

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 drivers/net/virtio/virtio_ethdev.c | 15 ++++++++++++++-
 drivers/net/virtio/virtio_ethdev.h |  9 ---------
 drivers/net/virtio/virtio_rxtx.c   | 25 ++++++++++++++++++++++++-
 3 files changed, 38 insertions(+), 11 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
index c3c53be..109f855 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -1348,6 +1348,10 @@ virtio_dev_configure(struct rte_eth_dev *dev)
 	req_features = VIRTIO_PMD_DEFAULT_GUEST_FEATURES;
 	if (rxmode->hw_ip_checksum)
 		req_features |= (1ULL << VIRTIO_NET_F_GUEST_CSUM);
+	if (rxmode->enable_lro)
+		req_features |=
+			(1ULL << VIRTIO_NET_F_GUEST_TSO4) |
+			(1ULL << VIRTIO_NET_F_GUEST_TSO6);
 
 	/* if request features changed, reinit the device */
 	if (req_features != hw->req_guest_features) {
@@ -1363,6 +1367,14 @@ virtio_dev_configure(struct rte_eth_dev *dev)
 		return -ENOTSUP;
 	}
 
+	if (rxmode->enable_lro &&
+		(!vtpci_with_feature(hw, VIRTIO_NET_F_GUEST_TSO4) ||
+			!vtpci_with_feature(hw, VIRTIO_NET_F_GUEST_TSO6))) {
+		PMD_DRV_LOG(NOTICE,
+			"lro not available on this host");
+		return -ENOTSUP;
+	}
+
 	/* Setup and start control queue */
 	if (vtpci_with_feature(hw, VIRTIO_NET_F_CTRL_VQ)) {
 		ret = virtio_dev_cq_queue_setup(dev,
@@ -1578,7 +1590,8 @@ virtio_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	};
 	dev_info->rx_offload_capa =
 		DEV_RX_OFFLOAD_TCP_CKSUM |
-		DEV_RX_OFFLOAD_UDP_CKSUM;
+		DEV_RX_OFFLOAD_UDP_CKSUM |
+		DEV_RX_OFFLOAD_TCP_LRO;
 	dev_info->tx_offload_capa = 0;
 
 	if (hw->guest_features & (1ULL << VIRTIO_NET_F_CSUM)) {
diff --git a/drivers/net/virtio/virtio_ethdev.h b/drivers/net/virtio/virtio_ethdev.h
index adca6ba..d55e7ed 100644
--- a/drivers/net/virtio/virtio_ethdev.h
+++ b/drivers/net/virtio/virtio_ethdev.h
@@ -117,13 +117,4 @@ uint16_t virtio_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 int eth_virtio_dev_init(struct rte_eth_dev *eth_dev);
 
-/*
- * The VIRTIO_NET_F_GUEST_TSO[46] features permit the host to send us
- * frames larger than 1514 bytes. We do not yet support software LRO
- * via tcp_lro_rx().
- */
-#define VTNET_LRO_FEATURES (VIRTIO_NET_F_GUEST_TSO4 | \
-			    VIRTIO_NET_F_GUEST_TSO6 | VIRTIO_NET_F_GUEST_ECN)
-
-
 #endif /* _VIRTIO_ETHDEV_H_ */
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 675dc43..0fa635a 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -715,13 +715,36 @@ virtio_rx_offload(struct rte_mbuf *m, struct virtio_net_hdr *hdr)
 		m->ol_flags |= PKT_RX_L4_CKSUM_GOOD;
 	}
 
+	/* GSO request, save required information in mbuf */
+	if (hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE) {
+		/* Check unsupported modes */
+		if ((hdr->gso_type & VIRTIO_NET_HDR_GSO_ECN) ||
+		    (hdr->gso_size == 0)) {
+			return -EINVAL;
+		}
+
+		/* Update mss lengths in mbuf */
+		m->tso_segsz = hdr->gso_size;
+		switch (hdr->gso_type & ~VIRTIO_NET_HDR_GSO_ECN) {
+			case VIRTIO_NET_HDR_GSO_TCPV4:
+			case VIRTIO_NET_HDR_GSO_TCPV6:
+				m->ol_flags |= PKT_RX_LRO | \
+					PKT_RX_L4_CKSUM_NONE;
+				break;
+			default:
+				return -EINVAL;
+		}
+	}
+
 	return 0;
 }
 
 static inline int
 rx_offload_enabled(struct virtio_hw *hw)
 {
-	return vtpci_with_feature(hw, VIRTIO_NET_F_GUEST_CSUM);
+	return vtpci_with_feature(hw, VIRTIO_NET_F_GUEST_CSUM) ||
+		vtpci_with_feature(hw, VIRTIO_NET_F_GUEST_TSO4) ||
+		vtpci_with_feature(hw, VIRTIO_NET_F_GUEST_TSO6);
 }
 
 #define VIRTIO_MBUF_BURST_SZ 64
-- 
2.8.1

* [dpdk-dev] [PATCH v3 12/12] net/virtio: add Tso support
  2016-10-13 14:15 ` [dpdk-dev] [PATCH v3 " Olivier Matz
                     ` (10 preceding siblings ...)
  2016-10-13 14:16   ` [dpdk-dev] [PATCH v3 11/12] net/virtio: add Lro support Olivier Matz
@ 2016-10-13 14:16   ` Olivier Matz
  2016-10-13 16:05     ` Yuanhan Liu
  11 siblings, 1 reply; 97+ messages in thread
From: Olivier Matz @ 2016-10-13 14:16 UTC (permalink / raw)
  To: dev, yuanhan.liu
  Cc: konstantin.ananyev, sugesh.chandran, bruce.richardson,
	jianfeng.tan, helin.zhang, adrien.mazarguil, stephen, dprovan,
	xiao.w.wang, maxime.coquelin

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 drivers/net/virtio/virtio_ethdev.c |   6 ++
 drivers/net/virtio/virtio_ethdev.h |   2 +
 drivers/net/virtio/virtio_rxtx.c   | 133 +++++++++++++++++++++++++++++++++++--
 3 files changed, 136 insertions(+), 5 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
index 109f855..969edb6 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -1572,6 +1572,7 @@ virtio_dev_link_update(struct rte_eth_dev *dev, __rte_unused int wait_to_complet
 static void
 virtio_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 {
+	uint64_t tso_mask;
 	struct virtio_hw *hw = dev->data->dev_private;
 
 	if (dev->pci_dev)
@@ -1599,6 +1600,11 @@ virtio_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 			DEV_TX_OFFLOAD_UDP_CKSUM |
 			DEV_TX_OFFLOAD_TCP_CKSUM;
 	}
+
+	tso_mask = (1ULL << VIRTIO_NET_F_HOST_TSO4) |
+		(1ULL << VIRTIO_NET_F_HOST_TSO6);
+	if ((hw->guest_features & tso_mask) == tso_mask)
+		dev_info->tx_offload_capa |= DEV_TX_OFFLOAD_TCP_TSO;
 }
 
 /*
diff --git a/drivers/net/virtio/virtio_ethdev.h b/drivers/net/virtio/virtio_ethdev.h
index d55e7ed..f77f618 100644
--- a/drivers/net/virtio/virtio_ethdev.h
+++ b/drivers/net/virtio/virtio_ethdev.h
@@ -63,6 +63,8 @@
 	 1u << VIRTIO_NET_F_CTRL_RX	  |	\
 	 1u << VIRTIO_NET_F_CTRL_VLAN	  |	\
 	 1u << VIRTIO_NET_F_CSUM	  |	\
+	 1u << VIRTIO_NET_F_HOST_TSO4	  |	\
+	 1u << VIRTIO_NET_F_HOST_TSO6	  |	\
 	 1u << VIRTIO_NET_F_MRG_RXBUF	  |	\
 	 1u << VIRTIO_RING_F_INDIRECT_DESC |    \
 	 1ULL << VIRTIO_F_VERSION_1)
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 0fa635a..4b01ea3 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -209,10 +209,117 @@ virtqueue_enqueue_recv_refill(struct virtqueue *vq, struct rte_mbuf *cookie)
 	return 0;
 }
 
+/* When doing TSO, the IP length is not included in the pseudo header
+ * checksum of the packet given to the PMD, but for virtio it is
+ * expected.
+ */
+static void
+virtio_tso_fix_cksum(struct rte_mbuf *m)
+{
+	/* common case: header is not fragmented */
+	if (likely(rte_pktmbuf_data_len(m) >= m->l2_len + m->l3_len +
+			m->l4_len)) {
+		struct ipv4_hdr *iph;
+		struct ipv6_hdr *ip6h;
+		struct tcp_hdr *th;
+		uint16_t prev_cksum, new_cksum, ip_len, ip_paylen;
+		uint32_t tmp;
+
+		iph = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *, m->l2_len);
+		th = RTE_PTR_ADD(iph, m->l3_len);
+		if ((iph->version_ihl >> 4) == 4) {
+			iph->hdr_checksum = 0;
+			iph->hdr_checksum = rte_ipv4_cksum(iph);
+			ip_len = iph->total_length;
+			ip_paylen = rte_cpu_to_be_16(rte_be_to_cpu_16(ip_len) -
+				m->l3_len);
+		} else {
+			ip6h = (struct ipv6_hdr *)iph;
+			ip_paylen = ip6h->payload_len;
+		}
+
+		/* calculate the new phdr checksum, now including ip_paylen */
+		prev_cksum = th->cksum;
+		tmp = prev_cksum;
+		tmp += ip_paylen;
+		tmp = (tmp & 0xffff) + (tmp >> 16);
+		new_cksum = tmp;
+
+		/* replace it in the packet */
+		th->cksum = new_cksum;
+	} else {
+		const struct ipv4_hdr *iph;
+		struct ipv4_hdr iph_copy;
+		union {
+			uint16_t u16;
+			uint8_t u8[2];
+		} prev_cksum, new_cksum, ip_len, ip_paylen, ip_csum;
+		uint32_t tmp;
+
+		/* Same code as above, but we use rte_pktmbuf_read()
+		 * or we read/write in mbuf data one byte at a time to
+		 * avoid issues if the packet is multi segmented.
+		 */
+
+		uint8_t ip_version;
+
+		ip_version = *rte_pktmbuf_mtod_offset(m, uint8_t *,
+			m->l2_len) >> 4;
+
+		/* calculate ip checksum (the API requires the field to be
+		 * set to 0 first) and get ip payload len */
+		if (ip_version == 4) {
+			*rte_pktmbuf_mtod_offset(m, uint8_t *,
+				m->l2_len + 10) = 0;
+			*rte_pktmbuf_mtod_offset(m, uint8_t *,
+				m->l2_len + 11) = 0;
+			iph = rte_pktmbuf_read(m, m->l2_len,
+				sizeof(*iph), &iph_copy);
+			ip_csum.u16 = rte_ipv4_cksum(iph);
+			*rte_pktmbuf_mtod_offset(m, uint8_t *,
+				m->l2_len + 10) = ip_csum.u8[0];
+			*rte_pktmbuf_mtod_offset(m, uint8_t *,
+				m->l2_len + 11) = ip_csum.u8[1];
+
+			ip_len.u8[0] = *rte_pktmbuf_mtod_offset(m, uint8_t *,
+				m->l2_len + 2);
+			ip_len.u8[1] = *rte_pktmbuf_mtod_offset(m, uint8_t *,
+				m->l2_len + 3);
+
+			ip_paylen.u16 = rte_cpu_to_be_16(
+				rte_be_to_cpu_16(ip_len.u16) - m->l3_len);
+		} else {
+			ip_paylen.u8[0] = *rte_pktmbuf_mtod_offset(m, uint8_t *,
+				m->l2_len + 4);
+			ip_paylen.u8[1] = *rte_pktmbuf_mtod_offset(m, uint8_t *,
+				m->l2_len + 5);
+		}
+
+		/* calculate the new phdr checksum, now including ip_paylen */
+		/* get phdr cksum at offset 16 of TCP header */
+		prev_cksum.u8[0] = *rte_pktmbuf_mtod_offset(m, uint8_t *,
+			m->l2_len + m->l3_len + 16);
+		prev_cksum.u8[1] = *rte_pktmbuf_mtod_offset(m, uint8_t *,
+			m->l2_len + m->l3_len + 17);
+		tmp = prev_cksum.u16;
+		tmp += ip_paylen.u16;
+		tmp = (tmp & 0xffff) + (tmp >> 16);
+		new_cksum.u16 = tmp;
+
+		/* replace it in the packet */
+		*rte_pktmbuf_mtod_offset(m, uint8_t *,
+			m->l2_len + m->l3_len + 16) = new_cksum.u8[0];
+		*rte_pktmbuf_mtod_offset(m, uint8_t *,
+			m->l2_len + m->l3_len + 17) = new_cksum.u8[1];
+	}
+}
+
 static inline int
 tx_offload_enabled(struct virtio_hw *hw)
 {
-	return vtpci_with_feature(hw, VIRTIO_NET_F_CSUM);
+	return vtpci_with_feature(hw, VIRTIO_NET_F_CSUM) ||
+		vtpci_with_feature(hw, VIRTIO_NET_F_HOST_TSO4) ||
+		vtpci_with_feature(hw, VIRTIO_NET_F_HOST_TSO6);
 }
 
 static inline void
@@ -274,8 +381,11 @@ virtqueue_enqueue_xmit(struct virtnet_tx *txvq, struct rte_mbuf *cookie,
 		idx = start_dp[idx].next;
 	}
 
+	/* Checksum Offload / TSO */
 	if (offload) {
-		/* Checksum Offload */
+		if (cookie->ol_flags & PKT_TX_TCP_SEG)
+			cookie->ol_flags |= PKT_TX_TCP_CKSUM;
+
 		switch (cookie->ol_flags & PKT_TX_L4_MASK) {
 		case PKT_TX_UDP_CKSUM:
 			hdr->csum_start = cookie->l2_len + cookie->l3_len;
@@ -297,9 +407,22 @@ virtqueue_enqueue_xmit(struct virtnet_tx *txvq, struct rte_mbuf *cookie,
 			break;
 		}
 
-		hdr->gso_type = 0;
-		hdr->gso_size = 0;
-		hdr->hdr_len = 0;
+		/* TCP Segmentation Offload */
+		if (cookie->ol_flags & PKT_TX_TCP_SEG) {
+			virtio_tso_fix_cksum(cookie);
+			hdr->gso_type = (cookie->ol_flags & PKT_TX_IPV6) ?
+				VIRTIO_NET_HDR_GSO_TCPV6 :
+				VIRTIO_NET_HDR_GSO_TCPV4;
+			hdr->gso_size = cookie->tso_segsz;
+			hdr->hdr_len =
+				cookie->l2_len +
+				cookie->l3_len +
+				cookie->l4_len;
+		} else {
+			hdr->gso_type = 0;
+			hdr->gso_size = 0;
+			hdr->hdr_len = 0;
+		}
 	}
 
 	do {
-- 
2.8.1
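
Editorial sketch of the arithmetic behind virtio_tso_fix_cksum() in the
patch above, written as a standalone illustration rather than the driver
code. The function names are hypothetical. The assumption is the one stated
in the patch comment: the TCP checksum field holds a pseudo-header checksum
computed without the IP payload length, and virtio expects the length to be
included, so the length is added in ones' complement arithmetic (16-bit
addition with end-around carry). Both operands must be in the same byte
order, as they are when read straight from the packet.

```c
#include <assert.h>
#include <stdint.h>

/* Add two 16-bit words in ones' complement arithmetic. */
static uint16_t
sketch_csum_add16(uint16_t a, uint16_t b)
{
	uint32_t tmp = (uint32_t)a + b;

	/* Fold the carry out of bit 15 back into the low 16 bits. */
	tmp = (tmp & 0xffff) + (tmp >> 16);
	return (uint16_t)tmp;
}

/*
 * Fold the IP payload length into a pseudo-header checksum that was
 * computed without it. Both values are taken as read from the packet.
 */
static uint16_t
sketch_tso_fix_phdr_cksum(uint16_t phdr_cksum_no_len, uint16_t ip_paylen)
{
	return sketch_csum_add16(phdr_cksum_no_len, ip_paylen);
}

/* Usage example: the carry wraps around, so 0xffff + 0x0001 gives 0x0001. */
static void
sketch_selfcheck(void)
{
	assert(sketch_tso_fix_phdr_cksum(0xffff, 0x0001) == 0x0001);
}
```

A single fold is enough here because the sum of two 16-bit values is at
most 0x1fffe, which folds to 0xffff without producing a second carry.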


* Re: [dpdk-dev] [PATCH v2 12/12] virtio: add Tso support
  2016-10-13 14:02       ` Olivier MATZ
@ 2016-10-13 14:16         ` Yuanhan Liu
  2016-10-13 14:52           ` Olivier MATZ
  0 siblings, 1 reply; 97+ messages in thread
From: Yuanhan Liu @ 2016-10-13 14:16 UTC (permalink / raw)
  To: Olivier MATZ
  Cc: dev, konstantin.ananyev, sugesh.chandran, bruce.richardson,
	jianfeng.tan, helin.zhang, adrien.mazarguil, stephen, dprovan,
	xiao.w.wang

On Thu, Oct 13, 2016 at 04:02:49PM +0200, Olivier MATZ wrote:
> 
> 
> On 10/13/2016 10:18 AM, Yuanhan Liu wrote:
> >On Mon, Oct 03, 2016 at 11:00:23AM +0200, Olivier Matz wrote:
> >>+/* When doing TSO, the IP length is not included in the pseudo header
> >>+ * checksum of the packet given to the PMD, but for virtio it is
> >>+ * expected.
> >>+ */
> >>+static void
> >>+virtio_tso_fix_cksum(struct rte_mbuf *m)
> >>+{
> >>+	/* common case: header is not fragmented */
> >>+	if (likely(rte_pktmbuf_data_len(m) >= m->l2_len + m->l3_len +
> >>+			m->l4_len)) {
> >...
> >>+		/* replace it in the packet */
> >>+		th->cksum = new_cksum;
> >>+	} else {
> >...
> >>+		/* replace it in the packet */
> >>+		*rte_pktmbuf_mtod_offset(m, uint8_t *,
> >>+			m->l2_len + m->l3_len + 16) = new_cksum.u8[0];
> >>+		*rte_pktmbuf_mtod_offset(m, uint8_t *,
> >>+			m->l2_len + m->l3_len + 17) = new_cksum.u8[1];
> >>+	}
> >
> >The tcp header will always be in the mbuf, right? Otherwise, you can't
> >update the cksum field here. What's the point of introducing the "else
> >clause" then?
> 
> Sorry, I don't see the problem you're pointing out here.
> 
> What I want to solve here is to support the cases where the mbuf is
> segmented in the middle of the network header (which is probably a rare
> case).

How is it going to be segmented?

> 
> In the "else" part, I only access the mbuf byte by byte using the
> rte_pktmbuf_mtod_offset() accessor. An alternative would have been to copy
> the header in a linear buffer, fix the checksum, then copy it again in the
> packet, but there is no mbuf helpers to do these copies for now.

In the "else" clause, the ip header is still in the mbuf, right?
Why do you have to access it like this:

	ip_version = *rte_pktmbuf_mtod_offset(m, uint8_t *,
		m->l2_len) >> 4;

Why can't you just use

	iph = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *, m->l2_len);
	iph->version_ihl ....;

Sorry, I'm just a bit lost.

	--yliu


* Re: [dpdk-dev] [PATCH v2 12/12] virtio: add Tso support
  2016-10-13 14:16         ` Yuanhan Liu
@ 2016-10-13 14:52           ` Olivier MATZ
  2016-10-13 15:01             ` Yuanhan Liu
  2016-10-13 15:04             ` Yuanhan Liu
  0 siblings, 2 replies; 97+ messages in thread
From: Olivier MATZ @ 2016-10-13 14:52 UTC (permalink / raw)
  To: Yuanhan Liu
  Cc: dev, konstantin.ananyev, sugesh.chandran, bruce.richardson,
	jianfeng.tan, helin.zhang, adrien.mazarguil, stephen, dprovan,
	xiao.w.wang



On 10/13/2016 04:16 PM, Yuanhan Liu wrote:
> On Thu, Oct 13, 2016 at 04:02:49PM +0200, Olivier MATZ wrote:
>>
>>
>> On 10/13/2016 10:18 AM, Yuanhan Liu wrote:
>>> On Mon, Oct 03, 2016 at 11:00:23AM +0200, Olivier Matz wrote:
>>>> +/* When doing TSO, the IP length is not included in the pseudo header
>>>> + * checksum of the packet given to the PMD, but for virtio it is
>>>> + * expected.
>>>> + */
>>>> +static void
>>>> +virtio_tso_fix_cksum(struct rte_mbuf *m)
>>>> +{
>>>> +	/* common case: header is not fragmented */
>>>> +	if (likely(rte_pktmbuf_data_len(m) >= m->l2_len + m->l3_len +
>>>> +			m->l4_len)) {
>>> ...
>>>> +		/* replace it in the packet */
>>>> +		th->cksum = new_cksum;
>>>> +	} else {
>>> ...
>>>> +		/* replace it in the packet */
>>>> +		*rte_pktmbuf_mtod_offset(m, uint8_t *,
>>>> +			m->l2_len + m->l3_len + 16) = new_cksum.u8[0];
>>>> +		*rte_pktmbuf_mtod_offset(m, uint8_t *,
>>>> +			m->l2_len + m->l3_len + 17) = new_cksum.u8[1];
>>>> +	}
>>>
>>> The tcp header will always be in the mbuf, right? Otherwise, you can't
>>> update the cksum field here. What's the point of introducing the "else
>>> clause" then?
>>
>> Sorry, I don't see the problem you're pointing out here.
>>
>> What I want to solve here is to support the cases where the mbuf is
>> segmented in the middle of the network header (which is probably a rare
>> case).
>
> How it's gonna segmented?

The mbuf is given by the application. So if the application generates a 
segmented mbuf, it should work.

This could happen, for instance, if the application uses mbuf clones to 
share the IP/TCP/data part of the mbuf and prepends a specific 
Ethernet/VLAN header for each destination.


>> In the "else" part, I only access the mbuf byte by byte using the
>> rte_pktmbuf_mtod_offset() accessor. An alternative would have been to copy
>> the header in a linear buffer, fix the checksum, then copy it again in the
>> packet, but there is no mbuf helpers to do these copies for now.
>
> In the "else" clause, the ip header is still in the mbuf, right?
> Why do you have to access it the way like:
>
> 	ip_version = *rte_pktmbuf_mtod_offset(m, uint8_t *,
> 		m->l2_len) >> 4;
>
> Why can't you just use
>
> 	iph = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *, m->l2_len);
> 	iph->version_ihl ....;

AFAIK, there is no requirement that each network header has to be 
contiguous in a mbuf segment.

Of course, a split in the middle of a network header probably never 
happens... but we never know, as it is not forbidden. I think the code 
should be robust enough to avoid accesses to wrong addresses.

Hope it's clear enough :)

Thanks
Olivier


* Re: [dpdk-dev] [PATCH v2 12/12] virtio: add Tso support
  2016-10-13 14:52           ` Olivier MATZ
@ 2016-10-13 15:01             ` Yuanhan Liu
  2016-10-13 15:15               ` Olivier MATZ
  2016-10-13 15:04             ` Yuanhan Liu
  1 sibling, 1 reply; 97+ messages in thread
From: Yuanhan Liu @ 2016-10-13 15:01 UTC (permalink / raw)
  To: Olivier MATZ
  Cc: dev, konstantin.ananyev, sugesh.chandran, bruce.richardson,
	jianfeng.tan, helin.zhang, adrien.mazarguil, stephen, dprovan,
	xiao.w.wang

On Thu, Oct 13, 2016 at 04:52:25PM +0200, Olivier MATZ wrote:
> 
> 
> On 10/13/2016 04:16 PM, Yuanhan Liu wrote:
> >On Thu, Oct 13, 2016 at 04:02:49PM +0200, Olivier MATZ wrote:
> >>
> >>
> >>On 10/13/2016 10:18 AM, Yuanhan Liu wrote:
> >>>On Mon, Oct 03, 2016 at 11:00:23AM +0200, Olivier Matz wrote:
> >>>>+/* When doing TSO, the IP length is not included in the pseudo header
> >>>>+ * checksum of the packet given to the PMD, but for virtio it is
> >>>>+ * expected.
> >>>>+ */
> >>>>+static void
> >>>>+virtio_tso_fix_cksum(struct rte_mbuf *m)
> >>>>+{
> >>>>+	/* common case: header is not fragmented */
> >>>>+	if (likely(rte_pktmbuf_data_len(m) >= m->l2_len + m->l3_len +
> >>>>+			m->l4_len)) {
> >>>...
> >>>>+		/* replace it in the packet */
> >>>>+		th->cksum = new_cksum;
> >>>>+	} else {
> >>>...
> >>>>+		/* replace it in the packet */
> >>>>+		*rte_pktmbuf_mtod_offset(m, uint8_t *,
> >>>>+			m->l2_len + m->l3_len + 16) = new_cksum.u8[0];
> >>>>+		*rte_pktmbuf_mtod_offset(m, uint8_t *,
> >>>>+			m->l2_len + m->l3_len + 17) = new_cksum.u8[1];
> >>>>+	}
> >>>
> >>>The tcp header will always be in the mbuf, right? Otherwise, you can't
> >>>update the cksum field here. What's the point of introducing the "else
> >>>clause" then?
> >>
> >>Sorry, I don't see the problem you're pointing out here.
> >>
> >>What I want to solve here is to support the cases where the mbuf is
> >>segmented in the middle of the network header (which is probably a rare
> >>case).
> >
> >How it's gonna segmented?
> 
> The mbuf is given by the application. So if the application generates a
> segmented mbuf, it should work.
> 
> This could happen for instance if the application uses mbuf clones to share
> the IP/TCP/data part of the mbuf and prepend a specific Ethernet/vlan for
> different destination.
> 
> 
> >>In the "else" part, I only access the mbuf byte by byte using the
> >>rte_pktmbuf_mtod_offset() accessor. An alternative would have been to copy
> >>the header in a linear buffer, fix the checksum, then copy it again in the
> >>packet, but there is no mbuf helpers to do these copies for now.
> >
> >In the "else" clause, the ip header is still in the mbuf, right?
> >Why do you have to access it the way like:
> >
> >	ip_version = *rte_pktmbuf_mtod_offset(m, uint8_t *,
> >		m->l2_len) >> 4;
> >
> >Why can't you just use
> >
> >	iph = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *, m->l2_len);
> >	iph->version_ihl ....;
> 
> AFAIK, there is no requirement that each network header has to be contiguous
> in a mbuf segment.
> 
> Of course, a split in the middle of a network header probably never
> happens... but we never knows, as it is not forbidden. I think the code
> should be robust enough to avoid accesses to wrong addresses.
> 
> Hope it's clear enough :)

Thanks, but not really. Let me ask it this way: what would go wrong
if we used
	iph = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *, m->l2_len);
to access the IP header? Is it about endianness?

	--yliu


* Re: [dpdk-dev] [PATCH v2 12/12] virtio: add Tso support
  2016-10-13 14:52           ` Olivier MATZ
  2016-10-13 15:01             ` Yuanhan Liu
@ 2016-10-13 15:04             ` Yuanhan Liu
  1 sibling, 0 replies; 97+ messages in thread
From: Yuanhan Liu @ 2016-10-13 15:04 UTC (permalink / raw)
  To: Olivier MATZ
  Cc: dev, konstantin.ananyev, sugesh.chandran, bruce.richardson,
	jianfeng.tan, helin.zhang, adrien.mazarguil, stephen, dprovan,
	xiao.w.wang

On Thu, Oct 13, 2016 at 04:52:25PM +0200, Olivier MATZ wrote:
> >In the "else" clause, the ip header is still in the mbuf, right?
> >Why do you have to access it the way like:
> >
> >	ip_version = *rte_pktmbuf_mtod_offset(m, uint8_t *,
> >		m->l2_len) >> 4;
> >
> >Why can't you just use
> >
> >	iph = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *, m->l2_len);
> >	iph->version_ihl ....;
> 
> AFAIK, there is no requirement that each network header has to be contiguous
> in a mbuf segment.
> 
> Of course, a split in the middle of a network header probably never
> happens... but we never knows, as it is not forbidden. I think the code
> should be robust enough to avoid accesses to wrong addresses.

One more question: do you have any case that triggers the "else" clause?

	--yliu


* Re: [dpdk-dev] [PATCH v2 12/12] virtio: add Tso support
  2016-10-13 15:01             ` Yuanhan Liu
@ 2016-10-13 15:15               ` Olivier MATZ
  2016-10-13 15:29                 ` Yuanhan Liu
  0 siblings, 1 reply; 97+ messages in thread
From: Olivier MATZ @ 2016-10-13 15:15 UTC (permalink / raw)
  To: Yuanhan Liu
  Cc: dev, konstantin.ananyev, sugesh.chandran, bruce.richardson,
	jianfeng.tan, helin.zhang, adrien.mazarguil, stephen, dprovan,
	xiao.w.wang



On 10/13/2016 05:01 PM, Yuanhan Liu wrote:
> On Thu, Oct 13, 2016 at 04:52:25PM +0200, Olivier MATZ wrote:
>>
>>
>> On 10/13/2016 04:16 PM, Yuanhan Liu wrote:
>>> On Thu, Oct 13, 2016 at 04:02:49PM +0200, Olivier MATZ wrote:
>>>>
>>>>
>>>> On 10/13/2016 10:18 AM, Yuanhan Liu wrote:
>>>>> On Mon, Oct 03, 2016 at 11:00:23AM +0200, Olivier Matz wrote:
>>>>>> +/* When doing TSO, the IP length is not included in the pseudo header
>>>>>> + * checksum of the packet given to the PMD, but for virtio it is
>>>>>> + * expected.
>>>>>> + */
>>>>>> +static void
>>>>>> +virtio_tso_fix_cksum(struct rte_mbuf *m)
>>>>>> +{
>>>>>> +	/* common case: header is not fragmented */
>>>>>> +	if (likely(rte_pktmbuf_data_len(m) >= m->l2_len + m->l3_len +
>>>>>> +			m->l4_len)) {
>>>>> ...
>>>>>> +		/* replace it in the packet */
>>>>>> +		th->cksum = new_cksum;
>>>>>> +	} else {
>>>>> ...
>>>>>> +		/* replace it in the packet */
>>>>>> +		*rte_pktmbuf_mtod_offset(m, uint8_t *,
>>>>>> +			m->l2_len + m->l3_len + 16) = new_cksum.u8[0];
>>>>>> +		*rte_pktmbuf_mtod_offset(m, uint8_t *,
>>>>>> +			m->l2_len + m->l3_len + 17) = new_cksum.u8[1];
>>>>>> +	}
>>>>>
>>>>> The tcp header will always be in the mbuf, right? Otherwise, you can't
>>>>> update the cksum field here. What's the point of introducing the "else
>>>>> clause" then?
>>>>
>>>> Sorry, I don't see the problem you're pointing out here.
>>>>
>>>> What I want to solve here is to support the cases where the mbuf is
>>>> segmented in the middle of the network header (which is probably a rare
>>>> case).
>>>
>>> How it's gonna segmented?
>>
>> The mbuf is given by the application. So if the application generates a
>> segmented mbuf, it should work.
>>
>> This could happen for instance if the application uses mbuf clones to share
>> the IP/TCP/data part of the mbuf and prepend a specific Ethernet/vlan for
>> different destination.
>>
>>
>>>> In the "else" part, I only access the mbuf byte by byte using the
>>>> rte_pktmbuf_mtod_offset() accessor. An alternative would have been to copy
>>>> the header in a linear buffer, fix the checksum, then copy it again in the
>>>> packet, but there is no mbuf helpers to do these copies for now.
>>>
>>> In the "else" clause, the ip header is still in the mbuf, right?
>>> Why do you have to access it the way like:
>>>
>>> 	ip_version = *rte_pktmbuf_mtod_offset(m, uint8_t *,
>>> 		m->l2_len) >> 4;
>>>
>>> Why can't you just use
>>>
>>> 	iph = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *, m->l2_len);
>>> 	iph->version_ihl ....;
>>
>> AFAIK, there is no requirement that each network header has to be contiguous
>> in a mbuf segment.
>>
>> Of course, a split in the middle of a network header probably never
>> happens... but we never knows, as it is not forbidden. I think the code
>> should be robust enough to avoid accesses to wrong addresses.
>>
>> Hope it's clear enough :)
>
> Thanks, but not really. Maybe let me ask this way: what wrong would
> happen if we use
> 	iph = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *, m->l2_len);
> to access the IP header? Is it about the endian?

If you have a packet split like this:

mbuf segment 1                     mbuf segment 2
----------------------------       ------------------------------
| Ethernet header |  IP hea|       |der | TCP header | data
----------------------------       ------------------------------
                    ^
                    iph

The IP header is not contiguous, so accessing the end of the 
structure would access a wrong location.

> One more question is do you have any case to trigger the "else" clause?

No, but I think it may happen.

Olivier
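
Editorial sketch of the split-header situation drawn above. It uses a
simplified, hypothetical segment structure (struct sketch_seg), not the real
rte_mbuf, and the helper name is made up. The idea is the same as a "read"
style accessor: if the requested range is contiguous in one segment, return
a pointer into it; if it straddles a boundary, gather the bytes into a
caller-provided buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for a chained packet buffer (not rte_mbuf). */
struct sketch_seg {
	const uint8_t *data;		/* start of this segment's data */
	size_t len;			/* bytes of data in this segment */
	struct sketch_seg *next;	/* next segment, or NULL */
};

/*
 * Return a pointer to len contiguous bytes at packet offset off.
 * Points into the segment when the range is contiguous, otherwise
 * copies into buf and returns buf. Returns NULL if the packet is
 * too short.
 */
static const void *
sketch_read(const struct sketch_seg *seg, size_t off, size_t len, void *buf)
{
	size_t copied = 0;

	assert(buf != NULL);

	/* Skip segments that end before the requested offset. */
	while (seg != NULL && off >= seg->len) {
		off -= seg->len;
		seg = seg->next;
	}
	if (seg == NULL)
		return NULL;

	/* Fast path: the whole range lives in this segment. */
	if (off + len <= seg->len)
		return seg->data + off;

	/* Slow path: gather the bytes across segment boundaries. */
	while (seg != NULL && copied < len) {
		size_t chunk = seg->len - off;

		if (chunk > len - copied)
			chunk = len - copied;
		memcpy((uint8_t *)buf + copied, seg->data + off, chunk);
		copied += chunk;
		off = 0;
		seg = seg->next;
	}
	return copied == len ? buf : NULL;
}
```

A header that straddles two segments comes back as a copy in buf, so any
change to it would still have to be written back to the packet separately.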


* Re: [dpdk-dev] [PATCH v2 12/12] virtio: add Tso support
  2016-10-13 15:15               ` Olivier MATZ
@ 2016-10-13 15:29                 ` Yuanhan Liu
  2016-10-13 15:45                   ` Olivier Matz
  0 siblings, 1 reply; 97+ messages in thread
From: Yuanhan Liu @ 2016-10-13 15:29 UTC (permalink / raw)
  To: Olivier MATZ
  Cc: dev, konstantin.ananyev, sugesh.chandran, bruce.richardson,
	jianfeng.tan, helin.zhang, adrien.mazarguil, stephen, dprovan,
	xiao.w.wang

On Thu, Oct 13, 2016 at 05:15:24PM +0200, Olivier MATZ wrote:
> 
> 
> On 10/13/2016 05:01 PM, Yuanhan Liu wrote:
> >On Thu, Oct 13, 2016 at 04:52:25PM +0200, Olivier MATZ wrote:
> >>
> >>
> >>On 10/13/2016 04:16 PM, Yuanhan Liu wrote:
> >>>On Thu, Oct 13, 2016 at 04:02:49PM +0200, Olivier MATZ wrote:
> >>>>
> >>>>
> >>>>On 10/13/2016 10:18 AM, Yuanhan Liu wrote:
> >>>>>On Mon, Oct 03, 2016 at 11:00:23AM +0200, Olivier Matz wrote:
> >>>>>>+/* When doing TSO, the IP length is not included in the pseudo header
> >>>>>>+ * checksum of the packet given to the PMD, but for virtio it is
> >>>>>>+ * expected.
> >>>>>>+ */
> >>>>>>+static void
> >>>>>>+virtio_tso_fix_cksum(struct rte_mbuf *m)
> >>>>>>+{
> >>>>>>+	/* common case: header is not fragmented */
> >>>>>>+	if (likely(rte_pktmbuf_data_len(m) >= m->l2_len + m->l3_len +
> >>>>>>+			m->l4_len)) {
> >>>>>...
> >>>>>>+		/* replace it in the packet */
> >>>>>>+		th->cksum = new_cksum;
> >>>>>>+	} else {
> >>>>>...
> >>>>>>+		/* replace it in the packet */
> >>>>>>+		*rte_pktmbuf_mtod_offset(m, uint8_t *,
> >>>>>>+			m->l2_len + m->l3_len + 16) = new_cksum.u8[0];
> >>>>>>+		*rte_pktmbuf_mtod_offset(m, uint8_t *,
> >>>>>>+			m->l2_len + m->l3_len + 17) = new_cksum.u8[1];
> >>>>>>+	}
> >>>>>
> >>>>>The tcp header will always be in the mbuf, right? Otherwise, you can't
> >>>>>update the cksum field here. What's the point of introducing the "else
> >>>>>clause" then?
> >>>>
> >>>>Sorry, I don't see the problem you're pointing out here.
> >>>>
> >>>>What I want to solve here is to support the cases where the mbuf is
> >>>>segmented in the middle of the network header (which is probably a rare
> >>>>case).
> >>>
> >>>How it's gonna segmented?
> >>
> >>The mbuf is given by the application. So if the application generates a
> >>segmented mbuf, it should work.
> >>
> >>This could happen for instance if the application uses mbuf clones to share
> >>the IP/TCP/data part of the mbuf and prepend a specific Ethernet/vlan for
> >>different destination.
> >>
> >>
> >>>>In the "else" part, I only access the mbuf byte by byte using the
> >>>>rte_pktmbuf_mtod_offset() accessor. An alternative would have been to copy
> >>>>the header in a linear buffer, fix the checksum, then copy it again in the
> >>>>packet, but there is no mbuf helpers to do these copies for now.
> >>>
> >>>In the "else" clause, the ip header is still in the mbuf, right?
> >>>Why do you have to access it the way like:
> >>>
> >>>	ip_version = *rte_pktmbuf_mtod_offset(m, uint8_t *,
> >>>		m->l2_len) >> 4;
> >>>
> >>>Why can't you just use
> >>>
> >>>	iph = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *, m->l2_len);
> >>>	iph->version_ihl ....;
> >>
> >>AFAIK, there is no requirement that each network header has to be contiguous
> >>in a mbuf segment.
> >>
> >>Of course, a split in the middle of a network header probably never
> >>happens... but we never knows, as it is not forbidden. I think the code
> >>should be robust enough to avoid accesses to wrong addresses.
> >>
> >>Hope it's clear enough :)
> >
> >Thanks, but not really. Maybe let me ask this way: what wrong would
> >happen if we use
> >	iph = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *, m->l2_len);
> >to access the IP header? Is it about the endian?
> 
> If you have a packet split like this:
> 
> mbuf segment 1                     mbuf segment 2
> ----------------------------       ------------------------------
> | Ethernet header |  IP hea|       |der | TCP header | data
> ----------------------------       ------------------------------
>                    ^
>                    iph

Thanks, that's clear. How would you be able to access the TCP header
from the first mbuf then? I mean, how is the following code supposed
to work?

    prev_cksum.u8[0] = *rte_pktmbuf_mtod_offset(m, uint8_t *,
			m->l2_len + m->l3_len + 16);

> The IP header is not contiguous. So accessing to the end of the structure
> will access to a wrong location.
> 
> >One more question is do you have any case to trigger the "else" clause?
> 
> No, but I think it may happen.

A piece of untested code cannot be trusted, though ...

	--yliu
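
Editorial sketch of the point raised above, using a simplified, hypothetical
chain structure (struct sketch_chain) rather than the real rte_mbuf. An
accessor that only adds an offset to the first segment's data pointer never
follows the chain, so an offset beyond the first segment does not reach the
TCP header. Reaching a byte in a later segment needs an accessor that walks
the segments, as in sketch_chain_ptr() below. All names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a chained, writable packet buffer. */
struct sketch_chain {
	uint8_t *data;			/* start of this segment's data */
	size_t len;			/* bytes of data in this segment */
	struct sketch_chain *next;	/* next segment, or NULL */
};

/* First-segment-only accessor: never looks at m->next, no bounds check. */
static uint8_t *
sketch_first_seg_ptr(struct sketch_chain *m, size_t off)
{
	assert(m != NULL);
	return m->data + off;
}

/* Segment-walking accessor: NULL if off is beyond the end of the packet. */
static uint8_t *
sketch_chain_ptr(struct sketch_chain *m, size_t off)
{
	while (m != NULL && off >= m->len) {
		off -= m->len;
		m = m->next;
	}
	return m != NULL ? m->data + off : NULL;
}

/*
 * Store two bytes at packet offsets off and off + 1, looking up each
 * byte separately so the pair may straddle a segment boundary.
 * Returns 0 on success, -1 if either byte is outside the packet.
 */
static int
sketch_store16(struct sketch_chain *m, size_t off, uint8_t b0, uint8_t b1)
{
	uint8_t *p0 = sketch_chain_ptr(m, off);
	uint8_t *p1 = sketch_chain_ptr(m, off + 1);

	if (p0 == NULL || p1 == NULL)
		return -1;
	*p0 = b0;
	*p1 = b1;
	return 0;
}
```

With two 4-byte segments, sketch_store16() at offset 3 writes the last byte
of the first segment and the first byte of the second one, whereas
sketch_first_seg_ptr() at offset 4 would already point past the first
segment's data.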


* Re: [dpdk-dev] [PATCH v2 12/12] virtio: add Tso support
  2016-10-13 15:29                 ` Yuanhan Liu
@ 2016-10-13 15:45                   ` Olivier Matz
  2016-10-13 16:01                     ` Yuanhan Liu
  0 siblings, 1 reply; 97+ messages in thread
From: Olivier Matz @ 2016-10-13 15:45 UTC (permalink / raw)
  To: Yuanhan Liu
  Cc: dev, konstantin.ananyev, sugesh.chandran, bruce.richardson,
	jianfeng.tan, helin.zhang, adrien.mazarguil, stephen, dprovan,
	xiao.w.wang



On 13 October 2016 17:29:35 CEST, Yuanhan Liu <yuanhan.liu@linux.intel.com> wrote:
>On Thu, Oct 13, 2016 at 05:15:24PM +0200, Olivier MATZ wrote:
>> 
>> 
>> On 10/13/2016 05:01 PM, Yuanhan Liu wrote:
>> >On Thu, Oct 13, 2016 at 04:52:25PM +0200, Olivier MATZ wrote:
>> >>
>> >>
>> >>On 10/13/2016 04:16 PM, Yuanhan Liu wrote:
>> >>>On Thu, Oct 13, 2016 at 04:02:49PM +0200, Olivier MATZ wrote:
>> >>>>
>> >>>>
>> >>>>On 10/13/2016 10:18 AM, Yuanhan Liu wrote:
>> >>>>>On Mon, Oct 03, 2016 at 11:00:23AM +0200, Olivier Matz wrote:
>> >>>>>>+/* When doing TSO, the IP length is not included in the pseudo
>header
>> >>>>>>+ * checksum of the packet given to the PMD, but for virtio it
>is
>> >>>>>>+ * expected.
>> >>>>>>+ */
>> >>>>>>+static void
>> >>>>>>+virtio_tso_fix_cksum(struct rte_mbuf *m)
>> >>>>>>+{
>> >>>>>>+	/* common case: header is not fragmented */
>> >>>>>>+	if (likely(rte_pktmbuf_data_len(m) >= m->l2_len + m->l3_len +
>> >>>>>>+			m->l4_len)) {
>> >>>>>...
>> >>>>>>+		/* replace it in the packet */
>> >>>>>>+		th->cksum = new_cksum;
>> >>>>>>+	} else {
>> >>>>>...
>> >>>>>>+		/* replace it in the packet */
>> >>>>>>+		*rte_pktmbuf_mtod_offset(m, uint8_t *,
>> >>>>>>+			m->l2_len + m->l3_len + 16) = new_cksum.u8[0];
>> >>>>>>+		*rte_pktmbuf_mtod_offset(m, uint8_t *,
>> >>>>>>+			m->l2_len + m->l3_len + 17) = new_cksum.u8[1];
>> >>>>>>+	}
>> >>>>>
>> >>>>>The tcp header will always be in the mbuf, right? Otherwise, you
>can't
>> >>>>>update the cksum field here. What's the point of introducing the
>"else
>> >>>>>clause" then?
>> >>>>
>> >>>>Sorry, I don't see the problem you're pointing out here.
>> >>>>
>> >>>>What I want to solve here is to support the cases where the mbuf
>is
>> >>>>segmented in the middle of the network header (which is probably
>a rare
>> >>>>case).
>> >>>
>> >>>How it's gonna segmented?
>> >>
>> >>The mbuf is given by the application. So if the application
>generates a
>> >>segmented mbuf, it should work.
>> >>
>> >>This could happen for instance if the application uses mbuf clones
>to share
>> >>the IP/TCP/data part of the mbuf and prepend a specific
>Ethernet/vlan for
>> >>different destination.
>> >>
>> >>
>> >>>>In the "else" part, I only access the mbuf byte by byte using the
>> >>>>rte_pktmbuf_mtod_offset() accessor. An alternative would have
>been to copy
>> >>>>the header in a linear buffer, fix the checksum, then copy it
>again in the
>> >>>>packet, but there is no mbuf helpers to do these copies for now.
>> >>>
>> >>>In the "else" clause, the ip header is still in the mbuf, right?
>> >>>Why do you have to access it the way like:
>> >>>
>> >>>	ip_version = *rte_pktmbuf_mtod_offset(m, uint8_t *,
>> >>>		m->l2_len) >> 4;
>> >>>
>> >>>Why can't you just use
>> >>>
>> >>>	iph = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *, m->l2_len);
>> >>>	iph->version_ihl ....;
>> >>
>> >>AFAIK, there is no requirement that each network header has to be
>contiguous
>> >>in a mbuf segment.
>> >>
>> >>Of course, a split in the middle of a network header probably never
>> >>happens... but we never knows, as it is not forbidden. I think the
>code
>> >>should be robust enough to avoid accesses to wrong addresses.
>> >>
>> >>Hope it's clear enough :)
>> >
>> >Thanks, but not really. Maybe let me ask this way: what wrong would
>> >happen if we use
>> >	iph = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *, m->l2_len);
>> >to access the IP header? Is it about the endian?
>> 
>> If you have a packet split like this:
>> 
>> mbuf segment 1                     mbuf segment 2
>> ----------------------------       ------------------------------
>> | Ethernet header |  IP hea|       |der | TCP header | data
>> ----------------------------       ------------------------------
>>                    ^
>>                    iph
>
>Thanks, that's clear. How could you be able to access the tcp header
>from the first mbuf then? I mean, how is the following code supposed
>to work?
>
>    prev_cksum.u8[0] = *rte_pktmbuf_mtod_offset(m, uint8_t *,
>			m->l2_len + m->l3_len + 16);
>

Oh I see... Sorry, there was a confusion on my side with another
(internal) macro that browses the segments if the offset is not in the
first one.

If you agree, let's add the code without the else part; I'll fix it for rc2.


>> The IP header is not contiguous. So accessing to the end of the
>structure
>> will access to a wrong location.
>> 
>> >One more question is do you have any case to trigger the "else"
>clause?
>> 
>> No, but I think it may happen.
>
>A piece of untest code is not trusted though ...
>
>	--yliu


* Re: [dpdk-dev] [PATCH v2 12/12] virtio: add Tso support
  2016-10-13 15:45                   ` Olivier Matz
@ 2016-10-13 16:01                     ` Yuanhan Liu
  0 siblings, 0 replies; 97+ messages in thread
From: Yuanhan Liu @ 2016-10-13 16:01 UTC (permalink / raw)
  To: Olivier Matz
  Cc: dev, konstantin.ananyev, sugesh.chandran, bruce.richardson,
	jianfeng.tan, helin.zhang, adrien.mazarguil, stephen, dprovan,
	xiao.w.wang, thoams

On Thu, Oct 13, 2016 at 05:45:21PM +0200, Olivier Matz wrote:
> >> If you have a packet split like this:
> >> 
> >> mbuf segment 1                     mbuf segment 2
> >> ----------------------------       ------------------------------
> >> | Ethernet header |  IP hea|       |der | TCP header | data
> >> ----------------------------       ------------------------------
> >>                    ^
> >>                    iph
> >
> >Thanks, that's clear. How could you be able to access the tcp header
> >from the first mbuf then? I mean, how is the following code supposed
> >to work?
> >
> >    prev_cksum.u8[0] = *rte_pktmbuf_mtod_offset(m, uint8_t *,
> >			m->l2_len + m->l3_len + 16);
> >
> 
> Oh I see... Sorry, there was a confusion on my side with another (internal) macro that browses the segments if the offset is not in the first one.
> 
> If you agree, let's add the code without the else part, I'll fix it for the rc2.

Good. That's okay with me.

	--yliu


* Re: [dpdk-dev] [PATCH v3 12/12] net/virtio: add Tso support
  2016-10-13 14:16   ` [dpdk-dev] [PATCH v3 12/12] net/virtio: add Tso support Olivier Matz
@ 2016-10-13 16:05     ` Yuanhan Liu
  2016-10-13 18:50       ` Thomas Monjalon
  0 siblings, 1 reply; 97+ messages in thread
From: Yuanhan Liu @ 2016-10-13 16:05 UTC (permalink / raw)
  To: Olivier Matz, Thomas Monjalon
  Cc: dev, konstantin.ananyev, sugesh.chandran, bruce.richardson,
	jianfeng.tan, helin.zhang, adrien.mazarguil, stephen, dprovan,
	xiao.w.wang, maxime.coquelin

On Thu, Oct 13, 2016 at 04:16:11PM +0200, Olivier Matz wrote:
> +/* When doing TSO, the IP length is not included in the pseudo header
> + * checksum of the packet given to the PMD, but for virtio it is
> + * expected.
> + */
> +static void
> +virtio_tso_fix_cksum(struct rte_mbuf *m)
> +{
> +	/* common case: header is not fragmented */
> +	if (likely(rte_pktmbuf_data_len(m) >= m->l2_len + m->l3_len +
> +			m->l4_len)) {
> +		struct ipv4_hdr *iph;
> +		struct ipv6_hdr *ip6h;
> +		struct tcp_hdr *th;
> +		uint16_t prev_cksum, new_cksum, ip_len, ip_paylen;
> +		uint32_t tmp;
...
> +	} else {

As discussed just now, if you drop the else part, you can add my
ACK for all the virtio changes, and my Reviewed-by for all the mbuf
and other changes.

Thomas, please pick them up yourself directly, since they depend on
other patches that will be picked up (or have already been?) by you.

Thanks.

	--yliu
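
For reference, the adjustment described in the quoted comment amounts to
adding the IP payload length to an existing 16-bit one's complement sum.
A minimal sketch, assuming the TCP checksum field currently holds the
non-complemented pseudo-header sum and that both operands use the same
byte order; csum_add16() is a hypothetical helper, not a DPDK function.

```c
#include <stdint.h>

/*
 * Add 'val' to a 16-bit one's complement sum.
 * The carry out of bit 15 is folded back into the low 16 bits
 * (end-around carry). One fold is enough for two 16-bit operands.
 */
static uint16_t
csum_add16(uint16_t sum, uint16_t val)
{
	uint32_t tmp = (uint32_t)sum + val;

	tmp = (tmp & 0xffff) + (tmp >> 16);
	return (uint16_t)tmp;
}
```

In terms of the variables declared in the quoted patch, this would
correspond roughly to th->cksum = csum_add16(th->cksum, ip_paylen), with
ip_paylen in network byte order.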

* Re: [dpdk-dev] [PATCH v3 12/12] net/virtio: add Tso support
  2016-10-13 16:05     ` Yuanhan Liu
@ 2016-10-13 18:50       ` Thomas Monjalon
  2016-10-13 19:58         ` Olivier Matz
  0 siblings, 1 reply; 97+ messages in thread
From: Thomas Monjalon @ 2016-10-13 18:50 UTC (permalink / raw)
  To: Olivier Matz
  Cc: Yuanhan Liu, dev, konstantin.ananyev, sugesh.chandran,
	bruce.richardson, jianfeng.tan, helin.zhang, adrien.mazarguil,
	stephen, dprovan, xiao.w.wang, maxime.coquelin

2016-10-14 00:05, Yuanhan Liu:
> On Thu, Oct 13, 2016 at 04:16:11PM +0200, Olivier Matz wrote:
> > +/* When doing TSO, the IP length is not included in the pseudo header
> > + * checksum of the packet given to the PMD, but for virtio it is
> > + * expected.
> > + */
> > +static void
> > +virtio_tso_fix_cksum(struct rte_mbuf *m)
> > +{
> > +	/* common case: header is not fragmented */
> > +	if (likely(rte_pktmbuf_data_len(m) >= m->l2_len + m->l3_len +
> > +			m->l4_len)) {
> > +		struct ipv4_hdr *iph;
> > +		struct ipv6_hdr *ip6h;
> > +		struct tcp_hdr *th;
> > +		uint16_t prev_cksum, new_cksum, ip_len, ip_paylen;
> > +		uint32_t tmp;
> ...
> > +	} else {
> 
> As discussed just now, if you drop the else part, you can add my
> ACK for all the virtio changes, and my Reviewed-by for all the mbuf
> and other changes.
> 
> Thomas, please pick them up yourself directly, since they depend on
> other patches that will be picked up (or have already been?) by you.

Applied
	- without TSO checksum on fragmented header
	- with some release notes changes
	- with Yuanhan acked/reviewed
Thanks

* Re: [dpdk-dev] [PATCH v3 12/12] net/virtio: add Tso support
  2016-10-13 18:50       ` Thomas Monjalon
@ 2016-10-13 19:58         ` Olivier Matz
  0 siblings, 0 replies; 97+ messages in thread
From: Olivier Matz @ 2016-10-13 19:58 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Yuanhan Liu, dev, konstantin.ananyev, sugesh.chandran,
	bruce.richardson, jianfeng.tan, helin.zhang, adrien.mazarguil,
	stephen, dprovan, xiao.w.wang, maxime.coquelin



On 10/13/2016 08:50 PM, Thomas Monjalon wrote:
> 2016-10-14 00:05, Yuanhan Liu:
>> On Thu, Oct 13, 2016 at 04:16:11PM +0200, Olivier Matz wrote:
>>> +/* When doing TSO, the IP length is not included in the pseudo header
>>> + * checksum of the packet given to the PMD, but for virtio it is
>>> + * expected.
>>> + */
>>> +static void
>>> +virtio_tso_fix_cksum(struct rte_mbuf *m)
>>> +{
>>> +	/* common case: header is not fragmented */
>>> +	if (likely(rte_pktmbuf_data_len(m) >= m->l2_len + m->l3_len +
>>> +			m->l4_len)) {
>>> +		struct ipv4_hdr *iph;
>>> +		struct ipv6_hdr *ip6h;
>>> +		struct tcp_hdr *th;
>>> +		uint16_t prev_cksum, new_cksum, ip_len, ip_paylen;
>>> +		uint32_t tmp;
>> ...
>>> +	} else {
>>
>> As discussed just now, if you drop the else part, you can add my
>> ACK for all the virtio changes, and my Reviewed-by for all the mbuf
>> and other changes.
>>
>> Thomas, please pick them up yourself directly, since they depend on
>> other patches that will be picked up (or have already been?) by you.
> 
> Applied
> 	- without TSO checksum on fragmented header
> 	- with some release notes changes
> 	- with Yuanhan acked/reviewed
> Thanks
> 

Thanks Thomas, and also to Xiao, Maxime and Yuanhan for the review!

* Re: [dpdk-dev] [PATCH v2 12/12] virtio: add Tso support
  2016-10-13  8:18     ` Yuanhan Liu
  2016-10-13 14:02       ` Olivier MATZ
@ 2016-10-13 23:33       ` Stephen Hemminger
  2016-10-18 14:07         ` Olivier Matz
  1 sibling, 1 reply; 97+ messages in thread
From: Stephen Hemminger @ 2016-10-13 23:33 UTC (permalink / raw)
  To: Yuanhan Liu
  Cc: Olivier Matz, dev, konstantin.ananyev, sugesh.chandran,
	bruce.richardson, jianfeng.tan, helin.zhang, adrien.mazarguil,
	dprovan, xiao.w.wang

On Thu, 13 Oct 2016 16:18:39 +0800
Yuanhan Liu <yuanhan.liu@linux.intel.com> wrote:

> On Mon, Oct 03, 2016 at 11:00:23AM +0200, Olivier Matz wrote:
> > +/* When doing TSO, the IP length is not included in the pseudo header
> > + * checksum of the packet given to the PMD, but for virtio it is
> > + * expected.
> > + */
> > +static void
> > +virtio_tso_fix_cksum(struct rte_mbuf *m)
> > +{
> > +	/* common case: header is not fragmented */
> > +	if (likely(rte_pktmbuf_data_len(m) >= m->l2_len + m->l3_len +
> > +			m->l4_len)) {  
> ...
> > +		/* replace it in the packet */
> > +		th->cksum = new_cksum;
> > +	} else {  
> ...
> > +		/* replace it in the packet */
> > +		*rte_pktmbuf_mtod_offset(m, uint8_t *,
> > +			m->l2_len + m->l3_len + 16) = new_cksum.u8[0];
> > +		*rte_pktmbuf_mtod_offset(m, uint8_t *,
> > +			m->l2_len + m->l3_len + 17) = new_cksum.u8[1];
> > +	}  
> 
> The tcp header will always be in the mbuf, right? Otherwise, you can't
> update the cksum field here. What's the point of introducing the "else
> clause" then?
> 
> 	--yliu

You need to check the reference count before updating any data in mbuf.
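
A minimal sketch of such a check, assuming a hypothetical struct buf
with a reference counter rather than the real rte_mbuf layout: the data
is patched in place only when the caller holds the sole reference;
otherwise the caller is told to work on a private copy first.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference-counted buffer, standing in for an mbuf. */
struct buf {
	uint16_t refcnt; /* number of holders of this buffer */
	uint8_t *data;   /* buffer contents */
	size_t len;      /* number of valid bytes in 'data' */
};

/*
 * Store 'val' in big-endian order at 'off', but only if nobody else
 * holds a reference to the buffer.
 * Returns 0 on success, -1 if the buffer is shared (the caller should
 * copy it before writing), -2 if the write would be out of bounds.
 */
static int
buf_write_u16(struct buf *b, size_t off, uint16_t val)
{
	if (b->refcnt != 1)
		return -1;
	if (off > b->len || b->len - off < 2)
		return -2;
	b->data[off] = (uint8_t)(val >> 8);
	b->data[off + 1] = (uint8_t)(val & 0xff);
	return 0;
}
```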

* Re: [dpdk-dev] [PATCH v2 12/12] virtio: add Tso support
  2016-10-13 23:33       ` Stephen Hemminger
@ 2016-10-18 14:07         ` Olivier Matz
  0 siblings, 0 replies; 97+ messages in thread
From: Olivier Matz @ 2016-10-18 14:07 UTC (permalink / raw)
  To: Stephen Hemminger, Yuanhan Liu
  Cc: dev, konstantin.ananyev, sugesh.chandran, bruce.richardson,
	jianfeng.tan, helin.zhang, adrien.mazarguil, dprovan,
	xiao.w.wang

Hi Stephen,

On 10/14/2016 01:33 AM, Stephen Hemminger wrote:
> On Thu, 13 Oct 2016 16:18:39 +0800
> Yuanhan Liu <yuanhan.liu@linux.intel.com> wrote:
> 
>> On Mon, Oct 03, 2016 at 11:00:23AM +0200, Olivier Matz wrote:
>>> +/* When doing TSO, the IP length is not included in the pseudo header
>>> + * checksum of the packet given to the PMD, but for virtio it is
>>> + * expected.
>>> + */
>>> +static void
>>> +virtio_tso_fix_cksum(struct rte_mbuf *m)
>>> +{
>>> +	/* common case: header is not fragmented */
>>> +	if (likely(rte_pktmbuf_data_len(m) >= m->l2_len + m->l3_len +
>>> +			m->l4_len)) {  
>> ...
>>> +		/* replace it in the packet */
>>> +		th->cksum = new_cksum;
>>> +	} else {  
>> ...
>>> +		/* replace it in the packet */
>>> +		*rte_pktmbuf_mtod_offset(m, uint8_t *,
>>> +			m->l2_len + m->l3_len + 16) = new_cksum.u8[0];
>>> +		*rte_pktmbuf_mtod_offset(m, uint8_t *,
>>> +			m->l2_len + m->l3_len + 17) = new_cksum.u8[1];
>>> +	}  
>>
>> The tcp header will always be in the mbuf, right? Otherwise, you can't
>> update the cksum field here. What's the point of introducing the "else
>> clause" then?
>>
>> 	--yliu
> 
> You need to check the reference count before updating any data in mbuf.
> 

That's correct, I'll fix that.

Thanks for the comment,
Olivier

* Re: [dpdk-dev] [PATCH v3 02/12] net/virtio: setup and start cq in configure callback
  2016-10-13 14:16   ` [dpdk-dev] [PATCH v3 02/12] net/virtio: setup and start cq in configure callback Olivier Matz
@ 2016-11-02  1:38     ` Yao, Lei A
  2016-11-08 14:58       ` Olivier Matz
  0 siblings, 1 reply; 97+ messages in thread
From: Yao, Lei A @ 2016-11-02  1:38 UTC (permalink / raw)
  To: Olivier Matz, dev, yuanhan.liu; +Cc: dprovan

Hi, Olivier

During validation of v16.11-rc2, I found that this patch causes a VM crash when virtio bonding is enabled in the VM. Could you check on your side? The steps I used are listed below. Thanks a lot.

1. bind PF port to igb_uio.
modprobe uio
insmod ./x86_64-native-linuxapp-gcc/kmod/igb_uio.ko
./tools/dpdk-devbind.py --bind=igb_uio 84:00.1

2. start vhost switch.
./examples/vhost/build/vhost-switch -c 0x1c0000 -n 4 --socket-mem 4096,4096 -- -p 0x1 --mergeable 0 --vm2vm 0 --socket-file ./vhost-net

3. boot one VM with four virtio-net devices
qemu-system-x86_64 \
-name vm0 -enable-kvm -chardev socket,path=/tmp/vm0_qga0.sock,server,nowait,id=vm0_qga0 \
-device virtio-serial -device virtserialport,chardev=vm0_qga0,name=org.qemu.guest_agent.0 \
-daemonize -monitor unix:/tmp/vm0_monitor.sock,server,nowait \
-net nic,vlan=0,macaddr=00:00:00:c7:56:64,addr=1f \
-net user,vlan=0,hostfwd=tcp:10.239.129.127:6107:22 \
-chardev socket,id=char0,path=./vhost-net \
-netdev type=vhost-user,id=netdev0,chardev=char0,vhostforce \
-device virtio-net-pci,netdev=netdev0,mac=52:54:00:00:00:01 \
-chardev socket,id=char1,path=./vhost-net \
-netdev type=vhost-user,id=netdev1,chardev=char1,vhostforce \
-device virtio-net-pci,netdev=netdev1,mac=52:54:00:00:00:02 \
-chardev socket,id=char2,path=./vhost-net \
-netdev type=vhost-user,id=netdev2,chardev=char2,vhostforce \
-device virtio-net-pci,netdev=netdev2,mac=52:54:00:00:00:03 \
-chardev socket,id=char3,path=./vhost-net \
-netdev type=vhost-user,id=netdev3,chardev=char3,vhostforce \
-device virtio-net-pci,netdev=netdev3,mac=52:54:00:00:00:04 \
-cpu host -smp 8 -m 4096 \
-object memory-backend-file,id=mem,size=4096M,mem-path=/mnt/huge,share=on \
-numa node,memdev=mem -mem-prealloc -drive file=/home/osimg/ubuntu16.img -vnc :10

4. on vm:
bind virtio net device to igb_uio
modprobe uio
insmod ./x86_64-native-linuxapp-gcc/kmod/igb_uio.ko
tools/dpdk-devbind.py --bind=igb_uio 00:04.0 00:05.0 00:06.0 00:07.0
5. start the testpmd app
./x86_64-native-linuxapp-gcc/app/testpmd -c 0x1f -n 4 -- -i --txqflags=0xf00 --disable-hw-vlan-filter
6. create one bonding device (port 4)
create bonded device 0 0 (the first 0: mode, the second: the socket number)
show bonding config 4
7. bind port 0, 1, 2 to port 4
add bonding slave 0 4
add bonding slave 1 4
add bonding slave 2 4
port start 4
Result: just after "port start 4" (port 4 is the bonded port), the VM shuts down immediately.

BRs
Lei

-----Original Message-----
From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier Matz
Sent: Thursday, October 13, 2016 10:16 PM
To: dev@dpdk.org; yuanhan.liu@linux.intel.com
Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Chandran, Sugesh <sugesh.chandran@intel.com>; Richardson, Bruce <bruce.richardson@intel.com>; Tan, Jianfeng <jianfeng.tan@intel.com>; Zhang, Helin <helin.zhang@intel.com>; adrien.mazarguil@6wind.com; stephen@networkplumber.org; dprovan@bivio.net; Wang, Xiao W <xiao.w.wang@intel.com>; maxime.coquelin@redhat.com
Subject: [dpdk-dev] [PATCH v3 02/12] net/virtio: setup and start cq in configure callback

Move the configuration of control queue in the configure callback.
This is needed by next commit, which introduces the reinitialization
of the device in the configure callback to change the feature flags.
Therefore, the control queue will have to be restarted at the same
place.

As virtio_dev_cq_queue_setup() is called from a place where
config->max_virtqueue_pairs is not available, we need to store this in
the private structure. It replaces max_rx_queues and max_tx_queues
which have the same value. The log showing the value of max_rx_queues
and max_tx_queues is also removed since config->max_virtqueue_pairs is
already displayed above.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 drivers/net/virtio/virtio_ethdev.c | 43 +++++++++++++++++++-------------------
 drivers/net/virtio/virtio_ethdev.h |  4 ++--
 drivers/net/virtio/virtio_pci.h    |  3 +--
 3 files changed, 24 insertions(+), 26 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
index 77ca569..f3921ac 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -552,6 +552,9 @@ virtio_dev_close(struct rte_eth_dev *dev)
 	if (hw->started == 1)
 		virtio_dev_stop(dev);
 
+	if (hw->cvq)
+		virtio_dev_queue_release(hw->cvq->vq);
+
 	/* reset the NIC */
 	if (dev->data->dev_flags & RTE_ETH_DEV_INTR_LSC)
 		vtpci_irq_config(hw, VIRTIO_MSI_NO_VECTOR);
@@ -1191,16 +1194,7 @@ virtio_init_device(struct rte_eth_dev *eth_dev)
 			config->max_virtqueue_pairs = 1;
 		}
 
-		hw->max_rx_queues =
-			(VIRTIO_MAX_RX_QUEUES < config->max_virtqueue_pairs) ?
-			VIRTIO_MAX_RX_QUEUES : config->max_virtqueue_pairs;
-		hw->max_tx_queues =
-			(VIRTIO_MAX_TX_QUEUES < config->max_virtqueue_pairs) ?
-			VIRTIO_MAX_TX_QUEUES : config->max_virtqueue_pairs;
-
-		virtio_dev_cq_queue_setup(eth_dev,
-					config->max_virtqueue_pairs * 2,
-					SOCKET_ID_ANY);
+		hw->max_queue_pairs = config->max_virtqueue_pairs;
 
 		PMD_INIT_LOG(DEBUG, "config->max_virtqueue_pairs=%d",
 				config->max_virtqueue_pairs);
@@ -1211,19 +1205,15 @@ virtio_init_device(struct rte_eth_dev *eth_dev)
 				config->mac[2], config->mac[3],
 				config->mac[4], config->mac[5]);
 	} else {
-		hw->max_rx_queues = 1;
-		hw->max_tx_queues = 1;
+		PMD_INIT_LOG(DEBUG, "config->max_virtqueue_pairs=1");
+		hw->max_queue_pairs = 1;
 	}
 
-	PMD_INIT_LOG(DEBUG, "hw->max_rx_queues=%d   hw->max_tx_queues=%d",
-			hw->max_rx_queues, hw->max_tx_queues);
 	if (pci_dev)
 		PMD_INIT_LOG(DEBUG, "port %d vendorID=0x%x deviceID=0x%x",
 			eth_dev->data->port_id, pci_dev->id.vendor_id,
 			pci_dev->id.device_id);
 
-	virtio_dev_cq_start(eth_dev);
-
 	return 0;
 }
 
@@ -1285,7 +1275,6 @@ static int
 eth_virtio_dev_uninit(struct rte_eth_dev *eth_dev)
 {
 	struct rte_pci_device *pci_dev;
-	struct virtio_hw *hw = eth_dev->data->dev_private;
 
 	PMD_INIT_FUNC_TRACE();
 
@@ -1301,9 +1290,6 @@ eth_virtio_dev_uninit(struct rte_eth_dev *eth_dev)
 	eth_dev->tx_pkt_burst = NULL;
 	eth_dev->rx_pkt_burst = NULL;
 
-	if (hw->cvq)
-		virtio_dev_queue_release(hw->cvq->vq);
-
 	rte_free(eth_dev->data->mac_addrs);
 	eth_dev->data->mac_addrs = NULL;
 
@@ -1352,6 +1338,7 @@ virtio_dev_configure(struct rte_eth_dev *dev)
 {
 	const struct rte_eth_rxmode *rxmode = &dev->data->dev_conf.rxmode;
 	struct virtio_hw *hw = dev->data->dev_private;
+	int ret;
 
 	PMD_INIT_LOG(DEBUG, "configure");
 
@@ -1360,6 +1347,16 @@ virtio_dev_configure(struct rte_eth_dev *dev)
 		return -EINVAL;
 	}
 
+	/* Setup and start control queue */
+	if (vtpci_with_feature(hw, VIRTIO_NET_F_CTRL_VQ)) {
+		ret = virtio_dev_cq_queue_setup(dev,
+			hw->max_queue_pairs * 2,
+			SOCKET_ID_ANY);
+		if (ret < 0)
+			return ret;
+		virtio_dev_cq_start(dev);
+	}
+
 	hw->vlan_strip = rxmode->hw_vlan_strip;
 
 	if (rxmode->hw_vlan_filter
@@ -1553,8 +1550,10 @@ virtio_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 		dev_info->driver_name = dev->driver->pci_drv.driver.name;
 	else
 		dev_info->driver_name = "virtio_user PMD";
-	dev_info->max_rx_queues = (uint16_t)hw->max_rx_queues;
-	dev_info->max_tx_queues = (uint16_t)hw->max_tx_queues;
+	dev_info->max_rx_queues =
+		RTE_MIN(hw->max_queue_pairs, VIRTIO_MAX_RX_QUEUES);
+	dev_info->max_tx_queues =
+		RTE_MIN(hw->max_queue_pairs, VIRTIO_MAX_TX_QUEUES);
 	dev_info->min_rx_bufsize = VIRTIO_MIN_RX_BUFSIZE;
 	dev_info->max_rx_pktlen = VIRTIO_MAX_RX_PKTLEN;
 	dev_info->max_mac_addrs = VIRTIO_MAX_MAC_ADDRS;
diff --git a/drivers/net/virtio/virtio_ethdev.h b/drivers/net/virtio/virtio_ethdev.h
index 04d626b..dc18341 100644
--- a/drivers/net/virtio/virtio_ethdev.h
+++ b/drivers/net/virtio/virtio_ethdev.h
@@ -47,8 +47,8 @@
 #define PAGE_SIZE 4096
 #endif
 
-#define VIRTIO_MAX_RX_QUEUES 128
-#define VIRTIO_MAX_TX_QUEUES 128
+#define VIRTIO_MAX_RX_QUEUES 128U
+#define VIRTIO_MAX_TX_QUEUES 128U
 #define VIRTIO_MAX_MAC_ADDRS 64
 #define VIRTIO_MIN_RX_BUFSIZE 64
 #define VIRTIO_MAX_RX_PKTLEN  9728
diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h
index b8295a7..6930cd6 100644
--- a/drivers/net/virtio/virtio_pci.h
+++ b/drivers/net/virtio/virtio_pci.h
@@ -246,8 +246,7 @@ struct virtio_hw {
 	struct virtnet_ctl *cvq;
 	struct rte_pci_ioport io;
 	uint64_t    guest_features;
-	uint32_t    max_tx_queues;
-	uint32_t    max_rx_queues;
+	uint32_t    max_queue_pairs;
 	uint16_t    vtnet_hdr_size;
 	uint8_t	    vlan_strip;
 	uint8_t	    use_msix;
--
2.8.1
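
The queue numbering this patch relies on can be sketched as follows,
assuming the usual virtio-net layout in which each queue pair occupies
two consecutive virtqueue indexes and the control queue follows the
last pair. The helper names are hypothetical; the 128U limit mirrors
VIRTIO_MAX_RX_QUEUES and VIRTIO_MAX_TX_QUEUES from the diff above.

```c
#include <stdint.h>

/* Mirrors VIRTIO_MAX_RX_QUEUES / VIRTIO_MAX_TX_QUEUES (128U). */
#define SKETCH_MAX_QUEUES 128U

/*
 * Virtqueue index of the control queue: it comes right after all
 * rx/tx pairs, hence max_queue_pairs * 2 as passed to
 * virtio_dev_cq_queue_setup() in the patch.
 */
static uint32_t
sketch_ctrl_vq_index(uint32_t max_queue_pairs)
{
	return max_queue_pairs * 2;
}

/*
 * Number of rx (or tx) queues reported to the application: the
 * device's queue pair count, capped by the driver's own limit.
 * Both operands are unsigned, so the comparison has no sign mismatch.
 */
static uint32_t
sketch_reported_queues(uint32_t max_queue_pairs)
{
	return max_queue_pairs < SKETCH_MAX_QUEUES ?
		max_queue_pairs : SKETCH_MAX_QUEUES;
}
```

Storing only the pair count and deriving both limits from it avoids
keeping two fields that always hold the same value.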

* Re: [dpdk-dev] [PATCH v3 02/12] net/virtio: setup and start cq in configure callback
  2016-11-02  1:38     ` Yao, Lei A
@ 2016-11-08 14:58       ` Olivier Matz
  0 siblings, 0 replies; 97+ messages in thread
From: Olivier Matz @ 2016-11-08 14:58 UTC (permalink / raw)
  To: Yao, Lei A, dev, yuanhan.liu
  Cc: Ananyev, Konstantin, Chandran, Sugesh, Richardson, Bruce, Tan,
	Jianfeng, Zhang, Helin, adrien.mazarguil, stephen, dprovan, Wang,
	Xiao W, maxime.coquelin

Hi Lei,

On 11/02/2016 02:38 AM, Yao, Lei A wrote:
> Hi, Olivier
> 
> During validation of v16.11-rc2, I found that this patch causes a VM crash when virtio bonding is enabled in the VM. Could you check on your side? The steps I used are listed below. Thanks a lot.
> 
> 1. bind PF port to igb_uio.
> modprobe uio
> insmod ./x86_64-native-linuxapp-gcc/kmod/igb_uio.ko
> ./tools/dpdk-devbind.py --bind=igb_uio 84:00.1
> 
> 2. start vhost switch.
> ./examples/vhost/build/vhost-switch -c 0x1c0000 -n 4 --socket-mem 4096,4096 -- -p 0x1 --mergeable 0 --vm2vm 0 --socket-file ./vhost-net
> 
> 3. boot one VM with four virtio-net devices
> qemu-system-x86_64 \
> -name vm0 -enable-kvm -chardev socket,path=/tmp/vm0_qga0.sock,server,nowait,id=vm0_qga0 \
> -device virtio-serial -device virtserialport,chardev=vm0_qga0,name=org.qemu.guest_agent.0 \
> -daemonize -monitor unix:/tmp/vm0_monitor.sock,server,nowait \
> -net nic,vlan=0,macaddr=00:00:00:c7:56:64,addr=1f \
> -net user,vlan=0,hostfwd=tcp:10.239.129.127:6107:22 \
> -chardev socket,id=char0,path=./vhost-net \
> -netdev type=vhost-user,id=netdev0,chardev=char0,vhostforce \
> -device virtio-net-pci,netdev=netdev0,mac=52:54:00:00:00:01 \
> -chardev socket,id=char1,path=./vhost-net \
> -netdev type=vhost-user,id=netdev1,chardev=char1,vhostforce \
> -device virtio-net-pci,netdev=netdev1,mac=52:54:00:00:00:02 \
> -chardev socket,id=char2,path=./vhost-net \
> -netdev type=vhost-user,id=netdev2,chardev=char2,vhostforce \
> -device virtio-net-pci,netdev=netdev2,mac=52:54:00:00:00:03 \
> -chardev socket,id=char3,path=./vhost-net \
> -netdev type=vhost-user,id=netdev3,chardev=char3,vhostforce \
> -device virtio-net-pci,netdev=netdev3,mac=52:54:00:00:00:04 \
> -cpu host -smp 8 -m 4096 \
> -object memory-backend-file,id=mem,size=4096M,mem-path=/mnt/huge,share=on \
> -numa node,memdev=mem -mem-prealloc -drive file=/home/osimg/ubuntu16.img -vnc :10
> 
> 4. on vm:
> bind virtio net device to igb_uio
> modprobe uio
> insmod ./x86_64-native-linuxapp-gcc/kmod/igb_uio.ko
> tools/dpdk-devbind.py --bind=igb_uio 00:04.0 00:05.0 00:06.0 00:07.0
> 5. start the testpmd app
> ./x86_64-native-linuxapp-gcc/app/testpmd -c 0x1f -n 4 -- -i --txqflags=0xf00 --disable-hw-vlan-filter
> 6. create one bonding device (port 4)
> create bonded device 0 0 (the first 0: mode, the second: the socket number)
> show bonding config 4
> 7. bind port 0, 1, 2 to port 4
> add bonding slave 0 4
> add bonding slave 1 4
> add bonding slave 2 4
> port start 4
> Result: just after "port start 4" (port 4 is the bonded port), the VM shuts down immediately.

Sorry for the late answer. I reproduced the issue on rc2, and I confirm
that Yuanhan's patchset fixes it in rc3.

Regards,
Olivier

Thread overview: 97+ messages
2016-07-21  8:08 [dpdk-dev] [PATCH 00/12] net/virtio: add offload support Olivier Matz
2016-07-21  8:08 ` [dpdk-dev] [PATCH 01/12] virtio: move device initialization in a function Olivier Matz
2016-07-21  8:08 ` [dpdk-dev] [PATCH 02/12] virtio: setup and start cq in configure callback Olivier Matz
2016-07-21 21:15   ` Stephen Hemminger
2016-07-22  7:54     ` Olivier Matz
2016-07-21  8:08 ` [dpdk-dev] [PATCH 03/12] virtio: reinitialize the device " Olivier Matz
2016-07-21  8:08 ` [dpdk-dev] [PATCH 04/12] mbuf: add function to calculate a checksum Olivier Matz
2016-07-21 10:51   ` Ananyev, Konstantin
2016-07-21 16:26     ` Don Provan
2016-07-21 16:46       ` Olivier Matz
2016-07-22  8:24     ` Olivier Matz
2016-08-29 14:52       ` Olivier Matz
2016-07-21  8:08 ` [dpdk-dev] [PATCH 05/12] mbuf: add new Rx checksum mbuf flags Olivier Matz
2016-07-21 21:22   ` Stephen Hemminger
2016-07-22  8:03     ` Olivier Matz
2016-07-21  8:08 ` [dpdk-dev] [PATCH 06/12] app/testpmd: fix checksum stats in csum engine Olivier Matz
2016-07-21  8:08 ` [dpdk-dev] [PATCH 07/12] mbuf: new flag for LRO Olivier Matz
2016-07-21  8:08 ` [dpdk-dev] [PATCH 08/12] app/testpmd: display lro segment size Olivier Matz
2016-07-21  8:08 ` [dpdk-dev] [PATCH 09/12] virtio: add Rx checksum offload support Olivier Matz
2016-07-27  9:52   ` Wang, Xiao W
2016-07-21  8:08 ` [dpdk-dev] [PATCH 10/12] virtio: add Tx " Olivier Matz
2016-07-21  8:08 ` [dpdk-dev] [PATCH 11/12] virtio: add Lro support Olivier Matz
2016-07-21  8:08 ` [dpdk-dev] [PATCH 12/12] virtio: add Tso support Olivier Matz
2016-10-03  9:00 ` [dpdk-dev] [PATCH v2 00/12] net/virtio: add offload support Olivier Matz
2016-10-03  9:00   ` [dpdk-dev] [PATCH v2 01/12] virtio: move device initialization in a function Olivier Matz
2016-10-11 12:30     ` Maxime Coquelin
2016-10-03  9:00   ` [dpdk-dev] [PATCH v2 02/12] virtio: setup and start cq in configure callback Olivier Matz
2016-10-11 12:47     ` Maxime Coquelin
2016-10-03  9:00   ` [dpdk-dev] [PATCH v2 03/12] virtio: reinitialize the device " Olivier Matz
2016-10-11 13:13     ` Maxime Coquelin
2016-10-12 14:41     ` Yuanhan Liu
2016-10-12 16:01       ` Olivier MATZ
2016-10-13  7:54         ` Yuanhan Liu
2016-10-13 13:57           ` Olivier MATZ
2016-10-03  9:00   ` [dpdk-dev] [PATCH v2 04/12] net: add function to calculate a checksum in a mbuf Olivier Matz
2016-10-11 13:25     ` Maxime Coquelin
2016-10-11 13:33       ` Olivier MATZ
2016-10-03  9:00   ` [dpdk-dev] [PATCH v2 05/12] mbuf: add new Rx checksum mbuf flags Olivier Matz
2016-10-11 13:43     ` Maxime Coquelin
2016-10-03  9:00   ` [dpdk-dev] [PATCH v2 06/12] app/testpmd: fix checksum stats in csum engine Olivier Matz
2016-10-11 13:46     ` Maxime Coquelin
2016-10-03  9:00   ` [dpdk-dev] [PATCH v2 07/12] mbuf: new flag for LRO Olivier Matz
2016-10-11 13:48     ` Maxime Coquelin
2016-10-03  9:00   ` [dpdk-dev] [PATCH v2 08/12] app/testpmd: display lro segment size Olivier Matz
2016-10-11 13:49     ` Maxime Coquelin
2016-10-03  9:00   ` [dpdk-dev] [PATCH v2 09/12] virtio: add Rx checksum offload support Olivier Matz
2016-10-03 12:51     ` Maxime Coquelin
2016-10-05 11:56       ` Olivier Matz
2016-10-05 13:27         ` Maxime Coquelin
2016-10-05 13:30           ` Olivier Matz
2016-10-12 13:02           ` Yuanhan Liu
2016-10-12 15:55             ` Olivier MATZ
2016-10-11 14:04     ` Maxime Coquelin
2016-10-11 14:29       ` Olivier MATZ
2016-10-11 14:36         ` Maxime Coquelin
2016-10-11 14:49           ` Olivier MATZ
2016-10-03  9:00   ` [dpdk-dev] [PATCH v2 10/12] virtio: add Tx " Olivier Matz
2016-10-07  7:25     ` Maxime Coquelin
2016-10-07 16:36       ` Olivier Matz
2016-10-13  8:38     ` Yuanhan Liu
2016-10-13 13:58       ` Olivier MATZ
2016-10-03  9:00   ` [dpdk-dev] [PATCH v2 11/12] virtio: add Lro support Olivier Matz
2016-10-11 14:21     ` Maxime Coquelin
2016-10-03  9:00   ` [dpdk-dev] [PATCH v2 12/12] virtio: add Tso support Olivier Matz
2016-10-13  8:18     ` Yuanhan Liu
2016-10-13 14:02       ` Olivier MATZ
2016-10-13 14:16         ` Yuanhan Liu
2016-10-13 14:52           ` Olivier MATZ
2016-10-13 15:01             ` Yuanhan Liu
2016-10-13 15:15               ` Olivier MATZ
2016-10-13 15:29                 ` Yuanhan Liu
2016-10-13 15:45                   ` Olivier Matz
2016-10-13 16:01                     ` Yuanhan Liu
2016-10-13 15:04             ` Yuanhan Liu
2016-10-13 23:33       ` Stephen Hemminger
2016-10-18 14:07         ` Olivier Matz
2016-10-11 11:35   ` [dpdk-dev] [PATCH v2 00/12] net/virtio: add offload support Yuanhan Liu
2016-10-11 12:14     ` Olivier MATZ
2016-10-11 15:37       ` Yuanhan Liu
2016-10-13 14:15 ` [dpdk-dev] [PATCH v3 " Olivier Matz
2016-10-13 14:16   ` [dpdk-dev] [PATCH v3 01/12] net/virtio: move device initialization in a function Olivier Matz
2016-10-13 14:16   ` [dpdk-dev] [PATCH v3 02/12] net/virtio: setup and start cq in configure callback Olivier Matz
2016-11-02  1:38     ` Yao, Lei A
2016-11-08 14:58       ` Olivier Matz
2016-10-13 14:16   ` [dpdk-dev] [PATCH v3 03/12] net/virtio: reinitialize the device " Olivier Matz
2016-10-13 14:16   ` [dpdk-dev] [PATCH v3 04/12] net: add function to calculate a checksum in a mbuf Olivier Matz
2016-10-13 14:16   ` [dpdk-dev] [PATCH v3 05/12] mbuf: add new Rx checksum mbuf flags Olivier Matz
2016-10-13 14:16   ` [dpdk-dev] [PATCH v3 06/12] app/testpmd: adapt checksum stats in csum engine Olivier Matz
2016-10-13 14:16   ` [dpdk-dev] [PATCH v3 07/12] mbuf: new flag for LRO Olivier Matz
2016-10-13 14:16   ` [dpdk-dev] [PATCH v3 08/12] app/testpmd: display lro segment size Olivier Matz
2016-10-13 14:16   ` [dpdk-dev] [PATCH v3 09/12] net/virtio: add Rx checksum offload support Olivier Matz
2016-10-13 14:16   ` [dpdk-dev] [PATCH v3 10/12] net/virtio: add Tx " Olivier Matz
2016-10-13 14:16   ` [dpdk-dev] [PATCH v3 11/12] net/virtio: add Lro support Olivier Matz
2016-10-13 14:16   ` [dpdk-dev] [PATCH v3 12/12] net/virtio: add Tso support Olivier Matz
2016-10-13 16:05     ` Yuanhan Liu
2016-10-13 18:50       ` Thomas Monjalon
2016-10-13 19:58         ` Olivier Matz
