From: Gao Zhenyu
Date: Thu, 24 Aug 2017 23:02:15 +0800
To: "O Mahony, Billy"
Cc: "Loftus, Ciara", dev@openvswitch.org, users@dpdk.org
Subject: Re: [dpdk-users] [ovs-dev] [PATCH v1] netdev-dpdk: Implement TCP/UDP TX cksum in ovs-dpdk side

Thanks for the comments!

Yes, the best approach is to calculate the cksum only if the destination needs it.
But in the current ovs-dpdk processing it is hard to tell, at the point where a
whole batch of packets is delivered to its destination, whether those packets need
the cksum or not. Checking packets one by one (testing PKT_TX_L4_MASK and whether
an L4 header is present) introduces a regression in some use cases; a rough sketch
of that per-packet test is below.
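(Purely for illustration: a minimal sketch of the per-packet test described above,
using the dp_packet/mbuf fields from the patch further down this thread.
packet_needs_l4_cksum() is a hypothetical helper name, not something in the patch,
and the snippet assumes the usual OVS/DPDK headers.)

    /* Hypothetical helper: decide per packet whether a software L4 cksum
     * is still required.  Walking every batch like this on the send path
     * is the per-packet cost that showed up as a regression in Ciara's
     * phy-phy test. */
    static bool
    packet_needs_l4_cksum(const struct dp_packet *pkt)
    {
        /* No parsed L3/L4 offsets -> nothing to checksum. */
        if (pkt->l3_ofs == UINT16_MAX || pkt->l4_ofs == UINT16_MAX) {
            return false;
        }
        /* DPDK vhost sets PKT_TX_L4_MASK bits when the guest left the
         * L4 cksum to the host. */
        return (pkt->mbuf.ol_flags & PKT_TX_L4_MASK) != 0;
    }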
(In a previous email Ciara ran a test on my first patch and saw about a 4%
regression in a pure packet-forwarding test.)

About offloading to the physical NIC: I did some testing on it and it did not show
a significant improvement, but it disables DPDK TX vectorization (which may not be
good for small packets). I prefer to implement the software cksum first and look
at hardware offloading later.

The VM I use for testing is CentOS 7, kernel version 3.10.0-514.16.1.el7.x86_64.

Supporting the cksum has an additional benefit: vhost-net can enable NETIF_F_SG
(the scatter-gather feature).

2017-08-24 17:07 GMT+08:00 O Mahony, Billy:

> Hi Gao,
>
> Thanks for working on this. Lack of checksum offload is a big difference
> between ovs and ovs-dpdk when using the linux stack in the guest.
>
> The thing that struck me was that rather than immediately calculating the
> L4 checksum in the host on vhost rx, the calculation should be delayed
> until it's known to be absolutely required to be done on the host. If the
> packet is for another VM a checksum is not required, as the bits are not
> going over a physical medium. And if the packet is destined for a NIC then
> the checksum can be offloaded if the NIC supports it.
>
> I'm not sure why doing the L4 sum in the guest should give a performance
> gain. The processing still has to be done. Maybe the guest code was
> compiled for an older architecture and is not using as efficient a set of
> instructions?
>
> In any case the best advantage of having dpdk virtio device support
> offload is if it can further offload to a NIC or avoid the cksum entirely
> if the packet is destined for a local VM.
>
> Thanks,
> Billy.
>
> > -----Original Message-----
> > From: ovs-dev-bounces@openvswitch.org [mailto:ovs-dev-
> > bounces@openvswitch.org] On Behalf Of Gao Zhenyu
> > Sent: Wednesday, August 23, 2017 4:12 PM
> > To: Loftus, Ciara
> > Cc: dev@openvswitch.org; users@dpdk.org
> > Subject: Re: [ovs-dev] [PATCH v1] netdev-dpdk: Implement TCP/UDP TX
> > cksum in ovs-dpdk side
> >
> > Yes, maintaining only one implementation is reasonable.
> > However, making ovs-dpdk support vhost tx-cksum first is doable as well.
> > We can have it in ovs, and replace it with the new DPDK API once ovs
> > updates its dpdk version to one that contains the tx-cksum implementation.
> >
> > Thanks
> > Zhenyu Gao
> >
> > 2017-08-23 21:59 GMT+08:00 Loftus, Ciara:
> > >
> > > > Hi Ciara
> > > >
> > > > You had a general concern below; can we conclude on that before
> > > > going further?
> > > >
> > > > Thanks Darrell
> > > >
> > > > "
> > > > > On another note I have a general concern. I understand similar
> > > > > functionality is present in the DPDK vhost sample app. I wonder if
> > > > > it would be feasible for this to be implemented in the DPDK vhost
> > > > > library and leveraged here, rather than having two implementations
> > > > > in two separate code bases.
> > >
> > > This is something I'd like to see, although I wouldn't block on this
> > > patch waiting for it.
> > > Maybe we can have the initial implementation as it is (if it proves
> > > beneficial), then move to a common DPDK API if/when it becomes
> > > available.
> > >
> > > I've cc'ed the DPDK users list hoping for some input.
> > > To summarise:
> > > From my understanding, the DPDK vhost sample application calculates TX
> > > checksums for packets received from vhost ports with invalid/0 checksums:
> > > http://dpdk.org/browse/dpdk/tree/examples/vhost/main.c#n910
> > > The patch being discussed in this thread (also here:
> > > https://patchwork.ozlabs.org/patch/802070/) seems to do something very
> > > similar.
> > > Wondering about the feasibility of putting this functionality in an
> > > rte_vhost library call so that we don't have two separate
> > > implementations?
> > >
> > > Thanks,
> > > Ciara
> > >
> > > > > I have some other comments inline.
> > > > >
> > > > > Thanks,
> > > > > Ciara
> > > > "
> > > >
> > > > From: Gao Zhenyu
> > > > Date: Wednesday, August 16, 2017 at 6:38 AM
> > > > To: "Loftus, Ciara"
> > > > Cc: "blp@ovn.org", "Chandran, Sugesh", "ktraynor@redhat.com",
> > > > Darrell Ball, "dev@openvswitch.org"
> > > > Subject: Re: [ovs-dev] [PATCH v1] netdev-dpdk: Implement TCP/UDP TX
> > > > cksum in ovs-dpdk side
> > > >
> > > > Hi Loftus,
> > > > I have submitted a new version, please see
> > > > https://patchwork.ozlabs.org/patch/802070/
> > > > It moves the cksum to the vhost receive side.
> > > > Thanks
> > > > Zhenyu Gao
> > > >
> > > > 2017-08-10 12:35 GMT+08:00 Gao Zhenyu:
> > > > I see; flows in the phy-phy setup should not have their cksums
> > > > calculated. I will revise my patch to do the cksum for vhost ports
> > > > only, and will send a new patch next week.
> > > >
> > > > Thanks
> > > > Zhenyu Gao
> > > >
> > > > 2017-08-08 17:53 GMT+08:00 Loftus, Ciara:
> > > > >
> > > > > Hi Loftus,
> > > > >
> > > > > Thanks for testing and the comments!
> > > > > Can you show more details about your phy-vm-phy and phy-phy setups
> > > > > and testing steps? Then I can reproduce them to see if I can solve
> > > > > this pps problem.
> > > >
> > > > You're welcome. I forgot to mention my tests were with 64B packets.
> > > >
> > > > For phy-phy the setup is a single host with 2 dpdk physical ports
> > > > and 1 flow rule port1 -> port2.
> > > > See figure 3 here:
> > > > https://tools.ietf.org/html/draft-ietf-bmwg-vswitch-opnfv-04#section-4
> > > >
> > > > For the phy-vm-phy the setup is a single host with 2 dpdk physical
> > > > ports and 2 vhostuser ports with flow rules:
> > > > dpdk1 -> vhost1 & vhost2 -> dpdk2
> > > > IP rules are set up in the VM to route packets from vhost1 to vhost2.
> > > > See figure 4 in the link above.
> > > >
> > > > > BTW, how about throughput, did you see an improvement?
> > > >
> > > > By throughput, if you mean 0% packet loss, I did not test this.
> > > >
> > > > Thanks,
> > > > Ciara
> > > >
> > > > > I would like to implement the vhost->vhost part.
> > > > >
> > > > > Thanks
> > > > > Zhenyu Gao
> > > > >
> > > > > 2017-08-04 22:52 GMT+08:00 Loftus, Ciara:
> > > > > >
> > > > > > Currently, the dpdk-vhost side in ovs doesn't support tcp/udp tx
> > > > > > cksum. So the cksum for L4 packets is calculated on the VM side,
> > > > > > but the performance is not good.
> > > > > > Implementing tcp/udp tx cksum on the ovs-dpdk side improves
> > > > > > throughput and makes the virtio-net frontend driver support
> > > > > > NETIF_F_SG as well.
> > > > > >
> > > > > > Signed-off-by: Zhenyu Gao
> > > > > > ---
> > > > > >
> > > > > > Here are some performance numbers:
> > > > > >
> > > > > > Setup:
> > > > > >
> > > > > >  qperf client
> > > > > > +---------+
> > > > > > |   VM    |
> > > > > > +---------+
> > > > > >      |
> > > > > >      |                            qperf server
> > > > > > +--------------+               +------------+
> > > > > > | vswitch+dpdk |               | bare-metal |
> > > > > > +--------------+               +------------+
> > > > > >        |                             |
> > > > > >        |                             |
> > > > > >       pNic---------PhysicalSwitch----
> > > > > >
> > > > > > do cksum in ovs-dpdk: Applied this patch and executed
> > > > > > 'ethtool -K eth0 tx on' in the VM.
> > > > > > This offloads the cksum job to the ovs-dpdk side.
> > > > > >
> > > > > > do cksum in VM: Applied this patch and executed
> > > > > > 'ethtool -K eth0 tx off' in the VM.
> > > > > > The VM calculates the cksum for tcp/udp packets.
> > > > > >
> > > > > > We can see a huge improvement in TCP throughput if we leverage
> > > > > > the ovs-dpdk cksum.
> > > > >
> > > > > Hi Zhenyu,
> > > > >
> > > > > Thanks for the patch. I tested some alternative use cases and
> > > > > unfortunately I see a degradation for phy-phy and phy-vm-phy
> > > > > topologies.
> > > > > Here are my results:
> > > > >
> > > > > phy-vm-phy:
> > > > > without patch: 0.871Mpps
> > > > > with patch (offload=on): 0.877Mpps
> > > > > with patch (offload=off): 0.891Mpps
> > > > >
> > > > > phy-phy:
> > > > > without patch: 13.581Mpps
> > > > > with patch: 13.055Mpps
> > > > >
> > > > > The half a million pps drop for the second test case is concerning
> > > > > to me, but not surprising since we're adding extra complexity to
> > > > > netdev_dpdk_send().
> > > > > Could this be avoided? Would it make sense to put this
> > > > > functionality somewhere else, e.g. vhost receive?
> > > > >
> > > > > On another note I have a general concern. I understand similar
> > > > > functionality is present in the DPDK vhost sample app. I wonder if
> > > > > it would be feasible for this to be implemented in the DPDK vhost
> > > > > library and leveraged here, rather than having two implementations
> > > > > in two separate code bases.
> > > > >
> > > > > I have some other comments inline.
> > > > >
> > > > > Thanks,
> > > > > Ciara
> > > > >
> > > > > > [root@localhost ~]# qperf -t 10 -oo msg_size:1:64K:*2 host-qperf-server01 tcp_bw tcp_lat udp_bw udp_lat
> > > > > >
> > > > > >           do cksum in ovs-dpdk   do cksum in VM      without this patch
> > > > > > tcp_bw:   bw = 2.05 MB/sec       bw = 1.92 MB/sec    bw = 1.95 MB/sec
> > > > > > tcp_bw:   bw = 3.9 MB/sec        bw = 3.99 MB/sec    bw = 3.98 MB/sec
> > > > > > tcp_bw:   bw = 8.09 MB/sec       bw = 7.82 MB/sec    bw = 8.19 MB/sec
> > > > > > tcp_bw:   bw = 14.9 MB/sec       bw = 14.8 MB/sec    bw = 15.7 MB/sec
> > > > > > tcp_bw:   bw = 27.7 MB/sec       bw = 28 MB/sec      bw = 29.7 MB/sec
> > > > > > tcp_bw:   bw = 51.2 MB/sec       bw = 50.9 MB/sec    bw = 54.9 MB/sec
> > > > > > tcp_bw:   bw = 86.7 MB/sec       bw = 86.8 MB/sec    bw = 95.1 MB/sec
> > > > > > tcp_bw:   bw = 149 MB/sec        bw = 160 MB/sec     bw = 149 MB/sec
> > > > > > tcp_bw:   bw = 211 MB/sec        bw = 205 MB/sec     bw = 216 MB/sec
> > > > > > tcp_bw:   bw = 271 MB/sec        bw = 254 MB/sec     bw = 275 MB/sec
> > > > > > tcp_bw:   bw = 326 MB/sec        bw = 303 MB/sec     bw = 321 MB/sec
> > > > > > tcp_bw:   bw = 407 MB/sec        bw = 359 MB/sec     bw = 361 MB/sec
> > > > > > tcp_bw:   bw = 816 MB/sec        bw = 512 MB/sec     bw = 419 MB/sec
> > > > > > tcp_bw:   bw = 840 MB/sec        bw = 756 MB/sec     bw = 457 MB/sec
> > > > > > tcp_bw:   bw = 1.07 GB/sec       bw = 880 MB/sec     bw = 480 MB/sec
> > > > > > tcp_bw:   bw = 1.17 GB/sec       bw = 1.01 GB/sec    bw = 488 MB/sec
> > > > > > tcp_bw:   bw = 1.17 GB/sec       bw = 1.11 GB/sec    bw = 483 MB/sec
> > > > > > tcp_lat:  latency = 29 us        latency = 29.2 us   latency = 29.6 us
> > > > > > tcp_lat:  latency = 28.9 us      latency = 29.3 us   latency = 29.5 us
> > > > > > tcp_lat:  latency = 29 us        latency = 29.3 us   latency = 29.6 us
> > > > > > tcp_lat:  latency = 29 us        latency = 29.4 us   latency = 29.5 us
> > > > > > tcp_lat:  latency = 29 us        latency = 29.2 us   latency = 29.6 us
> > > > > > tcp_lat:  latency = 29.1 us      latency = 29.3 us   latency = 29.7 us
> > > > > > tcp_lat:  latency = 29.4 us      latency = 29.6 us   latency = 30 us
> > > > > > tcp_lat:  latency = 29.8 us      latency = 30.1 us   latency = 30.2 us
> > > > > > tcp_lat:  latency = 30.9 us      latency = 30.9 us   latency = 31 us
> > > > > > tcp_lat:  latency = 46.9 us      latency = 46.2 us   latency = 32.2 us
> > > > > > tcp_lat:  latency = 51.5 us      latency = 52.6 us   latency = 34.5 us
> > > > > > tcp_lat:  latency = 43.9 us      latency = 43.8 us   latency = 43.6 us
> > > > > > tcp_lat:  latency = 47.6 us      latency = 48 us     latency = 48.1 us
> > > > > > tcp_lat:  latency = 77.7 us      latency = 78.8 us   latency = 78.8 us
> > > > > > tcp_lat:  latency = 82.8 us      latency = 82.3 us   latency = 116 us
> > > > > > tcp_lat:  latency = 94.8 us      latency = 94.2 us   latency = 134 us
> > > > > > tcp_lat:  latency = 167 us       latency = 197 us       latency = 172 us
> > > > > > udp_bw:   send_bw = 418 KB/sec   send_bw = 413 KB/sec   send_bw = 403 KB/sec
> > > > > >           recv_bw = 410 KB/sec   recv_bw = 412 KB/sec   recv_bw = 400 KB/sec
> > > > > > udp_bw:   send_bw = 831 KB/sec   send_bw = 825 KB/sec   send_bw = 810 KB/sec
> > > > > >           recv_bw = 828 KB/sec   recv_bw = 816 KB/sec   recv_bw = 807 KB/sec
> > > > > > udp_bw:   send_bw = 1.67 MB/sec  send_bw = 1.65 MB/sec  send_bw = 1.63 MB/sec
> > > > > >           recv_bw = 1.64 MB/sec  recv_bw = 1.62 MB/sec  recv_bw = 1.63 MB/sec
> > > > > > udp_bw:   send_bw = 3.36 MB/sec  send_bw = 3.29 MB/sec  send_bw = 3.26 MB/sec
> > > > > >           recv_bw = 3.29 MB/sec  recv_bw = 3.25 MB/sec  recv_bw = 2.82 MB/sec
> > > > > > udp_bw:   send_bw = 6.72 MB/sec  send_bw = 6.61 MB/sec  send_bw = 6.45 MB/sec
> > > > > >           recv_bw = 6.54 MB/sec  recv_bw = 6.59 MB/sec  recv_bw = 6.45 MB/sec
> > > > > > udp_bw:   send_bw = 13.4 MB/sec  send_bw = 13.2 MB/sec  send_bw = 13 MB/sec
> > > > > >           recv_bw = 13.1 MB/sec  recv_bw = 13.1 MB/sec  recv_bw = 13 MB/sec
> > > > > > udp_bw:   send_bw = 26.8 MB/sec  send_bw = 26.4 MB/sec  send_bw = 25.9 MB/sec
> > > > > >           recv_bw = 26.4 MB/sec  recv_bw = 26.2 MB/sec  recv_bw = 25.7 MB/sec
> > > > > > udp_bw:   send_bw = 53.4 MB/sec  send_bw = 52.5 MB/sec  send_bw = 52 MB/sec
> > > > > >           recv_bw = 48.4 MB/sec  recv_bw = 51.8 MB/sec  recv_bw = 51.2 MB/sec
> > > > > > udp_bw:   send_bw = 106 MB/sec   send_bw = 104 MB/sec   send_bw = 103 MB/sec
> > > > > >           recv_bw = 98.9 MB/sec  recv_bw = 93.2 MB/sec  recv_bw = 100 MB/sec
> > > > > > udp_bw:   send_bw = 213 MB/sec   send_bw = 206 MB/sec   send_bw = 205 MB/sec
> > > > > >           recv_bw = 197 MB/sec   recv_bw = 196 MB/sec   recv_bw = 202 MB/sec
> > > > > > udp_bw:   send_bw = 417 MB/sec   send_bw = 405 MB/sec   send_bw = 401 MB/sec
> > > > > >           recv_bw = 400 MB/sec   recv_bw = 333 MB/sec   recv_bw = 358 MB/sec
> > > > > > udp_bw:   send_bw = 556 MB/sec   send_bw = 552 MB/sec   send_bw = 557 MB/sec
> > > > > >           recv_bw = 361 MB/sec   recv_bw = 365 MB/sec   recv_bw = 362 MB/sec
> > > > > > udp_bw:   send_bw = 865 MB/sec   send_bw = 866 MB/sec   send_bw = 863 MB/sec
> > > > > >           recv_bw = 564 MB/sec   recv_bw = 573 MB/sec   recv_bw = 584 MB/sec
> > > > > > udp_bw:   send_bw = 1.05 GB/sec  send_bw = 1.09 GB/sec  send_bw = 1.08 GB/sec
> > > > > >           recv_bw = 789 MB/sec   recv_bw = 732 MB/sec   recv_bw = 793 MB/sec
> > > > > > udp_bw:   send_bw = 1.18 GB/sec  send_bw = 1.23 GB/sec  send_bw = 1.19 GB/sec
> > > > > >           recv_bw = 658 MB/sec   recv_bw = 788 MB/sec   recv_bw = 673 MB/sec
> > > > > > udp_bw:   send_bw = 1.3 GB/sec   send_bw = 1.3 GB/sec   send_bw = 1.3 GB/sec
> > > > > >           recv_bw = 659 MB/sec   recv_bw = 763 MB/sec   recv_bw = 762 MB/sec
> > > > > > udp_bw:   send_bw = 0 bytes/sec  send_bw = 0 bytes/sec  send_bw = 0 bytes/sec
> > > > > >           recv_bw = 0 bytes/sec  recv_bw = 0 bytes/sec  recv_bw = 0 bytes/sec
> > > > > > udp_lat:  latency = 26.7 us      latency = 26.5 us      latency = 26.4 us
> > > > > > udp_lat:  latency = 26.7 us      latency = 26.5 us      latency = 26.3 us
> > > > > > udp_lat:  latency = 26.7 us      latency = 26.7 us      latency = 26.3 us
> > > > > > udp_lat:  latency = 26.7 us      latency = 26.6 us      latency = 26.3 us
> > > > > > udp_lat:  latency = 26.7 us      latency = 26.7 us      latency = 26.7 us
> > > > > > udp_lat:  latency = 27 us        latency = 26.7 us      latency = 26.6 us
> > > > > > udp_lat:  latency = 27 us        latency = 26.9 us      latency = 26.7 us
> > > > > > udp_lat:  latency = 27.6 us      latency = 27.4 us      latency = 27.3 us
> > > > > > udp_lat:  latency = 28.1 us      latency = 28 us        latency = 28 us
> > > > > > udp_lat:  latency = 29.4 us      latency = 29.2 us      latency = 29.2 us
> > > > > > udp_lat:  latency = 31 us        latency = 31 us        latency = 30.8 us
> > > > > > udp_lat:  latency = 41.4 us      latency = 41.4 us      latency = 41.3 us
> > > > > > udp_lat:  latency = 41.6 us      latency = 41.5 us      latency = 41.5 us
> > > > > > udp_lat:  latency = 64.9 us      latency = 65 us        latency = 65 us
> > > > > > udp_lat:  latency = 72.3 us      latency = 72 us        latency = 72 us
> > > > > > udp_lat:  latency = 121 us       latency = 122 us       latency = 122 us
> > > > > > udp_lat:  latency = 0 ns         latency = 0 ns         latency = 0 ns
> > > > > >
> > > > > >  lib/netdev-dpdk.c | 84 +++++++++++++++++++++++++++++++++++++++++++++++++++++--
> > > > > >  1 file changed, 82 insertions(+), 2 deletions(-)
> > > > > >
> > > > > > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> > > > > > index ea17b97..d27d615 100644
> > > > > > --- a/lib/netdev-dpdk.c
> > > > > > +++ b/lib/netdev-dpdk.c
> > > > > > @@ -28,6 +28,7 @@
> > > > > >  #include
> > > > > >  #include
> > > > > >  #include
> > > > > > +#include
> > > > > >  #include
> > > > > >  #include
> > > > > >  #include
> > > > > > @@ -1392,6 +1393,84 @@ netdev_dpdk_rxq_dealloc(struct netdev_rxq *rxq)
> > > > > >      rte_free(rx);
> > > > > >  }
> > > > > >
> > > > > > +static inline void
> > > > > > +netdev_refill_l4_cksum(const char *data, struct dp_packet *pkt,
> > > > > > +                       uint8_t l4_proto, bool is_ipv4)
> > > > > > +{
> > > > > > +    void *l3hdr = (void *)(data + pkt->l3_ofs);
> > > > > > +
> > > > > > +    if (l4_proto == IPPROTO_TCP) {
> > > > > > +        struct tcp_header *tcp_hdr = (struct tcp_header *)(data + pkt->l4_ofs);
> > > > > > +
> > > > > > +        pkt->mbuf.l2_len = pkt->l3_ofs;
> > > > > > +        pkt->mbuf.l3_len = pkt->l4_ofs - pkt->l3_ofs;
> > > > > > +        tcp_hdr->tcp_csum = 0;
> > > > > > +        if (is_ipv4) {
> > > > > > +            tcp_hdr->tcp_csum = rte_ipv4_udptcp_cksum(l3hdr, tcp_hdr);
> > > > > > +            pkt->mbuf.ol_flags ^= PKT_TX_TCP_CKSUM | PKT_TX_IPV4;
> > > > > > +        } else {
> > > > > > +            pkt->mbuf.ol_flags ^= PKT_TX_TCP_CKSUM | PKT_TX_IPV6;
> > > > > > +            tcp_hdr->tcp_csum = rte_ipv6_udptcp_cksum(l3hdr, tcp_hdr);
> > > > > > +        }
> > > > > > +    } else if (l4_proto == IPPROTO_UDP) {
> > > > > > +        struct udp_header *udp_hdr = (struct udp_header *)(data + pkt->l4_ofs);
> > > > > > +
> > > > > > +        /* do not recalculate udp cksum if it was 0 */
> > > > > > +        if (udp_hdr->udp_csum != 0) {
> > > > > > +            pkt->mbuf.l2_len = pkt->l3_ofs;
> > > > > > +            pkt->mbuf.l3_len = pkt->l4_ofs - pkt->l3_ofs;
> > > > > > +            udp_hdr->udp_csum = 0;
> > > > > > +            if (is_ipv4) {
> > > > > > +                /*do not calculate udp cksum if it was a fragment IP*/
> > > > > > +                if (IP_IS_FRAGMENT(((struct ipv4_hdr *)l3hdr)->fragment_offset)) {
> > > > > > +                    return;
> > > > > > +                }
> > > > > > +
> > > > > > +                pkt->mbuf.ol_flags ^= PKT_TX_UDP_CKSUM | PKT_TX_IPV4;
> > > > > > +                udp_hdr->udp_csum = rte_ipv4_udptcp_cksum(l3hdr, udp_hdr);
> > > > > > +            } else {
> > > > > > +                pkt->mbuf.ol_flags ^= PKT_TX_UDP_CKSUM | PKT_TX_IPV6;
> > > > > > +                udp_hdr->udp_csum = rte_ipv6_udptcp_cksum(l3hdr, udp_hdr);
> > > > > > +            }
> > > > > > +        }
> > > > > > +    }
> > > > > > +}
> > > > > > +
> > > > > > +static inline void
> > > > > > +netdev_prepare_tx_csum(struct dp_packet **pkts, int pkt_cnt)
> > > > > > +{
> > > > > > +    int i;
> > > > > > +
> > > > > > +    for (i = 0; i < pkt_cnt; i++) {
> > > > > > +        ovs_be16 dl_type;
> > > > > > +        struct dp_packet *pkt = (struct dp_packet *)pkts[i];
> > > > > > +        const char *data = dp_packet_data(pkt);
> > > > > > +        void *l3hdr = (char *)(data + pkt->l3_ofs);
> > > > > > +
> > > > > > +        if (pkt->l4_ofs == UINT16_MAX || pkt->l3_ofs == UINT16_MAX) {
> > > > > > +            continue;
> > > > > > +        }
> > > > > > +        /* This take a assumption that it should be a vhost packet if this
> > > > > > +         * packet was allocated by DPDK pool and try sending to pNic. */
> > > > > > +        if (pkt->source == DPBUF_DPDK &&
> > > > > > +            !(pkt->mbuf.ol_flags & PKT_TX_L4_MASK)) {
> > > > > > +            // DPDK vhost-user tags PKT_TX_L4_MASK if a L4 packet need cksum
> > > > > > +            continue;
> > > > > > +        }
> > > > >
> > > > > The comments here could be formatted better. Suggest combining both
> > > > > into one comment before the 'if'.
> > > > > Not sure the term 'pNIC' is widely used. Suggest using 'dpdk port'.
> > > > >
> > > > > > +
> > > > > > +        dl_type = *(ovs_be16 *)(data + pkt->l3_ofs - 2);
> > > > > > +        if (dl_type == htons(ETH_TYPE_IP)) {
> > > > > > +            netdev_refill_l4_cksum(data, pkt,
> > > > > > +                                   ((struct ipv4_hdr *)l3hdr)->next_proto_id,
> > > > > > +                                   true);
> > > > > > +        } else if (dl_type == htons(ETH_TYPE_IPV6)) {
> > > > > > +            netdev_refill_l4_cksum(data, pkt,
> > > > > > +                                   ((struct ipv6_hdr *)l3hdr)->proto,
> > > > > > +                                   false);
> > > > > > +        }
> > > > > > +    }
> > > > > > +}
> > > > > > +
> > > > > >  /* Tries to transmit 'pkts' to txq 'qid' of device 'dev'.  Takes ownership of
> > > > > >   * 'pkts', even in case of failure.
> > > > > > * > > > > > > @@ -1833,6 +1912,8 @@ netdev_dpdk_send__(struct netdev_dpdk > > > > *dev, > > > > > > int qid, > > > > > > return; > > > > > > } > > > > > > > > > > > > + netdev_prepare_tx_csum(batch->packets, batch->count); > > > > > > > > > > Putting this here assumes we only prepare the csum for vhost -> > > > > > dpdk or vhost -> ring cases. What about vhost -> vhost? > > > > > > > > > > > + > > > > > > if (OVS_UNLIKELY(concurrent_txq)) { > > > > > > qid =3D qid % dev->up.n_txq; > > > > > > rte_spinlock_lock(&dev->tx_q[qid].tx_lock); > > > > > > @@ -2741,8 +2822,7 @@ netdev_dpdk_vhost_class_init(void) > > > > > > if (ovsthread_once_start(&once)) { > > > > > > rte_vhost_driver_callback_register(&virtio_net_device_ > ops); > > > > > > rte_vhost_feature_disable(1ULL << VIRTIO_NET_F_HOST_TS= O4 > > > > > > - | 1ULL << > VIRTIO_NET_F_HOST_TSO6 > > > > > > - | 1ULL << VIRTIO_NET_F_CSUM)= ; > > > > > > + | 1ULL << > > > > > > + VIRTIO_NET_F_HOST_TSO6); > > > > > > ovs_thread_create("vhost_thread", start_vhost_loop, > > > > > > NULL); > > > > > > > > > > > > ovsthread_once_done(&once); > > > > > > -- > > > > > > 1.8.3.1 > > > > > > > > > > > > _______________________________________________ > > > > > > dev mailing list > > > > > > dev@openvswitch.org > > > > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > > > > > > > > > > > > _______________________________________________ > > dev mailing list > > dev@openvswitch.org > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev >