From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <adrien.mazarguil@6wind.com>
Received: from mail-wm0-f53.google.com (mail-wm0-f53.google.com [74.125.82.53])
 by dpdk.org (Postfix) with ESMTP id 7E5742C39
 for <dev@dpdk.org>; Wed, 30 Nov 2016 08:40:12 +0100 (CET)
Received: by mail-wm0-f53.google.com with SMTP id u144so52059504wmu.1
 for <dev@dpdk.org>; Tue, 29 Nov 2016 23:40:12 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=6wind-com.20150623.gappssmtp.com; s=20150623;
 h=date:from:to:cc:subject:message-id:references:mime-version
 :content-disposition:in-reply-to;
 bh=tRaHLVFQ4iBEwQca0tWZAIIBXzQk4l0SqH1JESJRbwc=;
 b=VzDAD4cxuERFwU2PxwfXv1pii54h7FwMd0hcKFsflZuS6uAUOEvDaVOL+Vdo/gcGgB
 clJxuN0s7DPjv6kwvkLDjceOtJB/St9Z8XC+RWna/ERxh4IX0ODA9Ren5zX1aIl2bVvw
 fA6hPtDYfb/LMIg3UE2L2j/YGGikZjp9dDymNqxLJJkmtlyhys1OLfwSdNA9Ldlqq+9c
 fmgwufflxbNysz1dp0GbgxPwCyGnax62zzCtVlNCFvK/dMcUqNqetiVvAjGKGmUgUlFg
 eTVWUiyaboV2I3UgKwq2Tk0CFBBpiEYHX4GA01RFcla9TSCN/5M2OTpNdYJiY9jrlkT9
 VtQA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:date:from:to:cc:subject:message-id:references
 :mime-version:content-disposition:in-reply-to;
 bh=tRaHLVFQ4iBEwQca0tWZAIIBXzQk4l0SqH1JESJRbwc=;
 b=igMYeuz73DHCoCePO488D+ADh2a7F//qujv/2LUinycmz+YLZbjmVANGtG0dvTneAO
 rMjMTO0pBa1IkZcvaEC6R62ITJ4bKxAwymXs5vKyE3uf8Agd7P+z6q289GReg3Vh2xx6
 9Fd4HF1FRWZAEwV7enF3SMsRXrUnl0l/RihRms5fVEG6kBDvMciBU5Rf5iyqIJFAOcbh
 Ibqy+nkeoyVYeODah+lYDQ2DrVdA/bE0ADUtCsQH3AIkxUX41oxjKYmD279E8rXEZeWi
 +rszkiR8NR4ZqWPT3gUWabS4O3y/gLIQ/1aoyGPtIEfWmXX1nGqyTrgJRRRChdc8Vdsc
 xTTA==
X-Gm-Message-State: AKaTC02UcYQofMpwWeB8B8mmm6adDCeH5twBySbFIGc4jneOt9vSbDbMQEzgsDs8fgnoa3OO
X-Received: by 10.28.87.84 with SMTP id l81mr29265844wmb.48.1480491612174;
 Tue, 29 Nov 2016 23:40:12 -0800 (PST)
Received: from 6wind.com (guy78-3-82-239-227-177.fbx.proxad.net.
 [82.239.227.177])
 by smtp.gmail.com with ESMTPSA id vr9sm71523207wjc.35.2016.11.29.23.40.10
 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
 Tue, 29 Nov 2016 23:40:11 -0800 (PST)
Date: Wed, 30 Nov 2016 08:40:03 +0100
From: Adrien Mazarguil <adrien.mazarguil@6wind.com>
To: Thomas Monjalon <thomas.monjalon@6wind.com>
Cc: dev@dpdk.org, Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>,
 Stephen Hurd <stephen.hurd@broadcom.com>,
 Jan Medala <jan@semihalf.com>, Jakub Palider <jpa@semihalf.com>,
 John Daley <johndale@cisco.com>,
 Alejandro Lucero <alejandro.lucero@netronome.com>,
 Harish Patil <harish.patil@qlogic.com>,
 Rasesh Mody <rasesh.mody@qlogic.com>,
 Jerin Jacob <jerin.jacob@caviumnetworks.com>,
 Yuanhan Liu <yuanhan.liu@linux.intel.com>, Yong Wang <yongwang@vmware.com>,
 Tomasz Kulasek <tomaszx.kulasek@intel.com>,
 konstantin.ananyev@intel.com, olivier.matz@6wind.com
Message-ID: <20161130074003.GD10340@6wind.com>
References: <1477486575-25148-1-git-send-email-tomaszx.kulasek@intel.com>
 <1479922585-8640-1-git-send-email-tomaszx.kulasek@intel.com>
 <8317180.L80Qf11uiu@xps13>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <8317180.L80Qf11uiu@xps13>
Subject: Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: patches and discussions about DPDK <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Wed, 30 Nov 2016 07:40:12 -0000

On Mon, Nov 28, 2016 at 12:03:06PM +0100, Thomas Monjalon wrote:
> We need attention of every PMD developers on this thread.

I've been following this thread from the beginning while working on rte_flow
and wanted to see where it was headed before replying. (I know, v11 was
submitted about 1 month ago but still.)

> Reminder of what Konstantin suggested:
> "
> - if the PMD supports TX offloads AND
> - if to be able use any of these offloads the upper layer SW would have to:
>     * modify the contents of the packet OR
>     * obey HW specific restrictions
> then it is a PMD developer responsibility to provide tx_prep() that would implement
> expected modifications of the packet contents and restriction checks.
> Otherwise, tx_prep() implementation is not required and can be safely set to NULL.      
> "
> 
> I copy/paste also my previous conclusion:
> 
> Before txprep, there is only one API: the application must prepare the
> packets checksum itself (get_psd_sum in testpmd).
> With txprep, the application have 2 choices: keep doing the job itself
> or call txprep which calls a PMD-specific function.

Something is definitely needed here, and only PMDs can provide it. I think
applications should not have to clear checksum fields or initialize them to
some magic value, same goes for any other offload or hardware limitation
that needs to be worked around.

tx_prep() is one possible answer to this issue, however as mentioned in the
original patch it can be very expensive if exposed by the PMD.

Another issue I'm more concerned about is the way limitations are managed
(struct rte_eth_desc_lim). While not officially tied to tx_prep(), this
structure contains new fields that are only relevant to a few devices, and I
fear it will keep growing with each new hardware quirk to manage, breaking
ABIs in the process.

What are applications supposed to do, check each of them regardless before
attempting to send a burst?

I understand tx_prep() automates this process, however I'm wondering why
isn't the TX burst function doing that itself. Using nb_mtu_seg_max as an
example, tx_prep() has an extra check in case of TSO that the TX burst
function does not perform. This ends up being much more expensive to
applications due to the additional loop doing redundant testing on each
mbuf.

If, say as a performance improvement, we decided to leave the validation
part to the TX burst function; what remains in tx_prep() is basically heavy
"preparation" requiring mbuf changes (i.e. erasing checksums, for now).

Following the same logic, why can't such a thing be made part of the TX
burst function as well (through a direct call to rte_phdr_cksum_fix()
whenever necessary). From an application standpoint, what are the advantages
of having to:

 if (tx_prep()) // iterate and update mbufs as needed
     tx_burst(); // iterate and send

Compared to:

 tx_burst(); // iterate, update as needed and send

Note that PMDs could still provide different TX callbacks depending on the
set of enabled offloads so performance is not unnecessarily impacted.

In my opinion the second approach is both faster to applications and more
friendly from a usability perspective, am I missing something obvious?

> The question is: does non-Intel drivers need a checksum preparation for TSO?
> Will it behave well if txprep does nothing in these drivers?
> 
> When looking at the code, most of drivers handle the TSO flags.
> But it is hard to know whether they rely on the pseudo checksum or not.
> 
> git grep -l 'PKT_TX_UDP_CKSUM\|PKT_TX_TCP_CKSUM\|PKT_TX_TCP_SEG' drivers/net/
> 
> drivers/net/bnxt/bnxt_txr.c
> drivers/net/cxgbe/sge.c
> drivers/net/e1000/em_rxtx.c
> drivers/net/e1000/igb_rxtx.c
> drivers/net/ena/ena_ethdev.c
> drivers/net/enic/enic_rxtx.c
> drivers/net/fm10k/fm10k_rxtx.c
> drivers/net/i40e/i40e_rxtx.c
> drivers/net/ixgbe/ixgbe_rxtx.c
> drivers/net/mlx4/mlx4.c
> drivers/net/mlx5/mlx5_rxtx.c
> drivers/net/nfp/nfp_net.c
> drivers/net/qede/qede_rxtx.c
> drivers/net/thunderx/nicvf_rxtx.c
> drivers/net/virtio/virtio_rxtx.c
> drivers/net/vmxnet3/vmxnet3_rxtx.c
> 
> Please, we need a comment for each driver saying
> "it is OK, we do not need any checksum preparation for TSO"
> or
> "yes we have to implement tx_prepare or TSO will not work in this mode"

For both mlx4 and mlx5 then,
"it is OK, we do not need any checksum preparation for TSO".

Actually I do not think we'll ever need tx_prep() unless we add our own
quirks to struct rte_eth_desc_lim (and friends) which are currently quietly
handled by TX burst functions.

-- 
Adrien Mazarguil
6WIND