DPDK patches and discussions
 help / color / mirror / Atom feed
From: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Cc: Thomas Monjalon <thomas.monjalon@6wind.com>,
	"dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for rte_eth_dev structure
Date: Thu, 28 Jul 2016 13:01:16 +0000	[thread overview]
Message-ID: <2601191342CEEE43887BDE71AB97725836B8AB57@irsmsx105.ger.corp.intel.com> (raw)
In-Reply-To: <20160728113853.GA14755@localhost.localdomain>



> -----Original Message-----
> From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> Sent: Thursday, July 28, 2016 12:39 PM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> Cc: Thomas Monjalon <thomas.monjalon@6wind.com>; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for rte_eth_dev structure
> 
> On Thu, Jul 28, 2016 at 10:36:07AM +0000, Ananyev, Konstantin wrote:
> > > If it does not cope up then it can skip tx'ing in the actual tx
> > > burst itself and move the "skipped" tx packets to end of the list in
> > > the tx burst so that application can take the action on "skipped"
> > > packet after the tx burst
> >
> > Sorry, that's too cryptic for me.
> > Can you reword it somehow?
> 
> OK.
> 1) lets say application requests 32 packets to send it using tx_burst.
> 2) packets are from p0 to p31
> 3) in driver due to some reason, it is not able to send the packets due to some constraints in the driver(say expect p2 and p16 everything
> else sent successfully by the driver)
> 4) driver can move p2 and p16 at pkt[0] and pkt[1] on tx_burst and return 30
> 5) application can take action on p2 and p16 based the return value of 30(ie 32-30 = 2 packets needs to handle at pkt[0] and pkt[1]

That would introduce packets reordering and unnecessary complicate the PMD TX functions.
Again it would require changes in *all* existing PMD tx functions.
So we don't plan to do things that way.

> 
> 
> >
> > >
> > >
> > > > Instead it just setups the ol_flags, fills tx_offload fields and calls tx_prep().
> > > > Please read the original Tomasz's patch, I think he explained
> > > > possible use-cases with lot of details.
> > >
> > > Sorry, it is not very clear in terms of use cases.
> >
> > Ok, what I meant to say:
> > Right now, if user wants to use HW TX cksum/TSO offloads he might have to:
> > - setup ipv4 header cksum field.
> > - calculate the pseudo header checksum
> > - setup tcp/udp cksum field.
> >
> > Rules how these calculations need to be done and which fields need to
> > be updated, may vary depending on HW underneath and requested offloads.
> > tx_prep() - supposed to hide all these nuances from user and allow him
> > to use TX HW offloads in a transparent way.
> 
> Not sure I understand it completely. Bit contradicting with below statement
> |We would document what tx_prep() supposed to do, and in what cases user
> |don't need it.

How that contradicts?
Right now to make HW TX offloads to work user is required to do particular actions:
1. set mbuf.ol_flags properly.
2. setup mbuf.tx_offload fields properly.
3. update L3/L4 header fields in a particular way.

We move #3 into tx_prep(), to hide that complexity from the user simplify things for him.
Though if he still prefers to do #3  by himself - that's ok too.
 
> 
> How about introducing a new ethdev generic eal command-line mode OR new ethdev_configure hint that PMD driver is in "tx_prep-
> >tx_burst" mode instead of just tx_burst? That way no fast-path performance degradation for the PMD that does not need it

User might want to send different packets over different devices,
or even over different queues over the same device.
Or even he might want to call tx_prep() for one group of packets,
but skip for different group of packets for the same TX queue. 
So I think we should allow user to decide when/where to call it.

> 
> 
> >
> > Another main purpose of tx_prep(): for multi-segment packets is to
> > check that number of segments doesn't exceed  HW limit.
> > Again right now users have to do that on their own.
> >
> > >
> > > In HW perspective, It it tries to avoid the illegal state. But not
> > > sure calling "back to back" tx prepare and then tx burst how does it
> > > improve the situation as the check illegal state check introduce in
> > > actual tx burst it self.
> > >
> > > In SW perspective, its try to avoid sending malformed packets. In my
> > > view the same can achieved with existing tx burst it self as PMD is
> > > the one finally send the packets on the wire.
> >
> > Ok, so your question is: why not to put that functionality into
> > tx_burst() itself, right?
> > For few reasons:
> > 1. putting that functionality into tx_burst() would introduce unnecessary
> >     slowdown for cases when that functionality is not needed
> >     (one segment per packet, no HW offloads).
> 
> These parameters can be configured on init time

No always, see above.

> 
> > 2. User might don't want to use tx_prep() - he/she might have its
> >     own analog, which he/she belives is faster/smarter,etc.
> 
> That's the current mode. Right?

Yes.

> > 3.  Having it a s separate function would allow user control when/where
> >       to call it, let say only for some packets, or probably call tx_prep()
> >       on one core, and do actual tx_burst() for these packets on the other.


> Why to process it under tx_prep() as application can always process the packet in one core

Because not every application wants to do it over the same core.
Some apps would like to do it on the same core, some apps would like to do it on different core.
With proposed API both models are possible.

> 
> > >
> > > proposal quote:
> > >
> > > 1. Introduce rte_eth_tx_prep() function to do necessary preparations of
> > >    packet burst to be safely transmitted on device for desired HW
> > >    offloads (set/reset checksum field according to the hardware
> > >    requirements) and check HW constraints (number of segments per
> > >    packet, etc).
> > >
> > >    While the limitations and requirements may differ for devices, it
> > >    requires to extend rte_eth_dev structure with new function pointer
> > >    "tx_pkt_prep" which can be implemented in the driver to prepare and
> > >    verify packets, in devices specific way, before burst, what should to
> > >    prevent application to send malformed packets.
> > >
> > >
> > > >
> > > > > and what if the PMD does not implement that callback then it is of waste cycles. Right?
> > > >
> > > > If you refer as lost cycles here something like:
> > > > RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_prep, -ENOTSUP); then
> > > > yes.
> > > > Though comparing to actual work need to be done for most HW TX
> > > > offloads, I think it is neglectable.
> > >
> > > Not sure.
> > >
> > > > Again, as I said before, it is totally voluntary for the application.
> > >
> > > Not according to proposal. It can't be too as application has no
> > > idea what PMD driver does with "prep" what is the implication on a
> > > HW if application does not
> >
> > Why application writer wouldn't have an idea?
> > We would document what tx_prep() supposed to do, and in what cases user don't need it.
> 
> But how he/she detect that on that run-time ?

By the application logic for example.
If let say is doing the l2fwd for that group of packets, it would know
that it doesn't need to do tx_prep().

To be honest, I don't understand what is your concern here.
That proposed change doesn't break any existing functionality,
doesn't introduce any new requirements to the existing API, 
and wouldn't introduce any performance regression for existing apps.
It is a an extension, and user is free not to use it, if it doesn't fit his needs.
>From other side there are users who are interested in that functionality,
and they do have use-cases for  it.
So what worries you?
Konstantin

> 
> > Then it would be up to the user:
> > - not to use it at all (one segment per packet, no HW TX offloads)
> 
> We already have TX flags for that
> 
> > - not to use tx_prep(), and make necessary preparations himself,
> >   that what people have to do now.
> > - use tx_prep()
> >
> > Konstantin
> >

  parent reply	other threads:[~2016-07-28 13:01 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-07-20 14:24 [dpdk-dev] [PATCH] " Tomasz Kulasek
2016-07-20 15:01 ` Thomas Monjalon
2016-07-20 15:13   ` Ananyev, Konstantin
2016-07-20 15:22     ` Thomas Monjalon
2016-07-20 15:42       ` Kulasek, TomaszX
2016-07-21 15:24 ` [dpdk-dev] [PATCH v2] " Tomasz Kulasek
2016-07-21 22:48   ` Ananyev, Konstantin
2016-07-27  8:59     ` Thomas Monjalon
2016-07-27 17:10       ` Jerin Jacob
2016-07-27 17:33         ` Ananyev, Konstantin
2016-07-27 17:41           ` Jerin Jacob
2016-07-27 20:51             ` Ananyev, Konstantin
2016-07-28  2:13               ` Jerin Jacob
2016-07-28 10:36                 ` Ananyev, Konstantin
2016-07-28 11:38                   ` Jerin Jacob
2016-07-28 12:07                     ` Avi Kivity
2016-07-28 13:01                     ` Ananyev, Konstantin [this message]
2016-07-28 13:58                       ` Olivier MATZ
2016-07-28 14:21                         ` Ananyev, Konstantin
2016-07-28 13:59                       ` Jerin Jacob
2016-07-28 14:52                         ` Thomas Monjalon
2016-07-28 16:25                           ` Jerin Jacob
2016-07-28 17:07                             ` Thomas Monjalon
2016-07-31  7:50     ` Vlad Zolotarov
2016-07-28 12:04   ` Avi Kivity
2016-07-31  7:46 ` [dpdk-dev] [PATCH] " Vlad Zolotarov
2016-07-31  8:10   ` Vlad Zolotarov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2601191342CEEE43887BDE71AB97725836B8AB57@irsmsx105.ger.corp.intel.com \
    --to=konstantin.ananyev@intel.com \
    --cc=dev@dpdk.org \
    --cc=jerin.jacob@caviumnetworks.com \
    --cc=thomas.monjalon@6wind.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).