DPDK patches and discussions
 help / color / mirror / Atom feed
From: Vlad Zolotarov <vladz@cloudius-systems.com>
To: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>,
	Avi Kivity <avi@cloudius-systems.com>,
	Thomas Monjalon <thomas.monjalon@6wind.com>,
	"didier.pallard" <didier.pallard@6wind.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598
Date: Sun, 13 Sep 2015 15:24:59 +0300	[thread overview]
Message-ID: <55F56B1B.80606@cloudius-systems.com> (raw)
In-Reply-To: <2601191342CEEE43887BDE71AB97725836A85E36@irsmsx105.ger.corp.intel.com>



On 09/13/15 14:47, Ananyev, Konstantin wrote:
>
>> -----Original Message-----
>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Avi Kivity
>> Sent: Friday, September 11, 2015 6:48 PM
>> To: Thomas Monjalon; Vladislav Zolotarov; didier.pallard
>> Cc: dev@dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598
>>
>> On 09/11/2015 07:08 PM, Thomas Monjalon wrote:
>>> 2015-09-11 18:43, Avi Kivity:
>>>> On 09/11/2015 06:12 PM, Vladislav Zolotarov wrote:
>>>>> On Sep 11, 2015 5:55 PM, "Thomas Monjalon" <thomas.monjalon@6wind.com
>>>>> <mailto:thomas.monjalon@6wind.com>> wrote:
>>>>>> 2015-09-11 17:47, Avi Kivity:
>>>>>>> On 09/11/2015 05:25 PM, didier.pallard wrote:
>>>>>>>> Hi vlad,
>>>>>>>>
>>>>>>>> Documentation states that a packet (or multiple packets in transmit
>>>>>>>> segmentation) can span any number of
>>>>>>>> buffers (and their descriptors) up to a limit of 40 minus WTHRESH
>>>>>>>> minus 2.
>>>>>>>>
>>>>>>>> Shouldn't there be a test in transmit function that drops
>>>>> properly the
>>>>>>>> mbufs with a too large number of
>>>>>>>> segments, while incrementing a statistic; otherwise transmit
>>>>> function
>>>>>>>> may be locked by the faulty packet without
>>>>>>>> notification.
>>>>>>>>
>>>>>>> What we proposed is that the pmd expose to dpdk, and dpdk expose
>>>>> to the
>>>>>>> application, an mbuf check function.  This way applications that can
>>>>>>> generate complex packets can verify that the device will be able to
>>>>>>> process them, and applications that only generate simple mbufs can
>>>>> avoid
>>>>>>> the overhead by not calling the function.
>>>>>> More than a check, it should be exposed as a capability of the port.
>>>>>> Anyway, if the application sends too much segments, the driver must
>>>>>> drop it to avoid hang, and maintain a dedicated statistic counter to
>>>>>> allow easy debugging.
>>>>> I agree with Thomas - this should not be optional. Malformed packets
>>>>> should be dropped. In the icgbe case it's a very simple test - it's a
>>>>> single branch per packet so i doubt that it could impose any
>>>>> measurable performance degradation.
>>>> A drop allows the application no chance to recover.  The driver must
>>>> either provide the ability for the application to know that it cannot
>>>> accept the packet, or it must fix it up itself.
>>> I have the feeling that everybody agrees on the same thing:
>>> the application must be able to make a well formed packet by checking
>>> limitations of the port. What about a field rte_eth_dev_info.max_tx_segs?
>> It is not generic enough.  i40e has a limit that it imposes post-TSO.
>>
>>
>>> In case the application fails in its checks, the driver must drop it and
>>> notify the user via a stat counter.
>>> The driver can also remove the hardware limitation by gathering the segments
>>> but it may be hard to implement and would be a slow operation.
>> I think that to satisfy both the 64b full line rate applications and the
>> more complicated full stack applications, this must be made optional.
>> In particular, and application that only forwards packets will never hit
>> a NIC's limits, so it need not take any action. That's why I think a
>> verification function is ideal; a forwarding application can ignore it,
>> and a complex application can call it, and if it fails the packet, it
>> can linearize it itself, removing complexity from dpdk itself.
> I think that's a good approach to that problem.
> As I remember we discussed something similar a while ago -
> A function (tx_prep() or something) that would check nb_segs and probably some other HW specific restrictions,
> calculate pseudo-header checksum, reset ip header len, etc.
>
>  From other hand we also can add two more fields into rte_eth_dev_info:
> 1) Max num of segs per TSO packet (tx_max_seg ?).
> 2) Max num of segs per single packet/TSO segment (tx_max_mtu_seg ?).
> So for ixgbe both will have value 40 - wthresh,
> while for i40e 1) would be UINT8_MAX and 2) will be 8.
> Then upper layer can use that information to select an optimal size for its TX buffers.

HW limitations differ from HW to HW not only by values but also by their 
nature - for instance for Qlogic bnx2x NICs the limitations may not be 
expressed in the values above so this must be a callback.

>   
> Konstantin
>

  reply	other threads:[~2015-09-13 12:25 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-08-13 18:06 Vlad Zolotarov
2015-08-13 20:28 ` Zhang, Helin
2015-08-14  5:37   ` Vlad Zolotarov
2015-08-19  0:42     ` Lu, Wenzhuo
2015-08-19  4:55       ` Vladislav Zolotarov
2015-08-19  7:43         ` Ananyev, Konstantin
2015-08-19 10:02           ` Vlad Zolotarov
2015-08-20  8:41             ` Ananyev, Konstantin
2015-08-20  8:56               ` Vlad Zolotarov
2015-08-20  9:05                 ` Vlad Zolotarov
2015-08-20  9:06                   ` Vlad Zolotarov
2015-08-25 17:33                     ` Ananyev, Konstantin
2015-08-25 17:39                       ` Avi Kivity
2015-08-19 17:29         ` Zhang, Helin
2015-08-25 18:13           ` Zhang, Helin
2015-08-25 18:33             ` Vladislav Zolotarov
2015-08-25 18:43               ` Zhang, Helin
2015-08-25 18:52                 ` Vlad Zolotarov
2015-08-25 19:16                   ` Zhang, Helin
2015-08-25 19:23                     ` Avi Kivity
2015-08-25 19:30                     ` Vladislav Zolotarov
2015-08-25 20:07                       ` Vlad Zolotarov
2015-08-25 20:13                       ` Zhang, Helin
2015-09-09 12:18                         ` Thomas Monjalon
2015-09-09 13:19                           ` Ananyev, Konstantin
2015-09-11 15:17                             ` Vladislav Zolotarov
2015-09-11 14:25                   ` didier.pallard
2015-09-11 14:47                     ` Avi Kivity
2015-09-11 14:55                       ` Thomas Monjalon
2015-09-11 15:12                         ` Vladislav Zolotarov
2015-09-11 15:43                           ` Avi Kivity
2015-09-11 16:04                             ` Vladislav Zolotarov
2015-09-11 16:07                               ` Richardson, Bruce
2015-09-11 16:14                                 ` Vladislav Zolotarov
2015-09-11 17:44                                 ` Avi Kivity
2015-09-11 16:08                             ` Thomas Monjalon
2015-09-11 16:18                               ` Vladislav Zolotarov
2015-09-11 17:17                                 ` Matthew Hall
2015-09-11 17:42                                   ` Ananyev, Konstantin
2015-09-11 17:58                                     ` Matthew Hall
2015-09-11 17:48                               ` Avi Kivity
2015-09-13 11:47                                 ` Ananyev, Konstantin
2015-09-13 12:24                                   ` Vlad Zolotarov [this message]
2015-09-13 12:32                                   ` Avi Kivity
2015-09-13 15:54                                     ` Ananyev, Konstantin
2015-09-13 16:01                                       ` Avi Kivity
2015-09-11 16:00                           ` Richardson, Bruce
2015-09-11 16:13                             ` Vladislav Zolotarov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=55F56B1B.80606@cloudius-systems.com \
    --to=vladz@cloudius-systems.com \
    --cc=avi@cloudius-systems.com \
    --cc=dev@dpdk.org \
    --cc=didier.pallard@6wind.com \
    --cc=konstantin.ananyev@intel.com \
    --cc=thomas.monjalon@6wind.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).