DPDK patches and discussions
From: "Wiles, Keith" <keith.wiles@intel.com>
To: "Morten Brørup" <mb@smartsharesystems.com>
Cc: "Burakov, Anatoly" <anatoly.burakov@intel.com>,
	Sam <batmanustc@gmail.com>, dev <dev@dpdk.org>
Subject: Re: [dpdk-dev] Where is the padding code in DPDK?
Date: Thu, 15 Nov 2018 13:32:04 +0000	[thread overview]
Message-ID: <6F20B08D-7219-450E-BB75-8C884C40E862@intel.com> (raw)
In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC35B4249D@smartserver.smartshare.dk>



> On Nov 15, 2018, at 4:27 AM, Morten Brørup <mb@smartsharesystems.com> wrote:
> 
>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Wiles, Keith
>>> On Nov 14, 2018, at 4:51 AM, Morten Brørup <mb@smartsharesystems.com>
>>> wrote:
>>> 
>>> Anatoly,
>>> 
>>> This differs from the Linux kernel's behavior, where padding belongs
>>> in the NIC driver layer, not in the protocol layer. If you pass a
>>> runt frame (too short packet) to a Linux NIC driver's transmission
>>> function, the NIC driver (or NIC hardware) will pad the frame to
>>> make it valid. E.g. look at the rhine_start_tx() function in the
>>> kernel:
>>> https://elixir.bootlin.com/linux/v4.9.137/source/drivers/net/ethernet/via/via-rhine.c#L1800
>> 
>> The PMD in DPDK rejects the frame or extends the number of bytes to
>> send. Padding implies zeroing out the extra bytes to meet the NIC's
>> required length. Unless PMDs are concerned with security, they just
>> make sure the number of bytes to be sent is correct for the hardware
>> (60 bytes minimum). Most NICs can do this padding in hardware as the
>> packet is sent.
> 
> Great, so let's extend DPDK to provide that feature!

If we expect the hardware to extend the length to the minimum length for Ethernet, then why would we need to extend DPDK?
> 
>> 
>> If we are talking about virtio with only a virtio software backend,
>> then you can send any size packet, but you need to make sure the
>> stack or code receiving the packet does not throw it away as a runt
>> packet. Most NICs discard runts, so they are never received into
>> memory. In a software-based design like virtio you can use whatever
>> length you want, but I would suggest following the Ethernet standard
>> anyway.
> 
> Good point: If virtio is considered an Ethernet type interface (although it is able to handle really large Jumbo frames), then yes, the minimum packet size requirements should apply to this too. This is probably a question for the virtio folks to decide: Is virtio considered an Ethernet interface, or another type of interface (without Ethernet packet size requirements, like the "localhost" pseudo interface)?
> 
> But how about other non-physical interfaces, are they all considered Ethernet type interfaces? And to take it to the extreme: Should DPDK by design only support Ethernet type interfaces?
> 
DPDK supports a lot of different device types today, and Ethernet is just one of the main interfaces; e.g. cryptodev, compressdev and others are not related to Ethernet. In a pure sense virtio is not an Ethernet device but a virtual device interface. Virtual device interfaces in DPDK often carry packets that were received on Ethernet interfaces, so it makes sense to apply the same thought process to virtio. In some cases virtio has been extended to hardware, and there are virtio NICs in the world, so assuming you can send short frames is not reasonable.

>> 
>> Now some stacks or code (like Pktgen) assume the hardware will append
>> the CRC (4 bytes), which means the application needs to hand the PMD
>> at least 60-byte frames, unless you know the hardware will do the
>> right thing. The challenge is that applications in DPDK do not know
>> the details of the NIC at that level and should always assume the
>> packets being sent and received are valid Ethernet frames. This means
>> at least 60 bytes, as all NICs append the CRC nowadays and not all of
>> them adjust the size of the frame.
>> 
>> If you do not send the PMD a 60-byte frame, then you are expecting
>> the NIC to handle the padding and append the CRC, or at least
>> expecting the PMD to adjust the size, which I know from writing
>> Pktgen for DPDK is not done in all PMDs.
> 
> You said it! And it proves my point about what higher layer developers probably expect of lower layers.

That still does not prove anything; the PMDs are not expected to adjust the packet length.
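
Something like this (untested, just a sketch; the helper name
pad_to_min_eth() is mine, not a DPDK API) is all an application needs
before handing mbufs to a PMD:

#include <string.h>
#include <rte_mbuf.h>
#include <rte_ether.h>

/* Pad an mbuf up to the 60-byte minimum Ethernet payload (64 bytes on
 * the wire once the NIC appends the 4-byte CRC). Returns 0 on success,
 * -1 if the mbuf has no tailroom left for the padding. */
static int
pad_to_min_eth(struct rte_mbuf *m)
{
	uint16_t pkt_len = rte_pktmbuf_pkt_len(m);
	uint16_t min_len = ETHER_MIN_LEN - ETHER_CRC_LEN; /* 64 - 4 = 60 */
	char *pad;

	if (pkt_len >= min_len)
		return 0;

	pad = rte_pktmbuf_append(m, min_len - pkt_len);
	if (pad == NULL)
		return -1;	/* no tailroom: caller must drop or copy */

	memset(pad, 0, min_len - pkt_len); /* zero the pad for security */
	return 0;
}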
> 
>> 
>> If you are expecting DPDK PMDs to behave like Linux drivers, then you
>> need to adjust your thinking and send the PMD at least 60 bytes. If
>> you want to modify all of the PMDs to force the size to 60 bytes, I
>> have no objection to that patch; you just need to get all of the PMD
>> maintainers to agree with it.
> 
> I agree that different thinking is required, and Linux is not always perfect. However, we are allowed to copy good ideas from Linux - and I think that having padding in Ethernet PMDs is a perfectly logical concept. There are quite a few PMD maintainers, and I was hoping to take the discussion about the high-level concept on the open mailing list before involving the PMD maintainers in the implementation.

Then get the maintainers to modify the PMDs or submit a patch to fix them all!
Until all of the PMDs have been modified, this discussion is not very useful, because you must always send ethdev a 60-byte frame.
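
The caller side would then look something like this (again only a
sketch, assuming port 0/queue 0 and the pad_to_min_eth() helper above;
needs <rte_ethdev.h>):

/* Pad every runt in the burst before transmit; free any mbuf that
 * cannot be padded instead of handing the PMD an invalid frame. */
static uint16_t
tx_burst_padded(struct rte_mbuf **pkts, uint16_t nb_pkts)
{
	uint16_t i, nb_ok = 0;

	for (i = 0; i < nb_pkts; i++) {
		if (pad_to_min_eth(pkts[i]) == 0)
			pkts[nb_ok++] = pkts[i];
		else
			rte_pktmbuf_free(pkts[i]);
	}
	return rte_eth_tx_burst(0, 0, pkts, nb_ok);
}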

> 
> I think that a stack or code using DPDK as its lower layer expects DPDK to provide some offloading, and since padding to a 60-byte payload is a very common event in stacks (due to empty TCP ACK packets), this is an obvious offload candidate!

DPDK was created mostly for packet forwarding, but it has expanded a great deal over its 10+ years (not all of them in open source).
> 
> Of course, if DPDK was only designed for packet forwarding applications, and not also intended for use as a lower layer for stacks, then padding to 60 byte payload should not be required. I guess that DPDK was initially designed for packet forwarding applications, but is this still the case today, or should DPDK evolve to also accommodate the needs of stacks?

Making one assumption and extrapolating it into "all PMDs must pad the frame" is not reasonable.
> 
> If padding is not included in the PMDs, consider this (highly theoretical examples, but for the discussion of the concept): The DPDK packet manipulation libraries could be required to do it, e.g. for fragment reassembly of two extremely small packets totaling less than 60 bytes of payload, or for IPsec decapsulation of a very small packet. Otherwise the application would have to do it just before calling the PMD TX functions.
> 

As I stated, get the maintainers to change the PMDs or submit a patch for all PMDs.
> 
>> 
>> On RX, frames of less than 64 bytes (with CRC) are runts, and most
>> NICs today will not receive these frames unless you program the
>> hardware to do so. 'In my day' :-) we had collisions on the wire,
>> which created a huge number of fragments or runts; that is not the
>> case with the point-to-point links we have today.
> 
> I agree that RX of frames of less than 64 bytes (with CRC) - on Ethernet interfaces! - should still be considered runts, and thus should be discarded and counted as errors.
> 
>> 
>>> 
>>> If DPDK does not pad short frames passed to the egress function of
>>> the NIC drivers, it should be noted in the documentation - this is
>>> not the behavior protocol developers expect.
>>> 
>>> Or even better: The NIC hardware (or driver) should ensure padding,
>>> possibly considering it a TX offload feature. Generating packets
>>> shorter than 60 bytes of data is common - just consider the number
>>> of TCP ACK packets, which are typically only 14 + 20 + 20 = 54 bytes
>>> (incl. the 14-byte Ethernet header).
>>> 
>>> 
>>> Med venlig hilsen / kind regards
>>> - Morten Brørup
>>> 
>>>> -----Original Message-----
>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Burakov, Anatoly
>>>> Sent: Wednesday, November 14, 2018 11:18 AM
>>>> To: Sam
>>>> Cc: dev@dpdk.org
>>>> Subject: Re: [dpdk-dev] Where is the padding code in DPDK?
>>>> 
>>>> On 14-Nov-18 5:45 AM, Sam wrote:
>>>>> OK, so in short: DPDK will NOT take care of padding.
>>>>> The NIC takes care of padding when sending and receiving via a NIC.
>>>>> The kernel takes care of it when sending and receiving via a
>>>>> vhost-user port.
>>>>> 
>>>>> Is that right?
>>>> 
>>>> I cannot speak for virtio/vhost user since I am not terribly
>>>> familiar with them. For regular packets, generally speaking,
>>>> packets shorter than 60 bytes are invalid. Whether DPDK does or
>>>> does not care about padding is irrelevant, because *you* are
>>>> attempting to transmit packets that are not valid. You shouldn't
>>>> rely on this behavior.
>>>> 
>>>>> 
>>>>> 
>>>>> Burakov, Anatoly <anatoly.burakov@intel.com
>>>>> <mailto:anatoly.burakov@intel.com>> wrote on Tuesday,
>>>>> November 13, 2018 at 5:29 PM:
>>>>> 
>>>>>   On 13-Nov-18 7:16 AM, Sam wrote:
>>>>>> Hi all,
>>>>>> 
>>>>>> As we know, an Ethernet frame must be at least 64B long.
>>>>>> 
>>>>>> So if I create an rte_mbuf and fill it with just 60B of data,
>>>>>> will rte_eth_tx_burst add padding data to make the frame at
>>>>>> least 64B?
>>>>>> 
>>>>>> If it does, where is the code?
>>>>>> 
>>>>> 
>>>>>   Others can correct me if I'm wrong here, but specifically in the
>>>>>   case of 64-byte packets, these are the shortest valid packets
>>>>>   that you can send, and a 64-byte packet will actually carry only
>>>>>   60 bytes' worth of packet data, because there's a 4-byte CRC
>>>>>   field at the end (see the Ethernet frame format). If you enabled
>>>>>   CRC offload, then your NIC will append the 4 bytes at transmit.
>>>>>   If you haven't, then it's up to each individual driver/NIC to
>>>>>   accept/reject such a packet because it can rightly be considered
>>>>>   malformed.
>>>>> 
>>>>>   In addition, your NIC may add e.g. VLAN tags or other stuff,
>>>>>   again depending on hardware offloads that you have enabled in
>>>>>   your TX configuration, which may push the packet size beyond 64
>>>>>   bytes while having only 60 bytes of actual packet data.
>>>>> 
>>>>>   --
>>>>>   Thanks,
>>>>>   Anatoly
>>>>> 
>>>> 
>>>> 
>>>> --
>>>> Thanks,
>>>> Anatoly
>>> 
>> 
>> Regards,
>> Keith
> 

Regards,
Keith



Thread overview: 13 messages
2018-11-13  7:16 Sam
2018-11-13  7:17 ` Sam
2018-11-13  7:22   ` Sam
2018-11-13  9:29 ` Burakov, Anatoly
2018-11-14  5:45   ` Sam
2018-11-14 10:17     ` Burakov, Anatoly
2018-11-14 10:51       ` Morten Brørup
2018-11-14 16:19         ` Wiles, Keith
2018-11-15  2:07           ` Sam
2018-11-15  2:13             ` Sam
2018-11-15 10:06             ` Burakov, Anatoly
2018-11-15 10:27           ` Morten Brørup
2018-11-15 13:32             ` Wiles, Keith [this message]
