From: Olivier Matz <olivier.matz@6wind.com>
To: Andrew Rybchenko <arybchenko@solarflare.com>
Cc: Ferruh Yigit <ferruh.yigit@intel.com>,
	Shahaf Shuler <shahafs@mellanox.com>,
	"dev@dpdk.org" <dev@dpdk.org>,
	"Ananyev, Konstantin" <konstantin.ananyev@intel.com>,
	Thomas Monjalon <thomas@monjalon.net>,
	Bruce Richardson <bruce.richardson@intel.com>,
	Matan Azrad <matan@mellanox.com>,
	Jerin Jacob Kollanukkaran <jerinj@marvell.com>
Subject: Re: [dpdk-dev] questions about new offload ethdev api
Date: Fri, 27 Dec 2019 14:54:28 +0100
Message-ID: <20191227135428.GP22738@platinum>
In-Reply-To: <b5263021-18f4-7e25-56f9-23b4e5ca3467@solarflare.com>

Hi,

A few comments below.

On Mon, Dec 16, 2019 at 11:39:05AM +0300, Andrew Rybchenko wrote:
> On 12/10/19 9:07 PM, Ferruh Yigit wrote:
> > On 1/23/2018 2:34 PM, Shahaf Shuler wrote:
> >> Tuesday, January 23, 2018 3:53 PM, Olivier Matz:
> > 
> > <...>
> > 
> >>>
> >>> 2/ meaning of rxmode.jumbo_frame, rxmode.enable_scatter, 
> >>> rxmode.max_rx_pkt_len
> >>>
> >>> While it's not related to the new API, it is probably a good opportunity
> >>> to clarify the meaning of these flags. I'm not able to find good
> >>> documentation about them.
> >>>
> >>> Here is my understanding: the configuration only depends on:
> >>> - the maximum Rx frame length
> >>> - the amount of data available in an mbuf (minus headroom)
> >>>
> >>> Flags to set in rxmode (example):
> >>>
> >>> +---------------+----------------+----------------+-----------------+
> >>> |               |mbuf_data_len=1K|mbuf_data_len=2K|mbuf_data_len=16K|
> >>> +---------------+----------------+----------------+-----------------+
> >>> |max_rx_len=1500|enable_scatter  |                |                 |
> >>> +---------------+----------------+----------------+-----------------+
> >>> |max_rx_len=9000|enable_scatter, |enable_scatter, |jumbo_frame      |
> >>> |               |jumbo_frame     |jumbo_frame     |                 |
> >>> +---------------+----------------+----------------+-----------------+

Due to the successive quotes, the table was not readable in my mail client;
here it is again (narrower):

+------------+---------------+---------------+---------------+
|            |mbuf_data_len= |mbuf_data_len= |mbuf_data_len= |
|            |1K             |2K             |16K            |
+------------+---------------+---------------+---------------+
|max_rx_len= |enable_scatter |               |               |
|1500        |               |               |               |
+------------+---------------+---------------+---------------+
|max_rx_len= |enable_scatter,|enable_scatter,|jumbo_frame    |
|9000        |jumbo_frame    |jumbo_frame    |               |
+------------+---------------+---------------+---------------+
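
As a minimal sketch of the rule encoded in this table (assuming the
pre-offload-API rxmode bitfields from the original 2018 discussion, and
that the mbuf data room comes from the Rx queue's mempool):

#include <rte_ethdev.h>
#include <rte_mbuf.h>

/* Derive the rxmode flags of the table above from the maximum Rx frame
 * length and the data room available in the mbufs of the Rx mempool. */
static void
set_rxmode_flags(struct rte_eth_conf *conf, uint32_t max_rx_len,
		 struct rte_mempool *mp)
{
	uint32_t mbuf_data_len =
		rte_pktmbuf_data_room_size(mp) - RTE_PKTMBUF_HEADROOM;

	conf->rxmode.max_rx_pkt_len = max_rx_len;
	/* jumbo_frame: frame longer than a standard Ethernet frame */
	conf->rxmode.jumbo_frame = (max_rx_len > ETHER_MAX_LEN);
	/* enable_scatter: a single mbuf cannot hold the largest frame */
	conf->rxmode.enable_scatter = (max_rx_len > mbuf_data_len);
}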

> >>> If this table is correct, the flag jumbo_frame would be equivalent to
> >>> checking whether max_rx_pkt_len is above a threshold.
> >>>
> >>> And enable_scatter could be deduced from the mbuf size of the given rxq 
> >>> (which is a bit harder but maybe doable).
> >>
> >> I'm glad you raised this subject. We had a lot of discussion on it
> >> internally at Mellanox.
> >>
> >> I fully agree. All an application needs is to specify the maximum packet
> >> size it wants to receive.
> >>
> >> I also think the lack of documentation is causing PMDs to use those flags
> >> wrongly. For example, some PMDs set the jumbo_frame flag internally without
> >> it being set by the application.
> >>
> >> I would like to add one more item: MTU. What is the relation (if any)
> >> between setting the MTU and max_rx_len? I know MTU stands for Maximum
> >> Transmission Unit, however at least in Linux it is the same for both send
> >> and receive.
> >>
> >>
> > 
> > (Resurrecting the thread after two years; I will reply again with my latest
> > understanding.)
> > 
> > Thanks Olivier for the above summary and table; unfortunately usage is still
> > not consistent between PMDs. According to my understanding:
> > 
> > 'max_rx_pkt_len' is a user configuration value that limits the size of the
> > packet shared with the host, but it doesn't limit the size of the packet
> > that the NIC receives.

When you say "the size of the packet shared with the host", do you mean
for instance that the NIC will receive a 1500B packet and only write 128
bytes of data into the mbuf?

If yes, this was not my understanding. I suppose it could be used for
monitoring. What should be the value of Rx offload information like the
checksum or packet type if the packet (or the header) is truncated?

> Also, the comment in lib/librte_ethdev/rte_ethdev.h says that the
> rxmode field is used if (and I think only if) JUMBO_FRAME is
> enabled. So, if the user wants to set it at device configure stage,
> the device *must* support the JUMBO_FRAME offload, which means that
> the driver code handles rxmode.max_rx_pkt_len and either accepts it
> and configures the HW appropriately, or returns an error if the
> specified value is wrong. Basically this is written in the jumbo
> frame feature definition in features.rst. The user has max_rx_pktlen
> in dev_info to find out the maximum supported value for
> max_rx_pkt_len.
> 
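
Just to illustrate that check with real ethdev calls (port_id and
max_rx_pkt_len are assumed to be in scope):

struct rte_eth_dev_info dev_info;

rte_eth_dev_info_get(port_id, &dev_info);
if (max_rx_pkt_len > dev_info.max_rx_pktlen)
	return -EINVAL; /* above the maximum supported by the device */
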
> > For example, if the mbuf size of the mempool used by a queue is 1024 bytes,
> > we don't want packets bigger than the buffer size; but if the NIC supports
> > it, it is possible to receive a 6000-byte packet, split the data into
> > multiple buffers, and use a multi-segment packet to represent it.
> > So what we need is the NIC's ability to limit the size of the data shared
> > with the host, plus scattered Rx support (device + driver).
> 
> It requires the RX_SCATTER offload to be enabled, and it must be
> controlled by the user only (not the PMD), since it basically
> means that the application is ready to handle multi-segment
> packets (i.e. has code which looks at the number of segments,
> next pointers, etc.). Moreover, an application may disable the
> MULTI_SEG Tx offload (and drivers may then ignore the number of
> segments and next pointers as well).

Agree, I think it is important that the application can control the
enabling of Rx scatter, either with a flag, or simply by passing
max_rx_len <= mbuf_data_len.
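
As a minimal sketch of what "ready for multi-segment packets" means on
the application side (process() is a hypothetical callback):

/* Walk the segment chain of a received packet m: pkt_len is the total
 * length, each segment carries data_len bytes of data. */
uint32_t off = 0;
struct rte_mbuf *seg;

for (seg = m; seg != NULL; seg = seg->next) {
	process(rte_pktmbuf_mtod(seg, const char *), seg->data_len, off);
	off += seg->data_len;
}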

> > But MTU limits the size of the packet that the NIC receives.
> 
> Personally I've never treated it this way. For me the only
> difference between max_rx_pkt_len and MTU is:
>  - max_rx_pkt_len is the entire packet with all L2 headers and
>    even the FCS (basically everything which could be provided
>    to the application in the mbuf)
>  - MTU does not cover L2 (nor VLANs; I'm not sure about MPLS)
> 
> > Assuming the above is correct :),
> > 
> > Using the mbuf data size as 'max_rx_pkt_len' without asking the user is an
> > option, but perhaps the user has a different reason to limit the packet
> > size, so I think it is better to keep it as a separate config option.
> > 
> > I think the PMD itself enabling the "jumbo frame" offload is not too bad,
> > and acceptable, since providing a large MTU already implies it.
> 
> Yes

+1

> > But I am not sure about the PMD enabling scattered Rx: the application may
> > want to force receiving single-segment mbufs, and in that case the PMD
> > enabling this config on its own looks like a problem.
> 
> Yes

+1

> > But the user really needs this when a packet doesn't fit into the mbuf, so
> > providing an MTU larger than 'max_rx_pkt_len' _may_ imply enabling scattered
> > Rx; I assume this is the logic in some PMDs, which looks acceptable.
> 
> I don't think so. IMO auto-enabling Rx scatter from the PMD is a
> breakage of the contract between application and driver.
> As stated above, the application may simply not be ready to
> handle multi-segment packets correctly.
> 
> I think that providing an MTU larger than 'max_rx_pkt_len' is simply a
> change of max_rx_pkt_len = (MTU plus space for L2+).

As VLAN(s) are not taken into account in the MTU, it means that if the MTU
is 1500, the max Ethernet length is 1500 + 14 (eth hdr) + 4 (vlan) + 4 (2nd
vlan / qinq) + 4 (crc) = 1526.
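
A tiny helper expressing this conversion (just a sketch, the macro name
is made up; 14 = Ethernet header, 4 = one VLAN tag, 4 = CRC):

#define MTU_TO_MAX_FRAME_LEN(mtu) \
	((mtu) + 14 /* eth */ + 2 * 4 /* vlan + qinq */ + 4 /* crc */)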

Shouldn't we only use L2 lengths instead of the MTU? I don't know what
is usually expected by different hardware (MTU or L2 length).

> > And PMD behavior should be according for mentioned configs:
> > 
> > 1) Don't change the user-provided 'max_rx_pkt_len' value
> I have no strong opinion. However, it is important to clarify
> which understanding of max_rx_pkt_len vs MTU is the right one.
> 
> > 2) If jumbo frame is not enabled, don't limit the size of packets to the
> > host (I think this is based on the assumption that the mbuf size will
> > always be > 1514)
> 
> I think that JUMBO_FRAME is not relevant here. It is just a
> promise to take a look at max_rx_pkt_len at the configure or
> start stage.
> 
> > 3) When the user requests to set an MTU bigger than ETH_MAX, the PMD
> > enables jumbo frame support (if it is not already enabled by the user and
> > is supported by the HW). If the HW doesn't support it, of course it should
> > fail.
> 
> I'm not sure which ETH_MAX is mentioned above.
> #define ETH_MAX_MTU 0xFFFFU /* 65535, same as IP_MAX_MTU */
> or do you mean
> #define ETH_FRAME_LEN 1514 /* Max. octets in frame sans FCS */
> or even
> #define ETH_DATA_LEN 1500 /* Max. octets in payload */
> 
> We should be careful when we talk about Ethernet lengths and
> MTU.
> 
> > 4) When the user requests to set an MTU bigger than 'max_rx_pkt_len'
> 
> I think the second parameter to consider here is not
> max_rx_pkt_len, but the amount of space for data in a single
> mbuf (for all Rx queues).
> 
> > 4a) if "scattered Rx" is enabled, configure the MTU and limit the packet
> > size to the host to 'max_rx_pkt_len'
> 
> Yes and no. IMO configure the MTU and bump max_rx_pkt_len.
> 
> > 4b) if "scattered Rx" is not enabled but the HW supports it, enable
> > "scattered Rx" in the PMD, configure the MTU, and limit the packet size to
> > the host to 'max_rx_pkt_len'
> 
> No, I think it is wrong to enable Rx scatter from PMD.
> 
> > 4c) if "scattered Rx" is not enabled and not supported by the HW, fail the
> > MTU set.
> 
> Yes, regardless support in HW.
> 
> > 4d) if the HW doesn't support limiting the packet size to the host, but
> > the requested MTU is bigger than 'max_rx_pkt_len', it should fail.
> 
> I would rephrase it as the impossibility to disable Rx scatter.
> If so, it must be the driver's responsibility to drop scattered
> packets if the Rx scatter offload is not enabled.
> 
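
Putting 4a)-4d) together, here is a sketch of the mtu_set logic being
discussed (hypothetical PMD; mtu_to_frame_len() and min_rx_buf_size()
are assumed helpers, not ethdev API):

static int
my_pmd_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
{
	uint32_t frame_len = mtu_to_frame_len(mtu);
	uint64_t offloads = dev->data->dev_conf.rxmode.offloads;

	/* The frame does not fit in one mbuf and Rx scatter was not
	 * requested: refuse instead of silently enabling it behind
	 * the application's back (4b/4c). */
	if (frame_len > min_rx_buf_size(dev) &&
	    (offloads & DEV_RX_OFFLOAD_SCATTER) == 0)
		return -EINVAL;

	/* 4a): accept the MTU and bump max_rx_pkt_len accordingly */
	dev->data->dev_conf.rxmode.max_rx_pkt_len = frame_len;
	return hw_set_max_frame_len(dev, frame_len); /* assumed helper */
}
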
> > Btw, I am aware that some PMDs have a larger MTU by default and can't
> > limit the packet size to the host to the 'max_rx_pkt_len' value. I don't
> > know what to do in that case: fail in configure? Or at least make sure the
> > configured mempool's mbuf size is big enough?
> 
> See above.
> 
> Thanks for reminder about the topic.

I have the impression that what we want can be done with these 3
values:

- max_rx_pkt_size: maximum size of a received packet, larger ones are dropped
- max_rx_data_size: maximum size of the data copied into an mbuf chain, larger
  packets are truncated
- max_rx_seg_size: maximum size written in one segment (this can be retrieved
  from the pool private info: rte_pktmbuf_data_room_size() - RTE_PKTMBUF_HEADROOM)

I think the first 2 values should be L2 lengths, including CRC if
CRC-receive is enabled.

if max_rx_data_size <= max_rx_seg_size, scatter is disabled
if max_rx_data_size < max_rx_pkt_size, packets can be truncated
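
As a sketch, the proposal could look like this (hypothetical fields,
not existing ethdev API):

struct rx_size_conf {
	uint32_t max_rx_pkt_size;  /* larger packets are dropped */
	uint32_t max_rx_data_size; /* larger packets are truncated */
	uint32_t max_rx_seg_size;  /* from the mempool, see above */
};

/* derived behavior */
scatter_enabled = (conf.max_rx_data_size > conf.max_rx_seg_size);
may_truncate = (conf.max_rx_data_size < conf.max_rx_pkt_size);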

In case a PMD is not able to limit the packet size, it can be advertised
by a capability, and it would be up to the application to do it in software.
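
For instance, in the application's Rx loop (a sketch; m is the received
mbuf and max_rx_pkt_size the value configured above):

if (m->pkt_len > max_rx_pkt_size) {
	rte_pktmbuf_free(m); /* drop the oversized packet in SW */
	continue;
}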


Olivier

Thread overview: 7+ messages
2018-01-23 13:53 Olivier Matz
2018-01-23 14:34 ` Shahaf Shuler
2018-03-08 15:45   ` Ferruh Yigit
2019-12-10 18:07   ` Ferruh Yigit
2019-12-16  8:39     ` Andrew Rybchenko
2019-12-27 13:54       ` Olivier Matz [this message]
2019-12-27 14:23         ` Ananyev, Konstantin
