DPDK patches and discussions
From: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
To: Olivier Matz <olivier.matz@6wind.com>,
	Andrew Rybchenko <arybchenko@solarflare.com>
Cc: "Yigit, Ferruh" <ferruh.yigit@intel.com>,
	Shahaf Shuler <shahafs@mellanox.com>,
	"dev@dpdk.org" <dev@dpdk.org>,
	Thomas Monjalon <thomas@monjalon.net>,
	"Richardson, Bruce" <bruce.richardson@intel.com>,
	Matan Azrad <matan@mellanox.com>,
	Jerin Jacob Kollanukkaran <jerinj@marvell.com>
Subject: Re: [dpdk-dev] questions about new offload ethdev api
Date: Fri, 27 Dec 2019 14:23:18 +0000	[thread overview]
Message-ID: <SN6PR11MB2558BC7CC8321E85621038F19A2A0@SN6PR11MB2558.namprd11.prod.outlook.com> (raw)
In-Reply-To: <20191227135428.GP22738@platinum>



> -----Original Message-----
> From: Olivier Matz <olivier.matz@6wind.com>
> Sent: Friday, December 27, 2019 1:54 PM
> To: Andrew Rybchenko <arybchenko@solarflare.com>
> Cc: Yigit, Ferruh <ferruh.yigit@intel.com>; Shahaf Shuler <shahafs@mellanox.com>; dev@dpdk.org; Ananyev, Konstantin
> <konstantin.ananyev@intel.com>; Thomas Monjalon <thomas@monjalon.net>; Richardson, Bruce <bruce.richardson@intel.com>; Matan
> Azrad <matan@mellanox.com>; Jerin Jacob Kollanukkaran <jerinj@marvell.com>
> Subject: Re: [dpdk-dev] questions about new offload ethdev api
> 
> Hi,
> 
> Few comments below.
> 
> On Mon, Dec 16, 2019 at 11:39:05AM +0300, Andrew Rybchenko wrote:
> > On 12/10/19 9:07 PM, Ferruh Yigit wrote:
> > > On 1/23/2018 2:34 PM, Shahaf Shuler wrote:
> > >> Tuesday, January 23, 2018 3:53 PM, Olivier Matz:
> > >
> > > <...>
> > >
> > >>>
> > >>> 2/ meaning of rxmode.jumbo_frame, rxmode.enable_scatter,
> > >>> rxmode.max_rx_pkt_len
> > >>>
> > >>> While it's not related to the new API, it is probably a good opportunity
> > >>> to clarify the meaning of these flags. I'm not able to find a good
> > >>> documentation about them.
> > >>>
> > >>> Here is my understanding, the configuration only depends on: - the maximum
> > >>> rx frame length - the amount of data available in a mbuf (minus headroom)
> > >>>
> > >>> Flags to set in rxmode (example):
> > >>> +---------------+----------------+----------------+-----------------+
> > >>> |               |mbuf_data_len=1K|mbuf_data_len=2K|mbuf_data_len=16K|
> > >>> +---------------+----------------+----------------+-----------------+
> > >>> |max_rx_len=1500|enable_scatter  |                |                 |
> > >>> +---------------+----------------+----------------+-----------------+
> > >>> |max_rx_len=9000|enable_scatter, |enable_scatter, |jumbo_frame      |
> > >>> |               |jumbo_frame     |jumbo_frame     |                 |
> > >>> +---------------+----------------+----------------+-----------------+
> 
> Due to successive quotes, the table was not readable in my mail client,
> here it is again (narrower):
> 
> +------------+---------------+---------------+---------------+
> |            |mbuf_data_len= |mbuf_data_len= |mbuf_data_len= |
> |            |1K             |2K             |16K            |
> +------------+---------------+---------------+---------------+
> |max_rx_len= |enable_scatter |               |               |
> |1500        |               |               |               |
> +------------+---------------+---------------+---------------+
> |max_rx_len= |enable_scatter,|enable_scatter,|jumbo_frame    |
> |9000        |jumbo_frame    |jumbo_frame    |               |
> +------------+---------------+---------------+---------------+
> 
> > >>> If this table is correct, the flag jumbo_frame would be equivalent to
> > >>> checking whether max_rx_pkt_len is above a threshold.
> > >>>
> > >>> And enable_scatter could be deduced from the mbuf size of the given rxq
> > >>> (which is a bit harder but maybe doable).
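
To make the table above concrete, here is a minimal sketch of that deduction,
assuming the application already knows the data size of the mbufs it will
attach to its Rx queues; the rx_offloads_for() helper is hypothetical, only
the DEV_RX_OFFLOAD_* flags and RTE_ETHER_MAX_LEN come from the ethdev/ether
headers:

#include <stdint.h>
#include <rte_ethdev.h>
#include <rte_ether.h>

/* Derive rxmode offload flags from the max Rx frame length and the
 * per-mbuf data size (same logic as the table above). */
static uint64_t
rx_offloads_for(uint32_t max_rx_pkt_len, uint32_t mbuf_data_len)
{
        uint64_t offloads = 0;

        /* Packet may not fit into one mbuf -> multi-segment Rx needed. */
        if (max_rx_pkt_len > mbuf_data_len)
                offloads |= DEV_RX_OFFLOAD_SCATTER;

        /* Frame longer than standard Ethernet (1518) -> jumbo frame. */
        if (max_rx_pkt_len > RTE_ETHER_MAX_LEN)
                offloads |= DEV_RX_OFFLOAD_JUMBO_FRAME;

        return offloads;
}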
> > >>
> > >> I'm glad you raised this subject. We had a lot of discussion on it
> > >> internally in Mellanox.
> > >>
> > >> I fully agree. All an application needs is to specify the maximum packet
> > >> size it wants to receive.
> > >>
> > >> I also think the lack of documentation is causing PMDs to use these flags
> > >> incorrectly. For example, some PMDs set the jumbo_frame flag internally
> > >> without it being set by the application.
> > >>
> > >> I would like to add one more item: MTU. What is the relation (if any)
> > >> between setting the MTU and max_rx_len? I know MTU stands for Maximum
> > >> Transmission Unit; however, at least in Linux it is the same for both
> > >> transmit and receive.
> > >>
> > >>
> > >
> > > (Resurrecting the thread after two years, I will reply again with latest
> > > understanding.)
> > >
> > > Thanks Olivier for the summary and table above; unfortunately usage is still
> > > not consistent between PMDs. According to my understanding:
> > >
> > > 'max_rx_pkt_len' is a user configuration value to limit the size of the packet
> > > that is shared with the host, but it doesn't limit the size of the packet that
> > > the NIC receives.
> 
> When you say the size of the packet shared with the host, do you mean for
> instance that the NIC will receive a 1500B packet and will only write
> 128 bytes of data in the mbuf?
> 
> If yes, this was not my understanding. I suppose it could be used for
> monitoring. What should be the value for rx offload infos like checksum
> or packet type if the packet (or the header) is truncated?
> 
> > Also, the comment in lib/librte_ethdev/rte_ethdev.h says that the
> > rxmode.max_rx_pkt_len field is used if (and I think only if) JUMBO_FRAME is
> > enabled. So, if the user wants to set it at the device configure stage, the
> > device *must* support the JUMBO_FRAME offload, which means that the
> > driver code handles rxmode.max_rx_pkt_len and either accepts it
> > and configures the HW appropriately or returns an error if the specified
> > value is wrong. Basically this is what the jumbo frame feature
> > definition in features.rst says. The user has max_rx_pktlen in dev_info
> > to find out the maximum supported value for max_rx_pkt_len.
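
As a minimal sketch of that contract at configure time (the configure_jumbo()
helper, its single Rx/Tx queue pair and the caller-chosen frame length are
placeholders; the ethdev calls and flags are the existing API):

#include <string.h>
#include <errno.h>
#include <rte_ethdev.h>

static int
configure_jumbo(uint16_t port_id, uint32_t frame_len)
{
        struct rte_eth_dev_info dev_info;
        struct rte_eth_conf conf;

        memset(&conf, 0, sizeof(conf));
        rte_eth_dev_info_get(port_id, &dev_info);

        /* Respect the device limit and the advertised offload capability. */
        if (frame_len > dev_info.max_rx_pktlen ||
            !(dev_info.rx_offload_capa & DEV_RX_OFFLOAD_JUMBO_FRAME))
                return -EINVAL;

        /* max_rx_pkt_len is looked at only when JUMBO_FRAME is requested. */
        conf.rxmode.max_rx_pkt_len = frame_len;
        conf.rxmode.offloads |= DEV_RX_OFFLOAD_JUMBO_FRAME;

        return rte_eth_dev_configure(port_id, 1, 1, &conf);
}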
> >
> > > For example, if the mbuf size of the mempool used by a queue is 1024 bytes, we
> > > don't want packets bigger than the buffer size; but if the NIC supports it, it
> > > is possible to receive a 6000-byte packet, split the data into multiple buffers,
> > > and use multi-segment packets to represent it.
> > > So what we need is the NIC's ability to limit the size of the data shared with
> > > the host, plus scattered Rx support (device + driver).
> >
> > It requires the RX_SCATTER offload to be enabled, and it must be
> > controlled by the user only (not the PMD), since it basically
> > indicates whether the application is ready to handle multi-segment
> > packets (i.e. has code which looks at the number of
> > segments, next pointers, etc.). Moreover, the application
> > may disable the MULTI_SEG Tx offload (and drivers may then ignore
> > the number of segments and next pointers as well).
> 
> Agree, I think it is important that the application can control the
> enabling of rx scatter, either by a flag, or simply by passing
> max_rx_len <= mbuf_data_len.
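
For reference, the mbuf-size side of that comparison can be taken from the Rx
mempool itself; a small sketch (the need_rx_scatter() helper is made up, the
mbuf calls are the existing API):

#include <stdint.h>
#include <rte_mbuf.h>

/* True when one mbuf from this pool cannot hold a full Rx frame,
 * i.e. the application must opt in to DEV_RX_OFFLOAD_SCATTER. */
static int
need_rx_scatter(struct rte_mempool *pool, uint32_t max_rx_pkt_len)
{
        uint32_t seg_size = rte_pktmbuf_data_room_size(pool) -
                            RTE_PKTMBUF_HEADROOM;

        return max_rx_pkt_len > seg_size;
}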
> 
> > > But MTU limits the size of the packet that the NIC receives.
> >
> > Personally I've never treated it this way. For me the only
> > difference between max_rx_pkt_len and MTU is:
> >  - max_rx_pkt_len is the entire packet with all L2 headers and
> >    even the FCS (basically everything which could be provided
> >    to the application in the mbuf)
> >  - MTU does not cover L2 (nor VLANs; I'm not sure about MPLS)
> >
> > > Assuming the above are correct :),
> > >
> > > Using the mbuf data size as 'max_rx_pkt_len' without asking the user is an
> > > option, but perhaps the user has a different reason to limit the packet size,
> > > so I think it is better to keep it as a separate config option.
> > >
> > > I think the PMD itself enabling the "jumbo frame" offload is not too bad, and
> > > acceptable, since providing a large MTU already implies it.
> >
> > Yes
> 
> +1
> 
> > > But I am not sure about the PMD enabling scattered Rx; the application may want
> > > to force receiving single-segment mbufs, and in that case the PMD enabling this
> > > config on its own looks like a problem.
> >
> > Yes
> 
> +1
> 
> > > But the user really needs this when a packet doesn't fit into the mbuf, so
> > > providing an MTU larger than 'max_rx_pkt_len' _may_ imply enabling scattered Rx;
> > > I assume this is the logic in some PMDs, which looks acceptable.
> >
> > I don't think so. IMO auto-enabling Rx scatter from the PMD is a
> > breach of the contract between the application and the driver.
> > As stated above, the application may simply not be ready to
> > handle multi-segment packets correctly.
> >
> > I think that providing an MTU larger than 'max_rx_pkt_len' is simply a
> > change of max_rx_pkt_len = (MTU plus space for L2+).
> 
> As VLAN(s) are not taken into account in the MTU, it means that if the MTU is
> 1500, the max Ethernet length is 1500 + 14 (eth hdr) + 4 (vlan) + 4 (crc) =
> 1522, or 1526 with a second VLAN tag (QinQ).
> 
> Shouldn't we only use L2 lengths instead of MTU? I don't know what
> is usually expected by different hardware (MTU or L2 length).
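
For what it's worth, a small sketch of the MTU-to-frame-length arithmetic
being discussed (the helper is hypothetical; the RTE_ETHER_* length macros are
the real ones, and how many VLAN tags to count is exactly the open question):

#include <stdint.h>
#include <rte_ether.h>

/* L2 overhead added on top of the MTU: Ethernet header + CRC,
 * plus 4 bytes per 802.1Q tag. */
static uint32_t
mtu_to_frame_len(uint16_t mtu, unsigned int nb_vlan_tags)
{
        return (uint32_t)mtu + RTE_ETHER_HDR_LEN + RTE_ETHER_CRC_LEN +
               nb_vlan_tags * 4;
}

/*
 * mtu_to_frame_len(1500, 0) == 1518 (RTE_ETHER_MAX_LEN)
 * mtu_to_frame_len(1500, 1) == 1522, (1500, 2) == 1526 for QinQ
 */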
> 
> > > And the PMD behavior should be as follows for the mentioned configs:
> > >
> > > 1) Don't change the user-provided 'max_rx_pkt_len' value
> > I have no strong opinion. However, it is important to clarify
> > which understanding of max_rx_pkt_len vs MTU is the right one.
> >
> > > 2) If jumbo frame is not enabled, don't limit the size of packets to the host
> > > (I think this is based on the assumption that the mbuf size will always be > 1514)
> >
> > I think that JUMBO_FRAME is not relevant here. It is just a
> > promise to take a look at max_rx_pkt_len at the configure or
> > start stage.
> >
> > > 3) When the user requests to set the MTU bigger than ETH_MAX, the PMD enables
> > > jumbo frame support (if it is not already enabled by the user and is supported
> > > by the HW). If the HW doesn't support it, of course it should fail.
> >
> > I'm not sure which ETH_MAX is mentioned above.
> > #define ETH_MAX_MTU 0xFFFFU /* 65535, same as IP_MAX_MTU */
> > or do you mean
> > #define ETH_FRAME_LEN 1514 /* Max. octets in frame sans FCS */
> > or even
> > #define ETH_DATA_LEN 1500 /* Max. octets in payload */
> >
> > We should be careful when we talk about Ethernet lengths and
> > MTU.
> >
> > > 4) When the user requests to set an MTU bigger than 'max_rx_pkt_len'
> >
> > I think the second parameter to consider here is not
> > max_rx_pkt_len, but the amount of space for data in a single
> > mbuf (for all Rx queues).
> >
> > > 4a) if "scattered Rx" is enabled, configure the MTU and limit the packet size
> > > to the host to 'max_rx_pkt_len'
> >
> > Yes and no. IMO configure the MTU and bump max_rx_pkt_len.
> >
> > > 4b) if "scattered Rx" is not enabled but the HW supports it, enable "scattered
> > > Rx" in the PMD, configure the MTU, and limit the packet size to the host to
> > > 'max_rx_pkt_len'
> >
> > No, I think it is wrong to enable Rx scatter from PMD.
> >
> > > 4c) if "scattered Rx" is not enabled and not supported by the HW, fail the
> > > MTU set.
> >
> > Yes, regardless of HW support.
> >
> > > 4d) if the HW doesn't support limiting the packet size to the host, but the
> > > requested MTU is bigger than 'max_rx_pkt_len', it should fail.
> >
> > I would rephrase it as the impossibility of disabling Rx scatter.
> > If so, it must be the driver's responsibility to drop scattered
> > packets if the Rx scatter offload is not enabled.
> >
> > > Btw, I am aware that some PMDs have a larger MTU by default and can't limit the
> > > packet size to the host to the 'max_rx_pkt_len' value. I don't know what to do
> > > in that case: fail in configure? Or at least make sure the configured mempool's
> > > mbuf size is big enough?
> >
> > See above.
> >
> > Thanks for the reminder about the topic.
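
To make cases 4a)-4d) above concrete, here is one possible application-side
policy, as a sketch only: it assumes the application (not the PMD) owns the
scatter decision, app_set_mtu() and the scatter_enabled flag are invented
names, and only rte_eth_dev_set_mtu() and the mbuf helpers are the real API:

#include <errno.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

static int
app_set_mtu(uint16_t port_id, uint16_t mtu, struct rte_mempool *pool,
            int scatter_enabled)
{
        uint32_t frame_len = (uint32_t)mtu + RTE_ETHER_HDR_LEN +
                             RTE_ETHER_CRC_LEN;
        uint32_t seg_size = rte_pktmbuf_data_room_size(pool) -
                            RTE_PKTMBUF_HEADROOM;

        /* 4c): the frame would not fit into one mbuf and Rx scatter was not
         * enabled at configure time -> refuse instead of silently enabling it. */
        if (frame_len > seg_size && !scatter_enabled)
                return -EINVAL;

        /* 4a): otherwise let the driver validate and apply the new MTU. */
        return rte_eth_dev_set_mtu(port_id, mtu);
}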
> 
> I have the impression that what we want can be done with these 3
> values:
> 
> - max_rx_pkt_size: maximum size of a received packet; larger ones are dropped
> - max_rx_data_size: maximum size of data copied into a mbuf chain; larger
>   packets are truncated

Do we really need 'max_rx_data_size'?
Can't it just always be equal to max_rx_pkt_size (as we have it right now)? 

> - max_rx_seg_size: maximum size written in a segment (this can be retrieved
>   from pool private info = rte_pktmbuf_data_room_size() - RTE_PKTMBUF_HEADROOM)
> 
> I think the first 2 values should be L2 lengths, including CRC if
> CRC-receive is enabled.
> 
> if max_rx_data_size <= max_rx_seg_size, scatter is disabled
> if max_rx_data_size < max_rx_pkt_size, packets can be truncated
> 
> In case a PMD is not able to limit the packet size, this can be advertised
> by a capability, and it would be up to the application to do it in software.
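
None of these three fields exist in ethdev today; purely to illustrate the
proposed semantics, a hypothetical sketch (only the two mbuf calls are the
existing API):

#include <stdint.h>
#include <rte_mbuf.h>

/* Hypothetical Rx size configuration matching the proposal above. */
struct rx_size_conf {
        uint32_t max_rx_pkt_size;   /* larger packets are dropped */
        uint32_t max_rx_data_size;  /* larger packets are truncated */
        uint32_t max_rx_seg_size;   /* data room of one mbuf */
};

static void
fill_rx_size_conf(struct rx_size_conf *c, struct rte_mempool *pool,
                  uint32_t pkt_size, uint32_t data_size)
{
        c->max_rx_pkt_size = pkt_size;
        c->max_rx_data_size = data_size;
        c->max_rx_seg_size = rte_pktmbuf_data_room_size(pool) -
                             RTE_PKTMBUF_HEADROOM;
}

/*
 * Rx scatter is needed only if max_rx_data_size > max_rx_seg_size;
 * packets can be truncated only if max_rx_data_size < max_rx_pkt_size.
 */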
> 
> 
> Olivier

Thread overview: 7+ messages
2018-01-23 13:53 Olivier Matz
2018-01-23 14:34 ` Shahaf Shuler
2018-03-08 15:45   ` Ferruh Yigit
2019-12-10 18:07   ` Ferruh Yigit
2019-12-16  8:39     ` Andrew Rybchenko
2019-12-27 13:54       ` Olivier Matz
2019-12-27 14:23         ` Ananyev, Konstantin [this message]
