DPDK patches and discussions
 help / color / mirror / Atom feed
From: Ferruh Yigit <ferruh.yigit@intel.com>
To: Chengchang Tang <tangchengchang@huawei.com>,
	Andrew Rybchenko <arybchenko@solarflare.com>,
Cc: linuxarm@huawei.com, thomas@monjalon.net,
	Olivier MATZ <olivier.matz@6wind.com>,
	Matan Azrad <matan@mellanox.com>,
	Jerin Jacob <jerin.jacob@caviumnetworks.com>,
	Shy Shyman <shys@mellanox.com>, Qi Zhang <qi.z.zhang@intel.com>,
	Konstantin Ananyev <konstantin.ananyev@intel.com>
Subject: Re: [dpdk-dev] [RFC] ethdev: add a field for rte_eth_rxq_info
Date: Wed, 24 Jun 2020 19:32:23 +0100
Message-ID: <ae00d51e-5785-0c7f-0ee6-c0dee8415da7@intel.com> (raw)
In-Reply-To: <4650f0a7-64cf-831d-631c-243ea510fd38@intel.com>

On 6/24/2020 9:52 AM, Ferruh Yigit wrote:
> On 6/24/2020 4:48 AM, Chengchang Tang wrote:
>> On 2020/6/23 17:30, Andrew Rybchenko wrote:
>>> On 6/23/20 9:48 AM, Chengchang Tang wrote:
>>>> In common practice, PMD configure the rx_buf_size according to the data
>>>> room size of the object in mempool. But in fact the final value is related
>>>> to the specifications of hw, and its values will affect the number of
>>>> fragments in recieving pkts.
>>>> At present, we seem to have no way to espose relevant information to upper
>>>> layer users.
>>>> Add a field named rx_bufsize in rte_eth_rxq_info to indicate the buffer
>>>> size used in recieving pkts for hw.
>>> I'm OK with the change in general.
>>> I'm unsure which name to use: 'rx_buf_size' or 'rx_bursize',
>>> since I found both 'min_rx_buf_size' and 'min_rx_bufsize' in
>>> ethdev.
>>> I think it is important to update PMDs which provides the
>>> information to fill the field in.
>> My plan is to divide the subsequent series into two patches,
>> one to modify rte_eth_rxq_info, and one to add our hns3 PMD
>> implementation of rxq_info_get. Should i update all the PMDs
>> that provide this information and test programs such as
>> testpmd at the same time?
> Hi Chengchang, Andrew,
> No objection to the change, but it should be crystal clear what is added. These
> are for PMD developers to implement and when it is not clear we end up having
> different implementations and inconsistencies.
> There is already some confusion for the Rx packet size etc.. my concern is
> adding more to it, here all we have is "size of RX buffer." comment, I think we
> need more.

cc'ed a few more people.

Back to this favorite topic, how to configure/limit the packet size.

Can you please help to have a common/correct understanding? I tried to clarify
as much as I got it, any comment welcome. (I know it is long, please bare with me)

The related config options I can see,
1) rte_eth_conf->rxmode->max_rx_pkt_len
2) rte_eth_dev_info->max_rx_pktlen
4) rte_eth_dev->data->mtu
6) dev->data->scattered_rx
7) rte_eth_dev_info->min_mtu, rte_eth_dev_info->max_mtu

'mtu' (4): Both for Tx and Rx. The network layer payload length. Default value

'max_rx_pkt_len' (1): Only for Rx, maximum Rx frame length configured by

'max_rx_pktlen' (2): Device reported value on what maximum Rx frame length it
can receive. Application shouldn't set Rx frame length more than this value.

'DEV_RX_OFFLOAD_JUMBO_FRAME' (3): Device Jumbo Frame capability.
	When not enabled the Rx frame length is 'MTU' + overhead
	When enabled Rx frame length is 'max_rx_pkt_len'

'DEV_RX_OFFLOAD_SCATTER' (5): Capability to scatter packet to multiple
descriptor by device and driver converting this to chained mbuf.

'dev->data->scattered_rx' (6): The current status of driver scattered Rx, in
device data mostly for PMD internal usage.

'rte_eth_dev_info->min_mtu' & 'rte_eth_dev_info->max_mtu' (7): minimum and
maximum MTU values device supported.
'max_mtu' == 'max_rx_pkt_len' - L2_OVERHEAD.

I can see two different limits,
a) The Rx frame length limit that device can receive from wire. Any packet
larger than this size will be dropped by device in an early stage.
b) The Rx buffer length limit that received packets are written to. Device
shouldn't DMA larger than reserved buffer size.

If device supports scattered Rx to multiple descriptors, it can be possible to
configure (a) > (b).
Otherwise configuration have to be (b) >= (a).

For example if the mbuf size is 2Kb and the device can receive up to 9000 bytes.
Options are:
- If device supports it, large packet will be scattered on multiple mbufs
- or need to configure device Rx frame length to 2K (mbuf size)
- or need to allocate mbuf big enough to get largest possible packet (9000)

Issues I see:

i) Although the code clearly says 'max_rx_pkt_len' is only valid when jumbo
frames enabled, some drivers are taking it account always.

ii) Some drivers enable 'scattered_rx' & 'jumbo frame' implicitly, without

iii) Having both 'mtu' & 'max_rx_pkt_len' are confusing, although they are not
exactly same thing they are related. Difference is MTU applies for Tx too, and
L2 network layer overhead is not included.
'MTU' can be more interested by upper layers, 'max_rx_pkt_len' is more driver
level information. And driver should know how to convert one to another.

iv) 'max_rx_pkt_len' provided as part of 'rte_eth_dev_configure()' and there is
no API to update it later.
'mtu' is not part of 'rte_eth_dev_configure()', it can only be updated later
with specific API.
But driver have to keep these two values consistent.

v) 'max_rx_pktlen' & 'max_mtu' reports from driver are redundant information.
Again they are not same thing, but correlated.

Suggested changes:

Overall unify 'max_rx_pkt_len' & 'mtu' as much as possible, at first step:

i) Make 'max_rx_pkt_len' independent from 'DEV_RX_OFFLOAD_JUMBO_FRAME', so
'max_rx_pkt_len' value will be always valid, jumbo frame enabled or not.

ii) in '.dev_configure' convert 'max_rx_pkt_len' value to 'mtu' value, this will
be only point 'max_rx_pkt_len' is used, after that point PMD will always use
'mtu' value.
Even don't reflect 'rte_eth_dev_set_mtu()' changes to 'max_rx_pkt_len' anymore.

iii) Don't make 'max_rx_pkt_len' a mandatory config option, let it be '0' by
application, in that case 'rte_eth_dev_configure()' will set
"'max_rx_pkt_len' = RTE_ETHER_MAX_LEN" if 'DEV_RX_OFFLOAD_JUMBO_FRAME' disabled
"'max_rx_pkt_len' = 9000 if 'DEV_RX_OFFLOAD_JUMBO_FRAME' enabled

iv) Allow implicit update of 'DEV_RX_OFFLOAD_JUMBO_FRAME' on MTU set, since
setting a large MTU implies the jumbo frame request. And there is no harm to

v) Do NOT allow implicit update of 'DEV_RX_OFFLOAD_SCATTER' on MTU set (when Rx
frame length > Rx buffer length), since application may not be capable of
parsing chained mbufs. Instead fails the MTU set in that case.
[This can break some applications, relying on this implicit set.]

Any comments?

Additional details:

Behavior of some drivers:

What igb & ixgbe does
- Set Rx frame limit (a) using 'max_rx_pkt_len' (1)
- Set Rx buffer limit (b) using mbuf data size
- Enable Scattered Rx (5 & 6) if the Rx frame limit (a) bigger than Rx buffer
limit (b) (even user not requested for it)

What i40e does same as above, only differences
- Return error if jumbo frame enabled and 'max_rx_pkt_len' < RTE_ETHER_MAX_LEN

- Set Rx frame limit (a)
  - using 'max_rx_pkt_len' (1) when jumbo frame enabled
  - using 'mtu' when jumbo frame not enabled.
- Set Rx buffer limit (b) using mbuf data size
- If Rx frame limit (a) bigger than Rx buffer limit (b), and user not requested
'DEV_RX_OFFLOAD_SCATTER', return error.

- Set Rx frame limit (a) using 'max_rx_pkt_len' (1). Implicitly enable jumbo
frame based on 'max_rx_pkt_len'.
- I can't able find how Rx buffer limit (b) set
- Enable Scattered Rx (5) if the Rx frame limit (a) bigger than Rx buffer limit
(b) (even user not requested for it). 'dev->data->scattered_rx' not set at all.

> Adding a PMD implementation and testpmd updates helps to clarify the
> intention/usage, so I suggest sending them as a single patch with this one.
> Updating all PMDs is a bigger ask and sometimes too hard because of lack of
> knowledge on the internals of other PMDs, although this is causing feature gaps
> time to time, we are not mandating this to developers, so please update as many
> PMD as you can, that you are confident, rest should be done by their maintainers.
>>>> Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
>>> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
>>>> ---
>>>>  lib/librte_ethdev/rte_ethdev.h | 1 +
>>>>  1 file changed, 1 insertion(+)
>>>> diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
>>>> index 0f6d053..82b7e98 100644
>>>> --- a/lib/librte_ethdev/rte_ethdev.h
>>>> +++ b/lib/librte_ethdev/rte_ethdev.h
>>>> @@ -1306,6 +1306,7 @@ struct rte_eth_rxq_info {
>>>>  	struct rte_eth_rxconf conf; /**< queue config parameters. */
>>>>  	uint8_t scattered_rx;       /**< scattered packets RX supported. */
>>>>  	uint16_t nb_desc;           /**< configured number of RXDs. */
>>>> +	uint16_t rx_bufsize;        /**< size of RX buffer. */
>>>>  } __rte_cache_min_aligned;
>>>>  /**
>>>> --
>>>> 2.7.4
>>> .

  reply	other threads:[~2020-06-24 18:32 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-23  6:48 Chengchang Tang
2020-06-23  9:30 ` Andrew Rybchenko
2020-06-24  3:48   ` Chengchang Tang
2020-06-24  8:52     ` Ferruh Yigit
2020-06-24 18:32       ` Ferruh Yigit [this message]
2020-06-25  9:06         ` Andrew Rybchenko
2020-06-23 14:48 ` Stephen Hemminger
2020-06-23 15:22   ` Andrew Rybchenko
2020-07-22  6:38 ` [dpdk-dev] [RFC v2 0/3] add rx buffer size " Chengchang Tang
2020-07-22  6:38   ` [dpdk-dev] [RFC v2 1/3] ethdev: add a field " Chengchang Tang
2020-07-22  6:38   ` [dpdk-dev] [RFC v2 2/3] net/hns3: add support for query of rx/tx queue info Chengchang Tang
2020-07-22  6:38   ` [dpdk-dev] [RFC v2 3/3] app/testpmd: Add RX buffer size dispaly in queue info querry Chengchang Tang
2020-07-28  6:29   ` [dpdk-dev] [RFC v2 0/3] add rx buffer size for rte_eth_rxq_info Chengchang Tang
2020-07-28  9:30     ` Ferruh Yigit
2020-07-28 11:39       ` Chengchang Tang
2020-07-28 15:27         ` Thomas Monjalon
  -- strict thread matches above, loose matches on Subject: below --
2020-06-18 12:35 [dpdk-dev] [RFC] ethdev: add a field " Chengchang Tang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ae00d51e-5785-0c7f-0ee6-c0dee8415da7@intel.com \
    --to=ferruh.yigit@intel.com \
    --cc=arybchenko@solarflare.com \
    --cc=dev@dpdk.org \
    --cc=jerin.jacob@caviumnetworks.com \
    --cc=konstantin.ananyev@intel.com \
    --cc=linuxarm@huawei.com \
    --cc=matan@mellanox.com \
    --cc=olivier.matz@6wind.com \
    --cc=qi.z.zhang@intel.com \
    --cc=shys@mellanox.com \
    --cc=tangchengchang@huawei.com \
    --cc=thomas@monjalon.net \


* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

DPDK patches and discussions

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://inbox.dpdk.org/dev/0 dev/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 dev dev/ https://inbox.dpdk.org/dev \
	public-inbox-index dev

Example config snippet for mirrors.
Newsgroup available over NNTP:

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git