DPDK patches and discussions
 help / color / mirror / Atom feed
From: Andrew Rybchenko <arybchenko@solarflare.com>
To: Ferruh Yigit <ferruh.yigit@intel.com>,
	Chengchang Tang <tangchengchang@huawei.com>, <dev@dpdk.org>
Cc: <linuxarm@huawei.com>, <thomas@monjalon.net>,
	Olivier MATZ <olivier.matz@6wind.com>,
	Matan Azrad <matan@mellanox.com>,
	Jerin Jacob <jerin.jacob@caviumnetworks.com>,
	Shy Shyman <shys@mellanox.com>, Qi Zhang <qi.z.zhang@intel.com>,
	Konstantin Ananyev <konstantin.ananyev@intel.com>
Subject: Re: [dpdk-dev] [RFC] ethdev: add a field for rte_eth_rxq_info
Date: Thu, 25 Jun 2020 12:06:18 +0300
Message-ID: <f1f26668-e78a-fa5f-0760-b0929f1a62f7@solarflare.com> (raw)
In-Reply-To: <ae00d51e-5785-0c7f-0ee6-c0dee8415da7@intel.com>

On 6/24/20 9:32 PM, Ferruh Yigit wrote:
> On 6/24/2020 9:52 AM, Ferruh Yigit wrote:
>> On 6/24/2020 4:48 AM, Chengchang Tang wrote:
>>> On 2020/6/23 17:30, Andrew Rybchenko wrote:
>>>> On 6/23/20 9:48 AM, Chengchang Tang wrote:
>>>>> In common practice, PMD configure the rx_buf_size according to the data
>>>>> room size of the object in mempool. But in fact the final value is related
>>>>> to the specifications of hw, and its values will affect the number of
>>>>> fragments in recieving pkts.
>>>>> At present, we seem to have no way to espose relevant information to upper
>>>>> layer users.
>>>>> Add a field named rx_bufsize in rte_eth_rxq_info to indicate the buffer
>>>>> size used in recieving pkts for hw.
>>>> I'm OK with the change in general.
>>>> I'm unsure which name to use: 'rx_buf_size' or 'rx_bursize',
>>>> since I found both 'min_rx_buf_size' and 'min_rx_bufsize' in
>>>> ethdev.
>>>> I think it is important to update PMDs which provides the
>>>> information to fill the field in.
>>> My plan is to divide the subsequent series into two patches,
>>> one to modify rte_eth_rxq_info, and one to add our hns3 PMD
>>> implementation of rxq_info_get. Should i update all the PMDs
>>> that provide this information and test programs such as
>>> testpmd at the same time?
>> Hi Chengchang, Andrew,
>> No objection to the change, but it should be crystal clear what is added. These
>> are for PMD developers to implement and when it is not clear we end up having
>> different implementations and inconsistencies.
>> There is already some confusion for the Rx packet size etc.. my concern is
>> adding more to it, here all we have is "size of RX buffer." comment, I think we
>> need more.
> cc'ed a few more people.
> Back to this favorite topic, how to configure/limit the packet size.
> Can you please help to have a common/correct understanding? I tried to clarify
> as much as I got it, any comment welcome. (I know it is long, please bare with me)
> The related config options I can see,
> 1) rte_eth_conf->rxmode->max_rx_pkt_len
> 2) rte_eth_dev_info->max_rx_pktlen
> 4) rte_eth_dev->data->mtu
> 6) dev->data->scattered_rx
> 7) rte_eth_dev_info->min_mtu, rte_eth_dev_info->max_mtu
> 'mtu' (4): Both for Tx and Rx. The network layer payload length. Default value
> 'max_rx_pkt_len' (1): Only for Rx, maximum Rx frame length configured by
> application.
> 'max_rx_pktlen' (2): Device reported value on what maximum Rx frame length it
> can receive. Application shouldn't set Rx frame length more than this value.
> 'DEV_RX_OFFLOAD_JUMBO_FRAME' (3): Device Jumbo Frame capability.
> 	When not enabled the Rx frame length is 'MTU' + overhead
> 	When enabled Rx frame length is 'max_rx_pkt_len'
> 'DEV_RX_OFFLOAD_SCATTER' (5): Capability to scatter packet to multiple
> descriptor by device and driver converting this to chained mbuf.
> 'dev->data->scattered_rx' (6): The current status of driver scattered Rx, in
> device data mostly for PMD internal usage.
> 'rte_eth_dev_info->min_mtu' & 'rte_eth_dev_info->max_mtu' (7): minimum and
> maximum MTU values device supported.
> 'max_mtu' == 'max_rx_pkt_len' - L2_OVERHEAD.
> I can see two different limits,
> a) The Rx frame length limit that device can receive from wire. Any packet
> larger than this size will be dropped by device in an early stage.
> b) The Rx buffer length limit that received packets are written to. Device
> shouldn't DMA larger than reserved buffer size.
> If device supports scattered Rx to multiple descriptors, it can be possible to
> configure (a) > (b).
> Otherwise configuration have to be (b) >= (a).
> For example if the mbuf size is 2Kb and the device can receive up to 9000 bytes.
> Options are:
> - If device supports it, large packet will be scattered on multiple mbufs
> - or need to configure device Rx frame length to 2K (mbuf size)
> - or need to allocate mbuf big enough to get largest possible packet (9000)
> Issues I see:
> -------------
> i) Although the code clearly says 'max_rx_pkt_len' is only valid when jumbo
> frames enabled, some drivers are taking it account always.

Ack, that's not good.

> ii) Some drivers enable 'scattered_rx' & 'jumbo frame' implicitly, without
> application.

Ack that it is a problem especially for scatter.

> iii) Having both 'mtu' & 'max_rx_pkt_len' are confusing, although they are not
> exactly same thing they are related. Difference is MTU applies for Tx too, and
> L2 network layer overhead is not included.
> 'MTU' can be more interested by upper layers, 'max_rx_pkt_len' is more driver
> level information. And driver should know how to convert one to another.


> iv) 'max_rx_pkt_len' provided as part of 'rte_eth_dev_configure()' and there is
> no API to update it later.
> 'mtu' is not part of 'rte_eth_dev_configure()', it can only be updated later
> with specific API.
> But driver have to keep these two values consistent.


> v) 'max_rx_pktlen' & 'max_mtu' reports from driver are redundant information.
> Again they are not same thing, but correlated.


> Suggested changes:
> -----------------
> Overall unify 'max_rx_pkt_len' & 'mtu' as much as possible, at first step:
> i) Make 'max_rx_pkt_len' independent from 'DEV_RX_OFFLOAD_JUMBO_FRAME', so
> 'max_rx_pkt_len' value will be always valid, jumbo frame enabled or not.

It will make handling of the max_rx_pkt_len mandatory in
all network PMDs.

> ii) in '.dev_configure' convert 'max_rx_pkt_len' value to 'mtu' value, this will
> be only point 'max_rx_pkt_len' is used, after that point PMD will always use
> 'mtu' value.

I'm not sure that it is a right direction.
Above you say that 'max_rx_pkt_len' is more driver level and
I agree with it. I guess most drivers operate it finally
(i.e. configure underlying HW in terms of max_rx_pkt_len,
not MTU). So, converted from max_rx_pkt_len to MTU on ethdev
level and covered back from MTU to max_rx_pkt_len in drivers.

> Even don't reflect 'rte_eth_dev_set_mtu()' changes to 'max_rx_pkt_len' anymore.

Not sure that I get it. max_rx_pkt_len is used on dev_configure
only. Is it reported on get somewhere?

> iii) Don't make 'max_rx_pkt_len' a mandatory config option, let it be '0' by
> application, in that case 'rte_eth_dev_configure()' will set
> "'max_rx_pkt_len' = RTE_ETHER_MAX_LEN" if 'DEV_RX_OFFLOAD_JUMBO_FRAME' disabled
> "'max_rx_pkt_len' = 9000 if 'DEV_RX_OFFLOAD_JUMBO_FRAME' enabled

Why 9000? IMHO, if max_rx_pkt_len is 0, just use value derived
from MTU.

> iv) Allow implicit update of 'DEV_RX_OFFLOAD_JUMBO_FRAME' on MTU set, since
> setting a large MTU implies the jumbo frame request. And there is no harm to
> application.

Yes and I'd deprecate DEV_RX_OFFLOAD_JUMBO_FRAME.

> v) Do NOT allow implicit update of 'DEV_RX_OFFLOAD_SCATTER' on MTU set (when Rx
> frame length > Rx buffer length), since application may not be capable of
> parsing chained mbufs. Instead fails the MTU set in that case.
> [This can break some applications, relying on this implicit set.]

Yes, definitely.

> Any comments?
> Additional details:
> -------------------
> Behavior of some drivers:
> What igb & ixgbe does
> - Set Rx frame limit (a) using 'max_rx_pkt_len' (1)
> - Set Rx buffer limit (b) using mbuf data size
> - Enable Scattered Rx (5 & 6) if the Rx frame limit (a) bigger than Rx buffer
> limit (b) (even user not requested for it)
> What i40e does same as above, only differences
> - Return error if jumbo frame enabled and 'max_rx_pkt_len' < RTE_ETHER_MAX_LEN
> sfc:
> - Set Rx frame limit (a)
>   - using 'max_rx_pkt_len' (1) when jumbo frame enabled
>   - using 'mtu' when jumbo frame not enabled.
> - Set Rx buffer limit (b) using mbuf data size
> - If Rx frame limit (a) bigger than Rx buffer limit (b), and user not requested
> 'DEV_RX_OFFLOAD_SCATTER', return error.


> octeontx2:
> - Set Rx frame limit (a) using 'max_rx_pkt_len' (1). Implicitly enable jumbo
> frame based on 'max_rx_pkt_len'.
> - I can't able find how Rx buffer limit (b) set
> - Enable Scattered Rx (5) if the Rx frame limit (a) bigger than Rx buffer limit
> (b) (even user not requested for it). 'dev->data->scattered_rx' not set at all.
>> Adding a PMD implementation and testpmd updates helps to clarify the
>> intention/usage, so I suggest sending them as a single patch with this one.
>> Updating all PMDs is a bigger ask and sometimes too hard because of lack of
>> knowledge on the internals of other PMDs, although this is causing feature gaps
>> time to time, we are not mandating this to developers, so please update as many
>> PMD as you can, that you are confident, rest should be done by their maintainers.
>>>>> Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
>>>> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
>>>>> ---
>>>>>  lib/librte_ethdev/rte_ethdev.h | 1 +
>>>>>  1 file changed, 1 insertion(+)
>>>>> diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
>>>>> index 0f6d053..82b7e98 100644
>>>>> --- a/lib/librte_ethdev/rte_ethdev.h
>>>>> +++ b/lib/librte_ethdev/rte_ethdev.h
>>>>> @@ -1306,6 +1306,7 @@ struct rte_eth_rxq_info {
>>>>>  	struct rte_eth_rxconf conf; /**< queue config parameters. */
>>>>>  	uint8_t scattered_rx;       /**< scattered packets RX supported. */
>>>>>  	uint16_t nb_desc;           /**< configured number of RXDs. */
>>>>> +	uint16_t rx_bufsize;        /**< size of RX buffer. */
>>>>>  } __rte_cache_min_aligned;
>>>>>  /**

  reply	other threads:[~2020-06-25  9:06 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-23  6:48 Chengchang Tang
2020-06-23  9:30 ` Andrew Rybchenko
2020-06-24  3:48   ` Chengchang Tang
2020-06-24  8:52     ` Ferruh Yigit
2020-06-24 18:32       ` Ferruh Yigit
2020-06-25  9:06         ` Andrew Rybchenko [this message]
2020-06-23 14:48 ` Stephen Hemminger
2020-06-23 15:22   ` Andrew Rybchenko
2020-07-22  6:38 ` [dpdk-dev] [RFC v2 0/3] add rx buffer size " Chengchang Tang
2020-07-22  6:38   ` [dpdk-dev] [RFC v2 1/3] ethdev: add a field " Chengchang Tang
2020-07-22  6:38   ` [dpdk-dev] [RFC v2 2/3] net/hns3: add support for query of rx/tx queue info Chengchang Tang
2020-07-22  6:38   ` [dpdk-dev] [RFC v2 3/3] app/testpmd: Add RX buffer size dispaly in queue info querry Chengchang Tang
2020-07-28  6:29   ` [dpdk-dev] [RFC v2 0/3] add rx buffer size for rte_eth_rxq_info Chengchang Tang
2020-07-28  9:30     ` Ferruh Yigit
2020-07-28 11:39       ` Chengchang Tang
2020-07-28 15:27         ` Thomas Monjalon
  -- strict thread matches above, loose matches on Subject: below --
2020-06-18 12:35 [dpdk-dev] [RFC] ethdev: add a field " Chengchang Tang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f1f26668-e78a-fa5f-0760-b0929f1a62f7@solarflare.com \
    --to=arybchenko@solarflare.com \
    --cc=dev@dpdk.org \
    --cc=ferruh.yigit@intel.com \
    --cc=jerin.jacob@caviumnetworks.com \
    --cc=konstantin.ananyev@intel.com \
    --cc=linuxarm@huawei.com \
    --cc=matan@mellanox.com \
    --cc=olivier.matz@6wind.com \
    --cc=qi.z.zhang@intel.com \
    --cc=shys@mellanox.com \
    --cc=tangchengchang@huawei.com \
    --cc=thomas@monjalon.net \


* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

DPDK patches and discussions

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://inbox.dpdk.org/dev/0 dev/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 dev dev/ https://inbox.dpdk.org/dev \
	public-inbox-index dev

Example config snippet for mirrors.
Newsgroup available over NNTP:

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git