From: Andrew Rybchenko <arybchenko@solarflare.com>
To: Ferruh Yigit <ferruh.yigit@intel.com>,
Chengchang Tang <tangchengchang@huawei.com>, <dev@dpdk.org>
Cc: <linuxarm@huawei.com>, <thomas@monjalon.net>,
Olivier MATZ <olivier.matz@6wind.com>,
Matan Azrad <matan@mellanox.com>,
Jerin Jacob <jerin.jacob@caviumnetworks.com>,
Shy Shyman <shys@mellanox.com>, Qi Zhang <qi.z.zhang@intel.com>,
Konstantin Ananyev <konstantin.ananyev@intel.com>
Subject: Re: [dpdk-dev] [RFC] ethdev: add a field for rte_eth_rxq_info
Date: Thu, 25 Jun 2020 12:06:18 +0300 [thread overview]
Message-ID: <f1f26668-e78a-fa5f-0760-b0929f1a62f7@solarflare.com> (raw)
In-Reply-To: <ae00d51e-5785-0c7f-0ee6-c0dee8415da7@intel.com>
On 6/24/20 9:32 PM, Ferruh Yigit wrote:
> On 6/24/2020 9:52 AM, Ferruh Yigit wrote:
>> On 6/24/2020 4:48 AM, Chengchang Tang wrote:
>>>
>>> On 2020/6/23 17:30, Andrew Rybchenko wrote:
>>>> On 6/23/20 9:48 AM, Chengchang Tang wrote:
>>>>> In common practice, PMD configure the rx_buf_size according to the data
>>>>> room size of the object in mempool. But in fact the final value is related
>>>>> to the specifications of hw, and its values will affect the number of
>>>>> fragments in recieving pkts.
>>>>>
>>>>> At present, we seem to have no way to espose relevant information to upper
>>>>> layer users.
>>>>>
>>>>> Add a field named rx_bufsize in rte_eth_rxq_info to indicate the buffer
>>>>> size used in recieving pkts for hw.
>>>>>
>>>>
>>>> I'm OK with the change in general.
>>>> I'm unsure which name to use: 'rx_buf_size' or 'rx_bursize',
>>>> since I found both 'min_rx_buf_size' and 'min_rx_bufsize' in
>>>> ethdev.
>>>>
>>>> I think it is important to update PMDs which provides the
>>>> information to fill the field in.
>>>
>>> My plan is to divide the subsequent series into two patches,
>>> one to modify rte_eth_rxq_info, and one to add our hns3 PMD
>>> implementation of rxq_info_get. Should i update all the PMDs
>>> that provide this information and test programs such as
>>> testpmd at the same time?
>>
>> Hi Chengchang, Andrew,
>>
>> No objection to the change, but it should be crystal clear what is added. These
>> are for PMD developers to implement and when it is not clear we end up having
>> different implementations and inconsistencies.
>>
>> There is already some confusion for the Rx packet size etc.. my concern is
>> adding more to it, here all we have is "size of RX buffer." comment, I think we
>> need more.
>
> cc'ed a few more people.
>
> Back to this favorite topic, how to configure/limit the packet size.
>
> Can you please help to have a common/correct understanding? I tried to clarify
> as much as I got it, any comment welcome. (I know it is long, please bare with me)
>
>
> The related config options I can see,
> 1) rte_eth_conf->rxmode->max_rx_pkt_len
> 2) rte_eth_dev_info->max_rx_pktlen
> 3) DEV_RX_OFFLOAD_JUMBO_FRAME
> 4) rte_eth_dev->data->mtu
> 5) DEV_RX_OFFLOAD_SCATTER
> 6) dev->data->scattered_rx
> 7) rte_eth_dev_info->min_mtu, rte_eth_dev_info->max_mtu
>
> 'mtu' (4): Both for Tx and Rx. The network layer payload length. Default value
> 'RTE_ETHER_MTU'.
>
> 'max_rx_pkt_len' (1): Only for Rx, maximum Rx frame length configured by
> application.
>
> 'max_rx_pktlen' (2): Device reported value on what maximum Rx frame length it
> can receive. Application shouldn't set Rx frame length more than this value.
>
> 'DEV_RX_OFFLOAD_JUMBO_FRAME' (3): Device Jumbo Frame capability.
> When not enabled the Rx frame length is 'MTU' + overhead
> When enabled Rx frame length is 'max_rx_pkt_len'
>
> 'DEV_RX_OFFLOAD_SCATTER' (5): Capability to scatter packet to multiple
> descriptor by device and driver converting this to chained mbuf.
>
> 'dev->data->scattered_rx' (6): The current status of driver scattered Rx, in
> device data mostly for PMD internal usage.
>
> 'rte_eth_dev_info->min_mtu' & 'rte_eth_dev_info->max_mtu' (7): minimum and
> maximum MTU values device supported.
> 'max_mtu' == 'max_rx_pkt_len' - L2_OVERHEAD.
>
>
> I can see two different limits,
> a) The Rx frame length limit that device can receive from wire. Any packet
> larger than this size will be dropped by device in an early stage.
> b) The Rx buffer length limit that received packets are written to. Device
> shouldn't DMA larger than reserved buffer size.
>
> If device supports scattered Rx to multiple descriptors, it can be possible to
> configure (a) > (b).
> Otherwise configuration have to be (b) >= (a).
>
> For example if the mbuf size is 2Kb and the device can receive up to 9000 bytes.
> Options are:
> - If device supports it, large packet will be scattered on multiple mbufs
> - or need to configure device Rx frame length to 2K (mbuf size)
> - or need to allocate mbuf big enough to get largest possible packet (9000)
>
>
>
> Issues I see:
> -------------
>
> i) Although the code clearly says 'max_rx_pkt_len' is only valid when jumbo
> frames enabled, some drivers are taking it account always.
Ack, that's not good.
> ii) Some drivers enable 'scattered_rx' & 'jumbo frame' implicitly, without
> having 'DEV_RX_OFFLOAD_JUMBO_FRAME' or 'DEV_RX_OFFLOAD_SCATTER' requested by
> application.
Ack that it is a problem especially for scatter.
> iii) Having both 'mtu' & 'max_rx_pkt_len' are confusing, although they are not
> exactly same thing they are related. Difference is MTU applies for Tx too, and
> L2 network layer overhead is not included.
> 'MTU' can be more interested by upper layers, 'max_rx_pkt_len' is more driver
> level information. And driver should know how to convert one to another.
Agree
> iv) 'max_rx_pkt_len' provided as part of 'rte_eth_dev_configure()' and there is
> no API to update it later.
> 'mtu' is not part of 'rte_eth_dev_configure()', it can only be updated later
> with specific API.
> But driver have to keep these two values consistent.
Agree
> v) 'max_rx_pktlen' & 'max_mtu' reports from driver are redundant information.
> Again they are not same thing, but correlated.
Agree
>
> Suggested changes:
> -----------------
>
> Overall unify 'max_rx_pkt_len' & 'mtu' as much as possible, at first step:
>
> i) Make 'max_rx_pkt_len' independent from 'DEV_RX_OFFLOAD_JUMBO_FRAME', so
> 'max_rx_pkt_len' value will be always valid, jumbo frame enabled or not.
It will make handling of the max_rx_pkt_len mandatory in
all network PMDs.
>
> ii) in '.dev_configure' convert 'max_rx_pkt_len' value to 'mtu' value, this will
> be only point 'max_rx_pkt_len' is used, after that point PMD will always use
> 'mtu' value.
I'm not sure that it is a right direction.
Above you say that 'max_rx_pkt_len' is more driver level and
I agree with it. I guess most drivers operate it finally
(i.e. configure underlying HW in terms of max_rx_pkt_len,
not MTU). So, converted from max_rx_pkt_len to MTU on ethdev
level and covered back from MTU to max_rx_pkt_len in drivers.
> Even don't reflect 'rte_eth_dev_set_mtu()' changes to 'max_rx_pkt_len' anymore.
Not sure that I get it. max_rx_pkt_len is used on dev_configure
only. Is it reported on get somewhere?
>
> iii) Don't make 'max_rx_pkt_len' a mandatory config option, let it be '0' by
> application, in that case 'rte_eth_dev_configure()' will set
> "'max_rx_pkt_len' = RTE_ETHER_MAX_LEN" if 'DEV_RX_OFFLOAD_JUMBO_FRAME' disabled
> "'max_rx_pkt_len' = 9000 if 'DEV_RX_OFFLOAD_JUMBO_FRAME' enabled
Why 9000? IMHO, if max_rx_pkt_len is 0, just use value derived
from MTU.
>
> iv) Allow implicit update of 'DEV_RX_OFFLOAD_JUMBO_FRAME' on MTU set, since
> setting a large MTU implies the jumbo frame request. And there is no harm to
> application.
Yes and I'd deprecate DEV_RX_OFFLOAD_JUMBO_FRAME.
>
> v) Do NOT allow implicit update of 'DEV_RX_OFFLOAD_SCATTER' on MTU set (when Rx
> frame length > Rx buffer length), since application may not be capable of
> parsing chained mbufs. Instead fails the MTU set in that case.
> [This can break some applications, relying on this implicit set.]
Yes, definitely.
>
>
> Any comments?
>
>
>
> Additional details:
> -------------------
>
> Behavior of some drivers:
>
> What igb & ixgbe does
> - Set Rx frame limit (a) using 'max_rx_pkt_len' (1)
> - Set Rx buffer limit (b) using mbuf data size
> - Enable Scattered Rx (5 & 6) if the Rx frame limit (a) bigger than Rx buffer
> limit (b) (even user not requested for it)
>
> What i40e does same as above, only differences
> - Return error if jumbo frame enabled and 'max_rx_pkt_len' < RTE_ETHER_MAX_LEN
>
> sfc:
> - Set Rx frame limit (a)
> - using 'max_rx_pkt_len' (1) when jumbo frame enabled
> - using 'mtu' when jumbo frame not enabled.
> - Set Rx buffer limit (b) using mbuf data size
> - If Rx frame limit (a) bigger than Rx buffer limit (b), and user not requested
> 'DEV_RX_OFFLOAD_SCATTER', return error.
Ack
>
> octeontx2:
> - Set Rx frame limit (a) using 'max_rx_pkt_len' (1). Implicitly enable jumbo
> frame based on 'max_rx_pkt_len'.
> - I can't able find how Rx buffer limit (b) set
> - Enable Scattered Rx (5) if the Rx frame limit (a) bigger than Rx buffer limit
> (b) (even user not requested for it). 'dev->data->scattered_rx' not set at all.
>
>
>> Adding a PMD implementation and testpmd updates helps to clarify the
>> intention/usage, so I suggest sending them as a single patch with this one.
>>
>> Updating all PMDs is a bigger ask and sometimes too hard because of lack of
>> knowledge on the internals of other PMDs, although this is causing feature gaps
>> time to time, we are not mandating this to developers, so please update as many
>> PMD as you can, that you are confident, rest should be done by their maintainers.
>>
>>>>
>>>>> Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
>>>>
>>>> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
>>>>
>>>>> ---
>>>>> lib/librte_ethdev/rte_ethdev.h | 1 +
>>>>> 1 file changed, 1 insertion(+)
>>>>>
>>>>> diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
>>>>> index 0f6d053..82b7e98 100644
>>>>> --- a/lib/librte_ethdev/rte_ethdev.h
>>>>> +++ b/lib/librte_ethdev/rte_ethdev.h
>>>>> @@ -1306,6 +1306,7 @@ struct rte_eth_rxq_info {
>>>>> struct rte_eth_rxconf conf; /**< queue config parameters. */
>>>>> uint8_t scattered_rx; /**< scattered packets RX supported. */
>>>>> uint16_t nb_desc; /**< configured number of RXDs. */
>>>>> + uint16_t rx_bufsize; /**< size of RX buffer. */
>>>>> } __rte_cache_min_aligned;
>>>>>
>>>>> /**
next prev parent reply other threads:[~2020-06-25 9:06 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-06-23 6:48 Chengchang Tang
2020-06-23 9:30 ` Andrew Rybchenko
2020-06-24 3:48 ` Chengchang Tang
2020-06-24 8:52 ` Ferruh Yigit
2020-06-24 18:32 ` Ferruh Yigit
2020-06-25 9:06 ` Andrew Rybchenko [this message]
2020-06-23 14:48 ` Stephen Hemminger
2020-06-23 15:22 ` Andrew Rybchenko
2020-07-22 6:38 ` [dpdk-dev] [RFC v2 0/3] add rx buffer size " Chengchang Tang
2020-07-22 6:38 ` [dpdk-dev] [RFC v2 1/3] ethdev: add a field " Chengchang Tang
2020-07-22 6:38 ` [dpdk-dev] [RFC v2 2/3] net/hns3: add support for query of rx/tx queue info Chengchang Tang
2020-07-22 6:38 ` [dpdk-dev] [RFC v2 3/3] app/testpmd: Add RX buffer size dispaly in queue info querry Chengchang Tang
2020-07-28 6:29 ` [dpdk-dev] [RFC v2 0/3] add rx buffer size for rte_eth_rxq_info Chengchang Tang
2020-07-28 9:30 ` Ferruh Yigit
2020-07-28 11:39 ` Chengchang Tang
2020-07-28 15:27 ` Thomas Monjalon
-- strict thread matches above, loose matches on Subject: below --
2020-06-18 12:35 [dpdk-dev] [RFC] ethdev: add a field " Chengchang Tang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=f1f26668-e78a-fa5f-0760-b0929f1a62f7@solarflare.com \
--to=arybchenko@solarflare.com \
--cc=dev@dpdk.org \
--cc=ferruh.yigit@intel.com \
--cc=jerin.jacob@caviumnetworks.com \
--cc=konstantin.ananyev@intel.com \
--cc=linuxarm@huawei.com \
--cc=matan@mellanox.com \
--cc=olivier.matz@6wind.com \
--cc=qi.z.zhang@intel.com \
--cc=shys@mellanox.com \
--cc=tangchengchang@huawei.com \
--cc=thomas@monjalon.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).