From: Andrew Rybchenko <arybchenko@solarflare.com> To: Ferruh Yigit <ferruh.yigit@intel.com>, Chengchang Tang <tangchengchang@huawei.com>, <dev@dpdk.org> Cc: <linuxarm@huawei.com>, <thomas@monjalon.net>, Olivier MATZ <olivier.matz@6wind.com>, Matan Azrad <matan@mellanox.com>, Jerin Jacob <jerin.jacob@caviumnetworks.com>, Shy Shyman <shys@mellanox.com>, Qi Zhang <qi.z.zhang@intel.com>, Konstantin Ananyev <konstantin.ananyev@intel.com> Subject: Re: [dpdk-dev] [RFC] ethdev: add a field for rte_eth_rxq_info Date: Thu, 25 Jun 2020 12:06:18 +0300 Message-ID: <f1f26668-e78a-fa5f-0760-b0929f1a62f7@solarflare.com> (raw) In-Reply-To: <ae00d51e-5785-0c7f-0ee6-c0dee8415da7@intel.com> On 6/24/20 9:32 PM, Ferruh Yigit wrote: > On 6/24/2020 9:52 AM, Ferruh Yigit wrote: >> On 6/24/2020 4:48 AM, Chengchang Tang wrote: >>> >>> On 2020/6/23 17:30, Andrew Rybchenko wrote: >>>> On 6/23/20 9:48 AM, Chengchang Tang wrote: >>>>> In common practice, PMD configure the rx_buf_size according to the data >>>>> room size of the object in mempool. But in fact the final value is related >>>>> to the specifications of hw, and its values will affect the number of >>>>> fragments in recieving pkts. >>>>> >>>>> At present, we seem to have no way to espose relevant information to upper >>>>> layer users. >>>>> >>>>> Add a field named rx_bufsize in rte_eth_rxq_info to indicate the buffer >>>>> size used in recieving pkts for hw. >>>>> >>>> >>>> I'm OK with the change in general. >>>> I'm unsure which name to use: 'rx_buf_size' or 'rx_bursize', >>>> since I found both 'min_rx_buf_size' and 'min_rx_bufsize' in >>>> ethdev. >>>> >>>> I think it is important to update PMDs which provides the >>>> information to fill the field in. >>> >>> My plan is to divide the subsequent series into two patches, >>> one to modify rte_eth_rxq_info, and one to add our hns3 PMD >>> implementation of rxq_info_get. Should i update all the PMDs >>> that provide this information and test programs such as >>> testpmd at the same time? >> >> Hi Chengchang, Andrew, >> >> No objection to the change, but it should be crystal clear what is added. These >> are for PMD developers to implement and when it is not clear we end up having >> different implementations and inconsistencies. >> >> There is already some confusion for the Rx packet size etc.. my concern is >> adding more to it, here all we have is "size of RX buffer." comment, I think we >> need more. > > cc'ed a few more people. > > Back to this favorite topic, how to configure/limit the packet size. > > Can you please help to have a common/correct understanding? I tried to clarify > as much as I got it, any comment welcome. (I know it is long, please bare with me) > > > The related config options I can see, > 1) rte_eth_conf->rxmode->max_rx_pkt_len > 2) rte_eth_dev_info->max_rx_pktlen > 3) DEV_RX_OFFLOAD_JUMBO_FRAME > 4) rte_eth_dev->data->mtu > 5) DEV_RX_OFFLOAD_SCATTER > 6) dev->data->scattered_rx > 7) rte_eth_dev_info->min_mtu, rte_eth_dev_info->max_mtu > > 'mtu' (4): Both for Tx and Rx. The network layer payload length. Default value > 'RTE_ETHER_MTU'. > > 'max_rx_pkt_len' (1): Only for Rx, maximum Rx frame length configured by > application. > > 'max_rx_pktlen' (2): Device reported value on what maximum Rx frame length it > can receive. Application shouldn't set Rx frame length more than this value. > > 'DEV_RX_OFFLOAD_JUMBO_FRAME' (3): Device Jumbo Frame capability. > When not enabled the Rx frame length is 'MTU' + overhead > When enabled Rx frame length is 'max_rx_pkt_len' > > 'DEV_RX_OFFLOAD_SCATTER' (5): Capability to scatter packet to multiple > descriptor by device and driver converting this to chained mbuf. > > 'dev->data->scattered_rx' (6): The current status of driver scattered Rx, in > device data mostly for PMD internal usage. > > 'rte_eth_dev_info->min_mtu' & 'rte_eth_dev_info->max_mtu' (7): minimum and > maximum MTU values device supported. > 'max_mtu' == 'max_rx_pkt_len' - L2_OVERHEAD. > > > I can see two different limits, > a) The Rx frame length limit that device can receive from wire. Any packet > larger than this size will be dropped by device in an early stage. > b) The Rx buffer length limit that received packets are written to. Device > shouldn't DMA larger than reserved buffer size. > > If device supports scattered Rx to multiple descriptors, it can be possible to > configure (a) > (b). > Otherwise configuration have to be (b) >= (a). > > For example if the mbuf size is 2Kb and the device can receive up to 9000 bytes. > Options are: > - If device supports it, large packet will be scattered on multiple mbufs > - or need to configure device Rx frame length to 2K (mbuf size) > - or need to allocate mbuf big enough to get largest possible packet (9000) > > > > Issues I see: > ------------- > > i) Although the code clearly says 'max_rx_pkt_len' is only valid when jumbo > frames enabled, some drivers are taking it account always. Ack, that's not good. > ii) Some drivers enable 'scattered_rx' & 'jumbo frame' implicitly, without > having 'DEV_RX_OFFLOAD_JUMBO_FRAME' or 'DEV_RX_OFFLOAD_SCATTER' requested by > application. Ack that it is a problem especially for scatter. > iii) Having both 'mtu' & 'max_rx_pkt_len' are confusing, although they are not > exactly same thing they are related. Difference is MTU applies for Tx too, and > L2 network layer overhead is not included. > 'MTU' can be more interested by upper layers, 'max_rx_pkt_len' is more driver > level information. And driver should know how to convert one to another. Agree > iv) 'max_rx_pkt_len' provided as part of 'rte_eth_dev_configure()' and there is > no API to update it later. > 'mtu' is not part of 'rte_eth_dev_configure()', it can only be updated later > with specific API. > But driver have to keep these two values consistent. Agree > v) 'max_rx_pktlen' & 'max_mtu' reports from driver are redundant information. > Again they are not same thing, but correlated. Agree > > Suggested changes: > ----------------- > > Overall unify 'max_rx_pkt_len' & 'mtu' as much as possible, at first step: > > i) Make 'max_rx_pkt_len' independent from 'DEV_RX_OFFLOAD_JUMBO_FRAME', so > 'max_rx_pkt_len' value will be always valid, jumbo frame enabled or not. It will make handling of the max_rx_pkt_len mandatory in all network PMDs. > > ii) in '.dev_configure' convert 'max_rx_pkt_len' value to 'mtu' value, this will > be only point 'max_rx_pkt_len' is used, after that point PMD will always use > 'mtu' value. I'm not sure that it is a right direction. Above you say that 'max_rx_pkt_len' is more driver level and I agree with it. I guess most drivers operate it finally (i.e. configure underlying HW in terms of max_rx_pkt_len, not MTU). So, converted from max_rx_pkt_len to MTU on ethdev level and covered back from MTU to max_rx_pkt_len in drivers. > Even don't reflect 'rte_eth_dev_set_mtu()' changes to 'max_rx_pkt_len' anymore. Not sure that I get it. max_rx_pkt_len is used on dev_configure only. Is it reported on get somewhere? > > iii) Don't make 'max_rx_pkt_len' a mandatory config option, let it be '0' by > application, in that case 'rte_eth_dev_configure()' will set > "'max_rx_pkt_len' = RTE_ETHER_MAX_LEN" if 'DEV_RX_OFFLOAD_JUMBO_FRAME' disabled > "'max_rx_pkt_len' = 9000 if 'DEV_RX_OFFLOAD_JUMBO_FRAME' enabled Why 9000? IMHO, if max_rx_pkt_len is 0, just use value derived from MTU. > > iv) Allow implicit update of 'DEV_RX_OFFLOAD_JUMBO_FRAME' on MTU set, since > setting a large MTU implies the jumbo frame request. And there is no harm to > application. Yes and I'd deprecate DEV_RX_OFFLOAD_JUMBO_FRAME. > > v) Do NOT allow implicit update of 'DEV_RX_OFFLOAD_SCATTER' on MTU set (when Rx > frame length > Rx buffer length), since application may not be capable of > parsing chained mbufs. Instead fails the MTU set in that case. > [This can break some applications, relying on this implicit set.] Yes, definitely. > > > Any comments? > > > > Additional details: > ------------------- > > Behavior of some drivers: > > What igb & ixgbe does > - Set Rx frame limit (a) using 'max_rx_pkt_len' (1) > - Set Rx buffer limit (b) using mbuf data size > - Enable Scattered Rx (5 & 6) if the Rx frame limit (a) bigger than Rx buffer > limit (b) (even user not requested for it) > > What i40e does same as above, only differences > - Return error if jumbo frame enabled and 'max_rx_pkt_len' < RTE_ETHER_MAX_LEN > > sfc: > - Set Rx frame limit (a) > - using 'max_rx_pkt_len' (1) when jumbo frame enabled > - using 'mtu' when jumbo frame not enabled. > - Set Rx buffer limit (b) using mbuf data size > - If Rx frame limit (a) bigger than Rx buffer limit (b), and user not requested > 'DEV_RX_OFFLOAD_SCATTER', return error. Ack > > octeontx2: > - Set Rx frame limit (a) using 'max_rx_pkt_len' (1). Implicitly enable jumbo > frame based on 'max_rx_pkt_len'. > - I can't able find how Rx buffer limit (b) set > - Enable Scattered Rx (5) if the Rx frame limit (a) bigger than Rx buffer limit > (b) (even user not requested for it). 'dev->data->scattered_rx' not set at all. > > >> Adding a PMD implementation and testpmd updates helps to clarify the >> intention/usage, so I suggest sending them as a single patch with this one. >> >> Updating all PMDs is a bigger ask and sometimes too hard because of lack of >> knowledge on the internals of other PMDs, although this is causing feature gaps >> time to time, we are not mandating this to developers, so please update as many >> PMD as you can, that you are confident, rest should be done by their maintainers. >> >>>> >>>>> Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> >>>> >>>> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com> >>>> >>>>> --- >>>>> lib/librte_ethdev/rte_ethdev.h | 1 + >>>>> 1 file changed, 1 insertion(+) >>>>> >>>>> diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h >>>>> index 0f6d053..82b7e98 100644 >>>>> --- a/lib/librte_ethdev/rte_ethdev.h >>>>> +++ b/lib/librte_ethdev/rte_ethdev.h >>>>> @@ -1306,6 +1306,7 @@ struct rte_eth_rxq_info { >>>>> struct rte_eth_rxconf conf; /**< queue config parameters. */ >>>>> uint8_t scattered_rx; /**< scattered packets RX supported. */ >>>>> uint16_t nb_desc; /**< configured number of RXDs. */ >>>>> + uint16_t rx_bufsize; /**< size of RX buffer. */ >>>>> } __rte_cache_min_aligned; >>>>> >>>>> /**
next prev parent reply other threads:[~2020-06-25 9:06 UTC|newest] Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top 2020-06-23 6:48 Chengchang Tang 2020-06-23 9:30 ` Andrew Rybchenko 2020-06-24 3:48 ` Chengchang Tang 2020-06-24 8:52 ` Ferruh Yigit 2020-06-24 18:32 ` Ferruh Yigit 2020-06-25 9:06 ` Andrew Rybchenko [this message] 2020-06-23 14:48 ` Stephen Hemminger 2020-06-23 15:22 ` Andrew Rybchenko 2020-07-22 6:38 ` [dpdk-dev] [RFC v2 0/3] add rx buffer size " Chengchang Tang 2020-07-22 6:38 ` [dpdk-dev] [RFC v2 1/3] ethdev: add a field " Chengchang Tang 2020-07-22 6:38 ` [dpdk-dev] [RFC v2 2/3] net/hns3: add support for query of rx/tx queue info Chengchang Tang 2020-07-22 6:38 ` [dpdk-dev] [RFC v2 3/3] app/testpmd: Add RX buffer size dispaly in queue info querry Chengchang Tang 2020-07-28 6:29 ` [dpdk-dev] [RFC v2 0/3] add rx buffer size for rte_eth_rxq_info Chengchang Tang 2020-07-28 9:30 ` Ferruh Yigit 2020-07-28 11:39 ` Chengchang Tang 2020-07-28 15:27 ` Thomas Monjalon -- strict thread matches above, loose matches on Subject: below -- 2020-06-18 12:35 [dpdk-dev] [RFC] ethdev: add a field " Chengchang Tang
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=f1f26668-e78a-fa5f-0760-b0929f1a62f7@solarflare.com \ --to=arybchenko@solarflare.com \ --cc=dev@dpdk.org \ --cc=ferruh.yigit@intel.com \ --cc=jerin.jacob@caviumnetworks.com \ --cc=konstantin.ananyev@intel.com \ --cc=linuxarm@huawei.com \ --cc=matan@mellanox.com \ --cc=olivier.matz@6wind.com \ --cc=qi.z.zhang@intel.com \ --cc=shys@mellanox.com \ --cc=tangchengchang@huawei.com \ --cc=thomas@monjalon.net \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
DPDK patches and discussions This inbox may be cloned and mirrored by anyone: git clone --mirror https://inbox.dpdk.org/dev/0 dev/git/0.git # If you have public-inbox 1.1+ installed, you may # initialize and index your mirror using the following commands: public-inbox-init -V2 dev dev/ https://inbox.dpdk.org/dev \ dev@dpdk.org public-inbox-index dev Example config snippet for mirrors. Newsgroup available over NNTP: nntp://inbox.dpdk.org/inbox.dpdk.dev AGPL code for this site: git clone https://public-inbox.org/public-inbox.git