From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by dpdk.org (Postfix) with ESMTP id 2381099FE for ; Fri, 26 May 2017 09:25:59 +0200 (CEST) Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by fmsmga103.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 26 May 2017 00:25:58 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.38,395,1491289200"; d="scan'208";a="266632007" Received: from unknown (HELO localhost.localdomain) ([10.239.128.234]) by fmsmga004.fm.intel.com with ESMTP; 26 May 2017 00:25:54 -0700 Date: Fri, 26 May 2017 15:26:13 +0800 From: Jiayu Hu To: "Ananyev, Konstantin" Cc: "dev@dpdk.org" , "Wiles, Keith" , "yuanhan.liu@linux.intel.com" Message-ID: <20170526072613.GA85810@localhost.localdomain> References: <1491309106-94264-1-git-send-email-jiayu.hu@intel.com> <1493021398-115955-1-git-send-email-jiayu.hu@intel.com> <1493021398-115955-2-git-send-email-jiayu.hu@intel.com> <2601191342CEEE43887BDE71AB9772583FAF8C27@IRSMSX109.ger.corp.intel.com> <20170523103154.GA2033@localhost.localdomain> <2601191342CEEE43887BDE71AB9772583FAF9BAE@IRSMSX109.ger.corp.intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <2601191342CEEE43887BDE71AB9772583FAF9BAE@IRSMSX109.ger.corp.intel.com> User-Agent: Mutt/1.7.1 (2016-10-04) Subject: Re: [dpdk-dev] [PATCH v3 1/3] lib: add Generic Receive Offload API framework X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 26 May 2017 07:26:01 -0000 Hi Konstantin, On Wed, May 24, 2017 at 08:38:25PM +0800, Ananyev, Konstantin wrote: > > Hi Jiayu, > > > > > Hi Konstantin, > > > > Thanks for your comments. My replies/questions are below. > > > > BRs, > > Jiayu > > > > On Mon, May 22, 2017 at 05:19:19PM +0800, Ananyev, Konstantin wrote: > > > Hi Jiayu, > > > My comments/questions below. > > > Konstantin > > > > > > > > > > > For applications, DPDK GRO provides three external functions to > > > > enable/disable GRO: > > > > - rte_gro_init: initialize GRO environment; > > > > - rte_gro_enable: enable GRO for a given port; > > > > - rte_gro_disable: disable GRO for a given port. > > > > Before using GRO, applications should explicitly call rte_gro_init to > > > > initizalize GRO environment. After that, applications can call > > > > rte_gro_enable to enable GRO and call rte_gro_disable to disable GRO for > > > > specific ports. > > > > > > I think this is too restrictive and wouldn't meet various user's needs. > > > User might want to: > > > - enable/disable GRO for particular RX queue > > > - or even setup different GRO types for different RX queues, > > > i.e, - GRO over IPV4/TCP for queue 0, and GRO over IPV6/TCP for queue 1, etc. > > > > The reason for enabling/disabling GRO per-port instead of per-queue is that LINUX > > controls GRO per-port. To control GRO per-queue indeed can provide more flexibility > > to applications. But are there any scenarios that different queues of a port may > > require different GRO control (i.e. GRO types and enable/disable GRO)? > > I think yes. > > > > > > - For various reasons, user might prefer not to use RX callbacks for various reasons, > > > But invoke gro() manually at somepoint in his code. > > > > An application-used GRO library can enable more flexibility to applications. Besides, > > when perform GRO in ethdev layer or inside PMD drivers, it is an issue that > > rte_eth_rx_burst returns actually received packet number or GROed packet number. And > > the same issue happens in GSO, and even more seriously. This is because applications > > , like VPP, always rely on the return value of rte_eth_tx_burst to decide further > > operations. If applications can direcly call GRO/GSO libraries, this issue won't exist. > > And DPDK is a library, which is not a holistic system like LINUX. We don't need to do > > the same as LINUX. Therefore, maybe it's a better idea to directly provide SW > > segmentation/reassembling libraries to applications. > > > > > - Many users would like to control size (number of flows/items per flow), > > > max allowed packet size, max timeout, etc., for different GRO tables. > > > - User would need a way to flush all or only timeout packets from particular GRO tables. > > > > > > So I think that API needs to extended to become be much more fine-grained. > > > Something like that: > > > > > > struct rte_gro_tbl_param { > > > int32_t socket_id; > > > size_t max_flows; > > > size_t max_items_per_flow; > > > size_t max_pkt_size; > > > uint64_t packet_timeout_cycles; > > > > > > > > > ... > > > }; > > > > > > struct rte_gro_tbl; > > > strct rte_gro_tbl *rte_gro_tbl_create(const struct rte_gro_tbl_param *param); > > > > > > void rte_gro_tbl_destroy(struct rte_gro_tbl *tbl); > > > > Yes, I agree with you. It's necessary to provide more fine-grained control APIs to > > applications. But what's 'packet_timeout_cycles' used for? Is it for TCP packets? > > For any packets that sits in the gro_table for too long. > > > > > > > > > /* > > > * process packets, might store some packets inside the GRO table, > > > * returns number of filled entries in pkt[] > > > */ > > > uint32_t rte_gro_tbl_process(struct rte_gro_tbl *tbl, struct rte_mbuf *pkt[], uint32_t num); > > > > > > /* > > > * retirieves up to num timeouted packets from the table. > > > */ > > > uint32_t rtre_gro_tbl_timeout(struct rte_gro_tbl *tbl, uint64_t tmt, struct rte_mbuf *pkt[], uint32_t num); > > > > Currently, we implement GRO as RX callback, whose processing logic is simple: > > receive burst packets -> perform GRO -> return to application. GRO stops after > > finishing processing received packets. If we provide rte_gro_tbl_timeout, when > > and who will call it? > > I mean the following scenario: > We receive a packet, find it is eligible for GRO and put it into gro_table > in expectation - there would be more packets for the same flow. > But it could happen that we would never (or for some long time) receive > any new packets for that flow. > So the first packet would never be delivered to the upper layer, > or delivered too late. > So we need a mechanism to extract such not merged packets > and push them to the upper layer. > My thought it would be application responsibility to call it from time to time. > That's actually another reason why I think we shouldn't use application > to always use RX callbacks for GRO. Currently, we only provide one reassembly function, rte_gro_reassemble_burst, which merges N inputted packets at a time. After finishing processing these packets, it returns all of them and reset hashing tables. Therefore, there are no packets in hashing tables after rte_gro_reassemble_burst returns. If we provide rte_gro_tbl_timeout, we also need to provide another reassmebly function, like rte_gro_reassemble, which processes one given packet at a time and won't reset hashing tables. Applications decide when to flush packets in hashing tables. And rte_gro_tbl_timeout is one of the ways that can be used to flush the packets. Do you mean that? > > > > > > > > > And if you like to keep them as helper functions: > > > > > > int rte_gro_port_enable(uint8_t port, struct rte_gro_tbl_param *param); > > > void rte_gro_port_disable(uint8_t port); > > > > > > > Though from my perspective, it is out of scope of that library, > > > and I'd leave it to the upper layer to decide which callbacks and > > > in what order have to be installed on particular port. > > > > In my opinion, GRO library includes two parts. One is in charge of reassembling > > packets, the other is in charge of implementing GRO as RX callback. And > > rte_gro_port_enable/disable belongs to the second part. You mean we should let > > applications to register RX callback, and GRO library just provides reassembling > > related functions (e.g. rte_gro_tbl_process and rte_gro_tbl_create)? > > Yes, that would be my preference. > Let the application developer to decide how/when to > call GRO functions (callback, direct call, some combination). > > > > > > > > > .... > > > > > > > diff --git a/lib/librte_gro/rte_gro.c b/lib/librte_gro/rte_gro.c > > > > new file mode 100644 > > > > index 0000000..996b382 > > > > --- /dev/null > > > > +++ b/lib/librte_gro/rte_gro.c > > > > @@ -0,0 +1,219 @@ > > > > +#include > > > > +#include > > > > +#include > > > > +#include > > > > +#include > > > > + > > > > +#include "rte_gro.h" > > > > +#include "rte_gro_common.h" > > > > + > > > > +gro_reassemble_fn reassemble_functions[GRO_TYPE_MAX_NB] = {NULL}; > > > > +gro_tbl_create_fn tbl_create_functions[GRO_TYPE_MAX_NB] = {NULL}; > > > > > > Why not to initialize these arrays properly at building time (here)? > > > > Yes, I should declare them as static variables. > > > > > > > > > + > > > > +struct rte_gro_status *gro_status; > > > > + > > > > +/** > > > > + * Internal function. It creates one hashing table for all > > > > + * DPDK-supported GRO types, and all of them are stored in an object > > > > + * of struct rte_gro_tbl. > > > > + * > > > > + * @param name > > > > + * Name for GRO lookup table > > > > + * @param nb_entries > > > > + * Element number of each hashing table > > > > + * @param socket_id > > > > + * socket id > > > > + * @param gro_tbl > > > > + * gro_tbl points to a rte_gro_tbl object, which will be initalized > > > > + * inside rte_gro_tbl_setup. > > > > + * @return > > > > + * If create successfully, return a positive value; if not, return > > > > + * a negative value. > > > > + */ > > > > +static int > > > > +rte_gro_tbl_setup(char *name, uint32_t nb_entries, > > > > + uint16_t socket_id, struct rte_gro_tbl *gro_tbl) > > > > +{ > > > > + gro_tbl_create_fn create_tbl_fn; > > > > + const uint32_t len = strlen(name) + 10; > > > > + char tbl_name[len]; > > > > + int i; > > > > + > > > > + for (i = 0; i < GRO_SUPPORT_TYPE_NB; i++) { > > > > + sprintf(tbl_name, "%s_%u", name, i); > > > > + create_tbl_fn = tbl_create_functions[i]; > > > > + if (create_tbl_fn && (create_tbl_fn(name, > > > > + nb_entries, > > > > + socket_id, > > > > + &(gro_tbl-> > > > > + lkp_tbls[i].hash_tbl)) > > > > + < 0)) { > > > > > > Shouldn't you destroy already created tables here? > > > > Yes, I should add codes to destory created tables before creating new ones. > > > > > > > > > + return -1; > > > > + } > > > > + gro_tbl->lkp_tbls[i].gro_type = i; > > > > + } > > > > + return 1; > > > > +} > > > > + > > > > +/** > > > > + * Internal function. It frees all the hashing tables stored in > > > > + * the given struct rte_gro_tbl object. > > > > + */ > > > > +static void > > > > +rte_gro_tbl_destroy(struct rte_gro_tbl *gro_tbl) > > > > +{ > > > > + int i; > > > > + > > > > + if (gro_tbl == NULL) > > > > + return; > > > > + for (i = 0; i < GRO_SUPPORT_TYPE_NB; i++) { > > > > + rte_hash_free(gro_tbl->lkp_tbls[i].hash_tbl); > > > > + gro_tbl->lkp_tbls[i].hash_tbl = NULL; > > > > + gro_tbl->lkp_tbls[i].gro_type = GRO_EMPTY_TYPE; > > > > + } > > > > +} > > > > + > > > > +/** > > > > + * Internal function. It performs all supported GRO types on inputted > > > > + * packets. For example, if current DPDK GRO supports TCP/IPv4 and > > > > + * TCP/IPv6 GRO, this functions just reassembles TCP/IPv4 and TCP/IPv6 > > > > + * packets. Packets of unsupported GRO types won't be processed. For > > > > + * ethernet devices, which want to support GRO, this function is used to > > > > + * registered as RX callback for all queues. > > > > + * > > > > + * @param pkts > > > > + * Packets to reassemble. > > > > + * @param nb_pkts > > > > + * The number of packets to reassemble. > > > > + * @param gro_tbl > > > > + * pointer points to an object of struct rte_gro_tbl, which has been > > > > + * initialized by rte_gro_tbl_setup. > > > > + * @return > > > > + * Packet number after GRO. If reassemble successfully, the value is > > > > + * less than nb_pkts; if not, the value is equal to nb_pkts. If the > > > > + * parameters are invalid, return 0. > > > > + */ > > > > +static uint16_t > > > > +rte_gro_reassemble_burst(uint8_t port __rte_unused, > > > > + uint16_t queue __rte_unused, > > > > + struct rte_mbuf **pkts, > > > > + uint16_t nb_pkts, > > > > + uint16_t max_pkts __rte_unused, > > > > + void *gro_tbl) > > > > +{ > > > > + if ((gro_tbl == NULL) || (pkts == NULL)) { > > > > + printf("invalid parameters for GRO.\n"); > > > > + return 0; > > > > + } > > > > + uint16_t nb_after_gro = nb_pkts; > > > > > > Here and everywhere - please move variable definitions to the top of the block. > > > > Thanks. I will modify them in next version. > > > > > > > > > + > > > > + return nb_after_gro; > > > > +} > > > > + > > > > +void > > > > +rte_gro_init(void) > > > > +{ > > > > + uint8_t nb_port, i; > > > > + uint16_t nb_queue; > > > > + struct rte_eth_dev_info dev_info; > > > > + > > > > + /* if init already, return immediately */ > > > > + if (gro_status) { > > > > + printf("repeatly init GRO environment\n"); > > > > + return; > > > > + } > > > > + > > > > + gro_status = (struct rte_gro_status *)rte_zmalloc( > > > > + NULL, > > > > + sizeof(struct rte_gro_status), > > > > + 0); > > > > + > > > > + nb_port = rte_eth_dev_count(); > > > > > > Number of ports and number of configured queues per port can change dynamically. > > > In fact, I don't think we need that function and global gro_status at all. > > > To add/delete RX callback for particular port/queue - you can just scan over exisiting callbacks > > > Searching for particular cb_func and cb_arg values. > > > > Yes, it's better to store these parameters (e.g. hashing table pointers) as cb_arg. But > > we can't remove RX callback by searching for particular cb_func inside GRO library, since > > these operations need locking protection and the lock variable (i.e. rte_eth_rx_cb_lock) > > is unavailable to GRO library. To my knowledge, the only way is to provide a new API in > > lib/librte_ether/rte_ethdev.c to support removing RX callback via cb_func or cb_arg values. > > You mean we need to add this API? > > Yes my though was to add something like: > rte_eth_find_rx_callback() /*find specific callback *./ > or rte_eth_remove_get_rx_callbacks() /*get a copy of list of all currently installed callback */ > > But if we move installing GRO callbacks out of scope of the library, we probably wouldn't need it. > > > > > > > > > > + gro_status->ports = (struct gro_port_status *)rte_zmalloc( > > > > + NULL, > > > > + nb_port * sizeof(struct gro_port_status), > > > > + 0); > > > > + gro_status->nb_port = nb_port; > > > > + > > > > + for (i = 0; i < nb_port; i++) { > > > > + rte_eth_dev_info_get(i, &dev_info); > > > > + nb_queue = dev_info.nb_rx_queues; > > > > + gro_status->ports[i].gro_tbls = > > > > + (struct rte_gro_tbl **)rte_zmalloc( > > > > + NULL, > > > > + nb_queue * sizeof(struct rte_gro_tbl *), > > > > + 0); > > > > + gro_status->ports[i].gro_cbs = > > > > + (struct rte_eth_rxtx_callback **) > > > > + rte_zmalloc( > > > > + NULL, > > > > + nb_queue * > > > > + sizeof(struct rte_eth_rxtx_callback *), > > > > + 0); > > > > + } > > > > +} > > > > + > > > > +} > > > .... > > > > diff --git a/lib/librte_gro/rte_gro_common.h b/lib/librte_gro/rte_gro_common.h > > > .... > > > > > > > + > > > > +typedef int (*gro_tbl_create_fn)( > > > > + char *name, > > > > + uint32_t nb_entries, > > > > + uint16_t socket_id, > > > > + struct rte_hash **hash_tbl); > > > > + > > > > > > If you have create_fn, then you'll probably need a destroy_fn too. > > > Second thing why always assume that particular GRO implementation would > > > always use rte_hash to store it's data? > > > Probably better hide that inside something neutral, like 'struct gro_data;' or so. > > > > Agree. I will modify it. > > > > > > > > > +typedef int32_t (*gro_reassemble_fn)( > > > > + struct rte_hash *hash_tbl, > > > > + struct gro_item_list *item_list); > > > > +#endif > > > > diff --git a/mk/rte.app.mk b/mk/rte.app.mk > > > > index b5215c0..8956821 100644 > > > > --- a/mk/rte.app.mk > > > > +++ b/mk/rte.app.mk > > > > @@ -98,6 +98,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_RING) += -lrte_ring > > > > _LDLIBS-$(CONFIG_RTE_LIBRTE_EAL) += -lrte_eal > > > > _LDLIBS-$(CONFIG_RTE_LIBRTE_CMDLINE) += -lrte_cmdline > > > > _LDLIBS-$(CONFIG_RTE_LIBRTE_REORDER) += -lrte_reorder > > > > +_LDLIBS-$(CONFIG_RTE_LIBRTE_GRO) += -lrte_gro > > > > > > > > ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y) > > > > _LDLIBS-$(CONFIG_RTE_LIBRTE_KNI) += -lrte_kni > > > > -- > > > > 2.7.4