From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by dpdk.org (Postfix) with ESMTP id 37E527CB1 for ; Wed, 24 May 2017 14:38:29 +0200 (CEST) Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga105.jf.intel.com with ESMTP; 24 May 2017 05:38:27 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.38,386,1491289200"; d="scan'208";a="104831554" Received: from irsmsx154.ger.corp.intel.com ([163.33.192.96]) by orsmga005.jf.intel.com with ESMTP; 24 May 2017 05:38:26 -0700 Received: from irsmsx109.ger.corp.intel.com ([169.254.13.12]) by IRSMSX154.ger.corp.intel.com ([169.254.12.233]) with mapi id 14.03.0319.002; Wed, 24 May 2017 13:38:26 +0100 From: "Ananyev, Konstantin" To: "Hu, Jiayu" CC: "dev@dpdk.org" , "Wiles, Keith" , "yuanhan.liu@linux.intel.com" Thread-Topic: [PATCH v3 1/3] lib: add Generic Receive Offload API framework Thread-Index: AQHSvNIioTj7qilZNEunAvoVQTYKUKIAL6+AgAGlgACAAN/38A== Date: Wed, 24 May 2017 12:38:25 +0000 Message-ID: <2601191342CEEE43887BDE71AB9772583FAF9BAE@IRSMSX109.ger.corp.intel.com> References: <1491309106-94264-1-git-send-email-jiayu.hu@intel.com> <1493021398-115955-1-git-send-email-jiayu.hu@intel.com> <1493021398-115955-2-git-send-email-jiayu.hu@intel.com> <2601191342CEEE43887BDE71AB9772583FAF8C27@IRSMSX109.ger.corp.intel.com> <20170523103154.GA2033@localhost.localdomain> In-Reply-To: <20170523103154.GA2033@localhost.localdomain> Accept-Language: en-IE, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: dlp-product: dlpe-windows dlp-version: 10.0.102.7 dlp-reaction: no-action x-originating-ip: [163.33.239.180] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Subject: Re: [dpdk-dev] [PATCH v3 1/3] lib: add Generic Receive Offload API framework X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 24 May 2017 12:38:31 -0000 Hi Jiayu, >=20 > Hi Konstantin, >=20 > Thanks for your comments. My replies/questions are below. >=20 > BRs, > Jiayu >=20 > On Mon, May 22, 2017 at 05:19:19PM +0800, Ananyev, Konstantin wrote: > > Hi Jiayu, > > My comments/questions below. > > Konstantin > > > > > > > > For applications, DPDK GRO provides three external functions to > > > enable/disable GRO: > > > - rte_gro_init: initialize GRO environment; > > > - rte_gro_enable: enable GRO for a given port; > > > - rte_gro_disable: disable GRO for a given port. > > > Before using GRO, applications should explicitly call rte_gro_init to > > > initizalize GRO environment. After that, applications can call > > > rte_gro_enable to enable GRO and call rte_gro_disable to disable GRO = for > > > specific ports. > > > > I think this is too restrictive and wouldn't meet various user's needs. > > User might want to: > > - enable/disable GRO for particular RX queue > > - or even setup different GRO types for different RX queues, > > i.e, - GRO over IPV4/TCP for queue 0, and GRO over IPV6/TCP for que= ue 1, etc. >=20 > The reason for enabling/disabling GRO per-port instead of per-queue is th= at LINUX > controls GRO per-port. To control GRO per-queue indeed can provide more f= lexibility > to applications. But are there any scenarios that different queues of a p= ort may > require different GRO control (i.e. GRO types and enable/disable GRO)? I think yes. >=20 > > - For various reasons, user might prefer not to use RX callbacks for va= rious reasons, > > But invoke gro() manually at somepoint in his code. >=20 > An application-used GRO library can enable more flexibility to applicatio= ns. Besides, > when perform GRO in ethdev layer or inside PMD drivers, it is an issue th= at > rte_eth_rx_burst returns actually received packet number or GROed packet = number. And > the same issue happens in GSO, and even more seriously. This is because a= pplications > , like VPP, always rely on the return value of rte_eth_tx_burst to decide= further > operations. If applications can direcly call GRO/GSO libraries, this issu= e won't exist. > And DPDK is a library, which is not a holistic system like LINUX. We don'= t need to do > the same as LINUX. Therefore, maybe it's a better idea to directly provid= e SW > segmentation/reassembling libraries to applications. >=20 > > - Many users would like to control size (number of flows/items per flow= ), > > max allowed packet size, max timeout, etc., for different GRO tables. > > - User would need a way to flush all or only timeout packets from parti= cular GRO tables. > > > > So I think that API needs to extended to become be much more fine-grain= ed. > > Something like that: > > > > struct rte_gro_tbl_param { > > int32_t socket_id; > > size_t max_flows; > > size_t max_items_per_flow; > > size_t max_pkt_size; > > uint64_t packet_timeout_cycles; > > > > > > ... > > }; > > > > struct rte_gro_tbl; > > strct rte_gro_tbl *rte_gro_tbl_create(const struct rte_gro_tbl_param *p= aram); > > > > void rte_gro_tbl_destroy(struct rte_gro_tbl *tbl); >=20 > Yes, I agree with you. It's necessary to provide more fine-grained contro= l APIs to > applications. But what's 'packet_timeout_cycles' used for? Is it for TCP = packets? For any packets that sits in the gro_table for too long. >=20 > > > > /* > > * process packets, might store some packets inside the GRO table, > > * returns number of filled entries in pkt[] > > */ > > uint32_t rte_gro_tbl_process(struct rte_gro_tbl *tbl, struct rte_mbuf *= pkt[], uint32_t num); > > > > /* > > * retirieves up to num timeouted packets from the table. > > */ > > uint32_t rtre_gro_tbl_timeout(struct rte_gro_tbl *tbl, uint64_t tmt, st= ruct rte_mbuf *pkt[], uint32_t num); >=20 > Currently, we implement GRO as RX callback, whose processing logic is sim= ple: > receive burst packets -> perform GRO -> return to application. GRO stops = after > finishing processing received packets. If we provide rte_gro_tbl_timeout,= when > and who will call it? I mean the following scenario: We receive a packet, find it is eligible for GRO and put it into gro_table in expectation - there would be more packets for the same flow. But it could happen that we would never (or for some long time) receive any new packets for that flow. So the first packet would never be delivered to the upper layer, or delivered too late. So we need a mechanism to extract such not merged packets and push them to the upper layer. My thought it would be application responsibility to call it from time to t= ime. That's actually another reason why I think we shouldn't use application to always use RX callbacks for GRO. >=20 > > > > And if you like to keep them as helper functions: > > > > int rte_gro_port_enable(uint8_t port, struct rte_gro_tbl_param *param); > > void rte_gro_port_disable(uint8_t port); > > > > Though from my perspective, it is out of scope of that library, > > and I'd leave it to the upper layer to decide which callbacks and > > in what order have to be installed on particular port. >=20 > In my opinion, GRO library includes two parts. One is in charge of reasse= mbling > packets, the other is in charge of implementing GRO as RX callback. And > rte_gro_port_enable/disable belongs to the second part. You mean we shoul= d let > applications to register RX callback, and GRO library just provides reass= embling > related functions (e.g. rte_gro_tbl_process and rte_gro_tbl_create)? Yes, that would be my preference. Let the application developer to decide how/when to call GRO functions (callback, direct call, some combination). >=20 > > > > .... > > > > > diff --git a/lib/librte_gro/rte_gro.c b/lib/librte_gro/rte_gro.c > > > new file mode 100644 > > > index 0000000..996b382 > > > --- /dev/null > > > +++ b/lib/librte_gro/rte_gro.c > > > @@ -0,0 +1,219 @@ > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > + > > > +#include "rte_gro.h" > > > +#include "rte_gro_common.h" > > > + > > > +gro_reassemble_fn reassemble_functions[GRO_TYPE_MAX_NB] =3D {NULL}; > > > +gro_tbl_create_fn tbl_create_functions[GRO_TYPE_MAX_NB] =3D {NULL}; > > > > Why not to initialize these arrays properly at building time (here)? >=20 > Yes, I should declare them as static variables. >=20 > > > > > + > > > +struct rte_gro_status *gro_status; > > > + > > > +/** > > > + * Internal function. It creates one hashing table for all > > > + * DPDK-supported GRO types, and all of them are stored in an object > > > + * of struct rte_gro_tbl. > > > + * > > > + * @param name > > > + * Name for GRO lookup table > > > + * @param nb_entries > > > + * Element number of each hashing table > > > + * @param socket_id > > > + * socket id > > > + * @param gro_tbl > > > + * gro_tbl points to a rte_gro_tbl object, which will be initalized > > > + * inside rte_gro_tbl_setup. > > > + * @return > > > + * If create successfully, return a positive value; if not, return > > > + * a negative value. > > > + */ > > > +static int > > > +rte_gro_tbl_setup(char *name, uint32_t nb_entries, > > > + uint16_t socket_id, struct rte_gro_tbl *gro_tbl) > > > +{ > > > + gro_tbl_create_fn create_tbl_fn; > > > + const uint32_t len =3D strlen(name) + 10; > > > + char tbl_name[len]; > > > + int i; > > > + > > > + for (i =3D 0; i < GRO_SUPPORT_TYPE_NB; i++) { > > > + sprintf(tbl_name, "%s_%u", name, i); > > > + create_tbl_fn =3D tbl_create_functions[i]; > > > + if (create_tbl_fn && (create_tbl_fn(name, > > > + nb_entries, > > > + socket_id, > > > + &(gro_tbl-> > > > + lkp_tbls[i].hash_tbl)) > > > + < 0)) { > > > > Shouldn't you destroy already created tables here? >=20 > Yes, I should add codes to destory created tables before creating new one= s. >=20 > > > > > + return -1; > > > + } > > > + gro_tbl->lkp_tbls[i].gro_type =3D i; > > > + } > > > + return 1; > > > +} > > > + > > > +/** > > > + * Internal function. It frees all the hashing tables stored in > > > + * the given struct rte_gro_tbl object. > > > + */ > > > +static void > > > +rte_gro_tbl_destroy(struct rte_gro_tbl *gro_tbl) > > > +{ > > > + int i; > > > + > > > + if (gro_tbl =3D=3D NULL) > > > + return; > > > + for (i =3D 0; i < GRO_SUPPORT_TYPE_NB; i++) { > > > + rte_hash_free(gro_tbl->lkp_tbls[i].hash_tbl); > > > + gro_tbl->lkp_tbls[i].hash_tbl =3D NULL; > > > + gro_tbl->lkp_tbls[i].gro_type =3D GRO_EMPTY_TYPE; > > > + } > > > +} > > > + > > > +/** > > > + * Internal function. It performs all supported GRO types on inputte= d > > > + * packets. For example, if current DPDK GRO supports TCP/IPv4 and > > > + * TCP/IPv6 GRO, this functions just reassembles TCP/IPv4 and TCP/IP= v6 > > > + * packets. Packets of unsupported GRO types won't be processed. For > > > + * ethernet devices, which want to support GRO, this function is use= d to > > > + * registered as RX callback for all queues. > > > + * > > > + * @param pkts > > > + * Packets to reassemble. > > > + * @param nb_pkts > > > + * The number of packets to reassemble. > > > + * @param gro_tbl > > > + * pointer points to an object of struct rte_gro_tbl, which has bee= n > > > + * initialized by rte_gro_tbl_setup. > > > + * @return > > > + * Packet number after GRO. If reassemble successfully, the value i= s > > > + * less than nb_pkts; if not, the value is equal to nb_pkts. If the > > > + * parameters are invalid, return 0. > > > + */ > > > +static uint16_t > > > +rte_gro_reassemble_burst(uint8_t port __rte_unused, > > > + uint16_t queue __rte_unused, > > > + struct rte_mbuf **pkts, > > > + uint16_t nb_pkts, > > > + uint16_t max_pkts __rte_unused, > > > + void *gro_tbl) > > > +{ > > > + if ((gro_tbl =3D=3D NULL) || (pkts =3D=3D NULL)) { > > > + printf("invalid parameters for GRO.\n"); > > > + return 0; > > > + } > > > + uint16_t nb_after_gro =3D nb_pkts; > > > > Here and everywhere - please move variable definitions to the top of th= e block. >=20 > Thanks. I will modify them in next version. >=20 > > > > > + > > > + return nb_after_gro; > > > +} > > > + > > > +void > > > +rte_gro_init(void) > > > +{ > > > + uint8_t nb_port, i; > > > + uint16_t nb_queue; > > > + struct rte_eth_dev_info dev_info; > > > + > > > + /* if init already, return immediately */ > > > + if (gro_status) { > > > + printf("repeatly init GRO environment\n"); > > > + return; > > > + } > > > + > > > + gro_status =3D (struct rte_gro_status *)rte_zmalloc( > > > + NULL, > > > + sizeof(struct rte_gro_status), > > > + 0); > > > + > > > + nb_port =3D rte_eth_dev_count(); > > > > Number of ports and number of configured queues per port can change dyn= amically. > > In fact, I don't think we need that function and global gro_status at a= ll. > > To add/delete RX callback for particular port/queue - you can just scan= over exisiting callbacks > > Searching for particular cb_func and cb_arg values. >=20 > Yes, it's better to store these parameters (e.g. hashing table pointers) = as cb_arg. But > we can't remove RX callback by searching for particular cb_func inside GR= O library, since > these operations need locking protection and the lock variable (i.e. rte_= eth_rx_cb_lock) > is unavailable to GRO library. To my knowledge, the only way is to provid= e a new API in > lib/librte_ether/rte_ethdev.c to support removing RX callback via cb_func= or cb_arg values. > You mean we need to add this API? Yes my though was to add something like: rte_eth_find_rx_callback() /*find specific callback = *./ or rte_eth_remove_get_rx_callbacks() /*get a copy of list of all currentl= y installed callback */=20 But if we move installing GRO callbacks out of scope of the library, we pro= bably wouldn't need it. >=20 > > > > > + gro_status->ports =3D (struct gro_port_status *)rte_zmalloc( > > > + NULL, > > > + nb_port * sizeof(struct gro_port_status), > > > + 0); > > > + gro_status->nb_port =3D nb_port; > > > + > > > + for (i =3D 0; i < nb_port; i++) { > > > + rte_eth_dev_info_get(i, &dev_info); > > > + nb_queue =3D dev_info.nb_rx_queues; > > > + gro_status->ports[i].gro_tbls =3D > > > + (struct rte_gro_tbl **)rte_zmalloc( > > > + NULL, > > > + nb_queue * sizeof(struct rte_gro_tbl *), > > > + 0); > > > + gro_status->ports[i].gro_cbs =3D > > > + (struct rte_eth_rxtx_callback **) > > > + rte_zmalloc( > > > + NULL, > > > + nb_queue * > > > + sizeof(struct rte_eth_rxtx_callback *), > > > + 0); > > > + } > > > +} > > > + > > > +} > > .... > > > diff --git a/lib/librte_gro/rte_gro_common.h b/lib/librte_gro/rte_gro= _common.h > > .... > > > > > + > > > +typedef int (*gro_tbl_create_fn)( > > > + char *name, > > > + uint32_t nb_entries, > > > + uint16_t socket_id, > > > + struct rte_hash **hash_tbl); > > > + > > > > If you have create_fn, then you'll probably need a destroy_fn too. > > Second thing why always assume that particular GRO implementation would > > always use rte_hash to store it's data? > > Probably better hide that inside something neutral, like 'struct gro_da= ta;' or so. >=20 > Agree. I will modify it. >=20 > > > > > +typedef int32_t (*gro_reassemble_fn)( > > > + struct rte_hash *hash_tbl, > > > + struct gro_item_list *item_list); > > > +#endif > > > diff --git a/mk/rte.app.mk b/mk/rte.app.mk > > > index b5215c0..8956821 100644 > > > --- a/mk/rte.app.mk > > > +++ b/mk/rte.app.mk > > > @@ -98,6 +98,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_RING) +=3D -l= rte_ring > > > _LDLIBS-$(CONFIG_RTE_LIBRTE_EAL) +=3D -lrte_eal > > > _LDLIBS-$(CONFIG_RTE_LIBRTE_CMDLINE) +=3D -lrte_cmdline > > > _LDLIBS-$(CONFIG_RTE_LIBRTE_REORDER) +=3D -lrte_reorder > > > +_LDLIBS-$(CONFIG_RTE_LIBRTE_GRO) +=3D -lrte_gro > > > > > > ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y) > > > _LDLIBS-$(CONFIG_RTE_LIBRTE_KNI) +=3D -lrte_kni > > > -- > > > 2.7.4