From: Jerin Jacob
Date: Wed, 10 May 2017 19:42:13 +0530
To: Harry van Haaren
Cc: dev@dpdk.org, Gage Eads, Bruce Richardson
Message-ID: <20170510141202.GA8431@jerin>
References: <1492768299-84016-1-git-send-email-harry.van.haaren@intel.com> <1492768299-84016-2-git-send-email-harry.van.haaren@intel.com>
In-Reply-To: <1492768299-84016-2-git-send-email-harry.van.haaren@intel.com>
User-Agent: Mutt/1.8.2 (2017-04-18)
Subject: Re: [dpdk-dev] [PATCH 1/3] examples/eventdev_pipeline: added sample app

-----Original Message-----
> Date: Fri, 21 Apr 2017 10:51:37 +0100
> From: Harry van Haaren
> To: dev@dpdk.org
> CC: jerin.jacob@caviumnetworks.com, Harry van Haaren, Gage Eads, Bruce Richardson
> Subject: [PATCH 1/3] examples/eventdev_pipeline: added sample app
> X-Mailer: git-send-email 2.7.4
>
> This commit adds a sample app for the eventdev library.
> The app has been tested with DPDK 17.05-rc2, hence this
> release (or later) is recommended.
>
> The sample app showcases a pipeline processing use-case,
> with event scheduling and processing defined per stage.
> The application receives traffic as normal, with each
> packet traversing the pipeline. Once the packet has
> been processed by each of the pipeline stages, it is
> transmitted again.
>
> The app provides a framework to utilize cores for a single
> role or multiple roles.
> Examples of roles are the RX core,
> TX core, Scheduling core (in the case of the event/sw PMD),
> and worker cores.
>
> Various flags are available to configure numbers of stages,
> cycles of work at each stage, type of scheduling, number of
> worker cores, queue depths etc. For a full explanation,
> please refer to the documentation.
>
> Signed-off-by: Gage Eads
> Signed-off-by: Bruce Richardson
> Signed-off-by: Harry van Haaren

Thanks for the example application to share the SW view. I could make it
run on HW after some tweaking (not optimized though).

[...]

> +#define MAX_NUM_STAGES 8
> +#define BATCH_SIZE 16
> +#define MAX_NUM_CORE 64

How about RTE_MAX_LCORE?

> +
> +static unsigned int active_cores;
> +static unsigned int num_workers;
> +static unsigned long num_packets = (1L << 25); /* do ~32M packets */
> +static unsigned int num_fids = 512;
> +static unsigned int num_priorities = 1;

Looks like it's not used.

> +static unsigned int num_stages = 1;
> +static unsigned int worker_cq_depth = 16;
> +static int queue_type = RTE_EVENT_QUEUE_CFG_ATOMIC_ONLY;
> +static int16_t next_qid[MAX_NUM_STAGES+1] = {-1};
> +static int16_t qid[MAX_NUM_STAGES] = {-1};

Moving all the fastpath-related variables into one cache-aligned structure
would help (a rough sketch follows at the end of this hunk).

> +static int worker_cycles;
> +static int enable_queue_priorities;
> +
> +struct prod_data {
> +        uint8_t dev_id;
> +        uint8_t port_id;
> +        int32_t qid;
> +        unsigned num_nic_ports;
> +};

Cache aligned?

> +
> +struct cons_data {
> +        uint8_t dev_id;
> +        uint8_t port_id;
> +};
> +

Cache aligned?

> +static struct prod_data prod_data;
> +static struct cons_data cons_data;
> +
> +struct worker_data {
> +        uint8_t dev_id;
> +        uint8_t port_id;
> +};

Cache aligned?

> +
> +static unsigned *enqueue_cnt;
> +static unsigned *dequeue_cnt;
> +
> +static volatile int done;
> +static volatile int prod_stop;

No one is updating prod_stop.

> +static int quiet;
> +static int dump_dev;
> +static int dump_dev_signal;
> +
> +static uint32_t rx_lock;
> +static uint32_t tx_lock;
> +static uint32_t sched_lock;
> +static bool rx_single;
> +static bool tx_single;
> +static bool sched_single;
> +
> +static unsigned rx_core[MAX_NUM_CORE];
> +static unsigned tx_core[MAX_NUM_CORE];
> +static unsigned sched_core[MAX_NUM_CORE];
> +static unsigned worker_core[MAX_NUM_CORE];
> +
> +static bool
> +core_in_use(unsigned lcore_id) {
> +        return (rx_core[lcore_id] || sched_core[lcore_id] ||
> +                tx_core[lcore_id] || worker_core[lcore_id]);
> +}
> +
> +static struct rte_eth_dev_tx_buffer *tx_buf[RTE_MAX_ETHPORTS];
> +
> +static void
> +rte_eth_tx_buffer_retry(struct rte_mbuf **pkts, uint16_t unsent,
> +                        void *userdata)

IMO, it is better not to use the rte_eth_* prefix for application functions.
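On the cache-alignment comments above, something along these lines (untested;
the struct name is only illustrative and it reuses the patch's MAX_NUM_STAGES
define) is what I had in mind for grouping the fastpath variables:

/* hot configuration grouped into one cache-line aligned structure
 * (__rte_cache_aligned from rte_memory.h) so the fastpath data does
 * not false-share with unrelated globals */
struct fastpath_data {
        unsigned int num_fids;
        unsigned int num_stages;
        unsigned int worker_cq_depth;
        int worker_cycles;
        int queue_type;
        int16_t next_qid[MAX_NUM_STAGES + 1];
        int16_t qid[MAX_NUM_STAGES];
} __rte_cache_aligned;

static struct fastpath_data fdata;

The per-role structs (prod_data, cons_data, worker_data) could take the same
attribute.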
> +{
> +        int port_id = (uintptr_t) userdata;
> +        unsigned _sent = 0;
> +
> +        do {
> +                /* Note: hard-coded TX queue */
> +                _sent += rte_eth_tx_burst(port_id, 0, &pkts[_sent],
> +                                unsent - _sent);
> +        } while (_sent != unsent);
> +}
> +
> +static int
> +consumer(void)
> +{
> +        const uint64_t freq_khz = rte_get_timer_hz() / 1000;
> +        struct rte_event packets[BATCH_SIZE];
> +
> +        static uint64_t npackets;
> +        static uint64_t received;
> +        static uint64_t received_printed;
> +        static uint64_t time_printed;
> +        static uint64_t start_time;
> +        unsigned i, j;
> +        uint8_t dev_id = cons_data.dev_id;
> +        uint8_t port_id = cons_data.port_id;
> +
> +        if (!npackets)
> +                npackets = num_packets;
> +
> +        do {
> +                uint16_t n = rte_event_dequeue_burst(dev_id, port_id,
> +                                packets, RTE_DIM(packets), 0);

const uint16_t n =

> +
> +                if (n == 0) {
> +                        for (j = 0; j < rte_eth_dev_count(); j++)
> +                                rte_eth_tx_buffer_flush(j, 0, tx_buf[j]);
> +                        return 0;
> +                }
> +                if (start_time == 0)
> +                        time_printed = start_time = rte_get_timer_cycles();
> +
> +                received += n;
> +                for (i = 0; i < n; i++) {
> +                        uint8_t outport = packets[i].mbuf->port;
> +                        rte_eth_tx_buffer(outport, 0, tx_buf[outport],
> +                                        packets[i].mbuf);
> +                }
> +
> +                if (!quiet && received >= received_printed + (1<<22)) {
> +                        const uint64_t now = rte_get_timer_cycles();
> +                        const uint64_t delta_cycles = now - start_time;
> +                        const uint64_t elapsed_ms = delta_cycles / freq_khz;
> +                        const uint64_t interval_ms =
> +                                        (now - time_printed) / freq_khz;
> +
> +                        uint64_t rx_noprint = received - received_printed;
> +                        printf("# consumer RX=%"PRIu64", time %"PRIu64
> +                                "ms, avg %.3f mpps [current %.3f mpps]\n",
> +                                received, elapsed_ms,
> +                                (received) / (elapsed_ms * 1000.0),
> +                                rx_noprint / (interval_ms * 1000.0));
> +                        received_printed = received;
> +                        time_printed = now;
> +                }
> +
> +                dequeue_cnt[0] += n;
> +
> +                if (num_packets > 0 && npackets > 0) {
> +                        npackets -= n;
> +                        if (npackets == 0 || npackets > num_packets)
> +                                done = 1;
> +                }

This looks like very complicated logic; I think we can simplify it.

> +        } while (0);

Is do { } while (0) really required here?

> +
> +        return 0;
> +}
> +
> +static int
> +producer(void)
> +{
> +        static uint8_t eth_port;
> +        struct rte_mbuf *mbufs[BATCH_SIZE];
> +        struct rte_event ev[BATCH_SIZE];
> +        uint32_t i, num_ports = prod_data.num_nic_ports;
> +        int32_t qid = prod_data.qid;
> +        uint8_t dev_id = prod_data.dev_id;
> +        uint8_t port_id = prod_data.port_id;
> +        uint32_t prio_idx = 0;
> +
> +        const uint16_t nb_rx = rte_eth_rx_burst(eth_port, 0, mbufs, BATCH_SIZE);
> +        if (++eth_port == num_ports)
> +                eth_port = 0;
> +        if (nb_rx == 0) {
> +                rte_pause();
> +                return 0;
> +        }
> +
> +        for (i = 0; i < nb_rx; i++) {
> +                ev[i].flow_id = mbufs[i]->hash.rss;

Prefetching mbufs[i+1] may help here? (A small sketch follows at the end of
this hunk.)

> +                ev[i].op = RTE_EVENT_OP_NEW;
> +                ev[i].sched_type = queue_type;

The value of RTE_EVENT_QUEUE_CFG_ORDERED_ONLY != RTE_SCHED_TYPE_ORDERED, so
we cannot assign queue_type to .sched_type directly. I think one option to
avoid the translation in the application would be to:

- Remove RTE_EVENT_QUEUE_CFG_ALL_TYPES and RTE_EVENT_QUEUE_CFG_*_ONLY
- Introduce a new RTE_EVENT_DEV_CAP_ flag to denote the
  RTE_EVENT_QUEUE_CFG_ALL_TYPES capability
- Add sched_type to struct rte_event_queue_conf. If the capability flag is
  not set, the implementation takes the sched_type value for the queue.

Any thoughts?

> +                ev[i].queue_id = qid;
> +                ev[i].event_type = RTE_EVENT_TYPE_CPU;

IMO, RTE_EVENT_TYPE_ETHERNET is the better option here as it is producing
the Ethernet packets/events.
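On the prefetch comment above in producer(), an untested sketch of what I
mean (same mbufs[]/ev[] layout as in the patch; the helper name is only
illustrative, and it needs rte_prefetch.h, rte_mbuf.h and rte_eventdev.h):

/* prefetch the next mbuf header one iteration ahead so the hash.rss
 * load in the following iteration is more likely to hit the cache */
static inline void
fill_event_burst(struct rte_mbuf **mbufs, struct rte_event *ev,
                 uint16_t nb_rx, uint8_t queue_id)
{
        uint16_t i;

        for (i = 0; i < nb_rx; i++) {
                if (i + 1 < nb_rx)
                        rte_prefetch0(mbufs[i + 1]);

                ev[i].flow_id = mbufs[i]->hash.rss;
                ev[i].op = RTE_EVENT_OP_NEW;
                ev[i].queue_id = queue_id;
                ev[i].sched_type = RTE_SCHED_TYPE_ATOMIC; /* or per discussion above */
                ev[i].event_type = RTE_EVENT_TYPE_CPU;
                ev[i].sub_event_type = 0;
                ev[i].priority = RTE_EVENT_DEV_PRIORITY_NORMAL;
                ev[i].mbuf = mbufs[i];
        }
}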
> +                ev[i].sub_event_type = 0;
> +                ev[i].priority = RTE_EVENT_DEV_PRIORITY_NORMAL;
> +                ev[i].mbuf = mbufs[i];
> +                RTE_SET_USED(prio_idx);
> +        }
> +
> +        const int nb_tx = rte_event_enqueue_burst(dev_id, port_id, ev, nb_rx);

For the producer pattern, i.e. a burst of RTE_EVENT_OP_NEW, OcteonTX can do
a burst operation, unlike the FORWARD case (which is one event at a time).
Earlier I thought I could abstract the producer pattern in the PMD, but it
looks like we are going with an application-driven producer model based on
the latest RFC. So I think we can add a flag to rte_event_enqueue_burst to
hint that all the events are of type RTE_EVENT_OP_NEW; the SW driver can
ignore it. I can send a patch for the same. Any thoughts?

> +        if (nb_tx != nb_rx) {
> +                for (i = nb_tx; i < nb_rx; i++)
> +                        rte_pktmbuf_free(mbufs[i]);
> +        }
> +        enqueue_cnt[0] += nb_tx;
> +
> +        if (unlikely(prod_stop))

I think no one is updating prod_stop.

> +                done = 1;
> +
> +        return 0;
> +}
> +
> +static inline void
> +schedule_devices(uint8_t dev_id, unsigned lcore_id)
> +{
> +        if (rx_core[lcore_id] && (rx_single ||
> +            rte_atomic32_cmpset(&rx_lock, 0, 1))) {

This pattern (rte_atomic32_cmpset) means the application can inject only
"one core" worth of packets, which is not enough for low-end cores. Maybe
we need multiple producer options; I think the new RFC is addressing it.

> +                producer();
> +                rte_atomic32_clear((rte_atomic32_t *)&rx_lock);
> +        }
> +
> +        if (sched_core[lcore_id] && (sched_single ||
> +            rte_atomic32_cmpset(&sched_lock, 0, 1))) {
> +                rte_event_schedule(dev_id);
> +                if (dump_dev_signal) {
> +                        rte_event_dev_dump(0, stdout);
> +                        dump_dev_signal = 0;
> +                }
> +                rte_atomic32_clear((rte_atomic32_t *)&sched_lock);
> +        }

There is a lot of unwanted code here if RTE_EVENT_DEV_CAP_DISTRIBUTED_SCHED
is set. I think we can keep the code common and select the worker at launch
time based on the flag, i.e.

rte_eal_remote_launch(worker_x, &worker_data[worker_idx], lcore_id);
rte_eal_remote_launch(worker_y, &worker_data[worker_idx], lcore_id);

Maybe we can improve this after the initial version.

> +
> +        if (tx_core[lcore_id] && (tx_single ||
> +            rte_atomic32_cmpset(&tx_lock, 0, 1))) {
> +                consumer();

Does consumer() need to come in this pattern? I am thinking that, if an
event is from the last stage, worker() could call consumer() directly.
That scheme works better when the _same_ worker code needs to cover both
cases:

1) the ethdev HW is capable of enqueuing packets to the same txq from
   multiple threads
2) the ethdev is not capable of doing so

Both cases can then be addressed at configuration time, when the queues are
linked to the ports:

case 1) link all worker ports to the last queue
case 2) link only the TX port to the last queue

while keeping the common worker code (a rough sketch follows below). The HW
implementation has functional and performance issues if "two" ports are
assigned to one lcore for dequeue; the above scheme fixes that problem too.
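To make the two linking cases concrete, a rough configuration-time sketch
(untested; the function name and port/queue numbering are only illustrative):

/* case 1: MT-safe ethdev txq -> link every worker port to the last stage
 * queue so workers transmit directly.
 * case 2: otherwise -> link only the dedicated TX port; one core drains it. */
static int
link_last_stage(uint8_t dev_id, uint8_t last_qid, uint8_t nb_workers,
                uint8_t tx_port, int txq_is_mt_safe)
{
        uint8_t prio = RTE_EVENT_DEV_PRIORITY_NORMAL;
        uint8_t p;

        if (txq_is_mt_safe) {
                for (p = 0; p < nb_workers; p++)
                        if (rte_event_port_link(dev_id, p, &last_qid,
                                                &prio, 1) != 1)
                                return -1;
                return 0;
        }

        return rte_event_port_link(dev_id, tx_port, &last_qid,
                                   &prio, 1) == 1 ? 0 : -1;
}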
> +                rte_atomic32_clear((rte_atomic32_t *)&tx_lock);
> +        }
> +}
> +
> +static int
> +worker(void *arg)
> +{
> +        struct rte_event events[BATCH_SIZE];
> +
> +        struct worker_data *data = (struct worker_data *)arg;
> +        uint8_t dev_id = data->dev_id;
> +        uint8_t port_id = data->port_id;
> +        size_t sent = 0, received = 0;
> +        unsigned lcore_id = rte_lcore_id();
> +
> +        while (!done) {
> +                uint16_t i;
> +
> +                schedule_devices(dev_id, lcore_id);
> +
> +                if (!worker_core[lcore_id]) {
> +                        rte_pause();
> +                        continue;
> +                }
> +
> +                uint16_t nb_rx = rte_event_dequeue_burst(dev_id, port_id,
> +                                events, RTE_DIM(events), 0);
> +
> +                if (nb_rx == 0) {
> +                        rte_pause();
> +                        continue;
> +                }
> +                received += nb_rx;
> +
> +                for (i = 0; i < nb_rx; i++) {
> +                        struct ether_hdr *eth;
> +                        struct ether_addr addr;
> +                        struct rte_mbuf *m = events[i].mbuf;
> +
> +                        /* The first worker stage does classification */
> +                        if (events[i].queue_id == qid[0])
> +                                events[i].flow_id = m->hash.rss % num_fids;

Not sure why we need to do this (shrinking the flows) in worker() in a
queue-based pipeline. If a PMD has a specific requirement on num_fids, I
think we can move this to the configuration stage, or the PMD can choose an
optimum flow id internally, to avoid the modulo-operation tax in the
fastpath of all PMDs. Does struct rte_event_queue_conf.nb_atomic_flows help
here?

> +
> +                        events[i].queue_id = next_qid[events[i].queue_id];
> +                        events[i].op = RTE_EVENT_OP_FORWARD;

events[i].sched_type is missing; the HW PMD does not work without it. I
think we can use a scheme similar to next_qid for a next_sched_type.

> +
> +                        /* change mac addresses on packet (to use mbuf data) */
> +                        eth = rte_pktmbuf_mtod(m, struct ether_hdr *);
> +                        ether_addr_copy(&eth->d_addr, &addr);
> +                        ether_addr_copy(&eth->s_addr, &eth->d_addr);
> +                        ether_addr_copy(&addr, &eth->s_addr);

IMO, we can make the packet-processing code a "static inline" function so
different worker types can reuse it (a small sketch follows at the end of
this hunk).

> +
> +                        /* do a number of cycles of work per packet */
> +                        volatile uint64_t start_tsc = rte_rdtsc();
> +                        while (rte_rdtsc() < start_tsc + worker_cycles)
> +                                rte_pause();

Ditto. I think all worker-specific variables like "worker_cycles" can be
moved into one structure and used from there.

> +                }
> +                uint16_t nb_tx = rte_event_enqueue_burst(dev_id, port_id,
> +                                events, nb_rx);
> +                while (nb_tx < nb_rx && !done)
> +                        nb_tx += rte_event_enqueue_burst(dev_id, port_id,
> +                                        events + nb_tx,
> +                                        nb_rx - nb_tx);
> +                sent += nb_tx;
> +        }
> +
> +        if (!quiet)
> +                printf("  worker %u thread done. RX=%zu TX=%zu\n",
> +                        rte_lcore_id(), received, sent);
> +
> +        return 0;
> +}
> +
> +/*
> + * Parse the coremask given as argument (hexadecimal string) and fill
> + * the global configuration (core role and core count) with the parsed
> + * value.
> + */
> +static int xdigit2val(unsigned char c)

There are multiple instances of "xdigit2val" in the DPDK repo; maybe we can
push this as common code.

> +{
> +        int val;
> +
> +        if (isdigit(c))
> +                val = c - '0';
> +        else if (isupper(c))
> +                val = c - 'A' + 10;
> +        else
> +                val = c - 'a' + 10;
> +        return val;
> +}
> +
> +
> +static void
> +usage(void)
> +{
> +        const char *usage_str =
> +                "  Usage: eventdev_demo [options]\n"
> +                "  Options:\n"
> +                "  -n, --packets=N             Send N packets (default ~32M), 0 implies no limit\n"
> +                "  -f, --atomic-flows=N        Use N random flows from 1 to N (default 16)\n"

I think this parameter now affects the application fastpath code; it should
rather be an eventdev configuration parameter.
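On the "static inline function" comment above, the shared packet-processing
step could look roughly like this (same mac-swap work as in the patch; the
function name is only illustrative, and it needs rte_ether.h and rte_mbuf.h):

/* swap source and destination MAC addresses so the worker touches
 * the mbuf data, reusable by any worker variant */
static inline void
work_on_packet(struct rte_mbuf *m)
{
        struct ether_hdr *eth = rte_pktmbuf_mtod(m, struct ether_hdr *);
        struct ether_addr addr;

        ether_addr_copy(&eth->d_addr, &addr);
        ether_addr_copy(&eth->s_addr, &eth->d_addr);
        ether_addr_copy(&addr, &eth->s_addr);
}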
> +                "  -s, --num_stages=N          Use N atomic stages (default 1)\n"
> +                "  -r, --rx-mask=core mask     Run NIC rx on CPUs in core mask\n"
> +                "  -w, --worker-mask=core mask Run worker on CPUs in core mask\n"
> +                "  -t, --tx-mask=core mask     Run NIC tx on CPUs in core mask\n"
> +                "  -e  --sched-mask=core mask  Run scheduler on CPUs in core mask\n"
> +                "  -c  --cq-depth=N            Worker CQ depth (default 16)\n"
> +                "  -W  --work-cycles=N         Worker cycles (default 0)\n"
> +                "  -P  --queue-priority        Enable scheduler queue prioritization\n"
> +                "  -o, --ordered               Use ordered scheduling\n"
> +                "  -p, --parallel              Use parallel scheduling\n"

IMO, all stages being "parallel" or "ordered" or "atomic" is only one mode
of operation; any combination is also valid. We need a way to express that
on the command line, for example: 3 stages with O->A->P.

> +                "  -q, --quiet                 Minimize printed output\n"
> +                "  -D, --dump                  Print detailed statistics before exit"
> +                "\n";
> +        fprintf(stderr, "%s", usage_str);
> +        exit(1);
> +}
> +

[...]

> +                        rx_single = (popcnt == 1);
> +                        break;
> +                case 't':
> +                        tx_lcore_mask = parse_coremask(optarg);
> +                        popcnt = __builtin_popcountll(tx_lcore_mask);
> +                        tx_single = (popcnt == 1);
> +                        break;
> +                case 'e':
> +                        sched_lcore_mask = parse_coremask(optarg);
> +                        popcnt = __builtin_popcountll(sched_lcore_mask);
> +                        sched_single = (popcnt == 1);
> +                        break;
> +                default:
> +                        usage();
> +                }
> +        }
> +
> +        if (worker_lcore_mask == 0 || rx_lcore_mask == 0 ||
> +            sched_lcore_mask == 0 || tx_lcore_mask == 0) {

This needs to honor RTE_EVENT_DEV_CAP_DISTRIBUTED_SCHED, i.e. a
sched_lcore_mask of zero can be a valid case.

> +                printf("Core part of pipeline was not assigned any cores. "
> +                        "This will stall the pipeline, please check core masks "
> +                        "(use -h for details on setting core masks):\n"
> +                        "\trx: %"PRIu64"\n\ttx: %"PRIu64"\n\tsched: %"PRIu64
> +                        "\n\tworkers: %"PRIu64"\n",
> +                        rx_lcore_mask, tx_lcore_mask, sched_lcore_mask,
> +                        worker_lcore_mask);
> +                rte_exit(-1, "Fix core masks\n");
> +        }
> +        if (num_stages == 0 || num_stages > MAX_NUM_STAGES)
> +                usage();
> +
> +        for (i = 0; i < MAX_NUM_CORE; i++) {
> +                rx_core[i] = !!(rx_lcore_mask & (1UL << i));
> +                tx_core[i] = !!(tx_lcore_mask & (1UL << i));
> +                sched_core[i] = !!(sched_lcore_mask & (1UL << i));
> +                worker_core[i] = !!(worker_lcore_mask & (1UL << i));
> +
> +                if (worker_core[i])
> +                        num_workers++;
> +                if (core_in_use(i))
> +                        active_cores++;
> +        }
> +}
> +
> +
> +struct port_link {
> +        uint8_t queue_id;
> +        uint8_t priority;
> +};
> +
> +static int
> +setup_eventdev(struct prod_data *prod_data,
> +                struct cons_data *cons_data,
> +                struct worker_data *worker_data)
> +{
> +        const uint8_t dev_id = 0;
> +        /* +1 stages is for a SINGLE_LINK TX stage */
> +        const uint8_t nb_queues = num_stages + 1;
> +        /* + 2 is one port for producer and one for consumer */
> +        const uint8_t nb_ports = num_workers + 2;

The selection of the number of ports is a function of
rte_event_has_producer(); I think it will be addressed with the RFC.

> +        const struct rte_event_dev_config config = {
> +                        .nb_event_queues = nb_queues,
> +                        .nb_event_ports = nb_ports,
> +                        .nb_events_limit = 4096,
> +                        .nb_event_queue_flows = 1024,
> +                        .nb_event_port_dequeue_depth = 128,
> +                        .nb_event_port_enqueue_depth = 128,

The OCTEONTX PMD has .nb_event_port_dequeue_depth = 1,
.nb_event_port_enqueue_depth = 1 and struct
rte_event_dev_info.min_dequeue_timeout_ns = 853. I think we need to call
rte_event_dev_info_get() first to get sane values and take RTE_MIN or
RTE_MAX based on the use case, or I can ignore these values in the OCTEONTX
PMD. But I am not sure about the NXP case; any thoughts from the NXP folks?
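A rough, untested sketch of what I mean by checking the device info first,
as it could look inside setup_eventdev() (dev_id, nb_queues and nb_ports as
in the patch; the 128/4096 numbers are just the patch's current values):

struct rte_event_dev_info info;
struct rte_event_dev_config config = {0};
int ret = rte_event_dev_info_get(dev_id, &info);

if (ret < 0)
        return ret;

config.nb_event_queues = nb_queues;
config.nb_event_ports = nb_ports;
config.nb_event_queue_flows = 1024;
config.dequeue_timeout_ns = info.min_dequeue_timeout_ns;
config.nb_events_limit = RTE_MIN(4096, info.max_num_events);
config.nb_event_port_dequeue_depth =
                RTE_MIN(128U, (uint32_t)info.max_event_port_dequeue_depth);
config.nb_event_port_enqueue_depth =
                RTE_MIN(128U, (uint32_t)info.max_event_port_enqueue_depth);

ret = rte_event_dev_configure(dev_id, &config);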
> +        };
> +        const struct rte_event_port_conf wkr_p_conf = {
> +                        .dequeue_depth = worker_cq_depth,

Same as above.

> +                        .enqueue_depth = 64,

Same as above.

> +                        .new_event_threshold = 4096,
> +        };
> +        struct rte_event_queue_conf wkr_q_conf = {
> +                        .event_queue_cfg = queue_type,
> +                        .priority = RTE_EVENT_DEV_PRIORITY_NORMAL,
> +                        .nb_atomic_flows = 1024,
> +                        .nb_atomic_order_sequences = 1024,
> +        };
> +        const struct rte_event_port_conf tx_p_conf = {
> +                        .dequeue_depth = 128,

Same as above.

> +                        .enqueue_depth = 128,

Same as above.

> +                        .new_event_threshold = 4096,
> +        };
> +        const struct rte_event_queue_conf tx_q_conf = {
> +                        .priority = RTE_EVENT_DEV_PRIORITY_HIGHEST,
> +                        .event_queue_cfg =
> +                                        RTE_EVENT_QUEUE_CFG_ATOMIC_ONLY |
> +                                        RTE_EVENT_QUEUE_CFG_SINGLE_LINK,
> +                        .nb_atomic_flows = 1024,
> +                        .nb_atomic_order_sequences = 1024,
> +        };
> +
> +        struct port_link worker_queues[MAX_NUM_STAGES];
> +        struct port_link tx_queue;
> +        unsigned i;
> +
> +        int ret, ndev = rte_event_dev_count();
> +        if (ndev < 1) {
> +                printf("%d: No Eventdev Devices Found\n", __LINE__);
> +                return -1;
> +        }
> +
> +        struct rte_event_dev_info dev_info;
> +        ret = rte_event_dev_info_get(dev_id, &dev_info);
> +        printf("\tEventdev %d: %s\n", dev_id, dev_info.driver_name);
> +
> +        ret = rte_event_dev_configure(dev_id, &config);
> +        if (ret < 0)
> +                printf("%d: Error configuring device\n", __LINE__);

Don't process further with a failed configure.

> +
> +        /* Q creation - one load balanced per pipeline stage*/
> +
> +        /* set up one port per worker, linking to all stage queues */
> +        for (i = 0; i < num_workers; i++) {
> +                struct worker_data *w = &worker_data[i];
> +                w->dev_id = dev_id;
> +                if (rte_event_port_setup(dev_id, i, &wkr_p_conf) < 0) {
> +                        printf("Error setting up port %d\n", i);
> +                        return -1;
> +                }
> +
> +                uint32_t s;
> +                for (s = 0; s < num_stages; s++) {
> +                        if (rte_event_port_link(dev_id, i,
> +                                        &worker_queues[s].queue_id,
> +                                        &worker_queues[s].priority,
> +                                        1) != 1) {
> +                                printf("%d: error creating link for port %d\n",
> +                                                __LINE__, i);
> +                                return -1;
> +                        }
> +                }
> +                w->port_id = i;
> +        }
> +        /* port for consumer, linked to TX queue */
> +        if (rte_event_port_setup(dev_id, i, &tx_p_conf) < 0) {

If the ethdev supports MT-safe txq support, then this port can be linked to
the workers too; something to consider for the future.

> +                printf("Error setting up port %d\n", i);
> +                return -1;
> +        }
> +        if (rte_event_port_link(dev_id, i, &tx_queue.queue_id,
> +                        &tx_queue.priority, 1) != 1) {
> +                printf("%d: error creating link for port %d\n",
> +                                __LINE__, i);
> +                return -1;
> +        }
> +        /* port for producer, no links */
> +        const struct rte_event_port_conf rx_p_conf = {
> +                        .dequeue_depth = 8,
> +                        .enqueue_depth = 8,

Same issue as above; you could get the default config first and then
configure (a small sketch follows at the end of this hunk).

> +                        .new_event_threshold = 1200,
> +        };
> +        if (rte_event_port_setup(dev_id, i + 1, &rx_p_conf) < 0) {
> +                printf("Error setting up port %d\n", i);
> +                return -1;
> +        }
> +
> +        *prod_data = (struct prod_data){.dev_id = dev_id,
> +                                        .port_id = i + 1,
> +                                        .qid = qid[0] };
> +        *cons_data = (struct cons_data){.dev_id = dev_id,
> +                                        .port_id = i };
> +
> +        enqueue_cnt = rte_calloc(0,
> +                        RTE_CACHE_LINE_SIZE/(sizeof(enqueue_cnt[0])),
> +                        sizeof(enqueue_cnt[0]), 0);
> +        dequeue_cnt = rte_calloc(0,
> +                        RTE_CACHE_LINE_SIZE/(sizeof(dequeue_cnt[0])),
> +                        sizeof(dequeue_cnt[0]), 0);

Why an array? It looks like enqueue_cnt[1] and dequeue_cnt[1] are not used
anywhere.
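On getting the default config first (for the port configs flagged "same as
above"), the defaults could be taken from the PMD and only the interesting
fields overridden, roughly like this (untested; i and dev_id as in the
patch's loop):

struct rte_event_port_conf wkr_p_conf;

if (rte_event_port_default_conf_get(dev_id, i, &wkr_p_conf) < 0)
        return -1;
/* keep the PMD's dequeue/enqueue depths, override only the threshold */
wkr_p_conf.new_event_threshold = 4096;

if (rte_event_port_setup(dev_id, i, &wkr_p_conf) < 0)
        return -1;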
> +
> +        if (rte_event_dev_start(dev_id) < 0) {
> +                printf("Error starting eventdev\n");
> +                return -1;
> +        }
> +
> +        return dev_id;
> +}
> +
> +static void
> +signal_handler(int signum)
> +{
> +        if (done || prod_stop)

I think no one is updating prod_stop.

> +                rte_exit(1, "Exiting on signal %d\n", signum);
> +        if (signum == SIGINT || signum == SIGTERM) {
> +                printf("\n\nSignal %d received, preparing to exit...\n",
> +                                signum);
> +                done = 1;
> +        }
> +        if (signum == SIGTSTP)
> +                rte_event_dev_dump(0, stdout);
> +}
> +
> +int
> +main(int argc, char **argv)

[...]

> +        RTE_LCORE_FOREACH_SLAVE(lcore_id) {
> +                if (lcore_id >= MAX_NUM_CORE)
> +                        break;
> +
> +                if (!rx_core[lcore_id] && !worker_core[lcore_id] &&
> +                    !tx_core[lcore_id] && !sched_core[lcore_id])
> +                        continue;
> +
> +                if (rx_core[lcore_id])
> +                        printf(
> +                                "[%s()] lcore %d executing NIC Rx, and using eventdev port %u\n",
> +                                __func__, lcore_id, prod_data.port_id);

These prints won't show if rx, tx or the scheduler is running on the master
core (as we are iterating with RTE_LCORE_FOREACH_SLAVE).

> +
> +        if (!quiet) {
> +                printf("\nPort Workload distribution:\n");
> +                uint32_t i;
> +                uint64_t tot_pkts = 0;
> +                uint64_t pkts_per_wkr[RTE_MAX_LCORE] = {0};
> +                for (i = 0; i < num_workers; i++) {
> +                        char statname[64];
> +                        snprintf(statname, sizeof(statname), "port_%u_rx",
> +                                        worker_data[i].port_id);

Please check "port_%u_rx" xstat availability with the PMD first (a small
sketch follows at the end of this mail).

> +                        pkts_per_wkr[i] = rte_event_dev_xstats_by_name_get(
> +                                        dev_id, statname, NULL);
> +                        tot_pkts += pkts_per_wkr[i];
> +                }
> +                for (i = 0; i < num_workers; i++) {
> +                        float pc = pkts_per_wkr[i] * 100 /
> +                                        ((float)tot_pkts);
> +                        printf("worker %i :\t%.1f %% (%"PRIu64" pkts)\n",
> +                                        i, pc, pkts_per_wkr[i]);
> +                }
> +
> +        }
> +
> +        return 0;
> +}

As a final note, considering the different options in the fastpath, I was
thinking of introducing an app/test-eventdev, like app/testpmd, with a set
of function pointers# for the different modes (like "macswap" and "txonly"
in testpmd), to exercise the different options and provide a framework for
adding new use cases. I will work on that to check the feasibility.

##
struct fwd_engine {
        const char *fwd_mode_name;       /**< Forwarding mode name. */
        port_fwd_begin_t port_fwd_begin; /**< NULL if nothing special to do. */
        port_fwd_end_t port_fwd_end;     /**< NULL if nothing special to do. */
        packet_fwd_t packet_fwd;         /**< Mandatory. */
};

> --
> 2.7.4
>
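P.S. On the "port_%u_rx" xstat comment above, the application could check
whether the PMD exposes the name before relying on it, roughly like this
(untested; the helper name is only illustrative):

static int
port_xstat_exists(uint8_t dev_id, uint8_t port_id, const char *name)
{
        struct rte_event_dev_xstats_name names[512];
        unsigned int ids[512];
        int n, i;

        n = rte_event_dev_xstats_names_get(dev_id, RTE_EVENT_DEV_XSTATS_PORT,
                        port_id, names, ids, RTE_DIM(names));
        if (n <= 0)
                return 0; /* no xstats support in this PMD */

        for (i = 0; i < n && i < (int)RTE_DIM(names); i++)
                if (strncmp(names[i].name, name, sizeof(names[i].name)) == 0)
                        return 1;
        return 0;
}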