From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Van Haaren, Harry"
To: "Nicolau, Radu", "dev@dpdk.org"
CC: "jerinj@marvell.com"
Thread-Topic: [PATCH v1] event/sw: performance improvements
Date: Wed, 23 Sep 2020 11:13:54 +0000
References: <20200908105211.10066-1-radu.nicolau@intel.com>
In-Reply-To: <20200908105211.10066-1-radu.nicolau@intel.com>
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Subject: Re: [dpdk-dev] [PATCH v1] event/sw: performance improvements
List-Id: DPDK patches and discussions

> -----Original Message-----
> From: Nicolau, Radu
> Sent: Tuesday, September 8, 2020 11:52 AM
> To: dev@dpdk.org
> Cc: jerinj@marvell.com; Van Haaren, Harry; Nicolau, Radu
> Subject: [PATCH v1] event/sw: performance improvements
>
> Add minimum burst throughout the scheduler pipeline and a flush counter.
> Replace ring API calls with local single threaded implementation
> where possible.
>
> Signed-off-by: Radu Nicolau

Thanks for the patch, a few comments inline.

> ---
>  drivers/event/sw/sw_evdev.h           | 11 +++-
>  drivers/event/sw/sw_evdev_scheduler.c | 83 +++++++++++++++++++++++----
>  2 files changed, 81 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/event/sw/sw_evdev.h b/drivers/event/sw/sw_evdev.h
> index 7c77b2495..95e51065f 100644
> --- a/drivers/event/sw/sw_evdev.h
> +++ b/drivers/event/sw/sw_evdev.h
> @@ -29,7 +29,13 @@
>  /* report dequeue burst sizes in buckets */
>  #define SW_DEQ_STAT_BUCKET_SHIFT 2
>  /* how many packets pulled from port by sched */
> -#define SCHED_DEQUEUE_BURST_SIZE 32
> +#define SCHED_DEQUEUE_BURST_SIZE 64
> +
> +#define SCHED_MIN_BURST_SIZE 8
> +#define SCHED_NO_ENQ_CYCLE_FLUSH 256
> +/* set SCHED_DEQUEUE_BURST_SIZE to 64 or 128 when setting this to 1 */
> +#define SCHED_REFILL_ONCE_PER_CALL 1

Is it possible to make the above #define a runtime option?
Eg, --vdev event_sw,refill_iter=1

That would allow packaged versions of DPDK to be usable in both modes.
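Something along these lines could sit next to the existing devargs handling
in the probe path (rough sketch only -- the "refill_iter" key, the handler
and the refill_once flag below are my invention, not part of this patch):

    /* Sketch: parse --vdev event_sw,refill_iter=1 via rte_kvargs.
     * "refill_iter" and refill_once are hypothetical names.
     */
    #include <stdlib.h>
    #include <rte_common.h>
    #include <rte_kvargs.h>

    static int
    parse_refill_iter(const char *key __rte_unused, const char *value,
                    void *opaque)
    {
            /* accept refill_iter=0 / refill_iter=1 */
            *(int *)opaque = atoi(value) != 0;
            return 0;
    }

    static int
    sw_parse_refill_iter(const char *params, int *refill_once)
    {
            static const char * const keys[] = { "refill_iter", NULL };
            struct rte_kvargs *kvlist = rte_kvargs_parse(params, keys);

            if (kvlist == NULL)
                    return -1;
            rte_kvargs_process(kvlist, "refill_iter", parse_refill_iter,
                            refill_once);
            rte_kvargs_free(kvlist);
            return 0;
    }

In practice the key would just be added to the driver's existing valid-args
list, and the scheduler paths would branch on the stored flag instead of the
#if, at the cost of one well-predicted branch per call.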
> +
>
>  #define SW_PORT_HIST_LIST (MAX_SW_PROD_Q_DEPTH) /* size of our history list */
>  #define NUM_SAMPLES 64 /* how many data points use for average stats */
> @@ -214,6 +220,9 @@ struct sw_evdev {
>  	uint32_t xstats_count_mode_port;
>  	uint32_t xstats_count_mode_queue;
>
> +	uint16_t sched_flush_count;
> +	uint16_t sched_min_burst;
> +
>  	/* Contains all ports - load balanced and directed */
>  	struct sw_port ports[SW_PORTS_MAX] __rte_cache_aligned;
>
> diff --git a/drivers/event/sw/sw_evdev_scheduler.c b/drivers/event/sw/sw_evdev_scheduler.c
> index cff747da8..ca6d1caff 100644
> --- a/drivers/event/sw/sw_evdev_scheduler.c
> +++ b/drivers/event/sw/sw_evdev_scheduler.c
> @@ -26,6 +26,29 @@
>  /* use cheap bit mixing, we only need to lose a few bits */
>  #define SW_HASH_FLOWID(f) (((f) ^ (f >> 10)) & FLOWID_MASK)
>
> +
> +/* single object enq and deq for non MT ring */
> +static __rte_always_inline void
> +sw_nonmt_ring_dequeue(struct rte_ring *r, void **obj)
> +{
> +	if ((r->prod.tail - r->cons.tail) < 1)
> +		return;
> +	void **ring = (void **)&r[1];
> +	*obj = ring[r->cons.tail & r->mask];
> +	r->cons.tail++;
> +}
> +static __rte_always_inline int
> +sw_nonmt_ring_enqueue(struct rte_ring *r, void *obj)
> +{
> +	if ((r->capacity + r->cons.tail - r->prod.tail) < 1)
> +		return 0;
> +	void **ring = (void **)&r[1];
> +	ring[r->prod.tail & r->mask] = obj;
> +	r->prod.tail++;
> +	return 1;
> +}
> +
> +
>  static inline uint32_t
>  sw_schedule_atomic_to_cq(struct sw_evdev *sw, struct sw_qid * const qid,
>  		uint32_t iq_num, unsigned int count)
> @@ -146,9 +169,9 @@ sw_schedule_parallel_to_cq(struct sw_evdev *sw, struct sw_qid * const qid,
>  			cq_idx = 0;
>  		cq = qid->cq_map[cq_idx++];
>
> -	} while (rte_event_ring_free_count(
> -			sw->ports[cq].cq_worker_ring) == 0 ||
> -			sw->ports[cq].inflights == SW_PORT_HIST_LIST);
> +	} while (sw->ports[cq].inflights == SW_PORT_HIST_LIST ||
> +			rte_event_ring_free_count(
> +				sw->ports[cq].cq_worker_ring) == 0);
>
>  	struct sw_port *p = &sw->ports[cq];
>  	if (sw->cq_ring_space[cq] == 0 ||
> @@ -164,7 +187,7 @@ sw_schedule_parallel_to_cq(struct sw_evdev *sw, struct sw_qid * const qid,
>  	p->hist_list[head].qid = qid_id;
>
>  	if (keep_order)
> -		rte_ring_sc_dequeue(qid->reorder_buffer_freelist,
> +		sw_nonmt_ring_dequeue(qid->reorder_buffer_freelist,
>  				(void *)&p->hist_list[head].rob_entry);
>
>  	sw->ports[cq].cq_buf[sw->ports[cq].cq_buf_count++] = *qe;
> @@ -229,7 +252,7 @@ sw_schedule_qid_to_cq(struct sw_evdev *sw)
>  		uint32_t pkts_done = 0;
>  		uint32_t count = iq_count(&qid->iq[iq_num]);
>
> -		if (count > 0) {
> +		if (count >= sw->sched_min_burst) {
>  			if (type == SW_SCHED_TYPE_DIRECT)
>  				pkts_done += sw_schedule_dir_to_cq(sw, qid,
>  						iq_num, count);
> @@ -267,7 +290,7 @@ sw_schedule_reorder(struct sw_evdev *sw, int qid_start, int qid_end)
>
>  	for (; qid_start < qid_end; qid_start++) {
>  		struct sw_qid *qid = &sw->qids[qid_start];
> -		int i, num_entries_in_use;
> +		unsigned int i, num_entries_in_use;
>
>  		if (qid->type != RTE_SCHED_TYPE_ORDERED)
>  			continue;
>
> @@ -275,6 +298,9 @@ sw_schedule_reorder(struct sw_evdev *sw, int qid_start, int qid_end)
>  		num_entries_in_use = rte_ring_free_count(
>  					qid->reorder_buffer_freelist);
>
> +		if (num_entries_in_use < sw->sched_min_burst)
> +			num_entries_in_use = 0;
> +
>  		for (i = 0; i < num_entries_in_use; i++) {
>  			struct reorder_buffer_entry *entry;
>  			int j;
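(An aside on the sw_nonmt_ring_* helpers earlier in this diff -- not a
requested change, just spelling out why the unsigned arithmetic is safe.
The helper name below is mine:

    #include <rte_ring.h>

    static inline uint32_t
    sw_nonmt_ring_free(const struct rte_ring *r)
    {
            /* entries currently in the ring; the subtraction is done
             * modulo 2^32, so this stays correct even after the
             * uint32_t prod.tail/cons.tail counters wrap around */
            uint32_t used = r->prod.tail - r->cons.tail;
            /* identical to the patch's (capacity + cons.tail - prod.tail) */
            return r->capacity - used;
    }

and since the value is unsigned, the "< 1" tests in the patch are just
"== 0" checks for a full or empty ring.)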
> @@ -320,7 +346,7 @@ sw_schedule_reorder(struct sw_evdev *sw, int qid_start, int qid_end)
>  			if (!entry->ready) {
>  				entry->fragment_index = 0;
>
> -				rte_ring_sp_enqueue(
> +				sw_nonmt_ring_enqueue(
>  						qid->reorder_buffer_freelist,
>  						entry);
>
> @@ -349,9 +375,11 @@ __pull_port_lb(struct sw_evdev *sw, uint32_t port_id, int allow_reorder)
>  	uint32_t pkts_iter = 0;
>  	struct sw_port *port = &sw->ports[port_id];
>
> +#if !SCHED_REFILL_ONCE_PER_CALL
>  	/* If shadow ring has 0 pkts, pull from worker ring */
>  	if (port->pp_buf_count == 0)
>  		sw_refill_pp_buf(sw, port);
> +#endif

As per above comment, this #if would become a runtime check. Similar for
the below #if comments.

>  	while (port->pp_buf_count) {
>  		const struct rte_event *qe = &port->pp_buf[port->pp_buf_start];
> @@ -467,9 +495,11 @@ sw_schedule_pull_port_dir(struct sw_evdev *sw, uint32_t port_id)
>  	uint32_t pkts_iter = 0;
>  	struct sw_port *port = &sw->ports[port_id];
>
> +#if !SCHED_REFILL_ONCE_PER_CALL
>  	/* If shadow ring has 0 pkts, pull from worker ring */
>  	if (port->pp_buf_count == 0)
>  		sw_refill_pp_buf(sw, port);
> +#endif
>
>  	while (port->pp_buf_count) {
>  		const struct rte_event *qe = &port->pp_buf[port->pp_buf_start];
> @@ -557,12 +587,41 @@ sw_event_schedule(struct rte_eventdev *dev)
>  	/* push all the internal buffered QEs in port->cq_ring to the
>  	 * worker cores: aka, do the ring transfers batched.
>  	 */
> +	int no_enq = 1;
>  	for (i = 0; i < sw->port_count; i++) {
> -		struct rte_event_ring *worker = sw->ports[i].cq_worker_ring;
> -		rte_event_ring_enqueue_burst(worker, sw->ports[i].cq_buf,
> -				sw->ports[i].cq_buf_count,
> -				&sw->cq_ring_space[i]);
> -		sw->ports[i].cq_buf_count = 0;
> +		struct sw_port *port = &sw->ports[i];
> +		struct rte_event_ring *worker = port->cq_worker_ring;
> +
> +#if SCHED_REFILL_ONCE_PER_CALL
> +		/* If shadow ring has 0 pkts, pull from worker ring */
> +		if (port->pp_buf_count == 0)
> +			sw_refill_pp_buf(sw, port);
> +#endif
> +
> +		if (port->cq_buf_count >= sw->sched_min_burst) {
> +			rte_event_ring_enqueue_burst(worker,
> +					port->cq_buf,
> +					port->cq_buf_count,
> +					&sw->cq_ring_space[i]);
> +			port->cq_buf_count = 0;
> +			no_enq = 0;
> +		} else {
> +			sw->cq_ring_space[i] =
> +					rte_event_ring_free_count(worker) -
> +					port->cq_buf_count;
> +		}
> +	}
> +
> +	if (no_enq) {
> +		if (unlikely(sw->sched_flush_count > SCHED_NO_ENQ_CYCLE_FLUSH))
> +			sw->sched_min_burst = 1;
> +		else
> +			sw->sched_flush_count++;
> +	} else {
> +		if (sw->sched_flush_count)
> +			sw->sched_flush_count--;
> +		else
> +			sw->sched_min_burst = SCHED_MIN_BURST_SIZE;
>  	}
>
>  }
> --
> 2.17.1
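For anyone else reviewing, my reading of the new flush logic as a standalone
model (a paraphrase, not driver code; constants inlined from the patch,
names below are mine):

    #include <stdint.h>

    struct burst_state {
            uint16_t flush_count;   /* consecutive calls with no enqueue */
            uint16_t min_burst;     /* current threshold, starts at 8 */
    };

    static void
    update_burst_threshold(struct burst_state *s, int enqueued)
    {
            if (!enqueued) {
                    /* idle: after 256 empty schedule calls, drop the
                     * threshold to 1 so part-filled bursts still get
                     * flushed to the worker rings */
                    if (s->flush_count > 256)
                            s->min_burst = 1;
                    else
                            s->flush_count++;
            } else {
                    /* busy again: bleed off the idle count first, then
                     * restore the full threshold to regain batching */
                    if (s->flush_count)
                            s->flush_count--;
                    else
                            s->min_burst = 8;
            }
    }

If I've read that right, a trickle of traffic below the burst threshold can
sit buffered for up to ~256 schedule calls before the flush kicks in; that
trade-off seems fine for throughput mode but is probably worth a line in the
docs.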