From: Bruce Richardson
To: Mattias Rönnblom
CC: dev@dpdk.org, Jerin Jacob, Peter Nilsson, Harry van Haaren, Abdullah Sevincer, Mattias Rönnblom
Date: Wed, 25 Oct 2023 13:29:32 +0100
Subject: Re: Eventdev dequeue-enqueue event correlation
References: <1b36dd59-d9d0-434c-922f-6e0df8cafe7d@lysator.liu.se>
In-Reply-To: <1b36dd59-d9d0-434c-922f-6e0df8cafe7d@lysator.liu.se>
List-Id: DPDK patches and discussions

On Wed, Oct 25, 2023 at 09:40:54AM +0200, Mattias Rönnblom wrote:
> On 2023-10-24 11:10, Bruce Richardson wrote:
> > On Tue, Oct 24, 2023 at 09:10:30AM +0100, Bruce Richardson wrote:
> > > On Mon, Oct 23, 2023 at 06:10:54PM +0200, Mattias Rönnblom wrote:
> > > > Hi.
> > > >
> > > > Consider an Eventdev app using atomic-type scheduling doing something like:
> > > >
> > > >     struct rte_event events[3];
> > > >
> > > >     rte_event_dequeue_burst(dev_id, port_id, events, 3, 0);
> > > >
> > > >     /* Assume three events were dequeued, and the application decides
> > > >      * it's best off processing events 0 and 2 consecutively */
> > > >
> > > >     process(&events[0]);
> > > >     process(&events[2]);
> > > >
> > > >     events[0].queue_id++;
> > > >     events[0].op = RTE_EVENT_OP_FORWARD;
> > > >     events[2].queue_id++;
> > > >     events[2].op = RTE_EVENT_OP_FORWARD;
> > > >
> > > >     rte_event_enqueue_burst(dev_id, port_id, &events[0], 1);
> > > >     rte_event_enqueue_burst(dev_id, port_id, &events[2], 1);
> > > >
> > > >     process(&events[1]);
> > > >     events[1].queue_id++;
> > > >     events[1].op = RTE_EVENT_OP_FORWARD;
> > > >
> > > >     rte_event_enqueue_burst(dev_id, port_id, &events[1], 1);
> > > >
> > > > If one would just read the Eventdev API spec, they might expect this to work
> > > > (especially since impl_opaque hints at potentially being useful for the
> > > > purpose of identifying events).
> > > >
> > > > However, on certain event devices, it doesn't (and maybe rightly so).
> > > > If events 0 and 2 belong to the same flow (queue id + flow id pair), and event
> > > > 1 belongs to some other, then this other flow would be "unlocked" at the
> > > > point of the second enqueue operation (and thus be processed on some other
> > > > core, in parallel). The first flow would still be needlessly "locked".
> > > >
> > > > Such event devices require the order of the enqueued events to be the same
> > > > as the dequeued events, using RTE_EVENT_OP_RELEASE type events as "fillers"
> > > > for dropped events.
> > > >
> > > > Am I missing something in the Eventdev API documentation?
> > > >
> > >
> > > Much more likely is that the documentation is missing something. We should
> > > explicitly clarify this behaviour, as it's required by a number of drivers.
> > >
> > > > Could an event device use the impl_opaque field to track the identity of an
> > > > event (and thus relax ordering requirements) and still be compliant with
> > > > the API?
> > > >
> > >
> > > Possibly, but the documentation also doesn't report that the impl_opaque
> > > field must be preserved between dequeue and enqueue. When forwarding a
> > > packet it's well possible for an app to extract an mbuf from a dequeued
> > > event and create a new event for sending it back in to the eventdev. For
>
> Such a behavior would be in violation of a part of the Eventdev API contract
> actually specified. The rte_event struct documentation says about
> impl_opaque that "An implementation may use this field to hold
> implementation specific value to share between dequeue and enqueue
> operation. The application should not modify this field."
>
> I see no other way to read this than that "an implementation" here is
> referring to an event device PMD. The requirement that the application can't
> modify this field only makes sense in the context of "from dequeue to
> enqueue".
>

Yep, you are completely correct.
For some reason, I had this in my head the other way round, that it was for
internal use between the enqueue and dequeue. My mistake! :-(

> > > example, if the first stage post-RX is doing classify, it's entirely
> > > possible for every single field in the event header to be different for the
> > > event returned compared to dequeue (flow_id recomputed, event type/source
> > > adjusted, target queue_id and priority updated, op type changed to forward
> > > from new, etc. etc.).
> > >
> > > > What happens if a RTE_EVENT_OP_NEW event is inserted into the mix of
> > > > OP_FORWARD and OP_RELEASE type events being enqueued? Again I'm not clear on
> > > > what the API says, if anything.
> > > >
> > > OP_NEW should have no effect on the "history-list" of events previously
> > > dequeued. Again, our docs should clarify that explicitly. Thanks for
> > > calling all this out.
> > >
> > Looking at the docs we have, I would propose adding a new subsection "Event
> > Operations", as section 49.1.6 to [1]. There we could explain "New",
> > "Forward" and "Release" events - what they mean for the different queue
> > types and how to use them. That section could also cover the enqueue
> > ordering rules, as the use of event "history" is necessary to explain
> > releases and forwards.
> >
> > This seem reasonable? If nobody else has already started on updating docs
> > for this, I'm happy enough to give it a stab.
> >
>
> Batch dequeues not only provide an opportunity to amortize per-interaction
> overhead with the event device, they also allow the application to reshuffle
> the order in which it decides to process the events.
>
> Such reshuffling may have a very significant impact on performance. At a
> minimum, cache locality improves, and in case the app is able to do "vector
> processing" (e.g., something akin to what fd.io VPP does), the gains may be
> further increased.
>
> One may argue the app/core should just "do what it's told" by the event
> device.
> After all, an event device is a work scheduler, and reshuffling
> items of work certainly counts as (micro-)scheduling work.
>
> However, it's a lot to expect a fairly generic function, especially if it
> comes in the form of hardware with a design frozen years ago, to be able to
> arrange the work in whatever order is currently optimal for one particular
> application.
>
> What such an app can do (or must do, if it has efficiency constraints) is to
> buffer the events on the output side, rearranging them in accordance with the
> yet-seemingly-undocumented Eventdev API contract. That's certainly possible,
> and not very difficult, but it seems to me that this really is the job of
> something in the platform (e.g., in Eventdev or the event device PMD).
>
> One way out of this could be to add an "implicit release-*only*" mode of
> operation for eventdev.
>
> In such a mode, the RTE_SCHED_TYPE_ATOMIC per-flow "lock" (and its ORDERED
> equivalent, if there is one) would be held until the next dequeue. In such a
> mode, the difference between OP_FORWARD and OP_NEW events would just be the
> back-pressure watermark (new_event_threshold).
>
> That pre-rte_event_enqueue_burst() buffering would prevent the event device
> from releasing "locks" that could otherwise be released, but the typical
> cost of event device interaction is so high that I have my doubts about how
> useful that feature is. If you are worried about "locks" held for a long
> time, you may need to use short bursts anyway (since worst-case critical
> section length is not reduced by such RELEASEs).
>
> Another option would be to have the current RTE_EVENT_DEV_CAP_BURST_MODE
> capable PMDs start using the "impl_opaque" field for the purpose of matching
> in and out events. It would require applications to actually start adhering
> to the "don't touch impl_opaque" requirement of the Eventdev API.
>
> Those "fixes" are not mutually exclusive.
>
> A side note: it's unfortunate there are no bits in the rte_event struct that
> can be used for "event id"/"event SN"/"event dequeue idx" type information,
> if an app would like to work around this issue with current PMDs.
>

Lots of good points here. We'll take a look and see what we can do in our
drivers and any other ideas or suggestions.

/Bruce
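
For illustration only, here is a minimal sketch of the output-side buffering
mentioned in the quoted text: the application processes a burst in whatever
order it likes, but records each result in the slot matching the event's
dequeue index, so that one enqueue hands the events back in dequeue order with
RELEASE "fillers" for dropped events. The struct ev type, the EV_OP_* values
and the out_buf_* helpers are hypothetical stand-ins so that the sketch
compiles without DPDK; a real application would use struct rte_event,
RTE_EVENT_OP_* and rte_event_enqueue_burst(), and this is not taken from any
driver or existing application.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins so the sketch compiles without DPDK (assumption: real code
 * would use struct rte_event and RTE_EVENT_OP_* from <rte_eventdev.h>). */
enum ev_op { EV_OP_NEW, EV_OP_FORWARD, EV_OP_RELEASE };

struct ev {
	enum ev_op op;
	uint8_t queue_id;
	uint32_t flow_id;
};

#define MAX_BURST 32

/* Hypothetical output-side buffer: one slot per dequeued event, kept in
 * dequeue order regardless of the order in which events are processed. */
struct out_buf {
	struct ev slot[MAX_BURST];
	uint16_t n;
};

/* Start a burst: every slot defaults to RELEASE, i.e. a "filler" for an
 * event the application ends up dropping. */
static void out_buf_begin(struct out_buf *b, const struct ev *deq, uint16_t n)
{
	assert(n <= MAX_BURST);
	b->n = n;
	for (uint16_t i = 0; i < n; i++) {
		b->slot[i] = deq[i];
		b->slot[i].op = EV_OP_RELEASE;
	}
}

/* Record that the event dequeued at index idx goes on to the next queue. */
static void out_buf_forward(struct out_buf *b, uint16_t idx)
{
	assert(idx < b->n);
	b->slot[idx].queue_id++;
	b->slot[idx].op = EV_OP_FORWARD;
}

/* Example burst: events 0 and 2 (same flow) are processed first and
 * forwarded; event 1 is dropped. Returns the number of events that would
 * be passed to a single enqueue call, already in dequeue order. */
static uint16_t example_burst(struct out_buf *b)
{
	const struct ev deq[3] = {
		{ EV_OP_NEW, 0, 7 },	/* flow 7 */
		{ EV_OP_NEW, 0, 9 },	/* flow 9 */
		{ EV_OP_NEW, 0, 7 },	/* flow 7 again */
	};

	out_buf_begin(b, deq, 3);
	out_buf_forward(b, 0);	/* processed first */
	out_buf_forward(b, 2);	/* processed second, out of dequeue order */
	/* event 1 is dropped: its slot stays RELEASE */
	return b->n;
}
```

After example_burst(), the slots hold FORWARD, RELEASE, FORWARD in dequeue
order, so a device that matches enqueues to dequeues by position sees a
consistent sequence. The trade-off, as discussed in the quoted text, is that
nothing is released until the whole burst is handed back.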