From: "Van Haaren, Harry" <harry.van.haaren@intel.com>
To: Honnappa Nagarahalli, Phil Yang, thomas@monjalon.net,
 "Ananyev, Konstantin", stephen@networkplumber.org,
 maxime.coquelin@redhat.com, dev@dpdk.org
CC: david.marchand@redhat.com, jerinj@marvell.com, hemant.agrawal@nxp.com,
 Gavin Hu, Ruifeng Wang, Joyce Kong, nd, stable@dpdk.org, nd
Date: Thu, 9 Apr 2020 16:46:14 +0000
References: <1583999071-22872-1-git-send-email-phil.yang@arm.com>
 <1584407863-774-1-git-send-email-phil.yang@arm.com>
 <1584407863-774-10-git-send-email-phil.yang@arm.com>
Subject: Re: [dpdk-dev] [PATCH v3 09/12] service: avoid race condition for
 MT unsafe service

> -----Original Message-----
> From: Honnappa Nagarahalli
> Sent: Thursday, April 9, 2020 2:32 AM
> To: Van Haaren, Harry; Phil Yang; thomas@monjalon.net; Ananyev,
> Konstantin; stephen@networkplumber.org; maxime.coquelin@redhat.com;
> dev@dpdk.org
> Cc: david.marchand@redhat.com; jerinj@marvell.com;
> hemant.agrawal@nxp.com; Gavin Hu; Ruifeng Wang; Joyce Kong; nd;
> stable@dpdk.org; Honnappa Nagarahalli; nd
> Subject: RE: [PATCH v3 09/12] service: avoid race condition for MT
> unsafe service
>
> >
> > > > > Subject: [PATCH v3 09/12] service: avoid race condition for MT
> > > > > unsafe service
> > > > >
> > > > > From: Honnappa Nagarahalli
> > > > >
> > > > > It is possible that an MT unsafe service gets configured to run
> > > > > on another core while the service is currently running. This
> > > > > might result in the MT unsafe service running on multiple cores
> > > > > simultaneously. Use the 'execute_lock' always when the service is
> > > > > MT unsafe.
> > > > >
> > > > > Fixes: e9139a32f6e8 ("service: add function to run on app lcore")
> > > > > Cc: stable@dpdk.org
> > > > >
> > > > > Signed-off-by: Honnappa Nagarahalli
> > > > > Reviewed-by: Phil Yang
> > > > > Reviewed-by: Gavin Hu
> > > >
> > > > We should put "fix" in the title, once converged on an
> > > > implementation.
> > > Ok, will replace 'avoid' with 'fix' (once we agree on the solution).
> > >
> > > > Regarding Fixes and stable backport, we should consider whether
> > > > fixing this in stable with a performance degradation, fixing it
> > > > with a more complex solution, or documenting it as a known issue is
> > > > the better option.
> > > >
> > > > This fix (always taking the atomic lock) will have a negative
> > > > performance impact on existing code using services. We should
> > > > investigate a way to fix it without causing datapath performance
> > > > degradation.
> > > Trying to gauge the impact on existing applications...
> > > The documentation does not explicitly disallow run-time mapping of
> > > cores to services.
> > > 1) If applications are mapping cores to services at run time, they
> > > are running with a bug. IMO, a bug fix resulting in a performance
> > > drop should be acceptable.
> > > 2) If the service is configured to run on a single core
> > > (num_mapped_cores == 1), but the service is set to MT unsafe - this
> > > will have a (possible) performance impact.
> > > a) This can be solved by setting the service to MT safe, and can be
> > > documented. This might be a reasonable solution for applications
> > > which are compiled with future DPDK releases.
> > > b) We can also solve this using symbol versioning - the old version
> > > of the function will use the old code, the new version will use the
> > > code in this patch. So, if the application is run with future DPDK
> > > releases without recompiling, it will continue to use the old
> > > version. If the application is compiled with future releases, it can
> > > use the solution in 2a.
> > > We should also consider whether this is an appropriate solution, as
> > > it would force applications in 1) to recompile to get the fix.
> > > 3) If the service is configured to run on multiple cores
> > > (num_mapped_cores > 1), then for those applications the lock is
> > > being taken already. These applications might see some improvement,
> > > as this patch removes a few instructions.
> > >
> > > > I think there is a way to achieve this by moving more checks/time
> > > > to the control path (lcore updating the map), and not forcing the
> > > > datapath lcore to always take an atomic.
> > > I think 2a above is the solution.
> >
> > 2a above is e.g. the Eventdev SW routines like the Rx/Tx scheduler
> > services.
> I scanned through the code briefly.
> I see that the Eth Rx/Tx and crypto adapters set the MT_SAFE capability,
> so they can be ignored.
> The timer adapter and some others do not set MT_SAFE. It seems the cores
> to run on are mapped at run time, but it is not clear to me whether a
> service can get mapped to run on multiple cores. If it can, those are
> running with the bug.

EAL will map each service to a single lcore. It will "round-robin" if
there are more services than service-lcores to run them on. So I agree
that DPDK's default mappings will not suffer this issue.

> But, these are all internal to DPDK and can be fixed.
> Are there no performance tests in these components that we can run?
>
> > We should strive to not reduce datapath performance at all here.
> >
> > > > In this particular case, we have a counter for the number of
> > > > iterations that a
> > > Which counter are you thinking about?
> > > All the counters I checked are not atomic operations currently. If
> > > we are going to use counters, they have to be atomic, which means
> > > additional cycles in the data path.
> >
> > I'll try to explain the concept better; take this example:
> > - One service core is mapped to an MT_UNSAFE service, like the
> >   event/sw PMD
> > - The application wants to map a 2nd lcore to the same service
> > - You point out that today there is a race over the lock
> > -- the lock is not taken if (num_mapped_lcores == 1)
> > -- this avoids an atomic lock/unlock on the datapath
> >
> > To achieve our desired goal:
> > - the control thread doing the mapping performs the following
> >   operations
> > -- write service->num_mapped_lcores++ (atomic not required, only a
> >    single writer is allowed by the APIs)
> This has to be atomic because of the rte_service_run_iter_on_app_lcore
> API. Performance should be fine, as this API is not called frequently.
> But we need to consider the implications of more than one thread
> updating num_mapped_cores.
>
> > -- MFENCE (full st-ld barrier) to flush the write, and force later
> >    loads to issue after it
> I am not exactly sure what MFENCE on x86 does. On Arm platforms, the
> full barrier (DMB ISH) just makes sure that memory operations are not
> re-ordered around it. It does not say anything about when that store is
> visible to other cores; it will be visible at some point in time.
> But, I do not think we need to be worried about flushing to memory.
>
> > -- read the "calls_per_service" counter for each lcore, and add them
> >    up
> This can be trimmed down to the single core the service is currently
> mapped to; there is no need to add all the counters.

Correct - however that requires figuring out which lcore is running the
service. Anyway, agreed - exactly how we detect it is an implementation
detail.

>
> > ---- Wait :)
> > -- re-read the "calls_per_service", and ensure the count has changed
> Basically, polling. This causes more traffic on the interconnect between
> the cores, but might be OK since this API is not called frequently.

Agreed, this will not be called frequently, and some polling here will
not be a problem.

> > ---- The fact that calls_per_service has changed ensures the
> >      service-lcore has seen the new "num_mapped_cores" value, and has
> >      now taken the lock!
> > -- *now* it is safe to map the 2nd lcore to the service
> >
> > There is a caveat here that the increments of the "calls_per_service"
> > variable must become globally observable. To force this immediately
> > would require a write memory barrier, which could impact datapath
> > performance. Given the service is now taking a lock, the unlock()
> > thereof would ensure "calls_per_service" is flushed to memory.
> If we increment this variable only when the lock is held, we should be
> fine. We could have a separate variable.

Sure, if a separate variable is preferred that's fine with me.

> > Note: we could use calls_per_service, or add a new variable to the
> > service struct.
> > Writes to this do not need to be atomic, as the service is either
> > mapped to a single core, or else there's a lock around it.
> I think it is better to have a separate variable that is updated only
> when the lock is held.
> I do not see any change in the API sequence. We do this handshake only
> if the service is running (which is all controlled in the writer
> thread), correct?

Yes, this increment can be localized to just the branch where the
unlock() occurs, as that is the only time it could make a difference.

> This does not solve the problem of rte_service_run_iter_on_app_lcore
> getting called on multiple cores concurrently for the same service.

Agreed. This "on_app_lcore" API was an addition required to enable
unit-testing in a sane way, to run iterations of e.g. an Eventdev PMD.
I am in favor of documenting that the application is responsible for
ensuring that the service being run on a specific application lcore is
not concurrently running on another application lcore.

> > > > service has done. If this increments, we know that the lcore
> > > > running the service has re-entered the critical section, so it
> > > > would see an updated "needs atomic" flag.
> > > >
> > > > This approach may introduce a predictable branch on the datapath;
> > > > however, the cost of a predictable branch vs always taking an
> > > > atomic differs by order(s?) of magnitude, so a branch is much
> > > > preferred.
> > > >
> > > > It must be possible to avoid the datapath overhead using a scheme
> > > > like this. It will likely be more complex than your proposed
> > > > change below, however if it avoids datapath performance drops I
> > > > feel that a more complex solution is worth investigating at least.
> > >
> > > I do not completely understand the approach you are proposing; maybe
> > > you can elaborate more.
> >
> > Expanded above, showing a possible solution that does not require
> > additional atomics on the datapath.
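
To make the handshake concrete, here is a rough, untested sketch of the
control-path side. The struct and field names (num_mapped_lcores,
calls_per_service, service_wait_for_lock_takeover) are illustrative
placeholders, not the actual rte_service internals:

#include <stdint.h>
#include <rte_atomic.h>
#include <rte_pause.h>

/* Illustrative stand-in for the real service struct. */
struct service {
	uint32_t num_mapped_lcores;  /* written by the control thread only */
	uint64_t calls_per_service;  /* bumped by the service lcore each
				      * iteration */
};

/* Control thread: run before actually adding a 2nd lcore to a running
 * MT_UNSAFE service. */
static void
service_wait_for_lock_takeover(struct service *s)
{
	/* Single writer, so a plain increment is enough, but force the
	 * store to be globally observable before the loads below (the
	 * "MFENCE" step described above). */
	s->num_mapped_lcores++;
	rte_smp_mb();

	/* Poll until the service lcore completes one more iteration; at
	 * that point it has observed num_mapped_lcores > 1 and is taking
	 * the execute lock on every iteration. */
	uint64_t calls = __atomic_load_n(&s->calls_per_service,
					 __ATOMIC_RELAXED);
	while (__atomic_load_n(&s->calls_per_service,
			       __ATOMIC_RELAXED) == calls)
		rte_pause();

	/* Now it is safe to map the 2nd lcore to the service. */
}

The polling cost lands entirely on the control path; the datapath keeps
only the (predictable) num_mapped_lcores branch it already has.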
> >
> > > But, it seems to be based on a counter approach. Following is my
> > > assessment of what happens if we use a counter. Let us say we kept
> > > track of how many cores are running the service currently. We need
> > > an atomic counter other than 'num_mapped_cores'. Let us call that
> > > counter 'num_current_cores'. The code to call the service would look
> > > like below.
> > >
> > > 1) rte_atomic32_inc(&num_current_cores); /* this results in a full
> > >    memory barrier */
> > > 2) if (__atomic_load_n(&num_current_cores, __ATOMIC_ACQUIRE) == 1) {
> > >    /* rte_atomic32_read is not enough here, as it does not provide
> > >    the required memory barrier for any architecture */
> > > 3)     run_service(); /* Call the service */
> > > 4) }
> > > 5) rte_atomic32_sub(&num_current_cores, 1); /* Calling
> > >    rte_atomic32_clear is not enough, as it is not an atomic
> > >    operation and does not provide the required memory barrier */
> > >
> > > But the above code has a race condition between lines 1 and 2. It is
> > > possible that none of the cores will ever get to run the service, as
> > > they could all increment the counter simultaneously. Hence lines 1
> > > and 2 together need to be atomic, which is nothing but a
> > > 'compare-exchange' operation.
> > >
> > > BTW, the current code has a bug where it calls
> > > 'rte_atomic32_clear(&s->execute_lock)'; it is missing memory
> > > barriers, which results in clearing the execute_lock before the
> > > service has completed running. I suggest changing the 'execute_lock'
> > > to rte_spinlock_t and using the rte_spinlock_trylock and
> > > rte_spinlock_unlock APIs.
> >
> > I don't think a spinlock is what we want here:
> >
> > The idea is that a service-lcore can be mapped to multiple services.
> > If one service is already being run (by another thread), we do not
> > want to spin here waiting for it to become "free" to run by this
> > thread; it should continue to the next service that it is mapped to.
> Agree. I am suggesting to use 'rte_spinlock_trylock' (which does not
> spin), which is nothing but a 'compare-exchange'. Since the API is
> available, we should make use of it instead of repeating the code.

Ah, apologies, I misread the spinlock usage. Sure, if the spinlock_t code
is preferred I'm OK with the change; a sketch of what that might look
like is at the end of this mail. It would be clean to have a separate
patch in the patchset to make this change, and have it later in the set
than the changes for backporting, to ease integration with the stable
branch.

> > > > A unit test is required to validate a fix like this - although
> > > > perhaps found by inspection/review, a real-world test to validate
> > > > would give confidence.
> > > Agree, need to have a test case.
> > > >
> > > > Thoughts on such an approach?
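
For reference, an untested sketch of what the datapath could look like
with the try-lock approach; the struct layout, flag name, and
service_run() below are simplified placeholders, not the actual
rte_service implementation:

#include <errno.h>
#include <stdint.h>
#include <rte_spinlock.h>

#define SERVICE_F_MT_SAFE (1 << 0)

/* Simplified placeholder for the real service struct. */
struct service {
	rte_spinlock_t execute_lock;
	uint8_t flags;
	int32_t (*cb)(void *args);
	void *cb_userdata;
};

static int32_t
service_run(struct service *s)
{
	/* MT safe services serialize internally: call straight through. */
	if (s->flags & SERVICE_F_MT_SAFE)
		return s->cb(s->cb_userdata);

	/* MT unsafe: always serialize. trylock does not spin, so a
	 * service lcore mapped to multiple services simply moves on to
	 * its next service if this one is busy on another core. */
	if (!rte_spinlock_trylock(&s->execute_lock))
		return -EBUSY;

	int32_t ret = s->cb(s->cb_userdata);

	/* The unlock provides the release ordering that a plain
	 * rte_atomic32_clear() of execute_lock was missing. */
	rte_spinlock_unlock(&s->execute_lock);
	return ret;
}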