From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 08413A0350; Tue, 30 Jun 2020 20:57:24 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 936AF1BEA7; Tue, 30 Jun 2020 20:57:23 +0200 (CEST) Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by dpdk.org (Postfix) with ESMTP id 39B341BEA6 for ; Tue, 30 Jun 2020 20:57:22 +0200 (CEST) IronPort-SDR: oejYEVFLAVY2DPSZgrf9FA04YmZO6Uv8XWy6WAcAvN+Xmf2gmPT1Lgn6pmz8wBnSKievU7TPx3 JROsQB+pmKgA== X-IronPort-AV: E=McAfee;i="6000,8403,9668"; a="147908515" X-IronPort-AV: E=Sophos;i="5.75,298,1589266800"; d="scan'208";a="147908515" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Jun 2020 11:57:20 -0700 IronPort-SDR: fv+088gPJ3FOgq6Hs0kGXCazQFxfbvq53hnK9kvS/KxZ2FJRKEIE34Zbpu8eiQ3BnKtGtaIhBw Ap1VOeDUVl3Q== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.75,298,1589266800"; d="scan'208";a="313498605" Received: from fmsmsx108.amr.corp.intel.com ([10.18.124.206]) by fmsmga002.fm.intel.com with ESMTP; 30 Jun 2020 11:57:20 -0700 Received: from FMSEDG001.ED.cps.intel.com (10.1.192.133) by FMSMSX108.amr.corp.intel.com (10.18.124.206) with Microsoft SMTP Server (TLS) id 14.3.439.0; Tue, 30 Jun 2020 11:57:19 -0700 Received: from NAM12-BN8-obe.outbound.protection.outlook.com (104.47.55.176) by edgegateway.intel.com (192.55.55.68) with Microsoft SMTP Server (TLS) id 14.3.439.0; Tue, 30 Jun 2020 11:57:20 -0700 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=I58HfGJZdzP6ywbesFi/xwukW4n0y6htRnwYppmHlRCD0RpTu3kLKazcQdT9eV00F8MpF8Y0saJbD35UBftrDwUl/nZeEtWIrNbQsFNJN7NrRDTtWT3TJdzbATqNyRA5atPRLUZrTQvc5yuNzPMwXcT9PkkXHybPIgSdC+NJGZREfb7zCL+PLwXHeYStlhfxNd124nJeqWyl0KaF0zMU8LYpKKKmQqRsXRqLZ6CkYVLpb3B1x8x7cs34N52OXp9hkZdhN11QBTIFdBVCp+O5GyRMs6BavltIYEKyzkxXj3BIqOAtg7anlGxnjmZUOoQdc777E+bAIktma35nHvcc0Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=D+XRiK6q3CgJ85+FuDJqhEeSSzDoR5YFpS88KQIIFIo=; b=XstBU4eev/6SGUCAQ7VXDsVQ8WzVw00l+lrNDB7En4U7Ei5HdN8VOs1myrIlAi3W+CC15nQNp87pn1f7fRqJRX71jdtWmrS9IRxHlijftQTPPCyED20my/Tdq1a7pfuh1JaTd3+2mBWNlfbbbpni3GhhVZ0ZG5CNaUfddqP1xCHMLAsSjPiu+1EtXcF6Bb+uE/CCnPzIsg6Hsj7cO86qkPj+iV0kemtUBCTuGT5ZOL+scm96JboQe51vrOkGlGO/ZTxp0PO9eB7GE4NqEWGdm5onz++DEZzuWRLf0JBa3/8D4EVk+sGUXb2jQPFNSbb6G2gb9JK+iPtIYPP0SS2Raw== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=intel.com; dmarc=pass action=none header.from=intel.com; dkim=pass header.d=intel.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=intel.onmicrosoft.com; s=selector2-intel-onmicrosoft-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=D+XRiK6q3CgJ85+FuDJqhEeSSzDoR5YFpS88KQIIFIo=; b=uuJ027fzMAfBHSRBCtuKMbSr2b3j14sFszxQM//z5lWaZ+683AjdCMs8MXHQMqvts8k+lrOvAuVWQLgDX3jPZnhmyLSOKCrVyGmNNXSYzwkp5qALZX7DXHftiRyX0a5uKAD4UtnSB0SWtNuzVVZASeD1ho3d74WyBjXofWK0rqU= Received: from BYAPR11MB3301.namprd11.prod.outlook.com (2603:10b6:a03:7f::26) by BYAPR11MB3558.namprd11.prod.outlook.com (2603:10b6:a03:b3::11) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.3153.20; Tue, 30 Jun 2020 18:57:15 +0000 Received: from BYAPR11MB3301.namprd11.prod.outlook.com ([fe80::f160:29ab:b8f9:4189]) by BYAPR11MB3301.namprd11.prod.outlook.com ([fe80::f160:29ab:b8f9:4189%6]) with mapi id 15.20.3131.027; Tue, 30 Jun 2020 18:57:15 +0000 From: "Ananyev, Konstantin" To: Thomas Monjalon , David Marchand CC: "dev@dpdk.org" , "jerinjacobk@gmail.com" , "Richardson, Bruce" , "mdr@ashroe.eu" , "ktraynor@redhat.com" , "Stokes, Ian" , "i.maximets@ovn.org" , "Mcnamara, John" , "Kovacevic, Marko" , "Burakov, Anatoly" , Olivier Matz , "Andrew Rybchenko" , Neil Horman Thread-Topic: [dpdk-dev] [PATCH v3 6/9] eal: register non-EAL threads as lcores Thread-Index: AQHWSJjc02miHDhvs0aJZlTx+B7uGKjkxSPggAEPRACAAFFvIIABW1WAgAAPV/CAAAhKgIAAEmgQgANT3wCABgPxAIAADX0QgAA1sACAADXVwA== Date: Tue, 30 Jun 2020 18:57:15 +0000 Message-ID: References: <20200610144506.30505-1-david.marchand@redhat.com> <2939263.AvGHZF5Fiy@thomas> <2881429.9YLJUJncU7@thomas> In-Reply-To: <2881429.9YLJUJncU7@thomas> Accept-Language: en-GB, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: dlp-product: dlpe-windows dlp-reaction: no-action dlp-version: 11.2.0.6 authentication-results: monjalon.net; dkim=none (message not signed) header.d=none;monjalon.net; dmarc=none action=none header.from=intel.com; x-originating-ip: [192.198.151.182] x-ms-publictraffictype: Email x-ms-office365-filtering-correlation-id: 0bd36023-5546-4560-f934-08d81d2767c4 x-ms-traffictypediagnostic: BYAPR11MB3558: x-ld-processed: 46c98d88-e344-4ed4-8496-4ed7712e255d,ExtAddr x-ms-exchange-transport-forked: True x-microsoft-antispam-prvs: x-ms-oob-tlc-oobclassifiers: OLM:10000; x-forefront-prvs: 0450A714CB x-ms-exchange-senderadcheck: 1 x-microsoft-antispam: BCL:0; x-microsoft-antispam-message-info: u2nQ+66Bx+qz4XARWJWDaIuaSrX4yU4VCDpmdbhjQExvNq42pVbC8B3QPSU8z5GjCHQCwtqqBXLygzVrskIUB0PiUD7s1YXzPE9eK/btzS7Ym2H9b1kwQyHQxTjhD9zuyGzYNCGIoY5igc2tqPnU4CNjHwVcvdwGuH/nk/DpF3vAK6pt9uT8RmrDPnbr8ApF14BF2T+vNQHSgHNrAXp9PeNwWSnfm/fQ2Z4qcsSF8GNO8FKI/dC8AcvLOVtpFMkGr7GK3ij5dtaXjW28svegNgQChCnxId1ekpOwWyv9/EonrFVIlLO++JBHhuJFiWZNAYcU0vFf0cRMP8tIhCk66w0yF2RaSWAVHDOrtVytr4naWASEW06tbtRds31NSQ5LS6pgSqUxoZOnJxSzjbqrmA== x-forefront-antispam-report: CIP:255.255.255.255; CTRY:; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:BYAPR11MB3301.namprd11.prod.outlook.com; PTR:; CAT:NONE; SFTY:; SFS:(4636009)(396003)(346002)(366004)(136003)(39860400002)(376002)(4326008)(26005)(2906002)(8936002)(66946007)(5660300002)(6506007)(53546011)(71200400001)(66556008)(66446008)(8676002)(83380400001)(55016002)(186003)(86362001)(7416002)(9686003)(66476007)(64756008)(76116006)(110136005)(7696005)(966005)(54906003)(478600001)(52536014)(316002)(33656002); DIR:OUT; SFP:1102; x-ms-exchange-antispam-messagedata: j8kZqlOLr7jzWk/p1JueS/LACEoiBuEGwkivnn0fO8ZciWW4He6Xxi4MAz1rov7k+WJ4YnPsaUZUWkUImP2Uu7qrgq8LaKXJy1XLeKCp5Xg81zFgH3T/x0Qla9juMCJtDmHNvxo+tT/mrVIwW7DSZInYLZEGyfNoiCnG3C68dpndk5YQBDW9fiPyt2mZVs7zaCzryiBA4vys60L88UYki7aqx39Lu3lTojBJB0GJZo4Rhm80cL41M3tWn+X3B2cz7XU9SU3vcxDuBA5BxvQkdqhUAwAVSOlRnIjiKNPtMNtto3GQ4AYIwgHVF7JBYknUXPfQE1abXORHdTxzVBO7M0Jgq8aq/W2lTI9kuVexN3Dk+HVEsTgOYuWtymZMLAXzoT/Qy3YsliTSUzfeqw/Z7VhbSETCboJB9yiUlUC82ZSOEL4OOixpPi5TR2GZU6J7QYPgcQXfJ0qQqAb0mPtFF6YykthYS4nlCyDFewlPm+DdHlODIuTnpU+rt2iPcVcU Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-MS-Exchange-CrossTenant-AuthAs: Internal X-MS-Exchange-CrossTenant-AuthSource: BYAPR11MB3301.namprd11.prod.outlook.com X-MS-Exchange-CrossTenant-Network-Message-Id: 0bd36023-5546-4560-f934-08d81d2767c4 X-MS-Exchange-CrossTenant-originalarrivaltime: 30 Jun 2020 18:57:15.1479 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: 46c98d88-e344-4ed4-8496-4ed7712e255d X-MS-Exchange-CrossTenant-mailboxtype: HOSTED X-MS-Exchange-CrossTenant-userprincipalname: u3KhnDPPO+zzvys9mrskg2nveakYZet9WflgCiAm9LN4ZSbJ1G6o3UzLJSGfSHGZF5HOMPr0gZAqeVopq/yqyFSlgZ/T8fDWbAmW1/pHxcI= X-MS-Exchange-Transport-CrossTenantHeadersStamped: BYAPR11MB3558 X-OriginatorOrg: intel.com Subject: Re: [dpdk-dev] [PATCH v3 6/9] eal: register non-EAL threads as lcores X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" =20 > 30/06/2020 14:07, Ananyev, Konstantin: > > > 26/06/2020 16:43, David Marchand: > > > > On Wed, Jun 24, 2020 at 1:59 PM Ananyev, Konstantin > > > > wrote: > > > > > > > Do you mean - make this new dynamic-lcore API return an error= if callied > > > > > > > from secondary process? > > > > > > > > > > > > Yes, and prohibiting from attaching a secondary process if dyna= mic > > > > > > lcore API has been used in primary. > > > > > > I intend to squash in patch 6: > > > > > > https://github.com/david-marchand/dpdk/commit/e5861ee734bfe2e4d= c23d9b919b0db2a32a58aee > > > > > > > > > > But secondary process can attach before lcore_register, so we'll = have some sort of inconsistency in behaviour. > > > > > > > > If the developer tries to use both features, he gets an ERROR log i= n > > > > the two init path. > > > > So whatever the order at runtime, we inform the developer (who did = not > > > > read/understand the rte_thread_register() documentation) that what = he > > > > is doing is unsupported. > > > > > > I agree. > > > Before this patch, pinning a thread on a random core can > > > trigger some issues. > > > After this patch, register an external thread will > > > take care of logging errors in case of inconsistencies. > > > So the user will know he is doing something not supported > > > by the app. > > > > I understand that, and return a meaningful error is definitely > > better the silent crash or memory corruption. > > The problem with that approach, as I said before, MP group > > behaviour becomes non-deterministic. >=20 > It was already non-deterministic before these patches. > > > It is an nice improvement. > > > > > > > > If we really want to go ahead with such workaround - > > > > > > It is not a workaround. > > > It is fixing some old issues and making clear what is really impossib= le. > > > > The root cause of the problem is in our MP model design decisions: > > from one side we treat lcore_id as process local data, from other side > > in some shared data-structures we use lcore_id as an index. > > I think to fix it properly we need either: > > make lcore_id data shared or stop using lcore_id as an index for shared= data. > > So from my perspective this approach is just one of possible workaround= s. > > BTW, there is nothing wrong to have a workaround for the problem > > we are not ready to fix right now. >=20 > I think you are trying to fix multi-process handling. > This patch is not about multi-process, it only highlight incompatibilitie= s. Yes, the problem has been there for a while. David's patch just made it more visible. We discussing different workarounds for the problem. > > > > > probably better to introduce explicit EAL flag ( --single-process= or so). > > > > > As Thomas and Bruce suggested, if I understood them properly. > > > > > > No I was thinking to maintain the tri-state information: > > > - secondary is possible > > > - secondary is attached > > > - secondary is forbidden > > > > Ok, then I misunderstood you. > > > > > Asking the user to use an option to forbid attaching a secondary proc= ess > > > is the same as telling him it is forbidden. > > > > I don't think it is the same. > > On a live and complex system user can't always predict will the primary= proc > > use dynamic lcore and if it will at what particular moment. > > Same for secondary process launching - user might never start it, > > might start it straight after the primary one, > > or might be after several hours. >=20 > I don't see the difference. > An app which register external threads is not compatible > with multi-process. It needs to be clear. > If the user tries to do it anyway, there can be some error, OK. Copying from other mail thread: Imagine the situation - there is a primary proc (supposed to run forever) that does rte_thread_register/rte_thread_unregister during its lifetime. Plus from time to time user runs some secondary process to collect stats/de= bug the primary one (proc-info or so). Now behaviour of such system will be non-deterministic: In some runs primary proc will do rte_thread_register() first, and then secondary proc will be never able to attach. In other cases - secondary will win the race, and then for primary=20 eal_lcore_non_eal_allocate() will always fail. Which means different behaviour between runs, varying performance, etc. > > > The error log is enough in my opinion. > > > > I think it is better than nothing, but probably not the best one. > > Apart from possible non-consistent behaviour, it is quite restrictive: > > dynamic lcore_id wouldn't be available on any DPDK MP deployment. > > Which is a pity - I think it is a cool and useful feature. >=20 > So you are asking to extend the feature. I am asking for solution that would guarantee deterministic behaviour to th= e user. If dynamic lcores and MP support need to be mutually exclusive, then there should be a clean way for the user to *always* enable one and disable the other. "--proc-type=3Dstandalone" will at least guarantee such consistent behaviou= r between runs: secondary proc will always fail to attach and eal_lcore_non_eal_allocate()= will always succeed (as long as there are free lcore_ids off-course). Though I think even better would be not to make them mutually exclusive, but instead let user to split lcore_id space accordingly. Let me list the options currently under discussion: a) New EAL parameter '--lcore-allow=3D...' Explicit EAL parameter to enable dyn-lcore=3DY Consistent behaviour between runs=3DY DYN-lcores/MP-support are mutually exclusive=3DN=20 b) Extend '--proc-type' EAL parameter with new 'standalone' type Explicit EAL parameter to enable dyn-lcore =3DY Consistent behaviour between runs=3DY Dyn lcores/MP-support are mutually exclusive=3DY c) dynamic allow/forbid dynamic-lcore/MP support Explicit EAL parameter=3DN Consistent behaviour between runs=3DN Dyn lcores/MP-support are mutually exclusive=3DY My preference list (from top to bottom): a, b, c. > Honestly, I'm not a fan of multi-process, > so I would not push any feature for it. Me too, but as we can't drop it, we probably have no choice but to live with it.=20 >=20 > If we don't add any new option now, and restrict MP handling > to error messages, it would not prevent from extending > in future, right? It shouldn't I think. Though what is the urgency to push this feature without having an agreement first? =20 >=20 > > What do you guys think about different approach: > > introduce new optional EAL parameter to restrict lcore_id > > values available for the process. > > > > #let say to start primary proc that can use lcore_id=3D[0-99] only: > > dpdk_primary --lcore-allow=3D0-99 ... --file-prefix=3Dxz1 > > > > #to start secondary one for it with allowed lcore_id=3D[100-109]: > > dpdk_secondary --lcore-allow=3D100-109 ... --file-prefix=3Dxz1 --proc-t= ype=3Dsecondary > > > > It is still a workaround, but that way we don't need to > > add any new limitations for dynamic lcores and secondary process usage. > > Now it is up to user to decide would multiple-process use the same shar= ed data > > and if so - split lcore_id space properly among them > > (same as he has to do now with static lcores). >=20 > Isn't it pushing too much to the user? User has to do the similar thing with static lcores right now. =20 >=20 > > > > A EAL flag is a stable API from the start, as there is nothing > > > > describing how we can remove one. > > > > So a new EAL flag for an experimental API/feature seems contradicto= ry. > > > > > > > > Going with a new features status API... I think it is beyond this s= eries. > > > > > > > > Thomas seems to suggest an automatic resolution when features confl= ict > > > > happens.. ? > > > > > > I suggest allowing the maximum and raise an error when usage conflict= s. > > > It seems this is what you did in v4. > > > > > > > I'll send the v4, let's discuss it there if you want. >=20 >=20