From: "Van Haaren, Harry"
To: Morten Brørup, "Richardson, Bruce"
Cc: Maxime Coquelin, "Pai G, Sunil", "Stokes, Ian", "Hu, Jiayu", "Ferriter, Cian", Ilya Maximets, ovs-dev@openvswitch.org, dev@dpdk.org, "Mcnamara, John", "O'Driscoll, Tim", "Finn, Emma"
Subject: RE: OVS DPDK DMA-Dev library/Design Discussion
Date: Thu, 7 Apr 2022 14:04:27 +0000
List-Id: DPDK patches and discussions

Hi OVS & DPDK, Maintainers & Community,

Top-posting an overview of the discussion, as replies to the thread have become slower: perhaps it is a good time to review and plan for next steps?

From my perspective, those most vocal in the thread seem to be in favour of the clean rx/tx split ("defer work"), with the tradeoff that the application must be aware of handling the async DMA completions. If there are any concerns opposing upstreaming of this method, please indicate this promptly, and we can continue technical discussions here now.

In absence of continued technical discussion here, I suggest Sunil and Ian collaborate on getting the OVS Defer-work approach and the DPDK VHost Async patchsets available on GitHub for easier consumption and future development (as suggested in slides presented on the last call).

Regards, -Harry

No inline replies below; message just for context.
> -----Original Message-----
> From: Van Haaren, Harry
> Sent: Wednesday, March 30, 2022 10:02 AM
> To: Morten Brørup; Richardson, Bruce
> Cc: Maxime Coquelin; Pai G, Sunil; Stokes, Ian; Hu, Jiayu; Ferriter, Cian; Ilya Maximets; ovs-dev@openvswitch.org; dev@dpdk.org; Mcnamara, John; O'Driscoll, Tim; Finn, Emma
> Subject: RE: OVS DPDK DMA-Dev library/Design Discussion
>
> > -----Original Message-----
> > From: Morten Brørup
> > Sent: Tuesday, March 29, 2022 8:59 PM
> > To: Van Haaren, Harry; Richardson, Bruce
> > Cc: Maxime Coquelin; Pai G, Sunil; Stokes, Ian; Hu, Jiayu; Ferriter, Cian; Ilya Maximets; ovs-dev@openvswitch.org; dev@dpdk.org; Mcnamara, John; O'Driscoll, Tim; Finn, Emma
> > Subject: RE: OVS DPDK DMA-Dev library/Design Discussion
> >
> > > From: Van Haaren, Harry [mailto:harry.van.haaren@intel.com]
> > > Sent: Tuesday, 29 March 2022 19.46
> > >
> > > > From: Morten Brørup
> > > > Sent: Tuesday, March 29, 2022 6:14 PM
> > > >
> > > > > From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > > > > Sent: Tuesday, 29 March 2022 19.03
> > > > >
> > > > > On Tue, Mar 29, 2022 at 06:45:19PM +0200, Morten Brørup wrote:
> > > > > > > From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
> > > > > > > Sent: Tuesday, 29 March 2022 18.24
> > > > > > >
> > > > > > > Hi Morten,
> > > > > > >
> > > > > > > On 3/29/22 16:44, Morten Brørup wrote:
> > > > > > > >> From: Van Haaren, Harry [mailto:harry.van.haaren@intel.com]
> > > > > > > >> Sent: Tuesday, 29 March 2022 15.02
> > > > > > > >>
> > > > > > > >>> From: Morten Brørup
> > > > > > > >>> Sent: Tuesday, March 29, 2022 1:51 PM
> > > > > > > >>>
> > > > > > > >>> Having thought more about it, I think that a completely different architectural approach is required:
> > > > > > > >>>
> > > > > > > >>> Many of the DPDK Ethernet PMDs implement a variety of RX and TX packet burst functions, each optimized for different CPU vector instruction sets. The availability of a DMA engine should be treated the same way. So I suggest that PMDs copying packet contents, e.g. memif, pcap, vmxnet3, should implement DMA-optimized RX and TX packet burst functions.
> > > > > > > >>>
> > > > > > > >>> Similarly for the DPDK vhost library.
> > > > > > > >>>
> > > > > > > >>> In such an architecture, it would be the application's job to allocate DMA channels and assign them to the specific PMDs that should use them. But the actual use of the DMA channels would move down below the application and into the DPDK PMDs and libraries.
> > > > > > > >>>
> > > > > > > >>> Med venlig hilsen / Kind regards,
> > > > > > > >>> -Morten Brørup
> > > > > > > >>
> > > > > > > >> Hi Morten,
> > > > > > > >>
> > > > > > > >> That's *exactly* how this architecture is designed & implemented.
> > > > > > > >> 1. The DMA configuration and initialization is up to the application (OVS).
> > > > > > > >> 2. The VHost library is passed the DMA-dev ID, and its new async rx/tx APIs, and uses the DMA device to accelerate the copy.
> > > > > > > >>
> > > > > > > >> Looking forward to talking on the call that just started. Regards, -Harry
> > > > > > > >
> > > > > > > > OK, thanks - as I said on the call, I haven't looked at the patches.
> > > > > > > >
> > > > > > > > Then, I suppose that the TX completions can be handled in the TX function, and the RX completions can be handled in the RX function, just like the Ethdev PMDs handle packet descriptors:
> > > > > > > >
> > > > > > > > TX_Burst(tx_packet_array):
> > > > > > > > 1. Clean up descriptors processed by the NIC chip. --> Process TX DMA channel completions. (Effectively, the 2nd pipeline stage.)
> > > > > > > > 2. Pass on the tx_packet_array to the NIC chip descriptors. --> Pass on the tx_packet_array to the TX DMA channel. (Effectively, the 1st pipeline stage.)
> > > > > > >
> > > > > > > The problem is that the Tx function might not be called again, so packets enqueued in 2. may never be completed from a Virtio point of view. IOW, the packets will be copied to the Virtio descriptors' buffers, but the descriptors will not be made available to the Virtio driver.
> > > > > >
> > > > > > In that case, the application needs to call TX_Burst() periodically with an empty array, for completion purposes.
> > >
> > > This is what the "defer work" does at the OVS thread level, but instead of "brute-forcing" and *always* making the call, the defer work concept tracks *when* there is outstanding work (DMA copies) to be completed ("deferred work") and calls the generic completion function at that point.
> > >
> > > So "defer work" is generic infrastructure at the OVS thread level to handle work that needs to be done "later", e.g. DMA completion handling.
> > >
> > > > > > Or some sort of TX_Keepalive() function can be added to the DPDK library, to handle DMA completion. It might even handle multiple DMA channels, if convenient - and if possible without locking or other weird complexity.
> > >
> > > That's exactly how it is done, the VHost library has a new API added, which allows for handling completions.
> > > And in the "Netdev layer" (~OVS ethdev abstraction) we add a function to allow the OVS thread to do those completions in a new Netdev-abstraction API called "async_process" where the completions can be checked.
> > >
> > > The only method to abstract them is to "hide" them somewhere that will always be polled, e.g. an ethdev port's RX function. Both V3 and V4 approaches use this method. This allows "completions" to be transparent to the app, at the tradeoff of having bad separation of concerns as Rx and Tx are now tied together.
> > >
> > > The point is, the Application layer must *somehow* handle completions. So fundamentally there are 2 options for the Application level:
> > >
> > > A) Make the application periodically call a "handle completions" function.
> > >    A1) Defer work: call when needed, track "needed" at the app layer, and call into vhost txq complete as required. Elegant in that "no work" means "no cycles spent" on checking DMA completions.
> > >    A2) Brute-force-always-call, and pay some overhead when not required. Cycle-cost in "no work" scenarios. Depending on # of vhost queues, this adds up as polling is required *per vhost txq*. Also note that "checking DMA completions" means taking a virtq-lock, so this "brute-force" can needlessly increase cross-thread contention!
> >
> > A side note: I don't see why locking is required to test for DMA completions. rte_dma_vchan_status() is lockless, e.g.:
> > https://elixir.bootlin.com/dpdk/latest/source/drivers/dma/ioat/ioat_dmadev.c#L560
>
> Correct, DMA-dev is "ethdev like"; each DMA-id can be used in a lockfree manner from a single thread.
>
> The locks I refer to are at the OVS-netdev level, as virtqs are shared across OVS's dataplane threads. So the "M to N" comes from M dataplane threads to N virtqs, hence requiring some locking.
>
> > > B) Hide completions and live with the complexity/architectural sacrifice of mixed-RxTx. Various downsides here in my opinion; see the slide deck presented earlier today for a summary.
> > >
> > > In my opinion, A1 is the most elegant solution, as it has a clean separation of concerns, does not cause avoidable contention on virtq locks, and spends no cycles when there is no completion work to do.
> >
> > Thank you for elaborating, Harry.
>
> Thanks for taking part in the discussion & providing your insight!
>
> > I strongly oppose hiding any part of TX processing in an RX function. It is just wrong in so many ways!
> >
> > I agree that A1 is the most elegant solution. And being the most elegant solution, it is probably also the most future-proof solution. :-)
>
> I think so too, yes.
>
> > I would also like to stress that DMA completion handling belongs in the DPDK library, not in the application. And yes, the application will be required to call some "handle DMA completions" function in the DPDK library. But since the application already knows that it uses DMA, the application should also know that it needs to call this extra function - so I consider this requirement perfectly acceptable.
>
> Agree here.
>
> > I prefer if the DPDK vhost library can hide its inner workings from the application, and just expose the additional "handle completions" function. This also means that the inner workings can be implemented as "defer work", or by some other algorithm. And it can be tweaked and optimized later.
>
> Yes, the choice in how to call the handle_completions function is up to the Application layer. For OVS we designed Defer Work, V3 and V4.
> But it is an App-level choice, and every application is free to choose its own method.
>
> > Thinking about the long-term perspective, this design pattern is common for both the vhost library and other DPDK libraries that could benefit from DMA (e.g. vmxnet3 and pcap PMDs), so it could be abstracted into the DMA library or a separate library. But for now, we should focus on the vhost use case, and just keep the long-term roadmap for using DMA in mind.
>
> Totally agree to keep the long-term roadmap in mind; but I'm not sure we can refactor logic out of vhost. When DMA completions arrive, the virtq needs to be updated; this causes a tight coupling between the DMA completion count and the vhost library.
>
> As Ilya raised on the call yesterday, there is an "in_order" requirement in the vhost library: per virtq, the packets are presented to the guest "in order" of enqueue. (To be clear, *not* order of DMA completion! As Jiayu mentioned, the Vhost library handles this today by re-ordering the DMA completions.)
>
> > Rephrasing what I said on the conference call: This vhost design will become the common design pattern for using DMA in DPDK libraries. If we get it wrong, we are stuck with it.
>
> Agree, and if we get it right, then we're stuck with it too! :)
>
> > > > > > Here is another idea, inspired by a presentation at one of the DPDK Userspace conferences. It may be wishful thinking, though:
> > > > > >
> > > > > > Add an additional transaction to each DMA burst; a special transaction containing the memory write operation that makes the descriptors available to the Virtio driver.
> > > > >
> > > > > That is something that can work, so long as the receiver is operating in polling mode. For cases where virtio interrupts are enabled, you still need to do a write to the eventfd in the kernel in vhost to signal the virtio side. That's not something that can be offloaded to a DMA engine, sadly, so we still need some form of completion call.
> > > >
> > > > I guess that virtio interrupts is the most widely deployed scenario, so let's ignore the DMA TX completion transaction for now - and call it a possible future optimization for specific use cases. So it seems that some form of completion call is unavoidable.
> > >
> > > Agree to leave this aside; there is in theory a potential optimization, but it is unlikely to be of large value.
>
> > One more thing: When using DMA to pass on packets into a guest, there could be a delay from when the DMA completes until the guest is signaled. Is there any CPU cache hotness regarding the guest's access to the packet data to consider here? I.e. if we wait with signaling the guest, the packet data may get cold.
>
> Interesting question; we can likely spawn a new thread around this topic! In short, it depends on how/where the DMA hardware writes the copy.
>
> With technologies like DDIO, the "dest" part of the copy will be in LLC. The core reading the dest data will benefit from the LLC locality (instead of snooping it from a remote core's L1/L2).
>
> Delays in notifying the guest could result in LLC capacity eviction, yes. The application layer decides how often/promptly to check for completions, and notify the guest of them. Calling the function more often will result in less delay in that portion of the pipeline.
>
> Overall, there are caching benefits with DMA acceleration, and the application can control the latency introduced between DMA completion done in HW and the guest vring update.