From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Van Haaren, Harry" <harry.van.haaren@intel.com>
To: Morten Brørup, "Richardson, Bruce" <bruce.richardson@intel.com>
CC: Maxime Coquelin, "Pai G, Sunil", "Stokes, Ian", "Hu, Jiayu", "Ferriter, Cian", Ilya Maximets, ovs-dev@openvswitch.org, dev@dpdk.org, "Mcnamara, John", "O'Driscoll, Tim", "Finn, Emma"
Subject: RE: OVS DPDK DMA-Dev library/Design Discussion
Date: Wed, 30 Mar 2022 09:01:32 +0000
In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC35D86F82@smartserver.smartshare.dk>
List-Id: DPDK patches and discussions
> -----Original Message-----
> From: Morten Brørup
> Sent: Tuesday, March 29, 2022 8:59 PM
> To: Van Haaren, Harry; Richardson, Bruce
> Cc: Maxime Coquelin; Pai G, Sunil; Stokes, Ian; Hu, Jiayu; Ferriter, Cian;
> Ilya Maximets; ovs-dev@openvswitch.org; dev@dpdk.org; Mcnamara, John;
> O'Driscoll, Tim; Finn, Emma
> Subject: RE: OVS DPDK DMA-Dev library/Design Discussion
>
> > From: Van Haaren, Harry [mailto:harry.van.haaren@intel.com]
> > Sent: Tuesday, 29 March 2022 19.46
> >
> > > From: Morten Brørup
> > > Sent: Tuesday, March 29, 2022 6:14 PM
> > >
> > > > From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > > > Sent: Tuesday, 29 March 2022 19.03
> > > >
> > > > On Tue, Mar 29, 2022 at 06:45:19PM +0200, Morten Brørup wrote:
> > > > > > From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
> > > > > > Sent: Tuesday, 29 March 2022 18.24
> > > > > >
> > > > > > Hi Morten,
> > > > > >
> > > > > > On 3/29/22 16:44, Morten Brørup wrote:
> > > > > > >> From: Van Haaren, Harry [mailto:harry.van.haaren@intel.com]
> > > > > > >> Sent: Tuesday, 29 March 2022 15.02
> > > > > > >>
> > > > > > >>> From: Morten Brørup
> > > > > > >>> Sent: Tuesday, March 29, 2022 1:51 PM
> > > > > > >>>
> > > > > > >>> Having thought more about it, I think that a completely different
> > > > > > >>> architectural approach is required:
> > > > > > >>>
> > > > > > >>> Many of the DPDK Ethernet PMDs implement a variety of RX and TX
> > > > > > >>> packet burst functions, each optimized for different CPU vector
> > > > > > >>> instruction sets. The availability of a DMA engine should be
> > > > > > >>> treated the same way. So I suggest that PMDs copying packet
> > > > > > >>> contents, e.g. memif, pcap, vmxnet3, should implement DMA
> > > > > > >>> optimized RX and TX packet burst functions.
> > > > > > >>>
> > > > > > >>> Similarly for the DPDK vhost library.
> > > > > > >>>
> > > > > > >>> In such an architecture, it would be the application's job to
> > > > > > >>> allocate DMA channels and assign them to the specific PMDs that
> > > > > > >>> should use them. But the actual use of the DMA channels would
> > > > > > >>> move down below the application and into the DPDK PMDs and
> > > > > > >>> libraries.
> > > > > > >>>
> > > > > > >>> Med venlig hilsen / Kind regards,
> > > > > > >>> -Morten Brørup
> > > > > > >>
> > > > > > >> Hi Morten,
> > > > > > >>
> > > > > > >> That's *exactly* how this architecture is designed & implemented.
> > > > > > >> 1. The DMA configuration and initialization is up to the
> > > > > > >>    application (OVS).
> > > > > > >> 2. The VHost library is passed the DMA-dev ID, and its new async
> > > > > > >>    rx/tx APIs, and uses the DMA device to accelerate the copy.
> > > > > > >>
> > > > > > >> Looking forward to talking on the call that just started.
> > > > > > >> Regards, -Harry
> > > > > > >>
> > > > > > >
> > > > > > > OK, thanks - as I said on the call, I haven't looked at the
> > > > > > > patches.
> > > > > > >
> > > > > > > Then, I suppose that the TX completions can be handled in the TX
> > > > > > > function, and the RX completions can be handled in the RX function,
> > > > > > > just like the Ethdev PMDs handle packet descriptors:
> > > > > > >
> > > > > > > TX_Burst(tx_packet_array):
> > > > > > > 1. Clean up descriptors processed by the NIC chip. --> Process TX
> > > > > > >    DMA channel completions. (Effectively, the 2nd pipeline stage.)
> > > > > > > 2. Pass on the tx_packet_array to the NIC chip descriptors. -->
> > > > > > >    Pass on the tx_packet_array to the TX DMA channel. (Effectively,
> > > > > > >    the 1st pipeline stage.)
> > > > > >
> > > > > > The problem is the Tx function might not be called again, so enqueued
> > > > > > packets in 2. may never be completed from a Virtio point of view.
> > > > > > IOW, the packets will be copied to the Virtio descriptors buffers,
> > > > > > but the descriptors will not be made available to the Virtio driver.
> > > > >
> > > > > In that case, the application needs to call TX_Burst() periodically
> > > > > with an empty array, for completion purposes.
> >
> > This is what the "defer work" does at the OVS thread-level, but instead of
> > "brute-forcing" and *always* making the call, the defer work concept tracks
> > *when* there is outstanding work (DMA copies) to be completed ("deferred
> > work") and calls the generic completion function at that point.
> >
> > So "defer work" is generic infrastructure at the OVS thread level to handle
> > work that needs to be done "later", e.g. DMA completion handling.
> >
> > > > > Or some sort of TX_Keepalive() function can be added to the DPDK
> > > > > library, to handle DMA completion. It might even handle multiple DMA
> > > > > channels, if convenient - and if possible without locking or other
> > > > > weird complexity.
> >
> > That's exactly how it is done; the VHost library has a new API added, which
> > allows for handling completions. And in the "Netdev layer" (~OVS ethdev
> > abstraction) we add a function to allow the OVS thread to do those
> > completions in a new Netdev-abstraction API called "async_process" where
> > the completions can be checked.
> >
> > The only method to abstract them is to "hide" them somewhere that will
> > always be polled, e.g. an ethdev port's RX function. Both V3 and V4
> > approaches use this method. This allows "completions" to be transparent to
> > the app, at the tradeoff of having bad separation of concerns, as Rx and Tx
> > are now tied together.
> >
> > The point is, the Application layer must *somehow* handle completions.
> > So fundamentally there are 2 options for the Application level:
> >
> > A) Make the application periodically call a "handle completions" function.
> >    A1) Defer work: call when needed, track "needed" at the app layer, and
> >        call into vhost txq complete as required. Elegant in that "no work"
> >        means "no cycles spent" on checking DMA completions.
> >    A2) Brute-force-always-call, and pay some overhead when not required.
> >        Cycle-cost in "no work" scenarios. Depending on the # of vhost
> >        queues, this adds up, as polling is required *per vhost txq*. Also
> >        note that "checking DMA completions" means taking a virtq-lock, so
> >        this "brute-force" can needlessly increase cross-thread contention!
>
> A side note: I don't see why locking is required to test for DMA
> completions. rte_dma_vchan_status() is lockless, e.g.:
> https://elixir.bootlin.com/dpdk/latest/source/drivers/dma/ioat/ioat_dmadev.c#L560

Correct, DMA-dev is "ethdev like"; each DMA-id can be used in a lock-free
manner from a single thread.
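To make option A1 above concrete, here is a minimal, self-contained sketch of
the "defer work" bookkeeping: mark which txqs have outstanding DMA copies, and
poll completions only for those, so "no work" really does cost no cycles. All
names here (defer_work_mark, etc.) are invented for illustration - they are
not OVS or DPDK APIs - and a real implementation would obtain `ndone` from the
vhost async completion API rather than taking it as a parameter.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_TXQ 64

/* Per-thread "defer work" state: a bitmask of txqs with outstanding DMA
 * copies, plus an in-flight counter per txq. Hypothetical sketch only. */
struct defer_work {
    uint64_t pending_mask;          /* bit i set => txq i has deferred work */
    uint32_t outstanding[MAX_TXQ];  /* in-flight DMA copies per txq */
};

/* Enqueue path: copies were handed to the DMA engine for this txq. */
static inline void
defer_work_mark(struct defer_work *dw, int txq, uint32_t ncopies)
{
    dw->outstanding[txq] += ncopies;
    dw->pending_mask |= UINT64_C(1) << txq;
}

/* Main-loop path: account 'ndone' completed copies for this txq; once
 * nothing is outstanding, clear the bit so the txq is no longer polled.
 * Returns the number of copies still in flight on the txq. */
static inline uint32_t
defer_work_complete(struct defer_work *dw, int txq, uint32_t ndone)
{
    dw->outstanding[txq] -= ndone;
    if (dw->outstanding[txq] == 0)
        dw->pending_mask &= ~(UINT64_C(1) << txq);
    return dw->outstanding[txq];
}

/* Cheap check the thread's main loop performs per iteration. */
static inline int
defer_work_needed(const struct defer_work *dw, int txq)
{
    return (int)((dw->pending_mask >> txq) & 1);
}
```

The contrast with option A2 is visible here: A2 would call the completion
function for every txq on every iteration, while this state lets the thread
skip txqs whose bit is clear, avoiding needless virtq-lock acquisition.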
The locks I refer to are at the OVS-netdev level, as virtqs are shared across
OVS's dataplane threads. So the "M to N" comes from M dataplane threads to N
virtqs, hence requiring some locking.

> > B) Hide completions and live with the complexity/architectural sacrifice
> >    of mixed Rx-Tx. Various downsides here in my opinion; see the slide
> >    deck presented earlier today for a summary.
> >
> > In my opinion, A1 is the most elegant solution, as it has a clean
> > separation of concerns, does not cause avoidable contention on virtq
> > locks, and spends no cycles when there is no completion work to do.
>
> Thank you for elaborating, Harry.

Thanks for taking part in the discussion & providing your insight!

> I strongly oppose hiding any part of TX processing in an RX function. It is
> just wrong in so many ways!
>
> I agree that A1 is the most elegant solution. And being the most elegant
> solution, it is probably also the most future-proof solution. :-)

I think so too, yes.

> I would also like to stress that DMA completion handling belongs in the
> DPDK library, not in the application. And yes, the application will be
> required to call some "handle DMA completions" function in the DPDK
> library. But since the application already knows that it uses DMA, the
> application should also know that it needs to call this extra function - so
> I consider this requirement perfectly acceptable.

Agree here.

> I prefer if the DPDK vhost library can hide its inner workings from the
> application, and just expose the additional "handle completions" function.
> This also means that the inner workings can be implemented as "defer work",
> or by some other algorithm. And it can be tweaked and optimized later.

Yes, the choice in how to call the handle_completions function belongs to the
Application layer. For OVS we designed Defer Work, V3 and V4. But it is an
App-level choice, and every application is free to choose its own method.

> Thinking about the long term perspective, this design pattern is common for
> both the vhost library and other DPDK libraries that could benefit from DMA
> (e.g. vmxnet3 and pcap PMDs), so it could be abstracted into the DMA
> library or a separate library. But for now, we should focus on the vhost
> use case, and just keep the long term roadmap for using DMA in mind.

Totally agree to keep the long term roadmap in mind; but I'm not sure we can
refactor logic out of vhost. When DMA completions arrive, the virtq needs to
be updated; this causes a tight coupling between the DMA completion count and
the vhost library.

As Ilya raised on the call yesterday, there is an "in_order" requirement in
the vhost library: per virtq, the packets are presented to the guest "in
order" of enqueue. (To be clear, *not* in order of DMA completion! As Jiayu
mentioned, the vhost library handles this today by re-ordering the DMA
completions.)

> Rephrasing what I said on the conference call: This vhost design will
> become the common design pattern for using DMA in DPDK libraries. If we get
> it wrong, we are stuck with it.

Agree, and if we get it right, then we're stuck with it too! :)

> > > > > Here is another idea, inspired by a presentation at one of the DPDK
> > > > > Userspace conferences. It may be wishful thinking, though:
> > > > >
> > > > > Add an additional transaction to each DMA burst; a special
> > > > > transaction containing the memory write operation that makes the
> > > > > descriptors available to the Virtio driver.
> > > >
> > > > That is something that can work, so long as the receiver is operating
> > > > in polling mode. For cases where virtio interrupts are enabled, you
> > > > still need to do a write to the eventfd in the kernel in vhost to
> > > > signal the virtio side. That's not something that can be offloaded to
> > > > a DMA engine, sadly, so we still need some form of completion call.
> > >
> > > I guess that virtio interrupts is the most widely deployed scenario, so
> > > let's ignore the DMA TX completion transaction for now - and call it a
> > > possible future optimization for specific use cases. So it seems that
> > > some form of completion call is unavoidable.
> >
> > Agree to leave this aside; there is in theory a potential optimization,
> > but it is unlikely to be of large value.
>
> One more thing: When using DMA to pass packets on into a guest, there could
> be a delay from when the DMA completes until the guest is signaled. Is
> there any CPU cache hotness regarding the guest's access to the packet data
> to consider here? I.e. if we wait with signaling the guest, the packet data
> may get cold.

Interesting question; we can likely spawn a new thread around this topic!
In short, it depends on how/where the DMA hardware writes the copy.

With technologies like DDIO, the "dest" part of the copy will be in LLC. The
core reading the dest data will benefit from the LLC locality (instead of
snooping it from a remote core's L1/L2).

Delays in notifying the guest could result in LLC capacity eviction, yes.
The application layer decides how often/promptly to check for completions and
notify the guest of them. Calling the function more often will result in less
delay in that portion of the pipeline.

Overall, there are caching benefits with DMA acceleration, and the
application can control the latency introduced between DMA completion done in
HW and the guest vring update.
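Returning to the "in_order" requirement mentioned above: DMA copies may
complete out of order, but each virtq must expose buffers to the guest in
enqueue order. One common way to do that is to record completions per enqueue
slot and advance only over a contiguous completed prefix. The sketch below is
a hypothetical illustration of that idea, not the actual vhost-async
implementation; all names are invented.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SZ 16  /* power of two, so index wrap is a cheap mask */

/* Completion bitmap over the enqueue ring, plus the next slot that may
 * be exposed to the guest. Hypothetical sketch only. */
struct inorder_ring {
    uint8_t  done[RING_SZ];  /* 1 if the DMA copy for this slot finished */
    uint32_t head;           /* next enqueue slot to expose, in order */
};

/* Record a (possibly out-of-order) DMA completion for an enqueue slot. */
static inline void
inorder_complete(struct inorder_ring *r, uint32_t slot)
{
    r->done[slot & (RING_SZ - 1)] = 1;
}

/* Return how many slots can now be exposed to the guest: only the
 * contiguous run of completed slots starting at 'head' counts, so a
 * late slot 0 holds back an already-finished slot 1. */
static inline uint32_t
inorder_harvest(struct inorder_ring *r)
{
    uint32_t n = 0;
    while (r->done[r->head & (RING_SZ - 1)]) {
        r->done[r->head & (RING_SZ - 1)] = 0;
        r->head++;
        n++;
    }
    return n;
}
```

This also shows why the coupling Harry describes is tight: whichever component
owns this bookkeeping must also be the one updating the virtq descriptors, so
it is hard to factor the completion logic out of the vhost library.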