From mboxrd@z Thu Jan 1 00:00:00 1970
From: Vamsi Krishna Attunuru
To: Olivier Matz
CC: Jerin Jacob, Andrew Rybchenko, Ferruh Yigit, "thomas@monjalon.net",
 Jerin Jacob Kollanukkaran, Kiran Kumar Kokkilagadda,
 "anatoly.burakov@intel.com", "stephen@networkplumber.org", "dev@dpdk.org"
Date: Sat, 26 Oct 2019 14:09:51 +0000
References: <77f8eaf0-52ca-1295-973d-c8085f7b7736@intel.com>
 <08c426d1-6fc9-1c3f-02d4-8632a8e3c337@solarflare.com>
 <20191023144724.GO25286@glumotte.dev.6wind.com>
 <20191024173506.GU25286@glumotte.dev.6wind.com>
 <20191026122525.ny6wwtrnfw32367j@platinum>
In-Reply-To: <20191026122525.ny6wwtrnfw32367j@platinum>
Subject: Re: [dpdk-dev] [EXT] Re: [PATCH v11 2/4] eal: add legacy kni option
List-Id: DPDK patches and discussions

Hi Olivier,

> -----Original Message-----
> From: Olivier Matz
> Sent: Saturday, October 26, 2019 5:55 PM
> To: Vamsi Krishna Attunuru
> Cc: Jerin Jacob; Andrew Rybchenko; Ferruh Yigit; thomas@monjalon.net;
> Jerin Jacob Kollanukkaran; Kiran Kumar Kokkilagadda;
> anatoly.burakov@intel.com; stephen@networkplumber.org; dev@dpdk.org
> Subject: Re: [dpdk-dev] [EXT] Re: [PATCH v11 2/4] eal: add legacy kni option
>
> Hi Jerin, Hi Vamsi,
>
> On Fri, Oct 25, 2019 at 09:20:20AM +0000, Vamsi Krishna Attunuru wrote:
> >
> > > -----Original Message-----
> > > From: Jerin Jacob
> > > Sent: Friday, October 25, 2019 1:01 AM
> > > To: Olivier Matz
> > > Cc: Vamsi Krishna Attunuru; Andrew Rybchenko; Ferruh Yigit;
> > > thomas@monjalon.net; Jerin Jacob Kollanukkaran;
> > > Kiran Kumar Kokkilagadda; anatoly.burakov@intel.com;
> > > stephen@networkplumber.org; dev@dpdk.org
> > > Subject: Re: [dpdk-dev] [EXT] Re: [PATCH v11 2/4] eal: add legacy
> > > kni option
> > >
> > > On Thu, Oct 24, 2019 at 11:05 PM Olivier Matz wrote:
> > > >
> > > > Hi,
> > > >
> > > > On Wed, Oct 23, 2019 at 08:32:08PM +0530, Jerin Jacob wrote:
> > > > > On Wed, Oct 23, 2019 at 8:17 PM Olivier Matz wrote:
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > On Wed, Oct 23, 2019 at 03:42:39PM +0530, Jerin Jacob wrote:
> > > > > > > On Tue, Oct 22, 2019 at 7:01 PM Vamsi Krishna Attunuru wrote:
> > > > > > > >
> > > > > > > > Hi Ferruh,
> > > > > > > >
> > > > > > > > Can you please explain the problems in using kni dedicated
> > > > > > > > mbuf alloc routines while enabling kni iova=va mode. Please
> > > > > > > > see the below discussion with Andrew. He wanted to know the
> > > > > > > > problems in having newer APIs.
> > > > > > >
> > > > > > > While waiting for the Ferruh reply, I would like to
> > > > > > > summarise the current status
> > > > > > >
> > > > > > > # In order to make KNI work with IOVA as VA, we need to make
> > > > > > > sure a mempool _object_ does not span across two huge
> > > > > > > pages
> > > > > > >
> > > > > > > # This problem can be fixed by either of:
> > > > > > >
> > > > > > > a) Introduce a flag in mempool to define this constraint, so
> > > > > > > that, only when needed, this constraint is enforced and this is
> > > > > > > in line with existing semantics of addressing such problems
> > > > > > > in mempool
> > > > > > >
> > > > > > > b) Instead of creating a flag, make this behavior the default
> > > > > > > in mempool for IOVA as VA case
> > > > > > >
> > > > > > > Upside:
> > > > > > > b1) There is no need for specific mempool_create for KNI.
> > > > > > >
> > > > > > > Downside:
> > > > > > > b2) Not aligned with existing mempool API semantics
> > > > > > > b3) There will be a trivial amount of memory waste as we can
> > > > > > > not allocate from the edge. Considering the normal huge page
> > > > > > > memory size is 1G or 512MB this is not a real issue.
> > > > > > >
> > > > > > > c) Make IOVA as PA when KNI kernel module is loaded
> > > > > > >
> > > > > > > Upside:
> > > > > > > c1) Doing option (a) would call for new KNI specific mempool
> > > > > > > create API i.e. existing KNI applications need a one-line
> > > > > > > change in application to make it work with release 19.11 or
> > > > > > > later.
> > > > > > >
> > > > > > > Downside:
> > > > > > > c2) Driver which needs RTE_PCI_DRV_NEED_IOVA_AS_VA can not
> > > > > > > work with KNI
> > > > > > > c3) Need root privilege to run KNI as IOVA as PA need root
> > > > > > > privilege
> > > > > > >
> > > > > > > For the next year, we expect applications to work with 19.11
> > > > > > > without any code change.
> > > > > > > My personal opinion is to go with
> > > > > > > option (a) and update the release notes to document the
> > > > > > > change, and it is a simple one-line change.
> > > > > > >
> > > > > > > The selection of (a) vs (b) is between KNI and Mempool
> > > > > > > maintainers.
> > > > > > > Could we please reach a consensus? Or can we discuss this in
> > > > > > > the TB meeting?
> > > > > > >
> > > > > > > We have been going back and forth on this feature for the last
> > > > > > > 3 releases. Now that we have solved all the technical problems,
> > > > > > > please help us to decide (a) vs (b) to make forward progress.
> > > > > >
> > > > > > Thank you for the summary.
> > > > > > What is not clear to me is if (a) or (b) may break an existing
> > > > > > application, and if yes, in which case.
> > > > >
> > > > > Thanks for the reply.
> > > > >
> > > > > To be clear we are talking about out-of-tree KNI applications,
> > > > > which don't want to
> > > > > change rte_pktmbuf_pool_create() to
> > > > > rte_kni_pktmbuf_pool_create() and build for v19.11
> > > > >
> > > > > So in case (b) there is no issue as it will be using
> > > > > rte_pktmbuf_pool_create().
> > > > > But in case of (a) it will create an issue if an out-of-tree KNI
> > > > > application is using rte_pktmbuf_pool_create() which is not
> > > > > using the NEW flag.
> > > >
> > > > Following yesterday's discussion at techboard, I looked at the
> > > > mempool code and at my previous RFC patch. It took some time to
> > > > remember what my worries were.
> > >
> > > Thanks for the review Olivier.
> > >
> > > Just to make sure the correct one is reviewed.
> > >
> > > 1) v7 had similar issue mentioned
> > > https://urldefense.proofpoint.com/v2/url?u=http-3A__patches.dpdk.org_patch_56585_&d=DwIBaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=WllrYaumVkxaWjgKto6E_rtDQshhIhik2jkvzFyRhW8&m=MMwAZe76YMVHe8UcHjL4IBnfX5YvtbocwICAZGBY97A&s=mfN_afnyFm65sQYzaAg_-uM9o22A5j392TdBZY-bKK4&e=
>
> The v7 has the problem I described below: the iova-contiguous allocation
> may fail because the calculated size is too small, and remaining objects
> will be added in another chunk. This can happen if a fully
> iova-contiguous mempool is requested (it happens for octeontx).
>
> Also, the way page size is retrieved in
> rte_mempool_op_populate_default() assumes that memzones are used,
> which is not correct.
>
> > > 2) v11 addressed the review comments and you have given the Acked-by
> > > for mempool change
> > > https://urldefense.proofpoint.com/v2/url?u=http-3A__patches.dpdk.org_patch_61559_&d=DwIBaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=WllrYaumVkxaWjgKto6E_rtDQshhIhik2jkvzFyRhW8&m=MMwAZe76YMVHe8UcHjL4IBnfX5YvtbocwICAZGBY97A&s=frFvKOHFDRhTam6jDZZc6omK2gb1RU62xzAiiBMnf0I&e=
>
> The v11 looked fine to me, because it does not impact any default
> behavior, and the plan was to remove this new function when my mempool
> patchset was in.
>
> > >
> > > My thought process in the TB meeting was, since
> > > rte_mempool_populate_from_pg_sz_chunks() is reviewed, replace
> > > rte_pktmbuf_pool_create's rte_mempool_populate_default() with
> > > rte_mempool_populate_from_pg_sz_chunks()
> > > in IOVA == VA case to avoid a new KNI mempool_create API.
> > >
> > > >
> > > > Currently, in rte_mempool_populate_default(), when the mempool is
> > > > populated, we first try to allocate one iova-contiguous block of
> > > > (n * elt_size). On success, we use this memory to fully populate
> > > > the mempool without taking care of crossing page boundaries.
> > > >
> > > > If we change the behavior to prevent objects from crossing pages,
> > > > the assumption that allocating (n * elt_size) is always enough
> > > > becomes wrong. By luck, there is no real impact, because if the
> > > > mempool is not fully populated after this first iteration, it will
> > > > allocate a new chunk.
> > > >
> > > > To be rigorous, we need to better calculate the amount of memory
> > > > to allocate, according to page size.
> >
> > Hi Olivier,
> >
> > Thanks for the review. I think the problems mentioned below exist with
> > the current mempool_populate_default() api. Will there be a high chance
> > of hitting those problems when we precalculate the memory size (after
> > preventing objs from crossing pg boundaries and fitting the complete
> > mempool memory in a single mem chunk) and the mempool size goes beyond
> > the page size, as in the example below?
>
> Yes, the problem described below (alloc a mempool of 1.1GB resulting in
> 2GB reserved) exists in the current version. It will be fixed in the new
> version of my "mempool: avoid objects allocations across pages"
> patchset.
>
> FYI, a reworked patchset is almost ready, I'll send it Monday.

Thanks a lot Olivier.

> > Regards,
> > Vamsi
> >
> > > >
> > > > Looking at the code, I found another problem in the same area:
> > > > let's say we populate a mempool that requires 1.1GB (and we use 1G
> > > > huge pages):
> > > >
> > > > 1/ mempool code will first try to allocate an iova-contiguous zone
> > > >    of 1.1G -> fail
> > > > 2/ it then tries to allocate a page-aligned non iova-contiguous
> > > >    zone of 1.1G, which is 2G. On success, a lot of memory is wasted.
> > > > 3/ on error, we try to allocate the biggest zone, it can still return
> > > >    a zone between 1.1G and 2G, which can also waste memory.
> > > >
> > > > I will rework my mempool patchset to properly address these
> > > > issues, hopefully tomorrow.
> > >
> > > OK.
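For illustration only, the two size effects discussed above can be
sketched in plain C. This is not the mempool code: mem_size_no_cross()
and round_up_pages() are hypothetical helper names, and the sketch
assumes the memory area starts on a page boundary and that one object
fits in one page.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a DPDK API): bytes needed to store n objects
 * of elt_size bytes when no object may cross a pg_sz boundary.
 * Assumes the area starts page-aligned. Returns 0 on invalid input.
 */
static uint64_t
mem_size_no_cross(uint64_t n, uint64_t elt_size, uint64_t pg_sz)
{
	uint64_t per_page, pages_before_last, objs_in_last;

	if (n == 0 || elt_size == 0 || elt_size > pg_sz)
		return 0;

	per_page = pg_sz / elt_size;           /* whole objects per page */
	pages_before_last = (n - 1) / per_page; /* pages that are fully used */
	objs_in_last = n - pages_before_last * per_page;

	/* full pages (including their unused tail) plus the last objects */
	return pages_before_last * pg_sz + objs_in_last * elt_size;
}

/*
 * Hypothetical helper: round a size up to a multiple of pg_sz
 * (pg_sz must be non-zero). Models the page-aligned reservation
 * in step 2/ of the 1.1GB example.
 */
static uint64_t
round_up_pages(uint64_t size, uint64_t pg_sz)
{
	return (size + pg_sz - 1) / pg_sz * pg_sz;
}
```

With 5 objects of 1000 bytes on 4096-byte pages, mem_size_no_cross()
gives 5096 bytes, more than the naive n * elt_size of 5000, which is why
the first allocation can come up short. round_up_pages() on a 1.1G
request with 1G pages gives 2G, matching the waste described in 2/.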
> > >
> > > >
> > > > Also, I thought about another idea to solve your issue, not sure
> > > > it is better but it would not imply to change the mempool
> > > > behavior. If I understood the problem, when a mbuf is across 2
> > > > pages, the copy of the data can fail in kni because the mbuf is
> > > > not virtually contiguous in the
> > >
> > > For KNI use case, we would need _physically_ contiguous to make sure
> > > that using get_user_pages_remote() we get a physically contiguous
> > > memory map, so that both KNI kernel thread and KNI kernel context
> > > and DPDK userspace can use the same memory in different contexts.
> > >
> > > >
> > > > kernel. So why not in this case splitting the memcpy() into
> > > > several, each of them being on a single page (and calling
> > > > phys2virt() for each page)? The same would have to be done when
> > > > accessing the fields of the mbuf structure if it crosses a page
> > > > boundary. Would that work? This
> > >
> > > If the above is the requirement, does this logic need to be in slow
> > > path or fast path?
>
> In fast path. But I don't think the performance impact would be
> significant.
>
> Vamsi, do you confirm this approach could also solve the issue without
> changing the mempool?

So far there is no performance impact observed when this feature (kni
iova as va) is enabled with this patch set. I am not sure of the impact
once the memcpy is split and extra address translations are introduced.
This approach might solve the issue, but I feel it is not a clearer way
of solving it.

> > > > could be a B plan.
> > >
> > > OK.
> > >
> > > >
> > > > Olivier
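For reference, the per-page memcpy() split suggested above could look
roughly like the user-space sketch below. It is only an assumption of
how such a loop might be shaped, not KNI kernel code: copy_per_page(),
xlate_fn, struct toy_pt and toy_xlate() are hypothetical names, and the
toy page table stands in for the per-page phys2virt() call.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical translation callback: returns a pointer that is valid
 * from addr up to the end of the page containing addr.
 */
typedef void *(*xlate_fn)(uint64_t addr, void *ctx);

/*
 * Copy len bytes starting at source address src into dst, one page at
 * a time, so that each memcpy() stays inside a single page and the
 * address is translated again for every page.
 */
static void
copy_per_page(void *dst, uint64_t src, size_t len, size_t pg_sz,
	      xlate_fn xlate, void *ctx)
{
	uint8_t *out = dst;

	while (len > 0) {
		/* bytes left before the next page boundary */
		size_t in_page = pg_sz - (size_t)(src % pg_sz);
		size_t chunk = len < in_page ? len : in_page;

		memcpy(out, xlate(src, ctx), chunk);
		out += chunk;
		src += chunk;
		len -= chunk;
	}
}

/* Toy page table for demonstration: two pages mapped to unrelated buffers. */
struct toy_pt {
	uint8_t *pages[2];
	size_t pg_sz;
};

/* Toy translation: look up the page, then add the offset within it. */
static void *
toy_xlate(uint64_t addr, void *ctx)
{
	struct toy_pt *pt = ctx;

	return pt->pages[addr / pt->pg_sz] + addr % pt->pg_sz;
}
```

The cost in the fast path is one extra translation per page touched,
which is the overhead questioned in the reply above. The same loop shape
would be needed for accesses to mbuf fields that cross a page boundary.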