Date: Wed, 27 Jul 2022 13:20:22 -0400
Subject: Re: [RFC] EAL: legacy memory fixed address translations
From: Don Wallwork
To: Dmitry Kozlyuk
Cc: "dev@dpdk.org"
References: <256b5409-ddaf-d7cc-00c1-273ca76dbf71@xsightlabs.com> <6aaa04d8-2ac5-ced6-ec25-d42bc52a3e2f@xsightlabs.com> <20220726225910.26159820@sovereign>
In-Reply-To: <20220726225910.26159820@sovereign>
List-Id: DPDK patches and discussions

On 7/26/2022 3:59 PM, Dmitry Kozlyuk wrote:
> Hi Don,
>
> 2022-07-26 14:33 (UTC-0400), Don Wallwork:
>> This proposal describes a method for translating any huge page
>> address from virtual to physical or vice versa using simple
>> addition or subtraction of a single fixed value. This allows
>> devices to efficiently access arbitrary huge page memory, even
>> stack data when worker stacks are in huge pages.
> What is the use case and how much is the benefit?

Several examples where this could help include:

1. A device could return flow lookup results containing the physical
address of a matching entry that needs to be translated to a virtual
address.

2. Hardware can perform offloads on dynamically allocated heap
memory objects and would need PA to avoid requiring IOMMU.

3. It may be useful to prepare data such as descriptors in stack
variables, then pass the PA to hardware which can DMA directly
from stack memory.

4. The CPU instruction set provides memory operations such as
prefetch, atomics, ALU and so on which operate on virtual
addresses with no software requirement to provide physical
addresses. A device may be able to provide a more optimized
implementation of such features, one that avoids the performance
degradation of going through a hardware IOMMU, which would be
required if the device were handed virtual addresses. Having the
ability to offload such operations without requiring data structure
modifications to store an IOVA for every virtual address is
desirable.

All of these cases can run at packet rate and are not operating on
mbuf data. These would all benefit from efficient address translation
in the same way that mbufs already do. Unlike mbuf translation, which
only covers VA to PA, this translation can perform both VA to PA and
PA to VA with equal efficiency.

>
> When drivers need to process a large number of memory blocks,
> these are typically packets in the form of mbufs,
> which already have IOVA attached, so there is no translation.
> Does translation of mbuf VA to PA with the proposed method
> show significant improvement over reading mbuf->iova?

This proposal does not relate to mbufs. As you say, there is
already an efficient VA to PA mechanism in place for those.

>
> When drivers need to process a few IOVA-contiguous memory blocks,
> they can calculate VA-to-PA offsets in advance,
> amortizing translation cost.
> Hugepage stack falls within this category.

As the examples listed above hopefully show, there are cases where it
is not practical or desirable to precalculate the offsets.

>
>> When legacy memory mode is used, it is possible to map a single
>> virtual memory region large enough to cover all huge pages. During
>> legacy hugepage init, each hugepage is mapped into that region.
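
A minimal sketch of the single-region mapping described in the quoted
paragraph above, not the actual EAL code. Everything named here
(struct page_desc, map_fixed_window) is hypothetical. Anonymous memory
stands in for hugepage-backed mappings so the sketch runs without
hugepages configured; a real implementation would map hugetlbfs-backed
pages and obtain each PA from the kernel. Gaps in the window are left
PROT_NONE rather than unmapped, which is a substitution for the
unmapping the proposal describes; an access to a gap still faults.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical descriptor for one page; only its physical address matters here. */
struct page_desc {
    uint64_t pa;
};

/*
 * Reserve one VA window covering [pa_min, pa_max + page_sz) and place every
 * page at window + (pa - pa_min), so a single fixed value converts between
 * VA and PA for all pages. Returns the window base, or NULL on failure.
 */
static void *map_fixed_window(const struct page_desc *pages, size_t n,
                              size_t page_sz, uint64_t pa_min, uint64_t pa_max)
{
    assert(page_sz > 0 && pa_max >= pa_min);

    size_t span = (size_t)(pa_max - pa_min) + page_sz;

    /* Reserve address space only: PROT_NONE, nothing accessible yet. */
    uint8_t *win = mmap(NULL, span, PROT_NONE,
                        MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (win == MAP_FAILED)
        return NULL;

    for (size_t i = 0; i < n; i++) {
        if (pages[i].pa < pa_min || pages[i].pa > pa_max) {
            munmap(win, span);
            return NULL;
        }
        /* MAP_FIXED replaces the reserved range at exactly this address. */
        void *want = win + (pages[i].pa - pa_min);
        void *got = mmap(want, page_sz, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
        if (got != want) {
            munmap(win, span);
            return NULL;
        }
    }
    return win;
}
```

With this layout the fixed translation value is simply the window base
minus pa_min. The caller is assumed to pass page-aligned addresses and
a page_sz that is a multiple of the system page size.
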
> Legacy mode is called "legacy" with an intent to be deprecated :)

Understood. For our initial implementation, we were okay with that
limitation given that supporting it in legacy mode was simpler.

> There is initial allocation (-m) and --socket-limit in dynamic mode.
> When initial allocation is equal to the socket limit,
> it should be the same behavior as in legacy mode:
> the number of hugepages mapped is constant and cannot grow,
> so the feature seems applicable as well.

It seems feasible to implement this feature in non-legacy mode as
well. The approach would be similar: reserve a region of virtual
address space large enough to cover all huge pages before they are
allocated. As huge pages are allocated, they are mapped into the
appropriate location within that virtual address space.

>
>> Once all pages have been mapped, any unused holes in that memory
>> region are unmapped.
> Who tracks these holes and prevents translation from their VA?

Since the holes are unmapped, references to locations in unused
regions will result in segmentation faults.

> Why the holes appear?

Memory layout for different NUMA nodes may cause holes. Also, there
is no guarantee that all huge pages are physically contiguous.

>
>> This feature is applicable when rte_eal_iova_mode() == RTE_IOVA_PA
> One can say it always works for RTE_IOVA_VA with VA-to-PA offset of 0.

This is true, but it requires the use of a hardware IOMMU, which
degrades performance.

>
>> and could be enabled either by default when the legacy memory EAL
>> option is given, or a new EAL option could be added to specifically
>> enable this feature.
>>
>> It may be desirable to set a capability bit when this feature is
>> enabled to allow drivers to behave differently depending on the
>> state of that flag.
> The feature requires, in IOVA-as-PA mode:
> 1) that hugepage mapping is static (legacy mode or "-m" == "--socket-limit");
> 2) that EAL has succeeded to map all hugepages in one PA-continuous block.

It does not require huge pages to be physically contiguous.
Theoretically, mapping a giant VA region could fail, but
we have not seen this in practice even when running on x86_64 servers
with multiple NUMA nodes, many cores and huge pages that span TBs of
physical address space.

> As userspace code, DPDK cannot guarantee 2).
> Because this mode breaks nothing and just makes translation more efficient,
> DPDK can always try to implement it and then report whether it has succeeded.
> Applications and drivers can decide what to do by querying this API.

Yes, providing an API to check this capability would definitely work.

Thanks for all the good feedback.

-Don
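
For illustration only, the capability check and the fixed-value
translation discussed in this thread could look roughly like the sketch
below. None of these names (struct fixed_xlate, fixed_xlate_supported,
fixed_va2pa, fixed_pa2va) exist in DPDK; they are assumptions made for
the example, and the sketch assumes a 64-bit target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state that EAL would fill in once at init time. */
struct fixed_xlate {
    bool     enabled;      /* true only if the single-region mapping succeeded */
    uint64_t va_minus_pa;  /* fixed value; arithmetic wraps modulo 2^64 */
};

/* Capability query: drivers and applications check this before relying on it. */
static inline bool fixed_xlate_supported(const struct fixed_xlate *fx)
{
    return fx->enabled;
}

/* VA to PA: one subtraction, no lookup. */
static inline uint64_t fixed_va2pa(const struct fixed_xlate *fx, const void *va)
{
    assert(fx->enabled);
    return (uint64_t)(uintptr_t)va - fx->va_minus_pa;
}

/* PA to VA: one addition, same cost as the other direction. */
static inline void *fixed_pa2va(const struct fixed_xlate *fx, uint64_t pa)
{
    assert(fx->enabled);
    return (void *)(uintptr_t)(pa + fx->va_minus_pa);
}
```

A caller would test fixed_xlate_supported() once and fall back to its
existing translation path when it returns false. The helpers do no
range checking, so an address that falls in a hole simply translates
to another address in that hole.
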