From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 4E5114548D; Tue, 18 Jun 2024 14:33:22 +0200 (CEST) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 0DB2040DF6; Tue, 18 Jun 2024 14:33:22 +0200 (CEST) Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by mails.dpdk.org (Postfix) with ESMTP id 6A2C9402B4 for ; Tue, 18 Jun 2024 14:33:20 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1718714000; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=WH7TOem4qtKS0gidZmC5q3qky9HjQjQQjDyLXZSxJkM=; b=NcutuwGjzliHyiHAD/2e62OfXOz4ZskoZ/HdeLZsJM7ZtZTS/qNGF4ZPtWm9KteUXAufCB tb7lHnsrl7QSJGBxKsK83K3FtFj3335O2MSPZTqQoi+EIt78iCjHoELlSRj4e8rxmzQpA3 WCptCXWj2d3jn1/zsF/ujEiFi6WEkWE= Received: from mail-lf1-f72.google.com (mail-lf1-f72.google.com [209.85.167.72]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-361-56LlvHvSOfuVGHHyq-7wcQ-1; Tue, 18 Jun 2024 08:33:18 -0400 X-MC-Unique: 56LlvHvSOfuVGHHyq-7wcQ-1 Received: by mail-lf1-f72.google.com with SMTP id 2adb3069b0e04-52c97141df8so1649541e87.3 for ; Tue, 18 Jun 2024 05:33:18 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1718713997; x=1719318797; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=WH7TOem4qtKS0gidZmC5q3qky9HjQjQQjDyLXZSxJkM=; b=REUj7ghh/COVMTJQ7keLBG3GwwxXJJFT1xjtOTTma+bjCBTOWsy6Yo5FD2SV8aTxUO 8gF4dCDW0muDOquN7MzK3bh3dDCHWAvDUzRHiE3hi1n8KahEST3fDXVVnBgry7FRee9q FFGV+Z0cwtM0Lp7Qdzy0PCvsvTKwUTGhX5Fk2KkLhnWhCrNKeeUrhMhIWlMQVZLuLRWN /omtU9qmszJgAYrjNraeNKyEGnYocCXHLjaX3DI3zWOMq341otCUeJZOECXEukMgQDvi u5v3b3SyRRVZPdbDGUUTZyeEjLFd8uevdgK1t9i4B2Rx4/XDUd52Rlyyj7Vh+qRkODqs Jh2Q== X-Gm-Message-State: AOJu0YxCBSHUOnq0aY3P0pZi/EBSBsR2OjBxp1kwAX4dYhgvX65fyLdT akxRhcfoBo/8/TO7C0EHKH/suHBj/GBKjhD0t+3d3K/QQvVN/n8nGoJIyLr08GfjYrtWBD5VjSM 4vNZwLUNduKbNrKKak+RBoUy+NEwOZFcPuV03VWoPrxn9A8x7rs5SE0CqwWHLUTC3U5xqlONyJC N1UYaZk77QKfd/i8c= X-Received: by 2002:a19:e047:0:b0:52c:884c:3d82 with SMTP id 2adb3069b0e04-52ca6e9048fmr7567520e87.46.1718713997114; Tue, 18 Jun 2024 05:33:17 -0700 (PDT) X-Google-Smtp-Source: AGHT+IG0DVrOstLkwSP8jSRk6zBglo25D+pzRy195CSv/FCLluCEscY0Mm/P+m85GE/Mvyd7hQQdc6KpnwOuw1TZaDs= X-Received: by 2002:a19:e047:0:b0:52c:884c:3d82 with SMTP id 2adb3069b0e04-52ca6e9048fmr7567496e87.46.1718713996708; Tue, 18 Jun 2024 05:33:16 -0700 (PDT) MIME-Version: 1.0 References: <20240325100500.694748-2-rjarry@redhat.com> <20240327091440.1166119-2-rjarry@redhat.com> In-Reply-To: <20240327091440.1166119-2-rjarry@redhat.com> From: David Marchand Date: Tue, 18 Jun 2024 14:33:05 +0200 Message-ID: Subject: Re: [PATCH v5] graph: expose node context as pointers To: Robin Jarry Cc: dev@dpdk.org, Jerin Jacob , Kiran Kumar K , Nithin Dabilpuram , Zhirun Yan , Tyler Retzlaff X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Re Robin, On Wed, Mar 27, 2024 at 10:17=E2=80=AFAM Robin Jarry wr= ote: > > In some cases, the node context data is used to store two pointers > because the data is larger than the reserved 16 bytes. Having to define > intermediate structures just to be able to cast is tedious. And without > intermediate structures, casting to opaque pointers is hard without > violating strict aliasing rules. > > Add an unnamed union to allow storing opaque pointers in the node > context. Unfortunately, aligning an unnamed union that contains an array > produces inconsistent results between C and C++. To preserve ABI/API > compatibility in both C and C++, move all fast-path area fields into an > unnamed struct which is cache aligned. Use __rte_cache_min_aligned to > preserve existing alignment on architectures where cache lines are 128 > bytes. > > Add a static assert to ensure that the unnamed union is not larger than > the context array (RTE_NODE_CTX_SZ). > > Signed-off-by: Robin Jarry > --- > > Notes: > v5: > > * Helper functions to hide casting proved to be harder than expected. > Naive casting may even be impossible without breaking strict aliasi= ng > rules. The only other option would be to use explicit memcpy calls. > * Unnamed union tentative again. As suggested by Tyler (thank you!), > using an intermediate unnamed struct to carry the alignment produce= s > consistent ABI in C and C++. > * Also, Tyler (thank you!) suggested that the fast path area alignmen= t > size may be incorrect for architectures where the cache line is not= 64 > bytes. There will be a 64 bytes hole in the structure at the end of > the unnamed struct before the zero length next nodes array. Use > __rte_cache_min_aligned to preserve existing alignment. - There is still an issue with that approach on 128 bytes cache line arches, like ARM. This results in a ABI breakage: Functions changes summary: 0 Removed, 1 Changed (9 filtered out), 0 Added functions Variables changes summary: 0 Removed, 0 Changed, 0 Added variable 1 function with some indirect sub-type change: [C] 'function bool __rte_graph_mcore_dispatch_sched_node_enqueue(rte_node*, rte_graph_rq_head*)' at rte_graph_model_mcore_dispatch.c:117:1 has some indirect sub-type changes: parameter 1 of type 'rte_node*' has sub-type changes: in pointed to type 'struct rte_node' at rte_graph_worker_common.h:92:= 1: type size changed from 3072 to 2048 (in bits) 7 data member deletions: 'uint8_t ctx[16]', at offset 2048 (in bits) at rte_graph_worker_common.h:115:1 'uint16_t size', at offset 2176 (in bits) at rte_graph_worker_common.h:116:1 'uint16_t idx', at offset 2192 (in bits) at rte_graph_worker_common.h:117:1 'rte_graph_off_t off', at offset 2208 (in bits) at rte_graph_worker_common.h:118:1 'uint64_t total_cycles', at offset 2240 (in bits) at rte_graph_worker_common.h:119:1 'uint64_t total_calls', at offset 2304 (in bits) at rte_graph_worker_common.h:120:1 'uint64_t total_objs', at offset 2368 (in bits) at rte_graph_worker_common.h:121:1 1 data member insertion: 'struct {union {uint8_t ctx[16]; struct {void* ctx_ptr; void* ctx_ptr2;};}; uint16_t size; uint16_t idx; rte_graph_off_t off; uint64_t total_cycles; uint64_t total_calls; uint64_t total_objs; union {void** objs; uint64_t objs_u64;}; union {rte_node_process_t process; uint64_t process_u64;};}', at offset 1536 (in bits) 1 data member changes (1 filtered): 'rte_node* nodes[]' offset changed from 2560 to 2048 (in bits) (by -512 bits) Before the patch, the rte_node object layout was: struct rte_node { ... /* XXX 64 bytes hole, try to pack */ /* --- cacheline 4 boundary (256 bytes) --- */ uint8_t ctx[16] __attribute__((__aligned__(128))); /* 256 16 */ uint16_t size; /* 272 2 */ uint16_t idx; /* 274 2 */ rte_graph_off_t off; /* 276 4 */ uint64_t total_cycles; /* 280 8 */ uint64_t total_calls; /* 288 8 */ uint64_t total_objs; /* 296 8 */ union { void * * objs; /* 304 8 */ uint64_t objs_u64; /* 304 8 */ }; /* 304 8 */ union { rte_node_process_t process; /* 312 8 */ uint64_t process_u64; /* 312 8 */ }; /* 312 8 */ /* --- cacheline 5 boundary (320 bytes) --- */ struct rte_node * nodes[] __attribute__((__aligned__(64))); /* 320 0 */ /* size: 384, cachelines: 6, members: 20 */ /* sum members: 250, holes: 3, sum holes: 70 */ /* padding: 64 */ /* forced alignments: 2, forced holes: 1, sum forced holes: 64 */ } __attribute__((__aligned__(128))); After this patch: struct rte_node { ... /* --- cacheline 3 boundary (192 bytes) --- */ struct { union { uint8_t ctx[16]; /* 192 16 */ struct { void * ctx_ptr; /* 192 8 */ void * ctx_ptr2; /* 200 8 */ }; /* 192 16 */ }; /* 192 16 */ uint16_t size; /* 208 2 */ uint16_t idx; /* 210 2 */ rte_graph_off_t off; /* 212 4 */ uint64_t total_cycles; /* 216 8 */ uint64_t total_calls; /* 224 8 */ uint64_t total_objs; /* 232 8 */ union { void * * objs; /* 240 8 */ uint64_t objs_u64; /* 240 8 */ }; /* 240 8 */ union { rte_node_process_t process; /* 248 8 */ uint64_t process_u64; /* 248 8 */ }; /* 248 8 */ } __attribute__((__aligned__(64))) __attribute__((__aligned__(64))); /* 192 64 */ /* --- cacheline 4 boundary (256 bytes) --- */ struct rte_node * nodes[] __attribute__((__aligned__(64))); /* 256 0 */ /* size: 256, cachelines: 4, members: 12 */ /* sum members: 250, holes: 2, sum holes: 6 */ /* forced alignments: 2 */ } __attribute__((__aligned__(128))); The introduced anonymous structure gets aligned on the minimum cache line size (64 bytes): with this change, ctx[] move from offset 256, to offset 192. Similarly, nodes[] moves from offset 320 to offset 256. As we discussed offlist, there are a few options to workaround this issue (like moving nodes[] inside the anonymous struct though it still results in an increased rte_node struct, or like adding an explicit padding field right before the newly introduced anonymous struct, ...). - Additionally, anonymous structures are not correctly handled with libabigail 2.4 which is the version used in the CI. At the moment, the ABI check in GHA and UNH will fail on x86 with: 1 function with some indirect sub-type change: [C] 'function bool __rte_graph_mcore_dispatch_sched_node_enqueue(rte_node*, rte_graph_rq_head*)' at rte_graph_model_mcore_dispatch.c:117:1 has some indirect sub-type changes: parameter 1 of type 'rte_node*' has sub-type changes: in pointed to type 'struct rte_node' at rte_graph_worker_common.h:92:= 1: type size hasn't changed 2 data member deletions: 'union {void** objs; uint64_t objs_u64;}', at offset 1920 (in bit= s) 'union {rte_node_process_t process; uint64_t process_u64;}', at offset 1984 (in bits) no data member changes (2 filtered); On this topic, we have to either put a suppression rule on the rte_node structure, or bump the libabigail version in UNH, GHA, and the maintainers build env (though the latter won't happen overnight, and we are really close to rc1). For those two reasons, it is better to revisit this patch and have it ready for the next release. While at it, it may be worth cleaning up the rte_node structure in v24.11, if so, please announce in a deprecation notice for this planned ABI breakage. --=20 David Marchand