From mboxrd@z Thu Jan  1 00:00:00 1970
From: Yongseok Koh <yskoh@mellanox.com>
To: "Ananyev, Konstantin"
Cc: Olivier Matz, "Lu, Wenzhuo", "Wu, Jingjing", Adrien Mazarguil,
	Nélio Laranjeiro, dev@dpdk.org
Date: Wed, 11 Apr 2018 10:08:11 -0700
Subject: Re: [dpdk-dev] [PATCH v2 1/6] mbuf: add buffer offset field for
	flexible indirection
Message-ID: <20180411170810.GA27791@yongseok-MBP.local>
References: <20180310012532.15809-1-yskoh@mellanox.com>
	<20180402185008.13073-1-yskoh@mellanox.com>
	<20180402185008.13073-2-yskoh@mellanox.com>
	<20180403082615.etnr33cuyey7i3u3@platinum>
	<20180404001205.GB1867@yongseok-MBP.local>
	<20180409160434.kmw4iyztemrkzmtc@platinum>
	<20180410015902.GA20627@yongseok-MBP.local>
	<2601191342CEEE43887BDE71AB977258AE91344A@IRSMSX102.ger.corp.intel.com>
	<20180411053302.GA26252@yongseok-MBP.local>
	<2601191342CEEE43887BDE71AB977258AE913944@IRSMSX102.ger.corp.intel.com>
In-Reply-To: <2601191342CEEE43887BDE71AB977258AE913944@IRSMSX102.ger.corp.intel.com>
MIME-Version: 1.0
Content-Type: text/plain
User-Agent: Mutt/1.9.3 (2018-01-21)
List-Id: DPDK patches and discussions <dev.dpdk.org>

On Wed, Apr 11, 2018 at 11:39:47AM +0000, Ananyev, Konstantin wrote:
> 
> Hi Yongseok,
> 
> > > > 
> > > > On Mon, Apr 09, 2018 at 06:04:34PM +0200, Olivier Matz wrote:
> > > > > Hi Yongseok,
> > > > > 
> > > > > On Tue, Apr 03, 2018 at 05:12:06PM -0700, Yongseok Koh wrote:
> > > > > > On Tue, Apr 03, 2018 at 10:26:15AM +0200, Olivier Matz wrote:
> > > > > > > Hi,
> > > > > > > 
> > > > > > > On Mon, Apr 02, 2018 at 11:50:03AM -0700, Yongseok Koh wrote:
> > > > > > > > When attaching a mbuf, an indirect mbuf has to point to the
> > > > > > > > start of the direct mbuf's buffer. By adding a buf_off field
> > > > > > > > to rte_mbuf, this becomes more flexible. An indirect mbuf can
> > > > > > > > point to any part of the direct mbuf by calling
> > > > > > > > rte_pktmbuf_attach_at().
> > > > > > > > 
> > > > > > > > Possible use-cases could be:
> > > > > > > > - If a packet has multiple layers of encapsulation, multiple
> > > > > > > >   indirect buffers can reference different layers of the
> > > > > > > >   encapsulated packet.
> > > > > > > > - A large direct mbuf can even contain multiple packets in
> > > > > > > >   series and each packet can be referenced by multiple mbuf
> > > > > > > >   indirections.
> > > > > > > > 
> > > > > > > > Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
> > > > > > > 
> > > > > > > I think the current API is already able to do what you want.
> > > > > > > 
> > > > > > > 1/ Here is a mbuf m with its data
> > > > > > > 
> > > > > > >            off
> > > > > > >            <-->
> > > > > > >                        len
> > > > > > >     +----+   <---------->
> > > > > > >     |    |
> > > > > > >   +-|----v----------------------+
> > > > > > >   | |    -----------------------|
> > > > > > > m | buf  |    XXXXXXXXXXX      ||
> > > > > > >   |      -----------------------|
> > > > > > >   +-----------------------------+
> > > > > > > 
> > > > > > > 
> > > > > > > 2/ clone m:
> > > > > > > 
> > > > > > >   c = rte_pktmbuf_alloc(pool);
> > > > > > >   rte_pktmbuf_attach(c, m);
> > > > > > > 
> > > > > > >   Note that c has its own offset and length fields.
> > > > > > > 
> > > > > > > 
> > > > > > >            off
> > > > > > >            <-->
> > > > > > >                        len
> > > > > > >     +----+   <---------->
> > > > > > >     |    |
> > > > > > >   +-|----v----------------------+
> > > > > > >   | |    -----------------------|
> > > > > > > m | buf  |    XXXXXXXXXXX      ||
> > > > > > >   |      -----------------------|
> > > > > > >   +------^----------------------+
> > > > > > >          |
> > > > > > >     +----+
> > > > > > > indirect |
> > > > > > >   +-|---------------------------+
> > > > > > >   | |    -----------------------|
> > > > > > > c | buf  |                     ||
> > > > > > >   |      -----------------------|
> > > > > > >   +-----------------------------+
> > > > > > > 
> > > > > > >      off     len
> > > > > > >      <--><---------->
> > > > > > > 
> > > > > > > 
> > > > > > > 3/ remove some data from c without changing m
> > > > > > > 
> > > > > > >   rte_pktmbuf_adj(c, 10)   // at head
> > > > > > >   rte_pktmbuf_trim(c, 10)  // at tail
> > > > > > > 
> > > > > > > 
> > > > > > > Please let me know if it fits your needs.
> > > > > > 
> > > > > > No, it doesn't.
> > > > > > 
> > > > > > Trimming head and tail with the current APIs removes data and
> > > > > > makes the space available. Adjusting the packet head means giving
> > > > > > more headroom, not shifting the buffer itself. If m has two
> > > > > > indirect mbufs (c1 and c2) and those are pointing to different
> > > > > > offsets in m,
> > > > > > 
> > > > > >   rte_pktmbuf_adj(c1, 10);
> > > > > >   rte_pktmbuf_adj(c2, 20);
> > > > > > 
> > > > > > then the owner of c2 regards the first (off+20)B as available
> > > > > > headroom. If it wants to attach an outer header, it will overwrite
> > > > > > the headroom even though the owner of c1 is still accessing it.
> > > > > > Instead, another mbuf (h1) for the outer header should be linked
> > > > > > by h1->next = c2.
> > > > > 
> > > > > Yes, after these operations c1, c2 and m should become read-only.
> > > > > So, to prepend headers, another mbuf has to be inserted before, as
> > > > > you suggest. It is possible to wrap this in a function
> > > > > rte_pktmbuf_clone_area(m, offset, length) that will:
> > > > > - alloc and attach an indirect mbuf for each segment of m that is
> > > > >   in the range [offset : length+offset].
> > > > > - prepend an empty and writable mbuf for the headers
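To illustrate the suggestion, here is a rough single-segment sketch of what
such a helper could look like. rte_pktmbuf_clone_area() is only a name
proposed in this thread, not an existing DPDK API, and error handling is
kept minimal:

	#include <rte_mbuf.h>

	/* Sketch only: clone the area [offset, offset + length) of a
	 * single-segment mbuf m and prepend an empty, writable mbuf for
	 * headers. m's refcnt is incremented by the attach. */
	static struct rte_mbuf *
	pktmbuf_clone_area(struct rte_mbuf *m, struct rte_mempool *pool,
			   uint16_t offset, uint16_t length)
	{
		struct rte_mbuf *hdr, *c;

		if ((uint32_t)offset + length > rte_pktmbuf_data_len(m))
			return NULL;
		c = rte_pktmbuf_alloc(pool);
		if (c == NULL)
			return NULL;
		rte_pktmbuf_attach(c, m);	/* c shares m's buffer */
		/* Narrow c down to the requested area. */
		rte_pktmbuf_adj(c, offset);
		rte_pktmbuf_trim(c, rte_pktmbuf_data_len(c) - length);
		/* Writable head for headers to be prepended later. */
		hdr = rte_pktmbuf_alloc(pool);
		if (hdr == NULL || rte_pktmbuf_chain(hdr, c) < 0) {
			rte_pktmbuf_free(hdr);
			rte_pktmbuf_free(c);
			return NULL;
		}
		return hdr;
	}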
> > > > > 
> > > > > > If c1 and c2 are attached with shifting the buffer address by
> > > > > > adjusting buf_off, which actually shrinks the headroom, this case
> > > > > > can be properly handled.
> > > > > 
> > > > > What do you mean by properly handled?
> > > > > 
> > > > > Yes, prepending data or adding data in the indirect mbuf won't
> > > > > overwrite the direct mbuf. But prepending data or adding data in
> > > > > the direct mbuf m won't be protected.
> > > > > 
> > > > > From an application point of view, indirect mbufs, or direct mbufs
> > > > > that have refcnt != 1, should both be considered read-only because
> > > > > they may share their data. How can an application know if the data
> > > > > is shared or not?
> > > > > 
> > > > > Maybe we need a flag to differentiate mbufs that are read-only
> > > > > (something like SHARED_DATA, or simply READONLY). In your case, if
> > > > > my understanding is correct, you want to have indirect mbufs with
> > > > > RW data.
> > > > 
> > > > Agree that an indirect mbuf must be treated as read-only. Then the
> > > > current code is enough to handle that use-case.
> > > > 
> > > > > > And another use-case (this is my actual use-case) is to make a
> > > > > > large mbuf have multiple packets in series. AFAIK, this will also
> > > > > > be helpful for some FPGA NICs, because they transfer multiple
> > > > > > packets to a single large buffer to reduce PCIe overhead for
> > > > > > small packet traffic, like the Multi-Packet Rx of mlx5 does.
> > > > > > Otherwise, packets should be memcpy'd to regular mbufs one by one
> > > > > > instead of indirect referencing.
> > > 
> > > But just to make HW RX multiple packets into one mbuf,
> > > data_off inside the indirect mbuf should be enough, correct?
> > 
> > Right. The current max buffer len of a mbuf is 64kB (16 bits), but it
> > is enough for mlx5 to reach 100Gbps with 64B traffic (149Mpps). I made
> > mlx5 HW put 16 packets in a buffer, so it needs a ~32kB buffer. Having
> > more bits in the length fields would be better, but 16-bit is good
> > enough to overcome the PCIe Gen3 bottleneck in order to saturate the
> > network link.
> 
> There were a few complaints that the 64KB max is a limitation for some
> use-cases. I am not against increasing it, but I don't think we have
> free space on the first cache-line for that without another big rework
> of the mbuf layout, considering that we would need to increase the size
> of buf_len, data_off, data_len, and probably priv_size too.
> 
> > > As I understand, what you'd like to achieve with this new field is
> > > the ability to manipulate packet boundaries after RX, probably at an
> > > upper layer. As Olivier pointed out above, that doesn't sound like a
> > > safe approach, as you have multiple indirect mbufs trying to modify
> > > the same direct buffer.
> > 
> > I agree that there's an implication that an indirect mbuf, or a mbuf
> > having refcnt > 1, is read-only. What that means is that all the
> > entities which own such mbufs have to be aware of that and keep the
> > principle, as DPDK can't enforce the rule and there can't be such a
> > sanity check. In this sense, HW doesn't violate it because the direct
> > mbuf is injected to HW before indirection. When packets are written by
> > HW, the PMD attaches indirect mbufs to the direct mbuf and delivers
> > those to the application layer, freeing the original direct mbuf
> > (decrementing its refcnt by 1). So, HW doesn't touch the direct buffer
> > once it reaches the upper layer.
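For illustration, a simplified sketch of that Rx flow (not the actual mlx5
PMD code). It assumes the rte_pktmbuf_attach_at(mi, m, buf_off) form
proposed by this patch and that the PMD already knows each packet's length:

	/* 'big' holds n packets written back-to-back by HW. Hand out one
	 * indirect mbuf per packet, then drop the PMD's own reference to
	 * 'big'. Returns the number of packets delivered. */
	static uint16_t
	mprx_to_indirect(struct rte_mbuf *big, struct rte_mempool *mp,
			 const uint16_t *lens, uint16_t n,
			 struct rte_mbuf **rx_pkts)
	{
		uint16_t i, off = 0;

		for (i = 0; i < n; i++) {
			struct rte_mbuf *mi = rte_pktmbuf_alloc(mp);

			if (mi == NULL)
				break;
			/* Proposed API: mi's buffer starts at big's buffer
			 * plus off; each attach increments big's refcnt. */
			rte_pktmbuf_attach_at(mi, big, off);
			mi->data_len = lens[i];
			mi->pkt_len = lens[i];
			rx_pkts[i] = mi;
			off += lens[i];
		}
		/* The buffer is reused only after all indirect mbufs are
		 * freed. */
		rte_pktmbuf_free(big);
		return i;
	}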
> 
> Yes, I understand that. But as I can see, you introduced functions to
> adjust head and tail, which implies that it should be possible for some
> entity (upper layer?) to manipulate these indirect mbufs.
> And we don't know how exactly it will be done.

That's a valid concern. I can make it private by merging it into the
_attach_to() func, or I can just add a comment in the API doc. However, if
users are aware that a mbuf is read-only and we expect them to keep it
intact by their own judgement, they would/should not use those APIs. We
can't stop them from modifying the content or the buffer itself anyway.
Will add more comments from this discussion regarding read-only mode.

> > The direct buffer will be freed and become available for reuse when
> > all the attached indirect mbufs are freed.
> > 
> > > Though if you really need to do that, why can't it be achieved by
> > > updating the buf_len and priv_size fields for indirect mbufs,
> > > straight after attach()?
> > 
> > Good point.
> > Actually that was my draft (Mellanox internal) version of this patch
> > :-) But I had to consider a case where priv_size is really given by
> > the user. Even though it is less likely, if the original priv_size is
> > quite big, it can't cover the entire buf_len. For this, I would have
> > had to increase priv_size to 32-bit, but adding another 16-bit field
> > (buf_off) looked more plausible.
> 
> As I remember, we can't have mbufs bigger than 64K,
> so priv_size + buf_len should always be less than 64K, correct?

Can you let me know where I can find the constraint? I checked
rte_pktmbuf_pool_create() and rte_pktmbuf_init() again to not make any
mistake, but there's no such limitation.

	elt_size = sizeof(struct rte_mbuf) + (unsigned)priv_size +
		   (unsigned)data_room_size;

The max of data_room_size is 64kB, and so is priv_size. m->buf_addr starts
from 'm + sizeof(*m) + priv_size' and m->buf_len can't be larger than
UINT16_MAX. So, priv_size couldn't be used for this purpose.
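To put hypothetical numbers on it (values are made up, not from the patch):
if an application already asks for a large private area, folding the offset
into priv_size at attach() time can overflow the 16-bit field.

	#include <stdint.h>

	/* Returns 1 if a buffer offset could still be folded into the
	 * 16-bit priv_size field, 0 if it would overflow. E.g. with
	 * priv_size = 60 * 1024 and buf_off = 8 * 1024, the sum is 69632,
	 * which exceeds UINT16_MAX (65535). */
	static int
	can_fold_offset_into_priv_size(uint16_t priv_size, uint16_t buf_off)
	{
		return (uint32_t)priv_size + buf_off <= UINT16_MAX;
	}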
Yongseok

> > > > > > Does this make sense?
> > > > > 
> > > > > I understand the need.
> > > > > 
> > > > > Another option would be to make the mbuf->buffer point to an
> > > > > external buffer (not inside the direct mbuf). This would require
> > > > > adding a mbuf->free_cb. See "Mbuf with external data buffer"
> > > > > (page 19) in [1] for a quick overview.
> > > > > 
> > > > > [1] https://dpdksummit.com/Archive/pdf/2016Userspace/Day01-Session05-OlivierMatz-Userspace2016.pdf
> > > > > 
> > > > > The advantage is that it does not require the large data to be
> > > > > inside a mbuf (requiring a mbuf structure before the buffer, and
> > > > > requiring it to be allocated from a mempool). On the other hand,
> > > > > it is maybe more complex to implement compared to your solution.
> > > > 
> > > > I knew that you presented the slides and, frankly, I had considered
> > > > that option at first. But even with that option, metadata to store
> > > > the refcnt would also have to be allocated and managed anyway. The
> > > > kernel also maintains the skb_shared_info at the end of the data
> > > > segment. Even though it could have a smaller metadata structure, I
> > > > just wanted to make full use of the existing framework because it is
> > > > less complex, as you mentioned. Given that you presented the idea of
> > > > an external data buffer in 2016 and there haven't been many
> > > > follow-up discussions/activities so far, I thought the demand isn't
> > > > so big yet, thus I wanted to make this patch simpler. I personally
> > > > think that we can take up the idea of an external data segment when
> > > > more demand comes from users in the future, as it would be a huge
> > > > change and may break the current ABI/API. When the day comes, I'll
> > > > gladly participate in the discussions and write code for it if I can
> > > > be helpful.
> > > > 
> > > > Do you think this patch is okay for now?
> > > > 
> > > > 
> > > > Thanks for your comments,
> > > > Yongseok
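(For reference, a minimal sketch of the external-buffer idea from [1]. The
struct and helper below are hypothetical, since no such API existed in DPDK
at the time of this thread; the shared refcnt/free_cb metadata is exactly
what is referred to above as needing to be allocated and managed:)

	#include <rte_mbuf.h>
	#include <rte_atomic.h>

	/* Hypothetical shared info kept alongside an external buffer. */
	struct ext_buf_shinfo {
		void (*free_cb)(void *buf, void *opaque); /* return buffer */
		void *opaque;
		rte_atomic16_t refcnt;  /* shared by all attached mbufs */
	};

	static void
	mbuf_set_ext_buf(struct rte_mbuf *m, void *buf, uint16_t buf_len,
			 struct ext_buf_shinfo *shinfo)
	{
		m->buf_addr = buf;      /* data lives outside the mbuf */
		m->buf_len = buf_len;
		m->data_off = 0;
		rte_atomic16_add(&shinfo->refcnt, 1);
		/* On free, the library would decrement shinfo->refcnt and
		 * call free_cb() when it drops to zero, instead of handing
		 * buf_addr back to a mempool. */
	}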