From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from EUR03-VE1-obe.outbound.protection.outlook.com (mail-eopbgr50052.outbound.protection.outlook.com [40.107.5.52]) by dpdk.org (Postfix) with ESMTP id 4E5708BA7 for ; Wed, 25 Apr 2018 20:30:14 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Mellanox.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version; bh=LKH1imz20HPWkdt4IL/8rxxcXobFOn05l6ZffK+Oo6s=; b=qnuXL8N67c/4aqEBswrKsjpW0zK+awawxC8SX6OaiXrod1irajFYCyL+ozKUJ40524wl7GzOWPK8YDL0Hf21HGYoVUMnZ9iJFfuzmjGCaU4p/+u6EixlNmxfFzkuKMwDM4iu1T4ugvYhEwQlQA7ZC83/hf2xxJc0rmq/48VlBKk= Received: from VI1PR0501MB2045.eurprd05.prod.outlook.com (10.167.195.147) by VI1PR0501MB2173.eurprd05.prod.outlook.com (10.169.134.26) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384_P256) id 15.20.696.13; Wed, 25 Apr 2018 18:30:11 +0000 Received: from VI1PR0501MB2045.eurprd05.prod.outlook.com ([fe80::5453:e317:6563:11b3]) by VI1PR0501MB2045.eurprd05.prod.outlook.com ([fe80::5453:e317:6563:11b3%13]) with mapi id 15.20.0696.019; Wed, 25 Apr 2018 18:30:11 +0000 From: Yongseok Koh To: "Ananyev, Konstantin" CC: "Lu, Wenzhuo" , "Wu, Jingjing" , "olivier.matz@6wind.com" , "dev@dpdk.org" , "arybchenko@solarflare.com" , "stephen@networkplumber.org" , Thomas Monjalon , "Adrien Mazarguil" , =?iso-8859-1?Q?N=E9lio_Laranjeiro?= Thread-Topic: [dpdk-dev] [PATCH v5 1/2] mbuf: support attaching external buffer to mbuf Thread-Index: AQHT3ECoB6uQuxGqukihULWWfew6/6QRev0A///GtACAAHoDAP//lZ+AgAAFhoCAAHeGgA== Date: Wed, 25 Apr 2018 18:30:11 +0000 Message-ID: <096E1626-EC4D-44A2-AA67-779499FC335B@mellanox.com> References: <20180310012532.15809-1-yskoh@mellanox.com> <20180425025341.10590-1-yskoh@mellanox.com> <2601191342CEEE43887BDE71AB977258AEBCF98C@IRSMSX102.ger.corp.intel.com> <20180425170638.GB3268@yongseok-MBP.local> <2601191342CEEE43887BDE71AB977258AEBCFD59@IRSMSX102.ger.corp.intel.com> <20180425180235.GD3268@yongseok-MBP.local> <20180425182221.GE3268@yongseok-MBP.local> In-Reply-To: <20180425182221.GE3268@yongseok-MBP.local> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [209.116.155.178] x-ms-publictraffictype: Email x-microsoft-exchange-diagnostics: 1; VI1PR0501MB2173; 7:5s4JdH7S+OCQJldhDOmihLLlLKGTkdbOLusKz6UYlJ4UnjZCOhuEAt5sDv/Ez3u3PhDD7iWLzLyRzdJ8gI24VDlpwwlELV5I+A51bRxtyoJKvp3YnsrxVvBUQZPfsPP7m0vR695cnDhGZ3F+2i0GQId0QpIezgHuN7rh1AyhLmL3ieXqqpDiKnK4efLzguz6J7BjWt7+XHUsZWMGuHQreX/TqPV1mMINDXNYKlU+KA4izxxz2uP0cLAhZ6S5/2jJ x-ms-exchange-antispam-srfa-diagnostics: SOS; x-ms-office365-filtering-ht: Tenant x-microsoft-antispam: UriScan:; BCL:0; PCL:0; RULEID:(7020095)(4652020)(48565401081)(5600026)(4534165)(4627221)(201703031133081)(201702281549075)(2017052603328)(7153060)(7193020); SRVR:VI1PR0501MB2173; x-ms-traffictypediagnostic: VI1PR0501MB2173: authentication-results: spf=none (sender IP is ) smtp.mailfrom=yskoh@mellanox.com; x-ld-processed: a652971c-7d2e-4d9b-a6a4-d149256f461b,ExtAddr x-microsoft-antispam-prvs: x-exchange-antispam-report-test: UriScan:(228905959029699)(17755550239193); x-exchange-antispam-report-cfa-test: BCL:0; PCL:0; RULEID:(8211001083)(6040522)(2401047)(8121501046)(5005006)(3002001)(10201501046)(93006095)(93001095)(3231232)(944501410)(52105095)(6055026)(6041310)(201703131423095)(201702281528075)(20161123555045)(201703061421075)(201703061406153)(20161123560045)(20161123562045)(20161123564045)(20161123558120)(6072148)(201708071742011); SRVR:VI1PR0501MB2173; BCL:0; PCL:0; RULEID:; SRVR:VI1PR0501MB2173; x-forefront-prvs: 06530126A4 x-forefront-antispam-report: SFV:NSPM; SFS:(10009020)(376002)(346002)(39380400002)(366004)(396003)(39860400002)(199004)(189003)(13464003)(81166006)(2616005)(105586002)(68736007)(106356001)(81156014)(186003)(8936002)(36756003)(102836004)(8676002)(5660300001)(7736002)(6436002)(6486002)(93886005)(59450400001)(26005)(66066001)(316002)(33656002)(76176011)(2906002)(229853002)(99286004)(478600001)(54906003)(3660700001)(82746002)(53546011)(3280700002)(7416002)(6506007)(6916009)(5250100002)(97736004)(86362001)(14454004)(486006)(25786009)(83716003)(2900100001)(5890100001)(53936002)(305945005)(4326008)(11346002)(3846002)(476003)(446003)(6116002)(6512007)(6246003); DIR:OUT; SFP:1101; SCL:1; SRVR:VI1PR0501MB2173; H:VI1PR0501MB2045.eurprd05.prod.outlook.com; FPR:; SPF:None; LANG:en; PTR:InfoNoRecords; A:1; MX:1; received-spf: None (protection.outlook.com: mellanox.com does not designate permitted sender hosts) x-microsoft-antispam-message-info: mLdLGS9DrnvzXRW48YLQ+pPLDGXbqeVxSA18x9NpnYkxMxgtH2uxd2JMylpPEZg9iad1m6+6xa5mrF/TCraOi7SzMd4Tugg4DD8DQO3wOKSwxj2xglzVOBtVjQzsTut7bLfA9hGHqlA4iVwnisQIca3W1U4FLubA5tuKCmpNvZnEp75eiHebH/LqfL8N5y8J spamdiagnosticoutput: 1:99 spamdiagnosticmetadata: NSPM Content-Type: text/plain; charset="iso-8859-1" Content-ID: <08DD10AC1A0D1545B21D5F306F395167@eurprd05.prod.outlook.com> Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-MS-Office365-Filtering-Correlation-Id: 61b8c69e-54e5-4853-4997-08d5aada9473 X-OriginatorOrg: Mellanox.com X-MS-Exchange-CrossTenant-Network-Message-Id: 61b8c69e-54e5-4853-4997-08d5aada9473 X-MS-Exchange-CrossTenant-originalarrivaltime: 25 Apr 2018 18:30:11.1562 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: a652971c-7d2e-4d9b-a6a4-d149256f461b X-MS-Exchange-Transport-CrossTenantHeadersStamped: VI1PR0501MB2173 Subject: Re: [dpdk-dev] [PATCH v5 1/2] mbuf: support attaching external buffer to mbuf X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 25 Apr 2018 18:30:14 -0000 > On Apr 25, 2018, at 11:22 AM, Yongseok Koh wrote: >=20 > On Wed, Apr 25, 2018 at 11:02:36AM -0700, Yongseok Koh wrote: >> On Wed, Apr 25, 2018 at 05:23:20PM +0000, Ananyev, Konstantin wrote: >>>=20 >>>=20 >>>> -----Original Message----- >>>> From: Yongseok Koh [mailto:yskoh@mellanox.com] >>>> Sent: Wednesday, April 25, 2018 6:07 PM >>>> To: Ananyev, Konstantin >>>> Cc: Lu, Wenzhuo ; Wu, Jingjing ; olivier.matz@6wind.com; dev@dpdk.org; >>>> arybchenko@solarflare.com; stephen@networkplumber.org; thomas@monjalon= .net; adrien.mazarguil@6wind.com; >>>> nelio.laranjeiro@6wind.com >>>> Subject: Re: [PATCH v5 1/2] mbuf: support attaching external buffer to= mbuf >>>>=20 >>>> On Wed, Apr 25, 2018 at 01:31:42PM +0000, Ananyev, Konstantin wrote: >>>> [...] >>>>>> /** Mbuf prefetch */ >>>>>> #define RTE_MBUF_PREFETCH_TO_FREE(m) do { \ >>>>>> if ((m) !=3D NULL) \ >>>>>> @@ -1213,11 +1306,127 @@ static inline int rte_pktmbuf_alloc_bulk(st= ruct rte_mempool *pool, >>>>>> } >>>>>>=20 >>>>>> /** >>>>>> + * Attach an external buffer to a mbuf. >>>>>> + * >>>>>> + * User-managed anonymous buffer can be attached to an mbuf. When a= ttaching >>>>>> + * it, corresponding free callback function and its argument should= be >>>>>> + * provided. This callback function will be called once all the mbu= fs are >>>>>> + * detached from the buffer. >>>>>> + * >>>>>> + * The headroom for the attaching mbuf will be set to zero and this= can be >>>>>> + * properly adjusted after attachment. For example, ``rte_pktmbuf_a= dj()`` >>>>>> + * or ``rte_pktmbuf_reset_headroom()`` can be used. >>>>>> + * >>>>>> + * More mbufs can be attached to the same external buffer by >>>>>> + * ``rte_pktmbuf_attach()`` once the external buffer has been attac= hed by >>>>>> + * this API. >>>>>> + * >>>>>> + * Detachment can be done by either ``rte_pktmbuf_detach_extbuf()``= or >>>>>> + * ``rte_pktmbuf_detach()``. >>>>>> + * >>>>>> + * Attaching an external buffer is quite similar to mbuf indirectio= n in >>>>>> + * replacing buffer addresses and length of a mbuf, but a few diffe= rences: >>>>>> + * - When an indirect mbuf is attached, refcnt of the direct mbuf w= ould be >>>>>> + * 2 as long as the direct mbuf itself isn't freed after the atta= chment. >>>>>> + * In such cases, the buffer area of a direct mbuf must be read-o= nly. But >>>>>> + * external buffer has its own refcnt and it starts from 1. Unles= s >>>>>> + * multiple mbufs are attached to a mbuf having an external buffe= r, the >>>>>> + * external buffer is writable. >>>>>> + * - There's no need to allocate buffer from a mempool. Any buffer = can be >>>>>> + * attached with appropriate free callback and its IO address. >>>>>> + * - Smaller metadata is required to maintain shared data such as r= efcnt. >>>>>> + * >>>>>> + * @warning >>>>>> + * @b EXPERIMENTAL: This API may change without prior notice. >>>>>> + * Once external buffer is enabled by allowing experimental API, >>>>>> + * ``RTE_MBUF_DIRECT()`` and ``RTE_MBUF_INDIRECT()`` are no longer >>>>>> + * exclusive. A mbuf can be considered direct if it is neither indi= rect nor >>>>>> + * having external buffer. >>>>>> + * >>>>>> + * @param m >>>>>> + * The pointer to the mbuf. >>>>>> + * @param buf_addr >>>>>> + * The pointer to the external buffer we're attaching to. >>>>>> + * @param buf_iova >>>>>> + * IO address of the external buffer we're attaching to. >>>>>> + * @param buf_len >>>>>> + * The size of the external buffer we're attaching to. If memory = for >>>>>> + * shared data is not provided, buf_len must be larger than the s= ize of >>>>>> + * ``struct rte_mbuf_ext_shared_info`` and padding for alignment.= If not >>>>>> + * enough, this function will return NULL. >>>>>> + * @param shinfo >>>>>> + * User-provided memory for shared data. If NULL, a few bytes in = the >>>>>> + * trailer of the provided buffer will be dedicated for shared da= ta and >>>>>> + * the shared data will be properly initialized. Otherwise, user = must >>>>>> + * initialize the content except for free callback and its argume= nt. The >>>>>> + * pointer of shared data will be stored in m->shinfo. >>>>>> + * @param free_cb >>>>>> + * Free callback function to call when the external buffer needs = to be >>>>>> + * freed. >>>>>> + * @param fcb_opaque >>>>>> + * Argument for the free callback function. >>>>>> + * >>>>>> + * @return >>>>>> + * A pointer to the new start of the data on success, return NULL >>>>>> + * otherwise. >>>>>> + */ >>>>>> +static inline char * __rte_experimental >>>>>> +rte_pktmbuf_attach_extbuf(struct rte_mbuf *m, void *buf_addr, >>>>>> + rte_iova_t buf_iova, uint16_t buf_len, >>>>>> + struct rte_mbuf_ext_shared_info *shinfo, >>>>>> + rte_mbuf_extbuf_free_callback_t free_cb, void *fcb_opaque) >>>>>> +{ >>>>>> + /* Additional attachment should be done by rte_pktmbuf_attach() */ >>>>>> + RTE_ASSERT(!RTE_MBUF_HAS_EXTBUF(m)); >>>>>=20 >>>>> Shouldn't we have here something like: >>>>> RTE_ASSERT(RTE_MBUF_DIRECT(m) && rte_mbuf_refcnt_read(m) =3D=3D 1); >>>>> ? >>>>=20 >>>> Right. That's better. Attaching mbuf should be direct and writable. >>>>=20 >>>>>> + >>>>>> + m->buf_addr =3D buf_addr; >>>>>> + m->buf_iova =3D buf_iova; >>>>>> + >>>>>> + if (shinfo =3D=3D NULL) { >>>>>=20 >>>>> Instead of allocating shinfo ourselves - wound's it be better to rely >>>>> on caller always allocating afeeling it for us (he can do that at the= end/start of buffer, >>>>> or whenever he likes to. >>>>=20 >>>> It is just for convenience. For some users, external attachment could = be >>>> occasional and casual, e.g. punt control traffic from kernel/hv. For s= uch >>>> non-serious cases, it is good to provide this small utility. >>>=20 >>> For such users that small utility could be a separate function then: >>> shinfo_inside_buf() or so. >>=20 >> I like this idea! As this is an inline function and can be called in a d= atapath, >> shorter code is better if it isn't expected to be used frequently. >>=20 >> Will take this idea for the new version. Thanks. >=20 > However, if this API is called with shinfo=3DNULL (builtin constant), thi= s code > block won't get included in compile time because it is an inline function= . Sorry, it was wrong. I said the exact opposite. Not enough sleep theses day= s. :-( If shinfo is passed, the code block will be included anyway. Please disregard the email. Yongseok >=20 > What is disadvantage to keep this block here? More intuitive? >=20 > Advantage of keeping it here could be simplicity. No need to call the uti= lity in > advance. >=20 > Or separating this code to another inline function could make the API pro= totype > simpler because free_cb and its arg should be passed via shinfo. >=20 > static inline char * __rte_experimental > rte_pktmbuf_attach_extbuf(struct rte_mbuf *m, void *buf_addr, > rte_iova_t buf_iova, uint16_t buf_len, > struct rte_mbuf_ext_shared_info *shinfo) >=20 > I'm still inclined to write the utility function like you suggested. > Thoughts? >=20 > Thanks, > Yongseok >=20 >>>>> Again in that case - caller can provide one shinfo to several mbufs (= with different buf_addrs) >>>>> and would know for sure that free_cb wouldn't be overwritten by mista= ke. >>>>> I.E. mbuf code will only update refcnt inside shinfo. >>>>=20 >>>> I think you missed the discussion with other people yesterday. This ch= ange is >>>> exactly for that purpose. Like I documented above, if this API is call= ed with >>>> shinfo being provided, it will use the user-provided shinfo instead of= sparing a >>>> few byte in the trailer and won't touch the shinfo. >>>=20 >>> As I can see your current code always update free_cb and fcb_opaque. >>> Which is kind of strange these fields shold be the same for all instanc= es of the shinfo.