From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id A02BEA04B7; Wed, 14 Oct 2020 15:55:30 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 6C3F01DE38; Wed, 14 Oct 2020 15:55:29 +0200 (CEST) Received: from mail-wr1-f66.google.com (mail-wr1-f66.google.com [209.85.221.66]) by dpdk.org (Postfix) with ESMTP id 1294B1DE35 for ; Wed, 14 Oct 2020 15:55:28 +0200 (CEST) Received: by mail-wr1-f66.google.com with SMTP id g12so3944775wrp.10 for ; Wed, 14 Oct 2020 06:55:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=6wind.com; s=google; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:content-transfer-encoding:in-reply-to :user-agent; bh=23QFrrCWn5qfUoqaYKeYrsCM6oe297v9Yl61WVN5qg0=; b=Z6gHIhSScorYqheCW08pJXg8wvX0BT8nVTTeFkx4PKLudDTh8BwNxwT708YEEQ9KPB iPr0HPWdApwNIcG3xUvNZNMw3JK7abU8ehfpsPN1sue1EGz03N9xzUDqZyh5q644Eyco DmKT30A98BRa30vbiUGyCLVAzVfsSFfPAwYwKxFcHXRTk8e1j5m+M6kNQdvtfVa5xfQg v9gMLbXeEDSkRttuNBLpiA7H82l/+QyodX8FRbRPeutvSOtHfukYfRtmfXb0zi+wsuoL +0tBxeEJ6uCUICQzBOMA37yMw119V4mNAkfIjcoxHNv83dBXVE0oDK9WSMgeoiF1O/Tp 81mg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:content-transfer-encoding :in-reply-to:user-agent; bh=23QFrrCWn5qfUoqaYKeYrsCM6oe297v9Yl61WVN5qg0=; b=eVsenpsOvAGPa68gkjehk2h7GBPYVpK9Hj0VEsemEsTSmWG7sCvOIhBNpY5tVNglU+ 6pBZ3OoJLR0XOg+oCqZrHQxm7u6cLpnu4bdz7CGhWD2uqyZl/6Lx6fsN/WdVVMoQxvCl 0l7z0WOTVrln96MlPmgZbnA/uII+aJlkGn7l1KEIoDlIJeNJdxD6/+ilTWFvxCO/LKjD kPT0pYfyIZ4SkAL+nrqBhvUfwQWZgEmgyaUTp1Y5rLc4H/Jta93oiv2XO5W4XE+Ne8O3 cJ6rL59q4jxsIHPjbSyGbEo7hveWtheyJK39lqnwqk1iKy2IOsjrn2it9Pky4b6ybdbN z/Wg== X-Gm-Message-State: AOAM530cKR6tNvDIvWdUVRf09f0JlY1ljJnD8ssVNDah/BDBHlk4GNTq PyPGZUSszbjYB2xtisIDDUtsZg== X-Google-Smtp-Source: ABdhPJzEgvUHhfH8u97+3bFBepzYd7lWF29QqNWv0u6bq0nH5uR7A91+qEMQTbTZ1cMYJ3GWSQBY+g== X-Received: by 2002:adf:bc0f:: with SMTP id s15mr5754701wrg.83.1602683726580; Wed, 14 Oct 2020 06:55:26 -0700 (PDT) Received: from 6wind.com (2a01cb0c0005a600345636f7e65ed1a0.ipv6.abo.wanadoo.fr. [2a01:cb0c:5:a600:3456:36f7:e65e:d1a0]) by smtp.gmail.com with ESMTPSA id c132sm3866993wmf.25.2020.10.14.06.55.25 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 14 Oct 2020 06:55:25 -0700 (PDT) Date: Wed, 14 Oct 2020 15:55:25 +0200 From: Olivier Matz To: yang_y_yi Cc: dev@dpdk.org, jiayu.hu@intel.com, thomas@monjalon.net, yangyi01@inspur.com Message-ID: <20201014135525.GS21395@platinum> References: <3cf82e61.145c.173b1ed87ce.Coremail.yang_y_yi@163.com> <20200803081139.GK5869@platinum> <7584d005.5305.173b3b3389f.Coremail.yang_y_yi@163.com> <20200803123425.GM5869@platinum> <7e80eaff.148a.173b71824b0.Coremail.yang_y_yi@163.com> <76308d9c.2b0b.174ce214814.Coremail.yang_y_yi@163.com> <20201007094821.GP21395@platinum> <95e7dd5.68f2.1750cc5aee0.Coremail.yang_y_yi@163.com> <20201009115525.GC21395@platinum> <6d9c1522.127d.1751032f37a.Coremail.yang_y_yi@163.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable In-Reply-To: <6d9c1522.127d.1751032f37a.Coremail.yang_y_yi@163.com> User-Agent: Mutt/1.10.1 (2018-07-13) Subject: Re: [dpdk-dev] [PATCH V1 2/3] mbuf: change free_cb interface to adapt to GSO case X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Hi, On Sat, Oct 10, 2020 at 09:49:35AM +0800, yang_y_yi wrote: > Olivier, thank you so much for helping figure out this, it does work > as the code you changed, so we can fix the issue without this patch > series. My question is can it also work for normal mbuf or indirect > mbuf? (I mean pkt is direct mbuf or indirect mbuf) Yes, it works for any mbuf type for pkt (direct, indirect, ext). Happy to see it solves your issue! Regards, Olivier > At 2020-10-09 19:55:25, "Olivier Matz" wrote: > >Hi, > > > >On Fri, Oct 09, 2020 at 05:51:23PM +0800, yang_y_yi wrote: > >> Olivier, thank you so much for your reply, your patch post for vhost h= elp me understand your concern better, I totally agree. For GSO case, let m= e show you a simple code to explain my issue. > >>=20 > >>=20 > >>=20 > >>=20 > >>=20 > >> struct rte_mbuf *pkt =3D rte_pktmbuf_alloc(mp); > >> virtio_dev_extbuf_alloc(pkt, data_len) > >> struct rte_mbuf * pkt_seg1 =3D rte_pktmbuf_alloc(indirect_pool); > >>=20 > >>=20 > >> rte_pktmbuf_attach(pkt_seg1, pkt); > >> rte_mbuf_ext_refcnt_update(pkt, -1); > >>=20 > >> struct rte_mbuf * pkt_seg2 =3D rte_pktmbuf_alloc(indirect_pool); > >> rte_pktmbuf_attach(pkt_seg2, pkt); > >> rte_mbuf_ext_refcnt_update(pkt, -1); > >> struct rte_mbuf *pkt_segs[2] =3D {pkt_seg1, pkt_seg2}; > >>=20 > >> rte_eth_tx_burst(dev->port_id, qid, pkt_segs, 2); > >>=20 > >>=20 > >> Is it a simple test you expect? The issue here is nobody explicitly ca= lls > >> rte_pktmbuf_free(pkt), rte_pktmbuf_free(pkt_segX) in PMD driver won't = free > >> "pkt", Is it clear to you now? > > > > > >Thank you for the small code. Yes, this is what I expected. > > > >The proper way to do this is something like this: > > > > /* create a mbuf, and attach it to an external buffer */ > > struct rte_mbuf *pkt =3D rte_pktmbuf_alloc(mp); > > virtio_dev_extbuf_alloc(pkt, data_len) > > > > /* create a new mbuf, attach it to the previous one: the resulting > > * mbuf is also an "external mbuf" (is has the EXT_ATTACHED_MBUF > > * flag, and its data is stored in the ext buffer. > > * See an example here: https://www.droids-corp.org/~zer0/ext-mbuf.svg > > */ > > struct rte_mbuf *pkt_seg1 =3D rte_pktmbuf_alloc(indirect_pool); > > rte_pktmbuf_attach(pkt_seg1, pkt); > > > > /* do the same another time */ > > struct rte_mbuf *pkt_seg2 =3D rte_pktmbuf_alloc(indirect_pool); > > rte_pktmbuf_attach(pkt_seg2, pkt); > > > > /* release the original pkt, we don't need it anymore */ > > rte_pktmbuf_free(pkt); > > > > /* send the new segments, they will be freed by the driver once > > * Tx is done. When the last packet referencing the external buffer > > * is freed, the free callback of the buffer will be invoked. */ > > struct rte_mbuf *pkt_segs[2] =3D {pkt_seg1, pkt_seg2}; > > rte_eth_tx_burst(dev->port_id, qid, pkt_segs, 2); > > > >Hope this is clearer now. > > > >Regards, > >Olivier > > > > > >>=20 > >>=20 > >>=20 > >>=20 > >>=20 > >>=20 > >>=20 > >>=20 > >>=20 > >>=20 > >>=20 > >> At 2020-10-07 17:48:21, "Olivier Matz" wrote: > >> >Hi, > >> > > >> >On Sun, Sep 27, 2020 at 01:55:21PM +0800, yang_y_yi wrote: > >> >> Per GSO requirement, this is a must-have change, Jiayu, can you hel= p review > >> >> this series? > >> > > >> >I can't ack this patch until I have a simple and clear test case (only > >> >with mbuf functions, without GSO or vhost) showing the issue we have > >> >today with current. > >> > > >> >> Olivier, if you used the old interface, maybe you need to change yo= ur code to > >> >> adapt this, I don't think we can have a better way to handle GSO ca= se. > >> > > >> >Sorry, I don't get your point. What do I need to change in which code? > >> > > >> >(some more comments below) > >> > > >> >> At 2020-08-04 09:31:19, "yang_y_yi" wrote: > >> >>=20 > >> >> At 2020-08-03 20:34:25, "Olivier Matz" wro= te: > >> >> >On Mon, Aug 03, 2020 at 05:42:13PM +0800, yang_y_yi wrote: > >> >> >> At 2020-08-03 16:11:39, "Olivier Matz" = wrote: > >> >> >> >On Mon, Aug 03, 2020 at 09:26:40AM +0800, yang_y_yi wrote: > >> >> >> >> At 2020-08-03 04:29:07, "Olivier Matz" wrote: > >> >> >> >> >Hi, > >> >> >> >> > > >> >> >> >> >On Sun, Aug 02, 2020 at 07:12:36AM +0800, yang_y_yi wrote: > >> >> >> >> >>=20 > >> >> >> >> >>=20 > >> >> >> >> >> At 2020-07-31 23:15:43, "Olivier Matz" wrote: > >> >> >> >> >> >Hi, > >> >> >> >> >> > > >> >> >> >> >> >On Thu, Jul 30, 2020 at 08:08:59PM +0800, yang_y_yi@163.c= om wrote: > >> >> >> >> >> >> From: Yi Yang > >> >> >> >> >> >>=20 > >> >> >> >> >> >> In GSO case, segmented mbufs are attached to original > >> >> >> >> >> >> mbuf which can't be freed when it is external. The issue > >> >> >> >> >> >> is free_cb doesn't know original mbuf and doesn't free > >> >> >> >> >> >> it when refcnt of shinfo is 0. > >> >> >> >> >> >>=20 > >> >> >> >> >> >> Original mbuf can be freed by rte_pktmbuf_free segmented > >> >> >> >> >> >> mbufs or by rte_pktmbuf_free original mbuf. Two kind of > >> >> >> >> >> >> cases should have different behaviors. free_cb won't > >> >> >> >> >> >> explicitly call rte_pktmbuf_free to free original mbuf > >> >> >> >> >> >> if it is freed by rte_pktmbuf_free original mbuf, but it > >> >> >> >> >> >> has to call rte_pktmbuf_free to free original mbuf if it > >> >> >> >> >> >> is freed by rte_pktmbuf_free segmented mbufs. > >> >> >> >> >> >>=20 > >> >> >> >> >> >> In order to fix this issue, free_cb interface has to be= en > >> >> >> >> >> >> changed, __rte_pktmbuf_free_extbuf must deliver called > >> >> >> >> >> >> mbuf pointer to free_cb, argument opaque can be defined > >> >> >> >> >> >> as a custom struct by user, it can includes original mb= uf > >> >> >> >> >> >> pointer, user-defined free_cb can compare caller mbuf w= ith > >> >> >> >> >> >> mbuf in opaque struct, free_cb should free original mbuf > >> >> >> >> >> >> if they are not same, this corresponds to rte_pktmbuf_f= ree > >> >> >> >> >> >> segmented mbufs case, otherwise, free_cb won't free ori= ginal > >> >> >> >> >> >> mbuf because the caller explicitly called rte_pktmbuf_f= ree > >> >> >> >> >> >> to free it. > >> >> >> >> >> >>=20 > >> >> >> >> >> >> Here is pseduo code to show two kind of cases. > >> >> >> >> >> >>=20 > >> >> >> >> >> >> case 1. rte_pktmbuf_free segmented mbufs > >> >> >> >> >> >>=20 > >> >> >> >> >> >> nb_tx =3D rte_gso_segment(original_mbuf, /* original mb= uf */ > >> >> >> >> >> >> &gso_ctx, > >> >> >> >> >> >> /* segmented mbuf */ > >> >> >> >> >> >> (struct rte_mbuf **)&gso_mbufs, > >> >> >> >> >> >> MAX_GSO_MBUFS); > >> >> >> >> >> > > >> >> >> >> >> >I'm sorry but it is not very clear to me what operations = are done by > >> >> >> >> >> >rte_gso_segment(). > >> >> >> >> >> > > >> >> >> >> >> >In the current code, I only see calls to rte_pktmbuf_atta= ch(), > >> >> >> >> >> >which do not deal with external buffers. Am I missing som= ething? > >> >> >> >> >> > > >> >> >> >> >> >Are you able to show the issue only with mbuf functions? = It would > >> >> >> >> >> >be helpful to understand what does not work. > >> >> >> >> >> > > >> >> >> >> >> > > >> >> >> >> >> >Thanks, > >> >> >> >> >> >Olivier > >> >> >> >> >> > > >> >> >> >> >> Oliver, thank you for comment, let me show you why it does= n't work for my use case. In OVS DPDK, VM uses vhostuserclient to send lar= ge packets whose size is about 64K because we enabled TSO & UFO, these larg= e packets use rte_mbufs allocated by DPDK virtio_net function=20 > >> >> >> >> >> virtio_dev_pktmbuf_alloc() (in file lib/librte_vhost/virti= o_net.c. Please refer to [PATCH V1 3/3], I changed free_cb as below, these = packets use the same allocate function and the same free_cb no matter they = are TCP packet or UDP packets, in case of VXLAN TSO, most NICs can't suppor= t inner UDP fragment offload, so OVS DPDK has to do it by software, for UDP= case, the original rte_mbuf only can be freed by segmented rte_mbufs which= are output packets of rte_gso_segment, i.e. the original rte_mbuf only can= freed by free_cb, you can see, it explicitly called rte_pktmbuf_free(arg->= mbuf), the condition statement "if (caller_m !=3D arg->mbuf)" is true for t= his case, this has no problem, but for TCP case, the original mbuf is deliv= ered to rte_eth_tx_burst() but not segmented rte_mbufs output by rte_gso_se= gment, PMD driver will call rte_pktmbuf_free(original_rte_mbuf) but not rte= _pktmbuf_free(segmented_rte_mbufs), the same free_cb will be called, that m= eans original_rte_mbuf will be freed twice, you know what will happen, this= is just the issue I'm fixing. I bring in caller_m argument, it can help wo= rk around this because caller_m is arg->mbuf and the condition statement "i= f (caller_m !=3D arg->mbuf)" is false, you can't fix it without the change = this patch series did. > >> >> >> >> > > >> >> >> >> >I'm sill not sure to get your issue. Please, if you have a s= imple test > >> >> >> >> >case using only mbufs functions (without virtio, gso, ...), = it would be > >> >> >> >> >very helpful because we will be sure that we are talking abo= ut the same > >> >> >> >> >thing. In case there is an issue, it can easily become a uni= t test. > >> >> >> >>=20 > >> >> >> >> Oliver, I think you don't get the point, free operation can't= be controlled by the application itself,=20 > >> >> >> >> it is done by PMD driver and triggered by rte_eth_tx_burst, I= have shown pseudo code, > >> >> >> >> rte_gso_segment just segments a large mbuf to multiple mbufs,= it won't send them, the application > >> >> >> >> will call rte_eth_tx_burst to send them finally. > >> >> >> >> > >> >> >> >> > > >> >> >> >> >That said, I looked at vhost mbuf allocation and gso segment= ation, and > >> >> >> >> >I found some strange things: > >> >> >> >> > > >> >> >> >> >1/ In virtio_dev_extbuf_alloc(), and I there are 2 paths to = create the > >> >> >> >> > ext mbuf. > >> >> >> >> > > >> >> >> >> > a/ The first one stores the shinfo struct in the mbuf, ba= sically > >> >> >> >> > like this: > >> >> >> >> > > >> >> >> >> > pkt =3D rte_pktmbuf_alloc(mp); > >> >> >> >> > shinfo =3D rte_pktmbuf_mtod(pkt, struct rte_mbuf_ext_shared= _info *); > >> >> >> >> > buf =3D rte_malloc(NULL, buf_len, RTE_CACHE_LINE_SIZE); > >> >> >> >> > shinfo->free_cb =3D virtio_dev_extbuf_free; > >> >> >> >> > shinfo->fcb_opaque =3D buf; > >> >> >> >> > rte_mbuf_ext_refcnt_set(shinfo, 1); > >> >> >> >> > > >> >> >> >> > I don't think it is a good idea, because there is no g= uarantee that > >> >> >> >> > the mbuf won't be freed before the buffer. For instanc= e, doing > >> >> >> >> > this will probably fail: > >> >> >> >> > > >> >> >> >> > pkt2 =3D rte_pktmbuf_alloc(mp); > >> >> >> >> > rte_pktmbuf_attach(pkt2, pkt); > >> >> >> >> > rte_pktmbuf_free(pkt); /* pkt is freed, but it contains sh= info ! */ > >> >> >> >>=20 > >> >> >> >> pkt is created by the application I can control, so I can con= trol it where it will be freed, right? > >> >> >> > > >> >> >> >This example shows that mbufs allocated like this by the vhost > >> >> >> >driver are not constructed correctly. If an application attach = a new > >> >> >> >packet (pkt2) to it and frees the original one (pkt), it may re= sult in a > >> >> >> >memory corruption. > >> >> >> > > >> >> >> >Of course, to be tested and confirmed. > >> >> >>=20 > >> >> >> No, attach will increase refcnt of shinfo, free_cb only is calle= d when refcnt of shinfo is decreased to > >> >> >> 0, isn't it? > >> >> > > >> >> >When pkt will be freed, it will decrement the shinfo refcnt, and > >> >> >after it will be 1. So the buffer won't be freed. After that, the > >> >> >mbuf pkt will be detached, and will return to the mbuf pool. It me= ans > >> >> >it can be reallocated, and the next user can overwrite shinfo which > >> >> >is still stored in the mbuf data. > >> >>=20 > >> >> I think this is an issue of DPDK itself, if external buffer in shin= fo is freed, shinfo should be set to NULL, if user will > >> >> overwrite it, he/she should use the same way as a new external buff= er is attached.=20 > >> > > >> >No, there is no issue in DPDK. The lifetime of shinfo should be at le= ast > >> >the same the lifetime of the buffer. If shinfo is stored in initial m= buf > >> >(called "pkt" in the example above), the mbuf and shinfo can be freed= while > >> >the buffer is still in use by another packet. > >> > > >> >> >I did a test to show it, see: > >> >> >http://git.droids-corp.org/?p=3Ddpdk.git;a=3Dcommitdiff;h=3Da61749= 4eeb01ff > >> >> > > >> >> >If you run the mbuf autotest, it segfaults. > >> >>=20 > >> >> I think your test is wrong, you're changing shinfo (which is being = used) in wrong way, if free_bc is NULL, it will be invalid. > >> > > >> >I'm changing the data of a newly allocated mbuf, it is perfectly lega= l. > >> >I happens that it points the the shinfo that is still in use. > >> > > >> > > >> >>=20 > >> >> static inline void > >> >> rte_pktmbuf_attach_extbuf(struct rte_mbuf *m, void *buf_addr, > >> >> rte_iova_t buf_iova, uint16_t buf_len, > >> >> struct rte_mbuf_ext_shared_info *shinfo) > >> >> { > >> >> /* mbuf should not be read-only */ > >> >> RTE_ASSERT(RTE_MBUF_DIRECT(m) && rte_mbuf_refcnt_read(m) = =3D=3D 1); > >> >> RTE_ASSERT(shinfo->free_cb !=3D NULL); > >> >>=20 > >> >> For any shinfo operation, you should do it by rte_pktmbuf_attach_ex= tbuf, you can't change it at will after that. > >> >>=20 > >> >> > > >> >> > > >> >> >>=20 > >> >> >> > > >> >> >> >>=20 > >> >> >> >> > > >> >> >> >> > To do this properly, the mbuf refcnt should be increas= ed, and > >> >> >> >> > the mbuf should be freed in the callback. But I don't = think it's > >> >> >> >> > worth doing it, given the second path (b/) looks good = to me. > >> >> >> >> > > >> >> >> >> > b/ The second path stores the shinfo struct at the end of= the > >> >> >> >> > allocated buffer, like this: > >> >> >> >> > > >> >> >> >> > pkt =3D rte_pktmbuf_alloc(mp); > >> >> >> >> > buf_len +=3D sizeof(*shinfo) + sizeof(uintptr_t); > >> >> >> >> > buf_len =3D RTE_ALIGN_CEIL(total_len, sizeof(uintptr_t)); > >> >> >> >> > buf =3D rte_malloc(NULL, buf_len, RTE_CACHE_LINE_SIZE); > >> >> >> >> > shinfo =3D rte_pktmbuf_ext_shinfo_init_helper(buf, &buf_len, > >> >> >> >> > virtio_dev_extbuf_free, buf); > >> >> >> >> > > >> >> >> >> > I think this is correct, because we have the guarantee= that shinfo > >> >> >> >> > exists as long as the buffer exists. > >> >> >> >>=20 > >> >> >> >> What buffer does the allocated buffer you're saying here? The= issue we're discussing how we can > >> >> >> >> free original mbuf which owns shinfo buffer. > >> >> >> > > >> >> >> >I don't get your question. > >> >> >> > > >> >> >> >I'm just saying that this code path looks correct, compared to > >> >> >> >the previous one. > >> >> >>=20 > >> >> >> I think you're challenging principle of external mbuf, that isn'= t the thing I address. > >> >> > > >> >> >I'm not challenging anything, I'm saying there is a bug in this co= de, > >> >> >and the unit test above tends to confirm it. > >> >>=20 > >> >> If it is bug, you can post a patch to fix it, it isn't related wit= h my patches. But in my opinion, you don't > >> >> use it in correct way, I don't think it is a bug. > >> > > >> >I'll submit a patch for this. > >> > > >> >The point is you are testing GSO on top of wrongly-constructed mbufs,= so > >> >it would be safer for you to fix this before doing more tests. > >> > > >> > > >> >> >> >> >2/ in rte_gso_segment(), there is a loop like this: > >> >> >> >> > > >> >> >> >> > while (pkt_seg) { > >> >> >> >> > rte_mbuf_refcnt_update(pkt_seg, -1); > >> >> >> >> > pkt_seg =3D pkt_seg->next; > >> >> >> >> > } > >> >> >> >> > > >> >> >> >> > You change it to take in account the refcnt for ext mbufs. > >> >> >> >> > > >> >> >> >> > I may have missed something but I wonder why it is not si= mply: > >> >> >> >> > > >> >> >> >> > rte_pktmbuf_free(pkt_seg); > >> >> >> >> > > >> >> >> >> > It will decrease the proper refcnt, and free the mbufs if= they > >> >> >> >> > are not used. > >> >> >> >>=20 > >> >> >> >> Again, rte_gso_segment just decreases refcnt by one, this wil= l ensure the last segmented=20 > >> >> >> >> mbuf free will trigger freeing original mbuf (only free_cb ca= n do this). > >> >> >> > > >> >> >> >rte_pktmbuf_free() will also decerase the refcnt, and free the = resources > >> >> >> >when the refcnt reaches 0. > >> >> >> > > >> >> >> >It has some advantages compared to decrease the reference count= er of all > >> >> >> >segments: > >> >> >> > > >> >> >> >- no need to iterate the segments, there is only one function c= all > >> >> >> >- no need to have a special case for ext mbufs like you did in = your patch > >> >> >> >- it may be safer, in case some segments have a refcnt =3D=3D 1= , because > >> >> >> > resources will be freed. > >> >> >>=20 > >> >> >> For external mbuf, attach only increases refcnt of shinfo, refcn= t of mbuf won't be touched. For normal > >> >> >> mbuf, attach only increase refcnt of mbuf, no shinfo there, no r= efcnt of shinfo increased. > >> >> > > >> >> >I suppose rte_gso_segment() can take any mbuf type as input: stand= ard > >> >> >mbuf, indirect mbuf, ext mbuf, or even a mbuf chaing containing se= gments of > >> >> >different types. > >> >> > > >> >> >For instance, if you pass a chain of 2 mbufs: > >> >> >- the first one is a direct mbuf containing the IP/TCP headers (or= ig_hdr) > >> >> >- the second on is a mbuf pointing to an ext buffer (orig_payload) > >> >> > > >> >> >I expect that the resulting mbuf after calling gso contains a list= of mbufs > >> >> >like this: > >> >> >- a first segment containing the IP/TCP headers (new_hdr) > >> >> >- a payload segment pointing on the same ext buffer > >> >> > > >> >> >In theory, there is no reason that orig_hdr should be referenced by > >> >> >another new mbuf, because it only contains headers (no data). If t= hat's > >> >> >the case, its refcnt is 1, and decreasing it to 0 without freeing = it > >> >> >is a bug. > >> >>=20 > >> >> For this user scenario, orig_m is owner of external buffer, small s= egmented mbufs reference > >> >> it, you shouldn't free orig_m before all attached segmented mbufs a= re freed, isn't it? > >> > > >> >In this case, orig_hdr has to be freed because it is a direct mbuf (n= ot > >> >shared). > >> >The buffer pointed by orig_payload will be freed when all newly creat= ed > >> >segments are freed. > >> > > >> > > >> >> > > >> >> >Anyway, there is maybe no issue in that case, but I was just sugge= sting > >> >> >that using rte_pktmbuf_free() is easier to read, and safer than ma= nually > >> >> >decreasing the refcnt of each segment. > >> >> > > >> >> > > >> >> >> >> >Again, sorry if this is not the issue your are referring to,= but > >> >> >> >> >in this case I really think that having a simple example cod= e that > >> >> >> >> >shows the issue would help. > >> >> >> >>=20 > >> >> >> >> Oliver, my statement in the patch I sent out has pseudo code = to show this. I don't think a simple > >> >> >> >> unit test can show it. > >> >> >> > > >> >> >> >I don't see why. The PMDs and the libraries use the mbuf functi= ons, why > >> >> >> >a unit test couldn't call the same functions? > >> >> >> > > >> >> >> >> Let me summarize it here again. For original mbuf, there are = two cases freeing > >> >> >> >> it, case one is PMD driver calls free against segmented mbufs= , last segmented mbuf free will trigger > >> >> >> >> free_cb call which will free original large & extended mbuf. > >> >> >> > > >> >> >> >OK > >> >> >> > > >> >> >> >> Case two is PMD driver will call free against > >> >> >> >> original mbuf, that also will call free_cb to free attached e= xtended buffer. > >> >> >> > > >> >> >> >OK > >> >> >> > > >> >> >> >And what makes that case 1 or case 2 is executed? > >> >> >> > > >> >> >> >> In case one free_cb must call > >> >> >> >> rte_pktmbuf_free otherwise nobody will free original large & = extended mbuf, in case two free_cb can't=20 > >> >> >> >> call rte_pktmbuf_free because the caller calling it is just r= te_pktmbuf_free we need. That is to say, you > >> >> >> >> must use the same free_cb to handle these two cases, this is = my issue and the point you don't get. > >> >> >> > > >> >> >> >I think there is no need to change the free_cb API. It should w= ork like > >> >> >> >this: > >> >> >> > > >> >> >> >- virtio creates the original external mbuf (orig_m) > >> >> >> >- gso will create a new mbuf referencing the external buffer (n= ew_m) > >> >> >> > > >> >> >> >At this point, the shinfo has a refcnt of 2. The large buffer w= ill be > >> >> >> >freed as soon as rte_pktmbuf_free() is called on orig_m and new= _m, > >> >> >> >whatever the order. > >> >> >> > > >> >> >> >Regards, > >> >> >> >Olivier > >> >> >>=20 > >> >> >> Oliver, the reason it works is I changed free_cb API, case 1 doe= sn't know orig_m, how you make it free orig_m in free_cb. > >> >> >> The intention I change free_cb is to let it know orig_m, I saw O= VS DPDK ran out out buffers and orig_m isn't freed, that is why > >> >> >> I want to bring in this to fix the issue. Again, in case 1, nobo= dy explicitly calls ret_pktmbuf_free(orig_m) except free_cb I changed. > >> >> > > >> >> >If nobody calls ret_pktmbuf_free(orig_m), it is a problem. > >> >> >The free_cb is to free the buffer, not the mbuf. > >> >> > > >> >> >To me, it should work like this: > >> >> > > >> >> >1- virtio creates a mbuf attached to the ext buffer (the shinfo pl= acement > >> >> > bug should be fixed) > >> >> >2- gso create mbufs that reference the the same ext buf (by attach= ing the > >> >> > new mbuf) > >> >> >3- gso must free the original mbuf > >> >>=20 > >> >> This is impossible, segmented mbufs are referencing external buffer= in original mbuf, > >> >> how do you free it? As I said rte_gso_segment has no way to free it= , please tell me a way if > >> >> you know how to do this. > >> > > >> >As I said above, calling rte_mbuf_free(orig_m) will decrement the ref= erence > >> >counters on all segments. The segments will be returned to the pool i= f the > >> >refcnt reaches 0. > >> > > >> >>=20 > >> >> >4- the PMD transmits the new mbufs, and frees them > >> >> > > >> >> >Whatever 3- or 4- is executed first, at the end we are sure that: > >> >> >- all mbufs will be returned to the pool > >> >> >- the linear buffer will be freed when the refcnt reaches 0. > >> >> > > >> >> >If this is still unclear, please, write a unit test like I did > >> >> >above to show your issue. > >> >> > > >> >> >Regards, > >> >> >Olivier > >> >> > > >> >>=20 > >> >> The issue is in "3- gso must free the original mbuf", rte_pktmbuf_f= ree(segmented_mbus) can't do it, > >> >> rte_gso_segment is impossible to do it, only feasible point is free= _cb, please let me know if you have > >> >> a better way to free original mbuf and don't impact on segmented mb= ufs in PMD. My point is you must > >> >> have a place to call rte_pktmbuf_free(rogin_m) explicitly, otherwis= e it is impossible to return it to memory > >> >> pool, please point out where it can be called in my user scenario.= I don't care how it is done, I just care it can > >> >> fix my issue, please don't hesitate and send me a patch if you can,= thanks a lot. > >> > > >> >Sorry, but I don't see how I can be clearer to what I explained > >> >in my previous answer. > >> > > >> >Please, *provide a simple test example* without gso/vhost, and I can = help > >> >to make it work. > >> > > >> > > >> >Regards, > >> >Olivier > >> > > >> > > >> >> > > >> >> > > >> >> >> free_cb must handle case 1 and case 2 in the same code, for case= 1, caller_m is segmented new_m, for case 2, caller_m is orig_m. > >> >> >>=20 > >> >> >> loop in rte_gso_segement is handling original mbuf (this mbuf is= multi-mbuf and includes multiple mbufs which are linked by next > >> >> >> pointer), it isn't a problem at all. > >> >> >>=20 > >> >> >> Please show me code how you can fix my issue if you don't change= free_cb, thank you. > >> >> >>=20 > >> >> >> struct shinfo_arg { > >> >> >> void *buf; > >> >> >> struct rte_mbuf *mbuf; > >> >> >> }; > >> >> >>=20 > >> >> >> virtio_dev_extbuf_free(struct rte_mbuf *caller_m, void *opaque) > >> >> >> { > >> >> >> struct shinfo_arg *arg =3D (struct shinfo_arg *)opaque; > >> >> >>=20 > >> >> >> rte_free(arg->buf); > >> >> >> if (caller_m !=3D arg->mbuf) > >> >> >> rte_pktmbuf_free(arg->mbuf); > >> >> >> rte_free(arg); > >> >> >> }