From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id AE8F3A0518; Sun, 2 Aug 2020 22:29:11 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id F023D1C020; Sun, 2 Aug 2020 22:29:10 +0200 (CEST) Received: from mail-wm1-f66.google.com (mail-wm1-f66.google.com [209.85.128.66]) by dpdk.org (Postfix) with ESMTP id 0383D34F3 for ; Sun, 2 Aug 2020 22:29:09 +0200 (CEST) Received: by mail-wm1-f66.google.com with SMTP id d190so12593940wmd.4 for ; Sun, 02 Aug 2020 13:29:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=6wind.com; s=google; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:content-transfer-encoding:in-reply-to :user-agent; bh=DTGc4k/Ef9nKwuG8KyNGri09ZuaYV6xdQGuiAs3cd8o=; b=Kcxpr6UsJvPz2KXYdqbxf3mONvkwF/yxKUAeVvWgejH1IGWKzGaDvD6XehPjandc8O P3+CWitufz6xmGnLf6ZO13wM+5TQ7dAN+OvfbhS32UIN8LObgGRMkP3+8IvT+isNBQTo hvnh1aTuSci1MaOIARuYrJwTtggGJbE3561J0SyYxk4couTxce1BhziKAm1QCMoXwBRG 15rCaeinKAlON1YQSuWsGeD0OGyZ0gbVN4KD05cVI1IUl/ywIT+JyFL/VKfTG6K35jGh aaea78saarptSTdo5VseaarwguzjOM5vNYvPGh/KzwUUuIASe4ArP3lWGLNVrD6ofTjK snWw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:content-transfer-encoding :in-reply-to:user-agent; bh=DTGc4k/Ef9nKwuG8KyNGri09ZuaYV6xdQGuiAs3cd8o=; b=LTKRZnVrWwzIt3aP0uWTPlGByaW48pAea6AZwnxyihweUps3bXqNncWpk3/IfHnHI1 hDJYxmrSYoyXf0KK/nrv3BXLWUaW1IEGaY6jZkmJzfF67dFlbLANB/Gx3nZEH1gGFMKB pob5cMrmMB6t/OeC8kS+t/Vh90HJ2I6wRoezGFbG6/KFREzpcexQaMo1ToqFHJvGiruZ K2C8muitApO3o1xv3g37pQ/ETltMI4koA4Qf1Pt0gvFaub+ZoUOzQLTUTqnl5ulb6Bvr 8mFAFRiuZmMdd6/joD2ie5UO8ngC1ektxrPN4ehHpnD7iloZtjUSyzSRkMwa5naAROm8 jxgg== X-Gm-Message-State: AOAM530VKslH+JjF6ajASX/OXcFM29iYcRaUvAH2LdLp/csvepxjm1br GImJBrpycXBuNgCLifWyRaxxUg== X-Google-Smtp-Source: ABdhPJwxGJdgV5rJBImhFyzWEuyAxY1Bvn+4rskzzruPZipX2/EcK093+aje9Ya7VUmg4lqeJ8pxhQ== X-Received: by 2002:a1c:14e:: with SMTP id 75mr13627628wmb.151.1596400149427; Sun, 02 Aug 2020 13:29:09 -0700 (PDT) Received: from 6wind.com (2a01cb0c0005a600345636f7e65ed1a0.ipv6.abo.wanadoo.fr. [2a01:cb0c:5:a600:3456:36f7:e65e:d1a0]) by smtp.gmail.com with ESMTPSA id r22sm13502428wmh.45.2020.08.02.13.29.08 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 02 Aug 2020 13:29:08 -0700 (PDT) Date: Sun, 2 Aug 2020 22:29:07 +0200 From: Olivier Matz To: yang_y_yi Cc: dev@dpdk.org, jiayu.hu@intel.com, thomas@monjalon.net, yangyi01@inspur.com Message-ID: <20200802202907.GJ5869@platinum> References: <20200730120900.108232-1-yang_y_yi@163.com> <20200730120900.108232-3-yang_y_yi@163.com> <20200731151543.GH5869@platinum> <2a50e80c.44f.173ac4c6d20.Coremail.yang_y_yi@163.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable In-Reply-To: <2a50e80c.44f.173ac4c6d20.Coremail.yang_y_yi@163.com> User-Agent: Mutt/1.10.1 (2018-07-13) Subject: Re: [dpdk-dev] [PATCH V1 2/3] mbuf: change free_cb interface to adapt to GSO case X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Hi, On Sun, Aug 02, 2020 at 07:12:36AM +0800, yang_y_yi wrote: >=20 >=20 > At 2020-07-31 23:15:43, "Olivier Matz" wrote: > >Hi, > > > >On Thu, Jul 30, 2020 at 08:08:59PM +0800, yang_y_yi@163.com wrote: > >> From: Yi Yang > >>=20 > >> In GSO case, segmented mbufs are attached to original > >> mbuf which can't be freed when it is external. The issue > >> is free_cb doesn't know original mbuf and doesn't free > >> it when refcnt of shinfo is 0. > >>=20 > >> Original mbuf can be freed by rte_pktmbuf_free segmented > >> mbufs or by rte_pktmbuf_free original mbuf. Two kind of > >> cases should have different behaviors. free_cb won't > >> explicitly call rte_pktmbuf_free to free original mbuf > >> if it is freed by rte_pktmbuf_free original mbuf, but it > >> has to call rte_pktmbuf_free to free original mbuf if it > >> is freed by rte_pktmbuf_free segmented mbufs. > >>=20 > >> In order to fix this issue, free_cb interface has to been > >> changed, __rte_pktmbuf_free_extbuf must deliver called > >> mbuf pointer to free_cb, argument opaque can be defined > >> as a custom struct by user, it can includes original mbuf > >> pointer, user-defined free_cb can compare caller mbuf with > >> mbuf in opaque struct, free_cb should free original mbuf > >> if they are not same, this corresponds to rte_pktmbuf_free > >> segmented mbufs case, otherwise, free_cb won't free original > >> mbuf because the caller explicitly called rte_pktmbuf_free > >> to free it. > >>=20 > >> Here is pseduo code to show two kind of cases. > >>=20 > >> case 1. rte_pktmbuf_free segmented mbufs > >>=20 > >> nb_tx =3D rte_gso_segment(original_mbuf, /* original mbuf */ > >> &gso_ctx, > >> /* segmented mbuf */ > >> (struct rte_mbuf **)&gso_mbufs, > >> MAX_GSO_MBUFS); > > > >I'm sorry but it is not very clear to me what operations are done by > >rte_gso_segment(). > > > >In the current code, I only see calls to rte_pktmbuf_attach(), > >which do not deal with external buffers. Am I missing something? > > > >Are you able to show the issue only with mbuf functions? It would > >be helpful to understand what does not work. > > > > > >Thanks, > >Olivier > > > Oliver, thank you for comment, let me show you why it doesn't work for my= use case. In OVS DPDK, VM uses vhostuserclient to send large packets whos= e size is about 64K because we enabled TSO & UFO, these large packets use r= te_mbufs allocated by DPDK virtio_net function=20 > virtio_dev_pktmbuf_alloc() (in file lib/librte_vhost/virtio_net.c. Please= refer to [PATCH V1 3/3], I changed free_cb as below, these packets use the= same allocate function and the same free_cb no matter they are TCP packet = or UDP packets, in case of VXLAN TSO, most NICs can't support inner UDP fra= gment offload, so OVS DPDK has to do it by software, for UDP case, the orig= inal rte_mbuf only can be freed by segmented rte_mbufs which are output pac= kets of rte_gso_segment, i.e. the original rte_mbuf only can freed by free_= cb, you can see, it explicitly called rte_pktmbuf_free(arg->mbuf), the cond= ition statement "if (caller_m !=3D arg->mbuf)" is true for this case, this = has no problem, but for TCP case, the original mbuf is delivered to rte_eth= _tx_burst() but not segmented rte_mbufs output by rte_gso_segment, PMD driv= er will call rte_pktmbuf_free(original_rte_mbuf) but not rte_pktmbuf_free(s= egmented_rte_mbufs), the same free_cb will be called, that means original_r= te_mbuf will be freed twice, you know what will happen, this is just the is= sue I'm fixing. I bring in caller_m argument, it can help work around this = because caller_m is arg->mbuf and the condition statement "if (caller_m != =3D arg->mbuf)" is false, you can't fix it without the change this patch se= ries did. I'm sill not sure to get your issue. Please, if you have a simple test case using only mbufs functions (without virtio, gso, ...), it would be very helpful because we will be sure that we are talking about the same thing. In case there is an issue, it can easily become a unit test. That said, I looked at vhost mbuf allocation and gso segmentation, and I found some strange things: 1/ In virtio_dev_extbuf_alloc(), and I there are 2 paths to create the ext mbuf. a/ The first one stores the shinfo struct in the mbuf, basically like this: pkt =3D rte_pktmbuf_alloc(mp); shinfo =3D rte_pktmbuf_mtod(pkt, struct rte_mbuf_ext_shared_info *); buf =3D rte_malloc(NULL, buf_len, RTE_CACHE_LINE_SIZE); shinfo->free_cb =3D virtio_dev_extbuf_free; shinfo->fcb_opaque =3D buf; rte_mbuf_ext_refcnt_set(shinfo, 1); I don't think it is a good idea, because there is no guarantee that the mbuf won't be freed before the buffer. For instance, doing this will probably fail: pkt2 =3D rte_pktmbuf_alloc(mp); rte_pktmbuf_attach(pkt2, pkt); rte_pktmbuf_free(pkt); /* pkt is freed, but it contains shinfo ! */ To do this properly, the mbuf refcnt should be increased, and the mbuf should be freed in the callback. But I don't think it's worth doing it, given the second path (b/) looks good to me. b/ The second path stores the shinfo struct at the end of the allocated buffer, like this: pkt =3D rte_pktmbuf_alloc(mp); buf_len +=3D sizeof(*shinfo) + sizeof(uintptr_t); buf_len =3D RTE_ALIGN_CEIL(total_len, sizeof(uintptr_t)); buf =3D rte_malloc(NULL, buf_len, RTE_CACHE_LINE_SIZE); shinfo =3D rte_pktmbuf_ext_shinfo_init_helper(buf, &buf_len, virtio_dev_extbuf_free, buf); I think this is correct, because we have the guarantee that shinfo exists as long as the buffer exists. 2/ in rte_gso_segment(), there is a loop like this: while (pkt_seg) { rte_mbuf_refcnt_update(pkt_seg, -1); pkt_seg =3D pkt_seg->next; } You change it to take in account the refcnt for ext mbufs. I may have missed something but I wonder why it is not simply: rte_pktmbuf_free(pkt_seg); It will decrease the proper refcnt, and free the mbufs if they are not used. Again, sorry if this is not the issue your are referring to, but in this case I really think that having a simple example code that shows the issue would help. Regards, Olivier >=20 > diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.= c index e663fd4..3b69cbb 100644 >=20 > --- a/lib/librte_vhost/virtio_net.c >=20 > +++ b/lib/librte_vhost/virtio_net.c >=20 > @@ -2136,10 +2136,20 @@ uint16_t rte_vhost_poll_enqueue_completed(int vid= , uint16_t queue_id, >=20 > return NULL; >=20 > } >=20 > =20 >=20 > +struct shinfo_arg { >=20 > + void *buf; >=20 > + struct rte_mbuf *mbuf; >=20 > +}; >=20 > + >=20 > static void >=20 > -virtio_dev_extbuf_free(struct rte_mbuf * caller_m __rte_unused, void *op= aque) >=20 > +virtio_dev_extbuf_free(struct rte_mbuf *caller_m, void *opaque) >=20 > { >=20 > - rte_free(opaque); >=20 > + struct shinfo_arg *arg =3D (struct shinfo_arg *)opaque; >=20 > + >=20 > + rte_free(arg->buf); >=20 > + if (caller_m !=3D arg->mbuf) >=20 > + rte_pktmbuf_free(arg->mbuf); >=20 > + rte_free(arg); >=20 > } >=20 > =20 >=20 > static int >=20 > @@ -2172,8 +2182,14 @@ uint16_t rte_vhost_poll_enqueue_completed(int vid,= uint16_t queue_id, >=20 > =20 >=20 > /* Initialize shinfo */ >=20 > if (shinfo) { >=20 > + struct shinfo_arg *arg =3D (struct shinfo_a= rg *) >=20 > + rte_malloc(NULL, sizeof(str= uct shinfo_arg), >=20 > + RTE_CACH= E_LINE_SIZE); >=20 > + >=20 > + arg->buf =3D buf; >=20 > + arg->mbuf =3D pkt; >=20 > shinfo->free_cb =3D virtio_dev_extbuf_fre= e; >=20 > - shinfo->fcb_opaque =3D buf; >=20 > + shinfo->fcb_opaque =3D arg; >=20 > rte_mbuf_ext_refcnt_set(shinfo, 1); >=20 > } else { >=20 > shinfo =3D rte_pktmbuf_ext_shinfo_init_he= lper(buf, &buf_len, >=20 > -- >=20 > 1.8.3.1 >=20 >=20 >=20 > /* > * Allocate a host supported pktmbuf. > */ > static __rte_always_inline struct rte_mbuf * > virtio_dev_pktmbuf_alloc(struct virtio_net *dev, struct rte_mempool *mp, > uint32_t data_len) > { > struct rte_mbuf *pkt =3D rte_pktmbuf_alloc(mp); >=20 > if (unlikely(pkt =3D=3D NULL)) { > VHOST_LOG_DATA(ERR, > "Failed to allocate memory for mbuf.\n"); > return NULL; > } >=20 > if (rte_pktmbuf_tailroom(pkt) >=3D data_len) > return pkt; >=20 > /* attach an external buffer if supported */ > if (dev->extbuf && !virtio_dev_extbuf_alloc(pkt, data_len)) > return pkt; >=20 > /* check if chained buffers are allowed */ > if (!dev->linearbuf) > return pkt; >=20 > /* Data doesn't fit into the buffer and the host supports > * only linear buffers > */ > rte_pktmbuf_free(pkt); >=20 > return NULL; > } > =20