From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id A27C6A0518; Sun, 2 Aug 2020 22:45:19 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id ECDB01C022; Sun, 2 Aug 2020 22:45:18 +0200 (CEST) Received: from mail-wr1-f67.google.com (mail-wr1-f67.google.com [209.85.221.67]) by dpdk.org (Postfix) with ESMTP id 3D5681C021 for ; Sun, 2 Aug 2020 22:45:17 +0200 (CEST) Received: by mail-wr1-f67.google.com with SMTP id f1so31815535wro.2 for ; Sun, 02 Aug 2020 13:45:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=6wind.com; s=google; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:content-transfer-encoding:in-reply-to :user-agent; bh=TaL16UUwml+qhkbPBCqDUMs/53x3Mdz9ZwpMSpbfPU8=; b=Ng11jbjEWJR8CN9q/FgeMbEH/wXEyOUlHauAOJa20NsCn5dgZXtHdfkOshSRx3yId2 Sr9zpknlwbwZOQ69saf+AqRlIzlw9Djl/OCuKL/MU+R00ub0M6krww3RGWT3DWZiXYUj ZW58AAiJ3LQDFLLW2/RYmeTgja2WYLNivhQgGDXnPuFWjl4o4XHkVmFxlYoOyN5ewS+P F5RkhFOzNLdToNfEWVPmREwFHj6cBghyGhoePeS2L+AavERZAwlrSOSr+OHoUrLL7MuI WvG5n4FM29qw6b92FaP75lMuWxDZSDOGhGWK3zjH6poGZhWq2QYabFzmsn66OqMSRgho 4/EA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:content-transfer-encoding :in-reply-to:user-agent; bh=TaL16UUwml+qhkbPBCqDUMs/53x3Mdz9ZwpMSpbfPU8=; b=n5WnY08o8ygZbKejgd2YMveKnUKaDB1mUDk8wct6q4Nw6+s5DQnXy/I6O36HSXGPqV bZ9yS3UoK7qYMTZ6rdVcbDeMpU3uIrXvsOpYWSoX0LK3vORQgfBEebLfn/sLe7B3oaqF PGfBWKjUt/1rN98jNtWmQ4Xx10NCTrFpLV0a/C+Guc7ucMQGh6Q8nx8ovEm2AabBe+vA 3OXYyvSrq7zNXVbd6+Z7SfpP1BOFiIU6srOIyg/J8RrfuVzBD6R8h6dKGERIusDMhQeI lLHtDtjOHm/OsudxthMwCi0CTjMIGUIjIEPdAD0FREJL2tugQn4wdlM+QDvP4L9sIYg6 ikwA== X-Gm-Message-State: AOAM532XzR/dYC63QNr3Xen8O5+hymiBthNgU0PNZka5dF8qK/RRcnG8 652EzVkSKRhB6Mg3pv563j2DIA== X-Google-Smtp-Source: ABdhPJwL+h9xfw9A4tu4jTe8+xOxlIN0WCq5WUdx98RDI1FkLuwGQBPVmZOIiXqtTZgXTBxkt47b+A== X-Received: by 2002:adf:dcc9:: with SMTP id x9mr13451414wrm.153.1596401116809; Sun, 02 Aug 2020 13:45:16 -0700 (PDT) Received: from 6wind.com (2a01cb0c0005a60013e303434ff938a5.ipv6.abo.wanadoo.fr. [2a01:cb0c:5:a600:13e3:343:4ff9:38a5]) by smtp.gmail.com with ESMTPSA id f16sm21260067wro.34.2020.08.02.13.45.15 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 02 Aug 2020 13:45:16 -0700 (PDT) Date: Sun, 2 Aug 2020 22:45:12 +0200 From: Olivier Matz To: yang_y_yi Cc: dev@dpdk.org, jiayu.hu@intel.com, thomas@monjalon.net, yangyi01@inspur.com Message-ID: <20200802204512.GB23824@arsenic.home> References: <20200730120900.108232-1-yang_y_yi@163.com> <20200730120900.108232-3-yang_y_yi@163.com> <20200731151543.GH5869@platinum> <2a50e80c.44f.173ac4c6d20.Coremail.yang_y_yi@163.com> <20200802202907.GJ5869@platinum> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable In-Reply-To: <20200802202907.GJ5869@platinum> User-Agent: Mutt/1.10.1 (2018-07-13) Subject: Re: [dpdk-dev] [PATCH V1 2/3] mbuf: change free_cb interface to adapt to GSO case X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" On Sun, Aug 02, 2020 at 10:29:07PM +0200, Olivier Matz wrote: > Hi, >=20 > On Sun, Aug 02, 2020 at 07:12:36AM +0800, yang_y_yi wrote: > >=20 > >=20 > > At 2020-07-31 23:15:43, "Olivier Matz" wrote: > > >Hi, > > > > > >On Thu, Jul 30, 2020 at 08:08:59PM +0800, yang_y_yi@163.com wrote: > > >> From: Yi Yang > > >>=20 > > >> In GSO case, segmented mbufs are attached to original > > >> mbuf which can't be freed when it is external. The issue > > >> is free_cb doesn't know original mbuf and doesn't free > > >> it when refcnt of shinfo is 0. > > >>=20 > > >> Original mbuf can be freed by rte_pktmbuf_free segmented > > >> mbufs or by rte_pktmbuf_free original mbuf. Two kind of > > >> cases should have different behaviors. free_cb won't > > >> explicitly call rte_pktmbuf_free to free original mbuf > > >> if it is freed by rte_pktmbuf_free original mbuf, but it > > >> has to call rte_pktmbuf_free to free original mbuf if it > > >> is freed by rte_pktmbuf_free segmented mbufs. > > >>=20 > > >> In order to fix this issue, free_cb interface has to been > > >> changed, __rte_pktmbuf_free_extbuf must deliver called > > >> mbuf pointer to free_cb, argument opaque can be defined > > >> as a custom struct by user, it can includes original mbuf > > >> pointer, user-defined free_cb can compare caller mbuf with > > >> mbuf in opaque struct, free_cb should free original mbuf > > >> if they are not same, this corresponds to rte_pktmbuf_free > > >> segmented mbufs case, otherwise, free_cb won't free original > > >> mbuf because the caller explicitly called rte_pktmbuf_free > > >> to free it. > > >>=20 > > >> Here is pseduo code to show two kind of cases. > > >>=20 > > >> case 1. rte_pktmbuf_free segmented mbufs > > >>=20 > > >> nb_tx =3D rte_gso_segment(original_mbuf, /* original mbuf */ > > >> &gso_ctx, > > >> /* segmented mbuf */ > > >> (struct rte_mbuf **)&gso_mbufs, > > >> MAX_GSO_MBUFS); > > > > > >I'm sorry but it is not very clear to me what operations are done by > > >rte_gso_segment(). > > > > > >In the current code, I only see calls to rte_pktmbuf_attach(), > > >which do not deal with external buffers. Am I missing something? > > > > > >Are you able to show the issue only with mbuf functions? It would > > >be helpful to understand what does not work. > > > > > > > > >Thanks, > > >Olivier > > > > > Oliver, thank you for comment, let me show you why it doesn't work for = my use case. In OVS DPDK, VM uses vhostuserclient to send large packets wh= ose size is about 64K because we enabled TSO & UFO, these large packets use= rte_mbufs allocated by DPDK virtio_net function=20 > > virtio_dev_pktmbuf_alloc() (in file lib/librte_vhost/virtio_net.c. Plea= se refer to [PATCH V1 3/3], I changed free_cb as below, these packets use t= he same allocate function and the same free_cb no matter they are TCP packe= t or UDP packets, in case of VXLAN TSO, most NICs can't support inner UDP f= ragment offload, so OVS DPDK has to do it by software, for UDP case, the or= iginal rte_mbuf only can be freed by segmented rte_mbufs which are output p= ackets of rte_gso_segment, i.e. the original rte_mbuf only can freed by fre= e_cb, you can see, it explicitly called rte_pktmbuf_free(arg->mbuf), the co= ndition statement "if (caller_m !=3D arg->mbuf)" is true for this case, thi= s has no problem, but for TCP case, the original mbuf is delivered to rte_e= th_tx_burst() but not segmented rte_mbufs output by rte_gso_segment, PMD dr= iver will call rte_pktmbuf_free(original_rte_mbuf) but not rte_pktmbuf_free= (segmented_rte_mbufs), the same free_cb will be called, that means original= _rte_mbuf will be freed twice, you know what will happen, this is just the = issue I'm fixing. I bring in caller_m argument, it can help work around thi= s because caller_m is arg->mbuf and the condition statement "if (caller_m != =3D arg->mbuf)" is false, you can't fix it without the change this patch se= ries did. >=20 > I'm sill not sure to get your issue. Please, if you have a simple test > case using only mbufs functions (without virtio, gso, ...), it would be > very helpful because we will be sure that we are talking about the same > thing. In case there is an issue, it can easily become a unit test. >=20 > That said, I looked at vhost mbuf allocation and gso segmentation, and > I found some strange things: >=20 > 1/ In virtio_dev_extbuf_alloc(), and I there are 2 paths to create the > ext mbuf. >=20 > a/ The first one stores the shinfo struct in the mbuf, basically > like this: >=20 > pkt =3D rte_pktmbuf_alloc(mp); > shinfo =3D rte_pktmbuf_mtod(pkt, struct rte_mbuf_ext_shared_info *); > buf =3D rte_malloc(NULL, buf_len, RTE_CACHE_LINE_SIZE); > shinfo->free_cb =3D virtio_dev_extbuf_free; > shinfo->fcb_opaque =3D buf; > rte_mbuf_ext_refcnt_set(shinfo, 1); >=20 > I don't think it is a good idea, because there is no guarantee that > the mbuf won't be freed before the buffer. For instance, doing > this will probably fail: >=20 > pkt2 =3D rte_pktmbuf_alloc(mp); > rte_pktmbuf_attach(pkt2, pkt); > rte_pktmbuf_free(pkt); /* pkt is freed, but it contains shinfo ! */ >=20 > To do this properly, the mbuf refcnt should be increased, and > the mbuf should be freed in the callback. But I don't think it's > worth doing it, given the second path (b/) looks good to me. >=20 > b/ The second path stores the shinfo struct at the end of the > allocated buffer, like this: >=20 > pkt =3D rte_pktmbuf_alloc(mp); > buf_len +=3D sizeof(*shinfo) + sizeof(uintptr_t); > buf_len =3D RTE_ALIGN_CEIL(total_len, sizeof(uintptr_t)); > buf =3D rte_malloc(NULL, buf_len, RTE_CACHE_LINE_SIZE); > shinfo =3D rte_pktmbuf_ext_shinfo_init_helper(buf, &buf_len, > virtio_dev_extbuf_free, buf); >=20 > I think this is correct, because we have the guarantee that shinfo > exists as long as the buffer exists. >=20 > 2/ in rte_gso_segment(), there is a loop like this: >=20 > while (pkt_seg) { > rte_mbuf_refcnt_update(pkt_seg, -1); > pkt_seg =3D pkt_seg->next; > } >=20 > You change it to take in account the refcnt for ext mbufs. >=20 > I may have missed something but I wonder why it is not simply: >=20 > rte_pktmbuf_free(pkt_seg); Here, I mean rte_pktmbuf_free(pkt) (one free() call for the mbuf chain, not one per segment) >=20 > It will decrease the proper refcnt, and free the mbufs if they > are not used. >=20 > Again, sorry if this is not the issue your are referring to, but > in this case I really think that having a simple example code that > shows the issue would help. >=20 > Regards, > Olivier >=20 >=20 >=20 > >=20 > > diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_ne= t.c index e663fd4..3b69cbb 100644 > >=20 > > --- a/lib/librte_vhost/virtio_net.c > >=20 > > +++ b/lib/librte_vhost/virtio_net.c > >=20 > > @@ -2136,10 +2136,20 @@ uint16_t rte_vhost_poll_enqueue_completed(int v= id, uint16_t queue_id, > >=20 > > return NULL; > >=20 > > } > >=20 > > =20 > >=20 > > +struct shinfo_arg { > >=20 > > + void *buf; > >=20 > > + struct rte_mbuf *mbuf; > >=20 > > +}; > >=20 > > + > >=20 > > static void > >=20 > > -virtio_dev_extbuf_free(struct rte_mbuf * caller_m __rte_unused, void *= opaque) > >=20 > > +virtio_dev_extbuf_free(struct rte_mbuf *caller_m, void *opaque) > >=20 > > { > >=20 > > - rte_free(opaque); > >=20 > > + struct shinfo_arg *arg =3D (struct shinfo_arg *)opaque; > >=20 > > + > >=20 > > + rte_free(arg->buf); > >=20 > > + if (caller_m !=3D arg->mbuf) > >=20 > > + rte_pktmbuf_free(arg->mbuf); > >=20 > > + rte_free(arg); > >=20 > > } > >=20 > > =20 > >=20 > > static int > >=20 > > @@ -2172,8 +2182,14 @@ uint16_t rte_vhost_poll_enqueue_completed(int vi= d, uint16_t queue_id, > >=20 > > =20 > >=20 > > /* Initialize shinfo */ > >=20 > > if (shinfo) { > >=20 > > + struct shinfo_arg *arg =3D (struct shinfo= _arg *) > >=20 > > + rte_malloc(NULL, sizeof(s= truct shinfo_arg), > >=20 > > + RTE_CA= CHE_LINE_SIZE); > >=20 > > + > >=20 > > + arg->buf =3D buf; > >=20 > > + arg->mbuf =3D pkt; > >=20 > > shinfo->free_cb =3D virtio_dev_extbuf_f= ree; > >=20 > > - shinfo->fcb_opaque =3D buf; > >=20 > > + shinfo->fcb_opaque =3D arg; > >=20 > > rte_mbuf_ext_refcnt_set(shinfo, 1); > >=20 > > } else { > >=20 > > shinfo =3D rte_pktmbuf_ext_shinfo_init_= helper(buf, &buf_len, > >=20 > > -- > >=20 > > 1.8.3.1 > >=20 > >=20 > >=20 > > /* > > * Allocate a host supported pktmbuf. > > */ > > static __rte_always_inline struct rte_mbuf * > > virtio_dev_pktmbuf_alloc(struct virtio_net *dev, struct rte_mempool *mp, > > uint32_t data_len) > > { > > struct rte_mbuf *pkt =3D rte_pktmbuf_alloc(mp); > >=20 > > if (unlikely(pkt =3D=3D NULL)) { > > VHOST_LOG_DATA(ERR, > > "Failed to allocate memory for mbuf.\n"); > > return NULL; > > } > >=20 > > if (rte_pktmbuf_tailroom(pkt) >=3D data_len) > > return pkt; > >=20 > > /* attach an external buffer if supported */ > > if (dev->extbuf && !virtio_dev_extbuf_alloc(pkt, data_len)) > > return pkt; > >=20 > > /* check if chained buffers are allowed */ > > if (!dev->linearbuf) > > return pkt; > >=20 > > /* Data doesn't fit into the buffer and the host supports > > * only linear buffers > > */ > > rte_pktmbuf_free(pkt); > >=20 > > return NULL; > > } > > =20