From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <dev-bounces@dpdk.org>
Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124])
	by inbox.dpdk.org (Postfix) with ESMTP id 432BD430EE;
	Thu, 24 Aug 2023 10:52:57 +0200 (CEST)
Received: from mails.dpdk.org (localhost [127.0.0.1])
	by mails.dpdk.org (Postfix) with ESMTP id 3E81D41148;
	Thu, 24 Aug 2023 10:52:57 +0200 (CEST)
Received: from mail-qk1-f178.google.com (mail-qk1-f178.google.com
 [209.85.222.178])
 by mails.dpdk.org (Postfix) with ESMTP id 1087D40EE1
 for <dev@dpdk.org>; Thu, 24 Aug 2023 08:51:19 +0200 (CEST)
Received: by mail-qk1-f178.google.com with SMTP id
 af79cd13be357-76d931f48acso353654985a.3
 for <dev@dpdk.org>; Wed, 23 Aug 2023 23:51:18 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=gmail.com; s=20221208; t=1692859878; x=1693464678;
 h=cc:to:subject:message-id:date:from:in-reply-to:references
 :mime-version:from:to:cc:subject:date:message-id:reply-to;
 bh=qm39hm1reEKDEUE2V3zpyQ7lxEp/5+kP1V23OS+If/s=;
 b=VSRo5ndzN7Kp/Tppytuo2yVJRdRO/DoTfouK59bLQd2GDBl0lIdBDNRo9PFWBhAJc2
 SSGR1ZmBw+TnGatmwzWahT84xErhER3mnrA04/852U8+VXmwEfD/prspwG+qnC1y/5xn
 4T+/ZyYWw8INbySKUzVEXTndzdKu16V4CCdueDirlsuD+3/QgZwNoYoewuGLE+ID6eU7
 Gt1BaW4gS5FTEJN2XP0TtvGsgRIQvjPjIEsVUovzfK4bFMyXRoExN4kejNmeZ4PCeWp/
 Psp3Xeys6F9WKt89iVz0CMJzakMZicY0xXH3HDzpbWH/acty9pp6IRmkryVkU3bAFGqg
 PwZA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20221208; t=1692859878; x=1693464678;
 h=cc:to:subject:message-id:date:from:in-reply-to:references
 :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id
 :reply-to;
 bh=qm39hm1reEKDEUE2V3zpyQ7lxEp/5+kP1V23OS+If/s=;
 b=CN3vI9uMrSe5HwTMhRc5OOY1W6f1PESOCWa8eFzVVVEBGgvvwlZGCEzuVqQ5dfZlE7
 maaM29gZZLdIk54+lU3yhTUwnix8+tXb2KE4W99rGid6YeiGL7BN/7i8v99D64EuTy5q
 s6WyIJEaVsWzodKbV5RtxUAMMuBLrdjphY2CwJgYUby/sYhl8TjGvTpzXUNa1vCQoB+c
 1CfAIO9RdHMvPtS2Q0Kh+nRe319++qHne+STlr9gg7sUwJ4bxrRb9kqIz6gGUOrdM/5E
 /x1ONK2/HmbHX5YoHEBHUAsJFOySvxtqKxoccRE8deHBow9DwyG82Ccs0MPmKwbjQmTh
 +JAw==
X-Gm-Message-State: AOJu0Yz21GWFFP/RvdxVR2yLoKMHsRm3P4i9TDUgcFbU4enq78WGy+R8
 aNHRkhTL0hSiEBn0fvPnBe+tKjE81LygsA0KMJI=
X-Google-Smtp-Source: AGHT+IF0Tb11aWQVpE3xEL0VwBaOiF5YKlsmHdQxB3HZe7AAsVditG93F7EMP1IkNHE5ytLaUhQjLicwVrhXrTiJ148=
X-Received: by 2002:a05:620a:1a1c:b0:76d:921d:52d2 with SMTP id
 bk28-20020a05620a1a1c00b0076d921d52d2mr16866673qkb.10.1692859878237; Wed, 23
 Aug 2023 23:51:18 -0700 (PDT)
MIME-Version: 1.0
References: <20230810160941.3895855-1-jhascoet@kalray.eu>
 <20230822063453.97904-1-jhascoet@kalray.eu>
 <ZORzEDFxEgYsF3R5@platinum>
In-Reply-To: <ZORzEDFxEgYsF3R5@platinum>
From: Hascoet Julien <ju.hascoet@gmail.com>
Date: Thu, 24 Aug 2023 08:51:07 +0200
Message-ID: <CAOkfUEhNnXfS77R_E-AVAHe+vDPYvG5OMfhWbHgCzDank+udTQ@mail.gmail.com>
Subject: Re: [PATCH] app: fix silent enqueue fail in test_mbuf test_refcnt_iter
To: Olivier Matz <olivier.matz@6wind.com>
Cc: dev@dpdk.org, David Marchand <david.marchand@redhat.com>
Content-Type: multipart/alternative; boundary="000000000000b269e50603a5a787"
X-Mailman-Approved-At: Thu, 24 Aug 2023 10:52:56 +0200
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
Errors-To: dev-bounces@dpdk.org

--000000000000b269e50603a5a787
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Hello Oliver,

thanks, your response helped a lot, I managed to find the root cause of the
instability which is on our side.
It was due to other internal developments.
I'll still add an error check on the enqueue ops to catch eventual problems
earlier, if that suits you.

Best regards,

Julien

On Tue, Aug 22, 2023 at 10:34=E2=80=AFAM Olivier Matz <olivier.matz@6wind.c=
om>
wrote:

> Hello Julien,
>
> On Tue, Aug 22, 2023 at 08:34:53AM +0200, jhascoet wrote:
> > From: Julien Hascoet <ju.hascoet@gmail.com>
> >
> > In case of ring full state, we retry the enqueue
> > operation in order to avoid mbuf loss.
> >
> > Fixes: af75078fece ("first public release")
> >
> > Signed-off-by: Julien Hascoet <ju.hascoet@gmail.com>
> > ---
> >  app/test/test_mbuf.c | 15 ++++++++++++---
> >  1 file changed, 12 insertions(+), 3 deletions(-)
> >
> > diff --git a/app/test/test_mbuf.c b/app/test/test_mbuf.c
> > index efac01806b..ad18bf6378 100644
> > --- a/app/test/test_mbuf.c
> > +++ b/app/test/test_mbuf.c
> > @@ -1033,12 +1033,21 @@ test_refcnt_iter(unsigned int lcore, unsigned
> int iter,
> >               tref +=3D ref;
> >               if ((ref & 1) !=3D 0) {
> >                       rte_pktmbuf_refcnt_update(m, ref);
> > -                     while (ref-- !=3D 0)
> > -                             rte_ring_enqueue(refcnt_mbuf_ring, m);
> > +                     while (ref-- !=3D 0) {
> > +                             /* retry in case of failure */
> > +                             while (rte_ring_enqueue(refcnt_mbuf_ring,
> m) !=3D 0) {
> > +                                     /* let others consume */
> > +                                     rte_pause();
> > +                             }
> > +                     }
> >               } else {
> >                       while (ref-- !=3D 0) {
> >                               rte_pktmbuf_refcnt_update(m, 1);
> > -                             rte_ring_enqueue(refcnt_mbuf_ring, m);
> > +                             /* retry in case of failure */
> > +                             while (rte_ring_enqueue(refcnt_mbuf_ring,
> m) !=3D 0) {
> > +                                     /* let others consume */
> > +                                     rte_pause();
> > +                             }
> >                       }
> >               }
> >               rte_pktmbuf_free(m);
> > --
> > 2.34.1
> >
>
> Can you give some more details about how to reproduce the issue?
>
> From what I see, the code does the following:
>
> main core:
>   create a ring with at least (REFCNT_MBUF_NUM * REFCNT_MAX_REF) entries
>   create an mbuf pool with REFCNT_MBUF_NUM entries
>   start worker cores
>   do REFCNT_MAX_ITER times:
>     for each mbuf of the pool (REFCNT_MBUF_NUM entries):
>       let r be a random number between 1 and REFCNT_MAX_REF
>       increase mbuf references by r, and enqueue r times in the ring
>     wait that the ring is empty (since worker cores are dequeuing mbufs)
>   stop worker cores
>
> worker cores:
>   dequeue packets from the ring and free them until asked to stop
>
>
> I may be mistaking but I don't see how the number of mbufs in ring could
> exceed REFCNT_MBUF_NUM * REFCNT_MAX_REF.
>
> Regards,
> Olivier
>
>
> Note: removing CC maintainers@dpdk.org
>

--000000000000b269e50603a5a787
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr"><div>Hello Oliver,</div><div><br></div><div>thanks, your r=
esponse helped a lot, I managed to find the root cause of the instability w=
hich is on our side.</div><div>It was due to other  internal developments.<=
/div><div>I&#39;ll still add an error check on the enqueue ops to catch eve=
ntual problems earlier, if that suits you.</div><div><br></div><div>Best re=
gards,</div><div><br></div><div>Julien<br></div></div><br><div class=3D"gma=
il_quote"><div dir=3D"ltr" class=3D"gmail_attr">On Tue, Aug 22, 2023 at 10:=
34=E2=80=AFAM Olivier Matz &lt;<a href=3D"mailto:olivier.matz@6wind.com">ol=
ivier.matz@6wind.com</a>&gt; wrote:<br></div><blockquote class=3D"gmail_quo=
te" style=3D"margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204=
);padding-left:1ex">Hello Julien,<br>
<br>
On Tue, Aug 22, 2023 at 08:34:53AM +0200, jhascoet wrote:<br>
&gt; From: Julien Hascoet &lt;<a href=3D"mailto:ju.hascoet@gmail.com" targe=
t=3D"_blank">ju.hascoet@gmail.com</a>&gt;<br>
&gt; <br>
&gt; In case of ring full state, we retry the enqueue<br>
&gt; operation in order to avoid mbuf loss.<br>
&gt; <br>
&gt; Fixes: af75078fece (&quot;first public release&quot;)<br>
&gt; <br>
&gt; Signed-off-by: Julien Hascoet &lt;<a href=3D"mailto:ju.hascoet@gmail.c=
om" target=3D"_blank">ju.hascoet@gmail.com</a>&gt;<br>
&gt; ---<br>
&gt;=C2=A0 app/test/test_mbuf.c | 15 ++++++++++++---<br>
&gt;=C2=A0 1 file changed, 12 insertions(+), 3 deletions(-)<br>
&gt; <br>
&gt; diff --git a/app/test/test_mbuf.c b/app/test/test_mbuf.c<br>
&gt; index efac01806b..ad18bf6378 100644<br>
&gt; --- a/app/test/test_mbuf.c<br>
&gt; +++ b/app/test/test_mbuf.c<br>
&gt; @@ -1033,12 +1033,21 @@ test_refcnt_iter(unsigned int lcore, unsigned =
int iter,<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0tref +=3D ref;<b=
r>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0if ((ref &amp; 1=
) !=3D 0) {<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0rte_pktmbuf_refcnt_update(m, ref);<br>
&gt; -=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=
 =C2=A0while (ref-- !=3D 0)<br>
&gt; -=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=
 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0rte_ring_enqueue(refcnt_mbuf_ring, m);<b=
r>
&gt; +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=
 =C2=A0while (ref-- !=3D 0) {<br>
&gt; +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=
 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0/* retry in case of failure */<br>
&gt; +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=
 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0while (rte_ring_enqueue(refcnt_mbuf_ring=
, m) !=3D 0) {<br>
&gt; +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=
 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0/* let other=
s consume */<br>
&gt; +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=
 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0rte_pause();=
<br>
&gt; +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=
 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0}<br>
&gt; +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=
 =C2=A0}<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0} else {<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0while (ref-- !=3D 0) {<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0rte_pktmbuf_refcnt_update(m, 1);<b=
r>
&gt; -=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=
 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0rte_ring_enqueue(refcnt_mbuf_ring, m);<b=
r>
&gt; +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=
 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0/* retry in case of failure */<br>
&gt; +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=
 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0while (rte_ring_enqueue(refcnt_mbuf_ring=
, m) !=3D 0) {<br>
&gt; +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=
 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0/* let other=
s consume */<br>
&gt; +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=
 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0rte_pause();=
<br>
&gt; +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=
 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0}<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0}<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0}<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0rte_pktmbuf_free=
(m);<br>
&gt; -- <br>
&gt; 2.34.1<br>
&gt; <br>
<br>
Can you give some more details about how to reproduce the issue?<br>
<br>
>From what I see, the code does the following:<br>
<br>
main core:<br>
=C2=A0 create a ring with at least (REFCNT_MBUF_NUM * REFCNT_MAX_REF) entri=
es<br>
=C2=A0 create an mbuf pool with REFCNT_MBUF_NUM entries<br>
=C2=A0 start worker cores<br>
=C2=A0 do REFCNT_MAX_ITER times:<br>
=C2=A0 =C2=A0 for each mbuf of the pool (REFCNT_MBUF_NUM entries):<br>
=C2=A0 =C2=A0 =C2=A0 let r be a random number between 1 and REFCNT_MAX_REF<=
br>
=C2=A0 =C2=A0 =C2=A0 increase mbuf references by r, and enqueue r times in =
the ring<br>
=C2=A0 =C2=A0 wait that the ring is empty (since worker cores are dequeuing=
 mbufs)<br>
=C2=A0 stop worker cores<br>
<br>
worker cores:<br>
=C2=A0 dequeue packets from the ring and free them until asked to stop<br>
<br>
<br>
I may be mistaking but I don&#39;t see how the number of mbufs in ring coul=
d<br>
exceed REFCNT_MBUF_NUM * REFCNT_MAX_REF.<br>
<br>
Regards,<br>
Olivier<br>
<br>
<br>
Note: removing CC <a href=3D"mailto:maintainers@dpdk.org" target=3D"_blank"=
>maintainers@dpdk.org</a><br>
</blockquote></div>

--000000000000b269e50603a5a787--