DPDK patches and discussions
 help / color / mirror / Atom feed
* [dpdk-dev] [PATCH] mbuf: fix reset on mbuf free
@ 2020-11-04 17:00 Olivier Matz
  2020-11-05  0:15 ` Ananyev, Konstantin
                   ` (4 more replies)
  0 siblings, 5 replies; 74+ messages in thread
From: Olivier Matz @ 2020-11-04 17:00 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, stable

m->nb_seg must be reset on mbuf free whatever the value of m->next,
because it can happen that m->nb_seg is != 1. For instance in this
case:

  m1 = rte_pktmbuf_alloc(mp);
  rte_pktmbuf_append(m1, 500);
  m2 = rte_pktmbuf_alloc(mp);
  rte_pktmbuf_append(m2, 500);
  rte_pktmbuf_chain(m1, m2);
  m0 = rte_pktmbuf_alloc(mp);
  rte_pktmbuf_append(m0, 500);
  rte_pktmbuf_chain(m0, m1);

As rte_pktmbuf_chain() does not reset nb_seg in the initial m1
segment (this is not required), after this code the mbuf chain
have 3 segments:
  - m0: next=m1, nb_seg=3
  - m1: next=m2, nb_seg=2
  - m2: next=NULL, nb_seg=1

Freeing this mbuf chain will not restore nb_seg=1 in the second
segment. This is expected that mbufs stored in pool have their
nb_seg field set to 1.

Fixes: 8f094a9ac5d7 ("mbuf: set mbuf fields while in pool")
Cc: stable@dpdk.org

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 lib/librte_mbuf/rte_mbuf.c |  6 ++----
 lib/librte_mbuf/rte_mbuf.h | 12 ++++--------
 2 files changed, 6 insertions(+), 12 deletions(-)

diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index 8a456e5e64..e632071c23 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -129,10 +129,8 @@ rte_pktmbuf_free_pinned_extmem(void *addr, void *opaque)
 
 	rte_mbuf_ext_refcnt_set(m->shinfo, 1);
 	m->ol_flags = EXT_ATTACHED_MBUF;
-	if (m->next != NULL) {
-		m->next = NULL;
-		m->nb_segs = 1;
-	}
+	m->next = NULL;
+	m->nb_segs = 1;
 	rte_mbuf_raw_free(m);
 }
 
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index a1414ed7cd..ef5800c8ef 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -1329,10 +1329,8 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
 				return NULL;
 		}
 
-		if (m->next != NULL) {
-			m->next = NULL;
-			m->nb_segs = 1;
-		}
+		m->next = NULL;
+		m->nb_segs = 1;
 
 		return m;
 
@@ -1346,10 +1344,8 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
 				return NULL;
 		}
 
-		if (m->next != NULL) {
-			m->next = NULL;
-			m->nb_segs = 1;
-		}
+		m->next = NULL;
+		m->nb_segs = 1;
 		rte_mbuf_refcnt_set(m, 1);
 
 		return m;
-- 
2.25.1


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dpdk-dev] [PATCH] mbuf: fix reset on mbuf free
  2020-11-04 17:00 [dpdk-dev] [PATCH] mbuf: fix reset on mbuf free Olivier Matz
@ 2020-11-05  0:15 ` Ananyev, Konstantin
  2020-11-05  7:46   ` Olivier Matz
  2020-11-08  7:25 ` Ali Alnubani
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 74+ messages in thread
From: Ananyev, Konstantin @ 2020-11-05  0:15 UTC (permalink / raw)
  To: Olivier Matz, dev; +Cc: stable


Hi Olivier,
 
> m->nb_seg must be reset on mbuf free whatever the value of m->next,
> because it can happen that m->nb_seg is != 1. For instance in this
> case:
> 
>   m1 = rte_pktmbuf_alloc(mp);
>   rte_pktmbuf_append(m1, 500);
>   m2 = rte_pktmbuf_alloc(mp);
>   rte_pktmbuf_append(m2, 500);
>   rte_pktmbuf_chain(m1, m2);
>   m0 = rte_pktmbuf_alloc(mp);
>   rte_pktmbuf_append(m0, 500);
>   rte_pktmbuf_chain(m0, m1);
> 
> As rte_pktmbuf_chain() does not reset nb_seg in the initial m1
> segment (this is not required), after this code the mbuf chain
> have 3 segments:
>   - m0: next=m1, nb_seg=3
>   - m1: next=m2, nb_seg=2
>   - m2: next=NULL, nb_seg=1
> 
> Freeing this mbuf chain will not restore nb_seg=1 in the second
> segment. 

Hmm, not sure why is that?
You are talking about freeing m1, right?
rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
{
	...
	if (m->next != NULL) {
                        m->next = NULL;
                        m->nb_segs = 1;
                }

m1->next != NULL, so it will enter the if() block,
and will reset both next and nb_segs.
What I am missing here? 
Thinking in more generic way, that change:
 -		if (m->next != NULL) {
 -			m->next = NULL;
 -			m->nb_segs = 1;
 -		}
 +		m->next = NULL;
 +		m->nb_segs = 1;

Assumes that it is ok to have an mbuf with
nb_seg > 1 and next == NULL.
Which seems wrong to me.
Konstantin


>This is expected that mbufs stored in pool have their
> nb_seg field set to 1.
> 
> Fixes: 8f094a9ac5d7 ("mbuf: set mbuf fields while in pool")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> ---
>  lib/librte_mbuf/rte_mbuf.c |  6 ++----
>  lib/librte_mbuf/rte_mbuf.h | 12 ++++--------
>  2 files changed, 6 insertions(+), 12 deletions(-)
> 
> diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
> index 8a456e5e64..e632071c23 100644
> --- a/lib/librte_mbuf/rte_mbuf.c
> +++ b/lib/librte_mbuf/rte_mbuf.c
> @@ -129,10 +129,8 @@ rte_pktmbuf_free_pinned_extmem(void *addr, void *opaque)
> 
>  	rte_mbuf_ext_refcnt_set(m->shinfo, 1);
>  	m->ol_flags = EXT_ATTACHED_MBUF;
> -	if (m->next != NULL) {
> -		m->next = NULL;
> -		m->nb_segs = 1;
> -	}
> +	m->next = NULL;
> +	m->nb_segs = 1;
>  	rte_mbuf_raw_free(m);
>  }
> 
> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> index a1414ed7cd..ef5800c8ef 100644
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -1329,10 +1329,8 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
>  				return NULL;
>  		}
> 
> -		if (m->next != NULL) {
> -			m->next = NULL;
> -			m->nb_segs = 1;
> -		}
> +		m->next = NULL;
> +		m->nb_segs = 1;
> 
>  		return m;
> 
> @@ -1346,10 +1344,8 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
>  				return NULL;
>  		}
> 
> -		if (m->next != NULL) {
> -			m->next = NULL;
> -			m->nb_segs = 1;
> -		}
> +		m->next = NULL;
> +		m->nb_segs = 1;
>  		rte_mbuf_refcnt_set(m, 1);
> 
>  		return m;
> --
> 2.25.1


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dpdk-dev] [PATCH] mbuf: fix reset on mbuf free
  2020-11-05  0:15 ` Ananyev, Konstantin
@ 2020-11-05  7:46   ` Olivier Matz
  2020-11-05  8:26     ` Andrew Rybchenko
                       ` (2 more replies)
  0 siblings, 3 replies; 74+ messages in thread
From: Olivier Matz @ 2020-11-05  7:46 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: dev, stable

On Thu, Nov 05, 2020 at 12:15:49AM +0000, Ananyev, Konstantin wrote:
> 
> Hi Olivier,
>  
> > m->nb_seg must be reset on mbuf free whatever the value of m->next,
> > because it can happen that m->nb_seg is != 1. For instance in this
> > case:
> > 
> >   m1 = rte_pktmbuf_alloc(mp);
> >   rte_pktmbuf_append(m1, 500);
> >   m2 = rte_pktmbuf_alloc(mp);
> >   rte_pktmbuf_append(m2, 500);
> >   rte_pktmbuf_chain(m1, m2);
> >   m0 = rte_pktmbuf_alloc(mp);
> >   rte_pktmbuf_append(m0, 500);
> >   rte_pktmbuf_chain(m0, m1);
> > 
> > As rte_pktmbuf_chain() does not reset nb_seg in the initial m1
> > segment (this is not required), after this code the mbuf chain
> > have 3 segments:
> >   - m0: next=m1, nb_seg=3
> >   - m1: next=m2, nb_seg=2
> >   - m2: next=NULL, nb_seg=1
> > 
> > Freeing this mbuf chain will not restore nb_seg=1 in the second
> > segment. 
> 
> Hmm, not sure why is that?
> You are talking about freeing m1, right?
> rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> {
> 	...
> 	if (m->next != NULL) {
>                         m->next = NULL;
>                         m->nb_segs = 1;
>                 }
> 
> m1->next != NULL, so it will enter the if() block,
> and will reset both next and nb_segs.
> What I am missing here? 
> Thinking in more generic way, that change:
>  -		if (m->next != NULL) {
>  -			m->next = NULL;
>  -			m->nb_segs = 1;
>  -		}
>  +		m->next = NULL;
>  +		m->nb_segs = 1;

Ah, sorry. I oversimplified the example and now it does not
show the issue...

The full example also adds a split() to break the mbuf chain
between m1 and m2. The kind of thing that would be done for
software TCP segmentation.

After this operation, we have 2 mbuf chain:
 - m0 with 2 segments, the last one has next=NULL but nb_seg=2
 - new_m with 1 segment

Freeing m0 will not restore nb_seg=1 in the second segment.

> Assumes that it is ok to have an mbuf with
> nb_seg > 1 and next == NULL.
> Which seems wrong to me.

I don't think it is wrong: nb_seg is just ignored when not in the first
segment, and there is nothing saying it should be set to 1. Typically,
rte_pktmbuf_chain() does not change it, and I guess it's the same for
many similar functions in applications.

Olivier

> 
> 
> >This is expected that mbufs stored in pool have their
> > nb_seg field set to 1.
> > 
> > Fixes: 8f094a9ac5d7 ("mbuf: set mbuf fields while in pool")
> > Cc: stable@dpdk.org
> > 
> > Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> > ---
> >  lib/librte_mbuf/rte_mbuf.c |  6 ++----
> >  lib/librte_mbuf/rte_mbuf.h | 12 ++++--------
> >  2 files changed, 6 insertions(+), 12 deletions(-)
> > 
> > diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
> > index 8a456e5e64..e632071c23 100644
> > --- a/lib/librte_mbuf/rte_mbuf.c
> > +++ b/lib/librte_mbuf/rte_mbuf.c
> > @@ -129,10 +129,8 @@ rte_pktmbuf_free_pinned_extmem(void *addr, void *opaque)
> > 
> >  	rte_mbuf_ext_refcnt_set(m->shinfo, 1);
> >  	m->ol_flags = EXT_ATTACHED_MBUF;
> > -	if (m->next != NULL) {
> > -		m->next = NULL;
> > -		m->nb_segs = 1;
> > -	}
> > +	m->next = NULL;
> > +	m->nb_segs = 1;
> >  	rte_mbuf_raw_free(m);
> >  }
> > 
> > diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> > index a1414ed7cd..ef5800c8ef 100644
> > --- a/lib/librte_mbuf/rte_mbuf.h
> > +++ b/lib/librte_mbuf/rte_mbuf.h
> > @@ -1329,10 +1329,8 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> >  				return NULL;
> >  		}
> > 
> > -		if (m->next != NULL) {
> > -			m->next = NULL;
> > -			m->nb_segs = 1;
> > -		}
> > +		m->next = NULL;
> > +		m->nb_segs = 1;
> > 
> >  		return m;
> > 
> > @@ -1346,10 +1344,8 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> >  				return NULL;
> >  		}
> > 
> > -		if (m->next != NULL) {
> > -			m->next = NULL;
> > -			m->nb_segs = 1;
> > -		}
> > +		m->next = NULL;
> > +		m->nb_segs = 1;
> >  		rte_mbuf_refcnt_set(m, 1);
> > 
> >  		return m;
> > --
> > 2.25.1
> 

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dpdk-dev] [PATCH] mbuf: fix reset on mbuf free
  2020-11-05  7:46   ` Olivier Matz
@ 2020-11-05  8:26     ` Andrew Rybchenko
  2020-11-05  9:10       ` Olivier Matz
  2020-11-05  8:33     ` Morten Brørup
  2020-11-05  9:09     ` Andrew Rybchenko
  2 siblings, 1 reply; 74+ messages in thread
From: Andrew Rybchenko @ 2020-11-05  8:26 UTC (permalink / raw)
  To: dev

On 11/5/20 10:46 AM, Olivier Matz wrote:
> On Thu, Nov 05, 2020 at 12:15:49AM +0000, Ananyev, Konstantin wrote:
>>
>> Hi Olivier,
>>  
>>> m->nb_seg must be reset on mbuf free whatever the value of m->next,
>>> because it can happen that m->nb_seg is != 1. For instance in this
>>> case:
>>>
>>>   m1 = rte_pktmbuf_alloc(mp);
>>>   rte_pktmbuf_append(m1, 500);
>>>   m2 = rte_pktmbuf_alloc(mp);
>>>   rte_pktmbuf_append(m2, 500);
>>>   rte_pktmbuf_chain(m1, m2);
>>>   m0 = rte_pktmbuf_alloc(mp);
>>>   rte_pktmbuf_append(m0, 500);
>>>   rte_pktmbuf_chain(m0, m1);
>>>
>>> As rte_pktmbuf_chain() does not reset nb_seg in the initial m1
>>> segment (this is not required), after this code the mbuf chain
>>> have 3 segments:
>>>   - m0: next=m1, nb_seg=3
>>>   - m1: next=m2, nb_seg=2
>>>   - m2: next=NULL, nb_seg=1
>>>
>>> Freeing this mbuf chain will not restore nb_seg=1 in the second
>>> segment. 
>>
>> Hmm, not sure why is that?
>> You are talking about freeing m1, right?
>> rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
>> {
>> 	...
>> 	if (m->next != NULL) {
>>                         m->next = NULL;
>>                         m->nb_segs = 1;
>>                 }
>>
>> m1->next != NULL, so it will enter the if() block,
>> and will reset both next and nb_segs.
>> What I am missing here? 
>> Thinking in more generic way, that change:
>>  -		if (m->next != NULL) {
>>  -			m->next = NULL;
>>  -			m->nb_segs = 1;
>>  -		}
>>  +		m->next = NULL;
>>  +		m->nb_segs = 1;
> 
> Ah, sorry. I oversimplified the example and now it does not
> show the issue...
> 
> The full example also adds a split() to break the mbuf chain
> between m1 and m2. The kind of thing that would be done for
> software TCP segmentation.
> 

If so, may be the right solution is to care about nb_segs
when next is set to NULL on split? Any place when next is set
to NULL. Just to keep the optimization in a more generic place.

> After this operation, we have 2 mbuf chain:
>  - m0 with 2 segments, the last one has next=NULL but nb_seg=2
>  - new_m with 1 segment
> 
> Freeing m0 will not restore nb_seg=1 in the second segment.
> 
>> Assumes that it is ok to have an mbuf with
>> nb_seg > 1 and next == NULL.
>> Which seems wrong to me.
> 
> I don't think it is wrong: nb_seg is just ignored when not in the first
> segment, and there is nothing saying it should be set to 1. Typically,
> rte_pktmbuf_chain() does not change it, and I guess it's the same for
> many similar functions in applications.
> 
> Olivier
> 
>>
>>
>>> This is expected that mbufs stored in pool have their
>>> nb_seg field set to 1.
>>>
>>> Fixes: 8f094a9ac5d7 ("mbuf: set mbuf fields while in pool")
>>> Cc: stable@dpdk.org
>>>
>>> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
>>> ---
>>>  lib/librte_mbuf/rte_mbuf.c |  6 ++----
>>>  lib/librte_mbuf/rte_mbuf.h | 12 ++++--------
>>>  2 files changed, 6 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
>>> index 8a456e5e64..e632071c23 100644
>>> --- a/lib/librte_mbuf/rte_mbuf.c
>>> +++ b/lib/librte_mbuf/rte_mbuf.c
>>> @@ -129,10 +129,8 @@ rte_pktmbuf_free_pinned_extmem(void *addr, void *opaque)
>>>
>>>  	rte_mbuf_ext_refcnt_set(m->shinfo, 1);
>>>  	m->ol_flags = EXT_ATTACHED_MBUF;
>>> -	if (m->next != NULL) {
>>> -		m->next = NULL;
>>> -		m->nb_segs = 1;
>>> -	}
>>> +	m->next = NULL;
>>> +	m->nb_segs = 1;
>>>  	rte_mbuf_raw_free(m);
>>>  }
>>>
>>> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
>>> index a1414ed7cd..ef5800c8ef 100644
>>> --- a/lib/librte_mbuf/rte_mbuf.h
>>> +++ b/lib/librte_mbuf/rte_mbuf.h
>>> @@ -1329,10 +1329,8 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
>>>  				return NULL;
>>>  		}
>>>
>>> -		if (m->next != NULL) {
>>> -			m->next = NULL;
>>> -			m->nb_segs = 1;
>>> -		}
>>> +		m->next = NULL;
>>> +		m->nb_segs = 1;
>>>
>>>  		return m;
>>>
>>> @@ -1346,10 +1344,8 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
>>>  				return NULL;
>>>  		}
>>>
>>> -		if (m->next != NULL) {
>>> -			m->next = NULL;
>>> -			m->nb_segs = 1;
>>> -		}
>>> +		m->next = NULL;
>>> +		m->nb_segs = 1;
>>>  		rte_mbuf_refcnt_set(m, 1);
>>>
>>>  		return m;
>>> --
>>> 2.25.1
>>


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dpdk-dev] [PATCH] mbuf: fix reset on mbuf free
  2020-11-05  7:46   ` Olivier Matz
  2020-11-05  8:26     ` Andrew Rybchenko
@ 2020-11-05  8:33     ` Morten Brørup
  2020-11-05  9:03       ` Olivier Matz
  2020-11-05  9:09     ` Andrew Rybchenko
  2 siblings, 1 reply; 74+ messages in thread
From: Morten Brørup @ 2020-11-05  8:33 UTC (permalink / raw)
  To: Olivier Matz, Ananyev, Konstantin; +Cc: dev, stable

> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier Matz
> Sent: Thursday, November 5, 2020 8:46 AM
> 
> On Thu, Nov 05, 2020 at 12:15:49AM +0000, Ananyev, Konstantin wrote:
> >
> > Hi Olivier,
> >
> > > m->nb_seg must be reset on mbuf free whatever the value of m->next,
> > > because it can happen that m->nb_seg is != 1. For instance in this
> > > case:
> > >
> > >   m1 = rte_pktmbuf_alloc(mp);
> > >   rte_pktmbuf_append(m1, 500);
> > >   m2 = rte_pktmbuf_alloc(mp);
> > >   rte_pktmbuf_append(m2, 500);
> > >   rte_pktmbuf_chain(m1, m2);
> > >   m0 = rte_pktmbuf_alloc(mp);
> > >   rte_pktmbuf_append(m0, 500);
> > >   rte_pktmbuf_chain(m0, m1);
> > >
> > > As rte_pktmbuf_chain() does not reset nb_seg in the initial m1
> > > segment (this is not required), after this code the mbuf chain
> > > have 3 segments:
> > >   - m0: next=m1, nb_seg=3
> > >   - m1: next=m2, nb_seg=2
> > >   - m2: next=NULL, nb_seg=1
> > >
> > > Freeing this mbuf chain will not restore nb_seg=1 in the second
> > > segment.
> >
> > Hmm, not sure why is that?
> > You are talking about freeing m1, right?
> > rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > {
> > 	...
> > 	if (m->next != NULL) {
> >                         m->next = NULL;
> >                         m->nb_segs = 1;
> >                 }
> >
> > m1->next != NULL, so it will enter the if() block,
> > and will reset both next and nb_segs.
> > What I am missing here?
> > Thinking in more generic way, that change:
> >  -		if (m->next != NULL) {
> >  -			m->next = NULL;
> >  -			m->nb_segs = 1;
> >  -		}
> >  +		m->next = NULL;
> >  +		m->nb_segs = 1;
> 
> Ah, sorry. I oversimplified the example and now it does not
> show the issue...
> 
> The full example also adds a split() to break the mbuf chain
> between m1 and m2. The kind of thing that would be done for
> software TCP segmentation.
> 
> After this operation, we have 2 mbuf chain:
>  - m0 with 2 segments, the last one has next=NULL but nb_seg=2
>  - new_m with 1 segment
> 
> Freeing m0 will not restore nb_seg=1 in the second segment.
> 
> > Assumes that it is ok to have an mbuf with
> > nb_seg > 1 and next == NULL.
> > Which seems wrong to me.
> 
> I don't think it is wrong: nb_seg is just ignored when not in the first
> segment, and there is nothing saying it should be set to 1. Typically,
> rte_pktmbuf_chain() does not change it, and I guess it's the same for
> many similar functions in applications.
> 
> Olivier

Acked-by: Morten Brørup <mb@smartsharesystems.com>

And while you are at it, please consider extending the description of the two mbuf fields with their invariants:
1. nb_segs is only valid for the first segment of a multi-segment packet.
2. next is NULL for non-segmented packets.

> 
> >
> >
> > >This is expected that mbufs stored in pool have their
> > > nb_seg field set to 1.
> > >
> > > Fixes: 8f094a9ac5d7 ("mbuf: set mbuf fields while in pool")
> > > Cc: stable@dpdk.org
> > >
> > > Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> > > ---
> > >  lib/librte_mbuf/rte_mbuf.c |  6 ++----
> > >  lib/librte_mbuf/rte_mbuf.h | 12 ++++--------
> > >  2 files changed, 6 insertions(+), 12 deletions(-)
> > >
> > > diff --git a/lib/librte_mbuf/rte_mbuf.c
> b/lib/librte_mbuf/rte_mbuf.c
> > > index 8a456e5e64..e632071c23 100644
> > > --- a/lib/librte_mbuf/rte_mbuf.c
> > > +++ b/lib/librte_mbuf/rte_mbuf.c
> > > @@ -129,10 +129,8 @@ rte_pktmbuf_free_pinned_extmem(void *addr,
> void *opaque)
> > >
> > >  	rte_mbuf_ext_refcnt_set(m->shinfo, 1);
> > >  	m->ol_flags = EXT_ATTACHED_MBUF;
> > > -	if (m->next != NULL) {
> > > -		m->next = NULL;
> > > -		m->nb_segs = 1;
> > > -	}
> > > +	m->next = NULL;
> > > +	m->nb_segs = 1;
> > >  	rte_mbuf_raw_free(m);
> > >  }
> > >
> > > diff --git a/lib/librte_mbuf/rte_mbuf.h
> b/lib/librte_mbuf/rte_mbuf.h
> > > index a1414ed7cd..ef5800c8ef 100644
> > > --- a/lib/librte_mbuf/rte_mbuf.h
> > > +++ b/lib/librte_mbuf/rte_mbuf.h
> > > @@ -1329,10 +1329,8 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > >  				return NULL;
> > >  		}
> > >
> > > -		if (m->next != NULL) {
> > > -			m->next = NULL;
> > > -			m->nb_segs = 1;
> > > -		}
> > > +		m->next = NULL;
> > > +		m->nb_segs = 1;
> > >
> > >  		return m;
> > >
> > > @@ -1346,10 +1344,8 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > >  				return NULL;
> > >  		}
> > >
> > > -		if (m->next != NULL) {
> > > -			m->next = NULL;
> > > -			m->nb_segs = 1;
> > > -		}
> > > +		m->next = NULL;
> > > +		m->nb_segs = 1;
> > >  		rte_mbuf_refcnt_set(m, 1);
> > >
> > >  		return m;
> > > --
> > > 2.25.1
> >


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dpdk-dev] [PATCH] mbuf: fix reset on mbuf free
  2020-11-05  8:33     ` Morten Brørup
@ 2020-11-05  9:03       ` Olivier Matz
  0 siblings, 0 replies; 74+ messages in thread
From: Olivier Matz @ 2020-11-05  9:03 UTC (permalink / raw)
  To: Morten Brørup; +Cc: Ananyev, Konstantin, dev, stable

On Thu, Nov 05, 2020 at 09:33:58AM +0100, Morten Brørup wrote:
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier Matz
> > Sent: Thursday, November 5, 2020 8:46 AM
> > 
> > On Thu, Nov 05, 2020 at 12:15:49AM +0000, Ananyev, Konstantin wrote:
> > >
> > > Hi Olivier,
> > >
> > > > m->nb_seg must be reset on mbuf free whatever the value of m->next,
> > > > because it can happen that m->nb_seg is != 1. For instance in this
> > > > case:
> > > >
> > > >   m1 = rte_pktmbuf_alloc(mp);
> > > >   rte_pktmbuf_append(m1, 500);
> > > >   m2 = rte_pktmbuf_alloc(mp);
> > > >   rte_pktmbuf_append(m2, 500);
> > > >   rte_pktmbuf_chain(m1, m2);
> > > >   m0 = rte_pktmbuf_alloc(mp);
> > > >   rte_pktmbuf_append(m0, 500);
> > > >   rte_pktmbuf_chain(m0, m1);
> > > >
> > > > As rte_pktmbuf_chain() does not reset nb_seg in the initial m1
> > > > segment (this is not required), after this code the mbuf chain
> > > > have 3 segments:
> > > >   - m0: next=m1, nb_seg=3
> > > >   - m1: next=m2, nb_seg=2
> > > >   - m2: next=NULL, nb_seg=1
> > > >
> > > > Freeing this mbuf chain will not restore nb_seg=1 in the second
> > > > segment.
> > >
> > > Hmm, not sure why is that?
> > > You are talking about freeing m1, right?
> > > rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > > {
> > > 	...
> > > 	if (m->next != NULL) {
> > >                         m->next = NULL;
> > >                         m->nb_segs = 1;
> > >                 }
> > >
> > > m1->next != NULL, so it will enter the if() block,
> > > and will reset both next and nb_segs.
> > > What I am missing here?
> > > Thinking in more generic way, that change:
> > >  -		if (m->next != NULL) {
> > >  -			m->next = NULL;
> > >  -			m->nb_segs = 1;
> > >  -		}
> > >  +		m->next = NULL;
> > >  +		m->nb_segs = 1;
> > 
> > Ah, sorry. I oversimplified the example and now it does not
> > show the issue...
> > 
> > The full example also adds a split() to break the mbuf chain
> > between m1 and m2. The kind of thing that would be done for
> > software TCP segmentation.
> > 
> > After this operation, we have 2 mbuf chain:
> >  - m0 with 2 segments, the last one has next=NULL but nb_seg=2
> >  - new_m with 1 segment
> > 
> > Freeing m0 will not restore nb_seg=1 in the second segment.
> > 
> > > Assumes that it is ok to have an mbuf with
> > > nb_seg > 1 and next == NULL.
> > > Which seems wrong to me.
> > 
> > I don't think it is wrong: nb_seg is just ignored when not in the first
> > segment, and there is nothing saying it should be set to 1. Typically,
> > rte_pktmbuf_chain() does not change it, and I guess it's the same for
> > many similar functions in applications.
> > 
> > Olivier
> 
> Acked-by: Morten Brørup <mb@smartsharesystems.com>
> 
> And while you are at it, please consider extending the description of the two mbuf fields with their invariants:
> 1. nb_segs is only valid for the first segment of a multi-segment packet.
> 2. next is NULL for non-segmented packets.

Good point, will add it in v2.

> 
> > 
> > >
> > >
> > > >This is expected that mbufs stored in pool have their
> > > > nb_seg field set to 1.
> > > >
> > > > Fixes: 8f094a9ac5d7 ("mbuf: set mbuf fields while in pool")
> > > > Cc: stable@dpdk.org
> > > >
> > > > Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> > > > ---
> > > >  lib/librte_mbuf/rte_mbuf.c |  6 ++----
> > > >  lib/librte_mbuf/rte_mbuf.h | 12 ++++--------
> > > >  2 files changed, 6 insertions(+), 12 deletions(-)
> > > >
> > > > diff --git a/lib/librte_mbuf/rte_mbuf.c
> > b/lib/librte_mbuf/rte_mbuf.c
> > > > index 8a456e5e64..e632071c23 100644
> > > > --- a/lib/librte_mbuf/rte_mbuf.c
> > > > +++ b/lib/librte_mbuf/rte_mbuf.c
> > > > @@ -129,10 +129,8 @@ rte_pktmbuf_free_pinned_extmem(void *addr,
> > void *opaque)
> > > >
> > > >  	rte_mbuf_ext_refcnt_set(m->shinfo, 1);
> > > >  	m->ol_flags = EXT_ATTACHED_MBUF;
> > > > -	if (m->next != NULL) {
> > > > -		m->next = NULL;
> > > > -		m->nb_segs = 1;
> > > > -	}
> > > > +	m->next = NULL;
> > > > +	m->nb_segs = 1;
> > > >  	rte_mbuf_raw_free(m);
> > > >  }
> > > >
> > > > diff --git a/lib/librte_mbuf/rte_mbuf.h
> > b/lib/librte_mbuf/rte_mbuf.h
> > > > index a1414ed7cd..ef5800c8ef 100644
> > > > --- a/lib/librte_mbuf/rte_mbuf.h
> > > > +++ b/lib/librte_mbuf/rte_mbuf.h
> > > > @@ -1329,10 +1329,8 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > > >  				return NULL;
> > > >  		}
> > > >
> > > > -		if (m->next != NULL) {
> > > > -			m->next = NULL;
> > > > -			m->nb_segs = 1;
> > > > -		}
> > > > +		m->next = NULL;
> > > > +		m->nb_segs = 1;
> > > >
> > > >  		return m;
> > > >
> > > > @@ -1346,10 +1344,8 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > > >  				return NULL;
> > > >  		}
> > > >
> > > > -		if (m->next != NULL) {
> > > > -			m->next = NULL;
> > > > -			m->nb_segs = 1;
> > > > -		}
> > > > +		m->next = NULL;
> > > > +		m->nb_segs = 1;
> > > >  		rte_mbuf_refcnt_set(m, 1);
> > > >
> > > >  		return m;
> > > > --
> > > > 2.25.1
> > >
> 

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dpdk-dev] [PATCH] mbuf: fix reset on mbuf free
  2020-11-05  7:46   ` Olivier Matz
  2020-11-05  8:26     ` Andrew Rybchenko
  2020-11-05  8:33     ` Morten Brørup
@ 2020-11-05  9:09     ` Andrew Rybchenko
  2 siblings, 0 replies; 74+ messages in thread
From: Andrew Rybchenko @ 2020-11-05  9:09 UTC (permalink / raw)
  To: dev; +Cc: Morten Brørup, Ananyev, Konstantin, dpdk stable, Olivier Matz

Just resend with lost Cc restored.

On 11/5/20 10:46 AM, Olivier Matz wrote:
> On Thu, Nov 05, 2020 at 12:15:49AM +0000, Ananyev, Konstantin wrote:
>>
>> Hi Olivier,
>>  
>>> m->nb_seg must be reset on mbuf free whatever the value of m->next,
>>> because it can happen that m->nb_seg is != 1. For instance in this
>>> case:
>>>
>>>   m1 = rte_pktmbuf_alloc(mp);
>>>   rte_pktmbuf_append(m1, 500);
>>>   m2 = rte_pktmbuf_alloc(mp);
>>>   rte_pktmbuf_append(m2, 500);
>>>   rte_pktmbuf_chain(m1, m2);
>>>   m0 = rte_pktmbuf_alloc(mp);
>>>   rte_pktmbuf_append(m0, 500);
>>>   rte_pktmbuf_chain(m0, m1);
>>>
>>> As rte_pktmbuf_chain() does not reset nb_seg in the initial m1
>>> segment (this is not required), after this code the mbuf chain
>>> have 3 segments:
>>>   - m0: next=m1, nb_seg=3
>>>   - m1: next=m2, nb_seg=2
>>>   - m2: next=NULL, nb_seg=1
>>>
>>> Freeing this mbuf chain will not restore nb_seg=1 in the second
>>> segment. 
>>
>> Hmm, not sure why is that?
>> You are talking about freeing m1, right?
>> rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
>> {
>> 	...
>> 	if (m->next != NULL) {
>>                         m->next = NULL;
>>                         m->nb_segs = 1;
>>                 }
>>
>> m1->next != NULL, so it will enter the if() block,
>> and will reset both next and nb_segs.
>> What I am missing here? 
>> Thinking in more generic way, that change:
>>  -		if (m->next != NULL) {
>>  -			m->next = NULL;
>>  -			m->nb_segs = 1;
>>  -		}
>>  +		m->next = NULL;
>>  +		m->nb_segs = 1;
> 
> Ah, sorry. I oversimplified the example and now it does not
> show the issue...
> 
> The full example also adds a split() to break the mbuf chain
> between m1 and m2. The kind of thing that would be done for
> software TCP segmentation.
> 

If so, may be the right solution is to care about nb_segs
when next is set to NULL on split? Any place when next is set
to NULL. Just to keep the optimization in a more generic place.

> After this operation, we have 2 mbuf chain:
>  - m0 with 2 segments, the last one has next=NULL but nb_seg=2
>  - new_m with 1 segment
> 
> Freeing m0 will not restore nb_seg=1 in the second segment.
> 
>> Assumes that it is ok to have an mbuf with
>> nb_seg > 1 and next == NULL.
>> Which seems wrong to me.
> 
> I don't think it is wrong: nb_seg is just ignored when not in the first
> segment, and there is nothing saying it should be set to 1. Typically,
> rte_pktmbuf_chain() does not change it, and I guess it's the same for
> many similar functions in applications.
> 
> Olivier
> 
>>
>>
>>> This is expected that mbufs stored in pool have their
>>> nb_seg field set to 1.
>>>
>>> Fixes: 8f094a9ac5d7 ("mbuf: set mbuf fields while in pool")
>>> Cc: stable@dpdk.org
>>>
>>> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
>>> ---
>>>  lib/librte_mbuf/rte_mbuf.c |  6 ++----
>>>  lib/librte_mbuf/rte_mbuf.h | 12 ++++--------
>>>  2 files changed, 6 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
>>> index 8a456e5e64..e632071c23 100644
>>> --- a/lib/librte_mbuf/rte_mbuf.c
>>> +++ b/lib/librte_mbuf/rte_mbuf.c
>>> @@ -129,10 +129,8 @@ rte_pktmbuf_free_pinned_extmem(void *addr, void *opaque)
>>>
>>>  	rte_mbuf_ext_refcnt_set(m->shinfo, 1);
>>>  	m->ol_flags = EXT_ATTACHED_MBUF;
>>> -	if (m->next != NULL) {
>>> -		m->next = NULL;
>>> -		m->nb_segs = 1;
>>> -	}
>>> +	m->next = NULL;
>>> +	m->nb_segs = 1;
>>>  	rte_mbuf_raw_free(m);
>>>  }
>>>
>>> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
>>> index a1414ed7cd..ef5800c8ef 100644
>>> --- a/lib/librte_mbuf/rte_mbuf.h
>>> +++ b/lib/librte_mbuf/rte_mbuf.h
>>> @@ -1329,10 +1329,8 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
>>>  				return NULL;
>>>  		}
>>>
>>> -		if (m->next != NULL) {
>>> -			m->next = NULL;
>>> -			m->nb_segs = 1;
>>> -		}
>>> +		m->next = NULL;
>>> +		m->nb_segs = 1;
>>>
>>>  		return m;
>>>
>>> @@ -1346,10 +1344,8 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
>>>  				return NULL;
>>>  		}
>>>
>>> -		if (m->next != NULL) {
>>> -			m->next = NULL;
>>> -			m->nb_segs = 1;
>>> -		}
>>> +		m->next = NULL;
>>> +		m->nb_segs = 1;
>>>  		rte_mbuf_refcnt_set(m, 1);
>>>
>>>  		return m;
>>> --
>>> 2.25.1
>>


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dpdk-dev] [PATCH] mbuf: fix reset on mbuf free
  2020-11-05  8:26     ` Andrew Rybchenko
@ 2020-11-05  9:10       ` Olivier Matz
  2020-11-05 11:34         ` Ananyev, Konstantin
  0 siblings, 1 reply; 74+ messages in thread
From: Olivier Matz @ 2020-11-05  9:10 UTC (permalink / raw)
  To: Andrew Rybchenko; +Cc: dev

On Thu, Nov 05, 2020 at 11:26:51AM +0300, Andrew Rybchenko wrote:
> On 11/5/20 10:46 AM, Olivier Matz wrote:
> > On Thu, Nov 05, 2020 at 12:15:49AM +0000, Ananyev, Konstantin wrote:
> >>
> >> Hi Olivier,
> >>  
> >>> m->nb_seg must be reset on mbuf free whatever the value of m->next,
> >>> because it can happen that m->nb_seg is != 1. For instance in this
> >>> case:
> >>>
> >>>   m1 = rte_pktmbuf_alloc(mp);
> >>>   rte_pktmbuf_append(m1, 500);
> >>>   m2 = rte_pktmbuf_alloc(mp);
> >>>   rte_pktmbuf_append(m2, 500);
> >>>   rte_pktmbuf_chain(m1, m2);
> >>>   m0 = rte_pktmbuf_alloc(mp);
> >>>   rte_pktmbuf_append(m0, 500);
> >>>   rte_pktmbuf_chain(m0, m1);
> >>>
> >>> As rte_pktmbuf_chain() does not reset nb_seg in the initial m1
> >>> segment (this is not required), after this code the mbuf chain
> >>> have 3 segments:
> >>>   - m0: next=m1, nb_seg=3
> >>>   - m1: next=m2, nb_seg=2
> >>>   - m2: next=NULL, nb_seg=1
> >>>
> >>> Freeing this mbuf chain will not restore nb_seg=1 in the second
> >>> segment. 
> >>
> >> Hmm, not sure why is that?
> >> You are talking about freeing m1, right?
> >> rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> >> {
> >> 	...
> >> 	if (m->next != NULL) {
> >>                         m->next = NULL;
> >>                         m->nb_segs = 1;
> >>                 }
> >>
> >> m1->next != NULL, so it will enter the if() block,
> >> and will reset both next and nb_segs.
> >> What I am missing here? 
> >> Thinking in more generic way, that change:
> >>  -		if (m->next != NULL) {
> >>  -			m->next = NULL;
> >>  -			m->nb_segs = 1;
> >>  -		}
> >>  +		m->next = NULL;
> >>  +		m->nb_segs = 1;
> > 
> > Ah, sorry. I oversimplified the example and now it does not
> > show the issue...
> > 
> > The full example also adds a split() to break the mbuf chain
> > between m1 and m2. The kind of thing that would be done for
> > software TCP segmentation.
> > 
> 
> If so, may be the right solution is to care about nb_segs
> when next is set to NULL on split? Any place when next is set
> to NULL. Just to keep the optimization in a more generic place.

The problem with that approach is that there are already several
existing split() or trim() implementations in different DPDK-based
applications. For instance, we have some in 6WINDGate. If we force
applications to set nb_seg to 1 when resetting next, it has to be
documented because it is not straightforward. I think the approach from
this patch is safer.

By the way, for 21.11, if we are able to do some optimizations and have
both pool (index?) and next in the first cache line, we may reconsider
the fact that next and nb_segs are already set for new allocated mbufs,
because it is not straightforward either.


> > After this operation, we have 2 mbuf chain:
> >  - m0 with 2 segments, the last one has next=NULL but nb_seg=2
> >  - new_m with 1 segment
> > 
> > Freeing m0 will not restore nb_seg=1 in the second segment.
> > 
> >> Assumes that it is ok to have an mbuf with
> >> nb_seg > 1 and next == NULL.
> >> Which seems wrong to me.
> > 
> > I don't think it is wrong: nb_seg is just ignored when not in the first
> > segment, and there is nothing saying it should be set to 1. Typically,
> > rte_pktmbuf_chain() does not change it, and I guess it's the same for
> > many similar functions in applications.
> > 
> > Olivier
> > 
> >>
> >>
> >>> This is expected that mbufs stored in pool have their
> >>> nb_seg field set to 1.
> >>>
> >>> Fixes: 8f094a9ac5d7 ("mbuf: set mbuf fields while in pool")
> >>> Cc: stable@dpdk.org
> >>>
> >>> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> >>> ---
> >>>  lib/librte_mbuf/rte_mbuf.c |  6 ++----
> >>>  lib/librte_mbuf/rte_mbuf.h | 12 ++++--------
> >>>  2 files changed, 6 insertions(+), 12 deletions(-)
> >>>
> >>> diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
> >>> index 8a456e5e64..e632071c23 100644
> >>> --- a/lib/librte_mbuf/rte_mbuf.c
> >>> +++ b/lib/librte_mbuf/rte_mbuf.c
> >>> @@ -129,10 +129,8 @@ rte_pktmbuf_free_pinned_extmem(void *addr, void *opaque)
> >>>
> >>>  	rte_mbuf_ext_refcnt_set(m->shinfo, 1);
> >>>  	m->ol_flags = EXT_ATTACHED_MBUF;
> >>> -	if (m->next != NULL) {
> >>> -		m->next = NULL;
> >>> -		m->nb_segs = 1;
> >>> -	}
> >>> +	m->next = NULL;
> >>> +	m->nb_segs = 1;
> >>>  	rte_mbuf_raw_free(m);
> >>>  }
> >>>
> >>> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> >>> index a1414ed7cd..ef5800c8ef 100644
> >>> --- a/lib/librte_mbuf/rte_mbuf.h
> >>> +++ b/lib/librte_mbuf/rte_mbuf.h
> >>> @@ -1329,10 +1329,8 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> >>>  				return NULL;
> >>>  		}
> >>>
> >>> -		if (m->next != NULL) {
> >>> -			m->next = NULL;
> >>> -			m->nb_segs = 1;
> >>> -		}
> >>> +		m->next = NULL;
> >>> +		m->nb_segs = 1;
> >>>
> >>>  		return m;
> >>>
> >>> @@ -1346,10 +1344,8 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> >>>  				return NULL;
> >>>  		}
> >>>
> >>> -		if (m->next != NULL) {
> >>> -			m->next = NULL;
> >>> -			m->nb_segs = 1;
> >>> -		}
> >>> +		m->next = NULL;
> >>> +		m->nb_segs = 1;
> >>>  		rte_mbuf_refcnt_set(m, 1);
> >>>
> >>>  		return m;
> >>> --
> >>> 2.25.1
> >>
> 

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dpdk-dev] [PATCH] mbuf: fix reset on mbuf free
  2020-11-05  9:10       ` Olivier Matz
@ 2020-11-05 11:34         ` Ananyev, Konstantin
  2020-11-05 12:31           ` Olivier Matz
  0 siblings, 1 reply; 74+ messages in thread
From: Ananyev, Konstantin @ 2020-11-05 11:34 UTC (permalink / raw)
  To: Olivier Matz, Andrew Rybchenko; +Cc: dev



> On Thu, Nov 05, 2020 at 11:26:51AM +0300, Andrew Rybchenko wrote:
> > On 11/5/20 10:46 AM, Olivier Matz wrote:
> > > On Thu, Nov 05, 2020 at 12:15:49AM +0000, Ananyev, Konstantin wrote:
> > >>
> > >> Hi Olivier,
> > >>
> > >>> m->nb_seg must be reset on mbuf free whatever the value of m->next,
> > >>> because it can happen that m->nb_seg is != 1. For instance in this
> > >>> case:
> > >>>
> > >>>   m1 = rte_pktmbuf_alloc(mp);
> > >>>   rte_pktmbuf_append(m1, 500);
> > >>>   m2 = rte_pktmbuf_alloc(mp);
> > >>>   rte_pktmbuf_append(m2, 500);
> > >>>   rte_pktmbuf_chain(m1, m2);
> > >>>   m0 = rte_pktmbuf_alloc(mp);
> > >>>   rte_pktmbuf_append(m0, 500);
> > >>>   rte_pktmbuf_chain(m0, m1);
> > >>>
> > >>> As rte_pktmbuf_chain() does not reset nb_seg in the initial m1
> > >>> segment (this is not required), after this code the mbuf chain
> > >>> have 3 segments:
> > >>>   - m0: next=m1, nb_seg=3
> > >>>   - m1: next=m2, nb_seg=2
> > >>>   - m2: next=NULL, nb_seg=1
> > >>>
> > >>> Freeing this mbuf chain will not restore nb_seg=1 in the second
> > >>> segment.
> > >>
> > >> Hmm, not sure why is that?
> > >> You are talking about freeing m1, right?
> > >> rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > >> {
> > >> 	...
> > >> 	if (m->next != NULL) {
> > >>                         m->next = NULL;
> > >>                         m->nb_segs = 1;
> > >>                 }
> > >>
> > >> m1->next != NULL, so it will enter the if() block,
> > >> and will reset both next and nb_segs.
> > >> What I am missing here?
> > >> Thinking in more generic way, that change:
> > >>  -		if (m->next != NULL) {
> > >>  -			m->next = NULL;
> > >>  -			m->nb_segs = 1;
> > >>  -		}
> > >>  +		m->next = NULL;
> > >>  +		m->nb_segs = 1;
> > >
> > > Ah, sorry. I oversimplified the example and now it does not
> > > show the issue...
> > >
> > > The full example also adds a split() to break the mbuf chain
> > > between m1 and m2. The kind of thing that would be done for
> > > software TCP segmentation.
> > >
> >
> > If so, may be the right solution is to care about nb_segs
> > when next is set to NULL on split? Any place when next is set
> > to NULL. Just to keep the optimization in a more generic place.


> The problem with that approach is that there are already several
> existing split() or trim() implementations in different DPDK-based
> applications. For instance, we have some in 6WINDGate. If we force
> applications to set nb_seg to 1 when resetting next, it has to be
> documented because it is not straightforward. 

I think it is better to go that way.
From my perspective it seems natural to reset nb_seg at same time
we reset next, otherwise inconsistency will occur. 

> I think the approach from
> this patch is safer.

It might be easier from perspective that changes in less places are required,
Though I think that this patch will introduce some performance drop.
As now each mbuf_prefree_seg() will cause update of 2 cache lines unconditionally.
 
> By the way, for 21.11, if we are able to do some optimizations and have
> both pool (index?) and next in the first cache line, we may reconsider
> the fact that next and nb_segs are already set for new allocated mbufs,
> because it is not straightforward either.

My suggestion - let's put future optimization discussion aside for now,
and concentrate on that particular patch. 

> 
> > > After this operation, we have 2 mbuf chain:
> > >  - m0 with 2 segments, the last one has next=NULL but nb_seg=2
> > >  - new_m with 1 segment
> > >
> > > Freeing m0 will not restore nb_seg=1 in the second segment.
> > >
> > >> Assumes that it is ok to have an mbuf with
> > >> nb_seg > 1 and next == NULL.
> > >> Which seems wrong to me.
> > >
> > > I don't think it is wrong: nb_seg is just ignored when not in the first
> > > segment, and there is nothing saying it should be set to 1. Typically,
> > > rte_pktmbuf_chain() does not change it, and I guess it's the same for
> > > many similar functions in applications.
> > >
> > > Olivier
> > >
> > >>
> > >>
> > >>> This is expected that mbufs stored in pool have their
> > >>> nb_seg field set to 1.
> > >>>
> > >>> Fixes: 8f094a9ac5d7 ("mbuf: set mbuf fields while in pool")
> > >>> Cc: stable@dpdk.org
> > >>>
> > >>> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> > >>> ---
> > >>>  lib/librte_mbuf/rte_mbuf.c |  6 ++----
> > >>>  lib/librte_mbuf/rte_mbuf.h | 12 ++++--------
> > >>>  2 files changed, 6 insertions(+), 12 deletions(-)
> > >>>
> > >>> diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
> > >>> index 8a456e5e64..e632071c23 100644
> > >>> --- a/lib/librte_mbuf/rte_mbuf.c
> > >>> +++ b/lib/librte_mbuf/rte_mbuf.c
> > >>> @@ -129,10 +129,8 @@ rte_pktmbuf_free_pinned_extmem(void *addr, void *opaque)
> > >>>
> > >>>  	rte_mbuf_ext_refcnt_set(m->shinfo, 1);
> > >>>  	m->ol_flags = EXT_ATTACHED_MBUF;
> > >>> -	if (m->next != NULL) {
> > >>> -		m->next = NULL;
> > >>> -		m->nb_segs = 1;
> > >>> -	}
> > >>> +	m->next = NULL;
> > >>> +	m->nb_segs = 1;
> > >>>  	rte_mbuf_raw_free(m);
> > >>>  }
> > >>>
> > >>> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> > >>> index a1414ed7cd..ef5800c8ef 100644
> > >>> --- a/lib/librte_mbuf/rte_mbuf.h
> > >>> +++ b/lib/librte_mbuf/rte_mbuf.h
> > >>> @@ -1329,10 +1329,8 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > >>>  				return NULL;
> > >>>  		}
> > >>>
> > >>> -		if (m->next != NULL) {
> > >>> -			m->next = NULL;
> > >>> -			m->nb_segs = 1;
> > >>> -		}
> > >>> +		m->next = NULL;
> > >>> +		m->nb_segs = 1;
> > >>>
> > >>>  		return m;
> > >>>
> > >>> @@ -1346,10 +1344,8 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > >>>  				return NULL;
> > >>>  		}
> > >>>
> > >>> -		if (m->next != NULL) {
> > >>> -			m->next = NULL;
> > >>> -			m->nb_segs = 1;
> > >>> -		}
> > >>> +		m->next = NULL;
> > >>> +		m->nb_segs = 1;
> > >>>  		rte_mbuf_refcnt_set(m, 1);
> > >>>
> > >>>  		return m;
> > >>> --
> > >>> 2.25.1
> > >>
> >

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dpdk-dev] [PATCH] mbuf: fix reset on mbuf free
  2020-11-05 11:34         ` Ananyev, Konstantin
@ 2020-11-05 12:31           ` Olivier Matz
  2020-11-05 13:14             ` Ananyev, Konstantin
  0 siblings, 1 reply; 74+ messages in thread
From: Olivier Matz @ 2020-11-05 12:31 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: Andrew Rybchenko, dev

On Thu, Nov 05, 2020 at 11:34:18AM +0000, Ananyev, Konstantin wrote:
> 
> 
> > On Thu, Nov 05, 2020 at 11:26:51AM +0300, Andrew Rybchenko wrote:
> > > On 11/5/20 10:46 AM, Olivier Matz wrote:
> > > > On Thu, Nov 05, 2020 at 12:15:49AM +0000, Ananyev, Konstantin wrote:
> > > >>
> > > >> Hi Olivier,
> > > >>
> > > >>> m->nb_seg must be reset on mbuf free whatever the value of m->next,
> > > >>> because it can happen that m->nb_seg is != 1. For instance in this
> > > >>> case:
> > > >>>
> > > >>>   m1 = rte_pktmbuf_alloc(mp);
> > > >>>   rte_pktmbuf_append(m1, 500);
> > > >>>   m2 = rte_pktmbuf_alloc(mp);
> > > >>>   rte_pktmbuf_append(m2, 500);
> > > >>>   rte_pktmbuf_chain(m1, m2);
> > > >>>   m0 = rte_pktmbuf_alloc(mp);
> > > >>>   rte_pktmbuf_append(m0, 500);
> > > >>>   rte_pktmbuf_chain(m0, m1);
> > > >>>
> > > >>> As rte_pktmbuf_chain() does not reset nb_seg in the initial m1
> > > >>> segment (this is not required), after this code the mbuf chain
> > > >>> have 3 segments:
> > > >>>   - m0: next=m1, nb_seg=3
> > > >>>   - m1: next=m2, nb_seg=2
> > > >>>   - m2: next=NULL, nb_seg=1
> > > >>>
> > > >>> Freeing this mbuf chain will not restore nb_seg=1 in the second
> > > >>> segment.
> > > >>
> > > >> Hmm, not sure why is that?
> > > >> You are talking about freeing m1, right?
> > > >> rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > > >> {
> > > >> 	...
> > > >> 	if (m->next != NULL) {
> > > >>                         m->next = NULL;
> > > >>                         m->nb_segs = 1;
> > > >>                 }
> > > >>
> > > >> m1->next != NULL, so it will enter the if() block,
> > > >> and will reset both next and nb_segs.
> > > >> What I am missing here?
> > > >> Thinking in more generic way, that change:
> > > >>  -		if (m->next != NULL) {
> > > >>  -			m->next = NULL;
> > > >>  -			m->nb_segs = 1;
> > > >>  -		}
> > > >>  +		m->next = NULL;
> > > >>  +		m->nb_segs = 1;
> > > >
> > > > Ah, sorry. I oversimplified the example and now it does not
> > > > show the issue...
> > > >
> > > > The full example also adds a split() to break the mbuf chain
> > > > between m1 and m2. The kind of thing that would be done for
> > > > software TCP segmentation.
> > > >
> > >
> > > If so, may be the right solution is to care about nb_segs
> > > when next is set to NULL on split? Any place when next is set
> > > to NULL. Just to keep the optimization in a more generic place.
> 
> 
> > The problem with that approach is that there are already several
> > existing split() or trim() implementations in different DPDK-based
> > applications. For instance, we have some in 6WINDGate. If we force
> > applications to set nb_seg to 1 when resetting next, it has to be
> > documented because it is not straightforward. 
> 
> I think it is better to go that way.
> From my perspective it seems natural to reset nb_seg at same time
> we reset next, otherwise inconsistency will occur. 

While it is not explicitly stated for nb_segs, to me it was clear that
nb_segs is only valid in the first segment, like for many fields (port,
ol_flags, vlan, rss, ...).

If we say that nb_segs has to be valid in any segments, it means that
chain() or split() will have to update it in all segments, which is not
efficient.

Saying that nb_segs has to be valid for the first and last segment seems
really odd to me. What is the logic behind that except keeping this test
as is?

In any case, it has to be better documented.


Olivier


> > I think the approach from
> > this patch is safer.
> 
> It might be easier from perspective that changes in less places are required,
> Though I think that this patch will introduce some performance drop.
> As now each mbuf_prefree_seg() will cause update of 2 cache lines unconditionally.
>  
> > By the way, for 21.11, if we are able to do some optimizations and have
> > both pool (index?) and next in the first cache line, we may reconsider
> > the fact that next and nb_segs are already set for new allocated mbufs,
> > because it is not straightforward either.
> 
> My suggestion - let's put future optimization discussion aside for now,
> and concentrate on that particular patch. 
> 
> > 
> > > > After this operation, we have 2 mbuf chain:
> > > >  - m0 with 2 segments, the last one has next=NULL but nb_seg=2
> > > >  - new_m with 1 segment
> > > >
> > > > Freeing m0 will not restore nb_seg=1 in the second segment.
> > > >
> > > >> Assumes that it is ok to have an mbuf with
> > > >> nb_seg > 1 and next == NULL.
> > > >> Which seems wrong to me.
> > > >
> > > > I don't think it is wrong: nb_seg is just ignored when not in the first
> > > > segment, and there is nothing saying it should be set to 1. Typically,
> > > > rte_pktmbuf_chain() does not change it, and I guess it's the same for
> > > > many similar functions in applications.
> > > >
> > > > Olivier
> > > >
> > > >>
> > > >>
> > > >>> This is expected that mbufs stored in pool have their
> > > >>> nb_seg field set to 1.
> > > >>>
> > > >>> Fixes: 8f094a9ac5d7 ("mbuf: set mbuf fields while in pool")
> > > >>> Cc: stable@dpdk.org
> > > >>>
> > > >>> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> > > >>> ---
> > > >>>  lib/librte_mbuf/rte_mbuf.c |  6 ++----
> > > >>>  lib/librte_mbuf/rte_mbuf.h | 12 ++++--------
> > > >>>  2 files changed, 6 insertions(+), 12 deletions(-)
> > > >>>
> > > >>> diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
> > > >>> index 8a456e5e64..e632071c23 100644
> > > >>> --- a/lib/librte_mbuf/rte_mbuf.c
> > > >>> +++ b/lib/librte_mbuf/rte_mbuf.c
> > > >>> @@ -129,10 +129,8 @@ rte_pktmbuf_free_pinned_extmem(void *addr, void *opaque)
> > > >>>
> > > >>>  	rte_mbuf_ext_refcnt_set(m->shinfo, 1);
> > > >>>  	m->ol_flags = EXT_ATTACHED_MBUF;
> > > >>> -	if (m->next != NULL) {
> > > >>> -		m->next = NULL;
> > > >>> -		m->nb_segs = 1;
> > > >>> -	}
> > > >>> +	m->next = NULL;
> > > >>> +	m->nb_segs = 1;
> > > >>>  	rte_mbuf_raw_free(m);
> > > >>>  }
> > > >>>
> > > >>> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> > > >>> index a1414ed7cd..ef5800c8ef 100644
> > > >>> --- a/lib/librte_mbuf/rte_mbuf.h
> > > >>> +++ b/lib/librte_mbuf/rte_mbuf.h
> > > >>> @@ -1329,10 +1329,8 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > > >>>  				return NULL;
> > > >>>  		}
> > > >>>
> > > >>> -		if (m->next != NULL) {
> > > >>> -			m->next = NULL;
> > > >>> -			m->nb_segs = 1;
> > > >>> -		}
> > > >>> +		m->next = NULL;
> > > >>> +		m->nb_segs = 1;
> > > >>>
> > > >>>  		return m;
> > > >>>
> > > >>> @@ -1346,10 +1344,8 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > > >>>  				return NULL;
> > > >>>  		}
> > > >>>
> > > >>> -		if (m->next != NULL) {
> > > >>> -			m->next = NULL;
> > > >>> -			m->nb_segs = 1;
> > > >>> -		}
> > > >>> +		m->next = NULL;
> > > >>> +		m->nb_segs = 1;
> > > >>>  		rte_mbuf_refcnt_set(m, 1);
> > > >>>
> > > >>>  		return m;
> > > >>> --
> > > >>> 2.25.1
> > > >>
> > >

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dpdk-dev] [PATCH] mbuf: fix reset on mbuf free
  2020-11-05 12:31           ` Olivier Matz
@ 2020-11-05 13:14             ` Ananyev, Konstantin
  2020-11-05 13:24               ` Olivier Matz
  0 siblings, 1 reply; 74+ messages in thread
From: Ananyev, Konstantin @ 2020-11-05 13:14 UTC (permalink / raw)
  To: Olivier Matz; +Cc: Andrew Rybchenko, dev



> 
> On Thu, Nov 05, 2020 at 11:34:18AM +0000, Ananyev, Konstantin wrote:
> >
> >
> > > On Thu, Nov 05, 2020 at 11:26:51AM +0300, Andrew Rybchenko wrote:
> > > > On 11/5/20 10:46 AM, Olivier Matz wrote:
> > > > > On Thu, Nov 05, 2020 at 12:15:49AM +0000, Ananyev, Konstantin wrote:
> > > > >>
> > > > >> Hi Olivier,
> > > > >>
> > > > >>> m->nb_seg must be reset on mbuf free whatever the value of m->next,
> > > > >>> because it can happen that m->nb_seg is != 1. For instance in this
> > > > >>> case:
> > > > >>>
> > > > >>>   m1 = rte_pktmbuf_alloc(mp);
> > > > >>>   rte_pktmbuf_append(m1, 500);
> > > > >>>   m2 = rte_pktmbuf_alloc(mp);
> > > > >>>   rte_pktmbuf_append(m2, 500);
> > > > >>>   rte_pktmbuf_chain(m1, m2);
> > > > >>>   m0 = rte_pktmbuf_alloc(mp);
> > > > >>>   rte_pktmbuf_append(m0, 500);
> > > > >>>   rte_pktmbuf_chain(m0, m1);
> > > > >>>
> > > > >>> As rte_pktmbuf_chain() does not reset nb_seg in the initial m1
> > > > >>> segment (this is not required), after this code the mbuf chain
> > > > >>> have 3 segments:
> > > > >>>   - m0: next=m1, nb_seg=3
> > > > >>>   - m1: next=m2, nb_seg=2
> > > > >>>   - m2: next=NULL, nb_seg=1
> > > > >>>
> > > > >>> Freeing this mbuf chain will not restore nb_seg=1 in the second
> > > > >>> segment.
> > > > >>
> > > > >> Hmm, not sure why is that?
> > > > >> You are talking about freeing m1, right?
> > > > >> rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > > > >> {
> > > > >> 	...
> > > > >> 	if (m->next != NULL) {
> > > > >>                         m->next = NULL;
> > > > >>                         m->nb_segs = 1;
> > > > >>                 }
> > > > >>
> > > > >> m1->next != NULL, so it will enter the if() block,
> > > > >> and will reset both next and nb_segs.
> > > > >> What I am missing here?
> > > > >> Thinking in more generic way, that change:
> > > > >>  -		if (m->next != NULL) {
> > > > >>  -			m->next = NULL;
> > > > >>  -			m->nb_segs = 1;
> > > > >>  -		}
> > > > >>  +		m->next = NULL;
> > > > >>  +		m->nb_segs = 1;
> > > > >
> > > > > Ah, sorry. I oversimplified the example and now it does not
> > > > > show the issue...
> > > > >
> > > > > The full example also adds a split() to break the mbuf chain
> > > > > between m1 and m2. The kind of thing that would be done for
> > > > > software TCP segmentation.
> > > > >
> > > >
> > > > If so, may be the right solution is to care about nb_segs
> > > > when next is set to NULL on split? Any place when next is set
> > > > to NULL. Just to keep the optimization in a more generic place.
> >
> >
> > > The problem with that approach is that there are already several
> > > existing split() or trim() implementations in different DPDK-based
> > > applications. For instance, we have some in 6WINDGate. If we force
> > > applications to set nb_seg to 1 when resetting next, it has to be
> > > documented because it is not straightforward.
> >
> > I think it is better to go that way.
> > From my perspective it seems natural to reset nb_seg at same time
> > we reset next, otherwise inconsistency will occur.
> 
> While it is not explicitly stated for nb_segs, to me it was clear that
> nb_segs is only valid in the first segment, like for many fields (port,
> ol_flags, vlan, rss, ...).
> 
> If we say that nb_segs has to be valid in any segments, it means that
> chain() or split() will have to update it in all segments, which is not
> efficient.

Why in all?
We can state that nb_segs on non-first segment should always equal 1.
As I understand in that case, both split() and chain() have to update nb_segs
only for head mbufs, rest ones will remain untouched.  

> 
> Saying that nb_segs has to be valid for the first and last segment seems
> really odd to me. What is the logic behind that except keeping this test
> as is?
> 
> In any case, it has to be better documented.
> 
> 
> Olivier
> 
> 
> > > I think the approach from
> > > this patch is safer.
> >
> > It might be easier from perspective that changes in less places are required,
> > Though I think that this patch will introduce some performance drop.
> > As now each mbuf_prefree_seg() will cause update of 2 cache lines unconditionally.
> >
> > > By the way, for 21.11, if we are able to do some optimizations and have
> > > both pool (index?) and next in the first cache line, we may reconsider
> > > the fact that next and nb_segs are already set for new allocated mbufs,
> > > because it is not straightforward either.
> >
> > My suggestion - let's put future optimization discussion aside for now,
> > and concentrate on that particular patch.
> >
> > >
> > > > > After this operation, we have 2 mbuf chain:
> > > > >  - m0 with 2 segments, the last one has next=NULL but nb_seg=2
> > > > >  - new_m with 1 segment
> > > > >
> > > > > Freeing m0 will not restore nb_seg=1 in the second segment.
> > > > >
> > > > >> Assumes that it is ok to have an mbuf with
> > > > >> nb_seg > 1 and next == NULL.
> > > > >> Which seems wrong to me.
> > > > >
> > > > > I don't think it is wrong: nb_seg is just ignored when not in the first
> > > > > segment, and there is nothing saying it should be set to 1. Typically,
> > > > > rte_pktmbuf_chain() does not change it, and I guess it's the same for
> > > > > many similar functions in applications.
> > > > >
> > > > > Olivier
> > > > >
> > > > >>
> > > > >>
> > > > >>> This is expected that mbufs stored in pool have their
> > > > >>> nb_seg field set to 1.
> > > > >>>
> > > > >>> Fixes: 8f094a9ac5d7 ("mbuf: set mbuf fields while in pool")
> > > > >>> Cc: stable@dpdk.org
> > > > >>>
> > > > >>> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> > > > >>> ---
> > > > >>>  lib/librte_mbuf/rte_mbuf.c |  6 ++----
> > > > >>>  lib/librte_mbuf/rte_mbuf.h | 12 ++++--------
> > > > >>>  2 files changed, 6 insertions(+), 12 deletions(-)
> > > > >>>
> > > > >>> diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
> > > > >>> index 8a456e5e64..e632071c23 100644
> > > > >>> --- a/lib/librte_mbuf/rte_mbuf.c
> > > > >>> +++ b/lib/librte_mbuf/rte_mbuf.c
> > > > >>> @@ -129,10 +129,8 @@ rte_pktmbuf_free_pinned_extmem(void *addr, void *opaque)
> > > > >>>
> > > > >>>  	rte_mbuf_ext_refcnt_set(m->shinfo, 1);
> > > > >>>  	m->ol_flags = EXT_ATTACHED_MBUF;
> > > > >>> -	if (m->next != NULL) {
> > > > >>> -		m->next = NULL;
> > > > >>> -		m->nb_segs = 1;
> > > > >>> -	}
> > > > >>> +	m->next = NULL;
> > > > >>> +	m->nb_segs = 1;
> > > > >>>  	rte_mbuf_raw_free(m);
> > > > >>>  }
> > > > >>>
> > > > >>> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> > > > >>> index a1414ed7cd..ef5800c8ef 100644
> > > > >>> --- a/lib/librte_mbuf/rte_mbuf.h
> > > > >>> +++ b/lib/librte_mbuf/rte_mbuf.h
> > > > >>> @@ -1329,10 +1329,8 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > > > >>>  				return NULL;
> > > > >>>  		}
> > > > >>>
> > > > >>> -		if (m->next != NULL) {
> > > > >>> -			m->next = NULL;
> > > > >>> -			m->nb_segs = 1;
> > > > >>> -		}
> > > > >>> +		m->next = NULL;
> > > > >>> +		m->nb_segs = 1;
> > > > >>>
> > > > >>>  		return m;
> > > > >>>
> > > > >>> @@ -1346,10 +1344,8 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > > > >>>  				return NULL;
> > > > >>>  		}
> > > > >>>
> > > > >>> -		if (m->next != NULL) {
> > > > >>> -			m->next = NULL;
> > > > >>> -			m->nb_segs = 1;
> > > > >>> -		}
> > > > >>> +		m->next = NULL;
> > > > >>> +		m->nb_segs = 1;
> > > > >>>  		rte_mbuf_refcnt_set(m, 1);
> > > > >>>
> > > > >>>  		return m;
> > > > >>> --
> > > > >>> 2.25.1
> > > > >>
> > > >

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dpdk-dev] [PATCH] mbuf: fix reset on mbuf free
  2020-11-05 13:14             ` Ananyev, Konstantin
@ 2020-11-05 13:24               ` Olivier Matz
  2020-11-05 13:55                 ` Ananyev, Konstantin
  0 siblings, 1 reply; 74+ messages in thread
From: Olivier Matz @ 2020-11-05 13:24 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: Andrew Rybchenko, dev

On Thu, Nov 05, 2020 at 01:14:07PM +0000, Ananyev, Konstantin wrote:
> 
> 
> > 
> > On Thu, Nov 05, 2020 at 11:34:18AM +0000, Ananyev, Konstantin wrote:
> > >
> > >
> > > > On Thu, Nov 05, 2020 at 11:26:51AM +0300, Andrew Rybchenko wrote:
> > > > > On 11/5/20 10:46 AM, Olivier Matz wrote:
> > > > > > On Thu, Nov 05, 2020 at 12:15:49AM +0000, Ananyev, Konstantin wrote:
> > > > > >>
> > > > > >> Hi Olivier,
> > > > > >>
> > > > > >>> m->nb_seg must be reset on mbuf free whatever the value of m->next,
> > > > > >>> because it can happen that m->nb_seg is != 1. For instance in this
> > > > > >>> case:
> > > > > >>>
> > > > > >>>   m1 = rte_pktmbuf_alloc(mp);
> > > > > >>>   rte_pktmbuf_append(m1, 500);
> > > > > >>>   m2 = rte_pktmbuf_alloc(mp);
> > > > > >>>   rte_pktmbuf_append(m2, 500);
> > > > > >>>   rte_pktmbuf_chain(m1, m2);
> > > > > >>>   m0 = rte_pktmbuf_alloc(mp);
> > > > > >>>   rte_pktmbuf_append(m0, 500);
> > > > > >>>   rte_pktmbuf_chain(m0, m1);
> > > > > >>>
> > > > > >>> As rte_pktmbuf_chain() does not reset nb_seg in the initial m1
> > > > > >>> segment (this is not required), after this code the mbuf chain
> > > > > >>> have 3 segments:
> > > > > >>>   - m0: next=m1, nb_seg=3
> > > > > >>>   - m1: next=m2, nb_seg=2
> > > > > >>>   - m2: next=NULL, nb_seg=1
> > > > > >>>
> > > > > >>> Freeing this mbuf chain will not restore nb_seg=1 in the second
> > > > > >>> segment.
> > > > > >>
> > > > > >> Hmm, not sure why is that?
> > > > > >> You are talking about freeing m1, right?
> > > > > >> rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > > > > >> {
> > > > > >> 	...
> > > > > >> 	if (m->next != NULL) {
> > > > > >>                         m->next = NULL;
> > > > > >>                         m->nb_segs = 1;
> > > > > >>                 }
> > > > > >>
> > > > > >> m1->next != NULL, so it will enter the if() block,
> > > > > >> and will reset both next and nb_segs.
> > > > > >> What I am missing here?
> > > > > >> Thinking in more generic way, that change:
> > > > > >>  -		if (m->next != NULL) {
> > > > > >>  -			m->next = NULL;
> > > > > >>  -			m->nb_segs = 1;
> > > > > >>  -		}
> > > > > >>  +		m->next = NULL;
> > > > > >>  +		m->nb_segs = 1;
> > > > > >
> > > > > > Ah, sorry. I oversimplified the example and now it does not
> > > > > > show the issue...
> > > > > >
> > > > > > The full example also adds a split() to break the mbuf chain
> > > > > > between m1 and m2. The kind of thing that would be done for
> > > > > > software TCP segmentation.
> > > > > >
> > > > >
> > > > > If so, may be the right solution is to care about nb_segs
> > > > > when next is set to NULL on split? Any place when next is set
> > > > > to NULL. Just to keep the optimization in a more generic place.
> > >
> > >
> > > > The problem with that approach is that there are already several
> > > > existing split() or trim() implementations in different DPDK-based
> > > > applications. For instance, we have some in 6WINDGate. If we force
> > > > applications to set nb_seg to 1 when resetting next, it has to be
> > > > documented because it is not straightforward.
> > >
> > > I think it is better to go that way.
> > > From my perspective it seems natural to reset nb_seg at same time
> > > we reset next, otherwise inconsistency will occur.
> > 
> > While it is not explicitly stated for nb_segs, to me it was clear that
> > nb_segs is only valid in the first segment, like for many fields (port,
> > ol_flags, vlan, rss, ...).
> > 
> > If we say that nb_segs has to be valid in any segments, it means that
> > chain() or split() will have to update it in all segments, which is not
> > efficient.
> 
> Why in all?
> We can state that nb_segs on non-first segment should always equal 1.
> As I understand in that case, both split() and chain() have to update nb_segs
> only for head mbufs, rest ones will remain untouched.  

Well, anyway, I think it's strange to have a constraint on m->nb_segs for
non-first segment. We don't have that kind of constraints for other fields.


> 
> > 
> > Saying that nb_segs has to be valid for the first and last segment seems
> > really odd to me. What is the logic behind that except keeping this test
> > as is?
> > 
> > In any case, it has to be better documented.
> > 
> > 
> > Olivier
> > 
> > 
> > > > I think the approach from
> > > > this patch is safer.
> > >
> > > It might be easier from perspective that changes in less places are required,
> > > Though I think that this patch will introduce some performance drop.
> > > As now each mbuf_prefree_seg() will cause update of 2 cache lines unconditionally.
> > >
> > > > By the way, for 21.11, if we are able to do some optimizations and have
> > > > both pool (index?) and next in the first cache line, we may reconsider
> > > > the fact that next and nb_segs are already set for new allocated mbufs,
> > > > because it is not straightforward either.
> > >
> > > My suggestion - let's put future optimization discussion aside for now,
> > > and concentrate on that particular patch.
> > >
> > > >
> > > > > > After this operation, we have 2 mbuf chain:
> > > > > >  - m0 with 2 segments, the last one has next=NULL but nb_seg=2
> > > > > >  - new_m with 1 segment
> > > > > >
> > > > > > Freeing m0 will not restore nb_seg=1 in the second segment.
> > > > > >
> > > > > >> Assumes that it is ok to have an mbuf with
> > > > > >> nb_seg > 1 and next == NULL.
> > > > > >> Which seems wrong to me.
> > > > > >
> > > > > > I don't think it is wrong: nb_seg is just ignored when not in the first
> > > > > > segment, and there is nothing saying it should be set to 1. Typically,
> > > > > > rte_pktmbuf_chain() does not change it, and I guess it's the same for
> > > > > > many similar functions in applications.
> > > > > >
> > > > > > Olivier
> > > > > >
> > > > > >>
> > > > > >>
> > > > > >>> This is expected that mbufs stored in pool have their
> > > > > >>> nb_seg field set to 1.
> > > > > >>>
> > > > > >>> Fixes: 8f094a9ac5d7 ("mbuf: set mbuf fields while in pool")
> > > > > >>> Cc: stable@dpdk.org
> > > > > >>>
> > > > > >>> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> > > > > >>> ---
> > > > > >>>  lib/librte_mbuf/rte_mbuf.c |  6 ++----
> > > > > >>>  lib/librte_mbuf/rte_mbuf.h | 12 ++++--------
> > > > > >>>  2 files changed, 6 insertions(+), 12 deletions(-)
> > > > > >>>
> > > > > >>> diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
> > > > > >>> index 8a456e5e64..e632071c23 100644
> > > > > >>> --- a/lib/librte_mbuf/rte_mbuf.c
> > > > > >>> +++ b/lib/librte_mbuf/rte_mbuf.c
> > > > > >>> @@ -129,10 +129,8 @@ rte_pktmbuf_free_pinned_extmem(void *addr, void *opaque)
> > > > > >>>
> > > > > >>>  	rte_mbuf_ext_refcnt_set(m->shinfo, 1);
> > > > > >>>  	m->ol_flags = EXT_ATTACHED_MBUF;
> > > > > >>> -	if (m->next != NULL) {
> > > > > >>> -		m->next = NULL;
> > > > > >>> -		m->nb_segs = 1;
> > > > > >>> -	}
> > > > > >>> +	m->next = NULL;
> > > > > >>> +	m->nb_segs = 1;
> > > > > >>>  	rte_mbuf_raw_free(m);
> > > > > >>>  }
> > > > > >>>
> > > > > >>> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> > > > > >>> index a1414ed7cd..ef5800c8ef 100644
> > > > > >>> --- a/lib/librte_mbuf/rte_mbuf.h
> > > > > >>> +++ b/lib/librte_mbuf/rte_mbuf.h
> > > > > >>> @@ -1329,10 +1329,8 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > > > > >>>  				return NULL;
> > > > > >>>  		}
> > > > > >>>
> > > > > >>> -		if (m->next != NULL) {
> > > > > >>> -			m->next = NULL;
> > > > > >>> -			m->nb_segs = 1;
> > > > > >>> -		}
> > > > > >>> +		m->next = NULL;
> > > > > >>> +		m->nb_segs = 1;
> > > > > >>>
> > > > > >>>  		return m;
> > > > > >>>
> > > > > >>> @@ -1346,10 +1344,8 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > > > > >>>  				return NULL;
> > > > > >>>  		}
> > > > > >>>
> > > > > >>> -		if (m->next != NULL) {
> > > > > >>> -			m->next = NULL;
> > > > > >>> -			m->nb_segs = 1;
> > > > > >>> -		}
> > > > > >>> +		m->next = NULL;
> > > > > >>> +		m->nb_segs = 1;
> > > > > >>>  		rte_mbuf_refcnt_set(m, 1);
> > > > > >>>
> > > > > >>>  		return m;
> > > > > >>> --
> > > > > >>> 2.25.1
> > > > > >>
> > > > >

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dpdk-dev] [PATCH] mbuf: fix reset on mbuf free
  2020-11-05 13:24               ` Olivier Matz
@ 2020-11-05 13:55                 ` Ananyev, Konstantin
  2020-11-05 16:30                   ` Morten Brørup
  0 siblings, 1 reply; 74+ messages in thread
From: Ananyev, Konstantin @ 2020-11-05 13:55 UTC (permalink / raw)
  To: Olivier Matz; +Cc: Andrew Rybchenko, dev



> >
> > >
> > > On Thu, Nov 05, 2020 at 11:34:18AM +0000, Ananyev, Konstantin wrote:
> > > >
> > > >
> > > > > On Thu, Nov 05, 2020 at 11:26:51AM +0300, Andrew Rybchenko wrote:
> > > > > > On 11/5/20 10:46 AM, Olivier Matz wrote:
> > > > > > > On Thu, Nov 05, 2020 at 12:15:49AM +0000, Ananyev, Konstantin wrote:
> > > > > > >>
> > > > > > >> Hi Olivier,
> > > > > > >>
> > > > > > >>> m->nb_seg must be reset on mbuf free whatever the value of m->next,
> > > > > > >>> because it can happen that m->nb_seg is != 1. For instance in this
> > > > > > >>> case:
> > > > > > >>>
> > > > > > >>>   m1 = rte_pktmbuf_alloc(mp);
> > > > > > >>>   rte_pktmbuf_append(m1, 500);
> > > > > > >>>   m2 = rte_pktmbuf_alloc(mp);
> > > > > > >>>   rte_pktmbuf_append(m2, 500);
> > > > > > >>>   rte_pktmbuf_chain(m1, m2);
> > > > > > >>>   m0 = rte_pktmbuf_alloc(mp);
> > > > > > >>>   rte_pktmbuf_append(m0, 500);
> > > > > > >>>   rte_pktmbuf_chain(m0, m1);
> > > > > > >>>
> > > > > > >>> As rte_pktmbuf_chain() does not reset nb_seg in the initial m1
> > > > > > >>> segment (this is not required), after this code the mbuf chain
> > > > > > >>> have 3 segments:
> > > > > > >>>   - m0: next=m1, nb_seg=3
> > > > > > >>>   - m1: next=m2, nb_seg=2
> > > > > > >>>   - m2: next=NULL, nb_seg=1
> > > > > > >>>
> > > > > > >>> Freeing this mbuf chain will not restore nb_seg=1 in the second
> > > > > > >>> segment.
> > > > > > >>
> > > > > > >> Hmm, not sure why is that?
> > > > > > >> You are talking about freeing m1, right?
> > > > > > >> rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > > > > > >> {
> > > > > > >> 	...
> > > > > > >> 	if (m->next != NULL) {
> > > > > > >>                         m->next = NULL;
> > > > > > >>                         m->nb_segs = 1;
> > > > > > >>                 }
> > > > > > >>
> > > > > > >> m1->next != NULL, so it will enter the if() block,
> > > > > > >> and will reset both next and nb_segs.
> > > > > > >> What I am missing here?
> > > > > > >> Thinking in more generic way, that change:
> > > > > > >>  -		if (m->next != NULL) {
> > > > > > >>  -			m->next = NULL;
> > > > > > >>  -			m->nb_segs = 1;
> > > > > > >>  -		}
> > > > > > >>  +		m->next = NULL;
> > > > > > >>  +		m->nb_segs = 1;
> > > > > > >
> > > > > > > Ah, sorry. I oversimplified the example and now it does not
> > > > > > > show the issue...
> > > > > > >
> > > > > > > The full example also adds a split() to break the mbuf chain
> > > > > > > between m1 and m2. The kind of thing that would be done for
> > > > > > > software TCP segmentation.
> > > > > > >
> > > > > >
> > > > > > If so, may be the right solution is to care about nb_segs
> > > > > > when next is set to NULL on split? Any place when next is set
> > > > > > to NULL. Just to keep the optimization in a more generic place.
> > > >
> > > >
> > > > > The problem with that approach is that there are already several
> > > > > existing split() or trim() implementations in different DPDK-based
> > > > > applications. For instance, we have some in 6WINDGate. If we force
> > > > > applications to set nb_seg to 1 when resetting next, it has to be
> > > > > documented because it is not straightforward.
> > > >
> > > > I think it is better to go that way.
> > > > From my perspective it seems natural to reset nb_seg at same time
> > > > we reset next, otherwise inconsistency will occur.
> > >
> > > While it is not explicitly stated for nb_segs, to me it was clear that
> > > nb_segs is only valid in the first segment, like for many fields (port,
> > > ol_flags, vlan, rss, ...).
> > >
> > > If we say that nb_segs has to be valid in any segments, it means that
> > > chain() or split() will have to update it in all segments, which is not
> > > efficient.
> >
> > Why in all?
> > We can state that nb_segs on non-first segment should always equal 1.
> > As I understand in that case, both split() and chain() have to update nb_segs
> > only for head mbufs, rest ones will remain untouched.
> 
> Well, anyway, I think it's strange to have a constraint on m->nb_segs for
> non-first segment. We don't have that kind of constraints for other fields.

True, we don't. But this is one of the fields we consider critical
for proper work of mbuf alloc/free mechanism. 

> 
> 
> >
> > >
> > > Saying that nb_segs has to be valid for the first and last segment seems
> > > really odd to me. What is the logic behind that except keeping this test
> > > as is?
> > >
> > > In any case, it has to be better documented.
> > >
> > >
> > > Olivier
> > >
> > >
> > > > > I think the approach from
> > > > > this patch is safer.
> > > >
> > > > It might be easier from perspective that changes in less places are required,
> > > > Though I think that this patch will introduce some performance drop.
> > > > As now each mbuf_prefree_seg() will cause update of 2 cache lines unconditionally.
> > > >
> > > > > By the way, for 21.11, if we are able to do some optimizations and have
> > > > > both pool (index?) and next in the first cache line, we may reconsider
> > > > > the fact that next and nb_segs are already set for new allocated mbufs,
> > > > > because it is not straightforward either.
> > > >
> > > > My suggestion - let's put future optimization discussion aside for now,
> > > > and concentrate on that particular patch.
> > > >
> > > > >
> > > > > > > After this operation, we have 2 mbuf chain:
> > > > > > >  - m0 with 2 segments, the last one has next=NULL but nb_seg=2
> > > > > > >  - new_m with 1 segment
> > > > > > >
> > > > > > > Freeing m0 will not restore nb_seg=1 in the second segment.
> > > > > > >
> > > > > > >> Assumes that it is ok to have an mbuf with
> > > > > > >> nb_seg > 1 and next == NULL.
> > > > > > >> Which seems wrong to me.
> > > > > > >
> > > > > > > I don't think it is wrong: nb_seg is just ignored when not in the first
> > > > > > > segment, and there is nothing saying it should be set to 1. Typically,
> > > > > > > rte_pktmbuf_chain() does not change it, and I guess it's the same for
> > > > > > > many similar functions in applications.
> > > > > > >
> > > > > > > Olivier
> > > > > > >
> > > > > > >>
> > > > > > >>
> > > > > > >>> This is expected that mbufs stored in pool have their
> > > > > > >>> nb_seg field set to 1.
> > > > > > >>>
> > > > > > >>> Fixes: 8f094a9ac5d7 ("mbuf: set mbuf fields while in pool")
> > > > > > >>> Cc: stable@dpdk.org
> > > > > > >>>
> > > > > > >>> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> > > > > > >>> ---
> > > > > > >>>  lib/librte_mbuf/rte_mbuf.c |  6 ++----
> > > > > > >>>  lib/librte_mbuf/rte_mbuf.h | 12 ++++--------
> > > > > > >>>  2 files changed, 6 insertions(+), 12 deletions(-)
> > > > > > >>>
> > > > > > >>> diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
> > > > > > >>> index 8a456e5e64..e632071c23 100644
> > > > > > >>> --- a/lib/librte_mbuf/rte_mbuf.c
> > > > > > >>> +++ b/lib/librte_mbuf/rte_mbuf.c
> > > > > > >>> @@ -129,10 +129,8 @@ rte_pktmbuf_free_pinned_extmem(void *addr, void *opaque)
> > > > > > >>>
> > > > > > >>>  	rte_mbuf_ext_refcnt_set(m->shinfo, 1);
> > > > > > >>>  	m->ol_flags = EXT_ATTACHED_MBUF;
> > > > > > >>> -	if (m->next != NULL) {
> > > > > > >>> -		m->next = NULL;
> > > > > > >>> -		m->nb_segs = 1;
> > > > > > >>> -	}
> > > > > > >>> +	m->next = NULL;
> > > > > > >>> +	m->nb_segs = 1;
> > > > > > >>>  	rte_mbuf_raw_free(m);
> > > > > > >>>  }
> > > > > > >>>
> > > > > > >>> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> > > > > > >>> index a1414ed7cd..ef5800c8ef 100644
> > > > > > >>> --- a/lib/librte_mbuf/rte_mbuf.h
> > > > > > >>> +++ b/lib/librte_mbuf/rte_mbuf.h
> > > > > > >>> @@ -1329,10 +1329,8 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > > > > > >>>  				return NULL;
> > > > > > >>>  		}
> > > > > > >>>
> > > > > > >>> -		if (m->next != NULL) {
> > > > > > >>> -			m->next = NULL;
> > > > > > >>> -			m->nb_segs = 1;
> > > > > > >>> -		}
> > > > > > >>> +		m->next = NULL;
> > > > > > >>> +		m->nb_segs = 1;
> > > > > > >>>
> > > > > > >>>  		return m;
> > > > > > >>>
> > > > > > >>> @@ -1346,10 +1344,8 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > > > > > >>>  				return NULL;
> > > > > > >>>  		}
> > > > > > >>>
> > > > > > >>> -		if (m->next != NULL) {
> > > > > > >>> -			m->next = NULL;
> > > > > > >>> -			m->nb_segs = 1;
> > > > > > >>> -		}
> > > > > > >>> +		m->next = NULL;
> > > > > > >>> +		m->nb_segs = 1;
> > > > > > >>>  		rte_mbuf_refcnt_set(m, 1);
> > > > > > >>>
> > > > > > >>>  		return m;
> > > > > > >>> --
> > > > > > >>> 2.25.1
> > > > > > >>
> > > > > >

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dpdk-dev] [PATCH] mbuf: fix reset on mbuf free
  2020-11-05 13:55                 ` Ananyev, Konstantin
@ 2020-11-05 16:30                   ` Morten Brørup
  2020-11-05 23:55                     ` Ananyev, Konstantin
  0 siblings, 1 reply; 74+ messages in thread
From: Morten Brørup @ 2020-11-05 16:30 UTC (permalink / raw)
  To: Ananyev, Konstantin, Olivier Matz; +Cc: Andrew Rybchenko, dev

> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Ananyev,
> Konstantin
> Sent: Thursday, November 5, 2020 2:56 PM
> 
> > >
> > > >
> > > > On Thu, Nov 05, 2020 at 11:34:18AM +0000, Ananyev, Konstantin
> wrote:
> > > > >
> > > > >
> > > > > > On Thu, Nov 05, 2020 at 11:26:51AM +0300, Andrew Rybchenko
> wrote:
> > > > > > > On 11/5/20 10:46 AM, Olivier Matz wrote:
> > > > > > > > On Thu, Nov 05, 2020 at 12:15:49AM +0000, Ananyev,
> Konstantin wrote:
> > > > > > > >>
> > > > > > > >> Hi Olivier,
> > > > > > > >>
> > > > > > > >>> m->nb_seg must be reset on mbuf free whatever the value
> of m->next,
> > > > > > > >>> because it can happen that m->nb_seg is != 1. For
> instance in this
> > > > > > > >>> case:
> > > > > > > >>>
> > > > > > > >>>   m1 = rte_pktmbuf_alloc(mp);
> > > > > > > >>>   rte_pktmbuf_append(m1, 500);
> > > > > > > >>>   m2 = rte_pktmbuf_alloc(mp);
> > > > > > > >>>   rte_pktmbuf_append(m2, 500);
> > > > > > > >>>   rte_pktmbuf_chain(m1, m2);
> > > > > > > >>>   m0 = rte_pktmbuf_alloc(mp);
> > > > > > > >>>   rte_pktmbuf_append(m0, 500);
> > > > > > > >>>   rte_pktmbuf_chain(m0, m1);
> > > > > > > >>>
> > > > > > > >>> As rte_pktmbuf_chain() does not reset nb_seg in the
> initial m1
> > > > > > > >>> segment (this is not required), after this code the
> mbuf chain
> > > > > > > >>> have 3 segments:
> > > > > > > >>>   - m0: next=m1, nb_seg=3
> > > > > > > >>>   - m1: next=m2, nb_seg=2
> > > > > > > >>>   - m2: next=NULL, nb_seg=1
> > > > > > > >>>
> > > > > > > >>> Freeing this mbuf chain will not restore nb_seg=1 in
> the second
> > > > > > > >>> segment.
> > > > > > > >>
> > > > > > > >> Hmm, not sure why is that?
> > > > > > > >> You are talking about freeing m1, right?
> > > > > > > >> rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > > > > > > >> {
> > > > > > > >> 	...
> > > > > > > >> 	if (m->next != NULL) {
> > > > > > > >>                         m->next = NULL;
> > > > > > > >>                         m->nb_segs = 1;
> > > > > > > >>                 }
> > > > > > > >>
> > > > > > > >> m1->next != NULL, so it will enter the if() block,
> > > > > > > >> and will reset both next and nb_segs.
> > > > > > > >> What I am missing here?
> > > > > > > >> Thinking in more generic way, that change:
> > > > > > > >>  -		if (m->next != NULL) {
> > > > > > > >>  -			m->next = NULL;
> > > > > > > >>  -			m->nb_segs = 1;
> > > > > > > >>  -		}
> > > > > > > >>  +		m->next = NULL;
> > > > > > > >>  +		m->nb_segs = 1;
> > > > > > > >
> > > > > > > > Ah, sorry. I oversimplified the example and now it does
> not
> > > > > > > > show the issue...
> > > > > > > >
> > > > > > > > The full example also adds a split() to break the mbuf
> chain
> > > > > > > > between m1 and m2. The kind of thing that would be done
> for
> > > > > > > > software TCP segmentation.
> > > > > > > >
> > > > > > >
> > > > > > > If so, may be the right solution is to care about nb_segs
> > > > > > > when next is set to NULL on split? Any place when next is
> set
> > > > > > > to NULL. Just to keep the optimization in a more generic
> place.
> > > > >
> > > > >
> > > > > > The problem with that approach is that there are already
> several
> > > > > > existing split() or trim() implementations in different DPDK-
> based
> > > > > > applications. For instance, we have some in 6WINDGate. If we
> force
> > > > > > applications to set nb_seg to 1 when resetting next, it has
> to be
> > > > > > documented because it is not straightforward.
> > > > >
> > > > > I think it is better to go that way.
> > > > > From my perspective it seems natural to reset nb_seg at same
> time
> > > > > we reset next, otherwise inconsistency will occur.
> > > >
> > > > While it is not explicitly stated for nb_segs, to me it was clear
> that
> > > > nb_segs is only valid in the first segment, like for many fields
> (port,
> > > > ol_flags, vlan, rss, ...).
> > > >
> > > > If we say that nb_segs has to be valid in any segments, it means
> that
> > > > chain() or split() will have to update it in all segments, which
> is not
> > > > efficient.
> > >
> > > Why in all?
> > > We can state that nb_segs on non-first segment should always equal
> 1.
> > > As I understand in that case, both split() and chain() have to
> update nb_segs
> > > only for head mbufs, rest ones will remain untouched.
> >
> > Well, anyway, I think it's strange to have a constraint on m->nb_segs
> for
> > non-first segment. We don't have that kind of constraints for other
> fields.
> 
> True, we don't. But this is one of the fields we consider critical
> for proper work of mbuf alloc/free mechanism.
> 

I am not sure that requiring m->nb_segs == 1 on non-first segments will provide any benefits.

E.g. the second segment of a three-segment chain will still have m->next != NULL, so it cannot be used as a gate to prevent accessing m->next.

I might have overlooked something, though.

> >
> >
> > >
> > > >
> > > > Saying that nb_segs has to be valid for the first and last
> segment seems
> > > > really odd to me. What is the logic behind that except keeping
> this test
> > > > as is?
> > > >
> > > > In any case, it has to be better documented.
> > > >
> > > >
> > > > Olivier
> > > >
> > > >
> > > > > > I think the approach from
> > > > > > this patch is safer.
> > > > >
> > > > > It might be easier from perspective that changes in less places
> are required,
> > > > > Though I think that this patch will introduce some performance
> drop.
> > > > > As now each mbuf_prefree_seg() will cause update of 2 cache
> lines unconditionally.
> > > > >
> > > > > > By the way, for 21.11, if we are able to do some
> optimizations and have
> > > > > > both pool (index?) and next in the first cache line, we may
> reconsider
> > > > > > the fact that next and nb_segs are already set for new
> allocated mbufs,
> > > > > > because it is not straightforward either.
> > > > >
> > > > > My suggestion - let's put future optimization discussion aside
> for now,
> > > > > and concentrate on that particular patch.
> > > > >
> > > > > >
> > > > > > > > After this operation, we have 2 mbuf chain:
> > > > > > > >  - m0 with 2 segments, the last one has next=NULL but
> nb_seg=2
> > > > > > > >  - new_m with 1 segment
> > > > > > > >
> > > > > > > > Freeing m0 will not restore nb_seg=1 in the second
> segment.
> > > > > > > >
> > > > > > > >> Assumes that it is ok to have an mbuf with
> > > > > > > >> nb_seg > 1 and next == NULL.
> > > > > > > >> Which seems wrong to me.
> > > > > > > >
> > > > > > > > I don't think it is wrong: nb_seg is just ignored when
> not in the first
> > > > > > > > segment, and there is nothing saying it should be set to
> 1. Typically,
> > > > > > > > rte_pktmbuf_chain() does not change it, and I guess it's
> the same for
> > > > > > > > many similar functions in applications.
> > > > > > > >
> > > > > > > > Olivier
> > > > > > > >
> > > > > > > >>
> > > > > > > >>
> > > > > > > >>> This is expected that mbufs stored in pool have their
> > > > > > > >>> nb_seg field set to 1.
> > > > > > > >>>
> > > > > > > >>> Fixes: 8f094a9ac5d7 ("mbuf: set mbuf fields while in
> pool")
> > > > > > > >>> Cc: stable@dpdk.org
> > > > > > > >>>
> > > > > > > >>> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> > > > > > > >>> ---
> > > > > > > >>>  lib/librte_mbuf/rte_mbuf.c |  6 ++----
> > > > > > > >>>  lib/librte_mbuf/rte_mbuf.h | 12 ++++--------
> > > > > > > >>>  2 files changed, 6 insertions(+), 12 deletions(-)
> > > > > > > >>>
> > > > > > > >>> diff --git a/lib/librte_mbuf/rte_mbuf.c
> b/lib/librte_mbuf/rte_mbuf.c
> > > > > > > >>> index 8a456e5e64..e632071c23 100644
> > > > > > > >>> --- a/lib/librte_mbuf/rte_mbuf.c
> > > > > > > >>> +++ b/lib/librte_mbuf/rte_mbuf.c
> > > > > > > >>> @@ -129,10 +129,8 @@
> rte_pktmbuf_free_pinned_extmem(void *addr, void *opaque)
> > > > > > > >>>
> > > > > > > >>>  	rte_mbuf_ext_refcnt_set(m->shinfo, 1);
> > > > > > > >>>  	m->ol_flags = EXT_ATTACHED_MBUF;
> > > > > > > >>> -	if (m->next != NULL) {
> > > > > > > >>> -		m->next = NULL;
> > > > > > > >>> -		m->nb_segs = 1;
> > > > > > > >>> -	}
> > > > > > > >>> +	m->next = NULL;
> > > > > > > >>> +	m->nb_segs = 1;
> > > > > > > >>>  	rte_mbuf_raw_free(m);
> > > > > > > >>>  }
> > > > > > > >>>
> > > > > > > >>> diff --git a/lib/librte_mbuf/rte_mbuf.h
> b/lib/librte_mbuf/rte_mbuf.h
> > > > > > > >>> index a1414ed7cd..ef5800c8ef 100644
> > > > > > > >>> --- a/lib/librte_mbuf/rte_mbuf.h
> > > > > > > >>> +++ b/lib/librte_mbuf/rte_mbuf.h
> > > > > > > >>> @@ -1329,10 +1329,8 @@ rte_pktmbuf_prefree_seg(struct
> rte_mbuf *m)
> > > > > > > >>>  				return NULL;
> > > > > > > >>>  		}
> > > > > > > >>>
> > > > > > > >>> -		if (m->next != NULL) {
> > > > > > > >>> -			m->next = NULL;
> > > > > > > >>> -			m->nb_segs = 1;
> > > > > > > >>> -		}
> > > > > > > >>> +		m->next = NULL;
> > > > > > > >>> +		m->nb_segs = 1;
> > > > > > > >>>
> > > > > > > >>>  		return m;
> > > > > > > >>>
> > > > > > > >>> @@ -1346,10 +1344,8 @@ rte_pktmbuf_prefree_seg(struct
> rte_mbuf *m)
> > > > > > > >>>  				return NULL;
> > > > > > > >>>  		}
> > > > > > > >>>
> > > > > > > >>> -		if (m->next != NULL) {
> > > > > > > >>> -			m->next = NULL;
> > > > > > > >>> -			m->nb_segs = 1;
> > > > > > > >>> -		}
> > > > > > > >>> +		m->next = NULL;
> > > > > > > >>> +		m->nb_segs = 1;
> > > > > > > >>>  		rte_mbuf_refcnt_set(m, 1);
> > > > > > > >>>
> > > > > > > >>>  		return m;
> > > > > > > >>> --
> > > > > > > >>> 2.25.1
> > > > > > > >>
> > > > > > >


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dpdk-dev] [PATCH] mbuf: fix reset on mbuf free
  2020-11-05 16:30                   ` Morten Brørup
@ 2020-11-05 23:55                     ` Ananyev, Konstantin
  2020-11-06  7:52                       ` Morten Brørup
  0 siblings, 1 reply; 74+ messages in thread
From: Ananyev, Konstantin @ 2020-11-05 23:55 UTC (permalink / raw)
  To: Morten Brørup, Olivier Matz; +Cc: Andrew Rybchenko, dev



> > > > > > > > >> Hi Olivier,
> > > > > > > > >>
> > > > > > > > >>> m->nb_seg must be reset on mbuf free whatever the value
> > of m->next,
> > > > > > > > >>> because it can happen that m->nb_seg is != 1. For
> > instance in this
> > > > > > > > >>> case:
> > > > > > > > >>>
> > > > > > > > >>>   m1 = rte_pktmbuf_alloc(mp);
> > > > > > > > >>>   rte_pktmbuf_append(m1, 500);
> > > > > > > > >>>   m2 = rte_pktmbuf_alloc(mp);
> > > > > > > > >>>   rte_pktmbuf_append(m2, 500);
> > > > > > > > >>>   rte_pktmbuf_chain(m1, m2);
> > > > > > > > >>>   m0 = rte_pktmbuf_alloc(mp);
> > > > > > > > >>>   rte_pktmbuf_append(m0, 500);
> > > > > > > > >>>   rte_pktmbuf_chain(m0, m1);
> > > > > > > > >>>
> > > > > > > > >>> As rte_pktmbuf_chain() does not reset nb_seg in the
> > initial m1
> > > > > > > > >>> segment (this is not required), after this code the
> > mbuf chain
> > > > > > > > >>> have 3 segments:
> > > > > > > > >>>   - m0: next=m1, nb_seg=3
> > > > > > > > >>>   - m1: next=m2, nb_seg=2
> > > > > > > > >>>   - m2: next=NULL, nb_seg=1
> > > > > > > > >>>
> > > > > > > > >>> Freeing this mbuf chain will not restore nb_seg=1 in
> > the second
> > > > > > > > >>> segment.
> > > > > > > > >>
> > > > > > > > >> Hmm, not sure why is that?
> > > > > > > > >> You are talking about freeing m1, right?
> > > > > > > > >> rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > > > > > > > >> {
> > > > > > > > >> 	...
> > > > > > > > >> 	if (m->next != NULL) {
> > > > > > > > >>                         m->next = NULL;
> > > > > > > > >>                         m->nb_segs = 1;
> > > > > > > > >>                 }
> > > > > > > > >>
> > > > > > > > >> m1->next != NULL, so it will enter the if() block,
> > > > > > > > >> and will reset both next and nb_segs.
> > > > > > > > >> What I am missing here?
> > > > > > > > >> Thinking in more generic way, that change:
> > > > > > > > >>  -		if (m->next != NULL) {
> > > > > > > > >>  -			m->next = NULL;
> > > > > > > > >>  -			m->nb_segs = 1;
> > > > > > > > >>  -		}
> > > > > > > > >>  +		m->next = NULL;
> > > > > > > > >>  +		m->nb_segs = 1;
> > > > > > > > >
> > > > > > > > > Ah, sorry. I oversimplified the example and now it does
> > not
> > > > > > > > > show the issue...
> > > > > > > > >
> > > > > > > > > The full example also adds a split() to break the mbuf
> > chain
> > > > > > > > > between m1 and m2. The kind of thing that would be done
> > for
> > > > > > > > > software TCP segmentation.
> > > > > > > > >
> > > > > > > >
> > > > > > > > If so, may be the right solution is to care about nb_segs
> > > > > > > > when next is set to NULL on split? Any place when next is
> > set
> > > > > > > > to NULL. Just to keep the optimization in a more generic
> > place.
> > > > > >
> > > > > >
> > > > > > > The problem with that approach is that there are already
> > several
> > > > > > > existing split() or trim() implementations in different DPDK-
> > based
> > > > > > > applications. For instance, we have some in 6WINDGate. If we
> > force
> > > > > > > applications to set nb_seg to 1 when resetting next, it has
> > to be
> > > > > > > documented because it is not straightforward.
> > > > > >
> > > > > > I think it is better to go that way.
> > > > > > From my perspective it seems natural to reset nb_seg at same
> > time
> > > > > > we reset next, otherwise inconsistency will occur.
> > > > >
> > > > > While it is not explicitly stated for nb_segs, to me it was clear
> > that
> > > > > nb_segs is only valid in the first segment, like for many fields
> > (port,
> > > > > ol_flags, vlan, rss, ...).
> > > > >
> > > > > If we say that nb_segs has to be valid in any segments, it means
> > that
> > > > > chain() or split() will have to update it in all segments, which
> > is not
> > > > > efficient.
> > > >
> > > > Why in all?
> > > > We can state that nb_segs on non-first segment should always equal
> > 1.
> > > > As I understand in that case, both split() and chain() have to
> > update nb_segs
> > > > only for head mbufs, rest ones will remain untouched.
> > >
> > > Well, anyway, I think it's strange to have a constraint on m->nb_segs
> > for
> > > non-first segment. We don't have that kind of constraints for other
> > fields.
> >
> > True, we don't. But this is one of the fields we consider critical
> > for proper work of mbuf alloc/free mechanism.
> >
> 
> I am not sure that requiring m->nb_segs == 1 on non-first segments will provide any benefits.

It would make this patch unneeded.
So, for direct, non-segmented mbufs  pktmbuf_free() will remain write-free.

> 
> E.g. the second segment of a three-segment chain will still have m->next != NULL, so it cannot be used as a gate to prevent accessing m-
> >next.
> 
> I might have overlooked something, though.
> 
> > >
> > >
> > > >
> > > > >
> > > > > Saying that nb_segs has to be valid for the first and last
> > segment seems
> > > > > really odd to me. What is the logic behind that except keeping
> > this test
> > > > > as is?
> > > > >
> > > > > In any case, it has to be better documented.
> > > > >
> > > > >
> > > > > Olivier
> > > > >
> > > > >
> > > > > > > I think the approach from
> > > > > > > this patch is safer.
> > > > > >
> > > > > > It might be easier from perspective that changes in less places
> > are required,
> > > > > > Though I think that this patch will introduce some performance
> > drop.
> > > > > > As now each mbuf_prefree_seg() will cause update of 2 cache
> > lines unconditionally.
> > > > > >
> > > > > > > By the way, for 21.11, if we are able to do some
> > optimizations and have
> > > > > > > both pool (index?) and next in the first cache line, we may
> > reconsider
> > > > > > > the fact that next and nb_segs are already set for new
> > allocated mbufs,
> > > > > > > because it is not straightforward either.
> > > > > >
> > > > > > My suggestion - let's put future optimization discussion aside
> > for now,
> > > > > > and concentrate on that particular patch.
> > > > > >
> > > > > > >
> > > > > > > > > After this operation, we have 2 mbuf chain:
> > > > > > > > >  - m0 with 2 segments, the last one has next=NULL but
> > nb_seg=2
> > > > > > > > >  - new_m with 1 segment
> > > > > > > > >
> > > > > > > > > Freeing m0 will not restore nb_seg=1 in the second
> > segment.
> > > > > > > > >
> > > > > > > > >> Assumes that it is ok to have an mbuf with
> > > > > > > > >> nb_seg > 1 and next == NULL.
> > > > > > > > >> Which seems wrong to me.
> > > > > > > > >
> > > > > > > > > I don't think it is wrong: nb_seg is just ignored when
> > not in the first
> > > > > > > > > segment, and there is nothing saying it should be set to
> > 1. Typically,
> > > > > > > > > rte_pktmbuf_chain() does not change it, and I guess it's
> > the same for
> > > > > > > > > many similar functions in applications.
> > > > > > > > >
> > > > > > > > > Olivier
> > > > > > > > >
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >>> This is expected that mbufs stored in pool have their
> > > > > > > > >>> nb_seg field set to 1.
> > > > > > > > >>>
> > > > > > > > >>> Fixes: 8f094a9ac5d7 ("mbuf: set mbuf fields while in
> > pool")
> > > > > > > > >>> Cc: stable@dpdk.org
> > > > > > > > >>>
> > > > > > > > >>> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> > > > > > > > >>> ---
> > > > > > > > >>>  lib/librte_mbuf/rte_mbuf.c |  6 ++----
> > > > > > > > >>>  lib/librte_mbuf/rte_mbuf.h | 12 ++++--------
> > > > > > > > >>>  2 files changed, 6 insertions(+), 12 deletions(-)
> > > > > > > > >>>
> > > > > > > > >>> diff --git a/lib/librte_mbuf/rte_mbuf.c
> > b/lib/librte_mbuf/rte_mbuf.c
> > > > > > > > >>> index 8a456e5e64..e632071c23 100644
> > > > > > > > >>> --- a/lib/librte_mbuf/rte_mbuf.c
> > > > > > > > >>> +++ b/lib/librte_mbuf/rte_mbuf.c
> > > > > > > > >>> @@ -129,10 +129,8 @@
> > rte_pktmbuf_free_pinned_extmem(void *addr, void *opaque)
> > > > > > > > >>>
> > > > > > > > >>>  	rte_mbuf_ext_refcnt_set(m->shinfo, 1);
> > > > > > > > >>>  	m->ol_flags = EXT_ATTACHED_MBUF;
> > > > > > > > >>> -	if (m->next != NULL) {
> > > > > > > > >>> -		m->next = NULL;
> > > > > > > > >>> -		m->nb_segs = 1;
> > > > > > > > >>> -	}
> > > > > > > > >>> +	m->next = NULL;
> > > > > > > > >>> +	m->nb_segs = 1;
> > > > > > > > >>>  	rte_mbuf_raw_free(m);
> > > > > > > > >>>  }
> > > > > > > > >>>
> > > > > > > > >>> diff --git a/lib/librte_mbuf/rte_mbuf.h
> > b/lib/librte_mbuf/rte_mbuf.h
> > > > > > > > >>> index a1414ed7cd..ef5800c8ef 100644
> > > > > > > > >>> --- a/lib/librte_mbuf/rte_mbuf.h
> > > > > > > > >>> +++ b/lib/librte_mbuf/rte_mbuf.h
> > > > > > > > >>> @@ -1329,10 +1329,8 @@ rte_pktmbuf_prefree_seg(struct
> > rte_mbuf *m)
> > > > > > > > >>>  				return NULL;
> > > > > > > > >>>  		}
> > > > > > > > >>>
> > > > > > > > >>> -		if (m->next != NULL) {
> > > > > > > > >>> -			m->next = NULL;
> > > > > > > > >>> -			m->nb_segs = 1;
> > > > > > > > >>> -		}
> > > > > > > > >>> +		m->next = NULL;
> > > > > > > > >>> +		m->nb_segs = 1;
> > > > > > > > >>>
> > > > > > > > >>>  		return m;
> > > > > > > > >>>
> > > > > > > > >>> @@ -1346,10 +1344,8 @@ rte_pktmbuf_prefree_seg(struct
> > rte_mbuf *m)
> > > > > > > > >>>  				return NULL;
> > > > > > > > >>>  		}
> > > > > > > > >>>
> > > > > > > > >>> -		if (m->next != NULL) {
> > > > > > > > >>> -			m->next = NULL;
> > > > > > > > >>> -			m->nb_segs = 1;
> > > > > > > > >>> -		}
> > > > > > > > >>> +		m->next = NULL;
> > > > > > > > >>> +		m->nb_segs = 1;
> > > > > > > > >>>  		rte_mbuf_refcnt_set(m, 1);
> > > > > > > > >>>
> > > > > > > > >>>  		return m;
> > > > > > > > >>> --
> > > > > > > > >>> 2.25.1
> > > > > > > > >>
> > > > > > > >


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dpdk-dev] [PATCH] mbuf: fix reset on mbuf free
  2020-11-05 23:55                     ` Ananyev, Konstantin
@ 2020-11-06  7:52                       ` Morten Brørup
  2020-11-06  8:20                         ` Olivier Matz
  0 siblings, 1 reply; 74+ messages in thread
From: Morten Brørup @ 2020-11-06  7:52 UTC (permalink / raw)
  To: Ananyev, Konstantin, Olivier Matz; +Cc: Andrew Rybchenko, dev

> From: Ananyev, Konstantin [mailto:konstantin.ananyev@intel.com]
> Sent: Friday, November 6, 2020 12:55 AM
> 
> > > > > > > > > >> Hi Olivier,
> > > > > > > > > >>
> > > > > > > > > >>> m->nb_seg must be reset on mbuf free whatever the
> value
> > > of m->next,
> > > > > > > > > >>> because it can happen that m->nb_seg is != 1. For
> > > instance in this
> > > > > > > > > >>> case:
> > > > > > > > > >>>
> > > > > > > > > >>>   m1 = rte_pktmbuf_alloc(mp);
> > > > > > > > > >>>   rte_pktmbuf_append(m1, 500);
> > > > > > > > > >>>   m2 = rte_pktmbuf_alloc(mp);
> > > > > > > > > >>>   rte_pktmbuf_append(m2, 500);
> > > > > > > > > >>>   rte_pktmbuf_chain(m1, m2);
> > > > > > > > > >>>   m0 = rte_pktmbuf_alloc(mp);
> > > > > > > > > >>>   rte_pktmbuf_append(m0, 500);
> > > > > > > > > >>>   rte_pktmbuf_chain(m0, m1);
> > > > > > > > > >>>
> > > > > > > > > >>> As rte_pktmbuf_chain() does not reset nb_seg in the
> > > initial m1
> > > > > > > > > >>> segment (this is not required), after this code the
> > > mbuf chain
> > > > > > > > > >>> have 3 segments:
> > > > > > > > > >>>   - m0: next=m1, nb_seg=3
> > > > > > > > > >>>   - m1: next=m2, nb_seg=2
> > > > > > > > > >>>   - m2: next=NULL, nb_seg=1
> > > > > > > > > >>>
> > > > > > > > > >>> Freeing this mbuf chain will not restore nb_seg=1
> in
> > > the second
> > > > > > > > > >>> segment.
> > > > > > > > > >>
> > > > > > > > > >> Hmm, not sure why is that?
> > > > > > > > > >> You are talking about freeing m1, right?
> > > > > > > > > >> rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > > > > > > > > >> {
> > > > > > > > > >> 	...
> > > > > > > > > >> 	if (m->next != NULL) {
> > > > > > > > > >>                         m->next = NULL;
> > > > > > > > > >>                         m->nb_segs = 1;
> > > > > > > > > >>                 }
> > > > > > > > > >>
> > > > > > > > > >> m1->next != NULL, so it will enter the if() block,
> > > > > > > > > >> and will reset both next and nb_segs.
> > > > > > > > > >> What I am missing here?
> > > > > > > > > >> Thinking in more generic way, that change:
> > > > > > > > > >>  -		if (m->next != NULL) {
> > > > > > > > > >>  -			m->next = NULL;
> > > > > > > > > >>  -			m->nb_segs = 1;
> > > > > > > > > >>  -		}
> > > > > > > > > >>  +		m->next = NULL;
> > > > > > > > > >>  +		m->nb_segs = 1;
> > > > > > > > > >
> > > > > > > > > > Ah, sorry. I oversimplified the example and now it
> does
> > > not
> > > > > > > > > > show the issue...
> > > > > > > > > >
> > > > > > > > > > The full example also adds a split() to break the
> mbuf
> > > chain
> > > > > > > > > > between m1 and m2. The kind of thing that would be
> done
> > > for
> > > > > > > > > > software TCP segmentation.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > If so, may be the right solution is to care about
> nb_segs
> > > > > > > > > when next is set to NULL on split? Any place when next
> is
> > > set
> > > > > > > > > to NULL. Just to keep the optimization in a more
> generic
> > > place.
> > > > > > >
> > > > > > >
> > > > > > > > The problem with that approach is that there are already
> > > several
> > > > > > > > existing split() or trim() implementations in different
> DPDK-
> > > based
> > > > > > > > applications. For instance, we have some in 6WINDGate. If
> we
> > > force
> > > > > > > > applications to set nb_seg to 1 when resetting next, it
> has
> > > to be
> > > > > > > > documented because it is not straightforward.
> > > > > > >
> > > > > > > I think it is better to go that way.
> > > > > > > From my perspective it seems natural to reset nb_seg at
> same
> > > time
> > > > > > > we reset next, otherwise inconsistency will occur.
> > > > > >
> > > > > > While it is not explicitly stated for nb_segs, to me it was
> clear
> > > that
> > > > > > nb_segs is only valid in the first segment, like for many
> fields
> > > (port,
> > > > > > ol_flags, vlan, rss, ...).
> > > > > >
> > > > > > If we say that nb_segs has to be valid in any segments, it
> means
> > > that
> > > > > > chain() or split() will have to update it in all segments,
> which
> > > is not
> > > > > > efficient.
> > > > >
> > > > > Why in all?
> > > > > We can state that nb_segs on non-first segment should always
> equal
> > > 1.
> > > > > As I understand in that case, both split() and chain() have to
> > > update nb_segs
> > > > > only for head mbufs, rest ones will remain untouched.
> > > >
> > > > Well, anyway, I think it's strange to have a constraint on m-
> >nb_segs
> > > for
> > > > non-first segment. We don't have that kind of constraints for
> other
> > > fields.
> > >
> > > True, we don't. But this is one of the fields we consider critical
> > > for proper work of mbuf alloc/free mechanism.
> > >
> >
> > I am not sure that requiring m->nb_segs == 1 on non-first segments
> will provide any benefits.
> 
> It would make this patch unneeded.
> So, for direct, non-segmented mbufs  pktmbuf_free() will remain write-
> free.

I see. Then I agree with Konstantin that alternative solutions should be considered.

The benefit regarding free()'ing non-segmented mbufs - which is a very common operation - certainly outweighs the cost of requiring split()/chain() operations to set the new head mbuf's nb_segs = 1.

Nonetheless, the bug needs to be fixed somehow.

If we can't come up with a better solution that doesn't break the ABI, we are forced to accept the patch.

Unless the techboard accepts to break the ABI in order to avoid the performance cost of this patch.

> 
> >
> > E.g. the second segment of a three-segment chain will still have m-
> >next != NULL, so it cannot be used as a gate to prevent accessing m-
> > >next.
> >
> > I might have overlooked something, though.
> >
> > > >
> > > >
> > > > >
> > > > > >
> > > > > > Saying that nb_segs has to be valid for the first and last
> > > segment seems
> > > > > > really odd to me. What is the logic behind that except
> keeping
> > > this test
> > > > > > as is?
> > > > > >
> > > > > > In any case, it has to be better documented.
> > > > > >
> > > > > >
> > > > > > Olivier
> > > > > >
> > > > > >
> > > > > > > > I think the approach from
> > > > > > > > this patch is safer.
> > > > > > >
> > > > > > > It might be easier from perspective that changes in less
> places
> > > are required,
> > > > > > > Though I think that this patch will introduce some
> performance
> > > drop.
> > > > > > > As now each mbuf_prefree_seg() will cause update of 2 cache
> > > lines unconditionally.
> > > > > > >
> > > > > > > > By the way, for 21.11, if we are able to do some
> > > optimizations and have
> > > > > > > > both pool (index?) and next in the first cache line, we
> may
> > > reconsider
> > > > > > > > the fact that next and nb_segs are already set for new
> > > allocated mbufs,
> > > > > > > > because it is not straightforward either.
> > > > > > >
> > > > > > > My suggestion - let's put future optimization discussion
> aside
> > > for now,
> > > > > > > and concentrate on that particular patch.
> > > > > > >
> > > > > > > >
> > > > > > > > > > After this operation, we have 2 mbuf chain:
> > > > > > > > > >  - m0 with 2 segments, the last one has next=NULL but
> > > nb_seg=2
> > > > > > > > > >  - new_m with 1 segment
> > > > > > > > > >
> > > > > > > > > > Freeing m0 will not restore nb_seg=1 in the second
> > > segment.
> > > > > > > > > >
> > > > > > > > > >> Assumes that it is ok to have an mbuf with
> > > > > > > > > >> nb_seg > 1 and next == NULL.
> > > > > > > > > >> Which seems wrong to me.
> > > > > > > > > >
> > > > > > > > > > I don't think it is wrong: nb_seg is just ignored
> when
> > > not in the first
> > > > > > > > > > segment, and there is nothing saying it should be set
> to
> > > 1. Typically,
> > > > > > > > > > rte_pktmbuf_chain() does not change it, and I guess
> it's
> > > the same for
> > > > > > > > > > many similar functions in applications.
> > > > > > > > > >
> > > > > > > > > > Olivier
> > > > > > > > > >
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > >>> This is expected that mbufs stored in pool have
> their
> > > > > > > > > >>> nb_seg field set to 1.
> > > > > > > > > >>>
> > > > > > > > > >>> Fixes: 8f094a9ac5d7 ("mbuf: set mbuf fields while
> in
> > > pool")
> > > > > > > > > >>> Cc: stable@dpdk.org
> > > > > > > > > >>>
> > > > > > > > > >>> Signed-off-by: Olivier Matz
> <olivier.matz@6wind.com>
> > > > > > > > > >>> ---
> > > > > > > > > >>>  lib/librte_mbuf/rte_mbuf.c |  6 ++----
> > > > > > > > > >>>  lib/librte_mbuf/rte_mbuf.h | 12 ++++--------
> > > > > > > > > >>>  2 files changed, 6 insertions(+), 12 deletions(-)
> > > > > > > > > >>>
> > > > > > > > > >>> diff --git a/lib/librte_mbuf/rte_mbuf.c
> > > b/lib/librte_mbuf/rte_mbuf.c
> > > > > > > > > >>> index 8a456e5e64..e632071c23 100644
> > > > > > > > > >>> --- a/lib/librte_mbuf/rte_mbuf.c
> > > > > > > > > >>> +++ b/lib/librte_mbuf/rte_mbuf.c
> > > > > > > > > >>> @@ -129,10 +129,8 @@
> > > rte_pktmbuf_free_pinned_extmem(void *addr, void *opaque)
> > > > > > > > > >>>
> > > > > > > > > >>>  	rte_mbuf_ext_refcnt_set(m->shinfo, 1);
> > > > > > > > > >>>  	m->ol_flags = EXT_ATTACHED_MBUF;
> > > > > > > > > >>> -	if (m->next != NULL) {
> > > > > > > > > >>> -		m->next = NULL;
> > > > > > > > > >>> -		m->nb_segs = 1;
> > > > > > > > > >>> -	}
> > > > > > > > > >>> +	m->next = NULL;
> > > > > > > > > >>> +	m->nb_segs = 1;
> > > > > > > > > >>>  	rte_mbuf_raw_free(m);
> > > > > > > > > >>>  }
> > > > > > > > > >>>
> > > > > > > > > >>> diff --git a/lib/librte_mbuf/rte_mbuf.h
> > > b/lib/librte_mbuf/rte_mbuf.h
> > > > > > > > > >>> index a1414ed7cd..ef5800c8ef 100644
> > > > > > > > > >>> --- a/lib/librte_mbuf/rte_mbuf.h
> > > > > > > > > >>> +++ b/lib/librte_mbuf/rte_mbuf.h
> > > > > > > > > >>> @@ -1329,10 +1329,8 @@
> rte_pktmbuf_prefree_seg(struct
> > > rte_mbuf *m)
> > > > > > > > > >>>  				return NULL;
> > > > > > > > > >>>  		}
> > > > > > > > > >>>
> > > > > > > > > >>> -		if (m->next != NULL) {
> > > > > > > > > >>> -			m->next = NULL;
> > > > > > > > > >>> -			m->nb_segs = 1;
> > > > > > > > > >>> -		}
> > > > > > > > > >>> +		m->next = NULL;
> > > > > > > > > >>> +		m->nb_segs = 1;
> > > > > > > > > >>>
> > > > > > > > > >>>  		return m;
> > > > > > > > > >>>
> > > > > > > > > >>> @@ -1346,10 +1344,8 @@
> rte_pktmbuf_prefree_seg(struct
> > > rte_mbuf *m)
> > > > > > > > > >>>  				return NULL;
> > > > > > > > > >>>  		}
> > > > > > > > > >>>
> > > > > > > > > >>> -		if (m->next != NULL) {
> > > > > > > > > >>> -			m->next = NULL;
> > > > > > > > > >>> -			m->nb_segs = 1;
> > > > > > > > > >>> -		}
> > > > > > > > > >>> +		m->next = NULL;
> > > > > > > > > >>> +		m->nb_segs = 1;
> > > > > > > > > >>>  		rte_mbuf_refcnt_set(m, 1);
> > > > > > > > > >>>
> > > > > > > > > >>>  		return m;
> > > > > > > > > >>> --
> > > > > > > > > >>> 2.25.1
> > > > > > > > > >>
> > > > > > > > >


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dpdk-dev] [PATCH] mbuf: fix reset on mbuf free
  2020-11-06  7:52                       ` Morten Brørup
@ 2020-11-06  8:20                         ` Olivier Matz
  2020-11-06  8:50                           ` Morten Brørup
  0 siblings, 1 reply; 74+ messages in thread
From: Olivier Matz @ 2020-11-06  8:20 UTC (permalink / raw)
  To: Morten Brørup; +Cc: Ananyev, Konstantin, Andrew Rybchenko, dev

On Fri, Nov 06, 2020 at 08:52:58AM +0100, Morten Brørup wrote:
> > From: Ananyev, Konstantin [mailto:konstantin.ananyev@intel.com]
> > Sent: Friday, November 6, 2020 12:55 AM
> > 
> > > > > > > > > > >> Hi Olivier,
> > > > > > > > > > >>
> > > > > > > > > > >>> m->nb_seg must be reset on mbuf free whatever the
> > value
> > > > of m->next,
> > > > > > > > > > >>> because it can happen that m->nb_seg is != 1. For
> > > > instance in this
> > > > > > > > > > >>> case:
> > > > > > > > > > >>>
> > > > > > > > > > >>>   m1 = rte_pktmbuf_alloc(mp);
> > > > > > > > > > >>>   rte_pktmbuf_append(m1, 500);
> > > > > > > > > > >>>   m2 = rte_pktmbuf_alloc(mp);
> > > > > > > > > > >>>   rte_pktmbuf_append(m2, 500);
> > > > > > > > > > >>>   rte_pktmbuf_chain(m1, m2);
> > > > > > > > > > >>>   m0 = rte_pktmbuf_alloc(mp);
> > > > > > > > > > >>>   rte_pktmbuf_append(m0, 500);
> > > > > > > > > > >>>   rte_pktmbuf_chain(m0, m1);
> > > > > > > > > > >>>
> > > > > > > > > > >>> As rte_pktmbuf_chain() does not reset nb_seg in the
> > > > initial m1
> > > > > > > > > > >>> segment (this is not required), after this code the
> > > > mbuf chain
> > > > > > > > > > >>> have 3 segments:
> > > > > > > > > > >>>   - m0: next=m1, nb_seg=3
> > > > > > > > > > >>>   - m1: next=m2, nb_seg=2
> > > > > > > > > > >>>   - m2: next=NULL, nb_seg=1
> > > > > > > > > > >>>
> > > > > > > > > > >>> Freeing this mbuf chain will not restore nb_seg=1
> > in
> > > > the second
> > > > > > > > > > >>> segment.
> > > > > > > > > > >>
> > > > > > > > > > >> Hmm, not sure why is that?
> > > > > > > > > > >> You are talking about freeing m1, right?
> > > > > > > > > > >> rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > > > > > > > > > >> {
> > > > > > > > > > >> 	...
> > > > > > > > > > >> 	if (m->next != NULL) {
> > > > > > > > > > >>                         m->next = NULL;
> > > > > > > > > > >>                         m->nb_segs = 1;
> > > > > > > > > > >>                 }
> > > > > > > > > > >>
> > > > > > > > > > >> m1->next != NULL, so it will enter the if() block,
> > > > > > > > > > >> and will reset both next and nb_segs.
> > > > > > > > > > >> What I am missing here?
> > > > > > > > > > >> Thinking in more generic way, that change:
> > > > > > > > > > >>  -		if (m->next != NULL) {
> > > > > > > > > > >>  -			m->next = NULL;
> > > > > > > > > > >>  -			m->nb_segs = 1;
> > > > > > > > > > >>  -		}
> > > > > > > > > > >>  +		m->next = NULL;
> > > > > > > > > > >>  +		m->nb_segs = 1;
> > > > > > > > > > >
> > > > > > > > > > > Ah, sorry. I oversimplified the example and now it
> > does
> > > > not
> > > > > > > > > > > show the issue...
> > > > > > > > > > >
> > > > > > > > > > > The full example also adds a split() to break the
> > mbuf
> > > > chain
> > > > > > > > > > > between m1 and m2. The kind of thing that would be
> > done
> > > > for
> > > > > > > > > > > software TCP segmentation.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > If so, may be the right solution is to care about
> > nb_segs
> > > > > > > > > > when next is set to NULL on split? Any place when next
> > is
> > > > set
> > > > > > > > > > to NULL. Just to keep the optimization in a more
> > generic
> > > > place.
> > > > > > > >
> > > > > > > >
> > > > > > > > > The problem with that approach is that there are already
> > > > several
> > > > > > > > > existing split() or trim() implementations in different
> > DPDK-
> > > > based
> > > > > > > > > applications. For instance, we have some in 6WINDGate. If
> > we
> > > > force
> > > > > > > > > applications to set nb_seg to 1 when resetting next, it
> > has
> > > > to be
> > > > > > > > > documented because it is not straightforward.
> > > > > > > >
> > > > > > > > I think it is better to go that way.
> > > > > > > > From my perspective it seems natural to reset nb_seg at
> > same
> > > > time
> > > > > > > > we reset next, otherwise inconsistency will occur.
> > > > > > >
> > > > > > > While it is not explicitly stated for nb_segs, to me it was
> > clear
> > > > that
> > > > > > > nb_segs is only valid in the first segment, like for many
> > fields
> > > > (port,
> > > > > > > ol_flags, vlan, rss, ...).
> > > > > > >
> > > > > > > If we say that nb_segs has to be valid in any segments, it
> > means
> > > > that
> > > > > > > chain() or split() will have to update it in all segments,
> > which
> > > > is not
> > > > > > > efficient.
> > > > > >
> > > > > > Why in all?
> > > > > > We can state that nb_segs on non-first segment should always
> > equal
> > > > 1.
> > > > > > As I understand in that case, both split() and chain() have to
> > > > update nb_segs
> > > > > > only for head mbufs, rest ones will remain untouched.
> > > > >
> > > > > Well, anyway, I think it's strange to have a constraint on m-
> > >nb_segs
> > > > for
> > > > > non-first segment. We don't have that kind of constraints for
> > other
> > > > fields.
> > > >
> > > > True, we don't. But this is one of the fields we consider critical
> > > > for proper work of mbuf alloc/free mechanism.
> > > >
> > >
> > > I am not sure that requiring m->nb_segs == 1 on non-first segments
> > will provide any benefits.
> > 
> > It would make this patch unneeded.
> > So, for direct, non-segmented mbufs  pktmbuf_free() will remain write-
> > free.
> 
> I see. Then I agree with Konstantin that alternative solutions should be considered.
> 
> The benefit regarding free()'ing non-segmented mbufs - which is a very common operation - certainly outweighs the cost of requiring split()/chain() operations to set the new head mbuf's nb_segs = 1.
> 
> Nonetheless, the bug needs to be fixed somehow.
> 
> If we can't come up with a better solution that doesn't break the ABI, we are forced to accept the patch.
> 
> Unless the techboard accepts to break the ABI in order to avoid the performance cost of this patch.

Did someone notice a performance drop with this patch?
On my side, I don't see any regression on a L3 use case.

Let's sumarize: splitting a mbuf chain and freeing it causes subsequent mbuf
allocation to return a mbuf which is not correctly initialized. There are 2
options to fix it:

1/ change the mbuf free function (this patch)

   - m->nb_segs would behave like many other field: valid in the first
     segment, ignored in other segments
   - may impact performance (suspected)

2/ change all places where a mbuf chain is split, or trimmed

   - m->nb_segs would have a specific behavior: count the number of
     segments in the first mbuf, should be 1 in the last segment,
     ignored in other ones.
   - no code change in mbuf library, so no performance impact
   - need to patch all places where we do a mbuf split or trim. From afar,
     I see at least mbuf_cut_seg_ofs() in DPDK. Some external applications
     may have to be patched (for instance, I already found 3 places in
     6WIND code base without a deep search).

In my opinion, 1/ is better, except we notice a significant performance,
because the (implicit) behavior is unchanged.

Whatever the solution, some documentation has to be added.

Olivier


> 
> > 
> > >
> > > E.g. the second segment of a three-segment chain will still have m-
> > >next != NULL, so it cannot be used as a gate to prevent accessing m-
> > > >next.
> > >
> > > I might have overlooked something, though.
> > >
> > > > >
> > > > >
> > > > > >
> > > > > > >
> > > > > > > Saying that nb_segs has to be valid for the first and last
> > > > segment seems
> > > > > > > really odd to me. What is the logic behind that except
> > keeping
> > > > this test
> > > > > > > as is?
> > > > > > >
> > > > > > > In any case, it has to be better documented.
> > > > > > >
> > > > > > >
> > > > > > > Olivier
> > > > > > >
> > > > > > >
> > > > > > > > > I think the approach from
> > > > > > > > > this patch is safer.
> > > > > > > >
> > > > > > > > It might be easier from perspective that changes in less
> > places
> > > > are required,
> > > > > > > > Though I think that this patch will introduce some
> > performance
> > > > drop.
> > > > > > > > As now each mbuf_prefree_seg() will cause update of 2 cache
> > > > lines unconditionally.
> > > > > > > >
> > > > > > > > > By the way, for 21.11, if we are able to do some
> > > > optimizations and have
> > > > > > > > > both pool (index?) and next in the first cache line, we
> > may
> > > > reconsider
> > > > > > > > > the fact that next and nb_segs are already set for new
> > > > allocated mbufs,
> > > > > > > > > because it is not straightforward either.
> > > > > > > >
> > > > > > > > My suggestion - let's put future optimization discussion
> > aside
> > > > for now,
> > > > > > > > and concentrate on that particular patch.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > > > After this operation, we have 2 mbuf chain:
> > > > > > > > > > >  - m0 with 2 segments, the last one has next=NULL but
> > > > nb_seg=2
> > > > > > > > > > >  - new_m with 1 segment
> > > > > > > > > > >
> > > > > > > > > > > Freeing m0 will not restore nb_seg=1 in the second
> > > > segment.
> > > > > > > > > > >
> > > > > > > > > > >> Assumes that it is ok to have an mbuf with
> > > > > > > > > > >> nb_seg > 1 and next == NULL.
> > > > > > > > > > >> Which seems wrong to me.
> > > > > > > > > > >
> > > > > > > > > > > I don't think it is wrong: nb_seg is just ignored
> > when
> > > > not in the first
> > > > > > > > > > > segment, and there is nothing saying it should be set
> > to
> > > > 1. Typically,
> > > > > > > > > > > rte_pktmbuf_chain() does not change it, and I guess
> > it's
> > > > the same for
> > > > > > > > > > > many similar functions in applications.
> > > > > > > > > > >
> > > > > > > > > > > Olivier
> > > > > > > > > > >
> > > > > > > > > > >>
> > > > > > > > > > >>
> > > > > > > > > > >>> This is expected that mbufs stored in pool have
> > their
> > > > > > > > > > >>> nb_seg field set to 1.
> > > > > > > > > > >>>
> > > > > > > > > > >>> Fixes: 8f094a9ac5d7 ("mbuf: set mbuf fields while
> > in
> > > > pool")
> > > > > > > > > > >>> Cc: stable@dpdk.org
> > > > > > > > > > >>>
> > > > > > > > > > >>> Signed-off-by: Olivier Matz
> > <olivier.matz@6wind.com>
> > > > > > > > > > >>> ---
> > > > > > > > > > >>>  lib/librte_mbuf/rte_mbuf.c |  6 ++----
> > > > > > > > > > >>>  lib/librte_mbuf/rte_mbuf.h | 12 ++++--------
> > > > > > > > > > >>>  2 files changed, 6 insertions(+), 12 deletions(-)
> > > > > > > > > > >>>
> > > > > > > > > > >>> diff --git a/lib/librte_mbuf/rte_mbuf.c
> > > > b/lib/librte_mbuf/rte_mbuf.c
> > > > > > > > > > >>> index 8a456e5e64..e632071c23 100644
> > > > > > > > > > >>> --- a/lib/librte_mbuf/rte_mbuf.c
> > > > > > > > > > >>> +++ b/lib/librte_mbuf/rte_mbuf.c
> > > > > > > > > > >>> @@ -129,10 +129,8 @@
> > > > rte_pktmbuf_free_pinned_extmem(void *addr, void *opaque)
> > > > > > > > > > >>>
> > > > > > > > > > >>>  	rte_mbuf_ext_refcnt_set(m->shinfo, 1);
> > > > > > > > > > >>>  	m->ol_flags = EXT_ATTACHED_MBUF;
> > > > > > > > > > >>> -	if (m->next != NULL) {
> > > > > > > > > > >>> -		m->next = NULL;
> > > > > > > > > > >>> -		m->nb_segs = 1;
> > > > > > > > > > >>> -	}
> > > > > > > > > > >>> +	m->next = NULL;
> > > > > > > > > > >>> +	m->nb_segs = 1;
> > > > > > > > > > >>>  	rte_mbuf_raw_free(m);
> > > > > > > > > > >>>  }
> > > > > > > > > > >>>
> > > > > > > > > > >>> diff --git a/lib/librte_mbuf/rte_mbuf.h
> > > > b/lib/librte_mbuf/rte_mbuf.h
> > > > > > > > > > >>> index a1414ed7cd..ef5800c8ef 100644
> > > > > > > > > > >>> --- a/lib/librte_mbuf/rte_mbuf.h
> > > > > > > > > > >>> +++ b/lib/librte_mbuf/rte_mbuf.h
> > > > > > > > > > >>> @@ -1329,10 +1329,8 @@
> > rte_pktmbuf_prefree_seg(struct
> > > > rte_mbuf *m)
> > > > > > > > > > >>>  				return NULL;
> > > > > > > > > > >>>  		}
> > > > > > > > > > >>>
> > > > > > > > > > >>> -		if (m->next != NULL) {
> > > > > > > > > > >>> -			m->next = NULL;
> > > > > > > > > > >>> -			m->nb_segs = 1;
> > > > > > > > > > >>> -		}
> > > > > > > > > > >>> +		m->next = NULL;
> > > > > > > > > > >>> +		m->nb_segs = 1;
> > > > > > > > > > >>>
> > > > > > > > > > >>>  		return m;
> > > > > > > > > > >>>
> > > > > > > > > > >>> @@ -1346,10 +1344,8 @@
> > rte_pktmbuf_prefree_seg(struct
> > > > rte_mbuf *m)
> > > > > > > > > > >>>  				return NULL;
> > > > > > > > > > >>>  		}
> > > > > > > > > > >>>
> > > > > > > > > > >>> -		if (m->next != NULL) {
> > > > > > > > > > >>> -			m->next = NULL;
> > > > > > > > > > >>> -			m->nb_segs = 1;
> > > > > > > > > > >>> -		}
> > > > > > > > > > >>> +		m->next = NULL;
> > > > > > > > > > >>> +		m->nb_segs = 1;
> > > > > > > > > > >>>  		rte_mbuf_refcnt_set(m, 1);
> > > > > > > > > > >>>
> > > > > > > > > > >>>  		return m;
> > > > > > > > > > >>> --
> > > > > > > > > > >>> 2.25.1
> > > > > > > > > > >>
> > > > > > > > > >
> 

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dpdk-dev] [PATCH] mbuf: fix reset on mbuf free
  2020-11-06  8:20                         ` Olivier Matz
@ 2020-11-06  8:50                           ` Morten Brørup
  2020-11-06 10:04                             ` Olivier Matz
  0 siblings, 1 reply; 74+ messages in thread
From: Morten Brørup @ 2020-11-06  8:50 UTC (permalink / raw)
  To: Olivier Matz; +Cc: Ananyev, Konstantin, Andrew Rybchenko, dev

> From: Olivier Matz [mailto:olivier.matz@6wind.com]
> Sent: Friday, November 6, 2020 9:21 AM
> 
> On Fri, Nov 06, 2020 at 08:52:58AM +0100, Morten Brørup wrote:
> > > From: Ananyev, Konstantin [mailto:konstantin.ananyev@intel.com]
> > > Sent: Friday, November 6, 2020 12:55 AM
> > >
> > > > > > > > > > > >> Hi Olivier,
> > > > > > > > > > > >>
> > > > > > > > > > > >>> m->nb_seg must be reset on mbuf free whatever
> the
> > > value
> > > > > of m->next,
> > > > > > > > > > > >>> because it can happen that m->nb_seg is != 1.
> For
> > > > > instance in this
> > > > > > > > > > > >>> case:
> > > > > > > > > > > >>>
> > > > > > > > > > > >>>   m1 = rte_pktmbuf_alloc(mp);
> > > > > > > > > > > >>>   rte_pktmbuf_append(m1, 500);
> > > > > > > > > > > >>>   m2 = rte_pktmbuf_alloc(mp);
> > > > > > > > > > > >>>   rte_pktmbuf_append(m2, 500);
> > > > > > > > > > > >>>   rte_pktmbuf_chain(m1, m2);
> > > > > > > > > > > >>>   m0 = rte_pktmbuf_alloc(mp);
> > > > > > > > > > > >>>   rte_pktmbuf_append(m0, 500);
> > > > > > > > > > > >>>   rte_pktmbuf_chain(m0, m1);
> > > > > > > > > > > >>>
> > > > > > > > > > > >>> As rte_pktmbuf_chain() does not reset nb_seg in
> the
> > > > > initial m1
> > > > > > > > > > > >>> segment (this is not required), after this code
> the
> > > > > mbuf chain
> > > > > > > > > > > >>> have 3 segments:
> > > > > > > > > > > >>>   - m0: next=m1, nb_seg=3
> > > > > > > > > > > >>>   - m1: next=m2, nb_seg=2
> > > > > > > > > > > >>>   - m2: next=NULL, nb_seg=1
> > > > > > > > > > > >>>
> > > > > > > > > > > >>> Freeing this mbuf chain will not restore
> nb_seg=1
> > > in
> > > > > the second
> > > > > > > > > > > >>> segment.
> > > > > > > > > > > >>
> > > > > > > > > > > >> Hmm, not sure why is that?
> > > > > > > > > > > >> You are talking about freeing m1, right?
> > > > > > > > > > > >> rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > > > > > > > > > > >> {
> > > > > > > > > > > >> 	...
> > > > > > > > > > > >> 	if (m->next != NULL) {
> > > > > > > > > > > >>                         m->next = NULL;
> > > > > > > > > > > >>                         m->nb_segs = 1;
> > > > > > > > > > > >>                 }
> > > > > > > > > > > >>
> > > > > > > > > > > >> m1->next != NULL, so it will enter the if()
> block,
> > > > > > > > > > > >> and will reset both next and nb_segs.
> > > > > > > > > > > >> What I am missing here?
> > > > > > > > > > > >> Thinking in more generic way, that change:
> > > > > > > > > > > >>  -		if (m->next != NULL) {
> > > > > > > > > > > >>  -			m->next = NULL;
> > > > > > > > > > > >>  -			m->nb_segs = 1;
> > > > > > > > > > > >>  -		}
> > > > > > > > > > > >>  +		m->next = NULL;
> > > > > > > > > > > >>  +		m->nb_segs = 1;
> > > > > > > > > > > >
> > > > > > > > > > > > Ah, sorry. I oversimplified the example and now
> it
> > > does
> > > > > not
> > > > > > > > > > > > show the issue...
> > > > > > > > > > > >
> > > > > > > > > > > > The full example also adds a split() to break the
> > > mbuf
> > > > > chain
> > > > > > > > > > > > between m1 and m2. The kind of thing that would
> be
> > > done
> > > > > for
> > > > > > > > > > > > software TCP segmentation.
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > If so, may be the right solution is to care about
> > > nb_segs
> > > > > > > > > > > when next is set to NULL on split? Any place when
> next
> > > is
> > > > > set
> > > > > > > > > > > to NULL. Just to keep the optimization in a more
> > > generic
> > > > > place.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > The problem with that approach is that there are
> already
> > > > > several
> > > > > > > > > > existing split() or trim() implementations in
> different
> > > DPDK-
> > > > > based
> > > > > > > > > > applications. For instance, we have some in
> 6WINDGate. If
> > > we
> > > > > force
> > > > > > > > > > applications to set nb_seg to 1 when resetting next,
> it
> > > has
> > > > > to be
> > > > > > > > > > documented because it is not straightforward.
> > > > > > > > >
> > > > > > > > > I think it is better to go that way.
> > > > > > > > > From my perspective it seems natural to reset nb_seg at
> > > same
> > > > > time
> > > > > > > > > we reset next, otherwise inconsistency will occur.
> > > > > > > >
> > > > > > > > While it is not explicitly stated for nb_segs, to me it
> was
> > > clear
> > > > > that
> > > > > > > > nb_segs is only valid in the first segment, like for many
> > > fields
> > > > > (port,
> > > > > > > > ol_flags, vlan, rss, ...).
> > > > > > > >
> > > > > > > > If we say that nb_segs has to be valid in any segments,
> it
> > > means
> > > > > that
> > > > > > > > chain() or split() will have to update it in all
> segments,
> > > which
> > > > > is not
> > > > > > > > efficient.
> > > > > > >
> > > > > > > Why in all?
> > > > > > > We can state that nb_segs on non-first segment should
> always
> > > equal
> > > > > 1.
> > > > > > > As I understand in that case, both split() and chain() have
> to
> > > > > update nb_segs
> > > > > > > only for head mbufs, rest ones will remain untouched.
> > > > > >
> > > > > > Well, anyway, I think it's strange to have a constraint on m-
> > > >nb_segs
> > > > > for
> > > > > > non-first segment. We don't have that kind of constraints for
> > > other
> > > > > fields.
> > > > >
> > > > > True, we don't. But this is one of the fields we consider
> critical
> > > > > for proper work of mbuf alloc/free mechanism.
> > > > >
> > > >
> > > > I am not sure that requiring m->nb_segs == 1 on non-first
> segments
> > > will provide any benefits.
> > >
> > > It would make this patch unneeded.
> > > So, for direct, non-segmented mbufs  pktmbuf_free() will remain
> write-
> > > free.
> >
> > I see. Then I agree with Konstantin that alternative solutions should
> be considered.
> >
> > The benefit regarding free()'ing non-segmented mbufs - which is a
> very common operation - certainly outweighs the cost of requiring
> split()/chain() operations to set the new head mbuf's nb_segs = 1.
> >
> > Nonetheless, the bug needs to be fixed somehow.
> >
> > If we can't come up with a better solution that doesn't break the
> ABI, we are forced to accept the patch.
> >
> > Unless the techboard accepts to break the ABI in order to avoid the
> performance cost of this patch.
> 
> Did someone notice a performance drop with this patch?
> On my side, I don't see any regression on a L3 use case.

I am afraid that the DPDK performance regression tests are based on TX immediately following RX, so cache misses in TX may go by unnoticed because RX warmed up the cache for TX already. And similarly for RX reusing mbufs that have been warmed up by the preceding free() at TX.

Please consider testing the performance difference with the mbuf being completely cold at TX, and going completely cold again before being reused for RX.

> 
> Let's sumarize: splitting a mbuf chain and freeing it causes subsequent
> mbuf
> allocation to return a mbuf which is not correctly initialized. There
> are 2
> options to fix it:
> 
> 1/ change the mbuf free function (this patch)
> 
>    - m->nb_segs would behave like many other field: valid in the first
>      segment, ignored in other segments
>    - may impact performance (suspected)
> 
> 2/ change all places where a mbuf chain is split, or trimmed
> 
>    - m->nb_segs would have a specific behavior: count the number of
>      segments in the first mbuf, should be 1 in the last segment,
>      ignored in other ones.
>    - no code change in mbuf library, so no performance impact
>    - need to patch all places where we do a mbuf split or trim. From
> afar,
>      I see at least mbuf_cut_seg_ofs() in DPDK. Some external
> applications
>      may have to be patched (for instance, I already found 3 places in
>      6WIND code base without a deep search).
> 
> In my opinion, 1/ is better, except we notice a significant
> performance,
> because the (implicit) behavior is unchanged.
> 
> Whatever the solution, some documentation has to be added.
> 
> Olivier
> 

Unfortunately, I don't think that anything but the first option will go into 20.11 and stable releases of older versions, so I stand by my acknowledgment of the patch.

Which reminds me: Consider reporting the bug in Bugzilla.

> 
> >
> > >
> > > >
> > > > E.g. the second segment of a three-segment chain will still have
> m-
> > > >next != NULL, so it cannot be used as a gate to prevent accessing
> m-
> > > > >next.
> > > >
> > > > I might have overlooked something, though.
> > > >
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > Saying that nb_segs has to be valid for the first and
> last
> > > > > segment seems
> > > > > > > > really odd to me. What is the logic behind that except
> > > keeping
> > > > > this test
> > > > > > > > as is?
> > > > > > > >
> > > > > > > > In any case, it has to be better documented.
> > > > > > > >
> > > > > > > >
> > > > > > > > Olivier
> > > > > > > >
> > > > > > > >
> > > > > > > > > > I think the approach from
> > > > > > > > > > this patch is safer.
> > > > > > > > >
> > > > > > > > > It might be easier from perspective that changes in
> less
> > > places
> > > > > are required,
> > > > > > > > > Though I think that this patch will introduce some
> > > performance
> > > > > drop.
> > > > > > > > > As now each mbuf_prefree_seg() will cause update of 2
> cache
> > > > > lines unconditionally.
> > > > > > > > >
> > > > > > > > > > By the way, for 21.11, if we are able to do some
> > > > > optimizations and have
> > > > > > > > > > both pool (index?) and next in the first cache line,
> we
> > > may
> > > > > reconsider
> > > > > > > > > > the fact that next and nb_segs are already set for
> new
> > > > > allocated mbufs,
> > > > > > > > > > because it is not straightforward either.
> > > > > > > > >
> > > > > > > > > My suggestion - let's put future optimization
> discussion
> > > aside
> > > > > for now,
> > > > > > > > > and concentrate on that particular patch.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > > After this operation, we have 2 mbuf chain:
> > > > > > > > > > > >  - m0 with 2 segments, the last one has next=NULL
> but
> > > > > nb_seg=2
> > > > > > > > > > > >  - new_m with 1 segment
> > > > > > > > > > > >
> > > > > > > > > > > > Freeing m0 will not restore nb_seg=1 in the
> second
> > > > > segment.
> > > > > > > > > > > >
> > > > > > > > > > > >> Assumes that it is ok to have an mbuf with
> > > > > > > > > > > >> nb_seg > 1 and next == NULL.
> > > > > > > > > > > >> Which seems wrong to me.
> > > > > > > > > > > >
> > > > > > > > > > > > I don't think it is wrong: nb_seg is just ignored
> > > when
> > > > > not in the first
> > > > > > > > > > > > segment, and there is nothing saying it should be
> set
> > > to
> > > > > 1. Typically,
> > > > > > > > > > > > rte_pktmbuf_chain() does not change it, and I
> guess
> > > it's
> > > > > the same for
> > > > > > > > > > > > many similar functions in applications.
> > > > > > > > > > > >
> > > > > > > > > > > > Olivier
> > > > > > > > > > > >
> > > > > > > > > > > >>
> > > > > > > > > > > >>
> > > > > > > > > > > >>> This is expected that mbufs stored in pool have
> > > their
> > > > > > > > > > > >>> nb_seg field set to 1.
> > > > > > > > > > > >>>
> > > > > > > > > > > >>> Fixes: 8f094a9ac5d7 ("mbuf: set mbuf fields
> while
> > > in
> > > > > pool")
> > > > > > > > > > > >>> Cc: stable@dpdk.org
> > > > > > > > > > > >>>
> > > > > > > > > > > >>> Signed-off-by: Olivier Matz
> > > <olivier.matz@6wind.com>
> > > > > > > > > > > >>> ---
> > > > > > > > > > > >>>  lib/librte_mbuf/rte_mbuf.c |  6 ++----
> > > > > > > > > > > >>>  lib/librte_mbuf/rte_mbuf.h | 12 ++++--------
> > > > > > > > > > > >>>  2 files changed, 6 insertions(+), 12
> deletions(-)
> > > > > > > > > > > >>>
> > > > > > > > > > > >>> diff --git a/lib/librte_mbuf/rte_mbuf.c
> > > > > b/lib/librte_mbuf/rte_mbuf.c
> > > > > > > > > > > >>> index 8a456e5e64..e632071c23 100644
> > > > > > > > > > > >>> --- a/lib/librte_mbuf/rte_mbuf.c
> > > > > > > > > > > >>> +++ b/lib/librte_mbuf/rte_mbuf.c
> > > > > > > > > > > >>> @@ -129,10 +129,8 @@
> > > > > rte_pktmbuf_free_pinned_extmem(void *addr, void *opaque)
> > > > > > > > > > > >>>
> > > > > > > > > > > >>>  	rte_mbuf_ext_refcnt_set(m->shinfo, 1);
> > > > > > > > > > > >>>  	m->ol_flags = EXT_ATTACHED_MBUF;
> > > > > > > > > > > >>> -	if (m->next != NULL) {
> > > > > > > > > > > >>> -		m->next = NULL;
> > > > > > > > > > > >>> -		m->nb_segs = 1;
> > > > > > > > > > > >>> -	}
> > > > > > > > > > > >>> +	m->next = NULL;
> > > > > > > > > > > >>> +	m->nb_segs = 1;
> > > > > > > > > > > >>>  	rte_mbuf_raw_free(m);
> > > > > > > > > > > >>>  }
> > > > > > > > > > > >>>
> > > > > > > > > > > >>> diff --git a/lib/librte_mbuf/rte_mbuf.h
> > > > > b/lib/librte_mbuf/rte_mbuf.h
> > > > > > > > > > > >>> index a1414ed7cd..ef5800c8ef 100644
> > > > > > > > > > > >>> --- a/lib/librte_mbuf/rte_mbuf.h
> > > > > > > > > > > >>> +++ b/lib/librte_mbuf/rte_mbuf.h
> > > > > > > > > > > >>> @@ -1329,10 +1329,8 @@
> > > rte_pktmbuf_prefree_seg(struct
> > > > > rte_mbuf *m)
> > > > > > > > > > > >>>  				return NULL;
> > > > > > > > > > > >>>  		}
> > > > > > > > > > > >>>
> > > > > > > > > > > >>> -		if (m->next != NULL) {
> > > > > > > > > > > >>> -			m->next = NULL;
> > > > > > > > > > > >>> -			m->nb_segs = 1;
> > > > > > > > > > > >>> -		}
> > > > > > > > > > > >>> +		m->next = NULL;
> > > > > > > > > > > >>> +		m->nb_segs = 1;
> > > > > > > > > > > >>>
> > > > > > > > > > > >>>  		return m;
> > > > > > > > > > > >>>
> > > > > > > > > > > >>> @@ -1346,10 +1344,8 @@
> > > rte_pktmbuf_prefree_seg(struct
> > > > > rte_mbuf *m)
> > > > > > > > > > > >>>  				return NULL;
> > > > > > > > > > > >>>  		}
> > > > > > > > > > > >>>
> > > > > > > > > > > >>> -		if (m->next != NULL) {
> > > > > > > > > > > >>> -			m->next = NULL;
> > > > > > > > > > > >>> -			m->nb_segs = 1;
> > > > > > > > > > > >>> -		}
> > > > > > > > > > > >>> +		m->next = NULL;
> > > > > > > > > > > >>> +		m->nb_segs = 1;
> > > > > > > > > > > >>>  		rte_mbuf_refcnt_set(m, 1);
> > > > > > > > > > > >>>
> > > > > > > > > > > >>>  		return m;
> > > > > > > > > > > >>> --
> > > > > > > > > > > >>> 2.25.1
> > > > > > > > > > > >>
> > > > > > > > > > >
> >


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dpdk-dev] [PATCH] mbuf: fix reset on mbuf free
  2020-11-06  8:50                           ` Morten Brørup
@ 2020-11-06 10:04                             ` Olivier Matz
  2020-11-06 10:07                               ` Morten Brørup
  0 siblings, 1 reply; 74+ messages in thread
From: Olivier Matz @ 2020-11-06 10:04 UTC (permalink / raw)
  To: Morten Brørup; +Cc: Ananyev, Konstantin, Andrew Rybchenko, dev

On Fri, Nov 06, 2020 at 09:50:45AM +0100, Morten Brørup wrote:
> > From: Olivier Matz [mailto:olivier.matz@6wind.com]
> > Sent: Friday, November 6, 2020 9:21 AM
> > 
> > On Fri, Nov 06, 2020 at 08:52:58AM +0100, Morten Brørup wrote:
> > > > From: Ananyev, Konstantin [mailto:konstantin.ananyev@intel.com]
> > > > Sent: Friday, November 6, 2020 12:55 AM
> > > >
> > > > > > > > > > > > >> Hi Olivier,
> > > > > > > > > > > > >>
> > > > > > > > > > > > >>> m->nb_seg must be reset on mbuf free whatever
> > the
> > > > value
> > > > > > of m->next,
> > > > > > > > > > > > >>> because it can happen that m->nb_seg is != 1.
> > For
> > > > > > instance in this
> > > > > > > > > > > > >>> case:
> > > > > > > > > > > > >>>
> > > > > > > > > > > > >>>   m1 = rte_pktmbuf_alloc(mp);
> > > > > > > > > > > > >>>   rte_pktmbuf_append(m1, 500);
> > > > > > > > > > > > >>>   m2 = rte_pktmbuf_alloc(mp);
> > > > > > > > > > > > >>>   rte_pktmbuf_append(m2, 500);
> > > > > > > > > > > > >>>   rte_pktmbuf_chain(m1, m2);
> > > > > > > > > > > > >>>   m0 = rte_pktmbuf_alloc(mp);
> > > > > > > > > > > > >>>   rte_pktmbuf_append(m0, 500);
> > > > > > > > > > > > >>>   rte_pktmbuf_chain(m0, m1);
> > > > > > > > > > > > >>>
> > > > > > > > > > > > >>> As rte_pktmbuf_chain() does not reset nb_seg in
> > the
> > > > > > initial m1
> > > > > > > > > > > > >>> segment (this is not required), after this code
> > the
> > > > > > mbuf chain
> > > > > > > > > > > > >>> have 3 segments:
> > > > > > > > > > > > >>>   - m0: next=m1, nb_seg=3
> > > > > > > > > > > > >>>   - m1: next=m2, nb_seg=2
> > > > > > > > > > > > >>>   - m2: next=NULL, nb_seg=1
> > > > > > > > > > > > >>>
> > > > > > > > > > > > >>> Freeing this mbuf chain will not restore
> > nb_seg=1
> > > > in
> > > > > > the second
> > > > > > > > > > > > >>> segment.
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> Hmm, not sure why is that?
> > > > > > > > > > > > >> You are talking about freeing m1, right?
> > > > > > > > > > > > >> rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > > > > > > > > > > > >> {
> > > > > > > > > > > > >> 	...
> > > > > > > > > > > > >> 	if (m->next != NULL) {
> > > > > > > > > > > > >>                         m->next = NULL;
> > > > > > > > > > > > >>                         m->nb_segs = 1;
> > > > > > > > > > > > >>                 }
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> m1->next != NULL, so it will enter the if()
> > block,
> > > > > > > > > > > > >> and will reset both next and nb_segs.
> > > > > > > > > > > > >> What I am missing here?
> > > > > > > > > > > > >> Thinking in more generic way, that change:
> > > > > > > > > > > > >>  -		if (m->next != NULL) {
> > > > > > > > > > > > >>  -			m->next = NULL;
> > > > > > > > > > > > >>  -			m->nb_segs = 1;
> > > > > > > > > > > > >>  -		}
> > > > > > > > > > > > >>  +		m->next = NULL;
> > > > > > > > > > > > >>  +		m->nb_segs = 1;
> > > > > > > > > > > > >
> > > > > > > > > > > > > Ah, sorry. I oversimplified the example and now
> > it
> > > > does
> > > > > > not
> > > > > > > > > > > > > show the issue...
> > > > > > > > > > > > >
> > > > > > > > > > > > > The full example also adds a split() to break the
> > > > mbuf
> > > > > > chain
> > > > > > > > > > > > > between m1 and m2. The kind of thing that would
> > be
> > > > done
> > > > > > for
> > > > > > > > > > > > > software TCP segmentation.
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > If so, may be the right solution is to care about
> > > > nb_segs
> > > > > > > > > > > > when next is set to NULL on split? Any place when
> > next
> > > > is
> > > > > > set
> > > > > > > > > > > > to NULL. Just to keep the optimization in a more
> > > > generic
> > > > > > place.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > The problem with that approach is that there are
> > already
> > > > > > several
> > > > > > > > > > > existing split() or trim() implementations in
> > different
> > > > DPDK-
> > > > > > based
> > > > > > > > > > > applications. For instance, we have some in
> > 6WINDGate. If
> > > > we
> > > > > > force
> > > > > > > > > > > applications to set nb_seg to 1 when resetting next,
> > it
> > > > has
> > > > > > to be
> > > > > > > > > > > documented because it is not straightforward.
> > > > > > > > > >
> > > > > > > > > > I think it is better to go that way.
> > > > > > > > > > From my perspective it seems natural to reset nb_seg at
> > > > same
> > > > > > time
> > > > > > > > > > we reset next, otherwise inconsistency will occur.
> > > > > > > > >
> > > > > > > > > While it is not explicitly stated for nb_segs, to me it
> > was
> > > > clear
> > > > > > that
> > > > > > > > > nb_segs is only valid in the first segment, like for many
> > > > fields
> > > > > > (port,
> > > > > > > > > ol_flags, vlan, rss, ...).
> > > > > > > > >
> > > > > > > > > If we say that nb_segs has to be valid in any segments,
> > it
> > > > means
> > > > > > that
> > > > > > > > > chain() or split() will have to update it in all
> > segments,
> > > > which
> > > > > > is not
> > > > > > > > > efficient.
> > > > > > > >
> > > > > > > > Why in all?
> > > > > > > > We can state that nb_segs on non-first segment should
> > always
> > > > equal
> > > > > > 1.
> > > > > > > > As I understand in that case, both split() and chain() have
> > to
> > > > > > update nb_segs
> > > > > > > > only for head mbufs, rest ones will remain untouched.
> > > > > > >
> > > > > > > Well, anyway, I think it's strange to have a constraint on m-
> > > > >nb_segs
> > > > > > for
> > > > > > > non-first segment. We don't have that kind of constraints for
> > > > other
> > > > > > fields.
> > > > > >
> > > > > > True, we don't. But this is one of the fields we consider
> > critical
> > > > > > for proper work of mbuf alloc/free mechanism.
> > > > > >
> > > > >
> > > > > I am not sure that requiring m->nb_segs == 1 on non-first
> > segments
> > > > will provide any benefits.
> > > >
> > > > It would make this patch unneeded.
> > > > So, for direct, non-segmented mbufs  pktmbuf_free() will remain
> > write-
> > > > free.
> > >
> > > I see. Then I agree with Konstantin that alternative solutions should
> > be considered.
> > >
> > > The benefit regarding free()'ing non-segmented mbufs - which is a
> > very common operation - certainly outweighs the cost of requiring
> > split()/chain() operations to set the new head mbuf's nb_segs = 1.
> > >
> > > Nonetheless, the bug needs to be fixed somehow.
> > >
> > > If we can't come up with a better solution that doesn't break the
> > ABI, we are forced to accept the patch.
> > >
> > > Unless the techboard accepts to break the ABI in order to avoid the
> > performance cost of this patch.
> > 
> > Did someone notice a performance drop with this patch?
> > On my side, I don't see any regression on a L3 use case.
> 
> I am afraid that the DPDK performance regression tests are based on TX immediately following RX, so cache misses in TX may go by unnoticed because RX warmed up the cache for TX already. And similarly for RX reusing mbufs that have been warmed up by the preceding free() at TX.
> 
> Please consider testing the performance difference with the mbuf being completely cold at TX, and going completely cold again before being reused for RX.
> 
> > 
> > Let's sumarize: splitting a mbuf chain and freeing it causes subsequent
> > mbuf
> > allocation to return a mbuf which is not correctly initialized. There
> > are 2
> > options to fix it:
> > 
> > 1/ change the mbuf free function (this patch)
> > 
> >    - m->nb_segs would behave like many other field: valid in the first
> >      segment, ignored in other segments
> >    - may impact performance (suspected)
> > 
> > 2/ change all places where a mbuf chain is split, or trimmed
> > 
> >    - m->nb_segs would have a specific behavior: count the number of
> >      segments in the first mbuf, should be 1 in the last segment,
> >      ignored in other ones.
> >    - no code change in mbuf library, so no performance impact
> >    - need to patch all places where we do a mbuf split or trim. From
> > afar,
> >      I see at least mbuf_cut_seg_ofs() in DPDK. Some external
> > applications
> >      may have to be patched (for instance, I already found 3 places in
> >      6WIND code base without a deep search).
> > 
> > In my opinion, 1/ is better, except we notice a significant
> > performance,
> > because the (implicit) behavior is unchanged.
> > 
> > Whatever the solution, some documentation has to be added.
> > 
> > Olivier
> > 
> 
> Unfortunately, I don't think that anything but the first option will go into 20.11 and stable releases of older versions, so I stand by my acknowledgment of the patch.

If we are affraid about 20.11 performance (it is legitimate, few days
before the release), we can target 21.02. After all, everybody lives
with this bug since 2017, so there is no urgency. If accepted and well
tested, it can be backported in stable branches.


> Which reminds me: Consider reporting the bug in Bugzilla.
> 
> > 
> > >
> > > >
> > > > >
> > > > > E.g. the second segment of a three-segment chain will still have
> > m-
> > > > >next != NULL, so it cannot be used as a gate to prevent accessing
> > m-
> > > > > >next.
> > > > >
> > > > > I might have overlooked something, though.
> > > > >
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Saying that nb_segs has to be valid for the first and
> > last
> > > > > > segment seems
> > > > > > > > > really odd to me. What is the logic behind that except
> > > > keeping
> > > > > > this test
> > > > > > > > > as is?
> > > > > > > > >
> > > > > > > > > In any case, it has to be better documented.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Olivier
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > > I think the approach from
> > > > > > > > > > > this patch is safer.
> > > > > > > > > >
> > > > > > > > > > It might be easier from perspective that changes in
> > less
> > > > places
> > > > > > are required,
> > > > > > > > > > Though I think that this patch will introduce some
> > > > performance
> > > > > > drop.
> > > > > > > > > > As now each mbuf_prefree_seg() will cause update of 2
> > cache
> > > > > > lines unconditionally.
> > > > > > > > > >
> > > > > > > > > > > By the way, for 21.11, if we are able to do some
> > > > > > optimizations and have
> > > > > > > > > > > both pool (index?) and next in the first cache line,
> > we
> > > > may
> > > > > > reconsider
> > > > > > > > > > > the fact that next and nb_segs are already set for
> > new
> > > > > > allocated mbufs,
> > > > > > > > > > > because it is not straightforward either.
> > > > > > > > > >
> > > > > > > > > > My suggestion - let's put future optimization
> > discussion
> > > > aside
> > > > > > for now,
> > > > > > > > > > and concentrate on that particular patch.
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > > After this operation, we have 2 mbuf chain:
> > > > > > > > > > > > >  - m0 with 2 segments, the last one has next=NULL
> > but
> > > > > > nb_seg=2
> > > > > > > > > > > > >  - new_m with 1 segment
> > > > > > > > > > > > >
> > > > > > > > > > > > > Freeing m0 will not restore nb_seg=1 in the
> > second
> > > > > > segment.
> > > > > > > > > > > > >
> > > > > > > > > > > > >> Assumes that it is ok to have an mbuf with
> > > > > > > > > > > > >> nb_seg > 1 and next == NULL.
> > > > > > > > > > > > >> Which seems wrong to me.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I don't think it is wrong: nb_seg is just ignored
> > > > when
> > > > > > not in the first
> > > > > > > > > > > > > segment, and there is nothing saying it should be
> > set
> > > > to
> > > > > > 1. Typically,
> > > > > > > > > > > > > rte_pktmbuf_chain() does not change it, and I
> > guess
> > > > it's
> > > > > > the same for
> > > > > > > > > > > > > many similar functions in applications.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Olivier
> > > > > > > > > > > > >
> > > > > > > > > > > > >>
> > > > > > > > > > > > >>
> > > > > > > > > > > > >>> This is expected that mbufs stored in pool have
> > > > their
> > > > > > > > > > > > >>> nb_seg field set to 1.
> > > > > > > > > > > > >>>
> > > > > > > > > > > > >>> Fixes: 8f094a9ac5d7 ("mbuf: set mbuf fields
> > while
> > > > in
> > > > > > pool")
> > > > > > > > > > > > >>> Cc: stable@dpdk.org
> > > > > > > > > > > > >>>
> > > > > > > > > > > > >>> Signed-off-by: Olivier Matz
> > > > <olivier.matz@6wind.com>
> > > > > > > > > > > > >>> ---
> > > > > > > > > > > > >>>  lib/librte_mbuf/rte_mbuf.c |  6 ++----
> > > > > > > > > > > > >>>  lib/librte_mbuf/rte_mbuf.h | 12 ++++--------
> > > > > > > > > > > > >>>  2 files changed, 6 insertions(+), 12
> > deletions(-)
> > > > > > > > > > > > >>>
> > > > > > > > > > > > >>> diff --git a/lib/librte_mbuf/rte_mbuf.c
> > > > > > b/lib/librte_mbuf/rte_mbuf.c
> > > > > > > > > > > > >>> index 8a456e5e64..e632071c23 100644
> > > > > > > > > > > > >>> --- a/lib/librte_mbuf/rte_mbuf.c
> > > > > > > > > > > > >>> +++ b/lib/librte_mbuf/rte_mbuf.c
> > > > > > > > > > > > >>> @@ -129,10 +129,8 @@
> > > > > > rte_pktmbuf_free_pinned_extmem(void *addr, void *opaque)
> > > > > > > > > > > > >>>
> > > > > > > > > > > > >>>  	rte_mbuf_ext_refcnt_set(m->shinfo, 1);
> > > > > > > > > > > > >>>  	m->ol_flags = EXT_ATTACHED_MBUF;
> > > > > > > > > > > > >>> -	if (m->next != NULL) {
> > > > > > > > > > > > >>> -		m->next = NULL;
> > > > > > > > > > > > >>> -		m->nb_segs = 1;
> > > > > > > > > > > > >>> -	}
> > > > > > > > > > > > >>> +	m->next = NULL;
> > > > > > > > > > > > >>> +	m->nb_segs = 1;
> > > > > > > > > > > > >>>  	rte_mbuf_raw_free(m);
> > > > > > > > > > > > >>>  }
> > > > > > > > > > > > >>>
> > > > > > > > > > > > >>> diff --git a/lib/librte_mbuf/rte_mbuf.h
> > > > > > b/lib/librte_mbuf/rte_mbuf.h
> > > > > > > > > > > > >>> index a1414ed7cd..ef5800c8ef 100644
> > > > > > > > > > > > >>> --- a/lib/librte_mbuf/rte_mbuf.h
> > > > > > > > > > > > >>> +++ b/lib/librte_mbuf/rte_mbuf.h
> > > > > > > > > > > > >>> @@ -1329,10 +1329,8 @@
> > > > rte_pktmbuf_prefree_seg(struct
> > > > > > rte_mbuf *m)
> > > > > > > > > > > > >>>  				return NULL;
> > > > > > > > > > > > >>>  		}
> > > > > > > > > > > > >>>
> > > > > > > > > > > > >>> -		if (m->next != NULL) {
> > > > > > > > > > > > >>> -			m->next = NULL;
> > > > > > > > > > > > >>> -			m->nb_segs = 1;
> > > > > > > > > > > > >>> -		}
> > > > > > > > > > > > >>> +		m->next = NULL;
> > > > > > > > > > > > >>> +		m->nb_segs = 1;
> > > > > > > > > > > > >>>
> > > > > > > > > > > > >>>  		return m;
> > > > > > > > > > > > >>>
> > > > > > > > > > > > >>> @@ -1346,10 +1344,8 @@
> > > > rte_pktmbuf_prefree_seg(struct
> > > > > > rte_mbuf *m)
> > > > > > > > > > > > >>>  				return NULL;
> > > > > > > > > > > > >>>  		}
> > > > > > > > > > > > >>>
> > > > > > > > > > > > >>> -		if (m->next != NULL) {
> > > > > > > > > > > > >>> -			m->next = NULL;
> > > > > > > > > > > > >>> -			m->nb_segs = 1;
> > > > > > > > > > > > >>> -		}
> > > > > > > > > > > > >>> +		m->next = NULL;
> > > > > > > > > > > > >>> +		m->nb_segs = 1;
> > > > > > > > > > > > >>>  		rte_mbuf_refcnt_set(m, 1);
> > > > > > > > > > > > >>>
> > > > > > > > > > > > >>>  		return m;
> > > > > > > > > > > > >>> --
> > > > > > > > > > > > >>> 2.25.1
> > > > > > > > > > > > >>
> > > > > > > > > > > >
> > >
> 

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dpdk-dev] [PATCH] mbuf: fix reset on mbuf free
  2020-11-06 10:04                             ` Olivier Matz
@ 2020-11-06 10:07                               ` Morten Brørup
  2020-11-06 11:53                                 ` Ananyev, Konstantin
  0 siblings, 1 reply; 74+ messages in thread
From: Morten Brørup @ 2020-11-06 10:07 UTC (permalink / raw)
  To: Olivier Matz; +Cc: Ananyev, Konstantin, Andrew Rybchenko, dev

> From: Olivier Matz [mailto:olivier.matz@6wind.com]
> Sent: Friday, November 6, 2020 11:05 AM
> 
> On Fri, Nov 06, 2020 at 09:50:45AM +0100, Morten Brørup wrote:
> > > From: Olivier Matz [mailto:olivier.matz@6wind.com]
> > > Sent: Friday, November 6, 2020 9:21 AM
> > >
> > > On Fri, Nov 06, 2020 at 08:52:58AM +0100, Morten Brørup wrote:
> > > > > From: Ananyev, Konstantin [mailto:konstantin.ananyev@intel.com]
> > > > > Sent: Friday, November 6, 2020 12:55 AM
> > > > >
> > > > > > > > > > > > > >> Hi Olivier,
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >>> m->nb_seg must be reset on mbuf free
> whatever
> > > the
> > > > > value
> > > > > > > of m->next,
> > > > > > > > > > > > > >>> because it can happen that m->nb_seg is !=
> 1.
> > > For
> > > > > > > instance in this
> > > > > > > > > > > > > >>> case:
> > > > > > > > > > > > > >>>
> > > > > > > > > > > > > >>>   m1 = rte_pktmbuf_alloc(mp);
> > > > > > > > > > > > > >>>   rte_pktmbuf_append(m1, 500);
> > > > > > > > > > > > > >>>   m2 = rte_pktmbuf_alloc(mp);
> > > > > > > > > > > > > >>>   rte_pktmbuf_append(m2, 500);
> > > > > > > > > > > > > >>>   rte_pktmbuf_chain(m1, m2);
> > > > > > > > > > > > > >>>   m0 = rte_pktmbuf_alloc(mp);
> > > > > > > > > > > > > >>>   rte_pktmbuf_append(m0, 500);
> > > > > > > > > > > > > >>>   rte_pktmbuf_chain(m0, m1);
> > > > > > > > > > > > > >>>
> > > > > > > > > > > > > >>> As rte_pktmbuf_chain() does not reset
> nb_seg in
> > > the
> > > > > > > initial m1
> > > > > > > > > > > > > >>> segment (this is not required), after this
> code
> > > the
> > > > > > > mbuf chain
> > > > > > > > > > > > > >>> have 3 segments:
> > > > > > > > > > > > > >>>   - m0: next=m1, nb_seg=3
> > > > > > > > > > > > > >>>   - m1: next=m2, nb_seg=2
> > > > > > > > > > > > > >>>   - m2: next=NULL, nb_seg=1
> > > > > > > > > > > > > >>>
> > > > > > > > > > > > > >>> Freeing this mbuf chain will not restore
> > > nb_seg=1
> > > > > in
> > > > > > > the second
> > > > > > > > > > > > > >>> segment.
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> Hmm, not sure why is that?
> > > > > > > > > > > > > >> You are talking about freeing m1, right?
> > > > > > > > > > > > > >> rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > > > > > > > > > > > > >> {
> > > > > > > > > > > > > >> 	...
> > > > > > > > > > > > > >> 	if (m->next != NULL) {
> > > > > > > > > > > > > >>                         m->next = NULL;
> > > > > > > > > > > > > >>                         m->nb_segs = 1;
> > > > > > > > > > > > > >>                 }
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> m1->next != NULL, so it will enter the if()
> > > block,
> > > > > > > > > > > > > >> and will reset both next and nb_segs.
> > > > > > > > > > > > > >> What I am missing here?
> > > > > > > > > > > > > >> Thinking in more generic way, that change:
> > > > > > > > > > > > > >>  -		if (m->next != NULL) {
> > > > > > > > > > > > > >>  -			m->next = NULL;
> > > > > > > > > > > > > >>  -			m->nb_segs = 1;
> > > > > > > > > > > > > >>  -		}
> > > > > > > > > > > > > >>  +		m->next = NULL;
> > > > > > > > > > > > > >>  +		m->nb_segs = 1;
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Ah, sorry. I oversimplified the example and
> now
> > > it
> > > > > does
> > > > > > > not
> > > > > > > > > > > > > > show the issue...
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > The full example also adds a split() to break
> the
> > > > > mbuf
> > > > > > > chain
> > > > > > > > > > > > > > between m1 and m2. The kind of thing that
> would
> > > be
> > > > > done
> > > > > > > for
> > > > > > > > > > > > > > software TCP segmentation.
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > If so, may be the right solution is to care
> about
> > > > > nb_segs
> > > > > > > > > > > > > when next is set to NULL on split? Any place
> when
> > > next
> > > > > is
> > > > > > > set
> > > > > > > > > > > > > to NULL. Just to keep the optimization in a
> more
> > > > > generic
> > > > > > > place.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > The problem with that approach is that there are
> > > already
> > > > > > > several
> > > > > > > > > > > > existing split() or trim() implementations in
> > > different
> > > > > DPDK-
> > > > > > > based
> > > > > > > > > > > > applications. For instance, we have some in
> > > 6WINDGate. If
> > > > > we
> > > > > > > force
> > > > > > > > > > > > applications to set nb_seg to 1 when resetting
> next,
> > > it
> > > > > has
> > > > > > > to be
> > > > > > > > > > > > documented because it is not straightforward.
> > > > > > > > > > >
> > > > > > > > > > > I think it is better to go that way.
> > > > > > > > > > > From my perspective it seems natural to reset
> nb_seg at
> > > > > same
> > > > > > > time
> > > > > > > > > > > we reset next, otherwise inconsistency will occur.
> > > > > > > > > >
> > > > > > > > > > While it is not explicitly stated for nb_segs, to me
> it
> > > was
> > > > > clear
> > > > > > > that
> > > > > > > > > > nb_segs is only valid in the first segment, like for
> many
> > > > > fields
> > > > > > > (port,
> > > > > > > > > > ol_flags, vlan, rss, ...).
> > > > > > > > > >
> > > > > > > > > > If we say that nb_segs has to be valid in any
> segments,
> > > it
> > > > > means
> > > > > > > that
> > > > > > > > > > chain() or split() will have to update it in all
> > > segments,
> > > > > which
> > > > > > > is not
> > > > > > > > > > efficient.
> > > > > > > > >
> > > > > > > > > Why in all?
> > > > > > > > > We can state that nb_segs on non-first segment should
> > > always
> > > > > equal
> > > > > > > 1.
> > > > > > > > > As I understand in that case, both split() and chain()
> have
> > > to
> > > > > > > update nb_segs
> > > > > > > > > only for head mbufs, rest ones will remain untouched.
> > > > > > > >
> > > > > > > > Well, anyway, I think it's strange to have a constraint
> on m-
> > > > > >nb_segs
> > > > > > > for
> > > > > > > > non-first segment. We don't have that kind of constraints
> for
> > > > > other
> > > > > > > fields.
> > > > > > >
> > > > > > > True, we don't. But this is one of the fields we consider
> > > critical
> > > > > > > for proper work of mbuf alloc/free mechanism.
> > > > > > >
> > > > > >
> > > > > > I am not sure that requiring m->nb_segs == 1 on non-first
> > > segments
> > > > > will provide any benefits.
> > > > >
> > > > > It would make this patch unneeded.
> > > > > So, for direct, non-segmented mbufs  pktmbuf_free() will remain
> > > write-
> > > > > free.
> > > >
> > > > I see. Then I agree with Konstantin that alternative solutions
> should
> > > be considered.
> > > >
> > > > The benefit regarding free()'ing non-segmented mbufs - which is a
> > > very common operation - certainly outweighs the cost of requiring
> > > split()/chain() operations to set the new head mbuf's nb_segs = 1.
> > > >
> > > > Nonetheless, the bug needs to be fixed somehow.
> > > >
> > > > If we can't come up with a better solution that doesn't break the
> > > ABI, we are forced to accept the patch.
> > > >
> > > > Unless the techboard accepts to break the ABI in order to avoid
> the
> > > performance cost of this patch.
> > >
> > > Did someone notice a performance drop with this patch?
> > > On my side, I don't see any regression on a L3 use case.
> >
> > I am afraid that the DPDK performance regression tests are based on
> TX immediately following RX, so cache misses in TX may go by unnoticed
> because RX warmed up the cache for TX already. And similarly for RX
> reusing mbufs that have been warmed up by the preceding free() at TX.
> >
> > Please consider testing the performance difference with the mbuf
> being completely cold at TX, and going completely cold again before
> being reused for RX.
> >
> > >
> > > Let's sumarize: splitting a mbuf chain and freeing it causes
> subsequent
> > > mbuf
> > > allocation to return a mbuf which is not correctly initialized.
> There
> > > are 2
> > > options to fix it:
> > >
> > > 1/ change the mbuf free function (this patch)
> > >
> > >    - m->nb_segs would behave like many other field: valid in the
> first
> > >      segment, ignored in other segments
> > >    - may impact performance (suspected)
> > >
> > > 2/ change all places where a mbuf chain is split, or trimmed
> > >
> > >    - m->nb_segs would have a specific behavior: count the number of
> > >      segments in the first mbuf, should be 1 in the last segment,
> > >      ignored in other ones.
> > >    - no code change in mbuf library, so no performance impact
> > >    - need to patch all places where we do a mbuf split or trim.
> From
> > > afar,
> > >      I see at least mbuf_cut_seg_ofs() in DPDK. Some external
> > > applications
> > >      may have to be patched (for instance, I already found 3 places
> in
> > >      6WIND code base without a deep search).
> > >
> > > In my opinion, 1/ is better, except we notice a significant
> > > performance,
> > > because the (implicit) behavior is unchanged.
> > >
> > > Whatever the solution, some documentation has to be added.
> > >
> > > Olivier
> > >
> >
> > Unfortunately, I don't think that anything but the first option will
> go into 20.11 and stable releases of older versions, so I stand by my
> acknowledgment of the patch.
> 
> If we are affraid about 20.11 performance (it is legitimate, few days
> before the release), we can target 21.02. After all, everybody lives
> with this bug since 2017, so there is no urgency. If accepted and well
> tested, it can be backported in stable branches.

+1

Good thinking, Olivier!


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dpdk-dev] [PATCH] mbuf: fix reset on mbuf free
  2020-11-06 10:07                               ` Morten Brørup
@ 2020-11-06 11:53                                 ` Ananyev, Konstantin
  2020-11-06 12:23                                   ` Morten Brørup
  0 siblings, 1 reply; 74+ messages in thread
From: Ananyev, Konstantin @ 2020-11-06 11:53 UTC (permalink / raw)
  To: Morten Brørup, Olivier Matz; +Cc: Andrew Rybchenko, dev


> > > > > > > > > > > > > > >> Hi Olivier,
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >>> m->nb_seg must be reset on mbuf free
> > whatever
> > > > the
> > > > > > value
> > > > > > > > of m->next,
> > > > > > > > > > > > > > >>> because it can happen that m->nb_seg is !=
> > 1.
> > > > For
> > > > > > > > instance in this
> > > > > > > > > > > > > > >>> case:
> > > > > > > > > > > > > > >>>
> > > > > > > > > > > > > > >>>   m1 = rte_pktmbuf_alloc(mp);
> > > > > > > > > > > > > > >>>   rte_pktmbuf_append(m1, 500);
> > > > > > > > > > > > > > >>>   m2 = rte_pktmbuf_alloc(mp);
> > > > > > > > > > > > > > >>>   rte_pktmbuf_append(m2, 500);
> > > > > > > > > > > > > > >>>   rte_pktmbuf_chain(m1, m2);
> > > > > > > > > > > > > > >>>   m0 = rte_pktmbuf_alloc(mp);
> > > > > > > > > > > > > > >>>   rte_pktmbuf_append(m0, 500);
> > > > > > > > > > > > > > >>>   rte_pktmbuf_chain(m0, m1);
> > > > > > > > > > > > > > >>>
> > > > > > > > > > > > > > >>> As rte_pktmbuf_chain() does not reset
> > nb_seg in
> > > > the
> > > > > > > > initial m1
> > > > > > > > > > > > > > >>> segment (this is not required), after this
> > code
> > > > the
> > > > > > > > mbuf chain
> > > > > > > > > > > > > > >>> have 3 segments:
> > > > > > > > > > > > > > >>>   - m0: next=m1, nb_seg=3
> > > > > > > > > > > > > > >>>   - m1: next=m2, nb_seg=2
> > > > > > > > > > > > > > >>>   - m2: next=NULL, nb_seg=1
> > > > > > > > > > > > > > >>>
> > > > > > > > > > > > > > >>> Freeing this mbuf chain will not restore
> > > > nb_seg=1
> > > > > > in
> > > > > > > > the second
> > > > > > > > > > > > > > >>> segment.
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >> Hmm, not sure why is that?
> > > > > > > > > > > > > > >> You are talking about freeing m1, right?
> > > > > > > > > > > > > > >> rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > > > > > > > > > > > > > >> {
> > > > > > > > > > > > > > >> 	...
> > > > > > > > > > > > > > >> 	if (m->next != NULL) {
> > > > > > > > > > > > > > >>                         m->next = NULL;
> > > > > > > > > > > > > > >>                         m->nb_segs = 1;
> > > > > > > > > > > > > > >>                 }
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >> m1->next != NULL, so it will enter the if()
> > > > block,
> > > > > > > > > > > > > > >> and will reset both next and nb_segs.
> > > > > > > > > > > > > > >> What I am missing here?
> > > > > > > > > > > > > > >> Thinking in more generic way, that change:
> > > > > > > > > > > > > > >>  -		if (m->next != NULL) {
> > > > > > > > > > > > > > >>  -			m->next = NULL;
> > > > > > > > > > > > > > >>  -			m->nb_segs = 1;
> > > > > > > > > > > > > > >>  -		}
> > > > > > > > > > > > > > >>  +		m->next = NULL;
> > > > > > > > > > > > > > >>  +		m->nb_segs = 1;
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Ah, sorry. I oversimplified the example and
> > now
> > > > it
> > > > > > does
> > > > > > > > not
> > > > > > > > > > > > > > > show the issue...
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > The full example also adds a split() to break
> > the
> > > > > > mbuf
> > > > > > > > chain
> > > > > > > > > > > > > > > between m1 and m2. The kind of thing that
> > would
> > > > be
> > > > > > done
> > > > > > > > for
> > > > > > > > > > > > > > > software TCP segmentation.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > If so, may be the right solution is to care
> > about
> > > > > > nb_segs
> > > > > > > > > > > > > > when next is set to NULL on split? Any place
> > when
> > > > next
> > > > > > is
> > > > > > > > set
> > > > > > > > > > > > > > to NULL. Just to keep the optimization in a
> > more
> > > > > > generic
> > > > > > > > place.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > The problem with that approach is that there are
> > > > already
> > > > > > > > several
> > > > > > > > > > > > > existing split() or trim() implementations in
> > > > different
> > > > > > DPDK-
> > > > > > > > based
> > > > > > > > > > > > > applications. For instance, we have some in
> > > > 6WINDGate. If
> > > > > > we
> > > > > > > > force
> > > > > > > > > > > > > applications to set nb_seg to 1 when resetting
> > next,
> > > > it
> > > > > > has
> > > > > > > > to be
> > > > > > > > > > > > > documented because it is not straightforward.
> > > > > > > > > > > >
> > > > > > > > > > > > I think it is better to go that way.
> > > > > > > > > > > > From my perspective it seems natural to reset
> > nb_seg at
> > > > > > same
> > > > > > > > time
> > > > > > > > > > > > we reset next, otherwise inconsistency will occur.
> > > > > > > > > > >
> > > > > > > > > > > While it is not explicitly stated for nb_segs, to me
> > it
> > > > was
> > > > > > clear
> > > > > > > > that
> > > > > > > > > > > nb_segs is only valid in the first segment, like for
> > many
> > > > > > fields
> > > > > > > > (port,
> > > > > > > > > > > ol_flags, vlan, rss, ...).
> > > > > > > > > > >
> > > > > > > > > > > If we say that nb_segs has to be valid in any
> > segments,
> > > > it
> > > > > > means
> > > > > > > > that
> > > > > > > > > > > chain() or split() will have to update it in all
> > > > segments,
> > > > > > which
> > > > > > > > is not
> > > > > > > > > > > efficient.
> > > > > > > > > >
> > > > > > > > > > Why in all?
> > > > > > > > > > We can state that nb_segs on non-first segment should
> > > > always
> > > > > > equal
> > > > > > > > 1.
> > > > > > > > > > As I understand in that case, both split() and chain()
> > have
> > > > to
> > > > > > > > update nb_segs
> > > > > > > > > > only for head mbufs, rest ones will remain untouched.
> > > > > > > > >
> > > > > > > > > Well, anyway, I think it's strange to have a constraint
> > on m-
> > > > > > >nb_segs
> > > > > > > > for
> > > > > > > > > non-first segment. We don't have that kind of constraints
> > for
> > > > > > other
> > > > > > > > fields.
> > > > > > > >
> > > > > > > > True, we don't. But this is one of the fields we consider
> > > > critical
> > > > > > > > for proper work of mbuf alloc/free mechanism.
> > > > > > > >
> > > > > > >
> > > > > > > I am not sure that requiring m->nb_segs == 1 on non-first
> > > > segments
> > > > > > will provide any benefits.
> > > > > >
> > > > > > It would make this patch unneeded.
> > > > > > So, for direct, non-segmented mbufs  pktmbuf_free() will remain
> > > > write-
> > > > > > free.
> > > > >
> > > > > I see. Then I agree with Konstantin that alternative solutions
> > should
> > > > be considered.
> > > > >
> > > > > The benefit regarding free()'ing non-segmented mbufs - which is a
> > > > very common operation - certainly outweighs the cost of requiring
> > > > split()/chain() operations to set the new head mbuf's nb_segs = 1.
> > > > >
> > > > > Nonetheless, the bug needs to be fixed somehow.
> > > > >
> > > > > If we can't come up with a better solution that doesn't break the
> > > > ABI, we are forced to accept the patch.
> > > > >
> > > > > Unless the techboard accepts to break the ABI in order to avoid
> > the
> > > > performance cost of this patch.
> > > >
> > > > Did someone notice a performance drop with this patch?
> > > > On my side, I don't see any regression on a L3 use case.
> > >
> > > I am afraid that the DPDK performance regression tests are based on
> > TX immediately following RX, so cache misses in TX may go by unnoticed
> > because RX warmed up the cache for TX already. And similarly for RX
> > reusing mbufs that have been warmed up by the preceding free() at TX.
> > >
> > > Please consider testing the performance difference with the mbuf
> > being completely cold at TX, and going completely cold again before
> > being reused for RX.
> > >
> > > >
> > > > Let's sumarize: splitting a mbuf chain and freeing it causes
> > subsequent
> > > > mbuf
> > > > allocation to return a mbuf which is not correctly initialized.
> > There
> > > > are 2
> > > > options to fix it:
> > > >
> > > > 1/ change the mbuf free function (this patch)
> > > >
> > > >    - m->nb_segs would behave like many other field: valid in the
> > first
> > > >      segment, ignored in other segments
> > > >    - may impact performance (suspected)
> > > >
> > > > 2/ change all places where a mbuf chain is split, or trimmed
> > > >
> > > >    - m->nb_segs would have a specific behavior: count the number of
> > > >      segments in the first mbuf, should be 1 in the last segment,
> > > >      ignored in other ones.
> > > >    - no code change in mbuf library, so no performance impact
> > > >    - need to patch all places where we do a mbuf split or trim.
> > From
> > > > afar,
> > > >      I see at least mbuf_cut_seg_ofs() in DPDK. Some external
> > > > applications
> > > >      may have to be patched (for instance, I already found 3 places
> > in
> > > >      6WIND code base without a deep search).
> > > >
> > > > In my opinion, 1/ is better, except we notice a significant
> > > > performance,
> > > > because the (implicit) behavior is unchanged.
> > > >
> > > > Whatever the solution, some documentation has to be added.
> > > >
> > > > Olivier
> > > >
> > >
> > > Unfortunately, I don't think that anything but the first option will
> > go into 20.11 and stable releases of older versions, so I stand by my
> > acknowledgment of the patch.
> >
> > If we are affraid about 20.11 performance (it is legitimate, few days
> > before the release), we can target 21.02. After all, everybody lives
> > with this bug since 2017, so there is no urgency. If accepted and well
> > tested, it can be backported in stable branches.
> 
> +1
> 
> Good thinking, Olivier!

Looking at the changes once again, it probably can be reworked a bit:

-	if (m->next != NULL) {
-		m->next = NULL;
-		m->nb_segs = 1;
-	}

+	if (m->next != NULL)
+		m->next = NULL;
+	if (m->nb_segs != 1)
+		m->nb_segs = 1;

That way we add one more condition checking, but I suppose it
shouldn't be that perf critical.
That way for direct,non-segmented mbuf it still should be write-free.
Except cases as you described above: chain(), then split(). 

Of-course we still need to do perf testing for that approach too.
So if your preference it to postpone it till 21.02 - that's ok for me.
Konstantin




^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dpdk-dev] [PATCH] mbuf: fix reset on mbuf free
  2020-11-06 11:53                                 ` Ananyev, Konstantin
@ 2020-11-06 12:23                                   ` Morten Brørup
  2020-11-08 14:16                                     ` Andrew Rybchenko
  0 siblings, 1 reply; 74+ messages in thread
From: Morten Brørup @ 2020-11-06 12:23 UTC (permalink / raw)
  To: Ananyev, Konstantin, Olivier Matz; +Cc: Andrew Rybchenko, dev

> From: Ananyev, Konstantin [mailto:konstantin.ananyev@intel.com]
> Sent: Friday, November 6, 2020 12:54 PM
> 
> > > > > > > > > > > > > > > >> Hi Olivier,
> > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > >>> m->nb_seg must be reset on mbuf free
> > > whatever
> > > > > the
> > > > > > > value
> > > > > > > > > of m->next,
> > > > > > > > > > > > > > > >>> because it can happen that m->nb_seg is
> !=
> > > 1.
> > > > > For
> > > > > > > > > instance in this
> > > > > > > > > > > > > > > >>> case:
> > > > > > > > > > > > > > > >>>
> > > > > > > > > > > > > > > >>>   m1 = rte_pktmbuf_alloc(mp);
> > > > > > > > > > > > > > > >>>   rte_pktmbuf_append(m1, 500);
> > > > > > > > > > > > > > > >>>   m2 = rte_pktmbuf_alloc(mp);
> > > > > > > > > > > > > > > >>>   rte_pktmbuf_append(m2, 500);
> > > > > > > > > > > > > > > >>>   rte_pktmbuf_chain(m1, m2);
> > > > > > > > > > > > > > > >>>   m0 = rte_pktmbuf_alloc(mp);
> > > > > > > > > > > > > > > >>>   rte_pktmbuf_append(m0, 500);
> > > > > > > > > > > > > > > >>>   rte_pktmbuf_chain(m0, m1);
> > > > > > > > > > > > > > > >>>
> > > > > > > > > > > > > > > >>> As rte_pktmbuf_chain() does not reset
> > > nb_seg in
> > > > > the
> > > > > > > > > initial m1
> > > > > > > > > > > > > > > >>> segment (this is not required), after
> this
> > > code
> > > > > the
> > > > > > > > > mbuf chain
> > > > > > > > > > > > > > > >>> have 3 segments:
> > > > > > > > > > > > > > > >>>   - m0: next=m1, nb_seg=3
> > > > > > > > > > > > > > > >>>   - m1: next=m2, nb_seg=2
> > > > > > > > > > > > > > > >>>   - m2: next=NULL, nb_seg=1
> > > > > > > > > > > > > > > >>>
> > > > > > > > > > > > > > > >>> Freeing this mbuf chain will not
> restore
> > > > > nb_seg=1
> > > > > > > in
> > > > > > > > > the second
> > > > > > > > > > > > > > > >>> segment.
> > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > >> Hmm, not sure why is that?
> > > > > > > > > > > > > > > >> You are talking about freeing m1, right?
> > > > > > > > > > > > > > > >> rte_pktmbuf_prefree_seg(struct rte_mbuf
> *m)
> > > > > > > > > > > > > > > >> {
> > > > > > > > > > > > > > > >> 	...
> > > > > > > > > > > > > > > >> 	if (m->next != NULL) {
> > > > > > > > > > > > > > > >>                         m->next = NULL;
> > > > > > > > > > > > > > > >>                         m->nb_segs = 1;
> > > > > > > > > > > > > > > >>                 }
> > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > >> m1->next != NULL, so it will enter the
> if()
> > > > > block,
> > > > > > > > > > > > > > > >> and will reset both next and nb_segs.
> > > > > > > > > > > > > > > >> What I am missing here?
> > > > > > > > > > > > > > > >> Thinking in more generic way, that
> change:
> > > > > > > > > > > > > > > >>  -		if (m->next != NULL) {
> > > > > > > > > > > > > > > >>  -			m->next = NULL;
> > > > > > > > > > > > > > > >>  -			m->nb_segs = 1;
> > > > > > > > > > > > > > > >>  -		}
> > > > > > > > > > > > > > > >>  +		m->next = NULL;
> > > > > > > > > > > > > > > >>  +		m->nb_segs = 1;
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Ah, sorry. I oversimplified the example
> and
> > > now
> > > > > it
> > > > > > > does
> > > > > > > > > not
> > > > > > > > > > > > > > > > show the issue...
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > The full example also adds a split() to
> break
> > > the
> > > > > > > mbuf
> > > > > > > > > chain
> > > > > > > > > > > > > > > > between m1 and m2. The kind of thing that
> > > would
> > > > > be
> > > > > > > done
> > > > > > > > > for
> > > > > > > > > > > > > > > > software TCP segmentation.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > If so, may be the right solution is to care
> > > about
> > > > > > > nb_segs
> > > > > > > > > > > > > > > when next is set to NULL on split? Any
> place
> > > when
> > > > > next
> > > > > > > is
> > > > > > > > > set
> > > > > > > > > > > > > > > to NULL. Just to keep the optimization in a
> > > more
> > > > > > > generic
> > > > > > > > > place.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > > The problem with that approach is that there
> are
> > > > > already
> > > > > > > > > several
> > > > > > > > > > > > > > existing split() or trim() implementations in
> > > > > different
> > > > > > > DPDK-
> > > > > > > > > based
> > > > > > > > > > > > > > applications. For instance, we have some in
> > > > > 6WINDGate. If
> > > > > > > we
> > > > > > > > > force
> > > > > > > > > > > > > > applications to set nb_seg to 1 when
> resetting
> > > next,
> > > > > it
> > > > > > > has
> > > > > > > > > to be
> > > > > > > > > > > > > > documented because it is not straightforward.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I think it is better to go that way.
> > > > > > > > > > > > > From my perspective it seems natural to reset
> > > nb_seg at
> > > > > > > same
> > > > > > > > > time
> > > > > > > > > > > > > we reset next, otherwise inconsistency will
> occur.
> > > > > > > > > > > >
> > > > > > > > > > > > While it is not explicitly stated for nb_segs, to
> me
> > > it
> > > > > was
> > > > > > > clear
> > > > > > > > > that
> > > > > > > > > > > > nb_segs is only valid in the first segment, like
> for
> > > many
> > > > > > > fields
> > > > > > > > > (port,
> > > > > > > > > > > > ol_flags, vlan, rss, ...).
> > > > > > > > > > > >
> > > > > > > > > > > > If we say that nb_segs has to be valid in any
> > > segments,
> > > > > it
> > > > > > > means
> > > > > > > > > that
> > > > > > > > > > > > chain() or split() will have to update it in all
> > > > > segments,
> > > > > > > which
> > > > > > > > > is not
> > > > > > > > > > > > efficient.
> > > > > > > > > > >
> > > > > > > > > > > Why in all?
> > > > > > > > > > > We can state that nb_segs on non-first segment
> should
> > > > > always
> > > > > > > equal
> > > > > > > > > 1.
> > > > > > > > > > > As I understand in that case, both split() and
> chain()
> > > have
> > > > > to
> > > > > > > > > update nb_segs
> > > > > > > > > > > only for head mbufs, rest ones will remain
> untouched.
> > > > > > > > > >
> > > > > > > > > > Well, anyway, I think it's strange to have a
> constraint
> > > on m-
> > > > > > > >nb_segs
> > > > > > > > > for
> > > > > > > > > > non-first segment. We don't have that kind of
> constraints
> > > for
> > > > > > > other
> > > > > > > > > fields.
> > > > > > > > >
> > > > > > > > > True, we don't. But this is one of the fields we
> consider
> > > > > critical
> > > > > > > > > for proper work of mbuf alloc/free mechanism.
> > > > > > > > >
> > > > > > > >
> > > > > > > > I am not sure that requiring m->nb_segs == 1 on non-first
> > > > > segments
> > > > > > > will provide any benefits.
> > > > > > >
> > > > > > > It would make this patch unneeded.
> > > > > > > So, for direct, non-segmented mbufs  pktmbuf_free() will
> remain
> > > > > write-
> > > > > > > free.
> > > > > >
> > > > > > I see. Then I agree with Konstantin that alternative
> solutions
> > > should
> > > > > be considered.
> > > > > >
> > > > > > The benefit regarding free()'ing non-segmented mbufs - which
> is a
> > > > > very common operation - certainly outweighs the cost of
> requiring
> > > > > split()/chain() operations to set the new head mbuf's nb_segs =
> 1.
> > > > > >
> > > > > > Nonetheless, the bug needs to be fixed somehow.
> > > > > >
> > > > > > If we can't come up with a better solution that doesn't break
> the
> > > > > ABI, we are forced to accept the patch.
> > > > > >
> > > > > > Unless the techboard accepts to break the ABI in order to
> avoid
> > > the
> > > > > performance cost of this patch.
> > > > >
> > > > > Did someone notice a performance drop with this patch?
> > > > > On my side, I don't see any regression on a L3 use case.
> > > >
> > > > I am afraid that the DPDK performance regression tests are based
> on
> > > TX immediately following RX, so cache misses in TX may go by
> unnoticed
> > > because RX warmed up the cache for TX already. And similarly for RX
> > > reusing mbufs that have been warmed up by the preceding free() at
> TX.
> > > >
> > > > Please consider testing the performance difference with the mbuf
> > > being completely cold at TX, and going completely cold again before
> > > being reused for RX.
> > > >
> > > > >
> > > > > Let's sumarize: splitting a mbuf chain and freeing it causes
> > > subsequent
> > > > > mbuf
> > > > > allocation to return a mbuf which is not correctly initialized.
> > > There
> > > > > are 2
> > > > > options to fix it:
> > > > >
> > > > > 1/ change the mbuf free function (this patch)
> > > > >
> > > > >    - m->nb_segs would behave like many other field: valid in
> the
> > > first
> > > > >      segment, ignored in other segments
> > > > >    - may impact performance (suspected)
> > > > >
> > > > > 2/ change all places where a mbuf chain is split, or trimmed
> > > > >
> > > > >    - m->nb_segs would have a specific behavior: count the
> number of
> > > > >      segments in the first mbuf, should be 1 in the last
> segment,
> > > > >      ignored in other ones.
> > > > >    - no code change in mbuf library, so no performance impact
> > > > >    - need to patch all places where we do a mbuf split or trim.
> > > From
> > > > > afar,
> > > > >      I see at least mbuf_cut_seg_ofs() in DPDK. Some external
> > > > > applications
> > > > >      may have to be patched (for instance, I already found 3
> places
> > > in
> > > > >      6WIND code base without a deep search).
> > > > >
> > > > > In my opinion, 1/ is better, except we notice a significant
> > > > > performance,
> > > > > because the (implicit) behavior is unchanged.
> > > > >
> > > > > Whatever the solution, some documentation has to be added.
> > > > >
> > > > > Olivier
> > > > >
> > > >
> > > > Unfortunately, I don't think that anything but the first option
> will
> > > go into 20.11 and stable releases of older versions, so I stand by
> my
> > > acknowledgment of the patch.
> > >
> > > If we are affraid about 20.11 performance (it is legitimate, few
> days
> > > before the release), we can target 21.02. After all, everybody
> lives
> > > with this bug since 2017, so there is no urgency. If accepted and
> well
> > > tested, it can be backported in stable branches.
> >
> > +1
> >
> > Good thinking, Olivier!
> 
> Looking at the changes once again, it probably can be reworked a bit:
> 
> -	if (m->next != NULL) {
> -		m->next = NULL;
> -		m->nb_segs = 1;
> -	}
> 
> +	if (m->next != NULL)
> +		m->next = NULL;
> +	if (m->nb_segs != 1)
> +		m->nb_segs = 1;
> 
> That way we add one more condition checking, but I suppose it
> shouldn't be that perf critical.
> That way for direct,non-segmented mbuf it still should be write-free.
> Except cases as you described above: chain(), then split().
> 
> Of-course we still need to do perf testing for that approach too.
> So if your preference it to postpone it till 21.02 - that's ok for me.
> Konstantin

With this suggestion, I cannot imagine any performance drop for direct, non-segmented mbufs: It now reads m->nb_segs, residing in the mbuf's first cache line, but the function already reads m->refcnt in the first cache line; so no cache misses are introduced.



^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dpdk-dev] [PATCH] mbuf: fix reset on mbuf free
  2020-11-04 17:00 [dpdk-dev] [PATCH] mbuf: fix reset on mbuf free Olivier Matz
  2020-11-05  0:15 ` Ananyev, Konstantin
@ 2020-11-08  7:25 ` Ali Alnubani
  2020-12-18 12:52 ` [dpdk-dev] [PATCH v2] " Olivier Matz
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 74+ messages in thread
From: Ali Alnubani @ 2020-11-08  7:25 UTC (permalink / raw)
  To: Olivier Matz, dev; +Cc: konstantin.ananyev, stable, David Marchand

Hi Olivier,

> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Olivier Matz
> Sent: Wednesday, November 4, 2020 7:00 PM
> To: dev@dpdk.org
> Cc: konstantin.ananyev@intel.com; stable@dpdk.org
> Subject: [dpdk-dev] [PATCH] mbuf: fix reset on mbuf free
> 

I can reproduce the failure in "ci/iol-mellanox-Performance" locally also with this patch.
I see a 2 mpps degradation with single core on ConnectX-5. Packet size is 64B. Testpmd command:
"""
dpdk-testpmd -n 4  -w 0000:82:00.1  -w 0000:82:00.0 -c 0x9000  -- --burst=64 --mbcache=512 -i  --nb-cores=1  --txd=256 --rxd=256
"""

Regards,
Ali

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dpdk-dev] [PATCH] mbuf: fix reset on mbuf free
  2020-11-06 12:23                                   ` Morten Brørup
@ 2020-11-08 14:16                                     ` Andrew Rybchenko
  2020-11-08 14:19                                       ` Ananyev, Konstantin
  0 siblings, 1 reply; 74+ messages in thread
From: Andrew Rybchenko @ 2020-11-08 14:16 UTC (permalink / raw)
  To: Morten Brørup, Ananyev, Konstantin, Olivier Matz; +Cc: dev

On 11/6/20 3:23 PM, Morten Brørup wrote:
>> From: Ananyev, Konstantin [mailto:konstantin.ananyev@intel.com]
>> Sent: Friday, November 6, 2020 12:54 PM
>>
>>>>>>>>>>>>>>>>>> Hi Olivier,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> m->nb_seg must be reset on mbuf free
>>>> whatever
>>>>>> the
>>>>>>>> value
>>>>>>>>>> of m->next,
>>>>>>>>>>>>>>>>>>> because it can happen that m->nb_seg is
>> !=
>>>> 1.
>>>>>> For
>>>>>>>>>> instance in this
>>>>>>>>>>>>>>>>>>> case:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>    m1 = rte_pktmbuf_alloc(mp);
>>>>>>>>>>>>>>>>>>>    rte_pktmbuf_append(m1, 500);
>>>>>>>>>>>>>>>>>>>    m2 = rte_pktmbuf_alloc(mp);
>>>>>>>>>>>>>>>>>>>    rte_pktmbuf_append(m2, 500);
>>>>>>>>>>>>>>>>>>>    rte_pktmbuf_chain(m1, m2);
>>>>>>>>>>>>>>>>>>>    m0 = rte_pktmbuf_alloc(mp);
>>>>>>>>>>>>>>>>>>>    rte_pktmbuf_append(m0, 500);
>>>>>>>>>>>>>>>>>>>    rte_pktmbuf_chain(m0, m1);
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> As rte_pktmbuf_chain() does not reset
>>>> nb_seg in
>>>>>> the
>>>>>>>>>> initial m1
>>>>>>>>>>>>>>>>>>> segment (this is not required), after
>> this
>>>> code
>>>>>> the
>>>>>>>>>> mbuf chain
>>>>>>>>>>>>>>>>>>> have 3 segments:
>>>>>>>>>>>>>>>>>>>    - m0: next=m1, nb_seg=3
>>>>>>>>>>>>>>>>>>>    - m1: next=m2, nb_seg=2
>>>>>>>>>>>>>>>>>>>    - m2: next=NULL, nb_seg=1
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Freeing this mbuf chain will not
>> restore
>>>>>> nb_seg=1
>>>>>>>> in
>>>>>>>>>> the second
>>>>>>>>>>>>>>>>>>> segment.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hmm, not sure why is that?
>>>>>>>>>>>>>>>>>> You are talking about freeing m1, right?
>>>>>>>>>>>>>>>>>> rte_pktmbuf_prefree_seg(struct rte_mbuf
>> *m)
>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>> 	...
>>>>>>>>>>>>>>>>>> 	if (m->next != NULL) {
>>>>>>>>>>>>>>>>>>                          m->next = NULL;
>>>>>>>>>>>>>>>>>>                          m->nb_segs = 1;
>>>>>>>>>>>>>>>>>>                  }
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> m1->next != NULL, so it will enter the
>> if()
>>>>>> block,
>>>>>>>>>>>>>>>>>> and will reset both next and nb_segs.
>>>>>>>>>>>>>>>>>> What I am missing here?
>>>>>>>>>>>>>>>>>> Thinking in more generic way, that
>> change:
>>>>>>>>>>>>>>>>>>   -		if (m->next != NULL) {
>>>>>>>>>>>>>>>>>>   -			m->next = NULL;
>>>>>>>>>>>>>>>>>>   -			m->nb_segs = 1;
>>>>>>>>>>>>>>>>>>   -		}
>>>>>>>>>>>>>>>>>>   +		m->next = NULL;
>>>>>>>>>>>>>>>>>>   +		m->nb_segs = 1;
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Ah, sorry. I oversimplified the example
>> and
>>>> now
>>>>>> it
>>>>>>>> does
>>>>>>>>>> not
>>>>>>>>>>>>>>>>> show the issue...
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The full example also adds a split() to
>> break
>>>> the
>>>>>>>> mbuf
>>>>>>>>>> chain
>>>>>>>>>>>>>>>>> between m1 and m2. The kind of thing that
>>>> would
>>>>>> be
>>>>>>>> done
>>>>>>>>>> for
>>>>>>>>>>>>>>>>> software TCP segmentation.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> If so, may be the right solution is to care
>>>> about
>>>>>>>> nb_segs
>>>>>>>>>>>>>>>> when next is set to NULL on split? Any
>> place
>>>> when
>>>>>> next
>>>>>>>> is
>>>>>>>>>> set
>>>>>>>>>>>>>>>> to NULL. Just to keep the optimization in a
>>>> more
>>>>>>>> generic
>>>>>>>>>> place.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The problem with that approach is that there
>> are
>>>>>> already
>>>>>>>>>> several
>>>>>>>>>>>>>>> existing split() or trim() implementations in
>>>>>> different
>>>>>>>> DPDK-
>>>>>>>>>> based
>>>>>>>>>>>>>>> applications. For instance, we have some in
>>>>>> 6WINDGate. If
>>>>>>>> we
>>>>>>>>>> force
>>>>>>>>>>>>>>> applications to set nb_seg to 1 when
>> resetting
>>>> next,
>>>>>> it
>>>>>>>> has
>>>>>>>>>> to be
>>>>>>>>>>>>>>> documented because it is not straightforward.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I think it is better to go that way.
>>>>>>>>>>>>>>  From my perspective it seems natural to reset
>>>> nb_seg at
>>>>>>>> same
>>>>>>>>>> time
>>>>>>>>>>>>>> we reset next, otherwise inconsistency will
>> occur.
>>>>>>>>>>>>>
>>>>>>>>>>>>> While it is not explicitly stated for nb_segs, to
>> me
>>>> it
>>>>>> was
>>>>>>>> clear
>>>>>>>>>> that
>>>>>>>>>>>>> nb_segs is only valid in the first segment, like
>> for
>>>> many
>>>>>>>> fields
>>>>>>>>>> (port,
>>>>>>>>>>>>> ol_flags, vlan, rss, ...).
>>>>>>>>>>>>>
>>>>>>>>>>>>> If we say that nb_segs has to be valid in any
>>>> segments,
>>>>>> it
>>>>>>>> means
>>>>>>>>>> that
>>>>>>>>>>>>> chain() or split() will have to update it in all
>>>>>> segments,
>>>>>>>> which
>>>>>>>>>> is not
>>>>>>>>>>>>> efficient.
>>>>>>>>>>>>
>>>>>>>>>>>> Why in all?
>>>>>>>>>>>> We can state that nb_segs on non-first segment
>> should
>>>>>> always
>>>>>>>> equal
>>>>>>>>>> 1.
>>>>>>>>>>>> As I understand in that case, both split() and
>> chain()
>>>> have
>>>>>> to
>>>>>>>>>> update nb_segs
>>>>>>>>>>>> only for head mbufs, rest ones will remain
>> untouched.
>>>>>>>>>>>
>>>>>>>>>>> Well, anyway, I think it's strange to have a
>> constraint
>>>> on m-
>>>>>>>>> nb_segs
>>>>>>>>>> for
>>>>>>>>>>> non-first segment. We don't have that kind of
>> constraints
>>>> for
>>>>>>>> other
>>>>>>>>>> fields.
>>>>>>>>>>
>>>>>>>>>> True, we don't. But this is one of the fields we
>> consider
>>>>>> critical
>>>>>>>>>> for proper work of mbuf alloc/free mechanism.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I am not sure that requiring m->nb_segs == 1 on non-first
>>>>>> segments
>>>>>>>> will provide any benefits.
>>>>>>>>
>>>>>>>> It would make this patch unneeded.
>>>>>>>> So, for direct, non-segmented mbufs  pktmbuf_free() will
>> remain
>>>>>> write-
>>>>>>>> free.
>>>>>>>
>>>>>>> I see. Then I agree with Konstantin that alternative
>> solutions
>>>> should
>>>>>> be considered.
>>>>>>>
>>>>>>> The benefit regarding free()'ing non-segmented mbufs - which
>> is a
>>>>>> very common operation - certainly outweighs the cost of
>> requiring
>>>>>> split()/chain() operations to set the new head mbuf's nb_segs =
>> 1.
>>>>>>>
>>>>>>> Nonetheless, the bug needs to be fixed somehow.
>>>>>>>
>>>>>>> If we can't come up with a better solution that doesn't break
>> the
>>>>>> ABI, we are forced to accept the patch.
>>>>>>>
>>>>>>> Unless the techboard accepts to break the ABI in order to
>> avoid
>>>> the
>>>>>> performance cost of this patch.
>>>>>>
>>>>>> Did someone notice a performance drop with this patch?
>>>>>> On my side, I don't see any regression on a L3 use case.
>>>>>
>>>>> I am afraid that the DPDK performance regression tests are based
>> on
>>>> TX immediately following RX, so cache misses in TX may go by
>> unnoticed
>>>> because RX warmed up the cache for TX already. And similarly for RX
>>>> reusing mbufs that have been warmed up by the preceding free() at
>> TX.
>>>>>
>>>>> Please consider testing the performance difference with the mbuf
>>>> being completely cold at TX, and going completely cold again before
>>>> being reused for RX.
>>>>>
>>>>>>
>>>>>> Let's sumarize: splitting a mbuf chain and freeing it causes
>>>> subsequent
>>>>>> mbuf
>>>>>> allocation to return a mbuf which is not correctly initialized.
>>>> There
>>>>>> are 2
>>>>>> options to fix it:
>>>>>>
>>>>>> 1/ change the mbuf free function (this patch)
>>>>>>
>>>>>>     - m->nb_segs would behave like many other field: valid in
>> the
>>>> first
>>>>>>       segment, ignored in other segments
>>>>>>     - may impact performance (suspected)
>>>>>>
>>>>>> 2/ change all places where a mbuf chain is split, or trimmed
>>>>>>
>>>>>>     - m->nb_segs would have a specific behavior: count the
>> number of
>>>>>>       segments in the first mbuf, should be 1 in the last
>> segment,
>>>>>>       ignored in other ones.
>>>>>>     - no code change in mbuf library, so no performance impact
>>>>>>     - need to patch all places where we do a mbuf split or trim.
>>>> From
>>>>>> afar,
>>>>>>       I see at least mbuf_cut_seg_ofs() in DPDK. Some external
>>>>>> applications
>>>>>>       may have to be patched (for instance, I already found 3
>> places
>>>> in
>>>>>>       6WIND code base without a deep search).
>>>>>>
>>>>>> In my opinion, 1/ is better, except we notice a significant
>>>>>> performance,
>>>>>> because the (implicit) behavior is unchanged.
>>>>>>
>>>>>> Whatever the solution, some documentation has to be added.
>>>>>>
>>>>>> Olivier
>>>>>>
>>>>>
>>>>> Unfortunately, I don't think that anything but the first option
>> will
>>>> go into 20.11 and stable releases of older versions, so I stand by
>> my
>>>> acknowledgment of the patch.
>>>>
>>>> If we are affraid about 20.11 performance (it is legitimate, few
>> days
>>>> before the release), we can target 21.02. After all, everybody
>> lives
>>>> with this bug since 2017, so there is no urgency. If accepted and
>> well
>>>> tested, it can be backported in stable branches.
>>>
>>> +1
>>>
>>> Good thinking, Olivier!
>>
>> Looking at the changes once again, it probably can be reworked a bit:
>>
>> -	if (m->next != NULL) {
>> -		m->next = NULL;
>> -		m->nb_segs = 1;
>> -	}
>>
>> +	if (m->next != NULL)
>> +		m->next = NULL;
>> +	if (m->nb_segs != 1)
>> +		m->nb_segs = 1;
>>
>> That way we add one more condition checking, but I suppose it
>> shouldn't be that perf critical.
>> That way for direct,non-segmented mbuf it still should be write-free.
>> Except cases as you described above: chain(), then split().
>>
>> Of-course we still need to do perf testing for that approach too.
>> So if your preference it to postpone it till 21.02 - that's ok for me.
>> Konstantin
> 
> With this suggestion, I cannot imagine any performance drop for direct, non-segmented mbufs: It now reads m->nb_segs, residing in the mbuf's first cache line, but the function already reads m->refcnt in the first cache line; so no cache misses are introduced.

+1



^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dpdk-dev] [PATCH] mbuf: fix reset on mbuf free
  2020-11-08 14:16                                     ` Andrew Rybchenko
@ 2020-11-08 14:19                                       ` Ananyev, Konstantin
  2020-11-10 16:26                                         ` Olivier Matz
  0 siblings, 1 reply; 74+ messages in thread
From: Ananyev, Konstantin @ 2020-11-08 14:19 UTC (permalink / raw)
  To: Andrew Rybchenko, Morten Brørup, Olivier Matz; +Cc: dev



> >>
> >>>>>>>>>>>>>>>>>> Hi Olivier,
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> m->nb_seg must be reset on mbuf free
> >>>> whatever
> >>>>>> the
> >>>>>>>> value
> >>>>>>>>>> of m->next,
> >>>>>>>>>>>>>>>>>>> because it can happen that m->nb_seg is
> >> !=
> >>>> 1.
> >>>>>> For
> >>>>>>>>>> instance in this
> >>>>>>>>>>>>>>>>>>> case:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>    m1 = rte_pktmbuf_alloc(mp);
> >>>>>>>>>>>>>>>>>>>    rte_pktmbuf_append(m1, 500);
> >>>>>>>>>>>>>>>>>>>    m2 = rte_pktmbuf_alloc(mp);
> >>>>>>>>>>>>>>>>>>>    rte_pktmbuf_append(m2, 500);
> >>>>>>>>>>>>>>>>>>>    rte_pktmbuf_chain(m1, m2);
> >>>>>>>>>>>>>>>>>>>    m0 = rte_pktmbuf_alloc(mp);
> >>>>>>>>>>>>>>>>>>>    rte_pktmbuf_append(m0, 500);
> >>>>>>>>>>>>>>>>>>>    rte_pktmbuf_chain(m0, m1);
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> As rte_pktmbuf_chain() does not reset
> >>>> nb_seg in
> >>>>>> the
> >>>>>>>>>> initial m1
> >>>>>>>>>>>>>>>>>>> segment (this is not required), after
> >> this
> >>>> code
> >>>>>> the
> >>>>>>>>>> mbuf chain
> >>>>>>>>>>>>>>>>>>> have 3 segments:
> >>>>>>>>>>>>>>>>>>>    - m0: next=m1, nb_seg=3
> >>>>>>>>>>>>>>>>>>>    - m1: next=m2, nb_seg=2
> >>>>>>>>>>>>>>>>>>>    - m2: next=NULL, nb_seg=1
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Freeing this mbuf chain will not
> >> restore
> >>>>>> nb_seg=1
> >>>>>>>> in
> >>>>>>>>>> the second
> >>>>>>>>>>>>>>>>>>> segment.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Hmm, not sure why is that?
> >>>>>>>>>>>>>>>>>> You are talking about freeing m1, right?
> >>>>>>>>>>>>>>>>>> rte_pktmbuf_prefree_seg(struct rte_mbuf
> >> *m)
> >>>>>>>>>>>>>>>>>> {
> >>>>>>>>>>>>>>>>>> 	...
> >>>>>>>>>>>>>>>>>> 	if (m->next != NULL) {
> >>>>>>>>>>>>>>>>>>                          m->next = NULL;
> >>>>>>>>>>>>>>>>>>                          m->nb_segs = 1;
> >>>>>>>>>>>>>>>>>>                  }
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> m1->next != NULL, so it will enter the
> >> if()
> >>>>>> block,
> >>>>>>>>>>>>>>>>>> and will reset both next and nb_segs.
> >>>>>>>>>>>>>>>>>> What I am missing here?
> >>>>>>>>>>>>>>>>>> Thinking in more generic way, that
> >> change:
> >>>>>>>>>>>>>>>>>>   -		if (m->next != NULL) {
> >>>>>>>>>>>>>>>>>>   -			m->next = NULL;
> >>>>>>>>>>>>>>>>>>   -			m->nb_segs = 1;
> >>>>>>>>>>>>>>>>>>   -		}
> >>>>>>>>>>>>>>>>>>   +		m->next = NULL;
> >>>>>>>>>>>>>>>>>>   +		m->nb_segs = 1;
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Ah, sorry. I oversimplified the example
> >> and
> >>>> now
> >>>>>> it
> >>>>>>>> does
> >>>>>>>>>> not
> >>>>>>>>>>>>>>>>> show the issue...
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> The full example also adds a split() to
> >> break
> >>>> the
> >>>>>>>> mbuf
> >>>>>>>>>> chain
> >>>>>>>>>>>>>>>>> between m1 and m2. The kind of thing that
> >>>> would
> >>>>>> be
> >>>>>>>> done
> >>>>>>>>>> for
> >>>>>>>>>>>>>>>>> software TCP segmentation.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> If so, may be the right solution is to care
> >>>> about
> >>>>>>>> nb_segs
> >>>>>>>>>>>>>>>> when next is set to NULL on split? Any
> >> place
> >>>> when
> >>>>>> next
> >>>>>>>> is
> >>>>>>>>>> set
> >>>>>>>>>>>>>>>> to NULL. Just to keep the optimization in a
> >>>> more
> >>>>>>>> generic
> >>>>>>>>>> place.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> The problem with that approach is that there
> >> are
> >>>>>> already
> >>>>>>>>>> several
> >>>>>>>>>>>>>>> existing split() or trim() implementations in
> >>>>>> different
> >>>>>>>> DPDK-
> >>>>>>>>>> based
> >>>>>>>>>>>>>>> applications. For instance, we have some in
> >>>>>> 6WINDGate. If
> >>>>>>>> we
> >>>>>>>>>> force
> >>>>>>>>>>>>>>> applications to set nb_seg to 1 when
> >> resetting
> >>>> next,
> >>>>>> it
> >>>>>>>> has
> >>>>>>>>>> to be
> >>>>>>>>>>>>>>> documented because it is not straightforward.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I think it is better to go that way.
> >>>>>>>>>>>>>>  From my perspective it seems natural to reset
> >>>> nb_seg at
> >>>>>>>> same
> >>>>>>>>>> time
> >>>>>>>>>>>>>> we reset next, otherwise inconsistency will
> >> occur.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> While it is not explicitly stated for nb_segs, to
> >> me
> >>>> it
> >>>>>> was
> >>>>>>>> clear
> >>>>>>>>>> that
> >>>>>>>>>>>>> nb_segs is only valid in the first segment, like
> >> for
> >>>> many
> >>>>>>>> fields
> >>>>>>>>>> (port,
> >>>>>>>>>>>>> ol_flags, vlan, rss, ...).
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> If we say that nb_segs has to be valid in any
> >>>> segments,
> >>>>>> it
> >>>>>>>> means
> >>>>>>>>>> that
> >>>>>>>>>>>>> chain() or split() will have to update it in all
> >>>>>> segments,
> >>>>>>>> which
> >>>>>>>>>> is not
> >>>>>>>>>>>>> efficient.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Why in all?
> >>>>>>>>>>>> We can state that nb_segs on non-first segment
> >> should
> >>>>>> always
> >>>>>>>> equal
> >>>>>>>>>> 1.
> >>>>>>>>>>>> As I understand in that case, both split() and
> >> chain()
> >>>> have
> >>>>>> to
> >>>>>>>>>> update nb_segs
> >>>>>>>>>>>> only for head mbufs, rest ones will remain
> >> untouched.
> >>>>>>>>>>>
> >>>>>>>>>>> Well, anyway, I think it's strange to have a
> >> constraint
> >>>> on m-
> >>>>>>>>> nb_segs
> >>>>>>>>>> for
> >>>>>>>>>>> non-first segment. We don't have that kind of
> >> constraints
> >>>> for
> >>>>>>>> other
> >>>>>>>>>> fields.
> >>>>>>>>>>
> >>>>>>>>>> True, we don't. But this is one of the fields we
> >> consider
> >>>>>> critical
> >>>>>>>>>> for proper work of mbuf alloc/free mechanism.
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> I am not sure that requiring m->nb_segs == 1 on non-first
> >>>>>> segments
> >>>>>>>> will provide any benefits.
> >>>>>>>>
> >>>>>>>> It would make this patch unneeded.
> >>>>>>>> So, for direct, non-segmented mbufs  pktmbuf_free() will
> >> remain
> >>>>>> write-
> >>>>>>>> free.
> >>>>>>>
> >>>>>>> I see. Then I agree with Konstantin that alternative
> >> solutions
> >>>> should
> >>>>>> be considered.
> >>>>>>>
> >>>>>>> The benefit regarding free()'ing non-segmented mbufs - which
> >> is a
> >>>>>> very common operation - certainly outweighs the cost of
> >> requiring
> >>>>>> split()/chain() operations to set the new head mbuf's nb_segs =
> >> 1.
> >>>>>>>
> >>>>>>> Nonetheless, the bug needs to be fixed somehow.
> >>>>>>>
> >>>>>>> If we can't come up with a better solution that doesn't break
> >> the
> >>>>>> ABI, we are forced to accept the patch.
> >>>>>>>
> >>>>>>> Unless the techboard accepts to break the ABI in order to
> >> avoid
> >>>> the
> >>>>>> performance cost of this patch.
> >>>>>>
> >>>>>> Did someone notice a performance drop with this patch?
> >>>>>> On my side, I don't see any regression on a L3 use case.
> >>>>>
> >>>>> I am afraid that the DPDK performance regression tests are based
> >> on
> >>>> TX immediately following RX, so cache misses in TX may go by
> >> unnoticed
> >>>> because RX warmed up the cache for TX already. And similarly for RX
> >>>> reusing mbufs that have been warmed up by the preceding free() at
> >> TX.
> >>>>>
> >>>>> Please consider testing the performance difference with the mbuf
> >>>> being completely cold at TX, and going completely cold again before
> >>>> being reused for RX.
> >>>>>
> >>>>>>
> >>>>>> Let's sumarize: splitting a mbuf chain and freeing it causes
> >>>> subsequent
> >>>>>> mbuf
> >>>>>> allocation to return a mbuf which is not correctly initialized.
> >>>> There
> >>>>>> are 2
> >>>>>> options to fix it:
> >>>>>>
> >>>>>> 1/ change the mbuf free function (this patch)
> >>>>>>
> >>>>>>     - m->nb_segs would behave like many other field: valid in
> >> the
> >>>> first
> >>>>>>       segment, ignored in other segments
> >>>>>>     - may impact performance (suspected)
> >>>>>>
> >>>>>> 2/ change all places where a mbuf chain is split, or trimmed
> >>>>>>
> >>>>>>     - m->nb_segs would have a specific behavior: count the
> >> number of
> >>>>>>       segments in the first mbuf, should be 1 in the last
> >> segment,
> >>>>>>       ignored in other ones.
> >>>>>>     - no code change in mbuf library, so no performance impact
> >>>>>>     - need to patch all places where we do a mbuf split or trim.
> >>>> From
> >>>>>> afar,
> >>>>>>       I see at least mbuf_cut_seg_ofs() in DPDK. Some external
> >>>>>> applications
> >>>>>>       may have to be patched (for instance, I already found 3
> >> places
> >>>> in
> >>>>>>       6WIND code base without a deep search).
> >>>>>>
> >>>>>> In my opinion, 1/ is better, except we notice a significant
> >>>>>> performance,
> >>>>>> because the (implicit) behavior is unchanged.
> >>>>>>
> >>>>>> Whatever the solution, some documentation has to be added.
> >>>>>>
> >>>>>> Olivier
> >>>>>>
> >>>>>
> >>>>> Unfortunately, I don't think that anything but the first option
> >> will
> >>>> go into 20.11 and stable releases of older versions, so I stand by
> >> my
> >>>> acknowledgment of the patch.
> >>>>
> >>>> If we are affraid about 20.11 performance (it is legitimate, few
> >> days
> >>>> before the release), we can target 21.02. After all, everybody
> >> lives
> >>>> with this bug since 2017, so there is no urgency. If accepted and
> >> well
> >>>> tested, it can be backported in stable branches.
> >>>
> >>> +1
> >>>
> >>> Good thinking, Olivier!
> >>
> >> Looking at the changes once again, it probably can be reworked a bit:
> >>
> >> -	if (m->next != NULL) {
> >> -		m->next = NULL;
> >> -		m->nb_segs = 1;
> >> -	}
> >>
> >> +	if (m->next != NULL)
> >> +		m->next = NULL;
> >> +	if (m->nb_segs != 1)
> >> +		m->nb_segs = 1;
> >>
> >> That way we add one more condition checking, but I suppose it
> >> shouldn't be that perf critical.
> >> That way for direct,non-segmented mbuf it still should be write-free.
> >> Except cases as you described above: chain(), then split().
> >>
> >> Of-course we still need to do perf testing for that approach too.
> >> So if your preference it to postpone it till 21.02 - that's ok for me.
> >> Konstantin
> >
> > With this suggestion, I cannot imagine any performance drop for direct, non-segmented mbufs: It now reads m->nb_segs, residing in the
> mbuf's first cache line, but the function already reads m->refcnt in the first cache line; so no cache misses are introduced.
> 
> +1

I don't expect perf drop with that approach either.
But some perf testing still needs to be done, just in case 😊 



^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dpdk-dev] [PATCH] mbuf: fix reset on mbuf free
  2020-11-08 14:19                                       ` Ananyev, Konstantin
@ 2020-11-10 16:26                                         ` Olivier Matz
  0 siblings, 0 replies; 74+ messages in thread
From: Olivier Matz @ 2020-11-10 16:26 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: Andrew Rybchenko, Morten Brørup, dev

On Sun, Nov 08, 2020 at 02:19:55PM +0000, Ananyev, Konstantin wrote:
> 
> 
> > >>
> > >>>>>>>>>>>>>>>>>> Hi Olivier,
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> m->nb_seg must be reset on mbuf free
> > >>>> whatever
> > >>>>>> the
> > >>>>>>>> value
> > >>>>>>>>>> of m->next,
> > >>>>>>>>>>>>>>>>>>> because it can happen that m->nb_seg is
> > >> !=
> > >>>> 1.
> > >>>>>> For
> > >>>>>>>>>> instance in this
> > >>>>>>>>>>>>>>>>>>> case:
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>    m1 = rte_pktmbuf_alloc(mp);
> > >>>>>>>>>>>>>>>>>>>    rte_pktmbuf_append(m1, 500);
> > >>>>>>>>>>>>>>>>>>>    m2 = rte_pktmbuf_alloc(mp);
> > >>>>>>>>>>>>>>>>>>>    rte_pktmbuf_append(m2, 500);
> > >>>>>>>>>>>>>>>>>>>    rte_pktmbuf_chain(m1, m2);
> > >>>>>>>>>>>>>>>>>>>    m0 = rte_pktmbuf_alloc(mp);
> > >>>>>>>>>>>>>>>>>>>    rte_pktmbuf_append(m0, 500);
> > >>>>>>>>>>>>>>>>>>>    rte_pktmbuf_chain(m0, m1);
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> As rte_pktmbuf_chain() does not reset
> > >>>> nb_seg in
> > >>>>>> the
> > >>>>>>>>>> initial m1
> > >>>>>>>>>>>>>>>>>>> segment (this is not required), after
> > >> this
> > >>>> code
> > >>>>>> the
> > >>>>>>>>>> mbuf chain
> > >>>>>>>>>>>>>>>>>>> have 3 segments:
> > >>>>>>>>>>>>>>>>>>>    - m0: next=m1, nb_seg=3
> > >>>>>>>>>>>>>>>>>>>    - m1: next=m2, nb_seg=2
> > >>>>>>>>>>>>>>>>>>>    - m2: next=NULL, nb_seg=1
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> Freeing this mbuf chain will not
> > >> restore
> > >>>>>> nb_seg=1
> > >>>>>>>> in
> > >>>>>>>>>> the second
> > >>>>>>>>>>>>>>>>>>> segment.
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> Hmm, not sure why is that?
> > >>>>>>>>>>>>>>>>>> You are talking about freeing m1, right?
> > >>>>>>>>>>>>>>>>>> rte_pktmbuf_prefree_seg(struct rte_mbuf
> > >> *m)
> > >>>>>>>>>>>>>>>>>> {
> > >>>>>>>>>>>>>>>>>> 	...
> > >>>>>>>>>>>>>>>>>> 	if (m->next != NULL) {
> > >>>>>>>>>>>>>>>>>>                          m->next = NULL;
> > >>>>>>>>>>>>>>>>>>                          m->nb_segs = 1;
> > >>>>>>>>>>>>>>>>>>                  }
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> m1->next != NULL, so it will enter the
> > >> if()
> > >>>>>> block,
> > >>>>>>>>>>>>>>>>>> and will reset both next and nb_segs.
> > >>>>>>>>>>>>>>>>>> What I am missing here?
> > >>>>>>>>>>>>>>>>>> Thinking in more generic way, that
> > >> change:
> > >>>>>>>>>>>>>>>>>>   -		if (m->next != NULL) {
> > >>>>>>>>>>>>>>>>>>   -			m->next = NULL;
> > >>>>>>>>>>>>>>>>>>   -			m->nb_segs = 1;
> > >>>>>>>>>>>>>>>>>>   -		}
> > >>>>>>>>>>>>>>>>>>   +		m->next = NULL;
> > >>>>>>>>>>>>>>>>>>   +		m->nb_segs = 1;
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> Ah, sorry. I oversimplified the example
> > >> and
> > >>>> now
> > >>>>>> it
> > >>>>>>>> does
> > >>>>>>>>>> not
> > >>>>>>>>>>>>>>>>> show the issue...
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> The full example also adds a split() to
> > >> break
> > >>>> the
> > >>>>>>>> mbuf
> > >>>>>>>>>> chain
> > >>>>>>>>>>>>>>>>> between m1 and m2. The kind of thing that
> > >>>> would
> > >>>>>> be
> > >>>>>>>> done
> > >>>>>>>>>> for
> > >>>>>>>>>>>>>>>>> software TCP segmentation.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> If so, may be the right solution is to care
> > >>>> about
> > >>>>>>>> nb_segs
> > >>>>>>>>>>>>>>>> when next is set to NULL on split? Any
> > >> place
> > >>>> when
> > >>>>>> next
> > >>>>>>>> is
> > >>>>>>>>>> set
> > >>>>>>>>>>>>>>>> to NULL. Just to keep the optimization in a
> > >>>> more
> > >>>>>>>> generic
> > >>>>>>>>>> place.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> The problem with that approach is that there
> > >> are
> > >>>>>> already
> > >>>>>>>>>> several
> > >>>>>>>>>>>>>>> existing split() or trim() implementations in
> > >>>>>> different
> > >>>>>>>> DPDK-
> > >>>>>>>>>> based
> > >>>>>>>>>>>>>>> applications. For instance, we have some in
> > >>>>>> 6WINDGate. If
> > >>>>>>>> we
> > >>>>>>>>>> force
> > >>>>>>>>>>>>>>> applications to set nb_seg to 1 when
> > >> resetting
> > >>>> next,
> > >>>>>> it
> > >>>>>>>> has
> > >>>>>>>>>> to be
> > >>>>>>>>>>>>>>> documented because it is not straightforward.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> I think it is better to go that way.
> > >>>>>>>>>>>>>>  From my perspective it seems natural to reset
> > >>>> nb_seg at
> > >>>>>>>> same
> > >>>>>>>>>> time
> > >>>>>>>>>>>>>> we reset next, otherwise inconsistency will
> > >> occur.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> While it is not explicitly stated for nb_segs, to
> > >> me
> > >>>> it
> > >>>>>> was
> > >>>>>>>> clear
> > >>>>>>>>>> that
> > >>>>>>>>>>>>> nb_segs is only valid in the first segment, like
> > >> for
> > >>>> many
> > >>>>>>>> fields
> > >>>>>>>>>> (port,
> > >>>>>>>>>>>>> ol_flags, vlan, rss, ...).
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> If we say that nb_segs has to be valid in any
> > >>>> segments,
> > >>>>>> it
> > >>>>>>>> means
> > >>>>>>>>>> that
> > >>>>>>>>>>>>> chain() or split() will have to update it in all
> > >>>>>> segments,
> > >>>>>>>> which
> > >>>>>>>>>> is not
> > >>>>>>>>>>>>> efficient.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Why in all?
> > >>>>>>>>>>>> We can state that nb_segs on non-first segment
> > >> should
> > >>>>>> always
> > >>>>>>>> equal
> > >>>>>>>>>> 1.
> > >>>>>>>>>>>> As I understand in that case, both split() and
> > >> chain()
> > >>>> have
> > >>>>>> to
> > >>>>>>>>>> update nb_segs
> > >>>>>>>>>>>> only for head mbufs, rest ones will remain
> > >> untouched.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Well, anyway, I think it's strange to have a
> > >> constraint
> > >>>> on m-
> > >>>>>>>>> nb_segs
> > >>>>>>>>>> for
> > >>>>>>>>>>> non-first segment. We don't have that kind of
> > >> constraints
> > >>>> for
> > >>>>>>>> other
> > >>>>>>>>>> fields.
> > >>>>>>>>>>
> > >>>>>>>>>> True, we don't. But this is one of the fields we
> > >> consider
> > >>>>>> critical
> > >>>>>>>>>> for proper work of mbuf alloc/free mechanism.
> > >>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> I am not sure that requiring m->nb_segs == 1 on non-first
> > >>>>>> segments
> > >>>>>>>> will provide any benefits.
> > >>>>>>>>
> > >>>>>>>> It would make this patch unneeded.
> > >>>>>>>> So, for direct, non-segmented mbufs  pktmbuf_free() will
> > >> remain
> > >>>>>> write-
> > >>>>>>>> free.
> > >>>>>>>
> > >>>>>>> I see. Then I agree with Konstantin that alternative
> > >> solutions
> > >>>> should
> > >>>>>> be considered.
> > >>>>>>>
> > >>>>>>> The benefit regarding free()'ing non-segmented mbufs - which
> > >> is a
> > >>>>>> very common operation - certainly outweighs the cost of
> > >> requiring
> > >>>>>> split()/chain() operations to set the new head mbuf's nb_segs =
> > >> 1.
> > >>>>>>>
> > >>>>>>> Nonetheless, the bug needs to be fixed somehow.
> > >>>>>>>
> > >>>>>>> If we can't come up with a better solution that doesn't break
> > >> the
> > >>>>>> ABI, we are forced to accept the patch.
> > >>>>>>>
> > >>>>>>> Unless the techboard accepts to break the ABI in order to
> > >> avoid
> > >>>> the
> > >>>>>> performance cost of this patch.
> > >>>>>>
> > >>>>>> Did someone notice a performance drop with this patch?
> > >>>>>> On my side, I don't see any regression on a L3 use case.
> > >>>>>
> > >>>>> I am afraid that the DPDK performance regression tests are based
> > >> on
> > >>>> TX immediately following RX, so cache misses in TX may go by
> > >> unnoticed
> > >>>> because RX warmed up the cache for TX already. And similarly for RX
> > >>>> reusing mbufs that have been warmed up by the preceding free() at
> > >> TX.
> > >>>>>
> > >>>>> Please consider testing the performance difference with the mbuf
> > >>>> being completely cold at TX, and going completely cold again before
> > >>>> being reused for RX.
> > >>>>>
> > >>>>>>
> > >>>>>> Let's sumarize: splitting a mbuf chain and freeing it causes
> > >>>> subsequent
> > >>>>>> mbuf
> > >>>>>> allocation to return a mbuf which is not correctly initialized.
> > >>>> There
> > >>>>>> are 2
> > >>>>>> options to fix it:
> > >>>>>>
> > >>>>>> 1/ change the mbuf free function (this patch)
> > >>>>>>
> > >>>>>>     - m->nb_segs would behave like many other field: valid in
> > >> the
> > >>>> first
> > >>>>>>       segment, ignored in other segments
> > >>>>>>     - may impact performance (suspected)
> > >>>>>>
> > >>>>>> 2/ change all places where a mbuf chain is split, or trimmed
> > >>>>>>
> > >>>>>>     - m->nb_segs would have a specific behavior: count the
> > >> number of
> > >>>>>>       segments in the first mbuf, should be 1 in the last
> > >> segment,
> > >>>>>>       ignored in other ones.
> > >>>>>>     - no code change in mbuf library, so no performance impact
> > >>>>>>     - need to patch all places where we do a mbuf split or trim.
> > >>>> From
> > >>>>>> afar,
> > >>>>>>       I see at least mbuf_cut_seg_ofs() in DPDK. Some external
> > >>>>>> applications
> > >>>>>>       may have to be patched (for instance, I already found 3
> > >> places
> > >>>> in
> > >>>>>>       6WIND code base without a deep search).
> > >>>>>>
> > >>>>>> In my opinion, 1/ is better, except we notice a significant
> > >>>>>> performance,
> > >>>>>> because the (implicit) behavior is unchanged.
> > >>>>>>
> > >>>>>> Whatever the solution, some documentation has to be added.
> > >>>>>>
> > >>>>>> Olivier
> > >>>>>>
> > >>>>>
> > >>>>> Unfortunately, I don't think that anything but the first option
> > >> will
> > >>>> go into 20.11 and stable releases of older versions, so I stand by
> > >> my
> > >>>> acknowledgment of the patch.
> > >>>>
> > >>>> If we are affraid about 20.11 performance (it is legitimate, few
> > >> days
> > >>>> before the release), we can target 21.02. After all, everybody
> > >> lives
> > >>>> with this bug since 2017, so there is no urgency. If accepted and
> > >> well
> > >>>> tested, it can be backported in stable branches.
> > >>>
> > >>> +1
> > >>>
> > >>> Good thinking, Olivier!
> > >>
> > >> Looking at the changes once again, it probably can be reworked a bit:
> > >>
> > >> -	if (m->next != NULL) {
> > >> -		m->next = NULL;
> > >> -		m->nb_segs = 1;
> > >> -	}
> > >>
> > >> +	if (m->next != NULL)
> > >> +		m->next = NULL;
> > >> +	if (m->nb_segs != 1)
> > >> +		m->nb_segs = 1;
> > >>
> > >> That way we add one more condition checking, but I suppose it
> > >> shouldn't be that perf critical.
> > >> That way for direct,non-segmented mbuf it still should be write-free.
> > >> Except cases as you described above: chain(), then split().
> > >>
> > >> Of-course we still need to do perf testing for that approach too.
> > >> So if your preference it to postpone it till 21.02 - that's ok for me.
> > >> Konstantin
> > >
> > > With this suggestion, I cannot imagine any performance drop for direct, non-segmented mbufs: It now reads m->nb_segs, residing in the
> > mbuf's first cache line, but the function already reads m->refcnt in the first cache line; so no cache misses are introduced.
> > 
> > +1
> 
> I don't expect perf drop with that approach either.
> But some perf testing still needs to be done, just in case 😊 

I also agree with your suggestion Konstantin.
Let's postpone it right after 20.11 so we have more time to test.
I'll send a v2.

Olivier

^ permalink raw reply	[flat|nested] 74+ messages in thread

* [dpdk-dev] [PATCH v2] mbuf: fix reset on mbuf free
  2020-11-04 17:00 [dpdk-dev] [PATCH] mbuf: fix reset on mbuf free Olivier Matz
  2020-11-05  0:15 ` Ananyev, Konstantin
  2020-11-08  7:25 ` Ali Alnubani
@ 2020-12-18 12:52 ` Olivier Matz
  2020-12-18 13:18   ` Morten Brørup
  2021-01-06 13:33 ` [dpdk-dev] [PATCH v3] " Olivier Matz
  2021-01-13 13:27 ` [dpdk-dev] [PATCH v4] " Olivier Matz
  4 siblings, 1 reply; 74+ messages in thread
From: Olivier Matz @ 2020-12-18 12:52 UTC (permalink / raw)
  To: dev; +Cc: andrew.rybchenko, konstantin.ananyev, mb, stable

m->nb_seg must be reset on mbuf free whatever the value of m->next,
because it can happen that m->nb_seg is != 1. For instance in this
case:

  m1 = rte_pktmbuf_alloc(mp);
  rte_pktmbuf_append(m1, 500);
  m2 = rte_pktmbuf_alloc(mp);
  rte_pktmbuf_append(m2, 500);
  rte_pktmbuf_chain(m1, m2);
  m0 = rte_pktmbuf_alloc(mp);
  rte_pktmbuf_append(m0, 500);
  rte_pktmbuf_chain(m0, m1);

As rte_pktmbuf_chain() does not reset nb_seg in the initial m1
segment (this is not required), after this code the mbuf chain
have 3 segments:
  - m0: next=m1, nb_seg=3
  - m1: next=m2, nb_seg=2
  - m2: next=NULL, nb_seg=1

Then split this chain between m1 and m2, it would result in 2 packets:
  - first packet
    - m0: next=m1, nb_seg=3
    - m1: next=m2, nb_seg=2
  - second packet
    - m2: next=NULL, nb_seg=1

Freeing the first packet will not restore nb_seg=1 in the second
segment. This is an issue because it is expected that mbufs stored
in pool have their nb_seg field set to 1.

Fixes: 8f094a9ac5d7 ("mbuf: set mbuf fields while in pool")
Cc: stable@dpdk.org

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 lib/librte_mbuf/rte_mbuf.c      |  4 ++--
 lib/librte_mbuf/rte_mbuf.h      |  8 ++++----
 lib/librte_mbuf/rte_mbuf_core.h | 13 +++++++++++--
 3 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index 7d09ee2939..5f77840557 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -129,10 +129,10 @@ rte_pktmbuf_free_pinned_extmem(void *addr, void *opaque)
 
 	rte_mbuf_ext_refcnt_set(m->shinfo, 1);
 	m->ol_flags = EXT_ATTACHED_MBUF;
-	if (m->next != NULL) {
+	if (m->next != NULL)
 		m->next = NULL;
+	if (m->nb_segs != 1)
 		m->nb_segs = 1;
-	}
 	rte_mbuf_raw_free(m);
 }
 
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index c4c9ebfaa0..8c1097ed76 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -1340,10 +1340,10 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
 				return NULL;
 		}
 
-		if (m->next != NULL) {
+		if (m->next != NULL)
 			m->next = NULL;
+		if (m->nb_segs != 1)
 			m->nb_segs = 1;
-		}
 
 		return m;
 
@@ -1357,10 +1357,10 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
 				return NULL;
 		}
 
-		if (m->next != NULL) {
+		if (m->next != NULL)
 			m->next = NULL;
+		if (m->nb_segs != 1)
 			m->nb_segs = 1;
-		}
 		rte_mbuf_refcnt_set(m, 1);
 
 		return m;
diff --git a/lib/librte_mbuf/rte_mbuf_core.h b/lib/librte_mbuf/rte_mbuf_core.h
index 567551deab..78a1fcc8ff 100644
--- a/lib/librte_mbuf/rte_mbuf_core.h
+++ b/lib/librte_mbuf/rte_mbuf_core.h
@@ -495,7 +495,12 @@ struct rte_mbuf {
 	 * or non-atomic) is controlled by the RTE_MBUF_REFCNT_ATOMIC flag.
 	 */
 	uint16_t refcnt;
-	uint16_t nb_segs;         /**< Number of segments. */
+
+	/**
+	 * Number of segments. Only valid for the first segment of an mbuf
+	 * chain.
+	 */
+	uint16_t nb_segs;
 
 	/** Input port (16 bits to support more than 256 virtual ports).
 	 * The event eth Tx adapter uses this field to specify the output port.
@@ -591,7 +596,11 @@ struct rte_mbuf {
 	/* second cache line - fields only used in slow path or on TX */
 	RTE_MARKER cacheline1 __rte_cache_min_aligned;
 
-	struct rte_mbuf *next;    /**< Next segment of scattered packet. */
+	/**
+	 * Next segment of scattered packet. Must be NULL in the last segment or
+	 * in case of non-segmented packet.
+	 */
+	struct rte_mbuf *next;
 
 	/* fields to support TX offloads */
 	RTE_STD_C11
-- 
2.25.1


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dpdk-dev] [PATCH v2] mbuf: fix reset on mbuf free
  2020-12-18 12:52 ` [dpdk-dev] [PATCH v2] " Olivier Matz
@ 2020-12-18 13:18   ` Morten Brørup
  2020-12-18 23:33     ` Ajit Khaparde
  0 siblings, 1 reply; 74+ messages in thread
From: Morten Brørup @ 2020-12-18 13:18 UTC (permalink / raw)
  To: Olivier Matz, dev; +Cc: andrew.rybchenko, konstantin.ananyev, stable

> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier Matz
> 
> m->nb_seg must be reset on mbuf free whatever the value of m->next,
> because it can happen that m->nb_seg is != 1. For instance in this
> case:
> 
>   m1 = rte_pktmbuf_alloc(mp);
>   rte_pktmbuf_append(m1, 500);
>   m2 = rte_pktmbuf_alloc(mp);
>   rte_pktmbuf_append(m2, 500);
>   rte_pktmbuf_chain(m1, m2);
>   m0 = rte_pktmbuf_alloc(mp);
>   rte_pktmbuf_append(m0, 500);
>   rte_pktmbuf_chain(m0, m1);
> 
> As rte_pktmbuf_chain() does not reset nb_seg in the initial m1
> segment (this is not required), after this code the mbuf chain
> have 3 segments:
>   - m0: next=m1, nb_seg=3
>   - m1: next=m2, nb_seg=2
>   - m2: next=NULL, nb_seg=1
> 
> Then split this chain between m1 and m2, it would result in 2 packets:
>   - first packet
>     - m0: next=m1, nb_seg=3
>     - m1: next=m2, nb_seg=2

I think you mean:

- m0: next=m1, nb_seg=2  //< nb_seg corrected
- m1: next=NULL, nb_seg=2   //< next corrected

>   - second packet
>     - m2: next=NULL, nb_seg=1
> 
> Freeing the first packet will not restore nb_seg=1 in the second
> segment. This is an issue because it is expected that mbufs stored
> in pool have their nb_seg field set to 1.
> 
> Fixes: 8f094a9ac5d7 ("mbuf: set mbuf fields while in pool")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>

The code looks good, so:
Acked-by: Morten Brørup <mb@smartsharesystems.com>


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dpdk-dev] [PATCH v2] mbuf: fix reset on mbuf free
  2020-12-18 13:18   ` Morten Brørup
@ 2020-12-18 23:33     ` Ajit Khaparde
  0 siblings, 0 replies; 74+ messages in thread
From: Ajit Khaparde @ 2020-12-18 23:33 UTC (permalink / raw)
  To: Morten Brørup
  Cc: Olivier Matz, dev, andrew.rybchenko, konstantin.ananyev, stable

On Fri, Dec 18, 2020 at 5:18 AM Morten Brørup <mb@smartsharesystems.com> wrote:
>
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier Matz
> >
> > m->nb_seg must be reset on mbuf free whatever the value of m->next,
> > because it can happen that m->nb_seg is != 1. For instance in this
> > case:
> >
> >   m1 = rte_pktmbuf_alloc(mp);
> >   rte_pktmbuf_append(m1, 500);
> >   m2 = rte_pktmbuf_alloc(mp);
> >   rte_pktmbuf_append(m2, 500);
> >   rte_pktmbuf_chain(m1, m2);
> >   m0 = rte_pktmbuf_alloc(mp);
> >   rte_pktmbuf_append(m0, 500);
> >   rte_pktmbuf_chain(m0, m1);
> >
> > As rte_pktmbuf_chain() does not reset nb_seg in the initial m1
> > segment (this is not required), after this code the mbuf chain
> > have 3 segments:
> >   - m0: next=m1, nb_seg=3
> >   - m1: next=m2, nb_seg=2
> >   - m2: next=NULL, nb_seg=1
> >
> > Then split this chain between m1 and m2, it would result in 2 packets:
> >   - first packet
> >     - m0: next=m1, nb_seg=3
> >     - m1: next=m2, nb_seg=2
>
> I think you mean:
>
> - m0: next=m1, nb_seg=2  //< nb_seg corrected
> - m1: next=NULL, nb_seg=2   //< next corrected
>
> >   - second packet
> >     - m2: next=NULL, nb_seg=1
> >
> > Freeing the first packet will not restore nb_seg=1 in the second
> > segment. This is an issue because it is expected that mbufs stored
> > in pool have their nb_seg field set to 1.
> >
> > Fixes: 8f094a9ac5d7 ("mbuf: set mbuf fields while in pool")
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
>
> The code looks good, so:
> Acked-by: Morten Brørup <mb@smartsharesystems.com>

Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>

^ permalink raw reply	[flat|nested] 74+ messages in thread

* [dpdk-dev] [PATCH v3] mbuf: fix reset on mbuf free
  2020-11-04 17:00 [dpdk-dev] [PATCH] mbuf: fix reset on mbuf free Olivier Matz
                   ` (2 preceding siblings ...)
  2020-12-18 12:52 ` [dpdk-dev] [PATCH v2] " Olivier Matz
@ 2021-01-06 13:33 ` Olivier Matz
  2021-01-10  9:28   ` Ali Alnubani
  2021-01-11 13:14   ` Ananyev, Konstantin
  2021-01-13 13:27 ` [dpdk-dev] [PATCH v4] " Olivier Matz
  4 siblings, 2 replies; 74+ messages in thread
From: Olivier Matz @ 2021-01-06 13:33 UTC (permalink / raw)
  To: dev
  Cc: andrew.rybchenko, konstantin.ananyev, mb, alialnu, ajitkhaparde,
	stable, Ajit Khaparde

m->nb_seg must be reset on mbuf free whatever the value of m->next,
because it can happen that m->nb_seg is != 1. For instance in this
case:

  m1 = rte_pktmbuf_alloc(mp);
  rte_pktmbuf_append(m1, 500);
  m2 = rte_pktmbuf_alloc(mp);
  rte_pktmbuf_append(m2, 500);
  rte_pktmbuf_chain(m1, m2);
  m0 = rte_pktmbuf_alloc(mp);
  rte_pktmbuf_append(m0, 500);
  rte_pktmbuf_chain(m0, m1);

As rte_pktmbuf_chain() does not reset nb_seg in the initial m1
segment (this is not required), after this code the mbuf chain
have 3 segments:
  - m0: next=m1, nb_seg=3
  - m1: next=m2, nb_seg=2
  - m2: next=NULL, nb_seg=1

Then split this chain between m1 and m2, it would result in 2 packets:
  - first packet
    - m0: next=m1, nb_seg=2
    - m1: next=NULL, nb_seg=2
  - second packet
    - m2: next=NULL, nb_seg=1

Freeing the first packet will not restore nb_seg=1 in the second
segment. This is an issue because it is expected that mbufs stored
in pool have their nb_seg field set to 1.

Fixes: 8f094a9ac5d7 ("mbuf: set mbuf fields while in pool")
Cc: stable@dpdk.org

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
---

v3
* fix commit log again (thanks Morten for spotting it)

v2
* avoid write access if uneeded (suggested by Konstantin)
* enhance comments in mbuf header file (suggested by Morten)
* fix commit log


 lib/librte_mbuf/rte_mbuf.c      |  4 ++--
 lib/librte_mbuf/rte_mbuf.h      |  8 ++++----
 lib/librte_mbuf/rte_mbuf_core.h | 13 +++++++++++--
 3 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index 7d09ee2939..5f77840557 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -129,10 +129,10 @@ rte_pktmbuf_free_pinned_extmem(void *addr, void *opaque)
 
 	rte_mbuf_ext_refcnt_set(m->shinfo, 1);
 	m->ol_flags = EXT_ATTACHED_MBUF;
-	if (m->next != NULL) {
+	if (m->next != NULL)
 		m->next = NULL;
+	if (m->nb_segs != 1)
 		m->nb_segs = 1;
-	}
 	rte_mbuf_raw_free(m);
 }
 
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index c4c9ebfaa0..8c1097ed76 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -1340,10 +1340,10 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
 				return NULL;
 		}
 
-		if (m->next != NULL) {
+		if (m->next != NULL)
 			m->next = NULL;
+		if (m->nb_segs != 1)
 			m->nb_segs = 1;
-		}
 
 		return m;
 
@@ -1357,10 +1357,10 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
 				return NULL;
 		}
 
-		if (m->next != NULL) {
+		if (m->next != NULL)
 			m->next = NULL;
+		if (m->nb_segs != 1)
 			m->nb_segs = 1;
-		}
 		rte_mbuf_refcnt_set(m, 1);
 
 		return m;
diff --git a/lib/librte_mbuf/rte_mbuf_core.h b/lib/librte_mbuf/rte_mbuf_core.h
index 567551deab..78a1fcc8ff 100644
--- a/lib/librte_mbuf/rte_mbuf_core.h
+++ b/lib/librte_mbuf/rte_mbuf_core.h
@@ -495,7 +495,12 @@ struct rte_mbuf {
 	 * or non-atomic) is controlled by the RTE_MBUF_REFCNT_ATOMIC flag.
 	 */
 	uint16_t refcnt;
-	uint16_t nb_segs;         /**< Number of segments. */
+
+	/**
+	 * Number of segments. Only valid for the first segment of an mbuf
+	 * chain.
+	 */
+	uint16_t nb_segs;
 
 	/** Input port (16 bits to support more than 256 virtual ports).
 	 * The event eth Tx adapter uses this field to specify the output port.
@@ -591,7 +596,11 @@ struct rte_mbuf {
 	/* second cache line - fields only used in slow path or on TX */
 	RTE_MARKER cacheline1 __rte_cache_min_aligned;
 
-	struct rte_mbuf *next;    /**< Next segment of scattered packet. */
+	/**
+	 * Next segment of scattered packet. Must be NULL in the last segment or
+	 * in case of non-segmented packet.
+	 */
+	struct rte_mbuf *next;
 
 	/* fields to support TX offloads */
 	RTE_STD_C11
-- 
2.29.2


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dpdk-dev] [PATCH v3] mbuf: fix reset on mbuf free
  2021-01-06 13:33 ` [dpdk-dev] [PATCH v3] " Olivier Matz
@ 2021-01-10  9:28   ` Ali Alnubani
  2021-01-11 13:14   ` Ananyev, Konstantin
  1 sibling, 0 replies; 74+ messages in thread
From: Ali Alnubani @ 2021-01-10  9:28 UTC (permalink / raw)
  To: Olivier Matz, dev
  Cc: andrew.rybchenko, konstantin.ananyev, mb, ajitkhaparde, stable,
	Ajit Khaparde

Hi Olivier,

> -----Original Message-----
> From: Olivier Matz <olivier.matz@6wind.com>
> Sent: Wednesday, January 6, 2021 3:34 PM
> To: dev@dpdk.org
> Cc: andrew.rybchenko@oktetlabs.ru; konstantin.ananyev@intel.com;
> mb@smartsharesystems.com; Ali Alnubani <alialnu@nvidia.com>;
> ajitkhaparde@gmail.com; stable@dpdk.org; Ajit Khaparde
> <ajit.khaparde@broadcom.com>
> Subject: [PATCH v3] mbuf: fix reset on mbuf free
> 

Even though the performance tests on Mellanox NICs are passing, I see a performance drop of up to 0.5 mpps with 64B frames (tests only fail for 1 mpps drop or more):
https://mails.dpdk.org/archives/test-report/2021-January/172759.html

I'll verify this on local hardware and reply back.

Regards,
Ali 

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dpdk-dev] [PATCH v3] mbuf: fix reset on mbuf free
  2021-01-06 13:33 ` [dpdk-dev] [PATCH v3] " Olivier Matz
  2021-01-10  9:28   ` Ali Alnubani
@ 2021-01-11 13:14   ` Ananyev, Konstantin
  1 sibling, 0 replies; 74+ messages in thread
From: Ananyev, Konstantin @ 2021-01-11 13:14 UTC (permalink / raw)
  To: Olivier Matz, dev
  Cc: andrew.rybchenko, mb, alialnu, ajitkhaparde, stable, Ajit Khaparde

> m->nb_seg must be reset on mbuf free whatever the value of m->next,
> because it can happen that m->nb_seg is != 1. For instance in this
> case:
> 
>   m1 = rte_pktmbuf_alloc(mp);
>   rte_pktmbuf_append(m1, 500);
>   m2 = rte_pktmbuf_alloc(mp);
>   rte_pktmbuf_append(m2, 500);
>   rte_pktmbuf_chain(m1, m2);
>   m0 = rte_pktmbuf_alloc(mp);
>   rte_pktmbuf_append(m0, 500);
>   rte_pktmbuf_chain(m0, m1);
> 
> As rte_pktmbuf_chain() does not reset nb_seg in the initial m1
> segment (this is not required), after this code the mbuf chain
> have 3 segments:
>   - m0: next=m1, nb_seg=3
>   - m1: next=m2, nb_seg=2
>   - m2: next=NULL, nb_seg=1
> 
> Then split this chain between m1 and m2, it would result in 2 packets:
>   - first packet
>     - m0: next=m1, nb_seg=2
>     - m1: next=NULL, nb_seg=2
>   - second packet
>     - m2: next=NULL, nb_seg=1
> 
> Freeing the first packet will not restore nb_seg=1 in the second
> segment. This is an issue because it is expected that mbufs stored
> in pool have their nb_seg field set to 1.
> 
> Fixes: 8f094a9ac5d7 ("mbuf: set mbuf fields while in pool")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> Acked-by: Morten Brørup <mb@smartsharesystems.com>
> Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
> ---
> 
> v3
> * fix commit log again (thanks Morten for spotting it)
> 
> v2
> * avoid write access if uneeded (suggested by Konstantin)
> * enhance comments in mbuf header file (suggested by Morten)
> * fix commit log
> 
> 
>  lib/librte_mbuf/rte_mbuf.c      |  4 ++--
>  lib/librte_mbuf/rte_mbuf.h      |  8 ++++----
>  lib/librte_mbuf/rte_mbuf_core.h | 13 +++++++++++--
>  3 files changed, 17 insertions(+), 8 deletions(-)
> 
> diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
> index 7d09ee2939..5f77840557 100644
> --- a/lib/librte_mbuf/rte_mbuf.c
> +++ b/lib/librte_mbuf/rte_mbuf.c
> @@ -129,10 +129,10 @@ rte_pktmbuf_free_pinned_extmem(void *addr, void *opaque)
> 
>  	rte_mbuf_ext_refcnt_set(m->shinfo, 1);
>  	m->ol_flags = EXT_ATTACHED_MBUF;
> -	if (m->next != NULL) {
> +	if (m->next != NULL)
>  		m->next = NULL;
> +	if (m->nb_segs != 1)
>  		m->nb_segs = 1;
> -	}
>  	rte_mbuf_raw_free(m);
>  }
> 
> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> index c4c9ebfaa0..8c1097ed76 100644
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -1340,10 +1340,10 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
>  				return NULL;
>  		}
> 
> -		if (m->next != NULL) {
> +		if (m->next != NULL)
>  			m->next = NULL;
> +		if (m->nb_segs != 1)
>  			m->nb_segs = 1;
> -		}
> 
>  		return m;
> 
> @@ -1357,10 +1357,10 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
>  				return NULL;
>  		}
> 
> -		if (m->next != NULL) {
> +		if (m->next != NULL)
>  			m->next = NULL;
> +		if (m->nb_segs != 1)
>  			m->nb_segs = 1;
> -		}
>  		rte_mbuf_refcnt_set(m, 1);
> 
>  		return m;
> diff --git a/lib/librte_mbuf/rte_mbuf_core.h b/lib/librte_mbuf/rte_mbuf_core.h
> index 567551deab..78a1fcc8ff 100644
> --- a/lib/librte_mbuf/rte_mbuf_core.h
> +++ b/lib/librte_mbuf/rte_mbuf_core.h
> @@ -495,7 +495,12 @@ struct rte_mbuf {
>  	 * or non-atomic) is controlled by the RTE_MBUF_REFCNT_ATOMIC flag.
>  	 */
>  	uint16_t refcnt;
> -	uint16_t nb_segs;         /**< Number of segments. */
> +
> +	/**
> +	 * Number of segments. Only valid for the first segment of an mbuf
> +	 * chain.
> +	 */
> +	uint16_t nb_segs;
> 
>  	/** Input port (16 bits to support more than 256 virtual ports).
>  	 * The event eth Tx adapter uses this field to specify the output port.
> @@ -591,7 +596,11 @@ struct rte_mbuf {
>  	/* second cache line - fields only used in slow path or on TX */
>  	RTE_MARKER cacheline1 __rte_cache_min_aligned;
> 
> -	struct rte_mbuf *next;    /**< Next segment of scattered packet. */
> +	/**
> +	 * Next segment of scattered packet. Must be NULL in the last segment or
> +	 * in case of non-segmented packet.
> +	 */
> +	struct rte_mbuf *next;
> 
>  	/* fields to support TX offloads */
>  	RTE_STD_C11
> --

Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

> 2.29.2


^ permalink raw reply	[flat|nested] 74+ messages in thread

* [dpdk-dev] [PATCH v4] mbuf: fix reset on mbuf free
  2020-11-04 17:00 [dpdk-dev] [PATCH] mbuf: fix reset on mbuf free Olivier Matz
                   ` (3 preceding siblings ...)
  2021-01-06 13:33 ` [dpdk-dev] [PATCH v3] " Olivier Matz
@ 2021-01-13 13:27 ` Olivier Matz
  2021-01-15 13:59   ` [dpdk-dev] [dpdk-stable] " David Marchand
                     ` (2 more replies)
  4 siblings, 3 replies; 74+ messages in thread
From: Olivier Matz @ 2021-01-13 13:27 UTC (permalink / raw)
  To: dev
  Cc: andrew.rybchenko, konstantin.ananyev, mb, alialnu, ajitkhaparde,
	stable, Ajit Khaparde

m->nb_seg must be reset on mbuf free whatever the value of m->next,
because it can happen that m->nb_seg is != 1. For instance in this
case:

  m1 = rte_pktmbuf_alloc(mp);
  rte_pktmbuf_append(m1, 500);
  m2 = rte_pktmbuf_alloc(mp);
  rte_pktmbuf_append(m2, 500);
  rte_pktmbuf_chain(m1, m2);
  m0 = rte_pktmbuf_alloc(mp);
  rte_pktmbuf_append(m0, 500);
  rte_pktmbuf_chain(m0, m1);

As rte_pktmbuf_chain() does not reset nb_seg in the initial m1
segment (this is not required), after this code the mbuf chain
have 3 segments:
  - m0: next=m1, nb_seg=3
  - m1: next=m2, nb_seg=2
  - m2: next=NULL, nb_seg=1

Then split this chain between m1 and m2, it would result in 2 packets:
  - first packet
    - m0: next=m1, nb_seg=2
    - m1: next=NULL, nb_seg=2
  - second packet
    - m2: next=NULL, nb_seg=1

Freeing the first packet will not restore nb_seg=1 in the second
segment. This is an issue because it is expected that mbufs stored
in pool have their nb_seg field set to 1.

Fixes: 8f094a9ac5d7 ("mbuf: set mbuf fields while in pool")
Cc: stable@dpdk.org

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---

v4
* add a unit test (suggested by David)

v3
* fix commit log again (thanks Morten for spotting it)

v2
* avoid write access if uneeded (suggested by Konstantin)
* enhance comments in mbuf header file (suggested by Morten)
* fix commit log

 app/test/test_mbuf.c            | 69 +++++++++++++++++++++++++++++++++
 lib/librte_mbuf/rte_mbuf.c      |  4 +-
 lib/librte_mbuf/rte_mbuf.h      |  8 ++--
 lib/librte_mbuf/rte_mbuf_core.h | 13 ++++++-
 4 files changed, 86 insertions(+), 8 deletions(-)

diff --git a/app/test/test_mbuf.c b/app/test/test_mbuf.c
index a40f7d4883..ad2cbab600 100644
--- a/app/test/test_mbuf.c
+++ b/app/test/test_mbuf.c
@@ -2677,6 +2677,70 @@ test_mbuf_dyn(struct rte_mempool *pktmbuf_pool)
 	return -1;
 }
 
+/* check that m->nb_segs and m->next are reset on mbuf free */
+static int
+test_nb_segs_and_next_reset(void)
+{
+	struct rte_mbuf *m0 = NULL, *m1 = NULL, *m2 = NULL;
+	struct rte_mempool *pool = NULL;
+
+	pool = rte_pktmbuf_pool_create("test_mbuf_reset",
+			3, 0, 0, MBUF_DATA_SIZE, SOCKET_ID_ANY);
+	if (pool == NULL)
+		GOTO_FAIL("Failed to create mbuf pool");
+
+	/* alloc mbufs */
+	m0 = rte_pktmbuf_alloc(pool);
+	m1 = rte_pktmbuf_alloc(pool);
+	m2 = rte_pktmbuf_alloc(pool);
+	if (m0 == NULL || m1 == NULL || m2 == NULL)
+		GOTO_FAIL("Failed to allocate mbuf");
+
+	/* append data in all of them */
+	if (rte_pktmbuf_append(m0, 500) == NULL ||
+			rte_pktmbuf_append(m1, 500) == NULL ||
+			rte_pktmbuf_append(m2, 500) == NULL)
+		GOTO_FAIL("Failed to append data in mbuf");
+
+	/* chain them in one mbuf m0 */
+	rte_pktmbuf_chain(m1, m2);
+	rte_pktmbuf_chain(m0, m1);
+	if (m0->nb_segs != 3 || m0->next != m1 || m1->next != m2 ||
+			m2->next != NULL) {
+		m1 = m2 = NULL;
+		GOTO_FAIL("Failed to chain mbufs");
+	}
+
+	/* split m0 chain in two, between m1 and m2 */
+	m0->nb_segs = 2;
+	m1->next = NULL;
+	m2->nb_segs = 1;
+
+	/* free the 2 mbuf chains m0 and m2  */
+	rte_pktmbuf_free(m0);
+	rte_pktmbuf_free(m2);
+
+	/* realloc the 3 mbufs */
+	m0 = rte_mbuf_raw_alloc(pool);
+	m1 = rte_mbuf_raw_alloc(pool);
+	m2 = rte_mbuf_raw_alloc(pool);
+	if (m0 == NULL || m1 == NULL || m2 == NULL)
+		GOTO_FAIL("Failed to reallocate mbuf");
+
+	/* ensure that m->next and m->nb_segs are reset allocated mbufs */
+	if (m0->nb_segs != 1 || m0->next != NULL ||
+			m1->nb_segs != 1 || m1->next != NULL ||
+			m2->nb_segs != 1 || m2->next != NULL)
+		GOTO_FAIL("nb_segs or next was not reset properly");
+
+	return 0;
+
+fail:
+	if (pool != NULL)
+		rte_mempool_free(pool);
+	return -1;
+}
+
 static int
 test_mbuf(void)
 {
@@ -2867,6 +2931,11 @@ test_mbuf(void)
 		goto err;
 	}
 
+	/* test reset of m->nb_segs and m->next on mbuf free */
+	if (test_nb_segs_and_next_reset() < 0) {
+		printf("test_nb_segs_and_next_reset() failed\n");
+		goto err;
+	}
 
 	ret = 0;
 err:
diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index 7d09ee2939..5f77840557 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -129,10 +129,10 @@ rte_pktmbuf_free_pinned_extmem(void *addr, void *opaque)
 
 	rte_mbuf_ext_refcnt_set(m->shinfo, 1);
 	m->ol_flags = EXT_ATTACHED_MBUF;
-	if (m->next != NULL) {
+	if (m->next != NULL)
 		m->next = NULL;
+	if (m->nb_segs != 1)
 		m->nb_segs = 1;
-	}
 	rte_mbuf_raw_free(m);
 }
 
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index c4c9ebfaa0..8c1097ed76 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -1340,10 +1340,10 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
 				return NULL;
 		}
 
-		if (m->next != NULL) {
+		if (m->next != NULL)
 			m->next = NULL;
+		if (m->nb_segs != 1)
 			m->nb_segs = 1;
-		}
 
 		return m;
 
@@ -1357,10 +1357,10 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
 				return NULL;
 		}
 
-		if (m->next != NULL) {
+		if (m->next != NULL)
 			m->next = NULL;
+		if (m->nb_segs != 1)
 			m->nb_segs = 1;
-		}
 		rte_mbuf_refcnt_set(m, 1);
 
 		return m;
diff --git a/lib/librte_mbuf/rte_mbuf_core.h b/lib/librte_mbuf/rte_mbuf_core.h
index 567551deab..78a1fcc8ff 100644
--- a/lib/librte_mbuf/rte_mbuf_core.h
+++ b/lib/librte_mbuf/rte_mbuf_core.h
@@ -495,7 +495,12 @@ struct rte_mbuf {
 	 * or non-atomic) is controlled by the RTE_MBUF_REFCNT_ATOMIC flag.
 	 */
 	uint16_t refcnt;
-	uint16_t nb_segs;         /**< Number of segments. */
+
+	/**
+	 * Number of segments. Only valid for the first segment of an mbuf
+	 * chain.
+	 */
+	uint16_t nb_segs;
 
 	/** Input port (16 bits to support more than 256 virtual ports).
 	 * The event eth Tx adapter uses this field to specify the output port.
@@ -591,7 +596,11 @@ struct rte_mbuf {
 	/* second cache line - fields only used in slow path or on TX */
 	RTE_MARKER cacheline1 __rte_cache_min_aligned;
 
-	struct rte_mbuf *next;    /**< Next segment of scattered packet. */
+	/**
+	 * Next segment of scattered packet. Must be NULL in the last segment or
+	 * in case of non-segmented packet.
+	 */
+	struct rte_mbuf *next;
 
 	/* fields to support TX offloads */
 	RTE_STD_C11
-- 
2.29.2


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dpdk-dev] [dpdk-stable] [PATCH v4] mbuf: fix reset on mbuf free
  2021-01-13 13:27 ` [dpdk-dev] [PATCH v4] " Olivier Matz
@ 2021-01-15 13:59   ` David Marchand
  2021-01-15 18:39     ` Ali Alnubani
  2021-06-15 13:56   ` [dpdk-dev] " Morten Brørup
  2021-09-29 21:37   ` [dpdk-dev] [PATCH v5] " Olivier Matz
  2 siblings, 1 reply; 74+ messages in thread
From: David Marchand @ 2021-01-15 13:59 UTC (permalink / raw)
  To: Olivier Matz, Ali Alnubani
  Cc: dev, Andrew Rybchenko, Ananyev, Konstantin, Morten Brørup,
	ajitkhaparde, dpdk stable, Ajit Khaparde

On Wed, Jan 13, 2021 at 2:28 PM Olivier Matz <olivier.matz@6wind.com> wrote:
>
> m->nb_seg must be reset on mbuf free whatever the value of m->next,
> because it can happen that m->nb_seg is != 1. For instance in this
> case:
>
>   m1 = rte_pktmbuf_alloc(mp);
>   rte_pktmbuf_append(m1, 500);
>   m2 = rte_pktmbuf_alloc(mp);
>   rte_pktmbuf_append(m2, 500);
>   rte_pktmbuf_chain(m1, m2);
>   m0 = rte_pktmbuf_alloc(mp);
>   rte_pktmbuf_append(m0, 500);
>   rte_pktmbuf_chain(m0, m1);
>
> As rte_pktmbuf_chain() does not reset nb_seg in the initial m1
> segment (this is not required), after this code the mbuf chain
> have 3 segments:
>   - m0: next=m1, nb_seg=3
>   - m1: next=m2, nb_seg=2
>   - m2: next=NULL, nb_seg=1
>
> Then split this chain between m1 and m2, it would result in 2 packets:
>   - first packet
>     - m0: next=m1, nb_seg=2
>     - m1: next=NULL, nb_seg=2
>   - second packet
>     - m2: next=NULL, nb_seg=1
>
> Freeing the first packet will not restore nb_seg=1 in the second
> segment. This is an issue because it is expected that mbufs stored
> in pool have their nb_seg field set to 1.
>
> Fixes: 8f094a9ac5d7 ("mbuf: set mbuf fields while in pool")
> Cc: stable@dpdk.org
>
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> Acked-by: Morten Brørup <mb@smartsharesystems.com>
> Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> ---
>
> v4
> * add a unit test (suggested by David)

Olivier,

Thanks, for adding it.


Ali,

You reported some performance regression, did you confirm it?
If I get no reply by monday, I'll proceed with this patch.


Thanks.
-- 
David Marchand


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dpdk-dev] [dpdk-stable] [PATCH v4] mbuf: fix reset on mbuf free
  2021-01-15 13:59   ` [dpdk-dev] [dpdk-stable] " David Marchand
@ 2021-01-15 18:39     ` Ali Alnubani
  2021-01-18 17:52       ` Ali Alnubani
  2021-01-21  9:19       ` Ferruh Yigit
  0 siblings, 2 replies; 74+ messages in thread
From: Ali Alnubani @ 2021-01-15 18:39 UTC (permalink / raw)
  To: David Marchand, Olivier Matz, Ferruh Yigit, zhaoyan.chen
  Cc: dev, Andrew Rybchenko, Ananyev, Konstantin, Morten Brørup,
	ajitkhaparde, dpdk stable, Ajit Khaparde

Hi,
Adding Ferruh and Zhaoyan,

> Ali,
> 
> You reported some performance regression, did you confirm it?
> If I get no reply by monday, I'll proceed with this patch.

Sure I'll confirm by Monday.

Doesn't the regression also reproduce on the Lab's Intel servers?
Even though the check iol-intel-Performance isn't failing, I can see that the throughput differences from expected for this patch are less than those of another patch that was tested only 20 minutes earlier. Both patches were applied to the same tree:

https://mails.dpdk.org/archives/test-report/2021-January/173927.html
> | 64         | 512     | 1.571                               |

https://mails.dpdk.org/archives/test-report/2021-January/173919.html
> | 64         | 512     | 2.698                               |

Assuming that pw86457 doesn't have an effect on this test, it looks to me that this patch caused a regression in Intel hardware as well.

Can someone update the baseline's expected values for the Intel NICs and rerun the test on this patch?

Thanks,
Ali

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dpdk-dev] [dpdk-stable] [PATCH v4] mbuf: fix reset on mbuf free
  2021-01-15 18:39     ` Ali Alnubani
@ 2021-01-18 17:52       ` Ali Alnubani
  2021-01-19  8:32         ` Olivier Matz
  2021-01-21  9:19       ` Ferruh Yigit
  1 sibling, 1 reply; 74+ messages in thread
From: Ali Alnubani @ 2021-01-18 17:52 UTC (permalink / raw)
  To: David Marchand, Olivier Matz, Ferruh Yigit, zhaoyan.chen
  Cc: dev, Andrew Rybchenko, Ananyev, Konstantin, Morten Brørup,
	ajitkhaparde, dpdk stable, Ajit Khaparde, Slava Ovsiienko,
	Alexander Kozyrev

Hi,
(Sorry had to resend this to some recipients due to mail server problems).

Just confirming that I can still reproduce the regression with single core and 64B frames on other servers.

- Ali

> -----Original Message-----
> From: Ali Alnubani <alialnu@nvidia.com>
> Sent: Friday, January 15, 2021 8:39 PM
> To: David Marchand <david.marchand@redhat.com>; Olivier Matz
> <olivier.matz@6wind.com>; Ferruh Yigit <ferruh.yigit@intel.com>;
> zhaoyan.chen@intel.com
> Cc: dev <dev@dpdk.org>; Andrew Rybchenko
> <andrew.rybchenko@oktetlabs.ru>; Ananyev, Konstantin
> <konstantin.ananyev@intel.com>; Morten Brørup
> <mb@smartsharesystems.com>; ajitkhaparde@gmail.com; dpdk stable
> <stable@dpdk.org>; Ajit Khaparde <ajit.khaparde@broadcom.com>
> Subject: Re: [dpdk-stable] [PATCH v4] mbuf: fix reset on mbuf free
> 
> Hi,
> Adding Ferruh and Zhaoyan,
> 
> > Ali,
> >
> > You reported some performance regression, did you confirm it?
> > If I get no reply by monday, I'll proceed with this patch.
> 
> Sure I'll confirm by Monday.
> 
> Doesn't the regression also reproduce on the Lab's Intel servers?
> Even though the check iol-intel-Performance isn't failing, I can see that the
> throughput differences from expected for this patch are less than those of
> another patch that was tested only 20 minutes earlier. Both patches were
> applied to the same tree:
> 
> https://mails.dpdk.org/archives/test-report/2021-January/173927.html
> > | 64         | 512     | 1.571                               |
> 
> https://mails.dpdk.org/archives/test-report/2021-January/173919.html
> > | 64         | 512     | 2.698                               |
> 
> Assuming that pw86457 doesn't have an effect on this test, it looks to me
> that this patch caused a regression in Intel hardware as well.
> 
> Can someone update the baseline's expected values for the Intel NICs and
> rerun the test on this patch?
> 
> Thanks,
> Ali

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dpdk-dev] [dpdk-stable] [PATCH v4] mbuf: fix reset on mbuf free
  2021-01-18 17:52       ` Ali Alnubani
@ 2021-01-19  8:32         ` Olivier Matz
  2021-01-19  8:53           ` Morten Brørup
  2021-01-19 14:04           ` Slava Ovsiienko
  0 siblings, 2 replies; 74+ messages in thread
From: Olivier Matz @ 2021-01-19  8:32 UTC (permalink / raw)
  To: Ali Alnubani
  Cc: David Marchand, Ferruh Yigit, zhaoyan.chen, dev,
	Andrew Rybchenko, Ananyev, Konstantin, Morten Brørup,
	ajitkhaparde, dpdk stable, Ajit Khaparde, Slava Ovsiienko,
	Alexander Kozyrev

Hi Ali,


On Mon, Jan 18, 2021 at 05:52:32PM +0000, Ali Alnubani wrote:
> Hi,
> (Sorry had to resend this to some recipients due to mail server problems).
> 
> Just confirming that I can still reproduce the regression with single core and 64B frames on other servers.

Many thanks for the feedback. Can you please detail what is the amount
of performance loss in percent, and confirm the test case? (I suppose it
is testpmd io forward).

Unfortunatly, I won't be able to spend a lot of time on this soon (sorry
for that). So I see at least these 2 options:

- postpone the patch again, until I can find more time to analyze
  and optimize
- apply the patch if the performance loss is acceptable compared to
  the added value of fixing a bug

Regards,
Olivier


> 
> - Ali
> 
> > -----Original Message-----
> > From: Ali Alnubani <alialnu@nvidia.com>
> > Sent: Friday, January 15, 2021 8:39 PM
> > To: David Marchand <david.marchand@redhat.com>; Olivier Matz
> > <olivier.matz@6wind.com>; Ferruh Yigit <ferruh.yigit@intel.com>;
> > zhaoyan.chen@intel.com
> > Cc: dev <dev@dpdk.org>; Andrew Rybchenko
> > <andrew.rybchenko@oktetlabs.ru>; Ananyev, Konstantin
> > <konstantin.ananyev@intel.com>; Morten Brørup
> > <mb@smartsharesystems.com>; ajitkhaparde@gmail.com; dpdk stable
> > <stable@dpdk.org>; Ajit Khaparde <ajit.khaparde@broadcom.com>
> > Subject: Re: [dpdk-stable] [PATCH v4] mbuf: fix reset on mbuf free
> > 
> > Hi,
> > Adding Ferruh and Zhaoyan,
> > 
> > > Ali,
> > >
> > > You reported some performance regression, did you confirm it?
> > > If I get no reply by monday, I'll proceed with this patch.
> > 
> > Sure I'll confirm by Monday.
> > 
> > Doesn't the regression also reproduce on the Lab's Intel servers?
> > Even though the check iol-intel-Performance isn't failing, I can see that the
> > throughput differences from expected for this patch are less than those of
> > another patch that was tested only 20 minutes earlier. Both patches were
> > applied to the same tree:
> > 
> > https://mails.dpdk.org/archives/test-report/2021-January/173927.html
> > > | 64         | 512     | 1.571                               |
> > 
> > https://mails.dpdk.org/archives/test-report/2021-January/173919.html
> > > | 64         | 512     | 2.698                               |
> > 
> > Assuming that pw86457 doesn't have an effect on this test, it looks to me
> > that this patch caused a regression in Intel hardware as well.
> > 
> > Can someone update the baseline's expected values for the Intel NICs and
> > rerun the test on this patch?
> > 
> > Thanks,
> > Ali

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dpdk-dev] [dpdk-stable] [PATCH v4] mbuf: fix reset on mbuf free
  2021-01-19  8:32         ` Olivier Matz
@ 2021-01-19  8:53           ` Morten Brørup
  2021-01-19 12:00             ` Ferruh Yigit
  2021-01-19 14:04           ` Slava Ovsiienko
  1 sibling, 1 reply; 74+ messages in thread
From: Morten Brørup @ 2021-01-19  8:53 UTC (permalink / raw)
  To: Olivier Matz, Ali Alnubani
  Cc: David Marchand, Ferruh Yigit, zhaoyan.chen, dev,
	Andrew Rybchenko, Ananyev, Konstantin, ajitkhaparde, dpdk stable,
	Ajit Khaparde, Slava Ovsiienko, Alexander Kozyrev,
	Bruce Richardson

Could someone at Intel please update the test script to provide output according to the test plan? Or delegate to the right person.

According to the test plan, the information requested by Olivier should be in the test output already:
http://git.dpdk.org/tools/dts/tree/test_plans/nic_single_core_perf_test_plan.rst?h=next

PS: I can't find out who is the maintainer of the test plan, so I'm randomly pointing my finger at the test plan doc copyright holder. :-)


Med venlig hilsen / kind regards
- Morten Brørup

> -----Original Message-----
> From: Olivier Matz [mailto:olivier.matz@6wind.com]
> Sent: Tuesday, January 19, 2021 9:32 AM
> To: Ali Alnubani
> Cc: David Marchand; Ferruh Yigit; zhaoyan.chen@intel.com; dev; Andrew
> Rybchenko; Ananyev, Konstantin; Morten Brørup; ajitkhaparde@gmail.com;
> dpdk stable; Ajit Khaparde; Slava Ovsiienko; Alexander Kozyrev
> Subject: Re: [dpdk-stable] [PATCH v4] mbuf: fix reset on mbuf free
> 
> Hi Ali,
> 
> 
> On Mon, Jan 18, 2021 at 05:52:32PM +0000, Ali Alnubani wrote:
> > Hi,
> > (Sorry had to resend this to some recipients due to mail server
> problems).
> >
> > Just confirming that I can still reproduce the regression with single
> core and 64B frames on other servers.
> 
> Many thanks for the feedback. Can you please detail what is the amount
> of performance loss in percent, and confirm the test case? (I suppose
> it
> is testpmd io forward).
> 
> Unfortunatly, I won't be able to spend a lot of time on this soon
> (sorry
> for that). So I see at least these 2 options:
> 
> - postpone the patch again, until I can find more time to analyze
>   and optimize
> - apply the patch if the performance loss is acceptable compared to
>   the added value of fixing a bug
> 
> Regards,
> Olivier
> 
> 
> >
> > - Ali
> >
> > > -----Original Message-----
> > > From: Ali Alnubani <alialnu@nvidia.com>
> > > Sent: Friday, January 15, 2021 8:39 PM
> > > To: David Marchand <david.marchand@redhat.com>; Olivier Matz
> > > <olivier.matz@6wind.com>; Ferruh Yigit <ferruh.yigit@intel.com>;
> > > zhaoyan.chen@intel.com
> > > Cc: dev <dev@dpdk.org>; Andrew Rybchenko
> > > <andrew.rybchenko@oktetlabs.ru>; Ananyev, Konstantin
> > > <konstantin.ananyev@intel.com>; Morten Brørup
> > > <mb@smartsharesystems.com>; ajitkhaparde@gmail.com; dpdk stable
> > > <stable@dpdk.org>; Ajit Khaparde <ajit.khaparde@broadcom.com>
> > > Subject: Re: [dpdk-stable] [PATCH v4] mbuf: fix reset on mbuf free
> > >
> > > Hi,
> > > Adding Ferruh and Zhaoyan,
> > >
> > > > Ali,
> > > >
> > > > You reported some performance regression, did you confirm it?
> > > > If I get no reply by monday, I'll proceed with this patch.
> > >
> > > Sure I'll confirm by Monday.
> > >
> > > Doesn't the regression also reproduce on the Lab's Intel servers?
> > > Even though the check iol-intel-Performance isn't failing, I can
> see that the
> > > throughput differences from expected for this patch are less than
> those of
> > > another patch that was tested only 20 minutes earlier. Both patches
> were
> > > applied to the same tree:
> > >
> > > https://mails.dpdk.org/archives/test-report/2021-
> January/173927.html
> > > > | 64         | 512     | 1.571                               |
> > >
> > > https://mails.dpdk.org/archives/test-report/2021-
> January/173919.html
> > > > | 64         | 512     | 2.698                               |
> > >
> > > Assuming that pw86457 doesn't have an effect on this test, it looks
> to me
> > > that this patch caused a regression in Intel hardware as well.
> > >
> > > Can someone update the baseline's expected values for the Intel
> NICs and
> > > rerun the test on this patch?
> > >
> > > Thanks,
> > > Ali


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dpdk-dev] [dpdk-stable] [PATCH v4] mbuf: fix reset on mbuf free
  2021-01-19  8:53           ` Morten Brørup
@ 2021-01-19 12:00             ` Ferruh Yigit
  2021-01-19 12:27               ` Morten Brørup
  0 siblings, 1 reply; 74+ messages in thread
From: Ferruh Yigit @ 2021-01-19 12:00 UTC (permalink / raw)
  To: Morten Brørup, Olivier Matz, Ali Alnubani
  Cc: David Marchand, zhaoyan.chen, dev, Andrew Rybchenko, Ananyev,
	Konstantin, ajitkhaparde, dpdk stable, Ajit Khaparde,
	Slava Ovsiienko, Alexander Kozyrev, Bruce Richardson

On 1/19/2021 8:53 AM, Morten Brørup wrote:
> Could someone at Intel please update the test script to provide output according to the test plan? Or delegate to the right person.
> 
> According to the test plan, the information requested by Olivier should be in the test output already:
> http://git.dpdk.org/tools/dts/tree/test_plans/nic_single_core_perf_test_plan.rst?h=next
> 
> PS: I can't find out who is the maintainer of the test plan, so I'm randomly pointing my finger at the test plan doc copyright holder. :-)
> 

Hi Morten,

Ali has a request to update the expected baseline, to be able to detect the 
performance drops, let me internally figure out who can do this.

And do you have any other request, or asking same thing?

> 
> Med venlig hilsen / kind regards
> - Morten Brørup
> 
>> -----Original Message-----
>> From: Olivier Matz [mailto:olivier.matz@6wind.com]
>> Sent: Tuesday, January 19, 2021 9:32 AM
>> To: Ali Alnubani
>> Cc: David Marchand; Ferruh Yigit; zhaoyan.chen@intel.com; dev; Andrew
>> Rybchenko; Ananyev, Konstantin; Morten Brørup; ajitkhaparde@gmail.com;
>> dpdk stable; Ajit Khaparde; Slava Ovsiienko; Alexander Kozyrev
>> Subject: Re: [dpdk-stable] [PATCH v4] mbuf: fix reset on mbuf free
>>
>> Hi Ali,
>>
>>
>> On Mon, Jan 18, 2021 at 05:52:32PM +0000, Ali Alnubani wrote:
>>> Hi,
>>> (Sorry had to resend this to some recipients due to mail server
>> problems).
>>>
>>> Just confirming that I can still reproduce the regression with single
>> core and 64B frames on other servers.
>>
>> Many thanks for the feedback. Can you please detail what is the amount
>> of performance loss in percent, and confirm the test case? (I suppose
>> it
>> is testpmd io forward).
>>
>> Unfortunatly, I won't be able to spend a lot of time on this soon
>> (sorry
>> for that). So I see at least these 2 options:
>>
>> - postpone the patch again, until I can find more time to analyze
>>    and optimize
>> - apply the patch if the performance loss is acceptable compared to
>>    the added value of fixing a bug
>>
>> Regards,
>> Olivier
>>
>>
>>>
>>> - Ali
>>>
>>>> -----Original Message-----
>>>> From: Ali Alnubani <alialnu@nvidia.com>
>>>> Sent: Friday, January 15, 2021 8:39 PM
>>>> To: David Marchand <david.marchand@redhat.com>; Olivier Matz
>>>> <olivier.matz@6wind.com>; Ferruh Yigit <ferruh.yigit@intel.com>;
>>>> zhaoyan.chen@intel.com
>>>> Cc: dev <dev@dpdk.org>; Andrew Rybchenko
>>>> <andrew.rybchenko@oktetlabs.ru>; Ananyev, Konstantin
>>>> <konstantin.ananyev@intel.com>; Morten Brørup
>>>> <mb@smartsharesystems.com>; ajitkhaparde@gmail.com; dpdk stable
>>>> <stable@dpdk.org>; Ajit Khaparde <ajit.khaparde@broadcom.com>
>>>> Subject: Re: [dpdk-stable] [PATCH v4] mbuf: fix reset on mbuf free
>>>>
>>>> Hi,
>>>> Adding Ferruh and Zhaoyan,
>>>>
>>>>> Ali,
>>>>>
>>>>> You reported some performance regression, did you confirm it?
>>>>> If I get no reply by monday, I'll proceed with this patch.
>>>>
>>>> Sure I'll confirm by Monday.
>>>>
>>>> Doesn't the regression also reproduce on the Lab's Intel servers?
>>>> Even though the check iol-intel-Performance isn't failing, I can
>> see that the
>>>> throughput differences from expected for this patch are less than
>> those of
>>>> another patch that was tested only 20 minutes earlier. Both patches
>> were
>>>> applied to the same tree:
>>>>
>>>> https://mails.dpdk.org/archives/test-report/2021-
>> January/173927.html
>>>>> | 64         | 512     | 1.571                               |
>>>>
>>>> https://mails.dpdk.org/archives/test-report/2021-
>> January/173919.html
>>>>> | 64         | 512     | 2.698                               |
>>>>
>>>> Assuming that pw86457 doesn't have an effect on this test, it looks
>> to me
>>>> that this patch caused a regression in Intel hardware as well.
>>>>
>>>> Can someone update the baseline's expected values for the Intel
>> NICs and
>>>> rerun the test on this patch?
>>>>
>>>> Thanks,
>>>> Ali
> 


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dpdk-dev] [dpdk-stable] [PATCH v4] mbuf: fix reset on mbuf free
  2021-01-19 12:00             ` Ferruh Yigit
@ 2021-01-19 12:27               ` Morten Brørup
  2021-01-19 14:03                 ` Ferruh Yigit
  0 siblings, 1 reply; 74+ messages in thread
From: Morten Brørup @ 2021-01-19 12:27 UTC (permalink / raw)
  To: Ferruh Yigit, Olivier Matz, Ali Alnubani
  Cc: David Marchand, zhaoyan.chen, dev, Andrew Rybchenko, Ananyev,
	Konstantin, ajitkhaparde, dpdk stable, Ajit Khaparde,
	Slava Ovsiienko, Alexander Kozyrev, Bruce Richardson

> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Ferruh Yigit
> Sent: Tuesday, January 19, 2021 1:01 PM
> 
> On 1/19/2021 8:53 AM, Morten Brørup wrote:
> > Could someone at Intel please update the test script to provide
> output according to the test plan? Or delegate to the right person.
> >
> > According to the test plan, the information requested by Olivier
> should be in the test output already:
> >
> http://git.dpdk.org/tools/dts/tree/test_plans/nic_single_core_perf_test
> _plan.rst?h=next
> >
> > PS: I can't find out who is the maintainer of the test plan, so I'm
> randomly pointing my finger at the test plan doc copyright holder. :-)
> >
> 
> Hi Morten,
> 
> Ali has a request to update the expected baseline, to be able to detect
> the
> performance drops, let me internally figure out who can do this.
> 
> And do you have any other request, or asking same thing?
> 

Hi Ferruh,

I am asking for something else:

The test script does not provide the output that its documentation says that it does.

Apparently, the test script for nic_single_core_perf produces an output table with these four columns (as seen at https://lab.dpdk.org/results/dashboard/patchsets/15142/#env-18):

   +--------+--------------------+-----------------------+------------------------------+
   | Result | frame_size (bytes) | txd/rxd (descriptors) | throughput Difference (Mpps) |
   +--------+--------------------+-----------------------+------------------------------+
   | PASS   | 64                 | 512                   | 1.57100                      |
   +--------+--------------------+-----------------------+------------------------------+
   | PASS   | 64                 | 2048                  | 1.87500                      |
   +--------+--------------------+-----------------------+------------------------------+

But the test plan documentation (at http://git.dpdk.org/tools/dts/tree/test_plans/nic_single_core_perf_test_plan.rst) says that this output should be produced:

   +------------+---------+-------------+---------+---------------------+
   | Frame Size | TXD/RXD |  Throughput |   Rate  | Expected Throughput |
   +------------+---------+-------------+---------+---------------------+
   |     64     |   512   | xxxxxx Mpps |   xxx % |     xxx    Mpps     |
   +------------+---------+-------------+---------+---------------------+
   |     64     |   2048  | xxxxxx Mpps |   xxx % |     xxx    Mpps     |
   +------------+---------+-------------+---------+---------------------+

Olivier and I am saying that only showing the Throughput Difference (Mpps) does not provide any perspective to the result.

I am requesting that the Expected Throughput (Mpps) should be shown in the result too, as documented in the test plan.

> >
> > Med venlig hilsen / kind regards
> > - Morten Brørup
> >
> >> -----Original Message-----
> >> From: Olivier Matz [mailto:olivier.matz@6wind.com]
> >> Sent: Tuesday, January 19, 2021 9:32 AM
> >> To: Ali Alnubani
> >> Cc: David Marchand; Ferruh Yigit; zhaoyan.chen@intel.com; dev;
> Andrew
> >> Rybchenko; Ananyev, Konstantin; Morten Brørup;
> ajitkhaparde@gmail.com;
> >> dpdk stable; Ajit Khaparde; Slava Ovsiienko; Alexander Kozyrev
> >> Subject: Re: [dpdk-stable] [PATCH v4] mbuf: fix reset on mbuf free
> >>
> >> Hi Ali,
> >>
> >>
> >> On Mon, Jan 18, 2021 at 05:52:32PM +0000, Ali Alnubani wrote:
> >>> Hi,
> >>> (Sorry had to resend this to some recipients due to mail server
> >> problems).
> >>>
> >>> Just confirming that I can still reproduce the regression with
> single
> >> core and 64B frames on other servers.
> >>
> >> Many thanks for the feedback. Can you please detail what is the
> amount
> >> of performance loss in percent, and confirm the test case? (I
> suppose
> >> it
> >> is testpmd io forward).
> >>
> >> Unfortunatly, I won't be able to spend a lot of time on this soon
> >> (sorry
> >> for that). So I see at least these 2 options:
> >>
> >> - postpone the patch again, until I can find more time to analyze
> >>    and optimize
> >> - apply the patch if the performance loss is acceptable compared to
> >>    the added value of fixing a bug
> >>
> >> Regards,
> >> Olivier
> >>
> >>
> >>>
> >>> - Ali
> >>>
> >>>> -----Original Message-----
> >>>> From: Ali Alnubani <alialnu@nvidia.com>
> >>>> Sent: Friday, January 15, 2021 8:39 PM
> >>>> To: David Marchand <david.marchand@redhat.com>; Olivier Matz
> >>>> <olivier.matz@6wind.com>; Ferruh Yigit <ferruh.yigit@intel.com>;
> >>>> zhaoyan.chen@intel.com
> >>>> Cc: dev <dev@dpdk.org>; Andrew Rybchenko
> >>>> <andrew.rybchenko@oktetlabs.ru>; Ananyev, Konstantin
> >>>> <konstantin.ananyev@intel.com>; Morten Brørup
> >>>> <mb@smartsharesystems.com>; ajitkhaparde@gmail.com; dpdk stable
> >>>> <stable@dpdk.org>; Ajit Khaparde <ajit.khaparde@broadcom.com>
> >>>> Subject: Re: [dpdk-stable] [PATCH v4] mbuf: fix reset on mbuf free
> >>>>
> >>>> Hi,
> >>>> Adding Ferruh and Zhaoyan,
> >>>>
> >>>>> Ali,
> >>>>>
> >>>>> You reported some performance regression, did you confirm it?
> >>>>> If I get no reply by monday, I'll proceed with this patch.
> >>>>
> >>>> Sure I'll confirm by Monday.
> >>>>
> >>>> Doesn't the regression also reproduce on the Lab's Intel servers?
> >>>> Even though the check iol-intel-Performance isn't failing, I can
> >> see that the
> >>>> throughput differences from expected for this patch are less than
> >> those of
> >>>> another patch that was tested only 20 minutes earlier. Both
> patches
> >> were
> >>>> applied to the same tree:
> >>>>
> >>>> https://mails.dpdk.org/archives/test-report/2021-
> >> January/173927.html
> >>>>> | 64         | 512     | 1.571                               |
> >>>>
> >>>> https://mails.dpdk.org/archives/test-report/2021-
> >> January/173919.html
> >>>>> | 64         | 512     | 2.698                               |
> >>>>
> >>>> Assuming that pw86457 doesn't have an effect on this test, it
> looks
> >> to me
> >>>> that this patch caused a regression in Intel hardware as well.
> >>>>
> >>>> Can someone update the baseline's expected values for the Intel
> >> NICs and
> >>>> rerun the test on this patch?
> >>>>
> >>>> Thanks,
> >>>> Ali
> >
> 


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dpdk-dev] [dpdk-stable] [PATCH v4] mbuf: fix reset on mbuf free
  2021-01-19 12:27               ` Morten Brørup
@ 2021-01-19 14:03                 ` Ferruh Yigit
  2021-01-19 14:21                   ` Morten Brørup
  0 siblings, 1 reply; 74+ messages in thread
From: Ferruh Yigit @ 2021-01-19 14:03 UTC (permalink / raw)
  To: Morten Brørup, Olivier Matz, Ali Alnubani
  Cc: David Marchand, zhaoyan.chen, dev, Andrew Rybchenko, Ananyev,
	Konstantin, ajitkhaparde, dpdk stable, Ajit Khaparde,
	Slava Ovsiienko, Alexander Kozyrev, Bruce Richardson

On 1/19/2021 12:27 PM, Morten Brørup wrote:
>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Ferruh Yigit
>> Sent: Tuesday, January 19, 2021 1:01 PM
>>
>> On 1/19/2021 8:53 AM, Morten Brørup wrote:
>>> Could someone at Intel please update the test script to provide
>> output according to the test plan? Or delegate to the right person.
>>>
>>> According to the test plan, the information requested by Olivier
>> should be in the test output already:
>>>
>> http://git.dpdk.org/tools/dts/tree/test_plans/nic_single_core_perf_test
>> _plan.rst?h=next
>>>
>>> PS: I can't find out who is the maintainer of the test plan, so I'm
>> randomly pointing my finger at the test plan doc copyright holder. :-)
>>>
>>
>> Hi Morten,
>>
>> Ali has a request to update the expected baseline, to be able to detect
>> the
>> performance drops, let me internally figure out who can do this.
>>
>> And do you have any other request, or asking same thing?
>>
> 
> Hi Ferruh,
> 
> I am asking for something else:
> 
> The test script does not provide the output that its documentation says that it does.
> 
> Apparently, the test script for nic_single_core_perf produces an output table with these four columns (as seen at https://lab.dpdk.org/results/dashboard/patchsets/15142/#env-18):
> 
>     +--------+--------------------+-----------------------+------------------------------+
>     | Result | frame_size (bytes) | txd/rxd (descriptors) | throughput Difference (Mpps) |
>     +--------+--------------------+-----------------------+------------------------------+
>     | PASS   | 64                 | 512                   | 1.57100                      |
>     +--------+--------------------+-----------------------+------------------------------+
>     | PASS   | 64                 | 2048                  | 1.87500                      |
>     +--------+--------------------+-----------------------+------------------------------+
> 
> But the test plan documentation (at http://git.dpdk.org/tools/dts/tree/test_plans/nic_single_core_perf_test_plan.rst) says that this output should be produced:
> 
>     +------------+---------+-------------+---------+---------------------+
>     | Frame Size | TXD/RXD |  Throughput |   Rate  | Expected Throughput |
>     +------------+---------+-------------+---------+---------------------+
>     |     64     |   512   | xxxxxx Mpps |   xxx % |     xxx    Mpps     |
>     +------------+---------+-------------+---------+---------------------+
>     |     64     |   2048  | xxxxxx Mpps |   xxx % |     xxx    Mpps     |
>     +------------+---------+-------------+---------+---------------------+
> 
> Olivier and I am saying that only showing the Throughput Difference (Mpps) does not provide any perspective to the result.
> 
> I am requesting that the Expected Throughput (Mpps) should be shown in the result too, as documented in the test plan.
> 

Ahh, this has a history, when the initial community lab infrastructure prepared 
some vendor(s) didn't want to show the actual throughput numbers.

That is why this diff and baseline introduced, and this is the how current 
infrastructure works. So this is not something related to Intel.

And as you can imagine this is not a technical issue, some companies seems not 
willing to share their performance numbers via community lab, and I don't know 
if something changed here in last a few years.


>>>
>>> Med venlig hilsen / kind regards
>>> - Morten Brørup
>>>
>>>> -----Original Message-----
>>>> From: Olivier Matz [mailto:olivier.matz@6wind.com]
>>>> Sent: Tuesday, January 19, 2021 9:32 AM
>>>> To: Ali Alnubani
>>>> Cc: David Marchand; Ferruh Yigit; zhaoyan.chen@intel.com; dev;
>> Andrew
>>>> Rybchenko; Ananyev, Konstantin; Morten Brørup;
>> ajitkhaparde@gmail.com;
>>>> dpdk stable; Ajit Khaparde; Slava Ovsiienko; Alexander Kozyrev
>>>> Subject: Re: [dpdk-stable] [PATCH v4] mbuf: fix reset on mbuf free
>>>>
>>>> Hi Ali,
>>>>
>>>>
>>>> On Mon, Jan 18, 2021 at 05:52:32PM +0000, Ali Alnubani wrote:
>>>>> Hi,
>>>>> (Sorry had to resend this to some recipients due to mail server
>>>> problems).
>>>>>
>>>>> Just confirming that I can still reproduce the regression with
>> single
>>>> core and 64B frames on other servers.
>>>>
>>>> Many thanks for the feedback. Can you please detail what is the
>> amount
>>>> of performance loss in percent, and confirm the test case? (I
>> suppose
>>>> it
>>>> is testpmd io forward).
>>>>
>>>> Unfortunatly, I won't be able to spend a lot of time on this soon
>>>> (sorry
>>>> for that). So I see at least these 2 options:
>>>>
>>>> - postpone the patch again, until I can find more time to analyze
>>>>     and optimize
>>>> - apply the patch if the performance loss is acceptable compared to
>>>>     the added value of fixing a bug
>>>>
>>>> Regards,
>>>> Olivier
>>>>
>>>>
>>>>>
>>>>> - Ali
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Ali Alnubani <alialnu@nvidia.com>
>>>>>> Sent: Friday, January 15, 2021 8:39 PM
>>>>>> To: David Marchand <david.marchand@redhat.com>; Olivier Matz
>>>>>> <olivier.matz@6wind.com>; Ferruh Yigit <ferruh.yigit@intel.com>;
>>>>>> zhaoyan.chen@intel.com
>>>>>> Cc: dev <dev@dpdk.org>; Andrew Rybchenko
>>>>>> <andrew.rybchenko@oktetlabs.ru>; Ananyev, Konstantin
>>>>>> <konstantin.ananyev@intel.com>; Morten Brørup
>>>>>> <mb@smartsharesystems.com>; ajitkhaparde@gmail.com; dpdk stable
>>>>>> <stable@dpdk.org>; Ajit Khaparde <ajit.khaparde@broadcom.com>
>>>>>> Subject: Re: [dpdk-stable] [PATCH v4] mbuf: fix reset on mbuf free
>>>>>>
>>>>>> Hi,
>>>>>> Adding Ferruh and Zhaoyan,
>>>>>>
>>>>>>> Ali,
>>>>>>>
>>>>>>> You reported some performance regression, did you confirm it?
>>>>>>> If I get no reply by monday, I'll proceed with this patch.
>>>>>>
>>>>>> Sure I'll confirm by Monday.
>>>>>>
>>>>>> Doesn't the regression also reproduce on the Lab's Intel servers?
>>>>>> Even though the check iol-intel-Performance isn't failing, I can
>>>> see that the
>>>>>> throughput differences from expected for this patch are less than
>>>> those of
>>>>>> another patch that was tested only 20 minutes earlier. Both
>> patches
>>>> were
>>>>>> applied to the same tree:
>>>>>>
>>>>>> https://mails.dpdk.org/archives/test-report/2021-
>>>> January/173927.html
>>>>>>> | 64         | 512     | 1.571                               |
>>>>>>
>>>>>> https://mails.dpdk.org/archives/test-report/2021-
>>>> January/173919.html
>>>>>>> | 64         | 512     | 2.698                               |
>>>>>>
>>>>>> Assuming that pw86457 doesn't have an effect on this test, it
>> looks
>>>> to me
>>>>>> that this patch caused a regression in Intel hardware as well.
>>>>>>
>>>>>> Can someone update the baseline's expected values for the Intel
>>>> NICs and
>>>>>> rerun the test on this patch?
>>>>>>
>>>>>> Thanks,
>>>>>> Ali
>>>
>>
> 


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dpdk-dev] [dpdk-stable] [PATCH v4] mbuf: fix reset on mbuf free
  2021-01-19  8:32         ` Olivier Matz
  2021-01-19  8:53           ` Morten Brørup
@ 2021-01-19 14:04           ` Slava Ovsiienko
  2021-07-24  8:47             ` Thomas Monjalon
  1 sibling, 1 reply; 74+ messages in thread
From: Slava Ovsiienko @ 2021-01-19 14:04 UTC (permalink / raw)
  To: Olivier Matz, Ali Alnubani
  Cc: David Marchand, Ferruh Yigit, zhaoyan.chen, dev,
	Andrew Rybchenko, Ananyev, Konstantin, Morten Brørup,
	ajitkhaparde, dpdk stable, Ajit Khaparde, Alexander Kozyrev

Hi, All

Could we postpose this patch at least to rc2? We would like to conduct more investigations?

With best regards, Slava

> -----Original Message-----
> From: Olivier Matz <olivier.matz@6wind.com>
> Sent: Tuesday, January 19, 2021 10:32
> To: Ali Alnubani <alialnu@nvidia.com>
> Cc: David Marchand <david.marchand@redhat.com>; Ferruh Yigit
> <ferruh.yigit@intel.com>; zhaoyan.chen@intel.com; dev <dev@dpdk.org>;
> Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>; Ananyev, Konstantin
> <konstantin.ananyev@intel.com>; Morten Brørup
> <mb@smartsharesystems.com>; ajitkhaparde@gmail.com; dpdk stable
> <stable@dpdk.org>; Ajit Khaparde <ajit.khaparde@broadcom.com>; Slava
> Ovsiienko <viacheslavo@nvidia.com>; Alexander Kozyrev
> <akozyrev@nvidia.com>
> Subject: Re: [dpdk-stable] [PATCH v4] mbuf: fix reset on mbuf free
> 
> Hi Ali,
> 
> 
> On Mon, Jan 18, 2021 at 05:52:32PM +0000, Ali Alnubani wrote:
> > Hi,
> > (Sorry had to resend this to some recipients due to mail server problems).
> >
> > Just confirming that I can still reproduce the regression with single core and
> 64B frames on other servers.
> 
> Many thanks for the feedback. Can you please detail what is the amount of
> performance loss in percent, and confirm the test case? (I suppose it is
> testpmd io forward).
> 
> Unfortunatly, I won't be able to spend a lot of time on this soon (sorry for
> that). So I see at least these 2 options:
> 
> - postpone the patch again, until I can find more time to analyze
>   and optimize
> - apply the patch if the performance loss is acceptable compared to
>   the added value of fixing a bug
> 
> Regards,
> Olivier
> 
> 
> >
> > - Ali
> >
> > > -----Original Message-----
> > > From: Ali Alnubani <alialnu@nvidia.com>
> > > Sent: Friday, January 15, 2021 8:39 PM
> > > To: David Marchand <david.marchand@redhat.com>; Olivier Matz
> > > <olivier.matz@6wind.com>; Ferruh Yigit <ferruh.yigit@intel.com>;
> > > zhaoyan.chen@intel.com
> > > Cc: dev <dev@dpdk.org>; Andrew Rybchenko
> > > <andrew.rybchenko@oktetlabs.ru>; Ananyev, Konstantin
> > > <konstantin.ananyev@intel.com>; Morten Brørup
> > > <mb@smartsharesystems.com>; ajitkhaparde@gmail.com; dpdk stable
> > > <stable@dpdk.org>; Ajit Khaparde <ajit.khaparde@broadcom.com>
> > > Subject: Re: [dpdk-stable] [PATCH v4] mbuf: fix reset on mbuf free
> > >
> > > Hi,
> > > Adding Ferruh and Zhaoyan,
> > >
> > > > Ali,
> > > >
> > > > You reported some performance regression, did you confirm it?
> > > > If I get no reply by monday, I'll proceed with this patch.
> > >
> > > Sure I'll confirm by Monday.
> > >
> > > Doesn't the regression also reproduce on the Lab's Intel servers?
> > > Even though the check iol-intel-Performance isn't failing, I can see
> > > that the throughput differences from expected for this patch are
> > > less than those of another patch that was tested only 20 minutes
> > > earlier. Both patches were applied to the same tree:
> > >
> > > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fma
> > > ils.dpdk.org%2Farchives%2Ftest-report%2F2021-
> January%2F173927.html&a
> > >
> mp;data=04%7C01%7Cviacheslavo%40nvidia.com%7Ce2d18a8563fb42dfaba40
> 8d
> > >
> 8bc54c83b%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C63746641
> 96385
> > >
> 80374%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2lu
> MzIiLC
> > >
> JBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=VGvj%2F5GcAOxof6C
> mlZkq
> > > KXKOL52GctXcuL5RJXr1y8g%3D&amp;reserved=0
> > > > | 64         | 512     | 1.571                               |
> > >
> > > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fma
> > > ils.dpdk.org%2Farchives%2Ftest-report%2F2021-
> January%2F173919.html&a
> > >
> mp;data=04%7C01%7Cviacheslavo%40nvidia.com%7Ce2d18a8563fb42dfaba40
> 8d
> > >
> 8bc54c83b%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C63746641
> 96385
> > >
> 80374%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2lu
> MzIiLC
> > >
> JBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=XSirRbm5G0WwfxySe
> b0ALp
> > > owVosqoY6Nlv4UZCd1CZM%3D&amp;reserved=0
> > > > | 64         | 512     | 2.698                               |
> > >
> > > Assuming that pw86457 doesn't have an effect on this test, it looks
> > > to me that this patch caused a regression in Intel hardware as well.
> > >
> > > Can someone update the baseline's expected values for the Intel NICs
> > > and rerun the test on this patch?
> > >
> > > Thanks,
> > > Ali

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dpdk-dev] [dpdk-stable] [PATCH v4] mbuf: fix reset on mbuf free
  2021-01-19 14:03                 ` Ferruh Yigit
@ 2021-01-19 14:21                   ` Morten Brørup
  2021-01-21  9:15                     ` Ferruh Yigit
  0 siblings, 1 reply; 74+ messages in thread
From: Morten Brørup @ 2021-01-19 14:21 UTC (permalink / raw)
  To: Ferruh Yigit, Olivier Matz, Ali Alnubani
  Cc: David Marchand, zhaoyan.chen, dev, Andrew Rybchenko, Ananyev,
	Konstantin, ajitkhaparde, dpdk stable, Ajit Khaparde,
	Slava Ovsiienko, Alexander Kozyrev, Bruce Richardson

> From: Ferruh Yigit [mailto:ferruh.yigit@intel.com]
> Sent: Tuesday, January 19, 2021 3:03 PM
> 
> On 1/19/2021 12:27 PM, Morten Brørup wrote:
> >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Ferruh Yigit
> >> Sent: Tuesday, January 19, 2021 1:01 PM
> >>
> >> On 1/19/2021 8:53 AM, Morten Brørup wrote:
> >>> Could someone at Intel please update the test script to provide
> >> output according to the test plan? Or delegate to the right person.
> >>>
> >>> According to the test plan, the information requested by Olivier
> >> should be in the test output already:
> >>>
> >>
> http://git.dpdk.org/tools/dts/tree/test_plans/nic_single_core_perf_test
> >> _plan.rst?h=next
> >>>
> >>> PS: I can't find out who is the maintainer of the test plan, so I'm
> >> randomly pointing my finger at the test plan doc copyright holder.
> :-)
> >>>
> >>
> >> Hi Morten,
> >>
> >> Ali has a request to update the expected baseline, to be able to
> detect
> >> the
> >> performance drops, let me internally figure out who can do this.
> >>
> >> And do you have any other request, or asking same thing?
> >>
> >
> > Hi Ferruh,
> >
> > I am asking for something else:
> >
> > The test script does not provide the output that its documentation
> says that it does.
> >
> > Apparently, the test script for nic_single_core_perf produces an
> output table with these four columns (as seen at
> https://lab.dpdk.org/results/dashboard/patchsets/15142/#env-18):
> >
> >     +--------+--------------------+-----------------------+----------
> --------------------+
> >     | Result | frame_size (bytes) | txd/rxd (descriptors) |
> throughput Difference (Mpps) |
> >     +--------+--------------------+-----------------------+----------
> --------------------+
> >     | PASS   | 64                 | 512                   | 1.57100
> |
> >     +--------+--------------------+-----------------------+----------
> --------------------+
> >     | PASS   | 64                 | 2048                  | 1.87500
> |
> >     +--------+--------------------+-----------------------+----------
> --------------------+
> >
> > But the test plan documentation (at
> http://git.dpdk.org/tools/dts/tree/test_plans/nic_single_core_perf_test
> _plan.rst) says that this output should be produced:
> >
> >     +------------+---------+-------------+---------+-----------------
> ----+
> >     | Frame Size | TXD/RXD |  Throughput |   Rate  | Expected
> Throughput |
> >     +------------+---------+-------------+---------+-----------------
> ----+
> >     |     64     |   512   | xxxxxx Mpps |   xxx % |     xxx    Mpps
> |
> >     +------------+---------+-------------+---------+-----------------
> ----+
> >     |     64     |   2048  | xxxxxx Mpps |   xxx % |     xxx    Mpps
> |
> >     +------------+---------+-------------+---------+-----------------
> ----+
> >
> > Olivier and I am saying that only showing the Throughput Difference
> (Mpps) does not provide any perspective to the result.
> >
> > I am requesting that the Expected Throughput (Mpps) should be shown
> in the result too, as documented in the test plan.
> >
> 
> Ahh, this has a history, when the initial community lab infrastructure
> prepared
> some vendor(s) didn't want to show the actual throughput numbers.
> 
> That is why this diff and baseline introduced, and this is the how
> current
> infrastructure works. So this is not something related to Intel.
> 
> And as you can imagine this is not a technical issue, some companies
> seems not
> willing to share their performance numbers via community lab, and I
> don't know
> if something changed here in last a few years.
> 

That explains it!

If those companies still want to keep the community lab performance data hidden (which I don't object to), wouldn't it be better if the performance test scripts output the deviation from the expected throughput in percent (with one or two decimals after the comma) instead of in Mpps?

> 
> >>>
> >>> Med venlig hilsen / kind regards
> >>> - Morten Brørup
> >>>
> >>>> -----Original Message-----
> >>>> From: Olivier Matz [mailto:olivier.matz@6wind.com]
> >>>> Sent: Tuesday, January 19, 2021 9:32 AM
> >>>> To: Ali Alnubani
> >>>> Cc: David Marchand; Ferruh Yigit; zhaoyan.chen@intel.com; dev;
> >> Andrew
> >>>> Rybchenko; Ananyev, Konstantin; Morten Brørup;
> >> ajitkhaparde@gmail.com;
> >>>> dpdk stable; Ajit Khaparde; Slava Ovsiienko; Alexander Kozyrev
> >>>> Subject: Re: [dpdk-stable] [PATCH v4] mbuf: fix reset on mbuf free
> >>>>
> >>>> Hi Ali,
> >>>>
> >>>>
> >>>> On Mon, Jan 18, 2021 at 05:52:32PM +0000, Ali Alnubani wrote:
> >>>>> Hi,
> >>>>> (Sorry had to resend this to some recipients due to mail server
> >>>> problems).
> >>>>>
> >>>>> Just confirming that I can still reproduce the regression with
> >> single
> >>>> core and 64B frames on other servers.
> >>>>
> >>>> Many thanks for the feedback. Can you please detail what is the
> >> amount
> >>>> of performance loss in percent, and confirm the test case? (I
> >> suppose
> >>>> it
> >>>> is testpmd io forward).
> >>>>
> >>>> Unfortunatly, I won't be able to spend a lot of time on this soon
> >>>> (sorry
> >>>> for that). So I see at least these 2 options:
> >>>>
> >>>> - postpone the patch again, until I can find more time to analyze
> >>>>     and optimize
> >>>> - apply the patch if the performance loss is acceptable compared
> to
> >>>>     the added value of fixing a bug
> >>>>
> >>>> Regards,
> >>>> Olivier
> >>>>
> >>>>
> >>>>>
> >>>>> - Ali
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Ali Alnubani <alialnu@nvidia.com>
> >>>>>> Sent: Friday, January 15, 2021 8:39 PM
> >>>>>> To: David Marchand <david.marchand@redhat.com>; Olivier Matz
> >>>>>> <olivier.matz@6wind.com>; Ferruh Yigit <ferruh.yigit@intel.com>;
> >>>>>> zhaoyan.chen@intel.com
> >>>>>> Cc: dev <dev@dpdk.org>; Andrew Rybchenko
> >>>>>> <andrew.rybchenko@oktetlabs.ru>; Ananyev, Konstantin
> >>>>>> <konstantin.ananyev@intel.com>; Morten Brørup
> >>>>>> <mb@smartsharesystems.com>; ajitkhaparde@gmail.com; dpdk stable
> >>>>>> <stable@dpdk.org>; Ajit Khaparde <ajit.khaparde@broadcom.com>
> >>>>>> Subject: Re: [dpdk-stable] [PATCH v4] mbuf: fix reset on mbuf
> free
> >>>>>>
> >>>>>> Hi,
> >>>>>> Adding Ferruh and Zhaoyan,
> >>>>>>
> >>>>>>> Ali,
> >>>>>>>
> >>>>>>> You reported some performance regression, did you confirm it?
> >>>>>>> If I get no reply by monday, I'll proceed with this patch.
> >>>>>>
> >>>>>> Sure I'll confirm by Monday.
> >>>>>>
> >>>>>> Doesn't the regression also reproduce on the Lab's Intel
> servers?
> >>>>>> Even though the check iol-intel-Performance isn't failing, I can
> >>>> see that the
> >>>>>> throughput differences from expected for this patch are less
> than
> >>>> those of
> >>>>>> another patch that was tested only 20 minutes earlier. Both
> >> patches
> >>>> were
> >>>>>> applied to the same tree:
> >>>>>>
> >>>>>> https://mails.dpdk.org/archives/test-report/2021-
> >>>> January/173927.html
> >>>>>>> | 64         | 512     | 1.571                               |
> >>>>>>
> >>>>>> https://mails.dpdk.org/archives/test-report/2021-
> >>>> January/173919.html
> >>>>>>> | 64         | 512     | 2.698                               |
> >>>>>>
> >>>>>> Assuming that pw86457 doesn't have an effect on this test, it
> >> looks
> >>>> to me
> >>>>>> that this patch caused a regression in Intel hardware as well.
> >>>>>>
> >>>>>> Can someone update the baseline's expected values for the Intel
> >>>> NICs and
> >>>>>> rerun the test on this patch?
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Ali
> >>>
> >>
> >
> 


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dpdk-dev] [dpdk-stable] [PATCH v4] mbuf: fix reset on mbuf free
  2021-01-19 14:21                   ` Morten Brørup
@ 2021-01-21  9:15                     ` Ferruh Yigit
  0 siblings, 0 replies; 74+ messages in thread
From: Ferruh Yigit @ 2021-01-21  9:15 UTC (permalink / raw)
  To: Morten Brørup, Olivier Matz, Ali Alnubani
  Cc: David Marchand, zhaoyan.chen, dev, Andrew Rybchenko, Ananyev,
	Konstantin, ajitkhaparde, dpdk stable, Ajit Khaparde,
	Slava Ovsiienko, Alexander Kozyrev, Bruce Richardson, dpdklab

On 1/19/2021 2:21 PM, Morten Brørup wrote:
>> From: Ferruh Yigit [mailto:ferruh.yigit@intel.com]
>> Sent: Tuesday, January 19, 2021 3:03 PM
>>
>> On 1/19/2021 12:27 PM, Morten Brørup wrote:
>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Ferruh Yigit
>>>> Sent: Tuesday, January 19, 2021 1:01 PM
>>>>
>>>> On 1/19/2021 8:53 AM, Morten Brørup wrote:
>>>>> Could someone at Intel please update the test script to provide
>>>> output according to the test plan? Or delegate to the right person.
>>>>>
>>>>> According to the test plan, the information requested by Olivier
>>>> should be in the test output already:
>>>>>
>>>>
>> http://git.dpdk.org/tools/dts/tree/test_plans/nic_single_core_perf_test
>>>> _plan.rst?h=next
>>>>>
>>>>> PS: I can't find out who is the maintainer of the test plan, so I'm
>>>> randomly pointing my finger at the test plan doc copyright holder.
>> :-)
>>>>>
>>>>
>>>> Hi Morten,
>>>>
>>>> Ali has a request to update the expected baseline, to be able to
>> detect
>>>> the
>>>> performance drops, let me internally figure out who can do this.
>>>>
>>>> And do you have any other request, or asking same thing?
>>>>
>>>
>>> Hi Ferruh,
>>>
>>> I am asking for something else:
>>>
>>> The test script does not provide the output that its documentation
>> says that it does.
>>>
>>> Apparently, the test script for nic_single_core_perf produces an
>> output table with these four columns (as seen at
>> https://lab.dpdk.org/results/dashboard/patchsets/15142/#env-18):
>>>
>>>      +--------+--------------------+-----------------------+----------
>> --------------------+
>>>      | Result | frame_size (bytes) | txd/rxd (descriptors) |
>> throughput Difference (Mpps) |
>>>      +--------+--------------------+-----------------------+----------
>> --------------------+
>>>      | PASS   | 64                 | 512                   | 1.57100
>> |
>>>      +--------+--------------------+-----------------------+----------
>> --------------------+
>>>      | PASS   | 64                 | 2048                  | 1.87500
>> |
>>>      +--------+--------------------+-----------------------+----------
>> --------------------+
>>>
>>> But the test plan documentation (at
>> http://git.dpdk.org/tools/dts/tree/test_plans/nic_single_core_perf_test
>> _plan.rst) says that this output should be produced:
>>>
>>>      +------------+---------+-------------+---------+-----------------
>> ----+
>>>      | Frame Size | TXD/RXD |  Throughput |   Rate  | Expected
>> Throughput |
>>>      +------------+---------+-------------+---------+-----------------
>> ----+
>>>      |     64     |   512   | xxxxxx Mpps |   xxx % |     xxx    Mpps
>> |
>>>      +------------+---------+-------------+---------+-----------------
>> ----+
>>>      |     64     |   2048  | xxxxxx Mpps |   xxx % |     xxx    Mpps
>> |
>>>      +------------+---------+-------------+---------+-----------------
>> ----+
>>>
>>> Olivier and I am saying that only showing the Throughput Difference
>> (Mpps) does not provide any perspective to the result.
>>>
>>> I am requesting that the Expected Throughput (Mpps) should be shown
>> in the result too, as documented in the test plan.
>>>
>>
>> Ahh, this has a history, when the initial community lab infrastructure
>> prepared
>> some vendor(s) didn't want to show the actual throughput numbers.
>>
>> That is why this diff and baseline introduced, and this is the how
>> current
>> infrastructure works. So this is not something related to Intel.
>>
>> And as you can imagine this is not a technical issue, some companies
>> seems not
>> willing to share their performance numbers via community lab, and I
>> don't know
>> if something changed here in last a few years.
>>
> 
> That explains it!
> 
> If those companies still want to keep the community lab performance data hidden (which I don't object to), wouldn't it be better if the performance test scripts output the deviation from the expected throughput in percent (with one or two decimals after the comma) instead of in Mpps?
> 

Sounds reasonable, I assume there is a reason behind it but I don't remember, 
cc'ed lab.


>>
>>>>>
>>>>> Med venlig hilsen / kind regards
>>>>> - Morten Brørup
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Olivier Matz [mailto:olivier.matz@6wind.com]
>>>>>> Sent: Tuesday, January 19, 2021 9:32 AM
>>>>>> To: Ali Alnubani
>>>>>> Cc: David Marchand; Ferruh Yigit; zhaoyan.chen@intel.com; dev;
>>>> Andrew
>>>>>> Rybchenko; Ananyev, Konstantin; Morten Brørup;
>>>> ajitkhaparde@gmail.com;
>>>>>> dpdk stable; Ajit Khaparde; Slava Ovsiienko; Alexander Kozyrev
>>>>>> Subject: Re: [dpdk-stable] [PATCH v4] mbuf: fix reset on mbuf free
>>>>>>
>>>>>> Hi Ali,
>>>>>>
>>>>>>
>>>>>> On Mon, Jan 18, 2021 at 05:52:32PM +0000, Ali Alnubani wrote:
>>>>>>> Hi,
>>>>>>> (Sorry had to resend this to some recipients due to mail server
>>>>>> problems).
>>>>>>>
>>>>>>> Just confirming that I can still reproduce the regression with
>>>> single
>>>>>> core and 64B frames on other servers.
>>>>>>
>>>>>> Many thanks for the feedback. Can you please detail what is the
>>>> amount
>>>>>> of performance loss in percent, and confirm the test case? (I
>>>> suppose
>>>>>> it
>>>>>> is testpmd io forward).
>>>>>>
>>>>>> Unfortunatly, I won't be able to spend a lot of time on this soon
>>>>>> (sorry
>>>>>> for that). So I see at least these 2 options:
>>>>>>
>>>>>> - postpone the patch again, until I can find more time to analyze
>>>>>>      and optimize
>>>>>> - apply the patch if the performance loss is acceptable compared
>> to
>>>>>>      the added value of fixing a bug
>>>>>>
>>>>>> Regards,
>>>>>> Olivier
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> - Ali
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Ali Alnubani <alialnu@nvidia.com>
>>>>>>>> Sent: Friday, January 15, 2021 8:39 PM
>>>>>>>> To: David Marchand <david.marchand@redhat.com>; Olivier Matz
>>>>>>>> <olivier.matz@6wind.com>; Ferruh Yigit <ferruh.yigit@intel.com>;
>>>>>>>> zhaoyan.chen@intel.com
>>>>>>>> Cc: dev <dev@dpdk.org>; Andrew Rybchenko
>>>>>>>> <andrew.rybchenko@oktetlabs.ru>; Ananyev, Konstantin
>>>>>>>> <konstantin.ananyev@intel.com>; Morten Brørup
>>>>>>>> <mb@smartsharesystems.com>; ajitkhaparde@gmail.com; dpdk stable
>>>>>>>> <stable@dpdk.org>; Ajit Khaparde <ajit.khaparde@broadcom.com>
>>>>>>>> Subject: Re: [dpdk-stable] [PATCH v4] mbuf: fix reset on mbuf
>> free
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>> Adding Ferruh and Zhaoyan,
>>>>>>>>
>>>>>>>>> Ali,
>>>>>>>>>
>>>>>>>>> You reported some performance regression, did you confirm it?
>>>>>>>>> If I get no reply by monday, I'll proceed with this patch.
>>>>>>>>
>>>>>>>> Sure I'll confirm by Monday.
>>>>>>>>
>>>>>>>> Doesn't the regression also reproduce on the Lab's Intel
>> servers?
>>>>>>>> Even though the check iol-intel-Performance isn't failing, I can
>>>>>> see that the
>>>>>>>> throughput differences from expected for this patch are less
>> than
>>>>>> those of
>>>>>>>> another patch that was tested only 20 minutes earlier. Both
>>>> patches
>>>>>> were
>>>>>>>> applied to the same tree:
>>>>>>>>
>>>>>>>> https://mails.dpdk.org/archives/test-report/2021-
>>>>>> January/173927.html
>>>>>>>>> | 64         | 512     | 1.571                               |
>>>>>>>>
>>>>>>>> https://mails.dpdk.org/archives/test-report/2021-
>>>>>> January/173919.html
>>>>>>>>> | 64         | 512     | 2.698                               |
>>>>>>>>
>>>>>>>> Assuming that pw86457 doesn't have an effect on this test, it
>>>> looks
>>>>>> to me
>>>>>>>> that this patch caused a regression in Intel hardware as well.
>>>>>>>>
>>>>>>>> Can someone update the baseline's expected values for the Intel
>>>>>> NICs and
>>>>>>>> rerun the test on this patch?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Ali
>>>>>
>>>>
>>>
>>
> 


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dpdk-dev] [dpdk-stable] [PATCH v4] mbuf: fix reset on mbuf free
  2021-01-15 18:39     ` Ali Alnubani
  2021-01-18 17:52       ` Ali Alnubani
@ 2021-01-21  9:19       ` Ferruh Yigit
  2021-01-21  9:29         ` Morten Brørup
  1 sibling, 1 reply; 74+ messages in thread
From: Ferruh Yigit @ 2021-01-21  9:19 UTC (permalink / raw)
  To: Ali Alnubani, David Marchand, Olivier Matz, zhaoyan.chen
  Cc: dev, Andrew Rybchenko, Ananyev, Konstantin, Morten Brørup,
	ajitkhaparde, dpdk stable, Ajit Khaparde, dpdklab

On 1/15/2021 6:39 PM, Ali Alnubani wrote:
> Hi,
> Adding Ferruh and Zhaoyan,
> 
>> Ali,
>>
>> You reported some performance regression, did you confirm it?
>> If I get no reply by monday, I'll proceed with this patch.
> 
> Sure I'll confirm by Monday.
> 
> Doesn't the regression also reproduce on the Lab's Intel servers?
> Even though the check iol-intel-Performance isn't failing, I can see that the throughput differences from expected for this patch are less than those of another patch that was tested only 20 minutes earlier. Both patches were applied to the same tree:
> 
> https://mails.dpdk.org/archives/test-report/2021-January/173927.html
>> | 64         | 512     | 1.571                               |
> 
> https://mails.dpdk.org/archives/test-report/2021-January/173919.html
>> | 64         | 512     | 2.698                               |
> 
> Assuming that pw86457 doesn't have an effect on this test, it looks to me that this patch caused a regression in Intel hardware as well.
> 
> Can someone update the baseline's expected values for the Intel NICs and rerun the test on this patch?
> 

Zhaoyan said that the baseline is calculated dynamically,
what I understand is baseline set based on previous days performance result, so 
it shouldn't require updating.

But cc'ed the lab for more details.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dpdk-dev] [dpdk-stable] [PATCH v4] mbuf: fix reset on mbuf free
  2021-01-21  9:19       ` Ferruh Yigit
@ 2021-01-21  9:29         ` Morten Brørup
  2021-01-21 16:35           ` [dpdk-dev] [dpdklab] " Lincoln Lavoie
  0 siblings, 1 reply; 74+ messages in thread
From: Morten Brørup @ 2021-01-21  9:29 UTC (permalink / raw)
  To: Ferruh Yigit, Ali Alnubani, David Marchand, Olivier Matz, zhaoyan.chen
  Cc: dev, Andrew Rybchenko, Ananyev, Konstantin, ajitkhaparde,
	dpdk stable, Ajit Khaparde, dpdklab

> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Ferruh Yigit
> Sent: Thursday, January 21, 2021 10:19 AM
> 
> On 1/15/2021 6:39 PM, Ali Alnubani wrote:
> > Hi,
> > Adding Ferruh and Zhaoyan,
> >
> >> Ali,
> >>
> >> You reported some performance regression, did you confirm it?
> >> If I get no reply by monday, I'll proceed with this patch.
> >
> > Sure I'll confirm by Monday.
> >
> > Doesn't the regression also reproduce on the Lab's Intel servers?
> > Even though the check iol-intel-Performance isn't failing, I can see
> that the throughput differences from expected for this patch are less
> than those of another patch that was tested only 20 minutes earlier.
> Both patches were applied to the same tree:
> >
> > https://mails.dpdk.org/archives/test-report/2021-January/173927.html
> >> | 64         | 512     | 1.571                               |
> >
> > https://mails.dpdk.org/archives/test-report/2021-January/173919.html
> >> | 64         | 512     | 2.698                               |
> >
> > Assuming that pw86457 doesn't have an effect on this test, it looks
> to me that this patch caused a regression in Intel hardware as well.
> >
> > Can someone update the baseline's expected values for the Intel NICs
> and rerun the test on this patch?
> >
> 
> Zhaoyan said that the baseline is calculated dynamically,
> what I understand is baseline set based on previous days performance
> result, so
> it shouldn't require updating.

That sounds smart!

Perhaps another reference baseline could be added, for informational purposes only:
Deviation from the performance of the last official release.

> 
> But cc'ed the lab for more details.


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dpdk-dev] [dpdklab] RE: [dpdk-stable] [PATCH v4] mbuf: fix reset on mbuf free
  2021-01-21  9:29         ` Morten Brørup
@ 2021-01-21 16:35           ` Lincoln Lavoie
  2021-01-23  8:57             ` Morten Brørup
  2021-01-25 18:42             ` Ferruh Yigit
  0 siblings, 2 replies; 74+ messages in thread
From: Lincoln Lavoie @ 2021-01-21 16:35 UTC (permalink / raw)
  To: Morten Brørup
  Cc: Ferruh Yigit, Ali Alnubani, David Marchand, Olivier Matz, Chen,
	Zhaoyan, dev, Andrew Rybchenko, Ananyev, Konstantin,
	ajitkhaparde, dpdk stable, Ajit Khaparde, dpdklab

Hi All,

Trying to follow the specific conversation.  It is correct, the lab does
not list the specific throughput values achieved by the hardware, as that
data can be sensitive to the hardware vendors, etc. The purpose of the lab
is to check for degradations caused by patches, so the difference is really
the important factor.  The comparison is against a prior run on the same
hardware, via the DPDK main branch, so any delta should be caused by the
specific patch changes (excluding statistical "wiggle").

If the group would prefer, we could calculate additional references if
desired (i.e. difference from the last official release, or a monthly run
of the current, etc.).  We just need the community to define their needs,
and we can add this to the development queue.

Cheers,
Lincoln


On Thu, Jan 21, 2021 at 4:29 AM Morten Brørup <mb@smartsharesystems.com>
wrote:

> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Ferruh Yigit
> > Sent: Thursday, January 21, 2021 10:19 AM
> >
> > On 1/15/2021 6:39 PM, Ali Alnubani wrote:
> > > Hi,
> > > Adding Ferruh and Zhaoyan,
> > >
> > >> Ali,
> > >>
> > >> You reported some performance regression, did you confirm it?
> > >> If I get no reply by monday, I'll proceed with this patch.
> > >
> > > Sure I'll confirm by Monday.
> > >
> > > Doesn't the regression also reproduce on the Lab's Intel servers?
> > > Even though the check iol-intel-Performance isn't failing, I can see
> > that the throughput differences from expected for this patch are less
> > than those of another patch that was tested only 20 minutes earlier.
> > Both patches were applied to the same tree:
> > >
> > > https://mails.dpdk.org/archives/test-report/2021-January/173927.html
> > >> | 64         | 512     | 1.571                               |
> > >
> > > https://mails.dpdk.org/archives/test-report/2021-January/173919.html
> > >> | 64         | 512     | 2.698                               |
> > >
> > > Assuming that pw86457 doesn't have an effect on this test, it looks
> > to me that this patch caused a regression in Intel hardware as well.
> > >
> > > Can someone update the baseline's expected values for the Intel NICs
> > and rerun the test on this patch?
> > >
> >
> > Zhaoyan said that the baseline is calculated dynamically,
> > what I understand is baseline set based on previous days performance
> > result, so
> > it shouldn't require updating.
>
> That sounds smart!
>
> Perhaps another reference baseline could be added, for informational
> purposes only:
> Deviation from the performance of the last official release.
>
> >
> > But cc'ed the lab for more details.
>
>

-- 
*Lincoln Lavoie*
Senior Engineer, Broadband Technologies
21 Madbury Rd., Ste. 100, Durham, NH 03824
lylavoie@iol.unh.edu
https://www.iol.unh.edu
+1-603-674-2755 (m)
<https://www.iol.unh.edu>

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dpdk-dev] [dpdklab] RE: [dpdk-stable] [PATCH v4] mbuf: fix reset on mbuf free
  2021-01-21 16:35           ` [dpdk-dev] [dpdklab] " Lincoln Lavoie
@ 2021-01-23  8:57             ` Morten Brørup
  2021-01-25 17:00               ` Brandon Lo
  2021-01-25 18:42             ` Ferruh Yigit
  1 sibling, 1 reply; 74+ messages in thread
From: Morten Brørup @ 2021-01-23  8:57 UTC (permalink / raw)
  To: Lincoln Lavoie
  Cc: Ferruh Yigit, Ali Alnubani, David Marchand, Olivier Matz, Chen,
	Zhaoyan, dev, Andrew Rybchenko, Ananyev, Konstantin,
	ajitkhaparde, dpdk stable, Ajit Khaparde, dpdklab

> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Lincoln Lavoie
> Sent: Thursday, January 21, 2021 5:35 PM
> To: Morten Brørup
> 
> Hi All,
> 
> Trying to follow the specific conversation.  It is correct, the lab
> does
> not list the specific throughput values achieved by the hardware, as
> that
> data can be sensitive to the hardware vendors, etc. The purpose of the
> lab
> is to check for degradations caused by patches, so the difference is
> really
> the important factor.  The comparison is against a prior run on the
> same
> hardware, via the DPDK main branch, so any delta should be caused by
> the
> specific patch changes (excluding statistical "wiggle").
> 

Thank you for clarifying, Lincoln.

This sounds like a perfect solution to the meet the purpose.

I request that you change the output to show the relative difference (i.e. percentage above/below baseline), instead of the absolute difference (i.e. number of packets per second).

By showing a percentage, anyone reading the test report can understand if the difference is insignificant, or big enough to require further discussion before accepting the patch. Showing the difference in packets per second requires the reader of the test report to have prior knowledge about the expected performance.

> If the group would prefer, we could calculate additional references if
> desired (i.e. difference from the last official release, or a monthly
> run
> of the current, etc.).  We just need the community to define their
> needs,
> and we can add this to the development queue.
> 

I retract my suggestion about adding additional references. You clearly explained how your baselining works, and I think it fully serves the needs of a regression test.

So please just put this small change in your development queue: Output the difference in percent with a few decimals after the comma, instead of the difference in packets per second.

For readability, performance drops should be shown as negative values, and increases as positive, e.g.:

Difference = (NewPPS - BaselinePPS) / BaselinePPS * 100.00 %.


If we want to compare performance against official releases, current or older, it should go elsewhere, not in the continuous testing environment. E.g. the release notes could include a report showing differences from the last few official releases. But that is a task for another day.


> Cheers,
> Lincoln
> 
> 
> On Thu, Jan 21, 2021 at 4:29 AM Morten Brørup
> <mb@smartsharesystems.com>
> wrote:
> 
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Ferruh Yigit
> > > Sent: Thursday, January 21, 2021 10:19 AM
> > >
> > > On 1/15/2021 6:39 PM, Ali Alnubani wrote:
> > > > Hi,
> > > > Adding Ferruh and Zhaoyan,
> > > >
> > > >> Ali,
> > > >>
> > > >> You reported some performance regression, did you confirm it?
> > > >> If I get no reply by monday, I'll proceed with this patch.
> > > >
> > > > Sure I'll confirm by Monday.
> > > >
> > > > Doesn't the regression also reproduce on the Lab's Intel servers?
> > > > Even though the check iol-intel-Performance isn't failing, I can
> see
> > > that the throughput differences from expected for this patch are
> less
> > > than those of another patch that was tested only 20 minutes
> earlier.
> > > Both patches were applied to the same tree:
> > > >
> > > > https://mails.dpdk.org/archives/test-report/2021-
> January/173927.html
> > > >> | 64         | 512     | 1.571                               |
> > > >
> > > > https://mails.dpdk.org/archives/test-report/2021-
> January/173919.html
> > > >> | 64         | 512     | 2.698                               |
> > > >
> > > > Assuming that pw86457 doesn't have an effect on this test, it
> looks
> > > to me that this patch caused a regression in Intel hardware as
> well.
> > > >
> > > > Can someone update the baseline's expected values for the Intel
> NICs
> > > and rerun the test on this patch?
> > > >
> > >
> > > Zhaoyan said that the baseline is calculated dynamically,
> > > what I understand is baseline set based on previous days
> performance
> > > result, so
> > > it shouldn't require updating.
> >
> > That sounds smart!
> >
> > Perhaps another reference baseline could be added, for informational
> > purposes only:
> > Deviation from the performance of the last official release.
> >
> > >
> > > But cc'ed the lab for more details.
> >
> >
> 
> --
> *Lincoln Lavoie*
> Senior Engineer, Broadband Technologies
> 21 Madbury Rd., Ste. 100, Durham, NH 03824
> lylavoie@iol.unh.edu
> https://www.iol.unh.edu
> +1-603-674-2755 (m)
> <https://www.iol.unh.edu>


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dpdk-dev] [dpdklab] RE: [dpdk-stable] [PATCH v4] mbuf: fix reset on mbuf free
  2021-01-23  8:57             ` Morten Brørup
@ 2021-01-25 17:00               ` Brandon Lo
  0 siblings, 0 replies; 74+ messages in thread
From: Brandon Lo @ 2021-01-25 17:00 UTC (permalink / raw)
  To: Morten Brørup
  Cc: Lincoln Lavoie, Ferruh Yigit, Ali Alnubani, David Marchand,
	Olivier Matz, Chen, Zhaoyan, dev, Andrew Rybchenko, Ananyev,
	Konstantin, ajitkhaparde, dpdk stable, Ajit Khaparde, dpdklab

Hi,

This seems like a good task for us to do. I will see what it would
take in order to convert the difference into a decimal-formatted
percentage.
I have put this into bugzilla to keep track of this issue:
https://bugs.dpdk.org/show_bug.cgi?id=626

Thanks,
Brandon

On Sat, Jan 23, 2021 at 3:57 AM Morten Brørup <mb@smartsharesystems.com> wrote:
>
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Lincoln Lavoie
> > Sent: Thursday, January 21, 2021 5:35 PM
> > To: Morten Brørup
> >
> > Hi All,
> >
> > Trying to follow the specific conversation.  It is correct, the lab
> > does
> > not list the specific throughput values achieved by the hardware, as
> > that
> > data can be sensitive to the hardware vendors, etc. The purpose of the
> > lab
> > is to check for degradations caused by patches, so the difference is
> > really
> > the important factor.  The comparison is against a prior run on the
> > same
> > hardware, via the DPDK main branch, so any delta should be caused by
> > the
> > specific patch changes (excluding statistical "wiggle").
> >
>
> Thank you for clarifying, Lincoln.
>
> This sounds like a perfect solution to the meet the purpose.
>
> I request that you change the output to show the relative difference (i.e. percentage above/below baseline), instead of the absolute difference (i.e. number of packets per second).
>
> By showing a percentage, anyone reading the test report can understand if the difference is insignificant, or big enough to require further discussion before accepting the patch. Showing the difference in packets per second requires the reader of the test report to have prior knowledge about the expected performance.
>
> > If the group would prefer, we could calculate additional references if
> > desired (i.e. difference from the last official release, or a monthly
> > run
> > of the current, etc.).  We just need the community to define their
> > needs,
> > and we can add this to the development queue.
> >
>
> I retract my suggestion about adding additional references. You clearly explained how your baselining works, and I think it fully serves the needs of a regression test.
>
> So please just put this small change in your development queue: Output the difference in percent with a few decimals after the comma, instead of the difference in packets per second.
>
> For readability, performance drops should be shown as negative values, and increases as positive, e.g.:
>
> Difference = (NewPPS - BaselinePPS) / BaselinePPS * 100.00 %.
>
>
> If we want to compare performance against official releases, current or older, it should go elsewhere, not in the continuous testing environment. E.g. the release notes could include a report showing differences from the last few official releases. But that is a task for another day.
>
>
> > Cheers,
> > Lincoln
> >
> >
> > On Thu, Jan 21, 2021 at 4:29 AM Morten Brørup
> > <mb@smartsharesystems.com>
> > wrote:
> >
> > > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Ferruh Yigit
> > > > Sent: Thursday, January 21, 2021 10:19 AM
> > > >
> > > > On 1/15/2021 6:39 PM, Ali Alnubani wrote:
> > > > > Hi,
> > > > > Adding Ferruh and Zhaoyan,
> > > > >
> > > > >> Ali,
> > > > >>
> > > > >> You reported some performance regression, did you confirm it?
> > > > >> If I get no reply by monday, I'll proceed with this patch.
> > > > >
> > > > > Sure I'll confirm by Monday.
> > > > >
> > > > > Doesn't the regression also reproduce on the Lab's Intel servers?
> > > > > Even though the check iol-intel-Performance isn't failing, I can
> > see
> > > > that the throughput differences from expected for this patch are
> > less
> > > > than those of another patch that was tested only 20 minutes
> > earlier.
> > > > Both patches were applied to the same tree:
> > > > >
> > > > > https://mails.dpdk.org/archives/test-report/2021-
> > January/173927.html
> > > > >> | 64         | 512     | 1.571                               |
> > > > >
> > > > > https://mails.dpdk.org/archives/test-report/2021-
> > January/173919.html
> > > > >> | 64         | 512     | 2.698                               |
> > > > >
> > > > > Assuming that pw86457 doesn't have an effect on this test, it
> > looks
> > > > to me that this patch caused a regression in Intel hardware as
> > well.
> > > > >
> > > > > Can someone update the baseline's expected values for the Intel
> > NICs
> > > > and rerun the test on this patch?
> > > > >
> > > >
> > > > Zhaoyan said that the baseline is calculated dynamically,
> > > > what I understand is baseline set based on previous days
> > performance
> > > > result, so
> > > > it shouldn't require updating.
> > >
> > > That sounds smart!
> > >
> > > Perhaps another reference baseline could be added, for informational
> > > purposes only:
> > > Deviation from the performance of the last official release.
> > >
> > > >
> > > > But cc'ed the lab for more details.
> > >
> > >
> >
> > --
> > *Lincoln Lavoie*
> > Senior Engineer, Broadband Technologies
> > 21 Madbury Rd., Ste. 100, Durham, NH 03824
> > lylavoie@iol.unh.edu
> > https://www.iol.unh.edu
> > +1-603-674-2755 (m)
> > <https://www.iol.unh.edu>
>


-- 

Brandon Lo

UNH InterOperability Laboratory

21 Madbury Rd, Suite 100, Durham, NH 03824

blo@iol.unh.edu

www.iol.unh.edu

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dpdk-dev] [dpdklab] RE: [dpdk-stable] [PATCH v4] mbuf: fix reset on mbuf free
  2021-01-21 16:35           ` [dpdk-dev] [dpdklab] " Lincoln Lavoie
  2021-01-23  8:57             ` Morten Brørup
@ 2021-01-25 18:42             ` Ferruh Yigit
  1 sibling, 0 replies; 74+ messages in thread
From: Ferruh Yigit @ 2021-01-25 18:42 UTC (permalink / raw)
  To: Lincoln Lavoie, Morten Brørup, Brandon Lo, Aaron Conole
  Cc: Ali Alnubani, David Marchand, Olivier Matz, Chen, Zhaoyan, dev,
	Andrew Rybchenko, Ananyev, Konstantin, ajitkhaparde, dpdk stable,
	Ajit Khaparde, dpdklab

On 1/21/2021 4:35 PM, Lincoln Lavoie wrote:
> Hi All,
> 
> Trying to follow the specific conversation.  It is correct, the lab does not 
> list the specific throughput values achieved by the hardware, as that data can 
> be sensitive to the hardware vendors, etc. The purpose of the lab is to check 
> for degradations caused by patches, so the difference is really the important 
> factor.  The comparison is against a prior run on the same hardware, via the 
> DPDK main branch, so any delta should be caused by the specific patch changes 
> (excluding statistical "wiggle").
> 
> If the group would prefer, we could calculate additional references if desired 
> (i.e. difference from the last official release, or a monthly run of the 
> current, etc.).  We just need the community to define their needs, and we can 
> add this to the development queue.
> 

Hi Brandon,

Can you also put above to the backlog, to display the performance difference to 
the a fixed point, like a previous release or a previous LTS?

Thanks,
ferruh

> Cheers,
> Lincoln
> 
> 
> On Thu, Jan 21, 2021 at 4:29 AM Morten Brørup <mb@smartsharesystems.com 
> <mailto:mb@smartsharesystems.com>> wrote:
> 
>      > From: dev [mailto:dev-bounces@dpdk.org <mailto:dev-bounces@dpdk.org>] On
>     Behalf Of Ferruh Yigit
>      > Sent: Thursday, January 21, 2021 10:19 AM
>      >
>      > On 1/15/2021 6:39 PM, Ali Alnubani wrote:
>      > > Hi,
>      > > Adding Ferruh and Zhaoyan,
>      > >
>      > >> Ali,
>      > >>
>      > >> You reported some performance regression, did you confirm it?
>      > >> If I get no reply by monday, I'll proceed with this patch.
>      > >
>      > > Sure I'll confirm by Monday.
>      > >
>      > > Doesn't the regression also reproduce on the Lab's Intel servers?
>      > > Even though the check iol-intel-Performance isn't failing, I can see
>      > that the throughput differences from expected for this patch are less
>      > than those of another patch that was tested only 20 minutes earlier.
>      > Both patches were applied to the same tree:
>      > >
>      > > https://mails.dpdk.org/archives/test-report/2021-January/173927.html
>     <https://mails.dpdk.org/archives/test-report/2021-January/173927.html>
>      > >> | 64         | 512     | 1.571                               |
>      > >
>      > > https://mails.dpdk.org/archives/test-report/2021-January/173919.html
>     <https://mails.dpdk.org/archives/test-report/2021-January/173919.html>
>      > >> | 64         | 512     | 2.698                               |
>      > >
>      > > Assuming that pw86457 doesn't have an effect on this test, it looks
>      > to me that this patch caused a regression in Intel hardware as well.
>      > >
>      > > Can someone update the baseline's expected values for the Intel NICs
>      > and rerun the test on this patch?
>      > >
>      >
>      > Zhaoyan said that the baseline is calculated dynamically,
>      > what I understand is baseline set based on previous days performance
>      > result, so
>      > it shouldn't require updating.
> 
>     That sounds smart!
> 
>     Perhaps another reference baseline could be added, for informational
>     purposes only:
>     Deviation from the performance of the last official release.
> 
>      >
>      > But cc'ed the lab for more details.
> 
> 
> 
> -- 
> *Lincoln Lavoie*
> Senior Engineer, Broadband Technologies
> 21 Madbury Rd., Ste. 100, Durham, NH 03824
> lylavoie@iol.unh.edu <mailto:lylavoie@iol.unh.edu>
> https://www.iol.unh.edu <https://www.iol.unh.edu>
> +1-603-674-2755 (m)
> <https://www.iol.unh.edu>


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dpdk-dev] [PATCH v4] mbuf: fix reset on mbuf free
  2021-01-13 13:27 ` [dpdk-dev] [PATCH v4] " Olivier Matz
  2021-01-15 13:59   ` [dpdk-dev] [dpdk-stable] " David Marchand
@ 2021-06-15 13:56   ` Morten Brørup
  2021-09-29 21:37   ` [dpdk-dev] [PATCH v5] " Olivier Matz
  2 siblings, 0 replies; 74+ messages in thread
From: Morten Brørup @ 2021-06-15 13:56 UTC (permalink / raw)
  To: Olivier Matz, dev
  Cc: andrew.rybchenko, konstantin.ananyev, alialnu, ajitkhaparde,
	stable, Ajit Khaparde

> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier Matz
> Sent: Wednesday, 13 January 2021 14.28

[snip]

> Fixes: 8f094a9ac5d7 ("mbuf: set mbuf fields while in pool")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> Acked-by: Morten Brørup <mb@smartsharesystems.com>
> Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> ---
> 
> v4
> * add a unit test (suggested by David)
> 
> v3
> * fix commit log again (thanks Morten for spotting it)
> 
> v2
> * avoid write access if uneeded (suggested by Konstantin)
> * enhance comments in mbuf header file (suggested by Morten)
> * fix commit log
> 
>  app/test/test_mbuf.c            | 69 +++++++++++++++++++++++++++++++++
>  lib/librte_mbuf/rte_mbuf.c      |  4 +-
>  lib/librte_mbuf/rte_mbuf.h      |  8 ++--
>  lib/librte_mbuf/rte_mbuf_core.h | 13 ++++++-
>  4 files changed, 86 insertions(+), 8 deletions(-)
> 

[snip]

> diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
> index 7d09ee2939..5f77840557 100644
> --- a/lib/librte_mbuf/rte_mbuf.c
> +++ b/lib/librte_mbuf/rte_mbuf.c
> @@ -129,10 +129,10 @@ rte_pktmbuf_free_pinned_extmem(void *addr, void
> *opaque)
> 
>  	rte_mbuf_ext_refcnt_set(m->shinfo, 1);
>  	m->ol_flags = EXT_ATTACHED_MBUF;
> -	if (m->next != NULL) {
> +	if (m->next != NULL)
>  		m->next = NULL;
> +	if (m->nb_segs != 1)
>  		m->nb_segs = 1;
> -	}
>  	rte_mbuf_raw_free(m);
>  }
> 
> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> index c4c9ebfaa0..8c1097ed76 100644
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -1340,10 +1340,10 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
>  				return NULL;
>  		}
> 
> -		if (m->next != NULL) {
> +		if (m->next != NULL)
>  			m->next = NULL;
> +		if (m->nb_segs != 1)
>  			m->nb_segs = 1;
> -		}
> 
>  		return m;
> 
> @@ -1357,10 +1357,10 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
>  				return NULL;
>  		}
> 
> -		if (m->next != NULL) {
> +		if (m->next != NULL)
>  			m->next = NULL;
> +		if (m->nb_segs != 1)
>  			m->nb_segs = 1;
> -		}
>  		rte_mbuf_refcnt_set(m, 1);
> 
>  		return m;

Olivier, have you considered if Konstantin's suggestion (to avoid write access if unneeded) could be applied to any of the other functions in rte_mbuf.h, e.g. __rte_pktmbuf_free_direct()?



^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dpdk-dev] [dpdk-stable] [PATCH v4] mbuf: fix reset on mbuf free
  2021-01-19 14:04           ` Slava Ovsiienko
@ 2021-07-24  8:47             ` Thomas Monjalon
  2021-07-30 12:36               ` Olivier Matz
  0 siblings, 1 reply; 74+ messages in thread
From: Thomas Monjalon @ 2021-07-24  8:47 UTC (permalink / raw)
  To: Olivier Matz, Ali Alnubani, David Marchand, Alexander Kozyrev,
	Slava Ovsiienko
  Cc: dev, Ferruh Yigit, zhaoyan.chen, Andrew Rybchenko, Ananyev,
	Konstantin, Morten Brørup, ajitkhaparde, dpdk stable,
	Ajit Khaparde

What's the follow-up for this patch?

19/01/2021 15:04, Slava Ovsiienko:
> Hi, All
> 
> Could we postpose this patch at least to rc2? We would like to conduct more investigations?
> 
> With best regards, Slava
> 
> From: Olivier Matz <olivier.matz@6wind.com>
> > On Mon, Jan 18, 2021 at 05:52:32PM +0000, Ali Alnubani wrote:
> > > Hi,
> > > (Sorry had to resend this to some recipients due to mail server problems).
> > >
> > > Just confirming that I can still reproduce the regression with single core and
> > 64B frames on other servers.
> > 
> > Many thanks for the feedback. Can you please detail what is the amount of
> > performance loss in percent, and confirm the test case? (I suppose it is
> > testpmd io forward).
> > 
> > Unfortunatly, I won't be able to spend a lot of time on this soon (sorry for
> > that). So I see at least these 2 options:
> > 
> > - postpone the patch again, until I can find more time to analyze
> >   and optimize
> > - apply the patch if the performance loss is acceptable compared to
> >   the added value of fixing a bug
> > 
> > Regards,
> > Olivier
> > 
[...]
> > > > Assuming that pw86457 doesn't have an effect on this test, it looks
> > > > to me that this patch caused a regression in Intel hardware as well.
> > > >
> > > > Can someone update the baseline's expected values for the Intel NICs
> > > > and rerun the test on this patch?
> > > >
> > > > Thanks,
> > > > Ali





^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dpdk-dev] [dpdk-stable] [PATCH v4] mbuf: fix reset on mbuf free
  2021-07-24  8:47             ` Thomas Monjalon
@ 2021-07-30 12:36               ` Olivier Matz
  2021-07-30 14:35                 ` Morten Brørup
  0 siblings, 1 reply; 74+ messages in thread
From: Olivier Matz @ 2021-07-30 12:36 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Ali Alnubani, David Marchand, Alexander Kozyrev, Slava Ovsiienko,
	dev, Ferruh Yigit, zhaoyan.chen, Andrew Rybchenko, Ananyev,
	Konstantin, Morten Brørup, ajitkhaparde, dpdk stable,
	Ajit Khaparde

Hi Thomas,

On Sat, Jul 24, 2021 at 10:47:34AM +0200, Thomas Monjalon wrote:
> What's the follow-up for this patch?

Unfortunatly, I still don't have the time to work on this topic yet.

In my initial tests, in our lab, I didn't notice any performance
regression, but Ali has seen an impact (0.5M PPS, but I don't know how
much in percent).


> 19/01/2021 15:04, Slava Ovsiienko:
> > Hi, All
> > 
> > Could we postpose this patch at least to rc2? We would like to conduct more investigations?
> > 
> > With best regards, Slava
> > 
> > From: Olivier Matz <olivier.matz@6wind.com>
> > > On Mon, Jan 18, 2021 at 05:52:32PM +0000, Ali Alnubani wrote:
> > > > Hi,
> > > > (Sorry had to resend this to some recipients due to mail server problems).
> > > >
> > > > Just confirming that I can still reproduce the regression with single core and
> > > 64B frames on other servers.
> > > 
> > > Many thanks for the feedback. Can you please detail what is the amount of
> > > performance loss in percent, and confirm the test case? (I suppose it is
> > > testpmd io forward).
> > > 
> > > Unfortunatly, I won't be able to spend a lot of time on this soon (sorry for
> > > that). So I see at least these 2 options:
> > > 
> > > - postpone the patch again, until I can find more time to analyze
> > >   and optimize
> > > - apply the patch if the performance loss is acceptable compared to
> > >   the added value of fixing a bug
> > > 
> [...]

Statu quo...

Olivier

> > > > > Assuming that pw86457 doesn't have an effect on this test, it looks
> > > > > to me that this patch caused a regression in Intel hardware as well.
> > > > >
> > > > > Can someone update the baseline's expected values for the Intel NICs
> > > > > and rerun the test on this patch?
> > > > >
> > > > > Thanks,
> > > > > Ali
> 
> 
> 
> 

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dpdk-dev] [dpdk-stable] [PATCH v4] mbuf: fix reset on mbuf free
  2021-07-30 12:36               ` Olivier Matz
@ 2021-07-30 14:35                 ` Morten Brørup
  2021-07-30 14:54                   ` Thomas Monjalon
  0 siblings, 1 reply; 74+ messages in thread
From: Morten Brørup @ 2021-07-30 14:35 UTC (permalink / raw)
  To: Olivier Matz, Thomas Monjalon
  Cc: Ali Alnubani, David Marchand, Alexander Kozyrev, Slava Ovsiienko,
	dev, Ferruh Yigit, zhaoyan.chen, Andrew Rybchenko, Ananyev,
	Konstantin, ajitkhaparde, dpdk stable, Ajit Khaparde

> From: Olivier Matz [mailto:olivier.matz@6wind.com]
> Sent: Friday, 30 July 2021 14.37
> 
> Hi Thomas,
> 
> On Sat, Jul 24, 2021 at 10:47:34AM +0200, Thomas Monjalon wrote:
> > What's the follow-up for this patch?
> 
> Unfortunatly, I still don't have the time to work on this topic yet.
> 
> In my initial tests, in our lab, I didn't notice any performance
> regression, but Ali has seen an impact (0.5M PPS, but I don't know how
> much in percent).
> 
> 
> > 19/01/2021 15:04, Slava Ovsiienko:
> > > Hi, All
> > >
> > > Could we postpose this patch at least to rc2? We would like to
> conduct more investigations?
> > >
> > > With best regards, Slava
> > >
> > > From: Olivier Matz <olivier.matz@6wind.com>
> > > > On Mon, Jan 18, 2021 at 05:52:32PM +0000, Ali Alnubani wrote:
> > > > > Hi,
> > > > > (Sorry had to resend this to some recipients due to mail server
> problems).
> > > > >
> > > > > Just confirming that I can still reproduce the regression with
> single core and
> > > > 64B frames on other servers.
> > > >
> > > > Many thanks for the feedback. Can you please detail what is the
> amount of
> > > > performance loss in percent, and confirm the test case? (I
> suppose it is
> > > > testpmd io forward).
> > > >
> > > > Unfortunatly, I won't be able to spend a lot of time on this soon
> (sorry for
> > > > that). So I see at least these 2 options:
> > > >
> > > > - postpone the patch again, until I can find more time to analyze
> > > >   and optimize
> > > > - apply the patch if the performance loss is acceptable compared
> to
> > > >   the added value of fixing a bug
> > > >
> > [...]
> 
> Statu quo...
> 
> Olivier
> 

The decision should be simple:

Does the DPDK project support segmented packets?
If yes, then apply the patch to fix the bug!

If anyone seriously cares about the regression it introduces, optimization patches are welcome later. We shouldn't wait for it.

If the patch is not applied, the documentation must be updated to mention that we are releasing DPDK with a known bug: that segmented packets are handled incorrectly in the scenario described in this patch.


Generally, there could be some performance to gain by not supporting segmented packets at all, as a compile time option. But that is a different discussion.


-Morten


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dpdk-dev] [dpdk-stable] [PATCH v4] mbuf: fix reset on mbuf free
  2021-07-30 14:35                 ` Morten Brørup
@ 2021-07-30 14:54                   ` Thomas Monjalon
  2021-07-30 15:14                     ` Olivier Matz
  2021-09-28  8:28                     ` [dpdk-dev] [dpdk-stable] [PATCH v4] mbuf: fix reset on mbuf free Thomas Monjalon
  0 siblings, 2 replies; 74+ messages in thread
From: Thomas Monjalon @ 2021-07-30 14:54 UTC (permalink / raw)
  To: Morten Brørup
  Cc: Olivier Matz, Ali Alnubani, David Marchand, Alexander Kozyrev,
	Slava Ovsiienko, dev, Ferruh Yigit, zhaoyan.chen,
	Andrew Rybchenko, Ananyev, Konstantin, ajitkhaparde, dpdk stable,
	Ajit Khaparde

30/07/2021 16:35, Morten Brørup:
> > From: Olivier Matz [mailto:olivier.matz@6wind.com]
> > Sent: Friday, 30 July 2021 14.37
> > 
> > Hi Thomas,
> > 
> > On Sat, Jul 24, 2021 at 10:47:34AM +0200, Thomas Monjalon wrote:
> > > What's the follow-up for this patch?
> > 
> > Unfortunatly, I still don't have the time to work on this topic yet.
> > 
> > In my initial tests, in our lab, I didn't notice any performance
> > regression, but Ali has seen an impact (0.5M PPS, but I don't know how
> > much in percent).
> > 
> > 
> > > 19/01/2021 15:04, Slava Ovsiienko:
> > > > Hi, All
> > > >
> > > > Could we postpose this patch at least to rc2? We would like to
> > conduct more investigations?
> > > >
> > > > With best regards, Slava
> > > >
> > > > From: Olivier Matz <olivier.matz@6wind.com>
> > > > > On Mon, Jan 18, 2021 at 05:52:32PM +0000, Ali Alnubani wrote:
> > > > > > Hi,
> > > > > > (Sorry had to resend this to some recipients due to mail server
> > problems).
> > > > > >
> > > > > > Just confirming that I can still reproduce the regression with
> > single core and
> > > > > 64B frames on other servers.
> > > > >
> > > > > Many thanks for the feedback. Can you please detail what is the
> > amount of
> > > > > performance loss in percent, and confirm the test case? (I
> > suppose it is
> > > > > testpmd io forward).
> > > > >
> > > > > Unfortunatly, I won't be able to spend a lot of time on this soon
> > (sorry for
> > > > > that). So I see at least these 2 options:
> > > > >
> > > > > - postpone the patch again, until I can find more time to analyze
> > > > >   and optimize
> > > > > - apply the patch if the performance loss is acceptable compared
> > to
> > > > >   the added value of fixing a bug
> > > > >
> > > [...]
> > 
> > Statu quo...
> > 
> > Olivier
> > 
> 
> The decision should be simple:
> 
> Does the DPDK project support segmented packets?
> If yes, then apply the patch to fix the bug!
> 
> If anyone seriously cares about the regression it introduces, optimization patches are welcome later. We shouldn't wait for it.

You're right, but the regression is flagged to a 4-years old patch,
that's why I don't consider it as urgent.

> If the patch is not applied, the documentation must be updated to mention that we are releasing DPDK with a known bug: that segmented packets are handled incorrectly in the scenario described in this patch.

Yes, would be good to document the known issue,
no matter how old it is.

> Generally, there could be some performance to gain by not supporting segmented packets at all, as a compile time option. But that is a different discussion.
> 
> 
> -Morten




^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dpdk-dev] [dpdk-stable] [PATCH v4] mbuf: fix reset on mbuf free
  2021-07-30 14:54                   ` Thomas Monjalon
@ 2021-07-30 15:14                     ` Olivier Matz
  2021-07-30 15:23                       ` Morten Brørup
  2021-08-04 13:29                       ` [dpdk-dev] [PATCH] doc: add known issue with mbuf segment Thomas Monjalon
  2021-09-28  8:28                     ` [dpdk-dev] [dpdk-stable] [PATCH v4] mbuf: fix reset on mbuf free Thomas Monjalon
  1 sibling, 2 replies; 74+ messages in thread
From: Olivier Matz @ 2021-07-30 15:14 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Morten Brørup, Ali Alnubani, David Marchand,
	Alexander Kozyrev, Slava Ovsiienko, dev, Ferruh Yigit,
	zhaoyan.chen, Andrew Rybchenko, Ananyev, Konstantin,
	ajitkhaparde, dpdk stable, Ajit Khaparde

Hi,

On Fri, Jul 30, 2021 at 04:54:05PM +0200, Thomas Monjalon wrote:
> 30/07/2021 16:35, Morten Brørup:
> > > From: Olivier Matz [mailto:olivier.matz@6wind.com]
> > > Sent: Friday, 30 July 2021 14.37
> > > 
> > > Hi Thomas,
> > > 
> > > On Sat, Jul 24, 2021 at 10:47:34AM +0200, Thomas Monjalon wrote:
> > > > What's the follow-up for this patch?
> > > 
> > > Unfortunatly, I still don't have the time to work on this topic yet.
> > > 
> > > In my initial tests, in our lab, I didn't notice any performance
> > > regression, but Ali has seen an impact (0.5M PPS, but I don't know how
> > > much in percent).
> > > 
> > > 
> > > > 19/01/2021 15:04, Slava Ovsiienko:
> > > > > Hi, All
> > > > >
> > > > > Could we postpose this patch at least to rc2? We would like to
> > > conduct more investigations?
> > > > >
> > > > > With best regards, Slava
> > > > >
> > > > > From: Olivier Matz <olivier.matz@6wind.com>
> > > > > > On Mon, Jan 18, 2021 at 05:52:32PM +0000, Ali Alnubani wrote:
> > > > > > > Hi,
> > > > > > > (Sorry had to resend this to some recipients due to mail server
> > > problems).
> > > > > > >
> > > > > > > Just confirming that I can still reproduce the regression with
> > > single core and
> > > > > > 64B frames on other servers.
> > > > > >
> > > > > > Many thanks for the feedback. Can you please detail what is the
> > > amount of
> > > > > > performance loss in percent, and confirm the test case? (I
> > > suppose it is
> > > > > > testpmd io forward).
> > > > > >
> > > > > > Unfortunatly, I won't be able to spend a lot of time on this soon
> > > (sorry for
> > > > > > that). So I see at least these 2 options:
> > > > > >
> > > > > > - postpone the patch again, until I can find more time to analyze
> > > > > >   and optimize
> > > > > > - apply the patch if the performance loss is acceptable compared
> > > to
> > > > > >   the added value of fixing a bug
> > > > > >
> > > > [...]
> > > 
> > > Statu quo...
> > > 
> > > Olivier
> > > 
> > 
> > The decision should be simple:
> > 
> > Does the DPDK project support segmented packets?
> > If yes, then apply the patch to fix the bug!
> > 
> > If anyone seriously cares about the regression it introduces, optimization patches are welcome later. We shouldn't wait for it.
> 
> You're right, but the regression is flagged to a 4-years old patch,
> that's why I don't consider it as urgent.
> 
> > If the patch is not applied, the documentation must be updated to mention that we are releasing DPDK with a known bug: that segmented packets are handled incorrectly in the scenario described in this patch.
> 
> Yes, would be good to document the known issue,
> no matter how old it is.

The problem description could be something like this:

  It is expected that free mbufs have their field m->nb_seg set to 1, so
  that when it is allocated, the user does not need to set its
  value. The mbuf free functions are responsible of resetting this field
  to 1 before returning the mbuf to the pool.

  When a multi-segment mbuf is freed, the m->nb_seg field is not reset
  to 1 for the last segment of the chain. On next allocation of this
  segment, if the field is not explicitly reset by the user, an invalid
  mbuf can be created, and can cause an undefined behavior.


> > Generally, there could be some performance to gain by not supporting segmented packets at all, as a compile time option. But that is a different discussion.
> > 
> > 
> > -Morten
> 
> 
> 

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dpdk-dev] [dpdk-stable] [PATCH v4] mbuf: fix reset on mbuf free
  2021-07-30 15:14                     ` Olivier Matz
@ 2021-07-30 15:23                       ` Morten Brørup
  2021-08-04 13:29                       ` [dpdk-dev] [PATCH] doc: add known issue with mbuf segment Thomas Monjalon
  1 sibling, 0 replies; 74+ messages in thread
From: Morten Brørup @ 2021-07-30 15:23 UTC (permalink / raw)
  To: Olivier Matz, Thomas Monjalon
  Cc: Ali Alnubani, David Marchand, Alexander Kozyrev, Slava Ovsiienko,
	dev, Ferruh Yigit, zhaoyan.chen, Andrew Rybchenko, Ananyev,
	Konstantin, ajitkhaparde, dpdk stable, Ajit Khaparde

> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier Matz
> Sent: Friday, 30 July 2021 17.15
> 
> Hi,
> 
> On Fri, Jul 30, 2021 at 04:54:05PM +0200, Thomas Monjalon wrote:
> > 30/07/2021 16:35, Morten Brørup:
> > > > From: Olivier Matz [mailto:olivier.matz@6wind.com]
> > > > Sent: Friday, 30 July 2021 14.37
> > > >
> > > > Hi Thomas,
> > > >
> > > > On Sat, Jul 24, 2021 at 10:47:34AM +0200, Thomas Monjalon wrote:
> > > > > What's the follow-up for this patch?
> > > >
> > > > Unfortunatly, I still don't have the time to work on this topic
> yet.
> > > >
> > > > In my initial tests, in our lab, I didn't notice any performance
> > > > regression, but Ali has seen an impact (0.5M PPS, but I don't
> know how
> > > > much in percent).
> > > >
> > > >
> > > > > 19/01/2021 15:04, Slava Ovsiienko:
> > > > > > Hi, All
> > > > > >
> > > > > > Could we postpose this patch at least to rc2? We would like
> to
> > > > conduct more investigations?
> > > > > >
> > > > > > With best regards, Slava
> > > > > >
> > > > > > From: Olivier Matz <olivier.matz@6wind.com>
> > > > > > > On Mon, Jan 18, 2021 at 05:52:32PM +0000, Ali Alnubani
> wrote:
> > > > > > > > Hi,
> > > > > > > > (Sorry had to resend this to some recipients due to mail
> server
> > > > problems).
> > > > > > > >
> > > > > > > > Just confirming that I can still reproduce the regression
> with
> > > > single core and
> > > > > > > 64B frames on other servers.
> > > > > > >
> > > > > > > Many thanks for the feedback. Can you please detail what is
> the
> > > > amount of
> > > > > > > performance loss in percent, and confirm the test case? (I
> > > > suppose it is
> > > > > > > testpmd io forward).
> > > > > > >
> > > > > > > Unfortunatly, I won't be able to spend a lot of time on
> this soon
> > > > (sorry for
> > > > > > > that). So I see at least these 2 options:
> > > > > > >
> > > > > > > - postpone the patch again, until I can find more time to
> analyze
> > > > > > >   and optimize
> > > > > > > - apply the patch if the performance loss is acceptable
> compared
> > > > to
> > > > > > >   the added value of fixing a bug
> > > > > > >
> > > > > [...]
> > > >
> > > > Statu quo...
> > > >
> > > > Olivier
> > > >
> > >
> > > The decision should be simple:
> > >
> > > Does the DPDK project support segmented packets?
> > > If yes, then apply the patch to fix the bug!
> > >
> > > If anyone seriously cares about the regression it introduces,
> optimization patches are welcome later. We shouldn't wait for it.
> >
> > You're right, but the regression is flagged to a 4-years old patch,
> > that's why I don't consider it as urgent.
> >
> > > If the patch is not applied, the documentation must be updated to
> mention that we are releasing DPDK with a known bug: that segmented
> packets are handled incorrectly in the scenario described in this
> patch.
> >
> > Yes, would be good to document the known issue,
> > no matter how old it is.
> 
> The problem description could be something like this:
> 
>   It is expected that free mbufs have their field m->nb_seg set to 1,
> so
>   that when it is allocated, the user does not need to set its
>   value. The mbuf free functions are responsible of resetting this
> field
>   to 1 before returning the mbuf to the pool.
> 
>   When a multi-segment mbuf is freed, the m->nb_seg field is not reset
>   to 1 for the last segment of the chain. On next allocation of this
>   segment, if the field is not explicitly reset by the user, an invalid
>   mbuf can be created, and can cause an undefined behavior.
> 

And it needs to be put somewhere very prominent if we expect the users to read it.

Would adding an RTE_VERIFY() - instead of fixing the bug - cause a regression? If not, then any affected user will know what went wrong and where. This would still be an improvement, if the bugfix patch cannot be applied.

> 
> > > Generally, there could be some performance to gain by not
> supporting segmented packets at all, as a compile time option. But that
> is a different discussion.
> > >
> > >
> > > -Morten
> >
> >
> >


^ permalink raw reply	[flat|nested] 74+ messages in thread

* [dpdk-dev] [PATCH] doc: add known issue with mbuf segment
  2021-07-30 15:14                     ` Olivier Matz
  2021-07-30 15:23                       ` Morten Brørup
@ 2021-08-04 13:29                       ` Thomas Monjalon
  2021-08-04 14:25                         ` Ajit Khaparde
  2021-08-05  6:08                         ` Morten Brørup
  1 sibling, 2 replies; 74+ messages in thread
From: Thomas Monjalon @ 2021-08-04 13:29 UTC (permalink / raw)
  To: dev
  Cc: mb, alialnu, david.marchand, akozyrev, viacheslavo, ferruh.yigit,
	andrew.rybchenko, konstantin.ananyev, ajit.khaparde,
	Olivier Matz

A bug with segmented packets has been discovered but the agreement
to apply the fix is not concluded at the time of DPDK 21.08 release.
This bug seems to be in DPDK for many years and should be fixed in 21.11.

Suggested-by: Olivier Matz <olivier.matz@6wind.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
---
 doc/guides/rel_notes/release_21_08.rst | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/doc/guides/rel_notes/release_21_08.rst b/doc/guides/rel_notes/release_21_08.rst
index d7559ec6bf..14e32585b8 100644
--- a/doc/guides/rel_notes/release_21_08.rst
+++ b/doc/guides/rel_notes/release_21_08.rst
@@ -251,6 +251,18 @@ Known Issues
    Also, make sure to start the actual text at the margin.
    =======================================================
 
+* **Last mbuf segment not implicitly reset.**
+
+  It is expected that free mbufs have their field ``nb_seg`` set to 1,
+  so that when it is allocated, the user does not need to set its value.
+  The mbuf free functions are responsible of resetting this field to 1
+  before returning the mbuf to the pool.
+
+  When a multi-segment mbuf is freed, the field ``nb_seg`` is not reset
+  to 1 for the last segment of the chain. On next allocation of this segment,
+  if the field is not explicitly reset by the user,
+  an invalid mbuf can be created, and can cause an undefined behavior.
+
 
 Tested Platforms
 ----------------
-- 
2.31.1


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dpdk-dev] [PATCH] doc: add known issue with mbuf segment
  2021-08-04 13:29                       ` [dpdk-dev] [PATCH] doc: add known issue with mbuf segment Thomas Monjalon
@ 2021-08-04 14:25                         ` Ajit Khaparde
  2021-08-05  6:08                         ` Morten Brørup
  1 sibling, 0 replies; 74+ messages in thread
From: Ajit Khaparde @ 2021-08-04 14:25 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: dpdk-dev, Morten Brørup, Ali Alnubani, David Marchand,
	Alexander Kozyrev, Slava Ovsiienko, Ferruh Yigit,
	Andrew Rybchenko, Ananyev, Konstantin, Olivier Matz

[-- Attachment #1: Type: text/plain, Size: 1624 bytes --]

On Wed, Aug 4, 2021 at 6:30 AM Thomas Monjalon <thomas@monjalon.net> wrote:
>
> A bug with segmented packets has been discovered but the agreement
> to apply the fix is not concluded at the time of DPDK 21.08 release.
> This bug seems to be in DPDK for many years and should be fixed in 21.11.
>
> Suggested-by: Olivier Matz <olivier.matz@6wind.com>
> Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>

> ---
>  doc/guides/rel_notes/release_21_08.rst | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> diff --git a/doc/guides/rel_notes/release_21_08.rst b/doc/guides/rel_notes/release_21_08.rst
> index d7559ec6bf..14e32585b8 100644
> --- a/doc/guides/rel_notes/release_21_08.rst
> +++ b/doc/guides/rel_notes/release_21_08.rst
> @@ -251,6 +251,18 @@ Known Issues
>     Also, make sure to start the actual text at the margin.
>     =======================================================
>
> +* **Last mbuf segment not implicitly reset.**
> +
> +  It is expected that free mbufs have their field ``nb_seg`` set to 1,
> +  so that when it is allocated, the user does not need to set its value.
> +  The mbuf free functions are responsible of resetting this field to 1
> +  before returning the mbuf to the pool.
> +
> +  When a multi-segment mbuf is freed, the field ``nb_seg`` is not reset
> +  to 1 for the last segment of the chain. On next allocation of this segment,
> +  if the field is not explicitly reset by the user,
> +  an invalid mbuf can be created, and can cause an undefined behavior.
> +
>
>  Tested Platforms
>  ----------------
> --
> 2.31.1
>

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dpdk-dev] [PATCH] doc: add known issue with mbuf segment
  2021-08-04 13:29                       ` [dpdk-dev] [PATCH] doc: add known issue with mbuf segment Thomas Monjalon
  2021-08-04 14:25                         ` Ajit Khaparde
@ 2021-08-05  6:08                         ` Morten Brørup
  2021-08-06 14:21                           ` Thomas Monjalon
  1 sibling, 1 reply; 74+ messages in thread
From: Morten Brørup @ 2021-08-05  6:08 UTC (permalink / raw)
  To: Thomas Monjalon, dev
  Cc: alialnu, david.marchand, akozyrev, viacheslavo, ferruh.yigit,
	andrew.rybchenko, konstantin.ananyev, ajit.khaparde,
	Olivier Matz

> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Thomas Monjalon
> Sent: Wednesday, 4 August 2021 15.30
> 
> A bug with segmented packets has been discovered but the agreement
> to apply the fix is not concluded at the time of DPDK 21.08 release.
> This bug seems to be in DPDK for many years and should be fixed in
> 21.11.
> 
> Suggested-by: Olivier Matz <olivier.matz@6wind.com>
> Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
> ---
>  doc/guides/rel_notes/release_21_08.rst | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/release_21_08.rst
> b/doc/guides/rel_notes/release_21_08.rst
> index d7559ec6bf..14e32585b8 100644
> --- a/doc/guides/rel_notes/release_21_08.rst
> +++ b/doc/guides/rel_notes/release_21_08.rst
> @@ -251,6 +251,18 @@ Known Issues
>     Also, make sure to start the actual text at the margin.
>     =======================================================
> 
> +* **Last mbuf segment not implicitly reset.**
> +
> +  It is expected that free mbufs have their field ``nb_seg`` set to 1,
> +  so that when it is allocated, the user does not need to set its
> value.
> +  The mbuf free functions are responsible of resetting this field to 1
> +  before returning the mbuf to the pool.
> +
> +  When a multi-segment mbuf is freed, the field ``nb_seg`` is not
> reset
> +  to 1 for the last segment of the chain. On next allocation of this
> segment,
> +  if the field is not explicitly reset by the user,
> +  an invalid mbuf can be created, and can cause an undefined behavior.
> +
> 

I have a couple of suggestions:

1. Add that this bug was present in previous DPDK releases for many years. If possible, mention which release introduced the bug.

We don't want users to choose to stay on older release, thinking that it does not have this bug.

Also, if a user experiences unexplainable problems with segmented packets an older DPDK release, he might get lucky and find these release notes, which could reveal the root cause of the problems he is experiencing.

2. Add a link to the patch that fixes the bug. It will let users affected by the bug easily fix it (possibly at the expense of degraded performance).

With those two additions,
Acked-by: Morten Brørup <mb@smartsharesystems.com>

-Morten

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dpdk-dev] [PATCH] doc: add known issue with mbuf segment
  2021-08-05  6:08                         ` Morten Brørup
@ 2021-08-06 14:21                           ` Thomas Monjalon
  2021-08-06 14:24                             ` Morten Brørup
  0 siblings, 1 reply; 74+ messages in thread
From: Thomas Monjalon @ 2021-08-06 14:21 UTC (permalink / raw)
  To: Olivier Matz, Morten Brørup
  Cc: dev, alialnu, david.marchand, akozyrev, viacheslavo,
	ferruh.yigit, andrew.rybchenko, konstantin.ananyev,
	ajit.khaparde

05/08/2021 08:08, Morten Brørup:
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Thomas Monjalon
> > Sent: Wednesday, 4 August 2021 15.30
> > 
> > A bug with segmented packets has been discovered but the agreement
> > to apply the fix is not concluded at the time of DPDK 21.08 release.
> > This bug seems to be in DPDK for many years and should be fixed in
> > 21.11.
> > 
> > Suggested-by: Olivier Matz <olivier.matz@6wind.com>
> > Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
> > ---
> >  doc/guides/rel_notes/release_21_08.rst | 12 ++++++++++++
> >  1 file changed, 12 insertions(+)
> > 
> > diff --git a/doc/guides/rel_notes/release_21_08.rst
> > b/doc/guides/rel_notes/release_21_08.rst
> > index d7559ec6bf..14e32585b8 100644
> > --- a/doc/guides/rel_notes/release_21_08.rst
> > +++ b/doc/guides/rel_notes/release_21_08.rst
> > @@ -251,6 +251,18 @@ Known Issues
> >     Also, make sure to start the actual text at the margin.
> >     =======================================================
> > 
> > +* **Last mbuf segment not implicitly reset.**
> > +
> > +  It is expected that free mbufs have their field ``nb_seg`` set to 1,
> > +  so that when it is allocated, the user does not need to set its
> > value.
> > +  The mbuf free functions are responsible of resetting this field to 1
> > +  before returning the mbuf to the pool.
> > +
> > +  When a multi-segment mbuf is freed, the field ``nb_seg`` is not
> > reset
> > +  to 1 for the last segment of the chain. On next allocation of this
> > segment,
> > +  if the field is not explicitly reset by the user,
> > +  an invalid mbuf can be created, and can cause an undefined behavior.
> > +
> > 
> 
> I have a couple of suggestions:
> 
> 1. Add that this bug was present in previous DPDK releases for many years. If possible, mention which release introduced the bug.
> 
> We don't want users to choose to stay on older release, thinking that it does not have this bug.
> 
> Also, if a user experiences unexplainable problems with segmented packets an older DPDK release, he might get lucky and find these release notes, which could reveal the root cause of the problems he is experiencing.
> 
> 2. Add a link to the patch that fixes the bug. It will let users affected by the bug easily fix it (possibly at the expense of degraded performance).
> 
> With those two additions,
> Acked-by: Morten Brørup <mb@smartsharesystems.com>

Applied with the suggested additions:

+  This issue has a root cause in DPDK 17.05, meaning it is 4 years old.
+  A fix is available and discussed but not merged in DPDK 21.08:
+  https://patches.dpdk.org/patch/86458/




^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dpdk-dev] [PATCH] doc: add known issue with mbuf segment
  2021-08-06 14:21                           ` Thomas Monjalon
@ 2021-08-06 14:24                             ` Morten Brørup
  0 siblings, 0 replies; 74+ messages in thread
From: Morten Brørup @ 2021-08-06 14:24 UTC (permalink / raw)
  To: Thomas Monjalon, Olivier Matz
  Cc: dev, alialnu, david.marchand, akozyrev, viacheslavo,
	ferruh.yigit, andrew.rybchenko, konstantin.ananyev,
	ajit.khaparde

> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> Sent: Friday, 6 August 2021 16.21
> 
> 05/08/2021 08:08, Morten Brørup:
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Thomas
> Monjalon
> > > Sent: Wednesday, 4 August 2021 15.30
> > >
> > > A bug with segmented packets has been discovered but the agreement
> > > to apply the fix is not concluded at the time of DPDK 21.08
> release.
> > > This bug seems to be in DPDK for many years and should be fixed in
> > > 21.11.
> > >
> > > Suggested-by: Olivier Matz <olivier.matz@6wind.com>
> > > Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
> > > ---
> > >  doc/guides/rel_notes/release_21_08.rst | 12 ++++++++++++
> > >  1 file changed, 12 insertions(+)
> > >
> > > diff --git a/doc/guides/rel_notes/release_21_08.rst
> > > b/doc/guides/rel_notes/release_21_08.rst
> > > index d7559ec6bf..14e32585b8 100644
> > > --- a/doc/guides/rel_notes/release_21_08.rst
> > > +++ b/doc/guides/rel_notes/release_21_08.rst
> > > @@ -251,6 +251,18 @@ Known Issues
> > >     Also, make sure to start the actual text at the margin.
> > >     =======================================================
> > >
> > > +* **Last mbuf segment not implicitly reset.**
> > > +
> > > +  It is expected that free mbufs have their field ``nb_seg`` set
> to 1,
> > > +  so that when it is allocated, the user does not need to set its
> > > value.
> > > +  The mbuf free functions are responsible of resetting this field
> to 1
> > > +  before returning the mbuf to the pool.
> > > +
> > > +  When a multi-segment mbuf is freed, the field ``nb_seg`` is not
> > > reset
> > > +  to 1 for the last segment of the chain. On next allocation of
> this
> > > segment,
> > > +  if the field is not explicitly reset by the user,
> > > +  an invalid mbuf can be created, and can cause an undefined
> behavior.
> > > +
> > >
> >
> > I have a couple of suggestions:
> >
> > 1. Add that this bug was present in previous DPDK releases for many
> years. If possible, mention which release introduced the bug.
> >
> > We don't want users to choose to stay on older release, thinking that
> it does not have this bug.
> >
> > Also, if a user experiences unexplainable problems with segmented
> packets an older DPDK release, he might get lucky and find these
> release notes, which could reveal the root cause of the problems he is
> experiencing.
> >
> > 2. Add a link to the patch that fixes the bug. It will let users
> affected by the bug easily fix it (possibly at the expense of degraded
> performance).
> >
> > With those two additions,
> > Acked-by: Morten Brørup <mb@smartsharesystems.com>
> 
> Applied with the suggested additions:
> 
> +  This issue has a root cause in DPDK 17.05, meaning it is 4 years
> old.
> +  A fix is available and discussed but not merged in DPDK 21.08:
> +  https://patches.dpdk.org/patch/86458/
> 

Perfect! Thanks.

-Morten

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dpdk-dev] [dpdk-stable] [PATCH v4] mbuf: fix reset on mbuf free
  2021-07-30 14:54                   ` Thomas Monjalon
  2021-07-30 15:14                     ` Olivier Matz
@ 2021-09-28  8:28                     ` Thomas Monjalon
  2021-09-28  9:00                       ` Slava Ovsiienko
  1 sibling, 1 reply; 74+ messages in thread
From: Thomas Monjalon @ 2021-09-28  8:28 UTC (permalink / raw)
  To: Olivier Matz, Ali Alnubani, Slava Ovsiienko
  Cc: Morten Brørup, dev, David Marchand, Alexander Kozyrev,
	Ferruh Yigit, zhaoyan.chen, Andrew Rybchenko, Ananyev,
	Konstantin, Ajit Khaparde, jerinj

Follow-up again:
We have added a note in 21.08, we should fix it in 21.11.
If there are no counter proposal, I suggest applying this patch,
no matter the performance regression.


30/07/2021 16:54, Thomas Monjalon:
> 30/07/2021 16:35, Morten Brørup:
> > > From: Olivier Matz [mailto:olivier.matz@6wind.com]
> > > Sent: Friday, 30 July 2021 14.37
> > > 
> > > Hi Thomas,
> > > 
> > > On Sat, Jul 24, 2021 at 10:47:34AM +0200, Thomas Monjalon wrote:
> > > > What's the follow-up for this patch?
> > > 
> > > Unfortunatly, I still don't have the time to work on this topic yet.
> > > 
> > > In my initial tests, in our lab, I didn't notice any performance
> > > regression, but Ali has seen an impact (0.5M PPS, but I don't know how
> > > much in percent).
> > > 
> > > 
> > > > 19/01/2021 15:04, Slava Ovsiienko:
> > > > > Hi, All
> > > > >
> > > > > Could we postpose this patch at least to rc2? We would like to
> > > conduct more investigations?
> > > > >
> > > > > With best regards, Slava
> > > > >
> > > > > From: Olivier Matz <olivier.matz@6wind.com>
> > > > > > On Mon, Jan 18, 2021 at 05:52:32PM +0000, Ali Alnubani wrote:
> > > > > > > Hi,
> > > > > > > (Sorry had to resend this to some recipients due to mail server
> > > problems).
> > > > > > >
> > > > > > > Just confirming that I can still reproduce the regression with
> > > single core and
> > > > > > 64B frames on other servers.
> > > > > >
> > > > > > Many thanks for the feedback. Can you please detail what is the
> > > amount of
> > > > > > performance loss in percent, and confirm the test case? (I
> > > suppose it is
> > > > > > testpmd io forward).
> > > > > >
> > > > > > Unfortunatly, I won't be able to spend a lot of time on this soon
> > > (sorry for
> > > > > > that). So I see at least these 2 options:
> > > > > >
> > > > > > - postpone the patch again, until I can find more time to analyze
> > > > > >   and optimize
> > > > > > - apply the patch if the performance loss is acceptable compared
> > > to
> > > > > >   the added value of fixing a bug
> > > > > >
> > > > [...]
> > > 
> > > Statu quo...
> > > 
> > > Olivier
> > > 
> > 
> > The decision should be simple:
> > 
> > Does the DPDK project support segmented packets?
> > If yes, then apply the patch to fix the bug!
> > 
> > If anyone seriously cares about the regression it introduces, optimization patches are welcome later. We shouldn't wait for it.
> 
> You're right, but the regression is flagged to a 4-years old patch,
> that's why I don't consider it as urgent.
> 
> > If the patch is not applied, the documentation must be updated to mention that we are releasing DPDK with a known bug: that segmented packets are handled incorrectly in the scenario described in this patch.
> 
> Yes, would be good to document the known issue,
> no matter how old it is.
> 
> > Generally, there could be some performance to gain by not supporting segmented packets at all, as a compile time option. But that is a different discussion.





^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dpdk-dev] [dpdk-stable] [PATCH v4] mbuf: fix reset on mbuf free
  2021-09-28  8:28                     ` [dpdk-dev] [dpdk-stable] [PATCH v4] mbuf: fix reset on mbuf free Thomas Monjalon
@ 2021-09-28  9:00                       ` Slava Ovsiienko
  2021-09-28  9:25                         ` Ananyev, Konstantin
  2021-09-28  9:39                         ` Morten Brørup
  0 siblings, 2 replies; 74+ messages in thread
From: Slava Ovsiienko @ 2021-09-28  9:00 UTC (permalink / raw)
  To: NBU-Contact-Thomas Monjalon, Olivier Matz, Ali Alnubani
  Cc: Morten Brørup, dev, David Marchand, Alexander Kozyrev,
	Ferruh Yigit, zhaoyan.chen, Andrew Rybchenko, Ananyev,
	Konstantin, Ajit Khaparde, jerinj

Hi, 

I've re-read the entire thread.
If I understand correctly, the root problem was (in initial patch):

>   m1 = rte_pktmbuf_alloc(mp);
>   rte_pktmbuf_append(m1, 500);
>   m2 = rte_pktmbuf_alloc(mp);
>   rte_pktmbuf_append(m2, 500);
>   rte_pktmbuf_chain(m1, m2);
>   m0 = rte_pktmbuf_alloc(mp);
>   rte_pktmbuf_append(m0, 500);
>   rte_pktmbuf_chain(m0, m1);
>
> As rte_pktmbuf_chain() does not reset nb_seg in the initial m1 segment 
> (this is not required), after this code the mbuf chain have 3 
> segments:
>   - m0: next=m1, nb_seg=3
>   - m1: next=m2, nb_seg=2
>   - m2: next=NULL, nb_seg=1
>
The proposed fix was to ALWAYS set next and nb_seg fields on mbuf_free(),
regardless next field content. That would perform unconditional write
to mbuf, and might affect the configurations, where are no multi-segment
packets at al. mbuf_free() is "backbone" API, it is used by all cases, all
scenaries are affected.

As far as I know, the current approach for nb_seg field - it contains other
value than 1 only in the first mbuf , for the following segments,  it should
not be considered at all (only the first segment fields are valid), and it is
supposed to contain 1, as it was initially allocated from the pool.

In the example above the problem was introduced by
rte_pktmbuf_chain(). Could we consider fixing the rte_pktmbuf_chain()
(used in potentially fewer common sceneries)  instead of touching
the extremely common rte_mbuf_free() ?

With best regards,
Slava

> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Tuesday, September 28, 2021 11:29
> To: Olivier Matz <olivier.matz@6wind.com>; Ali Alnubani
> <alialnu@nvidia.com>; Slava Ovsiienko <viacheslavo@nvidia.com>
> Cc: Morten Brørup <mb@smartsharesystems.com>; dev@dpdk.org; David
> Marchand <david.marchand@redhat.com>; Alexander Kozyrev
> <akozyrev@nvidia.com>; Ferruh Yigit <ferruh.yigit@intel.com>;
> zhaoyan.chen@intel.com; Andrew Rybchenko
> <andrew.rybchenko@oktetlabs.ru>; Ananyev, Konstantin
> <konstantin.ananyev@intel.com>; Ajit Khaparde
> <ajit.khaparde@broadcom.com>; jerinj@marvell.com
> Subject: Re: [dpdk-dev] [dpdk-stable] [PATCH v4] mbuf: fix reset on mbuf
> free
> 
> Follow-up again:
> We have added a note in 21.08, we should fix it in 21.11.
> If there are no counter proposal, I suggest applying this patch, no matter the
> performance regression.
> 
> 
> 30/07/2021 16:54, Thomas Monjalon:
> > 30/07/2021 16:35, Morten Brørup:
> > > > From: Olivier Matz [mailto:olivier.matz@6wind.com]
> > > > Sent: Friday, 30 July 2021 14.37
> > > >
> > > > Hi Thomas,
> > > >
> > > > On Sat, Jul 24, 2021 at 10:47:34AM +0200, Thomas Monjalon wrote:
> > > > > What's the follow-up for this patch?
> > > >
> > > > Unfortunatly, I still don't have the time to work on this topic yet.
> > > >
> > > > In my initial tests, in our lab, I didn't notice any performance
> > > > regression, but Ali has seen an impact (0.5M PPS, but I don't know
> > > > how much in percent).
> > > >
> > > >
> > > > > 19/01/2021 15:04, Slava Ovsiienko:
> > > > > > Hi, All
> > > > > >
> > > > > > Could we postpose this patch at least to rc2? We would like to
> > > > conduct more investigations?
> > > > > >
> > > > > > With best regards, Slava
> > > > > >
> > > > > > From: Olivier Matz <olivier.matz@6wind.com>
> > > > > > > On Mon, Jan 18, 2021 at 05:52:32PM +0000, Ali Alnubani wrote:
> > > > > > > > Hi,
> > > > > > > > (Sorry had to resend this to some recipients due to mail
> > > > > > > > server
> > > > problems).
> > > > > > > >
> > > > > > > > Just confirming that I can still reproduce the regression
> > > > > > > > with
> > > > single core and
> > > > > > > 64B frames on other servers.
> > > > > > >
> > > > > > > Many thanks for the feedback. Can you please detail what is
> > > > > > > the
> > > > amount of
> > > > > > > performance loss in percent, and confirm the test case? (I
> > > > suppose it is
> > > > > > > testpmd io forward).
> > > > > > >
> > > > > > > Unfortunatly, I won't be able to spend a lot of time on this
> > > > > > > soon
> > > > (sorry for
> > > > > > > that). So I see at least these 2 options:
> > > > > > >
> > > > > > > - postpone the patch again, until I can find more time to analyze
> > > > > > >   and optimize
> > > > > > > - apply the patch if the performance loss is acceptable
> > > > > > > compared
> > > > to
> > > > > > >   the added value of fixing a bug
> > > > > > >
> > > > > [...]
> > > >
> > > > Statu quo...
> > > >
> > > > Olivier
> > > >
> > >
> > > The decision should be simple:
> > >
> > > Does the DPDK project support segmented packets?
> > > If yes, then apply the patch to fix the bug!
> > >
> > > If anyone seriously cares about the regression it introduces, optimization
> patches are welcome later. We shouldn't wait for it.
> >
> > You're right, but the regression is flagged to a 4-years old patch,
> > that's why I don't consider it as urgent.
> >
> > > If the patch is not applied, the documentation must be updated to
> mention that we are releasing DPDK with a known bug: that segmented
> packets are handled incorrectly in the scenario described in this patch.
> >
> > Yes, would be good to document the known issue, no matter how old it
> > is.
> >
> > > Generally, there could be some performance to gain by not supporting
> segmented packets at all, as a compile time option. But that is a different
> discussion.
> 
> 
> 


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dpdk-dev] [dpdk-stable] [PATCH v4] mbuf: fix reset on mbuf free
  2021-09-28  9:00                       ` Slava Ovsiienko
@ 2021-09-28  9:25                         ` Ananyev, Konstantin
  2021-09-28  9:39                         ` Morten Brørup
  1 sibling, 0 replies; 74+ messages in thread
From: Ananyev, Konstantin @ 2021-09-28  9:25 UTC (permalink / raw)
  To: Slava Ovsiienko, NBU-Contact-Thomas Monjalon, Olivier Matz, Ali Alnubani
  Cc: Morten Brørup, dev, David Marchand, Alexander Kozyrev,
	Yigit, Ferruh, Chen, Zhaoyan, Andrew Rybchenko, Ajit Khaparde,
	jerinj


> 
> Hi,
> 
> I've re-read the entire thread.
> If I understand correctly, the root problem was (in initial patch):
> 
> >   m1 = rte_pktmbuf_alloc(mp);
> >   rte_pktmbuf_append(m1, 500);
> >   m2 = rte_pktmbuf_alloc(mp);
> >   rte_pktmbuf_append(m2, 500);
> >   rte_pktmbuf_chain(m1, m2);
> >   m0 = rte_pktmbuf_alloc(mp);
> >   rte_pktmbuf_append(m0, 500);
> >   rte_pktmbuf_chain(m0, m1);
> >
> > As rte_pktmbuf_chain() does not reset nb_seg in the initial m1 segment
> > (this is not required), after this code the mbuf chain have 3
> > segments:
> >   - m0: next=m1, nb_seg=3
> >   - m1: next=m2, nb_seg=2
> >   - m2: next=NULL, nb_seg=1
> >
> The proposed fix was to ALWAYS set next and nb_seg fields on mbuf_free(),
> regardless next field content. That would perform unconditional write
> to mbuf, 

I don't think it is a correct understanding see below.

Current code:
if (m->next != NULL) {
       m->next = NULL;
      m->nb_segs = 1;
}

Proposed code:
if (m->next != NULL)
     m->next = NULL;
if (m->nb_segs != 1)
    m->nb_segs = 1;

So what this patch adds: one more load and compare.
Note that load is from the first mbuf cache line, which
already has to be in the L1 cache by that time.

As I remember the reported slowdown is really tiny.
My vote would be to go ahead with this patch.

> and might affect the configurations, where are no multi-segment
> packets at al. mbuf_free() is "backbone" API, it is used by all cases, all
> scenaries are affected.
> 
> As far as I know, the current approach for nb_seg field - it contains other
> value than 1 only in the first mbuf , for the following segments,  it should
> not be considered at all (only the first segment fields are valid), and it is
> supposed to contain 1, as it was initially allocated from the pool.
> 
> In the example above the problem was introduced by
> rte_pktmbuf_chain(). Could we consider fixing the rte_pktmbuf_chain()
> (used in potentially fewer common sceneries)  instead of touching
> the extremely common rte_mbuf_free() ?
> 
> With best regards,
> Slava
> 
> > -----Original Message-----
> > From: Thomas Monjalon <thomas@monjalon.net>
> > Sent: Tuesday, September 28, 2021 11:29
> > To: Olivier Matz <olivier.matz@6wind.com>; Ali Alnubani
> > <alialnu@nvidia.com>; Slava Ovsiienko <viacheslavo@nvidia.com>
> > Cc: Morten Brørup <mb@smartsharesystems.com>; dev@dpdk.org; David
> > Marchand <david.marchand@redhat.com>; Alexander Kozyrev
> > <akozyrev@nvidia.com>; Ferruh Yigit <ferruh.yigit@intel.com>;
> > zhaoyan.chen@intel.com; Andrew Rybchenko
> > <andrew.rybchenko@oktetlabs.ru>; Ananyev, Konstantin
> > <konstantin.ananyev@intel.com>; Ajit Khaparde
> > <ajit.khaparde@broadcom.com>; jerinj@marvell.com
> > Subject: Re: [dpdk-dev] [dpdk-stable] [PATCH v4] mbuf: fix reset on mbuf
> > free
> >
> > Follow-up again:
> > We have added a note in 21.08, we should fix it in 21.11.
> > If there are no counter proposal, I suggest applying this patch, no matter the
> > performance regression.
> >
> >
> > 30/07/2021 16:54, Thomas Monjalon:
> > > 30/07/2021 16:35, Morten Brørup:
> > > > > From: Olivier Matz [mailto:olivier.matz@6wind.com]
> > > > > Sent: Friday, 30 July 2021 14.37
> > > > >
> > > > > Hi Thomas,
> > > > >
> > > > > On Sat, Jul 24, 2021 at 10:47:34AM +0200, Thomas Monjalon wrote:
> > > > > > What's the follow-up for this patch?
> > > > >
> > > > > Unfortunatly, I still don't have the time to work on this topic yet.
> > > > >
> > > > > In my initial tests, in our lab, I didn't notice any performance
> > > > > regression, but Ali has seen an impact (0.5M PPS, but I don't know
> > > > > how much in percent).
> > > > >
> > > > >
> > > > > > 19/01/2021 15:04, Slava Ovsiienko:
> > > > > > > Hi, All
> > > > > > >
> > > > > > > Could we postpose this patch at least to rc2? We would like to
> > > > > conduct more investigations?
> > > > > > >
> > > > > > > With best regards, Slava
> > > > > > >
> > > > > > > From: Olivier Matz <olivier.matz@6wind.com>
> > > > > > > > On Mon, Jan 18, 2021 at 05:52:32PM +0000, Ali Alnubani wrote:
> > > > > > > > > Hi,
> > > > > > > > > (Sorry had to resend this to some recipients due to mail
> > > > > > > > > server
> > > > > problems).
> > > > > > > > >
> > > > > > > > > Just confirming that I can still reproduce the regression
> > > > > > > > > with
> > > > > single core and
> > > > > > > > 64B frames on other servers.
> > > > > > > >
> > > > > > > > Many thanks for the feedback. Can you please detail what is
> > > > > > > > the
> > > > > amount of
> > > > > > > > performance loss in percent, and confirm the test case? (I
> > > > > suppose it is
> > > > > > > > testpmd io forward).
> > > > > > > >
> > > > > > > > Unfortunatly, I won't be able to spend a lot of time on this
> > > > > > > > soon
> > > > > (sorry for
> > > > > > > > that). So I see at least these 2 options:
> > > > > > > >
> > > > > > > > - postpone the patch again, until I can find more time to analyze
> > > > > > > >   and optimize
> > > > > > > > - apply the patch if the performance loss is acceptable
> > > > > > > > compared
> > > > > to
> > > > > > > >   the added value of fixing a bug
> > > > > > > >
> > > > > > [...]
> > > > >
> > > > > Statu quo...
> > > > >
> > > > > Olivier
> > > > >
> > > >
> > > > The decision should be simple:
> > > >
> > > > Does the DPDK project support segmented packets?
> > > > If yes, then apply the patch to fix the bug!
> > > >
> > > > If anyone seriously cares about the regression it introduces, optimization
> > patches are welcome later. We shouldn't wait for it.
> > >
> > > You're right, but the regression is flagged to a 4-years old patch,
> > > that's why I don't consider it as urgent.
> > >
> > > > If the patch is not applied, the documentation must be updated to
> > mention that we are releasing DPDK with a known bug: that segmented
> > packets are handled incorrectly in the scenario described in this patch.
> > >
> > > Yes, would be good to document the known issue, no matter how old it
> > > is.
> > >
> > > > Generally, there could be some performance to gain by not supporting
> > segmented packets at all, as a compile time option. But that is a different
> > discussion.
> >
> >
> >


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dpdk-dev] [dpdk-stable] [PATCH v4] mbuf: fix reset on mbuf free
  2021-09-28  9:00                       ` Slava Ovsiienko
  2021-09-28  9:25                         ` Ananyev, Konstantin
@ 2021-09-28  9:39                         ` Morten Brørup
  2021-09-29  8:03                           ` Ali Alnubani
  1 sibling, 1 reply; 74+ messages in thread
From: Morten Brørup @ 2021-09-28  9:39 UTC (permalink / raw)
  To: Slava Ovsiienko, NBU-Contact-Thomas Monjalon, Olivier Matz, Ali Alnubani
  Cc: dev, David Marchand, Alexander Kozyrev, Ferruh Yigit,
	zhaoyan.chen, Andrew Rybchenko, Ananyev, Konstantin,
	Ajit Khaparde, jerinj

> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Slava Ovsiienko
> Sent: Tuesday, 28 September 2021 11.01
> 
> Hi,
> 
> I've re-read the entire thread.
> If I understand correctly, the root problem was (in initial patch):
> 
> >   m1 = rte_pktmbuf_alloc(mp);
> >   rte_pktmbuf_append(m1, 500);
> >   m2 = rte_pktmbuf_alloc(mp);
> >   rte_pktmbuf_append(m2, 500);
> >   rte_pktmbuf_chain(m1, m2);
> >   m0 = rte_pktmbuf_alloc(mp);
> >   rte_pktmbuf_append(m0, 500);
> >   rte_pktmbuf_chain(m0, m1);
> >
> > As rte_pktmbuf_chain() does not reset nb_seg in the initial m1
> segment
> > (this is not required), after this code the mbuf chain have 3
> > segments:
> >   - m0: next=m1, nb_seg=3
> >   - m1: next=m2, nb_seg=2
> >   - m2: next=NULL, nb_seg=1
> >
> The proposed fix was to ALWAYS set next and nb_seg fields on
> mbuf_free(),
> regardless next field content. That would perform unconditional write
> to mbuf, and might affect the configurations, where are no multi-
> segment
> packets at al. mbuf_free() is "backbone" API, it is used by all cases,
> all
> scenaries are affected.
> 
> As far as I know, the current approach for nb_seg field - it contains
> other
> value than 1 only in the first mbuf , for the following segments,  it
> should
> not be considered at all (only the first segment fields are valid), and
> it is
> supposed to contain 1, as it was initially allocated from the pool.
> 
> In the example above the problem was introduced by
> rte_pktmbuf_chain(). Could we consider fixing the rte_pktmbuf_chain()
> (used in potentially fewer common sceneries)  instead of touching
> the extremely common rte_mbuf_free() ?
> 
> With best regards,
> Slava

Great idea, Slava!

Changing the invariant for 'nb_segs', so it must be 1, except in the first segment of a segmented packet.

Thinking further about it, perhaps we can achieve even higher performance by a minor additional modification: Use 0 instead of 1? Or offset 'nb_segs' by -1, so it reflects the number of additional segments?

And perhaps combining the invariants for 'nb_segs' and 'next' could provide even more performance improvements. I don't know, just sharing a thought.

Anyway, I vote for fixing the bug. One way or the other!

-Morten

> 
> > -----Original Message-----
> > From: Thomas Monjalon <thomas@monjalon.net>
> > Sent: Tuesday, September 28, 2021 11:29
> >
> > Follow-up again:
> > We have added a note in 21.08, we should fix it in 21.11.
> > If there are no counter proposal, I suggest applying this patch, no
> matter the
> > performance regression.
> >
> >
> > 30/07/2021 16:54, Thomas Monjalon:
> > > 30/07/2021 16:35, Morten Brørup:
> > > > > From: Olivier Matz [mailto:olivier.matz@6wind.com]
> > > > > Sent: Friday, 30 July 2021 14.37
> > > > >
> > > > > Hi Thomas,
> > > > >
> > > > > On Sat, Jul 24, 2021 at 10:47:34AM +0200, Thomas Monjalon
> wrote:
> > > > > > What's the follow-up for this patch?
> > > > >
> > > > > Unfortunatly, I still don't have the time to work on this topic
> yet.
> > > > >
> > > > > In my initial tests, in our lab, I didn't notice any
> performance
> > > > > regression, but Ali has seen an impact (0.5M PPS, but I don't
> know
> > > > > how much in percent).
> > > > >
> > > > >
> > > > > > 19/01/2021 15:04, Slava Ovsiienko:
> > > > > > > Hi, All
> > > > > > >
> > > > > > > Could we postpose this patch at least to rc2? We would like
> to
> > > > > conduct more investigations?
> > > > > > >
> > > > > > > With best regards, Slava
> > > > > > >
> > > > > > > From: Olivier Matz <olivier.matz@6wind.com>
> > > > > > > > On Mon, Jan 18, 2021 at 05:52:32PM +0000, Ali Alnubani
> wrote:
> > > > > > > > > Hi,
> > > > > > > > > (Sorry had to resend this to some recipients due to
> mail
> > > > > > > > > server
> > > > > problems).
> > > > > > > > >
> > > > > > > > > Just confirming that I can still reproduce the
> regression
> > > > > > > > > with
> > > > > single core and
> > > > > > > > 64B frames on other servers.
> > > > > > > >
> > > > > > > > Many thanks for the feedback. Can you please detail what
> is
> > > > > > > > the
> > > > > amount of
> > > > > > > > performance loss in percent, and confirm the test case?
> (I
> > > > > suppose it is
> > > > > > > > testpmd io forward).
> > > > > > > >
> > > > > > > > Unfortunatly, I won't be able to spend a lot of time on
> this
> > > > > > > > soon
> > > > > (sorry for
> > > > > > > > that). So I see at least these 2 options:
> > > > > > > >
> > > > > > > > - postpone the patch again, until I can find more time to
> analyze
> > > > > > > >   and optimize
> > > > > > > > - apply the patch if the performance loss is acceptable
> > > > > > > > compared
> > > > > to
> > > > > > > >   the added value of fixing a bug
> > > > > > > >
> > > > > > [...]
> > > > >
> > > > > Statu quo...
> > > > >
> > > > > Olivier
> > > > >
> > > >
> > > > The decision should be simple:
> > > >
> > > > Does the DPDK project support segmented packets?
> > > > If yes, then apply the patch to fix the bug!
> > > >
> > > > If anyone seriously cares about the regression it introduces,
> optimization
> > patches are welcome later. We shouldn't wait for it.
> > >
> > > You're right, but the regression is flagged to a 4-years old patch,
> > > that's why I don't consider it as urgent.
> > >
> > > > If the patch is not applied, the documentation must be updated to
> > mention that we are releasing DPDK with a known bug: that segmented
> > packets are handled incorrectly in the scenario described in this
> patch.
> > >
> > > Yes, would be good to document the known issue, no matter how old
> it
> > > is.
> > >
> > > > Generally, there could be some performance to gain by not
> supporting
> > segmented packets at all, as a compile time option. But that is a
> different
> > discussion.
> >
> >
> >
> 


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dpdk-dev] [dpdk-stable] [PATCH v4] mbuf: fix reset on mbuf free
  2021-09-28  9:39                         ` Morten Brørup
@ 2021-09-29  8:03                           ` Ali Alnubani
  2021-09-29 21:39                             ` Olivier Matz
  0 siblings, 1 reply; 74+ messages in thread
From: Ali Alnubani @ 2021-09-29  8:03 UTC (permalink / raw)
  To: Olivier Matz
  Cc: dev, David Marchand, Alexander Kozyrev,
	NBU-Contact-Thomas Monjalon, Ferruh Yigit, Slava Ovsiienko,
	zhaoyan.chen, Morten Brørup, Andrew Rybchenko, Ananyev,
	Konstantin, Ajit Khaparde, jerinj

Hi Olivier,

I wanted to retest the patch on latest main, but it no longer applies, could you please rebase it?

Thanks,
Ali

> -----Original Message-----
> From: Morten Brørup <mb@smartsharesystems.com>
> Sent: Tuesday, September 28, 2021 12:40 PM
> To: Slava Ovsiienko <viacheslavo@nvidia.com>; NBU-Contact-Thomas
> Monjalon <thomas@monjalon.net>; Olivier Matz <olivier.matz@6wind.com>;
> Ali Alnubani <alialnu@nvidia.com>
> Cc: dev@dpdk.org; David Marchand <david.marchand@redhat.com>; Alexander
> Kozyrev <akozyrev@nvidia.com>; Ferruh Yigit <ferruh.yigit@intel.com>;
> zhaoyan.chen@intel.com; Andrew Rybchenko
> <andrew.rybchenko@oktetlabs.ru>; Ananyev, Konstantin
> <konstantin.ananyev@intel.com>; Ajit Khaparde
> <ajit.khaparde@broadcom.com>; jerinj@marvell.com
> Subject: RE: [dpdk-dev] [dpdk-stable] [PATCH v4] mbuf: fix reset on mbuf free
> 
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Slava Ovsiienko
> > Sent: Tuesday, 28 September 2021 11.01
> >
> > Hi,
> >
> > I've re-read the entire thread.
> > If I understand correctly, the root problem was (in initial patch):
> >
> > >   m1 = rte_pktmbuf_alloc(mp);
> > >   rte_pktmbuf_append(m1, 500);
> > >   m2 = rte_pktmbuf_alloc(mp);
> > >   rte_pktmbuf_append(m2, 500);
> > >   rte_pktmbuf_chain(m1, m2);
> > >   m0 = rte_pktmbuf_alloc(mp);
> > >   rte_pktmbuf_append(m0, 500);
> > >   rte_pktmbuf_chain(m0, m1);
> > >
> > > As rte_pktmbuf_chain() does not reset nb_seg in the initial m1
> > segment
> > > (this is not required), after this code the mbuf chain have 3
> > > segments:
> > >   - m0: next=m1, nb_seg=3
> > >   - m1: next=m2, nb_seg=2
> > >   - m2: next=NULL, nb_seg=1
> > >
> > The proposed fix was to ALWAYS set next and nb_seg fields on
> > mbuf_free(), regardless next field content. That would perform
> > unconditional write to mbuf, and might affect the configurations,
> > where are no multi- segment packets at al. mbuf_free() is "backbone"
> > API, it is used by all cases, all scenaries are affected.
> >
> > As far as I know, the current approach for nb_seg field - it contains
> > other value than 1 only in the first mbuf , for the following
> > segments,  it should not be considered at all (only the first segment
> > fields are valid), and it is supposed to contain 1, as it was
> > initially allocated from the pool.
> >
> > In the example above the problem was introduced by
> > rte_pktmbuf_chain(). Could we consider fixing the rte_pktmbuf_chain()
> > (used in potentially fewer common sceneries)  instead of touching the
> > extremely common rte_mbuf_free() ?
> >
> > With best regards,
> > Slava
> 
> Great idea, Slava!
> 
> Changing the invariant for 'nb_segs', so it must be 1, except in the first segment
> of a segmented packet.
> 
> Thinking further about it, perhaps we can achieve even higher performance by a
> minor additional modification: Use 0 instead of 1? Or offset 'nb_segs' by -1, so it
> reflects the number of additional segments?
> 
> And perhaps combining the invariants for 'nb_segs' and 'next' could provide even
> more performance improvements. I don't know, just sharing a thought.
> 
> Anyway, I vote for fixing the bug. One way or the other!
> 
> -Morten
> 
> >
> > > -----Original Message-----
> > > From: Thomas Monjalon <thomas@monjalon.net>
> > > Sent: Tuesday, September 28, 2021 11:29
> > >
> > > Follow-up again:
> > > We have added a note in 21.08, we should fix it in 21.11.
> > > If there are no counter proposal, I suggest applying this patch, no
> > matter the
> > > performance regression.
> > >
> > >
> > > 30/07/2021 16:54, Thomas Monjalon:
> > > > 30/07/2021 16:35, Morten Brørup:
> > > > > > From: Olivier Matz [mailto:olivier.matz@6wind.com]
> > > > > > Sent: Friday, 30 July 2021 14.37
> > > > > >
> > > > > > Hi Thomas,
> > > > > >
> > > > > > On Sat, Jul 24, 2021 at 10:47:34AM +0200, Thomas Monjalon
> > wrote:
> > > > > > > What's the follow-up for this patch?
> > > > > >
> > > > > > Unfortunatly, I still don't have the time to work on this
> > > > > > topic
> > yet.
> > > > > >
> > > > > > In my initial tests, in our lab, I didn't notice any
> > performance
> > > > > > regression, but Ali has seen an impact (0.5M PPS, but I don't
> > know
> > > > > > how much in percent).
> > > > > >
> > > > > >
> > > > > > > 19/01/2021 15:04, Slava Ovsiienko:
> > > > > > > > Hi, All
> > > > > > > >
> > > > > > > > Could we postpose this patch at least to rc2? We would
> > > > > > > > like
> > to
> > > > > > conduct more investigations?
> > > > > > > >
> > > > > > > > With best regards, Slava
> > > > > > > >
> > > > > > > > From: Olivier Matz <olivier.matz@6wind.com>
> > > > > > > > > On Mon, Jan 18, 2021 at 05:52:32PM +0000, Ali Alnubani
> > wrote:
> > > > > > > > > > Hi,
> > > > > > > > > > (Sorry had to resend this to some recipients due to
> > mail
> > > > > > > > > > server
> > > > > > problems).
> > > > > > > > > >
> > > > > > > > > > Just confirming that I can still reproduce the
> > regression
> > > > > > > > > > with
> > > > > > single core and
> > > > > > > > > 64B frames on other servers.
> > > > > > > > >
> > > > > > > > > Many thanks for the feedback. Can you please detail what
> > is
> > > > > > > > > the
> > > > > > amount of
> > > > > > > > > performance loss in percent, and confirm the test case?
> > (I
> > > > > > suppose it is
> > > > > > > > > testpmd io forward).
> > > > > > > > >
> > > > > > > > > Unfortunatly, I won't be able to spend a lot of time on
> > this
> > > > > > > > > soon
> > > > > > (sorry for
> > > > > > > > > that). So I see at least these 2 options:
> > > > > > > > >
> > > > > > > > > - postpone the patch again, until I can find more time
> > > > > > > > > to
> > analyze
> > > > > > > > >   and optimize
> > > > > > > > > - apply the patch if the performance loss is acceptable
> > > > > > > > > compared
> > > > > > to
> > > > > > > > >   the added value of fixing a bug
> > > > > > > > >
> > > > > > > [...]
> > > > > >
> > > > > > Statu quo...
> > > > > >
> > > > > > Olivier
> > > > > >
> > > > >
> > > > > The decision should be simple:
> > > > >
> > > > > Does the DPDK project support segmented packets?
> > > > > If yes, then apply the patch to fix the bug!
> > > > >
> > > > > If anyone seriously cares about the regression it introduces,
> > optimization
> > > patches are welcome later. We shouldn't wait for it.
> > > >
> > > > You're right, but the regression is flagged to a 4-years old
> > > > patch, that's why I don't consider it as urgent.
> > > >
> > > > > If the patch is not applied, the documentation must be updated
> > > > > to
> > > mention that we are releasing DPDK with a known bug: that segmented
> > > packets are handled incorrectly in the scenario described in this
> > patch.
> > > >
> > > > Yes, would be good to document the known issue, no matter how old
> > it
> > > > is.
> > > >
> > > > > Generally, there could be some performance to gain by not
> > supporting
> > > segmented packets at all, as a compile time option. But that is a
> > different
> > > discussion.
> > >
> > >
> > >
> >


^ permalink raw reply	[flat|nested] 74+ messages in thread

* [dpdk-dev] [PATCH v5] mbuf: fix reset on mbuf free
  2021-01-13 13:27 ` [dpdk-dev] [PATCH v4] " Olivier Matz
  2021-01-15 13:59   ` [dpdk-dev] [dpdk-stable] " David Marchand
  2021-06-15 13:56   ` [dpdk-dev] " Morten Brørup
@ 2021-09-29 21:37   ` Olivier Matz
  2021-09-30 13:27     ` Ali Alnubani
  2021-10-21  9:18     ` David Marchand
  2 siblings, 2 replies; 74+ messages in thread
From: Olivier Matz @ 2021-09-29 21:37 UTC (permalink / raw)
  To: dev
  Cc: ajit.khaparde, ajitkhaparde, alialnu, andrew.rybchenko,
	konstantin.ananyev, mb, stable, viacheslavo

m->nb_seg must be reset on mbuf free whatever the value of m->next,
because it can happen that m->nb_seg is != 1. For instance in this
case:

  m1 = rte_pktmbuf_alloc(mp);
  rte_pktmbuf_append(m1, 500);
  m2 = rte_pktmbuf_alloc(mp);
  rte_pktmbuf_append(m2, 500);
  rte_pktmbuf_chain(m1, m2);
  m0 = rte_pktmbuf_alloc(mp);
  rte_pktmbuf_append(m0, 500);
  rte_pktmbuf_chain(m0, m1);

As rte_pktmbuf_chain() does not reset nb_seg in the initial m1
segment (this is not required), after this code the mbuf chain
have 3 segments:
  - m0: next=m1, nb_seg=3
  - m1: next=m2, nb_seg=2
  - m2: next=NULL, nb_seg=1

Then split this chain between m1 and m2, it would result in 2 packets:
  - first packet
    - m0: next=m1, nb_seg=2
    - m1: next=NULL, nb_seg=2
  - second packet
    - m2: next=NULL, nb_seg=1

Freeing the first packet will not restore nb_seg=1 in the second
segment. This is an issue because it is expected that mbufs stored
in pool have their nb_seg field set to 1.

Fixes: 8f094a9ac5d7 ("mbuf: set mbuf fields while in pool")
Cc: stable@dpdk.org

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 app/test/test_mbuf.c     | 69 ++++++++++++++++++++++++++++++++++++++++
 lib/mbuf/rte_mbuf.c      |  4 +--
 lib/mbuf/rte_mbuf.h      |  8 ++---
 lib/mbuf/rte_mbuf_core.h | 13 ++++++--
 4 files changed, 86 insertions(+), 8 deletions(-)

diff --git a/app/test/test_mbuf.c b/app/test/test_mbuf.c
index 9a248dfaea..a4e32e933a 100644
--- a/app/test/test_mbuf.c
+++ b/app/test/test_mbuf.c
@@ -2684,6 +2684,70 @@ test_mbuf_dyn(struct rte_mempool *pktmbuf_pool)
 	return -1;
 }
 
+/* check that m->nb_segs and m->next are reset on mbuf free */
+static int
+test_nb_segs_and_next_reset(void)
+{
+	struct rte_mbuf *m0 = NULL, *m1 = NULL, *m2 = NULL;
+	struct rte_mempool *pool = NULL;
+
+	pool = rte_pktmbuf_pool_create("test_mbuf_reset",
+			3, 0, 0, MBUF_DATA_SIZE, SOCKET_ID_ANY);
+	if (pool == NULL)
+		GOTO_FAIL("Failed to create mbuf pool");
+
+	/* alloc mbufs */
+	m0 = rte_pktmbuf_alloc(pool);
+	m1 = rte_pktmbuf_alloc(pool);
+	m2 = rte_pktmbuf_alloc(pool);
+	if (m0 == NULL || m1 == NULL || m2 == NULL)
+		GOTO_FAIL("Failed to allocate mbuf");
+
+	/* append data in all of them */
+	if (rte_pktmbuf_append(m0, 500) == NULL ||
+			rte_pktmbuf_append(m1, 500) == NULL ||
+			rte_pktmbuf_append(m2, 500) == NULL)
+		GOTO_FAIL("Failed to append data in mbuf");
+
+	/* chain them in one mbuf m0 */
+	rte_pktmbuf_chain(m1, m2);
+	rte_pktmbuf_chain(m0, m1);
+	if (m0->nb_segs != 3 || m0->next != m1 || m1->next != m2 ||
+			m2->next != NULL) {
+		m1 = m2 = NULL;
+		GOTO_FAIL("Failed to chain mbufs");
+	}
+
+	/* split m0 chain in two, between m1 and m2 */
+	m0->nb_segs = 2;
+	m1->next = NULL;
+	m2->nb_segs = 1;
+
+	/* free the 2 mbuf chains m0 and m2  */
+	rte_pktmbuf_free(m0);
+	rte_pktmbuf_free(m2);
+
+	/* realloc the 3 mbufs */
+	m0 = rte_mbuf_raw_alloc(pool);
+	m1 = rte_mbuf_raw_alloc(pool);
+	m2 = rte_mbuf_raw_alloc(pool);
+	if (m0 == NULL || m1 == NULL || m2 == NULL)
+		GOTO_FAIL("Failed to reallocate mbuf");
+
+	/* ensure that m->next and m->nb_segs are reset allocated mbufs */
+	if (m0->nb_segs != 1 || m0->next != NULL ||
+			m1->nb_segs != 1 || m1->next != NULL ||
+			m2->nb_segs != 1 || m2->next != NULL)
+		GOTO_FAIL("nb_segs or next was not reset properly");
+
+	return 0;
+
+fail:
+	if (pool != NULL)
+		rte_mempool_free(pool);
+	return -1;
+}
+
 static int
 test_mbuf(void)
 {
@@ -2874,6 +2938,11 @@ test_mbuf(void)
 		goto err;
 	}
 
+	/* test reset of m->nb_segs and m->next on mbuf free */
+	if (test_nb_segs_and_next_reset() < 0) {
+		printf("test_nb_segs_and_next_reset() failed\n");
+		goto err;
+	}
 
 	ret = 0;
 err:
diff --git a/lib/mbuf/rte_mbuf.c b/lib/mbuf/rte_mbuf.c
index f7e3c1a187..f145cd800a 100644
--- a/lib/mbuf/rte_mbuf.c
+++ b/lib/mbuf/rte_mbuf.c
@@ -134,10 +134,10 @@ rte_pktmbuf_free_pinned_extmem(void *addr, void *opaque)
 
 	rte_mbuf_ext_refcnt_set(m->shinfo, 1);
 	m->ol_flags = EXT_ATTACHED_MBUF;
-	if (m->next != NULL) {
+	if (m->next != NULL)
 		m->next = NULL;
+	if (m->nb_segs != 1)
 		m->nb_segs = 1;
-	}
 	rte_mbuf_raw_free(m);
 }
 
diff --git a/lib/mbuf/rte_mbuf.h b/lib/mbuf/rte_mbuf.h
index a555f216ae..dded3dd70c 100644
--- a/lib/mbuf/rte_mbuf.h
+++ b/lib/mbuf/rte_mbuf.h
@@ -1346,10 +1346,10 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
 				return NULL;
 		}
 
-		if (m->next != NULL) {
+		if (m->next != NULL)
 			m->next = NULL;
+		if (m->nb_segs != 1)
 			m->nb_segs = 1;
-		}
 
 		return m;
 
@@ -1363,10 +1363,10 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
 				return NULL;
 		}
 
-		if (m->next != NULL) {
+		if (m->next != NULL)
 			m->next = NULL;
+		if (m->nb_segs != 1)
 			m->nb_segs = 1;
-		}
 		rte_mbuf_refcnt_set(m, 1);
 
 		return m;
diff --git a/lib/mbuf/rte_mbuf_core.h b/lib/mbuf/rte_mbuf_core.h
index 9d8e3ddc86..9ef807daaa 100644
--- a/lib/mbuf/rte_mbuf_core.h
+++ b/lib/mbuf/rte_mbuf_core.h
@@ -508,7 +508,12 @@ struct rte_mbuf {
 	 * or non-atomic) is controlled by the RTE_MBUF_REFCNT_ATOMIC flag.
 	 */
 	uint16_t refcnt;
-	uint16_t nb_segs;         /**< Number of segments. */
+
+	/**
+	 * Number of segments. Only valid for the first segment of an mbuf
+	 * chain.
+	 */
+	uint16_t nb_segs;
 
 	/** Input port (16 bits to support more than 256 virtual ports).
 	 * The event eth Tx adapter uses this field to specify the output port.
@@ -604,7 +609,11 @@ struct rte_mbuf {
 	/* second cache line - fields only used in slow path or on TX */
 	RTE_MARKER cacheline1 __rte_cache_min_aligned;
 
-	struct rte_mbuf *next;    /**< Next segment of scattered packet. */
+	/**
+	 * Next segment of scattered packet. Must be NULL in the last segment or
+	 * in case of non-segmented packet.
+	 */
+	struct rte_mbuf *next;
 
 	/* fields to support TX offloads */
 	RTE_STD_C11
-- 
2.30.2


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dpdk-dev] [dpdk-stable] [PATCH v4] mbuf: fix reset on mbuf free
  2021-09-29  8:03                           ` Ali Alnubani
@ 2021-09-29 21:39                             ` Olivier Matz
  2021-09-30 13:29                               ` Ali Alnubani
  0 siblings, 1 reply; 74+ messages in thread
From: Olivier Matz @ 2021-09-29 21:39 UTC (permalink / raw)
  To: Ali Alnubani
  Cc: dev, David Marchand, Alexander Kozyrev,
	NBU-Contact-Thomas Monjalon, Ferruh Yigit, Slava Ovsiienko,
	zhaoyan.chen, Morten Brørup, Andrew Rybchenko, Ananyev,
	Konstantin, Ajit Khaparde, jerinj

Hi Ali,


On Wed, Sep 29, 2021 at 08:03:17AM +0000, Ali Alnubani wrote:
> Hi Olivier,
> 
> I wanted to retest the patch on latest main, but it no longer applies, could you please rebase it?

I rebased the patch:
https://patchwork.dpdk.org/project/dpdk/patch/20210929213707.17727-1-olivier.matz@6wind.com/

Thanks,
Olivier

> 
> Thanks,
> Ali
> 
> > -----Original Message-----
> > From: Morten Brørup <mb@smartsharesystems.com>
> > Sent: Tuesday, September 28, 2021 12:40 PM
> > To: Slava Ovsiienko <viacheslavo@nvidia.com>; NBU-Contact-Thomas
> > Monjalon <thomas@monjalon.net>; Olivier Matz <olivier.matz@6wind.com>;
> > Ali Alnubani <alialnu@nvidia.com>
> > Cc: dev@dpdk.org; David Marchand <david.marchand@redhat.com>; Alexander
> > Kozyrev <akozyrev@nvidia.com>; Ferruh Yigit <ferruh.yigit@intel.com>;
> > zhaoyan.chen@intel.com; Andrew Rybchenko
> > <andrew.rybchenko@oktetlabs.ru>; Ananyev, Konstantin
> > <konstantin.ananyev@intel.com>; Ajit Khaparde
> > <ajit.khaparde@broadcom.com>; jerinj@marvell.com
> > Subject: RE: [dpdk-dev] [dpdk-stable] [PATCH v4] mbuf: fix reset on mbuf free
> > 
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Slava Ovsiienko
> > > Sent: Tuesday, 28 September 2021 11.01
> > >
> > > Hi,
> > >
> > > I've re-read the entire thread.
> > > If I understand correctly, the root problem was (in initial patch):
> > >
> > > >   m1 = rte_pktmbuf_alloc(mp);
> > > >   rte_pktmbuf_append(m1, 500);
> > > >   m2 = rte_pktmbuf_alloc(mp);
> > > >   rte_pktmbuf_append(m2, 500);
> > > >   rte_pktmbuf_chain(m1, m2);
> > > >   m0 = rte_pktmbuf_alloc(mp);
> > > >   rte_pktmbuf_append(m0, 500);
> > > >   rte_pktmbuf_chain(m0, m1);
> > > >
> > > > As rte_pktmbuf_chain() does not reset nb_seg in the initial m1
> > > segment
> > > > (this is not required), after this code the mbuf chain have 3
> > > > segments:
> > > >   - m0: next=m1, nb_seg=3
> > > >   - m1: next=m2, nb_seg=2
> > > >   - m2: next=NULL, nb_seg=1
> > > >
> > > The proposed fix was to ALWAYS set next and nb_seg fields on
> > > mbuf_free(), regardless next field content. That would perform
> > > unconditional write to mbuf, and might affect the configurations,
> > > where are no multi- segment packets at al. mbuf_free() is "backbone"
> > > API, it is used by all cases, all scenaries are affected.
> > >
> > > As far as I know, the current approach for nb_seg field - it contains
> > > other value than 1 only in the first mbuf , for the following
> > > segments,  it should not be considered at all (only the first segment
> > > fields are valid), and it is supposed to contain 1, as it was
> > > initially allocated from the pool.
> > >
> > > In the example above the problem was introduced by
> > > rte_pktmbuf_chain(). Could we consider fixing the rte_pktmbuf_chain()
> > > (used in potentially fewer common sceneries)  instead of touching the
> > > extremely common rte_mbuf_free() ?
> > >
> > > With best regards,
> > > Slava
> > 
> > Great idea, Slava!
> > 
> > Changing the invariant for 'nb_segs', so it must be 1, except in the first segment
> > of a segmented packet.
> > 
> > Thinking further about it, perhaps we can achieve even higher performance by a
> > minor additional modification: Use 0 instead of 1? Or offset 'nb_segs' by -1, so it
> > reflects the number of additional segments?
> > 
> > And perhaps combining the invariants for 'nb_segs' and 'next' could provide even
> > more performance improvements. I don't know, just sharing a thought.
> > 
> > Anyway, I vote for fixing the bug. One way or the other!
> > 
> > -Morten
> > 
> > >
> > > > -----Original Message-----
> > > > From: Thomas Monjalon <thomas@monjalon.net>
> > > > Sent: Tuesday, September 28, 2021 11:29
> > > >
> > > > Follow-up again:
> > > > We have added a note in 21.08, we should fix it in 21.11.
> > > > If there are no counter proposal, I suggest applying this patch, no
> > > matter the
> > > > performance regression.
> > > >
> > > >
> > > > 30/07/2021 16:54, Thomas Monjalon:
> > > > > 30/07/2021 16:35, Morten Brørup:
> > > > > > > From: Olivier Matz [mailto:olivier.matz@6wind.com]
> > > > > > > Sent: Friday, 30 July 2021 14.37
> > > > > > >
> > > > > > > Hi Thomas,
> > > > > > >
> > > > > > > On Sat, Jul 24, 2021 at 10:47:34AM +0200, Thomas Monjalon
> > > wrote:
> > > > > > > > What's the follow-up for this patch?
> > > > > > >
> > > > > > > Unfortunatly, I still don't have the time to work on this
> > > > > > > topic
> > > yet.
> > > > > > >
> > > > > > > In my initial tests, in our lab, I didn't notice any
> > > performance
> > > > > > > regression, but Ali has seen an impact (0.5M PPS, but I don't
> > > know
> > > > > > > how much in percent).
> > > > > > >
> > > > > > >
> > > > > > > > 19/01/2021 15:04, Slava Ovsiienko:
> > > > > > > > > Hi, All
> > > > > > > > >
> > > > > > > > > Could we postpose this patch at least to rc2? We would
> > > > > > > > > like
> > > to
> > > > > > > conduct more investigations?
> > > > > > > > >
> > > > > > > > > With best regards, Slava
> > > > > > > > >
> > > > > > > > > From: Olivier Matz <olivier.matz@6wind.com>
> > > > > > > > > > On Mon, Jan 18, 2021 at 05:52:32PM +0000, Ali Alnubani
> > > wrote:
> > > > > > > > > > > Hi,
> > > > > > > > > > > (Sorry had to resend this to some recipients due to
> > > mail
> > > > > > > > > > > server
> > > > > > > problems).
> > > > > > > > > > >
> > > > > > > > > > > Just confirming that I can still reproduce the
> > > regression
> > > > > > > > > > > with
> > > > > > > single core and
> > > > > > > > > > 64B frames on other servers.
> > > > > > > > > >
> > > > > > > > > > Many thanks for the feedback. Can you please detail what
> > > is
> > > > > > > > > > the
> > > > > > > amount of
> > > > > > > > > > performance loss in percent, and confirm the test case?
> > > (I
> > > > > > > suppose it is
> > > > > > > > > > testpmd io forward).
> > > > > > > > > >
> > > > > > > > > > Unfortunatly, I won't be able to spend a lot of time on
> > > this
> > > > > > > > > > soon
> > > > > > > (sorry for
> > > > > > > > > > that). So I see at least these 2 options:
> > > > > > > > > >
> > > > > > > > > > - postpone the patch again, until I can find more time
> > > > > > > > > > to
> > > analyze
> > > > > > > > > >   and optimize
> > > > > > > > > > - apply the patch if the performance loss is acceptable
> > > > > > > > > > compared
> > > > > > > to
> > > > > > > > > >   the added value of fixing a bug
> > > > > > > > > >
> > > > > > > > [...]
> > > > > > >
> > > > > > > Statu quo...
> > > > > > >
> > > > > > > Olivier
> > > > > > >
> > > > > >
> > > > > > The decision should be simple:
> > > > > >
> > > > > > Does the DPDK project support segmented packets?
> > > > > > If yes, then apply the patch to fix the bug!
> > > > > >
> > > > > > If anyone seriously cares about the regression it introduces,
> > > optimization
> > > > patches are welcome later. We shouldn't wait for it.
> > > > >
> > > > > You're right, but the regression is flagged to a 4-years old
> > > > > patch, that's why I don't consider it as urgent.
> > > > >
> > > > > > If the patch is not applied, the documentation must be updated
> > > > > > to
> > > > mention that we are releasing DPDK with a known bug: that segmented
> > > > packets are handled incorrectly in the scenario described in this
> > > patch.
> > > > >
> > > > > Yes, would be good to document the known issue, no matter how old
> > > it
> > > > > is.
> > > > >
> > > > > > Generally, there could be some performance to gain by not
> > > supporting
> > > > segmented packets at all, as a compile time option. But that is a
> > > different
> > > > discussion.
> > > >
> > > >
> > > >
> > >
> 

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dpdk-dev] [PATCH v5] mbuf: fix reset on mbuf free
  2021-09-29 21:37   ` [dpdk-dev] [PATCH v5] " Olivier Matz
@ 2021-09-30 13:27     ` Ali Alnubani
  2021-10-21  9:18     ` David Marchand
  1 sibling, 0 replies; 74+ messages in thread
From: Ali Alnubani @ 2021-09-30 13:27 UTC (permalink / raw)
  To: Olivier Matz, dev
  Cc: ajit.khaparde, ajitkhaparde, andrew.rybchenko,
	konstantin.ananyev, mb, stable, Slava Ovsiienko

> -----Original Message-----
> From: Olivier Matz <olivier.matz@6wind.com>
> Sent: Thursday, September 30, 2021 12:37 AM
> To: dev@dpdk.org
> Cc: ajit.khaparde@broadcom.com; ajitkhaparde@gmail.com; Ali Alnubani
> <alialnu@nvidia.com>; andrew.rybchenko@oktetlabs.ru;
> konstantin.ananyev@intel.com; mb@smartsharesystems.com;
> stable@dpdk.org; Slava Ovsiienko <viacheslavo@nvidia.com>
> Subject: [PATCH v5] mbuf: fix reset on mbuf free
> 
> m->nb_seg must be reset on mbuf free whatever the value of m->next,
> because it can happen that m->nb_seg is != 1. For instance in this
> case:
> 
>   m1 = rte_pktmbuf_alloc(mp);
>   rte_pktmbuf_append(m1, 500);
>   m2 = rte_pktmbuf_alloc(mp);
>   rte_pktmbuf_append(m2, 500);
>   rte_pktmbuf_chain(m1, m2);
>   m0 = rte_pktmbuf_alloc(mp);
>   rte_pktmbuf_append(m0, 500);
>   rte_pktmbuf_chain(m0, m1);
> 
> As rte_pktmbuf_chain() does not reset nb_seg in the initial m1 segment (this
> is not required), after this code the mbuf chain have 3 segments:
>   - m0: next=m1, nb_seg=3
>   - m1: next=m2, nb_seg=2
>   - m2: next=NULL, nb_seg=1
> 
> Then split this chain between m1 and m2, it would result in 2 packets:
>   - first packet
>     - m0: next=m1, nb_seg=2
>     - m1: next=NULL, nb_seg=2
>   - second packet
>     - m2: next=NULL, nb_seg=1
> 
> Freeing the first packet will not restore nb_seg=1 in the second segment.
> This is an issue because it is expected that mbufs stored in pool have their
> nb_seg field set to 1.
> 
> Fixes: 8f094a9ac5d7 ("mbuf: set mbuf fields while in pool")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> Acked-by: Morten Brørup <mb@smartsharesystems.com>
> Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> ---

Tested-by: Ali Alnubani <alialnu@nvidia.com>

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dpdk-dev] [dpdk-stable] [PATCH v4] mbuf: fix reset on mbuf free
  2021-09-29 21:39                             ` Olivier Matz
@ 2021-09-30 13:29                               ` Ali Alnubani
  2021-10-21  8:26                                 ` Thomas Monjalon
  0 siblings, 1 reply; 74+ messages in thread
From: Ali Alnubani @ 2021-09-30 13:29 UTC (permalink / raw)
  To: Olivier Matz
  Cc: dev, David Marchand, Alexander Kozyrev,
	NBU-Contact-Thomas Monjalon, Ferruh Yigit, Slava Ovsiienko,
	zhaoyan.chen, Morten Brørup, Andrew Rybchenko, Ananyev,
	Konstantin, Ajit Khaparde, jerinj

> -----Original Message-----
> From: Olivier Matz <olivier.matz@6wind.com>
> Sent: Thursday, September 30, 2021 12:39 AM
> To: Ali Alnubani <alialnu@nvidia.com>
> Cc: dev@dpdk.org; David Marchand <david.marchand@redhat.com>;
> Alexander Kozyrev <akozyrev@nvidia.com>; NBU-Contact-Thomas Monjalon
> <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@intel.com>; Slava
> Ovsiienko <viacheslavo@nvidia.com>; zhaoyan.chen@intel.com; Morten
> Brørup <mb@smartsharesystems.com>; Andrew Rybchenko
> <andrew.rybchenko@oktetlabs.ru>; Ananyev, Konstantin
> <konstantin.ananyev@intel.com>; Ajit Khaparde
> <ajit.khaparde@broadcom.com>; jerinj@marvell.com
> Subject: Re: [dpdk-dev] [dpdk-stable] [PATCH v4] mbuf: fix reset on mbuf
> free
> 
> Hi Ali,
> 
> 
> On Wed, Sep 29, 2021 at 08:03:17AM +0000, Ali Alnubani wrote:
> > Hi Olivier,
> >
> > I wanted to retest the patch on latest main, but it no longer applies, could
> you please rebase it?
> 
> I rebased the patch:
> https://patchwork.dpdk.org/project/dpdk/patch/20210929213707.17727-1-
> olivier.matz@6wind.com/
> 

Thanks for the update,
I tested v5 on main (84b3e4555a1f) and the regression we previously saw no longer reproduces.

Will the patch be backported to 20.11? I can still reproduce a 2% drop in single core packet forwarding performance with v4 applied on v20.11.3.

Testpmd: dpdk-testpmd -n 4  -a 0000:82:00.0  -a 0000:82:00.1 -c 0x9000  -- -i  --nb-cores=1  --txd=256 --rxd=256 --auto-start
Traffic generator is TREX with pattern (Ether / IP / UDP).
Drop is from ~50.5 mpps to ~49.5 mpps.

OS: 18.04.5 LTS
Device: ConnectX-5
MLNX_OFED: MLNX_OFED_LINUX-5.4-1.0.3.0
Firmware: 16.31.1014

Regards,
Ali

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dpdk-dev] [dpdk-stable] [PATCH v4] mbuf: fix reset on mbuf free
  2021-09-30 13:29                               ` Ali Alnubani
@ 2021-10-21  8:26                                 ` Thomas Monjalon
  0 siblings, 0 replies; 74+ messages in thread
From: Thomas Monjalon @ 2021-10-21  8:26 UTC (permalink / raw)
  To: Olivier Matz, Ali Alnubani
  Cc: dev, David Marchand, Alexander Kozyrev, Ferruh Yigit,
	Slava Ovsiienko, zhaoyan.chen, Morten Brørup,
	Andrew Rybchenko, Ananyev, Konstantin, Ajit Khaparde, jerinj

30/09/2021 15:29, Ali Alnubani:
> From: Olivier Matz <olivier.matz@6wind.com>
> > On Wed, Sep 29, 2021 at 08:03:17AM +0000, Ali Alnubani wrote:
> > > Hi Olivier,
> > >
> > > I wanted to retest the patch on latest main, but it no longer applies, could
> > you please rebase it?
> > 
> > I rebased the patch:
> > https://patchwork.dpdk.org/project/dpdk/patch/20210929213707.17727-1-
> > olivier.matz@6wind.com/
> 
> Thanks for the update,
> I tested v5 on main (84b3e4555a1f) and the regression we previously saw no longer reproduces.
> 
> Will the patch be backported to 20.11? I can still reproduce a 2% drop in single core packet forwarding performance with v4 applied on v20.11.3.

As any significant fix, we must backport, or at least, try.
If there is a real issue in a stable branch, we'll discuss it
in the test&release process of this branch.



^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [dpdk-dev] [PATCH v5] mbuf: fix reset on mbuf free
  2021-09-29 21:37   ` [dpdk-dev] [PATCH v5] " Olivier Matz
  2021-09-30 13:27     ` Ali Alnubani
@ 2021-10-21  9:18     ` David Marchand
  2022-07-28 14:06       ` CI performance test results might be misleading Morten Brørup
  1 sibling, 1 reply; 74+ messages in thread
From: David Marchand @ 2021-10-21  9:18 UTC (permalink / raw)
  To: Olivier Matz
  Cc: dev, Ajit Khaparde, ajitkhaparde, Ali Alnubani, Andrew Rybchenko,
	Ananyev, Konstantin, Morten Brørup, dpdk stable,
	Slava Ovsiienko

On Wed, Sep 29, 2021 at 11:37 PM Olivier Matz <olivier.matz@6wind.com> wrote:
>
> m->nb_seg must be reset on mbuf free whatever the value of m->next,
> because it can happen that m->nb_seg is != 1. For instance in this
> case:
>
>   m1 = rte_pktmbuf_alloc(mp);
>   rte_pktmbuf_append(m1, 500);
>   m2 = rte_pktmbuf_alloc(mp);
>   rte_pktmbuf_append(m2, 500);
>   rte_pktmbuf_chain(m1, m2);
>   m0 = rte_pktmbuf_alloc(mp);
>   rte_pktmbuf_append(m0, 500);
>   rte_pktmbuf_chain(m0, m1);
>
> As rte_pktmbuf_chain() does not reset nb_seg in the initial m1
> segment (this is not required), after this code the mbuf chain
> have 3 segments:
>   - m0: next=m1, nb_seg=3
>   - m1: next=m2, nb_seg=2
>   - m2: next=NULL, nb_seg=1
>
> Then split this chain between m1 and m2, it would result in 2 packets:
>   - first packet
>     - m0: next=m1, nb_seg=2
>     - m1: next=NULL, nb_seg=2
>   - second packet
>     - m2: next=NULL, nb_seg=1
>
> Freeing the first packet will not restore nb_seg=1 in the second
> segment. This is an issue because it is expected that mbufs stored
> in pool have their nb_seg field set to 1.
>
> Fixes: 8f094a9ac5d7 ("mbuf: set mbuf fields while in pool")
> Cc: stable@dpdk.org
>
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> Acked-by: Morten Brørup <mb@smartsharesystems.com>
> Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: Ali Alnubani <alialnu@nvidia.com>

Applied, thanks.


-- 
David Marchand


^ permalink raw reply	[flat|nested] 74+ messages in thread

* CI performance test results might be misleading
  2021-10-21  9:18     ` David Marchand
@ 2022-07-28 14:06       ` Morten Brørup
  0 siblings, 0 replies; 74+ messages in thread
From: Morten Brørup @ 2022-07-28 14:06 UTC (permalink / raw)
  To: David Marchand, Olivier Matz
  Cc: dev, Ajit Khaparde, ajitkhaparde, Ali Alnubani, Andrew Rybchenko,
	Slava Ovsiienko, Konstantin Ananyev, dts, Thomas Monjalon,
	Bruce Richardson, Stephen Hemminger

I know it's too late for this patch [1], but I am afraid it was performance tested using test-pmd with RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE [2], so the test didn't involve the function fixed by this patch: rte_pktmbuf_prefree_seg().

If so, the performance test results were completely irrelevant. The same goes for other patches related to code bypassed by RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE.

RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE is a performance optimization with many application limitations, and should not be enabled for performance testing in the CI environment. Alternatively, performance testing can be performed both with and without this optimization.

[1] https://patchwork.dpdk.org/project/dpdk/patch/20210113132734.1636-1-olivier.matz@6wind.com/
[2] http://git.dpdk.org/dpdk/commit/app/test-pmd/testpmd.c?id=07e5f7bd65718461e4c63757248b9c7bab08341f


^ permalink raw reply	[flat|nested] 74+ messages in thread

end of thread, other threads:[~2022-07-28 14:07 UTC | newest]

Thread overview: 74+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-04 17:00 [dpdk-dev] [PATCH] mbuf: fix reset on mbuf free Olivier Matz
2020-11-05  0:15 ` Ananyev, Konstantin
2020-11-05  7:46   ` Olivier Matz
2020-11-05  8:26     ` Andrew Rybchenko
2020-11-05  9:10       ` Olivier Matz
2020-11-05 11:34         ` Ananyev, Konstantin
2020-11-05 12:31           ` Olivier Matz
2020-11-05 13:14             ` Ananyev, Konstantin
2020-11-05 13:24               ` Olivier Matz
2020-11-05 13:55                 ` Ananyev, Konstantin
2020-11-05 16:30                   ` Morten Brørup
2020-11-05 23:55                     ` Ananyev, Konstantin
2020-11-06  7:52                       ` Morten Brørup
2020-11-06  8:20                         ` Olivier Matz
2020-11-06  8:50                           ` Morten Brørup
2020-11-06 10:04                             ` Olivier Matz
2020-11-06 10:07                               ` Morten Brørup
2020-11-06 11:53                                 ` Ananyev, Konstantin
2020-11-06 12:23                                   ` Morten Brørup
2020-11-08 14:16                                     ` Andrew Rybchenko
2020-11-08 14:19                                       ` Ananyev, Konstantin
2020-11-10 16:26                                         ` Olivier Matz
2020-11-05  8:33     ` Morten Brørup
2020-11-05  9:03       ` Olivier Matz
2020-11-05  9:09     ` Andrew Rybchenko
2020-11-08  7:25 ` Ali Alnubani
2020-12-18 12:52 ` [dpdk-dev] [PATCH v2] " Olivier Matz
2020-12-18 13:18   ` Morten Brørup
2020-12-18 23:33     ` Ajit Khaparde
2021-01-06 13:33 ` [dpdk-dev] [PATCH v3] " Olivier Matz
2021-01-10  9:28   ` Ali Alnubani
2021-01-11 13:14   ` Ananyev, Konstantin
2021-01-13 13:27 ` [dpdk-dev] [PATCH v4] " Olivier Matz
2021-01-15 13:59   ` [dpdk-dev] [dpdk-stable] " David Marchand
2021-01-15 18:39     ` Ali Alnubani
2021-01-18 17:52       ` Ali Alnubani
2021-01-19  8:32         ` Olivier Matz
2021-01-19  8:53           ` Morten Brørup
2021-01-19 12:00             ` Ferruh Yigit
2021-01-19 12:27               ` Morten Brørup
2021-01-19 14:03                 ` Ferruh Yigit
2021-01-19 14:21                   ` Morten Brørup
2021-01-21  9:15                     ` Ferruh Yigit
2021-01-19 14:04           ` Slava Ovsiienko
2021-07-24  8:47             ` Thomas Monjalon
2021-07-30 12:36               ` Olivier Matz
2021-07-30 14:35                 ` Morten Brørup
2021-07-30 14:54                   ` Thomas Monjalon
2021-07-30 15:14                     ` Olivier Matz
2021-07-30 15:23                       ` Morten Brørup
2021-08-04 13:29                       ` [dpdk-dev] [PATCH] doc: add known issue with mbuf segment Thomas Monjalon
2021-08-04 14:25                         ` Ajit Khaparde
2021-08-05  6:08                         ` Morten Brørup
2021-08-06 14:21                           ` Thomas Monjalon
2021-08-06 14:24                             ` Morten Brørup
2021-09-28  8:28                     ` [dpdk-dev] [dpdk-stable] [PATCH v4] mbuf: fix reset on mbuf free Thomas Monjalon
2021-09-28  9:00                       ` Slava Ovsiienko
2021-09-28  9:25                         ` Ananyev, Konstantin
2021-09-28  9:39                         ` Morten Brørup
2021-09-29  8:03                           ` Ali Alnubani
2021-09-29 21:39                             ` Olivier Matz
2021-09-30 13:29                               ` Ali Alnubani
2021-10-21  8:26                                 ` Thomas Monjalon
2021-01-21  9:19       ` Ferruh Yigit
2021-01-21  9:29         ` Morten Brørup
2021-01-21 16:35           ` [dpdk-dev] [dpdklab] " Lincoln Lavoie
2021-01-23  8:57             ` Morten Brørup
2021-01-25 17:00               ` Brandon Lo
2021-01-25 18:42             ` Ferruh Yigit
2021-06-15 13:56   ` [dpdk-dev] " Morten Brørup
2021-09-29 21:37   ` [dpdk-dev] [PATCH v5] " Olivier Matz
2021-09-30 13:27     ` Ali Alnubani
2021-10-21  9:18     ` David Marchand
2022-07-28 14:06       ` CI performance test results might be misleading Morten Brørup

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).