DPDK patches and discussions
 help / color / mirror / Atom feed
* [dpdk-dev] [PATCH] net/af_packet: fix ignoring full ring on tx
@ 2021-08-20 13:39 Tudor Cornea
  2021-09-01 16:34 ` Ferruh Yigit
  2021-09-13 13:45 ` [dpdk-dev] [PATCH v2] " Tudor Cornea
  0 siblings, 2 replies; 14+ messages in thread
From: Tudor Cornea @ 2021-08-20 13:39 UTC (permalink / raw)
  To: linville; +Cc: thomas, dev, Tudor Cornea, Mihai Pogonaru

The poll call can return POLLERR which is ignored, or it can return
POLLOUT, even if there are no free frames in the mmap-ed area.

We can account for both of these cases by re-checking if the next
frame is empty before writing into it.

We also now eliminate the timestamp status from the frame status.

Signed-off-by: Mihai Pogonaru <pogonarumihai@gmail.com>
Signed-off-by: Tudor Cornea <tudor.cornea@gmail.com>
---
 drivers/net/af_packet/rte_eth_af_packet.c | 47 +++++++++++++++++++++++++++++--
 1 file changed, 45 insertions(+), 2 deletions(-)

diff --git a/drivers/net/af_packet/rte_eth_af_packet.c b/drivers/net/af_packet/rte_eth_af_packet.c
index b73b211..3845df5 100644
--- a/drivers/net/af_packet/rte_eth_af_packet.c
+++ b/drivers/net/af_packet/rte_eth_af_packet.c
@@ -167,6 +167,12 @@ eth_af_packet_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 	return num_rx;
 }
 
+static inline __u32 tx_ring_status_remove_ts(volatile __u32 *tp_status)
+{
+	return *tp_status &
+		~(TP_STATUS_TS_SOFTWARE | TP_STATUS_TS_RAW_HARDWARE);
+}
+
 /*
  * Callback to handle sending packets through a real NIC.
  */
@@ -211,9 +217,41 @@ eth_af_packet_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 			}
 		}
 
+		/*
+		 * We must eliminate the timestamp status from the packet
+		 * status. This should only matter if timestamping is enabled
+		 * on the socket, but there is a BUG in the kernel which is
+		 * fixed in newer releases.
+
+		 * For interfaces of type 'veth', the sent skb is forwarded
+		 * to the peer and back into the network stack which timestamps
+		 * it on the RX path if timestamping is enabled globally
+		 * (which happens if any socket enables timestamping).
+
+		 * When the skb is destructed, tpacket_destruct_skb() is called
+		 * and it calls __packet_set_timestamp() which doesn't check
+		 * the flags on the socket and returns the timestamp if it is
+		 * set in the skb (and for veth it is, as mentioned above).
+		 */
+
 		/* point at the next incoming frame */
-		if ((ppd->tp_status != TP_STATUS_AVAILABLE) &&
-		    (poll(&pfd, 1, -1) < 0))
+		if ((tx_ring_status_remove_ts(&ppd->tp_status)
+			!= TP_STATUS_AVAILABLE) && (poll(&pfd, 1, -1) < 0))
+			break;
+
+		/*
+		 * Poll can return POLLERR if the interface is down or POLLOUT,
+		 * even if there are no extra buffers available.
+		 * This happens, because packet_poll() calls datagram_poll()
+		 * which checks the space left in the socket buffer and in the
+		 * case of packet_mmap the default socket buffer length
+		 * doesn't match the requested size for the tx_ring so there
+		 * is always space left in socket buffer, which doesn't seem
+		 * to be correlated to the requested size for the tx_ring
+		 * in packet_mmap.
+		 */
+		if (tx_ring_status_remove_ts(&ppd->tp_status)
+			!= TP_STATUS_AVAILABLE)
 			break;
 
 		/* copy the tx frame data */
@@ -242,6 +280,11 @@ eth_af_packet_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 		rte_pktmbuf_free(mbuf);
 	}
 
+	/*
+	 * We might have to ignore a few more errnos here since the packets
+	 * remain in the mmap-ed queue and will be sent later, presumably.
+	 */
+
 	/* kick-off transmits */
 	if (sendto(pkt_q->sockfd, NULL, 0, MSG_DONTWAIT, NULL, 0) == -1 &&
 			errno != ENOBUFS && errno != EAGAIN) {
-- 
2.7.4


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [dpdk-dev] [PATCH] net/af_packet: fix ignoring full ring on tx
  2021-08-20 13:39 [dpdk-dev] [PATCH] net/af_packet: fix ignoring full ring on tx Tudor Cornea
@ 2021-09-01 16:34 ` Ferruh Yigit
  2021-09-06 10:23   ` Tudor Cornea
  2021-09-13 13:45 ` [dpdk-dev] [PATCH v2] " Tudor Cornea
  1 sibling, 1 reply; 14+ messages in thread
From: Ferruh Yigit @ 2021-09-01 16:34 UTC (permalink / raw)
  To: Tudor Cornea, linville; +Cc: thomas, dev, Mihai Pogonaru

On 8/20/2021 2:39 PM, Tudor Cornea wrote:
> The poll call can return POLLERR which is ignored, or it can return
> POLLOUT, even if there are no free frames in the mmap-ed area.
> 
> We can account for both of these cases by re-checking if the next
> frame is empty before writing into it.
> 
> We also now eliminate the timestamp status from the frame status.
> 

Hi Tudor,

Would you mind separate timestamp status fix to its own patch? I think better to
fix 'ignoring full Tx ring' first, to not make it dependent to timestamp patch.

> Signed-off-by: Mihai Pogonaru <pogonarumihai@gmail.com>
> Signed-off-by: Tudor Cornea <tudor.cornea@gmail.com>
> ---
>  drivers/net/af_packet/rte_eth_af_packet.c | 47 +++++++++++++++++++++++++++++--
>  1 file changed, 45 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/af_packet/rte_eth_af_packet.c b/drivers/net/af_packet/rte_eth_af_packet.c
> index b73b211..3845df5 100644
> --- a/drivers/net/af_packet/rte_eth_af_packet.c
> +++ b/drivers/net/af_packet/rte_eth_af_packet.c
> @@ -167,6 +167,12 @@ eth_af_packet_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
>  	return num_rx;
>  }
>  
> +static inline __u32 tx_ring_status_remove_ts(volatile __u32 *tp_status)
> +{
> +	return *tp_status &
> +		~(TP_STATUS_TS_SOFTWARE | TP_STATUS_TS_RAW_HARDWARE);

I can see 'TP_STATUS_TS_SYS_HARDWARE' is deprecated, and I assume in the kernel
versions the bug exists, this flag is not set, but can you please confirm?

> +}
> +
>  /*
>   * Callback to handle sending packets through a real NIC.
>   */
> @@ -211,9 +217,41 @@ eth_af_packet_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
>  			}
>  		}
>  
> +		/*
> +		 * We must eliminate the timestamp status from the packet
> +		 * status. This should only matter if timestamping is enabled
> +		 * on the socket, but there is a BUG in the kernel which is
> +		 * fixed in newer releases.
> +
> +		 * For interfaces of type 'veth', the sent skb is forwarded
> +		 * to the peer and back into the network stack which timestamps
> +		 * it on the RX path if timestamping is enabled globally
> +		 * (which happens if any socket enables timestamping).
> +
> +		 * When the skb is destructed, tpacket_destruct_skb() is called
> +		 * and it calls __packet_set_timestamp() which doesn't check
> +		 * the flags on the socket and returns the timestamp if it is
> +		 * set in the skb (and for veth it is, as mentioned above).
> +		 */
> +

Can you give some more details on this bug, any link etc..

And does it only seen with veth, if so I wonder if we can ignore it, not sure
how common to use af_packet PMD over veth interface, do you have this usecase?

And if only specific kernel versions impacted from this bug, what do you think
about adding kernel version check to reduce the scope of the fix in af_packet PMD.

>  		/* point at the next incoming frame */
> -		if ((ppd->tp_status != TP_STATUS_AVAILABLE) &&
> -		    (poll(&pfd, 1, -1) < 0))
> +		if ((tx_ring_status_remove_ts(&ppd->tp_status)
> +			!= TP_STATUS_AVAILABLE) && (poll(&pfd, 1, -1) < 0))
> +			break;
> +
> +		/*
> +		 * Poll can return POLLERR if the interface is down or POLLOUT,
> +		 * even if there are no extra buffers available.
> +		 * This happens, because packet_poll() calls datagram_poll()
> +		 * which checks the space left in the socket buffer and in the
> +		 * case of packet_mmap the default socket buffer length
> +		 * doesn't match the requested size for the tx_ring so there
> +		 * is always space left in socket buffer, which doesn't seem
> +		 * to be correlated to the requested size for the tx_ring
> +		 * in packet_mmap.
> +		 */
> +		if (tx_ring_status_remove_ts(&ppd->tp_status)
> +			!= TP_STATUS_AVAILABLE)
>  			break;

In this case should we break or poll again?

If 'POLLERR' is received when interface is down, it makes sense to break. Do you
know if is there any other case that 'POLLERR' is returned?

And for 'POLLOUT', when exactly this event sent? If 'POLLOUT' received, can we
poll again to wait Tx ring available to send more packets?

>  
>  		/* copy the tx frame data */
> @@ -242,6 +280,11 @@ eth_af_packet_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
>  		rte_pktmbuf_free(mbuf);
>  	}
>  
> +	/*
> +	 * We might have to ignore a few more errnos here since the packets
> +	 * remain in the mmap-ed queue and will be sent later, presumably.
> +	 */
> +

Can you please describe above comment more? What errors ignored?
And won't all packets sent will below sendto() call?

>  	/* kick-off transmits */
>  	if (sendto(pkt_q->sockfd, NULL, 0, MSG_DONTWAIT, NULL, 0) == -1 &&
>  			errno != ENOBUFS && errno != EAGAIN) {
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [dpdk-dev] [PATCH] net/af_packet: fix ignoring full ring on tx
  2021-09-01 16:34 ` Ferruh Yigit
@ 2021-09-06 10:23   ` Tudor Cornea
  2021-09-20 17:11     ` Ferruh Yigit
  0 siblings, 1 reply; 14+ messages in thread
From: Tudor Cornea @ 2021-09-06 10:23 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: linville, thomas, dev, Mihai Pogonaru

Hi Ferruh,

Would you mind separate timestamp status fix to its own patch? I think
> better to
> fix 'ignoring full Tx ring' first, to not make it dependent to timestamp
> patch.


Agreed. There are two issues solved by this patch. We will break it in two
different patches.

I can see 'TP_STATUS_TS_SYS_HARDWARE' is deprecated, and I assume in the
> kernel
> versions the bug exists, this flag is not set, but can you please confirm?


And does it only seen with veth, if so I wonder if we can ignore it, not
> sure
> how common to use af_packet PMD over veth interface, do you have this
> usecase?


We've seen the timestamping issue only when running af_packet over
veth interfaces. We have a particular use-case internally, in which we need
to run inside a Kubernetes cluster.
We've found the following resources [1] , [2] related to this behavior in
the kernel.

We believe that issue #2 (the ring getting full), can theoretically occur
on any type of NIC.
We managed to reproduce the bursty behavior on af_packet PMD over vmxnet3
interface, by Tx-ing packets at a low rate (e.g ~340 pps), and toggling the
interface on / off
ifconfig $iface_name down; sleep 10; ifconfig $iface_name up

We will attempt to give more context on the issue below, about what we
think happens:
- we have a 2048 queue shared between the kernel and the dpdk socket,
there's an index the queue in both the kernel and the dpdk driver
- the dpdk driver writes a packet or a burst, advances its idx and tells
the kernel to send the packets via a call to sendto() and the kernel sends
the packets and advances its idx
- once the interface is down the kernel can no longer send packets, but it
doesn't drop them, it just doesn't advance its idx
- for each packet there is header and in the header there is a status
integer which, among others, indicates the owner of the packet: the
userspace or the kernel - the userspace (dpdk driver) sets the status as
owned by the kernel when it adds another packet ; the kernel sets the
status back to owned by the userspace once it sends a packet
- the dpdk driver was ignoring this status bit and,  even after the queue
was full, it would continue to put packets in the queue - its idx would be
"after" that of the kernel
- once the interface is brought up, the kernel would send all the packets
in the queue (since they have the status of being owned by the kernel) on
the next call to sendto() and the idx would be back to where it was before
the interface was brought up (let's call it k1)
- the dpdk driver idx into the queue would point somewhere in the queue
(let's call it d1) and would continue to add packets at that point, but the
kernel wouldn't send any packet anymore since there is now a gap of packets
owned by the userspace between the kernel index (k1) and the dpdk driver
idx (d1)
- the dpdk idx would eventually reach k1 and packets would be transferred
at a normal rate until both the dpdk idx and the kernel idx would reach d1
again
- between d1 and k1 there are only packets with the status as owned by the
kernel - which where added by the dpdk driver while its index was between
d1 and k1 ; thus the kernel would burst all the packets till k1, while the
dpdk idx is at d1
- the cycle repeats

If a new traffic config comes (in our application) while this cycle is
happening, it could be that some of the packets of the old config are still
in queue (between d1 and k1) and will be bursted when the dpdk and kernel
idx reach d1 ; this would explain seeing packets from an old config, but
only in the first 2048 packets (which is the queue size)


[1] https://www.spinics.net/lists/kernel/msg3959391.html
[2] https://www.spinics.net/lists/netdev/msg739372.html

On Wed, 1 Sept 2021 at 19:34, Ferruh Yigit <ferruh.yigit@intel.com> wrote:

> On 8/20/2021 2:39 PM, Tudor Cornea wrote:
> > The poll call can return POLLERR which is ignored, or it can return
> > POLLOUT, even if there are no free frames in the mmap-ed area.
> >
> > We can account for both of these cases by re-checking if the next
> > frame is empty before writing into it.
> >
> > We also now eliminate the timestamp status from the frame status.
> >
>
> Hi Tudor,
>
> Would you mind separate timestamp status fix to its own patch? I think
> better to
> fix 'ignoring full Tx ring' first, to not make it dependent to timestamp
> patch.
>
> > Signed-off-by: Mihai Pogonaru <pogonarumihai@gmail.com>
> > Signed-off-by: Tudor Cornea <tudor.cornea@gmail.com>
> > ---
> >  drivers/net/af_packet/rte_eth_af_packet.c | 47
> +++++++++++++++++++++++++++++--
> >  1 file changed, 45 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/net/af_packet/rte_eth_af_packet.c
> b/drivers/net/af_packet/rte_eth_af_packet.c
> > index b73b211..3845df5 100644
> > --- a/drivers/net/af_packet/rte_eth_af_packet.c
> > +++ b/drivers/net/af_packet/rte_eth_af_packet.c
> > @@ -167,6 +167,12 @@ eth_af_packet_rx(void *queue, struct rte_mbuf
> **bufs, uint16_t nb_pkts)
> >       return num_rx;
> >  }
> >
> > +static inline __u32 tx_ring_status_remove_ts(volatile __u32 *tp_status)
> > +{
> > +     return *tp_status &
> > +             ~(TP_STATUS_TS_SOFTWARE | TP_STATUS_TS_RAW_HARDWARE);
>
> I can see 'TP_STATUS_TS_SYS_HARDWARE' is deprecated, and I assume in the
> kernel
> versions the bug exists, this flag is not set, but can you please confirm?
>
> > +}
> > +
> >  /*
> >   * Callback to handle sending packets through a real NIC.
> >   */
> > @@ -211,9 +217,41 @@ eth_af_packet_tx(void *queue, struct rte_mbuf
> **bufs, uint16_t nb_pkts)
> >                       }
> >               }
> >
> > +             /*
> > +              * We must eliminate the timestamp status from the packet
> > +              * status. This should only matter if timestamping is
> enabled
> > +              * on the socket, but there is a BUG in the kernel which is
> > +              * fixed in newer releases.
> > +
> > +              * For interfaces of type 'veth', the sent skb is forwarded
> > +              * to the peer and back into the network stack which
> timestamps
> > +              * it on the RX path if timestamping is enabled globally
> > +              * (which happens if any socket enables timestamping).
> > +
> > +              * When the skb is destructed, tpacket_destruct_skb() is
> called
> > +              * and it calls __packet_set_timestamp() which doesn't
> check
> > +              * the flags on the socket and returns the timestamp if it
> is
> > +              * set in the skb (and for veth it is, as mentioned above).
> > +              */
> > +
>
> Can you give some more details on this bug, any link etc..
>
> And does it only seen with veth, if so I wonder if we can ignore it, not
> sure
> how common to use af_packet PMD over veth interface, do you have this
> usecase?
>
> And if only specific kernel versions impacted from this bug, what do you
> think
> about adding kernel version check to reduce the scope of the fix in
> af_packet PMD.
>
> >               /* point at the next incoming frame */
> > -             if ((ppd->tp_status != TP_STATUS_AVAILABLE) &&
> > -                 (poll(&pfd, 1, -1) < 0))
> > +             if ((tx_ring_status_remove_ts(&ppd->tp_status)
> > +                     != TP_STATUS_AVAILABLE) && (poll(&pfd, 1, -1) < 0))
> > +                     break;
> > +
> > +             /*
> > +              * Poll can return POLLERR if the interface is down or
> POLLOUT,
> > +              * even if there are no extra buffers available.
> > +              * This happens, because packet_poll() calls
> datagram_poll()
> > +              * which checks the space left in the socket buffer and in
> the
> > +              * case of packet_mmap the default socket buffer length
> > +              * doesn't match the requested size for the tx_ring so
> there
> > +              * is always space left in socket buffer, which doesn't
> seem
> > +              * to be correlated to the requested size for the tx_ring
> > +              * in packet_mmap.
> > +              */
> > +             if (tx_ring_status_remove_ts(&ppd->tp_status)
> > +                     != TP_STATUS_AVAILABLE)
> >                       break;
>
> In this case should we break or poll again?
>
> If 'POLLERR' is received when interface is down, it makes sense to break.
> Do you
> know if is there any other case that 'POLLERR' is returned?
>
> And for 'POLLOUT', when exactly this event sent? If 'POLLOUT' received,
> can we
> poll again to wait Tx ring available to send more packets?
>
> >
> >               /* copy the tx frame data */
> > @@ -242,6 +280,11 @@ eth_af_packet_tx(void *queue, struct rte_mbuf
> **bufs, uint16_t nb_pkts)
> >               rte_pktmbuf_free(mbuf);
> >       }
> >
> > +     /*
> > +      * We might have to ignore a few more errnos here since the packets
> > +      * remain in the mmap-ed queue and will be sent later, presumably.
> > +      */
> > +
>
> Can you please describe above comment more? What errors ignored?
> And won't all packets sent will below sendto() call?
>
> >       /* kick-off transmits */
> >       if (sendto(pkt_q->sockfd, NULL, 0, MSG_DONTWAIT, NULL, 0) == -1 &&
> >                       errno != ENOBUFS && errno != EAGAIN) {
> >
>
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [dpdk-dev] [PATCH v2] net/af_packet: fix ignoring full ring on tx
  2021-08-20 13:39 [dpdk-dev] [PATCH] net/af_packet: fix ignoring full ring on tx Tudor Cornea
  2021-09-01 16:34 ` Ferruh Yigit
@ 2021-09-13 13:45 ` Tudor Cornea
  2021-09-20 17:44   ` Ferruh Yigit
  2021-11-02 15:47   ` [dpdk-dev] [PATCH v3] " Tudor Cornea
  1 sibling, 2 replies; 14+ messages in thread
From: Tudor Cornea @ 2021-09-13 13:45 UTC (permalink / raw)
  To: ferruh.yigit; +Cc: linville, thomas, pogonarumihai, dev, Tudor Cornea

The poll call can return POLLERR which is ignored, or it can return
POLLOUT, even if there are no free frames in the mmap-ed area.

We can account for both of these cases by re-checking if the next
frame is empty before writing into it.

Signed-off-by: Mihai Pogonaru <pogonarumihai@gmail.com>
Signed-off-by: Tudor Cornea <tudor.cornea@gmail.com>
---
 drivers/net/af_packet/rte_eth_af_packet.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/drivers/net/af_packet/rte_eth_af_packet.c b/drivers/net/af_packet/rte_eth_af_packet.c
index b73b211..087c196 100644
--- a/drivers/net/af_packet/rte_eth_af_packet.c
+++ b/drivers/net/af_packet/rte_eth_af_packet.c
@@ -216,6 +216,25 @@ eth_af_packet_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 		    (poll(&pfd, 1, -1) < 0))
 			break;
 
+		/*
+		 * Poll can return POLLERR if the interface is down
+		 *
+		 * It will almost always return POLLOUT, even if there
+		 * are no extra buffers available
+		 *
+		 * This happens, because packet_poll() calls datagram_poll()
+		 * which checks the space left in the socket buffer and,
+		 * in the case of packet_mmap, the default socket buffer length
+		 * doesn't match the requested size for the tx_ring.
+		 * As such, there is almost always space left in socket buffer,
+		 * which doesn't seem to be correlated to the requested size
+		 * for the tx_ring in packet_mmap.
+		 *
+		 * This results in poll() returning POLLOUT.
+		 */
+		if (ppd->tp_status != TP_STATUS_AVAILABLE)
+			break;
+
 		/* copy the tx frame data */
 		pbuf = (uint8_t *) ppd + TPACKET2_HDRLEN -
 			sizeof(struct sockaddr_ll);
-- 
2.7.4


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [dpdk-dev] [PATCH] net/af_packet: fix ignoring full ring on tx
  2021-09-06 10:23   ` Tudor Cornea
@ 2021-09-20 17:11     ` Ferruh Yigit
  0 siblings, 0 replies; 14+ messages in thread
From: Ferruh Yigit @ 2021-09-20 17:11 UTC (permalink / raw)
  To: Tudor Cornea; +Cc: linville, thomas, dev, Mihai Pogonaru

On 9/6/2021 11:23 AM, Tudor Cornea wrote:
> Hi Ferruh,
> 
> Would you mind separate timestamp status fix to its own patch? I think
>> better to
>> fix 'ignoring full Tx ring' first, to not make it dependent to timestamp
>> patch.
> 
> 
> Agreed. There are two issues solved by this patch. We will break it in two
> different patches.
> 
> I can see 'TP_STATUS_TS_SYS_HARDWARE' is deprecated, and I assume in the
>> kernel
>> versions the bug exists, this flag is not set, but can you please confirm?
> 
> 
> And does it only seen with veth, if so I wonder if we can ignore it, not
>> sure
>> how common to use af_packet PMD over veth interface, do you have this
>> usecase?
> 
> 
> We've seen the timestamping issue only when running af_packet over
> veth interfaces. We have a particular use-case internally, in which we need
> to run inside a Kubernetes cluster.
> We've found the following resources [1] , [2] related to this behavior in
> the kernel.
> 
> We believe that issue #2 (the ring getting full), can theoretically occur
> on any type of NIC.
> We managed to reproduce the bursty behavior on af_packet PMD over vmxnet3
> interface, by Tx-ing packets at a low rate (e.g ~340 pps), and toggling the
> interface on / off
> ifconfig $iface_name down; sleep 10; ifconfig $iface_name up
> 
> We will attempt to give more context on the issue below, about what we
> think happens:
> - we have a 2048 queue shared between the kernel and the dpdk socket,
> there's an index the queue in both the kernel and the dpdk driver
> - the dpdk driver writes a packet or a burst, advances its idx and tells
> the kernel to send the packets via a call to sendto() and the kernel sends
> the packets and advances its idx
> - once the interface is down the kernel can no longer send packets, but it
> doesn't drop them, it just doesn't advance its idx
> - for each packet there is header and in the header there is a status
> integer which, among others, indicates the owner of the packet: the
> userspace or the kernel - the userspace (dpdk driver) sets the status as
> owned by the kernel when it adds another packet ; the kernel sets the
> status back to owned by the userspace once it sends a packet
> - the dpdk driver was ignoring this status bit and,  even after the queue
> was full, it would continue to put packets in the queue - its idx would be
> "after" that of the kernel
> - once the interface is brought up, the kernel would send all the packets
> in the queue (since they have the status of being owned by the kernel) on
> the next call to sendto() and the idx would be back to where it was before
> the interface was brought up (let's call it k1)
> - the dpdk driver idx into the queue would point somewhere in the queue
> (let's call it d1) and would continue to add packets at that point, but the
> kernel wouldn't send any packet anymore since there is now a gap of packets
> owned by the userspace between the kernel index (k1) and the dpdk driver
> idx (d1)
> - the dpdk idx would eventually reach k1 and packets would be transferred
> at a normal rate until both the dpdk idx and the kernel idx would reach d1
> again
> - between d1 and k1 there are only packets with the status as owned by the
> kernel - which where added by the dpdk driver while its index was between
> d1 and k1 ; thus the kernel would burst all the packets till k1, while the
> dpdk idx is at d1
> - the cycle repeats
> 
> If a new traffic config comes (in our application) while this cycle is
> happening, it could be that some of the packets of the old config are still
> in queue (between d1 and k1) and will be bursted when the dpdk and kernel
> idx reach d1 ; this would explain seeing packets from an old config, but
> only in the first 2048 packets (which is the queue size)
> 
> 

Hi Tudor,

If there is an usage on of veth, OK to fix the timestamps issue.

What you described above looks like a ring buffer with single producer and
single consumer, and producer overwrites the not consumed items.

I assume this happens because af_packet (consumer) can't send the packets
because of the timestamp defect. (Also producer (dpdk app) should have checks to
prevent overwrite, but that is a different issue.)

I will comment to the new versions of the patches.

Our of curiosity, are you using an modified af_packet implementation in kernel
for above described usage?

> [1] https://www.spinics.net/lists/kernel/msg3959391.html
> [2] https://www.spinics.net/lists/netdev/msg739372.html
> 
> On Wed, 1 Sept 2021 at 19:34, Ferruh Yigit <ferruh.yigit@intel.com> wrote:
> 
>> On 8/20/2021 2:39 PM, Tudor Cornea wrote:
>>> The poll call can return POLLERR which is ignored, or it can return
>>> POLLOUT, even if there are no free frames in the mmap-ed area.
>>>
>>> We can account for both of these cases by re-checking if the next
>>> frame is empty before writing into it.
>>>
>>> We also now eliminate the timestamp status from the frame status.
>>>
>>
>> Hi Tudor,
>>
>> Would you mind separate timestamp status fix to its own patch? I think
>> better to
>> fix 'ignoring full Tx ring' first, to not make it dependent to timestamp
>> patch.
>>
>>> Signed-off-by: Mihai Pogonaru <pogonarumihai@gmail.com>
>>> Signed-off-by: Tudor Cornea <tudor.cornea@gmail.com>
>>> ---
>>>  drivers/net/af_packet/rte_eth_af_packet.c | 47
>> +++++++++++++++++++++++++++++--
>>>  1 file changed, 45 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/net/af_packet/rte_eth_af_packet.c
>> b/drivers/net/af_packet/rte_eth_af_packet.c
>>> index b73b211..3845df5 100644
>>> --- a/drivers/net/af_packet/rte_eth_af_packet.c
>>> +++ b/drivers/net/af_packet/rte_eth_af_packet.c
>>> @@ -167,6 +167,12 @@ eth_af_packet_rx(void *queue, struct rte_mbuf
>> **bufs, uint16_t nb_pkts)
>>>       return num_rx;
>>>  }
>>>
>>> +static inline __u32 tx_ring_status_remove_ts(volatile __u32 *tp_status)
>>> +{
>>> +     return *tp_status &
>>> +             ~(TP_STATUS_TS_SOFTWARE | TP_STATUS_TS_RAW_HARDWARE);
>>
>> I can see 'TP_STATUS_TS_SYS_HARDWARE' is deprecated, and I assume in the
>> kernel
>> versions the bug exists, this flag is not set, but can you please confirm?
>>
>>> +}
>>> +
>>>  /*
>>>   * Callback to handle sending packets through a real NIC.
>>>   */
>>> @@ -211,9 +217,41 @@ eth_af_packet_tx(void *queue, struct rte_mbuf
>> **bufs, uint16_t nb_pkts)
>>>                       }
>>>               }
>>>
>>> +             /*
>>> +              * We must eliminate the timestamp status from the packet
>>> +              * status. This should only matter if timestamping is
>> enabled
>>> +              * on the socket, but there is a BUG in the kernel which is
>>> +              * fixed in newer releases.
>>> +
>>> +              * For interfaces of type 'veth', the sent skb is forwarded
>>> +              * to the peer and back into the network stack which
>> timestamps
>>> +              * it on the RX path if timestamping is enabled globally
>>> +              * (which happens if any socket enables timestamping).
>>> +
>>> +              * When the skb is destructed, tpacket_destruct_skb() is
>> called
>>> +              * and it calls __packet_set_timestamp() which doesn't
>> check
>>> +              * the flags on the socket and returns the timestamp if it
>> is
>>> +              * set in the skb (and for veth it is, as mentioned above).
>>> +              */
>>> +
>>
>> Can you give some more details on this bug, any link etc..
>>
>> And does it only seen with veth, if so I wonder if we can ignore it, not
>> sure
>> how common to use af_packet PMD over veth interface, do you have this
>> usecase?
>>
>> And if only specific kernel versions impacted from this bug, what do you
>> think
>> about adding kernel version check to reduce the scope of the fix in
>> af_packet PMD.
>>
>>>               /* point at the next incoming frame */
>>> -             if ((ppd->tp_status != TP_STATUS_AVAILABLE) &&
>>> -                 (poll(&pfd, 1, -1) < 0))
>>> +             if ((tx_ring_status_remove_ts(&ppd->tp_status)
>>> +                     != TP_STATUS_AVAILABLE) && (poll(&pfd, 1, -1) < 0))
>>> +                     break;
>>> +
>>> +             /*
>>> +              * Poll can return POLLERR if the interface is down or
>> POLLOUT,
>>> +              * even if there are no extra buffers available.
>>> +              * This happens, because packet_poll() calls
>> datagram_poll()
>>> +              * which checks the space left in the socket buffer and in
>> the
>>> +              * case of packet_mmap the default socket buffer length
>>> +              * doesn't match the requested size for the tx_ring so
>> there
>>> +              * is always space left in socket buffer, which doesn't
>> seem
>>> +              * to be correlated to the requested size for the tx_ring
>>> +              * in packet_mmap.
>>> +              */
>>> +             if (tx_ring_status_remove_ts(&ppd->tp_status)
>>> +                     != TP_STATUS_AVAILABLE)
>>>                       break;
>>
>> In this case should we break or poll again?
>>
>> If 'POLLERR' is received when interface is down, it makes sense to break.
>> Do you
>> know if is there any other case that 'POLLERR' is returned?
>>
>> And for 'POLLOUT', when exactly this event sent? If 'POLLOUT' received,
>> can we
>> poll again to wait Tx ring available to send more packets?
>>
>>>
>>>               /* copy the tx frame data */
>>> @@ -242,6 +280,11 @@ eth_af_packet_tx(void *queue, struct rte_mbuf
>> **bufs, uint16_t nb_pkts)
>>>               rte_pktmbuf_free(mbuf);
>>>       }
>>>
>>> +     /*
>>> +      * We might have to ignore a few more errnos here since the packets
>>> +      * remain in the mmap-ed queue and will be sent later, presumably.
>>> +      */
>>> +
>>
>> Can you please describe above comment more? What errors ignored?
>> And won't all packets sent will below sendto() call?
>>
>>>       /* kick-off transmits */
>>>       if (sendto(pkt_q->sockfd, NULL, 0, MSG_DONTWAIT, NULL, 0) == -1 &&
>>>                       errno != ENOBUFS && errno != EAGAIN) {
>>>
>>
>>


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [dpdk-dev] [PATCH v2] net/af_packet: fix ignoring full ring on tx
  2021-09-13 13:45 ` [dpdk-dev] [PATCH v2] " Tudor Cornea
@ 2021-09-20 17:44   ` Ferruh Yigit
  2021-09-29 10:03     ` Tudor Cornea
  2021-11-02 15:47   ` [dpdk-dev] [PATCH v3] " Tudor Cornea
  1 sibling, 1 reply; 14+ messages in thread
From: Ferruh Yigit @ 2021-09-20 17:44 UTC (permalink / raw)
  To: Tudor Cornea; +Cc: linville, thomas, pogonarumihai, dev

On 9/13/2021 2:45 PM, Tudor Cornea wrote:
> The poll call can return POLLERR which is ignored, or it can return
> POLLOUT, even if there are no free frames in the mmap-ed area.
> 
> We can account for both of these cases by re-checking if the next
> frame is empty before writing into it.
> 
> Signed-off-by: Mihai Pogonaru <pogonarumihai@gmail.com>
> Signed-off-by: Tudor Cornea <tudor.cornea@gmail.com>
> ---
>  drivers/net/af_packet/rte_eth_af_packet.c | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
> 
> diff --git a/drivers/net/af_packet/rte_eth_af_packet.c b/drivers/net/af_packet/rte_eth_af_packet.c
> index b73b211..087c196 100644
> --- a/drivers/net/af_packet/rte_eth_af_packet.c
> +++ b/drivers/net/af_packet/rte_eth_af_packet.c
> @@ -216,6 +216,25 @@ eth_af_packet_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
>  		    (poll(&pfd, 1, -1) < 0))
>  			break;
>  
> +		/*
> +		 * Poll can return POLLERR if the interface is down
> +		 *
> +		 * It will almost always return POLLOUT, even if there
> +		 * are no extra buffers available
> +		 *
> +		 * This happens, because packet_poll() calls datagram_poll()
> +		 * which checks the space left in the socket buffer and,
> +		 * in the case of packet_mmap, the default socket buffer length
> +		 * doesn't match the requested size for the tx_ring.
> +		 * As such, there is almost always space left in socket buffer,
> +		 * which doesn't seem to be correlated to the requested size
> +		 * for the tx_ring in packet_mmap.
> +		 *
> +		 * This results in poll() returning POLLOUT.
> +		 */
> +		if (ppd->tp_status != TP_STATUS_AVAILABLE)
> +			break;
> +

If 'POLLOUT' doesn't indicate that there is space in the buffer, what is the
point of the 'poll()' at all?

What can we test/reproduce the mentioned behavior? Or is there a way to fix the
behavior of poll() or use an alternative of it?


OK to break on the 'POLLERR', I guess it can be detected in the 'pfd.revent'.


>  		/* copy the tx frame data */
>  		pbuf = (uint8_t *) ppd + TPACKET2_HDRLEN -
>  			sizeof(struct sockaddr_ll);
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [dpdk-dev] [PATCH v2] net/af_packet: fix ignoring full ring on tx
  2021-09-20 17:44   ` Ferruh Yigit
@ 2021-09-29 10:03     ` Tudor Cornea
  2021-10-05 15:11       ` Tudor Cornea
  0 siblings, 1 reply; 14+ messages in thread
From: Tudor Cornea @ 2021-09-29 10:03 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: linville, Thomas Monjalon, Mihai Pogonaru, dev

Hi Ferruh,

What you described above looks like a ring buffer with single producer and
> single consumer, and producer overwrites the not consumed items.


Indeed. This is also my understanding of the bug.
I am going to try to isolate the issue, and should probably be able to come
up with a script in a few days.

Our of curiosity, are you using an modified af_packet implementation in
> kernel
> for above described usage?


We are currently using an Ubuntu-based distro with a 4.15 Linux kernel.
We don't have any kernel patches for the af_packet implementation to my
knowledge (probably excepting patches that are back-ported by Ubuntu
maintainers from newer releases).


On Mon, 20 Sept 2021 at 20:44, Ferruh Yigit <ferruh.yigit@intel.com> wrote:

> On 9/13/2021 2:45 PM, Tudor Cornea wrote:
> > The poll call can return POLLERR which is ignored, or it can return
> > POLLOUT, even if there are no free frames in the mmap-ed area.
> >
> > We can account for both of these cases by re-checking if the next
> > frame is empty before writing into it.
> >
> > Signed-off-by: Mihai Pogonaru <pogonarumihai@gmail.com>
> > Signed-off-by: Tudor Cornea <tudor.cornea@gmail.com>
> > ---
> >  drivers/net/af_packet/rte_eth_af_packet.c | 19 +++++++++++++++++++
> >  1 file changed, 19 insertions(+)
> >
> > diff --git a/drivers/net/af_packet/rte_eth_af_packet.c
> b/drivers/net/af_packet/rte_eth_af_packet.c
> > index b73b211..087c196 100644
> > --- a/drivers/net/af_packet/rte_eth_af_packet.c
> > +++ b/drivers/net/af_packet/rte_eth_af_packet.c
> > @@ -216,6 +216,25 @@ eth_af_packet_tx(void *queue, struct rte_mbuf
> **bufs, uint16_t nb_pkts)
> >                   (poll(&pfd, 1, -1) < 0))
> >                       break;
> >
> > +             /*
> > +              * Poll can return POLLERR if the interface is down
> > +              *
> > +              * It will almost always return POLLOUT, even if there
> > +              * are no extra buffers available
> > +              *
> > +              * This happens, because packet_poll() calls
> datagram_poll()
> > +              * which checks the space left in the socket buffer and,
> > +              * in the case of packet_mmap, the default socket buffer
> length
> > +              * doesn't match the requested size for the tx_ring.
> > +              * As such, there is almost always space left in socket
> buffer,
> > +              * which doesn't seem to be correlated to the requested
> size
> > +              * for the tx_ring in packet_mmap.
> > +              *
> > +              * This results in poll() returning POLLOUT.
> > +              */
> > +             if (ppd->tp_status != TP_STATUS_AVAILABLE)
> > +                     break;
> > +
>
> If 'POLLOUT' doesn't indicate that there is space in the buffer, what is
> the
> point of the 'poll()' at all?
>
> What can we test/reproduce the mentioned behavior? Or is there a way to
> fix the
> behavior of poll() or use an alternative of it?
>
>
> OK to break on the 'POLLERR', I guess it can be detected in the
> 'pfd.revent'.
>
>
> >               /* copy the tx frame data */
> >               pbuf = (uint8_t *) ppd + TPACKET2_HDRLEN -
> >                       sizeof(struct sockaddr_ll);
> >
>
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [dpdk-dev] [PATCH v2] net/af_packet: fix ignoring full ring on tx
  2021-09-29 10:03     ` Tudor Cornea
@ 2021-10-05 15:11       ` Tudor Cornea
  2021-10-26 14:30         ` Ferruh Yigit
  0 siblings, 1 reply; 14+ messages in thread
From: Tudor Cornea @ 2021-10-05 15:11 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: linville, Thomas Monjalon, Mihai Pogonaru, dev

Hi Ferruh,

I have attempted to narrow down the issue.
I have the following bash script, which computes packet rates on an
interface.

[root@localhost ~]# cat compute-rates.sh
#!/usr/bin/env bash

if [[ ${#} -ne 2 ]]; then
    echo "Usage: ${0} <iface-name> <sleep-interval-seconds>"
    exit 1
fi

IFACE_NAME="${1}"
SLEEP_INTERVAL_SECONDS="${2}"
TMP_STATS_FILE="/tmp/netstat"

# Clear Previous stats file
echo "0 0 0 0" > "${TMP_STATS_FILE}"

echo "Press CTRL+C to exit..."

while true; do
    export "RxB=0" "RxP=0" "TxB=0" "TxP=0"

    # Extract Rx{Bytes,Packets} and Tx{Bytes,Packets} and
    # format the output. Individual fields will be exported
    export $(\
        ifconfig "${IFACE_NAME}" \
            | grep 'packets' \
            | awk '{print $5, $3}' \
            | xargs echo \
            | sed -E -e \
                "s/([0-9]+) ([0-9]+) ([0-9]+) ([0-9]+)/RxB=\1 RxP=\2
TxB=\3 TxP=\4/")

    # Print Packet and Byte Rates
    # Format: | Rx Bytes | Rx Packets | Tx Bytes | Tx Packets |

    echo "${RxB}" "${RxP}" "${TxB}" "${TxP}" $(cat "${TMP_STATS_FILE}") \
        | awk '{print "RxB="$1-$5, "RxP="$2-$6, "TxB="$3-$7, "TxP="$4-$8}'

    # Save the new values
    echo "${RxB}" "${RxP}" "${TxB}" "${TxP}" > "${TMP_STATS_FILE}"

    sleep "${SLEEP_INTERVAL_SECONDS}"

done

On the transmit side, I'm using the engine behind [1] with the af_packet
PMD.

The configuration for the af_packet PMD is the following:
--vdev=net_af_packet0,iface=eth1,blocksz=16384,framesz=8192,framecnt=2048,qpairs=1,qdisc_bypass=0

I'm configuring a Tx rate of 335 packets / second and a packet size of 300
Bytes.
These seem to be the values using which we seem to have better chances of
seeing the problem. I suspect it might also be linked with the af_packet
configuration.

I'm starting traffic using the specified configuration, and in parallel,
running the script that computes the rates as follows:
./compute-rates.sh eth1 0.1

Initially, the packet rates seem steady

RxB=0 RxP=0 TxB=10952 TxP=37
RxB=0 RxP=0 TxB=10656 TxP=36
RxB=0 RxP=0 TxB=10656 TxP=36
RxB=0 RxP=0 TxB=10656 TxP=36
RxB=0 RxP=0 TxB=10952 TxP=37
RxB=0 RxP=0 TxB=10952 TxP=37
RxB=0 RxP=0 TxB=10360 TxP=35
RxB=0 RxP=0 TxB=10952 TxP=37

[...]

After a while, we toggle the interface up / down with a sleep between the
steps. I suspect the length of the sleep might be a variable in the
equation.

ifconfig eth1 down; sleep 7; ifconfig eth1 up


What we see, is that even after the interface is toggled back up, the rates
never seem to recover.

RxB=0 RxP=0 TxB=0 TxP=0
RxB=0 RxP=0 TxB=0 TxP=0
RxB=0 RxP=0 TxB=0 TxP=0
RxB=0 RxP=0 TxB=0 TxP=0
RxB=0 RxP=0 TxB=2072 TxP=7
RxB=0 RxP=0 TxB=10360 TxP=35
RxB=0 RxP=0 TxB=10360 TxP=35
RxB=0 RxP=0 TxB=10360 TxP=35
RxB=0 RxP=0 TxB=10360 TxP=35
RxB=0 RxP=0 TxB=10360 TxP=35
RxB=0 RxP=0 TxB=10360 TxP=35
RxB=0 RxP=0 TxB=10360 TxP=35
RxB=0 RxP=0 TxB=10360 TxP=35
RxB=0 RxP=0 TxB=521256 TxP=1761
RxB=0 RxP=0 TxB=0 TxP=0
RxB=0 RxP=0 TxB=0 TxP=0
RxB=0 RxP=0 TxB=0 TxP=0

[...]


I've attempted to mirror the same behavior using dpdk-pktgen [2] on a
different machine (Ubuntu 20.04). This time, af_packet runs on top of
a Linux virtio_net interface.

I seem to be getting a  similar behavior. I have used the following
dpdk-pktgen configuration and run-time settings


pktgen \
    -l 1-4 \
    -n 4 \
    --proc-type=primary \
    --no-pci \
    --no-telemetry \
    --no-huge \
    -m 512 \
    --vdev=net_af_packet0,iface=eth1,blocksz=16384,framesz=8192,framecnt=2048,qpairs=1,qdisc_bypass=0
\
    -- \
    -P \
    -T \
    -m "3.0" \
    -f themes/black-yellow.theme

set 0 size 300
set 0 rate 0.008
set 0 burst 1
start 0


[1] https://github.com/open-traffic-generator/ixia-c
[2] http://code.dpdk.org/pktgen-dpdk/pktgen-20.11.2/source/INSTALL.md

On Wed, 29 Sept 2021 at 13:03, Tudor Cornea <tudor.cornea@gmail.com> wrote:

> Hi Ferruh,
>
> What you described above looks like a ring buffer with single producer and
>> single consumer, and producer overwrites the not consumed items.
>
>
> Indeed. This is also my understanding of the bug.
> I am going to try to isolate the issue, and should probably be able to
> come up with a script in a few days.
>
> Our of curiosity, are you using an modified af_packet implementation in
>> kernel
>> for above described usage?
>
>
> We are currently using an Ubuntu-based distro with a 4.15 Linux kernel.
> We don't have any kernel patches for the af_packet implementation to my
> knowledge (probably excepting patches that are back-ported by Ubuntu
> maintainers from newer releases).
>
>
> On Mon, 20 Sept 2021 at 20:44, Ferruh Yigit <ferruh.yigit@intel.com>
> wrote:
>
>> On 9/13/2021 2:45 PM, Tudor Cornea wrote:
>> > The poll call can return POLLERR which is ignored, or it can return
>> > POLLOUT, even if there are no free frames in the mmap-ed area.
>> >
>> > We can account for both of these cases by re-checking if the next
>> > frame is empty before writing into it.
>> >
>> > Signed-off-by: Mihai Pogonaru <pogonarumihai@gmail.com>
>> > Signed-off-by: Tudor Cornea <tudor.cornea@gmail.com>
>> > ---
>> >  drivers/net/af_packet/rte_eth_af_packet.c | 19 +++++++++++++++++++
>> >  1 file changed, 19 insertions(+)
>> >
>> > diff --git a/drivers/net/af_packet/rte_eth_af_packet.c
>> b/drivers/net/af_packet/rte_eth_af_packet.c
>> > index b73b211..087c196 100644
>> > --- a/drivers/net/af_packet/rte_eth_af_packet.c
>> > +++ b/drivers/net/af_packet/rte_eth_af_packet.c
>> > @@ -216,6 +216,25 @@ eth_af_packet_tx(void *queue, struct rte_mbuf
>> **bufs, uint16_t nb_pkts)
>> >                   (poll(&pfd, 1, -1) < 0))
>> >                       break;
>> >
>> > +             /*
>> > +              * Poll can return POLLERR if the interface is down
>> > +              *
>> > +              * It will almost always return POLLOUT, even if there
>> > +              * are no extra buffers available
>> > +              *
>> > +              * This happens, because packet_poll() calls
>> datagram_poll()
>> > +              * which checks the space left in the socket buffer and,
>> > +              * in the case of packet_mmap, the default socket buffer
>> length
>> > +              * doesn't match the requested size for the tx_ring.
>> > +              * As such, there is almost always space left in socket
>> buffer,
>> > +              * which doesn't seem to be correlated to the requested
>> size
>> > +              * for the tx_ring in packet_mmap.
>> > +              *
>> > +              * This results in poll() returning POLLOUT.
>> > +              */
>> > +             if (ppd->tp_status != TP_STATUS_AVAILABLE)
>> > +                     break;
>> > +
>>
>> If 'POLLOUT' doesn't indicate that there is space in the buffer, what is
>> the
>> point of the 'poll()' at all?
>>
>> What can we test/reproduce the mentioned behavior? Or is there a way to
>> fix the
>> behavior of poll() or use an alternative of it?
>>
>>
>> OK to break on the 'POLLERR', I guess it can be detected in the
>> 'pfd.revent'.
>>
>>
>> >               /* copy the tx frame data */
>> >               pbuf = (uint8_t *) ppd + TPACKET2_HDRLEN -
>> >                       sizeof(struct sockaddr_ll);
>> >
>>
>>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [dpdk-dev] [PATCH v2] net/af_packet: fix ignoring full ring on tx
  2021-10-05 15:11       ` Tudor Cornea
@ 2021-10-26 14:30         ` Ferruh Yigit
  2021-11-02 15:24           ` Tudor Cornea
  0 siblings, 1 reply; 14+ messages in thread
From: Ferruh Yigit @ 2021-10-26 14:30 UTC (permalink / raw)
  To: Tudor Cornea; +Cc: linville, Thomas Monjalon, Mihai Pogonaru, dev

On 10/5/2021 4:11 PM, Tudor Cornea wrote:
> Hi Ferruh,
> 
> I have attempted to narrow down the issue.
> I have the following bash script, which computes packet rates on an
> interface.
> 
> [root@localhost ~]# cat compute-rates.sh
> #!/usr/bin/env bash
> 
> if [[ ${#} -ne 2 ]]; then
>      echo "Usage: ${0} <iface-name> <sleep-interval-seconds>"
>      exit 1
> fi
> 
> IFACE_NAME="${1}"
> SLEEP_INTERVAL_SECONDS="${2}"
> TMP_STATS_FILE="/tmp/netstat"
> 
> # Clear Previous stats file
> echo "0 0 0 0" > "${TMP_STATS_FILE}"
> 
> echo "Press CTRL+C to exit..."
> 
> while true; do
>      export "RxB=0" "RxP=0" "TxB=0" "TxP=0"
> 
>      # Extract Rx{Bytes,Packets} and Tx{Bytes,Packets} and
>      # format the output. Individual fields will be exported
>      export $(\
>          ifconfig "${IFACE_NAME}" \
>              | grep 'packets' \
>              | awk '{print $5, $3}' \
>              | xargs echo \
>              | sed -E -e \
>                  "s/([0-9]+) ([0-9]+) ([0-9]+) ([0-9]+)/RxB=\1 RxP=\2
> TxB=\3 TxP=\4/")
> 
>      # Print Packet and Byte Rates
>      # Format: | Rx Bytes | Rx Packets | Tx Bytes | Tx Packets |
> 
>      echo "${RxB}" "${RxP}" "${TxB}" "${TxP}" $(cat "${TMP_STATS_FILE}") \
>          | awk '{print "RxB="$1-$5, "RxP="$2-$6, "TxB="$3-$7, "TxP="$4-$8}'
> 
>      # Save the new values
>      echo "${RxB}" "${RxP}" "${TxB}" "${TxP}" > "${TMP_STATS_FILE}"
> 
>      sleep "${SLEEP_INTERVAL_SECONDS}"
> 
> done
> 
> On the transmit side, I'm using the engine behind [1] with the af_packet
> PMD.
> 
> The configuration for the af_packet PMD is the following:
> --vdev=net_af_packet0,iface=eth1,blocksz=16384,framesz=8192,framecnt=2048,qpairs=1,qdisc_bypass=0
> 
> I'm configuring a Tx rate of 335 packets / second and a packet size of 300
> Bytes.
> These seem to be the values using which we seem to have better chances of
> seeing the problem. I suspect it might also be linked with the af_packet
> configuration.
> 
> I'm starting traffic using the specified configuration, and in parallel,
> running the script that computes the rates as follows:
> ./compute-rates.sh eth1 0.1
> 
> Initially, the packet rates seem steady
> 
> RxB=0 RxP=0 TxB=10952 TxP=37
> RxB=0 RxP=0 TxB=10656 TxP=36
> RxB=0 RxP=0 TxB=10656 TxP=36
> RxB=0 RxP=0 TxB=10656 TxP=36
> RxB=0 RxP=0 TxB=10952 TxP=37
> RxB=0 RxP=0 TxB=10952 TxP=37
> RxB=0 RxP=0 TxB=10360 TxP=35
> RxB=0 RxP=0 TxB=10952 TxP=37
> 
> [...]
> 
> After a while, we toggle the interface up / down with a sleep between the
> steps. I suspect the length of the sleep might be a variable in the
> equation.
> 
> ifconfig eth1 down; sleep 7; ifconfig eth1 up
> 
> 
> What we see, is that even after the interface is toggled back up, the rates
> never seem to recover.
> 
> RxB=0 RxP=0 TxB=0 TxP=0
> RxB=0 RxP=0 TxB=0 TxP=0
> RxB=0 RxP=0 TxB=0 TxP=0
> RxB=0 RxP=0 TxB=0 TxP=0
> RxB=0 RxP=0 TxB=2072 TxP=7
> RxB=0 RxP=0 TxB=10360 TxP=35
> RxB=0 RxP=0 TxB=10360 TxP=35
> RxB=0 RxP=0 TxB=10360 TxP=35
> RxB=0 RxP=0 TxB=10360 TxP=35
> RxB=0 RxP=0 TxB=10360 TxP=35
> RxB=0 RxP=0 TxB=10360 TxP=35
> RxB=0 RxP=0 TxB=10360 TxP=35
> RxB=0 RxP=0 TxB=10360 TxP=35
> RxB=0 RxP=0 TxB=521256 TxP=1761
> RxB=0 RxP=0 TxB=0 TxP=0
> RxB=0 RxP=0 TxB=0 TxP=0
> RxB=0 RxP=0 TxB=0 TxP=0
> 
> [...]
> 
> 
> I've attempted to mirror the same behavior using dpdk-pktgen [2] on a
> different machine (Ubuntu 20.04). This time, af_packet runs on top of
> a Linux virtio_net interface.
> 
> I seem to be getting a  similar behavior. I have used the following
> dpdk-pktgen configuration and run-time settings
> 
> 
> pktgen \
>      -l 1-4 \
>      -n 4 \
>      --proc-type=primary \
>      --no-pci \
>      --no-telemetry \
>      --no-huge \
>      -m 512 \
>      --vdev=net_af_packet0,iface=eth1,blocksz=16384,framesz=8192,framecnt=2048,qpairs=1,qdisc_bypass=0
> \
>      -- \
>      -P \
>      -T \
>      -m "3.0" \
>      -f themes/black-yellow.theme
> 
> set 0 size 300
> set 0 rate 0.008
> set 0 burst 1
> start 0
> 
> 
> [1] https://github.com/open-traffic-generator/ixia-c
> [2] http://code.dpdk.org/pktgen-dpdk/pktgen-20.11.2/source/INSTALL.md
> 
> On Wed, 29 Sept 2021 at 13:03, Tudor Cornea <tudor.cornea@gmail.com> wrote:
> 

Hi Tudor,

I have used testpmd, 'txonly' forwarding. Tx recovers after interface up,
but by adding some debug logs I can see 'poll()' returns with POLLOUT even
there is no space in the buffer.

According the logic in the PMD, when 'poll()' returns success, it expects
to have some space in the Tx buffer.

So I agree to add the check.

Only have a question on the POLLERR, should we separate the POLLERR check
to cover ifdown case, what do you think about following logic:

if (!TP_STATUS_AVAILABLE) {
     if (poll() < 0)
         break;
     if (pfd.revents & POLLERR)
         break;
}

if (!TP_STATUS_AVAILABLE)
     break;



>> Hi Ferruh,
>>
>> What you described above looks like a ring buffer with single producer and
>>> single consumer, and producer overwrites the not consumed items.
>>
>>
>> Indeed. This is also my understanding of the bug.
>> I am going to try to isolate the issue, and should probably be able to
>> come up with a script in a few days.
>>
>> Our of curiosity, are you using an modified af_packet implementation in
>>> kernel
>>> for above described usage?
>>
>>
>> We are currently using an Ubuntu-based distro with a 4.15 Linux kernel.
>> We don't have any kernel patches for the af_packet implementation to my
>> knowledge (probably excepting patches that are back-ported by Ubuntu
>> maintainers from newer releases).
>>
>>
>> On Mon, 20 Sept 2021 at 20:44, Ferruh Yigit <ferruh.yigit@intel.com>
>> wrote:
>>
>>> On 9/13/2021 2:45 PM, Tudor Cornea wrote:
>>>> The poll call can return POLLERR which is ignored, or it can return
>>>> POLLOUT, even if there are no free frames in the mmap-ed area.
>>>>
>>>> We can account for both of these cases by re-checking if the next
>>>> frame is empty before writing into it.
>>>>
>>>> Signed-off-by: Mihai Pogonaru <pogonarumihai@gmail.com>
>>>> Signed-off-by: Tudor Cornea <tudor.cornea@gmail.com>
>>>> ---
>>>>   drivers/net/af_packet/rte_eth_af_packet.c | 19 +++++++++++++++++++
>>>>   1 file changed, 19 insertions(+)
>>>>
>>>> diff --git a/drivers/net/af_packet/rte_eth_af_packet.c
>>> b/drivers/net/af_packet/rte_eth_af_packet.c
>>>> index b73b211..087c196 100644
>>>> --- a/drivers/net/af_packet/rte_eth_af_packet.c
>>>> +++ b/drivers/net/af_packet/rte_eth_af_packet.c
>>>> @@ -216,6 +216,25 @@ eth_af_packet_tx(void *queue, struct rte_mbuf
>>> **bufs, uint16_t nb_pkts)
>>>>                    (poll(&pfd, 1, -1) < 0))
>>>>                        break;
>>>>
>>>> +             /*
>>>> +              * Poll can return POLLERR if the interface is down
>>>> +              *
>>>> +              * It will almost always return POLLOUT, even if there
>>>> +              * are no extra buffers available
>>>> +              *
>>>> +              * This happens, because packet_poll() calls
>>> datagram_poll()
>>>> +              * which checks the space left in the socket buffer and,
>>>> +              * in the case of packet_mmap, the default socket buffer
>>> length
>>>> +              * doesn't match the requested size for the tx_ring.
>>>> +              * As such, there is almost always space left in socket
>>> buffer,
>>>> +              * which doesn't seem to be correlated to the requested
>>> size
>>>> +              * for the tx_ring in packet_mmap.
>>>> +              *
>>>> +              * This results in poll() returning POLLOUT.
>>>> +              */
>>>> +             if (ppd->tp_status != TP_STATUS_AVAILABLE)
>>>> +                     break;
>>>> +
>>>
>>> If 'POLLOUT' doesn't indicate that there is space in the buffer, what is
>>> the
>>> point of the 'poll()' at all?
>>>
>>> What can we test/reproduce the mentioned behavior? Or is there a way to
>>> fix the
>>> behavior of poll() or use an alternative of it?
>>>
>>>
>>> OK to break on the 'POLLERR', I guess it can be detected in the
>>> 'pfd.revent'.
>>>
>>>
>>>>                /* copy the tx frame data */
>>>>                pbuf = (uint8_t *) ppd + TPACKET2_HDRLEN -
>>>>                        sizeof(struct sockaddr_ll);
>>>>
>>>
>>>


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [dpdk-dev] [PATCH v2] net/af_packet: fix ignoring full ring on tx
  2021-10-26 14:30         ` Ferruh Yigit
@ 2021-11-02 15:24           ` Tudor Cornea
  0 siblings, 0 replies; 14+ messages in thread
From: Tudor Cornea @ 2021-11-02 15:24 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: linville, Thomas Monjalon, Mihai Pogonaru, dev

On Tue, 26 Oct 2021 at 17:41, Ferruh Yigit <ferruh.yigit@intel.com> wrote:

> Hi Tudor,
>
> I have used testpmd, 'txonly' forwarding. Tx recovers after interface up,
> but by adding some debug logs I can see 'poll()' returns with POLLOUT even
> there is no space in the buffer.
>
> According the logic in the PMD, when 'poll()' returns success, it expects
> to have some space in the Tx buffer.
>
> So I agree to add the check.
>
> Only have a question on the POLLERR, should we separate the POLLERR check
> to cover ifdown case, what do you think about following logic:
>
> if (!TP_STATUS_AVAILABLE) {
>      if (poll() < 0)
>          break;
>      if (pfd.revents & POLLERR)
>          break;
> }
>
> if (!TP_STATUS_AVAILABLE)
>      break;
>
>
Hi Ferruh,

Thanks for the suggestion.
I was thinking of adding this check, and intuitively, it seems correct. I
tried to do some further testing.

I tested with the POLLERR check, and I don't see the issue anymore.
However, if I remove the second if (!TP_STATUS_AVAILABLE), I seem to get
bursts of 2048 packets followed by periods of not sending anything when
toggling the interface link state.
Without the second if(!TP_STATUS_AVAILABLE) statement, the issue seems to
reproduce, regardless if I add the check for POLLERR or not.

RxB=0 RxP=0 TxB=0 TxP=0
RxB=0 RxP=0 TxB=0 TxP=0
RxB=0 RxP=0 TxB=0 TxP=0
RxB=0 RxP=0 TxB=0 TxP=0
RxB=0 RxP=0 TxB=0 TxP=0
RxB=0 RxP=0 TxB=0 TxP=0
RxB=0 RxP=0 TxB=0 TxP=0
RxB=0 RxP=0 TxB=606208 TxP=2048
RxB=0 RxP=0 TxB=0 TxP=0
RxB=0 RxP=0 TxB=0 TxP=0

I will send an updated version of the patch.

Thanks,
Tudor

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [dpdk-dev] [PATCH v3] net/af_packet: fix ignoring full ring on tx
  2021-09-13 13:45 ` [dpdk-dev] [PATCH v2] " Tudor Cornea
  2021-09-20 17:44   ` Ferruh Yigit
@ 2021-11-02 15:47   ` Tudor Cornea
  2021-11-02 16:47     ` Ferruh Yigit
  2021-11-03  9:31     ` [dpdk-dev] [PATCH v4] " Tudor Cornea
  1 sibling, 2 replies; 14+ messages in thread
From: Tudor Cornea @ 2021-11-02 15:47 UTC (permalink / raw)
  To: ferruh.yigit; +Cc: linville, thomas, pogonarumihai, dev, Tudor Cornea

The poll call can return POLLERR which is ignored, or it can return
POLLOUT, even if there are no free frames in the mmap-ed area.

We can account for both of these cases by re-checking if the next
frame is empty before writing into it.

We have attempted to reproduce this issue with pktgen-dpdk, using the
following configuration.

pktgen -l 1-4 -n 4 --proc-type=primary --no-pci --no-telemetry \
    --no-huge -m 512 \
    --vdev=net_af_packet0,iface=eth1,blocksz=16384,framesz=8192, \
    framecnt=2048,qpairs=1,qdisc_bypass=0 \
    -- \
    -P \
    -T \
    -m "3.0" \
    -f themes/black-yellow.theme

We configure a low tx rate (~ 335 packets / second) and a small
packet size, of about 300 Bytes from the pktgen CLI.

set 0 size 300
set 0 rate 0.008
set 0 burst 1
start 0

After bringing the interface down, and up again, we seem to arrive
in a state in which the tx rate is inconsistent, and does not recover.

ifconfig eth1 down; sleep 7; ifconfig eth1 up

[1] http://code.dpdk.org/pktgen-dpdk/pktgen-20.11.2/source/INSTALL.md

Signed-off-by: Mihai Pogonaru <pogonarumihai@gmail.com>
Signed-off-by: Tudor Cornea <tudor.cornea@gmail.com>

---
v2:
* Added check for POLLERR
* Used tx_ring_status_available() for checking TP_STATUS_AVAILABLE
---
 drivers/net/af_packet/rte_eth_af_packet.c | 26 ++++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/drivers/net/af_packet/rte_eth_af_packet.c b/drivers/net/af_packet/rte_eth_af_packet.c
index 559f5a0..d3d3104 100644
--- a/drivers/net/af_packet/rte_eth_af_packet.c
+++ b/drivers/net/af_packet/rte_eth_af_packet.c
@@ -237,8 +237,30 @@ eth_af_packet_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 		}
 
 		/* point at the next incoming frame */
-		if (!tx_ring_status_available(ppd->tp_status) &&
-		    poll(&pfd, 1, -1) < 0)
+		if (!tx_ring_status_available(ppd->tp_status)) {
+			if (poll(&pfd, 1, -1) < 0)
+				break;
+			if (pfd.revents & POLLERR)
+				break;
+		}
+
+		/*
+		 * Poll can return POLLERR if the interface is down
+		 *
+		 * It will almost always return POLLOUT, even if there
+		 * are no extra buffers available
+		 *
+		 * This happens, because packet_poll() calls datagram_poll()
+		 * which checks the space left in the socket buffer and,
+		 * in the case of packet_mmap, the default socket buffer length
+		 * doesn't match the requested size for the tx_ring.
+		 * As such, there is almost always space left in socket buffer,
+		 * which doesn't seem to be correlated to the requested size
+		 * for the tx_ring in packet_mmap.
+		 *
+		 * This results in poll() returning POLLOUT.
+		 */
+		if (!tx_ring_status_available(ppd->tp_status))
 			break;
 
 		/* copy the tx frame data */
-- 
2.7.4


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [dpdk-dev] [PATCH v3] net/af_packet: fix ignoring full ring on tx
  2021-11-02 15:47   ` [dpdk-dev] [PATCH v3] " Tudor Cornea
@ 2021-11-02 16:47     ` Ferruh Yigit
  2021-11-03  9:31     ` [dpdk-dev] [PATCH v4] " Tudor Cornea
  1 sibling, 0 replies; 14+ messages in thread
From: Ferruh Yigit @ 2021-11-02 16:47 UTC (permalink / raw)
  To: Tudor Cornea; +Cc: linville, thomas, pogonarumihai, dev

On 11/2/2021 3:47 PM, Tudor Cornea wrote:
> The poll call can return POLLERR which is ignored, or it can return
> POLLOUT, even if there are no free frames in the mmap-ed area.
> 
> We can account for both of these cases by re-checking if the next
> frame is empty before writing into it.
> 
> We have attempted to reproduce this issue with pktgen-dpdk, using the
> following configuration.
> 
> pktgen -l 1-4 -n 4 --proc-type=primary --no-pci --no-telemetry \
>      --no-huge -m 512 \
>      --vdev=net_af_packet0,iface=eth1,blocksz=16384,framesz=8192, \
>      framecnt=2048,qpairs=1,qdisc_bypass=0 \
>      -- \
>      -P \
>      -T \
>      -m "3.0" \
>      -f themes/black-yellow.theme
> 
> We configure a low tx rate (~ 335 packets / second) and a small
> packet size, of about 300 Bytes from the pktgen CLI.
> 
> set 0 size 300
> set 0 rate 0.008
> set 0 burst 1
> start 0
> 
> After bringing the interface down, and up again, we seem to arrive
> in a state in which the tx rate is inconsistent, and does not recover.
> 
> ifconfig eth1 down; sleep 7; ifconfig eth1 up
> 
> [1] http://code.dpdk.org/pktgen-dpdk/pktgen-20.11.2/source/INSTALL.md
> 
> Signed-off-by: Mihai Pogonaru <pogonarumihai@gmail.com>
> Signed-off-by: Tudor Cornea <tudor.cornea@gmail.com>
> 
> ---
> v2:
> * Added check for POLLERR
> * Used tx_ring_status_available() for checking TP_STATUS_AVAILABLE
> ---
>   drivers/net/af_packet/rte_eth_af_packet.c | 26 ++++++++++++++++++++++++--
>   1 file changed, 24 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/af_packet/rte_eth_af_packet.c b/drivers/net/af_packet/rte_eth_af_packet.c
> index 559f5a0..d3d3104 100644
> --- a/drivers/net/af_packet/rte_eth_af_packet.c
> +++ b/drivers/net/af_packet/rte_eth_af_packet.c
> @@ -237,8 +237,30 @@ eth_af_packet_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
>   		}
>   
>   		/* point at the next incoming frame */
> -		if (!tx_ring_status_available(ppd->tp_status) &&
> -		    poll(&pfd, 1, -1) < 0)
> +		if (!tx_ring_status_available(ppd->tp_status)) {
> +			if (poll(&pfd, 1, -1) < 0)
> +				break;
> +			if (pfd.revents & POLLERR)
> +				break;
> +		}
> +
> +		/*
> +		 * Poll can return POLLERR if the interface is down
> +		 *

Above comment fits better to above check, or can remove it completely.
Can you send a new version, or I can remove it while merging if you prefer?

Except from above comment,
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

> +		 * It will almost always return POLLOUT, even if there
> +		 * are no extra buffers available
> +		 *
> +		 * This happens, because packet_poll() calls datagram_poll()
> +		 * which checks the space left in the socket buffer and,
> +		 * in the case of packet_mmap, the default socket buffer length
> +		 * doesn't match the requested size for the tx_ring.
> +		 * As such, there is almost always space left in socket buffer,
> +		 * which doesn't seem to be correlated to the requested size
> +		 * for the tx_ring in packet_mmap.
> +		 *
> +		 * This results in poll() returning POLLOUT.
> +		 */
> +		if (!tx_ring_status_available(ppd->tp_status))
>   			break;
>   
>   		/* copy the tx frame data */
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [dpdk-dev] [PATCH v4] net/af_packet: fix ignoring full ring on tx
  2021-11-02 15:47   ` [dpdk-dev] [PATCH v3] " Tudor Cornea
  2021-11-02 16:47     ` Ferruh Yigit
@ 2021-11-03  9:31     ` Tudor Cornea
  2021-11-04 12:07       ` Ferruh Yigit
  1 sibling, 1 reply; 14+ messages in thread
From: Tudor Cornea @ 2021-11-03  9:31 UTC (permalink / raw)
  To: ferruh.yigit; +Cc: linville, thomas, pogonarumihai, dev, Tudor Cornea

The poll call can return POLLERR which is ignored, or it can return
POLLOUT, even if there are no free frames in the mmap-ed area.

We can account for both of these cases by re-checking if the next
frame is empty before writing into it.

We have attempted to reproduce this issue with pktgen-dpdk, using the
following configuration.

pktgen -l 1-4 -n 4 --proc-type=primary --no-pci --no-telemetry \
    --no-huge -m 512 \
    --vdev=net_af_packet0,iface=eth1,blocksz=16384,framesz=8192, \
    framecnt=2048,qpairs=1,qdisc_bypass=0 \
    -- \
    -P \
    -T \
    -m "3.0" \
    -f themes/black-yellow.theme

We configure a low tx rate (~ 335 packets / second) and a small
packet size, of about 300 bytes from the pktgen CLI.

set 0 size 300
set 0 rate 0.008
set 0 burst 1
start 0

After bringing the interface down, and up again, we seem to arrive
in a state in which the tx rate is inconsistent, and does not recover.

ifconfig eth1 down; sleep 7; ifconfig eth1 up

[1] http://code.dpdk.org/pktgen-dpdk/pktgen-20.11.2/source/INSTALL.md

Signed-off-by: Mihai Pogonaru <pogonarumihai@gmail.com>
Signed-off-by: Tudor Cornea <tudor.cornea@gmail.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

---
v4:
* Updated comment regarding POLLERR
  Removed some extra details, and left notice about poll() returning
  POLLERR, in case of interface link down.
v3:
* Added check for POLLERR
* Used tx_ring_status_available() for checking TP_STATUS_AVAILABLE
---
 drivers/net/af_packet/rte_eth_af_packet.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/net/af_packet/rte_eth_af_packet.c b/drivers/net/af_packet/rte_eth_af_packet.c
index 559f5a0..dd6d165 100644
--- a/drivers/net/af_packet/rte_eth_af_packet.c
+++ b/drivers/net/af_packet/rte_eth_af_packet.c
@@ -237,8 +237,16 @@ eth_af_packet_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 		}
 
 		/* point at the next incoming frame */
-		if (!tx_ring_status_available(ppd->tp_status) &&
-		    poll(&pfd, 1, -1) < 0)
+		if (!tx_ring_status_available(ppd->tp_status)) {
+			if (poll(&pfd, 1, -1) < 0)
+				break;
+
+			/* poll() can return POLLERR if the interface is down */
+			if (pfd.revents & POLLERR)
+				break;
+		}
+
+		if (!tx_ring_status_available(ppd->tp_status))
 			break;
 
 		/* copy the tx frame data */
-- 
2.7.4


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [dpdk-dev] [PATCH v4] net/af_packet: fix ignoring full ring on tx
  2021-11-03  9:31     ` [dpdk-dev] [PATCH v4] " Tudor Cornea
@ 2021-11-04 12:07       ` Ferruh Yigit
  0 siblings, 0 replies; 14+ messages in thread
From: Ferruh Yigit @ 2021-11-04 12:07 UTC (permalink / raw)
  To: Tudor Cornea; +Cc: linville, thomas, pogonarumihai, dev

On 11/3/2021 9:31 AM, Tudor Cornea wrote:
> The poll call can return POLLERR which is ignored, or it can return
> POLLOUT, even if there are no free frames in the mmap-ed area.
> 
> We can account for both of these cases by re-checking if the next
> frame is empty before writing into it.
> 
> We have attempted to reproduce this issue with pktgen-dpdk, using the
> following configuration.
> 
> pktgen -l 1-4 -n 4 --proc-type=primary --no-pci --no-telemetry \
>      --no-huge -m 512 \
>      --vdev=net_af_packet0,iface=eth1,blocksz=16384,framesz=8192, \
>      framecnt=2048,qpairs=1,qdisc_bypass=0 \
>      -- \
>      -P \
>      -T \
>      -m "3.0" \
>      -f themes/black-yellow.theme
> 
> We configure a low tx rate (~ 335 packets / second) and a small
> packet size, of about 300 bytes from the pktgen CLI.
> 
> set 0 size 300
> set 0 rate 0.008
> set 0 burst 1
> start 0
> 
> After bringing the interface down, and up again, we seem to arrive
> in a state in which the tx rate is inconsistent, and does not recover.
> 
> ifconfig eth1 down; sleep 7; ifconfig eth1 up
> 
> [1] http://code.dpdk.org/pktgen-dpdk/pktgen-20.11.2/source/INSTALL.md
> 
> Signed-off-by: Mihai Pogonaru <pogonarumihai@gmail.com>
> Signed-off-by: Tudor Cornea <tudor.cornea@gmail.com>
> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

Used comment from v3, only POLLERR related part kept as in this patch.

     Fixes: 364e08f2bbc0 ("af_packet: add PMD for AF_PACKET-based virtual devices")
     Cc: stable@dpdk.org

Applied to dpdk-next-net/main, thanks.


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2021-11-04 12:07 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-20 13:39 [dpdk-dev] [PATCH] net/af_packet: fix ignoring full ring on tx Tudor Cornea
2021-09-01 16:34 ` Ferruh Yigit
2021-09-06 10:23   ` Tudor Cornea
2021-09-20 17:11     ` Ferruh Yigit
2021-09-13 13:45 ` [dpdk-dev] [PATCH v2] " Tudor Cornea
2021-09-20 17:44   ` Ferruh Yigit
2021-09-29 10:03     ` Tudor Cornea
2021-10-05 15:11       ` Tudor Cornea
2021-10-26 14:30         ` Ferruh Yigit
2021-11-02 15:24           ` Tudor Cornea
2021-11-02 15:47   ` [dpdk-dev] [PATCH v3] " Tudor Cornea
2021-11-02 16:47     ` Ferruh Yigit
2021-11-03  9:31     ` [dpdk-dev] [PATCH v4] " Tudor Cornea
2021-11-04 12:07       ` Ferruh Yigit

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).