DPDK patches and discussions
 help / color / mirror / Atom feed
From: Ferruh Yigit <ferruh.yigit@intel.com>
To: Tudor Cornea <tudor.cornea@gmail.com>, <linville@tuxdriver.com>
Cc: <thomas@monjalon.net>, <dev@dpdk.org>,
	Mihai Pogonaru <pogonarumihai@gmail.com>
Subject: Re: [dpdk-dev] [PATCH] net/af_packet: fix ignoring full ring on tx
Date: Wed, 1 Sep 2021 17:34:20 +0100	[thread overview]
Message-ID: <3154f4e8-9f0e-b04f-6e8f-096dc39f489c@intel.com> (raw)
In-Reply-To: <1629466761-127333-1-git-send-email-tudor.cornea@gmail.com>

On 8/20/2021 2:39 PM, Tudor Cornea wrote:
> The poll call can return POLLERR which is ignored, or it can return
> POLLOUT, even if there are no free frames in the mmap-ed area.
> 
> We can account for both of these cases by re-checking if the next
> frame is empty before writing into it.
> 
> We also now eliminate the timestamp status from the frame status.
> 

Hi Tudor,

Would you mind separate timestamp status fix to its own patch? I think better to
fix 'ignoring full Tx ring' first, to not make it dependent to timestamp patch.

> Signed-off-by: Mihai Pogonaru <pogonarumihai@gmail.com>
> Signed-off-by: Tudor Cornea <tudor.cornea@gmail.com>
> ---
>  drivers/net/af_packet/rte_eth_af_packet.c | 47 +++++++++++++++++++++++++++++--
>  1 file changed, 45 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/af_packet/rte_eth_af_packet.c b/drivers/net/af_packet/rte_eth_af_packet.c
> index b73b211..3845df5 100644
> --- a/drivers/net/af_packet/rte_eth_af_packet.c
> +++ b/drivers/net/af_packet/rte_eth_af_packet.c
> @@ -167,6 +167,12 @@ eth_af_packet_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
>  	return num_rx;
>  }
>  
> +static inline __u32 tx_ring_status_remove_ts(volatile __u32 *tp_status)
> +{
> +	return *tp_status &
> +		~(TP_STATUS_TS_SOFTWARE | TP_STATUS_TS_RAW_HARDWARE);

I can see 'TP_STATUS_TS_SYS_HARDWARE' is deprecated, and I assume in the kernel
versions the bug exists, this flag is not set, but can you please confirm?

> +}
> +
>  /*
>   * Callback to handle sending packets through a real NIC.
>   */
> @@ -211,9 +217,41 @@ eth_af_packet_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
>  			}
>  		}
>  
> +		/*
> +		 * We must eliminate the timestamp status from the packet
> +		 * status. This should only matter if timestamping is enabled
> +		 * on the socket, but there is a BUG in the kernel which is
> +		 * fixed in newer releases.
> +
> +		 * For interfaces of type 'veth', the sent skb is forwarded
> +		 * to the peer and back into the network stack which timestamps
> +		 * it on the RX path if timestamping is enabled globally
> +		 * (which happens if any socket enables timestamping).
> +
> +		 * When the skb is destructed, tpacket_destruct_skb() is called
> +		 * and it calls __packet_set_timestamp() which doesn't check
> +		 * the flags on the socket and returns the timestamp if it is
> +		 * set in the skb (and for veth it is, as mentioned above).
> +		 */
> +

Can you give some more details on this bug, any link etc..

And does it only seen with veth, if so I wonder if we can ignore it, not sure
how common to use af_packet PMD over veth interface, do you have this usecase?

And if only specific kernel versions impacted from this bug, what do you think
about adding kernel version check to reduce the scope of the fix in af_packet PMD.

>  		/* point at the next incoming frame */
> -		if ((ppd->tp_status != TP_STATUS_AVAILABLE) &&
> -		    (poll(&pfd, 1, -1) < 0))
> +		if ((tx_ring_status_remove_ts(&ppd->tp_status)
> +			!= TP_STATUS_AVAILABLE) && (poll(&pfd, 1, -1) < 0))
> +			break;
> +
> +		/*
> +		 * Poll can return POLLERR if the interface is down or POLLOUT,
> +		 * even if there are no extra buffers available.
> +		 * This happens, because packet_poll() calls datagram_poll()
> +		 * which checks the space left in the socket buffer and in the
> +		 * case of packet_mmap the default socket buffer length
> +		 * doesn't match the requested size for the tx_ring so there
> +		 * is always space left in socket buffer, which doesn't seem
> +		 * to be correlated to the requested size for the tx_ring
> +		 * in packet_mmap.
> +		 */
> +		if (tx_ring_status_remove_ts(&ppd->tp_status)
> +			!= TP_STATUS_AVAILABLE)
>  			break;

In this case should we break or poll again?

If 'POLLERR' is received when interface is down, it makes sense to break. Do you
know if is there any other case that 'POLLERR' is returned?

And for 'POLLOUT', when exactly this event sent? If 'POLLOUT' received, can we
poll again to wait Tx ring available to send more packets?

>  
>  		/* copy the tx frame data */
> @@ -242,6 +280,11 @@ eth_af_packet_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
>  		rte_pktmbuf_free(mbuf);
>  	}
>  
> +	/*
> +	 * We might have to ignore a few more errnos here since the packets
> +	 * remain in the mmap-ed queue and will be sent later, presumably.
> +	 */
> +

Can you please describe above comment more? What errors ignored?
And won't all packets sent will below sendto() call?

>  	/* kick-off transmits */
>  	if (sendto(pkt_q->sockfd, NULL, 0, MSG_DONTWAIT, NULL, 0) == -1 &&
>  			errno != ENOBUFS && errno != EAGAIN) {
> 


  reply	other threads:[~2021-09-01 16:34 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-20 13:39 Tudor Cornea
2021-09-01 16:34 ` Ferruh Yigit [this message]
2021-09-06 10:23   ` Tudor Cornea
2021-09-20 17:11     ` Ferruh Yigit
2021-09-13 13:45 ` [dpdk-dev] [PATCH v2] " Tudor Cornea
2021-09-20 17:44   ` Ferruh Yigit
2021-09-29 10:03     ` Tudor Cornea
2021-10-05 15:11       ` Tudor Cornea
2021-10-26 14:30         ` Ferruh Yigit
2021-11-02 15:24           ` Tudor Cornea
2021-11-02 15:47   ` [dpdk-dev] [PATCH v3] " Tudor Cornea
2021-11-02 16:47     ` Ferruh Yigit
2021-11-03  9:31     ` [dpdk-dev] [PATCH v4] " Tudor Cornea
2021-11-04 12:07       ` Ferruh Yigit

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3154f4e8-9f0e-b04f-6e8f-096dc39f489c@intel.com \
    --to=ferruh.yigit@intel.com \
    --cc=dev@dpdk.org \
    --cc=linville@tuxdriver.com \
    --cc=pogonarumihai@gmail.com \
    --cc=thomas@monjalon.net \
    --cc=tudor.cornea@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).