DPDK patches and discussions
 help / color / mirror / Atom feed
From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
To: "Morten Brørup" <mb@smartsharesystems.com>,
	"Stephen Hemminger" <stephen@networkplumber.org>,
	dev@dpdk.org,
	"Konstantin Ananyev" <konstantin.ananyev@huawei.com>
Cc: bruce.richardson@intel.com, Thomas Monjalon <thomas@monjalon.net>,
	John McNamara <john.mcnamara@intel.com>
Subject: Re: [RFC] ethdev: fix UAF in rte_eth_tx_burst with filtering
Date: Mon, 10 Nov 2025 12:02:15 +0300	[thread overview]
Message-ID: <1fd7889d-9187-4bf6-b7c0-863783bfc7ed@oktetlabs.ru> (raw)
In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC35F65544@smartserver.smartshare.dk>

On 11/10/25 11:34, Morten Brørup wrote:
> +TO: Konstantin, regarding Stage Ordered Ring.
> 
>> From: Andrew Rybchenko [mailto:andrew.rybchenko@oktetlabs.ru]
>> Sent: Sunday, 9 November 2025 16.23
>>
>> If I remember correctly we should avoid function names in summary.
>>
>> (I'm really surprised that callbacks and rte_eth_tx_burst() are
>> supposed to reorder mbufs in tx_pkts array, but it looks like it
>> the only sensible option.)
> 
> Reordering the packets is incompatible with the Stage Ordered Ring.
> When the transmit function returns e.g. 2 non-consumed packets, the metadata will be associated with the 2 last packets in the original burst passed to the transmit function, not the 2 packets moved to the last position in the array.

Good point.

> 
>>
>> On 10/30/25 02:54, Stephen Hemminger wrote:
>>> When packets are filtered by Tx callback, they return value
>>> needs to include those packets; otherwise the application will
>>> do a duplicate free of those mbufs.
>>>
>>> The example would be application that sent a burst of 8 packets,
>>> and 2 were filtered but the PMD was busy and could not send any.
>>> The result should be a return value of 2 and mbufs 0..1 should
>>> be freed and 2..7 should be untouched and up to application to
>>> resend or drop.
>>>
>>> Original buggy code would return 0 and 0..1 would be freed
>>> 2..7 would be unused. Application would then try and drop
>>> or resend an already freed mbuf.
>>>
>>> Bugzilla ID: 1816
>>> Fixes: 4dc294158cac ("ethdev: support optional Rx and Tx callbacks")
>>> Cc: bruce.richardson@intel.com
>>> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
>>> ---
>>>    lib/ethdev/rte_ethdev.h | 47 ++++++++++++++++++++++++++++++--------
>> ---
>>>    1 file changed, 35 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
>>> index a66c2abbdb..d2cfe390f9 100644
>>> --- a/lib/ethdev/rte_ethdev.h
>>> +++ b/lib/ethdev/rte_ethdev.h
>>> @@ -6714,6 +6714,10 @@ rte_eth_tx_burst(uint16_t port_id, uint16_t
>> queue_id,
>>>    	}
>>>    #endif
>>>
>>> +	uint16_t requested_pkts = nb_pkts;
>>> +	uint16_t filtered_pkts = 0;
>>> +	rte_mbuf_history_mark_bulk(tx_pkts, nb_pkts,
>> RTE_MBUF_HISTORY_OP_TX);
>>> +
>>>    #ifdef RTE_ETHDEV_RXTX_CALLBACKS
>>>    	{
>>>    		void *cb;
>>> @@ -6727,22 +6731,41 @@ rte_eth_tx_burst(uint16_t port_id, uint16_t
>> queue_id,
>>>    		cb = rte_atomic_load_explicit(&p->txq.clbk[queue_id],
>>>    				rte_memory_order_relaxed);
>>>    		if (unlikely(cb != NULL))
>>> -			nb_pkts = rte_eth_call_tx_callbacks(port_id,
>> queue_id,
>>> -					tx_pkts, nb_pkts, cb);
>>> +			filtered_pkts = nb_pkts -
>>> +				rte_eth_call_tx_callbacks(port_id, queue_id,
>> tx_pkts, nb_pkts, cb);
>>>    	}
>>>    #endif
>>>
>>> -	uint16_t requested_pkts = nb_pkts;
>>> -	rte_mbuf_history_mark_bulk(tx_pkts, nb_pkts,
>> RTE_MBUF_HISTORY_OP_TX);
>>> -
>>> -	nb_pkts = p->tx_pkt_burst(qd, tx_pkts, nb_pkts);
>>> -
>>> -	if (requested_pkts > nb_pkts)
>>> -		rte_mbuf_history_mark_bulk(tx_pkts + nb_pkts,
>>> -				requested_pkts - nb_pkts,
>> RTE_MBUF_HISTORY_OP_TX_BUSY);
>>> +	uint16_t sent_pkts = p->tx_pkt_burst(qd, tx_pkts, nb_pkts -
>> filtered_pkts);
>>> +	uint16_t result_pkts = sent_pkts + filtered_pkts;
>>> +
>>> +	if (unlikely(result_pkts < requested_pkts)) {
>>
>> First of all unlikely() is wrong here. It is not an error path. It is
>> OK
>> when not everything is sent and it should not cost too much as unlikely
>> does.
>> Second, I'd remove the condition completely. It is not a big deal to
>> calculate unsent_pkts and call trace (if enabled) with zero
>> unsent_pkts.
>>
>>> +		uint16_t unsent_pkts = requested_pkts - result_pkts;
>>> +
>>> +		if (unlikely(filtered_pkts > 0)) {
>>
>> Same here for unlikely().
>> Also since filtered_pkts appear in RTE_ETHDEV_RXTX_CALLBACKS case,
>> I'd put it under corresponding conditional compilation to be
>> consistent.
>> Yes, code with conditional compilation is hard to read, but consistency
>> is more important here.
>>
>>> +			/* Need to reshuffle packet list so that unsent
>> packets are
>>> +			 * after the return value
>>> +			 *
>>> +			 * Original: MMMMMM
>>> +			 * Filter:   MMMMMF filtered = 1
>>> +			 * Tx_burst: SSSUUF sent = 3 unsent = 2
>>> +			 *
>>> +			 * Return:   SSSFUU
>>> +			 *   return from tx_burst is 4
>>> +			 */
>>> +			memmove(tx_pkts + result_pkts, tx_pkts + sent_pkts,
>>> +				unsent_pkts * sizeof(struct rte_mbuf *));
>>> +
>>> +			/* Tx filter process does not drop the packets until
>> here */
>>> +			rte_pktmbuf_free_bulk(tx_pkts + sent_pkts,
>> filtered_pkts);
> 
> NAK.
> If the callback consumes the packet, the callback must free it too. The callback could consume the packet for other purposes, e.g. redirection, so freeing it here would introduce use-after-free bugs.

Good point as well. However, it still requires reordering which has
problem mentioned above.

> 
>>> +		}
>>> +
>>> +		rte_mbuf_history_mark_bulk(tx_pkts + result_pkts,
>> unsent_pkts,
>>> +					   RTE_MBUF_HISTORY_OP_TX_BUSY);
>>> +	}
>>>
>>> -	rte_ethdev_trace_tx_burst(port_id, queue_id, (void **)tx_pkts,
>> nb_pkts);
>>> -	return nb_pkts;
>>> +	rte_ethdev_trace_tx_burst(port_id, queue_id, (void **)tx_pkts,
>> result_pkts);
>>> +	return result_pkts;
>>>    }
>>>
>>>    /**
> 
> IMO, letting a Tx callback change the number of packets to actually transmit has too many side effects.
> E.g. port packet counters are not updated accordingly if a callback consumes a packet and silently drops it.
> 
> The Ethdev Tx API is not designed for the ethdev layer to handle any of retransmit-or-drop (when insufficient descriptors), redirect, mirror, drop (redirect/mirror/drop of all packets or selectively based on e.g. VLAN or BPF filter).
> 
> IMO, we should update the callback API to reflect this, by changing the callback return type to void.
> 


      reply	other threads:[~2025-11-10  9:02 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-29 23:54 Stephen Hemminger
2025-11-09 15:22 ` Andrew Rybchenko
2025-11-10  8:34   ` Morten Brørup
2025-11-10  9:02     ` Andrew Rybchenko [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1fd7889d-9187-4bf6-b7c0-863783bfc7ed@oktetlabs.ru \
    --to=andrew.rybchenko@oktetlabs.ru \
    --cc=bruce.richardson@intel.com \
    --cc=dev@dpdk.org \
    --cc=john.mcnamara@intel.com \
    --cc=konstantin.ananyev@huawei.com \
    --cc=mb@smartsharesystems.com \
    --cc=stephen@networkplumber.org \
    --cc=thomas@monjalon.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).