From: "Morten Brørup" <mb@smartsharesystems.com>
To: "Andrew Rybchenko" <andrew.rybchenko@oktetlabs.ru>,
"Stephen Hemminger" <stephen@networkplumber.org>, <dev@dpdk.org>,
"Konstantin Ananyev" <konstantin.ananyev@huawei.com>
Cc: <bruce.richardson@intel.com>,
"Thomas Monjalon" <thomas@monjalon.net>,
"John McNamara" <john.mcnamara@intel.com>
Subject: RE: [RFC] ethdev: fix UAF in rte_eth_tx_burst with filtering
Date: Mon, 10 Nov 2025 09:34:53 +0100 [thread overview]
Message-ID: <98CBD80474FA8B44BF855DF32C47DC35F65544@smartserver.smartshare.dk> (raw)
In-Reply-To: <11bf07b2-30fe-4935-a43c-fe64a37cc0b4@oktetlabs.ru>
+TO: Konstantin, regarding Stage Ordered Ring.
> From: Andrew Rybchenko [mailto:andrew.rybchenko@oktetlabs.ru]
> Sent: Sunday, 9 November 2025 16.23
>
> If I remember correctly we should avoid function names in summary.
>
> (I'm really surprised that callbacks and rte_eth_tx_burst() are
> supposed to reorder mbufs in tx_pkts array, but it looks like it
> the only sensible option.)
Reordering the packets is incompatible with the Stage Ordered Ring.
When the transmit function returns e.g. 2 non-consumed packets, the metadata will be associated with the 2 last packets in the original burst passed to the transmit function, not the 2 packets moved to the last position in the array.
>
> On 10/30/25 02:54, Stephen Hemminger wrote:
> > When packets are filtered by Tx callback, they return value
> > needs to include those packets; otherwise the application will
> > do a duplicate free of those mbufs.
> >
> > The example would be application that sent a burst of 8 packets,
> > and 2 were filtered but the PMD was busy and could not send any.
> > The result should be a return value of 2 and mbufs 0..1 should
> > be freed and 2..7 should be untouched and up to application to
> > resend or drop.
> >
> > Original buggy code would return 0 and 0..1 would be freed
> > 2..7 would be unused. Application would then try and drop
> > or resend an already freed mbuf.
> >
> > Bugzilla ID: 1816
> > Fixes: 4dc294158cac ("ethdev: support optional Rx and Tx callbacks")
> > Cc: bruce.richardson@intel.com
> > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> > ---
> > lib/ethdev/rte_ethdev.h | 47 ++++++++++++++++++++++++++++++--------
> ---
> > 1 file changed, 35 insertions(+), 12 deletions(-)
> >
> > diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> > index a66c2abbdb..d2cfe390f9 100644
> > --- a/lib/ethdev/rte_ethdev.h
> > +++ b/lib/ethdev/rte_ethdev.h
> > @@ -6714,6 +6714,10 @@ rte_eth_tx_burst(uint16_t port_id, uint16_t
> queue_id,
> > }
> > #endif
> >
> > + uint16_t requested_pkts = nb_pkts;
> > + uint16_t filtered_pkts = 0;
> > + rte_mbuf_history_mark_bulk(tx_pkts, nb_pkts,
> RTE_MBUF_HISTORY_OP_TX);
> > +
> > #ifdef RTE_ETHDEV_RXTX_CALLBACKS
> > {
> > void *cb;
> > @@ -6727,22 +6731,41 @@ rte_eth_tx_burst(uint16_t port_id, uint16_t
> queue_id,
> > cb = rte_atomic_load_explicit(&p->txq.clbk[queue_id],
> > rte_memory_order_relaxed);
> > if (unlikely(cb != NULL))
> > - nb_pkts = rte_eth_call_tx_callbacks(port_id,
> queue_id,
> > - tx_pkts, nb_pkts, cb);
> > + filtered_pkts = nb_pkts -
> > + rte_eth_call_tx_callbacks(port_id, queue_id,
> tx_pkts, nb_pkts, cb);
> > }
> > #endif
> >
> > - uint16_t requested_pkts = nb_pkts;
> > - rte_mbuf_history_mark_bulk(tx_pkts, nb_pkts,
> RTE_MBUF_HISTORY_OP_TX);
> > -
> > - nb_pkts = p->tx_pkt_burst(qd, tx_pkts, nb_pkts);
> > -
> > - if (requested_pkts > nb_pkts)
> > - rte_mbuf_history_mark_bulk(tx_pkts + nb_pkts,
> > - requested_pkts - nb_pkts,
> RTE_MBUF_HISTORY_OP_TX_BUSY);
> > + uint16_t sent_pkts = p->tx_pkt_burst(qd, tx_pkts, nb_pkts -
> filtered_pkts);
> > + uint16_t result_pkts = sent_pkts + filtered_pkts;
> > +
> > + if (unlikely(result_pkts < requested_pkts)) {
>
> First of all unlikely() is wrong here. It is not an error path. It is
> OK
> when not everything is sent and it should not cost too much as unlikely
> does.
> Second, I'd remove the condition completely. It is not a big deal to
> calculate unsent_pkts and call trace (if enabled) with zero
> unsent_pkts.
>
> > + uint16_t unsent_pkts = requested_pkts - result_pkts;
> > +
> > + if (unlikely(filtered_pkts > 0)) {
>
> Same here for unlikely().
> Also since filtered_pkts appear in RTE_ETHDEV_RXTX_CALLBACKS case,
> I'd put it under corresponding conditional compilation to be
> consistent.
> Yes, code with conditional compilation is hard to read, but consistency
> is more important here.
>
> > + /* Need to reshuffle packet list so that unsent
> packets are
> > + * after the return value
> > + *
> > + * Original: MMMMMM
> > + * Filter: MMMMMF filtered = 1
> > + * Tx_burst: SSSUUF sent = 3 unsent = 2
> > + *
> > + * Return: SSSFUU
> > + * return from tx_burst is 4
> > + */
> > + memmove(tx_pkts + result_pkts, tx_pkts + sent_pkts,
> > + unsent_pkts * sizeof(struct rte_mbuf *));
> > +
> > + /* Tx filter process does not drop the packets until
> here */
> > + rte_pktmbuf_free_bulk(tx_pkts + sent_pkts,
> filtered_pkts);
NAK.
If the callback consumes the packet, the callback must free it too. The callback could consume the packet for other purposes, e.g. redirection, so freeing it here would introduce use-after-free bugs.
> > + }
> > +
> > + rte_mbuf_history_mark_bulk(tx_pkts + result_pkts,
> unsent_pkts,
> > + RTE_MBUF_HISTORY_OP_TX_BUSY);
> > + }
> >
> > - rte_ethdev_trace_tx_burst(port_id, queue_id, (void **)tx_pkts,
> nb_pkts);
> > - return nb_pkts;
> > + rte_ethdev_trace_tx_burst(port_id, queue_id, (void **)tx_pkts,
> result_pkts);
> > + return result_pkts;
> > }
> >
> > /**
IMO, letting a Tx callback change the number of packets to actually transmit has too many side effects.
E.g. port packet counters are not updated accordingly if a callback consumes a packet and silently drops it.
The Ethdev Tx API is not designed for the ethdev layer to handle any of retransmit-or-drop (when insufficient descriptors), redirect, mirror, drop (redirect/mirror/drop of all packets or selectively based on e.g. VLAN or BPF filter).
IMO, we should update the callback API to reflect this, by changing the callback return type to void.
next prev parent reply other threads:[~2025-11-10 8:34 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-10-29 23:54 Stephen Hemminger
2025-11-09 15:22 ` Andrew Rybchenko
2025-11-10 8:34 ` Morten Brørup [this message]
2025-11-10 9:02 ` Andrew Rybchenko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=98CBD80474FA8B44BF855DF32C47DC35F65544@smartserver.smartshare.dk \
--to=mb@smartsharesystems.com \
--cc=andrew.rybchenko@oktetlabs.ru \
--cc=bruce.richardson@intel.com \
--cc=dev@dpdk.org \
--cc=john.mcnamara@intel.com \
--cc=konstantin.ananyev@huawei.com \
--cc=stephen@networkplumber.org \
--cc=thomas@monjalon.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).