* [PATCH] net/gve: fix refill logic causing memory corruption
From: Joshua Washington @ 2024-10-04 1:05 UTC
To: Jeroen de Borst, Rushil Gupta, Joshua Washington, Junfeng Guo
Cc: dev, stable, Ferruh Yigit, Praveen Kaligineedi
There is a seemingly mundane error in the RX refill path which can lead
to major issues and ultimately crash the program.
This error occurs in an edge case where the number of buffers to refill
is exactly the number that causes the ring to wrap around to 0. The
current refill logic is split into two conditions: first, when the
number of buffers to refill is greater than the number of buffers left
in the ring before wraparound occurs; second, when the opposite is true,
and there are enough buffers before wraparound to refill all buffers.
In this edge case, the first condition erroneously uses a (<) comparison
to decide whether to wrap around, when it should have been (<=). As a
result, only the second condition runs and the tail pointer is set
to an invalid value (RING_SIZE). This causes a number of cascading
failures.
1. The first issue is rather mundane: rxq->bufq_tail == RING_SIZE at
   the end of the refill. This will correct itself on the next refill
   without any sort of memory leak or corruption;
2. The second failure is that the head pointer would end up overrunning
   the tail because the last buffer that is refilled is refilled at
   sw_ring[RING_SIZE] instead of sw_ring[0]. This would cause the driver
   to give the application a stale mbuf, one that has potentially
   already been freed;
3. The third failure comes from the fact that the software ring is being
   overrun. Because we directly use the sw_ring pointer to refill
   buffers, when sw_ring[RING_SIZE] is filled, a buffer overflow occurs.
   The overwritten data may be important, and this can cause the
   program to crash outright.
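The off-by-one described above can be reproduced with the index
arithmetic alone. The following is a minimal sketch, not the driver
code: old_refill_tail() is a hypothetical helper that mirrors only the
two pre-patch branch conditions and the tail updates, with no mbuf
allocation and no descriptor writes.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of the pre-patch tail computation. It keeps only
 * the two branch conditions and the updates to the tail index.
 */
static uint16_t
old_refill_tail(uint16_t nb_desc, uint16_t tail, uint16_t nb_refill)
{
	uint16_t delta = nb_desc - tail;

	/* Illustrative precondition: a valid tail and a sane refill count. */
	assert(tail < nb_desc && nb_refill <= nb_desc);

	/* First branch: taken only when the refill runs past the ring end. */
	if (delta < nb_refill) {
		nb_refill -= delta;
		tail = 0;
	}

	/* Second branch: refill the remainder without wrapping. */
	if (nb_desc - tail >= nb_refill)
		tail += nb_refill;

	return tail;
}
```

With an assumed ring of 8 descriptors, a tail of 5 and 3 buffers to
refill, delta equals nb_refill, the first branch is skipped, and the
function returns 8, which is RING_SIZE rather than 0. With 4 buffers to
refill the first branch is taken and the tail ends up at 1 as expected.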
This patch fixes the refill bug while greatly simplifying the logic so
that it is much less error-prone.
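As a rough illustration of the simplified approach, the tail can be
advanced one slot at a time and wrapped with a mask. This is a sketch
under the assumption that the ring size is a power of two, which the
mask requires; ring_advance() and new_refill_tail() are hypothetical
names used only here.

```c
#include <assert.h>
#include <stdint.h>

/* Advance a ring index by one slot, wrapping to 0 at the ring end. */
static uint16_t
ring_advance(uint16_t idx, uint16_t ring_size)
{
	/* The mask is only equivalent to a modulo for power-of-two sizes. */
	assert(ring_size != 0 && (ring_size & (ring_size - 1)) == 0);

	return (idx + 1) & (ring_size - 1);
}

/* Tail index after refilling nb_refill slots, one slot per iteration. */
static uint16_t
new_refill_tail(uint16_t ring_size, uint16_t tail, uint16_t nb_refill)
{
	uint16_t i;

	for (i = 0; i < nb_refill; i++)
		tail = ring_advance(tail, ring_size);

	return tail;
}
```

For the same assumed case of 8 descriptors, a tail of 5 and 3 buffers to
refill, the tail wraps to 0 and never takes the value RING_SIZE, so no
special handling of the exact-fit case is needed.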
Fixes: 45da16b5b181 ("net/gve: support basic Rx data path for DQO")
Cc: junfeng.guo@intel.com
Cc: stable@dpdk.org
Signed-off-by: Joshua Washington <joshwash@google.com>
Reviewed-by: Rushil Gupta <rushilg@google.com>
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
---
drivers/net/gve/gve_rx_dqo.c | 62 ++++++++++--------------------------
1 file changed, 16 insertions(+), 46 deletions(-)
diff --git a/drivers/net/gve/gve_rx_dqo.c b/drivers/net/gve/gve_rx_dqo.c
index e4084bc0dd..5371bab77d 100644
--- a/drivers/net/gve/gve_rx_dqo.c
+++ b/drivers/net/gve/gve_rx_dqo.c
@@ -11,66 +11,36 @@
static inline void
gve_rx_refill_dqo(struct gve_rx_queue *rxq)
{
- volatile struct gve_rx_desc_dqo *rx_buf_ring;
volatile struct gve_rx_desc_dqo *rx_buf_desc;
struct rte_mbuf *nmb[rxq->nb_rx_hold];
uint16_t nb_refill = rxq->nb_rx_hold;
- uint16_t nb_desc = rxq->nb_rx_desc;
uint16_t next_avail = rxq->bufq_tail;
struct rte_eth_dev *dev;
uint64_t dma_addr;
- uint16_t delta;
int i;
if (rxq->nb_rx_hold < rxq->free_thresh)
return;
- rx_buf_ring = rxq->rx_ring;
- delta = nb_desc - next_avail;
- if (unlikely(delta < nb_refill)) {
- if (likely(rte_pktmbuf_alloc_bulk(rxq->mpool, nmb, delta) == 0)) {
- for (i = 0; i < delta; i++) {
- rx_buf_desc = &rx_buf_ring[next_avail + i];
- rxq->sw_ring[next_avail + i] = nmb[i];
- dma_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(nmb[i]));
- rx_buf_desc->header_buf_addr = 0;
- rx_buf_desc->buf_addr = dma_addr;
- }
- nb_refill -= delta;
- next_avail = 0;
- rxq->nb_rx_hold -= delta;
- } else {
- rxq->stats.no_mbufs_bulk++;
- rxq->stats.no_mbufs += nb_desc - next_avail;
- dev = &rte_eth_devices[rxq->port_id];
- dev->data->rx_mbuf_alloc_failed += nb_desc - next_avail;
- PMD_DRV_LOG(DEBUG, "RX mbuf alloc failed port_id=%u queue_id=%u",
- rxq->port_id, rxq->queue_id);
- return;
- }
+ if (unlikely(rte_pktmbuf_alloc_bulk(rxq->mpool, nmb, nb_refill))) {
+ rxq->stats.no_mbufs_bulk++;
+ rxq->stats.no_mbufs += nb_refill;
+ dev = &rte_eth_devices[rxq->port_id];
+ dev->data->rx_mbuf_alloc_failed += nb_refill;
+ PMD_DRV_LOG(DEBUG, "RX mbuf alloc failed port_id=%u queue_id=%u",
+ rxq->port_id, rxq->queue_id);
+ return;
}
- if (nb_desc - next_avail >= nb_refill) {
- if (likely(rte_pktmbuf_alloc_bulk(rxq->mpool, nmb, nb_refill) == 0)) {
- for (i = 0; i < nb_refill; i++) {
- rx_buf_desc = &rx_buf_ring[next_avail + i];
- rxq->sw_ring[next_avail + i] = nmb[i];
- dma_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(nmb[i]));
- rx_buf_desc->header_buf_addr = 0;
- rx_buf_desc->buf_addr = dma_addr;
- }
- next_avail += nb_refill;
- rxq->nb_rx_hold -= nb_refill;
- } else {
- rxq->stats.no_mbufs_bulk++;
- rxq->stats.no_mbufs += nb_desc - next_avail;
- dev = &rte_eth_devices[rxq->port_id];
- dev->data->rx_mbuf_alloc_failed += nb_desc - next_avail;
- PMD_DRV_LOG(DEBUG, "RX mbuf alloc failed port_id=%u queue_id=%u",
- rxq->port_id, rxq->queue_id);
- }
+ for (i = 0; i < nb_refill; i++) {
+ rx_buf_desc = &rxq->rx_ring[next_avail];
+ rxq->sw_ring[next_avail] = nmb[i];
+ dma_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(nmb[i]));
+ rx_buf_desc->header_buf_addr = 0;
+ rx_buf_desc->buf_addr = dma_addr;
+ next_avail = (next_avail + 1) & (rxq->nb_rx_desc - 1);
}
-
+ rxq->nb_rx_hold -= nb_refill;
rte_write32(next_avail, rxq->qrx_tail);
rxq->bufq_tail = next_avail;
--
2.47.0.rc0.187.ge670bccf7e-goog
* Re: [PATCH] net/gve: fix refill logic causing memory corruption
From: Ferruh Yigit @ 2024-10-08 0:46 UTC
To: Joshua Washington, Jeroen de Borst, Rushil Gupta, Junfeng Guo
Cc: dev, stable, Praveen Kaligineedi
On 10/4/2024 2:05 AM, Joshua Washington wrote:
> There is a seemingly mundane error in the RX refill path which can lead
> to major issues and ultimately crash the program.
>
> This error occurs in an edge case where the number of buffers to refill
> is exactly the number that causes the ring to wrap around to 0. The
> current refill logic is split into two conditions: first, when the
> number of buffers to refill is greater than the number of buffers left
> in the ring before wraparound occurs; second, when the opposite is true,
> and there are enough buffers before wraparound to refill all buffers.
>
> In this edge case, the first condition erroneously uses a (<) comparison
> to decide whether to wrap around, when it should have been (<=). As a
> result, only the second condition runs and the tail pointer is set
> to an invalid value (RING_SIZE). This causes a number of cascading
> failures.
>
> 1. The first issue is rather mundane: rxq->bufq_tail == RING_SIZE at
>    the end of the refill. This will correct itself on the next refill
>    without any sort of memory leak or corruption;
> 2. The second failure is that the head pointer would end up overrunning
>    the tail because the last buffer that is refilled is refilled at
>    sw_ring[RING_SIZE] instead of sw_ring[0]. This would cause the driver
>    to give the application a stale mbuf, one that has potentially
>    already been freed;
> 3. The third failure comes from the fact that the software ring is being
>    overrun. Because we directly use the sw_ring pointer to refill
>    buffers, when sw_ring[RING_SIZE] is filled, a buffer overflow occurs.
>    The overwritten data may be important, and this can cause the
>    program to crash outright.
>
> This patch fixes the refill bug while greatly simplifying the logic so
> that it is much less error-prone.
>
> Fixes: 45da16b5b181 ("net/gve: support basic Rx data path for DQO")
> Cc: junfeng.guo@intel.com
> Cc: stable@dpdk.org
>
> Signed-off-by: Joshua Washington <joshwash@google.com>
> Reviewed-by: Rushil Gupta <rushilg@google.com>
> Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
>
Applied to dpdk-next-net/main, thanks.