From: Ferruh Yigit <ferruh.yigit@amd.com>
To: Mihai Brodschi <mihai.brodschi@broadcom.com>,
Jakub Grajciar <jgrajcia@cisco.com>
Cc: dev@dpdk.org, stable@dpdk.org
Subject: Re: [PATCH v2] net/memif: fix buffer overflow in zero copy Rx
Date: Sun, 7 Jul 2024 15:05:35 +0100
Message-ID: <ffa4241c-342e-499a-ba21-cde5111018b1@amd.com>
In-Reply-To: <8bf5e505-191b-46ea-8b90-0ed4fc15d306@broadcom.com>
On 7/7/2024 6:50 AM, Mihai Brodschi wrote:
> Hi Ferruh,
>
> On 07/07/2024 05:12, Ferruh Yigit wrote:
>> On 6/28/2024 10:01 PM, Mihai Brodschi wrote:
>>> rte_pktmbuf_alloc_bulk is called by the zero-copy receiver to allocate
>>> new mbufs to be provided to the sender. The allocated mbuf pointers
>>> are stored in a ring, but the alloc function doesn't implement index
>>> wrap-around, so it writes past the end of the array. This results in
>>> memory corruption and duplicate mbufs being received.
>>>
>>
>> Hi Mihai,
>>
>> I am not sure that writing past the ring actually occurs.
>>
>> As far as I can see, the intent is to keep the ring as full as possible:
>> when 'head' and 'tail' are initially 0, the whole ring is filled.
>> Later the tail moves and the emptied space is filled again, so head (in
>> modulo) always ends up just behind tail after a refill. In the next run,
>> the refill only fills the part the tail has moved past, and that is what
>> 'n_slots' calculates. As this is only the size of the gap, starting from
>> 'head' (with modulo) shouldn't go past the ring length.
>>
>> Do you observe this issue in practice? If so, can you please provide a
>> backtrace and the numbers showing how to reproduce the issue?
>
> The alloc function writes starting from the ring's head, but the ring's
> head can be located at the end of the ring's memory buffer (ring_size - 1).
> The correct behavior would be to wrap around to the start of the buffer (0),
> but the alloc function has no awareness of the fact that it's writing to a
> ring, so it writes to ring_size, ring_size + 1, etc.
>
> Let's look at the existing code:
> We assume the ring size is 256 and we just received 32 packets.
> The previous tail was at index 255, now it's at index 31.
> The head is initially at index 255.
>
> head = __atomic_load_n(&ring->head, __ATOMIC_RELAXED); // head = 255
> n_slots = ring_size - head + mq->last_tail; // n_slots = 32
>
> if (n_slots < 32) // not taken
> goto no_free_mbufs;
>
> ret = rte_pktmbuf_alloc_bulk(mq->mempool, &mq->buffers[head & mask], n_slots);
> // This will write 32 mbuf pointers starting at index (head & mask) = 255.
> // The ring size is 256, so apart from the first one all pointers will be
> // written out of bounds (index 256 .. 286, when it should be 0 .. 30).
>
My expectation is that the numbers should be as follows:
Initially:
size = 256
head = 0
tail = 0
After the first refill:
n_slots = 256
head = 256
tail = 0
Subsequent run where 32 slots have been consumed:
head = 256
tail = 32
n_slots = 32
rte_pktmbuf_alloc_bulk(mq->mempool, &mq->buffers[head & mask], n_slots);
head & mask = 0
// So it fills the first 32 elements of the buffer, which is in bounds.
This will continue as above; the combination of refilling only the gap and
masking 'head' with 'mask' provides the wrapping required.
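To make that arithmetic concrete, below is a minimal standalone sketch (plain
C, not the driver code) that just replays these index computations; the
power-of-two ring size and mask = ring_size - 1 are assumptions carried over
from the quoted snippet:

#include <stdio.h>

int main(void)
{
        const unsigned int ring_size = 256;      /* assumed power of two */
        const unsigned int mask = ring_size - 1;
        unsigned int head = 0, last_tail = 0, n_slots;

        /* First refill: the ring starts empty, so all 256 slots are filled. */
        n_slots = ring_size - head + last_tail;                 /* 256 */
        printf("refill %u slots starting at index %u\n", n_slots, head & mask);
        head += n_slots;                                        /* head = 256 */

        /* 32 packets are consumed; the tail advances by 32. */
        last_tail += 32;

        /* Second refill: only the freed gap is refilled. */
        n_slots = ring_size - head + last_tail;                 /* 32 */
        printf("refill %u slots starting at index %u\n", n_slots, head & mask);
        return 0;
}

Whether (head & mask) + n_slots can ever exceed ring_size once the indices
stop lining up like this is exactly the point we need to settle.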
> I can reproduce a crash 100% of the time with my application, but the output
> is not very helpful, since it crashes elsewhere because of mempool corruption.
> Applying this patch fixes the crashes completely.
>
If this causes an always-reproducible crash, it would mean the existing memif
zero-copy Rx is broken and nobody can use it, but I am skeptical that this is
the case; perhaps something special in your use case is triggering this issue.
@Jakub, can you please confirm that memif Rx zero copy is tested?
>>> Allocate 2x the space for the mbuf ring, so that the alloc function
>>> has a contiguous array to write to, then copy the excess entries
>>> to the start of the array.
>>>
>>
>> Even if the issue is valid, I am not sure about the solution of doubling
>> the buffer memory, but let's confirm the issue first before discussing the
>> solution.
>
> Initially, I thought about splitting the call to rte_pktmbuf_alloc_bulk in two,
> but I thought that might be bad for performance if the mempool is being used
> concurrently from multiple threads.
>
> If we want to use only one call to rte_pktmbuf_alloc_bulk, we need an array
> to store the allocated mbuf pointers. This array must be of length ring_size,
> since that's the maximum amount of mbufs which may be allocated in one go.
> We need to copy the pointers from this array to the ring.
>
> If we instead allocate twice the space for the ring, we can skip copying
> the pointers which were written to the ring, and only copy those that were
> written outside of its bounds.
>
The first thing that came to my mind was also using two
'rte_pktmbuf_alloc_bulk()' calls.
I can see why you prefer doubling the buffer size, but it comes with copying
overhead.
So both options come with some overhead, and I am not sure which one is
better; although I am leaning towards the first solution, we should do some
measurements to decide.
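For when we compare them, here is a rough, untested sketch of what the
two-call variant could look like; the helper name and its parameters are
hypothetical, and the rte_* calls and field names are borrowed from the
snippets quoted in this thread:

#include <rte_common.h>
#include <rte_branch_prediction.h>
#include <rte_mbuf.h>

/* Hypothetical helper, for discussion only: bulk-allocate n_slots mbufs into
 * a power-of-two mbuf pointer ring, splitting the allocation at the wrap
 * point so no extra ring space or copy is needed.
 */
static int
refill_split(struct rte_mempool *mp, struct rte_mbuf **buffers,
             uint16_t ring_size, uint16_t head, uint16_t n_slots)
{
        uint16_t mask = ring_size - 1;
        uint16_t start = head & mask;
        uint16_t first = RTE_MIN(n_slots, (uint16_t)(ring_size - start));
        int ret;

        ret = rte_pktmbuf_alloc_bulk(mp, &buffers[start], first);
        if (unlikely(ret < 0))
                return ret;

        if (n_slots > first) {
                /* Wrapped: allocate the remainder at the start of the ring. */
                ret = rte_pktmbuf_alloc_bulk(mp, &buffers[0], n_slots - first);
                if (unlikely(ret < 0)) {
                        rte_pktmbuf_free_bulk(&buffers[start], first);
                        return ret;
                }
        }
        return 0;
}

The second allocation only happens when the refill range actually wraps, so
the common case should not change; the trade-off versus doubling the buffer
is an extra mempool operation in the wrap case instead of a copy.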
BUT let's agree on the problem first, before doing more work; I am still not
fully convinced that the original code is wrong.
>>> Fixes: 43b815d88188 ("net/memif: support zero-copy slave")
>>> Cc: stable@dpdk.org
>>> Signed-off-by: Mihai Brodschi <mihai.brodschi@broadcom.com>
>>> ---
>>> v2:
>>> - fix email formatting
>>>
>>> ---
>>> drivers/net/memif/rte_eth_memif.c | 10 +++++++++-
>>> 1 file changed, 9 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/net/memif/rte_eth_memif.c b/drivers/net/memif/rte_eth_memif.c
>>> index 16da22b5c6..3491c53cf1 100644
>>> --- a/drivers/net/memif/rte_eth_memif.c
>>> +++ b/drivers/net/memif/rte_eth_memif.c
>>> @@ -600,6 +600,10 @@ eth_memif_rx_zc(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
>>> ret = rte_pktmbuf_alloc_bulk(mq->mempool, &mq->buffers[head & mask], n_slots);
>>> if (unlikely(ret < 0))
>>> goto no_free_mbufs;
>>> + if (unlikely(n_slots > ring_size - (head & mask))) {
>>> + rte_memcpy(mq->buffers, &mq->buffers[ring_size],
>>> + (n_slots + (head & mask) - ring_size) * sizeof(struct rte_mbuf *));
>>> + }
>>>
>>> while (n_slots--) {
>>> s0 = head++ & mask;
>>> @@ -1245,8 +1249,12 @@ memif_init_queues(struct rte_eth_dev *dev)
>>> }
>>> mq->buffers = NULL;
>>> if (pmd->flags & ETH_MEMIF_FLAG_ZERO_COPY) {
>>> + /*
>>> + * Allocate 2x ring_size to reserve a contiguous array for
>>> + * rte_pktmbuf_alloc_bulk (to store allocated mbufs).
>>> + */
>>> mq->buffers = rte_zmalloc("bufs", sizeof(struct rte_mbuf *) *
>>> - (1 << mq->log2_ring_size), 0);
>>> + (1 << (mq->log2_ring_size + 1)), 0);
>>> if (mq->buffers == NULL)
>>> return -ENOMEM;
>>> }
>>
>
> Apologies for sending this multiple times, I'm not familiar with mailing lists.
>
>