[PATCH v4] net/ixgbe: add proper memory barriers for some Rx functions

patches for DPDK stable branches
 help / color / mirror / Atom feed

From: Min Zhou <zhoumin@loongson.cn>
To: thomas@monjalon.net, qi.z.zhang@intel.com,
	mb@smartsharesystems.com, konstantin.v.ananyev@yandex.ru,
	ruifeng.wang@arm.com, drc@linux.vnet.ibm.com,
	roretzla@linux.microsoft.com, qiming.yang@intel.com,
	wenjun1.wu@intel.com, zhoumin@loongson.cn
Cc: dev@dpdk.org, stable@dpdk.org, maobibo@loongson.cn,
	jiawenwu@trustnetic.com
Subject: [PATCH v4] net/ixgbe: add proper memory barriers for some Rx functions
Date: Tue, 13 Jun 2023 17:44:25 +0800	[thread overview]
Message-ID: <20230613094425.2469036-1-zhoumin@loongson.cn> (raw)
In-Reply-To: <20230506102359.243462-1-zhoumin@loongson.cn>

Segmentation fault has been observed while running the
ixgbe_recv_pkts_lro() function to receive packets on the Loongson 3C5000
processor which has 64 cores and 4 NUMA nodes.

From the ixgbe_recv_pkts_lro() function, we found that as long as the first
packet has the EOP bit set, and the length of this packet is less than or
equal to rxq->crc_len, the segmentation fault will definitely happen even
though on the other platforms. For example, if we made the first packet
which had the EOP bit set had a zero length by force, the segmentation
fault would happen on X86.

Because when processd the first packet the first_seg->next will be NULL, if
at the same time this packet has the EOP bit set and its length is less
than or equal to rxq->crc_len, the following loop will be executed:

    for (lp = first_seg; lp->next != rxm; lp = lp->next)
        ;

We know that the first_seg->next will be NULL under this condition. So the
expression of lp->next->next will cause the segmentation fault.

Normally, the length of the first packet with EOP bit set will be greater
than rxq->crc_len. However, the out-of-order execution of CPU may make the
read ordering of the status and the rest of the descriptor fields in this
function not be correct. The related codes are as following:

        rxdp = &rx_ring[rx_id];
 #1     staterr = rte_le_to_cpu_32(rxdp->wb.upper.status_error);

        if (!(staterr & IXGBE_RXDADV_STAT_DD))
            break;

 #2     rxd = *rxdp;

The sentence #2 may be executed before sentence #1. This action is likely
to make the ready packet zero length. If the packet is the first packet and
has the EOP bit set, the above segmentation fault will happen.

So, we should add a proper memory barrier to ensure the read ordering be
correct. We also did the same thing in the ixgbe_recv_pkts() function to
make the rxd data be valid even though we did not find segmentation fault
in this function.

Fixes: 8eecb3295ae ("ixgbe: add LRO support")
Cc: stable@dpdk.org

Signed-off-by: Min Zhou <zhoumin@loongson.cn>
---
v4:
- Replace rte_smp_rmb() with rte_atomic_thread_fence() as the proper memory
  barrier
---
v3:
- Use rte_smp_rmb() as the proper memory barrier instead of rte_rmb()
---
v2:
- Make the calling of rte_rmb() for all platforms
---

 drivers/net/ixgbe/ixgbe_rxtx.c | 47 +++++++++++++++-------------------
 1 file changed, 21 insertions(+), 26 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index 6cbb992823..61f17cd90b 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -1817,11 +1817,22 @@ ixgbe_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 		 * of accesses cannot be reordered by the compiler. If they were
 		 * not volatile, they could be reordered which could lead to
 		 * using invalid descriptor fields when read from rxd.
+		 *
+		 * Meanwhile, to prevent the CPU from executing out of order, we
+		 * need to use a proper memory barrier to ensure the memory
+		 * ordering below.
 		 */
 		rxdp = &rx_ring[rx_id];
 		staterr = rxdp->wb.upper.status_error;
 		if (!(staterr & rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD)))
 			break;
+
+		/*
+		 * Use acquire fence to ensure that status_error which includes
+		 * DD bit is loaded before loading of other descriptor words.
+		 */
+		rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
+
 		rxd = *rxdp;
 
 		/*
@@ -2088,32 +2099,10 @@ ixgbe_recv_pkts_lro(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts,
 
 next_desc:
 		/*
-		 * The code in this whole file uses the volatile pointer to
-		 * ensure the read ordering of the status and the rest of the
-		 * descriptor fields (on the compiler level only!!!). This is so
-		 * UGLY - why not to just use the compiler barrier instead? DPDK
-		 * even has the rte_compiler_barrier() for that.
-		 *
-		 * But most importantly this is just wrong because this doesn't
-		 * ensure memory ordering in a general case at all. For
-		 * instance, DPDK is supposed to work on Power CPUs where
-		 * compiler barrier may just not be enough!
-		 *
-		 * I tried to write only this function properly to have a
-		 * starting point (as a part of an LRO/RSC series) but the
-		 * compiler cursed at me when I tried to cast away the
-		 * "volatile" from rx_ring (yes, it's volatile too!!!). So, I'm
-		 * keeping it the way it is for now.
-		 *
-		 * The code in this file is broken in so many other places and
-		 * will just not work on a big endian CPU anyway therefore the
-		 * lines below will have to be revisited together with the rest
-		 * of the ixgbe PMD.
-		 *
-		 * TODO:
-		 *    - Get rid of "volatile" and let the compiler do its job.
-		 *    - Use the proper memory barrier (rte_rmb()) to ensure the
-		 *      memory ordering below.
+		 * "Volatile" only prevents caching of the variable marked
+		 * volatile. Most important, "volatile" cannot prevent the CPU
+		 * from executing out of order. So, it is necessary to use a
+		 * proper memory barrier to ensure the memory ordering below.
 		 */
 		rxdp = &rx_ring[rx_id];
 		staterr = rte_le_to_cpu_32(rxdp->wb.upper.status_error);
@@ -2121,6 +2110,12 @@ ixgbe_recv_pkts_lro(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts,
 		if (!(staterr & IXGBE_RXDADV_STAT_DD))
 			break;
 
+		/*
+		 * Use acquire fence to ensure that status_error which includes
+		 * DD bit is loaded before loading of other descriptor words.
+		 */
+		rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
+
 		rxd = *rxdp;
 
 		PMD_RX_LOG(DEBUG, "port_id=%u queue_id=%u rx_id=%u "
-- 
2.31.1

next prev parent reply	other threads:[~2023-06-13  9:44 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20230424090532.367194-1-zhoumin@loongson.cn>
2023-05-06 10:23 ` [PATCH v3] " Min Zhou
2023-05-08  6:03   ` Ruifeng Wang
2023-05-15  2:10     ` Zhang, Qi Z
2023-06-12 10:26       ` Thomas Monjalon
2023-06-12 11:58         ` zhoumin
2023-06-12 12:44           ` Thomas Monjalon
2023-06-13  1:42             ` zhoumin
2023-06-13  3:30               ` Jiawen Wu
2023-06-13 10:12                 ` zhoumin
2023-06-14 10:58               ` Konstantin Ananyev
2023-06-13  9:25             ` Ruifeng Wang
2023-06-20 15:52               ` Thomas Monjalon
2023-06-21  6:50                 ` Ruifeng Wang
2023-06-13  9:44   ` Min Zhou [this message]
2023-06-13 10:20     ` [PATCH v4] " Ruifeng Wang
2023-06-13 12:11       ` Zhang, Qi Z

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230613094425.2469036-1-zhoumin@loongson.cn \
    --to=zhoumin@loongson.cn \
    --cc=dev@dpdk.org \
    --cc=drc@linux.vnet.ibm.com \
    --cc=jiawenwu@trustnetic.com \
    --cc=konstantin.v.ananyev@yandex.ru \
    --cc=maobibo@loongson.cn \
    --cc=mb@smartsharesystems.com \
    --cc=qi.z.zhang@intel.com \
    --cc=qiming.yang@intel.com \
    --cc=roretzla@linux.microsoft.com \
    --cc=ruifeng.wang@arm.com \
    --cc=stable@dpdk.org \
    --cc=thomas@monjalon.net \
    --cc=wenjun1.wu@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).