* [PATCH v2] net/ixgbe: add proper memory barriers for some Rx functions @ 2023-04-24 9:05 Min Zhou 2023-04-28 3:43 ` Zhang, Qi Z ` (2 more replies) 0 siblings, 3 replies; 30+ messages in thread From: Min Zhou @ 2023-04-24 9:05 UTC (permalink / raw) To: qiming.yang, wenjun1.wu, zhoumin; +Cc: dev, maobibo A segmentation fault has been observed while running the ixgbe_recv_pkts_lro() function to receive packets on the Loongson 3C5000 processor, which has 64 cores and 4 NUMA nodes. In the ixgbe_recv_pkts_lro() function, we found that as long as the first packet has the EOP bit set and its length is less than or equal to rxq->crc_len, the segmentation fault will definitely happen, even though it does not occur on other platforms, such as x86. When the first packet is processed, first_seg->next will be NULL; if at the same time this packet has the EOP bit set and its length is less than or equal to rxq->crc_len, the following loop will be executed: for (lp = first_seg; lp->next != rxm; lp = lp->next) ; We know that first_seg->next will be NULL under this condition, so the expression lp->next->next will cause the segmentation fault. Normally, the length of the first packet with the EOP bit set will be greater than rxq->crc_len. However, out-of-order execution by the CPU may cause the status word and the rest of the descriptor fields in this function to be read in the wrong order. The related code is as follows: rxdp = &rx_ring[rx_id]; #1 staterr = rte_le_to_cpu_32(rxdp->wb.upper.status_error); if (!(staterr & IXGBE_RXDADV_STAT_DD)) break; #2 rxd = *rxdp; Statement #2 may be executed before statement #1. This is likely to make the ready packet appear to have zero length. If the packet is the first packet and has the EOP bit set, the above segmentation fault will happen. So, we should add rte_rmb() to ensure the read ordering is correct. 
We also did the same thing in the ixgbe_recv_pkts() function to ensure the rxd data is valid, even though we did not observe a segmentation fault in that function. Signed-off-by: Min Zhou <zhoumin@loongson.cn> --- v2: - Call rte_rmb() on all platforms --- drivers/net/ixgbe/ixgbe_rxtx.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c index c9d6ca9efe..302a5ab7ff 100644 --- a/drivers/net/ixgbe/ixgbe_rxtx.c +++ b/drivers/net/ixgbe/ixgbe_rxtx.c @@ -1823,6 +1823,8 @@ ixgbe_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, staterr = rxdp->wb.upper.status_error; if (!(staterr & rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD))) break; + + rte_rmb(); rxd = *rxdp; /* @@ -2122,6 +2124,7 @@ ixgbe_recv_pkts_lro(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts, if (!(staterr & IXGBE_RXDADV_STAT_DD)) break; + rte_rmb(); rxd = *rxdp; PMD_RX_LOG(DEBUG, "port_id=%u queue_id=%u rx_id=%u " -- 2.31.1 ^ permalink raw reply [flat|nested] 30+ messages in thread
* RE: [PATCH v2] net/ixgbe: add proper memory barriers for some Rx functions 2023-04-24 9:05 [PATCH v2] net/ixgbe: add proper memory barriers for some Rx functions Min Zhou @ 2023-04-28 3:43 ` Zhang, Qi Z 2023-04-28 6:27 ` Morten Brørup 2023-05-04 12:42 ` zhoumin 2023-05-01 13:29 ` Konstantin Ananyev 2023-05-06 10:23 ` [PATCH v3] " Min Zhou 2 siblings, 2 replies; 30+ messages in thread From: Zhang, Qi Z @ 2023-04-28 3:43 UTC (permalink / raw) To: Min Zhou, Yang, Qiming, Wu, Wenjun1; +Cc: dev, maobibo > -----Original Message----- > From: Min Zhou <zhoumin@loongson.cn> > Sent: Monday, April 24, 2023 5:06 PM > To: Yang, Qiming <qiming.yang@intel.com>; Wu, Wenjun1 > <wenjun1.wu@intel.com>; zhoumin@loongson.cn > Cc: dev@dpdk.org; maobibo@loongson.cn > Subject: [PATCH v2] net/ixgbe: add proper memory barriers for some Rx > functions > > Segmentation fault has been observed while running the > ixgbe_recv_pkts_lro() function to receive packets on the Loongson 3C5000 > processor which has 64 cores and 4 NUMA nodes. > > From the ixgbe_recv_pkts_lro() function, we found that as long as the first > packet has the EOP bit set, and the length of this packet is less than or equal > to rxq->crc_len, the segmentation fault will definitely happen even though > on the other platforms, such as X86. > > Because when processd the first packet the first_seg->next will be NULL, if at > the same time this packet has the EOP bit set and its length is less than or > equal to rxq->crc_len, the following loop will be excecuted: > > for (lp = first_seg; lp->next != rxm; lp = lp->next) > ; > > We know that the first_seg->next will be NULL under this condition. So the > expression of lp->next->next will cause the segmentation fault. > > Normally, the length of the first packet with EOP bit set will be greater than > rxq->crc_len. However, the out-of-order execution of CPU may make the > read ordering of the status and the rest of the descriptor fields in this > function not be correct. 
The related codes are as following: > > rxdp = &rx_ring[rx_id]; > #1 staterr = rte_le_to_cpu_32(rxdp->wb.upper.status_error); > > if (!(staterr & IXGBE_RXDADV_STAT_DD)) > break; > > #2 rxd = *rxdp; > > The sentence #2 may be executed before sentence #1. This action is likely to > make the ready packet zero length. If the packet is the first packet and has > the EOP bit set, the above segmentation fault will happen. > > So, we should add rte_rmb() to ensure the read ordering be correct. We also > did the same thing in the ixgbe_recv_pkts() function to make the rxd data be > valid even thougth we did not find segmentation fault in this function. > > Signed-off-by: Min Zhou <zhoumin@loongson.cn> > --- > v2: > - Make the calling of rte_rmb() for all platforms > --- > drivers/net/ixgbe/ixgbe_rxtx.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c > index c9d6ca9efe..302a5ab7ff 100644 > --- a/drivers/net/ixgbe/ixgbe_rxtx.c > +++ b/drivers/net/ixgbe/ixgbe_rxtx.c > @@ -1823,6 +1823,8 @@ ixgbe_recv_pkts(void *rx_queue, struct rte_mbuf > **rx_pkts, > staterr = rxdp->wb.upper.status_error; > if (!(staterr & rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD))) > break; > + > + rte_rmb(); So "volatile" does not prevent re-order with Loongson compiler? > rxd = *rxdp; > > /* > @@ -2122,6 +2124,7 @@ ixgbe_recv_pkts_lro(void *rx_queue, struct > rte_mbuf **rx_pkts, uint16_t nb_pkts, > if (!(staterr & IXGBE_RXDADV_STAT_DD)) > break; > > + rte_rmb(); > rxd = *rxdp; > > PMD_RX_LOG(DEBUG, "port_id=%u queue_id=%u rx_id=%u " > -- > 2.31.1 ^ permalink raw reply [flat|nested] 30+ messages in thread
* RE: [PATCH v2] net/ixgbe: add proper memory barriers for some Rx functions 2023-04-28 3:43 ` Zhang, Qi Z @ 2023-04-28 6:27 ` Morten Brørup 2023-05-04 12:58 ` zhoumin 2023-05-04 12:42 ` zhoumin 1 sibling, 1 reply; 30+ messages in thread From: Morten Brørup @ 2023-04-28 6:27 UTC (permalink / raw) To: Zhang, Qi Z, Min Zhou, Yang, Qiming, Wu, Wenjun1; +Cc: dev, maobibo > From: Zhang, Qi Z [mailto:qi.z.zhang@intel.com] > Sent: Friday, 28 April 2023 05.44 > > > From: Min Zhou <zhoumin@loongson.cn> > > Sent: Monday, April 24, 2023 5:06 PM > > > > Segmentation fault has been observed while running the > > ixgbe_recv_pkts_lro() function to receive packets on the Loongson 3C5000 > > processor which has 64 cores and 4 NUMA nodes. > > > > From the ixgbe_recv_pkts_lro() function, we found that as long as the first > > packet has the EOP bit set, and the length of this packet is less than or > equal > > to rxq->crc_len, the segmentation fault will definitely happen even though > > on the other platforms, such as X86. > > > > Because when processd the first packet the first_seg->next will be NULL, if > at > > the same time this packet has the EOP bit set and its length is less than or > > equal to rxq->crc_len, the following loop will be excecuted: > > > > for (lp = first_seg; lp->next != rxm; lp = lp->next) > > ; > > > > We know that the first_seg->next will be NULL under this condition. So the > > expression of lp->next->next will cause the segmentation fault. > > > > Normally, the length of the first packet with EOP bit set will be greater > than > > rxq->crc_len. However, the out-of-order execution of CPU may make the > > read ordering of the status and the rest of the descriptor fields in this > > function not be correct. 
The related codes are as following: > > > > rxdp = &rx_ring[rx_id]; > > #1 staterr = rte_le_to_cpu_32(rxdp->wb.upper.status_error); > > > > if (!(staterr & IXGBE_RXDADV_STAT_DD)) > > break; > > > > #2 rxd = *rxdp; > > > > The sentence #2 may be executed before sentence #1. This action is likely to > > make the ready packet zero length. If the packet is the first packet and has > > the EOP bit set, the above segmentation fault will happen. > > > > So, we should add rte_rmb() to ensure the read ordering be correct. We also > > did the same thing in the ixgbe_recv_pkts() function to make the rxd data be > > valid even thougth we did not find segmentation fault in this function. > > > > Signed-off-by: Min Zhou <zhoumin@loongson.cn> > > --- > > v2: > > - Make the calling of rte_rmb() for all platforms > > --- > > drivers/net/ixgbe/ixgbe_rxtx.c | 3 +++ > > 1 file changed, 3 insertions(+) > > > > diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c > > index c9d6ca9efe..302a5ab7ff 100644 > > --- a/drivers/net/ixgbe/ixgbe_rxtx.c > > +++ b/drivers/net/ixgbe/ixgbe_rxtx.c > > @@ -1823,6 +1823,8 @@ ixgbe_recv_pkts(void *rx_queue, struct rte_mbuf > > **rx_pkts, > > staterr = rxdp->wb.upper.status_error; > > if (!(staterr & rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD))) > > break; > > + > > + rte_rmb(); > > So "volatile" does not prevent re-order with Loongson compiler? "Volatile" does not prevent re-ordering on any compiler. "Volatile" only prevents caching of the variable marked volatile. https://wiki.sei.cmu.edu/confluence/display/c/CON02-C.+Do+not+use+volatile+as+a+synchronization+primitive Thinking out loud: I don't know the performance cost of rte_rmb(); perhaps using atomic accesses with the optimal memory ordering would be a better solution in the long term. 
> > > > rxd = *rxdp; > > > > /* > > @@ -2122,6 +2124,7 @@ ixgbe_recv_pkts_lro(void *rx_queue, struct > > rte_mbuf **rx_pkts, uint16_t nb_pkts, > > if (!(staterr & IXGBE_RXDADV_STAT_DD)) > > break; > > > > + rte_rmb(); > > rxd = *rxdp; > > > > PMD_RX_LOG(DEBUG, "port_id=%u queue_id=%u rx_id=%u " > > -- > > 2.31.1 ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v2] net/ixgbe: add proper memory barriers for some Rx functions 2023-04-28 6:27 ` Morten Brørup @ 2023-05-04 12:58 ` zhoumin 0 siblings, 0 replies; 30+ messages in thread From: zhoumin @ 2023-05-04 12:58 UTC (permalink / raw) To: Morten Brørup, Zhang, Qi Z, Yang, Qiming, Wu, Wenjun1; +Cc: dev, maobibo Hi Morten, Thanks for your comments. On Fri, Apr 28, 2023 at 2:27PM, Morten Brørup wrote: >> From: Zhang, Qi Z [mailto:qi.z.zhang@intel.com] >> Sent: Friday, 28 April 2023 05.44 >> >>> From: Min Zhou <zhoumin@loongson.cn> >>> Sent: Monday, April 24, 2023 5:06 PM >>> >>> Segmentation fault has been observed while running the >>> ixgbe_recv_pkts_lro() function to receive packets on the Loongson 3C5000 >>> processor which has 64 cores and 4 NUMA nodes. >>> >>> From the ixgbe_recv_pkts_lro() function, we found that as long as the first >>> packet has the EOP bit set, and the length of this packet is less than or >> equal >>> to rxq->crc_len, the segmentation fault will definitely happen even though >>> on the other platforms, such as X86. >>> >>> Because when processd the first packet the first_seg->next will be NULL, if >> at >>> the same time this packet has the EOP bit set and its length is less than or >>> equal to rxq->crc_len, the following loop will be excecuted: >>> >>> for (lp = first_seg; lp->next != rxm; lp = lp->next) >>> ; >>> >>> We know that the first_seg->next will be NULL under this condition. So the >>> expression of lp->next->next will cause the segmentation fault. >>> >>> Normally, the length of the first packet with EOP bit set will be greater >> than >>> rxq->crc_len. However, the out-of-order execution of CPU may make the >>> read ordering of the status and the rest of the descriptor fields in this >>> function not be correct. 
The related codes are as following: >>> >>> rxdp = &rx_ring[rx_id]; >>> #1 staterr = rte_le_to_cpu_32(rxdp->wb.upper.status_error); >>> >>> if (!(staterr & IXGBE_RXDADV_STAT_DD)) >>> break; >>> >>> #2 rxd = *rxdp; >>> >>> The sentence #2 may be executed before sentence #1. This action is likely to >>> make the ready packet zero length. If the packet is the first packet and has >>> the EOP bit set, the above segmentation fault will happen. >>> >>> So, we should add rte_rmb() to ensure the read ordering be correct. We also >>> did the same thing in the ixgbe_recv_pkts() function to make the rxd data be >>> valid even thougth we did not find segmentation fault in this function. >>> >>> Signed-off-by: Min Zhou <zhoumin@loongson.cn> >>> --- >>> v2: >>> - Make the calling of rte_rmb() for all platforms >>> --- >>> drivers/net/ixgbe/ixgbe_rxtx.c | 3 +++ >>> 1 file changed, 3 insertions(+) >>> >>> diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c >>> index c9d6ca9efe..302a5ab7ff 100644 >>> --- a/drivers/net/ixgbe/ixgbe_rxtx.c >>> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c >>> @@ -1823,6 +1823,8 @@ ixgbe_recv_pkts(void *rx_queue, struct rte_mbuf >>> **rx_pkts, >>> staterr = rxdp->wb.upper.status_error; >>> if (!(staterr & rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD))) >>> break; >>> + >>> + rte_rmb(); >> So "volatile" does not prevent re-order with Loongson compiler? > "Volatile" does not prevent re-ordering on any compiler. "Volatile" only prevents caching of the variable marked volatile. > > https://wiki.sei.cmu.edu/confluence/display/c/CON02-C.+Do+not+use+volatile+as+a+synchronization+primitive > > Thinking out loud: I don't know the performance cost of rte_rmb(); perhaps using atomic accesses with the optimal memory ordering would be a better solution in the long term. Yes, rte_rmb() probably has a performance cost. I will use a better solution to solve the problem in the v3 patch. 
>> >>> rxd = *rxdp; >>> >>> /* >>> @@ -2122,6 +2124,7 @@ ixgbe_recv_pkts_lro(void *rx_queue, struct >>> rte_mbuf **rx_pkts, uint16_t nb_pkts, >>> if (!(staterr & IXGBE_RXDADV_STAT_DD)) >>> break; >>> >>> + rte_rmb(); >>> rxd = *rxdp; >>> >>> PMD_RX_LOG(DEBUG, "port_id=%u queue_id=%u rx_id=%u " >>> -- >>> 2.31.1 Best regards, Min ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v2] net/ixgbe: add proper memory barriers for some Rx functions 2023-04-28 3:43 ` Zhang, Qi Z 2023-04-28 6:27 ` Morten Brørup @ 2023-05-04 12:42 ` zhoumin 1 sibling, 0 replies; 30+ messages in thread From: zhoumin @ 2023-05-04 12:42 UTC (permalink / raw) To: Zhang, Qi Z, Yang, Qiming, Wu, Wenjun1; +Cc: dev, maobibo Hi Qi, Thanks for your review. On Fri, Apr 28, 2023 at 11:43AM, Zhang, Qi Z wrote: > >> -----Original Message----- >> From: Min Zhou <zhoumin@loongson.cn> >> Sent: Monday, April 24, 2023 5:06 PM >> To: Yang, Qiming <qiming.yang@intel.com>; Wu, Wenjun1 >> <wenjun1.wu@intel.com>; zhoumin@loongson.cn >> Cc: dev@dpdk.org; maobibo@loongson.cn >> Subject: [PATCH v2] net/ixgbe: add proper memory barriers for some Rx >> functions >> >> Segmentation fault has been observed while running the >> ixgbe_recv_pkts_lro() function to receive packets on the Loongson 3C5000 >> processor which has 64 cores and 4 NUMA nodes. >> >> From the ixgbe_recv_pkts_lro() function, we found that as long as the first >> packet has the EOP bit set, and the length of this packet is less than or equal >> to rxq->crc_len, the segmentation fault will definitely happen even though >> on the other platforms, such as X86. >> >> Because when processd the first packet the first_seg->next will be NULL, if at >> the same time this packet has the EOP bit set and its length is less than or >> equal to rxq->crc_len, the following loop will be excecuted: >> >> for (lp = first_seg; lp->next != rxm; lp = lp->next) >> ; >> >> We know that the first_seg->next will be NULL under this condition. So the >> expression of lp->next->next will cause the segmentation fault. >> >> Normally, the length of the first packet with EOP bit set will be greater than >> rxq->crc_len. However, the out-of-order execution of CPU may make the >> read ordering of the status and the rest of the descriptor fields in this >> function not be correct. 
The related codes are as following: >> >> rxdp = &rx_ring[rx_id]; >> #1 staterr = rte_le_to_cpu_32(rxdp->wb.upper.status_error); >> >> if (!(staterr & IXGBE_RXDADV_STAT_DD)) >> break; >> >> #2 rxd = *rxdp; >> >> The sentence #2 may be executed before sentence #1. This action is likely to >> make the ready packet zero length. If the packet is the first packet and has >> the EOP bit set, the above segmentation fault will happen. >> >> So, we should add rte_rmb() to ensure the read ordering be correct. We also >> did the same thing in the ixgbe_recv_pkts() function to make the rxd data be >> valid even thougth we did not find segmentation fault in this function. >> >> Signed-off-by: Min Zhou <zhoumin@loongson.cn> >> --- >> v2: >> - Make the calling of rte_rmb() for all platforms >> --- >> drivers/net/ixgbe/ixgbe_rxtx.c | 3 +++ >> 1 file changed, 3 insertions(+) >> >> diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c >> index c9d6ca9efe..302a5ab7ff 100644 >> --- a/drivers/net/ixgbe/ixgbe_rxtx.c >> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c >> @@ -1823,6 +1823,8 @@ ixgbe_recv_pkts(void *rx_queue, struct rte_mbuf >> **rx_pkts, >> staterr = rxdp->wb.upper.status_error; >> if (!(staterr & rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD))) >> break; >> + >> + rte_rmb(); > So "volatile" does not prevent re-order with Loongson compiler? The memory consistency model of the LoongArch [1] uses the Weak Consistency model in which memory operations can be reordered. [1] https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN#overview-of-memory-consistency >> rxd = *rxdp; >> >> /* >> @@ -2122,6 +2124,7 @@ ixgbe_recv_pkts_lro(void *rx_queue, struct >> rte_mbuf **rx_pkts, uint16_t nb_pkts, >> if (!(staterr & IXGBE_RXDADV_STAT_DD)) >> break; >> >> + rte_rmb(); >> rxd = *rxdp; >> >> PMD_RX_LOG(DEBUG, "port_id=%u queue_id=%u rx_id=%u " >> -- >> 2.31.1 Best regards, Min ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v2] net/ixgbe: add proper memory barriers for some Rx functions 2023-04-24 9:05 [PATCH v2] net/ixgbe: add proper memory barriers for some Rx functions Min Zhou 2023-04-28 3:43 ` Zhang, Qi Z @ 2023-05-01 13:29 ` Konstantin Ananyev 2023-05-04 6:13 ` Ruifeng Wang 2023-05-04 13:16 ` zhoumin 2023-05-06 10:23 ` [PATCH v3] " Min Zhou 2 siblings, 2 replies; 30+ messages in thread From: Konstantin Ananyev @ 2023-05-01 13:29 UTC (permalink / raw) To: zhoumin Cc: dev, maobibo, qiming.yang, wenjun1.wu, ruifeng.wang@arm.com >> Ruifeng Wang, drc > Segmentation fault has been observed while running the > ixgbe_recv_pkts_lro() function to receive packets on the Loongson 3C5000 > processor which has 64 cores and 4 NUMA nodes. > > From the ixgbe_recv_pkts_lro() function, we found that as long as the first > packet has the EOP bit set, and the length of this packet is less than or > equal to rxq->crc_len, the segmentation fault will definitely happen even > though on the other platforms, such as X86. > > Because when processd the first packet the first_seg->next will be NULL, if > at the same time this packet has the EOP bit set and its length is less > than or equal to rxq->crc_len, the following loop will be excecuted: > > for (lp = first_seg; lp->next != rxm; lp = lp->next) > ; > > We know that the first_seg->next will be NULL under this condition. So the > expression of lp->next->next will cause the segmentation fault. > > Normally, the length of the first packet with EOP bit set will be greater > than rxq->crc_len. However, the out-of-order execution of CPU may make the > read ordering of the status and the rest of the descriptor fields in this > function not be correct. The related codes are as following: > > rxdp = &rx_ring[rx_id]; > #1 staterr = rte_le_to_cpu_32(rxdp->wb.upper.status_error); > > if (!(staterr & IXGBE_RXDADV_STAT_DD)) > break; > > #2 rxd = *rxdp; > > The sentence #2 may be executed before sentence #1. 
This action is likely > to make the ready packet zero length. If the packet is the first packet and > has the EOP bit set, the above segmentation fault will happen. > > So, we should add rte_rmb() to ensure the read ordering be correct. We also > did the same thing in the ixgbe_recv_pkts() function to make the rxd data > be valid even thougth we did not find segmentation fault in this function. > > Signed-off-by: Min Zhou <zhoumin@loongson.cn> > --- > v2: > - Make the calling of rte_rmb() for all platforms > --- > drivers/net/ixgbe/ixgbe_rxtx.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c > index c9d6ca9efe..302a5ab7ff 100644 > --- a/drivers/net/ixgbe/ixgbe_rxtx.c > +++ b/drivers/net/ixgbe/ixgbe_rxtx.c > @@ -1823,6 +1823,8 @@ ixgbe_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, > staterr = rxdp->wb.upper.status_error; > if (!(staterr & rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD))) > break; > + > + rte_rmb(); > rxd = *rxdp; Indeed, looks like a problem to me on systems with relaxed MO. Strange that it was never hit on arm or ppc - cc-ing ARM/PPC maintainers. About a fix - looks right, but a bit excessive to me - as I understand all we need here is to prevent re-ordering by CPU itself. So rte_smp_rmb() seems enough here. Or might be just: staterr = __atomic_load_n(&rxdp->wb.upper.status_error, __ATOMIC_ACQUIRE); > /* > @@ -2122,6 +2124,7 @@ ixgbe_recv_pkts_lro(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts, > if (!(staterr & IXGBE_RXDADV_STAT_DD)) > break; > > + rte_rmb(); > rxd = *rxdp; > > PMD_RX_LOG(DEBUG, "port_id=%u queue_id=%u rx_id=%u " > -- > 2.31.1 ^ permalink raw reply [flat|nested] 30+ messages in thread
* RE: [PATCH v2] net/ixgbe: add proper memory barriers for some Rx functions 2023-05-01 13:29 ` Konstantin Ananyev @ 2023-05-04 6:13 ` Ruifeng Wang 2023-05-05 1:45 ` zhoumin 2023-05-04 13:16 ` zhoumin 1 sibling, 1 reply; 30+ messages in thread From: Ruifeng Wang @ 2023-05-04 6:13 UTC (permalink / raw) To: Konstantin Ananyev, zhoumin Cc: dev, maobibo, qiming.yang, wenjun1.wu, drc, nd > -----Original Message----- > From: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru> > Sent: Monday, May 1, 2023 9:29 PM > To: zhoumin@loongson.cn > Cc: dev@dpdk.org; maobibo@loongson.cn; qiming.yang@intel.com; wenjun1.wu@intel.com; > Ruifeng Wang <Ruifeng.Wang@arm.com>; drc@linux.vnet.ibm.com > Subject: Re: [PATCH v2] net/ixgbe: add proper memory barriers for some Rx functions > > > Segmentation fault has been observed while running the > > ixgbe_recv_pkts_lro() function to receive packets on the Loongson > > 3C5000 processor which has 64 cores and 4 NUMA nodes. > > > > From the ixgbe_recv_pkts_lro() function, we found that as long as the > > first packet has the EOP bit set, and the length of this packet is > > less than or equal to rxq->crc_len, the segmentation fault will > > definitely happen even though on the other platforms, such as X86. > > > > Because when processd the first packet the first_seg->next will be > > NULL, if at the same time this packet has the EOP bit set and its > > length is less than or equal to rxq->crc_len, the following loop will be excecuted: > > > > for (lp = first_seg; lp->next != rxm; lp = lp->next) > > ; > > > > We know that the first_seg->next will be NULL under this condition. So > > the expression of lp->next->next will cause the segmentation fault. > > > > Normally, the length of the first packet with EOP bit set will be > > greater than rxq->crc_len. However, the out-of-order execution of CPU > > may make the read ordering of the status and the rest of the > > descriptor fields in this function not be correct. 
The related codes are as following: > > > > rxdp = &rx_ring[rx_id]; > > #1 staterr = rte_le_to_cpu_32(rxdp->wb.upper.status_error); > > > > if (!(staterr & IXGBE_RXDADV_STAT_DD)) > > break; > > > > #2 rxd = *rxdp; > > > > The sentence #2 may be executed before sentence #1. This action is > > likely to make the ready packet zero length. If the packet is the > > first packet and has the EOP bit set, the above segmentation fault will happen. > > > > So, we should add rte_rmb() to ensure the read ordering be correct. We > > also did the same thing in the ixgbe_recv_pkts() function to make the > > rxd data be valid even thougth we did not find segmentation fault in this function. > > > > Signed-off-by: Min Zhou <zhoumin@loongson.cn> "Fixes" tag for backport. > > --- > > v2: > > - Make the calling of rte_rmb() for all platforms > > --- > > drivers/net/ixgbe/ixgbe_rxtx.c | 3 +++ > > 1 file changed, 3 insertions(+) > > > > diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c > > b/drivers/net/ixgbe/ixgbe_rxtx.c index c9d6ca9efe..302a5ab7ff 100644 > > --- a/drivers/net/ixgbe/ixgbe_rxtx.c > > +++ b/drivers/net/ixgbe/ixgbe_rxtx.c > > @@ -1823,6 +1823,8 @@ ixgbe_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, > > staterr = rxdp->wb.upper.status_error; > > if (!(staterr & rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD))) > > break; > > + > > + rte_rmb(); > > rxd = *rxdp; > > > > Indeed, looks like a problem to me on systems with relaxed MO. > Strange that it was never hit on arm or ppc - cc-ing ARM/PPC maintainers. Thanks, Konstantin. > About a fix - looks right, but a bit excessive to me - as I understand all we need here is > to prevent re-ordering by CPU itself. > So rte_smp_rmb() seems enough here. Agree that rte_rmb() is excessive. rte_smp_rmb() or rte_atomic_thread_fence(__ATOMIC_ACQUIRE) is enough. And it is better to add a comment to justify the barrier. 
> Or might be just: > staterr = __atomic_load_n(&rxdp->wb.upper.status_error, __ATOMIC_ACQUIRE); > > > > /* > > @@ -2122,6 +2124,7 @@ ixgbe_recv_pkts_lro(void *rx_queue, struct rte_mbuf **rx_pkts, > uint16_t nb_pkts, With the proper barrier in place, I think the long comments at the beginning of this loop can be removed. > > if (!(staterr & IXGBE_RXDADV_STAT_DD)) > > break; > > > > + rte_rmb(); > > rxd = *rxdp; > > > > PMD_RX_LOG(DEBUG, "port_id=%u queue_id=%u rx_id=%u " > > -- > > 2.31.1 ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v2] net/ixgbe: add proper memory barriers for some Rx functions 2023-05-04 6:13 ` Ruifeng Wang @ 2023-05-05 1:45 ` zhoumin 0 siblings, 0 replies; 30+ messages in thread From: zhoumin @ 2023-05-05 1:45 UTC (permalink / raw) To: Ruifeng Wang, Konstantin Ananyev Cc: dev, maobibo, qiming.yang, wenjun1.wu, drc, nd Hi Ruifeng, Thanks for your review. On Thur, May 4, 2023 at 2:13PM, Ruifeng Wang wrote: >> -----Original Message----- >> From: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru> >> Sent: Monday, May 1, 2023 9:29 PM >> To: zhoumin@loongson.cn >> Cc: dev@dpdk.org; maobibo@loongson.cn; qiming.yang@intel.com; wenjun1.wu@intel.com; >> Ruifeng Wang <Ruifeng.Wang@arm.com>; drc@linux.vnet.ibm.com >> Subject: Re: [PATCH v2] net/ixgbe: add proper memory barriers for some Rx functions >> >>> Segmentation fault has been observed while running the >>> ixgbe_recv_pkts_lro() function to receive packets on the Loongson >>> 3C5000 processor which has 64 cores and 4 NUMA nodes. >>> >>> From the ixgbe_recv_pkts_lro() function, we found that as long as the >>> first packet has the EOP bit set, and the length of this packet is >>> less than or equal to rxq->crc_len, the segmentation fault will >>> definitely happen even though on the other platforms, such as X86. >>> >>> Because when processd the first packet the first_seg->next will be >>> NULL, if at the same time this packet has the EOP bit set and its >>> length is less than or equal to rxq->crc_len, the following loop will be excecuted: >>> >>> for (lp = first_seg; lp->next != rxm; lp = lp->next) >>> ; >>> >>> We know that the first_seg->next will be NULL under this condition. So >>> the expression of lp->next->next will cause the segmentation fault. >>> >>> Normally, the length of the first packet with EOP bit set will be >>> greater than rxq->crc_len. 
However, the out-of-order execution of CPU >>> may make the read ordering of the status and the rest of the >>> descriptor fields in this function not be correct. The related codes are as following: >>> >>> rxdp = &rx_ring[rx_id]; >>> #1 staterr = rte_le_to_cpu_32(rxdp->wb.upper.status_error); >>> >>> if (!(staterr & IXGBE_RXDADV_STAT_DD)) >>> break; >>> >>> #2 rxd = *rxdp; >>> >>> The sentence #2 may be executed before sentence #1. This action is >>> likely to make the ready packet zero length. If the packet is the >>> first packet and has the EOP bit set, the above segmentation fault will happen. >>> >>> So, we should add rte_rmb() to ensure the read ordering be correct. We >>> also did the same thing in the ixgbe_recv_pkts() function to make the >>> rxd data be valid even thougth we did not find segmentation fault in this function. >>> >>> Signed-off-by: Min Zhou <zhoumin@loongson.cn> > "Fixes" tag for backport. OK, I will add the "Fixes" tag in the V3 patch. > >>> --- >>> v2: >>> - Make the calling of rte_rmb() for all platforms >>> --- >>> drivers/net/ixgbe/ixgbe_rxtx.c | 3 +++ >>> 1 file changed, 3 insertions(+) >>> >>> diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c >>> b/drivers/net/ixgbe/ixgbe_rxtx.c index c9d6ca9efe..302a5ab7ff 100644 >>> --- a/drivers/net/ixgbe/ixgbe_rxtx.c >>> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c >>> @@ -1823,6 +1823,8 @@ ixgbe_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, >>> staterr = rxdp->wb.upper.status_error; >>> if (!(staterr & rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD))) >>> break; >>> + >>> + rte_rmb(); >>> rxd = *rxdp; >> >> >> Indeed, looks like a problem to me on systems with relaxed MO. >> Strange that it was never hit on arm or ppc - cc-ing ARM/PPC maintainers. > Thanks, Konstantin. > >> About a fix - looks right, but a bit excessive to me - as I understand all we need here is >> to prevent re-ordering by CPU itself. >> So rte_smp_rmb() seems enough here. > Agree that rte_rmb() is excessive. 
> rte_smp_rmb() or rte_atomic_thread_fence(__ATOMIC_ACQUIRE) is enough. Thanks for your advice. I will compare the rte_smp_rmb(), __atomic_load_n() and rte_atomic_thread_fence() to choose a better one. > And it is better to add a comment to justify the barrier. OK, I will add a comment for this change. >> Or might be just: >> staterr = __atomic_load_n(&rxdp->wb.upper.status_error, __ATOMIC_ACQUIRE); >> >> >>> /* >>> @@ -2122,6 +2124,7 @@ ixgbe_recv_pkts_lro(void *rx_queue, struct rte_mbuf **rx_pkts, >> uint16_t nb_pkts, > With the proper barrier in place, I think the long comments at the beginning of this loop can be removed. Yes, I think the long comments can be simplified when the proper barrier is already in place. >>> if (!(staterr & IXGBE_RXDADV_STAT_DD)) >>> break; >>> >>> + rte_rmb(); >>> rxd = *rxdp; >>> >>> PMD_RX_LOG(DEBUG, "port_id=%u queue_id=%u rx_id=%u " >>> -- >>> 2.31.1 Best regards, Min ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v2] net/ixgbe: add proper memory barriers for some Rx functions 2023-05-01 13:29 ` Konstantin Ananyev 2023-05-04 6:13 ` Ruifeng Wang @ 2023-05-04 13:16 ` zhoumin 2023-05-04 13:21 ` Morten Brørup 1 sibling, 1 reply; 30+ messages in thread From: zhoumin @ 2023-05-04 13:16 UTC (permalink / raw) To: Konstantin Ananyev Cc: dev, maobibo, qiming.yang, wenjun1.wu, ruifeng.wang, drc Hi Konstantin, Thanks for your comments. On 2023/5/1 下午9:29, Konstantin Ananyev wrote: >> Segmentation fault has been observed while running the >> ixgbe_recv_pkts_lro() function to receive packets on the Loongson 3C5000 >> processor which has 64 cores and 4 NUMA nodes. >> >> From the ixgbe_recv_pkts_lro() function, we found that as long as the >> first >> packet has the EOP bit set, and the length of this packet is less >> than or >> equal to rxq->crc_len, the segmentation fault will definitely happen >> even >> though on the other platforms, such as X86. >> >> Because when processd the first packet the first_seg->next will be >> NULL, if >> at the same time this packet has the EOP bit set and its length is less >> than or equal to rxq->crc_len, the following loop will be excecuted: >> >> for (lp = first_seg; lp->next != rxm; lp = lp->next) >> ; >> >> We know that the first_seg->next will be NULL under this condition. >> So the >> expression of lp->next->next will cause the segmentation fault. >> >> Normally, the length of the first packet with EOP bit set will be >> greater >> than rxq->crc_len. However, the out-of-order execution of CPU may >> make the >> read ordering of the status and the rest of the descriptor fields in >> this >> function not be correct. The related codes are as following: >> >> rxdp = &rx_ring[rx_id]; >> #1 staterr = rte_le_to_cpu_32(rxdp->wb.upper.status_error); >> >> if (!(staterr & IXGBE_RXDADV_STAT_DD)) >> break; >> >> #2 rxd = *rxdp; >> >> The sentence #2 may be executed before sentence #1. 
This action is >> likely to make the ready packet zero length. If the packet is the first >> packet and has the EOP bit set, the above segmentation fault will happen. >> >> So, we should add rte_rmb() to ensure the read ordering is correct. We also >> did the same thing in the ixgbe_recv_pkts() function to make the rxd data >> valid even though we did not find a segmentation fault in this function. >> >> Signed-off-by: Min Zhou <zhoumin@loongson.cn> >> --- >> v2: >> - Make the calling of rte_rmb() for all platforms >> --- >> drivers/net/ixgbe/ixgbe_rxtx.c | 3 +++ >> 1 file changed, 3 insertions(+) >> >> diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c >> b/drivers/net/ixgbe/ixgbe_rxtx.c >> index c9d6ca9efe..302a5ab7ff 100644 >> --- a/drivers/net/ixgbe/ixgbe_rxtx.c >> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c >> @@ -1823,6 +1823,8 @@ ixgbe_recv_pkts(void *rx_queue, struct rte_mbuf >> **rx_pkts, >> staterr = rxdp->wb.upper.status_error; >> if (!(staterr & rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD))) >> break; >> + >> + rte_rmb(); >> rxd = *rxdp; > > > > Indeed, looks like a problem to me on systems with relaxed MO. > Strange that it was never hit on arm or ppc - cc-ing ARM/PPC maintainers. The LoongArch architecture uses the Weak Consistency model, which can cause the problem, especially in scenarios with many cores, such as the Loongson 3C5000 with four NUMA nodes, which has 64 cores. I cannot reproduce it on the Loongson 3C5000 with one NUMA node, which has just 16 cores. > About a fix - looks right, but a bit excessive to me - > as I understand all we need here is to prevent re-ordering by CPU itself. Yes, thanks for cc-ing. > So rte_smp_rmb() seems enough here. > Or might be just: > staterr = __atomic_load_n(&rxdp->wb.upper.status_error, > __ATOMIC_ACQUIRE); > Does __atomic_load_n() work on Windows if we use it to solve this problem ?
> >> /* >> @@ -2122,6 +2124,7 @@ ixgbe_recv_pkts_lro(void *rx_queue, struct >> rte_mbuf **rx_pkts, uint16_t nb_pkts, >> if (!(staterr & IXGBE_RXDADV_STAT_DD)) >> break; >> >> + rte_rmb(); >> rxd = *rxdp; >> >> PMD_RX_LOG(DEBUG, "port_id=%u queue_id=%u rx_id=%u " >> -- >> 2.31.1 Best regards, Min ^ permalink raw reply [flat|nested] 30+ messages in thread
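[Editor's note: the crash mechanism described in the quoted commit message can be reproduced in miniature. The struct and function below are illustrative stand-ins, not rte_mbuf or the actual PMD code; the NULL guard marks exactly the dereference the original loop performs.]

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for the mbuf chain; only the next pointer matters
 * for the faulting loop quoted from ixgbe_recv_pkts_lro(). */
struct seg {
    struct seg *next;
};

/*
 * Mirrors "for (lp = first_seg; lp->next != rxm; lp = lp->next) ;".
 * Returns the predecessor of rxm in the chain, or NULL if rxm is not
 * reachable. The NULL check is what the original loop lacks: when
 * first_seg->next == NULL (single zero-length segment with EOP set)
 * and rxm is a different mbuf, the original walk advances lp to NULL
 * and then dereferences it, which is the reported segmentation fault.
 */
static struct seg *find_prev(struct seg *first_seg, struct seg *rxm)
{
    struct seg *lp;

    for (lp = first_seg; lp->next != rxm; lp = lp->next) {
        if (lp->next == NULL)
            return NULL;   /* original code faults at this point */
    }
    return lp;
}
```

This also shows why the fix targets the descriptor read ordering rather than the loop: with a correctly observed non-zero length, the zero-length-EOP case never arises.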
* RE: [PATCH v2] net/ixgbe: add proper memory barriers for some Rx functions 2023-05-04 13:16 ` zhoumin @ 2023-05-04 13:21 ` Morten Brørup 2023-05-04 13:33 ` Zhang, Qi Z 2023-05-05 1:54 ` zhoumin 0 siblings, 2 replies; 30+ messages in thread From: Morten Brørup @ 2023-05-04 13:21 UTC (permalink / raw) To: zhoumin, Konstantin Ananyev Cc: dev, maobibo, qiming.yang, wenjun1.wu, ruifeng.wang, drc, Tyler Retzlaff > From: zhoumin [mailto:zhoumin@loongson.cn] > Sent: Thursday, 4 May 2023 15.17 > > Hi Konstantin, > > Thanks for your comments. > > On 2023/5/1 下午9:29, Konstantin Ananyev wrote: > >> Segmentation fault has been observed while running the > >> ixgbe_recv_pkts_lro() function to receive packets on the Loongson 3C5000 > >> processor which has 64 cores and 4 NUMA nodes. > >> > >> From the ixgbe_recv_pkts_lro() function, we found that as long as the > >> first > >> packet has the EOP bit set, and the length of this packet is less > >> than or > >> equal to rxq->crc_len, the segmentation fault will definitely happen > >> even > >> though on the other platforms, such as X86. > >> > >> Because when processd the first packet the first_seg->next will be > >> NULL, if > >> at the same time this packet has the EOP bit set and its length is less > >> than or equal to rxq->crc_len, the following loop will be excecuted: > >> > >> for (lp = first_seg; lp->next != rxm; lp = lp->next) > >> ; > >> > >> We know that the first_seg->next will be NULL under this condition. > >> So the > >> expression of lp->next->next will cause the segmentation fault. > >> > >> Normally, the length of the first packet with EOP bit set will be > >> greater > >> than rxq->crc_len. However, the out-of-order execution of CPU may > >> make the > >> read ordering of the status and the rest of the descriptor fields in > >> this > >> function not be correct. 
The related codes are as following: > >> > >> rxdp = &rx_ring[rx_id]; > >> #1 staterr = rte_le_to_cpu_32(rxdp->wb.upper.status_error); > >> > >> if (!(staterr & IXGBE_RXDADV_STAT_DD)) > >> break; > >> > >> #2 rxd = *rxdp; > >> > >> The sentence #2 may be executed before sentence #1. This action is > >> likely > >> to make the ready packet zero length. If the packet is the first > >> packet and > >> has the EOP bit set, the above segmentation fault will happen. > >> > >> So, we should add rte_rmb() to ensure the read ordering be correct. > >> We also > >> did the same thing in the ixgbe_recv_pkts() function to make the rxd > >> data > >> be valid even thougth we did not find segmentation fault in this > >> function. > >> > >> Signed-off-by: Min Zhou <zhoumin@loongson.cn> > >> --- > >> v2: > >> - Make the calling of rte_rmb() for all platforms > >> --- > >> drivers/net/ixgbe/ixgbe_rxtx.c | 3 +++ > >> 1 file changed, 3 insertions(+) > >> > >> diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c > >> b/drivers/net/ixgbe/ixgbe_rxtx.c > >> index c9d6ca9efe..302a5ab7ff 100644 > >> --- a/drivers/net/ixgbe/ixgbe_rxtx.c > >> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c > >> @@ -1823,6 +1823,8 @@ ixgbe_recv_pkts(void *rx_queue, struct rte_mbuf > >> **rx_pkts, > >> staterr = rxdp->wb.upper.status_error; > >> if (!(staterr & rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD))) > >> break; > >> + > >> + rte_rmb(); > >> rxd = *rxdp; > > > > > > > > Indeed, looks like a problem to me on systems with relaxed MO. > > Strange that it was never hit on arm or ppc - cc-ing ARM/PPC maintainers. > The LoongArch architecture uses the Weak Consistency model which can > cause the problem, especially in scenario with many cores, such as > Loongson 3C5000 with four NUMA node, which has 64 cores. I cannot > reproduce it on Loongson 3C5000 with one NUMA node, which just has 16 cores. > > About a fix - looks right, but a bit excessive to me - > > as I understand all we need here is to prevent re-ordering by CPU itself. 
> Yes, thanks for cc-ing. > > So rte_smp_rmb() seems enough here. > > Or might be just: > > staterr = __atomic_load_n(&rxdp->wb.upper.status_error, > > __ATOMIC_ACQUIRE); > > > Does __atomic_load_n() work on Windows if we use it to solve this problem ? Yes, __atomic_load_n() works on Windows too. ^ permalink raw reply [flat|nested] 30+ messages in thread
* RE: [PATCH v2] net/ixgbe: add proper memory barriers for some Rx functions 2023-05-04 13:21 ` Morten Brørup @ 2023-05-04 13:33 ` Zhang, Qi Z 2023-05-05 2:42 ` zhoumin 2023-05-05 1:54 ` zhoumin 1 sibling, 1 reply; 30+ messages in thread From: Zhang, Qi Z @ 2023-05-04 13:33 UTC (permalink / raw) To: Morten Brørup, zhoumin, Konstantin Ananyev Cc: dev, maobibo, Yang, Qiming, Wu, Wenjun1, ruifeng.wang, drc, Tyler Retzlaff > -----Original Message----- > From: Morten Brørup <mb@smartsharesystems.com> > Sent: Thursday, May 4, 2023 9:22 PM > To: zhoumin <zhoumin@loongson.cn>; Konstantin Ananyev > <konstantin.v.ananyev@yandex.ru> > Cc: dev@dpdk.org; maobibo@loongson.cn; Yang, Qiming > <qiming.yang@intel.com>; Wu, Wenjun1 <wenjun1.wu@intel.com>; > ruifeng.wang@arm.com; drc@linux.vnet.ibm.com; Tyler Retzlaff > <roretzla@linux.microsoft.com> > Subject: RE: [PATCH v2] net/ixgbe: add proper memory barriers for some Rx > functions > > > From: zhoumin [mailto:zhoumin@loongson.cn] > > Sent: Thursday, 4 May 2023 15.17 > > > > Hi Konstantin, > > > > Thanks for your comments. > > > > On 2023/5/1 下午9:29, Konstantin Ananyev wrote: > > >> Segmentation fault has been observed while running the > > >> ixgbe_recv_pkts_lro() function to receive packets on the Loongson > > >> 3C5000 processor which has 64 cores and 4 NUMA nodes. > > >> > > >> From the ixgbe_recv_pkts_lro() function, we found that as long as > > >> the first packet has the EOP bit set, and the length of this packet > > >> is less than or equal to rxq->crc_len, the segmentation fault will > > >> definitely happen even though on the other platforms, such as X86. Sorry to interrupt, but I am curious why this issue still exists on x86 architecture. Can volatile be used to instruct the compiler to generate read instructions in a specific order, and does x86 guarantee not to reorder load operations? 
> > >> > > >> Because when processd the first packet the first_seg->next will be > > >> NULL, if at the same time this packet has the EOP bit set and its > > >> length is less than or equal to rxq->crc_len, the following loop > > >> will be excecuted: > > >> > > >> for (lp = first_seg; lp->next != rxm; lp = lp->next) > > >> ; > > >> > > >> We know that the first_seg->next will be NULL under this condition. > > >> So the > > >> expression of lp->next->next will cause the segmentation fault. > > >> > > >> Normally, the length of the first packet with EOP bit set will be > > >> greater than rxq->crc_len. However, the out-of-order execution of > > >> CPU may make the read ordering of the status and the rest of the > > >> descriptor fields in this function not be correct. The related > > >> codes are as following: > > >> > > >> rxdp = &rx_ring[rx_id]; > > >> #1 staterr = rte_le_to_cpu_32(rxdp->wb.upper.status_error); > > >> > > >> if (!(staterr & IXGBE_RXDADV_STAT_DD)) > > >> break; > > >> > > >> #2 rxd = *rxdp; > > >> > > >> The sentence #2 may be executed before sentence #1. This action is > > >> likely to make the ready packet zero length. If the packet is the > > >> first packet and has the EOP bit set, the above segmentation fault > > >> will happen. > > >> > > >> So, we should add rte_rmb() to ensure the read ordering be correct. > > >> We also > > >> did the same thing in the ixgbe_recv_pkts() function to make the > > >> rxd data be valid even thougth we did not find segmentation fault > > >> in this function. 
> > >> > > >> Signed-off-by: Min Zhou <zhoumin@loongson.cn> > > >> --- > > >> v2: > > >> - Make the calling of rte_rmb() for all platforms > > >> --- > > >> drivers/net/ixgbe/ixgbe_rxtx.c | 3 +++ > > >> 1 file changed, 3 insertions(+) > > >> > > >> diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c > > >> b/drivers/net/ixgbe/ixgbe_rxtx.c index c9d6ca9efe..302a5ab7ff > > >> 100644 > > >> --- a/drivers/net/ixgbe/ixgbe_rxtx.c > > >> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c > > >> @@ -1823,6 +1823,8 @@ ixgbe_recv_pkts(void *rx_queue, struct > > >> rte_mbuf **rx_pkts, > > >> staterr = rxdp->wb.upper.status_error; > > >> if (!(staterr & rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD))) > > >> break; > > >> + > > >> + rte_rmb(); > > >> rxd = *rxdp; > > > > > > > > > > > > Indeed, looks like a problem to me on systems with relaxed MO. > > > Strange that it was never hit on arm or ppc - cc-ing ARM/PPC maintainers. > > The LoongArch architecture uses the Weak Consistency model which can > > cause the problem, especially in scenario with many cores, such as > > Loongson 3C5000 with four NUMA node, which has 64 cores. I cannot > > reproduce it on Loongson 3C5000 with one NUMA node, which just has 16 > cores. > > > About a fix - looks right, but a bit excessive to me - as I > > > understand all we need here is to prevent re-ordering by CPU itself. > > Yes, thanks for cc-ing. > > > So rte_smp_rmb() seems enough here. > > > Or might be just: > > > staterr = __atomic_load_n(&rxdp->wb.upper.status_error, > > > __ATOMIC_ACQUIRE); > > > > > Does __atomic_load_n() work on Windows if we use it to solve this > problem ? > > Yes, __atomic_load_n() works on Windows too. > ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v2] net/ixgbe: add proper memory barriers for some Rx functions 2023-05-04 13:33 ` Zhang, Qi Z @ 2023-05-05 2:42 ` zhoumin 2023-05-06 1:30 ` Zhang, Qi Z 0 siblings, 1 reply; 30+ messages in thread From: zhoumin @ 2023-05-05 2:42 UTC (permalink / raw) To: Zhang, Qi Z, Morten Brørup, Konstantin Ananyev Cc: dev, maobibo, Yang, Qiming, Wu, Wenjun1, ruifeng.wang, drc, Tyler Retzlaff Hi Qi, On Thur, May 4, 2023 at 9:33PM, Zhang, Qi Z wrote: > >> -----Original Message----- >> From: Morten Brørup <mb@smartsharesystems.com> >> Sent: Thursday, May 4, 2023 9:22 PM >> To: zhoumin <zhoumin@loongson.cn>; Konstantin Ananyev >> <konstantin.v.ananyev@yandex.ru> >> Cc: dev@dpdk.org; maobibo@loongson.cn; Yang, Qiming >> <qiming.yang@intel.com>; Wu, Wenjun1 <wenjun1.wu@intel.com>; >> ruifeng.wang@arm.com; drc@linux.vnet.ibm.com; Tyler Retzlaff >> <roretzla@linux.microsoft.com> >> Subject: RE: [PATCH v2] net/ixgbe: add proper memory barriers for some Rx >> functions >> >>> From: zhoumin [mailto:zhoumin@loongson.cn] >>> Sent: Thursday, 4 May 2023 15.17 >>> >>> Hi Konstantin, >>> >>> Thanks for your comments. >>> >>> On 2023/5/1 下午9:29, Konstantin Ananyev wrote: >>>>> Segmentation fault has been observed while running the >>>>> ixgbe_recv_pkts_lro() function to receive packets on the Loongson >>>>> 3C5000 processor which has 64 cores and 4 NUMA nodes. >>>>> >>>>> From the ixgbe_recv_pkts_lro() function, we found that as long as >>>>> the first packet has the EOP bit set, and the length of this packet >>>>> is less than or equal to rxq->crc_len, the segmentation fault will >>>>> definitely happen even though on the other platforms, such as X86. > Sorry to interrupt, but I am curious why this issue still exists on x86 architecture. Can volatile be used to instruct the compiler to generate read instructions in a specific order, and does x86 guarantee not to reorder load operations? Actually, I did not see the segmentation fault on X86. 
I just forced the first packet which had the EOP bit set to have a zero length, and then the segmentation fault happened on X86. So, I thought that out-of-order access to the descriptor could make the ready packet zero length, and that this case was the most likely cause of the segmentation fault. >>>>> Because when processing the first packet, first_seg->next will be >>>>> NULL; if at the same time this packet has the EOP bit set and its >>>>> length is less than or equal to rxq->crc_len, the following loop >>>>> will be executed: >>>>> >>>>> for (lp = first_seg; lp->next != rxm; lp = lp->next) >>>>> ; >>>>> >>>>> We know that first_seg->next will be NULL under this condition. >>>>> So the >>>>> expression lp->next->next will cause the segmentation fault. >>>>> >>>>> Normally, the length of the first packet with the EOP bit set will be >>>>> greater than rxq->crc_len. However, out-of-order execution by the >>>>> CPU may make the read ordering of the status and the rest of the >>>>> descriptor fields in this function incorrect. The related >>>>> code is as follows: >>>>> >>>>> rxdp = &rx_ring[rx_id]; >>>>> #1 staterr = rte_le_to_cpu_32(rxdp->wb.upper.status_error); >>>>> >>>>> if (!(staterr & IXGBE_RXDADV_STAT_DD)) >>>>> break; >>>>> >>>>> #2 rxd = *rxdp; >>>>> >>>>> Statement #2 may be executed before statement #1. This action is >>>>> likely to make the ready packet zero length. If the packet is the >>>>> first packet and has the EOP bit set, the above segmentation fault >>>>> will happen. >>>>> >>>>> So, we should add rte_rmb() to ensure the read ordering is correct. >>>>> We also >>>>> did the same thing in the ixgbe_recv_pkts() function to make the >>>>> rxd data valid even though we did not find a segmentation fault >>>>> in this function.
>>>>> >>>>> Signed-off-by: Min Zhou <zhoumin@loongson.cn> >>>>> --- >>>>> v2: >>>>> - Make the calling of rte_rmb() for all platforms >>>>> --- >>>>> drivers/net/ixgbe/ixgbe_rxtx.c | 3 +++ >>>>> 1 file changed, 3 insertions(+) >>>>> >>>>> diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c >>>>> b/drivers/net/ixgbe/ixgbe_rxtx.c index c9d6ca9efe..302a5ab7ff >>>>> 100644 >>>>> --- a/drivers/net/ixgbe/ixgbe_rxtx.c >>>>> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c >>>>> @@ -1823,6 +1823,8 @@ ixgbe_recv_pkts(void *rx_queue, struct >>>>> rte_mbuf **rx_pkts, >>>>> staterr = rxdp->wb.upper.status_error; >>>>> if (!(staterr & rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD))) >>>>> break; >>>>> + >>>>> + rte_rmb(); >>>>> rxd = *rxdp; >>>> >>>> >>>> Indeed, looks like a problem to me on systems with relaxed MO. >>>> Strange that it was never hit on arm or ppc - cc-ing ARM/PPC maintainers. >>> The LoongArch architecture uses the Weak Consistency model which can >>> cause the problem, especially in scenario with many cores, such as >>> Loongson 3C5000 with four NUMA node, which has 64 cores. I cannot >>> reproduce it on Loongson 3C5000 with one NUMA node, which just has 16 >> cores. >>>> About a fix - looks right, but a bit excessive to me - as I >>>> understand all we need here is to prevent re-ordering by CPU itself. >>> Yes, thanks for cc-ing. >>>> So rte_smp_rmb() seems enough here. >>>> Or might be just: >>>> staterr = __atomic_load_n(&rxdp->wb.upper.status_error, >>>> __ATOMIC_ACQUIRE); >>>> >>> Does __atomic_load_n() work on Windows if we use it to solve this >> problem ? >> >> Yes, __atomic_load_n() works on Windows too. >> Best regards, Min ^ permalink raw reply [flat|nested] 30+ messages in thread
* RE: [PATCH v2] net/ixgbe: add proper memory barriers for some Rx functions 2023-05-05 2:42 ` zhoumin @ 2023-05-06 1:30 ` Zhang, Qi Z 0 siblings, 0 replies; 30+ messages in thread From: Zhang, Qi Z @ 2023-05-06 1:30 UTC (permalink / raw) To: zhoumin, Morten Brørup, Konstantin Ananyev Cc: dev, maobibo, Yang, Qiming, Wu, Wenjun1, ruifeng.wang, drc, Tyler Retzlaff > -----Original Message----- > From: zhoumin <zhoumin@loongson.cn> > Sent: Friday, May 5, 2023 10:43 AM > To: Zhang, Qi Z <qi.z.zhang@intel.com>; Morten Brørup > <mb@smartsharesystems.com>; Konstantin Ananyev > <konstantin.v.ananyev@yandex.ru> > Cc: dev@dpdk.org; maobibo@loongson.cn; Yang, Qiming > <qiming.yang@intel.com>; Wu, Wenjun1 <wenjun1.wu@intel.com>; > ruifeng.wang@arm.com; drc@linux.vnet.ibm.com; Tyler Retzlaff > <roretzla@linux.microsoft.com> > Subject: Re: [PATCH v2] net/ixgbe: add proper memory barriers for some Rx > functions > > Hi Qi, > > On Thur, May 4, 2023 at 9:33PM, Zhang, Qi Z wrote: > > > >> -----Original Message----- > >> From: Morten Brørup <mb@smartsharesystems.com> > >> Sent: Thursday, May 4, 2023 9:22 PM > >> To: zhoumin <zhoumin@loongson.cn>; Konstantin Ananyev > >> <konstantin.v.ananyev@yandex.ru> > >> Cc: dev@dpdk.org; maobibo@loongson.cn; Yang, Qiming > >> <qiming.yang@intel.com>; Wu, Wenjun1 <wenjun1.wu@intel.com>; > >> ruifeng.wang@arm.com; drc@linux.vnet.ibm.com; Tyler Retzlaff > >> <roretzla@linux.microsoft.com> > >> Subject: RE: [PATCH v2] net/ixgbe: add proper memory barriers for > >> some Rx functions > >> > >>> From: zhoumin [mailto:zhoumin@loongson.cn] > >>> Sent: Thursday, 4 May 2023 15.17 > >>> > >>> Hi Konstantin, > >>> > >>> Thanks for your comments. > >>> > >>> On 2023/5/1 下午9:29, Konstantin Ananyev wrote: > >>>>> Segmentation fault has been observed while running the > >>>>> ixgbe_recv_pkts_lro() function to receive packets on the Loongson > >>>>> 3C5000 processor which has 64 cores and 4 NUMA nodes. 
> >>>>> > >>>>> From the ixgbe_recv_pkts_lro() function, we found that as long as > >>>>> the first packet has the EOP bit set, and the length of this > >>>>> packet is less than or equal to rxq->crc_len, the segmentation > >>>>> fault will definitely happen even though on the other platforms, such > as X86. > > Sorry to interrupt, but I am curious why this issue still exists on x86 > architecture. Can volatile be used to instruct the compiler to generate read > instructions in a specific order, and does x86 guarantee not to reorder load > operations? > Actually, I did not see the segmentation fault on X86. I just made the first > packet which had the EOP bit set had a zero length, then the segmentation > fault would happen on X86. So, I thought that the out-of-order access to the > descriptor might be possible to make the ready packet zero length, and this > case was more likely to cause the segmentation fault. I see, thanks for the explanation. > >>>>> Because when processd the first packet the first_seg->next will be > >>>>> NULL, if at the same time this packet has the EOP bit set and its > >>>>> length is less than or equal to rxq->crc_len, the following loop > >>>>> will be excecuted: > >>>>> > >>>>> for (lp = first_seg; lp->next != rxm; lp = lp->next) > >>>>> ; > >>>>> > >>>>> We know that the first_seg->next will be NULL under this condition. > >>>>> So the > >>>>> expression of lp->next->next will cause the segmentation fault. > >>>>> > >>>>> Normally, the length of the first packet with EOP bit set will be > >>>>> greater than rxq->crc_len. However, the out-of-order execution of > >>>>> CPU may make the read ordering of the status and the rest of the > >>>>> descriptor fields in this function not be correct. 
The related > >>>>> codes are as following: > >>>>> > >>>>> rxdp = &rx_ring[rx_id]; > >>>>> #1 staterr = rte_le_to_cpu_32(rxdp->wb.upper.status_error); > >>>>> > >>>>> if (!(staterr & IXGBE_RXDADV_STAT_DD)) > >>>>> break; > >>>>> > >>>>> #2 rxd = *rxdp; > >>>>> > >>>>> The sentence #2 may be executed before sentence #1. This action is > >>>>> likely to make the ready packet zero length. If the packet is the > >>>>> first packet and has the EOP bit set, the above segmentation fault > >>>>> will happen. > >>>>> > >>>>> So, we should add rte_rmb() to ensure the read ordering be correct. > >>>>> We also > >>>>> did the same thing in the ixgbe_recv_pkts() function to make the > >>>>> rxd data be valid even thougth we did not find segmentation fault > >>>>> in this function. > >>>>> > >>>>> Signed-off-by: Min Zhou <zhoumin@loongson.cn> > >>>>> --- > >>>>> v2: > >>>>> - Make the calling of rte_rmb() for all platforms > >>>>> --- > >>>>> drivers/net/ixgbe/ixgbe_rxtx.c | 3 +++ > >>>>> 1 file changed, 3 insertions(+) > >>>>> > >>>>> diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c > >>>>> b/drivers/net/ixgbe/ixgbe_rxtx.c index c9d6ca9efe..302a5ab7ff > >>>>> 100644 > >>>>> --- a/drivers/net/ixgbe/ixgbe_rxtx.c > >>>>> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c > >>>>> @@ -1823,6 +1823,8 @@ ixgbe_recv_pkts(void *rx_queue, struct > >>>>> rte_mbuf **rx_pkts, > >>>>> staterr = rxdp->wb.upper.status_error; > >>>>> if (!(staterr & rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD))) > >>>>> break; > >>>>> + > >>>>> + rte_rmb(); > >>>>> rxd = *rxdp; > >>>> > >>>> > >>>> Indeed, looks like a problem to me on systems with relaxed MO. > >>>> Strange that it was never hit on arm or ppc - cc-ing ARM/PPC > maintainers. > >>> The LoongArch architecture uses the Weak Consistency model which can > >>> cause the problem, especially in scenario with many cores, such as > >>> Loongson 3C5000 with four NUMA node, which has 64 cores. 
I cannot > >>> reproduce it on Loongson 3C5000 with one NUMA node, which just has > >>> 16 > >> cores. > >>>> About a fix - looks right, but a bit excessive to me - as I > >>>> understand all we need here is to prevent re-ordering by CPU itself. > >>> Yes, thanks for cc-ing. > >>>> So rte_smp_rmb() seems enough here. > >>>> Or might be just: > >>>> staterr = __atomic_load_n(&rxdp->wb.upper.status_error, > >>>> __ATOMIC_ACQUIRE); > >>>> > >>> Does __atomic_load_n() work on Windows if we use it to solve this > >> problem ? > >> > >> Yes, __atomic_load_n() works on Windows too. > >> > Best regards, > > Min > ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v2] net/ixgbe: add proper memory barriers for some Rx functions 2023-05-04 13:21 ` Morten Brørup 2023-05-04 13:33 ` Zhang, Qi Z @ 2023-05-05 1:54 ` zhoumin 1 sibling, 0 replies; 30+ messages in thread From: zhoumin @ 2023-05-05 1:54 UTC (permalink / raw) To: Morten Brørup, Konstantin Ananyev Cc: dev, maobibo, qiming.yang, wenjun1.wu, ruifeng.wang, drc, Tyler Retzlaff Hi Morten, On Thur, May 4, 2023 at 9:21PM, Morten Brørup wrote: >> From: zhoumin [mailto:zhoumin@loongson.cn] >> Sent: Thursday, 4 May 2023 15.17 >> >> Hi Konstantin, >> >> Thanks for your comments. >> >> On 2023/5/1 下午9:29, Konstantin Ananyev wrote: >>>> Segmentation fault has been observed while running the >>>> ixgbe_recv_pkts_lro() function to receive packets on the Loongson 3C5000 >>>> processor which has 64 cores and 4 NUMA nodes. >>>> >>>> From the ixgbe_recv_pkts_lro() function, we found that as long as the >>>> first >>>> packet has the EOP bit set, and the length of this packet is less >>>> than or >>>> equal to rxq->crc_len, the segmentation fault will definitely happen >>>> even >>>> though on the other platforms, such as X86. >>>> >>>> Because when processd the first packet the first_seg->next will be >>>> NULL, if >>>> at the same time this packet has the EOP bit set and its length is less >>>> than or equal to rxq->crc_len, the following loop will be excecuted: >>>> >>>> for (lp = first_seg; lp->next != rxm; lp = lp->next) >>>> ; >>>> >>>> We know that the first_seg->next will be NULL under this condition. >>>> So the >>>> expression of lp->next->next will cause the segmentation fault. >>>> >>>> Normally, the length of the first packet with EOP bit set will be >>>> greater >>>> than rxq->crc_len. However, the out-of-order execution of CPU may >>>> make the >>>> read ordering of the status and the rest of the descriptor fields in >>>> this >>>> function not be correct. 
The related codes are as following: >>>> >>>> rxdp = &rx_ring[rx_id]; >>>> #1 staterr = rte_le_to_cpu_32(rxdp->wb.upper.status_error); >>>> >>>> if (!(staterr & IXGBE_RXDADV_STAT_DD)) >>>> break; >>>> >>>> #2 rxd = *rxdp; >>>> >>>> The sentence #2 may be executed before sentence #1. This action is >>>> likely >>>> to make the ready packet zero length. If the packet is the first >>>> packet and >>>> has the EOP bit set, the above segmentation fault will happen. >>>> >>>> So, we should add rte_rmb() to ensure the read ordering be correct. >>>> We also >>>> did the same thing in the ixgbe_recv_pkts() function to make the rxd >>>> data >>>> be valid even thougth we did not find segmentation fault in this >>>> function. >>>> >>>> Signed-off-by: Min Zhou <zhoumin@loongson.cn> >>>> --- >>>> v2: >>>> - Make the calling of rte_rmb() for all platforms >>>> --- >>>> drivers/net/ixgbe/ixgbe_rxtx.c | 3 +++ >>>> 1 file changed, 3 insertions(+) >>>> >>>> diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c >>>> b/drivers/net/ixgbe/ixgbe_rxtx.c >>>> index c9d6ca9efe..302a5ab7ff 100644 >>>> --- a/drivers/net/ixgbe/ixgbe_rxtx.c >>>> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c >>>> @@ -1823,6 +1823,8 @@ ixgbe_recv_pkts(void *rx_queue, struct rte_mbuf >>>> **rx_pkts, >>>> staterr = rxdp->wb.upper.status_error; >>>> if (!(staterr & rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD))) >>>> break; >>>> + >>>> + rte_rmb(); >>>> rxd = *rxdp; >>> >>> >>> Indeed, looks like a problem to me on systems with relaxed MO. >>> Strange that it was never hit on arm or ppc - cc-ing ARM/PPC maintainers. >> The LoongArch architecture uses the Weak Consistency model which can >> cause the problem, especially in scenario with many cores, such as >> Loongson 3C5000 with four NUMA node, which has 64 cores. I cannot >> reproduce it on Loongson 3C5000 with one NUMA node, which just has 16 cores. >>> About a fix - looks right, but a bit excessive to me - >>> as I understand all we need here is to prevent re-ordering by CPU itself. 
>> Yes, thanks for cc-ing. >>> So rte_smp_rmb() seems enough here. >>> Or might be just: >>> staterr = __atomic_load_n(&rxdp->wb.upper.status_error, >>> __ATOMIC_ACQUIRE); >>> >> Does __atomic_load_n() work on Windows if we use it to solve this problem ? > Yes, __atomic_load_n() works on Windows too. > Thank you, Morten. I got it. I will compare those barriers and choose a proper one for this problem. Best regards, Min ^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH v3] net/ixgbe: add proper memory barriers for some Rx functions 2023-04-24 9:05 [PATCH v2] net/ixgbe: add proper memory barriers for some Rx functions Min Zhou 2023-04-28 3:43 ` Zhang, Qi Z 2023-05-01 13:29 ` Konstantin Ananyev @ 2023-05-06 10:23 ` Min Zhou 2023-05-08 6:03 ` Ruifeng Wang 2023-06-13 9:44 ` [PATCH v4] " Min Zhou 2 siblings, 2 replies; 30+ messages in thread From: Min Zhou @ 2023-05-06 10:23 UTC (permalink / raw) To: qi.z.zhang, mb, konstantin.v.ananyev, qiming.yang, wenjun1.wu, zhoumin Cc: ruifeng.wang, drc, roretzla, dev, stable, maobibo Segmentation fault has been observed while running the ixgbe_recv_pkts_lro() function to receive packets on the Loongson 3C5000 processor which has 64 cores and 4 NUMA nodes. From the ixgbe_recv_pkts_lro() function, we found that as long as the first packet has the EOP bit set, and the length of this packet is less than or equal to rxq->crc_len, the segmentation fault will definitely happen, even on other platforms. For example, if we force the first packet which has the EOP bit set to have a zero length, the segmentation fault also happens on X86. Because when processing the first packet, first_seg->next will be NULL; if at the same time this packet has the EOP bit set and its length is less than or equal to rxq->crc_len, the following loop will be executed: for (lp = first_seg; lp->next != rxm; lp = lp->next) ; We know that first_seg->next will be NULL under this condition. So the expression lp->next->next will cause the segmentation fault. Normally, the length of the first packet with the EOP bit set will be greater than rxq->crc_len. However, out-of-order execution by the CPU may make the read ordering of the status and the rest of the descriptor fields in this function incorrect.
The related codes are as following: rxdp = &rx_ring[rx_id]; #1 staterr = rte_le_to_cpu_32(rxdp->wb.upper.status_error); if (!(staterr & IXGBE_RXDADV_STAT_DD)) break; #2 rxd = *rxdp; The sentence #2 may be executed before sentence #1. This action is likely to make the ready packet zero length. If the packet is the first packet and has the EOP bit set, the above segmentation fault will happen. So, we should add a proper memory barrier to ensure the read ordering be correct. We also did the same thing in the ixgbe_recv_pkts() function to make the rxd data be valid even though we did not find segmentation fault in this function. Fixes: 8eecb3295ae ("ixgbe: add LRO support") Cc: stable@dpdk.org Signed-off-by: Min Zhou <zhoumin@loongson.cn> --- v3: - Use rte_smp_rmb() as the proper memory barrier instead of rte_rmb() --- v2: - Make the calling of rte_rmb() for all platforms --- drivers/net/ixgbe/ixgbe_rxtx.c | 39 ++++++++++++---------------------- 1 file changed, 13 insertions(+), 26 deletions(-) diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c index 6b3d3a4d1a..80bcaef093 100644 --- a/drivers/net/ixgbe/ixgbe_rxtx.c +++ b/drivers/net/ixgbe/ixgbe_rxtx.c @@ -1823,6 +1823,12 @@ ixgbe_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, staterr = rxdp->wb.upper.status_error; if (!(staterr & rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD))) break; + + /* + * This barrier is to ensure that status_error which includes DD + * bit is loaded before loading of other descriptor words. + */ + rte_smp_rmb(); rxd = *rxdp; /* @@ -2089,32 +2095,8 @@ ixgbe_recv_pkts_lro(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts, next_desc: /* - * The code in this whole file uses the volatile pointer to - * ensure the read ordering of the status and the rest of the - * descriptor fields (on the compiler level only!!!). This is so - * UGLY - why not to just use the compiler barrier instead? DPDK - * even has the rte_compiler_barrier() for that. 
- * - * But most importantly this is just wrong because this doesn't - * ensure memory ordering in a general case at all. For - * instance, DPDK is supposed to work on Power CPUs where - * compiler barrier may just not be enough! - * - * I tried to write only this function properly to have a - * starting point (as a part of an LRO/RSC series) but the - * compiler cursed at me when I tried to cast away the - * "volatile" from rx_ring (yes, it's volatile too!!!). So, I'm - * keeping it the way it is for now. - * - * The code in this file is broken in so many other places and - * will just not work on a big endian CPU anyway therefore the - * lines below will have to be revisited together with the rest - * of the ixgbe PMD. - * - * TODO: - * - Get rid of "volatile" and let the compiler do its job. - * - Use the proper memory barrier (rte_rmb()) to ensure the - * memory ordering below. + * It is necessary to use a proper memory barrier to ensure the + * memory ordering below. */ rxdp = &rx_ring[rx_id]; staterr = rte_le_to_cpu_32(rxdp->wb.upper.status_error); @@ -2122,6 +2104,11 @@ ixgbe_recv_pkts_lro(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts, if (!(staterr & IXGBE_RXDADV_STAT_DD)) break; + /* + * This barrier is to ensure that status_error which includes DD + * bit is loaded before loading of other descriptor words. + */ + rte_smp_rmb(); rxd = *rxdp; PMD_RX_LOG(DEBUG, "port_id=%u queue_id=%u rx_id=%u " -- 2.31.1 ^ permalink raw reply [flat|nested] 30+ messages in thread
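[Editor's note: the v3 pattern above — test DD, barrier, then copy the descriptor — can be sketched standalone with a GCC builtin fence in place of rte_smp_rmb(). The descriptor layout and names below are invented for the sketch.]

```c
#include <assert.h>
#include <stdint.h>

/* Invented miniature of the write-back descriptor for illustration. */
struct rx_desc {
    uint32_t status_error;   /* DD bit signals the descriptor is ready */
    uint16_t pkt_len;
};

#define STAT_DD 0x1u

static int recv_one(volatile struct rx_desc *rxdp, struct rx_desc *rxd)
{
    uint32_t staterr = rxdp->status_error;

    if (!(staterr & STAT_DD))
        return 0;

    /*
     * As in the v3 patch: ensure status_error (including the DD bit)
     * is loaded before any other descriptor words. The builtin fence
     * is a portable stand-in for rte_smp_rmb() in this sketch.
     */
    __atomic_thread_fence(__ATOMIC_ACQUIRE);
    *rxd = *(struct rx_desc *)(uintptr_t)rxdp;
    return 1;
}
```

Compared with the acquire-load form, the explicit fence keeps the hot-path DD check a plain load and pays the ordering cost only for descriptors that are actually ready.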
* RE: [PATCH v3] net/ixgbe: add proper memory barriers for some Rx functions 2023-05-06 10:23 ` [PATCH v3] " Min Zhou @ 2023-05-08 6:03 ` Ruifeng Wang 2023-05-15 2:10 ` Zhang, Qi Z 2023-06-13 9:44 ` [PATCH v4] " Min Zhou 1 sibling, 1 reply; 30+ messages in thread From: Ruifeng Wang @ 2023-05-08 6:03 UTC (permalink / raw) To: Min Zhou, qi.z.zhang, mb, konstantin.v.ananyev, qiming.yang, wenjun1.wu Cc: drc, roretzla, dev, stable, maobibo, nd > -----Original Message----- > From: Min Zhou <zhoumin@loongson.cn> > Sent: Saturday, May 6, 2023 6:24 PM > To: qi.z.zhang@intel.com; mb@smartsharesystems.com; konstantin.v.ananyev@yandex.ru; > qiming.yang@intel.com; wenjun1.wu@intel.com; zhoumin@loongson.cn > Cc: Ruifeng Wang <Ruifeng.Wang@arm.com>; drc@linux.vnet.ibm.com; > roretzla@linux.microsoft.com; dev@dpdk.org; stable@dpdk.org; maobibo@loongson.cn > Subject: [PATCH v3] net/ixgbe: add proper memory barriers for some Rx functions > > Segmentation fault has been observed while running the > ixgbe_recv_pkts_lro() function to receive packets on the Loongson 3C5000 processor which > has 64 cores and 4 NUMA nodes. > > From the ixgbe_recv_pkts_lro() function, we found that as long as the first packet has the > EOP bit set, and the length of this packet is less than or equal to rxq->crc_len, the > segmentation fault will definitely happen even on other platforms. For example, > if we forced the first packet with the EOP bit set to have a zero length, the > segmentation fault would happen on X86. > > Because when processing the first packet, first_seg->next will be NULL, if at the same > time this packet has the EOP bit set and its length is less than or equal to rxq->crc_len, > the following loop will be executed: > > for (lp = first_seg; lp->next != rxm; lp = lp->next) > ; > > We know that the first_seg->next will be NULL under this condition. So the expression of > lp->next->next will cause the segmentation fault. 
> > Normally, the length of the first packet with EOP bit set will be greater than rxq- > >crc_len. However, the out-of-order execution of CPU may make the read ordering of the > status and the rest of the descriptor fields in this function not be correct. The related > codes are as following: > > rxdp = &rx_ring[rx_id]; > #1 staterr = rte_le_to_cpu_32(rxdp->wb.upper.status_error); > > if (!(staterr & IXGBE_RXDADV_STAT_DD)) > break; > > #2 rxd = *rxdp; > > The sentence #2 may be executed before sentence #1. This action is likely to make the > ready packet zero length. If the packet is the first packet and has the EOP bit set, the > above segmentation fault will happen. > > So, we should add a proper memory barrier to ensure the read ordering be correct. We also > did the same thing in the ixgbe_recv_pkts() function to make the rxd data be valid even > though we did not find segmentation fault in this function. > > Fixes: 8eecb3295ae ("ixgbe: add LRO support") > Cc: stable@dpdk.org > > Signed-off-by: Min Zhou <zhoumin@loongson.cn> > --- > v3: > - Use rte_smp_rmb() as the proper memory barrier instead of rte_rmb() > --- > v2: > - Make the calling of rte_rmb() for all platforms > --- > drivers/net/ixgbe/ixgbe_rxtx.c | 39 ++++++++++++---------------------- > 1 file changed, 13 insertions(+), 26 deletions(-) > > diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c index > 6b3d3a4d1a..80bcaef093 100644 > --- a/drivers/net/ixgbe/ixgbe_rxtx.c > +++ b/drivers/net/ixgbe/ixgbe_rxtx.c > @@ -1823,6 +1823,12 @@ ixgbe_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, > staterr = rxdp->wb.upper.status_error; > if (!(staterr & rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD))) > break; > + > + /* > + * This barrier is to ensure that status_error which includes DD > + * bit is loaded before loading of other descriptor words. 
> + */ > + rte_smp_rmb(); > rxd = *rxdp; > > /* > @@ -2089,32 +2095,8 @@ ixgbe_recv_pkts_lro(void *rx_queue, struct rte_mbuf **rx_pkts, > uint16_t nb_pkts, > > next_desc: > /* > - * The code in this whole file uses the volatile pointer to > - * ensure the read ordering of the status and the rest of the > - * descriptor fields (on the compiler level only!!!). This is so > - * UGLY - why not to just use the compiler barrier instead? DPDK > - * even has the rte_compiler_barrier() for that. > - * > - * But most importantly this is just wrong because this doesn't > - * ensure memory ordering in a general case at all. For > - * instance, DPDK is supposed to work on Power CPUs where > - * compiler barrier may just not be enough! > - * > - * I tried to write only this function properly to have a > - * starting point (as a part of an LRO/RSC series) but the > - * compiler cursed at me when I tried to cast away the > - * "volatile" from rx_ring (yes, it's volatile too!!!). So, I'm > - * keeping it the way it is for now. > - * > - * The code in this file is broken in so many other places and > - * will just not work on a big endian CPU anyway therefore the > - * lines below will have to be revisited together with the rest > - * of the ixgbe PMD. > - * > - * TODO: > - * - Get rid of "volatile" and let the compiler do its job. > - * - Use the proper memory barrier (rte_rmb()) to ensure the > - * memory ordering below. > + * It is necessary to use a proper memory barrier to ensure the > + * memory ordering below. > */ > rxdp = &rx_ring[rx_id]; > staterr = rte_le_to_cpu_32(rxdp->wb.upper.status_error); > @@ -2122,6 +2104,11 @@ ixgbe_recv_pkts_lro(void *rx_queue, struct rte_mbuf **rx_pkts, > uint16_t nb_pkts, > if (!(staterr & IXGBE_RXDADV_STAT_DD)) > break; > > + /* > + * This barrier is to ensure that status_error which includes DD > + * bit is loaded before loading of other descriptor words. 
> + */ > + rte_smp_rmb(); > rxd = *rxdp; > > PMD_RX_LOG(DEBUG, "port_id=%u queue_id=%u rx_id=%u " > -- > 2.31.1 Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> ^ permalink raw reply [flat|nested] 30+ messages in thread
* RE: [PATCH v3] net/ixgbe: add proper memory barriers for some Rx functions 2023-05-08 6:03 ` Ruifeng Wang @ 2023-05-15 2:10 ` Zhang, Qi Z 2023-06-12 10:26 ` Thomas Monjalon 0 siblings, 1 reply; 30+ messages in thread From: Zhang, Qi Z @ 2023-05-15 2:10 UTC (permalink / raw) To: Ruifeng Wang, Min Zhou, mb, konstantin.v.ananyev, Yang, Qiming, Wu, Wenjun1 Cc: drc, roretzla, dev, stable, maobibo, nd > -----Original Message----- > From: Ruifeng Wang <Ruifeng.Wang@arm.com> > Sent: Monday, May 8, 2023 2:03 PM > To: Min Zhou <zhoumin@loongson.cn>; Zhang, Qi Z <qi.z.zhang@intel.com>; > mb@smartsharesystems.com; konstantin.v.ananyev@yandex.ru; Yang, > Qiming <qiming.yang@intel.com>; Wu, Wenjun1 <wenjun1.wu@intel.com> > Cc: drc@linux.vnet.ibm.com; roretzla@linux.microsoft.com; dev@dpdk.org; > stable@dpdk.org; maobibo@loongson.cn; nd <nd@arm.com> > Subject: RE: [PATCH v3] net/ixgbe: add proper memory barriers for some Rx > functions > > > -----Original Message----- > > From: Min Zhou <zhoumin@loongson.cn> > > Sent: Saturday, May 6, 2023 6:24 PM > > To: qi.z.zhang@intel.com; mb@smartsharesystems.com; > > konstantin.v.ananyev@yandex.ru; qiming.yang@intel.com; > > wenjun1.wu@intel.com; zhoumin@loongson.cn > > Cc: Ruifeng Wang <Ruifeng.Wang@arm.com>; drc@linux.vnet.ibm.com; > > roretzla@linux.microsoft.com; dev@dpdk.org; stable@dpdk.org; > > maobibo@loongson.cn > > Subject: [PATCH v3] net/ixgbe: add proper memory barriers for some Rx > > functions > > > > Segmentation fault has been observed while running the > > ixgbe_recv_pkts_lro() function to receive packets on the Loongson > > 3C5000 processor which has 64 cores and 4 NUMA nodes. > > > > From the ixgbe_recv_pkts_lro() function, we found that as long as the > > first packet has the EOP bit set, and the length of this packet is > > less than or equal to rxq->crc_len, the segmentation fault will > > definitely happen even though on the other platforms. 
For example, if > > we made the first packet which had the EOP bit set had a zero length by > force, the segmentation fault would happen on X86. > > > > Because when processd the first packet the first_seg->next will be > > NULL, if at the same time this packet has the EOP bit set and its > > length is less than or equal to rxq->crc_len, the following loop will be > executed: > > > > for (lp = first_seg; lp->next != rxm; lp = lp->next) > > ; > > > > We know that the first_seg->next will be NULL under this condition. So > > the expression of > > lp->next->next will cause the segmentation fault. > > > > Normally, the length of the first packet with EOP bit set will be > > greater than rxq- > > >crc_len. However, the out-of-order execution of CPU may make the read > > >ordering of the > > status and the rest of the descriptor fields in this function not be > > correct. The related codes are as following: > > > > rxdp = &rx_ring[rx_id]; > > #1 staterr = rte_le_to_cpu_32(rxdp->wb.upper.status_error); > > > > if (!(staterr & IXGBE_RXDADV_STAT_DD)) > > break; > > > > #2 rxd = *rxdp; > > > > The sentence #2 may be executed before sentence #1. This action is > > likely to make the ready packet zero length. If the packet is the > > first packet and has the EOP bit set, the above segmentation fault will > happen. > > > > So, we should add a proper memory barrier to ensure the read ordering > > be correct. We also did the same thing in the ixgbe_recv_pkts() > > function to make the rxd data be valid even though we did not find > segmentation fault in this function. 
> > > > Fixes: 8eecb3295ae ("ixgbe: add LRO support") > > Cc: stable@dpdk.org > > > > Signed-off-by: Min Zhou <zhoumin@loongson.cn> > > --- > > v3: > > - Use rte_smp_rmb() as the proper memory barrier instead of rte_rmb() > > --- > > v2: > > - Make the calling of rte_rmb() for all platforms > > --- > > drivers/net/ixgbe/ixgbe_rxtx.c | 39 > > ++++++++++++---------------------- > > 1 file changed, 13 insertions(+), 26 deletions(-) > > > > diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c > > b/drivers/net/ixgbe/ixgbe_rxtx.c index > > 6b3d3a4d1a..80bcaef093 100644 > > --- a/drivers/net/ixgbe/ixgbe_rxtx.c > > +++ b/drivers/net/ixgbe/ixgbe_rxtx.c > > @@ -1823,6 +1823,12 @@ ixgbe_recv_pkts(void *rx_queue, struct > rte_mbuf **rx_pkts, > > staterr = rxdp->wb.upper.status_error; > > if (!(staterr & rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD))) > > break; > > + > > + /* > > + * This barrier is to ensure that status_error which includes > DD > > + * bit is loaded before loading of other descriptor words. > > + */ > > + rte_smp_rmb(); > > rxd = *rxdp; > > > > /* > > @@ -2089,32 +2095,8 @@ ixgbe_recv_pkts_lro(void *rx_queue, struct > > rte_mbuf **rx_pkts, uint16_t nb_pkts, > > > > next_desc: > > /* > > - * The code in this whole file uses the volatile pointer to > > - * ensure the read ordering of the status and the rest of the > > - * descriptor fields (on the compiler level only!!!). This is so > > - * UGLY - why not to just use the compiler barrier instead? > DPDK > > - * even has the rte_compiler_barrier() for that. > > - * > > - * But most importantly this is just wrong because this > doesn't > > - * ensure memory ordering in a general case at all. For > > - * instance, DPDK is supposed to work on Power CPUs where > > - * compiler barrier may just not be enough! 
> > - * > > - * I tried to write only this function properly to have a > > - * starting point (as a part of an LRO/RSC series) but the > > - * compiler cursed at me when I tried to cast away the > > - * "volatile" from rx_ring (yes, it's volatile too!!!). So, I'm > > - * keeping it the way it is for now. > > - * > > - * The code in this file is broken in so many other places and > > - * will just not work on a big endian CPU anyway therefore > the > > - * lines below will have to be revisited together with the rest > > - * of the ixgbe PMD. > > - * > > - * TODO: > > - * - Get rid of "volatile" and let the compiler do its job. > > - * - Use the proper memory barrier (rte_rmb()) to ensure > the > > - * memory ordering below. > > + * It is necessary to use a proper memory barrier to ensure > the > > + * memory ordering below. > > */ > > rxdp = &rx_ring[rx_id]; > > staterr = rte_le_to_cpu_32(rxdp->wb.upper.status_error); > > @@ -2122,6 +2104,11 @@ ixgbe_recv_pkts_lro(void *rx_queue, struct > > rte_mbuf **rx_pkts, uint16_t nb_pkts, > > if (!(staterr & IXGBE_RXDADV_STAT_DD)) > > break; > > > > + /* > > + * This barrier is to ensure that status_error which includes > DD > > + * bit is loaded before loading of other descriptor words. > > + */ > > + rte_smp_rmb(); > > rxd = *rxdp; > > > > PMD_RX_LOG(DEBUG, "port_id=%u queue_id=%u rx_id=%u " > > -- > > 2.31.1 > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Applied to dpdk-next-net-intel. Thanks Qi ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v3] net/ixgbe: add proper memory barriers for some Rx functions 2023-05-15 2:10 ` Zhang, Qi Z @ 2023-06-12 10:26 ` Thomas Monjalon 2023-06-12 11:58 ` zhoumin 0 siblings, 1 reply; 30+ messages in thread From: Thomas Monjalon @ 2023-06-12 10:26 UTC (permalink / raw) To: Ruifeng Wang, Min Zhou, dev, Zhang, Qi Z Cc: mb, konstantin.v.ananyev, Yang, Qiming, Wu, Wenjun1, drc, roretzla, dev, stable, maobibo, nd, david.marchand 15/05/2023 04:10, Zhang, Qi Z: > From: Ruifeng Wang <Ruifeng.Wang@arm.com> > > From: Min Zhou <zhoumin@loongson.cn> > > > > > > Segmentation fault has been observed while running the > > > ixgbe_recv_pkts_lro() function to receive packets on the Loongson > > > 3C5000 processor which has 64 cores and 4 NUMA nodes. > > > > > > From the ixgbe_recv_pkts_lro() function, we found that as long as the > > > first packet has the EOP bit set, and the length of this packet is > > > less than or equal to rxq->crc_len, the segmentation fault will > > > definitely happen even though on the other platforms. For example, if > > > we made the first packet which had the EOP bit set had a zero length by > > force, the segmentation fault would happen on X86. > > > > > > Because when processd the first packet the first_seg->next will be > > > NULL, if at the same time this packet has the EOP bit set and its > > > length is less than or equal to rxq->crc_len, the following loop will be > > executed: > > > > > > for (lp = first_seg; lp->next != rxm; lp = lp->next) > > > ; > > > > > > We know that the first_seg->next will be NULL under this condition. So > > > the expression of > > > lp->next->next will cause the segmentation fault. > > > > > > Normally, the length of the first packet with EOP bit set will be > > > greater than rxq- > > > >crc_len. However, the out-of-order execution of CPU may make the read > > > >ordering of the > > > status and the rest of the descriptor fields in this function not be > > > correct. 
The related codes are as following: > > > > > > rxdp = &rx_ring[rx_id]; > > > #1 staterr = rte_le_to_cpu_32(rxdp->wb.upper.status_error); > > > > > > if (!(staterr & IXGBE_RXDADV_STAT_DD)) > > > break; > > > > > > #2 rxd = *rxdp; > > > > > > The sentence #2 may be executed before sentence #1. This action is > > > likely to make the ready packet zero length. If the packet is the > > > first packet and has the EOP bit set, the above segmentation fault will > > happen. > > > > > > So, we should add a proper memory barrier to ensure the read ordering > > > be correct. We also did the same thing in the ixgbe_recv_pkts() > > > function to make the rxd data be valid even though we did not find > > segmentation fault in this function. > > > > > > Fixes: 8eecb3295ae ("ixgbe: add LRO support") > > > Cc: stable@dpdk.org > > > > > > Signed-off-by: Min Zhou <zhoumin@loongson.cn> > > > --- > > > v3: > > > - Use rte_smp_rmb() as the proper memory barrier instead of rte_rmb() > > > --- > > > v2: > > > - Make the calling of rte_rmb() for all platforms > > > --- [...] > > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> > > Applied to dpdk-next-net-intel. > > Thanks > Qi > Why ignoring checkpatch? It is saying: " Warning in drivers/net/ixgbe/ixgbe_rxtx.c: Using rte_smp_[r/w]mb " Ruifeng proposed "rte_atomic_thread_fence(__ATOMIC_ACQUIRE)" in a comment on the v2. I will drop this patch from the pull of next-net-intel branch. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v3] net/ixgbe: add proper memory barriers for some Rx functions 2023-06-12 10:26 ` Thomas Monjalon @ 2023-06-12 11:58 ` zhoumin 2023-06-12 12:44 ` Thomas Monjalon 0 siblings, 1 reply; 30+ messages in thread From: zhoumin @ 2023-06-12 11:58 UTC (permalink / raw) To: Thomas Monjalon, Ruifeng Wang, dev, Zhang, Qi Z Cc: mb, konstantin.v.ananyev, Yang, Qiming, Wu, Wenjun1, drc, roretzla, stable, maobibo, nd, david.marchand Hi Thomas, On Mon, June 12, 2023 at 6:26PM, Thomas Monjalon wrote: > 15/05/2023 04:10, Zhang, Qi Z: >> From: Ruifeng Wang <Ruifeng.Wang@arm.com> >>> From: Min Zhou <zhoumin@loongson.cn> >>>> Segmentation fault has been observed while running the >>>> ixgbe_recv_pkts_lro() function to receive packets on the Loongson >>>> 3C5000 processor which has 64 cores and 4 NUMA nodes. >>>> >>>> From the ixgbe_recv_pkts_lro() function, we found that as long as the >>>> first packet has the EOP bit set, and the length of this packet is >>>> less than or equal to rxq->crc_len, the segmentation fault will >>>> definitely happen even though on the other platforms. For example, if >>>> we made the first packet which had the EOP bit set had a zero length by >>> force, the segmentation fault would happen on X86. >>>> Because when processd the first packet the first_seg->next will be >>>> NULL, if at the same time this packet has the EOP bit set and its >>>> length is less than or equal to rxq->crc_len, the following loop will be >>> executed: >>>> for (lp = first_seg; lp->next != rxm; lp = lp->next) >>>> ; >>>> >>>> We know that the first_seg->next will be NULL under this condition. So >>>> the expression of >>>> lp->next->next will cause the segmentation fault. >>>> >>>> Normally, the length of the first packet with EOP bit set will be >>>> greater than rxq- >>>>> crc_len. However, the out-of-order execution of CPU may make the read >>>>> ordering of the >>>> status and the rest of the descriptor fields in this function not be >>>> correct. 
The related codes are as following: >>>> >>>> rxdp = &rx_ring[rx_id]; >>>> #1 staterr = rte_le_to_cpu_32(rxdp->wb.upper.status_error); >>>> >>>> if (!(staterr & IXGBE_RXDADV_STAT_DD)) >>>> break; >>>> >>>> #2 rxd = *rxdp; >>>> >>>> The sentence #2 may be executed before sentence #1. This action is >>>> likely to make the ready packet zero length. If the packet is the >>>> first packet and has the EOP bit set, the above segmentation fault will >>> happen. >>>> So, we should add a proper memory barrier to ensure the read ordering >>>> be correct. We also did the same thing in the ixgbe_recv_pkts() >>>> function to make the rxd data be valid even though we did not find >>> segmentation fault in this function. >>>> Fixes: 8eecb3295ae ("ixgbe: add LRO support") >>>> Cc: stable@dpdk.org >>>> >>>> Signed-off-by: Min Zhou <zhoumin@loongson.cn> >>>> --- >>>> v3: >>>> - Use rte_smp_rmb() as the proper memory barrier instead of rte_rmb() >>>> --- >>>> v2: >>>> - Make the calling of rte_rmb() for all platforms >>>> --- > [...] >>> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> >> Applied to dpdk-next-net-intel. >> >> Thanks >> Qi >> > Why ignoring checkpatch? > It is saying: > " > Warning in drivers/net/ixgbe/ixgbe_rxtx.c: > Using rte_smp_[r/w]mb > " I'm sorry. Should we never use rte_smp_[r/w]mb in the driver's code? > Ruifeng proposed "rte_atomic_thread_fence(__ATOMIC_ACQUIRE)" > in a comment on the v2. Thanks, I see. I think I also can use rte_atomic_thread_fence() to solve this problem. I will send the V4 patch. > > I will drop this patch from the pull of next-net-intel branch. > ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v3] net/ixgbe: add proper memory barriers for some Rx functions 2023-06-12 11:58 ` zhoumin @ 2023-06-12 12:44 ` Thomas Monjalon 2023-06-13 1:42 ` zhoumin 2023-06-13 9:25 ` Ruifeng Wang 0 siblings, 2 replies; 30+ messages in thread From: Thomas Monjalon @ 2023-06-12 12:44 UTC (permalink / raw) To: Ruifeng Wang, Zhang, Qi Z, zhoumin Cc: dev, mb, konstantin.v.ananyev, Yang, Qiming, Wu, Wenjun1, drc, roretzla, stable, maobibo, nd, david.marchand, honnappa.nagarahalli, Tyler Retzlaff, konstantin.ananyev 12/06/2023 13:58, zhoumin: > On Mon, June 12, 2023 at 6:26PM, Thomas Monjalon wrote: > > 15/05/2023 04:10, Zhang, Qi Z: > >> From: Ruifeng Wang <Ruifeng.Wang@arm.com> > >>> From: Min Zhou <zhoumin@loongson.cn> > >>>> --- > >>>> v3: > >>>> - Use rte_smp_rmb() as the proper memory barrier instead of rte_rmb() > >>>> --- > >>>> v2: > >>>> - Make the calling of rte_rmb() for all platforms > >>>> --- > > [...] > >>> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> > >> Applied to dpdk-next-net-intel. > >> > >> Thanks > >> Qi > >> > > Why ignoring checkpatch? > > It is saying: > > " > > Warning in drivers/net/ixgbe/ixgbe_rxtx.c: > > Using rte_smp_[r/w]mb > > " > > > I'm sorry. Should we never use rte_smp_[r/w]mb in the driver's code? No we should avoid. It has been decided to slowly replace such barriers. By the way, I think it is not enough documented. You can find an explanation in doc/guides/rel_notes/deprecation.rst I think we should also add some notes to lib/eal/include/generic/rte_atomic.h Tyler, Honnappa, Ruifeng, Konstantin, what do you think? ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v3] net/ixgbe: add proper memory barriers for some Rx functions 2023-06-12 12:44 ` Thomas Monjalon @ 2023-06-13 1:42 ` zhoumin 2023-06-13 3:30 ` Jiawen Wu 2023-06-14 10:58 ` Konstantin Ananyev 1 sibling, 2 replies; 30+ messages in thread From: zhoumin @ 2023-06-13 1:42 UTC (permalink / raw) To: Thomas Monjalon, Ruifeng Wang, Zhang, Qi Z Cc: dev, mb, konstantin.v.ananyev, Yang, Qiming, Wu, Wenjun1, drc, roretzla, stable, maobibo, nd, david.marchand, honnappa.nagarahalli, Tyler Retzlaff, konstantin.ananyev Hi Thomas, On Mon, June 12, 2023 at 8:44PM, Thomas Monjalon wrote: > 12/06/2023 13:58, zhoumin: >> On Mon, June 12, 2023 at 6:26PM, Thomas Monjalon wrote: >>> 15/05/2023 04:10, Zhang, Qi Z: >>>> From: Ruifeng Wang <Ruifeng.Wang@arm.com> >>>>> From: Min Zhou <zhoumin@loongson.cn> >>>>>> --- >>>>>> v3: >>>>>> - Use rte_smp_rmb() as the proper memory barrier instead of rte_rmb() >>>>>> --- >>>>>> v2: >>>>>> - Make the calling of rte_rmb() for all platforms >>>>>> --- >>> [...] >>>>> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> >>>> Applied to dpdk-next-net-intel. >>>> >>>> Thanks >>>> Qi >>>> >>> Why ignoring checkpatch? >>> It is saying: >>> " >>> Warning in drivers/net/ixgbe/ixgbe_rxtx.c: >>> Using rte_smp_[r/w]mb >>> " >> >> I'm sorry. Should we never use rte_smp_[r/w]mb in the driver's code? > No we should avoid. > It has been decided to slowly replace such barriers. > By the way, I think it is not enough documented. > You can find an explanation in doc/guides/rel_notes/deprecation.rst Thank you for providing the reference document. I have read this file. The explanation is clear and I get it. > I think we should also add some notes to > lib/eal/include/generic/rte_atomic.h Yes, I do think so. Notes added at the definitions of rte_smp_[r/w]mb would be better. > Tyler, Honnappa, Ruifeng, Konstantin, what do you think? > ^ permalink raw reply [flat|nested] 30+ messages in thread
* RE: [PATCH v3] net/ixgbe: add proper memory barriers for some Rx functions 2023-06-13 1:42 ` zhoumin @ 2023-06-13 3:30 ` Jiawen Wu 2023-06-13 10:12 ` zhoumin 2023-06-14 10:58 ` Konstantin Ananyev 1 sibling, 1 reply; 30+ messages in thread From: Jiawen Wu @ 2023-06-13 3:30 UTC (permalink / raw) To: 'zhoumin', 'Thomas Monjalon', 'Ruifeng Wang', 'Zhang, Qi Z' Cc: dev, mb, konstantin.v.ananyev, 'Yang, Qiming', 'Wu, Wenjun1', drc, roretzla, stable, maobibo, 'nd', david.marchand, honnappa.nagarahalli, 'Tyler Retzlaff', konstantin.ananyev On Tuesday, June 13, 2023 9:43 AM, zhoumin wrote: > On Mon, June 12, 2023 at 8:44PM, Thomas Monjalon wrote: > > 12/06/2023 13:58, zhoumin: > >> On Mon, June 12, 2023 at 6:26PM, Thomas Monjalon wrote: > >>> 15/05/2023 04:10, Zhang, Qi Z: > >>>> From: Ruifeng Wang <Ruifeng.Wang@arm.com> > >>>>> From: Min Zhou <zhoumin@loongson.cn> > >>>>>> --- > >>>>>> v3: > >>>>>> - Use rte_smp_rmb() as the proper memory barrier instead of rte_rmb() > >>>>>> --- > >>>>>> v2: > >>>>>> - Make the calling of rte_rmb() for all platforms > >>>>>> --- Hi zhoumin, I recently learned that Loongson is doing tests with Wangxun NICs on 3C5000, and also found this problem on Wangxun NICs. I'm wondering if it would also be fixed on txgbe/ngbe. Thanks. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v3] net/ixgbe: add proper memory barriers for some Rx functions 2023-06-13 3:30 ` Jiawen Wu @ 2023-06-13 10:12 ` zhoumin 0 siblings, 0 replies; 30+ messages in thread From: zhoumin @ 2023-06-13 10:12 UTC (permalink / raw) To: Jiawen Wu, 'Thomas Monjalon', 'Ruifeng Wang', 'Zhang, Qi Z' Cc: dev, mb, konstantin.v.ananyev, 'Yang, Qiming', 'Wu, Wenjun1', drc, roretzla, stable, maobibo, 'nd', david.marchand, honnappa.nagarahalli, 'Tyler Retzlaff', konstantin.ananyev Hi Jiawen, On Tues, June 13, 2023 at 11:30PM, Jiawen Wu wrote: > On Tuesday, June 13, 2023 9:43 AM, zhoumin wrote: >> On Mon, June 12, 2023 at 8:44PM, Thomas Monjalon wrote: >>> 12/06/2023 13:58, zhoumin: >>>> On Mon, June 12, 2023 at 6:26PM, Thomas Monjalon wrote: >>>>> 15/05/2023 04:10, Zhang, Qi Z: >>>>>> From: Ruifeng Wang <Ruifeng.Wang@arm.com> >>>>>>> From: Min Zhou <zhoumin@loongson.cn> >>>>>>>> --- >>>>>>>> v3: >>>>>>>> - Use rte_smp_rmb() as the proper memory barrier instead of rte_rmb() >>>>>>>> --- >>>>>>>> v2: >>>>>>>> - Make the calling of rte_rmb() for all platforms >>>>>>>> --- > Hi zhoumin, > > I recently learned that Loongson is doing tests with Wangxun NICs on 3C5000, and also > found this problem on Wangxun NICs. I'm wondering if it would also be fixed on txgbe/ngbe. I'm sorry. I have not tested the Wangxun NICs on LRO receiving mode. The previous test results for Wangxun NICs were normal. I will do additional tests for Wangxun NICs to verify this problem. > Thanks. ^ permalink raw reply [flat|nested] 30+ messages in thread
* RE: [PATCH v3] net/ixgbe: add proper memory barriers for some Rx functions 2023-06-13 1:42 ` zhoumin 2023-06-13 3:30 ` Jiawen Wu @ 2023-06-14 10:58 ` Konstantin Ananyev 1 sibling, 0 replies; 30+ messages in thread From: Konstantin Ananyev @ 2023-06-14 10:58 UTC (permalink / raw) To: zhoumin, Thomas Monjalon, Ruifeng Wang, Zhang, Qi Z Cc: dev, mb, konstantin.v.ananyev, Yang, Qiming, Wu, Wenjun1, drc, roretzla, stable, maobibo, nd, David Marchand, honnappa.nagarahalli, Tyler Retzlaff Hi Thomas, On Mon, June 12, 2023 at 8:44PM, Thomas Monjalon wrote: > 12/06/2023 13:58, zhoumin: >> On Mon, June 12, 2023 at 6:26PM, Thomas Monjalon wrote: >>> 15/05/2023 04:10, Zhang, Qi Z: >>>> From: Ruifeng Wang <Ruifeng.Wang@arm.com> >>>>> From: Min Zhou <zhoumin@loongson.cn> >>>>>> --- >>>>>> v3: >>>>>> - Use rte_smp_rmb() as the proper memory barrier instead of rte_rmb() >>>>>> --- >>>>>> v2: >>>>>> - Make the calling of rte_rmb() for all platforms >>>>>> --- >>> [...] 
>>>>> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> >>>> Applied to dpdk-next-net-intel. >>>> >>>> Thanks >>>> Qi >>>> >>> Why ignoring checkpatch? >>> It is saying: >>> " >>> Warning in drivers/net/ixgbe/ixgbe_rxtx.c: >>> Using rte_smp_[r/w]mb >>> " >> >> I'm sorry. Should we never use rte_smp_[r/w]mb in the driver's code? > No we should avoid. > It has been decided to slowly replace such barriers. > By the way, I think it is not enough documented. > You can find an explanation in doc/guides/rel_notes/deprecation.rst Thank you for providing the reference document. I have read this file. The explanation is clear and I get it. > I think we should also add some notes to > lib/eal/include/generic/rte_atomic.h Yes, I do think so. Notes added at the definitions of rte_smp_[r/w]mb would be better. > Tyler, Honnappa, Ruifeng, Konstantin, what do you think? > Yes, extra notes sound like a reasonable thing to me. ^ permalink raw reply [flat|nested] 30+ messages in thread
* RE: [PATCH v3] net/ixgbe: add proper memory barriers for some Rx functions 2023-06-12 12:44 ` Thomas Monjalon 2023-06-13 1:42 ` zhoumin @ 2023-06-13 9:25 ` Ruifeng Wang 2023-06-20 15:52 ` Thomas Monjalon 1 sibling, 1 reply; 30+ messages in thread From: Ruifeng Wang @ 2023-06-13 9:25 UTC (permalink / raw) To: thomas, Zhang, Qi Z, zhoumin Cc: dev, mb, konstantin.v.ananyev, Yang, Qiming, Wu, Wenjun1, drc, roretzla, stable, maobibo, nd, david.marchand, Honnappa Nagarahalli, Tyler Retzlaff, konstantin.ananyev, nd > -----Original Message----- > From: Thomas Monjalon <thomas@monjalon.net> > Sent: Monday, June 12, 2023 8:45 PM > To: Ruifeng Wang <Ruifeng.Wang@arm.com>; Zhang, Qi Z <qi.z.zhang@intel.com>; zhoumin > <zhoumin@loongson.cn> > Cc: dev@dpdk.org; mb@smartsharesystems.com; konstantin.v.ananyev@yandex.ru; Yang, Qiming > <qiming.yang@intel.com>; Wu, Wenjun1 <wenjun1.wu@intel.com>; drc@linux.vnet.ibm.com; > roretzla@linux.microsoft.com; stable@dpdk.org; maobibo@loongson.cn; nd <nd@arm.com>; > david.marchand@redhat.com; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; Tyler > Retzlaff <roretzla@microsoft.com>; konstantin.ananyev@huawei.com > Subject: Re: [PATCH v3] net/ixgbe: add proper memory barriers for some Rx functions > > 12/06/2023 13:58, zhoumin: > > On Mon, June 12, 2023 at 6:26PM, Thomas Monjalon wrote: > > > 15/05/2023 04:10, Zhang, Qi Z: > > >> From: Ruifeng Wang <Ruifeng.Wang@arm.com> > > >>> From: Min Zhou <zhoumin@loongson.cn> > > >>>> --- > > >>>> v3: > > >>>> - Use rte_smp_rmb() as the proper memory barrier instead of > > >>>> rte_rmb() > > >>>> --- > > >>>> v2: > > >>>> - Make the calling of rte_rmb() for all platforms > > >>>> --- > > > [...] > > >>> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> > > >> Applied to dpdk-next-net-intel. > > >> > > >> Thanks > > >> Qi > > >> > > > Why ignoring checkpatch? > > > It is saying: > > > " > > > Warning in drivers/net/ixgbe/ixgbe_rxtx.c: > > > Using rte_smp_[r/w]mb > > > " > > > > > > I'm sorry. 
Should we never use rte_smp_[r/w]mb in the driver's code? > > No we should avoid. > It has been decided to slowly replace such barriers. > By the way, I think it is not enough documented. > You can find an explanation in doc/guides/rel_notes/deprecation.rst > > I think we should also add some notes to lib/eal/include/generic/rte_atomic.h > Tyler, Honnappa, Ruifeng, Konstantin, what do you think? > Agree that we should add notes to rte_atomic.h. The notes were not there for the sake of avoiding warnings on existing occurrences. With Tyler's rte_atomic series merged, rte_atomicNN_xx can be marked as __rte_deprecated. rte_smp_*mb can be marked as __rte_deprecated after existing occurrences are converted. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v3] net/ixgbe: add proper memory barriers for some Rx functions 2023-06-13 9:25 ` Ruifeng Wang @ 2023-06-20 15:52 ` Thomas Monjalon 2023-06-21 6:50 ` Ruifeng Wang 0 siblings, 1 reply; 30+ messages in thread From: Thomas Monjalon @ 2023-06-20 15:52 UTC (permalink / raw) To: Zhang, Qi Z, zhoumin, Ruifeng Wang Cc: dev, mb, konstantin.v.ananyev, Yang, Qiming, Wu, Wenjun1, drc, roretzla, stable, maobibo, nd, david.marchand, Honnappa Nagarahalli, Tyler Retzlaff, konstantin.ananyev, nd 13/06/2023 11:25, Ruifeng Wang: > From: Thomas Monjalon <thomas@monjalon.net> > > 12/06/2023 13:58, zhoumin: > > > On Mon, June 12, 2023 at 6:26PM, Thomas Monjalon wrote: > > > > 15/05/2023 04:10, Zhang, Qi Z: > > > >> From: Ruifeng Wang <Ruifeng.Wang@arm.com> > > > >>> From: Min Zhou <zhoumin@loongson.cn> > > > >>>> --- > > > >>>> v3: > > > >>>> - Use rte_smp_rmb() as the proper memory barrier instead of > > > >>>> rte_rmb() > > > >>>> --- > > > >>>> v2: > > > >>>> - Make the calling of rte_rmb() for all platforms > > > >>>> --- > > > > [...] > > > >>> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> > > > >> Applied to dpdk-next-net-intel. > > > >> > > > >> Thanks > > > >> Qi > > > >> > > > > Why ignoring checkpatch? > > > > It is saying: > > > > " > > > > Warning in drivers/net/ixgbe/ixgbe_rxtx.c: > > > > Using rte_smp_[r/w]mb > > > > " > > > > > > > > > I'm sorry. Should we never use rte_smp_[r/w]mb in the driver's code? > > > > No we should avoid. > > It has been decided to slowly replace such barriers. > > By the way, I think it is not enough documented. > > You can find an explanation in doc/guides/rel_notes/deprecation.rst > > > > I think we should also add some notes to lib/eal/include/generic/rte_atomic.h > > Tyler, Honnappa, Ruifeng, Konstantin, what do you think? > > > > Agree that we should add notes to rte_atomic.h. > The notes were not there for the sake of avoiding warnings on existing occurrences. 
> With Tyler's rte_atomic series merged, rte_atomicNN_xx can be marked as __rte_deprecated. > rte_smp_*mb can be marked as __rte_deprecated after existing occurrences are converted. Would you like to add some function comments to explain why it is deprecated? ^ permalink raw reply [flat|nested] 30+ messages in thread
* RE: [PATCH v3] net/ixgbe: add proper memory barriers for some Rx functions 2023-06-20 15:52 ` Thomas Monjalon @ 2023-06-21 6:50 ` Ruifeng Wang 0 siblings, 0 replies; 30+ messages in thread From: Ruifeng Wang @ 2023-06-21 6:50 UTC (permalink / raw) To: thomas, Zhang, Qi Z, zhoumin Cc: dev, mb, konstantin.v.ananyev, Yang, Qiming, Wu, Wenjun1, drc, roretzla, stable, maobibo, nd, david.marchand, Honnappa Nagarahalli, Tyler Retzlaff, konstantin.ananyev, nd, nd > -----Original Message----- > From: Thomas Monjalon <thomas@monjalon.net> > Sent: Tuesday, June 20, 2023 11:53 PM > To: Zhang, Qi Z <qi.z.zhang@intel.com>; zhoumin <zhoumin@loongson.cn>; Ruifeng Wang > <Ruifeng.Wang@arm.com> > Cc: dev@dpdk.org; mb@smartsharesystems.com; konstantin.v.ananyev@yandex.ru; Yang, Qiming > <qiming.yang@intel.com>; Wu, Wenjun1 <wenjun1.wu@intel.com>; drc@linux.vnet.ibm.com; > roretzla@linux.microsoft.com; stable@dpdk.org; maobibo@loongson.cn; nd <nd@arm.com>; > david.marchand@redhat.com; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; Tyler > Retzlaff <roretzla@microsoft.com>; konstantin.ananyev@huawei.com; nd <nd@arm.com> > Subject: Re: [PATCH v3] net/ixgbe: add proper memory barriers for some Rx functions > > 13/06/2023 11:25, Ruifeng Wang: > > From: Thomas Monjalon <thomas@monjalon.net> > > > 12/06/2023 13:58, zhoumin: > > > > On Mon, June 12, 2023 at 6:26PM, Thomas Monjalon wrote: > > > > > 15/05/2023 04:10, Zhang, Qi Z: > > > > >> From: Ruifeng Wang <Ruifeng.Wang@arm.com> > > > > >>> From: Min Zhou <zhoumin@loongson.cn> > > > > >>>> --- > > > > >>>> v3: > > > > >>>> - Use rte_smp_rmb() as the proper memory barrier instead of > > > > >>>> rte_rmb() > > > > >>>> --- > > > > >>>> v2: > > > > >>>> - Make the calling of rte_rmb() for all platforms > > > > >>>> --- > > > > > [...] > > > > >>> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> > > > > >> Applied to dpdk-next-net-intel. > > > > >> > > > > >> Thanks > > > > >> Qi > > > > >> > > > > > Why ignoring checkpatch? 
> > > > > It is saying: > > > > > " > > > > > Warning in drivers/net/ixgbe/ixgbe_rxtx.c: > > > > > Using rte_smp_[r/w]mb > > > > > " > > > > > > > > > > > > I'm sorry. Should we never use rte_smp_[r/w]mb in the driver's code? > > > > > > No we should avoid. > > > It has been decided to slowly replace such barriers. > > > By the way, I think it is not enough documented. > > > You can find an explanation in doc/guides/rel_notes/deprecation.rst > > > > > > I think we should also add some notes to lib/eal/include/generic/rte_atomic.h > > > Tyler, Honnappa, Ruifeng, Konstantin, what do you think? > > > > > > > Agree that we should add notes to rte_atomic.h. > > The notes were not there for the sake of avoiding warnings on existing occurrences. > > With Tyler's rte_atomic series merged, rte_atomicNN_xx can be marked as __rte_deprecated. > > rte_smp_*mb can be marked as __rte_deprecated after existing occurrences are converted. > > Would you like to add some function comments to explain why it is deprecated? > Sure. Added notes in patch: http://patches.dpdk.org/project/dpdk/patch/20230621064420.163931-1-ruifeng.wang@arm.com/ ^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH v4] net/ixgbe: add proper memory barriers for some Rx functions 2023-05-06 10:23 ` [PATCH v3] " Min Zhou 2023-05-08 6:03 ` Ruifeng Wang @ 2023-06-13 9:44 ` Min Zhou 2023-06-13 10:20 ` Ruifeng Wang 1 sibling, 1 reply; 30+ messages in thread From: Min Zhou @ 2023-06-13 9:44 UTC (permalink / raw) To: thomas, qi.z.zhang, mb, konstantin.v.ananyev, ruifeng.wang, drc, roretzla, qiming.yang, wenjun1.wu, zhoumin Cc: dev, stable, maobibo, jiawenwu Segmentation fault has been observed while running the ixgbe_recv_pkts_lro() function to receive packets on the Loongson 3C5000 processor, which has 64 cores and 4 NUMA nodes. From the ixgbe_recv_pkts_lro() function, we found that as long as the first packet has the EOP bit set and its length is less than or equal to rxq->crc_len, the segmentation fault will definitely happen, even on other platforms. For example, if we force the first packet (with the EOP bit set) to have a zero length, the segmentation fault happens on X86 as well. When processing the first packet, first_seg->next will be NULL; if at the same time this packet has the EOP bit set and its length is less than or equal to rxq->crc_len, the following loop will be executed: for (lp = first_seg; lp->next != rxm; lp = lp->next) ; We know that first_seg->next will be NULL under this condition, so the expression lp->next->next will cause the segmentation fault. Normally, the length of the first packet with the EOP bit set will be greater than rxq->crc_len. However, out-of-order execution by the CPU may make the read ordering of the status and the rest of the descriptor fields in this function incorrect. The related code is as follows: rxdp = &rx_ring[rx_id]; #1 staterr = rte_le_to_cpu_32(rxdp->wb.upper.status_error); if (!(staterr & IXGBE_RXDADV_STAT_DD)) break; #2 rxd = *rxdp; Statement #2 may be executed before statement #1. This action is likely to make the ready packet zero length.
If the packet is the first packet and has the EOP bit set, the above segmentation fault will happen. So, we should add a proper memory barrier to ensure the read ordering is correct. We also did the same thing in the ixgbe_recv_pkts() function to make the rxd data valid, even though we did not observe a segmentation fault in that function. Fixes: 8eecb3295ae ("ixgbe: add LRO support") Cc: stable@dpdk.org Signed-off-by: Min Zhou <zhoumin@loongson.cn> --- v4: - Replace rte_smp_rmb() with rte_atomic_thread_fence() as the proper memory barrier --- v3: - Use rte_smp_rmb() as the proper memory barrier instead of rte_rmb() --- v2: - Make the calling of rte_rmb() for all platforms --- drivers/net/ixgbe/ixgbe_rxtx.c | 47 +++++++++++++++------------------- 1 file changed, 21 insertions(+), 26 deletions(-) diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c index 6cbb992823..61f17cd90b 100644 --- a/drivers/net/ixgbe/ixgbe_rxtx.c +++ b/drivers/net/ixgbe/ixgbe_rxtx.c @@ -1817,11 +1817,22 @@ ixgbe_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, * of accesses cannot be reordered by the compiler. If they were * not volatile, they could be reordered which could lead to * using invalid descriptor fields when read from rxd. + * + * Meanwhile, to prevent the CPU from executing out of order, we + * need to use a proper memory barrier to ensure the memory + * ordering below. */ rxdp = &rx_ring[rx_id]; staterr = rxdp->wb.upper.status_error; if (!(staterr & rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD))) break; + + /* + * Use acquire fence to ensure that status_error which includes + * DD bit is loaded before loading of other descriptor words.
+ */ + rte_atomic_thread_fence(__ATOMIC_ACQUIRE); + rxd = *rxdp; /* @@ -2088,32 +2099,10 @@ ixgbe_recv_pkts_lro(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts, next_desc: /* - * The code in this whole file uses the volatile pointer to - * ensure the read ordering of the status and the rest of the - * descriptor fields (on the compiler level only!!!). This is so - * UGLY - why not to just use the compiler barrier instead? DPDK - * even has the rte_compiler_barrier() for that. - * - * But most importantly this is just wrong because this doesn't - * ensure memory ordering in a general case at all. For - * instance, DPDK is supposed to work on Power CPUs where - * compiler barrier may just not be enough! - * - * I tried to write only this function properly to have a - * starting point (as a part of an LRO/RSC series) but the - * compiler cursed at me when I tried to cast away the - * "volatile" from rx_ring (yes, it's volatile too!!!). So, I'm - * keeping it the way it is for now. - * - * The code in this file is broken in so many other places and - * will just not work on a big endian CPU anyway therefore the - * lines below will have to be revisited together with the rest - * of the ixgbe PMD. - * - * TODO: - * - Get rid of "volatile" and let the compiler do its job. - * - Use the proper memory barrier (rte_rmb()) to ensure the - * memory ordering below. + * "Volatile" only prevents caching of the variable marked + * volatile. Most important, "volatile" cannot prevent the CPU + * from executing out of order. So, it is necessary to use a + * proper memory barrier to ensure the memory ordering below. 
*/ rxdp = &rx_ring[rx_id]; staterr = rte_le_to_cpu_32(rxdp->wb.upper.status_error); @@ -2121,6 +2110,12 @@ ixgbe_recv_pkts_lro(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts, if (!(staterr & IXGBE_RXDADV_STAT_DD)) break; + /* + * Use acquire fence to ensure that status_error which includes + * DD bit is loaded before loading of other descriptor words. + */ + rte_atomic_thread_fence(__ATOMIC_ACQUIRE); + rxd = *rxdp; PMD_RX_LOG(DEBUG, "port_id=%u queue_id=%u rx_id=%u " -- 2.31.1 ^ permalink raw reply [flat|nested] 30+ messages in thread
* RE: [PATCH v4] net/ixgbe: add proper memory barriers for some Rx functions 2023-06-13 9:44 ` [PATCH v4] " Min Zhou @ 2023-06-13 10:20 ` Ruifeng Wang 2023-06-13 12:11 ` Zhang, Qi Z 0 siblings, 1 reply; 30+ messages in thread From: Ruifeng Wang @ 2023-06-13 10:20 UTC (permalink / raw) To: Min Zhou, thomas, qi.z.zhang, mb, konstantin.v.ananyev, drc, roretzla, qiming.yang, wenjun1.wu Cc: dev, stable, maobibo, jiawenwu, nd > -----Original Message----- > From: Min Zhou <zhoumin@loongson.cn> > Sent: Tuesday, June 13, 2023 5:44 PM > To: thomas@monjalon.net; qi.z.zhang@intel.com; mb@smartsharesystems.com; > konstantin.v.ananyev@yandex.ru; Ruifeng Wang <Ruifeng.Wang@arm.com>; > drc@linux.vnet.ibm.com; roretzla@linux.microsoft.com; qiming.yang@intel.com; > wenjun1.wu@intel.com; zhoumin@loongson.cn > Cc: dev@dpdk.org; stable@dpdk.org; maobibo@loongson.cn; jiawenwu@trustnetic.com > Subject: [PATCH v4] net/ixgbe: add proper memory barriers for some Rx functions > > Segmentation fault has been observed while running the > ixgbe_recv_pkts_lro() function to receive packets on the Loongson 3C5000 processor which > has 64 cores and 4 NUMA nodes. > > From the ixgbe_recv_pkts_lro() function, we found that as long as the first packet has the > EOP bit set, and the length of this packet is less than or equal to rxq->crc_len, the > segmentation fault will definitely happen even though on the other platforms. For example, > if we made the first packet which had the EOP bit set had a zero length by force, the > segmentation fault would happen on X86. > > Because when processd the first packet the first_seg->next will be NULL, if at the same > time this packet has the EOP bit set and its length is less than or equal to rxq->crc_len, > the following loop will be executed: > > for (lp = first_seg; lp->next != rxm; lp = lp->next) > ; > > We know that the first_seg->next will be NULL under this condition. So the expression of > lp->next->next will cause the segmentation fault. 
> > Normally, the length of the first packet with EOP bit set will be greater than rxq- > >crc_len. However, the out-of-order execution of CPU may make the read ordering of the > status and the rest of the descriptor fields in this function not be correct. The related > codes are as following: > > rxdp = &rx_ring[rx_id]; > #1 staterr = rte_le_to_cpu_32(rxdp->wb.upper.status_error); > > if (!(staterr & IXGBE_RXDADV_STAT_DD)) > break; > > #2 rxd = *rxdp; > > The sentence #2 may be executed before sentence #1. This action is likely to make the > ready packet zero length. If the packet is the first packet and has the EOP bit set, the > above segmentation fault will happen. > > So, we should add a proper memory barrier to ensure the read ordering be correct. We also > did the same thing in the ixgbe_recv_pkts() function to make the rxd data be valid even > though we did not find segmentation fault in this function. > > Fixes: 8eecb3295ae ("ixgbe: add LRO support") > Cc: stable@dpdk.org > > Signed-off-by: Min Zhou <zhoumin@loongson.cn> > --- > v4: > - Replace rte_smp_rmb() with rte_atomic_thread_fence() as the proper memory > barrier > --- > v3: > - Use rte_smp_rmb() as the proper memory barrier instead of rte_rmb() > --- > v2: > - Make the calling of rte_rmb() for all platforms > --- > > drivers/net/ixgbe/ixgbe_rxtx.c | 47 +++++++++++++++------------------- > 1 file changed, 21 insertions(+), 26 deletions(-) > > diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c index > 6cbb992823..61f17cd90b 100644 > --- a/drivers/net/ixgbe/ixgbe_rxtx.c > +++ b/drivers/net/ixgbe/ixgbe_rxtx.c > @@ -1817,11 +1817,22 @@ ixgbe_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, > * of accesses cannot be reordered by the compiler. If they were > * not volatile, they could be reordered which could lead to > * using invalid descriptor fields when read from rxd. 
> + * > + * Meanwhile, to prevent the CPU from executing out of order, we > + * need to use a proper memory barrier to ensure the memory > + * ordering below. > */ > rxdp = &rx_ring[rx_id]; > staterr = rxdp->wb.upper.status_error; > if (!(staterr & rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD))) > break; > + > + /* > + * Use acquire fence to ensure that status_error which includes > + * DD bit is loaded before loading of other descriptor words. > + */ > + rte_atomic_thread_fence(__ATOMIC_ACQUIRE); > + > rxd = *rxdp; > > /* > @@ -2088,32 +2099,10 @@ ixgbe_recv_pkts_lro(void *rx_queue, struct rte_mbuf **rx_pkts, > uint16_t nb_pkts, > > next_desc: > /* > - * The code in this whole file uses the volatile pointer to > - * ensure the read ordering of the status and the rest of the > - * descriptor fields (on the compiler level only!!!). This is so > - * UGLY - why not to just use the compiler barrier instead? DPDK > - * even has the rte_compiler_barrier() for that. > - * > - * But most importantly this is just wrong because this doesn't > - * ensure memory ordering in a general case at all. For > - * instance, DPDK is supposed to work on Power CPUs where > - * compiler barrier may just not be enough! > - * > - * I tried to write only this function properly to have a > - * starting point (as a part of an LRO/RSC series) but the > - * compiler cursed at me when I tried to cast away the > - * "volatile" from rx_ring (yes, it's volatile too!!!). So, I'm > - * keeping it the way it is for now. > - * > - * The code in this file is broken in so many other places and > - * will just not work on a big endian CPU anyway therefore the > - * lines below will have to be revisited together with the rest > - * of the ixgbe PMD. > - * > - * TODO: > - * - Get rid of "volatile" and let the compiler do its job. > - * - Use the proper memory barrier (rte_rmb()) to ensure the > - * memory ordering below. > + * "Volatile" only prevents caching of the variable marked > + * volatile. 
Most important, "volatile" cannot prevent the CPU > + * from executing out of order. So, it is necessary to use a > + * proper memory barrier to ensure the memory ordering below. > */ > rxdp = &rx_ring[rx_id]; > staterr = rte_le_to_cpu_32(rxdp->wb.upper.status_error); > @@ -2121,6 +2110,12 @@ ixgbe_recv_pkts_lro(void *rx_queue, struct rte_mbuf **rx_pkts, > uint16_t nb_pkts, > if (!(staterr & IXGBE_RXDADV_STAT_DD)) > break; > > + /* > + * Use acquire fence to ensure that status_error which includes > + * DD bit is loaded before loading of other descriptor words. > + */ > + rte_atomic_thread_fence(__ATOMIC_ACQUIRE); > + > rxd = *rxdp; > > PMD_RX_LOG(DEBUG, "port_id=%u queue_id=%u rx_id=%u " > -- > 2.31.1 Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> ^ permalink raw reply [flat|nested] 30+ messages in thread
* RE: [PATCH v4] net/ixgbe: add proper memory barriers for some Rx functions 2023-06-13 10:20 ` Ruifeng Wang @ 2023-06-13 12:11 ` Zhang, Qi Z 0 siblings, 0 replies; 30+ messages in thread From: Zhang, Qi Z @ 2023-06-13 12:11 UTC (permalink / raw) To: Ruifeng Wang, Min Zhou, thomas, mb, konstantin.v.ananyev, drc, roretzla, Yang, Qiming, Wu, Wenjun1 Cc: dev, stable, maobibo, jiawenwu, nd > -----Original Message----- > From: Ruifeng Wang <Ruifeng.Wang@arm.com> > Sent: Tuesday, June 13, 2023 6:20 PM > To: Min Zhou <zhoumin@loongson.cn>; thomas@monjalon.net; Zhang, Qi Z > <qi.z.zhang@intel.com>; mb@smartsharesystems.com; > konstantin.v.ananyev@yandex.ru; drc@linux.vnet.ibm.com; > roretzla@linux.microsoft.com; Yang, Qiming <qiming.yang@intel.com>; Wu, > Wenjun1 <wenjun1.wu@intel.com> > Cc: dev@dpdk.org; stable@dpdk.org; maobibo@loongson.cn; > jiawenwu@trustnetic.com; nd <nd@arm.com> > Subject: RE: [PATCH v4] net/ixgbe: add proper memory barriers for some Rx > functions > > > -----Original Message----- > > From: Min Zhou <zhoumin@loongson.cn> > > Sent: Tuesday, June 13, 2023 5:44 PM > > To: thomas@monjalon.net; qi.z.zhang@intel.com; > > mb@smartsharesystems.com; konstantin.v.ananyev@yandex.ru; Ruifeng > Wang > > <Ruifeng.Wang@arm.com>; drc@linux.vnet.ibm.com; > > roretzla@linux.microsoft.com; qiming.yang@intel.com; > > wenjun1.wu@intel.com; zhoumin@loongson.cn > > Cc: dev@dpdk.org; stable@dpdk.org; maobibo@loongson.cn; > > jiawenwu@trustnetic.com > > Subject: [PATCH v4] net/ixgbe: add proper memory barriers for some Rx > > functions > > > > Segmentation fault has been observed while running the > > ixgbe_recv_pkts_lro() function to receive packets on the Loongson > > 3C5000 processor which has 64 cores and 4 NUMA nodes. 
> > > > From the ixgbe_recv_pkts_lro() function, we found that as long as the > > first packet has the EOP bit set, and the length of this packet is > > less than or equal to rxq->crc_len, the segmentation fault will > > definitely happen even though on the other platforms. For example, if > > we made the first packet which had the EOP bit set had a zero length by > force, the segmentation fault would happen on X86. > > > > Because when processd the first packet the first_seg->next will be > > NULL, if at the same time this packet has the EOP bit set and its > > length is less than or equal to rxq->crc_len, the following loop will be > executed: > > > > for (lp = first_seg; lp->next != rxm; lp = lp->next) > > ; > > > > We know that the first_seg->next will be NULL under this condition. So > > the expression of > > lp->next->next will cause the segmentation fault. > > > > Normally, the length of the first packet with EOP bit set will be > > greater than rxq- > > >crc_len. However, the out-of-order execution of CPU may make the read > > >ordering of the > > status and the rest of the descriptor fields in this function not be > > correct. The related codes are as following: > > > > rxdp = &rx_ring[rx_id]; > > #1 staterr = rte_le_to_cpu_32(rxdp->wb.upper.status_error); > > > > if (!(staterr & IXGBE_RXDADV_STAT_DD)) > > break; > > > > #2 rxd = *rxdp; > > > > The sentence #2 may be executed before sentence #1. This action is > > likely to make the ready packet zero length. If the packet is the > > first packet and has the EOP bit set, the above segmentation fault will > happen. > > > > So, we should add a proper memory barrier to ensure the read ordering > > be correct. We also did the same thing in the ixgbe_recv_pkts() > > function to make the rxd data be valid even though we did not find > segmentation fault in this function. 
> > > > Fixes: 8eecb3295ae ("ixgbe: add LRO support") > > Cc: stable@dpdk.org > > > > Signed-off-by: Min Zhou <zhoumin@loongson.cn> > > --- > > v4: > > - Replace rte_smp_rmb() with rte_atomic_thread_fence() as the proper > memory > > barrier > > --- > > v3: > > - Use rte_smp_rmb() as the proper memory barrier instead of rte_rmb() > > --- > > v2: > > - Make the calling of rte_rmb() for all platforms > > --- > > > > drivers/net/ixgbe/ixgbe_rxtx.c | 47 > > +++++++++++++++------------------- > > 1 file changed, 21 insertions(+), 26 deletions(-) > > > > diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c > > b/drivers/net/ixgbe/ixgbe_rxtx.c index 6cbb992823..61f17cd90b 100644 > > --- a/drivers/net/ixgbe/ixgbe_rxtx.c > > +++ b/drivers/net/ixgbe/ixgbe_rxtx.c > > @@ -1817,11 +1817,22 @@ ixgbe_recv_pkts(void *rx_queue, struct > rte_mbuf **rx_pkts, > > * of accesses cannot be reordered by the compiler. If they > were > > * not volatile, they could be reordered which could lead to > > * using invalid descriptor fields when read from rxd. > > + * > > + * Meanwhile, to prevent the CPU from executing out of > order, we > > + * need to use a proper memory barrier to ensure the > memory > > + * ordering below. > > */ > > rxdp = &rx_ring[rx_id]; > > staterr = rxdp->wb.upper.status_error; > > if (!(staterr & rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD))) > > break; > > + > > + /* > > + * Use acquire fence to ensure that status_error which > includes > > + * DD bit is loaded before loading of other descriptor words. > > + */ > > + rte_atomic_thread_fence(__ATOMIC_ACQUIRE); > > + > > rxd = *rxdp; > > > > /* > > @@ -2088,32 +2099,10 @@ ixgbe_recv_pkts_lro(void *rx_queue, struct > > rte_mbuf **rx_pkts, uint16_t nb_pkts, > > > > next_desc: > > /* > > - * The code in this whole file uses the volatile pointer to > > - * ensure the read ordering of the status and the rest of the > > - * descriptor fields (on the compiler level only!!!). 
This is so > > - * UGLY - why not to just use the compiler barrier instead? > DPDK > > - * even has the rte_compiler_barrier() for that. > > - * > > - * But most importantly this is just wrong because this > doesn't > > - * ensure memory ordering in a general case at all. For > > - * instance, DPDK is supposed to work on Power CPUs where > > - * compiler barrier may just not be enough! > > - * > > - * I tried to write only this function properly to have a > > - * starting point (as a part of an LRO/RSC series) but the > > - * compiler cursed at me when I tried to cast away the > > - * "volatile" from rx_ring (yes, it's volatile too!!!). So, I'm > > - * keeping it the way it is for now. > > - * > > - * The code in this file is broken in so many other places and > > - * will just not work on a big endian CPU anyway therefore > the > > - * lines below will have to be revisited together with the rest > > - * of the ixgbe PMD. > > - * > > - * TODO: > > - * - Get rid of "volatile" and let the compiler do its job. > > - * - Use the proper memory barrier (rte_rmb()) to ensure > the > > - * memory ordering below. > > + * "Volatile" only prevents caching of the variable marked > > + * volatile. Most important, "volatile" cannot prevent the > CPU > > + * from executing out of order. So, it is necessary to use a > > + * proper memory barrier to ensure the memory ordering > below. > > */ > > rxdp = &rx_ring[rx_id]; > > staterr = rte_le_to_cpu_32(rxdp->wb.upper.status_error); > > @@ -2121,6 +2110,12 @@ ixgbe_recv_pkts_lro(void *rx_queue, struct > > rte_mbuf **rx_pkts, uint16_t nb_pkts, > > if (!(staterr & IXGBE_RXDADV_STAT_DD)) > > break; > > > > + /* > > + * Use acquire fence to ensure that status_error which > includes > > + * DD bit is loaded before loading of other descriptor words. 
> > + */ > > + rte_atomic_thread_fence(__ATOMIC_ACQUIRE); > > + > > rxd = *rxdp; > > > > PMD_RX_LOG(DEBUG, "port_id=%u queue_id=%u rx_id=%u " > > -- > > 2.31.1 > > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Applied to dpdk-next-net-intel. Thanks Qi ^ permalink raw reply [flat|nested] 30+ messages in thread
end of thread, other threads:[~2023-06-21 6:51 UTC | newest] Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2023-04-24 9:05 [PATCH v2] net/ixgbe: add proper memory barriers for some Rx functions Min Zhou 2023-04-28 3:43 ` Zhang, Qi Z 2023-04-28 6:27 ` Morten Brørup 2023-05-04 12:58 ` zhoumin 2023-05-04 12:42 ` zhoumin 2023-05-01 13:29 ` Konstantin Ananyev 2023-05-04 6:13 ` Ruifeng Wang 2023-05-05 1:45 ` zhoumin 2023-05-04 13:16 ` zhoumin 2023-05-04 13:21 ` Morten Brørup 2023-05-04 13:33 ` Zhang, Qi Z 2023-05-05 2:42 ` zhoumin 2023-05-06 1:30 ` Zhang, Qi Z 2023-05-05 1:54 ` zhoumin 2023-05-06 10:23 ` [PATCH v3] " Min Zhou 2023-05-08 6:03 ` Ruifeng Wang 2023-05-15 2:10 ` Zhang, Qi Z 2023-06-12 10:26 ` Thomas Monjalon 2023-06-12 11:58 ` zhoumin 2023-06-12 12:44 ` Thomas Monjalon 2023-06-13 1:42 ` zhoumin 2023-06-13 3:30 ` Jiawen Wu 2023-06-13 10:12 ` zhoumin 2023-06-14 10:58 ` Konstantin Ananyev 2023-06-13 9:25 ` Ruifeng Wang 2023-06-20 15:52 ` Thomas Monjalon 2023-06-21 6:50 ` Ruifeng Wang 2023-06-13 9:44 ` [PATCH v4] " Min Zhou 2023-06-13 10:20 ` Ruifeng Wang 2023-06-13 12:11 ` Zhang, Qi Z