* [PATCH] net: increase the maximum of RX/TX descriptors
@ 2024-10-29 12:48 Lukas Sismis
2024-10-29 14:37 ` Morten Brørup
` (2 more replies)
0 siblings, 3 replies; 23+ messages in thread
From: Lukas Sismis @ 2024-10-29 12:48 UTC (permalink / raw)
To: anatoly.burakov, ian.stokes; +Cc: dev, Lukas Sismis
Intel PMDs are capped by default at 4096 RX/TX descriptors.
This can be limiting for applications that require larger buffering
capacity, because the cap prevents them from configuring
more descriptors. By buffering more packets in RX/TX
descriptors, applications can better handle processing
peaks.
Signed-off-by: Lukas Sismis <sismis@cesnet.cz>
---
doc/guides/nics/ixgbe.rst | 2 +-
drivers/net/cpfl/cpfl_rxtx.h | 2 +-
drivers/net/e1000/e1000_ethdev.h | 2 +-
drivers/net/iavf/iavf_rxtx.h | 2 +-
drivers/net/ice/ice_rxtx.h | 2 +-
drivers/net/idpf/idpf_rxtx.h | 2 +-
drivers/net/ixgbe/ixgbe_ethdev.c | 2 +-
drivers/net/ixgbe/ixgbe_rxtx.h | 2 +-
8 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/doc/guides/nics/ixgbe.rst b/doc/guides/nics/ixgbe.rst
index 14573b542e..291b33d699 100644
--- a/doc/guides/nics/ixgbe.rst
+++ b/doc/guides/nics/ixgbe.rst
@@ -76,7 +76,7 @@ Scattered packets are not supported in this mode.
If an incoming packet is greater than the maximum acceptable length of one "mbuf" data size (by default, the size is 2 KB),
vPMD for RX would be disabled.
-By default, IXGBE_MAX_RING_DESC is set to 4096 and RTE_PMD_IXGBE_RX_MAX_BURST is set to 32.
+By default, IXGBE_MAX_RING_DESC is set to 32768 and RTE_PMD_IXGBE_RX_MAX_BURST is set to 32.
Windows Prerequisites and Pre-conditions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
diff --git a/drivers/net/cpfl/cpfl_rxtx.h b/drivers/net/cpfl/cpfl_rxtx.h
index aacd087b56..4db4025771 100644
--- a/drivers/net/cpfl/cpfl_rxtx.h
+++ b/drivers/net/cpfl/cpfl_rxtx.h
@@ -11,7 +11,7 @@
/* In QLEN must be whole number of 32 descriptors. */
#define CPFL_ALIGN_RING_DESC 32
#define CPFL_MIN_RING_DESC 32
-#define CPFL_MAX_RING_DESC 4096
+#define CPFL_MAX_RING_DESC 32768
#define CPFL_DMA_MEM_ALIGN 4096
#define CPFL_MAX_HAIRPINQ_RX_2_TX 1
diff --git a/drivers/net/e1000/e1000_ethdev.h b/drivers/net/e1000/e1000_ethdev.h
index 339ae1f4b6..e9046047f6 100644
--- a/drivers/net/e1000/e1000_ethdev.h
+++ b/drivers/net/e1000/e1000_ethdev.h
@@ -107,7 +107,7 @@
* (num_ring_desc * sizeof(struct e1000_rx/tx_desc)) % 128 == 0
*/
#define E1000_MIN_RING_DESC 32
-#define E1000_MAX_RING_DESC 4096
+#define E1000_MAX_RING_DESC 32768
/*
* TDBA/RDBA should be aligned on 16 byte boundary. But TDLEN/RDLEN should be
diff --git a/drivers/net/iavf/iavf_rxtx.h b/drivers/net/iavf/iavf_rxtx.h
index 7b56076d32..f9c129f0ef 100644
--- a/drivers/net/iavf/iavf_rxtx.h
+++ b/drivers/net/iavf/iavf_rxtx.h
@@ -8,7 +8,7 @@
/* In QLEN must be whole number of 32 descriptors. */
#define IAVF_ALIGN_RING_DESC 32
#define IAVF_MIN_RING_DESC 64
-#define IAVF_MAX_RING_DESC 4096
+#define IAVF_MAX_RING_DESC 32768
#define IAVF_DMA_MEM_ALIGN 4096
/* Base address of the HW descriptor ring should be 128B aligned. */
#define IAVF_RING_BASE_ALIGN 128
diff --git a/drivers/net/ice/ice_rxtx.h b/drivers/net/ice/ice_rxtx.h
index f7276cfc9f..6d18fe908d 100644
--- a/drivers/net/ice/ice_rxtx.h
+++ b/drivers/net/ice/ice_rxtx.h
@@ -9,7 +9,7 @@
#define ICE_ALIGN_RING_DESC 32
#define ICE_MIN_RING_DESC 64
-#define ICE_MAX_RING_DESC 4096
+#define ICE_MAX_RING_DESC 32768
#define ICE_DMA_MEM_ALIGN 4096
#define ICE_RING_BASE_ALIGN 128
diff --git a/drivers/net/idpf/idpf_rxtx.h b/drivers/net/idpf/idpf_rxtx.h
index 41a7495083..0f78f7cba5 100644
--- a/drivers/net/idpf/idpf_rxtx.h
+++ b/drivers/net/idpf/idpf_rxtx.h
@@ -11,7 +11,7 @@
/* In QLEN must be whole number of 32 descriptors. */
#define IDPF_ALIGN_RING_DESC 32
#define IDPF_MIN_RING_DESC 32
-#define IDPF_MAX_RING_DESC 4096
+#define IDPF_MAX_RING_DESC 32768
#define IDPF_DMA_MEM_ALIGN 4096
/* Base address of the HW descriptor ring should be 128B aligned. */
#define IDPF_RING_BASE_ALIGN 128
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index 7da2ccf6a8..a2637f0a91 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -73,7 +73,7 @@
#define IXGBE_MMW_SIZE_DEFAULT 0x4
#define IXGBE_MMW_SIZE_JUMBO_FRAME 0x14
-#define IXGBE_MAX_RING_DESC 4096 /* replicate define from rxtx */
+#define IXGBE_MAX_RING_DESC 32768 /* replicate define from rxtx */
/*
* Default values for RX/TX configuration
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.h b/drivers/net/ixgbe/ixgbe_rxtx.h
index ee89c89929..a28037b08a 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.h
+++ b/drivers/net/ixgbe/ixgbe_rxtx.h
@@ -25,7 +25,7 @@
* (num_ring_desc * sizeof(rx/tx descriptor)) % 128 == 0
*/
#define IXGBE_MIN_RING_DESC 32
-#define IXGBE_MAX_RING_DESC 4096
+#define IXGBE_MAX_RING_DESC 32768
#define RTE_PMD_IXGBE_TX_MAX_BURST 32
#define RTE_PMD_IXGBE_RX_MAX_BURST 32
--
2.34.1
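To illustrate what the per-driver cap means for an application, here is a minimal sketch of how a requested ring size could be rounded and clamped against a driver's limits. The struct, its field names, and clamp_ring_size() are hypothetical and are not taken from any DPDK header; only the numeric values (minimum 64, alignment 32, maximum 4096 or 32768) come from the defines in the patch above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a driver's ring limits (illustrative names). */
struct ring_limits {
	uint16_t nb_min;   /* smallest ring, e.g. ICE_MIN_RING_DESC */
	uint16_t nb_max;   /* largest ring, e.g. ICE_MAX_RING_DESC */
	uint16_t nb_align; /* ring size must be a multiple of this */
};

/* Round the request up to the alignment, then clamp it into [nb_min, nb_max]. */
static uint16_t
clamp_ring_size(uint16_t requested, const struct ring_limits *lim)
{
	uint32_t n = requested; /* widen so the round-up cannot overflow */

	assert(lim->nb_align != 0);
	n = (n + lim->nb_align - 1) / lim->nb_align * lim->nb_align;
	if (n < lim->nb_min)
		n = lim->nb_min;
	if (n > lim->nb_max)
		n = lim->nb_max;
	return (uint16_t)n;
}
```

With a maximum of 4096, a request for 32768 descriptors is cut back to 4096; with the maximum raised to 32768, the same request passes through unchanged. Whether a given driver clamps or rejects an oversized request is not stated in this thread, so treat the clamping behaviour as an assumption.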
* RE: [PATCH] net: increase the maximum of RX/TX descriptors
2024-10-29 12:48 [PATCH] net: increase the maximum of RX/TX descriptors Lukas Sismis
@ 2024-10-29 14:37 ` Morten Brørup
2024-10-30 13:58 ` Lukáš Šišmiš
2024-10-30 15:06 ` [PATCH v2 1/2] net/ixgbe: " Lukas Sismis
2024-10-30 15:42 ` [PATCH v3 1/1] net/bonding: make bonding functions stable Lukas Sismis
2 siblings, 1 reply; 23+ messages in thread
From: Morten Brørup @ 2024-10-29 14:37 UTC (permalink / raw)
To: Lukas Sismis, anatoly.burakov, ian.stokes; +Cc: dev
> From: Lukas Sismis [mailto:sismis@cesnet.cz]
> Sent: Tuesday, 29 October 2024 13.49
>
> Intel PMDs are capped by default to only 4096 RX/TX descriptors.
> This can be limiting for applications requiring a bigger buffer
> capabilities. The cap prevented the applications to configure
> more descriptors. By bufferring more packets with RX/TX
> descriptors, the applications can better handle the processing
> peaks.
>
> Signed-off-by: Lukas Sismis <sismis@cesnet.cz>
> ---
Seems like a good idea.
Has the maximum number of descriptors been checked against the datasheets for all the affected NIC chips?
* Re: [PATCH] net: increase the maximum of RX/TX descriptors
2024-10-29 14:37 ` Morten Brørup
@ 2024-10-30 13:58 ` Lukáš Šišmiš
2024-10-30 15:20 ` Stephen Hemminger
0 siblings, 1 reply; 23+ messages in thread
From: Lukáš Šišmiš @ 2024-10-30 13:58 UTC (permalink / raw)
To: Morten Brørup, anatoly.burakov, ian.stokes; +Cc: dev
On 29. 10. 24 15:37, Morten Brørup wrote:
>> From: Lukas Sismis [mailto:sismis@cesnet.cz]
>> Sent: Tuesday, 29 October 2024 13.49
>>
>> Intel PMDs are capped by default to only 4096 RX/TX descriptors.
>> This can be limiting for applications requiring a bigger buffer
>> capabilities. The cap prevented the applications to configure
>> more descriptors. By bufferring more packets with RX/TX
>> descriptors, the applications can better handle the processing
>> peaks.
>>
>> Signed-off-by: Lukas Sismis <sismis@cesnet.cz>
>> ---
> Seems like a good idea.
>
> Have the max number of descriptors been checked with the datasheets for all the affected NIC chips?
>
I was hoping to get some feedback on this from the Intel folks.
But it seems that I can raise the limit to 32k only for ixgbe (82599)
(possibly to 64k - 8); the others, ice (E810) and i40e (X710), are capped at
8k - 32.
I have no experience with the other drivers, nor do I have the hardware
available to test them, so I will leave them unchanged in the follow-up
version of this patch.
Lukas
* [PATCH v2 1/2] net/ixgbe: increase the maximum of RX/TX descriptors
2024-10-29 12:48 [PATCH] net: increase the maximum of RX/TX descriptors Lukas Sismis
2024-10-29 14:37 ` Morten Brørup
@ 2024-10-30 15:06 ` Lukas Sismis
2024-10-30 15:06 ` [PATCH v2 2/2] net/ice: " Lukas Sismis
2024-10-30 15:42 ` [PATCH v3 1/1] net/bonding: make bonding functions stable Lukas Sismis
2 siblings, 1 reply; 23+ messages in thread
From: Lukas Sismis @ 2024-10-30 15:06 UTC (permalink / raw)
To: mb; +Cc: anatoly.burakov, ian.stokes, dev, Lukas Sismis
Intel PMDs are capped by default at 4096 RX/TX descriptors.
This can be limiting for applications that require larger buffering
capacity. By buffering more packets in RX/TX
descriptors, applications can better handle processing
peaks.
Setting ixgbe max descriptors to 8192 as per datasheet:
Register name: RDLEN
Description: Descriptor Ring Length.
This register sets the number of bytes
allocated for descriptors in the circular descriptor buffer.
It must be 128B aligned (7 LS bit must be set to zero).
** Note: validated Lengths up to 128K (8K descriptors). **
Signed-off-by: Lukas Sismis <sismis@cesnet.cz>
---
doc/guides/nics/ixgbe.rst | 2 +-
drivers/net/ixgbe/ixgbe_ethdev.c | 2 +-
drivers/net/ixgbe/ixgbe_rxtx.h | 2 +-
3 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/doc/guides/nics/ixgbe.rst b/doc/guides/nics/ixgbe.rst
index 14573b542e..c5c6a6c34b 100644
--- a/doc/guides/nics/ixgbe.rst
+++ b/doc/guides/nics/ixgbe.rst
@@ -76,7 +76,7 @@ Scattered packets are not supported in this mode.
If an incoming packet is greater than the maximum acceptable length of one "mbuf" data size (by default, the size is 2 KB),
vPMD for RX would be disabled.
-By default, IXGBE_MAX_RING_DESC is set to 4096 and RTE_PMD_IXGBE_RX_MAX_BURST is set to 32.
+By default, IXGBE_MAX_RING_DESC is set to 8192 and RTE_PMD_IXGBE_RX_MAX_BURST is set to 32.
Windows Prerequisites and Pre-conditions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index 7da2ccf6a8..da9b3d7ca7 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -73,7 +73,7 @@
#define IXGBE_MMW_SIZE_DEFAULT 0x4
#define IXGBE_MMW_SIZE_JUMBO_FRAME 0x14
-#define IXGBE_MAX_RING_DESC 4096 /* replicate define from rxtx */
+#define IXGBE_MAX_RING_DESC 8192 /* replicate define from rxtx */
/*
* Default values for RX/TX configuration
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.h b/drivers/net/ixgbe/ixgbe_rxtx.h
index ee89c89929..0550c1da60 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.h
+++ b/drivers/net/ixgbe/ixgbe_rxtx.h
@@ -25,7 +25,7 @@
* (num_ring_desc * sizeof(rx/tx descriptor)) % 128 == 0
*/
#define IXGBE_MIN_RING_DESC 32
-#define IXGBE_MAX_RING_DESC 4096
+#define IXGBE_MAX_RING_DESC 8192
#define RTE_PMD_IXGBE_TX_MAX_BURST 32
#define RTE_PMD_IXGBE_RX_MAX_BURST 32
--
2.34.1
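The relationship between the descriptor count and the RDLEN value quoted from the datasheet can be sketched as follows. The 16-byte descriptor size is an assumption inferred from the datasheet note (128K bytes corresponding to 8K descriptors); the macro and function names are hypothetical and do not appear in the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 16-byte descriptors, implied by "128K (8K descriptors)". */
#define DESC_SIZE_BYTES 16u
/* RDLEN must be 128-byte aligned according to the quoted datasheet text. */
#define RDLEN_ALIGN_BYTES 128u

/* Ring length in bytes, as it would be programmed into RDLEN. */
static uint32_t
rdlen_bytes(uint32_t nb_desc)
{
	assert(nb_desc <= UINT32_MAX / DESC_SIZE_BYTES); /* no overflow */
	return nb_desc * DESC_SIZE_BYTES;
}

/* True when the resulting byte length satisfies the 128-byte alignment. */
static bool
rdlen_is_aligned(uint32_t nb_desc)
{
	return rdlen_bytes(nb_desc) % RDLEN_ALIGN_BYTES == 0;
}
```

Under this assumption, 8192 descriptors give exactly 131072 bytes (128K), which matches the validated length in the datasheet note, and any descriptor count that is a multiple of 8 satisfies the alignment rule.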
* [PATCH v2 2/2] net/ice: increase the maximum of RX/TX descriptors
2024-10-30 15:06 ` [PATCH v2 1/2] net/ixgbe: " Lukas Sismis
@ 2024-10-30 15:06 ` Lukas Sismis
0 siblings, 0 replies; 23+ messages in thread
From: Lukas Sismis @ 2024-10-30 15:06 UTC (permalink / raw)
To: mb; +Cc: anatoly.burakov, ian.stokes, dev, Lukas Sismis
Intel PMDs are capped by default at 4096 RX/TX descriptors.
This can be limiting for applications that require larger buffering
capacity. By buffering more packets in RX/TX
descriptors, applications can better handle processing
peaks.
Setting ice max descriptors to 8192 - 32 as per datasheet:
Register name: QLEN (Rx-Queue)
Description: Receive Queue Length
Defines the size of the descriptor queue in descriptors units
from eight descriptors (QLEN=0x8) up to 8K descriptors minus
32 (QLEN=0x1FE0).
QLEN Restrictions: When the PXE_MODE flag in the
GLLAN_RCTL_0 register is cleared, the QLEN must be whole
number of 32 descriptors. When the PXE_MODE flag is set, the
QLEN can be one of the following options:
Up to 4 PFs, QLEN can be set to: 8, 16, 24 or 32 descriptors.
Up to 8 PFs, QLEN can be set to: 8 or 16 descriptors
Signed-off-by: Lukas Sismis <sismis@cesnet.cz>
---
drivers/net/ice/ice_rxtx.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ice/ice_rxtx.h b/drivers/net/ice/ice_rxtx.h
index f7276cfc9f..b43f9fcd1b 100644
--- a/drivers/net/ice/ice_rxtx.h
+++ b/drivers/net/ice/ice_rxtx.h
@@ -9,7 +9,7 @@
#define ICE_ALIGN_RING_DESC 32
#define ICE_MIN_RING_DESC 64
-#define ICE_MAX_RING_DESC 4096
+#define ICE_MAX_RING_DESC 8192 - 32
#define ICE_DMA_MEM_ALIGN 4096
#define ICE_RING_BASE_ALIGN 128
--
2.34.1
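A note on the define in this version of the patch: an unparenthesized expression such as `8192 - 32` in a macro body can be evaluated incorrectly once the macro is used inside a larger expression, which is why the v3 patch in this thread writes `(8192 - 32)`. The sketch below shows the effect with hypothetical macro names and an assumed 16-byte descriptor size; it is not driver code.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_DESC_UNSAFE 8192 - 32   /* expands textually, no grouping */
#define MAX_DESC_SAFE   (8192 - 32) /* grouped, always evaluates to 8160 */

/* The parenthesized form can be checked at compile time. */
static_assert(MAX_DESC_SAFE == 8160, "8K descriptors minus 32");

/* Expands to 8192 - 32 * 16, i.e. 8192 - 512. */
static uint32_t
ring_bytes_unsafe(void)
{
	return MAX_DESC_UNSAFE * 16;
}

/* Expands to (8192 - 32) * 16, the intended ring size in bytes. */
static uint32_t
ring_bytes_safe(void)
{
	return MAX_DESC_SAFE * 16;
}
```

The unsafe variant yields 7680 instead of the intended 130560, so any code that multiplies or divides the macro would silently compute the wrong size.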
* Re: [PATCH] net: increase the maximum of RX/TX descriptors
2024-10-30 13:58 ` Lukáš Šišmiš
@ 2024-10-30 15:20 ` Stephen Hemminger
2024-10-30 15:40 ` Lukáš Šišmiš
0 siblings, 1 reply; 23+ messages in thread
From: Stephen Hemminger @ 2024-10-30 15:20 UTC (permalink / raw)
To: Lukáš Šišmiš
Cc: Morten Brørup, anatoly.burakov, ian.stokes, dev
On Wed, 30 Oct 2024 14:58:40 +0100
Lukáš Šišmiš <sismis@cesnet.cz> wrote:
> On 29. 10. 24 15:37, Morten Brørup wrote:
> >> From: Lukas Sismis [mailto:sismis@cesnet.cz]
> >> Sent: Tuesday, 29 October 2024 13.49
> >>
> >> Intel PMDs are capped by default to only 4096 RX/TX descriptors.
> >> This can be limiting for applications requiring a bigger buffer
> >> capabilities. The cap prevented the applications to configure
> >> more descriptors. By bufferring more packets with RX/TX
> >> descriptors, the applications can better handle the processing
> >> peaks.
> >>
> >> Signed-off-by: Lukas Sismis <sismis@cesnet.cz>
> >> ---
> > Seems like a good idea.
> >
> > Have the max number of descriptors been checked with the datasheets for all the affected NIC chips?
> >
> I was hoping to get some feedback on this from the Intel folks.
>
> But it seems like I can change it only for ixgbe (82599) to 32k
> (possibly to 64k - 8), others - ice (E810) and i40e (X710) are capped at
> 8k - 32.
>
> I neither have any experience with other drivers nor I have them
> available to test so I will let it be in the follow-up version of this
> patch.
>
> Lukas
>
Having a large number of descriptors, especially at lower speeds, will
increase buffer bloat. For real-life applications, you do not want to
increase latency by more than 1 ms.
10 Gbps has 7.62 Gbps of effective bandwidth due to overhead.
The rate for 1500 MTU is 7.62 Gbps / (1500 * 8) = 635 Kpps (i.e. about 1.5 us per packet).
A ring of 4096 descriptors can hold about 6 ms worth of full-size packets.
Be careful: optimizing for 64-byte benchmarks can be a disaster in the real world.
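The arithmetic above can be reproduced with a short sketch. The function names are hypothetical; the 7.62 Gbps effective rate and the 1500-byte frame size are the figures given in this message, not independently derived values.

```c
#include <assert.h>

/* Packets per second for a given effective bit rate and frame size. */
static double
packets_per_second(double effective_bps, unsigned int frame_bytes)
{
	assert(frame_bytes > 0);
	return effective_bps / (frame_bytes * 8.0);
}

/* Worst-case queueing delay, in milliseconds, of a completely full ring. */
static double
ring_drain_ms(unsigned int nb_desc, double effective_bps,
	      unsigned int frame_bytes)
{
	return nb_desc / packets_per_second(effective_bps, frame_bytes) * 1e3;
}
```

With 7.62e9 bps and 1500-byte frames this gives 635000 packets per second, roughly 6.45 ms for a 4096-entry ring, and roughly 51.6 ms for a 32768-entry ring, which is well beyond the 1 ms latency target mentioned above.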
* Re: [PATCH] net: increase the maximum of RX/TX descriptors
2024-10-30 15:20 ` Stephen Hemminger
@ 2024-10-30 15:40 ` Lukáš Šišmiš
2024-10-30 15:58 ` Bruce Richardson
2024-10-30 16:06 ` Stephen Hemminger
0 siblings, 2 replies; 23+ messages in thread
From: Lukáš Šišmiš @ 2024-10-30 15:40 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: Morten Brørup, anatoly.burakov, ian.stokes, dev
On 30. 10. 24 16:20, Stephen Hemminger wrote:
> On Wed, 30 Oct 2024 14:58:40 +0100
> Lukáš Šišmiš <sismis@cesnet.cz> wrote:
>
>> On 29. 10. 24 15:37, Morten Brørup wrote:
>>>> From: Lukas Sismis [mailto:sismis@cesnet.cz]
>>>> Sent: Tuesday, 29 October 2024 13.49
>>>>
>>>> Intel PMDs are capped by default to only 4096 RX/TX descriptors.
>>>> This can be limiting for applications requiring a bigger buffer
>>>> capabilities. The cap prevented the applications to configure
>>>> more descriptors. By bufferring more packets with RX/TX
>>>> descriptors, the applications can better handle the processing
>>>> peaks.
>>>>
>>>> Signed-off-by: Lukas Sismis <sismis@cesnet.cz>
>>>> ---
>>> Seems like a good idea.
>>>
>>> Have the max number of descriptors been checked with the datasheets for all the affected NIC chips?
>>>
>> I was hoping to get some feedback on this from the Intel folks.
>>
>> But it seems like I can change it only for ixgbe (82599) to 32k
>> (possibly to 64k - 8), others - ice (E810) and i40e (X710) are capped at
>> 8k - 32.
>>
>> I neither have any experience with other drivers nor I have them
>> available to test so I will let it be in the follow-up version of this
>> patch.
>>
>> Lukas
>>
> Having large number of descriptors especially at lower speeds will
> increase buffer bloat. For real life applications, do not want increase
> latency more than 1ms.
>
> 10 Gbps has 7.62Gbps of effective bandwidth due to overhead.
> Rate for 1500 MTU is 7.62Gbs / (1500 * 8) = 635 K pps (i.e 1.5 us per packet)
> A ring of 4096 descriptors can take 6 ms for full size packets.
>
> Be careful, optimizing for 64 byte benchmarks can be disaster in real world.
>
Thanks for the info, Stephen; however, I am not trying to optimize for
64-byte benchmarks. The work was initiated by an I/O problem with Intel
NICs. A Suricata IDS worker (1 core per queue) receives a burst of packets
and then processes them sequentially, one by one. It seems that a 4k ring
is not enough for this. NVIDIA NICs allow e.g. 32k descriptors, and that
works fine. In the end, it also worked fine once the ixgbe descriptor
limit was increased. I am not sure why AF_PACKET can handle this so much
better than DPDK; AF_PACKET does not have a very high number of
descriptors configured (<= 4096), yet it works better. At the moment I
assume there is internal buffering in the kernel that allows it to
handle processing spikes.
To give more context, here is the forum discussion:
https://forum.suricata.io/t/high-packet-drop-rate-with-dpdk-compared-to-af-packet-in-suricata-7-0-7/4896
* [PATCH v3 1/1] net/bonding: make bonding functions stable
2024-10-29 12:48 [PATCH] net: increase the maximum of RX/TX descriptors Lukas Sismis
2024-10-29 14:37 ` Morten Brørup
2024-10-30 15:06 ` [PATCH v2 1/2] net/ixgbe: " Lukas Sismis
@ 2024-10-30 15:42 ` Lukas Sismis
2024-10-30 15:42 ` [PATCH v3 1/2] net/ixgbe: increase the maximum of RX/TX descriptors Lukas Sismis
` (2 more replies)
2 siblings, 3 replies; 23+ messages in thread
From: Lukas Sismis @ 2024-10-30 15:42 UTC (permalink / raw)
To: dev; +Cc: stephen, mb, anatoly.burakov, ian.stokes, Lukas Sismis
Remove rte_experimental macros from the stable functions
as they have been part of the stable API since 23.11.
Signed-off-by: Lukas Sismis <sismis@cesnet.cz>
---
drivers/net/bonding/rte_eth_bond.h | 4 ----
drivers/net/bonding/rte_eth_bond_8023ad.h | 1 -
drivers/net/bonding/version.map | 15 +++++----------
3 files changed, 5 insertions(+), 15 deletions(-)
diff --git a/drivers/net/bonding/rte_eth_bond.h b/drivers/net/bonding/rte_eth_bond.h
index e59ff8793e..4f79ff9b85 100644
--- a/drivers/net/bonding/rte_eth_bond.h
+++ b/drivers/net/bonding/rte_eth_bond.h
@@ -125,7 +125,6 @@ rte_eth_bond_free(const char *name);
* @return
* 0 on success, negative value otherwise
*/
-__rte_experimental
int
rte_eth_bond_member_add(uint16_t bonding_port_id, uint16_t member_port_id);
@@ -138,7 +137,6 @@ rte_eth_bond_member_add(uint16_t bonding_port_id, uint16_t member_port_id);
* @return
* 0 on success, negative value otherwise
*/
-__rte_experimental
int
rte_eth_bond_member_remove(uint16_t bonding_port_id, uint16_t member_port_id);
@@ -199,7 +197,6 @@ rte_eth_bond_primary_get(uint16_t bonding_port_id);
* Number of members associated with bonding device on success,
* negative value otherwise
*/
-__rte_experimental
int
rte_eth_bond_members_get(uint16_t bonding_port_id, uint16_t members[],
uint16_t len);
@@ -216,7 +213,6 @@ rte_eth_bond_members_get(uint16_t bonding_port_id, uint16_t members[],
* Number of active members associated with bonding device on success,
* negative value otherwise
*/
-__rte_experimental
int
rte_eth_bond_active_members_get(uint16_t bonding_port_id, uint16_t members[],
uint16_t len);
diff --git a/drivers/net/bonding/rte_eth_bond_8023ad.h b/drivers/net/bonding/rte_eth_bond_8023ad.h
index b2deb26e2e..5432eafcfe 100644
--- a/drivers/net/bonding/rte_eth_bond_8023ad.h
+++ b/drivers/net/bonding/rte_eth_bond_8023ad.h
@@ -193,7 +193,6 @@ rte_eth_bond_8023ad_setup(uint16_t port_id,
* -EINVAL if conf is NULL or member id is invalid (not a member of given
* bonding device or is not inactive).
*/
-__rte_experimental
int
rte_eth_bond_8023ad_member_info(uint16_t port_id, uint16_t member_id,
struct rte_eth_bond_8023ad_member_info *conf);
diff --git a/drivers/net/bonding/version.map b/drivers/net/bonding/version.map
index a309469b1f..eb37dadf76 100644
--- a/drivers/net/bonding/version.map
+++ b/drivers/net/bonding/version.map
@@ -11,12 +11,17 @@ DPDK_25 {
rte_eth_bond_8023ad_ext_distrib;
rte_eth_bond_8023ad_ext_distrib_get;
rte_eth_bond_8023ad_ext_slowtx;
+ rte_eth_bond_8023ad_member_info;
rte_eth_bond_8023ad_setup;
+ rte_eth_bond_active_members_get;
rte_eth_bond_create;
rte_eth_bond_free;
rte_eth_bond_link_monitoring_set;
rte_eth_bond_mac_address_reset;
rte_eth_bond_mac_address_set;
+ rte_eth_bond_member_add;
+ rte_eth_bond_member_remove;
+ rte_eth_bond_members_get;
rte_eth_bond_mode_get;
rte_eth_bond_mode_set;
rte_eth_bond_primary_get;
@@ -26,13 +31,3 @@ DPDK_25 {
local: *;
};
-
-EXPERIMENTAL {
- # added in 23.11
- global:
- rte_eth_bond_8023ad_member_info;
- rte_eth_bond_active_members_get;
- rte_eth_bond_member_add;
- rte_eth_bond_member_remove;
- rte_eth_bond_members_get;
-};
--
2.34.1
* [PATCH v3 1/2] net/ixgbe: increase the maximum of RX/TX descriptors
2024-10-30 15:42 ` [PATCH v3 1/1] net/bonding: make bonding functions stable Lukas Sismis
@ 2024-10-30 15:42 ` Lukas Sismis
2024-10-30 16:26 ` Morten Brørup
2024-10-30 15:42 ` [PATCH v3 2/2] net/ice: " Lukas Sismis
2024-10-31 2:24 ` [PATCH v3 1/1] net/bonding: make bonding functions stable lihuisong (C)
2 siblings, 1 reply; 23+ messages in thread
From: Lukas Sismis @ 2024-10-30 15:42 UTC (permalink / raw)
To: dev; +Cc: stephen, mb, anatoly.burakov, ian.stokes, Lukas Sismis
Intel PMDs are capped by default at 4096 RX/TX descriptors.
This can be limiting for applications that require larger buffering
capacity. By buffering more packets in RX/TX
descriptors, applications can better handle processing
peaks.
Setting ixgbe max descriptors to 8192 as per datasheet:
Register name: RDLEN
Description: Descriptor Ring Length.
This register sets the number of bytes
allocated for descriptors in the circular descriptor buffer.
It must be 128B aligned (7 LS bit must be set to zero).
** Note: validated Lengths up to 128K (8K descriptors). **
Signed-off-by: Lukas Sismis <sismis@cesnet.cz>
---
doc/guides/nics/ixgbe.rst | 2 +-
drivers/net/ixgbe/ixgbe_ethdev.c | 2 +-
drivers/net/ixgbe/ixgbe_rxtx.h | 2 +-
3 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/doc/guides/nics/ixgbe.rst b/doc/guides/nics/ixgbe.rst
index 14573b542e..c5c6a6c34b 100644
--- a/doc/guides/nics/ixgbe.rst
+++ b/doc/guides/nics/ixgbe.rst
@@ -76,7 +76,7 @@ Scattered packets are not supported in this mode.
If an incoming packet is greater than the maximum acceptable length of one "mbuf" data size (by default, the size is 2 KB),
vPMD for RX would be disabled.
-By default, IXGBE_MAX_RING_DESC is set to 4096 and RTE_PMD_IXGBE_RX_MAX_BURST is set to 32.
+By default, IXGBE_MAX_RING_DESC is set to 8192 and RTE_PMD_IXGBE_RX_MAX_BURST is set to 32.
Windows Prerequisites and Pre-conditions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index 7da2ccf6a8..da9b3d7ca7 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -73,7 +73,7 @@
#define IXGBE_MMW_SIZE_DEFAULT 0x4
#define IXGBE_MMW_SIZE_JUMBO_FRAME 0x14
-#define IXGBE_MAX_RING_DESC 4096 /* replicate define from rxtx */
+#define IXGBE_MAX_RING_DESC 8192 /* replicate define from rxtx */
/*
* Default values for RX/TX configuration
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.h b/drivers/net/ixgbe/ixgbe_rxtx.h
index ee89c89929..0550c1da60 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.h
+++ b/drivers/net/ixgbe/ixgbe_rxtx.h
@@ -25,7 +25,7 @@
* (num_ring_desc * sizeof(rx/tx descriptor)) % 128 == 0
*/
#define IXGBE_MIN_RING_DESC 32
-#define IXGBE_MAX_RING_DESC 4096
+#define IXGBE_MAX_RING_DESC 8192
#define RTE_PMD_IXGBE_TX_MAX_BURST 32
#define RTE_PMD_IXGBE_RX_MAX_BURST 32
--
2.34.1
* [PATCH v3 2/2] net/ice: increase the maximum of RX/TX descriptors
2024-10-30 15:42 ` [PATCH v3 1/1] net/bonding: make bonding functions stable Lukas Sismis
2024-10-30 15:42 ` [PATCH v3 1/2] net/ixgbe: increase the maximum of RX/TX descriptors Lukas Sismis
@ 2024-10-30 15:42 ` Lukas Sismis
2024-10-30 16:26 ` Morten Brørup
2024-10-31 2:24 ` [PATCH v3 1/1] net/bonding: make bonding functions stable lihuisong (C)
2 siblings, 1 reply; 23+ messages in thread
From: Lukas Sismis @ 2024-10-30 15:42 UTC (permalink / raw)
To: dev; +Cc: stephen, mb, anatoly.burakov, ian.stokes, Lukas Sismis
Intel PMDs are capped by default at 4096 RX/TX descriptors.
This can be limiting for applications that require larger buffering
capacity. By buffering more packets in RX/TX
descriptors, applications can better handle processing
peaks.
Setting ice max descriptors to 8192 - 32 as per datasheet:
Register name: QLEN (Rx-Queue)
Description: Receive Queue Length
Defines the size of the descriptor queue in descriptors units
from eight descriptors (QLEN=0x8) up to 8K descriptors minus
32 (QLEN=0x1FE0).
QLEN Restrictions: When the PXE_MODE flag in the
GLLAN_RCTL_0 register is cleared, the QLEN must be whole
number of 32 descriptors. When the PXE_MODE flag is set, the
QLEN can be one of the following options:
Up to 4 PFs, QLEN can be set to: 8, 16, 24 or 32 descriptors.
Up to 8 PFs, QLEN can be set to: 8 or 16 descriptors
Signed-off-by: Lukas Sismis <sismis@cesnet.cz>
---
drivers/net/ice/ice_rxtx.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ice/ice_rxtx.h b/drivers/net/ice/ice_rxtx.h
index f7276cfc9f..45f25b3609 100644
--- a/drivers/net/ice/ice_rxtx.h
+++ b/drivers/net/ice/ice_rxtx.h
@@ -9,7 +9,7 @@
#define ICE_ALIGN_RING_DESC 32
#define ICE_MIN_RING_DESC 64
-#define ICE_MAX_RING_DESC 4096
+#define ICE_MAX_RING_DESC (8192 - 32)
#define ICE_DMA_MEM_ALIGN 4096
#define ICE_RING_BASE_ALIGN 128
--
2.34.1
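The QLEN restrictions quoted from the datasheet (PXE_MODE cleared) can be expressed as a small validity check. This is a sketch only: the macro and function names are hypothetical, and the lower bound of 64 is taken from ICE_MIN_RING_DESC in the diff above rather than from the datasheet, which allows smaller queues.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define QLEN_MIN   64u     /* driver minimum (ICE_MIN_RING_DESC) */
#define QLEN_MAX   0x1FE0u /* datasheet maximum: 8K descriptors minus 32 */
#define QLEN_ALIGN 32u     /* must be a whole number of 32 descriptors */

/* The hexadecimal limit and the patch's (8192 - 32) are the same value. */
static_assert(QLEN_MAX == 8192 - 32, "0x1FE0 is 8160");

/* True when qlen is within range and a multiple of 32. */
static bool
qlen_is_valid(uint32_t qlen)
{
	return qlen >= QLEN_MIN && qlen <= QLEN_MAX &&
	       qlen % QLEN_ALIGN == 0;
}
```

Under these rules 8160 is the largest accepted ring size, while a round 8192 is rejected, which is why the cap is not simply 8K.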
* Re: [PATCH] net: increase the maximum of RX/TX descriptors
2024-10-30 15:40 ` Lukáš Šišmiš
@ 2024-10-30 15:58 ` Bruce Richardson
2024-10-30 16:06 ` Stephen Hemminger
1 sibling, 0 replies; 23+ messages in thread
From: Bruce Richardson @ 2024-10-30 15:58 UTC (permalink / raw)
To: Lukáš Šišmiš
Cc: Stephen Hemminger, Morten Brørup, anatoly.burakov, ian.stokes, dev
On Wed, Oct 30, 2024 at 04:40:10PM +0100, Lukáš Šišmiš wrote:
>
> On 30. 10. 24 16:20, Stephen Hemminger wrote:
> > On Wed, 30 Oct 2024 14:58:40 +0100
> > Lukáš Šišmiš <sismis@cesnet.cz> wrote:
> >
> > > On 29. 10. 24 15:37, Morten Brørup wrote:
> > > > > From: Lukas Sismis [mailto:sismis@cesnet.cz]
> > > > > Sent: Tuesday, 29 October 2024 13.49
> > > > >
> > > > > Intel PMDs are capped by default to only 4096 RX/TX descriptors.
> > > > > This can be limiting for applications requiring a bigger buffer
> > > > > capabilities. The cap prevented the applications to configure
> > > > > more descriptors. By bufferring more packets with RX/TX
> > > > > descriptors, the applications can better handle the processing
> > > > > peaks.
> > > > >
> > > > > Signed-off-by: Lukas Sismis <sismis@cesnet.cz>
> > > > > ---
> > > > Seems like a good idea.
> > > >
> > > > Have the max number of descriptors been checked with the datasheets for all the affected NIC chips?
> > > I was hoping to get some feedback on this from the Intel folks.
> > >
> > > But it seems like I can change it only for ixgbe (82599) to 32k
> > > (possibly to 64k - 8), others - ice (E810) and i40e (X710) are capped at
> > > 8k - 32.
> > >
> > > I neither have any experience with other drivers nor I have them
> > > available to test so I will let it be in the follow-up version of this
> > > patch.
> > >
> > > Lukas
> > >
> > Having large number of descriptors especially at lower speeds will
> > increase buffer bloat. For real life applications, do not want increase
> > latency more than 1ms.
> >
> > 10 Gbps has 7.62Gbps of effective bandwidth due to overhead.
> > Rate for 1500 MTU is 7.62Gbs / (1500 * 8) = 635 K pps (i.e 1.5 us per packet)
> > A ring of 4096 descriptors can take 6 ms for full size packets.
> >
> > Be careful, optimizing for 64 byte benchmarks can be disaster in real world.
> >
> Thanks for the info Stephen, however I am not trying to optimize for 64 byte
> benchmarks. The work has been initiated by an IO problem and Intel NICs.
> Suricata IDS worker (1 core per queue) received a burst of packets and then
> sequentially processes them one by one. Well it seems like having a 4k
> buffers it seems to not be enough. NVIDIA NICs allow e.g. 32k descriptors
> and it works fine. In the end it worked fine when ixgbe descriptors were
> increased as well. I am not sure why AF-Packet can handle this much better
> than DPDK, AFP doesn't have crazy high number of descriptors configured <=
> 4096, yet it works better. At the moment I assume there is an internal
> buffering in the kernel which allows to handle processing spikes.
>
> To give more context here is the forum discussion - https://forum.suricata.io/t/high-packet-drop-rate-with-dpdk-compared-to-af-packet-in-suricata-7-0-7/4896
>
Thanks for the context, and it is an interesting discussion.
One small suggestion, which sadly I don't think will help with your
problem specifically: I suspect that you don't need both the Rx and Tx
queues to be that big. Given that the traffic going out is not going to be
greater than the traffic rate coming in, you shouldn't need much buffering
on the Tx side. Therefore, even if you increase the Rx buffers to 32k, I'd
suggest using only 1k or 512 Tx ring slots and seeing how it goes. That will
give you better performance due to a reduced memory buffer footprint. Any
packet buffers transmitted will remain in the NIC ring until SW wraps all
the way around the ring, meaning a 4k Tx ring will likely always hold 4k-64
buffers in it, and similarly a 32k Tx ring will increase your active buffer
count (and hence app cache footprint) by 32k-64.
/Bruce
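The buffer-count argument above can be put into numbers with a short sketch. The slack of 64 descriptors is the figure used in this message; the function names are hypothetical, and the model (a Tx ring that steadily holds all but 64 of its slots) is a simplification of the behaviour described, not a measurement.

```c
#include <assert.h>
#include <stdint.h>

/* Slots assumed free at any time, per the "4k-64" figure in the message. */
#define TX_RING_FREE_SLACK 64u

/* Buffers that stay parked in a Tx ring of the given size. */
static uint32_t
tx_ring_resident_bufs(uint32_t nb_tx_desc)
{
	assert(nb_tx_desc >= TX_RING_FREE_SLACK);
	return nb_tx_desc - TX_RING_FREE_SLACK;
}

/* Buffers kept out of the active set by choosing the smaller Tx ring. */
static uint32_t
tx_bufs_saved(uint32_t big_ring, uint32_t small_ring)
{
	return tx_ring_resident_bufs(big_ring) -
	       tx_ring_resident_bufs(small_ring);
}
```

In this model a 512-slot Tx ring keeps 448 buffers in flight versus 32704 for a 32k ring, so shrinking the Tx ring removes 32256 buffers from the working set while leaving the Rx ring size untouched.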
* Re: [PATCH] net: increase the maximum of RX/TX descriptors
2024-10-30 15:40 ` Lukáš Šišmiš
2024-10-30 15:58 ` Bruce Richardson
@ 2024-10-30 16:06 ` Stephen Hemminger
2024-11-05 8:49 ` Morten Brørup
1 sibling, 1 reply; 23+ messages in thread
From: Stephen Hemminger @ 2024-10-30 16:06 UTC (permalink / raw)
To: Lukáš Šišmiš
Cc: Morten Brørup, anatoly.burakov, ian.stokes, dev
On Wed, 30 Oct 2024 16:40:10 +0100
Lukáš Šišmiš <sismis@cesnet.cz> wrote:
> On 30. 10. 24 16:20, Stephen Hemminger wrote:
> > On Wed, 30 Oct 2024 14:58:40 +0100
> > Lukáš Šišmiš <sismis@cesnet.cz> wrote:
> >
> >> On 29. 10. 24 15:37, Morten Brørup wrote:
> >>>> From: Lukas Sismis [mailto:sismis@cesnet.cz]
> >>>> Sent: Tuesday, 29 October 2024 13.49
> >>>>
> >>>> Intel PMDs are capped by default to only 4096 RX/TX descriptors.
> >>>> This can be limiting for applications requiring a bigger buffer
> >>>> capabilities. The cap prevented the applications to configure
> >>>> more descriptors. By bufferring more packets with RX/TX
> >>>> descriptors, the applications can better handle the processing
> >>>> peaks.
> >>>>
> >>>> Signed-off-by: Lukas Sismis <sismis@cesnet.cz>
> >>>> ---
> >>> Seems like a good idea.
> >>>
> >>> Have the max number of descriptors been checked with the datasheets for all the affected NIC chips?
> >>>
> >> I was hoping to get some feedback on this from the Intel folks.
> >>
> >> But it seems like I can change it only for ixgbe (82599) to 32k
> >> (possibly to 64k - 8), others - ice (E810) and i40e (X710) are capped at
> >> 8k - 32.
> >>
> >> I neither have any experience with other drivers nor I have them
> >> available to test so I will let it be in the follow-up version of this
> >> patch.
> >>
> >> Lukas
> >>
> > Having a large number of descriptors, especially at lower speeds, will
> > increase buffer bloat. For real-life applications, you do not want to
> > increase latency by more than 1 ms.
> >
> > 10 Gbps has 7.62 Gbps of effective bandwidth due to overhead.
> > Rate for 1500 MTU is 7.62 Gbps / (1500 * 8) = 635 K pps (i.e. about 1.6 us per packet).
> > A ring of 4096 descriptors can take about 6 ms for full-size packets.
> >
> > Be careful, optimizing for 64-byte benchmarks can be a disaster in the real world.
> >
> Thanks for the info Stephen, however I am not trying to optimize for
> 64-byte benchmarks. The work was initiated by an IO problem with Intel
> NICs. A Suricata IDS worker (1 core per queue) receives a burst of packets
> and then processes them sequentially, one by one. It seems that
> 4k buffers are not enough. NVIDIA NICs allow e.g.
> 32k descriptors and it works fine. In the end it worked fine when ixgbe
> descriptors were increased as well. I am not sure why AF-Packet can
> handle this much better than DPDK; AFP doesn't have a crazy high number of
> descriptors configured (<= 4096), yet it works better. At the moment I
> assume there is internal buffering in the kernel which allows it to
> handle processing spikes.
>
> To give more context here is the forum discussion -
> https://forum.suricata.io/t/high-packet-drop-rate-with-dpdk-compared-to-af-packet-in-suricata-7-0-7/4896
>
>
>
I suspect AF_PACKET provides an intermediate step which can buffer more
or spread out the work.
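
The latency arithmetic quoted above can be reproduced with a small sketch. The helper names below are hypothetical and not part of DPDK; the inputs (7.62 Gbps usable bandwidth, 1500-byte packets, a 4096-entry ring) are the figures from the quoted message.

```c
#include <stdint.h>

/* Packets per second for a given usable bit rate and packet size.
 * Hypothetical helper for illustration only. */
static double packets_per_second(double usable_bits_per_sec, uint32_t pkt_bytes)
{
    return usable_bits_per_sec / ((double)pkt_bytes * 8.0);
}

/* Time, in milliseconds, that a packet at the tail of a completely
 * full ring waits while the nb_desc packets ahead of it are moved
 * at the given packet rate. */
static double ring_fill_time_ms(uint32_t nb_desc, double pps)
{
    return (double)nb_desc / pps * 1000.0;
}
```

With those inputs, packets_per_second(7.62e9, 1500) gives 635000 pps and ring_fill_time_ms(4096, 635000.0) gives roughly 6.45 ms, which matches the "about 6 ms" estimate.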
* RE: [PATCH v3 1/2] net/ixgbe: increase the maximum of RX/TX descriptors
2024-10-30 15:42 ` [PATCH v3 1/2] net/ixgbe: increase the maximum of RX/TX descriptors Lukas Sismis
@ 2024-10-30 16:26 ` Morten Brørup
2024-11-01 11:16 ` Bruce Richardson
0 siblings, 1 reply; 23+ messages in thread
From: Morten Brørup @ 2024-10-30 16:26 UTC (permalink / raw)
To: Lukas Sismis, dev; +Cc: stephen, anatoly.burakov, ian.stokes, Bruce Richardson
> From: Lukas Sismis [mailto:sismis@cesnet.cz]
> Sent: Wednesday, 30 October 2024 16.43
>
> Intel PMDs are capped by default to only 4096 RX/TX descriptors.
> This can be limiting for applications requiring bigger buffer
> capabilities. By buffering more packets with RX/TX
> descriptors, applications can better handle processing
> peaks.
>
> Setting ixgbe max descriptors to 8192 as per datasheet:
> Register name: RDLEN
> Description: Descriptor Ring Length.
> This register sets the number of bytes
> allocated for descriptors in the circular descriptor buffer.
> It must be 128B aligned (7 LS bit must be set to zero).
> ** Note: validated Lengths up to 128K (8K descriptors). **
>
> Signed-off-by: Lukas Sismis <sismis@cesnet.cz>
> ---
Drivers should reflect hardware capabilities; it's not up to the driver to impose artificial limits on applications. Thank you for fixing this, Lukas.
Acked-by: Morten Brørup <mb@smartsharesystems.com>
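
As a rough illustration of the RDLEN constraint quoted above: the register holds a byte length, must be 128-byte aligned, and is validated up to 128K bytes (8K descriptors), which implies 16 bytes per descriptor. The macro and function names below are hypothetical and are not the ixgbe driver's own definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, derived from the quoted datasheet note:
 * 128 KiB validated length / 8K descriptors = 16 bytes per descriptor. */
#define DESC_SIZE_BYTES     16u
#define RDLEN_ALIGN_BYTES   128u
#define RDLEN_VALIDATED_MAX (128u * 1024u)

/* Ring length in bytes for a given descriptor count. */
static uint32_t rdlen_bytes(uint32_t nb_desc)
{
    return nb_desc * DESC_SIZE_BYTES;
}

/* True if the descriptor count yields a non-zero, 128-byte aligned
 * ring length within the validated range. */
static bool rdlen_is_valid(uint32_t nb_desc)
{
    uint32_t len = rdlen_bytes(nb_desc);

    return len != 0 &&
           (len % RDLEN_ALIGN_BYTES) == 0 &&
           len <= RDLEN_VALIDATED_MAX;
}
```

Under these assumptions 8192 descriptors map to exactly 131072 bytes, the top of the validated range, and any valid count is a multiple of 8 descriptors.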
* RE: [PATCH v3 2/2] net/ice: increase the maximum of RX/TX descriptors
2024-10-30 15:42 ` [PATCH v3 2/2] net/ice: " Lukas Sismis
@ 2024-10-30 16:26 ` Morten Brørup
0 siblings, 0 replies; 23+ messages in thread
From: Morten Brørup @ 2024-10-30 16:26 UTC (permalink / raw)
To: Lukas Sismis, dev; +Cc: stephen, anatoly.burakov, ian.stokes, Bruce Richardson
> From: Lukas Sismis [mailto:sismis@cesnet.cz]
> Sent: Wednesday, 30 October 2024 16.43
>
> Intel PMDs are capped by default to only 4096 RX/TX descriptors.
> This can be limiting for applications requiring bigger buffer
> capabilities. By buffering more packets with RX/TX
> descriptors, applications can better handle processing
> peaks.
>
> Setting ice max descriptors to 8192 - 32 as per datasheet:
> Register name: QLEN (Rx-Queue)
> Description: Receive Queue Length
> Defines the size of the descriptor queue in descriptors units
> from eight descriptors (QLEN=0x8) up to 8K descriptors minus
> 32 (QLEN=0x1FE0).
> QLEN Restrictions: When the PXE_MODE flag in the
> GLLAN_RCTL_0 register is cleared, the QLEN must be whole
> number of 32 descriptors. When the PXE_MODE flag is set, the
> QLEN can be one of the following options:
> Up to 4 PFs, QLEN can be set to: 8, 16, 24 or 32 descriptors.
> Up to 8 PFs, QLEN can be set to: 8 or 16 descriptors
>
> Signed-off-by: Lukas Sismis <sismis@cesnet.cz>
> ---
Drivers should reflect hardware capabilities; it's not up to the driver to impose artificial limits on applications. Thank you for fixing this, Lukas.
Acked-by: Morten Brørup <mb@smartsharesystems.com>
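
The QLEN restriction quoted above, for the case where PXE_MODE is cleared, can be sketched as a simple check: the length must be a whole number of 32 descriptors and at most 8K - 32 (0x1FE0). The names below are hypothetical and are not the ice driver's macros; the PXE_MODE-set case is not covered.

```c
#include <stdbool.h>
#include <stdint.h>

#define QLEN_ALIGN 32u       /* QLEN must be a multiple of 32 descriptors */
#define QLEN_MAX   0x1FE0u   /* 8K - 32 = 8160 descriptors */

/* True if qlen is acceptable when PXE_MODE is cleared. */
static bool qlen_is_valid_non_pxe(uint32_t qlen)
{
    return qlen >= QLEN_ALIGN &&
           qlen <= QLEN_MAX &&
           (qlen % QLEN_ALIGN) == 0;
}

/* Clamp a requested size to the nearest valid value at or below it,
 * never going below one 32-descriptor unit. */
static uint32_t qlen_clamp_non_pxe(uint32_t requested)
{
    if (requested > QLEN_MAX)
        requested = QLEN_MAX;
    requested -= requested % QLEN_ALIGN;
    if (requested < QLEN_ALIGN)
        requested = QLEN_ALIGN;
    return requested;
}
```

For example, a request for 8192 descriptors would be clamped to 8160, which is why the patch sets the maximum to 8192 - 32 rather than 8192.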
* Re: [PATCH v3 1/1] net/bonding: make bonding functions stable
2024-10-30 15:42 ` [PATCH v3 1/1] net/bonding: make bonding functions stable Lukas Sismis
2024-10-30 15:42 ` [PATCH v3 1/2] net/ixgbe: increase the maximum of RX/TX descriptors Lukas Sismis
2024-10-30 15:42 ` [PATCH v3 2/2] net/ice: " Lukas Sismis
@ 2024-10-31 2:24 ` lihuisong (C)
2024-11-06 2:14 ` Ferruh Yigit
2 siblings, 1 reply; 23+ messages in thread
From: lihuisong (C) @ 2024-10-31 2:24 UTC (permalink / raw)
To: Lukas Sismis, dev; +Cc: stephen, mb, anatoly.burakov, ian.stokes
Acked-by: Huisong Li <lihuisong@huawei.com>
On 2024/10/30 23:42, Lukas Sismis wrote:
> Remove rte_experimental macros from the stable functions
> as they have been part of the stable API since 23.11.
>
> Signed-off-by: Lukas Sismis <sismis@cesnet.cz>
> ---
> drivers/net/bonding/rte_eth_bond.h | 4 ----
> drivers/net/bonding/rte_eth_bond_8023ad.h | 1 -
> drivers/net/bonding/version.map | 15 +++++----------
> 3 files changed, 5 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/net/bonding/rte_eth_bond.h b/drivers/net/bonding/rte_eth_bond.h
> index e59ff8793e..4f79ff9b85 100644
> --- a/drivers/net/bonding/rte_eth_bond.h
> +++ b/drivers/net/bonding/rte_eth_bond.h
> @@ -125,7 +125,6 @@ rte_eth_bond_free(const char *name);
> * @return
> * 0 on success, negative value otherwise
> */
> -__rte_experimental
> int
> rte_eth_bond_member_add(uint16_t bonding_port_id, uint16_t member_port_id);
>
> @@ -138,7 +137,6 @@ rte_eth_bond_member_add(uint16_t bonding_port_id, uint16_t member_port_id);
> * @return
> * 0 on success, negative value otherwise
> */
> -__rte_experimental
> int
> rte_eth_bond_member_remove(uint16_t bonding_port_id, uint16_t member_port_id);
>
> @@ -199,7 +197,6 @@ rte_eth_bond_primary_get(uint16_t bonding_port_id);
> * Number of members associated with bonding device on success,
> * negative value otherwise
> */
> -__rte_experimental
> int
> rte_eth_bond_members_get(uint16_t bonding_port_id, uint16_t members[],
> uint16_t len);
> @@ -216,7 +213,6 @@ rte_eth_bond_members_get(uint16_t bonding_port_id, uint16_t members[],
> * Number of active members associated with bonding device on success,
> * negative value otherwise
> */
> -__rte_experimental
> int
> rte_eth_bond_active_members_get(uint16_t bonding_port_id, uint16_t members[],
> uint16_t len);
> diff --git a/drivers/net/bonding/rte_eth_bond_8023ad.h b/drivers/net/bonding/rte_eth_bond_8023ad.h
> index b2deb26e2e..5432eafcfe 100644
> --- a/drivers/net/bonding/rte_eth_bond_8023ad.h
> +++ b/drivers/net/bonding/rte_eth_bond_8023ad.h
> @@ -193,7 +193,6 @@ rte_eth_bond_8023ad_setup(uint16_t port_id,
> * -EINVAL if conf is NULL or member id is invalid (not a member of given
> * bonding device or is not inactive).
> */
> -__rte_experimental
> int
> rte_eth_bond_8023ad_member_info(uint16_t port_id, uint16_t member_id,
> struct rte_eth_bond_8023ad_member_info *conf);
> diff --git a/drivers/net/bonding/version.map b/drivers/net/bonding/version.map
> index a309469b1f..eb37dadf76 100644
> --- a/drivers/net/bonding/version.map
> +++ b/drivers/net/bonding/version.map
> @@ -11,12 +11,17 @@ DPDK_25 {
> rte_eth_bond_8023ad_ext_distrib;
> rte_eth_bond_8023ad_ext_distrib_get;
> rte_eth_bond_8023ad_ext_slowtx;
> + rte_eth_bond_8023ad_member_info;
> rte_eth_bond_8023ad_setup;
> + rte_eth_bond_active_members_get;
> rte_eth_bond_create;
> rte_eth_bond_free;
> rte_eth_bond_link_monitoring_set;
> rte_eth_bond_mac_address_reset;
> rte_eth_bond_mac_address_set;
> + rte_eth_bond_member_add;
> + rte_eth_bond_member_remove;
> + rte_eth_bond_members_get;
> rte_eth_bond_mode_get;
> rte_eth_bond_mode_set;
> rte_eth_bond_primary_get;
> @@ -26,13 +31,3 @@ DPDK_25 {
>
> local: *;
> };
> -
> -EXPERIMENTAL {
> - # added in 23.11
> - global:
> - rte_eth_bond_8023ad_member_info;
> - rte_eth_bond_active_members_get;
> - rte_eth_bond_member_add;
> - rte_eth_bond_member_remove;
> - rte_eth_bond_members_get;
> -};
* Re: [PATCH v3 1/2] net/ixgbe: increase the maximum of RX/TX descriptors
2024-10-30 16:26 ` Morten Brørup
@ 2024-11-01 11:16 ` Bruce Richardson
0 siblings, 0 replies; 23+ messages in thread
From: Bruce Richardson @ 2024-11-01 11:16 UTC (permalink / raw)
To: Morten Brørup
Cc: Lukas Sismis, dev, stephen, anatoly.burakov, ian.stokes
On Wed, Oct 30, 2024 at 05:26:12PM +0100, Morten Brørup wrote:
> > From: Lukas Sismis [mailto:sismis@cesnet.cz]
> > Sent: Wednesday, 30 October 2024 16.43
> >
> > Intel PMDs are capped by default to only 4096 RX/TX descriptors.
> > This can be limiting for applications requiring bigger buffer
> > capabilities. By buffering more packets with RX/TX
> > descriptors, applications can better handle processing
> > peaks.
> >
> > Setting ixgbe max descriptors to 8192 as per datasheet:
> > Register name: RDLEN
> > Description: Descriptor Ring Length.
> > This register sets the number of bytes
> > allocated for descriptors in the circular descriptor buffer.
> > It must be 128B aligned (7 LS bit must be set to zero).
> > ** Note: validated Lengths up to 128K (8K descriptors). **
FYI: Don't think we need the full quote from the datasheet, reducing this to
a one-line summary on apply.
> >
> > Signed-off-by: Lukas Sismis <sismis@cesnet.cz>
> > ---
>
> Drivers should reflect hardware capabilities; it's not up to the driver to impose artificial limits on applications. Thank you for fixing this, Lukas.
>
> Acked-by: Morten Brørup <mb@smartsharesystems.com>
>
Series-acked-by: Bruce Richardson <bruce.richardson@intel.com>
Both patches applied to dpdk-next-net-intel tree.
Thanks,
/Bruce
* RE: [PATCH] net: increase the maximum of RX/TX descriptors
2024-10-30 16:06 ` Stephen Hemminger
@ 2024-11-05 8:49 ` Morten Brørup
2024-11-05 15:55 ` Stephen Hemminger
0 siblings, 1 reply; 23+ messages in thread
From: Morten Brørup @ 2024-11-05 8:49 UTC (permalink / raw)
To: Stephen Hemminger, Lukáš Šišmiš
Cc: anatoly.burakov, ian.stokes, dev, bruce.richardson
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Wednesday, 30 October 2024 17.07
>
> On Wed, 30 Oct 2024 16:40:10 +0100
> Lukáš Šišmiš <sismis@cesnet.cz> wrote:
>
> > On 30. 10. 24 16:20, Stephen Hemminger wrote:
> > > On Wed, 30 Oct 2024 14:58:40 +0100
> > > Lukáš Šišmiš <sismis@cesnet.cz> wrote:
> > >
> > >> On 29. 10. 24 15:37, Morten Brørup wrote:
> > >>>> From: Lukas Sismis [mailto:sismis@cesnet.cz]
> > >>>> Sent: Tuesday, 29 October 2024 13.49
> > >>>>
> > >>>> Intel PMDs are capped by default to only 4096 RX/TX descriptors.
> > > >>>> This can be limiting for applications requiring bigger buffer
> > > >>>> capabilities. The cap prevented applications from configuring
> > > >>>> more descriptors. By buffering more packets with RX/TX
> > > >>>> descriptors, applications can better handle processing
> > > >>>> peaks.
> > >>>>
> > >>>> Signed-off-by: Lukas Sismis <sismis@cesnet.cz>
> > >>>> ---
> > >>> Seems like a good idea.
> > >>>
> > > >>> Has the max number of descriptors been checked against the datasheets for all the affected NIC chips?
> > >>>
> > >> I was hoping to get some feedback on this from the Intel folks.
> > >>
> > > >> But it seems like I can change it only for ixgbe (82599) to 32k
> > > >> (possibly to 64k - 8), others - ice (E810) and i40e (X710) are
> > > >> capped at 8k - 32.
> > > >>
> > > >> I have neither experience with the other drivers nor hardware
> > > >> available to test them, so I will leave them unchanged in the
> > > >> follow-up version of this patch.
> > >>
> > >> Lukas
> > >>
> > > > Having a large number of descriptors, especially at lower speeds, will
> > > > increase buffer bloat. For real-life applications, you do not want to
> > > > increase latency by more than 1 ms.
> > > >
> > > > 10 Gbps has 7.62 Gbps of effective bandwidth due to overhead.
> > > > Rate for 1500 MTU is 7.62 Gbps / (1500 * 8) = 635 K pps (i.e. about 1.6 us per packet).
> > > > A ring of 4096 descriptors can take about 6 ms for full-size packets.
> > > >
> > > > Be careful, optimizing for 64-byte benchmarks can be a disaster in the real world.
> > >
> > > Thanks for the info Stephen, however I am not trying to optimize for
> > > 64-byte benchmarks. The work was initiated by an IO problem with Intel
> > > NICs. A Suricata IDS worker (1 core per queue) receives a burst of packets
> > > and then processes them sequentially, one by one. It seems that
> > > 4k buffers are not enough. NVIDIA NICs allow e.g.
> > > 32k descriptors and it works fine. In the end it worked fine when ixgbe
> > > descriptors were increased as well. I am not sure why AF-Packet can
> > > handle this much better than DPDK; AFP doesn't have a crazy high number of
> > > descriptors configured (<= 4096), yet it works better. At the moment I
> > > assume there is internal buffering in the kernel which allows it to
> > > handle processing spikes.
> > >
> > > To give more context here is the forum discussion -
> > > https://forum.suricata.io/t/high-packet-drop-rate-with-dpdk-compared-to-af-packet-in-suricata-7-0-7/4896
> >
> >
> >
>
> I suspect AF_PACKET provides an intermediate step which can buffer more
> or spread out the work.
Agree. It's a Linux scheduling issue.
With DPDK polling, there is no interrupt in the kernel scheduler.
If the CPU core running the DPDK polling thread is running some other thread when the packets arrive on the hardware, the DPDK polling thread is NOT scheduled immediately, but has to wait for the kernel scheduler to switch to this thread instead of the other thread.
Quite a lot of time can pass before this happens - the kernel scheduler does not know that the DPDK polling thread has urgent work pending.
And the number of RX descriptors needs to be big enough to absorb all packets arriving during the scheduling delay.
It is not well described how to *guarantee* that nothing but the DPDK polling thread runs on a dedicated CPU core.
With AF_PACKET, the hardware generates an interrupt, and the kernel immediately calls the driver's interrupt handler - regardless of what the CPU core is currently doing.
The driver's interrupt handler acknowledges the interrupt to the hardware and informs the kernel that the softirq handler is pending.
AFAIU, the kernel executes pending softirq handlers immediately after returning from an interrupt handler - regardless of what the CPU core was doing when the interrupt occurred.
The softirq handler then dequeues the packets from the hardware RX descriptors into SKBs, and when all of them have been dequeued from the hardware, enables interrupts. Then the CPU core resumes the work it was doing when interrupted.
* Re: [PATCH] net: increase the maximum of RX/TX descriptors
2024-11-05 8:49 ` Morten Brørup
@ 2024-11-05 15:55 ` Stephen Hemminger
2024-11-05 16:50 ` Morten Brørup
0 siblings, 1 reply; 23+ messages in thread
From: Stephen Hemminger @ 2024-11-05 15:55 UTC (permalink / raw)
To: Morten Brørup
Cc: Lukáš Šišmiš,
anatoly.burakov, ian.stokes, dev, bruce.richardson
On Tue, 5 Nov 2024 09:49:39 +0100
Morten Brørup <mb@smartsharesystems.com> wrote:
> >
> > I suspect AF_PACKET provides an intermediate step which can buffer more
> > or spread out the work.
>
> Agree. It's a Linux scheduling issue.
>
> With DPDK polling, there is no interrupt in the kernel scheduler.
> If the CPU core running the DPDK polling thread is running some other thread when the packets arrive on the hardware, the DPDK polling thread is NOT scheduled immediately, but has to wait for the kernel scheduler to switch to this thread instead of the other thread.
> Quite a lot of time can pass before this happens - the kernel scheduler does not know that the DPDK polling thread has urgent work pending.
> And the number of RX descriptors needs to be big enough to absorb all packets arriving during the scheduling delay.
> It is not well described how to *guarantee* that nothing but the DPDK polling thread runs on a dedicated CPU core.
That's why any non-trivial DPDK application needs to run on isolated CPUs.
* RE: [PATCH] net: increase the maximum of RX/TX descriptors
2024-11-05 15:55 ` Stephen Hemminger
@ 2024-11-05 16:50 ` Morten Brørup
2024-11-05 21:20 ` Lukáš Šišmiš
0 siblings, 1 reply; 23+ messages in thread
From: Morten Brørup @ 2024-11-05 16:50 UTC (permalink / raw)
To: Stephen Hemminger
Cc: Lukáš Šišmiš,
anatoly.burakov, ian.stokes, dev, bruce.richardson
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Tuesday, 5 November 2024 16.55
>
> On Tue, 5 Nov 2024 09:49:39 +0100
> Morten Brørup <mb@smartsharesystems.com> wrote:
>
> > >
> > > > I suspect AF_PACKET provides an intermediate step which can buffer more
> > > > or spread out the work.
> >
> > Agree. It's a Linux scheduling issue.
> >
> > With DPDK polling, there is no interrupt in the kernel scheduler.
> > > If the CPU core running the DPDK polling thread is running some other thread when the packets arrive on the hardware, the DPDK polling thread is NOT scheduled immediately, but has to wait for the kernel scheduler to switch to this thread instead of the other thread.
> > > Quite a lot of time can pass before this happens - the kernel scheduler does not know that the DPDK polling thread has urgent work pending.
> > > And the number of RX descriptors needs to be big enough to absorb all packets arriving during the scheduling delay.
> > > It is not well described how to *guarantee* that nothing but the DPDK polling thread runs on a dedicated CPU core.
>
> That's why any non-trivial DPDK application needs to run on isolated CPUs.
Exactly.
And it is non-trivial and not well described how to do this.
Especially in virtual environments.
E.g. I ran some scheduling latency tests earlier today, and frequently observed 500-1000 us scheduling latency under vmware vSphere ESXi. This requires a large number of RX descriptors to absorb without packet loss. (Disclaimer: The virtual machine configuration had not been optimized. Tweaking the knobs offered by the hypervisor might improve this.)
The exact same firmware (same kernel, rootfs, libraries, applications etc.) running directly on our purpose-built hardware has scheduling latency very close to the kernel's default "timerslack" (50 us).
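
To put the scheduling-latency figures above into descriptor counts, here is a minimal sketch. The helper name is hypothetical, and it assumes packets arrive at a constant rate while the polling thread is stalled.

```c
#include <stdint.h>

/*
 * Minimum number of RX descriptors needed so that packets arriving at
 * `pps` during a stall of `delay_us` microseconds all find a free slot.
 * Illustrative helper, not a DPDK function.
 */
static uint64_t rx_desc_needed(double pps, double delay_us)
{
    double pkts = pps * delay_us / 1e6;
    uint64_t n = (uint64_t)pkts;

    if ((double)n < pkts)
        n++;    /* round up: a partial packet still needs a descriptor */
    return n;
}
```

With the 635 K pps figure for full-size packets quoted earlier in the thread, a 1000 us stall needs about 635 descriptors. At 64-byte line rate on 10GbE (about 14.88 Mpps, a figure assumed here rather than taken from the thread), the same stall needs roughly 14881 descriptors, well above the previous 4096 cap.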
* Re: [PATCH] net: increase the maximum of RX/TX descriptors
2024-11-05 16:50 ` Morten Brørup
@ 2024-11-05 21:20 ` Lukáš Šišmiš
0 siblings, 0 replies; 23+ messages in thread
From: Lukáš Šišmiš @ 2024-11-05 21:20 UTC (permalink / raw)
To: Morten Brørup, Stephen Hemminger
Cc: anatoly.burakov, ian.stokes, dev, bruce.richardson
On 05. 11. 24 17:50, Morten Brørup wrote:
>> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
>> Sent: Tuesday, 5 November 2024 16.55
>>
>> On Tue, 5 Nov 2024 09:49:39 +0100
>> Morten Brørup <mb@smartsharesystems.com> wrote:
>>
>>>> I suspect AF_PACKET provides an intermediate step which can buffer more
>>>> or spread out the work.
>>> Agree. It's a Linux scheduling issue.
>>>
>>> With DPDK polling, there is no interrupt in the kernel scheduler.
>>> If the CPU core running the DPDK polling thread is running some other thread when the packets arrive on the hardware, the DPDK polling thread is NOT scheduled immediately, but has to wait for the kernel scheduler to switch to this thread instead of the other thread.
>>> Quite a lot of time can pass before this happens - the kernel scheduler does not know that the DPDK polling thread has urgent work pending.
>>> And the number of RX descriptors needs to be big enough to absorb all packets arriving during the scheduling delay.
>>> It is not well described how to *guarantee* that nothing but the DPDK polling thread runs on a dedicated CPU core.
>>
>> That's why any non-trivial DPDK application needs to run on isolated CPUs.
> Exactly.
> And it is non-trivial and not well described how to do this.
>
> Especially in virtual environments.
> E.g. I ran some scheduling latency tests earlier today, and frequently observed 500-1000 us scheduling latency under vmware vSphere ESXi. This requires a large number of RX descriptors to absorb without packet loss. (Disclaimer: The virtual machine configuration had not been optimized. Tweaking the knobs offered by the hypervisor might improve this.)
>
> The exact same firmware (same kernel, rootfs, libraries, applications etc.) running directly on our purpose-built hardware has scheduling latency very close to the kernel's default "timerslack" (50 us).
>
Thanks for the feedback. I am currently not 100% sure whether I ran my
earlier experiments on isolcpus, or whether it had a massive impact.
Here is a decent guide on latency tuning I found the other day,
though virtual environments are not exactly covered.
https://rigtorp.se/low-latency-guide/
* Re: [PATCH v3 1/1] net/bonding: make bonding functions stable
2024-10-31 2:24 ` [PATCH v3 1/1] net/bonding: make bonding functions stable lihuisong (C)
@ 2024-11-06 2:14 ` Ferruh Yigit
0 siblings, 0 replies; 23+ messages in thread
From: Ferruh Yigit @ 2024-11-06 2:14 UTC (permalink / raw)
To: lihuisong (C), Lukas Sismis, dev; +Cc: stephen, mb, anatoly.burakov, ian.stokes
On 10/31/2024 2:24 AM, lihuisong (C) wrote:
> On 2024/10/30 23:42, Lukas Sismis wrote:
>> Remove rte_experimental macros from the stable functions
>> as they have been part of the stable API since 23.11.
>>
>> Signed-off-by: Lukas Sismis <sismis@cesnet.cz>
>
> Acked-by: Huisong Li <lihuisong@huawei.com>
>
Carrying from other thread:
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Applied to dpdk-next-net/main, thanks.
* Re: [PATCH v3 1/1] net/bonding: make bonding functions stable
2024-10-29 20:44 Lukas Sismis
@ 2024-10-29 22:22 ` Stephen Hemminger
0 siblings, 0 replies; 23+ messages in thread
From: Stephen Hemminger @ 2024-10-29 22:22 UTC (permalink / raw)
To: Lukas Sismis; +Cc: chas3, dev
On Tue, 29 Oct 2024 21:44:16 +0100
Lukas Sismis <sismis@cesnet.cz> wrote:
> Remove rte_experimental macros from the stable functions
> as they have been part of the stable API since 23.11.
>
> Signed-off-by: Lukas Sismis <sismis@cesnet.cz>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
* [PATCH v3 1/1] net/bonding: make bonding functions stable
@ 2024-10-29 20:44 Lukas Sismis
2024-10-29 22:22 ` Stephen Hemminger
0 siblings, 1 reply; 23+ messages in thread
From: Lukas Sismis @ 2024-10-29 20:44 UTC (permalink / raw)
To: chas3; +Cc: dev, Lukas Sismis
Remove rte_experimental macros from the stable functions
as they have been part of the stable API since 23.11.
Signed-off-by: Lukas Sismis <sismis@cesnet.cz>
---
drivers/net/bonding/rte_eth_bond.h | 4 ----
drivers/net/bonding/rte_eth_bond_8023ad.h | 1 -
drivers/net/bonding/version.map | 15 +++++----------
3 files changed, 5 insertions(+), 15 deletions(-)
diff --git a/drivers/net/bonding/rte_eth_bond.h b/drivers/net/bonding/rte_eth_bond.h
index e59ff8793e..4f79ff9b85 100644
--- a/drivers/net/bonding/rte_eth_bond.h
+++ b/drivers/net/bonding/rte_eth_bond.h
@@ -125,7 +125,6 @@ rte_eth_bond_free(const char *name);
* @return
* 0 on success, negative value otherwise
*/
-__rte_experimental
int
rte_eth_bond_member_add(uint16_t bonding_port_id, uint16_t member_port_id);
@@ -138,7 +137,6 @@ rte_eth_bond_member_add(uint16_t bonding_port_id, uint16_t member_port_id);
* @return
* 0 on success, negative value otherwise
*/
-__rte_experimental
int
rte_eth_bond_member_remove(uint16_t bonding_port_id, uint16_t member_port_id);
@@ -199,7 +197,6 @@ rte_eth_bond_primary_get(uint16_t bonding_port_id);
* Number of members associated with bonding device on success,
* negative value otherwise
*/
-__rte_experimental
int
rte_eth_bond_members_get(uint16_t bonding_port_id, uint16_t members[],
uint16_t len);
@@ -216,7 +213,6 @@ rte_eth_bond_members_get(uint16_t bonding_port_id, uint16_t members[],
* Number of active members associated with bonding device on success,
* negative value otherwise
*/
-__rte_experimental
int
rte_eth_bond_active_members_get(uint16_t bonding_port_id, uint16_t members[],
uint16_t len);
diff --git a/drivers/net/bonding/rte_eth_bond_8023ad.h b/drivers/net/bonding/rte_eth_bond_8023ad.h
index b2deb26e2e..5432eafcfe 100644
--- a/drivers/net/bonding/rte_eth_bond_8023ad.h
+++ b/drivers/net/bonding/rte_eth_bond_8023ad.h
@@ -193,7 +193,6 @@ rte_eth_bond_8023ad_setup(uint16_t port_id,
* -EINVAL if conf is NULL or member id is invalid (not a member of given
* bonding device or is not inactive).
*/
-__rte_experimental
int
rte_eth_bond_8023ad_member_info(uint16_t port_id, uint16_t member_id,
struct rte_eth_bond_8023ad_member_info *conf);
diff --git a/drivers/net/bonding/version.map b/drivers/net/bonding/version.map
index a309469b1f..eb37dadf76 100644
--- a/drivers/net/bonding/version.map
+++ b/drivers/net/bonding/version.map
@@ -11,12 +11,17 @@ DPDK_25 {
rte_eth_bond_8023ad_ext_distrib;
rte_eth_bond_8023ad_ext_distrib_get;
rte_eth_bond_8023ad_ext_slowtx;
+ rte_eth_bond_8023ad_member_info;
rte_eth_bond_8023ad_setup;
+ rte_eth_bond_active_members_get;
rte_eth_bond_create;
rte_eth_bond_free;
rte_eth_bond_link_monitoring_set;
rte_eth_bond_mac_address_reset;
rte_eth_bond_mac_address_set;
+ rte_eth_bond_member_add;
+ rte_eth_bond_member_remove;
+ rte_eth_bond_members_get;
rte_eth_bond_mode_get;
rte_eth_bond_mode_set;
rte_eth_bond_primary_get;
@@ -26,13 +31,3 @@ DPDK_25 {
local: *;
};
-
-EXPERIMENTAL {
- # added in 23.11
- global:
- rte_eth_bond_8023ad_member_info;
- rte_eth_bond_active_members_get;
- rte_eth_bond_member_add;
- rte_eth_bond_member_remove;
- rte_eth_bond_members_get;
-};
--
2.34.1
Thread overview: 23+ messages
2024-10-29 12:48 [PATCH] net: increase the maximum of RX/TX descriptors Lukas Sismis
2024-10-29 14:37 ` Morten Brørup
2024-10-30 13:58 ` Lukáš Šišmiš
2024-10-30 15:20 ` Stephen Hemminger
2024-10-30 15:40 ` Lukáš Šišmiš
2024-10-30 15:58 ` Bruce Richardson
2024-10-30 16:06 ` Stephen Hemminger
2024-11-05 8:49 ` Morten Brørup
2024-11-05 15:55 ` Stephen Hemminger
2024-11-05 16:50 ` Morten Brørup
2024-11-05 21:20 ` Lukáš Šišmiš
2024-10-30 15:06 ` [PATCH v2 1/2] net/ixgbe: " Lukas Sismis
2024-10-30 15:06 ` [PATCH v2 2/2] net/ice: " Lukas Sismis
2024-10-30 15:42 ` [PATCH v3 1/1] net/bonding: make bonding functions stable Lukas Sismis
2024-10-30 15:42 ` [PATCH v3 1/2] net/ixgbe: increase the maximum of RX/TX descriptors Lukas Sismis
2024-10-30 16:26 ` Morten Brørup
2024-11-01 11:16 ` Bruce Richardson
2024-10-30 15:42 ` [PATCH v3 2/2] net/ice: " Lukas Sismis
2024-10-30 16:26 ` Morten Brørup
2024-10-31 2:24 ` [PATCH v3 1/1] net/bonding: make bonding functions stable lihuisong (C)
2024-11-06 2:14 ` Ferruh Yigit
2024-10-29 20:44 Lukas Sismis
2024-10-29 22:22 ` Stephen Hemminger