From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124])
by inbox.dpdk.org (Postfix) with ESMTP id A04C845BB1;
Thu, 24 Oct 2024 13:22:35 +0200 (CEST)
Received: from mails.dpdk.org (localhost [127.0.0.1])
by mails.dpdk.org (Postfix) with ESMTP id 90DAA40281;
Thu, 24 Oct 2024 13:22:35 +0200 (CEST)
Received: from inbox.dpdk.org (inbox.dpdk.org [95.142.172.178])
by mails.dpdk.org (Postfix) with ESMTP id 7010F4003C
for ; Thu, 24 Oct 2024 13:22:34 +0200 (CEST)
Received: by inbox.dpdk.org (Postfix, from userid 33)
id 5F15445BC2; Thu, 24 Oct 2024 13:22:34 +0200 (CEST)
From: bugzilla@dpdk.org
To: dev@dpdk.org
Subject: [DPDK/ethdev Bug 1570] Bonding mode 4 DMA errors
Date: Thu, 24 Oct 2024 11:22:34 +0000
X-Bugzilla-Reason: AssignedTo
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: DPDK
X-Bugzilla-Component: ethdev
X-Bugzilla-Version: 22.11
X-Bugzilla-Keywords:
X-Bugzilla-Severity: major
X-Bugzilla-Who: elmedin.zildzic@ericsson.com
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution:
X-Bugzilla-Priority: Normal
X-Bugzilla-Assigned-To: dev@dpdk.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: bug_id short_desc product version rep_platform
op_sys bug_status bug_severity priority component assigned_to reporter
target_milestone attachments.created
Message-ID:
Content-Type: multipart/alternative; boundary=17297689540.7bdF7.713302
Content-Transfer-Encoding: 7bit
X-Bugzilla-URL: http://bugs.dpdk.org/
Auto-Submitted: auto-generated
X-Auto-Response-Suppress: All
MIME-Version: 1.0
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: DPDK patches and discussions
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: dev-bounces@dpdk.org
--17297689540.7bdF7.713302
Date: Thu, 24 Oct 2024 13:22:34 +0200
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://bugs.dpdk.org/
Auto-Submitted: auto-generated
X-Auto-Response-Suppress: All
https://bugs.dpdk.org/show_bug.cgi?id=1570
Bug ID: 1570
Summary: Bonding mode 4 DMA errors
Product: DPDK
Version: 22.11
Hardware: x86
OS: Linux
Status: UNCONFIRMED
Severity: major
Priority: Normal
Component: ethdev
Assignee: dev@dpdk.org
Reporter: elmedin.zildzic@ericsson.com
Target Milestone: ---
Created attachment 293
--> https://bugs.dpdk.org/attachment.cgi?id=293&action=edit
dma-error-backtrace
When trying to configure bonding mode 4 using members with the iavf driver
(for Intel 700 series NICs) we see these DMA errors:
"EAL: Cannot set up DMA remapping, error 12 (Cannot allocate memory)"
When this happens we also see TX errors on the devices, so I tried dumping
DMA vaddrs and enabling TX descriptor dumps for iavf and saw the following:
DMA errors occurring at:
iova=0x2351200000, len=2097152
iova=0x2351400000, len=2097152
iova=0x2351600000, len=2097152
iova=0x2351800000, len=2097152
iova=0x2351a00000, len=2097152
iova=0x2351c00000, len=2097152
TX descriptor dumps:
Queue 0 Tx_data_desc 0: QW0: 0x000000235137f8c0 QW1: 0x000001f000000040
Queue 0 Tx_data_desc 0: QW0: 0x000000235137f8c0 QW1: 0x000001f000000050
Queue 0 Tx_data_desc 1: QW0: 0x000000235137fb00 QW1: 0x000001f000000040
Queue 0 Tx_data_desc 1: QW0: 0x000000235137fb00 QW1: 0x000001f000000050
Queue 0 Tx_data_desc 0: QW0: 0x000000235197f8c0 QW1: 0x000001f000000040
Queue 0 Tx_data_desc 0: QW0: 0x000000235197f8c0 QW1: 0x000001f000000050
Queue 0 Tx_data_desc 2: QW0: 0x000000235137fd40 QW1: 0x000001f000000040
Queue 0 Tx_data_desc 2: QW0: 0x000000235137fd40 QW1: 0x000001f000000050
Queue 0 Tx_data_desc 1: QW0: 0x000000235197fb00 QW1: 0x000001f000000040
Queue 0 Tx_data_desc 1: QW0: 0x000000235197fb00 QW1: 0x000001f000000050
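To make the QW1 values in the dump easier to read, the sketch below splits
QW1 into its fields. The field layout (DTYPE at bit 0, CMD at bit 4, OFFSET
at bit 16, buffer size at bit 34) is an assumption based on the i40e/iavf
data descriptor format and was not stated in this report; the struct and
function names are illustrative only.

```c
#include <stdint.h>

/* Assumed QW1 field layout of an iavf TX data descriptor (not from the report). */
#define QW1_DTYPE_SHIFT   0
#define QW1_DTYPE_MASK    0xFULL
#define QW1_CMD_SHIFT     4
#define QW1_CMD_MASK      0x3FFULL
#define QW1_OFFSET_SHIFT  16
#define QW1_OFFSET_MASK   0x3FFFFULL
#define QW1_BUF_SZ_SHIFT  34
#define QW1_BUF_SZ_MASK   0x3FFFULL

/* Decoded view of QW1; hypothetical helper type for this example. */
struct tx_qw1 {
    unsigned int dtype;    /* descriptor type */
    unsigned int cmd;      /* command bits */
    unsigned int offset;   /* header offset field */
    unsigned int buf_size; /* buffer size in bytes */
};

/* Extract each field by shifting it down and masking to its width. */
struct tx_qw1 decode_qw1(uint64_t qw1)
{
    struct tx_qw1 d;

    d.dtype    = (unsigned int)((qw1 >> QW1_DTYPE_SHIFT) & QW1_DTYPE_MASK);
    d.cmd      = (unsigned int)((qw1 >> QW1_CMD_SHIFT) & QW1_CMD_MASK);
    d.offset   = (unsigned int)((qw1 >> QW1_OFFSET_SHIFT) & QW1_OFFSET_MASK);
    d.buf_size = (unsigned int)((qw1 >> QW1_BUF_SZ_SHIFT) & QW1_BUF_SZ_MASK);
    return d;
}
```

Under that assumed layout, both QW1 values in the dump decode to a 124-byte
buffer with offset 0, and they differ only in the command bits (0x4 versus
0x5). A 124-byte frame would be consistent with a LACPDU without FCS, though
the report does not confirm what the frames were.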
So DMA errors are probably the root cause for the TX errors. I tried figuring
out why DMA errors occur, so I added an abort on DMA error to generate a
coredump. I've attached the backtrace of the interesting threads.
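The link between the two dumps can be checked mechanically: treat QW0 of
each TX descriptor as the buffer address and test whether it lies inside one
of the IOVA ranges that failed to map (each 2097152 bytes, i.e. 2 MiB, long).
The following is a minimal sketch using only the values from this report;
the type and function names are made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* One IOVA range for which DMA remapping failed. */
struct iova_range {
    uint64_t iova;
    uint64_t len;
};

/* Failed ranges exactly as listed in the report. */
static const struct iova_range failed_ranges[] = {
    { 0x2351200000ULL, 2097152 },
    { 0x2351400000ULL, 2097152 },
    { 0x2351600000ULL, 2097152 },
    { 0x2351800000ULL, 2097152 },
    { 0x2351a00000ULL, 2097152 },
    { 0x2351c00000ULL, 2097152 },
};
#define N_FAILED (sizeof(failed_ranges) / sizeof(failed_ranges[0]))

/* Distinct QW0 values from the TX descriptor dump. */
static const uint64_t tx_buf_addrs[] = {
    0x235137f8c0ULL, 0x235137fb00ULL, 0x235137fd40ULL,
    0x235197f8c0ULL, 0x235197fb00ULL,
};
#define N_TX_BUFS (sizeof(tx_buf_addrs) / sizeof(tx_buf_addrs[0]))

/* Return the index of the failed range containing addr, or -1 if none. */
int find_failed_range(uint64_t addr)
{
    for (size_t i = 0; i < N_FAILED; i++) {
        const struct iova_range *r = &failed_ranges[i];

        if (addr >= r->iova && addr - r->iova < r->len)
            return (int)i;
    }
    return -1;
}

/* Count how many dumped TX buffers point into a failed range. */
size_t count_tx_buffers_in_failed_ranges(void)
{
    size_t hits = 0;

    for (size_t i = 0; i < N_TX_BUFS; i++)
        if (find_failed_range(tx_buf_addrs[i]) >= 0)
            hits++;
    return hits;
}
```

With the values above, all five distinct buffer addresses fall inside a
failed range: the 0x235137.... buffers in the range starting at
0x2351200000 and the 0x235197.... buffers in the range starting at
0x2351800000.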
Looking at the backtrace, it looks like the LSC callback is called at the
same time as we're starting the iavf member devices, and this seems to cause
the DMA errors. I say that because the DMA errors disappeared once I
synchronized the threads. So far we have two workarounds for this problem:
1. Synchronize threads with locks
2. Pre-allocate more memory, so the heap never needs to expand and no DMA
remapping is done.
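As an illustration of workaround 1, the sketch below serializes the two
paths with a single mutex so they can never run at the same time. It uses
plain pthreads and stub work; `bond_cfg_lock`, `start_member_ports`,
`lsc_callback_thread` and `guarded_work` are hypothetical placeholders, not
DPDK APIs and not the code actually used for the workaround.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical lock shared by the port-start path and the LSC callback. */
static pthread_mutex_t bond_cfg_lock = PTHREAD_MUTEX_INITIALIZER;

static atomic_int threads_inside; /* threads currently in the guarded region */
static atomic_int overlap_seen;   /* becomes 1 if two threads ever overlap */

/* Stand-in for work that may grow the heap and trigger DMA remapping. */
static void guarded_work(void)
{
    if (atomic_fetch_add(&threads_inside, 1) != 0)
        atomic_store(&overlap_seen, 1);
    for (volatile int spin = 0; spin < 100; spin++)
        ;
    atomic_fetch_sub(&threads_inside, 1);
}

/* Placeholder for the thread that starts the iavf member devices. */
static void *start_member_ports(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000; i++) {
        pthread_mutex_lock(&bond_cfg_lock);
        guarded_work();
        pthread_mutex_unlock(&bond_cfg_lock);
    }
    return NULL;
}

/* Placeholder for the thread that delivers link status change callbacks. */
static void *lsc_callback_thread(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000; i++) {
        pthread_mutex_lock(&bond_cfg_lock);
        guarded_work();
        pthread_mutex_unlock(&bond_cfg_lock);
    }
    return NULL;
}

/* Run both paths concurrently: 0 = never overlapped, 1 = overlapped, -1 = error. */
int run_serialized(void)
{
    pthread_t starter, lsc;

    atomic_store(&threads_inside, 0);
    atomic_store(&overlap_seen, 0);
    if (pthread_create(&starter, NULL, start_member_ports, NULL) != 0)
        return -1;
    if (pthread_create(&lsc, NULL, lsc_callback_thread, NULL) != 0) {
        pthread_join(starter, NULL);
        return -1;
    }
    pthread_join(starter, NULL);
    pthread_join(lsc, NULL);
    return atomic_load(&overlap_seen);
}
```

Because both threads take the same lock around the guarded region,
`run_serialized()` returns 0. This only shows the serialization pattern; it
does not explain why the unsynchronized case fails.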
Maybe someone can explain why these DMA errors occur when the threads are
not synchronized? What would be the proper fix for this?
-- 
You are receiving this mail because:
You are the assignee for the bug.