From: bugzilla@dpdk.org
To: dev@dpdk.org
Subject: [DPDK/ethdev Bug 1570] Bonding mode 4 DMA errors
Date: Thu, 24 Oct 2024 11:22:34 +0000

https://bugs.dpdk.org/show_bug.cgi?id=1570

Bug ID: 1570
Summary: Bonding mode 4 DMA errors
Product: DPDK
Version: 22.11
Hardware: x86
OS: Linux
Status: UNCONFIRMED
Severity: major
Priority: Normal
Component: ethdev
Assignee: dev@dpdk.org
Reporter: elmedin.zildzic@ericsson.com
Target Milestone: ---

Created attachment 293 --> https://bugs.dpdk.org/attachment.cgi?id=293&action=edit
dma-error-backtrace

When trying to configure bonding mode 4 using members with the iavf driver (for
Intel 700 series NICs), we see these DMA errors:

"EAL: Cannot set up DMA remapping, error 12 (Cannot allocate memory)"

When this happens we also see TX errors on the devices, so I tried dumping DMA
vaddrs and enabling TX descriptor dumps for iavf and saw the following:

DMA errors occurring at:
iova=0x2351200000, len=2097152
iova=0x2351400000, len=2097152
iova=0x2351600000, len=2097152
iova=0x2351800000, len=2097152
iova=0x2351a00000, len=2097152
iova=0x2351c00000, len=2097152

TX descriptor dumps:
Queue 0 Tx_data_desc 0: QW0: 0x000000235137f8c0 QW1: 0x000001f000000040
Queue 0 Tx_data_desc 0: QW0: 0x000000235137f8c0 QW1: 0x000001f000000050
Queue 0 Tx_data_desc 1: QW0: 0x000000235137fb00 QW1: 0x000001f000000040
Queue 0 Tx_data_desc 1: QW0: 0x000000235137fb00 QW1: 0x000001f000000050
Queue 0 Tx_data_desc 0: QW0: 0x000000235197f8c0 QW1: 0x000001f000000040
Queue 0 Tx_data_desc 0: QW0: 0x000000235197f8c0 QW1: 0x000001f000000050
Queue 0 Tx_data_desc 2: QW0: 0x000000235137fd40 QW1: 0x000001f000000040
Queue 0 Tx_data_desc 2: QW0: 0x000000235137fd40 QW1: 0x000001f000000050
Queue 0 Tx_data_desc 1: QW0: 0x000000235197fb00 QW1: 0x000001f000000040
Queue 0 Tx_data_desc 1: QW0: 0x000000235197fb00 QW1: 0x000001f000000050
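
For readers unfamiliar with the descriptor format, the QW1 values above can be unpacked with a short script. This is only a sketch: it assumes the iavf TX data descriptor uses the QW1 layout from the i40e/iavf base code (DTYPE at bit 0, CMD at bit 4, buffer size at bit 34) and the command bits EOP=0x1, RS=0x2, ICRC=0x4. Those shifts and flag values come from that assumption and are not stated in this report. The helper name decode_qw1 is hypothetical.

```python
# Assumed QW1 layout (i40e/iavf base code); not confirmed by this report.
DTYPE_SHIFT, DTYPE_MASK = 0, 0xF
CMD_SHIFT, CMD_MASK = 4, 0x3FF
BUF_SZ_SHIFT, BUF_SZ_MASK = 34, 0x3FFF

# Assumed command flag bits.
CMD_FLAGS = {0x1: "EOP", 0x2: "RS", 0x4: "ICRC"}


def decode_qw1(qw1):
    """Split a TX data descriptor QW1 into descriptor type, flags and size."""
    cmd = (qw1 >> CMD_SHIFT) & CMD_MASK
    return {
        "dtype": (qw1 >> DTYPE_SHIFT) & DTYPE_MASK,
        "cmd": [name for bit, name in sorted(CMD_FLAGS.items()) if cmd & bit],
        "buf_size": (qw1 >> BUF_SZ_SHIFT) & BUF_SZ_MASK,
    }


if __name__ == "__main__":
    # The two distinct QW1 values seen in the dump above.
    for qw1 in (0x000001F000000040, 0x000001F000000050):
        print(hex(qw1), decode_qw1(qw1))
```

Under these assumptions both values describe a 124-byte buffer, and the second value differs from the first only in the EOP bit. A 124-byte frame would be consistent with LACPDUs sent by the mode 4 bonding driver, but that reading is an inference, not something the dump itself shows.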

So DMA errors are probably the root cause of the TX errors. I tried figuring
out why the DMA errors occur, so I added an abort on DMA error to generate a
coredump. I've attached the backtrace of the interesting threads.
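
The link between the two dumps can be checked mechanically. The sketch below assumes that QW0 of each TX data descriptor holds the buffer's DMA address in the same address space as the iova values in the error list. It then tests whether each buffer lies inside one of the 2 MiB regions whose mapping failed. The function name find_failed_region is hypothetical, and all addresses are copied from the dumps in this report.

```python
REGION_LEN = 2097152  # len reported for every failed mapping (2 MiB)

# Start addresses of the regions where DMA remapping failed.
FAILED_IOVAS = [
    0x2351200000, 0x2351400000, 0x2351600000,
    0x2351800000, 0x2351A00000, 0x2351C00000,
]

# Distinct QW0 values (buffer addresses) from the TX descriptor dump.
TX_BUFFERS = [
    0x235137F8C0, 0x235137FB00, 0x235137FD40,
    0x235197F8C0, 0x235197FB00,
]


def find_failed_region(addr):
    """Return the base of the failed region containing addr, or None."""
    for base in FAILED_IOVAS:
        if base <= addr < base + REGION_LEN:
            return base
    return None


if __name__ == "__main__":
    for addr in TX_BUFFERS:
        base = find_failed_region(addr)
        where = hex(base) if base is not None else "not in a failed region"
        print(f"buffer {addr:#x} -> {where}")
```

With these inputs, every dumped buffer address falls inside either the region starting at 0x2351200000 or the one starting at 0x2351800000. That matches the reading that the device is transmitting from memory that was never mapped for DMA.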

Looking at the backtrace, it looks like the LSC (link status change) callback
is called at the same time as we're starting the iavf member devices, and this
seems to cause the DMA errors. The reason I say that is that when I tried
synchronizing the threads, the DMA errors disappeared. So far we have two
workarounds for this problem:
  1. Synchronize the threads with locks.
  2. Pre-allocate more memory, so there is no need to expand the heap and do
     DMA remapping.

Maybe someone can explain why these DMA errors occur when the threads are not
synchronized? What would be the proper fix for this?
          

