From: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
To: Olivier MATZ <olivier.matz@6wind.com>,
"Zhang, Helin" <helin.zhang@intel.com>,
Martin Weiser <martin.weiser@allegro-packets.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] Issue with non-scattered rx in ixgbe and i40e when mbuf private area size is odd
Date: Thu, 30 Jul 2015 09:43:18 +0000 [thread overview]
Message-ID: <2601191342CEEE43887BDE71AB97725836A6A0CE@irsmsx105.ger.corp.intel.com> (raw)
In-Reply-To: <55B9E9F5.8080102@6wind.com>
> -----Original Message-----
> From: Olivier MATZ [mailto:olivier.matz@6wind.com]
> Sent: Thursday, July 30, 2015 10:10 AM
> To: Ananyev, Konstantin; Zhang, Helin; Martin Weiser
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] Issue with non-scattered rx in ixgbe and i40e when mbuf private area size is odd
>
> Hi Konstantin,
>
> On 07/30/2015 11:00 AM, Ananyev, Konstantin wrote:
> > Hi Olivier,
> >
> >> -----Original Message-----
> >> From: Olivier MATZ [mailto:olivier.matz@6wind.com]
> >> Sent: Thursday, July 30, 2015 9:12 AM
> >> To: Zhang, Helin; Ananyev, Konstantin; Martin Weiser
> >> Cc: dev@dpdk.org
> >> Subject: Re: [dpdk-dev] Issue with non-scattered rx in ixgbe and i40e when mbuf private area size is odd
> >>
> >> Hi,
> >>
> >> On 07/29/2015 10:24 PM, Zhang, Helin wrote:
> >>> Hi Martin
> >>>
> >>> Thank you very much for the good catch!
> >>>
> >>> The similar situation in i40e, as explained by Konstantin.
> >>> As header split hasn't been supported by DPDK till now. It would be better to put the header address in RX descriptor to 0.
> >>> But in the future, during header split enabling. We may need to pay extra attention to that. As at least x710 datasheet said
> >> specifically as below.
> >>> "The header address should be set by the software to an even number (word aligned address)". We may need to find a way to
> >> ensure that during mempool/mbuf allocation.
> >>
> >> Indeed it would be good to force the priv_size to be aligned.
> >>
> >> The priv_size could be aligned automatically in
> >> rte_pktmbuf_pool_create(). The only possible problem I could see
> >> is that it would break applications that access to the data buffer
> >> by doing (sizeof(mbuf) + sizeof(priv)), which is probably not the
> >> best thing to do (I didn't find any applications like this in dpdk).
> >
> >
> > Might be just make rte_pktmbuf_pool_create() fail if input priv_size % MIN_ALIGN != 0?
>
> Hmm maybe it would break more applications: an odd priv_size is
> probably rare, but a priv_size that is not aligned to 8 bytes is
> maybe more common.
My thought was that rte_mempool_create() was just introduced in 2.1,
so if we add extra requirement for the input parameter now -
there would be no ABI breakage, and not many people started to use it already.
For me just seems a bit easier and more straightforward then silent alignment -
user would not have wrong assumptions here.
Though if you think that a silent alignment would be more convenient
for most users - I wouldn't insist.
Konstantin
> It's maybe safer to align the size transparently?
>
>
> Regards,
> Olivier
>
>
>
> >
> >>
> >> For applications that directly use rte_mempool_create() instead of
> >> rte_pktmbuf_pool_create(), we could add a check using an assert in
> >> rte_pktmbuf_init() and some words in the documentation.
> >>
> >> The question is: what should be the proper alignment? I would say
> >> at least 8 bytes, but maybe cache_aligned is an option too.
> >
> > 8 bytes seems enough to me.
> >
> > Konstantin
> >
> >>
> >> Regards,
> >> Olivier
> >>
> >>
> >>>
> >>> Regards,
> >>> Helin
> >>>
> >>>> -----Original Message-----
> >>>> From: Ananyev, Konstantin
> >>>> Sent: Wednesday, July 29, 2015 11:12 AM
> >>>> To: Martin Weiser; Zhang, Helin; olivier.matz@6wind.com
> >>>> Cc: dev@dpdk.org
> >>>> Subject: RE: [dpdk-dev] Issue with non-scattered rx in ixgbe and i40e when mbuf
> >>>> private area size is odd
> >>>>
> >>>> Hi Martin,
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Martin Weiser
> >>>>> Sent: Wednesday, July 29, 2015 4:07 PM
> >>>>> To: Zhang, Helin; olivier.matz@6wind.com
> >>>>> Cc: dev@dpdk.org
> >>>>> Subject: [dpdk-dev] Issue with non-scattered rx in ixgbe and i40e when
> >>>>> mbuf private area size is odd
> >>>>>
> >>>>> Hi Helin, Hi Olivier,
> >>>>>
> >>>>> we are seeing an issue with the ixgbe and i40e drivers which we could
> >>>>> track down to our setting of the private area size of the mbufs.
> >>>>> The issue can be easily reproduced with the l2fwd example application
> >>>>> when a small modification is done: just set the priv_size parameter in
> >>>>> the call to the rte_pktmbuf_pool_create function to an odd number like
> >>>>> 1. In our tests this causes every call to rte_eth_rx_burst to return
> >>>>> 32 (which is the setting of nb_pkts) nonsense mbufs although no
> >>>>> packets are received on the interface and the hardware counters do not
> >>>>> report any received packets.
> >>>>
> >>>> From Niantic datasheet:
> >>>>
> >>>> "7.1.6.1 Advanced Receive Descriptors — Read Format Table 7-15 lists the
> >>>> advanced receive descriptor programming by the software. The ...
> >>>> Packet Buffer Address (64)
> >>>> This is the physical address of the packet buffer. The lowest bit is A0 (LSB of the
> >>>> address).
> >>>> Header Buffer Address (64)
> >>>> The physical address of the header buffer with the lowest bit being Descriptor
> >>>> Done (DD).
> >>>> When a packet spans in multiple descriptors, the header buffer address is used
> >>>> only on the first descriptor. During the programming phase, software must set
> >>>> the DD bit to zero (see the description of the DD bit in this section). This means
> >>>> that header buffer addresses are always word aligned."
> >>>>
> >>>> Right now, in ixgbe PMD we always setup Packet Buffer Address (PBA)and
> >>>> Header Buffer Address (HBA) to the same value:
> >>>> buf_physaddr + RTE_PKTMBUF_HEADROOM
> >>>> So when pirv_size==1, DD bit in RXD is always set to one by SW itself, and then
> >>>> SW considers that HW already done with it.
> >>>> In other words, right now for ixgbe you can't use RX buffer that is not aligned on
> >>>> word boundary.
> >>>>
> >>>> So the advice would be, right now - don't set priv_size to the odd value.
> >>>> As we don't support split header feature anyway, I think we can fix it just by
> >>>> always setting HBA in the RXD to zero.
> >>>> Could you try the fix for ixgbe below?
> >>>>
> >>>> Same story with FVL, I believe.
> >>>> Konstantin
> >>>>
> >>>>
> >>>>> Interestingly this does not happen if we force the scattered rx path.
> >>>>>
> >>>>> I assume the drivers have some expectations regarding the alignment of
> >>>>> the buf_addr in the mbuf and setting an odd private are size breaks
> >>>>> this alignment in the rte_pktmbuf_init function. If this is the case
> >>>>> then one possible fix might be to enforce an alignment on the private area size.
> >>>>>
> >>>>> Best regards,
> >>>>> Martin
> >>>>
> >>>> diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
> >>>> index a0c8847..94967c5 100644
> >>>> --- a/drivers/net/ixgbe/ixgbe_rxtx.c
> >>>> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c
> >>>> @@ -1183,7 +1183,7 @@ ixgbe_rx_alloc_bufs(struct ixgbe_rx_queue *rxq, bool
> >>>> reset_mbuf)
> >>>>
> >>>> /* populate the descriptors */
> >>>> dma_addr =
> >>>> rte_cpu_to_le_64(RTE_MBUF_DATA_DMA_ADDR_DEFAULT(mb));
> >>>> - rxdp[i].read.hdr_addr = dma_addr;
> >>>> + rxdp[i].read.hdr_addr = 0;
> >>>> rxdp[i].read.pkt_addr = dma_addr;
> >>>> }
> >>>>
> >>>> @@ -1414,7 +1414,7 @@ ixgbe_recv_pkts(void *rx_queue, struct rte_mbuf
> >>>> **rx_pkts,
> >>>> rxe->mbuf = nmb;
> >>>> dma_addr =
> >>>>
> >>>> rte_cpu_to_le_64(RTE_MBUF_DATA_DMA_ADDR_DEFAULT(nmb));
> >>>> - rxdp->read.hdr_addr = dma_addr;
> >>>> + rxdp->read.hdr_addr = 0;
> >>>> rxdp->read.pkt_addr = dma_addr;
> >>>>
> >>>> /*
> >>>> @@ -1741,7 +1741,7 @@ next_desc:
> >>>> rxe->mbuf = nmb;
> >>>>
> >>>> rxm->data_off = RTE_PKTMBUF_HEADROOM;
> >>>> - rxdp->read.hdr_addr = dma;
> >>>> + rxdp->read.hdr_addr = 0;
> >>>> rxdp->read.pkt_addr = dma;
> >>>> } else
> >>>> rxe->mbuf = NULL; @@ -3633,7 +3633,7 @@
> >>>> ixgbe_alloc_rx_queue_mbufs(struct ixgbe_rx_queue *rxq)
> >>>> dma_addr =
> >>>>
> >>>> rte_cpu_to_le_64(RTE_MBUF_DATA_DMA_ADDR_DEFAULT(mbuf));
> >>>> rxd = &rxq->rx_ring[i];
> >>>> - rxd->read.hdr_addr = dma_addr;
> >>>> + rxd->read.hdr_addr = 0;
> >>>> rxd->read.pkt_addr = dma_addr;
> >>>> rxe[i].mbuf = mbuf;
> >>>> }
> >>>> diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec.c
> >>>> b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
> >>>> index 6c1647e..16a9c64 100644
> >>>> --- a/drivers/net/ixgbe/ixgbe_rxtx_vec.c
> >>>> +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
> >>>> @@ -56,6 +56,8 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq)
> >>>> RTE_PKTMBUF_HEADROOM);
> >>>> __m128i dma_addr0, dma_addr1;
> >>>>
> >>>> + const __m128i hba_msk = _mm_set_epi64x(0, UINT64_MAX);
> >>>> +
> >>>> rxdp = rxq->rx_ring + rxq->rxrearm_start;
> >>>>
> >>>> /* Pull 'n' more MBUFs into the software ring */ @@ -108,6 +110,9 @@
> >>>> ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq)
> >>>> dma_addr0 = _mm_add_epi64(dma_addr0, hdr_room);
> >>>> dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room);
> >>>>
> >>>> + dma_addr0 = _mm_and_si128(dma_addr0, hba_msk);
> >>>> + dma_addr1 = _mm_and_si128(dma_addr1, hba_msk);
> >>>> +
> >>>> /* flush desc with pa dma_addr */
> >>>> _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0);
> >>>> _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1);
> >>>> bash-4.2$ cat patch1 diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c
> >>>> b/drivers/net/ixgbe/ixgbe_rxtx.c index a0c8847..94967c5 100644
> >>>> --- a/drivers/net/ixgbe/ixgbe_rxtx.c
> >>>> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c
> >>>> @@ -1183,7 +1183,7 @@ ixgbe_rx_alloc_bufs(struct ixgbe_rx_queue *rxq, bool
> >>>> reset_mbuf)
> >>>>
> >>>> /* populate the descriptors */
> >>>> dma_addr =
> >>>> rte_cpu_to_le_64(RTE_MBUF_DATA_DMA_ADDR_DEFAULT(mb));
> >>>> - rxdp[i].read.hdr_addr = dma_addr;
> >>>> + rxdp[i].read.hdr_addr = 0;
> >>>> rxdp[i].read.pkt_addr = dma_addr;
> >>>> }
> >>>>
> >>>> @@ -1414,7 +1414,7 @@ ixgbe_recv_pkts(void *rx_queue, struct rte_mbuf
> >>>> **rx_pkts,
> >>>> rxe->mbuf = nmb;
> >>>> dma_addr =
> >>>>
> >>>> rte_cpu_to_le_64(RTE_MBUF_DATA_DMA_ADDR_DEFAULT(nmb));
> >>>> - rxdp->read.hdr_addr = dma_addr;
> >>>> + rxdp->read.hdr_addr = 0;
> >>>> rxdp->read.pkt_addr = dma_addr;
> >>>>
> >>>> /*
> >>>> @@ -1741,7 +1741,7 @@ next_desc:
> >>>> rxe->mbuf = nmb;
> >>>>
> >>>> rxm->data_off = RTE_PKTMBUF_HEADROOM;
> >>>> - rxdp->read.hdr_addr = dma;
> >>>> + rxdp->read.hdr_addr = 0;
> >>>> rxdp->read.pkt_addr = dma;
> >>>> } else
> >>>> rxe->mbuf = NULL; @@ -3633,7 +3633,7 @@
> >>>> ixgbe_alloc_rx_queue_mbufs(struct ixgbe_rx_queue *rxq)
> >>>> dma_addr =
> >>>>
> >>>> rte_cpu_to_le_64(RTE_MBUF_DATA_DMA_ADDR_DEFAULT(mbuf));
> >>>> rxd = &rxq->rx_ring[i];
> >>>> - rxd->read.hdr_addr = dma_addr;
> >>>> + rxd->read.hdr_addr = 0;
> >>>> rxd->read.pkt_addr = dma_addr;
> >>>> rxe[i].mbuf = mbuf;
> >>>> }
> >>>> diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec.c
> >>>> b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
> >>>> index 6c1647e..16a9c64 100644
> >>>> --- a/drivers/net/ixgbe/ixgbe_rxtx_vec.c
> >>>> +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
> >>>> @@ -56,6 +56,8 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq)
> >>>> RTE_PKTMBUF_HEADROOM);
> >>>> __m128i dma_addr0, dma_addr1;
> >>>>
> >>>> + const __m128i hba_msk = _mm_set_epi64x(0, UINT64_MAX);
> >>>> +
> >>>> rxdp = rxq->rx_ring + rxq->rxrearm_start;
> >>>>
> >>>> /* Pull 'n' more MBUFs into the software ring */ @@ -108,6 +110,9 @@
> >>>> ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq)
> >>>> dma_addr0 = _mm_add_epi64(dma_addr0, hdr_room);
> >>>> dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room);
> >>>>
> >>>> + dma_addr0 = _mm_and_si128(dma_addr0, hba_msk);
> >>>> + dma_addr1 = _mm_and_si128(dma_addr1, hba_msk);
> >>>> +
> >>>> /* flush desc with pa dma_addr */
> >>>> _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0);
> >>>> _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1);
> >
next prev parent reply other threads:[~2015-07-30 9:43 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-07-29 15:07 Martin Weiser
2015-07-29 18:12 ` Ananyev, Konstantin
2015-07-29 20:24 ` Zhang, Helin
2015-07-30 8:12 ` Olivier MATZ
2015-07-30 9:00 ` Ananyev, Konstantin
2015-07-30 9:10 ` Olivier MATZ
2015-07-30 9:43 ` Ananyev, Konstantin [this message]
2015-07-30 11:22 ` Olivier MATZ
2015-07-30 13:47 ` Thomas Monjalon
2015-07-30 13:56 ` [dpdk-dev] [PATCH] mbuf: enforce alignment of mbuf private area Olivier Matz
2015-07-30 14:13 ` Ananyev, Konstantin
2015-07-30 16:06 ` Olivier MATZ
2015-07-30 15:33 ` Zhang, Helin
2015-07-30 16:07 ` Olivier MATZ
2015-07-30 16:22 ` [dpdk-dev] [PATCH v2] " Olivier Matz
2015-07-30 16:25 ` Zhang, Helin
2015-07-30 21:28 ` Ananyev, Konstantin
2015-08-02 22:35 ` Thomas Monjalon
2015-08-02 22:32 ` [dpdk-dev] Issue with non-scattered rx in ixgbe and i40e when mbuf private area size is odd Thomas Monjalon
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=2601191342CEEE43887BDE71AB97725836A6A0CE@irsmsx105.ger.corp.intel.com \
--to=konstantin.ananyev@intel.com \
--cc=dev@dpdk.org \
--cc=helin.zhang@intel.com \
--cc=martin.weiser@allegro-packets.com \
--cc=olivier.matz@6wind.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).