DPDK patches and discussions
 help / color / mirror / Atom feed
From: Olivier MATZ <olivier.matz@6wind.com>
To: "Zhang, Helin" <helin.zhang@intel.com>,
	 "Ananyev, Konstantin" <konstantin.ananyev@intel.com>,
	Martin Weiser <martin.weiser@allegro-packets.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] Issue with non-scattered rx in ixgbe and i40e when mbuf private area size is odd
Date: Thu, 30 Jul 2015 10:12:25 +0200	[thread overview]
Message-ID: <55B9DC69.3080508@6wind.com> (raw)
In-Reply-To: <F35DEAC7BCE34641BA9FAC6BCA4A12E70A8B7255@SHSMSX104.ccr.corp.intel.com>

Hi,

On 07/29/2015 10:24 PM, Zhang, Helin wrote:
> Hi Martin
>
> Thank you very much for the good catch!
>
> The similar situation in i40e, as explained by Konstantin.
> As header split hasn't been supported by DPDK till now. It would be better to put the header address in RX descriptor to 0.
> But in the future, during header split enabling. We may need to pay extra attention to that. As at least x710 datasheet said specifically as below.
> "The header address should be set by the software to an even number (word aligned address)". We may need to find a way to ensure that during mempool/mbuf allocation.

Indeed it would be good to force the priv_size to be aligned.

The priv_size could be aligned automatically in
rte_pktmbuf_pool_create(). The only possible problem I could see
is that it would break applications that access to the data buffer
by doing (sizeof(mbuf) + sizeof(priv)), which is probably not the
best thing to do (I didn't find any applications like this in dpdk).

For applications that directly use rte_mempool_create() instead of
rte_pktmbuf_pool_create(), we could add a check using an assert in 
rte_pktmbuf_init() and some words in the documentation.

The question is: what should be the proper alignment? I would say
at least 8 bytes, but maybe cache_aligned is an option too.

Regards,
Olivier


>
> Regards,
> Helin
>
>> -----Original Message-----
>> From: Ananyev, Konstantin
>> Sent: Wednesday, July 29, 2015 11:12 AM
>> To: Martin Weiser; Zhang, Helin; olivier.matz@6wind.com
>> Cc: dev@dpdk.org
>> Subject: RE: [dpdk-dev] Issue with non-scattered rx in ixgbe and i40e when mbuf
>> private area size is odd
>>
>> Hi Martin,
>>
>>> -----Original Message-----
>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Martin Weiser
>>> Sent: Wednesday, July 29, 2015 4:07 PM
>>> To: Zhang, Helin; olivier.matz@6wind.com
>>> Cc: dev@dpdk.org
>>> Subject: [dpdk-dev] Issue with non-scattered rx in ixgbe and i40e when
>>> mbuf private area size is odd
>>>
>>> Hi Helin, Hi Olivier,
>>>
>>> we are seeing an issue with the ixgbe and i40e drivers which we could
>>> track down to our setting of the private area size of the mbufs.
>>> The issue can be easily reproduced with the l2fwd example application
>>> when a small modification is done: just set the priv_size parameter in
>>> the call to the rte_pktmbuf_pool_create function to an odd number like
>>> 1. In our tests this causes every call to rte_eth_rx_burst to return
>>> 32 (which is the setting of nb_pkts) nonsense mbufs although no
>>> packets are received on the interface and the hardware counters do not
>>> report any received packets.
>>
>>  From Niantic datasheet:
>>
>> "7.1.6.1 Advanced Receive Descriptors — Read Format Table 7-15 lists the
>> advanced receive descriptor programming by the software. The ...
>> Packet Buffer Address (64)
>> This is the physical address of the packet buffer. The lowest bit is A0 (LSB of the
>> address).
>> Header Buffer Address (64)
>> The physical address of the header buffer with the lowest bit being Descriptor
>> Done (DD).
>> When a packet spans in multiple descriptors, the header buffer address is used
>> only on the first descriptor. During the programming phase, software must set
>> the DD bit to zero (see the description of the DD bit in this section). This means
>> that header buffer addresses are always word aligned."
>>
>> Right now, in ixgbe PMD we always setup  Packet Buffer Address (PBA)and
>> Header Buffer Address (HBA) to the same value:
>> buf_physaddr + RTE_PKTMBUF_HEADROOM
>> So when pirv_size==1, DD bit in RXD is always set to one by SW itself, and then
>> SW considers that HW already done with it.
>> In other words, right now for ixgbe you can't use RX buffer that is not aligned on
>> word boundary.
>>
>> So the advice would be, right now - don't set priv_size to the odd value.
>> As we don't support split header feature anyway, I think we can fix it just by
>> always setting HBA in the RXD to zero.
>> Could you try the fix for ixgbe below?
>>
>> Same story with FVL, I believe.
>> Konstantin
>>
>>
>>> Interestingly this does not happen if we force the scattered rx path.
>>>
>>> I assume the drivers have some expectations regarding the alignment of
>>> the buf_addr in the mbuf and setting an odd private are size breaks
>>> this alignment in the rte_pktmbuf_init function. If this is the case
>>> then one possible fix might be to enforce an alignment on the private area size.
>>>
>>> Best regards,
>>> Martin
>>
>> diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
>> index a0c8847..94967c5 100644
>> --- a/drivers/net/ixgbe/ixgbe_rxtx.c
>> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c
>> @@ -1183,7 +1183,7 @@ ixgbe_rx_alloc_bufs(struct ixgbe_rx_queue *rxq, bool
>> reset_mbuf)
>>
>>                  /* populate the descriptors */
>>                  dma_addr =
>> rte_cpu_to_le_64(RTE_MBUF_DATA_DMA_ADDR_DEFAULT(mb));
>> -               rxdp[i].read.hdr_addr = dma_addr;
>> +               rxdp[i].read.hdr_addr = 0;
>>                  rxdp[i].read.pkt_addr = dma_addr;
>>          }
>>
>> @@ -1414,7 +1414,7 @@ ixgbe_recv_pkts(void *rx_queue, struct rte_mbuf
>> **rx_pkts,
>>                  rxe->mbuf = nmb;
>>                  dma_addr =
>>
>> rte_cpu_to_le_64(RTE_MBUF_DATA_DMA_ADDR_DEFAULT(nmb));
>> -               rxdp->read.hdr_addr = dma_addr;
>> +               rxdp->read.hdr_addr = 0;
>>                  rxdp->read.pkt_addr = dma_addr;
>>
>>                  /*
>> @@ -1741,7 +1741,7 @@ next_desc:
>>                          rxe->mbuf = nmb;
>>
>>                          rxm->data_off = RTE_PKTMBUF_HEADROOM;
>> -                       rxdp->read.hdr_addr = dma;
>> +                       rxdp->read.hdr_addr = 0;
>>                          rxdp->read.pkt_addr = dma;
>>                  } else
>>                          rxe->mbuf = NULL; @@ -3633,7 +3633,7 @@
>> ixgbe_alloc_rx_queue_mbufs(struct ixgbe_rx_queue *rxq)
>>                  dma_addr =
>>
>> rte_cpu_to_le_64(RTE_MBUF_DATA_DMA_ADDR_DEFAULT(mbuf));
>>                  rxd = &rxq->rx_ring[i];
>> -               rxd->read.hdr_addr = dma_addr;
>> +               rxd->read.hdr_addr = 0;
>>                  rxd->read.pkt_addr = dma_addr;
>>                  rxe[i].mbuf = mbuf;
>>          }
>> diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec.c
>> b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
>> index 6c1647e..16a9c64 100644
>> --- a/drivers/net/ixgbe/ixgbe_rxtx_vec.c
>> +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
>> @@ -56,6 +56,8 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq)
>>                          RTE_PKTMBUF_HEADROOM);
>>          __m128i dma_addr0, dma_addr1;
>>
>> +       const __m128i hba_msk = _mm_set_epi64x(0, UINT64_MAX);
>> +
>>          rxdp = rxq->rx_ring + rxq->rxrearm_start;
>>
>>          /* Pull 'n' more MBUFs into the software ring */ @@ -108,6 +110,9 @@
>> ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq)
>>                  dma_addr0 = _mm_add_epi64(dma_addr0, hdr_room);
>>                  dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room);
>>
>> +               dma_addr0 =  _mm_and_si128(dma_addr0, hba_msk);
>> +               dma_addr1 =  _mm_and_si128(dma_addr1, hba_msk);
>> +
>>                  /* flush desc with pa dma_addr */
>>                  _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0);
>>                  _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1);
>> bash-4.2$ cat patch1 diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c
>> b/drivers/net/ixgbe/ixgbe_rxtx.c index a0c8847..94967c5 100644
>> --- a/drivers/net/ixgbe/ixgbe_rxtx.c
>> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c
>> @@ -1183,7 +1183,7 @@ ixgbe_rx_alloc_bufs(struct ixgbe_rx_queue *rxq, bool
>> reset_mbuf)
>>
>>                  /* populate the descriptors */
>>                  dma_addr =
>> rte_cpu_to_le_64(RTE_MBUF_DATA_DMA_ADDR_DEFAULT(mb));
>> -               rxdp[i].read.hdr_addr = dma_addr;
>> +               rxdp[i].read.hdr_addr = 0;
>>                  rxdp[i].read.pkt_addr = dma_addr;
>>          }
>>
>> @@ -1414,7 +1414,7 @@ ixgbe_recv_pkts(void *rx_queue, struct rte_mbuf
>> **rx_pkts,
>>                  rxe->mbuf = nmb;
>>                  dma_addr =
>>
>> rte_cpu_to_le_64(RTE_MBUF_DATA_DMA_ADDR_DEFAULT(nmb));
>> -               rxdp->read.hdr_addr = dma_addr;
>> +               rxdp->read.hdr_addr = 0;
>>                  rxdp->read.pkt_addr = dma_addr;
>>
>>                  /*
>> @@ -1741,7 +1741,7 @@ next_desc:
>>                          rxe->mbuf = nmb;
>>
>>                          rxm->data_off = RTE_PKTMBUF_HEADROOM;
>> -                       rxdp->read.hdr_addr = dma;
>> +                       rxdp->read.hdr_addr = 0;
>>                          rxdp->read.pkt_addr = dma;
>>                  } else
>>                          rxe->mbuf = NULL; @@ -3633,7 +3633,7 @@
>> ixgbe_alloc_rx_queue_mbufs(struct ixgbe_rx_queue *rxq)
>>                  dma_addr =
>>
>> rte_cpu_to_le_64(RTE_MBUF_DATA_DMA_ADDR_DEFAULT(mbuf));
>>                  rxd = &rxq->rx_ring[i];
>> -               rxd->read.hdr_addr = dma_addr;
>> +               rxd->read.hdr_addr = 0;
>>                  rxd->read.pkt_addr = dma_addr;
>>                  rxe[i].mbuf = mbuf;
>>          }
>> diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec.c
>> b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
>> index 6c1647e..16a9c64 100644
>> --- a/drivers/net/ixgbe/ixgbe_rxtx_vec.c
>> +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
>> @@ -56,6 +56,8 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq)
>>                          RTE_PKTMBUF_HEADROOM);
>>          __m128i dma_addr0, dma_addr1;
>>
>> +       const __m128i hba_msk = _mm_set_epi64x(0, UINT64_MAX);
>> +
>>          rxdp = rxq->rx_ring + rxq->rxrearm_start;
>>
>>          /* Pull 'n' more MBUFs into the software ring */ @@ -108,6 +110,9 @@
>> ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq)
>>                  dma_addr0 = _mm_add_epi64(dma_addr0, hdr_room);
>>                  dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room);
>>
>> +               dma_addr0 =  _mm_and_si128(dma_addr0, hba_msk);
>> +               dma_addr1 =  _mm_and_si128(dma_addr1, hba_msk);
>> +
>>                  /* flush desc with pa dma_addr */
>>                  _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0);
>>                  _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1);

  reply	other threads:[~2015-07-30  8:12 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-07-29 15:07 Martin Weiser
2015-07-29 18:12 ` Ananyev, Konstantin
2015-07-29 20:24   ` Zhang, Helin
2015-07-30  8:12     ` Olivier MATZ [this message]
2015-07-30  9:00       ` Ananyev, Konstantin
2015-07-30  9:10         ` Olivier MATZ
2015-07-30  9:43           ` Ananyev, Konstantin
2015-07-30 11:22             ` Olivier MATZ
2015-07-30 13:47               ` Thomas Monjalon
2015-07-30 13:56                 ` [dpdk-dev] [PATCH] mbuf: enforce alignment of mbuf private area Olivier Matz
2015-07-30 14:13                   ` Ananyev, Konstantin
2015-07-30 16:06                     ` Olivier MATZ
2015-07-30 15:33                   ` Zhang, Helin
2015-07-30 16:07                     ` Olivier MATZ
2015-07-30 16:22                   ` [dpdk-dev] [PATCH v2] " Olivier Matz
2015-07-30 16:25                     ` Zhang, Helin
2015-07-30 21:28                     ` Ananyev, Konstantin
2015-08-02 22:35                       ` Thomas Monjalon
2015-08-02 22:32   ` [dpdk-dev] Issue with non-scattered rx in ixgbe and i40e when mbuf private area size is odd Thomas Monjalon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=55B9DC69.3080508@6wind.com \
    --to=olivier.matz@6wind.com \
    --cc=dev@dpdk.org \
    --cc=helin.zhang@intel.com \
    --cc=konstantin.ananyev@intel.com \
    --cc=martin.weiser@allegro-packets.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).