* Re: [dpdk-dev] [PATCH v8 3/3] ixgbe: Add LRO support
@ 2015-03-31 10:25 Ananyev, Konstantin
2015-03-31 11:40 ` Vlad Zolotarov
0 siblings, 1 reply; 6+ messages in thread
From: Ananyev, Konstantin @ 2015-03-31 10:25 UTC (permalink / raw)
To: Vlad Zolotarov, dev
> -----Original Message-----
> From: Vlad Zolotarov [mailto:vladz@cloudius-systems.com]
> Sent: Monday, March 30, 2015 4:57 PM
> To: Ananyev, Konstantin; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v8 3/3] ixgbe: Add LRO support
>
>
>
> On 03/30/15 18:37, Vlad Zolotarov wrote:
> >
> >
> > On 03/30/15 17:18, Ananyev, Konstantin wrote:
> >> Hi Vlad,
> >>
> >>> -----Original Message-----
> >>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Vlad Zolotarov
> >>> Sent: Wednesday, March 18, 2015 5:52 PM
> >>> To: dev@dpdk.org
> >>> Subject: [dpdk-dev] [PATCH v8 3/3] ixgbe: Add LRO support
> >>>
> >>> - Only x540 and 82599 devices support LRO.
> >>> - Add the appropriate HW configuration.
> >>> - Add RSC aware rx_pkt_burst() handlers:
> >>> - Implemented bulk allocation and non-bulk allocation versions.
> >>> - Add LRO-specific fields to rte_eth_rxmode, to
> >>> rte_eth_dev_data
> >>> and to ixgbe_rx_queue.
> >>> - Use the appropriate handler when LRO is requested.
> >>>
> >>> Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
> >>> ---
> >>> New in v8:
> >>> - Took the RSC configuration code from ixgbe_dev_rx_init() into
> >>> a separate
> >>> function - ixgbe_set_rsc().
> >>> - Added some missing macros for HW configuration.
> >>> - Styling adjustments:
> >>> - Functions names.
> >>> - Functions descriptions.
> >>> - Reworked the ixgbe_free_rsc_cluster() code to make it more
> >>> readable.
> >>> - Kill the HEADER_SPLIT flow in ixgbe_set_rsc() since it's not
> >>> supported by
> >>> ixgbe PMD.
> >>>
> >>> New in v7:
> >>> - Free not-yet-completed RSC aggregations in rte_eth_dev_stop()
> >>> flow.
> >>> - Reset the sw_ring[].mbuf entry in a bulk allocation case.
> >>> This is needed for ixgbe_rx_queue_release_mbufs().
> >>> - _recv_pkts_lro(): added the missing memory barrier before RDT
> >>> update in a
> >>> non-bulk allocation case.
> >>> - Don't allow RSC when device is configured in an SR-IOV mode.
> >>>
> >>> New in v5:
> >>> - Put the RTE_ETHDEV_HAS_LRO_SUPPORT definition at the beginning
> >>> of rte_ethdev.h.
> >>> - Removed the "TODO: Remove me" comment near
> >>> RTE_ETHDEV_HAS_LRO_SUPPORT.
> >>>
> >>> New in v4:
> >>> - Define RTE_ETHDEV_HAS_LRO_SUPPORT in rte_ethdev.h instead of
> >>> RTE_ETHDEV_LRO_SUPPORT defined in config/common_linuxapp.
> >>>
> >>> New in v2:
> >>> - Removed rte_eth_dev_data.lro_bulk_alloc.
> >>> - Fixed a few styling and spelling issues.
> >>> ---
> >>> lib/librte_ether/rte_ethdev.h | 9 +-
> >>> lib/librte_net/rte_ip.h | 3 +
> >>> lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h | 6 +
> >>> lib/librte_pmd_ixgbe/ixgbe_ethdev.c | 11 +
> >>> lib/librte_pmd_ixgbe/ixgbe_ethdev.h | 5 +
> >>> lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 610
> >>> +++++++++++++++++++++++++++++++-
> >>> lib/librte_pmd_ixgbe/ixgbe_rxtx.h | 6 +
> >>> 7 files changed, 642 insertions(+), 8 deletions(-)
> >>>
> >>> diff --git a/lib/librte_ether/rte_ethdev.h
> >>> b/lib/librte_ether/rte_ethdev.h
> >>> index 21aa359..61dc49a 100644
> >>> --- a/lib/librte_ether/rte_ethdev.h
> >>> +++ b/lib/librte_ether/rte_ethdev.h
> >>> @@ -172,6 +172,9 @@ extern "C" {
> >>>
> >>> #include <stdint.h>
> >>>
> >>> +/* Use this macro to check if LRO API is supported */
> >>> +#define RTE_ETHDEV_HAS_LRO_SUPPORT
> >>> +
> >>> #include <rte_log.h>
> >>> #include <rte_interrupts.h>
> >>> #include <rte_pci.h>
> >>> @@ -320,14 +323,15 @@ struct rte_eth_rxmode {
> >>> enum rte_eth_rx_mq_mode mq_mode;
> >>> uint32_t max_rx_pkt_len; /**< Only used if jumbo_frame
> >>> enabled. */
> >>> uint16_t split_hdr_size; /**< hdr buf size (header_split
> >>> enabled).*/
> >>> - uint8_t header_split : 1, /**< Header Split enable. */
> >>> + uint16_t header_split : 1, /**< Header Split enable. */
> >>> hw_ip_checksum : 1, /**< IP/UDP/TCP checksum offload
> >>> enable. */
> >>> hw_vlan_filter : 1, /**< VLAN filter enable. */
> >>> hw_vlan_strip : 1, /**< VLAN strip enable. */
> >>> hw_vlan_extend : 1, /**< Extended VLAN enable. */
> >>> jumbo_frame : 1, /**< Jumbo Frame Receipt enable. */
> >>> hw_strip_crc : 1, /**< Enable CRC stripping by
> >>> hardware. */
> >>> - enable_scatter : 1; /**< Enable scatter packets rx
> >>> handler */
> >>> + enable_scatter : 1, /**< Enable scatter packets rx
> >>> handler */
> >>> + enable_lro : 1; /**< Enable LRO */
> >>> };
> >>>
> >>> /**
> >>> @@ -1515,6 +1519,7 @@ struct rte_eth_dev_data {
> >>> uint8_t port_id; /**< Device [external] port
> >>> identifier. */
> >>> uint8_t promiscuous : 1, /**< RX promiscuous mode ON(1) /
> >>> OFF(0). */
> >>> scattered_rx : 1, /**< RX of scattered packets is ON(1) /
> >>> OFF(0) */
> >>> + lro : 1, /**< RX LRO is ON(1) / OFF(0) */
> >>> all_multicast : 1, /**< RX all multicast mode ON(1) /
> >>> OFF(0). */
> >>> dev_started : 1; /**< Device state: STARTED(1) /
> >>> STOPPED(0). */
> >>> };
> >>> diff --git a/lib/librte_net/rte_ip.h b/lib/librte_net/rte_ip.h
> >>> index 64935d9..74c9ced 100644
> >>> --- a/lib/librte_net/rte_ip.h
> >>> +++ b/lib/librte_net/rte_ip.h
> >>> @@ -110,6 +110,9 @@ struct ipv4_hdr {
> >>> (((c) & 0xff) << 8) | \
> >>> ((d) & 0xff))
> >>>
> >>> +/** Maximal IPv4 packet length (including a header) */
> >>> +#define IPV4_MAX_PKT_LEN 65535
> >>> +
> >>> /** Internet header length mask for version_ihl field */
> >>> #define IPV4_HDR_IHL_MASK (0x0f)
> >>> /**
> >>> diff --git a/lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h
> >>> b/lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h
> >>> index 9a66370..4998627 100644
> >>> --- a/lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h
> >>> +++ b/lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h
> >>> @@ -234,8 +234,14 @@ struct ixgbe_thermal_sensor_data {
> >>> #define IXGBE_EITR(_i) (((_i) <= 23) ? (0x00820 + ((_i) *
> >>> 4)) : \
> >>> (0x012300 + (((_i) - 24) * 4)))
> >>> #define IXGBE_EITR_ITR_INT_MASK 0x00000FF8
> >>> +#define IXGBE_EITR_ITR_INT_SHIFT 3
> >>> #define IXGBE_EITR_LLI_MOD 0x00008000
> >>> #define IXGBE_EITR_CNT_WDIS 0x80000000
> >>> +/* EITR Interval is in 2048ns units for 1G and 10G links */
> >>> +#define IXGBE_EITR_INTERVAL_UNIT_NS 2048
> >> Why is that? Why not just 2us, as the 82599 spec says?
> >> And then:
> >> +#define IXGBE_EITR_INTERVAL_US(us) (((us) /
> >> IXGBE_EITR_INTERVAL_UNIT_US << IXGBE_EITR_ITR_INT_SHIFT) &
> >> IXGBE_EITR_ITR_INT_MASK)
> >> ?
> >
> > Because the spec doesn't say 2us but rather 2.048us - see chapter
> > 8.2.4.5.16 of the x540 spec.
>
> The most confusing part is that the 82599 Rev. 3.0 spec does say 2us.
I see what you mean now.
Indeed it's a bit strange that the ITR_Interval units differ between the 82599 and X540 specs.
We sent a question about this to the HW guys, but don't know when there will be an answer.
I suppose we can stick with the 2.048 us case for now.
From my understanding, even if 82599 uses 2us, the difference in max timeout would be pretty small: 500 us vs 488 us
and shouldn't cause any performance drop.
I wonder, did you try your latest LRO changes on 82599?
If yes, did you see any difference in performance vs the previous version, when a hardcoded value for 2us was used?
Konstantin
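
For reference, the 500 us vs 488 us comparison above can be reproduced from
the macro values quoted in the patch; a minimal standalone sketch of the
arithmetic (the printf scaffolding is only for illustration):

	#include <stdio.h>
	#include <stdint.h>

	/* Values quoted from the patch under review. */
	#define IXGBE_EITR_ITR_INT_MASK      0x00000FF8
	#define IXGBE_EITR_ITR_INT_SHIFT     3
	#define IXGBE_EITR_INTERVAL_UNIT_NS  2048

	#define IXGBE_EITR_INTERVAL_US(us) \
		(((us) * 1000 / IXGBE_EITR_INTERVAL_UNIT_NS << \
		  IXGBE_EITR_ITR_INT_SHIFT) & IXGBE_EITR_ITR_INT_MASK)

	int main(void)
	{
		uint32_t reg = IXGBE_EITR_INTERVAL_US(500);
		uint32_t field = (reg & IXGBE_EITR_ITR_INT_MASK) >>
				 IXGBE_EITR_ITR_INT_SHIFT;

		printf("ITR_Interval field: %u\n", field); /* 500000/2048 = 244 */
		/* x540 reading (2.048us units): 244 * 2.048 = ~499.7 us */
		printf("timeout @2.048us units: %.1f us\n", field * 2.048);
		/* 82599 reading (2us units): 244 * 2 = 488 us */
		printf("timeout @2us units:     %u us\n", field * 2);
		return 0;
	}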
> The x540 spec I have is dated March 2014 and the 82599 spec is dated
> November 2014, but according to the changelog the EITR description has
> never been updated. Therefore I conclude that the spec for the more
> recent device (x540) holds the most up-to-date content.
>
> Pls., confirm.
>
> thanks,
> vlad
>
> >
> >>
> >>> +#define IXGBE_EITR_INTERVAL_US(us) \
> >>> + (((us) * 1000 / IXGBE_EITR_INTERVAL_UNIT_NS <<
> >>> IXGBE_EITR_ITR_INT_SHIFT) & \
> >>> + IXGBE_EITR_ITR_INT_MASK)
> >>
> >> Unfortunately we are not supposed to change files under the ixgbe
> >> subdirectory (except ixgbe_osdep.*).
> >> These files come from the ixgbe shared code and we treat them as read-only.
> >> You need to move the new macros you added into ixgbe_ethdev.h.
> >
> > Aha... Ok. No problem.
> >
> >>
> >> All other things look good to me.
> >> Konstantin
> >>
> >>> #define IXGBE_IVAR(_i) (0x00900 + ((_i) * 4)) /* 24 at
> >>> 0x900-0x960 */
> >>> #define IXGBE_IVAR_MISC 0x00A00 /* misc MSI-X interrupt
> >>> causes */
> >>> #define IXGBE_EITRSEL 0x00894
> >>> diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
> >>> b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
> >>> index 92d75db..923692e 100644
> >>> --- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
> >>> +++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
> >>> @@ -1645,6 +1645,7 @@ ixgbe_dev_stop(struct rte_eth_dev *dev)
> >>>
> >>> /* Clear stored conf */
> >>> dev->data->scattered_rx = 0;
> >>> + dev->data->lro = 0;
> >>>
> >>> /* Clear recorded link status */
> >>> memset(&link, 0, sizeof(link));
> >>> @@ -2013,6 +2014,16 @@ ixgbe_dev_info_get(struct rte_eth_dev *dev,
> >>> struct rte_eth_dev_info *dev_info)
> >>> DEV_RX_OFFLOAD_IPV4_CKSUM |
> >>> DEV_RX_OFFLOAD_UDP_CKSUM |
> >>> DEV_RX_OFFLOAD_TCP_CKSUM;
> >>> +
> >>> + /*
> >>> + * RSC is only supported by 82599 and x540 PF devices in a
> >>> non-SR-IOV
> >>> + * mode.
> >>> + */
> >>> + if ((hw->mac.type == ixgbe_mac_82599EB ||
> >>> + hw->mac.type == ixgbe_mac_X540) &&
> >>> + !RTE_ETH_DEV_SRIOV(dev).active)
> >>> + dev_info->rx_offload_capa |= DEV_RX_OFFLOAD_TCP_LRO;
> >>> +
> >>> dev_info->tx_offload_capa =
> >>> DEV_TX_OFFLOAD_VLAN_INSERT |
> >>> DEV_TX_OFFLOAD_IPV4_CKSUM |
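
As a side note, an application would be expected to probe this capability
bit before requesting LRO; a minimal sketch against the ethdev API of this
DPDK version (the port id is hypothetical):

	#include <rte_ethdev.h>

	/* Sketch: returns non-zero if the port advertises RSC/LRO
	 * (per the hunk above: 82599/x540 PF, non-SR-IOV only). */
	static int
	port_supports_lro(uint8_t port_id)
	{
		struct rte_eth_dev_info dev_info;

		rte_eth_dev_info_get(port_id, &dev_info);
		return (dev_info.rx_offload_capa & DEV_RX_OFFLOAD_TCP_LRO) != 0;
	}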
> >>> diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.h
> >>> b/lib/librte_pmd_ixgbe/ixgbe_ethdev.h
> >>> index ffe3471..7050340 100644
> >>> --- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.h
> >>> +++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.h
> >>> @@ -344,6 +344,11 @@ uint16_t ixgbe_recv_pkts(void *rx_queue, struct
> >>> rte_mbuf **rx_pkts,
> >>> uint16_t ixgbe_recv_scattered_pkts(void *rx_queue,
> >>> struct rte_mbuf **rx_pkts, uint16_t nb_pkts);
> >>>
> >>> +uint16_t ixgbe_recv_pkts_lro_single_alloc(void *rx_queue,
> >>> + struct rte_mbuf **rx_pkts, uint16_t nb_pkts);
> >>> +uint16_t ixgbe_recv_pkts_lro_bulk_alloc(void *rx_queue,
> >>> + struct rte_mbuf **rx_pkts, uint16_t nb_pkts);
> >>> +
> >>> uint16_t ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
> >>> uint16_t nb_pkts);
> >>>
> >>> diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> >>> b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> >>> index a08ae6a..daa4c45 100644
> >>> --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> >>> +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> >>> @@ -70,6 +70,7 @@
> >>> #include <rte_sctp.h>
> >>> #include <rte_string_fns.h>
> >>> #include <rte_errno.h>
> >>> +#include <rte_ip.h>
> >>>
> >>> #include "ixgbe_logs.h"
> >>> #include "ixgbe/ixgbe_api.h"
> >>> @@ -1422,6 +1423,295 @@ ixgbe_fill_cluster_head_buf(
> >>> }
> >>> }
> >>>
> >>> +/**
> >>> + * ixgbe_recv_pkts_lro - receive handler for the LRO case.
> >>> + *
> >>> + * @rx_queue Rx queue handle
> >>> + * @rx_pkts table of received packets
> >>> + * @nb_pkts size of rx_pkts table
> >>> + * @bulk_alloc if TRUE, bulk allocation is used for HW ring refilling
> >>> + *
> >>> + * Handles the Rx HW ring completions when the RSC feature is
> >>> configured. Uses an
> >>> + * additional ring of ixgbe_rsc_entry's that will hold the relevant
> >>> RSC info.
> >>> + *
> >>> + * We use the same logic as in Linux and FreeBSD ixgbe drivers:
> >>> + * 1) When non-EOP RSC completion arrives:
> >>> + * a) Update the HEAD of the current RSC aggregation cluster
> >>> with the new
> >>> + * segment's data length.
> >>> + * b) Set the "next" pointer of the current segment to point to
> >>> the segment
> >>> + * at the NEXTP index.
> >>> + * c) Pass the HEAD of RSC aggregation cluster on to the next
> >>> NEXTP entry
> >>> + * in the sw_rsc_ring.
> >>> + * 2) When EOP arrives we just update the cluster's total length
> >>> and offload
> >>> + * flags and deliver the cluster up to the upper layers. In our
> >>> case - put it
> >>> + * in the rx_pkts table.
> >>> + *
> >>> + * Returns the number of received packets/clusters (according to
> >>> the "bulk
> >>> + * receive" interface).
> >>> + */
> >>> +static inline uint16_t
> >>> +ixgbe_recv_pkts_lro(void *rx_queue, struct rte_mbuf **rx_pkts,
> >>> uint16_t nb_pkts,
> >>> + bool bulk_alloc)
> >>> +{
> >>> + struct ixgbe_rx_queue *rxq = rx_queue;
> >>> + volatile union ixgbe_adv_rx_desc *rx_ring = rxq->rx_ring;
> >>> + struct ixgbe_rx_entry *sw_ring = rxq->sw_ring;
> >>> + struct ixgbe_rsc_entry *sw_rsc_ring = rxq->sw_rsc_ring;
> >>> + uint16_t rx_id = rxq->rx_tail;
> >>> + uint16_t nb_rx = 0;
> >>> + uint16_t nb_hold = rxq->nb_rx_hold;
> >>> + uint16_t prev_id = rxq->rx_tail;
> >>> +
> >>> + while (nb_rx < nb_pkts) {
> >>> + bool eop;
> >>> + struct ixgbe_rx_entry *rxe;
> >>> + struct ixgbe_rsc_entry *rsc_entry;
> >>> + struct ixgbe_rsc_entry *next_rsc_entry;
> >>> + struct ixgbe_rx_entry *next_rxe;
> >>> + struct rte_mbuf *first_seg;
> >>> + struct rte_mbuf *rxm;
> >>> + struct rte_mbuf *nmb;
> >>> + union ixgbe_adv_rx_desc rxd;
> >>> + uint16_t data_len;
> >>> + uint16_t next_id;
> >>> + volatile union ixgbe_adv_rx_desc *rxdp;
> >>> + uint32_t staterr;
> >>> +
> >>> +next_desc:
> >>> + /*
> >>> + * The code in this whole file uses the volatile pointer to
> >>> + * ensure the read ordering of the status and the rest of the
> >>> + * descriptor fields (on the compiler level only!!!). This
> >>> is so
> >>> + * UGLY - why not just use the compiler barrier instead?
> >>> DPDK
> >>> + * even has the rte_compiler_barrier() for that.
> >>> + *
> >>> + * But most importantly this is just wrong because this
> >>> doesn't
> >>> + * ensure memory ordering in a general case at all. For
> >>> + * instance, DPDK is supposed to work on Power CPUs where
> >>> + * compiler barrier may just not be enough!
> >>> + *
> >>> + * I tried to write only this function properly to have a
> >>> + * starting point (as a part of an LRO/RSC series) but the
> >>> + * compiler cursed at me when I tried to cast away the
> >>> + * "volatile" from rx_ring (yes, it's volatile too!!!). So,
> >>> I'm
> >>> + * keeping it the way it is for now.
> >>> + *
> >>> + * The code in this file is broken in so many other places and
> >>> + * will just not work on a big endian CPU anyway therefore the
> >>> + * lines below will have to be revisited together with the
> >>> rest
> >>> + * of the ixgbe PMD.
> >>> + *
> >>> + * TODO:
> >>> + * - Get rid of "volatile" crap and let the compiler do its
> >>> + * job.
> >>> + * - Use the proper memory barrier (rte_rmb()) to ensure
> >>> the
> >>> + * memory ordering below.
> >>> + */
> >>> + rxdp = &rx_ring[rx_id];
> >>> + staterr = rte_le_to_cpu_32(rxdp->wb.upper.status_error);
> >>> +
> >>> + if (!(staterr & IXGBE_RXDADV_STAT_DD))
> >>> + break;
> >>> +
> >>> + rxd = *rxdp;
> >>> +
> >>> + PMD_RX_LOG(DEBUG, "port_id=%u queue_id=%u rx_id=%u "
> >>> + "staterr=0x%x data_len=%u",
> >>> + rxq->port_id, rxq->queue_id, rx_id, staterr,
> >>> + rte_le_to_cpu_16(rxd.wb.upper.length));
> >>> +
> >>> + if (!bulk_alloc) {
> >>> + nmb = rte_rxmbuf_alloc(rxq->mb_pool);
> >>> + if (nmb == NULL) {
> >>> + PMD_RX_LOG(DEBUG, "RX mbuf alloc failed "
> >>> + "port_id=%u queue_id=%u",
> >>> + rxq->port_id, rxq->queue_id);
> >>> +
> >>> + rte_eth_devices[rxq->port_id].data->
> >>> + rx_mbuf_alloc_failed++;
> >>> + break;
> >>> + }
> >>> + } else if (nb_hold > rxq->rx_free_thresh) {
> >>> + uint16_t next_rdt = rxq->rx_free_trigger;
> >>> +
> >>> + if (!ixgbe_rx_alloc_bufs(rxq, false)) {
> >>> + rte_wmb();
> >>> + IXGBE_PCI_REG_WRITE(rxq->rdt_reg_addr,
> >>> + next_rdt);
> >>> + nb_hold -= rxq->rx_free_thresh;
> >>> + } else {
> >>> + PMD_RX_LOG(DEBUG, "RX bulk alloc failed "
> >>> + "port_id=%u queue_id=%u",
> >>> + rxq->port_id, rxq->queue_id);
> >>> +
> >>> + rte_eth_devices[rxq->port_id].data->
> >>> + rx_mbuf_alloc_failed++;
> >>> + break;
> >>> + }
> >>> + }
> >>> +
> >>> + nb_hold++;
> >>> + rxe = &sw_ring[rx_id];
> >>> + eop = staterr & IXGBE_RXDADV_STAT_EOP;
> >>> +
> >>> + next_id = rx_id + 1;
> >>> + if (next_id == rxq->nb_rx_desc)
> >>> + next_id = 0;
> >>> +
> >>> + /* Prefetch next mbuf while processing current one. */
> >>> + rte_ixgbe_prefetch(sw_ring[next_id].mbuf);
> >>> +
> >>> + /*
> >>> + * When next RX descriptor is on a cache-line boundary,
> >>> + * prefetch the next 4 RX descriptors and the next 4 pointers
> >>> + * to mbufs.
> >>> + */
> >>> + if ((next_id & 0x3) == 0) {
> >>> + rte_ixgbe_prefetch(&rx_ring[next_id]);
> >>> + rte_ixgbe_prefetch(&sw_ring[next_id]);
> >>> + }
> >>> +
> >>> + rxm = rxe->mbuf;
> >>> +
> >>> + if (!bulk_alloc) {
> >>> + __le64 dma =
> >>> + rte_cpu_to_le_64(RTE_MBUF_DATA_DMA_ADDR_DEFAULT(nmb));
> >>> + /*
> >>> + * Update RX descriptor with the physical address of the
> >>> + * new data buffer of the newly allocated mbuf.
> >>> + */
> >>> + rxe->mbuf = nmb;
> >>> +
> >>> + rxm->data_off = RTE_PKTMBUF_HEADROOM;
> >>> + rxdp->read.hdr_addr = dma;
> >>> + rxdp->read.pkt_addr = dma;
> >>> + } else
> >>> + rxe->mbuf = NULL;
> >>> +
> >>> + /*
> >>> + * Set data length & data buffer address of mbuf.
> >>> + */
> >>> + data_len = rte_le_to_cpu_16(rxd.wb.upper.length);
> >>> + rxm->data_len = data_len;
> >>> +
> >>> + if (!eop) {
> >>> + uint16_t nextp_id;
> >>> + /*
> >>> + * Get next descriptor index:
> >>> + * - For RSC it's in the NEXTP field.
> >>> + * - For a scattered packet - it's just a following
> >>> + * descriptor.
> >>> + */
> >>> + if (ixgbe_rsc_count(&rxd))
> >>> + nextp_id =
> >>> + (staterr & IXGBE_RXDADV_NEXTP_MASK) >>
> >>> + IXGBE_RXDADV_NEXTP_SHIFT;
> >>> + else
> >>> + nextp_id = next_id;
> >>> +
> >>> + next_rsc_entry = &sw_rsc_ring[nextp_id];
> >>> + next_rxe = &sw_ring[nextp_id];
> >>> + rte_ixgbe_prefetch(next_rxe);
> >>> + }
> >>> +
> >>> + rsc_entry = &sw_rsc_ring[rx_id];
> >>> + first_seg = rsc_entry->fbuf;
> >>> + rsc_entry->fbuf = NULL;
> >>> +
> >>> + /*
> >>> + * If this is the first buffer of the received packet,
> >>> + * set the pointer to the first mbuf of the packet and
> >>> + * initialize its context.
> >>> + * Otherwise, update the total length and the number of
> >>> segments
> >>> + * of the current scattered packet, and update the pointer to
> >>> + * the last mbuf of the current packet.
> >>> + */
> >>> + if (first_seg == NULL) {
> >>> + first_seg = rxm;
> >>> + first_seg->pkt_len = data_len;
> >>> + first_seg->nb_segs = 1;
> >>> + } else {
> >>> + first_seg->pkt_len += data_len;
> >>> + first_seg->nb_segs++;
> >>> + }
> >>> +
> >>> + prev_id = rx_id;
> >>> + rx_id = next_id;
> >>> +
> >>> + /*
> >>> + * If this is not the last buffer of the received packet,
> >>> update
> >>> + * the pointer to the first mbuf at the NEXTP entry in the
> >>> + * sw_rsc_ring and continue to parse the RX ring.
> >>> + */
> >>> + if (!eop) {
> >>> + rxm->next = next_rxe->mbuf;
> >>> + next_rsc_entry->fbuf = first_seg;
> >>> + goto next_desc;
> >>> + }
> >>> +
> >>> + /*
> >>> + * This is the last buffer of the received packet - return
> >>> + * the current cluster to the user.
> >>> + */
> >>> + rxm->next = NULL;
> >>> +
> >>> + /* Initialize the first mbuf of the returned packet */
> >>> + ixgbe_fill_cluster_head_buf(first_seg, &rxd, rxq->port_id,
> >>> + staterr);
> >>> +
> >>> + /* Prefetch data of first segment, if configured to do so. */
> >>> + rte_packet_prefetch((char *)first_seg->buf_addr +
> >>> + first_seg->data_off);
> >>> +
> >>> + /*
> >>> + * Store the mbuf address into the next entry of the array
> >>> + * of returned packets.
> >>> + */
> >>> + rx_pkts[nb_rx++] = first_seg;
> >>> + }
> >>> +
> >>> + /*
> >>> + * Record index of the next RX descriptor to probe.
> >>> + */
> >>> + rxq->rx_tail = rx_id;
> >>> +
> >>> + /*
> >>> + * If the number of free RX descriptors is greater than the RX
> >>> free
> >>> + * threshold of the queue, advance the Receive Descriptor Tail
> >>> (RDT)
> >>> + * register.
> >>> + * Update the RDT with the value of the last processed RX
> >>> descriptor
> >>> + * minus 1, to guarantee that the RDT register is never equal
> >>> to the
> >>> + * RDH register, which creates a "full" ring situation from the
> >>> + * hardware point of view...
> >>> + */
> >>> + if (!bulk_alloc && nb_hold > rxq->rx_free_thresh) {
> >>> + PMD_RX_LOG(DEBUG, "port_id=%u queue_id=%u rx_tail=%u "
> >>> + "nb_hold=%u nb_rx=%u",
> >>> + rxq->port_id, rxq->queue_id, rx_id, nb_hold, nb_rx);
> >>> +
> >>> + rte_wmb();
> >>> + IXGBE_PCI_REG_WRITE(rxq->rdt_reg_addr, prev_id);
> >>> + nb_hold = 0;
> >>> + }
> >>> +
> >>> + rxq->nb_rx_hold = nb_hold;
> >>> + return nb_rx;
> >>> +}
> >>> +
> >>> +uint16_t
> >>> +ixgbe_recv_pkts_lro_single_alloc(void *rx_queue, struct rte_mbuf
> >>> **rx_pkts,
> >>> + uint16_t nb_pkts)
> >>> +{
> >>> + return ixgbe_recv_pkts_lro(rx_queue, rx_pkts, nb_pkts, false);
> >>> +}
> >>> +
> >>> +uint16_t
> >>> +ixgbe_recv_pkts_lro_bulk_alloc(void *rx_queue, struct rte_mbuf
> >>> **rx_pkts,
> >>> + uint16_t nb_pkts)
> >>> +{
> >>> + return ixgbe_recv_pkts_lro(rx_queue, rx_pkts, nb_pkts, true);
> >>> +}
> >>> +
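
For context, from the application side these handlers sit behind
rte_eth_rx_burst(); a minimal polling-loop sketch (port/queue ids and the
byte counter are hypothetical) showing that each returned mbuf may be a
multi-segment RSC cluster:

	#include <rte_ethdev.h>
	#include <rte_mbuf.h>

	#define BURST_SZ 32

	static uint64_t total_bytes; /* hypothetical per-queue counter */

	static void
	lro_rx_loop(uint8_t port_id, uint16_t queue_id)
	{
		struct rte_mbuf *pkts[BURST_SZ];
		uint16_t i, nb;

		for (;;) {
			nb = rte_eth_rx_burst(port_id, queue_id, pkts, BURST_SZ);
			for (i = 0; i < nb; i++) {
				/* pkt_len covers the whole coalesced cluster;
				 * nb_segs > 1 when RSC merged several frames. */
				total_bytes += pkts[i]->pkt_len;
				rte_pktmbuf_free(pkts[i]); /* frees all segments */
			}
		}
	}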
> >>> uint16_t
> >>> ixgbe_recv_scattered_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
> >>> uint16_t nb_pkts)
> >>> @@ -2005,6 +2295,31 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev
> >>> *dev,
> >>> return (0);
> >>> }
> >>>
> >>> +/**
> >>> + * ixgbe_free_rsc_cluster - free the not-yet-completed RSC cluster
> >>> + *
> >>> + * The "next" pointer of the last segment of (not-yet-completed)
> >>> RSC clusters
> >>> + * in the sw_rsc_ring is not set to NULL but rather points to the next
> >>> + * mbuf of this RSC aggregation (that has not been completed yet
> >>> and still
> >>> + * resides on the HW ring). So, instead of calling
> >>> rte_pktmbuf_free() we
> >>> + * will just free the first "nb_segs" segments of the cluster
> >>> explicitly by calling
> >>> + * rte_pktmbuf_free_seg().
> >>> + *
> >>> + * @m RSC cluster head
> >>> + */
> >>> +static void
> >>> +ixgbe_free_rsc_cluster(struct rte_mbuf *m)
> >>> +{
> >>> + uint8_t i, nb_segs = m->nb_segs;
> >>> + struct rte_mbuf *next_seg;
> >>> +
> >>> + for (i = 0; i < nb_segs; i++) {
> >>> + next_seg = m->next;
> >>> + rte_pktmbuf_free_seg(m);
> >>> + m = next_seg;
> >>> + }
> >>> +}
> >>> +
> >>> static void
> >>> ixgbe_rx_queue_release_mbufs(struct ixgbe_rx_queue *rxq)
> >>> {
> >>> @@ -2028,6 +2343,13 @@ ixgbe_rx_queue_release_mbufs(struct
> >>> ixgbe_rx_queue *rxq)
> >>> }
> >>> #endif
> >>> }
> >>> +
> >>> + if (rxq->sw_rsc_ring)
> >>> + for (i = 0; i < rxq->nb_rx_desc; i++)
> >>> + if (rxq->sw_rsc_ring[i].fbuf) {
> >>> + ixgbe_free_rsc_cluster(rxq->sw_rsc_ring[i].fbuf);
> >>> + rxq->sw_rsc_ring[i].fbuf = NULL;
> >>> + }
> >>> }
> >>>
> >>> static void
> >>> @@ -2036,6 +2358,7 @@ ixgbe_rx_queue_release(struct ixgbe_rx_queue
> >>> *rxq)
> >>> if (rxq != NULL) {
> >>> ixgbe_rx_queue_release_mbufs(rxq);
> >>> rte_free(rxq->sw_ring);
> >>> + rte_free(rxq->sw_rsc_ring);
> >>> rte_free(rxq);
> >>> }
> >>> }
> >>> @@ -2158,6 +2481,7 @@ ixgbe_reset_rx_queue(struct ixgbe_hw *hw,
> >>> struct ixgbe_rx_queue *rxq)
> >>> rxq->nb_rx_hold = 0;
> >>> rxq->pkt_first_seg = NULL;
> >>> rxq->pkt_last_seg = NULL;
> >>> + rxq->rsc_en = 0;
> >>> }
> >>>
> >>> int
> >>> @@ -2172,6 +2496,14 @@ ixgbe_dev_rx_queue_setup(struct rte_eth_dev
> >>> *dev,
> >>> struct ixgbe_rx_queue *rxq;
> >>> struct ixgbe_hw *hw;
> >>> uint16_t len;
> >>> + struct rte_eth_dev_info dev_info = { 0 };
> >>> + struct rte_eth_rxmode *dev_rx_mode = &dev->data->dev_conf.rxmode;
> >>> + bool rsc_requested = false;
> >>> +
> >>> + dev->dev_ops->dev_infos_get(dev, &dev_info);
> >>> + if ((dev_info.rx_offload_capa & DEV_RX_OFFLOAD_TCP_LRO) &&
> >>> + dev_rx_mode->enable_lro)
> >>> + rsc_requested = true;
> >>>
> >>> PMD_INIT_FUNC_TRACE();
> >>> hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> >>> @@ -2277,12 +2609,27 @@ ixgbe_dev_rx_queue_setup(struct rte_eth_dev
> >>> *dev,
> >>> rxq->sw_ring = rte_zmalloc_socket("rxq->sw_ring",
> >>> sizeof(struct ixgbe_rx_entry) * len,
> >>> RTE_CACHE_LINE_SIZE, socket_id);
> >>> - if (rxq->sw_ring == NULL) {
> >>> + if (!rxq->sw_ring) {
> >>> ixgbe_rx_queue_release(rxq);
> >>> return (-ENOMEM);
> >>> }
> >>> - PMD_INIT_LOG(DEBUG, "sw_ring=%p hw_ring=%p dma_addr=0x%"PRIx64,
> >>> - rxq->sw_ring, rxq->rx_ring, rxq->rx_ring_phys_addr);
> >>> +
> >>> + if (rsc_requested) {
> >>> + rxq->sw_rsc_ring =
> >>> + rte_zmalloc_socket("rxq->sw_rsc_ring",
> >>> + sizeof(struct ixgbe_rsc_entry) * len,
> >>> + RTE_CACHE_LINE_SIZE, socket_id);
> >>> + if (!rxq->sw_rsc_ring) {
> >>> + ixgbe_rx_queue_release(rxq);
> >>> + return (-ENOMEM);
> >>> + }
> >>> + } else
> >>> + rxq->sw_rsc_ring = NULL;
> >>> +
> >>> + PMD_INIT_LOG(DEBUG, "sw_ring=%p sw_rsc_ring=%p hw_ring=%p "
> >>> + "dma_addr=0x%"PRIx64,
> >>> + rxq->sw_ring, rxq->sw_rsc_ring, rxq->rx_ring,
> >>> + rxq->rx_ring_phys_addr);
> >>>
> >>> if (!rte_is_power_of_2(nb_desc)) {
> >>> PMD_INIT_LOG(DEBUG, "queue[%d] doesn't meet Vector Rx "
> >>> @@ -3527,6 +3874,91 @@ ixgbe_dev_mq_tx_configure(struct rte_eth_dev
> >>> *dev)
> >>> return 0;
> >>> }
> >>>
> >>> +/**
> >>> + * ixgbe_get_rscctl_maxdesc - Calculate the RSCCTL[n].MAXDESC for PF
> >>> + *
> >>> + * Return the RSCCTL[n].MAXDESC for 82599 and x540 PF devices
> >>> according to the
> >>> + * spec rev. 3.0 chapter 8.2.3.8.13.
> >>> + *
> >>> + * @pool Memory pool of the Rx queue
> >>> + */
> >>> +static inline uint32_t
> >>> +ixgbe_get_rscctl_maxdesc(struct rte_mempool *pool)
> >>> +{
> >>> + struct rte_pktmbuf_pool_private *mp_priv =
> >>> rte_mempool_get_priv(pool);
> >>> +
> >>> + /* MAXDESC * SRRCTL.BSIZEPKT must not exceed 64 KB minus one */
> >>> + uint16_t maxdesc =
> >>> + IPV4_MAX_PKT_LEN /
> >>> + (mp_priv->mbuf_data_room_size - RTE_PKTMBUF_HEADROOM);
> >>> +
> >>> + if (maxdesc >= 16)
> >>> + return IXGBE_RSCCTL_MAXDESC_16;
> >>> + else if (maxdesc >= 8)
> >>> + return IXGBE_RSCCTL_MAXDESC_8;
> >>> + else if (maxdesc >= 4)
> >>> + return IXGBE_RSCCTL_MAXDESC_4;
> >>> + else
> >>> + return IXGBE_RSCCTL_MAXDESC_1;
> >>> +}
> >>> +
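
To illustrate the mapping above with a common (assumed) mempool layout - a
2 KB data room and 128 bytes of headroom, both hypothetical here - a
standalone sketch of the same arithmetic:

	#include <stdio.h>

	#define IPV4_MAX_PKT_LEN  65535
	#define HEADROOM          128   /* assumed RTE_PKTMBUF_HEADROOM */

	int main(void)
	{
		unsigned int data_room = 2048;                /* assumed pool cfg */
		unsigned int bsizepkt = data_room - HEADROOM; /* 1920 B per buf */
		unsigned int maxdesc = IPV4_MAX_PKT_LEN / bsizepkt; /* 34 */

		/* 34 >= 16 -> RSCCTL[n].MAXDESC gets the "16" encoding;
		 * 16 * 1920 B = 30720 B, well under the 64 KB - 1 limit on
		 * MAXDESC * SRRCTL.BSIZEPKT. */
		printf("maxdesc=%u -> IXGBE_RSCCTL_MAXDESC_16\n", maxdesc);
		return 0;
	}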
> >>> +/**
> >>> + * ixgbe_set_ivar - Setup the correct IVAR register for a
> >>> particular MSIX
> >>> + * interrupt
> >>> + *
> >>> + * (Taken from FreeBSD tree)
> >>> + * (yes this is all very magic and confusing :)
> >>> + *
> >>> + * @dev port handle
> >>> + * @entry the register array entry
> >>> + * @vector the MSIX vector for this queue
> >>> + * @type RX/TX/MISC
> >>> + */
> >>> +static void
> >>> +ixgbe_set_ivar(struct rte_eth_dev *dev, u8 entry, u8 vector, s8 type)
> >>> +{
> >>> + struct ixgbe_hw *hw =
> >>> IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> >>> + u32 ivar, index;
> >>> +
> >>> + vector |= IXGBE_IVAR_ALLOC_VAL;
> >>> +
> >>> + switch (hw->mac.type) {
> >>> +
> >>> + case ixgbe_mac_82598EB:
> >>> + if (type == -1)
> >>> + entry = IXGBE_IVAR_OTHER_CAUSES_INDEX;
> >>> + else
> >>> + entry += (type * 64);
> >>> + index = (entry >> 2) & 0x1F;
> >>> + ivar = IXGBE_READ_REG(hw, IXGBE_IVAR(index));
> >>> + ivar &= ~(0xFF << (8 * (entry & 0x3)));
> >>> + ivar |= (vector << (8 * (entry & 0x3)));
> >>> + IXGBE_WRITE_REG(hw, IXGBE_IVAR(index), ivar);
> >>> + break;
> >>> +
> >>> + case ixgbe_mac_82599EB:
> >>> + case ixgbe_mac_X540:
> >>> + if (type == -1) { /* MISC IVAR */
> >>> + index = (entry & 1) * 8;
> >>> + ivar = IXGBE_READ_REG(hw, IXGBE_IVAR_MISC);
> >>> + ivar &= ~(0xFF << index);
> >>> + ivar |= (vector << index);
> >>> + IXGBE_WRITE_REG(hw, IXGBE_IVAR_MISC, ivar);
> >>> + } else { /* RX/TX IVARS */
> >>> + index = (16 * (entry & 1)) + (8 * type);
> >>> + ivar = IXGBE_READ_REG(hw, IXGBE_IVAR(entry >> 1));
> >>> + ivar &= ~(0xFF << index);
> >>> + ivar |= (vector << index);
> >>> + IXGBE_WRITE_REG(hw, IXGBE_IVAR(entry >> 1), ivar);
> >>> + }
> >>> +
> >>> + break;
> >>> +
> >>> + default:
> >>> + break;
> >>> + }
> >>> +}
> >>> +
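
To unpack the "magic" a little for the 82599/x540 RX/TX branch: each 32-bit
IVAR register packs four 8-bit vector entries, two queues per register. A
tiny sketch of the index arithmetic, with a hypothetical queue number:

	#include <stdio.h>

	int main(void)
	{
		unsigned int entry = 5; /* hypothetical queue/register entry */
		unsigned int type = 0;  /* 0 = RX, 1 = TX (as in ixgbe_set_ivar) */

		unsigned int bit_index = (16 * (entry & 1)) + (8 * type); /* 16 */
		unsigned int reg_index = entry >> 1;                      /* 2  */

		/* Queue 5's RX vector lands in IVAR(2), bits [23:16]. */
		printf("entry %u (RX) -> IVAR(%u), bits [%u:%u]\n",
		       entry, reg_index, bit_index + 7, bit_index);
		return 0;
	}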
> >>> void ixgbe_set_rx_function(struct rte_eth_dev *dev)
> >>> {
> >>> struct ixgbe_hw *hw =
> >>> IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> >>> @@ -3545,7 +3977,24 @@ void ixgbe_set_rx_function(struct rte_eth_dev
> >>> *dev)
> >>> hw->rx_vec_allowed = false;
> >>> }
> >>>
> >>> - if (dev->data->scattered_rx) {
> >>> + /*
> >>> + * Initialize the appropriate LRO callback.
> >>> + *
> >>> + * If all queues satisfy the bulk allocation preconditions
> >>> + * (hw->rx_bulk_alloc_allowed is TRUE) then we may use bulk
> >>> allocation.
> >>> + * Otherwise use a single allocation version.
> >>> + */
> >>> + if (dev->data->lro) {
> >>> + if (hw->rx_bulk_alloc_allowed) {
> >>> + PMD_INIT_LOG(INFO, "LRO is requested. Using a bulk "
> >>> + "allocation version");
> >>> + dev->rx_pkt_burst = ixgbe_recv_pkts_lro_bulk_alloc;
> >>> + } else {
> >>> + PMD_INIT_LOG(INFO, "LRO is requested. Using a single "
> >>> + "allocation version");
> >>> + dev->rx_pkt_burst = ixgbe_recv_pkts_lro_single_alloc;
> >>> + }
> >>> + } else if (dev->data->scattered_rx) {
> >>> /*
> >>> * Set the non-LRO scattered callback: there are Vector and
> >>> * single allocation versions.
> >>> @@ -3595,6 +4044,149 @@ void ixgbe_set_rx_function(struct
> >>> rte_eth_dev *dev)
> >>> }
> >>> }
> >>>
> >>> +/**
> >>> + * ixgbe_set_rsc - configure RSC related port HW registers
> >>> + *
> >>> + * Configures the port's RSC related registers according to
> >>> chapter 4.6.7.2
> >>> + * of the 82599 Spec (x540 configuration is virtually the same).
> >>> + *
> >>> + * @dev port handle
> >>> + *
> >>> + * Returns 0 in case of success or a non-zero error code
> >>> + */
> >>> +static int
> >>> +ixgbe_set_rsc(struct rte_eth_dev *dev)
> >>> +{
> >>> + struct rte_eth_rxmode *rx_conf = &dev->data->dev_conf.rxmode;
> >>> + struct ixgbe_hw *hw =
> >>> IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> >>> + struct rte_eth_dev_info dev_info = { 0 };
> >>> + bool rsc_capable = false;
> >>> + uint16_t i;
> >>> + uint32_t rdrxctl;
> >>> +
> >>> + /* Sanity check */
> >>> + dev->dev_ops->dev_infos_get(dev, &dev_info);
> >>> + if (dev_info.rx_offload_capa & DEV_RX_OFFLOAD_TCP_LRO)
> >>> + rsc_capable = true;
> >>> +
> >>> + if (!rsc_capable && rx_conf->enable_lro) {
> >>> + PMD_INIT_LOG(CRIT, "LRO is requested on HW that doesn't "
> >>> + "support it");
> >>> + return -EINVAL;
> >>> + }
> >>> +
> >>> + /* RSC global configuration (chapter 4.6.7.2.1 of 82599 Spec) */
> >>> +
> >>> + if (!rx_conf->hw_strip_crc && rx_conf->enable_lro) {
> >>> + /*
> >>> + * According to chapter 4.6.7.2.1 of the Spec Rev.
> >>> + * 3.0 RSC configuration requires HW CRC stripping being
> >>> + * enabled. If user requested both HW CRC stripping off
> >>> + * and RSC on - return an error.
> >>> + */
> >>> + PMD_INIT_LOG(CRIT, "LRO can't be enabled when HW CRC "
> >>> + "is disabled");
> >>> + return -EINVAL;
> >>> + }
> >>> +
> >>> + /* RFCTL configuration */
> >>> + if (rsc_capable) {
> >>> + uint32_t rfctl = IXGBE_READ_REG(hw, IXGBE_RFCTL);
> >>> + if (rx_conf->enable_lro)
> >>> + /*
> >>> + * Since NFS packet coalescing is not supported - clear
> >>> + * RFCTL.NFSW_DIS and RFCTL.NFSR_DIS when RSC is
> >>> + * enabled.
> >>> + */
> >>> + rfctl &= ~(IXGBE_RFCTL_RSC_DIS | IXGBE_RFCTL_NFSW_DIS |
> >>> + IXGBE_RFCTL_NFSR_DIS);
> >>> + else
> >>> + rfctl |= IXGBE_RFCTL_RSC_DIS;
> >>> +
> >>> + IXGBE_WRITE_REG(hw, IXGBE_RFCTL, rfctl);
> >>> + }
> >>> +
> >>> + /* If LRO hasn't been requested - we are done here. */
> >>> + if (!rx_conf->enable_lro)
> >>> + return 0;
> >>> +
> >>> + /* Set RDRXCTL.RSCACKC bit */
> >>> + rdrxctl = IXGBE_READ_REG(hw, IXGBE_RDRXCTL);
> >>> + rdrxctl |= IXGBE_RDRXCTL_RSCACKC;
> >>> + IXGBE_WRITE_REG(hw, IXGBE_RDRXCTL, rdrxctl);
> >>> +
> >>> + /* Per-queue RSC configuration (chapter 4.6.7.2.2 of 82599
> >>> Spec) */
> >>> + for (i = 0; i < dev->data->nb_rx_queues; i++) {
> >>> + struct ixgbe_rx_queue *rxq = dev->data->rx_queues[i];
> >>> + uint32_t srrctl =
> >>> + IXGBE_READ_REG(hw, IXGBE_SRRCTL(rxq->reg_idx));
> >>> + uint32_t rscctl =
> >>> + IXGBE_READ_REG(hw, IXGBE_RSCCTL(rxq->reg_idx));
> >>> + uint32_t psrtype =
> >>> + IXGBE_READ_REG(hw, IXGBE_PSRTYPE(rxq->reg_idx));
> >>> + uint32_t eitr =
> >>> + IXGBE_READ_REG(hw, IXGBE_EITR(rxq->reg_idx));
> >>> +
> >>> + /*
> >>> + * ixgbe PMD doesn't support header-split at the moment.
> >>> + *
> >>> + * Following chapter 4.6.7.2.1 of the 82599/x540
> >>> + * Spec, if RSC is enabled the SRRCTL[n].BSIZEHEADER
> >>> + * should be configured even if header split is not
> >>> + * enabled. We will configure it to 128 bytes, following
> >>> + * the recommendation in the spec.
> >>> + */
> >>> + srrctl &= ~IXGBE_SRRCTL_BSIZEHDR_MASK;
> >>> + srrctl |= (128 << IXGBE_SRRCTL_BSIZEHDRSIZE_SHIFT) &
> >>> + IXGBE_SRRCTL_BSIZEHDR_MASK;
> >>> +
> >>> + /*
> >>> + * TODO: Consider setting the Receive Descriptor Minimum
> >>> + * Threshold Size for an RSC case. This is not an obviously
> >>> + * beneficial option, but one worth considering...
> >>> + */
> >>> +
> >>> + rscctl |= IXGBE_RSCCTL_RSCEN;
> >>> + rscctl |= ixgbe_get_rscctl_maxdesc(rxq->mb_pool);
> >>> + psrtype |= IXGBE_PSRTYPE_TCPHDR;
> >>> +
> >>> + /*
> >>> + * RSC: Set ITR interval corresponding to 2K ints/s.
> >>> + *
> >>> + * Full-sized RSC aggregations for a 10Gb/s link will
> >>> + * arrive at about a 20K aggregations/s rate.
> >>> + *
> >>> + * A 2K ints/s rate will cause only 10% of the
> >>> + * aggregations to be closed due to interrupt timer
> >>> + * expiration in a wire-speed streaming case.
> >>> + *
> >>> + * For a sparse streaming case this setting will yield
> >>> + * at most 500us latency for a single RSC aggregation.
> >>> + */
> >>> + eitr &= ~IXGBE_EITR_ITR_INT_MASK;
> >>> + eitr |= IXGBE_EITR_INTERVAL_US(500) | IXGBE_EITR_CNT_WDIS;
> >>> +
> >>> + IXGBE_WRITE_REG(hw, IXGBE_SRRCTL(rxq->reg_idx), srrctl);
> >>> + IXGBE_WRITE_REG(hw, IXGBE_RSCCTL(rxq->reg_idx), rscctl);
> >>> + IXGBE_WRITE_REG(hw, IXGBE_PSRTYPE(rxq->reg_idx), psrtype);
> >>> + IXGBE_WRITE_REG(hw, IXGBE_EITR(rxq->reg_idx), eitr);
> >>> +
> >>> + /*
> >>> + * RSC requires the mapping of the queue to the
> >>> + * interrupt vector.
> >>> + */
> >>> + ixgbe_set_ivar(dev, rxq->reg_idx, i, 0);
> >>> +
> >>> + rxq->rsc_en = 1;
> >>> + }
> >>> +
> >>> + dev->data->lro = 1;
> >>> +
> >>> + PMD_INIT_LOG(INFO, "enabling LRO mode");
> >>> +
> >>> + return 0;
> >>> +}
> >>> +
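
Given the two sanity checks above (RSC-capable HW, CRC stripping on), a
port configuration that would pass ixgbe_set_rsc() looks roughly like this
sketch; enable_lro is the field added by this patch, and the queue counts
and max_rx_pkt_len are hypothetical:

	#include <rte_ethdev.h>

	static int
	configure_lro_port(uint8_t port_id)
	{
		struct rte_eth_conf conf = {
			.rxmode = {
				.enable_lro     = 1,    /* request RSC/LRO */
				.hw_strip_crc   = 1,    /* required by RSC */
				.jumbo_frame    = 1,    /* assumed, so clusters
							 * may exceed 1518 B */
				.max_rx_pkt_len = 9000, /* hypothetical */
			},
		};

		/* One RX and one TX queue, hypothetical. */
		return rte_eth_dev_configure(port_id, 1, 1, &conf);
	}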
> >>> /*
> >>> * Initializes Receive Unit.
> >>> */
> >>> @@ -3615,6 +4207,7 @@ ixgbe_dev_rx_init(struct rte_eth_dev *dev)
> >>> uint16_t buf_size;
> >>> uint16_t i;
> >>> struct rte_eth_rxmode *rx_conf = &dev->data->dev_conf.rxmode;
> >>> + int rc;
> >>>
> >>> PMD_INIT_FUNC_TRACE();
> >>> hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> >>> @@ -3724,6 +4317,7 @@ ixgbe_dev_rx_init(struct rte_eth_dev *dev)
> >>> RTE_PKTMBUF_HEADROOM);
> >>> srrctl |= ((buf_size >> IXGBE_SRRCTL_BSIZEPKT_SHIFT) &
> >>> IXGBE_SRRCTL_BSIZEPKT_MASK);
> >>> +
> >>> IXGBE_WRITE_REG(hw, IXGBE_SRRCTL(rxq->reg_idx), srrctl);
> >>>
> >>> buf_size = (uint16_t) ((srrctl &
> >>> IXGBE_SRRCTL_BSIZEPKT_MASK) <<
> >>> @@ -3738,8 +4332,6 @@ ixgbe_dev_rx_init(struct rte_eth_dev *dev)
> >>> if (rx_conf->enable_scatter)
> >>> dev->data->scattered_rx = 1;
> >>>
> >>> - ixgbe_set_rx_function(dev);
> >>> -
> >>> /*
> >>> * Device configured with multiple RX queues.
> >>> */
> >>> @@ -3770,6 +4362,12 @@ ixgbe_dev_rx_init(struct rte_eth_dev *dev)
> >>> IXGBE_WRITE_REG(hw, IXGBE_RDRXCTL, rdrxctl);
> >>> }
> >>>
> >>> + rc = ixgbe_set_rsc(dev);
> >>> + if (rc)
> >>> + return rc;
> >>> +
> >>> + ixgbe_set_rx_function(dev);
> >>> +
> >>> return 0;
> >>> }
> >>>
> >>> diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.h
> >>> b/lib/librte_pmd_ixgbe/ixgbe_rxtx.h
> >>> index 3937cf6..120bbbd 100644
> >>> --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.h
> >>> +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.h
> >>> @@ -79,6 +79,10 @@ struct ixgbe_rx_entry {
> >>> struct rte_mbuf *mbuf; /**< mbuf associated with RX
> >>> descriptor. */
> >>> };
> >>>
> >>> +struct ixgbe_rsc_entry {
> >>> + struct rte_mbuf *fbuf; /**< First segment of the fragmented
> >>> packet. */
> >>> +};
> >>> +
> >>> /**
> >>> * Structure associated with each descriptor of the TX ring of a
> >>> TX queue.
> >>> */
> >>> @@ -105,6 +109,7 @@ struct ixgbe_rx_queue {
> >>> volatile uint32_t *rdt_reg_addr; /**< RDT register address. */
> >>> volatile uint32_t *rdh_reg_addr; /**< RDH register address. */
> >>> struct ixgbe_rx_entry *sw_ring; /**< address of RX software
> >>> ring. */
> >>> + struct ixgbe_rsc_entry *sw_rsc_ring; /**< address of RSC
> >>> software ring. */
> >>> struct rte_mbuf *pkt_first_seg; /**< First segment of current
> >>> packet. */
> >>> struct rte_mbuf *pkt_last_seg; /**< Last segment of current
> >>> packet. */
> >>> uint64_t mbuf_initializer; /**< value to init mbufs */
> >>> @@ -126,6 +131,7 @@ struct ixgbe_rx_queue {
> >>> uint8_t port_id; /**< Device port identifier. */
> >>> uint8_t crc_len; /**< 0 if CRC stripped, 4
> >>> otherwise. */
> >>> uint8_t drop_en; /**< If not 0, set
> >>> SRRCTL.Drop_En. */
> >>> + uint8_t rsc_en; /**< If not 0, RSC is enabled. */
> >>> uint8_t rx_deferred_start; /**< not in global dev
> >>> start. */
> >>> #ifdef RTE_LIBRTE_IXGBE_RX_ALLOW_BULK_ALLOC
> >>> /** need to alloc dummy mbuf, for wraparound when scanning hw
> >>> ring */
> >>> --
> >>> 2.1.0
> >
* Re: [dpdk-dev] [PATCH v8 3/3] ixgbe: Add LRO support
2015-03-31 10:25 [dpdk-dev] [PATCH v8 3/3] ixgbe: Add LRO support Ananyev, Konstantin
@ 2015-03-31 11:40 ` Vlad Zolotarov
0 siblings, 0 replies; 6+ messages in thread
From: Vlad Zolotarov @ 2015-03-31 11:40 UTC (permalink / raw)
To: Ananyev, Konstantin, dev
On 03/31/15 13:25, Ananyev, Konstantin wrote:
>
>> -----Original Message-----
>> From: Vlad Zolotarov [mailto:vladz@cloudius-systems.com]
>> Sent: Monday, March 30, 2015 4:57 PM
>> To: Ananyev, Konstantin; dev@dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH v8 3/3] ixgbe: Add LRO support
>>
>>
>>
>> On 03/30/15 18:37, Vlad Zolotarov wrote:
>>>
>>> On 03/30/15 17:18, Ananyev, Konstantin wrote:
>>>> Hi Vlad,
>>>>
>>>>> -----Original Message-----
>>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Vlad Zolotarov
>>>>> Sent: Wednesday, March 18, 2015 5:52 PM
>>>>> To: dev@dpdk.org
>>>>> Subject: [dpdk-dev] [PATCH v8 3/3] ixgbe: Add LRO support
>>>>>
>>>>> [... patch changelog and earlier hunks trimmed; quoted in full in the first message above ...]
>>>>> diff --git a/lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h
>>>>> b/lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h
>>>>> index 9a66370..4998627 100644
>>>>> --- a/lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h
>>>>> +++ b/lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h
>>>>> @@ -234,8 +234,14 @@ struct ixgbe_thermal_sensor_data {
>>>>> #define IXGBE_EITR(_i) (((_i) <= 23) ? (0x00820 + ((_i) *
>>>>> 4)) : \
>>>>> (0x012300 + (((_i) - 24) * 4)))
>>>>> #define IXGBE_EITR_ITR_INT_MASK 0x00000FF8
>>>>> +#define IXGBE_EITR_ITR_INT_SHIFT 3
>>>>> #define IXGBE_EITR_LLI_MOD 0x00008000
>>>>> #define IXGBE_EITR_CNT_WDIS 0x80000000
>>>>> +/* EITR Interval is in 2048ns units for 1G and 10G links */
>>>>> +#define IXGBE_EITR_INTERVAL_UNIT_NS 2048
>>>> Why is that? Why not just 2us, as the 82599 spec says?
>>>> And then:
>>>> +#define IXGBE_EITR_INTERVAL_US(us) (((us) /
>>>> IXGBE_EITR_INTERVAL_UNIT_US << IXGBE_EITR_ITR_INT_SHIFT) &
>>>> IXGBE_EITR_ITR_INT_MASK)
>>>> ?
>>> Because the spec doesn't say 2us but rather 2.048us - see chapter
>>> 8.2.4.5.16 of the x540 spec.
>> The most confusing part is that the 82599 Rev. 3.0 spec does say 2us.
> I see what you mean now.
> Indeed it's a bit strange that the ITR_Interval units differ between the 82599 and X540 specs.
> We sent a question about this to the HW guys, but don't know when there will be an answer.
> I suppose we can stick with the 2.048 us case for now.
> From my understanding, even if 82599 uses 2us, the difference in max timeout would be pretty small: 500 us vs 488 us
> and shouldn't cause any performance drop.
> I wonder, did you try your latest LRO changes on 82599?
> If yes, did you see any difference in performance vs the previous version, when a hardcoded value for 2us was used?
You are right about the negligible difference, if it even exists (i.e. if
this is not just a small mistake in the 82599 spec).
Unfortunately all 10G Intel NICs we have in our lab are x540 NICs.
However, I doubt I could have noticed the difference even if I had
82599 NICs, because I don't have a very sophisticated LRO test
suite here and only checked LRO's basic functionality with a
multi-socket TCP streaming test. In this scenario the timer above
influences only about 10% of the aggregations (as I explained in
the code), and if the timeout drops to 488us that becomes 10.25%
of the aggregations. So, I doubt the influence of this difference on
the performance could easily be separated from the statistical
noise... ;)
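
For the record, the 10% / 10.25% figures follow directly from the rates in
the code comment (~20K full-sized aggregations/s at 10Gb/s); a quick sketch
of the arithmetic:

	#include <stdio.h>

	int main(void)
	{
		double aggs_per_sec = 20000.0;    /* wire-speed estimate from the patch */
		double ints_500us = 1.0 / 500e-6; /* 2000 timer expirations/s  */
		double ints_488us = 1.0 / 488e-6; /* ~2049 timer expirations/s */

		/* Fraction of aggregations closed by timer expiry, not EOP. */
		printf("@500us: %.2f%%\n", 100.0 * ints_500us / aggs_per_sec); /* 10.00 */
		printf("@488us: %.2f%%\n", 100.0 * ints_488us / aggs_per_sec); /* 10.25 */
		return 0;
	}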
> Konstantin
>
>> The x540 spec I have is dated March 2014 and the 82599 spec is dated
>> November 2014, but according to the changelog the EITR description has
>> never been updated. Therefore I conclude that the spec for the more
>> recent device (x540) holds the most up-to-date content.
>>
>> Pls., confirm.
>>
>> thanks,
>> vlad
>>
>>>>> +#define IXGBE_EITR_INTERVAL_US(us) \
>>>>> + (((us) * 1000 / IXGBE_EITR_INTERVAL_UNIT_NS <<
>>>>> IXGBE_EITR_ITR_INT_SHIFT) & \
>>>>> + IXGBE_EITR_ITR_INT_MASK)
>>>> Unfortunately we are not supposed to change files under ixgbe
>>>> subfirectory (except ixgbe_osdep.*).
>>>> These files comes from ixgbe shared code and we treat them as read-only.
>>>> You need to move these new macros you added into ixgbe_ethedev.h.
>>> Aha... Ok. No problem.
>>>
>>>> All other things look good to me.
>>>> Konstantin
>>>>
>>>>> #define IXGBE_IVAR(_i) (0x00900 + ((_i) * 4)) /* 24 at
>>>>> 0x900-0x960 */
>>>>> #define IXGBE_IVAR_MISC 0x00A00 /* misc MSI-X interrupt
>>>>> causes */
>>>>> #define IXGBE_EITRSEL 0x00894
>>>>> diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
>>>>> b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
>>>>> index 92d75db..923692e 100644
>>>>> --- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
>>>>> +++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
>>>>> @@ -1645,6 +1645,7 @@ ixgbe_dev_stop(struct rte_eth_dev *dev)
>>>>>
>>>>> /* Clear stored conf */
>>>>> dev->data->scattered_rx = 0;
>>>>> + dev->data->lro = 0;
>>>>>
>>>>> /* Clear recorded link status */
>>>>> memset(&link, 0, sizeof(link));
>>>>> @@ -2013,6 +2014,16 @@ ixgbe_dev_info_get(struct rte_eth_dev *dev,
>>>>> struct rte_eth_dev_info *dev_info)
>>>>> DEV_RX_OFFLOAD_IPV4_CKSUM |
>>>>> DEV_RX_OFFLOAD_UDP_CKSUM |
>>>>> DEV_RX_OFFLOAD_TCP_CKSUM;
>>>>> +
>>>>> + /*
>>>>> + * RSC is only supported by 82599 and x540 PF devices in a
>>>>> non-SR-IOV
>>>>> + * mode.
>>>>> + */
>>>>> + if ((hw->mac.type == ixgbe_mac_82599EB ||
>>>>> + hw->mac.type == ixgbe_mac_X540) &&
>>>>> + !RTE_ETH_DEV_SRIOV(dev).active)
>>>>> + dev_info->rx_offload_capa |= DEV_RX_OFFLOAD_TCP_LRO;
>>>>> +
>>>>> dev_info->tx_offload_capa =
>>>>> DEV_TX_OFFLOAD_VLAN_INSERT |
>>>>> DEV_TX_OFFLOAD_IPV4_CKSUM |
>>>>> diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.h
>>>>> b/lib/librte_pmd_ixgbe/ixgbe_ethdev.h
>>>>> index ffe3471..7050340 100644
>>>>> --- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.h
>>>>> +++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.h
>>>>> @@ -344,6 +344,11 @@ uint16_t ixgbe_recv_pkts(void *rx_queue, struct
>>>>> rte_mbuf **rx_pkts,
>>>>> uint16_t ixgbe_recv_scattered_pkts(void *rx_queue,
>>>>> struct rte_mbuf **rx_pkts, uint16_t nb_pkts);
>>>>>
>>>>> +uint16_t ixgbe_recv_pkts_lro_single_alloc(void *rx_queue,
>>>>> + struct rte_mbuf **rx_pkts, uint16_t nb_pkts);
>>>>> +uint16_t ixgbe_recv_pkts_lro_bulk_alloc(void *rx_queue,
>>>>> + struct rte_mbuf **rx_pkts, uint16_t nb_pkts);
>>>>> +
>>>>> uint16_t ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
>>>>> uint16_t nb_pkts);
>>>>>
>>>>> diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
>>>>> b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
>>>>> index a08ae6a..daa4c45 100644
>>>>> --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
>>>>> +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
>>>>> @@ -70,6 +70,7 @@
>>>>> #include <rte_sctp.h>
>>>>> #include <rte_string_fns.h>
>>>>> #include <rte_errno.h>
>>>>> +#include <rte_ip.h>
>>>>>
>>>>> #include "ixgbe_logs.h"
>>>>> #include "ixgbe/ixgbe_api.h"
>>>>> @@ -1422,6 +1423,295 @@ ixgbe_fill_cluster_head_buf(
>>>>> }
>>>>> }
>>>>>
>>>>> +/**
>>>>> + * ixgbe_recv_pkts_lro - receive handler for and LRO case.
>>>>> + *
>>>>> + * @rx_queue Rx queue handle
>>>>> + * @rx_pkts table of received packets
>>>>> + * @nb_pkts size of rx_pkts table
>>>>> + * @bulk_alloc if TRUE bulk allocation is used for a HW ring refilling
>>>>> + *
>>>>> + * Handles the Rx HW ring completions when RSC feature is
>>>>> configured. Uses an
>>>>> + * additional ring of ixgbe_rsc_entry's that will hold the relevant
>>>>> RSC info.
>>>>> + *
>>>>> + * We use the same logic as in Lunux and in FreeBSD ixgbe drivers:
>>>>> + * 1) When non-EOP RSC completion arrives:
>>>>> + * a) Update the HEAD of the current RSC aggregation cluster
>>>>> with the new
>>>>> + * segment's data length.
>>>>> + * b) Set the "next" pointer of the current segment to point to
>>>>> the segment
>>>>> + * at the NEXTP index.
>>>>> + * c) Pass the HEAD of RSC aggregation cluster on to the next
>>>>> NEXTP entry
>>>>> + * in the sw_rsc_ring.
>>>>> + * 2) When EOP arrives we just update the cluster's total length
>>>>> and offload
>>>>> + * flags and deliver the cluster up to the upper layers. In our
>>>>> case - put it
>>>>> + * in the rx_pkts table.
>>>>> + *
>>>>> + * Returns the number of received packets/clusters (according to
>>>>> the "bulk
>>>>> + * receive" interface).
>>>>> + */
>>>>> +static inline uint16_t
>>>>> +ixgbe_recv_pkts_lro(void *rx_queue, struct rte_mbuf **rx_pkts,
>>>>> uint16_t nb_pkts,
>>>>> + bool bulk_alloc)
>>>>> +{
>>>>> + struct ixgbe_rx_queue *rxq = rx_queue;
>>>>> + volatile union ixgbe_adv_rx_desc *rx_ring = rxq->rx_ring;
>>>>> + struct ixgbe_rx_entry *sw_ring = rxq->sw_ring;
>>>>> + struct ixgbe_rsc_entry *sw_rsc_ring = rxq->sw_rsc_ring;
>>>>> + uint16_t rx_id = rxq->rx_tail;
>>>>> + uint16_t nb_rx = 0;
>>>>> + uint16_t nb_hold = rxq->nb_rx_hold;
>>>>> + uint16_t prev_id = rxq->rx_tail;
>>>>> +
>>>>> + while (nb_rx < nb_pkts) {
>>>>> + bool eop;
>>>>> + struct ixgbe_rx_entry *rxe;
>>>>> + struct ixgbe_rsc_entry *rsc_entry;
>>>>> + struct ixgbe_rsc_entry *next_rsc_entry;
>>>>> + struct ixgbe_rx_entry *next_rxe;
>>>>> + struct rte_mbuf *first_seg;
>>>>> + struct rte_mbuf *rxm;
>>>>> + struct rte_mbuf *nmb;
>>>>> + union ixgbe_adv_rx_desc rxd;
>>>>> + uint16_t data_len;
>>>>> + uint16_t next_id;
>>>>> + volatile union ixgbe_adv_rx_desc *rxdp;
>>>>> + uint32_t staterr;
>>>>> +
>>>>> +next_desc:
>>>>> + /*
>>>>> + * The code in this whole file uses the volatile pointer to
>>>>> + * ensure the read ordering of the status and the rest of the
>>>>> + * descriptor fields (on the compiler level only!!!). This
>>>>> is so
>>>>> + * UGLY - why not to just use the compiler barrier instead?
>>>>> DPDK
>>>>> + * even has the rte_compiler_barrier() for that.
>>>>> + *
>>>>> + * But most importantly this is just wrong because this
>>>>> doesn't
>>>>> + * ensure memory ordering in a general case at all. For
>>>>> + * instance, DPDK is supposed to work on Power CPUs where
>>>>> + * compiler barrier may just not be enough!
>>>>> + *
>>>>> + * I tried to write only this function properly to have a
>>>>> + * starting point (as a part of an LRO/RSC series) but the
>>>>> + * compiler cursed at me when I tried to cast away the
>>>>> + * "volatile" from rx_ring (yes, it's volatile too!!!). So,
>>>>> I'm
>>>>> + * keeping it the way it is for now.
>>>>> + *
>>>>> + * The code in this file is broken in so many other places and
>>>>> + * will just not work on a big endian CPU anyway therefore the
>>>>> + * lines below will have to be revisited together with the
>>>>> rest
>>>>> + * of the ixgbe PMD.
>>>>> + *
>>>>> + * TODO:
>>>>> + * - Get rid of "volatile" crap and let the compiler do its
>>>>> + * job.
>>>>> + * - Use the proper memory barrier (rte_rmb()) to ensure
>>>>> the
>>>>> + * memory ordering below.
>>>>> + */
>>>>> + rxdp = &rx_ring[rx_id];
>>>>> + staterr = rte_le_to_cpu_32(rxdp->wb.upper.status_error);
>>>>> +
>>>>> + if (!(staterr & IXGBE_RXDADV_STAT_DD))
>>>>> + break;
>>>>> +
>>>>> + rxd = *rxdp;
>>>>> +
>>>>> + PMD_RX_LOG(DEBUG, "port_id=%u queue_id=%u rx_id=%u "
>>>>> + "staterr=0x%x data_len=%u",
>>>>> + rxq->port_id, rxq->queue_id, rx_id, staterr,
>>>>> + rte_le_to_cpu_16(rxd.wb.upper.length));
>>>>> +
>>>>> + if (!bulk_alloc) {
>>>>> + nmb = rte_rxmbuf_alloc(rxq->mb_pool);
>>>>> + if (nmb == NULL) {
>>>>> + PMD_RX_LOG(DEBUG, "RX mbuf alloc failed "
>>>>> + "port_id=%u queue_id=%u",
>>>>> + rxq->port_id, rxq->queue_id);
>>>>> +
>>>>> + rte_eth_devices[rxq->port_id].data->
>>>>> + rx_mbuf_alloc_failed++;
>>>>> + break;
>>>>> + }
>>>>> + } else if (nb_hold > rxq->rx_free_thresh) {
>>>>> + uint16_t next_rdt = rxq->rx_free_trigger;
>>>>> +
>>>>> + if (!ixgbe_rx_alloc_bufs(rxq, false)) {
>>>>> + rte_wmb();
>>>>> + IXGBE_PCI_REG_WRITE(rxq->rdt_reg_addr,
>>>>> + next_rdt);
>>>>> + nb_hold -= rxq->rx_free_thresh;
>>>>> + } else {
>>>>> + PMD_RX_LOG(DEBUG, "RX bulk alloc failed "
>>>>> + "port_id=%u queue_id=%u",
>>>>> + rxq->port_id, rxq->queue_id);
>>>>> +
>>>>> + rte_eth_devices[rxq->port_id].data->
>>>>> + rx_mbuf_alloc_failed++;
>>>>> + break;
>>>>> + }
>>>>> + }
>>>>> +
>>>>> + nb_hold++;
>>>>> + rxe = &sw_ring[rx_id];
>>>>> + eop = staterr & IXGBE_RXDADV_STAT_EOP;
>>>>> +
>>>>> + next_id = rx_id + 1;
>>>>> + if (next_id == rxq->nb_rx_desc)
>>>>> + next_id = 0;
>>>>> +
>>>>> + /* Prefetch next mbuf while processing current one. */
>>>>> + rte_ixgbe_prefetch(sw_ring[next_id].mbuf);
>>>>> +
>>>>> + /*
>>>>> + * When next RX descriptor is on a cache-line boundary,
>>>>> + * prefetch the next 4 RX descriptors and the next 4 pointers
>>>>> + * to mbufs.
>>>>> + */
>>>>> + if ((next_id & 0x3) == 0) {
>>>>> + rte_ixgbe_prefetch(&rx_ring[next_id]);
>>>>> + rte_ixgbe_prefetch(&sw_ring[next_id]);
>>>>> + }
>>>>> +
>>>>> + rxm = rxe->mbuf;
>>>>> +
>>>>> + if (!bulk_alloc) {
>>>>> + __le64 dma =
>>>>> + rte_cpu_to_le_64(RTE_MBUF_DATA_DMA_ADDR_DEFAULT(nmb));
>>>>> + /*
>>>>> + * Update RX descriptor with the physical address of the
>>>>> + * new data buffer of the new allocated mbuf.
>>>>> + */
>>>>> + rxe->mbuf = nmb;
>>>>> +
>>>>> + rxm->data_off = RTE_PKTMBUF_HEADROOM;
>>>>> + rxdp->read.hdr_addr = dma;
>>>>> + rxdp->read.pkt_addr = dma;
>>>>> + } else
>>>>> + rxe->mbuf = NULL;
>>>>> +
>>>>> + /*
>>>>> + * Set data length & data buffer address of mbuf.
>>>>> + */
>>>>> + data_len = rte_le_to_cpu_16(rxd.wb.upper.length);
>>>>> + rxm->data_len = data_len;
>>>>> +
>>>>> + if (!eop) {
>>>>> + uint16_t nextp_id;
>>>>> + /*
>>>>> + * Get next descriptor index:
>>>>> + * - For RSC it's in the NEXTP field.
>>>>> + * - For a scattered packet - it's just a following
>>>>> + * descriptor.
>>>>> + */
>>>>> + if (ixgbe_rsc_count(&rxd))
>>>>> + nextp_id =
>>>>> + (staterr & IXGBE_RXDADV_NEXTP_MASK) >>
>>>>> + IXGBE_RXDADV_NEXTP_SHIFT;
>>>>> + else
>>>>> + nextp_id = next_id;
>>>>> +
>>>>> + next_rsc_entry = &sw_rsc_ring[nextp_id];
>>>>> + next_rxe = &sw_ring[nextp_id];
>>>>> + rte_ixgbe_prefetch(next_rxe);
>>>>> + }
>>>>> +
>>>>> + rsc_entry = &sw_rsc_ring[rx_id];
>>>>> + first_seg = rsc_entry->fbuf;
>>>>> + rsc_entry->fbuf = NULL;
>>>>> +
>>>>> + /*
>>>>> + * If this is the first buffer of the received packet,
>>>>> + * set the pointer to the first mbuf of the packet and
>>>>> + * initialize its context.
>>>>> + * Otherwise, update the total length and the number of
>>>>> segments
>>>>> + * of the current scattered packet, and update the pointer to
>>>>> + * the last mbuf of the current packet.
>>>>> + */
>>>>> + if (first_seg == NULL) {
>>>>> + first_seg = rxm;
>>>>> + first_seg->pkt_len = data_len;
>>>>> + first_seg->nb_segs = 1;
>>>>> + } else {
>>>>> + first_seg->pkt_len += data_len;
>>>>> + first_seg->nb_segs++;
>>>>> + }
>>>>> +
>>>>> + prev_id = rx_id;
>>>>> + rx_id = next_id;
>>>>> +
>>>>> + /*
>>>>> + * If this is not the last buffer of the received packet,
>>>>> update
>>>>> + * the pointer to the first mbuf at the NEXTP entry in the
>>>>> + * sw_rsc_ring and continue to parse the RX ring.
>>>>> + */
>>>>> + if (!eop) {
>>>>> + rxm->next = next_rxe->mbuf;
>>>>> + next_rsc_entry->fbuf = first_seg;
>>>>> + goto next_desc;
>>>>> + }
>>>>> +
>>>>> + /*
>>>>> + * This is the last buffer of the received packet - return
>>>>> + * the current cluster to the user.
>>>>> + */
>>>>> + rxm->next = NULL;
>>>>> +
>>>>> + /* Initialize the first mbuf of the returned packet */
>>>>> + ixgbe_fill_cluster_head_buf(first_seg, &rxd, rxq->port_id,
>>>>> + staterr);
>>>>> +
>>>>> + /* Prefetch data of first segment, if configured to do so. */
>>>>> + rte_packet_prefetch((char *)first_seg->buf_addr +
>>>>> + first_seg->data_off);
>>>>> +
>>>>> + /*
>>>>> + * Store the mbuf address into the next entry of the array
>>>>> + * of returned packets.
>>>>> + */
>>>>> + rx_pkts[nb_rx++] = first_seg;
>>>>> + }
>>>>> +
>>>>> + /*
>>>>> + * Record index of the next RX descriptor to probe.
>>>>> + */
>>>>> + rxq->rx_tail = rx_id;
>>>>> +
>>>>> + /*
>>>>> + * If the number of free RX descriptors is greater than the RX free
>>>>> + * threshold of the queue, advance the Receive Descriptor Tail (RDT)
>>>>> + * register.
>>>>> + * Update the RDT with the value of the last processed RX descriptor
>>>>> + * minus 1, to guarantee that the RDT register is never equal to the
>>>>> + * RDH register, which creates a "full" ring situation from the
>>>>> + * hardware point of view...
>>>>> + */
>>>>> + if (!bulk_alloc && nb_hold > rxq->rx_free_thresh) {
>>>>> + PMD_RX_LOG(DEBUG, "port_id=%u queue_id=%u rx_tail=%u "
>>>>> + "nb_hold=%u nb_rx=%u",
>>>>> + rxq->port_id, rxq->queue_id, rx_id, nb_hold, nb_rx);
>>>>> +
>>>>> + rte_wmb();
>>>>> + IXGBE_PCI_REG_WRITE(rxq->rdt_reg_addr, prev_id);
>>>>> + nb_hold = 0;
>>>>> + }
>>>>> +
>>>>> + rxq->nb_rx_hold = nb_hold;
>>>>> + return nb_rx;
>>>>> +}
>>>>> +
>>>>> +uint16_t
>>>>> +ixgbe_recv_pkts_lro_single_alloc(void *rx_queue, struct rte_mbuf **rx_pkts,
>>>>> + uint16_t nb_pkts)
>>>>> +{
>>>>> + return ixgbe_recv_pkts_lro(rx_queue, rx_pkts, nb_pkts, false);
>>>>> +}
>>>>> +
>>>>> +uint16_t
>>>>> +ixgbe_recv_pkts_lro_bulk_alloc(void *rx_queue, struct rte_mbuf **rx_pkts,
>>>>> + uint16_t nb_pkts)
>>>>> +{
>>>>> + return ixgbe_recv_pkts_lro(rx_queue, rx_pkts, nb_pkts, true);
>>>>> +}
>>>>> +
>>>>> uint16_t
>>>>> ixgbe_recv_scattered_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
>>>>> uint16_t nb_pkts)
>>>>> @@ -2005,6 +2295,31 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev *dev,
>>>>> return (0);
>>>>> }
>>>>>
>>>>> +/**
>>>>> + * ixgbe_free_rsc_cluster - free the not-yet-completed RSC cluster
>>>>> + *
>>>>> + * The "next" pointer of the last segment of (not-yet-completed) RSC clusters
>>>>> + * in the sw_rsc_ring is not set to NULL but rather points to the next
>>>>> + * mbuf of this RSC aggregation (that has not been completed yet and still
>>>>> + * resides on the HW ring). So, instead of calling rte_pktmbuf_free() we
>>>>> + * will just free the first "nb_segs" segments of the cluster explicitly by
>>>>> + * calling rte_pktmbuf_free_seg().
>>>>> + *
>>>>> + * @m RSC cluster head
>>>>> + */
>>>>> +static void
>>>>> +ixgbe_free_rsc_cluster(struct rte_mbuf *m)
>>>>> +{
>>>>> + uint8_t i, nb_segs = m->nb_segs;
>>>>> + struct rte_mbuf *next_seg;
>>>>> +
>>>>> + for (i = 0; i < nb_segs; i++) {
>>>>> + next_seg = m->next;
>>>>> + rte_pktmbuf_free_seg(m);
>>>>> + m = next_seg;
>>>>> + }
>>>>> +}
>>>>> +
>>>>> static void
>>>>> ixgbe_rx_queue_release_mbufs(struct ixgbe_rx_queue *rxq)
>>>>> {
>>>>> @@ -2028,6 +2343,13 @@ ixgbe_rx_queue_release_mbufs(struct ixgbe_rx_queue *rxq)
>>>>> }
>>>>> #endif
>>>>> }
>>>>> +
>>>>> + if (rxq->sw_rsc_ring)
>>>>> + for (i = 0; i < rxq->nb_rx_desc; i++)
>>>>> + if (rxq->sw_rsc_ring[i].fbuf) {
>>>>> + ixgbe_free_rsc_cluster(rxq->sw_rsc_ring[i].fbuf);
>>>>> + rxq->sw_rsc_ring[i].fbuf = NULL;
>>>>> + }
>>>>> }
>>>>>
>>>>> static void
>>>>> @@ -2036,6 +2358,7 @@ ixgbe_rx_queue_release(struct ixgbe_rx_queue *rxq)
>>>>> if (rxq != NULL) {
>>>>> ixgbe_rx_queue_release_mbufs(rxq);
>>>>> rte_free(rxq->sw_ring);
>>>>> + rte_free(rxq->sw_rsc_ring);
>>>>> rte_free(rxq);
>>>>> }
>>>>> }
>>>>> @@ -2158,6 +2481,7 @@ ixgbe_reset_rx_queue(struct ixgbe_hw *hw, struct ixgbe_rx_queue *rxq)
>>>>> rxq->nb_rx_hold = 0;
>>>>> rxq->pkt_first_seg = NULL;
>>>>> rxq->pkt_last_seg = NULL;
>>>>> + rxq->rsc_en = 0;
>>>>> }
>>>>>
>>>>> int
>>>>> @@ -2172,6 +2496,14 @@ ixgbe_dev_rx_queue_setup(struct rte_eth_dev *dev,
>>>>> struct ixgbe_rx_queue *rxq;
>>>>> struct ixgbe_hw *hw;
>>>>> uint16_t len;
>>>>> + struct rte_eth_dev_info dev_info = { 0 };
>>>>> + struct rte_eth_rxmode *dev_rx_mode = &dev->data->dev_conf.rxmode;
>>>>> + bool rsc_requested = false;
>>>>> +
>>>>> + dev->dev_ops->dev_infos_get(dev, &dev_info);
>>>>> + if ((dev_info.rx_offload_capa & DEV_RX_OFFLOAD_TCP_LRO) &&
>>>>> + dev_rx_mode->enable_lro)
>>>>> + rsc_requested = true;
>>>>>
>>>>> PMD_INIT_FUNC_TRACE();
>>>>> hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
>>>>> @@ -2277,12 +2609,27 @@ ixgbe_dev_rx_queue_setup(struct rte_eth_dev *dev,
>>>>> rxq->sw_ring = rte_zmalloc_socket("rxq->sw_ring",
>>>>> sizeof(struct ixgbe_rx_entry) * len,
>>>>> RTE_CACHE_LINE_SIZE, socket_id);
>>>>> - if (rxq->sw_ring == NULL) {
>>>>> + if (!rxq->sw_ring) {
>>>>> ixgbe_rx_queue_release(rxq);
>>>>> return (-ENOMEM);
>>>>> }
>>>>> - PMD_INIT_LOG(DEBUG, "sw_ring=%p hw_ring=%p dma_addr=0x%"PRIx64,
>>>>> - rxq->sw_ring, rxq->rx_ring, rxq->rx_ring_phys_addr);
>>>>> +
>>>>> + if (rsc_requested) {
>>>>> + rxq->sw_rsc_ring =
>>>>> + rte_zmalloc_socket("rxq->sw_rsc_ring",
>>>>> + sizeof(struct ixgbe_rsc_entry) * len,
>>>>> + RTE_CACHE_LINE_SIZE, socket_id);
>>>>> + if (!rxq->sw_rsc_ring) {
>>>>> + ixgbe_rx_queue_release(rxq);
>>>>> + return (-ENOMEM);
>>>>> + }
>>>>> + } else
>>>>> + rxq->sw_rsc_ring = NULL;
>>>>> +
>>>>> + PMD_INIT_LOG(DEBUG, "sw_ring=%p sw_rsc_ring=%p hw_ring=%p "
>>>>> + "dma_addr=0x%"PRIx64,
>>>>> + rxq->sw_ring, rxq->sw_rsc_ring, rxq->rx_ring,
>>>>> + rxq->rx_ring_phys_addr);
>>>>>
>>>>> if (!rte_is_power_of_2(nb_desc)) {
>>>>> PMD_INIT_LOG(DEBUG, "queue[%d] doesn't meet Vector Rx "
>>>>> @@ -3527,6 +3874,91 @@ ixgbe_dev_mq_tx_configure(struct rte_eth_dev *dev)
>>>>> return 0;
>>>>> }
>>>>>
>>>>> +/**
>>>>> + * ixgbe_get_rscctl_maxdesc - Calculate the RSCCTL[n].MAXDESC for PF
>>>>> + *
>>>>> + * Return the RSCCTL[n].MAXDESC for 82599 and x540 PF devices according to the
>>>>> + * spec rev. 3.0 chapter 8.2.3.8.13.
>>>>> + *
>>>>> + * @pool Memory pool of the Rx queue
>>>>> + */
>>>>> +static inline uint32_t
>>>>> +ixgbe_get_rscctl_maxdesc(struct rte_mempool *pool)
>>>>> +{
>>>>> + struct rte_pktmbuf_pool_private *mp_priv = rte_mempool_get_priv(pool);
>>>>> +
>>>>> + /* MAXDESC * SRRCTL.BSIZEPKT must not exceed 64 KB minus one */
>>>>> + uint16_t maxdesc =
>>>>> + IPV4_MAX_PKT_LEN /
>>>>> + (mp_priv->mbuf_data_room_size - RTE_PKTMBUF_HEADROOM);
>>>>> +
>>>>> + if (maxdesc >= 16)
>>>>> + return IXGBE_RSCCTL_MAXDESC_16;
>>>>> + else if (maxdesc >= 8)
>>>>> + return IXGBE_RSCCTL_MAXDESC_8;
>>>>> + else if (maxdesc >= 4)
>>>>> + return IXGBE_RSCCTL_MAXDESC_4;
>>>>> + else
>>>>> + return IXGBE_RSCCTL_MAXDESC_1;
>>>>> +}
>>>>> +
>>>>> +/**
>>>>> + * ixgbe_set_ivar - Setup the correct IVAR register for a particular MSIX
>>>>> + * interrupt
>>>>> + *
>>>>> + * (Taken from FreeBSD tree)
>>>>> + * (yes this is all very magic and confusing :)
>>>>> + *
>>>>> + * @dev port handle
>>>>> + * @entry the register array entry
>>>>> + * @vector the MSIX vector for this queue
>>>>> + * @type RX/TX/MISC
>>>>> + */
>>>>> +static void
>>>>> +ixgbe_set_ivar(struct rte_eth_dev *dev, u8 entry, u8 vector, s8 type)
>>>>> +{
>>>>> + struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
>>>>> + u32 ivar, index;
>>>>> +
>>>>> + vector |= IXGBE_IVAR_ALLOC_VAL;
>>>>> +
>>>>> + switch (hw->mac.type) {
>>>>> +
>>>>> + case ixgbe_mac_82598EB:
>>>>> + if (type == -1)
>>>>> + entry = IXGBE_IVAR_OTHER_CAUSES_INDEX;
>>>>> + else
>>>>> + entry += (type * 64);
>>>>> + index = (entry >> 2) & 0x1F;
>>>>> + ivar = IXGBE_READ_REG(hw, IXGBE_IVAR(index));
>>>>> + ivar &= ~(0xFF << (8 * (entry & 0x3)));
>>>>> + ivar |= (vector << (8 * (entry & 0x3)));
>>>>> + IXGBE_WRITE_REG(hw, IXGBE_IVAR(index), ivar);
>>>>> + break;
>>>>> +
>>>>> + case ixgbe_mac_82599EB:
>>>>> + case ixgbe_mac_X540:
>>>>> + if (type == -1) { /* MISC IVAR */
>>>>> + index = (entry & 1) * 8;
>>>>> + ivar = IXGBE_READ_REG(hw, IXGBE_IVAR_MISC);
>>>>> + ivar &= ~(0xFF << index);
>>>>> + ivar |= (vector << index);
>>>>> + IXGBE_WRITE_REG(hw, IXGBE_IVAR_MISC, ivar);
>>>>> + } else { /* RX/TX IVARS */
>>>>> + index = (16 * (entry & 1)) + (8 * type);
>>>>> + ivar = IXGBE_READ_REG(hw, IXGBE_IVAR(entry >> 1));
>>>>> + ivar &= ~(0xFF << index);
>>>>> + ivar |= (vector << index);
>>>>> + IXGBE_WRITE_REG(hw, IXGBE_IVAR(entry >> 1), ivar);
>>>>> + }
>>>>> +
>>>>> + break;
>>>>> +
>>>>> + default:
>>>>> + break;
>>>>> + }
>>>>> +}
>>>>> +
>>>>> void ixgbe_set_rx_function(struct rte_eth_dev *dev)
>>>>> {
>>>>> struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
>>>>> @@ -3545,7 +3977,24 @@ void ixgbe_set_rx_function(struct rte_eth_dev *dev)
>>>>> hw->rx_vec_allowed = false;
>>>>> }
>>>>>
>>>>> - if (dev->data->scattered_rx) {
>>>>> + /*
>>>>> + * Initialize the appropriate LRO callback.
>>>>> + *
>>>>> + * If all queues satisfy the bulk allocation preconditions
>>>>> + * (hw->rx_bulk_alloc_allowed is TRUE) then we may use bulk allocation.
>>>>> + * Otherwise use a single allocation version.
>>>>> + */
>>>>> + if (dev->data->lro) {
>>>>> + if (hw->rx_bulk_alloc_allowed) {
>>>>> + PMD_INIT_LOG(INFO, "LRO is requested. Using a bulk "
>>>>> + "allocation version");
>>>>> + dev->rx_pkt_burst = ixgbe_recv_pkts_lro_bulk_alloc;
>>>>> + } else {
>>>>> + PMD_INIT_LOG(INFO, "LRO is requested. Using a single "
>>>>> + "allocation version");
>>>>> + dev->rx_pkt_burst = ixgbe_recv_pkts_lro_single_alloc;
>>>>> + }
>>>>> + } else if (dev->data->scattered_rx) {
>>>>> /*
>>>>> * Set the non-LRO scattered callback: there are Vector and
>>>>> * single allocation versions.
>>>>> @@ -3595,6 +4044,149 @@ void ixgbe_set_rx_function(struct rte_eth_dev *dev)
>>>>> }
>>>>> }
>>>>>
>>>>> +/**
>>>>> + * ixgbe_set_rsc - configure RSC related port HW registers
>>>>> + *
>>>>> + * Configures the port's RSC related registers according to the 4.6.7.2 chapter
>>>>> + * of 82599 Spec (x540 configuration is virtually the same).
>>>>> + *
>>>>> + * @dev port handle
>>>>> + *
>>>>> + * Returns 0 in case of success or a non-zero error code
>>>>> + */
>>>>> +static int
>>>>> +ixgbe_set_rsc(struct rte_eth_dev *dev)
>>>>> +{
>>>>> + struct rte_eth_rxmode *rx_conf = &dev->data->dev_conf.rxmode;
>>>>> + struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
>>>>> + struct rte_eth_dev_info dev_info = { 0 };
>>>>> + bool rsc_capable = false;
>>>>> + uint16_t i;
>>>>> + uint32_t rdrxctl;
>>>>> +
>>>>> + /* Sanity check */
>>>>> + dev->dev_ops->dev_infos_get(dev, &dev_info);
>>>>> + if (dev_info.rx_offload_capa & DEV_RX_OFFLOAD_TCP_LRO)
>>>>> + rsc_capable = true;
>>>>> +
>>>>> + if (!rsc_capable && rx_conf->enable_lro) {
>>>>> + PMD_INIT_LOG(CRIT, "LRO is requested on HW that doesn't "
>>>>> + "support it");
>>>>> + return -EINVAL;
>>>>> + }
>>>>> +
>>>>> + /* RSC global configuration (chapter 4.6.7.2.1 of 82599 Spec) */
>>>>> +
>>>>> + if (!rx_conf->hw_strip_crc && rx_conf->enable_lro) {
>>>>> + /*
>>>>> + * According to chapter 4.6.7.2.1 of the Spec Rev.
>>>>> + * 3.0, RSC configuration requires HW CRC stripping to be
>>>>> + * enabled. If the user requested both HW CRC stripping off
>>>>> + * and RSC on - return an error.
>>>>> + */
>>>>> + PMD_INIT_LOG(CRIT, "LRO can't be enabled when HW CRC "
>>>>> + "is disabled");
>>>>> + return -EINVAL;
>>>>> + }
>>>>> +
>>>>> + /* RFCTL configuration */
>>>>> + if (rsc_capable) {
>>>>> + uint32_t rfctl = IXGBE_READ_REG(hw, IXGBE_RFCTL);
>>>>> + if (rx_conf->enable_lro)
>>>>> + /*
>>>>> + * Since NFS packets coalescing is not supported - clear
>>>>> + * RFCTL.NFSW_DIS and RFCTL.NFSR_DIS when RSC is
>>>>> + * enabled.
>>>>> + */
>>>>> + rfctl &= ~(IXGBE_RFCTL_RSC_DIS | IXGBE_RFCTL_NFSW_DIS |
>>>>> + IXGBE_RFCTL_NFSR_DIS);
>>>>> + else
>>>>> + rfctl |= IXGBE_RFCTL_RSC_DIS;
>>>>> +
>>>>> + IXGBE_WRITE_REG(hw, IXGBE_RFCTL, rfctl);
>>>>> + }
>>>>> +
>>>>> + /* If LRO hasn't been requested - we are done here. */
>>>>> + if (!rx_conf->enable_lro)
>>>>> + return 0;
>>>>> +
>>>>> + /* Set RDRXCTL.RSCACKC bit */
>>>>> + rdrxctl = IXGBE_READ_REG(hw, IXGBE_RDRXCTL);
>>>>> + rdrxctl |= IXGBE_RDRXCTL_RSCACKC;
>>>>> + IXGBE_WRITE_REG(hw, IXGBE_RDRXCTL, rdrxctl);
>>>>> +
>>>>> + /* Per-queue RSC configuration (chapter 4.6.7.2.2 of 82599 Spec) */
>>>>> + for (i = 0; i < dev->data->nb_rx_queues; i++) {
>>>>> + struct ixgbe_rx_queue *rxq = dev->data->rx_queues[i];
>>>>> + uint32_t srrctl =
>>>>> + IXGBE_READ_REG(hw, IXGBE_SRRCTL(rxq->reg_idx));
>>>>> + uint32_t rscctl =
>>>>> + IXGBE_READ_REG(hw, IXGBE_RSCCTL(rxq->reg_idx));
>>>>> + uint32_t psrtype =
>>>>> + IXGBE_READ_REG(hw, IXGBE_PSRTYPE(rxq->reg_idx));
>>>>> + uint32_t eitr =
>>>>> + IXGBE_READ_REG(hw, IXGBE_EITR(rxq->reg_idx));
>>>>> +
>>>>> + /*
>>>>> + * ixgbe PMD doesn't support header-split at the moment.
>>>>> + *
>>>>> + * Following the 4.6.7.2.1 chapter of the 82599/x540
>>>>> + * Spec if RSC is enabled the SRRCTL[n].BSIZEHEADER
>>>>> + * should be configured even if header split is not
>>>>> + * enabled. We will configure it to 128 bytes following the
>>>>> + * recommendation in the spec.
>>>>> + */
>>>>> + srrctl &= ~IXGBE_SRRCTL_BSIZEHDR_MASK;
>>>>> + srrctl |= (128 << IXGBE_SRRCTL_BSIZEHDRSIZE_SHIFT) &
>>>>> + IXGBE_SRRCTL_BSIZEHDR_MASK;
>>>>> +
>>>>> + /*
>>>>> + * TODO: Consider setting the Receive Descriptor Minimum
>>>>> + * Threshold Size for an RSC case. This is not an obviously
>>>>> + * beneficial option but one worth considering...
>>>>> + */
>>>>> +
>>>>> + rscctl |= IXGBE_RSCCTL_RSCEN;
>>>>> + rscctl |= ixgbe_get_rscctl_maxdesc(rxq->mb_pool);
>>>>> + psrtype |= IXGBE_PSRTYPE_TCPHDR;
>>>>> +
>>>>> + /*
>>>>> + * RSC: Set ITR interval corresponding to 2K ints/s.
>>>>> + *
>>>>> + * Full-sized RSC aggregations for a 10Gb/s link will
>>>>> + * arrive at about 20K aggregation/s rate.
>>>>> + *
>>>>> + * A 2K ints/s rate will cause only 10% of the
>>>>> + * aggregations to be closed due to the interrupt timer
>>>>> + * expiration for a streaming-at-wire-speed case.
>>>>> + *
>>>>> + * For a sparse streaming case this setting will yield
>>>>> + * at most 500us latency for a single RSC aggregation.
>>>>> + */
>>>>> + eitr &= ~IXGBE_EITR_ITR_INT_MASK;
>>>>> + eitr |= IXGBE_EITR_INTERVAL_US(500) | IXGBE_EITR_CNT_WDIS;
>>>>> +
>>>>> + IXGBE_WRITE_REG(hw, IXGBE_SRRCTL(rxq->reg_idx), srrctl);
>>>>> + IXGBE_WRITE_REG(hw, IXGBE_RSCCTL(rxq->reg_idx), rscctl);
>>>>> + IXGBE_WRITE_REG(hw, IXGBE_PSRTYPE(rxq->reg_idx), psrtype);
>>>>> + IXGBE_WRITE_REG(hw, IXGBE_EITR(rxq->reg_idx), eitr);
>>>>> +
>>>>> + /*
>>>>> + * RSC requires the mapping of the queue to the
>>>>> + * interrupt vector.
>>>>> + */
>>>>> + ixgbe_set_ivar(dev, rxq->reg_idx, i, 0);
>>>>> +
>>>>> + rxq->rsc_en = 1;
>>>>> + }
>>>>> +
>>>>> + dev->data->lro = 1;
>>>>> +
>>>>> + PMD_INIT_LOG(INFO, "enabling LRO mode");
>>>>> +
>>>>> + return 0;
>>>>> +}
>>>>> +
>>>>> /*
>>>>> * Initializes Receive Unit.
>>>>> */
>>>>> @@ -3615,6 +4207,7 @@ ixgbe_dev_rx_init(struct rte_eth_dev *dev)
>>>>> uint16_t buf_size;
>>>>> uint16_t i;
>>>>> struct rte_eth_rxmode *rx_conf = &dev->data->dev_conf.rxmode;
>>>>> + int rc;
>>>>>
>>>>> PMD_INIT_FUNC_TRACE();
>>>>> hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
>>>>> @@ -3724,6 +4317,7 @@ ixgbe_dev_rx_init(struct rte_eth_dev *dev)
>>>>> RTE_PKTMBUF_HEADROOM);
>>>>> srrctl |= ((buf_size >> IXGBE_SRRCTL_BSIZEPKT_SHIFT) &
>>>>> IXGBE_SRRCTL_BSIZEPKT_MASK);
>>>>> +
>>>>> IXGBE_WRITE_REG(hw, IXGBE_SRRCTL(rxq->reg_idx), srrctl);
>>>>>
>>>>> buf_size = (uint16_t) ((srrctl & IXGBE_SRRCTL_BSIZEPKT_MASK) <<
>>>>> @@ -3738,8 +4332,6 @@ ixgbe_dev_rx_init(struct rte_eth_dev *dev)
>>>>> if (rx_conf->enable_scatter)
>>>>> dev->data->scattered_rx = 1;
>>>>>
>>>>> - ixgbe_set_rx_function(dev);
>>>>> -
>>>>> /*
>>>>> * Device configured with multiple RX queues.
>>>>> */
>>>>> @@ -3770,6 +4362,12 @@ ixgbe_dev_rx_init(struct rte_eth_dev *dev)
>>>>> IXGBE_WRITE_REG(hw, IXGBE_RDRXCTL, rdrxctl);
>>>>> }
>>>>>
>>>>> + rc = ixgbe_set_rsc(dev);
>>>>> + if (rc)
>>>>> + return rc;
>>>>> +
>>>>> + ixgbe_set_rx_function(dev);
>>>>> +
>>>>> return 0;
>>>>> }
>>>>>
>>>>> diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.h b/lib/librte_pmd_ixgbe/ixgbe_rxtx.h
>>>>> index 3937cf6..120bbbd 100644
>>>>> --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.h
>>>>> +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.h
>>>>> @@ -79,6 +79,10 @@ struct ixgbe_rx_entry {
>>>>> struct rte_mbuf *mbuf; /**< mbuf associated with RX descriptor. */
>>>>> };
>>>>>
>>>>> +struct ixgbe_rsc_entry {
>>>>> + struct rte_mbuf *fbuf; /**< First segment of the fragmented packet. */
>>>>> +};
>>>>> +
>>>>> /**
>>>>> * Structure associated with each descriptor of the TX ring of a TX queue.
>>>>> */
>>>>> @@ -105,6 +109,7 @@ struct ixgbe_rx_queue {
>>>>> volatile uint32_t *rdt_reg_addr; /**< RDT register address. */
>>>>> volatile uint32_t *rdh_reg_addr; /**< RDH register address. */
>>>>> struct ixgbe_rx_entry *sw_ring; /**< address of RX software ring. */
>>>>> + struct ixgbe_rsc_entry *sw_rsc_ring; /**< address of RSC software ring. */
>>>>> struct rte_mbuf *pkt_first_seg; /**< First segment of current packet. */
>>>>> struct rte_mbuf *pkt_last_seg; /**< Last segment of current packet. */
>>>>> uint64_t mbuf_initializer; /**< value to init mbufs */
>>>>> @@ -126,6 +131,7 @@ struct ixgbe_rx_queue {
>>>>> uint8_t port_id; /**< Device port identifier. */
>>>>> uint8_t crc_len; /**< 0 if CRC stripped, 4 otherwise. */
>>>>> uint8_t drop_en; /**< If not 0, set SRRCTL.Drop_En. */
>>>>> + uint8_t rsc_en; /**< If not 0, RSC is enabled. */
>>>>> uint8_t rx_deferred_start; /**< not in global dev start. */
>>>>> #ifdef RTE_LIBRTE_IXGBE_RX_ALLOW_BULK_ALLOC
>>>>> /** need to alloc dummy mbuf, for wraparound when scanning hw ring */
>>>>> --
>>>>> 2.1.0
^ permalink raw reply [flat|nested] 6+ messages in thread
* [dpdk-dev] [PATCH v8 0/3]: Add LRO support to ixgbe PMD
@ 2015-03-18 17:52 Vlad Zolotarov
2015-03-18 17:52 ` [dpdk-dev] [PATCH v8 3/3] ixgbe: Add LRO support Vlad Zolotarov
0 siblings, 1 reply; 6+ messages in thread
From: Vlad Zolotarov @ 2015-03-18 17:52 UTC (permalink / raw)
To: dev
This series adds the missing flow for enabling LRO in the ethdev layer and
adds support for this feature in the ixgbe PMD. The hope is that this
initiative will be picked up by an Intel developer who will add LRO support
to the other Intel PMDs.
The series starts with some cleanup work in the code that the final patch (the actual
addition of the LRO support) is going to touch, use, or change. There are still quite a few
issues left in the ixgbe PMD code, but they will have to be the subject of a different
series; I've left a few "TODO" remarks in the code.
The LRO ("RSC" in Intel's terminology) PMD completion handling code follows the same design
as the corresponding Linux and FreeBSD implementations: pass the aggregation cluster's HEAD
buffer on to the NEXTP entry of the software ring until EOP is met.
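In rough pseudo-C the per-descriptor step looks as follows (an illustrative sketch
only - the names below are simplified stand-ins for the actual patch code):

	seg = sw_ring[rx_id].mbuf;		/* segment of this descriptor */
	head = sw_rsc_ring[rx_id].fbuf;		/* aggregation HEAD, if any */

	if (head == NULL) {
		head = seg;			/* first segment opens a cluster */
	} else {
		head->pkt_len += seg->data_len;	/* grow the current cluster */
		head->nb_segs++;
	}

	if (!eop) {
		/* chain the segment and pass the HEAD on to the NEXTP entry */
		nextp = rsc ? NEXTP(desc) : rx_id + 1;
		seg->next = sw_ring[nextp].mbuf;
		sw_rsc_ring[nextp].fbuf = head;
	} else {
		seg->next = NULL;		/* cluster completed - deliver */
		rx_pkts[nb_rx++] = head;
	}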
HW configuration follows the corresponding specs: this feature is supported only by x540 and
82599 PF devices.
The feature has been tested with the seastar TCP stack with the following configuration on the Tx side:
- MTU: 400B
- 100 concurrent TCP connections.
The results were:
- Without LRO: total throughput: 0.12Gbps, coefficient of variation: 1.41%
- With LRO: total throughput: 8.21Gbps, coefficient of variation: 0.59%
This is roughly a 68x improvement.
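For reference, with this series applied an application opts in via the new rxmode
flag (a minimal sketch; error handling and the rest of the usual port/queue setup
are omitted):

	struct rte_eth_conf port_conf = { 0 };

	port_conf.rxmode.enable_lro = 1;	/* request LRO (RSC) */
	port_conf.rxmode.hw_strip_crc = 1;	/* RSC requires HW CRC stripping */
	rte_eth_dev_configure(port_id, nb_rxq, nb_txq, &port_conf);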
New in v8:
- Fixed the structs naming: igb_xxx -> ixgbe_xxx (some leftovers in PATCH2).
- Took the RSC configuration code from ixgbe_dev_rx_init() into a separate
function - ixgbe_set_rsc().
- Added some missing macros for HW configuration.
- Styling adjustments:
- Functions names.
- Functions descriptions.
- Reworked the ixgbe_free_rsc_cluster() code to make it more readable.
- Kill the HEADER_SPLIT flow in ixgbe_set_rsc() since it's not supported by
ixgbe PMD.
New in v7:
- Free not-yet-completed RSC aggregations in rte_eth_dev_stop() flow.
- Fixed rx_bulk_alloc_allowed and rx_vec_allowed initialization:
- Don't set them to FALSE in rte_eth_dev_stop() flow - the following
rte_eth_dev_start() will need them.
- Reset them to TRUE in rte_eth_dev_configure() and not in a probe() flow.
This will ensure the proper behaviour if the port is re-configured.
- Reset the sw_ring[].mbuf entry in a bulk allocation case.
This is needed for ixgbe_rx_queue_release_mbufs().
- _recv_pkts_lro(): added the missing memory barrier before RDT update in a
non-bulk allocation case.
- Don't allow RSC when device is configured in an SR-IOV mode.
New in v6:
- A fix for the typo in the "bug fixes" series that broke the compilation caused a
minor change in this follow-up series.
New in v5:
- Split the series into "bug fixes" and "all the rest" so that the former could be
integrated into a 2.0 release.
- Put the RTE_ETHDEV_HAS_LRO_SUPPORT definition at the beginning of rte_ethdev.h.
- Removed the "TODO: Remove me" comment near RTE_ETHDEV_HAS_LRO_SUPPORT.
New in v4:
- Remove CONFIG_RTE_ETHDEV_LRO_SUPPORT from config/common_linuxapp.
- Define RTE_ETHDEV_HAS_LRO_SUPPORT in rte_ethdev.h.
- As a result of "ixgbe: check rxd number to avoid mbuf leak" (352078e8e) Vector Rx
had to get the same treatment as Rx Bulk Alloc (see PATCH4 for more details).
New in v3:
- ixgbe_rx_alloc_bufs(): Always reset refcnt of the buffers to 1. Otherwise rte_pktmbuf_free()
won't free them.
New in v2:
- Removed rte_eth_dev_data.lro_bulk_alloc and added ixgbe_hw.rx_bulk_alloc_allowed
instead.
- Unified the rx_pkt_bulk callback setting (a separate new patch).
- Fixed a few styling and spelling issues.
Vlad Zolotarov (3):
ixgbe: Cleanups
ixgbe: Code refactoring
ixgbe: Add LRO support
lib/librte_ether/rte_ethdev.h | 9 +-
lib/librte_net/rte_ip.h | 3 +
lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h | 6 +
lib/librte_pmd_ixgbe/ixgbe_ethdev.c | 11 +
lib/librte_pmd_ixgbe/ixgbe_ethdev.h | 5 +
lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 766 +++++++++++++++++++++++++++++---
lib/librte_pmd_ixgbe/ixgbe_rxtx.h | 6 +
7 files changed, 737 insertions(+), 69 deletions(-)
--
2.1.0
^ permalink raw reply [flat|nested] 6+ messages in thread
* [dpdk-dev] [PATCH v8 3/3] ixgbe: Add LRO support
2015-03-18 17:52 [dpdk-dev] [PATCH v8 0/3]: Add LRO support to ixgbe PMD Vlad Zolotarov
@ 2015-03-18 17:52 ` Vlad Zolotarov
2015-03-30 14:18 ` Ananyev, Konstantin
0 siblings, 1 reply; 6+ messages in thread
From: Vlad Zolotarov @ 2015-03-18 17:52 UTC (permalink / raw)
To: dev
- Only x540 and 82599 devices support LRO.
- Add the appropriate HW configuration.
- Add RSC aware rx_pkt_burst() handlers:
- Implemented bulk allocation and non-bulk allocation versions.
- Add LRO-specific fields to rte_eth_rxmode, to rte_eth_dev_data
and to ixgbe_rx_queue.
- Use the appropriate handler when LRO is requested.
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
---
New in v8:
- Took the RSC configuration code from ixgbe_dev_rx_init() into a separate
function - ixgbe_set_rsc().
- Added some missing macros for HW configuration.
- Styling adjustments:
- Functions names.
- Functions descriptions.
- Reworked the ixgbe_free_rsc_cluster() code to make it more readable.
- Kill the HEADER_SPLIT flow in ixgbe_set_rsc() since it's not supported by
ixgbe PMD.
New in v7:
- Free not-yet-completed RSC aggregations in rte_eth_dev_stop() flow.
- Reset the sw_ring[].mbuf entry in a bulk allocation case.
This is needed for ixgbe_rx_queue_release_mbufs().
- _recv_pkts_lro(): added the missing memory barrier before RDT update in a
non-bulk allocation case.
- Don't allow RSC when device is configured in an SR-IOV mode.
New in v5:
- Put the RTE_ETHDEV_HAS_LRO_SUPPORT definition at the beginning of rte_ethdev.h.
- Removed the "TODO: Remove me" comment near RTE_ETHDEV_HAS_LRO_SUPPORT.
New in v4:
- Define RTE_ETHDEV_HAS_LRO_SUPPORT in rte_ethdev.h instead of
RTE_ETHDEV_LRO_SUPPORT defined in config/common_linuxapp.
New in v2:
- Removed rte_eth_dev_data.lro_bulk_alloc.
- Fixed a few styling and spelling issues.
---
lib/librte_ether/rte_ethdev.h | 9 +-
lib/librte_net/rte_ip.h | 3 +
lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h | 6 +
lib/librte_pmd_ixgbe/ixgbe_ethdev.c | 11 +
lib/librte_pmd_ixgbe/ixgbe_ethdev.h | 5 +
lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 610 +++++++++++++++++++++++++++++++-
lib/librte_pmd_ixgbe/ixgbe_rxtx.h | 6 +
7 files changed, 642 insertions(+), 8 deletions(-)
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 21aa359..61dc49a 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -172,6 +172,9 @@ extern "C" {
#include <stdint.h>
+/* Use this macro to check if LRO API is supported */
+#define RTE_ETHDEV_HAS_LRO_SUPPORT
+
#include <rte_log.h>
#include <rte_interrupts.h>
#include <rte_pci.h>
@@ -320,14 +323,15 @@ struct rte_eth_rxmode {
enum rte_eth_rx_mq_mode mq_mode;
uint32_t max_rx_pkt_len; /**< Only used if jumbo_frame enabled. */
uint16_t split_hdr_size; /**< hdr buf size (header_split enabled).*/
- uint8_t header_split : 1, /**< Header Split enable. */
+ uint16_t header_split : 1, /**< Header Split enable. */
hw_ip_checksum : 1, /**< IP/UDP/TCP checksum offload enable. */
hw_vlan_filter : 1, /**< VLAN filter enable. */
hw_vlan_strip : 1, /**< VLAN strip enable. */
hw_vlan_extend : 1, /**< Extended VLAN enable. */
jumbo_frame : 1, /**< Jumbo Frame Receipt enable. */
hw_strip_crc : 1, /**< Enable CRC stripping by hardware. */
- enable_scatter : 1; /**< Enable scatter packets rx handler */
+ enable_scatter : 1, /**< Enable scatter packets rx handler */
+ enable_lro : 1; /**< Enable LRO */
};
/**
@@ -1515,6 +1519,7 @@ struct rte_eth_dev_data {
uint8_t port_id; /**< Device [external] port identifier. */
uint8_t promiscuous : 1, /**< RX promiscuous mode ON(1) / OFF(0). */
scattered_rx : 1, /**< RX of scattered packets is ON(1) / OFF(0) */
+ lro : 1, /**< RX LRO is ON(1) / OFF(0) */
all_multicast : 1, /**< RX all multicast mode ON(1) / OFF(0). */
dev_started : 1; /**< Device state: STARTED(1) / STOPPED(0). */
};
diff --git a/lib/librte_net/rte_ip.h b/lib/librte_net/rte_ip.h
index 64935d9..74c9ced 100644
--- a/lib/librte_net/rte_ip.h
+++ b/lib/librte_net/rte_ip.h
@@ -110,6 +110,9 @@ struct ipv4_hdr {
(((c) & 0xff) << 8) | \
((d) & 0xff))
+/** Maximal IPv4 packet length (including a header) */
+#define IPV4_MAX_PKT_LEN 65535
+
/** Internet header length mask for version_ihl field */
#define IPV4_HDR_IHL_MASK (0x0f)
/**
diff --git a/lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h b/lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h
index 9a66370..4998627 100644
--- a/lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h
+++ b/lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h
@@ -234,8 +234,14 @@ struct ixgbe_thermal_sensor_data {
#define IXGBE_EITR(_i) (((_i) <= 23) ? (0x00820 + ((_i) * 4)) : \
(0x012300 + (((_i) - 24) * 4)))
#define IXGBE_EITR_ITR_INT_MASK 0x00000FF8
+#define IXGBE_EITR_ITR_INT_SHIFT 3
#define IXGBE_EITR_LLI_MOD 0x00008000
#define IXGBE_EITR_CNT_WDIS 0x80000000
+/* EITR Interval is in 2048ns units for 1G and 10G link */
+#define IXGBE_EITR_INTERVAL_UNIT_NS 2048
+#define IXGBE_EITR_INTERVAL_US(us) \
+ (((us) * 1000 / IXGBE_EITR_INTERVAL_UNIT_NS << IXGBE_EITR_ITR_INT_SHIFT) & \
+ IXGBE_EITR_ITR_INT_MASK)
#define IXGBE_IVAR(_i) (0x00900 + ((_i) * 4)) /* 24 at 0x900-0x960 */
#define IXGBE_IVAR_MISC 0x00A00 /* misc MSI-X interrupt causes */
#define IXGBE_EITRSEL 0x00894
diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
index 92d75db..923692e 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
@@ -1645,6 +1645,7 @@ ixgbe_dev_stop(struct rte_eth_dev *dev)
/* Clear stored conf */
dev->data->scattered_rx = 0;
+ dev->data->lro = 0;
/* Clear recorded link status */
memset(&link, 0, sizeof(link));
@@ -2013,6 +2014,16 @@ ixgbe_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
DEV_RX_OFFLOAD_IPV4_CKSUM |
DEV_RX_OFFLOAD_UDP_CKSUM |
DEV_RX_OFFLOAD_TCP_CKSUM;
+
+ /*
+ * RSC is only supported by 82599 and x540 PF devices in a non-SR-IOV
+ * mode.
+ */
+ if ((hw->mac.type == ixgbe_mac_82599EB ||
+ hw->mac.type == ixgbe_mac_X540) &&
+ !RTE_ETH_DEV_SRIOV(dev).active)
+ dev_info->rx_offload_capa |= DEV_RX_OFFLOAD_TCP_LRO;
+
dev_info->tx_offload_capa =
DEV_TX_OFFLOAD_VLAN_INSERT |
DEV_TX_OFFLOAD_IPV4_CKSUM |
diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.h b/lib/librte_pmd_ixgbe/ixgbe_ethdev.h
index ffe3471..7050340 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.h
+++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.h
@@ -344,6 +344,11 @@ uint16_t ixgbe_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
uint16_t ixgbe_recv_scattered_pkts(void *rx_queue,
struct rte_mbuf **rx_pkts, uint16_t nb_pkts);
+uint16_t ixgbe_recv_pkts_lro_single_alloc(void *rx_queue,
+ struct rte_mbuf **rx_pkts, uint16_t nb_pkts);
+uint16_t ixgbe_recv_pkts_lro_bulk_alloc(void *rx_queue,
+ struct rte_mbuf **rx_pkts, uint16_t nb_pkts);
+
uint16_t ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
uint16_t nb_pkts);
diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
index a08ae6a..daa4c45 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
@@ -70,6 +70,7 @@
#include <rte_sctp.h>
#include <rte_string_fns.h>
#include <rte_errno.h>
+#include <rte_ip.h>
#include "ixgbe_logs.h"
#include "ixgbe/ixgbe_api.h"
@@ -1422,6 +1423,295 @@ ixgbe_fill_cluster_head_buf(
}
}
+/**
+ * ixgbe_recv_pkts_lro - receive handler for the LRO case.
+ *
+ * @rx_queue Rx queue handle
+ * @rx_pkts table of received packets
+ * @nb_pkts size of rx_pkts table
+ * @bulk_alloc if TRUE bulk allocation is used for HW ring refilling
+ *
+ * Handles the Rx HW ring completions when RSC feature is configured. Uses an
+ * additional ring of ixgbe_rsc_entry's that will hold the relevant RSC info.
+ *
+ * We use the same logic as in Linux and in FreeBSD ixgbe drivers:
+ * 1) When non-EOP RSC completion arrives:
+ * a) Update the HEAD of the current RSC aggregation cluster with the new
+ * segment's data length.
+ * b) Set the "next" pointer of the current segment to point to the segment
+ * at the NEXTP index.
+ * c) Pass the HEAD of RSC aggregation cluster on to the next NEXTP entry
+ * in the sw_rsc_ring.
+ * 2) When EOP arrives we just update the cluster's total length and offload
+ * flags and deliver the cluster up to the upper layers. In our case - put it
+ * in the rx_pkts table.
+ *
+ * Returns the number of received packets/clusters (according to the "bulk
+ * receive" interface).
+ */
+static inline uint16_t
+ixgbe_recv_pkts_lro(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts,
+ bool bulk_alloc)
+{
+ struct ixgbe_rx_queue *rxq = rx_queue;
+ volatile union ixgbe_adv_rx_desc *rx_ring = rxq->rx_ring;
+ struct ixgbe_rx_entry *sw_ring = rxq->sw_ring;
+ struct ixgbe_rsc_entry *sw_rsc_ring = rxq->sw_rsc_ring;
+ uint16_t rx_id = rxq->rx_tail;
+ uint16_t nb_rx = 0;
+ uint16_t nb_hold = rxq->nb_rx_hold;
+ uint16_t prev_id = rxq->rx_tail;
+
+ while (nb_rx < nb_pkts) {
+ bool eop;
+ struct ixgbe_rx_entry *rxe;
+ struct ixgbe_rsc_entry *rsc_entry;
+ struct ixgbe_rsc_entry *next_rsc_entry;
+ struct ixgbe_rx_entry *next_rxe;
+ struct rte_mbuf *first_seg;
+ struct rte_mbuf *rxm;
+ struct rte_mbuf *nmb;
+ union ixgbe_adv_rx_desc rxd;
+ uint16_t data_len;
+ uint16_t next_id;
+ volatile union ixgbe_adv_rx_desc *rxdp;
+ uint32_t staterr;
+
+next_desc:
+ /*
+ * The code in this whole file uses the volatile pointer to
+ * ensure the read ordering of the status and the rest of the
+ * descriptor fields (on the compiler level only!!!). This is so
+ * UGLY - why not just use the compiler barrier instead? DPDK
+ * even has the rte_compiler_barrier() for that.
+ *
+ * But most importantly this is just wrong because this doesn't
+ * ensure memory ordering in a general case at all. For
+ * instance, DPDK is supposed to work on Power CPUs where
+ * compiler barrier may just not be enough!
+ *
+ * I tried to write only this function properly to have a
+ * starting point (as a part of an LRO/RSC series) but the
+ * compiler cursed at me when I tried to cast away the
+ * "volatile" from rx_ring (yes, it's volatile too!!!). So, I'm
+ * keeping it the way it is for now.
+ *
+ * The code in this file is broken in so many other places and
+ * will just not work on a big endian CPU anyway therefore the
+ * lines below will have to be revisited together with the rest
+ * of the ixgbe PMD.
+ *
+ * TODO:
+ * - Get rid of "volatile" crap and let the compiler do its
+ * job.
+ * - Use the proper memory barrier (rte_rmb()) to ensure the
+ * memory ordering below.
+ */
+ rxdp = &rx_ring[rx_id];
+ staterr = rte_le_to_cpu_32(rxdp->wb.upper.status_error);
+
+ if (!(staterr & IXGBE_RXDADV_STAT_DD))
+ break;
+
+ rxd = *rxdp;
+
+ PMD_RX_LOG(DEBUG, "port_id=%u queue_id=%u rx_id=%u "
+ "staterr=0x%x data_len=%u",
+ rxq->port_id, rxq->queue_id, rx_id, staterr,
+ rte_le_to_cpu_16(rxd.wb.upper.length));
+
+ if (!bulk_alloc) {
+ nmb = rte_rxmbuf_alloc(rxq->mb_pool);
+ if (nmb == NULL) {
+ PMD_RX_LOG(DEBUG, "RX mbuf alloc failed "
+ "port_id=%u queue_id=%u",
+ rxq->port_id, rxq->queue_id);
+
+ rte_eth_devices[rxq->port_id].data->
+ rx_mbuf_alloc_failed++;
+ break;
+ }
+ } else if (nb_hold > rxq->rx_free_thresh) {
+ uint16_t next_rdt = rxq->rx_free_trigger;
+
+ if (!ixgbe_rx_alloc_bufs(rxq, false)) {
+ rte_wmb();
+ IXGBE_PCI_REG_WRITE(rxq->rdt_reg_addr,
+ next_rdt);
+ nb_hold -= rxq->rx_free_thresh;
+ } else {
+ PMD_RX_LOG(DEBUG, "RX bulk alloc failed "
+ "port_id=%u queue_id=%u",
+ rxq->port_id, rxq->queue_id);
+
+ rte_eth_devices[rxq->port_id].data->
+ rx_mbuf_alloc_failed++;
+ break;
+ }
+ }
+
+ nb_hold++;
+ rxe = &sw_ring[rx_id];
+ eop = staterr & IXGBE_RXDADV_STAT_EOP;
+
+ next_id = rx_id + 1;
+ if (next_id == rxq->nb_rx_desc)
+ next_id = 0;
+
+ /* Prefetch next mbuf while processing current one. */
+ rte_ixgbe_prefetch(sw_ring[next_id].mbuf);
+
+ /*
+ * When next RX descriptor is on a cache-line boundary,
+ * prefetch the next 4 RX descriptors and the next 4 pointers
+ * to mbufs.
+ */
+ if ((next_id & 0x3) == 0) {
+ rte_ixgbe_prefetch(&rx_ring[next_id]);
+ rte_ixgbe_prefetch(&sw_ring[next_id]);
+ }
+
+ rxm = rxe->mbuf;
+
+ if (!bulk_alloc) {
+ __le64 dma =
+ rte_cpu_to_le_64(RTE_MBUF_DATA_DMA_ADDR_DEFAULT(nmb));
+ /*
+ * Update RX descriptor with the physical address of the
+ * new data buffer of the new allocated mbuf.
+ */
+ rxe->mbuf = nmb;
+
+ rxm->data_off = RTE_PKTMBUF_HEADROOM;
+ rxdp->read.hdr_addr = dma;
+ rxdp->read.pkt_addr = dma;
+ } else
+ rxe->mbuf = NULL;
+
+ /*
+ * Set data length & data buffer address of mbuf.
+ */
+ data_len = rte_le_to_cpu_16(rxd.wb.upper.length);
+ rxm->data_len = data_len;
+
+ if (!eop) {
+ uint16_t nextp_id;
+ /*
+ * Get next descriptor index:
+ * - For RSC it's in the NEXTP field.
+ * - For a scattered packet - it's just a following
+ * descriptor.
+ */
+ if (ixgbe_rsc_count(&rxd))
+ nextp_id =
+ (staterr & IXGBE_RXDADV_NEXTP_MASK) >>
+ IXGBE_RXDADV_NEXTP_SHIFT;
+ else
+ nextp_id = next_id;
+
+ next_rsc_entry = &sw_rsc_ring[nextp_id];
+ next_rxe = &sw_ring[nextp_id];
+ rte_ixgbe_prefetch(next_rxe);
+ }
+
+ rsc_entry = &sw_rsc_ring[rx_id];
+ first_seg = rsc_entry->fbuf;
+ rsc_entry->fbuf = NULL;
+
+ /*
+ * If this is the first buffer of the received packet,
+ * set the pointer to the first mbuf of the packet and
+ * initialize its context.
+ * Otherwise, update the total length and the number of segments
+ * of the current scattered packet, and update the pointer to
+ * the last mbuf of the current packet.
+ */
+ if (first_seg == NULL) {
+ first_seg = rxm;
+ first_seg->pkt_len = data_len;
+ first_seg->nb_segs = 1;
+ } else {
+ first_seg->pkt_len += data_len;
+ first_seg->nb_segs++;
+ }
+
+ prev_id = rx_id;
+ rx_id = next_id;
+
+ /*
+ * If this is not the last buffer of the received packet, update
+ * the pointer to the first mbuf at the NEXTP entry in the
+ * sw_rsc_ring and continue to parse the RX ring.
+ */
+ if (!eop) {
+ rxm->next = next_rxe->mbuf;
+ next_rsc_entry->fbuf = first_seg;
+ goto next_desc;
+ }
+
+ /*
+ * This is the last buffer of the received packet - return
+ * the current cluster to the user.
+ */
+ rxm->next = NULL;
+
+ /* Initialize the first mbuf of the returned packet */
+ ixgbe_fill_cluster_head_buf(first_seg, &rxd, rxq->port_id,
+ staterr);
+
+ /* Prefetch data of first segment, if configured to do so. */
+ rte_packet_prefetch((char *)first_seg->buf_addr +
+ first_seg->data_off);
+
+ /*
+ * Store the mbuf address into the next entry of the array
+ * of returned packets.
+ */
+ rx_pkts[nb_rx++] = first_seg;
+ }
+
+ /*
+ * Record index of the next RX descriptor to probe.
+ */
+ rxq->rx_tail = rx_id;
+
+ /*
+ * If the number of free RX descriptors is greater than the RX free
+ * threshold of the queue, advance the Receive Descriptor Tail (RDT)
+ * register.
+ * Update the RDT with the value of the last processed RX descriptor
+ * minus 1, to guarantee that the RDT register is never equal to the
+ * RDH register, which creates a "full" ring situation from the
+ * hardware point of view...
+ */
+ if (!bulk_alloc && nb_hold > rxq->rx_free_thresh) {
+ PMD_RX_LOG(DEBUG, "port_id=%u queue_id=%u rx_tail=%u "
+ "nb_hold=%u nb_rx=%u",
+ rxq->port_id, rxq->queue_id, rx_id, nb_hold, nb_rx);
+
+ rte_wmb();
+ IXGBE_PCI_REG_WRITE(rxq->rdt_reg_addr, prev_id);
+ nb_hold = 0;
+ }
+
+ rxq->nb_rx_hold = nb_hold;
+ return nb_rx;
+}
+
+uint16_t
+ixgbe_recv_pkts_lro_single_alloc(void *rx_queue, struct rte_mbuf **rx_pkts,
+ uint16_t nb_pkts)
+{
+ return ixgbe_recv_pkts_lro(rx_queue, rx_pkts, nb_pkts, false);
+}
+
+uint16_t
+ixgbe_recv_pkts_lro_bulk_alloc(void *rx_queue, struct rte_mbuf **rx_pkts,
+ uint16_t nb_pkts)
+{
+ return ixgbe_recv_pkts_lro(rx_queue, rx_pkts, nb_pkts, true);
+}
+
uint16_t
ixgbe_recv_scattered_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
uint16_t nb_pkts)
@@ -2005,6 +2295,31 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev *dev,
return (0);
}
+/**
+ * ixgbe_free_rsc_cluster - free the not-yet-completed RSC cluster
+ *
+ * The "next" pointer of the last segment of (not-yet-completed) RSC clusters
+ * in the sw_rsc_ring is not set to NULL but rather points to the next
+ * mbuf of this RSC aggregation (that has not been completed yet and still
+ * resides on the HW ring). So, instead of calling rte_pktmbuf_free() we
+ * will just free the first "nb_segs" segments of the cluster explicitly by
+ * calling rte_pktmbuf_free_seg().
+ *
+ * @m RSC cluster head
+ */
+static void
+ixgbe_free_rsc_cluster(struct rte_mbuf *m)
+{
+ uint8_t i, nb_segs = m->nb_segs;
+ struct rte_mbuf *next_seg;
+
+ for (i = 0; i < nb_segs; i++) {
+ next_seg = m->next;
+ rte_pktmbuf_free_seg(m);
+ m = next_seg;
+ }
+}
+
static void
ixgbe_rx_queue_release_mbufs(struct ixgbe_rx_queue *rxq)
{
@@ -2028,6 +2343,13 @@ ixgbe_rx_queue_release_mbufs(struct ixgbe_rx_queue *rxq)
}
#endif
}
+
+ if (rxq->sw_rsc_ring)
+ for (i = 0; i < rxq->nb_rx_desc; i++)
+ if (rxq->sw_rsc_ring[i].fbuf) {
+ ixgbe_free_rsc_cluster(rxq->sw_rsc_ring[i].fbuf);
+ rxq->sw_rsc_ring[i].fbuf = NULL;
+ }
}
static void
@@ -2036,6 +2358,7 @@ ixgbe_rx_queue_release(struct ixgbe_rx_queue *rxq)
if (rxq != NULL) {
ixgbe_rx_queue_release_mbufs(rxq);
rte_free(rxq->sw_ring);
+ rte_free(rxq->sw_rsc_ring);
rte_free(rxq);
}
}
@@ -2158,6 +2481,7 @@ ixgbe_reset_rx_queue(struct ixgbe_hw *hw, struct ixgbe_rx_queue *rxq)
rxq->nb_rx_hold = 0;
rxq->pkt_first_seg = NULL;
rxq->pkt_last_seg = NULL;
+ rxq->rsc_en = 0;
}
int
@@ -2172,6 +2496,14 @@ ixgbe_dev_rx_queue_setup(struct rte_eth_dev *dev,
struct ixgbe_rx_queue *rxq;
struct ixgbe_hw *hw;
uint16_t len;
+ struct rte_eth_dev_info dev_info = { 0 };
+ struct rte_eth_rxmode *dev_rx_mode = &dev->data->dev_conf.rxmode;
+ bool rsc_requested = false;
+
+ dev->dev_ops->dev_infos_get(dev, &dev_info);
+ if ((dev_info.rx_offload_capa & DEV_RX_OFFLOAD_TCP_LRO) &&
+ dev_rx_mode->enable_lro)
+ rsc_requested = true;
PMD_INIT_FUNC_TRACE();
hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
@@ -2277,12 +2609,27 @@ ixgbe_dev_rx_queue_setup(struct rte_eth_dev *dev,
rxq->sw_ring = rte_zmalloc_socket("rxq->sw_ring",
sizeof(struct ixgbe_rx_entry) * len,
RTE_CACHE_LINE_SIZE, socket_id);
- if (rxq->sw_ring == NULL) {
+ if (!rxq->sw_ring) {
ixgbe_rx_queue_release(rxq);
return (-ENOMEM);
}
- PMD_INIT_LOG(DEBUG, "sw_ring=%p hw_ring=%p dma_addr=0x%"PRIx64,
- rxq->sw_ring, rxq->rx_ring, rxq->rx_ring_phys_addr);
+
+ if (rsc_requested) {
+ rxq->sw_rsc_ring =
+ rte_zmalloc_socket("rxq->sw_rsc_ring",
+ sizeof(struct ixgbe_rsc_entry) * len,
+ RTE_CACHE_LINE_SIZE, socket_id);
+ if (!rxq->sw_rsc_ring) {
+ ixgbe_rx_queue_release(rxq);
+ return (-ENOMEM);
+ }
+ } else
+ rxq->sw_rsc_ring = NULL;
+
+ PMD_INIT_LOG(DEBUG, "sw_ring=%p sw_rsc_ring=%p hw_ring=%p "
+ "dma_addr=0x%"PRIx64,
+ rxq->sw_ring, rxq->sw_rsc_ring, rxq->rx_ring,
+ rxq->rx_ring_phys_addr);
if (!rte_is_power_of_2(nb_desc)) {
PMD_INIT_LOG(DEBUG, "queue[%d] doesn't meet Vector Rx "
@@ -3527,6 +3874,91 @@ ixgbe_dev_mq_tx_configure(struct rte_eth_dev *dev)
return 0;
}
+/**
+ * ixgbe_get_rscctl_maxdesc - Calculate the RSCCTL[n].MAXDESC for PF
+ *
+ * Return the RSCCTL[n].MAXDESC for 82599 and x540 PF devices according to the
+ * spec rev. 3.0 chapter 8.2.3.8.13.
+ *
+ * @pool Memory pool of the Rx queue
+ */
+static inline uint32_t
+ixgbe_get_rscctl_maxdesc(struct rte_mempool *pool)
+{
+ struct rte_pktmbuf_pool_private *mp_priv = rte_mempool_get_priv(pool);
+
+ /* MAXDESC * SRRCTL.BSIZEPKT must not exceed 64 KB minus one */
+ uint16_t maxdesc =
+ IPV4_MAX_PKT_LEN /
+ (mp_priv->mbuf_data_room_size - RTE_PKTMBUF_HEADROOM);
+
+ if (maxdesc >= 16)
+ return IXGBE_RSCCTL_MAXDESC_16;
+ else if (maxdesc >= 8)
+ return IXGBE_RSCCTL_MAXDESC_8;
+ else if (maxdesc >= 4)
+ return IXGBE_RSCCTL_MAXDESC_4;
+ else
+ return IXGBE_RSCCTL_MAXDESC_1;
+}
+
+/**
+ * ixgbe_set_ivar - Setup the correct IVAR register for a particular MSIX
+ * interrupt
+ *
+ * (Taken from FreeBSD tree)
+ * (yes this is all very magic and confusing :)
+ *
+ * @dev port handle
+ * @entry the register array entry
+ * @vector the MSIX vector for this queue
+ * @type RX/TX/MISC
+ */
+static void
+ixgbe_set_ivar(struct rte_eth_dev *dev, u8 entry, u8 vector, s8 type)
+{
+ struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+ u32 ivar, index;
+
+ vector |= IXGBE_IVAR_ALLOC_VAL;
+
+ switch (hw->mac.type) {
+
+ case ixgbe_mac_82598EB:
+ if (type == -1)
+ entry = IXGBE_IVAR_OTHER_CAUSES_INDEX;
+ else
+ entry += (type * 64);
+ index = (entry >> 2) & 0x1F;
+ ivar = IXGBE_READ_REG(hw, IXGBE_IVAR(index));
+ ivar &= ~(0xFF << (8 * (entry & 0x3)));
+ ivar |= (vector << (8 * (entry & 0x3)));
+ IXGBE_WRITE_REG(hw, IXGBE_IVAR(index), ivar);
+ break;
+
+ case ixgbe_mac_82599EB:
+ case ixgbe_mac_X540:
+ if (type == -1) { /* MISC IVAR */
+ index = (entry & 1) * 8;
+ ivar = IXGBE_READ_REG(hw, IXGBE_IVAR_MISC);
+ ivar &= ~(0xFF << index);
+ ivar |= (vector << index);
+ IXGBE_WRITE_REG(hw, IXGBE_IVAR_MISC, ivar);
+ } else { /* RX/TX IVARS */
+ index = (16 * (entry & 1)) + (8 * type);
+ ivar = IXGBE_READ_REG(hw, IXGBE_IVAR(entry >> 1));
+ ivar &= ~(0xFF << index);
+ ivar |= (vector << index);
+ IXGBE_WRITE_REG(hw, IXGBE_IVAR(entry >> 1), ivar);
+ }
+
+ break;
+
+ default:
+ break;
+ }
+}
+
void ixgbe_set_rx_function(struct rte_eth_dev *dev)
{
struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
@@ -3545,7 +3977,24 @@ void ixgbe_set_rx_function(struct rte_eth_dev *dev)
hw->rx_vec_allowed = false;
}
- if (dev->data->scattered_rx) {
+ /*
+ * Initialize the appropriate LRO callback.
+ *
+ * If all queues satisfy the bulk allocation preconditions
+ * (hw->rx_bulk_alloc_allowed is TRUE) then we may use bulk allocation.
+ * Otherwise use a single allocation version.
+ */
+ if (dev->data->lro) {
+ if (hw->rx_bulk_alloc_allowed) {
+ PMD_INIT_LOG(INFO, "LRO is requested. Using a bulk "
+ "allocation version");
+ dev->rx_pkt_burst = ixgbe_recv_pkts_lro_bulk_alloc;
+ } else {
+ PMD_INIT_LOG(INFO, "LRO is requested. Using a single "
+ "allocation version");
+ dev->rx_pkt_burst = ixgbe_recv_pkts_lro_single_alloc;
+ }
+ } else if (dev->data->scattered_rx) {
/*
* Set the non-LRO scattered callback: there are Vector and
* single allocation versions.
@@ -3595,6 +4044,149 @@ void ixgbe_set_rx_function(struct rte_eth_dev *dev)
}
}
+/**
+ * ixgbe_set_rsc - configure RSC related port HW registers
+ *
+ * Configures the port's RSC related registers according to the 4.6.7.2 chapter
+ * of 82599 Spec (x540 configuration is virtually the same).
+ *
+ * @dev port handle
+ *
+ * Returns 0 in case of success or a non-zero error code
+ */
+static int
+ixgbe_set_rsc(struct rte_eth_dev *dev)
+{
+ struct rte_eth_rxmode *rx_conf = &dev->data->dev_conf.rxmode;
+ struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+ struct rte_eth_dev_info dev_info = { 0 };
+ bool rsc_capable = false;
+ uint16_t i;
+ uint32_t rdrxctl;
+
+ /* Sanity check */
+ dev->dev_ops->dev_infos_get(dev, &dev_info);
+ if (dev_info.rx_offload_capa & DEV_RX_OFFLOAD_TCP_LRO)
+ rsc_capable = true;
+
+ if (!rsc_capable && rx_conf->enable_lro) {
+ PMD_INIT_LOG(CRIT, "LRO is requested on HW that doesn't "
+ "support it");
+ return -EINVAL;
+ }
+
+ /* RSC global configuration (chapter 4.6.7.2.1 of 82599 Spec) */
+
+ if (!rx_conf->hw_strip_crc && rx_conf->enable_lro) {
+ /*
+ * According to chapter 4.6.7.2.1 of the Spec Rev.
+ * 3.0, RSC configuration requires HW CRC stripping to be
+ * enabled. If the user requested both HW CRC stripping off
+ * and RSC on - return an error.
+ */
+ PMD_INIT_LOG(CRIT, "LRO can't be enabled when HW CRC "
+ "is disabled");
+ return -EINVAL;
+ }
+
+ /* RFCTL configuration */
+ if (rsc_capable) {
+ uint32_t rfctl = IXGBE_READ_REG(hw, IXGBE_RFCTL);
+ if (rx_conf->enable_lro)
+ /*
+ * Since NFS packets coalescing is not supported - clear
+ * RFCTL.NFSW_DIS and RFCTL.NFSR_DIS when RSC is
+ * enabled.
+ */
+ rfctl &= ~(IXGBE_RFCTL_RSC_DIS | IXGBE_RFCTL_NFSW_DIS |
+ IXGBE_RFCTL_NFSR_DIS);
+ else
+ rfctl |= IXGBE_RFCTL_RSC_DIS;
+
+ IXGBE_WRITE_REG(hw, IXGBE_RFCTL, rfctl);
+ }
+
+ /* If LRO hasn't been requested - we are done here. */
+ if (!rx_conf->enable_lro)
+ return 0;
+
+ /* Set RDRXCTL.RSCACKC bit */
+ rdrxctl = IXGBE_READ_REG(hw, IXGBE_RDRXCTL);
+ rdrxctl |= IXGBE_RDRXCTL_RSCACKC;
+ IXGBE_WRITE_REG(hw, IXGBE_RDRXCTL, rdrxctl);
+
+ /* Per-queue RSC configuration (chapter 4.6.7.2.2 of 82599 Spec) */
+ for (i = 0; i < dev->data->nb_rx_queues; i++) {
+ struct ixgbe_rx_queue *rxq = dev->data->rx_queues[i];
+ uint32_t srrctl =
+ IXGBE_READ_REG(hw, IXGBE_SRRCTL(rxq->reg_idx));
+ uint32_t rscctl =
+ IXGBE_READ_REG(hw, IXGBE_RSCCTL(rxq->reg_idx));
+ uint32_t psrtype =
+ IXGBE_READ_REG(hw, IXGBE_PSRTYPE(rxq->reg_idx));
+ uint32_t eitr =
+ IXGBE_READ_REG(hw, IXGBE_EITR(rxq->reg_idx));
+
+ /*
+ * ixgbe PMD doesn't support header-split at the moment.
+ *
+ * Following the 4.6.7.2.1 chapter of the 82599/x540
+ * Spec if RSC is enabled the SRRCTL[n].BSIZEHEADER
+ * should be configured even if header split is not
+ * enabled. We will configure it to 128 bytes following the
+ * recommendation in the spec.
+ */
+ srrctl &= ~IXGBE_SRRCTL_BSIZEHDR_MASK;
+ srrctl |= (128 << IXGBE_SRRCTL_BSIZEHDRSIZE_SHIFT) &
+ IXGBE_SRRCTL_BSIZEHDR_MASK;
+
+ /*
+ * TODO: Consider setting the Receive Descriptor Minimum
+ * Threshold Size for an RSC case. This is not an obviously
+ * beneficial option but one worth considering...
+ */
+
+ rscctl |= IXGBE_RSCCTL_RSCEN;
+ rscctl |= ixgbe_get_rscctl_maxdesc(rxq->mb_pool);
+ psrtype |= IXGBE_PSRTYPE_TCPHDR;
+
+ /*
+ * RSC: Set ITR interval corresponding to 2K ints/s.
+ *
+ * Full-sized RSC aggregations for a 10Gb/s link will
+ * arrive at about 20K aggregation/s rate.
+ *
+ * A 2K ints/s rate will cause only 10% of the
+ * aggregations to be closed due to the interrupt timer
+ * expiration for a streaming-at-wire-speed case.
+ *
+ * For a sparse streaming case this setting will yield
+ * at most 500us latency for a single RSC aggregation.
+ */
+ eitr &= ~IXGBE_EITR_ITR_INT_MASK;
+ eitr |= IXGBE_EITR_INTERVAL_US(500) | IXGBE_EITR_CNT_WDIS;
+
+ IXGBE_WRITE_REG(hw, IXGBE_SRRCTL(rxq->reg_idx), srrctl);
+ IXGBE_WRITE_REG(hw, IXGBE_RSCCTL(rxq->reg_idx), rscctl);
+ IXGBE_WRITE_REG(hw, IXGBE_PSRTYPE(rxq->reg_idx), psrtype);
+ IXGBE_WRITE_REG(hw, IXGBE_EITR(rxq->reg_idx), eitr);
+
+ /*
+ * RSC requires the mapping of the queue to the
+ * interrupt vector.
+ */
+ ixgbe_set_ivar(dev, rxq->reg_idx, i, 0);
+
+ rxq->rsc_en = 1;
+ }
+
+ dev->data->lro = 1;
+
+ PMD_INIT_LOG(INFO, "enabling LRO mode");
+
+ return 0;
+}
+
/*
* Initializes Receive Unit.
*/
@@ -3615,6 +4207,7 @@ ixgbe_dev_rx_init(struct rte_eth_dev *dev)
uint16_t buf_size;
uint16_t i;
struct rte_eth_rxmode *rx_conf = &dev->data->dev_conf.rxmode;
+ int rc;
PMD_INIT_FUNC_TRACE();
hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
@@ -3724,6 +4317,7 @@ ixgbe_dev_rx_init(struct rte_eth_dev *dev)
RTE_PKTMBUF_HEADROOM);
srrctl |= ((buf_size >> IXGBE_SRRCTL_BSIZEPKT_SHIFT) &
IXGBE_SRRCTL_BSIZEPKT_MASK);
+
IXGBE_WRITE_REG(hw, IXGBE_SRRCTL(rxq->reg_idx), srrctl);
buf_size = (uint16_t) ((srrctl & IXGBE_SRRCTL_BSIZEPKT_MASK) <<
@@ -3738,8 +4332,6 @@ ixgbe_dev_rx_init(struct rte_eth_dev *dev)
if (rx_conf->enable_scatter)
dev->data->scattered_rx = 1;
- ixgbe_set_rx_function(dev);
-
/*
* Device configured with multiple RX queues.
*/
@@ -3770,6 +4362,12 @@ ixgbe_dev_rx_init(struct rte_eth_dev *dev)
IXGBE_WRITE_REG(hw, IXGBE_RDRXCTL, rdrxctl);
}
+ rc = ixgbe_set_rsc(dev);
+ if (rc)
+ return rc;
+
+ ixgbe_set_rx_function(dev);
+
return 0;
}
diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.h b/lib/librte_pmd_ixgbe/ixgbe_rxtx.h
index 3937cf6..120bbbd 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.h
+++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.h
@@ -79,6 +79,10 @@ struct ixgbe_rx_entry {
struct rte_mbuf *mbuf; /**< mbuf associated with RX descriptor. */
};
+struct ixgbe_rsc_entry {
+ struct rte_mbuf *fbuf; /**< First segment of the fragmented packet. */
+};
+
/**
* Structure associated with each descriptor of the TX ring of a TX queue.
*/
@@ -105,6 +109,7 @@ struct ixgbe_rx_queue {
volatile uint32_t *rdt_reg_addr; /**< RDT register address. */
volatile uint32_t *rdh_reg_addr; /**< RDH register address. */
struct ixgbe_rx_entry *sw_ring; /**< address of RX software ring. */
+ struct ixgbe_rsc_entry *sw_rsc_ring; /**< address of RSC software ring. */
struct rte_mbuf *pkt_first_seg; /**< First segment of current packet. */
struct rte_mbuf *pkt_last_seg; /**< Last segment of current packet. */
uint64_t mbuf_initializer; /**< value to init mbufs */
@@ -126,6 +131,7 @@ struct ixgbe_rx_queue {
uint8_t port_id; /**< Device port identifier. */
uint8_t crc_len; /**< 0 if CRC stripped, 4 otherwise. */
uint8_t drop_en; /**< If not 0, set SRRCTL.Drop_En. */
+ uint8_t rsc_en; /**< If not 0, RSC is enabled. */
uint8_t rx_deferred_start; /**< not in global dev start. */
#ifdef RTE_LIBRTE_IXGBE_RX_ALLOW_BULK_ALLOC
/** need to alloc dummy mbuf, for wraparound when scanning hw ring */
--
2.1.0
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [dpdk-dev] [PATCH v8 3/3] ixgbe: Add LRO support
2015-03-18 17:52 ` [dpdk-dev] [PATCH v8 3/3] ixgbe: Add LRO support Vlad Zolotarov
@ 2015-03-30 14:18 ` Ananyev, Konstantin
2015-03-30 15:37 ` Vlad Zolotarov
0 siblings, 1 reply; 6+ messages in thread
From: Ananyev, Konstantin @ 2015-03-30 14:18 UTC (permalink / raw)
To: Vlad Zolotarov, dev
Hi Vlad,
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Vlad Zolotarov
> Sent: Wednesday, March 18, 2015 5:52 PM
> To: dev@dpdk.org
> Subject: [dpdk-dev] [PATCH v8 3/3] ixgbe: Add LRO support
>
> - Only x540 and 82599 devices support LRO.
> - Add the appropriate HW configuration.
> - Add RSC aware rx_pkt_burst() handlers:
> - Implemented bulk allocation and non-bulk allocation versions.
> - Add LRO-specific fields to rte_eth_rxmode, to rte_eth_dev_data
> and to ixgbe_rx_queue.
> - Use the appropriate handler when LRO is requested.
>
> Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
> ---
> New in v8:
> - Took the RSC configuration code from ixgbe_dev_rx_init() into a separate
> function - ixgbe_set_rsc().
> - Added some missing macros for HW configuration.
> - Styling adjustments:
> - Functions names.
> - Functions descriptions.
> - Reworked the ixgbe_free_rsc_cluster() code to make it more readable.
> - Kill the HEADER_SPLIT flow in ixgbe_set_rsc() since it's not supported by
> ixgbe PMD.
>
> New in v7:
> - Free not-yet-completed RSC aggregations in rte_eth_dev_stop() flow.
> - Reset the sw_ring[].mbuf entry in a bulk allocation case.
> This is needed for ixgbe_rx_queue_release_mbufs().
> - _recv_pkts_lro(): added the missing memory barrier before RDT update in a
> non-bulk allocation case.
> - Don't allow RSC when device is configured in an SR-IOV mode.
>
> New in v5:
> - Put the RTE_ETHDEV_HAS_LRO_SUPPORT definition at the beginning of rte_ethdev.h.
> - Removed the "TODO: Remove me" comment near RTE_ETHDEV_HAS_LRO_SUPPORT.
>
> New in v4:
> - Define RTE_ETHDEV_HAS_LRO_SUPPORT in rte_ethdev.h instead of
> RTE_ETHDEV_LRO_SUPPORT defined in config/common_linuxapp.
>
> New in v2:
> - Removed rte_eth_dev_data.lro_bulk_alloc.
> - Fixed a few styling and spelling issues.
> ---
> lib/librte_ether/rte_ethdev.h | 9 +-
> lib/librte_net/rte_ip.h | 3 +
> lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h | 6 +
> lib/librte_pmd_ixgbe/ixgbe_ethdev.c | 11 +
> lib/librte_pmd_ixgbe/ixgbe_ethdev.h | 5 +
> lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 610 +++++++++++++++++++++++++++++++-
> lib/librte_pmd_ixgbe/ixgbe_rxtx.h | 6 +
> 7 files changed, 642 insertions(+), 8 deletions(-)
>
> diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> index 21aa359..61dc49a 100644
> --- a/lib/librte_ether/rte_ethdev.h
> +++ b/lib/librte_ether/rte_ethdev.h
> @@ -172,6 +172,9 @@ extern "C" {
>
> #include <stdint.h>
>
> +/* Use this macro to check if LRO API is supported */
> +#define RTE_ETHDEV_HAS_LRO_SUPPORT
> +
> #include <rte_log.h>
> #include <rte_interrupts.h>
> #include <rte_pci.h>
> @@ -320,14 +323,15 @@ struct rte_eth_rxmode {
> enum rte_eth_rx_mq_mode mq_mode;
> uint32_t max_rx_pkt_len; /**< Only used if jumbo_frame enabled. */
> uint16_t split_hdr_size; /**< hdr buf size (header_split enabled).*/
> - uint8_t header_split : 1, /**< Header Split enable. */
> + uint16_t header_split : 1, /**< Header Split enable. */
> hw_ip_checksum : 1, /**< IP/UDP/TCP checksum offload enable. */
> hw_vlan_filter : 1, /**< VLAN filter enable. */
> hw_vlan_strip : 1, /**< VLAN strip enable. */
> hw_vlan_extend : 1, /**< Extended VLAN enable. */
> jumbo_frame : 1, /**< Jumbo Frame Receipt enable. */
> hw_strip_crc : 1, /**< Enable CRC stripping by hardware. */
> - enable_scatter : 1; /**< Enable scatter packets rx handler */
> + enable_scatter : 1, /**< Enable scatter packets rx handler */
> + enable_lro : 1; /**< Enable LRO */
> };
>
> /**
> @@ -1515,6 +1519,7 @@ struct rte_eth_dev_data {
> uint8_t port_id; /**< Device [external] port identifier. */
> uint8_t promiscuous : 1, /**< RX promiscuous mode ON(1) / OFF(0). */
> scattered_rx : 1, /**< RX of scattered packets is ON(1) / OFF(0) */
> + lro : 1, /**< RX LRO is ON(1) / OFF(0) */
> all_multicast : 1, /**< RX all multicast mode ON(1) / OFF(0). */
> dev_started : 1; /**< Device state: STARTED(1) / STOPPED(0). */
> };
> diff --git a/lib/librte_net/rte_ip.h b/lib/librte_net/rte_ip.h
> index 64935d9..74c9ced 100644
> --- a/lib/librte_net/rte_ip.h
> +++ b/lib/librte_net/rte_ip.h
> @@ -110,6 +110,9 @@ struct ipv4_hdr {
> (((c) & 0xff) << 8) | \
> ((d) & 0xff))
>
> +/** Maximal IPv4 packet length (including a header) */
> +#define IPV4_MAX_PKT_LEN 65535
> +
> /** Internet header length mask for version_ihl field */
> #define IPV4_HDR_IHL_MASK (0x0f)
> /**
> diff --git a/lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h b/lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h
> index 9a66370..4998627 100644
> --- a/lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h
> +++ b/lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h
> @@ -234,8 +234,14 @@ struct ixgbe_thermal_sensor_data {
> #define IXGBE_EITR(_i) (((_i) <= 23) ? (0x00820 + ((_i) * 4)) : \
> (0x012300 + (((_i) - 24) * 4)))
> #define IXGBE_EITR_ITR_INT_MASK 0x00000FF8
> +#define IXGBE_EITR_ITR_INT_SHIFT 3
> #define IXGBE_EITR_LLI_MOD 0x00008000
> #define IXGBE_EITR_CNT_WDIS 0x80000000
> +/* EITR Interval is in 2048ns units for 1G and 10G link */
> +#define IXGBE_EITR_INTERVAL_UNIT_NS 2048
Why is that? Why not just 2us, as the 82599 spec says?
And then:
+#define IXGBE_EITR_INTERVAL_US(us) (((us) / IXGBE_EITR_INTERVAL_UNIT_US << IXGBE_EITR_ITR_INT_SHIFT) & IXGBE_EITR_ITR_INT_MASK)
?
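For reference, the two variants would program slightly different register
values - a quick sanity check of the arithmetic (mine only, not verified
on HW), using the 500us interval the patch programs later:

	/* patch macro, 2048ns units: (500 * 1000 / 2048) << 3 = 244 << 3 = 0x7a0 */
	/* 2us-unit variant above:    (500 / 2) << 3          = 250 << 3 = 0x7d0 */

so the choice of unit is visible in the programmed value.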
> +#define IXGBE_EITR_INTERVAL_US(us) \
> + (((us) * 1000 / IXGBE_EITR_INTERVAL_UNIT_NS << IXGBE_EITR_ITR_INT_SHIFT) & \
> + IXGBE_EITR_ITR_INT_MASK)
Unfortunately we are not supposed to change files under the ixgbe subdirectory (except ixgbe_osdep.*).
These files come from the ixgbe shared code and we treat them as read-only.
You need to move these new macros you added into ixgbe_ethdev.h.
All other things look good to me.
Konstantin
> #define IXGBE_IVAR(_i) (0x00900 + ((_i) * 4)) /* 24 at 0x900-0x960 */
> #define IXGBE_IVAR_MISC 0x00A00 /* misc MSI-X interrupt causes */
> #define IXGBE_EITRSEL 0x00894
> diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
> index 92d75db..923692e 100644
> --- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
> +++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
> @@ -1645,6 +1645,7 @@ ixgbe_dev_stop(struct rte_eth_dev *dev)
>
> /* Clear stored conf */
> dev->data->scattered_rx = 0;
> + dev->data->lro = 0;
>
> /* Clear recorded link status */
> memset(&link, 0, sizeof(link));
> @@ -2013,6 +2014,16 @@ ixgbe_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
> DEV_RX_OFFLOAD_IPV4_CKSUM |
> DEV_RX_OFFLOAD_UDP_CKSUM |
> DEV_RX_OFFLOAD_TCP_CKSUM;
> +
> + /*
> + * RSC is only supported by 82599 and x540 PF devices in a non-SR-IOV
> + * mode.
> + */
> + if ((hw->mac.type == ixgbe_mac_82599EB ||
> + hw->mac.type == ixgbe_mac_X540) &&
> + !RTE_ETH_DEV_SRIOV(dev).active)
> + dev_info->rx_offload_capa |= DEV_RX_OFFLOAD_TCP_LRO;
> +
> dev_info->tx_offload_capa =
> DEV_TX_OFFLOAD_VLAN_INSERT |
> DEV_TX_OFFLOAD_IPV4_CKSUM |
> diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.h b/lib/librte_pmd_ixgbe/ixgbe_ethdev.h
> index ffe3471..7050340 100644
> --- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.h
> +++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.h
> @@ -344,6 +344,11 @@ uint16_t ixgbe_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
> uint16_t ixgbe_recv_scattered_pkts(void *rx_queue,
> struct rte_mbuf **rx_pkts, uint16_t nb_pkts);
>
> +uint16_t ixgbe_recv_pkts_lro_single_alloc(void *rx_queue,
> + struct rte_mbuf **rx_pkts, uint16_t nb_pkts);
> +uint16_t ixgbe_recv_pkts_lro_bulk_alloc(void *rx_queue,
> + struct rte_mbuf **rx_pkts, uint16_t nb_pkts);
> +
> uint16_t ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
> uint16_t nb_pkts);
>
> diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> index a08ae6a..daa4c45 100644
> --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> @@ -70,6 +70,7 @@
> #include <rte_sctp.h>
> #include <rte_string_fns.h>
> #include <rte_errno.h>
> +#include <rte_ip.h>
>
> #include "ixgbe_logs.h"
> #include "ixgbe/ixgbe_api.h"
> @@ -1422,6 +1423,295 @@ ixgbe_fill_cluster_head_buf(
> }
> }
>
> +/**
> + * ixgbe_recv_pkts_lro - receive handler for an LRO case.
> + *
> + * @rx_queue Rx queue handle
> + * @rx_pkts table of received packets
> + * @nb_pkts size of rx_pkts table
> + * @bulk_alloc if TRUE, bulk allocation is used for HW ring refilling
> + *
> + * Handles the Rx HW ring completions when RSC feature is configured. Uses an
> + * additional ring of ixgbe_rsc_entry's that will hold the relevant RSC info.
> + *
> + * We use the same logic as in Linux and in FreeBSD ixgbe drivers:
> + * 1) When non-EOP RSC completion arrives:
> + * a) Update the HEAD of the current RSC aggregation cluster with the new
> + * segment's data length.
> + * b) Set the "next" pointer of the current segment to point to the segment
> + * at the NEXTP index.
> + * c) Pass the HEAD of RSC aggregation cluster on to the next NEXTP entry
> + * in the sw_rsc_ring.
> + * 2) When EOP arrives we just update the cluster's total length and offload
> + * flags and deliver the cluster up to the upper layers. In our case - put it
> + * in the rx_pkts table.
> + *
> + * Returns the number of received packets/clusters (according to the "bulk
> + * receive" interface).
> + */
> +static inline uint16_t
> +ixgbe_recv_pkts_lro(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts,
> + bool bulk_alloc)
> +{
> + struct ixgbe_rx_queue *rxq = rx_queue;
> + volatile union ixgbe_adv_rx_desc *rx_ring = rxq->rx_ring;
> + struct ixgbe_rx_entry *sw_ring = rxq->sw_ring;
> + struct ixgbe_rsc_entry *sw_rsc_ring = rxq->sw_rsc_ring;
> + uint16_t rx_id = rxq->rx_tail;
> + uint16_t nb_rx = 0;
> + uint16_t nb_hold = rxq->nb_rx_hold;
> + uint16_t prev_id = rxq->rx_tail;
> +
> + while (nb_rx < nb_pkts) {
> + bool eop;
> + struct ixgbe_rx_entry *rxe;
> + struct ixgbe_rsc_entry *rsc_entry;
> + struct ixgbe_rsc_entry *next_rsc_entry;
> + struct ixgbe_rx_entry *next_rxe;
> + struct rte_mbuf *first_seg;
> + struct rte_mbuf *rxm;
> + struct rte_mbuf *nmb;
> + union ixgbe_adv_rx_desc rxd;
> + uint16_t data_len;
> + uint16_t next_id;
> + volatile union ixgbe_adv_rx_desc *rxdp;
> + uint32_t staterr;
> +
> +next_desc:
> + /*
> + * The code in this whole file uses the volatile pointer to
> + * ensure the read ordering of the status and the rest of the
> + * descriptor fields (on the compiler level only!!!). This is so
> + * UGLY - why not just use the compiler barrier instead? DPDK
> + * even has the rte_compiler_barrier() for that.
> + *
> + * But most importantly this is just wrong because this doesn't
> + * ensure memory ordering in a general case at all. For
> + * instance, DPDK is supposed to work on Power CPUs where
> + * compiler barrier may just not be enough!
> + *
> + * I tried to write only this function properly to have a
> + * starting point (as a part of an LRO/RSC series) but the
> + * compiler cursed at me when I tried to cast away the
> + * "volatile" from rx_ring (yes, it's volatile too!!!). So, I'm
> + * keeping it the way it is for now.
> + *
> + * The code in this file is broken in so many other places and
> + * will just not work on a big endian CPU anyway therefore the
> + * lines below will have to be revisited together with the rest
> + * of the ixgbe PMD.
> + *
> + * TODO:
> + * - Get rid of "volatile" crap and let the compiler do its
> + * job.
> + * - Use the proper memory barrier (rte_rmb()) to ensure the
> + * memory ordering below.
> + */
> + rxdp = &rx_ring[rx_id];
> + staterr = rte_le_to_cpu_32(rxdp->wb.upper.status_error);
> +
> + if (!(staterr & IXGBE_RXDADV_STAT_DD))
> + break;
> +
> + rxd = *rxdp;
> +
> + PMD_RX_LOG(DEBUG, "port_id=%u queue_id=%u rx_id=%u "
> + "staterr=0x%x data_len=%u",
> + rxq->port_id, rxq->queue_id, rx_id, staterr,
> + rte_le_to_cpu_16(rxd.wb.upper.length));
> +
> + if (!bulk_alloc) {
> + nmb = rte_rxmbuf_alloc(rxq->mb_pool);
> + if (nmb == NULL) {
> + PMD_RX_LOG(DEBUG, "RX mbuf alloc failed "
> + "port_id=%u queue_id=%u",
> + rxq->port_id, rxq->queue_id);
> +
> + rte_eth_devices[rxq->port_id].data->
> + rx_mbuf_alloc_failed++;
> + break;
> + }
> + } else if (nb_hold > rxq->rx_free_thresh) {
> + uint16_t next_rdt = rxq->rx_free_trigger;
> +
> + if (!ixgbe_rx_alloc_bufs(rxq, false)) {
> + rte_wmb();
> + IXGBE_PCI_REG_WRITE(rxq->rdt_reg_addr,
> + next_rdt);
> + nb_hold -= rxq->rx_free_thresh;
> + } else {
> + PMD_RX_LOG(DEBUG, "RX bulk alloc failed "
> + "port_id=%u queue_id=%u",
> + rxq->port_id, rxq->queue_id);
> +
> + rte_eth_devices[rxq->port_id].data->
> + rx_mbuf_alloc_failed++;
> + break;
> + }
> + }
> +
> + nb_hold++;
> + rxe = &sw_ring[rx_id];
> + eop = staterr & IXGBE_RXDADV_STAT_EOP;
> +
> + next_id = rx_id + 1;
> + if (next_id == rxq->nb_rx_desc)
> + next_id = 0;
> +
> + /* Prefetch next mbuf while processing current one. */
> + rte_ixgbe_prefetch(sw_ring[next_id].mbuf);
> +
> + /*
> + * When next RX descriptor is on a cache-line boundary,
> + * prefetch the next 4 RX descriptors and the next 4 pointers
> + * to mbufs.
> + */
> + if ((next_id & 0x3) == 0) {
> + rte_ixgbe_prefetch(&rx_ring[next_id]);
> + rte_ixgbe_prefetch(&sw_ring[next_id]);
> + }
> +
> + rxm = rxe->mbuf;
> +
> + if (!bulk_alloc) {
> + __le64 dma =
> + rte_cpu_to_le_64(RTE_MBUF_DATA_DMA_ADDR_DEFAULT(nmb));
> + /*
> + * Update RX descriptor with the physical address of the
> + * new data buffer of the new allocated mbuf.
> + */
> + rxe->mbuf = nmb;
> +
> + rxm->data_off = RTE_PKTMBUF_HEADROOM;
> + rxdp->read.hdr_addr = dma;
> + rxdp->read.pkt_addr = dma;
> + } else
> + rxe->mbuf = NULL;
> +
> + /*
> + * Set data length & data buffer address of mbuf.
> + */
> + data_len = rte_le_to_cpu_16(rxd.wb.upper.length);
> + rxm->data_len = data_len;
> +
> + if (!eop) {
> + uint16_t nextp_id;
> + /*
> + * Get next descriptor index:
> + * - For RSC it's in the NEXTP field.
> + * - For a scattered packet - it's just a following
> + * descriptor.
> + */
> + if (ixgbe_rsc_count(&rxd))
> + nextp_id =
> + (staterr & IXGBE_RXDADV_NEXTP_MASK) >>
> + IXGBE_RXDADV_NEXTP_SHIFT;
> + else
> + nextp_id = next_id;
> +
> + next_rsc_entry = &sw_rsc_ring[nextp_id];
> + next_rxe = &sw_ring[nextp_id];
> + rte_ixgbe_prefetch(next_rxe);
> + }
> +
> + rsc_entry = &sw_rsc_ring[rx_id];
> + first_seg = rsc_entry->fbuf;
> + rsc_entry->fbuf = NULL;
> +
> + /*
> + * If this is the first buffer of the received packet,
> + * set the pointer to the first mbuf of the packet and
> + * initialize its context.
> + * Otherwise, update the total length and the number of segments
> + * of the current scattered packet, and update the pointer to
> + * the last mbuf of the current packet.
> + */
> + if (first_seg == NULL) {
> + first_seg = rxm;
> + first_seg->pkt_len = data_len;
> + first_seg->nb_segs = 1;
> + } else {
> + first_seg->pkt_len += data_len;
> + first_seg->nb_segs++;
> + }
> +
> + prev_id = rx_id;
> + rx_id = next_id;
> +
> + /*
> + * If this is not the last buffer of the received packet, update
> + * the pointer to the first mbuf at the NEXTP entry in the
> + * sw_rsc_ring and continue to parse the RX ring.
> + */
> + if (!eop) {
> + rxm->next = next_rxe->mbuf;
> + next_rsc_entry->fbuf = first_seg;
> + goto next_desc;
> + }
> +
> + /*
> + * This is the last buffer of the received packet - return
> + * the current cluster to the user.
> + */
> + rxm->next = NULL;
> +
> + /* Initialize the first mbuf of the returned packet */
> + ixgbe_fill_cluster_head_buf(first_seg, &rxd, rxq->port_id,
> + staterr);
> +
> + /* Prefetch data of first segment, if configured to do so. */
> + rte_packet_prefetch((char *)first_seg->buf_addr +
> + first_seg->data_off);
> +
> + /*
> + * Store the mbuf address into the next entry of the array
> + * of returned packets.
> + */
> + rx_pkts[nb_rx++] = first_seg;
> + }
> +
> + /*
> + * Record index of the next RX descriptor to probe.
> + */
> + rxq->rx_tail = rx_id;
> +
> + /*
> + * If the number of free RX descriptors is greater than the RX free
> + * threshold of the queue, advance the Receive Descriptor Tail (RDT)
> + * register.
> + * Update the RDT with the value of the last processed RX descriptor
> + * minus 1, to guarantee that the RDT register is never equal to the
> + * RDH register, which creates a "full" ring situtation from the
> + * hardware point of view...
> + */
> + if (!bulk_alloc && nb_hold > rxq->rx_free_thresh) {
> + PMD_RX_LOG(DEBUG, "port_id=%u queue_id=%u rx_tail=%u "
> + "nb_hold=%u nb_rx=%u",
> + rxq->port_id, rxq->queue_id, rx_id, nb_hold, nb_rx);
> +
> + rte_wmb();
> + IXGBE_PCI_REG_WRITE(rxq->rdt_reg_addr, prev_id);
> + nb_hold = 0;
> + }
> +
> + rxq->nb_rx_hold = nb_hold;
> + return nb_rx;
> +}
> +
> +uint16_t
> +ixgbe_recv_pkts_lro_single_alloc(void *rx_queue, struct rte_mbuf **rx_pkts,
> + uint16_t nb_pkts)
> +{
> + return ixgbe_recv_pkts_lro(rx_queue, rx_pkts, nb_pkts, false);
> +}
> +
> +uint16_t
> +ixgbe_recv_pkts_lro_bulk_alloc(void *rx_queue, struct rte_mbuf **rx_pkts,
> + uint16_t nb_pkts)
> +{
> + return ixgbe_recv_pkts_lro(rx_queue, rx_pkts, nb_pkts, true);
> +}
> +
> uint16_t
> ixgbe_recv_scattered_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
> uint16_t nb_pkts)
> @@ -2005,6 +2295,31 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev *dev,
> return (0);
> }
>
> +/**
> + * ixgbe_free_rsc_cluster - free the not-yet-completed RSC cluster
> + *
> + * The "next" pointer of the last segment of (not-yet-completed) RSC clusters
> + * in the sw_rsc_ring is not set to NULL but rather points to the next
> + * mbuf of this RSC aggregation (that has not been completed yet and still
> + * resides on the HW ring). So, instead of calling rte_pktmbuf_free() we
> + * will just free the first "nb_segs" segments of the cluster explicitly by calling
> + * an rte_pktmbuf_free_seg().
> + *
> + * @m RSC cluster head
> + */
> +static void
> +ixgbe_free_rsc_cluster(struct rte_mbuf *m)
> +{
> + uint8_t i, nb_segs = m->nb_segs;
> + struct rte_mbuf *next_seg;
> +
> + for (i = 0; i < nb_segs; i++) {
> + next_seg = m->next;
> + rte_pktmbuf_free_seg(m);
> + m = next_seg;
> + }
> +}
> +
> static void
> ixgbe_rx_queue_release_mbufs(struct ixgbe_rx_queue *rxq)
> {
> @@ -2028,6 +2343,13 @@ ixgbe_rx_queue_release_mbufs(struct ixgbe_rx_queue *rxq)
> }
> #endif
> }
> +
> + if (rxq->sw_rsc_ring)
> + for (i = 0; i < rxq->nb_rx_desc; i++)
> + if (rxq->sw_rsc_ring[i].fbuf) {
> + ixgbe_free_rsc_cluster(rxq->sw_rsc_ring[i].fbuf);
> + rxq->sw_rsc_ring[i].fbuf = NULL;
> + }
> }
>
> static void
> @@ -2036,6 +2358,7 @@ ixgbe_rx_queue_release(struct ixgbe_rx_queue *rxq)
> if (rxq != NULL) {
> ixgbe_rx_queue_release_mbufs(rxq);
> rte_free(rxq->sw_ring);
> + rte_free(rxq->sw_rsc_ring);
> rte_free(rxq);
> }
> }
> @@ -2158,6 +2481,7 @@ ixgbe_reset_rx_queue(struct ixgbe_hw *hw, struct ixgbe_rx_queue *rxq)
> rxq->nb_rx_hold = 0;
> rxq->pkt_first_seg = NULL;
> rxq->pkt_last_seg = NULL;
> + rxq->rsc_en = 0;
> }
>
> int
> @@ -2172,6 +2496,14 @@ ixgbe_dev_rx_queue_setup(struct rte_eth_dev *dev,
> struct ixgbe_rx_queue *rxq;
> struct ixgbe_hw *hw;
> uint16_t len;
> + struct rte_eth_dev_info dev_info = { 0 };
> + struct rte_eth_rxmode *dev_rx_mode = &dev->data->dev_conf.rxmode;
> + bool rsc_requested = false;
> +
> + dev->dev_ops->dev_infos_get(dev, &dev_info);
> + if ((dev_info.rx_offload_capa & DEV_RX_OFFLOAD_TCP_LRO) &&
> + dev_rx_mode->enable_lro)
> + rsc_requested = true;
>
> PMD_INIT_FUNC_TRACE();
> hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> @@ -2277,12 +2609,27 @@ ixgbe_dev_rx_queue_setup(struct rte_eth_dev *dev,
> rxq->sw_ring = rte_zmalloc_socket("rxq->sw_ring",
> sizeof(struct ixgbe_rx_entry) * len,
> RTE_CACHE_LINE_SIZE, socket_id);
> - if (rxq->sw_ring == NULL) {
> + if (!rxq->sw_ring) {
> ixgbe_rx_queue_release(rxq);
> return (-ENOMEM);
> }
> - PMD_INIT_LOG(DEBUG, "sw_ring=%p hw_ring=%p dma_addr=0x%"PRIx64,
> - rxq->sw_ring, rxq->rx_ring, rxq->rx_ring_phys_addr);
> +
> + if (rsc_requested) {
> + rxq->sw_rsc_ring =
> + rte_zmalloc_socket("rxq->sw_rsc_ring",
> + sizeof(struct ixgbe_rsc_entry) * len,
> + RTE_CACHE_LINE_SIZE, socket_id);
> + if (!rxq->sw_rsc_ring) {
> + ixgbe_rx_queue_release(rxq);
> + return (-ENOMEM);
> + }
> + } else
> + rxq->sw_rsc_ring = NULL;
> +
> + PMD_INIT_LOG(DEBUG, "sw_ring=%p sw_rsc_ring=%p hw_ring=%p "
> + "dma_addr=0x%"PRIx64,
> + rxq->sw_ring, rxq->sw_rsc_ring, rxq->rx_ring,
> + rxq->rx_ring_phys_addr);
>
> if (!rte_is_power_of_2(nb_desc)) {
> PMD_INIT_LOG(DEBUG, "queue[%d] doesn't meet Vector Rx "
> @@ -3527,6 +3874,91 @@ ixgbe_dev_mq_tx_configure(struct rte_eth_dev *dev)
> return 0;
> }
>
> +/**
> + * ixgbe_get_rscctl_maxdesc - Calculate the RSCCTL[n].MAXDESC for PF
> + *
> + * Return the RSCCTL[n].MAXDESC for 82599 and x540 PF devices according to the
> + * spec rev. 3.0 chapter 8.2.3.8.13.
> + *
> + * @pool Memory pool of the Rx queue
> + */
> +static inline uint32_t
> +ixgbe_get_rscctl_maxdesc(struct rte_mempool *pool)
> +{
> + struct rte_pktmbuf_pool_private *mp_priv = rte_mempool_get_priv(pool);
> +
> + /* MAXDESC * SRRCTL.BSIZEPKT must not exceed 64 KB minus one */
> + uint16_t maxdesc =
> + IPV4_MAX_PKT_LEN /
> + (mp_priv->mbuf_data_room_size - RTE_PKTMBUF_HEADROOM);
> +
> + if (maxdesc >= 16)
> + return IXGBE_RSCCTL_MAXDESC_16;
> + else if (maxdesc >= 8)
> + return IXGBE_RSCCTL_MAXDESC_8;
> + else if (maxdesc >= 4)
> + return IXGBE_RSCCTL_MAXDESC_4;
> + else
> + return IXGBE_RSCCTL_MAXDESC_1;
> +}
> +
> +/**
> + * ixgbe_set_ivar - Setup the correct IVAR register for a particular MSIX
> + * interrupt
> + *
> + * (Taken from FreeBSD tree)
> + * (yes this is all very magic and confusing :)
> + *
> + * @dev port handle
> + * @entry the register array entry
> + * @vector the MSIX vector for this queue
> + * @type RX/TX/MISC
> + */
> +static void
> +ixgbe_set_ivar(struct rte_eth_dev *dev, u8 entry, u8 vector, s8 type)
> +{
> + struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> + u32 ivar, index;
> +
> + vector |= IXGBE_IVAR_ALLOC_VAL;
> +
> + switch (hw->mac.type) {
> +
> + case ixgbe_mac_82598EB:
> + if (type == -1)
> + entry = IXGBE_IVAR_OTHER_CAUSES_INDEX;
> + else
> + entry += (type * 64);
> + index = (entry >> 2) & 0x1F;
> + ivar = IXGBE_READ_REG(hw, IXGBE_IVAR(index));
> + ivar &= ~(0xFF << (8 * (entry & 0x3)));
> + ivar |= (vector << (8 * (entry & 0x3)));
> + IXGBE_WRITE_REG(hw, IXGBE_IVAR(index), ivar);
> + break;
> +
> + case ixgbe_mac_82599EB:
> + case ixgbe_mac_X540:
> + if (type == -1) { /* MISC IVAR */
> + index = (entry & 1) * 8;
> + ivar = IXGBE_READ_REG(hw, IXGBE_IVAR_MISC);
> + ivar &= ~(0xFF << index);
> + ivar |= (vector << index);
> + IXGBE_WRITE_REG(hw, IXGBE_IVAR_MISC, ivar);
> + } else { /* RX/TX IVARS */
> + index = (16 * (entry & 1)) + (8 * type);
> + ivar = IXGBE_READ_REG(hw, IXGBE_IVAR(entry >> 1));
> + ivar &= ~(0xFF << index);
> + ivar |= (vector << index);
> + IXGBE_WRITE_REG(hw, IXGBE_IVAR(entry >> 1), ivar);
> + }
> +
> + break;
> +
> + default:
> + break;
> + }
> +}
> +
> void ixgbe_set_rx_function(struct rte_eth_dev *dev)
> {
> struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> @@ -3545,7 +3977,24 @@ void ixgbe_set_rx_function(struct rte_eth_dev *dev)
> hw->rx_vec_allowed = false;
> }
>
> - if (dev->data->scattered_rx) {
> + /*
> + * Initialize the appropriate LRO callback.
> + *
> + * If all queues satisfy the bulk allocation preconditions
> + * (hw->rx_bulk_alloc_allowed is TRUE) then we may use bulk allocation.
> + * Otherwise use a single allocation version.
> + */
> + if (dev->data->lro) {
> + if (hw->rx_bulk_alloc_allowed) {
> + PMD_INIT_LOG(INFO, "LRO is requested. Using a bulk "
> + "allocation version");
> + dev->rx_pkt_burst = ixgbe_recv_pkts_lro_bulk_alloc;
> + } else {
> + PMD_INIT_LOG(INFO, "LRO is requested. Using a single "
> + "allocation version");
> + dev->rx_pkt_burst = ixgbe_recv_pkts_lro_single_alloc;
> + }
> + } else if (dev->data->scattered_rx) {
> /*
> * Set the non-LRO scattered callback: there are Vector and
> * single allocation versions.
> @@ -3595,6 +4044,149 @@ void ixgbe_set_rx_function(struct rte_eth_dev *dev)
> }
> }
>
> +/**
> + * ixgbe_set_rsc - configure RSC related port HW registers
> + *
> + * Configures the port's RSC related registers according to chapter 4.6.7.2
> + * of the 82599 Spec (x540 configuration is virtually the same).
> + *
> + * @dev port handle
> + *
> + * Returns 0 in case of success or a non-zero error code
> + */
> +static int
> +ixgbe_set_rsc(struct rte_eth_dev *dev)
> +{
> + struct rte_eth_rxmode *rx_conf = &dev->data->dev_conf.rxmode;
> + struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> + struct rte_eth_dev_info dev_info = { 0 };
> + bool rsc_capable = false;
> + uint16_t i;
> + uint32_t rdrxctl;
> +
> + /* Sanity check */
> + dev->dev_ops->dev_infos_get(dev, &dev_info);
> + if (dev_info.rx_offload_capa & DEV_RX_OFFLOAD_TCP_LRO)
> + rsc_capable = true;
> +
> + if (!rsc_capable && rx_conf->enable_lro) {
> + PMD_INIT_LOG(CRIT, "LRO is requested on HW that doesn't "
> + "support it");
> + return -EINVAL;
> + }
> +
> + /* RSC global configuration (chapter 4.6.7.2.1 of 82599 Spec) */
> +
> + if (!rx_conf->hw_strip_crc && rx_conf->enable_lro) {
> + /*
> + * According to chapter 4.6.7.2.1 of the Spec Rev.
> + * 3.0 RSC configuration requires HW CRC stripping being
> + * enabled. If user requested both HW CRC stripping off
> + * and RSC on - return an error.
> + */
> + PMD_INIT_LOG(CRIT, "LRO can't be enabled when HW CRC "
> + "is disabled");
> + return -EINVAL;
> + }
> +
> + /* RFCTL configuration */
> + if (rsc_capable) {
> + uint32_t rfctl = IXGBE_READ_REG(hw, IXGBE_RFCTL);
> + if (rx_conf->enable_lro)
> + /*
> + * Since NFS packet coalescing is not supported - clear
> + * RFCTL.NFSW_DIS and RFCTL.NFSR_DIS when RSC is
> + * enabled.
> + */
> + rfctl &= ~(IXGBE_RFCTL_RSC_DIS | IXGBE_RFCTL_NFSW_DIS |
> + IXGBE_RFCTL_NFSR_DIS);
> + else
> + rfctl |= IXGBE_RFCTL_RSC_DIS;
> +
> + IXGBE_WRITE_REG(hw, IXGBE_RFCTL, rfctl);
> + }
> +
> + /* If LRO hasn't been requested - we are done here. */
> + if (!rx_conf->enable_lro)
> + return 0;
> +
> + /* Set RDRXCTL.RSCACKC bit */
> + rdrxctl = IXGBE_READ_REG(hw, IXGBE_RDRXCTL);
> + rdrxctl |= IXGBE_RDRXCTL_RSCACKC;
> + IXGBE_WRITE_REG(hw, IXGBE_RDRXCTL, rdrxctl);
> +
> + /* Per-queue RSC configuration (chapter 4.6.7.2.2 of 82599 Spec) */
> + for (i = 0; i < dev->data->nb_rx_queues; i++) {
> + struct ixgbe_rx_queue *rxq = dev->data->rx_queues[i];
> + uint32_t srrctl =
> + IXGBE_READ_REG(hw, IXGBE_SRRCTL(rxq->reg_idx));
> + uint32_t rscctl =
> + IXGBE_READ_REG(hw, IXGBE_RSCCTL(rxq->reg_idx));
> + uint32_t psrtype =
> + IXGBE_READ_REG(hw, IXGBE_PSRTYPE(rxq->reg_idx));
> + uint32_t eitr =
> + IXGBE_READ_REG(hw, IXGBE_EITR(rxq->reg_idx));
> +
> + /*
> + * ixgbe PMD doesn't support header-split at the moment.
> + *
> + * Following chapter 4.6.7.2.1 of the 82599/x540
> + * Spec, if RSC is enabled the SRRCTL[n].BSIZEHEADER
> + * should be configured even if header split is not
> + * enabled. We will configure it to 128 bytes following the
> + * recommendation in the spec.
> + */
> + srrctl &= ~IXGBE_SRRCTL_BSIZEHDR_MASK;
> + srrctl |= (128 << IXGBE_SRRCTL_BSIZEHDRSIZE_SHIFT) &
> + IXGBE_SRRCTL_BSIZEHDR_MASK;
> +
> + /*
> + * TODO: Consider setting the Receive Descriptor Minimum
> + * Threshold Size for an RSC case. This is not an obviously
> + * beneficial option, but one worth considering...
> + */
> +
> + rscctl |= IXGBE_RSCCTL_RSCEN;
> + rscctl |= ixgbe_get_rscctl_maxdesc(rxq->mb_pool);
> + psrtype |= IXGBE_PSRTYPE_TCPHDR;
> +
> + /*
> + * RSC: Set ITR interval corresponding to 2K ints/s.
> + *
> + * Full-sized RSC aggregations for a 10Gb/s link will
> + * arrive at about a 20K aggregations/s rate.
> + *
> + * A 2K ints/s rate will cause only 10% of the
> + * aggregations to be closed due to the interrupt timer
> + * expiration for a streaming at wire-speed case.
> + *
> + * For a sparse streaming case this setting will yield
> + * at most 500us latency for a single RSC aggregation.
> + */
> + eitr &= ~IXGBE_EITR_ITR_INT_MASK;
> + eitr |= IXGBE_EITR_INTERVAL_US(500) | IXGBE_EITR_CNT_WDIS;
> +
> + IXGBE_WRITE_REG(hw, IXGBE_SRRCTL(rxq->reg_idx), srrctl);
> + IXGBE_WRITE_REG(hw, IXGBE_RSCCTL(rxq->reg_idx), rscctl);
> + IXGBE_WRITE_REG(hw, IXGBE_PSRTYPE(rxq->reg_idx), psrtype);
> + IXGBE_WRITE_REG(hw, IXGBE_EITR(rxq->reg_idx), eitr);
> +
> + /*
> + * RSC requires the mapping of the queue to the
> + * interrupt vector.
> + */
> + ixgbe_set_ivar(dev, rxq->reg_idx, i, 0);
> +
> + rxq->rsc_en = 1;
> + }
> +
> + dev->data->lro = 1;
> +
> + PMD_INIT_LOG(INFO, "enabling LRO mode");
> +
> + return 0;
> +}
> +
> /*
> * Initializes Receive Unit.
> */
> @@ -3615,6 +4207,7 @@ ixgbe_dev_rx_init(struct rte_eth_dev *dev)
> uint16_t buf_size;
> uint16_t i;
> struct rte_eth_rxmode *rx_conf = &dev->data->dev_conf.rxmode;
> + int rc;
>
> PMD_INIT_FUNC_TRACE();
> hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> @@ -3724,6 +4317,7 @@ ixgbe_dev_rx_init(struct rte_eth_dev *dev)
> RTE_PKTMBUF_HEADROOM);
> srrctl |= ((buf_size >> IXGBE_SRRCTL_BSIZEPKT_SHIFT) &
> IXGBE_SRRCTL_BSIZEPKT_MASK);
> +
> IXGBE_WRITE_REG(hw, IXGBE_SRRCTL(rxq->reg_idx), srrctl);
>
> buf_size = (uint16_t) ((srrctl & IXGBE_SRRCTL_BSIZEPKT_MASK) <<
> @@ -3738,8 +4332,6 @@ ixgbe_dev_rx_init(struct rte_eth_dev *dev)
> if (rx_conf->enable_scatter)
> dev->data->scattered_rx = 1;
>
> - ixgbe_set_rx_function(dev);
> -
> /*
> * Device configured with multiple RX queues.
> */
> @@ -3770,6 +4362,12 @@ ixgbe_dev_rx_init(struct rte_eth_dev *dev)
> IXGBE_WRITE_REG(hw, IXGBE_RDRXCTL, rdrxctl);
> }
>
> + rc = ixgbe_set_rsc(dev);
> + if (rc)
> + return rc;
> +
> + ixgbe_set_rx_function(dev);
> +
> return 0;
> }
>
> diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.h b/lib/librte_pmd_ixgbe/ixgbe_rxtx.h
> index 3937cf6..120bbbd 100644
> --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.h
> +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.h
> @@ -79,6 +79,10 @@ struct ixgbe_rx_entry {
> struct rte_mbuf *mbuf; /**< mbuf associated with RX descriptor. */
> };
>
> +struct ixgbe_rsc_entry {
> + struct rte_mbuf *fbuf; /**< First segment of the fragmented packet. */
> +};
> +
> /**
> * Structure associated with each descriptor of the TX ring of a TX queue.
> */
> @@ -105,6 +109,7 @@ struct ixgbe_rx_queue {
> volatile uint32_t *rdt_reg_addr; /**< RDT register address. */
> volatile uint32_t *rdh_reg_addr; /**< RDH register address. */
> struct ixgbe_rx_entry *sw_ring; /**< address of RX software ring. */
> + struct ixgbe_rsc_entry *sw_rsc_ring; /**< address of RSC software ring. */
> struct rte_mbuf *pkt_first_seg; /**< First segment of current packet. */
> struct rte_mbuf *pkt_last_seg; /**< Last segment of current packet. */
> uint64_t mbuf_initializer; /**< value to init mbufs */
> @@ -126,6 +131,7 @@ struct ixgbe_rx_queue {
> uint8_t port_id; /**< Device port identifier. */
> uint8_t crc_len; /**< 0 if CRC stripped, 4 otherwise. */
> uint8_t drop_en; /**< If not 0, set SRRCTL.Drop_En. */
> + uint8_t rsc_en; /**< If not 0, RSC is enabled. */
> uint8_t rx_deferred_start; /**< not in global dev start. */
> #ifdef RTE_LIBRTE_IXGBE_RX_ALLOW_BULK_ALLOC
> /** need to alloc dummy mbuf, for wraparound when scanning hw ring */
> --
> 2.1.0
* Re: [dpdk-dev] [PATCH v8 3/3] ixgbe: Add LRO support
2015-03-30 14:18 ` Ananyev, Konstantin
@ 2015-03-30 15:37 ` Vlad Zolotarov
2015-03-30 15:57 ` Vlad Zolotarov
0 siblings, 1 reply; 6+ messages in thread
From: Vlad Zolotarov @ 2015-03-30 15:37 UTC (permalink / raw)
To: Ananyev, Konstantin, dev
On 03/30/15 17:18, Ananyev, Konstantin wrote:
> Hi Vlad,
>
>> -----Original Message-----
>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Vlad Zolotarov
>> Sent: Wednesday, March 18, 2015 5:52 PM
>> To: dev@dpdk.org
>> Subject: [dpdk-dev] [PATCH v8 3/3] ixgbe: Add LRO support
>>
>> - Only x540 and 82599 devices support LRO.
>> - Add the appropriate HW configuration.
>> - Add RSC aware rx_pkt_burst() handlers:
>> - Implemented bulk allocation and non-bulk allocation versions.
>> - Add LRO-specific fields to rte_eth_rxmode, to rte_eth_dev_data
>> and to ixgbe_rx_queue.
>> - Use the appropriate handler when LRO is requested.
>>
>> Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
>> ---
>> New in v8:
>> - Took the RSC configuration code from ixgbe_dev_rx_init() into a separate
>> function - ixgbe_set_rsc().
>> - Added some missing macros for HW configuration.
>> - Styling adjustments:
>> - Functions names.
>> - Functions descriptions.
>> - Reworked the ixgbe_free_rsc_cluster() code to make it more readable.
>> - Kill the HEADER_SPLIT flow in ixgbe_set_rsc() since it's not supported by
>> ixgbe PMD.
>>
>> New in v7:
>> - Free not-yet-completed RSC aggregations in rte_eth_dev_stop() flow.
>> - Reset the sw_ring[].mbuf entry in a bulk allocation case.
>> This is needed for ixgbe_rx_queue_release_mbufs().
>> - _recv_pkts_lro(): added the missing memory barrier before RDT update in a
>> non-bulk allocation case.
>> - Don't allow RSC when device is configured in an SR-IOV mode.
>>
>> New in v5:
>> - Put the RTE_ETHDEV_HAS_LRO_SUPPORT definition at the beginning of rte_ethdev.h.
>> - Removed the "TODO: Remove me" comment near RTE_ETHDEV_HAS_LRO_SUPPORT.
>>
>> New in v4:
>> - Define RTE_ETHDEV_HAS_LRO_SUPPORT in rte_ethdev.h instead of
>> RTE_ETHDEV_LRO_SUPPORT defined in config/common_linuxapp.
>>
>> New in v2:
>> - Removed rte_eth_dev_data.lro_bulk_alloc.
>> - Fixed a few styling and spelling issues.
>> ---
>> lib/librte_ether/rte_ethdev.h | 9 +-
>> lib/librte_net/rte_ip.h | 3 +
>> lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h | 6 +
>> lib/librte_pmd_ixgbe/ixgbe_ethdev.c | 11 +
>> lib/librte_pmd_ixgbe/ixgbe_ethdev.h | 5 +
>> lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 610 +++++++++++++++++++++++++++++++-
>> lib/librte_pmd_ixgbe/ixgbe_rxtx.h | 6 +
>> 7 files changed, 642 insertions(+), 8 deletions(-)
>>
>> diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
>> index 21aa359..61dc49a 100644
>> --- a/lib/librte_ether/rte_ethdev.h
>> +++ b/lib/librte_ether/rte_ethdev.h
>> @@ -172,6 +172,9 @@ extern "C" {
>>
>> #include <stdint.h>
>>
>> +/* Use this macro to check if LRO API is supported */
>> +#define RTE_ETHDEV_HAS_LRO_SUPPORT
>> +
>> #include <rte_log.h>
>> #include <rte_interrupts.h>
>> #include <rte_pci.h>
>> @@ -320,14 +323,15 @@ struct rte_eth_rxmode {
>> enum rte_eth_rx_mq_mode mq_mode;
>> uint32_t max_rx_pkt_len; /**< Only used if jumbo_frame enabled. */
>> uint16_t split_hdr_size; /**< hdr buf size (header_split enabled).*/
>> - uint8_t header_split : 1, /**< Header Split enable. */
>> + uint16_t header_split : 1, /**< Header Split enable. */
>> hw_ip_checksum : 1, /**< IP/UDP/TCP checksum offload enable. */
>> hw_vlan_filter : 1, /**< VLAN filter enable. */
>> hw_vlan_strip : 1, /**< VLAN strip enable. */
>> hw_vlan_extend : 1, /**< Extended VLAN enable. */
>> jumbo_frame : 1, /**< Jumbo Frame Receipt enable. */
>> hw_strip_crc : 1, /**< Enable CRC stripping by hardware. */
>> - enable_scatter : 1; /**< Enable scatter packets rx handler */
>> + enable_scatter : 1, /**< Enable scatter packets rx handler */
>> + enable_lro : 1; /**< Enable LRO */
>> };
>>
>> /**
>> @@ -1515,6 +1519,7 @@ struct rte_eth_dev_data {
>> uint8_t port_id; /**< Device [external] port identifier. */
>> uint8_t promiscuous : 1, /**< RX promiscuous mode ON(1) / OFF(0). */
>> scattered_rx : 1, /**< RX of scattered packets is ON(1) / OFF(0) */
>> + lro : 1, /**< RX LRO is ON(1) / OFF(0) */
>> all_multicast : 1, /**< RX all multicast mode ON(1) / OFF(0). */
>> dev_started : 1; /**< Device state: STARTED(1) / STOPPED(0). */
>> };
>> diff --git a/lib/librte_net/rte_ip.h b/lib/librte_net/rte_ip.h
>> index 64935d9..74c9ced 100644
>> --- a/lib/librte_net/rte_ip.h
>> +++ b/lib/librte_net/rte_ip.h
>> @@ -110,6 +110,9 @@ struct ipv4_hdr {
>> (((c) & 0xff) << 8) | \
>> ((d) & 0xff))
>>
>> +/** Maximal IPv4 packet length (including a header) */
>> +#define IPV4_MAX_PKT_LEN 65535
>> +
>> /** Internet header length mask for version_ihl field */
>> #define IPV4_HDR_IHL_MASK (0x0f)
>> /**
>> diff --git a/lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h b/lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h
>> index 9a66370..4998627 100644
>> --- a/lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h
>> +++ b/lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h
>> @@ -234,8 +234,14 @@ struct ixgbe_thermal_sensor_data {
>> #define IXGBE_EITR(_i) (((_i) <= 23) ? (0x00820 + ((_i) * 4)) : \
>> (0x012300 + (((_i) - 24) * 4)))
>> #define IXGBE_EITR_ITR_INT_MASK 0x00000FF8
>> +#define IXGBE_EITR_ITR_INT_SHIFT 3
>> #define IXGBE_EITR_LLI_MOD 0x00008000
>> #define IXGBE_EITR_CNT_WDIS 0x80000000
>> +/* EITR Interval is in 2048ns units for 1G and 10G link */
>> +#define IXGBE_EITR_INTERVAL_UNIT_NS 2048
> Why is that? Why not just 2us, as the 82599 spec says?
> And then:
> +#define IXGBE_EITR_INTERVAL_US(us) (((us) / IXGBE_EITR_INTERVAL_UNIT_US << IXGBE_EITR_ITR_INT_SHIFT) & IXGBE_EITR_ITR_INT_MASK)
> ?
Because the spec doesn't say 2us but rather 2.048us - see chapter
8.2.4.5.16 of the x540 spec.
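To put numbers on it (my arithmetic, untested): for the 500us interval
used below,

	/* 500 * 1000 / 2048 = 244 units -> 244 * 2.048us = ~499.7us actual */

while dividing by a plain 2us unit would give 250 units, which the HW
would interpret as 250 * 2.048us = 512us, i.e. ~2.4% longer than asked
for. That's why the define is in ns.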
>
>> +#define IXGBE_EITR_INTERVAL_US(us) \
>> + (((us) * 1000 / IXGBE_EITR_INTERVAL_UNIT_NS << IXGBE_EITR_ITR_INT_SHIFT) & \
>> + IXGBE_EITR_ITR_INT_MASK)
>
> Unfortunately we are not supposed to change files under the ixgbe subdirectory (except ixgbe_osdep.*).
> These files come from the ixgbe shared code and we treat them as read-only.
> You need to move these new macros you added into ixgbe_ethdev.h.
Aha... Ok. No problem.
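I.e. the new defines would simply move over unchanged - something like
this in ixgbe_ethdev.h (sketch only):

	#define IXGBE_EITR_ITR_INT_SHIFT       3
	#define IXGBE_EITR_INTERVAL_UNIT_NS    2048
	#define IXGBE_EITR_INTERVAL_US(us) \
		(((us) * 1000 / IXGBE_EITR_INTERVAL_UNIT_NS << \
		  IXGBE_EITR_ITR_INT_SHIFT) & IXGBE_EITR_ITR_INT_MASK)

(IXGBE_EITR_ITR_INT_MASK already exists in ixgbe_type.h, so only the
three new macros need to go.)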
>
> All other things look good to me.
> Konstantin
>
>> #define IXGBE_IVAR(_i) (0x00900 + ((_i) * 4)) /* 24 at 0x900-0x960 */
>> #define IXGBE_IVAR_MISC 0x00A00 /* misc MSI-X interrupt causes */
>> #define IXGBE_EITRSEL 0x00894
>> diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
>> index 92d75db..923692e 100644
>> --- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
>> +++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
>> @@ -1645,6 +1645,7 @@ ixgbe_dev_stop(struct rte_eth_dev *dev)
>>
>> /* Clear stored conf */
>> dev->data->scattered_rx = 0;
>> + dev->data->lro = 0;
>>
>> /* Clear recorded link status */
>> memset(&link, 0, sizeof(link));
>> @@ -2013,6 +2014,16 @@ ixgbe_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
>> DEV_RX_OFFLOAD_IPV4_CKSUM |
>> DEV_RX_OFFLOAD_UDP_CKSUM |
>> DEV_RX_OFFLOAD_TCP_CKSUM;
>> +
>> + /*
>> + * RSC is only supported by 82599 and x540 PF devices in a non-SR-IOV
>> + * mode.
>> + */
>> + if ((hw->mac.type == ixgbe_mac_82599EB ||
>> + hw->mac.type == ixgbe_mac_X540) &&
>> + !RTE_ETH_DEV_SRIOV(dev).active)
>> + dev_info->rx_offload_capa |= DEV_RX_OFFLOAD_TCP_LRO;
>> +
>> dev_info->tx_offload_capa =
>> DEV_TX_OFFLOAD_VLAN_INSERT |
>> DEV_TX_OFFLOAD_IPV4_CKSUM |
>> diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.h b/lib/librte_pmd_ixgbe/ixgbe_ethdev.h
>> index ffe3471..7050340 100644
>> --- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.h
>> +++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.h
>> @@ -344,6 +344,11 @@ uint16_t ixgbe_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
>> uint16_t ixgbe_recv_scattered_pkts(void *rx_queue,
>> struct rte_mbuf **rx_pkts, uint16_t nb_pkts);
>>
>> +uint16_t ixgbe_recv_pkts_lro_single_alloc(void *rx_queue,
>> + struct rte_mbuf **rx_pkts, uint16_t nb_pkts);
>> +uint16_t ixgbe_recv_pkts_lro_bulk_alloc(void *rx_queue,
>> + struct rte_mbuf **rx_pkts, uint16_t nb_pkts);
>> +
>> uint16_t ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
>> uint16_t nb_pkts);
>>
>> diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
>> index a08ae6a..daa4c45 100644
>> --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
>> +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
>> @@ -70,6 +70,7 @@
>> #include <rte_sctp.h>
>> #include <rte_string_fns.h>
>> #include <rte_errno.h>
>> +#include <rte_ip.h>
>>
>> #include "ixgbe_logs.h"
>> #include "ixgbe/ixgbe_api.h"
>> @@ -1422,6 +1423,295 @@ ixgbe_fill_cluster_head_buf(
>> }
>> }
>>
>> +/**
>> + * ixgbe_recv_pkts_lro - receive handler for an LRO case.
>> + *
>> + * @rx_queue Rx queue handle
>> + * @rx_pkts table of received packets
>> + * @nb_pkts size of rx_pkts table
>> + * @bulk_alloc if TRUE, bulk allocation is used for HW ring refilling
>> + *
>> + * Handles the Rx HW ring completions when RSC feature is configured. Uses an
>> + * additional ring of ixgbe_rsc_entry's that will hold the relevant RSC info.
>> + *
>> + * We use the same logic as in Linux and in FreeBSD ixgbe drivers:
>> + * 1) When non-EOP RSC completion arrives:
>> + * a) Update the HEAD of the current RSC aggregation cluster with the new
>> + * segment's data length.
>> + * b) Set the "next" pointer of the current segment to point to the segment
>> + * at the NEXTP index.
>> + * c) Pass the HEAD of RSC aggregation cluster on to the next NEXTP entry
>> + * in the sw_rsc_ring.
>> + * 2) When EOP arrives we just update the cluster's total length and offload
>> + * flags and deliver the cluster up to the upper layers. In our case - put it
>> + * in the rx_pkts table.
>> + *
>> + * Returns the number of received packets/clusters (according to the "bulk
>> + * receive" interface).
>> + */
>> +static inline uint16_t
>> +ixgbe_recv_pkts_lro(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts,
>> + bool bulk_alloc)
>> +{
>> + struct ixgbe_rx_queue *rxq = rx_queue;
>> + volatile union ixgbe_adv_rx_desc *rx_ring = rxq->rx_ring;
>> + struct ixgbe_rx_entry *sw_ring = rxq->sw_ring;
>> + struct ixgbe_rsc_entry *sw_rsc_ring = rxq->sw_rsc_ring;
>> + uint16_t rx_id = rxq->rx_tail;
>> + uint16_t nb_rx = 0;
>> + uint16_t nb_hold = rxq->nb_rx_hold;
>> + uint16_t prev_id = rxq->rx_tail;
>> +
>> + while (nb_rx < nb_pkts) {
>> + bool eop;
>> + struct ixgbe_rx_entry *rxe;
>> + struct ixgbe_rsc_entry *rsc_entry;
>> + struct ixgbe_rsc_entry *next_rsc_entry;
>> + struct ixgbe_rx_entry *next_rxe;
>> + struct rte_mbuf *first_seg;
>> + struct rte_mbuf *rxm;
>> + struct rte_mbuf *nmb;
>> + union ixgbe_adv_rx_desc rxd;
>> + uint16_t data_len;
>> + uint16_t next_id;
>> + volatile union ixgbe_adv_rx_desc *rxdp;
>> + uint32_t staterr;
>> +
>> +next_desc:
>> + /*
>> + * The code in this whole file uses the volatile pointer to
>> + * ensure the read ordering of the status and the rest of the
>> + * descriptor fields (on the compiler level only!!!). This is so
>> + * UGLY - why not just use the compiler barrier instead? DPDK
>> + * even has the rte_compiler_barrier() for that.
>> + *
>> + * But most importantly this is just wrong because this doesn't
>> + * ensure memory ordering in a general case at all. For
>> + * instance, DPDK is supposed to work on Power CPUs where
>> + * compiler barrier may just not be enough!
>> + *
>> + * I tried to write only this function properly to have a
>> + * starting point (as a part of an LRO/RSC series) but the
>> + * compiler cursed at me when I tried to cast away the
>> + * "volatile" from rx_ring (yes, it's volatile too!!!). So, I'm
>> + * keeping it the way it is for now.
>> + *
>> + * The code in this file is broken in so many other places and
>> + * will just not work on a big endian CPU anyway therefore the
>> + * lines below will have to be revisited together with the rest
>> + * of the ixgbe PMD.
>> + *
>> + * TODO:
>> + * - Get rid of "volatile" crap and let the compiler do its
>> + * job.
>> + * - Use the proper memory barrier (rte_rmb()) to ensure the
>> + * memory ordering below.
>> + */
>> + rxdp = &rx_ring[rx_id];
>> + staterr = rte_le_to_cpu_32(rxdp->wb.upper.status_error);
>> +
>> + if (!(staterr & IXGBE_RXDADV_STAT_DD))
>> + break;
>> +
>> + rxd = *rxdp;
>> +
>> + PMD_RX_LOG(DEBUG, "port_id=%u queue_id=%u rx_id=%u "
>> + "staterr=0x%x data_len=%u",
>> + rxq->port_id, rxq->queue_id, rx_id, staterr,
>> + rte_le_to_cpu_16(rxd.wb.upper.length));
>> +
>> + if (!bulk_alloc) {
>> + nmb = rte_rxmbuf_alloc(rxq->mb_pool);
>> + if (nmb == NULL) {
>> + PMD_RX_LOG(DEBUG, "RX mbuf alloc failed "
>> + "port_id=%u queue_id=%u",
>> + rxq->port_id, rxq->queue_id);
>> +
>> + rte_eth_devices[rxq->port_id].data->
>> + rx_mbuf_alloc_failed++;
>> + break;
>> + }
>> + } else if (nb_hold > rxq->rx_free_thresh) {
>> + uint16_t next_rdt = rxq->rx_free_trigger;
>> +
>> + if (!ixgbe_rx_alloc_bufs(rxq, false)) {
>> + rte_wmb();
>> + IXGBE_PCI_REG_WRITE(rxq->rdt_reg_addr,
>> + next_rdt);
>> + nb_hold -= rxq->rx_free_thresh;
>> + } else {
>> + PMD_RX_LOG(DEBUG, "RX bulk alloc failed "
>> + "port_id=%u queue_id=%u",
>> + rxq->port_id, rxq->queue_id);
>> +
>> + rte_eth_devices[rxq->port_id].data->
>> + rx_mbuf_alloc_failed++;
>> + break;
>> + }
>> + }
>> +
>> + nb_hold++;
>> + rxe = &sw_ring[rx_id];
>> + eop = staterr & IXGBE_RXDADV_STAT_EOP;
>> +
>> + next_id = rx_id + 1;
>> + if (next_id == rxq->nb_rx_desc)
>> + next_id = 0;
>> +
>> + /* Prefetch next mbuf while processing current one. */
>> + rte_ixgbe_prefetch(sw_ring[next_id].mbuf);
>> +
>> + /*
>> + * When next RX descriptor is on a cache-line boundary,
>> + * prefetch the next 4 RX descriptors and the next 4 pointers
>> + * to mbufs.
>> + */
>> + if ((next_id & 0x3) == 0) {
>> + rte_ixgbe_prefetch(&rx_ring[next_id]);
>> + rte_ixgbe_prefetch(&sw_ring[next_id]);
>> + }
>> +
>> + rxm = rxe->mbuf;
>> +
>> + if (!bulk_alloc) {
>> + __le64 dma =
>> + rte_cpu_to_le_64(RTE_MBUF_DATA_DMA_ADDR_DEFAULT(nmb));
>> + /*
>> + * Update RX descriptor with the physical address of the
>> + * new data buffer of the new allocated mbuf.
>> + */
>> + rxe->mbuf = nmb;
>> +
>> + rxm->data_off = RTE_PKTMBUF_HEADROOM;
>> + rxdp->read.hdr_addr = dma;
>> + rxdp->read.pkt_addr = dma;
>> + } else
>> + rxe->mbuf = NULL;
>> +
>> + /*
>> + * Set data length & data buffer address of mbuf.
>> + */
>> + data_len = rte_le_to_cpu_16(rxd.wb.upper.length);
>> + rxm->data_len = data_len;
>> +
>> + if (!eop) {
>> + uint16_t nextp_id;
>> + /*
>> + * Get next descriptor index:
>> + * - For RSC it's in the NEXTP field.
>> + * - For a scattered packet - it's just a following
>> + * descriptor.
>> + */
>> + if (ixgbe_rsc_count(&rxd))
>> + nextp_id =
>> + (staterr & IXGBE_RXDADV_NEXTP_MASK) >>
>> + IXGBE_RXDADV_NEXTP_SHIFT;
>> + else
>> + nextp_id = next_id;
>> +
>> + next_rsc_entry = &sw_rsc_ring[nextp_id];
>> + next_rxe = &sw_ring[nextp_id];
>> + rte_ixgbe_prefetch(next_rxe);
>> + }
>> +
>> + rsc_entry = &sw_rsc_ring[rx_id];
>> + first_seg = rsc_entry->fbuf;
>> + rsc_entry->fbuf = NULL;
>> +
>> + /*
>> + * If this is the first buffer of the received packet,
>> + * set the pointer to the first mbuf of the packet and
>> + * initialize its context.
>> + * Otherwise, update the total length and the number of segments
>> + * of the current scattered packet, and update the pointer to
>> + * the last mbuf of the current packet.
>> + */
>> + if (first_seg == NULL) {
>> + first_seg = rxm;
>> + first_seg->pkt_len = data_len;
>> + first_seg->nb_segs = 1;
>> + } else {
>> + first_seg->pkt_len += data_len;
>> + first_seg->nb_segs++;
>> + }
>> +
>> + prev_id = rx_id;
>> + rx_id = next_id;
>> +
>> + /*
>> + * If this is not the last buffer of the received packet, update
>> + * the pointer to the first mbuf at the NEXTP entry in the
>> + * sw_rsc_ring and continue to parse the RX ring.
>> + */
>> + if (!eop) {
>> + rxm->next = next_rxe->mbuf;
>> + next_rsc_entry->fbuf = first_seg;
>> + goto next_desc;
>> + }
>> +
>> + /*
>> + * This is the last buffer of the received packet - return
>> + * the current cluster to the user.
>> + */
>> + rxm->next = NULL;
>> +
>> + /* Initialize the first mbuf of the returned packet */
>> + ixgbe_fill_cluster_head_buf(first_seg, &rxd, rxq->port_id,
>> + staterr);
>> +
>> + /* Prefetch data of first segment, if configured to do so. */
>> + rte_packet_prefetch((char *)first_seg->buf_addr +
>> + first_seg->data_off);
>> +
>> + /*
>> + * Store the mbuf address into the next entry of the array
>> + * of returned packets.
>> + */
>> + rx_pkts[nb_rx++] = first_seg;
>> + }
>> +
>> + /*
>> + * Record index of the next RX descriptor to probe.
>> + */
>> + rxq->rx_tail = rx_id;
>> +
>> + /*
>> + * If the number of free RX descriptors is greater than the RX free
>> + * threshold of the queue, advance the Receive Descriptor Tail (RDT)
>> + * register.
>> + * Update the RDT with the value of the last processed RX descriptor
>> + * minus 1, to guarantee that the RDT register is never equal to the
>> + * RDH register, which creates a "full" ring situtation from the
>> + * hardware point of view...
>> + */
>> + if (!bulk_alloc && nb_hold > rxq->rx_free_thresh) {
>> + PMD_RX_LOG(DEBUG, "port_id=%u queue_id=%u rx_tail=%u "
>> + "nb_hold=%u nb_rx=%u",
>> + rxq->port_id, rxq->queue_id, rx_id, nb_hold, nb_rx);
>> +
>> + rte_wmb();
>> + IXGBE_PCI_REG_WRITE(rxq->rdt_reg_addr, prev_id);
>> + nb_hold = 0;
>> + }
>> +
>> + rxq->nb_rx_hold = nb_hold;
>> + return nb_rx;
>> +}
>> +
>> +uint16_t
>> +ixgbe_recv_pkts_lro_single_alloc(void *rx_queue, struct rte_mbuf **rx_pkts,
>> + uint16_t nb_pkts)
>> +{
>> + return ixgbe_recv_pkts_lro(rx_queue, rx_pkts, nb_pkts, false);
>> +}
>> +
>> +uint16_t
>> +ixgbe_recv_pkts_lro_bulk_alloc(void *rx_queue, struct rte_mbuf **rx_pkts,
>> + uint16_t nb_pkts)
>> +{
>> + return ixgbe_recv_pkts_lro(rx_queue, rx_pkts, nb_pkts, true);
>> +}
>> +
>> uint16_t
>> ixgbe_recv_scattered_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
>> uint16_t nb_pkts)
>> @@ -2005,6 +2295,31 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev *dev,
>> return (0);
>> }
>>
>> +/**
>> + * ixgbe_free_rsc_cluster - free the not-yet-completed RSC cluster
>> + *
>> + * The "next" pointer of the last segment of (not-yet-completed) RSC clusters
>> + * in the sw_rsc_ring is not set to NULL but rather points to the next
>> + * mbuf of this RSC aggregation (that has not been completed yet and still
>> + * resides on the HW ring). So, instead of calling rte_pktmbuf_free() we
>> + * will just free the first "nb_segs" segments of the cluster explicitly by calling
>> + * an rte_pktmbuf_free_seg().
>> + *
>> + * @m RSC cluster head
>> + */
>> +static void
>> +ixgbe_free_rsc_cluster(struct rte_mbuf *m)
>> +{
>> + uint8_t i, nb_segs = m->nb_segs;
>> + struct rte_mbuf *next_seg;
>> +
>> + for (i = 0; i < nb_segs; i++) {
>> + next_seg = m->next;
>> + rte_pktmbuf_free_seg(m);
>> + m = next_seg;
>> + }
>> +}
>> +
>> static void
>> ixgbe_rx_queue_release_mbufs(struct ixgbe_rx_queue *rxq)
>> {
>> @@ -2028,6 +2343,13 @@ ixgbe_rx_queue_release_mbufs(struct ixgbe_rx_queue *rxq)
>> }
>> #endif
>> }
>> +
>> + if (rxq->sw_rsc_ring)
>> + for (i = 0; i < rxq->nb_rx_desc; i++)
>> + if (rxq->sw_rsc_ring[i].fbuf) {
>> + ixgbe_free_rsc_cluster(rxq->sw_rsc_ring[i].fbuf);
>> + rxq->sw_rsc_ring[i].fbuf = NULL;
>> + }
>> }
>>
>> static void
>> @@ -2036,6 +2358,7 @@ ixgbe_rx_queue_release(struct ixgbe_rx_queue *rxq)
>> if (rxq != NULL) {
>> ixgbe_rx_queue_release_mbufs(rxq);
>> rte_free(rxq->sw_ring);
>> + rte_free(rxq->sw_rsc_ring);
>> rte_free(rxq);
>> }
>> }
>> @@ -2158,6 +2481,7 @@ ixgbe_reset_rx_queue(struct ixgbe_hw *hw, struct ixgbe_rx_queue *rxq)
>> rxq->nb_rx_hold = 0;
>> rxq->pkt_first_seg = NULL;
>> rxq->pkt_last_seg = NULL;
>> + rxq->rsc_en = 0;
>> }
>>
>> int
>> @@ -2172,6 +2496,14 @@ ixgbe_dev_rx_queue_setup(struct rte_eth_dev *dev,
>> struct ixgbe_rx_queue *rxq;
>> struct ixgbe_hw *hw;
>> uint16_t len;
>> + struct rte_eth_dev_info dev_info = { 0 };
>> + struct rte_eth_rxmode *dev_rx_mode = &dev->data->dev_conf.rxmode;
>> + bool rsc_requested = false;
>> +
>> + dev->dev_ops->dev_infos_get(dev, &dev_info);
>> + if ((dev_info.rx_offload_capa & DEV_RX_OFFLOAD_TCP_LRO) &&
>> + dev_rx_mode->enable_lro)
>> + rsc_requested = true;
>>
>> PMD_INIT_FUNC_TRACE();
>> hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
>> @@ -2277,12 +2609,27 @@ ixgbe_dev_rx_queue_setup(struct rte_eth_dev *dev,
>> rxq->sw_ring = rte_zmalloc_socket("rxq->sw_ring",
>> sizeof(struct ixgbe_rx_entry) * len,
>> RTE_CACHE_LINE_SIZE, socket_id);
>> - if (rxq->sw_ring == NULL) {
>> + if (!rxq->sw_ring) {
>> ixgbe_rx_queue_release(rxq);
>> return (-ENOMEM);
>> }
>> - PMD_INIT_LOG(DEBUG, "sw_ring=%p hw_ring=%p dma_addr=0x%"PRIx64,
>> - rxq->sw_ring, rxq->rx_ring, rxq->rx_ring_phys_addr);
>> +
>> + if (rsc_requested) {
>> + rxq->sw_rsc_ring =
>> + rte_zmalloc_socket("rxq->sw_rsc_ring",
>> + sizeof(struct ixgbe_rsc_entry) * len,
>> + RTE_CACHE_LINE_SIZE, socket_id);
>> + if (!rxq->sw_rsc_ring) {
>> + ixgbe_rx_queue_release(rxq);
>> + return (-ENOMEM);
>> + }
>> + } else
>> + rxq->sw_rsc_ring = NULL;
>> +
>> + PMD_INIT_LOG(DEBUG, "sw_ring=%p sw_rsc_ring=%p hw_ring=%p "
>> + "dma_addr=0x%"PRIx64,
>> + rxq->sw_ring, rxq->sw_rsc_ring, rxq->rx_ring,
>> + rxq->rx_ring_phys_addr);
>>
>> if (!rte_is_power_of_2(nb_desc)) {
>> PMD_INIT_LOG(DEBUG, "queue[%d] doesn't meet Vector Rx "
>> @@ -3527,6 +3874,91 @@ ixgbe_dev_mq_tx_configure(struct rte_eth_dev *dev)
>> return 0;
>> }
>>
>> +/**
>> + * ixgbe_get_rscctl_maxdesc - Calculate the RSCCTL[n].MAXDESC for PF
>> + *
>> + * Return the RSCCTL[n].MAXDESC for 82599 and x540 PF devices according to the
>> + * spec rev. 3.0 chapter 8.2.3.8.13.
>> + *
>> + * @pool Memory pool of the Rx queue
>> + */
>> +static inline uint32_t
>> +ixgbe_get_rscctl_maxdesc(struct rte_mempool *pool)
>> +{
>> + struct rte_pktmbuf_pool_private *mp_priv = rte_mempool_get_priv(pool);
>> +
>> + /* MAXDESC * SRRCTL.BSIZEPKT must not exceed 64 KB minus one */
>> + uint16_t maxdesc =
>> + IPV4_MAX_PKT_LEN /
>> + (mp_priv->mbuf_data_room_size - RTE_PKTMBUF_HEADROOM);
>> +
>> + if (maxdesc >= 16)
>> + return IXGBE_RSCCTL_MAXDESC_16;
>> + else if (maxdesc >= 8)
>> + return IXGBE_RSCCTL_MAXDESC_8;
>> + else if (maxdesc >= 4)
>> + return IXGBE_RSCCTL_MAXDESC_4;
>> + else
>> + return IXGBE_RSCCTL_MAXDESC_1;
>> +}
>> +
>> +/**
>> + * ixgbe_set_ivar - Setup the correct IVAR register for a particular MSIX
>> + * interrupt
>> + *
>> + * (Taken from FreeBSD tree)
>> + * (yes this is all very magic and confusing :)
>> + *
>> + * @dev port handle
>> + * @entry the register array entry
>> + * @vector the MSIX vector for this queue
>> + * @type RX/TX/MISC
>> + */
>> +static void
>> +ixgbe_set_ivar(struct rte_eth_dev *dev, u8 entry, u8 vector, s8 type)
>> +{
>> + struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
>> + u32 ivar, index;
>> +
>> + vector |= IXGBE_IVAR_ALLOC_VAL;
>> +
>> + switch (hw->mac.type) {
>> +
>> + case ixgbe_mac_82598EB:
>> + if (type == -1)
>> + entry = IXGBE_IVAR_OTHER_CAUSES_INDEX;
>> + else
>> + entry += (type * 64);
>> + index = (entry >> 2) & 0x1F;
>> + ivar = IXGBE_READ_REG(hw, IXGBE_IVAR(index));
>> + ivar &= ~(0xFF << (8 * (entry & 0x3)));
>> + ivar |= (vector << (8 * (entry & 0x3)));
>> + IXGBE_WRITE_REG(hw, IXGBE_IVAR(index), ivar);
>> + break;
>> +
>> + case ixgbe_mac_82599EB:
>> + case ixgbe_mac_X540:
>> + if (type == -1) { /* MISC IVAR */
>> + index = (entry & 1) * 8;
>> + ivar = IXGBE_READ_REG(hw, IXGBE_IVAR_MISC);
>> + ivar &= ~(0xFF << index);
>> + ivar |= (vector << index);
>> + IXGBE_WRITE_REG(hw, IXGBE_IVAR_MISC, ivar);
>> + } else { /* RX/TX IVARS */
>> + index = (16 * (entry & 1)) + (8 * type);
>> + ivar = IXGBE_READ_REG(hw, IXGBE_IVAR(entry >> 1));
>> + ivar &= ~(0xFF << index);
>> + ivar |= (vector << index);
>> + IXGBE_WRITE_REG(hw, IXGBE_IVAR(entry >> 1), ivar);
>> + }
>> +
>> + break;
>> +
>> + default:
>> + break;
>> + }
>> +}
>> +
>> void ixgbe_set_rx_function(struct rte_eth_dev *dev)
>> {
>> struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
>> @@ -3545,7 +3977,24 @@ void ixgbe_set_rx_function(struct rte_eth_dev *dev)
>> hw->rx_vec_allowed = false;
>> }
>>
>> - if (dev->data->scattered_rx) {
>> + /*
>> + * Initialize the appropriate LRO callback.
>> + *
>> + * If all queues satisfy the bulk allocation preconditions
>> + * (hw->rx_bulk_alloc_allowed is TRUE) then we may use bulk allocation.
>> + * Otherwise use a single allocation version.
>> + */
>> + if (dev->data->lro) {
>> + if (hw->rx_bulk_alloc_allowed) {
>> + PMD_INIT_LOG(INFO, "LRO is requested. Using a bulk "
>> + "allocation version");
>> + dev->rx_pkt_burst = ixgbe_recv_pkts_lro_bulk_alloc;
>> + } else {
>> + PMD_INIT_LOG(INFO, "LRO is requested. Using a single "
>> + "allocation version");
>> + dev->rx_pkt_burst = ixgbe_recv_pkts_lro_single_alloc;
>> + }
>> + } else if (dev->data->scattered_rx) {
>> /*
>> * Set the non-LRO scattered callback: there are Vector and
>> * single allocation versions.
>> @@ -3595,6 +4044,149 @@ void ixgbe_set_rx_function(struct rte_eth_dev *dev)
>> }
>> }
>>
>> +/**
>> + * ixgbe_set_rsc - configure RSC related port HW registers
>> + *
>> + * Configures the port's RSC related registers according to chapter 4.6.7.2
>> + * of the 82599 Spec (x540 configuration is virtually the same).
>> + *
>> + * @dev port handle
>> + *
>> + * Returns 0 in case of success or a non-zero error code
>> + */
>> +static int
>> +ixgbe_set_rsc(struct rte_eth_dev *dev)
>> +{
>> + struct rte_eth_rxmode *rx_conf = &dev->data->dev_conf.rxmode;
>> + struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
>> + struct rte_eth_dev_info dev_info = { 0 };
>> + bool rsc_capable = false;
>> + uint16_t i;
>> + uint32_t rdrxctl;
>> +
>> + /* Sanity check */
>> + dev->dev_ops->dev_infos_get(dev, &dev_info);
>> + if (dev_info.rx_offload_capa & DEV_RX_OFFLOAD_TCP_LRO)
>> + rsc_capable = true;
>> +
>> + if (!rsc_capable && rx_conf->enable_lro) {
>> + PMD_INIT_LOG(CRIT, "LRO is requested on HW that doesn't "
>> + "support it");
>> + return -EINVAL;
>> + }
>> +
>> + /* RSC global configuration (chapter 4.6.7.2.1 of 82599 Spec) */
>> +
>> + if (!rx_conf->hw_strip_crc && rx_conf->enable_lro) {
>> + /*
>> + * According to chapter 4.6.7.2.1 of the Spec Rev.
>> + * 3.0 RSC configuration requires HW CRC stripping being
>> + * enabled. If user requested both HW CRC stripping off
>> + * and RSC on - return an error.
>> + */
>> + PMD_INIT_LOG(CRIT, "LRO can't be enabled when HW CRC "
>> + "is disabled");
>> + return -EINVAL;
>> + }
>> +
>> + /* RFCTL configuration */
>> + if (rsc_capable) {
>> + uint32_t rfctl = IXGBE_READ_REG(hw, IXGBE_RFCTL);
>> + if (rx_conf->enable_lro)
>> + /*
>> + * Since NFS packet coalescing is not supported - clear
>> + * RFCTL.NFSW_DIS and RFCTL.NFSR_DIS when RSC is
>> + * enabled.
>> + */
>> + rfctl &= ~(IXGBE_RFCTL_RSC_DIS | IXGBE_RFCTL_NFSW_DIS |
>> + IXGBE_RFCTL_NFSR_DIS);
>> + else
>> + rfctl |= IXGBE_RFCTL_RSC_DIS;
>> +
>> + IXGBE_WRITE_REG(hw, IXGBE_RFCTL, rfctl);
>> + }
>> +
>> + /* If LRO hasn't been requested - we are done here. */
>> + if (!rx_conf->enable_lro)
>> + return 0;
>> +
>> + /* Set RDRXCTL.RSCACKC bit */
>> + rdrxctl = IXGBE_READ_REG(hw, IXGBE_RDRXCTL);
>> + rdrxctl |= IXGBE_RDRXCTL_RSCACKC;
>> + IXGBE_WRITE_REG(hw, IXGBE_RDRXCTL, rdrxctl);
>> +
>> + /* Per-queue RSC configuration (chapter 4.6.7.2.2 of 82599 Spec) */
>> + for (i = 0; i < dev->data->nb_rx_queues; i++) {
>> + struct ixgbe_rx_queue *rxq = dev->data->rx_queues[i];
>> + uint32_t srrctl =
>> + IXGBE_READ_REG(hw, IXGBE_SRRCTL(rxq->reg_idx));
>> + uint32_t rscctl =
>> + IXGBE_READ_REG(hw, IXGBE_RSCCTL(rxq->reg_idx));
>> + uint32_t psrtype =
>> + IXGBE_READ_REG(hw, IXGBE_PSRTYPE(rxq->reg_idx));
>> + uint32_t eitr =
>> + IXGBE_READ_REG(hw, IXGBE_EITR(rxq->reg_idx));
>> +
>> + /*
>> + * ixgbe PMD doesn't support header-split at the moment.
>> + *
>> + * Following chapter 4.6.7.2.1 of the 82599/x540
>> + * Spec, if RSC is enabled the SRRCTL[n].BSIZEHEADER
>> + * should be configured even if header split is not
>> + * enabled. We will configure it to 128 bytes following the
>> + * recommendation in the spec.
>> + */
>> + srrctl &= ~IXGBE_SRRCTL_BSIZEHDR_MASK;
>> + srrctl |= (128 << IXGBE_SRRCTL_BSIZEHDRSIZE_SHIFT) &
>> + IXGBE_SRRCTL_BSIZEHDR_MASK;
>> +
>> + /*
>> + * TODO: Consider setting the Receive Descriptor Minimum
>> + * Threshold Size for an RSC case. This is not an obviously
>> + * beneficial option, but one worth considering...
>> + */
>> +
>> + rscctl |= IXGBE_RSCCTL_RSCEN;
>> + rscctl |= ixgbe_get_rscctl_maxdesc(rxq->mb_pool);
>> + psrtype |= IXGBE_PSRTYPE_TCPHDR;
>> +
>> + /*
>> + * RSC: Set ITR interval corresponding to 2K ints/s.
>> + *
>> + * Full-sized RSC aggregations for a 10Gb/s link will
>> + * arrive at about a 20K aggregations/s rate.
>> + *
>> + * A 2K ints/s rate will cause only 10% of the
>> + * aggregations to be closed due to the interrupt timer
>> + * expiration for a streaming at wire-speed case.
>> + *
>> + * For a sparse streaming case this setting will yield
>> + * at most 500us latency for a single RSC aggregation.
>> + */
>> + eitr &= ~IXGBE_EITR_ITR_INT_MASK;
>> + eitr |= IXGBE_EITR_INTERVAL_US(500) | IXGBE_EITR_CNT_WDIS;
>> +
>> + IXGBE_WRITE_REG(hw, IXGBE_SRRCTL(rxq->reg_idx), srrctl);
>> + IXGBE_WRITE_REG(hw, IXGBE_RSCCTL(rxq->reg_idx), rscctl);
>> + IXGBE_WRITE_REG(hw, IXGBE_PSRTYPE(rxq->reg_idx), psrtype);
>> + IXGBE_WRITE_REG(hw, IXGBE_EITR(rxq->reg_idx), eitr);
>> +
>> + /*
>> + * RSC requires the mapping of the queue to the
>> + * interrupt vector.
>> + */
>> + ixgbe_set_ivar(dev, rxq->reg_idx, i, 0);
>> +
>> + rxq->rsc_en = 1;
>> + }
>> +
>> + dev->data->lro = 1;
>> +
>> + PMD_INIT_LOG(INFO, "enabling LRO mode");
>> +
>> + return 0;
>> +}
>> +
>> /*
>> * Initializes Receive Unit.
>> */
>> @@ -3615,6 +4207,7 @@ ixgbe_dev_rx_init(struct rte_eth_dev *dev)
>> uint16_t buf_size;
>> uint16_t i;
>> struct rte_eth_rxmode *rx_conf = &dev->data->dev_conf.rxmode;
>> + int rc;
>>
>> PMD_INIT_FUNC_TRACE();
>> hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
>> @@ -3724,6 +4317,7 @@ ixgbe_dev_rx_init(struct rte_eth_dev *dev)
>> RTE_PKTMBUF_HEADROOM);
>> srrctl |= ((buf_size >> IXGBE_SRRCTL_BSIZEPKT_SHIFT) &
>> IXGBE_SRRCTL_BSIZEPKT_MASK);
>> +
>> IXGBE_WRITE_REG(hw, IXGBE_SRRCTL(rxq->reg_idx), srrctl);
>>
>> buf_size = (uint16_t) ((srrctl & IXGBE_SRRCTL_BSIZEPKT_MASK) <<
>> @@ -3738,8 +4332,6 @@ ixgbe_dev_rx_init(struct rte_eth_dev *dev)
>> if (rx_conf->enable_scatter)
>> dev->data->scattered_rx = 1;
>>
>> - ixgbe_set_rx_function(dev);
>> -
>> /*
>> * Device configured with multiple RX queues.
>> */
>> @@ -3770,6 +4362,12 @@ ixgbe_dev_rx_init(struct rte_eth_dev *dev)
>> IXGBE_WRITE_REG(hw, IXGBE_RDRXCTL, rdrxctl);
>> }
>>
>> + rc = ixgbe_set_rsc(dev);
>> + if (rc)
>> + return rc;
>> +
>> + ixgbe_set_rx_function(dev);
>> +
>> return 0;
>> }
>>
>> diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.h b/lib/librte_pmd_ixgbe/ixgbe_rxtx.h
>> index 3937cf6..120bbbd 100644
>> --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.h
>> +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.h
>> @@ -79,6 +79,10 @@ struct ixgbe_rx_entry {
>> struct rte_mbuf *mbuf; /**< mbuf associated with RX descriptor. */
>> };
>>
>> +struct ixgbe_rsc_entry {
>> + struct rte_mbuf *fbuf; /**< First segment of the fragmented packet. */
>> +};
>> +
>> /**
>> * Structure associated with each descriptor of the TX ring of a TX queue.
>> */
>> @@ -105,6 +109,7 @@ struct ixgbe_rx_queue {
>> volatile uint32_t *rdt_reg_addr; /**< RDT register address. */
>> volatile uint32_t *rdh_reg_addr; /**< RDH register address. */
>> struct ixgbe_rx_entry *sw_ring; /**< address of RX software ring. */
>> + struct ixgbe_rsc_entry *sw_rsc_ring; /**< address of RSC software ring. */
>> struct rte_mbuf *pkt_first_seg; /**< First segment of current packet. */
>> struct rte_mbuf *pkt_last_seg; /**< Last segment of current packet. */
>> uint64_t mbuf_initializer; /**< value to init mbufs */
>> @@ -126,6 +131,7 @@ struct ixgbe_rx_queue {
>> uint8_t port_id; /**< Device port identifier. */
>> uint8_t crc_len; /**< 0 if CRC stripped, 4 otherwise. */
>> uint8_t drop_en; /**< If not 0, set SRRCTL.Drop_En. */
>> + uint8_t rsc_en; /**< If not 0, RSC is enabled. */
>> uint8_t rx_deferred_start; /**< not in global dev start. */
>> #ifdef RTE_LIBRTE_IXGBE_RX_ALLOW_BULK_ALLOC
>> /** need to alloc dummy mbuf, for wraparound when scanning hw ring */
>> --
>> 2.1.0
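As a usage sketch (not part of the patch), this is roughly how an
application would request the new LRO mode once the series is applied;
the port id and queue counts below are illustrative:

#include <errno.h>
#include <string.h>
#include <rte_ethdev.h>

/* Enable LRO on a port if the PMD advertises the capability. */
static int
configure_lro_port(uint8_t port_id)
{
	struct rte_eth_dev_info dev_info;
	struct rte_eth_conf port_conf;

	rte_eth_dev_info_get(port_id, &dev_info);
	if (!(dev_info.rx_offload_capa & DEV_RX_OFFLOAD_TCP_LRO))
		return -ENOTSUP; /* e.g. 82598, or an SR-IOV configuration */

	memset(&port_conf, 0, sizeof(port_conf));
	port_conf.rxmode.enable_lro = 1;
	/* RSC requires HW CRC stripping (see ixgbe_set_rsc()). */
	port_conf.rxmode.hw_strip_crc = 1;

	/* One RX and one TX queue; counts are illustrative. */
	return rte_eth_dev_configure(port_id, 1, 1, &port_conf);
}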
* Re: [dpdk-dev] [PATCH v8 3/3] ixgbe: Add LRO support
2015-03-30 15:37 ` Vlad Zolotarov
@ 2015-03-30 15:57 ` Vlad Zolotarov
0 siblings, 0 replies; 6+ messages in thread
From: Vlad Zolotarov @ 2015-03-30 15:57 UTC (permalink / raw)
To: Ananyev, Konstantin, dev
On 03/30/15 18:37, Vlad Zolotarov wrote:
>
>
> On 03/30/15 17:18, Ananyev, Konstantin wrote:
>> Hi Vlad,
>>
>>> -----Original Message-----
>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Vlad Zolotarov
>>> Sent: Wednesday, March 18, 2015 5:52 PM
>>> To: dev@dpdk.org
>>> Subject: [dpdk-dev] [PATCH v8 3/3] ixgbe: Add LRO support
>>>
>>> - Only x540 and 82599 devices support LRO.
>>> - Add the appropriate HW configuration.
>>> - Add RSC aware rx_pkt_burst() handlers:
>>> - Implemented bulk allocation and non-bulk allocation versions.
>>> - Add LRO-specific fields to rte_eth_rxmode, to
>>> rte_eth_dev_data
>>> and to ixgbe_rx_queue.
>>> - Use the appropriate handler when LRO is requested.
>>>
>>> Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
>>> ---
>>> New in v8:
>>> - Took the RSC configuration code from ixgbe_dev_rx_init() into
>>> a separate
>>> function - ixgbe_set_rsc().
>>> - Added some missing macros for HW configuration.
>>> - Styling adjustments:
>>> - Functions names.
>>> - Functions descriptions.
>>> - Reworked the ixgbe_free_rsc_cluster() code to make it more
>>> readable.
>>> - Kill the HEADER_SPLIT flow in ixgbe_set_rsc() since it's not
>>> supported by
>>> ixgbe PMD.
>>>
>>> New in v7:
>>> - Free not-yet-completed RSC aggregations in rte_eth_dev_stop()
>>> flow.
>>> - Reset the sw_ring[].mbuf entry in a bulk allocation case.
>>> This is needed for ixgbe_rx_queue_release_mbufs().
>>> - _recv_pkts_lro(): added the missing memory barrier before RDT
>>> update in a
>>> non-bulk allocation case.
>>> - Don't allow RSC when device is configured in an SR-IOV mode.
>>>
>>> New in v5:
>>> - Put the RTE_ETHDEV_HAS_LRO_SUPPORT definition at the beginning
>>> of rte_ethdev.h.
>>> - Removed the "TODO: Remove me" comment near
>>> RTE_ETHDEV_HAS_LRO_SUPPORT.
>>>
>>> New in v4:
>>> - Define RTE_ETHDEV_HAS_LRO_SUPPORT in rte_ethdev.h instead of
>>> RTE_ETHDEV_LRO_SUPPORT defined in config/common_linuxapp.
>>>
>>> New in v2:
>>> - Removed rte_eth_dev_data.lro_bulk_alloc.
>>> - Fixed a few styling and spelling issues.
>>> ---
>>> lib/librte_ether/rte_ethdev.h | 9 +-
>>> lib/librte_net/rte_ip.h | 3 +
>>> lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h | 6 +
>>> lib/librte_pmd_ixgbe/ixgbe_ethdev.c | 11 +
>>> lib/librte_pmd_ixgbe/ixgbe_ethdev.h | 5 +
>>> lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 610
>>> +++++++++++++++++++++++++++++++-
>>> lib/librte_pmd_ixgbe/ixgbe_rxtx.h | 6 +
>>> 7 files changed, 642 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/lib/librte_ether/rte_ethdev.h
>>> b/lib/librte_ether/rte_ethdev.h
>>> index 21aa359..61dc49a 100644
>>> --- a/lib/librte_ether/rte_ethdev.h
>>> +++ b/lib/librte_ether/rte_ethdev.h
>>> @@ -172,6 +172,9 @@ extern "C" {
>>>
>>> #include <stdint.h>
>>>
>>> +/* Use this macro to check if LRO API is supported */
>>> +#define RTE_ETHDEV_HAS_LRO_SUPPORT
>>> +
>>> #include <rte_log.h>
>>> #include <rte_interrupts.h>
>>> #include <rte_pci.h>
>>> @@ -320,14 +323,15 @@ struct rte_eth_rxmode {
>>> enum rte_eth_rx_mq_mode mq_mode;
>>> uint32_t max_rx_pkt_len; /**< Only used if jumbo_frame
>>> enabled. */
>>> uint16_t split_hdr_size; /**< hdr buf size (header_split
>>> enabled).*/
>>> - uint8_t header_split : 1, /**< Header Split enable. */
>>> + uint16_t header_split : 1, /**< Header Split enable. */
>>> hw_ip_checksum : 1, /**< IP/UDP/TCP checksum offload
>>> enable. */
>>> hw_vlan_filter : 1, /**< VLAN filter enable. */
>>> hw_vlan_strip : 1, /**< VLAN strip enable. */
>>> hw_vlan_extend : 1, /**< Extended VLAN enable. */
>>> jumbo_frame : 1, /**< Jumbo Frame Receipt enable. */
>>> hw_strip_crc : 1, /**< Enable CRC stripping by
>>> hardware. */
>>> - enable_scatter : 1; /**< Enable scatter packets rx
>>> handler */
>>> + enable_scatter : 1, /**< Enable scatter packets rx
>>> handler */
>>> + enable_lro : 1; /**< Enable LRO */
>>> };
>>>
>>> /**
>>> @@ -1515,6 +1519,7 @@ struct rte_eth_dev_data {
>>> uint8_t port_id; /**< Device [external] port
>>> identifier. */
>>> uint8_t promiscuous : 1, /**< RX promiscuous mode ON(1) /
>>> OFF(0). */
>>> scattered_rx : 1, /**< RX of scattered packets is ON(1) /
>>> OFF(0) */
>>> + lro : 1, /**< RX LRO is ON(1) / OFF(0) */
>>> all_multicast : 1, /**< RX all multicast mode ON(1) /
>>> OFF(0). */
>>> dev_started : 1; /**< Device state: STARTED(1) /
>>> STOPPED(0). */
>>> };
>>> diff --git a/lib/librte_net/rte_ip.h b/lib/librte_net/rte_ip.h
>>> index 64935d9..74c9ced 100644
>>> --- a/lib/librte_net/rte_ip.h
>>> +++ b/lib/librte_net/rte_ip.h
>>> @@ -110,6 +110,9 @@ struct ipv4_hdr {
>>> (((c) & 0xff) << 8) | \
>>> ((d) & 0xff))
>>>
>>> +/** Maximal IPv4 packet length (including a header) */
>>> +#define IPV4_MAX_PKT_LEN 65535
>>> +
>>> /** Internet header length mask for version_ihl field */
>>> #define IPV4_HDR_IHL_MASK (0x0f)
>>> /**
>>> diff --git a/lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h
>>> b/lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h
>>> index 9a66370..4998627 100644
>>> --- a/lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h
>>> +++ b/lib/librte_pmd_ixgbe/ixgbe/ixgbe_type.h
>>> @@ -234,8 +234,14 @@ struct ixgbe_thermal_sensor_data {
>>> #define IXGBE_EITR(_i) (((_i) <= 23) ? (0x00820 + ((_i) *
>>> 4)) : \
>>> (0x012300 + (((_i) - 24) * 4)))
>>> #define IXGBE_EITR_ITR_INT_MASK 0x00000FF8
>>> +#define IXGBE_EITR_ITR_INT_SHIFT 3
>>> #define IXGBE_EITR_LLI_MOD 0x00008000
>>> #define IXGBE_EITR_CNT_WDIS 0x80000000
>>> +/* EITR Interval is in 2048ns units for 1G and 10G link */
>>> +#define IXGBE_EITR_INTERVAL_UNIT_NS 2048
>> Why is that? Why not just 2us as 82599 spec says?
>> And then:
>> +#define IXGBE_EITR_INTERVAL_US(us) (((us) /
>> IXGBE_EITR_INTERVAL_UNIT_US << IXGBE_EITR_ITR_INT_SHIFT) &
>> IXGBE_EITR_ITR_INT_MASK)
>> ?
>
> Because the spec doesn't say 2us but rather 2.048us - see chapter
> 8.2.4.5.16 of the x540 spec.
The most confusing part is that the 82599 Rev. 3.0 spec does say so. The
x540 spec I have is dated March 2014 and the 82599 spec is dated
November 2014, but according to the changelog the EITR description has
never been updated. Therefore I conclude that the spec of the more
recent device (x540) holds the most up-to-date content; the interval
arithmetic sketched below follows that reading.
Pls., confirm.
thanks,
vlad
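For reference, a self-contained sketch of the interval arithmetic
discussed above, built from the patch's macros and the 2.048us unit
(the printed value is what would land in the EITR interval field):

#include <stdint.h>
#include <stdio.h>

#define IXGBE_EITR_ITR_INT_MASK     0x00000FF8
#define IXGBE_EITR_ITR_INT_SHIFT    3
#define IXGBE_EITR_INTERVAL_UNIT_NS 2048
#define IXGBE_EITR_INTERVAL_US(us) \
	(((us) * 1000 / IXGBE_EITR_INTERVAL_UNIT_NS << \
	  IXGBE_EITR_ITR_INT_SHIFT) & IXGBE_EITR_ITR_INT_MASK)

int main(void)
{
	/* 500us / 2.048us = 244 interval units; 244 << 3 = 0x7a0. */
	printf("EITR field for 500us: 0x%x\n",
	       (uint32_t)IXGBE_EITR_INTERVAL_US(500));
	return 0;
}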
>
>>
>>> +#define IXGBE_EITR_INTERVAL_US(us) \
>>> + (((us) * 1000 / IXGBE_EITR_INTERVAL_UNIT_NS <<
>>> IXGBE_EITR_ITR_INT_SHIFT) & \
>>> + IXGBE_EITR_ITR_INT_MASK)
>>
>> Unfortunately we are not supposed to change files under the ixgbe
>> subdirectory (except ixgbe_osdep.*).
>> These files come from the ixgbe shared code and we treat them as read-only.
>> You need to move the new macros you added into ixgbe_ethdev.h.
>
> Aha... Ok. No problem.
>
>>
>> All other things look good to me.
>> Konstantin
>>
>>> #define IXGBE_IVAR(_i) (0x00900 + ((_i) * 4)) /* 24 at
>>> 0x900-0x960 */
>>> #define IXGBE_IVAR_MISC 0x00A00 /* misc MSI-X interrupt
>>> causes */
>>> #define IXGBE_EITRSEL 0x00894
>>> diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
>>> b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
>>> index 92d75db..923692e 100644
>>> --- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
>>> +++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
>>> @@ -1645,6 +1645,7 @@ ixgbe_dev_stop(struct rte_eth_dev *dev)
>>>
>>> /* Clear stored conf */
>>> dev->data->scattered_rx = 0;
>>> + dev->data->lro = 0;
>>>
>>> /* Clear recorded link status */
>>> memset(&link, 0, sizeof(link));
>>> @@ -2013,6 +2014,16 @@ ixgbe_dev_info_get(struct rte_eth_dev *dev,
>>> struct rte_eth_dev_info *dev_info)
>>> DEV_RX_OFFLOAD_IPV4_CKSUM |
>>> DEV_RX_OFFLOAD_UDP_CKSUM |
>>> DEV_RX_OFFLOAD_TCP_CKSUM;
>>> +
>>> + /*
>>> + * RSC is only supported by 82599 and x540 PF devices in a
>>> + * non-SR-IOV mode.
>>> + */
>>> + if ((hw->mac.type == ixgbe_mac_82599EB ||
>>> + hw->mac.type == ixgbe_mac_X540) &&
>>> + !RTE_ETH_DEV_SRIOV(dev).active)
>>> + dev_info->rx_offload_capa |= DEV_RX_OFFLOAD_TCP_LRO;
>>> +
>>> dev_info->tx_offload_capa =
>>> DEV_TX_OFFLOAD_VLAN_INSERT |
>>> DEV_TX_OFFLOAD_IPV4_CKSUM |
>>> diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.h
>>> b/lib/librte_pmd_ixgbe/ixgbe_ethdev.h
>>> index ffe3471..7050340 100644
>>> --- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.h
>>> +++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.h
>>> @@ -344,6 +344,11 @@ uint16_t ixgbe_recv_pkts(void *rx_queue, struct
>>> rte_mbuf **rx_pkts,
>>> uint16_t ixgbe_recv_scattered_pkts(void *rx_queue,
>>> struct rte_mbuf **rx_pkts, uint16_t nb_pkts);
>>>
>>> +uint16_t ixgbe_recv_pkts_lro_single_alloc(void *rx_queue,
>>> + struct rte_mbuf **rx_pkts, uint16_t nb_pkts);
>>> +uint16_t ixgbe_recv_pkts_lro_bulk_alloc(void *rx_queue,
>>> + struct rte_mbuf **rx_pkts, uint16_t nb_pkts);
>>> +
>>> uint16_t ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
>>> uint16_t nb_pkts);
>>>
>>> diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
>>> b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
>>> index a08ae6a..daa4c45 100644
>>> --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
>>> +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
>>> @@ -70,6 +70,7 @@
>>> #include <rte_sctp.h>
>>> #include <rte_string_fns.h>
>>> #include <rte_errno.h>
>>> +#include <rte_ip.h>
>>>
>>> #include "ixgbe_logs.h"
>>> #include "ixgbe/ixgbe_api.h"
>>> @@ -1422,6 +1423,295 @@ ixgbe_fill_cluster_head_buf(
>>> }
>>> }
>>>
>>> +/**
>>> + * ixgbe_recv_pkts_lro - receive handler for the LRO case.
>>> + *
>>> + * @rx_queue Rx queue handle
>>> + * @rx_pkts table of received packets
>>> + * @nb_pkts size of rx_pkts table
>>> + * @bulk_alloc if TRUE bulk allocation is used for HW ring refilling
>>> + *
>>> + * Handles the Rx HW ring completions when the RSC feature is
>>> + * configured. Uses an additional ring of ixgbe_rsc_entry's that
>>> + * holds the relevant RSC info.
>>> + *
>>> + * We use the same logic as in the Linux and FreeBSD ixgbe drivers:
>>> + * 1) When a non-EOP RSC completion arrives:
>>> + * a) Update the HEAD of the current RSC aggregation cluster
>>> + * with the new segment's data length.
>>> + * b) Set the "next" pointer of the current segment to point
>>> + * to the segment at the NEXTP index.
>>> + * c) Pass the HEAD of the RSC aggregation cluster on to the
>>> + * next NEXTP entry in the sw_rsc_ring.
>>> + * 2) When EOP arrives we just update the cluster's total length
>>> + * and offload flags and deliver the cluster up to the upper
>>> + * layers. In our case - put it in the rx_pkts table.
>>> + *
>>> + * Returns the number of received packets/clusters (according to
>>> + * the "bulk receive" interface).
>>> + */
>>> +static inline uint16_t
>>> +ixgbe_recv_pkts_lro(void *rx_queue, struct rte_mbuf **rx_pkts,
>>> uint16_t nb_pkts,
>>> + bool bulk_alloc)
>>> +{
>>> + struct ixgbe_rx_queue *rxq = rx_queue;
>>> + volatile union ixgbe_adv_rx_desc *rx_ring = rxq->rx_ring;
>>> + struct ixgbe_rx_entry *sw_ring = rxq->sw_ring;
>>> + struct ixgbe_rsc_entry *sw_rsc_ring = rxq->sw_rsc_ring;
>>> + uint16_t rx_id = rxq->rx_tail;
>>> + uint16_t nb_rx = 0;
>>> + uint16_t nb_hold = rxq->nb_rx_hold;
>>> + uint16_t prev_id = rxq->rx_tail;
>>> +
>>> + while (nb_rx < nb_pkts) {
>>> + bool eop;
>>> + struct ixgbe_rx_entry *rxe;
>>> + struct ixgbe_rsc_entry *rsc_entry;
>>> + struct ixgbe_rsc_entry *next_rsc_entry;
>>> + struct ixgbe_rx_entry *next_rxe;
>>> + struct rte_mbuf *first_seg;
>>> + struct rte_mbuf *rxm;
>>> + struct rte_mbuf *nmb;
>>> + union ixgbe_adv_rx_desc rxd;
>>> + uint16_t data_len;
>>> + uint16_t next_id;
>>> + volatile union ixgbe_adv_rx_desc *rxdp;
>>> + uint32_t staterr;
>>> +
>>> +next_desc:
>>> + /*
>>> + * The code in this whole file uses the volatile pointer to
>>> + * ensure the read ordering of the status and the rest of the
>>> + * descriptor fields (on the compiler level only!!!). This is
>>> + * so UGLY - why not just use the compiler barrier instead?
>>> + * DPDK even has the rte_compiler_barrier() for that.
>>> + *
>>> + * But most importantly this is just wrong because this doesn't
>>> + * ensure memory ordering in a general case at all. For
>>> + * instance, DPDK is supposed to work on Power CPUs where a
>>> + * compiler barrier may just not be enough!
>>> + *
>>> + * I tried to write only this function properly to have a
>>> + * starting point (as a part of an LRO/RSC series) but the
>>> + * compiler cursed at me when I tried to cast away the
>>> + * "volatile" from rx_ring (yes, it's volatile too!!!). So I'm
>>> + * keeping it the way it is for now.
>>> + *
>>> + * The code in this file is broken in so many other places and
>>> + * will just not work on a big endian CPU anyway, therefore the
>>> + * lines below will have to be revisited together with the rest
>>> + * of the ixgbe PMD.
>>> + *
>>> + * TODO:
>>> + * - Get rid of "volatile" crap and let the compiler do its
>>> + * job.
>>> + * - Use the proper memory barrier (rte_rmb()) to ensure the
>>> + * memory ordering below.
>>> + */
>>> + rxdp = &rx_ring[rx_id];
>>> + staterr = rte_le_to_cpu_32(rxdp->wb.upper.status_error);
>>> +
>>> + if (!(staterr & IXGBE_RXDADV_STAT_DD))
>>> + break;
>>> +
>>> + rxd = *rxdp;
>>> +
>>> + PMD_RX_LOG(DEBUG, "port_id=%u queue_id=%u rx_id=%u "
>>> + "staterr=0x%x data_len=%u",
>>> + rxq->port_id, rxq->queue_id, rx_id, staterr,
>>> + rte_le_to_cpu_16(rxd.wb.upper.length));
>>> +
>>> + if (!bulk_alloc) {
>>> + nmb = rte_rxmbuf_alloc(rxq->mb_pool);
>>> + if (nmb == NULL) {
>>> + PMD_RX_LOG(DEBUG, "RX mbuf alloc failed "
>>> + "port_id=%u queue_id=%u",
>>> + rxq->port_id, rxq->queue_id);
>>> +
>>> + rte_eth_devices[rxq->port_id].data->
>>> + rx_mbuf_alloc_failed++;
>>> + break;
>>> + }
>>> + } else if (nb_hold > rxq->rx_free_thresh) {
>>> + uint16_t next_rdt = rxq->rx_free_trigger;
>>> +
>>> + if (!ixgbe_rx_alloc_bufs(rxq, false)) {
>>> + rte_wmb();
>>> + IXGBE_PCI_REG_WRITE(rxq->rdt_reg_addr,
>>> + next_rdt);
>>> + nb_hold -= rxq->rx_free_thresh;
>>> + } else {
>>> + PMD_RX_LOG(DEBUG, "RX bulk alloc failed "
>>> + "port_id=%u queue_id=%u",
>>> + rxq->port_id, rxq->queue_id);
>>> +
>>> + rte_eth_devices[rxq->port_id].data->
>>> + rx_mbuf_alloc_failed++;
>>> + break;
>>> + }
>>> + }
>>> +
>>> + nb_hold++;
>>> + rxe = &sw_ring[rx_id];
>>> + eop = staterr & IXGBE_RXDADV_STAT_EOP;
>>> +
>>> + next_id = rx_id + 1;
>>> + if (next_id == rxq->nb_rx_desc)
>>> + next_id = 0;
>>> +
>>> + /* Prefetch next mbuf while processing current one. */
>>> + rte_ixgbe_prefetch(sw_ring[next_id].mbuf);
>>> +
>>> + /*
>>> + * When next RX descriptor is on a cache-line boundary,
>>> + * prefetch the next 4 RX descriptors and the next 4 pointers
>>> + * to mbufs.
>>> + */
>>> + if ((next_id & 0x3) == 0) {
>>> + rte_ixgbe_prefetch(&rx_ring[next_id]);
>>> + rte_ixgbe_prefetch(&sw_ring[next_id]);
>>> + }
>>> +
>>> + rxm = rxe->mbuf;
>>> +
>>> + if (!bulk_alloc) {
>>> + __le64 dma =
>>> + rte_cpu_to_le_64(RTE_MBUF_DATA_DMA_ADDR_DEFAULT(nmb));
>>> + /*
>>> + * Update the RX descriptor with the physical address of
>>> + * the data buffer of the newly allocated mbuf.
>>> + */
>>> + rxe->mbuf = nmb;
>>> +
>>> + rxm->data_off = RTE_PKTMBUF_HEADROOM;
>>> + rxdp->read.hdr_addr = dma;
>>> + rxdp->read.pkt_addr = dma;
>>> + } else
>>> + rxe->mbuf = NULL;
>>> +
>>> + /*
>>> + * Set data length & data buffer address of mbuf.
>>> + */
>>> + data_len = rte_le_to_cpu_16(rxd.wb.upper.length);
>>> + rxm->data_len = data_len;
>>> +
>>> + if (!eop) {
>>> + uint16_t nextp_id;
>>> + /*
>>> + * Get next descriptor index:
>>> + * - For RSC it's in the NEXTP field.
>>> + * - For a scattered packet - it's just a following
>>> + * descriptor.
>>> + */
>>> + if (ixgbe_rsc_count(&rxd))
>>> + nextp_id =
>>> + (staterr & IXGBE_RXDADV_NEXTP_MASK) >>
>>> + IXGBE_RXDADV_NEXTP_SHIFT;
>>> + else
>>> + nextp_id = next_id;
>>> +
>>> + next_rsc_entry = &sw_rsc_ring[nextp_id];
>>> + next_rxe = &sw_ring[nextp_id];
>>> + rte_ixgbe_prefetch(next_rxe);
>>> + }
>>> +
>>> + rsc_entry = &sw_rsc_ring[rx_id];
>>> + first_seg = rsc_entry->fbuf;
>>> + rsc_entry->fbuf = NULL;
>>> +
>>> + /*
>>> + * If this is the first buffer of the received packet,
>>> + * set the pointer to the first mbuf of the packet and
>>> + * initialize its context.
>>> + * Otherwise, update the total length and the number of
>>> + * segments of the current scattered packet, and update
>>> + * the pointer to the last mbuf of the current packet.
>>> + */
>>> + if (first_seg == NULL) {
>>> + first_seg = rxm;
>>> + first_seg->pkt_len = data_len;
>>> + first_seg->nb_segs = 1;
>>> + } else {
>>> + first_seg->pkt_len += data_len;
>>> + first_seg->nb_segs++;
>>> + }
>>> +
>>> + prev_id = rx_id;
>>> + rx_id = next_id;
>>> +
>>> + /*
>>> + * If this is not the last buffer of the received packet,
>>> + * update the pointer to the first mbuf at the NEXTP entry
>>> + * in the sw_rsc_ring and continue to parse the RX ring.
>>> + */
>>> + if (!eop) {
>>> + rxm->next = next_rxe->mbuf;
>>> + next_rsc_entry->fbuf = first_seg;
>>> + goto next_desc;
>>> + }
>>> +
>>> + /*
>>> + * This is the last buffer of the received packet - return
>>> + * the current cluster to the user.
>>> + */
>>> + rxm->next = NULL;
>>> +
>>> + /* Initialize the first mbuf of the returned packet */
>>> + ixgbe_fill_cluster_head_buf(first_seg, &rxd, rxq->port_id,
>>> + staterr);
>>> +
>>> + /* Prefetch data of first segment, if configured to do so. */
>>> + rte_packet_prefetch((char *)first_seg->buf_addr +
>>> + first_seg->data_off);
>>> +
>>> + /*
>>> + * Store the mbuf address into the next entry of the array
>>> + * of returned packets.
>>> + */
>>> + rx_pkts[nb_rx++] = first_seg;
>>> + }
>>> +
>>> + /*
>>> + * Record index of the next RX descriptor to probe.
>>> + */
>>> + rxq->rx_tail = rx_id;
>>> +
>>> + /*
>>> + * If the number of free RX descriptors is greater than the
>>> + * RX free threshold of the queue, advance the Receive
>>> + * Descriptor Tail (RDT) register.
>>> + * Update the RDT with the value of the last processed RX
>>> + * descriptor minus 1, to guarantee that the RDT register is
>>> + * never equal to the RDH register, which creates a "full"
>>> + * ring situation from the hardware point of view...
>>> + */
>>> + if (!bulk_alloc && nb_hold > rxq->rx_free_thresh) {
>>> + PMD_RX_LOG(DEBUG, "port_id=%u queue_id=%u rx_tail=%u "
>>> + "nb_hold=%u nb_rx=%u",
>>> + rxq->port_id, rxq->queue_id, rx_id, nb_hold, nb_rx);
>>> +
>>> + rte_wmb();
>>> + IXGBE_PCI_REG_WRITE(rxq->rdt_reg_addr, prev_id);
>>> + nb_hold = 0;
>>> + }
>>> +
>>> + rxq->nb_rx_hold = nb_hold;
>>> + return nb_rx;
>>> +}
>>> +
>>> +uint16_t
>>> +ixgbe_recv_pkts_lro_single_alloc(void *rx_queue, struct rte_mbuf
>>> **rx_pkts,
>>> + uint16_t nb_pkts)
>>> +{
>>> + return ixgbe_recv_pkts_lro(rx_queue, rx_pkts, nb_pkts, false);
>>> +}
>>> +
>>> +uint16_t
>>> +ixgbe_recv_pkts_lro_bulk_alloc(void *rx_queue, struct rte_mbuf
>>> **rx_pkts,
>>> + uint16_t nb_pkts)
>>> +{
>>> + return ixgbe_recv_pkts_lro(rx_queue, rx_pkts, nb_pkts, true);
>>> +}
>>> +
>>> uint16_t
>>> ixgbe_recv_scattered_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
>>> uint16_t nb_pkts)
>>> @@ -2005,6 +2295,31 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev
>>> *dev,
>>> return (0);
>>> }
>>>
>>> +/**
>>> + * ixgbe_free_rsc_cluster - free the not-yet-completed RSC cluster
>>> + *
>>> + * The "next" pointer of the last segment of (not-yet-completed)
>>> + * RSC clusters in the sw_rsc_ring is not set to NULL but rather
>>> + * points to the next mbuf of this RSC aggregation (that has not
>>> + * been completed yet and still resides on the HW ring). So,
>>> + * instead of calling rte_pktmbuf_free() we will just free the
>>> + * first "nb_segs" segments of the cluster explicitly by calling
>>> + * rte_pktmbuf_free_seg().
>>> + *
>>> + * @m RSC cluster head
>>> + */
>>> +static void
>>> +ixgbe_free_rsc_cluster(struct rte_mbuf *m)
>>> +{
>>> + uint8_t i, nb_segs = m->nb_segs;
>>> + struct rte_mbuf *next_seg;
>>> +
>>> + for (i = 0; i < nb_segs; i++) {
>>> + next_seg = m->next;
>>> + rte_pktmbuf_free_seg(m);
>>> + m = next_seg;
>>> + }
>>> +}
>>> +
>>> static void
>>> ixgbe_rx_queue_release_mbufs(struct ixgbe_rx_queue *rxq)
>>> {
>>> @@ -2028,6 +2343,13 @@ ixgbe_rx_queue_release_mbufs(struct
>>> ixgbe_rx_queue *rxq)
>>> }
>>> #endif
>>> }
>>> +
>>> + if (rxq->sw_rsc_ring)
>>> + for (i = 0; i < rxq->nb_rx_desc; i++)
>>> + if (rxq->sw_rsc_ring[i].fbuf) {
>>> + ixgbe_free_rsc_cluster(rxq->sw_rsc_ring[i].fbuf);
>>> + rxq->sw_rsc_ring[i].fbuf = NULL;
>>> + }
>>> }
>>>
>>> static void
>>> @@ -2036,6 +2358,7 @@ ixgbe_rx_queue_release(struct ixgbe_rx_queue
>>> *rxq)
>>> if (rxq != NULL) {
>>> ixgbe_rx_queue_release_mbufs(rxq);
>>> rte_free(rxq->sw_ring);
>>> + rte_free(rxq->sw_rsc_ring);
>>> rte_free(rxq);
>>> }
>>> }
>>> @@ -2158,6 +2481,7 @@ ixgbe_reset_rx_queue(struct ixgbe_hw *hw,
>>> struct ixgbe_rx_queue *rxq)
>>> rxq->nb_rx_hold = 0;
>>> rxq->pkt_first_seg = NULL;
>>> rxq->pkt_last_seg = NULL;
>>> + rxq->rsc_en = 0;
>>> }
>>>
>>> int
>>> @@ -2172,6 +2496,14 @@ ixgbe_dev_rx_queue_setup(struct rte_eth_dev
>>> *dev,
>>> struct ixgbe_rx_queue *rxq;
>>> struct ixgbe_hw *hw;
>>> uint16_t len;
>>> + struct rte_eth_dev_info dev_info = { 0 };
>>> + struct rte_eth_rxmode *dev_rx_mode = &dev->data->dev_conf.rxmode;
>>> + bool rsc_requested = false;
>>> +
>>> + dev->dev_ops->dev_infos_get(dev, &dev_info);
>>> + if ((dev_info.rx_offload_capa & DEV_RX_OFFLOAD_TCP_LRO) &&
>>> + dev_rx_mode->enable_lro)
>>> + rsc_requested = true;
>>>
>>> PMD_INIT_FUNC_TRACE();
>>> hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
>>> @@ -2277,12 +2609,27 @@ ixgbe_dev_rx_queue_setup(struct rte_eth_dev
>>> *dev,
>>> rxq->sw_ring = rte_zmalloc_socket("rxq->sw_ring",
>>> sizeof(struct ixgbe_rx_entry) * len,
>>> RTE_CACHE_LINE_SIZE, socket_id);
>>> - if (rxq->sw_ring == NULL) {
>>> + if (!rxq->sw_ring) {
>>> ixgbe_rx_queue_release(rxq);
>>> return (-ENOMEM);
>>> }
>>> - PMD_INIT_LOG(DEBUG, "sw_ring=%p hw_ring=%p dma_addr=0x%"PRIx64,
>>> - rxq->sw_ring, rxq->rx_ring, rxq->rx_ring_phys_addr);
>>> +
>>> + if (rsc_requested) {
>>> + rxq->sw_rsc_ring =
>>> + rte_zmalloc_socket("rxq->sw_rsc_ring",
>>> + sizeof(struct ixgbe_rsc_entry) * len,
>>> + RTE_CACHE_LINE_SIZE, socket_id);
>>> + if (!rxq->sw_rsc_ring) {
>>> + ixgbe_rx_queue_release(rxq);
>>> + return (-ENOMEM);
>>> + }
>>> + } else
>>> + rxq->sw_rsc_ring = NULL;
>>> +
>>> + PMD_INIT_LOG(DEBUG, "sw_ring=%p sw_rsc_ring=%p hw_ring=%p "
>>> + "dma_addr=0x%"PRIx64,
>>> + rxq->sw_ring, rxq->sw_rsc_ring, rxq->rx_ring,
>>> + rxq->rx_ring_phys_addr);
>>>
>>> if (!rte_is_power_of_2(nb_desc)) {
>>> PMD_INIT_LOG(DEBUG, "queue[%d] doesn't meet Vector Rx "
>>> @@ -3527,6 +3874,91 @@ ixgbe_dev_mq_tx_configure(struct rte_eth_dev
>>> *dev)
>>> return 0;
>>> }
>>>
>>> +/**
>>> + * ixgbe_get_rscctl_maxdesc - Calculate the RSCCTL[n].MAXDESC for PF
>>> + *
>>> + * Return the RSCCTL[n].MAXDESC for 82599 and x540 PF devices
>>> + * according to the spec rev. 3.0 chapter 8.2.3.8.13.
>>> + *
>>> + * @pool Memory pool of the Rx queue
>>> + */
>>> +static inline uint32_t
>>> +ixgbe_get_rscctl_maxdesc(struct rte_mempool *pool)
>>> +{
>>> + struct rte_pktmbuf_pool_private *mp_priv =
>>> rte_mempool_get_priv(pool);
>>> +
>>> + /* MAXDESC * SRRCTL.BSIZEPKT must not exceed 64 KB minus one */
>>> + uint16_t maxdesc =
>>> + IPV4_MAX_PKT_LEN /
>>> + (mp_priv->mbuf_data_room_size - RTE_PKTMBUF_HEADROOM);
>>> +
>>> + if (maxdesc >= 16)
>>> + return IXGBE_RSCCTL_MAXDESC_16;
>>> + else if (maxdesc >= 8)
>>> + return IXGBE_RSCCTL_MAXDESC_8;
>>> + else if (maxdesc >= 4)
>>> + return IXGBE_RSCCTL_MAXDESC_4;
>>> + else
>>> + return IXGBE_RSCCTL_MAXDESC_1;
>>> +}
>>> +
>>> +/**
>>> + * ixgbe_set_ivar - Setup the correct IVAR register for a
>>> + * particular MSIX interrupt
>>> + *
>>> + * (Taken from FreeBSD tree)
>>> + * (yes this is all very magic and confusing :)
>>> + *
>>> + * @dev port handle
>>> + * @entry the register array entry
>>> + * @vector the MSIX vector for this queue
>>> + * @type RX/TX/MISC
>>> + */
>>> +static void
>>> +ixgbe_set_ivar(struct rte_eth_dev *dev, u8 entry, u8 vector, s8 type)
>>> +{
>>> + struct ixgbe_hw *hw =
>>> IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
>>> + u32 ivar, index;
>>> +
>>> + vector |= IXGBE_IVAR_ALLOC_VAL;
>>> +
>>> + switch (hw->mac.type) {
>>> +
>>> + case ixgbe_mac_82598EB:
>>> + if (type == -1)
>>> + entry = IXGBE_IVAR_OTHER_CAUSES_INDEX;
>>> + else
>>> + entry += (type * 64);
>>> + index = (entry >> 2) & 0x1F;
>>> + ivar = IXGBE_READ_REG(hw, IXGBE_IVAR(index));
>>> + ivar &= ~(0xFF << (8 * (entry & 0x3)));
>>> + ivar |= (vector << (8 * (entry & 0x3)));
>>> + IXGBE_WRITE_REG(hw, IXGBE_IVAR(index), ivar);
>>> + break;
>>> +
>>> + case ixgbe_mac_82599EB:
>>> + case ixgbe_mac_X540:
>>> + if (type == -1) { /* MISC IVAR */
>>> + index = (entry & 1) * 8;
>>> + ivar = IXGBE_READ_REG(hw, IXGBE_IVAR_MISC);
>>> + ivar &= ~(0xFF << index);
>>> + ivar |= (vector << index);
>>> + IXGBE_WRITE_REG(hw, IXGBE_IVAR_MISC, ivar);
>>> + } else { /* RX/TX IVARS */
>>> + index = (16 * (entry & 1)) + (8 * type);
>>> + ivar = IXGBE_READ_REG(hw, IXGBE_IVAR(entry >> 1));
>>> + ivar &= ~(0xFF << index);
>>> + ivar |= (vector << index);
>>> + IXGBE_WRITE_REG(hw, IXGBE_IVAR(entry >> 1), ivar);
>>> + }
>>> +
>>> + break;
>>> +
>>> + default:
>>> + break;
>>> + }
>>> +}
>>> +
>>> void ixgbe_set_rx_function(struct rte_eth_dev *dev)
>>> {
>>> struct ixgbe_hw *hw =
>>> IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
>>> @@ -3545,7 +3977,24 @@ void ixgbe_set_rx_function(struct rte_eth_dev
>>> *dev)
>>> hw->rx_vec_allowed = false;
>>> }
>>>
>>> - if (dev->data->scattered_rx) {
>>> + /*
>>> + * Initialize the appropriate LRO callback.
>>> + *
>>> + * If all queues satisfy the bulk allocation preconditions
>>> + * (hw->rx_bulk_alloc_allowed is TRUE) then we may use bulk
>>> + * allocation. Otherwise use a single allocation version.
>>> + */
>>> + if (dev->data->lro) {
>>> + if (hw->rx_bulk_alloc_allowed) {
>>> + PMD_INIT_LOG(INFO, "LRO is requested. Using a bulk "
>>> + "allocation version");
>>> + dev->rx_pkt_burst = ixgbe_recv_pkts_lro_bulk_alloc;
>>> + } else {
>>> + PMD_INIT_LOG(INFO, "LRO is requested. Using a single "
>>> + "allocation version");
>>> + dev->rx_pkt_burst = ixgbe_recv_pkts_lro_single_alloc;
>>> + }
>>> + } else if (dev->data->scattered_rx) {
>>> /*
>>> * Set the non-LRO scattered callback: there are Vector and
>>> * single allocation versions.
>>> @@ -3595,6 +4044,149 @@ void ixgbe_set_rx_function(struct
>>> rte_eth_dev *dev)
>>> }
>>> }
>>>
>>> +/**
>>> + * ixgbe_set_rsc - configure RSC related port HW registers
>>> + *
>>> + * Configures the port's RSC related registers according to
>>> + * chapter 4.6.7.2 of the 82599 Spec (x540 configuration is
>>> + * virtually the same).
>>> + *
>>> + * @dev port handle
>>> + *
>>> + * Returns 0 in case of success or a non-zero error code
>>> + */
>>> +static int
>>> +ixgbe_set_rsc(struct rte_eth_dev *dev)
>>> +{
>>> + struct rte_eth_rxmode *rx_conf = &dev->data->dev_conf.rxmode;
>>> + struct ixgbe_hw *hw =
>>> IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
>>> + struct rte_eth_dev_info dev_info = { 0 };
>>> + bool rsc_capable = false;
>>> + uint16_t i;
>>> + uint32_t rdrxctl;
>>> +
>>> + /* Sanity check */
>>> + dev->dev_ops->dev_infos_get(dev, &dev_info);
>>> + if (dev_info.rx_offload_capa & DEV_RX_OFFLOAD_TCP_LRO)
>>> + rsc_capable = true;
>>> +
>>> + if (!rsc_capable && rx_conf->enable_lro) {
>>> + PMD_INIT_LOG(CRIT, "LRO is requested on HW that doesn't "
>>> + "support it");
>>> + return -EINVAL;
>>> + }
>>> +
>>> + /* RSC global configuration (chapter 4.6.7.2.1 of 82599 Spec) */
>>> +
>>> + if (!rx_conf->hw_strip_crc && rx_conf->enable_lro) {
>>> + /*
>>> + * According to chapter 4.6.7.2.1 of the Spec Rev.
>>> + * 3.0, RSC configuration requires HW CRC stripping to
>>> + * be enabled. If the user requested both HW CRC
>>> + * stripping off and RSC on - return an error.
>>> + */
>>> + PMD_INIT_LOG(CRIT, "LRO can't be enabled when HW CRC "
>>> + "is disabled");
>>> + return -EINVAL;
>>> + }
>>> +
>>> + /* RFCTL configuration */
>>> + if (rsc_capable) {
>>> + uint32_t rfctl = IXGBE_READ_REG(hw, IXGBE_RFCTL);
>>> + if (rx_conf->enable_lro)
>>> + /*
>>> + * Since NFS packet coalescing is not supported - clear
>>> + * RFCTL.NFSW_DIS and RFCTL.NFSR_DIS when RSC is
>>> + * enabled.
>>> + */
>>> + rfctl &= ~(IXGBE_RFCTL_RSC_DIS | IXGBE_RFCTL_NFSW_DIS |
>>> + IXGBE_RFCTL_NFSR_DIS);
>>> + else
>>> + rfctl |= IXGBE_RFCTL_RSC_DIS;
>>> +
>>> + IXGBE_WRITE_REG(hw, IXGBE_RFCTL, rfctl);
>>> + }
>>> +
>>> + /* If LRO hasn't been requested - we are done here. */
>>> + if (!rx_conf->enable_lro)
>>> + return 0;
>>> +
>>> + /* Set RDRXCTL.RSCACKC bit */
>>> + rdrxctl = IXGBE_READ_REG(hw, IXGBE_RDRXCTL);
>>> + rdrxctl |= IXGBE_RDRXCTL_RSCACKC;
>>> + IXGBE_WRITE_REG(hw, IXGBE_RDRXCTL, rdrxctl);
>>> +
>>> + /* Per-queue RSC configuration (chapter 4.6.7.2.2 of 82599 Spec) */
>>> + for (i = 0; i < dev->data->nb_rx_queues; i++) {
>>> + struct ixgbe_rx_queue *rxq = dev->data->rx_queues[i];
>>> + uint32_t srrctl =
>>> + IXGBE_READ_REG(hw, IXGBE_SRRCTL(rxq->reg_idx));
>>> + uint32_t rscctl =
>>> + IXGBE_READ_REG(hw, IXGBE_RSCCTL(rxq->reg_idx));
>>> + uint32_t psrtype =
>>> + IXGBE_READ_REG(hw, IXGBE_PSRTYPE(rxq->reg_idx));
>>> + uint32_t eitr =
>>> + IXGBE_READ_REG(hw, IXGBE_EITR(rxq->reg_idx));
>>> +
>>> + /*
>>> + * ixgbe PMD doesn't support header-split at the moment.
>>> + *
>>> + * Following chapter 4.6.7.2.1 of the 82599/x540
>>> + * Spec, if RSC is enabled the SRRCTL[n].BSIZEHEADER
>>> + * should be configured even if header split is not
>>> + * enabled. We will configure it to 128 bytes, following
>>> + * the recommendation in the spec.
>>> + */
>>> + srrctl &= ~IXGBE_SRRCTL_BSIZEHDR_MASK;
>>> + srrctl |= (128 << IXGBE_SRRCTL_BSIZEHDRSIZE_SHIFT) &
>>> + IXGBE_SRRCTL_BSIZEHDR_MASK;
>>> +
>>> + /*
>>> + * TODO: Consider setting the Receive Descriptor Minimum
>>> + * Threshold Size for an RSC case. This is not an obviously
>>> + * beneficial option, but one worth considering...
>>> + */
>>> +
>>> + rscctl |= IXGBE_RSCCTL_RSCEN;
>>> + rscctl |= ixgbe_get_rscctl_maxdesc(rxq->mb_pool);
>>> + psrtype |= IXGBE_PSRTYPE_TCPHDR;
>>> +
>>> + /*
>>> + * RSC: Set ITR interval corresponding to 2K ints/s.
>>> + *
>>> + * Full-sized RSC aggregations on a 10Gb/s link will
>>> + * arrive at a rate of about 20K aggregations/s.
>>> + *
>>> + * A 2K ints/s rate will cause only 10% of the
>>> + * aggregations to be closed due to interrupt timer
>>> + * expiration when streaming at wire speed.
>>> + *
>>> + * For a sparse streaming case this setting will yield
>>> + * at most 500us latency for a single RSC aggregation.
>>> + */
>>> + eitr &= ~IXGBE_EITR_ITR_INT_MASK;
>>> + eitr |= IXGBE_EITR_INTERVAL_US(500) | IXGBE_EITR_CNT_WDIS;
>>> +
>>> + IXGBE_WRITE_REG(hw, IXGBE_SRRCTL(rxq->reg_idx), srrctl);
>>> + IXGBE_WRITE_REG(hw, IXGBE_RSCCTL(rxq->reg_idx), rscctl);
>>> + IXGBE_WRITE_REG(hw, IXGBE_PSRTYPE(rxq->reg_idx), psrtype);
>>> + IXGBE_WRITE_REG(hw, IXGBE_EITR(rxq->reg_idx), eitr);
>>> +
>>> + /*
>>> + * RSC requires the mapping of the queue to the
>>> + * interrupt vector.
>>> + */
>>> + ixgbe_set_ivar(dev, rxq->reg_idx, i, 0);
>>> +
>>> + rxq->rsc_en = 1;
>>> + }
>>> +
>>> + dev->data->lro = 1;
>>> +
>>> + PMD_INIT_LOG(INFO, "enabling LRO mode");
>>> +
>>> + return 0;
>>> +}
>>> +
>>> /*
>>> * Initializes Receive Unit.
>>> */
>>> @@ -3615,6 +4207,7 @@ ixgbe_dev_rx_init(struct rte_eth_dev *dev)
>>> uint16_t buf_size;
>>> uint16_t i;
>>> struct rte_eth_rxmode *rx_conf = &dev->data->dev_conf.rxmode;
>>> + int rc;
>>>
>>> PMD_INIT_FUNC_TRACE();
>>> hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
>>> @@ -3724,6 +4317,7 @@ ixgbe_dev_rx_init(struct rte_eth_dev *dev)
>>> RTE_PKTMBUF_HEADROOM);
>>> srrctl |= ((buf_size >> IXGBE_SRRCTL_BSIZEPKT_SHIFT) &
>>> IXGBE_SRRCTL_BSIZEPKT_MASK);
>>> +
>>> IXGBE_WRITE_REG(hw, IXGBE_SRRCTL(rxq->reg_idx), srrctl);
>>>
>>> buf_size = (uint16_t) ((srrctl &
>>> IXGBE_SRRCTL_BSIZEPKT_MASK) <<
>>> @@ -3738,8 +4332,6 @@ ixgbe_dev_rx_init(struct rte_eth_dev *dev)
>>> if (rx_conf->enable_scatter)
>>> dev->data->scattered_rx = 1;
>>>
>>> - ixgbe_set_rx_function(dev);
>>> -
>>> /*
>>> * Device configured with multiple RX queues.
>>> */
>>> @@ -3770,6 +4362,12 @@ ixgbe_dev_rx_init(struct rte_eth_dev *dev)
>>> IXGBE_WRITE_REG(hw, IXGBE_RDRXCTL, rdrxctl);
>>> }
>>>
>>> + rc = ixgbe_set_rsc(dev);
>>> + if (rc)
>>> + return rc;
>>> +
>>> + ixgbe_set_rx_function(dev);
>>> +
>>> return 0;
>>> }
>>>
>>> diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.h
>>> b/lib/librte_pmd_ixgbe/ixgbe_rxtx.h
>>> index 3937cf6..120bbbd 100644
>>> --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.h
>>> +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.h
>>> @@ -79,6 +79,10 @@ struct ixgbe_rx_entry {
>>> struct rte_mbuf *mbuf; /**< mbuf associated with RX
>>> descriptor. */
>>> };
>>>
>>> +struct ixgbe_rsc_entry {
>>> + struct rte_mbuf *fbuf; /**< First segment of the fragmented
>>> packet. */
>>> +};
>>> +
>>> /**
>>> * Structure associated with each descriptor of the TX ring of a
>>> TX queue.
>>> */
>>> @@ -105,6 +109,7 @@ struct ixgbe_rx_queue {
>>> volatile uint32_t *rdt_reg_addr; /**< RDT register address. */
>>> volatile uint32_t *rdh_reg_addr; /**< RDH register address. */
>>> struct ixgbe_rx_entry *sw_ring; /**< address of RX software
>>> ring. */
>>> + struct ixgbe_rsc_entry *sw_rsc_ring; /**< address of RSC
>>> software ring. */
>>> struct rte_mbuf *pkt_first_seg; /**< First segment of current
>>> packet. */
>>> struct rte_mbuf *pkt_last_seg; /**< Last segment of current
>>> packet. */
>>> uint64_t mbuf_initializer; /**< value to init mbufs */
>>> @@ -126,6 +131,7 @@ struct ixgbe_rx_queue {
>>> uint8_t port_id; /**< Device port identifier. */
>>> uint8_t crc_len; /**< 0 if CRC stripped, 4
>>> otherwise. */
>>> uint8_t drop_en; /**< If not 0, set
>>> SRRCTL.Drop_En. */
>>> + uint8_t rsc_en; /**< If not 0, RSC is enabled. */
>>> uint8_t rx_deferred_start; /**< not in global dev
>>> start. */
>>> #ifdef RTE_LIBRTE_IXGBE_RX_ALLOW_BULK_ALLOC
>>> /** need to alloc dummy mbuf, for wraparound when scanning hw
>>> ring */
>>> --
>>> 2.1.0
>
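As a worked example (not from the thread), the MAXDESC selection in
ixgbe_get_rscctl_maxdesc() above resolves as follows for a typical
pool whose mbufs leave 2KB of data room after the 128-byte headroom
(the buffer length is an assumption for illustration):

#include <stdio.h>

#define IPV4_MAX_PKT_LEN 65535

int main(void)
{
	unsigned int buf_len = 2048; /* data room minus headroom */
	unsigned int maxdesc = IPV4_MAX_PKT_LEN / buf_len; /* = 31 */

	/* MAXDESC * SRRCTL.BSIZEPKT must not exceed 64KB minus one,
	 * hence the driver caps the selection at 16 descriptors. */
	if (maxdesc >= 16)
		printf("RSCCTL.MAXDESC = 16\n");
	else if (maxdesc >= 8)
		printf("RSCCTL.MAXDESC = 8\n");
	else if (maxdesc >= 4)
		printf("RSCCTL.MAXDESC = 4\n");
	else
		printf("RSCCTL.MAXDESC = 1\n");
	return 0;
}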