From: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
To: dev@dpdk.org
Date: Wed, 4 Jun 2014 19:08:23 +0100
Message-Id: <1401905319-8882-8-git-send-email-cristian.dumitrescu@intel.com>
In-Reply-To: <1401905319-8882-1-git-send-email-cristian.dumitrescu@intel.com>
References: <1401905319-8882-1-git-send-email-cristian.dumitrescu@intel.com>
Subject: [dpdk-dev] [v2 07/23] Packet Framework librte_port: IPv4 reassembly
List-Id: patches and discussions about DPDK <dev.dpdk.org>

The IPv4 reassembly operation is presented as a Packet Framework port.
The code duplication with the examples/ip_reassembly sample application
will be addressed once the relevant library is upstreamed, by linking
against that library instead.

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
---
 lib/librte_port/ipv4_frag_tbl.h |  403 ++++++++++++++++++++++++++++++++++++
 lib/librte_port/ipv4_rsmbl.h    |  429 +++++++++++++++++++++++++++++++++++++++
 lib/librte_port/rte_port_ras.c  |  256 +++++++++++++++++++++++
 lib/librte_port/rte_port_ras.h  |   83 ++++++++
 4 files changed, 1171 insertions(+), 0 deletions(-)
 create mode 100644 lib/librte_port/ipv4_frag_tbl.h
 create mode 100644 lib/librte_port/ipv4_rsmbl.h
 create mode 100644 lib/librte_port/rte_port_ras.c
 create mode 100644 lib/librte_port/rte_port_ras.h
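As a usage sketch (illustrative only, not part of the patch; the ring name,
burst size, `socket_id`, `pkts` and `pkts_mask` are made up), the port is
driven through its output port operations:

	struct rte_port_ring_writer_ipv4_ras_params params;
	void *port;

	/* ring shared with a regular ring reader port */
	struct rte_ring *ring = rte_ring_create("ras_ring", 1024, socket_id,
		RING_F_SP_ENQ | RING_F_SC_DEQ);

	params.ring = ring;
	params.tx_burst_sz = 32;

	port = rte_port_ring_writer_ipv4_ras_ops.f_create(&params, socket_id);

	/* write a burst: complete datagrams pass through, fragments are
	 * buffered until their datagram can be reassembled */
	rte_port_ring_writer_ipv4_ras_ops.f_tx_bulk(port, pkts, pkts_mask);
	rte_port_ring_writer_ipv4_ras_ops.f_flush(port);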
diff --git a/lib/librte_port/ipv4_frag_tbl.h b/lib/librte_port/ipv4_frag_tbl.h
new file mode 100644
index 0000000..c44863b
--- /dev/null
+++ b/lib/librte_port/ipv4_frag_tbl.h
@@ -0,0 +1,403 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _IPV4_FRAG_TBL_H_
+#define _IPV4_FRAG_TBL_H_
+
+/**
+ * @file
+ * IPv4 fragments table.
+ *
+ * Implementation of IPv4 fragment table create/destroy/find/update.
+ *
+ */
+
+/*
+ * The ipv4_frag_tbl is a simple hash table:
+ * The basic idea is to use two hash functions and <bucket_entries>
+ * associativity. This provides 2 * <bucket_entries> possible locations in
+ * the hash table for each key. Sort of simplified Cuckoo hashing:
+ * when a collision occurs and all 2 * <bucket_entries> locations are
+ * occupied, instead of reinserting existing keys into alternative
+ * locations, we just return a failure.
+ * Another thing is timing: entries that reside in the table longer than
+ * <max_cycles> are considered invalid, and can be removed/replaced
+ * by new ones.
+ * The <key, data> pair is stored together; all add/update/lookup operations
+ * are not MT safe.
+ */
+
+#include <rte_jhash.h>
+#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
+#include <rte_hash_crc.h>
+#endif /* RTE_MACHINE_CPUFLAG_SSE4_2 */
+
+#define	PRIME_VALUE	0xeaad8405
+
+TAILQ_HEAD(ipv4_pkt_list, ipv4_frag_pkt);
+
+struct ipv4_frag_tbl_stat {
+	uint64_t find_num;     /* total # of find/insert attempts. */
+	uint64_t add_num;      /* # of add ops. */
+	uint64_t del_num;      /* # of del ops. */
+	uint64_t reuse_num;    /* # of reuse (del/add) ops. */
+	uint64_t fail_total;   /* total # of add failures. */
+	uint64_t fail_nospace; /* # of 'no space' add failures. */
+} __rte_cache_aligned;
+
+struct ipv4_frag_tbl {
+	uint64_t max_cycles;     /* ttl for table entries. */
+	uint32_t entry_mask;     /* hash value mask. */
+	uint32_t max_entries;    /* max entries allowed. */
+	uint32_t use_entries;    /* entries in use. */
+	uint32_t bucket_entries; /* hash associativity. */
+	uint32_t nb_entries;     /* total size of the table. */
+	uint32_t nb_buckets;     /* num of associativity lines. */
+	struct ipv4_frag_pkt *last;     /* last used entry. */
+	struct ipv4_pkt_list lru;       /* LRU list for table entries. */
+	struct ipv4_frag_tbl_stat stat; /* statistics counters. */
+	struct ipv4_frag_pkt pkt[0];    /* hash table. */
+};
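/*
 * Illustrative note (not part of the patch): a worked sizing example for the
 * fields above, using the defaults from rte_port_ras.c (4094 buckets of 8
 * entries). ipv4_frag_tbl_create() below rounds the bucket count up to a
 * power of two and doubles the slot count for the two hash functions:
 *
 *   nb_buckets = rte_align32pow2(4094)             = 4096
 *   nb_entries = 4096 * 8 * IPV4_FRAG_HASH_FNUM    = 65536 slots
 *   entry_mask = (65536 - 1) & ~(8 - 1)            = 0xfff8
 *
 * so a hash signature always indexes a bucket-aligned slot inside pkt[].
 */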
+
+#define	IPV4_FRAG_TBL_POS(tbl, sig)	\
+	((tbl)->pkt + ((sig) & (tbl)->entry_mask))
+
+#define	IPV4_FRAG_HASH_FNUM	2
+
+#ifdef IPV4_FRAG_TBL_STAT
+#define	IPV4_FRAG_TBL_STAT_UPDATE(s, f, v)	((s)->f += (v))
+#else
+#define	IPV4_FRAG_TBL_STAT_UPDATE(s, f, v)	do {} while (0)
+#endif /* IPV4_FRAG_TBL_STAT */
+
+static inline void
+ipv4_frag_hash(const struct ipv4_frag_key *key, uint32_t *v1, uint32_t *v2)
+{
+	uint32_t v;
+	const uint32_t *p;
+
+	p = (const uint32_t *)&key->src_dst;
+
+#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
+	v = rte_hash_crc_4byte(p[0], PRIME_VALUE);
+	v = rte_hash_crc_4byte(p[1], v);
+	v = rte_hash_crc_4byte(key->id, v);
+#else
+
+	v = rte_jhash_3words(p[0], p[1], key->id, PRIME_VALUE);
+#endif /* RTE_MACHINE_CPUFLAG_SSE4_2 */
+
+	*v1 = v;
+	*v2 = (v << 7) + (v >> 14);
+}
+
+/*
+ * Update the table, after we finish processing its entry.
+ */
+static inline void
+ipv4_frag_inuse(struct ipv4_frag_tbl *tbl, const struct ipv4_frag_pkt *fp)
+{
+	if (IPV4_FRAG_KEY_EMPTY(&fp->key)) {
+		TAILQ_REMOVE(&tbl->lru, fp, lru);
+		tbl->use_entries--;
+	}
+}
+
+/*
+ * For the given key, try to find an existing entry.
+ * If such an entry doesn't exist, return a free and/or timed-out entry
+ * that can be used for that key.
+ */
+static inline struct ipv4_frag_pkt *
+ipv4_frag_lookup(struct ipv4_frag_tbl *tbl,
+	const struct ipv4_frag_key *key, uint64_t tms,
+	struct ipv4_frag_pkt **free, struct ipv4_frag_pkt **stale)
+{
+	struct ipv4_frag_pkt *p1, *p2;
+	struct ipv4_frag_pkt *empty, *old;
+	uint64_t max_cycles;
+	uint32_t i, assoc, sig1, sig2;
+
+	empty = NULL;
+	old = NULL;
+
+	max_cycles = tbl->max_cycles;
+	assoc = tbl->bucket_entries;
+
+	if (tbl->last != NULL && IPV4_FRAG_KEY_CMP(&tbl->last->key, key) == 0)
+		return tbl->last;
+
+	ipv4_frag_hash(key, &sig1, &sig2);
+	p1 = IPV4_FRAG_TBL_POS(tbl, sig1);
+	p2 = IPV4_FRAG_TBL_POS(tbl, sig2);
+
+	for (i = 0; i != assoc; i++) {
+
+		IPV4_FRAG_LOG(DEBUG, "%s:%d:\n"
+			"tbl: %p, max_entries: %u, use_entries: %u\n"
+			"ipv4_frag_pkt line0: %p, index: %u from %u\n"
+			"key: <%" PRIx64 ", %#x>, start: %" PRIu64 "\n",
+			__func__, __LINE__,
+			tbl, tbl->max_entries, tbl->use_entries,
+			p1, i, assoc,
+			p1[i].key.src_dst, p1[i].key.id, p1[i].start);
+
+		if (IPV4_FRAG_KEY_CMP(&p1[i].key, key) == 0)
+			return (p1 + i);
+		else if (IPV4_FRAG_KEY_EMPTY(&p1[i].key))
+			empty = (empty == NULL) ? (p1 + i) : empty;
+		else if (max_cycles + p1[i].start < tms)
+			old = (old == NULL) ? (p1 + i) : old;
+
+		IPV4_FRAG_LOG(DEBUG, "%s:%d:\n"
+			"tbl: %p, max_entries: %u, use_entries: %u\n"
+			"ipv4_frag_pkt line1: %p, index: %u from %u\n"
+			"key: <%" PRIx64 ", %#x>, start: %" PRIu64 "\n",
+			__func__, __LINE__,
+			tbl, tbl->max_entries, tbl->use_entries,
+			p2, i, assoc,
+			p2[i].key.src_dst, p2[i].key.id, p2[i].start);
+
+		if (IPV4_FRAG_KEY_CMP(&p2[i].key, key) == 0)
+			return (p2 + i);
+		else if (IPV4_FRAG_KEY_EMPTY(&p2[i].key))
+			empty = (empty == NULL) ? (p2 + i) : empty;
+		else if (max_cycles + p2[i].start < tms)
+			old = (old == NULL) ? (p2 + i) : old;
+	}
+
+	*free = empty;
+	*stale = old;
+	return NULL;
+}
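/*
 * Illustrative sketch (hypothetical helper, not part of the patch): where a
 * key can live. Each key hashes to two signatures; each signature selects
 * one bucket of tbl->bucket_entries consecutive slots, so a key has at most
 * 2 * bucket_entries possible locations.
 */
static inline void
example_candidate_buckets(struct ipv4_frag_tbl *tbl,
	const struct ipv4_frag_key *key,
	struct ipv4_frag_pkt **b1, struct ipv4_frag_pkt **b2)
{
	uint32_t sig1, sig2;

	ipv4_frag_hash(key, &sig1, &sig2);
	*b1 = IPV4_FRAG_TBL_POS(tbl, sig1);	/* first candidate bucket */
	*b2 = IPV4_FRAG_TBL_POS(tbl, sig2);	/* second candidate bucket */
}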
+
+static inline void
+ipv4_frag_tbl_del(struct ipv4_frag_tbl *tbl, struct ipv4_frag_death_row *dr,
+	struct ipv4_frag_pkt *fp)
+{
+	ipv4_frag_free(fp, dr);
+	IPV4_FRAG_KEY_INVALIDATE(&fp->key);
+	TAILQ_REMOVE(&tbl->lru, fp, lru);
+	tbl->use_entries--;
+	IPV4_FRAG_TBL_STAT_UPDATE(&tbl->stat, del_num, 1);
+}
+
+static inline void
+ipv4_frag_tbl_add(struct ipv4_frag_tbl *tbl, struct ipv4_frag_pkt *fp,
+	const struct ipv4_frag_key *key, uint64_t tms)
+{
+	fp->key = key[0];
+	ipv4_frag_reset(fp, tms);
+	TAILQ_INSERT_TAIL(&tbl->lru, fp, lru);
+	tbl->use_entries++;
+	IPV4_FRAG_TBL_STAT_UPDATE(&tbl->stat, add_num, 1);
+}
+
+static inline void
+ipv4_frag_tbl_reuse(struct ipv4_frag_tbl *tbl, struct ipv4_frag_death_row *dr,
+	struct ipv4_frag_pkt *fp, uint64_t tms)
+{
+	ipv4_frag_free(fp, dr);
+	ipv4_frag_reset(fp, tms);
+	TAILQ_REMOVE(&tbl->lru, fp, lru);
+	TAILQ_INSERT_TAIL(&tbl->lru, fp, lru);
+	IPV4_FRAG_TBL_STAT_UPDATE(&tbl->stat, reuse_num, 1);
+}
+
+/*
+ * Find an entry in the table for the corresponding fragment.
+ * If such an entry is not present, then allocate a new one.
+ * If the entry is stale, then free and reuse it.
+ */
+static inline struct ipv4_frag_pkt *
+ipv4_frag_find(struct ipv4_frag_tbl *tbl, struct ipv4_frag_death_row *dr,
+	const struct ipv4_frag_key *key, uint64_t tms)
+{
+	struct ipv4_frag_pkt *pkt, *free, *stale, *lru;
+	uint64_t max_cycles;
+
+	/*
+	 * The two assignments below are redundant;
+	 * they are here just to make gcc 4.6 happy.
+	 */
+	free = NULL;
+	stale = NULL;
+	max_cycles = tbl->max_cycles;
+
+	IPV4_FRAG_TBL_STAT_UPDATE(&tbl->stat, find_num, 1);
+
+	pkt = ipv4_frag_lookup(tbl, key, tms, &free, &stale);
+	if (pkt == NULL) {
+
+		/* timed-out entry, free and invalidate it. */
+		if (stale != NULL) {
+			ipv4_frag_tbl_del(tbl, dr, stale);
+			free = stale;
+
+		/*
+		 * we found a free entry, check if we can use it.
+		 * If we run out of free entries in the table, then
+		 * check if we have a timed-out entry to delete.
+		 */
+		} else if (free != NULL &&
+				tbl->max_entries <= tbl->use_entries) {
+			lru = TAILQ_FIRST(&tbl->lru);
+			if (max_cycles + lru->start < tms) {
+				ipv4_frag_tbl_del(tbl, dr, lru);
+			} else {
+				free = NULL;
+				IPV4_FRAG_TBL_STAT_UPDATE(&tbl->stat,
+					fail_nospace, 1);
+			}
+		}
+
+		/* found a free entry to reuse. */
+		if (free != NULL) {
+			ipv4_frag_tbl_add(tbl, free, key, tms);
+			pkt = free;
+		}
+
+	/*
+	 * we found the flow, but it is already timed out,
+	 * so free associated resources, reposition it in the LRU list,
+	 * and reuse it.
+	 */
+	} else if (max_cycles + pkt->start < tms) {
+		ipv4_frag_tbl_reuse(tbl, dr, pkt, tms);
+	}
+
+	IPV4_FRAG_TBL_STAT_UPDATE(&tbl->stat, fail_total, (pkt == NULL));
+
+	tbl->last = pkt;
+	return pkt;
+}
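/*
 * Illustrative sketch (hypothetical helper, not part of the patch):
 * converting a wall-clock TTL into the max_cycles value compared against
 * entry start times above. rte_get_tsc_hz() gives TSC cycles per second,
 * so a timeout expressed in milliseconds becomes:
 */
static inline uint64_t
example_ttl_cycles(uint32_t timeout_ms)
{
	/* cycles per ms (rounded up), times the timeout */
	return (rte_get_tsc_hz() + MS_PER_S - 1) / MS_PER_S * timeout_ms;
}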
+
+/*
+ * Create a new IPv4 fragment table.
+ * @param bucket_num
+ *	Number of buckets in the hash table.
+ * @param bucket_entries
+ *	Number of entries per bucket (i.e. hash associativity).
+ *	Should be a power of two.
+ * @param max_entries
+ *	Maximum number of entries that can be stored in the table.
+ *	The value should be less than or equal to bucket_num * bucket_entries.
+ * @param max_cycles
+ *	Maximum TTL in cycles for each fragmented packet.
+ * @param socket_id
+ *	The *socket_id* argument is the socket identifier in the case of
+ *	NUMA. The value can be *SOCKET_ID_ANY* if there are no NUMA
+ *	constraints.
+ * @return
+ *	Pointer to the newly allocated fragment table on success, NULL on
+ *	error.
+ */
+static struct ipv4_frag_tbl *
+ipv4_frag_tbl_create(uint32_t bucket_num, uint32_t bucket_entries,
+	uint32_t max_entries, uint64_t max_cycles, int socket_id)
+{
+	struct ipv4_frag_tbl *tbl;
+	size_t sz;
+	uint64_t nb_entries;
+
+	nb_entries = rte_align32pow2(bucket_num);
+	nb_entries *= bucket_entries;
+	nb_entries *= IPV4_FRAG_HASH_FNUM;
+
+	/* check input parameters. */
+	if (rte_is_power_of_2(bucket_entries) == 0 ||
+			nb_entries > UINT32_MAX || nb_entries == 0 ||
+			nb_entries < max_entries) {
+		RTE_LOG(ERR, USER1, "%s: invalid input parameter\n", __func__);
+		return NULL;
+	}
+
+	sz = sizeof(*tbl) + nb_entries * sizeof(tbl->pkt[0]);
+	tbl = rte_zmalloc_socket(__func__, sz, CACHE_LINE_SIZE, socket_id);
+	if (tbl == NULL) {
+		RTE_LOG(ERR, USER1,
+			"%s: allocation of %zu bytes at socket %d failed\n",
+			__func__, sz, socket_id);
+		return NULL;
+	}
+
+	RTE_LOG(INFO, USER1, "%s: allocated %zu bytes at socket %d\n",
+		__func__, sz, socket_id);
+
+	tbl->max_cycles = max_cycles;
+	tbl->max_entries = max_entries;
+	tbl->nb_entries = (uint32_t)nb_entries;
+	tbl->nb_buckets = bucket_num;
+	tbl->bucket_entries = bucket_entries;
+	tbl->entry_mask = (tbl->nb_entries - 1) & ~(tbl->bucket_entries - 1);
+
+	TAILQ_INIT(&(tbl->lru));
+	return tbl;
+}
+
+static inline void
+ipv4_frag_tbl_destroy(struct ipv4_frag_tbl *tbl)
+{
+	rte_free(tbl);
+}
+
+#if 0
+
+static void
+ipv4_frag_tbl_dump_stat(FILE *f, const struct ipv4_frag_tbl *tbl)
+{
+	uint64_t fail_total, fail_nospace;
+
+	fail_total = tbl->stat.fail_total;
+	fail_nospace = tbl->stat.fail_nospace;
+
+	fprintf(f, "max entries:\t%u;\n"
+		"entries in use:\t%u;\n"
+		"finds/inserts:\t%" PRIu64 ";\n"
+		"entries added:\t%" PRIu64 ";\n"
+		"entries deleted by timeout:\t%" PRIu64 ";\n"
+		"entries reused by timeout:\t%" PRIu64 ";\n"
+		"total add failures:\t%" PRIu64 ";\n"
+		"add no-space failures:\t%" PRIu64 ";\n"
+		"add hash-collision failures:\t%" PRIu64 ";\n",
+		tbl->max_entries,
+		tbl->use_entries,
+		tbl->stat.find_num,
+		tbl->stat.add_num,
+		tbl->stat.del_num,
+		tbl->stat.reuse_num,
+		fail_total,
+		fail_nospace,
+		fail_total - fail_nospace);
+}
+
+#endif
+
+#endif /* _IPV4_FRAG_TBL_H_ */
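For reference, a minimal create/destroy sequence under the documented
constraints (bucket_entries a power of two, max_entries <= bucket_num *
bucket_entries); the numbers and the 100 ms TTL are illustrative, not part
of the patch:

	struct ipv4_frag_tbl *tbl;
	uint64_t frag_cycles = (rte_get_tsc_hz() + MS_PER_S - 1) /
		MS_PER_S * 100;	/* ~100 ms TTL in TSC cycles */

	tbl = ipv4_frag_tbl_create(1024, 8, 1024 * 8, frag_cycles,
		SOCKET_ID_ANY);
	if (tbl == NULL)
		rte_panic("cannot create fragment table\n");
	/* ... use the table via ipv4_frag_find()/ipv4_frag_mbuf() ... */
	ipv4_frag_tbl_destroy(tbl);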
diff --git a/lib/librte_port/ipv4_rsmbl.h b/lib/librte_port/ipv4_rsmbl.h
new file mode 100644
index 0000000..f6cf963
--- /dev/null
+++ b/lib/librte_port/ipv4_rsmbl.h
@@ -0,0 +1,429 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _IPV4_RSMBL_H_
+#define _IPV4_RSMBL_H_
+
+#include <rte_ip.h>
+
+/**
+ * @file
+ * IPv4 reassembly
+ *
+ * Implementation of IPv4 reassembly.
+ *
+ */
+
+#define	MAX_PKT_BURST	64
+
+enum {
+	LAST_FRAG_IDX,
+	FIRST_FRAG_IDX,
+	MIN_FRAG_NUM,
+	MAX_FRAG_NUM = 4,
+};
+
+struct ipv4_frag {
+	uint16_t ofs;
+	uint16_t len;
+	struct rte_mbuf *mb;
+};
+
+/*
+ * Used to uniquely identify a fragmented datagram.
+ */
+struct ipv4_frag_key {
+	uint64_t src_dst;
+	uint32_t id;
+};
+
+#define	IPV4_FRAG_KEY_INVALIDATE(k)	((k)->src_dst = 0)
+#define	IPV4_FRAG_KEY_EMPTY(k)		((k)->src_dst == 0)
+
+#define	IPV4_FRAG_KEY_CMP(k1, k2)	\
+	(((k1)->src_dst ^ (k2)->src_dst) | ((k1)->id ^ (k2)->id))
+
+
+/*
+ * Fragmented packet to reassemble.
+ * First two entries in the frags[] array are for the last and first
+ * fragments.
+ */
+struct ipv4_frag_pkt {
+	TAILQ_ENTRY(ipv4_frag_pkt) lru;	/* LRU list */
+	struct ipv4_frag_key key;
+	uint64_t start;		/* creation timestamp */
+	uint32_t total_size;	/* expected reassembled size */
+	uint32_t frag_size;	/* size of fragments received */
+	uint32_t last_idx;	/* index of next entry to fill */
+	struct ipv4_frag frags[MAX_FRAG_NUM];
+} __rte_cache_aligned;
+
+
+struct ipv4_frag_death_row {
+	uint32_t cnt;
+	struct rte_mbuf *row[MAX_PKT_BURST * (MAX_FRAG_NUM + 1)];
+};
+
+#define	IPV4_FRAG_MBUF2DR(dr, mb)	((dr)->row[(dr)->cnt++] = (mb))
+
+/* logging macros. */
+
+#ifdef IPV4_FRAG_DEBUG
+#define	IPV4_FRAG_LOG(lvl, fmt, args...)	RTE_LOG(lvl, USER1, fmt, ##args)
+#else
+#define	IPV4_FRAG_LOG(lvl, fmt, args...)	do {} while (0)
+#endif /* IPV4_FRAG_DEBUG */
+
+
+static inline void
+ipv4_frag_reset(struct ipv4_frag_pkt *fp, uint64_t tms)
+{
+	static const struct ipv4_frag zero_frag = {
+		.ofs = 0,
+		.len = 0,
+		.mb = NULL,
+	};
+
+	fp->start = tms;
+	fp->total_size = UINT32_MAX;
+	fp->frag_size = 0;
+	fp->last_idx = MIN_FRAG_NUM;
+	fp->frags[LAST_FRAG_IDX] = zero_frag;
+	fp->frags[FIRST_FRAG_IDX] = zero_frag;
+}
+
+static inline void
+ipv4_frag_free(struct ipv4_frag_pkt *fp, struct ipv4_frag_death_row *dr)
+{
+	uint32_t i, k;
+
+	k = dr->cnt;
+	for (i = 0; i != fp->last_idx; i++) {
+		if (fp->frags[i].mb != NULL) {
+			dr->row[k++] = fp->frags[i].mb;
+			fp->frags[i].mb = NULL;
+		}
+	}
+
+	fp->last_idx = 0;
+	dr->cnt = k;
+}
+
+static inline void
+ipv4_frag_free_death_row(struct ipv4_frag_death_row *dr, uint32_t prefetch)
+{
+	uint32_t i, k, n;
+
+	k = RTE_MIN(prefetch, dr->cnt);
+	n = dr->cnt;
+
+	for (i = 0; i != k; i++)
+		rte_prefetch0(dr->row[i]);
+
+	for (i = 0; i != n - k; i++) {
+		rte_prefetch0(dr->row[i + k]);
+		rte_pktmbuf_free(dr->row[i]);
+	}
+
+	for (; i != n; i++)
+		rte_pktmbuf_free(dr->row[i]);
+
+	dr->cnt = 0;
+}
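/*
 * Illustrative sketch (hypothetical helper, not part of the patch): the
 * death row defers and batches mbuf frees. Table/reassembly code drops
 * mbufs onto the row (e.g. via IPV4_FRAG_MBUF2DR or ipv4_frag_free()),
 * and the caller flushes it once per burst:
 */
static inline void
example_flush_after_burst(struct ipv4_frag_death_row *dr)
{
	/* free everything queued so far, prefetching 3 mbufs ahead,
	 * the same prefetch depth rte_port_ras.c uses below */
	ipv4_frag_free_death_row(dr, 3);
}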
+
+/*
+ * Helper function.
+ * Takes 2 mbufs that represent two fragments of the same packet and
+ * chains them into one mbuf.
+ */
+static inline void
+ipv4_frag_chain(struct rte_mbuf *mn, struct rte_mbuf *mp)
+{
+	struct rte_mbuf *ms;
+
+	/* adjust start of the last fragment data. */
+	rte_pktmbuf_adj(mp, (uint16_t)(mp->pkt.vlan_macip.f.l2_len +
+		mp->pkt.vlan_macip.f.l3_len));
+
+	/* chain two fragments. */
+	ms = rte_pktmbuf_lastseg(mn);
+	ms->pkt.next = mp;
+
+	/* accumulate number of segments and total length. */
+	mn->pkt.nb_segs = (uint8_t)(mn->pkt.nb_segs + mp->pkt.nb_segs);
+	mn->pkt.pkt_len += mp->pkt.pkt_len;
+
+	/* reset pkt_len and nb_segs for chained fragment. */
+	mp->pkt.pkt_len = mp->pkt.data_len;
+	mp->pkt.nb_segs = 1;
+}
+
+/*
+ * Reassemble fragments into one packet.
+ */
+static inline struct rte_mbuf *
+ipv4_frag_reassemble(const struct ipv4_frag_pkt *fp)
+{
+	struct ipv4_hdr *ip_hdr;
+	struct rte_mbuf *m, *prev;
+	uint32_t i, n, ofs, first_len;
+
+	first_len = fp->frags[FIRST_FRAG_IDX].len;
+	n = fp->last_idx - 1;
+
+	/* start from the last fragment. */
+	m = fp->frags[LAST_FRAG_IDX].mb;
+	ofs = fp->frags[LAST_FRAG_IDX].ofs;
+
+	while (ofs != first_len) {
+
+		prev = m;
+
+		for (i = n; i != FIRST_FRAG_IDX && ofs != first_len; i--) {
+
+			/* previous fragment found. */
+			if (fp->frags[i].ofs + fp->frags[i].len == ofs) {
+
+				ipv4_frag_chain(fp->frags[i].mb, m);
+
+				/* update our last fragment and offset. */
+				m = fp->frags[i].mb;
+				ofs = fp->frags[i].ofs;
+			}
+		}
+
+		/* error - hole in the packet. */
+		if (m == prev)
+			return NULL;
+	}
+
+	/* chain with the first fragment. */
+	ipv4_frag_chain(fp->frags[FIRST_FRAG_IDX].mb, m);
+	m = fp->frags[FIRST_FRAG_IDX].mb;
+
+	/* update mbuf fields for reassembled packet. */
+	m->ol_flags |= PKT_TX_IP_CKSUM;
+
+	/* update ipv4 header for the reassembled packet */
+	ip_hdr = (struct ipv4_hdr *)(rte_pktmbuf_mtod(m, uint8_t *) +
+		m->pkt.vlan_macip.f.l2_len);
+
+	ip_hdr->total_length = rte_cpu_to_be_16((uint16_t)(fp->total_size +
+		m->pkt.vlan_macip.f.l3_len));
+	ip_hdr->fragment_offset = (uint16_t)(ip_hdr->fragment_offset &
+		rte_cpu_to_be_16(IPV4_HDR_DF_FLAG));
+	ip_hdr->hdr_checksum = 0;
+
+	return m;
+}
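/*
 * Illustrative note (not part of the patch): a worked example of the
 * backward walk above, for a 4000-byte payload split into fragments of
 * 1000 bytes each (so first_len = 1000; offsets are already in bytes):
 *
 *   start:  m = last fragment,                ofs = 3000
 *   pass 1: chain frag with 2000 + 1000 == 3000, ofs becomes 2000
 *   pass 2: chain frag with 1000 + 1000 == 2000, ofs becomes 1000
 *   done:   ofs == first_len, chain the first fragment in front
 *
 * If no fragment ends exactly at the current ofs, m == prev after the
 * inner loop: the datagram has a hole and reassembly fails with NULL.
 */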
+
+static inline struct rte_mbuf *
+ipv4_frag_process(struct ipv4_frag_pkt *fp, struct ipv4_frag_death_row *dr,
+	struct rte_mbuf *mb, uint16_t ofs, uint16_t len, uint16_t more_frags)
+{
+	uint32_t idx;
+
+	fp->frag_size += len;
+
+	/* this is the first fragment. */
+	if (ofs == 0) {
+		idx = (fp->frags[FIRST_FRAG_IDX].mb == NULL) ?
+			FIRST_FRAG_IDX : UINT32_MAX;
+
+	/* this is the last fragment. */
+	} else if (more_frags == 0) {
+		fp->total_size = ofs + len;
+		idx = (fp->frags[LAST_FRAG_IDX].mb == NULL) ?
+			LAST_FRAG_IDX : UINT32_MAX;
+
+	/* this is an intermediate fragment. */
+	} else {
+		idx = fp->last_idx;
+		if (idx < sizeof(fp->frags) / sizeof(fp->frags[0]))
+			fp->last_idx++;
+	}
+
+	/*
+	 * erroneous packet: either exceeded the max allowed number of
+	 * fragments, or duplicate first/last fragment encountered.
+	 */
+	if (idx >= sizeof(fp->frags) / sizeof(fp->frags[0])) {
+
+		/* report an error. */
+		IPV4_FRAG_LOG(DEBUG, "%s:%d invalid fragmented packet:\n"
+			"ipv4_frag_pkt: %p, key: <%" PRIx64 ", %#x>, "
+			"total_size: %u, frag_size: %u, last_idx: %u\n"
+			"first fragment: ofs: %u, len: %u\n"
+			"last fragment: ofs: %u, len: %u\n\n",
+			__func__, __LINE__,
+			fp, fp->key.src_dst, fp->key.id,
+			fp->total_size, fp->frag_size, fp->last_idx,
+			fp->frags[FIRST_FRAG_IDX].ofs,
+			fp->frags[FIRST_FRAG_IDX].len,
+			fp->frags[LAST_FRAG_IDX].ofs,
+			fp->frags[LAST_FRAG_IDX].len);
+
+		/* free all fragments, invalidate the entry. */
+		ipv4_frag_free(fp, dr);
+		IPV4_FRAG_KEY_INVALIDATE(&fp->key);
+		IPV4_FRAG_MBUF2DR(dr, mb);
+
+		return NULL;
+	}
+
+	fp->frags[idx].ofs = ofs;
+	fp->frags[idx].len = len;
+	fp->frags[idx].mb = mb;
+
+	mb = NULL;
+
+	/* not all fragments are collected yet. */
+	if (likely(fp->frag_size < fp->total_size)) {
+		return mb;
+
+	/* if we collected all fragments, then try to reassemble. */
+	} else if (fp->frag_size == fp->total_size &&
+			fp->frags[FIRST_FRAG_IDX].mb != NULL) {
+		mb = ipv4_frag_reassemble(fp);
+	}
+
+	/* erroneous set of fragments. */
+	if (mb == NULL) {
+
+		/* report an error. */
+		IPV4_FRAG_LOG(DEBUG, "%s:%d invalid fragmented packet:\n"
+			"ipv4_frag_pkt: %p, key: <%" PRIx64 ", %#x>, "
+			"total_size: %u, frag_size: %u, last_idx: %u\n"
+			"first fragment: ofs: %u, len: %u\n"
+			"last fragment: ofs: %u, len: %u\n\n",
+			__func__, __LINE__,
+			fp, fp->key.src_dst, fp->key.id,
+			fp->total_size, fp->frag_size, fp->last_idx,
+			fp->frags[FIRST_FRAG_IDX].ofs,
+			fp->frags[FIRST_FRAG_IDX].len,
+			fp->frags[LAST_FRAG_IDX].ofs,
+			fp->frags[LAST_FRAG_IDX].len);
+
+		/* free associated resources. */
+		ipv4_frag_free(fp, dr);
+	}
+
+	/* we are done with that entry, invalidate it. */
+	IPV4_FRAG_KEY_INVALIDATE(&fp->key);
+	return mb;
+}
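/*
 * Illustrative note (not part of the patch): slot assignment above in
 * brief. frags[] has MAX_FRAG_NUM == 4 slots: offset 0 goes to
 * FIRST_FRAG_IDX, an MF == 0 fragment goes to LAST_FRAG_IDX (and fixes
 * total_size), and everything else fills slots from MIN_FRAG_NUM upward.
 * A duplicate first/last fragment or a fifth fragment yields
 * idx == UINT32_MAX and the whole entry is dropped; i.e. at most two
 * intermediate fragments per datagram are supported.
 */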
+
+#include "ipv4_frag_tbl.h"
+
+/*
+ * Process a new mbuf with a fragment of an IPv4 packet.
+ * The incoming mbuf should have its l2_len/l3_len fields set up correctly.
+ * @param tbl
+ *	Table where to lookup/add the fragmented packet.
+ * @param mb
+ *	Incoming mbuf with the IPv4 fragment.
+ * @param tms
+ *	Fragment arrival timestamp.
+ * @param ip_hdr
+ *	Pointer to the IPv4 header inside the fragment.
+ * @param ip_ofs
+ *	Fragment's offset (as extracted from the header).
+ * @param ip_flag
+ *	Fragment's MF flag.
+ * @return
+ *	Pointer to the mbuf for the reassembled packet, or NULL if:
+ *	- an error occurred.
+ *	- not all fragments of the packet have been collected yet.
+ */
+static inline struct rte_mbuf *
+ipv4_frag_mbuf(struct ipv4_frag_tbl *tbl, struct ipv4_frag_death_row *dr,
+	struct rte_mbuf *mb, uint64_t tms, struct ipv4_hdr *ip_hdr,
+	uint16_t ip_ofs, uint16_t ip_flag)
+{
+	struct ipv4_frag_pkt *fp;
+	struct ipv4_frag_key key;
+	const uint64_t *psd;
+	uint16_t ip_len;
+
+	psd = (uint64_t *)&ip_hdr->src_addr;
+	key.src_dst = psd[0];
+	key.id = ip_hdr->packet_id;
+
+	ip_ofs *= IPV4_HDR_OFFSET_UNITS;
+	ip_len = (uint16_t)(rte_be_to_cpu_16(ip_hdr->total_length) -
+		mb->pkt.vlan_macip.f.l3_len);
+
+	IPV4_FRAG_LOG(DEBUG, "%s:%d:\n"
+		"mbuf: %p, tms: %" PRIu64
+		", key: <%" PRIx64 ", %#x>, ofs: %u, len: %u, flags: %#x\n"
+		"tbl: %p, max_cycles: %" PRIu64 ", entry_mask: %#x, "
+		"max_entries: %u, use_entries: %u\n\n",
+		__func__, __LINE__,
+		mb, tms, key.src_dst, key.id, ip_ofs, ip_len, ip_flag,
+		tbl, tbl->max_cycles, tbl->entry_mask, tbl->max_entries,
+		tbl->use_entries);
+
+	/* try to find/add an entry in the fragment table. */
+	fp = ipv4_frag_find(tbl, dr, &key, tms);
+	if (fp == NULL) {
+		IPV4_FRAG_MBUF2DR(dr, mb);
+		return NULL;
+	}
+
+	IPV4_FRAG_LOG(DEBUG, "%s:%d:\n"
+		"tbl: %p, max_entries: %u, use_entries: %u\n"
+		"ipv4_frag_pkt: %p, key: <%" PRIx64 ", %#x>, start: %" PRIu64
+		", total_size: %u, frag_size: %u, last_idx: %u\n\n",
+		__func__, __LINE__,
+		tbl, tbl->max_entries, tbl->use_entries,
+		fp, fp->key.src_dst, fp->key.id, fp->start,
+		fp->total_size, fp->frag_size, fp->last_idx);
+
+	/* process the fragmented packet. */
+	mb = ipv4_frag_process(fp, dr, mb, ip_ofs, ip_len, ip_flag);
+	ipv4_frag_inuse(tbl, fp);
+
+	IPV4_FRAG_LOG(DEBUG, "%s:%d:\n"
+		"mbuf: %p\n"
+		"tbl: %p, max_entries: %u, use_entries: %u\n"
+		"ipv4_frag_pkt: %p, key: <%" PRIx64 ", %#x>, start: %" PRIu64
+		", total_size: %u, frag_size: %u, last_idx: %u\n\n",
+		__func__, __LINE__, mb,
+		tbl, tbl->max_entries, tbl->use_entries,
+		fp, fp->key.src_dst, fp->key.id, fp->start,
+		fp->total_size, fp->frag_size, fp->last_idx);
+
+	return mb;
+}
+
+#endif /* _IPV4_RSMBL_H_ */
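A hedged sketch of the caller contract for ipv4_frag_mbuf() (illustrative
only; `mb`, `tbl`, `death_row` and `mo` are assumed to exist, and this
mirrors what process_one() in rte_port_ras.c below does): the mbuf must
arrive with l2_len/l3_len set, and the offset/MF flag are pre-extracted
from the header:

	struct ipv4_hdr *hdr = (struct ipv4_hdr *)rte_pktmbuf_mtod(mb, char *);
	uint16_t frag_field = rte_be_to_cpu_16(hdr->fragment_offset);
	uint16_t ofs = (uint16_t)(frag_field & IPV4_HDR_OFFSET_MASK);
	uint16_t mf = (uint16_t)(frag_field & IPV4_HDR_MF_FLAG);

	mo = ipv4_frag_mbuf(tbl, &death_row, mb, rte_rdtsc(), hdr, ofs, mf);
	/* mo != NULL: fully reassembled datagram; NULL: queued or dropped */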
diff --git a/lib/librte_port/rte_port_ras.c b/lib/librte_port/rte_port_ras.c
new file mode 100644
index 0000000..60f33b5
--- /dev/null
+++ b/lib/librte_port/rte_port_ras.c
@@ -0,0 +1,256 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+#include <string.h>
+
+#include <rte_memory.h>
+#include <rte_malloc.h>
+#include <rte_log.h>
+#include <rte_cycles.h>
+#include <rte_mbuf.h>
+#include <rte_ip.h>
+
+#include "rte_port_ras.h"
+#include "ipv4_rsmbl.h"
+
+#ifndef IPV4_RAS_N_BUCKETS
+#define IPV4_RAS_N_BUCKETS 4094
+#endif
+
+#ifndef IPV4_RAS_N_ENTRIES_PER_BUCKET
+#define IPV4_RAS_N_ENTRIES_PER_BUCKET 8
+#endif
+
+#ifndef IPV4_RAS_N_ENTRIES
+#define IPV4_RAS_N_ENTRIES (IPV4_RAS_N_BUCKETS * IPV4_RAS_N_ENTRIES_PER_BUCKET)
+#endif
+
+struct rte_port_ring_writer_ipv4_ras {
+	struct rte_mbuf *tx_buf[RTE_PORT_IN_BURST_SIZE_MAX];
+	struct rte_ring *ring;
+	uint32_t tx_burst_sz;
+	uint32_t tx_buf_count;
+	struct ipv4_frag_tbl *frag_tbl;
+	struct ipv4_frag_death_row death_row;
+};
+
+static void *
+rte_port_ring_writer_ipv4_ras_create(void *params, int socket_id)
+{
+	struct rte_port_ring_writer_ipv4_ras_params *conf =
+		(struct rte_port_ring_writer_ipv4_ras_params *) params;
+	struct rte_port_ring_writer_ipv4_ras *port;
+	uint64_t frag_cycles;
+
+	/* Check input parameters */
+	if (conf == NULL) {
+		RTE_LOG(ERR, PORT, "%s: Parameter conf is NULL\n", __func__);
+		return NULL;
+	}
+	if (conf->ring == NULL) {
+		RTE_LOG(ERR, PORT, "%s: Parameter ring is NULL\n", __func__);
+		return NULL;
+	}
+	if ((conf->tx_burst_sz == 0) ||
+	    (conf->tx_burst_sz > RTE_PORT_IN_BURST_SIZE_MAX)) {
+		RTE_LOG(ERR, PORT, "%s: Parameter tx_burst_sz is invalid\n",
+			__func__);
+		return NULL;
+	}
+
+	/* Memory allocation */
+	port = rte_zmalloc_socket("PORT", sizeof(*port),
+		CACHE_LINE_SIZE, socket_id);
+	if (port == NULL) {
+		RTE_LOG(ERR, PORT, "%s: Failed to allocate port\n", __func__);
+		return NULL;
+	}
+
+	/* Create fragmentation table (entry TTL of roughly 100 seconds,
+	 * expressed in TSC cycles) */
+	frag_cycles = (rte_get_tsc_hz() + MS_PER_S - 1) / MS_PER_S * MS_PER_S;
+	frag_cycles *= 100;
+
+	port->frag_tbl = ipv4_frag_tbl_create(
+		IPV4_RAS_N_BUCKETS,
+		IPV4_RAS_N_ENTRIES_PER_BUCKET,
+		IPV4_RAS_N_ENTRIES,
+		frag_cycles,
+		socket_id);
+
+	if (port->frag_tbl == NULL) {
+		RTE_LOG(ERR, PORT, "%s: ipv4_frag_tbl_create failed\n",
+			__func__);
+		rte_free(port);
+		return NULL;
+	}
+
+	/* Initialization */
+	port->ring = conf->ring;
+	port->tx_burst_sz = conf->tx_burst_sz;
+	port->tx_buf_count = 0;
+
+	return port;
+}
+
+static inline void
+send_burst(struct rte_port_ring_writer_ipv4_ras *p)
+{
+	uint32_t nb_tx;
+
+	nb_tx = rte_ring_sp_enqueue_burst(p->ring, (void **)p->tx_buf,
+		p->tx_buf_count);
+
+	for ( ; nb_tx < p->tx_buf_count; nb_tx++)
+		rte_pktmbuf_free(p->tx_buf[nb_tx]);
+
+	p->tx_buf_count = 0;
+}
+
+static inline void
+process_one(struct rte_port_ring_writer_ipv4_ras *p, struct rte_mbuf *pkt)
+{
+	/* Assume there is no Ethernet header */
+	struct ipv4_hdr *pkt_hdr = (struct ipv4_hdr *)
+		(rte_pktmbuf_mtod(pkt, unsigned char *));
+
+	/* Get the "More fragments" flag and the fragment offset */
+	uint16_t frag_field = rte_be_to_cpu_16(pkt_hdr->fragment_offset);
+	uint16_t frag_offset = (uint16_t)(frag_field & IPV4_HDR_OFFSET_MASK);
+	uint16_t frag_flag = (uint16_t)(frag_field & IPV4_HDR_MF_FLAG);
+
+	/* Pass non-fragmented packets through; try to reassemble fragments */
+	if ((frag_flag == 0) && (frag_offset == 0))
+		p->tx_buf[p->tx_buf_count++] = pkt;
+	else {
+		struct rte_mbuf *mo;
+		struct ipv4_frag_tbl *tbl = p->frag_tbl;
+		struct ipv4_frag_death_row *dr = &p->death_row;
+
+		/* Process this fragment */
+		mo = ipv4_frag_mbuf(tbl, dr, pkt, rte_rdtsc(), pkt_hdr,
+			frag_offset, frag_flag);
+		if (mo != NULL)
+			p->tx_buf[p->tx_buf_count++] = mo;
+
+		ipv4_frag_free_death_row(&p->death_row, 3);
+	}
+}
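/*
 * Illustrative note (not part of the patch): the fragment_offset field
 * decoded above packs flags and offset into 16 bits, e.g. for
 * frag_field = 0x2002:
 *
 *   flags  = frag_field & IPV4_HDR_MF_FLAG      -> MF set (0x2000)
 *   offset = frag_field & IPV4_HDR_OFFSET_MASK  -> 2, i.e. 2 * 8 = 16 bytes
 *
 * (ipv4_frag_mbuf() performs the * IPV4_HDR_OFFSET_UNITS scaling.) A packet
 * is passed through unmodified only when both MF and offset are zero.
 */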
+
+static int
+rte_port_ring_writer_ipv4_ras_tx(void *port, struct rte_mbuf *pkt)
+{
+	struct rte_port_ring_writer_ipv4_ras *p =
+		(struct rte_port_ring_writer_ipv4_ras *) port;
+
+	process_one(p, pkt);
+	if (p->tx_buf_count >= p->tx_burst_sz)
+		send_burst(p);
+
+	return 0;
+}
+
+static int
+rte_port_ring_writer_ipv4_ras_tx_bulk(void *port,
+	struct rte_mbuf **pkts,
+	uint64_t pkts_mask)
+{
+	struct rte_port_ring_writer_ipv4_ras *p =
+		(struct rte_port_ring_writer_ipv4_ras *) port;
+
+	if ((pkts_mask & (pkts_mask + 1)) == 0) {
+		uint64_t n_pkts = __builtin_popcountll(pkts_mask);
+		uint32_t i;
+
+		for (i = 0; i < n_pkts; i++) {
+			struct rte_mbuf *pkt = pkts[i];
+
+			process_one(p, pkt);
+			if (p->tx_buf_count >= p->tx_burst_sz)
+				send_burst(p);
+		}
+	} else {
+		for ( ; pkts_mask; ) {
+			uint32_t pkt_index = __builtin_ctzll(pkts_mask);
+			uint64_t pkt_mask = 1LLU << pkt_index;
+			struct rte_mbuf *pkt = pkts[pkt_index];
+
+			process_one(p, pkt);
+			if (p->tx_buf_count >= p->tx_burst_sz)
+				send_burst(p);
+
+			pkts_mask &= ~pkt_mask;
+		}
+	}
+
+	return 0;
+}
+
+static int
+rte_port_ring_writer_ipv4_ras_flush(void *port)
+{
+	struct rte_port_ring_writer_ipv4_ras *p =
+		(struct rte_port_ring_writer_ipv4_ras *) port;
+
+	if (p->tx_buf_count > 0)
+		send_burst(p);
+
+	return 0;
+}
+
+static int
+rte_port_ring_writer_ipv4_ras_free(void *port)
+{
+	struct rte_port_ring_writer_ipv4_ras *p =
+		(struct rte_port_ring_writer_ipv4_ras *) port;
+
+	if (port == NULL) {
+		RTE_LOG(ERR, PORT, "%s: Parameter port is NULL\n", __func__);
+		return -1;
+	}
+
+	rte_port_ring_writer_ipv4_ras_flush(port);
+	ipv4_frag_tbl_destroy(p->frag_tbl);
+	rte_free(port);
+
+	return 0;
+}
+
+/*
+ * Summary of port operations
+ */
+struct rte_port_out_ops rte_port_ring_writer_ipv4_ras_ops = {
+	.f_create = rte_port_ring_writer_ipv4_ras_create,
+	.f_free = rte_port_ring_writer_ipv4_ras_free,
+	.f_tx = rte_port_ring_writer_ipv4_ras_tx,
+	.f_tx_bulk = rte_port_ring_writer_ipv4_ras_tx_bulk,
+	.f_flush = rte_port_ring_writer_ipv4_ras_flush,
+};
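A note on the pkts_mask convention in f_tx_bulk() above, with an
illustrative walk (the mask value is made up): bit i selects pkts[i]. A
contiguous mask such as 0x0f satisfies (pkts_mask & (pkts_mask + 1)) == 0
and takes the popcount fast path; a sparse mask is walked bit by bit:

	uint64_t mask = 0x15;			/* packets 0, 2 and 4 */

	while (mask != 0) {
		uint32_t idx = __builtin_ctzll(mask);	/* lowest set bit */
		/* ... process pkts[idx] ... */
		mask &= ~(1LLU << idx);
	}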
diff --git a/lib/librte_port/rte_port_ras.h b/lib/librte_port/rte_port_ras.h
new file mode 100644
index 0000000..c6ed688
--- /dev/null
+++ b/lib/librte_port/rte_port_ras.h
@@ -0,0 +1,83 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef __INCLUDE_RTE_PORT_RAS_H__
+#define __INCLUDE_RTE_PORT_RAS_H__
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * @file
+ * RTE Port for IPv4 Reassembly
+ *
+ * This port is built on top of a pre-initialized single producer rte_ring.
+ * In order to minimize the number of packets stored in the ring at any given
+ * time, the IP reassembly functionality is executed on ring write operation,
+ * hence this port is implemented as an output port. A regular ring_reader
+ * port can be created to read from the same ring.
+ *
+ * The packets written to the ring are either complete IP datagrams or IP
+ * fragments. The packets read from the ring are all complete IP datagrams,
+ * either jumbo frames (i.e. IP packets with length bigger than MTU) or not.
+ * The complete IP datagrams written to the ring are not changed. The IP
+ * fragments written to the ring are first reassembled into complete IP
+ * datagrams, or dropped on error or IP reassembly time-out.
+ *
+ */
+
+#include <stdint.h>
+
+#include <rte_ring.h>
+
+#include "rte_port.h"
+
+/** ring_writer_ipv4_ras port parameters */
+struct rte_port_ring_writer_ipv4_ras_params {
+	/** Underlying single producer ring that has to be pre-initialized. */
+	struct rte_ring *ring;
+
+	/** Recommended burst size for ring operations. The actual burst size
+	can be bigger or smaller than this value. */
+	uint32_t tx_burst_sz;
+};
+
+/** ring_writer_ipv4_ras port operations */
+extern struct rte_port_out_ops rte_port_ring_writer_ipv4_ras_ops;
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
-- 
1.7.7.6