From: Alex Kiselev <alex@therouter.net>
To: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>,
"dev@dpdk.org" <dev@dpdk.org>,
"Burakov, Anatoly" <anatoly.burakov@intel.com>
Subject: Re: [dpdk-dev] [PATCH v2 2/2] librte_ip_frag: add mbuf counter
Date: Wed, 18 Jul 2018 10:33:20 +0300 [thread overview]
Message-ID: <1208098598.20180718103320@therouter.net> (raw)
In-Reply-To: <2601191342CEEE43887BDE71AB977258DA90E731@irsmsx105.ger.corp.intel.com>
Hi Konstantin.
> Hi Alex,
> Sorry for delay in reply.
>> >> There might be situations (kind of attack when a lot of
>> >> fragmented packets are sent to a dpdk application in order
>> >> to flood the fragmentation table) when no additional mbufs
>> >> must be added to the fragmentations table since it already
>> >> contains to many of them. Currently there is no way to
>> >> determine the number of mbufs holded int the fragmentation
>> >> table. This patch allows to keep track of the number of mbufs
>> >> holded in the fragmentation table.
>> > I understand your intention, but still not sure it is worth it.
>> > My thought was that you can estimate by upper limit (num_entries * entries_per_bucket) or so.
>> No, I can't. The estimation error might be so big that there would be no difference at all.
> Not sure why? If you'll use upper limit, then worst thing could happen -
> you would start your table cleanup a bit earlier.
Since bucket size is 4, an estimation error might be 400%.
So, for example, if I want to setup the upper limit (max number mbufs that
can be stored in frag table) to 20% of all my available mbufs
I have to be ready that 80% of all mbufs might end up in a frag table
(every bucket is full). Or if I take into account bucket size, and devide 20%
by 4 in order the number mbufs to be exactly 20% in the worse case when every bucket is full,
I could end up in the opposite border situation when exactly single mbuf is stored
in every bucket, so upper limit of mbufs would be 20 / 4 = 5%. Both ways are not
good since either you have to reserve extra mbufs just to correct estimation error
or you upper limit would to small and you will be dropping good fragments.
>> > Probably another way to account number of mbufs without changes in the lib -
>> > apply something like that(assuming that your fragmets are not multisegs):
>> > uint32_t mbuf_in_frag_table = 0;
>> > ....
>> n= dr->>cnt;
>> > mb = rte_ipv4_frag_reassemble_packet(...);
>> > if (mb != NULL)
>> > mbuf_in_frag_table += mb->nb_segs;
>> > mbuf_in_frag_table += dr->cnt - n + 1;
> Sorry, my bad, I think it should be
> mbuf_in_frag_table -= dr->cnt - n + 1;
>> > In theory that could be applied even if fragments might be multisegs, but for that,
>> > we'll need to change rte_ip_frag_free_death_row() to return total number of freed segments.
>> That should be a little bit more complicated wrapper code:
>> uint32_t mbuf_in_frag_table = 0;
>> ....
>> n= dr->cnt;
>> reassembled_mbuf = rte_ipv4_frag_reassemble_packet(..., fragmented_mbuf, ...);
>> if (reassembled_mbuf == NULL)
>> mbuf_in_frag_table += fragmented_mbuf->nb_segs;
> We don't know for sure here.
> fragmented_mbuf could be in death row by now.
Yes. That's exactly why you have to keep track of
mbufs here and later after rte_ip_frag_free_death_row().
User have to think about frag table and death row as a single entity,
kind of a black box, since it's impossible to say where
(in the frag table or in the death row) your mbuf will be
after you call rte_ipv4_frag_reassemble_packet(). So, a caller/user should
keep track of mbuf on every border/interface of that black box.
One interface is rte_ipv4_frag_reassemble_packet and the other is
rte_ip_frag_free_death_row.
So, that's why it's easier to keep track of mbufs inside the library.
>> else
>> mbuf_in_frag_table -= reassembled_mbuf->nb_segs;
>> mbuf_in_frag_table += dr->cnt - n;
>> Also, in that case every rte_ip_frag_free_death_row() needs a wrapper code too.
>> n= dr->cnt;
>> rte_ip_frag_free_death_row(..)
>> mbuf_in_frag_table += dr->cnt - n;
> I don't think it is necessary.
> After packet is put in the death-row it is no longer in the table.
It's critical, since from a user point of view death row and frag table
is a black box due rte_ipv4_frag_reassemble_packet() doesn't indicate a caller
where his packet has been stored (in the frag table or death row).
> Konstantin
>> I think my approach is simplier.
>> > Konstantin
>> >> Signed-off-by: Alex Kiselev <alex@therouter.net>
>> >> ---
>> >> lib/librte_ip_frag/ip_frag_common.h | 16 +++++++++-------
>> >> lib/librte_ip_frag/ip_frag_internal.c | 16 +++++++++-------
>> >> lib/librte_ip_frag/rte_ip_frag.h | 18 +++++++++++++++++-
>> >> lib/librte_ip_frag/rte_ip_frag_common.c | 1 +
>> >> lib/librte_ip_frag/rte_ip_frag_version.map | 1 +
>> >> lib/librte_ip_frag/rte_ipv4_reassembly.c | 2 +-
>> >> lib/librte_ip_frag/rte_ipv6_reassembly.c | 2 +-
>> >> 7 files changed, 39 insertions(+), 17 deletions(-)
>> >> diff --git a/lib/librte_ip_frag/ip_frag_common.h b/lib/librte_ip_frag/ip_frag_common.h
>> >> index 0fdcc7d0f..9fe5c0559 100644
>> >> --- a/lib/librte_ip_frag/ip_frag_common.h
>> >> +++ b/lib/librte_ip_frag/ip_frag_common.h
>> >> @@ -32,15 +32,15 @@
>> >> #endif /* IP_FRAG_TBL_STAT */
>> >> /* internal functions declarations */
>> >> -struct rte_mbuf * ip_frag_process(struct ip_frag_pkt *fp,
>> >> - struct rte_ip_frag_death_row *dr, struct rte_mbuf *mb,
>> >> - uint16_t ofs, uint16_t len, uint16_t more_frags);
>> >> +struct rte_mbuf *ip_frag_process(struct rte_ip_frag_tbl *tbl,
>> >> + struct ip_frag_pkt *fp, struct rte_ip_frag_death_row *dr,
>> >> + struct rte_mbuf *mb, uint16_t ofs, uint16_t len, uint16_t more_frags);
>> >> -struct ip_frag_pkt * ip_frag_find(struct rte_ip_frag_tbl *tbl,
>> >> +struct ip_frag_pkt *ip_frag_find(struct rte_ip_frag_tbl *tbl,
>> >> struct rte_ip_frag_death_row *dr,
>> >> const struct ip_frag_key *key, uint64_t tms);
>> >> -struct ip_frag_pkt * ip_frag_lookup(struct rte_ip_frag_tbl *tbl,
>> >> +struct ip_frag_pkt *ip_frag_lookup(struct rte_ip_frag_tbl *tbl,
>> >> const struct ip_frag_key *key, uint64_t tms,
>> >> struct ip_frag_pkt **free, struct ip_frag_pkt **stale);
>> >> @@ -91,7 +91,8 @@ ip_frag_key_cmp(const struct ip_frag_key * k1, const struct ip_frag_key * k2)
>> >> /* put fragment on death row */
>> >> static inline void
>> >> -ip_frag_free(struct ip_frag_pkt *fp, struct rte_ip_frag_death_row *dr)
>> >> +ip_frag_free(struct rte_ip_frag_tbl *tbl, struct ip_frag_pkt *fp,
>> >> + struct rte_ip_frag_death_row *dr)
>> >> {
>> >> uint32_t i, k;
>> >> @@ -100,6 +101,7 @@ ip_frag_free(struct ip_frag_pkt *fp, struct rte_ip_frag_death_row *dr)
>> >> if (fp->frags[i].mb != NULL) {
>> >> dr->row[k++] = fp->frags[i].mb;
>> >> fp->frags[i].mb = NULL;
>> >> + tbl->nb_mbufs--;
>> >> }
>> >> }
>> >> @@ -160,7 +162,7 @@ static inline void
>> >> ip_frag_tbl_del(struct rte_ip_frag_tbl *tbl, struct rte_ip_frag_death_row *dr,
>> >> struct ip_frag_pkt *fp)
>> >> {
>> >> - ip_frag_free(fp, dr);
>> >> + ip_frag_free(tbl, fp, dr);
>> >> ip_frag_key_invalidate(&fp->key);
>> >> TAILQ_REMOVE(&tbl->lru, fp, lru);
>> >> tbl->use_entries--;
>> >> diff --git a/lib/librte_ip_frag/ip_frag_internal.c b/lib/librte_ip_frag/ip_frag_internal.c
>> >> index 97470a872..4c47d3fb4 100644
>> >> --- a/lib/librte_ip_frag/ip_frag_internal.c
>> >> +++ b/lib/librte_ip_frag/ip_frag_internal.c
>> >> @@ -29,14 +29,13 @@ static inline void
>> >> ip_frag_tbl_reuse(struct rte_ip_frag_tbl *tbl, struct rte_ip_frag_death_row *dr,
>> >> struct ip_frag_pkt *fp, uint64_t tms)
>> >> {
>> >> - ip_frag_free(fp, dr);
>> >> + ip_frag_free(tbl, fp, dr);
>> >> ip_frag_reset(fp, tms);
>> >> TAILQ_REMOVE(&tbl->lru, fp, lru);
>> >> TAILQ_INSERT_TAIL(&tbl->lru, fp, lru);
>> >> IP_FRAG_TBL_STAT_UPDATE(&tbl->stat, reuse_num, 1);
>> >> }
>> >> -
>> >> static inline void
>> >> ipv4_frag_hash(const struct ip_frag_key *key, uint32_t *v1, uint32_t *v2)
>> >> {
>> >> @@ -88,8 +87,9 @@ ipv6_frag_hash(const struct ip_frag_key *key, uint32_t *v1, uint32_t *v2)
>> >> }
>> >> struct rte_mbuf *
>> >> -ip_frag_process(struct ip_frag_pkt *fp, struct rte_ip_frag_death_row *dr,
>> >> - struct rte_mbuf *mb, uint16_t ofs, uint16_t len, uint16_t more_frags)
>> >> +ip_frag_process(struct rte_ip_frag_tbl *tbl, struct ip_frag_pkt *fp,
>> >> + struct rte_ip_frag_death_row *dr, struct rte_mbuf *mb, uint16_t ofs,
>> >> + uint16_t len, uint16_t more_frags)
>> >> {
>> >> uint32_t idx;
>> >> @@ -147,7 +147,7 @@ ip_frag_process(struct ip_frag_pkt *fp, struct rte_ip_frag_death_row *dr,
>> >> fp->frags[IP_LAST_FRAG_IDX].len);
>> >> /* free all fragments, invalidate the entry. */
>> >> - ip_frag_free(fp, dr);
>> >> + ip_frag_free(tbl, fp, dr);
>> >> ip_frag_key_invalidate(&fp->key);
>> >> IP_FRAG_MBUF2DR(dr, mb);
>> >> @@ -157,6 +157,7 @@ ip_frag_process(struct ip_frag_pkt *fp, struct rte_ip_frag_death_row *dr,
>> >> fp->frags[idx].ofs = ofs;
>> >> fp->frags[idx].len = len;
>> >> fp->frags[idx].mb = mb;
>> >> + tbl->nb_mbufs++;
>> >> mb = NULL;
>> >> @@ -205,8 +206,9 @@ ip_frag_process(struct ip_frag_pkt *fp, struct rte_ip_frag_death_row *dr,
>> >> fp->frags[IP_LAST_FRAG_IDX].len);
>> >> /* free associated resources. */
>> >> - ip_frag_free(fp, dr);
>> >> - }
>> >> + ip_frag_free(tbl, fp, dr);
>> >> + } else
>> >> + tbl->nb_mbufs -= fp->last_idx;
>> >> /* we are done with that entry, invalidate it. */
>> >> ip_frag_key_invalidate(&fp->key);
>> >> diff --git a/lib/librte_ip_frag/rte_ip_frag.h b/lib/librte_ip_frag/rte_ip_frag.h
>> >> index 7f425f610..623934d87 100644
>> >> --- a/lib/librte_ip_frag/rte_ip_frag.h
>> >> +++ b/lib/librte_ip_frag/rte_ip_frag.h
>> >> @@ -96,6 +96,7 @@ struct rte_ip_frag_tbl {
>> >> uint32_t bucket_entries; /**< hash associativity. */
>> >> uint32_t nb_entries; /**< total size of the table. */
>> >> uint32_t nb_buckets; /**< num of associativity lines. */
>> >> + uint32_t nb_mbufs; /**< num of mbufs holded in the tbl. */
>> >> struct ip_frag_pkt *last; /**< last used entry. */
>> >> struct ip_pkt_list lru; /**< LRU list for table entries. */
>> >> struct ip_frag_tbl_stat stat; /**< statistics counters. */
>> >> @@ -329,8 +330,23 @@ void
>> >> rte_ip_frag_table_statistics_dump(FILE * f, const struct rte_ip_frag_tbl *tbl);
>> >> /**
>> >> - * Delete expired fragments
>> >> + * Number of mbufs holded in the fragmentation table.
>> >> + *
>> >> + * @param tbl
>> >> + * Fragmentation table
>> >> *
>> >> + * @return
>> >> + * Number of mbufs holded in the fragmentation table.
>> >> + */
>> >> +static inline uint32_t __rte_experimental
>> >> +rte_frag_table_mbuf_count(const struct rte_ip_frag_tbl *tbl)
>> >> +{
>> >> + return tbl->nb_mbufs;
>> >> +}
>> >> +
>> >> +/**
>> >> + * Delete expired fragments
>> >> + *
>> >> * @param tbl
>> >> * Table to delete expired fragments from
>> >> * @param dr
>> >> diff --git a/lib/librte_ip_frag/rte_ip_frag_common.c b/lib/librte_ip_frag/rte_ip_frag_common.c
>> >> index a23f6f24f..46c2df84a 100644
>> >> --- a/lib/librte_ip_frag/rte_ip_frag_common.c
>> >> +++ b/lib/librte_ip_frag/rte_ip_frag_common.c
>> >> @@ -75,6 +75,7 @@ rte_ip_frag_table_create(uint32_t bucket_num, uint32_t bucket_entries,
>> >> tbl->nb_buckets = bucket_num;
>> >> tbl->bucket_entries = bucket_entries;
>> >> tbl->entry_mask = (tbl->nb_entries - 1) & ~(tbl->bucket_entries - 1);
>> >> + tbl->nb_mbufs = 0;
>> >> TAILQ_INIT(&(tbl->lru));
>> >> return tbl;
>> >> diff --git a/lib/librte_ip_frag/rte_ip_frag_version.map b/lib/librte_ip_frag/rte_ip_frag_version.map
>> >> index d40d5515f..f4700f460 100644
>> >> --- a/lib/librte_ip_frag/rte_ip_frag_version.map
>> >> +++ b/lib/librte_ip_frag/rte_ip_frag_version.map
>> >> @@ -23,4 +23,5 @@ EXPERIMENTAL {
>> >> global:
>> >> rte_frag_table_del_expired_entries;
>> >> + rte_frag_table_mbuf_count;
>> >> };
>> >> diff --git a/lib/librte_ip_frag/rte_ipv4_reassembly.c b/lib/librte_ip_frag/rte_ipv4_reassembly.c
>> >> index 4956b99ea..fbdfd860a 100644
>> >> --- a/lib/librte_ip_frag/rte_ipv4_reassembly.c
>> >> +++ b/lib/librte_ip_frag/rte_ipv4_reassembly.c
>> >> @@ -146,7 +146,7 @@ rte_ipv4_frag_reassemble_packet(struct rte_ip_frag_tbl *tbl,
>> >> /* process the fragmented packet. */
>> >> - mb = ip_frag_process(fp, dr, mb, ip_ofs, ip_len, ip_flag);
>> >> + mb = ip_frag_process(tbl, fp, dr, mb, ip_ofs, ip_len, ip_flag);
>> >> ip_frag_inuse(tbl, fp);
>> >> IP_FRAG_LOG(DEBUG, "%s:%d:\n"
>> >> diff --git a/lib/librte_ip_frag/rte_ipv6_reassembly.c b/lib/librte_ip_frag/rte_ipv6_reassembly.c
>> >> index db249fe60..dda5a57b7 100644
>> >> --- a/lib/librte_ip_frag/rte_ipv6_reassembly.c
>> >> +++ b/lib/librte_ip_frag/rte_ipv6_reassembly.c
>> >> @@ -186,7 +186,7 @@ rte_ipv6_frag_reassemble_packet(struct rte_ip_frag_tbl *tbl,
>> >> /* process the fragmented packet. */
>> >> - mb = ip_frag_process(fp, dr, mb, ip_ofs, ip_len,
>> >> + mb = ip_frag_process(tbl, fp, dr, mb, ip_ofs, ip_len,
>> >> MORE_FRAGS(frag_hdr->frag_data));
>> >> ip_frag_inuse(tbl, fp);
>> >> --
>> >> 2.16.1.windows.1
>> --
>> Alex
--
Alex
next prev parent reply other threads:[~2018-07-18 7:33 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <bdf25067-f7c2-4e04-a850-af9fb2ae0f2f@orsmsx104.amr.corp.intel.com>
2018-06-29 16:38 ` Ananyev, Konstantin
2018-06-29 17:46 ` Alex Kiselev
2018-07-17 14:07 ` Ananyev, Konstantin
2018-07-18 7:33 ` Alex Kiselev [this message]
2018-08-22 9:47 ` Alex Kiselev
2018-08-24 12:07 ` Ananyev, Konstantin
2018-06-04 10:13 Alex Kiselev
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1208098598.20180718103320@therouter.net \
--to=alex@therouter.net \
--cc=anatoly.burakov@intel.com \
--cc=dev@dpdk.org \
--cc=konstantin.ananyev@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).