From: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
To: Stephen Hemminger <stephen@networkplumber.org>,
Ilya Matveychikov <matvejchikov@gmail.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>, "Hu, Jiayu" <jiayu.hu@intel.com>
Subject: Re: [dpdk-dev] A question about GRO neighbor packet matching
Date: Thu, 7 Dec 2017 00:19:46 +0000
Message-ID: <2601191342CEEE43887BDE71AB9772585FAC57C1@irsmsx105.ger.corp.intel.com>
In-Reply-To: <20171206151532.3abaf2fb@xeon-e3>
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Stephen Hemminger
> Sent: Wednesday, December 6, 2017 11:16 PM
> To: Ilya Matveychikov <matvejchikov@gmail.com>
> Cc: dev@dpdk.org; Hu, Jiayu <jiayu.hu@intel.com>
> Subject: Re: [dpdk-dev] A question about GRO neighbor packet matching
>
> On Wed, 6 Dec 2017 22:38:12 +0400
> Ilya Matveychikov <matvejchikov@gmail.com> wrote:
>
> > > On Dec 6, 2017, at 10:12 PM, Stephen Hemminger <stephen@networkplumber.org> wrote:
> > >
> > > On Wed, 6 Dec 2017 18:02:21 +0400
> > > Ilya Matveychikov <matvejchikov@gmail.com> wrote:
> > >
> > >> Hello all,
> > >>
> > >>
> > >> My question is about neighbor packet matching algorithm for TCP. Is it
> > >> correct to expect that IP packets should have continuous ID enumeration
> > >> (i.e. iph-next.id = iph-prev.id + 1)?
> > >
> > >
> > > No.
> > >
> > >> ~~~
> > >> lib/librte_gro/gro_tcp4.c:check_seq_option()
> > >> ...
> > >> /* check if the two packets are neighbors */
> > >> tcp_dl0 = pkt0->pkt_len - pkt0->l2_len - pkt0->l3_len - tcp_hl0;
> > >> if ((sent_seq == (item->sent_seq + tcp_dl0)) &&
> > >> (ip_id == (item->ip_id + 1)))
> > >> /* append the new packet */
> > >> return 1;
> > >> else if (((sent_seq + tcp_dl) == item->sent_seq) &&
> > >> ((ip_id + item->nb_merged) == item->ip_id))
> > >> /* pre-pend the new packet */
> > >> return -1;
> > >> else
> > >> return 0;
> > >> ~~~
> > >>
> > >> As per RFC791:
> > >>
> > >> Identification: 16 bits
> > >>
> > >> An identifying value assigned by the sender to aid in assembling the
> > >> fragments of a datagram.
> > >
> > > The IP header id is meaningless in most TCP sessions.
> > > Good TCP implementations use PMTU discovery which sets the Don't Fragment bit.
> > > With DF, the IP id is unused (since no fragmentation).
> > > Many implementations just send 0 since generating unique IP id requires an
> > > atomic operation which is potential bottleneck.
> >
> > So, is my question correct and the code is wrong?
> >
>
> Yes. This code is wrong on several areas.
> * The ip_id on TCP flows is irrelevant.
> * packet should only be merged if TCP flags are the same.
>
>
> The author should look at Linux net/ipv4/tcp_offload.c
As I remember, the Linux GRO implementation *does* require the IP IDs
of the packets being merged to be consecutive.
net/ipv4/af_inet.c:

static struct sk_buff **inet_gro_receive(struct sk_buff **head,
                                         struct sk_buff *skb)
{
    ...
    id = ntohl(*(__be32 *)&iph->id);
    flush = (u16)((ntohl(*(__be32 *)iph) ^ skb_gro_len(skb)) | (id & ~IP_DF));
    id >>= 16;
    ...
        NAPI_GRO_CB(p)->flush_id =
            ((u16)(ntohs(iph2->id) + NAPI_GRO_CB(p)->count) ^ id);
        NAPI_GRO_CB(p)->flush |= flush;
    ....

And then at net/ipv4/tcp_offload.c:

struct sk_buff **tcp_gro_receive(struct sk_buff **head, struct sk_buff *skb)
{
    ...
    /* Include the IP ID check below from the inner most IP hdr */
    flush = NAPI_GRO_CB(p)->flush | NAPI_GRO_CB(p)->flush_id;
    ...
    if (flush || skb_gro_receive(head, skb)) {
    ...
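
To make the flush_id arithmetic above easier to follow, here is a minimal
standalone sketch of the same idea. It is not kernel code: the helper name
gro_flush_id() and its parameters are hypothetical, and plain uint16_t
values stand in for the fields read from the IP headers. The XOR is zero
only when the new packet's ID equals the held packet's ID plus the number
of segments already merged, taken modulo 2^16.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not a kernel API.
 * head_id: IP ID of the first packet already held for this flow
 * count:   number of segments merged into that held packet so far
 * new_id:  IP ID of the incoming packet
 * Returns 0 when new_id is the next consecutive ID, non-zero otherwise.
 */
static uint16_t gro_flush_id(uint16_t head_id, uint16_t count, uint16_t new_id)
{
    /* The cast keeps the sum modulo 2^16, so wrap-around is handled. */
    uint16_t expected = (uint16_t)(head_id + count);

    return (uint16_t)(expected ^ new_id);
}
```

With this sketch, a non-zero result plays the role of the flush condition
in the excerpt above: the packet is not merged.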
The reason we need to check that the IP IDs are consecutive is that the
DPDK GRO library doesn't strip off the IPv4 headers; instead it has to
merge them into one.
If the IP IDs were not consecutive, it would be unclear which one should be used.
For the same reason, packets with different IP/TCP options are not allowed.
So in that case the GRO lib decides that it isn't safe to merge these packets.
As I understand it, Linux does pretty much the same.
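
The neighbor rule described above can be sketched as follows. This is a
simplified illustration under stated assumptions, not the actual
librte_gro code: struct gro_item_sketch, its fields and check_neighbor()
are hypothetical names, with ip_id assumed to hold the ID of the most
recently appended packet, as in the check_seq_option() snippet quoted
earlier in the thread.

```c
#include <stdint.h>

/* Hypothetical, simplified per-flow state; not the DPDK structure. */
struct gro_item_sketch {
    uint32_t sent_seq;    /* TCP sequence number of the first merged byte */
    uint32_t payload_len; /* TCP payload bytes merged so far */
    uint16_t ip_id;       /* IPv4 ID of the last appended packet */
    uint16_t nb_merged;   /* number of packets merged so far */
};

/*
 * Returns 1 if the new packet can be appended, -1 if it can be
 * prepended, and 0 if it is not a neighbor and must not be merged.
 */
static int check_neighbor(const struct gro_item_sketch *item,
                          uint32_t sent_seq, uint16_t ip_id, uint32_t tcp_dl)
{
    /* Append: payload follows directly and the IP ID is the next one. */
    if (sent_seq == item->sent_seq + item->payload_len &&
        ip_id == (uint16_t)(item->ip_id + 1))
        return 1;

    /* Prepend: payload ends where the item starts and the IP ID
     * precedes the ID of the first merged packet. */
    if (sent_seq + tcp_dl == item->sent_seq &&
        (uint16_t)(ip_id + item->nb_merged) == item->ip_id)
        return -1;

    return 0;
}
```

The uint16_t casts are a choice made for this sketch so that the ID
comparison stays modulo 2^16 when the counter wraps from 65535 to 0.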
Konstantin
Thread overview: 8+ messages
2017-12-06 14:02 Ilya Matveychikov
2017-12-06 18:12 ` Stephen Hemminger
2017-12-06 18:38 ` Ilya Matveychikov
2017-12-06 23:15 ` Stephen Hemminger
2017-12-07 0:19 ` Ananyev, Konstantin [this message]
2017-12-07 1:01 ` Stephen Hemminger
2017-12-07 7:04 ` Ilya Matveychikov
2017-12-07 8:31 ` Hu, Jiayu