DPDK patches and discussions
From: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
To: Stephen Hemminger <stephen@networkplumber.org>,
	Ilya Matveychikov <matvejchikov@gmail.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>, "Hu, Jiayu" <jiayu.hu@intel.com>
Subject: Re: [dpdk-dev] A question about GRO neighbor packet matching
Date: Thu, 7 Dec 2017 00:19:46 +0000
Message-ID: <2601191342CEEE43887BDE71AB9772585FAC57C1@irsmsx105.ger.corp.intel.com>
In-Reply-To: <20171206151532.3abaf2fb@xeon-e3>



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Stephen Hemminger
> Sent: Wednesday, December 6, 2017 11:16 PM
> To: Ilya Matveychikov <matvejchikov@gmail.com>
> Cc: dev@dpdk.org; Hu, Jiayu <jiayu.hu@intel.com>
> Subject: Re: [dpdk-dev] A question about GRO neighbor packet matching
> 
> On Wed, 6 Dec 2017 22:38:12 +0400
> Ilya Matveychikov <matvejchikov@gmail.com> wrote:
> 
> > > On Dec 6, 2017, at 10:12 PM, Stephen Hemminger <stephen@networkplumber.org> wrote:
> > >
> > > On Wed, 6 Dec 2017 18:02:21 +0400
> > > Ilya Matveychikov <matvejchikov@gmail.com> wrote:
> > >
> > >> Hello all,
> > >>
> > >>
> > >> My question is about neighbor packet matching algorithm for TCP. Is it
> > >> correct to expect that IP packets should have continuous ID enumeration
> > >> (i.e. iph-next.id = iph-prev.id + 1)?
> > >
> > >
> > > No.
> > >
> > >> ~~~
> > >> lib/librte_gro/gro_tcp4.c:check_seq_option()
> > >> 	...
> > >> 	/* check if the two packets are neighbors */
> > >> 	tcp_dl0 = pkt0->pkt_len - pkt0->l2_len - pkt0->l3_len - tcp_hl0;
> > >> 	if ((sent_seq == (item->sent_seq + tcp_dl0)) &&
> > >> 			(ip_id == (item->ip_id + 1)))
> > >> 		/* append the new packet */
> > >> 		return 1;
> > >> 	else if (((sent_seq + tcp_dl) == item->sent_seq) &&
> > >> 			((ip_id + item->nb_merged) == item->ip_id))
> > >> 		/* pre-pend the new packet */
> > >> 		return -1;
> > >> 	else
> > >> 		return 0;
> > >> ~~~
> > >>
> > >> As per RFC791:
> > >>
> > >>  Identification:  16 bits
> > >>
> > >>    An identifying value assigned by the sender to aid in assembling the
> > >>    fragments of a datagram.
> > >
> > > The IP header id is meaningless in most TCP sessions.
> > > Good TCP implementations use PMTU discovery which sets the Don't Fragment bit.
> > > With DF, the IP id is unused (since no fragmentation).
> > > Many implementations just send 0 since generating unique IP id requires an
> > > atomic operation which is potential bottleneck.
> >
> > So, is my question correct and the code is wrong?
> >
> 
> Yes. This code is wrong on several areas.
> * The ip_id on TCP flows is irrelevant.
> * packet should only be merged if TCP flags are the same.
> 
> 
> The author should look at Linux net/ipv4/tcp_offload.c

As I remember, the Linux GRO implementation *does* require the IP IDs
of the packets being merged to be contiguous.

net/ipv4/af_inet.c:
static struct sk_buff **inet_gro_receive(struct sk_buff **head,
					 struct sk_buff *skb)
{
	...
	id = ntohl(*(__be32 *)&iph->id);
	flush = (u16)((ntohl(*(__be32 *)iph) ^ skb_gro_len(skb)) | (id & ~IP_DF));
	id >>= 16;

	...

	NAPI_GRO_CB(p)->flush_id =
			    ((u16)(ntohs(iph2->id) + NAPI_GRO_CB(p)->count) ^ id);
	NAPI_GRO_CB(p)->flush |= flush;
	...

And then at net/ipv4/tcp_offload.c:
struct sk_buff **tcp_gro_receive(struct sk_buff **head, struct sk_buff *skb)
{
	...
	/* Include the IP ID check below from the inner most IP hdr */
	flush = NAPI_GRO_CB(p)->flush | NAPI_GRO_CB(p)->flush_id;
	...
	if (flush || skb_gro_receive(head, skb)) {
	...
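
To make the flush_id arithmetic in the excerpts above concrete, here is a
minimal standalone sketch. It is not the kernel code: the helper name
ip_id_flush() and its parameters are hypothetical, and it only mirrors the
expression (u16)(ntohs(iph2->id) + count) ^ id using host-order values.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not a kernel function.
 * head_id: IP ID of the first packet already held for the flow (host order)
 * count:   number of packets merged so far
 * new_id:  IP ID of the incoming packet (host order)
 *
 * Returns 0 when new_id is exactly head_id + count (mod 2^16), i.e. the
 * IDs are contiguous. Any non-zero result plays the role of flush_id and
 * would prevent the merge.
 */
static uint16_t ip_id_flush(uint16_t head_id, uint16_t count, uint16_t new_id)
{
	return (uint16_t)(head_id + count) ^ new_id;
}
```

For example, ip_id_flush(100, 1, 101) is 0 (contiguous), and so is
ip_id_flush(65535, 1, 0) because the sum wraps at 16 bits, while
ip_id_flush(100, 1, 0) is non-zero, which is the case of a sender that
always puts 0 in the IP ID field.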

We need to check that the IP IDs are contiguous because the DPDK GRO
library doesn't strip off the IPv4 headers; instead it has to merge them into one.
If the IP IDs were non-contiguous, it would be unclear which one should be used.
For the same reason, packets with different IP/TCP options are not merged.
So in that case the GRO library decides that it isn't safe to merge these packets.
As I understand it, Linux does pretty much the same.
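
As an illustration of the rule described above, the following is a rough
sketch of a neighbor check that requires both contiguous TCP sequence numbers
and contiguous IP IDs. It is modelled on the check_seq_option() snippet quoted
earlier in this thread but is not the librte_gro code: struct gro_item_sketch,
its fields, and check_neighbor() are hypothetical names. Unlike the quoted
snippet, the IP ID comparisons here are truncated to 16 bits so that a
65535 -> 0 wrap still counts as contiguous; that is an assumption of this
sketch.

```c
#include <stdint.h>

/* Hypothetical, simplified view of an already-merged item. */
struct gro_item_sketch {
	uint32_t sent_seq;    /* TCP sequence number of the first merged byte */
	uint32_t payload_len; /* total TCP payload bytes merged so far */
	uint16_t ip_id;       /* IP ID of the last merged packet */
	uint16_t nb_merged;   /* number of packets merged into this item */
};

/*
 * Returns  1 if the new packet can be appended to the item,
 *         -1 if it can be prepended,
 *          0 if the packets are not neighbors and must not be merged.
 */
static int check_neighbor(const struct gro_item_sketch *item,
			  uint32_t sent_seq, uint16_t ip_id, uint16_t tcp_dl)
{
	/* Append: the new packet starts right after the merged payload. */
	if (sent_seq == item->sent_seq + item->payload_len &&
	    ip_id == (uint16_t)(item->ip_id + 1))
		return 1;

	/* Prepend: the new packet ends right where the item begins. */
	if (sent_seq + tcp_dl == item->sent_seq &&
	    (uint16_t)(ip_id + item->nb_merged) == item->ip_id)
		return -1;

	return 0;
}
```

With an item holding one 100-byte packet at sequence 1000 and IP ID 10, a
packet at sequence 1100 with IP ID 11 is appended, a packet at sequence 900
with IP ID 9 is prepended, and a packet at sequence 1100 with IP ID 0 is
rejected even though its sequence number matches.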
Konstantin 

 

