From: Olivier Matz <olivier.matz@6wind.com>
To: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Cc: dev@dpdk.org, maxime.coquelin@redhat.com, huawei.xie@intel.com,
stephen@networkplumber.org, "Tan,
Jianfeng" <jianfeng.tan@intel.com>,
Tomasz Kulasek <tomaszx.kulasek@intel.com>,
Thomas Monjalon <thomas.monjalon@6wind.com>,
Konstantin Ananyev <konstantin.ananyev@intel.com>
Subject: Re: [dpdk-dev] [PATCH 5/5] net/virtio: fix Tso when mbuf is shared
Date: Tue, 17 Jan 2017 12:18:25 +0100 [thread overview]
Message-ID: <20170117121825.3db48369@platinum> (raw)
In-Reply-To: <20170116064819.GL9770@yliu-dev.sh.intel.com>
Hi Yuanhan,
On Mon, 16 Jan 2017 14:48:19 +0800, Yuanhan Liu
<yuanhan.liu@linux.intel.com> wrote:
> On Mon, Jan 09, 2017 at 06:46:25PM +0100, Olivier Matz wrote:
> > The virtio specifications requires that the L4 checksum is set to
> > the pseudo header checksum. You can search for "pseudo header" in
> > the following doc:
> > http://docs.oasis-open.org/virtio/virtio/v1.0/cs04/virtio-v1.0-cs04.pdf
> >
> > Especially in 5.1.6.2.1, we can see that if we use the csum flag, we
> > must set the checksum to phdr, and if we do tso, we must set the
> > csum flag.
> >
> > We can check that this is really needed with Linux vhost by
> > replaying the test plan described at [1].
> >
> > [1] http://dpdk.org/ml/archives/dev/2016-October/048793.html
> >
> > If we add the following patch to disable the checksum fix (on top of
> > this patchset), the test1 "large packets (lro/tso)" won't work.
> >
> > --- a/drivers/net/virtio/virtio_rxtx.c
> > +++ b/drivers/net/virtio/virtio_rxtx.c
> > @@ -224,6 +224,9 @@
> > uint32_t tmp;
> > int shared = 0;
> >
> > + if (1)
> > + return 0;
> > +
> > /* mbuf is write-only, we need to copy the headers in a
> > linear buffer */ if (unlikely(rte_pktmbuf_data_is_shared(m, 0,
> > hdrlen))) { shared = 1;
> >
> >
> > In one direction ("flow1" in the test desc), large packets are
> > transmitted from host on the ixgbe interface, and received by the
> > guest. Then, testpmd bridges the packet to the virtio interface. But
> > the packet is not received by the host.
>
> I hope I could have time to dig this further, since, honestly, I don't
> quite like this patch: it makes things un-maintainable.
Well, I'm not that proud of the patch, but that's the best solution
I've found. Nevertheless saying it makes things un-maintainable looks a
bit excessive to me :)
The option of reallocating a mbuf, copy and fix network headers in it
looks even more complex to me (that was my first approach).
> Besides that, I think we have similar issue with nic drivers. See the
> rte_net_intel_cksum_flags_prepare() function introduced at commit
> 4fb7e803eb1a ("ethdev: add Tx preparation").
Yes, that was discussed a bit. See [1] and the subsequent mails.
http://dpdk.org/ml/archives/dev/2016-December/051014.html
My opinion is that tx_burst() should not change the mbuf data, it's
always been like this. For Intel NICs, there is no issue since the DPDK
API is derived from Intel NICs API, so there is no fix to do in the
mbuf data.
For tx_prepare(), it's explicitly said that it can update the data.
If tx_prepare() becomes mandatory, it will naturally fix this issue
without modifying the driver, because the phdr csum calculation will be
done in tx_prepare().
An alternative is to mark this as a known issue for now, and wait until
tx_prepare() is mandatory.
> Cc more people here. And here is a quick background for them: NIC
> drivers doing TSO need change the mbuf (say, for cksum updating),
> however, as Stephen pointed out, we could not do that if the mbuf is
> shared: I don't see such checks in the driver code as well.
>
> > There are at least 2 options for this one:
> >
> > - try to use 2 different descriptors (the patch is probably harder,
> > and it may slow-down the case where ANY_LAYOUT is supported)
> >
> > - refuse to initialize with TSO enabled if ANY_LAYOUT is not
> > supported.
> >
> > If you think ANY_LAYOUT is most likely true today, we could choose
> > option 2. Let me know what's your preference here.
>
> Maybe we could go with a simpler one: COW. Yeah, it costs more, but
> this would be rare, that it should be OKay, right? Besides, we just
> need copy the heading mbuf.
Could you detail what you mean by COW in this context? Do you mean
reallocating a new mbuf? If yes, it's not only a problem of cost:
- There is no mempool pointer associated to tx queues, so we cannot
allocate a mbuf. Reusing a mempool pointer from the current mbuf looks
risky, because it can be a special pool, like a pool dedicated for
clones, without data.
- It makes allocation error quite hard to manage, it would require some
rework in tx functions.
Thanks for your review and the discussion. Let me know what you think.
Regards,
Olivier
next prev parent reply other threads:[~2017-01-17 11:18 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-11-24 8:56 [dpdk-dev] [PATCH 0/5] virtio/mbuf: fix virtio tso with shared mbufs Olivier Matz
2016-11-24 8:56 ` [dpdk-dev] [PATCH 1/5] mbuf: remove const attribute in mbuf read function Olivier Matz
2016-11-24 8:56 ` [dpdk-dev] [PATCH 2/5] mbuf: new helper to check if a mbuf is shared Olivier Matz
2016-11-24 8:56 ` [dpdk-dev] [PATCH 3/5] mbuf: new helper to write data in a mbuf chain Olivier Matz
2016-11-24 8:56 ` [dpdk-dev] [PATCH 4/5] mbuf: new helper to copy data from a mbuf Olivier Matz
2016-11-24 8:56 ` [dpdk-dev] [PATCH 5/5] net/virtio: fix Tso when mbuf is shared Olivier Matz
2016-12-14 7:27 ` Yuanhan Liu
2017-01-09 17:46 ` Olivier Matz
2017-01-16 6:48 ` Yuanhan Liu
2017-01-17 11:18 ` Olivier Matz [this message]
2017-01-18 5:03 ` Yuanhan Liu
2017-01-24 10:51 ` Olivier MATZ
2017-01-28 12:32 ` Yuanhan Liu
2017-01-09 17:59 ` Stephen Hemminger
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170117121825.3db48369@platinum \
--to=olivier.matz@6wind.com \
--cc=dev@dpdk.org \
--cc=huawei.xie@intel.com \
--cc=jianfeng.tan@intel.com \
--cc=konstantin.ananyev@intel.com \
--cc=maxime.coquelin@redhat.com \
--cc=stephen@networkplumber.org \
--cc=thomas.monjalon@6wind.com \
--cc=tomaszx.kulasek@intel.com \
--cc=yuanhan.liu@linux.intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).