From: "Ouyang, Changchun" <changchun.ouyang@intel.com>
To: Thomas Monjalon <thomas.monjalon@6wind.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH] vhost: Check descriptor number for vector Rx
Date: Mon, 27 Oct 2014 13:55:52 +0000
Message-ID: <F52918179C57134FAEC9EA62FA2F96251187DBC7@shsmsx102.ccr.corp.intel.com>
In-Reply-To: <1433643.g0MTQjq1Jf@xps13>
Hi Thomas,
> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> Sent: Monday, October 27, 2014 4:46 PM
> To: Ouyang, Changchun
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] vhost: Check descriptor number for vector
> Rx
>
> 2014-10-25 00:48, Ouyang, Changchun:
> > From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> > > 2014-10-24 16:38, Ouyang Changchun:
> > > > For zero copy, we need to check whether the RX descriptor number
> > > > meets the minimum required by the vector PMD Rx function, and give
> > > > the user a hint if it does not.
> > > [...]
> > > > --- a/examples/vhost/main.c
> > > > +++ b/examples/vhost/main.c
> > > > @@ -131,6 +131,10 @@
> > > >  #define RTE_TEST_RX_DESC_DEFAULT_ZCP 32 /* legacy: 32, DPDK virt FE: 128. */
> > > >  #define RTE_TEST_TX_DESC_DEFAULT_ZCP 64 /* legacy: 64, DPDK virt FE: 64. */
> > > >
> > > > +#ifdef RTE_IXGBE_INC_VECTOR
> > > > +#define VPMD_RX_BURST 32
> > > > +#endif
> > > > +
> > > >  /* Get first 4 bytes in mbuf headroom. */
> > > >  #define MBUF_HEADROOM_UINT32(mbuf) (*(uint32_t *)((uint8_t *)(mbuf) \
> > > >  	+ sizeof(struct rte_mbuf)))
> > > > @@ -792,6 +796,19 @@ us_vhost_parse_args(int argc, char **argv)
> > > > return -1;
> > > > }
> > > >
> > > > +#ifdef RTE_IXGBE_INC_VECTOR
> > > > +	if ((zero_copy == 1) && (num_rx_descriptor <= VPMD_RX_BURST)) {
> > > > +		RTE_LOG(INFO, VHOST_PORT,
> > > > +			"The RX desc num: %d is too small for PMD to work\n"
> > > > +			"properly, please enlarge it to bigger than %d if\n"
> > > > +			"possible by the option: '--rx-desc-num <number>'\n"
> > > > +			"One alternative is disabling RTE_IXGBE_INC_VECTOR\n"
> > > > +			"in config file and rebuild the libraries.\n",
> > > > +			num_rx_descriptor, VPMD_RX_BURST);
> > > > +		return -1;
> > > > +	}
> > > > +#endif
> > > > +
> > > > return 0;
> > > > }
> > >
> > > I feel there is a design problem here.
> > > An application shouldn't have to care about the underlying driver.
> >
> > For most other applications the descriptor number is set large enough
> > (1024 or so), so there is no need to check it at an early stage of the
> > run.
> >
> > But vhost zero copy (note that vhost one copy also uses 1024
> > descriptors) has a default descriptor number of 32.
> > Why 32?
> > Because the vhost zero copy implementation (working as a backend) must
> > support both DPDK-based applications that use the PMD virtio-net driver
> > and applications based on the Linux legacy virtio-net driver.
> > In the legacy virtio-net case, on one side QEMU hard-codes the total
> > virtio descriptor count to 256; on the other side, legacy virtio uses
> > half of them for virtio headers, and then only the other half, i.e. 128
> > descriptors, is available as real buffers.
> >
> > In PMD mode, every HW descriptor must have a DMA address filled in at
> > the RX setup stage, otherwise the RX process can misbehave later.
> > Because of that, we have to use the really limited virtio buffers to
> > fill the DMA addresses of all the HW descriptors; in other words, in
> > the zero copy case the available virtio descriptor count determines the
> > total mbuf count and the HW descriptor count.
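> >
> > For illustration, here is a minimal sketch of that setup stage (the
> > struct and the helper mbuf_dma_addr() are hypothetical names made up
> > for the example, not the real ixgbe code):
> >
> > #include <rte_mbuf.h>
> > #include <rte_mempool.h>
> >
> > /* Hypothetical HW descriptor: only the DMA address matters here. */
> > struct hw_rx_desc {
> > 	uint64_t buffer_addr;
> > };
> >
> > /* Hypothetical: returns the bus address of the mbuf's data buffer. */
> > extern uint64_t mbuf_dma_addr(struct rte_mbuf *m);
> >
> > static int
> > fill_rx_ring(struct hw_rx_desc *rxd, unsigned int nb_rx_desc,
> > 	     struct rte_mempool *pool)
> > {
> > 	unsigned int i;
> >
> > 	for (i = 0; i < nb_rx_desc; i++) {
> > 		struct rte_mbuf *m = rte_pktmbuf_alloc(pool);
> >
> > 		if (m == NULL)
> > 			return -1; /* pool exhausted: RX setup fails */
> > 		/* Every slot needs a valid buffer before RX starts. */
> > 		rxd[i].buffer_addr = mbuf_dma_addr(m);
> > 	}
> > 	return 0;
> > }
> >
> > With zero copy those mbufs come out of the guest's 128 usable
> > descriptors, so nb_rx_desc cannot be large.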
> >
> > Tuning showed that 32 is the suitable value for vhost zero copy to
> > work properly in the legacy Linux virtio case.
> > Another factor pushing the value down to 32 is that the mempool uses a
> > ring to hold the mbufs, which costs one entry for the ring head/tail
> > flag; and there are other overheads such as the temporary mbufs
> > (RX_BURST of them) used during RX.
> > Note that the descriptor number must be a power of 2.
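> >
> > As a rough budget (my own back-of-envelope figures, assuming
> > RX_BURST = 32):
> >
> > 	256	vring entries, hard-coded by QEMU
> > 	128	usable as data buffers (the other half carries the
> > 		virtio headers)
> > 	 -1	mempool ring entry for the head/tail flag
> > 	-32	temporary mbufs held during an RX burst
> >
> > so noticeably fewer than 128 buffers are truly free, and after keeping
> > to a power of 2 and tuning, 32 is the value that worked reliably.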
> >
> > Why does this change come now?
> > Recently the default RX function was switched to the vector RX
> > function, whereas previously it was the non-vector (scalar) RX.
> > The vector RX function needs more than 32 descriptors to work
> > properly; scalar RX has no such limitation.
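> >
> > Written out as a sketch, the resulting constraint looks like this
> > (VPMD_RX_BURST is the value from the patch above; the helper itself is
> > made up for illustration):
> >
> > #define VPMD_RX_BURST 32
> >
> > /* Ring size must be a power of 2 and leave room beyond one burst. */
> > static int
> > rx_ring_size_ok(unsigned int nb_rx_desc)
> > {
> > 	return nb_rx_desc != 0 &&
> > 	       (nb_rx_desc & (nb_rx_desc - 1)) == 0 &&
> > 	       nb_rx_desc > VPMD_RX_BURST;
> > }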
> >
> > Since the RX function is selectable (vector or non-vector) and the
> > descriptor number is also configurable, the vhost app checks here that
> > the two match, so that everything can work normally, and gives a hint
> > when they don't.
> >
> > I hope the above makes it a bit clearer. :-)
>
> Thank you for your explanation.
> Your fix shows that driver and application are tightly linked.
> It's a design issue. As I said:
> "An application shouldn't have to care about the underlying driver."
> I haven't dug into vhost enough to suggest a good fix, but I'm sure
> someone could have an idea.
>
I agree with you that the application and the driver are linked here, but that is due to a few things:
1. QEMU hard-codes the total vring size;
2. The PMD driver must fully fill in the descriptors' DMA addresses at the setup stage;
3. The PMD driver uses the vector PMD as its default path, which requires more than 32 descriptors for burst RX;
4. Zero copy must use the external buffers directly and set them as the HW descriptors' DMA addresses.
Except for item 3, none of these can easily be removed or ignored.
As for item 3, I don't know why the vector PMD was made the default path; it pursues performance, but at the cost of flexibility.
Here is an example: vhost zero copy has to check for it together with the descriptor number. I am not sure whether the default vector PMD path affects other applications or not.
On the other hand, without this check and hint, the user suffers:
vhost stops receiving packets after forwarding 31 packets or so, the
user doesn't know what is really happening underneath, and cannot
easily find out that their descriptor number is too small for vector
PMD RX.
What follows is painful debugging to figure out the issue.
With this check and hint, things go smoothly.
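For example, the user can simply enlarge the ring when enabling zero
copy. An illustrative invocation (--rx-desc-num comes from the patch
hint; the other options are my assumption about the vhost sample app's
usual command line, so adjust to your build and setup):

	./build/vhost-switch -c 0xf -n 4 -- --zero-copy 1 --rx-desc-num 64

With 64 descriptors (a power of 2, larger than VPMD_RX_BURST) the
vector RX path has room to work.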
Meanwhile, let's wait for other people's viewpoints.
Thanks
Changchun