From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <thomas.monjalon@6wind.com>
Received: from mail-wi0-f172.google.com (mail-wi0-f172.google.com
 [209.85.212.172]) by dpdk.org (Postfix) with ESMTP id 5F0BC5A73
 for <dev@dpdk.org>; Fri, 27 Feb 2015 15:13:59 +0100 (CET)
Received: by wiwh11 with SMTP id h11so500724wiw.3
 for <dev@dpdk.org>; Fri, 27 Feb 2015 06:13:59 -0800 (PST)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:from:to:cc:subject:date:message-id:organization
 :user-agent:in-reply-to:references:mime-version
 :content-transfer-encoding:content-type;
 bh=Gwe8o6EmQm3G/r2RVVmmhhfpmmXH5qZMz292iBl4VUI=;
 b=R6RdtbNGNRRIbB2Y4aPg32dOfXxGvyi2SxzzwwgxAfsO+r/v7/6DFwr4/ttxQ3RwsV
 Za1KyJQlZVRYvRMNvzo1EV8rJPOg1JUlFf09nDYUqlcu38DRVjuUmyYPzDgk6j+6bpeg
 B0QB3fTTmiMm55M4QImaDMs1sKMBBJYd/4DxowvsQNtfqej3BdY39ec/1cfIFR5ikbFY
 RberhG69kcZi5Z07hQl72eeREcSKICrwH2ZSt1XFKrh7WTmlm9C7ayyxuRCenL+QcXrE
 DgSMh/l1nYwkpq3QN14O5AfBr1t2Eh3DmzR4IRjuAKR3u+1xVUdxrU6Gfj5JiT2WJEn5
 R0Gg==
X-Gm-Message-State: ALoCoQn3oIjyImdLiRlzSOmD25cCTfLW44nK+NwpCuNEEuugWqUTc0f/LtX9oo02rHdrhOVlzmTg
X-Received: by 10.180.206.14 with SMTP id lk14mr6814862wic.71.1425046439177;
 Fri, 27 Feb 2015 06:13:59 -0800 (PST)
Received: from xps13.localnet (136-92-190-109.dsl.ovh.fr. [109.190.92.136])
 by mx.google.com with ESMTPSA id j9sm6151589wjy.18.2015.02.27.06.13.57
 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
 Fri, 27 Feb 2015 06:13:58 -0800 (PST)
From: Thomas Monjalon <thomas.monjalon@6wind.com>
To: "Liang, Cunming" <cunming.liang@intel.com>
Date: Fri, 27 Feb 2015 15:13:21 +0100
Message-ID: <7961783.gxFFSSThyb@xps13>
Organization: 6WIND
User-Agent: KMail/4.14.4 (Linux/3.18.4-1-ARCH; KDE/4.14.4; x86_64; ; )
In-Reply-To: <D0158A423229094DA7ABF71CF2FA0DA3118DFC6E@shsmsx102.ccr.corp.intel.com>
References: <1424710542-14637-1-git-send-email-danny.zhou@intel.com>
 <CALwxeUu2GCzqR+RSgpp1GX=bCDpZTyPcqD4RQzsCx5d48RvBdQ@mail.gmail.com>
 <D0158A423229094DA7ABF71CF2FA0DA3118DFC6E@shsmsx102.ccr.corp.intel.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"
Cc: dev@dpdk.org
Subject: Re: [dpdk-dev] [PATCH v6 4/8] eal/linux: add per rx queue interrupt
	handling based on VFIO
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: patches and discussions about DPDK <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Fri, 27 Feb 2015 14:13:59 -0000

Hi Cunming,

First, sorry to have to say that, but it is not easy to read discussion=
s
where quote marks are not used. I re-insert them for clarity.

Comments below.

2015-02-27 12:22, Liang, Cunming:
> From: David Marchand [mailto:david.marchand@6wind.com]
> Sent: Friday, February 27, 2015 6:34 PM
>=20
> > I am not really comfortable with this api.
> >=20
> > This is just creating something on top of the standard epoll api wi=
th
> > limitations. In the end, we could just use an external lib that doe=
s this
> > already.
>=20
> [Liang, Cunming] Not really, I think. We try to protect the data insi=
de
> =E2=80=98rte_intr_handle=E2=80=99, it doesn=E2=80=99t expect user to =
understand the things defined
> inside =E2=80=98rte_intr_handle=E2=80=99.
> It=E2=80=99s better typedef =E2=80=98rte_intr_handle=E2=80=99 as a ra=
w integer ID, having a function
> to get it from a ethdev. Then all the interrupt api is around it.
> It provides the common pci NIC devices rxtx interrupt processing appr=
oach.
> For the limitations, we can fix it step by step.
>=20
> > So ok, this will work for your limited use case, but this will not =
be
> > really useful for anything else.
> > Not sure it has its place in eal, this is more an example to me.
>=20
> [Liang, Cunming] =E2=80=98limited use case=E2=80=99 do you means only=
 for rxtx ?
> It don=E2=80=99t expect to provide a generic event mechanism (like li=
bev/libevent
> does), but a simple way to allow PMD work with DMA interrupt. It main=
ly
> abstract for rx interrupt purpose. I appreciate if you could help to =
list
> more useful cases.

You don't expect to provide a generic event mechanism but application
developpers could need to wait for many events at once, not only Rx one=
s.
That's why it's better to provide only the needed parts to use somethin=
g
generic like libevent.
And we should avoid reinventing the wheel.

> > > +static void
> > > +eal_intr_process_rxtx_interrupts(struct rte_intr_handle *intr_ha=
ndle,
> > > +                                struct epoll_event *events,
> > > +                                uint32_t *vec, int nfds)
> > > +{
> > > +       int i, bytes_read;
> > > +       union rte_intr_read_buffer buf;
> > > +       int fd;
> > > +
> > > +       for (i =3D 0; i < nfds; i++) {
> > > +               /* set the length to be read for different handle=
 type */
> > > +               switch (intr_handle->type) {
> > > +               case RTE_INTR_HANDLE_UIO:
> > > +                       bytes_read =3D sizeof(buf.uio_intr_count)=
;
> > > +                       break;
> > > +               case RTE_INTR_HANDLE_ALARM:
> > > +                       bytes_read =3D sizeof(buf.timerfd_num);
> > > +                       break;
> > > +#ifdef VFIO_PRESENT
> > > +               case RTE_INTR_HANDLE_VFIO_MSIX:
> > > +               case RTE_INTR_HANDLE_VFIO_MSI:
> > > +               case RTE_INTR_HANDLE_VFIO_LEGACY:
> > > +                       bytes_read =3D sizeof(buf.vfio_intr_count=
);
> > > +                       break;
> > > +#endif
> > > +               default:
> > > +                       bytes_read =3D 1;
> > > +                       break;
> > > +               }
> > > +
> > > +               /**
> > > +               * read out to clear the ready-to-be-read flag
> > > +               * for epoll_wait.
> > > +               */
> > > +               vec[i] =3D events[i].data.u32;
> > > +               assert(vec[i] < VFIO_MAX_RXTX_INTR_ID);
> > > +
> > > +               fd =3D intr_handle->efds[vec[i]];
> > > +               bytes_read =3D read(fd, &buf, bytes_read);
> > > +               if (bytes_read < 0)
> > > +                       RTE_LOG(ERR, EAL, "Error reading from fil=
e "
> > > +                               "descriptor %d: %s\n", fd, strerr=
or(errno));
> > > +               else if (bytes_read =3D=3D 0)
> > > +                       RTE_LOG(ERR, EAL, "Read nothing from file=
 "
> > > +                               "descriptor %d\n", fd);
> > > +       }
> > > +}
> >=20
> > Why unconditionnally read ?
> > You are absorbing events from the application if the application ga=
ve you
> > an external epfd and populated it with its own fds.
>=20
> [Liang, Cunming] The vector number was checked. If an external epfd
> populated some event carry fd rather than a data.u32 but the value
> inside the valid range, it considers as a valid vector number. No mat=
ter
> the read success or not, it always notify the event. Do you have any
> suggestion used here to check the condition ?
>=20
> > > +static int init_tls_epfd(void)
> > > +{
> > > +       int pfd =3D epoll_create(1);
> > > +       if (pfd < 0) {
> > > +               RTE_LOG(ERR, EAL,
> > > +                       "Cannot create epoll instance\n");
> > > +               return -1;
> > > +       }
> > > +       return pfd;
> > > +}
> > > +
> > > +int
> > > +rte_intr_rx_wait(struct rte_intr_handle *intr_handle, int epfd,
> > > +                uint32_t *vec, uint16_t num)
> > > +{
> >=20
> > In the end, this "rx" does not mean anything to eal.
>=20
> [Liang, Cunming] That=E2=80=99s a good point. I tried to remove =E2=80=
=98rx=E2=80=99 and use a
> generic word here. =E2=80=98rte_intr_wait=E2=80=99 looks like too gen=
eric,
> =E2=80=98rte_intr_epfd_wait=E2=80=99 looks not abstract with bsd.
> As the function only serves for rxtx vector, so using the rx prefix.
> Which name do you prefer ?

You should understand that you are trying to wrongly replace a generic =
lib.
The best name is probably /dev/null.

> > > +#define MAX_EVENTS      8
> > > +       struct epoll_event events[MAX_EVENTS];
> > > +       int ret, nfds =3D 0;
> > > +
> > > +       if (!intr_handle || !vec) {
> > > +               RTE_LOG(ERR, EAL, "invalid input parameter\n");
> > > +               return -1;
> > > +       }
> > > +
> > > +       if (intr_handle->type !=3D RTE_INTR_HANDLE_VFIO_MSIX) {
> > > +               RTE_LOG(ERR, EAL, "intr type should be VFIO_MSIX\=
n");
> > > +               return -1;
> > > +       }
> > > +
> > > +       if (epfd =3D=3D RTE_EPOLL_FD_ANY) {
> > > +               /* using per thread epoll fd */
> > > +               if (unlikely(RTE_PER_LCORE(_epfd) =3D=3D -1))
> > > +                       RTE_PER_LCORE(_epfd) =3D init_tls_epfd();=

> > > +               epfd =3D RTE_PER_LCORE(_epfd);
> > > +       }
> >=20
> > Rather than testing every time, this should be set by the caller,
> > i.e. epfd is always valid.
> > If application does not want to create a epfd, then it calls
> > rte_intr_rx_wait with RTE_EPOLL_FD_ANY (this name is not well chose=
n)
> > that is a macro wrapped to RTE_PER_LCORE(_epfd).
>=20
> [Liang, Cunming] It sounds good to me. As we don=E2=80=99t expect to =
expose
> *rte_per_lcore__epfd* as an public symbol, so will define rte_epfd()
> instread.
> Within rte_epfd(), if RTE_PER_LCORE(_epfd) not assigned, then
> init_tls_epfd() once.
>=20
> > init_tls_epfd() should be called only once at init time.
> > No need to check every time.
>=20
> [Liang, Cunming] As it probably not need per thread epfd at all.
> So I prefer to create it when it real needed as above I mentioned.

> > > +       do {
> > > +               ret =3D epoll_wait(epfd, events,
> > > +                                RTE_MIN(num, MAX_EVENTS),
> > > +                                EAL_INTR_EPOLL_WAIT_FOREVER);
> > > +               if (unlikely(ret < 0)) {
> > > +                       /* epoll_wait fail */
> > > +                       RTE_LOG(ERR, EAL, "epoll_wait returns wit=
h fail\n");
> > > +                       return -1;
> > > +               } else if (ret > 0) {
> > > +                       /* epoll_wait has at least one fd ready t=
o read */
> > > +                       eal_intr_process_rxtx_interrupts(intr_han=
dle, events,
> > > +                                                        vec, ret=
);
> > > +                       num -=3D ret;
> > > +                       vec +=3D ret;
> > > +                       nfds +=3D ret;
> > > +               } else if (nfds > 0)
> > > +                       break;
> > > +       } while (num > 0);
> > > +
> > > +       return nfds;
> > > +}
> >
> > You are blocking unless all fds have been set, so you are serialisi=
ng
> > all events.
>=20
> [Liang, Cunming] I=E2=80=99m not sure fully got your point. If any ev=
ent arrives,
> it gets back. Do you means if no fds added in, it=E2=80=99s always bl=
ocking.
> You expect to have a timeout return ?

> > >                         RTE_LOG(ERR, EAL, "  cannot get IRQ info,=
 "
> > >=20
> > > -                                       "error %i (%s)\n", errno,=
 strerror(errno));
> > > +                               "error %i (%s)\n", errno, strerro=
r(errno));
> >=20
> > Garbage, this has nothing to do with the patch.
>=20
> [Liang, Cunming] It=E2=80=99s for line number exceed 80 margin compla=
in.

The title of the patch is "add per rx queue interrupt handling based on=
 VFIO".
So this kind of modification is a garbage.
Sorry, I won't play the game "idem / the same" below ;)

> > > -                       if (internal_config.vfio_intr_mode !=3D R=
TE_INTR_MODE_NONE) {
> > > +                       if (internal_config.vfio_intr_mode !=3D
> > > +                           RTE_INTR_MODE_NONE) {
> > >                                 RTE_LOG(ERR, EAL,
> > > -                                               "  interrupt vect=
or does not support eventfd!\n");
> > > +                                       "  interrupt vector "
> > > +                                       "does not support eventfd=
!\n");> >=20
> >=20
> > Idem.
>=20
> [Liang, Cunming] The same.

> > > -                                       "error %i (%s)\n", errno,=
 strerror(errno));
> > > +                               "error %i (%s)\n", errno, strerro=
r(errno));
> >=20
> > Idem.
>=20
> [Liang, Cunming] The same.

> > >                 dev->intr_handle.vfio_dev_fd =3D vfio_dev_fd;
> > > -
> >=20
> > Idem.
>=20
> [Liang, Cunming] Accept.