From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by dpdk.org (Postfix) with ESMTP id AA24C5961 for ; Mon, 19 Oct 2015 15:19:53 +0200 (CEST) Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga102.jf.intel.com with ESMTP; 19 Oct 2015 06:19:52 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.17,702,1437462000"; d="scan'208";a="830314869" Received: from fmsmsx106.amr.corp.intel.com ([10.18.124.204]) by fmsmga002.fm.intel.com with ESMTP; 19 Oct 2015 06:19:52 -0700 Received: from FMSMSX109.amr.corp.intel.com (10.18.116.9) by FMSMSX106.amr.corp.intel.com (10.18.124.204) with Microsoft SMTP Server (TLS) id 14.3.248.2; Mon, 19 Oct 2015 06:19:52 -0700 Received: from shsmsx102.ccr.corp.intel.com (10.239.4.154) by fmsmsx109.amr.corp.intel.com (10.18.116.9) with Microsoft SMTP Server (TLS) id 14.3.248.2; Mon, 19 Oct 2015 06:19:51 -0700 Received: from shsmsx101.ccr.corp.intel.com ([169.254.1.96]) by shsmsx102.ccr.corp.intel.com ([169.254.2.253]) with mapi id 14.03.0248.002; Mon, 19 Oct 2015 21:19:50 +0800 From: "Xie, Huawei" To: Stephen Hemminger Thread-Topic: [PATCH 3/5] virtio: use indirect ring elements Thread-Index: AdEKcNUdTVFv92u0RmaE7X1gJWxEEw== Date: Mon, 19 Oct 2015 13:19:50 +0000 Message-ID: References: <1445231772-17467-1-git-send-email-stephen@networkplumber.org> <1445231772-17467-4-git-send-email-stephen@networkplumber.org> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [10.239.127.40] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Cc: "dev@dpdk.org" Subject: Re: [dpdk-dev] [PATCH 3/5] virtio: use indirect ring elements X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 19 Oct 2015 13:19:54 -0000 On 10/19/2015 1:16 PM, Stephen Hemminger wrote:=0A= > The virtio ring in QEMU/KVM is usually limited to 256 entries=0A= > and the normal way that virtio driver was queuing mbufs required=0A= > nsegs + 1 ring elements. By using the indirect ring element feature=0A= > if available, each packet will take only one ring slot even for=0A= > multi-segment packets.=0A= =0A= [...]=0A= =0A= =0A= > static int=0A= > -virtqueue_enqueue_xmit(struct virtqueue *txvq, struct rte_mbuf *cookie)= =0A= > +virtqueue_enqueue_xmit(struct virtqueue *txvq, struct rte_mbuf *cookie,= =0A= > + int use_indirect)=0A= > {=0A= > struct vq_desc_extra *dxp;=0A= > struct vring_desc *start_dp;=0A= > uint16_t seg_num =3D cookie->nb_segs;=0A= > - uint16_t needed =3D 1 + seg_num;=0A= > + uint16_t needed =3D use_indirect ? 1 : 1 + seg_num;=0A= > uint16_t head_idx, idx;=0A= > - uint16_t head_size =3D txvq->hw->vtnet_hdr_size;=0A= > + unsigned long offs;=0A= > =0A= > if (unlikely(txvq->vq_free_cnt =3D=3D 0))=0A= > return -ENOSPC;=0A= > @@ -220,12 +221,29 @@ virtqueue_enqueue_xmit(struct virtqueue *txvq, stru= ct rte_mbuf *cookie)=0A= > dxp =3D &txvq->vq_descx[idx];=0A= > dxp->cookie =3D (void *)cookie;=0A= > dxp->ndescs =3D needed;=0A= > -=0A= > start_dp =3D txvq->vq_ring.desc;=0A= > - start_dp[idx].addr =3D=0A= > - txvq->virtio_net_hdr_mem + idx * head_size;=0A= > - start_dp[idx].len =3D (uint32_t)head_size;=0A= > - start_dp[idx].flags =3D VRING_DESC_F_NEXT;=0A= > +=0A= > + if (use_indirect) {=0A= > + struct virtio_tx_region *txr=0A= > + =3D txvq->virtio_net_hdr_mz->addr;=0A= > +=0A= > + offs =3D idx * sizeof(struct virtio_tx_region)=0A= > + + offsetof(struct virtio_tx_region, tx_indir);=0A= > +=0A= > + start_dp[idx].addr =3D txvq->virtio_net_hdr_mem + offs;=0A= > + start_dp[idx].len =3D (seg_num + 1) * sizeof(struct vring_desc);=0A= a. In indirect mode, as we always use one descriptor, could we allocate=0A= one fixed descriptor as what i did in RX/TX ring layout optimization? :).= =0A= b. If not a, we could cache the descriptor, avoid update unless the=0A= fields are different. In current implementation of free desc list, we=0A= could make them always use the same tx desc for the same ring slot. I am=0A= to submit a patch for normal rx path.=0A= c. Could we initialize the length of all tx descriptors to be=0A= VIRTIO_MAX_INDIRECT * sizeof(struct vring_desc)? Is maximum length ok=0A= here? Does the spec require that the length field reflects the length of=0A= real used descs, as we already have the next field to indicate the last=0A= descriptor.=0A= =0A= =0A= > + start_dp[idx].flags =3D VRING_DESC_F_INDIRECT;=0A= > +=0A= > + start_dp =3D txr[idx].tx_indir;=0A= > + idx =3D 0;=0A= > + } else {=0A= > + offs =3D idx * sizeof(struct virtio_tx_region)=0A= > + + offsetof(struct virtio_tx_region, tx_hdr);=0A= > +=0A= > + start_dp[idx].addr =3D txvq->virtio_net_hdr_mem + offs;=0A= > + start_dp[idx].len =3D txvq->hw->vtnet_hdr_size;=0A= > + start_dp[idx].flags =3D VRING_DESC_F_NEXT;=0A= > + }=0A= > =0A= > for (; ((seg_num > 0) && (cookie !=3D NULL)); seg_num--) {=0A= > idx =3D start_dp[idx].next;=0A= This looks weird to me. Why skip the first user provided descriptor?=0A= idx =3D 0=0A= idx =3D start_dp[idx].next=0A= start_dp[idx].addr =3D ...=0A= =0A= > @@ -236,7 +254,12 @@ virtqueue_enqueue_xmit(struct virtqueue *txvq, struc= t rte_mbuf *cookie)=0A= > }=0A= > =0A= > start_dp[idx].flags &=3D ~VRING_DESC_F_NEXT;=0A= > - idx =3D start_dp[idx].next;=0A= > +=0A= > + if (use_indirect)=0A= > + idx =3D txvq->vq_ring.desc[head_idx].next;=0A= > + else=0A= > + idx =3D start_dp[idx].next;=0A= > +=0A= > txvq->vq_desc_head_idx =3D idx;=0A= > if (txvq->vq_desc_head_idx =3D=3D VQ_RING_DESC_CHAIN_END)=0A= > txvq->vq_desc_tail_idx =3D idx;=0A= > @@ -261,7 +284,7 @@ static void=0A= > virtio_dev_vring_start(struct virtqueue *vq, int queue_type)=0A= > {=0A= > struct rte_mbuf *m;=0A= > - int i, nbufs, error, size =3D vq->vq_nentries;=0A= > + int nbufs, error, size =3D vq->vq_nentries;=0A= > struct vring *vr =3D &vq->vq_ring;=0A= > uint8_t *ring_mem =3D vq->vq_ring_virt_mem;=0A= > =0A= > @@ -279,10 +302,7 @@ virtio_dev_vring_start(struct virtqueue *vq, int que= ue_type)=0A= > vq->vq_free_cnt =3D vq->vq_nentries;=0A= > memset(vq->vq_descx, 0, sizeof(struct vq_desc_extra) * vq->vq_nentries)= ;=0A= > =0A= > - /* Chain all the descriptors in the ring with an END */=0A= > - for (i =3D 0; i < size - 1; i++)=0A= > - vr->desc[i].next =3D (uint16_t)(i + 1);=0A= > - vr->desc[i].next =3D VQ_RING_DESC_CHAIN_END;=0A= > + vring_desc_init(vr->desc, size);=0A= > =0A= > /*=0A= > * Disable device(host) interrupting guest=0A= > @@ -760,7 +780,15 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **t= x_pkts, uint16_t nb_pkts)=0A= > =0A= > for (nb_tx =3D 0; nb_tx < nb_pkts; nb_tx++) {=0A= > struct rte_mbuf *txm =3D tx_pkts[nb_tx];=0A= > - int need =3D txm->nb_segs - txvq->vq_free_cnt + 1;=0A= > + int use_indirect, slots, need;=0A= > +=0A= > + use_indirect =3D vtpci_with_feature(txvq->hw,=0A= > + VIRTIO_RING_F_INDIRECT_DESC)=0A= > + && (txm->nb_segs < VIRTIO_MAX_TX_INDIRECT);=0A= > +=0A= > + /* How many ring entries are needed to this Tx? */=0A= "how many descs" is more accurate , at least to me, ring entries/slots=0A= means entries/slots in avail ring.=0A= If it is OK, s/slots/descs/ as well.=0A= =0A= > + slots =3D use_indirect ? 1 : 1 + txm->nb_segs;=0A= > + need =3D slots - txvq->vq_free_cnt;=0A= > =0A= > /* Positive value indicates it need free vring descriptors */=0A= > if (need > 0) {=0A= > @@ -769,7 +797,7 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx= _pkts, uint16_t nb_pkts)=0A= > need =3D RTE_MIN(need, (int)nb_used);=0A= > =0A= > virtio_xmit_cleanup(txvq, need);=0A= > - need =3D txm->nb_segs - txvq->vq_free_cnt + 1;=0A= > + need =3D slots - txvq->vq_free_cnt;=0A= > if (unlikely(need > 0)) {=0A= > PMD_TX_LOG(ERR,=0A= > "No free tx descriptors to transmit");=0A= > @@ -787,7 +815,7 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx= _pkts, uint16_t nb_pkts)=0A= > }=0A= > =0A= > /* Enqueue Packet buffers */=0A= > - error =3D virtqueue_enqueue_xmit(txvq, txm);=0A= > + error =3D virtqueue_enqueue_xmit(txvq, txm, use_indirect);=0A= > if (unlikely(error)) {=0A= > if (error =3D=3D ENOSPC)=0A= > PMD_TX_LOG(ERR, "virtqueue_enqueue Free count =3D 0");=0A= > diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueu= e.h=0A= > index 7789411..fe3fa66 100644=0A= > --- a/drivers/net/virtio/virtqueue.h=0A= > +++ b/drivers/net/virtio/virtqueue.h=0A= > @@ -237,6 +237,25 @@ struct virtio_net_hdr_mrg_rxbuf {=0A= > uint16_t num_buffers; /**< Number of merged rx buffers */=0A= > };=0A= > =0A= > +/* Region reserved to allow for transmit header and indirect ring */=0A= > +#define VIRTIO_MAX_TX_INDIRECT 8=0A= > +struct virtio_tx_region {=0A= > + struct virtio_net_hdr_mrg_rxbuf tx_hdr;=0A= Any reason to use merge-able header here?=0A= > + struct vring_desc tx_indir[VIRTIO_MAX_TX_INDIRECT]=0A= > + __attribute__((__aligned__(16)));=0A= WARNING: __aligned(size) is preferred over __attribute__((aligned(size)))= =0A= [...]=0A= =0A=