From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Xie, Huawei"
To: Stephen Hemminger
Cc: "dev@dpdk.org"
Date: Mon, 19 Oct 2015 16:18:21 +0000
Subject: Re: [dpdk-dev] [PATCH 3/5] virtio: use indirect ring elements
References: <1445231772-17467-1-git-send-email-stephen@networkplumber.org>
 <1445231772-17467-4-git-send-email-stephen@networkplumber.org>
 <20151019084715.33d89e50@xeon-e3>

On 10/19/2015 11:47 PM, Stephen Hemminger wrote:
> On Mon, 19 Oct 2015 13:19:50 +0000
> "Xie, Huawei" wrote:
>
>>>  static int
>>> -virtqueue_enqueue_xmit(struct virtqueue *txvq, struct rte_mbuf *cookie)
>>> +virtqueue_enqueue_xmit(struct virtqueue *txvq, struct rte_mbuf *cookie,
>>> +		       int use_indirect)
>>>  {
>>>  	struct vq_desc_extra *dxp;
>>>  	struct vring_desc *start_dp;
>>>  	uint16_t seg_num = cookie->nb_segs;
>>> -	uint16_t needed = 1 + seg_num;
>>> +	uint16_t needed = use_indirect ? 1 : 1 + seg_num;
>>>  	uint16_t head_idx, idx;
>>> -	uint16_t head_size = txvq->hw->vtnet_hdr_size;
>>> +	unsigned long offs;
>>>
>>>  	if (unlikely(txvq->vq_free_cnt == 0))
>>>  		return -ENOSPC;
>>> @@ -220,12 +221,29 @@ virtqueue_enqueue_xmit(struct virtqueue *txvq, struct rte_mbuf *cookie)
>>>  	dxp = &txvq->vq_descx[idx];
>>>  	dxp->cookie = (void *)cookie;
>>>  	dxp->ndescs = needed;
>>> -
>>>  	start_dp = txvq->vq_ring.desc;
>>> -	start_dp[idx].addr =
>>> -		txvq->virtio_net_hdr_mem + idx * head_size;
>>> -	start_dp[idx].len = (uint32_t)head_size;
>>> -	start_dp[idx].flags = VRING_DESC_F_NEXT;
>>> +
>>> +	if (use_indirect) {
>>> +		struct virtio_tx_region *txr
>>> +			= txvq->virtio_net_hdr_mz->addr;
>>> +
>>> +		offs = idx * sizeof(struct virtio_tx_region)
>>> +			+ offsetof(struct virtio_tx_region, tx_indir);
>>> +
>>> +		start_dp[idx].addr = txvq->virtio_net_hdr_mem + offs;
>>> +		start_dp[idx].len = (seg_num + 1) * sizeof(struct vring_desc);
>> a. In indirect mode, since we always use exactly one ring descriptor,
>> could we allocate one fixed descriptor, as I did in the RX/TX ring
>> layout optimization? :)
> The code can not assume all packets will be in indirect mode. If using
> any_layout, then some packets will use that. Also if you give a packet
> where nb_segs is very large, then it falls back to old mode.
> Also some hosts (like vhost) don't support indirect.
Agree.
With always one descriptor it is OK.
For packets with more than VIRTIO_MAX_INDIRECT segs, we fall back to the
old mode.
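To make the descriptor accounting concrete, a minimal sketch (not from the
patch; the vring_desc layout follows the virtio spec, the helper function
and its name are illustrative only) of how many ring descriptors each TX
path consumes:

#include <stdint.h>
#include <stdio.h>

#define VRING_DESC_F_NEXT     1
#define VRING_DESC_F_INDIRECT 4

struct vring_desc {              /* per the virtio spec */
	uint64_t addr;           /* guest-physical buffer address */
	uint32_t len;            /* buffer length in bytes */
	uint16_t flags;          /* NEXT / WRITE / INDIRECT */
	uint16_t next;           /* chained descriptor index */
};

/* Ring descriptors consumed for one packet: the direct path needs one
 * for the virtio-net header plus one per data segment; the indirect
 * path needs a single ring descriptor pointing at a side table that
 * holds header + segments. */
static unsigned int ring_descs_needed(int use_indirect, unsigned int nb_segs)
{
	return use_indirect ? 1 : 1 + nb_segs;
}

int main(void)
{
	for (unsigned int segs = 1; segs <= 4; segs++)
		printf("segs=%u direct=%u indirect=%u\n", segs,
		       ring_descs_needed(0, segs),
		       ring_descs_needed(1, segs));
	return 0;
}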
>
>> b. If not (a), we could cache the descriptor and avoid the update unless
>> the fields differ. In the current implementation of the free desc list,
>> we could make the same ring slot always use the same tx desc. I am going
>> to submit a patch for the normal rx path.
> See above
I don't mean using fixed descriptors. Even in the general implementation,
the desc allocated for a given ring entry is normally the same. So we
cache the descriptor id last allocated for each avail ring entry (in the
case where only one desc is used), compare, and if it is the same, skip
the store. This introduces some overhead, but considering that vhost has
to fetch this cache line from virtio's processing core, it might be worth
it. MST has posted a patch for virtio's test case.
Maybe it makes more sense for RX, as there we currently always use one
descriptor.
If you mean the "only one descriptor" case again, skip this comment.
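A minimal sketch of that store-elision idea, purely illustrative (the
shadow array and all names here are assumptions, not from any posted
patch): keep a driver-private copy of the id last written into each avail
slot, and skip the store -- and hence the invalidation of the host's
cached copy of that ring line -- when it would rewrite the same value.

#include <stdint.h>
#include <string.h>

#define RING_SIZE 256

/* Driver-private shadow of the avail ring contents; it lives on cache
 * lines the host never touches, so reading it is cheap. */
struct avail_shadow {
	uint16_t last_id[RING_SIZE];
};

static void avail_shadow_init(struct avail_shadow *s)
{
	memset(s->last_id, 0xff, sizeof(s->last_id)); /* 0xffff: no match */
}

/* Store desc_id into the avail slot unless it already holds that id.
 * Returns 1 when the store (and the resulting cross-core cache-line
 * transfer) was skipped. */
static int avail_store_elide(struct avail_shadow *s, uint16_t slot,
			     uint16_t desc_id, uint16_t *avail_ring)
{
	if (s->last_id[slot] == desc_id)
		return 1;
	avail_ring[slot] = desc_id;
	s->last_id[slot] = desc_id;
	return 0;
}

int main(void)
{
	static uint16_t ring[RING_SIZE];
	struct avail_shadow s;

	avail_shadow_init(&s);
	avail_store_elide(&s, 0, 7, ring);         /* first write: stored */
	return !avail_store_elide(&s, 0, 7, ring); /* same id: elided, exit 0 */
}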
>
>> c. Could we initialize the length of all tx descriptors to
>> VIRTIO_MAX_INDIRECT * sizeof(struct vring_desc)? Is the maximum length
>> OK here? Does the spec require the length field to reflect the number
>> of descs really used, given that we already have the next field to
>> indicate the last descriptor?
> The largest VIRTIO_MAX_INDIRECT possible is very large, 4K.
Instead of

	start_dp[idx].len = (seg_num + 1) * sizeof(struct vring_desc);

is it OK to use

	start_dp[idx].len = 4096 * sizeof(struct vring_desc);

?
>
>>> +		start_dp[idx].flags = VRING_DESC_F_INDIRECT;
>>> +
>>> +		start_dp = txr[idx].tx_indir;
>>> +		idx = 0;
>>> +	} else {
>>> +		offs = idx * sizeof(struct virtio_tx_region)
>>> +			+ offsetof(struct virtio_tx_region, tx_hdr);
>>> +
>>> +		start_dp[idx].addr = txvq->virtio_net_hdr_mem + offs;
>>> +		start_dp[idx].len = txvq->hw->vtnet_hdr_size;
>>> +		start_dp[idx].flags = VRING_DESC_F_NEXT;
>>> +	}
>>>
>>>  	for (; ((seg_num > 0) && (cookie != NULL)); seg_num--) {
>>>  		idx = start_dp[idx].next;
>> This looks weird to me. Why skip the first user-provided descriptor?
>> idx = 0
>> idx = start_dp[idx].next
>> start_dp[idx].addr = ...
> The first descriptor (0) is initialized once to point to the static
> all-zeros tx header. Then the code skips to the second entry to
> initialize the first data block.
Ah, I forgot the header.
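A short sketch of the indirect-table filling that exchange describes,
under the patch's stated assumptions (entry 0 was set up once to point at
the per-slot tx header, and the next fields were pre-chained at init, so
data filling starts by stepping past entry 0); the seg struct is a
simplified stand-in, not a driver type:

#include <stdint.h>

#define VRING_DESC_F_NEXT 1

struct vring_desc {
	uint64_t addr;
	uint32_t len;
	uint16_t flags;
	uint16_t next;
};

struct seg {
	uint64_t addr;
	uint32_t len;
};

/* Fill an indirect table whose entry 0 already points at the tx header
 * and whose next fields are pre-chained 0 -> 1 -> 2 -> ...; stepping
 * through next before each write is the "skip the first entry" pattern
 * discussed above. */
static void fill_indir(struct vring_desc *tbl, const struct seg *segs,
		       uint16_t nb_segs)
{
	uint16_t idx = 0;            /* entry 0: the header, left as-is */

	for (uint16_t s = 0; s < nb_segs; s++) {
		idx = tbl[idx].next;
		tbl[idx].addr  = segs[s].addr;
		tbl[idx].len   = segs[s].len;
		tbl[idx].flags = VRING_DESC_F_NEXT;
	}
	tbl[idx].flags &= ~VRING_DESC_F_NEXT; /* last desc ends the chain */
}

int main(void)
{
	struct vring_desc tbl[4] = { { 0 } };
	struct seg segs[2] = { { 0x1000, 64 }, { 0x2000, 128 } };

	for (uint16_t i = 0; i + 1 < 4; i++)  /* pre-chain, once at init */
		tbl[i].next = i + 1;
	tbl[0].flags = VRING_DESC_F_NEXT;     /* header entry, set up once */

	fill_indir(tbl, segs, 2);
	return 0;
}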
>>> @@ -236,7 +254,12 @@ virtqueue_enqueue_xmit(struct virtqueue *txvq, struct rte_mbuf *cookie)
>>>  	}
>>>
>>>  	start_dp[idx].flags &= ~VRING_DESC_F_NEXT;
>>> -	idx = start_dp[idx].next;
>>> +
>>> +	if (use_indirect)
>>> +		idx = txvq->vq_ring.desc[head_idx].next;
>>> +	else
>>> +		idx = start_dp[idx].next;
>>> +
>>>  	txvq->vq_desc_head_idx = idx;
>>>  	if (txvq->vq_desc_head_idx == VQ_RING_DESC_CHAIN_END)
>>>  		txvq->vq_desc_tail_idx = idx;
>>> @@ -261,7 +284,7 @@ static void
>>>  virtio_dev_vring_start(struct virtqueue *vq, int queue_type)
>>>  {
>>>  	struct rte_mbuf *m;
>>> -	int i, nbufs, error, size = vq->vq_nentries;
>>> +	int nbufs, error, size = vq->vq_nentries;
>>>  	struct vring *vr = &vq->vq_ring;
>>>  	uint8_t *ring_mem = vq->vq_ring_virt_mem;
>>>
>>> @@ -279,10 +302,7 @@ virtio_dev_vring_start(struct virtqueue *vq, int queue_type)
>>>  	vq->vq_free_cnt = vq->vq_nentries;
>>>  	memset(vq->vq_descx, 0, sizeof(struct vq_desc_extra) * vq->vq_nentries);
>>>
>>> -	/* Chain all the descriptors in the ring with an END */
>>> -	for (i = 0; i < size - 1; i++)
>>> -		vr->desc[i].next = (uint16_t)(i + 1);
>>> -	vr->desc[i].next = VQ_RING_DESC_CHAIN_END;
>>> +	vring_desc_init(vr->desc, size);
>>>
>>>  	/*
>>>  	 * Disable device(host) interrupting guest
>>> @@ -760,7 +780,15 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
>>>
>>>  	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
>>>  		struct rte_mbuf *txm = tx_pkts[nb_tx];
>>> -		int need = txm->nb_segs - txvq->vq_free_cnt + 1;
>>> +		int use_indirect, slots, need;
>>> +
>>> +		use_indirect = vtpci_with_feature(txvq->hw,
>>> +						  VIRTIO_RING_F_INDIRECT_DESC)
>>> +			&& (txm->nb_segs < VIRTIO_MAX_TX_INDIRECT);
>>> +
>>> +		/* How many ring entries are needed to this Tx? */
>> "How many descs" is more accurate, at least to me; "ring entries" and
>> "slots" mean entries/slots in the avail ring.
>> If it is OK, s/slots/descs/ as well.
> The virtio spec doesn't use the word descriptors; that is more an Intel
> driver terminology.
>
>>> +		slots = use_indirect ? 1 : 1 + txm->nb_segs;
>>> +		need = slots - txvq->vq_free_cnt;
>>>
>>>  		/* Positive value indicates it need free vring descriptors */
>>>  		if (need > 0) {
>>> @@ -769,7 +797,7 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
>>>  			need = RTE_MIN(need, (int)nb_used);
>>>
>>>  			virtio_xmit_cleanup(txvq, need);
>>> -			need = txm->nb_segs - txvq->vq_free_cnt + 1;
>>> +			need = slots - txvq->vq_free_cnt;
>>>  			if (unlikely(need > 0)) {
>>>  				PMD_TX_LOG(ERR,
>>>  					   "No free tx descriptors to transmit");
>>> @@ -787,7 +815,7 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
>>>  		}
>>>
>>>  		/* Enqueue Packet buffers */
>>> -		error = virtqueue_enqueue_xmit(txvq, txm);
>>> +		error = virtqueue_enqueue_xmit(txvq, txm, use_indirect);
>>>  		if (unlikely(error)) {
>>>  			if (error == ENOSPC)
>>>  				PMD_TX_LOG(ERR, "virtqueue_enqueue Free count = 0");
>>> diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h
>>> index 7789411..fe3fa66 100644
>>> --- a/drivers/net/virtio/virtqueue.h
>>> +++ b/drivers/net/virtio/virtqueue.h
>>> @@ -237,6 +237,25 @@ struct virtio_net_hdr_mrg_rxbuf {
>>>  	uint16_t num_buffers; /**< Number of merged rx buffers */
>>>  };
>>>
>>> +/* Region reserved to allow for transmit header and indirect ring */
>>> +#define VIRTIO_MAX_TX_INDIRECT 8
>>> +struct virtio_tx_region {
>>> +	struct virtio_net_hdr_mrg_rxbuf tx_hdr;
>> Any reason to use the mergeable header here?
>>> +	struct vring_desc tx_indir[VIRTIO_MAX_TX_INDIRECT]
>>> +		__attribute__((__aligned__(16)));
>> WARNING: __aligned(size) is preferred over __attribute__((aligned(size)))
>> [...]
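For reference, the addressing scheme the virtio_tx_region layout implies,
as a compilable sketch: one region per ring slot in one contiguous
memzone, with the device-visible address of a slot's header or indirect
table derived from the slot index via offsetof. The header struct here is
a size-only stand-in for the real virtio_net_hdr_mrg_rxbuf, and hdr_mem is
an assumed name for the memzone's guest-physical base:

#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

#define VIRTIO_MAX_TX_INDIRECT 8

/* Size-only stand-in for the real virtio_net_hdr_mrg_rxbuf. */
struct virtio_net_hdr_mrg_rxbuf {
	uint8_t  hdr[10];
	uint16_t num_buffers;
};

struct vring_desc {
	uint64_t addr;
	uint32_t len;
	uint16_t flags;
	uint16_t next;
};

/* One region per ring slot, all in one contiguous memzone. */
struct virtio_tx_region {
	struct virtio_net_hdr_mrg_rxbuf tx_hdr;
	struct vring_desc tx_indir[VIRTIO_MAX_TX_INDIRECT]
		__attribute__((__aligned__(16)));
};

/* Device-visible address of slot idx's tx header; hdr_mem stands in
 * for the memzone's guest-physical base address. */
static uint64_t slot_hdr_addr(uint64_t hdr_mem, unsigned int idx)
{
	return hdr_mem + idx * sizeof(struct virtio_tx_region)
		+ offsetof(struct virtio_tx_region, tx_hdr);
}

/* Device-visible address of slot idx's indirect descriptor table. */
static uint64_t slot_indir_addr(uint64_t hdr_mem, unsigned int idx)
{
	return hdr_mem + idx * sizeof(struct virtio_tx_region)
		+ offsetof(struct virtio_tx_region, tx_indir);
}

int main(void)
{
	printf("region: %zu bytes, slot 3 hdr at +%llu, indir at +%llu\n",
	       sizeof(struct virtio_tx_region),
	       (unsigned long long)slot_hdr_addr(0, 3),
	       (unsigned long long)slot_indir_addr(0, 3));
	return 0;
}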