From mboxrd@z Thu Jan 1 00:00:00 1970
From: Shahaf Shuler <shahafs@mellanox.com>
To: Flavio Leitner, dev@dpdk.org
CC: Maxime Coquelin, Tiwei Bie, Zhihong Wang, Obrembski MichalX, Stokes Ian
Date: Wed, 2 Oct 2019 04:45:45 +0000
References: <20191001221935.12140-1-fbl@sysclose.org>
In-Reply-To: <20191001221935.12140-1-fbl@sysclose.org>
Subject: Re: [dpdk-dev] [PATCH] vhost: add support to large linear mbufs

Wednesday, October 2, 2019 1:20 AM, Flavio Leitner:
> Subject: [dpdk-dev] [PATCH] vhost: add support to large linear mbufs
>
> The rte_vhost_dequeue_burst supports two ways of dequeuing data. If the
> data fits into a buffer, then all data is copied and a single linear buffer is
> returned. Otherwise it allocates additional mbufs and chains them together
> to return a multiple segments mbuf.
>
> While that covers most use cases, it forces applications that need to work
> with larger data sizes to support multiple segments mbufs.
> The non-linear characteristic brings complexity and performance implications
> to the application.
>
> To resolve the issue, change the API so that the application can optionally
> provide a second mempool containing larger mbufs. If that is not provided
> (NULL), the behavior remains as before the change.
> Otherwise, the data size is checked and the corresponding mempool is used
> to return linear mbufs.

I understand the motivation.
However, providing a static pool with large buffers is not efficient in terms of memory footprint.
You would need to provision for the worst case (all packets are large) with a maximum size of 64KB.
Also, the two mempools are quite restrictive, as the memory utilization of the mbufs may be very sparse. E.g. mempool1 mbuf.size = 1.5K, mempool2 mbuf.size = 64K, packet size 4KB.

Instead, how about using the mbuf external buffer feature? The flow would be:
1. The vhost PMD always receives a single mempool (like today).
2. On dequeue, the PMD looks at the virtio packet size. If it is smaller than the mbuf size, the mbuf is used as is (like today).
3. Otherwise, a new buffer is allocated (inside the PMD) and linked to the mbuf as an external buffer (rte_pktmbuf_attach_extbuf) - see the sketch appended after the quoted patch below.

The pro of this approach is that you have full flexibility over the memory allocation, and therefore a lower footprint.
The con is that OVS will need to know how to handle mbufs with external buffers (not too complex IMO).

>
> Signed-off-by: Flavio Leitner
> ---
>  drivers/net/vhost/rte_eth_vhost.c |  4 +--
>  examples/tep_termination/main.c   |  2 +-
>  examples/vhost/main.c             |  2 +-
>  lib/librte_vhost/rte_vhost.h      |  5 ++-
>  lib/librte_vhost/virtio_net.c     | 57 +++++++++++++++++++++++--------
>  5 files changed, 50 insertions(+), 20 deletions(-)
>
> diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
> index 46f01a7f4..ce7f68a5b 100644
> --- a/drivers/net/vhost/rte_eth_vhost.c
> +++ b/drivers/net/vhost/rte_eth_vhost.c
> @@ -393,8 +393,8 @@ eth_vhost_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
>  						VHOST_MAX_PKT_BURST);
>
>  		nb_pkts = rte_vhost_dequeue_burst(r->vid, r->virtqueue_id,
> -						  r->mb_pool, &bufs[nb_rx],
> -						  num);
> +						  r->mb_pool, NULL,
> +						  &bufs[nb_rx], num);
>
>  		nb_rx += nb_pkts;
>  		nb_receive -= nb_pkts;
> diff --git a/examples/tep_termination/main.c b/examples/tep_termination/main.c
> index ab956ad7c..3ebf0fa6e 100644
> --- a/examples/tep_termination/main.c
> +++ b/examples/tep_termination/main.c
> @@ -697,7 +697,7 @@ switch_worker(__rte_unused void *arg)
>  		if (likely(!vdev->remove)) {
>  			/* Handle guest TX*/
>  			tx_count = rte_vhost_dequeue_burst(vdev->vid,
> -					VIRTIO_TXQ, mbuf_pool,
> +					VIRTIO_TXQ, mbuf_pool, NULL,
>  					pkts_burst, MAX_PKT_BURST);
>  			/* If this is the first received packet we need to learn the MAC */
>  			if (unlikely(vdev->ready == DEVICE_MAC_LEARNING) && tx_count) {
> diff --git a/examples/vhost/main.c b/examples/vhost/main.c
> index ab649bf14..e9b306af3 100644
> --- a/examples/vhost/main.c
> +++ b/examples/vhost/main.c
> @@ -1092,7 +1092,7 @@ drain_virtio_tx(struct vhost_dev *vdev)
>  					pkts, MAX_PKT_BURST);
>  	} else {
>  		count = rte_vhost_dequeue_burst(vdev->vid, VIRTIO_TXQ,
> -					mbuf_pool, pkts, MAX_PKT_BURST);
> +					mbuf_pool, NULL, pkts, MAX_PKT_BURST);
>  	}
>
>  	/* setup VMDq for the first packet */
> diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h
> index 19474bca0..b05fd8e2a 100644
> --- a/lib/librte_vhost/rte_vhost.h
> +++ b/lib/librte_vhost/rte_vhost.h
> @@ -593,6 +593,8 @@ uint16_t rte_vhost_enqueue_burst(int vid, uint16_t queue_id,
>   *  virtio queue index in mq case
>   * @param mbuf_pool
>   *  mbuf_pool where host mbuf is allocated.
> + * @param mbuf_pool_large
> + *  mbuf_pool where larger host mbuf is allocated.
>   * @param pkts
>   *  array to contain packets to be dequeued
>   * @param count
> @@ -601,7 +603,8 @@ uint16_t rte_vhost_enqueue_burst(int vid, uint16_t queue_id,
>   *  num of packets dequeued
>   */
>  uint16_t rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
> -	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count);
> +	struct rte_mempool *mbuf_pool, struct rte_mempool *mbuf_pool_large,
> +	struct rte_mbuf **pkts, uint16_t count);
>
>  /**
>   * Get guest mem table: a list of memory regions.
> diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
> index 5b85b832d..da9d77732 100644
> --- a/lib/librte_vhost/virtio_net.c
> +++ b/lib/librte_vhost/virtio_net.c
> @@ -1291,10 +1291,12 @@ get_zmbuf(struct vhost_virtqueue *vq)
>
>  static __rte_noinline uint16_t
>  virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
> -	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count)
> +	struct rte_mempool *mbuf_pool, struct rte_mempool *mbuf_pool_large,
> +	struct rte_mbuf **pkts, uint16_t count)
>  {
>  	uint16_t i;
>  	uint16_t free_entries;
> +	uint16_t mbuf_avail;
>
>  	if (unlikely(dev->dequeue_zero_copy)) {
>  		struct zcopy_mbuf *zmbuf, *next;
> @@ -1340,32 +1342,42 @@ virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
>  	VHOST_LOG_DEBUG(VHOST_DATA, "(%d) about to dequeue %u buffers\n",
>  			dev->vid, count);
>
> +	/* If the large mpool is provided, find the threshold */
> +	mbuf_avail = 0;
> +	if (mbuf_pool_large)
> +		mbuf_avail = rte_pktmbuf_data_room_size(mbuf_pool) -
> +			RTE_PKTMBUF_HEADROOM;
> +
>  	for (i = 0; i < count; i++) {
>  		struct buf_vector buf_vec[BUF_VECTOR_MAX];
>  		uint16_t head_idx;
> -		uint32_t dummy_len;
> +		uint32_t buf_len;
>  		uint16_t nr_vec = 0;
> +		struct rte_mempool *mpool;
>  		int err;
>
>  		if (unlikely(fill_vec_buf_split(dev, vq,
>  						vq->last_avail_idx + i,
>  						&nr_vec, buf_vec,
> -						&head_idx, &dummy_len,
> +						&head_idx, &buf_len,
>  						VHOST_ACCESS_RO) < 0))
>  			break;
>
>  		if (likely(dev->dequeue_zero_copy == 0))
>  			update_shadow_used_ring_split(vq, head_idx, 0);
>
> -		pkts[i] = rte_pktmbuf_alloc(mbuf_pool);
> +		if (mbuf_pool_large && buf_len > mbuf_avail)
> +			mpool = mbuf_pool_large;
> +		else
> +			mpool = mbuf_pool;
> +
> +		pkts[i] = rte_pktmbuf_alloc(mpool);
>  		if (unlikely(pkts[i] == NULL)) {
>  			RTE_LOG(ERR, VHOST_DATA,
>  				"Failed to allocate memory for mbuf.\n");
>  			break;
>  		}
>
> -		err = copy_desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkts[i],
> -				mbuf_pool);
> +		err = copy_desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkts[i], mpool);
>  		if (unlikely(err)) {
>  			rte_pktmbuf_free(pkts[i]);
>  			break;
> @@ -1411,9 +1423,11 @@ virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
>
>  static __rte_noinline uint16_t
>  virtio_dev_tx_packed(struct virtio_net *dev, struct vhost_virtqueue *vq,
> -	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count)
> +	struct rte_mempool *mbuf_pool, struct rte_mempool *mbuf_pool_large,
> +	struct rte_mbuf **pkts, uint16_t count)
>  {
>  	uint16_t i;
> +	uint16_t mbuf_avail;
>
>  	if (unlikely(dev->dequeue_zero_copy)) {
>  		struct zcopy_mbuf *zmbuf, *next;
> @@ -1448,17 +1462,23 @@ virtio_dev_tx_packed(struct virtio_net *dev, struct vhost_virtqueue *vq,
>  	VHOST_LOG_DEBUG(VHOST_DATA, "(%d) about to dequeue %u buffers\n",
>  			dev->vid, count);
>
> +	/* If the large mpool is provided, find the threshold */
> +	mbuf_avail = 0;
> +	if (mbuf_pool_large)
> +		mbuf_avail = rte_pktmbuf_data_room_size(mbuf_pool) -
> +			RTE_PKTMBUF_HEADROOM;
> +
>  	for (i = 0; i < count; i++) {
>  		struct buf_vector buf_vec[BUF_VECTOR_MAX];
>  		uint16_t buf_id;
> -		uint32_t dummy_len;
> +		uint32_t buf_len;
>  		uint16_t desc_count, nr_vec = 0;
> +		struct rte_mempool *mpool;
>  		int err;
>
>  		if (unlikely(fill_vec_buf_packed(dev, vq,
>  						vq->last_avail_idx, &desc_count,
>  						buf_vec, &nr_vec,
> -						&buf_id, &dummy_len,
> +						&buf_id, &buf_len,
>  						VHOST_ACCESS_RO) < 0))
>  			break;
>
> @@ -1466,15 +1486,19 @@ virtio_dev_tx_packed(struct virtio_net *dev, struct vhost_virtqueue *vq,
>  			update_shadow_used_ring_packed(vq, buf_id, 0, desc_count);
>
> -		pkts[i] = rte_pktmbuf_alloc(mbuf_pool);
> +		if (mbuf_pool_large && buf_len > mbuf_avail)
> +			mpool = mbuf_pool_large;
> +		else
> +			mpool = mbuf_pool;
> +
> +		pkts[i] = rte_pktmbuf_alloc(mpool);
>  		if (unlikely(pkts[i] == NULL)) {
>  			RTE_LOG(ERR, VHOST_DATA,
>  				"Failed to allocate memory for mbuf.\n");
>  			break;
>  		}
>
> -		err = copy_desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkts[i],
> -				mbuf_pool);
> +		err = copy_desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkts[i], mpool);
>  		if (unlikely(err)) {
>  			rte_pktmbuf_free(pkts[i]);
>  			break;
> @@ -1526,7 +1550,8 @@ virtio_dev_tx_packed(struct virtio_net *dev, struct vhost_virtqueue *vq,
>
>  uint16_t
>  rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
> -	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count)
> +	struct rte_mempool *mbuf_pool, struct rte_mempool *mbuf_pool_large,
> +	struct rte_mbuf **pkts, uint16_t count)
>  {
>  	struct virtio_net *dev;
>  	struct rte_mbuf *rarp_mbuf = NULL;
> @@ -1598,9 +1623,11 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
>  	}
>
>  	if (vq_is_packed(dev))
> -		count = virtio_dev_tx_packed(dev, vq, mbuf_pool, pkts, count);
> +		count = virtio_dev_tx_packed(dev, vq, mbuf_pool, mbuf_pool_large,
> +				pkts, count);
>  	else
> -		count = virtio_dev_tx_split(dev, vq, mbuf_pool, pkts, count);
> +		count = virtio_dev_tx_split(dev, vq, mbuf_pool, mbuf_pool_large,
> +				pkts, count);
>
>  out:
>  	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
> --
> 2.20.1
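
For illustration, here is a minimal sketch of step 3 of the flow above: allocate one mbuf from the single mempool and, when the virtio packet does not fit into its data room, attach a dedicated buffer via rte_pktmbuf_attach_extbuf() instead of chaining extra segments. The helper name vhost_alloc_linear_mbuf(), the rte_malloc()-based backing store and the free callback are assumptions made for this example, not part of the patch under discussion.

/*
 * Illustrative sketch only (not part of the patch): back large packets
 * with a one-off external buffer so the returned mbuf stays linear.
 */
#include <stdint.h>

#include <rte_common.h>
#include <rte_malloc.h>
#include <rte_mbuf.h>
#include <rte_mempool.h>

/* Called once the last mbuf referencing the external buffer is freed. */
static void
extbuf_free_cb(void *addr, void *opaque __rte_unused)
{
	rte_free(addr);
}

static struct rte_mbuf *
vhost_alloc_linear_mbuf(struct rte_mempool *mp, uint32_t pkt_len)
{
	struct rte_mbuf_ext_shared_info *shinfo;
	struct rte_mbuf *m;
	uint32_t data_room, total;
	uint16_t buf_len;
	void *buf;

	m = rte_pktmbuf_alloc(mp);
	if (m == NULL)
		return NULL;

	data_room = rte_pktmbuf_data_room_size(mp) - RTE_PKTMBUF_HEADROOM;
	if (pkt_len <= data_room)
		return m;	/* small packet: use the mbuf as is */

	/*
	 * Large packet: the shared info is carved out of the buffer tail
	 * by the init helper, so reserve room for it as well.
	 */
	total = pkt_len + RTE_PKTMBUF_HEADROOM +
		sizeof(struct rte_mbuf_ext_shared_info);
	if (total > UINT16_MAX) {	/* mbuf buf_len is a 16-bit field */
		rte_pktmbuf_free(m);
		return NULL;
	}
	buf_len = total;

	buf = rte_malloc("vhost-extbuf", buf_len, RTE_CACHE_LINE_SIZE);
	if (buf == NULL) {
		rte_pktmbuf_free(m);
		return NULL;
	}

	shinfo = rte_pktmbuf_ext_shinfo_init_helper(buf, &buf_len,
						    extbuf_free_cb, NULL);
	if (shinfo == NULL) {
		rte_free(buf);
		rte_pktmbuf_free(m);
		return NULL;
	}

	rte_pktmbuf_attach_extbuf(m, buf, rte_malloc_virt2iova(buf),
				  buf_len, shinfo);
	rte_pktmbuf_reset_headroom(m);
	return m;
}

The only visible difference for the application (e.g. OVS) is that such mbufs carry the EXT_ATTACHED_MBUF flag; freeing them with rte_pktmbuf_free() releases the external buffer through the callback once the last reference is gone.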