From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Zhang, Qi Z" 
To: "Ye, Xiaolong" , Olivier Matz 
CC: "dev@dpdk.org" , David Marchand , Andrew Rybchenko ,
 "Karlsson, Magnus" , "Topel, Bjorn" , "Maxime Coquelin" ,
 Stephen Hemminger , "Yigit, Ferruh" , "Luca Boccassi" ,
 "Richardson, Bruce" , "Ananyev, Konstantin" 
Thread-Topic: [dpdk-dev] [PATCH v6 4/5] net/af_xdp: use mbuf mempool for buffer management
Date: Mon, 1 Apr 2019 05:47:29 +0000
Message-ID: <039ED4275CED7440929022BC67E706115336089C@SHSMSX103.ccr.corp.intel.com>
References: <20190301080947.91086-1-xiaolong.ye@intel.com>
 <20190326122029.59359-1-xiaolong.ye@intel.com>
 <20190326122029.59359-5-xiaolong.ye@intel.com>
 <20190329174256.aicblxcasado4ld2@platinum>
 <20190331123818.GA108886@intel.com>
In-Reply-To: <20190331123818.GA108886@intel.com>
Subject: Re: [dpdk-dev] [PATCH v6 4/5] net/af_xdp: use mbuf mempool for buffer management
List-Id: DPDK patches and discussions 
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"

> -----Original Message-----
> From: Ye, Xiaolong
> Sent: Sunday, March 31, 2019 8:38 PM
> To: Olivier Matz
> Cc: dev@dpdk.org; David Marchand; Andrew Rybchenko; Zhang, Qi Z;
> Karlsson, Magnus; Topel, Bjorn; Maxime Coquelin; Stephen Hemminger;
> Yigit, Ferruh; Luca Boccassi; Richardson, Bruce; Ananyev, Konstantin
> Subject: Re: [dpdk-dev] [PATCH v6 4/5] net/af_xdp: use mbuf mempool for
> buffer management
>
> Hi, Olivier
>
> Thanks for the comments.
>
> On 03/29, Olivier Matz wrote:
> >Hi Xiaolong,
> >
> >On Tue, Mar 26, 2019 at 08:20:28PM +0800, Xiaolong Ye wrote:
> >> Now, af_xdp registered memory buffer is managed by rte_mempool. mbuf
> >> allocated from rte_mempool can be converted to xdp_desc's address and
> >> vice versa.
> >>
> >> Signed-off-by: Xiaolong Ye
> >> ---
> >>  drivers/net/af_xdp/rte_eth_af_xdp.c | 117 +++++++++++++++++-----------
> >>  1 file changed, 72 insertions(+), 45 deletions(-)
> >>
> >> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
> >> b/drivers/net/af_xdp/rte_eth_af_xdp.c
> >> index 47a496ed7..a1fda9212 100644
> >> --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> >> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> >> @@ -48,7 +48,11 @@ static int af_xdp_logtype;
> >>
> >>  #define ETH_AF_XDP_FRAME_SIZE		XSK_UMEM__DEFAULT_FRAME_SIZE
> >>  #define ETH_AF_XDP_NUM_BUFFERS		4096
> >> -#define ETH_AF_XDP_DATA_HEADROOM	0
> >> +/* mempool hdrobj size (64 bytes) + sizeof(struct rte_mbuf) (128 bytes) */
> >> +#define ETH_AF_XDP_MBUF_OVERHEAD	192
> >> +/* data start from offset 320 (192 + 128) bytes */
> >> +#define ETH_AF_XDP_DATA_HEADROOM \
> >> +	(ETH_AF_XDP_MBUF_OVERHEAD + RTE_PKTMBUF_HEADROOM)
> >
> >Having these constants looks quite dangerous to me. It imposes the
> >size of the mbuf, and the mempool header size. It would at least
> >require compilation checks.
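As a side note on the compile-time checks: a guard along these lines is
presumably the kind of thing meant here. This is a rough sketch only -- the
64-byte mempool object header size is taken from the comment in the quoted
patch rather than verified against librte_mempool, and af_xdp_check_layout()
is just an illustrative name:

#include <rte_common.h>
#include <rte_mbuf.h>

/* ETH_AF_XDP_MBUF_OVERHEAD is the driver constant from the patch above. */
static void
af_xdp_check_layout(void)
{
        /* struct rte_mbuf is expected to stay at two cache lines (128 B) */
        RTE_BUILD_BUG_ON(sizeof(struct rte_mbuf) != 2 * RTE_CACHE_LINE_MIN_SIZE);
        /* the overhead constant must cover the assumed 64-byte mempool
         * object header plus the mbuf structure itself */
        RTE_BUILD_BUG_ON(ETH_AF_XDP_MBUF_OVERHEAD < 64 + sizeof(struct rte_mbuf));
}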
> >
> >[...]
> >
> >> +	umem->mb_pool = rte_pktmbuf_pool_create_with_flags("af_xdp_mempool",
> >> +			ETH_AF_XDP_NUM_BUFFERS,
> >> +			250, 0,
> >> +			ETH_AF_XDP_FRAME_SIZE -
> >> +			ETH_AF_XDP_MBUF_OVERHEAD,
> >> +			MEMPOOL_F_NO_SPREAD | MEMPOOL_CHUNK_F_PAGE_ALIGN,
> >> +			SOCKET_ID_ANY);
> >> +	if (umem->mb_pool == NULL || umem->mb_pool->nb_mem_chunks != 1) {
> >> +		AF_XDP_LOG(ERR, "Failed to create mempool\n");
> >>  		goto err;
> >>  	}
> >> +	base_addr = (void *)get_base_addr(umem->mb_pool);
> >
> >Creating a mempool in the pmd like this does not look good to me for
> >several reasons:
> >- the user application creates its mempool with a specific private
> >  area in its mbufs. Here there is no private area, so it will break
> >  applications doing this.
> >- in patch 3 (mempool), you ensure that the chunk starts at a
> >  page-aligned address, and you expect that given the other flags and
> >  the constants at the top of the file, the data will be aligned. In
> >  my opinion it is not ideal.
> >- the user application may create a large number of mbufs, for instance
> >  if the application manages large reassembly queues, or tcp sockets.
> >  Here the driver limits the number of mbufs to 4k per rx queue.
> >- the socket_id is any, so it won't be efficient on NUMA architectures.
>
> Our mbuf/mempool changes regarding zero copy do have limitations.

Just want to clarify: the above code is only reached in the non-zero-copy case.
Here we create a private memory pool that is used to manage the AF_XDP umem; it
is not the Rx queue's own mempool. So I don't think the concerns about the
private area and the 4k-per-Rx-queue limit apply to this code.

>
> >
> >May I suggest another idea?
> >
> >Allocate the xsk_umem almost like in patch 1, but using rte_memzone
> >allocation instead of posix_memalign() (and it will be faster, because
> >it will use hugepages). And do not allocate any mempool in the driver.
> >
>
> rte_memzone_reserve_aligned is better than posix_memalign, I'll use it in my
> first patch.
>
> >When you receive a packet in the xsk_umem, allocate a new mbuf from the
> >standard pool. Then, use rte_pktmbuf_attach_extbuf() to attach the xsk
> >memory to the mbuf. You will have to register a callback to return the
> >xsk memory when the mbuf is transmitted or freed.
>
> I'll try to investigate how to implement it.
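For reference, a minimal sketch of what the rte_pktmbuf_attach_extbuf()
approach suggested above could look like. This is illustrative only: the
function names are made up for this example, the shared-info structure is
assumed to be initialized once per umem, and nothing here is code from any
posted patch:

#include <rte_mbuf.h>
#include <rte_memory.h>

/* Hypothetical free callback: runs once the last reference to the attached
 * external buffer is dropped (after transmit or rte_pktmbuf_free()). */
static void
af_xdp_umem_buf_release(void *addr, void *opaque)
{
        /* 'opaque' would carry the driver's umem context; the frame at
         * 'addr' would be handed back to the fill ring or a free list here. */
        (void)addr;
        (void)opaque;
}

/* Wrap one received umem frame into a freshly allocated mbuf. 'shinfo' is
 * assumed to have been set up once with rte_pktmbuf_ext_shinfo_init_helper()
 * and af_xdp_umem_buf_release() as the free callback. */
static struct rte_mbuf *
af_xdp_frame_to_mbuf(struct rte_mempool *mb_pool, void *frame, uint16_t len,
                     struct rte_mbuf_ext_shared_info *shinfo)
{
        struct rte_mbuf *mbuf = rte_pktmbuf_alloc(mb_pool);

        if (mbuf == NULL)
                return NULL;

        rte_pktmbuf_attach_extbuf(mbuf, frame, rte_mem_virt2iova(frame),
                                  len, shinfo);
        rte_pktmbuf_pkt_len(mbuf) = len;
        rte_pktmbuf_data_len(mbuf) = len;

        return mbuf;
}

The point of the callback is that wherever the application eventually frees
the mbuf, the umem frame comes back to the driver, so no driver-owned mempool
is needed.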
>
> >
> >This is, by the way, something I don't understand in your current
> >implementation: what happens if a mbuf is received in the af_xdp
> >driver, and freed by the application? How is the xsk buffer returned?
>
> It is coordinated by the fill ring. The fill ring is used by the application
> (user space) to send down addrs for the kernel to fill in with Rx packet data.
> So for the free side, we just return it to the mempool, and each time in
> rx_pkt_burst, we allocate new mbufs and submit the corresponding addrs to the
> fill ring; that's how we return the xsk buffers to the kernel.
>
> >
> >Using rte_pktmbuf_attach_extbuf() would remove changes in mbuf and
> >mempool, at the price of (maybe) decreasing the performance. But I think
> >there are some places where it can be optimized.
> >
> >I understand my feedback comes late -- as usual :( -- but if you are in
>
> Sorry for not Cc'ing you in my patch set.
>
> >a hurry for RC1, maybe we can consider putting in the 1st patch only, and
> >add the zero-copy mode in a second patch later. What do you think?
>
> Sounds like a sensible plan, I'll try the external mbuf buffer scheme first.
>
>
> Thanks,
> Xiaolong
>
> >
> >Regards,
> >Olivier
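For reference, the fill-ring replenish step described above typically boils
down to something like the following when using libbpf's xsk.h helpers. This
is a sketch only; reserve_fill_queue() and its arguments are illustrative
names, not the patch's actual code:

#include <stdint.h>
#include <bpf/xsk.h>

/* Illustrative replenish helper: hand 'count' free umem frame addresses back
 * to the kernel through the fill ring so they can be used for future Rx. */
static int
reserve_fill_queue(struct xsk_ring_prod *fq, const uint64_t *addrs, uint32_t count)
{
        uint32_t idx = 0;
        uint32_t i;

        if (xsk_ring_prod__reserve(fq, count, &idx) != count)
                return -1;      /* fill ring full; retry on the next rx_pkt_burst */

        for (i = 0; i < count; i++)
                *xsk_ring_prod__fill_addr(fq, idx++) = addrs[i];

        xsk_ring_prod__submit(fq, count);

        return 0;
}

If the reservation fails, the addresses would simply stay in the driver's free
list and be retried on the next burst; that behaviour is an assumption of this
sketch, not something specified in the thread.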