From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mga18.intel.com (mga18.intel.com [134.134.136.126])
 by dpdk.org (Postfix) with ESMTP id E3D0E11A4
 for ; Sun, 31 Mar 2019 14:43:01 +0200 (CEST)
X-Amp-Result: UNSCANNABLE
X-Amp-File-Uploaded: False
Received: from fmsmga007.fm.intel.com ([10.253.24.52])
 by orsmga106.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;
 31 Mar 2019 05:43:00 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.60,292,1549958400"; d="scan'208";a="138915453"
Received: from yexl-server.sh.intel.com (HELO localhost) ([10.67.110.206])
 by fmsmga007.fm.intel.com with ESMTP; 31 Mar 2019 05:42:58 -0700
Date: Sun, 31 Mar 2019 20:38:18 +0800
From: Ye Xiaolong
To: Olivier Matz
Cc: dev@dpdk.org, David Marchand, Andrew Rybchenko, Qi Zhang,
 Karlsson Magnus, Topel Bjorn, Maxime Coquelin, Stephen Hemminger,
 Ferruh Yigit, Luca Boccassi, Bruce Richardson, Ananyev Konstantin
Message-ID: <20190331123818.GA108886@intel.com>
References: <20190301080947.91086-1-xiaolong.ye@intel.com>
 <20190326122029.59359-1-xiaolong.ye@intel.com>
 <20190326122029.59359-5-xiaolong.ye@intel.com>
 <20190329174256.aicblxcasado4ld2@platinum>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20190329174256.aicblxcasado4ld2@platinum>
User-Agent: Mutt/1.9.4 (2018-02-28)
Subject: Re: [dpdk-dev] [PATCH v6 4/5] net/af_xdp: use mbuf mempool for
 buffer management
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sun, 31 Mar 2019 12:43:02 -0000

Hi, Olivier

Thanks for the comments.

On 03/29, Olivier Matz wrote:
>Hi Xiaolong,
>
>On Tue, Mar 26, 2019 at 08:20:28PM +0800, Xiaolong Ye wrote:
>> Now, af_xdp registered memory buffer is managed by rte_mempool. mbuf
>> allocated from rte_mempool can be converted to xdp_desc's address and
>> vice versa.
>>
>> Signed-off-by: Xiaolong Ye
>> ---
>>  drivers/net/af_xdp/rte_eth_af_xdp.c | 117 +++++++++++++++++-----------
>>  1 file changed, 72 insertions(+), 45 deletions(-)
>>
>> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
>> index 47a496ed7..a1fda9212 100644
>> --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
>> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
>> @@ -48,7 +48,11 @@ static int af_xdp_logtype;
>>
>>  #define ETH_AF_XDP_FRAME_SIZE		XSK_UMEM__DEFAULT_FRAME_SIZE
>>  #define ETH_AF_XDP_NUM_BUFFERS		4096
>> -#define ETH_AF_XDP_DATA_HEADROOM	0
>> +/* mempool hdrobj size (64 bytes) + sizeof(struct rte_mbuf) (128 bytes) */
>> +#define ETH_AF_XDP_MBUF_OVERHEAD	192
>> +/* data start from offset 320 (192 + 128) bytes */
>> +#define ETH_AF_XDP_DATA_HEADROOM \
>> +	(ETH_AF_XDP_MBUF_OVERHEAD + RTE_PKTMBUF_HEADROOM)
>
>Having these constants looks quite dangerous to me. It imposes the size
>of the mbuf, and the mempool header size. It would at least require
>compilation checks.
>
>[...]
>
>> +	umem->mb_pool = rte_pktmbuf_pool_create_with_flags("af_xdp_mempool",
>> +			ETH_AF_XDP_NUM_BUFFERS,
>> +			250, 0,
>> +			ETH_AF_XDP_FRAME_SIZE -
>> +			ETH_AF_XDP_MBUF_OVERHEAD,
>> +			MEMPOOL_F_NO_SPREAD | MEMPOOL_CHUNK_F_PAGE_ALIGN,
>> +			SOCKET_ID_ANY);
>> +	if (umem->mb_pool == NULL || umem->mb_pool->nb_mem_chunks != 1) {
>> +		AF_XDP_LOG(ERR, "Failed to create mempool\n");
>>  		goto err;
>>  	}
>> +	base_addr = (void *)get_base_addr(umem->mb_pool);
>
>Creating a mempool in the pmd like this does not look good to me for
>several reasons:
>- the user application creates its mempool with a specific private
>  area in its mbufs. Here there is no private area, so it will break
>  applications doing this.
>- in patch 3 (mempool), you ensure that the chunk starts at a
>  page-aligned address, and you expect that given the other flags and
>  the constants at the top of the file, the data will be aligned. In
>  my opinion it is not ideal.
>- the user application may create a large number of mbufs, for instance
>  if the application manages large reassembly queues, or tcp sockets.
>  Here the driver limits the number of mbufs to 4k per rx queue.
>- the socket_id is any, so it won't be efficient on numa architectures.

Our mbuf/mempool changes for zero copy do have these limitations.

>
>May I suggest another idea?
>
>Allocate the xsk_umem almost like in patch 1, but using rte_memzone
>allocation instead of posix_memalign() (and it will be faster, because
>it will use hugepages). And do not allocate any mempool in the driver.
>

rte_memzone_reserve_aligned is better than posix_memalign, I'll use it
in my first patch.

>When you receive a packet in the xsk_umem, allocate a new mbuf from
>the standard pool. Then, use rte_pktmbuf_attach_extbuf() to attach the
>xsk memory to the mbuf. You will have to register a callback to return
>the xsk memory when the mbuf is transmitted or freed.

I'll try to investigate how to implement it.

>
>This is, by the way, something I don't understand in your current
>implementation: what happens if a mbuf is received in the af_xdp driver,
>and freed by the application? How is the xsk buffer returned?

It is coordinated by the fill ring. The fill ring is used by the
application (user space) to send down addrs for the kernel to fill in
with Rx packet data. So on the free side, we just return the mbuf to
the mempool; then each time in rx_pkt_burst, we allocate new mbufs and
submit the corresponding addrs to the fill ring. That's how we return
the xsk buffers to the kernel.

>
>Using rte_pktmbuf_attach_extbuf() would remove changes in mbuf and
>mempool, at the price of (maybe) decreasing the performance. But I think
>there are some places where it can be optimized.
>
>I understand my feedback comes late -- as usual :( -- but if you are in

Sorry for not Cc'ing you in my patch set.
>a hurry for RC1, maybe we can consider to put the 1st patch only, and
>add the zero-copy mode in a second patch later. What do you think?

Sounds like a sensible plan, I'll try the external mbuf buffer scheme
first.

Thanks,
Xiaolong

>
>Regards,
>Olivier
>
>