From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Fri, 29 Mar 2019 18:42:56 +0100
From: Olivier Matz
To: Xiaolong Ye
Cc: dev@dpdk.org, David Marchand, Andrew Rybchenko, Qi Zhang,
 Karlsson Magnus, Topel Bjorn, Maxime Coquelin, Stephen Hemminger,
 Ferruh Yigit, Luca Boccassi, Bruce Richardson, Ananyev Konstantin
Message-ID: <20190329174256.aicblxcasado4ld2@platinum>
References: <20190301080947.91086-1-xiaolong.ye@intel.com>
 <20190326122029.59359-1-xiaolong.ye@intel.com>
 <20190326122029.59359-5-xiaolong.ye@intel.com>
In-Reply-To: <20190326122029.59359-5-xiaolong.ye@intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: NeoMutt/20170113 (1.7.2)
List-Id: DPDK patches and discussions
Subject: Re: [dpdk-dev] [PATCH v6 4/5] net/af_xdp: use mbuf mempool for buffer management

Hi Xiaolong,

On Tue, Mar 26, 2019 at 08:20:28PM +0800, Xiaolong Ye wrote:
> Now, af_xdp registered memory buffer is managed by rte_mempool. mbuf
> allocated from rte_mempool can be converted to xdp_desc's address and
> vice versa.
>
> Signed-off-by: Xiaolong Ye
> ---
>  drivers/net/af_xdp/rte_eth_af_xdp.c | 117 +++++++++++++++++-----------
>  1 file changed, 72 insertions(+), 45 deletions(-)
>
> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
> index 47a496ed7..a1fda9212 100644
> --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> @@ -48,7 +48,11 @@ static int af_xdp_logtype;
>
>  #define ETH_AF_XDP_FRAME_SIZE		XSK_UMEM__DEFAULT_FRAME_SIZE
>  #define ETH_AF_XDP_NUM_BUFFERS		4096
> -#define ETH_AF_XDP_DATA_HEADROOM	0
> +/* mempool hdrobj size (64 bytes) + sizeof(struct rte_mbuf) (128 bytes) */
> +#define ETH_AF_XDP_MBUF_OVERHEAD	192
> +/* data start from offset 320 (192 + 128) bytes */
> +#define ETH_AF_XDP_DATA_HEADROOM \
> +	(ETH_AF_XDP_MBUF_OVERHEAD + RTE_PKTMBUF_HEADROOM)

Having these constants looks quite dangerous to me. It imposes the size
of the mbuf and the mempool header size. It would at least require
compile-time checks.
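For instance, something along these lines (only a sketch: the helper
name and its placement are made up, and it assumes the default
cache-line aligned mempool object header):

#include <rte_common.h>		/* RTE_BUILD_BUG_ON, RTE_ALIGN_CEIL */
#include <rte_mbuf.h>
#include <rte_mempool.h>	/* struct rte_mempool_objhdr */

/* Hypothetical guards: fail the build if the hard-coded overhead stops
 * matching the real mbuf / mempool object layout. */
static inline void
xdp_umem_layout_check(void)
{
	/* the patch comment assumes sizeof(struct rte_mbuf) == 128 */
	RTE_BUILD_BUG_ON(sizeof(struct rte_mbuf) != 128);
	/* 192 == cache-line aligned mempool object header + mbuf */
	RTE_BUILD_BUG_ON(ETH_AF_XDP_MBUF_OVERHEAD !=
			 RTE_ALIGN_CEIL(sizeof(struct rte_mempool_objhdr),
					RTE_CACHE_LINE_SIZE) +
			 sizeof(struct rte_mbuf));
}

Even with such checks, the layout stays coupled to the mbuf and mempool
internals.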
[...]

> +	umem->mb_pool = rte_pktmbuf_pool_create_with_flags("af_xdp_mempool",
> +			ETH_AF_XDP_NUM_BUFFERS,
> +			250, 0,
> +			ETH_AF_XDP_FRAME_SIZE -
> +			ETH_AF_XDP_MBUF_OVERHEAD,
> +			MEMPOOL_F_NO_SPREAD | MEMPOOL_CHUNK_F_PAGE_ALIGN,
> +			SOCKET_ID_ANY);
> +	if (umem->mb_pool == NULL || umem->mb_pool->nb_mem_chunks != 1) {
> +		AF_XDP_LOG(ERR, "Failed to create mempool\n");
>  		goto err;
>  	}
> +	base_addr = (void *)get_base_addr(umem->mb_pool);

Creating a mempool in the pmd like this does not look good to me, for
several reasons:

- the user application creates its mempool with a specific private area
  in its mbufs. Here there is no private area, so it will break
  applications that rely on one.

- in patch 3 (mempool), you ensure that the chunk starts at a
  page-aligned address, and you expect that, given the other flags and
  the constants at the top of the file, the data will be aligned. In my
  opinion this is not ideal.

- the user application may create a large number of mbufs, for instance
  if it manages large reassembly queues or many tcp sockets. Here the
  driver limits the number of mbufs to 4k per rx queue.

- the socket_id is SOCKET_ID_ANY, so it won't be efficient on NUMA
  architectures.

May I suggest another idea?

Allocate the xsk_umem almost like in patch 1, but using rte_memzone
allocation instead of posix_memalign() (it will also be faster, because
it will use hugepages), and do not allocate any mempool in the driver.

When you receive a packet in the xsk_umem, allocate a new mbuf from the
standard pool. Then, use rte_pktmbuf_attach_extbuf() to attach the xsk
memory to the mbuf. You will have to register a callback to return the
xsk memory when the mbuf is transmitted or freed.
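Here is a very rough sketch of what I have in mind. The driver-side
names (struct xsk_umem_info, queue_fill(), mb_pool_app, ...) are
placeholders, not your real code, and most error handling is omitted:

#include <unistd.h>		/* getpagesize() */

#include <rte_memzone.h>
#include <rte_memory.h>		/* rte_mem_virt2iova() */
#include <rte_mempool.h>
#include <rte_mbuf.h>

/* placeholder umem context, just for the sketch */
struct xsk_umem_info {
	struct rte_mempool *mb_pool_app;  /* the application's mbuf pool */
	/* fill/completion rings, umem handle, base address, ... */
};

/* placeholder: push a frame address back to the fill ring */
static void queue_fill(struct xsk_umem_info *umem, void *frame);

/* 1) back the umem with a memzone instead of posix_memalign() */
static const struct rte_memzone *
umem_area_reserve(int socket_id, size_t size)
{
	return rte_memzone_reserve_aligned("af_xdp_umem", size, socket_id,
					   RTE_MEMZONE_IOVA_CONTIG,
					   getpagesize());
}

/* 2) free callback: called by the mbuf library when the last reference
 *    to the attached mbuf is dropped (after tx, or rte_pktmbuf_free()
 *    in the application); this is how the xsk frame gets returned */
static void
umem_frame_release(void *addr, void *opaque)
{
	struct xsk_umem_info *umem = opaque;

	queue_fill(umem, addr);
}

/* 3) rx path: take an mbuf from the application's pool and attach the
 *    xsk frame as an external buffer instead of copying the packet */
static struct rte_mbuf *
umem_frame_to_mbuf(struct xsk_umem_info *umem, void *frame, uint16_t len)
{
	struct rte_mbuf_ext_shared_info *shinfo;
	uint16_t buf_len = ETH_AF_XDP_FRAME_SIZE;
	struct rte_mbuf *m;

	m = rte_pktmbuf_alloc(umem->mb_pool_app);
	if (m == NULL)
		return NULL;

	/* shinfo (free_cb + refcount) is carved from the frame tail */
	shinfo = rte_pktmbuf_ext_shinfo_init_helper(frame, &buf_len,
						    umem_frame_release, umem);
	rte_pktmbuf_attach_extbuf(m, frame, rte_mem_virt2iova(frame),
				  buf_len, shinfo);
	m->pkt_len = len;
	m->data_len = len;

	return m;
}

The free callback is the important part: it guarantees the frame goes
back to the fill ring no matter where or when the mbuf is freed.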
This is, by the way, something I don't understand in your current
implementation: what happens if an mbuf is received in the af_xdp
driver and then freed by the application? How is the xsk buffer
returned?

Using rte_pktmbuf_attach_extbuf() would remove the changes in mbuf and
mempool, at the price of (maybe) decreasing the performance. But I
think there are some places where it can be optimized.

I understand my feedback comes late -- as usual :( -- but if you are in
a hurry for RC1, maybe we can consider taking only the 1st patch, and
adding the zero-copy mode in a second patch later.

What do you think?

Regards,
Olivier