From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Fri, 29 Mar 2019 18:42:56 +0100
From: Olivier Matz
To: Xiaolong Ye
Cc: dev@dpdk.org, David Marchand, Andrew Rybchenko, Qi Zhang,
 Karlsson Magnus, Topel Bjorn, Maxime Coquelin, Stephen Hemminger,
 Ferruh Yigit, Luca Boccassi, Bruce Richardson, Ananyev Konstantin
Message-ID: <20190329174256.aicblxcasado4ld2@platinum>
References: <20190301080947.91086-1-xiaolong.ye@intel.com>
 <20190326122029.59359-1-xiaolong.ye@intel.com>
 <20190326122029.59359-5-xiaolong.ye@intel.com>
In-Reply-To: <20190326122029.59359-5-xiaolong.ye@intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: NeoMutt/20170113 (1.7.2)
List-Id: DPDK patches and discussions
Subject: Re: [dpdk-dev] [PATCH v6 4/5] net/af_xdp: use mbuf mempool for buffer management

Hi Xiaolong,

On Tue, Mar 26, 2019 at 08:20:28PM +0800, Xiaolong Ye wrote:
> Now, af_xdp registered memory buffer is managed by rte_mempool. mbuf
> allocated from rte_mempool can be converted to xdp_desc's address and
> vice versa.
>
> Signed-off-by: Xiaolong Ye
> ---
>  drivers/net/af_xdp/rte_eth_af_xdp.c | 117 +++++++++++++++++-----------
>  1 file changed, 72 insertions(+), 45 deletions(-)
>
> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
> index 47a496ed7..a1fda9212 100644
> --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> @@ -48,7 +48,11 @@ static int af_xdp_logtype;
>
>  #define ETH_AF_XDP_FRAME_SIZE		XSK_UMEM__DEFAULT_FRAME_SIZE
>  #define ETH_AF_XDP_NUM_BUFFERS		4096
> -#define ETH_AF_XDP_DATA_HEADROOM	0
> +/* mempool hdrobj size (64 bytes) + sizeof(struct rte_mbuf) (128 bytes) */
> +#define ETH_AF_XDP_MBUF_OVERHEAD	192
> +/* data start from offset 320 (192 + 128) bytes */
> +#define ETH_AF_XDP_DATA_HEADROOM \
> +	(ETH_AF_XDP_MBUF_OVERHEAD + RTE_PKTMBUF_HEADROOM)

Having these constants looks quite dangerous to me. It imposes the size
of the mbuf and the mempool header size. It would at least require
compile-time checks.
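For instance, something along these lines (only a sketch: the helper
name and its placement are made up, and it assumes the default
cache-line aligned mempool object header):

#include <rte_common.h>		/* RTE_BUILD_BUG_ON, RTE_ALIGN_CEIL */
#include <rte_mbuf.h>
#include <rte_mempool.h>	/* struct rte_mempool_objhdr */

/* Hypothetical guards: fail the build if the hard-coded overhead stops
 * matching the real mbuf / mempool object layout. */
static inline void
xdp_umem_layout_check(void)
{
	/* the patch comment assumes sizeof(struct rte_mbuf) == 128 */
	RTE_BUILD_BUG_ON(sizeof(struct rte_mbuf) != 128);
	/* 192 == cache-line aligned mempool object header + mbuf */
	RTE_BUILD_BUG_ON(ETH_AF_XDP_MBUF_OVERHEAD !=
			 RTE_ALIGN_CEIL(sizeof(struct rte_mempool_objhdr),
					RTE_CACHE_LINE_SIZE) +
			 sizeof(struct rte_mbuf));
}

Even with such checks, the layout stays coupled to the mbuf and mempool
internals.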
[...]

> +	umem->mb_pool = rte_pktmbuf_pool_create_with_flags("af_xdp_mempool",
> +			ETH_AF_XDP_NUM_BUFFERS,
> +			250, 0,
> +			ETH_AF_XDP_FRAME_SIZE -
> +			ETH_AF_XDP_MBUF_OVERHEAD,
> +			MEMPOOL_F_NO_SPREAD | MEMPOOL_CHUNK_F_PAGE_ALIGN,
> +			SOCKET_ID_ANY);
> +	if (umem->mb_pool == NULL || umem->mb_pool->nb_mem_chunks != 1) {
> +		AF_XDP_LOG(ERR, "Failed to create mempool\n");
>  		goto err;
>  	}
> +	base_addr = (void *)get_base_addr(umem->mb_pool);

Creating a mempool in the pmd like this does not look good to me, for
several reasons:

- the user application creates its mempool with a specific private area
  in its mbufs. Here there is no private area, so it will break
  applications that rely on one.

- in patch 3 (mempool), you ensure that the chunk starts at a
  page-aligned address, and you expect that, given the other flags and
  the constants at the top of the file, the data will be aligned. In my
  opinion this is not ideal.

- the user application may create a large number of mbufs, for instance
  if it manages large reassembly queues or many tcp sockets. Here the
  driver limits the number of mbufs to 4k per rx queue.

- the socket_id is SOCKET_ID_ANY, so it won't be efficient on NUMA
  architectures.

May I suggest another idea?

Allocate the xsk_umem almost like in patch 1, but using rte_memzone
allocation instead of posix_memalign() (it will also be faster, because
it will use hugepages), and do not allocate any mempool in the driver.

When you receive a packet in the xsk_umem, allocate a new mbuf from the
standard pool. Then, use rte_pktmbuf_attach_extbuf() to attach the xsk
memory to the mbuf. You will have to register a callback to return the
xsk memory when the mbuf is transmitted or freed.
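Here is a very rough sketch of what I have in mind. The driver-side
names (struct xsk_umem_info, queue_fill(), mb_pool_app, ...) are
placeholders, not your real code, and most error handling is omitted:

#include <unistd.h>		/* getpagesize() */

#include <rte_memzone.h>
#include <rte_memory.h>		/* rte_mem_virt2iova() */
#include <rte_mempool.h>
#include <rte_mbuf.h>

/* placeholder umem context, just for the sketch */
struct xsk_umem_info {
	struct rte_mempool *mb_pool_app;  /* the application's mbuf pool */
	/* fill/completion rings, umem handle, base address, ... */
};

/* placeholder: push a frame address back to the fill ring */
static void queue_fill(struct xsk_umem_info *umem, void *frame);

/* 1) back the umem with a memzone instead of posix_memalign() */
static const struct rte_memzone *
umem_area_reserve(int socket_id, size_t size)
{
	return rte_memzone_reserve_aligned("af_xdp_umem", size, socket_id,
					   RTE_MEMZONE_IOVA_CONTIG,
					   getpagesize());
}

/* 2) free callback: called by the mbuf library when the last reference
 *    to the attached mbuf is dropped (after tx, or rte_pktmbuf_free()
 *    in the application); this is how the xsk frame gets returned */
static void
umem_frame_release(void *addr, void *opaque)
{
	struct xsk_umem_info *umem = opaque;

	queue_fill(umem, addr);
}

/* 3) rx path: take an mbuf from the application's pool and attach the
 *    xsk frame as an external buffer instead of copying the packet */
static struct rte_mbuf *
umem_frame_to_mbuf(struct xsk_umem_info *umem, void *frame, uint16_t len)
{
	struct rte_mbuf_ext_shared_info *shinfo;
	uint16_t buf_len = ETH_AF_XDP_FRAME_SIZE;
	struct rte_mbuf *m;

	m = rte_pktmbuf_alloc(umem->mb_pool_app);
	if (m == NULL)
		return NULL;

	/* shinfo (free_cb + refcount) is carved from the frame tail */
	shinfo = rte_pktmbuf_ext_shinfo_init_helper(frame, &buf_len,
						    umem_frame_release, umem);
	rte_pktmbuf_attach_extbuf(m, frame, rte_mem_virt2iova(frame),
				  buf_len, shinfo);
	m->pkt_len = len;
	m->data_len = len;

	return m;
}

The free callback is the important part: it guarantees the frame goes
back to the fill ring no matter where or when the mbuf is freed.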
This is, by the way, something I don't understand in your current
implementation: what happens if an mbuf is received in the af_xdp
driver and then freed by the application? How is the xsk buffer
returned?

Using rte_pktmbuf_attach_extbuf() would remove the changes in mbuf and
mempool, at the price of (maybe) decreasing the performance. But I
think there are some places where it can be optimized.

I understand my feedback comes late -- as usual :( -- but if you are in
a hurry for RC1, maybe we can consider taking only the 1st patch, and
adding the zero-copy mode in a second patch later.

What do you think?

Regards,
Olivier