DPDK patches and discussions
 help / color / mirror / Atom feed
From: "Kusztal, ArkadiuszX" <arkadiuszx.kusztal@intel.com>
To: Ferruh Yigit <ferruh.yigit@amd.com>,
	"Marchand, David" <david.marchand@redhat.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>, "Ji, Kai" <kai.ji@intel.com>,
	"Dooley, Brian" <brian.dooley@intel.com>
Subject: RE: [PATCH v2 1/3] net: add thread-safe crc api
Date: Wed, 9 Oct 2024 07:48:20 +0000	[thread overview]
Message-ID: <PH0PR11MB5013F251BE81CE4A56D91C2C9F7F2@PH0PR11MB5013.namprd11.prod.outlook.com> (raw)
In-Reply-To: <b6b839de-1580-48a5-a705-cf06cbae928e@amd.com>



> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Wednesday, October 9, 2024 3:03 AM
> To: Kusztal, ArkadiuszX <arkadiuszx.kusztal@intel.com>; Marchand, David
> <david.marchand@redhat.com>
> Cc: dev@dpdk.org; Ji, Kai <kai.ji@intel.com>; Dooley, Brian
> <brian.dooley@intel.com>
> Subject: Re: [PATCH v2 1/3] net: add thread-safe crc api
> 
> On 10/8/2024 9:51 PM, Kusztal, ArkadiuszX wrote:
> > Hi Ferruh,
> > Thanks for the review, comments inline,
> >
> >> -----Original Message-----
> >> From: Ferruh Yigit <ferruh.yigit@amd.com>
> >> Sent: Tuesday, October 8, 2024 5:43 AM
> >> To: Kusztal, ArkadiuszX <arkadiuszx.kusztal@intel.com>; Marchand,
> >> David <david.marchand@redhat.com>
> >> Cc: dev@dpdk.org; Ji, Kai <kai.ji@intel.com>; Dooley, Brian
> >> <brian.dooley@intel.com>
> >> Subject: Re: [PATCH v2 1/3] net: add thread-safe crc api
> >>
> >> On 10/1/2024 7:11 PM, Arkadiusz Kusztal wrote:
> >>> The current net CRC API is not thread-safe, this patch solves this
> >>> by adding another, thread-safe API functions.
> >>> This API is also safe to use across multiple processes, yet with
> >>> limitations on max-simd-bitwidth, which will be checked only by the
> >>> process that created the CRC context; all other processes will use
> >>> the same CRC function when used with the same CRC context.
> >>> It is an undefined behavior when process binaries are compiled with
> >>> different SIMD capabilities when the same CRC context is used.
> >>>
> >>> Signed-off-by: Arkadiusz Kusztal <arkadiuszx.kusztal@intel.com>
> >>
> >> <...>
> >>
> >>> +static struct
> >>> +{
> >>> +	uint32_t (*f[RTE_NET_CRC_REQS])
> >>> +		(const uint8_t *data, uint32_t data_len);
> >>>
> >>
> >> It increases readability to typedef function pointers.
> >
> > Agree, though this typedef would be used here only, that’s why I left it out.
> > But I can add it then.
> >
> >>
> >>
> >>> +} handlers[RTE_NET_CRC_AVX512 + 1];
> >>>
> >>> -/**
> >>> - * Reflect the bits about the middle
> >>> - *
> >>> - * @param val
> >>> - *   value to be reflected
> >>> - *
> >>> - * @return
> >>> - *   reflected value
> >>> - */
> >>> -static uint32_t
> >>> +static inline uint32_t
> >>>
> >>
> >> Does changing to 'inline' required, as function is static compiler
> >> can do the same.
> >
> > True, though it may be more readable sometimes.
> > Of course there is no way that in O3 these functions would not be inlined by
> the compiler, regardless if the inline hint is present or not.
> >
> >>
> >>>  reflect_32bits(uint32_t val)
> >>>  {
> >>>  	uint32_t i, res = 0;
> >>> @@ -99,26 +43,7 @@ reflect_32bits(uint32_t val)
> >>>  	return res;
> >>>  }
> >>>
> >>> -static void
> >>> -crc32_eth_init_lut(uint32_t poly,
> >>> -	uint32_t *lut)
> >>> -{
> >>> -	uint32_t i, j;
> >>> -
> >>> -	for (i = 0; i < CRC_LUT_SIZE; i++) {
> >>> -		uint32_t crc = reflect_32bits(i);
> >>> -
> >>> -		for (j = 0; j < 8; j++) {
> >>> -			if (crc & 0x80000000L)
> >>> -				crc = (crc << 1) ^ poly;
> >>> -			else
> >>> -				crc <<= 1;
> >>> -		}
> >>> -		lut[i] = reflect_32bits(crc);
> >>> -	}
> >>> -}
> >>> -
> >>> -static __rte_always_inline uint32_t
> >>> +static inline uint32_t
> >>>
> >>
> >> Why not forcing inline anymore?
> >> Are these inline changes related to the thread-safety?
> >
> > O3 will inline it anyway, and with always_inline it will be inline even in debug
> mode. I just see no reason forcing it upon the compiler.
> >
> >>
> >>>  crc32_eth_calc_lut(const uint8_t *data,
> >>>  	uint32_t data_len,
> >>>  	uint32_t crc,
> >>> @@ -130,20 +55,9 @@ crc32_eth_calc_lut(const uint8_t *data,
> >>>  	return crc;
> >>>  }
> >>>
> >>> -static void
> >>> -rte_net_crc_scalar_init(void)
> >>> -{
> >>> -	/* 32-bit crc init */
> >>> -	crc32_eth_init_lut(CRC32_ETH_POLYNOMIAL, crc32_eth_lut);
> >>> -
> >>> -	/* 16-bit CRC init */
> >>> -	crc32_eth_init_lut(CRC16_CCITT_POLYNOMIAL << 16, crc16_ccitt_lut);
> >>> -}
> >>> -
> >>>  static inline uint32_t
> >>> -rte_crc16_ccitt_handler(const uint8_t *data, uint32_t data_len)
> >>> +crc16_ccitt(const uint8_t *data, uint32_t data_len)
> >>>  {
> >>> -	/* return 16-bit CRC value */
> >>>
> >>
> >> Why not keep comments? Are they wrong?
> >
> > Functions names are very self-explanatory, that’s why I dropped comments. I
> can add comments if needed.
> >
> 
> I am for restricting changes to the target of the patch which is making CRC
> calculation thread safe, unless code changes invalidates the comments, lets
> keep them. Same goes with inline related modifications.
> 
> >>
> >> <...>
> >>
> >>> +static void
> >>> +crc_scalar_init(void)
> >>> +{
> >>> +	crc32_eth_init_lut(CRC32_ETH_POLYNOMIAL, crc32_eth_lut);
> >>> +	crc32_eth_init_lut(CRC16_CCITT_POLYNOMIAL << 16, crc16_ccitt_lut);
> >>> +
> >>> +	handlers[RTE_NET_CRC_SCALAR].f[RTE_NET_CRC16_CCITT] =
> >> crc16_ccitt;
> >>> +	handlers[RTE_NET_CRC_SCALAR].f[RTE_NET_CRC32_ETH] = crc32_eth;
> >>>
> >>
> >> +1 to remove global handlers pointer and add context,
> >>
> >> But current handlers array content is static, it can be set when
> >> defined, instead of functions.
> >
> > Can do it for scalars, but for SIMD there is this runtime check like this:
> > 	if (AVX512_VPCLMULQDQ_CPU_SUPPORTED) { So compiled binary on
> AVX512
> > machine could filter it out on the machine which does not support it.
> > There is no NULL check in crc function so it would not change much when
> called -> Invalid address vs Invalid instruction.
> > There are not many checks there, as this is CRC after all, it should be a small as
> possible, yet probably NULL check could be advisable in crc function then.
> >
> 
> There is already AVX512_VPCLMULQDQ_CPU_SUPPORTED etc checks in
> 'rte_net_crc_set()'.
> So does it work to update 'handlers' statically, without condition, but have
> conditions when use them.
> 
> >
> >>
> >> <...>
> >>
> >>> -static uint32_t
> >>> -rte_crc32_eth_default_handler(const uint8_t *data, uint32_t
> >>> data_len)
> >>> +struct rte_net_crc rte_net_crc_set(enum rte_net_crc_alg alg,
> >>> +	enum rte_net_crc_type type)
> >>>  {
> >>> -	handlers = NULL;
> >>> -	if (max_simd_bitwidth == 0)
> >>> -		max_simd_bitwidth = rte_vect_get_max_simd_bitwidth();
> >>> -
> >>> -	handlers = avx512_vpclmulqdq_get_handlers();
> >>> -	if (handlers != NULL)
> >>> -		return handlers[RTE_NET_CRC32_ETH](data, data_len);
> >>> -	handlers = sse42_pclmulqdq_get_handlers();
> >>> -	if (handlers != NULL)
> >>> -		return handlers[RTE_NET_CRC32_ETH](data, data_len);
> >>> -	handlers = neon_pmull_get_handlers();
> >>> -	if (handlers != NULL)
> >>> -		return handlers[RTE_NET_CRC32_ETH](data, data_len);
> >>> -	handlers = handlers_scalar;
> >>> -	return handlers[RTE_NET_CRC32_ETH](data, data_len);
> >>> -}
> >>> +	uint16_t max_simd_bitwidth;
> >>>
> >>> -/* Public API */
> >>> -
> >>> -void
> >>> -rte_net_crc_set_alg(enum rte_net_crc_alg alg) -{
> >>> -	handlers = NULL;
> >>> -	if (max_simd_bitwidth == 0)
> >>> -		max_simd_bitwidth = rte_vect_get_max_simd_bitwidth();
> >>> +	max_simd_bitwidth = rte_vect_get_max_simd_bitwidth();
> >>>
> >>>  	switch (alg) {
> >>>  	case RTE_NET_CRC_AVX512:
> >>> -		handlers = avx512_vpclmulqdq_get_handlers();
> >>> -		if (handlers != NULL)
> >>> -			break;
> >>> +#ifdef CC_X86_64_AVX512_VPCLMULQDQ_SUPPORT
> >>> +		if (AVX512_VPCLMULQDQ_CPU_SUPPORTED &&
> >>> +				max_simd_bitwidth >= RTE_VECT_SIMD_512) {
> >>> +			return (struct rte_net_crc){ RTE_NET_CRC_AVX512,
> >> type };
> >>> +		}
> >>> +#endif
> >>>  		/* fall-through */
> >>>  	case RTE_NET_CRC_SSE42:
> >>> -		handlers = sse42_pclmulqdq_get_handlers();
> >>> -		break; /* for x86, always break here */
> >>> +#ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT
> >>> +		if (SSE42_PCLMULQDQ_CPU_SUPPORTED &&
> >>> +				max_simd_bitwidth >= RTE_VECT_SIMD_128) {
> >>> +			return (struct rte_net_crc){ RTE_NET_CRC_SSE42, type
> >> };
> >>> +		}
> >>> +#endif
> >>> +		break;
> >>>  	case RTE_NET_CRC_NEON:
> >>> -		handlers = neon_pmull_get_handlers();
> >>> -		/* fall-through */
> >>> -	case RTE_NET_CRC_SCALAR:
> >>> -		/* fall-through */
> >>> +#ifdef CC_ARM64_NEON_PMULL_SUPPORT
> >>> +		if (NEON_PMULL_CPU_SUPPORTED &&
> >>> +				max_simd_bitwidth >= RTE_VECT_SIMD_128) {
> >>> +			return (struct rte_net_crc){ RTE_NET_CRC_NEON, type
> >> };
> >>> +		}
> >>> +#endif
> >>>
> >>
> >> Is it more readable as following, up to you:
> >
> > Agree, I will change it.
> >
> >>
> >> ```
> >> rte_net_crc_set(alg, type) {
> >>   enum rte_net_crc_alg new_alg = RTE_NET_CRC_SCALAR;
> >>   switch (alg) {
> >>   case AVX512:
> >>     new_alg = ..
> >>   case NEON:
> >>     new_alg = ..
> >>   }
> >>   return struct rte_net_crc){ new_alg, type }; ```
> >>
> >>
> >>
> >>
> >>> +		break;
> >>>  	default:
> >>>  		break;
> >>>  	}
> >>> -
> >>> -	if (handlers == NULL)
> >>> -		handlers = handlers_scalar;
> >>> +	return (struct rte_net_crc){ RTE_NET_CRC_SCALAR, type };
> >>>  }
> >>>
> >>> -uint32_t
> >>> -rte_net_crc_calc(const void *data,
> >>> -	uint32_t data_len,
> >>> -	enum rte_net_crc_type type)
> >>> +uint32_t rte_net_crc(const struct rte_net_crc *ctx,
> >>> +	const void *data, const uint32_t data_len)
> >>>  {
> >>> -	uint32_t ret;
> >>> -	rte_net_crc_handler f_handle;
> >>> -
> >>> -	f_handle = handlers[type];
> >>> -	ret = f_handle(data, data_len);
> >>> -
> >>> -	return ret;
> >>> +	return handlers[ctx->alg].f[ctx->type](data, data_len);
> >>>
> >>
> >> 'rte_net_crc()' gets input from user and "struct rte_net_crc" is not
> >> opaque, so user can provide invalid input, ctx->alg & ctx->type.
> >> To protect against it input values should be checked before using.
> >>
> >> Or I think user not need to know the details of the "struct
> >> rte_net_crc", so it can be an opaque variable for user.
> >
> > I would love it to be opaque, but then I would have to return a pointer, which
> then would involve some allocations/deallocations and I wanted to keep is as
> simple as possible.
> > So probably the checks would be a way to go.
> >
> 
> True, +1 for simplicity.
> 
> >>
> >> <...>
> >>
> >>> -/**
> >>> - * CRC compute API
> >>> - *
> >>> - * @param data
> >>> - *   Pointer to the packet data for CRC computation
> >>> - * @param data_len
> >>> - *   Data length for CRC computation
> >>> - * @param type
> >>> - *   CRC type (enum rte_net_crc_type)
> >>> - *
> >>> - * @return
> >>> - *   CRC value
> >>> - */
> >>> -uint32_t
> >>> -rte_net_crc_calc(const void *data,
> >>> -	uint32_t data_len,
> >>> +struct rte_net_crc rte_net_crc_set(enum rte_net_crc_alg alg,
> >>>  	enum rte_net_crc_type type);
> >>>
> >>> +uint32_t rte_net_crc(const struct rte_net_crc *ctx,
> >>> +	const void *data, const uint32_t data_len);
> >>> +
> >>>
> >>
> >> As these are APIs, can you please add doxygen comments to them?
> > +1
> > I think this change could be deferred to 25.03.
> > Adding this API without removing the old one should be possible without any
> unwanted consequences?
> >
> 
> This is not a new functionality but replacement of existing one, so it will be
> confusing to have two set of APIs for same functionality with similar names:
> rte_net_crc_calc()    and   rte_net_crc()
> rte_net_crc_set_alg() and   rte_net_crc_set()
> 
> Also there are some internal functions used by these APIs and supporting both
> new and old may force to have two version of these internal functions and it will
> create complexity/noise.
> 
> As this release is ABI break release, it is easier to update APIs (although
> deprecation notice is missing for this change).
> 
> 
> As an alternative option, do you think applying ABI versioning in 25.03 works for
> these APIs?
> If so, old version can be cleaned in v25.11.

I would go with versioning. I can preserve old function names as it's basically the same functionality. 

> 
> 
> > I still have some second thoughts about this max-simd-width. DPDK does
> > not impose any restrictions on this parameter in the multi-process usage, there
> may be some room to alter some things there.
> >
> >>
> >>>  #ifdef __cplusplus
> >>>  }
> >>>  #endif
> >>> diff --git a/lib/net/version.map b/lib/net/version.map index
> >>> bec4ce23ea..47daf1464a 100644
> >>> --- a/lib/net/version.map
> >>> +++ b/lib/net/version.map
> >>> @@ -4,11 +4,25 @@ DPDK_25 {
> >>>  	rte_eth_random_addr;
> >>>  	rte_ether_format_addr;
> >>>  	rte_ether_unformat_addr;
> >>> -	rte_net_crc_calc;
> >>> -	rte_net_crc_set_alg;
> >>>  	rte_net_get_ptype;
> >>>  	rte_net_make_rarp_packet;
> >>>  	rte_net_skip_ip6_ext;
> >>> +	rte_net_crc;
> >>> +	rte_net_crc_set;
> >>>
> >>>  	local: *;
> >>>  };
> >>> +
> >>> +INTERNAL {
> >>> +	global:
> >>> +
> >>> +	rte_net_crc_sse42_init;
> >>> +	rte_crc16_ccitt_sse42_handler;
> >>> +	rte_crc32_eth_sse42_handler;
> >>> +	rte_net_crc_avx512_init;
> >>> +	rte_crc16_ccitt_avx512_handler;
> >>> +	rte_crc32_eth_avx512_handler;
> >>> +	rte_net_crc_neon_init;
> >>> +	rte_crc16_ccitt_neon_handler;
> >>> +	rte_crc32_eth_neon_handler;
> >>> +};
> >>>
> >>
> >> +1 to David's comment, these are used only within component, no need
> >> +to
> >> export.
> >>
> >


  reply	other threads:[~2024-10-09  7:48 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-10-01 18:11 [PATCH v2 0/3] " Arkadiusz Kusztal
2024-10-01 18:11 ` [PATCH v2 1/3] " Arkadiusz Kusztal
2024-10-01 21:44   ` Stephen Hemminger
2024-10-02  8:28     ` Kusztal, ArkadiuszX
2024-10-02  7:42   ` David Marchand
2024-10-02  8:41     ` Kusztal, ArkadiuszX
2024-10-02  9:01       ` David Marchand
2024-10-02  9:16         ` Kusztal, ArkadiuszX
2024-10-08  3:42   ` Ferruh Yigit
2024-10-08 20:51     ` Kusztal, ArkadiuszX
2024-10-09  1:03       ` Ferruh Yigit
2024-10-09  7:48         ` Kusztal, ArkadiuszX [this message]
2024-10-09  9:11           ` Ferruh Yigit
2024-10-01 18:11 ` [PATCH v2 2/3] crypto/qat: use process safe " Arkadiusz Kusztal
2024-10-01 18:11 ` [PATCH v2 3/3] test/crc: replace thread-unsafe api functions Arkadiusz Kusztal

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=PH0PR11MB5013F251BE81CE4A56D91C2C9F7F2@PH0PR11MB5013.namprd11.prod.outlook.com \
    --to=arkadiuszx.kusztal@intel.com \
    --cc=brian.dooley@intel.com \
    --cc=david.marchand@redhat.com \
    --cc=dev@dpdk.org \
    --cc=ferruh.yigit@amd.com \
    --cc=kai.ji@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).