Hi! Thanks for submitting this. Some inline comments follow.

> -----Original Message-----
> From: 苏赛
> Sent: Thursday 31 July 2025 10:55
> To: jasvinder.singh@intel.com
> Cc: dev@dpdk.org
> Subject: [PATCH] net/cksum: compute raw cksum for several segments
>
> The rte_raw_cksum_mbuf() function computes the raw checksum of a
> packet. If the packet payload is stored in multiple mbufs, the
> function falls through to the hard case. There, the variable 'tmp'
> has type uint32_t, so rte_bswap16() drops its high 16 bits.
> Meanwhile, the variable 'sum' also has type uint32_t, so 'sum += tmp'
> drops the carry on overflow. Both truncations make the checksum
> incorrect. This commit fixes the above bugs.
>
> Signed-off-by: Su Sai
>
> diff --git a/lib/net/rte_cksum.h b/lib/net/rte_cksum.h
> index a8e8927952..aa584d5f8d 100644
> --- a/lib/net/rte_cksum.h
> +++ b/lib/net/rte_cksum.h
> @@ -80,6 +80,25 @@ __rte_raw_cksum_reduce(uint32_t sum)
>  	return (uint16_t)sum;
>  }
>
> +/**
> + * @internal Reduce a sum to the non-complemented checksum.
> + * Helper routine for the rte_raw_cksum_mbuf().
> + *
> + * @param sum
> + *   Value of the sum.
> + * @return
> + *   The non-complemented checksum.
> + */
> +static inline uint16_t
> +__rte_raw_cksum_reduce_u64(uint64_t sum)
> +{
> +	uint32_t tmp;
> +
> +	tmp = __rte_raw_cksum_reduce((uint32_t)sum);
> +	tmp += __rte_raw_cksum_reduce((uint32_t)(sum >> 32));

What if this addition overflows? To my taste I would not call
`__rte_raw_cksum_reduce()` here and would instead reduce the uint64_t
directly to uint16_t, but it's up to you.

> +	return __rte_raw_cksum_reduce(tmp);
> +}
> +
>  /**
>   * Process the non-complemented checksum of a buffer.
>   *
> @@ -119,8 +138,9 @@ rte_raw_cksum_mbuf(const struct rte_mbuf *m, uint32_t off, uint32_t len,
>  {
>  	const struct rte_mbuf *seg;
>  	const char *buf;
> -	uint32_t sum, tmp;
> +	uint32_t tmp;
>  	uint32_t seglen, done;
> +	uint64_t sum;
>
>  	/* easy case: all data in the first segment */
>  	if (off + len <= rte_pktmbuf_data_len(m)) {
> @@ -157,7 +177,7 @@ rte_raw_cksum_mbuf(const struct rte_mbuf *m, uint32_t off, uint32_t len,
>  	for (;;) {
>  		tmp = __rte_raw_cksum(buf, seglen, 0);
>  		if (done & 1)
> -			tmp = rte_bswap16((uint16_t)tmp);
> +			tmp = rte_bswap32(tmp);

This part probably deserves a comment: we only need to swap odd and even
bytes, but we instead reverse all four of them, exploiting the fact that
the order of 2-byte pairs does not matter for this algorithm.

>  		sum += tmp;
>  		done += seglen;
>  		if (done == len)
> @@ -169,7 +189,7 @@ rte_raw_cksum_mbuf(const struct rte_mbuf *m, uint32_t off, uint32_t len,
>  		seglen = len - done;
>  	}
>
> -	*cksum = __rte_raw_cksum_reduce(sum);
> +	*cksum = __rte_raw_cksum_reduce_u64(sum);
>  	return 0;
>  }

Changes to this function look correct to my eye, but given how many
pitfalls we have already found, I think we need tests.
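For illustration, here is one way the direct 64-to-16-bit reduction I
suggested above could look. This is a standalone sketch, not the DPDK
header's code; the name `cksum_reduce_u64` is a local stand-in for
whatever the final helper ends up being called:

```c
#include <stdint.h>

/* Hypothetical direct 64-bit -> 16-bit one's-complement reduction
 * (a sketch; the name is mine, not DPDK's). Each fold adds the upper
 * half into the lower half; the second fold of each pair absorbs the
 * carry the first one may produce, so no addition here can overflow. */
static inline uint16_t
cksum_reduce_u64(uint64_t sum)
{
	sum = (sum & 0xffffffffu) + (sum >> 32);
	sum = (sum & 0xffffffffu) + (sum >> 32);
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}
```

After the first 32-bit fold the value fits in 33 bits, and the second
fold brings it below 2^32; the two 16-bit folds then do the same one
level down, so the result is always a valid 16-bit one's-complement sum.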
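To make the byte-swap argument concrete, a small self-contained
demonstration. `raw_sum`, `reduce16` and `bswap32` below are local
stand-ins for `__rte_raw_cksum`, `__rte_raw_cksum_reduce` and
`rte_bswap32`; `raw_sum` uses a fixed big-endian word view for
simplicity, while the real code reads host-endian uint16_t, but the
one's-complement property being illustrated is byte-order independent:

```c
#include <stddef.h>
#include <stdint.h>

/* One's-complement 16-bit word sum over a byte buffer (stand-in for
 * __rte_raw_cksum); an odd trailing byte is padded with a zero low byte. */
static uint32_t
raw_sum(const uint8_t *p, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)((p[i] << 8) | p[i + 1]);
	if (len & 1)
		sum += (uint32_t)p[len - 1] << 8;
	return sum;
}

/* Fold a 32-bit sum to 16 bits (stand-in for __rte_raw_cksum_reduce). */
static uint16_t
reduce16(uint32_t sum)
{
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Full byte reversal, like rte_bswap32(). */
static uint32_t
bswap32(uint32_t x)
{
	return (x >> 24) | ((x >> 8) & 0x0000ff00) |
	       ((x << 8) & 0x00ff0000) | (x << 24);
}

/* Example: for data[7] = {0x12, 0x34, 0x56, 0x78, 0x9a, 0xbc, 0xde},
 *   reduce16(raw_sum(data, 3) + bswap32(raw_sum(data + 3, 4)))
 * equals reduce16(raw_sum(data, 7)): the second segment starts at the
 * odd offset 3, so byte-reversing its whole 32-bit sum puts every byte
 * back in the right high/low position, and the resulting reordering of
 * 2-byte pairs is harmless because one's-complement addition commutes. */
```

Splitting the buffer at an odd offset and byte-reversing the second
segment's 32-bit sum yields the same reduced checksum as summing the
whole buffer, which is exactly the invariant the patched loop relies on.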