Message-ID: <99ebb9d7-7da2-4d57-9e4d-81b5d90c6ddd@lysator.liu.se>
Date: Thu, 10 Oct 2024 12:29:59 +0200
Subject: Re: [PATCH v6 7/7] vhost: optimize memcpy routines when cc memcpy is used
To: Morten Brørup, Mattias Rönnblom, dev@dpdk.org, maxime.coquelin@redhat.com
Cc: Stephen Hemminger, David Marchand, Pavan Nikhilesh, Bruce Richardson
References: <20240724075357.546248-2-mattias.ronnblom@ericsson.com> <20240920102716.738940-1-mattias.ronnblom@ericsson.com>
 <20240920102716.738940-8-mattias.ronnblom@ericsson.com> <98CBD80474FA8B44BF855DF32C47DC35E9F7B2@smartserver.smartshare.dk>
From: Mattias Rönnblom
In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC35E9F7B2@smartserver.smartshare.dk>

On 2024-10-09 23:25, Morten Brørup wrote:
>> +#if defined(RTE_USE_CC_MEMCPY) && defined(RTE_ARCH_X86_64)
>> +static __rte_always_inline void
>> +pktcpy(void *restrict in_dst, const void *restrict in_src, size_t len)
>> +{
>
> A comment describing why batch_copy_elem.dst and src point to 16 byte aligned data would be nice.
>

Good point. As I think I mentioned at some point, I'm not sure they are. From what I recall, treating the data as 16-byte aligned (whether or not it really is) does give a noticeable performance increase on x86_64.

Is this something I should look into for 24.11, or is this patch set not going to make it anyway?

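One way to find out whether the pointers really are 16-byte aligned would be a run-time check in a debug build. The following is only a minimal sketch in plain C: `ptr_is_aligned()` is a hypothetical helper, not part of the patch and not a DPDK API, and it assumes the alignment is a power of two.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): returns true if ptr is a
 * multiple of align. Assumes align is a non-zero power of two, so the
 * low bits of the address can be tested with a mask.
 */
static inline bool
ptr_is_aligned(const void *ptr, size_t align)
{
	return ((uintptr_t)ptr & (align - 1)) == 0;
}
```

Asserting `ptr_is_aligned(in_dst, 16) && ptr_is_aligned(in_src, 16)` at the top of pktcpy() in a test build would show whether the hint passed to `__builtin_assume_aligned()` holds for real traffic. If it does not, the compiler is still allowed to generate code that relies on it.
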
>> +	void *dst = __builtin_assume_aligned(in_dst, 16);
>> +	const void *src = __builtin_assume_aligned(in_src, 16);
>> +
>> +	if (len <= 256) {
>> +		size_t left;
>> +
>> +		for (left = len; left >= 32; left -= 32) {
>> +			memcpy(dst, src, 32);
>> +			dst = RTE_PTR_ADD(dst, 32);
>> +			src = RTE_PTR_ADD(src, 32);
>> +		}
>> +
>> +		memcpy(dst, src, left);
>> +	} else
>> +		memcpy(dst, src, len);
>> +}
>> +#else
>> +static __rte_always_inline void
>> +pktcpy(void *dst, const void *src, size_t len)
>> +{
>> +	rte_memcpy(dst, src, len);
>> +}
>> +#endif
>> +
>>  static inline void
>>  do_data_copy_enqueue(struct virtio_net *dev, struct vhost_virtqueue *vq)
>>  	__rte_shared_locks_required(&vq->iotlb_lock)
>> @@ -240,7 +273,7 @@ do_data_copy_enqueue(struct virtio_net *dev, struct vhost_virtqueue *vq)
>>  	int i;
>>
>>  	for (i = 0; i < count; i++) {
>> -		rte_memcpy(elem[i].dst, elem[i].src, elem[i].len);
>> +		pktcpy(elem[i].dst, elem[i].src, elem[i].len);
>>  		vhost_log_cache_write_iova(dev, vq, elem[i].log_addr, elem[i].len);
>>  		PRINT_PACKET(dev, (uintptr_t)elem[i].dst, elem[i].len, 0);
>> @@ -257,7 +290,7 @@ do_data_copy_dequeue(struct vhost_virtqueue *vq)
>>  	int i;
>>
>>  	for (i = 0; i < count; i++)
>> -		rte_memcpy(elem[i].dst, elem[i].src, elem[i].len);
>> +		pktcpy(elem[i].dst, elem[i].src, elem[i].len);
>>
>>  	vq->batch_copy_nb_elems = 0;
>>  }
>> --
>> 2.43.0
>
> Anyway,
> Acked-by: Morten Brørup
>
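
For experimenting outside DPDK, the x86_64 branch of pktcpy() quoted above can be approximated in plain C. This is a sketch under stated assumptions, not the patch itself: `chunked_copy()` is a hypothetical name, byte pointers stand in for `RTE_PTR_ADD()`, and the `__builtin_assume_aligned()` hint is left out on purpose, since it is not established that the buffers are 16-byte aligned.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-alone variant of the quoted pktcpy(): for lengths up
 * to 256 bytes, copy in fixed 32-byte chunks (which a compiler can expand
 * inline), then copy the remaining 0..31 bytes. Larger copies go straight
 * to memcpy(). No alignment is assumed.
 */
static inline void
chunked_copy(void *restrict in_dst, const void *restrict in_src, size_t len)
{
	unsigned char *dst = in_dst;
	const unsigned char *src = in_src;

	if (len <= 256) {
		size_t left;

		for (left = len; left >= 32; left -= 32) {
			memcpy(dst, src, 32);
			dst += 32;
			src += 32;
		}

		/* Tail: fewer than 32 bytes remain (possibly zero). */
		memcpy(dst, src, left);
	} else {
		memcpy(dst, src, len);
	}
}
```

Benchmarking this variant against one that adds the alignment hint would show how much of the gain comes from the fixed-size chunks and how much from the alignment assumption.
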