From: kumaraparameshwaran rathinavel
Date: Wed, 22 Nov 2023 21:05:21 +0530
Subject: Re: RFC - GRO Flowlookup Optimisation
To: Ferruh Yigit <ferruh.yigit@amd.com>
Cc: dev@dpdk.org, hujiayu.hu@foxmail.com
List-Id: DPDK patches and discussions

On Wed, Nov 22, 2023 at 4:05 PM Ferruh Yigit <ferruh.yigit@amd.com> wrote:

> On 11/22/2023 6:01 AM, kumaraparameshwaran rathinavel wrote:
> > Hi Folks,
> >
> > The current GRO code uses an unoptimised flow lookup in which every
> > flow in the table is iterated over during flow matching. For
> > rte_gro_reassemble_burst in lightweight mode this does not have much
> > of an impact, but with rte_gro_reassemble, which runs with a timeout
> > interval, it causes higher CPU utilisation during throughput tests.
> > The proposal here is to use a hash-based flow table that could build
> > on the rte_hash implementation in DPDK. There could be one hash table
> > per GRO type, since the lookup function and the key can differ for
> > each type. If there is a consensus that this could improve
> > performance, I will work on an initial patch set. Please let me know
> > your thoughts.
>
> Hi Kumara,
>
> Your proposal looks reasonable to me; I think it is worth trying.
> cc'ed techboard for more comments.
>

Thanks Ferruh - sure, I will put together an initial patch set for the
TCP/IPv4 GRO type.

> Do you have any performance measurements with the existing code? Having
> them helps to evaluate the impact of the change.
>

I did some testing a while back, and the observation was that on a 10 Gbps
link the throughput measured with iperf was almost the same for the
unoptimised and optimised versions, but the CPU saving was up to 30-35%.
So any tests running in parallel, such as imix-style traffic, would
definitely show better results. I will try to profile the two cases and
share numbers showing the performance impact.