Date: Thu, 26 Sep 2024 17:03:17 +0000 (UTC)
From: amit sehas
To: Stephen Hemminger
Cc: Nishant Verma, "users@dpdk.org"
Subject: Re: core performance
Message-ID: <610700496.13049973.1727370197636@mail.yahoo.com>
In-Reply-To: <47151973.13053687.1727369764729@mail.yahoo.com>
References: <1987164393.11670398.1727125003663.ref@mail.yahoo.com>
 <1987164393.11670398.1727125003663@mail.yahoo.com>
 <1299564509.11731667.1727133474900@mail.yahoo.com>
 <2025533199.11789856.1727143607670@mail.yahoo.com>
 <2042269904.11975457.1727188849962@mail.yahoo.com>
 <20240924093813.29a01783@fedora>
 <26643152.12164440.1727210825368@mail.yahoo.com>
 <810098753.12902128.1727353971978@mail.yahoo.com>
 <47151973.13053687.1727369764729@mail.yahoo.com>
List-Id: DPDK usage discussions

On whether there is a way to determine vCPU thread utilization numbers over a period of time, such as a few hours, or which processes are consuming the most CPU: top always indicates that the server is consuming the most CPU.

Now I am beginning to wonder whether 8 vCPU threads really are capable of running 6 high-intensity threads, or only 4 such threads. I don't know.

I also tried calling pthread_setschedparam() explicitly on some of the threads; it made no difference to the performance. But if we do it on more than 1-2 threads, it hangs the whole system.
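
For reference, a minimal sketch of what such a pthread_setschedparam() call might look like. The thread does not say which policy or priority was used, so SCHED_FIFO at the lowest real-time priority is an assumption here, and make_thread_fifo() and thread_policy() are hypothetical helper names, not DPDK or application functions. One plausible reading of the hang (also an assumption, not confirmed in the thread) is that a busy-looping real-time thread never yields, so normal-priority work on the same vCPU is starved.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <string.h>

/*
 * Hypothetical helper: move a thread to SCHED_FIFO at the lowest
 * real-time priority. Even the lowest FIFO priority outranks every
 * normal (SCHED_OTHER) thread. Returns 0 on success or an errno value,
 * typically EPERM when the process lacks CAP_SYS_NICE.
 */
int make_thread_fifo(pthread_t thread)
{
    struct sched_param param;
    int prio = sched_get_priority_min(SCHED_FIFO);

    if (prio < 0)
        return errno;
    memset(&param, 0, sizeof(param));
    param.sched_priority = prio;
    return pthread_setschedparam(thread, SCHED_FIFO, &param);
}

/*
 * Hypothetical helper: read back the scheduling policy of a thread so
 * the caller can confirm whether the change took effect.
 * Returns 0 on success or an errno value.
 */
int thread_policy(pthread_t thread, int *policy)
{
    struct sched_param param;

    assert(policy != NULL);
    return pthread_getschedparam(thread, policy, &param);
}
```

If this reading is right, it would argue for applying a real-time policy to only a few threads and leaving at least one vCPU free of real-time threads, so that the rest of the system can still run.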
This is primarily a matter of CPU scheduling, and if we restrict context switching on even 2 critical threads we have a win.

regards

On Thursday, September 26, 2024 at 09:56:04 AM PDT, amit sehas wrote:

Below is the lscpu output that was requested; it appears to suggest an 8 vCPU thread setup, if I am reading it correctly:

$ lscpu
Architecture:             x86_64
  CPU op-mode(s):         32-bit, 64-bit
  Address sizes:          46 bits physical, 48 bits virtual
  Byte Order:             Little Endian
CPU(s):                   8
  On-line CPU(s) list:    0-7
Vendor ID:                GenuineIntel
  BIOS Vendor ID:         Intel(R) Corporation
  Model name:             Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
    BIOS Model name:      Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
    CPU family:           6
    Model:                85
    Thread(s) per core:   2
    Core(s) per socket:   4
    Socket(s):            1
    Stepping:             7
    BogoMIPS:             4999.99
    Flags:                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36
                          clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc
                          rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni
                          pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt
                          tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm
                          3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2
                          erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd
                          avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke
Virtualization features:
  Hypervisor vendor:      KVM
  Virtualization type:    full
Caches (sum of all):
  L1d:                    128 KiB (4 instances)
  L1i:                    128 KiB (4 instances)
  L2:                     4 MiB (4 instances)
  L3:                     35.8 MiB (1 instance)
NUMA:
  NUMA node(s):           1
  NUMA node0 CPU(s):      0-7
Vulnerabilities:
  Gather data sampling:   Unknown: Dependent on hypervisor status
  Itlb multihit:          KVM: Mitigation: VMX unsupported
  L1tf:                   Mitigation; PTE Inversion
  Mds:                    Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
  Meltdown:               Mitigation; PTI
  Mmio stale data:        Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
  Reg file data sampling: Not affected
  Retbleed:               Vulnerable
  Spec rstack overflow:   Not affected
  Spec store bypass:      Vulnerable
  Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:             Mitigation; Retpolines; STIBP disabled; RSB filling; PBRSB-eIBRS Not affected; BHI Retpoline
  Srbds:                  Not affected
  Tsx async abort:        Not affected

On Thursday, September 26, 2024 at 05:32:52 AM PDT, amit sehas wrote:

Simply reordering the launch of different threads brings back a lot of the lost performance; this is clear evidence that some CPU threads are more predisposed to context switches than others.

This is a thread scheduling issue at the CPU level, as we expected. In a previous exchange someone suggested that setting rte_thread_set_priority to RTE_THREAD_PRIORITY_REALTIME_CRITICAL is not a good idea.

We should be able to prioritize some threads over others ... since we are using rte_eal_remote_launch, one would think that such functionality should be part of the library ...
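
The lscpu output above reports 2 threads per core and 4 cores per socket, so the 8 vCPUs map onto 4 physical cores. Whether that explains the effect of launch order is an assumption, but it is easy to check which vCPUs are hyperthread siblings. The sketch below reads the standard Linux sysfs topology file; parse_cpu_list() and smt_siblings() are hypothetical helper names, and only CPUs 0 to 63 are handled.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a Linux "cpulist" string such as "0,4" or
 * "0-3,8" into a bitmask (bit N set means CPU N is listed).
 * Only CPUs 0..63 are supported; returns 0 on malformed input.
 */
uint64_t parse_cpu_list(const char *s)
{
    uint64_t mask = 0;

    assert(s != NULL);
    while (*s != '\0' && *s != '\n') {
        char *end;
        long first = strtol(s, &end, 10);
        long last = first;

        if (end == s)
            return 0;           /* no digits where a CPU id was expected */
        if (*end == '-') {
            const char *p = end + 1;

            last = strtol(p, &end, 10);
            if (end == p)
                return 0;       /* range without an upper bound */
        }
        if (first < 0 || last > 63 || first > last)
            return 0;
        for (long cpu = first; cpu <= last; cpu++)
            mask |= UINT64_C(1) << cpu;
        s = end;
        if (*s == ',')
            s++;
        else if (*s != '\0' && *s != '\n')
            return 0;           /* unexpected separator */
    }
    return mask;
}

/*
 * Hypothetical helper: return the mask of vCPUs that share a physical
 * core with the given CPU (including the CPU itself), or 0 if the
 * sysfs file cannot be read.
 */
uint64_t smt_siblings(unsigned int cpu)
{
    char path[128];
    char line[256];
    uint64_t mask = 0;
    FILE *f;

    snprintf(path, sizeof(path),
             "/sys/devices/system/cpu/cpu%u/topology/thread_siblings_list",
             cpu);
    f = fopen(path, "r");
    if (f == NULL)
        return 0;
    if (fgets(line, sizeof(line), f) != NULL)
        mask = parse_cpu_list(line);
    fclose(f);
    return mask;
}
```

Calling smt_siblings() for each of CPUs 0-7 shows which pairs share a core. If two of the heavy threads land on the same pair, they compete for one physical core, which could look like uneven scheduling.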
Any ideas, folks?

regards

On Tuesday, September 24, 2024 at 01:47:05 PM PDT, amit sehas wrote:

Thanks for the suggestions. This is a database server which is doing lots of stuff; not every thread is heavily involved in DPDK packet processing. As a result, the guidelines for attaining the most DPDK performance are applicable to only a few threads.

In this particular issue we are specifically looking at CPU scheduling of threads that are primarily processing database queries. These threads, from our measurements, are not being uniformly scheduled on the CPU ...

This is our primary concern. Since we used rte_eal_remote_launch to spawn the threads, we are wondering if there are any options in this API that will allow us to allocate the CPU more uniformly to the threads that are critical ...

regards

On Tuesday, September 24, 2024 at 09:38:16 AM PDT, Stephen Hemminger wrote:

On Tue, 24 Sep 2024 14:40:49 +0000 (UTC)
amit sehas wrote:

> Thanks for your response, and thanks for your input on the set_priority,
>
> The best guess we have at this point is that this is not a dpdk performance issue. This is an issue with some threads running into more context switches than the others and hence not getting the same slice of the CPU. We are certain that this is not a dpdk performance issue, the code
> is uniformly slow in one thread versus the other and the threads are doing a very large amount of work including accessing databases. The threads in question are not really doing packet processing as much as other work.
>
> So this is certainly not a dpdk performance issue. This is an issue of kernel threads not being scheduled properly or in the worst case the cores running on different frequency (which is quite unlikely on the AWS Xeons we are running this on).
>
> If you are asking for the dpdk config files to check for dpdk related performance issue then we are quite certain the issue is not with dpdk performance ...
> On Mon, Sep 23, 2024 at 10:06 PM amit sehas wrote:
> > Thanks for your response, i am not sure i understand your question ... we have our product that utilizes dpdk ... the commands are just our server commands and parameters ... and the lscpu is the hyperthreaded 8 thread Xeon instance in AWS ...

The rules of getting performance in DPDK:
  - use DPDK threads (pinned) for datapath
  - use isolated CPUs for those DPDK threads
  - do not do any system calls
  - avoid floating point

You can use tracing tools like strace or BPF to see what the thread is doing.