From: vuonglv@viettel.com.vn
To: cristian.dumitrescu@intel.com
Cc: users@dpdk.org, dev@dpdk.org
Subject: Re: [dpdk-dev] Rx can't receive any more packets after receiving ~1.5 billion packets
Date: Tue, 18 Jul 2017 08:36:54 +0700 (ICT)
Message-ID: <8709002a-8520-ba2a-3460-1e0ef14dbf09@viettel.com.vn>
In-Reply-To: <3EB4FA525960D640B5BDFFD6A3D891267BA810FB@IRSMSX108.ger.corp.intel.com>
References: <3EB4FA525960D640B5BDFFD6A3D891267BA810FB@IRSMSX108.ger.corp.intel.com>

On 07/17/2017 05:31 PM, cristian.dumitrescu@intel.com wrote:
>
>> -----Original Message-----
>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of
>> vuonglv@viettel.com.vn
>> Sent: Monday, July 17, 2017 3:04 AM
>> Cc: users@dpdk.org; dev@dpdk.org
>> Subject: [dpdk-dev] Rx can't receive any more packets after receiving
>> ~1.5 billion packets
>>
>> Hi DPDK team,
>> Sorry for sending this email to both the users and dev lists, but I have
>> a big problem: the Rx core of my application can no longer receive
>> packets after a stress test (over ~1 day the Rx core received ~1.5
>> billion packets). The Rx core is still alive, but it receives no packets
>> and generates no log output. Below is my system configuration:
>> - OS: CentOS 7
>> - Kernel: 3.10.0-514.16.1.el7.x86_64
>> - Huge pages: 32 GB (16384 pages of 2 MB)
>> - NIC: Intel 82599
>> - DPDK version: 16.11
>> - Architecture: Rx (lcore 1) receives packets and enqueues them to a
>> ring ----- Worker (lcore 2) dequeues packets from the ring and frees
>> them (using rte_pktmbuf_free()).
>> - Mempool create:
>>       rte_pktmbuf_pool_create(
>>           "rx_pool",                 /* name */
>>           8192,                      /* number of elements in the mbuf pool */
>>           256,                       /* size of the per-core object cache */
>>           0,                         /* size of the application private area
>>                                         between the rte_mbuf struct and the
>>                                         data buffer */
>>           RTE_MBUF_DEFAULT_BUF_SIZE, /* size of the data buffer in each mbuf
>>                                         (2048 + 128) */
>>           0);                        /* socket id */
>> If I change the "number of elements in the mbuf pool" from 8192 to 512,
>> Rx hits the same problem after a shorter time (~30 s).
>>
>> Please tell me if you need more information. I am looking forward to
>> hearing from you.
>>
>>
>> Many thanks,
>> Vuong Le
> Hi Vuong,
>
> This is likely to be a buffer leakage problem. You might have a path in
> your code where you are not freeing a buffer, so that buffer gets "lost":
> it is never returned to the pool, the application can no longer use it,
> and the pool of free buffers shrinks over time until it eventually becomes
> empty and no more packets can be received.
>
> You might want to periodically monitor the number of free buffers in your
> pool. If this is the root cause, you should see that number steadily
> decreasing until it hits zero; otherwise you should see the number of free
> buffers oscillating around an equilibrium point.
>
> Since it takes a relatively large number of packets to hit this issue, the
> code path with the problem is probably not executed very frequently: it
> might be a control plane packet that is not freed, an ARP request/reply
> packet, etc.
>
> Regards,
> Cristian

Hi Cristian,

Thanks for your response; I am following your suggestion. But let me show
you another case I tested before. I changed the architecture of my
application as follows:

- Architecture: Rx (lcore 1) receives packets and enqueues them to the
ring ----- after that: Rx (lcore 1) itself dequeues the packets from the
ring and frees them immediately. (The old architecture is as above.)

With the new architecture, Rx was still receiving packets after 2 days and
everything looked good. Unfortunately, my application must run with the
old architecture. Any ideas for me?

Many thanks,
Vuong Le
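
For reference, a minimal sketch of the periodic free-buffer check Cristian
describes, assuming the standard DPDK mempool API. The function name and
output format here are illustrative only; on releases that predate the
rename, rte_mempool_avail_count()/rte_mempool_in_use_count() are called
rte_mempool_count()/rte_mempool_free_count().

    #include <stdio.h>
    #include <rte_mempool.h>

    /* Call this periodically (e.g. once per second from a management
     * lcore or a timer) and watch the trend of the free count. */
    static void
    log_pool_usage(struct rte_mempool *rx_pool)
    {
        /* Counts include objects sitting in the per-lcore caches. */
        unsigned int avail  = rte_mempool_avail_count(rx_pool);
        unsigned int in_use = rte_mempool_in_use_count(rx_pool);

        /* A free count that keeps shrinking and never recovers points to
         * a leak; oscillation around a stable value is normal. */
        printf("rx_pool: %u free, %u in use\n", avail, in_use);
    }

If the free count does reach zero, the PMD has no mbufs left to refill its
Rx descriptors with, which matches the symptom of an Rx core that stays
alive but never receives another packet.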
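
One leak path worth double-checking in the old architecture, purely as a
guess consistent with Cristian's "rarely executed path" remark: if the ring
ever fills up (for example while the worker lcore is briefly stalled), any
mbufs that rte_ring_enqueue_burst() does not accept must be freed by the Rx
core, otherwise they are lost. Below is a hypothetical sketch of such an Rx
loop using the DPDK 16.11 call signatures (later releases add an extra
parameter to the ring burst functions); the port, queue and burst size are
made up for illustration.

    #include <rte_ethdev.h>
    #include <rte_mbuf.h>
    #include <rte_ring.h>

    #define BURST_SIZE 32

    static void
    rx_loop(uint8_t port_id, struct rte_ring *ring)
    {
        struct rte_mbuf *pkts[BURST_SIZE];

        for (;;) {
            uint16_t nb_rx = rte_eth_rx_burst(port_id, 0, pkts, BURST_SIZE);
            if (nb_rx == 0)
                continue;

            unsigned int nb_enq = rte_ring_enqueue_burst(ring,
                    (void **)pkts, nb_rx);

            /* Ring full: drop (free) whatever did not fit, otherwise
             * these mbufs leak and the pool eventually runs dry. */
            for (unsigned int i = nb_enq; i < nb_rx; i++)
                rte_pktmbuf_free(pkts[i]);
        }
    }

A missing free on the enqueue-failure branch might also explain why the
single-lcore variant runs cleanly: when the same lcore dequeues and frees
immediately after enqueuing, the ring is drained right away and the enqueue
practically never fails.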