From: Christian Ehrhardt
Date: Fri, 4 Mar 2016 14:28:18 +0100
To: dev
Subject: [dpdk-dev] Analysis for "lpm/lpm6: fix missing free of rules_tbl and lpm"

Hi everybody,

I created a fix for this, which will hit the mailing list soon, but I considered it important to send this mail ahead of it. All this analysis has no place in the patch description, but it helps to understand what was going on and why.
The follow-up patch will be titled "lpm/lpm6: fix missing free of rules_tbl and lpm".

I ran into the lpm6 autotest failing for me. Looking at it, I saw lots of these:

    Error at line 679:
    ERROR: LPM Test tests6[i]: FAIL
    LPM: LPM memory allocation failed
    [...]

It turned out that giving it 2500M of memory would have been enough to make it pass, but that couldn't be the solution.

After some debugging, it eventually boiled down to find_suitable_element(heap, size, flags, align, bound) not finding any space, even though it had found space for an allocation of the same size before.

Note: along the way I found the use-after-free for which I submitted a patch this morning.

I expected a leak, but valgrind wasn't much help. That was to be expected, though, since I guessed this would be an internal leak/fragmentation in the structures rather than a real leak.

Thinking of a leak/fragmentation, I broke up the loop in test_lpm6 and ran the tests in segments:
  - 1-end: failing at 13 and following, as reported
  - 13-end: working
  - skipping some ... (you get the idea)
A bit like bisecting :-)

It turned out that idx 2 (=> test2) was very important, but not the only source of the issue. This particular test does iterative allocation and free with a slightly different config each time (max_rules a bit below MAX_RULES). It always failed at the 22nd allocation via rte_lpm6_create, and all later ones failed as well.
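Why a leak in a create/free loop produces exactly this pattern (one fixed iteration fails first, then every later one fails too) can be shown with a toy model in plain C. This is not DPDK's malloc heap: struct pool only counts outstanding bytes, and all numbers are assumptions: 18 bytes per rule (consistent with the ~18000000-byte tables reported for about a million rules), a base of 999900 rules (MAX_RULES - 100, assuming MAX_RULES is 1000000), a hypothetical 4096-byte table object, and an arbitrary 380000000-byte pool.

```c
#include <stddef.h>

/* Toy pool standing in for a heap: it only counts outstanding bytes.
 * This is NOT DPDK's malloc_heap; all sizes below are assumptions. */
struct pool {
	size_t capacity;
	size_t used;
};

static int pool_alloc(struct pool *p, size_t size)
{
	if (size > p->capacity - p->used)
		return -1;              /* no space left: allocation fails */
	p->used += size;
	return 0;
}

static void pool_free(struct pool *p, size_t size)
{
	p->used -= size;
}

#define OBJ_SIZE  4096u   /* hypothetical size of the table object itself */
#define RULE_SIZE 18u     /* assumed bytes per rule */

struct loop_result {
	int first_fail;   /* index of the first failed create, -1 if none */
	int failures;     /* total number of failed creates */
};

/* Mimics the create/free loop: each create allocates the object plus a
 * rules table; the free step releases the rules table only when
 * free_rules_tbl is non-zero (0 reproduces the leak). */
static struct loop_result run_loop(size_t capacity, unsigned max_rules_base,
				   int iterations, int free_rules_tbl)
{
	struct pool p = { capacity, 0 };
	struct loop_result r = { -1, 0 };

	for (int i = 0; i < iterations; i++) {
		size_t rules_size = (size_t)RULE_SIZE * (max_rules_base + (unsigned)i);
		int ok = pool_alloc(&p, OBJ_SIZE) == 0;

		if (ok && pool_alloc(&p, rules_size) != 0) {
			pool_free(&p, OBJ_SIZE);   /* toy create cleans up on failure */
			ok = 0;
		}
		if (!ok) {
			if (r.first_fail < 0)
				r.first_fail = i;
			r.failures++;
			continue;
		}
		/* the "free" step of the loop */
		if (free_rules_tbl)
			pool_free(&p, rules_size);
		pool_free(&p, OBJ_SIZE);
	}
	return r;
}
```

With these assumed values, run_loop(380000000u, 999900u, 100, 0) reports its first failure at index 21 (the 22nd create) and 79 failures in total, because the leaked tables never come back; with free_rules_tbl set to 1 no create fails. The exact failing iteration depends entirely on the assumed pool size, but the "fails once, then always" shape does not.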
The test really just is this inner loop:

    for (i = 0; i < 100; i++) {
            config.max_rules = MAX_RULES - 100 + i;
            printf("INFO: %s - allocating for %u rules (%d/100)\n",
                   __func__, config.max_rules, i);
            lpm = rte_lpm6_create(__func__, SOCKET_ID_ANY, &config);
            TEST_LPM_ASSERT(lpm != NULL);

            rte_lpm6_free(lpm);
    }

But while we see "LPM: LPM memory allocation failed", the assertion that follows doesn't trigger.
NOTE: that is what my patch from this morning fixed.

The failing allocation is the one for the rules table:

    rte_zmalloc_socket -> rte_malloc_socket -> malloc_heap_alloc -> find_suitable_element

with sizes at or close to 18000000 bytes. That is ~17MB per table; since it fails at allocation 22, a leak would amount to ~374M (22 x 17M) for these tables alone. So as a ballpark estimate, assuming a leak or fragmenting consumption makes sense.

Printing heap->alloc_count in find_suitable_element proved that the pool was being exhausted: alloc_count only ever increases.

Then I realized that the allocation that eventually fails is this one:

    lpm->rules_tbl = (struct rte_lpm6_rule *)rte_zmalloc_socket(NULL,
                    (size_t)rules_size, RTE_CACHE_LINE_SIZE, socket_id);

yet there is never a free for that pointer:

    grep -Hrn -C 3 rules_tbl * | grep free

So I found two problems in rte_lpm6_free:
  - lpm is not freed if it doesn't find its tailq entry (te)
  - lpm->rules_tbl is never freed

As I said, a patch will follow soon.

Christian Ehrhardt
Software Engineer, Ubuntu Server
Canonical Ltd
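The two problems found in rte_lpm6_free point to a free path that releases the nested rules table and the table itself unconditionally, and only makes the registry-entry removal depend on the lookup. The following is a minimal sketch of that pattern in plain C with calloc/free, not the patch itself. struct table, struct entry, the registry list and the tracked_* counting helpers are hypothetical stand-ins for struct rte_lpm6, the EAL tailq entry (te) and rte_zmalloc_socket/rte_free.

```c
#include <stdlib.h>

/* Counts outstanding allocations so the effect of the free path can be checked. */
static int live_allocs = 0;

static void *tracked_calloc(size_t n, size_t size)
{
	void *p = calloc(n, size);
	if (p != NULL)
		live_allocs++;
	return p;
}

static void tracked_free(void *p)
{
	if (p != NULL) {
		live_allocs--;
		free(p);
	}
}

/* Hypothetical stand-ins for a rule and for the LPM6 table object. */
struct rule { unsigned char ip[16]; unsigned char next_hop; unsigned char depth; };
struct table { struct rule *rules_tbl; unsigned max_rules; };

/* Hypothetical registry standing in for the tailq of named tables. */
struct entry { struct table *data; struct entry *next; };
static struct entry *registry = NULL;

static struct table *table_create(unsigned max_rules)
{
	struct table *t = tracked_calloc(1, sizeof(*t));
	struct entry *e = tracked_calloc(1, sizeof(*e));

	if (t == NULL || e == NULL)
		goto fail;
	t->rules_tbl = tracked_calloc(max_rules, sizeof(struct rule));
	if (t->rules_tbl == NULL)
		goto fail;
	t->max_rules = max_rules;
	e->data = t;
	e->next = registry;
	registry = e;
	return t;
fail:
	tracked_free(e);
	tracked_free(t);   /* calloc zeroed t, so rules_tbl was never set */
	return NULL;       /* never hand back a freed pointer */
}

static void table_free(struct table *t)
{
	struct entry **link;
	struct entry *e = NULL;

	if (t == NULL)
		return;

	/* Unlink our registry entry if there is one ... */
	for (link = &registry; *link != NULL; link = &(*link)->next) {
		if ((*link)->data == t) {
			e = *link;
			*link = e->next;
			break;
		}
	}

	/* ... but release the table in every case, including its rules table. */
	tracked_free(t->rules_tbl);
	tracked_free(t);
	tracked_free(e);   /* no-op when no entry was found */
}
```

After table_create() followed by table_free(), live_allocs is back to zero; if the registry entry has gone missing, the table and its rules table are still released and only the orphaned entry itself is left over.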