From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 30 Jul 2019 20:51:04 +0200
From: Adrien Mazarguil
To: Aaron Conole
Cc: Ferruh Yigit, David Marchand, Bernard Iremonger, dev, dpdk stable,
 Thomas Monjalon, "Singh, Jasvinder", Flavia Musatescu
Message-ID: <20190730185104.GF4512@6wind.com>
References: <1562670596-27129-1-git-send-email-bernard.iremonger@intel.com>
 <10372251.KTS5ePcUbj@xps> <20190730161831.GE4512@6wind.com>
Subject: Re: [dpdk-dev] [dpdk-stable] [PATCH] librte_flow_classify: fix
 out-of-bounds access

On Tue, Jul 30, 2019 at 01:27:41PM -0400, Aaron Conole wrote:
> Ferruh Yigit writes:
>
> > On 7/30/2019 5:18 PM, Adrien Mazarguil wrote:
> >> On Tue, Jul 30, 2019 at 03:48:31PM +0100, Ferruh Yigit wrote:
> >>> On 7/30/2019 3:42 PM, Aaron Conole wrote:
> >>>> David Marchand writes:
> >>>>
> >>>>> On Wed, Jul 10, 2019 at 11:49 PM Thomas Monjalon wrote:
> >>>>>>
> >>>>>> 09/07/2019 13:09, Bernard Iremonger:
> >>>>>>> This patch fixes the out-of-bounds coverity issue by removing the
> >>>>>>> offending line of code at line 107 in rte_flow_classify_parse.c
> >>>>>>> which is never executed.
> >>>>>>>
> >>>>>>> Coverity issue: 343454
> >>>>>>>
> >>>>>>> Fixes: be41ac2a330f ("flow_classify: introduce flow classify library")
> >>>>>>> Cc: stable@dpdk.org
> >>>>>>> Signed-off-by: Bernard Iremonger
> >>>>>>
> >>>>>> Applied, thanks
> >>>>>
> >>>>> We have a segfault in the unit tests since this patch.
> >>>>
> >>>> I think this patch is still correct. The issue is in the semantic of
> >>>> the flow classify pattern. It *MUST* always have a valid end marker,
> >>>> but the test passes an invalid end marker. This causes the bounds to
> >>>> exceed.
> >>>>
> >>>> So, it would be best to fix it, either by having a "failure" on unknown
> >>>> markers (f.e. -1), or by passing a length around. However, the crash
> >>>> should be expected. The fact that the previous code was also incorrect
> >>>> and resulted in no segfault is pure luck.
> >>>>
> >>>> See rte_flow_classify_parse.c:80 and test_flow_classify.c:387
> >>>>
> >>>> I would be in favor of passing the lengths of the two arrays to these
> >>>> APIs. That would let us still make use of the markers (for valid
> >>>> construction), but also let us reason about lengths in a sane way.
> >>>>
> >>>> WDYT?
> >>>>
> >>>
> >>> +1, I also just replied with something very similar.
> >>>
> >>> With current API the testcase is wrong, and it will crash, also the invalid
> >>> action one has exact same problem.
> >>>
> >>> The API can be updated as you suggested, with a length field and testcases can
> >>> be added back.
> >>>
> >>> What worries me more is the rte_flow, which uses same arguments, and open to
> >>> same errors, should we consider updating rte_flow APIs to have lengths values too?
> >>
> >> (Jumping in since all dashboard lights in my control room went red after
> >> "rte_flow" was detected in this discussion)
> >
> > :)
> >
> >>
> >> Length values for patterns and action lists were considered during design
> >> but END was preferred as the better solution for convenience and because
> >> it's actually safer:
> >>
> >> - C programmers are well aware of the dire consequences of omitting the
> >>   ending NUL byte in strings so it's not a foreign concept. This is
> >>   documented as such for rte_flow.
> >
> > I believe, C string functions are one of the most error prone part of the libc,
> > even after a dozen of years it is not rare to crash the applications because of
> > omitted terminating NULL, so I think this is not the best example :)
>
> +1

Of course, but I see such crashes as a *feature* when something's wrong in
the code. Silent data corruption is much, much worse. Those are not
recoverable errors, so it's no different from ignoring SIGSEGV and hoping
for the best (whee, no more crashes!)

> >>
> >> - Static initialization of flow rules (i.e. defining a large fixed array)
> >>   is much easier if one doesn't have to encode its size as well, think about
> >>   compilation directives (#ifdef) on some of its elements.
> >>
> >> - Like omitting the END element, providing the wrong array size by mistake
> >>   remains a possibility, with similar or possibly worse consequences as
> >>   it's less likely to crash early and more prone to silent data corruption.
> >
> > It is easy to pass the array length, sizeof(...), and this can prevent API to
> > walk through beyond the pattern array.
> > And having the END within the array can be verified in API level before passing
> > the data to the drivers, so driver interface and code can stay intact.
>
> Encoding 'END' within the array can only be enforced as an application
> semantic.
>
> The size of the array is a program / system semantic.
>
> They cannot be used interchangeably, and we certainly shouldn't omit the
> system semantic. Notice how we're fixing a case that was directly
> because of a programmer doing "the wrong thing" and an API that cannot
> protect against it in any fashion. That's in spite of some of your very
> first comments:
>
>   because it's actually safer
>
> It isn't. People and programmers make mistakes. It's easier and more
> efficient to calculate the size of an array (ARRAY_SIZE() is a fairly
> well known macro) and pass it around.

ARRAY_SIZE() doesn't work with pointers, so a miscalculated size when
dynamically building/modifying flow rules is as much a possibility as a
missing END marker. When the size is wrong:

- A shorter size will usually translate to a valid flow rule that silently
  doesn't behave as expected.

- A moderately larger size will typically not crash, but whatever comes
  afterward in memory will be interpreted as part of that rule.

In both situations a crash would have been preferable (well, IMO).

> It's worse to _recalculate_ the
> size of an array each time (exponential execution) and have to
> constantly walk elements from the head.

I think there's also a misconception here, rte_flow patterns and action
lists are crafted in a way that makes their size irrelevant. PMDs are
expected to parse them as fast as possible in a *single pass* till they hit
END. Except for wasting CPU registers on additional arguments, knowing the
size in advance is useless to them.

> I didn't see the discussions on the flow API but I would have been
> really critical of passing flat arrays without a corresponding length.

Phew, I think this API would never have landed with us arguing forever about
that :)

I know the size can be useful to applications in some situations, be it for
allocations, housekeeping and whatnot, but PMDs really do not need
it. Applications are free to pass the size around for their own needs, but
at the DPDK API level, it's useless.
I refuse to accept "hiding programming mistakes" as a valid reason.

>
> >>
> >> - [tons of other good reasons here]
>
> As Ferruh notes, there are *billions* of examples of C strings being a
> problem, and they are conceptually no different (a flat array with an
> embedded end marker). I think there might be 'reasons,' but I would
> hesitate to know any of them as 'good'.

I only used C strings to illustrate the well-known terminating NUL approach,
but rte_flow thankfully won't provide as many manipulation functions as the
C string library, so the scope is quite limited here.

In practice flow rules are typically built on the stack by applications to
be immediately passed to PMDs, they are not expected to be processed much if
at all on their way. And if an application happens to need it, well it's
free to maintain as much metadata as required.

Looks like you dislike C strings for the wrong reasons. They're simple,
elegant, useful for their tendency to crash early when mishandled while not
preventing users from adding as much complexity as they like on top.

-- 
Adrien Mazarguil
6WIND
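
For readers following the thread, the two approaches being debated can be
sketched roughly as below. This is only an illustration under stated
assumptions, not DPDK code: struct item, enum item_type, pattern_count() and
pattern_count_bounded() are hypothetical, simplified stand-ins and not the
real rte_flow definitions. The first function walks the array in a single
pass until END, like strlen(). The second takes a length and checks that an
END element exists within it, along the lines of the API-level check
suggested earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified item types; NOT the real rte_flow definitions. */
enum item_type { ITEM_END = 0, ITEM_ETH, ITEM_IPV4, ITEM_UDP };

struct item {
    enum item_type type;
};

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/*
 * Marker-only walk: a single pass until END, in the spirit of strlen().
 * Behaviour is undefined if the array has no END element.
 */
static size_t pattern_count(const struct item *pattern)
{
    size_t n = 0;

    assert(pattern != NULL);
    while (pattern[n].type != ITEM_END)
        n++;
    return n;
}

/*
 * Bounded walk: never reads past len elements.
 * Returns the number of items before END, or -1 if END is missing.
 */
static long pattern_count_bounded(const struct item *pattern, size_t len)
{
    assert(pattern != NULL);
    for (size_t i = 0; i < len; i++)
        if (pattern[i].type == ITEM_END)
            return (long)i;
    return -1;
}

/* Well-formed pattern: terminated by END. */
static const struct item good_pattern[] = {
    { ITEM_ETH }, { ITEM_IPV4 }, { ITEM_UDP }, { ITEM_END },
};

/* Malformed pattern: END omitted, only safe with the bounded walk. */
static const struct item bad_pattern[] = {
    { ITEM_ETH }, { ITEM_IPV4 },
};
```

With these definitions, pattern_count(good_pattern) yields 3, while
pattern_count_bounded(bad_pattern, ARRAY_SIZE(bad_pattern)) reports -1
instead of reading past the array. Note that ARRAY_SIZE() only works here
because both patterns are true arrays; once they decay to pointers the
caller has to track the length itself, which is the failure mode discussed
above.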